[Vegan-commits] r1940 - in pkg/vegan: R inst man

noreply at r-forge.r-project.org
Sun Oct 9 19:53:23 CEST 2011


Author: jarioksa
Date: 2011-10-09 19:53:22 +0200 (Sun, 09 Oct 2011)
New Revision: 1940

Modified:
   pkg/vegan/R/permutest.cca.R
   pkg/vegan/inst/ChangeLog
   pkg/vegan/man/anova.cca.Rd
Log:
parallelized code for all operating systems, and the 'permutations' argument can now be a permutation matrix from permute::shuffleSet()

Modified: pkg/vegan/R/permutest.cca.R
===================================================================
--- pkg/vegan/R/permutest.cca.R	2011-10-09 17:37:16 UTC (rev 1939)
+++ pkg/vegan/R/permutest.cca.R	2011-10-09 17:53:22 UTC (rev 1940)
@@ -7,17 +7,22 @@
 `permutest.cca` <-
     function (x, permutations = 99,
               model = c("reduced", "direct", "full"), first = FALSE,
-              strata = NULL, parallel = 1, ...) 
+              strata = NULL, parallel = 1, kind = c("snow", "multicore"), ...)
 {
+    kind <- match.arg(kind)
+    parallel <- as.integer(parallel)
     model <- match.arg(model)
     isCCA <- !inherits(x, "rda")
     isPartial <- !is.null(x$pCCA)
     ## Function to get the F statistics in one loop
-    getF <- function (R, ...)
+    getF <- function (indx, ...)
     {
+        if (!is.matrix(indx))
+            dim(indx) <- c(1, length(indx))
+        R <- nrow(indx)
         mat <- matrix(0, nrow = R, ncol = 3)
         for (i in seq_len(R)) {
-            take <- permuted.index(N, strata)
+            take <- indx[i,]
             Y <- E[take, ]
             if (isCCA)
                 wtake <- w[take]
@@ -101,18 +106,29 @@
         runif(1)
     seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
     ## permutations
-    if (parallel > 1 && getRversion() >= "2.14" && require(parallel)
-        && .Platform$OS.type == "unix") {
-        R <- ceiling(permutations/parallel)
-        mc.reset.stream()
-        tmp <- do.call(rbind, mclapply(seq_len(parallel), getF, R = R,
-                                       mc.cores = parallel))
+    if (length(permutations) == 1) {
+        permutations <- shuffleSet(N, permutations)
+    }
+    nperm <- nrow(permutations)
+    if (parallel > 1 && getRversion() >= "2.14" && require(parallel)) {
+        if (kind == "snow") {
+            cl <- makeCluster(parallel)
+            clusterEvalQ(cl, library(vegan))
+            tmp <- parRapply(cl, permutations, function(i) getF(i))
+            tmp <- t(matrix(tmp, nrow=3))
+            stopCluster(cl)
+        } else {
+            tmp <- do.call(rbind,
+                           mclapply(1:nperm,
+                                    function(i) getF(permutations[i,]),
+                                    mc.cores = parallel))
+        }
     } else {
-        tmp <- getF(R = permutations)
+        tmp <- getF(permutations)
     }
-    num <- tmp[1:permutations,1]
-    den <- tmp[1:permutations,2]
-    F.perm <- tmp[1:permutations,3]
+    num <- tmp[,1]
+    den <- tmp[,2]
+    F.perm <- tmp[,3]
     ## Round to avoid arbitrary ordering of statistics due to
     ## numerical inaccuracy
     F.0 <- round(F.0, 12)

Modified: pkg/vegan/inst/ChangeLog
===================================================================
--- pkg/vegan/inst/ChangeLog	2011-10-09 17:37:16 UTC (rev 1939)
+++ pkg/vegan/inst/ChangeLog	2011-10-09 17:53:22 UTC (rev 1940)
@@ -31,29 +31,31 @@
 	not find function in all packages, but 'vegan' is made known, and
 	'stats' and 'base' seem to be known.
 
-	* permutest.cca: First attempt of setting 'parallel' processing in
-	permutest.cca. Currently the parallelization only works in R
-	2.14.0 (alpha) and later with the 'parallel' package, and in
-	unix-like operating systems (Linux and MacOS X were
-	tested). Function permutest.cca gets a new argument 'parallel'
-	(defaults 1) that gives the number of desired parallel
-	processes. The argument is silently ignored if the system is not
-	capable of parallel processing (missing 'parallel' package,
-	Windows). The argument may be bassed to permutest.cca() from
-	anova.cca(), but currently setting the random number generator
-	seed will fail, and the results probably will be wrong. This
-	feature is only for testing. The functionality cannot be included
-	cleanly: it depends on the package 'parallel', but suggesting
-	'parallel' fails R CMD check in the current R release (2.13.2)
-	which does not yet have 'parallel'. So we get warnings:
-	'library' or 'require' call not declared from: parallel, and
-	permutest.cca: no visible global function definition for
-	‘mclapply’.
-	Perhaps we delay adding this feature, and cancel this submission
-	later. However, with these warnings, the function passes tests in
-	R 2.13.2. (It fails in R 2.14.0 alpha since it suggests 'rgl', and
-	that package fails in R 2.14.0.)
+	* permutest.cca: implemented 'parallel' processing in
+	permutest.cca.  The parallelization only works in R 2.14.0 (alpha)
+	and later with the 'parallel' package. Function permutest.cca gets
+	new arguments 'parallel' (defaults to 1) that gives the number of
+	parallel processes, and 'kind' that selects the parallelization
+	style, which is either "snow" (large overhead, but works in all
+	OSes) or "multicore" (faster, but only works in unix-like systems
+	like Linux and MacOS X). The arguments are silently ignored if the
+	system is not capable of parallel processing. The functionality
+	cannot be included cleanly: it depends on the package 'parallel',
+	but suggesting 'parallel' fails R CMD check in the current R
+	release (2.13.2) which does not yet have 'parallel'. So we get
+	warnings: "'library' or 'require' call not declared from:
+	parallel", and "permutest.cca: no visible global function
+	definition for ‘mclapply’". However, with these warnings,
+	the function passes tests in R 2.13.2.
 
+	* permutest.cca: the user interface changed so that argument
+	'permutations' can be either the number of permutations (as
+	previously), or a matrix of permutations such as that produced by
+	permute::shuffleSet(). This was done to move RNG outside
+	parallelized code. This will also allow much simpler
+	anova.cca* code. Currently, the 'strata' argument will not work,
+	but this will be fixed "real soon now".
+
 Version 2.1-2 (opened October 4, 2011)
 
 	* permutest.cca could not be update()d, because "permutest.cca"

Modified: pkg/vegan/man/anova.cca.Rd
===================================================================
--- pkg/vegan/man/anova.cca.Rd	2011-10-09 17:37:16 UTC (rev 1939)
+++ pkg/vegan/man/anova.cca.Rd	2011-10-09 17:53:22 UTC (rev 1940)
@@ -25,7 +25,8 @@
 
 \method{permutest}{cca}(x, permutations = 99,
           model = c("reduced", "direct", "full"),
-          first = FALSE, strata, parallel = 1, ...)
+          first = FALSE, strata, parallel = 1, kind = c("snow", "multicore"),
+          ...)
 }
 
 \arguments{
@@ -54,11 +55,15 @@
     permutation. If supplied, observations are permuted only within the
     specified strata.}
 
-  \item{parallel}{Number of parallel processes. The parallel
-    processing is only possible in \R version 2.14.x and later, and
-    currently only works in unix-like operating systems, such as Linux
-    and MacOS X. The argument is silently ignored if the system is not
-    capable of parallel processing.  }
+  \item{parallel, kind}{Number of parallel processes. The parallel
+    processing is only possible in \R version 2.14.x and later. The
+    arguments are silently ignored if the system is not capable of
+    parallel processing. There are two kinds of parallelization:
+    \code{kind = "snow"} selects a socket cluster which is available
+    in all operating systems, and \code{kind = "multicore"} selects a
+    fork cluster that is available only in unix-like systems (Linux,
+    MacOS X), but is usually faster. These arguments are experimental
+    and may change or disappear in any version. }
 
 }
 \details{
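
The central idea of this revision is to generate every permutation up front and then hand the rows of the permutation matrix to the workers. It can be sketched in plain R as follows. This is an illustrative sketch only, not the vegan code: `perm_matrix`, `stat_for_perm`, and the toy data are hypothetical names, and the matrix is built with base `sample()` instead of `permute::shuffleSet()` so that the example needs nothing beyond the 'parallel' package.

```r
library(parallel)

set.seed(42)
n <- 10          # number of observations (hypothetical toy size)
nperm <- 99      # number of permutations
x <- rnorm(n)
y <- rnorm(n)

## Build the permutation matrix up front, one permutation per row.
## All random number generation happens here, outside parallel code.
perm_matrix <- t(replicate(nperm, sample(n)))

## Hypothetical test statistic: correlation of x with permuted y.
stat_for_perm <- function(idx, x, y) cor(x, y[idx])

## Serial evaluation, one row of the matrix at a time.
serial <- apply(perm_matrix, 1, stat_for_perm, x = x, y = y)

## Socket cluster, in the spirit of kind = "snow".
cl <- makeCluster(2)
par_res <- parRapply(cl, perm_matrix, stat_for_perm, x = x, y = y)
stopCluster(cl)

## The same permutations must give the same statistics.
stopifnot(isTRUE(all.equal(serial, par_res)))
```

Because the workers only receive ready-made index vectors, they never touch the random number generator, so the serial and parallel runs agree for a given seed.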


