[Vegan-commits] r2041 - pkg/vegan/inst/doc

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Mon Jan 9 11:20:47 CET 2012


Author: jarioksa
Date: 2012-01-09 11:20:46 +0100 (Mon, 09 Jan 2012)
New Revision: 2041

Modified:
   pkg/vegan/inst/doc/decision-vegan.Rnw
Log:
proof reading

Modified: pkg/vegan/inst/doc/decision-vegan.Rnw
===================================================================
--- pkg/vegan/inst/doc/decision-vegan.Rnw	2012-01-08 21:40:58 UTC (rev 2040)
+++ pkg/vegan/inst/doc/decision-vegan.Rnw	2012-01-09 10:20:46 UTC (rev 2041)
@@ -48,13 +48,14 @@
   \R{} version 2.14.0.}  The \pkg{parallel} package in \R{} implements
 the functionality of earlier contributed packages \pkg{multicore} and
 \pkg{snow}.  The \pkg{multicore} functionality forks the analysis to
-the multiple cores and \pkg{snow} functionality sets up a socket
-cluster.  The \pkg{multicore} functionality only works in unix-like
+multiple cores, and \pkg{snow} functionality sets up a socket cluster
+of workers.  The \pkg{multicore} functionality only works in unix-like
 systems (such as MacOS and Linux), but \pkg{snow} functionality works
-in all OS's.  \pkg{Vegan} can use either method, but defaults to
-\pkg{multicore} functionality when this is available, because its fork
-processes are usually faster.  This chapter describes both the user
-interface and internal implementation for the developers.
+in all operating systems.  \pkg{Vegan} can use either method, but
+defaults to \pkg{multicore} functionality when this is available,
+because its forked clusters are usually faster.  This chapter
+describes both the user interface and internal implementation for the
+developers.
 
 \subsection{User interface}
 \label{sec:parallel:ui}
@@ -74,7 +75,7 @@
     socket clusters will be set up, initialized and closed in Windows.
   \item The argument of \code{parallel} can be a previously created
     socket cluster. This saves time as the cluster is not set up and
-    closed repeatedly.  If the argument is a socket cluster, they will
+    closed repeatedly.  If the argument is a socket cluster, it will
     also be used in unix-like systems. Setting up a socket cluster is
     discussed in \S~\ref{sec:parallel:socket}.  
 \end{enumerate}
@@ -91,7 +92,7 @@
 @ 
 
 The \code{mc.cores} option is defined in the \pkg{parallel} package,
-but it is usualy unset in which case \pkg{vegan} will default to
+but it is usually unset in which case \pkg{vegan} will default to
 non-parallel computation.  The \code{mc.cores} option can be set by
 the environment variable \code{MC_CORES}.
 
@@ -108,14 +109,14 @@
 \label{sec:parallel:socket}
 
 If socket clusters are used (and they are the only alternative in
-Windows), it is often wise and faster to set a cluster before calling
-parallelized code in \pkg{vegan} and use the pre-defined cluster as
-the argument for the \code{parallel} argument.  If you want to use
+Windows), it is often wise to set up a cluster before calling
+parallelized code and give the pre-defined cluster as the value of
+the \code{parallel} argument in \pkg{vegan}.  If you want to use
 socket clusters in unix-like systems (MacOS, Linux), this can be only
 done with pre-defined clusters as these systems default to fork
 clusters. If you use socket clusters, you must pre-define your
-clusters if you need to use other functions than those in
-\pkg{vegan}. 
+clusters when you need to use other functions than those in \pkg{vegan}
+and basic \R.
 
 If socket cluster is not set in Windows, \pkg{vegan} will set and
 close the cluster within the function body. This involves the following commands:
@@ -125,42 +126,32 @@
 stopCluster(clus)
 @ 
 The first command sets up the cluster, in this case with four
-cores. The second command makes \pkg{vegan} and \pkg{parallel}
-commands known to the established cluster and allows their use within
-the parallel code. Finally, the third command stops the cluster.  You
-should give the two first commands to establish a cluster used with
-\pkg{vegan} commands, and after finishing all parallel processing you
-should \code{stopCluster}.
+cores. The second command makes \pkg{vegan} (and \pkg{permute} that is
+also loaded) functions known to the cluster and allows their use
+within the parallel code. Finally, the third command stops the
+cluster.  You should give the two first commands to set up a cluster
+used with \pkg{vegan} commands, and after finishing all parallel
+processing you should \code{stopCluster}.
 
-If you need other packages than \pkg{vegan} and \pkg{parallel}, you
-must made those known to your cluster with \code{clusterEvalQ}, or
-alternatively with \code{clusterCall} (and perhaps even with
-\code{clusterExport}).  This is unnecessary in most parallel code in
-\pkg{vegan}, but you can define your own functions in \code{oecosimu}.
-If your own functions contain functions or elements from other
-packages, you must use pre-defined clusters and define all these
-external packages with \code{clusterEvalQ}.  The parallel processing
-will fail in Windows if you only give the integer value to the
-\code{parallel} argument is such cases. You must set the cluster in
-the session and call \code{oecosimu} giving the cluster to the
-\code{parallel} argument. 
+If you need other packages than \pkg{vegan} and \pkg{permute}, you
+must make those known to your cluster with \code{clusterEvalQ}, or
+similar commands (\code{clusterCall}, \code{clusterExport}).  This is
+unnecessary in most parallel code in \pkg{vegan}, but in
+\code{oecosimu} you can define your own functions, and if these
+contain functions or items from other packages, you must use
+pre-defined clusters and declare all these external packages with
+\code{clusterEvalQ}.
 
 If you pre-set the cluster, you can also use \pkg{snow} style clusters
 in unix-like systems.  
 
-In \R-devel you can set a default socket cluster
-(\code{setDefaultCluster}) and  that will be used for parallel
-processing in all operating systems.  Such default cluster must have
-defined \code{clusterEvalQ} for \code{library(vegan)} and all other
-necessary packages.
-
 \subsubsection{Random number generation}
 
 \pkg{Vegan} does not use parallel processing in random number
 generation.  This means that you do not need to define the type of the
-random number generator.  You can set the seed for the standard random
-number generation, and setting the seed for the parallelized generator
-(L'Ecuyer) has no effect in \pkg{vegan}.
+random number generator for parallel processing.  You can set the seed
+for the standard random number generation, and setting the seed for
+the parallelized generator (L'Ecuyer) has no effect in \pkg{vegan}.
 
 \subsubsection{Does it pay off?}
 
@@ -171,19 +162,19 @@
 \code{library(vegan)} with \code{clusterEvalQ} can take two seconds,
 and only pays off if the non-parallel analysis takes close to ten
 seconds. Using pre-defined clusters will reduce the overhead, but not
-completely.  Fork cluster (in unix-likes operating systems) has
+completely.  Fork clusters (in unix-like operating systems) have
 smaller overhead and can be faster. 
 
-Parallel processes also need parallel memory, and for a large number
-of processors you also need large memory.  If the memory is exhausted,
-the parallel processes can stall and can take a huge amount longer
-time than non-parallel processes (minutes instead of seconds).
+Each parallel process needs memory, and for a large number of
+processes you need much memory.  If the memory is exhausted, the
+parallel processes can stall and can take much longer than
+non-parallel processes (minutes instead of seconds).
 
 If the analysis is fast, and function runs in, say, less than five
 seconds, parallel processing is rarely useful.  Parallel processing is
 useful only in slow analyses: large number of replications or
 simulations, slow evaluation of each simulation. It also seems that
-increasing the number of processors gives diminishing yields, in
+increasing the number of processes gives diminishing returns, in
 particular in socket clusters.  The danger of memory exhaustion must
 also be remembered. 
 
@@ -203,26 +194,30 @@
     number of forked processes, and in Windows it is used as the number
     of workers in created socket clusters which are closed after the
     use. In socket clusters, the command \code{clusterEvalQ(clus,
-      library(vegan))} must be evaluated.
+      library(vegan))} must be evaluated for the created cluster
+    \code{clus}.
   \item If \code{parallel} is a socket cluster, it must be used in all
     operating systems, and not be closed after the analysis.
   \item If \code{parallel = NULL}, then it is assumed that a
     \code{setDefaultCluster} socket cluster has been defined and it
-    will be used in all operating systems.
+    will be used in all operating systems.\footnote{This needs better
+    heuristics, and a system should be developed where parallel
+    processing is always done when \code{setDefaultCluster} is defined
+    (this may not be possible before \R~2.15.0 is released).}
   \item If \code{parallel} is undefined (missing argument value), then
     the number of parallel processes is taken from the option
-    \code{mc.cores}, and if the option is not set, will be take as
-    \code{parallel = 1} implying non-parallel processing (in contrast
-    to the practice in the \pkg{parallel} package where the default is
-    \code{parallel = 2}.
+    \code{mc.cores}, and if the option is not set, will be taken as
+    \code{parallel = 1} implying non-parallel processing.
+  \item The fallback must be non-parallel (serial) processing.
 \end{enumerate}
 
-For the refenrence, following is the implementation in
+For reference, the following is the implementation in
 \code{oecosimu}.  The function is called with argument:
 <<eval=false>>=
 parallel = getOption("mc.cores", 1L)
 @ 
-which sets the default value. The parallel processing is done in this block:
+which sets the default value to $1$ unless option \code{mc.cores} is
+set. The parallel processing is done in this block:
 <<eval=false>>=
     hasClus <- inherits(parallel, "cluster") || is.null(parallel)
     if ((hasClus || parallel > 1)  && require(parallel)) {
@@ -236,7 +231,7 @@
         } else {
              if (!hasClus) {
                 parallel <- makeCluster(parallel)
-                 clusterEvalQ(parallel, library(vegan))
+                clusterEvalQ(parallel, library(vegan))
             }
             simind <- parApply(parallel, x, 3, function(z)
                                applynestfun(z, fun = nestfun,
@@ -249,7 +244,9 @@
                         statistic = statistic, ...)
     }
 @ 
-The last line (after the last \code{else}) peforms non-parallel processing.
+Functions \code{mclapply} and \code{parApply} perform the actual
+parallel processing, and \code{apply} (after the last \code{else}) is
+the fall-back to non-parallel processing.
 
 \section{Nestedness and Null models}
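
The dispatch rules described in the diff above can be sketched in a few lines of R that use only the \pkg{parallel} package. This is an illustration under stated assumptions, not the vegan implementation: `run_sims`, `sims` and `square` are hypothetical names, the sketch works on a plain list instead of the simulation array used in `oecosimu`, and it leaves out the `parallel = NULL` (`setDefaultCluster`) case.

```r
library(parallel)

# Hypothetical helper (not part of vegan): applies `fun` to each element of
# `x`, choosing the backend from `parallel` along the lines described above.
run_sims <- function(x, fun, parallel = getOption("mc.cores", 1L)) {
    hasClus <- inherits(parallel, "cluster")
    if (hasClus) {
        # Pre-defined socket cluster: use it in all operating systems
        # and leave it open so that the caller can stop it later.
        return(parLapply(parallel, x, fun))
    }
    if (is.numeric(parallel) && parallel > 1) {
        if (.Platform$OS.type == "unix") {
            # Fork processes; no cluster set-up is needed.
            return(mclapply(x, fun, mc.cores = parallel))
        }
        # Windows: create a temporary socket cluster and close it on exit.
        clus <- makeCluster(parallel)
        on.exit(stopCluster(clus))
        return(parLapply(clus, x, fun))
    }
    # Fallback: non-parallel (serial) processing.
    lapply(x, fun)
}

sims <- as.list(1:8)
square <- function(z) z^2

# Serial run: parallel = 1 goes straight to the lapply fallback.
serial <- run_sims(sims, square, parallel = 1L)

# Pre-defined socket cluster: set it up once, reuse it, stop it at the end.
clus <- makeCluster(2)
viaCluster <- run_sims(sims, square, parallel = clus)
stopCluster(clus)

identical(serial, viaCluster)
```

With a pre-defined cluster the set-up cost is paid once and the caller is responsible for `stopCluster`, whereas the integer form in Windows pays that cost on every call. If `fun` needed functions from a contributed package, the cluster would additionally need a `clusterEvalQ(clus, library(...))` call before use, as the vignette text explains.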
 


