[Vegan-commits] r2047 - pkg/vegan/inst/doc

noreply at r-forge.r-project.org
Tue Jan 17 10:34:52 CET 2012


Author: jarioksa
Date: 2012-01-17 10:34:52 +0100 (Tue, 17 Jan 2012)
New Revision: 2047

Modified:
   pkg/vegan/inst/doc/decision-vegan.Rnw
Log:
update documentation to match new model of parallel processing in oecosimu

Modified: pkg/vegan/inst/doc/decision-vegan.Rnw
===================================================================
--- pkg/vegan/inst/doc/decision-vegan.Rnw	2012-01-16 11:02:52 UTC (rev 2046)
+++ pkg/vegan/inst/doc/decision-vegan.Rnw	2012-01-17 09:34:52 UTC (rev 2047)
@@ -69,14 +69,15 @@
 For parallel processing, the \code{parallel} argument can be either
 
 \begin{enumerate}
-  \item Integer $>1$ in which case the given number of parallel
-    processes will be launched. In unix-like systems (\emph{e.g.},
-    MacOS, Linux) these will be forked \code{multicore} processes, but
-    socket clusters will be set up, initialized and closed in Windows.
-  \item A previously created socket cluster. This saves time as the
-    cluster is not set up and closed repeatedly.  If the argument is a
-    socket cluster, it will also be used in unix-like systems. Setting
-    up a socket cluster is discussed in \S~\ref{sec:parallel:socket}.
+\item An integer, in which case the given number of parallel processes
+  will be launched (the value $1$ gives non-parallel processing). In
+  unix-like systems (\emph{e.g.}, MacOS, Linux) these will be forked
+  \code{multicore} processes, but in Windows socket clusters will be
+  set up, initialized and closed.
+\item A previously created socket cluster. This saves time as the
+  cluster is not set up and closed repeatedly.  If the argument is a
+  socket cluster, it will also be used in unix-like systems. Setting
+  up a socket cluster is discussed in \S~\ref{sec:parallel:socket}.
 \end{enumerate}
 
 \subsubsection{Using parallel processing as default}
@@ -95,13 +96,11 @@
 non-parallel computation.  The \code{mc.cores} option can be set by
 the environmental variable \code{MC_CORES}.
 
-The development version of \R\footnote{Probably released as \R-2.15.0
-  on October 2012.} makes it possible to set up a default socket
-cluster with command \code{setDefaultCluster}.  In that case
-\pkg{vegan} will use the set default cluster if parallelized functions
-are called with argument \code{parallel = NULL}.\footnote{Something
-  better and more automatic is needed here, please help with
-  suggestion or alternative implementation.}
+The \R{} development version\footnote{Probably released as \R-2.15.0
+  in October 2012.} allows setting up a default socket cluster with
+\code{setDefaultCluster}.  In that case \pkg{vegan} will perform
+parallel processing with that default cluster.  If the option
+\code{mc.cores} is also set, it takes precedence.
 
 \subsubsection{Setting up socket clusters}
 \label{sec:parallel:socket}
@@ -114,7 +113,7 @@
 done with pre-defined clusters as these systems default to fork
 clusters.
 
-If socket cluster is not set in Windows, \pkg{vegan} will set and
+If a socket cluster is not set up in Windows, \pkg{vegan} will create and
 close the cluster within the function body. This involves following commands:
 <<eval=false>>=
 clus <- makeCluster(4)
@@ -129,28 +128,16 @@
 used with \pkg{vegan} commands, and after finishing all parallel
 processing you should \code{stopCluster}.
 
-If you need other packages than \pkg{vegan} and \pkg{permute}, you
-must make those known to your cluster with \code{clusterEvalQ}, or
-similar commands (\code{clusterCall}, \code{clusterExport}).  This is
-unnecessary in most parallel code in \pkg{vegan}, but in
-\code{oecosimu} you can define your own functions, and if these
-contain functions or items from other packages, you must use
-pre-defined clusters and declare all these external packages with
-\code{clusterEvalQ}.
-
 Most parallelized \pkg{vegan} functions work similarly in socket and
 fork clusters, but in \code{oecosimu} the parallel processing is used
 to evaluate user-defined functions.  If these functions need other
-packages than \pkg{vegan}, \pkg{permute} and standard \R{} packages,
-it is necessary to use pre-defined socket clusters which declare these
-other packages.  Socket clusters are always used in Windows, and there
-the socket cluster must be reset, whereas the fork clusters in
-unix-likes work also in these cases.  For example, if you want to use
-the Ochiai dissimilarity in the function \code{dsvdis} of the
-\pkg{labdsv} package in the \code{meandist} function of the
-\code{oecosimu} example in Windows, you must pre-set the socket
-cluster, and in addition also load the \pkg{labdsv} package before the
-call: 
+packages besides \pkg{vegan}, \pkg{permute} and base \R{}, it is
+necessary to use pre-defined socket clusters that load these other
+packages.  For example, if you want to use the Ochiai dissimilarity of
+the function \code{dsvdis} of the \pkg{labdsv} package within the
+\code{meandist} function of the \code{oecosimu} example in Windows,
+you must pre-set the socket cluster and also load the \pkg{labdsv}
+package before the call:
 <<eval=false>>=
 ## start up and define meandist()
 library(vegan)
@@ -182,12 +169,13 @@
 Parallelized processing has a considerable overhead, and the analysis
 is faster only if the non-parallel code is really slow (takes several
 seconds in wall clock time). The overhead is particularly large in
-socket clusters (in Windows). Setting a socket cluster and evaluating
-\code{library(vegan)} with \code{clusterEvalQ} can take two seconds,
-and only pays off if the non-parallel analysis takes close to ten
-seconds. Using pre-defined clusters will reduce the overhead, but not
-completely.  Fork clusters (in unix-likes operating systems) have a
-smaller overhead and can be faster. 
+socket clusters (in Windows). Creating a socket cluster and evaluating
+\code{library(vegan)} with \code{clusterEvalQ} can take two seconds or
+longer, and only pays off if the non-parallel analysis takes ten
+seconds or longer. Using pre-defined clusters will reduce the
+overhead, but not completely.  Fork clusters (in unix-like operating
+systems) have a smaller overhead and can be faster, but they are not
+free of overhead either.
 
 Each parallel process needs memory, and for a large number of
 processes you need much memory.  If the memory is exhausted, the
@@ -209,69 +197,64 @@
 
 The implementation of the parallel processing should accord with the
 description of the user interface above (\S~\ref{sec:parallel:ui}).
-The following rules should be followed:
-\begin{enumerate}
-  \item If argument \code{parallel} is specified, it should be
-    honoured despite all other default settings.
-  \item If \code{parallel} is an interger $>1$, this should be used as
-    the number of parallel processes.  In unix-likes, this is the
-    number of forked processes, and in Windows it used as the number
-    of workers in created socket clusters which are closed after the
-    use. In socket clusters, the command \code{clusterEvalQ(clus,
-      library(vegan))} must be evaluated for the created cluster
-    \code{clus}.
-  \item If \code{parallel} is a socket cluster, it must be used in all
-    operating systems, and not be closed after the analysis.
-  \item If \code{parallel = NULL}, then it is assumed that a
-    \code{setDefaultCluster} socket cluster has been defined and it
-    will be used in all operating systems.\footnote{This needs better
-    heuristics, and a system should be developed where parallel
-    processing is always done when \code{setDefaultCluster} is defined
-    (this may not be possible before \R~2.15.0 is released).}
-  \item If \code{parallel} is undefined (missing argument value), then
-    the number of parallel processes is taken from the option
-    \code{mc.cores}, and if the option is not set, will be taken as
-    \code{parallel = 1} implying non-parallel processing.
-  \item The fallback must be non-parallel (serial) processing.
-\end{enumerate}
+Function \code{oecosimu} can be used as the reference implementation,
+and the same interpretation of arguments, in the same order, should
+be followed.  All future implementations should be consistent, and
+all must be changed if the call heuristic changes.
 
-For the reference, following is the implementation in
-\code{oecosimu}.  The function is called with argument:
+The value of the \code{parallel} argument can be \code{NULL}, a
+positive integer or a socket cluster.  Integer $1$ means that no
+parallel processing is performed.  The ``normal'' default is
+\code{NULL}, which in the ``normal'' case is interpreted as $1$.  Here
+``normal'' means that \R{} is run with default settings.  Function
+\code{oecosimu} interprets the \code{parallel} argument in the
+following way:
+\begin{enumerate} 
+\item \code{NULL}: The default value of the argument is \code{parallel
+    = getOption("mc.cores")}. The option \code{mc.cores} is normally
+  unset, and then the default is \code{parallel = NULL}.  This is
+  interpreted as \code{parallel = 1} in \R-2.14.x.  In \R-2.15.x (not
+  yet released) the function checks whether a default socket cluster
+  is defined.  \R-2.15.0 has an unexported environment
+  \code{parallel:::.reg} with variable \code{default} that is either
+  \code{NULL} for an unset default or a socket cluster.  Querying this
+  environment is an error in \R-2.14.x, so we also need to test the
+  \R{} version (versions before 2.14.0 have no \pkg{parallel} package
+  and always use $1$).  In the following \code{oecosimu} code we first
+  see if the default cluster is set when \code{parallel = NULL}; if it
+  is unset, \code{parallel} will still be \code{NULL} and is changed
+  to $1$.  After this, \code{parallel} is either an integer or a socket
+  cluster, and its type is saved in variable \code{hasClus}:
 <<eval=false>>=
-parallel = getOption("mc.cores", 1L)
+    if (is.null(parallel) && getRversion() >= "2.15.0")
+        parallel <- get("default", envir = parallel:::.reg)
+    if (is.null(parallel) || getRversion() < "2.14.0")
+        parallel <- 1
+    hasClus <- inherits(parallel, "cluster")
 @ 
-which sets the default value to $1$ unless option \code{mc.cores} is
-set.. The parallel processing is done in this block:
-<<eval=false>>=
-    hasClus <- inherits(parallel, "cluster") || is.null(parallel)
-    if ((hasClus || parallel > 1)  && require(parallel)) {
-        if(.Platform$OS.type == "unix" && !hasClus) {
-            tmp <- mclapply(1:nsimul,
-                            function(i)
-                            applynestfun(x[,,i], fun=nestfun,
-                                         statistic = statistic, ...),
-                            mc.cores = parallel)
-            simind <- do.call(cbind, tmp)
-        } else {
-             if (!hasClus) {
-                parallel <- makeCluster(parallel)
-                clusterEvalQ(parallel, library(vegan))
-            }
-            simind <- parApply(parallel, x, 3, function(z)
-                               applynestfun(z, fun = nestfun,
-                                            statistic = statistic, ...))
-            if (!hasClus)
-                stopCluster(parallel)
-        }
-    } else {
-        simind <- apply(x, 3, applynestfun, fun = nestfun,
-                        statistic = statistic, ...)
-    }
-@ 
-Functions \code{mclapply} and \code{parApply} perform the actual
-parallel processing, and \code{apply} (after the last \code{else}) is
-the fall-back to non-parallel processing.
+\item Integer: An integer value is taken as the number of parallel
+  processes to create.  In unix-like systems this is the number of
+  forked multicore processes, and in Windows this is the number of
+  workers in socket clusters.  In Windows, the socket cluster is
+  created, \code{library(vegan)} is evaluated in the cluster, and the
+  cluster is stopped after parallel processing.
+\item Socket cluster: If a socket cluster is given, it will be used in
+  all operating systems.  It is not created, \code{library(vegan)} is
+  not evaluated and the cluster is not stopped.
+\end{enumerate}
 
+This gives the following precedence order for parallel processing
+(highest to lowest):
+\begin{enumerate}
+  \item An explicitly given value of the argument \code{parallel} will
+    always be used.
+  \item If option \code{mc.cores} is set, it will be used.  In Windows
+    this means that socket clusters are created and stopped even if a
+    default cluster is set.
+  \item In \R-2.15.0, the default socket cluster will be used if set.
+  \item The fallback behaviour is no parallel processing.
+\end{enumerate}
+
 \section{Nestedness and Null models}
 
 Some indicators of nestedness and null models of communities are only

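The precedence order in the new text of decision-vegan.Rnw can be condensed into a few lines of R. This is a minimal sketch under stated assumptions, not the code in oecosimu. resolve_parallel() and its default_cluster argument are hypothetical. The default cluster is passed in explicitly instead of being read from the unexported parallel:::.reg environment.

```r
## Hypothetical helper (not part of vegan): resolve the 'parallel'
## argument in the precedence order described in the text.
## 'default_cluster' stands in for a cluster registered with
## setDefaultCluster(); it is an assumption of this sketch and is
## passed explicitly rather than read from parallel internals.
resolve_parallel <- function(parallel = getOption("mc.cores"),
                             default_cluster = NULL) {
    ## Steps 1 and 2: an explicit value wins; otherwise the default
    ## expression picks up option mc.cores (NULL when unset).
    ## Step 3: if still NULL, use the default socket cluster, if any.
    if (is.null(parallel))
        parallel <- default_cluster
    ## Step 4: fall back to non-parallel processing.
    if (is.null(parallel))
        parallel <- 1L
    parallel
}
```

Keeping the default-cluster lookup outside the helper makes the precedence testable without creating any cluster. A real implementation would still need the \R{} version checks discussed above.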

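The block of oecosimu code removed in this revision chose between three back ends: forked processes, a temporary socket cluster, or serial apply(). A minimal sketch of the same dispatch follows. It assumes that parallel has already been resolved to a number or a cluster object. apply_slices() is a hypothetical name. Unlike the removed code, it registers stopCluster() with on.exit(), so a cluster it creates is closed even if FUN fails. It also does not evaluate library(vegan) on new workers, so FUN must then need only base R.

```r
library(parallel)

## Hypothetical helper (not part of vegan): apply FUN to each slice
## x[, , i] of a 3-d array, choosing the back end like oecosimu did.
## Assumes 'parallel' is already a number or a cluster object.
apply_slices <- function(x, FUN, parallel = 1L, ...) {
    hasClus <- inherits(parallel, "cluster")
    ## Serial fallback when no parallel processing is requested
    if (!hasClus && parallel <= 1L)
        return(apply(x, 3, FUN, ...))
    ## Unix-likes with an integer: forked processes, no set-up needed
    if (.Platform$OS.type == "unix" && !hasClus) {
        res <- mclapply(seq_len(dim(x)[3]),
                        function(i, ...) FUN(x[, , i], ...), ...,
                        mc.cores = parallel)
        ## Simplify like apply(): vector for scalars, else a matrix
        return(simplify2array(res))
    }
    ## Otherwise a socket cluster; one created here is stopped here,
    ## while a cluster passed in by the caller is left running
    if (!hasClus) {
        parallel <- makeCluster(parallel)
        on.exit(stopCluster(parallel))
    }
    parApply(parallel, x, 3, FUN, ...)
}
```

For example, apply_slices(x, sum, parallel = 2L) forks two processes on Linux or MacOS but creates and stops a two-worker socket cluster on Windows.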
