[Vegan-commits] r2903 - in pkg/vegan: R man

Wed Oct 22 10:26:23 CEST 2014

Author: jarioksa
Date: 2014-10-22 10:26:22 +0200 (Wed, 22 Oct 2014)
New Revision: 2903

Modified:
   pkg/vegan/R/anova.cca.R
   pkg/vegan/man/permutations.Rd
Log:
Squashed commit of the following:

commit 05ced8cfb03aa540abc415be0e91baa59e486856
Merge: 2b4aa54 efd575f
Author: Jari Oksanen <jari.oksanen at oulu.fi>
Date:   Wed Oct 22 10:27:16 2014 +0300

    Merge pull request #60 from gavinsimpson/fix-argument-passing-in-anova

    pass relevant named arguments from anova.cca to permutest.cca

commit efd575f1f654348dad12245e19ca2f276ea7c7dc
Author: Gavin Simpson <ucfagls at gmail.com>
Date:   Mon Oct 20 16:19:59 2014 -0600

    pass relevant named arguments from anova.cca to permutest.cca

commit 2b4aa54974f330357b8c5323987866e96a895cf0
Author: Gavin Simpson <ucfagls at gmail.com>
Date:   Thu Oct 16 22:00:50 2014 -0600

    some minor tweaks, fixes for typos, and a clarification

commit 31d2e8f9317d1562a558e095d1ef59b093d985b8
Merge: eada1ca 845060a
Author: Gavin Simpson <ucfagls at gmail.com>
Date:   Thu Oct 16 20:59:41 2014 -0600

    Merge pull request #57 from gavinsimpson/improve-permutations-rd

    Improve permutations.Rd

commit 845060a209d9512c935b346af125abe390694431
Author: Gavin Simpson <ucfagls at gmail.com>
Date:   Thu Oct 16 20:36:16 2014 -0600

    sampleSet -> shuffleSet typo

commit cf642a5834743ad07c7f475a08a6a36aa15988a6
Author: Gavin Simpson <ucfagls at gmail.com>
Date:   Thu Oct 16 20:30:05 2014 -0600

    add davison and hinkley reference

commit 3a64ee41e734cab1074e89c9a4b2b342dc9c9627
Author: Gavin Simpson <ucfagls at gmail.com>
Date:   Wed Oct 15 21:38:03 2014 -0600

    final round of edits for draft of revised intro to permutations for vegan users

commit c7fee52e21ccf1db3f6ab19330a0b22c591eb450
Author: Gavin Simpson <ucfagls at gmail.com>
Date:   Wed Oct 15 20:24:38 2014 -0600

    first stab at updates to the permutations Rd file

Modified: pkg/vegan/R/anova.cca.R
===================================================================

--- pkg/vegan/R/anova.cca.R	2014-10-16 08:50:21 UTC (rev 2902)
+++ pkg/vegan/R/anova.cca.R	2014-10-22 08:26:22 UTC (rev 2903)
@@ -2,7 +2,7 @@
     function(object, ..., permutations = how(nperm=999), by = NULL,
              model = c("reduced", "direct", "full"),
              parallel = getOption("mc.cores"), strata = NULL,
-             cutoff = 1, scope = NULL) 
+             cutoff = 1, scope = NULL)
 {
     model <- match.arg(model)
     ## permutation matrix
@@ -21,7 +21,7 @@
             dotargs <- dotargs[isCCA]
             object <- c(list(object), dotargs)
             sol <-
-                anova.ccalist(object, 
+                anova.ccalist(object,
                               permutations = permutations,
                               model = model,
                               parallel = parallel)
@@ -54,15 +54,17 @@
         return(sol)
     }
     ## basic overall test
-    tst <- permutest.cca(object, permutations = permutations, ...)
+    tst <- permutest.cca(object, permutations = permutations,
+                         model = model, parallel = parallel,
+                         strata = strata, ...)
     Fval <- c(tst$F.0, NA)
     Pval <- (sum(tst$F.perm >= tst$F.0) + 1)/(tst$nperm + 1)
     Pval <- c(Pval, NA)
     table <- data.frame(tst$df, tst$chi, Fval, Pval)
     is.rda <- inherits(object, "rda")
-    colnames(table) <- c("Df", ifelse(is.rda, "Variance", "ChiSquare"), 
+    colnames(table) <- c("Df", ifelse(is.rda, "Variance", "ChiSquare"),
                          "F", "Pr(>F)")
-    head <- paste0("Permutation test for ", tst$method, " under ", 
+    head <- paste0("Permutation test for ", tst$method, " under ",
                   tst$model, " model\n", howHead(control))
     mod <- paste("Model:", c(object$call))
     structure(table, heading = c(head, mod), Random.seed = seed,

Modified: pkg/vegan/man/permutations.Rd
===================================================================
--- pkg/vegan/man/permutations.Rd	2014-10-16 08:50:21 UTC (rev 2902)
+++ pkg/vegan/man/permutations.Rd	2014-10-22 08:26:22 UTC (rev 2903)
@@ -3,35 +3,58 @@
 
 \title{Permutation tests in Vegan}
 \description{
-  Unless stated otherwise, vegan currently provides for two types of
-  permutation test:
+  From version 2.2-0, \pkg{vegan} has significantly improved access to
+  restricted permutations which brings it into line with those offered
+  by Canoco. The permutation designs are modelled after the permutation
+  schemes of Canoco 3.1 (ter Braak, 1990).
+
+  \pkg{vegan} currently provides for the following features within
+  permutation tests:
   \enumerate{
-    \item{Free permutation of \emph{DATA}, also known as randomisation,
+    \item{Free permutation of \emph{DATA}, also known as randomisation,}
+    \item{Free permutation of \emph{DATA} within the levels of a
+      grouping variable,}
+    \item{Restricted permutations for line transects or time series,}
+    \item{Permutation of groups of samples whilst retaining the
+      within-group ordering,}
+    \item{Restricted permutations for spatial grids,}
+    \item{Blocking, samples are never permuted \emph{between} blocks,
       and}
-    \item{Free permutation of \emph{DATA} within the levels of a factor
-      variable.}
+    \item{Split-plot designs, with permutation of whole plots, split
+      plots, or both.}
   }
-  We use \emph{DATA} to mean either the observed data themselves or some
-  function of the data, for example the residuals of an ordination model
-  in the presence of covariables.
+  Above, we use \emph{DATA} to mean either the observed data themselves
+  or some function of the data, for example the residuals of an
+  ordination model in the presence of covariables.
   
-  The second type of permutation test above is available if the function
-  providing the test accepts an argument \code{strata} or passes
-  additional arguments (via \code{\dots}) to
-  \code{permuted.index}.
+  These capabilities are provided by functions from the \pkg{permute}
+  package. The user can request a particular type of permutation by
+  supplying the \code{permutations} argument of a function with an
+  object returned by \code{\link{how}}, which defines how samples should
+  be permuted. Alternatively, the user can simply specify the required
+  number of permutations and a simple randomisation procedure will be
+  performed. Finally, the user can supply a matrix of permutations (with
+  number of rows equal to the number of permutations and number of
+  columns equal to the number of observations in the data) and
+  \pkg{vegan} will use these permutations instead of generating new
+  permutations.
+  
+  The majority of functions in \pkg{vegan} allow for the full range of
+  possibilities outlined above. Exceptions include
+  \code{\link{kendall.post}} and \code{\link{kendall.global}}.
 
-  The Null hypothesis for these two types of permutation test assumes
-  free exchangeability of \emph{DATA} (within the levels of
-  \code{strata} if specified). Dependence between observations, such as
-  that which arises due to spatial or temporal autocorrelation, or
-  more-complicated experimental designs, such as split-plot designs,
-  violates this fundamental assumption of the test and requires restricted
-  permutation test designs. The next major version of Vegan will include
-  infrastructure to handle these more complicated permutation designs.
+  The Null hypothesis for the first two types of permutation test listed
+  above assumes free exchangeability of \emph{DATA} (within the levels
+  of the grouping variable, if specified). Dependence between
+  observations, such as that which arises due to spatial or temporal
+  autocorrelation, or more-complicated experimental designs, such as
+  split-plot designs, violates this fundamental assumption of the test
+  and requires more complex restricted permutation test designs. It is
+  these designs that are available via the \pkg{permute} package and to
+  which \pkg{vegan} provides access from version 2.2-0 onwards.
 
-  Again, unless otherwise stated in the help pages for specific
-  functions, permutation tests in Vegan all follow the same
-  format/structure:
+  Unless otherwise stated in the help pages for specific functions,
+  permutation tests in \pkg{vegan} all follow the same format/structure:
   \enumerate{
     \item{An appropriate test statistic is chosen. Which statistic is
       chosen should be described on the help pages for individual
@@ -40,63 +63,109 @@
       data and analysis/model and recorded. Denote this value
       \eqn{x_0}{x[0]}.}
     \item{The \emph{DATA} are randomly permuted according to one of the
-      above two schemes, and the value of the test statistic for this
+      above schemes, and the value of the test statistic for this
       permutation is evaluated and recorded.}
     \item{Step 3 is repeated a total of \eqn{n} times, where \eqn{n} is
       the number of permutations requested. Denote these values as
       \eqn{x_i}{x[i]}, where \eqn{i = 1, ..., n}{{i = 1, \ldots, n}.}}
-    \item{The values of the test statistic for the \eqn{n} permutations
-      of the \emph{DATA} are added to the value of the test statistic
-      for the observed data. These \emph{n + 1} values represent the
-      \emph{Null} or \emph{randomisation} distribution of the test
-      statistic. The observed value for the test statistic is included
-      in the Null distribution because under the Null hypothesis being
-      tested, the observed value is just a typical value of the test
-      statistic, inherently no different from the values obtained via
-      permutation of \emph{DATA}.}
-    \item{The number of times that a value of the test statistic in the
-      Null distribution is equal to or greater than the value of the
-      test statistic for the observed data is recorded. Note the point
-      mentioned in step 5 above; the Null distribution includes the
-      \strong{observed} value of the test statistic. Denote this count
-      as \eqn{N}.}
+    \item{Count the number of values of the test statistic,
+      \eqn{x_i}{x[i]}, in the Null distribution that are as extreme as
+      test statistic for the observed data \eqn{x_0}{x[0]}. Denote this
+      count as \eqn{N}.
+
+      We use the phrase \emph{as extreme} to include cases where a
+      two-sided test is performed and large negative values of the test
+      statistic should be considered.}
     \item{The permutation p-value is computed as
-      \deqn{p = \frac{N}{n + 1}}{N / (n + 1)}}
+      \deqn{p = \frac{N + 1}{n + 1}}{(N + 1) / (n + 1)}}
   }
+  
   The above description illustrates why the default number of
-  permutations specified in Vegan functions takes values of 199 or 999
-  for example. Once the observed value of the test statistic is added to
-  this number of random permutations of \emph{DATA}, pretty p-values are
-  achievable because \eqn{n + 1} becomes 200 or 1000, for example.
+  permutations specified in \pkg{vegan} functions takes values of 199 or
+  999 for example. Pretty \emph{p} values are achieved because the
+  \eqn{+ 1} in the denominator results in division by 200 or 1000, for
+  the 199 or 999 random permutations used in the test.
 
-  The minimum achievable p-value is
-  \deqn{p_{\mathrm{min}} = \frac{1}{n +1}}{p[min] = 1 / (n + 1)}
+  The simple intuition behind the presence of \eqn{+ 1} in the numerator
+  and denominator is that these represent the inclusion of the observed
+  value of the statistic in the Null distribution (e.g. Manly 2006).
+  Phipson & Smyth (2010) present a more compelling explanation for the
+  inclusion of \eqn{+ 1} in the numerator and denominator of the
+  \emph{p} value calculation.
+
+  Fisher (1935) had in mind that a permutation test would involve
+  enumeration of all possible permutations of the data yielding an exact
+  test. However, doing this complete enumeration may not be feasible in
+  practice owing to the potentially vast number of arrangements of the
+  data, even in modestly-sized data sets with free permutation of
+  samples. As a result we evaluate the \emph{p} value as the tail
+  probability of the Null distribution of the test statistic directly
+  from the random sample of possible permutations. Phipson & Smyth
+  (2010) show that the naive calculation of the permutation \emph{p}
+  value is
+
+  \deqn{p = \frac{N}{n}}{p = (N / n)}
+
+  which leads to an invalid test with incorrect type I error rate. They
+  go on to show that by replacing the unknown tail probability (the
+  \emph{p} value) of the Null distribution with the biased estimator
+
+  \deqn{p = \frac{N + 1}{n + 1}}{p = (N + 1 / n + 1)}
   
-  A more common definition, in ecological circles, for \eqn{N} would be
-  the number of \eqn{x_i}{x[i]} greater than or equal to
-  \eqn{x_0}{x[0]}. The permutation p-value would then be defined as
-  \deqn{p = \frac{N + 1}{n + 1}}{(N + 1) / (n + 1)}
-  The + 1 in the numerator of the above equation represents the observed
-  statistic \eqn{x_0}{x[0]}. The minimum p-value would then be defined as
-  \deqn{p_{\mathrm{min}} = \frac{0 + 1}{n +1}}{p[min] = 0 + 1 / (n + 1)}
-  However this definition discriminates between the observed
-  statistic and the other \eqn{x_i}{x[i]}. Under the Null hypothesis
-  there is no such distinction, hence we prefer the definintion used in
-  the numbered steps above.
+  that the positive bias induced is of just the right size to
+  account for the  uncertainty in the estimation of the tail probability
+  from the set of randomly sampled permutations to yield a test with the
+  correct type I error rate.
 
-  One cannot simply increase the number of permutations
-  (\eqn{n}) to achieve a potentially lower p-value unless the number of
-  observations available permits such a number of permutations. This is
-  unlikely to be a problem for all but the smallest data sets when
-  free permutation (randomisation) is valid, but in designs where
-  \code{strata} is specified and there are a low number of observations
-  within each level of \code{strata}, there may not be as many actual
-  permutations of the data as you might want.
+  The estimator described above is correct for the situation where
+  permutations of the data are samples randomly \emph{without}
+  replacement. This is not strictly what happens in \pkg{vegan} because
+  permutations are drawn pseudo-randomly independent of one
+  another. Note that the actual chance of this happening is practice is
+  small but the functions in \pkg{permute} do not guarantee to generate
+  a unique set of permutations unless complete enumeration of
+  permutations is requested. This is not feasible for all but the
+  smallest of data sets or restrictive of permutation designs, but in
+  such cases the chance of drawing a set of permutations with repeats is
+  lessened as the sample size, and thence the size of set of all
+  possible permutations, increases.
+
+  Under the situation of sampling permutations with replacement then,
+  the tail probability \eqn{p} calculated from the biased estimator
+  described above is somewhat \strong{conservative}, being too large by
+  an amount that depends on the number of possible values that the test
+  statistic can take under permutation of the data (Phipson & Smyth,
+  2010). This represents a slight loss of statistical power for the
+  conservative \emph{p} value calculation used here. However, unless
+  smaples sizes are small and the the permutation design such that the
+  set of values that the test statistic can take is also small, this
+  loss of power is unlikely to be critical.
+
+  The minimum achievable p-value is
+
+  \deqn{p_{\mathrm{min}} = \frac{1}{n + 1}}{p[min] = 1 / (n + 1)}
+
+  and hence depends on the number of permutations evaluated. However,
+  one cannot simply increase the number of permutations (\eqn{n}) to
+  achieve a potentially lower p-value unless the number of observations
+  available permits such a number of permutations. This is unlikely to
+  be a problem for all but the smallest data sets when free permutation
+  (randomisation) is valid, but in restricted permutation designs with a
+  low number of observations, there may not be as many unique
+  permutations of the data as you might desire to reach the required
+  level of significance.
   
   It is currently the responsibility of the user to determine the total
-  number of possible permutations for their \emph{DATA}. No checks are
-  made within Vegan functions to ensure a sensible number of
-  permutations is chosen.
+  number of possible permutations for their \emph{DATA}. The number of
+  possible permutations allowed under the specified design can be
+  calculated using \code{\link[permute]{numPerms}} from the
+  \pkg{permute} package. Heuristics employed within the
+  \code{\link[permute]{shuffleSet}} function used by \pkg{vegan} can be
+  triggered to generate the entire set of permutations instead of a
+  random set. The settings controlling the triggering of the complete
+  enumeration step are contained within a permutation design created
+  using \code{link[permute]{how}} and can be set by the user. See
+  \code{\link[permute]{how}} for details.
 
   Limits on the total number of permutations of \emph{DATA} are more
   severe in temporally or spatially ordered data or experimental designs
@@ -106,17 +175,43 @@
 
   In situations where only a low number of permutations is possible due
   to the nature of \emph{DATA} or the experimental design, enumeration
-  of all permutations becomes important and achievable
-  computationally. Currently, Vegan does not include functions to
-  perform complete enumeration of the set of possible
-  permutations. The next major release of Vegan will include such
-  functionality, however.
+  of all permutations becomes important and achievable computationally.
+
+  Above, we have provided only a brief overview of the capbilities of
+  \pkg{vegan} and \pkg{permute}. To get the best out of the new
+  functionality and for details on how to set up permutation designs
+  using \code{\link[permute]{how}}, consult the vignette
+  \emph{Restricted permutations; using the permute package} supplied
+  with \pkg{permute} and accessible via \code{vignette("permutations",
+  package = "permute").}
 }
 
 \seealso{
-  \code{\link{permutest}}
+  \code{\link{permutest}} for the main interface in \pkg{vegan}. See
+  also \code{\link[permute]{how}} for details on permutation design
+  specification, \code{\link[permute]{shuffleSet}} for the code used to
+  generate a set of permutations, \code{\link[permute]{numPerms}} for
+  a function to return the size of the set of possible permutations
+  under the current design.
 }
-%\references{
-%}
-\author{ Gavin Simpson }
+
+\references{
+
+  Manly, B. F. J. (2006). \emph{Randomization, Bootstrap and Monte Carlo
+  Methods in Biology}, Third Edition. Chapman and Hall/CRC.
+  
+  Phipson, B., & Smyth, G. K. (2010). Permutation P-values should never
+  be zero: calculating exact P-values when permutations are randomly
+  drawn. \emph{Statistical Applications in Genetics and Molecular
+    Biology}, \strong{9}, Article 39. DOI: 10.2202/1544-6115.1585
+  
+  ter Braak, C. J. F. (1990). \emph{Update notes: CANOCO version
+    3.1}. Wageningen: Agricultural Mathematics Group. (UR).
+
+  See also:
+
+  Davison, A. C., & Hinkley, D. V. (1997). \emph{Bootstrap Methods and
+    their Application}. Cambridge University Press.
+}
+\author{ Gavin L. Simpson }
 \keyword{multivariate}