[Vegan-commits] r2121 - pkg/vegan/man

Wed Mar 21 10:36:05 CET 2012

Author: jarioksa
Date: 2012-03-21 10:36:05 +0100 (Wed, 21 Mar 2012)
New Revision: 2121

Modified:
   pkg/vegan/man/anosim.Rd
   pkg/vegan/man/mrpp.Rd
   pkg/vegan/man/simper.Rd
Log:
Tell about location/dispersion mix-up in anosim, mrpp & simper

References to a new paper by Warton et al (MEE 3, 89-101; 2012)
to support previous warnings on the same issue in help pages.

Modified: pkg/vegan/man/anosim.Rd
===================================================================

--- pkg/vegan/man/anosim.Rd	2012-03-09 13:27:09 UTC (rev 2120)
+++ pkg/vegan/man/anosim.Rd	2012-03-21 09:36:05 UTC (rev 2121)
@@ -84,15 +84,24 @@
 }
 \references{
   Clarke, K. R. (1993). Non-parametric multivariate analysis of changes
-  in community structure. \emph{Australian Journal of Ecology} 18, 117-143.
+  in community structure. \emph{Australian Journal of Ecology} 18,
+  117--143.
+  
+  Warton, D.I., Wright, T.W., Wang, Y. 2012. Distance-based multivariate
+  analyses confound location and dispersion effects. \emph{Methods in
+  Ecology and Evolution}, 3, 89--101.
+  
 }
 \author{Jari Oksanen, with a help from Peter R. Minchin.}
 \note{
-  I don't quite trust this method.  Somebody should study its
-  performance carefully.  The function returns a lot of information 
-  to ease further scrutiny. Most \code{anosim} models could be analysed
-  with \code{\link{adonis}} which seems to be a more robust alternative.
 
+  The \code{anosim} function can confound differences between groups
+  with differences in dispersion within groups, and the results can be
+  difficult to interpret (cf. Warton et al. 2012).  The function returns
+  a lot of information to ease studying its performance. Most
+  \code{anosim} models could be analysed with \code{\link{adonis}},
+  which seems to be a more robust alternative.
+
 }
 
 \seealso{\code{\link{mrpp}} for a similar function using original

Modified: pkg/vegan/man/mrpp.Rd
===================================================================
--- pkg/vegan/man/mrpp.Rd	2012-03-09 13:27:09 UTC (rev 2120)
+++ pkg/vegan/man/mrpp.Rd	2012-03-21 09:36:05 UTC (rev 2121)
@@ -54,72 +54,75 @@
   \item{\dots}{Further arguments passed to functions.}
 }
 
-\details{ Multiple Response Permutation Procedure (MRPP) provides a test
-of whether there is a significant difference between two or more groups
-of sampling units. This difference may be one of location (differences
-in mean) or one of spread (differences in within-group
-distance). Function \code{mrpp} operates on a \code{data.frame} matrix
-where rows are observations and responses data matrix. The response(s)
-may be uni- or multivariate. The method is philosophically and
-mathematically allied with analysis of variance, in that it compares
-dissimilarities within and among groups. If two groups of sampling units
-are really different (e.g. in their species composition), then average
-of the within-group compositional dissimilarities ought to be less than
-the average of the dissimilarities between two random collection of
-sampling units drawn from the entire population. 
+\details{
 
-The mrpp statistic \eqn{\delta} is the overall weighted mean of
-within-group means of the pairwise dissimilarities among sampling
-units. The choice of group weights is currently not clear. The
-\code{mrpp} function offers three choices: (1) group size (\eqn{n}), (2) a
-degrees-of-freedom analogue (\eqn{n-1}), and (3) a weight that is the number
-of unique distances calculated among \eqn{n} sampling units (\eqn{n(n-1)/2}).
+  Multiple Response Permutation Procedure (MRPP) provides a test of
+  whether there is a significant difference between two or more groups
+  of sampling units. This difference may be one of location (differences
+  in mean) or one of spread (differences in within-group distance;
+  cf. Warton et al. 2012). Function \code{mrpp} operates on a data
+  matrix or \code{data.frame} where rows are observations and columns
+  are responses. The response(s) may be uni- or multivariate. The method
+  is philosophically and mathematically allied with analysis of
+  variance, in that it compares dissimilarities within and among
+  groups. If two groups of sampling units are really different (e.g. in
+  their species composition), then the average of the within-group
+  compositional dissimilarities ought to be less than the average of the
+  dissimilarities between two random collections of sampling units drawn
+  from the entire population.
 
-The \code{mrpp} algorithm first calculates all pairwise distances in the
-entire dataset, then calculates \eqn{\delta}. It then permutes the
-sampling units and their associated pairwise distances, and recalculates
-\eqn{\delta} based on the permuted data. It repeats the permutation
-step \code{permutations} times. The significance test is the
-fraction of permuted deltas that are less than the observed delta, with
-a small sample correction. The function also calculates the
-change-corrected within-group agreement
-\eqn{A = 1 -\delta/E(\delta)}, where \eqn{E(\delta)} is the expected
-\eqn{\delta} assessed as the average of dissimilarities.
+  The mrpp statistic \eqn{\delta} is the overall weighted mean of
+  within-group means of the pairwise dissimilarities among sampling
+  units. The choice of group weights is currently not clear. The
+  \code{mrpp} function offers three choices: (1) group size (\eqn{n}),
+  (2) a degrees-of-freedom analogue (\eqn{n-1}), and (3) a weight that
+  is the number of unique distances calculated among \eqn{n} sampling
+  units (\eqn{n(n-1)/2}).
 
-If the first argument \code{dat} can be interpreted as dissimilarities,
-they will be used directly. In other cases the function treats
-\code{dat} as observations, and uses \code{\link{vegdist}} to find the
-dissimilarities.  The default \code{distance} is Euclidean as in the
-traditional use of the method, but other dissimilarities in
-\code{\link{vegdist}} also are available.
+  The \code{mrpp} algorithm first calculates all pairwise distances in
+  the entire dataset, then calculates \eqn{\delta}. It then permutes the
+  sampling units and their associated pairwise distances, and
+  recalculates \eqn{\delta} based on the permuted data. It repeats the
+  permutation step \code{permutations} times. The significance test is
+  the fraction of permuted deltas that are less than the observed delta,
+  with a small sample correction. The function also calculates the
+  chance-corrected within-group agreement \eqn{A = 1 - \delta/E(\delta)},
+  where \eqn{E(\delta)} is the expected \eqn{\delta} assessed as the
+  average of dissimilarities.
 
-Function \code{meandist} calculates a matrix of mean within-cluster
-dissimilarities (diagonal) and between-cluster dissimilarities
-(off-diagonal elements), and an attribute \code{n} of \code{grouping}
-counts. Function \code{summary} finds the within-class, between-class
-and overall means of these dissimilarities, and the MRPP statistics with
-all \code{weight.type} options and the Classification Strength, CS (Van
-Sickle and Hughes, 2000). CS is defined for dissimiliraties as
-\eqn{\bar{B} - \bar{W}}{Bbar-Wbar}, where \eqn{\bar{B}}{Bbar} is the
-mean between cluster dissimilarity and \eqn{\bar{W}}{Wbar} is the mean
-within cluster dissimilarity with \code{weight.type = 1}. The function
-does not perform significance tests for these statistics, but you must
-use \code{mrpp} with appropriate \code{weight.type}. There is currently
-no significance test for CS, but \code{mrpp} with \code{weight.type = 1}
-gives the correct test for \eqn{\bar{W}}{Wbar} and a good approximation
-for CS.  Function \code{plot} draws a dendrogram or a histogram of the
-result matrix based on the within-group and between group
-dissimilarities. The dendrogram is found with the method given in the
-\code{cluster} argument using function \code{\link{hclust}}. The
-terminal segments hang to within-cluster dissimilarity. If some of the
-clusters are more heterogeneous than the combined class, the leaf
-segment are reversed.  The histograms are based on dissimilarites, but
-ore otherwise similar to those of Van Sickle and Hughes (2000):
-horizontal line is drawn at the level of mean between-cluster
-dissimilarity and vertical lines connect within-cluster dissimilarities
-to this line.
-}
+  If the first argument \code{dat} can be interpreted as
+  dissimilarities, they will be used directly. In other cases the
+  function treats \code{dat} as observations, and uses
+  \code{\link{vegdist}} to find the dissimilarities.  The default
+  \code{distance} is Euclidean as in the traditional use of the method,
+  but other dissimilarities in \code{\link{vegdist}} also are available.
 
+  Function \code{meandist} calculates a matrix of mean within-cluster
+  dissimilarities (diagonal) and between-cluster dissimilarities
+  (off-diagonal elements), and an attribute \code{n} of \code{grouping}
+  counts. Function \code{summary} finds the within-class, between-class
+  and overall means of these dissimilarities, and the MRPP statistics
+  with all \code{weight.type} options and the Classification Strength,
+  CS (Van Sickle and Hughes, 2000). CS is defined for dissimilarities as
+  \eqn{\bar{B} - \bar{W}}{Bbar-Wbar}, where \eqn{\bar{B}}{Bbar} is the
+  mean between-cluster dissimilarity and \eqn{\bar{W}}{Wbar} is the mean
+  within-cluster dissimilarity with \code{weight.type = 1}. The function
+  does not perform significance tests for these statistics; for those
+  you must use \code{mrpp} with appropriate \code{weight.type}. There is
+  currently no significance test for CS, but \code{mrpp} with
+  \code{weight.type = 1} gives the correct test for \eqn{\bar{W}}{Wbar}
+  and a good approximation for CS.  Function \code{plot} draws a
+  dendrogram or a histogram of the result matrix based on the
+  within-group and between-group dissimilarities. The dendrogram is
+  found with the method given in the \code{cluster} argument using
+  function \code{\link{hclust}}. The terminal segments hang to
+  within-cluster dissimilarity. If some of the clusters are more
+  heterogeneous than the combined class, the leaf segments are reversed.
+  The histograms are based on dissimilarities, but are otherwise similar
+  to those of Van Sickle and Hughes (2000): a horizontal line is drawn at
+  the level of mean between-cluster dissimilarity and vertical lines
+  connect within-cluster dissimilarities to this line.  }
+
 \value{
 The function returns a list of class mrpp with following items:
   \item{call }{	Function call.}
@@ -147,7 +150,6 @@
   B. McCune and J. B. Grace. 2002. \emph{Analysis of Ecological
   Communities.} MjM  Software Design, Gleneden Beach, Oregon, USA.
 
-
   P. W. Mielke and K. J. Berry. 2001. \emph{Permutation Methods: A
   Distance  Function Approach.} Springer Series in
   Statistics. Springer.  
@@ -156,6 +158,9 @@
   ecoregions, catchments, and geographic clusters of aquatic vertebrates
   in Oregon. \emph{J. N. Am. Benthol. Soc.} 19:370--384.
 
+  Warton, D.I., Wright, T.W., Wang, Y. 2012. Distance-based multivariate
+  analyses confound location and dispersion effects. \emph{Methods in
+  Ecology and Evolution}, 3, 89--101.
 
 }
 \author{

Modified: pkg/vegan/man/simper.Rd
===================================================================
--- pkg/vegan/man/simper.Rd	2012-03-09 13:27:09 UTC (rev 2120)
+++ pkg/vegan/man/simper.Rd	2012-03-21 09:36:05 UTC (rev 2121)
@@ -60,6 +60,15 @@
   the data frames also include the cumulative contributions and
   are ordered by species contribution.
 
+  The results of \code{simper} can be very difficult to interpret. The
+  method very badly confounds the mean between-group differences and
+  within-group variation, and seems to single out variable species
+  instead of distinctive species (Warton et al. 2012). Even if you make
+  groups that are copies of each other, the method will single out
+  species with high contribution, but these are not contributions
+  to non-existing between-group differences but to within-group
+  variation in species abundance.
+
 }
 
 \value{
@@ -92,6 +101,10 @@
   Clarke, K.R. 1993. Non-parametric multivariate analyses of changes
     in community structure. \emph{Australian Journal of Ecology}, 18,
     117–143.
+
+  Warton, D.I., Wright, T.W., Wang, Y. 2012. Distance-based multivariate
+    analyses confound location and dispersion effects. \emph{Methods in
+    Ecology and Evolution}, 3, 89--101.
 }
 \keyword{multivariate}