[Vegan-commits] r2615 - in branches/2.0/inst: . doc

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Wed Sep 11 09:22:45 CEST 2013


Author: jarioksa
Date: 2013-09-11 09:22:45 +0200 (Wed, 11 Sep 2013)
New Revision: 2615

Added:
   branches/2.0/inst/doc/vegan.sty
Removed:
   branches/2.0/inst/doc/veganjss.sty
Modified:
   branches/2.0/inst/ChangeLog
   branches/2.0/inst/doc/decision-vegan.Rnw
   branches/2.0/inst/doc/diversity-vegan.Rnw
   branches/2.0/inst/doc/intro-vegan.Rnw
   branches/2.0/inst/doc/vegan.bib
Log:
merge edits to Rnw files

includes reformatting as standard article, mostly with two columns,
and adding references to diversity-vegan.Rnw. decision-vegan.Rnw
needed hand-editing, because text on parallel processing was not merged.
Prepares for moving vignettes to vignettes/ directory (not done yet).


Modified: branches/2.0/inst/ChangeLog
===================================================================
--- branches/2.0/inst/ChangeLog	2013-09-10 16:14:13 UTC (rev 2614)
+++ branches/2.0/inst/ChangeLog	2013-09-11 07:22:45 UTC (rev 2615)
@@ -19,6 +19,9 @@
 	* merge r2570: aspell fixes in R files.
 	* merge r2568: aspell fixes in Rd files.
 	* merge r2564: line wrapping in betadiver(help=TRUE).
+	* merge r2562, 2563, 2565-7, 2572-6: edit and reformat Rnw files;
+	decision-vegan.Rnw required hand-editing due to conflicts in
+	text on parallel processing that was not merged.
 	* merge r2558: ordiktplot bmp device in all platforms. 
 	
 Version 2.0-8 (released July 10, 2013)

Modified: branches/2.0/inst/doc/decision-vegan.Rnw
===================================================================
--- branches/2.0/inst/doc/decision-vegan.Rnw	2013-09-10 16:14:13 UTC (rev 2614)
+++ branches/2.0/inst/doc/decision-vegan.Rnw	2013-09-11 07:22:45 UTC (rev 2615)
@@ -1,59 +1,52 @@
 % -*- mode: noweb; noweb-default-code-mode: R-mode; -*-
 %\VignetteIndexEntry{Design decisions and implementation}
 
-\documentclass[article,nojss]{jss}
-\usepackage{veganjss} % package options and redefinitions
-\usepackage{amsmath}
-%\usepackage{ucs}
-%\usepackage[utf8x]{inputenc}
-%\usepackage[T1]{fontenc}
-\usepackage{sidecap}
-\usepackage[english]{babel} % kluge to avoid visible ~ in Figure~1.
-\renewcommand{\floatpagefraction}{0.8}
-\renewcommand{\cite}{\citep}
+\documentclass[a4paper,10pt,twocolumn]{article}
+\usepackage{vegan} % package options and redefinitions
 
 \author{Jari Oksanen}
 \title{Design decisions and implementation details in vegan}
-\Abstract{
-  This document describes design decisions, and discusses implementation
-and algorithmic details in some vegan functions. The proper FAQ is
-another document.
-  }
- \Keywords{nestdness, matrix temperature, community null models, scaling of PCA and RDA, WA
-   and LC scores}
-%% hijack Address for version info
-\Address{$ $Id$ $
+
+\date{\footnotesize{$ $Id$ $
   processed with vegan
 \Sexpr{packageDescription("vegan", field="Version")}
-in \Sexpr{R.version.string} on \today}
-\Footername{About this version}
+in \Sexpr{R.version.string} on \today}}
 
-%% need no \usepackage{Sweave.sty}
+%% need no \usepackage{Sweave}
 \begin{document}
+\bibliographystyle{jss}
+
 \SweaveOpts{strip.white=true}
-\setkeys{Gin}{width=0.55\linewidth}
+
 <<echo=false,results=hide>>=
 figset <- function() par(mar=c(4,4,1,1)+.1)
 options(SweaveHooks = list(fig = figset))
-options("prompt" = "R> ", "continue" = "+  ")
+options("prompt" = "> ", "continue" = "  ")
+options(width = 55) 
 require(vegan)
 @
+\maketitle
 
+\begin{abstract}
+  This document describes design decisions, and discusses implementation
+and algorithmic details in some vegan functions. The proper FAQ is
+another document.
+\end{abstract}
+
 \tableofcontents
 
-
 \section{Nestedness and Null models}
 
-Some indicators of nestedness and null models of communities are only
-described in general terms, and they could be implemented in various
-ways. Here I discuss the implementation in \pkg{vegan}.
+Some published indices of nestedness and null models of communities
+are only described in general terms, and they could be implemented in
+various ways. Here I discuss the implementation in \pkg{vegan}.
 
 \subsection{Matrix temperature}
 
 The matrix temperature is intuitively simple
 (Fig. \ref{fig:nestedtemp}), but the the exact calculations were not
 explained in the original publication \cite{AtmarPat93}.
-\begin{SCfigure}
+\begin{figure}
 <<fig=true,echo=false,results=hide>>=
 data(sipoo)
 mod <- nestedtemp(sipoo)
@@ -76,7 +69,7 @@
   the fill line. The ``surprise'' for this point is $u = (d/D)^2$ and
   the matrix temperature is based on the sum of surprises: presences
   outside the fill line or absences within the fill line.}
-\end{SCfigure}
+\end{figure}
 The function can be implemented in many ways following the general
 principles.  \citet{RodGir06} have seen the original code and reveal
 more details of calculations, and their explanation is the basis of
@@ -91,12 +84,12 @@
 
 \begin{itemize}
 \item Species and sites are put into unit square \citep{RodGir06}. The
-  coordinates for $n$ item will be $(k-0.5)/n$ for $k=1 \ldots n$, so
-  that there are no points in the corners or the margins of the unit
-  square, and a diagonal line can be drawn through any point. I do not
-  know how the rows and columns are converted to the unit square in
-  other software, and this may be a considerable source of differences
-  among implementations.
+  row and column coordinates will be $(k-0.5)/n$ for $k=1 \ldots n$,
+  so that there are no points in the corners or the margins of the
+  unit square, and a diagonal line can be drawn through any point. I
+  do not know how the rows and columns are converted to the unit
+  square in other software, and this may be a considerable source of
+  differences among implementations.
   \item Species and sites are ordered alternately using indices
     \citep{RodGir06}:
     \begin{equation}
@@ -135,7 +128,7 @@
     \citep{RodGir06}. Small details in the fill line combined with
     differences in scores used in the unit square (especially in the
     corners) can cause large differences in the results.
-  \item A line with slope $-1$ is drawn through the point and the $x$
+  \item A line with slope\,$= -1$ is drawn through the point and the $x$
     coordinate of the intersection of this line and the fill line is
     found using function \code{uniroot}. The difference of this
     intersection and the row coordinate gives the argument $d$ of matrix
@@ -200,7 +193,7 @@
 \code{rda} and \code{prcomp} even use \textsc{svd} internally in
 their algorithm.
 
-In \textsc{svd} a centred data matrix is decomposed into orthogonal
+In \textsc{svd} a centred data matrix $\mathbf{X} = \{x_{ij}\}$ is decomposed into orthogonal
 components so that $x_{ij} = \sum_k \sigma_k u_{ik} v_{jk}$, where
 $u_{ik}$ and $v_{jk}$ are orthonormal coefficient matrices and
 $\sigma_k$ are singular values.  Orthonormality means that sums of
@@ -226,7 +219,7 @@
 found indirectly as well, so that we have the same components in both
 methods.
 
-The coefficients $u_{ik}$ and $v_{jk}$ are scaled similarly for all
+The coefficients $u_{ik}$ and $v_{jk}$ are scaled to unit length for all
 axes $k$. Singular values $\sigma_k$ or eigenvalues $\lambda_k$ give
 the information of the importance of axes, or the `axis lengths.'
 Instead of the orthonormal coefficients, or equal length axes, it is
@@ -238,7 +231,8 @@
 apply to species scores and constraints or linear combination scores;
 weighted averaging scores have somewhat wider dispersion.
 
-\begin{table}
+\begin{table*}[t]
+  \centering
   \caption{\label{tab:scales} Alternative scalings for \textsc{rda} used
     in the functions \code{prcomp} and \code{princomp}, and the
     one used in the \pkg{vegan} function \code{rda} 
@@ -252,9 +246,12 @@
     is derived
     dividing each  species by its standard deviation $s_j$ (possibly
     with some additional constant multiplier).  }
-\begin{tabular}{lcc}
+ \begin{tabular}{lcc}
+  \\
+  \toprule
 & \textbf{Site scores} $u_{ik}^*$ &
 \textbf{Species scores} $v_{jk}^*$ \\
+\midrule
 \code{prcomp, princomp} &
 $u_{ik} \sqrt{n-1} \sqrt{\lambda_k}$ &
 $v_{jk}$ \\
@@ -271,7 +268,7 @@
 \code{rda, scaling < 0} &
 $u_{ik}^*$ &
 $\sqrt{\sum \lambda_k /(n-1)} s_j^{-1} v_{jk}^*$
-% \\
+\\
 % \code{Canoco 3, scaling=-1} &
 % $u_{ik} \sqrt{n-1} \sqrt{\lambda_k / \sum \lambda_k}$ &
 % $v_{jk} \sqrt{n}$ \\
@@ -282,9 +279,12 @@
 % \code{Canoco 3, scaling=-3} &
 % $u_{ik} \sqrt{n-1} \sqrt[4]{\lambda_k / \sum \lambda_k}$ &
 % $v_{jk} \sqrt{n} \sqrt[4]{\lambda_k / \sum \lambda_k}$
+\bottomrule
 \end{tabular}
-\end{table}
+\end{table*}
 
+
+
 In community ecology, it is common to plot both species and sites in
 the same graph.  If this graph is a graphical display of \textsc{svd},
 or a graphical, low-dimensional approximation of the data, the graph
@@ -331,21 +331,27 @@
 have two separate scaling constants: the first for the species, and
 the second for sites and friends, and this allows getting scores of
 other software or \proglang{R} functions (Table \ref{tab:rdaconst}). 
-\begin{table}
+
+\begin{table*}[t]
+  \centering
   \caption{\label{tab:rdaconst} Values of the \code{const} argument in
     \textbf{vegan} to get the scores that are equal to those from
     other functions and software. Number of sites (rows) is $n$, 
     the number of species (columns) is $m$, and the sum of all
     eigenvalues is $\sum_k \lambda_k$ (this is saved as the item
     \code{tot.chi} in the \code{rda} result)}.
-\begin{tabular}{lccc}
+ \begin{tabular}{lccc}
+  \\
+  \toprule
 & \textbf{Scaling} &\textbf{Species constant} & \textbf{Site constant} \\
+\midrule
 \pkg{vegan} & any  & $\sqrt[4]{(n-1) \sum \lambda_k}$ & $\sqrt[4]{(n-1) \sum \lambda_k}$\\
 \code{prcomp}, \code{princomp} & \code{1} & $1$ & $\sqrt{(n-1) \sum_k \lambda_k}$\\
 \proglang{Canoco\,v3} & \code{-1, -2, -3} & $\sqrt{n-1}$ & $\sqrt{n}$\\
-\proglang{Canoco\,v4} & \code{-1, -2, -3} & $\sqrt{m}$ & $\sqrt{n}$
+\proglang{Canoco\,v4} & \code{-1, -2, -3} & $\sqrt{m}$ & $\sqrt{n}$\\
+\bottomrule
 \end{tabular}
-\end{table}
+\end{table*}
 
 In this chapter, I used always centred data matrices.  In principle
 \textsc{svd} could be done with original, non-centred data, but
@@ -380,25 +386,23 @@
 species scores that are as similar to LC scores as possible.
 \end{itemize}
 Many computer programs for constrained ordinations give only or
-primarily LC scores, following Mike Palmer's recommendation
-\cite{Palmer93}.  However, functions \code{cca} and \code{rda} in
+primarily LC scores, following the recommendation of
+\citet{Palmer93}.  However, functions \code{cca} and \code{rda} in
 the \pkg{vegan} package use primarily WA scores. This chapter
 explains the reasons for this choice.
 
 Briefly, the main reasons are that
 \begin{itemize}
-\item
-LC scores \emph{are} linear combinations, so they give us only the
-(scaled) environmental variables. This means that they are
-independent of vegetation and cannot be found from the species
-composition.  Moreover, identical combinations of environmental
-variables give identical LC scores irrespective of vegetation.
-\item
-Bruce McCune has demonstrated that noisy environmental variables
-result in deteriorated LC scores whereas WA scores tolerate some errors
-in environmental variables \cite{McCune97}.  All environmental
-measurements contain some errors, and therefore it is safer to use WA
-scores.
+\item LC scores \emph{are} linear combinations, so they give us only
+  the (scaled) environmental variables. This means that they are
+  independent of vegetation and cannot be found from the species
+  composition.  Moreover, identical combinations of environmental
+  variables give identical LC scores irrespective of vegetation.
+\item \citet{McCune97} has demonstrated that noisy environmental
+  variables result in deteriorated LC scores whereas WA scores
+  tolerate some errors in environmental variables.  All environmental
+  measurements contain some errors, and therefore it is safer to use
+  WA scores.
 \end{itemize}
 This article studies mainly the first point.  The users of
 \pkg{vegan} have a choice of either LC or WA (default) scores, but
@@ -423,13 +427,13 @@
 <<a,fig=false>>=
 plot(orig, dis=c("lc","bp"))
 @
-\begin{SCfigure}
+\begin{figure}
 <<fig=true,echo=false>>=
 <<a>>
 @
 \caption{LC scores in CCA of the original data.}
 \label{fig:ccalc}
-\end{SCfigure}
+\end{figure}
 
 What would happen to linear combinations of LC scores if we shuffle
 the ordering of sites in species data?  Function \code{sample()} below
@@ -438,26 +442,27 @@
 i <- sample(nrow(varespec))
 shuff <- cca(varespec[i,] ~ Al + K, varechem)
 @
-\begin{SCfigure}
+\begin{figure}
 <<fig=true,echo=false>>=
 plot(shuff, dis=c("lc","bp"))
 @
 \caption{LC scores of shuffled species data.}
 \label{fig:ccashuff}
-\end{SCfigure}
+\end{figure}
 It seems that site scores are fairly similar, but oriented differently
 (Fig. \ref{fig:ccashuff}).  We can use Procrustes rotation to see how
 similar the site scores indeed are (Fig. \ref{fig:ccaproc}).
 <<a,fig=false>>=
-plot(procrustes(scores(orig, dis="lc"), scores(shuff, dis="lc")))
+plot(procrustes(scores(orig, dis="lc"), 
+                scores(shuff, dis="lc")))
 @
-\begin{SCfigure}
+\begin{figure}
 <<fig=true,echo=false>>=
 <<a>>
 @
 \caption{Procrustes rotation of LC scores from CCA of original and shuffled data.}
 \label{fig:ccaproc}
-\end{SCfigure}
+\end{figure}
 There is a small difference, but this will disappear if we use
 Redundancy Analysis (RDA) instead of CCA
 (Fig. \ref{fig:rdaproc}). Here we use a new shuffling as well.
@@ -466,13 +471,14 @@
 i <- sample(nrow(varespec)) # Different shuffling
 tmp2 <- rda(varespec[i,] ~ Al + K, varechem)
 @
-\begin{SCfigure}
+\begin{figure}
 <<fig=true,echo=false>>=
-plot(procrustes(scores(tmp1, dis="lc"), scores(tmp2, dis="lc")))
+plot(procrustes(scores(tmp1, dis="lc"), 
+                scores(tmp2, dis="lc")))
 @
 \caption{Procrustes rotation of LC scores in RDA of the original and shuffled data.}
 \label{fig:rdaproc}
-\end{SCfigure}
+\end{figure}
 
 LC scores indeed are linear combinations of constraints (environmental
 variables) and \emph{independent of species data}: You can
@@ -484,23 +490,21 @@
 on the variability of site totals.
 
 The original data and shuffled data differ in their goodness of
-fit\footnote{Or probably differ: The randomization is done while
-generating this article, and different versions may have different
-randomizations.}.
+fit:
 <<>>=
 orig
 shuff
 @
 Similarly their WA scores will be (probably) very different
 (Fig. \ref{fig:ccawa}).
-\begin{SCfigure}
+\begin{figure}
 <<fig=true,echo=false>>=
 plot(procrustes(orig, shuff))
 @
 \caption{Procrustes rotation of WA scores of CCA with the original and
   shuffled data.}
 \label{fig:ccawa}
-\end{SCfigure}
+\end{figure}
 
 The example used only two environmental variables so that we can
 easily plot all constrained axes.  With a larger number of
@@ -511,7 +515,8 @@
 <<>>=
 tmp1 <- rda(varespec ~ ., varechem)
 tmp2 <- rda(varespec[i,] ~ ., varechem)
-proc <- procrustes(scores(tmp1, dis="lc", choi=1:14), scores(tmp2, dis="lc", choi=1:14))
+proc <- procrustes(scores(tmp1, dis="lc", choi=1:14), 
+                   scores(tmp2, dis="lc", choi=1:14))
 max(residuals(proc))
 @
 In \code{cca} the difference would be somewhat larger than now
@@ -527,19 +532,18 @@
 <<>>=
 data(dune)
 data(dune.env)
-summary(dune.env)
 orig <- cca(dune ~ Moisture, dune.env)
 @
 When the results are plotted using LC scores, sample plots fall only
 in four alternative positions (Fig. \ref{fig:factorlc}).
-\begin{SCfigure}
+\begin{figure}
 <<fig=TRUE,echo=false>>=
 plot(orig, dis="lc")
 @
 \caption{LC scores of the dune meadow data using only one factor as a
   constraint.}
 \label{fig:factorlc}
-\end{SCfigure}
+\end{figure}
 In the previous chapter we saw that this happens because LC scores
 \emph{are} the environmental variables, and they can be distinct only
 if the environmental variables are distinct.  However, normally the user
@@ -557,14 +561,14 @@
 ordispider(orig, col="red")
 text(orig, dis="cn", col="blue")
 @
-\begin{SCfigure}
+\begin{figure}
 <<fig=TRUE,echo=false>>=
 <<a>>
 @
 \caption{A ``spider plot'' connecting WA scores to corresponding LC
   scores. The shorter the web segments, the better the ordination.}
 \label{fig:walcspider}
-\end{SCfigure}
+\end{figure}
 This is the standard way of displaying results of discriminant
 analysis, too.  Moisture classes \code{1} and \code{2} seem to be
 overlapping, and cannot be completely separated by their

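The scaling table in the decision-vegan.Rnw hunks above gives the prcomp site scores as u_ik sqrt(n-1) sqrt(lambda_k) and the species scores as v_jk. A minimal base-R sketch can check this against svd() and prcomp(). It assumes simulated data (the matrix X is not a vegan data set) and takes lambda_k = sigma_k^2/(n-1). Singular vectors are defined only up to sign, so the comparison uses absolute values.

```r
## Sketch: the prcomp row of the scaling table, checked with base R.
## X is simulated data (an assumption), not one of the vegan data sets.
set.seed(1)
X <- matrix(rnorm(40), nrow = 10, ncol = 4)
X <- scale(X, center = TRUE, scale = FALSE)   # centred data matrix
n <- nrow(X)

s <- svd(X)        # X = U diag(sigma) V'
p <- prcomp(X)     # principal components of the same matrix

lambda <- s$d^2 / (n - 1)                         # eigenvalues lambda_k
site <- s$u %*% diag(sqrt(n - 1) * sqrt(lambda))  # u_ik sqrt(n-1) sqrt(lambda_k)
spec <- s$v                                       # v_jk

## singular vectors are defined only up to sign: compare absolute values
all.equal(abs(unname(p$x)), abs(site))
all.equal(abs(unname(p$rotation)), abs(spec))
all.equal(p$sdev^2, lambda)
## the scaled scores still reproduce the centred data
all.equal(site %*% t(spec), X, check.attributes = FALSE)
```

Each all.equal() call should return TRUE. The factor sqrt(n-1) sqrt(lambda_k) is simply sigma_k, so prcomp site scores are U diag(sigma).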
Modified: branches/2.0/inst/doc/diversity-vegan.Rnw
===================================================================
--- branches/2.0/inst/doc/diversity-vegan.Rnw	2013-09-10 16:14:13 UTC (rev 2614)
+++ branches/2.0/inst/doc/diversity-vegan.Rnw	2013-09-11 07:22:45 UTC (rev 2615)
@@ -1,52 +1,43 @@
 % -*- mode: noweb; noweb-default-code-mode: R-mode; -*-
 %\VignetteIndexEntry{Diversity analysis in vegan}
-\documentclass[article,nojss]{jss}
-\usepackage{veganjss} %% vegan setup
-\usepackage{ucs}
-\usepackage[utf8x]{inputenc}
-\usepackage[T1]{fontenc}
-\usepackage{sidecap}
-\usepackage{amsmath}
-\usepackage{amssymb} % \gtrapprox
-\usepackage[english]{babel} % kluge to avoid visible ~ in Figure~1
+\documentclass[a4paper,10pt,twocolumn]{article}
+\usepackage{vegan} %% vegan setup
 
+%% TODO: SSarrhenius, adipart, beals update, betadisper
+%% expansion (+ permutest), contribdiv, eventstar, multipart, refer to
+%% FD, check Kindt reference to specaccum, check estimateR ref
+
 \title{Vegan: ecological diversity} \author{Jari Oksanen} 
 
-\Abstract{ This document explains diversity related methods in
-  \pkg{vegan}. The methods are briefly described, and the equations
-  used them are given often in more detail than in their help
-  pages. The methods discussed include common diversity indices and
-  rarefaction, families of diversity indices, species abundance
-  models, species accumulation models and beta diversity, extrapolated
-  richness and probability of being a member of the species pool. The
-  document is still incomplete and does not cover all diversity
-  methods in \pkg{vegan}.}
-
-\Keywords{diversity, Shannon, Simpson, R{\'e}nyi, Hill number,
-  Tsallis, rarefaction, species accumulation, beta diversity, species
-  abundance, Fisher alpha, Fisher logarithmic series, Preston
-  log-normal model, species abundance models, Whittaker plots,
-  extended richness, taxonomic diversity, functional divesity, species
-  pool}
-
-%% misuse next for scm data
-\Address{$ $Id$ $
+\date{\footnotesize{$ $Id$ $
   processed with vegan \Sexpr{packageDescription("vegan", field="Version")}
-  in \Sexpr{R.version.string} on \today}
-\Footername{About this version}
+  in \Sexpr{R.version.string} on \today}}
 
 %% need no \usepackage{Sweave}
 \begin{document}
-\setkeys{Gin}{width=0.55\linewidth}
+\bibliographystyle{jss}
+
 \SweaveOpts{strip.white=true}
 <<echo=false>>=
 par(mfrow=c(1,1))
-options(width=72)
+options(width=55) 
 figset <- function() par(mar=c(4,4,1,1)+.1)
 options(SweaveHooks = list(fig = figset))
-options("prompt" = "R> ", "continue" = "+  ")
+options("prompt" = "> ", "continue" = "  ")
 @
 
+\maketitle
+\begin{abstract} 
+  This document explains diversity related methods in
+  \pkg{vegan}. The methods are briefly described, and the equations
+  used in them are often given in more detail than in their help
+  pages. The methods discussed include common diversity indices and
+  rarefaction, families of diversity indices, species abundance
+  models, species accumulation models and beta diversity, extrapolated
+  richness and probability of being a member of the species pool. The
+  document is still incomplete and does not cover all diversity
+  methods in \pkg{vegan}.
+\end{abstract}
 \tableofcontents
 
 
@@ -72,7 +63,7 @@
 \section{Diversity indices}
 
 Function \code{diversity} finds the most commonly used diversity
-indices:
+indices \citep{Hill73number}:
 \begin{align}
 H &= - \sum_{i=1}^S p_i \log_b  p_i & \text{Shannon--Weaver}\\
 D_1 &= 1 - \sum_{i=1}^S p_i^2  &\text{Simpson}\\
@@ -99,7 +90,7 @@
 the numbers of species.
 
 \pkg{vegan} also can estimate series of R\'{e}nyi and Tsallis
-diversities. R{\'e}nyi diversity of order $a$ is:
+diversities. R{\'e}nyi diversity of order $a$ is \citep{Hill73number}:
 \begin{equation}
 H_a = \frac{1}{1-a} \log \sum_{i=1}^S p_i^a
 \end{equation}
@@ -108,7 +99,7 @@
 \exp(H')$, $N_2 = D_2$, and $N_\infty = 1/(\max p_i)$. The
 corresponding R\'{e}nyi diversities are $H_0 = \log(S)$, $H_1 = H'$, $H_2 =
 - \log(\sum p_i^2)$, and $H_\infty = - \log(\max p_i)$.  
-Tsallis diversity of order $q$ is:
+Tsallis diversity of order $q$ is \citep{Tothmeresz95}:
 \begin{equation}
   H_q = \frac{1}{q-1} \left(1 - \sum_{i=1}^S p^q \right) \, .
 \end{equation}
@@ -127,7 +118,7 @@
 diversities are higher than in another site.  We can inspect this
 graphically using the standard \code{plot} function for the
 \code{renyi} result (Fig. \ref{fig:renyi}).
-\begin{SCfigure}
+\begin{figure}
 <<fig=true,echo=false>>=
 print(plot(R))
 @
@@ -136,10 +127,10 @@
   show the values for sites, and the lines the extremes and median in
   the data set.}
 \label{fig:renyi}
-\end{SCfigure}
+\end{figure}
 
 Finally, the $\alpha$ parameter of Fisher's log-series can be used as
-a diversity index:
+a diversity index \citep{FisherEtal43}:
 <<>>=
 alpha <- fisher.alpha(BCI)
 @
@@ -150,24 +141,24 @@
 richness actually may be caused by differences in sample size.  To
 solve this problem, we may try to rarefy species richness to the same
 number of individuals.  Expected number of species in a community
-rarefied from $N$ to $n$ individuals is:
-\begin{equation}
+rarefied from $N$ to $n$ individuals is \citep{Hurlbert71}:
+\begin{multline}
 \label{eq:rare}
-\hat S_n = \sum_{i=1}^S (1 - q_i),\quad \text{where} \quad q_i = {N-x_i
+\hat S_n = \sum_{i=1}^S (1 - q_i),\\ \text{where} \quad q_i = {N-x_i
   \choose n} \Bigm /{N \choose n}
-\end{equation}
+\end{multline}
 where $x_i$ is the count of species $i$, and ${N \choose n}$ is the
 binomial coefficient, or the number of ways we can choose $n$ from
 $N$, and $q_i$ give the probabilities that species $i$ does \emph{not} occur in a
 sample of size $n$.  This is defined only when $N-x_i > n$, but for
 other cases $q_i = 0$ or the species is sure to occur in the sample.
-The variance of rarefied richness is:
-\begin{equation}
+The variance of rarefied richness is \citep{HeckEtal75}:
+\begin{multline}
 \label{eq:rarevar}
-s^2 = q_i (1-q_i) + 2 \sum_{i=1}^S \sum_{j>i} \left[ {N- x_i - x_j
+s^2 = \sum_{i=1}^S q_i (1-q_i)  \\ + 2 \sum_{i=1}^S \sum_{j>i} \left[ {N- x_i - x_j
     \choose n} \Bigm / {N
     \choose n} - q_i q_j\right]
-\end{equation}
+\end{multline}
 Equation \ref{eq:rarevar} actually is of the same form as the variance
 of sum of correlated variables:
 \begin{equation}
@@ -197,31 +188,31 @@
 richness:
 <<>>=
 all(rank(Srar) == rank(S2))
-@
+@ 
 Moreover, the rarefied richness for two individuals is a finite
-sample variant of Simpson's diversity index (or, more precisely of
-$D_1 + 1$), and these two are almost identical in BCI:
+sample variant of Simpson's diversity index \citep{Hurlbert71}\,--\,or
+more precisely of $D_1 + 1$, and these two are almost identical in
+BCI:
 <<>>=
 range(diversity(BCI, "simp") - (S2 -1))
-@
+@ 
 Rarefaction is sometimes presented as an ecologically meaningful
-alternative to dubious diversity indices, but the differences really
-seem to be small.
+alternative to dubious diversity indices \citep{Hurlbert71}, but the
+differences really seem to be small.
 
 \section{Taxonomic and functional diversity}
 
 Simple diversity indices only consider species identity: all different
 species are equally different. In contrast, taxonomic and functional
-diversity indices judge the differences of species
-are. Taxonomic and functional diversities are used in different fields
-of science, but they really have very similar reasoning, and either
-could be used either with taxonomic or functional properties of
-species.
+diversity indices also judge how different the species are. Taxonomic
+and functional diversities are used in different fields of science,
+but they really have very similar reasoning, and either could be
+used with taxonomic or functional traits of species.
 
-\subsection{Taxonomic diversity: average distance of properties}
+\subsection{Taxonomic diversity: average distance of traits}
 
-The two basic indices are called taxonomic diversity ($\Delta$) and
-taxonomic distinctness ($\Delta^*$):
+The two basic indices are called taxonomic diversity $\Delta$ and
+taxonomic distinctness $\Delta^*$ \citep{ClarkeWarwick98}:
 \begin{align}
   \Delta &= \frac{\sum \sum_{i<j} \omega_{ij} x_i x_j}{n (n-1) / 2}\\
 \Delta^* &= \frac{\sum \sum_{i<j} \omega_{ij} x_i x_j}{\sum \sum_{i<j} x_i x_j}
@@ -237,23 +228,23 @@
   species richness, but lower case $s$ is used here in accordance with
   the original papers on taxonomic diversity}
 to give $s \Delta^+$, or it can be used to estimate an index of
-variation in taxonomic distinctness $\Lambda^+$:
+variation in taxonomic distinctness $\Lambda^+$ \citep{ClarkeWarwick01}:
 \begin{equation}
   \Lambda^+ = \frac{\sum \sum_{i<j} \omega_{ij}^2}{n (n-1) / 2} - (\Delta^+)^2
 \end{equation}
 
 We still need the taxonomic differences among species ($\omega$) to
-calculate the indices. These can be any
-distance structure among species, but usually it is found from
-established hierarchic taxonomy. Typical coding is that differences
-among species in the same genus is $1$, among the same family it is
-$2$ etc. However, the taxonomic differences are scaled to maximum
-$100$ for easier comparison between different data sets and
-taxonomies. Alternatively, it is possible to scale steps between
-taxonomic level proportional to the reduction in the number of
-categories: if almost all genera have only one species, it does not
-make a great difference if two individuals belong to a different
-species or to a different genus.
+calculate the indices. These can be any distance structure among
+species, but usually it is found from established hierarchic
+taxonomy. Typical coding is that differences among species in the same
+genus is $1$, among the same family it is $2$ etc. However, the
+taxonomic differences are scaled to maximum $100$ for easier
+comparison between different data sets and taxonomies. Alternatively,
+it is possible to scale steps between taxonomic level proportional to
+the reduction in the number of categories \citep{ClarkeWarwick99}: if
+almost all genera have only one species, it does not make a great
+difference if two individuals belong to a different species or to a
+different genus.
 
 Function \code{taxondive} implements indices of taxonomic diversity,
 and \code{taxa2dist} can be used to convert classification tables to
@@ -270,7 +261,7 @@
 taxdis <- taxa2dist(dune.taxon, varstep=TRUE)
 mod <- taxondive(dune, taxdis)
 @
-\begin{SCfigure}
+\begin{figure}
 <<fig=true,echo=false>>=
 plot(mod)
 @
@@ -278,25 +269,26 @@
   points are diversity values of single sites, and the funnel is their
   approximate confidence intervals ($2 \times$ standard error).}
 \label{fig:taxondive}
-\end{SCfigure}
+\end{figure}
 
-\subsection{Functional diversity: the height of property tree}
+\subsection{Functional diversity: the height of trait tree}
 
 In taxonomic diversity the primary data were taxonomic trees which
 were transformed to pairwise distances among species. In functional
-diversity the primary data are species properties which are translated
-to pairwise distances among species and then to clustering trees of
-species properties. The argument for trees is that in this way a
+diversity the primary data are species traits which are translated to
+pairwise distances among species and then to clustering trees of
+species traits. The argument for using trees is that in this way a
 single deviant species will have a small influence, since its
 difference is evaluated only once instead of evaluating its distance
-to all other species.
+to all other species \citep{PetcheyGaston06}.
 
 Function \code{treedive} implements functional diversity defined as
 the total branch length in a trait dendrogram connecting all species,
-but excluding the unnecessary root segments of the tree.  The example
-uses the taxonomic distances of the previous chapter. These are first
-converted to a hierarchic clustering (which actually were their
-original form before \code{taxa2dist} converted them into distances)
+but excluding the unnecessary root segments of the tree
+\citep{PetcheyGaston02, PetcheyGaston06}.  The example uses the
+taxonomic distances of the previous chapter. These are first converted
+to a hierarchic clustering (which actually were their original form
+before \code{taxa2dist} converted them into distances)
 <<>>=
 tr <- hclust(taxdis, "aver")
 mod <- treedive(dune, tr)
@@ -313,7 +305,7 @@
 \subsection{Fisher and Preston}
 
 In Fisher's log-series, the expected number of species $\hat f$ with $n$
-individuals is:
+individuals is \citep{FisherEtal43}:
 \begin{equation}
 \hat f_n = \frac{\alpha x^n}{n}
 \end{equation}
@@ -326,14 +318,14 @@
 fish <- fisherfit(BCI[k,])
 fish
 @
-\begin{SCfigure}
+\begin{figure}
 <<fig=true,echo=false>>=
 plot(fish)
 @
 \caption{Fisher's log-series fitted to one randomly selected site
   (\Sexpr{k}).}
 \label{fig:fisher}
-\end{SCfigure}
+\end{figure}
 We already saw $\alpha$ as a diversity index.  Now we also obtained
 estimate of standard error of $\alpha$ (these also are optionally
 available in \code{fisher.alpha}).  The standard errors are based on
@@ -348,26 +340,25 @@
 @
 
 Preston's log-normal model is the main challenger to Fisher's
-log-series.  Instead of plotting species by frequencies, it bins
-species into frequency classes of increasing sizes.  As a result,
-upper bins with high range of frequencies become more common, and
-sometimes the result looks similar to Gaussian distribution truncated
-at the left.
+log-series \citep{Preston48}.  Instead of plotting species by
+frequencies, it bins species into frequency classes of increasing
+sizes.  As a result, upper bins with high range of frequencies become
+more common, and sometimes the result looks similar to Gaussian
+distribution truncated at the left.
 
 There are two alternative functions for the log-normal model:
-\code{prestonfit} and \code{prestondistr}.  Function
-\code{prestonfit} uses traditionally binning approach, and is burdened
-with arbitrary choices of binning limits and treatment of ties. It
-seems that Preston split ties between adjacent octaves: only half of
-the species observed once were in the first octave, and half were
-transferred to the next octave, and the same for all species at the
-octave limits occuring 2, 4, 8, 16\ldots times. Function
+\code{prestonfit} and \code{prestondistr}.  Function \code{prestonfit}
+uses the traditional binning approach, and is burdened with arbitrary
+choices of binning limits and treatment of ties. It seems that Preston
+split ties between adjacent octaves: only half of the species observed
+once were in the first octave, and half were transferred to the next
+octave, and the same for all species at the octave limits occurring 2,
+4, 8, 16\ldots times \citep{WilliamsonGaston05}. Function
 \code{prestonfit} can either split the ties or keep all limit cases in
-the lower octave.
-Function \code{prestondistr} directly
-maximizes truncated log-normal likelihood without binning data, and it
-is the recommended alternative.  Log-normal models  usually fit poorly
-to the BCI data, but here our random plot (number \Sexpr{k}):
+the lower octave.  Function \code{prestondistr} directly maximizes
+truncated log-normal likelihood without binning data, and it is the
+recommended alternative.  Log-normal models usually fit poorly to the
+BCI data, but here is our random plot (number \Sexpr{k}):
 <<>>=
 prestondistr(BCI[k,])
 @
@@ -376,11 +367,11 @@
 
 An alternative approach to species abundance distribution is to plot
 logarithmic abundances in decreasing order, or against ranks of
-species.  These are known as ranked abundance
+species \citep{Whittaker65}.  These are known as ranked abundance
 distribution curves, species abundance curves, dominance--diversity
-curves or Whittaker plots.
-Function \code{radfit} fits some of the most popular models using
-maximum likelihood estimation:
+curves or Whittaker plots.  Function \code{radfit} fits some of the
+most popular models \citep{Bastow91} using maximum likelihood
+estimation:
 \begin{align}
 \hat a_r &= \frac{N}{S} \sum_{k=r}^S \frac{1}{k} &\text{brokenstick}\\
 \hat a_r &= N \alpha (1-\alpha)^{r-1} & \text{preemption} \\
@@ -404,14 +395,14 @@
 rad <- radfit(BCI[k,])
 rad
 @
-\begin{SCfigure}
+\begin{figure}
 <<fig=true,echo=false>>=
 print(radlattice(rad))
 @
 \caption{Ranked abundance distribution models for a random plot
   (no. \Sexpr{k}).  The best model has the lowest \textsc{aic}.}
 \label{fig:rad}
-\end{SCfigure}
+\end{figure}
 
 Function \code{radfit} compares the models using alternatively
 Akaike's or Schwartz's Bayesian information criteria.  These are based
@@ -442,23 +433,24 @@
 they happen to be, and repeated accumulation in random order.  In
 addition, there are three analytic models.  Rarefaction pools
 individuals together, and applies rarefaction equation (\ref{eq:rare})
-to these individuals.  Kindt's exact accumulator resembles rarefaction:
-\begin{equation}
+to these individuals.  Kindt's exact accumulator resembles rarefaction
+\citep{UglandEtal03}:
+\begin{multline}
 \label{eq:kindt}
-\hat S_n = \sum_{i=1}^S (1 - p_i), \, \text{where} \quad p_i = {N- f_i
+\hat S_n = \sum_{i=1}^S (1 - p_i), \, \\ \text{where} \quad  p_i = {N- f_i
 \choose n} \Bigm / {N \choose n}
-\end{equation}
+\end{multline}
 where $f_i$ is the frequency of species $i$.  Approximate variance
 estimator is:
-\begin{equation}
+\begin{multline}
 \label{eq:kindtvar}
-s^2 = p_i (1 - p_i) + 2 \sum_{i=1}^S \sum_{j>i} \left( r_{ij}
+s^2 = \sum_{i=1}^S p_i (1 - p_i)  \\ + 2 \sum_{i=1}^S \sum_{j>i} \left( r_{ij}
   \sqrt{p_i(1-p_i)} \sqrt{p_j (1-p_j)}\right)
-\end{equation}
+\end{multline}
 where $r_{ij}$ is the correlation coefficient between species $i$ and
[TRUNCATED]
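The Rényi and Tsallis formulas in the diversity-vegan.Rnw hunks above can be evaluated directly. Below is a minimal base-R sketch.

- renyi_h(), hill_n() and tsallis_h() are hypothetical helper names. vegan's own functions are renyi() and tsallis().
- The abundance vector is made up.
- p^q in the Tsallis equation is read as p_i^q.
- Orders 1 and infinity are handled as the limits named in the text.

```r
## Renyi, Hill and Tsallis diversities from the formulas quoted above.
## renyi_h(), hill_n() and tsallis_h() are hypothetical helpers, not vegan API.
renyi_h <- function(x, a) {
  p <- x[x > 0] / sum(x)                    # proportions p_i (zeros dropped)
  if (is.infinite(a)) return(-log(max(p)))  # H_inf = -log(max p_i)
  if (a == 1) return(-sum(p * log(p)))      # limit a -> 1 is Shannon H'
  log(sum(p^a)) / (1 - a)
}
hill_n <- function(x, a) exp(renyi_h(x, a)) # Hill number N_a = exp(H_a)
tsallis_h <- function(x, q) {
  p <- x[x > 0] / sum(x)
  if (q == 1) return(-sum(p * log(p)))      # limit q -> 1 is also Shannon H'
  (1 - sum(p^q)) / (q - 1)
}

x <- c(10, 5, 3, 1, 1)   # made-up abundances
renyi_h(x, 0)            # log(S) = log(5)
hill_n(x, 2)             # inverse Simpson 1 / sum(p_i^2)
hill_n(x, Inf)           # 1 / max(p_i)
tsallis_h(x, 2)          # Simpson D_1 = 1 - sum(p_i^2)
```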

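The diversity-vegan.Rnw hunks above give the rarefaction equation (Hurlbert 1971) and its variance (Heck et al. 1975). A base-R sketch of both follows, with the first variance term summed over species.

- rare_s() and rare_var() are hypothetical names. In vegan the function is rarefy().
- choose() returns 0 when N - x_i < n, which gives the q_i = 0 convention of the text.
- choose() can overflow for large counts, so this is only for small examples.

The last lines also check the claim that rarefying to two individuals gives the finite-sample Simpson index plus one.

```r
## Rarefied richness (Hurlbert 1971) and its variance (Heck et al. 1975),
## written straight from the equations above. rare_s() and rare_var() are
## hypothetical helpers; vegan's function for this is rarefy().
## choose() may overflow for large N, so this is for small examples only.
rare_s <- function(x, n) {
  x <- x[x > 0]
  N <- sum(x)
  q <- choose(N - x, n) / choose(N, n)  # P(species i absent); 0 if N - x_i < n
  sum(1 - q)
}
rare_var <- function(x, n) {
  x <- x[x > 0]
  N <- sum(x)
  S <- length(x)
  q <- choose(N - x, n) / choose(N, n)
  v <- sum(q * (1 - q))                 # sum of species variances
  for (i in seq_len(S - 1)) {
    for (j in (i + 1):S) {              # covariance terms for pairs j > i
      v <- v + 2 * (choose(N - x[i] - x[j], n) / choose(N, n) - q[i] * q[j])
    }
  }
  v
}

x <- c(5, 3, 2, 1)          # made-up counts, N = 11
rare_s(x, 2)                # 96/55
rare_s(x, sum(x))           # all S = 4 species are sure to occur
rare_var(x, sum(x))         # 0
## finite-sample Simpson index plus one equals rarefaction to two individuals
N <- sum(x)
1 - sum(x * (x - 1)) / (N * (N - 1)) + 1
```

For n = 2 the rarefied richness is 1 plus a Bernoulli variable. rare_var(x, 2) should therefore equal (rare_s(x, 2) - 1) * (2 - rare_s(x, 2)), a quick check on the variance formula.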
To get the complete diff run:
    svnlook diff /svnroot/vegan -r 2615

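Taxonomic diversity Delta and taxonomic distinctness Delta* from the diversity-vegan.Rnw hunk above can be sketched in base R. The example uses a made-up three-species omega matrix already on the 0 to 100 scale. tax_delta() is a hypothetical name; in vegan the function is taxondive(), with species distances from taxa2dist().

```r
## Taxonomic diversity Delta and distinctness Delta* (Clarke & Warwick 1998)
## from the equations above. tax_delta() is a hypothetical helper; vegan's
## function is taxondive(). omega holds pairwise taxonomic distances.
tax_delta <- function(x, omega) {
  n <- sum(x)                      # number of individuals
  xx <- outer(x, x)                # products x_i x_j
  up <- upper.tri(omega)           # species pairs i < j
  num <- sum(omega[up] * xx[up])
  c(Delta      = num / (n * (n - 1) / 2),
    Delta.star = num / sum(xx[up]))
}

## made-up distances: species 1 and 2 close (omega = 50),
## species 3 far from both (omega = 100)
omega <- matrix(c(  0,  50, 100,
                   50,   0, 100,
                  100, 100,   0), nrow = 3, byrow = TRUE)
x <- c(4, 2, 1)
tax_delta(x, omega)   # Delta = 1000/21, Delta* = 1000/14
```

The denominator n(n-1)/2 of Delta also counts pairs of individuals of the same species (omega = 0), so Delta* is never smaller than Delta.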

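The brokenstick and preemption curves listed for radfit in the diversity-vegan.Rnw hunk above are closed forms. A base-R sketch can evaluate them for given N, S and alpha. bs_abund() and preempt_abund() are hypothetical names. According to the text, radfit itself estimates the parameters by maximum likelihood; this sketch does not fit anything.

```r
## Expected abundance of rank r under the two models quoted above.
## bs_abund() and preempt_abund() are hypothetical helpers, not vegan API.
bs_abund <- function(N, S) {
  ## brokenstick: a_r = N/S * sum_{k=r}^S 1/k
  N / S * sapply(seq_len(S), function(r) sum(1 / (r:S)))
}
preempt_abund <- function(N, S, alpha) {
  ## preemption: a_r = N * alpha * (1 - alpha)^(r - 1)
  N * alpha * (1 - alpha)^(seq_len(S) - 1)
}

bs_abund(100, 2)             # 75 and 25
sum(bs_abund(100, 10))       # brokenstick shares add up to N = 100
preempt_abund(100, 5, 0.5)   # 50, 25, 12.5, 6.25, 3.125
```

The brokenstick shares always sum to N. The preemption shares sum to N(1 - (1 - alpha)^S), slightly below N.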