[Vegan-commits] r2921 - pkg/vegan/vignettes

Fri Dec 12 14:47:02 CET 2014

Author: jarioksa
Date: 2014-12-12 14:47:02 +0100 (Fri, 12 Dec 2014)
New Revision: 2921

Modified:
   pkg/vegan/vignettes/diversity-vegan.Rnw
   pkg/vegan/vignettes/vegan.bib
Log:
Conflicts:
	vignettes/diversity-vegan.Rnw

Modified: pkg/vegan/vignettes/diversity-vegan.Rnw
===================================================================

--- pkg/vegan/vignettes/diversity-vegan.Rnw	2014-12-12 09:01:27 UTC (rev 2920)
+++ pkg/vegan/vignettes/diversity-vegan.Rnw	2014-12-12 13:47:02 UTC (rev 2921)
@@ -67,7 +67,7 @@
 \begin{align}
 H &= - \sum_{i=1}^S p_i \log_b  p_i & \text{Shannon--Weaver}\\
 D_1 &= 1 - \sum_{i=1}^S p_i^2  &\text{Simpson}\\
-D_2 &= \frac{1}{\sum_{i=1}^S p_i^2}  &\text{inverse Simpson}
+D_2 &= \frac{1}{\sum_{i=1}^S p_i^2}  &\text{inverse Simpson}\,,
 \end{align}
 where $p_i$ is the proportion of species $i$, and $S$ is the number of
 species so that $\sum_{i=1}^S p_i = 1$, and $b$ is the base of the
@@ -92,9 +92,9 @@
 \pkg{vegan} also can estimate series of R\'{e}nyi and Tsallis
 diversities. R{\'e}nyi diversity of order $a$ is \citep{Hill73number}:
 \begin{equation}
-H_a = \frac{1}{1-a} \log \sum_{i=1}^S p_i^a
+H_a = \frac{1}{1-a} \log \sum_{i=1}^S p_i^a \,,
 \end{equation}
-or the corresponding Hill numbers $N_a = \exp(H_a)$.  Many common
+and the corresponding Hill number is $N_a = \exp(H_a)$.  Many common
 diversity indices are special cases of Hill numbers: $N_0 = S$, $N_1 =
 \exp(H')$, $N_2 = D_2$, and $N_\infty = 1/(\max p_i)$. The
 corresponding R\'{e}nyi diversities are $H_0 = \log(S)$, $H_1 = H'$, $H_2 =
@@ -117,7 +117,7 @@
 We can really regard a site more diverse if all of its R\'{e}nyi
 diversities are higher than in another site.  We can inspect this
 graphically using the standard \code{plot} function for the
-\code{renyi} result (Fig. \ref{fig:renyi}).
+\code{renyi} result (Fig.~\ref{fig:renyi}).
 \begin{figure}
 <<fig=true,echo=false>>=
 print(plot(R))
@@ -142,28 +142,28 @@
 solve this problem, we may try to rarefy species richness to the same
 number of individuals.  Expected number of species in a community
 rarefied from $N$ to $n$ individuals is \citep{Hurlbert71}:
-\begin{multline}
+\begin{equation}
 \label{eq:rare}
-\hat S_n = \sum_{i=1}^S (1 - q_i),\\ \text{where} \quad q_i = {N-x_i
-  \choose n} \Bigm /{N \choose n}
-\end{multline}
-where $x_i$ is the count of species $i$, and ${N \choose n}$ is the
+\hat S_n = \sum_{i=1}^S (1 - q_i)\,, \quad\text{where }  q_i =
+\frac{{N-x_i \choose n}}{{N \choose n}} \,.
+\end{equation}
+Here $x_i$ is the count of species $i$, and ${N \choose n}$ is the
 binomial coefficient, or the number of ways we can choose $n$ from
 $N$, and $q_i$ give the probabilities that species $i$ does \emph{not} occur in a
-sample of size $n$.  This is defined only when $N-x_i > n$, but for
+sample of size $n$.  This is positive only when $N-x_i \ge n$, but for
 other cases $q_i = 0$ or the species is sure to occur in the sample.
 The variance of rarefied richness is \citep{HeckEtal75}:
 \begin{multline}
 \label{eq:rarevar}
-s^2 = q_i (1-q_i)  \\ + 2 \sum_{i=1}^S \sum_{j>i} \left[ {N- x_i - x_j
-    \choose n} \Bigm / {N
-    \choose n} - q_i q_j\right]
+s^2 = q_i (1-q_i)  \\ + 2 \sum_{i=1}^S \sum_{j>i} \left[ \frac{{N- x_i - x_j
+    \choose n}}{ {N
+    \choose n}} - q_i q_j\right] \,.
 \end{multline}
-Equation \ref{eq:rarevar} actually is of the same form as the variance
+Equation~\ref{eq:rarevar} actually is of the same form as the variance
 of sum of correlated variables:
 \begin{equation}
 \VAR \left(\sum x_i \right) = \sum \VAR (x_i) + 2 \sum_{i=1}^S
-\sum_{j>i} \COV (x_i, x_j)
+\sum_{j>i} \COV (x_i, x_j) \,.
 \end{equation}
 
 The number of stems per hectare varies in our
@@ -215,7 +215,8 @@
 taxonomic distinctness $\Delta^*$ \citep{ClarkeWarwick98}:
 \begin{align}
   \Delta &= \frac{\sum \sum_{i<j} \omega_{ij} x_i x_j}{n (n-1) / 2}\\
-\Delta^* &= \frac{\sum \sum_{i<j} \omega_{ij} x_i x_j}{\sum \sum_{i<j} x_i x_j}
+\Delta^* &= \frac{\sum \sum_{i<j} \omega_{ij} x_i x_j}{\sum \sum_{i<j}
+  x_i x_j} \,.
 \end{align}
 These equations give the index values for a single site, and summation
 goes over species $i$ and $j$, and $\omega$ are the taxonomic
@@ -230,7 +231,8 @@
 to give $s \Delta^+$, or it can be used to estimate an index of
 variation in taxonomic distinctness $\Lambda^+$ \citep{ClarkeWarwick01}:
 \begin{equation}
-  \Lambda^+ = \frac{\sum \sum_{i<j} \omega_{ij}^2}{n (n-1) / 2} - (\Delta^+)^2
+  \Lambda^+ = \frac{\sum \sum_{i<j} \omega_{ij}^2}{n (n-1) / 2} -
+  (\Delta^+)^2 \,.
 \end{equation}
 
 We still need the taxonomic differences among species ($\omega$) to
@@ -254,7 +256,7 @@
   but taxonomic differences proved to be of little use in the Barro
   Colorado data: they only singled out sites with Monocots (palm
   trees) in the data.}
-but there is such a table for the Dune meadow data (Fig. \ref{fig:taxondive}):
+but there is such a table for the Dune meadow data (Fig.~\ref{fig:taxondive}):
 <<>>=
 data(dune)
 data(dune.taxon)
@@ -307,12 +309,12 @@
 In Fisher's log-series, the expected number of species $\hat f$ with $n$
 individuals is \citep{FisherEtal43}:
 \begin{equation}
-\hat f_n = \frac{\alpha x^n}{n}
+\hat f_n = \frac{\alpha x^n}{n} \,,
 \end{equation}
 where $\alpha$ is the diversity parameter, and $x$ is a nuisance
 parameter defined by $\alpha$ and total number
 of individuals $N$ in the site, $x = N/(N+\alpha)$.  Fisher's
-log-series for a randomly selected plot is (Fig. \ref{fig:fisher}):
+log-series for a randomly selected plot is (Fig.~\ref{fig:fisher}):
 <<>>=
 k <- sample(nrow(BCI), 1)
 fish <- fisherfit(BCI[k,])
@@ -369,7 +371,7 @@
 \hat a_r &= N \hat p_1 r^\gamma &\text{Zipf}\\
 \hat a_r &= N c (r + \beta)^\gamma &\text{Zipf--Mandelbrot}
 \end{align}
-Where $\hat a_r$ is the expected abundance of species at rank $r$, $S$
+In all these, $\hat a_r$ is the expected abundance of species at rank $r$, $S$
 is the number of species, $N$ is the number of individuals, $\Phi$ is
 a standard normal function, $\hat p_1$ is the estimated proportion of
 the most abundant species, and $\alpha$, $\mu$, $\sigma$, $\gamma$,
@@ -379,7 +381,7 @@
 abundances $a_r$, but there is no reason for this, and \code{radfit}
 is able to work with the original abundance data.  We have count data,
 and the default Poisson error looks appropriate, and our example data
-set gives (Fig. \ref{fig:rad}):
+set gives (Fig.~\ref{fig:rad}):
 <<>>=
 rad <- radfit(BCI[k,])
 rad
@@ -426,30 +428,31 @@
 \citep{UglandEtal03}:
 \begin{multline}
 \label{eq:kindt}
-\hat S_n = \sum_{i=1}^S (1 - p_i), \, \\ \text{where} \quad  p_i = {N- f_i
-\choose n} \Bigm / {N \choose n}
+\hat S_n = \sum_{i=1}^S (1 - p_i), \,\quad \text{where }
+p_i = \frac{{N- f_i \choose n}}{{N \choose n}} \,,
 \end{multline}
-where $f_i$ is the frequency of species $i$.  Approximate variance
+and $f_i$ is the frequency of species $i$.  Approximate variance
 estimator is:
 \begin{multline}
 \label{eq:kindtvar}
 s^2 = p_i (1 - p_i)  \\ + 2 \sum_{i=1}^S \sum_{j>i} \left( r_{ij}
-  \sqrt{p_i(1-p_i)} \sqrt{p_j (1-p_j)}\right)
+  \sqrt{p_i(1-p_i)} \sqrt{p_j (1-p_j)}\right) \,,
 \end{multline}
 where $r_{ij}$ is the correlation coefficient between species $i$ and
-$j$.  Both of these are unpublished: eq. \ref{eq:kindt} was developed
-by Roeland Kindt, and eq. \ref{eq:kindtvar} by Jari Oksanen. The third
+$j$.  Both of these are unpublished: eq.~\ref{eq:kindt} was developed
+by Roeland Kindt, and eq.~\ref{eq:kindtvar} by Jari Oksanen. The third
 analytic method was suggested by \citet{Coleman82}:
 \begin{equation}
 \label{eq:cole}
-S_n = \sum_{i=1}^S (1 - p_i), \, \text{where} \quad p_i = \left(1 - \frac{1}{n}\right)^{f_i}
+S_n = \sum_{i=1}^S (1 - p_i), \quad \text{where }  p_i = \left(1 -
+  \frac{1}{n}\right)^{f_i} \,,
 \end{equation}
-and he suggested variance $s^2 = p_i (1-p_i)$ which ignores the
-covariance component.  In addition, eq. \ref{eq:cole} does not
+and the suggested variance is $s^2 = p_i (1-p_i)$ which ignores the
+covariance component.  In addition, eq.~\ref{eq:cole} does not
 properly handle sampling without replacement and underestimates the
 species accumulation curve.
 
-The recommended is Kindt's exact method (Fig. \ref{fig:sac}):
+The recommended is Kindt's exact method (Fig.~\ref{fig:sac}):
 <<a>>=
 sac <- specaccum(BCI)
 plot(sac, ci.type="polygon", ci.col="yellow")
@@ -478,7 +481,7 @@
 richness per one site $\bar \alpha$ \citep{Tuomisto10a}:
 \begin{equation}
   \label{eq:beta}
-  \beta = S/\bar \alpha - 1
+  \beta = S/\bar \alpha - 1 \,.
 \end{equation}
 Subtraction of one means that $\beta = 0$ when there are no excess
 species or no heterogeneity between sites. For this index, no specific
@@ -488,16 +491,16 @@
 ncol(BCI)/mean(specnumber(BCI)) - 1
 @
 
-The index of eq. \ref{eq:beta} is problematic because $S$ increases
+The index of eq.~\ref{eq:beta} is problematic because $S$ increases
 with the number of sites even when sites are all subsets of the same
 community.  \citet{Whittaker60} noticed this, and suggested the index
 to be found from pairwise comparison of sites. If the number of shared
 species in two sites is $a$, and the numbers of species unique to each
 site are $b$ and $c$, then $\bar \alpha = (2a + b + c)/2$ and $S =
-a+b+c$, and index \ref{eq:beta} can be expressed as:
+a+b+c$, and index~\ref{eq:beta} can be expressed as:
 \begin{equation}
   \label{eq:betabray}
-  \beta = \frac{a+b+c}{(2a+b+c)/2} - 1 = \frac{b+c}{2a+b+c}
+  \beta = \frac{a+b+c}{(2a+b+c)/2} - 1 = \frac{b+c}{2a+b+c} \,.
 \end{equation}
 This is the S{\o}rensen index of dissimilarity, and it can be found
 for all sites using \pkg{vegan} function \code{vegdist} with
@@ -508,7 +511,7 @@
 @
 
 There are many other definitions of beta diversity in addition to
-eq. \ref{eq:beta}.  All commonly used indices can be found using
+eq.~\ref{eq:beta}.  All commonly used indices can be found using
 \code{betadiver} \citep{KoleffEtal03}. The indices in \code{betadiver}
 can be referred to by subscript name, or index number:
 <<>>=
@@ -520,7 +523,7 @@
 on the Arrhenius species--area model
 \begin{equation}
   \label{eq:arrhenius}
-  \hat S = c X^z
+  \hat S = c X^z\,,
 \end{equation}
 where $X$ is the area (size) of the patch or site, and $c$ and $z$ are
 parameters. Parameter $c$ is uninteresting, but $z$ gives the
@@ -541,7 +544,7 @@
 with respect to classes or factors \citep{Anderson06, AndersonEtal06}.
 There is no such classification available for the Barro Colorado
 Island data, and the example studies beta diversities in the
-management classes of the dune meadows (Fig. \ref{fig:betadisper}):
+management classes of the dune meadows (Fig.~\ref{fig:betadisper}):
 <<>>=
 data(dune)
 data(dune.env)
@@ -596,7 +599,7 @@
 \label{eq:chao}
 \hat f_0 = \begin{cases} 
     \frac{f_1^2}{2 f_2} \frac{N-1}{N} &\text{if } f_2 > 0 \\
-\frac{f_1 (f_1 -1)}{2}  \frac{N-1}{N} & \text{if } f_2 = 0
+\frac{f_1 (f_1 -1)}{2}  \frac{N-1}{N} & \text{if } f_2 = 0 \,.
 \end{cases}
 \end{equation}
 The latter case for $f_2=0$ is known as the bias-corrected
@@ -607,11 +610,11 @@
 \citep{SmithVanBelle84}:
 \begin{align}
 \hat f_0 &=  f_1 \frac{N-1}{N}  \\ 
-\hat f_0 & =  f_1 \frac{2N-3}{N}  + f_2 \frac{(N-2)^2}{N(N-1)}
+\hat f_0 & =  f_1 \frac{2N-3}{N}  + f_2 \frac{(N-2)^2}{N(N-1)} \,.
 \end{align}
 The bootstrap estimator is \citep{SmithVanBelle84}:
 \begin{equation}
-\hat f_0 =  \sum_{i=1}^{S_o} (1-p_i)^N
+\hat f_0 =  \sum_{i=1}^{S_o} (1-p_i)^N \,.
 \end{equation}
 The idea in jackknife seems to be that we missed about as many species
 as we saw only once, and the idea in bootstrap that if we repeat
@@ -625,7 +628,7 @@
 \begin{multline}
 \label{eq:var-chao-basic}
 \VAR(\hat f_0) = f_1 \left(A^2 \frac{G^3}{4} + A^2 G^2 + A \frac{G}{2} \right),\\
-\text{where}\; A = \frac{N-1}{N}\;\text{and}\; G = \frac{f_1}{f_2} 
+\text{where } A = \frac{N-1}{N}\;\text{and } G = \frac{f_1}{f_2} \,.
 \end{multline}
 %% The variance of bias-corrected Chao estimate can be approximated by
 %% replacing the terms of eq.~\ref{eq:var-chao-basic} with the
@@ -635,18 +638,20 @@
 %% s^2 = A \frac{f_1(f_1-1)}{2} + A^2 \frac{f_1(2 f_1+1)^2}{(f_2+1)^2}\\
 %%  + A^2 \frac{f_1^2 f_2 (f_1 -1)^2}{4 (f_2 + 1)^4}
 %% \end{multline}
-For the bias-corrected form of eq.~\ref{eq:chao}  (case $f_2 = 0$), the he variance is 
+For the bias-corrected form of eq.~\ref{eq:chao}  (case $f_2 = 0$), the variance is
 \citep[who omit small-sample correction in some terms]{ChiuEtal14}:
 \begin{multline}
 \label{eq:var-chao-bc0}
-\VAR(\hat f_0) = \frac{1}{4} A^2 f_1 (2f_1 -1)^2 + \frac{1}{2} A f_1 (f_1-1) - \frac{1}{4}A^2 \frac{f_1^4}{S_p}
+\VAR(\hat f_0) = \tfrac{1}{4} A^2 f_1 (2f_1 -1)^2 + \tfrac{1}{2} A f_1
+(f_1-1) \\- \tfrac{1}{4}A^2 \frac{f_1^4}{S_p} \,.
 \end{multline}
 
 The variance of the first-order jackknife is based on the number of
 ``singletons'' $r$ (species occurring only once in the data) in sample
 plots \citep{SmithVanBelle84}:
 \begin{equation}
-\VAR(\hat f_0) = \left(\sum_{i=1}^N r_i^2 - \frac{f_1}{N}\right) \frac{N-1}{N}
+\VAR(\hat f_0) = \left(\sum_{i=1}^N r_i^2 - \frac{f_1}{N}\right)
+\frac{N-1}{N} \,.
 \end{equation}
 Variance of the second-order jackknife is not evaluated in
 \code{specpool} (but contributions are welcome).
@@ -657,7 +662,7 @@
   j}^{S_o} \left[(Z_{ij}/N)^N - q_i q_j \right] \\
 \text{where } q_i = (1-p_i)^N \, ,
 \end{multline}
-where $Z_{ij}$ is the number of sites where both species are absent.
+and $Z_{ij}$ is the number of sites where both species are absent.
 
 The extrapolated richness values for the whole BCI data are:
 <<>>=
@@ -701,7 +706,7 @@
 \begin{multline}
   \label{eq:var-chao-bc}
  s^2 = \frac{a_1(a_1-1)}{2} + \frac{a_1(2 a_1+1)^2}{(a_2+1)^2}\\
-  + \frac{a_1^2 a_2 (a_1 -1)^2}{4 (a_2 + 1)^4}
+  + \frac{a_1^2 a_2 (a_1 -1)^2}{4 (a_2 + 1)^4} \,.
 \end{multline}
 However, \pkg{vegan} does not use this, but instead the following more
 exact form which was directly derived from eq.~\ref{eq:chao-bc}
@@ -721,7 +726,7 @@
 \frac{a_1}{C_\mathrm{ACE}} \gamma^2\, , \quad \text{where}\\
 C_\mathrm{ACE} &= 1 - \frac{a_1}{N_\mathrm{rare}}\\
 \gamma^2 &= \frac{S_\mathrm{rare}}{C_\mathrm{ACE}} \sum_{i=1}^{10} i
-(i-1) a_1 \frac{N_\mathrm{rare} - 1}{N_\mathrm{rare}}
+(i-1) a_1 \frac{N_\mathrm{rare} - 1}{N_\mathrm{rare}}\,.
 \end{split}
 \end{equation}
 Now $a_1$ takes the place of $f_1$ above, and means the number of
@@ -741,7 +746,7 @@
 estimate the pool size.  Log-normal model has a finite number of
 species which can be found integrating the log-normal:
 \begin{equation}
-S_p = S_\mu \sigma \sqrt{2 \pi}
+S_p = S_\mu \sigma \sqrt{2 \pi} \,,
 \end{equation}
 where $S_\mu$ is the modal height or the expected number of species at
 maximum (at $\mu$), and $\sigma$ is the width.  Function
@@ -771,12 +776,12 @@
 We may see how the estimated probability of occurrence and observed
 numbers of stems relate in one of the more familiar species. We study
 only one species, and to avoid circular reasoning we do not include
-the target species in the smoothing (Fig. \ref{fig:beals}):
+the target species in the smoothing (Fig.~\ref{fig:beals}):
 <<a>>=
 j <- which(colnames(BCI) == "Ceiba.pentandra")
 plot(beals(BCI, species=j, include=FALSE), BCI[,j], 
-     main="Ceiba pentandra", xlab="Probability of occurrence",
-     ylab="Occurrence")
+     ylab="Occurrence", main="Ceiba pentandra", 
+     xlab="Probability of occurrence")
 @
 \begin{figure}
 <<fig=true,echo=false>>=

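Note on the rarefaction hunk (eq:rare): the reworded condition, "positive
only when $N-x_i \ge n$", can be checked with a small worked case. The
numbers are invented for illustration and do not come from the BCI data: a
hypothetical community of N = 4 individuals with species counts
x = (2, 1, 1), rarefied to n = 2 individuals.

```latex
% Hypothetical community: N = 4, x = (2, 1, 1), n = 2, so {4 \choose 2} = 6.
\begin{align*}
q_1 &= \frac{{4-2 \choose 2}}{{4 \choose 2}} = \frac{1}{6} \,,\\
q_2 = q_3 &= \frac{{4-1 \choose 2}}{{4 \choose 2}} = \frac{3}{6}
           = \frac{1}{2} \,,\\
\hat S_2 &= \left(1 - \tfrac{1}{6}\right) + 2 \left(1 - \tfrac{1}{2}\right)
          = \tfrac{11}{6} \approx 1.83 \,.
\end{align*}
```

Species 1 sits exactly on the boundary $N - x_1 = n$, and its $q_1 = 1/6$ is
still positive, which agrees with the new wording rather than the old strict
inequality. As a cross-check, the six possible pairs of individuals contain
1 + 5 * 2 = 11 species in total, giving the same mean of 11/6.
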
Modified: pkg/vegan/vignettes/vegan.bib
===================================================================
--- pkg/vegan/vignettes/vegan.bib	2014-12-12 09:01:27 UTC (rev 2920)
+++ pkg/vegan/vignettes/vegan.bib	2014-12-12 13:47:02 UTC (rev 2921)
@@ -266,7 +266,7 @@
 }
 
 @Article{Tothmeresz95,
-  author = 	 {B. Tothmeresz},
+  author = 	 {B. T{\'o}thm{\'e}r{\'e}sz},
   title = 	 {Comparison of different methods for diversity ordering},
   journal = 	 {Journal of Vegetation Science},
   year = 	 1995,