[Vegan-commits] r2918 - in pkg/vegan: R vignettes

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Fri Dec 12 09:42:59 CET 2014


Author: jarioksa
Date: 2014-12-12 09:42:58 +0100 (Fri, 12 Dec 2014)
New Revision: 2918

Modified:
   pkg/vegan/R/estimateR.default.R
   pkg/vegan/R/specpool.R
   pkg/vegan/vignettes/diversity-vegan.Rnw
   pkg/vegan/vignettes/vegan.bib
Log:
Squashed commit of the following:

commit fd3ffc60c65f46f1c5554025ece18f2fd58cfda1
Author: Jari Oksanen <jari.oksanen at oulu.fi>
Date:   Thu Dec 11 12:51:19 2014 +0200

    tweaks

    (cherry picked from commit 685e5795e02deb75c78803f2f1c2e82ef3c49663)

commit 66aca05f78de8dadb0cc7c9e978da59794726dd9
Author: Jari Oksanen <jari.oksanen at oulu.fi>
Date:   Wed Dec 10 20:30:30 2014 +0200

    Edit diversity-vegan

    (cherry picked from commit 79eb46ece1cac5081f50d4bd84b0c4e8899b2b1c)

commit 2bc1a16c03f11d8886f4ed57ad391e7a5d851e32
Author: Jari Oksanen <jari.oksanen at oulu.fi>
Date:   Sun Dec 7 19:04:14 2014 +0200

    Edit diversity-vegan vignette

    Be explicit about extended species richness only concerning
    the number of unseen species and not the total number of species.

    (cherry picked from commit b06ec8328d3083804d2a84ad4b74b67cab41d428)

commit ee9153e2ce44b179eb118611c0e6989e8bdd56bf
Author: Jari Oksanen <jari.oksanen at oulu.fi>
Date:   Sun Dec 7 18:14:20 2014 +0200

    Use only the new "exact" variance estimator for Chao bias-reduced

    (cherry picked from commit 9e49c56035bd0ac91b4bd23f2900d57ba4361ff9)

commit 39f7e67a1c30ff84a6dd6263df950d2e7cbe075e
Author: Jari Oksanen <jari.oksanen at oulu.fi>
Date:   Sun Dec 7 18:06:39 2014 +0200

    Document the exact Chao bia-corrected variance estimator

    (cherry picked from commit d8253316fc983d4d0ba5a2910a0079b8cffd8852)

commit b150c979190a9288d2aca2291aa50308dee40324
Author: Jari Oksanen <jari.oksanen at oulu.fi>
Date:   Sun Dec 7 16:50:53 2014 +0200

    More exact derivation of approximate variance of bias-reduced Chao

    The common texts (and EstimateS web page) use approximate estimator
    for bias-corrected Chao estimate. This uses derivation for the basic
    estimator but replaces terms with those of bias corrected form
    (a1^2 -> a1*(a1-1), 2*a2 -> 2*(a2+1)). The used form is unpublished
    as such, but it follows exactly the derivation as given in Chiu et al.,
    web appendix. The numerical differences between forms are tiny: in
    BCI the maximum difference is ca. 0.02%, and in the 'mite' data up to
    12% -- it seems that difference vanishes as the sample size (no. of
    individuals) increases.

    (cherry picked from commit b7f92c6fb41288c3e441e14acc044bb0ce748e26)

commit e2332b369d9b7df79545819406336db3662202b7
Author: Jari Oksanen <jari.oksanen at oulu.fi>
Date:   Thu Dec 4 20:28:49 2014 +0200

    Formatting of diversity vignette

    (cherry picked from commit d24db8b233328d961ea77c65e46f9531b5ee418e)

commit 1ed473db86678e0070de068dbe3324952386589d
Author: Jari Oksanen <jari.oksanen at oulu.fi>
Date:   Thu Dec 4 17:14:40 2014 +0200

    Edit description of Chao estimates in the diversity vignette

    (cherry picked from commit d689ed87db134f5da95c24485a2f54e4847b7703)

commit 0b6e8af9c070084fba41e99c0578293e3f42de45
Author: Jari Oksanen <jari.oksanen at oulu.fi>
Date:   Thu Dec 4 13:11:29 2014 +0200

    Use simpler expression for the variance of Chao extrapolation

    The new expression uses lower power or one multiplication less than
    the old one. The old one directly lifted from the literature, and
    it is mathematically equal to the new one.

    (cherry picked from commit 92db96cb09a3191add7464220777ca9cfcf97e27)

commit 3aef904857191eb6f69d840adb57254ff7c82253
Author: Jari Oksanen <jari.oksanen at oulu.fi>
Date:   Mon Dec 8 12:39:35 2014 +0200

    typo reported by Olivier Missa at ens.fr
    (cherry picked from commit 199131ed12c5c514223f8681121eabf6e516a6b2)

Modified: pkg/vegan/R/estimateR.default.R
===================================================================
--- pkg/vegan/R/estimateR.default.R	2014-12-08 08:47:23 UTC (rev 2917)
+++ pkg/vegan/R/estimateR.default.R	2014-12-12 08:42:58 UTC (rev 2918)
@@ -55,15 +55,26 @@
     ##else
     ##    S.Chao1 <- S.obs
     Deriv.Ch1 <- gradF(a, i)
-    ##if (a[2] > 0)
-    ##    sd.Chao1 <- sqrt(a[2] * (SSC * (SSC * (G^4/4 + G^3) + G^2/2)))
-    ##else if (a[1] > 0)
-    sd.Chao1 <-
-        sqrt(SSC*(a[1]*(a[1]-1)/2/(a[2]+1) +
-                  SSC*(a[1]*(2*a[1]-1)^2/4/(a[2]+1)^2 +
-                       a[1]^2*a[2]*(a[1]-1)^2/4/(a[2]+1)^4)))
-    ##else
-    ##    sd.Chao1 <- 0
+
+    ## The commonly used variance estimator is wrong for bias-reduced
+    ## Chao estimate. It is based on the variance estimator of basic
+    ## Chao estimate, but replaces the basic terms with corresponding
+    ## terms in the bias-reduced estimate. The following is directly
+    ## derived from the bias-reduced estimate.
+
+    ## The commonly used one (for instance, in EstimateS):
+    ##sd.Chao1 <-
+    ##    sqrt(SSC*(a[1]*(a[1]-1)/2/(a[2]+1) +
+    ##              SSC*(a[1]*(2*a[1]-1)^2/4/(a[2]+1)^2 +
+    ##                  a[1]^2*a[2]*(a[1]-1)^2/4/(a[2]+1)^4)))
+
+    sd.Chao1 <- (a[1]*((-a[2]^2+(-2*a[2]-a[1])*a[1])*a[1] +
+                       (-1+(-4+(-5-2*a[2])*a[2])*a[2] +
+                        (-2+(-1+(2*a[2]+2)*a[2])*a[2] +
+                         (4+(6+4*a[2])*a[2] + a[1]*a[2])*a[1])*a[1])*S.Chao1))/
+                             4/(a[2]+1)^4/S.Chao1
+    sd.Chao1 <- sqrt(sd.Chao1)
+
     C.ace <- 1 - a[1]/N.rare
     i <- 1:length(a)
     thing <- i * (i - 1) * a
@@ -72,7 +83,7 @@
     S.ACE <- S.abund + S.rare/C.ace + max(Gam, 0) * a[1]/C.ace
     sd.ACE <- sqrt(sum(Deriv.Ch1 %*% t(Deriv.Ch1) * (diag(a) - 
                                                      a %*% t(a)/S.ACE)))
-    out <- list(S.obs = S.obs, S.chao1 = S.Chao1, se.chao1 = sd.Chao1, 
+    out <- list(S.obs = S.obs, S.chao1 = S.Chao1, se.chao1 = sd.Chao1,
                 S.ACE = S.ACE, se.ACE = sd.ACE)
     out <- unlist(out)
     out

Modified: pkg/vegan/R/specpool.R
===================================================================
--- pkg/vegan/R/specpool.R	2014-12-08 08:47:23 UTC (rev 2917)
+++ pkg/vegan/R/specpool.R	2014-12-12 08:42:58 UTC (rev 2918)
@@ -52,7 +52,7 @@
             a1/a2
         else 0
         if (a2 > 0)
-            var.chao[is] <- a2 * ssc * (0.5 + ssc * (1 + aa/4) * aa) * aa * aa
+            var.chao[is] <- a1 * ssc * (0.5 + ssc * (1 + aa/4) * aa) * aa
         else
             var.chao[is] <-
                 ssc * (ssc * (a1*(2*a1-1)^2/4 - a1^4/chao[is]/4) + a1*(a1-1)/2)

Modified: pkg/vegan/vignettes/diversity-vegan.Rnw
===================================================================
--- pkg/vegan/vignettes/diversity-vegan.Rnw	2014-12-08 08:47:23 UTC (rev 2917)
+++ pkg/vegan/vignettes/diversity-vegan.Rnw	2014-12-12 08:42:58 UTC (rev 2918)
@@ -103,8 +103,8 @@
 \begin{equation}
   H_q = \frac{1}{q-1} \left(1 - \sum_{i=1}^S p^q \right) \, .
 \end{equation}
-This corresponds to common diversity indices: $H_0 = S-1$, $H_1 = H'$,
-and $H_2 = D_2$, and can be converted to the Hill number:
+These correspond to common diversity indices: $H_0 = S-1$, $H_1 = H'$,
+and $H_2 = D_1$, and can be converted to Hill numbers:
 \begin{equation}
   N_q = (1 - (q-1) H_q )^\frac{1}{1-q} \, .
 \end{equation}
@@ -572,66 +572,90 @@
 species is related to the number of rare species, or species seen only
 once or twice.
 
+The incidence-based functions group species by their number of
+occurrences $f_i = f_0, f_1, \ldots, f_N$, where $f$ is the number of
+species occuring in exactly $i$ sites in the data: $f_N$ is the number
+of species occurring on every $N$ site, $f_1$ the number of species
+occurring once, and $f_0$ the number of species in the species pool
+but not found in the sample. The total number of species in the pool
+$S_p$ is
+\begin{equation}
+S_p = \sum_{i=0}^N f_i = f_0+ S_o \,,
+\end{equation}
+where $S_o = \sum_{i>0} f_i$ is the observed number of species.  The
+sampling proportion $i/N$ is an estimate for the commonness of the
+species in the community. When species is present in the community but
+not in the sample, $i=0$ is an obvious under-estimate, and
+consequently, for values $i>0$ the species commonness is
+over-estimated \citep{Good53}. The models for the pool size estimate
+the number of species missing in the sample $f_0$.
+
 Function \code{specpool} implements the following models to estimate
-the pool size $S_p$ \citep{SmithVanBelle84, Chao87, ChiuEtal14}:
+the number of missing species $f_0$. Chao estimator  is \citep{Chao87, ChiuEtal14}:
+\begin{equation}
+\label{eq:chao}
+\hat f_0 = \begin{cases} 
+    \frac{f_1^2}{2 f_2} \frac{N-1}{N} &\text{if } f_2 > 0 \\
+\frac{f_1 (f_1 -1)}{2}  \frac{N-1}{N} & \text{if } f_2 = 0
+\end{cases}
+\end{equation}
+The latter case for $f_2=0$ is known as the bias-corrected
+form. \citet{ChiuEtal14} introduced the small-sample correction term
+$\frac{N}{N-1}$, but it was not originally used \citep{Chao87}.
+
+The first and second order jackknife estimators are
+\citep{SmithVanBelle84}:
 \begin{align}
-\label{eq:chao-basic}
-S_p &= S_o + \frac{f_1^2}{2 f_2} \frac{N-1}{N} & \text{Chao}\\
-\label{eq:chao-bc}
-S_p &= S_o + \frac{f_1 (f_1 -1)}{2 (f_2+1)}  \frac{N-1}{N} & \text{Chao bias-corrected}\\
-S_p &= S_o + f_1 \frac{N-1}{N}  & \text{1st order Jackknife}\\
-S_p & = S_o + f_1 \frac{2N-3}{N} \nonumber \\ & + f_2 \frac{(N-2)^2}{N(N-1)}
-& \text{2nd order Jackknife}\\
-S_p &= S_o + \sum_{i=1}^{S_o} (1-p_i)^N & \text{Bootstrap}
+\hat f_0 &=  f_1 \frac{N-1}{N}  \\ 
+\hat f_0 & =  f_1 \frac{2N-3}{N}  + f_2 \frac{(N-2)^2}{N(N-1)}
 \end{align}
-Here $S_o$ is the observed number of species, $f_1$ and $f_2$ are the
-numbers of species observed once or twice, $N$ is the number of sites,
-and $p_i$ are proportions of species.  The idea in jackknife seems to
-be that we missed about as many species as we saw only once, and the
-idea in bootstrap that if we repeat sampling (with replacement) from
-the same data, we miss as many species as we missed originally.
-\citet{ChiuEtal14} introduced the small-sample correction term
-$\frac{N}{N-1}$, but it was not originally used \citep{Chao87}.
+The boostrap estimator is \citep{SmithVanBelle84}:
+\begin{equation}
+\hat f_0 =  \sum_{i=1}^{S_o} (1-p_i)^N
+\end{equation}
+The idea in jackknife seems to be that we missed about as many species
+as we saw only once, and the idea in bootstrap that if we repeat
+sampling (with replacement) from the same data, we miss as many
+species as we missed originally.
 
-The variance the estimator of the basic Chao estimate is \citep{ChiuEtal14}:
+The variance estimaters only concern the estimated number of missing
+species $\hat f_0$, although they are often expressed as they would
+apply to the pool size $S_p$; this is only true if we assume that
+$\VAR(S_o) = 0$.  The variance of the Chao estimate is \citep{ChiuEtal14}:
 \begin{multline}
 \label{eq:var-chao-basic}
-s^2 = f_2 \left(A^2 \frac{G^4}{4} + A^2 G^3 + A \frac{G^2}{2} \right),\\
+\VAR(\hat f_0) = f_1 \left(A^2 \frac{G^3}{4} + A^2 G^2 + A \frac{G}{2} \right),\\
 \text{where}\; A = \frac{N-1}{N}\;\text{and}\; G = \frac{f_1}{f_2} 
 \end{multline}
-The variance of bias-corrected Chao estimate can be approximated by
-replacing the terms of eq.~\ref{eq:var-chao-basic} with the
-corresponding terms in eq.~\ref{eq:chao-bc}:
-\begin{multline}
-\label{eq:var-chao-bc}
-s^2 = A \frac{f_1(f_1-1)}{2(f_2+1)} + A^2 \frac{f_1(2 f_1+1)^2}{(f_2+1)^2}\\
- + A^2 \frac{f_1^2 f_2 (f_1 -1)^2}{4 (f_2 + 1)^4}
-\end{multline}
-If we apply the bias-correction in the special case where there are no
-doubletons ($f_2 = 0$), the he variance is 
+%% The variance of bias-corrected Chao estimate can be approximated by
+%% replacing the terms of eq.~\ref{eq:var-chao-basic} with the
+%% corresponding terms of the bias-correcter form of in eq.~\ref{eq:chao}:
+%% \begin{multline}
+%% \label{eq:var-chao-bc}
+%% s^2 = A \frac{f_1(f_1-1)}{2} + A^2 \frac{f_1(2 f_1+1)^2}{(f_2+1)^2}\\
+%%  + A^2 \frac{f_1^2 f_2 (f_1 -1)^2}{4 (f_2 + 1)^4}
+%% \end{multline}
+For the bias-corrected form of eq.~\ref{eq:chao}  (case $f_2 = 0$), the he variance is 
 \citep[who omit small-sample correction in some terms]{ChiuEtal14}:
 \begin{multline}
 \label{eq:var-chao-bc0}
-s^2 = \frac{1}{4} A^2 f_1 (2f_1 -1)^2 + \frac{1}{2} A f_1 (f_1-1) - \frac{1}{4}A^2 \frac{f_1^4}{S_p}
+\VAR(\hat f_0) = \frac{1}{4} A^2 f_1 (2f_1 -1)^2 + \frac{1}{2} A f_1 (f_1-1) - \frac{1}{4}A^2 \frac{f_1^4}{S_p}
 \end{multline}
-Function \code{specpool} uses eq.~\ref{eq:chao-basic} and estimates
-its variance with eq.~\ref{eq:var-chao-basic} when $f_2 > 0$. When
-$f_2 = 0$, \code{specpool} applies eq.~\ref{eq:chao-bc} which reduces
-to $\frac{N-1}{N} \frac{1}{2} f_1 (f_1 - 1)$, and its variance
-estimator eq.~\ref{eq:var-chao-bc0}.
 
 The variance of the first-order jackknife is based on the number of
 ``singletons'' $r$ (species occurring only once in the data) in sample
 plots \citep{SmithVanBelle84}:
 \begin{equation}
-s^2 = \left(\sum_{i=1}^N r_i^2 - \frac{f_1}{N}\right) \frac{N-1}{N}
+\VAR(\hat f_0) = \left(\sum_{i=1}^N r_i^2 - \frac{f_1}{N}\right) \frac{N-1}{N}
 \end{equation}
 Variance of the second-order jackknife is not evaluated in
 \code{specpool} (but contributions are welcome).
-For the variance of bootstrap estimator, it is practical to define a
-new variable $q_i = (1-p_i)^N$ for each species \citep{SmithVanBelle84}:
+
+The variance of bootstrap estimator is\citep{SmithVanBelle84}:
 \begin{multline}
-s^2 = \sum_{i=1}^{S_o} q_i (1-q_i)  \\ +2 \sum_{i \neq j}^{S_o} \left[(Z_{ij}/N)^N - q_i q_j \right]
+\VAR(\hat f_0) = \sum_{i=1}^{S_o} q_i (1-q_i)  \\ +2 \sum_{i \neq
+  j}^{S_o} \left[(Z_{ij}/N)^N - q_i q_j \right] \\
+\text{where } q_i = (1-p_i)^N \, ,
 \end{multline}
 where $Z_{ij}$ is the number of sites where both species are absent.
 
@@ -658,12 +682,40 @@
 <<>>=
 estimateR(BCI[k,])
 @ 
+In abundance based models $a_i$ denotes the number of species with $i$
+individuals, and takes the place of $f_i$ of previous models.
 Chao's method is similar as the bias-corrected model
-eq.~\ref{eq:chao-bc} with its variance estimator
-eq.~\ref{eq:var-chao-bc}, but it uses counts of individuals instead of
-incidences, and does not use small sample correction.  \textsc{ace} is
-based on rare species also:
+eq.~\ref{eq:chao} \citep{Chao87, ChiuEtal14}:
 \begin{equation}
+  \label{eq:chao-bc}
+  S_p = S_o + \frac{a_1 (a_1 - 1)}{2 (a_2 + 1)}\,.
+\end{equation}
+When $f_2=0$, eq.~\ref{eq:chao-bc} reduces to the bias-corrected form
+of eq.~\ref{eq:chao}, but quantitative estimators are based on
+abundances and do not use small-sample correction. This is not usually
+needed because sample sizes are total numbers of individuals, and
+these are usually high, unlike in frequency based models, where the
+sample size is the number of sites \citep{ChiuEtal14}. 
+
+A commonly used approximate variance estimator of eq.~\ref{eq:chao-bc} is:
+\begin{multline}
+  \label{eq:var-chao-bc}
+ s^2 = \frac{a_1(a_1-1)}{2} + \frac{a_1(2 a_1+1)^2}{(a_2+1)^2}\\
+  + \frac{a_1^2 a_2 (a_1 -1)^2}{4 (a_2 + 1)^4}
+\end{multline}
+However, \pkg{vegan} does not use this, but instead the following more
+exact form which was directly derived from eq.~\ref{eq:chao-bc}
+following \citet[web appendix]{ChiuEtal14}:
+\begin{multline}
+  s^2 = \frac{1}{4} \frac{1}{(a_2+1)^4 S_p} [a_1 (S_p a_1^3
+      a_2 + 4 S_p a_1^2 a_2^2 \\+  2 S_p a_1 a_2^3 + 6 S_p a_1^2 a_2 + 2 S_p
+      a_1 a_2^2 -2 S_p a_2^3 \\+ 4 S_p a_1^2 + S_p a_1 a_2 -5 S_p a_2^2 - a_1^3 - 2
+      a_1^2 a_2\\ - a_1 a_2^2 - 2 S_p a_1 - 4 S_p a_2 - S_p ) ]\,.
+\end{multline}
+The variance estimators only concern the number of unseen species like previously.
+
+The \textsc{ace} is estimator is defined as \citep{OHara05}:
+\begin{equation}
 \begin{split}
 S_p &= S_\mathrm{abund} + \frac{S_\mathrm{rare}}{C_\mathrm{ACE}} +
 \frac{a_1}{C_\mathrm{ACE}} \gamma^2\, , \quad \text{where}\\
@@ -677,7 +729,9 @@
 Here $S_\mathrm{abund}$ and $S_\mathrm{rare}$ are the numbers of
 species of abundant and rare species, with an arbitrary upper limit of
 10 individuals for a rare species, and $N_\mathrm{rare}$ is the total
-number of individuals in rare species.
+number of individuals in rare species. The variance estimator uses
+iterative solution, and it is best interpreted from the source code or
+following \citet{OHara05}.
 
 The pool size
 is estimated separately for each site, but if input is a data frame,

Modified: pkg/vegan/vignettes/vegan.bib
===================================================================
--- pkg/vegan/vignettes/vegan.bib	2014-12-08 08:47:23 UTC (rev 2917)
+++ pkg/vegan/vignettes/vegan.bib	2014-12-12 08:42:58 UTC (rev 2918)
@@ -1,4 +1,22 @@
 
+ at Article{OHara05,
+  author =	 {R. B. O'Hara},
+  title =	 {Species richness estimators: how many species can
+                  dance on the head of a pin},
+  journal =	 {Journal of Animal Ecology},
+  year =	 2005,
+  volume =	 74,
+  pages =	 {375--386}}
+
+ at Article{Good53,
+  author =	 {I. J. Good},
+  title =	 {The population frequencies of species and the
+                  estimation of population parameters},
+  journal =	 {Biometrika},
+  year =	 1953,
+  volume =	 40,
+  pages =	 {237--264}}
+
 @Article{ChiuEtal14,
   author =	 {C. H. Chiu and Y. T. Wang and B. A. Walther and
                   A. Chao},



More information about the Vegan-commits mailing list