[Vegan-commits] r243 - in pkg: inst/doc man

Sun Feb 24 09:05:57 CET 2008

Author: jarioksa
Date: 2008-02-24 09:05:57 +0100 (Sun, 24 Feb 2008)
New Revision: 243

Modified:
   pkg/inst/doc/diversity-vegan.Rnw
   pkg/man/taxondive.Rd
Log:
Added discussion on taxonomic diversity & proofreading

Modified: pkg/inst/doc/diversity-vegan.Rnw
===================================================================

--- pkg/inst/doc/diversity-vegan.Rnw	2008-02-22 07:38:38 UTC (rev 242)
+++ pkg/inst/doc/diversity-vegan.Rnw	2008-02-24 08:05:57 UTC (rev 243)
@@ -28,8 +28,8 @@
 
 \tableofcontents
 
-\noindent The \texttt{vegan} packages has two major components:
-multivariate analysis, mainly ordination, and methods for diversity
+\noindent The \texttt{vegan} package has two major components:
+multivariate analysis (mainly ordination), and methods for diversity
 analysis of ecological communities.  This document gives an
 introduction to the latter.  Ordination methods are covered in other
 documents.  Many of the diversity functions were written by Roeland
@@ -37,9 +37,9 @@
 
 Most diversity methods assume that data are counts of individuals.
 The methods are used with other data types, and some people argue that
-biomass or cover are more adequate units than counts of individuals of
-variable sizes.  However, this document only uses a data set with
-counts: stem counts of trees on 1ha plots in the Barro Colorado
+biomass or cover are more adequate than counts of individuals of
+variable sizes.  However, this document mainly uses a data set with
+counts: stem counts of trees on $1$ha plots in the Barro Colorado
 Island.  The following steps make these data available for the
 document:
 <<>>=
@@ -72,16 +72,19 @@
 <<>>=
 J <- H/log(specnumber(BCI))
 @
-where \texttt{specnumber} is a simple \texttt{vegan} function.
+where \texttt{specnumber} is a simple \texttt{vegan} function to find
+the numbers of species.
 
-\texttt{Vegan} also can estimate RÃ©nyi diversities of order $a$:
+\texttt{Vegan} also can estimate Rényi diversities of order $a$:
 \begin{equation}
 H_a = \frac{1}{1-a} \log \sum_{i=1}^S p_i^a
 \end{equation}
 or the corresponding Hill numbers $N_a = \exp(H_a)$.  Many common
 diversity indices are special cases of Hill numbers: $N_0 = S$, $N_1 =
-\exp(H')$, $N_2 = D_2$, and $N_\infty = 1/(\max p_i)$.  We select a
-random subset of five sites for RÃ©nyi diversities:
+\exp(H')$, $N_2 = D_2$, and $N_\infty = 1/(\max p_i)$. The
+corresponding Rényi diversities are $H_0 = \log(S)$, $H_1 = H'$, $H_2 =
+- \log(\sum p_i^2)$, and $H_\infty = - \log(\max p_i)$.  We select a
+random subset of six sites for Rényi diversities:
 <<>>=
 k <- sample(nrow(BCI), 6)
 R <- renyi(BCI[k,])
@@ -89,7 +92,7 @@
 We can really regard a site more diverse if all of its Rényi
 diversities are higher than in another site.  We can inspect this
 graphically using the standard \texttt{plot} function for the
-\texttt{renyi} result(Fig. \ref{fig:renyi}).
+\texttt{renyi} result (Fig. \ref{fig:renyi}).
 <<echo=false,results=hide>>=
 require(lattice, quietly=TRUE)
 @
@@ -97,7 +100,7 @@
 <<fig=true,echo=false>>=
 print(plot(R))
 @
-\caption{RÃ©nyi diversities in six randomly selected plots. The plot
+\caption{Rényi diversities in six randomly selected plots. The plot
   uses Trellis graphics with a separate panel for each site. The dots
   show the values for sites, and the lines the extremes and median in
   the data set.}
@@ -124,8 +127,8 @@
 \end{equation}
 where $x_i$ is the count of species $i$, and ${N \choose n}$ is the
 binomial coefficient, or the number of ways we can choose $n$ from
-$N$. $p_i$ give the probabilities that species $i$ does not occur in a
-sample of size $n$.  This is only defined for $N-x_i > n$, but for
+$N$, and $p_i$ gives the probability that species $i$ does not occur in a
+sample of size $n$.  This is defined only when $N-x_i > n$, but for
 other cases $p_i = 0$ or the species is sure to occur in the sample.
 The variance of rarefied richness is:
 \begin{equation}
@@ -151,8 +154,7 @@
 @
 Rarefaction curves often are seen as an objective solution for
 comparing species richness with different sample sizes.  However, rank
-orders typically differ among different rarefaction sample sizes, and
-rarefaction richness often shares the problems of RÃ©nyi diversities.
+orders typically differ among different rarefaction sample sizes.
 
 As an extreme case we may rarefy sample size to two individuals:
 <<>>=
@@ -163,16 +165,86 @@
 <<>>=
 all(rank(Srar) == rank(S2))
 @
-Moreover, the rarefied richness for two individuals only is a finite
+Moreover, the rarefied richness for two individuals is a finite
 sample variant of Simpson's diversity index (or, more precisely of
-$D_1 + 1$), and almost identical with sample sizes in BCI:
+$D_1 + 1$), and these two are almost identical in BCI:
 <<>>=
 range(diversity(BCI, "simp") - (S2 -1))
 @
-Rarefaction is sometimes presented as ecologically meaningful
+Rarefaction is sometimes presented as an ecologically meaningful
 alternative to dubious diversity indices, but the differences really
 seem to be small.
 
+\section{Taxonomic diversity}
+
+Simple diversity indices only consider species identity: all different
+species are equally different. In contrast, taxonomic diversity takes
+into account how different two species are. The index is much used in
+aquatic ecology, in particular for studying the effects of pollution
+or other degradation, which often is first evident in the loss of
+higher level taxonomic units.
+
+The two basic indices are called taxonomic diversity ($\Delta$) and
+taxonomic distinctness ($\Delta^*$):
+\begin{align}
+  \Delta &= \frac{\sum \sum_{i<j} \omega_{ij} x_i x_j}{n (n-1) / 2}\\
+\Delta^* &= \frac{\sum \sum_{i<j} \omega_{ij} x_i x_j}{\sum \sum_{i<j} x_i x_j}
+\end{align}
+These equations give the index values for a single site.  The summation
+goes over species $i$ and $j$, $\omega$ are the taxonomic
+distances among taxa, $x$ are species abundances, and $n$ is the total
+abundance for a site.  With presence/absence data, both indices
+reduce to the same index called $\Delta^+$, and for this it is
+possible to estimate the standard deviation. There are two indices
+derived from $\Delta^+$: it can be multiplied by species
+richness\footnote{This text normally uses upper case letter $S$ for
+  species richness, but lower case $s$ is used here in accordance with
+  the original papers on taxonomic diversity.}
+to give $s \Delta^+$, or it can be used to estimate an index of
+variation in taxonomic distinctness $\Lambda^+$:
+\begin{equation}
+  \Lambda^+ = \frac{\sum \sum_{i<j} \omega_{ij}^2}{n (n-1) / 2} - (\Delta^+)^2
+\end{equation}
+
+We still need the taxonomic differences among species ($\omega$) to
+calculate the indices of taxonomic diversity. These can be any
+distance structure among species, but usually they are found from an
+established hierarchic taxonomy. Typical coding is that the difference
+between species in the same genus is $1$, in the same family it is
+$2$, etc. However, the taxonomic differences are scaled to a maximum of
+$100$ for easier comparison between different data sets and
+taxonomies. Alternatively, it is possible to scale the steps between
+taxonomic levels proportional to the reduction in the number of
+categories: if almost all genera have only one species, it does not
+make a great difference if two individuals belong to a different
+species or to a different genus.
+
+Function \texttt{taxondive} implements indices of taxonomic diversity,
+and \texttt{taxa2dist} can be used to convert classification tables to
+taxonomic distances with either constant or variable step lengths
+between successive categories. There is no taxonomic table for the BCI
+data in \texttt{vegan}\footnote{Actually I made such a classification,
+  but taxonomic differences proved to be of little use in the Barro
+  Colorado data: they only singled out sites with Monocots (palm
+  trees) in the data.}
+but there is such a table for the Dune meadow data (Fig. \ref{fig:taxondive}):
+<<>>=
+data(dune)
+data(dune.taxon)
+taxdis <- taxa2dist(dune.taxon, varstep=TRUE)
+mod <- taxondive(dune, taxdis)
+@ 
+\begin{SCfigure}
+<<fig=true,echo=false>>=
+plot(mod)
+@
+\caption{Taxonomic diversity $\Delta^+$ for the dune meadow data. The
+  points are diversity values of single sites, and the funnel shows their
+  approximate confidence intervals ($2 \times$ standard error).}
+\label{fig:taxondive}
+\end{SCfigure} 
+
+
 \section{Species abundance models}
 
 Diversity indices may be regarded as variance measures of species
@@ -183,12 +255,13 @@
 
 \subsection{Fisher and Preston}
 
-In Fisher's log-series, the expected number of species with $n$
+In Fisher's log-series, the expected number of species $\hat f$ with $n$
 individuals is:
 \begin{equation}
 \hat f_n = \frac{\alpha x^n}{n}
 \end{equation}
-where $x$ is a nuisance parameter defined by $\alpha$ and total number
+where $\alpha$ is the diversity parameter, and $x$ is a nuisance
+parameter defined by $\alpha$ and total number 
 of individuals $N$ in the site, $x = N/(N-\alpha)$.  Fisher's
 log-series for a randomly selected plot is (Fig. \ref{fig:fisher}):
 <<>>=
@@ -204,16 +277,15 @@
   (\Sexpr{k}).}
 \label{fig:fisher}
 \end{SCfigure}
-We already saw this model as a diversity index.  Now we also obtained
+We already saw $\alpha$ as a diversity index.  Now we also obtained an
 estimate of standard error of $\alpha$ (these also are optionally
 available in \texttt{fisher.fit}).  The standard errors are based on
-the second derivatives (curvature) of the partial derivatives of
-log-likelihood at the solution of $\alpha$.  The distribution of
-$\alpha$ often is very non-normal and skewed, and standard errors are
-of not much use.  However, \texttt{fisherfit} has a \texttt{profile}
-method that can be used to inspect the validity of normal assumptions,
-and will be used in calculations of confidence intervals from profile
-deviance:
+the second derivatives (curvature) of log-likelihood at the solution
+of $\alpha$.  The distribution of $\alpha$ is often non-normal
+and skewed, and standard errors are not of much use.  However,
+\texttt{fisherfit} has a \texttt{profile} method that can be used to
+inspect the validity of normal assumptions, and will be used in
+calculations of confidence intervals from profile deviance:
 <<>>=
 confint(fish)
 @
@@ -232,7 +304,7 @@
 Function \texttt{prestondistr} directly
 maximizes truncated log-normal likelihood without binning data, and it
 is the recommended alternative.  Log-normal models  usually fit poorly
-to the BCI data, but here our random plot:
+to the BCI data, but here is the result for our random plot (number \Sexpr{k}):
 <<>>=
 prestondistr(BCI[k,])
 @
@@ -241,8 +313,9 @@
 
 An alternative approach to species abundance distribution is to plot
 logarithmic abundances in decreasing order, or against ranks of
-species.  These are known, among other names, as ranked abundance
-distribution curves, dominance--diversity curves and Whittaker plots.
+species.  These are known as ranked abundance
+distribution curves, species abundance curves, dominance--diversity
+curves or Whittaker plots. 
 Function \texttt{radfit} fits some of the most popular models using
 maximum likelihood estimation:
 \begin{align}
@@ -333,8 +406,7 @@
 properly handle sampling without replacement and underestimates the
 species accumulation curve.
 
-but the recommended is Kindt's exact
-method (Fig. \ref{fig:sac}):
+The recommended method is Kindt's exact method (Fig. \ref{fig:sac}):
 <<a>>=
 sac <- specaccum(BCI)
 plot(sac, ci.type="polygon", ci.col="yellow")
@@ -349,16 +421,15 @@
 
 \subsection{Number of unseen species}
 
-Species accumulation models indicate that not all potential species
-are seen in any sites.  These unseen species also belong to the
-species pool of the site.  Functions \texttt{specpool} and
-\texttt{estimateR} implement some methods of estimating the number of
-unseen species.  Function \texttt{specpool} studies a collection of
-sites, and assumes how many species may be unobserved.  Function
-\texttt{estimateR} works with counts of individuals, and also can be
-used with a single site.  Both functions assume that the number of
-unseen species is related to the number of rare species, or species
-seen only once or twice.
+Species accumulation models indicate that not all species were seen in
+any site.  These unseen species also belong to the species pool.
+Functions \texttt{specpool} and \texttt{estimateR} implement some
+methods of estimating the number of unseen species.  Function
+\texttt{specpool} studies a collection of sites, whereas
+\texttt{estimateR} works with counts of individuals, and can be used
+with a single site.  Both functions assume that the number of unseen
+species is related to the number of rare species, or species seen only
+once or twice.
 
 Function \texttt{specpool} implements the following models to estimate
 the pool size $S_p$:
@@ -465,8 +536,8 @@
 
 \subsection{Probability of pool membership}
 
-Beals smoothing was originally suggested as tool of regularizing data
-for ordination.  It regularizes data too strongly for that purpose,
+Beals smoothing was originally suggested as a tool for regularizing data
+for ordination.  It regularizes data too strongly,
 but it has been suggested as a method of estimating which of the
 missing species could occur in a site, or which sites are suitable for
 a species.  The probability for each species at each site is assessed

Modified: pkg/man/taxondive.Rd
===================================================================
--- pkg/man/taxondive.Rd	2008-02-22 07:38:38 UTC (rev 242)
+++ pkg/man/taxondive.Rd	2008-02-24 08:05:57 UTC (rev 243)
@@ -71,7 +71,7 @@
   taxonomic, but other species classifications can be used. 
 
   Function \code{taxa2dist} can produce a suitable \code{dist} object
-  form a classification table. Each species (or basic taxon) correspond
+  from a classification table. Each species (or basic taxon) corresponds
   to a row of the classification table, and columns give the
   classification at different levels. With \code{varstep = FALSE} the
   successive levels will be separated by equal steps, and with