[Analogue-commits] r257 - in pkg: . inst inst/doc vignettes

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Tue Apr 3 21:59:57 CEST 2012


Author: gsimpson
Date: 2012-04-03 21:59:57 +0200 (Tue, 03 Apr 2012)
New Revision: 257

Added:
   pkg/vignettes/
   pkg/vignettes/Z.cls
   pkg/vignettes/analogue_methods.Rnw
   pkg/vignettes/analogue_refs.bib
Removed:
   pkg/inst/doc/Z.cls
   pkg/inst/doc/analogue_methods.Rnw
   pkg/inst/doc/analogue_refs.bib
Modified:
   pkg/inst/ChangeLog
Log:
move vignettes from inst/doc to vignettes

Modified: pkg/inst/ChangeLog
===================================================================
--- pkg/inst/ChangeLog	2012-04-03 19:49:54 UTC (rev 256)
+++ pkg/inst/ChangeLog	2012-04-03 19:59:57 UTC (rev 257)
@@ -1,5 +1,14 @@
 analogue Change Log
 
+Version 0.8-2
+
+	* Dependencies: analogue now requires R >= 2.15.0
+
+	* Replaced remaining instances of `.Internal`; now use
+	`.colSums` and `.rowSums` from R 2.15.0
+
+	* Deleted jss.bst from inst/doc
+
 Version 0.8-1
 
 	* cma: if cutoff meant that all analogues returned for all
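
Note on the `.colSums`/`.rowSums` ChangeLog entry above: the following is a minimal sketch, not taken from the analogue sources, of how these R >= 2.15.0 functions relate to the familiar `colSums()`/`rowSums()`. The toy matrix `x` is an assumption made purely for illustration.

```r
# Toy matrix: 3 samples (rows) by 4 taxa (columns); values are illustrative only
x <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 9,
              10, 11, 12), nrow = 3, ncol = 4)

# .colSums()/.rowSums() are the bare-bones versions: the dimensions are
# passed explicitly (m = number of rows, n = number of columns) and the
# result is not named
cs <- .colSums(x, m = nrow(x), n = ncol(x))
rs <- .rowSums(x, m = nrow(x), n = ncol(x))

# The documented wrappers give the same totals on a plain numeric matrix
cs_ref <- colSums(x)
rs_ref <- rowSums(x)
```

Because the dotted versions skip argument checking and dimnames handling, they are intended for use inside package code where the input is already known to be a numeric matrix.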

Deleted: pkg/inst/doc/Z.cls
===================================================================
--- pkg/inst/doc/Z.cls	2012-04-03 19:49:54 UTC (rev 256)
+++ pkg/inst/doc/Z.cls	2012-04-03 19:59:57 UTC (rev 257)
@@ -1,247 +0,0 @@
-\def\fileversion{1.2}
-\def\filename{Z}
-\def\filedate{2007/02/12}
-%%
-%% Package `Z' to use with LaTeX2e for Z reports
-%% Copyright (C) 2004 Achim Zeileis
-%%
-\NeedsTeXFormat{LaTeX2e}
-\ProvidesClass{Z}[\filedate\space\fileversion\space Z class by Achim Zeileis]
-
-%% options
-\LoadClass[10pt,a4paper,twoside]{article}
-\newif\if@notitle
-\@notitlefalse
-\newif\if@noheadings
-\@noheadingsfalse
-\newif\if@shortnames
-\@shortnamesfalse
-\DeclareOption{notitle}{\@notitletrue}
-\DeclareOption{noheadings}{\@noheadingstrue}
-\DeclareOption{shortnames}{\@shortnamestrue}
-\ProcessOptions
-
-%% required packages
-\RequirePackage{graphicx,a4wide,color,hyperref,ae,fancyvrb,thumbpdf}
-\RequirePackage[T1]{fontenc}
-%% bibliography
-\if@shortnames
-  \usepackage[authoryear,round]{natbib}
-\else
-  \usepackage[authoryear,round,longnamesfirst]{natbib}
-\fi
-\bibpunct{(}{)}{;}{a}{}{,}
-\bibliographystyle{jss}
-
-%% paragraphs
-\setlength{\parskip}{0.7ex plus0.1ex minus0.1ex}
-\setlength{\parindent}{0em}
-
-%% for all publications
-\newcommand{\Plaintitle}[1]{\def\@Plaintitle{#1}}
-\newcommand{\Shorttitle}[1]{\def\@Shorttitle{#1}}
-\newcommand{\Plainauthor}[1]{\def\@Plainauthor{#1}}
-\newcommand{\Keywords}[1]{\def\@Keywords{#1}}
-\newcommand{\Plainkeywords}[1]{\def\@Plainkeywords{#1}}
-\newcommand{\Abstract}[1]{\def\@Abstract{#1}}
-
-%% defaults
-\author{Firstname Lastname\\Affiliation}
-\title{Title}
-\Abstract{---!!!---an abstract is required---!!!---}
-\Plainauthor{\@author}
-\Plaintitle{\@title}
-\Shorttitle{\@title}
-\Keywords{---!!!---at least one keyword is required---!!!---}
-\Plainkeywords{\@Keywords}
-
-%% Sweave(-like)
-\DefineVerbatimEnvironment{Sinput}{Verbatim}{fontshape=sl}
-\DefineVerbatimEnvironment{Soutput}{Verbatim}{}
-\DefineVerbatimEnvironment{Scode}{Verbatim}{fontshape=sl}
-\newenvironment{Schunk}{}{}
-\setkeys{Gin}{width=0.8\textwidth}
-
-%% new \maketitle
-\def\maketitle{
- \begingroup
-   \def\thefootnote{\fnsymbol{footnote}}
-   \def\@makefnmark{\hbox to 0pt{$^{\@thefnmark}$\hss}}
-   \long\def\@makefntext##1{\parindent 1em\noindent
-			    \hbox to1.8em{\hss $\m@th ^{\@thefnmark}$}##1}
-   \@maketitle \@thanks
- \endgroup
- \setcounter{footnote}{0}
-
- \if@noheadings
-   %% \thispagestyle{empty}
-   %% \markboth{\centerline{\@Shorttitle}}{\centerline{\@Plainauthor}}
-   %% \pagestyle{myheadings}
- \else
-   \thispagestyle{empty}
-   \markboth{\centerline{\@Shorttitle}}{\centerline{\@Plainauthor}}
-   \pagestyle{myheadings}
- \fi
-
- \let\maketitle\relax \let\@maketitle\relax
- \gdef\@thanks{}\gdef\@author{}\gdef\@title{}\let\thanks\relax
-}
-
-% Author information can be set in various styles:
-% For several authors from the same institution:
-% \author{Author 1 \and ... \and Author n \\
-%     Address line \\ ... \\ Address line}
-% if the names do not fit well on one line use
-%         Author 1 \\ {\bf Author 2} \\ ... \\ {\bf Author n} \\
-% For authors from different institutions:
-% \author{Author 1 \\ Address line \\  ... \\ Address line
-%     \And  ... \And
-%     Author n \\ Address line \\ ... \\ Address line}
-% To start a separate ``row'' of authors use \AND, as in
-% \author{Author 1 \\ Address line \\  ... \\ Address line
-%     \AND
-%     Author 2 \\ Address line \\ ... \\ Address line \And
-%     Author 3 \\ Address line \\ ... \\ Address line}
-
-\def\@maketitle{\vbox{\hsize\textwidth \linewidth\hsize
- {\centering
- {\LARGE\bf \@title\par}
- \vskip 0.2in plus 1fil minus 0.1in
- {
-     \def\and{\unskip\enspace{\rm and}\enspace}%
-     \def\And{\end{tabular}\hss \egroup \hskip 1in plus 2fil
- 	      \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\large\bf\rule{\z@}{24pt}\ignorespaces}%
-     \def\AND{\end{tabular}\hss\egroup \hfil\hfil\egroup
- 	      \vskip 0.1in plus 1fil minus 0.05in
- 	      \hbox to \linewidth\bgroup\rule{\z@}{10pt} \hfil\hfil
- 	      \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\large\bf\rule{\z@}{24pt}\ignorespaces}
-     \hbox to \linewidth\bgroup\rule{\z@}{10pt} \hfil\hfil
-     \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\large\bf\rule{\z@}{24pt}\@author
-     \end{tabular}\hss\egroup
- \hfil\hfil\egroup}
- \vskip 0.3in minus 0.1in
- \hrule
- \begin{abstract}
- \@Abstract
- \end{abstract}}
- \textit{Keywords}:~\@Keywords.
- \vskip 0.1in minus 0.05in
- \hrule
- \vskip 0.2in minus 0.1in
-}}
-
-%% \def\@maketitle{\vbox{\hsize\textwidth \linewidth\hsize 
-%%  {\centering
-%%  {\LARGE\bf \@title\par}
-%%    \def\And{\end{tabular}\hfil\linebreak[0]\hfil
-%% 	    \begin{tabular}[t]{c}\large\bf\rule{\z@}{24pt}\ignorespaces}%
-%%     \begin{tabular}[t]{c}\large\bf\rule{\z@}{24pt}\@author\end{tabular}%
-%%  \vskip 0.3in minus 0.1in
-%%  \hrule
-%%  \begin{abstract}
-%%  \@Abstract
-%%  \end{abstract}}
-%%  \textit{Keywords}:~\@Keywords.
-%%  \vskip 0.1in minus 0.05in
-%%  \hrule
-%%  \vskip 0.2in minus 0.1in
-%% }}
-
-
-%% sections, subsections, and subsubsections
-\newlength{\preXLskip}
-\newlength{\preLskip}
-\newlength{\preMskip}
-\newlength{\preSskip}
-\newlength{\postMskip}
-\newlength{\postSskip}
-\setlength{\preXLskip}{1.8\baselineskip plus 0.5ex minus 0ex}
-\setlength{\preLskip}{1.5\baselineskip plus 0.3ex minus 0ex}
-\setlength{\preMskip}{1\baselineskip plus 0.2ex minus 0ex}
-\setlength{\preSskip}{.8\baselineskip plus 0.2ex minus 0ex}
-\setlength{\postMskip}{.5\baselineskip plus 0ex minus 0.1ex}
-\setlength{\postSskip}{.3\baselineskip plus 0ex minus 0.1ex}
-
-\newcommand{\jsssec}[2][default]{\vskip \preXLskip%
-  \pdfbookmark[1]{#1}{Section.\thesection.#1}%
-  \refstepcounter{section}%
-  \centerline{\textbf{\Large \thesection. #2}} \nopagebreak
-  \vskip \postMskip \nopagebreak}
-\newcommand{\jsssecnn}[1]{\vskip \preXLskip%
-  \centerline{\textbf{\Large #1}} \nopagebreak
-  \vskip \postMskip \nopagebreak}
-
-\newcommand{\jsssubsec}[2][default]{\vskip \preMskip%
-  \pdfbookmark[2]{#1}{Subsection.\thesubsection.#1}%
-  \refstepcounter{subsection}%
-  \textbf{\large \thesubsection. #2} \nopagebreak
-  \vskip \postSskip \nopagebreak}
-\newcommand{\jsssubsecnn}[1]{\vskip \preMskip%
-  \textbf{\large #1} \nopagebreak
-  \vskip \postSskip \nopagebreak}
-
-\newcommand{\jsssubsubsec}[2][default]{\vskip \preSskip%
-  \pdfbookmark[3]{#1}{Subsubsection.\thesubsubsection.#1}%
-  \refstepcounter{subsubsection}%
-  {\large \textit{#2}} \nopagebreak
-  \vskip \postSskip \nopagebreak}
-\newcommand{\jsssubsubsecnn}[1]{\vskip \preSskip%
-  {\textit{\large #1}} \nopagebreak
-  \vskip \postSskip \nopagebreak}
-
-\newcommand{\jsssimplesec}[2][default]{\vskip \preLskip%
-%%  \pdfbookmark[1]{#1}{Section.\thesection.#1}%
-  \refstepcounter{section}%
-  \textbf{\large #1} \nopagebreak
-  \vskip \postSskip \nopagebreak}
-\newcommand{\jsssimplesecnn}[1]{\vskip \preLskip%
-  \textbf{\large #1} \nopagebreak
-  \vskip \postSskip \nopagebreak}
-
-\renewcommand{\section}{\secdef \jsssec \jsssecnn}
-\renewcommand{\subsection}{\secdef \jsssubsec \jsssubsecnn}
-\renewcommand{\subsubsection}{\secdef \jsssubsubsec \jsssubsubsecnn}
-
-%% colors
-\definecolor{Red}{rgb}{0.5,0,0} %%{0.7,0,0}
-\definecolor{Blue}{rgb}{0,0,0.5} %%{0,0,0.8}
-\hypersetup{%
-  hyperindex = {true},
-  colorlinks = {true},
-  linktocpage = {true},
-  plainpages = {false},
-  linkcolor = {Blue},
-  citecolor = {Blue},
-  urlcolor = {Red},
-  pdfstartview = {Fit},
-  pdfpagemode = {UseOutlines},
-  pdfview = {XYZ null null null}
-}
-
-\AtBeginDocument{
-  \hypersetup{%
-    pdfauthor = {\@Plainauthor},
-    pdftitle = {\@Plaintitle},
-    pdfkeywords = {\@Plainkeywords}
-  }
-}
-\if@notitle
-  %% \AtBeginDocument{\maketitle}
-\else
-  \AtBeginDocument{\maketitle}
-\fi
-
-%% commands
-\makeatletter
-\newcommand\code{\bgroup\@makeother\_\@makeother\~\@makeother\$\@codex}
-\def\@codex#1{{\normalfont\ttfamily\hyphenchar\font=-1 #1}\egroup}
-\makeatother
-%%\let\code=\texttt
-\let\proglang=\textsf
-\newcommand{\pkg}[1]{{\normalfont\fontseries{b}\selectfont #1}}
-\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}}
-\newcommand{\doi}[1]{\href{http://dx.doi.org/#1}{\normalfont\texttt{doi:#1}}}
-\newcommand{\E}{\mathsf{E}}
-\newcommand{\VAR}{\mathsf{VAR}}
-\newcommand{\COV}{\mathsf{COV}}
-\newcommand{\Prob}{\mathsf{P}}

Deleted: pkg/inst/doc/analogue_methods.Rnw
===================================================================
--- pkg/inst/doc/analogue_methods.Rnw	2012-04-03 19:49:54 UTC (rev 256)
+++ pkg/inst/doc/analogue_methods.Rnw	2012-04-03 19:59:57 UTC (rev 257)
@@ -1,655 +0,0 @@
-\documentclass[article,shortnames]{Z}
-\usepackage{thumbpdf}
-
-%\VignetteIndexEntry{Analogue Methods in Palaeoecology}
-%\VignettePackage{analogue}
-%\VignetteDepends{vegan}
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-%% almost as usual
-\author{Gavin L. Simpson\\Environmental Change Research Centre --- UCL}
-\title{Analogue Methods in Palaeoecology:\\ Using the \pkg{analogue} Package}
-
-%% for pretty printing and a nice hypersummary also set:
-\Plainauthor{Gavin L. Simpson} %% comma-separated
-\Plaintitle{Analogue Methods in Palaeoecology: Using the analogue Package} %% without formatting
-\Shorttitle{Analogue Methods in Palaeoecology} %% a short title (if necessary)
-
-%% an abstract and keywords
-\Abstract{
-  Palaeoecology is an important branch of ecology that uses the subfossil remains of organisms preserved in lake, ocean and bog sediments to inform on changes in ecosystems and the environment through time. The \pkg{analogue} package contains functions to perform modern analogue technique (MAT) transfer functions, which can be used to predict past changes in the environment, such as climate or lake-water pH from species data. A related technique is that of analogue matching, which is concerned with identifying modern sites that are floristically and faunistically similar to fossil samples. These techniques, and others, are increasingly being used to inform public policy on environmental pollution and conservation practices. These methods and other functionality in \pkg{analogue} are illustrated using the Surface Waters Acidification Project diatom:pH training set and diatom counts on samples of a sediment core from the Round Loch of Glenhead, Galloway, Scotland. The paper is aimed at palaeoecologists who are familiar with the techniques described but not with \proglang{R}.
-}
-\Keywords{analogue matching, palaeoecology, modern analogue technique, dissimilarity, \proglang{R}}
-\Plainkeywords{analogue matching, palaeoecology, modern analogue technique, dissimilarity, R} %% without formatting
-%% at least one keyword must be supplied
-
-%% publication information
-%% NOTE: This needs to be filled out ONLY IF THE PAPER WAS ACCEPTED.
-%% If it was not (yet) accepted, leave them commented.
-%% \Volume{13}
-%% \Issue{9}
-%% \Month{September}
-%% \Year{2004}
-%% \Submitdate{2004-09-29}
-%% \Acceptdate{2004-09-29}
-
-%% The address of (at least) one author should be given
-%% in the following format:
-%\Address{
-%  Gavin L. Simpson\\
-%  Environmental Change Research Centre\\
-%  UCL Department of Geography\\
-%  Pearson Building\\
-%  Gower Street\\
-%  London, UK, WC1E 6BT\\
-%  E-mail: \email{gavin.simpson at ucl.ac.uk}\\
-%  URL: \url{http://www.homepages.ucl.ac.uk/~ucfagls/}
-%}
-%% It is also possible to add a telephone and fax number
-%% before the e-mail in the following format:
-%% Telephone: +43/1/31336-5053
-%% Fax: +43/1/31336-734
-
-%% for those who use Sweave please include the following line (with % symbols):
-%% need no \usepackage{Sweave.sty}
-
-%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-\begin{document}
-
-%% include your article here, just as usual
-%% Note that you should use the \pkg{}, \proglang{} and \code{} commands.
-<<preliminary,results=hide,echo=false>>=
-options("prompt" = "R> ", "continue" = "+ ")
-@
-\section{Introduction}
-%% Note: If there is markup in \(sub)section, then it has to be escaped as above.
-Palaeoecology is a small but increasingly important branch of ecology. Sub-fossil remains of a range of organisms are well preserved in a number of media, primarily lake and ocean sediments and peat bogs. Analysis of these remains can show how individual organisms through to whole ecosystems develop and evolve, and how they respond to external environmental pressures, such as climate change and anthropogenic pollution. In recent decades palaeoecology has progressed from a primarily descriptive science to one which today involves a wide range of quantitative analysis. This development has been required as palaeoecology has begun to be used to answer questions in areas relating to public policy on pollution impacts and in conservation biology.
-
-Two important quantitative applications of palaeoecology are palaeoenvironmental reconstructions and approaches to define reference conditions and restoration success.
-
-Quantitative palaeoecology has played a key role in identifying the problem and the causes of major environmental issues that have been at the centre of much public concern over the past 20 years or so, such as acid rain and surface water acidification, eutrophication and anthropogenic climate change. In each of these cases, the onset of change or pollution occurred long before environmental monitoring programs were around to detect any change. A key issue, therefore, is to be able to reconstruct past changes in the environment (e.g.~lake water pH or nutrient concentrations, air temperatures, and sea surface temperature and salinity) from the remains of organisms preserved in sediments, so that the extent and timing of the change can be determined. These may in turn suggest particular causative mechanisms.
-
-Acknowledging that many aquatic environments are today degraded as a result of anthropogenic activities, major new pieces of legislation have been enacted in Europe (the European Council Water Framework Directive, WFD; \citealt{euwfd}) and the USA (Clean Water Act; \citealt{uscleanwater}), which at their heart contain the concept of change over a baseline state, the reference condition. In Europe for example, the WFD requires member states to restore all degraded fresh waters to at least good status by 2015. Good status is defined as very minor change compared to the reference condition. In many cases we simply do not know what the appropriate reference state should be as there are invariably few, if any, reliable records that predate the onset of change.
-
-Palaeoecology can also play a role here; palaeoenvironmental reconstructions can inform us as to the likely hydrochemical conditions in the past for certain key parameters, and the remains of various species groups preserved in lake sediments can tell us about the flora and fauna living in a lake prior to change. However, because only certain species groups preserve well in lake sediments, direct palaeoecological analysis of lake sediments can provide only part of the answer. Analogue matching can then be used to identify lakes that are today most similar to the reference conditions of the target lake, and the missing species information filled in from surveys of those species living in the identified sites \citep{simpson_envpol2005}.
-
-\subsection{Calibration}
-Palaeoenvironmental reconstruction is a multivariate calibration problem. Calibration methods (known as \emph{transfer functions} in the palaeoecological literature) can be classified into two main types; \emph{classical} and \emph{inverse} methods. In general, the species assemblages, $\mathbf{Y}$, in a training set are assumed to be some function $f$ of the environment at those sites, $\mathbf{X}$, plus an error term. This is commonly written as
-\begin{equation}
- \mathbf{Y} = f(\mathbf{X}) + \epsilon
-\end{equation}
-where $\mathbf{Y}$ is an $n \times m$ matrix of counts on $m$ species and $\mathbf{X}$ is an $n \times p$ matrix of $p$ environmental variables for $n$ samples or sites.
-
-In the classical approach to calibration, $f$ is estimated from a set of training data via regression of $\mathbf{Y}$ on $\mathbf{X}$. Given a sample of fossil species data, $y_0$, $f$ is inverted to yield an estimate of the environment, $x_0$, that gave rise to the fossil assemblage. In all but the simplest cases, however, the inverse of $f$ does not exist and must be estimated from the data, for example via numerical optimisation techniques.
-
-The inverse approach avoids the problem of inverting $f$ by directly estimating the inverse of $f$, denoted $g$, from the data by regressing $\mathbf{X}$ on $\mathbf{Y}$
-\begin{equation}
- \mathbf{X} = g(\mathbf{Y}) + \epsilon.
-\end{equation}
-Note that we do not believe that the species ($\mathbf{Y}$) influence their environment ($\mathbf{X}$).
-
-Inverse approaches are known to perform slightly better in situations where the fossil samples are from the central part of the distribution of the training set, whereas classical approaches perform slightly better at the extremes of the training set and with a small amount of extrapolation \citep{1215}. The modern analogue technique, described below, is an inverse multivariate calibration approach.
-
-\subsection{The modern analogue technique (MAT)}
-The quantitative analysis of stratigraphic records from sediment archives is predicated on the concept of Uniformitarianism \citep{rymer1978}, which is summarised by the phrase \emph{the present is the key to the past}. Through knowledge of the present-day ecology of species, inferences about past environmental conditions can be made via analogy to that same set of conditions existing where those species are found living today. This is known as space-for-time substitution, or more commonly as the modern analogue technique (MAT). In MAT, the environment of samples from a modern set of lakes that are most similar in terms of their species composition to a fossil sample can be used as a direct prediction of the environment that existed at the time the fossil sample was deposited \citep{1482}. MAT is a \emph{k}-nearest neighbours (\emph{k}-NN) method.
-
-Defining how similar two samples are to one another is a critical consideration in MAT. Dissimilarity or distance coefficients are used, which measure the floristic or faunistic similarity between a fossil sample and each modern training set sample. One recommended dissimilarity coefficient for use with compositional data is the chord distance as it has good signal to noise properties \citep{904,1453}.
-
-The chord distance between samples $j$ and $k$, $d_{jk}$, is
-\begin{equation}
- d_{jk} = \sqrt{\sum\limits_{i=1}^m\left(x_{ij}^{0.5}-x_{ik}^{0.5}\right)^2}
-\end{equation}
-where $x_{ij}$ is the proportion of taxon $i$ in sample $j$. For the chord distance, values for $d_{jk}$ range from 0 to $\sqrt{2}$. Another commonly used measure is the $\chi^2$ distance \citep{957, 122}. Often the squared forms of these coefficients have been used for no other reason than computational efficiency.
-
-Despite having some optimal properties for percentage compositional data, \citet{1527} have criticised the chord distance as a weak measure of compositional dissimilarity.
-
-A wide range of dissimilarity coefficients have been proposed, several of which have been implemented in the function \code{distance} (see Section \ref{dissims}), including several of the coefficients recommended by \citet{1527} as good measures of compositional dissimilarity.
-
-\subsection{Analogue matching}
-Analogue matching \citep{904, 361} is a palaeoecological technique used to identify the \emph{k}-closest sites from a modern set of lakes that are biologically most similar to the impacted lake prior to the onset of change. The \emph{k}-closest sites are selected on the basis of how similar they are to the target sample in those organisms that are preserved in lake sediments, and are known as modern analogues. The pre-impact or reference condition flora and fauna for the target lake from groups that do not preserve in lake sediments can then be inferred on the basis of the species found living in the modern analogues today \citep{simpson_envpol2005}.
-
-\subsection{Outline of the paper}
-Section \ref{using_analogue} contains a worked example providing an overview of the \pkg{analogue} package for \proglang{R} \citep{R}. In Section \ref{choose_k} we look at alternative ways of selecting the number of analogues, $k$, to retain in a MAT model. Section \ref{other_features} describes the wider functionality contained within \pkg{analogue}, including the dissimilarity coefficients available, an overview of the plotting functions provided, and how to produce sample specific error estimates for fossil samples and use an independent test set in MAT transfer functions. The paper concludes with a short description of future plans for the package (Section \ref{future_plans}).
-
-\section[Using analogue]{Using \pkg{analogue}}\label{using_analogue}
-This section contains a worked example of how to use the \pkg{analogue} package to fit MAT transfer function models and to perform analogue matching. The \pkg{analogue} package first has to be loaded before it can be used:
-
-<<>>=
-library("analogue")
-@
-
-The version of \pkg{analogue} installed is printed if the package has been successfully loaded.
-
-To illustrate \pkg{analogue}, the Surface Waters Acidification Project (SWAP) diatom:pH training set is used \citep{swapredbook}, along with diatom counts from a sediment core taken from the Round Loch of Glenhead, Galloway, Scotland \citep{604}. The data sets also need to be loaded before they can be used:
-\label{join}
-<<>>=
-data(swapdiat, swappH, rlgh, package = "analogue")
-@
-
-The \code{swapdiat} data set contains diatom\footnote{Diatoms are unicellular algae that possess a frustule (cell wall) composed of a form of silica. Diatoms live wherever there is water and light. Diatom frustules are highly resistant and as such preserve well in lake sediments. Individual diatom species are identified by different ornamentation of the frustule.} counts on \Sexpr{ncol(swapdiat)} species from \Sexpr{nrow(swapdiat)} lakes. Matching measurements of lake water pH (acidity) are available for each lake in \code{swappH}. These pH measurements are the average of four quarterly samples.
-
-The sediment core from the Round Loch of Glenhead (RLGH from now on) contains diatom counts on \Sexpr{ncol(rlgh)} species from \Sexpr{nrow(rlgh)} levels.
-
-In both datasets the diatom counts are expressed as percentage abundances.
-
-\subsection{MAT transfer functions}
-MAT transfer functions are built using the generic function \code{mat}. The default method for \code{mat} takes three arguments; \code{x} --- a data frame of diatom counts for the training set, \code{y} --- a numeric vector of observations of the environmental variable of interest, and \code{method} --- the dissimilarity coefficient to use.
-
-The data frame of diatom counts (\code{x}), must have the same columns (species) as the data frame of counts for the sediment core for which MAT reconstructions are required. To ensure that both data frames have the same set of columns, the \code{join} function is used to merge the two data sets.
-
-<<>>=
-dat <- join(swapdiat, rlgh, verbose = TRUE)
-@
-
-The \code{verbose = TRUE} argument instructs the function to print out summaries of the merged data sets. \code{dat} is a list containing two data frames. These are the original datasets but now with a common set of columns (species). The defaults for \code{join} also replace the missing values created when merging the two data sets with zeros. This behaviour can be controlled through the \code{na.replace} argument.
-
-An alternative to merging the two data sets would be to select only the intersect of the data sets, i.e.~select only those columns in common between the two datasets. This is a non-standard approach however, and is not consistent with implementations in other software packages. One potential problem with the merging approach employed by \code{join} is the additional zero values added to one or both of the training set or fossil samples, which may exacerbate the double-zero problem or have an unduly large effect on the values of the chosen dissimilarity coefficient. As such, care must be taken when forming training sets and fossil samples, as well as in the choice of dissimilarity coefficient.
-
-By convention, dissimilarity coefficients are defined for proportional data. As the data used in this example are percentages we need to convert them to proportions. We extract each of the merged data sets (the components of \code{dat}) back into the training set and the fossil set, converting the data into proportions as we do so.
-
-<<>>=
-swapdiat <- dat$swapdiat / 100
-rlgh <- dat$rlgh / 100
-@
-
-The data are now ready for analysis. We will fit a MAT model to the SWAP training set using the squared chord distance (SCD) coefficient:
-
-<<>>=
-swap.mat <- mat(swapdiat, swappH, method = "SQchord")
-@
-
-An overview of the fitted model is produced by printing the stored object:
-
-<<>>=
-swap.mat
-@
-
-The percentiles of the distribution of SCD values for the training set are displayed, along with model performance statistics for the training data of inferences for pH based on the mean and weighted mean of the \emph{k} closest analogues. The weights used are the inverse of the dissimilarity, $1 / d_{jk}$, for each of the $k$-closest analogues. It should be noted that this may give overly large weights to nearly identical analogues, which may be of concern in species poor oceanic data sets, but not generally in species rich limnological training sets. By default only statistics for $k = 1,\ldots,10$ closest analogues are shown. The RMSEP values shown are leave-one-out errors; the prediction for each sample in the training set is based on $k$-closest analogues excluding that sample. These values are not strongly biased, unlike the apparent (RMSE) errors from other methods such as the weighted averaging-based techniques. There is not much to choose between models that use the mean or weighted mean. For the rest of this example, we restrict ourselves to non-weighted versions of the models.
-
-A more detailed summary of the results may be displayed using the \code{summary} method:
-
-<<results=hide>>=
-summary(swap.mat)
-@
-
-\setkeys{Gin}{width=0.7\textwidth}
-\begin{figure}
-\centering
-<<plot_mat, fig=true, echo=false>>=
-opar <- par(mfrow = c(2,2))
-plot(swap.mat)
-par(opar)
-@
-\caption{\label{plot_mat}Summary diagram of the results of a MAT model applied to predict lake water pH from the SWAP diatom data set --- see text for details.}
-\end{figure}
-\setkeys{Gin}{width=0.8\textwidth}
-
-Before using this model to reconstruct pH for the RLGH core, the number of analogues, $k$, to use in the reconstructions must be determined. A simple way of choosing $k$ is to select $k$ from the model with lowest RMSEP. In the printed results shown above, the model with the lowest RMSEP was a model with $k = 10$ closest analogues for both the mean and weighted mean indices. We should check this number however, as the displayed lists were restricted to show only the $k = 1,\ldots,10$ closest analogues. Whenever $k$ is not specified, the functions in \pkg{analogue} automatically choose the model with lowest RMSEP. The simplest way to check this is to use the \code{getK} extractor function:
-
-<<>>=
-getK(swap.mat)
-@
-
-This shows that the model with 10 closest analogues has the lowest RMSEP, and that this value was chosen automatically and not set by the user.
-
-\code{mat} has a \code{plot} method, which provides a \code{plot.lm}-like function to graphically summarise the fitted model. By default 4 different plots of the model are produced, so we split the plotting region in four before plotting and subsequently restore the original settings:
-
-<<fig=false>>=
-<<plot_mat>>
-@
-
-The resulting plot is displayed in Figure \ref{plot_mat}. The upper left panel of Figure \ref{plot_mat} shows a plot of the observed versus fitted values, whilst the upper right panel shows a plot of the observed values versus model residuals. The dashed blue line in the residuals plot shows the average bias in the model. In both plots, the solid red line is a LOWESS smoother (span = 2/3).
-
-The labels for the y-axes of both plots show the value of $k$ selected automatically by \code{mat} --- in this case $k = 10$ analogues. We can confirm this value by looking at the plot of the leave-one-out errors (RMSEP) in the lower left panel of Figure \ref{plot_mat}. This is a screeplot of the RMSEP values for models with various values of $k$ (by default this is restricted to be $\leq 20$ to avoid clutter). We can see that a model with 10 analogues has lowest RMSEP although there is not a lot of difference in the RMSEP of models with between 6 and 11 analogues. The lower right panel of Figure \ref{plot_mat} shows a screeplot, similar to the plot of leave-one-out errors, but which displays the maximum bias in models of various sizes.
-
-This choice of $k$ is generally not strongly biased despite being determined \textit{post hoc} from the training data. However, \citet{1461} demonstrate a worst case where this $k$ is badly biased. The use of an independent optimisation set, alongside the usual training and test sets, is recommended to avoid this bias \citep{1461}. Section \ref{test_set} shows how to use independent test or optimisation sets with \pkg{analogue}.
-
-This model can now be used to reconstruct past pH values for the RLGH core. The \code{predict} method of \code{mat} can be used for reconstructions:
-
-<<results=hide>>=
-rlgh.mat <- predict(swap.mat, rlgh, k = 10)
-rlgh.mat
-@
-
-The \code{reconPlot} method can be used to plot the reconstructed values as a time series-like plot --- the resulting plot is shown in Figure \ref{plot_recon}:
-
-<<plot_recon, fig=false>>=
-reconPlot(rlgh.mat, use.labels = TRUE, ylab = "pH", xlab = "Depth (cm.)")
-@
-
-\begin{figure}
-\centering
-<<fig=true, echo=false, width = 6, height = 4>>=
-<<plot_recon>>
-@
-\caption{\label{plot_recon}Time series plot of the pH reconstruction for the RLGH core. Depth is a surrogate for time, with 0 being the most recent period represented by the core.}
-\end{figure}
-
-The argument \code{use.labels = TRUE} instructs the function to take the names component of the predicted values as the values for the x-axis. Here depth is a surrogate for time.
-
-If we are interested in how reliable our reconstructed values are, a useful descriptor is the minimum dissimilarity between a core sample and the training set samples (minDC). If there are no close modern analogues in the training set for certain fossil samples, we will have less faith in the MAT reconstructions for those fossil samples than for samples that do have close modern analogues. The \code{minDC} function can be used to extract the minimum dissimilarity for each fossil sample:
-
-<<>>=
-rlgh.mdc <- minDC(rlgh.mat)
-@
-
-Printing the resulting object (\code{rlgh.mdc}) doesn't yield very much information. It is easier to display the minDC values in a plot similar to the one produced by \code{reconPlot} above:
-
-<<plot_minDC, fig=false>>=
-plot(rlgh.mdc, use.labels = TRUE, xlab = "Depth (cm.)")
-@
-
-\begin{figure}
-\centering
-<<fig=true, echo=false, width = 6, height = 4>>=
-<<plot_minDC>>
-@
-\caption{\label{plot_minDC}Time series plot of the minimum dissimilarity between each core (fossil) sample and the SWAP training set samples. The dotted, horizontal lines are drawn at various percentiles of the distribution of the pair-wise dissimilarities for the training set samples.}
-\end{figure}
-
-The resulting plot is shown in Figure \ref{plot_minDC}. The dotted horizontal lines are the probability quantiles of the distribution of dissimilarity values for the training samples. A useful rule of thumb is that a fossil sample has no close modern analogues where the minDC for the sample is greater than the 5th percentile of the distribution of dissimilarity values for the training samples. As Figure \ref{plot_minDC} shows, there are several periods of the RLGH core that have no close modern analogues.
-
-\subsection{Analogue matching}
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/analogue -r 257
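
For readers of the vignette text in the diff above, the chord distance and the k-nearest-neighbour (MAT) prediction it describes can be sketched in base R. This is an illustrative sketch only and is not the implementation behind `mat()` or `distance()` in analogue; the helper `chord()`, the training matrix `train`, the `pH` values and the `fossil` sample are all hypothetical, made-up inputs.

```r
# Chord distance between two samples given as vectors of taxon proportions
# (hypothetical helper, not part of the analogue package)
chord <- function(xj, xk) sqrt(sum((sqrt(xj) - sqrt(xk))^2))

# Made-up training set: 4 modern samples (rows) by 3 taxa, as proportions
train <- rbind(s1 = c(0.7, 0.2, 0.1),
               s2 = c(0.6, 0.3, 0.1),
               s3 = c(0.1, 0.2, 0.7),
               s4 = c(0.0, 0.0, 1.0))
pH <- c(s1 = 5.0, s2 = 5.2, s3 = 6.5, s4 = 6.9)  # made-up observed pH

# Made-up fossil sample over the same three taxa
fossil <- c(0.65, 0.25, 0.10)

# Dissimilarity between the fossil sample and every training sample
d <- apply(train, 1, chord, xk = fossil)

# Keep the k closest modern analogues
k <- 2
nearest <- order(d)[seq_len(k)]

# Predictions: plain mean and inverse-dissimilarity weighted mean of their pH
pred_mean  <- mean(pH[nearest])
pred_wmean <- weighted.mean(pH[nearest], w = 1 / d[nearest])

# Minimum dissimilarity to the training set (the minDC idea in the text)
min_dc <- min(d)
```

The weights `1 / d` are undefined when a fossil sample is identical to a training sample (`d = 0`), which is one reason the unweighted mean is the simpler starting point in a sketch like this.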

