[Vegan-commits] r2639 - in pkg/lmodel2: . inst vignettes

noreply at r-forge.r-project.org
Thu Oct 17 12:47:48 CEST 2013


Author: jarioksa
Date: 2013-10-17 12:47:47 +0200 (Thu, 17 Oct 2013)
New Revision: 2639

Added:
   pkg/lmodel2/vignettes/
   pkg/lmodel2/vignettes/lmodel2.bib
   pkg/lmodel2/vignettes/mod2user.Rnw
Removed:
   pkg/lmodel2/inst/doc/
Log:
move vignettes from inst/doc/ to vignettes/

Copied: pkg/lmodel2/vignettes/lmodel2.bib (from rev 2638, pkg/lmodel2/inst/doc/lmodel2.bib)
===================================================================
--- pkg/lmodel2/vignettes/lmodel2.bib	                        (rev 0)
+++ pkg/lmodel2/vignettes/lmodel2.bib	2013-10-17 10:47:47 UTC (rev 2639)
@@ -0,0 +1,86 @@
+@Book{Legendre-Legendre98,
+  author =       {P. Legendre and L. Legendre},
+  title =        {Numerical ecology},
+  publisher =    {Elsevier},
+  year =         1998,
+  number =       20,
+  series =       {Developments in Environmental Modelling},
+  address =      {Amsterdam},
+  edition =      {2nd}
+}
+@Article{AnderLeg99,
+  author =	 {M. J. Anderson and P. Legendre},
+  title =	 {An empirical comparison of permutation methods for
+                  tests of partial regression coefficients in a linear
+                  model},
+  journal =	 {Journal of Statistical Computation and Simulation},
+  year =	 1999,
+  volume =	 62,
+  pages =	 {271--303}
+}
+@Article{Hines.ea97,
+  author =	 {A.H. Hines and R. B. Whitlatch and S. F. Thrush and
+                  J. E. Hewitt and V. J. Cummings and P. K. Dayton and
+                  P. Legendre},
+  title =	 {Nonlinear foraging response of a large marine
+                  predator to benthic prey: eagle ray pits and
+                  bivalves in a {N}ew {Z}ealand sandflat},
+  journal =	 {Journal of Experimental Marine Biology and Ecology},
+  year =	 1997,
+  volume =	 216,
+  pages =	 {191--210}
+}
+@Article{Mesple.ea96,
+  author =	 {F. Mespl{\'e} and M. Troussellier and C. Casellas and
+                  P. Legendre},
+  title =	 {Evaluation of simple statistical criteria to qualify
+                  a simulation},
+  journal =	 {Ecological Modelling},
+  year =	 1996,
+  volume =	 88,
+  pages =	 {9--18}
+}
+@Book{SokalRohlf95,
+  author =	 {R. R. Sokal and F. J. Rohlf},
+  title =	 {Biometry: The principles and practice of statistics
+                  in biological research},
+  publisher =	 {W. H. Freeman},
+  year =	 1995,
+  edition =	 {3rd}
+}
+@Article{Jolicoeur90,
+  author =	 {P. Jolicoeur},
+  title =	 {Bivariate allometry: interval estimation of the
+                  slopes of the ordinary and standardized normal major
+                  axes and structural relationship},
+  journal =	 {Journal of Theoretical Biology},
+  year =	 1990,
+  volume =	 144,
+  pages =	 {275--285}
+}
+@Article{JoliMosi68,
+  author =	 {P. Jolicoeur and J. E. Mosimann},
+  title =	 {Intervalles de confiance pour la pente de l'axe
+                  majeur d'une distribution normale bidimensionnelle},
+  journal =	 {Biom{\'e}trie-Praxim{\'e}trie},
+  year =	 1968,
+  volume =	 9,
+  pages =	 {121--140}
+}
+@Article{McArdle88,
+  author =	 {B. McArdle},
+  title =	 {The structural relationship: regression in biology},
+  journal =	 {Canadian Journal of Zoology},
+  year =	 1988,
+  volume =	 66,
+  pages =	 {2329--2339}
+}
+@Book{Neter.ea96,
+  author =	 {J. Neter and M. H. Kutner and C. J. Nachtsheim and
+                  W. Wasserman},
+  title =	 {Applied linear statistical models},
+  publisher =	 {Richard D. Irwin Inc.},
+  year =	 1996,
+  edition =	 {4th}
+}
+

Copied: pkg/lmodel2/vignettes/mod2user.Rnw (from rev 2638, pkg/lmodel2/inst/doc/mod2user.Rnw)
===================================================================
--- pkg/lmodel2/vignettes/mod2user.Rnw	                        (rev 0)
+++ pkg/lmodel2/vignettes/mod2user.Rnw	2013-10-17 10:47:47 UTC (rev 2639)
@@ -0,0 +1,750 @@
+% -*- mode: noweb; noweb-default-code-mode: R-mode; -*-
+%\VignetteIndexEntry{Model II Regression User Guide}
+\documentclass[a4paper,10pt,reqno]{amsart}
+\usepackage{ucs}
+\usepackage[utf8x]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage[authoryear,round]{natbib}
+\usepackage{hyperref}
+\usepackage{graphicx}
+\usepackage{tikz}
+\usepackage{sidecap}
+\setlength{\captionindent}{0pt}
+\usepackage{url}
+\usepackage{booktabs}
+\usepackage{alltt}
+\renewcommand{\floatpagefraction}{0.8}
+\usepackage{ae}
+
+\title{Model II regression user's guide, R edition}
+
+\author{Pierre Legendre}
+
+\address{D{\'e}partement de sciences biologiques, Universit{\'e} de
+Montr{\'e}al, C.P. 6128, succursale Centre-ville, Montr{\'e}al,
+Qu{\'e}bec H3C 3J7, Canada}
+\email{Pierre.Legendre@umontreal.ca}
+
+\date{$ $Id$ $}
+
+\begin{document}
+
+\setkeys{Gin}{width=0.55\linewidth}
+\SweaveOpts{strip.white=true}
+<<echo=false>>=
+require(lmodel2)
+options(width=72)
+figset <- function() par(mar=c(4,4,1,1)+.1)
+options(SweaveHooks = list(fig = figset))
+@
+
+
+\maketitle
+
+\tableofcontents
+
+Function \texttt{lmodel2} computes model II simple linear regression
+using the following methods: major axis (MA), standard major axis
+(SMA), ordinary least squares (OLS), and ranged major axis
+(RMA). Information about these methods is available, for instance, in
+section 10.3.2 of \citet{Legendre-Legendre98} and in sections 14.13
+and 15.7 of \citet{SokalRohlf95}\footnote{In Sokal and Rohlf
+  (Biometry, 2nd edition, 1981: 551), the numerical result for MA
+  regression for the example data set is wrong. The mistake has been
+  corrected in the 1995 edition.}. Parametric 95\% confidence
+intervals are computed for the slope and intercept parameters. A
+permutation test is available to determine the significance of the
+slopes of MA, OLS and RMA and also for the correlation
+coefficient. This function represents an evolution of a
+\textsc{Fortran} program written in 2000 and 2001.
+
+Bartlett's three-group model II regression method, described by the
+above-mentioned authors, is not computed by the program because it
+suffers from several drawbacks. Its main handicap is that the regression
+lines are not the same depending on whether the grouping (into three
+groups) is made based on $x$ or $y$. The regression line is not
+guaranteed to pass through the centroid of the scatter of points and
+the slope estimator is not symmetric, i.e. the slope of the regression
+$y = f(x)$ is not the reciprocal of the slope of the regression $x =
+f(y)$.
+
+Model II regression should be used when the two variables in the
+regression equation are random, i.e. not controlled by the
+researcher. Model I regression using least squares underestimates the
+slope of the linear relationship between the variables when they both
+contain error; see example in chapter \ref{sec:exa4}
+(p. \pageref{sec:exa4}). Detailed recommendations follow.
+
+\section{Recommendations on the use of model II regression methods}
+
+Considering the results of simulation studies,
+\citet{Legendre-Legendre98} offer the following recommendations to
+ecologists who have to estimate the parameters of the functional
+linear relationships between variables that are random and measured
+with error (Table \ref{tab:recommend}).
+
+\begin{table}
+\label{tab:recommend}
+  \caption{Application of the model II regression methods. The numbers
+    in the left-hand column refer to the corresponding paragraphs in
+    the text.}
+\begin{Small}
+\begin{tabular}{lllc}
+\toprule
+Par.& Method &Conditions of application& Test possible\\
+\midrule
+1 &OLS &Error on $y \gg$ error on $x$ &Yes\\
+3 &MA& Distribution is bivariate normal & Yes \\
+& &Variables are in the same physical units or dimensionless \\
+& &Variance of error about the same for $x$ and $y$ \\
+
+4 & &
+Distribution is bivariate normal\\
+& &Error variance on each axis proportional to variance of\\
+& &corresponding variable \\
+
+4a & RMA &
+Check scatter diagram: no outlier & Yes\\
+4b &SMA& Correlation $r$ is significant& No\\
+5 &OLS& Distribution is not bivariate normal& Yes\\
+& &Relationship between $x$ and $y$ is linear\\
+6 &OLS& To compute forecasted (fitted) or predicted $y$ values& Yes\\
+& &(Regression equation and confidence intervals are irrelevant) \\
+7& MA& To compare observations to model predictions &Yes\\
+\bottomrule
+\end{tabular}
+\end{Small}
+\end{table}
+
+\begin{enumerate}
+\item If the magnitude of the random variation (i.e. the error
+  variance\footnote{Contrary to the sample variance, the error
+    variance on $x$ or $y$ cannot be estimated from the data. An
+    estimate can only be made from knowledge of the way the variables
+    were measured.}) on the response variable $y$ is much larger
+  (i.e. more than three times) than that on the explanatory variable
+  $x$, use OLS.  Otherwise, proceed as follows.
+\item Check whether the data are approximately bivariate normal,
+  either by looking at a scatter diagram or by performing a formal
+  test of significance. If not, attempt transformations to render them
+  bivariate normal. For data that are or can be made to be reasonably
+  bivariate normal, consider recommendations 3 and 4. If not, see
+  recommendation 5.
+\item For bivariate normal data, use major axis (MA) regression if
+  both variables are expressed in the same physical units
+  (untransformed variables that were originally measured in the same
+  units) or are dimensionless (e.g. log-transformed variables), if it
+  can reasonably be assumed that the error variances of the variables
+  are approximately equal.
+
+  When no information is available on the ratio of the error variances
+  and there is no reason to believe that it would differ from 1, MA
+  may be used provided that the results are interpreted with
+  caution. MA produces unbiased slope estimates and accurate
+  confidence intervals \citep{Jolicoeur90}.
+
+  MA may also be used with dimensionally heterogeneous variables when
+  the purpose of the analysis is (1) to compare the slopes of the
+  relationships between the same two variables measured under
+  different conditions (e.g. at two or more sampling sites), or (2) to
+  test the hypothesis that the major axis does not significantly
+  differ from a value given by hypothesis (e.g. the relationship $E =
+  b_1 m$ where, according to the famous equation of Einstein, $b_1 =
+  c^2$, $c$ being the speed of light in vacuum).
+
+\item For bivariate normal data, if MA cannot be used because the
+  variables are not expressed in the same physical units or the error
+  variances on the two axes differ, two alternative methods are
+  available to estimate the parameters of the functional linear
+  relationship if it can reasonably be assumed that the error variance
+  on each axis is proportional to the variance of the corresponding
+  variable, i.e. (the error variance of $y$ / the sample variance of
+  $y$) = (the error variance of $x$ / the sample variance of $x$). This
+  condition is often met with counts (e.g. number of plants or
+  animals) or log-transformed data \citep{McArdle88}.
+
+\begin{enumerate}
+\item Ranged major axis regression (RMA) can be used. The method is
+  described below. Prior to RMA, one should check for the presence of
+  outliers, using a scatter diagram of the objects.
+\item Standard major axis regression (SMA) can be used. One should
+  first test the significance of the correlation coefficient ($r$) to
+  determine if the hypothesis of a relationship is supported.  No SMA
+  regression equation should be computed when this condition is not
+  met.
+
+  This remains a less-than-ideal solution since SMA slope estimates
+  cannot be tested for significance.  Confidence intervals should also
+  be used with caution: simulations have shown that, as the slope
+  departs from $\pm 1$, the SMA slope estimate is increasingly biased
+  and the confidence interval includes the true value less and less
+  often. Even when the slope is near $\pm 1$ (e.g. example \S
+  \ref{sec:exa5}), the confidence interval is too narrow if $n$ is
+  very small or if the correlation is weak.
+\end{enumerate}
+
+\item If the distribution is not bivariate normal and the data cannot
+  be transformed to satisfy that condition (e.g. if the distribution
+  possesses two or more modes), one should wonder whether the slope
+  of a regression line is really an adequate model to describe the
+  functional relationship between the two variables. Since the
+  distribution is not bivariate normal, there seems little reason to
+  apply models such as MA, SMA or RMA, which primarily describe the
+  first principal component of a bivariate normal distribution. So,
+  (1) if the relationship is linear, OLS is recommended to estimate
+  the parameters of the regression line. The significance of the slope
+  should be tested by permutation, however, because the distributional
+  assumptions of parametric testing are not satisfied. (2) If a
+  straight line is not an appropriate model, polynomial or nonlinear
+  regression should be considered.
+
+\item When the purpose of the study is not to estimate the parameters
+  of a functional relationship, but simply to forecast or predict
+  values of $y$ for given $x$'s, use OLS in all cases.  OLS is the
+  only method that minimizes the squared residuals in y. The OLS
+  regression line itself is meaningless. Do not use the standard error
+  and confidence bands, however, unless $x$ is known to be free of
+  error \citep[p. 545, Table 14.3]{SokalRohlf95}; this warning applies
+  in particular to the 95\% confidence intervals computed for OLS by
+  this program.
+\item Observations may be compared to the predictions of a
+  statistical or deterministic model (e.g. simulation model) in order
+  to assess the quality of the model. If the model contains random
+  variables measured with error, use MA for the comparison since
+  observations and model predictions should be in the same units.
+
+  If the model fits the data well, the slope is expected to be $1$ and
+  the intercept $0$. A slope that significantly differs from $1$
+  indicates a difference between observed and simulated values which
+  is proportional to the observed values. For relative-scale
+  variables, an intercept which significantly differs from $0$
+  suggests the existence of a systematic difference between
+  observations and simulations \citep{Mesple.ea96}.
+
+\item With all methods, the confidence intervals are large when $n$ is
+  small; they become smaller as $n$ increases up to about 60, after which
+  they change much more slowly. Model II regression should ideally be
+  applied to data sets containing 60 observations or more. Some of the
+  examples presented below have fewer observations; they are only
+  presented for illustration.
+\end{enumerate}
+
+\section{Ranged major axis regression}
+
+Ranged major axis regression (RMA) is described in
+\citet[pp. 511--512]{Legendre-Legendre98}. It is computed as follows:
+
+\begin{enumerate}
+\item Transform the $y$ and $x$ variables into $y'$ and $x'$,
+  respectively, whose range is 1. Two formulas are available for
+  ranging, depending on the nature of the variables:
+  \begin{itemize}
+  \item For variables whose variation is expressed relative to an
+    arbitrary zero (interval-scale variables, e.g. temperature in
+    $^{\circ}$C), the formula for ranging is:
+\begin{equation}
+\label{eq:range1}
+y'_i = \frac{y_i - y_{\min}}{y_{\max} - y_{\min}} \quad \text{or} \quad
+x'_i = \frac{x_i - x_{\min}}{x_{\max}-x_{\min}}
+\end{equation}
+
+\item For variables whose variation is expressed relative to a true
+  zero value (ratio-scale or relative-scale variables, e.g. species
+  abundances, or temperature expressed in $^{\circ}$K), the recommended
+  formula for ranging assumes a minimum value of $0$;
+  eq. \ref{eq:range1} reduces to:
+\begin{equation}
+y'_i = \frac{y_i}{y_{\max}} \quad \text{or} \quad
+x'_i = \frac{x_i}{x_{\max}}
+\end{equation}
+\end{itemize}
+\item Compute MA regression between the ranged variables $y'$ and
+  $x'$. Test the significance of the slope estimate by permutation if
+  needed.
+\item Back-transform the estimated slope, as well as its confidence
+  interval limits, to the original units by multiplying them by the
+  ratio of the ranges:
+\begin{equation}
+b_1 = b'_1 \frac{y_{\max} - y_{\min}}{x_{\max} - x_{\min}}
+\end{equation}
+
+\item Recompute the intercept $b_0$ and its confidence interval
+  limits, using the original centroid ($\bar x$, $\bar y$) of the
+  scatter of points and the estimates of the slope $b_1$ and its
+  confidence limits:
+\begin{equation}
+b_0= \bar y - b_1 \bar x
+\end{equation}
+\end{enumerate}
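+
+The four steps above can be sketched directly in R (a minimal
+illustration on hypothetical interval-scale data; \texttt{lmodel2}
+carries out these computations internally when RMA is requested):
+<<eval=false>>=
+x <- c(1.2, 2.3, 3.1, 4.8, 5.0)   # hypothetical data
+y <- c(2.0, 2.9, 4.2, 5.1, 6.3)
+## Step 1: ranging (eq. 1)
+xp <- (x - min(x))/(max(x) - min(x))
+yp <- (y - min(y))/(max(y) - min(y))
+## Step 2: MA slope of the ranged variables, taken from the first
+## eigenvector of their covariance matrix
+v <- eigen(cov(cbind(xp, yp)))$vectors[, 1]
+b1.ranged <- v[2]/v[1]
+## Step 3: back-transform the slope to the original units
+b1 <- b1.ranged * (max(y) - min(y))/(max(x) - min(x))
+## Step 4: recompute the intercept from the original centroid
+b0 <- mean(y) - b1 * mean(x)
+@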
+
+The RMA slope estimator has several desirable properties when the
+variables $x$ and $y$ are not expressed in the same units or when the
+error variances on the two axes differ.  (1) The slope estimator
+scales proportionally to the units of the two variables: the position
+of the regression line in the scatter of points remains the same
+irrespective of any linear change of scale of the variables. (2) The
+estimator is sensitive to the covariance of the variables; this is not
+the case for SMA. (3) Finally, and contrary to SMA, it is possible to
+test the hypothesis that an RMA slope estimate is equal to a stated
+value, in particular 0 or 1. As in MA, the test may be done either by
+permutation, or by comparing the confidence interval of the slope to
+the hypothetical value of interest. Thus, whenever MA regression
+cannot be used because of incommensurable units or because the error
+variances on the two axes differ, RMA regression can be used. There is
+no reason, however, to use RMA when MA is justified.
+
+Prior to RMA, one should check for the presence of outliers, using a
+scatter diagram of the objects. RMA should not be used in the presence
+of outliers because they cause important changes to the estimates of
+the ranges of the variables. Outliers that are not aligned fairly well
+with the dispersion ellipse of the objects may have an undesirable
+influence on the slope estimate. The identification and treatment of
+outliers is discussed in \citet[Section 13.4]{SokalRohlf95}. Outliers
+may, in some cases, be eliminated from the data set, or they may be
+subjected to a winsorizing procedure described by these authors.
+
+\section{Input file}
+
+A data frame with objects in rows and variables in columns.
+
+\section{Output file}
+
+The output file obtained by \texttt{print.lmodel2} contains the
+following results:
+
+\begin{enumerate}
+\item The call to the function.
+\item General regression statistics: number of objects ($n$),
+  correlation coefficient ($r$), coefficient of determination ($r^2$)
+  of the OLS regression, parametric $P$-values (2-tailed, one-tailed)
+  for the test of the correlation coefficient and the OLS slope, angle
+  between the two OLS regression lines, \texttt{lm(y \~{} x)} and
+  \texttt{lm(x \~{} y)}.
+\item A table with rows corresponding to the four regression
+  methods. Column 1 gives the method name, followed by the intercept
+  and slope estimates, the angle between the regression line and
+  the abscissa, and the permutational probability (one-tailed, for the
+  tail corresponding to the sign of the slope estimate).
+
+\item A table with rows corresponding to the four regression
+  methods. The method name is followed by the parametric 95\%
+  confidence intervals (2.5 and 97.5 percentiles) for the intercept
+  and slope estimates.
+\item The eigenvalues of the bivariate dispersion, computed during
+  major axis regression, and the $H$ statistic used for computing the
+  confidence interval of the major axis slope (notation following
+  \citealt{SokalRohlf95}).
+
+  For the slopes of MA, OLS and RMA, the permutation tests are carried
+  out using the slope estimates $b$ themselves as the reference
+  statistics. In OLS simple linear regression, a permutation test of
+  significance based on the $r$ statistic is equivalent to a
+  permutation test based on the pivotal $t$-statistic associated with
+  $b_{OLS}$ \citep[21]{Legendre-Legendre98}. On the other hand, across
+  the permutations, the slope estimate ($b_{OLS}$) differs from $r$ by a
+  constant factor ($s_y/s_x$) since $b_{OLS} = r_{xy} s_y/s_x$, so that $b_{OLS}$ and
+  $r$ are equivalent statistics for permutation testing. As a
+  consequence, a permutation test of $b_{OLS}$ is equivalent to a
+  permutation test carried out using the pivotal $t$-statistic
+  associated with $b_{OLS}$. This is not the case in multiple linear
+  regression, however, as shown by \citet{AnderLeg99}.
+\end{enumerate}
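+
+The identity $b_{OLS} = r_{xy} s_y/s_x$ is easy to verify numerically
+(a sketch assuming two numeric vectors \texttt{x} and \texttt{y}; not
+package code):
+<<eval=false>>=
+b.ols <- coef(lm(y ~ x))[2]
+all.equal(unname(b.ols), cor(x, y) * sd(y)/sd(x))
+@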
+
+If the objective is simply to assess the relationship between the two
+variables under study, one can compute the correlation coefficient $r$
+and test its significance. A parametric test can be used when the
+assumption of bivariate normality can safely be assumed to hold, or a
+permutation test when it cannot.
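+
+Such a test of $r$ could be sketched as follows (an illustration
+assuming two numeric vectors \texttt{x} and \texttt{y}; not part of
+the \texttt{lmodel2} interface):
+<<eval=false>>=
+cor.test(x, y)          # parametric test, binormality assumed
+## permutation test of r when binormality is doubtful
+r.obs <- cor(x, y)
+r.perm <- replicate(999, cor(x, sample(y)))
+(sum(abs(r.perm) >= abs(r.obs)) + 1)/(999 + 1)
+@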
+
+For the intercept of OLS, the confidence interval is computed using
+the standard formulas found in textbooks of statistics; results are
+identical to those of standard statistical software. No such formula,
+providing correct $\alpha$-coverage, is known for the other three
+methods. In the program, the confidence intervals for the intercepts
+of MA, SMA and RMA are computed by projecting the bounds of the
+confidence intervals of the slopes onto the ordinate; this results in
+an underestimation of these confidence intervals.
+
+\begin{figure}
+  \centering
+\resizebox{\linewidth}{!}{  
+  \begin{tikzpicture}
+    \draw[->, very thick] (0,0) -- (6,0);
+    \draw[<-, very thick]  (3,3) -- (3,-3);
+    \draw[->, very thick] (8,0) -- (14, 0);
+    \draw[->, very thick] (11,-3) -- (11,3);
+    \draw[->, thick] (6.8,-2) -- (8.6,-2);
+    \draw[<-, thick] (7.7,-1.1) -- (7.7,-2.8);
+    \draw[font=\footnotesize] (8.2,-1.6) node {I};
+    \draw[font=\footnotesize] (7.2,-1.6) node {II};
+    \draw[font=\footnotesize] (7.2,-2.4) node {III};
+    \draw[font=\footnotesize] (8.2,-2.4) node {IV};
+    \draw[dashed] (3,0) -- (3+3/2.75,3) node[above]{$b_{1 \inf} = 2.75$};
+    \draw (3,0) -- (3-3/2.75,-3) node[at start,sloped,above]{Lower bound
+    of CI};
+    \draw (3,0) -- (3+3/5.67,-3) node[midway,sloped,above]{MA
+      regression line} node[below]{$b_1 = -5.67$};
+    \draw (3,0) -- (3+2.7/1.19, -2.7) node[midway,sloped,above]{Upper
+      bound of CI} node[below]{$b_{1 \sup}=-1.19$};
+    \draw (11,0) -- (11-3/5.67,3);
+    \draw[dashed] (11,0) -- (11+3/5.67,-3)
+    node[at start,sloped,below]{Upper bound of CI} node[below]{$b_{1 \sup}
+      = -5.67$};
+    \draw (11,0) -- (11+3/2.75, 3) node[near end,sloped,above]{MA
+      regression line} node[right]{$b_1 = 2.75$};
+    \draw (11,0) -- (11+2/0.84, 2) node[near end,sloped,above]{Lower
+      bound of CI} node[right]{$b_{1 \inf} =0.84$};
+    \draw[font=\large] (0,3) node{(a)};
+    \draw[font=\large] (8,3) node{(b)};
+  \end{tikzpicture}
+}
+  \caption{(a) If a MA regression line has the lower bound of its
+    confidence interval (C.I.) in quadrant III, this bound has a
+    positive slope ($+2.75$ in example). (b) Likewise, if a MA
+    regression line has the upper bound of its confidence interval in
+    quadrant II, this bound has a negative slope ($-5.67$ in example).}
+  \label{fig:rma}
+\end{figure}
+In MA or RMA regression, the bounds of the confidence interval (C.I.)
+of the slope may, on occasions, lie outside quadrants I and IV of the
+plane centred on the centroid of the bivariate distribution. When the
+\emph{lower bound} of the confidence interval corresponds to a line in
+quadrant III (Fig. \ref{fig:rma}a), it has a positive slope; the RMA
+regression line of example in chapter \ref{sec:exa5}
+(p. \pageref{sec:exa5}) provides an example of this
+phenomenon. Likewise, when the \emph{upper bound} of the confidence
+interval corresponds to a line in quadrant II (Fig. \ref{fig:rma}b),
+it has a negative slope. In other instances, the confidence interval
+of the slope may occupy all 360$^{\circ}$ of the plane, which results
+in it having no bounds. The bounds are then noted 0.00000; see chapter
+\ref{sec:exa5} (p. \pageref{sec:exa5}).
+
+In SMA or OLS, confidence interval bounds cannot lie outside quadrants
+I and IV. In SMA, the regression line always lies at a $+45^{\circ}$
+or $-45^{\circ}$ angle in the space of the standardized variables; the
+SMA slope is a back-transformation of $\pm45^{\circ}$ to the units of
+the original variables. In OLS, the slope is always at an angle closer
+to zero than the major axis of the dispersion ellipse of the points,
+i.e. it always underestimates the MA slope in absolute value.
+
+
+\section{Examples}
+\subsection{Surgical unit data}
+\label{sec:exa1}
+
+\subsubsection{Input data}
+
+This example compares observations to the values forecasted by a
+model. A hospital surgical unit wanted to forecast survival of
+patients undergoing a particular type of liver surgery. Four
+explanatory variables were measured on patients. The response variable
+Y was survival time, which was $\log_{10}$-transformed. The data are
+described in detail in Section 8.2 of \citet{Neter.ea96} who also
+provide the original data sets. The data were divided in two groups of
+54 patients. The first group was used to construct forecasting models
+whereas the second group was reserved for model validation. Several
+regression models were studied. One of them, which uses variables $X_3 =$
+enzyme function test score and $X_4$ = liver function test score, is used
+as the basis for the present example. The multiple regression equation
+is the following:
+
+
+\begin{equation*}
+\hat Y = 1.388778 + 0.005653 X_3 + 0.139015 X_4
+\end{equation*}
+This equation was applied to the second data set (also 54 patients) to
+produce forecasted survival times. In the present example, these
+values are compared to the observed survival times. Fig. \ref{fig:ex1}
+shows the scatter diagram with $\log_{10}$(observed survival time) in
+abscissa and forecasted values in ordinate. The MA regression line is
+shown with its 95\% confidence region. The $45^{\circ}$ line, which
+would correspond to perfect forecasting, is also shown for comparison.
+
+\subsubsection{Output file}
+
+MA, SMA and OLS equations, 95\% C.I., and tests of significance, were
+obtained with the following R commands. The RMA method, which is
+optional, was not computed since MA is the only appropriate method in
+this example.
+
+<<>>=
+data(mod2ex1)
+Ex1.res <- lmodel2(Predicted_by_model ~ Survival, data=mod2ex1, nperm=99)
+Ex1.res
+@
+\begin{SCfigure}
+<<fig=true,echo=false,results=hide>>=
+plot(Ex1.res, centro = TRUE, xlab="log(observed survival time)", ylab="Forecasted")
+abline(diff(colMeans(mod2ex1)), 1, lty=2)
+legend("topleft", c("MA regression", "Confidence limits", "45 degree line"), col=c("red", "grey", "black"), lty=c(1,1,2))
+@
+\caption{Scatter diagram of the Example 1 data showing the major
+  axis (MA) regression line and its 95\% confidence region. The
+  45$^{\circ}$ line is drawn for reference. The cross indicates the
+  centroid of the bivariate distribution. The MA regression line
+  passes through this centroid.}
+\label{fig:ex1}
+\end{SCfigure}
+
+The interesting aspect of the MA regression equation in this example
+is that the regression line is not parallel to the 45$^{\circ}$ line
+drawn in Fig. \ref{fig:ex1}. The 45$^{\circ}$ line is not included in
+the 95\% confidence interval of the MA slope, which goes from
+$\tan^{-1}(0.62166) = 31.87^{\circ}$ to $\tan^{-1}(0.89456) = 41.81^{\circ}$. The
+Figure shows that the forecasting equation overestimated survival
+below the mean and underestimated it above the mean. The OLS
+regression line, which is often (erroneously) used by researchers for
+comparisons of this type, would show an even greater discrepancy
+(33.3$^{\circ}$ angle) from the 45$^{\circ}$ line, compared to the MA
+regression line (36.8$^{\circ}$ angle).
+
+\subsection{Eagle rays and \emph{Macomona} bivalves}
+\label{sec:exa2}
+
+\subsubsection{Input data}
+
+The following table presents observations at 20 sites from a study on
+predator-prey relationships \citep{Hines.ea97}. $y$ is the number of
+bivalves (\emph{Macomona liliana}) larger than 15 mm in size, found in
+0.25 m$^2$ quadrats of sediment; $x$ is the number of sediment
+disturbance pits of a predator, the eagle ray (\emph{Myliobatis
+  tenuicaudatus}), found within circles of a 15 m radius around the
+bivalve quadrats.
+
+
+The variables $x$ and $y$ are expressed in the same physical units and
+are estimated with sampling error, and their distribution is
+approximately bivariate normal. The error variance is not the same for
+$x$ and $y$ but, since the data are animal counts, it seems reasonable
+to assume that the error variance along each axis is proportional to
+the variance of the corresponding variable. The correlation is
+significant: $r = 0.86$, $p < 0.001$. RMA and SMA are thus appropriate
+for this data set; MA and OLS are not. Fig. \ref{fig:ex2} shows the
+scatter diagram.  The various regression lines are presented to allow
+their comparison.
+
+\subsubsection{Output file}
+
+MA, SMA, OLS and RMA regression equations, confidence intervals, and
+tests of significance, were obtained with the following R commands.
+That the 95\% confidence intervals of the SMA and RMA intercepts do
+not include 0 may be due to several reasons: (1) the relationship
+may not be perfectly linear; (2) the C.I. of the intercepts are
+underestimated; (3) the predators (eagle rays) may not be attracted to
+sampling locations containing few prey (bivalves).
+<<>>=
+data(mod2ex2)
+Ex2.res <- lmodel2(Prey ~ Predators, data=mod2ex2, range.y="relative", range.x="relative", nperm=99)
+Ex2.res
+@
+\begin{SCfigure}
+<<fig=true,echo=false>>=
+plot(Ex2.res, confid=FALSE, xlab="Eagle rays (predators)", ylab="Bivalves (prey)", main = "", centr=TRUE)
+lines(Ex2.res, "OLS", col=1, confid=FALSE)
+lines(Ex2.res, "SMA", col=3, confid=FALSE)
+lines(Ex2.res, "RMA", col=4, confid=FALSE)
+legend("topleft", c("OLS", "MA", "SMA", "RMA"), col=1:4, lty=1)
+@
+\caption{Scatter diagram of the Example 2 data (number of bivalves as
+  a function of the number of eagle rays) showing the major axis (MA),
+  standard major axis (SMA), ordinary least-squares (OLS) and
+  ranged major axis (RMA) regression lines. SMA and RMA are the
+  appropriate regression lines in this example. The cross indicates
+  the centroid of the bivariate distribution. The four regression
+  lines pass through this centroid.  }
+\label{fig:ex2}
+\end{SCfigure}
+
+[1] In this table of the output file, the rows correspond,
+respectively, to the MA, OLS and RMA slopes and to the coefficient of
+correlation $r$ (\texttt{Corr}). \texttt{Stat.} is the value of the
+statistic being tested for significance. As explained in the
+\emph{Output file section}, the statistic actually used by the program
+for the test of the MA slope, in this example, is the inverse of the
+$b_{MA}$ slope estimate ($1/3.46591 = 0.28852$) because the reference
+value of the statistic in this permutation test must not exceed
+1. One-tailed probabilities (\texttt{One-tailed p}) are computed in
+the direction of the sign of the coefficient. For a one-tailed test in
+the upper tail (i.e. for a coefficient with a positive sign), $p =$ (EQ
++ GT)/(Number of permutations + 1). For a test in the lower tail
+(i.e. for a coefficient with a negative sign), $p =$ (LT + EQ)/(Number
+of permutations + 1), where
+\begin{itemize}
+\item LT is the number of values under permutation that are smaller
+  than the reference value;
+\item EQ is the number of values under permutation that are equal to
+  the reference value of the statistic, plus 1 for the reference value
+  itself;
+\item GT is the number of values under permutation that are greater
+  than the reference value.
+\end{itemize}
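+
+In terms of these counts, the one-tailed probability could be sketched
+as follows (a hypothetical helper written for illustration, not a
+function of the package):
+<<eval=false>>=
+perm.pval <- function(ref, perm) {
+    if (ref >= 0)   # upper tail: p = (EQ + GT)/(nperm + 1)
+        (sum(perm >= ref) + 1)/(length(perm) + 1)
+    else            # lower tail: p = (LT + EQ)/(nperm + 1)
+        (sum(perm <= ref) + 1)/(length(perm) + 1)
+}
+@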
+
+\subsection{Cabezon spawning}%
+\label{sec:exa3}%
+\subsubsection{Input data}
+
+The following table presents data used by \citet[Box
+14.12]{SokalRohlf95} to illustrate model II regression analysis. They
+concern the mass ($x$) of unspawned females of a California fish, the
+cabezon (\emph{Scorpaenichthys marmoratus}), and the number of eggs
+they subsequently produced ($y$). One may be interested to estimate
+the functional equation relating the number of eggs to the mass of
+females before spawning. The physical units of the variables are as in
+the table published by \citet[546]{SokalRohlf95}.
+
+Since the variables are in different physical units and are estimated
+with error, and their distribution is approximately bivariate normal,
+RMA and SMA are appropriate for this example; MA is inappropriate. The
+OLS regression line is meaningless; in model II regression, OLS should
+only be used for forecasting or prediction. It is plotted in
+Fig. \ref{fig:ex3} only to allow comparison.
+
+The RMA and SMA regression lines are nearly indistinguishable in this
+example. The slope of RMA can be tested for significance ($H_0$:
+$b_{RMA} = 0$), however, whereas the SMA slope cannot. The 95\%
+confidence intervals of the intercepts of RMA and SMA, although
+underestimated, include the value 0, as expected if a linear model
+applies to the data: a female with a mass of 0 is expected to produce
+no eggs.
+
+Another interesting property of RMA and SMA is that their estimates of
+slope and intercept change proportionally to changes in the units of
+measurement. One can easily verify that by changing the decimal places
+in the Example 2 data file and recomputing the regression
+equations. RMA and SMA share this property with OLS. MA regression
+does not have this property; this is why it should only be used with
+variables that are in the same physical units, as those of Example 1.
+
+\subsubsection{Output file}
+
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/vegan -r 2639

