[Rcpp-commits] r1269 - pkg/RcppArmadillo/man

Tue May 18 16:14:44 CEST 2010

Author: dmbates
Date: 2010-05-18 16:14:44 +0200 (Tue, 18 May 2010)
New Revision: 1269

Modified:
   pkg/RcppArmadillo/man/fastLm.Rd
Log:
Added an example showing rank deficiency.


Modified: pkg/RcppArmadillo/man/fastLm.Rd
===================================================================

--- pkg/RcppArmadillo/man/fastLm.Rd	2010-05-18 14:14:09 UTC (rev 1268)
+++ pkg/RcppArmadillo/man/fastLm.Rd	2010-05-18 14:14:44 UTC (rev 1269)
@@ -52,13 +52,16 @@
   package is because \code{Armadillo} uses the Lapack version of the QR
   decomposition while the stats package uses a \emph{modified} Linpack
   version.  Hence \code{Armadillo} uses level-3 BLAS code whereas the
-  stats package uses level-1 BLAS.  However, \code{Armadillo} will choke
+  stats package uses level-1 BLAS.  However, \code{Armadillo} will
+  either fail or, worse, produce completely incorrect answers
   on rank-deficient model matrices whereas the functions from the stats
   package will handle them properly due to the modified Linpack code.
-  Statisticians want a pivoting scheme of \dQuote{pivot only on
-  (apparent) rank deficiency} and numerical analysts have no idea why
-  statisticians want this so it is not part of conventional linear
-  algebra software.
+
+  An example of the type of situation requiring extra care in checking
+  for rank deficiency is a two-way layout with missing cells (see the
+  examples section).  These cases require a special pivoting scheme of
+  \dQuote{pivot only on (apparent) rank deficiency} which is not part of
+  conventional linear algebra software.
 }
 \value{
   \code{fastLm} returns a list with three components:
@@ -87,6 +90,17 @@
   ## standard R interface for formula or data returning object of class fastLm
   flmmod <- fastLm( log(Volume) ~ log(Girth), data=trees)
   summary(flmmod)
+
+  ## case where fastLm breaks down
+  dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
+                   f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
+  xtabs(~ f2 + f1, dd)     # one missing cell
+  mm <- model.matrix(~ f1 * f2, dd)
+  kappa(mm)                # large, indicating rank deficiency
+  set.seed(1)
+  dd$y <- mm \%*\% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
+  summary(lm(y ~ f1 * f2, dd))     # detects rank deficiency
+  summary(fastLm(y ~ f1 * f2, dd)) # some huge coefficients
 }
 \keyword{regression}