[adegenet-commits] r554 - pkg/man
noreply at r-forge.r-project.org
Mon Feb 8 16:06:59 CET 2010
Author: jombart
Date: 2010-02-08 16:06:59 +0100 (Mon, 08 Feb 2010)
New Revision: 554
Modified:
pkg/man/find.clusters.Rd
Log:
Done find.clusters doc.
Modified: pkg/man/find.clusters.Rd
===================================================================
--- pkg/man/find.clusters.Rd 2010-02-08 14:39:53 UTC (rev 553)
+++ pkg/man/find.clusters.Rd 2010-02-08 15:06:59 UTC (rev 554)
@@ -56,10 +56,10 @@
be sought. If provided, the function will only run K-means once, for this
number of clusters. If left as \code{NULL}, several K-means are run for a
range of k (number of clusters) values.}
-\item{stat}{ a \code{character} string matching 'BIC', 'AIC', or 'WISS', which
+\item{stat}{ a \code{character} string matching 'BIC', 'AIC', or 'WSS', which
indicates the statistic to be computed for each model (i.e., for each value of
\code{k}). BIC: Bayesian Information Criterion. AIC: Akaike's Information
- Criterion. WISS: within-groups sum of squares, that is, residual variance.}
+ Criterion. WSS: within-groups sum of squares, that is, residual variance.}
\item{choose.n.clust}{ a \code{logical} indicating whether the number of
clusters should be chosen by the user (TRUE, default), or automatically, based
on a given criterion (argument \code{criterion}). IT IS HIGHLY RECOMMENDED to
@@ -100,7 +100,7 @@
So far, the analysis of data simulated under various population genetics
models (see reference) suggested an ad hoc rule for the selection of the
- optimal number of clusters. First, BIC seems for efficient than AIC and WISS
+ optimal number of clusters. First, BIC seems more efficient than AIC and WSS
to select the appropriate number of clusters. The rule of thumb consists in
increasing K until it no longer leads to an appreciable improvement of fit (i.e.,
decrease of BIC). In the most simple models (island models), BIC decreases
@@ -118,7 +118,7 @@
- "diff": model selection based on successive improvement of the test
statistic. This procedure attempts to increase K until the model improvement
- (difference in successive BIC, AIC, or WISS) is no longer important. May be
+ (difference in successive BIC, AIC, or WSS) is no longer important. May be
more appropriate to models relating to stepping stones.
"conserv": another criterion meant to be conservative, in that it seeks a good
@@ -129,8 +129,12 @@
\value{
The class \code{find.clusters} is a list with the following
components:\cr
- \item{}{}
-
+ \item{Kstat}{a \code{numeric} vector giving the values of the statistic for the
+ different values of K. Is NULL if \code{n.clust} was specified.}
+ \item{stat}{a \code{numeric} value giving the value of the statistic for the
+ retained model.}
+ \item{grp}{a \code{factor} giving group membership for each individual.}
+ \item{size}{an \code{integer} vector giving the size of the different clusters.}
}
\references{
Jombart, T., Devillard, S. and Balloux, F.
@@ -138,25 +142,58 @@
genetically structured populations. Submitted to \emph{PLoS genetics}.
}
\seealso{
- \code{\link{}}
- \code{\link{}}
- \code{\link{}}
- \code{\link{}}
- \code{\link{}}
+ - \code{\link{dapc}}: implements the DAPC.
+
+ - \code{\link[stats]{kmeans}}: implementation of K-means in the stats
+ package.
+
+ - \code{\link{eHGDP}}: dataset illustrating the DAPC and \code{find.clusters}.
}
\author{ Thibaut Jombart \email{t.jombart at imperial.ac.uk} }
\examples{
-## data(find.clustersIllus), data(eHGDP), and data(H3N2) illustrate the find.clusters
-## see ?find.clustersIllus, ?eHGDP, ?H3N2
-##
+\dontrun{
+## THIS ONE TAKES A FEW MINUTES TO RUN ##
+data(eHGDP)
-example(find.clusters)
+## here, n.clust is specified, so that only one K value is used
+grp <- find.clusters(eHGDP, max.n=30, n.pca=200, scale=FALSE, n.clust=4) # takes about 2 minutes
+names(grp)
+grp$Kstat
+grp$stat
-\dontrun{
-example(eHGDP)
-example(H3N2)
+## to try different values of k (interactive)
+grp <- find.clusters(eHGDP, max.n=50, n.pca=200, scale=FALSE)
+
+## and then, to plot BIC values:
+plot(grp$Kstat, type="b", col="blue")
}
+
+## ANOTHER SIMPLE EXAMPLE ##
+data(sim2pop) # this actually contains 2 pop
+
+## DETECTION WITH BIC (clear result)
+foo.BIC <- find.clusters(sim2pop, n.pca=100, choose=FALSE)
+plot(foo.BIC$Kstat, type="o", xlab="number of clusters (K)", ylab="BIC",
+col="blue", main="Detection based on BIC")
+points(2, foo.BIC$Kstat[2], pch="x", cex=3)
+mtext("'X' indicates the actual number of clusters", side=3)
+
+
+## DETECTION WITH AIC (less clear-cut)
+foo.AIC <- find.clusters(sim2pop, n.pca=100, choose=FALSE, stat="AIC")
+plot(foo.AIC$Kstat, type="o", xlab="number of clusters (K)", ylab="AIC", col="purple", main="Detection based on AIC")
+points(2, foo.AIC$Kstat[2], pch="x", cex=3)
+mtext("'X' indicates the actual number of clusters", side=3)
+
+
+## DETECTION WITH WSS (less clear-cut)
+foo.WSS <- find.clusters(sim2pop, n.pca=100, choose=FALSE, stat="WSS")
+plot(foo.WSS$Kstat, type="o", xlab="number of clusters (K)", ylab="WSS
+(residual variance)", col="red", main="Detection based on WSS")
+points(2, foo.WSS$Kstat[2], pch="x", cex=3)
+mtext("'X' indicates the actual number of clusters", side=3)
+
}
\keyword{multivariate}
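
The rule of thumb documented above (increase K until BIC no longer decreases appreciably) can be sketched with base R alone. This is only an illustration, not the code used by find.clusters: the simulated data, the helper name choose.k, its tol argument, and the formulas deriving BIC and AIC from the within-groups sum of squares are all assumptions made for this example.

```r
## Illustrative sketch only; NOT the adegenet implementation.
set.seed(1)

## Simulated data (assumption): two well-separated groups of 50 individuals,
## described by 5 variables.
x <- rbind(matrix(rnorm(250, mean = 0), ncol = 5),
           matrix(rnorm(250, mean = 6), ncol = 5))
n <- nrow(x)
k.values <- 1:6

## Within-groups sum of squares (WSS) for each K, using stats::kmeans.
## For K = 1 the WSS is simply the total sum of squares around the centroid.
wss <- sapply(k.values, function(k) {
  if (k == 1) return(sum(scale(x, scale = FALSE)^2))
  kmeans(x, centers = k, nstart = 10)$tot.withinss
})

## ASSUMPTION: information criteria derived from the residual variance,
## with K counted as the number of parameters.
bic <- n * log(wss / n) + log(n) * k.values
aic <- n * log(wss / n) + 2 * k.values

## Hypothetical helper: return the first K after which the statistic
## improves by no more than 'tol' (stat[1] is assumed to correspond to K = 1).
choose.k <- function(stat, tol = 0) {
  drop <- -diff(stat)            # improvement gained by moving from K to K+1
  stop.at <- which(drop <= tol)  # steps without an appreciable improvement
  if (length(stop.at) == 0) length(stat) else stop.at[1]
}

choose.k(c(10, 4, 5, 6))                # statistic rises after K = 2
choose.k(c(100, 50, 48, 47), tol = 5)   # drops after K = 2 are below tol
choose.k(bic, tol = 10)                 # applied to the simulated data
```

How large tol should be is a judgment call that depends on the data, which is in line with the recommendation above to inspect the curve and choose the number of clusters interactively.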