From noreply at r-forge.r-project.org Wed May 8 15:19:07 2013
From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org)
Date: Wed, 8 May 2013 15:19:07 +0200 (CEST)
Subject: [adegenet-commits] r1121 - pkg/vignettes www
Message-ID: <20130508131908.0DF731802F0@r-forge.r-project.org>
Author: jombart
Date: 2013-05-08 15:19:07 +0200 (Wed, 08 May 2013)
New Revision: 1121
Modified:
pkg/vignettes/adegenet-dapc.tex
www/literature.html
Log:
+1ref
Modified: pkg/vignettes/adegenet-dapc.tex
===================================================================
--- pkg/vignettes/adegenet-dapc.tex 2013-04-30 09:59:19 UTC (rev 1120)
+++ pkg/vignettes/adegenet-dapc.tex 2013-05-08 13:19:07 UTC (rev 1121)
@@ -21,7 +21,7 @@
\newcommand{\code}[1]{{{\tt #1}}}
-\title{A tutorial for Discriminant Analysis of Principal Components (DAPC) using \textit{adegenet} 1.3-0}
+\title{A tutorial for Discriminant Analysis of Principal Components (DAPC) using \textit{adegenet} 1.3-7}
\author{Thibaut Jombart}
\date{\today}
@@ -204,12 +204,12 @@
We specify that we want to evaluate up to $k=40$ groups (\texttt{max.n.clust=40}):
\begin{Schunk}
\begin{Sinput}
-> grp <- find.clusters(x, max.n.clust = 40)
+> grp <- find.clusters(x, max.n.clust=40)
\end{Sinput}
\end{Schunk}
\begin{center}
- \includegraphics[width=.7\textwidth]{figs/findclust-pca.pdf}
+ \includegraphics[width=.7\textwidth]{findclust-pca.pdf}
\end{center}
\noindent
@@ -221,7 +221,7 @@
Then, the function displays a graph of BIC values for increasing values of $k$:
\begin{center}
- \includegraphics[width=.7\textwidth]{figs/findclust-bic.pdf}
+ \includegraphics[width=.7\textwidth]{findclust-bic.pdf}
\end{center}
\noindent This graph shows a clear decrease of BIC until $k=6$ clusters, after which BIC increases.
@@ -254,14 +254,14 @@
\end{Sinput}
\begin{Soutput}
001 002 003 004 005 006 007 008 009 010
- 1 1 1 5 1 1 1 1 1 1
+ 6 6 6 2 6 6 6 6 6 6
Levels: 1 2 3 4 5 6
\end{Soutput}
\begin{Sinput}
> grp$size
\end{Sinput}
\begin{Soutput}
-[1] 98 97 102 99 105 99
+[1] 97 105 102 99 99 98
\end{Soutput}
\end{Schunk}
@@ -276,19 +276,18 @@
\end{Sinput}
\begin{Soutput}
1 2 3 4 5 6
- 1 97 0 0 0 3 0
- 2 0 0 0 99 1 0
- 3 0 2 0 0 0 98
+ 1 0 3 0 0 0 97
+ 2 0 1 0 0 99 0
+ 3 2 0 0 98 0 0
4 0 0 100 0 0 0
- 5 1 95 2 0 2 0
- 6 0 0 0 0 99 1
+ 5 95 2 2 0 0 1
+ 6 0 99 0 1 0 0
\end{Soutput}
\begin{Sinput}
-> table.value(table(pop(x), grp$grp), col.lab = paste("inf", 1:6),
-+ row.lab = paste("ori", 1:6))
+> table.value(table(pop(x), grp$grp), col.lab=paste("inf", 1:6), row.lab=paste("ori", 1:6))
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-006}
+\includegraphics{dapc-006}
\noindent
Rows correspond to actual groups ("ori''), while columns correspond to inferred groups ("inf'').
@@ -313,7 +312,7 @@
better, more efficient summaries of the data than others.
For instance, in the following case:
\begin{center}
- \includegraphics[width=.7\textwidth]{figs/findclust-noclearcut.pdf}
+ \includegraphics[width=.7\textwidth]{findclust-noclearcut.pdf}
\end{center}
\noindent , the concept of "true $k$" is fairly hypothetical. This does not mean that clutering
@@ -384,7 +383,7 @@
probabilities returned by the method (see section below about the stability of membership probabilities).
\begin{center}
- \includegraphics[width=.7\textwidth]{figs/findclust-pca.pdf}
+ \includegraphics[width=.7\textwidth]{findclust-pca.pdf}
\end{center}
\noindent The bottomline is therefore retaining a few PCs without sacrificing too much information.
@@ -394,7 +393,7 @@
Then, the method displays a barplot of eigenvalues for the discriminant analysis, asking for a
number of discriminant functions to retain (unless argument \texttt{n.da} is provided).
\begin{center}
- \includegraphics[width=.7\textwidth]{figs/eigen-dapc.pdf}
+ \includegraphics[width=.7\textwidth]{eigen-dapc.pdf}
\end{center}
For small number of clusters, all eigenvalues can be retained since all discriminant functions can
@@ -409,9 +408,9 @@
> dapc1
\end{Sinput}
\begin{Soutput}
- #########################################
+ #################################################
# Discriminant Analysis of Principal Components #
- #########################################
+ #################################################
class: dapc
$call: dapc.genind(x = x, pop = grp$grp, n.pca = 40, n.da = 100)
@@ -452,7 +451,7 @@
> scatter(dapc1)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-010}
+\includegraphics{dapc-010}
\noindent The obtained graph represents the individuals as dots and the groups as inertia
ellipses. Eigenvalues of the analysis are displayed in inset. These graphs are fairly easy to
@@ -481,33 +480,31 @@
\begin{Schunk}
\begin{Sinput}
-> scatter(dapc1, posi.da = "bottomright", bg = "white", pch = 17:22)
+> scatter(dapc1, posi.da="bottomright", bg="white", pch=17:22)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-011}
+\includegraphics{dapc-011}
\noindent This is still not entirely satisfying: we need to define other colors more visible over a white
background, and we can remove the segments linking the points to their ellipses:
\begin{Schunk}
\begin{Sinput}
-> myCol <- c("darkblue", "purple", "green", "orange", "red", "blue")
-> scatter(dapc1, posi.da = "bottomright", bg = "white", pch = 17:22,
-+ cstar = 0, col = myCol, scree.pca = TRUE, posi.pca = "bottomleft")
+> myCol <- c("darkblue","purple","green","orange","red","blue")
+> scatter(dapc1, posi.da="bottomright", bg="white", pch=17:22, cstar=0, col=myCol, scree.pca=TRUE, posi.pca="bottomleft")
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-012}
+\includegraphics{dapc-012}
\noindent Another possibility is remove the labels within the ellipses and add a legend to the
plot. We also use the same symbol for all individuals, but use bigger dots and transparent colours
to have a better feel for the density of individuals on the factorial plane.
\begin{Schunk}
\begin{Sinput}
-> scatter(dapc1, scree.da = FALSE, bg = "white", pch = 20, cell = 0,
-+ cstar = 0, col = myCol, solid = 0.4, cex = 3, clab = 0, leg = TRUE,
-+ txt.leg = paste("Cluster", 1:6))
+> scatter(dapc1, scree.da=FALSE, bg="white", pch=20, cell=0, cstar=0, col=myCol, solid=.4,
++ cex=3,clab=0, leg=TRUE, txt.leg=paste("Cluster",1:6))
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-013}
+\includegraphics{dapc-013}
We can also add a minimum spanning tree based on the (squared) distances between populations within the
entire space.
@@ -519,27 +516,22 @@
using \texttt{scree.pca=TRUE}).
\begin{Schunk}
\begin{Sinput}
-> scatter(dapc1, ratio.pca = 0.3, bg = "white", pch = 20, cell = 0,
-+ cstar = 0, col = myCol, solid = 0.4, cex = 3, clab = 0, mstree = TRUE,
-+ scree.da = FALSE, posi.pca = "bottomright", leg = TRUE, txt.leg = paste("Cluster",
-+ 1:6))
-> par(xpd = TRUE)
-> points(dapc1$grp.coord[, 1], dapc1$grp.coord[, 2], pch = 4, cex = 3,
-+ lwd = 8, col = "black")
-> points(dapc1$grp.coord[, 1], dapc1$grp.coord[, 2], pch = 4, cex = 3,
-+ lwd = 2, col = myCol)
-> myInset <- function() {
+> scatter(dapc1, ratio.pca=0.3, bg="white", pch=20, cell=0, cstar=0, col=myCol, solid=.4,
++ cex=3, clab=0, mstree=TRUE, scree.da=FALSE,
++ posi.pca="bottomright", leg=TRUE, txt.leg=paste("Cluster",1:6))
+> par(xpd=TRUE)
+> points(dapc1$grp.coord[,1], dapc1$grp.coord[,2], pch=4, cex=3, lwd=8, col="black")
+> points(dapc1$grp.coord[,1], dapc1$grp.coord[,2], pch=4, cex=3, lwd=2, col=myCol)
+> myInset <- function(){
+ temp <- dapc1$pca.eig
-+ temp <- 100 * cumsum(temp)/sum(temp)
-+ plot(temp, col = rep(c("black", "lightgrey"), c(dapc1$n.pca,
-+ 1000)), ylim = c(0, 100), xlab = "PCA axis", ylab = "Cumulated variance (%)",
-+ cex = 1, pch = 20, type = "h", lwd = 2)
++ temp <- 100* cumsum(temp)/sum(temp)
++ plot(temp, col=rep(c("black","lightgrey"), c(dapc1$n.pca,1000)), ylim=c(0,100),
++ xlab="PCA axis", ylab="Cumulated variance (%)", cex=1, pch=20, type="h", lwd=2)
+ }
-> add.scatter(myInset(), posi = "bottomright", inset = c(-0.03,
-+ -0.01), ratio = 0.28, bg = transp("white"))
+> add.scatter(myInset(), posi="bottomright", inset=c(-0.03,-0.01), ratio=.28, bg=transp("white"))
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-014}
+\includegraphics{dapc-014}
Lastly, note that \texttt{scatter} can also represent a single discriminant function, which is
@@ -548,11 +540,10 @@
different colors for different groups:
\begin{Schunk}
\begin{Sinput}
-> scatter(dapc1, 1, 1, col = myCol, bg = "white", scree.da = FALSE,
-+ legend = TRUE, solid = 0.4)
+> scatter(dapc1,1,1, col=myCol, bg="white", scree.da=FALSE, legend=TRUE, solid=.4)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-015}
+\includegraphics{dapc-015}
@@ -604,7 +595,7 @@
\end{Soutput}
\begin{Sinput}
> pop(H3N2) <- H3N2$other$epid
-> dapc.flu <- dapc(H3N2, n.pca = 30, n.da = 10)
+> dapc.flu <- dapc(H3N2, n.pca=30,n.da=10)
\end{Sinput}
\end{Schunk}
@@ -612,22 +603,20 @@
second one shows the originality of 2006 strains.
\begin{Schunk}
\begin{Sinput}
-> myPal <- colorRampPalette(c("blue", "gold", "red"))
-> scatter(dapc.flu, col = transp(myPal(6)), scree.da = FALSE, cell = 1.5,
-+ cex = 2, bg = "white", cstar = 0)
+> myPal <- colorRampPalette(c("blue","gold","red"))
+> scatter(dapc.flu, col=transp(myPal(6)), scree.da=FALSE, cell=1.5, cex=2, bg="white",cstar=0)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-017}
+\includegraphics{dapc-017}
We can assess which alleles most highlight the originality of 2006 using \texttt{loadingplot}:
\begin{Schunk}
\begin{Sinput}
> set.seed(4)
-> contrib <- loadingplot(dapc.flu$var.contr, axis = 2, thres = 0.07,
-+ lab.jitter = 1)
+> contrib <- loadingplot(dapc.flu$var.contr, axis=2, thres=.07, lab.jitter=1)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-018}
+\includegraphics{dapc-018}
\noindent \texttt{temp} is a list invisibly returned by \texttt{loadingplot} which contains the most
contributing alleles (i.e., contributions above a given threshold -- argument \texttt{threshold}).
@@ -638,10 +627,8 @@
> temp <- seploc(H3N2)
> snp906 <- truenames(temp[["906"]])$tab
> snp399 <- truenames(temp[["399"]])$tab
-> freq906 <- apply(snp906, 2, function(e) tapply(e, pop(H3N2),
-+ mean, na.rm = TRUE))
-> freq399 <- apply(snp399, 2, function(e) tapply(e, pop(H3N2),
-+ mean, na.rm = TRUE))
+> freq906 <- apply(snp906, 2, function(e) tapply(e, pop(H3N2), mean, na.rm=TRUE))
+> freq399 <- apply(snp399, 2, function(e) tapply(e, pop(H3N2), mean, na.rm=TRUE))
> freq906
\end{Sinput}
\begin{Soutput}
@@ -666,16 +653,14 @@
2006 0.357142857 0.6428571
\end{Soutput}
\begin{Sinput}
-> par(mfrow = c(1, 2), mar = c(5.1, 4.1, 4.1, 0.1), las = 3)
-> matplot(freq906, pch = c("a", "c"), type = "b", xlab = "year",
-+ ylab = "allele frequency", xaxt = "n", cex = 1.5, main = "SNP # 906")
-> axis(side = 1, at = 1:6, lab = 2001:2006)
-> matplot(freq399, pch = c("c", "t"), type = "b", xlab = "year",
-+ ylab = "allele frequency", xaxt = "n", cex = 1.5, main = "SNP # 399")
-> axis(side = 1, at = 1:6, lab = 2001:2006)
+> par(mfrow=c(1,2), mar=c(5.1,4.1,4.1,.1),las=3)
+> matplot(freq906, pch=c("a","c"), type="b",xlab="year",ylab="allele frequency", xaxt="n", cex=1.5, main="SNP # 906")
+> axis(side=1, at=1:6, lab=2001:2006)
+> matplot(freq399, pch=c("c","t"), type="b", xlab="year",ylab="allele frequency", xaxt="n", cex=1.5, main="SNP # 399")
+> axis(side=1, at=1:6, lab=2001:2006)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-019}
+\includegraphics{dapc-019}
In both cases, a new allele appeared in 2005 at a very low frequency, and reached high or even dominant frequencies a
year later.
@@ -714,16 +699,16 @@
[1] 600 6
\end{Soutput}
\begin{Sinput}
-> round(head(dapc1$posterior), 3)
+> round(head(dapc1$posterior),3)
\end{Sinput}
\begin{Soutput}
- 1 2 3 4 5 6
-001 1.000 0 0 0 0.000 0
-002 1.000 0 0 0 0.000 0
-003 1.000 0 0 0 0.000 0
-004 0.016 0 0 0 0.984 0
-005 1.000 0 0 0 0.000 0
-006 1.000 0 0 0 0.000 0
+ 1 2 3 4 5 6
+001 0 0.000 0 0 0 1.000
+002 0 0.000 0 0 0 1.000
+003 0 0.000 0 0 0 1.000
+004 0 0.984 0 0 0 0.016
+005 0 0.000 0 0 0 1.000
+006 0 0.000 0 0 0 1.000
\end{Soutput}
\end{Schunk}
Each row corresponds to an individual, each column to a group.
@@ -744,17 +729,17 @@
$assign.per.pop
1 2 3 4 5 6
-1.0000000 1.0000000 0.9901961 1.0000000 0.9904762 1.0000000
+1.0000000 0.9904762 0.9901961 1.0000000 1.0000000 1.0000000
$prior.grp.size
1 2 3 4 5 6
- 98 97 102 99 105 99
+ 97 105 102 99 99 98
$post.grp.size
1 2 3 4 5 6
- 99 97 101 99 105 99
+ 97 105 101 99 99 99
\end{Soutput}
\end{Schunk}
The slot \texttt{assign.per.pop} indicates the proportions of successful reassignment (based on
@@ -766,10 +751,10 @@
options); here, we choose to represent only the first 50 individuals to make the figure readable:
\begin{Schunk}
\begin{Sinput}
-> assignplot(dapc1, subset = 1:50)
+> assignplot(dapc1, subset=1:50)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-022}
+\includegraphics{dapc-022}
\noindent
This figure is the simple graphical translation of the \texttt{posterior} table above. Heat colors
@@ -787,27 +772,25 @@
We can plot information of all individuals to have a global picture of the clusters composition.
\begin{Schunk}
\begin{Sinput}
-> compoplot(dapc1, posi = "bottomright", txt.leg = paste("Cluster",
-+ 1:6), lab = "", ncol = 1, xlab = "individuals")
+> compoplot(dapc1, posi="bottomright", txt.leg=paste("Cluster", 1:6), lab="", ncol=1, xlab="individuals")
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-023}
+\includegraphics{dapc-023}
\noindent We can also have a closer look at a subset of individuals; for instance, for the first 50 individuals:
\begin{Schunk}
\begin{Sinput}
-> compoplot(dapc1, subset = 1:50, posi = "bottomright", txt.leg = paste("Cluster",
-+ 1:6), lab = "", ncol = 2, xlab = "individuals")
+> compoplot(dapc1, subset=1:50, posi="bottomright", txt.leg=paste("Cluster", 1:6), lab="", ncol=2, xlab="individuals")
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-024}
+\includegraphics{dapc-024}
Obviously, we can use the power of R to lead our investigation further. For instance, which are the
most 'admixed' individuals?
Let us consider as admixed individuals having no more than 90\% of probability of membership in a single cluster:
\begin{Schunk}
\begin{Sinput}
-> temp <- which(apply(dapc1$posterior, 1, function(e) all(e < 0.9)))
+> temp <- which(apply(dapc1$posterior,1, function(e) all(e<0.9)))
> temp
\end{Sinput}
\begin{Soutput}
@@ -815,11 +798,10 @@
21 47 243 280
\end{Soutput}
\begin{Sinput}
-> compoplot(dapc1, subset = temp, posi = "bottomright", txt.leg = paste("Cluster",
-+ 1:6), ncol = 2)
+> compoplot(dapc1, subset=temp, posi="bottomright", txt.leg=paste("Cluster", 1:6), ncol=2)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-025}
+\includegraphics{dapc-025}
@@ -881,18 +863,16 @@
@other: a list containing: coun breed spe
\end{Soutput}
\begin{Sinput}
-> temp <- summary(dapc(microbov, n.da = 100, n.pca = 3))$assign.per.pop *
-+ 100
+> temp <- summary(dapc(microbov, n.da=100, n.pca=3))$assign.per.pop*100
\end{Sinput}
\end{Schunk}
\begin{Schunk}
\begin{Sinput}
-> par(mar = c(4.5, 7.5, 1, 1))
-> barplot(temp, xlab = "% of reassignment to actual breed", horiz = TRUE,
-+ las = 1)
+> par(mar=c(4.5,7.5,1,1))
+> barplot(temp, xlab="% of reassignment to actual breed", horiz=TRUE, las=1)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-027}
+\includegraphics{dapc-027}
\noindent
We can see that some breeds are well discriminated (e.g. Zebu, Lagunaire, > 90\%) while others are
@@ -901,18 +881,16 @@
We repeat the analysis, this time keeping 300 PCs:
\begin{Schunk}
\begin{Sinput}
-> temp <- summary(dapc(microbov, n.da = 100, n.pca = 300))$assign.per.pop *
-+ 100
+> temp <- summary(dapc(microbov, n.da=100, n.pca=300))$assign.per.pop*100
\end{Sinput}
\end{Schunk}
\begin{Schunk}
\begin{Sinput}
-> par(mar = c(4.5, 7.5, 1, 1))
-> barplot(temp, xlab = "% of reassignment to actual breed", horiz = TRUE,
-+ las = 1)
+> par(mar=c(4.5,7.5,1,1))
+> barplot(temp, xlab="% of reassignment to actual breed", horiz=TRUE, las=1)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-029}
+\includegraphics{dapc-029}
\noindent We now obtain almost 100\% of discrimination for all groups.
Is this result satisfying? Actually not.
@@ -923,18 +901,16 @@
\begin{Sinput}
> x <- microbov
> pop(x) <- sample(pop(x))
-> temp <- summary(dapc(x, n.da = 100, n.pca = 300))$assign.per.pop *
-+ 100
+> temp <- summary(dapc(x, n.da=100, n.pca=300))$assign.per.pop*100
\end{Sinput}
\end{Schunk}
\begin{Schunk}
\begin{Sinput}
-> par(mar = c(4.5, 7.5, 1, 1))
-> barplot(temp, xlab = "% of reassignment to actual breed", horiz = TRUE,
-+ las = 1)
+> par(mar=c(4.5,7.5,1,1))
+> barplot(temp, xlab="% of reassignment to actual breed", horiz=TRUE, las=1)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-031}
+\includegraphics{dapc-031}
\noindent
Groups have been randomised, and yet we still get very good discrimination.
@@ -955,7 +931,7 @@
groups, and computing $a$-scores for each group, as well as the average $a$-score:
\begin{Schunk}
\begin{Sinput}
-> dapc2 <- dapc(microbov, n.da = 100, n.pca = 10)
+> dapc2 <- dapc(microbov, n.da=100, n.pca=10)
> temp <- a.score(dapc2)
> names(temp)
\end{Sinput}
@@ -963,7 +939,7 @@
[1] "tab" "pop.score" "mean"
\end{Soutput}
\begin{Sinput}
-> temp$tab[1:5, 1:5]
+> temp$tab[1:5,1:5]
\end{Sinput}
\begin{Soutput}
Borgou Zebu Lagunaire NDama Somba
@@ -995,7 +971,7 @@
The number of retained PCs can be chosen so as to optimize the $a$-score; this is achived by \texttt{optim.a.score}:
\begin{Schunk}
\begin{Sinput}
-> dapc2 <- dapc(microbov, n.da = 100, n.pca = 50)
+> dapc2 <- dapc(microbov, n.da=100, n.pca=50)
\end{Sinput}
\end{Schunk}
\begin{Schunk}
@@ -1004,7 +980,7 @@
\end{Sinput}
\end{Schunk}
\begin{center}
- \includegraphics[width=.7\textwidth]{figs/ascore.pdf}
+ \includegraphics[width=.7\textwidth]{ascore.pdf}
\end{center}
@@ -1019,23 +995,23 @@
We perform the analysis with 20 PCs retained, and then map the membership probabilities as before:
\begin{Schunk}
\begin{Sinput}
-> dapc3 <- dapc(microbov, n.da = 100, n.pca = 20)
+> dapc3 <- dapc(microbov, n.da=100, n.pca=20)
> myCol <- rainbow(15)
\end{Sinput}
\end{Schunk}
\begin{Schunk}
\begin{Sinput}
-> par(mar = c(5.1, 4.1, 1.1, 1.1), xpd = TRUE)
-> compoplot(dapc3, lab = "", posi = list(x = 12, y = -0.01), cleg = 0.7)
+> par(mar=c(5.1,4.1,1.1,1.1), xpd=TRUE)
+> compoplot(dapc3, lab="", posi=list(x=12,y=-.01), cleg=.7)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-036}
+\includegraphics{dapc-036}
And as before, we can investigate further admixed individuals, which we arbitrarily define as those
having no more than 0.5 probability of membership to any group:
\begin{Schunk}
\begin{Sinput}
-> temp <- which(apply(dapc3$posterior, 1, function(e) all(e < 0.5)))
+> temp <- which(apply(dapc3$posterior,1, function(e) all(e<0.5)))
> temp
\end{Sinput}
\begin{Soutput}
@@ -1052,12 +1028,11 @@
\end{Soutput}
\begin{Sinput}
> lab <- pop(microbov)
-> par(mar = c(8, 4, 5, 1), xpd = TRUE)
-> compoplot(dapc3, subset = temp, cleg = 0.6, posi = list(x = 0,
-+ y = 1.2), lab = lab)
+> par(mar=c(8,4,5,1), xpd=TRUE)
+> compoplot(dapc3, subset=temp, cleg=.6, posi=list(x=0,y=1.2),lab=lab)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-037}
+\includegraphics{dapc-037}
\noindent Admixture appears to be the strongest between a few breeds (Blonde d'Aquitaine, Bretonne Pie-Noire,
Limousine and Gascone). Some features are fairly surprising; for instance, the last individual is
@@ -1102,8 +1077,7 @@
\begin{Sinput}
> data(microbov)
> set.seed(2)
-> kept.id <- unlist(tapply(1:nInd(microbov), pop(microbov), function(e) sample(e,
-+ 20, replace = FALSE)))
+> kept.id <- unlist(tapply(1:nInd(microbov), pop(microbov), function(e) sample(e, 20,replace=FALSE)))
> x <- microbov[kept.id]
> x.sup <- microbov[-kept.id]
> nInd(x)
@@ -1126,8 +1100,8 @@
supplementary individuals:
\begin{Schunk}
\begin{Sinput}
-> dapc4 <- dapc(x, n.pca = 20, n.da = 15)
-> pred.sup <- predict.dapc(dapc4, newdata = x.sup)
+> dapc4 <- dapc(x,n.pca=20,n.da=15)
+> pred.sup <- predict.dapc(dapc4, newdata=x.sup)
> names(pred.sup)
\end{Sinput}
\begin{Soutput}
@@ -1141,7 +1115,7 @@
15 Levels: Borgou Zebu Lagunaire NDama Somba Aubrac ... Salers
\end{Soutput}
\begin{Sinput}
-> pred.sup$ind.scores[1:5, 1:3]
+> pred.sup$ind.scores[1:5,1:3]
\end{Sinput}
\begin{Soutput}
LD1 LD2 LD3
@@ -1152,7 +1126,7 @@
005 -4.718570 -0.200391 -0.9196541
\end{Soutput}
\begin{Sinput}
-> round(pred.sup$posterior[1:5, 1:5], 3)
+> round(pred.sup$posterior[1:5, 1:5],3)
\end{Sinput}
\begin{Soutput}
Borgou Zebu Lagunaire NDama Somba
@@ -1173,26 +1147,23 @@
\begin{Schunk}
\begin{Sinput}
> col <- rainbow(length(levels(pop(x))))
-> col.points <- transp(col[as.integer(pop(x))], 0.2)
-> scatter(dapc4, col = col, bg = "white", scree.da = 0, pch = "",
-+ cstar = 0, clab = 0, xlim = c(-10, 10), legend = TRUE)
-> par(xpd = TRUE)
-> points(dapc4$ind.coord[, 1], dapc4$ind.coord[, 2], pch = 20,
-+ col = col.points, cex = 5)
+> col.points <- transp(col[as.integer(pop(x))],.2)
+> scatter(dapc4, col=col, bg="white", scree.da=0, pch="", cstar=0, clab=0, xlim=c(-10,10), legend=TRUE)
+> par(xpd=TRUE)
+> points(dapc4$ind.coord[,1], dapc4$ind.coord[,2], pch=20, col=col.points, cex=5)
> col.sup <- col[as.integer(pop(x.sup))]
-> points(pred.sup$ind.scores[, 1], pred.sup$ind.scores[, 2], pch = 15,
-+ col = transp(col.sup, 0.7), cex = 2)
-> add.scatter.eig(dapc4$eig, 15, 1, 2, posi = "bottomright", inset = 0.02)
+> points(pred.sup$ind.scores[,1], pred.sup$ind.scores[,2], pch=15, col=transp(col.sup,.7), cex=2)
+> add.scatter.eig(dapc4$eig,15,1,2, posi="bottomright", inset=.02)
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-040}
+\includegraphics{dapc-040}
\noindent Light dots and ellipses correspond to the original analysis, while more solid squares indicate
supplementary individuals.
Results are fairly satisfying:
\begin{Schunk}
\begin{Sinput}
-> mean(as.character(pred.sup$assign) == as.character(pop(x.sup)))
+> mean(as.character(pred.sup$assign)==as.character(pop(x.sup)))
\end{Sinput}
\begin{Soutput}
[1] 0.7549505
@@ -1204,10 +1175,10 @@
table of the actual cluster \textit{vs} the inferred one:
\begin{Schunk}
\begin{Sinput}
-> table.value(table(pred.sup$assign, pop(x.sup)), col.lab = levels(pop(x.sup)))
+> table.value(table(pred.sup$assign, pop(x.sup)), col.lab=levels(pop(x.sup)))
\end{Sinput}
\end{Schunk}
-\includegraphics{figs/dapc-042}
+\includegraphics{dapc-042}
\noindent Columns correspond to actual clusters of the supplementary individuals, while rows
correspond to inferred clusters.
Modified: www/literature.html
===================================================================
--- www/literature.html 2013-04-30 09:59:19 UTC (rev 1120)
+++ www/literature.html 2013-05-08 13:19:07 UTC (rev 1121)
@@ -63,6 +63,7 @@
+
the bublisher's website]
@@ -95,6 +96,7 @@
+
abstract]
- the paper presenting the spatial
@@ -112,6 +114,7 @@
+
principal component analysis (sPCA, function spca), global and
@@ -132,6 +135,7 @@
+
cryptic spatial patterns in genetic variability by a new
multivariate method. Heredity
101: 92-103. doi:
@@ -153,6 +157,7 @@
+
abstract]
@@ -176,6 +181,7 @@
+
simulations of genealoies of haplotypes (haploGen):
Jombart T, Eggo RM, Dodd PJ, Balloux F (2010)
@@ -199,6 +205,7 @@
+
of Principal Components (DAPC, functions find.clusters
@@ -230,6 +237,7 @@
+
Behaviour76:
87-95.
@@ -250,6 +258,7 @@
+
Genomics9: 256.
@@ -285,6 +294,7 @@
+
marmota.Molecular
@@ -299,6 +309,7 @@
+
Ecology 18:
1491-1503.
@@ -352,6 +363,7 @@
+
australis in North America. Biological Invasions. doi:
10.1007/s10530-010-9699-6.
@@ -509,6 +521,7 @@
+
Oct 6. [Epub ahead of print]
[24] SANTOS, H., BURBAN, C., ROUSSELET, J.,
@@ -527,6 +540,7 @@
+
pityocampa, Lepidoptera, Notodontidae). Journal of Evolutionary Biology,
no. doi: 10.1111/j.1420-9101.2010.02147.x
@@ -550,6 +564,7 @@
+
Vol. Sci. Pap. ICCAT, 65(3): 988-995
[26] 2010 Population Genetic
@@ -585,6 +601,7 @@
+
ONE5(11): e13810. doi:10.1371/journal.pone.0013810
@@ -615,6 +632,7 @@
+
DOI: 10.1007/s10329-010-0232-4
* adegenet not or wrongly cited, but actually
used in the paper.
From noreply at r-forge.r-project.org Tue May 14 11:17:14 2013
From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org)
Date: Tue, 14 May 2013 11:17:14 +0200 (CEST)
Subject: [adegenet-commits] r1122 - www
Message-ID: <20130514091714.E605B18508E@r-forge.r-project.org>
Author: jombart
Date: 2013-05-14 11:17:14 +0200 (Tue, 14 May 2013)
New Revision: 1122
Modified:
www/literature.html
Log:
+1 ref
Modified: www/literature.html
===================================================================
--- www/literature.html 2013-05-08 13:19:07 UTC (rev 1121)
+++ www/literature.html 2013-05-14 09:17:14 UTC (rev 1122)
@@ -64,6 +64,7 @@
+
the bublisher's website]
@@ -97,6 +98,7 @@
+
abstract]
- the paper presenting the spatial
@@ -115,6 +117,7 @@
+
principal component analysis (sPCA, function spca), global and
@@ -136,6 +139,7 @@
+
cryptic spatial patterns in genetic variability by a new
multivariate method. Heredity
101: 92-103. doi:
@@ -158,6 +162,7 @@
+
abstract]
@@ -182,6 +187,7 @@
+
simulations of genealoies of haplotypes (haploGen):
Jombart T, Eggo RM, Dodd PJ, Balloux F (2010)
@@ -206,6 +212,7 @@
+
of Principal Components
[24] SANTOS, H., BURBAN, C., ROUSSELET, J.,
@@ -541,6 +554,7 @@
+
pityocampa, Lepidoptera, Notodontidae). Journal of Evolutionary Biology,
no. doi: 10.1111/j.1420-9101.2010.02147.x
@@ -565,6 +579,7 @@
+
Vol. Sci. Pap. ICCAT, 65(3): 988-995
[26] 2010 Population Genetic
@@ -602,6 +618,7 @@
+
ONE5(11): e13810. doi:10.1371/journal.pone.0013810
@@ -633,6 +650,7 @@
+
DOI: 10.1007/s10329-010-0232-4
* adegenet not or wrongly cited, but actually
used in the paper.
From noreply at r-forge.r-project.org Tue May 14 15:39:36 2013
From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org)
Date: Tue, 14 May 2013 15:39:36 +0200 (CEST)
Subject: [adegenet-commits] r1123 - in pkg: . R man
Message-ID: <20130514133936.149951851D8@r-forge.r-project.org>
Author: jombart
Date: 2013-05-14 15:39:35 +0200 (Tue, 14 May 2013)
New Revision: 1123
Modified:
pkg/ChangeLog
pkg/R/auxil.R
pkg/man/auxil.Rd
Log:
Fixes to auxiliary color functions
Modified: pkg/ChangeLog
===================================================================
--- pkg/ChangeLog 2013-05-14 09:17:14 UTC (rev 1122)
+++ pkg/ChangeLog 2013-05-14 13:39:35 UTC (rev 1123)
@@ -1,3 +1,16 @@
+ CHANGES IN ADEGENET VERSION 1.3-8
+
+NEW FEATURES
+
+ o new palettes: azur, wasp
+
+ o new function any2col translates (numeric, factor, character)
+ vectors into colors, also providing information for a legend
+
+ o
+
+
+
CHANGES IN ADEGENET VERSION 1.3-7
NEW FEATURES
Modified: pkg/R/auxil.R
===================================================================
--- pkg/R/auxil.R 2013-05-14 09:17:14 UTC (rev 1122)
+++ pkg/R/auxil.R 2013-05-14 13:39:35 UTC (rev 1123)
@@ -197,7 +197,7 @@
## translate numeric values into colors of a palette
num2col <- function(x, col.pal=heat.colors, reverse=FALSE,
x.min=min(x), x.max=max(x), na.col="green"){
- if(any(is.na(x))) warning("NAs detected in x")
+ ## if(any(is.na(x))) warning("NAs detected in x")
x[x < x.min] <- x.min
x[x > x.max] <- x.max
x <- x-x.min # min=0
@@ -225,17 +225,21 @@
###########
## translate a factor into colors of a palette
## colors are randomized based on the provided seed
-fac2col <- function(x, col.pal=funky, na.col="grey", seed=1){
+fac2col <- function(x, col.pal=funky, na.col="grey", seed=NULL){
## get factors and levels
x <- factor(x)
lev <- levels(x)
nlev <- length(lev)
## get colors corresponding to levels
- set.seed(seed)
- newseed <- round(runif(1,1,1e9))
- on.exit(set.seed(newseed))
- col <- sample(col.pal(nlev))
+ if(!is.null(seed)){
+ set.seed(seed)
+ newseed <- round(runif(1,1,1e9))
+ on.exit(set.seed(newseed))
+ col <- sample(col.pal(nlev))
+ } else {
+ col <- col.pal(nlev)
+ }
## get output colors
res <- rep(na.col, length(x))
@@ -246,20 +250,45 @@
}
+###########
+## any2col
+###########
+any2col <- function(x, col.pal=seasun, na.col="transparent"){
+ ## handle numeric data
+ if(is.numeric(x)){
+ col <- num2col(x, col.pal=col.pal)
+ leg.col <- num2col(pretty(x), x.min=min(x, na.rm=TRUE),
+ x.max=max(x, na.rm=TRUE), col.pal=col.pal,
+ na.col=na.col)
+ leg.txt <- pretty(x)
+ } else{ ## handle factor
+ x <- factor(x)
+ col <- fac2col(x, col.pal=col.pal)
+ leg.col <- col.pal(length(levels(x)))
+ leg.txt <- levels(x)
+ }
+
+ return(list(col=col, leg.col=leg.col, leg.txt=leg.txt))
+} # end any2col
+
+
+
## pre-defined palettes ##
## mono color
bluepal <- colorRampPalette(c("lightgrey","blue"))
redpal <- colorRampPalette(c("lightgrey","red"))
-greenpal <- colorRampPalette(c("lightgrey","green"))
+greenpal <- colorRampPalette(c("lightgrey","green3"))
## bi-color
-flame <- colorRampPalette(c("gold","red"))
+flame <- colorRampPalette(c("gold","red3"))
+azur <- colorRampPalette(c("gold","royalblue"))
## tri-color
seasun <- colorRampPalette(c("blue","gold","red"))
lightseasun <- colorRampPalette(c("deepskyblue2","gold","red1"))
deepseasun <- colorRampPalette(c("blue2","gold","red2"))
+wasp <- colorRampPalette(c("yellow2","brown", "black"))
## psychedelic
-funky <- colorRampPalette(c("blue","green3","gold","orange","red","brown4","purple"))
+funky <- colorRampPalette(c("blue","green3","gold","orange","red","brown4","purple","pink2"))
Modified: pkg/man/auxil.Rd
===================================================================
--- pkg/man/auxil.Rd 2013-05-14 09:17:14 UTC (rev 1122)
+++ pkg/man/auxil.Rd 2013-05-14 13:39:35 UTC (rev 1123)
@@ -10,14 +10,17 @@
\alias{corner}
\alias{num2col}
\alias{fac2col}
+\alias{any2col}
\alias{transp}
\alias{bluepal}
\alias{redpal}
\alias{greenpal}
\alias{flame}
+\alias{azur}
\alias{seasun}
\alias{lightseasun}
\alias{deepseasun}
+\alias{wasp}
\alias{funky}
\title{ Auxiliary functions for adegenet}
@@ -27,41 +30,51 @@
variables (numeric or factors) onto a color scale, adding transparency
to existing colors, pre-defined color palettes, extra functions to
access documentation, and low-level treatment of character vectors.
-
+
These functions are mostly auxiliary procedures used internally in
adegenet, with the exception of, which opens the
adegenet website in the default navigator.\cr
- These items include:\cr
- - \code{adegenetWeb}: opens the adegenet website in a web navigator
- - \code{num2col}: translates a numeric vector into colors. \cr
- - \code{fac2col}: translates a numeric vector into colors. \cr
- - \code{transp}: adds transparency to a vector of colors. Note that
- transparent colors are not supported on some graphical devices.\cr
- - \code{corner}: adds text to a corner of a figure. \cr
- - \code{checkType}: checks the type of markers being used in a
- function and issues an error if appropriate.\cr
- - \code{.rmspaces}: remove peripheric spaces in a character string. \cr
- - \code{.genlab}: generate labels in a correct alphanumeric ordering. \cr
- - \code{.readExt}: read the extension of a given file. \cr
+ These items include:
+ \itemize{
+ \item \code{adegenetWeb}: opens the adegenet website in a web navigator
+ \item \code{num2col}: translates a numeric vector into colors.
+ \item \code{fac2col}: translates a factor into colors.
+ \item \code{any2col}: translates a vector of type numeric, character
+ or factor into colors.
+ \item \code{transp}: adds transparency to a vector of colors. Note that
+ transparent colors are not supported on some graphical devices.
+ \item \code{corner}: adds text to a corner of a figure.
+ \item \code{checkType}: checks the type of markers being used in a
+ function and issues an error if appropriate.
+ \item \code{.rmspaces}: remove peripheric spaces in a character string.
+ \item \code{.genlab}: generate labels in a correct alphanumeric ordering.
+ \item \code{.readExt}: read the extension of a given file.
+ }
- Color palettes include:\cr
- - \code{bluepal}: white->blue\cr
- - \code{redpal}: white->red\cr
- - \code{greenpal}: white->green\cr
- - \code{flame}: gold->red\cr
- - \code{seasun}: blue->gold->red\cr
- - \code{lightseasun}: blue->gold->red (light variant)\cr
- - \code{deepseasun}: blue->gold->red (deep variant)\cr
- - \code{funky}: many colors\cr
+ Color palettes include:
+ \itemize{
+ \item \code{bluepal}: white->blue
+ \item \code{redpal}: white->red
+ \item \code{greenpal}: white->green
+ \item \code{flame}: gold->red
+ \item \code{azur}: gold->blue
+ \item \code{seasun}: blue->gold->red
+ \item \code{lightseasun}: blue->gold->red (light variant)
+ \item \code{deepseasun}: blue->gold->red (deep variant)
+ \item \code{wasp}: gold->brown->black
+ \item \code{funky}: many colors
+ }
}
\usage{
adegenetWeb()
.genlab(base, n)
corner(text, posi="topleft", inset=0.1, \dots)
num2col(x, col.pal=heat.colors, reverse=FALSE,
- x.min=min(x), x.max=max(x), na.col="green")
-fac2col(x, col.pal=funky, na.col="grey", seed=1)
+ x.min=min(x,na.rm=TRUE), x.max=max(x,na.rm=TRUE),
+ na.col="green")
+fac2col(x, col.pal=funky, na.col="grey", seed=NULL)
+any2col(x, col.pal=seasun, na.col="transparent")
transp(col, alpha=.5)
}
\arguments{
@@ -82,13 +95,19 @@
\item{x.max}{the maximal value from which to start the color scale}
\item{na.col}{the color to be used for missing values (NAs)}
\item{seed}{a seed for R's random number generated, used to fix the
- random permutation of colors in the palette used.}
+ random permutation of colors in the palette used; if NULL, no
+ randomization is used and the colors are taken from the palette
+ according to the ordering of the levels.}
\item{col}{a vector of colors}
\item{alpha}{a numeric value between 0 and 1 representing the alpha
- coefficient; 0: total transparency; 1: no transparency.}
+ coefficient; 0: total transparency; 1: no transparency.}
}
\value{
For \code{.genlab}, a character vector of size "n".
+ \code{num2col} and \code{fac2col} return a vector of
+ colors. \code{any2col} returns a list with the following components:
+ \code{$col} (a vector of colors), \code{$leg.col} (colors for the
+ legend), and \code{$leg.txt} (text for the legend).
}
\author{Thibaut Jombart \email{t.jombart at imperial.ac.uk} }
\examples{
@@ -108,8 +127,8 @@
plot(1:100, col=num2col(1:100), pch=20, cex=4)
plot(1:100, col=num2col(1:100, col.pal=bluepal), pch=20, cex=4)
plot(1:100, col=num2col(1:100, col.pal=flame), pch=20, cex=4)
-plot(1:100, col=num2col(1:100, col.pal=seasun), pch=20, cex=4)
-plot(1:100, col=num2col(1:100, col.pal=seasun,rev=TRUE), pch=20, cex=4)
+plot(1:100, col=num2col(1:100, col.pal=wasp), pch=20, cex=4)
+plot(1:100, col=num2col(1:100, col.pal=azur,rev=TRUE), pch=20, cex=4)
## factor as colors using fac2col
dat <- cbind(c(rnorm(50,8), rnorm(100), rnorm(150,3),
@@ -119,5 +138,16 @@
plot(dat, col=transp(fac2col(fac)), pch=19, cex=4)
plot(dat, col=transp(fac2col(fac,seed=2)), pch=19, cex=4)
+## use of any2col
+x <- factor(1:10)
+col.info <- any2col(x, col.pal=funky)
+plot(x, col=col.info$col, main="Use of any2col on a factor")
+legend("bottomleft", fill=col.info$leg.col, legend=col.info$leg.txt, bg="white")
+
+x <- 100:1
+col.info <- any2col(x, col.pal=wasp)
+barplot(x, col=col.info$col, main="Use of any2col on a numeric")
+legend("bottomleft", fill=col.info$leg.col, legend=col.info$leg.txt, bg="white")
+
}
\keyword{manip}
\ No newline at end of file
From noreply at r-forge.r-project.org Tue May 14 15:39:54 2013
From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org)
Date: Tue, 14 May 2013 15:39:54 +0200 (CEST)
Subject: [adegenet-commits] r1124 - pkg/R
Message-ID: <20130514133954.84C9B184988@r-forge.r-project.org>
Author: jombart
Date: 2013-05-14 15:39:54 +0200 (Tue, 14 May 2013)
New Revision: 1124
Modified:
pkg/R/auxil.R
Log:
Fixes to auxiliary color functions
Modified: pkg/R/auxil.R
===================================================================
--- pkg/R/auxil.R 2013-05-14 13:39:35 UTC (rev 1123)
+++ pkg/R/auxil.R 2013-05-14 13:39:54 UTC (rev 1124)
@@ -196,7 +196,8 @@
###########
## translate numeric values into colors of a palette
num2col <- function(x, col.pal=heat.colors, reverse=FALSE,
- x.min=min(x), x.max=max(x), na.col="green"){
+ x.min=min(x,na.rm=TRUE), x.max=max(x,na.rm=TRUE),
+ na.col="green"){
## if(any(is.na(x))) warning("NAs detected in x")
x[x < x.min] <- x.min
x[x > x.max] <- x.max
From noreply at r-forge.r-project.org Tue May 14 16:30:32 2013
From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org)
Date: Tue, 14 May 2013 16:30:32 +0200 (CEST)
Subject: [adegenet-commits] r1125 - pkg
Message-ID: <20130514143032.5DB131855F7@r-forge.r-project.org>
Author: greatsage
Date: 2013-05-14 16:30:28 +0200 (Tue, 14 May 2013)
New Revision: 1125
Modified:
pkg/ChangeLog
Log:
updated Changelog -- Fede
Modified: pkg/ChangeLog
===================================================================
--- pkg/ChangeLog 2013-05-14 13:39:54 UTC (rev 1124)
+++ pkg/ChangeLog 2013-05-14 14:30:28 UTC (rev 1125)
@@ -7,7 +7,8 @@
o new function any2col translates (numeric, factor, character)
vectors into colors, also providing information for a legend
- o
+ o new function xval.dapc (and its wrapper xval), that performs
+ cross-validation for a dapc analysis.
From noreply at r-forge.r-project.org Tue May 14 17:20:13 2013
From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org)
Date: Tue, 14 May 2013 17:20:13 +0200 (CEST)
Subject: [adegenet-commits] r1126 - pkg/R
Message-ID: <20130514152013.AF17018444A@r-forge.r-project.org>
Author: jombart
Date: 2013-05-14 17:20:13 +0200 (Tue, 14 May 2013)
New Revision: 1126
Modified:
pkg/R/auxil.R
Log:
fixed handling of missing data
Modified: pkg/R/auxil.R
===================================================================
--- pkg/R/auxil.R 2013-05-14 14:30:28 UTC (rev 1125)
+++ pkg/R/auxil.R 2013-05-14 15:20:13 UTC (rev 1126)
@@ -257,14 +257,14 @@
any2col <- function(x, col.pal=seasun, na.col="transparent"){
## handle numeric data
if(is.numeric(x)){
- col <- num2col(x, col.pal=col.pal)
+ col <- num2col(x, col.pal=col.pal, na.col=na.col)
leg.col <- num2col(pretty(x), x.min=min(x, na.rm=TRUE),
x.max=max(x, na.rm=TRUE), col.pal=col.pal,
na.col=na.col)
leg.txt <- pretty(x)
} else{ ## handle factor
x <- factor(x)
- col <- fac2col(x, col.pal=col.pal)
+ col <- fac2col(x, col.pal=col.pal, na.col=na.col)
leg.col <- col.pal(length(levels(x)))
leg.txt <- levels(x)
}
From noreply at r-forge.r-project.org Tue May 14 17:21:11 2013
From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org)
Date: Tue, 14 May 2013 17:21:11 +0200 (CEST)
Subject: [adegenet-commits] r1127 - in pkg: R man
Message-ID: <20130514152111.30C1918444A@r-forge.r-project.org>
Author: jombart
Date: 2013-05-14 17:21:10 +0200 (Tue, 14 May 2013)
New Revision: 1127
Modified:
pkg/R/auxil.R
pkg/man/auxil.Rd
Log:
fixed handling of missing data
Modified: pkg/R/auxil.R
===================================================================
--- pkg/R/auxil.R 2013-05-14 15:20:13 UTC (rev 1126)
+++ pkg/R/auxil.R 2013-05-14 15:21:10 UTC (rev 1127)
@@ -197,7 +197,7 @@
## translate numeric values into colors of a palette
num2col <- function(x, col.pal=heat.colors, reverse=FALSE,
x.min=min(x,na.rm=TRUE), x.max=max(x,na.rm=TRUE),
- na.col="green"){
+ na.col="transparent"){
## if(any(is.na(x))) warning("NAs detected in x")
x[x < x.min] <- x.min
x[x > x.max] <- x.max
@@ -226,7 +226,7 @@
###########
## translate a factor into colors of a palette
## colors are randomized based on the provided seed
-fac2col <- function(x, col.pal=funky, na.col="grey", seed=NULL){
+fac2col <- function(x, col.pal=funky, na.col="transparent", seed=NULL){
## get factors and levels
x <- factor(x)
lev <- levels(x)
Modified: pkg/man/auxil.Rd
===================================================================
--- pkg/man/auxil.Rd 2013-05-14 15:20:13 UTC (rev 1126)
+++ pkg/man/auxil.Rd 2013-05-14 15:21:10 UTC (rev 1127)
@@ -72,8 +72,8 @@
corner(text, posi="topleft", inset=0.1, \dots)
num2col(x, col.pal=heat.colors, reverse=FALSE,
x.min=min(x,na.rm=TRUE), x.max=max(x,na.rm=TRUE),
- na.col="green")
-fac2col(x, col.pal=funky, na.col="grey", seed=NULL)
+ na.col="transparent")
+fac2col(x, col.pal=funky, na.col="transparent", seed=NULL)
any2col(x, col.pal=seasun, na.col="transparent")
transp(col, alpha=.5)
}
From noreply at r-forge.r-project.org Tue May 14 23:54:19 2013
From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org)
Date: Tue, 14 May 2013 23:54:19 +0200 (CEST)
Subject: [adegenet-commits] r1128 - in pkg: . R man vignettes
Message-ID: <20130514215419.9D7BB184ECF@r-forge.r-project.org>
Author: jombart
Date: 2013-05-14 23:54:19 +0200 (Tue, 14 May 2013)
New Revision: 1128
Modified:
pkg/ChangeLog
pkg/R/dapc.R
pkg/R/find.clust.R
pkg/R/scale.R
pkg/R/spca.R
pkg/man/as.genlight.Rd
pkg/man/ascore.Rd
pkg/man/dapc.Rd
pkg/man/dapcGraphics.Rd
pkg/man/eHGDP.Rd
pkg/man/find.clusters.Rd
pkg/man/inbreeding.Rd
pkg/man/loadingplot.Rd
pkg/man/read.structure.Rd
pkg/man/scale.Rd
pkg/man/seqTrack.Rd
pkg/man/sequences.Rd
pkg/man/spca.Rd
pkg/man/spcaIllus.Rd
pkg/vignettes/adegenet-spca.Rnw
Log:
Fixed release 1.3-8; new version of xvalDapc;removed method argument in scaleGen
Modified: pkg/ChangeLog
===================================================================
--- pkg/ChangeLog 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/ChangeLog 2013-05-14 21:54:19 UTC (rev 1128)
@@ -7,8 +7,8 @@
o new function any2col translates (numeric, factor, character)
vectors into colors, also providing information for a legend
- o new function xval.dapc (and its wrapper xval), that performs
- cross-validation for a dapc analysis.
+ o new function xvalDapc which performs cross-validation for a dapc
+ analysis.
Modified: pkg/R/dapc.R
===================================================================
--- pkg/R/dapc.R 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/R/dapc.R 2013-05-14 21:54:19 UTC (rev 1128)
@@ -64,7 +64,7 @@
## keep relevant PCs - stored in XU
X.rank <- sum(pcaX$eig > 1e-14)
n.pca <- min(X.rank, n.pca)
- if(n.pca >= N) stop("number of retained PCs of PCA is greater than N")
+ if(n.pca >= N) n.pca <- N-1
if(n.pca > N/3) warning("number of retained PCs of PCA may be too large (> N /3)\n results may be unstable ")
n.pca <- round(n.pca)
@@ -152,7 +152,7 @@
## dapc.genind
#############
dapc.genind <- function(x, pop=NULL, n.pca=NULL, n.da=NULL,
- scale=FALSE, scale.method=c("sigma", "binom"), truenames=TRUE, var.contrib=TRUE, pca.info=TRUE,
+ scale=FALSE, truenames=TRUE, var.contrib=TRUE, pca.info=TRUE,
pca.select=c("nbEig","percVar"), perc.pca=NULL, ...){
## FIRST CHECKS
@@ -176,7 +176,7 @@
## PERFORM PCA ##
maxRank <- min(dim(x at tab))
- X <- scaleGen(x, center = TRUE, scale = scale, method = scale.method,
+ X <- scaleGen(x, center = TRUE, scale = scale,
missing = "mean", truenames = truenames)
## CALL DATA.FRAME METHOD ##
@@ -299,7 +299,7 @@
N <- nInd(x)
X.rank <- sum(pcaX$eig > 1e-14)
n.pca <- min(X.rank, n.pca)
- if(n.pca >= N) stop("number of retained PCs of PCA is greater than N")
+ if(n.pca >= N) n.pca <- N-1
if(n.pca > N/3) warning("number of retained PCs of PCA may be too large (> N /3)\n results may be unstable ")
U <- pcaX$loadings[, 1:n.pca, drop=FALSE] # principal axes
@@ -983,37 +983,89 @@
-## ############
-## ## crossval
-## ############
+############
+## crossval
+############
-xval.dapc <- function(object, n.pca, n.da, training.set = 90, ...){
- training.set = training.set/100
- kept.id <- unlist(tapply(1:nInd(object), pop(object), function(e) {pop.size = length(e); pop.size.train = round(pop.size * training.set); sample(e, pop.size.train, replace=FALSE)}))
- training <- object[kept.id]
- validating <- object[-kept.id]
- post = vector(mode = 'list', length = n.pca)
- asgn = vector(mode = 'list', length = n.pca)
- ind = vector(mode = 'list', length = n.pca)
- mtch = vector(mode = 'list', length = n.pca)
- for(i in 1:n.pca){
- dapc.base = dapc(training, n.pca = i, n.da = 15)
- dapc.p = predict.dapc(dapc.base, newdata = validating)
- match.prp = mean(as.character(dapc.p$assign)==as.character(pop(validating)))
- post[[i]] = dapc.p$posterior
- asgn[[i]] = dapc.p$assign
- ind[[i]] = dapc.p$ind.score
- mtch[[i]] = match.prp
- }
- res = list(assign = asgn, posterior = post, ind.score = ind, match.prp = mtch)
- return(res)
-} # end of xval.dapc
+xvalDapc <- function (x, ...) UseMethod("xvalDapc")
-xval <- function (object, n.pca, n.da, training.set, ...) UseMethod("xval")
-xval.genind <- function(object, n.pca, n.da, training.set = 90, ...){
- res = xval.dapc(object = object, n.pca = n.pca, n.da = n.da, training.set = training.set)
- return(res)
+xvalDapc.data.frame <- function(x, grp, n.pca.max, n.da=NULL, training.set = 1/2,
+ center=TRUE, scale=FALSE,
+ n.pca=NULL, n.rep=10, ...){
+
+ ## CHECKS ##
+ grp <- factor(grp)
+ n.pca <- n.pca[n.pca>0]
+ if(is.null(n.da)) {
+ n.da <- length(levels(grp))-1
+ }
+
+ ## GET TRAINING SET SIZE ##
+ N <- nrow(x)
+ N.training <- round(N*training.set)
+
+ ## GET FULL PCA ##
+ pcaX <- dudi.pca(x, nf=n.pca.max, scannf=FALSE, center=center, scale=scale)
+ n.pca.max <- min(n.pca.max,pcaX$rank,N.training-1)
+
+ ## DETERMINE N.PCA IF NEEDED ##
+ if(is.null(n.pca)){
+ n.pca <- round(pretty(1:n.pca.max,10))
+ }
+ n.pca <- n.pca[n.pca>0 & n.pca<(N.training-1)]
+
+ ## FUNCTION GETTING THE % OF ACCURATE PREDICTION FOR ONE NUMBER OF PCA PCs ##
+ ## n.pca is a number of retained PCA PCs
+ get.prop.pred <- function(n.pca){
+ f1 <- function(){
+ toKeep <- sample(1:N, N.training)
+ temp.pca <- pcaX
+ temp.pca$li <- temp.pca$li[toKeep,,drop=FALSE]
+ temp.dapc <- suppressWarnings(dapc(x[toKeep,,drop=FALSE], grp[toKeep], n.pca=n.pca, n.da=n.da, dudi=temp.pca))
+ temp.pred <- predict.dapc(temp.dapc, newdata=x[-toKeep,,drop=FALSE])
+ return(mean(temp.pred$assign==grp[-toKeep]))
+ }
+ return(replicate(n.rep, f1()))
+ }
+
+
+ ## GET %SUCCESSFUL OF ACCURATE PREDICTION FOR ALL VALUES ##
+ res.all <- unlist(lapply(n.pca, get.prop.pred))
+ res <- list(success=res.all, n.pca=factor(rep(n.pca, each=n.rep)))
+ return(res)
}
+
+
+xvalDapc.matrix <- xvalDapc.data.frame
+
+
+## There's a bunch of problems down there, commenting it for now?
+## xval.dapc <- function(object, n.pca, n.da, training.set = 90, ...){
+## training.set = training.set/100
+## kept.id <- unlist(tapply(1:nInd(object), pop(object), function(e) {pop.size = length(e); pop.size.train = round(pop.size * training.set); sample(e, pop.size.train, replace=FALSE)})) # this can't work: nInd/pop not defined for DAPC objects
+## training <- object[kept.id]
+## validating <- object[-kept.id]
+## post = vector(mode = 'list', length = n.pca)
+## asgn = vector(mode = 'list', length = n.pca)
+## ind = vector(mode = 'list', length = n.pca)
+## mtch = vector(mode = 'list', length = n.pca)
+## for(i in 1:n.pca){
+## dapc.base = dapc(training, n.pca = i, n.da = 15) # Why 15??
+## dapc.p = predict.dapc(dapc.base, newdata = validating)
+## match.prp = mean(as.character(dapc.p$assign)==as.character(pop(validating)))
+## post[[i]] = dapc.p$posterior
+## asgn[[i]] = dapc.p$assign
+## ind[[i]] = dapc.p$ind.score
+## mtch[[i]] = match.prp
+## }
+## res = list(assign = asgn, posterior = post, ind.score = ind, match.prp = mtch)
+## return(res)
+## } # end of xval.dapc
+
+## xval.genind <- function(object, n.pca, n.da, training.set = 90, ...){
+## res = xval.dapc(object = object, n.pca = n.pca, n.da = n.da, training.set = training.set)
+## return(res)
+## }
## ###############
## ## randtest.dapc
## ###############
Modified: pkg/R/find.clust.R
===================================================================
--- pkg/R/find.clust.R 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/R/find.clust.R 2013-05-14 21:54:19 UTC (rev 1128)
@@ -193,7 +193,7 @@
find.clusters.genind <- function(x, clust=NULL, n.pca=NULL, n.clust=NULL, stat=c("BIC", "AIC", "WSS"), choose.n.clust=TRUE,
criterion=c("diffNgroup", "min","goesup", "smoothNgoesup", "goodfit"),
max.n.clust=round(nrow(x at tab)/10), n.iter=1e5, n.start=10,
- scale=FALSE, scale.method=c("sigma", "binom"), truenames=TRUE, ...){
+ scale=FALSE, truenames=TRUE, ...){
## CHECKS ##
if(!require(ade4, quietly=TRUE)) stop("ade4 library is required.")
@@ -210,7 +210,7 @@
## PERFORM PCA ##
maxRank <- min(dim(x at tab))
- X <- scaleGen(x, center = TRUE, scale = scale, method = scale.method,
+ X <- scaleGen(x, center = TRUE, scale = scale,
missing = "mean", truenames = truenames)
## CALL DATA.FRAME METHOD
Modified: pkg/R/scale.R
===================================================================
--- pkg/R/scale.R 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/R/scale.R 2013-05-14 21:54:19 UTC (rev 1128)
@@ -4,13 +4,11 @@
setGeneric("scaleGen", function(x,...){standardGeneric("scaleGen")})
setMethod("scaleGen", "genind", function(x, center=TRUE, scale=TRUE,
- method=c("sigma", "binom"), missing=c("NA","0","mean"), truenames=TRUE){
+ missing=c("NA","0","mean"), truenames=TRUE){
THRES <- 1e-10
- method <- match.arg(method)
missing <- match.arg(missing)
## checkType(x)
- if(method=="binom" & x at type=="PA") stop("This scaling is not available for presence/absence markers.")
## handle "missing" arg
if(missing %in% c("0","mean")){
@@ -18,7 +16,7 @@
}
## handle specific cases
- if(scale[1] & tolower(method)=="binom"){
+ if(scale[1]){
## get allele freq
temp <- apply(x$tab,2,mean,na.rm=TRUE)
if(x at type=="codom"){
@@ -59,13 +57,11 @@
setMethod("scaleGen", "genpop", function(x, center=TRUE, scale=TRUE,
- method=c("sigma", "binom"), missing=c("NA","0","mean"), truenames=TRUE){
+ missing=c("NA","0","mean"), truenames=TRUE){
THRES <- 1e-10
- method <- match.arg(method)
missing <- match.arg(missing)
## checkType(x)
- if(method=="binom" & x at type=="PA") stop("This scaling is not available for presence/absence markers.")
## make allele frequencies here
if(x at type=="codom"){
@@ -75,7 +71,7 @@
}
## handle specific cases
- if(scale[1] & tolower(method)=="binom"){
+ if(scale[1]){
## get allele freq
temp <- apply(X,2,mean,na.rm=TRUE)
if(x at type=="codom"){
Modified: pkg/R/spca.R
===================================================================
--- pkg/R/spca.R 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/R/spca.R 2013-05-14 21:54:19 UTC (rev 1128)
@@ -16,7 +16,7 @@
# spca genind
################
spca <- function(obj, xy=NULL, cn=NULL, matWeight=NULL,
- scale=FALSE, scale.method=c("sigma","binom"),
+ scale=FALSE,
scannf=TRUE, nfposi=1, nfnega=1,
type=NULL, ask=TRUE, plot.nb=TRUE, edit.nb=FALSE,
truenames=TRUE, d1=NULL, d2=NULL, k=NULL, a=NULL, dmin=NULL){
@@ -89,7 +89,7 @@
}
## handle NAs, centring and scaling
- X <- scaleGen(obj, center=TRUE, scale=scale, method=scale.method, missing="mean", truenames=truenames)
+ X <- scaleGen(obj, center=TRUE, scale=scale, missing="mean", truenames=truenames)
## perform analyses
pcaX <- dudi.pca(X, center=FALSE, scale=FALSE, scannf=FALSE)
Modified: pkg/man/as.genlight.Rd
===================================================================
--- pkg/man/as.genlight.Rd 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/man/as.genlight.Rd 2013-05-14 21:54:19 UTC (rev 1128)
@@ -45,6 +45,7 @@
- \code{\linkS4class{genind}}
}
\examples{
+\dontrun{
## data to be converted
dat <- list(toto=c(1,1,0,0,2,2,1,2,NA), titi=c(NA,1,1,0,1,1,1,0,0), tata=c(NA,0,3, NA,1,1,1,0,0))
@@ -58,7 +59,7 @@
identical(x1,x2)
identical(x1,x3)
+}
-
}
\keyword{classes}
Modified: pkg/man/ascore.Rd
===================================================================
--- pkg/man/ascore.Rd 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/man/ascore.Rd 2013-05-14 21:54:19 UTC (rev 1128)
@@ -10,7 +10,8 @@
\usage{
a.score(x, n.sim=10, \ldots)
-optim.a.score(x, n.pca=1:ncol(x$tab), smart=TRUE, n=10, plot=TRUE, n.sim=10, n.da=length(levels(x$grp)), \ldots)
+optim.a.score(x, n.pca=1:ncol(x$tab), smart=TRUE, n=10, plot=TRUE,
+ n.sim=10, n.da=length(levels(x$grp)), \ldots)
}
\arguments{
\item{x}{a \code{dapc} object.}
Modified: pkg/man/dapc.Rd
===================================================================
--- pkg/man/dapc.Rd 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/man/dapc.Rd 2013-05-14 21:54:19 UTC (rev 1128)
@@ -9,9 +9,9 @@
\alias{print.dapc}
\alias{summary.dapc}
\alias{predict.dapc}
-\alias{xval.dapc}
-\alias{xval}
-\alias{xval.genind}
+\alias{xvalDapc}
+\alias{xvalDapc.data.frame}
+\alias{xvalDapc.matrix}
\alias{as.lda}
\alias{as.lda.dapc}
\title{Discriminant Analysis of Principal Components (DAPC)}
@@ -31,18 +31,19 @@
- \code{matrix} (only numeric data)\cr
- \code{\linkS4class{genind}} objects (genetic markers)\cr
- \code{\linkS4class{genlight}} objects (genome-wide SNPs)
-
+
These methods all return an object with class \code{dapc}.
Functions that can be applied to these objects are (the ".dapc" can be
ommitted):
- \code{print.dapc}: prints the content of a \code{dapc} object.\cr
- - \code{summary.dapc}: extracts useful information from a \code{dapc} object.\cr
+ - \code{summary.dapc}: extracts useful information from a \code{dapc}
+ object.\cr
- \code{predict.dapc}: predicts group memberships based on DAPC results.\cr
- - \code{xval.dapc}: performs cross-validation of DAPC function varying the number of PCs and keeping the number of DAs fixed.
- - \code{xval}: performs cross-validation of DAPC function varying the number of PCs and keeping the number of DAs fixed.
- - \code{xval.genind}: performs cross-validation of DAPC function varying the number of PCs and keeping the number of DAs fixed.
+ - \code{xvalDapc}: performs cross-validation of DAPC using varying
+ numbers of PCs (and keeping the number of discriminant functions
+ fixed); it currently has methods for \code{data.frame} and \code{matrix}.\cr
@@ -55,7 +56,6 @@
\code{as.lda} is a generic with a method for \code{dapc} object which
converts these objects into outputs similar to that of
\code{lda.default}.
-
}
\usage{
\method{dapc}{data.frame}(x, grp, n.pca=NULL, n.da=NULL, center=TRUE,
@@ -65,8 +65,8 @@
\method{dapc}{matrix}(x, \ldots)
\method{dapc}{genind}(x, pop=NULL, n.pca=NULL, n.da=NULL, scale=FALSE,
- scale.method=c("sigma", "binom"), truenames=TRUE, var.contrib=TRUE,
- pca.info=TRUE, pca.select=c("nbEig","percVar"), perc.pca=NULL, \ldots)
+ truenames=TRUE, var.contrib=TRUE, pca.info=TRUE,
+ pca.select=c("nbEig","percVar"), perc.pca=NULL, \ldots)
\method{dapc}{genlight}(x, pop = NULL, n.pca = NULL, n.da = NULL, scale
= FALSE, var.contrib = TRUE, pca.info=TRUE, pca.select = c("nbEig", "percVar"),
@@ -81,9 +81,13 @@
\method{predict}{dapc}(object, newdata, prior = object$prior, dimen,
method = c("plug-in", "predictive", "debiased"), ...)
-\method{xval}{dapc}(object, n.pca, n.da, training.set = 90, \ldots)
+\method{xvalDapc}{data.frame}(x, grp, n.pca.max, n.da=NULL,
+ training.set = 1/2, center=TRUE, scale=FALSE,
+ n.pca=NULL, n.rep=10, \ldots)
-\method{xval}{genind}(object, n.pca, n.da, training.set = 90, \ldots)
+\method{xvalDapc}{matrix}(x, grp, n.pca.max, n.da=NULL,
+ training.set = 1/2, center=TRUE, scale=FALSE,
+ n.pca=NULL, n.rep=10, \ldots)
}
\arguments{
\item{x}{\code{a data.frame}, \code{matrix}, or \code{\linkS4class{genind}}
@@ -100,8 +104,7 @@
\item{scale}{a \code{logical} indicating whether variables should be scaled
(TRUE) or not (FALSE, default). Scaling consists in dividing variables by their
(estimated) standard deviation to account for trivial differences in
- variances. Further scaling options are available for \linkS4class{genind}
- objects (see argument \code{scale.method}).}
+ variances.}
\item{var.contrib}{a \code{logical} indicating whether the
contribution of original variables (alleles, for \linkS4class{genind} objects)
should be provided (TRUE, default) or not (FALSE). Such output can be useful,
@@ -127,10 +130,6 @@
dimension reduction is not performed (saving computational time) but
taken directly from this object.}
\item{object}{a \code{dapc} object.}
- \item{scale.method}{a \code{character} specifying the scaling method to be used
- for allele frequencies, which must match "sigma" (usual estimate of standard
- deviation) or "binom" (based on binomial distribution). See \code{\link{scaleGen}} for
- further details.}
\item{truenames}{a \code{logical} indicating whether true (i.e., user-specified)
labels should be used in object outputs (TRUE, default) or not (FALSE).}
\item{dudi}{optionally, a multivariate analysis with the class
@@ -143,10 +142,11 @@
original ('training') data. In particular, variables must be exactly
the same as in the original data. For \linkS4class{genind}
objects, see \code{\link{repool}} to ensure matching of alleles.}
- \item{training.set}{the percentage of individuals randomly chosen in each population
- as the training set used for cross-validation. This value is applied to all groups/pops
- defined in the object. The default is set to 90\%.
- For meaningful cross-validation it is recommended not to go below 80\%}
+ \item{n.pca.max}{maximum number of PCA components to retain.}
+ \item{training.set}{the proportion of data (individuals) to be used
+ for the training set; defaults to one half.}
+ \item{n.rep}{the number of replicate to be used for each number of PCA
+ components retained.}
\item{prior,dimen,method}{see \code{?predict.lda}.}
}
\details{
@@ -179,8 +179,9 @@
\item{tab}{matrix of retained principal components of PCA}
\item{loadings}{principal axes of DAPC, giving coefficients of the linear
combination of retained PCA axes.}
- \item{ind.coord}{principal components of DAPC, giving the coordinates of individuals onto
- principal axes of DAPC; also called the discriminant functions.}
+ \item{ind.coord}{principal components of DAPC, giving the coordinates
+ of individuals onto principal axes of DAPC; also called the
+ discriminant functions.}
\item{grp.coord}{coordinates of the groups onto the principal axes of DAPC.}
\item{posterior}{a data.frame giving posterior membership probabilities for
all individuals and all clusters.}
@@ -198,10 +199,12 @@
\code{assign.prop} (proportion of overall correct assignment),
\code{assign.per.pop} (proportion of correct assignment per group),
\code{prior.grp.size} (prior group sizes), and \code{post.grp.size} (posterior
- group sizes), \code{xval.dapc}, \code{xval.genind} and \code{xval} (all return a list of four lists, each one with as
- many items as cross-validation runs. The first item is a list of \code{assign} components,
- the secon is a list of \code{posterior} components, the thirs is a list of \code{ind.score}
- components and the fourth is a list of \code{match.prp} items, i.e. the prortion of the validation
+ group sizes), \code{xval.dapc}, \code{xval.genind} and \code{xval}
+ (all return a list of four lists, each one with as many items as
+ cross-validation runs. The first item is a list of \code{assign}
+ components, the secon is a list of \code{posterior} components, the
+ thirs is a list of \code{ind.score} components and the fourth is a
+ list of \code{match.prp} items, i.e. the prortion of the validation
set correctly matched to its original population)
}
\references{
@@ -314,30 +317,12 @@
title("30 indiv popA, 30 indiv pop B, 30 hybrids")
## CROSS-VALIDATION ##
-# select dataset
-data(microbov)
-summary(microbov) # the dataset contains 15 populations of different sizes
+data(sim2pop)
+xval <- xvalDapc(sim2pop at tab, pop(sim2pop), n.pca.max=100, n.rep=3)
+xval
+boxplot(xval$success~xval$n.pca, xlab="Number of PCA components",
+ylab="Classification succes", main="DAPC - cross-validation")
-# we take a fixed number of disriminant functions (15 in this case)
-# and we test how the cross-validation does varying the number of PCs
-# we specify the *maximum* number of PCs, and we will test how
-# the cross-validation performs by going from 1 PC to the maximum
-# we specified in the fucntion call
-
-crossval.test <- xval.dapc(microbov, n.pca = 40, n.da = 15, training.set = 90)
-
-attributes(crossval.test) # we get four lists of lists
-# namely "assign" "posterior" "ind.score" "match.prp"
-# a quick visual inspection of the cross-validation
-
-plot(unlist(crossval.test$match.prp))
-
-# the use can also just call xval:
-crossval.test2 <- xval(microbov, n.pca = 40, n.da = 15, training.set = 90)
-plot(unlist(crossval.test2$match.prp))
-
-
-
}
Modified: pkg/man/dapcGraphics.Rd
===================================================================
--- pkg/man/dapcGraphics.Rd 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/man/dapcGraphics.Rd 2013-05-14 21:54:19 UTC (rev 1128)
@@ -22,23 +22,23 @@
}
\usage{
\method{scatter}{dapc}(x, xax=1, yax=2, grp=x$grp, col=rainbow(length(levels(grp))),
- pch=20, bg="lightgrey", solid=.7, scree.da=TRUE,
- scree.pca=FALSE, posi.da="bottomright",
- posi.pca="bottomleft", bg.inset="white", ratio.da=.25,
- ratio.pca=.25, inset.da=0.02, inset.pca=0.02,
- inset.solid=.5, onedim.filled=TRUE, mstree=FALSE, lwd=1,
- lty=1, segcol="black", legend=FALSE, posi.leg="topright",
- cleg=1, txt.leg=levels(grp), cstar = 1, cellipse = 1.5,
- axesell = FALSE, label = levels(grp), clabel = 1, xlim =
- NULL, ylim = NULL, grid = FALSE, addaxes = TRUE, origin =
- c(0,0), include.origin = TRUE, sub = "", csub = 1, possub =
- "bottomleft", cgrid = 1, pixmap = NULL, contour = NULL, area
- = NULL, \ldots)
+ pch=20, bg="lightgrey", solid=.7, scree.da=TRUE,
+ scree.pca=FALSE, posi.da="bottomright",
+ posi.pca="bottomleft", bg.inset="white", ratio.da=.25,
+ ratio.pca=.25, inset.da=0.02, inset.pca=0.02,
+ inset.solid=.5, onedim.filled=TRUE, mstree=FALSE, lwd=1,
+ lty=1, segcol="black", legend=FALSE, posi.leg="topright",
+ cleg=1, txt.leg=levels(grp), cstar = 1, cellipse = 1.5,
+ axesell = FALSE, label = levels(grp), clabel = 1, xlim =
+ NULL, ylim = NULL, grid = FALSE, addaxes = TRUE, origin =
+ c(0,0), include.origin = TRUE, sub = "", csub = 1, possub =
+ "bottomleft", cgrid = 1, pixmap = NULL, contour = NULL, area
+ = NULL, \ldots)
assignplot(x, only.grp=NULL, subset=NULL, new.pred=NULL, cex.lab=.75,pch=3)
compoplot(x, only.grp=NULL, subset=NULL, new.pred=NULL, col=NULL, lab=NULL,
- legend=TRUE, txt.leg=NULL, ncol=4, posi=NULL, cleg=.8, bg=transp("white"), ...)
+ legend=TRUE, txt.leg=NULL, ncol=4, posi=NULL, cleg=.8, bg=transp("white"), ...)
}
\arguments{
\item{x}{a \code{dapc} object.}
@@ -156,7 +156,8 @@
inset.pca=c(.01,.3), lab=paste("year\n",2001:2006), axesel=FALSE, col=terrain.colors(10))
## without ellipses, use legend for groups
-scatter(dapc1, cell=0, cstar=0, scree.da=FALSE, clab=0, cex=3, solid=.4, bg="white", leg=TRUE, posi.leg="topleft")
+scatter(dapc1, cell=0, cstar=0, scree.da=FALSE, clab=0, cex=3,
+solid=.4, bg="white", leg=TRUE, posi.leg="topleft")
## only one axis
scatter(dapc1,1,1,scree.da=FALSE, legend=TRUE, solid=.4,bg="white")
@@ -174,7 +175,8 @@
dapc2
## plot results
-scatter(dapc2, scree.da=FALSE, leg=TRUE, txt.leg=paste("group", c('A','B')), col=c("red","blue"))
+scatter(dapc2, scree.da=FALSE, leg=TRUE, txt.leg=paste("group",
+c('A','B')), col=c("red","blue"))
## SNP contributions
loadingplot(dapc2$var.contr)
Modified: pkg/man/eHGDP.Rd
===================================================================
--- pkg/man/eHGDP.Rd 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/man/eHGDP.Rd 2013-05-14 21:54:19 UTC (rev 1128)
@@ -30,7 +30,7 @@
d'Etude du Polymorphisme Humain (CEPH). See reference [4] for Native
American populations.
- This copy of the dataset was prepared by Francois Balloux (f.balloux at imperial.ac.uk).
+ This copy of the dataset was prepared by Francois Balloux.
}
\references{
[1] Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, et
@@ -62,7 +62,8 @@
## PERFORM DAPC - USE POPULATIONS AS CLUSTERS
## to reproduce exactly analyses from the paper, use "n.pca=1000"
-dapc1 <- dapc(eHGDP, all.contrib=TRUE, scale=FALSE, n.pca=200, n.da=80) # takes 2 minutes
+dapc1 <- dapc(eHGDP, all.contrib=TRUE, scale=FALSE,
+n.pca=200, n.da=80) # takes 2 minutes
dapc1
## (see ?dapc for details about the output)
@@ -70,7 +71,8 @@
## SCREEPLOT OF EIGENVALUES
-barplot(dapc1$eig, main="eHGDP - DAPC eigenvalues", col=c("red","green","blue", rep("grey", 1000)))
+barplot(dapc1$eig, main="eHGDP - DAPC eigenvalues",
+col=c("red","green","blue", rep("grey", 1000)))
@@ -111,7 +113,8 @@
## and then
## plot(grp$Kstat, type="b", col="blue")
-grp <- find.clusters(eHGDP, max.n=30, n.pca=200, scale=FALSE, n.clust=4) # takes about 2 minutes
+grp <- find.clusters(eHGDP, max.n=30, n.pca=200,
+scale=FALSE, n.clust=4) # takes about 2 minutes
names(grp)
## (see ?find.clusters for details about the output)
@@ -120,7 +123,8 @@
## PERFORM DAPC - USE POPULATIONS AS CLUSTERS
## to reproduce exactly analyses from the paper, use "n.pca=1000"
-dapc2 <- dapc(eHGDP, pop=grp$grp, all.contrib=TRUE, scale=FALSE, n.pca=200, n.da=80) # takes around a 1 minute
+dapc2 <- dapc(eHGDP, pop=grp$grp, all.contrib=TRUE,
+scale=FALSE, n.pca=200, n.da=80) # takes around a 1 minute
dapc2
@@ -131,7 +135,8 @@
## MAP DAPC2 RESULTS
if(require(maps)){
-xy <- cbind(eHGDP$other$popInfo$Longitude, eHGDP$other$popInfo$Latitude)
+xy <- cbind(eHGDP$other$popInfo$Longitude,
+eHGDP$other$popInfo$Latitude)
myCoords <- apply(dapc2$ind.coord, 2, tapply, pop(eHGDP), mean)
Modified: pkg/man/find.clusters.Rd
===================================================================
--- pkg/man/find.clusters.Rd 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/man/find.clusters.Rd 2013-05-14 21:54:19 UTC (rev 1128)
@@ -50,8 +50,7 @@
stat=c("BIC","AIC", "WSS"), choose.n.clust=TRUE,
criterion=c("diffNgroup", "min","goesup", "smoothNgoesup",
"goodfit"), max.n.clust=round(nrow(x at tab)/10), n.iter=1e5,
- n.start=10, scale=FALSE, scale.method=c("sigma", "binom"),
- truenames=TRUE, \ldots)
+ n.start=10, scale=FALSE, truenames=TRUE, \ldots)
\method{find.clusters}{genlight}(x, clust=NULL, n.pca=NULL,
n.clust=NULL, stat=c("BIC", "AIC",
@@ -118,11 +117,7 @@
percentage (interactively, or via \code{perc.pca}). }
\item{perc.pca}{a \code{numeric} value between 0 and 100 indicating the
minimal percentage of the total variance of the data to be expressed by the
- retained axes of PCA.}
- \item{scale.method}{a \code{character} specifying the scaling method to be used
- for allele frequencies, which must match "sigma" (usual estimate of standard
- deviation) or "binom" (based on binomial distribution). See
- \code{\link{scaleGen}} for further details.}
+ retained axes of PCA.}
\item{truenames}{a \code{logical} indicating whether true (i.e., user-specified)
labels should be used in object outputs (TRUE, default) or not
(FALSE), in which case generic labels are used.}
@@ -231,7 +226,8 @@
data(eHGDP)
## here, n.clust is specified, so that only on K value is used
-grp <- find.clusters(eHGDP, max.n=30, n.pca=200, scale=FALSE, n.clust=4) # takes about 2 minutes
+grp <- find.clusters(eHGDP, max.n=30, n.pca=200, scale=FALSE,
+n.clust=4) # takes about 2 minutes
names(grp)
grp$Kstat
grp$stat
@@ -258,7 +254,8 @@
## DETECTION WITH AIC (less clear-cut)
foo.AIC <- find.clusters(sim2pop, n.pca=100, choose=FALSE, stat="AIC")
-plot(foo.AIC$Kstat, type="o", xlab="number of clusters (K)", ylab="AIC", col="purple", main="Detection based on AIC")
+plot(foo.AIC$Kstat, type="o", xlab="number of clusters (K)",
+ylab="AIC", col="purple", main="Detection based on AIC")
points(2, foo.AIC$Kstat[2], pch="x", cex=3)
mtext(3, tex="'X' indicates the actual number of clusters")
@@ -276,7 +273,8 @@
x
plot(x)
grp <- find.clusters(x, n.pca=100, choose=FALSE, stat="BIC")
-plot(grp$Kstat, type="o", xlab="number of clusters (K)",ylab="BIC",main="find.clusters on a genlight object\n(two groups)")
+plot(grp$Kstat, type="o", xlab="number of clusters (K)",
+ylab="BIC",main="find.clusters on a genlight object\n(two groups)")
}
}
\keyword{multivariate}
Modified: pkg/man/inbreeding.Rd
===================================================================
--- pkg/man/inbreeding.Rd 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/man/inbreeding.Rd 2013-05-14 21:54:19 UTC (rev 1128)
@@ -11,7 +11,8 @@
1 is acceptable.
}
\usage{
-inbreeding(x, pop = NULL, truenames = TRUE, res.type = c("sample", "function"), N = 200, M = N * 10)
+inbreeding(x, pop = NULL, truenames = TRUE, res.type = c("sample","function"),
+ N = 200, M = N * 10)
}
\arguments{
\item{x}{an object of class \linkS4class{genind}.}
@@ -22,9 +23,9 @@
used (TRUE, default) instead of generic labels (FALSE); used if
res.type is "matrix".}
\item{res.type}{a character string matching "sample" or "function",
- specifying whether the output should be a function giving the density of probability
- of F values ("function") or a sample of F values taken from this
- distribution ("sample", default).}
+ specifying whether the output should be a function giving the density
+ of probability of F values ("function") or a sample of F values taken
+ from this distribution ("sample", default).}
\item{N}{an integer indicating the size of the sample to be taken from
the distribution of F values.}
\item{M}{an integer indicating the number of different F values to be
@@ -43,10 +44,10 @@
probability for an individual to inherit two identical alleles from a
single ancestor.
- Let \eqn{p_i} refer to the frequency of allele \eqn{i} in the population. Let
- \eqn{h} be an variable which equates 1 if the individual is
- homozygote, and 0 otherwise. For one locus, the probability of being
- homozygote is computed as:
+ Let \eqn{p_i} refer to the frequency of allele \eqn{i} in the
+ population. Let \eqn{h} be an variable which equates 1 if the
+ individual is homozygote, and 0 otherwise. For one locus, the
+ probability of being homozygote is computed as:
\eqn{ F + (1-F) \sum_i p_i^2}
@@ -72,11 +73,13 @@
Fsamp <- inbreeding(lagun, N=30)
## plot the first 10 results
-invisible(sapply(Fsamp[1:10], function(e) plot(density(e), xlab="F", xlim=c(0,1), main="Density of the sampled F values")))
+invisible(sapply(Fsamp[1:10], function(e) plot(density(e), xlab="F",
+xlim=c(0,1), main="Density of the sampled F values")))
## compute means for all individuals
Fmean=sapply(Fsamp, mean)
-hist(Fmean, col="orange", xlab="mean value of F", main="Distribution of mean F across individuals")
+hist(Fmean, col="orange", xlab="mean value of F",
+main="Distribution of mean F across individuals")
## estimate inbreeding - return proba density functions
Fdens <- inbreeding(lagun, res.type="function")
@@ -85,6 +88,7 @@
Fdens[[1]]
## plot the first 10 functions
-invisible(sapply(Fdens[1:10], plot, ylab="Density", main="Density of probability of F values"))
+invisible(sapply(Fdens[1:10], plot, ylab="Density",
+main="Density of probability of F values"))
}
}
\ No newline at end of file
Modified: pkg/man/loadingplot.Rd
===================================================================
--- pkg/man/loadingplot.Rd 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/man/loadingplot.Rd 2013-05-14 21:54:19 UTC (rev 1128)
@@ -13,10 +13,11 @@
\usage{
loadingplot(x, \dots)
-\method{loadingplot}{default}(x, at=NULL, threshold=quantile(x,0.75), axis=1, fac=NULL, byfac=FALSE,
+\method{loadingplot}{default}(x, at=NULL, threshold=quantile(x,0.75),
+ axis=1, fac=NULL, byfac=FALSE,
lab=NULL, cex.lab=0.7, cex.fac=1, lab.jitter=0,
- main="Loading plot", xlab="Variables", ylab="Loadings", srt = 0, adj = NULL, \dots)
-
+ main="Loading plot", xlab="Variables", ylab="Loadings",
+ srt = 0, adj = NULL, \dots)
}
\arguments{
\item{x}{either a vector with numeric values to be plotted, or a
Modified: pkg/man/read.structure.Rd
===================================================================
--- pkg/man/read.structure.Rd 2013-05-14 15:21:10 UTC (rev 1127)
+++ pkg/man/read.structure.Rd 2013-05-14 21:54:19 UTC (rev 1128)
@@ -17,7 +17,9 @@
\code{\link{df2genind}}.
}
\usage{
-read.structure(file, n.ind=NULL, n.loc=NULL, onerowperind=NULL, col.lab=NULL, col.pop=NULL, col.others=NULL, row.marknames=NULL, NA.char="-9", pop=NULL, missing=NA, ask=TRUE, quiet=FALSE)
+read.structure(file, n.ind=NULL, n.loc=NULL, onerowperind=NULL,
+ col.lab=NULL, col.pop=NULL, col.others=NULL, row.marknames=NULL,
+ NA.char="-9", pop=NULL, missing=NA, ask=TRUE, quiet=FALSE)
}
\arguments{
\item{file}{ a character string giving the path to the file to
@@ -31,9 +33,9 @@
labels of genotypes. '0' if absent.}
\item{col.pop}{an integer giving the index of the column containing
population to which genotypes belong. '0' if absent.}
[TRUNCATED]
To get the complete diff run:
svnlook diff /svnroot/adegenet -r 1128
From noreply at r-forge.r-project.org Wed May 15 12:57:43 2013
From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org)
Date: Wed, 15 May 2013 12:57:43 +0200 (CEST)
Subject: [adegenet-commits] r1129 - in pkg: . R
Message-ID: <20130515105743.F41C918453F@r-forge.r-project.org>
Author: jombart
Date: 2013-05-15 12:57:43 +0200 (Wed, 15 May 2013)
New Revision: 1129
Removed:
pkg/TODO
Modified:
pkg/R/glPlot.R
Log:
fixed a NOTE due to glPlot; removed useless TODO file
Modified: pkg/R/glPlot.R
===================================================================
--- pkg/R/glPlot.R 2013-05-14 21:54:19 UTC (rev 1128)
+++ pkg/R/glPlot.R 2013-05-15 10:57:43 UTC (rev 1129)
@@ -31,6 +31,9 @@
+## hack to remove the NOTE in R CMD check about:
+## "plot,genlight: no visible binding for global variable ?y?"
+if(getRversion() >= "2.15.1") utils::globalVariables("y")
## plot method
setMethod("plot", signature(x="genlight", y="ANY"), function(x, y=NULL, col=NULL, legend=TRUE,
Deleted: pkg/TODO
===================================================================
--- pkg/TODO 2013-05-14 21:54:19 UTC (rev 1128)
+++ pkg/TODO 2013-05-15 10:57:43 UTC (rev 1129)
@@ -1,82 +0,0 @@
-#######################
-#
-# adegenet TODO list
-#
-#######################
-#
-# please list here all intended modifications
-# and all detected bugs
-#
-# please add a "-- done" or "-- fixed" tag when
-# you achieved something
-# '*' indicates stuff to do
-# 'o' indicates done stuff
-# '*o*' indicates partly done stuff
-#
-# Inside a given section, priority goes decreasing.
-#
-# Delete fixed things each new release.
-#
-# T.J. 2008
-#
-######################
-
-
-
-# FOR NEXT STABLE VERSION
-=========================
-=========================
-
-# CODE ISSUES:
-==============
-* fix request 1.2-2.04 (implement adjusted heretozygosity in summary)
-
-
-# DOCUMENTATION ISSUES:
-=======================
-* explain new changes inside the tutorial (handling of AFLP/RAPD...)
-
-
-# NEW IMPLEMENTATIONS:
-=====================
-
-o allow genind2df to export alleles on separate columns -- done
-
-
-# TESTING:
-==========
-
-
-# LOW PRIORITY / MINOR ISSUES
-===========================
-===========================
-* in spca, when nfposi=0, the returned object actually contains what corresponds to nfposi=1. Comes from multispati in ade4. To correct in ade4.
-* use spcaIllus to illustrate global.rtest and local.rtest
-* Implement a method to merge different markers for the same individuals
-*o* Build accessors for:
-- marker names -- done
- - linux sources
- - MacOS X binary
- - Windows binary
+ version
[24] SANTOS, H., BURBAN, C., ROUSSELET, J.,
@@ -555,6 +568,7 @@
+
pityocampa, Lepidoptera, Notodontidae). Journal of Evolutionary Biology,
no. doi: 10.1111/j.1420-9101.2010.02147.x
@@ -580,6 +594,7 @@
+
Vol. Sci. Pap. ICCAT
[26] 2010 Population Genetic
@@ -619,6 +635,7 @@
+
ONE5(11): e13810. doi:10.1371/journal.pone.0013810
@@ -651,6 +668,7 @@
+
DOI: 10.1007/s10329-010-0232-4
* adegenet not or wrongly cited, but actually
used in the paper.
From noreply at r-forge.r-project.org Tue May 28 21:58:19 2013
From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org)
Date: Tue, 28 May 2013 21:58:19 +0200 (CEST)
Subject: [adegenet-commits] r1136 - www
Message-ID: <20130528195820.0F02E1855DE@r-forge.r-project.org>
Author: jombart
Date: 2013-05-28 21:58:19 +0200 (Tue, 28 May 2013)
New Revision: 1136
Modified:
www/literature.html
Log:
+3 ref
Modified: www/literature.html
===================================================================
--- www/literature.html 2013-05-21 10:26:08 UTC (rev 1135)
+++ www/literature.html 2013-05-28 19:58:19 UTC (rev 1136)
@@ -66,6 +66,7 @@
+
the bublisher's website]
@@ -101,6 +102,7 @@
+
abstract]
- the paper presenting the spatial
@@ -121,6 +123,7 @@
+
principal component analysis (sPCA, function spca), global and
@@ -144,6 +147,7 @@
+
cryptic spatial patterns in genetic variability by a new
multivariate method. Heredity
101: 92-103. doi:
@@ -168,6 +172,7 @@
+
abstract]
@@ -194,6 +199,7 @@
+
simulations of genealoies of haplotypes (haploGen):
Jombart T, Eggo RM, Dodd PJ, Balloux F (2010)
@@ -220,6 +226,7 @@
+
of Principal Components
[24] SANTOS, H., BURBAN, C., ROUSSELET, J.,
@@ -569,6 +582,7 @@
+
pityocampa, Lepidoptera, Notodontidae). Journal of Evolutionary Biology,
no. doi: 10.1111/j.1420-9101.2010.02147.x
@@ -595,6 +609,7 @@
+
Vol. Sci. Pap. ICCAT, 65(3): 988-995
[26] 2010 Population Genetic
@@ -636,6 +652,7 @@
+
ONE5(11): e13810. doi:10.1371/journal.pone.0013810
@@ -669,6 +686,7 @@
+
DOI: 10.1007/s10329-010-0232-4
* adegenet not or wrongly cited, but actually
used in the paper.