From vanesse.labeyrie at cirad.fr Thu Sep 1 15:07:08 2016
From: vanesse.labeyrie at cirad.fr (vlabeyrie)
Date: Thu, 01 Sep 2016 15:07:08 +0200
Subject: [adegenet-forum] Plot supplementary individuals DAPC
Message-ID: <57C827FC.4010905@cirad.fr>

Dear adegenet users,

I have a problem when plotting supplementary individuals on a DAPC scatterplot: although they are clearly assigned to a group (with a very high probability), they are displayed outside of this group on the scatterplot.

I defined two datasets: one on which to perform the DAPC, and another of supplementary individuals:

x.sup_80 <- Ge_atp_gcp_80[c(1:nrow(Ge_atp@tab)), ]   # supplementary individuals
x_80 <- Ge_atp_gcp_80[-c(1:nrow(Ge_atp@tab)), ]      # individuals on which the DAPC is performed

Then I performed the DAPC on x_80, specifying a priori groups:

dapc_GCP_14ssr_STRk5_b <- dapc(x_80, pop(x_80), n.pca=30, n.da=4)   # perform DAPC

I assigned the supplementary individuals to the DAPC groups:

predict_atp_strk5 <- predict.dapc(dapc_GCP_14ssr_STRk5_b, newdata=x.sup_80)

The predicted group membership probabilities of the supplementary individuals are high, so I expected the supplementary individuals to be located within the DAPC groups. But this is not the case: on the scatterplot, the supplementary individuals mostly appear outside the groups to which they are assigned!

col <- c("#F8766D", "#A3A500", "#00BF7D", "#00B0F6", "#E76BF3")   # colors for DAPC individuals
colb <- c("darkblue", "dodgerblue", "darkorange2", "red", "gold", "grey")   # colors for supplementary individuals

# axes 1 and 2
col.points_80 <- transp(col[as.integer(pop(x_80))], .2)   # define the color of DAPC individuals as transparent
scatter(dapc_GCP_14ssr_STRk5_b, col=col, bg="white", scree.da=0, pch="", cstar=0, clab=0, xlim=c(-10,10), legend=F)
par(xpd=TRUE)
points(dapc_GCP_14ssr_STRk5_b$ind.coord[,1], dapc_GCP_14ssr_STRk5_b$ind.coord[,2], pch=20, col=col.points_80, cex=1)   ## scatter DAPC groups
col.sup_80 <- colb[as.integer(pop(x.sup_80))]   ## define supplementary individuals color
points(predict_atp_strk5$ind.scores[,1], predict_atp_strk5$ind.scores[,2], pch=8, col=transp(col.sup_80, .7), cex=1)   # plot supplementary individuals

With a previous version of adegenet, this problem did not occur: the supplementary individuals were located within the DAPC groups. The problem appeared when I ran my script with the new adegenet version.

Does someone have an idea of what the problem is? I can provide the picture of the scatterplot, the full script and the data if needed.

Thank you for your help!

--
Vanesse Labeyrie

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From thibautjombart at gmail.com Thu Sep 1 17:10:16 2016
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Thu, 1 Sep 2016 16:10:16 +0100
Subject: [adegenet-forum] Plot supplementary individuals DAPC
In-Reply-To: <57C827FC.4010905@cirad.fr>
References: <57C827FC.4010905@cirad.fr>
Message-ID:

Hi there,

it could be one of two things: normal behaviour, or a bug. For the first, the result could make sense: individuals will be assigned to their closest group, even if they fall outside the group's cloud of points. So the result in itself may not be too surprising.
The fact that the result changed with the version of adegenet is more concerning, as it suggests a bug. However, the DAPC itself hasn't changed much. Can you provide a reproducible example showing the change?

Best
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology
Imperial College London
https://sites.google.com/site/thibautjombart/
https://github.com/thibautjombart
Twitter: @TeebzR

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From alangarcia87 at hotmail.com Thu Sep 1 21:15:26 2016
From: alangarcia87 at hotmail.com (Alan Garcia-Elfring)
Date: Thu, 1 Sep 2016 15:15:26 -0400
Subject: [adegenet-forum] IBD test: Adding spatial variable to "optional content" of genepop object
Message-ID:

Hi everyone,

I'm trying to use the IBD test shown in chapter 7 of the tutorial.
The example data used, spcaIllus, has spatial information that my .genepop files do not have.

How does one incorporate the spatial data into the "optional content"?

Any help is greatly appreciated.

Best,
Alan

From thibautjombart at gmail.com Fri Sep 2 11:18:49 2016
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Fri, 2 Sep 2016 10:18:49 +0100
Subject: [adegenet-forum] IBD test: Adding spatial variable to "optional content" of genepop object
In-Reply-To:
References:
Message-ID:

Hi there

You can read the xy coordinates separately using read.csv, read.table etc. For the test there is no need to attach the xy coordinates to the object. But to do so, use

other(x)$xy <- xy

where 'x' is your genind or genpop object. Accessors are documented in the basics tutorial (adegenetTutorial()).

Cheers
Thibaut

-------------- next part --------------
An HTML attachment was scrubbed...
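The two steps in the reply above (read the coordinates, optionally attach them with other(x)$xy) can be sketched as follows. This is a minimal sketch and not part of the thread: it assumes the nancycats dataset shipped with adegenet as a stand-in for a genepop file, and it takes the coordinates from the object itself where a real analysis would read them from a file (the file name in the comment is hypothetical). The Mantel test call follows the IBD example in the adegenet tutorial.

```r
library(adegenet)
library(ade4)       # provides mantel.randtest()

data(nancycats)                                  # stand-in for data read from a genepop file
pops <- genind2genpop(nancycats, quiet = TRUE)   # one row per population

# Assumption: in practice xy would come from e.g. read.csv("coords.csv")
# (hypothetical file); here the coordinates stored in nancycats play that role.
xy <- nancycats$other$xy

# Optional step: attach the coordinates to the object.
other(pops)$xy <- xy

# IBD test: genetic versus geographic distances between populations.
Dgen <- dist.genpop(pops, method = 2)
Dgeo <- dist(other(pops)$xy)
ibd  <- mantel.randtest(Dgen, Dgeo, nrepet = 999)
print(ibd)
```

The rows of xy must be in the same order as the populations in the object, otherwise the two distance matrices will not correspond.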
From thibautjombart at gmail.com Fri Sep 2 13:11:01 2016
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Fri, 2 Sep 2016 12:11:01 +0100
Subject: [adegenet-forum] Plot supplementary individuals DAPC
In-Reply-To: <57C827FC.4010905@cirad.fr>
References: <57C827FC.4010905@cirad.fr>
Message-ID:

Dear all,

after some exchanges off the forum (data involved), it turned out there was indeed a problem: predict.dapc was using raw allele counts rather than frequencies, because of an implicit conversion using as.matrix, whose behaviour has changed since version 2.0. This means that all supplementary individuals were effectively placed in the right direction, but twice as far from the origin as they should have been.

I have now fixed this in the devel version (commit b8c29c51).

More importantly: I have recompiled the DAPC tutorial, so that the ugly, unbearable rainbow colors have now been replaced by the funky palette in the microbov example. See adegenetTutorial("dapc").

Thanks again to Vanesse for flagging this issue!

Best
Thibaut

From thibautjombart at gmail.com Mon Sep 5 11:48:56 2016
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Mon, 5 Sep 2016 10:48:56 +0100
Subject: [adegenet-forum] IBD test: Adding spatial variable to "optional content" of genepop object
In-Reply-To:
References:
Message-ID:

Hello,

the practice you describe is a Mantel correlogram. It is not, to my knowledge, implemented in ade4, but I think it is in the package vegan.

Best
Thibaut

On 2 September 2016 at 20:28, Alan Garcia-Elfring wrote:
> Thanks!
> However, is there a way to check for IBD for a subset of the genetic and geographical distances, like in the data frame below?
> My sites are separated by 2 rivers (3 transects) so I'd like to test for IBD within transects.
>
> Ex: the differentiation and the distance of 10 sites relative to their southern-most site
>
> Sites   (km)     Fst
> N2       37.56   0.0431971
> N3      120.05   0.0521581
> N4      148.25   0.0570743
> SW2      19.94   0.0365175
> SW3      23.55   0.0340058
> SW4      55.50   0.0412582
> SW5      56.83   0.0404807
> SE2      40.35   0.0374914
> SE3      66.39   0.0357506
> SE4      89.53   0.0404285
>
> It may not be applicable to mantel.randtest() but I'm not sure.
> Thanks, again.
> Alan

-------------- next part --------------
An HTML attachment was scrubbed...
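A Mantel correlogram of the kind mentioned in the reply above can be sketched with mantel.correlog() from vegan. This is an illustrative sketch only and not part of the thread: it assumes the nancycats dataset from adegenet as stand-in data, lets the function choose the distance classes, and does not address the within-transect design described in the quoted message.

```r
library(adegenet)
library(vegan)      # provides mantel.correlog()

data(nancycats)
pops <- genind2genpop(nancycats, quiet = TRUE)

Dgen <- dist.genpop(pops, method = 2)   # genetic distances between populations
Dgeo <- dist(nancycats$other$xy)        # geographic distances between populations

# Mantel statistic per geographic distance class, each with a permutation test.
correlog <- mantel.correlog(Dgen, D.geo = Dgeo, nperm = 999)
print(correlog)
plot(correlog)
```

The printed table gives one Mantel statistic per distance class, which shows how the genetic/geographic association changes with distance rather than summarising it in a single overall test.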
From thibautjombart at gmail.com Mon Sep 26 18:11:58 2016
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Mon, 26 Sep 2016 17:11:58 +0100
Subject: [adegenet-forum] Help to get membership probabilities from DAPC analyses
In-Reply-To:
References:
Message-ID:

Hi Catalina,

I'm sorry for the long delay in replying; your post was held amongst hundreds of moderated messages. In case it is still relevant: you want to use 'predict' and get $posterior from the output. For instance:

> data(H3N2)
> pop(H3N2) <- factor(H3N2$other$epid)
> dapc1 <- dapc(H3N2, var.contrib=FALSE, scale=FALSE, n.pca=150, n.da=5)
> pred <- predict(dapc1)
> names(pred)
[1] "assign"     "posterior"  "ind.scores"
> head(pred$posterior)
              2001         2002         2003         2004         2005
AB434107 0.9977417 2.258268e-03 1.616863e-18 8.134542e-31 6.793613e-41
AB434108 0.9977417 2.258268e-03 1.616863e-18 8.134542e-31 6.793613e-41
AB438242 1.0000000 4.059732e-20 2.300095e-35 5.708029e-47 1.457122e-55
AB438243 1.0000000 3.139077e-22 1.704267e-39 1.603998e-50 1.840345e-58
AB438244 1.0000000 8.422137e-22 1.018917e-39 1.747282e-50 2.828567e-58
AB438245 1.0000000 2.302028e-22 6.767904e-40 4.474589e-49 2.362144e-55
                 2006
AB434107 5.738948e-77
AB434108 5.738948e-77
AB438242 6.191441e-88
AB438243 2.226799e-91
AB438244 4.335585e-91
AB438245 4.015468e-88

*Important disclaimer*: these values are group assignment probabilities defined by a geometric criterion. They are not probabilities that genotypes come from each group as in STRUCTURE, so the two cannot be directly compared. We are currently developing a STRUCTURE-equivalent approach for this (though much faster), which will be an actual likelihood. Depending on how time-sensitive your work is, you may want to look into this new approach. If so, get in touch via email.
All the best
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: https://reconhub.github.io/
https://sites.google.com/site/thibautjombart/
https://github.com/thibautjombart
Twitter: @TeebzR

On 12 August 2016 at 22:08, Catalina Salgado wrote:
> Hello all!
>
> First, I apologize in advance if this question seems pretty basic.
> I have been using adegenet lately and I understand mostly everything. But I could not find a way to get the membership probabilities (the actual values) that were calculated when a DAPC analysis was done.
> I would like to have this file or information so I can compare it with STRUCTURE membership probabilities.
>
> Thank you very much in advance!
>
> Best,
>
> --
> Catalina Salgado
> USDA-ARS
> Beltsville, MD
> USA

From thibautjombart at gmail.com Mon Sep 26 18:04:49 2016
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Mon, 26 Sep 2016 17:04:49 +0100
Subject: [adegenet-forum] Problem DAPC scatterplot supplementary individuals
In-Reply-To: <57B59BA1.20505@cirad.fr>
References: <57B59BA1.20505@cirad.fr>
Message-ID:

Hello,

this has been fixed in the devel version at revision a3498410e0097392fb86b44

Best
Thibaut

On 18 August 2016 at 12:27, vlabeyrie wrote:
> Dear adegenet users,
>
> I have a problem when plotting supplementary individuals on a DAPC scatterplot.
>
> I defined two datasets: one on which to perform the DAPC, and another of supplementary individuals:
>
> x.sup_80 <- Ge_atp_gcp_80[c(1:nrow(Ge_atp@tab)), ]   # supplementary individuals
> x_80 <- Ge_atp_gcp_80[-c(1:nrow(Ge_atp@tab)), ]      # individuals on which the DAPC is performed
>
> Then I performed the DAPC on x_80, specifying a priori groups:
>
> dapc_GCP_14ssr_STRk5_b <- dapc(x_80, pop(x_80), n.pca=30, n.da=4)   # perform DAPC
>
> I assigned the supplementary individuals to the DAPC groups:
>
> predict_atp_strk5 <- predict.dapc(dapc_GCP_14ssr_STRk5_b, newdata=x.sup_80)
>
> The predicted group membership probabilities of the supplementary individuals are high, so I expected the supplementary individuals to be located within the DAPC groups. But this is not the case: on the scatterplot, the supplementary individuals mostly appear outside the groups to which they are assigned!
>
> col <- c("#F8766D", "#A3A500", "#00BF7D", "#00B0F6", "#E76BF3")   # colors for African collection individuals
> colb <- c("darkblue", "dodgerblue", "darkorange2", "red", "gold", "grey")   # colors for supplementary Mount Kenya individuals (according to their STRUCTURE group)
>
> # axes 1 and 2
> col.points_80 <- transp(col[as.integer(pop(x_80))], .2)   # define the color of African individuals as transparent
> scatter(dapc_GCP_14ssr_STRk5_b, col=col, bg="white", scree.da=0, pch="", cstar=0, clab=0, xlim=c(-10,10), legend=F)
> par(xpd=TRUE)
> points(dapc_GCP_14ssr_STRk5_b$ind.coord[,1], dapc_GCP_14ssr_STRk5_b$ind.coord[,2], pch=20, col=col.points_80, cex=1)   ## scatter DAPC groups / African GCP dataset
> col.sup_80 <- colb[as.integer(pop(x.sup_80))]   ## define supplementary individuals color
> points(predict_atp_strk5$ind.scores[,1], predict_atp_strk5$ind.scores[,2], pch=8, col=transp(col.sup_80, .7), cex=1)   # plot supplementary individuals
>
> With a previous version of adegenet, this problem did not occur: the supplementary individuals were located within the DAPC groups. The problem appeared when I ran my script with the new adegenet version.
>
> Does someone have an idea of what the problem is? I can provide the full script and data if needed.
>
> Thank you for your help!
>
> --
> Vanesse Labeyrie

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gfhijbbh.png
Type: image/png
Size: 69217 bytes
Desc: not available
URL:

From alexandros.lemopoulos at gmail.com Thu Sep 29 11:02:37 2016
From: alexandros.lemopoulos at gmail.com (Alexandre Lemo)
Date: Thu, 29 Sep 2016 12:02:37 +0300
Subject: [adegenet-forum] xcal/optim.a.score consistency
Message-ID:

Dear Dr. Jombart and *adegenet* users,

I am trying to run a DAPC on a dataset of 3975 SNPs obtained through RAD sequencing. There are 11 populations and 306 individuals examined here (minimum 16 individuals per population). Note that I am not using the find.clusters function.

My problem is that I can't get any consistency in the number of PCs that I should use for the DAPC. Every time I run *optim.a.score* or *xval*, I get different results. I tried changing the training set (I tried 0.7, 0.8 and 0.9), but the optimal number of PCs retained still changes in each run.
Here is an example of my script:

# str is a genind object
optim_PC <- xvalDapc(tab(str, NA.method = "mean", training.set = 0.9), pop(str), n.pca = 5:100, n.rep = 1000, parallel = "snow", ncpus = 4L)
optim_PC_2 <- xvalDapc(tab(str, NA.method = "mean", training.set = 0.9), pop(str), n.pca = 5:100, n.rep = 1000, parallel = "snow", ncpus = 4L)

What happens here is that optim_PC will give me an optimal number of PCs of (e.g.) 76 while optim_PC_2 will give me 16. I tried running this several times and the results are different every time.

I also tried using optim.a.score():

dapc.str <- dapc(str, var.contrib = TRUE, scale = FALSE, n.pca = 100, n.da = NULL)
optim.a.score(dapc.str)

Here too, the number of PCs changes every time I run the function.

Does anyone have an idea of why this is happening, or has anyone had similar issues? I am quite confused, as the results obviously change a lot depending on how many PCs are used.

Thanks for your help and for this great adegenet package!

Best,
Alexandre
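Both xvalDapc() and optim.a.score() rely on random resampling, so some run-to-run variation is to be expected. The following is a minimal sketch, not an answer from the thread, of how fixing the random seed makes a cross-validation run repeatable. Assumptions: the nancycats dataset from adegenet stands in for the SNP data, the grid of PCs and the number of replicates are kept small so that the example runs quickly, the run is serial (with parallel = "snow" the workers use their own random streams, so set.seed() alone may not be enough), and training.set is passed to xvalDapc() itself. The helper run_xval() is a hypothetical name introduced for this illustration.

```r
library(adegenet)

data(nancycats)
mat <- tab(nancycats, NA.method = "mean")   # allele counts, missing data replaced by the mean
grp <- pop(nancycats)

# Hypothetical helper: one cross-validation run with a fixed seed.
run_xval <- function(seed) {
  set.seed(seed)   # fixes the random training/validation splits
  xvalDapc(mat, grp, n.pca = seq(10, 50, by = 10), n.rep = 10,
           training.set = 0.9, xval.plot = FALSE)
}

x1 <- run_xval(1)
x2 <- run_xval(1)   # same seed as x1

# Compare the per-replicate results table (first element of the returned list).
identical(x1[[1]], x2[[1]])
```

Fixing the seed only makes a given run repeatable. If different seeds give very different optima, that may suggest the success curve is flat over a wide range of PCs, in which case inspecting the whole curve is likely more informative than the single best value.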