From thibautjombart at gmail.com Thu Jan 4 12:49:37 2018
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Thu, 4 Jan 2018 11:49:37 +0000
Subject: [adegenet-forum] POV in DAPC
In-Reply-To:
References:
Message-ID:

Hi there,

These values are in general percentages of inertia, which in the case of the PCA is a variance, and in the case of the DAPC is the ratio of between-group variance to total variance (i.e. an F statistic).

Best
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
https://thibautjombart.netlify.com
Twitter: @TeebzR
+44(0)20 7594 3658

On 13 December 2017 at 13:08, Andrew Veale wrote:

> Hey!
>
> I have been asked to add the percentage of variance explained by each axis
> in the DAPC scatter plot. I'm not sure how best to do this, and I think it
> isn't the most meaningful thing to do as it isn't a straight PCA. The POV
> of a PCA is fine, and I can see how to get the eigenvalues for each PC
> retained - just not the then combined ones for the DAPC.
>
> Any thoughts on this?
>
> Thanks!
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From michaeldonaldson at trentu.ca Fri Jan 19 16:46:52 2018
From: michaeldonaldson at trentu.ca (Mike Donaldson)
Date: Fri, 19 Jan 2018 10:46:52 -0500
Subject: [adegenet-forum] Do you need to replace missing data before performing a glPca?
Message-ID:

Hello,

I imported a vcf file and converted it to a genlight object using vcfR as follows:

gl.x <- vcfR2genlight(file.vcf)
ploidy(gl.x) <- 2
pop <- read.csv("popfile.csv", sep=",", header=TRUE)
pop(gl.x) <- pop$Pop

The dataset has ~10% missing data.
When using the glPca function, do you need to first convert the missing data to the mean and transform it based on frequency using the tab function (tab(gl.x, NA.method="mean", freq=TRUE)), or does that happen "behind the scenes"? I ask because the glPca will proceed without that conversion, while a dudi.pca will not.

Thank you for your time,
Mike

From roman.lustrik at biolitika.si Fri Jan 19 17:55:51 2018
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Fri, 19 Jan 2018 17:55:51 +0100 (CET)
Subject: [adegenet-forum] Do you need to replace missing data before performing a glPca?
In-Reply-To:
References:
Message-ID: <1874430037.1113289.1516380951441.JavaMail.zimbra@biolitika.si>

Could it be that it goes through because NAs are replaced by 0?

----
In god we trust, all others bring data.
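
To make the conversion discussed in this thread concrete, here is a minimal base-R sketch of what "replace missing data by the mean and convert to frequencies" amounts to. The genotype matrix, the object names and the use of prcomp are all made up for illustration; this is not adegenet code and it does not show what glPca does internally, which the thread leaves open. It does illustrate one point relevant to the "replaced by 0" guess: once the data are centered, a mean-imputed value is exactly 0.

```r
# Toy genotype matrix (made-up data): 4 individuals x 3 SNPs, coded as the
# number of copies of the second allele (0, 1 or 2); NA marks a missing call.
geno <- matrix(c( 0,  1, NA,
                  2, NA,  1,
                  1,  1,  0,
                 NA,  2,  2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("snp", 1:3)))
ploidy <- 2

# Allele counts -> allele frequencies per individual
freq <- geno / ploidy

# Replace each missing value by the mean frequency of its SNP
snp_means <- colMeans(freq, na.rm = TRUE)
imputed <- freq
for (j in seq_len(ncol(imputed))) {
  imputed[is.na(imputed[, j]), j] <- snp_means[j]
}

# Centering the imputed matrix turns every imputed cell into 0
centered <- scale(imputed, center = TRUE, scale = FALSE)
print(centered[is.na(geno)])

# An ordinary PCA now runs because no NA is left
pca <- prcomp(imputed, center = TRUE, scale. = FALSE)
```

With real data the same check can be made by comparing a PCA on the explicitly imputed table with the glPca output, rather than relying on this toy example.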

From sheamaddocklambert at gmail.com Sun Jan 28 23:11:54 2018
From: sheamaddocklambert at gmail.com (Shea Lambert)
Date: Sun, 28 Jan 2018 15:11:54 -0700
Subject: [adegenet-forum] snapclust: NaN for proba
Message-ID:

Hello,

I've been trying snapclust, and it runs/converges successfully, but all my entries for $proba are "NaN". I've tried removing all missing data, but still get "NaN" for everything. Any advice much appreciated.

Shea

From thibautjombart at gmail.com Tue Jan 30 13:45:31 2018
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Tue, 30 Jan 2018 12:45:31 +0000
Subject: [adegenet-forum] snapclust: NaN for proba
In-Reply-To:
References:
Message-ID:

Hi Shea,

my guess would also be missing data. Could you post that as an issue on GitHub with a reproducible example?
https://github.com/thibautjombart/adegenet/issues

I'll look into it asap.

Best
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
https://thibautjombart.netlify.com
Twitter: @TeebzR
+44(0)20 7594 3658

On 28 January 2018 at 22:11, Shea Lambert wrote:

> Hello,
>
> I've been trying snapclust, and it runs/converges successfully, but all my
> entries for $proba are "NaN". I've tried removing all missing data, but
> still get "NaN" for everything. Any advice much appreciated.
>
> Shea

From benjamin.dauphin at wsl.ch Wed Jan 31 10:17:51 2018
From: benjamin.dauphin at wsl.ch (Benjamin Dauphin)
Date: Wed, 31 Jan 2018 10:17:51 +0100
Subject: [adegenet-forum] Kmeans and DAPC on poolSeq data
Message-ID: <22A6ABF6-1D2B-4DB6-9D52-5899300649A8@wsl.ch>

Dear all,

I have recently started working with pool sequencing data and I wonder whether I can use kmeans (find.clusters) and DAPC to investigate population structure from poolseq data (allele frequencies). How can find.clusters deal with allele frequencies?

Dataset: 7 pools and 100'000 SNPs

Any comment or help would be much appreciated.

Best regards
Ben

From nlv209 at hotmail.com Tue Jan 30 19:08:20 2018
From: nlv209 at hotmail.com (Nikki Vollmer)
Date: Tue, 30 Jan 2018 18:08:20 +0000
Subject: [adegenet-forum] How to interpret Density Plot for K=2
Message-ID:

Hi,

I am trying to analyze ~200 RADseq loci for ~200 individuals. STRUCTURE results suggest the best number of populations given the data is 2. Pairwise Fst values are quite low for my taxa (<0.003) with p-value 0.01802. I was trying to do a DAPC on this same data to compare results.
DAPC similarly suggested the best # of clusters is 2, and I was able to plot a 1-dimensional density plot for the one DF I kept (attached). However, I am not sure how to interpret the plot. Is it correct to say that, because the two peaks do not overlap, the 2 clusters are quite differentiated from one another (similar to two clusters on a scatter plot being in opposite quadrants)? (...or is that logic flawed?) I am trying to figure out whether these 2 groups are very genetically differentiated or not, and I am not clear what the density plot is supporting/suggesting.

I very much appreciate any guidance on this matter!

Thank you,
Nikki

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rplot.pdf
Type: application/pdf
Size: 1252439 bytes
Desc: Rplot.pdf

From danielledanielle89 at gmail.com Wed Jan 31 01:18:46 2018
From: danielledanielle89 at gmail.com (Danielle Louise)
Date: Wed, 31 Jan 2018 10:18:46 +1000
Subject: [adegenet-forum] snapclust
Message-ID:

Hello.

I am looking at implementing your snapclust function, and I am reading through your recent paper. I have a few questions regarding incorporating empirical data. I have simulated data sets with parental, F1, F2 and BC classes, and I am wondering how to incorporate the empirical data: do I add it to the simulated data and measure the accuracy of the assignment to classes, to then determine the reliability of detection of hybrids in the empirical data? The tutorial gives a good outline of using the simulated data, but I think I am missing something when it comes to checking the empirical data, so I am asking for some really practical advice about how to incorporate the empirical data.

Also, should we bootstrap the final probabilities to clarify the results?

Thanks
Dan
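
The "measure the accuracy of the assignment to classes" step mentioned in the last message can be sketched in base R as a confusion table between the known class of each simulated individual and the class it was assigned to. This is only an illustration under stated assumptions: the two label vectors are invented, the class names are simplified placeholders rather than the labels snapclust actually returns, and nothing here says how the empirical samples should be combined with the simulated ones, which the thread does not answer.

```r
# Hypothetical labels (made-up values): the true class of each simulated
# individual and the class it was assigned to by the clustering step.
true_class <- factor(c("A", "A", "A", "B", "B", "B", "F1", "F1", "F1", "F1"),
                     levels = c("A", "B", "F1"))
assigned   <- factor(c("A", "A", "F1", "B", "B", "B", "F1", "F1", "A", "F1"),
                     levels = c("A", "B", "F1"))

# Confusion table: rows = true class, columns = assigned class
confusion <- table(true = true_class, assigned = assigned)

# Proportion of individuals of each true class that were assigned correctly
per_class_accuracy <- diag(confusion) / rowSums(confusion)

# Overall proportion of correct assignments
overall_accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(round(per_class_accuracy, 2))
print(overall_accuracy)
```

Per-class accuracy is usually more informative than the overall figure here, since misassigning hybrids as parentals and the reverse have different consequences for interpreting the empirical samples.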