From weave271 at umn.edu Sun Mar 3 19:53:16 2019
From: weave271 at umn.edu (Samuel Weaver)
Date: Sun, 3 Mar 2019 12:53:16 -0600
Subject: [adegenet-forum] Replacing NAs in genlight object
Message-ID:

I'm trying to change NAs in my dataset to the mean allele frequency to run various PopGen analyses through adegenet, but have been having trouble. Commands start below.

Data<-vcfR2genlight(MyVcf)
Data
 /// GENLIGHT OBJECT /////////

 // 11 genotypes, 5,467 binary SNPs, size: 877.8 Kb
 7647 (12.72 %) missing data

 // Basic content
   @gen: list of 11 SNPbin

 // Optional content
   @ind.names: 11 individual labels
   @loc.names: 5467 locus labels
   @chromosome: factor storing chromosomes of the SNPs
   @position: integer storing positions of the SNPs
   @other: a list containing: elements without names

I then try to run the command listed in the tutorial and elsewhere on the forum, which is as follows:

tab(Data, NA.method="mean")

This, however, does nothing to replace the NA values in the original "Data" genlight object. I've looked for hours on how to do this to no avail, so any help would be greatly appreciated.

From kamvarz at science.oregonstate.edu Tue Mar 5 10:43:59 2019
From: kamvarz at science.oregonstate.edu (Zhian Kamvar)
Date: Tue, 5 Mar 2019 09:43:59 +0000
Subject: [adegenet-forum] adegenet-forum Digest, Vol 126, Issue 1
In-Reply-To:
References:
Message-ID:

Hi Samuel,

The tab function simply gives back a matrix of allele counts (for a genlight object). It won't replace the values IN the genlight object (since R functions generally return a new value rather than modifying their arguments in place).
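A minimal sketch of this behaviour, using a small made-up genotype matrix rather than the poster's VCF. The toy data and the object names `geno`, `imputed`, `filled` and `Data_filled` are assumptions for illustration only. It shows that `tab()` returns a new matrix, so the result has to be assigned, and that a separate genlight object has to be built from it if one is needed:

```r
library(adegenet)

# Hypothetical toy data: 3 individuals x 4 biallelic SNPs, NA = missing call
geno <- matrix(c(0L, 1L, NA, 2L,
                 1L, NA, 0L, 2L,
                 2L, 1L, 1L, NA),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("ind", 1:3), paste0("snp", 1:4)))
Data <- new("genlight", gen = geno, ploidy = rep(2L, nrow(geno)))

# tab() returns a NEW matrix with NAs replaced by the per-locus mean;
# the genlight object itself is left untouched
imputed <- tab(Data, NA.method = "mean")
anyNA(imputed)            # FALSE
anyNA(as.matrix(Data))    # TRUE: the original still contains NAs

# genlight stores whole allele counts, so round the means up before rebuilding
filled <- ceiling(imputed)
Data_filled <- new("genlight", gen = filled, ploidy = rep(2L, nrow(filled)))
anyNA(as.matrix(Data_filled))   # FALSE
```

Rounding with `ceiling()` turns an imputed mean of 0.5 into one allele copy; whether that is appropriate depends on the analysis. The rebuilt object does not carry over slots such as @chromosome or @position, so those would need to be copied across separately.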
Because of the way a genlight object works (genotypes are stored as whole allele counts), you can't replace the values with the average counts of alleles, so what you might want to do is to use `tab()` to get the allele matrix with missing values replaced by the mean value and then use `ceiling()` on that matrix to round the average values up to a whole allele copy number (note: you could also use `round()`, but R rounds exact halves to the nearest even digit, so 0.5 becomes 0 and 1.5 becomes 2). From there, you can convert back to a genlight with the missing values replaced.

From rturba at outlook.com Tue Mar 5 20:24:42 2019
From: rturba at outlook.com (Rachel Turba)
Date: Tue, 5 Mar 2019 19:24:42 +0000
Subject: [adegenet-forum] Scripting adegenet with whole-genome data
Message-ID:

Hi all,

I have researched this on the forum but was not successful in finding help, so I apologize in advance if this has been addressed already.
In this tutorial (https://grunwaldlab.github.io/Population_Genetics_in_R/DAPC.html) it says that adegenet cannot be scripted a priori, but my data has 6Gb and takes a very long time to generate the DAPC object, so I have to submit it as a script to the university cluster. I get the plot of variance explained by PC, but then everything halts because I do not provide the number of PCs to be retained for continuing the analysis. Is there a way around this? How does anyone handle large datasets?

Thank you in advance,
Rachel

From briank.lists at gmail.com Tue Mar 5 23:05:03 2019
From: briank.lists at gmail.com (brian knaus)
Date: Tue, 5 Mar 2019 14:05:03 -0800
Subject: [adegenet-forum] Scripting adegenet with whole-genome data
In-Reply-To:
References:
Message-ID:

Hi Rachel,

Demographic patterns are generally thought to occur throughout the genome, in which case you shouldn't need the entire genome but can use a representative subset of your variants. We've demonstrated how to accomplish this at the link below.

https://grunwaldlab.github.io/Population_Genetics_in_R/gbs_analysis.html#subsetting-a-vcfr-object-to-200-random-variants

Good luck!
Brian

From rturba at outlook.com Wed Mar 6 20:14:18 2019
From: rturba at outlook.com (Rachel Turba)
Date: Wed, 6 Mar 2019 19:14:18 +0000
Subject: [adegenet-forum] Scripting adegenet with whole-genome data
In-Reply-To:
References:
Message-ID:

Hi Brian,

Thank you for the help. I was avoiding subsetting my data because I am interested in investigating SNPs that could be related to a change in morph between these populations.

Cheers,
Rachel

From briank.lists at gmail.com Wed Mar 6 22:27:02 2019
From: briank.lists at gmail.com (brian knaus)
Date: Wed, 6 Mar 2019 13:27:02 -0800
Subject: [adegenet-forum] Scripting adegenet with whole-genome data
In-Reply-To:
References:
Message-ID:

Hi Rachel,

I believe the reason the tutorial says DAPC cannot be scripted is that the steps to determine the number of principal components and discriminant functions are interactive. If you know this information you can send the DAPC step to a server without a monitor. Not sure DAPC is going to help you find SNPs associated with phenotypes though.

Good luck!
Brian

From rturba at outlook.com Wed Mar 6 23:17:02 2019
From: rturba at outlook.com (Rachel Turba)
Date: Wed, 6 Mar 2019 22:17:02 +0000
Subject: [adegenet-forum] Scripting adegenet with whole-genome data
In-Reply-To:
References:
Message-ID:

Hi Brian,

Yes, I think I will have to try that. I am not sure either if it will work, but in the tutorial there was a step that looked at SNPs contributing to the variation in the set. I just thought it would be worth trying as an alternative tool alongside genome-wide Fst.
Thank you so much for the advice and help.

Cheers,
Rachel
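As a hedged illustration of the non-interactive route discussed in this thread: `dapc()` only prompts when the number of retained principal components and discriminant functions is left unspecified, so passing `n.pca` and `n.da` lets the call run unattended. The sketch below uses simulated genotypes as a stand-in for real data. The sample sizes, allele frequencies, and the values `n.pca = 10` and `n.da = 1` are assumptions chosen for the toy example, not recommendations for any real dataset.

```r
library(adegenet)

set.seed(1)

# Hypothetical stand-in data: 40 diploid individuals in two groups, 300 SNPs
n_ind <- 40
n_snp <- 300
grp   <- factor(rep(c("A", "B"), each = n_ind / 2))
freq  <- ifelse(grp == "A", 0.2, 0.8)   # per-individual allele frequency
geno  <- matrix(rbinom(n_ind * n_snp, size = 2, prob = rep(freq, times = n_snp)),
                nrow = n_ind,
                dimnames = list(paste0("ind", seq_len(n_ind)),
                                paste0("snp", seq_len(n_snp))))

x <- new("genlight", gen = geno, ploidy = rep(2L, n_ind))
pop(x) <- grp

# Supplying n.pca and n.da up front avoids the interactive prompts,
# so this call can run in a batch job on a cluster
res <- dapc(x, pop = pop(x), n.pca = 10, n.da = 1)

# SNPs contributing most to the first discriminant function
head(sort(res$var.contr[, 1], decreasing = TRUE))
```

On real data the number of PCs would normally be chosen from a pilot run on a subset or by cross-validation before submitting the full job; the values above only show where those numbers go.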