From siobhan.dennison at mq.edu.au Tue Sep 9 08:31:35 2014
From: siobhan.dennison at mq.edu.au (Siobhan Dennison)
Date: Tue, 9 Sep 2014 16:31:35 +1000
Subject: [adegenet-forum] Problems with find.cluster
Message-ID:

I am working on the genetic structure of a threatened species, and as such have rather small sample sizes. Two of my four populations are out of HWE, so I am using DAPC to look at population clustering because it does not assume HWE.

The DAPC yielded 4 clusters as I expected, using the location information and retaining a very conservative 11 PCs (following a.score). However, when I wanted to look at clustering with no location priors on the data, things got a bit weird. I used the find.clusters function in adegenet, and I keep getting very different results from my other analyses: the lowest BIC falls at K=1, but the BIC values are extremely low (~420) and increase steadily from there (I attached the graph FYI).

My Fst values based on microsatellites suggest high differentiation between the 4 sites. I standardised my Fst values following Meirmans 2006, which gave rather high Fst values (0.2-0.4). My mitochondrial Fst values are also high (>0.5).

Using Structure with LOCPRIOR (accounting for low sample sizes), I get K=4 as the most likely number of clusters, and PCA also shows delineation between the four sample sites.

Given that all of my other analyses tell the same story (that there are four rather differentiated sites), I'm wondering if anyone can tell me where I might be going wrong here?

Any pointers would be greatly appreciated!!

Thanks,
Siobhan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: find.clusters output.pdf
Type: application/pdf
Size: 5249 bytes
Desc: not available
URL:

From crypticlineage at gmail.com Fri Sep 12 19:31:46 2014
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Fri, 12 Sep 2014 13:31:46 -0400
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA6570A8233A2D@icexch-m1.ic.ac.uk>
Message-ID:

I am revisiting this topic due to some technical problems.

The task at hand is to estimate pairwise Fst matrices for each locus separately.

# Genind object is stored in:
gen100_genind

# Use seploc to separate loci:
gen100_seploc <- seploc(gen100_genind, truenames=TRUE, res.type=c('genind', 'matrix'))

# Calculate pairwise Fst:
gen100_perLocusPWFst <- lapply(gen100_seploc, pairwise.fst, res.type=c('dist', 'matrix'), truenames=TRUE)

For a data set consisting of 30 populations, 20 individuals each, 1000 loci and 2 alleles per locus (1.2 million data points), it takes up to 6 hours to estimate the pairwise Fst matrices with this method.

Is there any way to speed this up? Should I look into any other packages?

Many thanks for your time and help.
Vikram

On Mon, Jul 14, 2014 at 9:16 AM, Vikram Chhatre wrote:
> Perfect! Thank you for both solutions.
>
> V
>
> On Mon, Jul 14, 2014 at 9:13 AM, Jombart, Thibaut <t.jombart at imperial.ac.uk> wrote:
>> Hi there,
>>
>> you can use seploc to separate loci, and lapply over the resulting list
>> using your preferred fst function.
>>
>> Cheers
>> Thibaut
>> ________________________________________
>> From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Vikram Chhatre [crypticlineage at gmail.com]
>> Sent: 14 July 2014 14:01
>> To: adegenet-forum at lists.r-forge.r-project.org
>> Subject: [adegenet-forum] Per locus pairwise Fst
>>
>> Good morning.
>>
>> I would like to estimate per locus pairwise Fst for populations, but it
>> appears that Adegenet only estimates this over all loci (i.e. single
>> matrix). What I would like is one matrix per locus. Has anyone modified
>> the functions or know of alternative programs that can do this?
>>
>> Thanks
>> Vikram

From vojta at trapa.cz Sat Sep 13 19:47:34 2014
From: vojta at trapa.cz (Vojtěch Zeisek)
Date: Sat, 13 Sep 2014 19:47:34 +0200
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To:
References:
Message-ID: <1821182.AM9yii2LuR@veles.site>

Hello,

R is basically single-threaded. Some packages and functions implement parallelisation; where they do not, you can do it yourself. As you are using a function from the apply family, it should be easy, although I don't have a ready-made solution to hand. There are several possibilities, and it might take some testing to find the best solution for your task and equipment. See http://www.r-bloggers.com/parallel-computing-in-r/ and the details about the functions mentioned there on http://cran.r-project.org/ If you google for parallel computing in R, you will find many links...

Good luck!
Vojtěch

--
Vojtěch Zeisek
http://trapa.cz/en/

Department of Botany, Faculty of Science
Charles University in Prague
Benátská 2, Prague, 12801, CZ
http://botany.natur.cuni.cz/en/

Institute of Botany, Academy of Science
Zámek 1, Průhonice, 25243, CZ
http://www.ibot.cas.cz/en/

Czech Republic
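
A minimal sketch of the do-it-yourself parallelisation mentioned above, using only the parallel package that ships with R. Everything in it is illustrative: the toy genotype data, the helper pairwise_fst_locus() and the simple Nei-style estimate it computes are assumptions standing in for the real genind data and for adegenet's pairwise.fst. The only point is that an lapply() over loci can be swapped for mclapply() without changing the per-locus function.

```r
library(parallel)  # ships with base R

set.seed(1)
n_pop <- 5
n_ind <- 20
n_loc <- 50
pop <- factor(rep(seq_len(n_pop), each = n_ind))

# Toy data (assumption): one vector per biallelic locus, holding the number
# of copies (0, 1 or 2) of a reference allele carried by each individual
loci <- lapply(seq_len(n_loc), function(l) {
  p <- runif(n_pop, 0.1, 0.9)                      # allele frequency per population
  rbinom(n_pop * n_ind, size = 2, prob = p[pop])
})

# Hypothetical helper: simple Nei-style pairwise Fst for one biallelic locus,
# assuming equal sample sizes (not adegenet's or hierfstat's estimator)
pairwise_fst_locus <- function(counts, pop) {
  p <- tapply(counts, pop, mean) / 2               # allele frequency per population
  k <- length(p)
  out <- matrix(0, k, k, dimnames = list(levels(pop), levels(pop)))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      hs <- mean(2 * p[c(i, j)] * (1 - p[c(i, j)]))  # mean within-population heterozygosity
      pbar <- mean(p[c(i, j)])
      ht <- 2 * pbar * (1 - pbar)                    # heterozygosity of the pooled pair
      out[i, j] <- out[j, i] <- if (ht > 0) (ht - hs) / ht else NA_real_
    }
  }
  out
}

# mclapply() relies on forking, which is unavailable on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial  <- lapply(loci, pairwise_fst_locus, pop = pop)
par_res <- mclapply(loci, pairwise_fst_locus, pop = pop, mc.cores = n_cores)

identical(serial, par_res)
```

On Windows the sketch falls back to a single core; parLapply() with a cluster created by makeCluster() is the usual alternative there.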

From t.jombart at imperial.ac.uk Sat Sep 13 20:20:39 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Sat, 13 Sep 2014 18:20:39 +0000
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA6570A8233A2D@icexch-m1.ic.ac.uk>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570A825D6A1@icexch-m1.ic.ac.uk>

Hi there,

yes, this function is not optimized for large datasets. You can use the same approach but using functions from the hierfstat package.

Cheers
Thibaut

From t.jombart at imperial.ac.uk Sat Sep 13 20:22:21 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Sat, 13 Sep 2014 18:22:21 +0000
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To: <1821182.AM9yii2LuR@veles.site>
References: <1821182.AM9yii2LuR@veles.site>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570A825D6B1@icexch-m1.ic.ac.uk>

On non-Windows systems, mclapply can be used to get a nice speedup, but really the first thing to do is to use a function which does the computations in a more efficient way.

Cheers
Thibaut

From crypticlineage at gmail.com Sat Sep 13 22:48:07 2014
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Sat, 13 Sep 2014 16:48:07 -0400
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570A825D6A1@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA6570A8233A2D@icexch-m1.ic.ac.uk> <2CB2DA8E426F3541AB1907F98ABA6570A825D6A1@icexch-m1.ic.ac.uk>
Message-ID:

Thank you for all the replies. I have been looking at the pp.fst() function in the hierfstat package. Does the post-seploc data need to be converted into something that hierfstat understands first? The following doesn't seem to work:

# Use seploc to separate loci:
gen100_seploc <- seploc(gen100_genind, truenames=TRUE, res.type=c('genind', 'matrix'))

# Load hierfstat
library(hierfstat)

# Calculate pairwise Fst:
gen100_perLocusPWFst <- lapply(gen100_seploc, pp.fst, diploid=TRUE)

Error in unique.default(Pop) : unique() applies only to vectors

From t.jombart at imperial.ac.uk Sun Sep 14 21:45:33 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Sun, 14 Sep 2014 19:45:33 +0000
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA6570A8233A2D@icexch-m1.ic.ac.uk> <2CB2DA8E426F3541AB1907F98ABA6570A825D6A1@icexch-m1.ic.ac.uk>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570A825D858@icexch-m1.ic.ac.uk>

Yes, you need to use:
?genind2hierfstat

Cheers
Thibaut
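
A rough sketch of the conversion step pointed to above, offered as one possible way to get per-locus pairwise values rather than as the method intended in the thread. It assumes that both adegenet and hierfstat are installed, that genind2hierfstat() is available from one of them (it has lived in both, depending on version), and that hierfstat's wc() returns per-locus Weir and Cockerham estimates in its per.loc element. The nancycats example data stand in for gen100_genind, and only four populations are used to keep it quick. Note the loop is over population pairs, with all loci handled in one call, instead of over loci.

```r
library(adegenet)   # nancycats example data (a genind object)
library(hierfstat)  # wc(); genind2hierfstat() comes from here or from adegenet

data(nancycats)

# Convert to the hierfstat layout: population column first, then one column per locus
hf <- genind2hierfstat(nancycats)
hf[, 1] <- as.integer(factor(hf[, 1]))   # numeric population codes
hf <- hf[order(hf[, 1]), ]

pops  <- unique(hf[, 1])[1:4]            # first four populations only (illustration)
pairs <- combn(pops, 2)                  # one column per population pair

# One wc() call per pair; per.loc$FST holds one value per locus
per_locus_fst <- sapply(seq_len(ncol(pairs)), function(k) {
  sub <- hf[hf[, 1] %in% pairs[, k], ]
  wc(sub, diploid = TRUE)$per.loc$FST
})
rownames(per_locus_fst) <- names(hf)[-1]
colnames(per_locus_fst) <- paste(pairs[1, ], pairs[2, ], sep = "_vs_")

per_locus_fst   # rows: loci, columns: population pairs
```

If one matrix per locus is needed, each row of per_locus_fst can be reshaped into a populations-by-populations matrix afterwards.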

From caroline.duffie at gmail.com Tue Sep 16 22:45:25 2014
From: caroline.duffie at gmail.com (Caroline Judy)
Date: Tue, 16 Sep 2014 16:45:25 -0400
Subject: [adegenet-forum] randomize pop labels in a genind object for randomization experiment.
Message-ID:

Hi Thibaut, Vikram, and others:

I'd like to try a randomization experiment to further explore my RADseq data using DAPC.

Data structure:
40 individuals in 2 (a priori) populations
6451 SNP loci

My data are for two very closely related "species" which show little to no divergence at traditional markers. I performed a DAPC using a priori population definitions (set as species). The function can discriminate my species, but the allelic contributions are very low (the highest few are around 0.0015).

I am interested in trying a randomization experiment in which I shuffle the population labels 100 times and then perform DAPC on each of these. Ultimately the goal is to compare allelic loadings for the discriminant function generated using true labels vs. randomized labels.

I am fairly new to R. A colleague suggested the general format for a loop, but could anyone offer a solution that could be implemented with a genind object? Otherwise, I think it would be too labor intensive: I would have to create 100 different Structure input files to be converted to genind objects.

nrep <- 100
results <- list()  # or vector/matrix, depending on the case
for (i in 1:nrep) {
  rand.labels <- sample(labels)  # labels: vector of population labels
  ## do some analyses and assign relevant results to results[[i]]
}

Thanks,
Caroline
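
The shuffling described above does not require new input files, since the population labels of a genind object can be permuted directly with sample(). A minimal sketch, assuming the sim2pop example data shipped with adegenet in place of the RADseq data, and arbitrary choices of n.pca = 20, n.da = 1 and 20 permutations; the summary statistic (largest allele contribution on the first discriminant function) is likewise only one possible choice.

```r
library(adegenet)

data(sim2pop)   # example genind object with 2 populations
x <- sim2pop
set.seed(42)

# DAPC with the true labels
obs <- dapc(x, pop = pop(x), n.pca = 20, n.da = 1)
obs_max <- max(obs$var.contr[, 1])

# Same analysis with shuffled labels; the genotypes are left untouched
n_rep <- 20
null_max <- replicate(n_rep, {
  shuffled <- sample(pop(x))
  perm <- dapc(x, pop = shuffled, n.pca = 20, n.da = 1)
  max(perm$var.contr[, 1])
})

# Share of permutations whose largest contribution reaches the observed one
mean(null_max >= obs_max)
```

Keeping n.pca fixed across permutations means the only thing that changes between runs is the labelling.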
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From carla.rivarossi at gmail.com Wed Sep 17 16:02:37 2014
From: carla.rivarossi at gmail.com (Carla Riva Rossi)
Date: Wed, 17 Sep 2014 11:02:37 -0300
Subject: [adegenet-forum] assignplot
Message-ID:

Hi everyone,

I would like to change the color scheme in an assignplot so that membership probabilities are shown in shades of gray (where black = 1, white = 0) instead of heat colors, and then add a scale legend with the probability intervals. Is there a way to do that?

Thanks in advance for the answers.
Carla Riva Rossi

From t.jombart at imperial.ac.uk Thu Sep 18 11:41:12 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 18 Sep 2014 09:41:12 +0000
Subject: [adegenet-forum] randomize pop labels in a genind object for randomization experiment.
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570A82686D8@icexch-m1.ic.ac.uk>

Hi there,

no need to recode everything: what you describe is cross-validation, and it is implemented in adegenet. See ?xvalDapc

Cheers
Thibaut
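
A short sketch of the xvalDapc call referred to above. It assumes adegenet 2.0 or later (for the tab() accessor) and uses the sim2pop example data; n.pca.max = 100 and n.rep = 3 are small arbitrary values chosen only to keep the run short.

```r
library(adegenet)

data(sim2pop)   # example genind object with 2 populations
set.seed(1)

mat <- tab(sim2pop)   # table of allele counts (individuals x alleles)
grp <- pop(sim2pop)   # a priori groups

xval <- xvalDapc(mat, grp, n.pca.max = 100, n.rep = 3, xval.plot = FALSE)

names(xval)        # components of the cross-validation result
xval$DAPC$n.pca    # number of PCs retained in the DAPC returned by xvalDapc
```

The returned list also contains the DAPC fitted with the selected number of PCs (xval$DAPC), which can be inspected like any other dapc object.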
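
For the assignplot question raised earlier in this digest (grayscale membership probabilities with a scale legend), one possible workaround is to draw the membership matrix directly with base graphics. A minimal sketch under stated assumptions: the probabilities below are simulated toy values, and my_dapc in the comment is a hypothetical object name; with a real analysis the matrix would be the posterior element of the dapc object.

```r
set.seed(7)
n_ind <- 12
n_grp <- 4

# Toy membership probabilities (rows sum to 1); with a real analysis this
# would be something like post <- my_dapc$posterior (hypothetical object name)
raw  <- matrix(rexp(n_ind * n_grp), nrow = n_ind, ncol = n_grp)
post <- raw / rowSums(raw)
dimnames(post) <- list(paste0("ind", seq_len(n_ind)),
                       paste0("grp", seq_len(n_grp)))

grays <- gray(seq(1, 0, length.out = 100))   # white = 0, black = 1

op <- par(mar = c(4, 5, 2, 8), xpd = TRUE)   # wide right margin for the legend
image(x = seq_len(n_grp), y = seq_len(n_ind), z = t(post),
      col = grays, zlim = c(0, 1), axes = FALSE,
      xlab = "Cluster", ylab = "")
axis(1, at = seq_len(n_grp), labels = colnames(post))
axis(2, at = seq_len(n_ind), labels = rownames(post), las = 1, cex.axis = 0.7)
box()

# Scale legend: one gray level per probability interval
breaks <- seq(0, 1, by = 0.2)
mids   <- head(breaks, -1) + 0.1
legend(x = n_grp + 0.7, y = n_ind + 0.5,
       legend = paste(head(breaks, -1), tail(breaks, -1), sep = " - "),
       fill = gray(1 - mids), title = "Membership", bty = "n", cex = 0.8)
par(op)
```

The legend shows the gray level at the midpoint of each interval, while the plot itself uses a continuous 100-step ramp.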

Thanks
Vikram

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From caitiecollins at gmail.com Thu Sep 18 14:46:16 2014
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Thu, 18 Sep 2014 13:46:16 +0100
Subject: [adegenet-forum] Problems with find.cluster
In-Reply-To:
References:
Message-ID:

Hi Siobhan,

As a preliminary suggestion that will be easy to investigate, I would suggest that perhaps the number of PCs retained is affecting your results from find.clusters.

Have you had a look at the xvalDapc function? Similar to a.score, xvalDapc can be used to help mediate the trade-off between discriminatory power and over-fitting. I would be curious to see what xvalDapc recommends as the number of PCs to retain to best differentiate the four groups you are identifying via other methods.

If the optimal number of PCs selected by xvalDapc for the four groups is greater than the 11 PCs you have selected with a.score, this would suggest that you may not have enough information for the BIC to identify more than one cluster, so I would recommend re-running find.clusters with the number of PCs suggested by xvalDapc to see if you get different results.

Of course, it is possible that the problem lies elsewhere, or that according to the BIC there is simply not enough evidence for more than one cluster, but at least it will be very easy to check this theory. Please let us know the results and we can then continue to search for other solutions if necessary.

Best,
Caitlin.

On Tue, Sep 9, 2014 at 7:31 AM, Siobhan Dennison wrote:
> I am working on genetic structure of a threatened species, and as such
> have rather small sample sizes. Two of my four populations are out of HWE,
> and so I am using DAPC to look at population clustering because it does not
> assume HWE.
>
> The DAPC yielded 4 clusters as I expected, using the location information,
> and retaining a very conservative 11 PCs (following a.score). However, when
> I wanted to look at clustering with no location priors on the data, things
> got a bit weird. I used the find.clusters option in adegenet, and I keep
> getting very different results to my other analyses - the lowest BIC falls
> at K=1, but the BIC values are extremely low (~420), steadily increasing
> from there (I attached the graph FYI).
>
> My Fst values based on microsatellites suggest high differentiation
> between the 4 sites. I standardised my Fst values following Meirmans 2006,
> which gave rather high Fst values (0.2-0.4). My mitochondrial Fst values
> are also high (>0.5).
>
> Using Structure with LOCprior (accounting for low sample sizes), I get K=4
> as the most likely number of clusters, and PCA also shows delineation
> between the four sample sites.
>
> Given that all of my other analyses tell the same story (that there are four
> rather differentiated sites), I'm wondering if anyone can tell me where I
> might be going wrong here?
>
> Any pointers would be greatly appreciated!!
>
> Thanks,
> Siobhan
> --
>
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From maria.david.salas at gmail.com Thu Sep 18 16:08:31 2014
From: maria.david.salas at gmail.com (Maria del Carmen David)
Date: Thu, 18 Sep 2014 09:08:31 -0500
Subject: [adegenet-forum] how to label individuals in scatter(dapc)
Message-ID:

Hello. I can't find a way of labeling individuals in my DAPC graphic. I know that for those who work with large numbers of individuals it isn't necessary, but I have a bit less than 150 and I want to see how they plot.
I have used assignplot to get a better idea of the group assignments but it would be extremely helpful to be able to label my samples. Thanks in advance.

Maria del Carmen
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From caroline.duffie at gmail.com Thu Sep 18 16:20:47 2014
From: caroline.duffie at gmail.com (Caroline Judy)
Date: Thu, 18 Sep 2014 10:20:47 -0400
Subject: [adegenet-forum] randomize pop labels in a genind object for randomization experiment.
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570A82686D8@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA6570A82686D8@icexch-m1.ic.ac.uk>
Message-ID:

I've been working through the tutorial again, and I now see (and understand) the randomization step that is part of cross validation. I'm so glad Adegenet has this formalized test. Thanks so much.

C

On Thu, Sep 18, 2014 at 5:41 AM, Jombart, Thibaut wrote:
>
> Hi there,
>
> no need to recode everything: what you describe is cross-validation, and
> it is implemented in adegenet. See ?xvalDapc
>
> Cheers
>
> Thibaut
>
>
> ________________________________________
> From: Caroline Judy [caroline.duffie at gmail.com]
> Sent: 16 September 2014 21:45
> To: Jombart, Thibaut
> Cc: Vikram Chhatre; adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] randomize pop labels in a genind object for
> randomization experiment.
>
> Hi Thibaut, Vikram, and others:
>
> I'd like to try a randomization experiment to further explore my RADseq
> data using DAPC.
>
> Data structure:
> 40 individuals in 2 (a priori) populations
> 6451 SNP loci
>
> My data are for two very closely related "species" which show little to no
> divergence at traditional markers. I performed a DAPC using a priori pop
> definitions (set as species). The function can discriminate my species, but
> the allelic contributions are very low (highest few around .0015).
>
> I am interested in trying a randomization experiment in which I shuffle
> the population labels 100 times and then perform DAPC on each of these.
> Ultimately the goal is to compare allelic loadings for the discriminant
> function generated using true labels vs. randomized labels.
>
> I am fairly new to R. A colleague suggested the general format to create a
> loop, but could anyone offer a solution that could be implemented with a
> genind object? Otherwise, I think it would be too labor intensive - I would
> have to create 100 different structure input files to be converted to
> genind objects.
>
> nrep <- 100
> results <- list() # or vector/matrix, depending on the case
> for (i in seq_len(nrep))
> {
>   rand.labels <- sample(labels)  # 'labels': vector of population labels
>   ## do some analyses and assign relevant results to results[[i]]
> }
>
> Thanks,
> Caroline
>
>
> On Sun, Sep 14, 2014 at 3:45 PM, Jombart, Thibaut <
> t.jombart at imperial.ac.uk> wrote:
>
> Yes, you need to use:
> ?genind2hierfstat
>
> Cheers
> Thibaut
>
> ________________________________________
> From: Vikram Chhatre [crypticlineage at gmail.com]
> Sent: 13 September 2014 21:48
> To: adegenet-forum at lists.r-forge.r-project.org; Jombart, Thibaut
> Subject: Re: [adegenet-forum] Per locus pairwise Fst
>
> Thank you for all the replies. I have been looking at the pp.fst()
> function in the Hierfstat package. Does the post-seploc data frame need to
> be converted into something that Hierfstat understands first?
>
> Thanks
> Vikram
>
>
>
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From caitiecollins at gmail.com Thu Sep 18 17:49:10 2014
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Thu, 18 Sep 2014 16:49:10 +0100
Subject: [adegenet-forum] how to label individuals in scatter(dapc)
In-Reply-To:
References:
Message-ID:

Hi,

If you think the individuals you are plotting are spaced far enough apart that you will be able to read labels at the individual level, one way to do it is to use s.label.

Here is an example of how to use s.label to overlay labels on a scatterplot of DAPC:

#############
## EXAMPLE ##
#############

library(adegenet)  # also loads ade4, which provides s.label

set.seed(14)

# generate a simulated dataset with 3 populations
simpop <- glSim(100, 500, 5, k=3, sort.pop=TRUE)

# isolate the SNPs and the population factor
snps <- as.matrix(simpop)
phen <- simpop@other$ancestral.pops

# run a dapc
dapc1 <- dapc(snps, phen, n.pca=20, n.da=4)

# create the scatter plot as before
scatter(dapc1, cstar=0, cex=5, label=NULL)

# change graphical parameter to subsequently overlay the labels
# without drawing a new plot
par(new=TRUE)

# make a data frame of the dapc coordinates used in scatter
df <- data.frame(x = dapc1$ind.coord[,1], y = dapc1$ind.coord[,2])

# identify/create a vector of names for the individuals in your plot
noms <- paste("ind", c(1:100), sep=".")

# use s.label to add labels at the positions given by the
# coordinates you used in the plot
s.label(dfxy = df, xax=1, yax=2, label=noms,
        clabel=0.7,   # change the size of the labels
        boxes=TRUE,   # if points are spaced widely enough, can use TRUE to add boxes around the labels
        grid=FALSE, addaxes=FALSE)  # do not draw lines or axes in addition to the labels

The comments in the example
above hopefully should give you all of the relevant information, so please give them a read and then feel free to let me know if you have any questions. You will almost certainly want to play around with the arguments clabel and boxes in the s.label function to get the labels to be readable for your case.

I hope that helps!

Best,
Caitlin.

On Thu, Sep 18, 2014 at 3:08 PM, Maria del Carmen David <
maria.david.salas at gmail.com> wrote:
> Hello. I can't find the way of labeling individuals in my dapc graphic. I
> know that for those who work with huge amount of individuals it isn't
> necessary but i have a bit less than 150 and i want to see how they plot. I
> have used assignplot to get a better idea of the group assignments but it
> would be extremely helpful to be able to label my samples. Thanks in
> advance.
>
> Maria del Carmen
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Jackie.Lighten at Dal.Ca Mon Sep 22 13:59:47 2014
From: Jackie.Lighten at Dal.Ca (Jackie Lighten)
Date: Mon, 22 Sep 2014 11:59:47 +0000
Subject: [adegenet-forum] Trouble converting to genid object
Message-ID:

Hi,

I am having trouble converting a presence/absence genotype data frame to a genind object. Please see attached for test data file.

Using

obj2 <- genind(test, ploidy=1, type="PA")

I get the error:

Error in `colnames<-`(`*tmp*`, value = c("L1", "L2")) :
  length of 'dimnames' [2] not equal to array extent

Using

obj2 <- df2genind(test, ploidy=1, type="PA")

I get the error:

Error in `colnames<-`(`*tmp*`, value = "L1") :
  length of 'dimnames' [2] not equal to array extent
In addition: Warning messages:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
2: In df2genind(test, ploidy = 1, type = "PA") :
  entirely non-type marker(s) deleted

Any help would be much appreciated.

Thanks,
Jack
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL:

From t.jombart at imperial.ac.uk Thu Sep 25 12:04:17 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 25 Sep 2014 10:04:17 +0000
Subject: [adegenet-forum] Trouble converting to genid object
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570A826B775@icexch-m1.ic.ac.uk>

Hi there,

it looks like a bug. I'll investigate and get back to you.

Cheers
Thibaut

________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Jackie Lighten [Jackie.Lighten at Dal.Ca]
Sent: 22 September 2014 12:59
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] Trouble converting to genid object

Hi,

I am having trouble converting a presence/absence genotype data frame to a genind object. Please see attached for test data file.

Using

obj2 <- genind(test, ploidy=1, type="PA")

I get the error:

Error in `colnames<-`(`*tmp*`, value = c("L1", "L2")) :
  length of 'dimnames' [2] not equal to array extent

Using

obj2 <- df2genind(test, ploidy=1, type="PA")

I get the error:

Error in `colnames<-`(`*tmp*`, value = "L1") :
  length of 'dimnames' [2] not equal to array extent
In addition: Warning messages:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
2: In df2genind(test, ploidy = 1, type = "PA") :
  entirely non-type marker(s) deleted

Any help would be much appreciated.

Thanks,
Jack
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jean-luc.legras at supagro.inra.fr Thu Sep 25 16:02:15 2014
From: jean-luc.legras at supagro.inra.fr (Jean-Luc LEGRAS)
Date: Thu, 25 Sep 2014 16:02:15 +0200
Subject: [adegenet-forum] Combining genetic and phenotypic data?
Message-ID:

Hello,

I am an adegenet user, and I saw the discussion about the joint analysis of phenotypic and genetic data, "[adegenet-forum] Combining genetic and phenotypic data?", one year ago.

We have genotyped yeast populations by sequencing and have phenotyped them as well for the production of many metabolites.

I was wondering if you have already implemented such functions in adegenet, or if this is on the way.

Thank you in advance for your answer.

Best regards.
Jean-Luc

PS: I am sorry to have missed your visit to Montpellier, but I was busy on the day of your visit last spring.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From caitiecollins at gmail.com Fri Sep 26 18:27:48 2014
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Fri, 26 Sep 2014 17:27:48 +0100
Subject: [adegenet-forum] adegenet - a.score.opt vs. xvalDAPC
In-Reply-To: <24b1342a49a4456fb17abb347c5f96e5@icexch-h3.ic.ac.uk>
References: <24b1342a49a4456fb17abb347c5f96e5@icexch-h3.ic.ac.uk>
Message-ID:

Good question.
Essentially these are just two different approaches to the same problem of trying to find the optimal number of PCs to retain in DAPC.

The short answer is: *Use xvalDapc instead of optim.a.score.*

optim.a.score was our first approach, and xvalDapc is our new and improved approach: xvalDapc is easier to interpret and is likely to give better results.

------

If you're just generally curious about the two approaches, I can offer a brief description and an explanation of the way I think about them, at least:

Both methods rely on repeated measurements to perform model validation relating to the impact of the number of PCs on the ability of the model to predict the correct group membership of all individuals in the dataset.

In cross-validation with xvalDapc, DAPC is performed (with increasing numbers of PCs) on a "training set" (typically 90% of the dataset) and then we project the individuals left out of the analysis onto the discriminant axes constructed by DAPC. We measure how accurately we can place this left-out 10% of individuals in the multidimensional space (in which their position corresponds to their group membership). With too few PCs retained, we fail to correctly assign the validation set of individuals to the correct groups because we simply do not have enough information. With too many PCs retained, we also begin to fail to correctly assign these individuals, because essentially now all we are doing is over-describing each of the individuals in the training set instead of painting a general picture of just those features that relate to their group structure. This over-description merely adds "noise" that drowns out the group-defining "signal" that we had been attempting to summarise. We perform the cross-validation procedure repeatedly (each time varying the number of PCs retained) with different training and validation sets until we find the right signal-to-noise ratio, the Goldilocks point between weak discrimination and unstable results.
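
The cross-validation idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the code inside xvalDapc: PCA (prcomp) followed by linear discriminant analysis (MASS::lda) stands in for DAPC, the built-in iris data stand in for a genetic dataset, and the helper name xval_success is hypothetical.

```r
# Sketch of the cross-validation idea only; NOT adegenet's xvalDapc code.
# Assumptions: prcomp + MASS::lda stand in for DAPC, iris for genetic data.
library(MASS)

set.seed(1)
x   <- as.matrix(iris[, 1:4])   # individuals x variables
grp <- iris$Species             # known group memberships

# Mean proportion of held-out individuals assigned to their true group,
# for a given number of retained PCs (hypothetical helper).
xval_success <- function(x, grp, n.pca, training.set = 0.9, n.rep = 30) {
  mean(replicate(n.rep, {
    # stratified sampling: keep 90% of each group as the training set
    train <- unlist(lapply(split(seq_along(grp), grp), function(idx)
      sample(idx, round(training.set * length(idx)))))
    # PCA and discriminant analysis on the training set only
    pca <- prcomp(x[train, , drop = FALSE])
    fit <- lda(pca$x[, 1:n.pca, drop = FALSE], grouping = grp[train])
    # project the left-out individuals onto the training PCs
    held.out <- predict(pca, x[-train, , drop = FALSE])[, 1:n.pca, drop = FALSE]
    # proportion of left-out individuals placed in their true group
    mean(predict(fit, held.out)$class == grp[-train])
  }))
}

# Repeat for increasing numbers of retained PCs and compare
success <- sapply(1:4, function(k) xval_success(x, grp, n.pca = k))
names(success) <- paste0("PC", 1:4)
best.n.pca <- which.max(success)
print(success)
```

With real data, xvalDapc itself does this bookkeeping (see ?xvalDapc); the sketch is only meant to show what "train on 90%, predict the left-out 10%" means in practice.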

When using the a.score to achieve this aim, we repeatedly perform DAPC with different numbers of retained PCs; but, in contrast to xvalDapc, we keep all individuals in the analysis. Instead, with optim.a.score, at each level of PC retention, we measure reassignment success to the real populations of interest, and also measure that "success" to fake randomized populations. If there is any real group structure to be identified in the dataset, the optimal level of PC retention will be the one at which our ability to assign individuals to their real groups exceeds by the greatest margin our ability to assign individuals to the false groupings, calculated as Pt - Pr, i.e. probability of reassignment to the True cluster vs. the Random cluster. With too few PCs, the probability of successful reassignment will be low for both the true clusters and the random ones. On the other hand, with too many PCs, you have so much information retained that you could paint effectively any picture of groupings in the data, so reassignment success to the false clusters will begin to approach that to the true clusters and the a-score will decline, once again leaving a Goldilocks point in the middle of the arc indicating the optimal number of PCs to retain.

The results of cross-validation and optim.a.score should not give completely contradictory results, but they may not always give the same result. If results differed, we would always recommend that you use the results of xvalDapc over optim.a.score, hence you may as well just not worry about optim.a.score in the first place.

Hope that helps!

Best,
Caitlin.

On Sat, Sep 20, 2014 at 10:35 PM, Judy (Duffie), Caroline wrote:
> Dear Dr. Collins,
>
> I was wondering if you could help me understand the difference between
> using a.score.opt. vs. xvalDapc. It seems that both methods are used to
> determine the number of PCs to retain in the DAPC. Why and when would you
> use one method vs. the other?
>
> Thanks for any clarification you can offer.
I've been through the papers
> and the tutorials, but am still trying to wrap my mind around these
> procedures.
>
> Caroline
>
> Caroline D. Judy, PhD Candidate & Peter Buck Fellow
> National Museum of Natural History
> Smithsonian Institution
> judyc at si.edu
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
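
For readers following the a-score part of the reply above, the quantity Pt - Pr can be sketched in base R. This is a rough illustration under stated assumptions, not adegenet's a.score or optim.a.score: PCA plus MASS::lda stands in for DAPC, the iris data stand in for genotypes, hard assignments are used in place of membership probabilities, and the helper names reassign and a_score are hypothetical.

```r
# Sketch of the a-score idea (Pt - Pr); NOT adegenet's implementation.
# Assumptions: prcomp + MASS::lda stand in for DAPC, iris for genetic data,
# and hard assignments replace membership probabilities.
library(MASS)

set.seed(1)
x   <- as.matrix(iris[, 1:4])
grp <- iris$Species

# Proportion of individuals reassigned to the group they were labelled with
reassign <- function(scores, labels) {
  fit <- lda(scores, grouping = labels)
  mean(predict(fit, scores)$class == labels)
}

# Reassignment success with true labels minus the mean success with
# randomly shuffled labels, keeping all individuals in the analysis
a_score <- function(x, grp, n.pca, n.sim = 10) {
  scores <- prcomp(x)$x[, 1:n.pca, drop = FALSE]
  p.true <- reassign(scores, grp)                                   # Pt
  p.rand <- mean(replicate(n.sim, reassign(scores, sample(grp))))   # Pr
  p.true - p.rand
}

# One value per number of retained PCs; the largest marks the optimum
a <- sapply(1:4, function(k) a_score(x, grp, n.pca = k))
names(a) <- paste0("PC", 1:4)
print(a)
```

A toy dataset with four variables will not show the full rise and fall of the curve described in the reply; with many loci and many PCs, success on the shuffled labels climbs and the score drops again.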