From paolo.momigliano at gmail.com Thu Aug 6 09:00:53 2015
From: paolo.momigliano at gmail.com (Paolo Momigliano)
Date: Thu, 6 Aug 2015 17:00:53 +1000
Subject: [adegenet-forum] [adegenet forum] hw.test on genind object

Hi all,

I am having real trouble with a very simple task: running hw.test from pegas, which is supposed to work with adegenet.

I tried importing my files from both genepop and fstat formats, using the read.genepop and read.fstat functions. Example code below:

# load libraries
library(adegenet)
library(pegas)

# import data
neu <- read.genepop("snp1.gen", ncode = 3L)

# calculate HWE for the first population
HWE_Ning <- hw.test(neu[1:23, ], B = 100)

At which point I get warnings:

There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In O - E : longer object length is not a multiple of shorter object length
2: In (O - E)^2/E : longer object length is not a multiple of shorter object length
3: In O - E : longer object length is not a multiple of shorter object length
4: In (O - E)^2/E : longer object length is not a multiple of shorter object length
5: In O - E : longer object length is not a multiple of shorter object length
6: In (O - E)^2/E : longer object length is not a multiple of shorter object length
7: In O - E : longer object length is not a multiple of shorter object length
etc.

The hw.test function works just fine on the "nancycats" dataset from adegenet, but it fails whenever I import a dataset. I know the datasets are not corrupted, as I have carried out a lot of other analyses in adegenet and diveRsity and never had a problem.

The problem persists if I convert the genind object with the as.loci function before running hw.test.

Do you have any suggestions?

Thanks so much!

Paolo
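The warning quoted above is base R's generic recycling warning rather than a message specific to pegas: it appears whenever two vectors of incompatible lengths are combined element-wise. A minimal sketch with made-up numbers (the vectors O and E here are hypothetical stand-ins for observed and expected genotype counts, not values taken from hw.test):

```r
# Hypothetical observed and expected genotype counts whose lengths differ.
O <- c(10, 5, 8, 2, 1)   # five observed genotype classes
E <- c(9.5, 6.0, 7.5)    # only three expected classes

# Capture the first warning as a value so it can be inspected.
msg <- tryCatch(
  sum((O - E)^2 / E),
  warning = function(w) conditionMessage(w)
)
print(msg)
```

If the same message comes out of a test, it suggests that the observed and expected tables were built with different numbers of genotype classes, so comparing their lengths is a reasonable first check.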
From vojta at trapa.cz Thu Aug 6 09:18:25 2015
From: vojta at trapa.cz (Vojtěch Zeisek)
Date: Thu, 06 Aug 2015 09:18:25 +0200
Subject: [adegenet-forum] [adegenet forum] hw.test on genind object
Message-ID: <1937877.EyhbUskscb@veles>

Hi,

On Thu, 6 August 2015 17:00:53, Paolo Momigliano wrote:
> Hi all,
>
> I am having some real trouble with a very simple task: running hw.test from
> pegas which is supposed to work with adegenet.
>
> I tried importing my files both from genepop and fstat using the
> read.genepop and read.fstat functions:
>
> below an example code
>
> # load libraries
> library(adegenet)
> library(pegas)
>
> # import data
> neu<-read.genepop("snp1.gen", ncode = 3L)

I usually use this function only with loci objects (I usually start with pegas' read.loci function to import data), although the man page says it should also work with genind. Maybe you could try converting your genind object into loci with the genind2loci function.

> # Calculate HWE for first pop
> HWE_Ning<-hw.test(neu[1:23,], B=100)
>
> at which point, I get warnings:
> > There were 50 or more warnings (use warnings() to see the first 50)
> >
> > warnings()
>
> Warning messages:
> 1: In O - E : longer object length is not a multiple of shorter object length
> 2: In (O - E)^2/E : longer object length is not a multiple of shorter object length
> 3: In O - E : longer object length is not a multiple of shorter object length
> 4: In (O - E)^2/E : longer object length is not a multiple of shorter object length
> 5: In O - E : longer object length is not a multiple of shorter object length
> 6: In (O - E)^2/E : longer object length is not a multiple of shorter object length
> 7: In O - E : longer object length is not a multiple of shorter object length
>
> etc....
>
> The hw.test function works just fine on the "nancycats" dataset from
> adegenet. But it doesn't whenever I import a dataset.
> I know the datasets are not corrupted, as I have carried out a lot of other
> analyses in adegenet and diveRsity and never had a problem...
>
> This problem persists if I try to convert the genind object using the
> as.loci function before running hw.test.
>
> Do you have any suggestion?

Adegenet also has the function HWE.test.genind. It works fine for me on genind objects. There is also the function HWE.test in the package genetics, but I haven't used it myself.

> Thanks so much!
>
> Paolo

HTH,
Vojtěch

--
Vojtěch Zeisek
http://trapa.cz/en/

Department of Botany, Faculty of Science
Charles University in Prague
Benátská 2, Prague, 12801, CZ
http://botany.natur.cuni.cz/en/

Institute of Botany, Academy of Science
Zámek 1, Průhonice, 25243, CZ
http://www.ibot.cas.cz/en/

Czech Republic

From paolo.momigliano at gmail.com Thu Aug 6 09:31:20 2015
From: paolo.momigliano at gmail.com (Paolo Momigliano)
Date: Thu, 6 Aug 2015 17:31:20 +1000
Subject: [adegenet-forum] [adegenet forum] hw.test on genind object
In-Reply-To: <1937877.EyhbUskscb@veles>
References: <1937877.EyhbUskscb@veles>

Thanks for the tip,

Unfortunately, using the genind2loci function to convert does not help (I guess it does the same as as.loci?). I used the HWE.test function in adegenet before, but I understand that this function is no longer present in the new adegenet and that its functionality is supposed to be replaced by pegas' hw.test?

If I try to run HWE.test from adegenet I get this:

> HWE.test(neu)
Error: could not find function "HWE.test"

or

> HWE.test.genind(neu)
As of adegenet_1.5-0, this function has been removed and is replaced by 'hw.test' in the package 'pegas'

Did anyone have the same problem?

Cheers
Paolo

From vojta at trapa.cz Thu Aug 6 09:42:39 2015
From: vojta at trapa.cz (Vojtěch Zeisek)
Date: Thu, 06 Aug 2015 09:42:39 +0200
Subject: [adegenet-forum] [adegenet forum] hw.test on genind object
References: <1937877.EyhbUskscb@veles>
Message-ID: <7057116.uogps0hNZK@veles>

On Thu, 6 August 2015 17:31:20, Paolo Momigliano wrote:
> Thanks for the tip,
>
> unfortunately using the genind2loci function to convert does not help (I
> guess that does the same as as.loci?). I used the HWE.test function in

What was the error message? Hm, then my only idea (going back to your error messages) is this: are you absolutely sure that all vectors have the same length? I mean specifically individual names, populations and all loci. How do you mark missing data? Are there individual/population/locus names? What do they look like? Does print(neu) report what you expect (number of loci and individuals)? And what about summary(neu): missing data and other characteristics?
Does print(neu, details = TRUE) look OK?

> adegenet too before, yet I understand that this function is not present
> anymore in the new adegenet and its functionality is supposed to be
> replaced by pegas hw.test?
>
> If I try to run HWE.test from adegenet I get this
>
> > HWE.test(neu)
>
> Error: could not find function "HWE.test"
>
> or
>
> > HWE.test.genind(neu)
>
> As of adegenet_1.5-0, this function has been removed and is replaced by
> 'hw.test' in the package 'pegas'

Yes, I see. I haven't used the newest adegenet much yet, so I missed that.

> Did anyone have the same problem?

I once had a problem with this function (I don't remember the error message) when I had too much missing data (~15% in some loci). After removing those loci/individuals, it worked fine. But I guess this is not your case now...

> Cheers
>
> Paolo

Good luck,
Vojtěch

--
Vojtěch Zeisek
http://trapa.cz/en/

Department of Botany, Faculty of Science
Charles University in Prague
Benátská
2, Prague, 12801, CZ
http://botany.natur.cuni.cz/en/

Institute of Botany, Academy of Science
Zámek 1, Průhonice, 25243, CZ
http://www.ibot.cas.cz/en/

Czech Republic

From simon.crameri at env.ethz.ch Thu Aug 6 16:43:24 2015
From: simon.crameri at env.ethz.ch (Crameri Simon)
Date: Thu, 6 Aug 2015 14:43:24 +0000
Subject: [adegenet-forum] xvalDapc and group prediction accuracy
Message-ID: <9BBBFDA6-8107-45A9-8FF0-4486545B9DB0@ethz.ch>

Hi Caitlin

I'm writing to you because you are the author of xvalDapc. I'm still somewhat confused regarding question 2) of my first post. You don't need to read it again; let's just consider this:

- I have a genetic dataset of 100 individuals, and I know the true group membership of every individual.
- I'd like to build a cross-validated DAPC "model" (let's call it the DAPC model) which can be used to predict the group membership of further individuals.
- I run xvalDapc on, say, 50% of the 100 individuals (the reason I can't take 90% is the small size of some groups).
- I get n.pca = 25 as the best n.pca for building the DAPC model, and xvalDapc automatically produces a corresponding DAPC, albeit with 100% of the individuals.

Now comes the tricky question: can I really use the DAPC produced by xvalDapc for prediction purposes? I still think it is somewhat problematic to take the full dataset (100 individuals) to build a cross-validated DAPC model when the n.pca used in the PCA step of DAPC was determined from training sets of just 50 individuals. Perhaps this is the reason why you set training.set = 0.9 as the default value, to make this difference as small as possible?

An alternative approach would be to use xvalDapc as "just" a (wonderful!) tool to get an optimal n.pca for your data.
But for prediction purposes, I'd suggest building a DAPC model with a training set of, in this case, 50 individuals (from a stratified sampling) instead of all individuals. If you don't want to lose the information of the other 50 individuals, you could even produce, say, 30 permuted training sets in the same way as xvalDapc does, build 30 DAPC models, and predict your further individuals against all 30 models separately, taking the group most often assigned to an additional sample as the predicted group.

Do you have any comments on that? I know it's all very complicated, but wouldn't that be statistically more appropriate?

Thank you in advance,
Simon

----------------------------------------------------------------------

Message: 1
Date: Tue, 28 Jul 2015 11:52:41 +0000
From: "Jombart, Thibaut"
To: "Crameri Simon"
Subject: Re: [adegenet-forum] xvalDapc and group prediction accuracy
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF58B2D at icexch-m1.ic.ac.uk>
Content-Type: text/plain; charset="iso-8859-1"

Hi there

see the argument 'result' in xvalDapc. The difference you see is the difference between the mean % of successful prediction averaged over groups (default), and the overall % of successful prediction. These two quantities are increasingly different when sample sizes are unequal.

Cheers
Thibaut

________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Crameri Simon [simon.crameri at env.ethz.ch]
Sent: 17 July 2015 16:25
Subject: [adegenet-forum] xvalDapc and group prediction accuracy

Hi Thibaut

I am still working with my tree species, whose genotypes I'd like to model using DAPC, and I am still aiming to use the results as a forensic tool to identify species genetically. Therefore, the whole approach needs to be as reliable as possible.
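The two success measures that Thibaut distinguishes can be illustrated with a small base R sketch. The group sizes are the ones reported by the table() call in this thread; the counts of correct predictions are invented for illustration and are not results from xvalDapc.

```r
# Group sizes as reported in the thread; names are the population labels.
n <- c(P01 = 11, P02 = 5, P03 = 5, P04 = 16, P05 = 10, P06 = 15,
       P07 = 34, P08 = 4, P09 = 4, P10 = 11, P11 = 4)

# Hypothetical outcome: every individual is predicted correctly except
# one member of the small group P08.
correct <- n
correct["P08"] <- 3

overall    <- sum(correct) / sum(n)   # pooled over all individuals
group_mean <- mean(correct / n)       # per-group success, then averaged

print(c(overall = overall, group_mean = group_mean))
```

With these made-up counts the overall rate is 118/119 while the group mean is 10.75/11: a single miss in a group of four lowers that group's rate to 0.75, which weighs more heavily in the group mean than in the pooled figure.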
I tried xvalDapc() to perform DAPC cross-validation and found an optimal n.pca:

table(data@pop)
P01 P02 P03 P04 P05 P06 P07 P08 P09 P10 P11
 11   5   5  16  10  15  34   4   4  11   4

xval <- xvalDapc(data@tab, pop(data), training.set = 0.5, result = "groupMean", n.pca = 10:20, n.rep = 1000)

xval$`Mean Successful Assignment by Number of PCs of PCA`[as.numeric(xval$`Number of PCs Achieving Highest Mean Success`)]
       14
0.9953977

xval$'Number of PCs Achieving Lowest MSE'
[1] "14"

xval$DAPC$n.pca
[1] 14

It all works fine. The resulting best n.pca is still 14 when xvalDapc() is run multiple times with the same parameters, and even when training.set is changed to, say, 0.9.

Now I use the validated model (xval$DAPC) to predict the species membership of additional samples:

predict(xval$DAPC, newdata=new.data)

Again, it's all working perfectly, but what I don't fully understand is this:

1) As it happens, I know the true group membership of the additional samples, so I can assess the prediction accuracy of xval$DAPC. It turns out that 96.8% (group mean!) of the additional samples are correctly predicted by xval$DAPC. Why is this number slightly different from the expected 99.5%? Might it be due to the different group sizes in the full dataset (table(data@pop))?

2) If the full dataset contains groups of very different sizes, some of which are fairly small: would it be more reliable to predict the group membership of additional samples using the n.pca determined above and all 1000 training sets (which have approximately equal group sizes) as a reference, instead of using the full dataset (where group sizes differ) and just one prediction? The resulting 1000 prediction outcomes could be screened for the group most often assigned to each new sample.

Any opinions / ideas?

Thanks in advance,
Simon

*************
PhD student
ETH Zurich
Plant Ecological Genetics
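The voting scheme proposed in question 2) can be sketched in base R, assuming the per-model predictions are already available. The matrix preds below is invented: each row is a new sample and each column the group assigned by one resampled DAPC model. The helper majority_vote is hypothetical and not part of adegenet.

```r
# Hypothetical predictions: 3 new samples (rows) x 5 resampled models (columns).
preds <- matrix(
  c("P01", "P01", "P02", "P01", "P01",
    "P07", "P07", "P07", "P07", "P07",
    "P04", "P05", "P04", "P05", "P04"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("new1", "new2", "new3"), paste0("model", 1:5))
)

# Return the most frequently assigned group and the share of models agreeing.
majority_vote <- function(x) {
  counts <- table(x)
  top <- names(counts)[which.max(counts)]   # ties resolve to the first label
  data.frame(group = top, support = max(counts) / length(x))
}

# Apply the vote to each new sample and stack the one-row results.
votes <- do.call(rbind, lapply(seq_len(nrow(preds)),
                               function(i) majority_vote(preds[i, ])))
rownames(votes) <- rownames(preds)
print(votes)
```

Keeping the support column alongside the winning group makes it easy to flag samples on which the resampled models disagree, which seems useful for a forensic application.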
From kamvarz at science.oregonstate.edu Thu Aug 6 17:52:08 2015
From: kamvarz at science.oregonstate.edu (Zhian Kamvar)
Date: Thu, 6 Aug 2015 08:52:08 -0700
Subject: [adegenet-forum] [adegenet forum] hw.test on genind object

This is due to an issue in pegas, which expects the alleles to be in numerical order for microsatellites.

Until it's fixed, I have made a workaround here: https://gist.github.com/zkamvar/95af8adaf1b3f01a4995

Copy and paste the function into your R console and then use resort_microsat(myData) for the workaround.

Cheers,
Zhian

From paolo.momigliano at gmail.com Fri Aug 7 09:29:26 2015
From: paolo.momigliano at gmail.com (Paolo Momigliano)
Date: Fri, 7 Aug 2015 17:29:26 +1000
Subject: [adegenet-forum] [adegenet forum] hw.test on genind object

Hi Zhian,

thank you so much, this worked perfectly.

Cheers
Paolo

On Fri, Aug 7, 2015 at 1:52 AM, Zhian Kamvar <kamvarz at science.oregonstate.edu> wrote:
> This is due to an issue in pegas where it expects the alleles to be in
> numerical order for micro satellites.
>
> Until it's fixed, I have made a workaround here:
> https://gist.github.com/zkamvar/95af8adaf1b3f01a4995
>
> Copy + paste the function into your R console and then use
> resort_microsat(myData) for the workaround.
> > Cheers, > Zhian > > > On Aug 6, 2015, at 03:00 , > adegenet-forum-request at lists.r-forge.r-project.org wrote: > > > > Send adegenet-forum mailing list submissions to > > adegenet-forum at lists.r-forge.r-project.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum > > > > or, via email, send a message with subject or body 'help' to > > adegenet-forum-request at lists.r-forge.r-project.org > > > > You can reach the person managing the list at > > adegenet-forum-owner at lists.r-forge.r-project.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of adegenet-forum digest..." > > > > > > Today's Topics: > > > > 1. Re: [adegenet forum] hw.test on genind object (Vojt?ch Zeisek) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 06 Aug 2015 09:42:39 +0200 > > From: Vojt?ch Zeisek > > To: adegenet-forum at lists.r-forge.r-project.org > > Subject: Re: [adegenet-forum] [adegenet forum] hw.test on genind > > object > > Message-ID: <7057116.uogps0hNZK at veles> > > Content-Type: text/plain; charset="utf-8" > > > > Dne ?t 6. srpna 2015 17:31:20, Paolo Momigliano napsal(a): > >> Thanks for the tip, > >> > >> unfortunately using the genind2loci function to convert does not help (i > >> guess that does the same as as.loci?). i used the HWE.test function in > > > > What was the error message? Hm, then the only idea (going back to Your > error > > messages) is this: Are You really absolutely sure You have all vectors > of same > > lengths? I mean specifically individual names, populations and all loci. > How > > do You mark missing data? Are there individual/population/loci names? > How do > > they look like? Does print(neu) say what You expect (number of loci and > > individuals)? And what about summary(neu) - missing data and other > > characteristics? 
Does print(neu, details=TRUE) look OK? > > > >> adegenet too before, yet i understand that this function is not present > >> anymore in the new adegenet and its functionality is supposed to be > >> replaced by pegas hwe.test? > >> > >> If i try to run HWE.est from adegenet i get this > >> > >>> HWE.test(neu) > >> > >> Error: could not find function "HWE.test" > >> > >> or > >> > >>> HWE.test.genind(neu) > >> > >> As of adegenet_1.5-0, this function has been removed and is replaced by > >> 'hw.test' in the package 'pegas' > > > > Yes, I see. I haven't used newest adegenet much yet, so I missed that. > > > >> Did anyone have the same problem? > > > > Once I had problem with this function (I don't remember the error > message) > > when I had too much missing data (~15% in some loci). After removal of > those > > loci/individuals, it worked fine. But I guess this is not Your case > now... > > > >> Cheers > >> > >> Paolo > > > > Good luck, > > Vojt?ch > > > >> On Thu, Aug 6, 2015 at 5:18 PM, Vojt?ch Zeisek wrote: > >>> Hi > >>> > >>> Dne ?t 6. srpna 2015 17:00:53, Paolo Momigliano napsal(a): > >>>> Hi all, > >>>> > >>>> i am having some real trouble with a very simple task: running hw.test > >>> > >>> from > >>> > >>>> pegas which is supposed to work with adegenet. > >>>> > >>>> i tried importing my files both from genepop and fstat using the > >>>> read.genepop and read.fstat functions: > >>>> > >>>> below an example code > >>>> > >>>> # load libraries > >>>> library(adegenet) > >>>> library(pegas) > >>>> > >>>> # import data > >>>> neu<-read.genepop("snp1.gen", ncode = 3L) > >>> > >>> I use to use this function only with loci object (I use to start with > >>> pegas' > >>> read.loci function to import data), although man page says it should > work > >>> also > >>> with genind. Might be You could try to convert Your genind object into > >>> loci by > >>> genind2loci function. 
> >>>
> >>>> # Calculate HWE for first pop
> >>>> HWE_Ning<-hw.test(neu[1:23,], B=100)
> >>>>
> >>>> at which point, i get warnings:
> >>>>> There were 50 or more warnings (use warnings() to see the first 50)
> >>>>>
> >>>>> warnings()
> >>>>
> >>>> Warning messages:
> >>>> 1: In O - E : longer object length is not a multiple of shorter object length
> >>>> 2: In (O - E)^2/E : longer object length is not a multiple of shorter object length
> >>>> 3: In O - E : longer object length is not a multiple of shorter object length
> >>>> 4: In (O - E)^2/E : longer object length is not a multiple of shorter object length
> >>>> 5: In O - E : longer object length is not a multiple of shorter object length
> >>>> 6: In (O - E)^2/E : longer object length is not a multiple of shorter object length
> >>>> 7: In O - E : longer object length is not a multiple of shorter object length
> >>>>
> >>>> etc....
> >>>>
> >>>> The hw.test function works just fine on the "nancycats" dataset from
> >>>> Adegenet. But it doesn't whenever i import a dataset. I know the datasets
> >>>> are not corrupted, as i have carried out a lot of other analyses in
> >>>> adegenet and diveRsity and never had a problem...
> >>>>
> >>>> This problem persists if i try to convert the genind object using the
> >>>> as.loci function before running hw.test.
> >>>>
> >>>> Do you have any suggestion?
> >>>
> >>> Adegenet also has the function HWE.test.genind. It works fine for me on genind objects.
> >>> There is also the function HWE.test in the package genetics, but I haven't used it myself.
> >>>
> >>>> Thanks so much!
> >>>>
> >>>> Paolo
> >>>
> >>> HTH,
> >>> Vojtěch
> > --
> > Vojtěch Zeisek
> > http://trapa.cz/en/
> >
> > Department of Botany, Faculty of Science
> > Charles University in Prague
> > Benátská
2, Prague, 12801, CZ
> > http://botany.natur.cuni.cz/en/
> >
> > Institute of Botany, Academy of Science
> > Zámek 1, Průhonice, 25243, CZ
> > http://www.ibot.cas.cz/en/
> >
> > Czech Republic

From diedericks.genevieve at gmail.com Mon Aug 3 14:46:24 2015
From: diedericks.genevieve at gmail.com (Genevieve Diedericks)
Date: Mon, 3 Aug 2015 14:46:24 +0200
Subject: [adegenet-forum] FW: MSPA
In-Reply-To: <5170e01912e743ddb69d4eab1e85ad55@AM3PR07MB450.eurprd07.prod.outlook.com>
References: <2CB2DA8E426F3541AB1907F98ABA6570ABF5B34D@icexch-m1.ic.ac.uk> <2CB2DA8E426F3541AB1907F98ABA6570ABF5B374@icexch-m1.ic.ac.uk> <5170e01912e743ddb69d4eab1e85ad55@AM3PR07MB450.eurprd07.prod.outlook.com>
Message-ID:

Hi Jombart,

I'm struggling to install sedarJombart and cannot seem to find the zip file either (I've installed an older R version in the hopes of getting it installed). Any advice?

Thank you.
Regards
Genevieve

On Thu, Jul 30, 2015 at 1:25 PM, Diedericks, G, Me <gend at sun.ac.za> wrote:

> ------------------------------
> *From:* Jombart, Thibaut
> *Sent:* 30 July 2015 01:24:53 PM (UTC+02:00) Harare, Pretoria
> *To:* Diedericks, G, Me; adegenet-forum at lists.r-forge.r-project.org
> *Subject:* RE: MSPA
>
> The graph may be too large for your computer to handle? Rendering of large graphs is typically slow on R's graphic devices. How many nodes do you have?
>
> Best
> Thibaut
>
> ps: please keep the forum Cced
>
> ------------------------------
> *From:* Diedericks, G, Me [gend at sun.ac.za]
> *Sent:* 30 July 2015 12:07
> *To:* Jombart, Thibaut
> *Subject:* Re: MSPA
>
> Hi,
>
> Thanks for the speedy reply! I did as you suggested, but my R session keeps on bombing out or hanging (I am using Rstudio).
>
> Any suggestions on how to fix this?
>
> Thank you.
>
> Kind regards,
> Genevieve
>
> ------------------------------
> *From:* Jombart, Thibaut
> *Sent:* 30 July 2015 12:37 PM
> *To:* Diedericks, G, Me; adegenet-forum at lists.R-forge.R-project.org
> *Subject:* RE: MSPA
>
> Hi there,
>
> if I remember well, chooseCN has an option to edit the graph manually / interactively. The interface is a bit clunky but it should do the trick.
>
> Just set the argument edit.nb=TRUE when creating your graph.
>
> Cheers
> Thibaut
>
> ------------------------------
> *From:* adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Diedericks, G, Me [gend at sun.ac.za]
> *Sent:* 30 July 2015 10:24
> *To:* adegenet-forum at lists.R-forge.R-project.org
> *Cc:* Diedericks, G, Me
> *Subject:* [adegenet-forum] MSPA
>
> Good day,
>
> I'm trying to run a MSPA for a freshwater fish species, sampled at 10 sites along a river.
> I have chosen the Delaunay Triangulation, but need to edit the connections as some of the sites are below a dam wall, so the fish cannot move back up the river. Could you please assist me with this?
>
> Kind regards,
> Genevieve
>
> ------------------------------
> Genevieve Diedericks
> PhD candidate ~ Zoology
> Centre for Invasion Biology (C.I.B)
> Department of Botany & Zoology
> Stellenbosch University
> South Africa
> +27 (0) 21 808 4135

From t.jombart at imperial.ac.uk Mon Aug 10 12:51:46 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 10 Aug 2015 10:51:46 +0000
Subject: [adegenet-forum] FW: MSPA
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA6570ABF5B34D@icexch-m1.ic.ac.uk> <2CB2DA8E426F3541AB1907F98ABA6570ABF5B374@icexch-m1.ic.ac.uk> <5170e01912e743ddb69d4eab1e85ad55@AM3PR07MB450.eurprd07.prod.outlook.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B127BC24@icexch-m1.ic.ac.uk>

Hi there,

yes, this 'package' is merely a set of functions I had contributed during a workshop - I am not maintaining it currently. I think Stephane Dray (Cced), author of ade4, is also working on an adespatial package which would integrate the MSPA. Stephane, is this functional yet?

Best
Thibaut

________________________________
From: Genevieve Diedericks [diedericks.genevieve at gmail.com]
Sent: 03 August 2015 13:46
To: Jombart, Thibaut
Cc: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: FW: MSPA

Hi Jombart,

I'm struggling to install sedarJombart and cannot seem to find the zip file either (I've installed an older R version in the hopes of getting it installed). Any advice?

Thank you.
Kind regards,
Genevieve

________________________________
Genevieve Diedericks
PhD candidate ~ Zoology
Centre for Invasion Biology (C.I.B)
Department of Botany & Zoology
Stellenbosch University
South Africa
+27 (0) 21 808 4135

From karine.bounan at florimond-desprez.fr Tue Aug 11 08:38:40 2015
From: karine.bounan at florimond-desprez.fr (karine henry)
Date: Tue, 11 Aug 2015 06:38:40 +0000
Subject: [adegenet-forum] problem installing packages
Message-ID: <4DE9ABC5E00F7544A4E8A6758C0217E4F213D62F@srv-exchange.florimond-desprez.fr>

Dear all,

I was using a former version of adegenet. After changing my computer, I downloaded R 3.2.1 and tried to install the new version of adegenet, without success: I get several errors, including one mentioning the BH directory... Could you please help me?

regards
Karine

From t.jombart at imperial.ac.uk Tue Aug 11 13:00:06 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 11 Aug 2015 11:00:06 +0000
Subject: [adegenet-forum] problem installing packages
In-Reply-To: <4DE9ABC5E00F7544A4E8A6758C0217E4F213D62F@srv-exchange.florimond-desprez.fr>
References: <4DE9ABC5E00F7544A4E8A6758C0217E4F213D62F@srv-exchange.florimond-desprez.fr>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B127BE73@icexch-m1.ic.ac.uk>

Hi Karine,

can you please post this as an issue on the github issue system?
https://github.com/thibautjombart/adegenet/issues

Please report your R version (typing R.version) and copy-paste any error you may have.
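As a minimal sketch of the information requested above, the following base R calls collect the R version, the platform and the loaded packages in a form that can be pasted into a GitHub issue. The object names (r_version, platform, session, report) are illustrative only, and nothing here is specific to adegenet.

```r
## Collect session details to paste into a bug report (base R only).
r_version <- R.version.string               # full version string of the running R
platform  <- R.version$platform             # operating system / architecture
session   <- capture.output(sessionInfo())  # loaded packages and their versions

## Assemble and print everything as plain text
report <- c(r_version, platform, session)
cat(report, sep = "\n")
```

Copying the printed text together with the complete error output of install.packages() usually gives maintainers enough to reproduce an installation problem.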
Thanks
Thibaut

From carlo.pecoraro2 at unibo.it Wed Aug 12 17:04:28 2015
From: carlo.pecoraro2 at unibo.it (Carlo Pecoraro)
Date: Wed, 12 Aug 2015 15:04:28 +0000
Subject: [adegenet-forum] I: Save a new genind/genepop object
In-Reply-To: <386B59F78C25A74C9D53373962B04CF80121C6AACA@E10-MBX1-CS.personale.dir.unibo.it>
References: <386B59F78C25A74C9D53373962B04CF80121C6AACA@E10-MBX1-CS.personale.dir.unibo.it>
Message-ID: <386B59F78C25A74C9D53373962B04CF80121C6AAEA@E10-MBX1-CS.personale.dir.unibo.it>

Hi all,

I am modifying my genind object, filtering the data for MAF, missing values etc., and I would like to save the new object created after these steps. Any tips? How could I save this new dataset?
Many thanks in advance,
All the best,
Carlo

From t.jombart at imperial.ac.uk Wed Aug 12 17:09:34 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Wed, 12 Aug 2015 15:09:34 +0000
Subject: [adegenet-forum] Save a new genind/genepop object
In-Reply-To: <386B59F78C25A74C9D53373962B04CF80121C6AAEA@E10-MBX1-CS.personale.dir.unibo.it>
References: <386B59F78C25A74C9D53373962B04CF80121C6AACA@E10-MBX1-CS.personale.dir.unibo.it>, <386B59F78C25A74C9D53373962B04CF80121C6AAEA@E10-MBX1-CS.personale.dir.unibo.it>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B127D0B5@icexch-m1.ic.ac.uk>

Hi Carlo,

it depends on what you do next with the object. If it is to reuse in R, simply use:
save(x, file="x.RData")
where 'x' is your R object.

Otherwise, to use data in other software, you can use genind2df to convert the data to the appropriate format and then write.table / write.csv etc to save to file.

Cheers
Thibaut

________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Carlo Pecoraro [carlo.pecoraro2 at unibo.it]
Sent: 12 August 2015 16:04
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] I: Save a new genind/genepop object

Hi all,

I am modifying my genind object, filtering the data for MAF, missing values etc., and I would like to save the new object created after these steps. Any tips? How could I save this new dataset?
Many thanks in advance,
All the best,
Carlo

From carlo.pecoraro2 at unibo.it Wed Aug 12 17:20:40 2015
From: carlo.pecoraro2 at unibo.it (Carlo Pecoraro)
Date: Wed, 12 Aug 2015 15:20:40 +0000
Subject: [adegenet-forum] Save a new genind/genepop object
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570B127D0B5@icexch-m1.ic.ac.uk>
References: <386B59F78C25A74C9D53373962B04CF80121C6AACA@E10-MBX1-CS.personale.dir.unibo.it>, <386B59F78C25A74C9D53373962B04CF80121C6AAEA@E10-MBX1-CS.personale.dir.unibo.it>, <2CB2DA8E426F3541AB1907F98ABA6570B127D0B5@icexch-m1.ic.ac.uk>
Message-ID: <386B59F78C25A74C9D53373962B04CF80121C6AAFF@E10-MBX1-CS.personale.dir.unibo.it>

Hi Thibaut,

many thanks for your quick answer. I want to use the data in other software, so I will try the second option. Many thanks for your help.

Cheers,
Carlo

________________________________________
From: Jombart, Thibaut [t.jombart at imperial.ac.uk]
Sent: Wednesday 12 August 2015 17:09
To: Carlo Pecoraro; adegenet-forum at lists.r-forge.r-project.org
Subject: RE: Save a new genind/genepop object

Hi Carlo,

it depends on what you do next with the object. If it is to reuse in R, simply use:
save(x, file="x.RData")
where 'x' is your R object.

Otherwise, to use data in other software, you can use genind2df to convert the data to the appropriate format and then write.table / write.csv etc to save to file.
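The two options described in this reply can be sketched as follows. This is only an illustration under stated assumptions: it uses the nancycats example data in place of a filtered dataset, writes to a temporary directory, and assumes that genind2df() (adegenet's genind-to-data.frame converter) is the conversion step meant for export. The file names and the "/" allele separator are arbitrary choices.

```r
library(adegenet)

data(nancycats)                  # example genind object shipped with adegenet
x <- nancycats                   # stands in for your filtered genind object

## Option 1: keep the object for later use in R
rdata_file <- file.path(tempdir(), "x.RData")
save(x, file = rdata_file)
rm(x)
load(rdata_file)                 # restores the object under its original name 'x'

## Option 2: export a plain table for other software
df <- genind2df(x, sep = "/")    # one row per individual: 'pop' column plus one column per locus
csv_file <- file.path(tempdir(), "x.csv")
write.csv(df, file = csv_file, row.names = TRUE)
```

Whether a CSV of this shape can be read directly depends on the target software; some programs expect their own format (for example genepop or STRUCTURE files), in which case the table would still need reshaping.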
Cheers
Thibaut

From carlo.pecoraro2 at unibo.it Mon Aug 17 16:14:19 2015
From: carlo.pecoraro2 at unibo.it (Carlo Pecoraro)
Date: Mon, 17 Aug 2015 14:14:19 +0000
Subject: [adegenet-forum] HWE
Message-ID: <386B59F78C25A74C9D53373962B04CF80121C6C253@E10-MBX1-CS.personale.dir.unibo.it>

Hi all,

I am trying to remove from my dataset those loci in HWE. I have measured it both per population:
hwe.pop<-lapply(seppop(x), hw.test)
and per locus:
hwe.loc<-hw.test(x, B=1000)
Now I would like to define a threshold (0.05) for the Pr(chi^2 >), removing all those loci with P-values above this threshold. How could I perform this analysis? Do you have any suggestion? I am wondering if this is the best way to filter my dataset according to the HWE.
Any advice would be more than welcome.

Best regards,
Carlo

--
Carlo Pecoraro, PhD Candidate
Laboratory of Genetics & Genomics of Marine Resources and Environment (GenoDREAM)
Dept. Biological, Geological & Environmental Sciences (BiGeA)
University of Bologna
Via S. Alberto 163, 48123 Ravenna (Italy)

IRD (Institut de Recherche pour le Développement)
UMR 212 EME (Ecosystèmes Marins Exploités)
BP 570 Victoria, Mahé
Seychelles

Ph: +39 3337603101
skype contact: carlo_pecoraro

From hvh22 at cam.ac.uk Mon Aug 17 16:52:19 2015
From: hvh22 at cam.ac.uk (Harriet Hunt)
Date: Mon, 17 Aug 2015 15:52:19 +0100
Subject: [adegenet-forum] snpposi.plot with multiple chromosomes
Message-ID: <4b5b7e5e54b71c4555111a64422eaa14@cam.ac.uk>

Hello all,

I have a dataset with SNP positions given as base numbers along chromosomes, and the number of the chromosome is in a separate column in the original data table. Can anyone suggest a way to easily select and plot the SNPs from one particular chromosome using snpposi.plot, without having to extract the data for each chromosome manually from the original data table? Even better would be a way to use snpposi.plot to show the density plots for all the chromosomes in order.

thanks,
Harriet

From carlo.pecoraro2 at unibo.it Mon Aug 17 17:58:35 2015
From: carlo.pecoraro2 at unibo.it (Carlo Pecoraro)
Date: Mon, 17 Aug 2015 15:58:35 +0000
Subject: [adegenet-forum] I: HWE
Message-ID: <386B59F78C25A74C9D53373962B04CF80121C6C27D@E10-MBX1-CS.personale.dir.unibo.it>

Hi all,

I am trying to remove from my dataset those loci in HWE. I have measured it both per population:
hwe.pop<-lapply(seppop(x), hw.test)
and per locus:
hwe.loc<-hw.test(x, B=1000)
Now I would like to define a threshold (0.05) for the Pr(chi^2 >), removing all those loci with P-values above this threshold. How could I perform this analysis? Do you have any suggestion?
I am wondering if this is the best way to filter my dataset according to the HWE. Any advice would be more than welcome.

Best regards,
Carlo

--
Carlo Pecoraro, PhD Candidate
Laboratory of Genetics & Genomics of Marine Resources and Environment (GenoDREAM)
Dept. Biological, Geological & Environmental Sciences (BiGeA)
University of Bologna
Via S. Alberto 163, 48123 Ravenna (Italy)

IRD (Institut de Recherche pour le Développement)
UMR 212 EME (Ecosystèmes Marins Exploités)
BP 570 Victoria, Mahé
Seychelles

Ph: +39 3337603101
skype contact: carlo_pecoraro

From t.jombart at imperial.ac.uk Mon Aug 17 18:04:25 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 17 Aug 2015 16:04:25 +0000
Subject: [adegenet-forum] snpposi.plot with multiple chromosomes
In-Reply-To: <4b5b7e5e54b71c4555111a64422eaa14@cam.ac.uk>
References: <4b5b7e5e54b71c4555111a64422eaa14@cam.ac.uk>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12858D6@icexch-m1.ic.ac.uk>

Hi there,

basically the workflow should be 1) split data by chromosome and 2) lapply over the result to generate the plot. 1) should be implemented by seploc, and I have just posted a feature request for this:
https://github.com/thibautjombart/adegenet/issues/84
and for the whole thing.
Meanwhile, there is a simple workaround if you have i) SNP positions and ii) chromosome info. Split the first with the second, and apply snpposi.plot to the resulting list; or tapply directly. Here's an example with a simulated dataset:

## load package
library(adegenet)

## simulate data: 10 indiv, 1,000 SNPs from 10,000 nucleotide positions; first 600 SNPs are chr1, the others are chr2
x <- glSim(10, 1000)
position(x) <- sort(sample(1:1e4, 1000))
chromosome(x) <- rep(1:2, c(600,400))

## split positions by chromosome and apply snpposi.plot to the bits
allPlots <- tapply(position(x), chromosome(x), snpposi.plot, genome.size=1e4)
allPlots[[1]] # chr 1
allPlots[[2]] # chr 2

Note that if you want the positions to be relative to the chromosomes, then you have to subtract manually the starting positions, e.g.

temp <- split(position(x), chromosome(x))
temp[[2]] <- temp[[2]] - 5504 # assuming chr 2 starts at position 5504
allPlots.scaled <- lapply(1:2, function(i) snpposi.plot(temp[[i]], genome.size=max(temp[[i]])))

Cheers
Thibaut

________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Harriet Hunt [hvh22 at cam.ac.uk]
Sent: 17 August 2015 15:52
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] snpposi.plot with multiple chromosomes

Hello all,

I have a dataset with SNP positions given as base numbers along chromosomes, and the number of the chromosome is in a separate column in the original data table. Can anyone suggest a way either to easily select and plot the SNPs from one particular chromosome using snpposi.plot, without having to extract the data for each chromosome manually from the original data table? Or even better would be if there's a way to use snpposi.plot to show the density plots for all the chromosomes in order.
thanks,
Harriet

From t.jombart at imperial.ac.uk Mon Aug 17 18:16:12 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 17 Aug 2015 16:16:12 +0000
Subject: [adegenet-forum] HWE
In-Reply-To: <386B59F78C25A74C9D53373962B04CF80121C6C27D@E10-MBX1-CS.personale.dir.unibo.it>
References: <386B59F78C25A74C9D53373962B04CF80121C6C27D@E10-MBX1-CS.personale.dir.unibo.it>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12858E7@icexch-m1.ic.ac.uk>

Hi Carlo,

if you want to be conservative, you can apply Bonferroni correction to your data. Here's an example using nancycats:

> library(adegenet)
> library(pegas)
> temp <- hw.test(nancycats)
> pval <- temp[,3]
> pval
        fca8        fca23        fca43        fca45        fca77        fca78
0.000000e+00 0.000000e+00 0.000000e+00 1.622163e-03 0.000000e+00 0.000000e+00
       fca90        fca96        fca37
0.000000e+00 1.965095e-14 1.209777e-10
> loc.to.keep <- pval < (0.05/nLoc(nancycats)) # use Bonferroni
> loc.to.keep
 fca8 fca23 fca43 fca45 fca77 fca78 fca90 fca96 fca37
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> x <- nancycats[loc=loc.to.keep]

Here all the loci are kept, but loci at HWE would have been filtered out.

Cheers
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel.
: 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart

________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Carlo Pecoraro [carlo.pecoraro2 at unibo.it]
Sent: 17 August 2015 16:58
To: adegenet-forum at lists.r-forge.r-project.org
Cc: Thibaut Jombart
Subject: [adegenet-forum] I: HWE

Hi all,

I am trying to remove from my dataset those loci in HWE. I have measured it both per population:
hwe.pop<-lapply(seppop(x), hw.test)
and per locus:
hwe.loc<-hw.test(x, B=1000)
Now I would like to define a threshold (0.05) for the Pr(chi^2 >), removing all those loci with P-values above this threshold. How could I perform this analysis? Do you have any suggestion? I am wondering if this is the best way to filter my dataset according to the HWE. Any advice would be more than welcome.

Best regards,
Carlo

--
Carlo Pecoraro, PhD Candidate
Laboratory of Genetics & Genomics of Marine Resources and Environment (GenoDREAM)
Dept. Biological, Geological & Environmental Sciences (BiGeA)
University of Bologna
Via S.
Alberto 163, 48123 Ravenna (Italy)

IRD (Institut de Recherche pour le Développement)
UMR 212 EME (Ecosystèmes Marins Exploités)
BP 570 Victoria, Mahé
Seychelles

Ph: +39 3337603101
skype contact: carlo_pecoraro

From muzhinjin at yahoo.com Wed Aug 19 19:24:01 2015
From: muzhinjin at yahoo.com (normanm muzhinji)
Date: Wed, 19 Aug 2015 17:24:01 +0000 (UTC)
Subject: [adegenet-forum] Importing data
Message-ID: <1729827843.7438870.1440005041610.JavaMail.yahoo@mail.yahoo.com>

I am trying to import my data using structure format but I am getting this message:

Error in if (!toupper(.readExt(file)) %in% c("STR", "STRU")) stop("File extension .stru expected") :
  argument is of length zero

Could you kindly help with what commands to use for importing data saved in my documents or on the desktop into adegenet for analysis?

Thanks for your help

From caitiecollins at gmail.com Wed Aug 19 19:51:44 2015
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Wed, 19 Aug 2015 18:51:44 +0100
Subject: [adegenet-forum] xvalDapc and group prediction accuracy
In-Reply-To: <9BBBFDA6-8107-45A9-8FF0-4486545B9DB0@ethz.ch>
References: <9BBBFDA6-8107-45A9-8FF0-4486545B9DB0@ethz.ch>
Message-ID:

Hi Simon,

*First, I am not sure that using 50% is preferable to 90% for the training set size.*

Think of the plot that xvalDapc produces (the one with the blue dots, with n.pc on the x-axis and predictive success on the y-axis). Our aim in cross-validation is just to identify the optimum number of PCs to use in DAPC. Therefore, we want to see an arc on the plot produced by xvalDapc, as in low predictive success at the lower limit of the x-axis (too little information) and low predictive success at the upper limit of the x-axis (too much noise), but better predictive success in the middle.

When we use *smaller training sets* (eg.
if we were to use 50%), the result is *reduced variability* in the predictive success response variable (ie. within each n.pca, the black dots will be more likely to fall within a tighter range on the y-axis) *but lower predictive success* than you would get with larger training sets.

By contrast, when we use *larger training sets* (eg. 90%), the result is *increased variability* (dots in each n.pca column will be more spread out on the y-axis), *but a better picture of the true maximum and minimum predictive success possible at each n.pca.*

*In this latter case, we are more likely to see the arc-like shape* we are hoping for in any optimisation problem. The fact that the dots will be more spread out on the y-axis (and the arc therefore more blurry) does not prevent us from identifying the maximum (optimal number of PCs). By contrast, in the former case, while the dots are more densely packed, there will be a far greater chance that we will fail to see an arc-like shape and fail to identify the true optimum number of PCs.

You really don't want to be losing 50% of the available information when building these models. Even with small and uneven sample sizes, stratified cross-validation as performed by xvalDapc is still designed to identify the number of PCs that will give you the highest predictive success.

*Whether you are trying to build a model for explanatory or predictive purposes, I would suggest using 90% for the training set size.*

*Second, I want to thank you for drawing my attention to cases in which xvalDapc needs to handle small groups with training.set sizes that are not 90%: Thanks to your input, I've made some changes to xvalDapc that I think may help you and other users of adegenet!*

The updated version of xvalDapc should handle groups smaller than 10 individuals more intelligently than the current stable version.
*Please try working with the development version of adegenet* (which will become the stable version, but not until the next release!) by using the following steps:

## you'll need the package devtools installed, if you don't already have it:
install.packages("devtools")
library(devtools)

## you may need to remove your previous version of adegenet before installing and loading the devel version:
install_github("thibautjombart/adegenet")
library(adegenet)

Note that you will only notice the behaviour of xvalDapc change in the following cases:

- The smallest group in your sample has less than 10 individuals *and* training.set is not set to 0.9.
- More than one n.pc gives the lowest RMSE (the old version would have chosen the smallest of those n.pc; the new version will choose the largest)

*Third, it seems to me that what you are proposing in the final paragraph of your e-mail effectively is what xvalDapc already does (but I may need more clarification here).*

The 30 separate experiments you propose seem no different from the 30 repetitions in xvalDapc (ie. argument n.reps=30). The only difference I can spot is in the last step. xvalDapc returns the mean predictive success from each of the 30 runs while (and correct me if I am wrong here) you propose instead to take, for each individual, the mode as the "best-guess" predicted group (ie. the most frequent "group membership" assignment for each individual).

But I may need a little more clarification. Could you explain a little further:
(i) How your proposed approach differs from the approach of stratified cross-validation
(ii) What you are hoping to infer or to do after you have identified the most frequent predicted group for each individual.

Right, I hope my first point offers a little bit of help. Please try installing the devel version of adegenet as outlined in my second point, play around with the updated version of xvalDapc, and see what you think.
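To make the distinction raised in the third point concrete, here is a small base R sketch on made-up predictions. It does not call xvalDapc or dapc; the group labels, the 0.8 success probability and all object names are invented for illustration. It contrasts (a) the mean proportion of correct assignments across repetitions with (b) a per-individual majority vote over the same repetitions.

```r
set.seed(1)
n_ind  <- 6
n_reps <- 30
truth  <- rep(c("A", "B"), each = 3)   # hypothetical true group of each individual

## Simulated predictions: in each repetition, every individual is assigned
## its true group with probability 0.8 and the other group otherwise
pred <- sapply(seq_len(n_reps), function(r) {
  correct <- runif(n_ind) < 0.8
  ifelse(correct, truth, ifelse(truth == "A", "B", "A"))
})                                     # n_ind x n_reps character matrix

## (a) mean predictive success across repetitions
success_per_rep <- colMeans(pred == truth)
mean_success    <- mean(success_per_rep)

## (b) majority vote: most frequent predicted group for each individual
vote <- apply(pred, 1, function(p) names(which.max(table(p))))

mean_success
table(vote, truth)
```

The first quantity summarises how well a given setting predicts on average, which is what is needed to compare numbers of PCs; the second yields one assignment per individual, which is what one would report for a new sample. In case of a tie, which.max simply returns the first group.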
And if you can give me your further thoughts on my third point, we can go from there.

All the best,
Caitlin.

On Thu, Aug 6, 2015 at 3:43 PM, Crameri Simon wrote:

> Hi Caitlin
>
> I'm writing to you because you are the author of xvalDapc. I'm still somewhat confused regarding question 2) of my first post.
>
> You don't need to read it again, let's just consider this:
>
> - I have a genetic dataset of 100 individuals, and I know the true group membership of every individual.
> - I'd like to build a cross-validated DAPC "model" (let's call it DAPC model) which can be used to predict group membership of further individuals.
> - I run xvalDapc on say 50% of the 100 individuals (the reason I can't take 90% lies in the small size of some groups).
> - I get n.pca = 25 as the best n.pca for building the DAPC model, and xvalDapc automatically produces a corresponding DAPC, albeit with 100% of the individuals.
>
> Now comes the tricky question: Can I really use the DAPC produced by xvalDapc for prediction purposes? I still think that it is somewhat problematic to take the full dataset (100 individuals) to build a cross-validated DAPC model when the n.pca used in the PCA step of DAPC was determined from training sets of just 50 individuals. Perhaps this is the reason why you set training.set = 0.9 as a default value, to make this difference as small as possible?
>
> An alternative approach would be to use xvalDapc as "just" a (wonderful!) tool to get an optimal n.pca for your data. But for prediction purposes, I'd suggest building a DAPC model with a training set of in this case 50 individuals (from a stratified sampling) instead of all individuals.
If you > don't like to loose the information of the other 50 individuals, you even > could produce say 30 permuted training sets in the same way as xvalDapc > does it, build 30 DAPC models and predict your further individuals against > all permuted 30 DAPC models separately, taking the group that was most > oftenly assigned to an additional sample as the predicted group. > > Do you have any comments on that? I know, it's all very complicated, but > wouldn't that be statistically more appropriate? > > Thank you in advance, > Simon > > > > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 28 Jul 2015 11:52:41 +0000 > From: "Jombart, Thibaut" > To: "Crameri Simon" , > "" > > Subject: Re: [adegenet-forum] xvalDapc and group prediction accuracy > Message-ID: > <2CB2DA8E426F3541AB1907F98ABA6570ABF58B2D at icexch-m1.ic.ac.uk> > Content-Type: text/plain; charset="iso-8859-1" > > > Hi there > > see the argument 'result' in xvalDapc. The difference you see is the > difference between the mean % of successful prediction averaged over groups > (default), and the overall % of successful prediction. These two quantities > are increasingly different when sample size are unequal. > > Cheers > Thibaut > > > ________________________________ > From: adegenet-forum-bounces at lists.r-forge.r-project.org [ > adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Crameri > Simon [simon.crameri at env.ethz.ch] > Sent: 17 July 2015 16:25 > To: > Subject: [adegenet-forum] xvalDapc and group prediction accuracy > > Hi Thibaut > > I am still working with my tree species whose genotypes I'd like to model > using DAPC, and I am still aiming to use the results as a forensic tool to > identify species genetically. Therefore, the whole approach needs to be as > reliable as possible. 
I tried xvalDapc() to perform DAPC cross-validation > and found an optimal n.pca: > > table(data at pop) > > > P01 P02 P03 P04 P05 P06 P07 P08 P09 P10 P11 > 11 5 5 16 10 15 34 4 4 11 4 > > xval <- xvalDapc(data at tab, pop(data), training.set = 0.5, result = > "groupMean", n.pca = 10:20, n.rep = 1000) > > > xval$`Mean Successful Assignment by Number of PCs of > PCA`[as.numeric(xval$`Number of PCs Achieving Highest Mean Success`)] > > 14 > 0.9953977 > > xval$'Number of PCs Achieving Lowest MSE' > > [1] "14" > > xval$DAPC$n.pca > > [1] 14 > > > It all works fine, the resulting best n.pca is still 14 if xvalDapc() is > carried out multiple times using the same parameters, and even so when > changing training.set to say 0.9. Now I use the validated model (xval$DAPC) > to predict species membership of additional samples: > > predict(xval$DAPC, newdata=new.data) > > > Again, it's all working perfectly, but what I don't fully understand is > this: > > 1) As it happens, I know the true group membership of the additional > samples. Therefore I can assess the prediction accuracy of xval$DAPC. It > turns out that 96.8% (group mean!) of the additional samples are correctly > predicted by xval$DAPC. Why is this number slightly different from the > expected 99.5%? May it be due to the different group sizes present in the > full dataset (table(data at pop))? > > 2) If the full dataset contains groups of very different size, some of > which are fairly small: would it be more reliable to predict group > membership of additional samples using the above determined n.pca and all > 1000 training sets (which have approximately equal group size) as a > reference, instead of using the full dataset (where group sizes differ) and > just one prediction? The resulting 1000 prediction outcomes could be > screened for the groups most oftenly assinged to each new sample. > > > Any opinions / ideas? 
> Thanks in advance,
>
> Simon
>
> *************
> PhD student
> ETH Zurich
> Plant Ecological Genetics
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vojta at trapa.cz Wed Aug 19 20:01:45 2015
From: vojta at trapa.cz (Vojtěch Zeisek)
Date: Wed, 19 Aug 2015 20:01:45 +0200
Subject: [adegenet-forum] Importing data
In-Reply-To: <1729827843.7438870.1440005041610.JavaMail.yahoo@mail.yahoo.com>
References: <1729827843.7438870.1440005041610.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <5356746.UZtQVanxUU@veles>

Hi

On Wed, 19 August 2015 17:24:01, normanm muzhinji wrote:
> I am trying to import my data using structure format but I am getting this
> message Error in if (!toupper(.readExt(file)) %in% c("STR", "STRU"))
> stop("File extension .stru expected") : argument is of length zero

Which commands did you use? What do your data look like?

> Could you kindly help on what commands to use for importing a data saved on
> my documents or /on desktop to adegenet for analysis.

You probably started with the documentation at https://github.com/thibautjombart/adegenet/wiki/Tutorials
We then need to know what you did to find out what went wrong...

> Thanks for your help

Sincerely,
Vojtěch

--
Vojtěch Zeisek
http://trapa.cz/en/

Department of Botany, Faculty of Science
Charles University in Prague
Benátská 2, Prague, 12801, CZ
http://botany.natur.cuni.cz/en/

Institute of Botany, Academy of Science
Zámek 1, Průhonice, 25243, CZ
http://www.ibot.cas.cz/en/

Czech Republic
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: This is a digitally signed message part.
URL:

From caitiecollins at gmail.com Wed Aug 19 21:38:33 2015
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Wed, 19 Aug 2015 20:38:33 +0100
Subject: [adegenet-forum] Question about how to interpret Cross validation in my analysis. Thanks!
In-Reply-To: <8795262F44452D43BA65839DAE6842D0438BA9@ci011.cawthron.org.nz>
References: <8795262F44452D43BA65839DAE6842D03DBE7D@ci011.cawthron.org.nz> <8795262F44452D43BA65839DAE6842D0438BA9@ci011.cawthron.org.nz>
Message-ID:

Hi Angela,

Sorry for the delay in getting back to you. I've just returned from teaching in the woods without Internet. I'm glad to see you've been thinking so much about cross-validation and mean-squared error!

First of all, *I think you may be focusing too much on the spread of your RMSE*. The MSE is just a measure of how far, on average, your estimates are from the target (in our case, 100% success). We use RMSE instead of the mean success to pick the "best" n.pc in cross-validation because, as I demonstrated with my toy example, in cases where the mean is the same, RMSE allows us to identify which of these cases with the same mean has, in addition, a lower error.

That said, I think there may be an error in your sample calculations for the toy example you present regarding RMSE! *My calculations for those two sets of numbers show that while they both have the same mean, their RMSEs do differ*.
Try running the following commands to get those results (and then, if you want, try the examples that follow):

## your case 1
x <- c(30.5, 34.5, 31, 31, 35)
mean(x) # 32.4
sqrt(mean(((100-x)^2))) # 67.62766

## your case 2
x <- c(50, 15, 34, 38, 25)
mean(x) # 32.4
sqrt(mean(((100-x)^2))) # 68.62944

## Try these out too to get a feel for how RMSE can vary while the mean stays the same:
x <- c(5,95)
mean(x) # 50
sqrt(mean(((100-x)^2))) # 67.27

x <- c(25,75)
mean(x) # 50
sqrt(mean(((100-x)^2))) # 55.9

x <- c(5, 25, 75, 95)
mean(x) # 50
sqrt(mean(((100-x)^2))) # 61.85

x <- c(5, 5, 25, 25, 75, 75, 95, 95)
mean(x) # 50
sqrt(mean(((100-x)^2))) # 61.85

x <- c(5, 5, 25, 75, 95, 95)
mean(x) # 50
sqrt(mean(((100-x)^2))) # 63.71

In general, I would say that *because RMSE is able to tell you which n.pcs gives you the best model, you don't necessarily need to delve much further into the results to pick the optimum model*.

If you do, nevertheless, want to look into your results for each replicate of cross-validation, that should be possible. When you ran the analysis to which you are referring, did you happen to save the output of xvalDapc? If so, *the first element of this output contains the results for each replicate*, at each level of PC retention. If not, I would suggest running it again with the same argument settings as before. And, as an added suggestion, try running the command set.seed(1) directly before running xvalDapc. This will allow you to control the random behaviour inherent in xvalDapc's random sampling (i.e. while effectively random, you would be able to get the exact same results every time you ran set.seed(1) followed by xvalDapc, in case you want to be able to perfectly replicate the results you get on a future occasion.)

I've also posted something relatively recently that I think may help address your final point. *Please take a look at one of my previous posts to the adegenet forum on the subject of interpreting "accuracy"
in xvalDapc results:* http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2015-July/001212.html

Does that help at all? Overall, I would say that I think *particularly because you said you are not actually trying to build a model for the purposes of prediction, you may be better off just reporting the proportion of successful assignment achieved with the DAPC model built with the optimum number of PCs selected via xvalDapc (i.e. using assign.prop from summary(dapc))*.

If you still have questions after reading this e-mail and that post, of course I'll still be here to address them!

All the best,
Caitlin.

On Tue, Aug 4, 2015 at 4:05 AM, Angela Merino wrote:

> Thank you very much for your detailed and really useful explanations. :)
>
> I will try to explain what I aim to do:
>
> First of all, a bit of the background: I am working with a migratory bird species for which no genetic analysis has been done before. This species migrates from New Zealand to Alaska: they breed in Alaska. We don't know much about its ecology in Alaska, but from data recorded in New Zealand and geolocators that have been set in some individuals and successfully retrieved from them we have seen that there is a relationship between departure time from NZ and breeding latitude in Alaska (earlier departers breed further south, later departers further north: in a cline). Moreover, although the species has a time span of departure for migration, individuals show high consistency in their departure time (so I categorized individuals into predefined clusters based on behavior, i.e. "early birds" and "late birds"). In other words, my hypothesis is that because there is a relationship between timing of migration and breeding site, they may show genetic structure (which could be seen using neutral markers, such as microsatellites).
If I find a sign of significant structuration in the > population using predefined clusters based on its behavior (I have 145 > individuals sampled and with behavior data) I could say that migratory > patterns are at least associated with the subtle population structure. > > > > STRUCTURE v.2.3 (Bayesian clustering) found no structure (k = 1) however k > = 2 had quite similar values. > > In Arlequin (AMOVA) I got very weak Fst between two predefined groups with > the most extreme behaviours, and significant! > > K-means algorithm find k = 2 when I use predefined subpopulations ?early? > vs. ?late?. Although the correct assignment of these inferred clusters > don?t correspond very much with my predefined clusters: only about 60% > correct assignment. But I think this is in accordance with the subtle > population structure found in the other analyses. > > > > So my aim with DAPC is to explore more this very weak genetic structure > that Bayesian clustering and AMOVA are suggesting. I don?t aim to build any > model to use later for identifying behavior of an individual from its > markers, but I aim to see whether it could be possible to build a model > (using a number of PCs from the information provided by the > microsatellites) that could identify/ classify/ group/ distinguish > individuals in their correct behavior, using those 145 individuals from > which I have genotyping (microsatellites) and behavior. Therefore, if I can > built a model from these microsatellites with better success than random > chance in distinguishing the migratory behavior, that could suggest that > neutral markers structure in the population is associated to migratory > behavior in this long-distance migratory bird: my predefined clusters based > on behavior are not random groups but they mean something (in relation to > its population structure). 
> > > > > > --------- > > Answering this question: > > Based on the results of those other clustering methods you mention, I > assume you have identified potential clusters based on the genetic markers > you are using (right?). When running the DAPC analysis, you have grouped > individuals into clusters of interest. So my first question is:* Have you > used either (i) The clusters identified by other methods? Or (ii) Clusters > based on ?behaviour?, that are NOT the same as those identified by other > methods?* > > *If you have used (ii)* Clusters based on ?behaviour?, that are NOT the > same as those identified by other methods, *then can you please try to > explain again what exactly you are trying to do? * > > > > Yes, I did not use clusters identified by other methods since there was > not much accordance of my clustering with K-means inferred clusters, as I > said before. Then I decided to keep my predefined clusters based on my > behavior data (which is quire robust data from observations and geolocators > along about 6 years). > > > > I hope this clarifies what I am doing J. Please tell me if you find that > this methods would help me to solve my aim. > > > > As for the specific question about the meaning of the value of MSE, thanks > very much for such a detailed explanation, I understand now much better. J > > > > I think my situation is more similar to the case 2 you explained. However > I don?t have the data for each replicate (if I well remember I did 15 > replicates (?) 
which correspond to each point per PCs in the graphic-see > below-don?t pay attention to the yellow highlighted text-), so I don?t know > how different the MSE for each replicate is from each other (it could be > that the RMSE which is 32.4% came from = > > > > sqrt((((30.5-100)^2) + ((34.5-100)^2) + ((31-100)^2) + ((31-100)^2) + ((35-100)^2)) > / 5) > > *RMSE = 32.4% (in blue, the lower success value, in red, the higher > success value) **?** more consistent* > > > > *Or* > > > > sqrt((((50-100)^2) + ((15-100)^2) + ((34-100)^2) + ((38-100)^2) + > ((25-100)^2)) / 5) > > *RMSE = 32.4% (in blue, the lower success value, in red, the higher > success value) **?** less consistent* > > > > *among many other possibilities?..* > > > > Therefore, one need to see the values from which the mean success and the > RMSE are calculated from, if not, I can?t determine how consistent the > error (or success) of my model is (right?). > > In other words, without this information (success and/or error for each > replicate per X number of PCs), I don?t know if the value of my RMSE > (32.4%) is far from the higher error my model could have. Or, in other > words, I won?t know (and I don?t know indeed), what is the lower success I > could see in my ?best? model. > > > > In the case the mean success and RMSE were consistent along the replicates > when PCs=25 of my ?best? model, it would mean that I found a model that is > able to assign individuals correctly to the behavior-group they belong to > better than just classifying them randomly. And, although it is not a very > useful model (not a great mean success and quite high error, as discussed > before), it is meaningful. > > > > Thank you very much, > > > > I will be looking forward for your answer, J. Let me know if you need > more clarifications. > > > > ?Angela > > > > > > *From:* Caitlin Collins [mailto:caitiecollins at gmail.com > ] > *Sent:* Wednesday, 29 April 2015 6:22 a.m. 
> *To:* Angela Merino > *Cc:* t.jombart at imperial.ac.uk > > *Subject:* Re: Question about how to interpret Cross validation in my > analysis. Thanks! > > > > Hi Angela, > > > > Nice to hear from you. > > > > *Before I can answer you, I think I need to ask you to explain in a little > more detail what you are trying to do. * > > My confusion is mainly with: > > (1) Your aim > > (2) Your clusters. (ie. DAPC clusters based on behaviour, versus clusters > identified by Bayesian, K-means, and AMOVA) > > > > You say, ?? if I can build a model from genotyping (neutral markers) I > could say that the behavior I used to defined those subpopulations would be > associated with the very weak pop. structure I found with other methods > (Bayesian, K-means, AMOVA)?. This is mostly what I am confused about? > > > > Based on the results of those other clustering methods you mention, I > assume you have identified potential clusters based on the genetic markers > you are using (right?). When running the DAPC analysis, you have grouped > individuals into clusters of interest. So my first question is:* Have you > used either (i) The clusters identified by other methods? Or (ii) Clusters > based on ?behaviour?, that are NOT the same as those identified by other > methods?* > > > > *If you have used (i)* The clusters identified by other methods, then I > would take your aim in using DAPC to be: ?Identify the genetic markers that > best describe the differences between the (weakly-supported (??)) > population clusters identified by other methods?. *Does that sound like > what you are trying to do?* > > > > *If you have used (ii)* Clusters based on ?behaviour?, that are NOT the > same as those identified by other methods, *then can you please try to > explain again what exactly you are trying to do? 
*
>
> ------------------------------------------------------------------------
>
> *For now, without fully understanding what you are trying to do, I can try to give you some input...*
>
> First, you are correct in saying that the MSE (32.4%) is quite high (i.e. not good). However, as we discussed in previous e-mails, because your mean success falls outside of the limits of the confidence interval for random chance, you will still be able to use the "best" model based on 25 PCs (i.e. while it is not a great model, it is not a meaningless model either).
>
> Second, *let me try and explain why RMSE is given as a percent, and what RMSE really is:*
>
> - The root-mean-square error or "root-mean-square deviation" (same thing, we'll just call it "*RMSE*") is just *a measure of the difference between the expected value for a model and the observed values*.
> - In the case of cross-validation, we are trying to *build a model by combining some number of principal components (PCs).*
> - As you can see in the cross-validation plot produced by xvalDapc, the *"value"* we are measuring is the *proportion of successful outcome prediction.*
>   o *NOTE*: In the plot, this value ranges from 0 to 1, representing the proportion of individuals, out of 1, for which we were able to successfully predict the cluster to which they belong, based on the model composed of however many PCs are indicated on the x-axis. However, this "proportion" can also be thought of as a percent. This is also the case for the *RMSE, which can be thought of approximately as the complement of the "Proportion successful outcome prediction" or "Mean successful assignment" (as you know, these last two are the same thing, both representing "success"), so the RMSE is approximately (1 - success).*
>
> *I think I will be able to explain this best with a simple example, based loosely on your results (but not identical to them):*
>
> *Example 1:*
>
> - You see that the *highest mean success* you can get with any number of PCs in the model is *0.69*. This proportion if written as a percent would be *69%*.
>   o So you can say that, "the highest mean success you can achieve in correctly predicting the cluster of the individuals in your dataset is 69% (with 25 PCs in the model)". (i.e. 69 times out of 100, you would correctly identify the true cluster of an individual, with a model based on 25 PCs).
> - You also see that the *lowest RMSE* you can achieve is *0.32417...* This proportion, if written as a percent, would be ~ *32.4%* RMSE.
>   o You may have noticed that the RMSE with a model based on 25 PCs (RMSE = 32.4%) is *approximately (100% - success)*, based on 25 PCs.
> - This brings me to *the reason we used RMSE to choose the "best" number of PCs to keep in the model, instead of mean success:*
>   o While in your case, *both* highest mean success *and* lowest RMSE were found to occur when the model was based on 25 PCs, *this is not always the case*.
>   o Sometimes, the model with the lowest RMSE is based on a *different* number of PCs than the model with the highest mean success. This is because, as you will have noticed, the *RMSE is NOT exactly (100% - success)*.
>   o *Let me illustrate this with another little example:*
>
> *Example 1.1:*
>
>   o Say you have performed your cross-validation analysis with only *5 replications*.
>   o *Both of the following sets of observed values would allow you to achieve a mean success of 69%:*
>     - *Case 1:* *69, 69, 69, 69, 69*
>     - *Case 2:* *65, 69, 73, 54.5, 83.5*
>   o We can calculate the RMSE for both of these sets of values.
>   o First, *recall* that our hypothetical *"expected value"* is *always 100%.*
>   o The following is the equation for calculating the RMSE:
>     - *RMSE = sqrt( E[ (x̂ - x)^2 ] )*
>   o In English, this formula says that the root mean squared error is equal to the square root of the expected value (here, the mean across all observations) of the squared difference between x̂ (the observed value, i.e. the observed success for a given point) and x (the "expected value", which we have said is always 100% success).
>   o Let's calculate RMSE for both cases:
>   o *Case 1:*
>   o RMSE =
>     - sqrt((((69-100)^2) + ((69-100)^2) + ((69-100)^2) + ((69-100)^2) + ((69-100)^2)) / 5)
>     - *RMSE = 31%*
>   o *Case 2:*
>   o RMSE =
>     - sqrt((((65-100)^2) + ((69-100)^2) + ((73-100)^2) + ((54.5-100)^2) + ((83.5-100)^2)) / 5)
>     - *RMSE = 32.4%* (*just like yours!*)
> - In *Case 1*, because for each point, we got a success rate of exactly 69%, the RMSE is *exactly (100% - success) because 100 - 69 = 31.*
> - However, in *Case 2*, the success rates for each round varied (although they still gave a mean success of 69%), so the *RMSE is NOT exactly 100 - success, but a little more than this number (32.4%).*
> - The reason Case 2 had a higher (read: *worse*) RMSE than Case 1 is because, while they both had a mean success of 69%, the points in the Case 2 set varied from this value (i.e. sometimes the success rate was much lower than 69, and sometimes it was higher).
> - *This is why we use RMSE to determine which model has the "best" number of PCs, instead of mean success*. If, for example, we had gotten the values for Case 1 with 30 PCs in the model, and the values for Case 2 with 25 PCs in the model, we would want to choose the model with 30 PCs as the "best" model.
> This is because, while both models were able to give a mean success rate of 69%, the model with 30 PCs (i.e. Case 1) was able to do this *more consistently*. Therefore, if we choose the model with 30 PCs, we can be confident that we will consistently have about 69% success and 31% error, while if we chose the model with 25 PCs (Case 2), we would expect about 69% success, but we could see success as low as 54.5% (corresponding to errors as high as 45.5%).
>
> Okay, now I will stop writing and say sorry for the very very long answer! But I hope this helps you in thinking about and interpreting RMSE.
>
> Please let me know more about my question in Part I, and I will do my best to help you there.
>
> All the best,
> Caitlin.
>
> On Fri, Apr 24, 2015 at 2:23 AM, Angela Merino <Angela.Merino at cawthron.org.nz> wrote:
>
> Hi Caitlin Collins,
>
> I have another question regarding the interpretation of the *root mean square error*. Sorry for this lapse of time between questions! I find this very interesting, but a challenge for me to really get the right interpretation.
>
> I understood all that you explained to me in previous emails (very helpful, thanks so much!!!). :) :)
>
> But now I have been thinking about the mean squared error and what it is telling me about the model for predicting predefined subpopulations. Refreshing my situation: I aim to test predefined subpopulations by using genotype data and DAPC to build a model that may be able to find a function to accurately distinguish those two hypothetical subpopulations. My rationale is that if I can build a model from genotyping (neutral markers) I could say that the behavior I used to define those subpopulations would be associated with the very weak pop. structure I found with other methods (Bayesian, K-means, AMOVA).
I got a model with a success rate of 69% > (random chance 49%, 43%-60%), and I understood/agree that being my success > rate higher than the higher limit of random chance?.etc. However, as u > said, the MSE is quite high (*32.4%).* > > > > I read about MSE and understood that it gives information about how well > data points fit to the line of the function (right?). Well, it is actually > the average of the distances of my set of data points from the line given > by the function (I try to visualize it?). But I don?t understand well *why > is it given in %?* > > > > I have the impression that what I have is lots of outliers that don?t fit > well with the function (I visualize this as noise, lot of noise?which for > me means, not accuracy, then not good predictor of my predefined > subpopulations, then genetic data is not very useful to identify my > hypothetical subpopulations?then my hypothetical subpopulations may don?t > have sense with the weak genetic structure other methods (i.e. Bayesian, > AMOVA and K-means) suggested). *What do you think?* > > > > Thanks in advance, J > > > > Kind regards, > > > > ?Angela > > > > *From:* Caitlin Collins [mailto:caitiecollins at gmail.com] > *Sent:* Saturday, 25 October 2014 6:03 a.m. > *To:* Angela Merino; adegenet-forum at lists.r-forge.r-project.org > > > *Cc:* Collins, Caitlin; Jombart, Thibaut > *Subject:* Re: Question about how to interpret Cross validation in my > analysis. Thanks! > > > > Hello again, > > > > In response to your two questions: > > > > *1) * > > > > The output element ?mean and CI for random chance? provides the values > that are used to draw the horizontal solid (mean) and dashed (CI) lines on > the plot generated for cross-validation. > > > > In your case, the mean and CI for random chance was 49% (43%, 60%). 
The > interpretation of this would be that if the highest success in outcome > prediction that you were able to achieve with any model was between 43% and > 60%, then you could be 95% confident that the ability of even the best > model to assign individuals to the correct group does not differ > significantly from the success rate you could achieve by assigning > individuals to a group at random by, say, flipping a coin as a method of > determining what group they belonged to. Ergo, you would not have succeeded > in creating a useful model. > > > > However, your results indicate that with 25 PCs retained, your model had a > success rate of 69.5%, so you *have* created a ?useful? model. Even > though it is not a particularly successful model, it still has a mean > success rate that is 20% higher than the mean success for the coin toss > approach, and 10% higher than the upper limit of the CI for random chance. > So you can be 95% confident that the somewhat modest ability of your best > model to discriminate between groups is not just happening by chance?the > model is truly doing something useful. > > ------ > > *2)** 2)* > > While your interpretation is generally true, in that group > membership is not well-predicted by any model, I think you have mis-read > the results. The way they are laid out, at least in the text you copied > into the e-mail, has skewed the values given for the means to the right of > the number of PCs that they should be corresponding to? With 25 PCs, your > optimal model is actually achieving a mean success of nearly 70%. Still not > too good, but better than 63%. The MSE for 25 PCs is 32.4%, which is indeed > quite high. > > However, the interpretation of this is not that you can only be ?sure? of > correctly predicting around 20% to the right pre-defined group. Rather, you > can be ?sure? of correctly predicting almost 70%! I think your confusion > here may come from your interpretation of what the random chance values > mean. 
Finding that the mean success for your best model is 20% above the > mean success for random chance does not mean you can only be sure of 20% > correct predictions. Rather, you could say that while you can in fact > expect a 70% success rate (your highest mean success), your model is only > providing an improvement of ~ 20% over the success rate you could have > achieved by tossing a coin. > > This changes the severity of your final conclusion. First, I should > mention that it?s not fair to say that ?[your] set of microsatellites can?t > explain well [your] pre-defined groups?. Instead, it might be more accurate > to say, ?*With* the set of microsatellites available, you are unable to > build a *model* with DAPC that explains well the variation between your > pre-defined groups.? Finally, in light of the points above, while it is > still true that the model does not explain the variation between groups > particularly well, it does explain about 70% of that variation, so I > wouldn?t consider it to be ?unsuccessful?. > > ----- > > Sorry for the long answer, but I hope it helps a bit at least! > > Please let me know if it doesn?t though, or if you have any more > questions. > > > > All the best, > Caitlin. > > > > On Thu, Oct 16, 2014 at 11:30 PM, Angela Merino < > Angela.Merino at cawthron.org.nz> wrote: > > Thanks you very much! It was really helpful! J > > > > Then I understand that my models is not significantly the best model that > could be found using my variables (in my case, microsatellites). If I use a > model with n.pca=20 or =40 I got pretty the same success of membership > prediction (and with the same big root mean squared error). > > > > 1) My last questions (I hope!) to understand the output of the > *cross.validation* function is what does it mean the Median and > Confidence Interval for Random Chance (below in yellow)? 
> I think it means that, with 95% confidence, the value of successful assignment would be between 43% and 60%, which again means that the optimization of my model was "not successful". (??)
>
> 2) About the global interpretation of these results, I would say that membership of my predefined groups is not well predicted by any model, as the mean successful assignment is not higher than 63% (maximum when n.pcs=25) and, in addition, the mean squared error is quite high (30-40%). I would be "sure" of assigning only around 20% to the right predefined group. In short, my set of microsatellites can't explain my predefined groups well.
>
> [image: image002.jpg]
>
> $`Median and Confidence Interval for Random Chance`
>      2.5%       50%     97.5%
> 0.4294840 0.4928747 0.5962807
>
> $`Mean Successful Assignment by Number of PCs of PCA`
>         5        10        15        20        25        30        35        40
> 0.5871429 0.6000000 0.5819048 0.6014286 0.6952381 0.6747619 0.6333333 0.6109524
>
> $`Number of PCs Achieving Highest Mean Success`
> [1] "25"
>
> $`Root Mean Squared Error by Number of PCs of PCA`
>         5        10        15        20        25        30        35        40
> 0.4301795 0.4141872 0.4389381 0.4131429 0.3241735 0.3531491 0.3885084 0.4145894
>
> $`Number of PCs Achieving Lowest MSE`
> [1] "25"
>
> Thanks in advance! I am learning a lot about R and the adegenet package, and I find it really interesting for assessing weak genetic population structure.
>
> Kind regards,
>
> Angela
>
> From: Caitlin Collins [mailto:caitiecollins at gmail.com]
> Sent: Friday, 17 October 2014 1:28 a.m.
> To: Angela Merino
> Cc: Collins, Caitlin; Jombart, Thibaut
> Subject: Re: Question about how to interpret Cross validation in my analysis. Thanks!
>
> Hi Angela,
>
> Well, I have two pieces of good news for you, and one piece of mediocre news.
>
> First, there's nothing to worry about with respect to the "NULL" that you are seeing. It just gets printed when xval.plot=TRUE as an artefact of one of the lines of the printing function. It has no meaning, and certainly does not imply that your model is not valid. (Given the stress that I now realise this glaring "NULL" may cause, I've changed the way the plots print now, so in the next release of adegenet this won't happen.)
>
> Second, you are absolutely correct in your interpretation of the results of xvalDapc (which are stored in whatever object you assigned the results to, in your case, "xval").
>
> This brings me to the mediocre news: given that your interpretation is correct, it seems that the best model you can achieve with DAPC, where n.pca=25, is only able to predict the group membership of validation set individuals in 63% of the cases, with a 32% root mean squared error. Arguably, this is not great. Your final comment on the matter, though, is quite insightful. The fact that you can achieve the same modest level of success with 20-80 PCs indicates that the optimisation procedure has not been particularly successful. Ideally, one would like to see an arch, with a maximum success point somewhere in the middle. In your case, there is a bit of an arch, but it isn't particularly striking.
>
> The only thing I might add to your interpretation of this result is that it's not so much that the model is poor because a similar level of success can be achieved with variable numbers of PCs. If mean success was virtually constant, but varying around 90%, the interpretation would not be that the model is poor, but rather that most levels of PC retention can compose a model that effectively discriminates between groups.
>
> I hope this has helped answer some of your questions.
> If you have any more, please feel free to ask.
>
> Best,
> Caitlin.
>
> On Mon, Oct 13, 2014 at 11:48 PM, Angela Merino <Angela.Merino at cawthron.org.nz> wrote:
>
> Hi Caitlin Collins and Thibaut Jombart,
>
> My name is Angela Parody-Merino and I am a PhD student at Massey University (New Zealand). I am studying the population genetic structure of a migratory bird (the New Zealand Godwit) with 23 microsatellites. This may be a very simple question, but I really want to understand, and be sure about, the meaning and interpretation of the output of cross-validation. I have spent some days searching the internet and reading explanations without being able to really understand what is going on with my analysis. Could you help me please? :)
>
> This is the script of the analysis:
>
> > x <- ELpop
> > mat <- as.matrix(na.replace(x, method="mean"))
>
>  Replaced 371 missing values
>
> > grp <- pop(x)
> > xval <- xvalDapc(mat, grp, n.pca.max = 40, training.set = 0.9,
> +                  result = "groupMean", center = TRUE, scale = FALSE,
> +                  n.pca = NULL, n.rep = 500, xval.plot = TRUE)
>
> NULL   >>> What does this NULL mean?
> Does it mean that the model is not valid?
>
> $`Median and Confidence Interval for Random Chance`
>      2.5%       50%     97.5%
> 0.4294840 0.4928747 0.5962807
>
> $`Mean Successful Assignment by Number of PCs of PCA`
>         5        10        15        20        25        30        35        40
> 0.5871429 0.6000000 0.5819048 0.6014286 0.6952381 0.6747619 0.6333333 0.6109524
>
> $`Number of PCs Achieving Highest Mean Success`
> [1] "25"
>
> $`Root Mean Squared Error by Number of PCs of PCA`
>         5        10        15        20        25        30        35        40
> 0.4301795 0.4141872 0.4389381 0.4131429 0.3241735 0.3531491 0.3885084 0.4145894
>
> $`Number of PCs Achieving Lowest MSE`
> [1] "25"
>
> From the screenshot and the output of the cross-validation (in blue), I would say that my model (retaining 25 PCs) can predict with a mean success of 63%, but that it is not such a good model because most of the models that can be obtained by retaining 20, 40, 60 or 80 PCs are about equally successful. Is my interpretation correct?
>
> Thanks in advance,
>
> Kind regards,
>
> Angela Parody-Merino
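
The comparison discussed in this thread (best mean success against the random-chance interval) can be checked directly in R. The sketch below simply copies the values printed in the xvalDapc output quoted above into plain vectors; the names chance, n_pcs, mean_success and rmse are made up for illustration and are not the element names of the object returned by xvalDapc.

```r
# Values copied by hand from the xvalDapc output quoted in this thread.
# The vector names are illustrative; they are not names used by adegenet.
chance <- c("2.5%" = 0.4294840, "50%" = 0.4928747, "97.5%" = 0.5962807)
n_pcs <- c(5, 10, 15, 20, 25, 30, 35, 40)
mean_success <- c(0.5871429, 0.6000000, 0.5819048, 0.6014286,
                  0.6952381, 0.6747619, 0.6333333, 0.6109524)
rmse <- c(0.4301795, 0.4141872, 0.4389381, 0.4131429,
          0.3241735, 0.3531491, 0.3885084, 0.4145894)

best <- which.max(mean_success)       # position of the highest mean success
best_n_pcs <- n_pcs[best]             # number of PCs retained by that model
lowest_rmse_n_pcs <- n_pcs[which.min(rmse)]

# Gain of the best model over the median and the upper limit of random chance
gain_over_median <- mean_success[best] - chance[["50%"]]
gain_over_upper  <- mean_success[best] - chance[["97.5%"]]

# Numbers of PCs whose mean success exceeds the upper limit of random chance
above_chance <- n_pcs[mean_success > chance[["97.5%"]]]

cat("Best model retains", best_n_pcs, "PCs\n")
cat("Lowest RMSE is reached with", lowest_rmse_n_pcs, "PCs\n")
cat("Gain over median of random chance:", round(gain_over_median, 3), "\n")
cat("Gain over upper limit of random chance:", round(gain_over_upper, 3), "\n")
cat("Numbers of PCs beating the upper limit:", above_chance, "\n")
```

With these numbers the best model sits roughly 20 percentage points above the median of random chance and roughly 10 above its upper limit, which is the reading given in the reply earlier in the thread.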

From coraline.bichet at univ-lyon1.fr  Thu Aug 20 11:29:46 2015
From: coraline.bichet at univ-lyon1.fr (BICHET CORALINE)
Date: Thu, 20 Aug 2015 09:29:46 +0000
Subject: [adegenet-forum] colorplot and scatter

Dear forum,

I have two questions about the adegenet functions "colorplot" and "scatter". I should say that I am a beginner with the adegenet package, so please excuse me in advance if my questions seem a little stupid...

1. Colorplot

I made a colorplot to represent the result of an sPCA. The colorplot plots the different populations according to their location, and the colors correspond to the global sPCA score. I would like to add a legend in a box showing the correspondence between colors and sPCA score (as in the attached image, box.jpg). I searched, but I did not find any way to do this.

2. Scatter

I used the function scatter to represent the results of a DAPC. The colored dots in the plot represent the clusters, but I would like to know whether the dots can represent populations instead. The clusters would still be represented by the colored bars and the ellipses; I only want to change the dot colors. I tried to add dots to the plot manually (with the function "points"), using the coordinates of each individual on the DAPC axes.
But the dots did not appear, maybe because of a scale problem that I cannot solve. Here is the call that I use for one population. The data frame "ind" contains the DAPC LD1 and LD2 coordinates and the population names of the individuals:

points(ind$LD1[ind$pop=="Aussois"], scale(ind$LD2[ind$pop=="Aussois"]), col="black", pch=16, cex = 1)

I hope that my questions are understandable. Thank you very much in advance for your help. Don't hesitate to let me know if you need further information to better understand my problems.

Thanks a lot!

Coraline Bichet
coraline.bichet at univ-lyon1.fr
+33(0)472433584
UMR-CNRS 5558, Laboratoire de Biométrie et Biologie Evolutive (LBBE)
Université Claude Bernard Lyon 1, bâtiment Mendel
43 boulevard du 11 novembre 1918
69622 Villeurbanne Cedex, FRANCE

From t.jombart at imperial.ac.uk  Thu Aug 20 13:01:56 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 20 Aug 2015 11:01:56 +0000
Subject: [adegenet-forum] colorplot and scatter
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B1287D6F@icexch-m1.ic.ac.uk>

Hi Coraline,

The colorplot basically translates up to 3 quantitative variables into colors using RGB coding. So each axis corresponds to a channel, but the problem is that our eyes are not good at knowing what color comes out when combining values of the RGB channels. To have a proper legend for 3 axes we would need some kind of color triangle, and a different system for 2 axes. Nothing impossible to code, but cumbersome.

As for your second question, your approach is the best way to go. But there's a trick. The function add.scatter.eig, which adds the barplots of eigenvalues to the plot, changes the coordinate system, so that you can't plot anything afterwards.
You need to disable them, e.g.:

library(adegenet)
example(dapc)
scatter(dapc1, col="transparent", scree.da=FALSE)
points(dapc1$ind.coord[,1], dapc1$ind.coord[,2], col=funky(100))

Note that if some points are missing around the edges, you may need a par(xpd=TRUE) right before your 'points(..)'.

Cheers
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart

From muzhinjin at yahoo.com  Thu Aug 20 15:29:03 2015
From: muzhinjin at yahoo.com (normanm muzhinji)
Date: Thu, 20 Aug 2015 13:29:03 +0000 (UTC)
Subject: [adegenet-forum] Importing data
In-Reply-To: <5356746.UZtQVanxUU@veles>
References: <5356746.UZtQVanxUU@veles>
Message-ID: <1277800057.7933534.1440077343935.JavaMail.yahoo@mail.yahoo.com>

Thank you.

I am now using the FSTAT format and this is what I am getting:

> obj <- read.fstat(system.file("files/rhizoc.dat",package="adegenet"))
Error in if (toupper(.readExt(file)) != "DAT") stop("File extension .dat expected") :
  argument is of length zero

The data I am using are attached.
I used the same data for calculating population genetic parameters in the FSTAT package.

Thanks for your help
Norman

On Wednesday, August 19, 2015 8:02 PM, Vojtěch Zeisek wrote:

Hi

On Wed 19 August 2015 17:24:01, normanm muzhinji wrote:
> I am trying to import my data using structure format but I am getting this
> message: Error in if (!toupper(.readExt(file)) %in% c("STR", "STRU"))
> stop("File extension .stru expected") : argument is of length zero

Which commands did you use? What do your data look like?

> Could you kindly help on what commands to use for importing a data saved on
> my documents or /on desktop to adegenet for analysis.

You probably started with the documentation: https://github.com/thibautjombart/adegenet/wiki/Tutorials
We then need to know what you did to find out what went wrong...

> Thanks for your help

Sincerely,
Vojtěch

--
Vojtěch Zeisek
http://trapa.cz/en/

Department of Botany, Faculty of Science
Charles University in Prague
Benátská 2, Prague, 12801, CZ
http://botany.natur.cuni.cz/en/

Institute of Botany, Academy of Science
Zámek 1, Průhonice, 25243, CZ
http://www.ibot.cas.cz/en/
Czech Republic

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
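
A likely cause of the "argument is of length zero" error in the message above: system.file() looks inside an installed package and returns an empty string when the requested file is not shipped with it, so the import function never receives a usable path. This is offered as a probable explanation based on documented base R behaviour, not as a confirmed diagnosis of this dataset. The sketch below uses base R only; my_file is a hypothetical location standing in for the user's own file, and the read.fstat call is left commented out because it needs adegenet and a real FSTAT file.

```r
# system.file() searches inside an installed package; a file that is not
# shipped with the package yields an empty string rather than an error.
missing_path <- system.file("files/rhizoc.dat", package = "base")
print(missing_path)           # empty string: base R ships no such file
print(nzchar(missing_path))   # FALSE

# For a user's own data, give the path to the file directly instead.
# A temporary directory stands in here for "my documents" or the desktop.
my_file <- file.path(tempdir(), "rhizoc.dat")   # hypothetical location
file.create(my_file)
print(file.exists(my_file))                 # TRUE
print(toupper(tools::file_ext(my_file)))    # "DAT"

# With adegenet installed and a real FSTAT file, the import would then be:
# library(adegenet)
# obj <- read.fstat(my_file)
```

The same reasoning would apply to the earlier attempt with the STRUCTURE format: the path passed to the import function should point at the file on disk, with the expected extension.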

From t.jombart at imperial.ac.uk  Thu Aug 20 15:59:12 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 20 Aug 2015 13:59:12 +0000
Subject: [adegenet-forum] Importing data
In-Reply-To: <1277800057.7933534.1440077343935.JavaMail.yahoo@mail.yahoo.com>
References: <5356746.UZtQVanxUU@veles>, <1277800057.7933534.1440077343935.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B1287DCD@icexch-m1.ic.ac.uk>

Hi Norman,

please check the guidelines when posting - this question has been answered a few times on this forum. Check for instance:
http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2010-November/000158.html

Cheers
Thibaut

From coraline.bichet at univ-lyon1.fr  Thu Aug 20 16:14:16 2015
From: coraline.bichet at univ-lyon1.fr (BICHET CORALINE)
Date: Thu, 20 Aug 2015 14:14:16 +0000
Subject: [adegenet-forum] colorplot and scatter
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570B1287D6F@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA6570B1287D6F@icexch-m1.ic.ac.uk>

Dear Thibaut,

Thanks a lot! The script for the scatter works very well!

Here are more details about my problem with the function colorplot. My sPCA is defined with only two axes:

> spca1
  ########################################
  # spatial Principal Component Analysis #
  ########################################
class: spca
$call: spca(obj = geno, xy = coord, scale = TRUE)

$nfposi: 2 axis-components saved
$nfnega: 0 axis-components saved

Positive eigenvalues: 9.571 6.744 4.421 3.258 2.818 ...
Negative eigenvalues: -0.1399 -0.09504 -0.09378 -0.08205 -0.0807 ...

  vector length mode    content
1 $eig   123    numeric eigenvalues

  data.frame nrow ncol content
1 $c1        140  2    principal axes: scaled vectors of alleles loadings
2 $li        310  2    principal components: coordinates of entities ('scores')
3 $ls        310  2    lag vector of principal components
4 $as        2    2    pca axes onto spca axes

$xy: matrix of spatial coordinates
$lw: a list of spatial weights (class 'listw')

other elements: NULL

And the colorplot call is:

colorplot(spca1$xy, spca1$ls, axes = 1:2, cex = 3)

If I understand the function correctly, colorplot places my populations according to their geographical coordinates and adds a color (on a green-red scale) according to the ls scores for the two sPCA axes. The spca1$ls object contains two columns corresponding to the ls of the two axes:

> head(spca1$ls)
    Axis 1    Axis 2
1 3.383404 -2.599413
2 3.401930 -2.626953
3 3.410790 -2.636643
4 3.376684 -2.647047
5 3.368616 -2.582604
6 3.385992 -2.636117

So what I want to do is add the legend for these colors. I am asking again because I am not sure that I was clear the first time. I'm sorry to ask again, but even my colleagues who usually save me with R have never used colorplot.

Thank you again!

Coraline Bichet

From t.jombart at imperial.ac.uk  Thu Aug 20 16:50:43 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 20 Aug 2015 14:50:43 +0000
Subject: [adegenet-forum] colorplot and scatter
References: <2CB2DA8E426F3541AB1907F98ABA6570B1287D6F@icexch-m1.ic.ac.uk>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B1287DF2@icexch-m1.ic.ac.uk>

Hi there,

thanks for reposting to the group. In two dimensions, it is easier to get a legend.
To get the basic one:

library(adegenet)
xy <- expand.grid(1:30, 1:30)
colorplot(xy, xy, pch=15, cex=10, xlab="Axis 1", ylab="Axis 2")

You can have better resolution by replacing 30 by a larger number. You can add the legend manually afterwards, or using add.scatter:

m1 <- matrix(rnorm(200), ncol=2)
m2 <- matrix(rnorm(200), ncol=2)
colorplot(m1, X=m2, xlab="x", ylab="y", cex=2)
add.scatter({par(xaxt="n", yaxt="n"); colorplot(xy, xy, xlab="Axis 1", ylab="Axis 2", line=0)})

Cheers
Thibaut

From t.jombart at imperial.ac.uk  Thu Aug 20 18:04:43 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 20 Aug 2015 16:04:43 +0000
Subject: [adegenet-forum] HWE
In-Reply-To: <386B59F78C25A74C9D53373962B04CF80121C6C30A@E10-MBX1-CS.personale.dir.unibo.it>
References: <386B59F78C25A74C9D53373962B04CF80121C6C27D@E10-MBX1-CS.personale.dir.unibo.it> <2CB2DA8E426F3541AB1907F98ABA6570B12858E7@icexch-m1.ic.ac.uk>, <386B59F78C25A74C9D53373962B04CF80121C6C30A@E10-MBX1-CS.personale.dir.unibo.it>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B1287E86@icexch-m1.ic.ac.uk>

Hi Carlo,

I am confused - I thought you wanted to keep only loci *not* in HWE?

Yes, what you describe is doable - only a bit more cumbersome. You need to use seppop and then sapply over the objects:

data(nancycats)
hw.test(nancycats)
allPval <- sapply(seppop(nancycats), function(e) hw.test(e, B=0)[,3,drop=FALSE])

and allPval contains the p-values for all loci (rows) and populations (columns). From that it's trivial to get what you need, e.g.
apply(allPval<0.05, 1, sum, na.rm=TRUE) > 8 To get the loci with significant departure from HWE in at least 8 populations (just tweak the 0.05 to use correction for multiple testing). Cheers Thibaut ________________________________ From: Carlo Pecoraro [carlo.pecoraro2 at unibo.it] Sent: 18 August 2015 12:42 To: Jombart, Thibaut Subject: R: HWE Hi Thibaut, first of all many thanks for your answer. It works perfectly..many thanks for that! The problem is that in this way I am going to lose more than 1000 loci in my dataset. I am wondering if I have just to remove those loci which are out of Hardy?Weinberg equilibrium in at least the 80/90% of my geographical samples/populations (i.e. 8/10). Would it be possible to filter out those loci in disequilibrium, for instance, in 8 o more populations? Sorry for disturbing you again with these boring stuff. Many thanks for your help. Cheers, Carlo Da: Jombart, Thibaut [mailto:t.jombart at imperial.ac.uk] Inviato: luned? 17 agosto 2015 18:16 A: Carlo Pecoraro; adegenet-forum at lists.r-forge.r-project.org Oggetto: RE: HWE Hi Carlo, if you want to be conservative, you can apply Bonferroni correction to your data. Here's an example using nancycats: > library(adegenet) > library(pegas) > temp <- hw.test(nancycats) > pval <- temp[,3] > pval fca8 fca23 fca43 fca45 fca77 fca78 0.000000e+00 0.000000e+00 0.000000e+00 1.622163e-03 0.000000e+00 0.000000e+00 fca90 fca96 fca37 0.000000e+00 1.965095e-14 1.209777e-10 > loc.to.keep <- pval < (0.05/nLoc(nancycats)) # use Bonferroni > loc.to.keep fca8 fca23 fca43 fca45 fca77 fca78 fca90 fca96 fca37 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE > x <- nancycats[loc=loc.to.keep] Here all the loci are kept, but loci at HWE would have been filtered out. Cheers Thibaut ============================== Dr Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Department of Infectious Disease Epidemiology Imperial College - School of Public Health Norfolk Place, London W2 1PG, UK Tel. 
: 0044 (0)20 7594 3658 http://sites.google.com/site/thibautjombart/ http://sites.google.com/site/therepiproject/ http://adegenet.r-forge.r-project.org/ Twitter: @thibautjombart ________________________________ From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Carlo Pecoraro [carlo.pecoraro2 at unibo.it] Sent: 17 August 2015 16:58 To: adegenet-forum at lists.r-forge.r-project.org Cc: Thibaut Jombart Subject: [adegenet-forum] I: HWE Hi all, I am trying to remove from my dataset those loci in HWE. I have measured it both per population: hwe.pop<-(lapply(seppop(x), hw.test) and per locus hwe.loc<-hw.test(x), B=1000). Now I would like to define a threshold (0.05) for the Pr(chi^2>), removing all those loci with a derived P-values above this threshold. How could I perform this analysis? Do you have any suggestion? I am wondering if this is the best way to filter my dataset according to the HWE. Any advise would be more than welcome. Best regards, Carlo Da: Carlo Pecoraro Inviato: luned? 17 agosto 2015 16:14 A: 'adegenet-forum at lists.r-forge.r-project.org' Oggetto: HWE Hi all, I am trying to remove from my dataset those loci in HWE. I have measured it both per population: hwe.pop<-(lapply(seppop(x), hw.test) and per locus hwe.loc<-hw.test(x), B=1000). Now I would like to define a threshold (0.05) for the Pr(chi^2>), removing all those loci with a derived P-values above this threshold. How could I perform this analysis? Do you have any suggestion? I am wondering if this is the best way to filter my dataset according to the HWE. Any advise would be more than welcome. Best regards, Carlo -- Carlo Pecoraro, PhD Candidate Laboratory of Genetics & Genomics of Marine Resources and Environment (GenoDREAM) Dept. Biological, Geological & Environmental Sciences (BiGeA) University of Bologna Via S. 
Alberto 163, 48123 Ravenna (Italy) IRD (Institut de Recherche pour le D?veloppement) UMR 212 EME (Ecosyt?mes Marins Exploit?s) BP 570 Victoria, Mah? Seychelles Ph: +39 3337603101 skype contact: carlo_pecoraro -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.sajed at gmail.com Fri Aug 21 02:40:09 2015 From: m.sajed at gmail.com (Md Sajedul Islam) Date: Thu, 20 Aug 2015 17:40:09 -0700 Subject: [adegenet-forum] Question about DAPC clustering Message-ID: Dear All, I was trying to analyze my data (2408 haploid fungal isolate genotyped using 17 SSR markers) using 'adegenet' program . However, I am little bit uncertain how to determine the number of clusters based on the rate of changes of BIC that I observed from my result. I am attaching herewith my output files. I chose 3, 4, 5 9 and 10 clusters. Every time they grouped into three clusters. Should I consider 3 cluster then? I will appreciate your suggestions with explanations. Just another question, is there anyway to see or extract a table from 'adegenet' about which individual belongs to which cluster? Thanks *Sajed Islam* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: DAPC-K4.pdf Type: application/pdf Size: 138764 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: DAPC-K5.pdf Type: application/pdf Size: 141726 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: KSSR-DAPC-BIC1.pdf Type: application/pdf Size: 30006 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: KSSR-DAPC-BIC2.pdf Type: application/pdf Size: 30072 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: KSSR-DAPC-K3.pdf Type: application/pdf Size: 139478 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: KSSR-DAPC-K9.pdf Type: application/pdf Size: 148860 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: KSSR-DAPC-K10.pdf Type: application/pdf Size: 147517 bytes Desc: not available URL: From chollenbeck07 at tamu.edu Fri Aug 21 15:23:28 2015 From: chollenbeck07 at tamu.edu (Chris Hollenbeck) Date: Fri, 21 Aug 2015 08:23:28 -0500 Subject: [adegenet-forum] read.genepop (adegenet 2.0.0) Message-ID: Hi, I've seen a similar question on another thread, but I wasn't sure if this exact issue has been addressed. It seems that the 'read.genepop' function in adegenet 2.0 no longer has the option to specify the missing data character string, with the result that missing data gets coded as an allele. Is this an intentional feature of the new package? If so, I'm not sure what is the best way to modify the genind object to change the missing data 'alleles' to NAs. Thank you in advance for your help. - Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.jombart at imperial.ac.uk Fri Aug 21 15:38:46 2015 From: t.jombart at imperial.ac.uk (Jombart, Thibaut) Date: Fri, 21 Aug 2015 13:38:46 +0000 Subject: [adegenet-forum] read.genepop (adegenet 2.0.0) In-Reply-To: References: Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B1287F9F@icexch-m1.ic.ac.uk> Hi there, yes, this was a documented bug, fixed in the devel version. 
Check: https://github.com/thibautjombart/adegenet

Cheers
Thibaut

From chollenbeck07 at tamu.edu  Fri Aug 21 15:50:55 2015
From: chollenbeck07 at tamu.edu (Chris Hollenbeck)
Date: Fri, 21 Aug 2015 08:50:55 -0500
Subject: [adegenet-forum] read.genepop (adegenet 2.0.0)
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570B1287F9F@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA6570B1287F9F@icexch-m1.ic.ac.uk>

It's working fine now. Thanks!

Chris

From coraline.bichet at univ-lyon1.fr  Fri Aug 21 16:37:55 2015
From: coraline.bichet at univ-lyon1.fr (BICHET CORALINE)
Date: Fri, 21 Aug 2015 14:37:55 +0000
Subject: [adegenet-forum] colorplot and scatter
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570B1287DF2@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA6570B1287D6F@icexch-m1.ic.ac.uk>, <2CB2DA8E426F3541AB1907F98ABA6570B1287DF2@icexch-m1.ic.ac.uk>

It's perfect

Thanks a lot!

Coraline Bichet
coraline.bichet at univ-lyon1.fr
+33(0)472433584
UMR-CNRS 5558, Laboratoire de Biométrie et Biologie Evolutive (LBBE)
Université Claude Bernard Lyon 1, bâtiment Mendel
43 boulevard du 11 novembre 1918
69622 Villeurbanne Cedex, FRANCE

________________________________
From: Jombart, Thibaut [t.jombart at imperial.ac.uk]
Sent: Thursday 20 August 2015 16:50
To: BICHET CORALINE; adegenet-forum at lists.r-forge.r-project.org
Subject: RE: colorplot and scatter

Hi there

thanks for reposting to the group. In two dimensions, it is easier to get a legend. To get the basic one:

library(adegenet)
xy <- expand.grid(1:30,1:30)
colorplot(xy, xy, pch=15, cex=10, xlab="Axis 1", ylab="Axis 2")

You can have better resolution by replacing 30 by a larger number. You can add the legend manually afterwards, or using add.scatter:

m1=matrix(rnorm(200),ncol=2)
m2=matrix(rnorm(200),ncol=2)
colorplot(m1,X=m2, xlab="x", ylab="y", cex=2)
add.scatter({par(xaxt="n",yaxt="n");colorplot(xy, xy,xlab="Axis 1",ylab="Axis 2", line=0)})

Cheers
Thibaut

________________________________
From: BICHET CORALINE [coraline.bichet at univ-lyon1.fr]
Sent: 20 August 2015 15:14
To: Jombart, Thibaut; adegenet-forum at lists.r-forge.r-project.org
Subject: RE:colorplot and scatter

Dear Thibaut

Thanks a lot! The script for the scatter works very well!

Here are more details about my problem with the function colorplot. My sPCA is only defined with two axes:

> spca1
	########################################
	# spatial Principal Component Analysis #
	########################################
class: spca
$call: spca(obj = geno, xy = coord, scale = TRUE)

$nfposi: 2 axis-components saved
$nfnega: 0 axis-components saved

Positive eigenvalues: 9.571 6.744 4.421 3.258 2.818 ...
Negative eigenvalues: -0.1399 -0.09504 -0.09378 -0.08205 -0.0807 ...

  vector length mode    content
1 $eig   123    numeric eigenvalues

  data.frame nrow ncol content
1 $c1        140  2    principal axes: scaled vectors of alleles loadings
2 $li        310  2    principal components: coordinates of entities ('scores')
3 $ls        310  2    lag vector of principal components
4 $as        2    2    pca axes onto spca axes

$xy: matrix of spatial coordinates
$lw: a list of spatial weights (class 'listw')

other elements: NULL

And the colorplot call is:

colorplot(spca1$xy, spca1$ls, axes = 1:2, cex = 3)

If I understand the function correctly, colorplot projects my populations according to their geographical coordinates and adds a color (green-red scale) according to the ls score for the two sPCA axes. The spca1$ls object contains two columns corresponding to the ls of the two axes:

> head(spca1$ls)
    Axis 1    Axis 2
1 3.383404 -2.599413
2 3.401930 -2.626953
3 3.410790 -2.636643
4 3.376684 -2.647047
5 3.368616 -2.582604
6 3.385992 -2.636117

So what I want to do is add the legend for these colors... I ask again because I'm not sure that I was clear the first time. I'm sorry to ask you again, but even my colleagues who usually save me with R have never used this colorplot...

Thank you again!

Coraline Bichet
coraline.bichet at univ-lyon1.fr
+33(0)472433584
UMR-CNRS 5558, Laboratoire de Biométrie et Biologie Evolutive (LBBE)
Université Claude Bernard Lyon 1, bâtiment Mendel
43 boulevard du 11 novembre 1918
69622 Villeurbanne Cedex, FRANCE

________________________________
From: Jombart, Thibaut [t.jombart at imperial.ac.uk]
Sent: Thursday 20 August 2015 13:01
To: BICHET CORALINE; adegenet-forum at lists.r-forge.r-project.org
Subject: RE: colorplot and scatter

Hi Coraline

The colorplot is basically translating up to 3 quantitative variables into colors using RGB coding. So each axis corresponds to a channel, but the problem is our eyes are not good at knowing what color comes out when combining values of RGB channels. To have a proper legend for 3 axes we would need some kind of color triangle. And a different system for 2 axes.. nothing impossible to code, but cumbersome.

As for your second question, your approach is the best way to go. But there's a trick. The function add.scatter.eig, which adds the barplots of eigenvalues to the plot, changes the coordinate system, so that you can't plot anything afterwards. You need to disable them, e.g.:

library(adegenet)
example(dapc)
scatter(dapc1, col="transparent", scree.da=FALSE)
points(dapc1$ind.coord[,1],dapc1$ind.coord[,2], col=funky(100))

Note that if some points are missing around the edges, you may need a par(xpd=TRUE) right before your 'points(..)'.

Cheers
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart

-------------- next part --------------
A non-text attachment was scrubbed...
Name: box.jpg
Type: image/jpeg
Size: 5902 bytes
Desc: box.jpg

From josiedjackson at gmail.com  Sat Aug 22 14:54:58 2015
From: josiedjackson at gmail.com (Josie Jackson)
Date: Sat, 22 Aug 2015 13:54:58 +0100
Subject: [adegenet-forum] function "orthobasis.listw" and global.rtest

Hello,

I am trying to test if I have significant global structure in my sPCA using the global rtest; however, I keep getting an error saying could not find function orthobasis.listw. I see that it has been removed from and put into adespatial.
I have updated R and installed adespatial but I am not sure how to get it working again. Any hints would be much appreciated!

script:
SnowyGtest <- global.rtest(objSnowy1sbsp2$tab,mySpcaSnowy1sbsp2$lw, k = 1, nperm=99)

Thanks very much,
Josie

--
Josie D'Urban Jackson
Ph.D student
Department of Biology & Biochemistry
University of Bath
Claverton Down
Bath BA2 7AY
United Kingdom

Cardiff University School of Biosciences,
The Sir Martin Evans Building
Museum Avenue
Cardiff CF103AX
United Kingdom

From josiedjackson at gmail.com  Mon Aug 24 13:18:25 2015
From: josiedjackson at gmail.com (Josie Jackson)
Date: Mon, 24 Aug 2015 12:18:25 +0100
Subject: [adegenet-forum] function "orthobasis.listw" and global.rtest
In-Reply-To: <55DAFAEB.5070300@univ-lyon1.fr>
References: <55DAFAEB.5070300@univ-lyon1.fr>

Hello,

Thanks for your reply. I found some script written by Thibaut at this link:
https://github.com/thibautjombart/adegenet/blob/master/R/orthobasis.R

It worked after running this, so the problem is solved, but as it says on the link, this may only be a temporary fix.

Thanks,
Josie

--
Josie D'Urban Jackson
Ph.D student
Department of Biology & Biochemistry
University of Bath
Claverton Down
Bath BA2 7AY
United Kingdom

Cardiff University School of Biosciences,
The Sir Martin Evans Building
Museum Avenue
Cardiff CF103AX
United Kingdom

From stephane.dray at univ-lyon1.fr  Mon Aug 24 13:07:23 2015
From: stephane.dray at univ-lyon1.fr (Stéphane Dray)
Date: Mon, 24 Aug 2015 13:07:23 +0200
Subject: [adegenet-forum] function "orthobasis.listw" and global.rtest
Message-ID: <55DAFAEB.5070300@univ-lyon1.fr>

Hi,

I am not sure about which function you used. The function in adegenet seems to work (at least the example). If you used the function included in 'sedarJombart' on R-Forge (that included also the mspa function), this behavior is 'normal'. I will try to include both mspa and global/local tests this afternoon in adespatial.. It will solve the issues.

Cheers

--
Stéphane DRAY (stephane.dray at univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I
43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
Tel: 33 4 72 43 27 57 Fax: 33 4 72 43 13 88
http://pbil.univ-lyon1.fr/members/dray/
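
[Editorial sketch, not part of the original thread] The "could not find function" error discussed above usually means that no attached package currently provides the function. A minimal way to check this from the R console is sketched below, using only base R and utils. The helper name where_is is hypothetical and is not part of adegenet, ade4 or adespatial. Note that getAnywhere() only searches packages that are already loaded, so for a function such as orthobasis.listw the check is only meaningful after loading the package expected to provide it (assumed here to be installed).

```r
# Sketch: report where a function is visible in the current R session.
# `where_is` is a hypothetical helper name introduced for illustration only.
where_is <- function(fname) {
  hits <- utils::getAnywhere(fname)
  # keep only the locations whose object is actually a function
  is_fun <- vapply(hits$objs, is.function, logical(1))
  unique(hits$where[is_fun])
}

# A function from an attached base package is reported with its package
# and namespace locations
print(where_is("median"))

# A name that no loaded package provides returns an empty character vector,
# which is the situation behind "could not find function"
print(where_is("no_such_function_xyz"))
```

If the result is empty for the function you need, loading the package that is supposed to contain it (or sourcing a script that defines it, as done in this thread) and calling the helper again shows whether the function has become available.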
From carlo.pecoraro2 at unibo.it  Tue Aug 25 09:22:22 2015
From: carlo.pecoraro2 at unibo.it (Carlo Pecoraro)
Date: Tue, 25 Aug 2015 07:22:22 +0000
Subject: [adegenet-forum] R: HWE
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570B1287E86@icexch-m1.ic.ac.uk>
References: <386B59F78C25A74C9D53373962B04CF80121C6C27D@E10-MBX1-CS.personale.dir.unibo.it> <2CB2DA8E426F3541AB1907F98ABA6570B12858E7@icexch-m1.ic.ac.uk>, <386B59F78C25A74C9D53373962B04CF80121C6C30A@E10-MBX1-CS.personale.dir.unibo.it> <2CB2DA8E426F3541AB1907F98ABA6570B1287E86@icexch-m1.ic.ac.uk>
Message-ID: <386B59F78C25A74C9D53373962B04CF80121C6C4D3@E10-MBX1-CS.personale.dir.unibo.it>

Hi Thibaut,

many thanks once again...

Cheers
Carlo

From hvh22 at cam.ac.uk  Tue Aug 25 17:20:54 2015
From: hvh22 at cam.ac.uk (Harriet Hunt)
Date: Tue, 25 Aug 2015 16:20:54 +0100
Subject: [adegenet-forum] snpposi.plot with multiple chromosomes
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570B12858D6@icexch-m1.ic.ac.uk>
References: <4b5b7e5e54b71c4555111a64422eaa14@cam.ac.uk> <2CB2DA8E426F3541AB1907F98ABA6570B12858D6@icexch-m1.ic.ac.uk>
Message-ID: <428adb59f49b90a707d839dc7e53c842@cam.ac.uk>

Hi Thibaut,

thanks a lot for your reply. I have got a data frame 'snps' which is metadata about the snps (positions, chromosome number, and other info I used to calculate genome position since I started off with positions relative to chromosomes). Its structure is as follows:

> str(snps)
'data.frame':	23774 obs. of 6 variables:
 $ chrnumbers : Factor w/ 19 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ scaffold   : Factor w/ 10 levels "scaffold_1","scaffold_2",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ position   : int 82542 88128 88139 88175 90818 90857 91131 91147 91250 91807 ...
 $ length     : int 42145699 42145699 42145699 42145699 42145699 42145699 42145699 42145699 42145699 42145699 ...
 $ startOffset: int 0 0 0 0 0 0 0 0 0 0 ...
 $ globpos    : int 82542 88128 88139 88175 90818 90857 91131 91147 91250 91807 ...

I tried:

> chrPlot <- tapply(snps$globpos, snps$chrnumbers, snpposi.plot, genome.size=405737341)

As far as I can see this should be equivalent to the command for your simulated data set, but every time I run it, my computer crashes (totally crashes - can't shut down R, Ctrl-Alt-Del doesn't work). I don't know if I have misunderstood and got the syntax wrong, or the data set is too big, or of course there could just be a problem with my computer...
I am running R 3.2.1 using Rstudio 0.99. thanks, Harriet On 2015-08-17 17:04, Jombart, Thibaut wrote: > Hi there, > > basically the workflow should be 1) split data by chromosome and 2) > lapply over the result to generate the plot. 1) should be implemented > by seploc, and I have just posted a feature request for this: > https://github.com/thibautjombart/adegenet/issues/84 > and for the whole thing. > > Meanwhile, there is a simple work around if you have i) SNP positions > and ii) chromosome info. Split the first with the second, and apply > snpposi.plot to the resulting list; or tapply directly. Here's an > example with a simulated dataset: > > ## load package > library(adegenet) > > ## simulate data: 10 indiv, 1,000 SNPs from 10,000 nucleotide > positions; first 600 SNPs are chr1, the other are chr2 > x=glSim(10, 1000) > position(x) <- sort(sample(1:1e4, 1000)) > chromosome(x) <- rep(1:2, c(600,400)) > > ## split positions by chromosome and apply snpposi.plot to the bits > allPlots <- tapply(position(x), chromosome(x), snpposi.plot, > genome.size=1e4) > allPlots[[1]] # chr 1 > allPlots[[2]] # chr 2 > > Note that if you want the positions to be relative to the chromosomes, > then you have to subtract manually the starting positions, e.g. 
> temp <- split(position(x), chromosome(x))
> temp[[2]] <- temp[[2]] - 5504 # assuming chr 2 starts at position 5504
> allPlots.scaled <- lapply(1:2, function(i) snpposi.plot(temp[[i]],
> genome.size=max(temp[[i]])))
>
> Cheers
> Thibaut
>
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org
> [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of
> Harriet Hunt [hvh22 at cam.ac.uk]
> Sent: 17 August 2015 15:52
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] snpposi.plot with multiple chromosomes
>
> Hello all,
>
> I have a dataset with SNP positions given as base numbers along
> chromosomes, and the number of the chromosome is in a separate column
> in the original data table. Can anyone suggest a way either to easily
> select and plot the SNPs from one particular chromosome using
> snpposi.plot, without having to extract the data for each chromosome
> manually from the original data table? Or even better would be if
> there's a way to use snpposi.plot to show the density plots for all the
> chromosomes in order.
>
> thanks, Harriet
>
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

-- 
Dr Harriet Hunt
Research Associate
McDonald Institute for Archaeological Research
University of Cambridge
Downing Street
Cambridge CB2 3ER
UK

Tel: +44 (0)1223 339330
e-mail: hvh22 at cam.ac.uk

From marianamv at utexas.edu  Wed Aug 26 18:48:55 2015
From: marianamv at utexas.edu (Mariana Vasconcellos)
Date: Wed, 26 Aug 2015 11:48:55 -0500
Subject: [adegenet-forum] Error in pairDistPlot()
Message-ID: <411CC3B4-D742-4F6D-9A39-23DDED749F94@me.com>

Hi everyone,

I am trying to run pairDistPlot() from the adegenet package, both with
my own data and with the example provided in the R help, and it prints
the following error: "Error in levels(K) : object 'K' not found". See
below the example from the R help that was supposed to run. I am using
the 2.0.1 development version of adegenet. Could anyone help me
understand why it is printing this error?

> ## use a subset of influenza data
> data(H3N2)
> set.seed(1)
> dat <- H3N2[sample(1:nInd(H3N2), 100)]
>
> ## get pairwise distances
> temp <- pairDistPlot(dat, other(dat)$epid)
Error in levels(K) : object 'K' not found

Thank you,
Mariana

From t.jombart at imperial.ac.uk  Thu Aug 27 12:11:00 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 27 Aug 2015 10:11:00 +0000
Subject: [adegenet-forum] Error in pairDistPlot()
In-Reply-To: <411CC3B4-D742-4F6D-9A39-23DDED749F94@me.com>
References: <411CC3B4-D742-4F6D-9A39-23DDED749F94@me.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B129087B@icexch-m1.ic.ac.uk>

Hi Mariana,

this is likely a bug. Best way to report these is via the issue system
on github - that way you'll be notified automatically of any progress.
I just created an issue for you:
https://github.com/thibautjombart/adegenet/issues/87
comment there and you'll receive notifications.
Most likely this will be sorted today.

Cheers
Thibaut

From t.jombart at imperial.ac.uk  Thu Aug 27 12:22:31 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 27 Aug 2015 10:22:31 +0000
Subject: [adegenet-forum] Error in pairDistPlot()
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570B129087B@icexch-m1.ic.ac.uk>
References: <411CC3B4-D742-4F6D-9A39-23DDED749F94@me.com>,
 <2CB2DA8E426F3541AB1907F98ABA6570B129087B@icexch-m1.ic.ac.uk>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12908C3@icexch-m1.ic.ac.uk>

Bug fixed (at commit 054359339c36b62a39a95823da97aad7e6ac721d). You can
now use the function with the devel version of adegenet. See
https://github.com/thibautjombart/adegenet for installation guidelines.

Best
Thibaut
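
A minimal sketch of how the fix could be checked after updating, assuming the devel version is installed from the GitHub repository named above with devtools::install_github() (left commented out here because it needs network access). The helper name check_pairDistPlot is hypothetical; it simply re-runs the help-page example from this thread and reports whether it completes.

```r
## Installing the devel version (assumed route; see the repository for the
## maintainers' own installation guidelines):
# install.packages("devtools")
# devtools::install_github("thibautjombart/adegenet")

## Hypothetical helper: returns TRUE if the help-page example runs,
## FALSE if it raises an error, and NA if adegenet is not installed.
check_pairDistPlot <- function() {
  if (!requireNamespace("adegenet", quietly = TRUE)) return(NA)
  tryCatch({
    ## load the example data into this function's environment
    data("H3N2", package = "adegenet", envir = environment())
    set.seed(1)
    ## use a subset of the influenza data, as in the help-page example
    dat <- H3N2[sample(1:adegenet::nInd(H3N2), 100)]
    ## get pairwise distances; this call raised the reported error
    temp <- adegenet::pairDistPlot(dat, adegenet::other(dat)$epid)
    TRUE
  }, error = function(e) FALSE)
}

res <- check_pairDistPlot()
print(res)
```

A result of FALSE after updating would suggest the installed version predates the fix, in which case commenting on the GitHub issue is probably the quickest way to follow up.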