From melaniesmontes at gmail.com  Wed Feb  8 12:14:52 2017
From: melaniesmontes at gmail.com (Melanie Montes)
Date: Wed, 8 Feb 2017 12:14:52 +0100
Subject: [adegenet-forum] error in gstat.randtest
Message-ID:

Hello all,

I recently finished running fstat on my dataset of about 50 000 SNPs / 56
individuals, and successfully got F-statistics in return. However, when I
tried to run gstat.randtest to see if the structure was significant:

fstat.sig <- gstat.randtest(nr2014, nsim=1000)

...I got 50+ warnings like this:

50: In max(y, na.rm = TRUE) : no non-missing arguments to max; returning -Inf

and my results file looked like this:

> fstat.sig
Monte-Carlo test
Call: gstat.randtest(x = nr2014, nsim = 1000)

Observation: 0

Based on 1000 replicates
Simulated p-value: 1

which leads me to suspect that it did not work. Does this have something
to do with the missing data in my dataset? Sorry if this is a naive
question, I am an R novice.

Thanks for your time and the awesome package, I've been using it a lot!

Sincerely,
Melanie

From thibautjombart at gmail.com  Thu Feb  9 11:53:01 2017
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Thu, 9 Feb 2017 10:53:01 +0000
Subject: [adegenet-forum] error in gstat.randtest
In-Reply-To:
References:
Message-ID:

Hi Melanie,

it's quite hard to tell without seeing the data, but yes, my suspicion is
the same as yours: NAs are the culprits. Entirely non-typed loci are
normally removed from genind objects during their construction, but it is
still possible that for some groups in your data, some loci are entirely
missing.

Given how many SNPs you have, you can probably afford to remove loci with
a lot of missing data (just make sure you don't end up throwing too much
away). propTyped(..., by = "loc") may be your friend here.
Here's an example using microbov:

> data(microbov)
> propTyped(microbov, by = "loc")
   INRA63     INRA5    ETH225    ILSTS5      HEL5      HEL1    INRA35    ETH152
0.9914773 0.9786932 0.9914773 0.9673295 0.9786932 0.9914773 0.9829545 0.9573864
   INRA23     ETH10      HEL9    CSSM66    INRA32      ETH3    BM2113    BM1824
0.9644886 0.9943182 0.9701705 0.9886364 0.9687500 0.9829545 0.9914773 0.9900568
    HEL13    INRA37    BM1818    ILSTS6      MM12    CSRM60    ETH185    HAUT24
0.9772727 0.9815341 0.9588068 0.9446023 0.9744318 0.9730114 0.9275568 0.9872159
   HAUT27   TGLA227   TGLA126   TGLA122    TGLA53    SPS115
0.9914773 0.9914773 0.9914773 0.9914773 0.9531250 0.9701705

> to_keep <- propTyped(microbov, by = "loc") > .99 # i.e. less than 1% missing data
> to_keep
 INRA63   INRA5  ETH225  ILSTS5    HEL5    HEL1  INRA35  ETH152  INRA23   ETH10
   TRUE   FALSE    TRUE   FALSE   FALSE    TRUE   FALSE   FALSE   FALSE    TRUE
   HEL9  CSSM66  INRA32    ETH3  BM2113  BM1824   HEL13  INRA37  BM1818  ILSTS6
  FALSE   FALSE   FALSE   FALSE    TRUE    TRUE   FALSE   FALSE   FALSE   FALSE
   MM12  CSRM60  ETH185  HAUT24  HAUT27 TGLA227 TGLA126 TGLA122  TGLA53  SPS115
  FALSE   FALSE   FALSE   FALSE    TRUE    TRUE    TRUE    TRUE   FALSE   FALSE

> x <- microbov[loc = to_keep]
> nLoc(x)
[1] 10
> nLoc(microbov)
[1] 30

This is just an illustration: this dataset actually has very little
missing data. In your case you probably want to play with the threshold
(99% non-NA is likely overkill).

Cheers
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR
+44(0)20 7594 3658

On 8 February 2017 at 11:14, Melanie Montes wrote:

> Hello all,
> I recently finished running fstat on my dataset of about 50 000 SNPs / 56
> individuals, and successfully got F-statistics in return.
> However, when I tried to run gstat.randtest to see if the structure was
> significant:
>
> fstat.sig <- gstat.randtest(nr2014, nsim=1000)
>
> ...I got 50+ warnings like this:
>
> 50: In max(y, na.rm = TRUE) : no non-missing arguments to max; returning
> -Inf
>
> and my results file looked like this:
>
> > fstat.sig
> Monte-Carlo test
> Call: gstat.randtest(x = nr2014, nsim = 1000)
>
> Observation: 0
>
> Based on 1000 replicates
> Simulated p-value: 1
>
> which leads me to suspect that it did not work. Does this have something
> to do with the missing data in my dataset? Sorry if this is a naive
> question, I am an R novice.
>
> Thanks for your time and the awesome package, I've been using it a lot!
>
> Sincerely,
> Melanie
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From francois.lefevre.2 at inra.fr  Fri Feb 10 11:47:14 2017
From: francois.lefevre.2 at inra.fr (flefevre)
Date: Fri, 10 Feb 2017 11:47:14 +0100
Subject: [adegenet-forum] any change in find.clusters and/or dapc functions from V1.4-2 to V2.0.1 ?
Message-ID: <942f05a0-c5c0-7bff-c583-ffbde69d9b29@inra.fr>

Dear adegenet team,

has something changed in the find.clusters and/or dapc functions between
V1.4-2 and V2.0.1 of adegenet?

I am re-doing an analysis that was first done in 2014 (with V1.4-2) on a
genind object of 158 diploid individuals and 70 SNPs, and the results are
quite different (synthesis of 20 analyses using CLUMPP, highly consistent
in each case). Exactly the same dataset and the same script. The only
difference I can see between the 2 analyses is the version of adegenet: has
something changed in these 2 functions?

The individuals belong to an admixed population; we are looking for the
number of components and the assignment of the individuals.
The difference is as follows:

2014 analysis => 3 clear groups (136 individuals out of 158 have a mean
assignment probability >0.95 to one of the groups), one group is more
"compact" and consists of closely related individuals

2017 analysis => 2 clear groups (111 individuals with assignment >0.95),
all individuals previously assigned to the compact group are still assigned
to the same new group but now associated with others, the other 2 previous
groups do not relate well with the new ones.

Any suggestion to interpret this discrepancy is welcome,
Thank you,

François
francois.lefevre.2 at inra.fr

From thibautjombart at gmail.com  Wed Feb 22 13:33:51 2017
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Wed, 22 Feb 2017 12:33:51 +0000
Subject: [adegenet-forum] any change in find.clusters and/or dapc functions from V1.4-2 to V2.0.1 ?
In-Reply-To: <942f05a0-c5c0-7bff-c583-ffbde69d9b29@inra.fr>
References: <942f05a0-c5c0-7bff-c583-ffbde69d9b29@inra.fr>
Message-ID:

Hello,

yes, a lot has changed between the two versions; see the ChangeLog:
https://cran.r-project.org/web/packages/adegenet/ChangeLog

Besides, are you using the same version of R with both package versions?

Knowing if and what change(s) specifically could cause the results to
differ is going to be difficult; we would have to finely compare the
analyses. The first thing to check would be to verify that the matrices of
allele frequencies are the same. I think this would have been 'scaleGen' in
the older version (specify scale=FALSE, center=FALSE, do not replace NA).
This is 'tab' in the current version (use freq = TRUE, do not replace NA).
Then check if the matrices are the same. A change in the default treatment
of NA, or in scaling, could explain the difference.
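The matrix comparison suggested above could be sketched roughly as follows. This is only an illustration: `compare_freq` is a hypothetical helper (not an adegenet function), the two toy matrices stand in for the real exports, and the commented calls at the bottom assume the argument names `missing = "NA"` for the old `scaleGen` and `NA.method = "asis"` for the current `tab`.

```r
# Hypothetical helper (not part of adegenet): compare two allele-frequency
# matrices, e.g. one exported under the old installation and one under the new.
compare_freq <- function(old, new, tol = 1e-8) {
  same_dim <- identical(dim(old), dim(new))
  same_names <- same_dim &&
    identical(rownames(old), rownames(new)) &&
    identical(colnames(old), colnames(new))
  # Same cells missing in both matrices?
  same_na <- same_dim && all(is.na(old) == is.na(new))
  # Largest difference over cells that are typed in both matrices
  max_diff <- if (same_dim && any(!is.na(old) & !is.na(new))) {
    max(abs(old - new), na.rm = TRUE)
  } else {
    NA_real_
  }
  list(same_dim = same_dim,
       same_names = same_names,
       same_na = same_na,
       max_abs_diff = max_diff,
       identical_within_tol = same_dim && same_na &&
         !is.na(max_diff) && max_diff <= tol)
}

# Toy data standing in for the real matrices (2 individuals, 3 alleles)
old <- matrix(c(0.5, 1, NA, 0.5, 0, 1), nrow = 2,
              dimnames = list(c("ind1", "ind2"), c("L1.a", "L1.b", "L2.a")))
new <- old
new["ind1", "L1.b"] <- 0.5  # mimics an NA that one version silently replaced

res <- compare_freq(old, new)
print(res)

# With real data, the inputs might be produced like this (assumed calls):
#   old <- scaleGen(x, center = FALSE, scale = FALSE, missing = "NA")  # V1.4-2
#   new <- tab(x, freq = TRUE, NA.method = "asis")                     # V2.0.1
# and moved between installations with saveRDS() / readRDS().
```

If `same_na` is FALSE while `max_abs_diff` is 0, the two versions differ only in how missing values are handled, which would point to the NA treatment rather than the scaling.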
Cheers
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR
+44(0)20 7594 3658

On 10 February 2017 at 10:47, flefevre wrote:

> Dear adegenet team,
>
> has something changed in the find.clusters and/or dapc functions between
> V1.4-2 and V2.0.1 of adegenet?
>
> I am re-doing an analysis that was first done in 2014 (with V1.4-2) on a
> genind object of 158 diploid individuals and 70 SNPs, and the results are
> quite different (synthesis of 20 analyses using CLUMPP, highly consistent
> in each case). Exactly the same dataset and the same script. The only
> difference I can see between the 2 analyses is the version of adegenet: has
> something changed in these 2 functions?
>
> The individuals belong to an admixed population; we are looking for the
> number of components and the assignment of the individuals. The difference
> is as follows:
> 2014 analysis => 3 clear groups (136 individuals out of 158 have a mean
> assignment probability >0.95 to one of the groups), one group is more
> "compact" and consists of closely related individuals
> 2017 analysis => 2 clear groups (111 individuals with assignment >0.95),
> all individuals previously assigned to the compact group are still assigned
> to the same new group but now associated with others, the other 2 previous
> groups do not relate well with the new ones.
>
> Any suggestion to interpret this discrepancy is welcome,
> Thank you,
>
> François
> francois.lefevre.2 at inra.fr
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
From melaniesmontes at gmail.com  Thu Feb 23 15:00:01 2017
From: melaniesmontes at gmail.com (Melanie Montes)
Date: Thu, 23 Feb 2017 15:00:01 +0100
Subject: [adegenet-forum] error in gstat.randtest
In-Reply-To:
References:
Message-ID:

Thanks for the tip Thibaut!

In case anyone else ends up running into this error, it seems that it was
indeed the amount of missing data that was the problem. I tried keeping
different amounts, but found that any less than 98% missing (even 97.5%)
led to this error. This cut my dataset down from about 50 000 SNPs to 2800,
so you see how much was actually missing!

Thanks again,
Melanie

On Thu, Feb 9, 2017 at 11:53 AM, Thibaut Jombart wrote:

> Hi Melanie,
>
> it's quite hard to tell without seeing the data, but yes, my suspicion is
> the same as yours: NAs are the culprits. Entirely non-typed loci are
> normally removed from genind objects during their construction, but it is
> still possible that for some groups in your data, some loci are entirely
> missing.
>
> Given how many SNPs you have, you can probably afford to remove loci with
> a lot of missing data (just make sure you don't end up throwing too much
> away). propTyped(..., by = "loc") may be your friend here. Here's an
> example using microbov:
>
> > data(microbov)
> > propTyped(microbov, by = "loc")
>    INRA63     INRA5    ETH225    ILSTS5      HEL5      HEL1    INRA35    ETH152
> 0.9914773 0.9786932 0.9914773 0.9673295 0.9786932 0.9914773 0.9829545 0.9573864
>    INRA23     ETH10      HEL9    CSSM66    INRA32      ETH3    BM2113    BM1824
> 0.9644886 0.9943182 0.9701705 0.9886364 0.9687500 0.9829545 0.9914773 0.9900568
>     HEL13    INRA37    BM1818    ILSTS6      MM12    CSRM60    ETH185    HAUT24
> 0.9772727 0.9815341 0.9588068 0.9446023 0.9744318 0.9730114 0.9275568 0.9872159
>    HAUT27   TGLA227   TGLA126   TGLA122    TGLA53    SPS115
> 0.9914773 0.9914773 0.9914773 0.9914773 0.9531250 0.9701705
>
> > to_keep <- propTyped(microbov, by = "loc") > .99 # i.e.
> # less than 1% missing data
> > to_keep
>  INRA63   INRA5  ETH225  ILSTS5    HEL5    HEL1  INRA35  ETH152  INRA23   ETH10
>    TRUE   FALSE    TRUE   FALSE   FALSE    TRUE   FALSE   FALSE   FALSE    TRUE
>    HEL9  CSSM66  INRA32    ETH3  BM2113  BM1824   HEL13  INRA37  BM1818  ILSTS6
>   FALSE   FALSE   FALSE   FALSE    TRUE    TRUE   FALSE   FALSE   FALSE   FALSE
>    MM12  CSRM60  ETH185  HAUT24  HAUT27 TGLA227 TGLA126 TGLA122  TGLA53  SPS115
>   FALSE   FALSE   FALSE   FALSE    TRUE    TRUE    TRUE    TRUE   FALSE   FALSE
>
> > x <- microbov[loc = to_keep]
> > nLoc(x)
> [1] 10
> > nLoc(microbov)
> [1] 30
>
> This is just an illustration: this dataset actually has very little
> missing data. In your case you probably want to play with the threshold
> (99% non-NA is likely overkill).
>
> Cheers
> Thibaut
>
> --
> Dr Thibaut Jombart
> Lecturer, Department of Infectious Disease Epidemiology, Imperial College
> London
> Head of RECON: repidemicsconsortium.org
> sites.google.com/site/thibautjombart/
> github.com/thibautjombart
> Twitter: @TeebzR
> +44(0)20 7594 3658
>
> On 8 February 2017 at 11:14, Melanie Montes wrote:
>
>> Hello all,
>> I recently finished running fstat on my dataset of about 50 000 SNPs / 56
>> individuals, and successfully got F-statistics in return. However, when I
>> tried to run gstat.randtest to see if the structure was significant:
>>
>> fstat.sig <- gstat.randtest(nr2014, nsim=1000)
>>
>> ...I got 50+ warnings like this:
>>
>> 50: In max(y, na.rm = TRUE) : no non-missing arguments to max; returning
>> -Inf
>>
>> and my results file looked like this:
>>
>> > fstat.sig
>> Monte-Carlo test
>> Call: gstat.randtest(x = nr2014, nsim = 1000)
>>
>> Observation: 0
>>
>> Based on 1000 replicates
>> Simulated p-value: 1
>>
>> which leads me to suspect that it did not work. Does this have something
>> to do with the missing data in my dataset? Sorry if this is a naive
>> question, I am an R novice.
>>
>> Thanks for your time and the awesome package, I've been using it a lot!
>>
>> Sincerely,
>> Melanie
>>
>> _______________________________________________
>> adegenet-forum mailing list
>> adegenet-forum at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From mahon2a at cmich.edu  Tue Feb 28 22:43:57 2017
From: mahon2a at cmich.edu (Mahon, Andrew R)
Date: Tue, 28 Feb 2017 21:43:57 +0000
Subject: [adegenet-forum] scatter plot labels
Message-ID: <6773CB2C-48AD-492C-9C78-94096F896BBB@cmich.edu>

Hi all,

New to using adegenet. Quick question (that may or may not be simple...).

Is there a way to use the actual sample labels (i.e., what I named them)
and plot them in the scatter of the DAPC (i.e., when you run the
scatter(dapc) command)?

Thanks in advance for any help.

-andy

From alangarcia87 at hotmail.com  Mon Feb 27 20:19:00 2017
From: alangarcia87 at hotmail.com (Alan Garcia-Elfring)
Date: Mon, 27 Feb 2017 19:19:00 +0000
Subject: [adegenet-forum] DAPC - 3.4 Interpreting variable contributions (using a genlight object)
Message-ID:

Hi all,

I have a genlight object and I would like to analyze the contributions of
different alleles to population structure. The example in the manual is
for genind objects, and a previous answer indicated that the workaround,
using as.data.frame, is only good for haploid data.

http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2014-May/000840.html

Is it possible yet to do section 3.4 (Jombart and Collins 2015) with a
diploid dataset? It would be really cool if so!
Thanks,
Alan

> pldata
 /// GENLIGHT OBJECT /////////

 // 223 genotypes,  76,288 binary SNPs, size: 9 Mb
 0 (0 %) missing data

 // Basic content
   @gen: list of 223 SNPbin
   @ploidy: ploidy of each individual  (range: 2-2)

 // Optional content
   @ind.names:  223 individual labels
   @loc.names:  76288 locus labels
   @pop: population of each individual (group size range: 1-1)
   @other: a list containing: sex  phenotype  pat  mat

> freq399 <- tab(genind2genpop(pldata[loc=c("41837")]),freq=TRUE)
Error in genind2genpop(pldata[loc = c("41837")]) :
  x is not a valid genind object

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rplot.png
Type: image/png
Size: 41041 bytes
Desc: Rplot.png
URL:
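The thread contains no reply to this question. One possible route, given only as a sketch under assumptions, is to skip genind2genpop (which the error shows rejects genlight input) and compute per-group allele frequencies directly from a matrix of allele counts. The helper `group_freq` and the toy data below are hypothetical; the commented lines assume that `as.matrix()` on a genlight object returns per-individual counts of the second allele with locus names as column names, and that DAPC run on a genlight stores contributions in `$var.contr`. Note that the printout above reports a group size range of 1-1 for @pop, so a meaningful grouping factor would have to be supplied separately.

```r
# Hypothetical helper (not part of adegenet): frequency of the second allele
# per group, from a matrix of allele counts (individuals in rows, SNPs in
# columns; values 0, 1, 2 for diploids, NA for missing).
group_freq <- function(counts, group, ploidy = 2) {
  group <- factor(group)
  allele_sums <- rowsum(counts, group, na.rm = TRUE)    # alleles seen per group
  n_typed <- rowsum((!is.na(counts)) * 1, group)        # typed individuals per group
  allele_sums / (ploidy * n_typed)
}

# Toy data standing in for a real dataset: 4 diploid individuals, 2 SNPs
counts <- matrix(c(0, 1, 2, 2,
                   2, NA, 1, 0),
                 nrow = 4,
                 dimnames = list(paste0("ind", 1:4), c("41837", "snp2")))
group <- c("A", "A", "B", "B")

freq <- group_freq(counts, group)
print(freq)

# With a genlight object, the inputs might be obtained like this (assumed):
#   counts <- as.matrix(pldata)           # counts of the second allele
#   group  <- my_groups                   # hypothetical factor, one value per individual
#   group_freq(counts[, "41837", drop = FALSE], group)
# and the contributions to inspect, as in section 3.4 of the tutorial:
#   dapc1 <- dapc(pldata, group, n.pca = 20, n.da = 2)   # illustrative values
#   loadingplot(dapc1$var.contr, axis = 1)
```

Dividing by the number of typed individuals rather than the group size keeps missing genotypes from biasing the frequency towards zero.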