From thibautjombart at gmail.com Fri Dec 2 16:51:59 2016
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Fri, 2 Dec 2016 15:51:59 +0000
Subject: [adegenet-forum] question on scaleGen()

Hello,

sorry, I am not sure I understand: what is the problem?

Cheers
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR

On 25 November 2016 at 15:23, Da Pan wrote:
> Dear Thibaut and adegenet users,
> Thank you for your time and help.
> I probably have some more naive questions about adegenet.
> I am attempting a PCA on my SNP dataset with the following
> arguments:
>
> test <- read.structure("batch_1.str", n.ind = 17, n.loc = 12451,
>                        col.lab = 1, col.pop = 2, row.marknames = 1,
>                        NA.char = "0")
>
> test2 <- scaleGen(test, NA.method = "mean")
>
> After this, R shows:
>
> Warning message:
> In .local(x, ...) : Some scaling values are null.
> Corresponding alleles are removed.
>
> I checked pop(test); it returned:
>
>  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33
> 02 02 04 04 06 06 07 07 19 19 38 38 39 39 46 46 46
> Levels: 02 04 06 07 19 38 39 46
>
> How can I solve this problem?
>
> Thanks in advance
>
> Best wishes,
> Da
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From thibautjombart at gmail.com Mon Dec 5 13:10:03 2016
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Mon, 5 Dec 2016 12:10:03 +0000
Subject: [adegenet-forum] Discrepancy in NA counts
In-Reply-To: <237564767.572607.1480336824232.JavaMail.zimbra@biolitika.si>
References: <1051082989.570757.1480318267345.JavaMail.zimbra@biolitika.si>
 <656891479.571300.1480323819198.JavaMail.zimbra@biolitika.si>
 <237564767.572607.1480336824232.JavaMail.zimbra@biolitika.si>

Hello,

Roman has fixed this bug in the current devel version of adegenet. See
https://github.com/thibautjombart/adegenet for guidelines on installing it.
Can you confirm it solves your issue?

Best
Thibaut

On 28 November 2016 at 12:40, Roman Luštrik wrote:
> Hi Elizabeth,
>
> it would appear there is something funky happening with the code due to
> locus names being numeric. This has happened before in some other function.
> Until we fix this, you can change your locus names so that they start with a
> letter.
>
> Here is the excerpt from the genind object indicating that these two samples
> have alleles 33:
>
>          X1401_25.13 X1401_25.33 X1403_13.11 X1403_13.13 X1403_13.33 X1404_17.13 X1404_17.33 X1404_17.11
> C_KH1059           0           1           1           0           0           0           1           0
> M_KH1834           0           1           1           0           0           1           0           0
>
> Cheers,
> Roman
>
> ----
> In god we trust, all others bring data.
>
> ________________________________
> From: "Biz Sheedy"
> To: "Roman Luštrik"
> Cc: adegenet-forum at lists.r-forge.r-project.org
> Sent: Monday, November 28, 2016 11:00:53 AM
> Subject: Re: [adegenet-forum] Discrepancy in NA counts
>
> Thanks for looking into this.
>
> Something I did differently from the code you provided was that I only
> answered the prompts for the read.structure function. This meant I did not
> use sep="\t", and the number of alleles was 62 instead of 72, which I think
> should be comparable to the excel count. Following the code you provided,
> 'is.na' finds 23 NAs (instead of 20 NAs at 62 alleles and 16 zeroes in
> excel).
>
> Your explanation makes sense to me for the additional three NAs in adegenet,
> but I still don't understand how, in locus 1401_25, the data for two
> individuals (C_KH1059 and M_KH1834) changed from being homozygous for "3" to
> being "NA".
>
> I would really appreciate any further help on this.
>
> Thanks again,
> Elizabeth
>
> On 28 November 2016 at 18:03, Roman Luštrik wrote:
>>
>> Hi,
>>
>> I think the problem is that adegenet, for consistency, adds NAs to
>> accommodate the extra alleles present for a particular locus. Take for
>> example C_KH1238 (bottom row in the example pasted below).
>> In the raw file, it has missing values for locus 1378_53, but this locus has
>> three alleles, ergo 3 NAs and not 2. I can't go through all the NAs right
>> now, but I think there's a pretty good chance this is what is causing the
>> discrepancy between what you see in "excel" and in adegenet.
>>
>> 1369_41.11 1372_14.22 1372_14.24 1373_9.44 1373_9.24 1377_42.44 1377_42.24
>> 1378_53.22 1378_53.24 1378_53.44 1379_10.33 1379_10.13 1382_37.33
>> ...
>> C_KH1238 0 1 0 1 0 1 0 NA NA NA 1 0 1 # notice 3 NAs for all available
>> alleles for 1378_53, not just two (as expected for diploid)
>>
>> Here is the code I used to explore this:
>>
>> library(adegenet)
>>
>> xy <- read.table("Sub_batch_1.stru", header = TRUE, sep = "\t")
>> xy <- xy[, c(-1, -2)]
>> table(as.matrix(xy))
>>
>> #   0   1   2   3   4
>> #  16 467 618 760 867
>>
>> xy <- read.structure("Sub_batch_1.stru", NA.char = "0",
>>                      n.ind = 44, n.loc = 31, onerowperind = FALSE,
>>                      col.lab = 1, col.pop = 2, row.marknames = 1,
>>                      sep = "\t", col.others = 0)
>>
>> xy <- tab(xy)
>> xy[grepl("C_KH1238", rownames(xy)), grepl("1378_53", colnames(xy))]
>>
>> Cheers,
>> Roman
>>
>> ________________________________
>> From: "Biz Sheedy"
>> To: "Roman Luštrik"
>> Sent: Monday, November 28, 2016 9:11:39 AM
>> Subject: Re: [adegenet-forum] Discrepancy in NA counts
>>
>> My apologies. This is my first time posting to a forum, so I am a little
>> unsure of things. I have attached a subset of the data, which includes the
>> locus that I saw had problems.
>>
>> In this case there are 31 loci with 16 zeroes counted (excel), and 20 NAs
>> counted (adegenet). The additional NAs occur in locus 1401_25.
>>
>> Thanks so much,
>> Elizabeth
>>
>> On 28 November 2016 at 16:31, Roman Luštrik wrote:
>>>
>>> Hi,
>>>
>>> can you share a subset of the dataset? It's hard to pinpoint where
>>> things might be going wrong without some data in hand.
>>>
>>> Cheers,
>>> Roman
>>>
>>> ________________________________
>>> From: "Biz Sheedy"
>>> To: adegenet-forum at lists.r-forge.r-project.org
>>> Sent: Friday, November 25, 2016 10:44:16 AM
>>> Subject: [adegenet-forum] Discrepancy in NA counts
>>>
>>> Dear All,
>>>
>>> I am trying to read SNP data from Stacks into adegenet. I have tried
>>> read.structure and read.genepop, but they both give (the same) NA counts
>>> that are higher than expected. Using read.table on the structure-formatted
>>> file (with "ind" and "pop" inserted into the first two columns of row one)
>>> gave the expected amount of missing data.
>>>
>>> I looked at a single population subset (both the original and the
>>> converted data) in excel and found a locus where, in the original data, all
>>> nine individuals were "3", but in the converted data one individual was
>>> "NA". The loci before and after this one both matched/were correct.
>>>
>>> I am not sure what I have missed for this to happen; my R skills are
>>> beginner at best. Any help with reading the data in correctly would be
>>> greatly appreciated!
>>>
>>> Thank you,
>>> Elizabeth
>>>
>>>
>>> R version 3.3.2
>>> adegenet version 2.0.1
>>>
>>> Data: 44 individuals, diploid, 4279 loci.
>>>
>>> all <- read.structure("all_batch_1.stru", NA.char = "0")
>>>
>>> Total cells in excel: 376552
>>> After read.structure/genepop: 44*8558=376552
>>>
>>> 0s in excel: 3952
>>> 0s after read.table; length(which(X==0)): 3952
>>> NA after read.structure/genepop; sum(is.na(all$tab)): 4008
>>> Difference: 56
>>>
>>> Subset Chichi
>>> Total cells: 77022
>>> After read.structure/genepop: 9*8558=77022
>>>
>>> 0s in excel: 742
>>> NA after read.structure/genepop; sum(is.na(chi$tab)): 756
>>> Difference: 14
>>>
>>> --
>>> 4-1-1 Amakubo
>>> Department of Botany
>>> National Museum of Nature and Science
>>> Tsukuba, Ibaraki 305-0005
>>> Japan
>>>
>>> biz.sheedy at gmail.com

From fhernandeu at uc.cl Mon Dec 5 15:29:14 2016
From: fhernandeu at uc.cl (Felipe Hernández)
Date: Mon, 5 Dec 2016 09:29:14 -0500
Subject: [adegenet-forum] DaPC vs. BAPS results question

Good morning,

I wonder if you could guide me with this question (which is probably pretty
basic). After running a DAPC analysis using adegenet, I usually get K
between 4 and 5 for my dataset (480 hogs, 59 microsats, 39 sampling sites).
The maximum number of clusters tried was 40. Afterwards, I tried to estimate
the number of clusters (spatial clustering of individuals) using another
software (BAPS 6.0), but I got a much higher number of estimated clusters
(K=17), after testing different maximum numbers of K (i.e., K=5 through
K=20). Any clue about the reason for this? Could it be related to the
maximum number of clusters tested? Or to linkage disequilibrium between some
loci? Sorry if the question is really basic, but I would appreciate any
advice.

Regards,
Felipe

--
Felipe Hernández
Médico Veterinario (DVM), MSc.
PhD. Candidate
Interdisciplinary Ecology Program
School of Natural Resources and Environment
Wildlife Ecology and Conservation Department
University of Florida

From thibautjombart at gmail.com Mon Dec 5 16:10:14 2016
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Mon, 5 Dec 2016 15:10:14 +0000
Subject: [adegenet-forum] DaPC vs. BAPS results question

Dear Felipe,

this is always a hard question, as different methods essentially do...
different things. The K-means in find.clusters optimizes the variance
between groups, while BAPS maximizes a likelihood function under a
given population genetics model.
So it may be the case that you have ~17 demes roughly at HWE, but that
only 4-5 groups are optimal in terms of clearly delineated groups. And
this is assuming both methods are 'right'. They may be prone to all sorts
of biases, namely largely different group variances for the K-means, and
deviations from the original model in BAPS.

Feel free to post the image (or a link to it) of the BIC for find.clusters
if you want my two cents on the number of K to look at.

Best
Thibaut

From thibautjombart at gmail.com Tue Dec 6 17:13:11 2016
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Tue, 6 Dec 2016 16:13:11 +0000
Subject: [adegenet-forum] DaPC vs. BAPS results question

Hello,

the results will be a bit more stable if you increase the number of
starting points for the k-means (see the argument n.start).

It should not really impact the outcome though: here, any K from 2 to 12
is an equally good solution, at least as judged by the BIC.

Cheers
Thibaut

On 6 December 2016 at 15:17, Felipe Hernández wrote:
> Thanks Thibaut,
>
> Here you have the image and values for each estimated K. Any advice is
> more than welcome, thanks!
>
> Best,
> Felipe
>
> > grp
> $Kstat
>      K=1      K=2      K=3      K=4      K=5      K=6      K=7      K=8
> 1494.756 1481.467 1473.864 1472.002 1470.633 1472.970 1470.754 1472.011
>      K=9     K=10     K=11     K=12     K=13     K=14     K=15     K=16
> 1471.813 1473.632 1473.924 1476.759 1476.699 1475.433 1479.546 1481.119
>     K=17     K=18     K=19     K=20     K=21     K=22     K=23     K=24
> 1481.292 1485.865 1488.130 1488.356 1493.552 1494.979 1501.182 1499.258
>     K=25     K=26     K=27     K=28     K=29     K=30     K=31     K=32
> 1500.146 1504.113 1511.598 1511.550 1513.889 1516.275 1522.144 1524.733
>     K=33     K=34     K=35     K=36     K=37     K=38     K=39     K=40
> 1528.089 1530.409 1535.778 1538.049 1541.269 1546.197 1547.656 1552.127
>
> $stat
>      K=5
> 1470.633

From fhernandeu at uc.cl Tue Dec 6 17:59:05 2016
From: fhernandeu at uc.cl (Felipe Hernández)
Date: Tue, 6 Dec 2016 11:59:05 -0500
Subject: [adegenet-forum] DaPC vs. BAPS results question

Ok, thanks! So just paying attention to the K with the lowest BIC value
doesn't point to the most likely number of clusters in the end?
Ultimately, may K=5 be considered as the most probable number of genetic clusters explained by my dataset, or should I consider other factors too? I tried your suggestions and see what I can get. Thanks! Best, 2016-12-06 11:13 GMT-05:00 Thibaut Jombart : > Hello, > > the results will be a bit more stable if you increase the number of > starting points for the k-means (see arg. n.start). > > It should not really impact the outcome though: here, any K from 2 to 12 > is an equally good solution, at least as judged by the BIC. > > Cheers > Thibaut > > > -- > Dr Thibaut Jombart > Lecturer, Department of Infectious Disease Epidemiology, Imperial College > London > Head of RECON: repidemicsconsortium.org > sites.google.com/site/thibautjombart/ > github.com/thibautjombart > Twitter: @TeebzR > > On 6 December 2016 at 15:17, Felipe Hern?ndez wrote: > >> Thanks Thibaut, >> >> Here you have the image and values for each estimated K. Any advice is >> more than welcome, thanks! >> >> Best, >> Felipe >> >> > grp >> $Kstat >> K=1 K=2 K=3 K=4 K=5 K=6 K=7 K=8 >> 1494.756 1481.467 1473.864 1472.002 1470.633 1472.970 1470.754 1472.011 >> K=9 K=10 K=11 K=12 K=13 K=14 K=15 K=16 >> 1471.813 1473.632 1473.924 1476.759 1476.699 1475.433 1479.546 1481.119 >> K=17 K=18 K=19 K=20 K=21 K=22 K=23 K=24 >> 1481.292 1485.865 1488.130 1488.356 1493.552 1494.979 1501.182 1499.258 >> K=25 K=26 K=27 K=28 K=29 K=30 K=31 K=32 >> 1500.146 1504.113 1511.598 1511.550 1513.889 1516.275 1522.144 1524.733 >> K=33 K=34 K=35 K=36 K=37 K=38 K=39 K=40 >> 1528.089 1530.409 1535.778 1538.049 1541.269 1546.197 1547.656 1552.127 >> >> $stat >> K=5 >> 1470.633 >> >> >> >> 2016-12-05 10:10 GMT-05:00 Thibaut Jombart : >> >>> Dear Felipe, >>> >>> this is always a hard question, as different methods essentially do.. >>> different things. The K-means in find.clusters optimizes the variance >>> between groups, while BAPS maximizes a likelihood function under a >>> given population genetics model. 
So it may be the case that you have >>> ~17 demes roughly at HWE, but that only 4-5 groups are optimum in >>> terms of clearly delineated groups. And this is assuming both methods >>> are 'right'. They may be prone to all sorts of biases. Namely, largely >>> different group variances for the K-means, and deviations from the >>> original model in BAPS. >>> >>> Feel free to post the image (or a link to it) of the BIC for >>> find.clusters if you want a 2-cents advice on the number of K to look >>> at. >>> >>> Best >>> Thibaut >>> >>> -- >>> Dr Thibaut Jombart >>> Lecturer, Department of Infectious Disease Epidemiology, Imperial >>> College London >>> Head of RECON: repidemicsconsortium.org >>> sites.google.com/site/thibautjombart/ >>> github.com/thibautjombart >>> Twitter: @TeebzR >>> >>> >>> On 5 December 2016 at 14:29, Felipe Hern?ndez wrote: >>> > Good morning, >>> > >>> > I wonder if you may guide me with this question (that may be pretty >>> basic >>> > surely). After a run DaPC analysis using adegenet, I'm usually getting >>> K >>> > between 4 and 5 for my dataset (480 hogs, 59 microsats, 39 sampling >>> sites). >>> > Maximum number of clusters tried are 40. Afterwards, I tried to >>> estimate >>> > number of clusters (spatial clustering by individuals) using another >>> > software (BAPS 6.0), but I got an even higher number of estimated >>> cluster >>> > (K=17), after testing different maximum number of K's (i.e., K=5 >>> through >>> > K=20). Any clue about what's the reason of this? Maybe related to the >>> > maximum number of cluster tested? Or, linkage disequilibrium between >>> some >>> > loci? Sorry if the question is really basic, but I would appreciate any >>> > advice. >>> > >>> > Regards, >>> > Felipe >>> > >>> > -- >>> > Felipe Hern?ndez >>> > M?dico Veterinario (DVM), MSc. >>> > PhD. 
Candidate >>> > Interdisciplinary Ecology Program >>> > School of Natural Resources and Environment >>> > Wildlife Ecology and Conservation Department >>> > University of Florida >>> > >>> > _______________________________________________ >>> > adegenet-forum mailing list >>> > adegenet-forum at lists.r-forge.r-project.org >>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo >>> /adegenet-forum >>> >> >> >> >> -- >> Felipe Hern?ndez >> M?dico Veterinario (DVM), MSc. >> PhD. Candidate >> Interdisciplinary Ecology Program >> School of Natural Resources and Environment >> Wildlife Ecology and Conservation Department >> University of Florida >> > > -- Felipe Hern?ndez M?dico Veterinario (DVM), MSc. PhD. Candidate Interdisciplinary Ecology Program School of Natural Resources and Environment Wildlife Ecology and Conservation Department University of Florida -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibautjombart at gmail.com Tue Dec 6 18:06:35 2016 From: thibautjombart at gmail.com (Thibaut Jombart) Date: Tue, 6 Dec 2016 17:06:35 +0000 Subject: [adegenet-forum] DaPC vs. BAPS results question In-Reply-To: References: Message-ID: Not really. In situation like this as in most cases, there is no true K - only some clustering solutions are a more efficient caricature of the data than others. In this case, K=2, 3, ... 10 are all equivalently good caricatures. Cheers Thibaut -- Dr Thibaut Jombart Lecturer, Department of Infectious Disease Epidemiology, Imperial College London Head of RECON: repidemicsconsortium.org sites.google.com/site/thibautjombart/ github.com/thibautjombart Twitter: @TeebzR On 6 December 2016 at 16:59, Felipe Hern?ndez wrote: > Ok, thanks! So just putting attention in the lower k-mean value doesn't > relate to the more likely number of clusters at the end? 
Ultimately, may > K=5 be considered as the most probable number of genetic clusters explained > by my dataset, or should I consider other factors too? I tried your > suggestions and see what I can get. Thanks! > > Best, > > 2016-12-06 11:13 GMT-05:00 Thibaut Jombart : > >> Hello, >> >> the results will be a bit more stable if you increase the number of >> starting points for the k-means (see arg. n.start). >> >> It should not really impact the outcome though: here, any K from 2 to 12 >> is an equally good solution, at least as judged by the BIC. >> >> Cheers >> Thibaut >> >> >> -- >> Dr Thibaut Jombart >> Lecturer, Department of Infectious Disease Epidemiology, Imperial >> College London >> Head of RECON: repidemicsconsortium.org >> sites.google.com/site/thibautjombart/ >> github.com/thibautjombart >> Twitter: @TeebzR >> >> On 6 December 2016 at 15:17, Felipe Hern?ndez wrote: >> >>> Thanks Thibaut, >>> >>> Here you have the image and values for each estimated K. Any advice is >>> more than welcome, thanks! >>> >>> Best, >>> Felipe >>> >>> > grp >>> $Kstat >>> K=1 K=2 K=3 K=4 K=5 K=6 K=7 K=8 >>> 1494.756 1481.467 1473.864 1472.002 1470.633 1472.970 1470.754 1472.011 >>> K=9 K=10 K=11 K=12 K=13 K=14 K=15 K=16 >>> 1471.813 1473.632 1473.924 1476.759 1476.699 1475.433 1479.546 1481.119 >>> K=17 K=18 K=19 K=20 K=21 K=22 K=23 K=24 >>> 1481.292 1485.865 1488.130 1488.356 1493.552 1494.979 1501.182 1499.258 >>> K=25 K=26 K=27 K=28 K=29 K=30 K=31 K=32 >>> 1500.146 1504.113 1511.598 1511.550 1513.889 1516.275 1522.144 1524.733 >>> K=33 K=34 K=35 K=36 K=37 K=38 K=39 K=40 >>> 1528.089 1530.409 1535.778 1538.049 1541.269 1546.197 1547.656 1552.127 >>> >>> $stat >>> K=5 >>> 1470.633 >>> >>> >>> >>> 2016-12-05 10:10 GMT-05:00 Thibaut Jombart : >>> >>>> Dear Felipe, >>>> >>>> this is always a hard question, as different methods essentially do.. >>>> different things. 
The K-means in find.clusters optimizes the variance >>>> between groups, while BAPS maximizes a likelihood function under a >>>> given population genetics model. So it may be the case that you have >>>> ~17 demes roughly at HWE, but that only 4-5 groups are optimum in >>>> terms of clearly delineated groups. And this is assuming both methods >>>> are 'right'. They may be prone to all sorts of biases. Namely, largely >>>> different group variances for the K-means, and deviations from the >>>> original model in BAPS. >>>> >>>> Feel free to post the image (or a link to it) of the BIC for >>>> find.clusters if you want a 2-cents advice on the number of K to look >>>> at. >>>> >>>> Best >>>> Thibaut >>>> >>>> -- >>>> Dr Thibaut Jombart >>>> Lecturer, Department of Infectious Disease Epidemiology, Imperial >>>> College London >>>> Head of RECON: repidemicsconsortium.org >>>> sites.google.com/site/thibautjombart/ >>>> github.com/thibautjombart >>>> Twitter: @TeebzR >>>> >>>> >>>> On 5 December 2016 at 14:29, Felipe Hern?ndez wrote: >>>> > Good morning, >>>> > >>>> > I wonder if you may guide me with this question (that may be pretty >>>> basic >>>> > surely). After a run DaPC analysis using adegenet, I'm usually >>>> getting K >>>> > between 4 and 5 for my dataset (480 hogs, 59 microsats, 39 sampling >>>> sites). >>>> > Maximum number of clusters tried are 40. Afterwards, I tried to >>>> estimate >>>> > number of clusters (spatial clustering by individuals) using another >>>> > software (BAPS 6.0), but I got an even higher number of estimated >>>> cluster >>>> > (K=17), after testing different maximum number of K's (i.e., K=5 >>>> through >>>> > K=20). Any clue about what's the reason of this? Maybe related to the >>>> > maximum number of cluster tested? Or, linkage disequilibrium between >>>> some >>>> > loci? Sorry if the question is really basic, but I would appreciate >>>> any >>>> > advice. 
>>>> > >>>> > Regards, >>>> > Felipe >>>> > >>>> > -- >>>> > Felipe Hern?ndez >>>> > M?dico Veterinario (DVM), MSc. >>>> > PhD. Candidate >>>> > Interdisciplinary Ecology Program >>>> > School of Natural Resources and Environment >>>> > Wildlife Ecology and Conservation Department >>>> > University of Florida >>>> > >>>> > _______________________________________________ >>>> > adegenet-forum mailing list >>>> > adegenet-forum at lists.r-forge.r-project.org >>>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo >>>> /adegenet-forum >>>> >>> >>> >>> >>> -- >>> Felipe Hern?ndez >>> M?dico Veterinario (DVM), MSc. >>> PhD. Candidate >>> Interdisciplinary Ecology Program >>> School of Natural Resources and Environment >>> Wildlife Ecology and Conservation Department >>> University of Florida >>> >> >> > > > -- > Felipe Hern?ndez > M?dico Veterinario (DVM), MSc. > PhD. Candidate > Interdisciplinary Ecology Program > School of Natural Resources and Environment > Wildlife Ecology and Conservation Department > University of Florida > -------------- next part -------------- An HTML attachment was scrubbed... URL: From biz.sheedy at gmail.com Wed Dec 7 02:00:27 2016 From: biz.sheedy at gmail.com (Biz Sheedy) Date: Wed, 7 Dec 2016 10:00:27 +0900 Subject: [adegenet-forum] Discrepancy in NA counts In-Reply-To: References: <1051082989.570757.1480318267345.JavaMail.zimbra@biolitika.si> <656891479.571300.1480323819198.JavaMail.zimbra@biolitika.si> <237564767.572607.1480336824232.JavaMail.zimbra@biolitika.si> Message-ID: Hi Thibaut and Roman, Yes, the fix has solved the issue for me. Thanks so much to you both! (Sorry for the delayed response, I couldn't get the devel version at work.) Cheers, Elizabeth On 5 December 2016 at 21:10, Thibaut Jombart wrote: > Hello, > > Roman has fixed this bug in the current devel version of adegenet. See: > https://github.com/thibautjombart/adegenet > > For guidelines on installing it. Can you confirm it solves your issue? 
> > Best > Thibaut > > -- > Dr Thibaut Jombart > Lecturer, Department of Infectious Disease Epidemiology, Imperial College > London > Head of RECON: repidemicsconsortium.org > sites.google.com/site/thibautjombart/ > github.com/thibautjombart > Twitter: @TeebzR > > > On 28 November 2016 at 12:40, Roman Lu?trik > wrote: > > Hi Elizabeth, > > > > it would appear there is something funky happening with the code due to > > locus names being numeric. This has happened before in some other > function. > > Until we fix this, you can change your locus names so that they start > with a > > letter. > > > > Here is the excerpt from the genind object indicating that these two > samples > > have alleles 33: > > > > X1401_25.13 X1401_25.33 X1403_13.11 X1403_13.13 X1403_13.33 > > X1404_17.13 X1404_17.33 X1404_17.11 > > C_KH1059 0 1 1 0 0 0 1 0 > > M_KH1834 0 1 1 0 0 1 0 0 > > > > > > Cheers, > > Roman > > > > > > ---- > > In god we trust, all others bring data. > > > > ________________________________ > > From: "Biz Sheedy" > > To: "Roman Lu?trik" > > Cc: adegenet-forum at lists.r-forge.r-project.org > > Sent: Monday, November 28, 2016 11:00:53 AM > > > > Subject: Re: [adegenet-forum] Discrepancy in NA counts > > > > Thanks for looking into this. > > > > Something that I did differently to the code you provided, was that I > only > > answered the prompts for the read.structure function. This meant I did > not > > use sep="\t" and the number of alleles was 62 instead of 72, which I > think > > should be comparable to the excel count. Following the code you provide, > > 'is.na' finds 23 NAs (instead of 20 NAs at 62 alleles and 16 zeroes in > > excel). > > > > Your explanation makes sense to me for the additional three NAs in > adegenet, > > but I still don't understand how in locus 1401_25 the data for two > > individuals (C_KH1059 and M_KH1834) changed from being homozygous for > "3" to > > being "NA"? > > > > I would really appreciate any further help on this. 
> > > > Thanks again, > > Elizabeth > > > > > > On 28 November 2016 at 18:03, Roman Lu?trik > > wrote: > >> > >> Hi, > >> > >> I think the problem is that adegenet, for consistency, adds NAs to > >> accommodate the extra alleles present for a particular locus. Take for > >> example C_KH1238 (bottom row in the example pasted belo). > >> In raw file, it has missing values for locus 1378_53, but this locus has > >> three alleles, ergo 3 NAs and not 2. Can't go through all the NAs right > now, > >> but I think there's a pretty good chance this is what is causing the > >> discrepancy between what you see in "excel" and in adegenet. > >> > >> 1369_41.11 1372_14.22 1372_14.24 1373_9.44 1373_9.24 1377_42.44 > 1377_42.24 > >> 1378_53.22 1378_53.24 1378_53.44 1379_10.33 1379_10.13 1382_37.33 > >> ... > >> C_KH1238 0 1 0 1 0 1 0 NA NA NA 1 0 1 # notice 3 NAs for all available > >> alleles for 1378_53, not just two (as expected for diploid) > >> > >> > >> Here is the code I used to explore this: > >> > >> library(adegenet) > >> > >> xy <- read.table("Sub_batch_1.stru", header = TRUE, sep = "\t") > >> xy <- xy[, c(-1, -2)] > >> table(as.matrix(xy)) > >> > >> # 0 1 2 3 4 > >> # 16 467 618 760 867 > >> > >> > >> xy <- read.structure("Sub_batch_1.stru", NA.char="0", > >> n.ind = 44, n.loc = 31, onerowperind = FALSE, > >> col.lab = 1, col.pop = 2, row.marknames = 1, > >> sep = "\t", col.others = 0) > >> > >> xy <- tab(xy) > >> xy[grepl("C_KH1238", rownames(xy)), grepl("1378_53", colnames(xy))] > >> > >> Cheers, > >> Roman > >> > >> ---- > >> In god we trust, all others bring data. > >> > >> ________________________________ > >> From: "Biz Sheedy" > >> To: "Roman Lu?trik" > >> Sent: Monday, November 28, 2016 9:11:39 AM > >> Subject: Re: [adegenet-forum] Discrepancy in NA counts > >> > >> My apologies. First time posting to a forum so I am a little unsure of > >> things. I have attached a subset of the data, which includes the locus > that > >> I saw had problems. 
> >> > >> In this case there are 31 loci with 16 zeroes counted (excel), and 20 NAs > >> counted (adegenet). The additional NAs occur in locus 1401_25. > >> > >> Thanks so much, > >> Elizabeth > >> > >> On 28 November 2016 at 16:31, Roman Luštrik wrote: > >>> > >>> Hi, > >>> > >>> can you share a (subset) of the dataset? It's hard to pinpoint where > >>> things might be going wrong without some data in hand. > >>> > >>> Cheers, > >>> Roman > >>> > >>> ---- > >>> In god we trust, all others bring data. > >>> > >>> ________________________________ > >>> From: "Biz Sheedy" > >>> To: adegenet-forum at lists.r-forge.r-project.org > >>> Sent: Friday, November 25, 2016 10:44:16 AM > >>> Subject: [adegenet-forum] Discrepancy in NA counts > >>> > >>> Dear All, > >>> > >>> I am trying to read SNP data from Stacks into adegenet. I have tried > >>> read.structure and read.genepop but they both give (the same) NA counts that > >>> are higher than expected. Using read.table on the structure-formatted file > >>> (with "ind" and "pop" inserted into the first two columns of row one) gave > >>> the expected number of missing data. > >>> > >>> I looked at a single population subset (both the original and the > >>> converted data) in excel and found a locus where in the original data, all > >>> nine individuals were "3", but in the converted data one individual was > >>> "NA". The loci before and after this one both matched/were correct. > >>> > >>> I am not sure what I have missed for this to happen; my R skills are > >>> beginner at best. Any help with reading the data in correctly would be > >>> greatly appreciated! > >>> > >>> Thank you, > >>> Elizabeth > >>> > >>> > >>> R version 3.3.2 > >>> adegenet version 2.0.1 > >>> > >>> Data: 44 individuals, diploid, 4279 loci.
> >>> > >>> all<-read.structure("all_batch_1.stru", NA.char="0") > >>> > >>> Total cells in excel: 376552 > >>> After read.structure/genepop: 44*8558=376552 > >>> > >>> 0s in excel: 3952 > >>> 0s after read.table; length(which(X==0)): 3952 > >>> NA after read.structure/genepop; sum(is.na(all$tab)): 4008 > >>> Difference: 56 > >>> > >>> Subset Chichi > >>> Total cells: 77022 > >>> After read.structure/genepop: 9*8558=77022 > >>> > >>> 0s in excel: 742 > >>> NA after read.structure/genepop; sum(is.na(chi$tab)): 756 > >>> Difference: 14 > >>> > >>> > >>> > >>> -- > >>> 4-1-1 Amakubo > >>> Department of Botany > >>> National Museum of Nature and Science > >>> Tsukuba, Ibaraki 305-0005 > >>> Japan > >>> > >>> biz.sheedy at gmail.com > >>> > >>> _______________________________________________ > >>> adegenet-forum mailing list > >>> adegenet-forum at lists.r-forge.r-project.org > >>> > >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum > >> > >> > >> > >> > >> -- > >> 4-1-1 Amakubo > >> Department of Botany > >> National Museum of Nature and Science > >> Tsukuba, Ibaraki 305-0005 > >> Japan > >> > >> biz.sheedy at gmail.com > >> > > > > > > > > -- > > 4-1-1 Amakubo > > Department of Botany > > National Museum of Nature and Science > > Tsukuba, Ibaraki 305-0005 > > Japan > > > > biz.sheedy at gmail.com > > > > > > _______________________________________________ > > adegenet-forum mailing list > > adegenet-forum at lists.r-forge.r-project.org > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum > -- Dr Elizabeth Sheedy JSPS Postdoctoral fellow 4-1-1 Amakubo Department of Botany National Museum of Nature and Science Tsukuba, Ibaraki 305-0005 Japan biz.sheedy at gmail.com -------------- next part -------------- An HTML attachment was scrubbed...
URL: From thibautjombart at gmail.com Wed Dec 7 11:50:35 2016 From: thibautjombart at gmail.com (Thibaut Jombart) Date: Wed, 7 Dec 2016 10:50:35 +0000 Subject: [adegenet-forum] Discrepancy in NA counts In-Reply-To: References: <1051082989.570757.1480318267345.JavaMail.zimbra@biolitika.si> <656891479.571300.1480323819198.JavaMail.zimbra@biolitika.si> <237564767.572607.1480336824232.JavaMail.zimbra@biolitika.si> Message-ID: Awesome. Thanks for the report, and many thanks to Roman for solving the issue! Best Thibaut -- Dr Thibaut Jombart Lecturer, Department of Infectious Disease Epidemiology, Imperial College London Head of RECON: repidemicsconsortium.org sites.google.com/site/thibautjombart/ github.com/thibautjombart Twitter: @TeebzR On 7 December 2016 at 01:00, Biz Sheedy wrote: > Hi Thibaut and Roman, > > Yes, the fix has solved the issue for me. Thanks so much to you both! > > (Sorry for the delayed response, I couldn't get the devel version at work.) > > Cheers, > Elizabeth > > > On 5 December 2016 at 21:10, Thibaut Jombart wrote: > >> Hello, >> >> Roman has fixed this bug in the current devel version of adegenet. See: >> https://github.com/thibautjombart/adegenet >> >> For guidelines on installing it. Can you confirm it solves your issue? >> >> Best >> Thibaut >> >> -- >> Dr Thibaut Jombart >> Lecturer, Department of Infectious Disease Epidemiology, Imperial College London >> Head of RECON: repidemicsconsortium.org >> sites.google.com/site/thibautjombart/ >> github.com/thibautjombart >> Twitter: @TeebzR >> >> >> On 28 November 2016 at 12:40, Roman Luštrik wrote: >> > Hi Elizabeth, >> > >> > it would appear there is something funky happening with the code due to >> > locus names being numeric. This has happened before in some other function. >> > Until we fix this, you can change your locus names so that they start with a >> > letter.
URL: From fhernandeu at uc.cl Tue Dec 6 16:17:20 2016 From: fhernandeu at uc.cl (Felipe Hernández) Date: Tue, 06 Dec 2016 15:17:20 -0000 Subject: [adegenet-forum] DaPC vs. BAPS results question In-Reply-To: References: Message-ID: Thanks Thibaut, Here are the image and the values for each K tested. Any advice is more than welcome, thanks! Best, Felipe
> grp
$Kstat
     K=1      K=2      K=3      K=4      K=5      K=6      K=7      K=8
1494.756 1481.467 1473.864 1472.002 1470.633 1472.970 1470.754 1472.011
     K=9     K=10     K=11     K=12     K=13     K=14     K=15     K=16
1471.813 1473.632 1473.924 1476.759 1476.699 1475.433 1479.546 1481.119
    K=17     K=18     K=19     K=20     K=21     K=22     K=23     K=24
1481.292 1485.865 1488.130 1488.356 1493.552 1494.979 1501.182 1499.258
    K=25     K=26     K=27     K=28     K=29     K=30     K=31     K=32
1500.146 1504.113 1511.598 1511.550 1513.889 1516.275 1522.144 1524.733
    K=33     K=34     K=35     K=36     K=37     K=38     K=39     K=40
1528.089 1530.409 1535.778 1538.049 1541.269 1546.197 1547.656 1552.127

$stat
     K=5
1470.633

2016-12-05 10:10 GMT-05:00 Thibaut Jombart : > Dear Felipe, > > this is always a hard question, as different methods essentially do.. > different things. The K-means in find.clusters optimizes the variance > between groups, while BAPS maximizes a likelihood function under a > given population genetics model. So it may be the case that you have > ~17 demes roughly at HWE, but that only 4-5 groups are optimum in > terms of clearly delineated groups. And this is assuming both methods > are 'right'. They may be prone to all sorts of biases. Namely, largely > different group variances for the K-means, and deviations from the > original model in BAPS. > > Feel free to post the image (or a link to it) of the BIC for > find.clusters if you want a 2-cents advice on the number of K to look > at.
> > Best > Thibaut > > -- > Dr Thibaut Jombart > Lecturer, Department of Infectious Disease Epidemiology, Imperial College London > Head of RECON: repidemicsconsortium.org > sites.google.com/site/thibautjombart/ > github.com/thibautjombart > Twitter: @TeebzR > > > On 5 December 2016 at 14:29, Felipe Hernández wrote: > > Good morning, > > > > I wonder if you could guide me with this question (which is surely pretty > > basic). After running a DAPC analysis using adegenet, I'm usually getting K > > between 4 and 5 for my dataset (480 hogs, 59 microsats, 39 sampling sites). > > The maximum number of clusters tried was 40. Afterwards, I tried to estimate > > the number of clusters (spatial clustering of individuals) using other > > software (BAPS 6.0), but I got an even higher number of estimated clusters > > (K=17), after testing different maximum numbers of K (i.e., K=5 through > > K=20). Any clue about the reason for this? Maybe it is related to the > > maximum number of clusters tested? Or to linkage disequilibrium between some > > loci? Sorry if the question is really basic, but I would appreciate any > > advice. > > > > Regards, > > Felipe > > > > -- > > Felipe Hernández > > Médico Veterinario (DVM), MSc. > > PhD. Candidate > > Interdisciplinary Ecology Program > > School of Natural Resources and Environment > > Wildlife Ecology and Conservation Department > > University of Florida > > > > _______________________________________________ > > adegenet-forum mailing list > > adegenet-forum at lists.r-forge.r-project.org > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum > -- Felipe Hernández Médico Veterinario (DVM), MSc. PhD. Candidate Interdisciplinary Ecology Program School of Natural Resources and Environment Wildlife Ecology and Conservation Department University of Florida -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: Hogs_BICvsNclusters.jpeg Type: image/jpeg Size: 33555 bytes Desc: not available URL: From ohen at aqua.dtu.dk Wed Dec 14 15:07:41 2016 From: ohen at aqua.dtu.dk (Ole Henriksen) Date: Wed, 14 Dec 2016 14:07:41 -0000 Subject: [adegenet-forum] Difference in assignment between versions of adegenet (DAPC) Message-ID: <7D7675C38641E241926881C240902CDA0C1E9C49@ait-pex02mbx05.win.dtu.dk>
Hi Thibaut and Co,

We're a team who have used adegenet's DAPC assignment method (versions 1.4-1 and 1.4-2) in some earlier studies. We are now encountering problems using the assignment method. The problem is that the new version, adegenet 2.0.1, assigns "old individuals", which we used in earlier studies, differently compared to assignments with earlier versions of the package.

We use SNP data, and our gen-files look as shown below. Alleles are coded with three digits. See the example below:
______________________________________________
GenePop file, with 5 samples & 96 loci
cgpGmo-S1017
cgpGmo-S1018a
cgpGmo-S1026
cgpGmo-S1070
cgpGmo-S1095
cgpGmo-S1103
POP
DAB08_01 , 001001 002002 001002 002001 001001 001001
DAB08_02 , 001001 002002 001001 002002 001001 002002
DAB08_03 , 001001 002002 001001 002002 001001 002001
POP
INC02_01 , 001001 002002 001002 002002 001001 002001
INC02_02 , 001001 002002 002002 002002 001001 002002
INC02_03 , 001001 002002 001002 002002 001001 002001
__________________________________________

We have two issues:

1) Last year we assigned individuals using adegenet version 1.4-1. We suspected that it must be something with how the files are read, and we wanted to check and compare with older versions (1.4-1 and 1.4-2). We've tried to use older versions with install_version() to make the comparison between versions (1.4-1, 1.4-2 and 2.0.1), but we keep getting the following error message when using older versions:
___________________________________________
Converting data from a Genepop .gen file to a genind object...
File description: GenePop file, with 5 samples & 96 loci
Error in while (keepCheck) { : missing value where TRUE/FALSE needed
____________________________________________________________

We do not understand why we get this error message when we use exactly the same files as we have always used. Any idea?

2) When we use the newest version, we get a different assignment result compared to assignments with earlier versions of the package. I have my previous assignment results for assigned individuals (1.4-1 and 1.4-2). I reassigned these individuals with the new package (2.0.1). I then compared the assignments between package versions and they are different, even though we retain the same number of PCs, use the same reference file and use the same script, with some minor corrections for reading files to accommodate the new version. Any idea why this is the case? Are there any changes to how each locus and allele are read from version to version?

I have also noticed that assignments with adegenet 2.0.1 differ depending on which individuals I include in a gen-file for assignment. When I assign all my individuals from all years in one file, it gives a different assignment result than when I assign separate files divided up by year. Could it be that the positioning of alleles at each locus has changed?

We are not sure what is going wrong, but we suspect that it is something with the reading of our files. Below is some R history, which hopefully
might be helpful:

R-script:
______________________________________________
library(adegenet)

#Reading files
Ref <- read.genepop("Ref.gen", ncode = 3)
Assign <- read.genepop("TBA_All.gen", ncode = 3)

#DAPC
DAPC_Ref<-dapc(Ref, pop(Ref), n.pca=100, n.da=3)

#Assignment
Predict=predict.dapc(DAPC_Ref, newdata=Assign)
Predict$assign

Genind objects after read.genepop():
___________________________________
> Reference
/// GENIND OBJECT /////////
 // 487 individuals; 96 loci; 192 alleles; size: 451.5 Kb
 // Basic content
   @tab: 487 x 192 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-2)
   @loc.fac: locus factor for the 192 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual (range: 2-2)
   @type: codom
   @call: read.genepop(file = "Ref.gen", ncode = 3)
 // Optional content
   @pop: population of each individual (group size range: 62-215)

> AssignAll #All individuals for all years
/// GENIND OBJECT /////////
 // 1,357 individuals; 96 loci; 192 alleles; size: 1.1 Mb
 // Basic content
   @tab: 1357 x 192 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-2)
   @loc.fac: locus factor for the 192 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual (range: 2-2)
   @type: codom
   @call: read.genepop(file = "TBA_All.gen", ncode = 3)
 // Optional content
   @pop: population of each individual (group size range: 1357-1357)

> Assign2015 #individuals for year 2015 only
/// GENIND OBJECT /////////
 // 469 individuals; 96 loci; 192 alleles; size: 434.2 Kb
 // Basic content
   @tab: 469 x 192 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-2)
   @loc.fac: locus factor for the 192 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual (range: 2-2)
   @type: codom
   @call: read.genepop(file = "TBA_Fisk2015.gen", ncode = 3)
 // Optional content
   @pop: population of each individual (group size range: 469-469)

Assignment results showing different
assignments depending on which individuals one includes in an input file (gen-file) for assignment, after predict.dapc():
_______________________________________________________
> Predict$assign #All individuals for all years
 [1] TAS10_30 TAS10_30 TAS10_30 TAS10_30 UMM45_39 UMM45_39
 [7] UMM45_39 UMM45_39 TAS10_30 TAS10_30 UMM45_39 UMM45_39
[13] ISC02_39 UMM45_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39
[19] UMM45_39 QOR08_30 ISC02_39 UMM45_39 TAS10_30 UMM45_39
[25] QOR08_30 QOR08_30 UMM45_39 QOR08_30 QOR08_30 UMM45_39
[31] UMM45_39 UMM45_39 QOR08_30 UMM45_39 UMM45_39 ISC02_39
[37] ISC02_39 UMM45_39 UMM45_39 QOR08_30 UMM45_39 QOR08_30
[43] UMM45_39 UMM45_39 UMM45_39 UMM45_39 QOR08_30 UMM45_39
etc.

> Predict$assign #individuals for year 2015 only
 [1] TAS10_30 TAS10_30 TAS10_30 TAS10_30 TAS10_30 TAS10_30
 [7] TAS10_30 TAS10_30 TAS10_30 TAS10_30 UMM45_39 UMM45_39
[13] UMM45_39 UMM45_39 UMM45_39 UMM45_39 UMM45_39 UMM45_39
[19] UMM45_39 UMM45_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39
[25] ISC02_39 ISC02_39 TAS10_30 ISC02_39 ISC02_39 ISC02_39
[31] ISC02_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39 TAS10_30
[37] ISC02_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39 TAS10_30
[43] ISC02_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39 ISC02_39
etc.

Thank you,
Sincerely,
Ole and team
-------------- next part -------------- An HTML attachment was scrubbed... URL:
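A minimal base-R sketch of two checks suggested by Ole's own questions (allele positioning, and results changing with the individuals in the file). It rests on assumptions that were not confirmed in this thread: that new individuals are projected onto the DAPC using the allele columns by position, so the reference and the new data need the same columns in the same order; and that missing values in the new data may be filled from that file's own column means, which would make results depend on which individuals share the file. The function `align_to_reference()` is hypothetical and not part of adegenet; it works on plain allele-count matrices such as those returned by `tab()`, reorders the new table to the reference columns, treats alleles never seen in the new file as absent, and fills missing entries with the reference means.

```r
# Hypothetical helper, not part of adegenet: align an allele table for new
# individuals to the columns of a reference table (e.g. both from tab()).
align_to_reference <- function(new_tab, ref_tab) {
  ref_cols <- colnames(ref_tab)
  new_cols <- colnames(new_tab)

  # Alleles present in the new file but unknown to the reference are dropped.
  extra <- setdiff(new_cols, ref_cols)
  if (length(extra) > 0) {
    warning("Dropping alleles absent from the reference: ",
            paste(extra, collapse = ", "))
  }

  # Start from zeros in reference column order, so alleles never observed
  # in the new file are treated as absent (an assumption of this sketch).
  out <- matrix(0, nrow = nrow(new_tab), ncol = length(ref_cols),
                dimnames = list(rownames(new_tab), ref_cols))
  shared <- intersect(ref_cols, new_cols)
  out[, shared] <- new_tab[, shared, drop = FALSE]

  # Replace missing values with the reference column means, so imputed
  # values do not depend on which other individuals are in the new file.
  ref_means <- colMeans(ref_tab, na.rm = TRUE)
  na_pos <- which(is.na(out), arr.ind = TRUE)
  out[na_pos] <- ref_means[na_pos[, "col"]]
  out
}

# Toy example: two biallelic loci, columns deliberately in a different order.
ref_tab <- rbind(c(2, 0, 1, 1),
                 c(0, 2, 2, 0))
colnames(ref_tab) <- c("L1.001", "L1.002", "L2.001", "L2.002")

new_tab <- rbind(c(0, 2, NA, NA))
colnames(new_tab) <- c("L1.002", "L1.001", "L2.002", "L2.001")

aligned <- align_to_reference(new_tab, ref_tab)
print(aligned)  # row is 2, 0, 1.5, 0.5 in reference column order
```

A quicker first check of the column-order idea on the real data is identical(colnames(tab(Ref)), colnames(tab(Assign))); if it returns FALSE, the two objects do not share the same allele columns in the same order.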