From jrichardson4 at gmail.com Tue Nov 3 17:43:45 2015
From: jrichardson4 at gmail.com (Jonathan Richardson)
Date: Tue, 3 Nov 2015 11:43:45 -0500
Subject: [adegenet-forum] MANOVA significant testing with DAPC
Message-ID:

Hi Thibaut and Adegenet users,

I have a follow-up question to one I asked back in March 2012. I have more data to appreciate what you were suggesting then (original correspondence pasted below). In short, we would like to test whether "groups" of genotypes are significantly separated in discriminant function space. We proposed using a MANOVA of the individual coordinates coming from DAPC to do this. Now that I've tested another 2 datasets, Thibaut was correct that these usually come out significant regardless of actual clustering patterns in DF space. The original code looked like this:

model <- manova(dapcobject$ind.coord ~ genindobject$pop)
summary(model, test = "Wilks")

But you mentioned that a MANOVA could be done on the retained PCs after the PCA step - the more traditional test with discriminant analysis. After trying to apply this with our new datasets, we are hoping to clarify 2 things:

1. To execute this, do you mean to use the $tab item in the dapc output (i.e. "retained PCs of PCA"), in place of the $ind.coord item? Or did you mean a step earlier, in the find.clusters PC retention step?

2. If you meant the dapc step, the structure of the $tab data appears to make it much more difficult to pull into a MANOVA analysis (i.e., it is a data frame with 1 observation per genotype, and # of variables equal to PCs retained). The $ind.coord data is numeric with (not surprisingly) 2 values per genotype relating to the location in DF space.

I'm hoping you can confirm question 1 before I spend too much more time figuring out the data formatting issue in #2.

I should also say thank you for your time and efforts developing and supporting Adegenet; I am finding it more useful through the years. Thank you very much!
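The circularity described above can be reproduced with data that contain no structure at all. The base-R sketch below is an illustration under stated assumptions, not adegenet's DAPC: genotypes are random 0/1/2 counts, the three groups are arbitrary labels, `prcomp()` stands in for the PCA step, `MASS::lda()` stands in for the discriminant step, and 30 retained PCs is an arbitrary choice. When all g - 1 discriminant functions are kept, they carry all of the between-group variation in the retained PCs, so both MANOVAs return the same Wilks' lambda; only the degrees of freedom differ, and the test on the discriminant coordinates ignores that those axes were chosen to maximize separation.

```r
# Hypothetical illustration: the data below contain no real group structure.
library(MASS)  # lda(); MASS ships with standard R installations

set.seed(1)
n_ind  <- 60
n_loci <- 200
n_pca  <- 30   # number of retained PCs (arbitrary for this sketch)

# Random genotypes coded as allele counts (0, 1, 2) and arbitrary groups
geno <- matrix(sample(0:2, n_ind * n_loci, replace = TRUE), nrow = n_ind)
grp  <- factor(rep(c("A", "B", "C"), each = n_ind / 3))

# PCA step: scores on the first n_pca principal components
pcs <- prcomp(geno)$x[, seq_len(n_pca)]

# Discriminant step on the retained PCs; keep the discriminant coordinates
ld <- predict(lda(pcs, grouping = grp))$x

# Wilks' lambda and its p-value for a MANOVA of 'resp' on groups 'g'
wilks <- function(resp, g) {
  summary(manova(resp ~ g), test = "Wilks")$stats[1, c("Wilks", "Pr(>F)")]
}

circular <- wilks(ld, grp)   # MANOVA on discriminant coordinates (circular)
honest   <- wilks(pcs, grp)  # MANOVA on the retained PCs

print(rbind(circular = circular, retained_pcs = honest))
```

The first row typically shows a tiny p-value for pure noise, while the second row gives the test with the correct number of response variables.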
- Jon
_______________________
Archived emails:

3 March 2012:
Hello again Dr. Jombart and Adegenet users,

I have a follow-up question related to the grouping of individuals not using k-means. We would like to test whether the group assignment (assigned by us) is significantly related to the location of individuals in the discriminant function (DF) space. To do this we have taken the following approach:

1. Perform a MANOVA on the individual DF coordinates with group class as the predictor variable. The idea here is that (A) the Wilks' lambda test provides a metric of separation among the groups and (B) accounts for correlation among variables (DFs). The test code is:

model <- manova(dapcobject$ind.coord ~ genindobject$pop)
summary(model, test = "Wilks")

2. However, we are worried that the significance value obtained by MANOVA (which was remarkably small) might be anti-conservative (i.e. high Type-I error) because DAPC has already maximized among group variation and uncovered structure that might be evident even in random datasets. Therefore, we came up with a randomization test. We first create a null DF distribution by randomizing the rows/individuals in the "genind" data object so that the number of individuals per group remains the same, but the individuals contained in each group are now randomized. We do this 1000 times and perform the DAPC and MANOVA operations on all 1000 sets to obtain the randomized distribution. Lastly, we compare our empirical Wilks' lambda value with the randomized distribution to determine if our Wilks is larger than expected based on random chance.

Does this seem reasonable? Our hesitation is related to some initial results from our dataset. When we run the empirical dataset with 3 defined groups, the DAPC produces 3 clear clusters with some small overlap (i.e. the 3 a priori groups segregate very nicely in DF space).
However, when we randomized the alleles and genotypes, the resulting DAPC with the same group sizes also results in 3 clear clusters, but with noticeably more ellipse overlap than the empirical data. So we are wondering whether the a priori group designation (related to a substantial habitat and phenotypic difference in our case) will mandate some level of clustering, but with DAPC also looking to optimize group segregation in DF space the patterns become clearer and maybe somewhat spurious (at least in our case)?

Any insight you can provide would be greatly appreciated. Thank you in advance.
Jon
_________
6 March 2012:
Hello,

Yes, as you suggest the approach described in 1 is circular, and the test should nearly always be significant. The second approach is not ideal because the amount of discrimination - and therefore your test statistics - depends on the retained variation in the dimension-reduction/PCA step, which is likely to vary from one permutation to another. I would perform the MANOVA on the retained PCs after the PCA step. This should be less computer intensive, and is the traditional test associated with discriminant analysis.

Cheers
Thibaut

From charlotte.hurry at griffithuni.edu.au Wed Nov 4 23:17:11 2015
From: charlotte.hurry at griffithuni.edu.au (Charlotte Hurry)
Date: Thu, 5 Nov 2015 09:17:11 +1100
Subject: [adegenet-forum] Please remove me from the mailing list
Message-ID:

many thanks
--
Charlotte Hurry

From dan.j.buckley at gmail.com Wed Nov 4 23:24:16 2015
From: dan.j.buckley at gmail.com (Daniel Buckley)
Date: Wed, 4 Nov 2015 22:24:16 +0000
Subject: [adegenet-forum] Please remove me from the mailing list
Message-ID:

On Wed, Nov 4, 2015 at 10:17 PM, Charlotte Hurry <charlotte.hurry at griffithuni.edu.au> wrote:
> many thanks
>
> --
> Charlotte Hurry
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From 17869067 at students.latrobe.edu.au Sun Nov 1 03:22:08 2015
From: 17869067 at students.latrobe.edu.au (LAURA NICOLE WOODINGS)
Date: Sun, 1 Nov 2015 02:22:08 +0000
Subject: [adegenet-forum] xval warning
Message-ID:

Hi,

I am trying to use xval to cross-validate the number of PCs to retain for a DAPC using the command below, and I am getting the following error msg:

xval_nopopbp75_cnt75_cl96_maf5to1 <- xvalDapc(nopop_bp75_cnt75_cl96_maf5to1NoNA@tab, nopop_clust$grp)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In if (result == "overall") { ... :
  the condition has length > 1 and only the first element will be used
2: In if (result == "groupMean") { ... :
  the condition has length > 1 and only the first element will be used

Using find.clusters, 2 groups are found, but 1 group has 83 individuals and the other group has 4. Is the discrepancy in sample size causing the problem with xval? And if so, how can I determine the optimal number of PCs to keep?

I have attached my input file and the script that I use.

Also, when conducting DAPC should you remove the NAs in the data? The DAPC seems to work when the NAs are not removed, but when I run it with them removed I get different results.

Thanks for your help!
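On the NA question: Thibaut's reply in this digest states that missing data are replaced by the mean value of the variable. A base-R sketch of that column-mean replacement is below; it shows why results can differ from a run where NAs were removed or imputed another way. The matrix `x` and the helper `impute_col_means()` are hypothetical. For genind objects, adegenet 2.x appears to offer `tab(x, NA.method = "mean")` for the same purpose, but check its help page before relying on that.

```r
# Replace each NA by the mean of its column (variable), ignoring missing values.
# A column that is entirely NA would get NaN, so such columns should be dropped first.
impute_col_means <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  col_means <- colMeans(x, na.rm = TRUE)
  idx <- which(is.na(x), arr.ind = TRUE)   # row/column positions of the NAs
  x[idx] <- col_means[idx[, "col"]]
  x
}

# Hypothetical allele-count table: 4 individuals x 3 allele columns
x <- matrix(c(2,  0, NA,
              1,  1,  0,
              NA, 2,  1,
              0,  1,  1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("ind", 1:4), c("A1", "A2", "A3")))
impute_col_means(x)
```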
Cheers,
Laura

PhD Candidate
Department of Ecology, Environment and Evolution | La Trobe University | Bundoora 3086 | Australia
m: +61 408 642 006 | e: 17869067 at students.latrobe.edu.au
-------------- next part --------------
A non-text attachment was scrubbed...
Name: neutral_nopop_75bp_cnt75_cl96_2-16_ml80._maf0.05-0.01_missing0.8_hwe0.001.stru
Type: application/octet-stream
Size: 186036 bytes
Desc: neutral_nopop_75bp_cnt75_cl96_2-16_ml80._maf0.05-0.01_missing0.8_hwe0.001.stru
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Xval.R
Type: application/octet-stream
Size: 598 bytes
Desc: Xval.R
URL:

From matt_smith at fws.gov Tue Nov 3 19:05:17 2015
From: matt_smith at fws.gov (Smith, Matt)
Date: Tue, 3 Nov 2015 10:05:17 -0800
Subject: [adegenet-forum] replacing NAs
Message-ID:

Hello,

I am trying to create a genind object from a genepop file, then replace missing data with mean. Missing data is indicated by 0's in the genepop file. When I check for NAs, none are recognized. Why isn't adegenet recognizing missing data in my genepop file?

> Coho <- read.genepop("Input/20pop10loci.gen", ncode = 3L)
Converting data from a Genepop .gen file to a genind object...
File description: coho data dec 2014
...done.
> sum(is.na(Coho$tab))
[1] 0

Thank you
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20pop10loci.gen
Type: application/octet-stream
Size: 131049 bytes
Desc: not available
URL:

From matt_smith at fws.gov Tue Nov 3 19:55:29 2015
From: matt_smith at fws.gov (Smith, Matt)
Date: Tue, 3 Nov 2015 10:55:29 -0800
Subject: [adegenet-forum] replacing NAs
Message-ID:

Hello,

I am trying to create a genind object from a genepop file, then replace missing data with mean. Missing data is indicated by 0's in the genepop file.
When I check for NAs, none are recognized. Why isn't adegenet recognizing missing data in my genepop file?

> Coho <- read.genepop("Input/20pop10loci.gen", ncode = 3L)
Converting data from a Genepop .gen file to a genind object...
File description: coho data dec 2014
...done.
> sum(is.na(Coho$tab))
[1] 0

I'm new to this, so sorry for any duplicate posts. Thank you!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20pop10loci.gen
Type: application/octet-stream
Size: 131049 bytes
Desc: not available
URL:

From t.jombart at imperial.ac.uk Mon Nov 9 13:11:00 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 9 Nov 2015 12:11:00 +0000
Subject: [adegenet-forum] Please remove me from the mailing list
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12B5C14@icexch-m1.ic.ac.uk>

Dear Daniel and Charlotte, and all,

a reminder on how to unsubscribe from this forum. All you need to do is go to this page and enter your email in the 'unsubscribe' section:
http://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

The only effect of sending an email here asking to unsubscribe is, well, telling everyone here you want out, without actually removing you from the forum - awkward.

Cheers
Thibaut
________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Daniel Buckley [dan.j.buckley at gmail.com]
Sent: 04 November 2015 22:24
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] Please remove me from the mailing list

On Wed, Nov 4, 2015 at 10:17 PM, Charlotte Hurry wrote:
many thanks
--
Charlotte Hurry
_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From t.jombart at imperial.ac.uk Mon Nov 9 15:43:35 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 9 Nov 2015 14:43:35 +0000
Subject: [adegenet-forum] Very different number of clusters in different datasets.
In-Reply-To: <56335747.90706@students.mq.edu.au>
References: <56335747.90706@students.mq.edu.au>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12B5ED3@icexch-m1.ic.ac.uk>

Hi there,

there is a bunch of questions there, and I may miss one or two. In a nutshell:

- It happens that k-means finds clusters where STRUCTURE fails (see original paper); this is not necessarily a sign that find.clusters is wrong; in your case, for the microsat data, it looks like if there are any clusters these are not linked to the geographical locations; hard to say more without seeing outputs/the data
- The graph of your second analysis (SNPs) shows no structure. k=1 is not nonsensical, it is just a suggestion that there are no clusters in your data.
- xvalDapc has not been implemented (yet) for genlight objects; to convert data into a suitable format try as.matrix(...).
- cross validation is to be preferred to the a-score
- MDS is not a clustering method - MDS optimizes overall diversity so may fail to detect group structure

Cheers
Thibaut
________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Peri Bolton [peri.bolton at students.mq.edu.au]
Sent: 30 October 2015 11:40
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] Very different number of clusters in different datasets.

Dear adegenet developers and users,

I have a dataset with 50 individuals across 5 sampling locations in a microsatellite dataset, and roughly equivalent numbers of individuals in a SNP dataset with 3839 loci. I have just been interested in finding whether there is any population structure in my species. However, when I run the different datasets I get different answers, and some of them look strange.

Microsatellite dataset:
Fst, a Mantel test for IBD and STRUCTURE all find zero evidence of structure...
find.clusters says k=4 or 5, then I run optim.a.score and xvalDapc to find the best number of PCs to retain for a dapc, and I have nice groups in the final answer, with apparently good assignment power back to the original groups. However, my alpha scores for that dapc run are as follows:

        1         2         3         4
0.4905714 0.5570149 0.7075510 0.5962500

Further, when I visualise this as a compoplot there is no evidence that these structures actually represent any kind of geographic structure in the data, as the groups are just randomly dispersed through my individuals. I have read on topics in the forums that if there is enough space in the data it will find an optimal clustering solution, no matter whether it is biologically realistic. I have also read that find.clusters shouldn't find an optimal solution for k=1 because it is meant to be a nonsense solution for a cluster. Indeed this makes sense, because when you use sampling locality as a prior in dapc it all comes out as one big cluster.

HOWEVER, when I run my SNP dataset things get really strange. I ran essentially all the same procedures and I've come up against a number of hurdles:

1. I can't get xvalDapc to work on a genlight object. I keep getting an error:

Error in as.data.frame.default(x[[i]], optional = TRUE) :
  cannot coerce class "structure("SNPbin", package = "adegenet")" to a data.frame
In addition: Warning message:
In min(dim(x)) : no non-missing arguments to min; returning Inf

Obviously this is because genlight doesn't store the genetic data in the same way as the genind objects do. Is there a workaround for using this function? So far I have got xvalDapc to work on my genind objects, but I do get a bunch of warning messages ("49: In if (result == "overall") { ... : the condition has length > 1 and only the first element will be used"), but it seems to spit out an output at least....

2. When I run find.clusters my cumulative variance plot is nearly linear...
as is my BIC vs K plot, with the optimal solution being the supposedly nonsensical k=1 (see the attached pdf of the output). Is there something weird with my data? Or is that the genuine signal coming through? When I use other clustering methods such as fastSTRUCTURE and MDS I don't get any indication of structure either. HOWEVER, I don't know how to reconcile the two clustering solutions from the two nuclear data sources.

3. When I run an a.score analysis it is basically a flat line, and although it finds an "optimal" PC retention it doesn't seem very reliable to me (see also attached).

So I am aware that there are a few problems there, but hopefully the itemisation and the context of my questions help any good-hearted helping people out there.

Sincerely,
Peri
--
Peri Bolton
PhD Candidate, Griffith Lab
Department of Biological Sciences
Macquarie University, NSW 2109, Australia

From t.jombart at imperial.ac.uk Mon Nov 9 15:47:57 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 9 Nov 2015 14:47:57 +0000
Subject: [adegenet-forum] Sample Names in glPlot()
In-Reply-To: <6070ED6F-9EA9-4CAB-99C3-C472918ECE5D@gmail.com>
References: <6070ED6F-9EA9-4CAB-99C3-C472918ECE5D@gmail.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12B5EE8@icexch-m1.ic.ac.uk>

Hi Luke,

genlight objects are made for large datasets; the assumption is that there will typically be too many individuals to plot individual labels in glPlot. Patches welcome (you'll have to fork/pull request) but I am not sure I (or the rest of the team) will be able to spend time on this in the near future.

In most other plots (e.g. scatterplots) individual labels should be easier to add.
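Along those lines, labels can be added with base graphics once the data are a plain matrix. The sketch below draws a glPlot-like image of allele counts with individual names on the y axis. It assumes `mat` is an individuals-by-SNPs matrix of 0/1/2 counts with row names (converting a genlight object with `as.matrix()` should give such a matrix, but treat that as an assumption to check), and `plot_labelled_genotypes()` is a hypothetical helper, not an adegenet function. It stays readable only for a modest number of individuals, which is the limitation noted above.

```r
# Draw an allele-count image (one row per individual) with individual labels on the y axis
plot_labelled_genotypes <- function(mat, cex.labels = 0.7) {
  stopifnot(is.matrix(mat), !is.null(rownames(mat)))
  n_ind <- nrow(mat)
  n_snp <- ncol(mat)
  op <- par(mar = c(4, 6, 1, 1))   # wider left margin to fit the labels
  on.exit(par(op))
  # image() expects z with dim length(x) x length(y), hence t(mat)
  image(x = seq_len(n_snp), y = seq_len(n_ind), z = t(mat),
        zlim = c(0, 2), col = c("white", "grey60", "black"),
        xlab = "SNP index", ylab = "", yaxt = "n")
  axis(2, at = seq_len(n_ind), labels = rownames(mat), las = 1, cex.axis = cex.labels)
  invisible(rownames(mat))
}

# Hypothetical example: 8 individuals x 50 SNPs with a few missing calls
set.seed(3)
mat <- matrix(sample(0:2, 8 * 50, replace = TRUE), nrow = 8,
              dimnames = list(paste0("sample_", 1:8), NULL))
mat[sample(length(mat), 10)] <- NA   # missing calls are left blank by image()
plot_labelled_genotypes(mat)
```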
Cheers
Thibaut
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Luke Anderson-Trocme [luke.anderson.trocme at gmail.com]
Sent: 30 October 2015 18:51
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] Sample Names in glPlot()

Hi Adegenet Forum,

I have a question/comment regarding the Adegenet package.

I have been able to create a genlight object with my sequences, but am struggling to find a way to add the names of the samples to the glPlot and other similar types of plots. At the moment all I can see is the number of individuals on the y axis.

I have tried troubleshooting the problem myself but have not found a solution to my problem yet. Any help or advice is much appreciated.

Thank you for your time,

Luke Anderson-Trocmé
MSc Candidate
Biology Department, McGill University
luke.anderson.trocme at gmail.com
_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From t.jombart at imperial.ac.uk Mon Nov 9 15:52:36 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 9 Nov 2015 14:52:36 +0000
Subject: [adegenet-forum] MANOVA significant testing with DAPC
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12B5EFF@icexch-m1.ic.ac.uk>

Hi Jon,

I am afraid the first test is indeed circular: $ind.coord values are already optimized in the very sense of what MANOVA will consider to be structure. The test on the retained PCs of the PCA is valid, though cumbersome if you plan on testing various numbers of retained PCs. PCs of the PCA are indeed stored in $tab of your dapc object, so this is what you want to use.
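A minimal sketch of the test on the retained PCs follows. It assumes the PC scores come as a data frame with one row per individual and one column per PC, which is how the $tab item is described above; `as.matrix()` is all `manova()` needs to accept it. Simulated scores stand in for a real dapc object, and `manova_by_npca()` is a hypothetical helper that repeats the test on the first k PCs for several k. Two caveats: `summary.manova()` needs at least 2 responses, so k starts at 2; and picking the k with the smallest p-value reintroduces a multiple-testing problem, so k is best fixed in advance.

```r
# Hypothetical stand-in for dapcobject$tab: a data frame of PC scores
set.seed(42)
n_ind <- 45
grp   <- factor(rep(c("pop1", "pop2", "pop3"), each = 15))
shift <- c(pop1 = 0, pop2 = 1, pop3 = 2)   # simulated group effect on PC1
tab   <- data.frame(matrix(rnorm(n_ind * 10), nrow = n_ind))
names(tab) <- paste0("PC", 1:10)
tab[[1]] <- tab[[1]] + unname(shift[as.character(grp)])

# MANOVA of the first k retained PCs on group membership, for each k in n_pcs
manova_by_npca <- function(tab, grp, n_pcs) {
  res <- t(sapply(n_pcs, function(k) {
    y   <- as.matrix(tab[, seq_len(k), drop = FALSE])  # manova() needs a matrix response
    fit <- manova(y ~ grp)
    summary(fit, test = "Wilks")$stats[1, c("Wilks", "approx F", "Pr(>F)")]
  }))
  data.frame(n_pcs = n_pcs, res, check.names = FALSE)
}

manova_by_npca(tab, grp, n_pcs = c(2, 5, 10))
```

With a real dapc object the call would presumably be `manova_by_npca(dapcobject$tab, grp, ...)`, limited to the number of PCs that were retained when the DAPC was run.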
Best
Thibaut
________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Jonathan Richardson [jrichardson4 at gmail.com]
Sent: 03 November 2015 16:43
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] MANOVA significant testing with DAPC

Hi Thibaut and Adegenet users,

I have a follow-up question to one I asked back in March 2012. I have more data to appreciate what you were suggesting then (original correspondence pasted below). In short, we would like to test whether "groups" of genotypes are significantly separated in discriminant function space. We proposed using a MANOVA of the individual coordinates coming from DAPC to do this. Now that I've tested another 2 datasets, Thibaut was correct that these usually come out significant regardless of actual clustering patterns in DF space. The original code looked like this:

model <- manova(dapcobject$ind.coord ~ genindobject$pop)
summary(model, test = "Wilks")

But you mentioned that a MANOVA could be done on the retained PCs after the PCA step - the more traditional test with discriminant analysis. After trying to apply this with our new datasets, we are hoping to clarify 2 things:

1. To execute this, do you mean to use the $tab item in the dapc output (i.e. "retained PCs of PCA"), in place of the $ind.coord item? Or did you mean a step earlier, in the find.clusters PC retention step?

2. If you meant the dapc step, the structure of the $tab data appears to make it much more difficult to pull into a MANOVA analysis (i.e., it is a data frame with 1 observation per genotype, and # of variables equal to PCs retained). The $ind.coord data is numeric with (not surprisingly) 2 values per genotype relating to the location in DF space.

I'm hoping you can confirm question 1 before I spend too much more time figuring out the data formatting issue in #2.
I should also say thank you for your time and efforts developing and supporting Adegenet; I am finding it more useful through the years. Thank you very much!

- Jon
_______________________
Archived emails:

3 March 2012:
Hello again Dr. Jombart and Adegenet users,

I have a follow-up question related to the grouping of individuals not using k-means. We would like to test whether the group assignment (assigned by us) is significantly related to the location of individuals in the discriminant function (DF) space. To do this we have taken the following approach:

1. Perform a MANOVA on the individual DF coordinates with group class as the predictor variable. The idea here is that (A) the Wilks' lambda test provides a metric of separation among the groups and (B) accounts for correlation among variables (DFs). The test code is:

model <- manova(dapcobject$ind.coord ~ genindobject$pop)
summary(model, test = "Wilks")

2. However, we are worried that the significance value obtained by MANOVA (which was remarkably small) might be anti-conservative (i.e. high Type-I error) because DAPC has already maximized among group variation and uncovered structure that might be evident even in random datasets. Therefore, we came up with a randomization test. We first create a null DF distribution by randomizing the rows/individuals in the "genind" data object so that the number of individuals per group remains the same, but the individuals contained in each group are now randomized. We do this 1000 times and perform the DAPC and MANOVA operations on all 1000 sets to obtain the randomized distribution. Lastly, we compare our empirical Wilks' lambda value with the randomized distribution to determine if our Wilks is larger than expected based on random chance.

Does this seem reasonable? Our hesitation is related to some initial results from our dataset. When we run the empirical dataset with 3 defined groups, the DAPC produces 3 clear clusters with some small overlap (i.e.
the 3 a priori groups segregate very nicely in DF space). However, when we randomized the alleles and genotypes, the resulting DAPC with the same group sizes also results in 3 clear clusters, but with noticeably more ellipse overlap than the empirical data. So we are wondering whether the a priori group designation (related to a substantial habitat and phenotypic difference in our case) will mandate some level of clustering, but with DAPC also looking to optimize group segregation in DF space the patterns become clearer and maybe somewhat spurious (at least in our case)?

Any insight you can provide would be greatly appreciated. Thank you in advance.
Jon
_________
6 March 2012:
Hello,

Yes, as you suggest the approach described in 1 is circular, and the test should nearly always be significant. The second approach is not ideal because the amount of discrimination - and therefore your test statistics - depends on the retained variation in the dimension-reduction/PCA step, which is likely to vary from one permutation to another. I would perform the MANOVA on the retained PCs after the PCA step. This should be less computer intensive, and is the traditional test associated with discriminant analysis.

Cheers
Thibaut

From t.jombart at imperial.ac.uk Mon Nov 9 15:59:29 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 9 Nov 2015 14:59:29 +0000
Subject: [adegenet-forum] xval warning
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12B5F11@icexch-m1.ic.ac.uk>

Hi there,

this is not an error but a warning, which in this case is most likely harmless. I posted an issue for it:
https://github.com/thibautjombart/adegenet/issues/103

However, with 4 individuals in your smallest cluster I'm afraid cross validation is not really practicable, and I would not trust the results.

Missing data are replaced by the mean value of the variable.
I suspect you use a different method for replacement of missing values, and/or use (implicitly, probably) a different scaling option when passing different types of data to the function.

Cheers
Thibaut
________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of LAURA NICOLE WOODINGS [17869067 at students.latrobe.edu.au]
Sent: 01 November 2015 02:22
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] xval warning

Hi,

I am trying to use xval to cross-validate the number of PCs to retain for a DAPC using the command below, and I am getting the following error msg:

xval_nopopbp75_cnt75_cl96_maf5to1 <- xvalDapc(nopop_bp75_cnt75_cl96_maf5to1NoNA@tab, nopop_clust$grp)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In if (result == "overall") { ... :
  the condition has length > 1 and only the first element will be used
2: In if (result == "groupMean") { ... :
  the condition has length > 1 and only the first element will be used

Using find.clusters, 2 groups are found, but 1 group has 83 individuals and the other group has 4. Is the discrepancy in sample size causing the problem with xval? And if so, how can I determine the optimal number of PCs to keep?

I have attached my input file and the script that I use.

Also, when conducting DAPC should you remove the NAs in the data? The DAPC seems to work when the NAs are not removed, but when I run it with them removed I get different results.

Thanks for your help!

Cheers,
Laura

PhD Candidate
Department of Ecology, Environment and Evolution | La Trobe University | Bundoora 3086 | Australia
m: +61 408 642 006 | e: 17869067 at students.latrobe.edu.au

From t.jombart at imperial.ac.uk Mon Nov 9 16:03:05 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 9 Nov 2015 15:03:05 +0000
Subject: [adegenet-forum] replacing NAs
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12B5F21@icexch-m1.ic.ac.uk>

Hi there,

this has been documented and discussed before; you can browse the forum using:
http://adegenet.r-forge.r-project.org/search.html

A fix has been posted here:
https://github.com/thibautjombart/adegenet/issues/71

Short answer: updating to adegenet devel should sort the problem.

Cheers
Thibaut
________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Smith, Matt [matt_smith at fws.gov]
Sent: 03 November 2015 18:55
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] replacing NAs

Hello,

I am trying to create a genind object from a genepop file, then replace missing data with mean. Missing data is indicated by 0's in the genepop file. When I check for NAs, none are recognized. Why isn't adegenet recognizing missing data in my genepop file?

> Coho <- read.genepop("Input/20pop10loci.gen", ncode = 3L)
Converting data from a Genepop .gen file to a genind object...
File description: coho data dec 2014
...done.
> sum(is.na(Coho$tab))
[1] 0

I'm new to this, so sorry for any duplicate posts. Thank you!

From luke.anderson.trocme at gmail.com Mon Nov 9 20:51:59 2015
From: luke.anderson.trocme at gmail.com (Luke Anderson-Trocme)
Date: Mon, 9 Nov 2015 14:51:59 -0500
Subject: [adegenet-forum] Sample Names in glPlot()
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570B12B5EE8@icexch-m1.ic.ac.uk>
References: <6070ED6F-9EA9-4CAB-99C3-C472918ECE5D@gmail.com> <2CB2DA8E426F3541AB1907F98ABA6570B12B5EE8@icexch-m1.ic.ac.uk>
Message-ID: <7240F062-DA7D-45C1-A3EC-3DCEEBC27851@gmail.com>

Hi Thibaut,

Thank you for taking the time to respond to my question. I'll see what I can do. If I find a way to add labels I'll be sure to reply to this thread so others will be able to do the same.

Thanks again,

Luke Anderson-Trocmé
MSc Candidate
Biology Department, McGill University
luke.anderson.trocme at gmail.com

> On Nov 9, 2015, at 9:47 AM, Jombart, Thibaut wrote:
>
> Hi Luke,
>
> genlight objects are made for large datasets; the assumption is that there will typically be too many individuals to plot individual labels in glPlot. Patches welcome (you'll have to fork/pull request) but I am not sure I (or the rest of the team) will be able to spend time on this in the near future.
>
> In most other plots (e.g. scatterplots) individual labels should be easier to add.
>
> Cheers
> Thibaut
>
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Luke Anderson-Trocme [luke.anderson.trocme at gmail.com]
> Sent: 30 October 2015 18:51
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] Sample Names in glPlot()
>
> Hi Adegenet Forum,
>
> I have a question/comment regarding the Adegenet package.
>
> I have been able to create a genlight object with my sequences, but am struggling to find a way to add the names of the samples to the glPlot and other similar types of plots. At the moment all I can see is the number of individuals on the y axis.
>
> I have tried troubleshooting the problem myself but have not found a solution to my problem yet. Any help or advice is much appreciated.
>
> Thank you for your time,
>
> Luke Anderson-Trocmé
> MSc Candidate
> Biology Department, McGill University
> luke.anderson.trocme at gmail.com
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From maria.guerrina at edu.unige.it Wed Nov 11 16:04:23 2015
From: maria.guerrina at edu.unige.it (Maria Guerrina)
Date: Wed, 11 Nov 2015 16:04:23 +0100
Subject: [adegenet-forum] basic.stat_hierfstat
Message-ID: <6A6CC446-85D8-453A-99D5-B42CD6D96859@edu.unige.it>

Hi all,

I would like to use the function basic.stats in hierfstat. I am using hierfstat version 04-14 and R 3.2.2.
If I use the function on a genind object, I get the following error message:

    basicstat <- basic.stats(Mydata1, diploid = TRUE, digits = 4)
    Error in unique.default(x, nmax = nmax) : unique() applies only to vectors

If I use the function on a simple data frame without the first column (individuals), I get the following error message:

    > basic.stats(a[,-1])
    Error in data.frame(pop = rep(data[, 1], 2), ind = ind, al = rbind(firstal,  :
      arguments imply differing number of rows: 356, 236

but my data frame has 178 rows, not 356 or 236.

Any idea?
Thanks
Maria
--
Maria Guerrina PhD
Università di Genova
DISTAV
Corso Dogali 1M
I - 16136 GENOVA (Italy)
maria.guerrina at edu.unige.it

From ramendra.sarma at gmail.com Wed Nov 11 16:15:52 2015
From: ramendra.sarma at gmail.com (Ramendra Sarma)
Date: Wed, 11 Nov 2015 20:45:52 +0530
Subject: [adegenet-forum] Pairwise.fst

Hi,

How can I calculate pairwise Fst with my genind object, imported from STRUCTURE format? When I try to compute fstat and pairwise.fst, I get an error message saying the functions are not found. I only need the code, please.

--
Dr R N Sarma
Department of Plant Breeding and Genetics
Assam Agricultural University
Jorhat-785013, Assam, India
web: www.aau.ac.in; Phone: +91-376-2310526; +91-376231133(R); 9435350529(M)

From crypticlineage at gmail.com Wed Nov 11 16:58:38 2015
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Wed, 11 Nov 2015 10:58:38 -0500
Subject: [adegenet-forum] Pairwise.fst

I hope it's okay to suggest other packages on this list.
I would recommend using the diffCalc() function from the diveRsity package, which is extremely fast and scales well to large data sets.

V

From roman.lustrik at biolitika.si Thu Nov 12 08:24:45 2015
From: roman.lustrik at biolitika.si (Roman Lustrik)
Date: Thu, 12 Nov 2015 08:24:45 +0100 (CET)
Subject: [adegenet-forum] basic.stat_hierfstat
In-Reply-To: <6A6CC446-85D8-453A-99D5-B42CD6D96859@edu.unige.it>
References: <6A6CC446-85D8-453A-99D5-B42CD6D96859@edu.unige.it>
Message-ID: <1588284131.9753.1447313085749.JavaMail.zimbra@biolitika.si>

Hi,

can you reproduce this with one of the included datasets, or at least provide us with a subset of your data that exhibits this behavior? If you're unsure how to do that, here are a few tips to help you along the way:
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Cheers,
Roman

----
In God we trust, all others bring data.
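Following Roman's suggestion, one way to share a reproducible subset without attaching a large file is base R's `dput()`, which prints R code that recreates an object exactly. A minimal sketch: the data frame `a` below is a hypothetical stand-in built from a few of the values Maria posts later in this thread, not her real file, and which rows and columns to keep is an arbitrary choice for illustration.

```r
# Hypothetical stand-in for a genotype data frame: a population column plus loci
# (values copied from a few rows of the table Maria posts; not her full data)
a <- data.frame(
  pop    = c(8, 8, 9, 9, 11, 11),
  S1_51  = c(3, 3, 2, 3, 3, 3),
  S1_835 = c(2, 2, 2, 2, 2, 2),
  S1_938 = c(1, 3, 1, 1, 3, 3)
)

# Keep only as many rows and columns as are needed to reproduce the problem
small <- a[1:4, 1:3]

# dput() writes R code that recreates the object; this text is what goes in the email
shared <- paste(capture.output(dput(small)), collapse = "\n")
cat(shared, "\n")

# Anyone reading the message can rebuild the same object from the pasted text
rebuilt <- eval(parse(text = shared))
print(rebuilt)
```

The printed `structure(...)` text is self-contained, so readers of the list can run it directly instead of downloading and re-importing a STRUCTURE file.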
From maria.guerrina at edu.unige.it Fri Nov 13 12:33:06 2015
From: maria.guerrina at edu.unige.it (Maria Guerrina)
Date: Fri, 13 Nov 2015 12:33:06 +0100
Subject: [adegenet-forum] basic.stat_hierfstat

Dear Roman,

here are more details. This is my dataset (named a), based on a STRUCTURE file:

       pop S1_51 S1_835 S1_938 S1_1132 S1_1317 S1_1350 S1_2321 S1_2451 S1_1640
     1   8     3      2      1       0       4       3       4       0       4
     2   8     3      2      1       0       4       3       4       0       4
     3   8     3      2      3       3       4       3       4       4       4
     4   8     3      2      3       3       4       3       4       4       4
     5   8     3      2      1       0       4       3       4       4       4
     6   8     3      2      1       0       4       3       4       4       4
     7   8     3      2      1       3       4       3       4       4       4
     8   8     3      2      1       3       4       3       4       4       4
     9   8     2      2      1       3       4       3       4       4       4
    10   8     3      2      3       3       4       3       4       4       4
    11   9     2      2      1       3       4       3       4       4       4
    12   9     3      2      1       3       4       3       4       4       4
    13   9     3      2      1       3       4       3       4       0       4
    14   9     3      2      1       3       4       3       4       0       4
    15   9     3      2      1       3       4       3       4       0       4
    16   9     3      2      1       3       4       3       4       0       4
    17   9     3      2      1       3       4       3       4       2       4
    18   9     3      2      1       3       4       3       4       4       4
    19   9     3      2      1       3       4       0       4       4       4
    20   9     3      2      1       3       4       0       4       4       4
    21  11     3      2      3       3       4       3       4       4       4
    22  11     3      2      3       3       4       3       4       4       4
    23  11     3      2      3       3       2       3       4       4       4
    24  11     3      2      3       3       4       3       4       4       4

Then I converted my data frame into a genind object (named Mydata1). If I apply the basic.stats function to it, I get the following error message:

    > basicstat <- basic.stats(Mydata1, diploid = TRUE, digits = 4)
    Error in unique.default(x, nmax = nmax) : unique() applies only to vectors

In the manual, I read that I can use any data frame with the population in the first column and the loci in the other columns. If I apply basic.stats to my data frame, I get the following error message:

    > basicstat <- basic.stats(a, diploid = TRUE, digits = 4)
    Error in data.frame(pop = rep(data[, 1], 2), ind = ind, al = rbind(firstal,  :
      arguments imply differing number of rows: 356, 236

but my data frame has:

    > nrow(a)
    [1] 178
    > ncol(a)
    [1] 8480

Thanks a lot!
Maria
--
Maria Guerrina PhD
Università di Genova
DISTAV
Corso Dogali 1M
I - 16136 GENOVA (Italy)
maria.guerrina at edu.unige.it

From osue37 at bangor.ac.uk Fri Nov 13 18:53:42 2015
From: osue37 at bangor.ac.uk (Robert Fairweather)
Date: Fri, 13 Nov 2015 17:53:42 +0000
Subject: [adegenet-forum] optimum number of PCs for optim.a.score()

Hello,

I am using the optim.a.score() function, which operates on a DAPC object, to investigate the optimum number of PCs to retain in my analysis. To produce a DAPC object, I first have to run DAPC on my data, which means choosing the number of PCs to retain in advance. I have noticed that the number of PCs retained in the DAPC object changes the outcome of optim.a.score, which suggests that some optimum must already be known before using the function.
For example, with the microbov data, optim.a.score(dapc(microbov, n.da=100, n.pca=50)), as suggested in the vignette, returns an optimum of ~22 PCs. However, increasing n.pca to 100, 200 and 300 PCs (all explaining <100% of the cumulative variance) returns optima of 15, 27 and 59, respectively; with 300 PCs I also get a series of instability warnings.

With this in mind, what is the best way to choose the number of PCs in the dapc object passed to optim.a.score()? The number used in the microbov example in the vignette (50) is the one explaining ~90% of the cumulative variance, and so far I have been using that threshold as a guide with my own data. Am I perhaps misunderstanding how the function should be used?

Many thanks,
Robert

Robert Fairweather
Research Student
Molecular Ecology and Fisheries Genetics Laboratory
Environment Centre Wales
Bangor University
LL57 2UW
Email: osue37 at bangor.ac.uk
Skype: fairweathr
Twitter: @FairweathR

Registered Charity No. 1141565

This email and any attachments may contain confidential material and is solely for the use of the intended recipient(s).
If you have received this email in error, please notify the sender immediately and delete this email. If you are not the intended recipient(s), you must not use, retain or disclose any information contained in this email. Any views or opinions are solely those of the sender and do not necessarily represent those of Bangor University. Bangor University does not guarantee that this email or any attachments are free from viruses or 100% secure. Unless expressly stated in the body of the text of the email, this email is not intended to form a binding contract - a list of authorised signatories is available from the Bangor University Finance Office.

From jmdoyle at purdue.edu Mon Nov 16 22:24:57 2015
From: jmdoyle at purdue.edu (Doyle, Jacqueline R M)
Date: Mon, 16 Nov 2015 21:24:57 +0000
Subject: [adegenet-forum] trouble with NAs
Message-ID: <6443E29C5ACAAD449704DD0385BAF7541EF95C46@WPVEXCMBX03.purdue.lcl>

Hi!

I am trying to use R to reformat the output from the Fluidigm genotyping software into a data frame that can then be used by df2genind. I've attached my starting file, which has 95 individuals + 1 negative control genotyped at 96 loci (biallelic SNPs).
I've processed the file as follows:

    data <- read.csv("run4_set1.csv", header =T, skip=15)
    keeps <- c("Assay","Name","Converted")
    data2 <- data[keeps]
    data2$Converted <- gsub('No Call','NA',data2$Converted)
    data2 <- data2[- grep("NTC",data2$Converted),]
    data2$Converted <- gsub('A','1',data2$Converted)
    data2$Converted <- gsub('C','2',data2$Converted)
    data2$Converted <- gsub('G','3',data2$Converted)
    data2$Converted <- gsub('T','4',data2$Converted)
    data2$Converted <- gsub('N1','NA:NA',data2$Converted)
    library(reshape)
    data3 <- cast(data2,Name~Assay)
    obj <- df2genind(data3,ploidy=2,sep=":")

When I then call "obj" in R, I get:

    /// GENIND OBJECT /////////

     // 95 individuals; 96 loci; 255 alleles; size: 162.5 Kb

     // Basic content
       @tab:  95 x 255 matrix of allele counts
       @loc.n.all: number of alleles per locus (range: 1-3)
       @loc.fac: locus factor for the 255 columns of @tab
       @all.names: list of allele names for each locus
       @ploidy: ploidy of each individual  (range: 2-2)
       @type:  codom
       @call: df2genind(X = data3, sep = ":", ploidy = 2)

     // Optional content
       - empty -

It looks to me like my "NA" is being read as a third allele (my loc.n.all range should be 1-2, not 1-3, correct?). I've also tried:

    obj <- df2genind(data3,ploidy=2,missing=NA,sep=":")

but I end up with the same results plus the following error:

    Error in df2genind(data3, ploidy = 2, missing = NA, sep = ":") :
      unused argument (missing = NA)

Could anyone give me some feedback on whether or not my NA is actually being recognized as missing data? If not, any suggestions on how to fix the problem?

Many thanks,
Jackie

-------------- next part --------------
A non-text attachment was scrubbed...
Name: run4_set1.csv
Type: application/octet-stream
Size: 885353 bytes
Desc: run4_set1.csv

From t.jombart at imperial.ac.uk Tue Nov 17 13:21:44 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 17 Nov 2015 12:21:44 +0000
Subject: [adegenet-forum] optimum number of PCs for optim.a.score()
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12BE301@icexch-m1.ic.ac.uk>

Hi Robert,

this has been discussed before on the forum. I would recommend using xvalDapc rather than optim.a.score. The procedure is documented in the vignette, and explained in more detail in the latest version of the DAPC practical I use for workshops (adegenet website > documents, bottom of the page):
http://adegenet.r-forge.r-project.org/files/Barcelona2015/practical-MVAgroups.1.0.pdf

Cheers
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart

From t.jombart at imperial.ac.uk Tue Nov 17 13:28:21 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 17 Nov 2015 12:28:21 +0000
Subject: [adegenet-forum] trouble with NAs
In-Reply-To: <6443E29C5ACAAD449704DD0385BAF7541EF95C46@WPVEXCMBX03.purdue.lcl>
References: <6443E29C5ACAAD449704DD0385BAF7541EF95C46@WPVEXCMBX03.purdue.lcl>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12BE311@icexch-m1.ic.ac.uk>

Hi there,

you should be using NA.char="NA", but it does not work in this example. Issue posted here:
https://github.com/thibautjombart/adegenet/issues/108
I'll look into it soon.

BTW, posting issues on GitHub is the best way to discuss suspected bugs: it is much easier for users and developers to keep track of things and solve issues.
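Independently of the df2genind fix, the recoding step in Jackie's script can be made less fragile. There, "No Call" first becomes the text "NA", the A-to-1 replacement then rewrites it as "N1", and a final gsub() patches it back to "NA:NA". A minimal base-R sketch of an alternative, assuming calls look like "A:G" as in her script: mark missing calls as real `NA` values first, then translate all four bases in one `chartr()` call, which leaves `NA` untouched. The sample calls and the `Name`/`Assay` values are hypothetical, and the `tapply()` step is only a stand-in for `reshape::cast()`; how df2genind then treats these values (the `NA.char` argument and the separator bug Thibaut reports) is not tested here.

```r
# Hypothetical Fluidigm-style calls in long format (one row per individual x assay)
long <- data.frame(
  Name      = rep(c("ind1", "ind2", "ind3"), each = 2),
  Assay     = rep(c("SNP01", "SNP02"), times = 3),
  Converted = c("A:G", "G:G", "No Call", "C:T", "T:T", "No Call"),
  stringsAsFactors = FALSE
)

# First two steps of the sequential gsub() chain: "No Call" -> "NA" -> "N1",
# because the A inside "NA" is recoded as well
naive <- gsub("A", "1", gsub("No Call", "NA", long$Converted))
print(naive)  # "1:G" "G:G" "N1" "C:T" "T:T" "N1"

# Alternative: set missing calls to real NA, then map A/C/G/T to 1/2/3/4
# in a single chartr() call; NA values pass through as NA
recode_calls <- function(x, missing = "No Call") {
  x[x == missing] <- NA
  chartr("ACGT", "1234", x)
}
long$geno <- recode_calls(long$Converted)

# Long-to-wide in base R (individuals as rows, assays as columns);
# each cell holds one genotype string such as "1:3", or NA
wide <- tapply(long$geno, list(long$Name, long$Assay), function(v) v[1])
print(wide)
```

Because missing calls never pass through the string "NA", no patch-up step is needed. Each remaining value is either a "1:3"-style genotype or a genuine `NA`.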
Cheers
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart

From t.jombart at imperial.ac.uk Tue Nov 17 13:48:41 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 17 Nov 2015 12:48:41 +0000
Subject: [adegenet-forum] trouble with NAs
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570B12BE311@icexch-m1.ic.ac.uk>
References: <6443E29C5ACAAD449704DD0385BAF7541EF95C46@WPVEXCMBX03.purdue.lcl>, <2CB2DA8E426F3541AB1907F98ABA6570B12BE311@icexch-m1.ic.ac.uk>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12BE35B@icexch-m1.ic.ac.uk>

Hi again,

it was a bug: expressions of multiple NAs were assuming "/" as a separator instead of using the user-defined one.
Fixed in the devel version:
https://github.com/thibautjombart/adegenet/issues/108

Cheers
Thibaut
From simon.nadeau.ubc at gmail.com Fri Nov 20 00:00:32 2015
From: simon.nadeau.ubc at gmail.com (Simon Nadeau)
Date: Thu, 19 Nov 2015 15:00:32 -0800
Subject: [adegenet-forum] How to get observed heterozygosity per population

Hi,

I have recently been using various features of adegenet with genind objects, and it works great so far. However, I can't seem to find some simple summary statistics:

Observed heterozygosity per population: this would be very handy, but I only found the function Hs (expected heterozygosity).

Global allele frequencies per locus: there is a nice histogram of minor allele frequencies on page 30 of the adegenet genomic tutorial, but I can't find how to generate the same plot from genind objects.

Thank you very much for your help,

Simon

--
Simon Nadeau, M. Sc.
Biologiste / Biologist
Ressources naturelles Canada, Service canadien des forêts, Centre de foresterie des Laurentides / Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre
1055 rue du P.E.P.S., C.P. 10380 succ.
Sainte-Foy, Québec (Québec), Canada G1V 4C7
Simon.Nadeau at alumni.UBC.ca, Simon.Nadeau at canada.ca
Tel: (604) 349-5196

From t.jombart at imperial.ac.uk Mon Nov 23 13:11:31 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 23 Nov 2015 12:11:31 +0000
Subject: [adegenet-forum] How to get observed heterozygosity per population
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12C0F01@icexch-m1.ic.ac.uk>

Hello,

have you tried 'summary'? Note that the current devel version is better (the previous summary returned a list invisibly).
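Thibaut's later suggestion in this thread (split the data by population, then summarise each part) can be illustrated without adegenet. The following is a minimal base-R sketch, not adegenet's `summary()` or `seppop()`. It assumes a hypothetical data frame with one row per diploid individual, a `pop` column, and genotypes stored as "allele/allele" strings. Observed heterozygosity is taken as the proportion of non-missing genotypes that are heterozygous, computed per locus within each population and then averaged over loci. The minor allele frequency is computed over all individuals and assumes biallelic loci.

```r
# Hypothetical diploid genotypes: one row per individual, alleles separated by "/"
geno <- data.frame(
  pop  = c("P1", "P1", "P1", "P2", "P2", "P2"),
  loc1 = c("1/1", "1/2", "2/2", "1/2", "1/2", NA),
  loc2 = c("3/3", "3/4", "3/4", "3/3", "3/4", "4/4"),
  stringsAsFactors = FALSE
)
loci <- setdiff(names(geno), "pop")

# TRUE if the two alleles differ; missing genotypes stay NA
is_het <- function(g) {
  alleles <- strsplit(g, "/", fixed = TRUE)
  vapply(alleles, function(a) if (anyNA(a)) NA else a[1] != a[2], logical(1))
}

# Individuals x loci matrix of heterozygosity indicators
het <- sapply(geno[loci], is_het)

# Observed heterozygosity per locus within each population (rows = loci,
# columns = populations), then averaged over loci: one value per population
hobs_by_locus <- sapply(split(as.data.frame(het), geno$pop), colMeans, na.rm = TRUE)
hobs_per_pop  <- colMeans(hobs_by_locus)
print(hobs_per_pop)

# Minor allele frequency per locus over all individuals
# (biallelic loci assumed; a monomorphic locus gets 0)
maf <- sapply(geno[loci], function(g) {
  alleles <- unlist(strsplit(g[!is.na(g)], "/", fixed = TRUE))
  freq <- table(alleles) / length(alleles)
  if (length(freq) < 2) 0 else min(freq)
})
print(maf)
```

The split-then-summarise structure is the same idea as running seppop() and then lapply() over a genind object; the sketch only makes the arithmetic explicit. A call such as `hist(maf)` would then draw a simple histogram of the per-locus values.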
Cheers
Thibaut

From maria.guerrina at edu.unige.it Mon Nov 23 15:58:22 2015
From: maria.guerrina at edu.unige.it (Maria Guerrina)
Date: Mon, 23 Nov 2015 15:58:22 +0100
Subject: [adegenet-forum] How to get observed heterozygosity per population
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570B12C0F01@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA6570B12C0F01@icexch-m1.ic.ac.uk>

Hi Thibaut!

I have a problem similar to Simon's. I have tried using the function 'summary', but I get 0 for both the observed and the expected heterozygosity...
I have also tried the function 'basic.stats' in hierfstat, but with my data it does not give a result for Ho.

Any idea?
Thanks!
Maria
--
Maria Guerrina PhD
Università di Genova
DISTAV
Corso Dogali 1M
I - 16136 GENOVA (Italy)
maria.guerrina at edu.unige.it

From simon.nadeau.ubc at gmail.com Mon Nov 23 19:19:25 2015
From: simon.nadeau.ubc at gmail.com (Simon Nadeau)
Date: Mon, 23 Nov 2015 10:19:25 -0800
Subject: [adegenet-forum] How to get observed heterozygosity per population
References: <2CB2DA8E426F3541AB1907F98ABA6570B12C0F01@icexch-m1.ic.ac.uk>

Hi Thibaut,

Thank you very much for your help. I just downloaded the dev version, but summary(x) gives me the same results as the previous version: expected and observed heterozygosity per locus (summary(x)$Hobs).

What I would like is Hobs *per population* instead: something similar to the function Hs, but for Hobs. Is there anything like that in adegenet?

I tried basic.stats from hierfstat, as Maria suggested, but I got an error message, probably because of something wrong in the way I formatted the data.

Simon

--
Simon Nadeau, M. Sc.
Biologiste / Biologist
Ressources naturelles Canada, Service canadien des forêts, Centre de foresterie des Laurentides / Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre
1055 rue du P.E.P.S., C.P. 10380 succ.
Sainte-Foy, Qu?bec (Qu?bec), Canada G1V 4C7 Simon.Nadeau at a lumni.UBC.ca , *Simon.Nadeau at canada.ca * Tel: (604) 349-5196 -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.jombart at imperial.ac.uk Mon Nov 23 19:38:39 2015 From: t.jombart at imperial.ac.uk (Jombart, Thibaut) Date: Mon, 23 Nov 2015 18:38:39 +0000 Subject: [adegenet-forum] How to get observed heterozygosity per population In-Reply-To: References: <2CB2DA8E426F3541AB1907F98ABA6570B12C0F01@icexch-m1.ic.ac.uk> , Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570B12C1099@icexch-m1.ic.ac.uk> You can use 'seppop' to split your data by population, and then lapply over it to get population specific summaries (averaging over loci if you need). Makes sense? Cheers Thibaut ________________________________ From: Simon Nadeau [simon.nadeau.ubc at gmail.com] Sent: 23 November 2015 18:19 To: Maria Guerrina Cc: Jombart, Thibaut; adegenet-forum at lists.r-forge.r-project.org Subject: Re: [adegenet-forum] How to get observed heterozygosity per population Hi Thibault, Thank you very much for your help. I just downloaded the dev version but I get the same results using summary(x) than with the previous version. summary gives me expected and observed heterozygosity but per loci: summary(x)$Hobs What I would like to have is Hobs per population instead. Something similar to the function Hs but for Hobs. Is there anything like that in adegenet? I tried basic.stats from Hierfstat as Maria suggested, but I got an error message, probably due to something wrong in the way I formatted the data. Simon On Mon, Nov 23, 2015 at 6:58 AM, Maria Guerrina > wrote: Hi Thibaut! I have a problem similar to Simon. I have tried to use the function 'summary', but I get 0 both for the observed heterozigosity both for the expected one... I have tried to use the function 'basic.stats' in hierfstat, but it doesn't give a result for Ho, in my case with my data. any idea? Thanks! 
Maria -- Maria Guerrina PhD Universit? di Genova DISTAV Corso Dogali 1M I - 16136 GENOVA (Italy) maria.guerrina at edu.unige.it On 23/nov/2015, at 13.11, Jombart, Thibaut wrote: Hello, have you tried 'summary'? Note, the current devel version is better (previous summary was returning invisibly a list). Cheers Thibaut ________________________________ From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Simon Nadeau [simon.nadeau.ubc at gmail.com] Sent: 19 November 2015 23:00 To: adegenet-forum at lists.r-forge.r-project.org Subject: [adegenet-forum] How to get observed heterozygosity per population Hi, I have recently been using the various features of adegenet unsing genind objects and I works great so far. However, I can't seem to be able to find easy summary stats: Observed heterozygosity per population: this would be very handy but I only found the function Hs (expected heterozygosity). Global allele frequencies per loci: there is a nice histogram of minor allele frequencies on page 30 of the adegenet genomic tutorial, but I can't find how to generate the same plot using genind objects. Thank you very much for your help, Simon -- Simon Nadeau, M. Sc. Biologiste / Biologist Ressources naturelles Canada, Service canadien des for?ts, Centre de foresterie des Laurentides / Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre 1055 rue du P.E.P.S., C.P. 10380 succ. Sainte-Foy, Qu?bec (Qu?bec), Canada G1V 4C7 Simon.Nadeau at alumni.UBC.ca, Simon.Nadeau at canada.ca Tel: (604) 349-5196 _______________________________________________ adegenet-forum mailing list adegenet-forum at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum -- Simon Nadeau, M. Sc. 
Biologiste / Biologist Ressources naturelles Canada, Service canadien des for?ts, Centre de foresterie des Laurentides / Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre 1055 rue du P.E.P.S., C.P. 10380 succ. Sainte-Foy, Qu?bec (Qu?bec), Canada G1V 4C7 Simon.Nadeau at alumni.UBC.ca, Simon.Nadeau at canada.ca Tel: (604) 349-5196 -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon.nadeau.ubc at gmail.com Mon Nov 23 22:21:03 2015 From: simon.nadeau.ubc at gmail.com (Simon Nadeau) Date: Mon, 23 Nov 2015 13:21:03 -0800 Subject: [adegenet-forum] How to get observed heterozygosity per population In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570B12C1099@icexch-m1.ic.ac.uk> References: <2CB2DA8E426F3541AB1907F98ABA6570B12C0F01@icexch-m1.ic.ac.uk> <2CB2DA8E426F3541AB1907F98ABA6570B12C1099@icexch-m1.ic.ac.uk> Message-ID: Hi, This worked. I could not figure out how to access $Hobs for each population without using a for loop. x.pop = seppop(x) summary.by.pop = lapply(x.pop, summary) Hobs.ls = rep(NA, length(summary.by.pop)) for (i in 1:length(summary.by.pop)){ Hobs.ls[i] = mean(summary.by.pop[[i]]$Hobs) } barplot(Hobs.ls, names.arg = levels(pop(x)), las = 2, main = "Observed heterozygosity", ylab = "Ho") Thank you very much for your help, Simon On Mon, Nov 23, 2015 at 10:38 AM, Jombart, Thibaut wrote: > You can use 'seppop' to split your data by population, and then lapply > over it to get population specific summaries (averaging over loci if you > need). > > Makes sense? > > Cheers > Thibaut > > > ------------------------------ > *From:* Simon Nadeau [simon.nadeau.ubc at gmail.com] > *Sent:* 23 November 2015 18:19 > *To:* Maria Guerrina > *Cc:* Jombart, Thibaut; adegenet-forum at lists.r-forge.r-project.org > *Subject:* Re: [adegenet-forum] How to get observed heterozygosity per > population > > Hi Thibault, > > Thank you very much for your help. 
I just downloaded the dev version > but I get the same results using summary(x) than with the previous version. > summary gives me expected and observed heterozygosity but per loci: > > summary(x)$Hobs > > What I would like to have is Hobs *per population* instead. > Something similar to the function Hs but for Hobs. Is there anything like > that in adegenet? I tried basic.stats from Hierfstat as Maria suggested, > but I got an error message, probably due to something wrong in the way I > formatted the data. > > Simon > > On Mon, Nov 23, 2015 at 6:58 AM, Maria Guerrina < > maria.guerrina at edu.unige.it> wrote: > >> Hi Thibaut! >> I have a problem similar to Simon. >> I have tried to use the function 'summary', but I get 0 both for the >> observed heterozigosity both for the expected one... >> I have tried to use the function 'basic.stats' in hierfstat, but it >> doesn't give a result for Ho, in my case with my data. >> >> any idea? >> Thanks! >> Maria >> -- >> Maria Guerrina PhD >> Universit? di Genova >> DISTAV >> Corso Dogali 1M >> I - 16136 GENOVA (Italy) >> maria.guerrina at edu.unige.it >> >> >> >> >> >> On 23/nov/2015, at 13.11, Jombart, Thibaut wrote: >> >> Hello, >> >> have you tried 'summary'? Note, the current devel version is better >> (previous summary was returning invisibly a list). >> >> Cheers >> Thibaut >> >> >> ------------------------------ >> *From:* adegenet-forum-bounces at lists.r-forge.r-project.org [ >> adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Simon >> Nadeau [simon.nadeau.ubc at gmail.com] >> *Sent:* 19 November 2015 23:00 >> *To:* adegenet-forum at lists.r-forge.r-project.org >> *Subject:* [adegenet-forum] How to get observed heterozygosity per >> population >> >> Hi, >> >> I have recently been using the various features of adegenet unsing >> genind objects and I works great so far. 
However, I can't seem to be able >> to find easy summary stats: >> >> Observed heterozygosity per population: this would be very handy but I >> only found the function Hs (expected heterozygosity). >> >> Global allele frequencies per loci: there is a nice histogram of minor >> allele frequencies on page 30 of the adegenet genomic tutorial, but I can't >> find how to generate the same plot using genind objects. >> >> Thank you very much for your help, >> >> Simon >> >> -- >> Simon Nadeau, M. Sc. >> Biologiste / Biologist >> >> Ressources naturelles Canada, Service canadien des for?ts, Centre de >> foresterie des Laurentides / Natural Resources Canada, Canadian Forest >> Service, Laurentian Forestry Centre >> 1055 rue du P.E.P.S., C.P. 10380 succ. Sainte-Foy, Qu?bec (Qu?bec), >> Canada G1V 4C7 >> Simon.Nadeau at a lumni.UBC.ca >> , >> * Simon.Nadeau at canada.ca * >> Tel: (604) 349-5196 >> _______________________________________________ >> adegenet-forum mailing list >> adegenet-forum at lists.r-forge.r-project.org >> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum >> >> >> > > > -- > Simon Nadeau, M. Sc. > Biologiste / Biologist > > Ressources naturelles Canada, Service canadien des for?ts, Centre de > foresterie des Laurentides / Natural Resources Canada, Canadian Forest > Service, Laurentian Forestry Centre > 1055 rue du P.E.P.S., C.P. 10380 succ. Sainte-Foy, Qu?bec (Qu?bec), Canada > G1V 4C7 > Simon.Nadeau at a lumni.UBC.ca > , > * Simon.Nadeau at canada.ca * > Tel: (604) 349-5196 > -- Simon Nadeau, M. Sc. Biologiste / Biologist Ressources naturelles Canada, Service canadien des for?ts, Centre de foresterie des Laurentides / Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre 1055 rue du P.E.P.S., C.P. 10380 succ. 
Sainte-Foy, Qu?bec (Qu?bec), Canada G1V 4C7 Simon.Nadeau at a lumni.UBC.ca , *Simon.Nadeau at canada.ca * Tel: (604) 349-5196 -------------- next part -------------- An HTML attachment was scrubbed... URL: From postmaster at r-forge.wu-wien.ac.at Tue Nov 24 09:38:53 2015 From: postmaster at r-forge.wu-wien.ac.at (Post Office) Date: Tue, 24 Nov 2015 11:38:53 +0300 Subject: [adegenet-forum] Returned mail: see transcript for details Message-ID: -------------- next part -------------- A non-text attachment was scrubbed... Name: file.zip Type: application/octet-stream Size: 29460 bytes Desc: not available URL: From roman.lustrik at biolitika.si Tue Nov 24 09:46:05 2015 From: roman.lustrik at biolitika.si (Roman Lustrik) Date: Tue, 24 Nov 2015 09:46:05 +0100 (CET) Subject: [adegenet-forum] How to get observed heterozygosity per population In-Reply-To: References: <2CB2DA8E426F3541AB1907F98ABA6570B12C0F01@icexch-m1.ic.ac.uk> <2CB2DA8E426F3541AB1907F98ABA6570B12C1099@icexch-m1.ic.ac.uk> Message-ID: <858843204.155642.1448354765038.JavaMail.zimbra@biolitika.si> You could do it this way, too: library(adegenet) data("nancycats") n.pop <- seppop(nancycats) mean.hobs <- do.call("c", lapply(n.pop, function(x) mean(summary(x)$Hobs))) mean.hobs[is.nan(mean.hobs)] <- NA barplot(mean.hobs) Cheers, Roman ---- In god we trust, all others bring data. ----- Original Message ----- From: "Simon Nadeau" To: "Thibaut Jombart" Cc: adegenet-forum at lists.r-forge.r-project.org Sent: Monday, November 23, 2015 10:21:03 PM Subject: Re: [adegenet-forum] How to get observed heterozygosity per population Hi, This worked. I could not figure out how to access $Hobs for each population without using a for loop. 
x.pop = seppop(x) summary.by.pop = lapply(x.pop, summary) Hobs.ls = rep(NA, length(summary.by.pop)) for (i in 1:length(summary.by.pop)){ Hobs.ls[i] = mean(summary.by.pop[[i]]$Hobs) } barplot(Hobs.ls, names.arg = levels(pop(x)), las = 2, main = "Observed heterozygosity", ylab = "Ho") Thank you very much for your help, Simon On Mon, Nov 23, 2015 at 10:38 AM, Jombart, Thibaut < t.jombart at imperial.ac.uk > wrote: You can use 'seppop' to split your data by population, and then lapply over it to get population specific summaries (averaging over loci if you need). Makes sense? Cheers Thibaut From: Simon Nadeau [ simon.nadeau.ubc at gmail.com ] Sent: 23 November 2015 18:19 To: Maria Guerrina Cc: Jombart, Thibaut; adegenet-forum at lists.r-forge.r-project.org Subject: Re: [adegenet-forum] How to get observed heterozygosity per population Hi Thibault, Thank you very much for your help. I just downloaded the dev version but I get the same results using summary(x) than with the previous version. summary gives me expected and observed heterozygosity but per loci: summary(x)$Hobs What I would like to have is Hobs per population instead. Something similar to the function Hs but for Hobs. Is there anything like that in adegenet? I tried basic.stats from Hierfstat as Maria suggested, but I got an error message, probably due to something wrong in the way I formatted the data. Simon On Mon, Nov 23, 2015 at 6:58 AM, Maria Guerrina < maria.guerrina at edu.unige.it > wrote:
Hi Thibaut! I have a problem similar to Simon. I have tried to use the function 'summary', but I get 0 both for the observed heterozigosity both for the expected one... I have tried to use the function 'basic.stats' in hierfstat, but it doesn't give a result for Ho, in my case with my data. any idea? Thanks! Maria -- Maria Guerrina PhD Universit? di Genova DISTAV Corso Dogali 1M I - 16136 GENOVA (Italy) maria.guerrina at edu.unige.it On 23/nov/2015, at 13.11, Jombart, Thibaut wrote:
Hello, have you tried 'summary'? Note, the current devel version is better (previous summary was returning invisibly a list). Cheers Thibaut From: adegenet-forum-bounces at lists.r-forge.r-project.org [ adegenet-forum-bounces at lists.r-forge.r-project.org ] on behalf of Simon Nadeau [ simon.nadeau.ubc at gmail.com ] Sent: 19 November 2015 23:00 To: adegenet-forum at lists.r-forge.r-project.org Subject: [adegenet-forum] How to get observed heterozygosity per population Hi, I have recently been using the various features of adegenet unsing genind objects and I works great so far. However, I can't seem to be able to find easy summary stats: Observed heterozygosity per population: this would be very handy but I only found the function Hs (expected heterozygosity). Global allele frequencies per loci: there is a nice histogram of minor allele frequencies on page 30 of the adegenet genomic tutorial, but I can't find how to generate the same plot using genind objects. Thank you very much for your help, Simon -- Simon Nadeau, M. Sc. Biologiste / Biologist Ressources naturelles Canada, Service canadien des for?ts, Centre de foresterie des Laurentides / Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre 1055 rue du P.E.P.S., C.P. 10380 succ. Sainte-Foy, Qu?bec (Qu?bec), Canada G1V 4C7 Simon.Nadeau at a lumni.UBC.ca , Simon.Nadeau at canada.ca Tel: (604) 349-5196 _______________________________________________ adegenet-forum mailing list adegenet-forum at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
-- Simon Nadeau, M. Sc. Biologiste / Biologist Ressources naturelles Canada, Service canadien des for?ts, Centre de foresterie des Laurentides / Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre 1055 rue du P.E.P.S., C.P. 10380 succ. Sainte-Foy, Qu?bec (Qu?bec), Canada G1V 4C7 Simon.Nadeau at a lumni.UBC.ca , Simon.Nadeau at canada.ca Tel: (604) 349-5196
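For readers who want to cross-check these numbers without adegenet: observed heterozygosity at a locus is the proportion of genotyped individuals carrying two different alleles, and a per-population value can be taken as the mean over loci. This is the split/apply/average idea Thibaut describes. The base-R sketch below assumes a hypothetical data frame `geno` with a `pop` column and one "allele/allele" string per locus. The layout, the toy values and the helper names (`is_het`, `hobs_by_pop`, `allele_freqs`, `minor_af`) are illustrative assumptions, not adegenet functions. It also computes pooled allele frequencies and a per-locus minor allele frequency, which was the other question in Simon's original post.

```r
# Hypothetical toy data (assumption): one row per diploid individual,
# a "pop" column, and one "allele/allele" genotype string per locus (NA = missing)
geno <- data.frame(
  pop  = c("A", "A", "A", "B", "B", "B"),
  loc1 = c("101/101", "101/103", "103/103", "101/103", "101/103", NA),
  loc2 = c("200/204", "200/200", "204/204", "200/200", "200/204", "200/200"),
  stringsAsFactors = FALSE
)

# TRUE if a genotype carries two different alleles, NA if it is missing
is_het <- function(g) {
  alleles <- strsplit(g, "/", fixed = TRUE)
  vapply(alleles, function(a) {
    if (length(a) != 2 || anyNA(a)) NA else a[1] != a[2]
  }, logical(1))
}

# Observed heterozygosity per locus within each population, averaged over loci
hobs_by_pop <- function(geno, pop_col = "pop") {
  loci <- setdiff(names(geno), pop_col)
  by_pop <- split(geno[loci], geno[[pop_col]])
  sapply(by_pop, function(d) {
    per_locus <- vapply(d, function(g) mean(is_het(g), na.rm = TRUE), numeric(1))
    mean(per_locus, na.rm = TRUE)  # loci with no data in this population (NaN) are skipped
  })
}

# Allele frequencies at one locus, pooling all non-missing individuals
allele_freqs <- function(g) {
  alleles <- unlist(strsplit(g[!is.na(g)], "/", fixed = TRUE))
  table(alleles) / length(alleles)
}

# Minor allele frequency at one locus (0 for a monomorphic locus)
minor_af <- function(g) {
  f <- allele_freqs(g)
  if (length(f) < 2) 0 else min(f)
}

hobs <- hobs_by_pop(geno)
maf <- vapply(geno[setdiff(names(geno), "pop")], minor_af, numeric(1))
print(round(hobs, 3))
print(round(maf, 3))
```

With real data, hist(maf) would give a minor-allele-frequency histogram. For multi-allelic loci, the "minor" value here is simply the frequency of the rarest allele, so treat it as a rough summary rather than the biallelic MAF shown in the genomic tutorial.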

From postmaster at r-forge.wu-wien.ac.at Thu Nov 26 09:53:21 2015
From: postmaster at r-forge.wu-wien.ac.at (Post Office)
Date: Thu, 26 Nov 2015 11:53:21 +0300
Subject: [adegenet-forum] Delivery reports about your e-mail
Message-ID:

From carlo_3486 at hotmail.it Thu Nov 26 14:47:28 2015
From: carlo_3486 at hotmail.it (carlo pecoraro)
Date: Thu, 26 Nov 2015 14:47:28 +0100
Subject: [adegenet-forum] Value of BIC versus number of clusters
Message-ID:

Hi Thibaut and all,

is it possible to add the graph of the value of BIC versus number of clusters to the resulting scatterplot of the DAPC?
I would like to show it together with the PCA eigenvalues.

Many thanks.
Cheers
Carlo

From caitiecollins at gmail.com Thu Nov 26 16:38:23 2015
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Thu, 26 Nov 2015 15:38:23 +0000
Subject: [adegenet-forum] Value of BIC versus number of clusters
In-Reply-To:
References:
Message-ID:

Hi Carlo,

I think something like this might work (shown on example data):

library(adegenet)

## eg data
data(dapcIllus)
x <- dapcIllus$a
grp <- find.clusters(x, max.n.clust=40, n.pca=200) # choose k = 6

## this is the plot you wanted to use as an inset, correct?
plot(grp$Kstat, type="o", col="blue")

## dapc
dapc1 <- dapc(x, grp$grp, n.pca=40, n.da=5)

## plot dapc
scatter(dapc1, scree.da=FALSE, bg="white", scree.pca=TRUE, posi.pca="bottomleft", cstar=1, cellipse=1)

## add BIC curve as inset!
myInset <- function(){
  plot(grp$Kstat, type="o", col="blue")
}
add.scatter(myInset(), posi="bottomright", inset=c(-0.03,-0.01), ratio=.2, bg=transp("white"))

I hope that gets you close to what you were hoping for. Let me know if for some reason it does not, though!

Note that most of the code above can be found scattered throughout adegenet's very lovely and always useful dapc tutorial.

All the best,
Caitlin.

From postmaster at r-forge.wu-wien.ac.at Fri Nov 27 08:38:58 2015
From: postmaster at r-forge.wu-wien.ac.at (The Post Office)
Date: Fri, 27 Nov 2015 12:38:58 +0500
Subject: [adegenet-forum] Delivery reports about your e-mail
Message-ID:

Your message was undeliverable due to the following reason(s):

Your message could not be delivered because the destination server was unreachable within the allowed queue period. The amount of time a message is queued before it is returned depends on local configuration parameters.

Most likely there is a network problem that prevented delivery, but it is also possible that the computer is turned off, or does not have a mail system running right now.

Your message could not be delivered within 3 days:
Host 43.66.232.139 is not responding.

The following recipients could not receive this message:

Please reply to postmaster at r-forge.wu-wien.ac.at if you feel this message to be in error.