From crypticlineage at gmail.com Wed Jul 1 20:01:07 2015
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Wed, 1 Jul 2015 14:01:07 -0400
Subject: [adegenet-forum] structure import error
Message-ID:

I am having trouble importing a structure dataset. Can someone decipher this error message?

import2genind('data.str', onerowperind=F, row.marknames=1, n.ind=296, n.loc=84761, col.lab=1, col.pop=2, ask=F)

Error in mat[, (ncol(mat) - p +1):ncol(mat)]: only 0's may be mixed with negative subscripts.

I am able to read the data in R as a data frame and the dimensions are as expected. The df2genind(data.str) conversion also works ok, except that it reads the file as ONEROWPERIND=1, which is not what I want.

Thanks
Vikram

From t.jombart at imperial.ac.uk Mon Jul 6 11:22:33 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 6 Jul 2015 09:22:33 +0000
Subject: [adegenet-forum] structure import error
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF4930D@icexch-m1.ic.ac.uk>

Hello,

which version of adegenet are you using?

Cheers
Thibaut

From Jonathan.King at unthsc.edu Thu Jul 2 22:59:48 2015
From: Jonathan.King at unthsc.edu (King, Jonathan)
Date: Thu, 2 Jul 2015 20:59:48 +0000
Subject: [adegenet-forum] sPCA: Error in if (nf > rank) nf <- rank message
Message-ID:

Hello all,

I'm attempting to recreate a figure similar to the one seen in Sarno et al. (http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0096074&representation=PDF). To do so, I created the attached matrix (HGF.txt) that contains populations (rows) and Y-STR haplogroups (columns). Additionally, due to my inexperience with adegenet, I created the attached coordinates list in a separate file (XY.txt). I then did the following.

Function: mydata <- genpop(HGF,ploidy=1)
Result: Object created

Function: spca(mydata,xy=xy,d1=0,d2=5)
Result: Plot created
[cid:image003.jpg at 01D0B4E0.1EA1AB50]

Now comes the problem.

Function: myspca<-spca(mydata,xy=xy,type=5,d1=0,d2=5)
Result: No "myspca" list is created and I get the following error

Error in if (nf > rank) nf <- rank : missing value where TRUE/FALSE needed

I feel that if I can obtain the eig values for the dataset the rest will be smooth sailing; however, I am unable to find any similar instances on the forum (some people have mentioned this general error in regard to clustering but not spca). Any help is greatly appreciated.

Thanks in advance

Jonathan King
Research and Development Lab Manager
Institute of Applied Genetics
UNT Health Science Center
3500 Camp Bowie Blvd
Fort Worth, TX 76107
Office: 817-735-2773
Phone: 817-735-2940
Research and Development Lab
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image003.jpg
Type: image/jpeg
Size: 4611 bytes
Desc: image003.jpg
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: HGF.txt
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: xy.txt
URL:

From t.jombart at imperial.ac.uk Thu Jul 9 13:10:43 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 9 Jul 2015 11:10:43 +0000
Subject: [adegenet-forum] adegenet 2.0.0 on CRAN
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF4AA53@icexch-m1.ic.ac.uk>

Dear all,

the new release of adegenet, 2.0.0, has hit the CRAN! This is a major release with some breaking of backward compatibility, but it is all for the greater good.
You can find a list of the main changes in the ChangeLog:
http://cran.r-project.org/web/packages/adegenet/ChangeLog

Amongst other things, here are some highlights:
- genind objects now allow different individuals to have different ploidy
- Zhian Kamvar has implemented hierarchical structure in genind and genlight objects
- the generic label system has been removed
- handling genind objects is now a lot easier and more powerful
- a bunch of new accessors to make your life easier
- better integration with hierfstat, pegas, and other packages

All of this is documented in tutorials. The main ones will be the basics, taking you through data structure, handling etc:
https://github.com/thibautjombart/adegenet/blob/master/tutorials/tutorial-basics.pdf

and the one on strata:
https://github.com/thibautjombart/adegenet/blob/master/tutorials/tutorial-strata.pdf

We are slowly moving the old website onto github. As of now, this is where you should be finding the most up-to-date information:
https://github.com/thibautjombart/adegenet/wiki

Many thanks to the whole team (especially Zhian Kamvar and Roman Lustrik) for the last push to get this on CRAN.

All the best
Thibaut

From Basel.Shaaban at ruhr-uni-bochum.de Tue Jul 14 11:02:25 2015
From: Basel.Shaaban at ruhr-uni-bochum.de (Basel Shaaban)
Date: Tue, 14 Jul 2015 11:02:25 +0200
Subject: [adegenet-forum] seploc
Message-ID:

Dear Thibaut Jombart,

I am Basel Shaaban, a Master's student at Ruhr University Bochum. I am currently doing my master's thesis in the field of population genetics. I need to calculate pairwise Fst for each locus, and my data are in Fstat format, so I used the "adegenet" package to create a 'genind' object and then used the "seploc" function to separate the loci, to be able to calculate pairwise Fst for each locus. However, when I run my script I get this error message:

Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘seploc’ for signature ‘"data.frame"’

and this is my R-script:

#read_fstat_file_to_create_genind_project:
B <- read.fstat("simulation_g1500_r10.dat")

#use_seploc_to_separate_locus:
B_seploc <- seploc(B, truenames=TRUE, res.type=c("genind", "matrix"))

Can you please point out the mistake I made?

Best regards
Basel Shaaban

From t.jombart at imperial.ac.uk Tue Jul 14 11:13:08 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 14 Jul 2015 09:13:08 +0000
Subject: [adegenet-forum] seploc
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF4AF4C@icexch-m1.ic.ac.uk>

Hello,

err... not everyone on this list is called this ;)

This is because you are using seploc on something which is not a genind object. I recommend updating your version of adegenet (current is 2.0.0) and checking the 'basics' tutorial for importing and handling data:
https://github.com/thibautjombart/adegenet/wiki/Tutorials

Cheers
Thibaut

From kamvarz at science.oregonstate.edu Tue Jul 14 17:44:12 2015
From: kamvarz at science.oregonstate.edu (Zhian Kamvar)
Date: Tue, 14 Jul 2015 08:44:12 -0700
Subject: [adegenet-forum] seploc
In-Reply-To:
References:
Message-ID: <6E528AD1-F7A3-4F2A-8787-8756FECC7676@science.oregonstate.edu>

Hi Basel,

This error is coming from the fact that both the hierfstat and adegenet packages have functions called "read.fstat". Use import2genind instead.

Cheers,
Zhian

From maierpa at gmail.com Tue Jul 14 21:50:12 2015
From: maierpa at gmail.com (Paul Maier)
Date: Tue, 14 Jul 2015 12:50:12 -0700
Subject: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
Message-ID:

Hi all,

It appears the new version of this function (read.genepop) doesn't have the 'missing' option. Can someone either post an older version of adegenet compatible with R 3.2.1, or post R code for the earlier version of the read.genepop function? I hadn't anticipated trying to figure this out manually.

Thanks!
Paul

----------------------------------------------
Paul Maier

San Diego State, PhD Student
US Geological Survey, Biologist
The Biodiversity Group, Science Advisor

From t.jombart at imperial.ac.uk Tue Jul 14 22:01:34 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 14 Jul 2015 20:01:34 +0000
Subject: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF4B106@icexch-m1.ic.ac.uk>

Hello,

yes, this change is on purpose. adegenet 2.0.0 includes a lot of reforms of the code, with some breaking backward compatibility.

Here is a good example: storing replacement of missing values inside a genind object was bad practice - one ends up not knowing if all values are genuine, or if some are merely NAs that have been replaced. In the new version, missing data are stored as missing, but they can be easily replaced when extracting a table of allele counts or frequencies (see ?tab).
Cheers
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart

From maierpa at gmail.com Tue Jul 14 23:30:34 2015
From: maierpa at gmail.com (Paul Maier)
Date: Tue, 14 Jul 2015 14:30:34 -0700
Subject: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570ABF4B106@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA6570ABF4B106@icexch-m1.ic.ac.uk>
Message-ID:

Thanks, Thibaut, for clearing that up.

Paul

----------------------------------------------
Paul Maier

San Diego State, PhD Student
US Geological Survey, Biologist
The Biodiversity Group, Science Advisor
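
A minimal base-R sketch of the idea discussed in the thread above, under stated assumptions: `counts` is a made-up matrix of allele counts (not a real genind object) and `fill_na_mean` is a hypothetical helper, not part of adegenet. It only illustrates keeping missing genotypes as NA in the stored data and replacing them by column means in an extracted copy, which is the behaviour described for tab(x, NA.method="mean"); see ?tab for the actual function.

```r
# Toy allele-count matrix: rows are individuals, columns are alleles.
# NA marks a genotype that was not typed; all names are made up.
counts <- matrix(
  c(2, 0, NA,
    1, 1, 0,
    0, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3"), c("L1.a", "L1.b", "L2.a"))
)

# Hypothetical helper: replace each NA by the mean of the observed values
# in the same column and return the filled copy.
fill_na_mean <- function(m) {
  col_means <- colMeans(m, na.rm = TRUE)
  missing <- which(is.na(m), arr.ind = TRUE)   # (row, col) index of each NA
  m[missing] <- col_means[missing[, "col"]]
  m
}

filled <- fill_na_mean(counts)
print(filled)          # the NA in column L2.a becomes 1 (mean of 0 and 2)
print(anyNA(counts))   # TRUE: the stored data still record what was missing
```

Because the replacement is done on a copy, the original table still shows which values were genuinely observed.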

From plxral at nottingham.ac.uk Wed Jul 15 16:08:29 2015
From: plxral at nottingham.ac.uk (Raman Lawal)
Date: Wed, 15 Jul 2015 15:08:29 +0100
Subject: [adegenet-forum] NA.REPLACE
Message-ID: <5598AF6448698641A248CAF1082E2E6501313FA360BB@EXCHANGE3.ad.nottingham.ac.uk>

I have tried to prepare all necessary files but wanted to remove the NA using the na.replace option of the ade4 as seen in the commands below. When I ran test<- na.replace(genind_ABCDEFGH,"mean", quiet=FALSE), I get "Error: could not find function "na.replace"". I am not sure what I am doing wrong. Could you please advise me on what to do?

library(adegenet)
datan<- as.numeric(data4)
pop<-data4@phdata$pop
genind_ABCDEFGH<-as.genind(datan,pop)
rm(data4)
rm(datan)
#rm(pop)
library(ade4)
## Replacing NAs
test<- na.replace(genind_ABCDEFGH,"mean", quiet=FALSE)

LAWAL

This message and any attachment are intended solely for the addressee and may contain confidential information. If you have received this message in error, please send it back to me, and immediately delete it. Please do not use, copy or disclose the information contained in this message or in any attachment. Any views or opinions expressed by the author of this email do not necessarily reflect the views of the University of Nottingham. This message has been checked for viruses but the contents of an attachment may still contain software viruses which could damage your computer system, you are advised to perform your own checks. Email communications with the University of Nottingham may be monitored as permitted by UK legislation.

From t.jombart at imperial.ac.uk Wed Jul 15 17:33:13 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Wed, 15 Jul 2015 15:33:13 +0000
Subject: [adegenet-forum] NA.REPLACE
In-Reply-To: <5598AF6448698641A248CAF1082E2E6501313FA360BB@EXCHANGE3.ad.nottingham.ac.uk>
References: <5598AF6448698641A248CAF1082E2E6501313FA360BB@EXCHANGE3.ad.nottingham.ac.uk>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF4B379@icexch-m1.ic.ac.uk>

Hi there,

yes, na.replace has been removed in adegenet 2.0.0. Instead, you can use:

tab(x, NA.method="mean")

where 'x' is your genind. If you want frequencies, you will need to add 'freq=TRUE'. See ?tab, and the basics tutorial for more info on data handling:
https://github.com/thibautjombart/adegenet/wiki/Tutorials

Cheers
Thibaut

From maierpa at gmail.com Wed Jul 15 20:15:52 2015
From: maierpa at gmail.com (Paul Maier)
Date: Wed, 15 Jul 2015 11:15:52 -0700
Subject: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA6570ABF4B106@icexch-m1.ic.ac.uk>
Message-ID:

On closer inspection, it appears the new version stores missing data as alleles (i.e. *.00 in @tab). So using tab to replace the allele counts doesn't work. For example, x@tab <- tab(x, NA.method="mean") does nothing because missing data is stored as normal data. Here's a workaround I created; although it is probably not the most clever method, it fixed my problem. Hopefully this helps someone!
Paul

# Fix missing values to reflect deprecated option, missing = "mean"
x@tab <- x@tab[,-grep("\\.00",colnames(x@tab))] #remove "00" alleles
rep <- gsub("([^\\.]+)\\.\\d+","\\1",colnames(x@tab)) #locus names
loci <- unique(x@loc.fac) #unique locus names
x@loc.fac <- as.factor(rep)
for (i in 1:length(x@all.names))
  if ("00" %in% x@all.names[[i]]) #remove "00" from allele names
    x@all.names[[i]] <- x@all.names[[i]][-which(x@all.names[[i]]=="00")]
for (i in 1:length(x@loc.n.all)) #remove "00" from allele counts
  x@loc.n.all[[i]] <- length(x@all.names[[i]])
for (i in 1:length(loci)) { #replace missing data with mean allele counts
  df <- data.frame(x@tab[,which(loci[i] == rep)]) #df, alleles for one locus
  for (j in 1:nrow(df)) {
    if (sum(df[j,]) == 0) {
      for (k in 1:length(df[j,])) { #mean allele counts from rows with data
        df[j,k] <- round(mean( df[which(apply(df,1,sum) != 0),k] ))
      }
      x@tab[j,which(loci[i] == rep)] <- as.numeric(df[j,])
    }
  }
}

----------------------------------------------
Paul Maier

San Diego State, PhD Student
US Geological Survey, Biologist
The Biodiversity Group, Science Advisor

From zkamvar at gmail.com Thu Jul 16 02:35:25 2015
From: zkamvar at gmail.com (Zhian Kamvar)
Date: Wed, 15 Jul 2015 17:35:25 -0700
Subject: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
In-Reply-To:
References:
Message-ID: <52BD95B9-0483-4512-BF6D-FA914F4D48E8@gmail.com>

This smells like a bug.
After poking around some, it is indeed one in read.fstat and read.genepop. (Both read.genetix and read.structure still work):

> obj <- read.genepop(system.file("files/nancycats.gen",package="adegenet"))

 Converting data from a Genepop .gen file to a genind object...

File description: Genotypes of cats from 17 colonies of Nancy (France)

...done.

> obj
/// GENIND OBJECT /////////

 // 237 individuals; 9 loci; 111 alleles; size: 138.5 Kb

 // Basic content
   @tab: 237 x 111 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 8-18)
   @loc.fac: locus factor for the 111 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual (range: 2-2)
   @type: codom
   @call: read.genepop(file = system.file("files/nancycats.gen", package = "adegenet"))

 // Optional content
   @pop: population of each individual (group size range: 9-23)

> summary(obj)

 # Total number of genotypes: 237

 # Population sample sizes:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
10 22 12 23 15 11 14 10  9 11 20 14 13 17 11 12 13

 # Number of alleles per locus:
 fca8 fca23 fca43 fca45 fca77 fca78 fca90 fca96 fca37
   17    11    10    10    12     8    12    13    18

 # Number of alleles per population:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
37 53 50 67 48 56 43 54 43 46 73 53 44 62 42 40 37

 # Percentage of missing data:
[1] 0

 # Observed heterozygosity:
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
0.6118143 0.6666667 0.6793249 0.6455696 0.6329114 0.5654008 0.6497890 0.5949367 0.4514768

 # Expected heterozygosity:
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
0.8803076 0.7928751 0.7953319 0.7930531 0.8702576 0.6884669 0.8157881 0.7767630 0.6062686

This will be reported and fixed.

Cheers,
Zhian

From t.jombart at imperial.ac.uk Thu Jul 16 12:50:50 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 16 Jul 2015 10:50:50 +0000
Subject: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
In-Reply-To: <52BD95B9-0483-4512-BF6D-FA914F4D48E8@gmail.com>
References: , <52BD95B9-0483-4512-BF6D-FA914F4D48E8@gmail.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF4B48A@icexch-m1.ic.ac.uk>

Looks like a bug indeed. Thanks for spotting it. Will fix today.
Cheers
Thibaut
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Zhian Kamvar [zkamvar at gmail.com]
Sent: 16 July 2015 01:35
To: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)

This smells like a bug. After poking around some, it is indeed one in read.fstat and read.genepop. (Both read.genetix and read.structure still work):

> obj <- read.genepop(system.file("files/nancycats.gen",package="adegenet"))

 Converting data from a Genepop .gen file to a genind object...

File description: Genotypes of cats from 17 colonies of Nancy (France)

...done.

> obj
/// GENIND OBJECT /////////

 // 237 individuals; 9 loci; 111 alleles; size: 138.5 Kb

 // Basic content
   @tab: 237 x 111 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 8-18)
   @loc.fac: locus factor for the 111 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual (range: 2-2)
   @type: codom
   @call: read.genepop(file = system.file("files/nancycats.gen", package = "adegenet"))

 // Optional content
   @pop: population of each individual (group size range: 9-23)

> summary(obj)

 # Total number of genotypes: 237

 # Population sample sizes:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
10 22 12 23 15 11 14 10  9 11 20 14 13 17 11 12 13

 # Number of alleles per locus:
 fca8 fca23 fca43 fca45 fca77 fca78 fca90 fca96 fca37
   17    11    10    10    12     8    12    13    18

 # Number of alleles per population:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
37 53 50 67 48 56 43 54 43 46 73 53 44 62 42 40 37

 # Percentage of missing data:
[1] 0

 # Observed heterozygosity:
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
0.6118143 0.6666667 0.6793249 0.6455696 0.6329114 0.5654008 0.6497890 0.5949367 0.4514768

 # Expected heterozygosity:
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
0.8803076 0.7928751 0.7953319 0.7930531 0.8702576 0.6884669 0.8157881 0.7767630 0.6062686

This will be reported and fixed.

Cheers,
Zhian

> On Jul 15, 2015, at 11:16 , adegenet-forum-request at lists.r-forge.r-project.org wrote:
>
> On closer inspection, it appears the new version stores missing data as
> alleles (i.e. *.00 in @tab). So using tab to replace the allele counts
> doesn't work. For example, x@tab <- tab(x, NA.method="mean") does nothing
> because missing data is stored as normal data. Here's a workaround I
> created, although probably not the most clever method, it fixed my problem.
> Hopefully this helps someone!
> Paul
>
> # Fix missing values to reflect deprecated option, missing = "mean"
> x@tab <- x@tab[,-grep("\\.00",colnames(x@tab))] #remove "00" alleles
> rep <- gsub("([^\\.]+)\\.\\d+","\\1",colnames(x@tab)) #locus names
> loci <- unique(x@loc.fac) #unique locus names
> x@loc.fac <- as.factor(rep)
> for (i in 1:length(x@all.names))
>   if ("00" %in% x@all.names[[i]]) #remove "00" from allele names
>     x@all.names[[i]] <- x@all.names[[i]][-which(x@all.names[[i]]=="00")]
> for (i in 1:length(x@loc.n.all)) #remove "00" from allele counts
>   x@loc.n.all[[i]] <- length(x@all.names[[i]])
> for (i in 1:length(loci)) { #replace missing data with mean allele counts
>   df <- data.frame(x@tab[,which(loci[i] == rep)]) #df, alleles for one locus
>   for (j in 1:nrow(df)) {
>     if (sum(df[j,]) == 0) {
>       for (k in 1:length(df[j,])) { #mean allele counts from rows with data
>         df[j,k] <- round(mean( df[which(apply(df,1,sum) != 0),k] ))
>       }
>       x@tab[j,which(loci[i] == rep)] <- as.numeric(df[j,])
>     }
>   }
> }

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From t.jombart at imperial.ac.uk Thu Jul 16 13:25:45 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 16 Jul 2015 11:25:45 +0000
Subject: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570ABF4B48A@icexch-m1.ic.ac.uk>
References: , <52BD95B9-0483-4512-BF6D-FA914F4D48E8@gmail.com>, <2CB2DA8E426F3541AB1907F98ABA6570ABF4B48A@icexch-m1.ic.ac.uk>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF4B4CC@icexch-m1.ic.ac.uk>

Fixed now:
https://github.com/thibautjombart/adegenet/issues/71#issuecomment-121790358

And readily available in the devel version:

install.packages("devtools")
library(devtools)
install_github("thibautjombart/adegenet")
library("adegenet")

Cheers
Thibaut

From plxral at nottingham.ac.uk Thu Jul 16 15:11:26 2015
From: plxral at nottingham.ac.uk (Raman Lawal)
Date: Thu, 16 Jul 2015 14:11:26 +0100
Subject: [adegenet-forum] NA.REPLACE
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570ABF4B379@icexch-m1.ic.ac.uk>
References: <5598AF6448698641A248CAF1082E2E6501313FA360BB@EXCHANGE3.ad.nottingham.ac.uk> <2CB2DA8E426F3541AB1907F98ABA6570ABF4B379@icexch-m1.ic.ac.uk>
Message-ID: <5598AF6448698641A248CAF1082E2E6501313FA36158@EXCHANGE3.ad.nottingham.ac.uk>

This works well. Thank you.
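
For readers following the NA replacement discussion: the idea behind the "mean" option is, roughly, to replace each missing entry of the allele table with the mean of the observed values in the same column. The base R sketch below illustrates that idea on a made-up matrix. The helper `impute_col_means` and the toy data are hypothetical and not part of adegenet, and the actual `tab()` implementation may differ in detail.

```r
# Hypothetical helper (not an adegenet function): replace NAs in each
# column of a numeric matrix by the mean of that column's observed values.
impute_col_means <- function(m) {
  for (j in seq_len(ncol(m))) {
    missing <- is.na(m[, j])
    if (any(missing) && !all(missing)) {
      m[missing, j] <- mean(m[!missing, j])
    }
  }
  m
}

# Toy allele-count table: 4 individuals x 2 alleles, one genotype missing.
counts <- matrix(c(2, 0,
                   1, 1,
                   NA, NA,
                   0, 2),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("ind", 1:4), c("locA.1", "locA.2")))

imputed <- impute_col_means(counts)
print(imputed)
```

With a genind object `x`, the call recommended in this thread is `tab(x, NA.method="mean")`; the sketch only shows the arithmetic on a plain matrix.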

From: Jombart, Thibaut [mailto:t.jombart at imperial.ac.uk]
Sent: 15 July 2015 16:33
To: LAWAL RAMAN; adegenet-forum at lists.r-forge.r-project.org
Subject: RE: NA.REPLACE

Hi there,

yes, na.replace has been removed in adegenet 2.0.0. Instead, you can use:

tab(x, NA.method="mean")

where 'x' is your genind. If you want frequencies, you will need to add 'freq=TRUE'. See ?tab, and the basics tutorial for more info on data handling:
https://github.com/thibautjombart/adegenet/wiki/Tutorials

Cheers
Thibaut
________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Raman Lawal [plxral at nottingham.ac.uk]
Sent: 15 July 2015 15:08
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] NA.REPLACE

I have tried to prepare all necessary files but wanted to remove the NA using the na.replace option of ade4, as seen in the commands below. When I ran test<- na.replace(genind_ABCDEFGH,"mean", quiet=FALSE), I get 'Error: could not find function "na.replace"'. I am not sure what I was doing wrong. Could you please advise me on what to do.

library(adegenet)
datan<- as.numeric(data4)
pop<-data4@phdata$pop
genind_ABCDEFGH<-as.genind(datan,pop)
rm(data4)
rm(datan)
#rm(pop)
library(ade4)
## Replacing NAs
test<- na.replace(genind_ABCDEFGH,"mean", quiet=FALSE)

LAWAL

This message and any attachment are intended solely for the addressee and may contain confidential information. If you have received this message in error, please send it back to me, and immediately delete it. Please do not use, copy or disclose the information contained in this message or in any attachment. Any views or opinions expressed by the author of this email do not necessarily reflect the views of the University of Nottingham.

This message has been checked for viruses but the contents of an attachment may still contain software viruses which could damage your computer system, you are advised to perform your own checks. Email communications with the University of Nottingham may be monitored as permitted by UK legislation.

From postmaster at r-forge.wu-wien.ac.at Fri Jul 17 09:55:57 2015
From: postmaster at r-forge.wu-wien.ac.at (Bounced mail)
Date: Fri, 17 Jul 2015 14:55:57 +0700
Subject: [adegenet-forum] Report
Message-ID:

The original message was received at Fri, 17 Jul 2015 14:55:57 +0700 from [49.79.188.208]

----- The following addresses had permanent fatal errors -----
adegenet-forum at r-forge.wu-wien.ac.at

----- Transcript of the session follows -----
... while talking to r-forge.wu-wien.ac.at.:
550 5.1.2 ... Host unknown (Name server: host not found)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: transcript.zip
Type: application/octet-stream
Size: 29202 bytes
Desc: not available
URL:

From simon.crameri at env.ethz.ch Fri Jul 17 17:25:31 2015
From: simon.crameri at env.ethz.ch (Crameri Simon)
Date: Fri, 17 Jul 2015 15:25:31 +0000
Subject: [adegenet-forum] xvalDapc and group prediction accuracy
In-Reply-To:
References: <9D61E83C-092E-402D-B4D0-BE066EE3E4AA@ethz.ch>
Message-ID:

Hi Thibaut

I am still working with my tree species whose genotypes I'd like to model using DAPC, and I am still aiming to use the results as a forensic tool to identify species genetically. Therefore, the whole approach needs to be as reliable as possible.

I tried xvalDapc() to perform DAPC cross-validation and found an optimal n.pca:

> table(data@pop)
P01 P02 P03 P04 P05 P06 P07 P08 P09 P10 P11
 11   5   5  16  10  15  34   4   4  11   4
> xval <- xvalDapc(data@tab, pop(data), training.set = 0.5, result = "groupMean", n.pca = 10:20, n.rep = 1000)
> xval$`Mean Successful Assignment by Number of PCs of PCA`[as.numeric(xval$`Number of PCs Achieving Highest Mean Success`)]
       14
0.9953977
> xval$'Number of PCs Achieving Lowest MSE'
[1] "14"
> xval$DAPC$n.pca
[1] 14

It all works fine: the resulting best n.pca is still 14 if xvalDapc() is carried out multiple times using the same parameters, and even when changing training.set to, say, 0.9.

Now I use the validated model (xval$DAPC) to predict species membership of additional samples:

> predict(xval$DAPC, newdata=new.data)

Again, it's all working perfectly, but what I don't fully understand is this:

1) As it happens, I know the true group membership of the additional samples. Therefore I can assess the prediction accuracy of xval$DAPC. It turns out that 96.8% (group mean!) of the additional samples are correctly predicted by xval$DAPC. Why is this number slightly different from the expected 99.5%? Could it be due to the different group sizes present in the full dataset (table(data@pop))?
2) If the full dataset contains groups of very different size, some of which are fairly small: would it be more reliable to predict group membership of additional samples using the above determined n.pca and all 1000 training sets (which have approximately equal group size) as a reference, instead of using the full dataset (where group sizes differ) and just one prediction? The resulting 1000 prediction outcomes could be screened for the groups most often assigned to each new sample.

Any opinions / ideas?

Thanks in advance,
Simon

*************
PhD student
ETH Zurich
Plant Ecological Genetics

From verissimoac at gmail.com Mon Jul 20 16:58:51 2015
From: verissimoac at gmail.com (Ana Verissimo)
Date: Mon, 20 Jul 2015 15:58:51 +0100
Subject: [adegenet-forum] How can I filter out loci out of HWE from a genind object?
Message-ID:

Hi all,

I have a data set comprising 1000s of SNP loci genotyped for 10s of individuals. I have tested loci for conformance to HWE and want to filter out the loci that are out of HWE.

Is there a function already available to perform this filtering in genind objects?
If not, I am guessing I may need to figure out a way to subset my data using a list of loci that I want to keep?

Any tips or suggestions are welcome!
Ana

From chelsea.didinger at gmail.com Thu Jul 23 12:23:40 2015
From: chelsea.didinger at gmail.com (Chelsea Didinger)
Date: Thu, 23 Jul 2015 19:23:40 +0900
Subject: [adegenet-forum] scatterplot
Message-ID:

Hi Everyone,

I am a Master's student and new to R, and I could not find a previous thread on this, so I apologize if I am asking a question someone has already answered.

Basically, I am trying to do something with my data similar to the 2014 paper "A tutorial for Discriminant Analysis of Principal Components (DAPC) using adegenet 1.4-1". I'm working with adegenet 2.0.0.
The data I uploaded only has 1 dataset (as opposed to the example in this paper with 4 datasets where only 1 is selected to work with). In the dataset, there are 49 rows (corresponding to 15 amino acids of an allele variant) and 75 columns (each amino acid in the allele variant has 5 values assigned, hence 15X5).

I was wondering if the following script should work to produce a scatterplot:

- library(adegenet)
- PSS <- read.csv(file.choose(),header=F)
- find.clusters(PSS)
   - here, I did not use max.n.clust
   - (note: the cluster plot stops at 5...it decreases until 5 and then the slope doesn't go up again, it simply stops. so I would select 5 then(?))
- head(grp$Kstat,8)
   - is it correct to still use 8 (same value as used in the paper)?
- grp$stat
- head(grp$grp,10)
   - correct to use 10 still (same value as used in the paper)?
- grp$size
- dapc1 <- dapc(PSS,grp$grp)
- scatter(dapc1)

Also, if I want to make a table and plot like in this paper, [table(pop(x),grp$grp)], how could I do this? What would I do instead of pop?

Sorry for asking about something so simple. I'm still learning, so I very much appreciate your time and understanding. Any help would be very welcome - thank you so much!

Best,
Chelsea

From charlotte.hurry at griffithuni.edu.au Fri Jul 24 06:08:22 2015
From: charlotte.hurry at griffithuni.edu.au (Charlotte Hurry)
Date: Fri, 24 Jul 2015 14:08:22 +1000
Subject: [adegenet-forum] Problem converting structure file to genind (length of 'dimnames' [2] not equal to array extent)
Message-ID:

Hello

I have been trying to convert my structure file to a genind object but I am having some issues. I am getting this error and I can't find an appropriate fix. I have looked through forum history and found a similar issue but the fix suggested didn't work for me.

My file looks like this:

Ind      Pop  tb109  tb117  tb129  tb130  tb136  tb187  tb188
GM4_283  1    118    108    132    125    122    168    159
GM4_283  1    120    112    138    129    122    176    168
GM336    1    114    112    128    123    124    168    0
GM336    1    116    112    136    123    128    176    0
GM337    1    0      110    126    123    126    168    159
GM337    1    0      112    126    123    126    176    174
E276     2    118    108    0      0      122    166    165
E276     2    118    112    0      0      122    176    190
E277     2    116    108    126    119    124    168    159
E277     2    118    112    138    129    128    176    174
E278     2    114    108    124    121    122    168    162
E278     2    114    112    126    125    128    176    180

I used the script

> tems<-read.structure("TBtest1.str",row.marknames=1,onerowperind=FALSE, n.ind=190, n.loc=7, col.lab=1, col.pop=2, ask=TRUE)

Which brings up the following message, and I press RETURN:

 Which other optional columns should be read (press 'return' when done)?
1:

 Converting data from a STRUCTURE .stru file to a genind object...

Then I get this error:

Error in `colnames<-`(`*tmp*`, value = c("Ind", "Pop", "tb109", "tb117", :
  length of 'dimnames' [2] not equal to array extent

Has anybody got a fix to get my original STRUCTURE file to work as a genind object?

As an aside, on the forum I found a workaround under the title "Trouble converting to genid object", which converts a read.table document into genind. I followed it and got this error:

Warning message:
In .local(.Object, ...) : NAs introduced by coercion

and this file:

> head(as.matrix(obj1))
    V1 V2  V3  V4  V5  V6  V7  V8  V9
001 NA NA  NA  NA  NA  NA  NA  NA  NA
002 NA  1 118 108 132 125 122 168 159
003 NA  1 120 112 138 129 122 176 168
004 NA  1 114 112 128 123 124 168   0
005 NA  1 116 112 136 123 128 176   0
006 NA  1   0 110 126 123 126 168 159

So I would like a different solution to my problem if possible.

Many thanks
Charlotte
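
One possible workaround, independent of `read.structure`, is to read the table with base R and collapse the two rows per individual into a single genotype string per locus. The sketch below does this for a few rows and loci of the file shown above. The helper name `collapse_two_rows` and the "/" separator are illustrative choices, and treating 0 as the missing-data code is an assumption based on the file excerpt. Whether this avoids the `dimnames` error has not been checked here.

```r
# A few rows and loci of the STRUCTURE-style file from the message
# (two rows per individual, 0 assumed to code missing data).
txt <- "Ind Pop tb109 tb117 tb129
GM4_283 1 118 108 132
GM4_283 1 120 112 138
GM336 1 114 112 128
GM336 1 116 112 136
GM337 1 0 110 126
GM337 1 0 112 126"
raw <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: paste the two allele rows of each individual into
# "a/b" strings; genotypes containing the missing code become NA.
collapse_two_rows <- function(d, id = "Ind", pop = "Pop", missing = 0) {
  loci <- setdiff(names(d), c(id, pop))
  ids <- unique(d[[id]])
  out <- data.frame(pop = d[[pop]][match(ids, d[[id]])], row.names = ids)
  for (l in loci) {
    out[[l]] <- vapply(ids, function(i) {
      a <- d[[l]][d[[id]] == i]
      if (any(a == missing)) NA_character_ else paste(a, collapse = "/")
    }, character(1))
  }
  out
}

geno <- collapse_two_rows(raw)
print(geno)
```

The result has one row per individual, which is the layout `df2genind` accepts when given a `sep` argument. Something like `df2genind(geno[, -1], sep = "/", pop = geno$pop)` would be the natural next step if adegenet is installed; that call is not run here.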

From crypticlineage at gmail.com Sat Jul 25 16:56:08 2015
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Sat, 25 Jul 2015 10:56:08 -0400
Subject: [adegenet-forum] removing sites from genind/genpop objects
Message-ID:

Is it possible to remove certain sites from a genind or a genpop object? I only wish to retain biallelic sites and a few sites with 3 alleles have trickled into my data set. I could of course go back to VCF and remove those and work back through the pipeline, but a solution within Adegenet would make life much easier.

> extra <- mygenind@loc.nall
> extra2 <- subset(extra, extra > 2)
> length(extra2)
5
> extra2
 loc_156  loc_379 loc_1172 loc_1855 loc_2283
       3        3        3        3        3

From t.jombart at imperial.ac.uk Tue Jul 28 13:52:41 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 28 Jul 2015 11:52:41 +0000
Subject: [adegenet-forum] xvalDapc and group prediction accuracy
In-Reply-To:
References: <9D61E83C-092E-402D-B4D0-BE066EE3E4AA@ethz.ch> ,
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF58B2D@icexch-m1.ic.ac.uk>

Hi there

see the argument 'result' in xvalDapc. The difference you see is the difference between the mean % of successful prediction averaged over groups (default), and the overall % of successful prediction. These two quantities are increasingly different when sample sizes are unequal.

Cheers
Thibaut
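
To illustrate the distinction between group-mean and overall success, the following base R sketch computes both quantities on made-up labels. It does not call `xvalDapc`, and the group sizes and predictions are invented for illustration. With unequal group sizes, a single error in a small group pulls the group-mean success down much more than the overall success.

```r
# Made-up true and predicted labels for three groups of unequal size.
truth <- c(rep("A", 8), rep("B", 2), rep("C", 2))
pred  <- c(rep("A", 8), "B", "A", "C", "C")  # one of the two B samples is misassigned

correct <- truth == pred

# Overall success: proportion of all samples assigned correctly.
overall <- mean(correct)

# Group-mean success: success rate within each group, then averaged over groups.
per_group  <- tapply(correct, truth, mean)
group_mean <- mean(per_group)

print(per_group)
print(c(overall = overall, group_mean = group_mean))
```

Here one misassigned sample out of twelve costs about 8 percentage points overall, but about 17 points in the group mean, because it halves the success rate of a group with only two members.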

From t.jombart at imperial.ac.uk Tue Jul 28 16:31:01 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 28 Jul 2015 14:31:01 +0000
Subject: [adegenet-forum] How can I filter out loci out of HWE from a genind object?
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF5AE44@icexch-m1.ic.ac.uk>

Hi there,

this is easy to do since adegenet 2.0.0. You can use x[loc=...] where ... is any usual subsetting, which will be matched against locNames(x). You can use it to retain relevant loci, or discard the unwanted ones.

For instance:

> library(adegenet)
Loading required package: ade4

   /// adegenet 2.0.0 is loaded ////////////

   > overview: '?adegenet'
   > tutorials/doc/questions: 'adegenetWeb()'
   > bug reports/feature resquests: adegenetIssues()

> data(nancycats)
> locNames(nancycats)
[1] "fca8"  "fca23" "fca43" "fca45" "fca77" "fca78" "fca90" "fca96" "fca37"
## removing loci 2, 4 and 5
> toRemove= c(2,4,5)
> x=nancycats[loc=-toRemove]
> x
/// GENIND OBJECT /////////

 // 237 individuals; 6 loci; 76 alleles; size: 102.9 Kb

 // Basic content
   @tab: 237 x 76 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 8-18)
   @loc.fac: locus factor for the 76 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual (range: 2-2)
   @type: codom
   @call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)

 // Optional content
   @pop: population of each individual (group size range: 9-23)
   @other: a list containing: xy

## note the difference from locNames(nancycats):
> locNames(x)
[1] "fca8"  "fca43" "fca78" "fca90" "fca96" "fca37"

Cheers
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place,
London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart

From crypticlineage at gmail.com Tue Jul 28 16:52:10 2015
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Tue, 28 Jul 2015 10:52:10 -0400
Subject: [adegenet-forum] How can I filter out loci out of HWE from a genind object?
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570ABF5AE44@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA6570ABF5AE44@icexch-m1.ic.ac.uk>
Message-ID:

Hi Thibaut,

This does not seem to work. Perhaps I am doing something wrong. Here is an example:

head(locNames(mygenind))
[1] "loc_121" "loc_229" "loc_472" "loc510" "loc688"

## Method 1
rmloci <- c(1,3)
mygenind2 <- mygenind[loci=-rmloci]
Warning message:
In .local(x, i, j, ..., drop = drop) :
  the following specified loci do not exist: -1the following specified loci do not exist: -3

## Method 2
rmloci2 <- c("loc_121","loc_472")
mygenind2 <- mygenind[loci=-rmloci2]
Error in -rmloci : invalid argument to unary operator

sessionInfo()
adegenet_2.0.0
ade4_1.6-2
R 3.1.2

Thanks
V

From t.jombart at imperial.ac.uk Tue Jul 28 17:08:11 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 28 Jul 2015 15:08:11 +0000
Subject: [adegenet-forum] How can I filter out loci out of HWE from a genind object?
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA6570ABF5AE44@icexch-m1.ic.ac.uk>,
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF5AEEE@icexch-m1.ic.ac.uk>

Hmmm... weird.

Method #2 will not work, but #1 should.
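
A note on Method 2 in the message above: R's unary minus is defined for numeric indices, not for character vectors, so negating a vector of locus names raises an error before any subsetting happens. A common way around this is to build the set of names to keep with `setdiff()`. The sketch below shows the idea on a plain character vector standing in for `locNames(mygenind)`. The p-values and the 0.05 cutoff are made up for illustration, echoing the HWE filtering question that started this thread.

```r
# Stand-in for locNames(mygenind); names taken from the message above.
loc_names <- c("loc_121", "loc_229", "loc_472", "loc510", "loc688")

# Negating a character vector is an error in R; capture the message.
neg <- tryCatch(-c("loc_121", "loc_472"),
                error = function(e) conditionMessage(e))
print(neg)

# Drop loci by name: keep everything that is not in the removal list.
to_remove <- c("loc_121", "loc_472")
keep <- setdiff(loc_names, to_remove)
print(keep)

# Same idea driven by per-locus test results (made-up HWE p-values).
hwe_p <- c(loc_121 = 0.001, loc_229 = 0.40, loc_472 = 0.03,
           loc510 = 0.75, loc688 = 0.20)
keep_hwe <- names(hwe_p)[hwe_p >= 0.05]
print(keep_hwe)
```

Given the description that `x[loc=...]` is matched against `locNames(x)`, passing such a vector as `mygenind[loc = keep]` should be the corresponding genind call; this is untested here. Note that the argument in the nancycats example is spelled `loc`, whereas the two methods above use `loci`.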
Can you try this and tell us what you get: library(adegenet) data(nancycats) nancycats[loc=-1] One possibility is that your adegenet 2.0.0 was the old from github. So maybe trying again after install.packages("adegenet") would be worth a try? Cheers Thibaut ________________________________ From: Vikram Chhatre [crypticlineage at gmail.com] Sent: 28 July 2015 15:52 To: adegenet-forum at lists.r-forge.r-project.org Cc: Jombart, Thibaut Subject: Re: [adegenet-forum] How can I filter out loci out of HWE from a genind object? Hi Thibaut, This does not seem to work. Perhaps I am doing something wrong. Here is an example: head(locNames(mygenind)) [1] "loc_121" "loc_229" "loc_472" "loc510" "loc688" ## Method 1 rmloci <- c(1,3) mygenind2 <- mygenind[loci=-rmloci] Warning message: In .local(x, i, j, ..., drop = drop) : the following specified loci do not exist: -1the following specified loci do not exist: -3 ## Method 2 rmloci2 <- c("loc_121","loc_472") mygenind2 <- mygenind[loci=-rmloci2] Error in -rmloci : invalid argument to unary operator sessionInfo() adegenet_2.0.0 ade4_1.6-2 R 3.1.2 Thanks V On Tue, Jul 28, 2015 at 10:31 AM, Jombart, Thibaut > wrote: Hi there, this is easy to do since adegenet 2.0.0. You can use x[loc=...] where ... is any usual subsetting, which will be matched against locNames(x). You can use it to retain relevant loci, or discard the unwanted ones. 
For instance: > library(adegenet) Loading required package: ade4 /// adegenet 2.0.0 is loaded //////////// > overview: '?adegenet' > tutorials/doc/questions: 'adegenetWeb()' > bug reports/feature resquests: adegenetIssues() > data(nancycats) > locNames(nancycats) [1] "fca8" "fca23" "fca43" "fca45" "fca77" "fca78" "fca90" "fca96" "fca37" ## removing loci 2, 4 and 5 > toRemove= c(2,4,5) > x=nancycats[loc=-toRemove] > x /// GENIND OBJECT ///////// // 237 individuals; 6 loci; 76 alleles; size: 102.9 Kb // Basic content @tab: 237 x 76 matrix of allele counts @loc.n.all: number of alleles per locus (range: 8-18) @loc.fac: locus factor for the 76 columns of @tab @all.names: list of allele names for each locus @ploidy: ploidy of each individual (range: 2-2) @type: codom @call: .local(x = x, i = i, j = j, loc = ..1, drop = drop) // Optional content @pop: population of each individual (group size range: 9-23) @other: a list containing: xy ## note the difference from locNames(nancycats): > locNames(x) [1] "fca8" "fca43" "fca78" "fca90" "fca96" "fca37" Cheers Thibaut ============================== Dr Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Department of Infectious Disease Epidemiology Imperial College - School of Public Health Norfolk Place, London W2 1PG, UK Tel. : 0044 (0)20 7594 3658 http://sites.google.com/site/thibautjombart/ http://sites.google.com/site/therepiproject/ http://adegenet.r-forge.r-project.org/ Twitter: @thibautjombart ________________________________ From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Ana Verissimo [verissimoac at gmail.com] Sent: 20 July 2015 15:58 To: adegenet-forum at lists.r-forge.r-project.org Subject: [adegenet-forum] How can I filter out loci out of HWE from a genind object? Hi all, I have a data set comprised on 1000s of SNP loci genotyped for 10s of individuals. 
I have tested loci for conformance to HWE and want to filter out those loci out of HWE.

Is there a function already available to perform this filtering in genind objects?
If not, I am guessing I may need to figure out a way to subset my data using a list of loci that I want to keep?

Any tips or suggestions are welcome!
Ana

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
-------------- next part --------------
An HTML attachment was scrubbed...

From crypticlineage at gmail.com Tue Jul 28 17:17:24 2015
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Tue, 28 Jul 2015 11:17:24 -0400
Subject: [adegenet-forum] How can I filter out loci out of HWE from a genind object?
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570ABF5AEEE@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA6570ABF5AE44@icexch-m1.ic.ac.uk> <2CB2DA8E426F3541AB1907F98ABA6570ABF5AEEE@icexch-m1.ic.ac.uk>
Message-ID:

Indeed. I was using the GitHub version of adegenet 2.0.0. After reinstalling the package directly, method #1 is now working.

Thanks
V

On Tue, Jul 28, 2015 at 11:08 AM, Jombart, Thibaut wrote:

> Hmmm... weird.
>
> Method #2 will not work, but #1 should. Can you try this and tell us what
> you get:
>
> library(adegenet)
> data(nancycats)
> nancycats[loc=-1]
>
> One possibility is that your adegenet 2.0.0 was the old from github. So
> maybe trying again after
>
> install.packages("adegenet")
>
> would be worth a try?
> Cheers
> Thibaut

-------------- next part --------------
An HTML attachment was scrubbed...
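Method 2 in this thread fails because unary minus is not defined for character vectors in R, so names cannot be negated the way positions can. A minimal base-R sketch of two ways around this, using a plain character vector as a stand-in for locNames(x); the locus names and variable names are illustrative assumptions, not taken from a real dataset:

```r
# Stand-in for locNames(mygenind); names are illustrative only
loc_names <- c("loc_121", "loc_229", "loc_472", "loc_510", "loc_688")
rm_names  <- c("loc_121", "loc_472")

# Option 1: turn the names into positions, then use negative indexing
rm_idx <- match(rm_names, loc_names)
stopifnot(!anyNA(rm_idx))        # every name to remove must exist
keep_by_index <- loc_names[-rm_idx]

# Option 2: keep every name that is not in the removal list
keep_by_name <- setdiff(loc_names, rm_names)

print(keep_by_index)
print(keep_by_name)
```

Assuming the loc argument accepts names as described in the thread (it is matched against locNames(x)), either result should then be usable as x[loc=keep_by_name] on a genind object.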
From t.jombart at imperial.ac.uk Tue Jul 28 17:57:30 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 28 Jul 2015 15:57:30 +0000
Subject: [adegenet-forum] Problem converting structure file to genind (length of 'dimnames' [2] not equal to array extent)
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF5AF91@icexch-m1.ic.ac.uk>

Hi Charlotte,

thanks for a detailed report.

STRUCTURE files are a bit of a pain because of all the possible options, so in general if you can get them in a different format (e.g. GENETIX, Fstat) it will make your life easier.

Here, your instruction seems okay given the sample of the file, but the file itself seems wrong. STRUCTURE's doc suggests you should remove 'Ind' and 'Pop' from the first line.
http://pritchardlab.stanford.edu/structure_software/release_versions/v2.3.4/structure_doc.pdf

Once that is done, it works on the sample provided.

Was this file generated by STRUCTURE directly?

If the STRUCTURE standards have changed, please file an issue and we'll adapt read.structure:
https://github.com/thibautjombart/adegenet/issues

Best
Thibaut

________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Charlotte Hurry [charlotte.hurry at griffithuni.edu.au]
Sent: 24 July 2015 05:08
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] Problem converting structure file to genind (length of 'dimnames' [2] not equal to array extent)

Hello

I have been trying to convert my structure file to a genind object but I am having some issues. I am getting this error and I can't find an appropriate fix. I have looked through forum history and found a similar issue but the fix suggested didn't work for me.

my file looks like this:

Ind     Pop tb109 tb117 tb129 tb130 tb136 tb187 tb188
GM4_283 1   118   108   132   125   122   168   159
GM4_283 1   120   112   138   129   122   176   168
GM336   1   114   112   128   123   124   168   0
GM336   1   116   112   136   123   128   176   0
GM337   1   0     110   126   123   126   168   159
GM337   1   0     112   126   123   126   176   174
E276    2   118   108   0     0     122   166   165
E276    2   118   112   0     0     122   176   190
E277    2   116   108   126   119   124   168   159
E277    2   118   112   138   129   128   176   174
E278    2   114   108   124   121   122   168   162
E278    2   114   112   126   125   128   176   180

I used the script

> tems<-read.structure("TBtest1.str",row.marknames=1,onerowperind=FALSE, n.ind=190, n.loc=7, col.lab=1, col.pop=2, ask=TRUE)

Which brings up the following message, and I press RETURN:

 Which other optional columns should be read (press 'return' when done)?
1:
 Converting data from a STRUCTURE .stru file to a genind object...

Then get this error:

Error in `colnames<-`(`*tmp*`, value = c("Ind", "Pop", "tb109", "tb117", :
  length of 'dimnames' [2] not equal to array extent

Has anybody got a fix to get my original STRUCTURE file to work as a genind object?

As an aside, on the forum I found a workaround under the title "Trouble converting to genid object", which converts a read.table document into genind. I followed it and got this error:

Warning message:
In .local(.Object, ...) : NAs introduced by coercion

and this file:

> head(as.matrix(obj1))
    V1 V2  V3  V4  V5  V6  V7  V8  V9
001 NA NA  NA  NA  NA  NA  NA  NA  NA
002 NA  1 118 108 132 125 122 168 159
003 NA  1 120 112 138 129 122 176 168
004 NA  1 114 112 128 123 124 168   0
005 NA  1 116 112 136 123 128 176   0
006 NA  1   0 110 126 123 126 168 159

So I would like a different solution to my problem if possible.

Many thanks
Charlotte

-------------- next part --------------
An HTML attachment was scrubbed...
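The suggestion in this thread, removing 'Ind' and 'Pop' from the first line so that it holds marker names only, can be sketched in base R. This is a minimal sketch working on a temporary three-line sample; the file contents and the regular expression are assumptions for illustration, and a real file should be backed up before being rewritten:

```r
# Write a small sample resembling the file above to a temporary location
path <- tempfile(fileext = ".str")
writeLines(c(
  "Ind Pop tb109 tb117 tb129",
  "GM4_283 1 118 108 132",
  "GM4_283 1 120 112 138"
), path)

lines <- readLines(path)

# Drop the leading 'Ind' and 'Pop' labels so the first line lists markers only
lines[1] <- sub("^[[:space:]]*Ind[[:space:]]+Pop[[:space:]]+", "", lines[1])

writeLines(lines, path)
first_line <- readLines(path)[1]
print(first_line)
```

The cleaned file could then be passed to read.structure() with row.marknames=1, as in the call quoted above.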
From charlotte.hurry at griffithuni.edu.au Wed Jul 29 09:51:26 2015
From: charlotte.hurry at griffithuni.edu.au (Charlotte Hurry)
Date: Wed, 29 Jul 2015 17:51:26 +1000
Subject: [adegenet-forum] Problem converting structure file to genind (length of 'dimnames' [2] not equal to array extent)
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570ABF5AF91@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA6570ABF5AF91@icexch-m1.ic.ac.uk>
Message-ID:

Hello again

I hope you don't mind but I'd like to double check my file conversion to genind. I understand that missing values are transformed to NA but I don't understand why I have NA values at all loci for some individuals. I have read the information online regarding genind objects but found nothing that explains the conversion of allele data to the values in the genind object.

For instance my genind object is as follows:

> tab(obj[loc=1])
        tb109.118 tb109.120 tb109.114 tb109.116 tb109.112 tb109.122 tb109.108
GM4_283         1         1         0         0         0         0         0
GM336           0         0         1         1         0         0         0
GM337          NA        NA        NA        NA        NA        NA        NA
E276            2         0         0         0         0         0         0
E277            1         0         0         1         0         0         0
E278            0         0         2         0         0         0         0

Whereas my original data file is this:

          tb109 tb117 tb129 tb130 tb136 tb187 tb188
GM4_283 1   118   108   132   125   122   168   159
GM4_283 1   120   112   138   129   122   176   168
GM336   1   114   112   128   123   124   168    -9
GM336   1   116   112   136   123   128   176    -9
GM337   1    -9   110   126   123   126   168   159
GM337   1    -9   112   126   123   126   176   174
E276    2   118   108    -9    -9   122   166   165
E276    2   118   112    -9    -9   122   176   190
E277    2   116   108   126   119   124   168   159
E277    2   118   112   138   129   128   176   174

What is the mechanism that forces all values of GM336 to be NA values in the genind, and is this correct?

Thank you once again.
Charlotte

--
Charlotte Hurry
PhD Candidate
Australian Rivers Institute
Griffith University
07 37356655
http://www.griffith.edu.au/environment-planning-architecture/australian-rivers-institute/hdr-students/charlotte-hurry
-------------- next part --------------
An HTML attachment was scrubbed...

From t.jombart at imperial.ac.uk Wed Jul 29 11:27:13 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Wed, 29 Jul 2015 09:27:13 +0000
Subject: [adegenet-forum] Problem converting structure file to genind (length of 'dimnames' [2] not equal to array extent)
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA6570ABF5AF91@icexch-m1.ic.ac.uk>,
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF5B0C3@icexch-m1.ic.ac.uk>

Hi Charlotte,

the conversion looks OK to me. In genind, data are recoded as allele counts for each [locus.allele]. So if tb109 is "-9" (=missing), then all the alleles for tb109 are 'NA'.

If you want to double check that the conversion worked, it is probably best to compare your input to genind2df(x) where 'x' is your converted genind.

Best
Thibaut

________________________________
From: Charlotte Hurry [charlotte.hurry at griffithuni.edu.au]
Sent: 29 July 2015 08:51
To: Jombart, Thibaut
Cc: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] Problem converting structure file to genind (length of 'dimnames' [2] not equal to array extent)

Hello again

I hope you don't mind but I'd like to double check my file conversion to genind. I understand that missing values are transformed to NA but I don't understand why I have NA values at all loci for some individuals. I have read the information online regarding Genind objects but nothing that explains the conversion of allele data to the values in the Genind object.
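The recoding into allele counts discussed in this thread can be illustrated without adegenet. The following is a minimal base-R sketch, assuming two rows per diploid individual and -9 as the missing code; the count_alleles helper is hypothetical and only mimics the idea behind @tab, not adegenet's actual implementation:

```r
# Two rows per diploid individual at locus tb109, as in the file above
ind   <- c("GM4_283", "GM4_283", "GM336", "GM336", "GM337", "GM337")
tb109 <- c(118, 120, 114, 116, -9, -9)

# Hypothetical helper: count copies of each allele per individual,
# returning NA for every allele when the genotype is missing
count_alleles <- function(ind, allele, locus = "tb109", missing = -9) {
  alleles <- sort(unique(allele[allele != missing]))
  ids <- unique(ind)
  out <- matrix(0, nrow = length(ids), ncol = length(alleles),
                dimnames = list(ids, paste0(locus, ".", alleles)))
  for (id in ids) {
    a <- allele[ind == id]
    if (any(a == missing)) {
      out[id, ] <- NA              # missing genotype: whole locus is NA
    } else {
      for (v in a) {
        col <- paste0(locus, ".", v)
        out[id, col] <- out[id, col] + 1
      }
    }
  }
  out
}

counts <- count_alleles(ind, tb109)
print(counts)
```

In this sketch the individual typed as -9 gets NA in every tb109 column, while the other rows sum to 2 (the ploidy), which matches the pattern in the tab() output shown in the thread.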
-------------- next part --------------
An HTML attachment was scrubbed...

From jafdinizfilho at gmail.com Wed Jul 29 22:09:11 2015
From: jafdinizfilho at gmail.com (José Alexandre Felizola Diniz Filho)
Date: Wed, 29 Jul 2015 17:09:11 -0300
Subject: [adegenet-forum] pairwise Fst?
Message-ID:

Dear Thibaut

I've been using adegenet for some time now, great package!!! Thanks for working so hard on this!

I just got a message from a colleague who works with me and updated to the most recent version (July 2015), and indeed I see that the function pairwise.fst has just disappeared. I couldn't find it inside any other function, just the standard pairwise distances (Nei, Rogers, and so on). Can you help us with this? Any clues?

Cheers,

Alexandre (Diniz Filho)

--
Prof. Dr. José Alexandre Felizola Diniz Filho
CNPq PQ1A, FLS, Academia Brasileira de Ciências
Pró-Reitor de Pós-Graduação
Universidade Federal de Goiás
&
Depto. de Ecologia/UFG
Programa de Pós-Graduação em Ecologia & Evolução / Genética & Biologia Molecular
UFG
-------------- next part --------------
An HTML attachment was scrubbed...
From maierpa at gmail.com Thu Jul 30 07:34:30 2015
From: maierpa at gmail.com (Paul Maier)
Date: Wed, 29 Jul 2015 22:34:30 -0700
Subject: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570ABF4B4CC@icexch-m1.ic.ac.uk>
References: <52BD95B9-0483-4512-BF6D-FA914F4D48E8@gmail.com> <2CB2DA8E426F3541AB1907F98ABA6570ABF4B48A@icexch-m1.ic.ac.uk> <2CB2DA8E426F3541AB1907F98ABA6570ABF4B4CC@icexch-m1.ic.ac.uk>
Message-ID:

FYI - this is fixed in the tab function itself, but seems not to update the rest of the genind object when applied to genind@tab. So for example, if you import to genind, use tab(x, NA.method="mean") on the x@tab, then export using genind2df, it will fail.

Also, did the old NA.method="mean" replace missing values with median alleles? This version gives a mean, which is not ideal if other programs are expecting integers. I'm still using my above code as a workaround.

----------------------------------------------
Paul Maier
San Diego State, PhD Student
US Geological Survey, Biologist
The Biodiversity Group, Science Advisor

On Thu, Jul 16, 2015 at 4:25 AM, Jombart, Thibaut wrote:

> Fixed now:
> https://github.com/thibautjombart/adegenet/issues/71#issuecomment-121790358
>
> And readily available in the devel version:
>
> install.packages("devtools")
> library(devtools)
> install_github("thibautjombart/adegenet")
> library("adegenet")
>
> Cheers
> Thibaut
>
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Jombart, Thibaut [t.jombart at imperial.ac.uk]
> Sent: 16 July 2015 11:50
> To: Zhian Kamvar; adegenet-forum at lists.r-forge.r-project.org
> Subject: Re: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
>
> Looks like a bug indeed. Thanks for spotting it. Will fix today.
>
> Cheers
> Thibaut
>
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Zhian Kamvar [zkamvar at gmail.com]
> Sent: 16 July 2015 01:35
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: Re: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
>
> This smells like a bug. After poking around some, it is indeed one in
> read.fstat and read.genepop. (Both read.genetix and read.structure still
> work):
>
> > obj <- read.genepop(system.file("files/nancycats.gen",package="adegenet"))
>
>  Converting data from a Genepop .gen file to a genind object...
>
> File description: Genotypes of cats from 17 colonies of Nancy (France)
>
> ...done.
>
> > obj
> /// GENIND OBJECT /////////
>
>  // 237 individuals; 9 loci; 111 alleles; size: 138.5 Kb
>
>  // Basic content
>    @tab:  237 x 111 matrix of allele counts
>    @loc.n.all: number of alleles per locus (range: 8-18)
>    @loc.fac: locus factor for the 111 columns of @tab
>    @all.names: list of allele names for each locus
>    @ploidy: ploidy of each individual  (range: 2-2)
>    @type:  codom
>    @call: read.genepop(file = system.file("files/nancycats.gen", package = "adegenet"))
>
>  // Optional content
>    @pop: population of each individual (group size range: 9-23)
> > summary(obj)
>
>  # Total number of genotypes:  237
>
>  # Population sample sizes:
>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
> 10 22 12 23 15 11 14 10  9 11 20 14 13 17 11 12 13
>
>  # Number of alleles per locus:
>  fca8 fca23 fca43 fca45 fca77 fca78 fca90 fca96 fca37
>    17    11    10    10    12     8    12    13    18
>
>  # Number of alleles per population:
>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
> 37 53 50 67 48 56 43 54 43 46 73 53 44 62 42 40 37
>
>  # Percentage of missing data:
> [1] 0
>
>  # Observed heterozygosity:
>      fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
> 0.6118143 0.6666667 0.6793249 0.6455696 0.6329114 0.5654008 0.6497890 0.5949367 0.4514768
>
>  # Expected heterozygosity:
>      fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
> 0.8803076 0.7928751 0.7953319 0.7930531 0.8702576 0.6884669 0.8157881 0.7767630 0.6062686
>
> This will be reported and fixed.
>
> Cheers,
> Zhian
>
> > On Jul 15, 2015, at 11:16 , adegenet-forum-request at lists.r-forge.r-project.org wrote:
> >
> > On closer inspection, it appears the new version stores missing data as
> > alleles (i.e. *.00 in @tab). So using tab to replace the allele counts
> > doesn't work. For example, x@tab <- tab(x, NA.method="mean") does nothing
> > because missing data is stored as normal data. Here's a workaround I
> > created, although probably not the most clever method, it fixed my problem.
> > Hopefully this helps someone!
> > Paul
> >
> > # Fix missing values to reflect deprecated option, missing = "mean"
> > x@tab <- x@tab[,-grep("\\.00",colnames(x@tab))] #remove "00" alleles
> > rep <- gsub("([^\\.]+)\\.\\d+","\\1",colnames(x@tab)) #locus names
> > loci <- unique(x@loc.fac) #unique locus names
> > x@loc.fac <- as.factor(rep)
> > for (i in 1:length(x@all.names))
> >   if ("00" %in% x@all.names[[i]]) #remove "00" from allele names
> >     x@all.names[[i]] <- x@all.names[[i]][-which(x@all.names[[i]]=="00")]
> > for (i in 1:length(x@loc.n.all)) #remove "00" from allele counts
> >   x@loc.n.all[[i]] <- length(x@all.names[[i]])
> > for (i in 1:length(loci)) { #replace missing data with mean allele counts
> >   df <- data.frame(x@tab[,which(loci[i] == rep)]) #df, alleles for one locus
> >   for (j in 1:nrow(df)) {
> >     if (sum(df[j,]) == 0) {
> >       for (k in 1:length(df[j,])) { #mean allele counts from rows with data
> >         df[j,k] <- round(mean( df[which(apply(df,1,sum) != 0),k] ))
> >       }
> >       x@tab[j,which(loci[i] == rep)] <- as.numeric(df[j,])
> >     }
> >   }
> > }
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

-------------- next part --------------
An HTML attachment was scrubbed...

From simon.crameri at env.ethz.ch Thu Jul 30 11:52:29 2015
From: simon.crameri at env.ethz.ch (Crameri Simon)
Date: Thu, 30 Jul 2015 09:52:29 +0000
Subject: [adegenet-forum] error in find.clusters hierarchically
In-Reply-To:
References: <9D61E83C-092E-402D-B4D0-BE066EE3E4AA@ethz.ch>
Message-ID: <0DA3F8F0-60A5-420C-A4B1-DEF4DA070207@ethz.ch>

Hi Thibaut

There seems to be a problem with the function find.clusters if applied hierarchically using the argument clust:

> test

   #####################
   ### Genind object ###
   #####################
- genotypes of individuals -

S4 class:  genind
@call: .local(x = x, i = i, j = j, drop = drop)

@tab:  396 x 209 matrix of genotypes

@ind.names: vector of  396 individual names
@loc.names: vector of  20 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the  209 columns of @tab
@all.names: list of  20 components yielding allele names for each locus
@ploidy:  2
@type:  codom

Optional contents:
@pop:  - empty -
@pop.names:  - empty -

@other: - empty -

> str(clustfac)
 Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1 1 1 1 ...
This is where the error appears:

> hclust <- find.clusters(test, clust=clustfac)
Looking for sub-clusters in cluster A
Choose the number PCs to retain (>=1): 32
Error: number of cluster centres must lie between 1 and nrow(x)

> table(clustfac)
clustfac
  A   B   C   D
 33  30 138 195

If not applied hierarchically, find.clusters will take the maximal available n.pca, even if the n.pca argument passed is greater than nrow(x), right? Here, 33 individuals have factor level "A", therefore 32 is NOT outside 1 and nrow(x). The same error appears when using any number of PCs, say 2, 5, 30 or 400.

What went wrong? Any help is greatly appreciated.

Best,
Simon

PhD Candidate
Institute of Integrative Biology
ETH Zurich
Switzerland
-------------- next part --------------
An HTML attachment was scrubbed...

From t.jombart at imperial.ac.uk Thu Jul 30 12:17:34 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 30 Jul 2015 10:17:34 +0000
Subject: [adegenet-forum] pairwise Fst?
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF5B29B@icexch-m1.ic.ac.uk>

Dear Alexandre

All Fst functions have been moved to hierfstat (devel version for now) as of adegenet 2.0.0. This is documented in the new "basics" tutorial:
https://github.com/thibautjombart/adegenet/wiki/Tutorials

(p. 42 onwards)

pairwise.fst is documented on p. 44.

Best
Thibaut

-------------- next part --------------
An HTML attachment was scrubbed...

From gend at sun.ac.za Thu Jul 30 11:24:28 2015
From: gend at sun.ac.za (Diedericks, G, Me )
Date: Thu, 30 Jul 2015 09:24:28 +0000
Subject: [adegenet-forum] MSPA
Message-ID:

Good day,

I'm trying to run an MSPA for a freshwater fish species, sampled at 10 sites along a river. I have chosen the Delaunay triangulation, but need to edit the connections as some of the sites are below a dam wall, so the fish cannot move back up the river. Could you please assist me with this?

Kind regards,
Genevieve

________________________________
Genevieve Diedericks
PhD candidate ~ Zoology
Centre for Invasion Biology (C.I.B)
Department of Botany & Zoology
Stellenbosch University
South Africa
+27 (0) 21 808 4135
-------------- next part --------------
An HTML attachment was scrubbed...

From graham9c at gmail.com Wed Jul 29 23:51:37 2015
From: graham9c at gmail.com (Carly Graham)
Date: Wed, 29 Jul 2015 15:51:37 -0600
Subject: [adegenet-forum] DAPC
Message-ID:

Hello,

I have been looking at the population structure of whitefish in a small lake area. I used xvalDapc to determine the optimal number of PCs to retain for the dapc analysis.
When I look at the output from this xval command I see the following:

$`Median and Confidence Interval for Random Chance`
      2.5%        50%      97.5%
0.08354526 0.12267036 0.17258877

$`Mean Successful Assignment by Number of PCs of PCA`
       20        40        60        80       100       120       140       160       180
0.1319444 0.1694444 0.1680556 0.1875000 0.2013889 0.1291667 0.1666667 0.1250000 0.1125000

$`Number of PCs Achieving Highest Mean Success`
[1] "100"

$`Root Mean Squared Error by Number of PCs of PCA`
       20        40        60        80       100       120       140       160       180
0.8704578 0.8361065 0.8360719 0.8177360 0.8051124 0.8730468 0.8389395 0.8772458 0.8905042

$`Number of PCs Achieving Lowest MSE`
[1] "100"

From this I have interpreted that I should use 100 PCs. When I then run the dapc with 100 PCs and look at the output from summary(dapc), I have an assignment probability of 0.8701923. Where I am confused is how to interpret the "Mean Successful Assignment" from the above output. Does this also correspond to the assignment to populations? If so, is it more accurate to assume that the assignment probability is 0.2013889?

Thanks,

Carly Graham
PhD Candidate
University of Regina

From t.jombart at imperial.ac.uk Thu Jul 30 12:31:01 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 30 Jul 2015 10:31:01 +0000
Subject: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
In-Reply-To:
References: <52BD95B9-0483-4512-BF6D-FA914F4D48E8@gmail.com> <2CB2DA8E426F3541AB1907F98ABA6570ABF4B48A@icexch-m1.ic.ac.uk> <2CB2DA8E426F3541AB1907F98ABA6570ABF4B4CC@icexch-m1.ic.ac.uk>,
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF5B2E6@icexch-m1.ic.ac.uk>

Hi Paul,

yes, I think I wrote this a couple of times already, but possibly in another thread. As of adegenet 2.0.0, missing data are always stored as NA in genind objects. NA replacement will take place when extracting information from the object, typically using tab(...).
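A minimal sketch of the behaviour described here, assuming the nancycats example dataset that ships with adegenet (chosen only for illustration, it is not part of this thread): the genind object keeps its NA values, and replacement only happens in the matrix returned by tab().

```r
library(adegenet)

# Example data shipped with adegenet; it contains some missing genotypes.
data(nancycats)

# Default extraction (NA.method = "asis") returns the allele counts as stored,
# so missing data show up as NA.
raw <- tab(nancycats)
sum(is.na(raw))        # number of missing entries in the allele table

# Replacement happens in the extracted matrix only; nancycats is unchanged.
filled <- tab(nancycats, NA.method = "mean")
anyNA(filled)          # no NA should remain after mean replacement

# The stored object still holds the NAs.
anyNA(tab(nancycats))
```

Mean replacement generally produces non-integer values, so software that expects integer allele counts would still need separate handling.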
The user is not supposed to change the content of @tab manually.

The suggestion of replacing missing values with a median rather than a mean is interesting. If you know about common / useful practices currently not available, feel free to post an issue - this kind of feature is quick to add. Quick to do using:
adegenetIssues()

Best
Thibaut

________________________________
From: Paul Maier [maierpa at gmail.com]
Sent: 30 July 2015 06:34
To: Jombart, Thibaut
Cc: Zhian Kamvar; adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)

FYI - this is fixed in the tab function itself, but seems not to update the rest of the genind object when applied to genind@tab. So for example, if you import to genind, use tab(x, NA.method="mean") on the x@tab, then export using genind2df, it will fail. Also, did the old NA.method="mean" replace missing values with median alleles? This version gives a mean, which is not ideal if other programs are expecting integers. I'm still using my above code as a workaround.

----------------------------------------------
Paul Maier
San Diego State, PhD Student
US Geological Survey, Biologist
The Biodiversity Group, Science Advisor

On Thu, Jul 16, 2015 at 4:25 AM, Jombart, Thibaut wrote:

Fixed now:
https://github.com/thibautjombart/adegenet/issues/71#issuecomment-121790358

And readily available in the devel version:
install.packages("devtools")
library(devtools)
install_github("thibautjombart/adegenet")
library("adegenet")

Cheers
Thibaut

________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Jombart, Thibaut [t.jombart at imperial.ac.uk]
Sent: 16 July 2015 11:50
To: Zhian Kamvar; adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)

Looks like a bug indeed. Thanks for spotting it. Will fix today.
Cheers
Thibaut

________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Zhian Kamvar [zkamvar at gmail.com]
Sent: 16 July 2015 01:35
To: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)

This smells like a bug. After poking around some, it is indeed one in read.fstat and read.genepop. (Both read.genetix and read.structure still work):

> obj <- read.genepop(system.file("files/nancycats.gen",package="adegenet"))

 Converting data from a Genepop .gen file to a genind object...

File description: Genotypes of cats from 17 colonies of Nancy (France)

...done.

> obj
/// GENIND OBJECT /////////

 // 237 individuals; 9 loci; 111 alleles; size: 138.5 Kb

 // Basic content
   @tab: 237 x 111 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 8-18)
   @loc.fac: locus factor for the 111 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual (range: 2-2)
   @type: codom
   @call: read.genepop(file = system.file("files/nancycats.gen", package = "adegenet"))

 // Optional content
   @pop: population of each individual (group size range: 9-23)

> summary(obj)

# Total number of genotypes: 237

# Population sample sizes:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
10 22 12 23 15 11 14 10  9 11 20 14 13 17 11 12 13

# Number of alleles per locus:
 fca8 fca23 fca43 fca45 fca77 fca78 fca90 fca96 fca37
   17    11    10    10    12     8    12    13    18

# Number of alleles per population:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
37 53 50 67 48 56 43 54 43 46 73 53 44 62 42 40 37

# Percentage of missing data:
[1] 0

# Observed heterozygosity:
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
0.6118143 0.6666667 0.6793249 0.6455696 0.6329114 0.5654008 0.6497890 0.5949367 0.4514768

# Expected heterozygosity:
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
0.8803076 0.7928751 0.7953319 0.7930531 0.8702576 0.6884669 0.8157881 0.7767630 0.6062686

This will be reported and fixed.

Cheers,
Zhian

> On Jul 15, 2015, at 11:16 , adegenet-forum-request at lists.r-forge.r-project.org wrote:
>
> On closer inspection, it appears the new version stores missing data as
> alleles (i.e. *.00 in @tab). So using tab to replace the allele counts
> doesn't work. For example, x@tab <- tab(x, NA.method="mean") does nothing
> because missing data is stored as normal data. Here's a workaround I
> created, although probably not the most clever method, it fixed my problem.
> Hopefully this helps someone!
> Paul
>
> # Fix missing values to reflect deprecated option, missing = "mean"
> x@tab <- x@tab[,-grep("\\.00",colnames(x@tab))] #remove "00" alleles
> rep <- gsub("([^\\.]+)\\.\\d+","\\1",colnames(x@tab)) #locus names
> loci <- unique(x@loc.fac) #unique locus names
> x@loc.fac <- as.factor(rep)
> for (i in 1:length(x@all.names))
>   if ("00" %in% x@all.names[[i]]) #remove "00" from allele names
>     x@all.names[[i]] <- x@all.names[[i]][-which(x@all.names[[i]]=="00")]
> for (i in 1:length(x@loc.n.all)) #remove "00" from allele counts
>   x@loc.n.all[[i]] <- length(x@all.names[[i]])
> for (i in 1:length(loci)) { #replace missing data with mean allele counts
>   df <- data.frame(x@tab[,which(loci[i] == rep)]) #df, alleles for one locus
>   for (j in 1:nrow(df)) {
>     if (sum(df[j,]) == 0) {
>       for (k in 1:length(df[j,])) { #mean allele counts from rows with data
>         df[j,k] <- round(mean( df[which(apply(df,1,sum) != 0),k] ))
>       }
>       x@tab[j,which(loci[i] == rep)] <- as.numeric(df[j,])
>     }
>   }
> }

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From t.jombart at imperial.ac.uk Thu Jul 30 12:33:01 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 30 Jul 2015 10:33:01 +0000
Subject: [adegenet-forum] error in find.clusters hierarchically
In-Reply-To: <0DA3F8F0-60A5-420C-A4B1-DEF4DA070207@ethz.ch>
References: <9D61E83C-092E-402D-B4D0-BE066EE3E4AA@ethz.ch> , <0DA3F8F0-60A5-420C-A4B1-DEF4DA070207@ethz.ch>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF5B2FF@icexch-m1.ic.ac.uk>

Hi Simon,

by the look of your genind object, you are using an old version of adegenet. Can you update and check?

This is an old feature which probably has not been used much. We should think about how to integrate this with the new strata support in adegenet 2.0.0.
Cheers
Thibaut

________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Crameri Simon [simon.crameri at env.ethz.ch]
Sent: 30 July 2015 10:52
To:
Subject: [adegenet-forum] error in find.clusters hierarchically

Hi Thibaut

There seems to be a problem with the function find.clusters if applied hierarchically using the argument clust:

> test

   #####################
   ### Genind object ###
   #####################
- genotypes of individuals -

S4 class: genind
@call: .local(x = x, i = i, j = j, drop = drop)

@tab: 396 x 209 matrix of genotypes

@ind.names: vector of 396 individual names
@loc.names: vector of 20 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the 209 columns of @tab
@all.names: list of 20 components yielding allele names for each locus
@ploidy: 2
@type: codom

Optional contents:
@pop: - empty -
@pop.names: - empty -

@other: - empty -

> str(clustfac)
 Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1 1 1 1 ...

This is where the error appears:

> hclust <- find.clusters(test, clust=clustfac)
Looking for sub-clusters in cluster A
Choose the number PCs to retain (>=1): 32
Error: number of cluster centres must lie between 1 and nrow(x)

> table(clustfac)
clustfac
  A   B   C   D
 33  30 138 195

If not applied hierarchically, find.clusters will take the maximal available n.pca, even if the n.pca argument passed over is greater than nrow(x), right? Here, 33 individuals have factor level "A", therefore 32 is NOT outside 1 and nrow(x). The same error appears when using any number of PCs, say 2, 5, 30 or 400. What went wrong?

Any help is greatly appreciated.

Best,
Simon

PhD Candidate
Institute of Integrative Biology
ETH Zurich
Switzerland
From t.jombart at imperial.ac.uk Thu Jul 30 12:37:47 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 30 Jul 2015 10:37:47 +0000
Subject: [adegenet-forum] MSPA
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF5B34D@icexch-m1.ic.ac.uk>

Hi there,

if I remember well, chooseCN has an option to edit the graph manually / interactively. The interface is a bit clunky but it should do the trick. Just set the argument edit.nb=TRUE when creating your graph.

Cheers
Thibaut

From t.jombart at imperial.ac.uk Thu Jul 30 13:24:53 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 30 Jul 2015 11:24:53 +0000
Subject: [adegenet-forum] MSPA
In-Reply-To:
References: , <2CB2DA8E426F3541AB1907F98ABA6570ABF5B34D@icexch-m1.ic.ac.uk>,
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF5B374@icexch-m1.ic.ac.uk>

The graph may be too large for your computer to handle? Rendering of large graphs is typically slow on R's graphic devices. How many nodes do you have?
Best
Thibaut

ps: please keep the forum Cced

________________________________
From: Diedericks, G, Me [gend at sun.ac.za]
Sent: 30 July 2015 12:07
To: Jombart, Thibaut
Subject: Re: MSPA

Hi,

Thanks for the speedy reply! I did as you suggested, but my R session keeps on bombing out or hanging (I am using Rstudio). Any suggestions on how to fix this?

Thank you.

Kind regards,
Genevieve
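For readers following the edit.nb suggestion made earlier in this thread, here is a rough sketch of the calls involved. It assumes hypothetical random coordinates for 10 sites, since the real site coordinates are not given in the thread. In chooseCN, type = 1 selects the Delaunay triangulation, and the interactive editor is only started when R runs interactively.

```r
library(adegenet)

# Hypothetical coordinates for 10 sampling sites (not the real data).
set.seed(1)
xy <- matrix(runif(20), ncol = 2)

# Build the Delaunay triangulation without prompts and without plotting.
cn <- chooseCN(xy, ask = FALSE, type = 1, plot.nb = FALSE)
class(cn)    # an "nb" neighbourhood object
length(cn)   # one entry per site

# Interactive editing of the links (for example to remove connections
# across a barrier) needs a graphics device and user input, so it is
# guarded here.
if (interactive()) {
  cn_edited <- chooseCN(xy, ask = FALSE, type = 1, edit.nb = TRUE)
}
```

With a few hundred nodes the interactive plot may be slow to redraw, which is the concern raised in this thread.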
From gend at sun.ac.za Thu Jul 30 13:42:51 2015
From: gend at sun.ac.za (Diedericks, G, Me )
Date: Thu, 30 Jul 2015 11:42:51 +0000
Subject: [adegenet-forum] MSPA
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570ABF5B374@icexch-m1.ic.ac.uk>
References: , <2CB2DA8E426F3541AB1907F98ABA6570ABF5B34D@icexch-m1.ic.ac.uk>, , <2CB2DA8E426F3541AB1907F98ABA6570ABF5B374@icexch-m1.ic.ac.uk>
Message-ID:

Oops, sorry. I have 203 nodes...

Regards,
Genevieve

From caitiecollins at gmail.com Fri Jul 31 03:57:08 2015
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Fri, 31 Jul 2015 02:57:08 +0100
Subject: [adegenet-forum] Fwd: DAPC
In-Reply-To:
References:
Message-ID:

---------- Forwarded message ----------
From: Caitlin Collins
Date: Thu, Jul 30, 2015 at 3:04 PM
Subject: Re: [adegenet-forum] DAPC
To: Carly Graham

Hi,

I'm glad you asked.

First things first, you are correct in interpreting from the xvalDapc results that you should use 100 PCs when running DAPC. (Though, I should say that you don't actually need to *run* DAPC after running xvalDapc, as the output of xvalDapc contains a dapc object that has been made by running DAPC with the optimal number of PCs as indicated by the lowest RMSE).

Now, I think I may know where your confusion might lie: When you perform cross-validation with xvalDapc, the way it works is that DAPC gets run with varying numbers of PCs retained, *with some proportion of the data left out* (the proportion kept is specified by the argument "training.set", by default 0.9). So, what "Mean Successful Assignment" is telling you is that, after xvalDapc ran DAPC with 100 PCs using only 90% of the data, it was able to correctly place the left-out 10% of the data in the right group only 20.13889% of the time. The point of doing this is to identify the number of PCs that, when kept, allows us to generate a DAPC with the most informative and generalisable results.
By contrast, the part of the output from summary(dapc) to which I think you are referring (presumably you mean "assign.prop"?) is not actually a *probability*: it is a *proportion*. Moreover, it is the proportion of individuals that were successfully assigned to the correct group *when the DAPC was run with all of the data*. This helps to explain why it is usually much higher than the "Mean Successful Assignment" of xvalDapc, even with the same number of PCs. While cross-validation was trying to *predict* the likely group membership of unseen individuals whose data was not used to build the model, DAPC is reporting the percent successful assignment of individuals to their groups based on a model built with data from these individuals and all others in the dataset. Now that this is hopefully beginning to become a little more clear, I should say (in case you have, or will, notice this and become confused) that the method used by DAPC to assign individuals to groups (leading to the assignment *proportion* reported by summary(dapc)) is a probabilistic one. Essentially, DAPC makes a probabilistic assessment based on the model. DAPC first asks: "based on the coordinate system I have generated, and given the data from this individual, where should I place this individual in multivariate space?". Then, for some individuals, it is incredibly clear that they have ended up in a corner of multivariate space that is defined and occupied by the members of a given group. For others, however, who may be placed in the space at the edge of a group or between two groups, their "most likely true group" is less clear. DAPC calculates a probability that each individual belongs in each group (storing these in the "$posterior" slot of the dapc object), and assigns individuals to the group for which their posterior probability is highest. 
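The link between posterior probabilities, assignments and assign.prop can be checked directly on a dapc object. This is only a sketch: it uses the nancycats data shipped with adegenet rather than the whitefish data, and n.pca = 20 and n.da = 5 are arbitrary values picked for illustration.

```r
library(adegenet)

data(nancycats)
x   <- tab(nancycats, NA.method = "mean")  # allele counts, NAs replaced
grp <- pop(nancycats)                      # sampling colony of each cat

# DAPC on all the data; the numbers of axes here are arbitrary choices.
fit <- dapc(x, grp, n.pca = 20, n.da = 5)

# One row of posterior membership probabilities per individual.
dim(fit$posterior)

# Group with the highest posterior probability for each individual.
best <- colnames(fit$posterior)[apply(fit$posterior, 1, which.max)]
head(data.frame(original = fit$grp, assigned = fit$assign, best = best))

# assign.prop is the share of individuals whose assigned group matches
# their original group; it is a proportion, not a probability.
prop_by_hand <- mean(as.character(fit$assign) == as.character(fit$grp))
prop_by_hand
summary(fit)$assign.prop
```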
Note that the "assign.prop" element of the output of summary(dapc) does not actually contain any information on these probabilities directly, it just tells you what proportion of the ultimate assignments were correct.

Taking this all into consideration, if you were actually hoping to report an "assignment *probability*", the answer is not as straightforward as choosing either "assign.prop" or "Mean Successful Assignment". "Mean Successful Assignment" might tell you more about the ability of your model to make predictions about the group memberships of individuals that are not in your sample. *But*, if your sample happened to be a perfectly *representative* sample, this would be unfair. Clearly, given all the data in the sample, DAPC was able to build a model that accurately placed individuals in the correct group 87.01923% of the time. Furthermore, *this* is the DAPC model/output you are actually talking about (i.e. the one built with all the data, not just 90% of it), so this figure is more indicative of the success of your model.

Altogether, if you still wanted to make a statement about the ability of your model to make predictions about data that was not in your sample, the truth would presumably be somewhere in between those two numbers. But when talking about the DAPC outcome that you actually achieved with your dataset, the true successful assignment attained was 87.01923% (just don't call it a probability!).

Hope that helps.

Cheers,
Caitlin.
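The two figures compared in this thread can be reproduced side by side. The sketch below again assumes the nancycats example data, with n.pca.max = 60 and n.rep = 10 chosen only to keep the run short, so the numbers will not match those reported for the whitefish data.

```r
library(adegenet)

data(nancycats)
x   <- tab(nancycats, NA.method = "mean")
grp <- pop(nancycats)

set.seed(1)  # cross-validation draws random training sets
xval <- xvalDapc(x, grp, n.pca.max = 60, training.set = 0.9,
                 result = "groupMean", n.rep = 10, xval.plot = FALSE)

# Cross-validated success: the model is built on 90% of the individuals
# and asked to place the held-out 10%.
xval$`Mean Successful Assignment by Number of PCs of PCA`
xval$`Number of PCs Achieving Lowest MSE`

# xvalDapc also returns a DAPC fitted on all the data with the selected
# number of PCs; its assign.prop is the proportion of these same
# individuals that end up assigned to their original group.
resub <- summary(xval$DAPC)$assign.prop
resub
```

The first figure describes prediction for unseen individuals, the second describes the fit to the individuals used to build the model, so the second is usually the larger of the two.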