From t.jombart at imperial.ac.uk Tue Nov 4 13:20:37 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 4 Nov 2014 12:20:37 +0000
Subject: [adegenet-forum] Hybridize Function / df2genind error message
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABE6D2D6@icexch-m1.ic.ac.uk>

Hi Spencer,

you don't need to convert genind objects to data.frames - just subset individuals in the genind object as you would rows in a matrix. Then you can pool datasets with potentially different alleles using 'repool'.

Cheers
Thibaut

________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org on behalf of Spencer Bruce [goatsrunfaster at gmail.com]
Sent: 27 October 2014 13:57
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] Hybridize Function / df2genind error message

Hello All,

After hybridizing two populations, I converted the genind object to a data frame to randomly extract individuals. I then attempted to convert this data frame back into a genind object, but got the error message below:

> F1_G1 <- df2genind(randomF1)
Error in df2genind(randomF1) :
  2 alleles cannot be coded by a total of 19 characters

I'm assuming this is because the "pop" column, instead of being coded by a number, contains the text generated by the hybridize function, "honnedaga-tdhybrids".

I tried to resolve this using the following code, but ran into a second error message:

> randomF1$pop[randomF1$pop == "honnedaga-tdhybrids"] <- 1
Warning message:
In `[<-.factor`(`*tmp*`, randomF1$pop == "honnedaga-tdhybrids", :
  invalid factor level, NA generated

Any idea how I might be able to fix this? Thanks in advance!!!

-Spencer

--
Spencer A Bruce
200 Washington St.
Troy, NY 12180
518 225 0787
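Thibaut's suggestion above could be sketched roughly as follows. This is a minimal illustration, assuming the adegenet package is installed. It uses the package's bundled microbov data rather than Spencer's data, and the object names (salers, zebu, f1, f1_sub, pooled) and sample sizes are invented for the example.

```r
library(adegenet)

# Example data shipped with adegenet (stand-in for the real populations)
data(microbov)
pops   <- seppop(microbov)   # list of genind objects, one per population
salers <- pops$Salers
zebu   <- pops$Zebu

# Simulate 40 F1 hybrids; the result is already a genind object
f1 <- hybridize(salers, zebu, n = 40, pop = "F1")

# Randomly keep 20 hybrids by indexing rows, as for a matrix
set.seed(1)
keep   <- sample(nrow(f1@tab), 20)
f1_sub <- f1[keep, ]

# Pool the subset with a parental sample; repool() handles differing alleles
pooled <- repool(f1_sub, salers)
pooled
```

Because the subsetting happens on the genind object itself, no round trip through a data frame (and no recoding of the "pop" column) is needed.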
From goatsrunfaster at gmail.com Tue Nov 4 13:30:53 2014
From: goatsrunfaster at gmail.com (Spencer Bruce)
Date: Tue, 4 Nov 2014 07:30:53 -0500
Subject: [adegenet-forum] Hybridize Function / df2genind error message
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570ABE6D2D6@icexch-m1.ic.ac.uk>
References: <2CB2DA8E426F3541AB1907F98ABA6570ABE6D2D6@icexch-m1.ic.ac.uk>

Got it, thanks.

--
Spencer A Bruce
113 Hill St.
Troy, NY 12180
518 225 0787

From t.jombart at imperial.ac.uk Tue Nov 4 15:08:03 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 4 Nov 2014 14:08:03 +0000
Subject: [adegenet-forum] find.clusters without PCA
References: <43B55DB4-31DF-4C47-A4E7-F10B05131A3A@imperial.ac.uk>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABE6D35E@icexch-m1.ic.ac.uk>

Dear all,

naive questions are of course welcome here. Both the question and the answer make sense, though Fede's answer makes me think he is sometimes so rude he could be French ;)

Seriously though, the pre-PCA step has two purposes:
1) reduce the number of variables to its minimum
2) separate the noise from the structured signal

If you are not interested in #2, #1 still has a computational interest. find.clusters uses k-means, which works with squared Euclidean distances between individual profiles. Generally speaking, when you have 'N' individuals and 'P' alleles, the number of dimensions necessary to represent all the information (all the distances) is min(N-1, P). K-means works faster with fewer variables, so running it on 'N-1' principal components (PCs) is generally faster than on 'P' alleles. If all PCs are retained, there is no loss of information.

So in short, you don't need to remove the PCA step, just keep all PCs. Makes sense?

Cheers
Thibaut

________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org on behalf of Roberto Oliveira Santos [roberto at geodev.com.br]
Sent: 30 October 2014 18:41
To: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] find.clusters without PCA

Hi Federico

"shaming reputations"?
Sorry..., pretty sure I don't have any reputation :-)
If anyone asks a naive question, this should be the response? I disagree...
Anyway, thanks for the text. I'll keep it in mind.

Cheers,
Roberto

2014-10-30 16:16 GMT+00:00 Federico Calboli:

You're welcome. I would not be presenting the results to referees, PhD examiners or colleagues.

http://judgestarling.tumblr.com/post/79974811093/shaming-reputations-as-a-means-of-reducing-the

Happy reading!

F

On 30 Oct 2014, at 16:02, Roberto Oliveira Santos wrote:

> Dear Federico
>
> Many thanks. Very kind of you the "It would also be completely and utterly idiotic.".
>
> Best wishes
>
> Roberto
>
> 2014-10-30 15:56 GMT+00:00 Federico Calboli:
> On 30 Oct 2014, at 15:40, Roberto Oliveira Santos wrote:
>
> > Dear all
> >
> > Is it possible to run find.clusters without the PCA analysis?
>
> I would not know whether find.clusters would like it, but in general you can surely find clusters without bothering with a PCA first - you have a formula, you input some data, you get your results.
>
> It would also be completely and utterly idiotic.
>
> You use a PCA before because of correlation between the data, and you transform the data with a PCA in a set of independent variables (and you also have an idea of what linear combinations explain little or nothing in the bargain). You use a PCA to get some signal out of the noise.
>
> So, you can well not use a PCA and cluster. You will get some results, that might, or not, look like the results you get after a PCA decomposition. You will also have biased your clustering to an unknown amount, in a way that is not clear what might actually mean.
>
> BW
>
> F
>
> > I am interested in the clustering procedure but would like to compare the results with and without PCA transformation.
> >
> > Best wishes
> >
> > Roberto

From t.jombart at imperial.ac.uk Tue Nov 4 15:12:58 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 4 Nov 2014 14:12:58 +0000
Subject: [adegenet-forum] problems adding predicted points to scatter plot
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABE6D372@icexch-m1.ic.ac.uk>

Hi there,

for this it is best to re-use the example in the DAPC tutorial, p. 40, though I suspect this is what you have done already. I'd need to play with a reproducible example, but I think the problem comes from using add.scatter.eig, implicitly called when you ask for either screeplot to be added in scatter.dapc. This changes the coordinate system. The best thing to do is disable these, add the new points, then add the screeplot last, as in the example.

Cheers
Thibaut

________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org on behalf of Andres Schjønhaug Susrud [andres.susrud at gmail.com]
Sent: 30 October 2014 20:02
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] problems adding predicted points to scatter plot

Dear list,

I'm having problems adding points to a dapc scatter plot.

grp = find.clusters(human_DR_bind_2[1:200,])
dapc1 <- dapc(human_DR_bind_2[1:200,],grp$grp)
pred.sup <- predict.dapc(dapc1, newdata=x.sup2)
names(pred.sup)
scatter(dapc1, cell=2.5, pch=1, cstar=0, axesel=FALSE, col=c(2,3,4))
par(xpd=T)
points(pred.sup$ind.scores[,1],pred.sup$ind.scores[,2],pch = 2,col = 6)

The problem is that the predicted points are all visible, but completely out of place.
When plotting dapc1$ind.scores directly:

plot(dapc1$ind.scores[,1],dapc1$ind.scores)
points(pred.sup$ind.scores[,1],pred.sup$ind.scores[,2],pch = 2,col = 6)

the alignment seems fine.

Thanks for any help on this matter.

BR
Andres

From goatsrunfaster at gmail.com Fri Nov 7 15:47:53 2014
From: goatsrunfaster at gmail.com (Spencer Bruce)
Date: Fri, 7 Nov 2014 09:47:53 -0500
Subject: [adegenet-forum] Random error message

Hello All,

I'm working with a large set of data where I need to repeatedly randomly sample individuals, and I keep getting random error messages, seemingly for no reason. Example below:

> randomF12Gen15 <- randomF12Gen14[sample(nrow(randomF12Gen14$tab), 199, ]
Error: unexpected ']' in "randomF12Gen15 <- randomF12Gen14[sample(nrow(randomF12Gen14$tab), 199, ]"

This does not make sense given that I ran multiple lines of code before it that look exactly the same.

Any idea why this might be happening?

--
Spencer A Bruce

From caitiecollins at gmail.com Fri Nov 7 15:55:12 2014
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Fri, 7 Nov 2014 14:55:12 +0000
Subject: [adegenet-forum] Random error message

Hi there,

I'm guessing the previous lines of code didn't look exaaactly the same... You're missing a closing curved bracket at the right end of sample. Should be:

randomF12Gen15 <- randomF12Gen14[sample(nrow(randomF12Gen14$tab), 199), ]

See if that works now?

Best,
Caitlin.

From goatsrunfaster at gmail.com Fri Nov 7 15:56:12 2014
From: goatsrunfaster at gmail.com (Spencer Bruce)
Date: Fri, 7 Nov 2014 09:56:12 -0500
Subject: [adegenet-forum] Random error message

Thanks! How embarrassing!

--
Spencer A Bruce

From caitiecollins at gmail.com Fri Nov 7 15:57:11 2014
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Fri, 7 Nov 2014 14:57:11 +0000
Subject: [adegenet-forum] Random error message

Not at all. The only reason I saw it is because I make that mistake all the time.

Cheers,
Caitlin.

From Julia.Bischof at uksh.de Thu Nov 13 14:24:10 2014
From: Julia.Bischof at uksh.de (Julia.Bischof at uksh.de)
Date: Thu, 13 Nov 2014 13:24:10 +0000
Subject: [adegenet-forum] question on function df2genind - big dataframes (R package adegenet)
Message-ID: <100E3CA58B060C44AE1FC97A7440CBE131C3C499@SBEXLP04.zad.local>

Hello adegenet-forum-members,

I wanted to use your R package adegenet to convert a dataframe into genind-class. My dataframe contains ~9800 samples (in rows) and ~87600 markers (in columns). Genotypes are coded like "AG", "TT", and so on.

Example (5x5 extract):

       marker1  marker2  marker3  marker4  marker4
Samp1  GA       GA       AA       TT       TT
Samp2  AA       AA       CA       CT       TT
Samp3  GA       GA       CA       TT       TT
Samp5  GA       AA       CA       TT       TT
Samp6  AA       AA       AA       TT       TT

I have been trying to run df2genind for days now (df2genind(dataframe, ploidy=as.integer(2))), but it's not finishing or maybe not working.

The question now is: can your function handle dataframes as big as this, or could there be another problem? (It's running on a quite good server, so it shouldn't be due to too little server capacity/memory.)

I also tried to convert parts of the dataframe (always 5000 markers, all ~9800 individuals) to genind and afterwards use your repool command. But here we have the problem that the genind objects don't have the same length in the end.

I could also use PED/MAP or .raw files from PLINK as input, but in the end I need a genind object!

Do you have any idea how to solve this problem?
Best Regards,
Julia Bischof

Universitätsklinikum Schleswig-Holstein

From massimiliano.virgilio at africamuseum.be Tue Nov 18 15:03:08 2014
From: massimiliano.virgilio at africamuseum.be (Virgilio Massimiliano)
Date: Tue, 18 Nov 2014 14:03:08 +0000
Subject: [adegenet-forum] DAPC a priori grouping

Dear Thibaut,

I'm trying to group populations of a genind object according to region and country of origin. The idea was to perform DAPCs and maximise differences between individuals from different regions/countries.

I first added the region/country info to the @other slot of the genind object and then tried to perform DAPC:

dapc_region <- dapc(data, grp=data@other$region, scale=FALSE, n.pca=20, n.da=5)

Yet all I could get was a DAPC using populations as defined by the @pop slot, so I tried to modify the @pop slot:

region=read.table("region.txt",head=T)
data841@pop<-region

but I got:

Error in checkAtAssignment("genind", "pop", "data.frame") :
  assignment of an object of class "data.frame" is not valid for @'pop' in an object of class "genind"; is(value, "factorOrNULL") is not TRUE

Is this just something I cannot do on a genind object? Any suggestion on how to easily change the a priori grouping of individuals (@pop) and perform DAPC?

Many thanks in advance and all the best
Massi

From tzbaris at icloud.com Thu Nov 20 18:13:44 2014
From: tzbaris at icloud.com (Tara Baris)
Date: Thu, 20 Nov 2014 12:13:44 -0500
Subject: [adegenet-forum] Fst values

Hi,

I'm trying to calculate Fst values for about 10,000 loci in two different populations using the following line of code:

pairwise.fst(x, pop=NULL, res.type=c("dist","matrix"), truenames=TRUE)

I have two questions:

1) How would I specify the populations in this line? If I included a population definition file when converting to a genetix file, is that sufficient, and would it still be pop=NULL?

2) When I try to run this in R, I get the following error message. Am I missing something?

> pairwise.fst(NJ155_gen, pop=NULL, res.type=c("dist","matrix"), truenames=TRUE)
Error in sub(paste("^.{", ncode/ploidy, "}", sep = ""), "", X) :
  invalid regular expression '^.{543}', reason 'Invalid contents of {}'

Thank you,
Tara
From massimiliano.virgilio at africamuseum.be Fri Nov 21 22:18:18 2014
From: massimiliano.virgilio at africamuseum.be (Virgilio Massimiliano)
Date: Fri, 21 Nov 2014 21:18:18 +0000
Subject: [adegenet-forum] DAPC a priori grouping
Message-ID: <763E6840-174B-45D6-A7E1-145F992341D3@africamuseum.be>

got it:

# DAPC REGION
# change genind@pop
region_pop=read.table("region_pop.txt",head=T)
data1585_region <- data1585
data1585_region@pop <- as.factor(region_pop[,1])

# change genind@pop.names
region_pop.names=read.table("region_pop.names.txt",head=T)
popnames.region <- as.character(region_pop.names[,1])
names(popnames.region) <- paste('P',c(1:6), sep = '')
data1585_region@pop.names <- popnames.region

dapc_region <- dapc(data1585_region, grp=data1585@pop, scale=FALSE, n.pca=30, n.da=5)
scatter(dapc_region, col=funky(6), clabel=.5, scree.pca=TRUE, ratio.pca=0.15, scree.da=TRUE, ratio.da=0.15)
mtext("dapc region 1585", side = 3, line = 3)

many thanks to Fred ;-)
M.

__________________________________
Massimiliano Virgilio, PhD
Royal Museum for Central Africa
Leuvensesteenweg 13, B-3080 Tervuren, Belgium, +32 (0) 27695366
massimiliano.virgilio at africamuseum.be
http://www.africamuseum.be/home/contact/staff/VIRGILIO_Massimiliano/

From t.jombart at imperial.ac.uk Thu Nov 27 14:27:16 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 27 Nov 2014 13:27:16 +0000
Subject: [adegenet-forum] Fst values
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABE85BB6@icexch-m1.ic.ac.uk>

Hi there,

for 1) please check the manpage - "pop" is there to specify populations!

As for 2), it is hard to find out without knowing how NJ155_gen has been created. This should be a genind object, with something in @pop as you leave pop=NULL. A minimal reproducible example would be useful.

Cheers
Thibaut

From coulsonmw at gmail.com Fri Nov 28 11:04:23 2014
From: coulsonmw at gmail.com (Mark Coulson)
Date: Fri, 28 Nov 2014 10:04:23 +0000
Subject: [adegenet-forum] data input

Hello,

Apologies for the simple question, but I have imported a genepop file into adegenet (although I couldn't do this via 'read.genepop' and instead had to use 'import2genind'). My individuals are labelled as POPNAME_1, POPNAME_2, and it correctly identifies both the population name and the individual identifier. However, how do I add something to the @other slot? For example, I'd like to classify each of the populations according to which river they belong to.

Many thanks,
Mark
From vojta at trapa.cz Fri Nov 28 11:16:38 2014
From: vojta at trapa.cz (Vojtěch Zeisek)
Date: Fri, 28 Nov 2014 11:16:38 +0100
Subject: [adegenet-forum] data input
Message-ID: <1523898.fccyOmciy8@veles>

Hello

On Fri 28 November 2014 10:04:23, Mark Coulson wrote:
> Hello,
>
> Apologies for the simple question but I have imported a genepop file into
> adegenet (although couldn't do this via 'read.genepop', instead had to use
> 'import2genind'). My individuals are labelled as POPNAME_1, POPNAME_2, and
> it correctly identifies both the population name as well as the individual
> identifier. However, how do I add in something to the @other slot. For
> example, I'd like to classify each of the populations as to which river
> they belong to.

Would something like

YourData$other <- YourSortingVariable

do the job? YourSortingVariable should have the same length as the number of individuals in YourData (genind object) and it should be a factor, e.g.:

YourSortingVariable <- c("Danube", "Danube", "Moldau", "Moldau", ...)

There is also the slot pop (YourData$pop), which is used to store population information and which some functions can use directly.

HTH,
Vojtěch

--
Vojtěch Zeisek
http://trapa.cz/en/

Department of Botany, Faculty of Science
Charles University in Prague
Benátská 2, Prague, 12801, CZ
http://botany.natur.cuni.cz/en/

Institute of Botany, Academy of Science
Zámek 1, Průhonice, 25243, CZ
http://www.ibot.cas.cz/en/

Czech Republic

From coulsonmw at gmail.com Fri Nov 28 11:26:09 2014
From: coulsonmw at gmail.com (Mark Coulson)
Date: Fri, 28 Nov 2014 10:26:09 +0000
Subject: [adegenet-forum] data input

I should mention that I have a data set with ~12,000 individuals. Is there a way to add in an @other column to a dataframe and have adegenet read it as such?

From vojta at trapa.cz Fri Nov 28 11:43:04 2014
From: vojta at trapa.cz (Vojtěch Zeisek)
Date: Fri, 28 Nov 2014 11:43:04 +0100
Subject: [adegenet-forum] data input
Message-ID: <1765980.SfevQD7J6C@veles>

Hello

On Fri 28 November 2014 10:26:09, Mark Coulson wrote:
> I should mention that I have a data set with ~12,000 individuals. Is there
> a way to add in an @other column to a dataframe and have adegenet read it
> as such?

You can use the same approach I suggested. You can generate the variable using, for example, the function rep(). See ?rep

Two examples:

rep(c("Danube", "Moldau"), each=600)
rep(c("Danube", "Moldau"), times=c(650, 550))

I don't use genepop, so I'm not sure. I usually use the read.loci() function, which has the parameter col.pop, where You give the number of the column with population information. I think import2genind isn't able to do what You wish. But I'd suggest You extract the respective column from Your dataset into a separate file and read it into YourSortingVariable with read.csv or read.table, e.g.

YourSortingVariable <- read.csv("rivers.csv", header=TRUE, sep="\t", quote="", row.names=1)

(see ?read.csv for all possibilities) and then, as I suggested previously, do

YourData$other <- YourSortingVariable

This should work.

Yours,
Vojtěch
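The approach described above could be sketched with adegenet's bundled nancycats data. This is only an illustration under stated assumptions: the river names and the population-to-river lookup are invented, and because the @other slot of a genind object is documented as a list, the factor is stored here as a named element (other$river) rather than assigned to the whole slot.

```r
library(adegenet)

data(nancycats)   # example genind object with populations already set
x <- nancycats

# Hypothetical lookup: one river per population (river names are made up)
pop_levels   <- levels(pop(x))
river_of_pop <- rep(c("Danube", "Moldau"), length.out = length(pop_levels))
names(river_of_pop) <- pop_levels

# Expand to one value per individual and store it as a named element of @other
x@other$river <- factor(unname(river_of_pop[as.character(pop(x))]))

table(x@other$river)
```

For a large data set, the lookup could instead be read from a two-column file (population, river) with read.csv and matched to pop(x) in the same way.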
From coulsonmw at gmail.com Fri Nov 28 12:57:04 2014
From: coulsonmw at gmail.com (Mark Coulson)
Date: Fri, 28 Nov 2014 11:57:04 +0000
Subject: [adegenet-forum] data input

Hello,

I am trying to read in a dataframe (rather than a genepop or other format) with 1 column as population, 1 column as individual and 1 column as River (which I would like as the @other tab). I've coded my alleles as 2-digit codes with no separator. In Excel this looks like 1134 or 0230, for example.

I've done

df2genind(x, sep=NULL, ind.names=x$Individual, pop=x$Population, missing="0", ploidy=2)

but I get

Error in df2genind(scot, sep = NULL, ind.names = scot$Individual, pop = scot$Population, :
  2 alleles cannot be coded by a total of 27 characters

I notice that R is reading any genotype beginning with a 0 as 230 instead of 0230. I may also want to include some additional 'other' variables such as coordinates, so I am looking for options for bringing in a dataframe instead of particular genetic formats.

Thanks,
Mark

From t.jombart at imperial.ac.uk Fri Nov 28 13:36:31 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Fri, 28 Nov 2014 12:36:31 +0000
Subject: [adegenet-forum] data input
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABE85ECF@icexch-m1.ic.ac.uk>

Hello Mark,

yes, R will read stuff looking like integers as integers, so you want to specify the type as 'character' when reading the file in, and make sure characters are not converted to factors automatically (option stringsAsFactors).

I suspect "x" here may still contain non-allele data, so you may want to double check that.

Cheers
Thibaut

From vojta at trapa.cz Fri Nov 28 13:41:25 2014
From: vojta at trapa.cz (Vojtěch Zeisek)
Date: Fri, 28 Nov 2014 13:41:25 +0100
Subject: [adegenet-forum] data input
Message-ID: <1783704.xqBiuESER8@veles>

Hello

On Fri 28 November 2014 11:57:04, Mark Coulson wrote:
> Hello,
>
> I am trying to read in a dataframe (rather than a genepop or other format)
> with 1 column as population, 1 column as individual and 1 column as River
> (which I would like as the @other tab). I've coded my alleles as 2-digit

As far as I know, the data frame cannot carry the river information through the conversion; You have to add it manually later.

> codes with no separator. In excel this looks like 1134 or 0230 for example.
>
> I've done df2genind(x, sep=NULL, ind.names=x$Individual, pop=x$Population,
> missing="0", ploidy=2) but I get

When there is no separator, you need to specify the number of characters for each locus (ncode).

> Error in df2genind(scot, sep = NULL, ind.names = scot$Individual, pop =
> scot$Population, :
> 2 alleles cannot be coded by a total of 27 characters

The function cannot determine how many characters code each allele (as there is no separator, etc.).

> I notice that R is reading any genotype beginning with a 0 as 230 instead
> of 0230. I may also want to include some additional 'other' variables such
> as coordinates so looking for options for bringing in a dataframe instead
> of particular genetic formats.

As I've already written, you can fill in additional information later. In the end, I think that will be easier...

> Thanks,
> Mark

Good luck,
Vojtěch

--
Vojtěch Zeisek
http://trapa.cz/en/

Department of Botany, Faculty of Science
Charles University in Prague
Benátská 2, Prague, 12801, CZ
http://botany.natur.cuni.cz/en/

Institute of Botany, Academy of Science
Zámek 1, Průhonice, 25243, CZ
http://www.ibot.cas.cz/en/

Czech Republic

From coulsonmw at gmail.com Fri Nov 28 14:22:00 2014
From: coulsonmw at gmail.com (Mark Coulson)
Date: Fri, 28 Nov 2014 13:22:00 +0000
Subject: [adegenet-forum] data input from dataframe
Message-ID:

OK, so I got rid of my 'River' column, so now column 1 is a population label and column 2 is an individual identifier. The remaining columns are the genotype data, now read as 4 digits (e.g. 0206, 1220, etc.).
scot <- read.table("Scotland_adegenet_no_river_names.txt", header=TRUE, sep="\t", quote="\"", colClasses="character", stringsAsFactors=FALSE)

Executing the conversion via

df2genind(scot, sep="", ind.names=scot$Individual, pop=scot$Population, missing="0", ploidy=2, type="codom", ncode=4)

I still get the following error:

Error in as.matrix(as.data.frame(strsplit(X, sep))) :
  error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Error in data.frame(c("A", "n", "n", "A", "e"), c("A", "n", "n", "A", :
  arguments imply differing number of rows: 5, 10, 8, 7, 9, 12, 11, 16, 17, 14, 13, 15, 18, 25, 27, 23, 21, 26, 20, 19, 22, 24, 6, 4, 1

While I can get the genepop file accepted, being able to read in a data frame as described would be much easier given what I am hoping to do, the size of the datasets, etc.

Cheers,
Mark

From t.jombart at imperial.ac.uk Fri Nov 28 14:39:58 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Fri, 28 Nov 2014 13:39:58 +0000
Subject: [adegenet-forum] data input from dataframe
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABE85F15@icexch-m1.ic.ac.uk>

Hi,

from ?df2genind:

"
The function 'df2genind' converts a data.frame (or a matrix) into a genind object. The data.frame must meet the following requirements:
- genotypes are in row (one row per genotype)
- markers are in columns
- each element is a string of characters coding alleles
"

'scot' does not fulfil these requirements, as it contains non-marker data. You probably want to use something like:

df2genind(scot[, -(1:2)], ...)
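The suggestion to drop the first two columns can be sketched in base R as follows. The small 'scot' table is a made-up stand-in for the real data, and the names 'markers', 'pops' and 'ids' are hypothetical helpers. The df2genind call is left as a comment because its remaining arguments depend on the data and on the adegenet version.

```r
# Hypothetical stand-in for 'scot': two label columns, then marker columns.
scot <- data.frame(Population = c("A", "A", "B"),
                   Individual = c("ind1", "ind2", "ind3"),
                   Loc1 = c("0206", "1220", "0206"),
                   Loc2 = c("1134", "0230", "1134"),
                   stringsAsFactors = FALSE)

# Keep the labels aside and pass only the genotype columns to the conversion.
pops <- scot$Population
ids <- scot$Individual
markers <- scot[, -(1:2)]

# Sanity check: every remaining cell should be a genotype string of one width.
widths <- unique(nchar(unlist(markers)))
print(names(markers))
print(widths)

# The conversion would then be called on 'markers' only, for example:
# df2genind(markers, ind.names = ids, pop = pops, ...)
```

Checking the string widths before converting is a cheap way to spot label columns or truncated genotypes that are still in the table.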
Cheers
Thibaut

From cristenwatt at trentu.ca Sat Nov 29 05:37:45 2014
From: cristenwatt at trentu.ca (Cristen Watt)
Date: Fri, 28 Nov 2014 23:37:45 -0500
Subject: [adegenet-forum] Use of other projected coordinate systems (e.g. Lambert conformal conic) in the sPCA
Message-ID:

Hello,

I have a question regarding conversion to UTM and the way the sPCA handles spatial data. I have lynx sampling locations from Western Canada and Alaska.
I was originally given these locations in the North America Lambert Conformal Conic projection. However, I saw that the sPCA should be done using UTM. I converted my points to lat/long in ArcMap, then tried using convUL to convert to UTM. However, I am concerned about this method because my coordinates span a wide geographical extent: the convUL conversion is not recommended for more than one zone to the right or left of the central zone, and my data span 10 zones (from Western Alaska to Eastern Alberta). I tried using convUL with the most central zone, but the connection network looked quite distorted, and I am concerned about the possibility of the 'erroneous results' mentioned in the PBSmapping package.

Here is my main question: since the projection I originally used (North America Lambert Conformal Conic) is already in meters, is it possible to use these coordinates without conversion to UTM? Will the sPCA perform the correct analyses using this projection?

Thank you,
Cristen

From coulsonmw at gmail.com Sat Nov 29 11:39:46 2014
From: coulsonmw at gmail.com (Mark Coulson)
Date: Sat, 29 Nov 2014 10:39:46 +0000
Subject: [adegenet-forum] data input & reading error
Message-ID:

Thanks for the help. It now converts to a genind object; however, I notice that it thinks all 17 of my markers have 10 alleles each (which they don't!). Also, when I run

y <- summary(x)

I notice that 'Percent of missing data' is 0 (again, it isn't). I can still run other commands and analyses, but obviously these are meaningless if the alleles aren't interpreted correctly.

Thanks,
Mark
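One way to cross-check the allele counts and the missing-data percentage outside adegenet is to compute them directly from the genotype strings. The sketch below is base R only and rests on assumptions the thread does not confirm: each genotype is a 4-character string made of two 2-character alleles, and a missing genotype is coded "0000". The table and the helper count_alleles are hypothetical.

```r
# Hypothetical genotype table: 4-character strings, two 2-character alleles each.
markers <- data.frame(Loc1 = c("0206", "1220", "0000", "0206"),
                      Loc2 = c("1134", "0230", "1134", "1111"),
                      stringsAsFactors = FALSE)

# Assumed code for a missing genotype; adjust to whatever the data really use.
missing_code <- "0000"

# Count the distinct alleles at one locus, ignoring missing genotypes.
count_alleles <- function(g) {
  g <- g[g != missing_code]
  alleles <- c(substr(g, 1, 2), substr(g, 3, 4))
  length(unique(alleles))
}

n_alleles <- sapply(markers, count_alleles)
pct_missing <- 100 * mean(unlist(markers) == missing_code)

print(n_alleles)
print(pct_missing)
```

If these numbers disagree with what summary() reports for the genind object, the genotype strings are probably being split or the missing code interpreted differently from what was intended.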