From mw14533.2014 at my.bristol.ac.uk Wed Nov 23 21:05:17 2016
From: mw14533.2014 at my.bristol.ac.uk (Max Williams)
Date: Wed, 23 Nov 2016 20:05:17 +0000
Subject: [adegenet-forum] PCoA

Dear All,

I have recently used PCoA in adegenet to produce the following graph, using the s.class function (shown in attachment).

Once I had defined my PCoA data as the variable "pca.kelp", I entered this into adegenet:

s.class(pca.cows$li, fac=pop(microbov), col=funky(15))
s.class(pca.kelp$li, fac=pop(genind1),
        col=transp(funky(15),.6),
        axesell=FALSE, cstar=0, cpoint=3)

I am fairly new to R statistics and would appreciate your time.

Many thanks,
Max

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pcoc,3axes.jpg
Type: image/jpeg
Size: 16081 bytes
Desc: not available

From roman.lustrik at biolitika.si Thu Nov 24 08:48:03 2016
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Thu, 24 Nov 2016 08:48:03 +0100 (CET)
Subject: [adegenet-forum] PCoA
Message-ID: <1646837837.194970.1479973683914.JavaMail.zimbra@biolitika.si>

Hello Max,

is there a question here? If you have problems with your code, it's best to provide a reproducible example. Feel free to use any of the datasets that come shipped with adegenet, or perhaps you could simulate your own data.

Cheers,
Roman

----
In god we trust, all others bring data.

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From j.kro365 at gmail.com Wed Nov 2 18:41:10 2016
From: j.kro365 at gmail.com (John Kronenberger)
Date: Wed, 02 Nov 2016 17:41:10 -0000
Subject: [adegenet-forum] Squared distance between groups

Hey there,

I'm aware of the mstree argument, but is there any way to output the squared distance between groups? For example, if I'm interested in the relative similarity between each group.

Best,

John

--
John A. Kronenberger
Master's Student and Teaching Assistant
Graduate Degree Program in Ecology
Department of Biology
Colorado State University
johnkronenberger.weebly.com

From roman.lustrik at biolitika.si Thu Nov 24 10:27:47 2016
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Thu, 24 Nov 2016 10:27:47 +0100 (CET)
Subject: [adegenet-forum] Fwd: PCoA
References: <1646837837.194970.1479973683914.JavaMail.zimbra@biolitika.si>
Message-ID: <1753299968.286478.1479979667007.JavaMail.zimbra@biolitika.si>

Max is asking how to remove the labels. This example was taken from the adegenet vignette (see page 57) and it gets you pretty close to what you want.

library(adegenet)

data(microbov)
mb <- scaleGen(microbov, NA.method = "mean")
pca.cows <- dudi.pca(mb, center = FALSE, scale = FALSE, scannf = FALSE, nf = 3)

par(mfrow = c(2, 1))
s.class(pca.cows$li, pop(microbov), xax = 1, yax = 3,
        col = transp(funky(15), .6), axesell = FALSE,
        cstar = 0, cpoint = 3, grid = FALSE)
colorplot(pca.cows$li[c(1,3)], pca.cows$li, transp = TRUE, cex = 3,
          xlab = "PC 1", ylab = "PC 3")
title("PCA of microbov dataset\naxes 1-3")
abline(v = 0, h = 0, col = "grey", lty = 2)

Cheers,
Roman

----
In god we trust, all others bring data.

From: "mw14533 2014"
To: "Roman Luštrik"
Sent: Thursday, November 24, 2016 9:39:45 AM
Subject: Re: [adegenet-forum] PCoA

Hello Roman,

The question I meant to ask was: how do I remove the labels from the graph? Sorry, I seem to have been quite unhelpful and forgot to put that in my initial email.

Many thanks,
Max

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rplot.jpeg
Type: image/jpeg
Size: 140648 bytes
Desc: not available

From thibautjombart at gmail.com Fri Nov 25 19:04:03 2016
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Fri, 25 Nov 2016 18:04:03 +0000
Subject: [adegenet-forum] Squared distance between groups

I would go for:
- dist.genpop
- various measures of pairwise Fst in hierfstat

Best
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR

From biz.sheedy at gmail.com Fri Nov 25 10:44:16 2016
From: biz.sheedy at gmail.com (Biz Sheedy)
Date: Fri, 25 Nov 2016 18:44:16 +0900
Subject: [adegenet-forum] Discrepancy in NA counts

Dear All,

I am trying to read SNP data from Stacks into adegenet. I have tried read.structure and read.genepop, but they both give (the same) NA counts that are higher than expected. Using read.table on the structure-formatted file (with "ind" and "pop" inserted into the first two columns of row one) gave the expected number of missing data.

I looked at a single population subset (both the original and the converted data) in excel and found a locus where, in the original data, all nine individuals were "3", but in the converted data one individual was "NA". The loci before and after this one both matched/were correct.

I am not sure what I have missed for this to happen; my R skills are beginner at best. Any help with reading the data in correctly would be greatly appreciated!

Thank you,
Elizabeth

R version 3.3.2
adegenet version 2.0.1

Data: 44 individuals, diploid, 4279 loci.

all <- read.structure("all_batch_1.stru", NA.char="0")

Total cells in excel: 376552
After read.structure/genepop: 44*8558=376552

0s in excel: 3952
0s after read.table; length(which(X==0)): 3952
NA after read.structure/genepop; sum(is.na(all$tab)): 4008
Difference: 56

Subset Chichi
Total cells: 77022
After read.structure/genepop: 9*8558=77022

0s in excel: 742
NA after read.structure/genepop; sum(is.na(chi$tab)): 756
Difference: 14

--
4-1-1 Amakubo
Department of Botany
National Museum of Nature and Science
Tsukuba, Ibaraki 305-0005
Japan

biz.sheedy at gmail.com

From panda143526 at gmail.com Fri Nov 25 16:23:47 2016
From: panda143526 at gmail.com (Da Pan)
Date: Fri, 25 Nov 2016 16:23:47 +0100
Subject: [adegenet-forum] question on scaleGen()

Dear Thibaut and adegenet users,

Thank you for your time and help. I probably have some more naive questions about adegenet.

I am attempting a PCA analysis on my SNP dataset with the following arguments:

test <- read.structure("batch_1.str", n.ind = 17, n.loc = 12451,
                       col.lab = 1, col.pop = 2, row.marknames = 1,
                       NA.char = "0")
test2 <- scaleGen(test, NA.method = "mean")

After this, R shows:

Warning message:
In .local(x, ...) :
  Some scaling values are null.
 Corresponding alleles are removed.

I checked pop(test); it returned:

 1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33
02 02 04 04 06 06 07 07 19 19 38 38 39 39 46 46 46
Levels: 02 04 06 07 19 38 39 46

How can I solve this problem? Thanks in advance.

Best wishes,
Da

From arsalan at pobox.com Sun Nov 27 22:03:04 2016
From: arsalan at pobox.com (Arsalan Emami-Khoyi)
Date: Mon, 28 Nov 2016 00:33:04 +0330
Subject: [adegenet-forum] scatter plot
Message-ID: <1480280584.2062924.800481737.58E61574@webmail.messagingengine.com>

Dear Dr. Jombart and other users,

I am not a very experienced user of adegenet!
I just followed the tutorials, and at the final stage, when I used scatter(dapc1), all I get is a few squares with numbers (screenshot attached) showing my clusters; the individuals are not shown!

I am wondering if anybody can help me to solve the issue? Apologies for the basic question.

Many thanks in advance

-------------- next part --------------
A non-text attachment was scrubbed...
Name: scatter.jpg
Type: image/jpeg
Size: 76470 bytes
Desc: not available

From roman.lustrik at biolitika.si Mon Nov 28 08:27:19 2016
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Mon, 28 Nov 2016 08:27:19 +0100 (CET)
Subject: [adegenet-forum] scatter plot
In-Reply-To: <1480280584.2062924.800481737.58E61574@webmail.messagingengine.com>
References: <1480280584.2062924.800481737.58E61574@webmail.messagingengine.com>
Message-ID: <922431229.570707.1480318039300.JavaMail.zimbra@biolitika.si>

Hi,

can you share the code you are using? On which dataset are you running it? If you are following the code from the vignette (along with the dataset that comes with adegenet), have you tried clearing your R session and running the script in a clean environment?

Cheers,
Roman

----
In god we trust, all others bring data.

From roman.lustrik at biolitika.si Mon Nov 28 08:31:07 2016
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Mon, 28 Nov 2016 08:31:07 +0100 (CET)
Subject: [adegenet-forum] Discrepancy in NA counts
Message-ID: <1051082989.570757.1480318267345.JavaMail.zimbra@biolitika.si>

Hi,

can you share (a subset of) the dataset? It's hard to pinpoint where things might be going wrong without some data in hand.

Cheers,
Roman

----
In god we trust, all others bring data.

From Oliver.Berry at csiro.au Mon Nov 28 08:31:16 2016
From: Oliver.Berry at csiro.au (Oliver.Berry at csiro.au)
Date: Mon, 28 Nov 2016 07:31:16 +0000
Subject: [adegenet-forum] scatter plot
In-Reply-To: <1480280584.2062924.800481737.58E61574@webmail.messagingengine.com>
References: <1480280584.2062924.800481737.58E61574@webmail.messagingengine.com>

Hi Arsalan,

It looks like you may have your labels (clab) set to 1 (or some other positive value), so that they are covering up your points. Perhaps your symbols (cex) are also set to either zero or a very small value. You might also have cstar set to zero, so you won't see any of the lines joining points to the centroid for that group. I suggest trying higher values for your cex and cstar. You could also set clab=0 to remove the labels to see if they're obscuring the points. I prefer to use a legend rather than direct labels.

Cheers,
Olly

Dr Oliver Berry
Senior Research Scientist | CSIRO Ocean and Atmosphere
Team Leader | Coastal Ecosystems and Modelling
Adjunct Senior Lecturer, School of Animal Biology, The University of Western Australia
Phone: +61 8 9333 6584 | 0400 747 197 | Fax: +61 8 9333 6499
oliver.berry at csiro.au | www.csiro.au/en/Research/OandA
Address: Centre for Environment and Life Sciences, Cnr Underwood Ave. & Brockway Rd, Floreat, WA, 6014
Postal: Private Mail Bag 5, Wembley, WA, 6913
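
The arguments mentioned in the reply above can be tried on one of the datasets shipped with adegenet. This is only a sketch: the dataset (microbov), the grouping (pop(microbov)) and the values n.pca = 20, n.da = 3, cex = 2 and solid = 0.6 are arbitrary assumptions chosen for illustration, not the original poster's settings.

```r
library(adegenet)

# Example data shipped with adegenet; stands in for the poster's own data.
data(microbov)

# n.pca and n.da are given explicitly so that dapc() does not prompt.
# Both values are arbitrary choices for this illustration.
dapc1 <- dapc(microbov, pop(microbov), n.pca = 20, n.da = 3)

scatter(dapc1,
        clabel   = 0,         # do not draw group labels over the points
        cstar    = 0,         # no lines joining points to group centroids
        cex      = 2,         # larger point symbols
        pch      = 20,
        solid    = 0.6,       # partly transparent points
        legend   = TRUE,      # identify groups in a legend instead
        posi.leg = "topleft")
```

Setting cstar back to 1 restores the lines to the centroids; clabel = 0 together with legend = TRUE keeps the group identities readable without boxes covering the individuals.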

From roman.lustrik at biolitika.si Mon Nov 28 10:03:39 2016
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Mon, 28 Nov 2016 10:03:39 +0100 (CET)
Subject: [adegenet-forum] Discrepancy in NA counts
References: <1051082989.570757.1480318267345.JavaMail.zimbra@biolitika.si>
Message-ID: <656891479.571300.1480323819198.JavaMail.zimbra@biolitika.si>

Hi,

I think the problem is that adegenet, for consistency, adds NAs to accommodate the extra alleles present for a particular locus. Take for example C_KH1238 (bottom row in the example pasted below). In the raw file, it has missing values for locus 1378_53, but this locus has three alleles, ergo 3 NAs and not 2. I can't go through all the NAs right now, but I think there's a pretty good chance this is what is causing the discrepancy between what you see in "excel" and in adegenet.

         1369_41.11 1372_14.22 1372_14.24 1373_9.44 1373_9.24 1377_42.44 1377_42.24 1378_53.22 1378_53.24 1378_53.44 1379_10.33 1379_10.13 1382_37.33
...
C_KH1238          0          1          0         1         0          1          0         NA         NA         NA          1          0          1
# notice 3 NAs for all available alleles for 1378_53, not just two (as expected for diploid)

Here is the code I used to explore this:

library(adegenet)

xy <- read.table("Sub_batch_1.stru", header = TRUE, sep = "\t")
xy <- xy[, c(-1, -2)]
table(as.matrix(xy))

#   0   1   2   3   4
#  16 467 618 760 867

xy <- read.structure("Sub_batch_1.stru", NA.char = "0",
                     n.ind = 44, n.loc = 31, onerowperind = FALSE,
                     col.lab = 1, col.pop = 2, row.marknames = 1,
                     sep = "\t", col.others = 0)

xy <- tab(xy)
xy[grepl("C_KH1238", rownames(xy)), grepl("1378_53", colnames(xy))]

Cheers,
Roman

----
In god we trust, all others bring data.

From: "Biz Sheedy"
To: "Roman Luštrik"
Sent: Monday, November 28, 2016 9:11:39 AM
Subject: Re: [adegenet-forum] Discrepancy in NA counts

My apologies. First time posting to a forum, so I am a little unsure of things. I have attached a subset of the data, which includes the locus that I saw had problems.

In this case there are 31 loci with 16 zeroes counted (excel), and 20 NAs counted (adegenet). The additional NAs occur in locus 1401_25.

Thanks so much,
Elizabeth

From biz.sheedy at gmail.com Mon Nov 28 11:00:53 2016
From: biz.sheedy at gmail.com (Biz Sheedy)
Date: Mon, 28 Nov 2016 19:00:53 +0900
Subject: [adegenet-forum] Discrepancy in NA counts
In-Reply-To: <656891479.571300.1480323819198.JavaMail.zimbra@biolitika.si>
References: <1051082989.570757.1480318267345.JavaMail.zimbra@biolitika.si> <656891479.571300.1480323819198.JavaMail.zimbra@biolitika.si>

Thanks for looking into this. Something that I did differently from the code you provided was that I only answered the prompts for the read.structure function. This meant I did not use sep="\t", and the number of alleles was 62 instead of 72, which I think should be comparable to the excel count. Following the code you provided, is.na finds 23 NAs (instead of 20 NAs at 62 alleles and 16 zeroes in excel).
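
The counting argument in this exchange (zeroes in the raw file versus NAs in the allele table) can be checked on a toy example in base R. This is a sketch under stated assumptions: the data, the locus names locA and locB, and the helper allele_table() are all made up for illustration, and the helper only mimics the individual-by-allele layout; it is not adegenet's own code.

```r
# Toy diploid data in two-row STRUCTURE style: one column per locus,
# two rows per individual, 0 = missing. All values are invented.
raw <- data.frame(
  ind  = rep(c("A", "B", "C"), each = 2),
  locA = c(1, 3,  3, 3,  0, 0),   # alleles 1 and 3 observed; C is missing
  locB = c(2, 4,  1, 1,  0, 0),   # alleles 1, 2 and 4 observed; C is missing
  stringsAsFactors = FALSE
)

# Missing calls as a spreadsheet would count them: one per cell.
zeros_in_raw <- sum(raw[, c("locA", "locB")] == 0)

# Hypothetical helper: build an individual x allele count table for one locus.
allele_table <- function(raw, locus) {
  calls   <- raw[[locus]]
  alleles <- sort(unique(calls[calls != 0]))
  inds    <- unique(as.character(raw$ind))
  out <- matrix(0, nrow = length(inds), ncol = length(alleles),
                dimnames = list(inds, paste(locus, alleles, sep = ".")))
  for (i in inds) {
    g <- calls[raw$ind == i]
    if (any(g == 0)) {
      out[i, ] <- NA               # a missing genotype blanks every allele column
    } else {
      for (a in g) {
        nm <- paste(locus, a, sep = ".")
        out[i, nm] <- out[i, nm] + 1
      }
    }
  }
  out
}

tab_toy <- cbind(allele_table(raw, "locA"), allele_table(raw, "locB"))
nas_in_table <- sum(is.na(tab_toy))

zeros_in_raw   # 4: two missing genotypes x two rows each
nas_in_table   # 5: 2 allele columns at locA + 3 allele columns at locB
```

In this layout a missing genotype contributes one NA per allele observed at that locus, so the NA total can differ from the number of zero cells whenever a locus has more or fewer than two alleles.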

Your explanation makes sense to me for the additional three NAs in adegenet, but I still don't understand how, in locus 1401_25, the data for two individuals (C_KH1059 and M_KH1834) changed from being homozygous for "3" to being "NA". I would really appreciate any further help on this.

Thanks again,
Elizabeth

--
4-1-1 Amakubo
Department of Botany
National Museum of Nature and Science
Tsukuba, Ibaraki 305-0005
Japan

biz.sheedy at gmail.com

From roman.lustrik at biolitika.si Mon Nov 28 13:40:24 2016
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Mon, 28 Nov 2016 13:40:24 +0100 (CET)
Subject: [adegenet-forum] Discrepancy in NA counts
References: <1051082989.570757.1480318267345.JavaMail.zimbra@biolitika.si> <656891479.571300.1480323819198.JavaMail.zimbra@biolitika.si>
Message-ID: <237564767.572607.1480336824232.JavaMail.zimbra@biolitika.si>

Hi Elizabeth,

it would appear there is something funky happening with the code due to locus names being numeric. This has happened before in some other function. Until we fix this, you can change your locus names so that they start with a letter.
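
One way to apply this workaround before the file is read is sketched below in base R. It rests on assumptions that may not match the real file: that the first row holds only the tab-separated marker names, that a prefix of "L" is acceptable, and that the file contents shown here (written to a temporary file) are a stand-in for the actual data.

```r
# Write a tiny stand-in for a STRUCTURE file whose first row holds marker names.
# The contents are invented; only the layout matters for this sketch.
infile <- tempfile(fileext = ".stru")
writeLines(c("1401_25\t1403_13",
             "C_KH1059\t1\t3\t1",
             "C_KH1059\t1\t3\t3"), infile)

txt <- readLines(infile)

# Split the header row into marker names and prefix each non-empty name
# with a letter so that it no longer starts with a digit.
markers <- strsplit(txt[1], "\t", fixed = TRUE)[[1]]
markers <- ifelse(nzchar(markers), paste0("L", markers), markers)
txt[1]  <- paste(markers, collapse = "\t")

# Write the renamed file; the genotype rows are left untouched.
outfile <- tempfile(fileext = ".stru")
writeLines(txt, outfile)

readLines(outfile)[1]
```

The renamed file can then be passed to read.structure in place of the original. Writing to a new file rather than overwriting keeps the Stacks output intact for comparison.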

Here is the excerpt from the genind object indicating that these two samples have alleles 33:

         X1401_25.13 X1401_25.33 X1403_13.11 X1403_13.13 X1403_13.33 X1404_17.13 X1404_17.33 X1404_17.11
C_KH1059           0           1           1           0           0           0           1           0
M_KH1834           0           1           1           0           0           1           0           0

Cheers,
Roman

----
In god we trust, all others bring data.

From perezalquicira at gmail.com Wed Nov 30 14:28:52 2016
From: perezalquicira at gmail.com (Jessica Perez Alquicira)
Date: Wed, 30 Nov 2016 13:28:52 -0000
Subject: [adegenet-forum] tetraploid DAPC

Hi,

I would like to do a DAPC on tetraploid data. My file is in STRUCTURE format.
I have not found this information in the manual. Could you please let me know how I could do that?

Best