From roman.lustrik at biolitika.si Sat Dec 2 12:02:37 2017
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Sat, 2 Dec 2017 12:02:37 +0100 (CET)
Subject: [adegenet-forum] Question about seploc error: "Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent"
Message-ID: <1851487191.521659.1512212557605.JavaMail.zimbra@biolitika.si>

Sorry for the late reply. Can you make a small subset of your data that demonstrates this problem?

----
In god we trust, all others bring data.

From: "Rob Syme"
To: adegenet-forum at lists.r-forge.r-project.org
Cc: "muhammadqudratullah farooqi"
Sent: Monday, November 20, 2017 10:41:11 AM
Subject: [adegenet-forum] Question about seploc error: "Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent"

When I try to run seploc on a genind object (output from gl.read.silicodart in dartR v0.93), I get this error:

    > is.genind(gl)
    [1] TRUE
    > seploc(gl)
    Error in dimnames(x) <- dn :
      length of 'dimnames' [2] not equal to array extent
    10. `colnames<-`(`*tmp*`, value = seq(ncol(tab)))
    9. .local(.Object, ...)
    8. initialize(value, ...)
    7. initialize(value, ...)
    6. new("genind", ...)
    5. FUN(X[[i]], ...)
    4. lapply(kX, genind, pop = x@pop, prevcall = prevcall, ploidy = x@ploidy, type = x@type)
    3. .local(x, ...)
    2. seploc(gl)
    1. seploc(gl)

Is there any way we can track down which peculiarity of our input data is causing the error?

Thanks,
Rob Syme
Curtin University

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From jason at joines.org Wed Dec 6 05:00:51 2017
From: jason at joines.org (Jason Paul Joines)
Date: Tue, 5 Dec 2017 23:00:51 -0500 (EST)
Subject: [adegenet-forum] what does 'd = ' mean in pca plots

adegenet Users,

I used the scatter function to plot the glPca of a genlight object, as shown in tutorial-genomics.pdf. The plot includes a small inset barplot of eigenvalues in the bottom left corner. The top right corner has the text "d = n", where n is a number that differs from one plot to the next. What does "d = n" represent?

Thanks,
Jason
===========

From roman.lustrik at biolitika.si Wed Dec 6 15:10:00 2017
From: roman.lustrik at biolitika.si (Roman Luštrik)
Date: Wed, 6 Dec 2017 15:10:00 +0100 (CET)
Subject: [adegenet-forum] what does 'd = ' mean in pca plots
Message-ID: <1510117891.578847.1512569400724.JavaMail.zimbra@biolitika.si>

Following the code, I can trace it to ade4::scatterutil.grid (https://github.com/cran/ade4/blob/d571403a7ed08b2649c44901014aeb9e480ac1b0/R/scatterutil.R#L506). `xaxp` is a vector giving the coordinates of the extreme tick marks and the number of intervals between tick marks (see ?par). The code subtracts the first coordinate from the second and divides the difference by the number of intervals. `a`, which is printed as `d = a`, is the minimum of this value for x and y. Sorry I can't put it less technically.

HTH,
Roman

----
In god we trust, all others bring data.
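A minimal base-R sketch of the computation Roman describes, for checking a value by hand. It assumes the ade4 grid code behaves as summarised above: the tick span of each axis (last minus first tick coordinate in `xaxp`/`yaxp`) is divided by that axis's number of intervals, and the smaller of the two steps is what gets printed as "d". The helper `grid_step` is hypothetical; it takes vectors laid out like `par("xaxp")` so it runs without a graphics device.

```r
# Hypothetical helper mirroring the "d = " computation described above.
# xaxp, yaxp: numeric vectors c(first tick, last tick, number of intervals),
# the same layout that par("xaxp") and par("yaxp") use for linear axes.
grid_step <- function(xaxp, yaxp) {
  step_x <- (xaxp[2] - xaxp[1]) / xaxp[3]  # distance between x tick marks
  step_y <- (yaxp[2] - yaxp[1]) / yaxp[3]  # distance between y tick marks
  min(step_x, step_y)                      # grid cell size shown as "d = "
}

# Example: x ticks from -4 to 4 in 4 intervals (step 2),
# y ticks from -2 to 3 in 5 intervals (step 1).
grid_step(c(-4, 4, 4), c(-2, 3, 5))
# [1] 1

# Assumption: with the main scatter panel as the active plot, this should
# reproduce the printed value (the inset barplot may change what par() reports):
# grid_step(par("xaxp"), par("yaxp"))
```

Read this way, d is the side length of one background grid square in the units of the plotted scores, so it depends only on the axis ranges of that particular plot, which is why it changes from one plot to the next.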
From jason at joines.org Wed Dec 6 15:49:41 2017
From: jason at joines.org (Jason Paul Joines)
Date: Wed, 6 Dec 2017 09:49:41 -0500 (EST)
Subject: [adegenet-forum] what does 'd = ' mean in pca plots
In-Reply-To: <1510117891.578847.1512569400724.JavaMail.zimbra@biolitika.si>
References: <1510117891.578847.1512569400724.JavaMail.zimbra@biolitika.si>

Interesting. Then it seems that 'd = a' only describes a property of the plot and says nothing about the analysis.

Thanks for digging that out of the code.

Jason
===========
From thibautjombart at gmail.com Wed Dec 6 18:50:04 2017
From: thibautjombart at gmail.com (Thibaut Jombart)
Date: Wed, 6 Dec 2017 17:50:04 +0000
Subject: [adegenet-forum] Grouping crossvalidation DAPC

Hi Flo,

sorry about the late reply. I'd make an excuse, but it wouldn't beat "been busy".

In short: there is no proper solution to the group-labelling issue you mention, at least none that I know of. A workaround I have used with other colleagues relies on two statistics computed from all pairwise comparisons of individuals in the clusters:
- % of times 2 individuals are in the same cluster when they should be
- % of times 2 individuals are in different clusters when they should be

The average of the two quantities is, I think, the Rand index: https://en.wikipedia.org/wiki/Rand_index

For the second question, the simple answer is: DAPC is *bad* at finding hybrids. For this, consider using the soon-to-be-published (hopefully) method 'snapclust', also in adegenet. The documentation for it is hidden here: https://github.com/thibautjombart/adegenet/raw/master/tutorials/tutorial-snapclust.pdf

Best,
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
sites.google.com/site/thibautjombart/
Twitter: @TeebzR
+44(0)20 7594 3658

On 21 November 2017 at 18:45, Florian Leplat wrote:
> Dear Adegenet user,
>
> I have been using the adegenet package for genetic applications, and I recently started to use the DAPC function. The method is very informative.
> However, there are some details that I would like to understand better.
>
> The main objective is to group a number of plant genotypes according to their genetic background. Each genotype is a homozygous line.
> I have no prior information for any genotype, so the main idea is to assign a "group number" to each of my plants.
>
> It does work fine, and the results are consistent with the information I have for each plant (origin, parents...).
> However, one of the limitations I face is the repeatability of the grouping. If I run it several times, some plants are assigned to a different group. Is there a cross-validation procedure at this step to look at the percentage of plants that are always grouped together across runs?
>
> Furthermore, even when my groupings are quite consistent "group-wise" from one run to the other, the group numbers change. Is there a way to solve that (for instance, by giving a group number to the founder genotypes of our population)?
>
> Then I have a second issue. After the first step, which defines my groups, I would like to plot hybrid genotypes, which should (could) be an admixture between 2 groups. I cannot use them to build my groups, as they add noise to the model. I wanted to use the procedure described in the tutorial for supplementary individuals, but I realized that it only works if the supplementary individuals already have grouping information, which I don't have. I only want to see (mostly visually) where the hybrids are positioned relative to the groups I previously defined. Is there a way to use the script to do that?
>
> Thanks in advance for your help.
> Best regards.
>
> Flo

From jnwang at ksu.edu Tue Dec 12 18:41:00 2017
From: jnwang at ksu.edu (Jianan Wang)
Date: Tue, 12 Dec 2017 17:41:00 -0000
Subject: [adegenet-forum] Large data set problem for DAPC
Message-ID: <7C3FB9D6-5919-402F-A810-A963F46CE593@ksu.edu>

Dear Thibaut,

I'm running DAPC on a genomic data set of 400,000 SNPs and about 2,000 individuals. I converted my VCF data set to the genlight format and used multiple cores. The main command is:

    dapc1 <- dapc(GBSgenlight, combine_race, n.rep = 3, n.pca = 10, parallel = "multicore", ncpus = 4)

If I run a subset of about 1,000 individuals with predefined subpopulation names, DAPC runs very well. However, when I run the full data set (2,000 individuals), the DAPC job on the high-performance cluster always quits on its own after a long computation or a freeze, and no result is returned.

I would appreciate any suggestions or solutions to this problem. In addition, can DAPC handle a whole-genome resequencing SNP data set? What is the largest data set DAPC can cope with?

Thanks in advance.
Jianan

From andrew.j.veale at gmail.com Wed Dec 13 14:08:53 2017
From: andrew.j.veale at gmail.com (Andrew Veale)
Date: Wed, 13 Dec 2017 13:08:53 -0000
Subject: [adegenet-forum] POV in DAPC

Hey!

I have been asked to add the percentage of variance explained by each axis to the DAPC scatter plot. I'm not sure how best to do this, and I think it isn't the most meaningful thing to do, as DAPC isn't a straight PCA. The POV of a PCA is fine, and I can see how to get the eigenvalues for each retained PC, just not the combined ones for the DAPC.
Any thoughts on this? Thanks!
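As a minimal sketch, assuming the fitted DAPC object stores its discriminant eigenvalues in `$eig` (the component adegenet's `dapc` returns), one option is to report each discriminant function's share of the sum of those eigenvalues. This is analogous to the "Proportion of trace" that `MASS::lda` prints. It describes how much of the between-group discrimination each axis carries, not how much of the total genetic variance it explains, which fits the caution above that a DAPC axis is not a PCA axis. The helper `axis_share` and the eigenvalues in the example are made up for illustration.

```r
# Hypothetical helper: percentage of the summed discriminant eigenvalues
# carried by each discriminant function, i.e. 100 * eig / sum(eig).
# This is a share of discriminating power, NOT of total genetic variance.
axis_share <- function(eig, digits = 1) {
  stopifnot(is.numeric(eig), all(eig >= 0), sum(eig) > 0)
  round(100 * eig / sum(eig), digits)
}

# Illustrative (made-up) eigenvalues for three discriminant functions.
eig_example <- c(600, 300, 100)
axis_share(eig_example)
# [1] 60 30 10

# Building axis labels from the shares:
shares <- axis_share(eig_example)
sprintf("DF%d (%.1f%%)", seq_along(shares), shares)
# [1] "DF1 (60.0%)" "DF2 (30.0%)" "DF3 (10.0%)"

# With adegenet (assumed usage, not run here):
# dapc1 <- dapc(x, grp, n.pca = 10, n.da = 2)
# axis_share(dapc1$eig)
```

If such percentages are added to a figure, labelling them as a share of discriminant eigenvalues rather than "variance explained" avoids implying a PCA-style interpretation.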