From t.jombart at imperial.ac.uk Mon Jun 1 11:44:33 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 1 Jun 2015 09:44:33 +0000
Subject: [adegenet-forum] DAPC with mtDNA data
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3044B@icexch-m1.ic.ac.uk>

Hi Francesca,

I think a bunch of emails have been exchanged on this topic on the forum. See for instance:
http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2014-May/000838.html

To find them, use the search engine on the adegenet website:
http://adegenet.r-forge.r-project.org/search.html

If you don't find your answer, please repost here.

Best
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart

________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Francesca Tassi [tssfnc at unife.it]
Sent: 29 May 2015 11:33
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] DAPC with mtDNA data

Hi,

I'm trying to do a DAPC with a sequence matrix of mtDNA. What is the best way to run the find.clusters procedure and then the DAPC analysis?

Many thanks
Francesca

--
Francesca Tassi, PhD
Dipartimento di Scienze della Vita e Biotecnologie
Università di Ferrara
via Borsari 46
I-44121 Ferrara
Phone: +39 0532 455951
Fax: +39 0532 249761

-------------- next part --------------
An HTML attachment was scrubbed...
From simon.crameri at env.ethz.ch Tue Jun 2 14:03:09 2015
From: simon.crameri at env.ethz.ch (Crameri Simon)
Date: Tue, 2 Jun 2015 12:03:09 +0000
Subject: [adegenet-forum] DAPC, number of retained PCs and number of saved LDFs
References: <9D61E83C-092E-402D-B4D0-BE066EE3E4AA@ethz.ch>
Message-ID:

Hi Thibaut

I have a genetic dataset of 125 individuals belonging to 11 different closely related plant species (saved in the @pop slot) and I would like to model the species-genotype relationship using dapc(). Of course one major issue is to find the best number of retained principal components during the PCA step of DAPC.

This is my approach to find the best n.pca:

- create 100 permuted training sets of my complete dataset, each containing 50% of the samples (sampling stratified for @pop since I have groups that contain only very few individuals)
- do DAPC with all 100 training sets and each time predict the species of the validation samples

  dapc.train <- dapc(training.set, n.pca = n.pca, n.da = n.pca)
  val <- predict(dapc.train, newdata = validation.set)

- look at the prediction successes, calculate the mean overall prediction success over the 100 runs that used the identical n.pca
- do the steps above for, say, n.pca = 1:30
- select the optimal n.pca for my validated model according to the first local prediction success maximum (alternatively, take the global maximum)

I think this is a similar procedure to doing

  optim.a.score(dapc(complete.set, n.pca = 30, n.da = 30), smart = F, n.sim = 100, n.da = 30)

but the resulting best n.pca is somewhat larger if I do it "by hand", and the resulting mean overall prediction successes are much larger than the respective mean a-scores.

Question 1) Given these different results: where does the difference between the two approaches lie (doing it "by hand" or using optim.a.score)? Does my approach make any sense?

In addition, I would like to compare the accuracy of different DAPC models using different datasets.
I have a cpDNA dataset and a microsatellite dataset and would like to compare DAPC models that contain one, the other or a combination of both datasets. To do this, I need to have the best n.pca for each case, and use the same procedure as described above. However, I observe that at n.pca ? 10, fewer than n.pca discriminant functions are saved in the case of the cpDNA dataset. This behaviour is associated with some of the training sets only, and causes problems when I want to automate the script for different n.pca. I think this has something to do with the proportion of conserved variance, which reaches >0.98 at n.pca ? 10.

Question 2) Why can't dapc() always save as many discriminant functions as there are available principal components (as indicated in the dapc argument n.da), and why is this the case for some training sets only?

I sent you the data and an R script that hopefully shows the problem.

With regards,
Simon

*********************************************
Simon Crameri
PhD student
ETH Zurich
Plant Ecological Genetics

-------------- next part --------------
An HTML attachment was scrubbed...

From t.jombart at imperial.ac.uk Tue Jun 2 15:12:27 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 2 Jun 2015 13:12:27 +0000
Subject: [adegenet-forum] DAPC, number of retained PCs and number of saved LDFs
In-Reply-To:
References: <9D61E83C-092E-402D-B4D0-BE066EE3E4AA@ethz.ch>,
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3065A@icexch-m1.ic.ac.uk>

Hello,

it looks like you have reinvented xvalDapc... maybe worth trying it? ;)

?xvalDapc

Cheers
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel.
: 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart

From 16187393 at sun.ac.za Wed Jun 3 14:57:34 2015
From: 16187393 at sun.ac.za (Phair, D, Mnr <16187393@sun.ac.za>)
Date: Wed, 3 Jun 2015 12:57:34 +0000
Subject: [adegenet-forum] Calculating Distances from a connection Network
Message-ID:

Hi there

I am a Masters student running an MSPA to look for spatial structuring in an invasive species in a South African and Australian context. I am able to run the analyses with no apparent issues but was wondering if anyone knew of a method to calculate the minimum and maximum distances from a Delaunay triangulation connection network.
The reason is that I would like to compare my results between South Africa and Australia but want to be sure the connection extents are comparable, i.e. that the minimum/maximum distances between connected individuals are similar within South Africa and Australia.

I am fairly new to both R and adegenet and so have only a basic working knowledge.

Regards
David Phair

-------------- next part --------------
An HTML attachment was scrubbed...

From t.jombart at imperial.ac.uk Thu Jun 4 12:16:12 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 4 Jun 2015 10:16:12 +0000
Subject: [adegenet-forum] Calculating Distances from a connection Network
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF30973@icexch-m1.ic.ac.uk>

Hi there,

yes, the simplest way might be to get the adjacency matrix from the graph and then use it to select the connected pairs in the matrix of geographic distances. Example using nancycats:

## get network
library(adegenet)
data(nancycats)
cn1 <- chooseCN(nancycats@other$xy, ask=FALSE, type=1)

## get adj matrix
M <- neig2mat(nb2neig(cn1))

## get geo dist matrix
G <- as.matrix(dist(other(nancycats)$xy))

## get distances on Delaunay graph
d.delau <- G[M>0]

## range
range(d.delau)

Cheers
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel.
: 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart

From postmaster at r-forge.wu-wien.ac.at Sat Jun 6 11:00:21 2015
From: postmaster at r-forge.wu-wien.ac.at (Returned mail)
Date: Sat, 6 Jun 2015 17:00:21 +0800
Subject: [adegenet-forum] error
Message-ID:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Document.bat
Type: application/octet-stream
Size: 28864 bytes
Desc: not available

From laura.benestan at icloud.com Fri Jun 5 16:41:50 2015
From: laura.benestan at icloud.com (Laura Benestan)
Date: Fri, 05 Jun 2015 10:41:50 -0400
Subject: [adegenet-forum] Find the slope (R square) value for Mantel test
Message-ID: <2D9828EC-D0D1-44DB-863E-7DBE84B02C1A@icloud.com>

Hi,

I would like to extract the R square value (or determination coefficient) from the Mantel test. How could I do it after obtaining results from this command:

ibd <- mantel.rtest(dist_geo, dist_fst, 10000)

Thanks,

Laura Benestan
PhD student
Institute of Integrative Biology and Systems (IBIS)
Laboratoire Louis Bernatchez
Pavillon Charles-Eugène-Marchand
1030 Avenue of Medicine
Université Laval
Quebec G1V 0A6 Canada
418-265-7756
laura.benestan at icloud.com

From roman.lustrik at biolitika.si Mon Jun 8 12:25:29 2015
From: roman.lustrik at biolitika.si (Roman Lustrik)
Date: Mon, 8 Jun 2015 12:25:29 +0200 (CEST)
Subject: [adegenet-forum] Find the slope (R square) value for Mantel test
In-Reply-To: <2D9828EC-D0D1-44DB-863E-7DBE84B02C1A@icloud.com>
References: <2D9828EC-D0D1-44DB-863E-7DBE84B02C1A@icloud.com>
Message-ID: <1173502431.1508091.1433759129869.JavaMail.zimbra@biolitika.si>

All information from the test is available by calling list elements. Which list elements? See the object structure with the function str().

library(ade4)
data(yanomama)
gen <- quasieuclid(as.dist(yanomama$gen))
geo <- quasieuclid(as.dist(yanomama$geo))
r1 <- mantel.rtest(geo,gen)
str(r1)
List of 5
 $ sim   : num [1:99] -0.152 0.272 0.128 -0.198 -0.259 ...
 $ obs   : num 0.51
 $ rep   : int 99
 $ pvalue: num 0.01
 $ call  : language mantel.rtest(m1 = geo, m2 = gen)
 - attr(*, "class")= chr "rtest"

If you want the p-value, you would say r1$pvalue. Can you explain where R square/coefficient of determination comes from in this test?

Cheers,
Roman

----
In god we trust, all others bring data.
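One possible reading of the question in this thread, offered as a hedged sketch rather than as documented ade4 behaviour: if the `obs` element shown in the str() output is the Pearson correlation between the two sets of pairwise distances (an assumption worth checking in ?mantel.rtest), then squaring it gives an R-squared-like value, roughly 0.26 for obs = 0.51. The base-R example below uses made-up toy data; xy, trait and all object names are illustrative and do not come from the thread. It shows that the squared correlation between two distance vectors equals the R-squared of a simple regression of one on the other, which also provides a slope.

```r
## Toy data: 10 points with coordinates and one correlated trait (illustrative only)
set.seed(1)
xy <- matrix(runif(20), ncol = 2)
trait <- xy %*% c(1, 0.5) + rnorm(10, sd = 0.1)

## Pairwise distances, flattened to plain numeric vectors
geo_vec <- as.vector(dist(xy))
gen_vec <- as.vector(dist(trait))

## Pearson correlation between the two distance vectors, and its square
r <- cor(geo_vec, gen_vec)
r2 <- r^2

## Regression of "genetic" on "geographic" distance: slope and R-squared
fit <- lm(gen_vec ~ geo_vec)
slope <- unname(coef(fit)[2])

c(r = r, r2 = r2, slope = slope, lm_r2 = summary(fit)$r.squared)
```

The p-values reported by lm() are not meaningful here, because pairwise distances are not independent observations; the permutation p-value from the Mantel test remains the relevant one.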
_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From 16187393 at sun.ac.za Thu Jun 18 10:33:52 2015
From: 16187393 at sun.ac.za (Phair, D, Mnr <16187393@sun.ac.za>)
Date: Thu, 18 Jun 2015 08:33:52 +0000
Subject: [adegenet-forum] Specifying Maximum Distance in a Connection network
Message-ID:

Hi there

I am a Masters student working on spatial sorting in an invasive bird. I am looking at comparing patterns of spatial sorting in a species over two continents. I am not having any issues with running the analysis, but I wondered if it is possible to specify a maximum distance for any of the connection network methods other than the minimum spanning tree, as the connectivity in that is too high. That is, I would like to use something like Delaunay triangulation but limit the maximum distance so that it is the same in the South African and Australian datasets.

Regards
David Phair

-------------- next part --------------
An HTML attachment was scrubbed...
From t.jombart at imperial.ac.uk Thu Jun 18 10:53:55 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 18 Jun 2015 08:53:55 +0000
Subject: [adegenet-forum] Specifying Maximum Distance in a Connection network
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3BE64@icexch-m1.ic.ac.uk>

Hi David,

yes you can do that, though it is not directly implemented in chooseCN. The idea is to get the adjacency matrix, put '0's where needed, and convert it back to a nb object. Here's an example:

## load data
> library(adegenet)
> data(nancycats)

## Delaunay triangulation
> cn1 <- chooseCN(nancycats@other$xy,ask=FALSE,type=1)

## that's the adj. matrix
> neig2mat(nb2neig(cn1))
   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1  0 0 0 0 1 0 0 0 1  0  0  1  0  0  0  1  1
2  0 0 0 0 1 0 0 1 1  1  0  0  0  0  0  0  0
3  0 0 0 0 0 1 0 0 0  0  1  0  0  1  0  1  1
4  0 0 0 0 0 0 0 0 0  1  0  1  1  0  1  1  0
5  1 1 0 0 0 1 1 0 1  0  0  0  0  0  0  0  1
6  0 0 1 0 1 0 1 0 0  0  0  0  0  1  0  0  1
7  0 0 0 0 1 1 0 0 0  0  1  0  0  1  0  0  0
8  0 1 0 0 0 0 0 0 1  1  0  1  0  0  0  0  0
9  1 1 0 0 1 0 0 1 0  0  0  1  0  0  0  0  0
10 0 1 0 1 0 0 0 1 0  0  0  1  0  0  0  0  0
11 0 0 1 0 0 0 1 0 0  0  0  0  1  1  1  1  0
12 1 0 0 1 0 0 0 1 1  1  0  0  0  0  0  1  0
13 0 0 0 1 0 0 0 0 0  0  1  0  0  0  1  1  0
14 0 0 1 0 0 1 1 0 0  0  1  0  0  0  0  0  0
15 0 0 0 1 0 0 0 0 0  0  1  0  1  0  0  0  0
16 1 0 1 1 0 0 0 0 0  0  1  1  1  0  0  0  1
17 1 0 1 0 1 1 0 0 0  0  0  0  0  0  0  1  0

## store it
> matConnect <- neig2mat(nb2neig(cn1))

## get geographic distances
> D <- as.matrix(dist(nancycats$other$xy))
> range(D)
[1]   0.0000 369.1358

## new adj. matrix to be pruned
> matConnect2 <- matConnect

## these are links for distances > 150m
> matConnect2[D>150]
  [1] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 [38] 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
 [75] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 1 0
[186] 0 0 0 0 1

## set these to be all 0
> matConnect2[D>150] <- 0
> library(spdep)

## convert back to nb object
> cn2 <- mat2listw(matConnect2)$neighbours
> cn2
Neighbour list object:
Number of regions: 17
Number of nonzero links: 60
Percentage nonzero weights: 20.76125
Average number of links: 3.529412

## plot to check the differences
> plot(cn1, coords=nancycats$other$xy)
> plot(cn2, coords=nancycats$other$xy)

If others think it is useful, post a feature request on github:
https://github.com/thibautjombart/adegenet/issues

Cheers
Thibaut

From Mark.Coulson.ic at uhi.ac.uk Fri Jun 19 12:23:13 2015
From: Mark.Coulson.ic at uhi.ac.uk (Mark Coulson)
Date: Fri, 19 Jun 2015 10:23:13 +0000
Subject: [adegenet-forum] supplementary individuals
Message-ID:

Hi Thibaut,

I am trying to use the pred.sup function to assign 'test' individuals against my baseline data. Both the baseline and supplementary individuals files load fine in adegenet but when I run the pred.sup function I get the following:

Error in predict.dapc(dapc1, newdata=sup): Number of variables in newdata does not match original data.

Looking at the dataframes, the baseline says it's a matrix of 1800 x 70 (which I expect), however the supplementary says 69 for the latter(?). I have found that when interrogating @loc.fac for the supplementary file, locus27 is only listed once, while all others are listed 2x. Perhaps a coincidence, but this locus is the only one that is monomorphic in the supplementary individuals while it is polymorphic in the baseline - would this have any effect?

Suggestions?

Thanks,
Mark

Inverness College UHI, a partner in the University of the Highlands and Islands www.inverness.uhi.ac.uk
Board of Management of Inverness College (known as Inverness College UHI), Scottish Charity No SC021197.

-------------- next part --------------
An HTML attachment was scrubbed...
From t.jombart at imperial.ac.uk Fri Jun 19 12:41:27 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Fri, 19 Jun 2015 10:41:27 +0000
Subject: [adegenet-forum] supplementary individuals
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3D067@icexch-m1.ic.ac.uk>

Hi Mark

I think you identified the problem. genind objects keep only polymorphic sites. You would need to 'repool' your supplementary individuals to make sure loci/alleles match, and then just extract the relevant individuals for the prediction. Makes sense?

Best
Thibaut

From t.jombart at imperial.ac.uk Fri Jun 19 12:47:18 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Fri, 19 Jun 2015 10:47:18 +0000
Subject: [adegenet-forum] supplementary individuals
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA6570ABF3D067@icexch-m1.ic.ac.uk>,
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3D081@icexch-m1.ic.ac.uk>

There is a function 'repool' to do what you need (see ?repool). If A and B are two geninds with different alleles, then it merges the datasets together to have matching alleles and dimensions.

Dropping the locus is indeed possible here, but that's a potentially big loss of information if it is informative in the training set - this locus alone could define the most likely group assignment.

Cheers
Thibaut

________________________________
From: Mark Coulson [Mark.Coulson.ic at uhi.ac.uk]
Sent: 19 June 2015 11:43
To: Jombart, Thibaut
Subject: RE: supplementary individuals

Thanks Thibaut! Not sure what you mean about the repool. All individuals in the supplementary are fixed '0202'. My initial reaction was to simply drop this locus from both datasets and re-run the DAPC - what's the easiest way to tell adegenet to omit a locus?

Best,
Mark
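The repool-then-subset approach discussed in this thread could look roughly like the sketch below. It is not Mark's data or script: nancycats stands in for both files, the split into "baseline" and "supplementary" colonies is arbitrary, and the object names (baseline, sup, pooled, baseline2, sup2) are illustrative. It also assumes a recent adegenet (2.x), where popNames(), indNames() and tab() are available and subsetting with drop = TRUE discards alleles that are absent from the subset.

```r
library(adegenet)
data(nancycats)

## Illustrative stand-ins for the two files: colonies 1-15 play the role of
## the baseline, colonies 16-17 the supplementary individuals.
## drop = TRUE removes alleles absent from each subset, mimicking two
## datasets that were read in separately.
is.base <- pop(nancycats) %in% popNames(nancycats)[1:15]
baseline <- nancycats[is.base, , drop = TRUE]
sup <- nancycats[!is.base, , drop = TRUE]

## Pool both objects so that they share the same loci and alleles
pooled <- repool(baseline, sup)

## Split again by individual; the allele columns now match
in.sup <- indNames(pooled) %in% indNames(sup)
baseline2 <- pooled[!in.sup, ]
sup2 <- pooled[in.sup, ]
c(ncol(tab(baseline2)), ncol(tab(sup2)))

## Fit the DAPC on the repooled baseline, then predict the supplementary set
dapc1 <- dapc(baseline2, n.pca = 20, n.da = 5)
pred.sup <- predict(dapc1, newdata = sup2)
head(pred.sup$assign)
```

The DAPC is fitted on the repooled baseline rather than on the original one, so that the training data and the new data have exactly the same columns; n.pca = 20 and n.da = 5 are arbitrary values chosen only to avoid the interactive prompts.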
-------------- next part --------------
An HTML attachment was scrubbed...

From t.jombart at imperial.ac.uk Fri Jun 19 14:47:00 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Fri, 19 Jun 2015 12:47:00 +0000
Subject: [adegenet-forum] supplementary individuals
In-Reply-To:
References: <2CB2DA8E426F3541AB1907F98ABA6570ABF3D067@icexch-m1.ic.ac.uk>, <2CB2DA8E426F3541AB1907F98ABA6570ABF3D081@icexch-m1.ic.ac.uk>,
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF3D0D5@icexch-m1.ic.ac.uk>

Hi there,

(please keep the forum posted)

The easiest way is to subset the individuals you want to keep. genind objects can be subsetted like matrices, i.e. x[i,] where 'x' is your repooled genind and 'i' indicates the individuals to keep.

Cheers
Thibaut

________________________________
From: Mark Coulson [Mark.Coulson.ic at uhi.ac.uk]
Sent: 19 June 2015 12:51
To: Jombart, Thibaut
Subject: RE: supplementary individuals

Ok, so I did repool(A,B) and got a matrix with the correct dimensions. How do I extract, say, the last 7 populations? I've used seppop on the combined dataframe now, but obviously repool from here for the supplementary individuals will simply reverse the last action and still give me the wrong locus count. I have also tried popsub from the poppr package, but with the same result.

Mark

-------------- next part --------------
An HTML attachment was scrubbed...

From goatsrunfaster at gmail.com Tue Jun 23 15:08:48 2015
From: goatsrunfaster at gmail.com (Spencer Bruce)
Date: Tue, 23 Jun 2015 09:08:48 -0400
Subject: [adegenet-forum] Compoplot as table
Message-ID:

Hello All,

I'm simply looking to get an output using the compoplot function but in the form of a table with Q values similar to what is produced by STRUCTURE (as opposed to the visual output). Does anybody have some simple code that will produce this?

Thanks in advance!

Best!
-Spencer

--
Spencer A Bruce
113 Hill St.
Troy, NY 12180
518 225 0787

From roman.lustrik at biolitika.si Tue Jun 23 15:23:04 2015
From: roman.lustrik at biolitika.si (Roman Lustrik)
Date: Tue, 23 Jun 2015 15:23:04 +0200 (CEST)
Subject: [adegenet-forum] Compoplot as table
In-Reply-To:
References:
Message-ID: <360856397.1665492.1435065784970.JavaMail.zimbra@biolitika.si>

You have two options. One is to locally hack the function definition so that it returns the data (it currently returns match.call()); the other is to file a feature request on GitHub (https://github.com/thibautjombart/adegenet/issues).

Cheers,
Roman

----
In god we trust, all others bring data.
From t.jombart at imperial.ac.uk Tue Jun 23 15:16:36 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 23 Jun 2015 13:16:36 +0000
Subject: [adegenet-forum] Compoplot as table
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF447CE@icexch-m1.ic.ac.uk>

Hi,

yes, the function 'predict' does what you want:

library(adegenet)
data(H3N2)
pop(H3N2) <- factor(H3N2$other$epid)
dapc1 <- dapc(H3N2, var.contrib=FALSE, scale=FALSE, n.pca=150, n.da=5)
predict(dapc1)  # the $posterior element holds the membership probabilities

Cheers
Thibaut
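As a rough, self-contained sketch of what such a table looks like, the following uses base R and MASS on the built-in iris data as a stand-in for a DAPC (a PCA followed by a linear discriminant analysis). The three retained components, the object names and the output file name are illustrative assumptions, and this is not adegenet's own code. With a real dapc object, the matching matrix should be the posterior element returned by predict, for example predict(dapc1)$posterior.

```r
# Stand-in for a DAPC: PCA followed by linear discriminant analysis,
# run on the built-in iris data (not genetic data).
library(MASS)

pca <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:3]                      # retained PCs (arbitrary choice for this sketch)
fit <- lda(scores, grouping = iris$Species)

# Posterior membership probabilities: one row per individual,
# one column per group; this plays the role of a Q-value table.
q_table <- round(predict(fit, scores)$posterior, 3)
rownames(q_table) <- paste0("ind", seq_len(nrow(q_table)))

print(head(q_table))

# Write the table to disk (temporary directory used here as a placeholder).
write.csv(q_table, file = file.path(tempdir(), "membership.csv"))
```

Each row can then be read like a row of STRUCTURE output: the columns are groups and the values are the probabilities of membership in each group.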
From legrasjl at supagro.inra.fr Wed Jun 24 16:04:41 2015
From: legrasjl at supagro.inra.fr (Jean-Luc LEGRAS)
Date: Wed, 24 Jun 2015 16:04:41 +0200
Subject: [adegenet-forum] extracting subset of SNPs with the highest weight
Message-ID: <4F7735B4-465C-472B-83F1-2060E0471DBD@supagro.inra.fr>

Hello

I am using adegenet 1.4-2 on a set of genomic data. I have converted my data to the PLINK raw format: 326000 SNPs for 82 diploid individuals. Every variant position has an ID of the form chromosome number + coordinates.
I performed a PCA on the genotypes, which nicely separates the main groups, and I want to extract the SNPs with the highest contributions (top 5%) to the PCA in order to build a subset of the initial genotype matrix. I can obtain the list of SNPs with the highest loadings, but when I use it to subset the data I obtain an empty list. Is this wrong? Do you have any suggestions?

Thank you in advance.
Best regards.
Jean-Luc

Here is the code I used:

GWEVariant <- read.PLINK(file="GWE.raw", map.file="GWE.map", multicore=FALSE)

GWEVariant.PCA <- glPca(GWEVariant, center=TRUE, scale=FALSE, nf=7, loadings=TRUE, alleleAsUnit=FALSE, useC=TRUE, n.cores=4, returnDotProd=FALSE, matDotProd=NULL)
# first column: SNP names; columns 2 to 8: loadings on the 7 retained axes
DTloadings <- data.frame(GWEVariant@loc.names, GWEVariant.PCA$loadings)

top <- matrix(nrow=7, ncol=2)
Mqdiscriminants <- matrix(, ncol=8)
colnames(Mqdiscriminants) <- colnames(DTloadings)
liste <- list()
i=1
for (i in 1:7) {
  # 2.5% and 97.5% quantiles of the loadings on axis i
  top[i,1] <- quantile(DTloadings[, i+1], probs = .025)
  top[i,2] <- quantile(DTloadings[, i+1], probs = .975)
  # SNPs whose loading falls in either tail
  liste <- which(DTloadings[,i+1] < top[i,1] | DTloadings[,i+1] > top[i,2])
  Mqdiscriminants <- rbind(Mqdiscriminants, DTloadings[liste,])
}

Mqdiscriminants <- unique(Mqdiscriminants)
Mqdiscriminants <- na.omit(Mqdiscriminants)

subset <- as.matrix(GWEVariant[, Mqdiscriminants[,1]])

From t.jombart at imperial.ac.uk Wed Jun 24 17:00:57 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Wed, 24 Jun 2015 15:00:57 +0000
Subject: [adegenet-forum] extracting subset of SNPs with the highest weight
In-Reply-To: <4F7735B4-465C-472B-83F1-2060E0471DBD@supagro.inra.fr>
References: <4F7735B4-465C-472B-83F1-2060E0471DBD@supagro.inra.fr>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF4699D@icexch-m1.ic.ac.uk>

Hi there,

can you try with 'loadingplot'? It invisibly returns the list of most contributing alleles.

Best
Thibaut

From postmaster at r-forge.wu-wien.ac.at Thu Jun 25 08:28:51 2015
From: postmaster at r-forge.wu-wien.ac.at (The Post Office)
Date: Thu, 25 Jun 2015 11:58:51 +0530
Subject: [adegenet-forum] Returned mail: Data format error
Message-ID:

Message could not be delivered
From legrasjl at supagro.inra.fr Thu Jun 25 16:32:03 2015
From: legrasjl at supagro.inra.fr (Jean-Luc LEGRAS)
Date: Thu, 25 Jun 2015 16:32:03 +0200
Subject: [adegenet-forum] extracting subset of SNPs with the highest weight
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570ABF4699D@icexch-m1.ic.ac.uk>
References: <4F7735B4-465C-472B-83F1-2060E0471DBD@supagro.inra.fr> <2CB2DA8E426F3541AB1907F98ABA6570ABF4699D@icexch-m1.ic.ac.uk>
Message-ID: <8544F55D-F649-49DC-B0F8-D4D2741C7C1C@supagro.inra.fr>

Hello

Thank you for your answer and solution. Indeed, I could obtain a plot and the list of SNPs with the highest contribution using

# threshold: 95% quantile of the absolute loadings on axis 1
# (column 2 of DTloadings holds the axis-1 loadings)
Axis1 <- loadingplot(abs(GWEVariant.PCA$loadings[,1]), threshold=quantile(abs(DTloadings[, 2]), probs = .95), lab=rownames(GWEVariant.PCA$loadings), cex.lab=0.7, cex.fac=1, lab.jitter=0, main="Loading plot", xlab="SNP positions", ylab="Contributions", srt = 90, adj = c(0, 0.5))

and then

# var.idx: indices of the SNPs above the threshold
subset <- as.matrix(GWEVariant[, Axis1$var.idx])

Best regards.
Jean-Luc

From t.jombart at imperial.ac.uk Thu Jun 25 16:36:43 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 25 Jun 2015 14:36:43 +0000
Subject: [adegenet-forum] extracting subset of SNPs with the highest weight
In-Reply-To: <8544F55D-F649-49DC-B0F8-D4D2741C7C1C@supagro.inra.fr>
References: <4F7735B4-465C-472B-83F1-2060E0471DBD@supagro.inra.fr> <2CB2DA8E426F3541AB1907F98ABA6570ABF4699D@icexch-m1.ic.ac.uk>, <8544F55D-F649-49DC-B0F8-D4D2741C7C1C@supagro.inra.fr>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF46B04@icexch-m1.ic.ac.uk>

Great, glad to see it worked.

Best
Thibaut
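The selection step discussed in this thread (keep the SNPs whose absolute loading on an axis exceeds the 95% quantile, then subset the genotype matrix by column index) can be sketched in base R alone. This is a minimal illustration under stated assumptions: the genotype matrix is simulated, prcomp stands in for glPca, and all object names are hypothetical.

```r
set.seed(1)

# Simulated genotypes: 20 individuals x 200 SNPs coded 0/1/2 (made-up data).
geno <- matrix(rbinom(20 * 200, size = 2, prob = 0.3), nrow = 20,
               dimnames = list(paste0("ind", 1:20), paste0("snp", 1:200)))

# PCA on centred, unscaled genotypes; rotation holds the SNP loadings.
pca <- prcomp(geno, center = TRUE, scale. = FALSE)

# Absolute loadings on axis 1 and their 95% quantile.
contrib <- abs(pca$rotation[, 1])
cutoff <- quantile(contrib, probs = 0.95)

# Integer positions of the SNPs above the cutoff (names are kept by which()).
top_idx <- which(contrib > cutoff)

# Subset the genotype matrix by column position.
geno_top <- geno[, top_idx, drop = FALSE]

print(dim(geno_top))
print(colnames(geno_top))
```

Indexing by integer position, as var.idx does in the loadingplot result, avoids any dependence on how SNP names are stored, for example as a factor column of a data frame.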