From Mark.Coulson.ic at uhi.ac.uk Wed Mar 4 14:47:57 2015
From: Mark.Coulson.ic at uhi.ac.uk (Mark Coulson)
Date: Wed, 4 Mar 2015 13:47:57 +0000
Subject: [adegenet-forum] sequential DAPC

Hello,

I have run a DAPC on a large dataset of individuals from 100 locations. There are a couple of clear outlier groups, which I have removed, and I now want to run a second DAPC on the rest of the dataset (now 98 locations), repeating this sequentially for a couple of rounds.

My question is: do I need to keep the same number of retained PCs and discriminant functions (DFs) at each level, or should I re-run the xvalDapc function for each 'level' individually? Which would be more sensible in order to compare across levels?

Best,
Mark

Inverness College UHI, a partner in the University of the Highlands and Islands
www.inverness.uhi.ac.uk
Board of Management of Inverness College (known as Inverness College UHI), Scottish Charity No SC021197.

From t.jombart at imperial.ac.uk Wed Mar 4 18:14:23 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Wed, 4 Mar 2015 17:14:23 +0000
Subject: [adegenet-forum] Hackathon coming: request new features!
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABEE89CD@icexch-m1.ic.ac.uk>

Dear all,

as a follow-up to a previous post, most of the adegenet development team will be attending a hackathon hosted by NESCent (NC, USA) in a few days.

Now that adegenet has moved to GitHub, posting bug reports or feature requests is trivial. All you need to do is submit a new 'issue' at:
https://github.com/thibautjombart/adegenet/issues

So, if there is anything you wish changed, fixed or added, shoot!

Cheers
Thibaut

==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart

From crypticlineage at gmail.com Mon Mar 30 04:25:16 2015
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Sun, 29 Mar 2015 22:25:16 -0400
Subject: [adegenet-forum] extracting genefreq $tab from an indexed list

I am working with hundreds of genpop objects indexed in a list. Using the lapply and makefreq functions, I stored the population gene frequencies in individual objects (one per data set).

Here is an example with just three objects:

>summary(mygenpop)
          Length Class  Mode
data1.str 1      genpop S4
data2.str 1      genpop S4
data3.str 1      genpop S4

>mygenfreq <- lapply(mygenpop, function(x) makefreq(x, truenames=TRUE))

>summary(mygenfreq)
          Length Class  Mode
data1.str 3      -none- list
data2.str 3      -none- list
data3.str 3      -none- list

>summary(mygenfreq[[1]]$tab)
> str(mygenfreq[[1]])
 $ tab : num [1:30, 1:1974] 0.6 0.5 0.325 0.675 0.6 0.5 0.5 0.375 0.55 0.475 ...
 $ nobs: num [1:30, 1:1000] 40 40 40 40 40 40 40 40 40 40 ...

The next job is to work with the $tab matrix, but I am not sure how to access it from all objects in one command.

>mygenfreqT <- lapply(mygenfreq[[1:3]]$tab, function(x) t(x))

This throws an error. The syntax seems to be wrong, but I am not sure how to fix it. Thanks for any help.

Vikram

From roman.lustrik at biolitika.si Mon Mar 30 08:08:22 2015
From: roman.lustrik at biolitika.si (Roman Lustrik)
Date: Mon, 30 Mar 2015 08:08:22 +0200 (CEST)
Subject: [adegenet-forum] extracting genefreq $tab from an indexed list
Message-ID: <1404807810.557482.1427695702830.JavaMail.zimbra@biolitika.si>

S4 objects are different from classical S3 objects such as data.frames, lists and other "basic" objects.
One of their peculiarities is that their slots are accessed through the "@" operator. Strictly speaking, users are not meant to access the slots directly: the developer should provide methods to access all the slots that she or he deems appropriate for users to access. A missing method could mean either that it hasn't been implemented yet, or that it is not implemented by design (Thibaut will have more to say about this).

And now the crux of the matter. Your first two examples work because lists can be accessed through various operators. This is often done via `lapply(X = x, FUN = "[[", "element_name")`. In your case, you can try creating an anonymous function that accesses the slot.

library(adegenet)
data(nancycats)
x <- list(nancycats)
lapply(x, FUN = function(x) x$tab)

Cheers,
Roman

----
In god we trust, all others bring data.

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

From crypticlineage at gmail.com Mon Mar 30 14:11:24 2015
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Mon, 30 Mar 2015 08:11:24 -0400
Subject: [adegenet-forum] extracting genefreq $tab from an indexed list
In-Reply-To: <1568758322.557508.1427695784424.JavaMail.zimbra@biolitika.si>
References: <1404807810.557482.1427695702830.JavaMail.zimbra@biolitika.si> <1568758322.557508.1427695784424.JavaMail.zimbra@biolitika.si>

Hi Roman,

Thank you for the explanation. The following does not work:

>mygenfreqT <- lapply(mygenfreq, FUN=function(x) t(x@tab))
Error in t(x@tab) :
  trying to get slot "tab" from an object of a basic class ("list") with no slots

Someone else suggested another solution, which seems to have worked:

>mygenfreqT <- lapply(lapply(mygenfreq, "[[", "tab"), function(x) t(x))
> head(mygenfreqT[[1]])
           1     2     3     4     5     6     7     8     9    10    11    12
L0001.1 0.60 0.500 0.325 0.675 0.600 0.500 0.500 0.375 0.550 0.475 0.350 0.275
L0001.2 0.40 0.500 0.675 0.325 0.400 0.500 0.500 0.625 0.450 0.525 0.650 0.725
L0002.1 0.30 0.150 0.175 0.250 0.275 0.400 0.325 0.325 0.475 0.275 0.175 0.150

Any other solutions are welcome.

Thanks
Vikram

On Mon, Mar 30, 2015 at 2:09 AM, Roman Lustrik wrote:
> Oops, make that `lapply(x, FUN = function(x) x@tab)`.
>
> Cheers,
> Roman
>
> ----
> In god we trust, all others bring data.
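The list-versus-S4 distinction running through this thread can be reproduced in base R without adegenet. The sketch below builds a small, made-up stand-in for `mygenfreq` (plain lists holding `$tab` and `$nobs`, as the `str()` output above shows), transposes every `$tab` in one call, and shows that `[[` with a vector index such as `1:3` indexes recursively: `mygenfreq[[1:3]]` means `mygenfreq[[1]][[2]][[3]]`, a single number, which is why `$tab` then fails. It ends by contrasting this with slot access via `@` on a hypothetical S4 class `toyFreq`. The toy values and the class are illustrative assumptions, not adegenet output.

```r
library(methods)  # S4 tools; attached by default in interactive R sessions

# Toy stand-in for lapply(mygenpop, makefreq): a named list of plain lists,
# each holding a $tab and a $nobs matrix (values made up for illustration).
mygenfreq <- list(
  data1.str = list(tab = matrix(c(0.6, 0.4, 0.5, 0.5), nrow = 2),
                   nobs = matrix(40, nrow = 2, ncol = 2)),
  data2.str = list(tab = matrix(c(0.3, 0.7, 0.2, 0.8), nrow = 2),
                   nobs = matrix(40, nrow = 2, ncol = 2))
)

# Each element is an ordinary list, so `$` and `[[` work but `@` does not.
elem_is_list <- is.list(mygenfreq[[1]])
elem_is_s4 <- isS4(mygenfreq[[1]])

# Two equivalent ways to transpose every $tab in one call.
tabsT <- lapply(mygenfreq, function(e) t(e$tab))
tabsT2 <- lapply(lapply(mygenfreq, "[[", "tab"), t)

# `[` keeps the list structure and returns a sub-list ...
sub <- mygenfreq[1:2]
# ... whereas `[[` with a vector index recurses: x[[c(1, 2)]] is x[[1]][[2]].
recursive <- mygenfreq[[c(1, 2)]]

# So mygenfreq[[1:3]] is mygenfreq[[1]][[2]][[3]], a single number, and
# `$tab` on that number fails ("$ operator is invalid for atomic vectors").
err_msg <- tryCatch(mygenfreq[[1:3]]$tab,
                    error = function(e) conditionMessage(e))

# By contrast, an S4 object (hypothetical class) keeps data in slots read with `@`.
setClass("toyFreq", slots = c(tab = "matrix"))
obj <- new("toyFreq", tab = matrix(1:4, nrow = 2))
obj_tab <- obj@tab
```

Both `tabsT` and `tabsT2` give the same result; the anonymous-function form is a single pass, while the `"[["` form separates extraction from transposition.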
From t.jombart at imperial.ac.uk Mon Mar 30 14:53:02 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 30 Mar 2015 12:53:02 +0000
Subject: [adegenet-forum] extracting genefreq $tab from an indexed list
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF01308@icexch-m1.ic.ac.uk>

Hi there,

the operator [[]] returns a single element of a list, not a list, which is the issue here. To subset a list you should use [].
Otherwise, to do what you want, you need something like:

lapply(mygenfreq, function(e) t(e$tab))

Note that as of adegenet_2.0-0, there will be a simpler interface to get frequencies (tab(x, freq=TRUE)).

Cheers
Thibaut

From karl.fetter at gmail.com Tue Mar 31 00:20:27 2015
From: karl.fetter at gmail.com (Karl Fetter)
Date: Mon, 30 Mar 2015 18:20:27 -0400
Subject: [adegenet-forum] Parallel computing?
Message-ID: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>

Hi Adegenet Users,

I'm going to be running a DAPC on a large data set of about 167K SNPs soon.
I want to run these commands in parallel, but I'm very unfamiliar with the process. A quick Google search brings up several dozen R packages for parallel computing, and I'm wondering: what's the latest and greatest package out there?

Thanks in advance!

Karl Fetter

From roman.lustrik at biolitika.si Tue Mar 31 08:29:59 2015
From: roman.lustrik at biolitika.si (Roman Lustrik)
Date: Tue, 31 Mar 2015 08:29:59 +0200 (CEST)
Subject: [adegenet-forum] Parallel computing?
In-Reply-To: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>
References: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>
Message-ID: <1624376297.579467.1427783399095.JavaMail.zimbra@biolitika.si>

It depends on what platform you're on. Dirk's task view gives a nice overview of what's available (http://cran.r-project.org/web/views/HighPerformanceComputing.html). I have experience on Windows (snowfall, and parallel from vanilla R) and on an HP supercomputer running Red Hat, where I've had good results using snow-based applications and Rmpi on the cluster.

Cheers,
Roman

----
In god we trust, all others bring data.
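As a minimal sketch of the "parallel from vanilla R" route Roman mentions, the `parallel` package that ships with R can spread independent jobs over worker processes. `slow_task()` is a hypothetical placeholder for one independent unit of work (here, a singular value decomposition of a random matrix). `makeCluster()` starts socket-based workers, which also works on Windows, whereas the fork-based `mclapply()` cannot use more than one core there. This only pays off when the work splits into independent pieces, such as repeated runs over different subsets or random seeds.

```r
library(parallel)  # ships with R; no extra installation needed

# Hypothetical placeholder for one independent, CPU-heavy job.
slow_task <- function(seed) {
  set.seed(seed)
  x <- matrix(rnorm(200 * 50), nrow = 200)
  sum(svd(x, nu = 0, nv = 0)$d)
}

seeds <- 1:8

# Socket (PSOCK) cluster with two workers: works on Windows, macOS and Linux.
cl <- makeCluster(2)
par_res <- parLapply(cl, seeds, slow_task)
stopCluster(cl)

# Sequential reference run; results should match the parallel ones.
seq_res <- lapply(seeds, slow_task)
```

Setting the seed inside each task keeps results reproducible regardless of which worker runs it.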
From f.calboli at imperial.ac.uk Tue Mar 31 08:38:52 2015
From: f.calboli at imperial.ac.uk (Federico Calboli)
Date: Tue, 31 Mar 2015 09:38:52 +0300
Subject: Re: [adegenet-forum] Parallel computing?
In-Reply-To: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>
References: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>
Message-ID: <2CB5BED9-04F7-4C5B-BE3A-25BFA54054E6@imperial.ac.uk>

On 31 Mar 2015, at 01:20, Karl Fetter wrote:
>
> Hi Adegenet Users,
>
> I'm going to be running a DAPC on a large data set soon of about 167K SNPs.

I hate to be contrarian, BUT you will have a lot of SNPs that are in strong linkage, i.e. they will provide *exactly* the same information, adding nothing to your analysis aside from computational burden. I know I am not a referee of your future paper, and thus you need not, but you might actually get something out of convincing me that using so many SNPs is actually better than pruning them to a subset with much lower linkage between them (say, prune at a pairwise r^2 threshold of 0.5 within a window of 50 SNPs that you slide 5 SNPs at a time until you have pruned the whole genome; PLINK can do this for you).

Cheers
F
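Federico's sliding-window pruning rule can be sketched in a few lines of base R. `ld_prune()` is a hypothetical helper written for illustration: within each window of `window` SNPs it computes pairwise r^2 between still-retained SNPs, drops the later SNP of any pair above `r2_max`, then slides the window forward by `step` SNPs. It is a simplified greedy variant, not PLINK's exact algorithm; on real data, PLINK's `--indep-pairwise 50 5 0.5` does this kind of pruning. The toy genotypes duplicate every SNP so that each adjacent pair is in perfect linkage.

```r
# Hypothetical helper: greedy sliding-window LD pruning (illustration only).
# geno: individuals x SNPs matrix of 0/1/2 allele counts, SNPs in map order.
ld_prune <- function(geno, window = 50, step = 5, r2_max = 0.5) {
  n_snp <- ncol(geno)
  keep <- rep(TRUE, n_snp)
  start <- 1
  repeat {
    idx <- start:min(start + window - 1, n_snp)
    for (i in idx) {
      if (!keep[i]) next
      for (j in idx[idx > i]) {
        if (!keep[j]) next
        # squared Pearson correlation between genotype counts
        r2 <- suppressWarnings(cor(geno[, i], geno[, j]))^2
        # monomorphic SNPs give NA and are left alone
        if (!is.na(r2) && r2 > r2_max) keep[j] <- FALSE
      }
    }
    if (max(idx) == n_snp) break  # window has reached the last SNP
    start <- start + step
  }
  which(keep)
}

# Toy data: 20 independent SNPs, each duplicated once (perfect LD pairs).
set.seed(1)
base <- matrix(rbinom(100 * 20, size = 2, prob = 0.4), nrow = 100)
geno <- base[, rep(1:20, each = 2)]
kept <- ld_prune(geno, window = 10, step = 5, r2_max = 0.5)
```

Because the windows overlap, every adjacent duplicate pair falls inside at least one window, so one SNP of each pair survives.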
From t.jombart at imperial.ac.uk Tue Mar 31 12:53:46 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 31 Mar 2015 10:53:46 +0000
Subject: Re: [adegenet-forum] Parallel computing?
In-Reply-To: <2CB5BED9-04F7-4C5B-BE3A-25BFA54054E6@imperial.ac.uk>
References: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>, <2CB5BED9-04F7-4C5B-BE3A-25BFA54054E6@imperial.ac.uk>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF02453@icexch-m1.ic.ac.uk>

Hi there,

The point of DAPC is precisely to handle this redundancy for you, and it is not clear to me that you need a supercomputer for your analyses. The PCA step of the DAPC is meant to identify blocks of strongly correlated SNPs, and it is probably also a more rigorous way to do so than using an arbitrary sliding window and r^2.

Computationally, if you have 150k SNPs and, say, 200 individuals, the matrix that is diagonalized is still 200x200, and the dimensionality of your data is <= 200. The real challenges here are:

1) storing the data: if it is too large, and if treating SNPs as binary data is OK, use the genlight class;
2) converting the data: if you need a genind object, converting the data from a DNAbin object will take time. I have recently optimized this, so you may want to use the devel version of adegenet 2.0-0: https://github.com/thibautjombart/adegenet

Cheers
Thibaut
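Thibaut's point about dimensionality can be checked numerically. With n individuals and p SNPs (p much larger than n), the n x n matrix X X^T built from the centred genotype matrix has the same non-zero eigenvalues as the p x p matrix X^T X, so a PCA never needs more than n axes (at most n - 1 after centring). The sketch uses small toy sizes (20 individuals, 1,000 simulated SNPs) so that it runs instantly; it is a generic linear-algebra illustration, not adegenet's internal code.

```r
set.seed(42)
n <- 20      # individuals (toy size)
p <- 1000    # SNPs (toy size)
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)  # 0/1/2 genotypes
Xc <- scale(X, center = TRUE, scale = FALSE)                 # centre each SNP

# n x n matrix Xc %*% t(Xc): its size does not depend on the number of SNPs.
G <- tcrossprod(Xc)
ev_small <- eigen(G, symmetric = TRUE, only.values = TRUE)$values

# Squared singular values of the n x p matrix give the same spectrum.
sv2 <- svd(Xc, nu = 0, nv = 0)$d^2

# Number of non-negligible axes: at most n - 1 after centring.
n_axes <- sum(ev_small > 1e-8 * ev_small[1])
```

Diagonalizing the small n x n matrix is what keeps the PCA step cheap even when p runs into the hundreds of thousands.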