[adegenet-forum] Parallel computing?

Jombart, Thibaut t.jombart at imperial.ac.uk
Tue Mar 31 12:53:46 CEST 2015


Hi there

The point of DAPC is actually to handle this redundancy for you, and it is not clear to me that you need a supercomputer for your analyses. The PCA step of the DAPC is meant to identify blocks of strongly correlated SNPs, and it is also probably a more rigorous way to do so than using an arbitrary sliding window and R^2.

Computationally, if you have 150k SNPs and, say, 200 individuals, the matrix that is diagonalized is still 200x200, and the dimensionality of your data is <= 200. The real challenges here are:
1) storing the data; if too large and if treating SNPs as binary data is OK, use the genlight class
2) converting the data; if you need a genind object, converting the data from a DNAbin object will take time; I have recently optimized this, so you may want to use the devel version of adegenet 2.0-0:
https://github.com/thibautjombart/adegenet
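
To illustrate the dimensionality point above, here is a minimal sketch in plain Python (not adegenet code, and not how adegenet implements its PCA). It assumes genotypes coded as 0/1/2 allele counts, uses toy sizes, and the helper names `centered` and `gram` are made up for the example. It shows that the matrix of cross-products between individuals has one row and one column per individual, however many SNPs there are:

```python
import random


def centered(matrix):
    """Subtract each SNP's mean so that every column sums to zero."""
    n = len(matrix)
    p = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in matrix]


def gram(matrix):
    """Cross-products between individuals (X X^T): an n x n matrix."""
    n = len(matrix)
    return [
        [sum(a * b for a, b in zip(matrix[i], matrix[k])) for k in range(n)]
        for i in range(n)
    ]


random.seed(1)
n_ind, n_snp = 20, 2000  # toy sizes; the thread discusses ~200 x 150k

# Rows are individuals, columns are SNPs (hypothetical random data).
genotypes = [[random.randint(0, 2) for _ in range(n_snp)] for _ in range(n_ind)]

g = gram(centered(genotypes))
print(len(g), len(g[0]))  # size depends on n_ind only, not on n_snp
```

Adding SNPs makes each entry of this matrix more expensive to compute and the data larger to store, but the matrix itself does not grow, which is why storage and conversion are the bottlenecks listed above rather than the diagonalization.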

Cheers
Thibaut


________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Federico Calboli [f.calboli at imperial.ac.uk]
Sent: 31 March 2015 07:38
To: Karl Fetter
Cc: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] Parallel computing?

On 31 Mar 2015, at 01:20, Karl Fetter <karl.fetter at gmail.com> wrote:
>
> Hi Adegenet Users,
>
> I'm going to be running a DAPC on a large data set soon of about 167K SNPs.

I hate to be contrarian, BUT you will have a lot of SNPs that are in strong linkage, i.e. they will provide *exactly* the same information, adding nothing to your analysis aside from computational burden.

I know I am not a referee of your future paper, and thus you need not, but you might actually get something out of convincing me that using so many SNPs is actually better than pruning them to a subset with much lower linkage between them (say, select SNPs with a pairwise R^2 threshold of 0.5 in a window of 50 SNPs, which you slide 5 SNPs at a time until you have pruned the whole genome.  PLINK can do this for you).
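
A minimal sketch of the window-based pruning described above, in plain Python. It is only an illustration of the idea, not PLINK's implementation (in PLINK the corresponding option is `--indep-pairwise 50 5 0.5`). It assumes genotypes coded as 0/1/2 allele counts with SNPs in genome order; the names `r_squared` and `prune` are made up for the example, and always dropping the later SNP of a correlated pair is a simplification:

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as 0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def prune(snps, window=50, step=5, threshold=0.5):
    """Return the indices of SNPs kept after sliding-window pruning.

    snps: list of per-SNP genotype vectors, in genome order.
    Within each window, the later SNP of any pair with r^2 above
    the threshold is dropped.
    """
    keep = [True] * len(snps)
    start = 0
    while start < len(snps):
        end = min(start + window, len(snps))
        idx = [i for i in range(start, end) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue  # dropped earlier in this same window
            for j in idx[pos + 1:]:
                if keep[j] and r_squared(snps[i], snps[j]) > threshold:
                    keep[j] = False
        start += step
    return [i for i, kept in enumerate(keep) if kept]


# Hypothetical toy data: SNP 1 duplicates SNP 0, SNP 2 is its mirror image,
# and SNP 3 is uncorrelated with SNP 0.
snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [2, 1, 0, 1, 2, 0],
    [0, 2, 1, 0, 2, 1],
]
print(prune(snps))  # prints [0, 3]
```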


Cheers

F



> I want to run these commands in parallel and I'm very unfamiliar with the process. A quick Google search brings me to several dozen R packages for parallel computing and I'm wondering, what's the latest and greatest package out there?
>
> Thanks in advance!
>
> Karl Fetter
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
