From Mark.Coulson.ic at uhi.ac.uk  Wed Mar  4 14:47:57 2015
From: Mark.Coulson.ic at uhi.ac.uk (Mark Coulson)
Date: Wed, 4 Mar 2015 13:47:57 +0000
Subject: [adegenet-forum] sequential DAPC
Message-ID: <AMSPR06MB007A95782F54CC610137DC6EA1E0@AMSPR06MB007.eurprd06.prod.outlook.com>

Hello,

I have run a DAPC on a large dataset of individuals from 100 locations. A couple of clear outlier groups stood out, so I removed them and now want to run a further DAPC on the rest of the dataset (98 locations), repeating this sequentially for a couple of rounds. My question: should I keep the same number of retained PCs and discriminant functions (DFs) at each round, or should I re-run the xvalDapc function for each 'level' individually? Which would be more sensible for comparing across levels?

Best,
Mark
Inverness College UHI, a partner in the University of the Highlands and Islands www.inverness.uhi.ac.uk Board of Management of Inverness College (known as Inverness College UHI), Scottish Charity No SC021197.

From t.jombart at imperial.ac.uk  Wed Mar  4 18:14:23 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Wed, 4 Mar 2015 17:14:23 +0000
Subject: [adegenet-forum] Hackathon coming: request new features!
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABEE89CD@icexch-m1.ic.ac.uk>


Dear all,

as a follow-up to a previous post, most of the adegenet development team will be attending a hackathon hosted by NESCent (NC, USA) in a few days.

Now that adegenet has moved to GitHub, posting bug reports or feature requests is trivial. All you need to do is submit a new 'issue' at:
https://github.com/thibautjombart/adegenet/issues

So, if there is anything you wish changed, fixed or added, shoot!

Cheers
Thibaut


==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart



From crypticlineage at gmail.com  Mon Mar 30 04:25:16 2015
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Sun, 29 Mar 2015 22:25:16 -0400
Subject: [adegenet-forum] extracting genefreq $tab from an indexed list
Message-ID: <CAJZnH0m0E--K3g5TbbZRKOkBL3wAf-smSndkuqNqbBHjYcEtAQ@mail.gmail.com>

I am working with hundreds of genpop objects indexed in a list.  Using
lapply and makefreq functions, population gene frequencies were stored in
individual objects (1 per data set).

Here is an example with just three objects:

> summary(mygenpop)
          Length Class  Mode
data1.str 1      genpop S4
data2.str 1      genpop S4
data3.str 1      genpop S4

>mygenfreq <- lapply(mygenpop, function(x) makefreq(x, truenames=TRUE))

> summary(mygenfreq)
          Length Class  Mode
data1.str 3      -none- list
data2.str 3      -none- list
data3.str 3      -none- list

>summary(mygenfreq[[1]]$tab)
> str(mygenfreq[[1]])
 $ tab : num [1:30, 1:1974] 0.6 0.5 0.325 0.675 0.6 0.5 0.5 0.375 0.55 0.475 ...
 $ nobs: num [1:30, 1:1000] 40 40 40 40 40 40 40 40 40 40 ...

Next job is to work with the $tab matrix, but I am not sure how to access
it from all objects in one command.

>mygenfreqT <- lapply(mygenfreq[[1:3]]$tab, function(x) t(x))

This throws an error.   The syntax seems to be wrong, but I am not sure how
to fix this.  Thanks for any help.

Vikram

From roman.lustrik at biolitika.si  Mon Mar 30 08:08:22 2015
From: roman.lustrik at biolitika.si (Roman Lustrik)
Date: Mon, 30 Mar 2015 08:08:22 +0200 (CEST)
Subject: [adegenet-forum] extracting genefreq $tab from an indexed list
In-Reply-To: <CAJZnH0m0E--K3g5TbbZRKOkBL3wAf-smSndkuqNqbBHjYcEtAQ@mail.gmail.com>
References: <CAJZnH0m0E--K3g5TbbZRKOkBL3wAf-smSndkuqNqbBHjYcEtAQ@mail.gmail.com>
Message-ID: <1404807810.557482.1427695702830.JavaMail.zimbra@biolitika.si>

S4 objects differ from classical S3 objects such as data.frames, lists and other "basic" objects. One of their peculiarities is that their slots are accessed through the "@" operator. Strictly speaking, the user is not meant to access slots directly: the developer should provide methods for every slot that she or he deems appropriate for the user to access. A missing method could mean either that it hasn't been implemented yet, or that it was left out by design (Thibaut will have more to say about this).

Now to the crux of the matter. Your first two examples work because lists can be accessed through various operators. This is often done via `lapply(X = x, FUN = "[[", "element_name")`. In your case, you can try creating an anonymous function that accesses the slot.


library(adegenet) 
data(nancycats) 
x <- list(nancycats) 
lapply(x, FUN = function(x) x$tab) 
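
For readers who want to see the difference between slot access and list-element access without loading any data, here is a minimal sketch in base R. The class name `ToyGen` and both toy lists are hypothetical and exist only for illustration; they are not adegenet classes.

```r
library(methods)

# Hypothetical S4 class with a single matrix slot (illustration only)
setClass("ToyGen", slots = c(tab = "matrix"))
obj <- new("ToyGen", tab = matrix(1:4, nrow = 2))

# One list holding S4 objects, one holding plain lists
s4_list    <- list(a = obj, b = obj)
plain_list <- list(a = list(tab = matrix(1:4, nrow = 2)))

# isS4() reports which kind of object each element is
is_s4_a <- vapply(s4_list, isS4, logical(1))
is_s4_b <- vapply(plain_list, isS4, logical(1))

# Slots of S4 objects are reached with @, elements of plain lists with $
tabs_s4    <- lapply(s4_list, function(x) x@tab)
tabs_plain <- lapply(plain_list, function(x) x$tab)
```

Checking `isS4()` on one element first is a cheap way to decide which accessor an anonymous function should use.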


Cheers, 

Roman 


---- 
In god we trust, all others bring data. 

_______________________________________________ 
adegenet-forum mailing list 
adegenet-forum at lists.r-forge.r-project.org 
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum 


From crypticlineage at gmail.com  Mon Mar 30 14:11:24 2015
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Mon, 30 Mar 2015 08:11:24 -0400
Subject: [adegenet-forum] extracting genefreq $tab from an indexed list
In-Reply-To: <1568758322.557508.1427695784424.JavaMail.zimbra@biolitika.si>
References: <CAJZnH0m0E--K3g5TbbZRKOkBL3wAf-smSndkuqNqbBHjYcEtAQ@mail.gmail.com>
 <1404807810.557482.1427695702830.JavaMail.zimbra@biolitika.si>
 <1568758322.557508.1427695784424.JavaMail.zimbra@biolitika.si>
Message-ID: <CAJZnH0mW4yPuNojosr_SkWVnTq2CNWJ1s_mT-0Prt6r57tmzXA@mail.gmail.com>

Hi Roman,

Thank you for the explanation.  The following does not work.

>mygenfreqT <- lapply(mygenfreq, FUN=function(x) t(x@tab))
Error in t(x@tab) :
  trying to get slot "tab" from an object of a basic class ("list") with no slots

Someone else suggested another solution, which seems to have worked:

>mygenfreqT <- lapply(lapply(mygenfreq, "[[", "tab"), function(x) t(x))

> head(mygenfreqT[[1]])
           1     2     3     4     5     6     7     8     9    10    11    12
L0001.1 0.60 0.500 0.325 0.675 0.600 0.500 0.500 0.375 0.550 0.475 0.350 0.275
L0001.2 0.40 0.500 0.675 0.325 0.400 0.500 0.500 0.625 0.450 0.525 0.650 0.725
L0002.1 0.30 0.150 0.175 0.250 0.275 0.400 0.325 0.325 0.475 0.275 0.175 0.150

Any other solutions are welcome.

Thanks
Vikram


On Mon, Mar 30, 2015 at 2:09 AM, Roman Lustrik <roman.lustrik at biolitika.si>
wrote:

> Oops, make that `lapply(x, FUN = function(x) x@tab)`.
>
> Cheers,
> Roman

From t.jombart at imperial.ac.uk  Mon Mar 30 14:53:02 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Mon, 30 Mar 2015 12:53:02 +0000
Subject: [adegenet-forum] extracting genefreq $tab from an indexed list
In-Reply-To: <CAJZnH0m0E--K3g5TbbZRKOkBL3wAf-smSndkuqNqbBHjYcEtAQ@mail.gmail.com>
References: <CAJZnH0m0E--K3g5TbbZRKOkBL3wAf-smSndkuqNqbBHjYcEtAQ@mail.gmail.com>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF01308@icexch-m1.ic.ac.uk>


Hi there,

the operator [[ ]] extracts a single element of a list, not a sub-list, which is the issue here. To subset a list you should use [ ].

Otherwise, to do what you want, you need something like:
lapply(mygenfreq, function(e) t(e$tab))

Note that as of adegenet_2.0-0, there will be a simpler interface to get frequencies (tab(x, freq=TRUE)).
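
A small self-contained sketch of the [ ] versus [[ ]] distinction, and of the two lapply forms suggested in this thread. The list below is a made-up stand-in for the output of makefreq (matrix values are arbitrary), so it runs without adegenet.

```r
# Toy stand-in for the makefreq() output: a named list of lists,
# each with a 'tab' and a 'nobs' matrix (values are arbitrary)
mygenfreq <- list(
  data1.str = list(tab = matrix(1:6,   nrow = 2), nobs = matrix(40, 2, 3)),
  data2.str = list(tab = matrix(7:12,  nrow = 2), nobs = matrix(40, 2, 3)),
  data3.str = list(tab = matrix(13:18, nrow = 2), nobs = matrix(40, 2, 3))
)

# [ ] returns a (shorter) list; [[ ]] pulls out one element
sub_list <- mygenfreq[1:2]
first    <- mygenfreq[[1]]

# Transpose every $tab in one pass
mygenfreqT <- lapply(mygenfreq, function(e) t(e$tab))

# Equivalent two-step form: extract 'tab' from each element, then transpose
mygenfreqT2 <- lapply(lapply(mygenfreq, "[[", "tab"), t)

# Each 2 x 3 'tab' becomes a 3 x 2 matrix
dim(mygenfreqT[[1]])
```

Both forms keep the names of the original list, so results can still be looked up by data set name.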

Cheers
Thibaut




From karl.fetter at gmail.com  Tue Mar 31 00:20:27 2015
From: karl.fetter at gmail.com (Karl Fetter)
Date: Mon, 30 Mar 2015 18:20:27 -0400
Subject: [adegenet-forum] Parallel computing?
Message-ID: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>

Hi Adegenet Users,

I'm going to be running a DAPC on a large data set soon of about 167K SNPs. I want to run these commands in parallel and I'm very unfamiliar with the process. A quick google search brings me to several dozen R packages for parallel computing and I'm wondering, what's the latest and greatest package out there?

Thanks in advance!

Karl Fetter 

From roman.lustrik at biolitika.si  Tue Mar 31 08:29:59 2015
From: roman.lustrik at biolitika.si (Roman Lustrik)
Date: Tue, 31 Mar 2015 08:29:59 +0200 (CEST)
Subject: [adegenet-forum] Parallel computing?
In-Reply-To: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>
References: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>
Message-ID: <1624376297.579467.1427783399095.JavaMail.zimbra@biolitika.si>

It depends on what platform you're on. Dirk's task view gives a nice overview of what's available (http://cran.r-project.org/web/views/HighPerformanceComputing.html). I have experience on Windows (snowfall, and parallel from vanilla R) and on an HP supercomputer running RedHat, where I've had good results using snow-based appls and Rmpi on the cluster.
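
As a starting point, here is a minimal sketch using only the parallel package that ships with R. The worker count of 2 and the toy task (`toy_task`, the mean of some random numbers) are assumptions for illustration; a real job would substitute its own function.

```r
library(parallel)

# Toy task (hypothetical): seed inside the function so results are reproducible
toy_task <- function(i) {
  set.seed(i)
  mean(rnorm(1000))
}

# Start a small socket cluster; this type works on Windows, Linux and macOS
cl <- makeCluster(2)

# Run the task for i = 1..4 across the workers
res_par <- parLapply(cl, 1:4, toy_task)

# Always release the workers when done
stopCluster(cl)

# Same computation run sequentially, for comparison
res_seq <- lapply(1:4, toy_task)
```

Because the seed is set inside the task, the parallel and sequential results should agree, which makes it easy to check that the parallel setup is doing what is intended before scaling up.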

Cheers,
Roman

----
In god we trust, all others bring data.


From f.calboli at imperial.ac.uk  Tue Mar 31 08:38:52 2015
From: f.calboli at imperial.ac.uk (Federico Calboli)
Date: Tue, 31 Mar 2015 09:38:52 +0300
Subject: [adegenet-forum] Parallel computing?
In-Reply-To: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>
References: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>
Message-ID: <2CB5BED9-04F7-4C5B-BE3A-25BFA54054E6@imperial.ac.uk>

On 31 Mar 2015, at 01:20, Karl Fetter <karl.fetter at gmail.com> wrote:
> 
> Hi Adegenet Users,
> 
> I'm going to be running a DAPC on a large data set soon of about 167K SNPs.

I hate to be contrarian, BUT you will have a lot of SNPs that are in strong linkage, i.e. they will provide *exactly* the same information, adding nothing to your analysis aside from computational burden.

I know I am not a referee of your future paper, and thus you need not, but you might actually get something out of convincing me that using so many SNPs is actually better than pruning them to a subset with much lower linkage between them (say, select SNPs with a pairwise R^2 of 0.5 in a window of 50 SNPs, which you slide 5 SNPs at a time until you have pruned the whole genome.  PLINK can do this for you). 
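
To make the sliding-window idea concrete, here is a rough sketch in base R. It is a simplified greedy version written for illustration, not a reimplementation of what PLINK does internally; the function name `ld_prune`, its arguments, and the toy genotype matrix are all assumptions.

```r
# Greedy window-based pruning (simplified sketch).
# G: numeric matrix, individuals in rows, SNPs in columns (0/1/2 coding).
# Within each window, a SNP is dropped if its squared correlation with an
# earlier retained SNP in that window exceeds r2_max.
ld_prune <- function(G, window = 50, step = 5, r2_max = 0.5) {
  p <- ncol(G)
  keep <- rep(TRUE, p)
  start <- 1
  while (start <= p) {
    idx <- start:min(start + window - 1, p)
    idx <- idx[keep[idx]]              # only SNPs still retained
    if (length(idx) > 1) {
      r2 <- suppressWarnings(cor(G[, idx, drop = FALSE]))^2
      for (a in seq_len(length(idx) - 1)) {
        if (!keep[idx[a]]) next
        for (b in (a + 1):length(idx)) {
          # NA arises for monomorphic SNPs; those are left untouched here
          if (keep[idx[b]] && !is.na(r2[a, b]) && r2[a, b] > r2_max) {
            keep[idx[b]] <- FALSE
          }
        }
      }
    }
    start <- start + step              # slide the window
  }
  which(keep)
}

# Toy data: SNP 2 duplicates SNP 1, SNP 3 is uncorrelated with both
s1 <- rep(c(0, 1, 2, 1), 25)
s3 <- rep(c(1, 0, 1, 2), 25)
G  <- cbind(s1 = s1, s2 = s1, s3 = s3)

kept <- ld_prune(G)
```

On the toy matrix the duplicated SNP is removed and the other two are kept. For real data sets a dedicated tool is likely to be much faster than this double loop.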


Cheers

F


> I want to run these commands in parallel and I'm very unfamiliar with the process. A quick google search brings me to several dozen R packages for parallel computing and I'm wondering, what's the latest and greatest package out there?
> 
> Thanks in advance!
> 
> Karl Fetter 


From t.jombart at imperial.ac.uk  Tue Mar 31 12:53:46 2015
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Tue, 31 Mar 2015 10:53:46 +0000
Subject: [adegenet-forum] Parallel computing?
In-Reply-To: <2CB5BED9-04F7-4C5B-BE3A-25BFA54054E6@imperial.ac.uk>
References: <2A51E47D-3DFB-49EA-8901-F6E57B8291A9@gmail.com>,
 <2CB5BED9-04F7-4C5B-BE3A-25BFA54054E6@imperial.ac.uk>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570ABF02453@icexch-m1.ic.ac.uk>


Hi there

The point of DAPC is actually to handle this redundancy for you, and it is not clear to me that you need a supercomputer for your analyses. The PCA step of the DAPC is meant to identify blocks of strongly correlated SNPs, and it is also probably a more rigorous way to do so than using an arbitrary sliding window and R^2.

Computationally, if you have 150k SNPs and say 200 individuals, the matrix that is diagonalized is still 200x200, and the dimensionality of your data is <= 200. The real challenge here is:
1) storing the data; if too large and if treating SNPs as binary data is OK, use the genlight class
2) converting the data; if you need a genind object, converting the data from a DNAbin object will take time; I have recently optimized this, so you may want to use the devel version of adegenet 2.0-0:
https://github.com/thibautjombart/adegenet
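
The point about the size of the diagonalized matrix can be checked on toy data with base R alone. This is only a sketch under stated assumptions: the genotypes are simulated, the sizes (20 individuals, 500 SNPs) are arbitrary, and prcomp is used as the reference rather than adegenet's own code.

```r
set.seed(1)
n <- 20    # individuals (toy size)
p <- 500   # SNPs (toy size)

# Simulated 0/1/2 genotypes, individuals in rows
X  <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)
Xc <- scale(X, center = TRUE, scale = FALSE)

# n x n cross-product matrix: its size depends on individuals, not SNPs
G  <- tcrossprod(Xc) / (n - 1)
eg <- eigen(G, symmetric = TRUE)

# Reference PCA on the full n x p matrix
pc <- prcomp(X, center = TRUE, scale. = FALSE)

# After centering, at most n - 1 components carry variance;
# these eigenvalues match the PCA variances
ev_small <- eg$values[1:(n - 1)]
ev_pca   <- pc$sdev[1:(n - 1)]^2
all.equal(ev_small, ev_pca)
```

The eigen decomposition here works on a 20 x 20 matrix regardless of how many SNP columns are added, which is why the number of individuals, rather than the number of SNPs, drives the cost of this step.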

Cheers
Thibaut

