From siobhan.dennison at mq.edu.au Tue Sep 9 08:31:35 2014
From: siobhan.dennison at mq.edu.au (Siobhan Dennison)
Date: Tue, 9 Sep 2014 16:31:35 +1000
Subject: [adegenet-forum] Problems with find.cluster
Message-ID:
I am working on genetic structure of a threatened species, and as such have
rather small sample sizes. Two of my four populations are out of HWE, and
so I am using DAPC to look at population clustering because it does not
assume HWE.
The DAPC yielded 4 clusters as I expected, using the location information,
and retaining a very conservative 11 PCs (following a.score). However, when
I wanted to look at clustering with no location priors on the data, things
got a bit weird. I used the find.clusters function in adegenet, and I keep
getting very different results to my other analyses - the lowest BIC falls
at K=1, but the BIC values are extremely low (~420), steadily increasing
from there (I attached the graph FYI).
My Fst values based on microsatellites suggest high differentiation between
the 4 sites. I standardised my Fst values following Meirmans 2006, which
gave rather high Fst values (0.2-0.4). My mitochondrial Fst values are also
high (>0.5).
Using Structure with LOCprior (accounting for low sample sizes), I get K=4
as the most likely number of clusters, and PCA also shows delineation
between the four sample sites.
Given that all of my other analyses tell the same story (that there are four
rather differentiated sites), I'm wondering if anyone can tell me where I
might be going wrong here?
Any pointers would be greatly appreciated!!
Thanks,
Siobhan
--
-------------- next part --------------
A non-text attachment was scrubbed...
Name: find.clusters output.pdf
Type: application/pdf
Size: 5249 bytes
Desc: not available
URL:
From crypticlineage at gmail.com Fri Sep 12 19:31:46 2014
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Fri, 12 Sep 2014 13:31:46 -0400
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To:
References:
<2CB2DA8E426F3541AB1907F98ABA6570A8233A2D@icexch-m1.ic.ac.uk>
Message-ID:
I am revisiting this topic due to some technical problems.
The task at hand is to estimate pairwise Fst matrices for each locus
separately.
# Genind object is stored in:
gen100_genind
# Use seploc to separate loci (returns a list of single-locus genind objects):
gen100_seploc <- seploc(gen100_genind, truenames=TRUE, res.type='genind')
# Calculate pairwise Fst for each locus:
gen100_perLocusPWFst <- lapply(gen100_seploc, pairwise.fst,
res.type='dist', truenames=TRUE)
For a data set consisting of 30 populations, 20 individuals each, 1000 loci
and 2 alleles per locus (1.2 million data points), it takes up to 6 hours
to estimate the pairwise Fst matrix with this method.
Is there any way to speed this up? Should I look into any other packages?
Many thanks for your time and help.
Vikram
On Mon, Jul 14, 2014 at 9:16 AM, Vikram Chhatre
wrote:
> Perfect! Thank you for both solutions.
>
> V
>
>
> On Mon, Jul 14, 2014 at 9:13 AM, Jombart, Thibaut <
> t.jombart at imperial.ac.uk> wrote:
>
>>
>> Hi there,
>>
>> you can use seploc to separate loci, and lapply over the resulting list
>> using your preferred fst function.
>>
>> Cheers
>> Thibaut
>> ________________________________________
>> From: adegenet-forum-bounces at lists.r-forge.r-project.org [
>> adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Vikram
>> Chhatre [crypticlineage at gmail.com]
>> Sent: 14 July 2014 14:01
>> To: adegenet-forum at lists.r-forge.r-project.org
>> Subject: [adegenet-forum] Per locus pairwise Fst
>>
>> Good morning.
>>
>> I would like to estimate per locus pairwise Fst for populations, but it
>> appears that Adegenet only estimates this over all loci (i.e. single
>> matrix). What I would like is one matrix per locus. Has anyone modified
>> the functions or know of alternative programs that can do this?
>>
>> Thanks
>> Vikram
>>
>>
>>
>
From vojta at trapa.cz Sat Sep 13 19:47:34 2014
From: vojta at trapa.cz (Vojtěch Zeisek)
Date: Sat, 13 Sep 2014 19:47:34 +0200
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To:
References:
Message-ID: <1821182.AM9yii2LuR@veles.site>
Hello,
R is basically single-threaded. Some packages and functions implement
parallelisation themselves; where they do not, you can do it yourself. As you
are using a function from the apply family, it should be easy, although I
don't have a ready-made solution at hand. There are several possibilities, and
it might take some testing to find the best one for your task and hardware.
See http://www.r-bloggers.com/parallel-computing-in-r/ and the details of the
functions mentioned there on http://cran.r-project.org/. Googling for parallel
computing in R turns up many more links.
Good luck!
Vojtěch
--
Vojtěch Zeisek
http://trapa.cz/en/
Department of Botany, Faculty of Science
Charles University in Prague
Benátská 2, Prague, 12801, CZ
http://botany.natur.cuni.cz/en/
Institute of Botany, Academy of Science
Zámek 1, Průhonice, 25243, CZ
http://www.ibot.cas.cz/en/
Czech Republic
From t.jombart at imperial.ac.uk Sat Sep 13 20:20:39 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Sat, 13 Sep 2014 18:20:39 +0000
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To:
References:
<2CB2DA8E426F3541AB1907F98ABA6570A8233A2D@icexch-m1.ic.ac.uk>
,
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570A825D6A1@icexch-m1.ic.ac.uk>
Hi there,
yes, this function is not optimized for large datasets. You can use the same approach but using functions from the hierfstat package.
Cheers
Thibaut
From t.jombart at imperial.ac.uk Sat Sep 13 20:22:21 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Sat, 13 Sep 2014 18:22:21 +0000
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To: <1821182.AM9yii2LuR@veles.site>
References:
,
<1821182.AM9yii2LuR@veles.site>
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570A825D6B1@icexch-m1.ic.ac.uk>
On non-windows systems, mclapply can be used to get a nice speedup, but really the first thing to do is use a function which does computations in a more optimal way.
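As a rough sketch of the mclapply route, the example below runs a per-locus function over a list in parallel and compares the result with plain lapply. Everything in it is an assumption made for illustration: `freqs` is made-up allele-frequency data, and `pairwise_fst_locus` is a simplified, uncorrected (Ht - Hs) / Ht calculation for one biallelic locus, not pairwise.fst from adegenet and not a hierfstat function. On Windows it falls back to a single core, since mclapply relies on forking.

```r
library(parallel)

set.seed(1)
n_pop  <- 5
n_loci <- 200

# Hypothetical input: for each biallelic locus, one allele frequency per population
freqs <- replicate(n_loci, runif(n_pop, 0.05, 0.95), simplify = FALSE)

# Simplified stand-in for a per-locus pairwise Fst function:
# uncorrected Fst = (Ht - Hs) / Ht for every pair of populations
pairwise_fst_locus <- function(p) {
  h    <- 2 * p * (1 - p)            # expected heterozygosity per population
  hs   <- outer(h, h, "+") / 2       # mean within-population heterozygosity per pair
  pbar <- outer(p, p, "+") / 2       # pooled allele frequency per pair
  ht   <- 2 * pbar * (1 - pbar)      # total expected heterozygosity per pair
  fst  <- (ht - hs) / ht
  diag(fst) <- 0
  fst
}

# mclapply forks the R process, which is not available on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

res_par <- mclapply(freqs, pairwise_fst_locus, mc.cores = n_cores)
res_seq <- lapply(freqs, pairwise_fst_locus)

# Both approaches should give the same list of matrices
isTRUE(all.equal(res_par, res_seq))
```

With the list returned by seploc, the same pattern would presumably be mclapply(gen100_seploc, <fst function>, mc.cores = n_cores). Parallelising only divides the run time by roughly the number of cores, which is why a faster per-locus function matters more.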
Cheers
Thibaut
From crypticlineage at gmail.com Sat Sep 13 22:48:07 2014
From: crypticlineage at gmail.com (Vikram Chhatre)
Date: Sat, 13 Sep 2014 16:48:07 -0400
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570A825D6A1@icexch-m1.ic.ac.uk>
References:
<2CB2DA8E426F3541AB1907F98ABA6570A8233A2D@icexch-m1.ic.ac.uk>
<2CB2DA8E426F3541AB1907F98ABA6570A825D6A1@icexch-m1.ic.ac.uk>
Message-ID:
Thank you for all the replies. I have been looking at the pp.fst()
function in the Hierfstat package. Does the post-seploc data frame need to
be converted into something that Hierfstat understands first? The
following doesn't seem to work:
# Use seploc to separate loci:
gen100_seploc <- seploc(gen100_genind, truenames=TRUE, res.type='genind')
# Load Hierfstat
library(hierfstat)
# Calculate pairwise Fst:
gen100_perLocusPWFst <- lapply(gen100_seploc, pp.fst, diploid=TRUE)
Error in unique.default(Pop) : unique() applies only to vectors
From t.jombart at imperial.ac.uk Sun Sep 14 21:45:33 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Sun, 14 Sep 2014 19:45:33 +0000
Subject: [adegenet-forum] Per locus pairwise Fst
In-Reply-To:
References:
<2CB2DA8E426F3541AB1907F98ABA6570A8233A2D@icexch-m1.ic.ac.uk>
<2CB2DA8E426F3541AB1907F98ABA6570A825D6A1@icexch-m1.ic.ac.uk>,
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570A825D858@icexch-m1.ic.ac.uk>
Yes, you need to use:
?genind2hierfstat
Cheers
Thibaut
From caroline.duffie at gmail.com Tue Sep 16 22:45:25 2014
From: caroline.duffie at gmail.com (Caroline Judy)
Date: Tue, 16 Sep 2014 16:45:25 -0400
Subject: [adegenet-forum] randomize pop labels in a genind object for
randomization experiment.
Message-ID:
Hi Thibaut, Vikram, and others:
I'd like to try a randomization experiment to further explore my radseq
data using DAPC.
Data structure:
40 individuals in 2 (a priori) populations
6451 SNP loci
My data are for two very closely related "species" which show little to no
divergence at traditional markers. I performed a DAPC using a priori pop
definitions (set as species). The function can discriminate my species, but
the allelic contributions are very low (the highest few are around 0.0015).
I am interested in trying a randomization experiment in which I shuffle the
population labels 100 times and then perform DAPC on each of these. Ultimately
the goal is to compare allelic loadings for the discriminant function
generated using true labels vs. randomized labels.
I am fairly new to R. A colleague suggested the general format to create a
loop, but could anyone offer a solution that could be implemented with a
genind object? Otherwise, I think it would be too labor intensive - I would
have to create 100 different structure input files to be converted to
genind objects.
nrep <- 100
results <- vector("list", nrep)  # or vector/matrix, depending on the case
for (i in seq_len(nrep))
{
  ## `labels` stands for the vector of population labels to shuffle
  rand.labels <- sample(labels)
  ## do some analyses and assign relevant results to results[[i]]
}
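To make the loop above concrete, here is a minimal, self-contained sketch of a label-permutation experiment in base R. It is only an illustration under stated assumptions: `geno` is simulated data standing in for the allele table of a genind object, and `max_diff` (the largest between-group difference in column means) is a hypothetical stand-in for the DAPC loadings, not a DAPC itself. With a real genind object, the shuffled factor could be passed to dapc through its `pop` argument, so no new input files would be needed.

```r
set.seed(1)

# Simulated stand-in for a genind allele table: 40 individuals, 50 loci,
# genotypes coded as 0/1/2 copies of an allele
n_ind  <- 40
n_loci <- 50
geno <- matrix(rbinom(n_ind * n_loci, size = 2, prob = 0.3), nrow = n_ind)
grp  <- factor(rep(c("sp1", "sp2"), each = n_ind / 2))

# Hypothetical statistic: largest absolute difference in per-locus means
# between the two groups (a stand-in for the largest allele loading)
max_diff <- function(x, g) {
  group_means <- rowsum(x, g) / as.vector(table(g))
  max(abs(group_means[1, ] - group_means[2, ]))
}

# Statistic with the true labels
obs <- max_diff(geno, grp)

# Null distribution from shuffled labels
nrep <- 100
null <- numeric(nrep)
for (i in seq_len(nrep)) {
  null[i] <- max_diff(geno, sample(grp))
}

# Proportion of permutations at least as extreme as the observed value
p_value <- (sum(null >= obs) + 1) / (nrep + 1)
p_value
```

The observed statistic is then judged against the spread of `null`. The same comparison could be made loading by loading rather than on a single summary value.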
Thanks,
Caroline
From carla.rivarossi at gmail.com Wed Sep 17 16:02:37 2014
From: carla.rivarossi at gmail.com (Carla Riva Rossi)
Date: Wed, 17 Sep 2014 11:02:37 -0300
Subject: [adegenet-forum] assignplot
Message-ID:
Hi Everyone,
I would like to change the color scheme in an assignplot to represent
membership probabilities with gray colors (where black =1, white=0) instead
of heat colors and then add a scale legend with the probability intervals.
Is there a way to do that?
Thanks in advance for the answers.
Carla Riva Rossi.-
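One possible workaround, sketched here without relying on assignplot itself, is to draw the membership probabilities directly with base graphics. The sketch assumes the probabilities are available as a matrix with individuals in rows and groups in columns (as in the posterior element of a dapc result). The matrix `post` is made-up toy data, and `plot_membership` is a hypothetical helper name.

```r
set.seed(1)

# Toy membership matrix standing in for the posterior element of a dapc result:
# 12 individuals (rows) by 3 groups (columns), each row summing to 1
post <- matrix(runif(12 * 3), nrow = 12)
post <- post / rowSums(post)
dimnames(post) <- list(paste0("ind", 1:12), paste0("grp", 1:3))

# Probability intervals and matching grey levels (white = 0, black = 1)
breaks <- seq(0, 1, by = 0.2)
greys  <- gray(seq(1, 0, length.out = length(breaks) - 1))

# Hypothetical helper: grey-scale heat map of memberships with an interval legend
plot_membership <- function(post, breaks, cols) {
  op <- par(mar = c(2, 6, 3, 9), xpd = TRUE)   # leave room on the right for the legend
  on.exit(par(op))
  image(x = seq_len(ncol(post)), y = seq_len(nrow(post)), z = t(post),
        breaks = breaks, col = cols, axes = FALSE, xlab = "", ylab = "")
  axis(3, at = seq_len(ncol(post)), labels = colnames(post), tick = FALSE)
  axis(2, at = seq_len(nrow(post)), labels = rownames(post), las = 1, tick = FALSE)
  box()
  legend("topright", inset = c(-0.45, 0), title = "membership prob.",
         legend = sprintf("%.1f-%.1f", breaks[-length(breaks)], breaks[-1]),
         fill = cols, bty = "n")
  invisible(NULL)
}
```

Calling plot_membership(post, breaks, greys) draws the toy matrix. For a real analysis, `post` would be replaced by the posterior matrix of the fitted DAPC, and `breaks` can be changed to any set of intervals covering 0 to 1.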
From t.jombart at imperial.ac.uk Thu Sep 18 11:41:12 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 18 Sep 2014 09:41:12 +0000
Subject: [adegenet-forum] randomize pop labels in a genind object for
randomization experiment.
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570A82686D8@icexch-m1.ic.ac.uk>
Hi there,
no need to recode everything: what you describe is cross-validation, and it is implemented in adegenet. See ?xvalDapc
Cheers
Thibaut
From caitiecollins at gmail.com Thu Sep 18 14:46:16 2014
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Thu, 18 Sep 2014 13:46:16 +0100
Subject: [adegenet-forum] Problems with find.cluster
In-Reply-To:
References:
Message-ID:
Hi Siobhan,
As a preliminary suggestion that will be easy to investigate, I would
suggest that perhaps the number of PCs retained is affecting your results
from find.clusters.
Have you had a look at the xvalDapc function? Similar to a.score, xvalDapc
can be used to help mediate the trade-off between discriminatory power and
over-fitting. I would be curious to see what xvalDapc recommends as the
number of PCs to retain to best differentiate the four groups you are
identifying via other methods. If the optimal number of PCs selected by
xvalDapc for the four groups is greater than the 11 PCs you have selected
with a.score, this would suggest that you may not have enough information
for the BIC to identify more than one cluster, so I would recommend
re-running find.clusters with the number of PCs suggested by xvalDapc to
see if you get different results.
Of course, it is possible that the problem lies elsewhere, or that
according to the BIC there is simply not enough evidence for more than one
cluster, but at least it will be very easy to check this theory. Please
let us know the results and we can then continue to search for other
solutions if necessary.
Best,
Caitlin.
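
One way to see how the number of retained PCs changes the BIC curve is to compute the curve at several levels of PC retention and compare them side by side. The following is only a rough base-R sketch of that idea, not adegenet's own code: the simulated data, the helper name bic_curve, the PC counts (3, 11, 30) and the BIC form n*log(WSS/n) + K*log(n) are all assumptions made for illustration.

```r
# Base-R sketch, not adegenet's own code: PCA followed by k-means for
# K = 1..max_k, scored with an assumed BIC of n*log(WSS/n) + K*log(n).
set.seed(1)

# Simulated data (hypothetical): 4 groups of 15 individuals, 30 variables.
# Variables 1-4 each carry the signal of one group; the rest are noise.
grp <- rep(1:4, each = 15)
n   <- length(grp)
x   <- matrix(rnorm(n * 30), nrow = n)
for (g in 1:4) x[grp == g, g] <- x[grp == g, g] + 6

pca <- prcomp(x, center = TRUE, scale. = FALSE)

# BIC curve for k-means run on the first n_pc principal components
bic_curve <- function(pca, n_pc, max_k = 6) {
  scores <- pca$x[, seq_len(n_pc), drop = FALSE]
  n <- nrow(scores)
  sapply(seq_len(max_k), function(k) {
    km <- kmeans(scores, centers = k, nstart = 20)
    n * log(km$tot.withinss / n) + k * log(n)
  })
}

# Compare the curves for a few levels of PC retention
n_pcs <- c(3, 11, 30)
bic   <- sapply(n_pcs, function(p) bic_curve(pca, p))
dimnames(bic) <- list(paste0("K=", 1:6), paste0(n_pcs, " PCs"))
print(round(bic, 1))
```

With real data, the same comparison would be made by re-running find.clusters at each candidate number of PCs and looking at how the shape of the BIC curve changes, rather than only at where its minimum falls.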
On Tue, Sep 9, 2014 at 7:31 AM, Siobhan Dennison wrote:
> I am working on genetic structure of a threatened species, and as such
> have rather small sample sizes. Two of my four populations are out of HWE,
> and so I am using DAPC to look at population clustering because it does not
> assume HWE.
>
> The DAPC yielded 4 clusters as I expected, using the location information,
> and retaining a very conservative 11 PCs (following a.score). However, when
> I wanted to look at clustering with no location priors on the data, things
> got a bit weird. I used the find.clusters option in adegenet, and I keep
> getting very different results to my other analyses - the lowest BIC falls
> at K=1, but the BIC values are extremely low (~420), steadily increasing
> from there (I attached the graph FYI).
>
> My Fst values based on microsatellites suggest high differentiation
> between the 4 sites. I standardised my Fst values following Miermans 2006,
> which gave rather high Fst values (0.2-0.4). My mitochondrial Fst values
> are also high (>0.5).
>
> Using Structure with LOCprior (accounting for low sample sizes), I get K=4
> as the most likely number of clusters, and PCA also shows delineation
> between the four sample sites.
>
> Given that all of my other analyses tell the same story (that there a four
> rather differentiated sites), I'm wondering if anyone can tell me where I
> might be going wrong here?
>
> Any pointers would be greatly appreciated!!
>
> Thanks,
> Siobhan
> --
>
>
From maria.david.salas at gmail.com Thu Sep 18 16:08:31 2014
From: maria.david.salas at gmail.com (Maria del Carmen David)
Date: Thu, 18 Sep 2014 09:08:31 -0500
Subject: [adegenet-forum] how to label individuals in scatter(dapc)
Message-ID:
Hello. I can't find a way to label individuals in my DAPC scatter plot. I
know that for those who work with a huge number of individuals it isn't
necessary, but I have a bit fewer than 150 and I want to see how they plot. I
have used assignplot to get a better idea of the group assignments, but it
would be extremely helpful to be able to label my samples. Thanks in
advance.
Maria del Carmen
From caroline.duffie at gmail.com Thu Sep 18 16:20:47 2014
From: caroline.duffie at gmail.com (Caroline Judy)
Date: Thu, 18 Sep 2014 10:20:47 -0400
Subject: [adegenet-forum] randomize pop labels in a genind object for
randomization experiment.
In-Reply-To: <2CB2DA8E426F3541AB1907F98ABA6570A82686D8@icexch-m1.ic.ac.uk>
References:
<2CB2DA8E426F3541AB1907F98ABA6570A82686D8@icexch-m1.ic.ac.uk>
Message-ID:
I've been working through the tutorial again, and I now see (and
understand) the randomization step that is part of cross validation. I'm so
glad Adegenet has this formalized test. Thanks so much.
C
On Thu, Sep 18, 2014 at 5:41 AM, Jombart, Thibaut
wrote:
>
> Hi there,
>
> no need to recode everything: what you describe is cross-validation, and
> it is implemented in adegenet. See ?xvalDapc
>
> Cheers
>
> Thibaut
>
>
> ________________________________________
> From: Caroline Judy [caroline.duffie at gmail.com]
> Sent: 16 September 2014 21:45
> To: Jombart, Thibaut
> Cc: Vikram Chhatre; adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] randomize pop labels in a genind object for
> randomization experiment.
>
> Hi Thibaut, Vikram, and others:
>
> I'd like to try a randomization experiment to further explore my radseq
> data using DAPC.
>
> Data structure:
> 40 individuals in 2 (apriori) populations
> 6451 SNP loci
>
> My data are for two very closely related "species" which show little to no
> divergence at traditional markers. I performed a DAPC using a priori pop
> definitions (set as species). The function can discriminate my species, but
> the allelic contributions are very low ( highest few around .0015).
>
> I am interested in trying a randomization experiment in which I shuffle
> the population labels 100 times and then perform DAPC on each of these.
> Ultimately the goal is to compare allelic loadings for the discriminant
> function generated using true labels vs. randomized labels.
>
> I am fairly new to R. A colleague suggested the general format to create a
> loop, but could anyone offer a solution that could be implemented with a
> genind object? Otherwise, I think it would be too labor intensive - I would
> have to create 100 different structure input files to be converted to
> genind objects.
>
> nrep <- 100
> results <- list() # or vector/matrix, depending on the case
> for (i in seq_len(nrep))
> {
>   rand.labels <- sample(labels) # shuffle the population labels
>   ## do some analyses and assign relevant results to results[[i]]
> }
>
> Thanks,
> Caroline
>
>
> On Sun, Sep 14, 2014 at 3:45 PM, Jombart, Thibaut <
> t.jombart at imperial.ac.uk> wrote:
>
> Yes, you need to use:
> ?genind2hierfstat
>
> Cheers
> Thibaut
>
> ________________________________________
> From: Vikram Chhatre [crypticlineage at gmail.com]
> Sent: 13 September 2014 21:48
> To: adegenet-forum at lists.r-forge.r-project.org; Jombart, Thibaut
> Subject: Re: [adegenet-forum] Per locus pairwise Fst
>
> Thank you for all the replies. I have been looking at the pp.fst()
> function in the Hierfstat package. Does the post-seploc data frame need to
> be converted into something that Hierfstat understands first? The
> following doesn't seem to work:
>
> # Use seploc to separate loci:
> gen100_seploc <- seploc(gen100_genind, truenames=TRUE,
> res.type=c('genind', 'matrix'))
>
> # Load Hierfstat
> library(hierfstat)
>
> # Calculate pairwise Fst:
> gen100_perLocusPWFst <- lapply(gen100_seploc, pp.fst, diploid=TRUE)
>
> Error in unique.default(Pop) : unique() applies only to vectors
>
> On Sat, Sep 13, 2014 at 2:20 PM, Jombart, Thibaut <
> t.jombart at imperial.ac.uk> wrote:
>
> Hi there,
>
> yes, this function is not optimized for large datasets. You can use the
> same approach but using functions from the hierfstat package.
>
> Cheers
> Thibaut
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [
> adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Vikram
> Chhatre [crypticlineage at gmail.com]
> Sent: 12 September 2014 18:31
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: Re: [adegenet-forum] Per locus pairwise Fst
>
> I am revisiting this topic due to some technical problems.
>
> The task at hand is to estimate pairwise Fst matrices for each locus
> separately.
>
> # Genind object is stored in:
> gen100_genind
>
> # Use seploc to separate loci:
> gen100_seploc <- seploc(gen100_genind, truenames=TRUE,
> res.type=c('genind', 'matrix'))
>
> # Calculate pairwise Fst:
> gen100_perLocusPWFst <- lapply(gen100_seploc, pairwise.fst,
> res.type=c('dist', 'matrix'), truenames=TRUE)
>
> For a data set consisting of 30 populations, 20 individuals each, 1000
> loci and 2 alleles per locus (1.2 million data points), it takes up to 6
> hours to estimate the pairwise Fst matrix with this method.
>
> Is there any way to speed this up? Should I look into any other packages?
>
> Many thanks for your time and help.
> Vikram
>
>
>
>
> On Mon, Jul 14, 2014 at 9:16 AM, Vikram Chhatre wrote:
> Perfect! Thank you for both solutions.
>
> V
>
>
> On Mon, Jul 14, 2014 at 9:13 AM, Jombart, Thibaut <
> t.jombart at imperial.ac.uk> wrote:
>
> Hi there,
>
> you can use seploc to separate loci, and lapply over the resulting list
> using your preferred fst function.
>
> Cheers
> Thibaut
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [
> adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of
> Vikram Chhatre [crypticlineage at gmail.com]
> Sent: 14 July 2014 14:01
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] Per locus pairwise Fst
>
> Good morning.
>
> I would like to estimate per locus pairwise Fst for populations, but it
> appears that Adegenet only estimates this over all loci (i.e. single
> matrix). What I would like is one matrix per locus. Has anyone modified
> the functions or know of alternative programs that can do this?
>
> Thanks
> Vikram
>
>
>
>
From caitiecollins at gmail.com Thu Sep 18 17:49:10 2014
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Thu, 18 Sep 2014 16:49:10 +0100
Subject: [adegenet-forum] how to label individuals in scatter(dapc)
In-Reply-To:
References:
Message-ID:
Hi,
If you think the individuals you are plotting are spaced far enough that
you will be able to read labels at the individual level, one way to do it
is to use s.label.
Here is an example of how to use s.label to overlay labels on a DAPC
scatterplot:
#############
## EXAMPLE ##
#############
library(adegenet) # also attaches ade4, which provides s.label
set.seed(14)
# generate a simulated dataset with 3 populations
simpop <- glSim(100, 500, 5, k=3, sort.pop=TRUE)
# isolate the SNPs and the population factor
snps <- as.matrix(simpop)
phen <- simpop@other$ancestral.pops
# run a dapc
dapc1 <- dapc(snps, phen, n.pca=20, n.da=4)
# create the scatter plot as before
scatter(dapc1, cstar=0, cex=5, label=NULL)
# change graphical parameter to subsequently overlay the labels without drawing a new plot
par(new=TRUE)
# make a data frame of the dapc coordinates used in scatter
df <- data.frame(x = dapc1$ind.coord[,1], y = dapc1$ind.coord[,2])
# identify/create a vector of names for the individuals in your plot
noms <- paste("ind", c(1:100), sep=".")
# use s.label to add labels at the positions given by the coordinates used in the plot
s.label(dfxy = df, xax=1, yax=2, label=noms,
        clabel=0.7, # change the size of the labels
        boxes=TRUE, # if points are spaced widely enough, TRUE adds boxes around the labels
        grid=FALSE, addaxes=FALSE) # do not draw grid lines or axes in addition to the labels
The comments in the example above hopefully should give you all of the
relevant information, so please give them a read and then feel free to let
me know if you have any questions. You will almost certainly want to play
around with the arguments clabel and boxes in the s.label function to get
the labels to be readable for your case.
I hope that helps!
Best,
Caitlin.
On Thu, Sep 18, 2014 at 3:08 PM, Maria del Carmen David <
maria.david.salas at gmail.com> wrote:
> Hello. I can't find the way of labeling individuals in my dapc graphic. I
> know that for those who work with huge amount of individuals it isn't
> necessary but i have a bit less than 150 and i want to see how they plot. I
> have used assignplot to get a better idea of the group assignments but it
> would be extremely helpful to be able to label my samples. Thanks in
> advance.
>
> Maria del Carmen
>
From Jackie.Lighten at Dal.Ca Mon Sep 22 13:59:47 2014
From: Jackie.Lighten at Dal.Ca (Jackie Lighten)
Date: Mon, 22 Sep 2014 11:59:47 +0000
Subject: [adegenet-forum] Trouble converting to genid object
Message-ID:
Hi,
I am having trouble converting a presence/absence genotype data frame to a genind object.
Please see attached for test data file.
Using
obj2 <- genind(test, ploidy=1, type="PA")
I get the error:
Error in `colnames<-`(`*tmp*`, value = c("L1", "L2")) :
length of 'dimnames' [2] not equal to array extent
Using
obj2 <- df2genind(test, ploidy=1, type="PA")
I get the error:
Error in `colnames<-`(`*tmp*`, value = "L1") :
length of 'dimnames' [2] not equal to array extent
In addition: Warning messages:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
2: In df2genind(test, ploidy = 1, type = "PA") :
entirely non-type marker(s) deleted
Any help would be much appreciated
Thanks,
Jack
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL:
From t.jombart at imperial.ac.uk Thu Sep 25 12:04:17 2014
From: t.jombart at imperial.ac.uk (Jombart, Thibaut)
Date: Thu, 25 Sep 2014 10:04:17 +0000
Subject: [adegenet-forum] Trouble converting to genid object
In-Reply-To:
References:
Message-ID: <2CB2DA8E426F3541AB1907F98ABA6570A826B775@icexch-m1.ic.ac.uk>
Hi there,
it looks like a bug. I'll investigate and get back to you.
Cheers
Thibaut
________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Jackie Lighten [Jackie.Lighten at Dal.Ca]
Sent: 22 September 2014 12:59
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] Trouble converting to genid object
Hi,
I am having trouble converting a presence/absence genotype data frame to a genind object.
Please see attached for test data file.
Using
obj2 <- genind(test, ploidy=1, type="PA")
I get the error:
Error in `colnames<-`(`*tmp*`, value = c("L1", "L2")) :
length of 'dimnames' [2] not equal to array extent
Using
obj2 <- df2genind(test, ploidy=1, type="PA")
I get the error:
Error in `colnames<-`(`*tmp*`, value = "L1") :
length of 'dimnames' [2] not equal to array extent
In addition: Warning messages:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
2: In df2genind(test, ploidy = 1, type = "PA") :
entirely non-type marker(s) deleted
Any help would be much appreciated
Thanks,
Jack
From jean-luc.legras at supagro.inra.fr Thu Sep 25 16:02:15 2014
From: jean-luc.legras at supagro.inra.fr (Jean-Luc LEGRAS)
Date: Thu, 25 Sep 2014 16:02:15 +0200
Subject: [adegenet-forum] Combining genetic and phenotypic data?
Message-ID:
Hello,
I am an adegenet user, and I saw the discussion about the joint analysis of phenotypic and genetic data, "[adegenet-forum] Combining genetic and phenotypic data?", from one year ago. We have genotyped yeast populations by sequencing and have phenotyped them as well for the production of many metabolites.
I was wondering if you have already implemented such functions in adegenet, or if this is on the way.
Thank you in advance for your answer.
Best regards.
Jean-Luc
PS: I am sorry to have missed your visit to Montpellier, but I was busy on the day you came last spring.
From caitiecollins at gmail.com Fri Sep 26 18:27:48 2014
From: caitiecollins at gmail.com (Caitlin Collins)
Date: Fri, 26 Sep 2014 17:27:48 +0100
Subject: [adegenet-forum] adegenet - a.score.opt vs. xvalDAPC
In-Reply-To: <24b1342a49a4456fb17abb347c5f96e5@icexch-h3.ic.ac.uk>
References: <24b1342a49a4456fb17abb347c5f96e5@icexch-h3.ic.ac.uk>
Message-ID:
Good question.
Essentially these are just two different approaches to the same problem of
trying to find the optimal number of PCs to retain in DAPC. The short
answer is: *Use xvalDapc instead of optim.a.score.*
optim.a.score was our first approach, and xvalDapc is our new and improved
approach; xvalDapc is easier to interpret and is likely to give better
results.
------
If you're just generally curious about the two approaches, I can offer a
brief description and an explanation of the way I think about them, at
least:
Both methods rely on repeated measurements to perform model validation
relating to the impact of the number of PCs on the ability of the model to
predict the correct group membership of all individuals in the dataset.
In cross-validation with xvalDapc, DAPC is performed (with
increasing numbers of PCs) on a "training set" (typically 90% of the
dataset) and then we project the individuals left out of the analysis onto
the discriminant axes constructed by DAPC. We measure how accurately we can
place this left-out 10% of individuals in the multidimensional space (in
which their position corresponds to their group membership). With too few
PCs retained, we fail to correctly assign the validation set of individuals
to the correct groups because we simply do not have enough information.
With too many PCs retained, we also begin to fail to correctly assign these
individuals, because essentially now all we are doing is over-describing
each of the individuals in the training set instead of painting a general
picture of just those features that relate to their group structure. This
over-description merely adds "noise" that drowns out the group-defining
"signal" that we had been attempting to summarise. We perform the
cross-validation procedure repeatedly (each time varying the number of PCs
retained) with different training and validation sets until we find the
right signal-to-noise ratio, the goldilocks point between weak
discrimination and unstable results.
When using the a.score to achieve this aim, we repeatedly perform
DAPC with different numbers of retained PCs; but, by contrast to xvalDapc,
we keep all individuals in the analysis. Instead, with optim.a.score, at
each level of PC retention, we measure reassignment success to the real
populations of interest, and also measure that "success" to fake randomized
populations. If there is any real group structure to be identified in the
dataset, the optimal level of PC retention will be the one at which our
ability to assign individuals to their real groups exceeds by the greatest
margin our ability to assign individuals to the false groupings, calculated
as Pt - Pr, i.e. the probability of reassignment to the True cluster minus
that to the Random cluster. With too few PCs, the probability of successful
reassignment will be low for both the true clusters and the random ones. On
the other hand, with too many PCs, you have so much information retained
that you could paint effectively any picture of groupings in the data, so
reassignment success to the false clusters will begin to approach that to
the true clusters and the a-score will decline, once again leaving a
goldilocks point in the middle of the arc indicating the optimal number of
PCs to retain.
Cross-validation and optim.a.score should not give
completely contradictory results, but they may not always give the same
result. If results differed, we would always recommend that you use the
results of xvalDapc over optim.a.score, hence you may as well just not
worry about optim.a.score in the first place.
Hope that helps!
Best,
Caitlin.
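
The two procedures described above can be sketched in base R to make the contrast concrete. This is only an illustration under stated assumptions, not adegenet's implementation: it uses prcomp plus MASS::lda as a stand-in for DAPC, a simple (non-stratified) random 90/10 split, and the proportion of correctly assigned individuals as a stand-in for the reassignment probabilities Pt and Pr. The simulated data and the helper names assign_success, xval_success and a_score are hypothetical.

```r
library(MASS)  # lda(); MASS ships with standard R distributions

set.seed(42)

# Simulated data (hypothetical): 3 groups of 30 individuals, 40 variables.
# Variables 1-3 each carry the signal of one group; the rest are noise.
grp <- factor(rep(c("A", "B", "C"), each = 30))
n   <- length(grp)
x   <- matrix(rnorm(n * 40), nrow = n)
for (g in 1:3) {
  rows <- as.integer(grp) == g
  x[rows, g] <- x[rows, g] + 4
}

# Fit PCA + LDA on the 'train' rows with n_pc retained PCs, then return the
# proportion of 'test' rows assigned to their labelled group.
assign_success <- function(x, labels, n_pc, train, test) {
  pca  <- prcomp(x[train, , drop = FALSE], center = TRUE, scale. = FALSE)
  keep <- seq_len(n_pc)
  fit  <- lda(pca$x[, keep, drop = FALSE], grouping = labels[train])
  new_scores <- predict(pca, x[test, , drop = FALSE])[, keep, drop = FALSE]
  mean(as.character(predict(fit, new_scores)$class) ==
         as.character(labels[test]))
}

# Cross-validation: train on a random 90%, assess on the left-out 10%.
xval_success <- function(x, labels, n_pc, n_rep = 20, train_frac = 0.9) {
  mean(replicate(n_rep, {
    train <- sort(sample(nrow(x), size = round(train_frac * nrow(x))))
    test  <- setdiff(seq_len(nrow(x)), train)
    assign_success(x, labels, n_pc, train, test)
  }))
}

# a-score style: keep everyone in the analysis and compare reassignment to
# the true groups (Pt) with reassignment to shuffled groups (Pr).
a_score <- function(x, labels, n_pc, n_sim = 20) {
  all_rows <- seq_len(nrow(x))
  p_true <- assign_success(x, labels, n_pc, all_rows, all_rows)
  p_rand <- mean(replicate(n_sim,
    assign_success(x, sample(labels), n_pc, all_rows, all_rows)))
  p_true - p_rand
}

n_pcs  <- c(2, 5, 10, 20, 40)
xval   <- sapply(n_pcs, function(k) xval_success(x, grp, k))
ascore <- sapply(n_pcs, function(k) a_score(x, grp, k))
print(data.frame(n_pc = n_pcs,
                 xval_success = round(xval, 2),
                 a_score = round(ascore, 2)))
```

In this sketch, the cross-validation column measures success on individuals left out of the fit, while the a-score column is computed on the individuals used to fit the model and is corrected by the shuffled-label baseline.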
On Sat, Sep 20, 2014 at 10:35 PM, Judy (Duffie), Caroline
wrote:
> Dear Dr. Collins,
>
> I was wondering if you could help me understand the difference between
> using a.score.opt. vs. xvalDapc. It seems that both methods are used to
> determine the number of PCs to retain in the DAPC. Why and when would you
> use one method vs. the other?
>
> Thanks for any clarification you can offer. I've been through the papers
> and the tutorials, but am still trying to wrap my mind around these
> procedures.
>
> Caroline
>
> Caroline D. Judy, PhD Candidate & Peter Buck Fellow
> National Museum of Natural History
> Smithsonian Institution
> judyc at si.edu
>
>
>
>