[adegenet-forum] define population level for PCA analysis in adegenet
Zhian Kamvar
zkamvar at gmail.com
Wed Oct 11 16:35:05 CEST 2017
The strata() function may be what you are looking for:
strata(datpop) <- pop_list
setPop(datpop) <- ~clusters
Where clusters is a column name in pop_list
-Zhian
Sent from my iPhone
> Date: Wed, 11 Oct 2017 12:08:05 +0200
> From: Didier Aurelle <aurelle.didier at gmail.com>
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] define population level for PCA analysis in
> adegenet
> Message-ID:
> <CAKJPOZh3KVq6C3ic=ppkNfukaTXcSKjZ0t0-h2U1XBU9-AaNrA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Dear all
>
> I want to perform a PCA analysis in adegenet starting from a genepop file
> without defined populations. I imported the data like this: datapop <-
> read.genepop('tous.gen', ncode=3, quiet = FALSE)
>
> it works, and I can perform a PCA after scaling the data. But I would like
> to plot the results / individuals on the PCA axis according to their
> population of origin using s.class. I have a vcf file with a three lettre
> code for each individual. I imported it in R: pops_list <-
> read.csv('liste_pops.csv', header=FALSE)
>
> but now how can I use it to define population levels in the genind object
> 'datapop'? I tried something likes this: setPop(datapop, formula = NULL)
>
> setPop(datapop) <- pops_list but it doesn't work; even the first line
> doesn't work: I get this message: "Erreur : formula must be a valid formula
> object."
>
> And then how should I use it in s.class?
>
> thanks
>
> Didier
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20171011/d2130c11/attachment-0001.html>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 11 Oct 2017 13:13:20 +0200
> From: Davide Piffer <pifferdavide at gmail.com>
> To: Thibaut Jombart <thibautjombart at gmail.com>
> Cc: "adegenet-forum at lists.r-forge.r-project.org"
> <adegenet-forum at lists.r-forge.r-project.org>
> Subject: Re: [adegenet-forum] Retrieving population allele frequencies
> of SNPs using HGDP file
> Message-ID:
> <CAOq2dy6Npv6BYeV+_tyF9=ZMNjdkBTjpOSA7O0YcF5WviqWO2A at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> I tried using different columns for the population. The Readme file lists
> these but none actually works. I am not sure if the problem is with the
> file or with the function because I am not an expert.
>
> Columns for individual data (HGDP/India/Africa individuals):
> 1. HGDP ID number or HapMap NA number
> 2. numeric code for population
> 3. name of population
> 4. country of origin
>
>
>
> On 11 October 2017 at 11:53, Thibaut Jombart <thibautjombart at gmail.com>
> wrote:
>
>> Hi there,
>>
>> reading populations info hasn't been a problem before (I think) in
>> read.structure. I would double-check which column it is, though I
>> assume you have. If you think there is a problem with the function
>> please post an issue on github with a reproducible example and we'll
>> try to sort it out.
>>
>> Best
>> Thibaut
>>
>> --
>> Dr Thibaut Jombart
>> Lecturer, Department of Infectious Disease Epidemiology, Imperial College
>> London
>> Head of RECON: repidemicsconsortium.org
>> WHO Consultant - outbreak analysis
>> sites.google.com/site/thibautjombart/
>> Twitter: @TeebzR
>> +44(0)20 7594 3658
>>
>>
>>> On 6 October 2017 at 12:08, Davide Piffer <pifferdavide at gmail.com> wrote:
>>> Ok, I think have found the file I need here:
>>> https://rosenberglab.stanford.edu/data/huangEtAl2011/
>> HuangEtAl_2011-GenetEpi.zip
>>> . However, it's in .str format. Following the instructions on the
>> manual, I
>>> tried to assign correct labels based on the Readme file
>>> (https://rosenberglab.stanford.edu/data/huangEtAl2011/
>> huangEtAl2011snpdata_readme)
>>>
>>> Mydata=read.structure("unphased_HGDP+India+Africa_
>> 2810SNPs-regions1to36.stru",
>>> onerowperind = FALSE,col.lab = 8,col.pop = 2,row.marknames = 1,n.ind =
>> 1107,
>>> n.loc = 2810, ask = FALSE)#convert into genind
>>> Mydata_pop=genind2genpop(Mydata)#convert into genpop
>>>
>>> However, I get a file with only 1 population.
>>>
>>> head(Mydata_pop)
>>> /// GENPOP OBJECT /////////
>>>
>>> // 1 population; 2,810 loci; 7,217 alleles; size: 1.5 Mb
>>>
>>> // Basic content
>>> @tab: 1 x 7217 matrix of allele counts
>>> @loc.n.all: number of alleles per locus (range: 2-4)
>>> @loc.fac: locus factor for the 7217 columns of @tab
>>> @all.names: list of allele names for each locus
>>> @ploidy: ploidy of each individual (range: 2-2)
>>> @type: codom
>>> @call: .local(x = x, i = i, j = j, drop = dro
>>>
>>> This is obviously wrong since there are 50+ populations.
>>>
>>> I tried changing col.pop from 2 to 3 but got the same output.
>>>
>>> Am I missing something?
>>>
>>>
>>> All the best,
>>> Davide
>>>
>>>
>>>
>>> On 6 October 2017 at 11:35, Thibaut Jombart <thibautjombart at gmail.com>
>>> wrote:
>>>>
>>>> Hi again,
>>>>
>>>> OK I think I got it. So:
>>>> - I can't remember how I built the eHGDP dataset, but it's an easy task
>>>> - I don't know if the data you're looking for is publicly available
>>>> - assuming you find it, there are two ways to get a genpop object:
>>>> #1: from individual data with pop info: read data in (read.csv /
>>>> read.table), use df2genind (be patient there, that'll take a while),
>>>> then genind2genpop
>>>>
>>>> #2: from population data (allele counts): read data in (read.csv /
>>>> read.table), use the genpop() constructor to make the data a genpop
>>>> object; I think this is documented in the basics tutorial, but
>>>> definitely also in ?genpop
>>>>
>>>> HTH
>>>> Best
>>>> Thibaut
>>>>
>>>> --
>>>> Dr Thibaut Jombart
>>>> Lecturer, Department of Infectious Disease Epidemiology, Imperial
>> College
>>>> London
>>>> Head of RECON: repidemicsconsortium.org
>>>> WHO Consultant - outbreak analysis
>>>> sites.google.com/site/thibautjombart/
>>>> Twitter: @TeebzR
>>>> +44(0)20 7594 3658
>>>>
>>>>
>>>> On 6 October 2017 at 10:24, Davide Piffer <pifferdavide at gmail.com>
>> wrote:
>>>>> Dear Thibaut,
>>>>>
>>>>> thanks for answering my question. I will try to reformulate my
>> question
>>>>> differently, stating the assumptions:
>>>>> 1) I assume that the eHGDP object was made into a genpop object from
>>>>> some
>>>>> raw .txt file, like the HGDP file I linked to in the previous email.
>>>>> 2) I need an object that looks exactly like the eHGDP object, but with
>>>>> SNPs
>>>>> instead of microsatellite alleles.
>>>>> 3) Since it's gonna be a rather complex task, I asked if any of you
>>>>> knows if
>>>>> someone has already done this job before and published it (e.g. as
>>>>> supplementary file).
>>>>> 4) Otherwise, I would like to know how to produce such a file myself,
>>>>> starting from a version of the HGDP file with population information.
>> If
>>>>> this was done for microsatellites, surely it can be done for the SNPs
>> as
>>>>> well? I assume they rely on the same raw HGDP file.
>>>>>
>>>>> Many thanks!
>>>>>
>>>>> Davide
>>>>>
>>>>> On 6 October 2017 at 10:56, Thibaut Jombart <thibautjombart at gmail.com
>>>
>>>>> wrote:
>>>>>>
>>>>>> Hi Davide,
>>>>>>
>>>>>> I am not entirely sure what you need, so sorry if I miss the point.
>>>>>> adegenet cannot make up for absent population information, but you
>> can
>>>>>> try to identify clusters of course, e.g. using find.clusters.
>>>>>>
>>>>>> eHGDP is not a file (at least not in the sense you probably mean),
>> but
>>>>>> a genind object. If the question is how you can get a file looking
>>>>>> like the one you link into a genind object, you probably want to use
>>>>>> something like read.csv and then df2genind. Imports should be
>> detailed
>>>>>> in the basics tutorial:
>>>>>> https://github.com/thibautjombart/adegenet/wiki/Tutorials
>>>>>>
>>>>>> Best
>>>>>> Thibaut
>>>>>>
>>>>>> --
>>>>>> Dr Thibaut Jombart
>>>>>> Lecturer, Department of Infectious Disease Epidemiology, Imperial
>>>>>> College
>>>>>> London
>>>>>> Head of RECON: repidemicsconsortium.org
>>>>>> WHO Consultant - outbreak analysis
>>>>>> sites.google.com/site/thibautjombart/
>>>>>> Twitter: @TeebzR
>>>>>> +44(0)20 7594 3658
>>>>>>
>>>>>>
>>>>>> On 4 October 2017 at 14:08, Davide Piffer <pifferdavide at gmail.com>
>>>>>> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am new to Adegenet. I would like to retrieve population
>> frequencies
>>>>>>> of
>>>>>>> SNPs (using rsID) from the HGDP file
>> "HGDP_FinalReport_Forward.txt" :
>>>>>>> http://www.hagsc.org/hgdp/files.html
>>>>>>>
>>>>>>> However, the file lacks population information. It contains SNPs x
>>>>>>> individuals.
>>>>>>> I need a file structured like the eHGDP (except with SNPs and not
>>>>>>> microsatellite data) file provided with the package, that can be
>>>>>>> easily
>>>>>>> converted into genpop file and then compute the frequencies via
>>>>>>> makefreq.
>>>>>>> Do you know if there is any such file downloadable on the internet?
>>>>>>> i guess there must be a way to produce such a file using ADEGENET
>>>>>>> starting
>>>>>>> from raw data. but my knowledge of this package is not advanced
>>>>>>> enough
>>>>>>> yet.
>>>>>>>
>>>>>>> Best wishes,
>>>>>>>
>>>>>>> Davide
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> adegenet-forum mailing list
>>>>>>> adegenet-forum at lists.r-forge.r-project.org
>>>>>>>
>>>>>>>
>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/
>> listinfo/adegenet-forum
>>>>>
>>>>>
>>>
>>>
>>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20171011/68a69dbb/attachment-0001.html>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 11 Oct 2017 12:19:59 +0100
> From: Thibaut Jombart <thibautjombart at gmail.com>
> To: Didier Aurelle <aurelle.didier at gmail.com>
> Cc: "adegenet-forum at lists.r-forge.r-project.org"
> <adegenet-forum at lists.r-forge.r-project.org>
> Subject: Re: [adegenet-forum] define population level for PCA analysis
> in adegenet
> Message-ID:
> <CANPRA+ocAKh0tBzHTdM0RSKO=Dd2hN9+PpRwDykOM63S3dZGCw at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi Didier,
>
> you are looking for 'pop(x) <- ...'.
>
> Accessors like this one, and other useful things are in the 'basics' tutorial:
> https://github.com/thibautjombart/adegenet/wiki/Tutorials
>
> Best
> Thibaut
>
> --
> Dr Thibaut Jombart
> Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
> Head of RECON: repidemicsconsortium.org
> WHO Consultant - outbreak analysis
> sites.google.com/site/thibautjombart/
> Twitter: @TeebzR
> +44(0)20 7594 3658
>
>
>> On 11 October 2017 at 11:08, Didier Aurelle <aurelle.didier at gmail.com> wrote:
>> Dear all
>>
>> I want to perform a PCA analysis in adegenet starting from a genepop file
>> without defined populations. I imported the data like this: datapop <-
>> read.genepop('tous.gen', ncode=3, quiet = FALSE)
>>
>> it works, and I can perform a PCA after scaling the data. But I would like
>> to plot the results / individuals on the PCA axis according to their
>> population of origin using s.class. I have a vcf file with a three lettre
>> code for each individual. I imported it in R: pops_list <-
>> read.csv('liste_pops.csv', header=FALSE)
>>
>> but now how can I use it to define population levels in the genind object
>> 'datapop'? I tried something likes this: setPop(datapop, formula = NULL)
>>
>> setPop(datapop) <- pops_list but it doesn't work; even the first line
>> doesn't work: I get this message: "Erreur : formula must be a valid formula
>> object."
>>
>> And then how should I use it in s.class?
>>
>> thanks
>>
>> Didier
>>
>>
>> _______________________________________________
>> adegenet-forum mailing list
>> adegenet-forum at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
>
> ------------------------------
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
> End of adegenet-forum Digest, Vol 110, Issue 7
> **********************************************
More information about the adegenet-forum
mailing list