[adegenet-forum] define population level for PCA analysis in adegenet

Zhian Kamvar zkamvar at gmail.com
Wed Oct 11 16:35:05 CEST 2017


The strata() function may be what you are looking for:

strata(datpop) <- pop_list
setPop(datpop) <- ~clusters 

Where clusters is a column name in pop_list

-Zhian

Sent from my iPhone

> Date: Wed, 11 Oct 2017 12:08:05 +0200
> From: Didier Aurelle <aurelle.didier at gmail.com>
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] define population level for PCA analysis in
>    adegenet
> Message-ID:
>    <CAKJPOZh3KVq6C3ic=ppkNfukaTXcSKjZ0t0-h2U1XBU9-AaNrA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Dear all
> 
> I want to perform a PCA analysis in adegenet starting from a genepop file
> without defined populations. I imported the data like this: datapop <-
> read.genepop('tous.gen', ncode=3, quiet = FALSE)
> 
> it works, and I can perform a PCA after scaling the data. But I would like
> to plot the results / individuals on the PCA axis according to their
> population of origin using s.class. I have a vcf file with a three lettre
> code for each individual. I imported it in R: pops_list <-
> read.csv('liste_pops.csv', header=FALSE)
> 
> but now how can I use it to define population levels in the genind object
> 'datapop'? I tried something likes this: setPop(datapop, formula = NULL)
> 
> setPop(datapop) <- pops_list but it doesn't work; even the first line
> doesn't work: I get this message: "Erreur : formula must be a valid formula
> object."
> 
> And then how should I use it in s.class?
> 
> thanks
> 
> Didier
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20171011/d2130c11/attachment-0001.html>
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 11 Oct 2017 13:13:20 +0200
> From: Davide Piffer <pifferdavide at gmail.com>
> To: Thibaut Jombart <thibautjombart at gmail.com>
> Cc: "adegenet-forum at lists.r-forge.r-project.org"
>    <adegenet-forum at lists.r-forge.r-project.org>
> Subject: Re: [adegenet-forum] Retrieving population allele frequencies
>    of SNPs using HGDP file
> Message-ID:
>    <CAOq2dy6Npv6BYeV+_tyF9=ZMNjdkBTjpOSA7O0YcF5WviqWO2A at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi,
> 
> I tried using different columns for the population. The Readme file lists
> these but none actually works. I am not sure if the problem is with the
> file or with the function because I am not an expert.
> 
> Columns for individual data (HGDP/India/Africa individuals):
> 1. HGDP ID number or HapMap NA number
> 2. numeric code for population
> 3. name of population
> 4. country of origin
> 
> 
> 
> On 11 October 2017 at 11:53, Thibaut Jombart <thibautjombart at gmail.com>
> wrote:
> 
>> Hi there,
>> 
>> reading populations info hasn't been a problem before (I think) in
>> read.structure. I would double-check which column it is, though I
>> assume you have. If you think there is a problem with the function
>> please post an issue on github with a reproducible example and we'll
>> try to sort it out.
>> 
>> Best
>> Thibaut
>> 
>> --
>> Dr Thibaut Jombart
>> Lecturer, Department of Infectious Disease Epidemiology, Imperial College
>> London
>> Head of RECON: repidemicsconsortium.org
>> WHO Consultant - outbreak analysis
>> sites.google.com/site/thibautjombart/
>> Twitter: @TeebzR
>> +44(0)20 7594 3658
>> 
>> 
>>> On 6 October 2017 at 12:08, Davide Piffer <pifferdavide at gmail.com> wrote:
>>> Ok, I think have found the file I need here:
>>> https://rosenberglab.stanford.edu/data/huangEtAl2011/
>> HuangEtAl_2011-GenetEpi.zip
>>> . However, it's in .str format. Following the instructions on the
>> manual, I
>>> tried to assign correct labels based on the Readme file
>>> (https://rosenberglab.stanford.edu/data/huangEtAl2011/
>> huangEtAl2011snpdata_readme)
>>> 
>>> Mydata=read.structure("unphased_HGDP+India+Africa_
>> 2810SNPs-regions1to36.stru",
>>> onerowperind = FALSE,col.lab = 8,col.pop = 2,row.marknames = 1,n.ind =
>> 1107,
>>> n.loc = 2810, ask = FALSE)#convert into genind
>>> Mydata_pop=genind2genpop(Mydata)#convert into genpop
>>> 
>>> However, I get a file with only 1 population.
>>> 
>>> head(Mydata_pop)
>>> /// GENPOP OBJECT /////////
>>> 
>>> // 1 population; 2,810 loci; 7,217 alleles; size: 1.5 Mb
>>> 
>>> // Basic content
>>>   @tab:  1 x 7217 matrix of allele counts
>>>   @loc.n.all: number of alleles per locus (range: 2-4)
>>>   @loc.fac: locus factor for the 7217 columns of @tab
>>>   @all.names: list of allele names for each locus
>>>   @ploidy: ploidy of each individual  (range: 2-2)
>>>   @type:  codom
>>>   @call: .local(x = x, i = i, j = j, drop = dro
>>> 
>>> This is obviously wrong since there are 50+ populations.
>>> 
>>> I tried changing col.pop from 2 to 3 but got the same output.
>>> 
>>> Am I missing something?
>>> 
>>> 
>>> All the best,
>>> Davide
>>> 
>>> 
>>> 
>>> On 6 October 2017 at 11:35, Thibaut Jombart <thibautjombart at gmail.com>
>>> wrote:
>>>> 
>>>> Hi again,
>>>> 
>>>> OK I think I got it. So:
>>>> - I can't remember how I built the eHGDP dataset, but it's an easy task
>>>> - I don't know if the data you're looking for is publicly available
>>>> - assuming you find it, there are two ways to get a genpop object:
>>>> #1: from individual data with pop info: read data in (read.csv /
>>>> read.table), use df2genind (be patient there, that'll take a while),
>>>> then genind2genpop
>>>> 
>>>> #2: from population data (allele counts): read data in (read.csv /
>>>> read.table), use the genpop() constructor to make the data a genpop
>>>> object; I think this is documented in the basics tutorial, but
>>>> definitely also in ?genpop
>>>> 
>>>> HTH
>>>> Best
>>>> Thibaut
>>>> 
>>>> --
>>>> Dr Thibaut Jombart
>>>> Lecturer, Department of Infectious Disease Epidemiology, Imperial
>> College
>>>> London
>>>> Head of RECON: repidemicsconsortium.org
>>>> WHO Consultant - outbreak analysis
>>>> sites.google.com/site/thibautjombart/
>>>> Twitter: @TeebzR
>>>> +44(0)20 7594 3658
>>>> 
>>>> 
>>>> On 6 October 2017 at 10:24, Davide Piffer <pifferdavide at gmail.com>
>> wrote:
>>>>> Dear Thibaut,
>>>>> 
>>>>> thanks for answering my question. I will try to reformulate my
>> question
>>>>> differently, stating the assumptions:
>>>>> 1)  I assume that the eHGDP object was made into a genpop object from
>>>>> some
>>>>> raw .txt file, like the HGDP file I linked to in the previous email.
>>>>> 2) I need an object that looks exactly like the eHGDP object, but with
>>>>> SNPs
>>>>> instead of microsatellite alleles.
>>>>> 3) Since it's gonna be a rather complex task, I asked if any of you
>>>>> knows if
>>>>> someone has already done this job before and published it (e.g. as
>>>>> supplementary file).
>>>>> 4) Otherwise, I would like to know how to produce such a file myself,
>>>>> starting from a version of the HGDP file with population information.
>> If
>>>>> this was done for microsatellites, surely it can be done for the SNPs
>> as
>>>>> well? I assume they rely on the same raw HGDP file.
>>>>> 
>>>>> Many thanks!
>>>>> 
>>>>> Davide
>>>>> 
>>>>> On 6 October 2017 at 10:56, Thibaut Jombart <thibautjombart at gmail.com
>>> 
>>>>> wrote:
>>>>>> 
>>>>>> Hi Davide,
>>>>>> 
>>>>>> I am not entirely sure what you need, so sorry if I miss the point.
>>>>>> adegenet cannot make up for absent population information, but you
>> can
>>>>>> try to identify clusters of course, e.g. using find.clusters.
>>>>>> 
>>>>>> eHGDP is not a file (at least not in the sense you probably mean),
>> but
>>>>>> a genind object. If the question is how you can get a file looking
>>>>>> like the one you link into a genind object, you probably want to use
>>>>>> something like read.csv and then df2genind. Imports should be
>> detailed
>>>>>> in the basics tutorial:
>>>>>> https://github.com/thibautjombart/adegenet/wiki/Tutorials
>>>>>> 
>>>>>> Best
>>>>>> Thibaut
>>>>>> 
>>>>>> --
>>>>>> Dr Thibaut Jombart
>>>>>> Lecturer, Department of Infectious Disease Epidemiology, Imperial
>>>>>> College
>>>>>> London
>>>>>> Head of RECON: repidemicsconsortium.org
>>>>>> WHO Consultant - outbreak analysis
>>>>>> sites.google.com/site/thibautjombart/
>>>>>> Twitter: @TeebzR
>>>>>> +44(0)20 7594 3658
>>>>>> 
>>>>>> 
>>>>>> On 4 October 2017 at 14:08, Davide Piffer <pifferdavide at gmail.com>
>>>>>> wrote:
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I am new to Adegenet. I would like to retrieve population
>> frequencies
>>>>>>> of
>>>>>>> SNPs (using rsID) from the HGDP file
>> "HGDP_FinalReport_Forward.txt" :
>>>>>>> http://www.hagsc.org/hgdp/files.html
>>>>>>> 
>>>>>>> However, the file lacks population information. It contains SNPs x
>>>>>>> individuals.
>>>>>>> I need a file structured like the eHGDP (except with SNPs and not
>>>>>>> microsatellite data) file provided with the package, that can be
>>>>>>> easily
>>>>>>> converted into genpop file and then compute the frequencies via
>>>>>>> makefreq.
>>>>>>> Do you know if there is any such file downloadable on the internet?
>>>>>>> i guess there must be a way to produce such a file using ADEGENET
>>>>>>> starting
>>>>>>> from raw data. but my knowledge of this package is not advanced
>>>>>>> enough
>>>>>>> yet.
>>>>>>> 
>>>>>>> Best wishes,
>>>>>>> 
>>>>>>> Davide
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> adegenet-forum mailing list
>>>>>>> adegenet-forum at lists.r-forge.r-project.org
>>>>>>> 
>>>>>>> 
>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/
>> listinfo/adegenet-forum
>>>>> 
>>>>> 
>>> 
>>> 
>> 
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20171011/68a69dbb/attachment-0001.html>
> 
> ------------------------------
> 
> Message: 3
> Date: Wed, 11 Oct 2017 12:19:59 +0100
> From: Thibaut Jombart <thibautjombart at gmail.com>
> To: Didier Aurelle <aurelle.didier at gmail.com>
> Cc: "adegenet-forum at lists.r-forge.r-project.org"
>    <adegenet-forum at lists.r-forge.r-project.org>
> Subject: Re: [adegenet-forum] define population level for PCA analysis
>    in    adegenet
> Message-ID:
>    <CANPRA+ocAKh0tBzHTdM0RSKO=Dd2hN9+PpRwDykOM63S3dZGCw at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> Hi Didier,
> 
> you are looking for 'pop(x) <- ...'.
> 
> Accessors like this one, and other useful things are in the 'basics' tutorial:
> https://github.com/thibautjombart/adegenet/wiki/Tutorials
> 
> Best
> Thibaut
> 
> --
> Dr Thibaut Jombart
> Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
> Head of RECON: repidemicsconsortium.org
> WHO Consultant - outbreak analysis
> sites.google.com/site/thibautjombart/
> Twitter: @TeebzR
> +44(0)20 7594 3658
> 
> 
>> On 11 October 2017 at 11:08, Didier Aurelle <aurelle.didier at gmail.com> wrote:
>> Dear all
>> 
>> I want to perform a PCA analysis in adegenet starting from a genepop file
>> without defined populations. I imported the data like this: datapop <-
>> read.genepop('tous.gen', ncode=3, quiet = FALSE)
>> 
>> it works, and I can perform a PCA after scaling the data. But I would like
>> to plot the results / individuals on the PCA axis according to their
>> population of origin using s.class. I have a vcf file with a three lettre
>> code for each individual. I imported it in R: pops_list <-
>> read.csv('liste_pops.csv', header=FALSE)
>> 
>> but now how can I use it to define population levels in the genind object
>> 'datapop'? I tried something likes this: setPop(datapop, formula = NULL)
>> 
>> setPop(datapop) <- pops_list but it doesn't work; even the first line
>> doesn't work: I get this message: "Erreur : formula must be a valid formula
>> object."
>> 
>> And then how should I use it in s.class?
>> 
>> thanks
>> 
>> Didier
>> 
>> 
>> _______________________________________________
>> adegenet-forum mailing list
>> adegenet-forum at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
> 
> 
> ------------------------------
> 
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
> 
> End of adegenet-forum Digest, Vol 110, Issue 7
> **********************************************


More information about the adegenet-forum mailing list