[adegenet-forum] Retrieving population allele frequencies of SNPs using HGDP file

Davide Piffer pifferdavide at gmail.com
Wed Oct 11 14:47:38 CEST 2017


Yes. It is there.

Best wishes

On 11 Oct 2017 2:17 pm, "Thibaut Jombart" <thibautjombart at gmail.com> wrote:

> Have you tried to open the file manually to check if the population
> information was indeed there in the third column?
>
> Best
> Thibaut
>
> On 11 Oct 2017 12:13, "Davide Piffer" <pifferdavide at gmail.com> wrote:
>
>> Hi,
>>
>> I tried using different columns for the population. The Readme file lists
>> the columns below, but none of them works. I am not sure whether the problem
>> is with the file or with the function, because I am not an expert.
>>
>> Columns for individual data (HGDP/India/Africa individuals):
>> 1. HGDP ID number or HapMap NA number
>> 2. numeric code for population
>> 3. name of population
>> 4. country of origin
>>
>>
>>
>> On 11 October 2017 at 11:53, Thibaut Jombart <thibautjombart at gmail.com>
>> wrote:
>>
>>> Hi there,
>>>
>>> reading population info in read.structure hasn't been a problem before
>>> (I think). I would double-check which column it is, though I assume you
>>> have. If you think there is a problem with the function, please post an
>>> issue on GitHub with a reproducible example and we'll try to sort it out.
>>>
>>> Best
>>> Thibaut
>>>
>>> --
>>> Dr Thibaut Jombart
>>> Lecturer, Department of Infectious Disease Epidemiology, Imperial
>>> College London
>>> Head of RECON: repidemicsconsortium.org
>>> WHO Consultant - outbreak analysis
>>> sites.google.com/site/thibautjombart/
>>> Twitter: @TeebzR
>>> +44(0)20 7594 3658
>>>
>>>
>>> On 6 October 2017 at 12:08, Davide Piffer <pifferdavide at gmail.com>
>>> wrote:
>>> > Ok, I think I have found the file I need here:
>>> > https://rosenberglab.stanford.edu/data/huangEtAl2011/HuangEtAl_2011-GenetEpi.zip
>>> > However, it's in STRUCTURE format (.stru). Following the instructions in
>>> > the manual, I tried to assign the correct labels based on the Readme file
>>> > (https://rosenberglab.stanford.edu/data/huangEtAl2011/huangEtAl2011snpdata_readme)
>>> >
>>> > Mydata <- read.structure("unphased_HGDP+India+Africa_2810SNPs-regions1to36.stru",
>>> >                          onerowperind = FALSE, col.lab = 8, col.pop = 2,
>>> >                          row.marknames = 1, n.ind = 1107, n.loc = 2810,
>>> >                          ask = FALSE)  # convert into genind
>>> > Mydata_pop <- genind2genpop(Mydata)  # convert into genpop
>>> >
>>> > However, the resulting genpop object has only 1 population:
>>> >
>>> > head(Mydata_pop)
>>> > /// GENPOP OBJECT /////////
>>> >
>>> >  // 1 population; 2,810 loci; 7,217 alleles; size: 1.5 Mb
>>> >
>>> >  // Basic content
>>> >    @tab:  1 x 7217 matrix of allele counts
>>> >    @loc.n.all: number of alleles per locus (range: 2-4)
>>> >    @loc.fac: locus factor for the 7217 columns of @tab
>>> >    @all.names: list of allele names for each locus
>>> >    @ploidy: ploidy of each individual  (range: 2-2)
>>> >    @type:  codom
>>> >    @call: .local(x = x, i = i, j = j, drop = dro
>>> >
>>> > This is obviously wrong since there are 50+ populations.
>>> >
>>> > I tried changing col.pop from 2 to 3 but got the same output.
>>> >
>>> > Am I missing something?
>>> >
>>> >
>>> > All the best,
>>> > Davide
>>> >
>>> >
>>> >
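One thing worth ruling out before suspecting `read.structure()`: the single population may come from `head()` rather than from the import. For an S4 object whose class defines no `dim()` or `length()` method (an assumption about the genpop class that we have not verified), base R's `head()` falls back to `x[seq_len(min(6, length(x)))]`; `length()` of such an object is 1, so the call reduces to `x[1]`, i.e. the first population only. The `@call: .local(x = x, i = i, j = j, ...)` line in the output above is consistent with such a subsetting call, and it would also explain why changing `col.pop` made no difference. The class `ToyPop` below is hypothetical and only reproduces the mechanism in base R:

```r
library(methods)

# Hypothetical toy class standing in for a populations x alleles table. It has
# a "[" method (as adegenet's classes do) but no dim() or length() method.
setClass("ToyPop", slots = c(tab = "matrix"))

# Subsetting with x[i] keeps the selected rows, i.e. populations.
setMethod("[", "ToyPop", function(x, i, j, ..., drop = FALSE) {
  x@tab <- x@tab[i, , drop = FALSE]
  x
})

counts <- matrix(1:12, nrow = 3,
                 dimnames = list(c("popA", "popB", "popC"), paste0("a", 1:4)))
toy <- new("ToyPop", tab = counts)

nrow(toy@tab)        # 3 populations in the full object
length(toy)          # 1: an S4 object without a length() method
nrow(head(toy)@tab)  # 1: head() has fallen back to toy[1]
```

On the real objects, adegenet's accessors `nPop(Mydata_pop)` and `table(pop(Mydata))` report the populations without subsetting, which would confirm or rule out this explanation.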
>>> > On 6 October 2017 at 11:35, Thibaut Jombart <thibautjombart at gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi again,
>>> >>
>>> >> OK I think I got it. So:
>>> >> - I can't remember how I built the eHGDP dataset, but it's an easy task
>>> >> - I don't know if the data you're looking for is publicly available
>>> >> - assuming you find it, there are two ways to get a genpop object:
>>> >> #1: from individual data with pop info: read data in (read.csv /
>>> >> read.table), use df2genind (be patient there, that'll take a while),
>>> >> then genind2genpop
>>> >>
>>> >> #2: from population data (allele counts): read data in (read.csv /
>>> >> read.table), use the genpop() constructor to make the data a genpop
>>> >> object; I think this is documented in the basics tutorial, but
>>> >> definitely also in ?genpop
>>> >>
>>> >> HTH
>>> >> Best
>>> >> Thibaut
>>> >>
>>> >>
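Route #1 ends with `genind2genpop()` (population-by-allele counts) and, optionally, `makefreq()` (the same table as relative frequencies). The base-R sketch below only illustrates what those two steps compute for diploid SNP genotypes written as two allele letters; the data frame `geno`, its population names and the helper `allele_counts()` are hypothetical, and the commented adegenet calls at the end are an untested outline rather than a verified recipe.

```r
# Hypothetical individual-level SNP data: one row per individual, a population
# column, and one column per SNP holding a diploid genotype such as "AG".
geno <- data.frame(
  pop = c("Yoruba", "Yoruba", "Han", "Han"),
  rs1 = c("AA", "AG", "GG", "AG"),
  rs2 = c("CT", "CC", "TT", "TT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each genotype into its two allele letters and
# tabulate them by population (rows = populations, columns = alleles).
allele_counts <- function(genotypes, pop) {
  alleles <- strsplit(genotypes, "")     # "AG" -> c("A", "G")
  pop_rep <- rep(pop, lengths(alleles))  # one population label per allele
  table(pop_rep, unlist(alleles))
}

# Counts per population for every SNP (analogous to a genpop table) ...
all_counts <- lapply(geno[-1], allele_counts, pop = geno$pop)
# ... and row-wise relative frequencies (analogous to makefreq()).
all_freqs <- lapply(all_counts, prop.table, margin = 1)

# rs1: Han A = 0.25, G = 0.75; Yoruba A = 0.75, G = 0.25
all_freqs$rs1

# Untested outline of the adegenet route on the same data frame:
# library(adegenet)
# x  <- df2genind(geno[, -1], ncode = 1, pop = geno$pop, ploidy = 2)
# xp <- genind2genpop(x)
# makefreq(xp)
```

With thousands of SNPs the adegenet route is the practical one; the base-R version is only meant to make the counting step explicit.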
>>> >>
>>> >> On 6 October 2017 at 10:24, Davide Piffer <pifferdavide at gmail.com> wrote:
>>> >> > Dear Thibaut,
>>> >> >
>>> >> > thanks for answering my question. I will try to rephrase it, stating
>>> >> > my assumptions:
>>> >> > 1) I assume that the eHGDP object was made into a genpop object from
>>> >> > some raw .txt file, like the HGDP file I linked to in my previous email.
>>> >> > 2) I need an object that looks exactly like the eHGDP object, but with
>>> >> > SNPs instead of microsatellite alleles.
>>> >> > 3) Since it's going to be a rather complex task, I asked whether any of
>>> >> > you knows of someone who has already done this job and published it
>>> >> > (e.g. as a supplementary file).
>>> >> > 4) Otherwise, I would like to know how to produce such a file myself,
>>> >> > starting from a version of the HGDP file with population information.
>>> >> > If this was done for microsatellites, surely it can be done for the
>>> >> > SNPs as well? I assume they rely on the same raw HGDP file.
>>> >> >
>>> >> > Many thanks!
>>> >> >
>>> >> > Davide
>>> >> >
>>> >> > On 6 October 2017 at 10:56, Thibaut Jombart <thibautjombart at gmail.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Hi Davide,
>>> >> >>
>>> >> >> I am not entirely sure what you need, so sorry if I miss the point.
>>> >> >> adegenet cannot make up for absent population information, but you can
>>> >> >> of course try to identify clusters, e.g. using find.clusters.
>>> >> >>
>>> >> >> eHGDP is not a file (at least not in the sense you probably mean), but
>>> >> >> a genind object. If the question is how to turn a file like the one you
>>> >> >> linked into a genind object, you probably want to use something like
>>> >> >> read.csv and then df2genind. Imports should be detailed in the basics
>>> >> >> tutorial:
>>> >> >> https://github.com/thibautjombart/adegenet/wiki/Tutorials
>>> >> >>
>>> >> >> Best
>>> >> >> Thibaut
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On 4 October 2017 at 14:08, Davide Piffer <pifferdavide at gmail.com>
>>> >> >> wrote:
>>> >> >> > Hello,
>>> >> >> >
>>> >> >> > I am new to adegenet. I would like to retrieve population allele
>>> >> >> > frequencies of SNPs (by rsID) from the HGDP file
>>> >> >> > "HGDP_FinalReport_Forward.txt":
>>> >> >> > http://www.hagsc.org/hgdp/files.html
>>> >> >> >
>>> >> >> > However, the file lacks population information: it contains SNPs x
>>> >> >> > individuals.
>>> >> >> > I need a file structured like the eHGDP file provided with the package
>>> >> >> > (except with SNP rather than microsatellite data) that can easily be
>>> >> >> > converted into a genpop object, so that I can then compute the
>>> >> >> > frequencies via makefreq.
>>> >> >> > Do you know if any such file can be downloaded from the internet?
>>> >> >> > I guess there must be a way to produce such a file with adegenet,
>>> >> >> > starting from raw data, but my knowledge of this package is not
>>> >> >> > advanced enough yet.
>>> >> >> >
>>> >> >> > Best wishes,
>>> >> >> >
>>> >> >> > Davide
>>> >> >> >
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>

