[adegenet-forum] Retrieving population allele frequencies of SNPs using HGDP file

Thibaut Jombart thibautjombart at gmail.com
Wed Oct 11 14:52:12 CEST 2017


Okay, great. If you post an issue on GitHub with a minimal reproducible
example (data and code), I'll have a look today. We are finalising a
new release of adegenet as we speak.

Best
Thibaut
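
A minimal reproducible example of the kind requested here could start
from a tiny file in the same layout as the real one, and first confirm
which column holds the population labels (the manual check suggested in
the quoted exchange further down). The sketch below uses base R only.
The individual IDs, population names, marker names and genotypes are
invented for illustration, and the column order (ID, numeric population
code, population name, genotypes) is an assumption taken from the Readme
excerpt quoted in this thread; the real file may use other separators or
carry extra columns.

```r
# Hypothetical toy data: one marker-name row, then two rows per individual.
# Columns: ID, numeric population code, population name, genotypes.
toy_lines <- c(
  "rs1 rs2",
  "ind1 1 PopA A C",
  "ind1 1 PopA G C",
  "ind2 1 PopA A T",
  "ind2 1 PopA A T",
  "ind3 2 PopB G T",
  "ind3 2 PopB G C"
)
toy_file <- tempfile(fileext = ".stru")
writeLines(toy_lines, toy_file)

# Skip the marker-name row and read everything else as plain text columns.
raw <- read.table(toy_file, header = FALSE, skip = 1,
                  colClasses = "character", stringsAsFactors = FALSE)

# Tabulate the candidate population columns. Each individual contributes
# two rows, so the counts are twice the number of individuals.
pop_code_counts <- table(raw[[2]])
pop_name_counts <- table(raw[[3]])
print(pop_code_counts)
print(pop_name_counts)
```

If the tabulated column shows the expected population labels, the same
column index is the natural candidate for col.pop, and the toy file plus
the failing call make a compact example to attach to an issue.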



On 11 October 2017 at 13:47, Davide Piffer <pifferdavide at gmail.com> wrote:
> Yes. It is there.
>
> Best wishes
>
> On 11 Oct 2017 2:17 pm, "Thibaut Jombart" <thibautjombart at gmail.com> wrote:
>>
>> Have you tried to open the file manually to check if the population
>> information was indeed there in the third column?
>>
>> Best
>> Thibaut
>>
>> On 11 Oct 2017 12:13, "Davide Piffer" <pifferdavide at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I tried using different columns for the population. The Readme file lists
>>> these but none actually works. I am not sure if the problem is with the file
>>> or with the function because I am not an expert.
>>>
>>> Columns for individual data (HGDP/India/Africa individuals):
>>> 1. HGDP ID number or HapMap NA number
>>> 2. numeric code for population
>>> 3. name of population
>>> 4. country of origin
>>>
>>>
>>>
>>> On 11 October 2017 at 11:53, Thibaut Jombart <thibautjombart at gmail.com>
>>> wrote:
>>>>
>>>> Hi there,
>>>>
>>>> Reading population info hasn't been a problem before (I think) in
>>>> read.structure. I would double-check which column it is, though I
>>>> assume you have. If you think there is a problem with the function,
>>>> please post an issue on GitHub with a reproducible example and we'll
>>>> try to sort it out.
>>>>
>>>> Best
>>>> Thibaut
>>>>
>>>> --
>>>> Dr Thibaut Jombart
>>>> Lecturer, Department of Infectious Disease Epidemiology, Imperial
>>>> College London
>>>> Head of RECON: repidemicsconsortium.org
>>>> WHO Consultant - outbreak analysis
>>>> sites.google.com/site/thibautjombart/
>>>> Twitter: @TeebzR
>>>> +44(0)20 7594 3658
>>>>
>>>>
>>>> On 6 October 2017 at 12:08, Davide Piffer <pifferdavide at gmail.com>
>>>> wrote:
>>>> > OK, I think I have found the file I need here:
>>>> >
>>>> > https://rosenberglab.stanford.edu/data/huangEtAl2011/HuangEtAl_2011-GenetEpi.zip
>>>> >
>>>> > However, it's in .str format. Following the instructions in the manual, I
>>>> > tried to assign the correct labels based on the Readme file
>>>> > (https://rosenberglab.stanford.edu/data/huangEtAl2011/huangEtAl2011snpdata_readme):
>>>> >
>>>> >
>>>> > Mydata = read.structure(
>>>> >   "unphased_HGDP+India+Africa_2810SNPs-regions1to36.stru",
>>>> >   onerowperind = FALSE, col.lab = 8, col.pop = 2, row.marknames = 1,
>>>> >   n.ind = 1107, n.loc = 2810, ask = FALSE)  # convert into genind
>>>> > Mydata_pop = genind2genpop(Mydata)  # convert into genpop
>>>> >
>>>> > However, I get a file with only 1 population.
>>>> >
>>>> > head(Mydata_pop)
>>>> > /// GENPOP OBJECT /////////
>>>> >
>>>> >  // 1 population; 2,810 loci; 7,217 alleles; size: 1.5 Mb
>>>> >
>>>> >  // Basic content
>>>> >    @tab:  1 x 7217 matrix of allele counts
>>>> >    @loc.n.all: number of alleles per locus (range: 2-4)
>>>> >    @loc.fac: locus factor for the 7217 columns of @tab
>>>> >    @all.names: list of allele names for each locus
>>>> >    @ploidy: ploidy of each individual  (range: 2-2)
>>>> >    @type:  codom
>>>> >    @call: .local(x = x, i = i, j = j, drop = dro
>>>> >
>>>> > This is obviously wrong since there are 50+ populations.
>>>> >
>>>> > I tried changing col.pop from 2 to 3 but got the same output.
>>>> >
>>>> > Am I missing something?
>>>> >
>>>> >
>>>> > All the best,
>>>> > Davide
>>>> >
>>>> >
>>>> >
>>>> > On 6 October 2017 at 11:35, Thibaut Jombart <thibautjombart at gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Hi again,
>>>> >>
>>>> >> OK I think I got it. So:
>>>> >> - I can't remember how I built the eHGDP dataset, but it's an easy task
>>>> >> - I don't know if the data you're looking for is publicly available
>>>> >> - assuming you find it, there are two ways to get a genpop object:
>>>> >> #1: from individual data with pop info: read data in (read.csv /
>>>> >> read.table), use df2genind (be patient there, that'll take a while),
>>>> >> then genind2genpop
>>>> >>
>>>> >> #2: from population data (allele counts): read data in (read.csv /
>>>> >> read.table), use the genpop() constructor to make the data a genpop
>>>> >> object; I think this is documented in the basics tutorial, but
>>>> >> definitely also in ?genpop
>>>> >>
>>>> >> HTH
>>>> >> Best
>>>> >> Thibaut
>>>> >>
>>>> >>
>>>> >> On 6 October 2017 at 10:24, Davide Piffer <pifferdavide at gmail.com>
>>>> >> wrote:
>>>> >> > Dear Thibaut,
>>>> >> >
>>>> >> > thanks for answering my question. I will try to reformulate my question
>>>> >> > differently, stating the assumptions:
>>>> >> > 1) I assume that the eHGDP object was made into a genpop object from some
>>>> >> > raw .txt file, like the HGDP file I linked to in the previous email.
>>>> >> > 2) I need an object that looks exactly like the eHGDP object, but with SNPs
>>>> >> > instead of microsatellite alleles.
>>>> >> > 3) Since it's gonna be a rather complex task, I asked if any of you knows if
>>>> >> > someone has already done this job before and published it (e.g. as
>>>> >> > supplementary file).
>>>> >> > 4) Otherwise, I would like to know how to produce such a file myself,
>>>> >> > starting from a version of the HGDP file with population information. If
>>>> >> > this was done for microsatellites, surely it can be done for the SNPs as
>>>> >> > well? I assume they rely on the same raw HGDP file.
>>>> >> >
>>>> >> > Many thanks!
>>>> >> >
>>>> >> > Davide
>>>> >> >
>>>> >> > On 6 October 2017 at 10:56, Thibaut Jombart
>>>> >> > <thibautjombart at gmail.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> Hi Davide,
>>>> >> >>
>>>> >> >> I am not entirely sure what you need, so sorry if I miss the point.
>>>> >> >> adegenet cannot make up for absent population information, but you can
>>>> >> >> try to identify clusters of course, e.g. using find.clusters.
>>>> >> >>
>>>> >> >> eHGDP is not a file (at least not in the sense you probably mean), but
>>>> >> >> a genind object. If the question is how you can get a file looking
>>>> >> >> like the one you link into a genind object, you probably want to use
>>>> >> >> something like read.csv and then df2genind. Imports should be detailed
>>>> >> >> in the basics tutorial:
>>>> >> >> https://github.com/thibautjombart/adegenet/wiki/Tutorials
>>>> >> >>
>>>> >> >> Best
>>>> >> >> Thibaut
>>>> >> >>
>>>> >> >>
>>>> >> >> On 4 October 2017 at 14:08, Davide Piffer <pifferdavide at gmail.com>
>>>> >> >> wrote:
>>>> >> >> > Hello,
>>>> >> >> >
>>>> >> >> > I am new to adegenet. I would like to retrieve population frequencies of
>>>> >> >> > SNPs (using rsID) from the HGDP file "HGDP_FinalReport_Forward.txt":
>>>> >> >> > http://www.hagsc.org/hgdp/files.html
>>>> >> >> >
>>>> >> >> > However, the file lacks population information. It contains SNPs x
>>>> >> >> > individuals.
>>>> >> >> > I need a file structured like the eHGDP file provided with the package
>>>> >> >> > (except with SNPs and not microsatellite data), one that can be easily
>>>> >> >> > converted into a genpop object so that I can then compute the
>>>> >> >> > frequencies via makefreq.
>>>> >> >> > Do you know if there is any such file downloadable on the internet?
>>>> >> >> > I guess there must be a way to produce such a file using adegenet
>>>> >> >> > starting from raw data, but my knowledge of this package is not
>>>> >> >> > advanced enough yet.
>>>> >> >> >
>>>> >> >> > Best wishes,
>>>> >> >> >
>>>> >> >> > Davide
>>>> >> >> >
>>>> >> >> > _______________________________________________
>>>> >> >> > adegenet-forum mailing list
>>>> >> >> > adegenet-forum at lists.r-forge.r-project.org
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>>>> >> >
>>>> >> >
>>>> >
>>>> >
>>>
>>>
>
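
For reference, route #1 from the thread above (individual data with
population info, then df2genind, then genind2genpop, then makefreq) can
be sketched on a toy data set as follows. This is a minimal illustration
under stated assumptions, not the procedure used to build eHGDP: the
individual names, population labels, marker names and genotypes are all
invented, and the genotypes are assumed to be diploid and coded as
"allele/allele" strings.

```r
library(adegenet)

# Hypothetical individual-level SNP genotypes, one row per individual.
geno <- data.frame(
  rs1 = c("A/A", "A/G", "G/G", "A/G"),
  rs2 = c("C/C", "C/T", "T/T", "T/T"),
  row.names = c("ind1", "ind2", "ind3", "ind4"),
  stringsAsFactors = FALSE
)

# Hypothetical population label for each individual, in row order.
pops <- c("PopA", "PopA", "PopB", "PopB")

# Individual genotypes plus population labels -> genind object.
gi <- df2genind(geno, sep = "/", pop = pops, ploidy = 2)

# Aggregate allele counts per population -> genpop object.
gp <- genind2genpop(gi, quiet = TRUE)

# Per-population allele frequencies (rows: populations, columns: alleles).
freq <- makefreq(gp, quiet = TRUE)
print(nPop(gp))
print(freq)
```

If the resulting genpop object reports a single population, the pop
slot of the genind object is the first thing to inspect, for instance
with pop(gi) or popNames(gi), before looking at the conversion step.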

