[adegenet-forum] Retrieving population allele frequencies of SNPs using HGDP file

Thibaut Jombart thibautjombart at gmail.com
Wed Oct 11 14:16:59 CEST 2017


Have you tried to open the file manually to check if the population
information was indeed there in the third column?
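
For anyone following along, here is a minimal sketch of such a manual check in base R. It is only an illustration under stated assumptions: the small file written below is a made-up stand-in (the IDs, population names and genotype codes are invented, and the real file has more metadata columns), so only the inspection steps carry over.

```r
# Hypothetical stand-in for the real .stru file: one marker-name row, then
# two rows per individual (as with onerowperind = FALSE). All values invented.
stru_path <- tempfile(fileext = ".stru")
writeLines(c(
  "rs0001 rs0002 rs0003",
  "IND001 1 PopA CountryX 1 2 1",
  "IND001 1 PopA CountryX 1 2 2",
  "IND002 2 PopB CountryY 2 2 1",
  "IND002 2 PopB CountryY 1 2 1"
), stru_path)

# Look at the raw lines first: is the population name where we expect it?
raw_lines <- readLines(stru_path)
print(head(raw_lines, 3))

# Split each data row on whitespace (skipping the marker-name row) and
# check that every row has the same number of fields.
fields <- strsplit(trimws(raw_lines[-1]), "[[:space:]]+")
n_fields <- lengths(fields)
print(table(n_fields))

# Tabulate the third field; several distinct values should show up if the
# population names really sit in that column.
pop_col <- vapply(fields, function(x) x[3], character(1))
pop_counts <- table(pop_col)
print(pop_counts)
```

On the real file, replace stru_path with the path to the downloaded .stru file. If the third field holds a single value, or the number of fields varies between rows, the layout probably differs from what the Readme describes.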

Best
Thibaut

On 11 Oct 2017 12:13, "Davide Piffer" <pifferdavide at gmail.com> wrote:

> Hi,
>
> I tried using different columns for the population. The Readme file lists
> the columns below, but none of them actually works. I am not sure if the
> problem is with the file or with the function because I am not an expert.
>
> Columns for individual data (HGDP/India/Africa individuals):
> 1. HGDP ID number or HapMap NA number
> 2. numeric code for population
> 3. name of population
> 4. country of origin
>
>
>
> On 11 October 2017 at 11:53, Thibaut Jombart <thibautjombart at gmail.com>
> wrote:
>
>> Hi there,
>>
>> reading population info hasn't been a problem before (I think) in
>> read.structure. I would double-check which column it is, though I
>> assume you have. If you think there is a problem with the function,
>> please post an issue on GitHub with a reproducible example and we'll
>> try to sort it out.
>>
>> Best
>> Thibaut
>>
>> --
>> Dr Thibaut Jombart
>> Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
>> Head of RECON: repidemicsconsortium.org
>> WHO Consultant - outbreak analysis
>> sites.google.com/site/thibautjombart/
>> Twitter: @TeebzR
>> +44(0)20 7594 3658
>>
>>
>> On 6 October 2017 at 12:08, Davide Piffer <pifferdavide at gmail.com> wrote:
>> > Ok, I think I have found the file I need here:
>> > https://rosenberglab.stanford.edu/data/huangEtAl2011/HuangEtAl_2011-GenetEpi.zip
>> > . However, it's in .str format. Following the instructions in the manual, I
>> > tried to assign correct labels based on the Readme file
>> > (https://rosenberglab.stanford.edu/data/huangEtAl2011/huangEtAl2011snpdata_readme)
>> >
>> > Mydata <- read.structure(
>> >   "unphased_HGDP+India+Africa_2810SNPs-regions1to36.stru",
>> >   onerowperind = FALSE, col.lab = 8, col.pop = 2, row.marknames = 1,
>> >   n.ind = 1107, n.loc = 2810, ask = FALSE)  # convert into genind
>> > Mydata_pop <- genind2genpop(Mydata)  # convert into genpop
>> >
>> > However, I get a file with only 1 population.
>> >
>> > head(Mydata_pop)
>> > /// GENPOP OBJECT /////////
>> >
>> >  // 1 population; 2,810 loci; 7,217 alleles; size: 1.5 Mb
>> >
>> >  // Basic content
>> >    @tab:  1 x 7217 matrix of allele counts
>> >    @loc.n.all: number of alleles per locus (range: 2-4)
>> >    @loc.fac: locus factor for the 7217 columns of @tab
>> >    @all.names: list of allele names for each locus
>> >    @ploidy: ploidy of each individual  (range: 2-2)
>> >    @type:  codom
>> >    @call: .local(x = x, i = i, j = j, drop = dro
>> >
>> > This is obviously wrong since there are 50+ populations.
>> >
>> > I tried changing col.pop from 2 to 3 but got the same output.
>> >
>> > Am I missing something?
>> >
>> >
>> > All the best,
>> > Davide
>> >
>> >
>> >
>> > On 6 October 2017 at 11:35, Thibaut Jombart <thibautjombart at gmail.com>
>> > wrote:
>> >>
>> >> Hi again,
>> >>
>> >> OK I think I got it. So:
>> >> - I can't remember how I built the eHGDP dataset, but it's an easy task
>> >> - I don't know if the data you're looking for is publicly available
>> >> - assuming you find it, there are two ways to get a genpop object:
>> >> #1: from individual data with pop info: read data in (read.csv /
>> >> read.table), use df2genind (be patient there, that'll take a while),
>> >> then genind2genpop
>> >>
>> >> #2: from population data (allele counts): read data in (read.csv /
>> >> read.table), use the genpop() constructor to make the data a genpop
>> >> object; I think this is documented in the basics tutorial, but
>> >> definitely also in ?genpop
>> >>
>> >> HTH
>> >> Best
>> >> Thibaut
>> >>
>> >> On 6 October 2017 at 10:24, Davide Piffer <pifferdavide at gmail.com> wrote:
>> >> > Dear Thibaut,
>> >> >
>> >> > thanks for answering my question. I will try to reformulate my question
>> >> > differently, stating the assumptions:
>> >> > 1) I assume that the eHGDP object was made into a genpop object from some
>> >> > raw .txt file, like the HGDP file I linked to in the previous email.
>> >> > 2) I need an object that looks exactly like the eHGDP object, but with SNPs
>> >> > instead of microsatellite alleles.
>> >> > 3) Since it's gonna be a rather complex task, I asked if any of you knows if
>> >> > someone has already done this job before and published it (e.g. as
>> >> > supplementary file).
>> >> > 4) Otherwise, I would like to know how to produce such a file myself,
>> >> > starting from a version of the HGDP file with population information. If
>> >> > this was done for microsatellites, surely it can be done for the SNPs as
>> >> > well? I assume they rely on the same raw HGDP file.
>> >> >
>> >> > Many thanks!
>> >> >
>> >> > Davide
>> >> >
>> >> > On 6 October 2017 at 10:56, Thibaut Jombart <thibautjombart at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi Davide,
>> >> >>
>> >> >> I am not entirely sure what you need, so sorry if I miss the point.
>> >> >> adegenet cannot make up for absent population information, but you can
>> >> >> try to identify clusters of course, e.g. using find.clusters.
>> >> >>
>> >> >> eHGDP is not a file (at least not in the sense you probably mean), but
>> >> >> a genind object. If the question is how you can get a file looking
>> >> >> like the one you link into a genind object, you probably want to use
>> >> >> something like read.csv and then df2genind. Imports should be detailed
>> >> >> in the basics tutorial:
>> >> >> https://github.com/thibautjombart/adegenet/wiki/Tutorials
>> >> >>
>> >> >> Best
>> >> >> Thibaut
>> >> >>
>> >> >> On 4 October 2017 at 14:08, Davide Piffer <pifferdavide at gmail.com>
>> >> >> wrote:
>> >> >> > Hello,
>> >> >> >
>> >> >> > I am new to adegenet. I would like to retrieve population frequencies of
>> >> >> > SNPs (using rsID) from the HGDP file "HGDP_FinalReport_Forward.txt":
>> >> >> > http://www.hagsc.org/hgdp/files.html
>> >> >> >
>> >> >> > However, the file lacks population information. It contains SNPs x
>> >> >> > individuals.
>> >> >> > I need a file structured like the eHGDP file provided with the package
>> >> >> > (except with SNPs and not microsatellite data), that can be easily
>> >> >> > converted into a genpop file and then used to compute the frequencies via
>> >> >> > makefreq.
>> >> >> > Do you know if there is any such file downloadable on the internet?
>> >> >> > I guess there must be a way to produce such a file using adegenet starting
>> >> >> > from raw data, but my knowledge of this package is not advanced enough
>> >> >> > yet.
>> >> >> >
>> >> >> > Best wishes,
>> >> >> >
>> >> >> > Davide
>> >> >> >
>> >> >> > _______________________________________________
>> >> >> > adegenet-forum mailing list
>> >> >> > adegenet-forum at lists.r-forge.r-project.org
>> >> >> >
>> >> >> >
>> >> >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum