[adegenet-forum] Retrieving population allele frequencies of SNPs using HGDP file

Thibaut Jombart thibautjombart at gmail.com
Wed Oct 11 11:53:25 CEST 2017


Hi there,

reading populations info hasn't been a problem before (I think) in
read.structure. I would double-check which column it is, though I
assume you have. If you think there is a problem with the function
please post an issue on github with a reproducible example and we'll
try to sort it out.

Best
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
sites.google.com/site/thibautjombart/
Twitter: @TeebzR
+44(0)20 7594 3658


On 6 October 2017 at 12:08, Davide Piffer <pifferdavide at gmail.com> wrote:
> Ok, I think have found the file I need here:
> https://rosenberglab.stanford.edu/data/huangEtAl2011/HuangEtAl_2011-GenetEpi.zip
> . However, it's in .str format. Following the instructions on the manual, I
> tried to assign correct labels based on the Readme file
> (https://rosenberglab.stanford.edu/data/huangEtAl2011/huangEtAl2011snpdata_readme)
>
> Mydata=read.structure("unphased_HGDP+India+Africa_2810SNPs-regions1to36.stru",
> onerowperind = FALSE,col.lab = 8,col.pop = 2,row.marknames = 1,n.ind = 1107,
> n.loc = 2810, ask = FALSE)#convert into genind
> Mydata_pop=genind2genpop(Mydata)#convert into genpop
>
> However, I get a file with only 1 population.
>
> head(Mydata_pop)
> /// GENPOP OBJECT /////////
>
>  // 1 population; 2,810 loci; 7,217 alleles; size: 1.5 Mb
>
>  // Basic content
>    @tab:  1 x 7217 matrix of allele counts
>    @loc.n.all: number of alleles per locus (range: 2-4)
>    @loc.fac: locus factor for the 7217 columns of @tab
>    @all.names: list of allele names for each locus
>    @ploidy: ploidy of each individual  (range: 2-2)
>    @type:  codom
>    @call: .local(x = x, i = i, j = j, drop = dro
>
> This is obviously wrong since there are 50+ populations.
>
> I tried changing col.pop from 2 to 3 but got the same output.
>
> Am I missing something?
>
>
> All the best,
> Davide
>
>
>
> On 6 October 2017 at 11:35, Thibaut Jombart <thibautjombart at gmail.com>
> wrote:
>>
>> Hi again,
>>
>> OK I think I got it. So:
>> - I can't remember how I built the eHGDP dataset, but it's an easy task
>> - I don't know if the data you're looking for is publicly available
>> - assuming you find it, there are two ways to get a genpop object:
>> #1: from individual data with pop info: read data in (read.csv /
>> read.table), use df2genind (be patient there, that'll take a while),
>> then genind2genpop
>>
>> #2: from population data (allele counts): read data in (read.csv /
>> read.table), use the genpop() constructor to make the data a genpop
>> object; I think this is documented in the basics tutorial, but
>> definitely also in ?genpop
>>
>> HTH
>> Best
>> Thibaut
>>
>> --
>> Dr Thibaut Jombart
>> Lecturer, Department of Infectious Disease Epidemiology, Imperial College
>> London
>> Head of RECON: repidemicsconsortium.org
>> WHO Consultant - outbreak analysis
>> sites.google.com/site/thibautjombart/
>> Twitter: @TeebzR
>> +44(0)20 7594 3658
>>
>>
>> On 6 October 2017 at 10:24, Davide Piffer <pifferdavide at gmail.com> wrote:
>> > Dear Thibaut,
>> >
>> > thanks for answering my question. I will try to reformulate my question
>> > differently, stating the assumptions:
>> > 1)  I assume that the eHGDP object was made into a genpop object from
>> > some
>> > raw .txt file, like the HGDP file I linked to in the previous email.
>> > 2) I need an object that looks exactly like the eHGDP object, but with
>> > SNPs
>> > instead of microsatellite alleles.
>> > 3) Since it's gonna be a rather complex task, I asked if any of you
>> > knows if
>> > someone has already done this job before and published it (e.g. as
>> > supplementary file).
>> > 4) Otherwise, I would like to know how to produce such a file myself,
>> > starting from a version of the HGDP file with population information. If
>> > this was done for microsatellites, surely it can be done for the SNPs as
>> > well? I assume they rely on the same raw HGDP file.
>> >
>> > Many thanks!
>> >
>> > Davide
>> >
>> > On 6 October 2017 at 10:56, Thibaut Jombart <thibautjombart at gmail.com>
>> > wrote:
>> >>
>> >> Hi Davide,
>> >>
>> >> I am not entirely sure what you need, so sorry if I miss the point.
>> >> adegenet cannot make up for absent population information, but you can
>> >> try to identify clusters of course, e.g. using find.clusters.
>> >>
>> >> eHGDP is not a file (at least not in the sense you probably mean), but
>> >> a genind object. If the question is how you can get a file looking
>> >> like the one you link into a genind object, you probably want to use
>> >> something like read.csv and then df2genind. Imports should be detailed
>> >> in the basics tutorial:
>> >> https://github.com/thibautjombart/adegenet/wiki/Tutorials
>> >>
>> >> Best
>> >> Thibaut
>> >>
>> >> --
>> >> Dr Thibaut Jombart
>> >> Lecturer, Department of Infectious Disease Epidemiology, Imperial
>> >> College
>> >> London
>> >> Head of RECON: repidemicsconsortium.org
>> >> WHO Consultant - outbreak analysis
>> >> sites.google.com/site/thibautjombart/
>> >> Twitter: @TeebzR
>> >> +44(0)20 7594 3658
>> >>
>> >>
>> >> On 4 October 2017 at 14:08, Davide Piffer <pifferdavide at gmail.com>
>> >> wrote:
>> >> > Hello,
>> >> >
>> >> > I am new to Adegenet. I would like to retrieve population frequencies
>> >> > of
>> >> > SNPs (using rsID) from the HGDP file "HGDP_FinalReport_Forward.txt" :
>> >> > http://www.hagsc.org/hgdp/files.html
>> >> >
>> >> > However, the file lacks population information. It contains SNPs x
>> >> > individuals.
>> >> > I need a file structured like the eHGDP (except with SNPs and not
>> >> > microsatellite data) file provided with the package, that can be
>> >> > easily
>> >> > converted into genpop file and then compute the frequencies via
>> >> > makefreq.
>> >> > Do you know if there is any such file downloadable on the internet?
>> >> > i guess there must be a way to produce such a file using ADEGENET
>> >> > starting
>> >> > from raw data. but my knowledge of this package is not advanced
>> >> > enough
>> >> > yet.
>> >> >
>> >> > Best wishes,
>> >> >
>> >> > Davide
>> >> >
>> >> > _______________________________________________
>> >> > adegenet-forum mailing list
>> >> > adegenet-forum at lists.r-forge.r-project.org
>> >> >
>> >> >
>> >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>> >
>> >
>
>


More information about the adegenet-forum mailing list