[adegenet-forum] Retrieving population allele frequencies of SNPs using HGDP file
thibautjombart at gmail.com
Fri Oct 6 11:35:18 CEST 2017
OK, I think I got it. So:
- I can't remember how I built the eHGDP dataset, but it's an easy task.
- I don't know whether the data you're looking for are publicly available.
- Assuming you find them, there are two ways to get a genpop object:
#1: from individual data with pop info: read the data in (read.csv /
read.table), use df2genind (be patient there, that'll take a while),
then convert the resulting genind object with genind2genpop;
#2: from population data (allele counts): read the data in (read.csv /
read.table), then use the genpop() constructor to make the data a genpop
object; I think this is documented in the basics tutorial, but
definitely also in ?genpop.
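
The two routes above can be sketched in R as follows. This is only a minimal
sketch on made-up toy data, not the way eHGDP was built: the SNP names
(rs0001, rs0002), the individual labels and the population labels are all
hypothetical. It assumes adegenet >= 2.0, where makefreq() returns a matrix
for genpop objects. The "locus.allele" column naming used in route #2 is an
assumption about what the genpop() constructor expects; check ?genpop.

```r
library(adegenet)

## Hypothetical toy data: 4 diploid individuals (rows) x 2 SNPs (columns),
## genotypes coded as two characters; missing calls recoded to NA beforehand.
geno <- data.frame(
  rs0001 = c("AA", "AG", "GG", "AG"),
  rs0002 = c("CC", "CT", "TT", NA),
  row.names = c("ind1", "ind2", "ind3", "ind4"),
  stringsAsFactors = FALSE
)
## Hypothetical population labels, one per individual, in row order.
pops <- factor(c("popA", "popA", "popB", "popB"))

## Route #1: individual genotypes -> genind -> genpop -> frequencies.
## ncode = 1 means each allele is coded by one character.
x    <- df2genind(geno, ncode = 1, ploidy = 2, pop = pops)
gp   <- genind2genpop(x, quiet = TRUE)   # allele counts per population
freq <- makefreq(gp, quiet = TRUE)       # allele frequencies per population

## Route #2: population x allele count table -> genpop -> frequencies.
## Columns are named "locus.allele" (assumed convention, see lead-in).
counts <- matrix(
  c(3L, 1L, 3L, 1L,
    1L, 3L, 0L, 2L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("popA", "popB"),
                  c("rs0001.A", "rs0001.G", "rs0002.C", "rs0002.T"))
)
gp2   <- genpop(counts, ploidy = 2)
freq2 <- makefreq(gp2, quiet = TRUE)

## Frequencies for one SNP, selected by the rsID prefix of the column names.
freq[, grep("^rs0001\\.", colnames(freq)), drop = FALSE]
```

A file laid out as SNPs x individuals would first need transposing (e.g.
with t()) so that individuals are in rows, and the population labels would
have to come from a separate source matched to the individual IDs.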
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
+44(0)20 7594 3658
On 6 October 2017 at 10:24, Davide Piffer <pifferdavide at gmail.com> wrote:
> Dear Thibaut,
> thanks for answering my question. I will try to reformulate my question
> differently, stating the assumptions:
> 1) I assume that the eHGDP object was made into a genpop object from some
> raw .txt file, like the HGDP file I linked to in the previous email.
> 2) I need an object that looks exactly like the eHGDP object, but with SNPs
> instead of microsatellite alleles.
> 3) Since it's going to be a rather complex task, I asked whether any of you
> knows of someone who has already done this job and published it (e.g. as
> a supplementary file).
> 4) Otherwise, I would like to know how to produce such a file myself,
> starting from a version of the HGDP file with population information. If
> this was done for microsatellites, surely it can be done for the SNPs as
> well? I assume they rely on the same raw HGDP file.
> Many thanks!
> On 6 October 2017 at 10:56, Thibaut Jombart <thibautjombart at gmail.com> wrote:
>> Hi Davide,
>> I am not entirely sure what you need, so sorry if I miss the point.
>> adegenet cannot make up for absent population information, but you can
>> try to identify clusters of course, e.g. using find.clusters.
>> eHGDP is not a file (at least not in the sense you probably mean), but
>> a genind object. If the question is how you can get a file looking
>> like the one you link into a genind object, you probably want to use
>> something like read.csv and then df2genind. Imports should be detailed
>> in the basics tutorial:
>> On 4 October 2017 at 14:08, Davide Piffer <pifferdavide at gmail.com> wrote:
>> > Hello,
>> > I am new to Adegenet. I would like to retrieve population frequencies of
>> > SNPs (using rsID) from the HGDP file "HGDP_FinalReport_Forward.txt" :
>> > http://www.hagsc.org/hgdp/files.html
>> > However, the file lacks population information. It contains SNPs x
>> > individuals.
>> > I need a file structured like the eHGDP dataset provided with the
>> > package (except with SNPs rather than microsatellite data), which can
>> > be easily converted into a genpop object and then used to compute the
>> > frequencies via makefreq.
>> > Do you know if there is any such file downloadable on the internet?
>> > I guess there must be a way to produce such a file using adegenet,
>> > starting from raw data, but my knowledge of this package is not
>> > advanced enough yet.
>> > Best wishes,
>> > Davide