[adegenet-forum] Retrieving population allele frequencies of SNPs using HGDP file

Davide Piffer pifferdavide at gmail.com
Fri Oct 6 13:08:32 CEST 2017


OK, I think I have found the file I need here:
https://rosenberglab.stanford.edu/data/huangEtAl2011/HuangEtAl_2011-GenetEpi.zip
However, it is in STRUCTURE (.stru) format. Following the instructions in the
manual, I tried to assign the correct labels based on the Readme file:
https://rosenberglab.stanford.edu/data/huangEtAl2011/huangEtAl2011snpdata_readme

Mydata <- read.structure("unphased_HGDP+India+Africa_2810SNPs-regions1to36.stru",
                         onerowperind = FALSE, col.lab = 8, col.pop = 2,
                         row.marknames = 1, n.ind = 1107, n.loc = 2810,
                         ask = FALSE)  # convert into genind
Mydata_pop <- genind2genpop(Mydata)  # convert into genpop

However, I get a genpop object with only one population.

head(Mydata_pop)
/// GENPOP OBJECT /////////

 // 1 population; 2,810 loci; 7,217 alleles; size: 1.5 Mb

 // Basic content
   @tab:  1 x 7217 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-4)
   @loc.fac: locus factor for the 7217 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 2-2)
   @type:  codom
   @call: .local(x = x, i = i, j = j, drop = dro

This is obviously wrong since there are 50+ populations.

I tried changing col.pop from 2 to 3 but got the same output.

Am I missing something?


All the best,
Davide



On 6 October 2017 at 11:35, Thibaut Jombart <thibautjombart at gmail.com>
wrote:

> Hi again,
>
> OK I think I got it. So:
> - I can't remember how I built the eHGDP dataset, but it's an easy task
> - I don't know if the data you're looking for is publicly available
> - assuming you find it, there are two ways to get a genpop object:
> #1: from individual data with pop info: read data in (read.csv /
> read.table), use df2genind (be patient there, that'll take a while),
> then genind2genpop
>
> #2: from population data (allele counts): read data in (read.csv /
> read.table), use the genpop() constructor to make the data a genpop
> object; I think this is documented in the basics tutorial, but
> definitely also in ?genpop
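
Route #1 above (individual data with population info, then df2genind,
genind2genpop and makefreq) can be sketched on toy data as follows. This is
only a minimal illustration: the data frame, the SNP names and the population
labels are invented here, and the "/" genotype separator is an assumption
about how the genotypes are coded, not something taken from the HGDP files.

```r
library(adegenet)

# Hypothetical toy data: 4 diploid individuals x 2 SNPs, genotypes coded "A/G"
geno <- data.frame(
  rs0001 = c("A/A", "A/G", "G/G", "A/G"),
  rs0002 = c("C/T", "C/C", "T/T", "C/T"),
  row.names = c("ind1", "ind2", "ind3", "ind4"),
  stringsAsFactors = FALSE
)

# Hypothetical population labels, one per individual (same order as the rows)
pops <- factor(c("PopA", "PopA", "PopB", "PopB"))

# Individual genotypes -> genind, with population info attached
gi <- df2genind(geno, sep = "/", ploidy = 2, pop = pops)

# genind -> genpop (allele counts per population)
gp <- genind2genpop(gi, quiet = TRUE)

# Allele frequencies per population (rows = populations, columns = alleles)
freq <- makefreq(gp, quiet = TRUE)
print(freq)
```

With real data, the population vector would have to come from a separate
sample annotation file, matched to the individuals by their IDs.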
>
> HTH
> Best
> Thibaut
>
> --
> Dr Thibaut Jombart
> Lecturer, Department of Infectious Disease Epidemiology, Imperial College
> London
> Head of RECON: repidemicsconsortium.org
> WHO Consultant - outbreak analysis
> sites.google.com/site/thibautjombart/
> Twitter: @TeebzR
> +44(0)20 7594 3658
>
>
> On 6 October 2017 at 10:24, Davide Piffer <pifferdavide at gmail.com> wrote:
> > Dear Thibaut,
> >
> > thanks for answering my question. I will try to reformulate my question
> > differently, stating the assumptions:
> > 1) I assume that the eHGDP object was made into a genpop object from some
> > raw .txt file, like the HGDP file I linked to in the previous email.
> > 2) I need an object that looks exactly like the eHGDP object, but with SNPs
> > instead of microsatellite alleles.
> > 3) Since it's gonna be a rather complex task, I asked if any of you knows if
> > someone has already done this job before and published it (e.g. as
> > supplementary file).
> > 4) Otherwise, I would like to know how to produce such a file myself,
> > starting from a version of the HGDP file with population information. If
> > this was done for microsatellites, surely it can be done for the SNPs as
> > well? I assume they rely on the same raw HGDP file.
> >
> > Many thanks!
> >
> > Davide
> >
> > On 6 October 2017 at 10:56, Thibaut Jombart <thibautjombart at gmail.com>
> > wrote:
> >>
> >> Hi Davide,
> >>
> >> I am not entirely sure what you need, so sorry if I miss the point.
> >> adegenet cannot make up for absent population information, but you can
> >> try to identify clusters of course, e.g. using find.clusters.
> >>
> >> eHGDP is not a file (at least not in the sense you probably mean), but
> >> a genind object. If the question is how you can get a file looking
> >> like the one you link into a genind object, you probably want to use
> >> something like read.csv and then df2genind. Imports should be detailed
> >> in the basics tutorial:
> >> https://github.com/thibautjombart/adegenet/wiki/Tutorials
> >>
> >> Best
> >> Thibaut
> >>
> >> --
> >> Dr Thibaut Jombart
> >> Lecturer, Department of Infectious Disease Epidemiology, Imperial College
> >> London
> >> Head of RECON: repidemicsconsortium.org
> >> WHO Consultant - outbreak analysis
> >> sites.google.com/site/thibautjombart/
> >> Twitter: @TeebzR
> >> +44(0)20 7594 3658
> >>
> >>
> >> On 4 October 2017 at 14:08, Davide Piffer <pifferdavide at gmail.com> wrote:
> >> > Hello,
> >> >
> >> > I am new to Adegenet. I would like to retrieve population frequencies of
> >> > SNPs (using rsID) from the HGDP file "HGDP_FinalReport_Forward.txt":
> >> > http://www.hagsc.org/hgdp/files.html
> >> >
> >> > However, the file lacks population information. It contains SNPs x
> >> > individuals.
> >> > I need a file structured like the eHGDP file provided with the package
> >> > (except with SNPs and not microsatellite data), that can be easily
> >> > converted into a genpop file and then used to compute the frequencies via
> >> > makefreq.
> >> > Do you know if there is any such file downloadable on the internet?
> >> > I guess there must be a way to produce such a file using ADEGENET
> >> > starting from raw data, but my knowledge of this package is not advanced
> >> > enough yet.
> >> >
> >> > Best wishes,
> >> >
> >> > Davide
> >> >