[adegenet-forum] Retrieving population allele frequencies of SNPs using HGDP file

Davide Piffer pifferdavide at gmail.com
Wed Oct 11 13:13:20 CEST 2017


Hi,

I tried using different columns for the population. The Readme file lists
these columns, but none of them actually works. I am not sure whether the
problem is with the file or with the function, because I am not an expert.

Columns for individual data (HGDP/India/Africa individuals):
1. HGDP ID number or HapMap NA number
2. numeric code for population
3. name of population
4. country of origin



On 11 October 2017 at 11:53, Thibaut Jombart <thibautjombart at gmail.com>
wrote:

> Hi there,
>
> reading populations info hasn't been a problem before (I think) in
> read.structure. I would double-check which column it is, though I
> assume you have. If you think there is a problem with the function
> please post an issue on github with a reproducible example and we'll
> try to sort it out.
>
> Best
> Thibaut
>
> --
> Dr Thibaut Jombart
> Lecturer, Department of Infectious Disease Epidemiology, Imperial College
> London
> Head of RECON: repidemicsconsortium.org
> WHO Consultant - outbreak analysis
> sites.google.com/site/thibautjombart/
> Twitter: @TeebzR
> +44(0)20 7594 3658
>
>
> On 6 October 2017 at 12:08, Davide Piffer <pifferdavide at gmail.com> wrote:
> > Ok, I think I have found the file I need here:
> > https://rosenberglab.stanford.edu/data/huangEtAl2011/HuangEtAl_2011-GenetEpi.zip
> > However, it's in STRUCTURE (.stru) format. Following the instructions in
> > the manual, I tried to assign correct labels based on the Readme file
> > (https://rosenberglab.stanford.edu/data/huangEtAl2011/huangEtAl2011snpdata_readme)
> >
> > library(adegenet)
> > Mydata <- read.structure(
> >   "unphased_HGDP+India+Africa_2810SNPs-regions1to36.stru",
> >   onerowperind = FALSE, col.lab = 8, col.pop = 2, row.marknames = 1,
> >   n.ind = 1107, n.loc = 2810, ask = FALSE)  # convert into genind
> > Mydata_pop <- genind2genpop(Mydata)  # convert into genpop
> >
> > However, I get a file with only 1 population.
> >
> > head(Mydata_pop)
> > /// GENPOP OBJECT /////////
> >
> >  // 1 population; 2,810 loci; 7,217 alleles; size: 1.5 Mb
> >
> >  // Basic content
> >    @tab:  1 x 7217 matrix of allele counts
> >    @loc.n.all: number of alleles per locus (range: 2-4)
> >    @loc.fac: locus factor for the 7217 columns of @tab
> >    @all.names: list of allele names for each locus
> >    @ploidy: ploidy of each individual  (range: 2-2)
> >    @type:  codom
> >    @call: .local(x = x, i = i, j = j, drop = dro
> >
> > This is obviously wrong since there are 50+ populations.
> >
> > I tried changing col.pop from 2 to 3 but got the same output.
> >
> > Am I missing something?
> >
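
One way to narrow this down is to look at the population factor stored in
the genind object before converting it. A minimal sketch with a made-up toy
dataset (the genotypes and population labels are invented for illustration,
and it assumes adegenet 2.x is installed):

```r
library(adegenet)

# Four invented diploid individuals typed at two SNPs, alleles split by "/"
geno <- data.frame(
  rs0001 = c("A/A", "A/G", "G/G", "A/G"),
  rs0002 = c("C/T", "C/C", "T/T", "C/T"),
  row.names = paste0("ind", 1:4),
  stringsAsFactors = FALSE
)
gi <- df2genind(geno, sep = "/", ploidy = 2)

# Check whether any population information is stored in the object
print(is.null(pop(gi)))

# Attach a population factor (one entry per individual, in the same order)
pop(gi) <- factor(c("popA", "popA", "popB", "popB"))

# Number of populations and individuals per population
print(nPop(gi))
print(table(pop(gi)))

# Convert to genpop only once the factor looks right
gp <- genind2genpop(gi, quiet = TRUE)
print(gp)
```

The same checks (pop(Mydata), nPop(Mydata), table(pop(Mydata))) can be run on
the object returned by read.structure. Whether they explain the
single-population result in this particular case is not established here.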
> >
> > All the best,
> > Davide
> >
> >
> >
> > On 6 October 2017 at 11:35, Thibaut Jombart <thibautjombart at gmail.com>
> > wrote:
> >>
> >> Hi again,
> >>
> >> OK I think I got it. So:
> >> - I can't remember how I built the eHGDP dataset, but it's an easy task
> >> - I don't know if the data you're looking for is publicly available
> >> - assuming you find it, there are two ways to get a genpop object:
> >> #1: from individual data with pop info: read data in (read.csv /
> >> read.table), use df2genind (be patient there, that'll take a while),
> >> then genind2genpop
> >>
> >> #2: from population data (allele counts): read data in (read.csv /
> >> read.table), use the genpop() constructor to make the data a genpop
> >> object; I think this is documented in the basics tutorial, but
> >> definitely also in ?genpop
> >>
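
The two routes above can be sketched on toy data as follows. This assumes
adegenet 2.x; the SNP names, genotypes and population labels are invented,
and a real data set would first be read in with read.csv / read.table.

```r
library(adegenet)

# Invented individual-level data: 6 diploid individuals, 2 SNPs
geno <- data.frame(
  rs0001 = c("A/A", "A/G", "G/G", "A/A", "A/A", "A/G"),
  rs0002 = c("C/T", "C/C", "C/T", "T/T", "C/T", "T/T"),
  row.names = paste0("ind", 1:6),
  stringsAsFactors = FALSE
)
pops <- factor(c("popA", "popA", "popA", "popB", "popB", "popB"))

# Route 1: individual genotypes -> genind -> genpop
gi <- df2genind(geno, sep = "/", ploidy = 2, pop = pops)
gp <- genind2genpop(gi, quiet = TRUE)

# Route 2: populations x alleles count matrix -> genpop
# Column names follow the "locus.allele" convention
counts <- matrix(
  c(3L, 3L, 4L, 2L,
    5L, 1L, 1L, 5L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("popA", "popB"),
                  c("rs0001.A", "rs0001.G", "rs0002.C", "rs0002.T"))
)
gp2 <- genpop(counts)

# Allele frequencies per population (rows: populations, columns: alleles)
freq <- makefreq(gp, quiet = TRUE)
freq2 <- makefreq(gp2, quiet = TRUE)
print(round(freq, 3))
```

Both routes should give the same frequencies here, since the count matrix in
route 2 was tallied by hand from the same toy genotypes.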
> >> HTH
> >> Best
> >> Thibaut
> >>
> >>
> >>
> >> On 6 October 2017 at 10:24, Davide Piffer <pifferdavide at gmail.com> wrote:
> >> > Dear Thibaut,
> >> >
> >> > thanks for answering my question. I will try to reformulate it,
> >> > stating my assumptions:
> >> > 1) I assume that the eHGDP object was made into a genpop object from
> >> > some raw .txt file, like the HGDP file I linked to in the previous email.
> >> > 2) I need an object that looks exactly like the eHGDP object, but with
> >> > SNPs instead of microsatellite alleles.
> >> > 3) Since it's gonna be a rather complex task, I asked if any of you
> >> > knows whether someone has already done this job and published it
> >> > (e.g. as a supplementary file).
> >> > 4) Otherwise, I would like to know how to produce such a file myself,
> >> > starting from a version of the HGDP file with population information.
> >> > If this was done for microsatellites, surely it can be done for the
> >> > SNPs as well? I assume they rely on the same raw HGDP file.
> >> >
> >> > Many thanks!
> >> >
> >> > Davide
> >> >
> >> > On 6 October 2017 at 10:56, Thibaut Jombart <thibautjombart at gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi Davide,
> >> >>
> >> >> I am not entirely sure what you need, so sorry if I miss the point.
> >> >> adegenet cannot make up for absent population information, but you
> >> >> can try to identify clusters of course, e.g. using find.clusters.
> >> >>
> >> >> eHGDP is not a file (at least not in the sense you probably mean),
> >> >> but a genind object. If the question is how you can get a file looking
> >> >> like the one you link into a genind object, you probably want to use
> >> >> something like read.csv and then df2genind. Imports should be
> >> >> detailed in the basics tutorial:
> >> >> https://github.com/thibautjombart/adegenet/wiki/Tutorials
> >> >>
> >> >> Best
> >> >> Thibaut
> >> >>
> >> >>
> >> >>
> >> >> On 4 October 2017 at 14:08, Davide Piffer <pifferdavide at gmail.com>
> >> >> wrote:
> >> >> > Hello,
> >> >> >
> >> >> > I am new to adegenet. I would like to retrieve population frequencies
> >> >> > of SNPs (using rsID) from the HGDP file "HGDP_FinalReport_Forward.txt":
> >> >> > http://www.hagsc.org/hgdp/files.html
> >> >> >
> >> >> > However, the file lacks population information. It contains SNPs x
> >> >> > individuals.
> >> >> > I need a file structured like the eHGDP file provided with the package
> >> >> > (except with SNPs and not microsatellite data), which can be easily
> >> >> > converted into a genpop object, so that I can then compute the
> >> >> > frequencies via makefreq.
> >> >> > Do you know if there is any such file downloadable on the internet?
> >> >> > I guess there must be a way to produce such a file using adegenet
> >> >> > starting from raw data, but my knowledge of this package is not
> >> >> > advanced enough yet.
> >> >> >
> >> >> > Best wishes,
> >> >> >
> >> >> > Davide
> >> >> >