[adegenet-forum] df2genind returns wrong number of alleles - genotype code 0(homo), 1(homo), 2(hetero)

Thibaut Jombart thibautjombart at gmail.com
Fri Oct 28 12:13:06 CEST 2016


Dear Laura,

This coding is indeed not compatible with the expected input for df2genind.
The function takes characters coding alleles, not genotypes. Imagine a
locus is heterozygous A / T, with A as the reference. The input for
df2genind would be "A / T" while yours is "2". A 0/1/2 coding of this kind
is in fact what the genlight class uses, which is a lot more compact (note
that genlight counts copies of the second allele, so heterozygotes are
coded 1 and second-allele homozygotes 2). You might want to use it; for
instance:

> set.seed(1)
> m <-  matrix(sample(0:2, 30, replace=TRUE), nrow=5)
> m
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    2    0    1    2    1
[2,]    1    2    0    2    0    0
[3,]    1    1    2    2    1    1
[4,]    2    1    1    1    0    2
[5,]    0    0    2    2    0    1

> x <- new("genlight", m)
> x
 /// GENLIGHT OBJECT /////////

 // 5 genotypes,  6 binary SNPs, size: 9.2 Kb
 0 (0 %) missing data

 // Basic content
   @gen: list of 5 SNPbin

 // Optional content
   @other: a list containing: elements without names

> plot(x)
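
Since genlight stores the number of copies of the second allele, data coded
as in the question (0 = reference homozygote, 1 = SNP homozygote,
2 = heterozygote) would need its 1 and 2 codes swapped before being passed
to new("genlight", ...). A minimal base-R sketch, assuming a numeric matrix
with individuals in rows and SNPs in columns; the object names and the
helper recode_to_allele_counts are made up for illustration:

```r
# Hypothetical input: rows = individuals, columns = SNP loci, coded as in the
# question (0 = reference homozygote, 1 = SNP homozygote, 2 = heterozygote).
geno <- matrix(c(0, 1, 2, NA,
                 2, 2, 0, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("ind1", "ind2"), paste0("snp", 1:4)))

# Recode to the number of copies of the second allele:
# 0 stays 0, SNP homozygote (1) becomes 2, heterozygote (2) becomes 1.
recode_to_allele_counts <- function(g) {
  lookup <- c(0L, 2L, 1L)        # position = original code + 1
  out <- g
  out[] <- lookup[g + 1]         # missing genotypes (NA) stay NA
  storage.mode(out) <- "integer"
  out
}

counts <- recode_to_allele_counts(geno)
counts

# With adegenet loaded, the recoded matrix can then be used as in the
# example above:
# x <- new("genlight", counts)
```

The lookup vector keeps the recoding explicit and leaves missing values
untouched; dimension names are carried over unchanged.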


Note that if ploidy is constant across individuals, you can also do without
it - your data format is already compatible with most methods.
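
If a genind object is needed after all, the allele-based input that
df2genind expects can be built from allele counts. The sketch below is only
an illustration: the allele names "A" and "B" are placeholders (the real
alleles are not given in this thread), and the count matrix is invented.

```r
# Hypothetical allele counts (copies of the second allele) for 3 individuals
# at 2 SNPs.
counts <- matrix(c(0, 1,
                   1, 2,
                   2, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("ind", 1:3), c("snp1", "snp2")))

# Turn each count into a pair of alleles, using placeholder allele names
# "A" (reference) and "B" (second allele).
genotype_strings <- c("A/A", "A/B", "B/B")
alleles <- counts
alleles[] <- genotype_strings[counts + 1]   # matrix becomes character
alleles <- as.data.frame(alleles, stringsAsFactors = FALSE)
alleles

# With adegenet installed, this is the kind of input df2genind works with:
# one column per locus, alleles separated by sep.
if (requireNamespace("adegenet", quietly = TRUE)) {
  gi <- adegenet::df2genind(alleles, sep = "/", ploidy = 2)
  print(gi)
}
```

Each locus stays in a single column; the separator tells df2genind where
one allele ends and the next begins, so two columns per genotype are not
needed.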

Best
Thibaut



--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College
London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR <http://twitter.com/TeebzR>

On 28 October 2016 at 03:18, Laura Taillebois <Laura.Taillebois at cdu.edu.au>
wrote:

> Hi all adegenet gurus,
>
> I am having trouble getting the *df2genind* function to find the correct
> number of alleles in my dataset.
>
> My data are SNP data (2 alleles at each locus). The genotypes are encoded
> in one single column such that 0=reference homozygote, 1=SNP homozygote and
> 2=heterozygote. I import them as a data frame from a comma-separated
> .csv file.
>
> When I apply the function df2genind,
>
> genind <- df2genind(locus, sep=",", ncode=1, NA.char="NA", ploidy=2)
>
> the genind object returned is as follows:
>
> /// GENIND OBJECT /////////
>
>  // 1 individual; 2,078 loci; 5,752 alleles; size: 1.2 Mb
>
>  // Basic content
>    @tab:  1 x 5752 matrix of allele counts
>    @loc.n.all: number of alleles per locus (range: 2-3)
>    @loc.fac: locus factor for the 5752 columns of @tab
>    @all.names: list of allele names for each locus
>    @ploidy: ploidy of each individual  (range: 2-2)
>    @type:  codom
>    @call: .local(x = x, i = i, j = j, drop = drop)
>
>  // Optional content
>    - empty -
>
> There should be only 4,158 alleles in the object and not 5,752. Is there a
> problem with using this type of 0,1,2 code for the genotypes? Should my
> input have 2 columns for each genotype ?
>
> Thanks for your support!
>
> Cheers, Laura
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>