[adegenet-forum] df2genind returns wrong number of alleles - genotype code 0(homo), 1(homo), 2(hetero)
Thibaut Jombart
thibautjombart at gmail.com
Fri Oct 28 12:13:06 CEST 2016
Dear Laura,
This coding is indeed not compatible with the input df2genind expects.
The function takes characters coding alleles, not genotypes. Imagine a
locus that is heterozygous A / T, with A as the reference. The input for
df2genind would be "A/T" (with sep="/"), while yours is "2". A 0/1/2 coding
is what the genlight class uses, and it is a lot more compact. Note that
genlight counts copies of the second allele, so there 1 is the heterozygote
and 2 the SNP homozygote; you would need to swap your codes 1 and 2 first.
You might want to use it; for instance:
> set.seed(1)
> m <- matrix(sample(0:2, 30, replace=TRUE), nrow=5)
> m
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    2    0    1    2    1
[2,]    1    2    0    2    0    0
[3,]    1    1    2    2    1    1
[4,]    2    1    1    1    0    2
[5,]    0    0    2    2    0    1
> x <- new("genlight", m)
> x
/// GENLIGHT OBJECT /////////
// 5 genotypes, 6 binary SNPs, size: 9.2 Kb
0 (0 %) missing data
// Basic content
@gen: list of 5 SNPbin
// Optional content
@other: a list containing: elements without names
> plot(x)
Note that if ploidy is constant across individuals, you can also do without
it: your data format is already compatible with most methods.
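
The question codes the heterozygote as 2, whereas genlight (per the adegenet documentation) stores the number of copies of the second allele, and df2genind wants allele characters. The following is a minimal sketch of both recodings, not a definitive recipe. The toy matrix, the helper names to_allele_counts and to_allele_pairs, and the placeholder allele labels "A" and "T" are all assumptions made for illustration; the real allele identities cannot be recovered from 0/1/2 codes alone.

```r
# Hypothetical toy data in the coding from the question:
# 0 = reference homozygote, 1 = SNP homozygote, 2 = heterozygote.
# Rows are individuals, columns are loci.
geno <- matrix(c(0, 1, 2,
                 2, 0, 1,
                 1, 2, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("ind", 1:3), paste0("loc", 1:3)))

# Recode to counts of the second (SNP) allele by swapping codes 1 and 2.
# NA entries are left untouched.
to_allele_counts <- function(g) {
  out <- g
  out[which(g == 1)] <- 2  # SNP homozygote: two copies
  out[which(g == 2)] <- 1  # heterozygote: one copy
  out
}

# Recode to allele pairs for df2genind.
# "A" and "T" are placeholder labels, not the real alleles.
to_allele_pairs <- function(g, ref = "A", alt = "T") {
  codes <- c(paste(ref, ref, sep = "/"),  # code 0
             paste(alt, alt, sep = "/"),  # code 1
             paste(ref, alt, sep = "/"))  # code 2
  matrix(codes[g + 1], nrow = nrow(g), dimnames = dimnames(g))
}

counts <- to_allele_counts(geno)
allele_pairs <- to_allele_pairs(geno)

# The adegenet calls only run if the package is installed.
if (requireNamespace("adegenet", quietly = TRUE)) {
  library(adegenet)
  x_gl <- new("genlight", counts)
  x_gi <- df2genind(allele_pairs, sep = "/", ploidy = 2)
}
```

Because only two allele labels ever appear in allele_pairs, no locus can end up with more than two alleles, which is the behaviour the original question was after.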
Best
Thibaut
--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College
London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR <http://twitter.com/TeebzR>
On 28 October 2016 at 03:18, Laura Taillebois <Laura.Taillebois at cdu.edu.au>
wrote:
> Hi all adegenet gurus,
>
> I am having trouble getting the *df2genind* function to find the correct
> number of alleles in my dataset.
>
> My data are SNP data (2 alleles at each locus). The genotypes are encoded
> in one single column, with 0=reference homozygote, 1=SNP homozygote and
> 2=heterozygote. I import them as a data frame from a comma-separated
> .csv file.
>
> When I apply the function df2genind,
>
> genind <- df2genind(locus, sep=",", ncode=1, NA.char="NA", ploidy=2)
>
> the genind object returned is as follows:
>
> /// GENIND OBJECT /////////
>
> // 1 individual; 2,078 loci; 5,752 alleles; size: 1.2 Mb
>
> // Basic content
> @tab: 1 x 5752 matrix of allele counts
> @loc.n.all: number of alleles per locus (range: 2-3)
> @loc.fac: locus factor for the 5752 columns of @tab
> @all.names: list of allele names for each locus
> @ploidy: ploidy of each individual (range: 2-2)
> @type: codom
> @call: .local(x = x, i = i, j = j, drop = drop)
>
> // Optional content
> - empty -
>
> There should be only 4,158 alleles in the object and not 5,752. Is there a
> problem with using this type of 0,1,2 code for the genotypes? Should my
> input have 2 columns for each genotype ?
>
> Thanks for your support!
>
> Cheers, Laura