[adegenet-forum] problem to import data with read.structure
jombart at biomserv.univ-lyon1.fr
Wed Jul 23 11:16:57 CEST 2008
> Dear all,
> My question is about the data importation and the use of
> read.structure command.
> In the tutorial I read:
> "In all cases, it should be possible to store data in an individuals x
> markers table where each element is a character string coding 2 alleles.
> Such data are interpretable when all strings contain 2,4 or 6 characters."
This part was about df2genind, but it is obsolete since version 1.2-0;
now there can be any ploidy level, so the function aims at finding the
number of characters coding alleles according to the maximum number of
characters and the ploidy. Anyhow, it is safer to provide the number of
characters coding the genotypes (argument ncode) or to use a separator
between alleles. I updated the tutorial, it should be online within a
> In my case the alleles are not stored together and should not be
> coded with two character(s)?
This applied to df2genind, not to read.structure. In structure, alleles
are always separated, so there should be no problem.
> In more detail:
> In fact I have a problem importing data with adegenet using
> read.structure (which is the most convenient when the two alleles of
> each locus are not stored together but in two different columns). When
> the data to import have alleles coded with both 2 and 1 characters, I
> have no problem importing the data. However, when all the
> alleles are coded with only one character, I cannot import the file and I
> get the following error message:
> I used the following command:
> Error message:
> Error in df2genind(X = X, pop = pop, missing = missing) :
> Invalid number of coding characters (should be 2, 4, or 6)
This may be a problem in read.structure, not in your data. But I have to
be able to reproduce the problem to make it clear, and correct bugs if
any. Could you send me a toy dataset reproducing the problem?
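For reference, a toy dataset of the kind requested above could be built along these lines. This is only a sketch in base R: the individual labels, allele values and file name are invented for illustration, the original command and data are not reproduced here, and the ".stru" extension is an assumption about what read.structure expects.

```r
# Hypothetical toy data: 3 diploid individuals, 2 loci,
# two rows per individual, every allele coded with a single character.
toy <- data.frame(
  label = rep(c("ind1", "ind2", "ind3"), each = 2),
  loc1  = c(1, 2, 2, 2, 3, 1),
  loc2  = c(4, 4, 5, 4, 5, 5)
)

# Write a tab-separated file without header or quotes
# (file name and extension are assumptions).
toy_file <- file.path(tempdir(), "toy.stru")
write.table(toy, toy_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)

# Show the file as it would be sent along with the bug report.
cat(readLines(toy_file), sep = "\n")
```

A file this small is usually enough to tell whether the error comes from the single-character coding or from something else in the original data.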
> My questions are:
> - is the error message due to the fact that the alleles are coded with
> only one character?
It should not. If it is, I'll fix this.
> - when a dataset has alleles coded with both two and one characters, does R
> read them all as two characters?
No, it does not.
> -what is the best way to import such data (file text with allele not
I'd say, the best way is the simplest for you. If your data are in
STRUCTURE format, then read.structure should do the job, and I'll fix
problems if there are any. If you do not have a file in one of the
recognized formats (GENETIX, Hierfstat, Genepop, STRUCTURE), then use
df2genind. The advantage of df2genind is that any separator between
alleles can be used.
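As a rough sketch of the df2genind route when the two alleles of each locus sit in separate columns: paste them into one string per locus with an explicit separator, then pass that separator to df2genind. The toy table, the column naming scheme (locus name plus "_1"/"_2") and the choice of "/" are assumptions made for this example only. The adegenet call is guarded so that the rest runs without the package.

```r
# Hypothetical toy data: one row per individual, two columns per locus.
alleles <- data.frame(
  locA_1 = c("1", "2", "3"),
  locA_2 = c("1", "3", "2"),
  locB_1 = c("4", "4", "5"),
  locB_2 = c("5", "4", "5"),
  row.names = c("ind1", "ind2", "ind3"),
  stringsAsFactors = FALSE
)

# Build one column per locus, e.g. "2/3", using "/" as allele separator.
loci <- c("locA", "locB")
X <- sapply(loci, function(l) {
  paste(alleles[[paste0(l, "_1")]], alleles[[paste0(l, "_2")]], sep = "/")
})
X <- data.frame(X, row.names = rownames(alleles), stringsAsFactors = FALSE)
print(X)

# Convert to a genind object only if adegenet is installed.
if (requireNamespace("adegenet", quietly = TRUE)) {
  obj <- adegenet::df2genind(X, sep = "/")
  print(obj)
}
```

With an explicit separator, the number of characters per allele no longer matters, so alleles coded with one and with two characters can be mixed freely.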
> -one allele is coded by 0 : is it a problem ?
Yes, because it will be understood as a NA. In many formats, "0" (or
"00", or "000", etc.) stands for NA. In STRUCTURE, NAs are coded by "-9"
by default, but read.structure uses df2genind internally, which
considers NAs and zeros both as missing data. I added a comment about
this in ?read.structure.
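One possible workaround, sketched here in base R, is to relabel a genuine allele called "0" before importing, so that it cannot be mistaken for missing data. The toy matrix, the helper name recode_zero and the replacement label "99" are all hypothetical; "-9" is left untouched as the missing-data code.

```r
# Hypothetical toy genotypes: "0" is a real allele, "-9" means missing.
geno <- matrix(
  c("0",  "1",
    "1",  "1",
    "-9", "-9",
    "2",  "0"),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3", "ind4"),
                  c("loc1_a", "loc1_b"))
)

# Replace allele "0" by a label that is not already used in the data.
recode_zero <- function(x, new_label) {
  if (new_label %in% x) {
    stop("replacement label is already used as an allele")
  }
  x[x == "0"] <- new_label
  x
}

recoded <- recode_zero(geno, "99")
print(recoded)
```

The recoded table can then be written back to a file or passed on for import, with the mapping from "99" back to "0" kept somewhere for later interpretation.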
> Thank you for your attention
> Stéphanie Manel
> Université Joseph Fourier,
> Laboratoire d'Ecologie Alpine, Equipe GPB
> UMR-CNRS 5553, BPX53 Grenoble 38041
> tél: 04 76 51 41 15
CNRS UMR 5558 - Laboratoire de Biométrie et Biologie Evolutive
Universite Lyon 1
43 bd du 11 novembre 1918
69622 Villeurbanne Cedex
Tél. : 04.72.43.29.35
Fax : 04.72.43.13.88
jombart at biomserv.univ-lyon1.fr