[adegenet-forum] reading diploid genetic sequence data for use in SPCA
Nevil Amos
nevil.amos at gmail.com
Tue Nov 15 06:44:15 CET 2011
I am trying to read diploid sequence data in FASTA format into a genind
object for sPCA. The input data are pairs of sequences, with the ID of
the individual before each sequence, so each individual's name appears
before two consecutive sequences. There are 44 individuals with 88
sequences of equal length.
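To make the layout concrete, here is a minimal sketch in base R of checking that the sequence labels really do come in consecutive pairs. The labels are hypothetical stand-ins for what rownames(AB4) would return from the real file, and the clean-up step (dropping the DnaSP comment and the quotes) is an assumption about how the names are stored.

```r
# Hypothetical labels, standing in for rownames(AB4) from the real file
seq_names <- c("'EYR018' [by DnaSP Ver. 5.10.01]", "'EYR018'",
               "'ANWC45051'", "'ANWC45051'")

# Drop anything after the first space, then strip the single quotes
clean_names <- gsub("'", "", sub(" .*$", "", seq_names))

# rle() collapses runs of identical consecutive labels
runs <- rle(clean_names)

# Every individual should contribute exactly two consecutive sequences
stopifnot(all(runs$lengths == 2))

# One name per individual, in file order
ind_names <- runs$values
```

With the full file, ind_names would be expected to have length 44 if the pairing is as described.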
When I read this using read.dna() and DNAbin2genind(), the data are read,
but there are 88 individuals in genind@ind.names and 88 genotypes in
genind@tab, with each individual name repeated; i.e. the sequences are
being treated as coming from 88 individuals rather than as paired
sequences from 44.
I cannot see where to include an argument to indicate that the data are diploid.
Please can you advise how to proceed with this data?
I have pasted below the code and the data for the first two individuals:
> AB4<-read.dna("d:/nevs_docs/EYR/NccGenes/AB4_phased.fas", format = "fasta", skip = 0,
+ nlines = 0, comment.char = "[", seq.names = NULL,
+ as.character = F, as.matrix = T)
> AB4genind<-DNAbin2genind(AB4)
> AB4genind
#####################
### Genind object ###
#####################
- genotypes of individuals -
S4 class: genind
@call: DNAbin2genind(x = AB4)
@tab: 88 x 4 matrix of genotypes
@ind.names: vector of 88 individual names
@loc.names: vector of 2 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the 4 columns of @tab
@all.names: list of 2 components yielding allele names for each locus
@ploidy: 1
@type: codom
Optionnal contents:
@pop: - empty -
@pop.names: - empty -
@other: - empty -
FASTA file for the first two individuals:
>'EYR018' [by DnaSP Ver. 5.10.01, from file: AB4_EYR44_196bp_phased.nex Nov 11, 2011]
GAGAGATCTAAGGAGCCTAAGCTATGATCTGTGGAGCAATAGGTGGCTCACCTGAGACAC
AGCCTTGGCTGGAGTCACGGTACTTTCCAGAGCTCTGTGCTGAAGAGC-GGCTCTCAGTG
AAGCTTTGTCCTCCCTCTGCAGGGCTGGATGGGCTGGCTGAACGCTGTGCCCAGTACAAG
AAAGATGGTGCTGACA
>'EYR018'
GAGAGATCTAAGGAGCCTAAGCTATGATCTGTGGAGCAATAGGTGGCTCACCTGAGACAC
AGCCTTGGCTGGAGTCACGGTACTTTCCAGAGCTCTGTGCTGAAGAGC-GGCTCTCAGTG
AAGCTTTGTCCTCCCTCTGCAGGGCTGGATGGGCTGGCTGAACGCTGTGCCCAGTACAAG
AAAGATGGTGCTGACA
>'ANWC45051'
GAGAGATCTAAGGAGCCTAAGCTATGATCTGTGGAGCAATAGGTGGCTCACCTGAGACAC
AGCCTTGGCTGGAGTCACGGTACTTTCCAGAGCTCTGTGCTGAAGAGC-GGCTCTCAGTG
AAGCTTTGTCCTCCCTCTGCAGGGCTGGATGGGCTGGCTGAACGCTGTGCCCAGTACAAG
AAAGATGGTGCTGACA
>'ANWC45051'
GAGAGATCTAAGGAGCCTAAGCTATGATCTGTGGAGCAATAGGTGGCTCACCTGAGACAC
AGCCTTGGCTGGAGTCACGGTACTTTCCAGAGCTCTGTGCTGAAGAGC-GGCTCTCAGTG
AAGCTTTGTCCTCCCTCTGCAGGGCTGGATGGGCTGGCTGAACGCTGTGCCCAGTACAAG
AAAGATGGTGCTGACA
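One possible workaround, sketched here under assumptions and not tested on the real file, is to collapse each consecutive pair of haplotypes into diploid genotypes by hand and pass the result to df2genind() with ploidy = 2. The toy matrix below is made up and stands in for as.character(AB4); ind_names, the site labels, and the "/" separator are illustrative choices. Gap characters ("-") would be treated as ordinary alleles in this sketch.

```r
# Toy haplotype matrix standing in for as.character(AB4):
# one row per sequence, one column per site (values are made up)
haps <- rbind(c("a", "c", "g"),
              c("a", "t", "g"),
              c("a", "c", "a"),
              c("a", "c", "g"))
ind_names <- c("EYR018", "ANWC45051")   # one name per consecutive pair

# Odd rows hold the first haplotype, even rows the second
first  <- haps[seq(1, nrow(haps), by = 2), , drop = FALSE]
second <- haps[seq(2, nrow(haps), by = 2), , drop = FALSE]

# Keep only sites that vary across all sequences
poly <- apply(haps, 2, function(site) length(unique(site)) > 1)

# Paste the two alleles of each individual together, e.g. "c/t"
geno <- matrix(paste(first[, poly, drop = FALSE],
                     second[, poly, drop = FALSE], sep = "/"),
               nrow = nrow(first),
               dimnames = list(ind_names, paste0("site", which(poly))))
geno <- as.data.frame(geno, stringsAsFactors = FALSE)
print(geno)

# Needs adegenet; arguments as I understand them from ?df2genind
if (requireNamespace("adegenet", quietly = TRUE)) {
  AB4dip <- adegenet::df2genind(geno, sep = "/", ploidy = 2)
  print(AB4dip)
}
```

If this works as intended, the resulting object should report one row per individual and @ploidy: 2, rather than 88 haploid entries.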