[adegenet-forum] reading diploid genetic sequence data for use in sPCA

Nevil Amos nevil.amos at gmail.com
Tue Nov 15 06:44:15 CET 2011


I am trying to read diploid sequence data in FASTA format into a genind 
object for sPCA.  The input data are pairs of sequences, with the ID of 
the individual before each sequence, so each individual's name appears 
before two consecutive sequences.  There are 44 individuals with 88 
sequences of equal length.

When I read this using read.dna() and DNAbin2genind(), the data are read, 
but there are 88 individuals in genind@ind.names and 88 genotypes in 
genind@tab, and each ind.name is repeated; i.e. the sequences are being 
treated as coming from 88 individuals rather than as paired sequences from 44.

I cannot see where to include an argument indicating that the data are diploid.

Please can you advise how to proceed with this data?

I have pasted below the code and the data for the first two individuals:
 > AB4<-read.dna("d:/nevs_docs/EYR/NccGenes/AB4_phased.fas", format = "fasta", skip = 0,
+          nlines = 0, comment.char = "[", seq.names = NULL,
+          as.character = F, as.matrix = T)
 > AB4genind<-DNAbin2genind(AB4)
 > AB4genind

    #####################
    ### Genind object ###
    #####################
- genotypes of individuals -

S4 class:  genind
@call: DNAbin2genind(x = AB4)

@tab:  88 x 4 matrix of genotypes

@ind.names: vector of  88 individual names
@loc.names: vector of  2 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the  4 columns of @tab
@all.names: list of  2 components yielding allele names for each locus
@ploidy:  1
@type:  codom

Optionnal contents:
@pop:  - empty -
@pop.names:  - empty -

@other: - empty -

FASTA file for the first two individuals:

 >'EYR018'   [by DnaSP Ver. 5.10.01, from file: AB4_EYR44_196bp_phased.nex     Nov 11, 2011]
GAGAGATCTAAGGAGCCTAAGCTATGATCTGTGGAGCAATAGGTGGCTCACCTGAGACAC
AGCCTTGGCTGGAGTCACGGTACTTTCCAGAGCTCTGTGCTGAAGAGC-GGCTCTCAGTG
AAGCTTTGTCCTCCCTCTGCAGGGCTGGATGGGCTGGCTGAACGCTGTGCCCAGTACAAG
AAAGATGGTGCTGACA
 >'EYR018'
GAGAGATCTAAGGAGCCTAAGCTATGATCTGTGGAGCAATAGGTGGCTCACCTGAGACAC
AGCCTTGGCTGGAGTCACGGTACTTTCCAGAGCTCTGTGCTGAAGAGC-GGCTCTCAGTG
AAGCTTTGTCCTCCCTCTGCAGGGCTGGATGGGCTGGCTGAACGCTGTGCCCAGTACAAG
AAAGATGGTGCTGACA
 >'ANWC45051'
GAGAGATCTAAGGAGCCTAAGCTATGATCTGTGGAGCAATAGGTGGCTCACCTGAGACAC
AGCCTTGGCTGGAGTCACGGTACTTTCCAGAGCTCTGTGCTGAAGAGC-GGCTCTCAGTG
AAGCTTTGTCCTCCCTCTGCAGGGCTGGATGGGCTGGCTGAACGCTGTGCCCAGTACAAG
AAAGATGGTGCTGACA
 >'ANWC45051'
GAGAGATCTAAGGAGCCTAAGCTATGATCTGTGGAGCAATAGGTGGCTCACCTGAGACAC
AGCCTTGGCTGGAGTCACGGTACTTTCCAGAGCTCTGTGCTGAAGAGC-GGCTCTCAGTG
AAGCTTTGTCCTCCCTCTGCAGGGCTGGATGGGCTGGCTGAACGCTGTGCCCAGTACAAG
AAAGATGGTGCTGACA
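
To make the intended structure concrete, here is a minimal sketch in base R of the pairing I am after: the two sequences that share a name are combined into one diploid genotype per polymorphic site. The toy names and haplotypes below are hypothetical stand-ins, not my real data, and this is only an illustration of the desired result, not a claim about how adegenet should do it.

```r
# Hypothetical toy data: two phased haplotypes per individual,
# listed consecutively under the same name (as in my FASTA file).
seq_names  <- c("ind1", "ind1", "ind2", "ind2")
haplotypes <- c("GATC", "GATC", "GATC", "GTTA")

# One row per sequence, one column per site.
sites <- do.call(rbind, strsplit(haplotypes, ""))

# Keep only the polymorphic sites.
poly <- which(apply(sites, 2, function(col) length(unique(col)) > 1))

# Combine the two haplotypes of each individual into "a/b" genotypes.
ids <- unique(seq_names)
geno <- do.call(rbind, lapply(ids, function(id) {
  hap <- sites[seq_names == id, poly, drop = FALSE]
  stopifnot(nrow(hap) == 2)   # assumes exactly two sequences per individual
  paste(hap[1, ], hap[2, ], sep = "/")
}))
dimnames(geno) <- list(ids, paste0("site", poly))

print(geno)   # one row per individual, one column per polymorphic site
```

I assume a table of this shape could then be passed to something like df2genind(geno, sep = "/", ploidy = 2), but I have not confirmed that this is the recommended route for sequence data.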






