[adegenet-forum] DNAbin and pop
Jombart, Thibaut
t.jombart at imperial.ac.uk
Mon Dec 16 06:33:58 CET 2013
Hello,
Yes, there are simpler ways: sub/gsub and regular expressions are immensely useful for extracting information contained in sequence labels.
For instance:
##
> lab <- c("AD01012","AD666","FR1212","AD0101","FR9873")
> lab
[1] "AD01012" "AD666" "FR1212" "AD0101" "FR9873"
> pop <- gsub("[[:digit:]]","",lab)
> pop
[1] "AD" "AD" "FR" "AD" "FR"
##
For some useful examples, see ?sub and ?regexp
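As a further sketch along the same lines, the labels described in the question (two letters followed by four digits) could also be handled with an anchored regular expression or with plain substr. The label vector here is made up for illustration, and the commented last line assumes a DNAbin matrix called 'dna' whose row names are the sequence labels; neither is taken from real data.

```r
# Hypothetical labels in the "two letters + four digits" format from the question
lab <- c("AD0101", "CD1495", "FR1212", "AD0666", "FR9873")

# Option 1: keep only the leading run of letters (anchored regular expression)
pop1 <- sub("^([[:alpha:]]+).*$", "\\1", lab)

# Option 2: take the first two characters, with no regular expression at all
pop2 <- substr(lab, 1, 2)

# Both options give the same grouping; store it as a factor
pop <- factor(pop1)
table(pop)

# With real data, the labels would come from the sequence object instead.
# Assumption: 'dna' is a DNAbin matrix, e.g. as returned by fasta2DNAbin:
#   pop <- factor(substr(rownames(dna), 1, 2))
```

The substr version is the simplest when the prefix always has a fixed width; the anchored pattern is more forgiving if the number of letters varies but digits always follow.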
Cheers
Thibaut
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Rita Castilho [rita.castil at gmail.com]
Sent: 16 December 2013 05:02
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] DNAbin and pop
Hi!
I am new to R and I have a lot of trouble going from a phylip or fasta file to a genind or DNAbin object (e.g. via fasta2DNAbin) containing pop information.
My files are always phylip or fasta files, and each sequence label is composed of two letters followed by 4 numeric digits (e.g. CD1495). The first two letters determine the population to which the sequence belongs.
Is there a quicker way than the one below? The grouping factor can be easily deduced from the current individual labels, which would save the task of reading that info into R separately.
#reading data
dna <- fasta2DNAbin('data.fas')
# setting pops
data.pop <- as.factor(rep(c('AD', 'CD', 'FR', 'GE', 'RE', 'OT', 'YU', 'AU'), c(17, 11, 12, 12, 25, 14, 13, 20)))
Many thanks
Rita