[adegenet-forum] DNAbin and pop

Jombart, Thibaut t.jombart at imperial.ac.uk
Mon Dec 16 06:33:58 CET 2013


Hello, 

Yes, there are simpler ways. sub/gsub and regular expressions are immensely useful for extracting information contained in the labels of sequences.

For instance:
##
> lab <- c("AD01012","AD666","FR1212","AD0101","FR9873")
> lab
[1] "AD01012" "AD666"   "FR1212"  "AD0101"  "FR9873" 
> pop <- gsub("[[:digit:]]","",lab)
> pop
[1] "AD" "AD" "FR" "AD" "FR"
##

For some useful examples, see ?sub and ?regexp
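
Applied to the label format described in the question below (two letters followed by four digits), the same idea can be sketched as follows. The labels here are made up for illustration; with real data they would presumably be taken from the sequence object itself, e.g. rownames(dna) if fasta2DNAbin returned a DNAbin matrix, which is an assumption and not checked here.

```r
# Hypothetical labels: two letters followed by four digits.
# With real data, something like lab <- rownames(dna) is assumed instead.
lab <- c("AD0101", "CD1495", "FR1212", "AD0666", "FR9873")

# Option 1: keep the first two characters (assumes a fixed-width prefix).
pop1 <- substr(lab, 1, 2)

# Option 2: capture the leading run of letters (works for any prefix length).
pop2 <- sub("^([[:alpha:]]+).*$", "\\1", lab)

# Convert to a factor so it can be used as a grouping variable.
pop <- factor(pop1)

# Number of sequences per population.
table(pop)
```

Either option avoids typing the population names and sample sizes by hand, and keeps working if the order of sequences in the file changes.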

Cheers
Thibaut

________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Rita Castilho [rita.castil at gmail.com]
Sent: 16 December 2013 05:02
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] DNAbin and pop

Hi!
I am new to R and I have a lot of trouble going from a phylip or fasta file to a genind or DNAbin object (e.g. from fasta2DNAbin) containing pop information.
My files are always phylip or fasta files, and each sequence label consists of two letters followed by 4 numeric digits (e.g. CD1495). The first two letters determine the population to which the sequence belongs.

Since the grouping factor can easily be deduced from the current individual labels, is there a quicker way to do this than the following, one that also saves the task of reading that info into R separately?

#reading data
dna <- fasta2DNAbin('data.fas')
# setting pops
data.pop <- as.factor(rep(c('AD', 'CD', 'FR', 'GE', 'RE', 'OT', 'YU', 'AU'), c(17, 11, 12, 12, 25, 14, 13, 20)))

Many thanks
Rita


