[adegenet-forum] DNAbin and pop

Mon Dec 16 07:42:34 CET 2013

Dear Thibaut

Thanks for the prompt reply!
Unfortunately I do not see how that improves on the example given.
With allelic data there are simple, automatic ways to build a
genind object that includes the pop factor or even xy coordinates,
because the file-reading functions support it (read.genepop retains
the pop info; read.genalex retains both pop and xy info), so no
further manipulation is needed. I was looking for something similar:
perhaps not a file-reading function, since read.fasta does not offer
this, but a set of scripts that will do it.
I saw an earlier suggestion of yours, but it still requires an
extra file:
popFac <- read.csv("oneColumnFileWithMyGroupsInIt.csv")
popFac <- factor(unlist(popFac))
pop(obj) <- popFac

and in any case I could not understand how to use it, as I get an error:

data.dnabin <- fasta2DNAbin("Engraulis_P3_mtDNA.fas")
popFac <- read.csv("Engraulis_P3_mtDNA_pops.csv")
popFac <- factor(unlist(popFac))
pop(data.dnabin) <- popFac

Error in (function (classes, fdef, mtable)  :
   unable to find an inherited method for function 'pop<-' for signature 
'"DNAbin"'
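
The error message suggests that no 'pop<-' method is defined for DNAbin
objects; in adegenet the pop accessor is meant for genind objects. A
minimal sketch of one possible workaround follows, assuming the ape and
adegenet packages are installed and that DNAbin2genind accepts a pop
argument as in its documented signature. The toy alignment and its
labels are invented for illustration and stand in for the real fasta
file.

```r
# Sketch only: runs only if ape and adegenet are installed.
if (requireNamespace("ape", quietly = TRUE) &&
    requireNamespace("adegenet", quietly = TRUE)) {

  # Toy alignment standing in for fasta2DNAbin("Engraulis_P3_mtDNA.fas");
  # the labels and bases are made up for this example.
  toy <- matrix(c("a", "c", "g", "t",
                  "a", "c", "g", "a",
                  "t", "c", "a", "t",
                  "t", "c", "a", "a"),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("AD0101", "AD0666", "FR1212", "FR9873"),
                                NULL))
  data.dnabin <- ape::as.DNAbin(toy)

  # One population code per sequence, in the same order as the alignment.
  popFac <- factor(c("AD", "AD", "FR", "FR"))

  # pop(data.dnabin) <- popFac fails, so the factor is passed at
  # conversion time instead; the result is a genind object.
  data.genind <- adegenet::DNAbin2genind(data.dnabin, pop = popFac)

  # The pop accessor works on the genind object.
  print(adegenet::pop(data.genind))
}
```

Note that the conversion keeps only polymorphic sites, so the genind
object is not a full copy of the alignment. If the DNAbin object itself
is needed, the factor can simply be kept as a separate variable
alongside it.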

It would be neat to have a way of reading the first two letters of
each sequence label in the fasta/phylip files and using them as a
factor. I am not familiar enough with R to do this myself. I just use
the packages, and most of the time I have a hard time getting things
to work, because the introductory examples rely on bundled R data
sets, which are not very useful for beginners.
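
For labels in the format described here (two letters followed by four
digits), a minimal base-R sketch of this idea could look like the
following. The vector seq_labels is hypothetical; with real data the
labels would presumably come from the DNAbin object itself, for example
rownames(dna) when fasta2DNAbin returns a matrix of aligned sequences
(an assumption, not checked against the files in this thread).

```r
# Hypothetical sequence labels: two letters followed by four digits.
# With real data, something like rownames(dna) would replace this vector.
seq_labels <- c("AD0101", "AD0666", "CD1495", "FR1212", "FR9873")

# Take the first two characters of each label as the population code.
pop_fac <- factor(substr(seq_labels, 1, 2))

# Count sequences per population as a quick sanity check.
pop_counts <- table(pop_fac)
print(pop_counts)
```

Unlike stripping the digits with gsub, substr relies only on the
position of the characters, so it would still work if a population code
happened to contain a digit, but it assumes the code is always exactly
two characters long.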

In any case I appreciate your efforts towards programming for the community!

Best
Rita

> Jombart, Thibaut <mailto:t.jombart at imperial.ac.uk>
> December 16, 2013 5:33 AM
> Hello,
>
> yes, there are simpler ways. sub/gsub and regular expressions are immensely useful to extract information contained in the labels of sequences.
>
> For instance:
> ##
>> lab<- c("AD01012","AD666","FR1212","AD0101","FR9873")
>> lab
> [1] "AD01012" "AD666"   "FR1212"  "AD0101"  "FR9873"
>> pop<- gsub("[[:digit:]]","",lab)
>> pop
> [1] "AD" "AD" "FR" "AD" "FR"
> ##
>
> For some useful examples, see ?sub and ?regexp
>
> Cheers
> Thibaut
>
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Rita Castilho [rita.castil at gmail.com]
> Sent: 16 December 2013 05:02
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] DNAbin and pop
>
> Hi!
> I am new to R and I have a lot of trouble going from a phylip or fasta file to a genind object, or to a DNAbin object (from fasta2DNAbin), containing pop information.
> My files are always phylip or fasta files, and each sequence label consists of two letters followed by 4 numeric digits (e.g. CD1495). The first two letters determine the population to which the sequence belongs.
>
> Is there a quick way to do this instead of the following? The grouping factor can easily be deduced from the individual labels, which would save the task of reading that info into R separately.
>
> #reading data
> dna<- fasta2DNAbin('data.fas')
> # setting pops
> data.pop<- as.factor(rep(c('AD', 'CD', 'FR', 'GE', 'RE', 'OT', 'YU', 'AU'), c(17, 11, 12, 12, 25, 14, 13, 20)))
>
> Many thanks
> Rita