<html><head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
</head><body style="font-family: Arial; font-size: 14pt;" wsmode="reply"
bgcolor="#FFFFFF" text="#000000"><div style="font-size:
14pt;font-family: Arial;">Dear Thibaut,<br><br>Thanks for the prompt
reply! <br>Unfortunately, I do not see how that improves on the example
given. <br>With allelic data, there are simple (automatic) ways
to build a genind object that includes the pop factor or even xy
coordinates, because the available file-reading functions
support it (read.genepop retains the pop info;
read.genalex retains both the pop and xy info), so no further
manipulation is needed. I was looking for something similar: perhaps not a
file-reading function, since read.fasta does not offer that, but a set
of scripts that will do it. <br>I saw another earlier suggestion of
yours, <span>but it still requires an extra file:</span><br><small>popFac
<- read.csv("oneColumnFileWithMyGroupsInIt.csv")<br>popFac <-
factor(unlist(popFac))<br>pop(obj) <- popFac</small><br><br>and in
any case I could not understand how to use it, as I get an error:<br><br><small>data.dnabin
<- fasta2DNAbin("Engraulis_P3_mtDNA.fas")<br>popFac <-
read.csv("Engraulis_P3_mtDNA_pops.csv")<br>popFac <-
factor(unlist(popFac))<br>pop(data.dnabin) <- popFac</small><br><br>Error
in (function (classes, fdef, mtable) : <br> unable to find an
inherited method for function ‘pop<-’ for signature ‘"DNAbin"’<br><br>It
would be neat to have a way of reading the first two letters of each
sequence label from the fasta/phylip files and using them as a factor. I am not familiar enough with R
to be able to do it. I just use the packages, and most of the
time I have a hard time getting things to work, because the starting
examples use R data sets, which are not very useful for beginners.<br><br>In
any case I appreciate your efforts towards programming for the
community!<br><br><br>Best<br>Rita<br><br><br><br><br><blockquote
style="border: 0px none;"
cite="mid:2CB2DA8E426F3541AB1907F98ABA657075F13A67@icexch-m2.ic.ac.uk"
type="cite"><div style="margin:30px 25px 10px 25px;" class="__pbConvHr"><div
style="display:table;width:100%;border-top:1px solid
#EDEEF0;padding-top:5px"> <div
style="display:table-cell;vertical-align:middle;padding-right:6px;"><img
photoaddress="t.jombart@imperial.ac.uk" photoname="Jombart, Thibaut"
src="cid:part1.05070704.06000907@gmail.com"
name="compose-unknown-contact.jpg" height="25px" width="25px"></div> <div
style="display:table-cell;white-space:nowrap;vertical-align:middle;width:100%">
<a moz-do-not-send="true" href="mailto:t.jombart@imperial.ac.uk"
style="color:#737F92
!important;padding-right:6px;font-weight:bold;text-decoration:none
!important;">Jombart, Thibaut</a></div> <div
style="display:table-cell;white-space:nowrap;vertical-align:middle;">
<font color="#9FA2A5"><span style="padding-left:6px">December 16, 2013
5:33 AM</span></font></div></div></div><div
style="color:#888888;margin-left:24px;margin-right:24px;"
__pbrmquotes="true" class="__pbConvBody"><pre wrap="">Hello,
yes, there are simpler ways. sub/gsub and regular expressions are immensely useful to extract information contained in the labels of sequences.
For instance:
##
</pre><blockquote type="cite"><pre wrap="">lab <- c("AD01012","AD666","FR1212","AD0101","FR9873")
lab
</pre></blockquote><pre wrap=""><!---->[1] "AD01012" "AD666" "FR1212" "AD0101" "FR9873"
</pre><blockquote type="cite"><pre wrap="">pop <- gsub("[[:digit:]]","",lab)
pop
</pre></blockquote><pre wrap=""><!---->[1] "AD" "AD" "FR" "AD" "FR"
##
For some useful examples, see ?sub and ?regexp
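
Applied to the labels described in this thread (two letters followed by four digits), the same idea could be sketched as below. This is only a minimal sketch: the label vector is made up for illustration, and the last part assumes that adegenet is installed, that a file named data.fas exists, and that the sequence labels are stored as the row names of the DNAbin matrix.

```r
# Hypothetical labels in the format described in the thread:
# two letters (population code) followed by four digits.
seq.labels <- c("AD0101", "AD0666", "CD1495", "FR1212", "FR9873")

# Option 1: keep the first two characters of each label.
pop.substr <- substr(seq.labels, 1, 2)

# Option 2: delete every digit, as in the gsub() example above.
pop.gsub <- gsub("[[:digit:]]", "", seq.labels)

# Both options give the same codes for this label format.
stopifnot(identical(pop.substr, pop.gsub))

# Turn the codes into a grouping factor and count sequences per group.
pop.fac <- factor(pop.substr)
print(table(pop.fac))

# With real data the labels would come from the alignment itself.
# Assumptions: adegenet is installed and 'data.fas' is in the working directory.
if (requireNamespace("adegenet", quietly = TRUE)) {
  if (file.exists("data.fas")) {
    dna <- adegenet::fasta2DNAbin("data.fas")
    pop.fac <- factor(substr(rownames(dna), 1, 2))
    # pop<- has no method for DNAbin objects, so the factor is passed
    # along when converting to a genind object instead.
    obj <- adegenet::DNAbin2genind(dna, pop = pop.fac)
  }
}
```

Passing the factor through the pop argument of DNAbin2genind means that no separate file with the group names is needed; the groups are derived from the labels already present in the alignment.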
Cheers
Thibaut
________________________________________
From: <a class="moz-txt-link-abbreviated" href="mailto:adegenet-forum-bounces@lists.r-forge.r-project.org">adegenet-forum-bounces@lists.r-forge.r-project.org</a> [<a class="moz-txt-link-abbreviated" href="mailto:adegenet-forum-bounces@lists.r-forge.r-project.org">adegenet-forum-bounces@lists.r-forge.r-project.org</a>] on behalf of Rita Castilho [<a class="moz-txt-link-abbreviated" href="mailto:rita.castil@gmail.com">rita.castil@gmail.com</a>]
Sent: 16 December 2013 05:02
To: <a class="moz-txt-link-abbreviated" href="mailto:adegenet-forum@lists.r-forge.r-project.org">adegenet-forum@lists.r-forge.r-project.org</a>
Subject: [adegenet-forum] DNAbin and pop
Hi!
I am new to R and I am having a lot of trouble going from a phylip or fasta file to a genind object, or to a DNAbin object (from fasta2DNAbin), containing pop information.
My files are always phylip or fasta files, and each sequence has a label composed of two letters followed by 4 numeric digits (e.g. CD1495). The first two letters identify the population to which the sequence belongs.
Is there a quicker way to do this than the code below? The grouping factor can easily be deduced from the individual labels, which would save the task of reading that information into R separately.
#reading data
dna <- fasta2DNAbin('data.fas')
# setting pops
data.pop <- as.factor(rep(c('AD', 'CD', 'FR', 'GE', 'RE', 'OT', 'YU', 'AU'), c(17, 11, 12, 12, 25, 14, 13, 20)))
Many thanks
Rita
</pre></div><div>
</div></blockquote></div></body></html>