[Seqinr-forum] Problem with query when species are not in Genbank

Simon Penel simon.penel at univ-lyon1.fr
Fri Jun 8 10:37:13 CEST 2012


Dear Maxime,

What you obtain is expected: you may have several sequences for each
species.

The namesp list does not give the species name associated with each
sequence. It gives the list of all species associated with the list of
sequences, so a species shared by several sequences is listed only once.

For example if you try

query("seqsp","sp=PAPAVER ORIENTALE and K=rbcl")


you get 2 sequences from the same unique species PAPAVER ORIENTALE
(taxid 22694).
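
If you need one species name per sequence, one possible route is to read
it from each sequence's annotations. Here is a minimal sketch, not tested
against the server: extract_organism() is a hypothetical helper, and the
toy lines stand in for what getAnnot() might return, assuming the usual
GenBank flat-file layout with an ORGANISM line.

```r
# Hypothetical helper: pull the species name out of GenBank-style
# annotation lines. Returns NA when no ORGANISM line is present.
extract_organism <- function(annot_lines) {
  pattern <- "^[[:space:]]*ORGANISM[[:space:]]+"
  hit <- grep(pattern, annot_lines, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  trimws(sub(pattern, "", hit[1]))
}

# Toy annotations (made up for illustration), one entry per sequence.
toy_annots <- list(
  SEQ1 = c("LOCUS       SEQ1", "  ORGANISM  Papaver orientale"),
  SEQ2 = c("LOCUS       SEQ2", "  ORGANISM  Papaver orientale"),
  SEQ3 = c("LOCUS       SEQ3", "DEFINITION  no organism line here")
)

# One species name per sequence, in the same order as the sequences.
species_per_seq <- vapply(toy_annots, extract_organism, character(1))

# Fewer distinct species than sequences, as with namesp.
n_sequences <- length(species_per_seq)
n_species   <- length(unique(na.omit(species_per_seq)))
```

With a bank open, something like lapply(seqsp$req, getAnnot) could
supply the real lines. That part is an assumption, and it probably means
one server request per sequence, which may be slow for 30609 sequences.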

I hope this answers your question.

all the best,

Simon




On 08/06/12 09:41, Maxime Réjou-Méchain wrote:
> Dear all,
>
> I am back with another problem with the query function. I still have
> difficulty obtaining the taxon names of all my retrieved sequences.
>
> Here is my code:
>
>> query("seqsp","sp=eudicotyledons and K=rbcl")
>> TOTseq=sapply(seqsp$req, getSequence, as.string = TRUE)
>> length(TOTseq)
> 30609
>> TOTname=getName(seqsp)
>> length(TOTname)
> 30609
>> query("sp","PS seqsp")
>> namesp=getName(sp)
>> length(namesp)
> 15831
>
> As you can see, the length of my namesp vector is much lower than the
> number of retrieved sequences, and I do not understand why.
> Can someone explain why I cannot retrieve the taxon name
> associated with each sequence?
>
> Many thanks in advance for your help,
>
> Maxime
>
>
>
>
>
> 2012/5/31 Maxime Réjou-Méchain<maxime.rejou at gmail.com>
>
>> This is indeed a great solution,
>>
>> Many Thanks!
>>
>> Maxime
>>
>>
>> 2012/5/31 Leonor Palmeira<mlpalmeira at ulg.ac.be>
>>
>>>
>>> Dear Maxime,
>>>
>>> What you could try is to get all sequences matching rbcl in "genbank" with:
>>>
>>> query("seqsp","K=rbcl", virtual=T)
>>>
>>> and then project all these sequences to species:
>>>
>>> query("sp","PS seqsp")
>>> namesp=getName(sp)
>>>
>>> Once you have all the species names present in "genbank", you can add a
>>> condition inside the loop over 'Genustobefind': if a genus does not match
>>> any name in 'namesp', the query is not sent.
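
The check described above could look roughly like this. It is a sketch
under assumptions: the namesp values below are made-up stand-ins for the
result of getName(sp), species names are assumed to be upper case with
the genus as the first word, and count_sequences() is a hypothetical
placeholder for the real query() call so that the example runs offline.

```r
# Made-up stand-ins for getName(sp) and for the user's genus list.
namesp <- c("PAPAVER ORIENTALE", "PAPAVER SOMNIFERUM", "QUERCUS ROBUR")
Genustobefind <- c("Papaver", "Acanthodium", "Quercus")

# Genus = first word of each species name (assumption about the format).
genera_in_bank <- unique(toupper(sub(" .*$", "", namesp)))

# Hypothetical placeholder for the real query: counts toy species
# whose name starts with the genus, instead of contacting the server.
count_sequences <- function(genus) {
  sum(startsWith(namesp, paste0(toupper(genus), " ")))
}

sequences <- numeric(length(Genustobefind))
for (i in seq_along(Genustobefind)) {
  if (toupper(Genustobefind[i]) %in% genera_in_bank) {
    sequences[i] <- count_sequences(Genustobefind[i])
  }
  # Otherwise the genus is absent from the bank: keep 0, send no query.
}
```

In the real loop, the call inside the if branch would be the original
query() on "SP=<genus> AND K=rbcl"; genera such as Acanthodium are then
skipped before they can block the function.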
>>>
>>> Best,
>>> Leonor.
>>>
>>> On 31/05/12 15:33, Maxime Réjou-Méchain wrote:
>>>> Dear all,
>>>>
>>>> I am currently using a loop with the function query to find out which
>>>> genera in my list (n>2000 genera) have sequences in genbank.
>>>>
>>>> Here is the loop:
>>>>
>>>> choosebank("genbank")
>>>> # one count per genus, initialised to 0
>>>> sequences=vector(mode="numeric",length=length(Genustobefind))
>>>> for (i in seq_along(Genustobefind)){
>>>> print(paste("Retrieving sequence ",Genustobefind[i]," i=",i,sep=""))
>>>> query1=paste("SP=",Genustobefind[i], " AND K=rbcl", sep="")
>>>> # virtual=TRUE: build the list without retrieving its elements
>>>> query("bb", query1,virtual=TRUE)
>>>> sequences[i]=bb$nelem
>>>> }
>>>>
>>>> The loop works well, but many genera block the query function. For
>>>> example, if I run
>>>>
>>>> query("bb", "SP=Acanthodium AND K=rbcl",virtual=T)
>>>>
>>>> the function runs without ever returning, so I have to interrupt the loop.
>>>>
>>>> For information, if I manually submit the genera that block the query
>>>> function to GenBank, in most cases I obtain:
>>>>
>>>>     - The following term was not found in Nucleotide: Acanthodium.
>>>>     - See the search details <http://www.ncbi.nlm.nih.gov/nuccore/details?querykey=1>.
>>>>     - No items found.
>>>>
>>>> In fact, none of these genera is present in GenBank. Do you have any
>>>> idea how to overcome this problem? I just want a value of zero for
>>>> these genera so that my loop can continue.
>>>> Many thanks in advance for your help,
>>>>
>>>> Maxime
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Seqinr-forum mailing list
>>>> Seqinr-forum at lists.r-forge.r-project.org
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/seqinr-forum
>>>
>>> - --
>>> Leonor Palmeira, PhD
>>>
>>> Phone: +32 4 366 42 69
>>> Email: mlpalmeira AT ulg DOT ac DOT be
>>> http://sites.google.com/site/leonorpalmeira
>>>
>>> Immunology-Vaccinology, Bat. B43b
>>> Faculty of Veterinary Medicine
>>> Boulevard de Colonster, 20
>>> University of Liege, B-4000 Liege (Sart-Tilman)
>>> Belgium
>>>
>>
>>
>> --
>> Maxime Réjou-Méchain
>> Laboratoire Evolution et Diversité Biologique UMR 5174
>> Centre National Recherche Scientifique/Université Paul Sabatier
>> Bâtiment 4R1 31062 Toulouse
>> France
>> Personal website<https://sites.google.com/site/maximerejoumechain/>
>> +33 (0) 5-61-55-85-81
>>   <maxime.rejou at gmail.com>
>>
>
>


-- 
Simon Penel
Laboratoire de Biometrie et Biologie Evolutive
Bat 711 - CNRS UMR 5558 - Universite Lyon 1
43 bd du 11 novembre 1918 69622 Villeurbanne Cedex
Tel:   04 72 43 29 04      Fax:  04 72 43 13 88
http://lbbe.univ-lyon1.fr/-Penel-Simon-.html?lang=fr

PLEASE NOTE NEW ADDRESS:
simon.penel at univ-lyon1.fr

