[Seqinr-forum] Problem with query when species are not in Genbank

Maxime Réjou-Méchain maxime.rejou at gmail.com
Fri Jun 8 09:41:05 CEST 2012


Dear all,

I am back with another problem with the query function. I still have
difficulty obtaining the taxon names of all my retrieved sequences.

Here is my code:

>query("seqsp","sp=eudicotyledons and K=rbcl")
>TOTseq=sapply(seqsp$req, getSequence, as.string = TRUE)
>length(TOTseq)
30609
>TOTname=getName(seqsp)
>length(TOTname)
30609
>query("sp","PS seqsp")
>namesp=getName(sp)
>length(namesp)
15831

As you can see, the length of my namesp vector is much lower than the
number of retrieved sequences, and I do not understand why.
Can someone explain why I cannot retrieve the taxon name
associated with each sequence?

Many thanks in advance for your help,

Maxime
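
On the length mismatch described above, one possible reading is that "PS"
projects the sequence list onto species, so every species is kept only once
however many sequences belong to it. The base R sketch below illustrates that
collapse with made-up data; the species labels and the names seq_species,
seq_names and species_list are hypothetical and do not come from seqinr or
from the query above.

```r
# Hypothetical data: one species label per retrieved sequence.
seq_species <- c("Quercus robur", "Quercus robur", "Fagus sylvatica",
                 "Quercus robur", "Fagus sylvatica", "Acer campestre")

# Hypothetical sequence names, one per sequence.
seq_names <- paste0("SEQ", seq_along(seq_species))

# A projection onto species behaves like a set: duplicates collapse.
species_list <- unique(seq_species)

length(seq_names)     # number of sequences
length(species_list)  # number of distinct species, never larger
```

If that reading is right, a one-to-one vector of taxon names would have to be
built per sequence rather than taken from the projected species list.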





2012/5/31 Maxime Réjou-Méchain <maxime.rejou at gmail.com>

> This is indeed a great solution,
>
> Many Thanks!
>
> Maxime
>
>
> 2012/5/31 Leonor Palmeira <mlpalmeira at ulg.ac.be>
>
>>
>> Dear Maxime,
>>
>> what you could try is to get all species that are in "genbank" with:
>>
>> query("seqsp","K=rbcl", virtual=T)
>>
>> and then project all these sequences to species:
>>
>> query("sp","PS seqsp")
>> namesp=getName(sp)
>>
>> Once you have all the species names present in "genbank", you can add a
>> condition inside the loop over 'Genustobefind': if a name has no match
>> in 'namesp', the query is not sent.
>>
>> Best,
>> Leonor.
>>
>> On 31/05/12 15:33, Maxime Réjou-Méchain wrote:
>> > Dear all,
>> >
>> > I am currently using a loop with the function query to find out which
>> > genera in my list (n>2000 genera) have sequences in GenBank.
>> >
>> > Here is the loop:
>> >
>> > choosebank("genbank")
>> > # One sequence count per genus
>> > sequences <- numeric(length(Genustobefind))
>> > for (i in seq_along(Genustobefind)) {
>> >   print(paste("Retrieving sequence ", Genustobefind[i], " i=", i, sep = ""))
>> >   query1 <- paste("SP=", Genustobefind[i], " AND K=rbcl", sep = "")
>> >   # virtual = TRUE: only the list is built, its elements are not retrieved
>> >   query("bb", query1, virtual = TRUE)
>> >   sequences[i] <- bb$nelem
>> > }
>> >
>> > The loop works well, but many genera cause the query function to hang.
>> > For example, if I run
>> >
>> > query("bb", "SP=Acanthodium AND K=rbcl",virtual=T)
>> >
>> > the function runs without ever returning, so I have to interrupt the loop.
>> >
>> > For information, if I manually submit the genera that block the query
>> > function to GenBank, in most cases I obtain:
>> >
>> >    - The following term was not found in Nucleotide: Acanthodium.
>> >    - See the search details <http://www.ncbi.nlm.nih.gov/nuccore/details?querykey=1>.
>> >
>> >    - No items found.
>> >
>> > In fact, none of these genera is present in GenBank. Do you have any
>> > idea how to overcome this problem? I just want to get a value of zero
>> > for these genera and let the loop continue.
>> > Many thanks in advance for your help,
>> >
>> > Maxime
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Seqinr-forum mailing list
>> > Seqinr-forum at lists.r-forge.r-project.org
>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/seqinr-forum
>>
>> --
>> Leonor Palmeira, PhD
>>
>> Phone: +32 4 366 42 69
>> Email: mlpalmeira AT ulg DOT ac DOT be
>> http://sites.google.com/site/leonorpalmeira
>>
>> Immunology-Vaccinology, Bat. B43b
>> Faculty of Veterinary Medicine
>> Boulevard de Colonster, 20
>> University of Liege, B-4000 Liege (Sart-Tilman)
>> Belgium
>>
>
>
>
> --
> Maxime Réjou-Méchain
> Laboratoire Evolution et Diversité Biologique UMR 5174
> Centre National Recherche Scientifique/Université Paul Sabatier
> Bâtiment 4R1 31062 Toulouse
> France
> Personal website <https://sites.google.com/site/maximerejoumechain/>
> +33 (0) 5-61-55-85-81
>  <maxime.rejou at gmail.com>
>