[Seqinr-forum] where.is.this.acc() function

Coghlan, Avril (A.Coghlan at ucc.ie)
Sat Nov 14 17:25:47 CET 2009


Dear Jean,

The where.is.this.acc() function is very nice! I tried it and it works
perfectly for me. 

One tiny suggestion: it seems to keep checking all the databases for an
accession, even after it has found the accession in one of them. I guess
it would run slightly faster if it stopped looking for the accession as
soon as it found it in one of the databases?
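For illustration, the "stop at the first hit" idea could be sketched in R
as follows. This is a hypothetical sketch, not the actual
where.is.this.acc() source: the helper names first_bank_with_acc() and
acnuc_has_acc() are made up here, and the predicate assumes that seqinr's
query() signals an error for an unknown accession (as in the error
message quoted further down in this thread).

```r
# Hypothetical helper (not seqinr's implementation): walk the banks in order
# and return the first one for which has_acc() reports a hit. The loop returns
# immediately, so the remaining banks are never queried.
first_bank_with_acc <- function(acc, banks, has_acc) {
  for (bank in banks) {
    if (isTRUE(has_acc(bank, acc))) {
      return(bank)  # early exit on the first match
    }
  }
  NA_character_  # accession not found in any bank
}

# Hypothetical predicate built on seqinr's choosebank()/query()/closebank().
# Assumption: query() raises an error for an unknown accession, which is
# treated here as "not in this bank".
acnuc_has_acc <- function(bank, acc) {
  seqinr::choosebank(bank)
  on.exit(seqinr::closebank())  # close the bank even if query() fails
  tryCatch({
    seqinr::query("acc_hit", paste0("AC=", acc))
    TRUE
  }, error = function(e) FALSE)
}
```

With a working connection to the ACNUC server, one might call
first_bank_with_acc("L42023", seqinr::choosebank(), acnuc_has_acc), where
choosebank() without arguments lists the available banks. Passing the
lookup as a function keeps the early-exit loop separate from the network
code, so it can be checked offline with a stand-in predicate.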

Regards, and thanks again,
Avril

-----Original Message-----
From: Jean lobry [mailto:lobry at biomserv.univ-lyon1.fr] 
Sent: 12 November 2009 18:37
To: seqinr-forum at r-forge.wu-wien.ac.at
Cc: Coghlan, Avril
Subject: RE: [Seqinr-forum] querying genbank to get the sequence for an accession

Dear Avril,

>
>Thank you so much for your helpful reply.
>

You're welcome!

>That makes a lot of sense, and it's clear to me now.
>
>I think it's a great idea to provide a function where.is.this.acc() that
>would tell the user which database to find a particular accession number in.

OK, I have just committed it for seqinR release 2.0-7, which will be
available very soon on CRAN because we have to correct an error
generated with R 2.11 (scheduled for release in April 2010).

If you want to give it a try, the source code is already available
in the svn repository at the link below; you just have to copy/paste
it into your R console.

http://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/pkg/R/where.is.this.acc.R?rev=1704&root=seqinr&view=markup

>I am also wondering whether you only store some accession numbers and not
>others in ACNUC?
>For example, the Haemophilus influenzae Rd KW20 genome sequence is stored in
>GenBank with accessions L42023 (the original submission) and NC_000907.
>
>I find that I can find the sequence by typing in R:
>>  library("seqinr")
>>  choosebank("genbank")
>>  query("haemophilus","AC=L42023")
>
>However, it doesn't work for me to type:
>>  query("haemophilus", "AC=NC_000907")
>I get an error message:
>Error in query("haempphilus", "AC=NC_000907") :
>   invalid request:"unknown accession number at (^): \"AC
>
>I'm wondering if the H. influenzae genome sequence is stored in ACNUC
>with just one accession (L42023), or should I also be able to search for
>accession NC_000907 somehow?
>
>Thanks again for your help, I appreciate it very much.
>

I don't want to give an authoritative answer while the server is down.
My guess is that there is a single accession number (by definition)
but that the others may be accessible by keyword queries.

Best,

Jean
-- 
Jean R. Lobry            (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
allo  : +33 472 43 27 56     fax    : +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/
