[Seqinr-forum] querying genbank to get the sequence for an accession

Coghlan, Avril (A.Coghlan at ucc.ie)
Tue Nov 10 14:06:17 CET 2009


Dear Jean,

Thank you so much for your helpful reply.

That makes a lot of sense, and it's clear to me now. 

I think it's a great idea to provide a function where.is.this.acc() that would tell the user which database to find a particular accession number in. 

I am also wondering whether you store only some accession numbers in ACNUC and not others.
For example, the Haemophilus influenzae Rd KW20 genome sequence is stored in GenBank with accessions L42023 (the original submission) and NC_000907. 

I can find the sequence by typing in R:
> library("seqinr")
> choosebank("genbank")
> query("haemophilus","AC=L42023")

However, typing this does not work for me:
> query("haemophilus", "AC=NC_000907")
I get an error message:
Error in query("haemophilus", "AC=NC_000907") : 
  invalid request:"unknown accession number at (^): \"AC

I'm wondering whether the H. influenzae genome sequence is stored in ACNUC under only one accession (L42023), or whether I should also be able to search for accession NC_000907 somehow.

Thanks again for your help, I appreciate it very much.

Kind Regards,
Avril


-----Original Message-----
From: Jean lobry [mailto:lobry at biomserv.univ-lyon1.fr] 
Sent: 09 November 2009 16:23
To: seqinr-forum at r-forge.wu-wien.ac.at
Cc: Coghlan, Avril
Subject: Re: [Seqinr-forum] querying genbank to get the sequence for an accession

>  Dear Colleagues,
>
>  I am just learning how to use SeqinR and it looks extremely
>  useful.
>
>  I am trying to figure out how I can use SeqinR to query the
>  genbank database to get the sequence corresponding to a
>  particular accession number, but am having a problem.
>
>  For example, the accession number corresponding to the
>  Bacteriophage lambda genome is NC_001416. I tried to get
>  this sequence by typing in R:
>
>  > library("seqinr")
>  > choosebank("genbank")
>  > query("lambda","AC=NC_001416")
>
>  The first two commands work fine; however, I get this error
>  message from the query command:
>  Error in query("lambda", "AC=NC_001416") :
>    invalid request:"unknown accession number at (^): \"AC
>
>  I am not sure what I am doing wrong, as other requests work
>  fine for me, e.g. the command
>
>  > query("lambda","AC=CP001252")
>
>  runs without problems.
>
>  Does anyone know what I am doing wrong? I will be very
>  grateful for any advice you can give.
>
>  Regards, Avril
>
>  Avril Coghlan Department of Microbiology University College
>  Cork Ireland

Dear Avril,

The message just means that there is no sequence with the corresponding
accession number in the currently opened database. In your case this is
because complete genome sequences are stored in separate ACNUC databases
at pbil. What you want is:

#######
library(seqinr)
choosebank("refseqViruses")  # complete viral genomes are in a separate bank, not in "genbank"
query("lambda", "AC=NC_001416")
#######

Best,

Jean

P.S. I wonder whether we should provide a function where.is.this.acc() that would
loop over all available databases to find those in which a given
accession number is present.
-- 
Jean R. Lobry            (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
phone : +33 472 43 27 56     fax    : +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/
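The where.is.this.acc() idea from the P.S. could be sketched roughly as below. This is a hypothetical helper, not part of seqinr: the names acc_in_bank() and where.is.this.acc() and the `lookup` argument are illustrative. It assumes network access to the ACNUC server at pbil, that choosebank() called with no arguments returns the names of the available banks, and that query() stops with an error when the accession is unknown in the open bank (as in the messages above).

```r
# Hypothetical sketch of where.is.this.acc(); NOT part of seqinr.
# Assumes network access to the ACNUC server at pbil.

# Returns TRUE if accession `acc` is found in ACNUC bank `bank`.
acc_in_bank <- function(bank, acc) {
  # Some listed banks may be unavailable; treat a failed open as "not found".
  opened <- tryCatch({
    seqinr::choosebank(bank)
    TRUE
  }, error = function(e) FALSE)
  if (!opened) return(FALSE)
  on.exit(seqinr::closebank())
  # query() raises an error for an unknown accession number, so an error
  # is taken to mean "not in this bank". virtual = TRUE skips retrieving
  # details on each element of the resulting list.
  tryCatch({
    seqinr::query("acc.hit", paste0("AC=", acc), virtual = TRUE)
    TRUE
  }, error = function(e) FALSE)
}

# Loops over `banks` and returns the names of those that contain `acc`.
# By default the bank list comes from choosebank() with no arguments;
# `lookup` can be swapped for a stub to test the loop without a server.
where.is.this.acc <- function(acc, banks = seqinr::choosebank(),
                              lookup = acc_in_bank) {
  hits <- vapply(banks, function(b) lookup(b, acc), logical(1))
  banks[hits]
}
```

Based on the reply above, one would expect where.is.this.acc("NC_001416") to include "refseqViruses" among its results. Because it opens every bank in turn, a real call may be slow.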



