[Seqinr-forum] searching for sequences from Aspergillus nidulans in 'genbank'

Jean lobry lobry at biomserv.univ-lyon1.fr
Sun Nov 15 11:40:52 CET 2009


>Dear SeqinR forum,
>
>Today I used SeqinR to retrieve sequences from the fungus Aspergillus
>nidulans from the ACNUC 'genbank' database, using the commands:
>>  choosebank("genbank")
>>  query("anidulans","SP=aspergillus nidulans")
>>  anidulans$nelem
>[1] 18948
>This means that there were 18948 sequences from Aspergillus nidulans
>found.
>
>As far as I understand it, the ACNUC 'genbank' database corresponds to
>the NCBI Nucleotide database, is that right?
>
>I also did a search directly of the NCBI Nucleotide database on the NCBI
>website for Aspergillus nidulans sequences, by going to
>http://www.ncbi.nlm.nih.gov/nucleotide/ and searching for "Aspergillus
>nidulans"[ORGN]. That search found 29119 nucleotide sequences (12271 of
>which are ESTs).
>
>I am wondering why there is a difference between the search that I did
>of the ACNUC 'genbank' database, and on the NCBI Nucleotide Database
>website?
>I don't think it can be due to the ACNUC database missing some sequences
>recently submitted to NCBI, as the ACNUC website says that the ACNUC
>'genbank' database was very recently updated, on Nov 13, 2009 (from
>http://pbil.univ-lyon1.fr/cgi-bin/get_relnum?db=GenBank&ident=1929541324
>).

Dear Avril,

genbank is organized into general divisions:

http://www.ncbi.nlm.nih.gov/HTGS/divisions.html

Not all of them are included in our ACNUC 'genbank' database.

IIRC, the "functional divisions" (viz. EST, STS, GSS and HTG) are
not included. This explains the difference between the two results.

@Simon: am I correct here?

>I will be very grateful for your advice, as I would like to use the
>SeqinR library for a bioinformatics practical for students, and want to
>make sure I understand how it works.

For a student practical I would suggest using frozen databases.
They are accessible by passing the special value "TP" to the "tagbank"
argument of choosebank() (TP stands for "Travaux Pratiques", which is
French for practicals).

######
>  library(seqinr)
>  choosebank(tagbank = "TP")
[1] "emblTP"      "swissprotTP" "hoverprotTP" "hovernuclTP" "trypano"
>  choosebank("emblTP")
>  banknameSocket$details
[1] "             ****     ACNUC Data Base Content      **** "
[2] "              EMBL Library Release 78 WITHOUT ESTs  (March 2004)"
[3] "27,571,397,913 bases; 12,533,594 sequences; 1,604,500 subseqs; 339,186 refers."
[4] "Software by M. Gouy & M. Jacobzone, Laboratoire de biometrie, Universite Lyon I "
>  query("anidulans","SP=aspergillus nidulans")
>  anidulans$nelem
[1] 218
######

There are only 218 sequences for Aspergillus nidulans in this frozen
version of EMBL, but the advantage is that the results are stable over
time, so your practical can be reused unchanged next year.
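
For a practical script, the session above could be wrapped along these
lines. This is only a sketch: count_species() is a hypothetical helper
name, not part of SeqinR, and it assumes a recent SeqinR release in which
query() returns its result (in older versions, as in the session above,
the result was assigned in the workspace instead).

```r
# Hypothetical helper (not part of seqinr): open a bank, count the
# sequences attached to a species, and close the connection again.
# Assumes a recent seqinr in which query() returns the result list.
count_species <- function(bank, species) {
  seqinr::choosebank(bank)                 # open the ACNUC bank
  on.exit(seqinr::closebank())             # close it even if query() fails
  res <- seqinr::query("splist", paste0("SP=", species))
  res$nelem                                # number of matching sequences
}

# Needs the seqinr package and network access to the ACNUC server:
# count_species("emblTP", "aspergillus nidulans")
```

Calling it on a frozen bank such as "emblTP" should give the students the
same count each year, whereas "genbank" will keep growing.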

Best,
-- 
Jean R. Lobry            (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
allo  : +33 472 43 27 56     fax    : +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/



