[Seqinr-forum] searching for sequences from author Arctander in 'genbank'
Coghlan, Avril
A.Coghlan at ucc.ie
Thu Dec 10 17:12:21 CET 2009
Dear Simon,
Sorry for the delay in my reply, the last two weeks were very hectic.
Thank you for your helpful reply, your explanation about the Aspergillus
sequences makes a lot of sense, and everything is becoming clear to me.
I have another question about why I get different numbers when searching
via SeqinR or via the NCBI website.
I have tried to work out why, but couldn't in this case.
I will be grateful if you could shed light on this.
When I go to www.ncbi.nlm.nih.gov and search the Nucleotide
database for:
Arctander[AU] NOT (srcdb_refseq[PROP] OR wgs[PROP] OR "synthetic
construct"[ORGN])
I get 2973 sequences.
When I search the ACNUC 'genbank' database via R, using
'query("arctander","AU=@arctander@")', I get 2911 sequences.
I tried to figure out why the two lists of sequences don't match up.
There are some sequences found via the search on the NCBI website that
aren't found via ACNUC, and vice versa.
For example, this sequence is found via the search on the NCBI website
but not via ACNUC:
Accession AF106218
It has 'Nyakaana,S. and Arctander,P.' in the AUTHORS field, so I'm
not sure why it wasn't found via ACNUC - do you know?
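A minimal sketch of how the two result lists could be compared in base R, assuming the accession numbers from each search have already been exported into character vectors. The names ncbi_acc, acnuc_acc, normalise and compare_accessions are hypothetical, and every accession except AF106218 is a made-up placeholder. It also assumes that the NCBI list may carry version suffixes such as ".1", and that both lists hold accession numbers rather than sequence names.

```r
# Accessions returned by each search (illustrative data only).
# AF106218 is the accession mentioned in this thread; the XX... IDs are
# invented placeholders, not real search results.
ncbi_acc  <- c("AF106218.1", "XX000001", "XX000002")
acnuc_acc <- c("xx000001", "XX000002", "XX000003")

# Put accessions in a comparable form: trim whitespace, drop a trailing
# ".<version>" suffix if present, and convert to upper case.
normalise <- function(x) {
  toupper(sub("\\.[0-9]+$", "", trimws(x)))
}

# Report which accessions occur in only one list and which occur in both.
compare_accessions <- function(ncbi, acnuc) {
  ncbi  <- unique(normalise(ncbi))
  acnuc <- unique(normalise(acnuc))
  list(
    only_ncbi  = setdiff(ncbi, acnuc),
    only_acnuc = setdiff(acnuc, ncbi),
    both       = intersect(ncbi, acnuc)
  )
}

res <- compare_accessions(ncbi_acc, acnuc_acc)
print(res$only_ncbi)   # accessions found via NCBI but not via ACNUC
print(res$only_acnuc)  # accessions found via ACNUC but not via NCBI
```

Inspecting the entries in only_ncbi and only_acnuc one by one (as with AF106218 above) may show whether the missing records share a property, such as a database division or an author-name format.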
Regards, and thanks again for your help, I appreciate it very much.
Avril
-----Original Message-----
From: penel at biomserv.univ-lyon1.fr [mailto:penel at biomserv.univ-lyon1.fr]
Sent: 26 November 2009 14:22
To: Coghlan, Avril
Cc: seqinr-forum at r-forge.wu-wien.ac.at
Subject: Re: [Seqinr-forum] searching for sequences from Aspergillus
nidulans in 'genbank'
Dear Jean and Avril,
It seems that the missing sequences are whole genome shotgun
sequences.
These sequences are not included in ACNUC-Genbank because these data
are included in the ACNUC database "EMBL-wgs".
If you query emblwgs, you will find 248 sequences from Aspergillus
nidulans: the sequence AACD00000000 contains the 248 sequences.
Warning: you have access to this sequence via its accession number
("ac=AACD00000000"), not via its name.
Note: in the new seqinr function, it may be useful to check both
the accession number and the name to avoid this type of problem?
All the best
Simon