[Seqinr-forum] Problem getting sequences from GenBank using the query function
Eme, David
D.Eme at massey.ac.nz
Thu Mar 7 05:45:09 CET 2019
Dear seqinr users,
I'm trying to extract DNA sequences from GenBank using the "query" function from seqinr. I used it a lot a year ago and it worked well, but now the function behaves erratically: it reports an error for a large number of species that have DNA sequences I was able to retrieve with the same function a year ago (sequences that can also be found through the NCBI web interface).
Here is an example with a problematic species, Pterois antennata (taxonomy ID 185882). The following web page https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=185882 shows 44 sequences in the Nucleotide database, which I could retrieve a year ago using the same code.
Below is the code I used (I tried it on different computers and got the same results):
library(seqinr)
choosebank("genbank")
ee = seqinr::query("sp=Pterois antennata")
# The first time, I got the following error message:
Error in readLines(socket, n = nelem, ok = FALSE) :
too few lines read in readLines
# Running the same query again gave a different error message:
ee = seqinr::query("sp=Pterois antennata")
Error in seqinr::query("sp=Pterois antennata") :
invalid request:"FJ584021"
# I also tried using the NCBI taxid:
ee = seqinr::query("tid=185882")
Error in seqinr::query("tid=185882") : invalid request:"FJ584022"
Interestingly, the accession number reported in the error message (for instance "FJ584022") is a DNA sequence belonging to the species of interest, Pterois antennata.
I don't really understand the problem, given that the function still works for other species.
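To compare the failing requests in one pass instead of stopping at the first error, the same two requests could be wrapped as in this minimal sketch. It is not part of seqinr's API: capture_errors() is a hypothetical helper, and the live part assumes seqinr 3.4-5 with network access to the "genbank" bank opened by choosebank().

```r
# Hypothetical helper, not part of seqinr: applies `run` to each request
# and records "ok" on success or the error message on failure, so that one
# failing request does not stop the others.
capture_errors <- function(requests, run) {
  vapply(requests, function(request) {
    tryCatch({
      run(request)
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# Live check against the seqinr server (assumption: network access).
# Only runs in an interactive session with seqinr installed.
if (interactive() && requireNamespace("seqinr", quietly = TRUE)) {
  seqinr::choosebank("genbank")
  outcome <- capture_errors(c("sp=Pterois antennata", "tid=185882"),
                            seqinr::query)
  # closebank() may itself fail if the socket was left in a bad state.
  try(seqinr::closebank(), silent = TRUE)
  print(outcome)
}
```

The result is a character vector named after each request, so the messages (for example the "invalid request" errors quoted above) can be pasted into a report side by side.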
I'm about to finalize the revision of a manuscript that uses this function as part of a new R package, so I would be really glad if you could find a solution to this problem!
Thank you very much in advance; your help is greatly appreciated!
David Eme
PS: for information, the sessionInfo() output for the two computers I tried is reported below.
sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_NZ.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_NZ.UTF-8 LC_COLLATE=en_NZ.UTF-8
[5] LC_MONETARY=en_NZ.UTF-8 LC_MESSAGES=en_NZ.UTF-8
[7] LC_PAPER=en_NZ.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] seqinr_3.4-5 regPhylo_1.0.1
loaded via a namespace (and not attached):
[1] MASS_7.3-44 compiler_3.4.4 tools_3.4.4 ade4_1.7-13
sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_New Zealand.1252 LC_CTYPE=English_New Zealand.1252 LC_MONETARY=English_New Zealand.1252
[4] LC_NUMERIC=C LC_TIME=English_New Zealand.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] seqinr_3.4-5
loaded via a namespace (and not attached):
[1] MASS_7.3-51.1 compiler_3.5.2 ade4_1.7-13
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/seqinr-forum/attachments/20190307/2c6e8d71/attachment.html>
More information about the Seqinr-forum mailing list