[Seqinr-forum] Problem to get sequences from GenBank using query function

Eme, David D.Eme at massey.ac.nz
Thu Mar 7 05:45:09 CET 2019


Dear seqinr users,


I'm trying to extract DNA sequences from GenBank using the "query" function from seqinr. I used it a lot a year ago and it worked well, but it now behaves erratically: it reports an error message for a large number of species whose DNA sequences I was able to retrieve with the same function a year ago (sequences that can also be found through the NCBI web platform).


Here is an example with a problematic species (Pterois antennata, taxonomic ID = 185882). The following web page, https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=185882, shows 44 sequences in the Nucleotide database (sequences that I could get a year ago using the same code).

Below is the code I used (I tried it on different computers and got the same results).


library(seqinr)

choosebank("genbank")

ee = seqinr::query("sp=Pterois antennata")

# the first time I got the following error message

Error in readLines(socket, n = nelem, ok = FALSE) :
  too few lines read in readLines


# but running the same query again gave a different error message:

ee = seqinr::query("sp=Pterois antennata")

Error in seqinr::query("sp=Pterois antennata") :
  invalid request:"FJ584021"


# I also tried using the NCBI taxid

ee = seqinr::query("tid=185882")
Error in seqinr::query("tid=185882") : invalid request:"FJ584022"



Interestingly, the accession number reported in the error message (for instance "FJ584022") corresponds to a DNA sequence belonging to the species of interest, "Pterois antennata".


I don't really understand the problem considering that the function works for other species.
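
To document which species fail and with which message, the calls can be wrapped in a small retry helper. This is only a minimal sketch: retry_call is a hypothetical helper (not part of seqinr), and it merely records the errors without fixing their cause.

```r
# Hypothetical helper, not part of seqinr: call f() up to max_tries times.
# Returns a list with the first successful value (if any) and every error
# message collected along the way.
retry_call <- function(f, max_tries = 3, wait = 0) {
  errors <- character(0)
  for (i in seq_len(max_tries)) {
    # Turn an error into a value so the loop can continue
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(ok = TRUE, value = result, errors = errors))
    }
    errors <- c(errors, conditionMessage(result))
    # Optional pause (in seconds) before the next attempt
    if (wait > 0 && i < max_tries) Sys.sleep(wait)
  }
  list(ok = FALSE, value = NULL, errors = errors)
}

# Intended use with the query from this message (needs a network connection,
# so it is left commented out):
# library(seqinr)
# choosebank("genbank")
# res <- retry_call(function() seqinr::query("sp=Pterois antennata"))
# res$errors   # messages from the failed attempts, if any
```

Keeping all the messages makes it possible to see whether the error changes between attempts, as it did in the two calls above.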


I'm about to finalize a revision of a manuscript that uses this function as part of a new R package, and I would be really glad if you could find a solution to this problem!


Thank you very much in advance; your help is greatly appreciated!


David Eme


PS: for information, the sessionInfo() output for the two computers I tried is reported below.

sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_NZ.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_NZ.UTF-8        LC_COLLATE=en_NZ.UTF-8
 [5] LC_MONETARY=en_NZ.UTF-8    LC_MESSAGES=en_NZ.UTF-8
 [7] LC_PAPER=en_NZ.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] seqinr_3.4-5   regPhylo_1.0.1

loaded via a namespace (and not attached):
[1] MASS_7.3-44    compiler_3.4.4 tools_3.4.4    ade4_1.7-13


sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_New Zealand.1252  LC_CTYPE=English_New Zealand.1252    LC_MONETARY=English_New Zealand.1252
[4] LC_NUMERIC=C                         LC_TIME=English_New Zealand.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] seqinr_3.4-5

loaded via a namespace (and not attached):
[1] MASS_7.3-51.1  compiler_3.5.2 ade4_1.7-13








