[Seqinr-forum] Problem to get sequences from GenBank using query function

Simon Penel simon.penel at univ-lyon1.fr
Thu Mar 7 11:12:39 CET 2019


Hello David,

I was able to reproduce the error you get.

It seems that the problem has occurred since R 3.5.1, but this has to be checked.

I suspect that the problem is due to the response time of the server, which 
seems quite long now.

As a temporary workaround, you can increase how long the socket waits for 
a response before giving up (the timeout argument of choosebank()).

I was able to get rid of the problem using the following commands (be 
aware of the new usage of the query function: the list name passed to 
query() should be the same as the name of the variable that receives the 
result):


choosebank("genbank",timeout=20)
ee<-query("ee","sp=Pterois antennata")



The verbose version gives:

ee<-query("ee","sp=Pterois antennata",verbose=TRUE)
I'm checking the arguments...
... and everything is OK up to now.
I'm checking the status of the socket connection...
... and everything is OK up to now.
I'm sending query to server...
... answer from server is: code=0&lrank=2&count=44&type=SQ&locus=T
I'm trying to analyse answer from server...
... and everything is OK up to now.
... and the rank of the resulting list is: 2 .
... and there are 44 elements in the list.
... and the elements in the list are of type SQ .
... and there are only parent sequences in the list.
I'm trying to get the infos about the elements of the list...
... and I have received 44 lines as expected.
 >
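If a single larger timeout is still not enough, the same workaround could be wrapped in a retry loop that tries progressively longer timeouts. This is only a minimal sketch, not part of seqinr: with_increasing_timeout is a hypothetical helper name, and the timeout values are arbitrary assumptions. It only relies on choosebank(), closebank() and query() as used in this thread, plus base R.

```r
# Hypothetical helper (not part of seqinr): call action(timeout) with each
# timeout in turn and return the first result that does not raise an error.
with_increasing_timeout <- function(action, timeouts = c(5, 20, 60)) {
  last_error <- simpleError("no timeout values supplied")
  for (t in timeouts) {
    # Capture the error instead of aborting, so the next timeout can be tried
    result <- tryCatch(action(t), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message("Attempt with timeout = ", t, " failed: ",
            conditionMessage(result))
  }
  # Every attempt failed: re-raise the last error that was seen
  stop(last_error)
}

# Usage with seqinr (needs network access to the ACNUC server):
# library(seqinr)
# ee <- with_increasing_timeout(function(t) {
#   try(closebank(), silent = TRUE)  # drop any half-open connection first
#   choosebank("genbank", timeout = t)
#   query("ee", "sp=Pterois antennata")
# })
```

Closing the bank before each new attempt is a precaution, assuming a failed query may leave the socket in an unusable state.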




My session info is:

R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods base

other attached packages:
[1] seqinr_3.4-5

loaded via a namespace (and not attached):
[1] MASS_7.3-51.1  compiler_3.5.2 ade4_1.7-13


I hope this helps, at least for the moment.

All the best

Simon


On 07/03/2019 at 05:45, Eme, David wrote:
>
> Dear seqinr users,
>
>
> I'm trying to extract DNA sequences from GenBank using the "query" 
> function from seqinr (I used it a lot a year ago and it worked well), 
> but the function now behaves erratically: it reports an error for a 
> large number of species whose DNA sequences I was able to retrieve with 
> the same function a year ago (sequences that can also be found through 
> the NCBI web platform).
>
>
> Here is an example with a problematic species (Pterois antennata, 
> taxonomic ID = 185882). The following web page 
> https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=185882 
> shows 44 sequences in the Nucleotide database (sequences that I could 
> retrieve a year ago using the same code).
>
> Below is the code I used (I have tried it on different computers and 
> I get the same results).
>
>
> library(seqinr)
>
> choosebank("genbank")
>
> ee = seqinr::query("sp=Pterois antennata")
>
> # the first time I got the following error message
>
> Error in readLines(socket, n = nelem, ok = FALSE) :
>   too few lines read in readLines
>
> # but running the same query again, I got the following error message:
>
> ee = seqinr::query("sp=Pterois antennata")
>
> Error in seqinr::query("sp=Pterois antennata") :
>   invalid request:"FJ584021"
>
> # I also tried using the NCBI taxid
>
> ee = seqinr::query("tid=185882")
> Error in seqinr::query("tid=185882") : invalid request:"FJ584022"
>
>
>
> Interestingly, the accession number reported in the error message (for 
> instance "FJ584022") is a DNA sequence belonging to the species of 
> interest, "Pterois antennata".
>
>
> I don't really understand the problem, considering that the function 
> works for other species.
>
>
> I'm about to finalize a revision of a manuscript that uses this 
> function as part of a new R package, and I would be really glad if you 
> could find a solution to this problem!
>
>
> Thank you very much in advance for your help, it is greatly appreciated!
>
>
> David Eme
>
>
> PS: For information, the sessionInfo() output for the two computers I 
> tried is reported below.
>
> sessionInfo()
> R version 3.4.4 (2018-03-15)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 14.04.5 LTS
>
> Matrix products: default
> BLAS: /usr/lib/libblas/libblas.so.3.0
> LAPACK: /usr/lib/lapack/liblapack.so.3.0
>
> locale:
>  [1] LC_CTYPE=en_NZ.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_NZ.UTF-8        LC_COLLATE=en_NZ.UTF-8
>  [5] LC_MONETARY=en_NZ.UTF-8    LC_MESSAGES=en_NZ.UTF-8
>  [7] LC_PAPER=en_NZ.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] seqinr_3.4-5   regPhylo_1.0.1
>
> loaded via a namespace (and not attached):
> [1] MASS_7.3-44    compiler_3.4.4 tools_3.4.4  ade4_1.7-13
>
>
> sessionInfo()
> R version 3.5.2 (2018-12-20)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 17134)
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_New Zealand.1252 LC_CTYPE=English_New Zealand.1252    LC_MONETARY=English_New Zealand.1252
> [4] LC_NUMERIC=C                         LC_TIME=English_New Zealand.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets methods   base
>
> other attached packages:
> [1] seqinr_3.4-5
>
> loaded via a namespace (and not attached):
> [1] MASS_7.3-51.1  compiler_3.5.2 ade4_1.7-13
>
> _______________________________________________
> Seqinr-forum mailing list
> Seqinr-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/seqinr-forum


-- 
Simon Penel     ^(;,;)^
Laboratoire de Biometrie et Biologie Evolutive
Bat 711 - CNRS UMR 5558
43 bd du 11 novembre 1918 69622 Villeurbanne Cedex
Tel:   04 72 43 29 04      Fax:  04 72 43 13 88
http://lbbe.univ-lyon1.fr/-Penel-Simon-.html?lang=fr


