[Seqinr-forum] read.fasta fails to read a gzip-compressed file from an FTP server

Jean Lobry jean.lobry at univ-lyon1.fr
Sat Apr 21 20:02:02 CEST 2018


Dear all,

I was able to reproduce the well-documented faulty behaviour as follows:

-------------------------- BEGIN ------------------------------------
$ R --quiet
 > library("seqinr")
 > filename <- "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/865/GCF_000008865.1_ASM886v1/GCF_000008865.1_ASM886v1_rna_from_genomic.fna.gz"
 > ld <- read.fasta(file = filename)
Error in substr(lines, 1L, 1L) :
   invalid multibyte string at 
'<aa><89><b4>b<68><8d><a4>c0<ed><B$<97>	O<eb>3:<d2>ڀR 
46<a4>L<9b><91>d<ba><b2>9<a6><c4>k!<d2>z<a6>G<8d>4<bb> 
פa0>2H[<ac>X<d2>VP<b0>a<d2>tj<84><e6><99>&<95><8e><86>Kt<ac>OHZ:@<d7>.
 
|Sэ<e8><81>#<e9><ae><9a><9f>ab><bb>b<ac>ab>W<d7>W<8c><95><93><8d><d5><d0>/<8c><d5>]6<?<fb><cd><f5>ա<8d>\e5><86>a<b2>Qw<b0>Q<b4><90>` 
<8e>#-M<a8><9d<f5><db><9a>
                                                  z
 
d$|<8a>4r<dc>¥<d7><f3><eb><e7><ae>a<d7>p<8c>+\??<c2>5<e8><8e>p<f9><d6><e1>!<aa><94>"`E2<a1>iR<ab><88><e6><89>'
In addition: There were 27 warnings (use warnings() to see them)
 > sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] seqinr_3.4-5

loaded via a namespace (and not attached):
[1] compiler_3.4.1 ade4_1.7-8
 > q()
Save workspace image? [y/n/c]: n

-------------------------- END ------------------------------------

This rings a bell; I'll post a solution as soon as possible.
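
In the meantime, here is a minimal sketch of what may be going on and of one possible workaround; it is not the eventual fix. The "embedded nul" warnings from readLines() suggest that the compressed bytes are being read as if they were text. The sketch builds a small gzip-compressed FASTA file locally (the records are made up), checks that it starts with the gzip magic bytes, and reads it back through gzcon(). The helper name read_gz_lines is hypothetical and not part of seqinr.

```r
# Build a tiny gzip-compressed FASTA file locally so the sketch runs offline.
# The records below are made up for illustration.
fasta_lines <- c(">seq1 hypothetical record", "ATGCATGC", ">seq2", "GGCCTTAA")
gz_path <- tempfile(fileext = ".fna.gz")
out <- gzfile(gz_path, open = "wt")
writeLines(fasta_lines, out)
close(out)

# The file on disk starts with the gzip magic bytes 1f 8b, not with ">".
magic <- readBin(gz_path, what = "raw", n = 2L)
is_gzip <- identical(magic, as.raw(c(0x1f, 0x8b)))

# Hypothetical helper: wrap a binary-mode (or not yet opened) connection in
# gzcon() so that readLines() sees decompressed text instead of raw bytes.
read_gz_lines <- function(con) {
  z <- gzcon(con)
  on.exit(close(z))  # closing the wrapper also closes the wrapped connection
  readLines(z)
}

decoded <- read_gz_lines(file(gz_path, open = "rb"))
print(decoded)

# Untested assumption: the same helper could be pointed at the remote file,
# with the decompressed lines written to a plain local file for read.fasta():
# plain_path <- tempfile(fileext = ".fasta")
# writeLines(read_gz_lines(url(filename)), plain_path)
# ld <- seqinr::read.fasta(file = plain_path)
```

Going through a plain local file keeps read.fasta() itself unchanged, at the cost of a temporary copy on disk.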

Best,

JLO


-------- Forwarded Message --------
Subject: 	Re: Error in library(help=seqinr)
Date: 	Sun, 22 Apr 2018 02:21:23 +0900
From: 	Haruo Suzuki <haruo at sfc.keio.ac.jp>
To: 	Simon Penel <simon.penel at univ-lyon1.fr>
Cc: 	jean.lobry at univ-lyon1.fr



Dear Simon,

The `read.fasta` function of the seqinr package fails to load a gzipped 
FASTA file directly from an FTP site when given only the full URL, 
as shown in the attachment.

Yours sincerely,
Haruo Suzuki
-------------- next part --------------

R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[R.app GUI 1.69 (7328) x86_64-apple-darwin13.4.0]

[History restored from /Users/haruo/.Rapp.history]

> 
> library("seqinr")
> filename <- "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/865/GCF_000008865.1_ASM886v1/GCF_000008865.1_ASM886v1_rna_from_genomic.fna.gz"
> ld <- read.fasta(file = filename)
Error in substr(lines, 1L, 1L) : 
  invalid multibyte string at '<aa><89><b4>b<68><8d><a4>c0<ed><B$<97>	O<eb>3:<d2>ڀR	46<a4>L<9b><91>d<ba><b2>9<a6><c4>k!<d2>z<a6>G<8d>4<bb>	פ<a0>2H[<ac>X<d2>VP<b0>a<d2>tj<84><e6><99>&<95><8e><86>Kt<ac>OHZ:@<d7>.
|Sэ<e8><81>#<e9><ae><9a><9f><ab><bb>b<ac><ab>W<d7>W<8c><95><93><8d><d5><d0>/<8c><d5>]6<?<fb><cd><f5>ա<8d>\<e5><86>a<b2>Qw<b0>Q<b4><90>` <8e>#-M<a8><9d<f5><db><9a>
z
d$|<8a>4r<dc>¥<d7><f3><eb><e7><ae>a<d7>p<8c>+\??<c2>5<e8><8e>p<f9><d6>
<e1>!<aa><94>"`E2<a1>iR<ab><88><e6><89>'
In addition: There were 27 warnings (use warnings() to see them)
> 
> warnings()
Warning messages:
1: In readLines(file) : line 1 appears to contain an embedded nul
2: In readLines(file) : line 3 appears to contain an embedded nul
3: In readLines(file) : line 4 appears to contain an embedded nul
4: In readLines(file) : line 9 appears to contain an embedded nul
5: In readLines(file) : line 10 appears to contain an embedded nul
6: In readLines(file) : line 13 appears to contain an embedded nul
7: In readLines(file) : line 15 appears to contain an embedded nul
8: In readLines(file) : line 17 appears to contain an embedded nul
9: In readLines(file) : line 22 appears to contain an embedded nul
10: In readLines(file) : line 28 appears to contain an embedded nul
11: In readLines(file) : line 30 appears to contain an embedded nul
12: In readLines(file) : line 32 appears to contain an embedded nul
13: In readLines(file) : line 34 appears to contain an embedded nul
14: In readLines(file) : line 40 appears to contain an embedded nul
15: In readLines(file) : line 46 appears to contain an embedded nul
16: In readLines(file) : line 50 appears to contain an embedded nul
17: In readLines(file) : line 52 appears to contain an embedded nul
18: In readLines(file) : line 53 appears to contain an embedded nul
19: In readLines(file) : line 54 appears to contain an embedded nul
20: In readLines(file) : line 70 appears to contain an embedded nul
21: In readLines(file) : line 71 appears to contain an embedded nul
22: In readLines(file) :
  incomplete final line found on 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/865/GCF_000008865.1_ASM886v1/GCF_000008865.1_ASM886v1_rna_from_genomic.fna.gz'
23: In grep("^;", lines) : input string 1 is invalid in this locale
24: In grep("^;", lines) : input string 2 is invalid in this locale
25: In grep("^;", lines) : input string 4 is invalid in this locale
26: In grep("^;", lines) : input string 5 is invalid in this locale
27: In grep("^;", lines) : input string 6 is invalid in this locale
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Mavericks 10.9.5

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] seqinr_3.4-6

loaded via a namespace (and not attached):
[1] MASS_7.3-45 ade4_1.7-10
> 