[Seqinr-forum] read.fasta fails to read a gzipped compressed file from a ftp server
Jean Lobry
jean.lobry at univ-lyon1.fr
Sat Apr 21 20:23:13 CEST 2018
Found it!
Just wrap the connection in gzcon() so that the stream is decompressed as it is read. This is explained at:
http://seqinr.r-forge.r-project.org/src/mainmatter/getseqflat.pdf
Here it is:
--------------------- BEGIN
$ R --quiet
> library(seqinr)
> filename <- "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/865/GCF_000008865.1_ASM886v1/GCF_000008865.1_ASM886v1_rna_from_genomic.fna.gz"
> ld <- read.fasta(file = gzcon(url(filename)))
> head(names(ld))
[1] "lcl|NC_002695.1_rrna_1" "lcl|NC_002695.1_trna_2" "lcl|NC_002695.1_trna_3"
[4] "lcl|NC_002695.1_rrna_4" "lcl|NC_002695.1_rrna_5" "lcl|NC_002695.1_trna_6"
----------------------- END
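
The role of gzcon() can also be checked without touching the network. What follows is only a minimal sketch in base R, not part of seqinr: it assumes a made-up two-record FASTA written to a temporary file, and it uses file(..., raw = TRUE) as a stand-in for url(), since both hand over the bytes without decompressing them. The sequence names and residues are invented for illustration.

```r
# Write a tiny gzipped FASTA to a temporary file (invented data).
fasta_lines <- c(">seq1", "ACGTACGT", ">seq2", "TTGACA")
tmp <- tempfile(fileext = ".fna.gz")
gz <- gzfile(tmp, open = "wt")
writeLines(fasta_lines, gz)
close(gz)

# Read the first two bytes through a raw connection (no decompression).
# This mimics what a plain url() connection would hand over.
raw_con <- file(tmp, open = "rb", raw = TRUE)
first_bytes <- readBin(raw_con, what = "raw", n = 2L)
close(raw_con)
print(first_bytes)  # the gzip magic number, not the ">" of a FASTA header

# Wrap the same kind of connection in gzcon(): the stream is now
# decompressed as it is read, so readLines() sees plain text.
con <- gzcon(file(tmp, open = "rb", raw = TRUE))
decoded <- readLines(con)
close(con)
print(decoded)

unlink(tmp)
```

With the wrapper, the lines read back are the original FASTA lines, which is what read.fasta needs in order to find the ">" at the start of each header.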
Best,
JLO
On 21/04/2018 at 20:02, Jean Lobry wrote:
> Dear all,
>
> I was able to reproduce the well documented faulty behaviour as follows:
>
> -------------------------- BEGIN ------------------------------------
> $ R --quiet
> > library("seqinr")
> > filename <- "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/865/GCF_000008865.1_ASM886v1/GCF_000008865.1_ASM886v1_rna_from_genomic.fna.gz"
>
> > ld <- read.fasta(file = filename)
> Error in substr(lines, 1L, 1L) :
> invalid multibyte string at
> '<aa><89><b4>b<68><8d><a4>c0<ed><B$<97> O<eb>3:<d2>ڀR
> 46<a4>L<9b><91>d<ba><b2>9<a6><c4>k!<d2>z<a6>G<8d>4<bb>
> פa0>2H[<ac>X<d2>VP<b0>a<d2>tj<84><e6><99>&<95><8e><86>Kt<ac>OHZ:@<d7>.
>
> |Sэ<e8><81>#<e9><ae><9a><9f>ab><bb>b<ac>ab>W<d7>W<8c><95><93><8d><d5><d0>/<8c><d5>]6<?<fb><cd><f5>ա<8d>\e5><86>a<b2>Qw<b0>Q<b4><90>`
> <8e>#-M<a8><9d<f5><db><9a>
> z
>
> d$|<8a>4r<dc>¥<d7><f3><eb><e7><ae>a<d7>p<8c>+\??<c2>5<e8><8e>p<f9><d6><e1>!<aa><94>"`E2<a1>iR<ab><88><e6><89>'
>
> In addition: There were 27 warnings (use warnings() to see them)
> > sessionInfo()
> R version 3.4.1 (2017-06-30)
> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> Running under: macOS Sierra 10.12.6
>
> Matrix products: default
> BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
>
> locale:
> [1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] seqinr_3.4-5
>
> loaded via a namespace (and not attached):
> [1] compiler_3.4.1 ade4_1.7-8
> > q()
> Save workspace image? [y/n/c]: n
>
> -------------------------- END ------------------------------------
>
> This rings a bell; I'll post a solution ASAP.
>
> Best,
>
> JLO
>
>
> -------- Forwarded Message --------
> Subject: Re: Error in library(help=seqinr)
> Date: Sun, 22 Apr 2018 02:21:23 +0900
> From: Haruo Suzuki <haruo at sfc.keio.ac.jp>
> To: Simon Penel <simon.penel at univ-lyon1.fr>
> Cc: jean.lobry at univ-lyon1.fr
>
>
>
> Dear Simon,
>
> The `read.fasta` function of the seqinr package failed to load a gzipped
> FASTA file directly from the FTP site when given only the full URL,
> as shown in the attachment.
>
> Yours sincerely,
> Haruo Suzuki
>
>