[Seqinr-commits] r1900 - www/src/mainmatter
noreply at r-forge.r-project.org
Thu Jun 2 16:02:00 CEST 2016
Author: jeanlobry
Date: 2016-06-02 16:01:59 +0200 (Thu, 02 Jun 2016)
New Revision: 1900
Modified:
www/src/mainmatter/getseqflat.rnw
www/src/mainmatter/getseqflat.tex
Log:
use of gzcon for read.fasta
Modified: www/src/mainmatter/getseqflat.rnw
===================================================================
--- www/src/mainmatter/getseqflat.rnw 2016-06-02 12:58:01 UTC (rev 1899)
+++ www/src/mainmatter/getseqflat.rnw 2016-06-02 14:01:59 UTC (rev 1900)
@@ -97,6 +97,49 @@
read.fasta(aafile, seqtype = "AA", as.string = TRUE, set.attributes = FALSE)
@
+\subsubsection{Compressed file example}
+
+The original file before compression looks like:
+
+<<examplegzip1,eval=T>>=
+uncompressed <- system.file("sequences/smallAA.fasta", package = "seqinr")
+cat(readLines(uncompressed), sep = "\n")
+@
+
+The compressed file shows up as mojibake when read as raw characters,
+because it is binary, but \texttt{readLines()} is still able to read
+it correctly:
+
+<<examplegzip2,eval=T>>=
+compressed <- system.file("sequences/smallAA.fasta.gz", package = "seqinr")
+readChar(compressed, nchar = 1000, useBytes = TRUE)
+cat(readLines(compressed), sep = "\n")
+@
+
+We can therefore import the sequences directly from a gzipped file:
+
+<<examplegzip3,eval=T>>=
+res1 <- read.fasta(uncompressed)
+res2 <- read.fasta(compressed)
+identical(res1, res2)
+@
+
+This automatic decompression works well for local files but is no longer
+active when you read the data from a URL, for instance:
+
+<<ftpgz1,eval=T>>=
+myurl <- "ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plasmid/plasmid.1.rna.fna.gz"
+try.res <- try(read.fasta(myurl))
+try.res
+@
+
+A simple workaround is to wrap the URL connection in \texttt{gzcon()}:
+
+<<ftpgz2,eval=T>>=
+myseq <- read.fasta(gzcon(url(myurl)))
+getName(myseq)
+@
+
\subsection{The function \texttt{write.fasta()}}
This function writes sequences to a file in FASTA format.
Modified: www/src/mainmatter/getseqflat.tex
===================================================================
--- www/src/mainmatter/getseqflat.tex 2016-06-02 12:58:01 UTC (rev 1899)
+++ www/src/mainmatter/getseqflat.tex 2016-06-02 14:01:59 UTC (rev 1900)
@@ -330,6 +330,89 @@
\end{Soutput}
\end{Schunk}
+\subsubsection{Compressed file example}
+
+The original file before compression looks like:
+
+\begin{Schunk}
+\begin{Sinput}
+ uncompressed <- system.file("sequences/smallAA.fasta", package = "seqinr")
+ cat(readLines(uncompressed), sep = "\n")
+\end{Sinput}
+\begin{Soutput}
+>smallAA A very small AA file in FASTA format
+SEQINRSEQINRSEQINRSEQINR*
+\end{Soutput}
+\end{Schunk}
+
+The compressed file shows up as mojibake when read as raw characters,
+because it is binary, but \texttt{readLines()} is still able to read
+it correctly:
+
+\begin{Schunk}
+\begin{Sinput}
+ compressed <- system.file("sequences/smallAA.fasta.gz", package = "seqinr")
+ readChar(compressed, nchar = 1000, useBytes = TRUE)
+\end{Sinput}
+\begin{Soutput}
+[1] "\037\x8b\b\b\xd4\024PW"
+\end{Soutput}
+\begin{Sinput}
+ cat(readLines(compressed), sep = "\n")
+\end{Sinput}
+\begin{Soutput}
+>smallAA A very small AA file in FASTA format
+SEQINRSEQINRSEQINRSEQINR*
+\end{Soutput}
+\end{Schunk}
+
+We can therefore import the sequences directly from a gzipped file:
+
+\begin{Schunk}
+\begin{Sinput}
+ res1 <- read.fasta(uncompressed)
+ res2 <- read.fasta(compressed)
+ identical(res1, res2)
+\end{Sinput}
+\begin{Soutput}
+[1] TRUE
+\end{Soutput}
+\end{Schunk}
+
+This automatic decompression works well for local files but is no longer
+active when you read the data from a URL, for instance:
+
+\begin{Schunk}
+\begin{Sinput}
+ myurl <- "ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plasmid/plasmid.1.rna.fna.gz"
+ try.res <- try(read.fasta(myurl))
+ try.res
+\end{Sinput}
+\begin{Soutput}
+[1] "Error in read.fasta(myurl) : no line starting with a > character found\n"
+attr(,"class")
+[1] "try-error"
+attr(,"condition")
+<simpleError in read.fasta(myurl): no line starting with a > character found>
+\end{Soutput}
+\end{Schunk}
+
+A simple workaround is to wrap the URL connection in \texttt{gzcon()}:
+
+\begin{Schunk}
+\begin{Sinput}
+ myseq <- read.fasta(gzcon(url(myurl)))
+ getName(myseq)
+\end{Sinput}
+\begin{Soutput}
+[1] "gi|470467018|ref|NR_074151.1|" "gi|444303868|ref|NR_074290.1|"
+[3] "gi|452192228|ref|NR_075742.1|" "gi|451991842|ref|NR_075394.1|"
+[5] "gi|451991838|ref|NR_075390.1|" "gi|444303919|ref|NR_074342.1|"
+[7] "gi|470486111|ref|NR_076736.1|" "gi|470480648|ref|NR_076426.1|"
+[9] "gi|470478007|ref|NR_076423.1|"
+\end{Soutput}
+\end{Schunk}
+
\subsection{The function \texttt{write.fasta()}}
This function writes sequences to a file in FASTA format.
@@ -507,7 +590,7 @@
\end{Sinput}
\begin{Soutput}
user system elapsed
- 3.908 0.035 3.948
+ 3.827 0.036 3.863
\end{Soutput}
\end{Schunk}
@@ -528,7 +611,7 @@
\end{Sinput}
\begin{Soutput}
user system elapsed
- 0.164 0.002 0.167
+ 0.161 0.002 0.162
\end{Soutput}
\end{Schunk}
@@ -1566,7 +1649,7 @@
There were two compilation steps:
\begin{itemize}
- \item \Rlogo{} compilation time was: Thu Jun 2 14:42:23 2016
+ \item \Rlogo{} compilation time was: Thu Jun 2 15:58:57 2016
\item \LaTeX{} compilation time was: \today
\end{itemize}
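
Note on the new "Compressed file example" subsection: the local-file behaviour it documents can be reproduced with base R alone, without seqinr or network access. The sketch below is not part of revision 1900; the variable names (fasta_lines, gz_path, via_path, via_gzcon) are hypothetical, and the file contents are copied from the smallAA.fasta output shown in the diff. It assumes, as described in the R connections documentation, that file() detects gzip data only when opened in text read mode, which would explain why a path works while a plain binary connection needs gzcon().

```r
# Illustrative sketch only: base R, no seqinr needed. All names are hypothetical.
fasta_lines <- c(">smallAA A very small AA file in FASTA format",
                 "SEQINRSEQINRSEQINRSEQINR*")

# Write a gzip-compressed copy of the two-line FASTA file.
gz_path <- tempfile(fileext = ".fasta.gz")
out <- gzfile(gz_path, open = "wt")
writeLines(fasta_lines, out)
close(out)

# The file really is binary gzip data: it starts with the magic bytes 1f 8b.
magic <- readBin(gz_path, what = "raw", n = 2)

# Given a path, readLines() opens it with file() in text read mode,
# which detects gzip compression and decompresses transparently.
via_path <- readLines(gz_path)

# A connection opened in binary mode is not decompressed on its own;
# gzcon() adds the decompression layer explicitly.
con <- gzcon(file(gz_path, open = "rb"))
via_gzcon <- readLines(con)
close(con)

identical(via_path, via_gzcon)
```

The same wrapping is what the committed workaround applies to url(myurl); whether a given remote server and R version need it is best checked case by case.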