[Seqinr-commits] r2083 - www/src/mainmatter

Thu Aug 3 20:56:01 CEST 2017

Author: jeanlobry
Date: 2017-08-03 20:56:01 +0200 (Thu, 03 Aug 2017)
New Revision: 2083

Added:
   www/src/mainmatter/introduction.pdf
Modified:
   www/src/mainmatter/introduction.rnw
   www/src/mainmatter/introduction.tex
Log:
update 3.4-5

Added: www/src/mainmatter/introduction.pdf
===================================================================
(Binary files differ)


Property changes on: www/src/mainmatter/introduction.pdf
___________________________________________________________________
Added: svn:mime-type
   + application/octet-stream

Modified: www/src/mainmatter/introduction.rnw
===================================================================

--- www/src/mainmatter/introduction.rnw	2017-08-03 17:57:18 UTC (rev 2082)
+++ www/src/mainmatter/introduction.rnw	2017-08-03 18:56:01 UTC (rev 2083)
@@ -1,8 +1,8 @@
 \documentclass{article}
 \input{../config/commontex}
 
-\title{Introduction}
-\author{Lobry, J.R.}
+\title{Introduction to \seqinr{}}
+\author{Pr. Jean R. \textsc{Lobry}}
 
 \begin{document}
 \SweaveOpts{concordance=TRUE}
@@ -10,6 +10,8 @@
 \maketitle
 \tableofcontents
 % BEGIN - DO NOT REMOVE THIS LINE
+
+\newpage
 \section{About ACNUC}
 
 \marginpar{
@@ -21,22 +23,13 @@
 \tiny{Cover of ACNUC book vol. 2}
 }
 
-ACNUC\footnote{
-A contraction of ACides NUCl{\'e}iques, that is \emph{NUCleic ACids}
-in french (\url{http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html})}
-was first a database of nucleic acids developed in the early
-80's in the same lab (Lyon, France) that issued \seqinr{}. ACNUC was first published
-as a printed book in two volumes \cite{GautierC1982a, GautierC1982b}
-whose covers are reproduced in margin there. At about the same time, two
-other databases were created, one in the USA (GenBank,
-at Los Alamos and now managed by the NCBI\footnote{National Center for Biotechnology Information}), 
-and another one in Germany
-(created in K{\"o}ln by K. St{\"u}ber). To avoid duplication of efforts at the
-european level, a single repository database was initiated in Germany yielding
-the EMBL\footnote{European Molecular Biology Laboratory} database that moved from K{\"o}ln
-to Heidelberg, and then to its current location at the EBI\footnote{European Bioinformatic
-Institute} near Cambridge. The DDBJ\footnote{DNA Data Bank of Japan} started
-in 1986 at the NIG\footnote{National Institute of Genetics} in Mishima. These three
+ACNUC\footnote{%
+A contraction of ACides NUCl{\'e}iques, that is \emph{NUCleic ACids} in French (\url{http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html})}
+was the first database of nucleic acids developed in the early 1980s in the same lab (Lyon, France) that issued \seqinr{}. ACNUC was published as a printed book in two volumes \cite{GautierC1982a, GautierC1982b} whose covers are reproduced in the margin. At about the same time, two other databases were created, one in the USA (GenBank, at Los Alamos and now managed by the NCBI\footnote{National Center for Biotechnology Information}), and another one in Germany (created in K{\"o}ln by K. St{\"u}ber). To avoid duplication of efforts at the European level, a single repository database was initiated in Germany yielding the EMBL\footnote{European Molecular Biology Laboratory} database that moved from K{\"o}ln to Heidelberg, and then to its current location at the EBI\footnote{%
+European Bioinformatics Institute} 
+near Cambridge. The DDBJ\footnote{%
+DNA Data Bank of Japan} 
+started in 1986 at the NIG\footnote{National Institute of Genetics} in Mishima. These three
 main repository DNA databases are now collaborating to maintain the INSD\footnote{
 International Nucleotide Sequence Database (\url{http://www.insdc.org/})} 
 and are sharing data on a daily basis.
@@ -191,8 +184,7 @@
 
 You need a computer connected to the Internet. First, install \Rlogo{} on your computer.
 There are distributions for Linux, Mac and Windows users
-on the CRAN (\texttt{http://cran.r-project.org}). Then, install the \texttt{ape}, 
-\texttt{ade4} and \texttt{seqinr} packages. This can be done directly in an \Rlogo{} console
+on the CRAN (\texttt{http://cran.r-project.org}). Then, install the \texttt{seqinr} package. This can be done directly in an \Rlogo{} console
 with, for instance, the command \texttt{install.packages("seqinr")}. 
 Last, load the \seqinr{} package with:
 
@@ -310,6 +302,7 @@
 example(chargaff, ask = FALSE)
 @
 \setkeys{Gin}{width=0.8\textwidth}
+\medskip
 
 This is a very specialised graph. The filled areas correspond to non-allowed values because the sum 
 of the four base frequencies cannot exceed 100\%. The white areas correspond to possible values 
@@ -356,37 +349,35 @@
 
 \subsection{Data as fast moving targets}
  
-In research area, data are not always stable. 
-Consider figure 1 from \cite{lobrylncs} which is reproduced here in figure \ref{fig1lncs2004}.
-Data have been updated since then, but we can re-use  the same \Rlogo{}~code\footnote{
-This code was adapted from \url{http://pbil.univ-lyon1.fr/members/lobry/repro/lncs04/}.
-}
+In research, data are not always stable. Consider figure 1 from \cite{lobrylncs}, which is reproduced here in figure~\ref{fig1lncs2004} page~\pageref{fig1lncs2004}. Data have been updated since then, but we can re-use the same \Rlogo{}~code\footnote{%
+This code was adapted from \url{http://pbil.univ-lyon1.fr/members/lobry/repro/lncs04/}.}
 to update the figure:
 
 \setkeys{Gin}{width=\textwidth}
-<<dbg, fig = TRUE, eval=T, width=9.2, height=6>>=
+<<dbg, fig = TRUE, eval=T, width=9.2, height=7>>=
 data <- get.db.growth()
 scale <- 1
-   ltymoore <- 1 # line type for Moore's law
-    date <- data$date
-    Nucleotides <- data$Nucleotides
-    Month <- data$Month
-    plot.default(date, log10(Nucleotides), 
-        main = "Update of Fig. 1 from Lobry (2004) LNCS, 3039:679:\nThe exponential growth of genome sequence data", xlab = "Year", 
-        ylab = "Log10 number of nucleotides", pch = 19, las = 1,
-        cex = scale, cex.axis = scale, cex.lab = scale)
-    abline(lm(log10(Nucleotides) ~ date), lwd = 2)
-    lm1 <- lm(log(Nucleotides) ~ date)
-    mu <- lm1$coef[2]
-    dbt <- log(2)/mu
-    dbt <- 12 * dbt
-    x <- mean(date)
-    y <- mean(log10(Nucleotides))
-    a <- log10(2)/1.5
-    b <- y - a * x
-    lm10 <- lm(log10(Nucleotides) ~ date)
-    for (i in seq(-10, 10, by = 1)) if (i != 0) 
-            abline(coef = c(b + i, a), col = "black", lty = ltymoore)
+ltymoore <- 1 # line type for Moore's law
+date <- data$date
+Nucleotides <- data$Nucleotides
+Month <- data$Month
+plot.default(date, log10(Nucleotides), 
+  main = paste("Update of Fig. 1 from Lobry (2004) LNCS, 3039:679:",
+  "The exponential growth of genome sequence data", sep = "\n"), xlab = "Year", 
+  ylab = "Log10 number of nucleotides", pch = 19, las = 1,
+  cex = scale, cex.axis = scale, cex.lab = scale)
+abline(lm(log10(Nucleotides) ~ date), lwd = 2)
+lm1 <- lm(log(Nucleotides) ~ date)
+mu <- lm1$coef[2]
+dbt <- log(2)/mu
+dbt <- 12 * dbt
+x <- mean(date)
+y <- mean(log10(Nucleotides))
+a <- log10(2)/1.5
+b <- y - a * x
+lm10 <- lm(log10(Nucleotides) ~ date)
+for (i in seq(-10, 10, by = 1)) if (i != 0) 
+  abline(coef = c(b + i, a), col = "black", lty = ltymoore)
 @
 \setkeys{Gin}{width=0.8\textwidth}
 
@@ -395,16 +386,10 @@
 
 
 \begin{figure}
-\begin{center}
+\fbox{\begin{minipage}{\textwidth}
 \includegraphics[width=\textwidth]{../figs/fig1lncs2004}
-\end{center}
-\caption{Screenshot of figure 1 from \cite{lobrylncs}.
-The exponential growth of genomic sequence data mimics Moore's law.
-The source of data is the december 2003 release note (realnote.txt) from the EMBL database
-available at \protect\url{http://www.ebi.ac.uk/}. External lines correspond to what would be expected with
-a doubling time of 18 months. The central line through points is the best least square fit,
-corresponding to a doubling time of 16.9 months.}
-\label{fig1lncs2004}
+\caption{\label{fig1lncs2004} Screenshot of figure 1 from \cite{lobrylncs}. The exponential growth of genomic sequence data mimics \textsc{Moore}'s law. The source of data was the December 2003 release note (\texttt{realnote.txt}) from the EMBL database that was available at \protect\url{http://www.ebi.ac.uk/}. External lines correspond to what would be expected with a doubling time of 18 months. The central line through the points is the best least-squares fit, corresponding here to a doubling time of 16.9 months.}
+\end{minipage}}
 \end{figure}
 
 \subsection{\texttt{Sweave()} and \texttt{xtable()}}
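
Note on the dbg chunk in the hunk above: the arithmetic it performs can be
summarised as the short derivation below. This is only a sketch. It assumes
that data$date is expressed in years (the "Year" axis label and the factor 12
suggest so), and the symbols N_0, T and y_i are introduced here for
illustration; they do not appear in the source.

```latex
% Least-squares fit on the natural-log scale (lm1 in the chunk),
% with mu the fitted slope (lm1$coef[2]):
\begin{align*}
  \ln N(t) &= \ln N_0 + \mu\, t \\
% Doubling time T, defined by N(t + T) = 2 N(t):
  T &= \frac{\ln 2}{\mu}\ \text{years} \;=\; \frac{12 \ln 2}{\mu}\ \text{months} \\
% Reference lines for an 18-month (1.5-year) doubling time on the log10 scale,
% with \bar{x} = mean(date) and \bar{y} = mean(log10(Nucleotides)):
  a &= \frac{\log_{10} 2}{1.5}, \qquad b = \bar{y} - a\,\bar{x} \\
  y_i(t) &= (b + i) + a\, t, \qquad i \in \{-10, \dots, 10\} \setminus \{0\}
\end{align*}
```

Each reference line is thus the 18-month line through the data centroid,
shifted vertically by a whole number of log10 units, which is why the lines
drawn by the final loop in the chunk are parallel and evenly spaced.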

Modified: www/src/mainmatter/introduction.tex
===================================================================
--- www/src/mainmatter/introduction.tex	2017-08-03 17:57:18 UTC (rev 2082)
+++ www/src/mainmatter/introduction.tex	2017-08-03 18:56:01 UTC (rev 2083)
@@ -1,8 +1,8 @@
 \documentclass{article}
 \input{../config/commontex}
 
-\title{Introduction}
-\author{Lobry, J.R.}
+\title{Introduction to \seqinr{}}
+\author{Pr. Jean R. \textsc{Lobry}}
 
 \usepackage{Sweave}
 \begin{document}
@@ -46,6 +46,8 @@
 \maketitle
 \tableofcontents
 % BEGIN - DO NOT REMOVE THIS LINE
+
+\newpage
 \section{About ACNUC}
 
 \marginpar{
@@ -57,22 +59,13 @@
 \tiny{Cover of ACNUC book vol. 2}
 }
 
-ACNUC\footnote{
-A contraction of ACides NUCl{\'e}iques, that is \emph{NUCleic ACids}
-in french (\url{http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html})}
-was first a database of nucleic acids developed in the early
-80's in the same lab (Lyon, France) that issued \seqinr{}. ACNUC was first published
-as a printed book in two volumes \cite{GautierC1982a, GautierC1982b}
-whose covers are reproduced in margin there. At about the same time, two
-other databases were created, one in the USA (GenBank,
-at Los Alamos and now managed by the NCBI\footnote{National Center for Biotechnology Information}), 
-and another one in Germany
-(created in K{\"o}ln by K. St{\"u}ber). To avoid duplication of efforts at the
-european level, a single repository database was initiated in Germany yielding
-the EMBL\footnote{European Molecular Biology Laboratory} database that moved from K{\"o}ln
-to Heidelberg, and then to its current location at the EBI\footnote{European Bioinformatic
-Institute} near Cambridge. The DDBJ\footnote{DNA Data Bank of Japan} started
-in 1986 at the NIG\footnote{National Institute of Genetics} in Mishima. These three
+ACNUC\footnote{%
+A contraction of ACides NUCl{\'e}iques, that is \emph{NUCleic ACids} in French (\url{http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html})}
+was the first database of nucleic acids developed in the early 1980s in the same lab (Lyon, France) that issued \seqinr{}. ACNUC was published as a printed book in two volumes \cite{GautierC1982a, GautierC1982b} whose covers are reproduced in the margin. At about the same time, two other databases were created, one in the USA (GenBank, at Los Alamos and now managed by the NCBI\footnote{National Center for Biotechnology Information}), and another one in Germany (created in K{\"o}ln by K. St{\"u}ber). To avoid duplication of efforts at the European level, a single repository database was initiated in Germany yielding the EMBL\footnote{European Molecular Biology Laboratory} database that moved from K{\"o}ln to Heidelberg, and then to its current location at the EBI\footnote{%
+European Bioinformatics Institute} 
+near Cambridge. The DDBJ\footnote{%
+DNA Data Bank of Japan} 
+started in 1986 at the NIG\footnote{National Institute of Genetics} in Mishima. These three
 main repository DNA databases are now collaborating to maintain the INSD\footnote{
 International Nucleotide Sequence Database (\url{http://www.insdc.org/})} 
 and are sharing data on a daily basis.
@@ -135,8 +128,8 @@
 \end{Sinput}
 \begin{Soutput}
 [1] "             ****     ACNUC Data Base Content      ****                         " 
-[2] "         GenBank Release 213 (15 April 2016) Last Updated: May 22, 2016"          
-[3] "212,493,047,396 bases; 194,219,757 sequences; 31,530,545 subseqs; 876,736 refers."
+[2] "         GenBank Release 220  (15 June 2017) Last Updated: Jul 15, 2017"          
+[3] "236,647,372,946 bases; 202,357,489 sequences; 45,589,898 subseqs; 930,092 refers."
 [4] "Software M. Gouy, Lab. Biometrie et Biologie Evolutive, Universite Lyon I "       
 \end{Soutput}
 \begin{Sinput}
@@ -144,7 +137,7 @@
  bpbk
 \end{Sinput}
 \begin{Soutput}
-[1] "212,493,047,396"
+[1] "236,647,372,946"
 \end{Soutput}
 \begin{Sinput}
  bpbk <- as.numeric(paste(unlist(strsplit(bpbk, split = ",")), collapse = ""))
@@ -152,11 +145,11 @@
  (widthkm <- widthcm/10^5)
 \end{Sinput}
 \begin{Soutput}
-[1] 18.16159
+[1] 20.22604
 \end{Soutput}
 \end{Schunk}
 
-It would be about 18.2
+It would be about 20.2
 kilometers long in ACNUC book format to print GenBank today (\today). By way of
 comparison, our local university library building\footnote{%
 Université de Lyon, F-69000, Lyon ; Université Lyon 1 ; 
@@ -183,9 +176,9 @@
 The Comprehensive \Rlogo{} Archive Network, CRAN, is a network of servers 
 around the world that store identical, up-to-date, versions of code and documentation 
 for R. At compilation time of this document, there were
-95 
+88 
 mirrors available 
-from 50 countries.
+from 48 countries.
 Please use the CRAN mirror nearest to you to minimize network load; the mirrors are
 listed at \texttt{http://cran.r-project.org/mirrors.html}, and can be directly
 selected with the function \texttt{chooseCRANmirror()}.
@@ -195,8 +188,8 @@
 In the terminology of the \Rlogo{} project \cite{R, RfromR}, this document 
 is a package \emph{vignette}, which means that all code outputs present 
 here were actually obtained by running the code.
-The examples given thereafter were run under \texttt{R version 3.2.4 (2016-03-10)}
-on Tue May 31 18:00:24 2016 with Sweave \cite{Sweave}. There is a section at the end of
+The examples given thereafter were run under \texttt{R version 3.4.1 (2017-06-30)}
+on Thu Aug  3 20:42:51 2017 with Sweave \cite{Sweave}. There is a section at the end of
 each chapter called \textbf{Session Informations} that gives details about
 packages and package versions that were involved\footnote{
 Previous versions of \Rlogo{} and packages are available on CRAN mirrors,
@@ -223,8 +216,7 @@
 
 You need a computer connected to the Internet. First, install \Rlogo{} on your computer.
 There are distributions for Linux, Mac and Windows users
-on the CRAN (\texttt{http://cran.r-project.org}). Then, install the \texttt{ape}, 
-\texttt{ade4} and \texttt{seqinr} packages. This can be done directly in an \Rlogo{} console
+on the CRAN (\texttt{http://cran.r-project.org}). Then, install the \texttt{seqinr} package. This can be done directly in an \Rlogo{} console
 with, for instance, the command \texttt{install.packages("seqinr")}. 
 Last, load the \seqinr{} package with:
 
@@ -241,9 +233,9 @@
  lseqinr()[1:9]
 \end{Sinput}
 \begin{Soutput}
-[1] "a"            "aaa"          "aacost"       "aaindex"     
-[5] "AAstat"       "acnucclose"   "acnucopen"    "al2bp"       
-[9] "alllistranks"
+[1] "a"            "aaa"          "AAstat"       "acnucclose"  
+[5] "acnucopen"    "al2bp"        "alllistranks" "alr"         
+[9] "amb"         
 \end{Soutput}
 \end{Schunk}
 
@@ -293,7 +285,7 @@
 
 Do not re-invent (there's a patent \cite{wheel} on it anyway).
 At the compilation time of this document there were 
-8463
+11163
 contributed packages available. Even if you don't want to be spoon-fed 
 \textit{{\`a} bouche ouverte}, 
 it's not a bad
@@ -349,6 +341,7 @@
 \end{Schunk}
 \includegraphics{../figs/introduction-chargaff}
 \setkeys{Gin}{width=0.8\textwidth}
+\medskip
 
 This is a very specialised graph. The filled areas correspond to non-allowed values because the sum 
 of the four base frequencies cannot exceed 100\%. The white areas correspond to possible values 
@@ -403,11 +396,8 @@
 
 \subsection{Data as fast moving targets}
  
-In research area, data are not always stable. 
-Consider figure 1 from \cite{lobrylncs} which is reproduced here in figure \ref{fig1lncs2004}.
-Data have been updated since then, but we can re-use  the same \Rlogo{}~code\footnote{
-This code was adapted from \url{http://pbil.univ-lyon1.fr/members/lobry/repro/lncs04/}.
-}
+In research, data are not always stable. Consider figure 1 from \cite{lobrylncs}, which is reproduced here in figure~\ref{fig1lncs2004} page~\pageref{fig1lncs2004}. Data have been updated since then, but we can re-use the same \Rlogo{}~code\footnote{%
+This code was adapted from \url{http://pbil.univ-lyon1.fr/members/lobry/repro/lncs04/}.}
 to update the figure:
 
 \setkeys{Gin}{width=\textwidth}
@@ -415,46 +405,41 @@
 \begin{Sinput}
  data <- get.db.growth()
  scale <- 1
-    ltymoore <- 1 # line type for Moore's law
-     date <- data$date
-     Nucleotides <- data$Nucleotides
-     Month <- data$Month
-     plot.default(date, log10(Nucleotides), 
-         main = "Update of Fig. 1 from Lobry (2004) LNCS, 3039:679:\nThe exponential growth of genome sequence data", xlab = "Year", 
-         ylab = "Log10 number of nucleotides", pch = 19, las = 1,
-         cex = scale, cex.axis = scale, cex.lab = scale)
-     abline(lm(log10(Nucleotides) ~ date), lwd = 2)
-     lm1 <- lm(log(Nucleotides) ~ date)
-     mu <- lm1$coef[2]
-     dbt <- log(2)/mu
-     dbt <- 12 * dbt
-     x <- mean(date)
-     y <- mean(log10(Nucleotides))
-     a <- log10(2)/1.5
-     b <- y - a * x
-     lm10 <- lm(log10(Nucleotides) ~ date)
-     for (i in seq(-10, 10, by = 1)) if (i != 0) 
-             abline(coef = c(b + i, a), col = "black", lty = ltymoore)
+ ltymoore <- 1 # line type for Moore's law
+ date <- data$date
+ Nucleotides <- data$Nucleotides
+ Month <- data$Month
+ plot.default(date, log10(Nucleotides), 
+   main = paste("Update of Fig. 1 from Lobry (2004) LNCS, 3039:679:",
+   "The exponential growth of genome sequence data", sep = "\n"), xlab = "Year", 
+   ylab = "Log10 number of nucleotides", pch = 19, las = 1,
+   cex = scale, cex.axis = scale, cex.lab = scale)
+ abline(lm(log10(Nucleotides) ~ date), lwd = 2)
+ lm1 <- lm(log(Nucleotides) ~ date)
+ mu <- lm1$coef[2]
+ dbt <- log(2)/mu
+ dbt <- 12 * dbt
+ x <- mean(date)
+ y <- mean(log10(Nucleotides))
+ a <- log10(2)/1.5
+ b <- y - a * x
+ lm10 <- lm(log10(Nucleotides) ~ date)
+ for (i in seq(-10, 10, by = 1)) if (i != 0) 
+   abline(coef = c(b + i, a), col = "black", lty = ltymoore)
 \end{Sinput}
 \end{Schunk}
 \includegraphics{../figs/introduction-dbg}
 \setkeys{Gin}{width=0.8\textwidth}
 
 
-The doubling time is now 18.8 months.
+The doubling time is now 19.1 months.
 
 
 \begin{figure}
-\begin{center}
+\fbox{\begin{minipage}{\textwidth}
 \includegraphics[width=\textwidth]{../figs/fig1lncs2004}
-\end{center}
-\caption{Screenshot of figure 1 from \cite{lobrylncs}.
-The exponential growth of genomic sequence data mimics Moore's law.
-The source of data is the december 2003 release note (realnote.txt) from the EMBL database
-available at \protect\url{http://www.ebi.ac.uk/}. External lines correspond to what would be expected with
-a doubling time of 18 months. The central line through points is the best least square fit,
-corresponding to a doubling time of 16.9 months.}
-\label{fig1lncs2004}
+\caption{\label{fig1lncs2004} Screenshot of figure 1 from \cite{lobrylncs}. The exponential growth of genomic sequence data mimics \textsc{Moore}'s law. The source of data was the December 2003 release note (\texttt{realnote.txt}) from the EMBL database that was available at \protect\url{http://www.ebi.ac.uk/}. External lines correspond to what would be expected with a doubling time of 18 months. The central line through the points is the best least-squares fit, corresponding here to a doubling time of 16.9 months.}
+\end{minipage}}
 \end{figure}
 
 \subsection{\texttt{Sweave()} and \texttt{xtable()}}
@@ -473,20 +458,25 @@
 This part was compiled under the following \Rlogo{}~environment:
 
 \begin{itemize}\raggedright
-  \item R version 3.2.4 (2016-03-10), \verb|x86_64-apple-darwin13.4.0|
+  \item R version 3.4.1 (2017-06-30), \verb|x86_64-apple-darwin15.6.0|
   \item Locale: \verb|fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8|
+  \item Running under: \verb|macOS Sierra 10.12.5|
+  \item Matrix products: default
+  \item BLAS: \verb|/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib|
+  \item LAPACK: \verb|/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib|
   \item Base packages: base, datasets, graphics, grDevices, grid,
     methods, stats, utils
-  \item Other packages: ade4~1.7-4, ape~3.5, grImport~0.9-0,
-    MASS~7.3-45, seqinr~3.0-11, tseries~0.10-35, XML~3.98-1.4,
+  \item Other packages: ade4~1.7-6, ape~4.1, grImport~0.9-0,
+    MASS~7.3-47, seqinr~3.4-5, tseries~0.10-41, XML~3.98-1.9,
     xtable~1.8-2
-  \item Loaded via a namespace (and not attached): lattice~0.20-33,
-    nlme~3.1-125, quadprog~1.5-5, tools~3.2.4, zoo~1.7-12
+  \item Loaded via a namespace (and not attached): compiler~3.4.1,
+    lattice~0.20-35, nlme~3.1-131, parallel~3.4.1, quadprog~1.5-5,
+    quantmod~0.4-10, tools~3.4.1, TTR~0.23-1, xts~0.9-7, zoo~1.8-0
 \end{itemize}
 There were two compilation steps:
 
 \begin{itemize}
-  \item \Rlogo{} compilation time was: Tue May 31 18:00:39 2016
+  \item \Rlogo{} compilation time was: Thu Aug  3 20:43:01 2017
   \item \LaTeX{} compilation time was: \today
 \end{itemize}