[Seqinr-commits] r1849 - www/src/mainmatter

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Tue May 31 21:29:19 CEST 2016


Author: jeanlobry
Date: 2016-05-31 21:29:18 +0200 (Tue, 31 May 2016)
New Revision: 1849

Modified:
   www/src/mainmatter/getseqacnuc.rnw
   www/src/mainmatter/getseqacnuc.tex
Log:
major update so that it works
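
Most hunks below replace a bare query() call, used for its side effect, with an assignment of its return value. A minimal sketch of that idiom follows, assuming the seqinr package is installed and the ACNUC server is reachable. The helper name count_hits is hypothetical (it is not part of seqinr), and it returns NA when the package or the server is unavailable.

```r
# Hypothetical helper, not part of seqinr: count the sequences matching an
# ACNUC request, keeping every result in a local variable.
count_hits <- function(bank, request) {
  # Without the package there is nothing to query.
  if (!requireNamespace("seqinr", quietly = TRUE)) return(NA_real_)

  # choosebank() returns the bank details invisibly; keep them locally.
  opened <- try(seqinr::choosebank(bank), silent = TRUE)
  if (inherits(opened, "try-error")) return(NA_real_)
  on.exit(try(seqinr::closebank(), silent = TRUE), add = TRUE)

  # Assign the result of query() explicitly, as the revised chunks do.
  hits <- try(seqinr::query("hits", request, virtual = TRUE), silent = TRUE)
  if (inherits(hits, "try-error")) return(NA_real_)
  hits$nelem
}

n <- count_hits("genbank", "sp=felis catus AND t=cds AND NOT k=partial")
```

Wrapping the calls in try() is a choice made for this sketch, so that a missing network connection yields NA instead of an error.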

Modified: www/src/mainmatter/getseqacnuc.rnw
===================================================================
--- www/src/mainmatter/getseqacnuc.rnw	2016-05-31 16:20:47 UTC (rev 1848)
+++ www/src/mainmatter/getseqacnuc.rnw	2016-05-31 19:29:18 UTC (rev 1849)
@@ -75,32 +75,32 @@
 
 
 Now, if you want to work with a given database, say GenBank, just call \texttt{choosebank()}
-with \texttt{"genbank"} as its first argument, the result is saved in the variable
-\texttt{banknameSocket} in the workspace:
+with \texttt{"genbank"} as its first argument:
 
 <<choixbanque2, eval=T>>=
-choosebank("genbank")
-str(banknameSocket)
+mybank <- choosebank("genbank")
+str(mybank)
 closebank()
 @
 
-The components of \texttt{banknameSocket} means that in the database
-called \texttt{\Sexpr{ifelse(exists("banknameSocket"), banknameSocket$bankname, "???")}} at the compilation time
+The components of \texttt{mybank} mean that in the database
+called \texttt{\Sexpr{ifelse(exists("mybank"), mybank$bankname, "???")}} at the compilation time
 of this document there were 
-\texttt{\Sexpr{ifelse(exists("banknameSocket"), formatC(as.integer(banknameSocket$totseqs), big.mark=","), "???")}}
+\texttt{\Sexpr{ifelse(exists("mybank"), formatC(as.integer(mybank$totseqs), big.mark=","), "???")}}
 sequences from
-\texttt{\Sexpr{ifelse(exists("banknameSocket"), formatC(as.integer(banknameSocket$totspecs), big.mark=","), "???")}}
+\texttt{\Sexpr{ifelse(exists("mybank"), formatC(as.integer(mybank$totspecs), big.mark=","), "???")}}
 species and a total of
-\texttt{\Sexpr{ifelse(exists("banknameSocket"), formatC(as.integer(banknameSocket$totkeys), big.mark=","), "???")}}
+\texttt{\Sexpr{ifelse(exists("mybank"), formatC(as.integer(mybank$totkeys), big.mark=","), "???")}}
 keywords. The status of the bank was
-\texttt{\Sexpr{ifelse(exists("banknameSocket"), banknameSocket$status, "???")}}, 
+\texttt{\Sexpr{ifelse(exists("mybank"), mybank$status, "???")}}, 
 and the release information was
-\texttt{\Sexpr{ifelse(exists("banknameSocket"), banknameSocket$release, "???")}}.
+\texttt{\Sexpr{ifelse(exists("mybank"), mybank$release, "???")}}.
 For specialized databases, some relevant information is also given in the
 \texttt{details} component.
 
 As from \seqinr~1.0-3, the result of the \texttt{choosebank()} function is automatically
-stored in a global variable named \texttt{banknameSocket}, so that if no socket argument
+stored in a variable named \texttt{banknameSocket} in the \texttt{.seqinrEnv}
+environment, so that if no socket argument
 is given to the \texttt{query()} function, the last opened database will be used by default
 for your requests.
 This is just a matter of convenience so that you don't have to explicitly specify the details of the
@@ -108,7 +108,7 @@
 full control of the process since \texttt{choosebank()} returns (invisibly) all the
 required details. Opening many databases \emph{simultaneously} is no trouble.
 You are just limited by the number of simultaneous connections your build of \Rlogo{}~is
-allowed\footnote{
+allowed\footnote{%
 As from \Rlogo{}~2.4.0 the maximum number of open connections has been increased from
 50 to 128. Note also that 
 there is a very convenient function called \texttt{closeAllConnections()} in the \Rlogo{}~base package if
@@ -130,13 +130,14 @@
   if(inherits(bkopenres, "try-error")){
     ntaxa[i] <- NA
   } else {
-    ntaxa[i] <- as.numeric(banknameSocket$totspecs)
+    ntaxa[i] <- as.numeric(bkopenres$totspecs)
     closebank()
   }
 }
 names(ntaxa) <- banks
 @ 
 <<plottaxaperbank,fig=T,eval=T,width=6,height=8>>=
+ntaxa <- ntaxa[!is.na(ntaxa)]
 dotchart(log10(ntaxa[order(ntaxa)]), pch = 19,
 main = "Number of taxa in available databases",
 xlab = "Log10(number of taxa)")
@@ -145,10 +146,13 @@
 \section{Make your query}
 
 For this section, set up the default bank to GenBank, so that you don't have 
-to provide the sockets details for the \texttt{query()} function:
+to provide the socket details for the \texttt{query()} function. We set the
+\texttt{verbose} argument to \texttt{TRUE} just for fun\footnote{%
+This option is, however, useful for troubleshooting.}; it is not
+really useful here:
 
 <<settogenbankbeforequery, eval=T>>=
-choosebank("genbank")
+choosebank("genbank", verbose = TRUE)
 @
 
 Then, you have to say what you want, that is to compose a query
@@ -164,7 +168,7 @@
 \tiny{\textit{Felis catus}. Source: wikipedia}}
 
 <<query1,eval=T>>=
-query("completeCatsCDS", "sp=felis catus AND t=cds AND NOT k=partial")
+completeCatsCDS <- query("completeCatsCDS", "sp=felis catus AND t=cds AND NOT k=partial")
 @
 
 Now, there is in the workspace an object called \texttt{completeCatsCDS}, which
@@ -185,17 +189,13 @@
 in a given database is given by the function \texttt{getType()}, for example the list
 of available subsequences in GenBank is given in table \ref{genbank}.
 
-%
-% The genbank.tex file needs manual editing to remove LaTeX special characters, here "_"
-%
-<<xtablegenbank, fig = FALSE, echo = FALSE,eval=FALSE>>=
+<<xtablegenbank, fig = FALSE, echo = FALSE,eval=F>>=
 choosebank("genbank") -> bank
-tmp <- getType(bank$s)
-tmp <- t(data.frame(tmp))
+tmp <- getType(bank$socket)
 row.names(tmp)<-1:nrow(tmp)
 names(tmp)<-NULL
 colnames(tmp) <- c("Type","Description")
-print(xtable(tmp, digits = rep(0,3), caption = paste("Available subsequences in", bank$bankname), label = "genbank"), 
+print(xtable(tmp, digits = rep(0,3), caption = paste("Available subsequences in", bank$release), label = "genbank"), 
 file = "../tables/genbank.tex")
 @
 \input{../tables/genbank.tex}
@@ -220,7 +220,7 @@
 to get only the list of sequences that were published in 2004:
 
 <<query2,eval=T>>=
-query("ccc2004", "completeCatsCDS AND y=2004")
+ccc2004 <- query("ccc2004", "completeCatsCDS AND y=2004")
 length(ccc2004$req)
 ccc2004$nelem
 @
@@ -233,16 +233,12 @@
 with many elements, for instance :
 
 <<queryvirtual,eval=T>>=
-query("allcds", "t=cds", virtual = TRUE)
+allcds <- query("allcds", "t=cds", virtual = TRUE)
 allcds$nelem
 @
 
 There are therefore \texttt{\Sexpr{ifelse(exists("allcds"), formatC(as.integer(allcds$nelem), big.mark=","), "???")}} coding
-sequences in this version of GenBank\footnote{
-which is stored in the \texttt{release} component of the object \texttt{banknameSocket}
-and current value is today (\today): \texttt{banknameSocket\$release = 
-\Sexpr{ifelse(exists("banknameSocket"), banknameSocket$release, "???")}}.
-}. 
+sequences in this version of GenBank. 
 It would take a long time to get all the information for the elements
 of this list, so we have set the parameter \texttt{virtual} to \texttt{TRUE} and the \texttt{req}
 component of the list has not been documented:
@@ -251,14 +247,14 @@
 allcds$req
 @
 
-However, the list can still be re-used\footnote{
+However, the list can still be re-used\footnote{%
 of course, as long as the socket connection with the server has not been lost: the details of virtual lists are
 known only by the server.}, 
 for instance we may extract from this list all the sequences
 from, say, \textit{Mycoplasma genitalium}:
 
 <<chtouille,eval=T>>=
-query("small", "allcds AND sp=mycoplasma genitalium", virtual = TRUE)
+small <- query("small", "allcds AND sp=mycoplasma genitalium", virtual = TRUE)
 small$nelem
 @
 
@@ -267,7 +263,7 @@
 virtual list:
 
 <<chtouille2,eval=T>>=
-query("small", "allcds et sp=mycoplasma genitalium")
+small <- query("small", "allcds AND sp=mycoplasma genitalium")
 getName(small$req[1:10])
 @
 
@@ -277,30 +273,30 @@
 \begin{description}
 \item[\textbf{Man.}] How many sequences are available for our species?
 <<man, eval=T>>=
-query("man","sp=homo sapiens",virtual=T)
+man <- query("man","sp=homo sapiens",virtual=T)
 man$nelem
 @
 There are \texttt{\Sexpr{ifelse(exists("man"), formatC(man$nelem, big.mark=","), "???")}} sequences from \textit{Homo sapiens}.
 
 \item[\textbf{Sex.}] How many sequences are annotated with a keyword starting with sex?
 <<sex, eval=T>>=
-query("sex","k=sex@",virtual=T)
+sex <- query("sex","k=sex@",virtual=T)
 sex$nelem
 @
 There are \texttt{\Sexpr{ifelse(exists("sex"), formatC(sex$nelem, big.mark=","), "???")}} such sequences.
 
 \item[\textbf{tRNA.}] How many complete tRNA sequences are available?
 <<trnacplt, eval=T>>=
-query("trna","t=trna AND NOT k=partial",virtual=T)
+trna <- query("trna","t=trna AND NOT k=partial",virtual=T)
 trna$nelem
 @
 There are \texttt{\Sexpr{ifelse(exists("trna"), formatC(trna$nelem, big.mark=","), "???")}} complete tRNA sequences.
 
 \item[\textbf{Nature vs. Science.}] In which journal were more sequences published?
 <<natvsscience, eval=T>>= 
-query("nature","j=nature",virtual=T)
+nature <- query("nature","j=nature",virtual=T)
 nature$nelem
-query("science","j=science",virtual=T)
+science <- query("science","j=science",virtual=T)
 science$nelem
 @
 There are \texttt{\Sexpr{ifelse(exists("nature"), formatC(nature$nelem, big.mark=","), "???")}} sequences published
@@ -315,23 +311,23 @@
 
 \item[\textbf{Smith.}] How many sequences have Smith (last name) as author?
 <<smith, eval=T>>=
-query("smith","au=smith",virtual=T)
+smith <- query("smith","au=smith",virtual=T)
 smith$nelem
 @
 There are \texttt{\Sexpr{ifelse(exists("smith"), formatC(smith$nelem, big.mark=","), "???")}} such sequences.
 
 \item[\textbf{YK2.}] How many sequences were published after year 2000 (included)?
 <<yk2, eval=T>>=
-query("yk2","y>2000",virtual=T)
+yk2 <- query("yk2","y>2000",virtual=T)
 yk2$nelem
 @
 There are \texttt{\Sexpr{ifelse(exists("yk2"), formatC(yk2$nelem, big.mark=","), "???")}} sequences published after year 2000.
 
 \item[\textbf{Organelle contest.}] Do we have more sequences from chloroplast genomes or from mitochondrion genomes?
 <<organelles, eval=T>>=
-query("chloro","o=chloroplast",virtual=T)
+chloro <- query("chloro","o=chloroplast",virtual=T)
 chloro$nelem
-query("mito","o=mitochondrion",virtual=T)
+mito <- query("mito","o=mitochondrion",virtual=T)
 mito$nelem
 @
 There are \texttt{\Sexpr{ifelse(exists("chloro"), formatC(chloro$nelem, big.mark=","), "???")}} sequences from
@@ -352,15 +348,6 @@
 
 \subsection{Introduction}
 
-There are two functions to get the sequences. The first one, 
-\texttt{getSequence()}, uses regular socket connections, the
-second one, \texttt{extractseqs()}, uses zlib compressed sockets,
-which is faster but the function is experimental (details in
-chapter \ref{extractseqs} page \pageref{extractseqs}).
-
-\subsection{Extacting sequences with \texttt{getSequence()}}
-
-
 For this section we set up the bank to \texttt{emblTP} which is a frozen
 subset of the EMBL database to allow for the reproducibility of results.
 
@@ -372,7 +359,7 @@
 coding sequences from \textit{Felis catus} :
 
 <<requeryccc, fig=F, eval=T>>=
-query("completeCatsCDS", "sp=felis catus AND t=cds AND NOT k=partial")
+completeCatsCDS <- query("completeCatsCDS", "sp=felis catus AND t=cds AND NOT k=partial")
 (nseq <- completeCatsCDS$nelem)
 @
 
@@ -410,7 +397,7 @@
 instance the following coding sequence from sequence \texttt{AE003734}:
 
 <<transann,fig=F,eval=T>>=
-query("trs","N=AE003734.PE35")
+trs <- query("trs","N=AE003734.PE35")
 getAnnot(trs$req[[1]]) -> annots
 cat(annots, sep="\n")
 @
@@ -421,7 +408,7 @@
 query with the sequence name:
 
 <<transplicing1, eval=T>>=
-query("transspliced", "N=AE003734.PE35")
+transspliced <- query("transspliced", "N=AE003734.PE35")
 length(transspliced$req)
 getName(transspliced$req[[1]])
 @
@@ -436,7 +423,7 @@
 @
 
 All the complex trans-splicing operations have been done here. You can check that there are no
-in-frame stop codons\footnote{
+in-frame stop codons\footnote{%
 Stop codons are represented by the character \texttt{*} when translated into protein.} 
 with the \texttt{getTrans()} function to translate this coding sequence into protein:
 
@@ -473,7 +460,7 @@
 Consider the following CDS from \texttt{M19233}:
 
 <<multi,fig=F>>=
-query("multi", "AC=M19233 AND T=CDS")
+multi <- query("multi", "AC=M19233 AND T=CDS")
 cat(getAnnot(multi$req[[1]]), sep = "\n")
 @ 
 
@@ -486,12 +473,15 @@
 @ 
 
 There is no stop codon here because the sequence is partial.
+If you are experiencing a strong closure issue problem here,
+just close the bank:
 
-
-<<closebanembltp,fig=F>>=
+<<closebanembltp,fig=F,echo=T>>=
 closebank()
 @ 
 
+\noindent Feeling better now ?
+
 \SweaveInput{../config/sessionInfo.rnw}
 
 % END - DO NOT REMOVE THIS LINE
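
The plottaxaperbank hunk above adds a line that drops NA entries before plotting. A small base-R sketch of why that matters is given here. The counts are made-up stand-ins for the ntaxa vector built in the loop, with NA marking a bank that could not be opened.

```r
# Hypothetical counts standing in for the real ntaxa vector;
# NA marks a bank whose choosebank() call failed.
ntaxa <- c(genbank = 1585615, embl = NA, refseq = 120000, mito = 9000)

# Drop the banks that could not be opened, as the revised chunk does.
ntaxa <- ntaxa[!is.na(ntaxa)]

# Sort by increasing count and move to a log10 scale for plotting.
sorted <- ntaxa[order(ntaxa)]
log_taxa <- log10(sorted)

# Draw the chart only in an interactive session.
if (interactive()) {
  dotchart(log_taxa, pch = 19,
           main = "Number of taxa in available databases",
           xlab = "Log10(number of taxa)")
}
```

Filtering first keeps the failed banks out of the chart entirely, so only banks with an actual count are drawn.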

Modified: www/src/mainmatter/getseqacnuc.tex
===================================================================
--- www/src/mainmatter/getseqacnuc.tex	2016-05-31 16:20:47 UTC (rev 1848)
+++ www/src/mainmatter/getseqacnuc.tex	2016-05-31 19:29:18 UTC (rev 1849)
@@ -81,10 +81,10 @@
  [9] "hovergen"        "hogenom5"        "hogenom5dna"     "hogenom4"       
 [13] "hogenom4dna"     "homolens"        "homolensdna"     "hobacnucl"      
 [17] "hobacprot"       "phever2"         "phever2dna"      "refseq"         
-[21] "greviews"        "bacterial"       "protozoan"       "ensprotists"    
-[25] "ensfungi"        "ensmetazoa"      "ensplants"       "ensemblbacteria"
-[29] "mito"            "polymorphix"     "emglib"          "taxobacgen"     
-[33] "refseqViruses"  
+[21] "greviews"        "bacterial"       "archaeal"        "protozoan"      
+[25] "ensprotists"     "ensfungi"        "ensmetazoa"      "ensplants"      
+[29] "ensemblbacteria" "mito"            "polymorphix"     "emglib"         
+[33] "refseqViruses"   "ribodb"          "taxodb"         
 \end{Soutput}
 \end{Schunk}
  
@@ -106,9 +106,9 @@
 2    embl     on
 3 emblwgs     on
                                                                          info
-1              GenBank Release 201 (15 April 2014) Last Updated: Jun  2, 2014
-2 EMBL Nucleotide Archive Release 119 (March 2014) Last Updated: Jun  1, 2014
-3               EMBL Whole Genome Shotgun sequences Release 119  (March 2014)
+1              GenBank Release 213 (15 April 2016) Last Updated: May 22, 2016
+2 EMBL Nucleotide Archive Release 127 (March 2016) Last Updated: May 21, 2016
+3                EMBL Whole Genome Shotgun sequences Release 127 (March 2016)
 \end{Soutput}
 \end{Schunk}
 
@@ -149,13 +149,12 @@
 
 
 Now, if you want to work with a given database, say GenBank, just call \texttt{choosebank()}
-with \texttt{"genbank"} as its first argument, the result is saved in the variable
-\texttt{banknameSocket} in the workspace:
+with \texttt{"genbank"} as its first argument:
 
 \begin{Schunk}
 \begin{Sinput}
- choosebank("genbank")
- str(banknameSocket)
+ mybank <- choosebank("genbank")
+ str(mybank)
 \end{Sinput}
 \begin{Soutput}
 List of 9
@@ -163,67 +162,36 @@
   .. ..- attr(*, "conn_id")=<externalptr> 
  $ bankname: chr "genbank"
  $ banktype: chr "GENBANK"
- $ totseqs : num 1.91e+08
- $ totspecs: num 1242014
- $ totkeys : num 43531291
- $ release : chr "         GenBank Release 201 (15 April 2014) Last Updated: Jun  2, 2014"
+ $ totseqs : num 2.26e+08
+ $ totspecs: num 1585615
+ $ totkeys : num 69548858
+ $ release : chr "         GenBank Release 213 (15 April 2016) Last Updated: May 22, 2016"
  $ status  :Class 'AsIs'  chr "on"
- $ details : chr [1:4] "             ****     ACNUC Data Base Content      ****                         " "         GenBank Release 201 (15 April 2014) Last Updated: Jun  2, 2014" "160,671,579,040 bases; 172,482,713 sequences; 18,666,226 subseqs; 786,167 refers." "Software by M. Gouy, Lab. Biometrie et Biologie Evolutive, Universite Lyon I "
+ $ details : chr [1:4] "             ****     ACNUC Data Base Content      ****                         " "         GenBank Release 213 (15 April 2016) Last Updated: May 22, 2016" "212,493,047,396 bases; 194,219,757 sequences; 31,530,545 subseqs; 876,736 refers." "Software M. Gouy, Lab. Biometrie et Biologie Evolutive, Universite Lyon I "
 \end{Soutput}
 \begin{Sinput}
  closebank()
 \end{Sinput}
 \end{Schunk}
 
-The components of \texttt{banknameSocket} means that in the database
+The components of \texttt{mybank} mean that in the database
 called \texttt{genbank} at the compilation time
 of this document there were 
-\texttt{191,148,940}
+\texttt{225,750,303}
 sequences from
-\texttt{1,242,014}
+\texttt{1,585,615}
 species and a total of
-\texttt{43,531,291}
+\texttt{69,548,858}
 keywords. The status of the bank was
 \texttt{on}, 
 and the release information was
-\texttt{         GenBank Release 201 (15 April 2014) Last Updated: Jun  2, 2014}.
+\texttt{         GenBank Release 213 (15 April 2016) Last Updated: May 22, 2016}.
 For specialized databases, some relevant information is also given in the
-\texttt{details} component, for instance:
+\texttt{details} component.
 
-\begin{Schunk}
-\begin{Sinput}
- choosebank("taxobacgen")
- cat(banknameSocket$details, sep = "\n")
-\end{Sinput}
-\begin{Soutput}
-               ****     ACNUC Data Base Content      ****
-                 TaxoBacGen Rel. 7 (September 2005)
-1,151,149,763 bases; 254,335 sequences; 847,767 subseqs; 63,879 refers.
-	Data compiled from GenBank by Gregory Devulder 
-        Laboratoire de Biometrie & Biologie Evolutive, Univ Lyon I
-------------------------------
-This database is a taxonomic genomic database. 
-It results from an expertise crossing the data nomenclature database DSMZ
-[http://www.dsmz.de/species/bacteria.htm Deutsche Sammlung von
-Mikroorganismen und Zellkulturen GmbH, Braunschweig, Germany]
-and GenBank. 
-- Only contains sequences described under species present in 
-Bacterial Nomenclature Up-to-date.
-- Names of species and genus validly published according to the
-Bacteriological Code (names with standing in nomenclature) is 
-added in field "DEFINITION".
-- A keyword "type strain" is added in field "FEATURES/source/strain" in
-GenBank format definition to easyly identify Type Strain.
-Taxobacgen is a genomic database designed for studies based on a strict
-respect of up-to-date nomenclature and taxonomy.
-\end{Soutput}
-\begin{Sinput}
- closebank()
-\end{Sinput}
-\end{Schunk}
-
 As from \seqinr~1.0-3, the result of the \texttt{choosebank()} function is automatically
-stored in a global variable named \texttt{banknameSocket}, so that if no socket argument
+stored in a variable named \texttt{banknameSocket} in the \texttt{.seqinrEnv}
+environment, so that if no socket argument
 is given to the \texttt{query()} function, the last opened database will be used by default
 for your requests.
 This is just a matter of convenience so that you don't have to explicitly specify the details of the
@@ -231,7 +199,7 @@
 full control of the process since \texttt{choosebank()} returns (invisibly) all the
 required details. Opening many databases \emph{simultaneously} is no trouble.
 You are just limited by the number of simultaneous connections your build of \Rlogo{}~is
-allowed\footnote{
+allowed\footnote{%
 As from \Rlogo{}~2.4.0 the maximum number of open connections has been increased from
 50 to 128. Note also that 
 there is a very convenient function called \texttt{closeAllConnections()} in the \Rlogo{}~base package if
@@ -254,7 +222,7 @@
    if(inherits(bkopenres, "try-error")){
      ntaxa[i] <- NA
    } else {
-     ntaxa[i] <- as.numeric(banknameSocket$totspecs)
+     ntaxa[i] <- as.numeric(bkopenres$totspecs)
      closebank()
    }
  }
@@ -263,6 +231,7 @@
 \end{Schunk}
 \begin{Schunk}
 \begin{Sinput}
+ ntaxa <- ntaxa[!is.na(ntaxa)]
  dotchart(log10(ntaxa[order(ntaxa)]), pch = 19,
  main = "Number of taxa in available databases",
  xlab = "Log10(number of taxa)")
@@ -273,12 +242,37 @@
 \section{Make your query}
 
 For this section, set up the default bank to GenBank, so that you don't have 
-to provide the sockets details for the \texttt{query()} function:
+to provide the socket details for the \texttt{query()} function. We set the
+\texttt{verbose} argument to \texttt{TRUE} just for fun\footnote{%
+This option is, however, useful for troubleshooting.}; it is not
+really useful here:
 
 \begin{Schunk}
 \begin{Sinput}
- choosebank("genbank")
+ choosebank("genbank", verbose = TRUE)
 \end{Sinput}
+\begin{Soutput}
+Verbose mode is on, parameter values are:
+  bank =  "genbank" 
+  host =  "pbil.univ-lyon1.fr" 
+  port =  5558 
+  timeout =  5 seconds 
+  infobank =  FALSE 
+  tagbank =  NA 
+I'm ckecking that sockets are available on this build of R...
+... yes, sockets are available on this build of R.
+I'm trying to open the socket connection...
+... yes, I was able to open the socket connection.
+I'm trying to read answer from server...
+... answer from server is: OK acnuc socket started 
+clientid(): sending clientid&id=seqinr_3.0-11 
+... answer from server is: code=0 
+parser.socket received: -->code=0<--
+I'm trying to open the bank from server...
+... and everything is OK up to now.
+I'm trying to get information on the bank...
+... and everything is OK up to now.
+\end{Soutput}
 \end{Schunk}
 
 Then, you have to say what you want, that is to compose a query
@@ -295,7 +289,7 @@
 
 \begin{Schunk}
 \begin{Sinput}
- query("completeCatsCDS", "sp=felis catus AND t=cds AND NOT k=partial")
+ completeCatsCDS <- query("completeCatsCDS", "sp=felis catus AND t=cds AND NOT k=partial")
 \end{Sinput}
 \end{Schunk}
 
@@ -324,9 +318,6 @@
 in a given database is given by the function \texttt{getType()}, for example the list
 of available subsequences in GenBank is given in table \ref{genbank}.
 
-%
-% The genbank.tex file needs manual editing to remove LaTeX special characters, here "_"
-%
 \input{../tables/genbank.tex}
 
 
@@ -355,7 +346,7 @@
 
 \begin{Schunk}
 \begin{Sinput}
- query("ccc2004", "completeCatsCDS AND y=2004")
+ ccc2004 <- query("ccc2004", "completeCatsCDS AND y=2004")
  length(ccc2004$req)
 \end{Sinput}
 \begin{Soutput}
@@ -378,20 +369,16 @@
 
 \begin{Schunk}
 \begin{Sinput}
- query("allcds", "t=cds", virtual = TRUE)
+ allcds <- query("allcds", "t=cds", virtual = TRUE)
  allcds$nelem
 \end{Sinput}
 \begin{Soutput}
-[1] 20580107
+[1] 34260201
 \end{Soutput}
 \end{Schunk}
 
-There are therefore \texttt{20,580,107} coding
-sequences in this version of GenBank\footnote{
-which is stored in the \texttt{release} component of the object \texttt{banknameSocket}
-and current value is today (\today): \texttt{banknameSocket\$release = 
-         GenBank Release 201 (15 April 2014) Last Updated: Jun  2, 2014}.
-}. 
+There are therefore \texttt{34,260,201} coding
+sequences in this version of GenBank. 
 It would take a long time to get all the information for the elements
 of this list, so we have set the parameter \texttt{virtual} to \texttt{TRUE} and the \texttt{req}
 component of the list has not been documented:
@@ -405,7 +392,7 @@
 \end{Soutput}
 \end{Schunk}
 
-However, the list can still be re-used\footnote{
+However, the list can still be re-used\footnote{%
 of course, as long as the socket connection with the server has not been lost: the details of virtual lists are
 known only by the server.}, 
 for instance we may extract from this list all the sequences
@@ -413,21 +400,21 @@
 
 \begin{Schunk}
 \begin{Sinput}
- query("small", "allcds AND sp=mycoplasma genitalium", virtual = TRUE)
+ small <- query("small", "allcds AND sp=mycoplasma genitalium", virtual = TRUE)
  small$nelem
 \end{Sinput}
 \begin{Soutput}
-[1] 3346
+[1] 3382
 \end{Soutput}
 \end{Schunk}
 
-There are then \texttt{3,346} elements in
+There are then \texttt{3,382} elements in
 the list \texttt{small}, so that we can safely repeat the previous query without asking for a
 virtual list:
 
 \begin{Schunk}
 \begin{Sinput}
- query("small", "allcds et sp=mycoplasma genitalium")
+ small <- query("small", "allcds AND sp=mycoplasma genitalium")
  getName(small$req[1:10])
 \end{Sinput}
 \begin{Soutput}
@@ -443,59 +430,59 @@
 \item[\textbf{Man.}] How many sequences are available for our species?
 \begin{Schunk}
 \begin{Sinput}
- query("man","sp=homo sapiens",virtual=T)
+ man <- query("man","sp=homo sapiens",virtual=T)
  man$nelem
 \end{Sinput}
 \begin{Soutput}
-[1] 20581455
+[1] 23519997
 \end{Soutput}
 \end{Schunk}
-There are \texttt{20,581,455} sequences from \textit{Homo sapiens}.
+There are \texttt{23,519,997} sequences from \textit{Homo sapiens}.
 
 \item[\textbf{Sex.}] How many sequences are annotated with a keyword starting with sex?
 \begin{Schunk}
 \begin{Sinput}
- query("sex","k=sex@",virtual=T)
+ sex <- query("sex","k=sex@",virtual=T)
  sex$nelem
 \end{Sinput}
 \begin{Soutput}
-[1] 2977
+[1] 3577
 \end{Soutput}
 \end{Schunk}
-There are \texttt{2,977} such sequences.
+There are \texttt{3,577} such sequences.
 
 \item[\textbf{tRNA.}] How many complete tRNA sequences are available?
 \begin{Schunk}
 \begin{Sinput}
- query("trna","t=trna AND NOT k=partial",virtual=T)
+ trna <- query("trna","t=trna AND NOT k=partial",virtual=T)
  trna$nelem
 \end{Sinput}
 \begin{Soutput}
-[1] 1260401
+[1] 1810833
 \end{Soutput}
 \end{Schunk}
-There are \texttt{1,260,401} complete tRNA sequences.
+There are \texttt{1,810,833} complete tRNA sequences.
 
 \item[\textbf{Nature vs. Science.}] In which journal were more sequences published?
 \begin{Schunk}
 \begin{Sinput}
- query("nature","j=nature",virtual=T)
+ nature <- query("nature","j=nature",virtual=T)
  nature$nelem
 \end{Sinput}
 \begin{Soutput}
-[1] 2619977
+[1] 2645373
 \end{Soutput}
 \begin{Sinput}
- query("science","j=science",virtual=T)
+ science <- query("science","j=science",virtual=T)
  science$nelem
 \end{Sinput}
 \begin{Soutput}
-[1] 2227746
+[1] 2244003
 \end{Soutput}
 \end{Schunk}
-There are \texttt{2,619,977} sequences published
+There are \texttt{2,645,373} sequences published
 in \textit{Nature} and
-\texttt{2,227,746} sequences published in
+\texttt{2,244,003} sequences published in
 \textit{Science}, so that the winner is 
 \textit{Nature}.
 
@@ -506,47 +493,47 @@
 \item[\textbf{Smith.}] How many sequences have Smith (last name) as author?
 \begin{Schunk}
 \begin{Sinput}
- query("smith","au=smith",virtual=T)
+ smith <- query("smith","au=smith",virtual=T)
  smith$nelem
 \end{Sinput}
 \begin{Soutput}
-[1] 6239128
+[1] 6433901
 \end{Soutput}
 \end{Schunk}
-There are \texttt{6,239,128} such sequences.
+There are \texttt{6,433,901} such sequences.
 
 \item[\textbf{YK2.}] How many sequences were published after year 2000 (included)?
 \begin{Schunk}
 \begin{Sinput}
- query("yk2","y>2000",virtual=T)
+ yk2 <- query("yk2","y>2000",virtual=T)
  yk2$nelem
 \end{Sinput}
 \begin{Soutput}
-[1] 160690121
+[1] 182398606
 \end{Soutput}
 \end{Schunk}
-There are \texttt{160,690,121} sequences published after year 2000.
+There are \texttt{182,398,606} sequences published after year 2000.
 
 \item[\textbf{Organelle contest.}] Do we have more sequences from chloroplast genomes or from mitochondrion genomes?
 \begin{Schunk}
 \begin{Sinput}
- query("chloro","o=chloroplast",virtual=T)
+ chloro <- query("chloro","o=chloroplast",virtual=T)
  chloro$nelem
 \end{Sinput}
 \begin{Soutput}
-[1] 644245
+[1] 870722
 \end{Soutput}
 \begin{Sinput}
- query("mito","o=mitochondrion",virtual=T)
+ mito <- query("mito","o=mitochondrion",virtual=T)
  mito$nelem
 \end{Sinput}
 \begin{Soutput}
-[1] 2235491
+[1] 3479548
 \end{Soutput}
 \end{Schunk}
-There are \texttt{644,245} sequences from
+There are \texttt{870,722} sequences from
 chloroplast genomes and
-\texttt{2,235,491} sequences from mitochondrion
+\texttt{3,479,548} sequences from mitochondrion
 genomes, so that the winner is 
 mitochondrion.
 
@@ -564,15 +551,6 @@
 
 \subsection{Introduction}
 
-There are two functions to get the sequences. The first one, 
-\texttt{getSequence()}, uses regular socket connections, the
-second one, \texttt{extractseqs()}, uses zlib compressed sockets,
-which is faster but the function is experimental (details in
-chapter \ref{extractseqs} page \pageref{extractseqs}).
-
-\subsection{Extacting sequences with \texttt{getSequence()}}
-
-
 For this section we set up the bank to \texttt{emblTP} which is a frozen
 subset of the EMBL database to allow for the reproducibility of results.
 
@@ -587,7 +565,7 @@
 
 \begin{Schunk}
 \begin{Sinput}
- query("completeCatsCDS", "sp=felis catus AND t=cds AND NOT k=partial")
+ completeCatsCDS <- query("completeCatsCDS", "sp=felis catus AND t=cds AND NOT k=partial")
  (nseq <- completeCatsCDS$nelem)
 \end{Sinput}
 \begin{Soutput}
@@ -647,7 +625,7 @@
 
 \begin{Schunk}
 \begin{Sinput}
- query("trs","N=AE003734.PE35")
+ trs <- query("trs","N=AE003734.PE35")
  getAnnot(trs$req[[1]]) -> annots
  cat(annots, sep="\n")
 \end{Sinput}
@@ -684,7 +662,7 @@
 
 \begin{Schunk}
 \begin{Sinput}
- query("transspliced", "N=AE003734.PE35")
+ transspliced <- query("transspliced", "N=AE003734.PE35")
  length(transspliced$req)
 \end{Sinput}
 \begin{Soutput}
@@ -715,7 +693,7 @@
 \end{Schunk}
 
 All the complex trans-splicing operations have been done here. You can check that there are no
-in-frame stop codons\footnote{
+in-frame stop codons\footnote{%
 Stop codons are represented by the character \texttt{*} when translated into protein.} 
 with the \texttt{getTrans()} function to translate this coding sequence into protein:
 
@@ -774,7 +752,7 @@
 
 \begin{Schunk}
 \begin{Sinput}
- query("multi", "AC=M19233 AND T=CDS")
+ multi <- query("multi", "AC=M19233 AND T=CDS")
  cat(getAnnot(multi$req[[1]]), sep = "\n")
 \end{Sinput}
 \begin{Soutput}
@@ -834,15 +812,18 @@
 \end{Schunk}
 
 There is no stop codon here because the sequence is partial.
+If you are experiencing a strong closure issue problem here,
+just close the bank:
 
-
 \begin{Schunk}
 \begin{Sinput}
  closebank()
 \end{Sinput}
 \end{Schunk}
 
+\noindent Feeling better now ?
 
+
 \section*{Session Informations}
 
 \begin{scriptsize}
@@ -850,20 +831,20 @@
 This part was compiled under the following \Rlogo{}~environment:
 
 \begin{itemize}\raggedright
-  \item R version 3.1.0 (2014-04-10), \verb|x86_64-apple-darwin13.1.0|
+  \item R version 3.2.4 (2016-03-10), \verb|x86_64-apple-darwin13.4.0|
   \item Locale: \verb|fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8|
   \item Base packages: base, datasets, graphics, grDevices, grid,
     methods, stats, utils
-  \item Other packages: ade4~1.6-2, ape~3.1-2, grImport~0.9-0,
-    MASS~7.3-31, seqinr~3.0-11, tseries~0.10-32, XML~3.98-1.1,
-    xtable~1.7-3
-  \item Loaded via a namespace (and not attached): lattice~0.20-29,
-    nlme~3.1-117, quadprog~1.5-5, tools~3.1.0, zoo~1.7-11
+  \item Other packages: ade4~1.7-4, ape~3.5, grImport~0.9-0,
+    MASS~7.3-45, seqinr~3.0-11, tseries~0.10-35, XML~3.98-1.4,
+    xtable~1.8-2
+  \item Loaded via a namespace (and not attached): lattice~0.20-33,
+    nlme~3.1-125, quadprog~1.5-5, tools~3.2.4, zoo~1.7-12
 \end{itemize}
 There were two compilation steps:
 
 \begin{itemize}
-  \item \Rlogo{} compilation time was: Fri Jun  6 18:11:38 2014
+  \item \Rlogo{} compilation time was: Tue May 31 21:23:09 2016
   \item \LaTeX{} compilation time was: \today
 \end{itemize}
 


