[adegenet-commits] r896 - in pkg: R inst/doc

Tue May 31 15:21:27 CEST 2011

Author: jombart
Date: 2011-05-31 15:21:27 +0200 (Tue, 31 May 2011)
New Revision: 896

Modified:
   pkg/R/glHandle.R
   pkg/inst/doc/adegenet-genomics.Rnw
   pkg/inst/doc/adegenet-genomics.tex
Log:
Fixed a minor issue when subsetting SNPs in genlight: individuals' names were lost.


Modified: pkg/R/glHandle.R
===================================================================

--- pkg/R/glHandle.R	2011-05-31 13:07:26 UTC (rev 895)
+++ pkg/R/glHandle.R	2011-05-31 13:21:27 UTC (rev 896)
@@ -70,6 +70,7 @@
         return(x)
     } else { # need to subset SNPs
         old.other <- other(x)
+        old.ind.names <- indNames(x)
 
         ## handle loc.names, chromosome and position
         new.loc.names <- locNames(x)[j]
@@ -77,7 +78,8 @@
         new.position <- position(x)[j]
         new.gen <- lapply(x@gen, function(e) e[j])
         ##x <- as.matrix(x)[, j, drop=FALSE] # maybe need to process one row at a time
-        x <- new("genlight", gen=new.gen, pop=ori.pop, ploidy=ori.ploidy, loc.names=new.loc.names,
+        x <- new("genlight", gen=new.gen, pop=ori.pop, ploidy=ori.ploidy,
+                 ind.names=old.ind.names, loc.names=new.loc.names,
                  chromosome=new.chr, position=new.position, other=old.other)
     }
 

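Note: the hunk above forwards ind.names when the object is rebuilt with new().
The sketch below illustrates why that matters, using a hypothetical toy S4
class ("toyGeno", with made-up helper functions) rather than adegenet's actual
genlight code: any slot that is not passed explicitly to new() falls back to
its default, so labels are silently dropped.

```r
library(methods)

# Hypothetical toy class, NOT adegenet's genlight: it only mimics the pattern
# of rebuilding an object slot by slot inside a subsetting routine.
setClass("toyGeno", representation(gen = "list",
                                   ind.names = "character",
                                   loc.names = "character"))

# Subset loci without forwarding ind.names (the pattern before r896).
subset_loci_old <- function(x, j) {
  new("toyGeno",
      gen = lapply(x@gen, function(e) e[j]),
      loc.names = x@loc.names[j])
}

# Same subsetting, but the individual labels are saved and passed on
# (the pattern introduced in r896).
subset_loci_new <- function(x, j) {
  old.ind.names <- x@ind.names
  new("toyGeno",
      gen = lapply(x@gen, function(e) e[j]),
      ind.names = old.ind.names,
      loc.names = x@loc.names[j])
}

x <- new("toyGeno",
         gen = list(c(0L, 1L, 2L, 0L), c(2L, 2L, 1L, 0L)),
         ind.names = c("individual 1", "individual 2"),
         loc.names = paste("SNP", 1:4, sep = "."))

lost <- subset_loci_old(x, 1:2)  # ind.names falls back to character(0)
kept <- subset_loci_new(x, 1:2)  # ind.names carried over unchanged
```

Comparing lost@ind.names with kept@ind.names shows the difference; the locus
labels are subset correctly in both cases.
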
Modified: pkg/inst/doc/adegenet-genomics.Rnw
===================================================================
--- pkg/inst/doc/adegenet-genomics.Rnw	2011-05-31 13:07:26 UTC (rev 895)
+++ pkg/inst/doc/adegenet-genomics.Rnw	2011-05-31 13:21:27 UTC (rev 896)
@@ -336,8 +336,22 @@
 \end{itemize}
 
 
+Accessors are meant to be clever about replacement, meaning that they try hard to prevent
+replacement with inconsistent values. For instance, if we try to set information about the
+chromosomes of the SNPs, the provided factor has to match the number of loci:
+<<>>=
+x
+temp <- try(chr(x) <- rep("chr-1", 7), silent=TRUE)
+temp
+chr(x) <- rep("chr-1", 10)
+x
+chr(x)
+@
 
 
+
+
+
 %%%%%%%%%%%%%%%%
 \subsection{Subsetting the data}
 %%%%%%%%%%%%%%%%

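Note: the vignette text added above describes replacement accessors that refuse
inconsistent values. A minimal sketch of that idea follows, assuming a
hypothetical toy class ("toySnps") and a hypothetical replacement generic
("chrom<-"); it is not the implementation used in adegenet, and it stores the
chromosome as a character vector rather than a factor to keep the example short.

```r
library(methods)

# Hypothetical toy class; slot names are illustrative only.
setClass("toySnps", representation(n.loc = "integer",
                                   chromosome = "character"))

# Hypothetical replacement generic and method: the new value is checked
# against the number of loci before the slot is modified.
setGeneric("chrom<-", function(x, value) standardGeneric("chrom<-"))

setReplaceMethod("chrom", "toySnps", function(x, value) {
  if (length(value) != x@n.loc) {
    stop("Vector length does not match number of loci")
  }
  x@chromosome <- as.character(value)
  x
})

x <- new("toySnps", n.loc = 10L)

# A vector of the wrong length is rejected and x is left untouched.
res <- try(chrom(x) <- rep("chr-1", 7), silent = TRUE)

# A vector of the right length is accepted.
chrom(x) <- rep("chr-1", 10)
```

Here res inherits from "try-error", mirroring the try() call in the vignette
chunk, while the second assignment fills the slot with ten labels.
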
Modified: pkg/inst/doc/adegenet-genomics.tex
===================================================================
--- pkg/inst/doc/adegenet-genomics.tex	2011-05-31 13:07:26 UTC (rev 895)
+++ pkg/inst/doc/adegenet-genomics.tex	2011-05-31 13:21:27 UTC (rev 896)
@@ -9,7 +9,7 @@
 
 \usepackage[utf8]{inputenc} % for UTF-8/single quotes from sQuote()
 \newcommand{\code}[1]{{{\tt #1}}}
-\title{Analysing genomic-wide SNP data using adegenet}
+\title{Analysing genome-wide SNP data using \textit{adegenet} 1.3-0}
 \author{Thibaut Jombart}
 \date{\today}
 
@@ -40,7 +40,8 @@
 
 \begin{abstract}
   Genome-wide SNP data can quickly become challenging to analyse using a standard
-  computer. \textit{adegenet} implements representation of these data with unprecedented efficiency
+  computer. The package \textit{adegenet} \cite{tjart05} for the R software \cite{np145}
+  implements representation of these data with unprecedented efficiency
   using the classes \texttt{SNPbin} and \texttt{genlight}, which can require up to 60 times less RAM than usual
   representation using allele frequencies.
   This vignette introduces these classes and illustrates how these objects can be handled and
@@ -118,13 +119,13 @@
 \end{Schunk}
 
 The slots respectively contain:
-\begin{description}
+\begin{itemize}
   \item \texttt{snp}: SNP data with specific internal coding.
   \item \texttt{n.loc}: the number of SNPs stored in the object.
   \item \texttt{NA.posi}: position of the missing data (NAs).
   \item \texttt{label}: an optional label for the individual.
   \item \texttt{ploidy}: the ploidy level of the genome.
-\end{description}
+\end{itemize}
 
 New objects are created using \texttt{new}, with these slots as arguments.
 If no argument is provided, an empty object is created:
@@ -276,7 +277,7 @@
 \end{Schunk}
 As it can be seen, these objects allow for storing more information in addition to vectors of SNP frequencies.
 More precisely, their content is (see \texttt{?genlight} for more details):
-\begin{description}
+\begin{itemize}
   \item \texttt{gen}: SNP data for different individuals, each stored as a \texttt{SNPbin}; loci
     have to be identical across all individuals.
   \item \texttt{n.loc}: the number of SNPs stored in the object.
@@ -289,9 +290,9 @@
   \item \texttt{pop}: (optional) a factor grouping individuals into 'populations'.
   \item \texttt{other}: (optional) a list containing any supplementary information to be stored with
     the data.
-\end{description}
+\end{itemize}
 
-\noindent Like \texttt{SNbin} object, \texttt{genlight} object are created using the constructor \texttt{new},
+\noindent Like \texttt{SNPbin} objects, \texttt{genlight} objects are created using the constructor \texttt{new},
 providing content for the slots above as arguments.
 When none is provided, an empty object is created:
 \begin{Schunk}
@@ -305,11 +306,11 @@
 \end{Schunk}
 The most important information to provide is obviously the genotypes (argument \texttt{gen}); these
 can be provided as:
-\begin{description}
+\begin{itemize}
 \item a \texttt{list} of integer vectors representing the number of second allele at each locus.
 \item a \texttt{matrix} / \texttt{data.frame} of integers, with individuals in rows and SNPs in columns.
 \item a list of \texttt{SNPbin} objects.
-\end{description}
+\end{itemize}
 
 Ploidy has to be consistent across loci for a given individual, but individuals do not have to have
 the same ploidy, so that it is possible to have haploid,
@@ -387,7 +388,7 @@
 > object.size(dat)/object.size(x)
 \end{Sinput}
 \begin{Soutput}
-61.6340315378476 bytes
+61.6432258100309 bytes
 \end{Soutput}
 \end{Schunk}
 Here again, the storage of the data is much more efficient in \texttt{genlight} than using integers: converted data occupy
@@ -406,25 +407,163 @@
 \item handling smaller objects, thereby decreasing the possibly high computational time taken by memory allocation.
 \end{enumerate}
 
-While this makes implementing methods more complicated, considerable efforts have been devoted to
-making these issues oblivious to the user. In practice, routines are implemented so as to minimize
+This makes implementing methods more complicated.
+In practice, routines are implemented so as to minimize
 the amount of data converted back to integers, use C code where possible, and use multiple cores
 if the package \textit{multicore} is installed and multiple cores are available.
+Fortunately, these underlying technical issues are transparent to the user, who merely needs to
+know how to manipulate \texttt{genlight} objects using a few key functions to be able to analyze data.
 
 
 
 
+
+
 %%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%
 \section{In practice}
 %%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%
 
+%%%%%%%%%%%%%%%%
+\subsection{Using accessors}
+%%%%%%%%%%%%%%%%
 
+In the following, we demonstrate how to manipulate and analyse \texttt{genlight} objects.
+The philosophy underlying formal (S4) classes in general, and \texttt{genlight} objects in
+particular, is that internal representation of the information can be complex as long as accessing
+this information is simple.
+This is made possible by decoupling storage and access: the user is not meant to access the
+content of the object directly, but has to use \texttt{accessors} to retrieve or modify information.
+\\
 
+Available accessors are documented in \code{?genlight}.
+Most of them are identical to accessors for \texttt{genind} and \texttt{genpop} objects, such as:
+\begin{itemize}
+  \item \texttt{nInd}: returns the number of individuals in the object.
+  \item \texttt{nLoc}: returns the number of loci (SNPs).
+  \item \texttt{indNames}$^*$: returns/sets labels for individuals.
+  \item \texttt{locNames}$^*$: returns/sets labels for loci (SNPs).
+  \item \texttt{alleles}$^*$: returns/sets alleles.
+  \item \texttt{ploidy}$^*$: returns/sets ploidy of the individuals.
+  \item \texttt{pop}$^*$: returns/sets a factor grouping individuals.
+  \item \texttt{other}$^*$: returns/sets misc information stored as a list.
+\end{itemize}
+where $^*$ indicates that a replacement method is available using \texttt{<-}; for instance:
+\begin{Schunk}
+\begin{Sinput}
+> dat <- lapply(1:3, function(i) sample(0:2, 10, replace = TRUE))
+> dat
+\end{Sinput}
+\begin{Soutput}
+[[1]]
+ [1] 0 0 0 2 1 2 1 2 0 0
 
+[[2]]
+ [1] 2 2 1 1 1 0 1 0 2 1
+
+[[3]]
+ [1] 2 1 0 2 2 0 1 2 2 0
+\end{Soutput}
+\begin{Sinput}
+> x <- new("genlight", dat)
+> x
+\end{Sinput}
+\begin{Soutput}
+ === S4 class genlight ===
+ 3 genotypes,  10 binary SNPs
+ Ploidy: 2
+ 0 (0 %) missing data
+\end{Soutput}
+\begin{Sinput}
+> indNames(x)
+\end{Sinput}
+\begin{Soutput}
+NULL
+\end{Soutput}
+\begin{Sinput}
+> indNames(x) <- paste("individual", 1:3)
+> indNames(x)
+\end{Sinput}
+\begin{Soutput}
+[1] "individual 1" "individual 2" "individual 3"
+\end{Soutput}
+\begin{Sinput}
+> locNames(x)
+\end{Sinput}
+\begin{Soutput}
+NULL
+\end{Soutput}
+\begin{Sinput}
+> locNames(x) <- paste("SNP", 1:nLoc(x), sep = ".")
+> as.matrix(x)
+\end{Sinput}
+\begin{Soutput}
+             SNP.1 SNP.2 SNP.3 SNP.4 SNP.5 SNP.6 SNP.7 SNP.8 SNP.9 SNP.10
+individual 1     0     0     0     2     1     2     1     2     0      0
+individual 2     2     2     1     1     1     0     1     0     2      1
+individual 3     2     1     0     2     2     0     1     2     2      0
+\end{Soutput}
+\end{Schunk}
+
+\noindent
+In addition, some specific accessors are available for \texttt{genlight} objects:
+\begin{itemize}
+  \item \texttt{NA.posi}: returns the position of missing values in each individual.
+  \item \texttt{chromosome}$^*$: returns/sets the chromosome of each SNP.
+  \item \texttt{chr}$^*$: same as \texttt{chromosome} --- used as a shortcut.
+  \item \texttt{position}$^*$: returns/sets the position of each SNP.
+\end{itemize}
+
+
+Accessors are meant to be clever about replacement, meaning that they try hard to prevent
+replacement with inconsistent values. For instance, if we try to set information about the
+chromosomes of the SNPs, the provided factor has to match the number of loci:
+\begin{Schunk}
+\begin{Sinput}
+> x
+\end{Sinput}
+\begin{Soutput}
+ === S4 class genlight ===
+ 3 genotypes,  10 binary SNPs
+ Ploidy: 2
+ 0 (0 %) missing data
+\end{Soutput}
+\begin{Sinput}
+> temp <- try(chr(x) <- rep("chr-1", 7), silent = TRUE)
+> temp
+\end{Sinput}
+\begin{Soutput}
+[1] "Error in `chromosome<-`(`*tmp*`, value = c(\"chr-1\", \"chr-1\", \"chr-1\",  : \n  Vector length does no match number of loci\n"
+attr(,"class")
+[1] "try-error"
+\end{Soutput}
+\begin{Sinput}
+> chr(x) <- rep("chr-1", 10)
+> x
+\end{Sinput}
+\begin{Soutput}
+ === S4 class genlight ===
+ 3 genotypes,  10 binary SNPs
+ Ploidy: 2
+ 0 (0 %) missing data
+ @chromosome: chromosome of the SNPs
+\end{Soutput}
+\begin{Sinput}
+> chr(x)
+\end{Sinput}
+\begin{Soutput}
+ [1] chr-1 chr-1 chr-1 chr-1 chr-1 chr-1 chr-1 chr-1 chr-1 chr-1
+Levels: chr-1
+\end{Soutput}
+\end{Schunk}
+
+
+
+
+
 %%%%%%%%%%%%%%%%
-\subsection{Using accessors}
+\subsection{Subsetting the data}
 %%%%%%%%%%%%%%%%
 
 
@@ -452,5 +591,18 @@
 
 
 
+\begin{thebibliography}{9}
 
+\bibitem{tjart05}
+  Jombart, T. (2008) adegenet: a R package for the multivariate
+  analysis of genetic markers. \textit{Bioinformatics} 24: 1403-1405.
+
+\bibitem{np145}
+  R Development Core Team (2011). R: A language and environment for
+  statistical computing. R Foundation for Statistical Computing,
+  Vienna, Austria. ISBN 3-900051-07-0.
+
+\end{thebibliography}
+
+
 \end{document}