[adegenet-commits] r893 - in pkg: R inst/doc

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Tue May 31 11:46:39 CEST 2011


Author: jombart
Date: 2011-05-31 11:46:39 +0200 (Tue, 31 May 2011)
New Revision: 893

Modified:
   pkg/R/SNPbin.R
   pkg/inst/doc/adegenet-genomics.Rnw
Log:
Abstract of genomics vignette.
Corrected a minor issue in SNPbin constructor.

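The constructor fix mentioned in the log depends on how R reduces empty logical vectors. When no `snp` element is supplied, `input$snp` is `NULL`, `is.na(NULL)` has length zero, and `all()` of a zero-length vector is `TRUE`. The unguarded test from rev 892 therefore treated an empty constructor call as full-NA data. The base-R sketch below reproduces this. The `input` and `full_na` lists are stand-ins for the constructor's argument list, not actual adegenet objects.

```r
## Stand-in for the constructor's argument list when no SNPs are supplied
input <- list(snp = NULL)

## Unguarded test (rev 892): is.na(NULL) has length zero and all() of a
## zero-length vector is TRUE, so empty input looks like "all NA".
## suppressWarnings() silences any warning R emits for is.na(NULL).
old_test <- suppressWarnings(all(is.na(input$snp)))

## Guarded test (rev 893): && short-circuits, so is.na() is never applied
## to NULL and empty input is no longer treated as full-NA data.
new_test <- !is.null(input$snp) && all(is.na(input$snp))

## Genuinely full-NA input is still detected by the guarded test
full_na <- list(snp = c(NA, NA, NA))
full_na_test <- !is.null(full_na$snp) && all(is.na(full_na$snp))

c(old = old_test, new = new_test, full_na = full_na_test)  # TRUE FALSE TRUE
```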

Modified: pkg/R/SNPbin.R
===================================================================
--- pkg/R/SNPbin.R	2011-05-30 17:13:17 UTC (rev 892)
+++ pkg/R/SNPbin.R	2011-05-31 09:46:39 UTC (rev 893)
@@ -112,7 +112,7 @@
     }
 
     ## handle full-NA data
-    if(all(is.na(input$snp))){
+    if(!is.null(input$snp) && all(is.na(input$snp))){
         x@snp <- list()
         x@n.loc <- length(input$snp)
         x@snp[[1]] <- .bin2raw(rep(0L, length(input$snp)))$snp
@@ -130,8 +130,12 @@
     if(!is.null(input$n.loc)){
         x@n.loc <- as.integer(input$n.loc)
     } else {
-        warning("number of SNPs (n.loc) not provided to the genlight constructor - using the maximum number given data coding.")
-        x@n.loc <- as.integer(length(x@snp)*8)
+        if(!is.null(input$snp)){
+            warning("number of SNPs (n.loc) not provided to the genlight constructor - using the maximum number given data coding.")
+            x@n.loc <- as.integer(length(x@snp)*8)
+        } else {
+            x@n.loc <- 0L
+        }
     }
 
 

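The second hunk changes the fallback used when `n.loc` is not given. Because SNPs are packed eight per byte, a byte count only bounds the number of loci from above, hence the warning; with this revision, a call without any SNP data now yields zero loci and no warning. The helper below is a hypothetical mirror of that decision logic, not part of adegenet, and its `n.bytes` argument (bytes used to store one genome) is an assumption made for illustration.

```r
## Hypothetical helper (not part of adegenet) mirroring the rev 893 fallback
## for n.loc. 'n.bytes' is assumed to be the number of bytes storing one genome.
default_n_loc <- function(snp, n.loc = NULL, n.bytes = 0L) {
  if (!is.null(n.loc)) {
    return(as.integer(n.loc))          # an explicit value always wins
  }
  if (is.null(snp)) {
    return(0L)                         # no data: zero loci, no warning
  }
  warning("n.loc not provided - using the maximum number given data coding.")
  as.integer(n.bytes * 8L)             # 8 SNPs per byte: an upper bound
}

default_n_loc(snp = NULL)                                          # 0
suppressWarnings(default_n_loc(snp = rep(0L, 13), n.bytes = 2L))   # 16, not 13
default_n_loc(snp = rep(0L, 13), n.loc = 13, n.bytes = 2L)         # 13
```

The second call shows why the warning is useful: 13 SNPs occupy two bytes, so the inferred count of 16 includes three padding bits.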
Modified: pkg/inst/doc/adegenet-genomics.Rnw
===================================================================
--- pkg/inst/doc/adegenet-genomics.Rnw	2011-05-30 17:13:17 UTC (rev 892)
+++ pkg/inst/doc/adegenet-genomics.Rnw	2011-05-31 09:46:39 UTC (rev 893)
@@ -1,6 +1,6 @@
 \documentclass{article}
 % \VignettePackage{adegenet-genomics}
-% \VignetteIndexEntry{Analysing genomic data using adegenet}
+% \VignetteIndexEntry{Analysing genome-wide SNP data using adegenet}
 
 \usepackage{graphicx}
 \usepackage[colorlinks=true,urlcolor=blue]{hyperref}
@@ -9,8 +9,8 @@
 
 \usepackage[utf8]{inputenc} % for UTF-8/single quotes from sQuote()
 \newcommand{\code}[1]{{{\tt #1}}}
-\title{Analysing genomic data using adegenet}
-\author{Thibaut Jombart and Isma\"il Ahmed}
+\title{Analysing genome-wide SNP data using adegenet}
+\author{Thibaut Jombart}
 \date{\today}
 
 
@@ -36,10 +36,24 @@
 \color{black}
 
 \maketitle
+
+\begin{abstract}
+  Genome-wide SNP data can quickly become challenging to analyse using standard
+  computers. \textit{adegenet} implements a representation of these data with unprecedented efficiency
+  using the classes \texttt{SNPbin} and \texttt{genlight}, which can require up to 60 times less RAM than the usual
+  representation using allele frequencies.
+  This vignette introduces these classes and illustrates how these objects can be handled and
+  analyzed in R.
+  It also introduces more advanced features of a C-language API, which may be useful for developing
+  new methods based on these objects.
+\end{abstract}
+
+\newpage
+
 \tableofcontents
 
 
-
+\newpage
 %%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%
 \section{Introduction}
@@ -47,14 +61,19 @@
 %%%%%%%%%%%%%%%%
 Modern sequencing technologies now make complete genomes more widely accessible.
 The resulting amounts of genetic data pose challenges in terms of storing and handling the data,
-making former tools developed for classical genetic markers such as microsatellite impracticable on
+making tools developed for classical genetic markers such as microsatellites impractical on
 standard computers.
 \textit{adegenet} implements new object classes dedicated to handling genome-wide polymorphism (SNPs) with
 minimal random access memory (RAM) requirements.
+\\
 
 Two new formal classes have been implemented: \texttt{SNPbin}, used to store genome-wide SNPs for
 one individual, and \texttt{genlight}, which stores the same information for multiple individuals.
-In this vignette, we present these classes and show how they can be used for genetic data analysis.
+Information represented this way is binary: only biallelic SNPs can be stored and analyzed using these classes.
+However, these objects are otherwise very flexible, and can incorporate different levels of ploidy
+across individuals within a single dataset.
+In this vignette, we present these object classes and show how their content can be handled and
+analyzed.
 
 
 
@@ -69,8 +88,28 @@
 %%%%%%%%%%%%%%%%
 \subsection{\code{SNPbin}: storage of single genomes}
 %%%%%%%%%%%%%%%%
+The class \texttt{SNPbin} is the core representation of biallelic SNPs, which allows data to be
+represented with unprecedented efficiency.
+The essential idea is to code binary SNPs not as integers, but as bits. This operation is tricky in
+R, whose smallest unit of storage is the byte (a series of 8 bits). However, the class
+\texttt{SNPbin} handles this transparently using subroutines written in C.
+Considerable efforts have been made so that the user does not have to dig into the complex internal
+structure of the objects, and can handle \texttt{SNPbin} objects as easily as possible.
+\\
 
+Like \texttt{genind} and \texttt{genpop} objects, \texttt{SNPbin} is a formal ``S4'' class. The
+structure of these objects is detailed in the dedicated manpage (\texttt{?SNPbin}). Like all S4
+objects, instances of the class \texttt{SNPbin} are composed of slots accessible using the
+\texttt{@} operator. The set of slots is generic (it is the same for all instances of the class) and is returned by:
+<<>>=
+library(adegenet)
+getClassDef("SNPbin")
+@
 
+
+
+
+
 %%%%%%%%%%%%%%%%
 \subsection{\code{genlight}: storage of multiple genomes}
 %%%%%%%%%%%%%%%%
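The SNPbin section added to the vignette explains that SNPs are coded as bits rather than integers, which is awkward in R because its smallest storage unit is the byte. adegenet does this in C; purely as an illustration of the idea, base R's `packBits()` and `rawToBits()` can pack eight 0/1 values into each raw byte and unpack them again. This is a sketch, not adegenet's own routine, and the memory ratio it computes includes R's fixed per-object overhead.

```r
set.seed(1)
n <- 1003L                                  # deliberately not a multiple of 8
snp <- sample(0:1, n, replace = TRUE)       # one haploid genome, 0/1 coded

## packBits() needs a length that is a multiple of 8: pad with zeros
pad <- (8L - n %% 8L) %% 8L
packed <- packBits(c(snp, integer(pad)), type = "raw")
length(packed)                              # 126 bytes for 1003 SNPs

## Unpacking: rawToBits() returns one raw 00/01 per bit; drop the padding
unpacked <- as.integer(rawToBits(packed))[seq_len(n)]
identical(unpacked, snp)                    # TRUE

## Memory: 4 bytes per SNP as integers vs 1 bit per SNP once packed
big <- sample(0:1, 1e5, replace = TRUE)
ratio <- as.numeric(utils::object.size(big)) /
  as.numeric(utils::object.size(packBits(big, type = "raw")))
ratio                                       # about 32
```

Padding to a multiple of eight is also why a byte count alone only gives an upper bound on the number of loci, the situation handled by the `n.loc` warning in SNPbin.R.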

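The introduction added to the vignette states that only biallelic SNPs can be stored, yet ploidy may vary across individuals. One way to reconcile the two, assumed here for illustration (adegenet's internal layout is documented in `?SNPbin` and may differ), is to split an allele count between 0 and the ploidy into as many binary vectors as there are chromosome copies, and to recover the count by summing them. `counts_to_bits()` and `bits_to_counts()` are hypothetical helpers.

```r
## Hypothetical helper: split allele counts (0..ploidy) into 'ploidy'
## binary vectors; vector k holds 1 wherever the count is at least k.
counts_to_bits <- function(counts, ploidy) {
  stopifnot(all(counts >= 0L & counts <= ploidy))
  lapply(seq_len(ploidy), function(k) as.integer(counts >= k))
}

## Hypothetical helper: recover allele counts by summing the binary vectors
bits_to_counts <- function(bits) {
  Reduce(`+`, bits)
}

diploid  <- c(0L, 1L, 2L, 2L, 0L)           # counts of the second allele
triploid <- c(3L, 0L, 1L, 2L, 3L)

d_bits <- counts_to_bits(diploid, ploidy = 2L)
t_bits <- counts_to_bits(triploid, ploidy = 3L)

identical(bits_to_counts(d_bits), diploid)   # TRUE
identical(bits_to_counts(t_bits), triploid)  # TRUE
```

Each binary vector can then be packed as in the bit-packing idea above, so individuals of different ploidy simply carry different numbers of packed vectors.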

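The vignette also notes that SNPbin is a formal S4 class whose slots are read with the `@` operator and whose definition is printed by `getClassDef()`. The same mechanics can be tried without adegenet using the methods package and a hypothetical toy class, `toyBin`, whose two slots are chosen for illustration only and do not reproduce SNPbin's real slot set.

```r
library(methods)

## Hypothetical toy class: a few packed SNPs plus a locus count
setClass("toyBin", slots = c(snp = "raw", n.loc = "integer"))

## Three SNPs (1, 0, 1) padded to one byte and packed into a single raw value
x <- new("toyBin",
         snp = packBits(c(1L, 0L, 1L, rep(0L, 5)), type = "raw"),
         n.loc = 3L)

x@n.loc               # slot access with @: 3
x@snp                 # one byte holding the three SNPs
slotNames("toyBin")   # "snp" "n.loc": the same for every instance
getClassDef("toyBin") # full class definition, as for getClassDef("SNPbin")
```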
