[adegenet-commits] r893 - in pkg: R inst/doc
noreply at r-forge.r-project.org
Tue May 31 11:46:39 CEST 2011
Author: jombart
Date: 2011-05-31 11:46:39 +0200 (Tue, 31 May 2011)
New Revision: 893
Modified:
pkg/R/SNPbin.R
pkg/inst/doc/adegenet-genomics.Rnw
Log:
Abstract of genomics vignette.
Corrected a minor issue in SNPbin constructor.
Modified: pkg/R/SNPbin.R
===================================================================
--- pkg/R/SNPbin.R 2011-05-30 17:13:17 UTC (rev 892)
+++ pkg/R/SNPbin.R 2011-05-31 09:46:39 UTC (rev 893)
@@ -112,7 +112,7 @@
}
## handle full-NA data
- if(all(is.na(input$snp))){
+ if(!is.null(input$snp) && all(is.na(input$snp))){
x@snp <- list()
x@n.loc <- length(input$snp)
x@snp[[1]] <- .bin2raw(rep(0L, length(input$snp)))$snp
@@ -130,8 +130,12 @@
if(!is.null(input$n.loc)){
x@n.loc <- as.integer(input$n.loc)
} else {
- warning("number of SNPs (n.loc) not provided to the genlight constructor - using the maximum number given data coding.")
- x@n.loc <- as.integer(length(x@snp)*8)
+ if(!is.null(input$snp)){
+ warning("number of SNPs (n.loc) not provided to the genlight constructor - using the maximum number given data coding.")
+ x@n.loc <- as.integer(length(x@snp)*8)
+ } else {
+ x@n.loc <- 0L
+ }
}
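
Note on the first hunk: the guard matters because, in R, is.na(NULL) returns a zero-length logical vector and all() of a zero-length vector is TRUE. The unguarded test from rev 892 therefore takes the "full-NA" branch when no SNP data is supplied at all. A minimal sketch in base R, where the list `input` is a hypothetical stand-in for the constructor's arguments and not the package's actual code:

```r
# Hypothetical argument list in which 'snp' was not supplied.
input <- list(ploidy = 1L)

# Unguarded condition (rev 892): is.na(NULL) is logical(0), and
# all(logical(0)) is TRUE, so the full-NA branch would be entered.
old_check <- all(is.na(input$snp))

# Guarded condition (rev 893): && short-circuits on the NULL test.
new_check <- !is.null(input$snp) && all(is.na(input$snp))

print(c(old = old_check, new = new_check))
```

Under these assumptions the old condition evaluates to TRUE and the new one to FALSE.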
Modified: pkg/inst/doc/adegenet-genomics.Rnw
===================================================================
--- pkg/inst/doc/adegenet-genomics.Rnw 2011-05-30 17:13:17 UTC (rev 892)
+++ pkg/inst/doc/adegenet-genomics.Rnw 2011-05-31 09:46:39 UTC (rev 893)
@@ -1,6 +1,6 @@
\documentclass{article}
% \VignettePackage{adegenet-genomics}
-% \VignetteIndexEntry{Analysing genomic data using adegenet}
+% \VignetteIndexEntry{Analysing genome-wide SNP data using adegenet}
\usepackage{graphicx}
\usepackage[colorlinks=true,urlcolor=blue]{hyperref}
@@ -9,8 +9,8 @@
\usepackage[utf8]{inputenc} % for UTF-8/single quotes from sQuote()
\newcommand{\code}[1]{{{\tt #1}}}
-\title{Analysing genomic data using adegenet}
-\author{Thibaut Jombart and Isma\"il Ahmed}
+\title{Analysing genome-wide SNP data using adegenet}
+\author{Thibaut Jombart}
\date{\today}
@@ -36,10 +36,24 @@
\color{black}
\maketitle
+
+\begin{abstract}
+ Genome-wide SNP data can quickly become challenging to analyse on a standard
+ computer. \textit{adegenet} implements a representation of these data with unprecedented efficiency
+ using the classes \texttt{SNPbin} and \texttt{genlight}, which can require up to 60 times less RAM than the usual
+ representation using allele frequencies.
+ This vignette introduces these classes and illustrates how these objects can be handled and
+ analyzed in R.
+ It also introduces more advanced features of an API in C language which may be useful to develop
+ new methods based on these objects.
+\end{abstract}
+
+\newpage
+
\tableofcontents
-
+\newpage
%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%
\section{Introduction}
@@ -47,14 +61,19 @@
%%%%%%%%%%%%%%%%
Modern sequencing technologies now make complete genomes more widely accessible.
The subsequent amounts of genetic data pose challenges in terms of storing and handling the data,
-making former tools developed for classical genetic markers such as microsatellite impracticable on
+making former tools developed for classical genetic markers such as microsatellites impracticable using
standard computers.
Adegenet has developed new object classes dedicated to handling genome-wide polymorphism (SNPs) with
minimum random access memory (RAM) requirements.
+\\
Two new formal classes have been implemented: \texttt{SNPbin}, used to store genome-wide SNPs for
one individual, and \texttt{genlight}, which stores the same information for multiple individuals.
-In this vignette, we present these classes and show how they can be used for genetic data analysis.
+Information represented this way is binary: only biallelic SNPs can be stored and analyzed using these classes.
+However, these objects are otherwise very flexible, and can incorporate different levels of ploidy
+across individuals within a single dataset.
+In this vignette, we present these object classes and show how their content can be handled and
+analyzed.
@@ -69,8 +88,28 @@
%%%%%%%%%%%%%%%%
\subsection{\code{SNPbin}: storage of single genomes}
%%%%%%%%%%%%%%%%
+The class \texttt{SNPbin} is the core representation of biallelic SNPs, which allows data to be
+represented with unprecedented efficiency.
+The essential idea is to code binary SNPs not as integers, but as bits. This operation is tricky in
+R as there is no handling of bits, only bytes -- series of 8 bits. However, the class
+\texttt{SNPbin} handles this transparently using subroutines in C language.
+Considerable efforts have been made so that the user does not have to dig into the complex internal
+structure of the objects, and can handle \texttt{SNPbin} objects as easily as possible.
+\\
+Like \texttt{genind} and \texttt{genpop} objects, \texttt{SNPbin} is a formal "S4" class. The
+structure of these objects is detailed in the dedicated manpage (\texttt{?SNPbin}). Like all S4
+objects, instances of the class \texttt{SNPbin} are composed of slots accessible using the
+\texttt{@} operator. This content is generic (it is the same for all instances of the class), and is returned by:
+<<>>=
+library(adegenet)
+getClassDef("SNPbin")
+@
+
+
+
+
%%%%%%%%%%%%%%%%
\subsection{\code{genlight}: storage of multiple genomes}
%%%%%%%%%%%%%%%%
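
Note on the bit coding described in the new vignette text: the idea of storing binary SNPs as bits packed into bytes can be sketched in base R with packBits() and rawToBits(). This is only an illustration under stated assumptions. It is not the package's C routine or its .bin2raw helper, and the SNP values and variable names below are made up.

```r
# Toy binary SNP vector (0/1 alleles); values are made up for illustration.
snp <- c(1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L)
n_loc <- length(snp)

# Pad with zeros to a multiple of 8 so the bits fill whole bytes.
padded <- c(snp, rep(0L, (8 - n_loc %% 8) %% 8))

# Pack 8 bits per raw byte (least significant bit first).
packed <- packBits(padded, type = "raw")

# Unpack: one value per bit, then drop the padding.
unpacked <- as.integer(rawToBits(packed))[seq_len(n_loc)]

# Without n_loc, only an upper bound on the number of SNPs is recoverable.
max_loc <- length(packed) * 8L
```

Here 10 SNPs occupy 2 bytes, and the round trip recovers the original vector only because n_loc is kept alongside the packed data. Without it, the number of bytes times 8 gives an upper bound (16 here), which is plausibly why the constructor in the first diff warns when n.loc is not provided.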