[adegenet-commits] r755 - pkg/man

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Wed Jan 5 17:55:18 CET 2011


Author: jombart
Date: 2011-01-05 17:55:18 +0100 (Wed, 05 Jan 2011)
New Revision: 755

Added:
   pkg/man/SNPbin.Rd
   pkg/man/genlight.Rd
Modified:
   pkg/man/as.genind.Rd
   pkg/man/genind.Rd
Log:
Documented genlight as well.


Added: pkg/man/SNPbin.Rd
===================================================================
--- pkg/man/SNPbin.Rd	                        (rev 0)
+++ pkg/man/SNPbin.Rd	2011-01-05 16:55:18 UTC (rev 755)
@@ -0,0 +1,138 @@
+\name{SNPbin-class}
+\docType{class}
+\alias{SNPbin}
+\alias{SNPbin-class}
+\alias{[,SNPbin-method}
+\alias{[,SNPbin,ANY,ANY-method}
+\alias{initialize,SNPbin-method}
+\alias{show,SNPbin-method}
+\alias{nLoc,SNPbin-method}
+\alias{$,SNPbin-method}
+\alias{$<-,SNPbin-method}
+\alias{names,SNPbin-method}
+\alias{ploidy,SNPbin-method}
+\alias{as,SNPbin,integer-method}
+\alias{as.integer,SNPbin-method}
+% \alias{,SNPbin-method}
+% \alias{,SNPbin-method}
+% \alias{,SNPbin-method}
+% \alias{,SNPbin-method}
+%%%%
+\title{Formal class "SNPbin"}
+\description{
+  The class \code{SNPbin} is a formal (S4) class for storing a genotype
+  of binary SNPs in a compact way, using a bit-level coding scheme.
+  This storage is most efficient with haploid data, where the memory
+  taken to represent data can be reduced more than 50 times. However,
+  \code{SNPbin} can be used for any level of ploidy, and still remains an
+  efficient storage mode.
+
+  A \code{SNPbin} object can be constructed from
+  a vector of integers giving the number of copies of the second allele
+  for each locus.
+
+  \code{SNPbin} stores a single genotype. To store multiple genotypes,
+  use the \linkS4class{genlight} class.
+}
+\section{Objects from the class SNPbin}{
+  \code{SNPbin} objects can be created by calls to \code{new("SNPbin",
+    ...)}, where '...' can be the following arguments:
+  
+  \describe{
+    \item{\code{snp}}{a vector of integers or numerics giving the number of
+    copies of the second allele for each locus. If only one unnamed
+    argument is provided to 'new', it is taken to be this one.}
+    \item{\code{ploidy}}{an integer indicating the ploidy of the
+    genotype; if not provided, will be guessed from the data (as the
+    maximum from the 'snp' input vector).}
+    \item{\code{label}}{an optional character string serving as a label
+    for the genotype.}
+  }
+}
+\section{Slots}{
+  The following slots are the content of instances of the class
+  \code{SNPbin}; note that in most cases, it is better to retrieve
+  information via accessors (see below), rather than by accessing the
+  slots manually.
+  \describe{
+    \item{\code{snp}:}{a list of vectors with the class \code{raw}.}
+    \item{\code{n.loc}:}{an integer indicating the number of SNPs of the
+      genotype.}
+    \item{\code{NA.posi}:}{a vector of integers giving the positions of
+    missing data.}
+    \item{\code{label}:}{an optional character string serving as a label
+      for the genotype.}
+    \item{\code{ploidy}:}{an integer indicating the ploidy of the genotype.}
+}
+}
+\section{Methods}{
+  Here is a list of methods available for \code{SNPbin} objects. Most of
+    these methods are accessors, that is, functions which are used to
+    retrieve the content of the object. Specific manpages can exist for
+    accessors with more than one argument. These are indicated by a '*'
+    symbol next to the method's name. This list also contains methods
+    for conversion from \code{SNPbin} to other classes.
+  \describe{
+    \item{[}{\code{signature(x = "SNPbin")}: usual method to subset
+      objects in R. The argument indicates how SNPs are to be
+      subsetted. It can be a vector of signed integers or of logicals.}
+    \item{show}{\code{signature(x = "SNPbin")}: printing of the
+      object.}
+    \item{$}{\code{signature(x = "SNPbin")}: similar to the @ operator;
+      used to access the content of slots of the object.}
+    \item{$<-}{\code{signature(x = "SNPbin")}: similar to the @ operator;
+      used to replace the content of slots of the object.}
+    \item{nLoc}{\code{signature(x = "SNPbin")}: returns the number of
+      SNPs in the object.}
+    \item{names}{\code{signature(x = "SNPbin")}: returns the names of
+    the slots of the object.}
+    \item{ploidy}{\code{signature(x = "SNPbin")}: returns the ploidy of
+    the genotype.}
+    \item{as.integer}{\code{signature(x = "SNPbin")}: converts a
+    \code{SNPbin} object to a vector of integers. The S4 method 'as' can
+    be used as well (e.g. as(x, "integer")).}
+  }
+}
+\author{Thibaut Jombart (\email{t.jombart at imperial.ac.uk})}
+\seealso{
+ Related classes:\cr
+  -  \code{\linkS4class{genlight}}, for storing multiple binary SNP
+  genotypes. \cr
+  -  \code{\linkS4class{genind}}, for storing other types of genetic markers. \cr
+}
+\examples{
+#### HAPLOID EXAMPLE ####
+## create a genotype of 1,000,000 SNPs
+dat <- sample(c(0,1,NA), 1e6, prob=c(.495, .495, .01), replace=TRUE)
+dat[1:10]
+x <- new("SNPbin", dat)
+x
+x[1:10] # subsetting
+as.integer(x[1:10])
+
+## try a few accessors
+ploidy(x)
+nLoc(x)
+head(x$snp[[1]]) # internal bit-level coding
+
+## check that conversion is OK
+identical(as(x, "integer"),as.integer(dat)) # SHOULD BE TRUE
+
+## compare the size of the objects
+print(object.size(dat), unit="auto")
+print(object.size(x), unit="auto")
+object.size(dat)/object.size(x) # EFFICIENCY OF CONVERSION
+
+
+#### TETRAPLOID EXAMPLE ####
+## create a genotype of 1,000,000 SNPs
+dat <- sample(c(0:4,NA), 1e6, prob=c(rep(.995/5,5), 0.005), replace=TRUE)
+x <- new("SNPbin", dat)
+identical(as(x, "integer"),as.integer(dat)) # MUST BE TRUE
+
+## compare the size of the objects
+print(object.size(dat), unit="auto")
+print(object.size(x), unit="auto")
+object.size(dat)/object.size(x) # EFFICIENCY OF CONVERSION
+}
+\keyword{classes}
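
Note on the bit-level coding documented above: this commit does not show how SNPbin packs its data internally, so what follows is only a minimal base-R sketch of the general idea, restricted to the haploid case. The helper names pack_snp and unpack_snp are hypothetical and are not part of adegenet; the list fields merely mirror the documented slot names (snp, n.loc, NA.posi).

```r
## Hypothetical helpers (NOT adegenet code): pack a haploid 0/1 SNP vector
## into raw bytes, one bit per locus, keeping NA positions separately.
pack_snp <- function(snp) {
  na_posi <- which(is.na(snp))        # positions of missing data
  bits <- as.integer(snp)
  bits[na_posi] <- 0L                 # NAs are stored as 0 in the bit vector
  n_loc <- length(bits)
  pad <- (8L - n_loc %% 8L) %% 8L     # packBits needs a multiple of 8 bits
  bytes <- packBits(as.raw(c(bits, rep(0L, pad))), type = "raw")
  list(snp = bytes, n.loc = n_loc, NA.posi = na_posi)
}

## Reverse operation: expand the bytes, drop the padding, restore the NAs.
unpack_snp <- function(obj) {
  bits <- as.integer(rawToBits(obj$snp))[seq_len(obj$n.loc)]
  bits[obj$NA.posi] <- NA_integer_
  bits
}

set.seed(1)
dat <- sample(c(0, 1, NA), 1e4, prob = c(.495, .495, .01), replace = TRUE)
packed <- pack_snp(dat)

## round-trip check and size ratio of the raw payload
identical(unpack_snp(packed), as.integer(dat))
as.numeric(object.size(dat)) / as.numeric(object.size(packed$snp))
```

A numeric vector uses 8 bytes per locus, whereas this packing uses one bit per locus plus the stored NA positions, which is the kind of saving the description refers to for haploid data.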

Modified: pkg/man/as.genind.Rd
===================================================================
--- pkg/man/as.genind.Rd	2011-01-05 16:02:27 UTC (rev 754)
+++ pkg/man/as.genind.Rd	2011-01-05 16:55:18 UTC (rev 755)
@@ -49,8 +49,14 @@
 \author{Thibaut Jombart \email{t.jombart at imperial.ac.uk}}
 \seealso{
   \code{\linkS4class{genind} class}, and \code{\link{import2genind}} for
-  importing from various types of file.
+  importing from various types of file.\cr
+
+  Related classes:\cr
+  - \linkS4class{genpop} for storing data per population\cr
+
+  - \linkS4class{genlight} for efficient storage of binary SNP genotypes\cr
 }
+}
 \examples{
 data(nancycats)
 nancycats@loc.names

Modified: pkg/man/genind.Rd
===================================================================
--- pkg/man/genind.Rd	2011-01-05 16:02:27 UTC (rev 754)
+++ pkg/man/genind.Rd	2011-01-05 16:55:18 UTC (rev 755)
@@ -63,8 +63,12 @@
 \seealso{\code{\link{as.genind}}, \code{\link{is.genind}}, \code{\link{genind2genpop}},
   \code{\link{genpop}}, \code{\link{import2genind}},
   \code{\link{read.genetix}}, \code{\link{read.genepop}},
-  \code{\link{read.fstat}}, \code{\link{na.replace}}
-  
+  \code{\link{read.fstat}}, \code{\link{na.replace}}\cr
+
+  Related classes:\cr
+  - \linkS4class{genpop} for storing data per population\cr
+
+  - \linkS4class{genlight} for efficient storage of binary SNP genotypes\cr
 }
 \author{ Thibaut Jombart \email{t.jombart at imperial.ac.uk} }
 \examples{

Added: pkg/man/genlight.Rd
===================================================================
--- pkg/man/genlight.Rd	                        (rev 0)
+++ pkg/man/genlight.Rd	2011-01-05 16:55:18 UTC (rev 755)
@@ -0,0 +1,178 @@
+\name{genlight-class}
+\docType{class}
+\alias{genlight}
+\alias{genlight-class}
+\alias{[,genlight-method}
+\alias{[,genlight,ANY,ANY-method}
+\alias{initialize,genlight-method}
+\alias{show,genlight-method}
+\alias{nLoc,genlight-method}
+\alias{nInd,genlight-method}
+\alias{$,genlight-method}
+\alias{$<-,genlight-method}
+\alias{names,genlight-method}
+\alias{ploidy,genlight-method}
+\alias{locNames,genlight-method}
+\alias{indNames,genlight-method}
+\alias{as,genlight,matrix-method}
+\alias{as.matrix,genlight-method}
+\alias{as,genlight,data.frame-method}
+\alias{as.data.frame,genlight-method}
+\alias{as,genlight,list-method}
+\alias{as.list,genlight-method}
+% \alias{,genlight-method}
+% \alias{,genlight-method}
+% \alias{,genlight-method}
+% \alias{,genlight-method}
+%%%%
+\title{Formal class "genlight"}
+\description{
+  The class \code{genlight} is a formal (S4) class for storing genotypes
+  of binary SNPs in a compact way, using a bit-level coding scheme.
+  This storage is most efficient with haploid data, where the memory
+  taken to represent data can be reduced more than 50 times. However,
+  \code{genlight} can be used for any level of ploidy, and still remains an
+  efficient storage mode.
+
+  A \code{genlight} object can be constructed from vectors of integers
+  giving the number of copies of the second allele for each locus and each
+  individual (see 'Objects from the class genlight' below).
+
+  \code{genlight} stores multiple genotypes. Each genotype is stored
+  as a \linkS4class{SNPbin} object.
+}
+\section{Objects from the class genlight}{
+  \code{genlight} objects can be created by calls to \code{new("genlight",
+    ...)}, where '...' can be the following arguments:
+  
+  \describe{
+    \item{\code{gen}}{input genotypes, where each genotype is coded as a
+      vector of numbers of copies of the second allele. If a list, each
+      element of the list corresponds to an individual; if a matrix or a
+      data.frame, rows correspond to individuals and columns to SNPs. If
+      individuals or loci are named in the input, these names will be stored
+      in the produced object. All individuals are expected to have the same
+      number of SNPs. Shorter genotypes are completed with NAs, issuing a
+      warning.}
+    \item{\code{ploidy}}{an optional vector of integers indicating the ploidy of the
+      genotypes. Genotypes can therefore have different ploidy. If not
+      provided, ploidy will be guessed from the data (as the
+      maximum number of second alleles in each individual).}
+    \item{\code{ind.names}}{an optional vector of characters giving the labels
+      of the genotypes.}
+    \item{\code{loc.names}}{an optional vector of characters giving the labels
+      of the SNPs.}
+    \item{\code{loc.all}}{an optional vector of characters indicating
+    the alleles of each SNP; for each SNP, alleles must be coded by two
+    letters separated by '/', e.g. 'a/t' is valid, but 'a  t' or 'a |t' are not.}
+  }
+}
+\section{Slots}{
+  The following slots are the content of instances of the class
+  \code{genlight}; note that in most cases, it is better to retrieve
+  information via accessors (see below), rather than by accessing the
+  slots manually.
+  \describe{
+    \item{\code{gen}:}{a list of genotypes stored as  \linkS4class{SNPbin} objects.}
+    \item{\code{n.loc}:}{an integer indicating the number of SNPs of the
+      genotype.}
+    \item{\code{ind.names}:}{a vector of characters indicating the names of
+      genotypes.}
+    \item{\code{loc.names}:}{a vector of characters indicating the names of
+      SNPs.}
+    \item{\code{loc.all}:}{a vector of characters indicating the alleles
+      of each SNP.}
+    \item{\code{ploidy}:}{a vector of integers indicating the ploidy of each genotype.}
+  }
+}
+\section{Methods}{
+  Here is a list of methods available for \code{genlight} objects. Most of
+    these methods are accessors, that is, functions which are used to
+    retrieve the content of the object. Specific manpages can exist for
+    accessors with more than one argument. These are indicated by a '*'
+    symbol next to the method's name. This list also contains methods
+    for conversion from \code{genlight} to other classes.
+  \describe{
+    \item{[}{\code{signature(x = "genlight")}: usual method to subset
+      objects in R. It is applied as if the object were a matrix where
+      genotypes are rows and SNPs are columns. Indexing can be done via
+      vectors of signed integers or of logicals.}
+    \item{show}{\code{signature(x = "genlight")}: printing of the
+      object.}
+    \item{$}{\code{signature(x = "genlight")}: similar to the @ operator;
+      used to access the content of slots of the object.}
+    \item{$<-}{\code{signature(x = "genlight")}: similar to the @ operator;
+      used to replace the content of slots of the object.}
+    \item{nInd}{\code{signature(x = "genlight")}: returns the number of
+      individuals in the object.}
+    \item{nLoc}{\code{signature(x = "genlight")}: returns the number of
+      SNPs in the object.}
+    \item{names}{\code{signature(x = "genlight")}: returns the names of
+      the slots of the object.}
+    \item{ploidy}{\code{signature(x = "genlight")}: returns the ploidy of
+      the genotypes.}
+    \item{indNames}{\code{signature(x = "genlight")}: returns the names of
+      the individuals, if provided when the object was constructed.}
+    \item{locNames}{\code{signature(x = "genlight")}: returns the names of
+      the loci, if provided when the object was constructed.}
+    \item{as.matrix}{\code{signature(x = "genlight")}: converts a
+      \code{genlight} object into a matrix of integers, with individuals
+      in rows and SNPs in columns. The S4 method 'as' can be used as
+      well (e.g. as(x, "matrix")).}
+    \item{as.data.frame}{\code{signature(x = "genlight")}: same as \code{as.matrix}.}
+    \item{as.list}{\code{signature(x = "genlight")}: converts a
+      \code{genlight} object into a list of genotypes coded as vectors of
+      integers (numbers of copies of the second allele). The S4 method 'as'
+      can be used as well (e.g. as(x, "list")).}
+  }
+}
+\author{Thibaut Jombart (\email{t.jombart at imperial.ac.uk})}
+\seealso{
+ Related classes:\cr
+  -  \code{\linkS4class{SNPbin}}, for storing individual genotypes of
+  binary SNPs\cr
+  
+  -  \code{\linkS4class{genind}}, for storing other types of genetic markers. \cr
+}
+\examples{
+## TOY EXAMPLE ##
+## create and convert data
+dat <- list(toto=c(1,1,0,0), titi=c(NA,1,1,0), tata=c(NA,0,3, NA))
+x <- new("genlight", dat)
+x
+
+## examine the content of the object
+names(x)
+x@gen
+x@gen[[1]]@snp # bit-level coding for first individual
+
+## conversions
+as.list(x)
+as.matrix(x)
+
+## round trips - must return TRUE
+identical(x, new("genlight", as.list(x))) # list
+identical(x, new("genlight", as.matrix(x))) # matrix
+identical(x, new("genlight", as.data.frame(x))) # data.frame
+
+## test subsetting
+x[c(1,3)] # keep individuals 1 and 3
+as.list(x[c(1,3)])
+x[c(1,3), 1:2] # keep individuals 1 and 3, loci 1 and 2
+as.list(x[c(1,3), 1:2])
+x[c(TRUE,FALSE), c(TRUE,TRUE,FALSE,FALSE)] # same, using logicals
+as.list(x[c(TRUE,FALSE), c(TRUE,TRUE,FALSE,FALSE)])
+
+
+## REAL-SIZE EXAMPLE ##
+## 50 genotypes of 1,000,000 SNPs
+dat <- lapply(1:50, function(i) sample(c(0,1,NA), 1e6, prob=c(.5, .49, .01), replace=TRUE))
+names(dat) <- paste("indiv", 1:length(dat))
+print(object.size(dat), unit="auto") # size of the original data
+
+system.time(x <- new("genlight", dat)) # conversion + time taken
+x
+print(object.size(x), unit="auto") # size of the genlight object
+object.size(dat)/object.size(x) # conversion efficiency
+}
+\keyword{classes}
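
Note on the constructor arguments documented above: the sketch below illustrates, in base R only, the documented handling of 'gen' (shorter genotypes completed with NAs, with a warning), 'ploidy' (guessed as the maximum number of second alleles per individual) and 'loc.all' (two letters separated by '/'). The function names genotypes_to_matrix, guess_ploidy and valid_loc_all are hypothetical and are not adegenet functions; the actual implementation is not part of this commit and may differ.

```r
## Hypothetical helpers (NOT adegenet code), sketching the documented
## handling of the 'gen', 'ploidy' and 'loc.all' arguments.

## List of genotypes -> integer matrix (rows = individuals, columns = SNPs);
## shorter genotypes are completed with NAs, with a warning.
genotypes_to_matrix <- function(gen) {
  n_loc <- max(lengths(gen))
  if (any(lengths(gen) < n_loc)) {
    warning("genotypes differ in length; shorter ones completed with NAs")
  }
  pad <- function(g) c(as.integer(g), rep(NA_integer_, n_loc - length(g)))
  do.call(rbind, lapply(gen, pad))
}

## Ploidy guessed as the maximum number of second alleles per individual.
guess_ploidy <- function(mat) {
  apply(mat, 1, max, na.rm = TRUE)
}

## Alleles must be coded as two letters separated by '/'.
valid_loc_all <- function(loc.all) {
  grepl("^[[:alpha:]]/[[:alpha:]]$", loc.all)
}

## same toy data as in the genlight examples
dat <- list(toto = c(1, 1, 0, 0), titi = c(NA, 1, 1, 0), tata = c(NA, 0, 3, NA))
mat <- genotypes_to_matrix(dat)
mat
guess_ploidy(mat)
valid_loc_all(c("a/t", "a  t", "a |t"))
```

Because ploidy is guessed per individual, the three toy genotypes above can end up with different ploidy values, which matches the statement that genotypes in a genlight object may have different ploidy.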
