[adegenet-commits] r755 - pkg/man
noreply at r-forge.r-project.org
Wed Jan 5 17:55:18 CET 2011
Author: jombart
Date: 2011-01-05 17:55:18 +0100 (Wed, 05 Jan 2011)
New Revision: 755
Added:
pkg/man/SNPbin.Rd
pkg/man/genlight.Rd
Modified:
pkg/man/as.genind.Rd
pkg/man/genind.Rd
Log:
Documented genlight as well.
Added: pkg/man/SNPbin.Rd
===================================================================
--- pkg/man/SNPbin.Rd (rev 0)
+++ pkg/man/SNPbin.Rd 2011-01-05 16:55:18 UTC (rev 755)
@@ -0,0 +1,138 @@
+\name{SNPbin-class}
+\docType{class}
+\alias{SNPbin}
+\alias{SNPbin-class}
+\alias{[,SNPbin-method}
+\alias{[,SNPbin,ANY,ANY-method}
+\alias{initialize,SNPbin-method}
+\alias{show,SNPbin-method}
+\alias{nLoc,SNPbin-method}
+\alias{$,SNPbin-method}
+\alias{$<-,SNPbin-method}
+\alias{names,SNPbin-method}
+\alias{ploidy,SNPbin-method}
+\alias{as,SNPbin,integer-method}
+\alias{as.integer,SNPbin-method}
+% \alias{,SNPbin-method}
+% \alias{,SNPbin-method}
+% \alias{,SNPbin-method}
+% \alias{,SNPbin-method}
+%%%%
+\title{Formal class "SNPbin"}
+\description{
+ The class \code{SNPbin} is a formal (S4) class for storing a genotype
+ of binary SNPs in a compact way, using a bit-level coding scheme.
+ This storage is most efficient with haploid data, where the memory
+ taken to represent data can be reduced more than 50 times. However,
+ \code{SNPbin} can be used for any level of ploidy, and still remains an
+ efficient storage mode.
+
+ A \code{SNPbin} object can be constructed from
+ a vector of integers giving the number of copies of the second allele
+ at each locus.
+
+ \code{SNPbin} stores a single genotype. To store multiple genotypes,
+ use the \linkS4class{genlight} class.
+}
+\section{Objects from the class SNPbin}{
+ \code{SNPbin} objects can be created by calls to \code{new("SNPbin",
+ ...)}, where '...' can be the following arguments:
+
+ \describe{
+ \item{\code{snp}}{a vector of integers or numerics giving the number of
+ copies of the second allele at each locus. If only one unnamed
+ argument is provided to 'new', it is taken to be this one.}
+ \item{\code{ploidy}}{an integer indicating the ploidy of the
+ genotype; if not provided, will be guessed from the data (as the
+ maximum from the 'snp' input vector).}
+ \item{\code{label}}{an optional character string serving as a label
+ for the genotype.}
+ }
+}
+\section{Slots}{
+ The following slots are the content of instances of the class
+ \code{SNPbin}; note that in most cases, it is better to retrieve
+ information via accessors (see below), rather than by accessing the
+ slots manually.
+ \describe{
+ \item{\code{snp}:}{a list of vectors with the class \code{raw}.}
+ \item{\code{n.loc}:}{an integer indicating the number of SNPs of the
+ genotype.}
+ \item{\code{NA.posi}:}{a vector of integers giving the positions of
+ missing data.}
+ \item{\code{label}:}{an optional character string serving as a label
+ for the genotype.}
+ \item{\code{ploidy}:}{an integer indicating the ploidy of the genotype.}
+}
+}
+\section{Methods}{
+ Here is a list of methods available for \code{SNPbin} objects. Most of
+ these methods are accessors, that is, functions which are used to
+ retrieve the content of the object. Specific manpages can exist for
+ accessors with more than one argument. These are indicated by a '*'
+ symbol next to the method's name. This list also contains methods
+ for conversion from \code{SNPbin} to other classes.
+ \describe{
+ \item{[}{\code{signature(x = "SNPbin")}: usual method to subset
+ objects in R. The argument indicates how SNPs are to be
+ subsetted. It can be a vector of signed integers or of logicals.}
+ \item{show}{\code{signature(x = "SNPbin")}: printing of the
+ object.}
+ \item{$}{\code{signature(x = "SNPbin")}: similar to the @ operator;
+ used to access the content of slots of the object.}
+ \item{$<-}{\code{signature(x = "SNPbin")}: similar to the @ operator;
+ used to replace the content of slots of the object.}
+ \item{nLoc}{\code{signature(x = "SNPbin")}: returns the number of
+ SNPs in the object.}
+ \item{names}{\code{signature(x = "SNPbin")}: returns the names of
+ the slots of the object.}
+ \item{ploidy}{\code{signature(x = "SNPbin")}: returns the ploidy of
+ the genotype.}
+ \item{as.integer}{\code{signature(x = "SNPbin")}: converts a
+ \code{SNPbin} object to a vector of integers. The S4 method 'as' can
+ be used as well (e.g. as(x, "integer")).}
+ }
+}
+\author{Thibaut Jombart (\email{t.jombart at imperial.ac.uk})}
+\seealso{
+ Related classes:\cr
+ - \code{\linkS4class{genlight}}, for storing multiple binary SNP
+ genotypes. \cr
+ - \code{\linkS4class{genind}}, for storing other types of genetic markers. \cr
+}
+\examples{
+#### HAPLOID EXAMPLE ####
+## create a genotype of 1,000,000 SNPs
+dat <- sample(c(0,1,NA), 1e6, prob=c(.495, .495, .01), replace=TRUE)
+dat[1:10]
+x <- new("SNPbin", dat)
+x
+x[1:10] # subsetting
+as.integer(x[1:10])
+
+## try a few accessors
+ploidy(x)
+nLoc(x)
+head(x$snp[[1]]) # internal bit-level coding
+
+## check that conversion is OK
+identical(as(x, "integer"),as.integer(dat)) # SHOULD BE TRUE
+
+## compare the size of the objects
+print(object.size(dat), unit="auto")
+print(object.size(x), unit="auto")
+object.size(dat)/object.size(x) # EFFICIENCY OF CONVERSION
+
+
+#### TETRAPLOID EXAMPLE ####
+## create a genotype of 1,000,000 SNPs
+dat <- sample(c(0:4,NA), 1e6, prob=c(rep(.995/5,5), 0.005), replace=TRUE)
+x <- new("SNPbin", dat)
+identical(as(x, "integer"),as.integer(dat)) # MUST BE TRUE
+
+## compare the size of the objects
+print(object.size(dat), unit="auto")
+print(object.size(x), unit="auto")
+object.size(dat)/object.size(x) # EFFICIENCY OF CONVERSION
+}
+\keyword{classes}
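The bit-level coding described in SNPbin.Rd can be illustrated with a minimal base-R sketch. This is not the adegenet implementation: the helpers pack_snp() and unpack_snp() are hypothetical, only the haploid case (0/1/NA) is covered, and the list elements simply borrow the slot names documented above (snp, n.loc, NA.posi). The actual internal layout of SNPbin may differ.

```r
## Hypothetical helpers, NOT part of adegenet: pack a haploid 0/1/NA SNP
## vector into raw bytes (8 SNPs per byte), storing NA positions separately.
pack_snp <- function(snp) {
  na_posi <- which(is.na(snp))
  bits <- as.integer(snp)
  bits[na_posi] <- 0L                      # packBits() does not accept NAs
  n_loc <- length(bits)
  pad <- (8L - n_loc %% 8L) %% 8L          # pad to a whole number of bytes
  bits <- c(bits, integer(pad))
  list(snp = packBits(bits, type = "raw"), n.loc = n_loc, NA.posi = na_posi)
}

## Reverse operation: expand the bytes, drop the padding, restore the NAs.
unpack_snp <- function(obj) {
  bits <- as.integer(rawToBits(obj$snp))[seq_len(obj$n.loc)]
  bits[obj$NA.posi] <- NA_integer_
  bits
}

set.seed(1)
dat <- sample(c(0L, 1L, NA), 1000, prob = c(.495, .495, .01), replace = TRUE)
packed <- pack_snp(dat)
identical(unpack_snp(packed), dat)   # round trip
length(packed$snp)                   # 1000 SNPs / 8 per byte = 125 bytes
```

Keeping the missing-data positions in a separate integer vector is what lets each SNP occupy a single bit in this sketch; whether SNPbin handles polyploid genotypes the same way is not stated in the documentation above.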
Modified: pkg/man/as.genind.Rd
===================================================================
--- pkg/man/as.genind.Rd 2011-01-05 16:02:27 UTC (rev 754)
+++ pkg/man/as.genind.Rd 2011-01-05 16:55:18 UTC (rev 755)
@@ -49,8 +49,14 @@
\author{Thibaut Jombart \email{t.jombart at imperial.ac.uk}}
\seealso{
\code{\linkS4class{genind} class}, and \code{\link{import2genind}} for
- importing from various types of file.
+ importing from various types of file.\cr
+
+ Related classes:\cr
+ - \linkS4class{genpop} for storing data per population\cr
+
+ - \linkS4class{genlight} for efficient storage of binary SNP genotypes\cr
}
+}
\examples{
data(nancycats)
nancycats@loc.names
Modified: pkg/man/genind.Rd
===================================================================
--- pkg/man/genind.Rd 2011-01-05 16:02:27 UTC (rev 754)
+++ pkg/man/genind.Rd 2011-01-05 16:55:18 UTC (rev 755)
@@ -63,8 +63,12 @@
\seealso{\code{\link{as.genind}}, \code{\link{is.genind}}, \code{\link{genind2genpop}},
\code{\link{genpop}}, \code{\link{import2genind}},
\code{\link{read.genetix}}, \code{\link{read.genepop}},
- \code{\link{read.fstat}}, \code{\link{na.replace}}
-
+ \code{\link{read.fstat}}, \code{\link{na.replace}}\cr
+
+ Related classes:\cr
+ - \linkS4class{genpop} for storing data per population\cr
+
+ - \linkS4class{genlight} for efficient storage of binary SNP genotypes\cr
}
\author{ Thibaut Jombart \email{t.jombart at imperial.ac.uk} }
\examples{
Added: pkg/man/genlight.Rd
===================================================================
--- pkg/man/genlight.Rd (rev 0)
+++ pkg/man/genlight.Rd 2011-01-05 16:55:18 UTC (rev 755)
@@ -0,0 +1,178 @@
+\name{genlight-class}
+\docType{class}
+\alias{genlight}
+\alias{genlight-class}
+\alias{[,genlight-method}
+\alias{[,genlight,ANY,ANY-method}
+\alias{initialize,genlight-method}
+\alias{show,genlight-method}
+\alias{nLoc,genlight-method}
+\alias{nInd,genlight-method}
+\alias{$,genlight-method}
+\alias{$<-,genlight-method}
+\alias{names,genlight-method}
+\alias{ploidy,genlight-method}
+\alias{locNames,genlight-method}
+\alias{indNames,genlight-method}
+\alias{as,genlight,matrix-method}
+\alias{as.matrix,genlight-method}
+\alias{as,genlight,data.frame-method}
+\alias{as.data.frame,genlight-method}
+\alias{as,genlight,list-method}
+\alias{as.list,genlight-method}
+% \alias{,genlight-method}
+% \alias{,genlight-method}
+% \alias{,genlight-method}
+% \alias{,genlight-method}
+%%%%
+\title{Formal class "genlight"}
+\description{
+ The class \code{genlight} is a formal (S4) class for storing genotypes
+ of binary SNPs in a compact way, using a bit-level coding scheme.
+ This storage is most efficient with haploid data, where the memory
+ taken to represent data can be reduced more than 50 times. However,
+ \code{genlight} can be used for any level of ploidy, and still remains an
+ efficient storage mode.
+
+ A \code{genlight} object can be constructed from vectors of integers
+ giving the number of copies of the second allele for each locus and each
+ individual (see 'Objects from the class genlight' below).
+
+ \code{genlight} stores multiple genotypes. Each genotype is stored
+ as a \linkS4class{SNPbin} object.
+}
+\section{Objects from the class genlight}{
+ \code{genlight} objects can be created by calls to \code{new("genlight",
+ ...)}, where '...' can be the following arguments:
+
+ \describe{
+ \item{\code{gen}}{input genotypes, where each genotype is coded as a
+ vector of numbers of the second allele. If a list, each element of the
+ list corresponds to an individual; if a matrix or a data.frame, rows
+ correspond to individuals and columns to SNPs. If individuals or
+ loci are named in the input, these names will be stored in the
+ produced object. All individuals are expected to have the same
+ number of SNPs. Shorter genotypes are completed with NAs, issuing a
+ warning.}
+ \item{\code{ploidy}}{an optional vector of integers indicating the ploidy of the
+ genotypes. Genotypes can therefore have different ploidy. If not
+ provided, ploidy will be guessed from the data (as the
+ maximum number of second alleles in each individual).}
+ \item{\code{ind.names}}{an optional vector of characters giving the labels
+ of the genotypes.}
+ \item{\code{loc.names}}{an optional vector of characters giving the labels
+ of the SNPs.}
+ \item{\code{loc.all}}{an optional vector of characters indicating
+ the alleles of each SNP; for each SNP, alleles must be coded by two
+ letters separated by '/', e.g. 'a/t' is valid, but 'a t' or 'a |t' are not.}
+ }
+}
+\section{Slots}{
+ The following slots are the content of instances of the class
+ \code{genlight}; note that in most cases, it is better to retrieve
+ information via accessors (see below), rather than by accessing the
+ slots manually.
+ \describe{
+ \item{\code{gen}:}{a list of genotypes stored as \linkS4class{SNPbin} objects.}
+ \item{\code{n.loc}:}{an integer indicating the number of SNPs of the
+ genotypes.}
+ \item{\code{ind.names}:}{a vector of characters indicating the names of
+ genotypes.}
+ \item{\code{loc.names}:}{a vector of characters indicating the names of
+ SNPs.}
+ \item{\code{loc.all}:}{a vector of characters indicating the alleles
+ of each SNP.}
+ \item{\code{ploidy}:}{a vector of integers indicating the ploidy of each genotype.}
+ }
+}
+\section{Methods}{
+ Here is a list of methods available for \code{genlight} objects. Most of
+ these methods are accessors, that is, functions which are used to
+ retrieve the content of the object. Specific manpages can exist for
+ accessors with more than one argument. These are indicated by a '*'
+ symbol next to the method's name. This list also contains methods
+ for conversion from \code{genlight} to other classes.
+ \describe{
+ \item{[}{\code{signature(x = "genlight")}: usual method to subset
+ objects in R. Is to be applied as if the object was a matrix where
+ genotypes are rows and SNPs are columns. Indexing can be done via
+ vectors of signed integers or of logicals.}
+ \item{show}{\code{signature(x = "genlight")}: printing of the
+ object.}
+ \item{$}{\code{signature(x = "genlight")}: similar to the @ operator;
+ used to access the content of slots of the object.}
+ \item{$<-}{\code{signature(x = "genlight")}: similar to the @ operator;
+ used to replace the content of slots of the object.}
+ \item{nInd}{\code{signature(x = "genlight")}: returns the number of
+ individuals in the object.}
+ \item{nLoc}{\code{signature(x = "genlight")}: returns the number of
+ SNPs in the object.}
+ \item{names}{\code{signature(x = "genlight")}: returns the names of
+ the slots of the object.}
+ \item{ploidy}{\code{signature(x = "genlight")}: returns the ploidy of
+ the genotypes.}
+ \item{indNames}{\code{signature(x = "genlight")}: returns the names of
+ the individuals, if provided when the object was constructed.}
+ \item{locNames}{\code{signature(x = "genlight")}: returns the names of
+ the loci, if provided when the object was constructed.}
+ \item{as.matrix}{\code{signature(x = "genlight")}: converts a
+ \code{genlight} object into a matrix of integers, with individuals
+ in rows and SNPs in columns. The S4 method 'as' can be used as
+ well (e.g. as(x, "matrix")).}
+ \item{as.data.frame}{\code{signature(x = "genlight")}: same as \code{as.matrix}.}
+ \item{as.list}{\code{signature(x = "genlight")}: converts a
+ \code{genlight} object into a list of genotypes coded as vectors of
+ integers (numbers of second allele). The S4 method 'as' can be
+ used as well (e.g. as(x, "list")).}
+ }
+}
+\author{Thibaut Jombart (\email{t.jombart at imperial.ac.uk})}
+\seealso{
+ Related classes:\cr
+ - \code{\linkS4class{SNPbin}}, for storing individual genotypes of
+ binary SNPs\cr
+
+ - \code{\linkS4class{genind}}, for storing other types of genetic markers. \cr
+}
+\examples{
+## TOY EXAMPLE ##
+## create and convert data
+dat <- list(toto=c(1,1,0,0), titi=c(NA,1,1,0), tata=c(NA,0,3, NA))
+x <- new("genlight", dat)
+x
+
+## examine the content of the object
+names(x)
+x@gen
+x@gen[[1]]@snp # bit-level coding for first individual
+
+## conversions
+as.list(x)
+as.matrix(x)
+
+## round trips - must return TRUE
+identical(x, new("genlight", as.list(x))) # list
+identical(x, new("genlight", as.matrix(x))) # matrix
+identical(x, new("genlight", as.data.frame(x))) # data.frame
+
+## test subsetting
+x[c(1,3)] # keep individuals 1 and 3
+as.list(x[c(1,3)])
+x[c(1,3), 1:2] # keep individuals 1 and 3, loci 1 and 2
+as.list(x[c(1,3), 1:2])
+x[c(TRUE,FALSE), c(TRUE,TRUE,FALSE,FALSE)] # same, using logicals
+as.list(x[c(TRUE,FALSE), c(TRUE,TRUE,FALSE,FALSE)])
+
+
+## REAL-SIZE EXAMPLE ##
+## 50 genotypes of 1,000,000 SNPs
+dat <- lapply(1:50, function(i) sample(c(0,1,NA), 1e6, prob=c(.5, .49, .01), replace=TRUE))
+names(dat) <- paste("indiv", 1:length(dat))
+print(object.size(dat), units="auto") # size of the original data
+
+x <- new("genlight", dat) # conversion + time taken
+x
+print(object.size(x), units="auto") # size of the genlight object
+object.size(dat)/object.size(x) # conversion efficiency
+}
+\keyword{classes}
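Two input-handling rules documented in genlight.Rd (shorter genotypes are completed with NAs with a warning, and ploidy is guessed as the maximum number of second alleles in each individual) can be sketched in base R as follows. The helper normalise_gen() is hypothetical and is not adegenet code; it only mimics the documented behaviour on a plain list and does no bit-level packing.

```r
## Hypothetical helper, NOT part of adegenet: pad genotypes to a common
## length with NAs and guess one ploidy value per individual.
normalise_gen <- function(gen) {
  n_loc <- max(lengths(gen))
  if (any(lengths(gen) < n_loc)) {
    warning("some genotypes are shorter than others; completing with NAs")
  }
  gen <- lapply(gen, function(g) {
    as.integer(c(g, rep(NA, n_loc - length(g))))
  })
  ## ploidy guessed as the largest allele count observed in each individual
  ploidy <- vapply(gen, function(g) {
    if (all(is.na(g))) NA_integer_ else max(g, na.rm = TRUE)
  }, integer(1))
  list(gen = gen, n.loc = n_loc, ploidy = ploidy)
}

## 'tata' is one SNP short, so it is completed with an NA
dat <- list(toto = c(1, 1, 0, 0), titi = c(NA, 1, 1, 0), tata = c(NA, 0, 3))
res <- suppressWarnings(normalise_gen(dat))
res$ploidy                  # one guessed ploidy per individual
do.call(rbind, res$gen)     # individuals in rows, SNPs in columns
```

Guessing ploidy from the data can underestimate it when an individual never carries the maximum number of second alleles, which is presumably why the constructor also accepts an explicit ploidy argument.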