[adegenet-commits] r900 - in pkg: R inst/doc

Fri Jun 10 13:19:41 CEST 2011

Author: jombart
Date: 2011-06-10 13:19:41 +0200 (Fri, 10 Jun 2011)
New Revision: 900

Modified:
   pkg/R/SNPbin.R
   pkg/R/import.R
   pkg/inst/doc/adegenet-genomics.Rnw
Log:
Corrected minor stuff in read.snp and show method


Modified: pkg/R/SNPbin.R
===================================================================

--- pkg/R/SNPbin.R	2011-06-02 14:59:23 UTC (rev 899)
+++ pkg/R/SNPbin.R	2011-06-10 11:19:41 UTC (rev 900)
@@ -501,6 +501,14 @@
         cat("\n @position: position of the SNPs")
     }
 
+    if(!is.null(alleles(object))){
+        cat("\n @alleles: alleles of the SNPs")
+    }
+
+    if(!is.null(object@loc.names)){
+        cat("\n @loc.names: names of the SNPs")
+    }
+
     if(!is.null(other(object))){
         cat("\n @other: ")
         cat("a list containing: ")
@@ -617,7 +625,16 @@
 
 ## locNames
 setMethod("locNames","genlight", function(x,...){
-    return(x@loc.names)
+    ## if loc names provided, return them
+    if(!is.null(x@loc.names)) return(x@loc.names)
+
+    ## otherwise, look for position / alleles
+    if(!is.null(res <- position(x))){
+        if(!is.null(alleles(x))){
+            res <- paste(res, alleles(x), sep=".")
+        }
+        return(res)
+    }
 })
 
 

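The fallback order introduced in locNames above can be sketched outside
the package as follows. This is only an illustration in base R: the
function loc_names_fallback and its plain arguments are hypothetical
stand-ins for the genlight slots and accessors, not part of adegenet.

```r
## Hypothetical stand-in for the locNames method: plain vectors replace
## the genlight slots/accessors (assumption, not adegenet code).
loc_names_fallback <- function(loc.names = NULL, position = NULL, alleles = NULL) {
    ## if locus names are provided, return them
    if (!is.null(loc.names)) return(loc.names)

    ## otherwise, build labels from positions, appending alleles if known
    if (!is.null(position)) {
        res <- position
        if (!is.null(alleles)) {
            res <- paste(res, alleles, sep = ".")
        }
        return(res)
    }

    ## nothing available
    NULL
}

## positions and alleles taken from the example .snp file shown in the vignette
loc_names_fallback(position = c(1, 8, 11, 43),
                   alleles = c("a/t", "g/c", "a/c", "t/a"))
```

Explicit names always take precedence; positions alone are returned
unchanged when no alleles are available.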
Modified: pkg/R/import.R
===================================================================
--- pkg/R/import.R	2011-06-02 14:59:23 UTC (rev 899)
+++ pkg/R/import.R	2011-06-10 11:19:41 UTC (rev 900)
@@ -859,7 +859,7 @@
         other <- list(chromosome = misc.info$chromosome)
     }
 
-    res <- new("genlight", gen=res, ind.names=ind.names, loc.names=misc.info$position, loc.all=misc.info$allele, ploidy=misc.info$ploidy, pop=misc.info$population, other=other)
+    res <- new("genlight", gen=res, ind.names=ind.names, position=misc.info$position, loc.all=misc.info$allele, ploidy=misc.info$ploidy, pop=misc.info$population, other=other)
 
     if(!quiet) cat("\n...done.\n\n")
 

Modified: pkg/inst/doc/adegenet-genomics.Rnw
===================================================================
--- pkg/inst/doc/adegenet-genomics.Rnw	2011-06-02 14:59:23 UTC (rev 899)
+++ pkg/inst/doc/adegenet-genomics.Rnw	2011-06-10 11:19:41 UTC (rev 900)
@@ -363,7 +363,7 @@
 returned by \texttt{as.matrix}.
 Therefore, subsetting can be achieved using $[$ \texttt{idx.row , idx.col} $]$ where \texttt{idx.row}
 and \texttt{idx.col} are indices for rows (individuals) and columns (SNPs).
-For instance, using the previous toy dataset, we try a few classical subsetting for matrices:
+For instance, using the previous toy dataset, we try a few classical ways of subsetting rows and columns:
 <<>>=
 x
 as.matrix(x)
@@ -393,16 +393,109 @@
 
 
 
+
 %%%%%%%%%%%%%%%%
 \subsection{Data conversions}
 %%%%%%%%%%%%%%%%
 
-
-
 % % % % % % % % % % % % %
 \subsubsection{The \texttt{.snp} format}
 % % % % % % % % % % % % %
 
+\textit{adegenet} has defined its own format for storing biallelic SNP data in text files with
+extension \texttt{.snp}.
+This format has several advantages: it is fairly compact (more so than usual non-compressed
+formats), allows for any information about individuals or loci to be stored, allows for comments,
+and is easily parsed --- in particular, not all information has to be read at once, again
+minimizing RAM requirements for import procedures.
+
+
+An example file of this format is distributed with adegenet.
+Once the package has been installed, the file can be accessed by typing:
+<<eval=FALSE>>=
+file.show(system.file("files/exampleSnpDat.snp",package="adegenet"))
+@
+Otherwise, this file is also accessible from the \textit{adegenet} website (section 'Documents').
+A complete description of the \texttt{.snp} format is provided in the comment section of the file.
+\\
+
+
+The structure of a \texttt{.snp} file can be summarized as follows:
+\begin{itemize}
+\item a (possibly empty) \texttt{comment section}
+\item \texttt{meta-information}, i.e. information about loci or individuals, stored as named vectors
+\item \texttt{genotypes}, stored as named vectors
+\end{itemize}
+
+The \textit{comment section} starts with the line:\\
+\noindent \texttt{>>>> begin comments - do not remove this line <<<<}\\
+\noindent and ends with the line:\\
+\noindent \texttt{>>>> end comments - do not remove this line <<<<}.\\
+\noindent While this section can be left empty, these two lines have to be present for the format to
+be valid.
+Each piece of \textit{meta-information} is stored using two lines, the first starting as
+\texttt{>> name-of-the-information}, and the second containing the information itself, with
+items separated by single spaces.
+Any label can be used, but some specific names will be recognized and interpreted by the parser:
+\begin{itemize}
+\item \texttt{position}: the following line contains integers giving the position of the SNPs on the sequence
+\item \texttt{allele}: character strings representing the two alleles of each locus, separated by "/"
+\item \texttt{population}: character strings indicating the group membership of the individuals
+\item \texttt{ploidy}: integers indicating the ploidy of each individual; alternatively, one single integer if
+all individuals have the same ploidy
+\item \texttt{chromosome}: character strings indicating the chromosome on which the SNPs are located
+\end{itemize}
+Each \textit{genotype} is stored using two lines, the first being
+\texttt{> label-of-the-individual}, and the second being integers giving the number of copies of the
+second allele at each locus, without separators; missing data are coded as '\texttt{-}'.
+\\
+
+
+\texttt{.snp} files can be read in R using \texttt{read.snp}, which converts data into
+\texttt{genlight} objects.
+The function reads data in chunks of several individuals (minimum 1, no maximum besides RAM
+constraints) at a time, which allows one to read massive datasets with negligible RAM requirements
+(albeit at a cost of computational time). The argument \texttt{chunkSize} indicates the number of
+genomes read at a time; larger values mean reading data faster but require more RAM.
+We can illustrate \texttt{read.snp} using the example file mentioned above.
+The non-comment part of the file reads:
+\begin{verbatim}
+[...]
+>> position
+1 8 11 43
+>> allele
+a/t g/c a/c t/a
+>> population
+Brit Brit Fren monster NA
+>> ploidy
+2
+> foo
+1020
+> bar
+0012
+> toto
+10-0
+> Nyarlathotep
+0120
+> an even longer label but OK since on a single line
+1100
+\end{verbatim}
+We read the file in using:
+<<>>=
+obj <- read.snp(system.file("files/exampleSnpDat.snp",package="adegenet"), chunkSize=2)
+obj
+as.matrix(obj)
+alleles(obj)
+pop(obj)
+indNames(obj)
+@
+Note that \texttt{system.file} is generally not needed: it is only used in this example to access a
+file installed alongside the package. Usual calls to \texttt{read.snp} will resemble:
+<<eval=FALSE>>=
+obj <- read.snp("path-to-my-file.snp")
+@
+
+
 % % % % % % % % % % % % %
 \subsubsection{Importing data from PLINK}
 % % % % % % % % % % % % %
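
The .snp layout described in the vignette text above (comment section,
">> " meta-information pairs, "> " genotype pairs) can be illustrated
with a minimal base-R sketch. This is not the read.snp implementation:
the function parse_snp_sketch and the toy input are hypothetical, and
the sketch assumes a single, well-formed comment section, one digit per
locus, and no chunked reading.

```r
## Toy .snp content, modelled on the example file shown in the vignette
snp_lines <- c(
    ">>>> begin comments - do not remove this line <<<<",
    "toy data for illustration",
    ">>>> end comments - do not remove this line <<<<",
    ">> position",
    "1 8 11 43",
    ">> allele",
    "a/t g/c a/c t/a",
    ">> ploidy",
    "2",
    "> foo",
    "1020",
    "> toto",
    "10-0"
)

## Hypothetical parser (assumption: not part of adegenet)
parse_snp_sketch <- function(lines) {
    ## drop everything up to and including the end-of-comments marker
    end <- grep("^>>>> end comments", lines)
    body <- lines[-seq_len(end)]

    ## meta-information: header line ">> name", then space-separated items
    m <- grep("^>> ", body)
    meta <- lapply(body[m + 1], function(s) strsplit(s, " ", fixed = TRUE)[[1]])
    names(meta) <- sub("^>> ", "", body[m])

    ## genotypes: header line "> label", then one character per locus
    g <- grep("^> ", body)
    geno <- do.call(rbind, lapply(body[g + 1], function(s) {
        ch <- strsplit(s, "", fixed = TRUE)[[1]]
        ch[ch == "-"] <- NA          # '-' codes missing data
        as.integer(ch)
    }))
    rownames(geno) <- sub("^> ", "", body[g])

    list(meta = meta, geno = geno)
}

res <- parse_snp_sketch(snp_lines)
res$meta$position
res$geno
```

Each row of res$geno holds, for one individual, the number of copies of
the second allele at each locus, with NA for missing data.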