[GenABEL-dev] [Genabel-commits] r2023 - in pkg: . RepeatABEL RepeatABEL/R RepeatABEL/data RepeatABEL/inst RepeatABEL/inst/doc RepeatABEL/man RepeatABEL/vignettes RepeatABEL/vignettes/marginnote

L.C. Karssen lennart at karssen.org
Fri Nov 20 13:51:01 CET 2015


Dear Lars,

I noticed you committed RepeatABEL to the SVN repo. Great work! I made a
few comments below.

On 20-11-15 12:56, noreply at r-forge.r-project.org wrote:
> Author: larsronn
> Date: 2015-11-20 12:56:57 +0100 (Fri, 20 Nov 2015)
> New Revision: 2023
> 
> Added:
>    pkg/RepeatABEL/
>    pkg/RepeatABEL/DESCRIPTION
>    pkg/RepeatABEL/NAMESPACE
>    pkg/RepeatABEL/R/
>    pkg/RepeatABEL/R/Create_gwaa_scan.R
>    pkg/RepeatABEL/R/SmoothSNPmatrix.R
>    pkg/RepeatABEL/R/compute.GRM.R
>    pkg/RepeatABEL/R/constructV.R
>    pkg/RepeatABEL/R/preFitModel.R
>    pkg/RepeatABEL/R/rGLS.R
>    pkg/RepeatABEL/R/simulate_PhenData.R
>    pkg/RepeatABEL/data/
>    pkg/RepeatABEL/data/Phen.Data.rda
>    pkg/RepeatABEL/data/datalist
>    pkg/RepeatABEL/data/flycatchers.rda
>    pkg/RepeatABEL/data/gen.data.rda
>    pkg/RepeatABEL/data/thinned.flycatchers.rda
>    pkg/RepeatABEL/inst/
>    pkg/RepeatABEL/inst/CITATION
>    pkg/RepeatABEL/inst/doc/
>    pkg/RepeatABEL/inst/doc/RepeatABEL_vignette.pdf

I would remove the PDF from SVN. In principle, only source code should
end up in a version control system, not the output. The R CMD
check/build commands should create the vignette from its .Rnw source.

>    pkg/RepeatABEL/man/
>    pkg/RepeatABEL/man/.Rapp.history

I think the file above has slipped in by mistake.

>    pkg/RepeatABEL/man/Create_gwaa_scan.Rd
>    pkg/RepeatABEL/man/Phen.Data.Rd
>    pkg/RepeatABEL/man/RepeatABEL-package.Rd
>    pkg/RepeatABEL/man/SmoothSNPmatrix.Rd
>    pkg/RepeatABEL/man/compute.GRM.Rd
>    pkg/RepeatABEL/man/constructV.Rd
>    pkg/RepeatABEL/man/flycatchers.Rd
>    pkg/RepeatABEL/man/gen.data.Rd
>    pkg/RepeatABEL/man/preFitModel.Rd
>    pkg/RepeatABEL/man/rGLS.Rd
>    pkg/RepeatABEL/man/simulate_PhenData.Rd
>    pkg/RepeatABEL/man/thinned.flycatchers.Rd
>    pkg/RepeatABEL/vignettes/
>    pkg/RepeatABEL/vignettes/.Rhistory

This file can also be removed.

For each of the aforementioned files I would set the svn:ignore property
to ensure they won't accidentally get committed again. See for example
http://svnbook.red-bean.com/en/1.7/svn.advanced.props.special.ignore.html for
more info on the 'svn propset' command.


Best,

Lennart.

>    pkg/RepeatABEL/vignettes/Figure_Manhattan.pdf
>    pkg/RepeatABEL/vignettes/RepeatABEL_vignette.Rnw
>    pkg/RepeatABEL/vignettes/marginnote/
>    pkg/RepeatABEL/vignettes/marginnote/README
>    pkg/RepeatABEL/vignettes/marginnote/marginnote.dtx
>    pkg/RepeatABEL/vignettes/marginnote/marginnote.ins
>    pkg/RepeatABEL/vignettes/marginnote/marginnote.pdf
>    pkg/RepeatABEL/vignettes/marginnote/marginnote.sty
> Log:
> RepeatABEL package uploaded
> 
> Added: pkg/RepeatABEL/DESCRIPTION
> ===================================================================
> --- pkg/RepeatABEL/DESCRIPTION	                        (rev 0)
> +++ pkg/RepeatABEL/DESCRIPTION	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,11 @@
> +Package: RepeatABEL
> +Type: Package
> +Title: GWAS for Multiple Observations on Related Individuals
> +Version: 1.0
> +Date: 2015-11-03
> +Author: Lars Ronnegard
> +Maintainer: Lars Ronnegard <lrn at du.se>
> +Description: The RepeatABEL package is used to perform genome-wide association studies on individuals that are both related and have repeated measurements.
> +License: GPL
> +Depends: R (>= 2.10), hglm, GenABEL
> +
> 
> Added: pkg/RepeatABEL/NAMESPACE
> ===================================================================
> --- pkg/RepeatABEL/NAMESPACE	                        (rev 0)
> +++ pkg/RepeatABEL/NAMESPACE	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,4 @@
> +exportPattern("^[^\\.]")
> +import("hglm","GenABEL")
> +
> +
> 
> Added: pkg/RepeatABEL/R/Create_gwaa_scan.R
> ===================================================================
> --- pkg/RepeatABEL/R/Create_gwaa_scan.R	                        (rev 0)
> +++ pkg/RepeatABEL/R/Create_gwaa_scan.R	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,22 @@
> +#' @title Creates a scan.gwaa object
> +#'
> +#' @description
> +#' Creates a scan.gwaa object from a GenABEL object and p-values.
> +#'
> +#' @param data A gwaa.data object
> +#' @param P1df P-values computed from external analysis
> +#' @param SNP.eff Estimated additive SNP effects
> +#' 
> +#' @author Lars Ronnegard
> +#' 
> +Create_gwaa_scan <- function(data, P1df, SNP.eff) {
> +	chi2.1df <- qchisq(P1df, df=1, lower.tail=FALSE)
> +	lambda <- try( estlambda(chi2.1df), silent=TRUE )
> +	se_effB <- abs(SNP.eff)/qnorm(P1df/2, lower.tail=FALSE)
> +    if ( class(lambda) == "try-error" ) lambda <- list( estimate = NA, se = NA )
> +	results <- data.frame(effB = SNP.eff,  se_effB = se_effB, chi2.1df = NA, P1df = P1df, Pc1df = NA, stringsAsFactors = FALSE)
> +  rownames(results) = snpnames(data)
> +	out <- new("scan.gwaa", results = results, annotation = annotation(data), lambda = lambda, idnames = idnames(data), call = as.call( list("rGLS") ), family = "gaussian")
> +	return(out)
> +}
> +
> 
> Added: pkg/RepeatABEL/R/SmoothSNPmatrix.R
> ===================================================================
> --- pkg/RepeatABEL/R/SmoothSNPmatrix.R	                        (rev 0)
> +++ pkg/RepeatABEL/R/SmoothSNPmatrix.R	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,19 @@
> +#' @title Imputes column means to missing genotypes
> +#'
> +#' @description
> +#' Imputes column means to missing genotypes.
> +#'
> +#' @param SNP A matrix including SNP coding.
> +#' 
> +#' @author Lars Ronnegard
> +#' 
> +SmoothSNPmatrix <-
> +function(SNP) {
> +	#Imputes column means to missing values
> +	m <- ncol(SNP)
> +    miss <- is.na(colMeans(SNP))
> +	for (i in (1:m)[miss]){
> +	SNP[is.na(SNP[,i]), i] <- mean( SNP[!is.na(SNP[ , i]), i] )
> +	}
> +	return(SNP)
> +}
> 
> Added: pkg/RepeatABEL/R/compute.GRM.R
> ===================================================================
> --- pkg/RepeatABEL/R/compute.GRM.R	                        (rev 0)
> +++ pkg/RepeatABEL/R/compute.GRM.R	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,28 @@
> +#' @title Computes a Genetic Relationship Matrix from a GenABEL object
> +#'
> +#' @description
> +#' Two alternative methods for GRM computations implemented.
> +#'
> +#' @param gen.data The GenABEL object.
> +#' @param method Method to be used.
> +#' 
> +#' @author Lars Ronnegard
> +#' 
> +compute.GRM <-
> +function(gen.data, method = "GenABEL") {
> +  if (method == "GenABEL"){
> +    gkins <- ibs(gen.data, weight = "freq")
> +    GRM <- gkins
> +    GRM[t(lower.tri(gkins))] <- 0
> +    rm(gkins)
> +    GRM <- 2*(GRM + t(GRM) - diag(diag(GRM)))
> +  }
> +  if (method == "ZZt"){
> +    SNP.matrix <- scale(as.double(gen.data))
> +    m <- ncol(SNP.matrix)
> +    markers.to.fit <- (1:m)[!is.na(colMeans(SNP.matrix))] 
> +    GRM <- tcrossprod(SNP.matrix[,markers.to.fit])/length(markers.to.fit)
> +    rm(SNP.matrix)
> +  }
> +  return(GRM)
> +}
> 
> Added: pkg/RepeatABEL/R/constructV.R
> ===================================================================
> --- pkg/RepeatABEL/R/constructV.R	                        (rev 0)
> +++ pkg/RepeatABEL/R/constructV.R	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,22 @@
> +#' @title Constructs the (co)variance matrix for y
> +#'
> +#' @description
> +#' Constructs the (co)variance matrix for y.
> +#'
> +#' @param Z The incidence matrix for the random effects column binded with the Cholesky of the GRM 
> +#' @param RandC The number of columns in the two matrices combined in Z. 
> +#' @param ratio The ratios between random effect variances and the residual variance.
> +#' 
> +#' @author Lars Ronnegard
> +#' 
> +constructV <-
> +function(Z, RandC, ratio) {
> +  V <- diag(nrow(Z))
> +  indx <- 1
> +  for (i in 1:length(ratio)) {
> +    Z.tmp <- Z[,indx:(indx + RandC[i] - 1)]
> +    indx = indx + RandC[i]
> +    V <- V + tcrossprod(Z.tmp) * ratio[i]
> + }
> + return(V)
> +}
> 
> Added: pkg/RepeatABEL/R/preFitModel.R
> ===================================================================
> --- pkg/RepeatABEL/R/preFitModel.R	                        (rev 0)
> +++ pkg/RepeatABEL/R/preFitModel.R	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,175 @@
> +#' @title Fits a linear mixed model (without fixed SNP effects) and computes the fitted variance-covariance matrix for later use in the rGLS function.
> +#'
> +#' @description Uses a GenABEL object and phenotype data as input. The model is fitted using the \code{hglm} function in the hglm package.
> +#'
> +#' @param fixed A formula including the response and fixed effects
> +#' @param random A formula for the random effects
> +#' @param id.name The column name of the IDs in phen.data
> +#' @param genabel.data An GenABEL object including marker information. This object has one observation per individual. 
> +#' @param phenotype.data A data frame including the repeated observations and IDs. 
> +#' @param corStruc A list specifying the correlation structure for each random effect. The options are: \code{"Ind"} for iid random effects, \code{"GRM"} for a correlation structure given by a genetic relationship matrix, or \code{"CAR"} for a spatial correlation structure given by a Conditional Autoregressive model specified by a neighborhood matrix. 
> +#' @param GRM A genetic relationship matrix. If not specified whilst the \code{"GRM"} option is given for \code{corStruc} then the GRM is computed internally within the function. 
> +#' @param Neighbor.Matrix A neighborhood matrix having non-zero value for an element (i,j) where the observations i and j come from neighboring locations. The diagonal elements should be zero. 
> +#' @return Returns a list including the fitted hglm object \code{fitted.hglm}, the variance-covariance matrix \code{V} and the ratios between estimated variance components for the random effects divided by the residual variance, \code{ratio}. 
> +#' @author Lars Ronnegard
> +#' 
> +#' @examples
> +#'  ####### FIRST EXAMPLE USING GRM #############
> +#'  data(Phen.Data) #Phenotype data with repeated observations
> +#'  data(gen.data) #GenABEL object including IDs and marker genotypes
> +#'  GWAS1 <- rGLS(y ~ age + sex, genabel.data = gen.data, phenotype.data = Phen.Data) 
> +#'  plot(GWAS1, main="")
> +#'  summary(GWAS1)
> +#'  #Summary for variance component estimation without SNP effects
> +#'  summary(GWAS1@@call$hglm) 
> +#'  #The same results can be computed using the preFitModel as follows
> +#'  fixed = y ~ age + sex
> +#'  Mod1 <- preFitModel(fixed, random=~1|id, genabel.data = gen.data, 
> +#'                      phenotype.data = Phen.Data, corStruc=list( id=list("GRM","Ind") )) 
> +#'  GWAS1b <- rGLS(fixed, genabel.data = gen.data, 
> +#'                 phenotype.data = Phen.Data, V = Mod1$V) 
> +#'  plot(GWAS1b, main="Results using the preFitModel function")
> +#'  ####### SECOND EXAMPLE USING CAR #############
> +#'  # Add a fake nest variable to the data just to run the example
> +#'  #In this example there are 6 nests and 60 observations per nest
> +#'  Phen.Data$nest <- rep(1:6, each=60)
> +#'  #A model including polygenic effects, permanent environmental effects, 
> +#'  #and nest effect as random
> +#'  Mod2 <- preFitModel(fixed, random=~1|id + 1|nest, genabel.data = gen.data, 
> +#'           phenotype.data = Phen.Data, corStruc=list( id=list("GRM","Ind"), nest=list("Ind")) )
> +#'  GWAS2 <- rGLS(fixed, genabel.data = gen.data, phenotype.data = Phen.Data, V = Mod2$V) 
> +#'  plot(GWAS2) 
> +#'  #Similar to previous plot because the nest effect variance component is almost 0.
> +#'  ###################
> +#'  #Construct a fake nighbourhood matrix
> +#'  D = matrix(0,6,6)
> +#'  D[1,2] = D[2,1] = 1
> +#'  D[5,6] = D[6,5] = 1
> +#'  D[2,4] = D[4,2] = 1
> +#'  D[3,5] = D[5,3] = 1
> +#'  D[1,6] = D[6,1] = 1
> +#'  D[3,4] = D[4,3] = 1
> +#'  #The matrix shows which pair of nests that can be considered as neighbours
> +#'  image(Matrix(D), main="Neighbourhood matrix")
> +#'  Mod3 <- preFitModel(fixed, random=~1|id + 1|nest, genabel.data = gen.data, 
> +#'           phenotype.data = Phen.Data, corStruc=list( id=list("GRM","Ind"), 
> +#'                                      nest=list("CAR")), Neighbor.Matrix=D )
> +#'  GWAS2b <- rGLS(fixed, genabel.data = gen.data, 
> +#'                 phenotype.data = Phen.Data, V = Mod3$V) 
> +#'  plot(GWAS2b)
> +#'
> +preFitModel <- function(fixed = y~1, random = ~1|id, id.name = "id", genabel.data, phenotype.data, corStruc = NULL, GRM = NULL, Neighbor.Matrix = NULL) {
> +  if (!inherits(fixed, "formula") || length(fixed) != 3) stop("\n Fixed-effects model must be a formula of the form \"resp ~ pred\"")
> +  if (!inherits(random, "formula")) stop("\n Random part must be a one-sided formula of the form \" ~ effect|Cluster\"")
> +  if (attr(terms(random), "response") != 0) stop("\n Random part must be a one-sided formula of the form \" ~ effect|Cluster\"")
> +  if (all.names(random)[2] != "|") stop("The subjects/clusters in Random must be separated by \" |\"")
> +  if (class(genabel.data)!="gwaa.data") stop("The input of genabel.data is not a GenABEL object")
> +  if (is.null(genabel.data at phdata$id)) stop("IDs not given as id in the phdata list")
> +  #Get trait name
> +  trait <- all.vars(fixed)[1]
> +  #Remove NAs from phenotypic data
> +  y.all <- phenotype.data[,names(phenotype.data) %in% trait]
> +  phenotype.data <- phenotype.data[!is.na(y.all),]
> +  #Connect IDs in GenABEL data set with IDs in the phenotype file
> +  id1 <- phenotype.data[,names(phenotype.data) %in% id.name] #ID for phenotype data
> +  id2 <- genabel.data at phdata$id #ID for genotype data
> +  test1 <- id1 %in% id2
> +  test2 <- id2 %in% id1
> +  genabel.data = genabel.data[test2, ] #Exclude individuals having no phenotype information
> +  phenotype.data = phenotype.data[test1, ] #Exclude individuals having no genotype information
> +  id1 <- phenotype.data[,names(phenotype.data) %in% id.name] #ID for phenotype data for cleaned data
> +  id2 <- genabel.data at phdata$id #ID for genotype data for cleaned data
> +  ### Create design matrix for the fixed effects ###
> +  Y <- phenotype.data[,names(phenotype.data) %in% trait]
> +  X <- model.matrix(fixed, data = phenotype.data)
> +  if (nrow(X) != length(Y)) stop("Remove all lines with NA from the phenotype input data before running this function.")
> +  row.names(X) <- NULL
> +  ###############################################
> +  ## Create design matrix for random effects ####
> +  ###############################################
> +  #Construct incidence matrix for repeated observations
> +  N <- length(id2)
> +  n <- length(id1)
> +  indx <- numeric(n)
> +  for (i in 1:N) {
> +    indx <- indx + i * (id1 %in% id2[i])
> +  }
> +  ### Get SQRT of GRM first
> +  if (is.null(GRM)) {
> +    autosomalMarkers <- which(chromosome(genabel.data) != "X")
> +    GRM <- compute.GRM(genabel.data[,snpnames(genabel.data)[autosomalMarkers]])  
> +  }
> +  eig <- eigen(GRM)
> +  if (max(diag(GRM))>1.6) print("There seems to be highly inbred individuals in your data")
> +  if (min(eig$values) < -0.5 ) print("The genetic relationship matrix is far from positive definite")
> +  non_zero.eigenvalues = eig$values>(1e-6) #Put numerically small eigenvalues to zero
> +  eig$values[!non_zero.eigenvalues] <- 0
> +  print("GRM ready")
> +  Z.GRM <- ( eig$vectors %*% diag(sqrt(eig$values)) )[indx,]
> +  ##Create Z
> +  RanTermVec <- unlist(strsplit(attr(terms(random), "term.labels"), split = c("+"), fixed = TRUE))
> +  k.rand <- length(RanTermVec)
> +  Z <- NULL
> +  RandC <- NULL
> +  data <- phenotype.data
> +  for (i in 1:k.rand) {
> +    RanTerm <- unlist(strsplit(RanTermVec[i], split = "|", fixed = TRUE))
> +    RanTerm[2] <- gsub(" ", "", RanTerm[2], fixed = TRUE)
> +    ranf <- paste("~", "as.factor(", RanTerm[2], ")", "-1", sep = "")
> +    ranf <- as.formula(ranf)
> +    rmf <- model.frame(ranf, data)
> +    z <- model.matrix(attr(rmf, "terms"), data = rmf)
> +    if (nrow(z) != length(Y)) stop("Remove all lines with NA from the phenotype input data before running this function.")
> +    row.names(z) <- NULL
> +    if (is.null(corStruc)) {
> +      RandC <- c(RandC, ncol(z))
> +      Z <- cbind(Z, z)
> +    }else{
> +      cs <- unlist( corStruc[names(corStruc) %in% RanTerm[2]] )
> +      for (j in 1:length(cs)) {
> +        zL <- switch(cs[j],
> +                     "Ind" = z,
> +                     "GRM" = Z.GRM,
> +                     "CAR" = z,
> +                     stop("Input correlation structure not available")
> +        )
> +        
> +        RandC <- c(RandC, ncol(zL))
> +        Z <- cbind(Z, zL)
> +        j.rand <- length(RandC)
> +        if (cs[j]=="CAR") {
> +          if ( is.null(Neighbor.Matrix) ) stop("CAR correlation requires a Neighbor.Matrix to be specified")
> +          if ( ncol(z) != ncol(Neighbor.Matrix) ) stop("Incorrect dimension of Neighbor.Matrix")
> +          if ( j.rand == 1) {
> +            rand.family = list( CAR(D=Neighbor.Matrix) )
> +          } else {
> +            rand.family[[j.rand]] = CAR(D=Neighbor.Matrix)
> +          }
> +        } else {
> +          if (j.rand == 1) rand.family = list(gaussian())
> +          if (j.rand > 1) rand.family[[j.rand]] = gaussian()
> +        }
> +      }
> +    }
> +  }
> +  #####################################
> +  ### Fit the model 
> +  if (length(RandC) > 1) mod1 <- hglm(y=Y, X=X, Z=Z, RandC=RandC, maxit=200, rand.family=rand.family)
> +  if (length(RandC) == 1) mod1 <- hglm(y=Y, X=X, Z=Z, RandC=RandC, maxit=200, rand.family=rand.family[[1]])
> +  ratio <- mod1$varRanef/mod1$varFix
> +  #stop("The constructV function needs to be changed if CAR model is to be used")
> +  #V <- constructV(Z, RandC, ratio)
> +   V = diag(nrow(Z))
> +    indx = 1
> +    for (i in 1:length(ratio)) {
> +    	Z.tmp <- Z[, indx:(indx + RandC[i] - 1)]
> +		indx = indx + RandC[i]
> +    	if (rand.family[[i]]$family != "CAR") {
> +			V <- V + tcrossprod(Z.tmp) * ratio[i]
> +    	} else {
> +    		ratio[i] <- mod1$CAR.tau/mod1$varFix
> +    		V <- V + Z.tmp %*% solve( diag( ncol(Z.tmp) ) - mod1$CAR.rho * Neighbor.Matrix ) %*% t(Z.tmp) * ratio[i]
> +    	}
> +    }
> +  list( fitted.hglm = mod1, V = V, ratio = ratio)
> +}
> 
> Added: pkg/RepeatABEL/R/rGLS.R
> ===================================================================
> --- pkg/RepeatABEL/R/rGLS.R	                        (rev 0)
> +++ pkg/RepeatABEL/R/rGLS.R	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,155 @@
> +#' @title GWAS for Studies having Repeated Measurements on Related Individuals
> +#'
> +#' @description
> +#'  It is used to perform genome-wide association studies on individuals that are both related and have repeated measurements. 
> +#'  The function computes score statistic based p-values for a linear mixed model including random polygenic effects and 
> +#'  a random effect for repeated measurements. A p-value is computed for each marker and the null hypothesis tested is a
> +#'   zero additive marker effect. 
> +#'
> +#' @param formula.FixedEffects Formula including the response variable and cofactors as fixed effects.
> +#' @param genabel.data An GenABEL object including marker information. This object has one observtion per individuals. 
> +#' @param phenotype.data A data frame including the repeated observations and IDs. 
> +#' @param id.name The column name of the IDs in phen.data
> +#' @param GRM An optional genetic relationship matrix (GRM) can be included as input. Otherwise the GRM is computed within the function.
> +#' @param V An optional (co)variance matrix can be included as input. Otherwise it is computed using the hglm function.
> +#' @param memory Used to optimize computations. The maximum number of elements in a matrix that can be stored efficiently.
> +#' @details 
> +#' A generalized squares (GLS) is fitted for each marker given a (co)variance matrix V. 
> +#' The computations are made fast by transforming the GLS to 
> +#' an ordinary least-squares (OLS) problem using an eigen-decomposition of V. 
> +#' The OLS are computed using QR-factorization. If V is not specified then a model 
> +#' including random polygenic effects and permanent environmental effects is 
> +#' fitted (using the hglm package) to compute V. A GenABEL object (scan.gwaa class) 
> +#' is returned (including also the \code{hglm} results).
> +#' Let e.g. GWAS1 be an object returned by the \code{rGLS} function. 
> +#' Then a Manhattan plot can be produced by calling \code{plot(GWAS1)} and 
> +#' the top SNPs using \code{summary(GWAS1)}. Both of these functions are 
> +#' generic GenABEL functions. \cr
> +#' The results from the fitted linear mixed model without any SNP effect included 
> +#' are produced by calling \code{summary(GWAS1@@call$hglm)}.
> +#' 
> +#' @author Lars Ronnegard
> +#' 
> +#' 
> +#' @examples
> +#'  data(Phen.Data) #Phenotype data with repeated observations
> +#'  data(gen.data) #GenABEL object including IDs and marker genotypes
> +#'  GWAS1 <- rGLS(y ~ age + sex, genabel.data = gen.data, phenotype.data = Phen.Data) 
> +#'  plot(GWAS1, main="")
> +#'  summary(GWAS1)
> +#'  #Summary for variance component estimation without SNP effects
> +#'  summary(GWAS1@@call$hglm) 
> +#'
> +rGLS <-
> +function(formula.FixedEffects = y ~ 1, genabel.data, phenotype.data, id.name = "id", GRM = NULL, V = NULL, memory=1e8) {
> +	#Check input data
> +	if (class(genabel.data) != "gwaa.data") stop("The input of genabel.data is not a GenABEL object")
> +	if (is.null(genabel.data at phdata$id)) stop("IDs not given as id in the phdata list")
> +    if (!is.null(GRM)) { if(!isSymmetric(GRM)) warning("The given GRM must be a symmetric matrix!") }
> +    if (!is.null(V)) { if(!isSymmetric(V)) warning("The given V must be a symmetric matrix!") }
> +    V.input  <- V
> +    #require(hglm)
> +    #require(GenABEL)
> +    #Get trait name
> +	trait <- all.vars(formula.FixedEffects)[1]
> +    #Remove NAs from phenotypic data
> +    y.all <- phenotype.data[,names(phenotype.data)%in%trait]
> +    phenotype.data <- phenotype.data[!is.na(y.all),]
> +	#Connect IDs in GenABEL data set with IDs in the phenotype file
> +	id1 <- phenotype.data[,names(phenotype.data) %in% id.name] #ID for phenotype data
> +	id2 <- genabel.data at phdata$id #ID for genotype data
> +    test1 <- id1 %in% id2
> +    test2 <- id2 %in% id1
> +    genabel.data <- genabel.data[test2,] #Exclude individuals having no phenotype information
> +    phenotype.data <- phenotype.data[test1,] #Exclude individuals having no genotype information
> +    id1 <- phenotype.data[,names(phenotype.data) %in% id.name] #ID for phenotype data for cleaned data
> +	id2 <- genabel.data at phdata$id #ID for genotype data for cleaned data
> +	#####################
> +	#Construct incidence matrix for repeated observations
> +    N=length(id2)
> +    n=length(id1)
> +    indx <- numeric(n)
> +    for (i in 1:N) {
> +        indx <- indx + i * (id1 %in% id2[i])
> +    }
> +    Z.indx <- diag(N)[indx,]
> +    #Construct response and design matrix
> +	y <- phenotype.data[ , names(phenotype.data) %in% trait] #Create the response variable
> +    X <- model.matrix(formula.FixedEffects, data = phenotype.data) #Fixed effect design matrix
> +	#####################
> + if (is.null(V)) {
> +    #Construct GRM
> +    if (is.null(GRM)) {
> +      autosomalMarkers <- which(chromosome(genabel.data)!= "X")
> +      GRM <- compute.GRM(genabel.data[ , snpnames(genabel.data)[autosomalMarkers]])
> +    }
> +    eig <- eigen(GRM)
> +    if (max(diag(GRM)) > 1.6) print("There seems to be highly inbred individuals in your data")
> +    if (min(eig$values < -0.5)) print("The genetic relationship matrix is far from positive definite")
> +    non_zero.eigenvalues <- eig$values>(1e-6) #Put numerically small eigenvalues to zero
> +    eig$values[ !non_zero.eigenvalues ] <- 0
> +    print("GRM ready")
> +    #####################
> +    #Fit hglm
> +    Z.GRM <- ( eig$vectors %*% diag(sqrt(eig$values)) )[indx, ]
> +    Z <- (cbind(Z.GRM, Z.indx))
> +    mod1 <- hglm(y=y, X=X, Z=Z, RandC = c(ncol(Z.GRM), ncol(Z.indx)), maxit = 200)
> +    if (mod1$Converge != "converged") stop("The variance component estimation did not converge in 200 iterations. Try to estimate them separately and provide the estimated (co)variance matrix V as input. \n\n")
> +    print("Variance component estimation ready")
> +    #####################
> +    #Construct rotation matrix
> +    ratio <- mod1$varRanef/mod1$varFix
> +    V <- constructV(Z=Z, RandC = c(ncol(Z.GRM), ncol(Z.indx)), ratio)
> +  }
> +	eig.V <- eigen(V)
> +	transf.matrix <- diag(1/sqrt(eig.V$values)) %*% t(eig.V$vectors)
> +	y.new <- transf.matrix %*% y
> +	X.new <- transf.matrix %*% X
> +	print("Rotation matrix ready")
> +	#####################
> +	#Fit a linear model for each SNP
> +	SNP.matrix <- as.double(genabel.data)
> +    if (sum(is.na(SNP.matrix)) > 0) {
> +        SNP.matrix <- SmoothSNPmatrix(SNP.matrix)
> +    }
> +	m <- ncol(SNP.matrix)
> +	p.val <- SNP.est <- rep(1, m)
> +	colnames(X.new) <- as.character(1:ncol(X.new)) #To avoid columns having strange names
> +	print("Rotate LMM started")
> +    #Fit using QR factorization
> +    #Null model
> +	qr0 <- qr(X.new)
> +	est0 <- qr.coef(qr0, y.new)
> +	res <- y.new - X.new %*% est0
> +    n <- length(y.new)
> +	RSS.0 <- sum(res^2)/n
> +	#Split computations into reasonably sized blocks
> +    if (memory < n) memory <- n
> +	step.size <- floor(memory/n)
> +    steps <- ceiling(m/step.size)
> +	jj=1
> +	kk=0
> +	for (step.i in 1:steps) {
> +		if (step.i == steps) kk <- kk + m %% step.size else kk <- kk + step.size
> +		markers.to.fit <- jj:kk
> +		snp.new <- transf.matrix %*% SNP.matrix[indx, markers.to.fit]
> +		mm <- 0
> +		for (j in markers.to.fit) {
> +			mm <- mm+1
> +			X1 <- cbind(snp.new[, mm], X.new)
> +			qr1 <- qr(X1)
> +			est1 <- qr.coef(qr1, y.new)
> +			res <- y.new - X1 %*% est1
> +			RSS.1 <- sum(res^2)/n
> +			SNP.est[j] <- est1[1]
> +			LRT <- -n * (log(RSS.1) - log(RSS.0))
> +			p.val[j] <- 1 - pchisq(LRT, df=1)
> +        }
> +        jj <- jj + step.size
> +	}
> +	print("Rotate LMM ready")
> +	#####################
> +	qt.results <- Create_gwaa_scan(genabel.data, p.val, SNP.est)
> +	if (is.null(V.input)) qt.results at call$hglm <- mod1
> +	return(qt.results)
> +}
> 
> Added: pkg/RepeatABEL/R/simulate_PhenData.R
> ===================================================================
> --- pkg/RepeatABEL/R/simulate_PhenData.R	                        (rev 0)
> +++ pkg/RepeatABEL/R/simulate_PhenData.R	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,88 @@
> +#' @title Simulation function for the RepeatABEL package.
> +#'
> +#' @description  The function takes a GenABEL object as input and generates simulated phenotypic values for related individuals having repeated obserevations.
> +#' @param formula.FixedEffects A formula including the name of the simulated variable as response, and cofactors as fixed effects.
> +#' @param genabel.data A GenABEL object of class gwaa.data.
> +#' @param n.obs A vector including the number of observations per individual. The length of n.obs must be equal to the number if individuals in genabel.data.
> +#' @param SNP.eff The size of a simulated SNP.effect.
> +#' @param SNP.nr The SNP genotype that the SNP effect is simulated on. SNP.nr=i is the i:th SNP.
> +#' @param beta The simulated fixed effects. Must be equal to the number of cofactors simulated (including the intercept term).
> +#' @param VC A vector of length 3 including the simulated variances of the polygenic effect, permanent environmental effect and residuals, respectively.
> +#' @param GRM  An optional input where the Genetic Relationship Matrix can be given. Otherwise it is computed using the GenABEL package.
> +#' @param sim.gamma A logical parameter specifying whether the residuals shuld be simulated from a gamma distribution or not. If specified as TRUE then residuals are drawn from a gamma distribution with variance equal to the residual variance specified in \code{VC[3]} 
> +
> +#' @return Returns a data frame including the simulated phenotypic values, cofactors and IDs.
> +#' @author Lars Ronnegard
> +
> +#' @examples
> +#' data(gen.data)
> +#'  #Simulate 4 observations per individual
> +#'  set.seed(1234)
> +#'  Phen.Sim <- simulate_PhenData(y ~ age, genabel.data=gen.data, 
> +#'                  n.obs=rep(4, nids(gen.data)), SNP.eff=1, SNP.nr=1000, VC=c(1,1,1))
> +#'  GWAS1 <- rGLS(y ~ age, genabel.data = gen.data, phenotype.data = Phen.Sim)
> +#'  plot(GWAS1, main="Simulated Data Results")
> +#'  
> +simulate_PhenData <-
> +function(formula.FixedEffects = y~1, genabel.data, n.obs, SNP.eff = NULL, SNP.nr = NULL, beta = NULL, VC = c(1,1,1), GRM = NULL, sim.gamma = FALSE){
> +    #library(GenABEL)
> +  if (length(SNP.eff) != length(SNP.nr)) stop("The number of elements in SNP.eff and SNP.nr must be the same")
> +  if (is.null(SNP.eff)) {
> +      SNP.eff <- 0
> +      SNP.nr <- 1
> +  }
> +  SNP <- as.double(genabel.data at gtdata[,SNP.nr])
> +  rownames(SNP) <- NULL
> +  SNP <- SmoothSNPmatrix(SNP)
> +  n <- nids(genabel.data)
> +  if (length(n.obs) != n) stop("The number of individuals and the length of n.obs must be the same")
> +  N <- sum(n.obs)
> +  id1 <- rep(genabel.data at phdata$id, n.obs)
> +  id2 <- genabel.data at phdata$id
> +  #Construct incidence matrix for repeated observations
> +  indx <- numeric(N)
> +  for (i in 1:n) {
> +      indx <- indx + i * (id1 %in% id2[i])
> +  }
> +  genabel.data at phdata$temp.dummy <- 0
> +  n.names <- length(colnames(genabel.data at phdata))
> +  colnames(genabel.data at phdata) <- c(colnames(genabel.data at phdata)[-n.names], all.vars(formula.FixedEffects)[1])
> +  ###########
> +  if (is.null(GRM)) {
> +    autosomalMarkers <- which(chromosome(genabel.data) != "X")
> +    GRM <- compute.GRM(genabel.data[ , snpnames(genabel.data)[autosomalMarkers]])
> +  }
> +  eig <- eigen(GRM)
> +  if (max(diag(GRM)) > 1.6) print("There seems to be highly inbred individuals in your data")
> +  if (min(eig$values < -0.5)) print("The genetic relationship matrix is far from positive definite")
> +  non_zero.eigenvalues <- eig$values>(1e-6) #Put numerically small eigenvalues to zero
> +  eig$values[!non_zero.eigenvalues] <- 0
> +  Z.GRM <- ( eig$vectors %*% diag(sqrt(eig$values)) )[indx, ]
> +  rownames(Z.GRM) <- NULL
> +  ############
> +  X0 <- model.matrix(formula.FixedEffects, data = genabel.data at phdata)
> +  row.names(X0) <- NULL
> +  if (is.null(beta)) beta <- rep(0, ncol(X0))
> +  if (ncol(X0) != length(beta)) stop("The length of beta must be the same as the number columns in the cofactor design matrix")
> +  X <- X0[indx, 1:ncol(X0)]
> +  if (class(X)=="numeric") X <- matrix(X, length(X), 1)
> +  sigma2a <- VC[1]
> +  sigma2p <- VC[2]
> +  sigma2e <- VC[3]
> +  a <- rnorm(n, 0, sqrt(sigma2a))
> +  p <- rnorm(n, 0, sqrt(sigma2p))
> +  e <- rnorm(N, 0, sqrt(sigma2e))
> +  if (sim.gamma) e <- rgamma(N, sigma2e) - sigma2e #Follows a gamma distribution of variance sigma2e and mean=0
> +  y0 <- X %*% beta + Z.GRM %*% a + p[indx] + e
> +  if (length(SNP.nr)==1) y <- y0 + SNP[indx,] * SNP.eff
> +  if (length(SNP.nr)>1) y <- y0 + SNP[indx,] %*% SNP.eff
> +  id <- rep(genabel.data at phdata$id, n.obs)
> +  for (i in 1:length(all.vars(formula.FixedEffects))) {
> +    var.name <- all.vars(formula.FixedEffects)[i]
> +    if (i == 1) xx <- y
> +    if (i > 1) xx <- cbind(xx, rep(genabel.data at phdata[ , names(genabel.data at phdata) %in% var.name], n.obs))
> +  }
> +  colnames(xx) <- all.vars(formula.FixedEffects)
> +  Phen.Data <- data.frame(xx, id)
> +  return(Phen.Data)
> +}
> 
> Added: pkg/RepeatABEL/data/Phen.Data.rda
> ===================================================================
> (Binary files differ)
> 
> 
> Property changes on: pkg/RepeatABEL/data/Phen.Data.rda
> ___________________________________________________________________
> Added: svn:mime-type
>    + application/octet-stream
> 
> Added: pkg/RepeatABEL/data/datalist
> ===================================================================
> --- pkg/RepeatABEL/data/datalist	                        (rev 0)
> +++ pkg/RepeatABEL/data/datalist	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,4 @@
> +Phen.Data
> +flycatchers
> +gen.data
> +thinned.flycatchers
> 
> Added: pkg/RepeatABEL/data/flycatchers.rda
> ===================================================================
> (Binary files differ)
> 
> 
> Property changes on: pkg/RepeatABEL/data/flycatchers.rda
> ___________________________________________________________________
> Added: svn:mime-type
>    + application/octet-stream
> 
> Added: pkg/RepeatABEL/data/gen.data.rda
> ===================================================================
> (Binary files differ)
> 
> 
> Property changes on: pkg/RepeatABEL/data/gen.data.rda
> ___________________________________________________________________
> Added: svn:mime-type
>    + application/octet-stream
> 
> Added: pkg/RepeatABEL/data/thinned.flycatchers.rda
> ===================================================================
> (Binary files differ)
> 
> 
> Property changes on: pkg/RepeatABEL/data/thinned.flycatchers.rda
> ___________________________________________________________________
> Added: svn:mime-type
>    + application/octet-stream
> 
> Added: pkg/RepeatABEL/inst/CITATION
> ===================================================================
> --- pkg/RepeatABEL/inst/CITATION	                        (rev 0)
> +++ pkg/RepeatABEL/inst/CITATION	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,34 @@
> +citHeader("To cite RepeatABEL in publications use:")
> +
> +citEntry(entry="Article",
> +         title = "Increasing the power of genome wide association studies in natural populations using repeated measures: evaluation and implementation",
> +         author = personList(as.person("Lars Ronnegard"),
> +                             as.person("S. Eryn McFarlane"),
> +                             as.person("Arild Husby"),
> +                             as.person("Takeshi Kawakami"),
> +                             as.person("Hans Ellegren"),
> +                             as.person("Anna Qvarnstrom")),
> +         journal      = "Methods in Ecology and Evolution",
> +         year         = "2016",
> +         volume       = "1",
> +         number       = "1",
> +         pages        = "1-1",
> +         
> +         textVersion = 
> +         paste("Lars Ronnegard, S. Eryn McFarlane, Arild Husby, Takeshi Kawakami, Hans Ellegren and Anna Qvarnstrom (2016)", 
> +               "Increasing the power of genome wide association studies in natural populations using repeated measures: evaluation and implementation.",
> +               "Methods in Ecology and Evolution, (pending revision).")
> +)
> +
> +citEntry(entry="Manual",
> +title = "GenABEL: genome-wide SNP association analysis",
> +author = "GenABEL project developers",
> +    year = "2013",
> +    note = "R package version 1.8-0",
> +    url = "http://CRAN.R-project.org/package=GenABEL",
> +
> +textVersion = 
> +         paste("GenABEL project developers (2013).",
> +"GenABEL: genome-wide SNP association analysis.",
> +"R package version 1.8-0.","http://CRAN.R-project.org/package=GenABEL")
> +)
> 
> Added: pkg/RepeatABEL/inst/doc/RepeatABEL_vignette.pdf
> ===================================================================
> --- pkg/RepeatABEL/inst/doc/RepeatABEL_vignette.pdf	                        (rev 0)
> +++ pkg/RepeatABEL/inst/doc/RepeatABEL_vignette.pdf	2015-11-20 11:56:57 UTC (rev 2023)
> @@ -0,0 +1,6997 @@
> +%PDF-1.5
> +%����
> +4 0 obj <<
> +/Length 521       
> +>>
> +stream
> +concordance:RepeatABEL_vignette.tex:RepeatABEL_vignette.Rnw:1 101 1 1 5 37 1 1 2 1 0 1 2 1 0 1 2 4 0 1 2 2 1 1 3 12 0 1 2 5 1 1 2 29 0 1 2 4 1 1 2 1 0 1 3 10 0 1 2 3 1 1 2 1 0 1 4 12 0 1 2 3 1 1 4 9 0 1 2 8 1 1 3 2 0 1 4 8 0 1 2 8 0 1 1 28 0 1 2 6 1 1 3 2 0 1 6 4 0 1 2 4 0 1 2 4 1 1 2 1 0 6 1 3 0 1 2 2 1 1 5 4 0 1 2 4 0 1 2 5 1 1 2 1 0 1 1 3 0 1 2 2 1 1 5 7 0 1 2 2 1 1 3 5 0 1 2 2 1 1 4 6 0 1 2 2 1 1 3 2 0 1 1 3 0 1 2 8 1 1 4 3 0 1 2 1 0 2 1 1 5 4 0 1 7 6 0 1 3 2 0 1 2 1 0 1 4 6 0 1 2 4 1 1 4 3 0 2 1 1 2 4 0 1 2 3 1
> +endstream
> +endobj
> +7 0 obj <<
> +/Length 270       
> +/Filter /FlateDecode
> +>>
> +stream
> +xڅ�=O1
���
���I.��J��Nـ�ׂ�8tV�:vST118v�گ���
��O^����g�d	c��l�M�����"A���*�#Ǩ�w����wm(�q�r�XrZ��X at 5<hJ�YGRҰ;͚�4�۞�O9��I�^�]�b<�0Jѯ�;@d"A�B�)�	�f�Vȵ�||�7��/��f�`�
> +)����ާ���ގ12-yu�(�����
������f"0�m��\O�����ˎ^�vϋj�8
UT��fb�q`�c������b6
> +endstream
> +endobj
> +16 0 obj <<
> +/Length 3463      
> +/Filter /FlateDecode
> +>>
> +stream
> +x��Z[�۶~�����CM-�At��4�q&�x�m�I�t��V�Z���K}�
> + HA^e�&�@\���
�j����%�_�>��e�F�慭���~��͝2�ZU�6��v6�1{5��f\����v<)T6;M�K�n7��~��K7j�2v�F����o�
�=@Kk���]l��
�Ŀ9~,�����8ȅ�톿,I
> +��0u6��	��G�s+�1����|���
> +v���U6o�ۋ/������
$���~7.��]�����%OBd�a
> +j�p�Z�MY�e�=4����5ͦ�n�km�o9�)1#YZ̸�
> +��#�nQY�)O
> +'9�o�;�%�|Ow��Xi�5<���J^���Лeų�bA���RٺE�}��
�М3$���B
> +�ŖPự+x.$���5hI��KxJ?)]̧���O�fbL�l;t�>d�7g]��(ߧVݠ,����Q쐤��Sa)0}6*߉f0�C��Q
> +��
> +k>�t�ݝ�%w��
��ْ���=���%$���H6E�����ʸD���f=]	fl�`��9���ί�`����Ģ�bۢ�^p�5['0/�>,E����Ke�F�!�ѣ��~�Z��Ã�<�囆YKs
�0���H�V�}^����_�b��h�\�Z
Y'_��c���kq
�)B����/���Ŕ�#en�PG
��O�TĀ��<�*�Tt����{��ǥ��5�R�g�3��%
��)���^�]��4�_��6��.��(� ���
����9X at Y B�B
.Ie4���� �a�@L{�f!�-�{jMBcl͐C�b�
���چ�m�o*ǹr��2衕a��?C��
> +��]U��h�[+&��<�8���n��z�ʫ����Т7dAPd���&��I��i�qV��,
> [TRUNCATED]
> 
> To get the complete diff run:
>     svnlook diff /svnroot/genabel -r 2023
> 
> 
> 
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
> 

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 213 bytes
Desc: OpenPGP digital signature
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20151120/46401442/attachment-0001.sig>


More information about the genabel-devel mailing list