[Genabel-commits] r641 - in pkg: . PredictABEL PredictABEL/R PredictABEL/data PredictABEL/inst PredictABEL/man

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Wed Feb 9 21:45:41 CET 2011


Author: lckarssen
Date: 2011-02-09 21:45:41 +0100 (Wed, 09 Feb 2011)
New Revision: 641

Added:
   pkg/PredictABEL/
   pkg/PredictABEL/DESCRIPTION
   pkg/PredictABEL/R/
   pkg/PredictABEL/R/PredictABEL.R
   pkg/PredictABEL/data/
   pkg/PredictABEL/data/ExampleData.RData
   pkg/PredictABEL/inst/
   pkg/PredictABEL/inst/doc/
   pkg/PredictABEL/man/
   pkg/PredictABEL/man/ExampleData.Rd
   pkg/PredictABEL/man/ExampleModels.Rd
   pkg/PredictABEL/man/ORmultivariate.Rd
   pkg/PredictABEL/man/ORunivariate.Rd
   pkg/PredictABEL/man/PredictABEL-package.Rd
   pkg/PredictABEL/man/fitLogRegModel.Rd
   pkg/PredictABEL/man/plotCalibration.Rd
   pkg/PredictABEL/man/plotDiscriminationBox.Rd
   pkg/PredictABEL/man/plotPredictivenessCurve.Rd
   pkg/PredictABEL/man/plotPriorPosteriorRisk.Rd
   pkg/PredictABEL/man/plotROC.Rd
   pkg/PredictABEL/man/plotRiskDistribution.Rd
   pkg/PredictABEL/man/plotRiskscorePredrisk.Rd
   pkg/PredictABEL/man/predRisk.Rd
   pkg/PredictABEL/man/reclassification.Rd
   pkg/PredictABEL/man/riskScore.Rd
Log:
Initial commit of Suman Kundu's PredictABEL package

Added: pkg/PredictABEL/DESCRIPTION
===================================================================
--- pkg/PredictABEL/DESCRIPTION	                        (rev 0)
+++ pkg/PredictABEL/DESCRIPTION	2011-02-09 20:45:41 UTC (rev 641)
@@ -0,0 +1,18 @@
+Package: PredictABEL
+Title: Assessment of risk prediction models
+Version: 1.1
+Date: 2011-02-09
+Author: Suman Kundu, Yurii S. Aulchenko, A. Cecile J.W. Janssens
+Maintainer: Suman Kundu <s.kundu at erasmusmc.nl>, A. Cecile J.W. Janssens <a.janssens at erasmusmc.nl>
+Depends: R (>= 2.9.0), Hmisc, ROCR, epitools, PBSmodelling
+Suggests: GenABEL
+Description: PredictABEL includes functions to assess the performance of risk models. The package contains functions for the various measures that are used
+    in empirical studies, including univariate and multivariate odds ratios (OR) of the predictors, the c-statistic (or area under the receiver operating
+    characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test, reclassification table, net reclassification improvement (NRI) and integrated
+    discrimination improvement (IDI). Also included are functions to create plots, such as risk distributions, ROC curves, calibration plot, discrimination
+    box plot and predictiveness curves. In addition to functions to assess the performance of risk models, the package includes functions to obtain weighted
+    and unweighted risk scores as well as predicted risks using logistic regression analysis. These logistic regression functions are specifically written
+    for models that include genetic variables, but they can also be applied to models that are based on non-genetic risk factors only.
+License: GPL (>= 2)
+Collate: 'PredictABEL.R'
+Packaged: 2011-02-09 12:39:12 UTC; 488810

Added: pkg/PredictABEL/R/PredictABEL.R
===================================================================
--- pkg/PredictABEL/R/PredictABEL.R	                        (rev 0)
+++ pkg/PredictABEL/R/PredictABEL.R	2011-02-09 20:45:41 UTC (rev 641)
@@ -0,0 +1,1685 @@
+#'  An R package for the analysis of (genetic) risk prediction studies.  
+#'
+#' Fueled by the substantial gene discoveries from genome-wide association 
+#' studies, there is increasing interest in investigating the predictive 
+#' ability of genetic risk models. To assess the performance of genetic risk 
+#' models, PredictABEL includes functions for the various measures and plots 
+#' that have been used in empirical studies, including univariate and 
+#' multivariate odds ratios (ORs) of the predictors, the c-statistic (or AUC), 
+#' Hosmer-Lemeshow goodness of fit test, reclassification table, net 
+#' reclassification improvement (NRI) and integrated discrimination 
+#' improvement (IDI). The plots included are the ROC plot, calibration plot, 
+#' discrimination box plot, predictiveness curve, and several risk distributions. 
+#'
+#'
+#' These functions can be applied to predicted risks that are obtained using 
+#' logistic regression analysis, to weighted or unweighted risk scores, for 
+#' which the functions are included in this package. The functions can also be 
+#' used to assess risks or risk scores that are constructed using other methods, e.g., Cox Proportional 
+#' Hazards regression analysis, which are not included in the current version. 
+#' Risks obtained from other methods can be imported into R for assessment 
+#' of the predictive performance.
+#'
+#'
+#' The functions to construct the risk models using logistic regression analyses 
+#' are specifically written for models that include genetic variables, 
+#' eventually in addition to non-genetic factors, but they can also be applied 
+#' to construct models that are based on non-genetic risk factors only. \cr 
+#'
+#'
+#' Before using the functions \code{\link{fitLogRegModel}} for constructing 
+#' a risk model or \code{\link{riskScore}} for computing risk 
+#' scores, the following checks on the dataset are advisable to be done: 
+#' 
+#' (1) Missing values: The logistic regression analyses and computation of 
+#' the risk score are done only for subjects that have no missing data. In case 
+#' of missing values, individuals with missing data can be removed from the 
+#' dataset or imputation strategies can be used to fill in missing data. 
+#' Subjects with missing data can be removed with the R function \code{na.omit} 
+#' (available in \code{stats} package). 
+#' Example: \code{DataFileNew <- na.omit(DataFile)} 
+#' will make a new dataset (\code{DataFileNew}) with no missing values;
+#'
+#' 
+#' (2) Multicollinearity: When there is strong correlation between the 
+#' predictor variables, regression coefficients may be estimated imprecisely 
+#' and risks scores may be biased because the assumption of independent effects 
+#' is violated. In genetic risk prediction studies, problems with 
+#' multicollinearity should be expected when single nucleotide polymorphisms 
+#' (SNPs) located in the same gene are                                                    
+#' in strong linkage disequilibrium (LD). For SNPs in LD it is common to select 
+#' the variant with the lowest p-value in the model;  
+#' 
+#' 
+#' (3) Outliers: When the data contain significant outliers, either clinical 
+#' variables with extreme values of the outcomes or extreme values resulting 
+#' from errors in the data entry, these may impact the construction of the risk models and 
+#' computation of the risks scores. Data should be carefully checked and outliers 
+#' need to be removed or replaced, if justified;
+#'
+#' (4) Recoding of data: In the computation of unweighted risk scores, it is assumed 
+#' that the genetic variants are coded \code{0,1,2}  
+#' representing the number of alleles carried. When variants 
+#' are coded \code{0,1} representing a dominant or recessive effect of the alleles, 
+#' the variables need to be recoded before unweighted risk scores can be computed. \cr
+#'
+#'
+#' To import data into R several alternative strategies can be used. Use the 
+#' \code{Hmisc} package for importing SPSS and SAS data into R. 
+#' Use "\code{ExampleData <- read.table("DataName.txt", header=T, sep="\t")}" for text 
+#' files where variable names are included as column headers and data are 
+#' separated by tabs.  
+#' Use "\code{ExampleData <- read.table("Name.csv", sep=",", header=T)}" 
+#' for comma-separated files with variable names as column headers. 
+#' Use \code{"setwd(dir)"} to set the working directory to "dir". The datafile 
+#' needs to be present in the working directory. \cr 
+#'
+#'
+#' To export datafiles from R tables to a tab-delimited textfile with the first row as
+#' the name of the variables, 
+#' use "\code{write.table(R_Table, file="Name.txt", row.names=FALSE, sep="\t")}"  and 
+#' when a comma-separated textfile is requested and variable names are provided in the first row,
+#' use "\code{write.table(R_Table, file="Name.csv", row.names=FALSE, sep=",")}".  
+#' When the directory is not specified, the file will be
+#' saved in the working directory. For exporting R data into SPSS, SAS and 
+#' Stata data, use functions in the the \code{foreign} package. \cr
+#'
+#' Several functions in this package depend on other R packages:
+#' 
+#' (1) \code{Hmisc}, is used to compute NRI and IDI;
+#' 
+#' (2) \code{ROCR}, is used to produce ROC plots;
+#' 
+#' (3) \code{epitools}, is used to compute  univariate odds ratios;
+##' 
+#' (4) \code{PBSmodelling}, is used to produce predictiveness curve.
+#'
+#' @note The current version of the package includes the basic measures 
+#' and plots that are used in the assessment of (genetic) risk prediction models. 
+#' Planned extensions of the package include functions to construct risk 
+#' models using Cox Proportional Hazards analysis for prospective data and 
+#' functions to construct simulated data for the evaluation of 
+#' genetic risk models (see Janssens et al, Genet Med 2006).
+#' 
+#'
+##' @author  Suman Kundu  
+##'
+##' Yurii S. Aulchenko
+##'
+##' A. Cecile J.W. Janssens  
+##'
+##' @keywords package
+#' 
+#' @references S Kundu, YS Aulchenko, CM van Duijn, ACJW Janssens. PredictABEL:
+#' an R package for the assessment of risk prediction models.
+#' Eur J Epidemiol 2011. In press. \cr
+#'
+#' ACJW Janssens, JPA Ioannidis, CM van Duijn, J Little, MJ Khoury.
+#' Strengthening the Reporting of Genetic Risk Prediction Studies: The GRIPS
+#' Statement Proposal. Eur J Epidemiol 2011. In press. \cr
+#'
+#' ACJW Janssens, JPA Ioannidis, S Bedrosian, P Boffetta, SM Dolan, N Dowling,
+#' I Fortier, AN. Freedman, JM Grimshaw, J Gulcher, M Gwinn, MA Hlatky, H Janes,
+#' P Kraft, S Melillo, CJ O'Donnell, MJ Pencina, D Ransohoff, SD Schully,
+#' D Seminara, DM Winn, CF Wright, CM van Duijn, J Little, MJ Khoury.
+#' Strengthening the reporting of genetic risk prediction studies
+#' (GRIPS)-Elaboration and explanation. Eur J Epidemiol 2011. In press. \cr  
+#'
+#' Aulchenko YS, Ripke S, Isaacs A, van Duijn CM. GenABEL: an R package for genome-wide
+#' association analysis. Bioinformatics 2007;23(10):1294-6.
+#'
+"PredictABEL-package" <- function() {}
+#' Function to fit a logistic regression model.
+#'
+#' The function fits a standard GLM function for the logistic regression model.
+#' This function can be used to construct a logistic regression model based on genetic and non-genetic
+#' predictors. The function also allows to enter the genetic predictors 
+#' as a single risk score. For that purpose, the function requires that 
+#' the dataset additionally includes the risk score.
+#' A new dataset can be constructed using
+#'  "\code{NewExampleData <- cbind(ExampleData,riskScore)}". 
+#' The genetic risk scores can be obtained
+#' using the function \code{\link{riskScore}} in this package or be
+#' imported from other methods. 
+#'
+#' @param data Data frame or matrix that includes the outcome and 
+#' predictor variables.  
+#' @param cOutcome Column number of the outcome variable. \code{cOutcome=2}
+#'  means that the second column of the dataset is the outcome variable.
+#' To fit the logistic regression model, the outcome variable needs to be 
+#' (re)coded as \code{1} for the presence and \code{0} for the absence of the
+#' outcome of interest. 
+#' @param cNonGenPreds Column numbers of the non-genetic predictors that are  
+#' included in the model. An example to denote column numbers is  
+#' \code{c(3,6:8,10)}. Choose \code{c(0)} when no non-genetic predictors 
+#' are considered.
+#' @param cNonGenPredsCat Column numbers of the non-genetic predictors that  
+#' are entered as categorical variables in the model. When non-genetic 
+#' predictors are not specified as being categorical they are treated as     
+#' continuous variables in the model. If no non-genetic predictors are 
+#' categorical, denote \code{c(0)}. 
+#' @param cGenPreds Column numbers of the genetic predictors or genetic risk score.
+#' Denote \code{c(0)}
+#' when the prediction model does not consider 
+#' genetic predictors or genetic risk score.
+#' @param cGenPredsCat Column numbers of the genetic predictors that are 
+#' entered as categorical variables in the model. When SNPs are considered as 
+#' categorical, the model  
+#' will estimate effects per genotype. Otherwise, SNPs are considered as  
+#' continuous variables for which  the model will estimate an allelic effect.  
+#' Choose c(0) when no genetic predictors are considered as categorical 
+#' or when genetic predictors are entered as a risk score into the model.
+#'
+#' @return No value returned.
+#'
+#' @keywords models
+#'
+#'
+#' @seealso  \code{\link{predRisk}}, \code{\link{ORmultivariate}}, \code{\link{riskScore}}
+#' @examples
+#' # specify dataset with outcome and predictor variables 
+#' data(ExampleData)
+#' # specify column number of outcome variable
+#'  cOutcome <- 2 
+#'  # specify column numbers of non-genetic predictors
+#'  cNonGenPred <- c(3:10)
+#'  # specify column numbers of non-genetic predictors that are categorical      
+#'  cNonGenPredCat <- c(6:8)
+#'  # specify column numbers of genetic predictors
+#'  cGenPred <- c(11,13:16)
+#'  # specify column numbers of genetic predictors that are categorical
+#'  cGenPredCat <- c(0) 
+#'
+#'  # fit logistic regression model
+#'  riskmodel <- fitLogRegModel(data=ExampleData, cOutcome=cOutcome, 
+#'  cNonGenPreds=cNonGenPred, cNonGenPredsCat=cNonGenPredCat, 
+#'  cGenPreds=cGenPred, cGenPredsCat=cGenPredCat)
+#' 
+#'  # show summary details for the fitted risk model
+#'  summary(riskmodel)
+#'
+"fitLogRegModel" <-
+function (data, cOutcome, cNonGenPreds, cNonGenPredsCat,
+cGenPreds, cGenPredsCat) {
+
+if (missing(cNonGenPreds)) {cNonGenPreds<- c(0)}
+if (missing(cNonGenPredsCat)) {cNonGenPredsCat<- c(0)}
+if (missing(cGenPreds)) {cGenPreds<- c(0)}
+if (missing(cGenPredsCat)) {cGenPredsCat<- c(0)}
+if ((cGenPreds[1]==0) & (cNonGenPreds[1]==0)) {
+stop("No predictors have been considered.\n")
+}
+
+NonGen_factor <- as.data.frame(data[,cNonGenPredsCat])
+Gen_factor<- as.data.frame(data[,cGenPredsCat])
+if(dim(Gen_factor)[2] >0 )
+{
+for(i in 1:dim(Gen_factor)[2])
+{
+Gen_factor[,i] <- as.factor(Gen_factor[,i])
+}
+}
+if(dim(NonGen_factor)[2] >0 )
+{
+for(i in 1:dim(NonGen_factor)[2])
+{
+NonGen_factor[,i] <- as.factor(NonGen_factor[,i])
+}
+}
+
+p <- cbind(data[,setdiff(cNonGenPreds,cNonGenPredsCat)],NonGen_factor,
+data[,setdiff(cGenPreds,cGenPredsCat)],Gen_factor)
+colnames(p)<- c(colnames(data)[setdiff(cNonGenPreds,cNonGenPredsCat)],colnames(data)[cNonGenPredsCat],
+colnames(data)[setdiff(cGenPreds,cGenPredsCat)],colnames(data)[cGenPredsCat])
+
+outcome<- data[,cOutcome]
+genes <- colnames(p)
+fmla<- as.formula(paste("outcome~", paste(genes, collapse="+")))
+model <- glm(fmla, data=p, family=binomial("logit"))
+return(model)
+
+}
+#' Function to compute predicted risks for all individuals in the dataset. 
+#' 
+#' The function computes predicted risks from a specified logistic regression model.   
+#' The function \code{\link{fitLogRegModel}} can be used to construct such a model.  
+#'
+#' @param riskModel Name of logistic regression model that can be fitted using  
+#' the function \code{\link{fitLogRegModel}}.
+#' @param data Data frame or matrix that includes the outcome, ID number and 
+#' predictor variables. 
+#' @param cID Column number of ID variable. The ID number and predicted risks 
+#' will be saved under \code{filename}. When \code{cID} is not specified, the output is not saved. 
+#' @param filename Name of the output file in which the ID number and 
+#' estimated predicted risks will be saved. The file is saved in the working  
+#' directory as a txt file. When no \code{filename} is specified, the output is not saved.
+#'
+#'
+#'
+#' @return The function returns a vector of predicted risks.
+#'
+#'
+#' @keywords htest
+#'
+#'
+#' @seealso \code{\link{fitLogRegModel}}, \code{\link{plotCalibration}}, 
+#' \code{\link{plotROC}}, \code{\link{plotPriorPosteriorRisk}}
+#'
+#' @examples
+#' # specify dataset with outcome and predictor variables
+#' data(ExampleData)
+#' # specify column number of the outcome variable
+#'  cOutcome <- 2
+#' # specify column number of ID variable
+#'  cID <- 1
+#'  # specify column numbers of non-genetic predictors
+#'  cNonGenPred <- c(3:10)
+#'  # specify column numbers of non-genetic predictors that are categorical
+#'  cNonGenPredCat <- c(6:8)
+#'  # specify column numbers of genetic predictors
+#'  cGenPred <- c(11,13:16)
+#'  # specify column numbers of genetic predictors that are categorical
+#'  cGenPredCat <- c(0)
+#'
+#'  # fit logistic regression model
+#'  riskmodel <- fitLogRegModel(data=ExampleData, cOutcome=cOutcome,
+#'  cNonGenPreds=cNonGenPred, cNonGenPredsCat=cNonGenPredCat,
+#'  cGenPreds=cGenPred, cGenPredsCat=cGenPredCat)
+#'
+#'  # obtain predicted risks 
+#'  predRisk <- predRisk(riskModel=riskmodel, filename="name.txt") 
+#'
+"predRisk" <- 
+function(riskModel, data, cID, filename)
+{
+ if (any(class(riskModel) == "glm"))
+  {
+   predrisk <- predict(riskModel, type="response")
+  } 
+else 
+  {
+stop("The argument 'riskModel' should be a (GLM)model")      
+  }
+     
+  if (!missing(data)&& !missing(cID)&& !missing(filename))
+  {tab <- cbind(ID=data[, cID],PredRisk=predrisk)
+  write.table(tab, file=filename, row.names = FALSE,sep = "\t")
+  }
+   return(predrisk)
+  }   
+#' Function to compute genetic risk scores. The function computes unweighted 
+#' or weighted genetic risk scores. The relative effects (or weights) of 
+#' genetic variants can either come from beta coefficients of a risk model 
+#' or from a vector of beta coefficients imported into R, e.g., when beta cofficients are obtained from meta-analysis.
+#' 
+#'
+#' The function calculates unweighted 
+#' or weighted genetic risk scores. The unweighted genetic risk score is a simple
+#' risk allele count assuming that all alleles have the same effect. For this
+#' calculation, it is required that the genetic variables are coded as the number of risk
+#' alleles. Beta coefficients are used to determine which allele is the risk
+#' allele. When the sign of the beta coefficient is negative, the allele coding
+#' is reversed. The weighted risk score is a sum of the number of risk alleles
+#' multiplied by their beta coefficients.
+#'
+#' The beta coefficients can come from two different sources, either beta coefficients of a risk model 
+#' or a vector of beta coefficients imported into R, e.g., when beta cofficients are obtained from meta-analysis.
+#' This vector of beta coefficients 
+#' should be a named vector containing the same names as mentioned in genetic variants.
+#' A logistic regression model can be constructed using \code{\link{fitLogRegModel}}
+#' from this package.
+#'  
+#' @note When a vector of beta coefficients is imported, it should be checked 
+#' whether the DNA strands and the coding of the risk alleles are the same 
+#' as in the study data. The functions are available in the package \code{GenABEL}
+#' to accurately compute risk scores when the DNA strands are different or the risk 
+#' alleles are coded differently in the study data and the data used in meta-analysis.  
+#'
+#' @param weights The vector that includes the weights given to the genetic 
+#' variants. See details for more informations.
+#' @param data Data frame or matrix that includes the outcome  
+#' and predictors variables. 
+#' @param cGenPreds Column numbers of the genetic variables on the basis of   
+#'  which the risk score is computed.
+#' @param Type Specification of the type of risk scores that will be computed.
+#' Type can be weighted (\code{Type="weighted"}) or 
+#' unweighted (\code{Type="unweighted"}). 
+#'
+#' @return The function returns a vector of risk scores.
+#'
+#'
+#' @keywords htest
+#'
+#'
+#' @seealso \code{\link{plotRiskDistribution}}, \code{\link{plotRiskscorePredrisk}}
+#' @examples
+#' # specify dataset with outcome and predictor variables
+#'  data(ExampleData) 
+#'  # specify column numbers of genetic predictors
+#'  cGenPred <- c(11:16)
+#'
+#'  # fit a logistic regression model
+#'  # all steps needed to construct a logistic regression model are written in a function
+#'  # called 'ExampleModels', which is described on page 4-5
+#'  riskmodel <- ExampleModels()$riskModel2
+#'
+#'  # compute unweighted risk scores 
+#'  riskScore <- riskScore(weights=riskmodel, data=ExampleData, 
+#' cGenPreds=cGenPred, Type="unweighted")  
+#'
+"riskScore" <- 
+function(weights, data, cGenPreds, Type )
+{
+riskModel <- weights
+x <- data[, cGenPreds]
+if (any(class(riskModel) == "glm")) 
+  {
+    if(! setequal(intersect(names(riskModel$coef),colnames(x)),colnames(x)))
+    {
+     stop("The risk model does not contain all the genetic variants as predictors")
+    }
+     else
+    {
+     y <- riskModel$coef[intersect(names(riskModel$coef),colnames(x))]
+    } 
+  }
+  
+else if(is.vector(riskModel))
+ {
+   if (length(names(riskModel))!= length(riskModel))
+   {
+    stop("The argument 'weights' is not a named beta (coefficient) vector")
+   }
+   if( setequal(intersect(names(riskModel),colnames(x)),colnames(x)))
+   {
+    y <- riskModel[intersect(colnames(x),names(riskModel))]
+   }                              
+   else
+   {
+ stop("Beta coefficient vector does not contain all the genetic variants")
+   }
+  
+ }
+ 
+else
+   {
+stop("'weights' argument should either be a model or a named beta vector" )
+   }
+
+if(Type=="weighted")
+{
+wrs<- y %*% t(x) 
+return(as.vector(wrs))
+}
+else if (Type=="unweighted")
+{
+Unbetavector <- sign(y)
+pro <-   Unbetavector %*% t(x)
+num <-sum(Unbetavector==-1)
+urs<-pro+(2*num)
+return(as.vector(urs))
+}
+}
+#' Function to compute univariate ORs for genetic predictors. The function computes the univariate ORs with 95\% CIs for genetic predictors.
+#'
+#' The function computes the univariate ORs with 95\% CIs for the specified 
+#' genetic variants both per allele and per genotype. The ORs are saved with the data from which they are  
+#' calculated. Genotype frequencies are provided for      
+#' persons with and without the outcome  
+#' of interest. The genotype or allele that is coded as \code{'0'} is considered 
+#' as the reference to computes the ORs. 
+#' 
+#' @param data Data frame or matrix that includes the outcome and 
+#' predictors variables. 
+#' @param cOutcome Column number of the outcome variable. \code{cOutcome=2}  
+#' means that the second column of the dataset is the outcome variable.
+#' @param cGenPreds Column numbers of genetic variables for which the ORs 
+#' are calculated.
+#' @param filenameGeno Name of the output file in which the univariate ORs  
+#' and frequencies per genotype will be saved. The file is saved in the working directory as 
+#' a txt file. When no \code{filenameGeno} is specified, the output is not saved.
+#' @param filenameAllele Name of the output file in which the univariate ORs and 
+#' frequencies per allele will be saved. The file is saved in the working 
+#' directory as a txt file. When no \code{filenameAllele} is specified, the output is not saved.
+#' 
+#' @return The function returns two different tables. One table contains genotype frequencies 
+#' and univariate ORs with 95\% CIs and the other contains allele frequencies and 
+#' univariate ORs with 95\% CIs.
+#'
+#'
+#' @keywords manip
+#' 
+#'
+#' @seealso \code{\link{ORmultivariate}}  
+#' @examples
+#' # specify dataset with outcome and predictor variables 
+#' data(ExampleData)
+#' # specify column number of the outcome variable
+#'  cOutcome <- 2 
+#'  # specify column numbers of genetic predictors
+#'  cGenPreds <- c(11:13,16)
+#'
+#'  # compute univariate ORs
+#'  ORunivariate(data=ExampleData, cOutcome=cOutcome, cGenPreds=cGenPreds, 
+#' filenameGeno="GenoOR.txt", filenameAllele="AlleleOR.txt")
+#' 
+"ORunivariate" <- function(data, cOutcome,cGenPreds,filenameGeno, filenameAllele )
+  {
+  p<- data[,cGenPreds ] # p : A table of Genotype data for all SNP's
+  o<- data[,cOutcome] #  o: the binary outcome variable
+  m1 <- matrix (nrow=dim(p)[2], ncol=19)
+  n1 <- matrix (nrow=dim(p)[2], ncol=12)
+
+  for (i in 1:dim(p)[2])
+         {
+s<-table(p[,i],o)
+if (dim(s)[1]==1) {s <- rbind(s,c(0,0));s <- rbind(s,c(0,0))}
+if (dim(s)[1]==2) {s <- rbind(s,c(0,0))}
+a<-oddsratio.wald(s)$measure
+b<-oddsratio.wald(s)$data
+c <- matrix (nrow=2, ncol=2)
+c[1,1] <- 2*b[1,1]+b[2,1]
+c[2,1] <- 2*b[3,1]+b[2,1]
+c[1,2] <- 2*b[1,2]+b[2,2]
+c[2,2] <- 2*b[3,2]+b[2,2]
+dimnames(c)[[1]] <- c("Allele-I","Allele-II")
+dimnames(c)[[2]] <- c("0","1")
+d<-oddsratio.wald(c)$measure
+e<-oddsratio.wald(c)$data
+
+            m1[i,1]<-  colnames(p)[i]
+            m1[i,2]<-  ( b[1,2])
+            m1[i,3]<-  ( round((b[1,2]/b[4,2])*100 ,1))
+            m1[i,4]<-  ( b[2,2])
+            m1[i,5]<-  ( round((b[2,2]/b[4,2])*100 ,1))
+            m1[i,6]<-  ( b[3,2])
+            m1[i,7]<-  ( round((b[3,2]/b[4,2])*100 ,1))
+            m1[i,8]<-  ( b[1,1])
+            m1[i,9]<-  ( round((b[1,1]/b[4,1])*100 ,1))
+            m1[i,10]<- ( b[2,1])
+            m1[i,11]<- ( round((b[2,1]/b[4,1])*100 ,1))
+            m1[i,12]<- ( b[3,1])
+            m1[i,13]<- ( round((b[3,1]/b[4,1])*100 ,1))
+            m1[i,14]<- ( round(a[2,1] ,2))
+            m1[i,15]<- ( round(a[2,2] ,2))
+            m1[i,16]<- ( round(a[2,3] ,2))
+            m1[i,17]<- ( round(a[3,1] ,2))
+            m1[i,18]<- ( round(a[3,2] ,2))
+            m1[i,19]<- ( round(a[3,3] ,2))
+
+            n1[i,1]<-  colnames(p)[i]
+            n1[i,2]<-  (e[1,2])
+            n1[i,3]<-  round((e[1,2]/e[3,2])*100,1)
+            n1[i,4]<-  e[2,2]
+            n1[i,5]<-  round((e[2,2]/e[3,2])*100,1)
+            n1[i,6]<-   e[1,1]
+            n1[i,7]<-   round((e[1,1]/e[3,1])*100,1)
+            n1[i,8]<-   e[2,1]
+            n1[i,9]<-   round((e[2,1]/e[3,1])*100,1)
+            n1[i,10]<-  round(d[2,1] ,2)
+            n1[i,11]<-  round(d[2,2] ,2)
+            n1[i,12]<-  round(d[2,3] ,2)
+
+        }
+
+       m1 <- as.table(m1)
+       dimnames(m1)[[1]] <- c(1:dim(p)[2])
+       dimnames(m1)[[2]] <- c("Name", "0 ","0 (%)", "1 ","1 (%)", "2 ","2 (%)",
+       "0 ","0 (%)", "1 ","1 (%)", "2 ","2 (%)","OR1","CI-Low","CI-high","OR2",
+       "CI-Low","CI-high")
+names(dimnames(m1)) <- c("", " Genotype frequencies for Cases & Controls, and OR(95% CI)")
+     if (!missing(filenameGeno))
+		 write.table(m1,file=filenameGeno, row.names = FALSE,sep = "\t")
+
+       n1 <- as.table(n1)
+       dimnames(n1)[[1]] <- c(1:dim(p)[2])
+       dimnames(n1)[[2]] <- c("Name", "0","0 (%)","1","1 (%)", "0","0 (%)","1",
+       "1 (%)","OR1","CI-Low","CI-high")
+names(dimnames(n1)) <- c("", " Allele frequencies for Cases & Controls, and OR(95% CI) ")
+       if (!missing(filenameAllele))
+       write.table(n1,file=filenameAllele, row.names = FALSE,sep = "\t")
+
+       p<- list(Genotype =m1,  Allelic=n1)
+        return(p)
+  }
+#' Function to obtain multivariate odds ratios from a logistic regression model. 
+#' The function estimates multivariate (adjusted) odds ratios (ORs) with    
+#' 95\% confidence intervals (CIs) for all the genetic and non-genetic variables 
+#' in the risk model. 
+#'
+#' The function requires that first a logistic regression  
+#' model is fitted either by using \code{GLM} function or the function 
+#' \code{\link{fitLogRegModel}}. In addition to the multivariate ORs, 
+#' the function returns summary statistics of model performance, namely the Brier 
+#' score and the Nagelkerke's \eqn{R^2} value.  
+#' The Brier score quantifies the accuracy of risk predictions by comparing 
+#' predicted risks with observed outcomes at individual level (where outcome 
+#' values are either 0 or 1). The Nagelkerke's \eqn{R^2} value indicates the percentage of variation   
+#'  of the outcome explained by the predictors in the model.
+#'
+#' @param riskModel Name of logistic regression model that can be fitted using  
+#' the function \code{\link{fitLogRegModel}}.
+#' @param filename Name of the output file in which the multivariate    
+#'  ORs will be saved. If no directory is specified, the file is 
+#' saved in the working directory as a txt file.   
+#' When \code{filename} is not specified, the output is not saved.
+#'
+#' 
+#'  @return 
+#'   The function returns: 
+#'   \item{Predictors Summary}{OR with 95\% CI and corresponding  p-values for 
+#' each predictor in the model}
+#'   \item{Brier Score}{Brier score}
+#'   \item{Nagelkerke Index}{Nagelkerke's \eqn{R^2} value}
+#'
+#'
+#'
+#' @keywords htest
+#'
+#'
+#' @references Brier GW. Verification of forecasts expressed in terms of probability. 
+#' Monthly weather review 1950;78:1-3.   
+#'
+#'
+#' Nagelkerke NJ. A note on a general definition of the coefficient
+#' of determination. Biometrika 1991;78:691-692.
+#'
+#'
+#' @seealso \code{\link{fitLogRegModel}}, \code{\link{ORunivariate}} 
+#'
+#' @examples
+#' # specify dataset with outcome and predictor variables
+#' data(ExampleData)
+#' # specify column number of outcome variable
+#'  cOutcome <- 2
+#'  # specify column numbers of non-genetic predictors
+#'  cNonGenPred <- c(3:10)
+#'  # specify column numbers of non-genetic predictors that are categorical
+#'  cNonGenPredCat <- c(6:8)
+#'  # specify column numbers of genetic predictors
+#'  cGenPred <- c(11,13:16)
+#'  # specify column numbers of genetic predictors that are categorical
+#'  cGenPredCat <- c(0)
+#'
+#'  # fit logistic regression model
+#'  riskmodel <- fitLogRegModel(data=ExampleData, cOutcome=cOutcome,
+#'  cNonGenPreds=cNonGenPred, cNonGenPredsCat=cNonGenPredCat,
+#'  cGenPreds=cGenPred, cGenPredsCat=cGenPredCat)
+#'
+#'  # obtain multivariate OR(95\% CI) for all predictors of the fitted model
+#'   ORmultivariate(riskModel=riskmodel, filename="multiOR.txt")  
+#'
+"ORmultivariate" <- function(riskModel, filename) 
+{
+if (!any(class(riskModel)=="glm")) 
+stop("riskModel argument should have 'glm' class")
+sum.coef<-summary(riskModel)$coef
+OR<-exp(sum.coef[,"Estimate"]) 
+Upper.CI<-exp(sum.coef[,"Estimate"]+1.96*sum.coef[,"Std. Error"])   
+Lower.CI<-exp(sum.coef[,1]-1.96*sum.coef[,2])
+tab <-cbind("OR"=round(OR ,4),"Lower.CI"=round(Lower.CI ,4),"Upper.CI"=
+round(Upper.CI ,4),"p-value"=round((sum.coef)[, 4],4))
+
+if (!missing(filename)) 
+	write.table(tab,file=filename, row.names=TRUE,sep = "\t")
+
+B <- mean((riskModel$y) * (1-predict(riskModel, type="response"))^2 +
+(1-riskModel$y) * (predict(riskModel, type="response"))^2)
+# B
+Bmax <- mean(riskModel$y) * (1-mean(riskModel$y))^2 + 
+(1-mean(riskModel$y)) *mean(riskModel$y)^2
+Bscaled <- 1 - B/Bmax
+
+LLfull <- (riskModel$deviance)/(-2)
+LLnul <- (riskModel$null.deviance)/(-2)
+n <- length(riskModel$y)   
+Mc <- 1-(LLfull/LLnul)   # McFadden's R square
+Cox <- 1-exp ((-2)*(LLfull-LLnul)/n)  # Cox-Snell R square 
+Rmax <- 1-exp((2*LLnul)/n)
+Nag <- Cox/Rmax    # Nagelkerke_R2
+
+  p<- list(Predictors_Summary= tab,Brier_Score=round(B ,4),Nagelkerke_R2=round(Nag,4))
+ return(p)
+} 
+#' Function to plot predicted risks against risk scores.  
+#' This function is used to make a plot of predicted risks against risk scores.
+#' 
+#' The function creates a plot of predicted risks against risk scores. 
+#' Predicted risks can be obtained using the functions
+#' \code{\link{fitLogRegModel}} and \code{\link{predRisk}}
+#' or be imported from other methods or packages. 
+#' The function \code{\link{riskScore}} can be
+#'  used to compute unweighted or weighted risk scores.      
+#' 
+#' @param data Data frame or matrix that includes the outcome and 
+#' predictors variables. 
+#' @param riskScore Vector of (weighted or unweighted) genetic risk scores.
+#' @param predRisk Vector of predicted risks.
+#' @param plottitle  Title of the plot. Specification of \code{plottitle} is optional. Default is "Risk score predicted risk plot".
+#' @param xlabel  Label of x-axis. Specification of \code{xlabel} is optional. Default is "Risk score".
+#' @param ylabel  Label of y-axis. Specification of \code{ylabel} is optional. Default is "Predicted risk".
+#' @param rangexaxis Range of the x axis. Specification of \code{rangexaxis} is optional.
+#' @param rangeyaxis Range of the y axis. Specification of \code{rangeyaxis} is optional. Default is \code{c(0,1)}.
+#' @param filename Name of the output file in which risk scores and
+#' predicted risks for each individual will be saved. If no directory is 
+#' specified, the file is saved in the working directory as a txt file. 
+#' When no \code{filename} is specified, the output is not saved.
+#' @param fileplot Name of the output file that contains the plot. The file is
+#' saved in the working directory in the format specified under \code{plottype}. Example:
+#' \code{fileplot="plotname"}. Note that the extension is not specified here. 
+#' When \code{fileplot} is not specified, the plot is not saved.
+#' @param plottype The format in which the plot is saved. Available formats are
+#' wmf, emf, png, jpg, jpeg, bmp, tif, tiff, ps,
+#' eps or pdf. For example, \code{plottype="eps"} will save the plot in eps format.
+#' When \code{plottype} is not specified, the plot will be saved in jpg format.
+#' 
+#' @return 
+#'  The function creates a plot of predicted risks against risk scores.
+#'
+#'
+#' @keywords hplot
+#'
+#'
+#' @seealso \code{\link{riskScore}}, \code{\link{predRisk}} 
+#'
+#' @examples
+#' # specify dataset with outcome and predictor variables
+#' data(ExampleData)
+#'
+#'  # fit a logistic regression  model
+#'  # all steps needed to construct a logistic regression model are written in a function
+#'  # called 'ExampleModels', which is described on page 4-5
+#'  riskmodel <- ExampleModels()$riskModel2
+#'
+#'  # obtain predicted risks
+#'  predRisk <- predRisk(riskmodel)
+#'
+#'  # specify column numbers of genetic predictors
+#'  cGenPred <- c(11:16)
+#'
+#'  # function to compute unweighted genetic risk scores
+#'  riskScore <- riskScore(weights=riskmodel, data=ExampleData, 
+#'  cGenPreds=cGenPred, Type="unweighted")
+#'
+#'  # specify range of x-axis
+#'  rangexaxis <- c(0,12)   
+#'  # specify range of y-axis
+#'  rangeyaxis <- c(0,1)     
+#'  # specify label of x-axis
+#'  xlabel <- "Risk score"     
+#'  # specify label of y-axis
+#'  ylabel <- "Predicted risk" 
+#'  # specify title for the plot
+#'  plottitle <- "Risk score versus predicted risk"
+#'
+#'  # produce risk score-predicted risk plot
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/genabel -r 641


More information about the Genabel-commits mailing list