[Genabel-commits] r641 - in pkg: . PredictABEL PredictABEL/R PredictABEL/data PredictABEL/inst PredictABEL/man
noreply at r-forge.r-project.org
noreply at r-forge.r-project.org
Wed Feb 9 21:45:41 CET 2011
Author: lckarssen
Date: 2011-02-09 21:45:41 +0100 (Wed, 09 Feb 2011)
New Revision: 641
Added:
pkg/PredictABEL/
pkg/PredictABEL/DESCRIPTION
pkg/PredictABEL/R/
pkg/PredictABEL/R/PredictABEL.R
pkg/PredictABEL/data/
pkg/PredictABEL/data/ExampleData.RData
pkg/PredictABEL/inst/
pkg/PredictABEL/inst/doc/
pkg/PredictABEL/man/
pkg/PredictABEL/man/ExampleData.Rd
pkg/PredictABEL/man/ExampleModels.Rd
pkg/PredictABEL/man/ORmultivariate.Rd
pkg/PredictABEL/man/ORunivariate.Rd
pkg/PredictABEL/man/PredictABEL-package.Rd
pkg/PredictABEL/man/fitLogRegModel.Rd
pkg/PredictABEL/man/plotCalibration.Rd
pkg/PredictABEL/man/plotDiscriminationBox.Rd
pkg/PredictABEL/man/plotPredictivenessCurve.Rd
pkg/PredictABEL/man/plotPriorPosteriorRisk.Rd
pkg/PredictABEL/man/plotROC.Rd
pkg/PredictABEL/man/plotRiskDistribution.Rd
pkg/PredictABEL/man/plotRiskscorePredrisk.Rd
pkg/PredictABEL/man/predRisk.Rd
pkg/PredictABEL/man/reclassification.Rd
pkg/PredictABEL/man/riskScore.Rd
Log:
Initial commit of Suman Kundu's PredictABEL package
Added: pkg/PredictABEL/DESCRIPTION
===================================================================
--- pkg/PredictABEL/DESCRIPTION (rev 0)
+++ pkg/PredictABEL/DESCRIPTION 2011-02-09 20:45:41 UTC (rev 641)
@@ -0,0 +1,18 @@
+Package: PredictABEL
+Title: Assessment of risk prediction models
+Version: 1.1
+Date: 2011-02-09
+Author: Suman Kundu, Yurii S. Aulchenko, A. Cecile J.W. Janssens
+Maintainer: Suman Kundu <s.kundu at erasmusmc.nl>, A. Cecile J.W. Janssens <a.janssens at erasmusmc.nl>
+Depends: R (>= 2.9.0), Hmisc, ROCR, epitools, PBSmodelling
+Suggests: GenABEL
+Description: PredictABEL includes functions to assess the performance of risk models. The package contains functions for the various measures that are used
+ in empirical studies, including univariate and multivariate odds ratios (OR) of the predictors, the c-statistic (or area under the receiver operating
+ characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test, reclassification table, net reclassification improvement (NRI) and integrated
+ discrimination improvement (IDI). Also included are functions to create plots, such as risk distributions, ROC curves, calibration plot, discrimination
+ box plot and predictiveness curves. In addition to functions to assess the performance of risk models, the package includes functions to obtain weighted
+ and unweighted risk scores as well as predicted risks using logistic regression analysis. These logistic regression functions are specifically written
+ for models that include genetic variables, but they can also be applied to models that are based on non-genetic risk factors only.
+License: GPL (>= 2)
+Collate: 'PredictABEL.R'
+Packaged: 2011-02-09 12:39:12 UTC; 488810
Added: pkg/PredictABEL/R/PredictABEL.R
===================================================================
--- pkg/PredictABEL/R/PredictABEL.R (rev 0)
+++ pkg/PredictABEL/R/PredictABEL.R 2011-02-09 20:45:41 UTC (rev 641)
@@ -0,0 +1,1685 @@
+#' An R package for the analysis of (genetic) risk prediction studies.
+#'
+#' Fueled by the substantial gene discoveries from genome-wide association
+#' studies, there is increasing interest in investigating the predictive
+#' ability of genetic risk models. To assess the performance of genetic risk
+#' models, PredictABEL includes functions for the various measures and plots
+#' that have been used in empirical studies, including univariate and
+#' multivariate odds ratios (ORs) of the predictors, the c-statistic (or AUC),
+#' Hosmer-Lemeshow goodness of fit test, reclassification table, net
+#' reclassification improvement (NRI) and integrated discrimination
+#' improvement (IDI). The plots included are the ROC plot, calibration plot,
+#' discrimination box plot, predictiveness curve, and several risk distributions.
+#'
+#'
+#' These functions can be applied to predicted risks that are obtained using
+#' logistic regression analysis, to weighted or unweighted risk scores, for
+#' which the functions are included in this package. The functions can also be
+#' used to assess risks or risk scores that are constructed using other methods, e.g., Cox Proportional
+#' Hazards regression analysis, which are not included in the current version.
+#' Risks obtained from other methods can be imported into R for assessment
+#' of the predictive performance.
+#'
+#'
+#' The functions to construct the risk models using logistic regression analyses
+#' are specifically written for models that include genetic variables,
+#' eventually in addition to non-genetic factors, but they can also be applied
+#' to construct models that are based on non-genetic risk factors only. \cr
+#'
+#'
+#' Before using the functions \code{\link{fitLogRegModel}} for constructing
+#' a risk model or \code{\link{riskScore}} for computing risk
+#' scores, the following checks on the dataset are advisable to be done:
+#'
+#' (1) Missing values: The logistic regression analyses and computation of
+#' the risk score are done only for subjects that have no missing data. In case
+#' of missing values, individuals with missing data can be removed from the
+#' dataset or imputation strategies can be used to fill in missing data.
+#' Subjects with missing data can be removed with the R function \code{na.omit}
+#' (available in \code{stats} package).
+#' Example: \code{DataFileNew <- na.omit(DataFile)}
+#' will make a new dataset (\code{DataFileNew}) with no missing values;
+#'
+#'
+#' (2) Multicollinearity: When there is strong correlation between the
+#' predictor variables, regression coefficients may be estimated imprecisely
+#' and risks scores may be biased because the assumption of independent effects
+#' is violated. In genetic risk prediction studies, problems with
+#' multicollinearity should be expected when single nucleotide polymorphisms
+#' (SNPs) located in the same gene are
+#' in strong linkage disequilibrium (LD). For SNPs in LD it is common to select
+#' the variant with the lowest p-value in the model;
+#'
+#'
+#' (3) Outliers: When the data contain significant outliers, either clinical
+#' variables with extreme values of the outcomes or extreme values resulting
+#' from errors in the data entry, these may impact the construction of the risk models and
+#' computation of the risks scores. Data should be carefully checked and outliers
+#' need to be removed or replaced, if justified;
+#'
+#' (4) Recoding of data: In the computation of unweighted risk scores, it is assumed
+#' that the genetic variants are coded \code{0,1,2}
+#' representing the number of alleles carried. When variants
+#' are coded \code{0,1} representing a dominant or recessive effect of the alleles,
+#' the variables need to be recoded before unweighted risk scores can be computed. \cr
+#'
+#'
+#' To import data into R several alternative strategies can be used. Use the
+#' \code{Hmisc} package for importing SPSS and SAS data into R.
+#' Use "\code{ExampleData <- read.table("DataName.txt", header=T, sep="\t")}" for text
+#' files where variable names are included as column headers and data are
+#' separated by tabs.
+#' Use "\code{ExampleData <- read.table("Name.csv", sep=",", header=T)}"
+#' for comma-separated files with variable names as column headers.
+#' Use \code{"setwd(dir)"} to set the working directory to "dir". The datafile
+#' needs to be present in the working directory. \cr
+#'
+#'
+#' To export datafiles from R tables to a tab-delimited textfile with the first row as
+#' the name of the variables,
+#' use "\code{write.table(R_Table, file="Name.txt", row.names=FALSE, sep="\t")}" and
+#' when a comma-separated textfile is requested and variable names are provided in the first row,
+#' use "\code{write.table(R_Table, file="Name.csv", row.names=FALSE, sep=",")}".
+#' When the directory is not specified, the file will be
+#' saved in the working directory. For exporting R data into SPSS, SAS and
+#' Stata data, use functions in the the \code{foreign} package. \cr
+#'
+#' Several functions in this package depend on other R packages:
+#'
+#' (1) \code{Hmisc}, is used to compute NRI and IDI;
+#'
+#' (2) \code{ROCR}, is used to produce ROC plots;
+#'
+#' (3) \code{epitools}, is used to compute univariate odds ratios;
+##'
+#' (4) \code{PBSmodelling}, is used to produce predictiveness curve.
+#'
+#' @note The current version of the package includes the basic measures
+#' and plots that are used in the assessment of (genetic) risk prediction models.
+#' Planned extensions of the package include functions to construct risk
+#' models using Cox Proportional Hazards analysis for prospective data and
+#' functions to construct simulated data for the evaluation of
+#' genetic risk models (see Janssens et al, Genet Med 2006).
+#'
+#'
+##' @author Suman Kundu
+##'
+##' Yurii S. Aulchenko
+##'
+##' A. Cecile J.W. Janssens
+##'
+##' @keywords package
+#'
+#' @references S Kundu, YS Aulchenko, CM van Duijn, ACJW Janssens. PredictABEL:
+#' an R package for the assessment of risk prediction models.
+#' Eur J Epidemiol 2011. In press. \cr
+#'
+#' ACJW Janssens, JPA Ioannidis, CM van Duijn, J Little, MJ Khoury.
+#' Strengthening the Reporting of Genetic Risk Prediction Studies: The GRIPS
+#' Statement Proposal. Eur J Epidemiol 2011. In press. \cr
+#'
+#' ACJW Janssens, JPA Ioannidis, S Bedrosian, P Boffetta, SM Dolan, N Dowling,
+#' I Fortier, AN. Freedman, JM Grimshaw, J Gulcher, M Gwinn, MA Hlatky, H Janes,
+#' P Kraft, S Melillo, CJ O'Donnell, MJ Pencina, D Ransohoff, SD Schully,
+#' D Seminara, DM Winn, CF Wright, CM van Duijn, J Little, MJ Khoury.
+#' Strengthening the reporting of genetic risk prediction studies
+#' (GRIPS)-Elaboration and explanation. Eur J Epidemiol 2011. In press. \cr
+#'
+#' Aulchenko YS, Ripke S, Isaacs A, van Duijn CM. GenABEL: an R package for genome-wide
+#' association analysis. Bioinformatics 2007;23(10):1294-6.
+#'
+"PredictABEL-package" <- function() {}
+#' Function to fit a logistic regression model.
+#'
+#' The function fits a standard GLM function for the logistic regression model.
+#' This function can be used to construct a logistic regression model based on genetic and non-genetic
+#' predictors. The function also allows to enter the genetic predictors
+#' as a single risk score. For that purpose, the function requires that
+#' the dataset additionally includes the risk score.
+#' A new dataset can be constructed using
+#' "\code{NewExampleData <- cbind(ExampleData,riskScore)}".
+#' The genetic risk scores can be obtained
+#' using the function \code{\link{riskScore}} in this package or be
+#' imported from other methods.
+#'
+#' @param data Data frame or matrix that includes the outcome and
+#' predictor variables.
+#' @param cOutcome Column number of the outcome variable. \code{cOutcome=2}
+#' means that the second column of the dataset is the outcome variable.
+#' To fit the logistic regression model, the outcome variable needs to be
+#' (re)coded as \code{1} for the presence and \code{0} for the absence of the
+#' outcome of interest.
+#' @param cNonGenPreds Column numbers of the non-genetic predictors that are
+#' included in the model. An example to denote column numbers is
+#' \code{c(3,6:8,10)}. Choose \code{c(0)} when no non-genetic predictors
+#' are considered.
+#' @param cNonGenPredsCat Column numbers of the non-genetic predictors that
+#' are entered as categorical variables in the model. When non-genetic
+#' predictors are not specified as being categorical they are treated as
+#' continuous variables in the model. If no non-genetic predictors are
+#' categorical, denote \code{c(0)}.
+#' @param cGenPreds Column numbers of the genetic predictors or genetic risk score.
+#' Denote \code{c(0)}
+#' when the prediction model does not consider
+#' genetic predictors or genetic risk score.
+#' @param cGenPredsCat Column numbers of the genetic predictors that are
+#' entered as categorical variables in the model. When SNPs are considered as
+#' categorical, the model
+#' will estimate effects per genotype. Otherwise, SNPs are considered as
+#' continuous variables for which the model will estimate an allelic effect.
+#' Choose c(0) when no genetic predictors are considered as categorical
+#' or when genetic predictors are entered as a risk score into the model.
+#'
+#' @return No value returned.
+#'
+#' @keywords models
+#'
+#'
+#' @seealso \code{\link{predRisk}}, \code{\link{ORmultivariate}}, \code{\link{riskScore}}
+#' @examples
+#' # specify dataset with outcome and predictor variables
+#' data(ExampleData)
+#' # specify column number of outcome variable
+#' cOutcome <- 2
+#' # specify column numbers of non-genetic predictors
+#' cNonGenPred <- c(3:10)
+#' # specify column numbers of non-genetic predictors that are categorical
+#' cNonGenPredCat <- c(6:8)
+#' # specify column numbers of genetic predictors
+#' cGenPred <- c(11,13:16)
+#' # specify column numbers of genetic predictors that are categorical
+#' cGenPredCat <- c(0)
+#'
+#' # fit logistic regression model
+#' riskmodel <- fitLogRegModel(data=ExampleData, cOutcome=cOutcome,
+#' cNonGenPreds=cNonGenPred, cNonGenPredsCat=cNonGenPredCat,
+#' cGenPreds=cGenPred, cGenPredsCat=cGenPredCat)
+#'
+#' # show summary details for the fitted risk model
+#' summary(riskmodel)
+#'
+"fitLogRegModel" <-
+function (data, cOutcome, cNonGenPreds, cNonGenPredsCat,
+cGenPreds, cGenPredsCat) {
+
+if (missing(cNonGenPreds)) {cNonGenPreds<- c(0)}
+if (missing(cNonGenPredsCat)) {cNonGenPredsCat<- c(0)}
+if (missing(cGenPreds)) {cGenPreds<- c(0)}
+if (missing(cGenPredsCat)) {cGenPredsCat<- c(0)}
+if ((cGenPreds[1]==0) & (cNonGenPreds[1]==0)) {
+stop("No predictors have been considered.\n")
+}
+
+NonGen_factor <- as.data.frame(data[,cNonGenPredsCat])
+Gen_factor<- as.data.frame(data[,cGenPredsCat])
+if(dim(Gen_factor)[2] >0 )
+{
+for(i in 1:dim(Gen_factor)[2])
+{
+Gen_factor[,i] <- as.factor(Gen_factor[,i])
+}
+}
+if(dim(NonGen_factor)[2] >0 )
+{
+for(i in 1:dim(NonGen_factor)[2])
+{
+NonGen_factor[,i] <- as.factor(NonGen_factor[,i])
+}
+}
+
+p <- cbind(data[,setdiff(cNonGenPreds,cNonGenPredsCat)],NonGen_factor,
+data[,setdiff(cGenPreds,cGenPredsCat)],Gen_factor)
+colnames(p)<- c(colnames(data)[setdiff(cNonGenPreds,cNonGenPredsCat)],colnames(data)[cNonGenPredsCat],
+colnames(data)[setdiff(cGenPreds,cGenPredsCat)],colnames(data)[cGenPredsCat])
+
+outcome<- data[,cOutcome]
+genes <- colnames(p)
+fmla<- as.formula(paste("outcome~", paste(genes, collapse="+")))
+model <- glm(fmla, data=p, family=binomial("logit"))
+return(model)
+
+}
+#' Function to compute predicted risks for all individuals in the dataset.
+#'
+#' The function computes predicted risks from a specified logistic regression model.
+#' The function \code{\link{fitLogRegModel}} can be used to construct such a model.
+#'
+#' @param riskModel Name of logistic regression model that can be fitted using
+#' the function \code{\link{fitLogRegModel}}.
+#' @param data Data frame or matrix that includes the outcome, ID number and
+#' predictor variables.
+#' @param cID Column number of ID variable. The ID number and predicted risks
+#' will be saved under \code{filename}. When \code{cID} is not specified, the output is not saved.
+#' @param filename Name of the output file in which the ID number and
+#' estimated predicted risks will be saved. The file is saved in the working
+#' directory as a txt file. When no \code{filename} is specified, the output is not saved.
+#'
+#'
+#'
+#' @return The function returns a vector of predicted risks.
+#'
+#'
+#' @keywords htest
+#'
+#'
+#' @seealso \code{\link{fitLogRegModel}}, \code{\link{plotCalibration}},
+#' \code{\link{plotROC}}, \code{\link{plotPriorPosteriorRisk}}
+#'
+#' @examples
+#' # specify dataset with outcome and predictor variables
+#' data(ExampleData)
+#' # specify column number of the outcome variable
+#' cOutcome <- 2
+#' # specify column number of ID variable
+#' cID <- 1
+#' # specify column numbers of non-genetic predictors
+#' cNonGenPred <- c(3:10)
+#' # specify column numbers of non-genetic predictors that are categorical
+#' cNonGenPredCat <- c(6:8)
+#' # specify column numbers of genetic predictors
+#' cGenPred <- c(11,13:16)
+#' # specify column numbers of genetic predictors that are categorical
+#' cGenPredCat <- c(0)
+#'
+#' # fit logistic regression model
+#' riskmodel <- fitLogRegModel(data=ExampleData, cOutcome=cOutcome,
+#' cNonGenPreds=cNonGenPred, cNonGenPredsCat=cNonGenPredCat,
+#' cGenPreds=cGenPred, cGenPredsCat=cGenPredCat)
+#'
+#' # obtain predicted risks
+#' predRisk <- predRisk(riskModel=riskmodel, filename="name.txt")
+#'
+"predRisk" <-
+function(riskModel, data, cID, filename)
+{
+ if (any(class(riskModel) == "glm"))
+ {
+ predrisk <- predict(riskModel, type="response")
+ }
+else
+ {
+stop("The argument 'riskModel' should be a (GLM)model")
+ }
+
+ if (!missing(data)&& !missing(cID)&& !missing(filename))
+ {tab <- cbind(ID=data[, cID],PredRisk=predrisk)
+ write.table(tab, file=filename, row.names = FALSE,sep = "\t")
+ }
+ return(predrisk)
+ }
+#' Function to compute genetic risk scores. The function computes unweighted
+#' or weighted genetic risk scores. The relative effects (or weights) of
+#' genetic variants can either come from beta coefficients of a risk model
+#' or from a vector of beta coefficients imported into R, e.g., when beta cofficients are obtained from meta-analysis.
+#'
+#'
+#' The function calculates unweighted
+#' or weighted genetic risk scores. The unweighted genetic risk score is a simple
+#' risk allele count assuming that all alleles have the same effect. For this
+#' calculation, it is required that the genetic variables are coded as the number of risk
+#' alleles. Beta coefficients are used to determine which allele is the risk
+#' allele. When the sign of the beta coefficient is negative, the allele coding
+#' is reversed. The weighted risk score is a sum of the number of risk alleles
+#' multiplied by their beta coefficients.
+#'
+#' The beta coefficients can come from two different sources, either beta coefficients of a risk model
+#' or a vector of beta coefficients imported into R, e.g., when beta cofficients are obtained from meta-analysis.
+#' This vector of beta coefficients
+#' should be a named vector containing the same names as mentioned in genetic variants.
+#' A logistic regression model can be constructed using \code{\link{fitLogRegModel}}
+#' from this package.
+#'
+#' @note When a vector of beta coefficients is imported, it should be checked
+#' whether the DNA strands and the coding of the risk alleles are the same
+#' as in the study data. The functions are available in the package \code{GenABEL}
+#' to accurately compute risk scores when the DNA strands are different or the risk
+#' alleles are coded differently in the study data and the data used in meta-analysis.
+#'
+#' @param weights The vector that includes the weights given to the genetic
+#' variants. See details for more informations.
+#' @param data Data frame or matrix that includes the outcome
+#' and predictors variables.
+#' @param cGenPreds Column numbers of the genetic variables on the basis of
+#' which the risk score is computed.
+#' @param Type Specification of the type of risk scores that will be computed.
+#' Type can be weighted (\code{Type="weighted"}) or
+#' unweighted (\code{Type="unweighted"}).
+#'
+#' @return The function returns a vector of risk scores.
+#'
+#'
+#' @keywords htest
+#'
+#'
+#' @seealso \code{\link{plotRiskDistribution}}, \code{\link{plotRiskscorePredrisk}}
+#' @examples
+#' # specify dataset with outcome and predictor variables
+#' data(ExampleData)
+#' # specify column numbers of genetic predictors
+#' cGenPred <- c(11:16)
+#'
+#' # fit a logistic regression model
+#' # all steps needed to construct a logistic regression model are written in a function
+#' # called 'ExampleModels', which is described on page 4-5
+#' riskmodel <- ExampleModels()$riskModel2
+#'
+#' # compute unweighted risk scores
+#' riskScore <- riskScore(weights=riskmodel, data=ExampleData,
+#' cGenPreds=cGenPred, Type="unweighted")
+#'
+"riskScore" <-
+function(weights, data, cGenPreds, Type )
+{
+riskModel <- weights
+x <- data[, cGenPreds]
+if (any(class(riskModel) == "glm"))
+ {
+ if(! setequal(intersect(names(riskModel$coef),colnames(x)),colnames(x)))
+ {
+ stop("The risk model does not contain all the genetic variants as predictors")
+ }
+ else
+ {
+ y <- riskModel$coef[intersect(names(riskModel$coef),colnames(x))]
+ }
+ }
+
+else if(is.vector(riskModel))
+ {
+ if (length(names(riskModel))!= length(riskModel))
+ {
+ stop("The argument 'weights' is not a named beta (coefficient) vector")
+ }
+ if( setequal(intersect(names(riskModel),colnames(x)),colnames(x)))
+ {
+ y <- riskModel[intersect(colnames(x),names(riskModel))]
+ }
+ else
+ {
+ stop("Beta coefficient vector does not contain all the genetic variants")
+ }
+
+ }
+
+else
+ {
+stop("'weights' argument should either be a model or a named beta vector" )
+ }
+
+if(Type=="weighted")
+{
+wrs<- y %*% t(x)
+return(as.vector(wrs))
+}
+else if (Type=="unweighted")
+{
+Unbetavector <- sign(y)
+pro <- Unbetavector %*% t(x)
+num <-sum(Unbetavector==-1)
+urs<-pro+(2*num)
+return(as.vector(urs))
+}
+}
+#' Function to compute univariate ORs for genetic predictors. The function computes the univariate ORs with 95\% CIs for genetic predictors.
+#'
+#' The function computes the univariate ORs with 95\% CIs for the specified
+#' genetic variants both per allele and per genotype. The ORs are saved with the data from which they are
+#' calculated. Genotype frequencies are provided for
+#' persons with and without the outcome
+#' of interest. The genotype or allele that is coded as \code{'0'} is considered
+#' as the reference to computes the ORs.
+#'
+#' @param data Data frame or matrix that includes the outcome and
+#' predictors variables.
+#' @param cOutcome Column number of the outcome variable. \code{cOutcome=2}
+#' means that the second column of the dataset is the outcome variable.
+#' @param cGenPreds Column numbers of genetic variables for which the ORs
+#' are calculated.
+#' @param filenameGeno Name of the output file in which the univariate ORs
+#' and frequencies per genotype will be saved. The file is saved in the working directory as
+#' a txt file. When no \code{filenameGeno} is specified, the output is not saved.
+#' @param filenameAllele Name of the output file in which the univariate ORs and
+#' frequencies per allele will be saved. The file is saved in the working
+#' directory as a txt file. When no \code{filenameAllele} is specified, the output is not saved.
+#'
+#' @return The function returns two different tables. One table contains genotype frequencies
+#' and univariate ORs with 95\% CIs and the other contains allele frequencies and
+#' univariate ORs with 95\% CIs.
+#'
+#'
+#' @keywords manip
+#'
+#'
+#' @seealso \code{\link{ORmultivariate}}
+#' @examples
+#' # specify dataset with outcome and predictor variables
+#' data(ExampleData)
+#' # specify column number of the outcome variable
+#' cOutcome <- 2
+#' # specify column numbers of genetic predictors
+#' cGenPreds <- c(11:13,16)
+#'
+#' # compute univariate ORs
+#' ORunivariate(data=ExampleData, cOutcome=cOutcome, cGenPreds=cGenPreds,
+#' filenameGeno="GenoOR.txt", filenameAllele="AlleleOR.txt")
+#'
+"ORunivariate" <- function(data, cOutcome,cGenPreds,filenameGeno, filenameAllele )
+ {
+ p<- data[,cGenPreds ] # p : A table of Genotype data for all SNP's
+ o<- data[,cOutcome] # o: the binary outcome variable
+ m1 <- matrix (nrow=dim(p)[2], ncol=19)
+ n1 <- matrix (nrow=dim(p)[2], ncol=12)
+
+ for (i in 1:dim(p)[2])
+ {
+s<-table(p[,i],o)
+if (dim(s)[1]==1) {s <- rbind(s,c(0,0));s <- rbind(s,c(0,0))}
+if (dim(s)[1]==2) {s <- rbind(s,c(0,0))}
+a<-oddsratio.wald(s)$measure
+b<-oddsratio.wald(s)$data
+c <- matrix (nrow=2, ncol=2)
+c[1,1] <- 2*b[1,1]+b[2,1]
+c[2,1] <- 2*b[3,1]+b[2,1]
+c[1,2] <- 2*b[1,2]+b[2,2]
+c[2,2] <- 2*b[3,2]+b[2,2]
+dimnames(c)[[1]] <- c("Allele-I","Allele-II")
+dimnames(c)[[2]] <- c("0","1")
+d<-oddsratio.wald(c)$measure
+e<-oddsratio.wald(c)$data
+
+ m1[i,1]<- colnames(p)[i]
+ m1[i,2]<- ( b[1,2])
+ m1[i,3]<- ( round((b[1,2]/b[4,2])*100 ,1))
+ m1[i,4]<- ( b[2,2])
+ m1[i,5]<- ( round((b[2,2]/b[4,2])*100 ,1))
+ m1[i,6]<- ( b[3,2])
+ m1[i,7]<- ( round((b[3,2]/b[4,2])*100 ,1))
+ m1[i,8]<- ( b[1,1])
+ m1[i,9]<- ( round((b[1,1]/b[4,1])*100 ,1))
+ m1[i,10]<- ( b[2,1])
+ m1[i,11]<- ( round((b[2,1]/b[4,1])*100 ,1))
+ m1[i,12]<- ( b[3,1])
+ m1[i,13]<- ( round((b[3,1]/b[4,1])*100 ,1))
+ m1[i,14]<- ( round(a[2,1] ,2))
+ m1[i,15]<- ( round(a[2,2] ,2))
+ m1[i,16]<- ( round(a[2,3] ,2))
+ m1[i,17]<- ( round(a[3,1] ,2))
+ m1[i,18]<- ( round(a[3,2] ,2))
+ m1[i,19]<- ( round(a[3,3] ,2))
+
+ n1[i,1]<- colnames(p)[i]
+ n1[i,2]<- (e[1,2])
+ n1[i,3]<- round((e[1,2]/e[3,2])*100,1)
+ n1[i,4]<- e[2,2]
+ n1[i,5]<- round((e[2,2]/e[3,2])*100,1)
+ n1[i,6]<- e[1,1]
+ n1[i,7]<- round((e[1,1]/e[3,1])*100,1)
+ n1[i,8]<- e[2,1]
+ n1[i,9]<- round((e[2,1]/e[3,1])*100,1)
+ n1[i,10]<- round(d[2,1] ,2)
+ n1[i,11]<- round(d[2,2] ,2)
+ n1[i,12]<- round(d[2,3] ,2)
+
+ }
+
+ m1 <- as.table(m1)
+ dimnames(m1)[[1]] <- c(1:dim(p)[2])
+ dimnames(m1)[[2]] <- c("Name", "0 ","0 (%)", "1 ","1 (%)", "2 ","2 (%)",
+ "0 ","0 (%)", "1 ","1 (%)", "2 ","2 (%)","OR1","CI-Low","CI-high","OR2",
+ "CI-Low","CI-high")
+names(dimnames(m1)) <- c("", " Genotype frequencies for Cases & Controls, and OR(95% CI)")
+ if (!missing(filenameGeno))
+ write.table(m1,file=filenameGeno, row.names = FALSE,sep = "\t")
+
+ n1 <- as.table(n1)
+ dimnames(n1)[[1]] <- c(1:dim(p)[2])
+ dimnames(n1)[[2]] <- c("Name", "0","0 (%)","1","1 (%)", "0","0 (%)","1",
+ "1 (%)","OR1","CI-Low","CI-high")
+names(dimnames(n1)) <- c("", " Allele frequencies for Cases & Controls, and OR(95% CI) ")
+ if (!missing(filenameAllele))
+ write.table(n1,file=filenameAllele, row.names = FALSE,sep = "\t")
+
+ p<- list(Genotype =m1, Allelic=n1)
+ return(p)
+ }
+#' Function to obtain multivariate odds ratios from a logistic regression model.
+#' The function estimates multivariate (adjusted) odds ratios (ORs) with
+#' 95\% confidence intervals (CIs) for all the genetic and non-genetic variables
+#' in the risk model.
+#'
+#' The function requires that first a logistic regression
+#' model is fitted either by using \code{GLM} function or the function
+#' \code{\link{fitLogRegModel}}. In addition to the multivariate ORs,
+#' the function returns summary statistics of model performance, namely the Brier
+#' score and the Nagelkerke's \eqn{R^2} value.
+#' The Brier score quantifies the accuracy of risk predictions by comparing
+#' predicted risks with observed outcomes at individual level (where outcome
+#' values are either 0 or 1). The Nagelkerke's \eqn{R^2} value indicates the percentage of variation
+#' of the outcome explained by the predictors in the model.
+#'
+#' @param riskModel Name of logistic regression model that can be fitted using
+#' the function \code{\link{fitLogRegModel}}.
+#' @param filename Name of the output file in which the multivariate
+#' ORs will be saved. If no directory is specified, the file is
+#' saved in the working directory as a txt file.
+#' When \code{filename} is not specified, the output is not saved.
+#'
+#'
+#' @return
+#' The function returns:
+#' \item{Predictors Summary}{OR with 95\% CI and corresponding p-values for
+#' each predictor in the model}
+#' \item{Brier Score}{Brier score}
+#' \item{Nagelkerke Index}{Nagelkerke's \eqn{R^2} value}
+#'
+#'
+#'
+#' @keywords htest
+#'
+#'
+#' @references Brier GW. Verification of forecasts expressed in terms of probability.
+#' Monthly weather review 1950;78:1-3.
+#'
+#'
+#' Nagelkerke NJ. A note on a general definition of the coefficient
+#' of determination. Biometrika 1991;78:691-692.
+#'
+#'
+#' @seealso \code{\link{fitLogRegModel}}, \code{\link{ORunivariate}}
+#'
+#' @examples
+#' # specify dataset with outcome and predictor variables
+#' data(ExampleData)
+#' # specify column number of outcome variable
+#' cOutcome <- 2
+#' # specify column numbers of non-genetic predictors
+#' cNonGenPred <- c(3:10)
+#' # specify column numbers of non-genetic predictors that are categorical
+#' cNonGenPredCat <- c(6:8)
+#' # specify column numbers of genetic predictors
+#' cGenPred <- c(11,13:16)
+#' # specify column numbers of genetic predictors that are categorical
+#' cGenPredCat <- c(0)
+#'
+#' # fit logistic regression model
+#' riskmodel <- fitLogRegModel(data=ExampleData, cOutcome=cOutcome,
+#' cNonGenPreds=cNonGenPred, cNonGenPredsCat=cNonGenPredCat,
+#' cGenPreds=cGenPred, cGenPredsCat=cGenPredCat)
+#'
+#' # obtain multivariate OR(95\% CI) for all predictors of the fitted model
+#' ORmultivariate(riskModel=riskmodel, filename="multiOR.txt")
+#'
+"ORmultivariate" <- function(riskModel, filename)
+{
+if (!any(class(riskModel)=="glm"))
+stop("riskModel argument should have 'glm' class")
+sum.coef<-summary(riskModel)$coef
+OR<-exp(sum.coef[,"Estimate"])
+Upper.CI<-exp(sum.coef[,"Estimate"]+1.96*sum.coef[,"Std. Error"])
+Lower.CI<-exp(sum.coef[,1]-1.96*sum.coef[,2])
+tab <-cbind("OR"=round(OR ,4),"Lower.CI"=round(Lower.CI ,4),"Upper.CI"=
+round(Upper.CI ,4),"p-value"=round((sum.coef)[, 4],4))
+
+if (!missing(filename))
+ write.table(tab,file=filename, row.names=TRUE,sep = "\t")
+
+B <- mean((riskModel$y) * (1-predict(riskModel, type="response"))^2 +
+(1-riskModel$y) * (predict(riskModel, type="response"))^2)
+# B
+Bmax <- mean(riskModel$y) * (1-mean(riskModel$y))^2 +
+(1-mean(riskModel$y)) *mean(riskModel$y)^2
+Bscaled <- 1 - B/Bmax
+
+LLfull <- (riskModel$deviance)/(-2)
+LLnul <- (riskModel$null.deviance)/(-2)
+n <- length(riskModel$y)
+Mc <- 1-(LLfull/LLnul) # McFadden's R square
+Cox <- 1-exp ((-2)*(LLfull-LLnul)/n) # Cox-Snell R square
+Rmax <- 1-exp((2*LLnul)/n)
+Nag <- Cox/Rmax # Nagelkerke_R2
+
+ p<- list(Predictors_Summary= tab,Brier_Score=round(B ,4),Nagelkerke_R2=round(Nag,4))
+ return(p)
+}
+#' Function to plot predicted risks against risk scores.
+#' This function is used to make a plot of predicted risks against risk scores.
+#'
+#' The function creates a plot of predicted risks against risk scores.
+#' Predicted risks can be obtained using the functions
+#' \code{\link{fitLogRegModel}} and \code{\link{predRisk}}
+#' or be imported from other methods or packages.
+#' The function \code{\link{riskScore}} can be
+#' used to compute unweighted or weighted risk scores.
+#'
+#' @param data Data frame or matrix that includes the outcome and
+#' predictors variables.
+#' @param riskScore Vector of (weighted or unweighted) genetic risk scores.
+#' @param predRisk Vector of predicted risks.
+#' @param plottitle Title of the plot. Specification of \code{plottitle} is optional. Default is "Risk score predicted risk plot".
+#' @param xlabel Label of x-axis. Specification of \code{xlabel} is optional. Default is "Risk score".
+#' @param ylabel Label of y-axis. Specification of \code{ylabel} is optional. Default is "Predicted risk".
+#' @param rangexaxis Range of the x axis. Specification of \code{rangexaxis} is optional.
+#' @param rangeyaxis Range of the y axis. Specification of \code{rangeyaxis} is optional. Default is \code{c(0,1)}.
+#' @param filename Name of the output file in which risk scores and
+#' predicted risks for each individual will be saved. If no directory is
+#' specified, the file is saved in the working directory as a txt file.
+#' When no \code{filename} is specified, the output is not saved.
+#' @param fileplot Name of the output file that contains the plot. The file is
+#' saved in the working directory in the format specified under \code{plottype}. Example:
+#' \code{fileplot="plotname"}. Note that the extension is not specified here.
+#' When \code{fileplot} is not specified, the plot is not saved.
+#' @param plottype The format in which the plot is saved. Available formats are
+#' wmf, emf, png, jpg, jpeg, bmp, tif, tiff, ps,
+#' eps or pdf. For example, \code{plottype="eps"} will save the plot in eps format.
+#' When \code{plottype} is not specified, the plot will be saved in jpg format.
+#'
+#' @return
+#' The function creates a plot of predicted risks against risk scores.
+#'
+#'
+#' @keywords hplot
+#'
+#'
+#' @seealso \code{\link{riskScore}}, \code{\link{predRisk}}
+#'
+#' @examples
+#' # specify dataset with outcome and predictor variables
+#' data(ExampleData)
+#'
+#' # fit a logistic regression model
+#' # all steps needed to construct a logistic regression model are written in a function
+#' # called 'ExampleModels', which is described on page 4-5
+#' riskmodel <- ExampleModels()$riskModel2
+#'
+#' # obtain predicted risks
+#' predRisk <- predRisk(riskmodel)
+#'
+#' # specify column numbers of genetic predictors
+#' cGenPred <- c(11:16)
+#'
+#' # function to compute unweighted genetic risk scores
+#' riskScore <- riskScore(weights=riskmodel, data=ExampleData,
+#' cGenPreds=cGenPred, Type="unweighted")
+#'
+#' # specify range of x-axis
+#' rangexaxis <- c(0,12)
+#' # specify range of y-axis
+#' rangeyaxis <- c(0,1)
+#' # specify label of x-axis
+#' xlabel <- "Risk score"
+#' # specify label of y-axis
+#' ylabel <- "Predicted risk"
+#' # specify title for the plot
+#' plottitle <- "Risk score versus predicted risk"
+#'
+#' # produce risk score-predicted risk plot
[TRUNCATED]
To get the complete diff run:
svnlook diff /svnroot/genabel -r 641
More information about the Genabel-commits
mailing list