[Genabel-commits] r932 - in pkg/PredictABEL: . R man

Mon Jul 23 18:53:05 CEST 2012

Author: lckarssen
Date: 2012-07-23 18:53:04 +0200 (Mon, 23 Jul 2012)
New Revision: 932

Added:
   pkg/PredictABEL/NAMESPACE
   pkg/PredictABEL/NEWS
Modified:
   pkg/PredictABEL/DESCRIPTION
   pkg/PredictABEL/R/PredictABEL.R
   pkg/PredictABEL/man/predRisk.Rd
   pkg/PredictABEL/man/reclassification.Rd
   pkg/PredictABEL/man/simulatedDataset.Rd
Log:
Upload of Suman Kundu's latest changes to PredictABEL. If there are no bugs this will be PredictABEL 1.2-1

Modified: pkg/PredictABEL/DESCRIPTION
===================================================================

--- pkg/PredictABEL/DESCRIPTION	2012-07-20 07:32:48 UTC (rev 931)
+++ pkg/PredictABEL/DESCRIPTION	2012-07-23 16:53:04 UTC (rev 932)
@@ -1,27 +1,29 @@
-Package: PredictABEL
-Title: Assessment of risk prediction models
-Version: 1.2
-Date: 2011-10-19
-Author: Suman Kundu, Yurii S. Aulchenko, A. Cecile J.W. Janssens
-Maintainer: Suman Kundu <s.kundu at erasmusmc.nl>, A. Cecile J.W. Janssens <a.janssens at erasmusmc.nl>
-Depends: R (>= 2.12.0), Hmisc, ROCR, epitools, PBSmodelling
-Suggests: GenABEL
-Description: PredictABEL includes functions to assess the performance of
- risk models. The package contains functions for the various measures that are
- used in empirical studies, including univariate and multivariate odds ratios
- (OR) of the predictors, the c-statistic (or area under the receiver operating
- characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test,
- reclassification table, net reclassification improvement (NRI) and
- integrated discrimination improvement (IDI). Also included are functions
- to create plots, such as risk distributions, ROC curves, calibration plot,
- discrimination box plot and predictiveness curves. In addition to functions
- to assess the performance of risk models, the package includes functions to
- obtain weighted and unweighted risk scores as well as predicted risks using
- logistic regression analysis. These logistic regression functions are
- specifically written for models that include genetic variables, but they
- can also be applied to models that are based on non-genetic risk factors only.
- Finally, the package includes function to construct a simulated dataset that
- contains individual genotype data, estimated genetic risk, and disease status,
- used for the evaluation of genetic risk models.
-License: GPL (>= 2)
-Packaged: 2011-10-18 15:01:45 UTC; 488810
+Package: PredictABEL
+Title: Assessment of risk prediction models
+Version: 1.2-1
+Date: 2012-07-22
+Author: Suman Kundu, Yurii S. Aulchenko, A. Cecile J.W. Janssens
+Maintainer: Suman Kundu <s.kundu at erasmusmc.nl>, A. Cecile J.W. Janssens
+        <a.janssens at erasmusmc.nl>
+Depends: R (>= 2.12.0), Hmisc, ROCR, epitools, PBSmodelling
+Suggests: GenABEL
+Description: PredictABEL includes functions to assess the performance of
+ risk models. The package contains functions for the various measures that are
+ used in empirical studies, including univariate and multivariate odds ratios
+ (OR) of the predictors, the c-statistic (or area under the receiver operating
+ characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test,
+ reclassification table, net reclassification improvement (NRI) and
+ integrated discrimination improvement (IDI). Also included are functions
+ to create plots, such as risk distributions, ROC curves, calibration plot,
+ discrimination box plot and predictiveness curves. In addition to functions
+ to assess the performance of risk models, the package includes functions to
+ obtain weighted and unweighted risk scores as well as predicted risks using
+ logistic regression analysis. These logistic regression functions are
+ specifically written for models that include genetic variables, but they
+ can also be applied to models that are based on non-genetic risk factors only.
+ Finally, the package includes a function to construct a simulated dataset with 
+ genotypes, genetic risks, and disease status for a hypothetical population, which 
+ is used for the evaluation of genetic risk models.
+License: GPL (>= 2)
+URL: http://www.genabel.org/packages/PredictABEL
+Packaged: 2012-07-23 15:35:17 UTC; 488810

Added: pkg/PredictABEL/NAMESPACE
===================================================================
--- pkg/PredictABEL/NAMESPACE	                        (rev 0)
+++ pkg/PredictABEL/NAMESPACE	2012-07-23 16:53:04 UTC (rev 932)
@@ -0,0 +1,13 @@
+# Default NAMESPACE created by R
+# Remove the previous line if you edit this file
+
+# Export all names
+exportPattern(".")
+
+# Import all packages listed as Imports or Depends
+import(
+  Hmisc,
+  ROCR,
+  epitools,
+  PBSmodelling
+)

Added: pkg/PredictABEL/NEWS
===================================================================
--- pkg/PredictABEL/NEWS	                        (rev 0)
+++ pkg/PredictABEL/NEWS	2012-07-23 16:53:04 UTC (rev 932)
@@ -0,0 +1,12 @@
+Changes in version 1.2-1 (2012-07-22)
+   * Modified the reclassification function so that it can handle risk categories with zero observations
+   * Modified the reclassification function so that it can compute the continuous NRI
+   * Modified the predRisk function so that an estimated risk model can be used for prediction on new data
+
+Changes in version 1.2 (2011-10-19)
+   * Added the function, simulatedDataset, to construct a simulated dataset with genotypes, genetic risks, and disease status for a hypothetical population, which is used for the evaluation of genetic risk models
+   * Modified the function reclassification to include any number of risk categories
+   * Modified the function plotDiscriminationBox which can include any continuous measures (risk scores/ predicted risks) to create box plots for cases and controls
+
+
+

Modified: pkg/PredictABEL/R/PredictABEL.R
===================================================================
--- pkg/PredictABEL/R/PredictABEL.R	2012-07-20 07:32:48 UTC (rev 931)
+++ pkg/PredictABEL/R/PredictABEL.R	2012-07-23 16:53:04 UTC (rev 932)
@@ -248,7 +248,7 @@
 #'
 #' @param riskModel Name of logistic regression model that can be fitted using
 #' the function \code{\link{fitLogRegModel}}.
-#' @param data Data frame or matrix that includes the outcome, ID number and
+#' @param data Data frame or matrix that includes the ID number and
 #' predictor variables.
 #' @param cID Column number of ID variable. The ID number and predicted risks
 #' will be saved under \code{filename}. When \code{cID} is not specified, the output is not saved.
@@ -297,7 +297,7 @@
 {
  if (any(class(riskModel) == "glm"))
   {
-   predrisk <- predict(riskModel, type="response")
+   predrisk <- predict(riskModel, newdata=data, type="response")
   }
 else
   {
@@ -1462,16 +1462,23 @@
   tab <- cbind(Tab, " % reclassified"= round((rowSums(Tab)-diag(Tab))/rowSums(Tab),2)*100)
   names(dimnames(tab)) <- c("Initial Model", "Updated Model")
   print(tab)
-
 cat(" _________________________________________\n")
 
-  x<-improveProb(x1=as.numeric(c1)*(1/max(as.numeric(c1))),
-  x2=as.numeric(c2)*(1/max(as.numeric(c2))), y=data[,cOutcome])
+c11 <-factor(c1, levels = levels(c1), labels = c(1:length(levels(c1))))
+c22 <-factor(c2, levels = levels(c2), labels = c(1:length(levels(c2))))
+
+  x<-improveProb(x1=as.numeric(c11)*(1/(length(levels(c11)))),
+  x2=as.numeric(c22)*(1/(length(levels(c22)))), y=data[,cOutcome])
+  
+
   y<-improveProb(x1=predrisk1, x2=predrisk2, y=data[,cOutcome])
 
-cat("\n NRI [95% CI]:", round(x$nri,4),"[",round(x$nri-1.96*x$se.nri,4),"-",
+cat("\n NRI(Categorical) [95% CI]:", round(x$nri,4),"[",round(x$nri-1.96*x$se.nri,4),"-",
  round(x$nri+1.96*x$se.nri,4), "]", "; p-value:", round(2*pnorm(-abs(x$z.nri)),5), "\n" )
 
+ cat(" NRI(Continuous) [95% CI]:", round(y$nri,4),"[",round(y$nri-1.96*y$se.nri,4),"-",
+ round(y$nri+1.96*y$se.nri,4), "]", "; p-value:", round(2*pnorm(-abs(y$z.nri)),5), "\n" )
+
 cat(" IDI [95% CI]:", round(y$idi,4),"[",round(y$idi-1.96*y$se.idi,4),"-",
  round(y$idi+1.96*y$se.idi,4), "]","; p-value:", round(2*pnorm(-abs(y$z.idi)),5), "\n")
 }
@@ -1621,8 +1628,8 @@
   out<- list(riskModel1=riskmodel1, riskModel2=riskmodel2)
   return(out)
   }
-#' Function to construct a simulated dataset containing individual genotype
-#' data, estimated genetic risk and disease status.
+#' Function to construct a simulated dataset containing individual genotype data, 
+#' genetic risks and disease status for a hypothetical population.
 #' Construct a dataset that contains individual genotype data, genetic risk,
 #' and disease status for a hypothetical population.
 #' The dataset is constructed using simulation in such a way that the frequencies
@@ -1723,23 +1730,30 @@
 #' van Duijn CM. Predictive testing for complex diseases using multiple genes:
 #' fact or fiction? Genet Med. 2006;8:395-400.
 #'
+#' Kundu S, Karssen LC, Janssens AC: Analytical and simulation methods for 
+#' estimating the potential predictive ability of genetic profiling: a comparison 
+#' of methods and results. Eur J Hum Genet. 2012 May 30.
+#' 
+#' van Zitteren M, van der Net JB, Kundu S, Freedman AN, van Duijn CM,
+#' Janssens AC. Genome-based prediction of breast cancer risk in the general
+#' population: a modeling study based on meta-analyses of genetic associations.
+#' Cancer Epidemiol Biomarkers Prev. 2011;20:9-22.
 #'
+#' van der Net JB, Janssens AC, Sijbrands EJ, Steyerberg EW. Value of genetic
+#' profiling for the prediction of coronary heart disease.
+#' Am Heart J. 2009;158:105-10.
+#'
 #' Janssens AC, Moonesinghe R, Yang Q, Steyerberg EW, van Duijn CM, Khoury MJ.
 #' The impact of genotype frequencies on the clinical validity of genomic
 #' profiling for predicting common chronic diseases. Genet Med. 2007;9:528-35.
 #'
+
+
 #'
-#' van der Net JB, Janssens AC, Sijbrands EJ, Steyerberg EW. Value of genetic
-#' profiling for the prediction of coronary heart disease.
-#' Am Heart J. 2009;158:105-10.
 #'
+
 #'
-#' van Zitteren M, van der Net JB, Kundu S, Freedman AN, van Duijn CM,
-#' Janssens AC. Genome-based prediction of breast cancer risk in the general
-#' population: a modeling study based on meta-analyses of genetic associations.
-#' Cancer Epidemiol Biomarkers Prev. 2011;20:9-22.
 #'
-#'
 #' @examples
 #' # specify the matrix containing the ORs and frequencies of genetic variants
 #' # In this example we used per allele effects of the risk variants
@@ -1756,7 +1770,7 @@
 #' # Obtain the AUC and produce ROC curve
 #' plotROC(data=Data, cOutcome=4, predrisk=Data[,3])
 #'
-"simulatedDataset" <- function(ORfreq, poprisk, popsize, filename)
+"simulatedDataset" <- function(ORfreq, poprisk, popsize, filename) 
 {
 if (missing(poprisk)) {stop("Population disease risk is not specified")}
 if (missing(popsize)) {stop("Total number of individuals is not mentioned")}
@@ -1773,6 +1787,7 @@
 tabel <- cbind(a,b,c,dd,e,f,g,OR)
 tabel
 }
+
 reconstruct.2x3table <- function(OR1,OR2,p1,p2,d,s){
 	a	<- 1
 	eOR	<- 0
@@ -1818,9 +1833,10 @@
 	tabel
 }
 
-adjust.postp <- function (pd, LR){
+
+adjust.postp <- function (pd, LR){		
 	odds.diff <- 0
-	prior.odds <- pd/(1-pd)
+	prior.odds <- pd/(1-pd)	
 	for (i in (1:100000)) {
 	Postp <- (prior.odds*LR)/(1+(prior.odds*LR))
 	odds.diff <- (pd-mean(Postp))/ (1-(pd-mean(Postp)))
@@ -1830,42 +1846,46 @@
 	Postp
 }
 
-
 func.data <- function(p,d,OR,s,g){
   Data <- matrix (NA,s,4+g)
-  Data[,1] <- rep(0,s)
-	Data[,2] <- rep(1,s)
+  Data[,1] <- rep(0,s)                   
+	Data[,2] <- rep(1,s)									
 	Data[,3] <- rep(0,s)
 	i <- 0
 	while (i < g){
     i <- i+1
     cells2x3 <- rep(NA,9)
     cells2x3 <- if(p[i,2]==0) {reconstruct.2x2table(p=p[i,1],d,OR=OR[i,1],s)} else {if(p[i,2]==1) {reconstruct.2x3tableHWE(OR=OR[i,1],p=p[i,1],d,s)}
-  else {reconstruct.2x3table(OR1=OR[i,1],OR2=OR[i,2],p1=p[i,1],p2=p[i,2],d,s)}}   # reconstruct table for calculation of likelihood ratios for genotypes
-      LREE 	  <- ((cells2x3[1]/d*s)/(cells2x3[2]/(1-d)*s))			# calculate likelihood ratios
+  else {reconstruct.2x3table(OR1=OR[i,1],OR2=OR[i,2],p1=p[i,1],p2=p[i,2],d,s)}}   
+      LREE 	  <- ((cells2x3[1]/d*s)/(cells2x3[2]/(1-d)*s))			
       LREe	  <- ((cells2x3[3]/d*s)/(cells2x3[4]/(1-d)*s))
       LRee	  <- ((cells2x3[5]/d*s)/(cells2x3[6]/(1-d)*s))
-
+ 
  Gene <- if(p[i,2]==0){c(rep(0,((1-p[i,1]-p[i,2])*s)),rep(1,p[i,1]*s),rep(2,p[i,2]*s))}
-     else {c(rep(0,((1-p[i,1]*p[i,1]-2*p[i,1]*(1-p[i,1]))*s)),rep(1,2*p[i,1]*(1-p[i,1])*s),rep(2,p[i,1]*p[i,1]*s))}		# create vector of genotypes for all subjects based on hardy-weinberg distribution of alleles
-		Filler <- s-length(Gene)                               # sometimes Gene is one subject too short and then it does not work
-		Gene <- sample(c(Gene,rep(0,Filler)),s,rep=FALSE)
+ else {if(p[i,2]==1) {c(rep(0,(((1-p[i,1])^2)*s)),rep(1,2*p[i,1]*(1-p[i,1])*s),rep(2,p[i,1]*p[i,1]*s))}
+  else {c(rep(0,((1-p[i,1]-p[i,2])*s)),rep(1,p[i,1]*s),rep(2,p[i,2]*s))}}  
+		Filler <- s-length(Gene)                               
+		Gene <- sample(c(Gene,rep(0,Filler)),s,replace=FALSE)
     Data[,4+i] <- Gene
     GeneLR <- ifelse(Gene==0,LRee,ifelse(Gene==1,LREe,LREE))
-
+   
     Data[,1] <- Data[,1]+Gene
-    Data[,2] <- Data[,2]*GeneLR
-
+    Data[,2] <- Data[,2]*GeneLR	
+			
+#	 cat(i,"")
 		}
-
-		Data[,3] <- adjust.postp(pd=d, LR=Data[,2])
-		Data[,4]  <- ifelse(runif(s)<=(Data[,3]), 1, 0)
+		
+		Data[,3] <- adjust.postp(pd=d, LR=Data[,2])				
+		Data[,4]  <- ifelse(runif(s)<=(Data[,3]), 1, 0)  					          	
     Data <- as.data.frame(Data)
     Data
     }
- simulatedData <- func.data (p=ORfreq[,c(3,4)],d=poprisk,OR=ORfreq[,c(1,2)],s=popsize,g=nrow(ORfreq))
+  
+ simulatedData <- func.data  (p=ORfreq[,c(3,4)],d=poprisk,OR=ORfreq[,c(1,2)],s=popsize,g=nrow(ORfreq))   
 
-if (!missing(filename))
+
+if (!missing(filename)) 
 	{write.table( simulatedData,file=filename, row.names=TRUE,sep = "\t")  }
+
  return(simulatedData)
 }
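
The reclassification hunk above changes how the category codes are scaled before they are passed to improveProb. The following is a minimal sketch of why that matters when a risk category has zero observations. It uses base R only; the cutoffs, the risk values and the variable names (cutoff, risk1, risk2, old1, new1 and so on) are made up for illustration, and the call to cut() is an assumption about how the categories are formed, not a copy of the package code.

```r
# Hypothetical cutoffs and predicted risks, chosen so that the initial model
# places nobody in the highest risk category.
cutoff <- c(0, 0.1, 0.3, 1)
risk1 <- c(0.05, 0.08, 0.20, 0.25)
risk2 <- c(0.05, 0.20, 0.25, 0.50)

c1 <- cut(risk1, breaks = cutoff, include.lowest = TRUE, right = FALSE)
c2 <- cut(risk2, breaks = cutoff, include.lowest = TRUE, right = FALSE)

# rev 931 scaling: divide by the highest category actually observed
old1 <- as.numeric(c1) / max(as.numeric(c1))
old2 <- as.numeric(c2) / max(as.numeric(c2))

# rev 932 scaling (simplified): divide by the number of defined categories
new1 <- as.numeric(c1) / length(levels(c1))
new2 <- as.numeric(c2) / length(levels(c2))

# Individual 3 stays in the middle category under both models
cbind(old1, old2, new1, new2)
```

With the old scaling, individual 3 appears to move down (old1 is larger than old2) although the category did not change. With the new scaling both models give the same value for that individual.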

Modified: pkg/PredictABEL/man/predRisk.Rd
===================================================================
--- pkg/PredictABEL/man/predRisk.Rd	2012-07-20 07:32:48 UTC (rev 931)
+++ pkg/PredictABEL/man/predRisk.Rd	2012-07-23 16:53:04 UTC (rev 932)
@@ -2,7 +2,7 @@
 \alias{predRisk}
 \title{Function to compute predicted risks for all individuals in the dataset.}
 \usage{predRisk(riskModel, data, cID, filename)}
-\description{Function to compute predicted risks for all individuals in the dataset.}
+\description{Function to compute predicted risks for all individuals in the (new)dataset.}
 \details{The function computes predicted risks from a specified logistic regression model.   
 The function \code{\link{fitLogRegModel}} can be used to construct such a model.}
 \value{The function returns a vector of predicted risks.}
@@ -11,7 +11,7 @@
 \code{\link{plotROC}}, \code{\link{plotPriorPosteriorRisk}}}
 \arguments{\item{riskModel}{Name of logistic regression model that can be fitted using  
 the function \code{\link{fitLogRegModel}}.}
-\item{data}{Data frame or matrix that includes the outcome, ID number and 
+\item{data}{Data frame or matrix that includes the ID number and 
 predictor variables.}
 \item{cID}{Column number of ID variable. The ID number and predicted risks 
 will be saved under \code{filename}. When \code{cID} is not specified, the output is not saved.}
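
The predRisk change in this revision passes newdata=data to predict(). A minimal sketch of the difference, using base R glm() directly rather than fitLogRegModel; the data frames train and newdat and their columns are simulated for illustration and are not part of the package.

```r
# Simulated training data and a small new dataset (hypothetical values)
set.seed(1)
train <- data.frame(x = rnorm(200))
train$y <- rbinom(200, 1, plogis(-1 + 0.8 * train$x))
newdat <- data.frame(x = c(-2, 0, 2))

fit <- glm(y ~ x, data = train, family = binomial)

# Without newdata, predict() returns fitted risks for the training rows only
risk_train <- predict(fit, type = "response")

# With newdata, it returns one predicted risk per row of the new data
risk_new <- predict(fit, newdata = newdat, type = "response")

length(risk_train)
length(risk_new)
```

Note that the new data needs the predictor columns used by the model but not the outcome, which matches the updated description of the data argument.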

Modified: pkg/PredictABEL/man/reclassification.Rd
===================================================================
--- pkg/PredictABEL/man/reclassification.Rd	2012-07-20 07:32:48 UTC (rev 931)
+++ pkg/PredictABEL/man/reclassification.Rd	2012-07-23 16:53:04 UTC (rev 932)
@@ -3,14 +3,19 @@
 \title{Function for reclassification table and statistics.}
 \usage{reclassification(data, cOutcome, predrisk1, predrisk2, cutoff)}
 \description{The function creates a reclassification table and provides statistics.}
-\details{The function creates a reclassification table and computes the net        
-reclassification improvement (\code{NRI}) and integrated discrimination   
-improvement (\code{IDI}). A reclassification table indicates the number 
-of individuals who move to another risk category or remain in the same 
-risk category as a result of updating the risk model. NRI equal to \code{x\%}
-means that compared with individuals without outcome, 
+\details{The function creates a reclassification table and computes the 
+categorical and continuous net reclassification improvement (\code{NRI}) and 
+integrated discrimination improvement (\code{IDI}). A reclassification table 
+indicates the number of individuals who move to another risk category or remain 
+in the same risk category as a result of updating the risk model. Categorical \code{NRI} equal to 
+\code{x\%} means that compared with individuals without outcome, 
 individuals with outcome were almost \code{x\%} more likely to move up a category than down.
-IDI equal to \code{x\%} means that the difference in average 
+The function also computes continuous \code{NRI}, which does not require any discrete 
+risk categories and relies on the proportions of individuals with outcome 
+correctly assigned a higher probability and individuals without outcome 
+correctly assigned a lower probability by an updated model compared with the 
+initial model.
+\code{IDI} equal to \code{x\%} means that the difference in average 
 predicted risks between the individuals with and without the outcome  
 increased by \code{x\%} in the updated model.
 The function requires predicted risks estimated by using two separate risk 
@@ -19,7 +24,8 @@
 or be imported from other methods or packages.}
 \value{The function returns the reclassification table, separately  
 for individuals with and without the outcome of interest and the following measures: 
-\item{NRI}{Net Reclassification Improvement with 95\% CI and \code{p-value} of the test}
+\item{NRI (Categorical)}{Categorical Net Reclassification Improvement with 95\% CI and \code{p-value} of the test}
+\item{NRI (Continuous)}{Continuous Net Reclassification Improvement with 95\% CI and \code{p-value} of the test}
 \item{IDI}{Integrated Discrimination Improvement with 95\% CI and \code{p-value} 
 of the test}}
 \keyword{htest}
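
The continuous NRI described in the updated details section can be sketched from its definition as follows. This is an illustration in base R, not the package code: the diff shows that the package obtains the value from improveProb, and the helper name continuousNRI and the example vectors are hypothetical.

```r
# Continuous NRI from its definition (hypothetical helper, point estimate only):
# (P(up | outcome) - P(down | outcome)) + (P(down | no outcome) - P(up | no outcome))
continuousNRI <- function(risk1, risk2, outcome) {
  up   <- risk2 > risk1          # updated model assigns a higher risk
  down <- risk2 < risk1          # updated model assigns a lower risk
  ev   <- outcome == 1
  nri_events    <- mean(up[ev]) - mean(down[ev])
  nri_nonevents <- mean(down[!ev]) - mean(up[!ev])
  nri_events + nri_nonevents
}

# Made-up example: three individuals with and three without the outcome
outcome <- c(1, 1, 1, 0, 0, 0)
risk1 <- c(0.30, 0.40, 0.50, 0.30, 0.40, 0.50)
risk2 <- c(0.45, 0.55, 0.45, 0.20, 0.35, 0.60)

nri <- continuousNRI(risk1, risk2, outcome)
nri
```

No cutoff values are needed here, in contrast to the categorical NRI, which depends on the chosen risk categories.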

Modified: pkg/PredictABEL/man/simulatedDataset.Rd
===================================================================
--- pkg/PredictABEL/man/simulatedDataset.Rd	2012-07-20 07:32:48 UTC (rev 931)
+++ pkg/PredictABEL/man/simulatedDataset.Rd	2012-07-23 16:53:04 UTC (rev 932)
@@ -1,7 +1,7 @@
 \name{simulatedDataset}
 \alias{simulatedDataset}
-\title{Function to construct a simulated dataset containing individual  
-genotype data, estimated genetic risk and disease status.}
+\title{Function to construct a simulated dataset containing individual genotype  
+ data, genetic risks and disease status for a hypothetical population.}
 \usage{simulatedDataset(ORfreq, poprisk, popsize, filename)}
 \description{Construct a dataset that contains individual genotype data, genetic risk,   
 and disease status for a hypothetical population. 
@@ -76,20 +76,25 @@
 fact or fiction? Genet Med. 2006;8:395-400.
 
 
-Janssens AC, Moonesinghe R, Yang Q, Steyerberg EW, van Duijn CM, Khoury MJ. 
-The impact of genotype frequencies on the clinical validity of genomic 
-profiling for predicting common chronic diseases. Genet Med. 2007;9:528-35.
+Kundu S, Karssen LC, Janssens AC: Analytical and simulation methods for 
+estimating the potential predictive ability of genetic profiling: a comparison 
+of methods and results. Eur J Hum Genet. 2012 May 30.
 
 
+van Zitteren M, van der Net JB, Kundu S, Freedman AN, van Duijn CM, 
+Janssens AC. Genome-based prediction of breast cancer risk in the general 
+population: a modeling study based on meta-analyses of genetic associations. 
+Cancer Epidemiol Biomarkers Prev. 2011;20:9-22.
+
+
 van der Net JB, Janssens AC, Sijbrands EJ, Steyerberg EW. Value of genetic 
 profiling for the prediction of coronary heart disease. 
 Am Heart J. 2009;158:105-10.
 
 
-van Zitteren M, van der Net JB, Kundu S, Freedman AN, van Duijn CM, 
-Janssens AC. Genome-based prediction of breast cancer risk in the general 
-population: a modeling study based on meta-analyses of genetic associations. 
-Cancer Epidemiol Biomarkers Prev. 2011;20:9-22.}
+Janssens AC, Moonesinghe R, Yang Q, Steyerberg EW, van Duijn CM, Khoury MJ. 
+The impact of genotype frequencies on the clinical validity of genomic 
+profiling for predicting common chronic diseases. Genet Med. 2007;9:528-35.}
 \arguments{\item{ORfreq}{Matrix with ORs and frequencies of the genetic variants. 
 The matrix contains four columns in which the first two describe ORs and the 
 last two describe the corresponding frequencies. The number of rows in this