[Genabel-commits] r796 - in pkg/PredictABEL: . R man

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Wed Oct 19 11:25:29 CEST 2011


Author: lckarssen
Date: 2011-10-19 11:25:26 +0200 (Wed, 19 Oct 2011)
New Revision: 796

Modified:
   pkg/PredictABEL/DESCRIPTION
   pkg/PredictABEL/R/PredictABEL.R
   pkg/PredictABEL/man/PredictABEL-package.Rd
   pkg/PredictABEL/man/plotCalibration.Rd
   pkg/PredictABEL/man/plotDiscriminationBox.Rd
   pkg/PredictABEL/man/plotPriorPosteriorRisk.Rd
   pkg/PredictABEL/man/plotROC.Rd
   pkg/PredictABEL/man/plotRiskDistribution.Rd
   pkg/PredictABEL/man/predRisk.Rd
   pkg/PredictABEL/man/reclassification.Rd
   pkg/PredictABEL/man/simulatedDataset.Rd
Log:
Upload of Suman's latest changes to PredictABEL. Fine-tuning of the documentation. This version will be sent to CRAN as version 1.2. 



Modified: pkg/PredictABEL/DESCRIPTION
===================================================================
--- pkg/PredictABEL/DESCRIPTION	2011-10-18 22:15:20 UTC (rev 795)
+++ pkg/PredictABEL/DESCRIPTION	2011-10-19 09:25:26 UTC (rev 796)
@@ -1,23 +1,27 @@
-Package: PredictABEL
-Title: Assessment of risk prediction models
-Version: 1.2
-Date: 2011-09-30
-Author: Suman Kundu, Yurii S. Aulchenko, A. Cecile J.W. Janssens
-Maintainer: Suman Kundu <s.kundu at erasmusmc.nl>, A. Cecile J.W. Janssens <a.janssens at erasmusmc.nl>
-Depends: R (>= 2.12.0), Hmisc, ROCR, epitools, PBSmodelling
-Suggests: GenABEL
-Description: PredictABEL includes functions to assess the performance of
- risk models. The package contains functions for the various measures that are
- used in empirical studies, including univariate and multivariate odds ratios
- (OR) of the predictors, the c-statistic (or area under the receiver operating
- characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test,
- reclassification table, net reclassification improvement (NRI) and
- integrated discrimination improvement (IDI). Also included are functions
- to create plots, such as risk distributions, ROC curves, calibration plot,
- discrimination box plot and predictiveness curves. In addition to functions
- to assess the performance of risk models, the package includes functions to
- obtain weighted and unweighted risk scores as well as predicted risks using
- logistic regression analysis. These logistic regression functions are
- specifically written for models that include genetic variables, but they
- can also be applied to models that are based on non-genetic risk factors only.
-License: GPL (>= 2)
+Package: PredictABEL
+Title: Assessment of risk prediction models
+Version: 1.2
+Date: 2011-10-19
+Author: Suman Kundu, Yurii S. Aulchenko, A. Cecile J.W. Janssens
+Maintainer: Suman Kundu <s.kundu at erasmusmc.nl>, A. Cecile J.W. Janssens <a.janssens at erasmusmc.nl>
+Depends: R (>= 2.12.0), Hmisc, ROCR, epitools, PBSmodelling
+Suggests: GenABEL
+Description: PredictABEL includes functions to assess the performance of
+ risk models. The package contains functions for the various measures that are
+ used in empirical studies, including univariate and multivariate odds ratios
+ (OR) of the predictors, the c-statistic (or area under the receiver operating
+ characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test,
+ reclassification table, net reclassification improvement (NRI) and
+ integrated discrimination improvement (IDI). Also included are functions
+ to create plots, such as risk distributions, ROC curves, calibration plot,
+ discrimination box plot and predictiveness curves. In addition to functions
+ to assess the performance of risk models, the package includes functions to
+ obtain weighted and unweighted risk scores as well as predicted risks using
+ logistic regression analysis. These logistic regression functions are
+ specifically written for models that include genetic variables, but they
+ can also be applied to models that are based on non-genetic risk factors only.
+ Finally, the package includes function to construct a simulated dataset that
+ contains individual genotype data, estimated genetic risk, and disease status,
+ used for the evaluation of genetic risk models.
+License: GPL (>= 2)
+Packaged: 2011-10-18 15:01:45 UTC; 488810

Modified: pkg/PredictABEL/R/PredictABEL.R
===================================================================
--- pkg/PredictABEL/R/PredictABEL.R	2011-10-18 22:15:20 UTC (rev 795)
+++ pkg/PredictABEL/R/PredictABEL.R	2011-10-19 09:25:26 UTC (rev 796)
@@ -95,11 +95,13 @@
 #' (4) \code{PBSmodelling}, is used to produce predictiveness curve.
 #'
 #' @note The current version of the package includes the basic measures
-#' and plots that are used in the assessment of (genetic) risk prediction models.
+#' and plots that are used in the assessment of (genetic) risk prediction models and the
+#' function to construct a simulated dataset that contains individual genotype
+#' data, estimated genetic risk and disease status, used for the evaluation of
+#' genetic risk models (see Janssens et al, Genet Med 2006).
 #' Planned extensions of the package include functions to construct risk
 #' models using Cox Proportional Hazards analysis for prospective data and
-#' functions to construct simulated data for the evaluation of
-#' genetic risk models (see Janssens et al, Genet Med 2006).
+#' assess the performance of risk models for time-to-event data.
 #'
 #'
 ##' @author  Suman Kundu
@@ -252,7 +254,8 @@
 #' will be saved under \code{filename}. When \code{cID} is not specified, the output is not saved.
 #' @param filename Name of the output file in which the ID number and
 #' estimated predicted risks will be saved. The file is saved in the working
-#' directory as a txt file. When no \code{filename} is specified, the output is not saved.
+#' directory as a txt file. Example: filename="name.txt". When no \code{filename}
+#' is specified, the output is not saved.
 #'
 #'
 #'
@@ -287,7 +290,7 @@
 #'  cGenPreds=cGenPred, cGenPredsCat=cGenPredCat)
 #'
 #'  # obtain predicted risks
-#'  predRisk <- predRisk(riskModel=riskmodel, filename="name.txt")
+#'  predRisk <- predRisk(riskModel=riskmodel)
 #'
 "predRisk" <-
 function(riskModel, data, cID, filename)
@@ -471,7 +474,7 @@
   n1 <- matrix (nrow=dim(p)[2], ncol=12)
 
   for (i in 1:dim(p)[2])
-         {
+	 {
 s<-table(p[,i],o)
 if (dim(s)[1]==1) {s <- rbind(s,c(0,0));s <- rbind(s,c(0,0))}
 if (dim(s)[1]==2) {s <- rbind(s,c(0,0))}
@@ -487,40 +490,40 @@
 d<-oddsratio.wald(c)$measure
 e<-oddsratio.wald(c)$data
 
-            m1[i,1]<-  colnames(p)[i]
-            m1[i,2]<-  ( b[1,2])
-            m1[i,3]<-  ( round((b[1,2]/b[4,2])*100 ,1))
-            m1[i,4]<-  ( b[2,2])
-            m1[i,5]<-  ( round((b[2,2]/b[4,2])*100 ,1))
-            m1[i,6]<-  ( b[3,2])
-            m1[i,7]<-  ( round((b[3,2]/b[4,2])*100 ,1))
-            m1[i,8]<-  ( b[1,1])
-            m1[i,9]<-  ( round((b[1,1]/b[4,1])*100 ,1))
-            m1[i,10]<- ( b[2,1])
-            m1[i,11]<- ( round((b[2,1]/b[4,1])*100 ,1))
-            m1[i,12]<- ( b[3,1])
-            m1[i,13]<- ( round((b[3,1]/b[4,1])*100 ,1))
-            m1[i,14]<- ( round(a[2,1] ,2))
-            m1[i,15]<- ( round(a[2,2] ,2))
-            m1[i,16]<- ( round(a[2,3] ,2))
-            m1[i,17]<- ( round(a[3,1] ,2))
-            m1[i,18]<- ( round(a[3,2] ,2))
-            m1[i,19]<- ( round(a[3,3] ,2))
+	    m1[i,1]<-  colnames(p)[i]
+	    m1[i,2]<-  ( b[1,2])
+	    m1[i,3]<-  ( round((b[1,2]/b[4,2])*100 ,1))
+	    m1[i,4]<-  ( b[2,2])
+	    m1[i,5]<-  ( round((b[2,2]/b[4,2])*100 ,1))
+	    m1[i,6]<-  ( b[3,2])
+	    m1[i,7]<-  ( round((b[3,2]/b[4,2])*100 ,1))
+	    m1[i,8]<-  ( b[1,1])
+	    m1[i,9]<-  ( round((b[1,1]/b[4,1])*100 ,1))
+	    m1[i,10]<- ( b[2,1])
+	    m1[i,11]<- ( round((b[2,1]/b[4,1])*100 ,1))
+	    m1[i,12]<- ( b[3,1])
+	    m1[i,13]<- ( round((b[3,1]/b[4,1])*100 ,1))
+	    m1[i,14]<- ( round(a[2,1] ,2))
+	    m1[i,15]<- ( round(a[2,2] ,2))
+	    m1[i,16]<- ( round(a[2,3] ,2))
+	    m1[i,17]<- ( round(a[3,1] ,2))
+	    m1[i,18]<- ( round(a[3,2] ,2))
+	    m1[i,19]<- ( round(a[3,3] ,2))
 
-            n1[i,1]<-  colnames(p)[i]
-            n1[i,2]<-  (e[1,2])
-            n1[i,3]<-  round((e[1,2]/e[3,2])*100,1)
-            n1[i,4]<-  e[2,2]
-            n1[i,5]<-  round((e[2,2]/e[3,2])*100,1)
-            n1[i,6]<-   e[1,1]
-            n1[i,7]<-   round((e[1,1]/e[3,1])*100,1)
-            n1[i,8]<-   e[2,1]
-            n1[i,9]<-   round((e[2,1]/e[3,1])*100,1)
-            n1[i,10]<-  round(d[2,1] ,2)
-            n1[i,11]<-  round(d[2,2] ,2)
-            n1[i,12]<-  round(d[2,3] ,2)
+	    n1[i,1]<-  colnames(p)[i]
+	    n1[i,2]<-  (e[1,2])
+	    n1[i,3]<-  round((e[1,2]/e[3,2])*100,1)
+	    n1[i,4]<-  e[2,2]
+	    n1[i,5]<-  round((e[2,2]/e[3,2])*100,1)
+	    n1[i,6]<-   e[1,1]
+	    n1[i,7]<-   round((e[1,1]/e[3,1])*100,1)
+	    n1[i,8]<-   e[2,1]
+	    n1[i,9]<-   round((e[2,1]/e[3,1])*100,1)
+	    n1[i,10]<-  round(d[2,1] ,2)
+	    n1[i,11]<-  round(d[2,2] ,2)
+	    n1[i,12]<-  round(d[2,3] ,2)
 
-        }
+	}
 
        m1 <- as.table(m1)
        dimnames(m1)[[1]] <- c(1:dim(p)[2])
@@ -529,7 +532,7 @@
        "CI-Low","CI-high")
 names(dimnames(m1)) <- c("", " Genotype frequencies for Cases & Controls, and OR(95% CI)")
      if (!missing(filenameGeno))
-                 write.table(m1,file=filenameGeno, row.names = FALSE,sep = "\t")
+		 write.table(m1,file=filenameGeno, row.names = FALSE,sep = "\t")
 
        n1 <- as.table(n1)
        dimnames(n1)[[1]] <- c(1:dim(p)[2])
@@ -540,7 +543,7 @@
        write.table(n1,file=filenameAllele, row.names = FALSE,sep = "\t")
 
        p<- list(Genotype =m1,  Allelic=n1)
-        return(p)
+	return(p)
   }
 #' Function to obtain multivariate odds ratios from a logistic regression model.
 #' The function estimates multivariate (adjusted) odds ratios (ORs) with
@@ -621,7 +624,7 @@
 round(Upper.CI ,4),"p-value"=round((sum.coef)[, 4],4))
 
 if (!missing(filename))
-        write.table(tab,file=filename, row.names=TRUE,sep = "\t")
+	write.table(tab,file=filename, row.names=TRUE,sep = "\t")
 
 B <- mean((riskModel$y) * (1-predict(riskModel, type="response"))^2 +
 (1-riskModel$y) * (predict(riskModel, type="response"))^2)
@@ -732,16 +735,15 @@
   abline(b,col = 8)
 
  if (missing(plottype)) {plottype<- "jpg"}
-        if (!missing(fileplot))
+ 	if (!missing(fileplot))
       savePlot(filename = fileplot,
-         type =plottype,
-         device = dev.cur(),
-         restoreConsole = TRUE)
+	 type =plottype,
+	 device = dev.cur())
   tab<- cbind(riskScore,predRisk)
   tab <- as.table(tab)
   dimnames(tab)[[2]] <- c("Risk Score", "Predicted risk ")
   if (!missing(filename))
-        write.table(tab,file=filename, row.names=TRUE,sep = "\t")
+	write.table(tab,file=filename, row.names=TRUE,sep = "\t")
   }
 #' Function to plot posterior risks against prior risks.
 #'
@@ -841,46 +843,45 @@
   risk2 <- posteriorrisk
 
     if( plotAll)
-        {
-        plot(risk1,risk2,xlab= xlabel, ylab=ylabel,main=plottitle,
-        cex.lab=1.2, cex.axis=1.1, las=1,pty='s',xlim=rangeaxis ,
-        ylim=rangeaxis,pch=20)
-        abline(a=0,b=1, lwd=1,col=8)
-               if (missing(plottype)) {plottype<- "jpg"}
-               if (!missing(fileplot))
-         savePlot(filename = fileplot,
-         type =plottype,
-         device = dev.cur(),
-         restoreConsole = TRUE)
+	{
+	plot(risk1,risk2,xlab= xlabel, ylab=ylabel,main=plottitle,
+	cex.lab=1.2, cex.axis=1.1, las=1,pty='s',xlim=rangeaxis ,
+	ylim=rangeaxis,pch=20)
+	abline(a=0,b=1, lwd=1,col=8)
+	       if (missing(plottype)) {plottype<- "jpg"}
+ 	       if (!missing(fileplot))
+	 savePlot(filename = fileplot,
+	 type =plottype,
+	 device = dev.cur()
+		  )
        }
        else
        {
-        op <- par(mfrow=c(1,2),pty="s" )
-        plot(risk1,risk2,xlab= xlabel, ylab=ylabel,
-        col = (1-(data[,cOutcome]))*1,cex.lab=1.2, cex.axis=1, las=1,pty='s',
-        xlim=rangeaxis , ylim=rangeaxis,pch="*")
-        abline(a=0,b=1, lwd=1,col=4)
-        title(labels[1],cex.main=1)
+	op <- par(mfrow=c(1,2),pty="s" )
+	plot(risk1,risk2,xlab= xlabel, ylab=ylabel,
+	col = (1-(data[,cOutcome]))*1,cex.lab=1.2, cex.axis=1, las=1,pty='s',
+	xlim=rangeaxis , ylim=rangeaxis,pch="*")
+	abline(a=0,b=1, lwd=1,col=4)
+	title(labels[1],cex.main=1)
 
-        plot(risk1,risk2, xlab= xlabel, ylab=ylabel,
-        col =(data[,cOutcome])*1, cex.lab=1.2, cex.axis=1, las=1,pty='s',
-        xlim=rangeaxis , ylim=rangeaxis,pch="*")
-        abline(a=0,b=1, lwd=1,col=4)
-        title(labels[2], cex.main=1)
-        par(op)
-        if (missing(plottype)) {plottype<- "jpg"}
-              if (!missing(fileplot))
-        savePlot(filename = fileplot,
-         type =plottype,
-         device = dev.cur(),
-         restoreConsole = TRUE)
+	plot(risk1,risk2, xlab= xlabel, ylab=ylabel,
+	col =(data[,cOutcome])*1, cex.lab=1.2, cex.axis=1, las=1,pty='s',
+	xlim=rangeaxis , ylim=rangeaxis,pch="*")
+	abline(a=0,b=1, lwd=1,col=4)
+	title(labels[2], cex.main=1)
+	par(op)
+	if (missing(plottype)) {plottype<- "jpg"}
+ 	      if (!missing(fileplot))
+	savePlot(filename = fileplot,
+	 type =plottype,
+	 device = dev.cur())
        }
 
-         tab<- cbind(risk1,risk2,data[,cOutcome])
-         tab <- as.table(tab)
+	 tab<- cbind(risk1,risk2,data[,cOutcome])
+	 tab <- as.table(tab)
        dimnames(tab)[[2]] <- c("Predicted risk 1","Predicted risk 2", "outcome")
-         if (!missing(filename))
-               write.table(tab,file=filename, row.names=TRUE,sep = "\t")
+	 if (!missing(filename))
+	       write.table(tab,file=filename, row.names=TRUE,sep = "\t")
   }
 #' Function for calibration plot and Hosmer-Lemeshow goodness of fit test.
 #' The function produces a calibration plot and provides Hosmer-Lemeshow
@@ -907,7 +908,7 @@
 #' @param ylabel  Label of y-axis. Specification of \code{ylabel} is optional. Default is "Observed risk".
 #' @param filename Name of the output file in which the calibration table is saved.
 #' The file is saved as a txt file in the working directory. When  no
-#' \code{filename} is specified, the output is not saved.
+#' \code{filename} is specified, the output is not saved. Example: filename="calibration.txt"
 #' @param fileplot Name of the file that contains the calibation plot.
 #'  The file is saved in the working directory in the format specified under \code{plottype}. Example:
 #' \code{fileplot="plotname"}. Note that the extension is not specified here.
@@ -955,7 +956,7 @@
 #'
 #' # compute calibration measures and produce calibration plot
 #'  plotCalibration(data=ExampleData, cOutcome=cOutcome, predRisk=predRisk,
-#'  groups=groups, rangeaxis=rangeaxis, filename="calibration.txt")
+#'  groups=groups, rangeaxis=rangeaxis)
 #'
 "plotCalibration" <-
 function( data, cOutcome, predRisk, groups, rangeaxis, plottitle,
@@ -969,8 +970,8 @@
  p=predRisk
  y=data[,cOutcome]
  if (length(unique(y))!=2) {
-                            stop(" The specified outcome is not a binary variable.\n")
-                            }
+			    stop(" The specified outcome is not a binary variable.\n")
+			    }
 else{
 
 matres	<-matrix(NA,nrow=groups,ncol=5)
@@ -993,14 +994,13 @@
 pval<- 1-pchisq(chisqr,df)
 lines(x =c(0,1),y=c(0,1))
  if (missing(plottype)) {plottype<- "jpg"}
-        if (!missing(fileplot))
+ 	if (!missing(fileplot))
       savePlot(filename = fileplot,
-         type =plottype,
-         device = dev.cur(),
-         restoreConsole = TRUE)
+	 type =plottype,
+	 device = dev.cur())
 
 if (!missing(filename))
-        write.table(matres,file=filename, row.names=TRUE,sep = "\t",dec=",")
+	write.table(matres,file=filename, row.names=TRUE,sep = "\t",dec=",")
   out <- list(Table_HLtest=matres,Chi_square = round(chisqr,3), df=df,
   p_value =round(pval,4))
   return(out)
@@ -1085,8 +1085,8 @@
   if (missing(ylabel)) {ylabel<- "Sensitivity"}
  if (class(predrisk) == "numeric") {predrisk<- cbind(predrisk)}
   a<-c(1:dim(predrisk)[2])
-        for(i in 1:dim(predrisk)[2])
-                {
+ 	for(i in 1:dim(predrisk)[2])
+		{
   rAllele <- rcorr.cens(predrisk[,i], data[,cOutcome], outx=FALSE)
   pred <- prediction(predrisk[,i], data[,cOutcome])
   perf <- performance(pred,"tpr","fpr")
@@ -1107,17 +1107,16 @@
  cat("AUC [95% CI] for the model",i, ": ", round(rAllele[1],3),
  "[", round(rAllele[1]-1.96/2*rAllele[3],3)," - ",
  round(rAllele[1]+1.96/2*rAllele[3],3), "] \n")
-        }
-          if (!missing(labels)){
-        legend( "bottomright",legend= labels, col=c(17:(16+dim(predrisk)[2])),
+  	}
+	  if (!missing(labels)){
+  	legend( "bottomright",legend= labels, col=c(17:(16+dim(predrisk)[2])),
     lty=c(1:(dim(predrisk)[2])),lwd =2,cex=1)
     }
  if (missing(plottype)) {plottype<- "jpg"}
-        if (!missing(fileplot))
+ 	if (!missing(fileplot))
       savePlot(filename = fileplot,
-         type =plottype,
-         device = dev.cur(),
-         restoreConsole = TRUE)
+	 type =plottype,
+	 device = dev.cur())
 
 }
 #' Function for predictiveness curve. The function creates a plot of cumulative percentage
@@ -1196,7 +1195,7 @@
 # if (missing(labels)) {labels <- c(1:dim(predrisk)[2])}
 a<-c(1:dim(predrisk)[2])
 for(i in 1:dim(predrisk)[2])
-                {
+		{
   x<-predrisk[,i]
    if (i==1)
    {
@@ -1235,17 +1234,16 @@
     mtext(ylab, side = 2, line = 2.5, cex = 1.2)
     invisible(data.frame(x = x, y = y))
    }
-        }
-                  if (!missing(labels)){
+  	}
+  		  if (!missing(labels)){
 legend("bottomright",legend= labels,  col = c(17:(16+dim(predrisk)[2])),
 lty=c(1:(dim(predrisk)[2])),lwd = 2,cex=1)
 }
  if (missing(plottype)) {plottype<- "jpg"}
-        if (!missing(fileplot))
+ 	if (!missing(fileplot))
       savePlot(filename = fileplot,
-         type =plottype,
-         device = dev.cur(),
-         restoreConsole = TRUE)
+	 type =plottype,
+	 device = dev.cur())
 
 
  }
@@ -1344,9 +1342,9 @@
    b<-round(seq(rangexaxis[1],rangexaxis[2],2*interval),2)
 
     for(i in 1:length(a))
-                {
-                if(any(a[i]==b)){a[i]<-a[i] }
-                else{a[i] <- ""}
+		{
+		if(any(a[i]==b)){a[i]<-a[i] }
+		else{a[i] <- ""}
     }
 
    dimnames(p)[[1]] <- a
@@ -1356,11 +1354,10 @@
     cex.lab = 1.2,cex.axis=1.1, las=1,space = c(0,.10))
 
   if (missing(plottype)) {plottype<- "jpg"}
-        if (!missing(fileplot))
+ 	if (!missing(fileplot))
       savePlot(filename = fileplot,
-         type =plottype,
-         device = dev.cur(),
-         restoreConsole = TRUE)
+	 type =plottype,
+	 device = dev.cur())
 
     }
 #' Function for reclassification table and statistics.
@@ -1376,7 +1373,7 @@
 #' IDI equal to \code{x\%} means that the difference in average
 #'   predicted risks between the individuals with and without the outcome
 #'   increased by \code{x\%} in the updated model.
-# The function requires predicted risks estimated by using two separate risk
+#' The function requires predicted risks estimated by using two separate risk
 #' models. Predicted risks can be obtained using the functions
 #' \code{\link{fitLogRegModel}} and \code{\link{predRisk}}
 #' or be imported from other methods or packages.
@@ -1384,9 +1381,9 @@
 #' @param data Data frame or matrix that includes the outcome and
 #' predictors variables.
 #' @param cOutcome  Column number of the outcome variable.
-#' @param predrisk1  Vector of predicted risks of all individuals using initial(old)
+#' @param predrisk1  Vector of predicted risks of all individuals using initial
 #' model.
-#' @param predrisk2  Vector of predicted risks of all individuals using updated(new)
+#' @param predrisk2  Vector of predicted risks of all individuals using updated
 #' model.
 #' @param cutoff  Cutoff values for risk categories.
 #' Define the cut-off values as \code{c(0,...,1)}.
@@ -1438,102 +1435,40 @@
 #'
 "reclassification" <-
 function(data,cOutcome,predrisk1,predrisk2, cutoff) {
+
+c1 <- cut(predrisk1,breaks = cutoff ,include.lowest=TRUE,right= FALSE)
+c2 <- cut(predrisk2,breaks = cutoff ,include.lowest=TRUE,right= FALSE)
+tabReclas <- table("Initial Model"=c1, "Updated Model"=c2)
 cat(" _________________________________________\n")
-  if (length(cutoff)==4) {
-                         cat(" cutoff values are",cutoff[2],",",cutoff[3],"\n" )
-                         label <- c("Low risk","Intermediate risk","High risk")
-                          }
-  else {
-         if (length(cutoff)==3) {
-                            cat(" cutoff value is",cutoff[2],"\n" )
-                            label <- c("Low risk","High risk")
-                            }
-         if (length(cutoff)==5) {
-cat(" cutoff values are",cutoff[2],",",cutoff[3],",",cutoff[4],"\n" )
-label <- c("Low risk","Intermediate-I risk","Intermediate-II risk","High risk")
-                          }
-         if((!length(cutoff)==3)&(!length(cutoff)==4)&(!length(cutoff)==5)) {
-stop(" \n 'cutoff' values number is either less than 1 or greater than 3" )
-             }
-       }
+cat(" \n     Reclassification table    \n")
 cat(" _________________________________________\n")
-c1 <- cut(predrisk1,breaks = cutoff, labels = label ,include.lowest=TRUE,right= FALSE)
-c2 <- cut(predrisk2,breaks = cutoff, labels = label ,include.lowest=TRUE,right= FALSE)
-tabReclas <- table(OldModel=c1, NewModel=c2)
-cat(" \n **    Reclassification table    **\n")
 
  ta<- table(c1, c2, data[,cOutcome])
- dimnames(ta)[[3]] <- c("Outcome-absent","Outcome-present")
- names(dimnames(ta)) <- c("Predicted Risks-1", "Predicted Risks-2")
 
-if(length(cutoff)==5) {
-                            cat ("\n \n Outcome: absent \n " )
-tab1 <- cbind(ta[,,1], ReclassifiedPercentage=round(c(ta[1,2,1]+
-                            ta[1,3,1]+ta[1,4,1],ta[2,1,1]+ta[2,3,1]+ta[2,4,1],
-                            ta[3,1,1]+ta[3,2,1]+ta[3,4,1],ta[4,1,1]+ta[4,2,1]
-                            +ta[4,3,1])/margin.table(ta[,,1], 1),2))
-                          names(dimnames(tab1)) <- c("Old model", " New model")
-                            print(tab1)
-                            cat ("\n \n Outcome: present \n " )
-tab2 <- cbind(ta[,,2], ReclassifiedPercentage=round(c(ta[1,2,2]+
-                            ta[1,3,2]+ta[1,4,2],ta[2,1,2]+ta[2,3,2]+ta[1,4,2],
-                            ta[3,1,2]+ta[3,2,2]+ta[3,4,2],ta[4,1,2]+ta[4,2,2]
-                            +ta[4,3,2])/margin.table(ta[,,2], 1),2))
-                           names(dimnames(tab2)) <- c("Old model", " New model")
-                            print(tab2)
-                            cat ("\n \n Total sample \n " )
-                            tab<- cbind(tabReclas, ReclassifiedPercentage=
-                           round(c(tabReclas[1,2]+tabReclas[1,3]+tabReclas[1,4],
-                            tabReclas[2,1]+tabReclas[2,3]+tabReclas[2,4],
-                            tabReclas[3,1]+tabReclas[3,2]+tabReclas[3,4],
-tabReclas[4,1]+tabReclas[4,2]+tabReclas[4,3])/margin.table(tabReclas, 1),2))
-                            names(dimnames(tab)) <- c("Old model", " New model")
-                            print(tab)
-                          }
+  cat ("\n Outcome: absent \n  \n" )
+  TabAbs <- ta[,,1]
+  tab1 <- cbind(TabAbs, " % reclassified"= round((rowSums(TabAbs)-diag(TabAbs))/rowSums(TabAbs),2)*100)
+  names(dimnames(tab1)) <- c("Initial Model", "Updated Model")
+  print(tab1)
 
-if(length(cutoff)==4) {
-                            cat ("\n \n Outcome: absent \n " )
-tab1 <- cbind(ta[,,1], ReclassifiedPercentage=round(c(ta[1,2,1]+
-ta[1,3,1],ta[2,1,1]+ta[2,3,1],ta[3,1,1]+ta[3,2,1])/margin.table(ta[,,1], 1),2))
-                          names(dimnames(tab1)) <- c("Old model", " New model")
-                            print(tab1)
-                            cat ("\n \n Outcome: present \n " )
-tab2 <- cbind(ta[,,2], ReclassifiedPercentage=round(c(ta[1,2,2]+
-ta[1,3,2],ta[2,1,2]+ta[2,3,2],ta[3,1,2]+ta[3,2,2])/margin.table(ta[,,2], 1),2))
-                          names(dimnames(tab2)) <- c("Old model", " New model")
-                            print(tab2)
-                            cat ("\n \n Total sample \n " )
-                            tab<- cbind(tabReclas, ReclassifiedPercentage=
-                          round(c(tabReclas[1,2]+tabReclas[1,3],tabReclas[2,1]+
-tabReclas[2,3],tabReclas[3,1]+tabReclas[3,2])/margin.table(tabReclas, 1),2))
-                          names(dimnames(tab)) <- c("Old model", " New model")
-                            print(tab)
-                          }
+  cat ("\n \n Outcome: present \n  \n" )
+  TabPre <- ta[,,2]
+  tab2 <- cbind(TabPre, " % reclassified"= round((rowSums(TabPre)-diag(TabPre))/rowSums(TabPre),2)*100)
+  names(dimnames(tab2)) <- c("Initial Model", "Updated Model")
+  print(tab2)
 
-if(length(cutoff)==3) {
-                            cat ("\n \n Outcome: absent \n" )
-                            tab1 <- cbind(ta[,,1], ReclassifiedPercentage=
-round(c(ta[1,2,1],ta[2,1,1])/margin.table(ta[,,1], 1),2))
-                          names(dimnames(tab1)) <- c("Old model", " New model")
-                            print(tab1)
-                            cat ("\n \n Outcome: present \n " )
-                            tab2 <- cbind(ta[,,2], ReclassifiedPercentage=
-round(c(ta[1,2,2],ta[2,1,2])/margin.table(ta[,,2], 1),2))
-names(dimnames(tab2)) <- c("Old model", " New model")
-                            print(tab2)
-                            cat ("\n \n Total sample \n " )
-                            tab<- cbind(tabReclas, ReclassifiedPercentage=
-round(c(tabReclas[1,2],tabReclas[2,1])/margin.table(tabReclas, 1),2))
-names(dimnames(tab)) <- c("Old model", " New model")
-                            print(tab)
-                          }
+  cat ("\n \n Combined Data \n  \n" )
+  Tab <- tabReclas
+  tab <- cbind(Tab, " % reclassified"= round((rowSums(Tab)-diag(Tab))/rowSums(Tab),2)*100)
+  names(dimnames(tab)) <- c("Initial Model", "Updated Model")
+  print(tab)
 
 cat(" _________________________________________\n")
 
-
   x<-improveProb(x1=as.numeric(c1)*(1/max(as.numeric(c1))),
   x2=as.numeric(c2)*(1/max(as.numeric(c2))), y=data[,cOutcome])
   y<-improveProb(x1=predrisk1, x2=predrisk2, y=data[,cOutcome])
+
 cat("\n NRI [95% CI]:", round(x$nri,4),"[",round(x$nri-1.96*x$se.nri,4),"-",
  round(x$nri+1.96*x$se.nri,4), "]", "; p-value:", round(2*pnorm(-abs(x$z.nri)),5), "\n" )
 
@@ -1623,8 +1558,8 @@
  mean(risk[data[,cOutcome]==0]),3))
 
  if (missing(plottype)) {plottype<- "jpg"}
-        if (!missing(fileplot))
-      savePlot(filename = fileplot,type =plottype,device = dev.cur(),restoreConsole = TRUE)
+ 	if (!missing(fileplot))
+      savePlot(filename = fileplot,type =plottype,device = dev.cur())
 return(p)
 }
 #' An example code to construct a risk model using logistic regression analysis.
@@ -1686,4 +1621,251 @@
   out<- list(riskModel1=riskmodel1, riskModel2=riskmodel2)
   return(out)
   }
-################################################################
+#' Function to construct a simulated dataset containing individual genotype
+#' data, estimated genetic risk and disease status.
+#' Construct a dataset that contains individual genotype data, genetic risk,
+#' and disease status for a hypothetical population.
+#' The dataset is constructed using simulation in such a way that the frequencies
+#' and odds ratios (ORs) of the genetic variants and the population disease risk
+#' computed from this dataset are the same as specified by the input parameters.
+#'
+#' The function will execute when the matrix with odds ratios and frequencies,
+#' population disease risk and the number of individuals are specified. \cr
+#'
+#' The simulation method is described in detail in the references. \cr
+#'
+#'
+#' The method assumes that (i) the combined effect of the genetic variants
+#' on disease risk follows a multiplicative (log additive) risk model;
+#' (ii) genetic variants inherit independently, that is no linkage disequilibrium
+#' between the variants; (iii) genetic variants have independent effects on the
+#' disease risk, which indicates no interaction among variants; and (iv) all
+#' genotypes and allele proportions are in Hardy-Weinberg equilibrium.
+#' Assumption (ii) and (iv) are used to generate the genotype data, and assumption
+#'(i), (ii) and (iii) are used to calculate disease risk.
+#'
+#'
+#' Simulating the dataset involves three steps: (1) modelling genotype data,
+#' (2) modelling disease risks, and (3) modelling disease status. Brief
+#' descriptions of these steps are as follows:
+#'
+#'
+#' (1) Modelling genotype data: For each variant the genotype
+#' frequencies are either specified or calculated from the allele frequencies
+#' using Hardy-Weinberg equilibrium. Then, the genotypes for each genetic
+#' variant are randomly distributed without replacement over all individuals.
+#'
+#'
+#' (2) Modelling disease risks: For the calculation of the individual disease
+#' risk, Bayes' theorem is used, which states that the posterior odds of disease
+#' are obtained by multiplying the prior odds by the likelihood ratio (LR) of
+#' the individual genotype data. The prior odds are calculated from the
+#' population disease risk or disease prevalence
+#' (prior odds= prior risk/ (1- prior risk)) and the posterior odds are converted
+#' back into disease risk (disease risk= posterior odds/ (1+ posterior odds)).
+#' Under the no linkage disequilibrium (LD) assumption, the LR of a genetic profile
+#' is obtained by multiplying the LRs of the single genotypes that are included in
+#' the risk model. The LR of a single genotype is calculated using frequencies
+#' and ORs of genetic variants and population disease risk. See references
+#' for more details.
+#'
+#'
+#' (3) Modelling disease status: To model disease status, we used a procedure
+#' that compares the estimated disease risk of each subject to a randomly drawn value
+#' between 0 and 1 from a uniform distribution. A subject is assigned to the
+#' group who will develop the disease when the disease risk is higher than the
+#' random value and to the group who will not develop the disease when the risk
+#' is lower than the random value.
+#'
+#' This procedure ensures that for each genomic profile, the percentage of
+#' people who will develop the disease equals the population disease risk
+#' associated with that profile, when the subgroup of individuals with that
+#' profile is sufficiently large.
+#'
+#'
+#' @param ORfreq Matrix with ORs and frequencies of the genetic variants.
+#' The matrix contains four columns in which the first two describe ORs and the
+#' last two describe the corresponding frequencies. The number of rows in this
+#' matrix is same as the number of genetic variants included. Genetic variants
+#' can be specified as per genotype, per allele, or as dominant/ recessive
+#' effect of the risk allele. When per genotype data are used, OR of the
+#' heterozygous and homozygous risk genotypes are mentioned in the first two
+#' columns and the corresponding genotype frequencies are mentioned in the last
+#' two columns. When per allele data are used, the OR and frequency of the risk
+#' allele are specified in the first and third column and the remaining two cells
+#' are coded as '1'.  Similarly, when dominant/ recessive effects of the risk
+#' alleles are used, the
+#' OR and frequency of the dominant/ recessive variant are specified in the first
+#' and third column, and the remaining two cells are coded as '0'.
+#' @param poprisk Population disease risk (expressed in proportion).
+#' @param popsize  Total number of individuals included in the dataset.
+#' @param filename Name of the file in which the dataset will be saved.
+#' The file is saved in the working directory as a txt file. When no filename
+#' is specified, the output is not saved.
+#'
+#'  @return
+#'   The function returns:
+#'   \item{Dataset}{A data frame or matrix that includes genotype data,
+#' genetic risk and disease status for a hypothetical population.
+#' The dataset contains (4 + number of genetic variants included) columns,
+#' in which the first column is the un-weighted risk score, which is the sum
+#' of the number of risk alleles for each individual, the third column is the
+#' estimated genetic risk, the forth column is the individual disease status expressed
+#' as '0' or '1', indicating without or with the outcome of interest, and the fifth until
+#' the end column are genotype data for the variants expressed as '0', '1' or '2',
+#' which indicate the number of risk alleles present in each individual for the genetic variants.}
+#'
+#'
+#' @keywords models
+#'
+#'
+#' @references Janssens AC, Aulchenko YS, Elefante S, Borsboom GJ, Steyerberg EW,
+#' van Duijn CM. Predictive testing for complex diseases using multiple genes:
+#' fact or fiction? Genet Med. 2006;8:395-400.
+#'
+#'
+#' Janssens AC, Moonesinghe R, Yang Q, Steyerberg EW, van Duijn CM, Khoury MJ.
+#' The impact of genotype frequencies on the clinical validity of genomic
+#' profiling for predicting common chronic diseases. Genet Med. 2007;9:528-35.
+#'
+#'
+#' van der Net JB, Janssens AC, Sijbrands EJ, Steyerberg EW. Value of genetic
+#' profiling for the prediction of coronary heart disease.
+#' Am Heart J. 2009;158:105-10.
+#'
+#'
+#' van Zitteren M, van der Net JB, Kundu S, Freedman AN, van Duijn CM,
+#' Janssens AC. Genome-based prediction of breast cancer risk in the general
+#' population: a modeling study based on meta-analyses of genetic associations.
+#' Cancer Epidemiol Biomarkers Prev. 2011;20:9-22.
+#'
+#'
+#' @examples
+#' # specify the matrix containing the ORs and frequencies of genetic variants
+#' # In this example we used per allele effects of the risk variants
+#' ORfreq<-cbind(c(1.35,1.20,1.24,1.16), rep(1,4), c(.41,.29,.28,.51),rep(1,4))
+#'
+#' # specify the population disease risk
+#'  popRisk <- 0.3
+#' # specify size of hypothetical population
+#'  popSize <- 10000
+#'
+#' # Obtain the simulated dataset
+#' Data <- simulatedDataset(ORfreq=ORfreq, poprisk=popRisk, popsize=popSize)
+#'
+#' # Obtain the AUC and produce ROC curve
+#' plotROC(data=Data, cOutcome=4, predrisk=Data[,3])
+#'
+"simulatedDataset" <- function(ORfreq, poprisk, popsize, filename)
+{
+if (missing(poprisk)) {stop("Population disease risk is not specified")}
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/genabel -r 796


More information about the Genabel-commits mailing list