[Genabel-commits] r796 - in pkg/PredictABEL: . R man
noreply at r-forge.r-project.org
noreply at r-forge.r-project.org
Wed Oct 19 11:25:29 CEST 2011
Author: lckarssen
Date: 2011-10-19 11:25:26 +0200 (Wed, 19 Oct 2011)
New Revision: 796
Modified:
pkg/PredictABEL/DESCRIPTION
pkg/PredictABEL/R/PredictABEL.R
pkg/PredictABEL/man/PredictABEL-package.Rd
pkg/PredictABEL/man/plotCalibration.Rd
pkg/PredictABEL/man/plotDiscriminationBox.Rd
pkg/PredictABEL/man/plotPriorPosteriorRisk.Rd
pkg/PredictABEL/man/plotROC.Rd
pkg/PredictABEL/man/plotRiskDistribution.Rd
pkg/PredictABEL/man/predRisk.Rd
pkg/PredictABEL/man/reclassification.Rd
pkg/PredictABEL/man/simulatedDataset.Rd
Log:
Upload of Suman's latest changes to PredictABEL. Fine-tuning of the documentation. This version will be sent to CRAN as version 1.2.
Modified: pkg/PredictABEL/DESCRIPTION
===================================================================
--- pkg/PredictABEL/DESCRIPTION 2011-10-18 22:15:20 UTC (rev 795)
+++ pkg/PredictABEL/DESCRIPTION 2011-10-19 09:25:26 UTC (rev 796)
@@ -1,23 +1,27 @@
-Package: PredictABEL
-Title: Assessment of risk prediction models
-Version: 1.2
-Date: 2011-09-30
-Author: Suman Kundu, Yurii S. Aulchenko, A. Cecile J.W. Janssens
-Maintainer: Suman Kundu <s.kundu at erasmusmc.nl>, A. Cecile J.W. Janssens <a.janssens at erasmusmc.nl>
-Depends: R (>= 2.12.0), Hmisc, ROCR, epitools, PBSmodelling
-Suggests: GenABEL
-Description: PredictABEL includes functions to assess the performance of
- risk models. The package contains functions for the various measures that are
- used in empirical studies, including univariate and multivariate odds ratios
- (OR) of the predictors, the c-statistic (or area under the receiver operating
- characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test,
- reclassification table, net reclassification improvement (NRI) and
- integrated discrimination improvement (IDI). Also included are functions
- to create plots, such as risk distributions, ROC curves, calibration plot,
- discrimination box plot and predictiveness curves. In addition to functions
- to assess the performance of risk models, the package includes functions to
- obtain weighted and unweighted risk scores as well as predicted risks using
- logistic regression analysis. These logistic regression functions are
- specifically written for models that include genetic variables, but they
- can also be applied to models that are based on non-genetic risk factors only.
-License: GPL (>= 2)
+Package: PredictABEL
+Title: Assessment of risk prediction models
+Version: 1.2
+Date: 2011-10-19
+Author: Suman Kundu, Yurii S. Aulchenko, A. Cecile J.W. Janssens
+Maintainer: Suman Kundu <s.kundu at erasmusmc.nl>, A. Cecile J.W. Janssens <a.janssens at erasmusmc.nl>
+Depends: R (>= 2.12.0), Hmisc, ROCR, epitools, PBSmodelling
+Suggests: GenABEL
+Description: PredictABEL includes functions to assess the performance of
+ risk models. The package contains functions for the various measures that are
+ used in empirical studies, including univariate and multivariate odds ratios
+ (OR) of the predictors, the c-statistic (or area under the receiver operating
+ characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test,
+ reclassification table, net reclassification improvement (NRI) and
+ integrated discrimination improvement (IDI). Also included are functions
+ to create plots, such as risk distributions, ROC curves, calibration plot,
+ discrimination box plot and predictiveness curves. In addition to functions
+ to assess the performance of risk models, the package includes functions to
+ obtain weighted and unweighted risk scores as well as predicted risks using
+ logistic regression analysis. These logistic regression functions are
+ specifically written for models that include genetic variables, but they
+ can also be applied to models that are based on non-genetic risk factors only.
+ Finally, the package includes function to construct a simulated dataset that
+ contains individual genotype data, estimated genetic risk, and disease status,
+ used for the evaluation of genetic risk models.
+License: GPL (>= 2)
+Packaged: 2011-10-18 15:01:45 UTC; 488810
Modified: pkg/PredictABEL/R/PredictABEL.R
===================================================================
--- pkg/PredictABEL/R/PredictABEL.R 2011-10-18 22:15:20 UTC (rev 795)
+++ pkg/PredictABEL/R/PredictABEL.R 2011-10-19 09:25:26 UTC (rev 796)
@@ -95,11 +95,13 @@
#' (4) \code{PBSmodelling}, is used to produce predictiveness curve.
#'
#' @note The current version of the package includes the basic measures
-#' and plots that are used in the assessment of (genetic) risk prediction models.
+#' and plots that are used in the assessment of (genetic) risk prediction models and the
+#' function to construct a simulated dataset that contains individual genotype
+#' data, estimated genetic risk and disease status, used for the evaluation of
+#' genetic risk models (see Janssens et al, Genet Med 2006).
#' Planned extensions of the package include functions to construct risk
#' models using Cox Proportional Hazards analysis for prospective data and
-#' functions to construct simulated data for the evaluation of
-#' genetic risk models (see Janssens et al, Genet Med 2006).
+#' assess the performance of risk models for time-to-event data.
#'
#'
##' @author Suman Kundu
@@ -252,7 +254,8 @@
#' will be saved under \code{filename}. When \code{cID} is not specified, the output is not saved.
#' @param filename Name of the output file in which the ID number and
#' estimated predicted risks will be saved. The file is saved in the working
-#' directory as a txt file. When no \code{filename} is specified, the output is not saved.
+#' directory as a txt file. Example: filename="name.txt". When no \code{filename}
+#' is specified, the output is not saved.
#'
#'
#'
@@ -287,7 +290,7 @@
#' cGenPreds=cGenPred, cGenPredsCat=cGenPredCat)
#'
#' # obtain predicted risks
-#' predRisk <- predRisk(riskModel=riskmodel, filename="name.txt")
+#' predRisk <- predRisk(riskModel=riskmodel)
#'
"predRisk" <-
function(riskModel, data, cID, filename)
@@ -471,7 +474,7 @@
n1 <- matrix (nrow=dim(p)[2], ncol=12)
for (i in 1:dim(p)[2])
- {
+ {
s<-table(p[,i],o)
if (dim(s)[1]==1) {s <- rbind(s,c(0,0));s <- rbind(s,c(0,0))}
if (dim(s)[1]==2) {s <- rbind(s,c(0,0))}
@@ -487,40 +490,40 @@
d<-oddsratio.wald(c)$measure
e<-oddsratio.wald(c)$data
- m1[i,1]<- colnames(p)[i]
- m1[i,2]<- ( b[1,2])
- m1[i,3]<- ( round((b[1,2]/b[4,2])*100 ,1))
- m1[i,4]<- ( b[2,2])
- m1[i,5]<- ( round((b[2,2]/b[4,2])*100 ,1))
- m1[i,6]<- ( b[3,2])
- m1[i,7]<- ( round((b[3,2]/b[4,2])*100 ,1))
- m1[i,8]<- ( b[1,1])
- m1[i,9]<- ( round((b[1,1]/b[4,1])*100 ,1))
- m1[i,10]<- ( b[2,1])
- m1[i,11]<- ( round((b[2,1]/b[4,1])*100 ,1))
- m1[i,12]<- ( b[3,1])
- m1[i,13]<- ( round((b[3,1]/b[4,1])*100 ,1))
- m1[i,14]<- ( round(a[2,1] ,2))
- m1[i,15]<- ( round(a[2,2] ,2))
- m1[i,16]<- ( round(a[2,3] ,2))
- m1[i,17]<- ( round(a[3,1] ,2))
- m1[i,18]<- ( round(a[3,2] ,2))
- m1[i,19]<- ( round(a[3,3] ,2))
+ m1[i,1]<- colnames(p)[i]
+ m1[i,2]<- ( b[1,2])
+ m1[i,3]<- ( round((b[1,2]/b[4,2])*100 ,1))
+ m1[i,4]<- ( b[2,2])
+ m1[i,5]<- ( round((b[2,2]/b[4,2])*100 ,1))
+ m1[i,6]<- ( b[3,2])
+ m1[i,7]<- ( round((b[3,2]/b[4,2])*100 ,1))
+ m1[i,8]<- ( b[1,1])
+ m1[i,9]<- ( round((b[1,1]/b[4,1])*100 ,1))
+ m1[i,10]<- ( b[2,1])
+ m1[i,11]<- ( round((b[2,1]/b[4,1])*100 ,1))
+ m1[i,12]<- ( b[3,1])
+ m1[i,13]<- ( round((b[3,1]/b[4,1])*100 ,1))
+ m1[i,14]<- ( round(a[2,1] ,2))
+ m1[i,15]<- ( round(a[2,2] ,2))
+ m1[i,16]<- ( round(a[2,3] ,2))
+ m1[i,17]<- ( round(a[3,1] ,2))
+ m1[i,18]<- ( round(a[3,2] ,2))
+ m1[i,19]<- ( round(a[3,3] ,2))
- n1[i,1]<- colnames(p)[i]
- n1[i,2]<- (e[1,2])
- n1[i,3]<- round((e[1,2]/e[3,2])*100,1)
- n1[i,4]<- e[2,2]
- n1[i,5]<- round((e[2,2]/e[3,2])*100,1)
- n1[i,6]<- e[1,1]
- n1[i,7]<- round((e[1,1]/e[3,1])*100,1)
- n1[i,8]<- e[2,1]
- n1[i,9]<- round((e[2,1]/e[3,1])*100,1)
- n1[i,10]<- round(d[2,1] ,2)
- n1[i,11]<- round(d[2,2] ,2)
- n1[i,12]<- round(d[2,3] ,2)
+ n1[i,1]<- colnames(p)[i]
+ n1[i,2]<- (e[1,2])
+ n1[i,3]<- round((e[1,2]/e[3,2])*100,1)
+ n1[i,4]<- e[2,2]
+ n1[i,5]<- round((e[2,2]/e[3,2])*100,1)
+ n1[i,6]<- e[1,1]
+ n1[i,7]<- round((e[1,1]/e[3,1])*100,1)
+ n1[i,8]<- e[2,1]
+ n1[i,9]<- round((e[2,1]/e[3,1])*100,1)
+ n1[i,10]<- round(d[2,1] ,2)
+ n1[i,11]<- round(d[2,2] ,2)
+ n1[i,12]<- round(d[2,3] ,2)
- }
+ }
m1 <- as.table(m1)
dimnames(m1)[[1]] <- c(1:dim(p)[2])
@@ -529,7 +532,7 @@
"CI-Low","CI-high")
names(dimnames(m1)) <- c("", " Genotype frequencies for Cases & Controls, and OR(95% CI)")
if (!missing(filenameGeno))
- write.table(m1,file=filenameGeno, row.names = FALSE,sep = "\t")
+ write.table(m1,file=filenameGeno, row.names = FALSE,sep = "\t")
n1 <- as.table(n1)
dimnames(n1)[[1]] <- c(1:dim(p)[2])
@@ -540,7 +543,7 @@
write.table(n1,file=filenameAllele, row.names = FALSE,sep = "\t")
p<- list(Genotype =m1, Allelic=n1)
- return(p)
+ return(p)
}
#' Function to obtain multivariate odds ratios from a logistic regression model.
#' The function estimates multivariate (adjusted) odds ratios (ORs) with
@@ -621,7 +624,7 @@
round(Upper.CI ,4),"p-value"=round((sum.coef)[, 4],4))
if (!missing(filename))
- write.table(tab,file=filename, row.names=TRUE,sep = "\t")
+ write.table(tab,file=filename, row.names=TRUE,sep = "\t")
B <- mean((riskModel$y) * (1-predict(riskModel, type="response"))^2 +
(1-riskModel$y) * (predict(riskModel, type="response"))^2)
@@ -732,16 +735,15 @@
abline(b,col = 8)
if (missing(plottype)) {plottype<- "jpg"}
- if (!missing(fileplot))
+ if (!missing(fileplot))
savePlot(filename = fileplot,
- type =plottype,
- device = dev.cur(),
- restoreConsole = TRUE)
+ type =plottype,
+ device = dev.cur())
tab<- cbind(riskScore,predRisk)
tab <- as.table(tab)
dimnames(tab)[[2]] <- c("Risk Score", "Predicted risk ")
if (!missing(filename))
- write.table(tab,file=filename, row.names=TRUE,sep = "\t")
+ write.table(tab,file=filename, row.names=TRUE,sep = "\t")
}
#' Function to plot posterior risks against prior risks.
#'
@@ -841,46 +843,45 @@
risk2 <- posteriorrisk
if( plotAll)
- {
- plot(risk1,risk2,xlab= xlabel, ylab=ylabel,main=plottitle,
- cex.lab=1.2, cex.axis=1.1, las=1,pty='s',xlim=rangeaxis ,
- ylim=rangeaxis,pch=20)
- abline(a=0,b=1, lwd=1,col=8)
- if (missing(plottype)) {plottype<- "jpg"}
- if (!missing(fileplot))
- savePlot(filename = fileplot,
- type =plottype,
- device = dev.cur(),
- restoreConsole = TRUE)
+ {
+ plot(risk1,risk2,xlab= xlabel, ylab=ylabel,main=plottitle,
+ cex.lab=1.2, cex.axis=1.1, las=1,pty='s',xlim=rangeaxis ,
+ ylim=rangeaxis,pch=20)
+ abline(a=0,b=1, lwd=1,col=8)
+ if (missing(plottype)) {plottype<- "jpg"}
+ if (!missing(fileplot))
+ savePlot(filename = fileplot,
+ type =plottype,
+ device = dev.cur()
+ )
}
else
{
- op <- par(mfrow=c(1,2),pty="s" )
- plot(risk1,risk2,xlab= xlabel, ylab=ylabel,
- col = (1-(data[,cOutcome]))*1,cex.lab=1.2, cex.axis=1, las=1,pty='s',
- xlim=rangeaxis , ylim=rangeaxis,pch="*")
- abline(a=0,b=1, lwd=1,col=4)
- title(labels[1],cex.main=1)
+ op <- par(mfrow=c(1,2),pty="s" )
+ plot(risk1,risk2,xlab= xlabel, ylab=ylabel,
+ col = (1-(data[,cOutcome]))*1,cex.lab=1.2, cex.axis=1, las=1,pty='s',
+ xlim=rangeaxis , ylim=rangeaxis,pch="*")
+ abline(a=0,b=1, lwd=1,col=4)
+ title(labels[1],cex.main=1)
- plot(risk1,risk2, xlab= xlabel, ylab=ylabel,
- col =(data[,cOutcome])*1, cex.lab=1.2, cex.axis=1, las=1,pty='s',
- xlim=rangeaxis , ylim=rangeaxis,pch="*")
- abline(a=0,b=1, lwd=1,col=4)
- title(labels[2], cex.main=1)
- par(op)
- if (missing(plottype)) {plottype<- "jpg"}
- if (!missing(fileplot))
- savePlot(filename = fileplot,
- type =plottype,
- device = dev.cur(),
- restoreConsole = TRUE)
+ plot(risk1,risk2, xlab= xlabel, ylab=ylabel,
+ col =(data[,cOutcome])*1, cex.lab=1.2, cex.axis=1, las=1,pty='s',
+ xlim=rangeaxis , ylim=rangeaxis,pch="*")
+ abline(a=0,b=1, lwd=1,col=4)
+ title(labels[2], cex.main=1)
+ par(op)
+ if (missing(plottype)) {plottype<- "jpg"}
+ if (!missing(fileplot))
+ savePlot(filename = fileplot,
+ type =plottype,
+ device = dev.cur())
}
- tab<- cbind(risk1,risk2,data[,cOutcome])
- tab <- as.table(tab)
+ tab<- cbind(risk1,risk2,data[,cOutcome])
+ tab <- as.table(tab)
dimnames(tab)[[2]] <- c("Predicted risk 1","Predicted risk 2", "outcome")
- if (!missing(filename))
- write.table(tab,file=filename, row.names=TRUE,sep = "\t")
+ if (!missing(filename))
+ write.table(tab,file=filename, row.names=TRUE,sep = "\t")
}
#' Function for calibration plot and Hosmer-Lemeshow goodness of fit test.
#' The function produces a calibration plot and provides Hosmer-Lemeshow
@@ -907,7 +908,7 @@
#' @param ylabel Label of y-axis. Specification of \code{ylabel} is optional. Default is "Observed risk".
#' @param filename Name of the output file in which the calibration table is saved.
#' The file is saved as a txt file in the working directory. When no
-#' \code{filename} is specified, the output is not saved.
+#' \code{filename} is specified, the output is not saved. Example: filename="calibration.txt"
#' @param fileplot Name of the file that contains the calibation plot.
#' The file is saved in the working directory in the format specified under \code{plottype}. Example:
#' \code{fileplot="plotname"}. Note that the extension is not specified here.
@@ -955,7 +956,7 @@
#'
#' # compute calibration measures and produce calibration plot
#' plotCalibration(data=ExampleData, cOutcome=cOutcome, predRisk=predRisk,
-#' groups=groups, rangeaxis=rangeaxis, filename="calibration.txt")
+#' groups=groups, rangeaxis=rangeaxis)
#'
"plotCalibration" <-
function( data, cOutcome, predRisk, groups, rangeaxis, plottitle,
@@ -969,8 +970,8 @@
p=predRisk
y=data[,cOutcome]
if (length(unique(y))!=2) {
- stop(" The specified outcome is not a binary variable.\n")
- }
+ stop(" The specified outcome is not a binary variable.\n")
+ }
else{
matres <-matrix(NA,nrow=groups,ncol=5)
@@ -993,14 +994,13 @@
pval<- 1-pchisq(chisqr,df)
lines(x =c(0,1),y=c(0,1))
if (missing(plottype)) {plottype<- "jpg"}
- if (!missing(fileplot))
+ if (!missing(fileplot))
savePlot(filename = fileplot,
- type =plottype,
- device = dev.cur(),
- restoreConsole = TRUE)
+ type =plottype,
+ device = dev.cur())
if (!missing(filename))
- write.table(matres,file=filename, row.names=TRUE,sep = "\t",dec=",")
+ write.table(matres,file=filename, row.names=TRUE,sep = "\t",dec=",")
out <- list(Table_HLtest=matres,Chi_square = round(chisqr,3), df=df,
p_value =round(pval,4))
return(out)
@@ -1085,8 +1085,8 @@
if (missing(ylabel)) {ylabel<- "Sensitivity"}
if (class(predrisk) == "numeric") {predrisk<- cbind(predrisk)}
a<-c(1:dim(predrisk)[2])
- for(i in 1:dim(predrisk)[2])
- {
+ for(i in 1:dim(predrisk)[2])
+ {
rAllele <- rcorr.cens(predrisk[,i], data[,cOutcome], outx=FALSE)
pred <- prediction(predrisk[,i], data[,cOutcome])
perf <- performance(pred,"tpr","fpr")
@@ -1107,17 +1107,16 @@
cat("AUC [95% CI] for the model",i, ": ", round(rAllele[1],3),
"[", round(rAllele[1]-1.96/2*rAllele[3],3)," - ",
round(rAllele[1]+1.96/2*rAllele[3],3), "] \n")
- }
- if (!missing(labels)){
- legend( "bottomright",legend= labels, col=c(17:(16+dim(predrisk)[2])),
+ }
+ if (!missing(labels)){
+ legend( "bottomright",legend= labels, col=c(17:(16+dim(predrisk)[2])),
lty=c(1:(dim(predrisk)[2])),lwd =2,cex=1)
}
if (missing(plottype)) {plottype<- "jpg"}
- if (!missing(fileplot))
+ if (!missing(fileplot))
savePlot(filename = fileplot,
- type =plottype,
- device = dev.cur(),
- restoreConsole = TRUE)
+ type =plottype,
+ device = dev.cur())
}
#' Function for predictiveness curve. The function creates a plot of cumulative percentage
@@ -1196,7 +1195,7 @@
# if (missing(labels)) {labels <- c(1:dim(predrisk)[2])}
a<-c(1:dim(predrisk)[2])
for(i in 1:dim(predrisk)[2])
- {
+ {
x<-predrisk[,i]
if (i==1)
{
@@ -1235,17 +1234,16 @@
mtext(ylab, side = 2, line = 2.5, cex = 1.2)
invisible(data.frame(x = x, y = y))
}
- }
- if (!missing(labels)){
+ }
+ if (!missing(labels)){
legend("bottomright",legend= labels, col = c(17:(16+dim(predrisk)[2])),
lty=c(1:(dim(predrisk)[2])),lwd = 2,cex=1)
}
if (missing(plottype)) {plottype<- "jpg"}
- if (!missing(fileplot))
+ if (!missing(fileplot))
savePlot(filename = fileplot,
- type =plottype,
- device = dev.cur(),
- restoreConsole = TRUE)
+ type =plottype,
+ device = dev.cur())
}
@@ -1344,9 +1342,9 @@
b<-round(seq(rangexaxis[1],rangexaxis[2],2*interval),2)
for(i in 1:length(a))
- {
- if(any(a[i]==b)){a[i]<-a[i] }
- else{a[i] <- ""}
+ {
+ if(any(a[i]==b)){a[i]<-a[i] }
+ else{a[i] <- ""}
}
dimnames(p)[[1]] <- a
@@ -1356,11 +1354,10 @@
cex.lab = 1.2,cex.axis=1.1, las=1,space = c(0,.10))
if (missing(plottype)) {plottype<- "jpg"}
- if (!missing(fileplot))
+ if (!missing(fileplot))
savePlot(filename = fileplot,
- type =plottype,
- device = dev.cur(),
- restoreConsole = TRUE)
+ type =plottype,
+ device = dev.cur())
}
#' Function for reclassification table and statistics.
@@ -1376,7 +1373,7 @@
#' IDI equal to \code{x\%} means that the difference in average
#' predicted risks between the individuals with and without the outcome
#' increased by \code{x\%} in the updated model.
-# The function requires predicted risks estimated by using two separate risk
+#' The function requires predicted risks estimated by using two separate risk
#' models. Predicted risks can be obtained using the functions
#' \code{\link{fitLogRegModel}} and \code{\link{predRisk}}
#' or be imported from other methods or packages.
@@ -1384,9 +1381,9 @@
#' @param data Data frame or matrix that includes the outcome and
#' predictors variables.
#' @param cOutcome Column number of the outcome variable.
-#' @param predrisk1 Vector of predicted risks of all individuals using initial(old)
+#' @param predrisk1 Vector of predicted risks of all individuals using initial
#' model.
-#' @param predrisk2 Vector of predicted risks of all individuals using updated(new)
+#' @param predrisk2 Vector of predicted risks of all individuals using updated
#' model.
#' @param cutoff Cutoff values for risk categories.
#' Define the cut-off values as \code{c(0,...,1)}.
@@ -1438,102 +1435,40 @@
#'
"reclassification" <-
function(data,cOutcome,predrisk1,predrisk2, cutoff) {
+
+c1 <- cut(predrisk1,breaks = cutoff ,include.lowest=TRUE,right= FALSE)
+c2 <- cut(predrisk2,breaks = cutoff ,include.lowest=TRUE,right= FALSE)
+tabReclas <- table("Initial Model"=c1, "Updated Model"=c2)
cat(" _________________________________________\n")
- if (length(cutoff)==4) {
- cat(" cutoff values are",cutoff[2],",",cutoff[3],"\n" )
- label <- c("Low risk","Intermediate risk","High risk")
- }
- else {
- if (length(cutoff)==3) {
- cat(" cutoff value is",cutoff[2],"\n" )
- label <- c("Low risk","High risk")
- }
- if (length(cutoff)==5) {
-cat(" cutoff values are",cutoff[2],",",cutoff[3],",",cutoff[4],"\n" )
-label <- c("Low risk","Intermediate-I risk","Intermediate-II risk","High risk")
- }
- if((!length(cutoff)==3)&(!length(cutoff)==4)&(!length(cutoff)==5)) {
-stop(" \n 'cutoff' values number is either less than 1 or greater than 3" )
- }
- }
+cat(" \n Reclassification table \n")
cat(" _________________________________________\n")
-c1 <- cut(predrisk1,breaks = cutoff, labels = label ,include.lowest=TRUE,right= FALSE)
-c2 <- cut(predrisk2,breaks = cutoff, labels = label ,include.lowest=TRUE,right= FALSE)
-tabReclas <- table(OldModel=c1, NewModel=c2)
-cat(" \n ** Reclassification table **\n")
ta<- table(c1, c2, data[,cOutcome])
- dimnames(ta)[[3]] <- c("Outcome-absent","Outcome-present")
- names(dimnames(ta)) <- c("Predicted Risks-1", "Predicted Risks-2")
-if(length(cutoff)==5) {
- cat ("\n \n Outcome: absent \n " )
-tab1 <- cbind(ta[,,1], ReclassifiedPercentage=round(c(ta[1,2,1]+
- ta[1,3,1]+ta[1,4,1],ta[2,1,1]+ta[2,3,1]+ta[2,4,1],
- ta[3,1,1]+ta[3,2,1]+ta[3,4,1],ta[4,1,1]+ta[4,2,1]
- +ta[4,3,1])/margin.table(ta[,,1], 1),2))
- names(dimnames(tab1)) <- c("Old model", " New model")
- print(tab1)
- cat ("\n \n Outcome: present \n " )
-tab2 <- cbind(ta[,,2], ReclassifiedPercentage=round(c(ta[1,2,2]+
- ta[1,3,2]+ta[1,4,2],ta[2,1,2]+ta[2,3,2]+ta[1,4,2],
- ta[3,1,2]+ta[3,2,2]+ta[3,4,2],ta[4,1,2]+ta[4,2,2]
- +ta[4,3,2])/margin.table(ta[,,2], 1),2))
- names(dimnames(tab2)) <- c("Old model", " New model")
- print(tab2)
- cat ("\n \n Total sample \n " )
- tab<- cbind(tabReclas, ReclassifiedPercentage=
- round(c(tabReclas[1,2]+tabReclas[1,3]+tabReclas[1,4],
- tabReclas[2,1]+tabReclas[2,3]+tabReclas[2,4],
- tabReclas[3,1]+tabReclas[3,2]+tabReclas[3,4],
-tabReclas[4,1]+tabReclas[4,2]+tabReclas[4,3])/margin.table(tabReclas, 1),2))
- names(dimnames(tab)) <- c("Old model", " New model")
- print(tab)
- }
+ cat ("\n Outcome: absent \n \n" )
+ TabAbs <- ta[,,1]
+ tab1 <- cbind(TabAbs, " % reclassified"= round((rowSums(TabAbs)-diag(TabAbs))/rowSums(TabAbs),2)*100)
+ names(dimnames(tab1)) <- c("Initial Model", "Updated Model")
+ print(tab1)
-if(length(cutoff)==4) {
- cat ("\n \n Outcome: absent \n " )
-tab1 <- cbind(ta[,,1], ReclassifiedPercentage=round(c(ta[1,2,1]+
-ta[1,3,1],ta[2,1,1]+ta[2,3,1],ta[3,1,1]+ta[3,2,1])/margin.table(ta[,,1], 1),2))
- names(dimnames(tab1)) <- c("Old model", " New model")
- print(tab1)
- cat ("\n \n Outcome: present \n " )
-tab2 <- cbind(ta[,,2], ReclassifiedPercentage=round(c(ta[1,2,2]+
-ta[1,3,2],ta[2,1,2]+ta[2,3,2],ta[3,1,2]+ta[3,2,2])/margin.table(ta[,,2], 1),2))
- names(dimnames(tab2)) <- c("Old model", " New model")
- print(tab2)
- cat ("\n \n Total sample \n " )
- tab<- cbind(tabReclas, ReclassifiedPercentage=
- round(c(tabReclas[1,2]+tabReclas[1,3],tabReclas[2,1]+
-tabReclas[2,3],tabReclas[3,1]+tabReclas[3,2])/margin.table(tabReclas, 1),2))
- names(dimnames(tab)) <- c("Old model", " New model")
- print(tab)
- }
+ cat ("\n \n Outcome: present \n \n" )
+ TabPre <- ta[,,2]
+ tab2 <- cbind(TabPre, " % reclassified"= round((rowSums(TabPre)-diag(TabPre))/rowSums(TabPre),2)*100)
+ names(dimnames(tab2)) <- c("Initial Model", "Updated Model")
+ print(tab2)
-if(length(cutoff)==3) {
- cat ("\n \n Outcome: absent \n" )
- tab1 <- cbind(ta[,,1], ReclassifiedPercentage=
-round(c(ta[1,2,1],ta[2,1,1])/margin.table(ta[,,1], 1),2))
- names(dimnames(tab1)) <- c("Old model", " New model")
- print(tab1)
- cat ("\n \n Outcome: present \n " )
- tab2 <- cbind(ta[,,2], ReclassifiedPercentage=
-round(c(ta[1,2,2],ta[2,1,2])/margin.table(ta[,,2], 1),2))
-names(dimnames(tab2)) <- c("Old model", " New model")
- print(tab2)
- cat ("\n \n Total sample \n " )
- tab<- cbind(tabReclas, ReclassifiedPercentage=
-round(c(tabReclas[1,2],tabReclas[2,1])/margin.table(tabReclas, 1),2))
-names(dimnames(tab)) <- c("Old model", " New model")
- print(tab)
- }
+ cat ("\n \n Combined Data \n \n" )
+ Tab <- tabReclas
+ tab <- cbind(Tab, " % reclassified"= round((rowSums(Tab)-diag(Tab))/rowSums(Tab),2)*100)
+ names(dimnames(tab)) <- c("Initial Model", "Updated Model")
+ print(tab)
cat(" _________________________________________\n")
-
x<-improveProb(x1=as.numeric(c1)*(1/max(as.numeric(c1))),
x2=as.numeric(c2)*(1/max(as.numeric(c2))), y=data[,cOutcome])
y<-improveProb(x1=predrisk1, x2=predrisk2, y=data[,cOutcome])
+
cat("\n NRI [95% CI]:", round(x$nri,4),"[",round(x$nri-1.96*x$se.nri,4),"-",
round(x$nri+1.96*x$se.nri,4), "]", "; p-value:", round(2*pnorm(-abs(x$z.nri)),5), "\n" )
@@ -1623,8 +1558,8 @@
mean(risk[data[,cOutcome]==0]),3))
if (missing(plottype)) {plottype<- "jpg"}
- if (!missing(fileplot))
- savePlot(filename = fileplot,type =plottype,device = dev.cur(),restoreConsole = TRUE)
+ if (!missing(fileplot))
+ savePlot(filename = fileplot,type =plottype,device = dev.cur())
return(p)
}
#' An example code to construct a risk model using logistic regression analysis.
@@ -1686,4 +1621,251 @@
out<- list(riskModel1=riskmodel1, riskModel2=riskmodel2)
return(out)
}
-################################################################
+#' Function to construct a simulated dataset containing individual genotype
+#' data, estimated genetic risk and disease status.
+#' Construct a dataset that contains individual genotype data, genetic risk,
+#' and disease status for a hypothetical population.
+#' The dataset is constructed using simulation in such a way that the frequencies
+#' and odds ratios (ORs) of the genetic variants and the population disease risk
+#' computed from this dataset are the same as specified by the input parameters.
+#'
+#' The function will execute when the matrix with odds ratios and frequencies,
+#' population disease risk and the number of individuals are specified. \cr
+#'
+#' The simulation method is described in detail in the references. \cr
+#'
+#'
+#' The method assumes that (i) the combined effect of the genetic variants
+#' on disease risk follows a multiplicative (log additive) risk model;
+#' (ii) genetic variants inherit independently, that is no linkage disequilibrium
+#' between the variants; (iii) genetic variants have independent effects on the
+#' disease risk, which indicates no interaction among variants; and (iv) all
+#' genotypes and allele proportions are in Hardy-Weinberg equilibrium.
+#' Assumption (ii) and (iv) are used to generate the genotype data, and assumption
+#'(i), (ii) and (iii) are used to calculate disease risk.
+#'
+#'
+#' Simulating the dataset involves three steps: (1) modelling genotype data,
+#' (2) modelling disease risks, and (3) modelling disease status. Brief
+#' descriptions of these steps are as follows:
+#'
+#'
+#' (1) Modelling genotype data: For each variant the genotype
+#' frequencies are either specified or calculated from the allele frequencies
+#' using Hardy-Weinberg equilibrium. Then, the genotypes for each genetic
+#' variant are randomly distributed without replacement over all individuals.
+#'
+#'
+#' (2) Modelling disease risks: For the calculation of the individual disease
+#' risk, Bayes' theorem is used, which states that the posterior odds of disease
+#' are obtained by multiplying the prior odds by the likelihood ratio (LR) of
+#' the individual genotype data. The prior odds are calculated from the
+#' population disease risk or disease prevalence
+#' (prior odds= prior risk/ (1- prior risk)) and the posterior odds are converted
+#' back into disease risk (disease risk= posterior odds/ (1+ posterior odds)).
+#' Under the no linkage disequilibrium (LD) assumption, the LR of a genetic profile
+#' is obtained by multiplying the LRs of the single genotypes that are included in
+#' the risk model. The LR of a single genotype is calculated using frequencies
+#' and ORs of genetic variants and population disease risk. See references
+#' for more details.
+#'
+#'
+#' (3) Modelling disease status: To model disease status, we used a procedure
+#' that compares the estimated disease risk of each subject to a randomly drawn value
+#' between 0 and 1 from a uniform distribution. A subject is assigned to the
+#' group who will develop the disease when the disease risk is higher than the
+#' random value and to the group who will not develop the disease when the risk
+#' is lower than the random value.
+#'
+#' This procedure ensures that for each genomic profile, the percentage of
+#' people who will develop the disease equals the population disease risk
+#' associated with that profile, when the subgroup of individuals with that
+#' profile is sufficiently large.
+#'
+#'
+#' @param ORfreq Matrix with ORs and frequencies of the genetic variants.
+#' The matrix contains four columns in which the first two describe ORs and the
+#' last two describe the corresponding frequencies. The number of rows in this
+#' matrix is same as the number of genetic variants included. Genetic variants
+#' can be specified as per genotype, per allele, or as dominant/ recessive
+#' effect of the risk allele. When per genotype data are used, OR of the
+#' heterozygous and homozygous risk genotypes are mentioned in the first two
+#' columns and the corresponding genotype frequencies are mentioned in the last
+#' two columns. When per allele data are used, the OR and frequency of the risk
+#' allele are specified in the first and third column and the remaining two cells
+#' are coded as '1'. Similarly, when dominant/ recessive effects of the risk
+#' alleles are used, the
+#' OR and frequency of the dominant/ recessive variant are specified in the first
+#' and third column, and the remaining two cells are coded as '0'.
+#' @param poprisk Population disease risk (expressed in proportion).
+#' @param popsize Total number of individuals included in the dataset.
+#' @param filename Name of the file in which the dataset will be saved.
+#' The file is saved in the working directory as a txt file. When no filename
+#' is specified, the output is not saved.
+#'
+#' @return
+#' The function returns:
+#' \item{Dataset}{A data frame or matrix that includes genotype data,
+#' genetic risk and disease status for a hypothetical population.
+#' The dataset contains (4 + number of genetic variants included) columns,
+#' in which the first column is the un-weighted risk score, which is the sum
+#' of the number of risk alleles for each individual, the third column is the
+#' estimated genetic risk, the forth column is the individual disease status expressed
+#' as '0' or '1', indicating without or with the outcome of interest, and the fifth until
+#' the end column are genotype data for the variants expressed as '0', '1' or '2',
+#' which indicate the number of risk alleles present in each individual for the genetic variants.}
+#'
+#'
+#' @keywords models
+#'
+#'
+#' @references Janssens AC, Aulchenko YS, Elefante S, Borsboom GJ, Steyerberg EW,
+#' van Duijn CM. Predictive testing for complex diseases using multiple genes:
+#' fact or fiction? Genet Med. 2006;8:395-400.
+#'
+#'
+#' Janssens AC, Moonesinghe R, Yang Q, Steyerberg EW, van Duijn CM, Khoury MJ.
+#' The impact of genotype frequencies on the clinical validity of genomic
+#' profiling for predicting common chronic diseases. Genet Med. 2007;9:528-35.
+#'
+#'
+#' van der Net JB, Janssens AC, Sijbrands EJ, Steyerberg EW. Value of genetic
+#' profiling for the prediction of coronary heart disease.
+#' Am Heart J. 2009;158:105-10.
+#'
+#'
+#' van Zitteren M, van der Net JB, Kundu S, Freedman AN, van Duijn CM,
+#' Janssens AC. Genome-based prediction of breast cancer risk in the general
+#' population: a modeling study based on meta-analyses of genetic associations.
+#' Cancer Epidemiol Biomarkers Prev. 2011;20:9-22.
+#'
+#'
+#' @examples
+#' # specify the matrix containing the ORs and frequencies of genetic variants
+#' # In this example we used per allele effects of the risk variants
+#' ORfreq<-cbind(c(1.35,1.20,1.24,1.16), rep(1,4), c(.41,.29,.28,.51),rep(1,4))
+#'
+#' # specify the population disease risk
+#' popRisk <- 0.3
+#' # specify size of hypothetical population
+#' popSize <- 10000
+#'
+#' # Obtain the simulated dataset
+#' Data <- simulatedDataset(ORfreq=ORfreq, poprisk=popRisk, popsize=popSize)
+#'
+#' # Obtain the AUC and produce ROC curve
+#' plotROC(data=Data, cOutcome=4, predrisk=Data[,3])
+#'
+"simulatedDataset" <- function(ORfreq, poprisk, popsize, filename)
+{
+if (missing(poprisk)) {stop("Population disease risk is not specified")}
[TRUNCATED]
To get the complete diff run:
svnlook diff /svnroot/genabel -r 796
More information about the Genabel-commits
mailing list