[Genabel-commits] r789 - in pkg/PredictABEL: . R man
noreply at r-forge.r-project.org
Fri Sep 30 12:36:11 CEST 2011
Author: lckarssen
Date: 2011-09-30 12:36:10 +0200 (Fri, 30 Sep 2011)
New Revision: 789
Added:
pkg/PredictABEL/R/simulatedDataset.R
pkg/PredictABEL/man/simulatedDataset.Rd
Removed:
pkg/PredictABEL/R/simulation_codes.R
Modified:
pkg/PredictABEL/DESCRIPTION
pkg/PredictABEL/R/PredictABEL.R
Log:
Fixed a couple of minor build problems in PredictABEL.
Modified: pkg/PredictABEL/DESCRIPTION
===================================================================
--- pkg/PredictABEL/DESCRIPTION 2011-09-30 10:02:34 UTC (rev 788)
+++ pkg/PredictABEL/DESCRIPTION 2011-09-30 10:36:10 UTC (rev 789)
@@ -1,27 +1,23 @@
Package: PredictABEL
Title: Assessment of risk prediction models
-Version: 1.1.1
-Date: 2011-02-09
+Version: 1.2
+Date: 2011-09-30
Author: Suman Kundu, Yurii S. Aulchenko, A. Cecile J.W. Janssens
-Maintainer: Suman Kundu <s.kundu at erasmusmc.nl>,
-A. Cecile J.W. Janssens <a.janssens at erasmusmc.nl>
-Depends: R (>= 2.9.0), Hmisc, ROCR, epitools, PBSmodelling
+Maintainer: Suman Kundu <s.kundu at erasmusmc.nl>, A. Cecile J.W. Janssens <a.janssens at erasmusmc.nl>
+Depends: R (>= 2.12.0), Hmisc, ROCR, epitools, PBSmodelling
Suggests: GenABEL
-Description: PredictABEL includes functions to assess the performance of
-risk models. The package contains functions for the various measures that are
-used in empirical studies, including univariate and multivariate odds ratios
+Description: PredictABEL includes functions to assess the performance of
+ risk models. The package contains functions for the various measures that are
+ used in empirical studies, including univariate and multivariate odds ratios
(OR) of the predictors, the c-statistic (or area under the receiver operating
- characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test,
- reclassification table, net reclassification improvement (NRI) and
- integrated discrimination improvement (IDI). Also included are functions
- to create plots, such as risk distributions, ROC curves, calibration plot,
- discrimination box plot and predictiveness curves. In addition to functions
- to assess the performance of risk models, the package includes functions to
- obtain weighted and unweighted risk scores as well as predicted risks using
- logistic regression analysis. These logistic regression functions are
- specifically written for models that include genetic variables, but they
- can also be applied to models that are based on non-genetic risk factors only.
+ characteristic (ROC) curve (AUC)), Hosmer-Lemeshow goodness of fit test,
+ reclassification table, net reclassification improvement (NRI) and
+ integrated discrimination improvement (IDI). Also included are functions
+ to create plots, such as risk distributions, ROC curves, calibration plot,
+ discrimination box plot and predictiveness curves. In addition to functions
+ to assess the performance of risk models, the package includes functions to
+ obtain weighted and unweighted risk scores as well as predicted risks using
+ logistic regression analysis. These logistic regression functions are
+ specifically written for models that include genetic variables, but they
+ can also be applied to models that are based on non-genetic risk factors only.
License: GPL (>= 2)
-
-
-
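The DESCRIPTION above says the package obtains predicted risks using logistic regression, and `predRisk()` in the following diff calls `predict(riskModel, type="response")` on a fitted `glm` object. Below is a minimal base-R sketch of that idea. It is not PredictABEL code: the data are simulated, and the column names `snp1`, `snp2`, `age` and `outcome` are hypothetical stand-ins for a real dataset such as `ExampleData`.

```r
# Sketch only: simulated data with hypothetical column names.
set.seed(1)
n <- 500
dat <- data.frame(
  snp1 = rbinom(n, 2, 0.3),   # genotypes coded 0,1,2 (risk-allele count)
  snp2 = rbinom(n, 2, 0.4),
  age  = rnorm(n, 60, 10)     # a continuous non-genetic predictor
)
# simulate a binary outcome coded 1 (present) / 0 (absent)
lp <- -3 + 0.4 * dat$snp1 + 0.3 * dat$snp2 + 0.02 * dat$age
dat$outcome <- rbinom(n, 1, plogis(lp))

# fit a standard logistic regression model with glm()
riskmodel <- glm(outcome ~ snp1 + snp2 + age,
                 family = binomial(link = "logit"), data = dat)

# predicted risks on the probability scale, the same call predRisk() uses
predrisk <- predict(riskmodel, type = "response")
summary(predrisk)
```

`type = "response"` applies the inverse logit to the linear predictor, so every predicted risk falls strictly between 0 and 1.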
Modified: pkg/PredictABEL/R/PredictABEL.R
===================================================================
--- pkg/PredictABEL/R/PredictABEL.R 2011-09-30 10:02:34 UTC (rev 788)
+++ pkg/PredictABEL/R/PredictABEL.R 2011-09-30 10:36:10 UTC (rev 789)
@@ -1,115 +1,115 @@
-#' An R package for the analysis of (genetic) risk prediction studies.
+#' An R package for the analysis of (genetic) risk prediction studies.
#'
-#' Fueled by the substantial gene discoveries from genome-wide association
-#' studies, there is increasing interest in investigating the predictive
-#' ability of genetic risk models. To assess the performance of genetic risk
-#' models, PredictABEL includes functions for the various measures and plots
-#' that have been used in empirical studies, including univariate and
-#' multivariate odds ratios (ORs) of the predictors, the c-statistic (or AUC),
-#' Hosmer-Lemeshow goodness of fit test, reclassification table, net
-#' reclassification improvement (NRI) and integrated discrimination
-#' improvement (IDI). The plots included are the ROC plot, calibration plot,
-#' discrimination box plot, predictiveness curve, and several risk distributions.
+#' Fueled by the substantial gene discoveries from genome-wide association
+#' studies, there is increasing interest in investigating the predictive
+#' ability of genetic risk models. To assess the performance of genetic risk
+#' models, PredictABEL includes functions for the various measures and plots
+#' that have been used in empirical studies, including univariate and
+#' multivariate odds ratios (ORs) of the predictors, the c-statistic (or AUC),
+#' Hosmer-Lemeshow goodness of fit test, reclassification table, net
+#' reclassification improvement (NRI) and integrated discrimination
+#' improvement (IDI). The plots included are the ROC plot, calibration plot,
+#' discrimination box plot, predictiveness curve, and several risk distributions.
#'
#'
-#' These functions can be applied to predicted risks that are obtained using
-#' logistic regression analysis, to weighted or unweighted risk scores, for
-#' which the functions are included in this package. The functions can also be
-#' used to assess risks or risk scores that are constructed using other methods, e.g., Cox Proportional
-#' Hazards regression analysis, which are not included in the current version.
-#' Risks obtained from other methods can be imported into R for assessment
+#' These functions can be applied to predicted risks that are obtained using
+#' logistic regression analysis, to weighted or unweighted risk scores, for
+#' which the functions are included in this package. The functions can also be
+#' used to assess risks or risk scores that are constructed using other methods, e.g., Cox Proportional
+#' Hazards regression analysis, which are not included in the current version.
+#' Risks obtained from other methods can be imported into R for assessment
#' of the predictive performance.
#'
#'
-#' The functions to construct the risk models using logistic regression analyses
-#' are specifically written for models that include genetic variables,
-#' eventually in addition to non-genetic factors, but they can also be applied
-#' to construct models that are based on non-genetic risk factors only. \cr
+#' The functions to construct the risk models using logistic regression analyses
+#' are specifically written for models that include genetic variables,
+#' eventually in addition to non-genetic factors, but they can also be applied
+#' to construct models that are based on non-genetic risk factors only. \cr
#'
#'
-#' Before using the functions \code{\link{fitLogRegModel}} for constructing
-#' a risk model or \code{\link{riskScore}} for computing risk
-#' scores, the following checks on the dataset are advisable to be done:
-#'
-#' (1) Missing values: The logistic regression analyses and computation of
-#' the risk score are done only for subjects that have no missing data. In case
-#' of missing values, individuals with missing data can be removed from the
-#' dataset or imputation strategies can be used to fill in missing data.
-#' Subjects with missing data can be removed with the R function \code{na.omit}
-#' (available in \code{stats} package).
-#' Example: \code{DataFileNew <- na.omit(DataFile)}
+#' Before using the functions \code{\link{fitLogRegModel}} for constructing
+#' a risk model or \code{\link{riskScore}} for computing risk
+#' scores, the following checks on the dataset are advisable to be done:
+#'
+#' (1) Missing values: The logistic regression analyses and computation of
+#' the risk score are done only for subjects that have no missing data. In case
+#' of missing values, individuals with missing data can be removed from the
+#' dataset or imputation strategies can be used to fill in missing data.
+#' Subjects with missing data can be removed with the R function \code{na.omit}
+#' (available in \code{stats} package).
+#' Example: \code{DataFileNew <- na.omit(DataFile)}
#' will make a new dataset (\code{DataFileNew}) with no missing values;
#'
-#'
-#' (2) Multicollinearity: When there is strong correlation between the
-#' predictor variables, regression coefficients may be estimated imprecisely
-#' and risks scores may be biased because the assumption of independent effects
-#' is violated. In genetic risk prediction studies, problems with
-#' multicollinearity should be expected when single nucleotide polymorphisms
-#' (SNPs) located in the same gene are
-#' in strong linkage disequilibrium (LD). For SNPs in LD it is common to select
-#' the variant with the lowest p-value in the model;
-#'
-#'
-#' (3) Outliers: When the data contain significant outliers, either clinical
-#' variables with extreme values of the outcomes or extreme values resulting
-#' from errors in the data entry, these may impact the construction of the risk models and
-#' computation of the risks scores. Data should be carefully checked and outliers
+#'
+#' (2) Multicollinearity: When there is strong correlation between the
+#' predictor variables, regression coefficients may be estimated imprecisely
+#' and risks scores may be biased because the assumption of independent effects
+#' is violated. In genetic risk prediction studies, problems with
+#' multicollinearity should be expected when single nucleotide polymorphisms
+#' (SNPs) located in the same gene are
+#' in strong linkage disequilibrium (LD). For SNPs in LD it is common to select
+#' the variant with the lowest p-value in the model;
+#'
+#'
+#' (3) Outliers: When the data contain significant outliers, either clinical
+#' variables with extreme values of the outcomes or extreme values resulting
+#' from errors in the data entry, these may impact the construction of the risk models and
+#' computation of the risks scores. Data should be carefully checked and outliers
#' need to be removed or replaced, if justified;
#'
-#' (4) Recoding of data: In the computation of unweighted risk scores, it is assumed
-#' that the genetic variants are coded \code{0,1,2}
-#' representing the number of alleles carried. When variants
-#' are coded \code{0,1} representing a dominant or recessive effect of the alleles,
+#' (4) Recoding of data: In the computation of unweighted risk scores, it is assumed
+#' that the genetic variants are coded \code{0,1,2}
+#' representing the number of alleles carried. When variants
+#' are coded \code{0,1} representing a dominant or recessive effect of the alleles,
#' the variables need to be recoded before unweighted risk scores can be computed. \cr
#'
#'
-#' To import data into R several alternative strategies can be used. Use the
-#' \code{Hmisc} package for importing SPSS and SAS data into R.
-#' Use "\code{ExampleData <- read.table("DataName.txt", header=T, sep="\t")}" for text
-#' files where variable names are included as column headers and data are
-#' separated by tabs.
-#' Use "\code{ExampleData <- read.table("Name.csv", sep=",", header=T)}"
-#' for comma-separated files with variable names as column headers.
-#' Use \code{"setwd(dir)"} to set the working directory to "dir". The datafile
-#' needs to be present in the working directory. \cr
+#' To import data into R several alternative strategies can be used. Use the
+#' \code{Hmisc} package for importing SPSS and SAS data into R.
+#' Use "\code{ExampleData <- read.table("DataName.txt", header=T, sep="\t")}" for text
+#' files where variable names are included as column headers and data are
+#' separated by tabs.
+#' Use "\code{ExampleData <- read.table("Name.csv", sep=",", header=T)}"
+#' for comma-separated files with variable names as column headers.
+#' Use \code{"setwd(dir)"} to set the working directory to "dir". The datafile
+#' needs to be present in the working directory. \cr
#'
#'
#' To export datafiles from R tables to a tab-delimited textfile with the first row as
-#' the name of the variables,
-#' use "\code{write.table(R_Table, file="Name.txt", row.names=FALSE, sep="\t")}" and
+#' the name of the variables,
+#' use "\code{write.table(R_Table, file="Name.txt", row.names=FALSE, sep="\t")}" and
#' when a comma-separated textfile is requested and variable names are provided in the first row,
-#' use "\code{write.table(R_Table, file="Name.csv", row.names=FALSE, sep=",")}".
+#' use "\code{write.table(R_Table, file="Name.csv", row.names=FALSE, sep=",")}".
#' When the directory is not specified, the file will be
-#' saved in the working directory. For exporting R data into SPSS, SAS and
+#' saved in the working directory. For exporting R data into SPSS, SAS and
#' Stata data, use functions in the the \code{foreign} package. \cr
#'
#' Several functions in this package depend on other R packages:
-#'
+#'
#' (1) \code{Hmisc}, is used to compute NRI and IDI;
-#'
+#'
#' (2) \code{ROCR}, is used to produce ROC plots;
-#'
+#'
#' (3) \code{epitools}, is used to compute univariate odds ratios;
-##'
+##'
#' (4) \code{PBSmodelling}, is used to produce predictiveness curve.
#'
-#' @note The current version of the package includes the basic measures
-#' and plots that are used in the assessment of (genetic) risk prediction models.
-#' Planned extensions of the package include functions to construct risk
-#' models using Cox Proportional Hazards analysis for prospective data and
-#' functions to construct simulated data for the evaluation of
+#' @note The current version of the package includes the basic measures
+#' and plots that are used in the assessment of (genetic) risk prediction models.
+#' Planned extensions of the package include functions to construct risk
+#' models using Cox Proportional Hazards analysis for prospective data and
+#' functions to construct simulated data for the evaluation of
#' genetic risk models (see Janssens et al, Genet Med 2006).
-#'
#'
-##' @author Suman Kundu
+#'
+##' @author Suman Kundu
##'
##' Yurii S. Aulchenko
##'
-##' A. Cecile J.W. Janssens
+##' A. Cecile J.W. Janssens
##'
##' @keywords package
-#'
+#'
#' @references S Kundu, YS Aulchenko, CM van Duijn, ACJW Janssens. PredictABEL:
#' an R package for the assessment of risk prediction models.
#' Eur J Epidemiol. 2011;26:261-4. \cr
@@ -123,7 +123,7 @@
#' P Kraft, S Melillo, CJ O'Donnell, MJ Pencina, D Ransohoff, SD Schully,
#' D Seminara, DM Winn, CF Wright, CM van Duijn, J Little, MJ Khoury.
#' Strengthening the reporting of genetic risk prediction studies
-#' (GRIPS)-Elaboration and explanation. Eur J Epidemiol. 2011;26:313-37. \cr
+#' (GRIPS)-Elaboration and explanation. Eur J Epidemiol. 2011;26:313-37. \cr
#'
#' Aulchenko YS, Ripke S, Isaacs A, van Duijn CM. GenABEL: an R package for genome-wide
#' association analysis. Bioinformatics 2007;23(10):1294-6.
@@ -133,41 +133,41 @@
#'
#' The function fits a standard GLM function for the logistic regression model.
#' This function can be used to construct a logistic regression model based on genetic and non-genetic
-#' predictors. The function also allows to enter the genetic predictors
-#' as a single risk score. For that purpose, the function requires that
+#' predictors. The function also allows to enter the genetic predictors
+#' as a single risk score. For that purpose, the function requires that
#' the dataset additionally includes the risk score.
#' A new dataset can be constructed using
-#' "\code{NewExampleData <- cbind(ExampleData,riskScore)}".
+#' "\code{NewExampleData <- cbind(ExampleData,riskScore)}".
#' The genetic risk scores can be obtained
#' using the function \code{\link{riskScore}} in this package or be
-#' imported from other methods.
+#' imported from other methods.
#'
-#' @param data Data frame or matrix that includes the outcome and
-#' predictor variables.
+#' @param data Data frame or matrix that includes the outcome and
+#' predictor variables.
#' @param cOutcome Column number of the outcome variable. \code{cOutcome=2}
#' means that the second column of the dataset is the outcome variable.
-#' To fit the logistic regression model, the outcome variable needs to be
+#' To fit the logistic regression model, the outcome variable needs to be
#' (re)coded as \code{1} for the presence and \code{0} for the absence of the
-#' outcome of interest.
-#' @param cNonGenPreds Column numbers of the non-genetic predictors that are
-#' included in the model. An example to denote column numbers is
-#' \code{c(3,6:8,10)}. Choose \code{c(0)} when no non-genetic predictors
+#' outcome of interest.
+#' @param cNonGenPreds Column numbers of the non-genetic predictors that are
+#' included in the model. An example to denote column numbers is
+#' \code{c(3,6:8,10)}. Choose \code{c(0)} when no non-genetic predictors
#' are considered.
-#' @param cNonGenPredsCat Column numbers of the non-genetic predictors that
-#' are entered as categorical variables in the model. When non-genetic
-#' predictors are not specified as being categorical they are treated as
-#' continuous variables in the model. If no non-genetic predictors are
-#' categorical, denote \code{c(0)}.
+#' @param cNonGenPredsCat Column numbers of the non-genetic predictors that
+#' are entered as categorical variables in the model. When non-genetic
+#' predictors are not specified as being categorical they are treated as
+#' continuous variables in the model. If no non-genetic predictors are
+#' categorical, denote \code{c(0)}.
#' @param cGenPreds Column numbers of the genetic predictors or genetic risk score.
#' Denote \code{c(0)}
-#' when the prediction model does not consider
+#' when the prediction model does not consider
#' genetic predictors or genetic risk score.
-#' @param cGenPredsCat Column numbers of the genetic predictors that are
-#' entered as categorical variables in the model. When SNPs are considered as
-#' categorical, the model
-#' will estimate effects per genotype. Otherwise, SNPs are considered as
-#' continuous variables for which the model will estimate an allelic effect.
-#' Choose c(0) when no genetic predictors are considered as categorical
+#' @param cGenPredsCat Column numbers of the genetic predictors that are
+#' entered as categorical variables in the model. When SNPs are considered as
+#' categorical, the model
+#' will estimate effects per genotype. Otherwise, SNPs are considered as
+#' continuous variables for which the model will estimate an allelic effect.
+#' Choose c(0) when no genetic predictors are considered as categorical
#' or when genetic predictors are entered as a risk score into the model.
#'
#' @return No value returned.
@@ -177,24 +177,24 @@
#'
#' @seealso \code{\link{predRisk}}, \code{\link{ORmultivariate}}, \code{\link{riskScore}}
#' @examples
-#' # specify dataset with outcome and predictor variables
+#' # specify dataset with outcome and predictor variables
#' data(ExampleData)
#' # specify column number of outcome variable
-#' cOutcome <- 2
+#' cOutcome <- 2
#' # specify column numbers of non-genetic predictors
#' cNonGenPred <- c(3:10)
-#' # specify column numbers of non-genetic predictors that are categorical
+#' # specify column numbers of non-genetic predictors that are categorical
#' cNonGenPredCat <- c(6:8)
#' # specify column numbers of genetic predictors
#' cGenPred <- c(11,13:16)
#' # specify column numbers of genetic predictors that are categorical
-#' cGenPredCat <- c(0)
+#' cGenPredCat <- c(0)
#'
#' # fit logistic regression model
-#' riskmodel <- fitLogRegModel(data=ExampleData, cOutcome=cOutcome,
-#' cNonGenPreds=cNonGenPred, cNonGenPredsCat=cNonGenPredCat,
+#' riskmodel <- fitLogRegModel(data=ExampleData, cOutcome=cOutcome,
+#' cNonGenPreds=cNonGenPred, cNonGenPredsCat=cNonGenPredCat,
#' cGenPreds=cGenPred, cGenPredsCat=cGenPredCat)
-#'
+#'
#' # show summary details for the fitted risk model
#' summary(riskmodel)
#'
@@ -239,19 +239,19 @@
return(model)
}
-#' Function to compute predicted risks for all individuals in the dataset.
-#'
-#' The function computes predicted risks from a specified logistic regression model.
-#' The function \code{\link{fitLogRegModel}} can be used to construct such a model.
+#' Function to compute predicted risks for all individuals in the dataset.
#'
-#' @param riskModel Name of logistic regression model that can be fitted using
+#' The function computes predicted risks from a specified logistic regression model.
+#' The function \code{\link{fitLogRegModel}} can be used to construct such a model.
+#'
+#' @param riskModel Name of logistic regression model that can be fitted using
#' the function \code{\link{fitLogRegModel}}.
-#' @param data Data frame or matrix that includes the outcome, ID number and
-#' predictor variables.
-#' @param cID Column number of ID variable. The ID number and predicted risks
-#' will be saved under \code{filename}. When \code{cID} is not specified, the output is not saved.
-#' @param filename Name of the output file in which the ID number and
-#' estimated predicted risks will be saved. The file is saved in the working
+#' @param data Data frame or matrix that includes the outcome, ID number and
+#' predictor variables.
+#' @param cID Column number of ID variable. The ID number and predicted risks
+#' will be saved under \code{filename}. When \code{cID} is not specified, the output is not saved.
+#' @param filename Name of the output file in which the ID number and
+#' estimated predicted risks will be saved. The file is saved in the working
#' directory as a txt file. When no \code{filename} is specified, the output is not saved.
#'
#'
@@ -262,7 +262,7 @@
#' @keywords htest
#'
#'
-#' @seealso \code{\link{fitLogRegModel}}, \code{\link{plotCalibration}},
+#' @seealso \code{\link{fitLogRegModel}}, \code{\link{plotCalibration}},
#' \code{\link{plotROC}}, \code{\link{plotPriorPosteriorRisk}}
#'
#' @examples
@@ -286,34 +286,34 @@
#' cNonGenPreds=cNonGenPred, cNonGenPredsCat=cNonGenPredCat,
#' cGenPreds=cGenPred, cGenPredsCat=cGenPredCat)
#'
-#' # obtain predicted risks
-#' predRisk <- predRisk(riskModel=riskmodel, filename="name.txt")
+#' # obtain predicted risks
+#' predRisk <- predRisk(riskModel=riskmodel, filename="name.txt")
#'
-"predRisk" <-
+"predRisk" <-
function(riskModel, data, cID, filename)
{
if (any(class(riskModel) == "glm"))
{
predrisk <- predict(riskModel, type="response")
- }
-else
+ }
+else
{
-stop("The argument 'riskModel' should be a (GLM)model")
+stop("The argument 'riskModel' should be a (GLM)model")
}
-
+
if (!missing(data)&& !missing(cID)&& !missing(filename))
{tab <- cbind(ID=data[, cID],PredRisk=predrisk)
write.table(tab, file=filename, row.names = FALSE,sep = "\t")
}
return(predrisk)
- }
-#' Function to compute genetic risk scores. The function computes unweighted
-#' or weighted genetic risk scores. The relative effects (or weights) of
-#' genetic variants can either come from beta coefficients of a risk model
+ }
+#' Function to compute genetic risk scores. The function computes unweighted
+#' or weighted genetic risk scores. The relative effects (or weights) of
+#' genetic variants can either come from beta coefficients of a risk model
#' or from a vector of beta coefficients imported into R, e.g., when beta cofficients are obtained from meta-analysis.
-#'
#'
-#' The function calculates unweighted
+#'
+#' The function calculates unweighted
#' or weighted genetic risk scores. The unweighted genetic risk score is a simple
#' risk allele count assuming that all alleles have the same effect. For this
#' calculation, it is required that the genetic variables are coded as the number of risk
@@ -322,28 +322,28 @@
#' is reversed. The weighted risk score is a sum of the number of risk alleles
#' multiplied by their beta coefficients.
#'
-#' The beta coefficients can come from two different sources, either beta coefficients of a risk model
+#' The beta coefficients can come from two different sources, either beta coefficients of a risk model
#' or a vector of beta coefficients imported into R, e.g., when beta cofficients are obtained from meta-analysis.
-#' This vector of beta coefficients
+#' This vector of beta coefficients
#' should be a named vector containing the same names as mentioned in genetic variants.
#' A logistic regression model can be constructed using \code{\link{fitLogRegModel}}
#' from this package.
-#'
-#' @note When a vector of beta coefficients is imported, it should be checked
-#' whether the DNA strands and the coding of the risk alleles are the same
+#'
+#' @note When a vector of beta coefficients is imported, it should be checked
+#' whether the DNA strands and the coding of the risk alleles are the same
#' as in the study data. The functions are available in the package \code{GenABEL}
-#' to accurately compute risk scores when the DNA strands are different or the risk
-#' alleles are coded differently in the study data and the data used in meta-analysis.
+#' to accurately compute risk scores when the DNA strands are different or the risk
+#' alleles are coded differently in the study data and the data used in meta-analysis.
#'
-#' @param weights The vector that includes the weights given to the genetic
+#' @param weights The vector that includes the weights given to the genetic
#' variants. See details for more informations.
-#' @param data Data frame or matrix that includes the outcome
-#' and predictors variables.
-#' @param cGenPreds Column numbers of the genetic variables on the basis of
+#' @param data Data frame or matrix that includes the outcome
+#' and predictors variables.
+#' @param cGenPreds Column numbers of the genetic variables on the basis of
#' which the risk score is computed.
#' @param Type Specification of the type of risk scores that will be computed.
-#' Type can be weighted (\code{Type="weighted"}) or
-#' unweighted (\code{Type="unweighted"}).
+#' Type can be weighted (\code{Type="weighted"}) or
+#' unweighted (\code{Type="unweighted"}).
#'
#' @return The function returns a vector of risk scores.
#'
@@ -354,7 +354,7 @@
#' @seealso \code{\link{plotRiskDistribution}}, \code{\link{plotRiskscorePredrisk}}
#' @examples
#' # specify dataset with outcome and predictor variables
-#' data(ExampleData)
+#' data(ExampleData)
#' # specify column numbers of genetic predictors
#' cGenPred <- c(11:16)
#'
@@ -363,16 +363,16 @@
#' # called 'ExampleModels', which is described on page 4-5
#' riskmodel <- ExampleModels()$riskModel2
#'
-#' # compute unweighted risk scores
-#' riskScore <- riskScore(weights=riskmodel, data=ExampleData,
-#' cGenPreds=cGenPred, Type="unweighted")
+#' # compute unweighted risk scores
+#' riskScore <- riskScore(weights=riskmodel, data=ExampleData,
+#' cGenPreds=cGenPred, Type="unweighted")
#'
-"riskScore" <-
+"riskScore" <-
function(weights, data, cGenPreds, Type )
{
riskModel <- weights
x <- data[, cGenPreds]
-if (any(class(riskModel) == "glm"))
+if (any(class(riskModel) == "glm"))
{
if(! setequal(intersect(names(riskModel$coef),colnames(x)),colnames(x)))
{
@@ -381,9 +381,9 @@
else
{
y <- riskModel$coef[intersect(names(riskModel$coef),colnames(x))]
- }
+ }
}
-
+
else if(is.vector(riskModel))
{
if (length(names(riskModel))!= length(riskModel))
@@ -393,14 +393,14 @@
if( setequal(intersect(names(riskModel),colnames(x)),colnames(x)))
{
y <- riskModel[intersect(colnames(x),names(riskModel))]
- }
+ }
else
{
stop("Beta coefficient vector does not contain all the genetic variants")
}
-
+
}
-
+
else
{
stop("'weights' argument should either be a model or a named beta vector" )
@@ -408,7 +408,7 @@
if(Type=="weighted")
{
-wrs<- y %*% t(x)
+wrs<- y %*% t(x)
return(as.vector(wrs))
}
else if (Type=="unweighted")
@@ -422,47 +422,47 @@
}
#' Function to compute univariate ORs for genetic predictors. The function computes the univariate ORs with 95\% CIs for genetic predictors.
#'
-#' The function computes the univariate ORs with 95\% CIs for the specified
-#' genetic variants both per allele and per genotype. The ORs are saved with the data from which they are
-#' calculated. Genotype frequencies are provided for
-#' persons with and without the outcome
-#' of interest. The genotype or allele that is coded as \code{'0'} is considered
-#' as the reference to computes the ORs.
-#'
-#' @param data Data frame or matrix that includes the outcome and
-#' predictors variables.
-#' @param cOutcome Column number of the outcome variable. \code{cOutcome=2}
+#' The function computes the univariate ORs with 95\% CIs for the specified
+#' genetic variants both per allele and per genotype. The ORs are saved with the data from which they are
+#' calculated. Genotype frequencies are provided for
+#' persons with and without the outcome
+#' of interest. The genotype or allele that is coded as \code{'0'} is considered
+#' as the reference to computes the ORs.
+#'
+#' @param data Data frame or matrix that includes the outcome and
+#' predictors variables.
+#' @param cOutcome Column number of the outcome variable. \code{cOutcome=2}
#' means that the second column of the dataset is the outcome variable.
-#' @param cGenPreds Column numbers of genetic variables for which the ORs
+#' @param cGenPreds Column numbers of genetic variables for which the ORs
#' are calculated.
-#' @param filenameGeno Name of the output file in which the univariate ORs
-#' and frequencies per genotype will be saved. The file is saved in the working directory as
+#' @param filenameGeno Name of the output file in which the univariate ORs
+#' and frequencies per genotype will be saved. The file is saved in the working directory as
#' a txt file. When no \code{filenameGeno} is specified, the output is not saved.
-#' @param filenameAllele Name of the output file in which the univariate ORs and
-#' frequencies per allele will be saved. The file is saved in the working
+#' @param filenameAllele Name of the output file in which the univariate ORs and
+#' frequencies per allele will be saved. The file is saved in the working
#' directory as a txt file. When no \code{filenameAllele} is specified, the output is not saved.
-#'
-#' @return The function returns two different tables. One table contains genotype frequencies
-#' and univariate ORs with 95\% CIs and the other contains allele frequencies and
+#'
+#' @return The function returns two different tables. One table contains genotype frequencies
+#' and univariate ORs with 95\% CIs and the other contains allele frequencies and
#' univariate ORs with 95\% CIs.
#'
#'
#' @keywords manip
-#'
#'
-#' @seealso \code{\link{ORmultivariate}}
+#'
+#' @seealso \code{\link{ORmultivariate}}
#' @examples
-#' # specify dataset with outcome and predictor variables
+#' # specify dataset with outcome and predictor variables
#' data(ExampleData)
#' # specify column number of the outcome variable
-#' cOutcome <- 2
+#' cOutcome <- 2
#' # specify column numbers of genetic predictors
#' cGenPreds <- c(11:13,16)
#'
#' # compute univariate ORs
-#' ORunivariate(data=ExampleData, cOutcome=cOutcome, cGenPreds=cGenPreds,
+#' ORunivariate(data=ExampleData, cOutcome=cOutcome, cGenPreds=cGenPreds,
#' filenameGeno="GenoOR.txt", filenameAllele="AlleleOR.txt")
-#'
+#'
"ORunivariate" <- function(data, cOutcome,cGenPreds,filenameGeno, filenameAllele )
{
p<- data[,cGenPreds ] # p : A table of Genotype data for all SNP's
@@ -529,7 +529,7 @@
"CI-Low","CI-high")
names(dimnames(m1)) <- c("", " Genotype frequencies for Cases & Controls, and OR(95% CI)")
if (!missing(filenameGeno))
- write.table(m1,file=filenameGeno, row.names = FALSE,sep = "\t")
+ write.table(m1,file=filenameGeno, row.names = FALSE,sep = "\t")
n1 <- as.table(n1)
dimnames(n1)[[1]] <- c(1:dim(p)[2])
@@ -542,32 +542,32 @@
p<- list(Genotype =m1, Allelic=n1)
return(p)
}
-#' Function to obtain multivariate odds ratios from a logistic regression model.
-#' The function estimates multivariate (adjusted) odds ratios (ORs) with
-#' 95\% confidence intervals (CIs) for all the genetic and non-genetic variables
-#' in the risk model.
+#' Function to obtain multivariate odds ratios from a logistic regression model.
+#' The function estimates multivariate (adjusted) odds ratios (ORs) with
+#' 95\% confidence intervals (CIs) for all the genetic and non-genetic variables
+#' in the risk model.
#'
-#' The function requires that first a logistic regression
-#' model is fitted either by using \code{GLM} function or the function
-#' \code{\link{fitLogRegModel}}. In addition to the multivariate ORs,
-#' the function returns summary statistics of model performance, namely the Brier
-#' score and the Nagelkerke's \eqn{R^2} value.
-#' The Brier score quantifies the accuracy of risk predictions by comparing
-#' predicted risks with observed outcomes at individual level (where outcome
-#' values are either 0 or 1). The Nagelkerke's \eqn{R^2} value indicates the percentage of variation
+#' The function requires that first a logistic regression
[TRUNCATED]
To get the complete diff run:
svnlook diff /svnroot/genabel -r 789
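The `riskScore()` documentation in this diff describes the weighted score as the number of risk alleles multiplied by their beta coefficients and summed, and the unweighted score as a simple risk-allele count. The following base-R sketch shows both calculations. It assumes that every variant is coded 0,1,2 as the risk-allele count and that all effects point in the risk direction. The SNP names and beta values are made up for illustration and do not come from any model.

```r
# Sketch only: hypothetical SNPs and made-up beta coefficients.
genos <- data.frame(
  rs1 = c(0, 1, 2, 1),
  rs2 = c(2, 0, 1, 1),
  rs3 = c(1, 1, 0, 2)
)
betas <- c(rs1 = 0.25, rs2 = 0.10, rs3 = 0.05)

x <- as.matrix(genos)
# every genetic column needs a weight, mirroring the check in riskScore()
stopifnot(all(colnames(x) %in% names(betas)))
# align the weights to the genotype columns by name
y <- betas[colnames(x)]

# weighted score: sum of beta * allele count (same arithmetic as y %*% t(x))
wrs <- as.vector(y %*% t(x))

# unweighted score: simple risk-allele count per individual
urs <- as.vector(rowSums(x))

wrs  # 0.25 0.30 0.60 0.45
urs  # 3 2 3 4
```

Selecting the weights by name, rather than by position, prevents a misaligned coefficient vector from silently producing wrong scores.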
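The package documentation in this diff recommends two preparation steps. The first is removing subjects with missing data using `na.omit`. The second is exchanging tab-delimited files with `write.table(..., row.names=FALSE, sep="\t")` and `read.table(..., header=T, sep="\t")`. The sketch below runs both steps on a small made-up data frame. It writes to a `tempfile()` in place of `"Name.txt"` so that the working directory is left untouched.

```r
# Sketch only: 'DataFile' is a made-up example, not PredictABEL data.
DataFile <- data.frame(
  ID      = 1:5,
  Outcome = c(1, 0, NA, 1, 0),
  rs1     = c(0, 1, 2, NA, 1)
)

# keep only subjects without missing values (stats::na.omit)
DataFileNew <- na.omit(DataFile)

# export with variable names in the first row, tab-separated, no row names
path <- tempfile(fileext = ".txt")
write.table(DataFileNew, file = path, row.names = FALSE, sep = "\t")

# re-import; header = TRUE reads the first row as column names
ExampleData <- read.table(path, header = TRUE, sep = "\t")
ExampleData
```

Subjects 3 and 4 each have one missing value, so only IDs 1, 2 and 5 remain after `na.omit`.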