[GenABEL-dev] [Genabel-commits] r902 - in branches/ProbABEL-pacox/v.0.1-3: examples examples/test_cox_ngpreds2 src
Yury Aulchenko
yurii.aulchenko at gmail.com
Sun Apr 29 17:39:16 CEST 2012
Dear Lennart,
You were absolutely right - the problem with bug [#1846] seems to be in 'update_snp' function for the cox data type.
Looking at the analysis of different models (ProbABEL vs R script examples/test_cox_ngpreds2/run_models_in_R.R) it appeared that P(A1A1) and P(A1A2) are "swapped" when Cox model is analyzed. Therefore it looks like changing single line
// OLD VARIANT
// X.put((gend.G).get(i,(snpnum*ngpreds+j)),(ncov-ngpreds+j),order[i]);
to
X.put((gend.G).get(i,(snpnum*ngpreds+j)),(ncov-j-1),order[i]);
solves the problem -- or at least now results using ProbABEL and R script examples/test_cox_ngpreds2/run_models_in_R.R are almost equivalent.
The fact that the results are "only" 99.99% (not 100%) the same could be numeric - looking at the dose files, I see there is very little variance, so the genotypes are not very informative and convergence may be somewhat poor.
I think good news is that this fix can be easily transferred to the "mainstream" ProbABEL as it is not very complex change :)
Please have a look and if you agree we can probably roll this fix out (in 1.3-0, or may be try to integrate into "mainstream" before?) for further testing by the people interested.
best wishes,
Yurii
-------------------------------------------------------
Yurii Aulchenko, PhD, Dr. Habil.
Independent researcher and consultant
yurii [dot] aulchenko [at] gmail [dot] com
On Apr 29, 2012, at 5:23 PM, noreply at r-forge.r-project.org wrote:
> Author: yurii
> Date: 2012-04-29 17:23:11 +0200 (Sun, 29 Apr 2012)
> New Revision: 902
>
> Added:
> branches/ProbABEL-pacox/v.0.1-3/examples/example_palinear_simple.sh
> branches/ProbABEL-pacox/v.0.1-3/examples/test_cox_ngpreds2/
> branches/ProbABEL-pacox/v.0.1-3/examples/test_cox_ngpreds2/run_models_in_R.R
> branches/ProbABEL-pacox/v.0.1-3/examples/test_cox_ngpreds2/run_models_in_R_palinear.R
> Modified:
> branches/ProbABEL-pacox/v.0.1-3/src/data.h
> branches/ProbABEL-pacox/v.0.1-3/src/reg1.h
> Log:
> fixing bug [#1846] (results from mldose and mlprob are not the same in Cox analysis). It turns out that probabilities P(A1A1) and P(A1A2) were swapped specifically in Cox 'update_snp' procedure. Added R scripts testing all models (except 2df) via comparison of pacoxph and R.
>
> Added: branches/ProbABEL-pacox/v.0.1-3/examples/example_palinear_simple.sh
> ===================================================================
> --- branches/ProbABEL-pacox/v.0.1-3/examples/example_palinear_simple.sh (rev 0)
> +++ branches/ProbABEL-pacox/v.0.1-3/examples/example_palinear_simple.sh 2012-04-29 15:23:11 UTC (rev 902)
> @@ -0,0 +1,7 @@
> +echo "analysing Linear model"
> +
> +../bin/palinear -p height.txt -d test.mldose.2 -i test.mlinfo -m test.map -c 19 -o height.out.txt.dose
> +
> +../bin/palinear -p height.txt -d test.mlprob --ngpreds=2 -i test.mlinfo -m test.map -c 19 -o height.out.txt.prob
> +
> +diff height.out.txt.dose_add.out.txt height.out.txt.prob_add.out.txt
>
> Added: branches/ProbABEL-pacox/v.0.1-3/examples/test_cox_ngpreds2/run_models_in_R.R
> ===================================================================
> --- branches/ProbABEL-pacox/v.0.1-3/examples/test_cox_ngpreds2/run_models_in_R.R (rev 0)
> +++ branches/ProbABEL-pacox/v.0.1-3/examples/test_cox_ngpreds2/run_models_in_R.R 2012-04-29 15:23:11 UTC (rev 902)
> @@ -0,0 +1,92 @@
> +library(survival)
> +
> +###
> +# load the data
> +###
> +
> +# load phenotypic data
> +pheno <- read.table("../coxph_data.txt",head=TRUE,string=FALSE)
> +
> +# load genetic DOSE data
> +dose <- read.table("../test.mldose.2",head=FALSE,string=FALSE)
> +# remove "1->" from the names of dose-IDs
> +idNames <- dose[,1]
> +idNames <- sub("[0-9]+->","",idNames)
> +dose[,1] <- idNames
> +# check consistency of names
> +table(dose[,1] == pheno[,1])
> +
> +# load genetic PROB data
> +prob <- read.table("../test.mlprob",head=FALSE,string=FALSE)
> +# remove "1->" from the names of dose-IDs
> +idNames <- prob[,1]
> +idNames <- sub("[0-9]+->","",idNames)
> +prob[,1] <- idNames
> +# check consistency of names
> +table(prob[,1] == pheno[,1])
> +
> +# check consistency DOSE <-> PROB
> +doseFromProb <- matrix(NA,ncol=dim(dose)[2],nrow=dim(dose)[1])
> +for (i in 3:dim(dose)[2]) {
> + indexHom <- 3 + ( i - 3 ) * 2
> + indexHet <- indexHom + 1
> + doseFromProb[,i] <- prob[,indexHom]*2 + prob[,indexHet]
> + print( table( ( dose[,i] - doseFromProb[,i] ) > 1e-8 ) )
> +}
> +
> +###
> +# run analysis
> +###
> +
> +# run ProbABEL
> +system("cd ..; sh example_cox.sh; cd test_cox_ngpreds2")
> +resPaAddDose <- read.table("../coxph.out.txt.dose_add.out.txt", head=TRUE)[,c("beta_SNP_add","sebeta_SNP_add")]
> +resPaAddProb <- read.table("../coxph.out.txt.prob_add.out.txt", head=TRUE)[,c("beta_SNP_addA1","sebeta_SNP_addA1")]
> +resPaDom <- read.table("../coxph.out.txt.prob_domin.out.txt", head=TRUE)[,c("beta_SNP_domA1","sebeta_SNP_domA1")]
> +resPaRec <- read.table("../coxph.out.txt.prob_recess.out.txt", head=TRUE)[,c("beta_SNP_recA1","sebeta_SNP_recA1")]
> +resPaOdom <- read.table("../coxph.out.txt.prob_over_domin.out.txt", head=TRUE)[,c("beta_SNP_odom","sebeta_SNP_odom")]
> +
> +attach(pheno)
> +
> +# run analysis on dose
> +emptyRes <- resPaAddDose
> +emptyRes[] <- NA
> +resRAddDose <- resRAddProb <- resRDom <- resRRec <- resROdom <- emptyRes
> +for (i in 3:dim(dose)[2]) {
> + smr <- summary( coxph(Surv(fupt_chd,chd) ~ sex + age + othercov + dose[,i] ) )
> + resRAddDose[i-2,] <- smr$coef[4,c(1,3)]
> + smr <- summary( coxph(Surv(fupt_chd,chd) ~ sex + age + othercov + doseFromProb[,i] ) )
> + resRAddProb[i-2,] <- smr$coef[4,c(1,3)]
> +}
> +resPaAddDose/resRAddDose
> +resPaAddProb/resRAddProb
> +
> +# dominant model
> +for (i in 3:dim(dose)[2]) {
> + indexHom <- 3 + ( i - 3 ) * 2
> + indexHet <- indexHom + 1
> + regProb <- prob[,indexHom] + prob[,indexHet]
> + smr <- summary( coxph(Surv(fupt_chd,chd) ~ sex + age + othercov + regProb ) )
> + resRDom[i-2,] <- smr$coef[4,c(1,3)]
> +}
> +resPaDom/resRDom
> +
> +# recessive model
> +for (i in 3:dim(dose)[2]) {
> + indexHom <- 3 + ( i - 3 ) * 2
> + indexHet <- indexHom + 1
> + regProb <- prob[,indexHom]
> + smr <- summary( coxph(Surv(fupt_chd,chd) ~ sex + age + othercov + regProb ) )
> + resRRec[i-2,] <- smr$coef[4,c(1,3)]
> +}
> +resPaRec/resRRec
> +
> +# over-dominant model
> +for (i in 3:dim(dose)[2]) {
> + indexHom <- 3 + ( i - 3 ) * 2
> + indexHet <- indexHom + 1
> + regProb <- prob[,indexHet]
> + smr <- summary( coxph(Surv(fupt_chd,chd) ~ sex + age + othercov + regProb ) )
> + resROdom[i-2,] <- smr$coef[4,c(1,3)]
> +}
> +resPaOdom/resROdom
>
>
> Property changes on: branches/ProbABEL-pacox/v.0.1-3/examples/test_cox_ngpreds2/run_models_in_R.R
> ___________________________________________________________________
> Added: svn:mime-type
> + text/plain
>
> Added: branches/ProbABEL-pacox/v.0.1-3/examples/test_cox_ngpreds2/run_models_in_R_palinear.R
> ===================================================================
> --- branches/ProbABEL-pacox/v.0.1-3/examples/test_cox_ngpreds2/run_models_in_R_palinear.R (rev 0)
> +++ branches/ProbABEL-pacox/v.0.1-3/examples/test_cox_ngpreds2/run_models_in_R_palinear.R 2012-04-29 15:23:11 UTC (rev 902)
> @@ -0,0 +1,74 @@
> +###
> +# load the data
> +###
> +
> +# load phenotypic data
> +pheno <- read.table("../height.txt",head=TRUE,string=FALSE)
> +
> +# load genetic DOSE data
> +dose <- read.table("../test.mldose.2",head=FALSE,string=FALSE)
> +# remove "1->" from the names of dose-IDs
> +idNames <- dose[,1]
> +idNames <- sub("[0-9]+->","",idNames)
> +dose[,1] <- idNames
> +# check consistency of names
> +table(dose[,1] == pheno[,1])
> +
> +# load genetic PROB data
> +prob <- read.table("../test.mlprob",head=FALSE,string=FALSE)
> +# remove "1->" from the names of dose-IDs
> +idNames <- prob[,1]
> +idNames <- sub("[0-9]+->","",idNames)
> +prob[,1] <- idNames
> +# check consistency of names
> +table(prob[,1] == pheno[,1])
> +
> +# check consistency DOSE <-> PROB
> +doseFromProb <- matrix(NA,ncol=dim(dose)[2],nrow=dim(dose)[1])
> +for (i in 3:dim(dose)[2]) {
> + indexHom <- 3 + ( i - 3 ) * 2
> + indexHet <- indexHom + 1
> + doseFromProb[,i] <- prob[,indexHom]*2 + prob[,indexHet]
> + print( table( ( dose[,i] - doseFromProb[,i] ) > 1e-8 ) )
> +}
> +
> +###
> +# run analysis
> +###
> +
> +attach(pheno)
> +
> +# run analysis on dose
> +for (i in 3:dim(dose)[2]) {
> + smr <- summary( lm( height ~ sex + age + dose[,i] ) )
> + print( smr$coef[4,1:2] )
> + smr <- summary( lm( height ~ sex + age + doseFromProb[,i] ) )
> + print( smr$coef[4,1:2] )
> +}
> +
> +# dominant model
> +for (i in 3:dim(dose)[2]) {
> + indexHom <- 3 + ( i - 3 ) * 2
> + indexHet <- indexHom + 1
> + regProb <- prob[,indexHom] + prob[,indexHet]
> + smr <- summary( lm( height ~ sex + age + regProb ) )
> + print( smr$coef[4,1:2] )
> +}
> +
> +# recessive model
> +for (i in 3:dim(dose)[2]) {
> + indexHom <- 3 + ( i - 3 ) * 2
> + indexHet <- indexHom + 1
> + regProb <- prob[,indexHom]
> + smr <- summary( lm( height ~ sex + age + regProb ) )
> + print( smr$coef[4,1:2] )
> +}
> +
> +# over-dominant model
> +for (i in 3:dim(dose)[2]) {
> + indexHom <- 3 + ( i - 3 ) * 2
> + indexHet <- indexHom + 1
> + regProb <- prob[,indexHet]
> + smr <- summary( lm( height ~ sex + age + regProb ) )
> + print( smr$coef[4,1:2] )
> +}
>
>
> Property changes on: branches/ProbABEL-pacox/v.0.1-3/examples/test_cox_ngpreds2/run_models_in_R_palinear.R
> ___________________________________________________________________
> Added: svn:mime-type
> + text/plain
>
> Modified: branches/ProbABEL-pacox/v.0.1-3/src/data.h
> ===================================================================
> --- branches/ProbABEL-pacox/v.0.1-3/src/data.h 2012-04-29 11:38:41 UTC (rev 901)
> +++ branches/ProbABEL-pacox/v.0.1-3/src/data.h 2012-04-29 15:23:11 UTC (rev 902)
> @@ -362,7 +362,7 @@
> {
> for (int i=0;i<nids;i++)
> for (int j=0;j<ngpreds;j++)
> - X.put((gend.G).get(i,(snpnum*ngpreds+j)),i,(ncov+1-j-1));
> + X.put((gend.G).get(i,(snpnum*ngpreds+j)),i,(ncov-j));
> // X.put((gend.G).get(i,(snpnum*ngpreds+j)),i,(ncov-ngpreds+1+(ngpreds-j+1)));
> // X.put((gend.G).get(i,(snpnum*ngpreds+j)),i,(ncov-ngpreds+1+j));
> }
> @@ -491,10 +491,17 @@
> }
> void update_snp(gendata &gend, int snpnum)
> {
> - // note this sorts by "order"!!!
> - for (int i=0;i<nids;i++)
> - for (int j=0;j<ngpreds;j++)
> - X.put((gend.G).get(i,(snpnum*ngpreds+j)),(ncov-ngpreds+j),order[i]);
> + /** note this sorts by "order"!!!
> + * Here we deal with transposed X, hence last two arguments are swapped
> + * compared to the other 'update_snp'
> + * Also, the starting column-1 is not necessary for cox X therefore
> + * 'ncov-j' changes to 'ncov-j-1'
> + **/
> + for (int i=0;i<nids;i++)
> + for (int j=0;j<ngpreds;j++)
> + X.put((gend.G).get(i,(snpnum*ngpreds+j)),(ncov-j-1),order[i]);
> + // OLD VARIANT
> + // X.put((gend.G).get(i,(snpnum*ngpreds+j)),(ncov-ngpreds+j),order[i]);
> }
> ~coxph_data()
> {
>
> Modified: branches/ProbABEL-pacox/v.0.1-3/src/reg1.h
> ===================================================================
> --- branches/ProbABEL-pacox/v.0.1-3/src/reg1.h 2012-04-29 11:38:41 UTC (rev 901)
> +++ branches/ProbABEL-pacox/v.0.1-3/src/reg1.h 2012-04-29 15:23:11 UTC (rev 902)
> @@ -22,7 +22,8 @@
>
> #include <cmath>
>
> -mematrix<double> apply_model(mematrix<double> &X, int model, int interaction, int ngpreds, bool iscox=false, int nullmodel = 0)
> +mematrix<double> apply_model(mematrix<double> &X, int model, int interaction,
> + int ngpreds, bool iscox=false, int nullmodel = 0)
> // model 0 = 2 df
> // model 1 = additive 1 df
> // model 2 = dominant 1 df
> @@ -159,20 +160,20 @@
> {
> nX.reinit(X.nrow,(X.ncol-1));
> }
> - int c1 = X.ncol-2;
> - int c2 = X.ncol-1;
> + int c1 = X.ncol-2; // column with Prob(A1A2)
> + int c2 = X.ncol-1; // column with Prob(A1A1). Note the order is swapped cf the file!
> for (int i=0;i<X.nrow;i++)
> for (int j=0;j<(X.ncol-2);j++)
> nX[i*nX.ncol+j] = X[i*X.ncol+j];
> for (int i=0;i<nX.nrow;i++)
> {
> - if (model==1)
> + if (model==1) // additive
> nX[i*nX.ncol+c1] = X[i*X.ncol+c1] + 2.*X[i*X.ncol+c2];
> - else if (model==2)
> + else if (model==2) // dominant
> nX[i*nX.ncol+c1] = X[i*X.ncol+c1] + X[i*X.ncol+c2];
> - else if (model==3)
> + else if (model==3) // recessive
> nX[i*nX.ncol+c1] = X[i*X.ncol+c2];
> - else if (model==4)
> + else if (model==4) // over-dominant
> nX[i*nX.ncol+c1] = X[i*X.ncol+c1];
> if(interaction != 0)
> nX[i*nX.ncol+c2] = X[i*nX.ncol+interaction] * nX[i*nX.ncol+c1];//Maksim: interaction with SNP
>
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
More information about the genabel-devel
mailing list