The score test on 2 df is almost the same as the Wald test on 2 df: you do the same math, but the input var-cov matrix is different (estimated under the null for score and under the alternative for Wald).<div><br></div><div>Robust: I am not at all sure you can use the LRT here. These two may be theoretically incompatible (just as you can do an LRT if you use maximum likelihood, but not when you use restricted maximum likelihood; it is simply mathematically incorrect). But again, I am not 100% sure. If ProbABEL can "technically" do that, it is worth figuring this out and either disabling it or giving a BIG warning.</div>
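<div><br></div><div>To make the "same math" remark concrete, here is a minimal sketch of the 2 df quadratic form b' V^-1 b for the two SNP coefficients. It is only an illustration under stated assumptions: the function names chi2_2df() and pvalue_2df() are hypothetical and not part of ProbABEL, and the caller is assumed to have already extracted the 2x2 SNP block of the var-cov matrix (from the fit under the alternative for Wald, or from the null fit for score, where the vector would be the score vector on the matching scale).</div>

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical helper (not a ProbABEL function): quadratic form
// b' V^{-1} b for two coefficients b = (b1, b2) and the symmetric
// 2x2 var-cov block V = [[v11, v12], [v12, v22]].
double chi2_2df(double b1, double b2, double v11, double v12, double v22)
{
    const double det = v11 * v22 - v12 * v12;
    if (!(det > 0.0))
    {
        // V must be positive definite for the statistic to be defined.
        throw std::domain_error("var-cov block is not positive definite");
    }
    // V^{-1} = (1/det) * [[v22, -v12], [-v12, v11]]
    return (b1 * b1 * v22 - 2.0 * b1 * b2 * v12 + b2 * b2 * v11) / det;
}

// Upper-tail p-value of a chi^2 distribution with exactly 2 df,
// for which the survival function has the closed form exp(-x/2).
double pvalue_2df(double chi2)
{
    return std::exp(-0.5 * chi2);
}
```

<div>With a diagonal V the result reduces to (b1/se1)^2 + (b2/se2)^2, which gives a quick sanity check against the 1 df (beta/se_beta)^2 values. Whether the extracted block really is the right sub-matrix of the var-cov matrix is exactly the part that has to be verified in the real code.</div>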
<div><br></div><div>In contrast, you can combine score/Wald with "robust".</div><div><br></div><div>Best wishes,</div><div>Y<br><br><div class="gmail_quote">On Wed, Aug 14, 2013 at 12:11 PM, L.C. Karssen <span dir="ltr"><<a href="mailto:lennart@karssen.org" target="_blank">lennart@karssen.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 13-08-13 17:53, Yurii Aulchenko wrote:<br>
> This is a long-awaited improvement! Great work!<br>
<br>
</div>Thanks! As always it was a learning experience. With these larger<br>
changes you get to know the code better and better. And learn more<br>
statistics along the way :-).<br>
<br>
I decided to go for LRT instead of the Wald test for two reasons:<br>
- LRT is theoretically superior<br>
- I found the equation for the Wald on 2df in the ProbABEL paper, but<br>
programming-wise I couldn't get it to work. The coxfit2() function for<br>
example says it returns the covariance matrix, but after extracting the<br>
sub-matrices I still didn't get answers that were close to the<br>
(beta/se_beta)^2 values for the 1df case. So, after spending some time on<br>
this, implementing the LRT (recalculating the null model only in the case<br>
of missing genotype data) was the quickest option.<br>
<br>
I've also added the LRT-chi^2 for Cox and logistic regression now, as<br>
well as R-based consistency checks.<br>
<br>
I haven't looked at the output when using the score option or the robust<br>
option at all yet. Any idea if that will require a lot of additional<br>
programming? Various people are waiting for the fixed Cox regression, so<br>
I would like to put out a new ProbABEL release ASAP.<br>
<br>
<br>
Thanks,<br>
<br>
Lennart.<br>
<div><div class="h5"><br>
<br>
><br>
> ----------------------<br>
> Yurii Aulchenko<br>
> (sent from mobile device)<br>
><br>
> On 8 Aug 2013, at 15:19, "<a href="mailto:noreply@r-forge.r-project.org">noreply@r-forge.r-project.org</a>"<br>
> <<a href="mailto:noreply@r-forge.r-project.org">noreply@r-forge.r-project.org</a>> wrote:<br>
><br>
>> Author: lckarssen<br>
>> Date: 2013-08-08 13:19:30 +0200 (Thu, 08 Aug 2013)<br>
>> New Revision: 1286<br>
>><br>
>> Modified:<br>
>> pkg/ProbABEL/src/eigen_mematrix.cpp<br>
>> pkg/ProbABEL/src/eigen_mematrix.h<br>
>> pkg/ProbABEL/src/main.cpp<br>
>> pkg/ProbABEL/src/mematri1.h<br>
>> pkg/ProbABEL/src/mematrix.h<br>
>> pkg/ProbABEL/src/reg1.cpp<br>
>> pkg/ProbABEL/src/regdata.cpp<br>
>> pkg/ProbABEL/src/regdata.h<br>
>> Log:<br>
>> Added chi^2 information to the ProbABEL output for linear regression.<br>
>> NOTE: for palogist and pacoxph this still needs to be fixed!!!<br>
>><br>
>> The chi^2 values are based on the LRT. The null model is calculated at<br>
>> the beginning (this was already part of ProbABEL for a long time). In<br>
>> the case of missing genotype data the null model is recalculated for<br>
>> that SNP only. So for people with imputed data there should be no<br>
>> difference in computation time.<br>
>><br>
>> This is a bit of a rough implementation. Maybe some more work is<br>
>> needed to make it better (in terms of programming style/efficiency).<br>
>><br>
>> Changes per file:<br>
>> src/main.cpp:<br>
>> - Some small (unrelated) changes to the way progress information is printed<br>
>> - Changed output precision of beta, se_beta, chi^2 to 6 instead of 9 digits<br>
>> - around line 700 is where the recalculation of the null model is done.<br>
>> src/regdata.h, src/regdata.cpp:<br>
>> - Add a function remove_snp_from_X() that removes the genotype data<br>
>> from the design matrix. This is necessary, because in order to know<br>
>> which individuals have missing genotype data (and therefore should<br>
>> be excluded from the null estimation), we first need to have the<br>
>> genotype data in.<br>
>> src/reg1.cpp:<br>
>> - At the beginning of apply_model() check if we are calculating the<br>
>> null model. if so, we don't need to apply the genotypic model at<br>
>> all.<br>
>> src/eigen_mematrix.h, src/eigen_mematrix.cpp:<br>
>> - Implement the delete_column() function. When transitioning to Eigen<br>
>> this function wasn't used anywhere in the code, so it wasn't<br>
>> carried over from the mematrix files.<br>
>> src/mematri1.h, src/mematrix.h:<br>
>> - Set the col/row number argument to const in the delete_column() and<br>
>> delete_row() functions.<br>
>><br>
>><br>
>><br>
>><br>
>><br>
>> Modified: pkg/ProbABEL/src/eigen_mematrix.cpp<br>
>> ===================================================================<br>
>> --- pkg/ProbABEL/src/eigen_mematrix.cpp 2013-08-08 10:07:32 UTC (rev 1285)<br>
>> +++ pkg/ProbABEL/src/eigen_mematrix.cpp 2013-08-08 11:19:30 UTC (rev 1286)<br>
>> @@ -362,4 +362,30 @@<br>
>> return temp;<br>
>> }<br>
>><br>
>> +<br>
>> +template<class DT><br>
>> +void mematrix<DT>::delete_column(const int delcol)<br>
>> +{<br>
>> + if (delcol > ncol || delcol < 0)<br>
>> + {<br>
>> + fprintf(stderr, "mematrix::delete_column: column out of range\n");<br>
>> + exit(1);<br>
>> + }<br>
>> +<br>
>> + // Eigen::Matrix<DT,-1,-1,0,-1,-1> *auxdata =<br>
>> + // new Eigen::Matrix<DT,-1,-1,0,-1,-1>;<br>
>> + MatrixXd auxdata = data;<br>
>> +<br>
>> + data.resize(data.rows(), data.cols()-1);<br>
>> +<br>
>> + int rightColsSize = auxdata.cols() - delcol - 1;<br>
>> +<br>
>> + data.leftCols(delcol) = auxdata.leftCols(delcol);<br>
>> + data.rightCols(rightColsSize) = auxdata.rightCols(rightColsSize);<br>
>> +<br>
>> + ncol--;<br>
>> +}<br>
>> +<br>
>> +<br>
>> +<br>
>> #endif<br>
>><br>
>> Modified: pkg/ProbABEL/src/eigen_mematrix.h<br>
>> ===================================================================<br>
>> --- pkg/ProbABEL/src/eigen_mematrix.h 2013-08-08 10:07:32 UTC (rev 1285)<br>
>> +++ pkg/ProbABEL/src/eigen_mematrix.h 2013-08-08 11:19:30 UTC (rev 1286)<br>
>> @@ -37,6 +37,8 @@<br>
>> mematrix operator*(const mematrix &M);<br>
>> mematrix operator*(const mematrix *M);<br>
>><br>
>> + void delete_column(const int delcol);<br>
>> +<br>
>> void reinit(int nr, int nc);<br>
>><br>
>> unsigned int getnrow(void)<br>
>><br>
>> Modified: pkg/ProbABEL/src/main.cpp<br>
>> ===================================================================<br>
>> --- pkg/ProbABEL/src/main.cpp 2013-08-08 10:07:32 UTC (rev 1285)<br>
>> +++ pkg/ProbABEL/src/main.cpp 2013-08-08 11:19:30 UTC (rev 1286)<br>
>> @@ -208,9 +208,9 @@<br>
>> << input_var.getSep()<br>
>> << "sebeta_SNP_recA1";<br>
>> *outfile[4] << input_var.getSep()<br>
>> - << "beta_SNP_odom"<br>
>> + << "beta_SNP_odomA1"<br>
>> << input_var.getSep()<br>
>> - << "sebeta_SNP_odom";<br>
>> + << "sebeta_SNP_odomA1";<br>
>> if (input_var.getInteraction() != 0)<br>
>> {<br>
>> //Han Chen<br>
>> @@ -263,7 +263,7 @@<br>
>> *outfile[1] << input_var.getSep() << "chi2_SNP_A1\n"; // "loglik\n";<br>
>> *outfile[2] << input_var.getSep() << "chi2_SNP_domA1\n";// "loglik\n";<br>
>> *outfile[3] << input_var.getSep() << "chi2_SNP_recA1\n";// "loglik\n";<br>
>> - *outfile[4] << input_var.getSep() << "chi2_SNP_odom\n"; // "loglik\n";<br>
>> + *outfile[4] << input_var.getSep() << "chi2_SNP_odomA1\n"; // "loglik\n";<br>
>> }<br>
>><br>
>> void create_header2(std::vector<std::ofstream*>& outfile, cmdvars& input_var,<br>
>> @@ -389,7 +389,7 @@<br>
>><br>
>> masked_matrix invvarmatrix;<br>
>><br>
>> - std::cout << "Reading data ..." << std::flush;<br>
>> + std::cout << "Reading data..." << std::flush;<br>
>> if (input_var.getInverseFilename() != NULL)<br>
>> {<br>
>> loadInvSigma(input_var, phd, invvarmatrix);<br>
>> @@ -412,7 +412,7 @@<br>
>> phd.allmeasured, phd.idnames);<br>
>> }<br>
>><br>
>> - std::cout << " loaded genotypic data ..." << std::flush;<br>
>> + std::cout << " loaded genotypic data..." << std::flush;<br>
>><br>
>> // estimate null model<br>
>> #if COXPH<br>
>> @@ -421,7 +421,7 @@<br>
>> regdata nrgd = regdata(phd, gtd, -1, input_var.isIsInteractionExcluded());<br>
>> #endif<br>
>><br>
>> - std::cout << " loaded null data ..." << std::flush;<br>
>> + std::cout << " loaded null data..." << std::flush;<br>
>> #if LOGISTIC<br>
>> logistic_reg nrd = logistic_reg(nrgd);<br>
>> nrd.estimate(nrgd, 0, MAXITER, EPS, CHOLTOL, 0,<br>
>> @@ -446,14 +446,14 @@<br>
>> #endif<br>
>> double null_loglik = nrd.loglik;<br>
>><br>
>> - std::cout << " estimated null model ...";<br>
>> + std::cout << " estimated null model...";<br>
>> // end null<br>
>> #if COXPH<br>
>> coxph_data rgd(phd, gtd, 0);<br>
>> #else<br>
>> regdata rgd(phd, gtd, 0, input_var.isIsInteractionExcluded());<br>
>> #endif<br>
>> - std::cout << " formed regression object ...";<br>
>> + std::cout << " formed regression object...\n";<br>
>><br>
>><br>
>> // Open a vector of files that will be used for output. Depending<br>
>> @@ -505,13 +505,16 @@<br>
>> for (int i = 0; i < maxmod; i++)<br>
>> {<br>
>> beta_sebeta.push_back(new std::ostringstream());<br>
>> - beta_sebeta[i]->precision(9);<br>
>> + beta_sebeta[i]->precision(6);<br>
>> + //*beta_sebeta[i] << scientific;<br>
>> //Han Chen<br>
>> covvalue.push_back(new std::ostringstream());<br>
>> - covvalue[i]->precision(9);<br>
>> + covvalue[i]->precision(6);<br>
>> + //*covvalue[i] << scientific;<br>
>> //Oct 26, 2009<br>
>> chi2.push_back(new std::ostringstream());<br>
>> - chi2[i]->precision(9);<br>
>> + chi2[i]->precision(6);<br>
>> + //*chi2[i] << scientific;<br>
>> }<br>
>><br>
>><br>
>> @@ -565,10 +568,10 @@<br>
>> poly = 0;<br>
>> }<br>
>><br>
>> + // Write mlinfo information to the output file(s)<br>
>> // Prob data: All models output. One file per model<br>
>> if (input_var.getNgpreds() == 2)<br>
>> {<br>
>> - // Write mlinfo to output:<br>
>> for (unsigned int file = 0; file < outfile.size(); file++)<br>
>> {<br>
>> write_mlinfo(outfile, file, mli, csnp, input_var,<br>
>> @@ -679,7 +682,7 @@<br>
>> } // END for(pos = start_pos; pos < rd.beta.nrow; pos++)<br>
>><br>
>><br>
>> - //calculate chi2<br>
>> + //calculate chi^2<br>
>> //________________________________<br>
>> //cout << rd.loglik<<" "<<input_var.getNgpreds() << "\n";<br>
>><br>
>> @@ -690,23 +693,41 @@<br>
>><br>
>> if (input_var.getScore() == 0)<br>
>> {<br>
>> + double loglik = rd.loglik;<br>
>> if (gcount != gtd.nids)<br>
>> {<br>
>> // If SNP data is missing we didn't<br>
>> // correctly compute the null likelihood<br>
>> - *chi2[model] << "NaN";<br>
>> +<br>
>> + // Recalculate null likelihood by<br>
>> + // stripping the SNP data column(s) from<br>
>> + // the X matrix in the regression object<br>
>> + // and run the null model estimation again<br>
>> + // for this SNP.<br>
>> +// BEWARE, ONLY IMPLEMENTED FOR LINEAR REG!!!<br>
>> +// TODO LCK<br>
>> +#ifdef LINEAR<br>
>> + regdata new_rgd = rgd;<br>
>> + new_rgd.remove_snp_from_X();<br>
>> + linear_reg new_null_rd(new_rgd);<br>
>> + new_null_rd.estimate(new_rgd, 0, CHOLTOL, model,<br>
>> + input_var.getInteraction(),<br>
>> + input_var.getNgpreds(),<br>
>> + invvarmatrix,<br>
>> + input_var.getRobust(), 1);<br>
>> +<br>
>> + *chi2[model] << 2. * (loglik - new_null_rd.loglik);<br>
>> +#endif<br>
>> }<br>
>> else<br>
>> {<br>
>> // No missing SNP data, we can compute the LRT<br>
>> - *chi2[model] << 2. * (rd.loglik - null_loglik);<br>
>> + *chi2[model] << 2. * (loglik - null_loglik);<br>
>> }<br>
>> - //*chi2[model] << rd.loglik;<br>
>> } else<br>
>> {<br>
>> // We want score test output<br>
>> *chi2[model] << rd.chi2_score;<br>
>> - //*chi2[model] << "nan";<br>
>> }<br>
>> }<br>
>> } // END first part of if(poly); allele not too rare<br>
>><br>
>> Modified: pkg/ProbABEL/src/mematri1.h<br>
>> ===================================================================<br>
>> --- pkg/ProbABEL/src/mematri1.h 2013-08-08 10:07:32 UTC (rev 1285)<br>
>> +++ pkg/ProbABEL/src/mematri1.h 2013-08-08 11:19:30 UTC (rev 1286)<br>
>> @@ -301,7 +301,7 @@<br>
>> }<br>
>><br>
>> template<class DT><br>
>> -void mematrix<DT>::delete_column(int delcol)<br>
>> +void mematrix<DT>::delete_column(const int delcol)<br>
>> {<br>
>> if (delcol > ncol || delcol < 0)<br>
>> {<br>
>> @@ -333,7 +333,7 @@<br>
>> }<br>
>><br>
>> template<class DT><br>
>> -void mematrix<DT>::delete_row(int delrow)<br>
>> +void mematrix<DT>::delete_row(const int delrow)<br>
>> {<br>
>> if (delrow > nrow || delrow < 0)<br>
>> {<br>
>><br>
>> Modified: pkg/ProbABEL/src/mematrix.h<br>
>> ===================================================================<br>
>> --- pkg/ProbABEL/src/mematrix.h 2013-08-08 10:07:32 UTC (rev 1285)<br>
>> +++ pkg/ProbABEL/src/mematrix.h 2013-08-08 11:19:30 UTC (rev 1286)<br>
>> @@ -48,8 +48,8 @@<br>
>> void put(DT value, int nr, int nc);<br>
>> DT column_mean(int nc);<br>
>> void print(void);<br>
>> - void delete_column(int delcol);<br>
>> - void delete_row(int delrow);<br>
>> + void delete_column(const int delcol);<br>
>> + void delete_row(const int delrow);<br>
>><br>
>> };<br>
>><br>
>><br>
>> Modified: pkg/ProbABEL/src/reg1.cpp<br>
>> ===================================================================<br>
>> --- pkg/ProbABEL/src/reg1.cpp 2013-08-08 10:07:32 UTC (rev 1285)<br>
>> +++ pkg/ProbABEL/src/reg1.cpp 2013-08-08 11:19:30 UTC (rev 1286)<br>
>> @@ -4,12 +4,22 @@<br>
>> mematrix<double> apply_model(mematrix<double>& X, int model, int interaction,<br>
>> int ngpreds, bool is_interaction_excluded,<br>
>> bool iscox, int nullmodel)<br>
>> +// if ngpreds==1 (dose data):<br>
>> +// model 0 = additive 1 df<br>
>> +// if ngpreds==2 (prob data):<br>
>> // model 0 = 2 df<br>
>> // model 1 = additive 1 df<br>
>> // model 2 = dominant 1 df<br>
>> // model 3 = recessive 1 df<br>
>> // model 4 = over-dominant 1 df<br>
>> {<br>
>> + if(nullmodel)<br>
>> + {<br>
>> + // No need to apply any genotypic model when calculating the<br>
>> + // null model<br>
>> + return (X);<br>
>> + }<br>
>> +<br>
>> if (model == 0)<br>
>> {<br>
>> if (interaction != 0 && !nullmodel)<br>
>> @@ -295,12 +305,13 @@<br>
>> if (verbose)<br>
>> {<br>
>> cout << rdata.is_interaction_excluded<br>
>> - << " <-irdata.is_interaction_excluded\n";<br>
>> + << " <-rdata.is_interaction_excluded\n";<br>
>> // std::cout << "invvarmatrix:\n";<br>
>> // invvarmatrixin.masked_data->print();<br>
>> std::cout << "rdata.X:\n";<br>
>> rdata.X.print();<br>
>> }<br>
>> +<br>
>> mematrix<double> X = apply_model(rdata.X, model, interaction, ngpreds,<br>
>> rdata.is_interaction_excluded, false,<br>
>> nullmodel);<br>
>> @@ -311,6 +322,7 @@<br>
>> std::cout << "Y:\n";<br>
>> rdata.Y.print();<br>
>> }<br>
>> +<br>
>> int length_beta = X.ncol;<br>
>> beta.reinit(length_beta, 1);<br>
>> sebeta.reinit(length_beta, 1);<br>
>><br>
>> Modified: pkg/ProbABEL/src/regdata.cpp<br>
>> ===================================================================<br>
>> --- pkg/ProbABEL/src/regdata.cpp 2013-08-08 10:07:32 UTC (rev 1285)<br>
>> +++ pkg/ProbABEL/src/regdata.cpp 2013-08-08 11:19:30 UTC (rev 1286)<br>
>> @@ -39,7 +39,7 @@<br>
>><br>
>> for (int i = 0; i < nids; i++)<br>
>> {<br>
>> - masked_data[i] = 0;<br>
>> + masked_data[i] = obj.masked_data[i];<br>
>> }<br>
>> }<br>
>><br>
>> @@ -95,6 +95,9 @@<br>
>><br>
>> void regdata::update_snp(gendata &gend, int snpnum)<br>
>> {<br>
>> + // Add genotypic data (dosage or probabilities) to the design<br>
>> + // matrix X.<br>
>> +<br>
>> for (int j = 0; j < ngpreds; j++)<br>
>> {<br>
>> double snpdata[nids];<br>
>> @@ -109,11 +112,34 @@<br>
>> {<br>
>> X.put(snpdata[i], i, (ncov - j));<br>
>> if (isnan(snpdata[i]))<br>
>> + {<br>
>> masked_data[i] = 1;<br>
>> + }<br>
>> }<br>
>> }<br>
>> }<br>
>><br>
>> +void regdata::remove_snp_from_X()<br>
>> +{<br>
>> + // update_snp() adds SNP information to the design matrix. This<br>
>> + // function allows you to strip that information from X again.<br>
>> + // This is used for example when calculating the null model.<br>
>> +<br>
>> + if(ngpreds == 1)<br>
>> + {<br>
>> + X.delete_column(X.ncol -1);<br>
>> + }<br>
>> + else if(ngpreds == 2)<br>
>> + {<br>
>> + X.delete_column(X.ncol -1);<br>
>> + X.delete_column(X.ncol -1);<br>
>> + }<br>
>> + else<br>
>> + {<br>
>> + cerr << "ngpreds should be 1 or 2. you should never come here!\n";<br>
>> + }<br>
>> +}<br>
>> +<br>
>> regdata::~regdata()<br>
>> {<br>
>> delete[] regdata::masked_data;<br>
>><br>
>> Modified: pkg/ProbABEL/src/regdata.h<br>
>> ===================================================================<br>
>> --- pkg/ProbABEL/src/regdata.h 2013-08-08 10:07:32 UTC (rev 1285)<br>
>> +++ pkg/ProbABEL/src/regdata.h 2013-08-08 11:19:30 UTC (rev 1286)<br>
>> @@ -34,6 +34,7 @@<br>
>> bool ext_is_interaction_excluded);<br>
>> mematrix<double> extract_genotypes();<br>
>> void update_snp(gendata &gend, int snpnum);<br>
>> + void remove_snp_from_X();<br>
>> regdata get_unmasked_data();<br>
>> ~regdata();<br>
>><br>
>><br>
>> _______________________________________________<br>
>> Genabel-commits mailing list<br>
>> <a href="mailto:Genabel-commits@lists.r-forge.r-project.org">Genabel-commits@lists.r-forge.r-project.org</a><br>
>> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits</a><br>
</div></div>> _______________________________________________<br>
> genabel-devel mailing list<br>
> <a href="mailto:genabel-devel@lists.r-forge.r-project.org">genabel-devel@lists.r-forge.r-project.org</a><br>
> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel</a><br>
><br>
<span class="HOEnZb"><font color="#888888"><br>
<br>
--<br>
-----------------------------------------------------------------<br>
L.C. Karssen<br>
Utrecht<br>
The Netherlands<br>
<br>
<a href="mailto:lennart@karssen.org">lennart@karssen.org</a><br>
<a href="http://blog.karssen.org" target="_blank">http://blog.karssen.org</a><br>
<br>
Please do not send me Word or PowerPoint files!<br>
See <a href="http://www.gnu.org/philosophy/no-word-attachments.nl.html" target="_blank">http://www.gnu.org/philosophy/no-word-attachments.nl.html</a><br>
------------------------------------------------------------------<br>
<br>
</font></span><br></blockquote></div><br><br clear="all">
<div><br></div>-- <br>-----------------------------------------------------<br>Yurii S. Aulchenko<br><div><br></div><div>[ <a href="http://nl.linkedin.com/in/yuriiaulchenko" target="_blank">LinkedIn</a> ] [ <a href="http://twitter.com/YuriiAulchenko" target="_blank">Twitter</a> ] [ <a href="http://yurii-aulchenko.blogspot.nl/" target="_blank">Blog</a> ]</div>
</div>