[GenABEL-dev] [Genabel-commits] r1286 - pkg/ProbABEL/src
L.C. Karssen
lennart at karssen.org
Wed Aug 14 19:08:53 CEST 2013
Another question related to the LRT-based chi^2 implementation:
Is there a reason why LRT-based chi^2 values would be wrong/nonsense
when running palinear with the --mscore option?
At the moment there is a big "if( !mmscore )" around the chi^2 section
that was already in place from the old days when LRT-based chi^2 was in
ProbABEL (but didn't take missing genotypes into account).
I removed the if() and compared to the Wald statistic, the LRT chi^2
seems a bit off, but not by much.
Thanks for any insights!
Lennart.
On 14-08-13 10:11, L.C. Karssen wrote:
> On 13-08-13 17:53, Yurii Aulchenko wrote:
>> This is a long-awaited-for improvement! - great work!
>
> Thanks! As always it was a learning experience. With these larger
> changes you get to know the code better and better. And learn more
> statistics along the way :-).
>
> I decided to go for LRT instead of the Wald test for two reasons:
> - LRT is theoretically more superior
> - I found the equation for the Wald on 2df in the ProbABEL paper, but
> programming-wise I couldn't get it to work. The coxfit2() function for
> example says it returns the covariance matrix, but after extracting the
> sub-matrices I still didn't get answers that were close to the
> (beta/se_beta)^2 values for the 1df case. So, after spending some time,
> implementing the LRT while only recalculating the null model in the case
> of missing genotype data was the quickest.
>
> I've also added the LRT-chi^2 for Cox and logistic regression now, as
> well as R-based consistency checks.
>
> I haven't looked at the output when using the score option or the robust
> option at all yet. Any idea if that will require a lot of additional
> programming? Various people are waiting for the fixed Cox regression, so
> I would like to put out a new ProbABEL release ASAP.
>
>
> Thanks,
>
> Lennart.
>
>
>>
>> ----------------------
>> Yurii Aulchenko
>> (sent from mobile device)
>>
>> On 8 Aug 2013, at 15:19, "noreply at r-forge.r-project.org"
>> <noreply at r-forge.r-project.org> wrote:
>>
>>> Author: lckarssen
>>> Date: 2013-08-08 13:19:30 +0200 (Thu, 08 Aug 2013)
>>> New Revision: 1286
>>>
>>> Modified:
>>> pkg/ProbABEL/src/eigen_mematrix.cpp
>>> pkg/ProbABEL/src/eigen_mematrix.h
>>> pkg/ProbABEL/src/main.cpp
>>> pkg/ProbABEL/src/mematri1.h
>>> pkg/ProbABEL/src/mematrix.h
>>> pkg/ProbABEL/src/reg1.cpp
>>> pkg/ProbABEL/src/regdata.cpp
>>> pkg/ProbABEL/src/regdata.h
>>> Log:
>>> Added chi^2 information to the ProbABEL output for linear regression.
>>> NOTE: for palogist and pacoxph this still needs to be fixed!!!
>>>
>>> The chi^2 values are based on the LRT. The null model is calculated at
>>> the beginning (this was already part of ProbABEL for a long time). In
>>> the case of missing genotype data the null model is recalculated for
>>> that SNP only. So for people with imputed data there should be no
>>> difference in computation time.
>>>
>>> This is a bit of a rough implementation. Maybe some more work is
>>> needed to make it better (in terms of programming style/efficiency).
>>>
>>> Changes per file:
>>> src/main.cpp:
>>> - Some small (unrelated) changes to the way progress information is printed
>>> - Changed output precision of beta, se_beta, chi^2 to 6 instead of 9 digits
>>> - around line 700 is where the recalculation of the null model is done.
>>> src/regdata.h, src/regdata.cpp:
>>> - Add a function remove_snp_from_X() that removes the genotype data
>>> from the design matrix. This is necessary, because in order to know
>>> which individuals have missing genotype data (and therefore should
>>> be excluded from the null estimation), we first need to have the
>>> genotype data in.
>>> src/reg1.cpp:
>>> - At the beginning of apply_model() check if we are calculating the
>>> null model. if so, we don't need to apply the genotypic model at
>>> all.
>>> src/eigen_mematrix.h, src/eigen_mematrix.cpp:
>>> - Implement the delete_column() function. When transitioning to Eigen
>>> this function wasn't used anywhere in the code, so it wasn't
>>> carried over from the mematrix files.
>>> src/mematri1.h, src/mematrix.h:
>>> - Set the col/row number argument to const in the delete_column() and
>>> delete_row() functions.
>>>
>>>
>>>
>>>
>>>
>>> Modified: pkg/ProbABEL/src/eigen_mematrix.cpp
>>> ===================================================================
>>> --- pkg/ProbABEL/src/eigen_mematrix.cpp 2013-08-08 10:07:32 UTC (rev 1285)
>>> +++ pkg/ProbABEL/src/eigen_mematrix.cpp 2013-08-08 11:19:30 UTC (rev 1286)
>>> @@ -362,4 +362,30 @@
>>> return temp;
>>> }
>>>
>>> +
>>> +template<class DT>
>>> +void mematrix<DT>::delete_column(const int delcol)
>>> +{
>>> + if (delcol > ncol || delcol < 0)
>>> + {
>>> + fprintf(stderr, "mematrix::delete_column: column out of range\n");
>>> + exit(1);
>>> + }
>>> +
>>> + // Eigen::Matrix<DT,-1,-1,0,-1,-1> *auxdata =
>>> + // new Eigen::Matrix<DT,-1,-1,0,-1,-1>;
>>> + MatrixXd auxdata = data;
>>> +
>>> + data.resize(data.rows(), data.cols()-1);
>>> +
>>> + int rightColsSize = auxdata.cols() - delcol - 1;
>>> +
>>> + data.leftCols(delcol) = auxdata.leftCols(delcol);
>>> + data.rightCols(rightColsSize) = auxdata.rightCols(rightColsSize);
>>> +
>>> + ncol--;
>>> +}
>>> +
>>> +
>>> +
>>> #endif
>>>
>>> Modified: pkg/ProbABEL/src/eigen_mematrix.h
>>> ===================================================================
>>> --- pkg/ProbABEL/src/eigen_mematrix.h 2013-08-08 10:07:32 UTC (rev 1285)
>>> +++ pkg/ProbABEL/src/eigen_mematrix.h 2013-08-08 11:19:30 UTC (rev 1286)
>>> @@ -37,6 +37,8 @@
>>> mematrix operator*(const mematrix &M);
>>> mematrix operator*(const mematrix *M);
>>>
>>> + void delete_column(const int delcol);
>>> +
>>> void reinit(int nr, int nc);
>>>
>>> unsigned int getnrow(void)
>>>
>>> Modified: pkg/ProbABEL/src/main.cpp
>>> ===================================================================
>>> --- pkg/ProbABEL/src/main.cpp 2013-08-08 10:07:32 UTC (rev 1285)
>>> +++ pkg/ProbABEL/src/main.cpp 2013-08-08 11:19:30 UTC (rev 1286)
>>> @@ -208,9 +208,9 @@
>>> << input_var.getSep()
>>> << "sebeta_SNP_recA1";
>>> *outfile[4] << input_var.getSep()
>>> - << "beta_SNP_odom"
>>> + << "beta_SNP_odomA1"
>>> << input_var.getSep()
>>> - << "sebeta_SNP_odom";
>>> + << "sebeta_SNP_odomA1";
>>> if (input_var.getInteraction() != 0)
>>> {
>>> //Han Chen
>>> @@ -263,7 +263,7 @@
>>> *outfile[1] << input_var.getSep() << "chi2_SNP_A1\n"; // "loglik\n";
>>> *outfile[2] << input_var.getSep() << "chi2_SNP_domA1\n";// "loglik\n";
>>> *outfile[3] << input_var.getSep() << "chi2_SNP_recA1\n";// "loglik\n";
>>> - *outfile[4] << input_var.getSep() << "chi2_SNP_odom\n"; // "loglik\n";
>>> + *outfile[4] << input_var.getSep() << "chi2_SNP_odomA1\n"; // "loglik\n";
>>> }
>>>
>>> void create_header2(std::vector<std::ofstream*>& outfile, cmdvars& input_var,
>>> @@ -389,7 +389,7 @@
>>>
>>> masked_matrix invvarmatrix;
>>>
>>> - std::cout << "Reading data ..." << std::flush;
>>> + std::cout << "Reading data..." << std::flush;
>>> if (input_var.getInverseFilename() != NULL)
>>> {
>>> loadInvSigma(input_var, phd, invvarmatrix);
>>> @@ -412,7 +412,7 @@
>>> phd.allmeasured, phd.idnames);
>>> }
>>>
>>> - std::cout << " loaded genotypic data ..." << std::flush;
>>> + std::cout << " loaded genotypic data..." << std::flush;
>>>
>>> // estimate null model
>>> #if COXPH
>>> @@ -421,7 +421,7 @@
>>> regdata nrgd = regdata(phd, gtd, -1, input_var.isIsInteractionExcluded());
>>> #endif
>>>
>>> - std::cout << " loaded null data ..." << std::flush;
>>> + std::cout << " loaded null data..." << std::flush;
>>> #if LOGISTIC
>>> logistic_reg nrd = logistic_reg(nrgd);
>>> nrd.estimate(nrgd, 0, MAXITER, EPS, CHOLTOL, 0,
>>> @@ -446,14 +446,14 @@
>>> #endif
>>> double null_loglik = nrd.loglik;
>>>
>>> - std::cout << " estimated null model ...";
>>> + std::cout << " estimated null model...";
>>> // end null
>>> #if COXPH
>>> coxph_data rgd(phd, gtd, 0);
>>> #else
>>> regdata rgd(phd, gtd, 0, input_var.isIsInteractionExcluded());
>>> #endif
>>> - std::cout << " formed regression object ...";
>>> + std::cout << " formed regression object...\n";
>>>
>>>
>>> // Open a vector of files that will be used for output. Depending
>>> @@ -505,13 +505,16 @@
>>> for (int i = 0; i < maxmod; i++)
>>> {
>>> beta_sebeta.push_back(new std::ostringstream());
>>> - beta_sebeta[i]->precision(9);
>>> + beta_sebeta[i]->precision(6);
>>> + //*beta_sebeta[i] << scientific;
>>> //Han Chen
>>> covvalue.push_back(new std::ostringstream());
>>> - covvalue[i]->precision(9);
>>> + covvalue[i]->precision(6);
>>> + //*covvalue[i] << scientific;
>>> //Oct 26, 2009
>>> chi2.push_back(new std::ostringstream());
>>> - chi2[i]->precision(9);
>>> + chi2[i]->precision(6);
>>> + //*chi2[i] << scientific;
>>> }
>>>
>>>
>>> @@ -565,10 +568,10 @@
>>> poly = 0;
>>> }
>>>
>>> + // Write mlinfo information to the output file(s)
>>> // Prob data: All models output. One file per model
>>> if (input_var.getNgpreds() == 2)
>>> {
>>> - // Write mlinfo to output:
>>> for (unsigned int file = 0; file < outfile.size(); file++)
>>> {
>>> write_mlinfo(outfile, file, mli, csnp, input_var,
>>> @@ -679,7 +682,7 @@
>>> } // END for(pos = start_pos; pos < rd.beta.nrow; pos++)
>>>
>>>
>>> - //calculate chi2
>>> + //calculate chi^2
>>> //________________________________
>>> //cout << rd.loglik<<" "<<input_var.getNgpreds() << "\n";
>>>
>>> @@ -690,23 +693,41 @@
>>>
>>> if (input_var.getScore() == 0)
>>> {
>>> + double loglik = rd.loglik;
>>> if (gcount != gtd.nids)
>>> {
>>> // If SNP data is missing we didn't
>>> // correctly compute the null likelihood
>>> - *chi2[model] << "NaN";
>>> +
>>> + // Recalculate null likelihood by
>>> + // stripping the SNP data column(s) from
>>> + // the X matrix in the regression object
>>> + // and run the null model estimation again
>>> + // for this SNP.
>>> +// BEWARE, ONLY IMPLEMENTED FOR LINEAR REG!!!
>>> +// TODO LCK
>>> +#ifdef LINEAR
>>> + regdata new_rgd = rgd;
>>> + new_rgd.remove_snp_from_X();
>>> + linear_reg new_null_rd(new_rgd);
>>> + new_null_rd.estimate(new_rgd, 0, CHOLTOL, model,
>>> + input_var.getInteraction(),
>>> + input_var.getNgpreds(),
>>> + invvarmatrix,
>>> + input_var.getRobust(), 1);
>>> +
>>> + *chi2[model] << 2. * (loglik - new_null_rd.loglik);
>>> +#endif
>>> }
>>> else
>>> {
>>> // No missing SNP data, we can compute the LRT
>>> - *chi2[model] << 2. * (rd.loglik - null_loglik);
>>> + *chi2[model] << 2. * (loglik - null_loglik);
>>> }
>>> - //*chi2[model] << rd.loglik;
>>> } else
>>> {
>>> // We want score test output
>>> *chi2[model] << rd.chi2_score;
>>> - //*chi2[model] << "nan";
>>> }
>>> }
>>> } // END first part of if(poly); allele not too rare
>>>
>>> Modified: pkg/ProbABEL/src/mematri1.h
>>> ===================================================================
>>> --- pkg/ProbABEL/src/mematri1.h 2013-08-08 10:07:32 UTC (rev 1285)
>>> +++ pkg/ProbABEL/src/mematri1.h 2013-08-08 11:19:30 UTC (rev 1286)
>>> @@ -301,7 +301,7 @@
>>> }
>>>
>>> template<class DT>
>>> -void mematrix<DT>::delete_column(int delcol)
>>> +void mematrix<DT>::delete_column(const int delcol)
>>> {
>>> if (delcol > ncol || delcol < 0)
>>> {
>>> @@ -333,7 +333,7 @@
>>> }
>>>
>>> template<class DT>
>>> -void mematrix<DT>::delete_row(int delrow)
>>> +void mematrix<DT>::delete_row(const int delrow)
>>> {
>>> if (delrow > nrow || delrow < 0)
>>> {
>>>
>>> Modified: pkg/ProbABEL/src/mematrix.h
>>> ===================================================================
>>> --- pkg/ProbABEL/src/mematrix.h 2013-08-08 10:07:32 UTC (rev 1285)
>>> +++ pkg/ProbABEL/src/mematrix.h 2013-08-08 11:19:30 UTC (rev 1286)
>>> @@ -48,8 +48,8 @@
>>> void put(DT value, int nr, int nc);
>>> DT column_mean(int nc);
>>> void print(void);
>>> - void delete_column(int delcol);
>>> - void delete_row(int delrow);
>>> + void delete_column(const int delcol);
>>> + void delete_row(const int delrow);
>>>
>>> };
>>>
>>>
>>> Modified: pkg/ProbABEL/src/reg1.cpp
>>> ===================================================================
>>> --- pkg/ProbABEL/src/reg1.cpp 2013-08-08 10:07:32 UTC (rev 1285)
>>> +++ pkg/ProbABEL/src/reg1.cpp 2013-08-08 11:19:30 UTC (rev 1286)
>>> @@ -4,12 +4,22 @@
>>> mematrix<double> apply_model(mematrix<double>& X, int model, int interaction,
>>> int ngpreds, bool is_interaction_excluded,
>>> bool iscox, int nullmodel)
>>> +// if ngpreds==1 (dose data):
>>> +// model 0 = additive 1 df
>>> +// if ngpreds==2 (prob data):
>>> // model 0 = 2 df
>>> // model 1 = additive 1 df
>>> // model 2 = dominant 1 df
>>> // model 3 = recessive 1 df
>>> // model 4 = over-dominant 1 df
>>> {
>>> + if(nullmodel)
>>> + {
>>> + // No need to apply any genotypic model when calculating the
>>> + // null model
>>> + return (X);
>>> + }
>>> +
>>> if (model == 0)
>>> {
>>> if (interaction != 0 && !nullmodel)
>>> @@ -295,12 +305,13 @@
>>> if (verbose)
>>> {
>>> cout << rdata.is_interaction_excluded
>>> - << " <-irdata.is_interaction_excluded\n";
>>> + << " <-rdata.is_interaction_excluded\n";
>>> // std::cout << "invvarmatrix:\n";
>>> // invvarmatrixin.masked_data->print();
>>> std::cout << "rdata.X:\n";
>>> rdata.X.print();
>>> }
>>> +
>>> mematrix<double> X = apply_model(rdata.X, model, interaction, ngpreds,
>>> rdata.is_interaction_excluded, false,
>>> nullmodel);
>>> @@ -311,6 +322,7 @@
>>> std::cout << "Y:\n";
>>> rdata.Y.print();
>>> }
>>> +
>>> int length_beta = X.ncol;
>>> beta.reinit(length_beta, 1);
>>> sebeta.reinit(length_beta, 1);
>>>
>>> Modified: pkg/ProbABEL/src/regdata.cpp
>>> ===================================================================
>>> --- pkg/ProbABEL/src/regdata.cpp 2013-08-08 10:07:32 UTC (rev 1285)
>>> +++ pkg/ProbABEL/src/regdata.cpp 2013-08-08 11:19:30 UTC (rev 1286)
>>> @@ -39,7 +39,7 @@
>>>
>>> for (int i = 0; i < nids; i++)
>>> {
>>> - masked_data[i] = 0;
>>> + masked_data[i] = obj.masked_data[i];
>>> }
>>> }
>>>
>>> @@ -95,6 +95,9 @@
>>>
>>> void regdata::update_snp(gendata &gend, int snpnum)
>>> {
>>> + // Add genotypic data (dosage or probabilities) to the design
>>> + // matrix X.
>>> +
>>> for (int j = 0; j < ngpreds; j++)
>>> {
>>> double snpdata[nids];
>>> @@ -109,11 +112,34 @@
>>> {
>>> X.put(snpdata[i], i, (ncov - j));
>>> if (isnan(snpdata[i]))
>>> + {
>>> masked_data[i] = 1;
>>> + }
>>> }
>>> }
>>> }
>>>
>>> +void regdata::remove_snp_from_X()
>>> +{
>>> + // update_snp() adds SNP information to the design matrix. This
>>> + // function allows you to strip that information from X again.
>>> + // This is used for example when calculating the null model.
>>> +
>>> + if(ngpreds == 1)
>>> + {
>>> + X.delete_column(X.ncol -1);
>>> + }
>>> + else if(ngpreds == 2)
>>> + {
>>> + X.delete_column(X.ncol -1);
>>> + X.delete_column(X.ncol -1);
>>> + }
>>> + else
>>> + {
>>> + cerr << "ngpreds should be 1 or 2. you should never come here!\n";
>>> + }
>>> +}
>>> +
>>> regdata::~regdata()
>>> {
>>> delete[] regdata::masked_data;
>>>
>>> Modified: pkg/ProbABEL/src/regdata.h
>>> ===================================================================
>>> --- pkg/ProbABEL/src/regdata.h 2013-08-08 10:07:32 UTC (rev 1285)
>>> +++ pkg/ProbABEL/src/regdata.h 2013-08-08 11:19:30 UTC (rev 1286)
>>> @@ -34,6 +34,7 @@
>>> bool ext_is_interaction_excluded);
>>> mematrix<double> extract_genotypes();
>>> void update_snp(gendata &gend, int snpnum);
>>> + void remove_snp_from_X();
>>> regdata get_unmasked_data();
>>> ~regdata();
>>>
>>>
>>> _______________________________________________
>>> Genabel-commits mailing list
>>> Genabel-commits at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>
>
>
>
>
> _______________________________________________
> genabel-devel mailing list
> genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>
--
-----------------------------------------------------------------
L.C. Karssen
Utrecht
The Netherlands
lennart at karssen.org
http://blog.karssen.org
Stuur mij aub geen Word of Powerpoint bestanden!
Zie http://www.gnu.org/philosophy/no-word-attachments.nl.html
------------------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20130814/9f1cf369/attachment.sig>
More information about the genabel-devel
mailing list