[GenABEL-dev] [Genabel-commits] r1573 - branches/ProbABEL-0.50/src
L.C. Karssen
lennart at karssen.org
Thu Jan 30 10:54:00 CET 2014
Hi Maarten,
Interesting changes! More use of Eigen is good :-).
I've got a few questions about this commit:
1) Is it still possible to compile ProbABEL/palinear without EIGEN? Or
is it time to decide to really depend on it?
2) I see you're using a lot of MatrixXd variables. These are matrices of
double type. Wouldn't it make more sense to use the Matrix<double,
Dynamic, Dynamic, RowMajor> construction? In case we ever decide to go
with all computations in float (would be an easier search/replace of
double with float)? Or is that a much slower implementation? I know it's
a bit of a weird argument, but I was wondering if you gave it a thought.
3) Are you planning to do some computation time comparisons with and
without these changes?
I've got a few more questions and comments in your below.
Thanks for all the good work!
Lennart.
On 30-01-14 00:00, noreply at r-forge.r-project.org wrote:
> Author: maartenk
> Date: 2014-01-30 00:00:40 +0100 (Thu, 30 Jan 2014)
> New Revision: 1573
>
> Modified:
> branches/ProbABEL-0.50/src/reg1.cpp
> branches/ProbABEL-0.50/src/reg1.h
> Log:
> changed linear_reg::estimate to use more eigen functions: cholesky.h is not nessary for palinear.
>
> Modified: branches/ProbABEL-0.50/src/reg1.cpp
> ===================================================================
> --- branches/ProbABEL-0.50/src/reg1.cpp 2014-01-29 17:25:58 UTC (rev 1572)
> +++ branches/ProbABEL-0.50/src/reg1.cpp 2014-01-29 23:00:40 UTC (rev 1573)
> @@ -285,17 +285,14 @@
> chi2_score = chi2[0];
> }
>
> +
> void linear_reg::estimate(regdata& rdatain, int verbose, double tol_chol,
> int model, int interaction, int ngpreds, masked_matrix& invvarmatrixin,
> int robust, int nullmodel) {
> // suda interaction parameter
> // model should come here
> regdata rdata = rdatain.get_unmasked_data();
> - if (invvarmatrixin.length_of_mask != 0)
> - {
> - invvarmatrixin.update_mask(rdatain.masked_data);
> - // invvarmatrixin.masked_data->print();
> - }
> +
> if (verbose)
> {
> cout << rdata.is_interaction_excluded
> @@ -334,8 +331,14 @@
>
> double sigma2_internal;
> mematrix<double> tXX_i;
> +#if EIGEN
> + LDLT <MatrixXd> Ch ;
> +#endif
> if (invvarmatrixin.length_of_mask != 0)
> {
> + //retrieve masked data W
> + invvarmatrixin.update_mask(rdatain.masked_data);
> +
> // This regression is Weighted Least Square: used for mmscore :
> // FLOPS count are calculated for 3*1000 matrix as follow:
> //C=AB (m X n matrix A and n x P matrix B)
> @@ -343,6 +346,10 @@
> //Oct 26, 2009
> mematrix<double> tXW = transpose(X) * invvarmatrixin.masked_data; // flops 5997000
> tXX_i = tXW * X; // 17991 flops
> +#if EIGEN
> + Ch=LDLT <MatrixXd>(tXX_i.data.selfadjointView<Lower>());
> +#endif
> +
it seems that you do the work for EIGEN in an #if/#endif, but the
cholesky2_mm is still used below. Shouldn't that be in the #else part?
> // use cholesky to invert
> cholesky2_mm(tXX_i, tol_chol);
> chinv2_mm(tXX_i);
> @@ -356,23 +363,41 @@
> sigma2 += val * val; // flops: 3000
> }
> double N = X.nrow;
> - sigma2_internal = sigma2 / (N - static_cast<double>(length_beta));
> + //sigma2_internal = sigma2 / (N - static_cast<double>(length_beta));
> + // Ugly fix to the fact that if we do mmscore, sigma2 is already
> + // in the matrix...
> + // YSA, 2009.07.20
> + sigma2_internal = 1.0;
> sigma2 /= N;
> - //NO mm-score regression : normal least square regression
> +
> }
> - else
> + else//NO mm-score regression : normal least square regression
> {
> +#if EIGEN
> + int m = X.ncol;
> + MatrixXd txx = MatrixXd(m, m).setZero().selfadjointView<Lower>().rankUpdate(X.data.adjoint());
> + Ch=LDLT <MatrixXd>(txx.selfadjointView<Lower>());
> + beta.data= Ch.solve(X.data.adjoint() * rdata.Y.data);
> +
> + tXX_i.data=Ch.solve(MatrixXd(m, m).Identity(m,m));
> + tXX_i.nrow=tXX_i.data.rows();
> + tXX_i.ncol=tXX_i.data.cols();
> + tXX_i.nelements=tXX_i.ncol*tXX_i.nrow;
> +
> +#else
> mematrix<double> tX = transpose(X);
> // use cholesky to invert
> - tXX_i = tX * X;
> - cholesky2_mm(tXX_i, tol_chol);
> - chinv2_mm(tXX_i);
> - beta = tXX_i * (tX * (rdata.Y));
> + tXX_i = tX * X;
> + cholesky2_mm(tXX_i, tol_chol);
> + chinv2_mm(tXX_i);
> + beta = tXX_i * (tX * (rdata.Y));
> +#endif
> +
> // now compute residual variance
> sigma2 = 0.;
> mematrix<double> sigma2_matrix = rdata.Y - (X * beta);
> #if EIGEN
> - sigma2 = sigma2_matrix.data.array().square().sum();
> + sigma2 = sigma2_matrix.data.squaredNorm() ;
> #else
> for (int i = 0; i < sigma2_matrix.nrow; i++)
> {
> @@ -380,8 +405,9 @@
> sigma2 += val * val;
> }
> #endif
> - double N = X.nrow;
> - sigma2_internal = sigma2 / (N - static_cast<double>(length_beta));
> + double N = static_cast<double>(X.nrow);
Code layout: remember we have a space around operators :-)
> + double P=static_cast<double>(length_beta);
> + sigma2_internal = sigma2 / (N - P);
> sigma2 /= N;
> }
> /*
> @@ -407,8 +433,8 @@
> double intercept = beta.get(0, 0);
> residuals.data= rdata.Y.data.array()-intercept;
> //matrix.
> - Eigen::ArrayXXd betacol = beta.data.block(1,0,beta.data.rows()-1,1).array().transpose();
> - Eigen::ArrayXXd resid_sub = (X.data.block(0,1,X.data.rows(),X.data.cols()-1)*betacol.matrix().asDiagonal()).rowwise().sum() ;
And also space after comma's and lines shorter than 80 characters...
Otherwise Jenkins will be pissed again ;-).
> + ArrayXXd betacol = beta.data.block(1,0,beta.data.rows()-1,1).array().transpose();
> + ArrayXXd resid_sub = (X.data.block(0,1,X.data.rows(),X.data.cols()-1)*betacol.matrix().asDiagonal()).rowwise().sum() ;
> //std::cout << resid_sub << std::endl;
> residuals.data-=resid_sub.matrix();
> //residuals[i] -= resid_sub;
> @@ -422,41 +448,77 @@
> double resid = rdata.Y[i] - beta.get(0, 0); // intercept
> for (int j = 1; j < beta.nrow; j++){
> resid -= beta.get(j, 0) * X.get(i, j);
> -
> }
> residuals[i] = resid;
> loglik -= halfrecsig2 * resid * resid;
> }
> #endif
>
> -
> -
> loglik -= static_cast<double>(rdata.nids) * log(sqrt(sigma2));
> - // cout << "estimate " << rdata.nids << "\n";
> - //
> - // Ugly fix to the fact that if we do mmscore, sigma2 is already
> - // in the matrix...
> - // YSA, 2009.07.20
> - //
> - //cout << "estimate 0\n";
> - if (invvarmatrixin.length_of_mask != 0)
> - sigma2_internal = 1.0;
> +#if EIGEN
> + MatrixXd tXX_inv=Ch.solve(MatrixXd(length_beta, length_beta).Identity(length_beta,length_beta));
> +#endif
>
> mematrix<double> robust_sigma2(X.ncol, X.ncol);
> if (robust)
> {
> +#if EIGEN
> + MatrixXd Xresiduals = X.data.array().colwise()*residuals.data.col(0).array();
> + MatrixXd XbyR = MatrixXd(X.ncol, X.ncol).setZero().selfadjointView<Lower>().rankUpdate(Xresiduals.adjoint());
> + robust_sigma2.data= tXX_inv*XbyR *tXX_inv;
> +#else
> +
> mematrix<double> XbyR = X;
> - for (int i = 0; i < X.nrow; i++)
> + for (int i = 0; i < X.nrow; i++){
> for (int j = 0; j < X.ncol; j++)
> {
> double tmpval = XbyR.get(i, j) * residuals[i];
> XbyR.put(tmpval, i, j);
> }
> + }
> XbyR = transpose(XbyR) * XbyR;
> robust_sigma2 = tXX_i * XbyR;
> robust_sigma2 = robust_sigma2 * tXX_i;
> +
> +#endif
> +
> +
> }
> //cout << "estimate 0\n";
> +#if EIGEN
> + if (robust)
> + {
> + sebeta.data = robust_sigma2.data.diagonal().array().sqrt();
> + }
> + else
> + {
> + sebeta.data =
> + (sigma2_internal
> + * tXX_inv.diagonal().array()).sqrt();
> + }
> + int offset=X.ncol- 1;
> + //if additive and interaction and 2 predictors and more then 2 betas
> +
> + if (model == 0 && interaction != 0 && ngpreds == 2 && length_beta > 2){
> + offset=X.ncol - 2;
> + }
> +
> + if (robust)
> + {
> + covariance.data = robust_sigma2.data.bottomLeftCorner(
> + offset, offset).diagonal();
> +
> + }
> + else
> + {
> + covariance.data = sigma2_internal
> + * tXX_inv.bottomLeftCorner(offset,
> + offset).diagonal().array();
> + }
> +
> +#else
> +
> + //cout << "estimate 0\n";
> for (int i = 0; i < (length_beta); i++)
> {
> if (robust)
> @@ -510,12 +572,8 @@
> //Oct 26, 2009
> }
> }
> - //cout << "estimate E\n";
> - if (verbose)
> - {
> - std::cout << "sebeta (" << sebeta.nrow << "):\n";
> - sebeta.print();
> - }
> +#endif
> +
> }
>
> void linear_reg::score(mematrix<double>& resid, regdata& rdatain, int verbose,
>
> Modified: branches/ProbABEL-0.50/src/reg1.h
> ===================================================================
> --- branches/ProbABEL-0.50/src/reg1.h 2014-01-29 17:25:58 UTC (rev 1572)
> +++ branches/ProbABEL-0.50/src/reg1.h 2014-01-29 23:00:40 UTC (rev 1573)
> @@ -30,6 +30,7 @@
> #include "regdata.h"
> #include "maskedmatrix.h"
>
> +
> mematrix<double> apply_model(mematrix<double>& X, int model, int interaction,
> int ngpreds, bool is_interaction_excluded, bool iscox = false,
> int nullmodel = 0);
>
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
>
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands
lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: OpenPGP digital signature
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20140130/cb5ec6c3/attachment.sig>
More information about the genabel-devel
mailing list