[Genabel-commits] r1658 - pkg/ProbABEL/src
Yurii Aulchenko
yurii.aulchenko at gmail.com
Mon Mar 24 09:46:47 CET 2014
Sum of logs = log of prods
The latter is in principle dangerous practice when we multiply
probabilities - the product is getting close to zero very fast
potentially leading to numerical problems (of which machine zero is
the smallest because easily detected)
Yurii
----------------------
Yurii Aulchenko
(sent from mobile device)
> On Mar 21, 2014, at 11:57 PM, "noreply at r-forge.r-project.org" <noreply at r-forge.r-project.org> wrote:
>
> Author: lckarssen
> Date: 2014-03-21 23:57:20 +0100 (Fri, 21 Mar 2014)
> New Revision: 1658
>
> Modified:
> pkg/ProbABEL/src/reg1.cpp
> Log:
> Speed-ups in ProbABEL's logistic regression.
> Profiling showed lots of (expensive) calls to exp() and log(). I got rid of ~ 1/3 of the exp() calls by saving the result in a variable and reusing it in the calculation of exp(mu) / ( 1+exp(mu) ).
> The number of log() calls was reduced even more by using the fact that sum_i( log(x_i) ) = log( prod_i(x_i) )
>
> In total this gives roughly a speed up of 30% -- 40% when reading txt dosage files and roughly 20% -- 30% when using filevector files (measured for dosage data).
>
>
> Modified: pkg/ProbABEL/src/reg1.cpp
> ===================================================================
> --- pkg/ProbABEL/src/reg1.cpp 2014-03-21 21:05:44 UTC (rev 1657)
> +++ pkg/ProbABEL/src/reg1.cpp 2014-03-21 22:57:20 UTC (rev 1658)
> @@ -719,7 +719,8 @@
> double emu = eMu.get(i, 0);
> double value = emu;
> double zval;
> - value = exp(value) / (1. + exp(value));
> + double expval = exp(value);
> + value = expval / (1. + expval);
> residuals[i] = (rdata.Y).get(i, 0) - value;
> eMu.put(value, i, 0);
> W.put(value * (1. - value), i, 0);
> @@ -778,11 +779,24 @@
> beta.print();
> }
> // std::cout << "beta:\n"; beta.print();
> - // compute likelihood
> +
> + // Compute the likelihood. The following commented code gives
> + // the 'easy to understand' algorithm. The code that's
> + // actually used is mathematically equivalent (remember:
> + // log(a*b) = log(a)+log(b)), but faster because log() is
> + // relatively expensive.
> + // for (int i = 0; i < eMu.nrow; i++) {
> + // loglik += rdata.Y[i] * eMu_us[i] - log(1. +
> + // exp(eMu_us[i]));
> + // }
> prevlik = loglik;
> loglik = 0.;
> - for (int i = 0; i < eMu.nrow; i++)
> - loglik += rdata.Y[i] * eMu_us[i] - log(1. + exp(eMu_us[i]));
> + double logterm = 1;
> + for (int i = 0; i < eMu.nrow; i++) {
> + loglik += rdata.Y[i] * eMu_us[i];
> + logterm *= 1. + exp(eMu_us[i]);
> + }
> + loglik += - log(logterm);
>
> delta = fabs(1. - (prevlik / loglik));
> niter++;
>
> _______________________________________________
> Genabel-commits mailing list
> Genabel-commits at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-commits
More information about the Genabel-commits
mailing list