[Rcpp-devel] for loop and random draws

Denis Haine cheval at zaclys.net
Fri Oct 25 21:42:40 CEST 2024



Hi,

Sorry to get back to this, but I can't figure out how to solve it. Do 
not hesitate to direct me to another forum if this one is not 
appropriate. I don't think there's anything wrong with Rcpp, but I might 
not proceed with the loop in Rcpp the right way.

So I tried various regressions in the Rcpp for loop (fastglm, 
Rfast::glm_poisson, even fastlm; see code at end of email) and they all 
output the same saved matrix, but a different one than without including 
a regression step. Compared to a pure R code, the R code with or without 
a regression output the same matrix as the Rcpp loop without a 
regression (and therefore different from the ones with a regression), 
i.e:

- R loop without regression, R loop with regression (glm or fastglm), 
Rcpp loop without regression = all.equal TRUE.

- Rcpp loop with regression (fastglm, Rfast, or fastlm) =  all.equal 
TRUE but different from above.

I'm totally puzzled on why I've got these results. How can I get the 
same output with Rcpp + regression as with the R loop +/- regression?

Denis

NumericMatrix define_e(int iter, NumericMatrix obs_mat, NumericVector 
draws) {
   Rcpp::NumericVector d = obs_mat(_, 1);
   Rcpp::NumericVector pr = no_init(d.length());
   Rcpp::NumericMatrix e(d.length(), iter);
   for (int i = 0; i < iter; i++) {
     pr = obs_mat(_, 0) * draws(i);
     e(_, i) = cpprbinom(d.length(), 1, pr);
   }
   return e;
}

NumericMatrix define_ereg1(int iter, NumericMatrix obs_mat, 
NumericVector draws) {
   Environment pkg = Environment::namespace_env("fastglm");
   Function f = pkg["fastglm"];

   Rcpp::NumericVector d = obs_mat(_, 1);
   Rcpp::NumericVector pr = no_init(d.length());
   Rcpp::NumericMatrix e(d.length(), iter);
   Rcpp::NumericMatrix ematrix(d.length(), 2);
   Rcpp::NumericVector v(d.length(), 1);
   ematrix(_, 0) = v;
   Rcpp::List mod_pois;
   for (int i = 0; i < iter; i++) {
     pr = obs_mat(_, 0) * draws(i);
     e(_, i) = cpprbinom(d.length(), 1, pr);

     ematrix(_, 1) = e;
     mod_pois = f(Named("x", ematrix), Named("y", d), Named("family", 
"poisson"));
   }
   return e;
}

NumericMatrix define_ereg2(int iter, NumericMatrix obs_mat, 
NumericVector draws) {
   Environment pkg = Environment::namespace_env("Rfast");
   Function f = pkg["glm_poisson"];

   Rcpp::NumericVector d = obs_mat(_, 1);
   Rcpp::NumericVector pr = no_init(d.length());
   Rcpp::NumericMatrix e(d.length(), iter);
   Rcpp::NumericMatrix ematrix(d.length(), 2);
   Rcpp::NumericVector v(d.length(), 1);
   ematrix(_, 0) = v;
   Rcpp::List mod_pois;
   for (int i = 0; i < iter; i++) {
     pr = obs_mat(_, 0) * draws(i);
     e(_, i) = cpprbinom(d.length(), 1, pr);

     ematrix(_, 1) = e;
     mod_pois = f(Named("x", ematrix(_, 1)), Named("y", d));
   }
   return e;
}

NumericMatrix define_ereg3(int iter, NumericMatrix obs_mat, 
NumericVector draws) {
   Environment pkg = Environment::namespace_env("RcppArmadillo");
   Function f = pkg["fastLm"];

   Rcpp::NumericVector d = obs_mat(_, 1);
   Rcpp::NumericVector pr = no_init(d.length());
   Rcpp::NumericMatrix e(d.length(), iter);
   Rcpp::NumericMatrix ematrix(d.length(), 2);
   Rcpp::NumericVector v(d.length(), 1);
   ematrix(_, 0) = v;
   Rcpp::List mod_pois;
   for (int i = 0; i < iter; i++) {
     pr = obs_mat(_, 0) * draws(i);
     e(_, i) = cpprbinom(d.length(), 1, pr);

     ematrix(_, 1) = e;
     mod_pois = f(Named("X", ematrix), Named("y", d));
   }
   return e;
}

draws <- rbeta(200, 100, 60)
obs_mat <- cbind(c(rep(1, 40), rep(0, 60)), c(rep(1, 20), rep(0, 80)))
iter <- length(draws)
set.seed(1234)
t0 <- define_e(iter, obs_mat, draws)

e <- matrix(NA, nrow = nrow(obs_mat), ncol = iter)
e_obs <- obs_mat[, 1]
p <- rep(NA, nrow(obs_mat))
set.seed(1234)
for (i in 1:iter) {
       p <- e_obs * draws[i]
       e[, i] <- rbinom(nrow(obs_mat), 1, p)
}
all.equal(t0, e)  # TRUE

e1 <- matrix(NA, nrow = nrow(obs_mat), ncol = iter)
e_obs <- obs_mat[, 1]
p <- rep(NA, nrow(obs_mat))
set.seed(1234)
for (i in 1:iter) {
      p <- e_obs * draws[i]
      e1[, i] <- rbinom(nrow(obs_mat), 1, p)
      X <- cbind(rep(1, nrow(obs_mat)), e1[, i])
      d <- obs_mat[, 2]
      mod_pois <- glm(d ~ X, family = poisson(link = "log"))
}
all.equal(e, e1)  # TRUE
all.equal(e1, t0)  # TRUE

e2 <- matrix(NA, nrow = nrow(obs_mat), ncol = iter)
e_obs <- obs_mat[, 1]
p <- rep(NA, nrow(obs_mat))
set.seed(1234)
for (i in 1:iter) {
      p <- e_obs * draws[i]
      e2[, i] <- rbinom(nrow(obs_mat), 1, p)
      X <- cbind(rep(1, nrow(obs_mat)), e2[, i])
      d <- obs_mat[, 2]
      mod_pois <- fastglm::fastglm(X, d, family = poisson(link = "log"))
}
all.equal(e, e2)  # TRUE
all.equal(e2, t0)  # TRUE

set.seed(1234)
t1 <- define_ereg1(iter, obs_mat, draws)
all.equal(t1, e)  # [1] "Mean relative difference: 1.788299"

set.seed(1234)
t2 <- define_ereg2(iter, obs_mat, draws)
all.equal(t2, e)  # [1] "Mean relative difference: 1.788299"
all.equal(t2, t1)  # TRUE

set.seed(1234)
t3 <- define_ereg3(iter, obs_mat, draws)
all.equal(t3, e)  # "Mean relative difference: 1.788299"
all.equal(t3, t1)  # TRUE
all.equal(t3, t2)  # TRUE

Le 2024-10-01 13:19, Dirk Eddelbuettel a écrit :

> On 1 October 2024 at 11:57, Denis Haine wrote:
> | But adding the regression to define_e1() (as define_e2() in previous 
> email)
> | does not give the same. The first iteration in the loop is identical 
> in both,
> | but not the subsequent ones. That's why I was thinking it could be 
> linked to
> | the seed, but (sorry for my limited knowledge of C++) it would be 
> strange to
> | manually "re-seed" at each iteration. Moreover, I believe the seed 
> set in R is
> | transferred to Rcpp and so does not need to be taken specifically 
> care of.
> 
> Nice. So it seems to me that you identified fastglm as the source of 
> your
> problem. When we call it, your results differ and there is a 
> side-effect.
> 
> But when we don't it is still true. I added
> 
> Environment pkg2 = Environment::namespace_env("base");
> Function f2 = pkg2["dim"];
> 
> // ...
> 
> RObject ro = f2(ematrix);
> 
> while keeping the lines relevant to fastglm calls and result storage
> commented out.  Now we get TRUE from all.equal(t0,t1).
> 
> That seems to rule out Rcpp::Function, and may indicate you need to 
> look into
> what fastglm may be doing.
> 
> Cheers, Dirk
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20241025/c8e2119b/attachment.htm>


More information about the Rcpp-devel mailing list