[Rcpp-devel] Avoiding memory allocations when using Armadillo, BLAS, and LAPACK
nate at verse.com
Wed Feb 18 22:31:11 CET 2015
On Wed, Feb 18, 2015 at 8:00 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
> It is a little challenging to keep up with your ability to ask this question
> here, on StackOverflow and again on r-devel. As I first saw it here, I'll
> answer here.
Sorry about that. I was planning to do it first only on StackOverflow
so the code samples would be better formatted, and then saw your
frequent exhortations to use Rcpp-devel instead. The r-devel post is
actually a separate issue that happens to share the same example. In
that one, I'm exploring a patch along the lines of Radford Neal's pqR
work that would require changing the way R-core handles all of its
memory allocations.
> R is a dynamically-typed interpreted language with many bells and whistles,
> but also opaque memory management.
From the Rcpp point of view, it probably should be considered opaque.
From the R-core side, I think it would be useful if there were more
people exploring ways to improve it. From the measurements I've made,
I think improving R's memory management might be the lowest hanging
fruit for improving R's overall performance.
> My recommendation always is to "if in
> doubt and when working with large objects" to maybe just step aside and do
> something different outside of R.
Sidestepping R is definitely a clear path to high performance, but for
this particular project I'm trying to figure out ways to write things
that interoperate with existing R code and can be modified by R
programmers unfamiliar with C++. I'm hoping there is a subset
of R "design patterns" that delivers acceptably high performance. In
order of preference, I'd like to:
1) Find a way to write high performance code in straight R.
2) Failing that, write an Rcpp extension for high performance BLAS.
3) Failing that, write a C/C++ library and an Rcpp interface to it.
The r-devel thread is concentrating on the first, how to improve the
performance of core R code by reducing memory churn. The thread here
is concentrating on the second, how to write Rcpp extensions that use
BLAS/LAPACK functions more efficiently. You'll be happy to know that
I've not yet started a thread on the third approach!
> You mentioned logistic regression.
I added a more complete code sample to the StackOverflow question.
For the actual work, I'm planning to implement Komarek's LR-TRIRLS,
since the algorithmic advantage is probably going to be greater than
the implementation difference.
> And we did something similar at work: use
> Rcpp as well as bigmemory via external pointers (ie Rcpp::XPtr) so that *you*
> can allocate one chunk of memory *you* control, and keep at arm's length of
The bigmemory/bigalgebra combination comes quite close to what I'm
trying to do, but I'm scared to rely on it. The namespace is a
hodgepodge, I don't need the larger-than-memory aspects, it's not
actively maintained, and some parts seem broken or missing.
XPtr is interesting, but conceptually it seems like there should be a
way to work with R's memory management rather than trying to sidestep
it. Perhaps I'm being naive.
> Implementing a simple glm-alike wrapper over that to fit a logistic
> regression in then not so hard.
Mostly yes, with the possible exception of the question I'm focussing on :)
How do I enable the R syntax "wQ = w * Q" to reuse the preallocated
space for wQ rather than allocating a new temporary? While a
system-level malloc() is cheaper than letting R handle the
allocation, it's still much less efficient than reusing the buffer.
> We even did it in a multicore context to get
> extra parallelism still using only that one (large!!) chunk of memory.
Yes, fork() plus copy-on-write seems like a great performance
combination. But as long as the memory isn't modified, I think this
should work just as well for native R variables (or
Rcpp::NumericVector) as for external pointers.