[Rcpp-devel] Interfacing Rcpp with a C library: memory allocation

Dirk Eddelbuettel edd at debian.org
Thu Nov 21 15:40:37 CET 2013


On 21 November 2013 at 14:37, Alessandro Mammana wrote:
| I found out what the problem was, luckily it was just a stupid bug in
| my code (an array index out of bounds).
| I still didn't quite get how memory allocation works. As far as I
| understood there are these two ways of allocating memory:
| 
| 1. Using malloc/free, new/delete, normal C++ constructors and
| destructors: this memory area is completely separated from R memory
| area and it is erased at the end of the Rcpp call. Pointers to these
| addresses will not be valid in a second Rcpp call.
| 2. Using Rcpp constructors/destructors, Calloc() and Free(): this is
| the same memory used by R and it is suitable for passing objects in
| and out.
| 
| Is that right?

Not really. The topic is complicated, and part of it is the difference
between beginner/intermediate ("you know enough to be dangerous") and more
experienced users ("you know what not to do").

If you look at any of our recent presentations or packages, you will never
ever see a single 'new/delete' or 'malloc/free'.   [ Package internals do use
them, but user code should not need to. ]
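
As a rough illustration of user code that never calls free() by hand even
though a C library returns malloc'ed memory, here is a minimal sketch in plain
C++ (no Rcpp needed). The function make_buffer() is a hypothetical stand-in
for a C library allocator, and FreeDeleter, BufferPtr and sum_buffer() are
names made up for this example. The idea is simply that std::unique_ptr with a
custom deleter ties the free() call to scope exit.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Hypothetical stand-in for a C library routine that returns malloc'ed
// memory which the caller is expected to release with free().
extern "C" double* make_buffer(std::size_t n) {
    double* p = static_cast<double*>(std::malloc(n * sizeof(double)));
    if (p != nullptr) {
        for (std::size_t i = 0; i < n; ++i) p[i] = static_cast<double>(i);
    }
    return p;
}

// Deleter that calls free(), so the buffer is released exactly once.
struct FreeDeleter {
    void operator()(void* p) const { std::free(p); }
};

using BufferPtr = std::unique_ptr<double, FreeDeleter>;

// Sum the first n values of a library-allocated buffer. The buffer is
// freed automatically when buf goes out of scope, also on early return.
double sum_buffer(std::size_t n) {
    BufferPtr buf(make_buffer(n));
    if (!buf) return 0.0;
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += buf.get()[i];
    return total;
}
```

With this pattern the pointer cannot be freed twice or leaked on an early
return, which is the kind of bug that tends to show up as a delayed segfault.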

The general idea is to write __C++__ and not C. In C++, just about anything
you need to do you can (and should !!) do with the standard containers.
Someone joked the other day that 90% of all C++ answers on StackOverflow
could be summed up as saying "just use std::vector".
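
A minimal sketch of that idea in plain C++, assuming a hypothetical C-style
routine fill_squares(double*, size_t) that writes into a caller-supplied
buffer (both fill_squares() and squares() are made up for this example). The
std::vector owns the memory and the C routine only ever sees a raw pointer
into it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical C-style routine: fills p[0..n-1] with the squares 0, 1, 4, ...
extern "C" void fill_squares(double* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) p[i] = static_cast<double>(i * i);
}

// Wrapper: the vector allocates and releases the storage, so there is no
// malloc/free or new/delete in user code.
std::vector<double> squares(std::size_t n) {
    std::vector<double> x(n);          // owns n doubles, zero-initialised
    fill_squares(x.data(), x.size());  // the C routine sees a plain double*
    return x;                          // storage is released by the vector
}
```

The same shape should carry over to an Rcpp::NumericVector, where x.begin()
plays the role of x.data() when a C function expects a double*.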

And in the context of Rcpp, just use our containers.  A NumericVector (or
...Matrix, idem for Integer...) just goes back to R as a native vector (or
matrix) and you do not have to worry about anything. Same for List etc pp.

Dirk

| Thanks again,
| Ale
| 
| 
| On Wed, Nov 20, 2013 at 7:25 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
| >
| > On 20 November 2013 at 18:54, Alessandro Mammana wrote:
| > | Thanks a lot for your help, I'll look at the references. I saw Section
| > | 6.1.2 already. Does that mean that I should replace  all the calls to
| > | malloc() and to free() with those to Calloc() and Free() in the
| > | external C code (maybe with a macro?)? Or does it just mean that I
| > | would get the same memory errors, but this way they are handled by R?
| >
| > The idea is not to get errors.
| >
| > Let's assume that the library you use is in itself correct.  You could then
| > use it and write wrapper functions. In the wrapper functions, you create Rcpp
| > type and assign to those. That would avoid having to alter the library. A
| > good defensive approach.
| >
| > Depending on how well you know what you are doing, you can also do something
| > like this, where we assume we have someFunction(double* p):
| >
| >     int n = 10;                    // example length; must be set before use
| >
| >     Rcpp::NumericVector x(n);      // vector of n elements
| >
| >     someFunction(x.begin());       // someFunction sees a double*
| >
| > which is how I would interface a C library.  The devil is in the detail, and
| > you still have not provided any so we can't help you much more than via
| > generalities.
| >
| > Pick one function from the library. Interface it. Get it to work without a
| > segfault.  Apply what you learned on another function etc pp.
| >
| > There are other approaches (Rcpp modules is one) but you may want to make
| > sure you clear a few conceptual hurdles first.
| >
| > Dirk
| >
| >
| >
| > |
| > | Thanks a lot!
| > | Ale
| > |
| > | On Wed, Nov 20, 2013 at 6:35 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
| > | >
| > | > On 20 November 2013 at 18:27, Alessandro Mammana wrote:
| > | > | Dear all,
| > | > | I'm trying to write some efficient code for analyzing sequencing data
| > | > | in R. To do this I would like to use the C library samtools. I've
| > | > | created a package where the src directory looks like this:
| > | > |
| > | > | src
| > | > | |-- Makevars
| > | > | |-- RcppExports.cpp
| > | > | |-- mysourcecode.cpp
| > | > | `-- samtools
| > | > |     |-- all *.c and *.h files as well as an independent Makefile
| > | > |
| > | > | My Makevars file looks like this:
| > | > |
| > | > | PKG_CPPFLAGS = -Isamtools
| > | > | PKG_LIBS = `$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"`
| > | > | PKG_LIBS += -Lsamtools  -lbam -lz -lpthread
| > | > |
| > | > | $(SHLIB): samtools/libbam.a
| > | > |
| > | > | samtools/libbam.a:
| > | > |       @(cd samtools-0.1.19 && $(MAKE) libbam.a \
| > | > |           CC="$(CC)" CFLAGS="$(CFLAGS) $(CPICFLAGS)" AR="$(AR)" RANLIB="$(RANLIB)")
| > | > |
| > | > | Everything compiles and I get my cpp functions in R, however I am
| > | > | getting some weird segfaults, I think they are due to memory
| > | > | allocation, but it's hard for me to track them. Especially now, these
| > | > | errors are showing up not immediately, but at the second time that I
| > | > | call a Rcpp function.
| > | > |
| > | > | I wanted to ask the following:
| > | > | 1. Is it the right way of using external C libraries? I couldn't find
| > | > | much documentation around
| > | >
| > | > Sure.
| > | >
| > | > | 2. The C library uses malloc and free, and so do I (as little as
| > | > | possible, just to interface with the library), is this mechanism
| > | > | clashing against Rcpp/R memory management? Could it happen, for
| > | > | instance, that R tries to free allocated memory that I already
| > | > | manually freed myself?
| > | >
| > | > Yes.
| > | >
| > | > Please read the "Writing R Extensions" manual, section 6.1.2, on
| > | > User-controlled memory.
| > | >
| > | > And/or see the Rcpp documentation: if you actually use our data containers,
| > | > things "just work", ie
| > | >
| > | >   R> cppFunction("NumericMatrix foo(int n) { return NumericMatrix(n); }")
| > | >   R> foo(2)
| > | >        [,1] [,2]
| > | >   [1,]    0    0
| > | >   [2,]    0    0
| > | >   R> foo(4)
| > | >        [,1] [,2] [,3] [,4]
| > | >   [1,]    0    0    0    0
| > | >   [2,]    0    0    0    0
| > | >   [3,]    0    0    0    0
| > | >   [4,]    0    0    0    0
| > | >   R>
| > | >
| > | > will never create a segfault. Ditto for using proper C++ containers (eg
| > | > std::vector<double>).
| > | >
| > | > But you cannot randomly mix the C approach (see Section 6.1.2, as mentioned
| > | > above) and the R/Rcpp approach.
| > | >
| > | > | In general I didn't understand much about memory allocation in Rcpp
| > | > | and I couldn't find many resources talking about it. Is there anything
| > | >
| > | > Well I could recommend a book to you...  There are also eight vignettes in
| > | > the package, and more on other sites (the list archive, our blogs,
| > | > StackOverflow, Hadley's draft book).
| > | >
| > | > | R- or Rcpp-specific that I have to keep in mind or should I program as
| > | > | if I were programming in C/C++?
| > | >
| > | > You can, provided you do it right.
| > | >
| > | > Hope this helps, if it was unclear please come back with more questions.
| > | > Also see the unit tests and examples.
| > | >
| > | > Cheers, Dirk
| > | >
| > | > --
| > | > Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
| > |
| > |
| > |
| > | --
| > | Alessandro Mammana, PhD Student
| > | Max Planck Institute for Molecular Genetics
| > | Ihnestraße 63-73
| > | D-14195 Berlin, Germany
| >
| > --
| > Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
| 
| 
| 
| -- 
| Alessandro Mammana, PhD Student
| Max Planck Institute for Molecular Genetics
| Ihnestraße 63-73
| D-14195 Berlin, Germany

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com

