[Rcpp-devel] What is the best practice to expose a C structure from 3rd party library into R?

Dirk Eddelbuettel edd at debian.org
Tue Jul 23 18:37:25 CEST 2013


Hi Wush,

On 23 July 2013 at 23:52, Wush Wu wrote:
| Hi all,
| 
| Thanks for your reply. 

My pleasure.
 
| I'm very happy to contribute to Rcpp Gallery. 
| 
| I wrote some wrappers of C structure from 3rd party library with Rcpp, and
| these issues kept bothering me. A good example will make the life easier.

I think this one may help.  I rolled the code one step further on the train
this morning and started to use Rcpp Modules -- this should work well with it.
 
| Well, I didn't fully understand why rredis is so slow. As shown in my
| benchmark, the serialization of R object spends a little time (about 0.01
| second), so I guess the main difference is the efficiency of transferring data
| between R and redis.

I am not sure either.  Talking to redis is "just networking" over a socket,
which R does from compiled code, and so does rredis.

Maybe someone would need to profile rredis.
 
| By the way, I talked to another redis user at Taiwan R User Group. He suggested
| me to make the API compatible with rredis so that Rhiredis could replace rredis
| in doRedis, which is one of parallel backend of `foreach`.

That would be nice.  I am not sure I have time to really look after a
full-featured redis client -- but I will provide a (much simpler, much
smaller) package with the minimal Rcpp wrapping needed.

| Sometimes I need to connect multiple redis server simultaneously, so I need to
| expose the connection object to make me select the server in R. 

I see.  You will get that once I switch to Rcpp modules. Then the 'instance'
is properly inside the C++ object, which is instantiated, accessed and
controlled from R. Pretty much ideal as best as I can tell.
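
A minimal sketch of that idea in plain C++, leaving Rcpp and hiredis out so it compiles on its own. The names `FakeConnection`, `RedisClient`, `exec` and `endpoint` are hypothetical stand-ins, not part of any library; the point is only that each instance carries its own connection state, so two instances can address two servers. Exposing such a class to R through Rcpp Modules is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a C connection handle from a 3rd party library.
struct FakeConnection {
    std::string host;
    int port;
    std::vector<std::string> sent;   // commands "sent" so far
};

// Each RedisClient owns exactly one connection; nothing is global.
class RedisClient {
public:
    RedisClient(const std::string& host, int port) : conn_{host, port, {}} {
        assert(port > 0);            // basic sanity check on the argument
    }

    // Pretend to send a command; a real client would call the library here.
    // Returns how many commands this instance has sent.
    std::size_t exec(const std::string& cmd) {
        conn_.sent.push_back(cmd);
        return conn_.sent.size();
    }

    std::string endpoint() const {
        return conn_.host + ":" + std::to_string(conn_.port);
    }

private:
    FakeConnection conn_;            // lives and dies with the object
};
```

Two objects built with different host and port arguments keep separate command counts, which is the "select the server" behaviour asked for above.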
 
| It is convenient to use a global object if the user only needs one connection.
| Therefore, I suggest to expose connection object to R, and let R put the last
| constructed object into `options` and pass it to the API as the default
| argument. This is how I implement Rhiredis.

Not a great approach in my book as you need to go back and forth and copy
that connection object all the time -- whereas it could be a component of an
opaque class. I like that second approach much better.  It alleviates a lot of
the issues you are detailing below.  Simpler is better, generally.

That said, Rhiredis is of course your package and you can do whatever you
want with it.

I hope to be able to convince you that one does not need Boost smart_ptr, or
passing objects back and forth, or ...

| However, thanks for the idea of singleton which I have never thought before.
| Indeed, it is simpler.

Has the same constraint of just one connection -- good enough for many cases
but, as you point out, one that one may want to overcome in some situations too.
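
For illustration, the singleton variant can be sketched with a function-local static, again in plain C++ with hypothetical names (`FakeConnection`, `default_connection`, `send_command`) rather than any real redis API. It shows both the convenience and the constraint: callers never pass a handle, and there is only ever one connection.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a library connection handle.
struct FakeConnection {
    std::string host;
    int commands_sent;
};

// Function-local static: constructed on first use, exactly once.
FakeConnection& default_connection() {
    static FakeConnection conn{"localhost", 0};
    return conn;
}

// Every caller shares the same connection, so no handle is passed around.
// Returns the running count of commands sent over that connection.
int send_command(const std::string& cmd) {
    assert(!cmd.empty());            // a real client would transmit cmd here
    return ++default_connection().commands_sent;
}
```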

| I might misuse the `finalizer`. The `freeA` free the memory which will be freed
| by R later. However, I cannot find another place to call the `freeA` if I
| directly expose `A` with `RCPP_MODULE`.

My approach has no direct free or delete, only calls to the Redis API to instantiate
one redis connection object, possibly delete it and of course delete the
message objects.  But redis manages that memory.

So we need to do no memory management -- and can be pretty certain to never have
segfaults.
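
One way to get "no direct free or delete" in user code is to tie the library's release function to a `std::unique_ptr`. This is a sketch under assumptions, not the code from either package: `A`, `initA` and `freeA` are local stand-ins for the C API named earlier in the thread, and `live_objects`, `AFree`, `AHandle` and `make_handle` are hypothetical helpers. With hiredis the release calls would be `redisFree` and `freeReplyObject`.

```cpp
#include <cassert>
#include <memory>

// Stand-ins for a C library: a struct plus its create/release functions.
struct A { int flag; };
static int live_objects = 0;         // counts objects not yet released

A* initA() { ++live_objects; return new A{0}; }
void freeA(A* p) { delete p; --live_objects; }

// Deleter that hands the pointer back to the library's release function.
struct AFree {
    void operator()(A* p) const { freeA(p); }
};

// Single-owner handle: freeA runs exactly once, when the handle goes away.
using AHandle = std::unique_ptr<A, AFree>;

AHandle make_handle() {
    AHandle h(initA());
    assert(h != nullptr);
    return h;
}
```

Because `std::unique_ptr` cannot be copied, the pointer cannot end up in two owners by accident; it can only be moved.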

|     I suspect this may be due to your use of Boost smart_ptr so you may be
|     freeing something that might already be free'ed.
| 
| The code in the repository is the third approach. Sorry that I didn't explain
| it clearly in the first mail.
| 
| The reason of using smart pointer is subtle.
| 
| At first, I embedded the connection structure into a C++ class like what Dirk
| showed below. I'll explain the problem there.

As I mentioned, you can do without that.

| Take `redisCommand` for example. Since I didn't make the connection object to
| be a singleton, R need to pass a connection object to `redisCommand`. The
| object is exposed by `RCPP_MODULE`, so I need to extract it from `RCPP_MODULE`.

Well there are other ways to do this.

|     | | I would essentially do what you do: use RCPP_MODULE to expose a C++
|     | | class so that the C++ class manages scoping : constructor, destructor,
|     | | etc ...
|     | |
|     | | class A_cpp {
|     | | public:
|     | |      A_cpp( ) : obj( initA() ){}
|     | |      ~A_cpp(){ freeA(obj); obj = NULL ; }
|     | |
|     | |      int get_flag(){ return obj->flag ; }
|     | |      void set_flag( int x ){ obj->flag = x ; }
|     | |
|     | | private:
|     | |      A* obj ;
|     | | } ;
| 
| In my case, a double free runtime error occurred. The reason is due to the copy
| constructor of class A_cpp, i.e. `A_cpp(const A_cpp& src);`. 
| 
| This implementation will corrupt memory if:
| 
| ```cpp
|   A_cpp a1;
|   A_cpp a2(a1);
| ```
|  
| The compiler will free the space of a1::obj twice. The first time is at
| `a1::~A_cpp` and the second time is at `a2::~A_cpp`.
| 
| That's why I use a smart pointer to make class A_cpp copyable.

You copy a pointer, rather than the object, so I would add a test for "is not
NULL" in freeA() -- else you risk freeing the same thing twice. And get a
segfault.  

Anyway, there are better ways, I think, not involving a pointer.
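
For reference, the copy problem described above can be sketched in plain C++. This is only an illustration with local stand-ins for `A`, `initA` and `freeA`; `free_count` is a hypothetical counter added to make the behaviour observable. It uses `std::shared_ptr` with a custom deleter, one possible way to make the wrapper copyable without a double free, and not necessarily the approach preferred in this thread.

```cpp
#include <cassert>
#include <memory>

// Stand-ins for the C API discussed in the thread.
struct A { int flag; };
static int free_count = 0;           // counts real releases, for illustration

A* initA() { return new A{0}; }
void freeA(A* p) {
    if (p != nullptr) {              // guard against being handed NULL
        delete p;
        ++free_count;
    }
}

// Copyable wrapper: all copies share one A, released once by the last owner.
class A_cpp {
public:
    A_cpp() : obj(initA(), freeA) { assert(obj != nullptr); }

    int get_flag() const { return obj->flag; }
    void set_flag(int x) { obj->flag = x; }

private:
    std::shared_ptr<A> obj;          // reference-counted, calls freeA at the end
};
```

After `A_cpp a1; A_cpp a2(a1);` both objects refer to the same `A`, and `freeA` runs once when the second of them is destroyed.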

| The implementation of RCPP_MODULE contains a copy construction. A compile-time
| error occurred if I close the copy constructor manually:
| 
| ```cpp
| class A_cpp {
|   //...
| private:
|   A_cpp(const A_cpp&);
|   void operator=(const A_cpp&);

Maybe it's the const, we don't always have signatures for that...

| };
| ```
[...]
| Thank you, Dirk.
| 
| A singleton does make things easier. However, I still want to know how to
| expose C structure with provided `free`-like function because sometimes we
| cannot use singleton.

Manage your objects carefully, and don't call free() on NULL pointers.

If you want to write the equivalent of C code with Rcpp, you certainly can.
But you must make sure you do the pointer operations correctly.



From your second email:

| By the way, maybe you should look two files in `inst`.
|
| The `gen_function.R` crawls http://redis.io/commands and generates the command
| according to the official redis manual. You could modify `template.R` to
| generate the helper functions dynamically based on the exposed Rcpp function.

I did of course see those. Similar to what Romain does at times with
generator macros and scripts :)  And I like how roxygen also created all
those help files from it. 

But what I am doing (at least for now) is much simpler: one single redis
execution function which just sends one combined string with command _and_
arguments, eg 'SET testchannel 42' which should be fine.
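
As a small illustration of "one combined string with command and arguments", here is a hypothetical helper, `build_command`, that is not taken from either package. It simply joins the pieces with single spaces, so it assumes no argument itself contains whitespace.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Join a command name and its arguments with single spaces.
// Assumption: arguments contain no embedded spaces and need no quoting.
std::string build_command(const std::string& cmd,
                          const std::vector<std::string>& args) {
    assert(!cmd.empty());            // a command name is required
    std::string out = cmd;
    for (const std::string& a : args) {
        out += ' ';
        out += a;
    }
    return out;
}
```

Calling `build_command("SET", {"testchannel", "42"})` yields the string `SET testchannel 42` from the example above.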

That way we should end up with just one (small) R file, and one Rd file.  

More by this evening (US Central time), hopefully.

Cheers, Dirk

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com

