[Rcpp-devel] What is the best practice to expose a C st ructure from 3rd party library into R?

Tue Jul 23 05:22:38 CEST 2013

Hi Wush,

Nice work on Rhiredis.  I had some time to look more closely and have some
comments below.  In sum, you may have made things too complicated.  But hey,
you have a full working example which isn't too bad!  Well done.

On 22 July 2013 at 06:57, Dirk Eddelbuettel wrote:
| On 22 July 2013 at 13:19, romain at r-enthusiasts.com wrote:
| | Le 2013-07-22 10:12, Wush Wu a écrit :
| | > I wrote a wrapper of hiredis, which is a minimalistic C client for 
| | > the Redis database. Its name is `Rhiredis` and is much faster than 
| | > rredis, an existed redis client of R. Please
| | > see http://rpubs.com/wush978/rhiredis [1] for details. 

[ Actually, part the speed difference comes from you using base64enc which is
C-based and faster than what rredis has. If you first create a base64 object,
and then send that object directly with Rhiredis, the difference is smaller
but of course still significant. ]

| | > However, I still want to know if there is a better approach. I tried
| | > three approaches. Here is my opinions.
| | >
| | > # Preliminary
| | >
| | > Given a 3rd party C library, it usually provides a C structure and
| | > functions of memory management. For example:
| | >
| | > ```c
| | >
| | > struct A {
| | >     int flag;
| | >     // ...
| | > };
| | >
| | > A* initA();
| | > void freeA(A* a);
| | > ```
| | >
| | > The structure `A` need to be stored into a R object and pass to
| | > another function.
| | > Also, the user might need to access `A.flag`. For simplicity, there 
| | > is
| | > only one field in this example. However, there might be a number of
| | > fields in practice.

I think this is too complicated.  

All you need is 'static pointer' to the redic context object.

Then create one function to instantiate the object, and reuse it. 

| | > # XPtr
| | >
| | >  Thanks for the user guide of Rcpp modules , this is the first
| | > approach I tried.
| | >
| | > We could expose `A` into R as follow:
| | >
| | > ```cpp
| | > Rcpp::XPtr<A, freeA> a(initA());
| | > ```
| | >
| | > However, we should write helper function to accessing `A.flag` for
| | > every fields.

Same. 

You don't need to bring the redis context back to R. So no need for an
XPtr object.

| | > # RCPP_MODULE
| | >
| | > The guids of Rcpp modules also provides another approach:
| | >
| | > ```cpp
| | >
| | > RCPP_EXPOSED_CLASS(A)
| | >
| | > RCPP_MODULE(A) {
| | >   class_<A>("A")
| | >   .field("flag", &A::flag)
| | >   ;
| | >  }
| | >
| | > //@export
| | > //[[Rcpp::export]]
| | > SEXP init() {
| | >   BEGIN_RCPP
| | >   return wrap(*initA());
| | >   END_RCPP
| | > }
| | > ```
| | >
| | > This will produce a S4 class named `A` in R which stores C structure
| | > `A`.
| | >
| | > However, it also produces memory leak because no `freeA` is called.
| | >
| | > Adding `.finalizer(freeA)` in `RCPP_MODULE` will cause an error of
| | > freeing memory twice.
| | 
| | Gotta look into why this happens.

I suspect this may be due to your use of Boost smart_ptr so you may be
freeing something that might already be free'ed.  

| | > # Embed `A` into C++ class and expose the class with RCPP_MODULE
| | >
| | > This approach is implemented in `Rhiredis`.
| | >
| | > Finally, I still need to write helper function to expose the field of
| | > `A`. But the user could access these flag in R with operator `$`.
| | >
| | > Note that I still need a function to extract the pointer of `A` from
| | > exposed S4 object:
| | >
| | > ```cpp
| | >
| | > template<class T>
| | > T* extract_ptr(SEXP s) {
| | >  Rcpp::S4 s4(s);
| | >  Rcpp::Environment env(s4);
| | >  Rcpp::XPtr<T> xptr(env.get(".pointer"));
| | >  return static_cast<T*>(R_ExternalPtrAddr(xptr));
| | > }
| | > ```

I do not understand why you would need all that.

| | > Please give me some suggestion if you know a better or a different
| | > approach.
| | 
| | I would essentially do what you do: use RCPP_MODULE to expose a C++ 
| | class so that the C++ class manages scoping : constructor, destructor, 
| | etc ...
| | 
| | class A_cpp {
| | public:
| |      A_cpp( ) : obj( initA() ){}
| |      ~A_cpp(){ freeA(obj); obj = NULL ; }
| | 
| |      int get_flag(){ return obj->flag ; }
| |      void set_flag( int x ){ obj->flag = x ; }
| | 
| | private:
| |      A* obj ;
| | } ;

I may do something like that tomorrow evening.

For now, I did something _much_ simpler still.   We need single C++ source
file.  Some key features:

-- no more Boost shared_ptr
-- no more Rcpp::XPtr
-- "empty" constructor and destructor, nothing to be done
-- no member variables
-- one worker function which takes string (with a redis command), runs it
   and returns the result string
-- the worker function checks if the redis context pointer is non-NULL; in
   case it is NULL (eg first use) it creates a context

-----------------------------------------------------------------------------

#include <Rcpp.h>
#include <hiredis/hiredis.h> 	// on Ubuntu file /usr/include/hiredis/hiredis.h

// We use a static pointer which makes the variable persistent across calls
// A more formal C++ idiom would be a singleton class, but this is simpler
static redisContext *prc = NULL;

// A simple and lightweight class -- without member variables 
// We could add some member variables to cache the last call, status, ...
//
class Redis {

private: 
    void checkAndInit() {
	if (prc == NULL) {
	    prc = redisConnect("127.0.0.1", 6379);
	    // should test for error here, could even throw() under Rcpp, ...
	    Rcpp::Rcout << "Context created\n";
	} else {
	    Rcpp::Rcout << "Reusing context\n";
	}
    }

public:
    Redis()  { /* Rcpp::Rcout << "In ctor\n"; */ }
    ~Redis() { /* Rcpp::Rcout << "In dtor\n"; */ }

    std::string execCommand(std::string cmd) {
	checkAndInit();		// important: ensure prc is set up, re-use if so
	redisReply *reply = static_cast<redisReply*>(redisCommand(prc, cmd.c_str()));
	std::string res(reply->str);
	freeReplyObject(reply);
	return(res);
    }

    void disconnect() {
	redisFree(prc);
	prc = NULL;		// just to be on the safe side
    }

    // could create new functions to (re-)connect with given host and port etc pp
};

-----------------------------------------------------------------------------

Couple this with a simple function (which is so simple that I wrote it
"straight" without even Rcpp attributes. One string in, one string out:

-----------------------------------------------------------------------------
// for now, single worker function
extern "C" SEXP execRedisCommand(SEXP p) {
    Redis redis;
    std::string txt = Rcpp::as<std::string>(p);
    std::string res = redis.execCommand(txt);
    return Rcpp::wrap(res);
}
-----------------------------------------------------------------------------

(We could create a second function to close redis and call disconnect();
and we could 'getter' function for the state variables inside the context.)

This function is then used by a single R function:

-----------------------------------------------------------------------------
## minimal R interface to redis using the hiredis library
redis <- function(cmd) {
    ## some sanity checking on cmd would be nice too...
    .Call("execRedisCommand", cmd, package="Rhiredis")
}
-----------------------------------------------------------------------------

And that is all.  I added a simple tester:

edd at don:~/git/Rhiredis$ Rscript tests/simpleRedisClient.R 
Context created
[1] "PONG"
Reusing context
[1] "PONG"
Reusing context
[1] "PONG"
Reusing context
[1] "OK"
Reusing context
[1] "42"
edd at don:~/git/Rhiredis$ 

As you can see from the verbose text from checkAndInit(), we only need to
create the redis instance once.

I may still turn this into something based on Rcpp modules. Maybe tomorrow.

The code is for now in a quick fork of your Rhiredis in my github account.

Hope this helps,  Dirk

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com