[Rcpp-devel] new to R, so don't understand Rcpp limits

Christian Gunning xian at unm.edu
Fri Aug 4 20:17:53 CEST 2017


> | Hi,
> | I need to share some data between my Python code and my c++ code, C++ does
> | not really have a lot of nice ideas like DataFrames. But if you save a
> | dataframe from Python into csv, you can readily read it using R. Csv is not
> | the best way to go, but it is a simple case.
>
> CSVs are indeed a terrible format, yet annoyingly common. Try binary
> alternatives if you can.
>

I disagree with Dirk that CSV is "a terrible format" - it excels (hah) at
human readability, is decently machine-readable and easily compressible,
but certainly is inappropriate for many tasks that require efficiency.


> | I have generally been noticing as I google around, that R has a healthy and
> | seemingly growing list of packages that can be accessed by c++ code. From
> | c++, R does not look so bad to me, and I would like to get access to this
> | large library of native routines in R.
> |
> | First on the list, is that I hope to read a dataframe or something like it
> | from data in a file, and then transform that dataframe or other tabular
> | object into something I can use in my c++ code for linear algebra, like an
> | Armadillo matrix.
> |
> | So is there any native code in the R world that I can use to read a
> | dataframe from a file?
>

First off, only use a data.frame if what you really want is a data.frame.
Otherwise, stick with a matrix (or convert to one as early as possible).

* Data.frame = ordered collection of equal-length vectors, possibly of
heterogeneous type.
* Matrix = ordered collection of values of a single type, of known / fixed
dimension, by default stored column by column (column-major) in both R and
armadillo (as in LAPACK).

In R, for modest-sized objects, going between these two types is
"relatively seamless". But in C++/Rcpp, the underlying differences are more
apparent to the user.  Matrices "just work" (e.g. easy construction of an
"identical" armadillo object), whereas data.frames require some care and
attention, and possibly extra object creation and destruction.  When possible,
stick with matrices.
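
To make the layout point concrete, here is a minimal sketch in plain C++ (standard library only, no Rcpp or armadillo). The names `ColMajorMatrix` and `fill_like_r` are made up for illustration. It stores a matrix as one contiguous buffer with the columns laid end to end, which is the layout described above, and fills it the way `matrix(1:n, ncol = c)` does in R.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type, for illustration only: a dense matrix stored
// column-major, i.e. the default layout in R, armadillo and LAPACK.
struct ColMajorMatrix {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<double> data;  // length nrow * ncol, columns stored end to end

    ColMajorMatrix(std::size_t r, std::size_t c)
        : nrow(r), ncol(c), data(r * c, 0.0) {}

    // Element (i, j), zero-based: column j starts at offset j * nrow.
    double& at(std::size_t i, std::size_t j) { return data[i + j * nrow]; }
    double at(std::size_t i, std::size_t j) const { return data[i + j * nrow]; }
};

// Fill with 1, 2, 3, ... running down each column in turn, mirroring
// what matrix(1:(nrow * ncol), ncol = ncol) produces in R.
inline ColMajorMatrix fill_like_r(std::size_t nrow, std::size_t ncol) {
    ColMajorMatrix m(nrow, ncol);
    for (std::size_t k = 0; k < m.data.size(); ++k) {
        m.data[k] = static_cast<double>(k + 1);
    }
    return m;
}
```

With `fill_like_r(10, 3)`, the zero-based element (0, 1) sits at offset 10 in the buffer, just as `aa[1, 2]` is the 11th value of `matrix(1:30, ncol=3)` in R. A data.frame has no such single buffer: each column is its own vector, possibly of a different type, which is why the conversion costs more.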


> |
> | I think Rcpp is really cool, it might make me
> | a backdoor R user.
>

I became a backdoor C++ programmer through Rcpp.  +1 really cool.

I found Google Protocol Buffers absurdly useful for moving between R and
cpp in complex projects.  It's well-documented, fast, encourages
separate/good metadata documentation, and works smoothly for R, C++, and
Python.   I never did use protobufs for vector data, though. I did write
some test code using repeated fields, but didn't get to the point of
comfort there. For arrays of fixed dimension, I can imagine using one field
per dimension to code that dimension's length, and then a final repeated
field with the payload.  See below for example.

Question for Dirk (et al):

Has anyone used protobuf messages for, e.g., passing arrays? Any obvious
downsides?  When I last googled, I didn't find much re protobuf repeated
fields or Rcpp + protobufs...


// File PbTest.proto
syntax = "proto2";
package Array;
// see https://developers.google.com/protocol-buffers/docs/reference/cpp-generated#fields

message a2d {
    optional uint32 dim1 = 10;
    optional uint32 dim2 = 20;
    // add more dims here
    //
    // numeric vector
    repeated float  payload = 50;
}

## File pbarray.R
library(RProtoBuf)
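aa <- matrix(1:30, ncol=3)
## create an empty Array.a2d message from the .proto definition
bb <- new(P("Array.a2d", file='PbTest.proto'))
## record the dimensions, then append the values
## (an R matrix is stored column by column)
bb$dim1 <- dim(aa)[1]
bb$dim2 <- dim(aa)[2]
bb$add(field='payload', values=aa)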

cat(as.character(bb))
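
For the receiving end, here is a rough C++ sketch of how the dims-plus-payload scheme could be unpacked. I haven't run this against real protobuf output. `A2dMessage` is a hypothetical plain struct standing in for the protoc-generated class so that the snippet compiles on its own, and it assumes the payload arrives in R's column-major order.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a decoded Array.a2d message, so the sketch
// compiles without protoc-generated headers.
struct A2dMessage {
    std::uint32_t dim1 = 0;      // number of rows
    std::uint32_t dim2 = 0;      // number of columns
    std::vector<float> payload;  // flattened values, assumed column-major
};

// Check that the declared dimensions agree with the payload length.
inline void validate(const A2dMessage& msg) {
    const std::size_t expected =
        static_cast<std::size_t>(msg.dim1) * static_cast<std::size_t>(msg.dim2);
    if (msg.payload.size() != expected) {
        throw std::runtime_error("a2d: payload length does not match dim1 * dim2");
    }
}

// Zero-based element access, assuming the sender flattened column by column.
inline float element(const A2dMessage& msg, std::size_t row, std::size_t col) {
    validate(msg);
    if (row >= msg.dim1 || col >= msg.dim2) {
        throw std::out_of_range("a2d: index out of range");
    }
    return msg.payload[row + col * msg.dim1];
}
```

The length check matters because nothing in the message format ties `dim1 * dim2` to the number of payload entries, so a malformed message would otherwise be read out of bounds. Note also that `float` is 32-bit while R numerics are doubles, so a `double` payload field may be the safer choice for non-integer data.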

best,
Christian
http://www.x14n.org