[Rcpp-devel] DataFrame and passing by reference

stat quant statquant at outlook.com
Sun Mar 31 18:59:02 CEST 2013

Thanks Kevin,
The question came up because when you do this:

// [[Rcpp::export]]
DataFrame updateDFByValue(DataFrame df) {
    int N = df.nrows();
    NumericVector newCol(N, 1.0);  // column of N ones
    df["newCol"] = newCol;
    return df;
}

the DataFrame is returned to R as a list, and building another
data.frame from it might

   1. cost time
   2. seem wasteful when the intent was only to update the existing data.frame

data.table allows by-reference updates in R, but there is no C API for it
that I know of. Since it is an enhanced data.frame, Rcpp treats it as a
data.frame. It seemed a pity to be able to update by reference in R but
not in C++, so I asked this genuine question.

Your way makes sense to me, I'll try to dig deeper.

PS: Dirk answered me

2013/3/31 Kevin Ushey <kevinushey at gmail.com>

> I think the problem here is that the assignment df["newCol"] = newCol
> copies the dataframe. Note that something like this would work as expected:
> #include <Rcpp.h>
> using namespace Rcpp;
> // [[Rcpp::export]]
> void updateDFByRef(DataFrame& df) {
>     int N = df.nrows();
>     NumericVector newCol(N,1.);
>     df[0] = newCol; // replace the 1st vector with the numeric vector of 1s, by ref
>     return;
> }
> So, the reference to the original df is getting passed, the problem is
> figuring out how to assign a new vector to df without forcing a copy.
> I'm not sure if there's a ready-made solution, but I imagine the easiest
> way to do it would be:
> 1) Declare a new list of df.size()+1,
> 2) Copy the pointers to the new list (not sure the best way to do this in
> Rcpp),
> 3) Assign the vector you want to the new, last column,
> 4) Return that new list.
> This should work since, internally, lists (VECSXPs) are just vectors of
> SEXPs (pointers) to other R vectors (REALSXPs, INTSXPs, and so on).
> (Please correct me if I'm wrong on the above.)
> -Kevin
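
Kevin's four steps can be modeled in plain C++ to show why copying the pointers is cheap. This is only a sketch of the idea, not Rcpp code: `Column`, `Frame` and `appendColumn` are hypothetical names standing in for a REALSXP, a VECSXP and the helper he describes, and `std::shared_ptr` stands in for R's SEXP pointers. It is written against the standard library only so that it compiles without R.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for R's types (not the Rcpp API):
// a Column plays the role of a REALSXP, and a Frame plays the role of a
// VECSXP, i.e. a vector of pointers to columns, plus the column names.
using Column = std::vector<double>;
using ColumnPtr = std::shared_ptr<Column>;

struct Frame {
    std::vector<std::string> names;
    std::vector<ColumnPtr> cols;
};

// Build a list one slot longer, copy the pointers (not the data),
// put the new column in the last slot, and return the new list.
Frame appendColumn(const Frame& df, const std::string& name, Column values) {
    // The new column must have as many rows as the existing columns.
    assert(df.cols.empty() || values.size() == df.cols.front()->size());

    Frame out;
    out.names.reserve(df.cols.size() + 1);              // step 1: size()+1 slots
    out.cols.reserve(df.cols.size() + 1);
    for (std::size_t i = 0; i < df.cols.size(); ++i) {  // step 2: copy pointers
        out.names.push_back(df.names[i]);
        out.cols.push_back(df.cols[i]);                 // shares the column data
    }
    out.names.push_back(name);                          // step 3: new last column
    out.cols.push_back(std::make_shared<Column>(std::move(values)));
    return out;                                         // step 4: return new list
}
```

In this model the returned frame's existing columns point at the same storage as the input's, so only the pointer vector is rebuilt. The input frame itself still has its original number of columns, which is why the result has to be returned to the caller rather than appearing as an in-place update.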
> On Sun, Mar 31, 2013 at 6:44 AM, stat quant <statquant at outlook.com> wrote:
>> Hello list,
>> looking at Rcpp::DataFrame in the gallery <http://gallery.rcpp.org/tags/dataframe/> I realized that I didn't know how to modify a DataFrame by reference.
>> Googling a bit I found this post on SO <http://stackoverflow.com/questions/13773529/passing-a-data-table-to-c-functions-using-rcpp-and-or-rcpparmadillo> and this post
>> on the archive <http://www.mail-archive.com/rcpp-devel@lists.r-forge.r-project.org/msg04919.html>.
>> There is nothing obvious, so I suspect I am missing something big like "It is
>> already the case because" or "it does not make sense because".
>> I tried the following, which compiled, but the data.frame object passed to
>> updateDFByRef in R stayed untouched:
>> #include <Rcpp.h>
>> using namespace Rcpp;
>> // [[Rcpp::export]]
>> void updateDFByRef(DataFrame& df) {
>>     int N = df.nrows();
>>     NumericVector newCol(N,1.);
>>     df["newCol"] = newCol;
>>     return;
>> }
>> Could somebody explain to me what I am missing, or kindly point me to a
>> document where I can find the explanation?
>> Cheers