[Rcpp-devel] Is creating a DataFrame from std::vector-s supposed to work?

Dirk Eddelbuettel edd at debian.org
Thu May 16 20:09:16 CEST 2013


Hi Ivan,

Thanks for the follow-up.

On 16 May 2013 at 13:17, Ivan Popivanov wrote:
| Are you saying that DataFrame::create doesn't work properly with std::vector? I
| think I have seen examples and unit tests doing exactly that.

No, it works. At least "in general". But as the parallel thread shows, there
can be issues with temp objects. And we had reports about issues with large
objects going into data.frames.

So my advice would be to plan to minimize copies: instantiate objects so
that they re-use existing memory allocations.

The Rcpp Gallery post on faster data.frame object creation does something
along those lines, and was reported to be robust and fast as I recall.
 
| In my example, I'd like the c++ layer doing the actual work to stay pure c++,
| so that this code could be used without Rcpp (in a c++ program for
| instance).

Yes. I have done that too.  And then create a shim function that takes your
pure C++ and creates Rcpp / R objects.

But I have in fact returned large lists of large std::vector objects too.
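
A minimal sketch of that split between a pure C++ layer and an Rcpp shim. The
names Trades, computeTrades() and tradesToDataFrame() are hypothetical and not
taken from Ivan's code, and the trading logic is a placeholder. The shim is
only compiled when <Rcpp.h> can be found, so the same file should also build
in a plain C++ program:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pure C++ layer: no R or Rcpp headers, so it can be reused outside of R.
// `Trades` and `computeTrades` are hypothetical names for illustration only.
struct Trades {
    std::vector<int> entry;    // 1-based entry index
    std::vector<int> exit;     // 1-based exit index
    std::vector<double> gain;  // price at exit minus price at entry
};

// Placeholder logic: enter at i, exit holdingPeriod steps later, repeat.
Trades computeTrades(const std::vector<double>& prices, int holdingPeriod) {
    assert(holdingPeriod > 0);
    Trades out;
    const std::size_t step = static_cast<std::size_t>(holdingPeriod);
    for (std::size_t i = 0; i + step < prices.size(); i += step) {
        out.entry.push_back(static_cast<int>(i) + 1);
        out.exit.push_back(static_cast<int>(i + step) + 1);
        out.gain.push_back(prices[i + step] - prices[i]);
    }
    return out;
}

// Shim: compiled only when Rcpp is available (assumes __has_include support).
#if defined(__has_include)
#if __has_include(<Rcpp.h>)
#include <Rcpp.h>

// Copies each std::vector into an R vector once, via the iterator-range
// constructors, and then assembles the data.frame from those R types.
// [[Rcpp::export]]
Rcpp::DataFrame tradesToDataFrame(const std::vector<double>& prices,
                                  int holdingPeriod) {
    const Trades t = computeTrades(prices, holdingPeriod);
    return Rcpp::DataFrame::create(
        Rcpp::Named("Entry") = Rcpp::IntegerVector(t.entry.begin(), t.entry.end()),
        Rcpp::Named("Exit")  = Rcpp::IntegerVector(t.exit.begin(), t.exit.end()),
        Rcpp::Named("Gain")  = Rcpp::NumericVector(t.gain.begin(), t.gain.end()));
}
#endif
#endif
```

From R, something like Rcpp::sourceCpp("trades.cpp") followed by
tradesToDataFrame(c(10, 11, 13, 12, 15), 2L) should then return the
data.frame (the file name is made up as well).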

| Thus, the results from this layer are std::vector. Other than that, I just want
| to float it up to R, where the final result (in R) makes most sense to be a
| data frame. It seemed most efficient to build the data frame in c++. And since
| I saw that a data frame can be created from std::vectors - I went for the
| direct approach.

Agreed.  Before we had an Rcpp::DataFrame type, I returned an Rcpp::List and
just added an as.data.frame() call back in R. That worked well.
 
| Under these circumstances, can you think of a better way of building the data
| frame without creating the extra R objects on the fly? In fact, the
| intermediate vectors are not a big deal - the gain in speed of using c++ dwarfs
| the conversion, but still, if there is an easy optimization ...

Hard to tell.  I think we have constructors that take x.begin() and
x.length() so you'd re-use or copy efficiently.  Might be worth trying those.
 
| Of course, I can return anything and convert it to a data.frame in R. Just
| trying to understand what is the recommended way of doing that. Any other
| suggestions/ideas are welcome.

All good, and it is a little trial-and-error.  Patches for code or
documentation always welcome!

Dirk

| 
| Thanks in advance,
| Ivan
| 
| 
| On Wed, May 15, 2013 at 11:30 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
| 
| 
|     On 15 May 2013 at 20:23, Ivan Popivanov wrote:
|     | Two versions of the code, see at the end. The first converts my
|     | std::vectors to Rcpp equivalents. The second builds the data frame
|     | from the std::vectors.
| 
|     A data.frame is an R list with some special sauce, and it is probably
|     easier to construct this from R types -- reduces temp copies.  We
|     cannot, after all, (directly) send a std::vector back to R, though the
|     implicit use of wrap() comes close.
| 
|     | Calling the function containing this code numerous times (hundreds if
|     | not thousands) returning the std::vectors directly seems to cause
|     | random crashes or hangs somewhere after the return statement. The
|     | other version seems to work properly.
| 
|     The devil is in the detail.
| 
|     There was some discussion here on the list over the years, and we have one
|     contributed 'faster' data.frame creator here:
| 
|        http://gallery.rcpp.org/articles/faster-data-frame-creation/
| 
|     Dirk
| 
|     | Thanks,
|     |
|     |   return Rcpp::DataFrame::create(
|     |                Rcpp::Named("Entry") = Rcpp::IntegerVector(ibeg.begin(), ibeg.end()),
|     |                Rcpp::Named("Exit") = Rcpp::IntegerVector(iendOut.begin(), iendOut.end()),
|     |                Rcpp::Named("Position") = Rcpp::IntegerVector(position.begin(), position.end()),
|     |                Rcpp::Named("StopLoss") = Rcpp::NumericVector(stopLoss.begin(), stopLoss.end()),
|     |                Rcpp::Named("StopTrailing") = Rcpp::NumericVector(stopTrailing.begin(), stopTrailing.end()),
|     |                Rcpp::Named("ProfitTarget") = Rcpp::NumericVector(profitTarget.begin(), profitTarget.end()),
|     |                Rcpp::Named("ExitPrice") = Rcpp::NumericVector(exitPrice.begin(), exitPrice.end()),
|     |                Rcpp::Named("Gain") = Rcpp::NumericVector(gain.begin(), gain.end()),
|     |                Rcpp::Named("MAE") = Rcpp::NumericVector(mae.begin(), mae.end()),
|     |                Rcpp::Named("MFE") = Rcpp::NumericVector(mfe.begin(), mfe.end()),
|     |                Rcpp::Named("Reason") = Rcpp::IntegerVector(reason.begin(), reason.end()));
|     |    /*
|     |    return Rcpp::DataFrame::create(
|     |                Rcpp::Named("Entry") = ibeg,
|     |                Rcpp::Named("Exit") = iendOut,
|     |                Rcpp::Named("Position") = position,
|     |                Rcpp::Named("StopLoss") = stopLoss,
|     |                Rcpp::Named("StopTrailing") = stopTrailing,
|     |                Rcpp::Named("ProfitTarget") = profitTarget,
|     |                Rcpp::Named("ExitPrice") = exitPrice,
|     |                Rcpp::Named("Gain") = gain,
|     |                Rcpp::Named("MAE") = mae,
|     |                Rcpp::Named("MFE") = mfe,
|     |                Rcpp::Named("Reason") = reason);
|     |                */
|     |
|     --
|     Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
| 
| 

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com

