[Rcpp-devel] Modules and Boost and larger data sets
Simon Zehnder
szehnder at uni-bonn.de
Fri Sep 6 14:42:55 CEST 2013
Hi Dirk,
thanks for the quick answer and for the many suggestions and corrections! I now have a better idea of how to design the package.
On Sep 6, 2013, at 2:20 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> On 6 September 2013 at 13:46, Simon Zehnder wrote:
> | Dear Rcpp-Users and Rcpp-Devels,
> |
> | this goes especially to Dirk and Romain, the developers of RcppBDT.
>
> Well it's mostly me for the scope of it, with numerous invaluable assists
> from Romain. The released version is far behind the SVN version;
> unfortunately the SVN version is far from release-ready.
>
I will know better next time. Looking forward to the release.
> | I am right now writing on a package for market microstructure data -
> | usually large tick datasets with trade times and security symbols.
>
> Interesting. I do that for a living too.
>
Well, when doing research with tick data it is sometimes a pain in the ass to match trades with the last quotes, or spot prices with futures prices. In MM, research relies heavily on this, and it takes up most of the time. So I am trying to build a package that can do most of it, and fast (my idea is to use OpenMP in C++ for ordering and filtering). Furthermore, the tick data most used in research are either NYSE/WRDS or, for bonds, TRACE (regarding the spot markets). So the package should also handle the special formats of these to make things easier. There is a package 'highfrequency' which does something similar, but it is not appropriate for TRACE data.
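To make the matching task concrete, here is a minimal sketch of pairing each trade with the last quote at or before it. It is only an illustration under assumptions: the Quote and Trade layouts and the function name match_last_quote are hypothetical, timestamps are assumed to be integer microseconds, both inputs are assumed to be sorted by time already, and it is single-threaded (no OpenMP).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record layouts; real NYSE or TRACE files carry more fields.
struct Quote { std::int64_t time_us; double bid; double ask; };
struct Trade { std::int64_t time_us; double price; };

// For each trade, return the index of the most recent quote whose time is
// <= the trade time, or -1 if no quote precedes the trade.
// Assumption: both vectors are sorted by time_us in ascending order.
std::vector<std::ptrdiff_t> match_last_quote(const std::vector<Trade>& trades,
                                             const std::vector<Quote>& quotes) {
    assert(std::is_sorted(trades.begin(), trades.end(),
        [](const Trade& a, const Trade& b) { return a.time_us < b.time_us; }));
    assert(std::is_sorted(quotes.begin(), quotes.end(),
        [](const Quote& a, const Quote& b) { return a.time_us < b.time_us; }));

    std::vector<std::ptrdiff_t> out;
    out.reserve(trades.size());
    std::size_t q = 0;  // number of quotes seen so far with time <= trade time
    for (const Trade& t : trades) {
        while (q < quotes.size() && quotes[q].time_us <= t.time_us) ++q;
        out.push_back(static_cast<std::ptrdiff_t>(q) - 1);
    }
    return out;
}
```

Because both series are walked once, the cost is linear in the number of trades plus quotes, which is what makes the sorted order worth establishing first.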
> | I read the Rcpp Book about Modules and when starting as usual with S4
> | classes in R, the Modules came into my mind. As I am operating on datasets
> | with usually around 1 Mio. rows I am wondering, if maybe the implementation
> | via Modules is the better (better in regard to performance) one - in
>
> That is not usually the motivation for modules.
>
> "Straight up" functions, coded via inline or attributes, will be as fast.
>
That settles it: I will use S4 classes in R. I know these very well by now.
> | comparison to the usual S4 class implementation directly in R. With Modules
>
> "The usual S4 class implementation"?
>
> I have done R for over a decade and I still hardly use S4, so "the usual" is,
> errmm, "unusual".
>
That is true. The S4 class system is not very close to OOP in C++ or Java, and it has a lot of limitations. It does, though, give me a good way to structure my code. By "usual" I meant writing S4 classes in R rather than defining them in C++. As far as I understood from the Modules chapter of your book, S4 classes are built automatically when Modules are defined? Please correct me if I am wrong.
> | I am able to define all functions on the datasets in C++ - which I expect
> | to be faster. Sorting the data and filtering the data in regard to
> | dates/times are of course one of the main tasks to be covered.
>
> I have some trouble with the logic of your argument, but accept the end
> result that Boost Date.Time is good for dates and times. :)
>
It's all about performance. Sorry for being imprecise. I expect that sorting and filtering data by date/time in C++ is faster than doing it in R with POSIXlt/POSIXct (at least for larger datasets).
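As a rough sketch of what I mean by sorting and filtering on times in C++: the code below uses only the standard library, with timestamps assumed to be integer microseconds instead of Boost Date.Time types. The Tick layout and the function names sort_by_time and filter_window are hypothetical, and whether this actually beats R on real data is something I would still have to measure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tick record; timestamps assumed to be integer microseconds.
struct Tick { std::int64_t time_us; double price; };

// Sort ticks by time; stable_sort keeps the original order of equal timestamps.
void sort_by_time(std::vector<Tick>& ticks) {
    std::stable_sort(ticks.begin(), ticks.end(),
        [](const Tick& a, const Tick& b) { return a.time_us < b.time_us; });
}

// Return the ticks with from_us <= time_us < to_us.
// Assumption: ticks is already sorted by time_us, so two binary searches suffice.
std::vector<Tick> filter_window(const std::vector<Tick>& ticks,
                                std::int64_t from_us, std::int64_t to_us) {
    assert(std::is_sorted(ticks.begin(), ticks.end(),
        [](const Tick& a, const Tick& b) { return a.time_us < b.time_us; }));
    auto before = [](const Tick& t, std::int64_t v) { return t.time_us < v; };
    auto first = std::lower_bound(ticks.begin(), ticks.end(), from_us, before);
    auto last = std::lower_bound(first, ticks.end(), to_us, before);
    return std::vector<Tick>(first, last);
}
```

Once the data are sorted, each window query costs two binary searches plus the copy of the selected rows, so repeated filtering does not rescan the whole dataset.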
> | In RcppBDT I read in the DESCRIPTION file, that the Boost Header Files for
> | Date.Time must be included.
>
> "On the system on which RcppBDT is to be compiled" -- different from where it
> is used (Windows, say). _No run-time depends_
Ah, the binaries that can be loaded for each system ...
> | As I have to choose one library for Date/Time formats in C++, boost just
> | seems so appropriate. But for usage in the Market Microstructure community
> | it is impossible to expect them to install Boost on their system.
>
> Sorry but one has nothing to do with the other.
>
True, I just want my colleagues and other researchers in the field to be able to use it very easily. You gave me the answer below.
> Also please look at the CRAN package BH -- it _provides_ Boost headers for
> this very purpose. Several packages already use it.
>
> | So, I would like to provide Boost already within the package.
>
> Just don't do it. Seriously. Use a "Depends: BH"
>
That is perfect! Thanks for this valuable information.
> | As everything what you two do makes sense, I think I haven't grabbed yet the
> | reason, why Boost is not provided in the RcppBDT right alongside. Is there
> | something which restricts me from doing this?
>
> It's inefficient. We don't ship the headers of the C library either.
>
> It's just a Depends.
>
> Better to hand-off to the system, and with R, we can (at least for pure
> template headers) via the BH package we created.
>
> | I am very thankful for thoughts and opinions on my idea and my question.
>
> Sure, no problem.
>
> Dirk
>
> --
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
So, in the end: thanks again for your valuable comments and tips.
Best
Simon