[GSoC-PortA] Random Portfolios Speed Improvement with Rcpp

Wed Nov 13 17:32:23 CET 2013

Minor email style comment:  It is much easier to read a thread if html
format is used so that font and color differences are used.  Doug

From: gsoc-porta-bounces at lists.r-forge.r-project.org
[mailto:gsoc-porta-bounces at lists.r-forge.r-project.org] On Behalf Of Ross
Bennett
Sent: Wednesday, November 13, 2013 8:06 AM
To: PortfolioAnalytics
Subject: Re: [GSoC-PortA] Random Portfolios Speed Improvement with Rcpp

See responses inline.

On Tue, Nov 12, 2013 at 9:49 PM, Brian G. Peterson <brian at braverock.com>
wrote:

Ross, this is a very interesting prototype.

We haven't shied away from adding C dependencies in PerformanceAnalytics or
blotter or quantstrat.  All of them have recently acquired compiled code.

The first part of your benchmark is a fair one, generating the random
portfolios, and one that I would expect compiled code to do better than
native R code (though we haven't spent any time profiling or trying to
improve the native R code either) because it is a big loop.

I'll look to see if there is a way to improve the R code here. At first
glance it isn't obvious where performance could be improved, because it is
just a while loop as you stated. Perhaps something with how the weight_seq
subsetting is done. Although the overall impact on random portfolios would
be small because this is only called once, any improvements here could be
used in rp_transform which is called by the mapping function passed to
DEoptim. If I understand correctly, this is called tens of thousands of times
so any incremental improvement could be a large net gain overall for
optimize_method="DEoptim".
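
For concreteness, a while loop of this kind can be sketched in standalone C++
(no Rcpp). This is only a simplified illustration, not the PortfolioAnalytics
or RcppRP code: the function name random_portfolio, its parameters, and the
acceptance rule are all assumptions, and box constraints are reduced to the
weight grid itself.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not from PortfolioAnalytics or RcppRP).
// Draws one weight vector from the discrete grid weight_seq and adjusts it
// until the weights sum to a value in [min_sum, max_sum].
// Returns an empty vector if no feasible portfolio is found in max_steps.
std::vector<double> random_portfolio(const std::vector<double>& weight_seq,
                                     std::size_t n_assets,
                                     double min_sum, double max_sum,
                                     std::mt19937& rng,
                                     std::size_t max_steps = 10000) {
    assert(!weight_seq.empty() && n_assets > 0 && min_sum <= max_sum);
    std::uniform_int_distribution<std::size_t> pick_weight(0, weight_seq.size() - 1);
    std::uniform_int_distribution<std::size_t> pick_asset(0, n_assets - 1);

    // Start from a uniformly random draw from the grid for every asset.
    std::vector<double> w(n_assets);
    for (double& x : w) x = weight_seq[pick_weight(rng)];

    for (std::size_t step = 0; step < max_steps; ++step) {
        const double total = std::accumulate(w.begin(), w.end(), 0.0);
        if (total >= min_sum && total <= max_sum) return w;  // feasible: done

        // Propose a new grid weight for one random asset and accept it only
        // if it moves the sum toward the feasible band.
        const std::size_t i = pick_asset(rng);
        const double candidate = weight_seq[pick_weight(rng)];
        const bool too_high = total > max_sum;
        if ((too_high && candidate < w[i]) || (!too_high && candidate > w[i])) {
            w[i] = candidate;
        }
    }
    return {};  // no feasible portfolio found within max_steps
}
```

A caller would build the grid once (for example 0, 0.01, ..., 1), seed a
std::mt19937, and call random_portfolio repeatedly to collect candidates.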

The second part of your benchmark is rather unfair though, as you say
yourself:

"Benchmark the optimization functions of PortfolioAnalytics and RcppRP.
The rp_optimize_v2 uses slimmed down C++ implementations of
constrained_objective and optimize.portfolio from PortfolioAnalytics. The
objective, constrained objective, and optimization functions
must all be in C++ so that I can "stay in C++ world" for the
optimization when calling constrained_objective for each set of weights."

The entire point of constrained_objective is that the objective from the
portfolio specification is of arbitrary complexity, and will typically
include much more complex functions than 'mean' and 'sd'.

I, perhaps naively, assumed that the most common objective functions used
were mean, StdDev, and ES. What is your take on the most common complex
objective functions? Perhaps these could be optimized or implemented in a
compiled language. Depending on the function, improvements here would also
likely be an overall net gain because they are called thousands or tens of
thousands of times from constrained_objective.

So, keeping everything in C++ world isn't really possible with arbitrary
objectives, by construction, because the objectives can be *any R function*.

That said, where would generally useful improvements (in handling arbitrary
objectives) likely lie?

* handling the loop over all the random weight vectors in compiled code:

This isn't likely to be a huge performance improvement with arbitrary
objectives, as the time spent in the loop is likely dwarfed by the objective
function itself.

Agreed, I believe most of the time is spent in constrained_objective.
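
To illustrate why the loop itself is cheap, here is a minimal standalone C++
sketch (again without Rcpp). The names Weights, Objective, and best_candidate
are hypothetical, and std::function merely stands in for an arbitrary
user-supplied objective; it is not how constrained_objective is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

using Weights = std::vector<double>;
// Stand-in for an arbitrary user-supplied objective (lower is better).
using Objective = std::function<double(const Weights&)>;

// Hypothetical helper: evaluate every candidate weight vector and return
// the index of the one with the smallest objective value.
std::size_t best_candidate(const std::vector<Weights>& candidates,
                           const Objective& objective) {
    assert(!candidates.empty());
    std::size_t best = 0;
    double best_value = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        // All of the real work happens inside this call; the loop around it
        // is only bookkeeping.
        const double value = objective(candidates[i]);
        if (value < best_value) {
            best_value = value;
            best = i;
        }
    }
    return best;
}
```

Whatever language the loop is written in, its cost is one comparison and one
assignment per candidate, so the runtime is governed by the objective call.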

* improved handling of arguments and argument matching

This one could be huge, but also doesn't require compiled code.  Josh Ulrich
recently came up with huge speed improvements in quantstrat in part by
improving the argument matching and calling of arbitrary functions.  The
prototype of that code in quantstrat came from PortfolioAnalytics.  The key
improvement was in not evaluating large arguments. In this case, that would
be the returns time series and the moments and co-moments.  This trick could
and probably should be ported to PortfolioAnalytics.

I saw his post on FOSS trading and was very impressed by the performance
gain. I would really like to look into this for PortfolioAnalytics. I
started looking at the quantstrat source code, but wasn't exactly sure what I
was looking for. Can you point me to the diff or function (or a simple
example) where this was done?

We saw Dirk try to create a faster C++ version of DEoptim a few years ago.
His RcppDEoptim didn't pass dots (...) to the objective function.  Oops.
The entire performance gain came from this lack of ability to use an
arbitrary objective.  When dots were added back in, the C++ version of DE was
slower than the C version (as you'd expect). Passing dots to the objective
isn't exactly optional outside of toy examples.

* general improvements in optimize.portfolio

Not clear without profiling, but I wouldn't expect this to be more than a
fraction of the runtime with real objectives.

* general improvements in constrained_objective

There is almost certainly a role for compiled code in constrained_objective.
This is a very large, complex function that could definitely be improved.
The core functionality of calling arbitrary objectives as specified by the
user can't be given up, though, or we lose the reason to allow an arbitrary
portfolio specification in the first place.  Obviously, both C and C++ can
call back to R code from compiled code.  We do this already in DEoptim to
call the objective (constrained_objective in PortA) from DEoptim's C code.

My first step will be to optimize constrained_objective in R code. Assuming
we take the step to write constrained_objective in compiled code, how much
of an overhead or performance hit is there with repeatedly calling back to R
from C or C++?

Are optimize.portfolio and constrained_objective complex enough that a
dependency on Rcpp would be worth it?  Quite possibly.  C code makes a lot
of sense when the code can be kept compact and the overhead of defining
objects in C isn't too great.  C++ or Rcpp makes sense when the complexity
of the functions increases and the code would be more legible and
maintainable in C++ than C.  Is constrained_objective complex enough to
benefit from C++?  Maybe.

I think we'd need to be fair and ask where the performance gains come from
and how we could gain a generic, generally useful improvement in
PortfolioAnalytics.

If it makes sense, improving optimize.portfolio and constrained_objective
would improve the performance of PortfolioAnalytics for all solvers, not
just random portfolios.  That gain would need to come from benefits
realizable even with an arbitrarily complex portfolio specification.

Thanks for the detailed breakdown and analysis of improving the performance
in PortfolioAnalytics!

Regards,

Brian

On 11/12/2013 10:43 PM, Ross Bennett wrote:

All,

Over the course of the Google Summer of Code project, I learned a lot
about the random portfolios algorithm (among many other topics) and
became quite fascinated with the concept. I had some free time over the
weekend and decided to implement random portfolio optimization using
Rcpp. My motivation for doing this was to learn C++ and Rcpp with no
expectation of how much faster this could actually be.

Here are the results of two benchmarks I did.

This first benchmark is just generating random portfolios.

    test replications elapsed relative
1     pa           10  188.74    6.583
2 rcpp_s           10   28.67    1.000

The next benchmark is the actual optimization.

  test replications elapsed relative
1   pa           10 211.027    808.5
2 rcpp           10   0.261    1.000

I am a beginner at C++ so I am pretty sure there are further
improvements that can be made with my C++ code.

The benchmark results got me thinking that we might be able to use this
in PortfolioAnalytics. The RcppRP package I started this weekend is
really rough around the edges, but with some more improvements could
serve as an alternate optimization method for random portfolios. We
could have something like optimize.portfolio(...,
optimize_method="random_rcpp") that calls the proper functions from RcppRP.

The RcppRP package is on my github page if you are interested in looking
at the code.
https://github.com/rossb34/RcppRP

Any thoughts on if this is worth continuing to pursue? Either way I plan
to continue working on RcppRP for the sole purpose of learning C++ and Rcpp.

Regards,
Ross

_______________________________________________
GSoC-PortA mailing list
GSoC-PortA at lists.r-forge.r-project.org
http://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/gsoc-porta

-- 
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock