[GSoC-PortA] Random Portfolios Speed Improvement with Rcpp

Ross Bennett rossbennett34 at gmail.com
Fri Nov 15 05:25:33 CET 2013


All,

I think I found the culprit for the slow performance of
constrained_objective. It has to do with the moments being re-calculated in
set.portfolio.moments at each iteration of constrained_objective.

You can verify this by putting a print(str(momentargs)) or browser()
statement at the beginning of set.portfolio.moments.

Here is a small reproducible example to mimic how the moments are passed
from optimize.portfolio to constrained_objective.

# This function mimics how optimize.portfolio passes the moments to
# constrained_objective
fun1 <- function(){
  # the moments are calculated with momentFUN and then assigned to dotargs
  # in optimize.portfolio. The dotargs are then passed to ... in
  # constrained_objective

  # replicate what momentFUN would look like
  dotargs <- list()
  dotargs$mu <- rep(1, 4)
  dotargs$sigma <- diag(4)

  print("fun2")
  fun2(...=dotargs)
}

# This function mimics how constrained_objective grabs the moments
# passed from optimize.portfolio
fun2 <- function(...){
  tmp <- list(...)
  print(str(tmp))
  # tmp is then passed to set.portfolio.moments as momentargs
}

> fun1()
[1] "fun2"
List of 1
 $ ...:List of 2
  ..$ mu   : num [1:4] 1 1 1 1
  ..$ sigma: num [1:4, 1:4] 1 0 0 0 0 1 0 0 0 0 ...
NULL

You can see that the list constructed in fun2 has a single element named
'...', with the moments nested one level below it. This is why
set.portfolio.moments can't find momentargs$mu, momentargs$sigma, etc.,
and the moments have to be recalculated at each iteration.
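
For what it's worth, one possible way to get the elements of a list to arrive as separate named arguments is do.call(). Below is a minimal sketch under stated assumptions: collect_dots and moments are hypothetical names, not PortfolioAnalytics code, and the nested case is produced here by passing the list as a single argument, which leaves mu one level down much like the output above.

```r
# Hypothetical stand-in for fun2: collect whatever arrives through '...'
collect_dots <- function(...) {
  list(...)
}

# Hypothetical stand-in for the moments computed once by momentFUN
moments <- list(mu = rep(1, 4), sigma = diag(4))

# Passing the list as one argument nests it one level down,
# so a lookup by name at the top level finds nothing
nested <- collect_dots(moments)
is.null(nested$mu)

# do.call() splices the list elements in as separate named arguments,
# so the same lookup by name succeeds
spliced <- do.call(collect_dots, moments)
is.null(spliced$mu)
names(spliced)
```

Whether do.call() fits the way optimize.portfolio calls constrained_objective is not something this sketch settles; it only shows the difference between a nested and a spliced list.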

Have any of you run into this before? Any suggestions for a fix?

I think we could add a formal "moments" argument to constrained_objective
and pass the moments set in optimize.portfolio to constrained_objective
directly through the "moments" argument. Is there any downside to this
approach that I might be overlooking?
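
A rough sketch of what the formal "moments" argument idea might look like, assuming toy stand-ins: set_moments_sketch, objective_sketch, and counter are hypothetical names made up for illustration, and the objective is a plain portfolio variance rather than anything from PortfolioAnalytics. The counter records how often the moments are actually computed.

```r
# Hypothetical counter to record how often the moments are computed
counter <- new.env()
counter$n <- 0

# Hypothetical stand-in for set.portfolio.moments:
# only fill in the moments that are missing
set_moments_sketch <- function(R, momentargs = NULL) {
  if (is.null(momentargs)) momentargs <- list()
  if (is.null(momentargs$mu)) {
    counter$n <- counter$n + 1
    momentargs$mu <- colMeans(R)
  }
  if (is.null(momentargs$sigma)) momentargs$sigma <- cov(R)
  momentargs
}

# Hypothetical stand-in for constrained_objective with a formal
# 'moments' argument; the toy objective is portfolio variance
objective_sketch <- function(w, R, moments = NULL, ...) {
  moments <- set_moments_sketch(R, moments)
  as.numeric(t(w) %*% moments$sigma %*% w)
}

set.seed(1)
R <- matrix(rnorm(400), ncol = 4)
w <- rep(0.25, 4)

# Moments computed once up front, as optimize.portfolio would do
moments <- set_moments_sketch(R)

# Repeated objective evaluations reuse the precomputed moments
out <- numeric(100)
for (i in seq_len(100)) out[i] <- objective_sketch(w, R, moments = moments)

# Number of times the moments were computed
counter$n
```

With the moments passed through a named formal argument, the is.null checks find them on every call, so the count stays at the single up-front computation.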

Thanks,
Ross



On Thu, Nov 14, 2013 at 2:55 PM, Ross Bennett <rossbennett34 at gmail.com> wrote:

> Brian,
>
> Attached are the output of two Rprof runs:
>  - rp_profile_new.txt uses modify.args to match the arguments for
> set.portfolio.moments in constrained_objective
>  - rp_profile_old.txt is the existing code to match the arguments
>
> There is a slight performance improvement, but nothing significant. I used
> a simple objective to minimize mES using optimize_method="random" with
> search_size=1000. The dataset is all 13 indices from data(edhec).
>
> If you look at rp_profile_old, the total.time for set.portfolio.moments is
> 194.94 and total.pct is 98.82. This seems odd to me since
> set.portfolio.moments only calculates mu, sigma, m3, and m4 when they are
> NULL (e.g. if(is.null(momentargs$m3)) momentargs$m3 =
> PerformanceAnalytics:::M3.MM(R)).
>
> It is as if all the moments are being calculated at each iteration in
> set.portfolio.moments, which shouldn't be happening. They should be passed
> from optimize.portfolio when they are set with momentFUN. I'll be able to
> look into this in more detail later this evening.
>
> One thing I plan on trying is adding a formal "moments" argument to
> constrained_objective so that we can pass the object directly from
> optimize.portfolio.
>
> Realistically, the moments *should* only need to be calculated once in
> optimize.portfolio and should not be recalculated in constrained_objective,
> correct?
>
> Ross
>
>
>
>
> On Wed, Nov 13, 2013 at 9:04 AM, Brian G. Peterson <brian at braverock.com> wrote:
>
>> See responses inline.
>>
>>
>> On 11/13/2013 10:06 AM, Ross Bennett wrote:
>>
>>> On Tue, Nov 12, 2013 at 9:49 PM, Brian G. Peterson <brian at braverock.com> wrote:
>>>
>>>     Ross, this is a very interesting prototype.
>>>
>>>     We haven't shied away from adding C dependencies in
>>>     PerformanceAnalytics or blotter or quantstrat.  All of them have
>>>     recently acquired compiled code.
>>>
>>>     The first part of your benchmark is a fair one, generating the
>>>     random portfolios, and one that I would expect compiled code to do
>>>     better than native R code (though we haven't spent any time
>>>     profiling or trying to improve the native R code either) because it
>>>     is a big loop.
>>>
>>>
>>> I'll look to see if there is a way to improve the R code here. At first
>>> glance it isn't obvious where performance could be improved, because it
>>> is just a while loop as you stated. Perhaps something with how the
>>> weight_seq subsetting is done. Although the overall impact on random
>>> portfolios would be small because this is only called once, any
>>> improvements here could be used in rp_transform which is called by the
>>> mapping function passed to DEoptim. If I understand correctly, this is
>>> called tens of thousands of times so any incremental improvement could be
>>> a large net gain overall for optimize_method="DEoptim".
>>>
>>
>> Yes, rewriting the mapping function in a compiled language may indeed
>> help a lot.  I'm still not clear on *how much* it will help.  We probably
>> need an Rprof run to figure it out.
>>
>>
>>>     The second part of your benchmark is rather unfair though, as you
>>>     say yourself:
>>>
>>>     "Benchmark the optimization functions of PortfolioAnalytics and
>>>     RcppRP. The rp_optimize_v2 uses slimmed down C++ implementations of
>>>     constrained_objective and optimize.portfolio from
>>>     PortfolioAnalytics. The objective, constrained objective, and
>>>     optimization functions must all be in C++ so that I can "stay in
>>>     C++ world" for the optimization when calling
>>>     constrained_objective for each set of weights."
>>>
>>>     The entire point of constrained_objective is that the objective from
>>>     the portfolio specification is of arbitrary complexity, and will
>>>     typically include much more complex functions than 'mean' and 'sd'.
>>>
>>>
>>> I, perhaps naively, assumed that the most common objective functions
>>> used were mean, StdDev, and ES. What is your take on the most common
>>> complex objective functions? Perhaps these could be optimized or
>>> implemented in a compiled language. Depending on the function,
>>> improvements here would also likely be an overall net gain because they
>>> are called thousands or tens of thousands of times from
>>> constrained_objective.
>>>
>>
>> Well, remember that there are lots of modifications even to portfolio
>> covariances and ETL: the Cornish-Fisher stuff, modifications based on
>> better estimates of the moments, etc.  Drawdowns and factor exposures are
>> probably the others that are most often used in practice.
>>
>> Josh has some C code for doing the higher co-moment matrices that I want
>> to get into PerformanceAnalytics sometime soon.  That code is very
>> expensive in R.
>>
>>
>>
>>
>>>     So, keeping everything in C++ world isn't really possible with
>>>     arbitrary objectives, by construction, because the objectives can be
>>>     *any R function*.
>>>
>>>
>>>     That said, where would generally useful improvements (in handling
>>>     arbitrary objectives) likely lie?
>>>
>>>     * handling the loop over all the random weight vectors in compiled
>>>     code:
>>>
>>>     This isn't likely to be a huge performance improvement with
>>>     arbitrary objectives, as the time spent in the loop is likely
>>>     dwarfed by the objective function itself.
>>>
>>>
>>> Agreed, I believe most of the time is spent in constrained_objective.
>>>
>>
>> Again, it would be good to look at an Rprof run of both a simple
>> portfolio spec and a complex one.  Recent examples of more nuanced ones can
>> be found in Peter's symposium2013 code in the sandbox.
>>
>>
>>
>>>     * improved handling of arguments and argument matching
>>>
>>>     This one could be huge, but also doesn't require compiled code.
>>>     Josh Ulrich recently came up with huge speed improvements in
>>>     quantstrat in part by improving the argument matching and calling
>>>     of arbitrary functions.  The prototype of that code in quantstrat
>>>     came from PortfolioAnalytics.  The key improvement was in not
>>>     evaluating large arguments. In this case, that would be the returns
>>>     time series and the moments and co-moments.  This trick could and
>>>     probably should be ported to PortfolioAnalytics.
>>>
>>>
>>> I saw his post on FOSS trading and was very impressed by the performance
>>> gain. I would really like to look into this for PortfolioAnalytics. I
>>> started looking at the quantstrat source code, but wasn't exactly sure
>>> what I was looking for. Can you point me to the diff or function (or a
>>> simple example) where this was done?
>>>
>>
>> The function is modify.args in utils.R, and you can see it used in a
>> manner very similar to how we use it in PortA in fn ApplyIndicators in
>> indicators.R
>>
>>
>>
>>>     We saw Dirk try to create a faster C++ version of DEoptim a few
>>>     years ago.  His RcppDEoptim didn't pass dots (...) to the objective
>>>     function.  Oops.  The entire performance gain came from this lack of
>>>     ability to use an arbitrary objective.  When dots were added back
>>>     in, the C++ version of DE is slower than the C version (as you'd
>>>     expect). Passing dots to the objective isn't exactly optional
>>>     outside of toy examples.
>>>
>>>     * general improvements in optimize.portfolio
>>>
>>>     Not clear without profiling, but I wouldn't expect this to be more
>>>     than a fraction of the runtime with real objectives.
>>>
>>>     * general improvements in constrained_objective
>>>
>>>     There is almost certainly a role for compiled code in
>>>     constrained_objective.  This is a very large, complex function that
>>>     could definitely be improved.  The core functionality of calling
>>>     arbitrary objectives as specified by the user can't be given up,
>>>     though, or we lose the reason to allow an arbitrary portfolio
>>>     specification in the first place.  Obviously, both C and C++ can
>>>     call back to R code from compiled code.  We do this already in
>>>     DEoptim to call the objective (constrained_objective in PortA) from
>>>     DEoptim's C code.
>>>
>>>
>>> My first step will be to optimize constrained_objective in R code.
>>> Assuming we take the step to write constrained_objective in compiled
>>> code, how much of an overhead or performance hit is there with
>>> repeatedly calling back to R from C or C++?
>>>
>>
>> Like any other function execution overhead.  If you can avoid evaluating
>> the arguments, it's very low cost.  As I said, DEoptim does this from C to
>> call the objective function.
>>
>>
>>>     Are optimize.portfolio and constrained_objective complex enough that
>>>     a dependency on Rcpp would be worth it?  Quite possibly.  C code
>>>     makes a lot of sense when the code can be kept compact and the
>>>     overhead of defining objects in C isn't too great.  C++ or Rcpp
>>>     makes sense when the complexity of the functions increases and the
>>>     code would be more legible and maintainable in C++ than C.  Is
>>>     constrained_objective complex enough to benefit from C++?  Maybe.
>>>
>>>     I think we'd need to be fair and ask where the performance gains
>>>     come from and how we could gain a generic, generally useful
>>>     improvement in PortfolioAnalytics.
>>>
>>>     If it makes sense, improving optimize.portfolio and
>>>     constrained_objective would improve the performance of
>>>     PortfolioAnalytics for all solvers, not just random portfolios.
>>>     That gain would need to come from benefits realizable even with an
>>>     arbitrarily complex portfolio specification.
>>>
>>>
>>> Thanks for the detailed breakdown and analysis of improving the
>>> performance in PortfolioAnalytics!
>>>
>>
>> Absolutely.  Thanks for working on it!
>>
>>
>> Regards,
>>
>>  Brian
>>
>>
>>
>>>     On 11/12/2013 10:43 PM, Ross Bennett wrote:
>>>
>>>         All,
>>>
>>>         Over the course of the Google Summer of Code project, I learned
>>>         a lot about the random portfolios algorithm (among many other
>>>         topics) and became quite fascinated with the concept. I had some
>>>         free time over the weekend and decided to implement random
>>>         portfolio optimization using Rcpp. My motivation for doing this
>>>         was to learn C++ and Rcpp with no expectation of how much faster
>>>         this could actually be.
>>>
>>>         Here are the results of two benchmarks I did.
>>>
>>>         This first benchmark is just generating random portfolios.
>>>
>>>             test replications elapsed relative
>>>         1     pa           10  188.74    6.583
>>>         2 rcpp_s           10   28.67    1.000
>>>
>>>         The next benchmark is the actual optimization.
>>>
>>>           test replications elapsed relative
>>>         1   pa           10 211.027    808.5
>>>         2 rcpp           10   0.261    1.000
>>>
>>>         I am a beginner at C++ so I am pretty sure there are further
>>>         improvements that can be made with my C++ code.
>>>
>>>         The benchmark results got me thinking that we might be able to
>>>         use this in PortfolioAnalytics. The RcppRP package I started this
>>>         weekend is really rough around the edges, but with some more
>>>         improvements could serve as an alternate optimization method for
>>>         random portfolios. We could have something like
>>>         optimize.portfolio(..., optimize_method="random_rcpp") that calls
>>>         the proper functions from RcppRP.
>>>
>>>         The RcppRP package is on my github page if you are interested in
>>>         looking at the code.
>>>         https://github.com/rossb34/RcppRP
>>>
>>>         Any thoughts on if this is worth continuing to pursue? Either
>>>         way I plan to continue working on RcppRP for the sole purpose of
>>>         learning C++ and Rcpp.
>>>
>>>         Regards,
>>>         Ross
>>>
>>>
>>>
>>>         _______________________________________________
>>>         GSoC-PortA mailing list
>>>         GSoC-PortA at lists.r-forge.r-project.org
>>>         http://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/gsoc-porta
>>>
>>>
>>>
>>>     --
>>>     Brian G. Peterson
>>>     http://braverock.com/brian/
>>>     Ph: 773-459-4973 <tel:773-459-4973>
>>>     IM: bgpbraverock
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Brian G. Peterson
>> http://braverock.com/brian/
>> Ph: 773-459-4973
>> IM: bgpbraverock
>>
>
>