[Blotter-commits] r1662 - in pkg/quantstrat: . sandbox/backtest_musings

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Fri Dec 26 21:13:24 CET 2014


Author: braverock
Date: 2014-12-26 21:13:24 +0100 (Fri, 26 Dec 2014)
New Revision: 1662

Modified:
   pkg/quantstrat/DESCRIPTION
   pkg/quantstrat/sandbox/backtest_musings/stat_process.bib
   pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.Rmd
   pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.pdf
Log:
- add notes on data transformation for predictor values, references to Kuhn 2013

Modified: pkg/quantstrat/DESCRIPTION
===================================================================
--- pkg/quantstrat/DESCRIPTION	2014-12-07 21:32:37 UTC (rev 1661)
+++ pkg/quantstrat/DESCRIPTION	2014-12-26 20:13:24 UTC (rev 1662)
@@ -1,7 +1,7 @@
 Package: quantstrat
 Type: Package
 Title: Quantitative Strategy Model Framework
-Version: 0.9.1657
+Version: 0.9.1662
 Date: $Date$
 Author: Peter Carl, Brian G. Peterson, Joshua Ulrich, Jan Humme
 Depends:

Modified: pkg/quantstrat/sandbox/backtest_musings/stat_process.bib
===================================================================
--- pkg/quantstrat/sandbox/backtest_musings/stat_process.bib	2014-12-07 21:32:37 UTC (rev 1661)
+++ pkg/quantstrat/sandbox/backtest_musings/stat_process.bib	2014-12-26 20:13:24 UTC (rev 1662)
@@ -153,6 +153,14 @@
   publisher={McGraw-Hill}
 }
 
+@Book{Kuhn2013,
+  Title                    = {Applied predictive modeling},
+  Author                   = {Kuhn, Max and Johnson, Kjell},
+  Publisher                = {Springer},
+  Year                     = {2013},
+  url                      = {http://appliedpredictivemodeling.com/}
+}
+
 @misc{mistakes2011,
   author={Martha K. Smith},
   author_sort={Smith, Martha},

Modified: pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.Rmd
===================================================================
--- pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.Rmd	2014-12-07 21:32:37 UTC (rev 1661)
+++ pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.Rmd	2014-12-26 20:13:24 UTC (rev 1662)
@@ -484,7 +484,7 @@
 of training and cross validation tests in every part of strategy evaluation. 
 Of particular utility is the application of these techniques to each 
 *component* of the strategy, in turn, rather than or before testing the 
-entire strategy model.
+entire strategy model (see, e.g., @Kuhn2013).
 
 \newthought{Defining the objectives, hypotheses, and expected outcome(s)} of the 
 experiment (backtest) as declared before any strategy code is written or run 
@@ -545,6 +545,18 @@
 chosen subsets.  Failure to exercise care here leads almost 
 inevitably to overfitting (and poor out of sample results). 
 
+An indicator is, at its core, a measurement used to make a prediction.
+This means that the broader literature on statistical predictors applies.
+Many techniques have been developed by statisticians and other modelers 
+to improve the predictive value of model inputs.  See @Kuhn2013 or 
+@Hastie2009. Input scaling, detrending, centering, de-correlating, and many 
+other techniques may all be applicable.  The correct adjustments or 
+transformations will depend on the nature of the specific indicator.  
+Luckily, the statistics literature is also full of diagnostics to help
+determine which methods to apply, and what their impact is.  You do need
+to remain cognizant of what you give up in each of these cases in terms
+of interpretability, traceability, or microstructure of the data.
+
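As a rough illustration of the kind of predictor preprocessing described above, Max Kuhn's caret package (used throughout @Kuhn2013) can estimate centering, scaling, and de-correlating transformations and then apply them consistently. This is only a sketch: the indicator matrix `ind_mat` is a simulated placeholder, and the chosen methods are examples rather than a recommendation.

    library(caret)   # companion package to Kuhn & Johnson (2013)

    # ind_mat: placeholder matrix of indicator values, one column per
    # indicator, one row per observation
    ind_mat <- matrix(rnorm(500 * 4), ncol = 4,
                      dimnames = list(NULL, paste0("ind", 1:4)))

    # estimate centering, scaling, and PCA (de-correlation) parameters,
    # then apply the same transformations via predict()
    pp <- preProcess(ind_mat, method = c("center", "scale", "pca"))
    ind_transformed <- predict(pp, ind_mat)
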
 It is also important to be aware of the structural dangers of bars.  Many 
 indicators are constructed on periodic or "bar" data.  Bars are not a 
 particularly stable analytical unit, and are often sensitive to exact 
@@ -631,7 +643,7 @@
 signals, holding period, etc.) to the process you are evaluating, to ensure 
 that you are really looking at comparable things. 
 
-Because every signal is a prediction, when analysing of signal processes, we 
+Because every signal is a prediction, when analyzing signal processes, we 
 can begin to fully apply the literature on model specification and testing of 
 predictions.  From the simplest available methods such as mean squared model 
 error or kernel distance from an ideal process, through extensive evaluation 
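
To make the simplest of those measures concrete: mean squared model error is just the average squared difference between what the signal predicted and what subsequently happened. The two vectors below are simulated placeholders for your signal's predictions and the realized values over the signal horizon.

    # placeholder vectors: predicted vs. realized return over the signal horizon
    set.seed(1)
    predicted_return <- rnorm(250, sd = 0.01)
    realized_return  <- predicted_return + rnorm(250, sd = 0.02)

    # mean squared model error of the signal process
    mse <- mean((predicted_return - realized_return)^2)
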
@@ -680,6 +692,18 @@
 exact entry rule from the signal process, there will likely be a positive 
 expectation for the entry in production.
 
+Another analysis of entry rules that may be carried out both on the backtest 
+and in post-trade analysis is to extract the distribution of durations between 
+entering the order and getting a fill.  Differences between the backtest and 
+production will give you information with which to calibrate the backtest 
+expectations, while information from post-trade analysis will help you 
+calibrate your execution and microstructure parameters.  
+You can also assess how conservative or aggressive your backtest fill 
+assumptions are by examining how many opportunities you may have had to 
+trade at the order price after entering the order but before the price 
+changed, or how many shares or contracts traded at your order price before 
+you would have moved or canceled the order. 
+
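One way to carry out the fill-duration comparison above is simply to difference order and fill timestamps and compare the resulting distributions. The data frames `backtest_fills` and `production_fills` below, each with POSIXct `order_time` and `fill_time` columns, are hypothetical stand-ins for whatever order and execution records you actually keep.

    # duration, in seconds, between order entry and fill
    time_to_fill <- function(fills) {
      as.numeric(difftime(fills$fill_time, fills$order_time, units = "secs"))
    }

    bt_durations   <- time_to_fill(backtest_fills)
    prod_durations <- time_to_fill(production_fills)

    # compare the two distributions to calibrate backtest fill assumptions
    summary(bt_durations)
    summary(prod_durations)
    quantile(prod_durations, probs = c(0.5, 0.9, 0.99))
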
 ## exit rules
 
 There are two primary classes of exit rules, signal based and empirical;
@@ -1157,7 +1181,7 @@
     \includegraphics{MAE}
     \end{marginfigure}
 Maximum Adverse Excursion (MAE) or Maximum Favorable Excursion (MFE) show how 
-far down (or up) every trade went during the course of its lifecycle.  You 
+far down (or up) every trade went during the course of its life-cycle.  You 
 can capture information on how many trades close close to their highs or lows,
 as well as evaluating points at which the P&L for the trade statistically
 just isn't going to get any better, or isn't going to recover.  While
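
blotter can draw this excursion chart directly from a portfolio object; in the sketch below the portfolio name "strategy_port" and the symbol "SPY" are placeholders for a portfolio you have already built and filled with transactions.

    library(blotter)

    # per-trade Maximum Adverse / Favorable Excursion, on a percentage scale
    chart.ME(Portfolio = "strategy_port", Symbol = "SPY",
             type = "MAE", scale = "percent")
    chart.ME(Portfolio = "strategy_port", Symbol = "SPY",
             type = "MFE", scale = "percent")
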
@@ -1295,7 +1319,7 @@
 - tail risk measures
 - volatility analysis
 - factor analysis
-    - factor model monte carlo
+    - factor model Monte Carlo
 - style analysis
 - comparing strategies in return space
 - applicability to asset allocation (see below)
@@ -1339,14 +1363,14 @@
 
 \newthought{What if you don't have enough data?} Let's suppose you want 500 
 observations for each of 50 synthetic assets based on the rule of thumb above. 
-That is approximately two years of daily returns.  This number of obervations
+That is approximately two years of daily returns.  This number of observations
 would likely produce a high degree of confidence if you had been running the
 strategy on 50 synthetic assets for two years in a stable way. If you 
 want to allocate capital in alignment with your business objectives before you 
 have enough data, you can do a number of things:
 - use the data you have and re-optimize frequently to check for stability and 
   add more data
-- use higher frequency data, e.g. hourly instead fo daily
+- use higher frequency data, e.g. hourly instead of daily
 - use a technique such as Factor Model Monte Carlo (Jiang 2007, Zivot 2011,2012) 
   to construct equal histories
 - optimize over fewer assets, requiring a smaller history
@@ -1414,7 +1438,7 @@
 that may identify how much out of sample deterioration could be expected.
 
 With all of the methods described in this section, it is important to note 
-that you ar no longer measuring performance; that was covered in prior 
+that you are no longer measuring performance; that was covered in prior 
 sections.  At this point, we are measuring statistical error, and developing
 or refuting the level of confidence which may be appropriate for the backtest
 results.  We have absolutely *fitted* the strategy to the data, but is it 
@@ -1539,7 +1563,7 @@
 OOS periods in walk forward analysis are effectively validation sets as in
 cross validation.  You can and should measure the out of sample deterioration
 of your walk forward model between the IS performance and the OOS performance 
-of the model.  One advantage od walk forward is that it allows parameters to 
+of the model.  One advantage of walk forward is that it allows parameters to 
 change with the data.  One disadvantage is that there is a temptation to make
 the OOS periods for walk forward analysis rather small, making it very 
 difficult to measure deterioration from the training period.  Another potential 
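
The index bookkeeping behind walk forward analysis is straightforward to sketch; quantstrat automates it, but writing it out makes the IS/OOS relationship explicit. The window sizes below are arbitrary examples.

    # rolling walk forward windows over n observations: train (IS) on
    # k_train points, test (OOS) on the k_test points that follow
    walk_forward_windows <- function(n, k_train, k_test) {
      starts <- seq(1, n - k_train - k_test + 1, by = k_test)
      lapply(starts, function(s) {
        list(train = s:(s + k_train - 1),
             test  = (s + k_train):(s + k_train + k_test - 1))
      })
    }

    # e.g. 1000 observations, 250-point training sets, 50-point OOS sets
    windows <- walk_forward_windows(1000, 250, 50)
    length(windows)            # number of IS/OOS pairs
    windows[[1]]$test[1:5]     # first few OOS indices of the first pair

Measuring deterioration then amounts to comparing the chosen performance statistic on each training index set against the same statistic on the matching test set.
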
@@ -1549,7 +1573,7 @@
 considered an advantage. The IS periods are not all independent draws from the 
 data, and the OOS periods will later be used as IS periods, so any analytical
 technique that assumes *i.i.d.* observations should be viewed at least with 
-scepticism.
+skepticism.
 
 *k*-fold cross validation improves on the classical single hold-out OOS 
 model by randomly dividing the sample of size *T* into sequential 
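
In its generic form the split can be sketched in a few lines; whether the folds should be random draws or sequential blocks of the time series is exactly the *i.i.d.* caveat raised above, so treat the random assignment here as illustrative only. The fold count and seed are arbitrary.

    # generic k-fold assignment of T observations to k folds; each fold is
    # held out once while the remaining folds are used for fitting
    T_obs <- 1000
    k     <- 5
    set.seed(42)
    fold_id <- sample(rep(1:k, length.out = T_obs))

    for (fold in 1:k) {
      oos_idx <- which(fold_id == fold)
      is_idx  <- which(fold_id != fold)
      # fit the strategy component on is_idx, evaluate it on oos_idx
    }
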
@@ -1591,6 +1615,7 @@
 
 - data mining bias and cross validation from @Aronson2006
 
+\newpage 
 
 # Acknowledgements
 

Modified: pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.pdf
===================================================================
(Binary files differ)


