[Blotter-commits] r1662 - in pkg/quantstrat: . sandbox/backtest_musings
noreply at r-forge.r-project.org
Fri Dec 26 21:13:24 CET 2014
Author: braverock
Date: 2014-12-26 21:13:24 +0100 (Fri, 26 Dec 2014)
New Revision: 1662
Modified:
pkg/quantstrat/DESCRIPTION
pkg/quantstrat/sandbox/backtest_musings/stat_process.bib
pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.Rmd
pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.pdf
Log:
- add notes on data transformation for predictor values, references to Kuhn 2013
Modified: pkg/quantstrat/DESCRIPTION
===================================================================
--- pkg/quantstrat/DESCRIPTION 2014-12-07 21:32:37 UTC (rev 1661)
+++ pkg/quantstrat/DESCRIPTION 2014-12-26 20:13:24 UTC (rev 1662)
@@ -1,7 +1,7 @@
Package: quantstrat
Type: Package
Title: Quantitative Strategy Model Framework
-Version: 0.9.1657
+Version: 0.9.1662
Date: $Date$
Author: Peter Carl, Brian G. Peterson, Joshua Ulrich, Jan Humme
Depends:
Modified: pkg/quantstrat/sandbox/backtest_musings/stat_process.bib
===================================================================
--- pkg/quantstrat/sandbox/backtest_musings/stat_process.bib 2014-12-07 21:32:37 UTC (rev 1661)
+++ pkg/quantstrat/sandbox/backtest_musings/stat_process.bib 2014-12-26 20:13:24 UTC (rev 1662)
@@ -153,6 +153,14 @@
publisher={McGraw-Hill}
}
+@Book{Kuhn2013,
+ Title = {Applied predictive modeling},
+ Author = {Kuhn, Max and Johnson, Kjell},
+ Publisher = {Springer},
+ Year = {2013},
+ url = {http://appliedpredictivemodeling.com/}
+}
+
@misc{mistakes2011,
author={Martha K. Smith},
author_sort={Smith, Martha},
Modified: pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.Rmd
===================================================================
--- pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.Rmd 2014-12-07 21:32:37 UTC (rev 1661)
+++ pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.Rmd 2014-12-26 20:13:24 UTC (rev 1662)
@@ -484,7 +484,7 @@
of training and cross validation tests in every part of strategy evaluation.
Of particular utility is the application of these techniques to each
*component* of the strategy, in turn, rather than or before testing the
-entire strategy model.
+entire strategy model (see, e.g., @Kuhn2013).
\newthought{Defining the objectives, hypotheses, and expected outcome(s)} of the
experiment (backtest) as declared before any strategy code is written or run
@@ -545,6 +545,18 @@
chosen subsets. Failure to exercise care here leads almost
inevitably to overfitting (and poor out of sample results).
+An indicator is, at its core, a measurement used to make a prediction.
+This means that the broader literature on statistical predictors applies.
+Many techniques have been developed by statisticians and other modelers
+to improve the predictive value of model inputs; see @Kuhn2013 or
+@Hastie2009. Input scaling, detrending, centering, de-correlating, and many
+other techniques may all be applicable. The correct adjustments or
+transformations will depend on the nature of the specific indicator.
+Luckily, the statistics literature is also full of diagnostics to help
+determine which methods to apply, and what their impact is. You do need
+to remain cognizant of what you give up in each of these cases in terms
+of interpretability, traceability, or microstructure of the data.
+
It is also important to be aware of the structural dangers of bars. Many
indicators are constructed on periodic or "bar" data. Bars are not a
particularly stable analytical unit, and are often sensitive to exact
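The predictor transformations the added paragraph mentions (centering, scaling, de-correlating) can be sketched as follows. This is an illustrative Python sketch, not part of quantstrat (which is R); the function names and the toy data are hypothetical.

```python
import numpy as np

def center_and_scale(x):
    """Center a predictor to zero mean and scale to unit variance
    (a z-score), a common preprocessing step for model inputs."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def decorrelate(X):
    """De-correlate a matrix of predictors by projecting onto the
    eigenvectors of the sample covariance matrix (i.e. PCA scores).
    Columns of the result have zero sample correlation."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ eigvecs

# Toy example: two strongly correlated predictors.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.1 * rng.normal(size=500)
Z = decorrelate(np.column_stack([a, b]))
corr = np.corrcoef(Z, rowvar=False)[0, 1]  # effectively zero
```

As the paragraph notes, the right transformation remains indicator-specific; diagnostics such as correlation matrices, skewness measures, and variance ratios help decide which of these to apply.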
@@ -631,7 +643,7 @@
signals, holding period, etc.) to the process you are evaluating, to ensure
that you are really looking at comparable things.
-Because every signal is a prediction, when analysing of signal processes, we
+Because every signal is a prediction, when analyzing signal processes, we
can begin to fully apply the literature on model specification and testing of
predictions. From the simplest available methods such as mean squared model
error or kernel distance from an ideal process, through extensive evaluation
@@ -680,6 +692,18 @@
exact entry rule from the signal process, there will likely be a positive
expectation for the entry in production.
+Another analysis of entry rules that may be carried out both on the backtest
+and in post-trade analysis is to extract the distribution of duration between
+entering the order and getting a fill. Differences between the backtest and
+production give you information to calibrate the backtest's
+expectations; information from post-trade analysis helps you calibrate
+your execution and microstructure parameters.
+You can also analyze how conservative or aggressive your backtest fill
+assumptions are by examining how many opportunities you may have had to
+trade at the order price after entering the order but before the price
+changed, or how many shares or contracts traded at your order price before
+you would have moved or canceled the order.
+
## exit rules
There are two primary classes of exit rules, signal based and empirical;
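The entry-to-fill duration analysis added in the hunk above can be sketched like this; an illustrative Python sketch with made-up timestamps (quantstrat's actual order-book schema is not assumed):

```python
from datetime import datetime
from statistics import median

# Hypothetical (order entry, fill) timestamp pairs; in practice these
# would come from the backtest's order book or production fill records.
orders = [
    (datetime(2014, 12, 1, 9, 30, 0), datetime(2014, 12, 1, 9, 30, 4)),
    (datetime(2014, 12, 1, 10, 15, 0), datetime(2014, 12, 1, 10, 15, 1)),
    (datetime(2014, 12, 1, 14, 2, 0), datetime(2014, 12, 1, 14, 2, 9)),
]

# Distribution of entry-to-fill durations, in seconds.  Comparing this
# distribution between backtest and production calibrates fill assumptions.
durations = [(fill - entry).total_seconds() for entry, fill in orders]
summary = (min(durations), median(durations), max(durations))
```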
@@ -1157,7 +1181,7 @@
\includegraphics{MAE}
\end{marginfigure}
Maximum Adverse Excursion (MAE) or Maximum Favorable Excursion (MFE) show how
-far down (or up) every trade went during the course of its lifecycle. You
+far down (or up) every trade went during the course of its life-cycle. You
can capture information on how many trades close close to their highs or lows,
as well as evaluating points at which the P&L for the trade statistically
just isn't going to get any better, or isn't going to recover. While
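A minimal sketch of the MAE/MFE calculation for a single trade (illustrative Python; the function signature and price path are hypothetical, not quantstrat's API):

```python
def excursions(entry_price, path, long=True):
    """Return (MAE, MFE) for one trade: the worst and best per-unit
    P&L observed over the price path while the trade was open."""
    sign = 1.0 if long else -1.0
    pnl = [sign * (p - entry_price) for p in path]
    return min(pnl), max(pnl)

# Long trade entered at 100: dips to 98.7, peaks at 101.2.
mae, mfe = excursions(100.0, [99.5, 98.7, 101.2, 100.4])
# mae is about -1.3 (worst drawdown during the trade), mfe about 1.2
```

Aggregating these pairs across all trades gives the distribution the text describes, e.g. how many trades close near their highs or lows.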
@@ -1295,7 +1319,7 @@
- tail risk measures
- volatility analysis
- factor analysis
- - factor model monte carlo
+ - factor model Monte Carlo
- style analysis
- comparing strategies in return space
- applicability to asset allocation (see below)
@@ -1339,14 +1363,14 @@
\newthought{What if you don't have enough data?} Let's suppose you want 500
observations for each of 50 synthetic assets based on the rule of thumb above.
-That is approximately two years of daily returns. This number of obervations
+That is approximately two years of daily returns. This number of observations
would likely produce a high degree of confidence if you had been running the
strategy on 50 synthetic assets for two years in a stable way. If you
want to allocate capital in alignment with your business objectives before you
have enough data, you can do a number of things:
- use the data you have and re-optimize frequently to check for stability and
add more data
-- use higher frequency data, e.g. hourly instead fo daily
+- use higher frequency data, e.g. hourly instead of daily
- use a technique such as Factor Model Monte Carlo (Jiang 2007, Zivot 2011,2012)
to construct equal histories
- optimize over fewer assets, requiring a smaller history
@@ -1414,7 +1438,7 @@
that may identify how much out of sample deterioration could be expected.
With all of the methods described in this section, it is important to note
-that you ar no longer measuring performance; that was covered in prior
+that you are no longer measuring performance; that was covered in prior
sections. At this point, we are measuring statistical error, and developing
or refuting the level of confidence which may be appropriate for the backtest
results. We have absolutely *fitted* the strategy to the data, but is it
@@ -1539,7 +1563,7 @@
OOS periods in walk forward analysis are effectively validation sets as in
cross validation. You can and should measure the out of sample deterioration
of your walk forward model between the IS performance and the OOS performance
-of the model. One advantage od walk forward is that it allows parameters to
+of the model. One advantage of walk forward is that it allows parameters to
change with the data. One disadvantage is that there is a temptation to make
the OOS periods for walk forward analysis rather small, making it very
difficult to measure deterioration from the training period. Another potential
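The walk-forward mechanics discussed above can be sketched as a window generator (illustrative Python; the period lengths are arbitrary):

```python
def walk_forward_windows(T, is_len, oos_len):
    """Generate rolling (in-sample, out-of-sample) index windows over a
    series of length T, advancing by one OOS period each step, so each
    OOS period is later re-used as part of an IS period."""
    windows = []
    start = 0
    while start + is_len + oos_len <= T:
        is_idx = range(start, start + is_len)
        oos_idx = range(start + is_len, start + is_len + oos_len)
        windows.append((is_idx, oos_idx))
        start += oos_len
    return windows

# 100 observations, 60-period IS window, 10-period OOS window:
# IS [0,60) OOS [60,70), IS [10,70) OOS [70,80), ..., IS [30,90) OOS [90,100)
windows = walk_forward_windows(100, 60, 10)
```

Note how the IS windows overlap and each OOS period later re-enters the IS set, which is why analytics that assume independent observations deserve care here.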
@@ -1549,7 +1573,7 @@
considered an advantage. The IS periods are not all independent draws from the
data, and the OOS periods will later be used as IS periods, so any analytical
technique that assumes *i.i.d.* observations should be viewed at least with
-scepticism.
+skepticism.
*k*-fold cross validation improves on the classical single hold-out OOS
model by randomly dividing the sample of size *T* into sequential
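The *k*-fold partition itself is simple to sketch (illustrative Python; the fold assignment is random, which, given the non-*i.i.d.* nature of market data noted above, should be applied with care to time series):

```python
import numpy as np

def kfold_indices(T, k, rng=None):
    """Randomly partition indices 0..T-1 into k roughly equal folds.
    Each fold serves once as the hold-out (validation) set while the
    other k-1 folds form the training set."""
    if rng is None:
        rng = np.random.default_rng(0)
    return np.array_split(rng.permutation(T), k)

folds = kfold_indices(100, 5)
for i, holdout in enumerate(folds):
    train = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # fit the strategy component on `train`, evaluate on `holdout`,
    # and record IS-vs-OOS deterioration across all k splits
```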
@@ -1591,6 +1615,7 @@
- data mining bias and cross validation from @Aronson2006
+\newpage
# Acknowledgements
Modified: pkg/quantstrat/sandbox/backtest_musings/strat_dev_process.pdf
===================================================================
(Binary files differ)