[Highfrequency-commits] r76 - pkg/highfrequency/R
noreply at r-forge.r-project.org
Thu Aug 28 17:25:23 CEST 2014
Author: kboudt
Date: 2014-08-28 17:25:22 +0200 (Thu, 28 Aug 2014)
New Revision: 76
Added:
pkg/highfrequency/R/spotvol.r
Log:
Added: pkg/highfrequency/R/spotvol.r
===================================================================
--- pkg/highfrequency/R/spotvol.r (rev 0)
+++ pkg/highfrequency/R/spotvol.r 2014-08-28 15:25:22 UTC (rev 76)
@@ -0,0 +1,1269 @@
+#' Spot volatility estimation
+#'
+#' The \code{spotvolatility} package offers several methods to estimate spot
+#' volatility and its intraday seasonality, using high-frequency data.
+#'
+#' @details
+#' The following spot volatility estimation methods have been implemented:
+#'
+#' @section Deterministic periodicity:
+#' The spot volatility is decomposed into a deterministic periodic factor
+#' \eqn{f_{i}} (identical for every day in the sample) and a daily factor
+#' \eqn{s_{t}} (identical for all observations within a day). Both components
+#' are then estimated separately. For more details, see Taylor and Xu (1997)
+#' and Andersen and Bollerslev (1997). The jump robust versions by Boudt et
+#' al. (2011) have also been implemented.
+#'
+#' @section Stochastic periodicity:
+#' This method by Beltratti and Morana (2001) assumes the periodicity factor to
+#' be stochastic. The spot volatility estimation is split into four components:
+#' a random walk, an autoregressive process, a stochastic cyclical process and
+#' a deterministic cyclical process. The model is estimated using a
+#' quasi-maximum likelihood method based on the Kalman Filter. The package
+#' \code{\link[FKF:fkf]{FKF}} is used to apply the Kalman filter.
+#'
+#' @section Nonparametric filtering:
+#' This method by Kristensen (2010) filters the spot volatility in a
+#' nonparametric way by applying kernel weights to the standard realized
+#' volatility estimator. Different kernels and bandwidths can be used to focus
+#' on specific characteristics of the volatility process.
+#'
+#' @section Piecewise constant volatility:
+#' Another nonparametric method is that of Fried (2012), which assumes the
+#' volatility to be piecewise constant over local windows. Robust two-sample
+#' tests are applied to detect changes in variability between subsequent
+#' windows. The spot volatility can then be estimated by evaluating
+#' regular realized volatility estimators within each local window.
+#'
+#' @section GARCH models with intraday seasonality:
+#' The package also includes an option to apply GARCH models, implemented by
+#' the \code{\link{rugarch}} package, to estimate spot volatility from intraday
+#' data. This is done by including external regressors in the model. These
+#' regressors are based on a flexible Fourier form, which was also used in the
+#' stochastic and deterministic periodicity estimation methods.
+#'
+#' @template references
+#'
+#' @docType package
+#' @name spotvolatility
+NULL
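+
+# Illustrative sketch (not part of the package): the deterministic
+# periodicity model above writes spot volatility as sigma_{t,i} = s_t * f_i,
+# with f_i a fixed intraday pattern and s_t a daily level. All numbers below
+# are hypothetical toy values.
+.demo_detper_decomposition <- function(M = 78, s_t = 0.01) {
+  f <- 1 + 0.5*cos(2*pi*(1:M)/M)  # U-shaped intraday pattern
+  f <- f/sqrt(mean(f^2))          # normalize so that mean(f^2) = 1
+  s_t*f                           # spot volatility path for one day
+}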
+
+#' Spot volatility estimation
+#'
+#' @description
+#'
+#' The \code{spotvol} function offers several methods to estimate spot
+#' volatility and its intraday seasonality, using high-frequency data. It
+#' returns an object of class \code{spotvol}, which can contain various outputs,
+#' depending on the method used. See 'Details' for a description of each method.
+#' In any case, the output will contain the spot volatility estimates.
+#'
+#' The input can consist of price data or return data, either tick by tick or
+#' sampled at set intervals. The data will be converted to equispaced
+#' high-frequency returns \eqn{r_{t,i}} (read: the \eqn{i}th return on day
+#' \eqn{t}).
+#'
+#' @details The following estimation methods can be specified in \code{method}:
+#'
+#' \strong{Deterministic periodicity method (\code{"detper"})}
+#'
+#' Parameters:
+#' \tabular{ll}{
+#' \code{dailyvol} \tab A string specifying the estimation method for the daily
+#' component \eqn{s_t}. Possible values are \code{"bipower", "rv", "medrv"}.
+#' Default = \code{"bipower"}. \cr
+#' \code{periodicvol} \tab A string specifying the estimation method for the
+#' component of intraday volatility that depends in a deterministic way on the
+#' intraday time at which the return is observed. Possible values are
+#' \code{"TML", "SD", "WSD", "OLS"}. See Boudt et al. (2011) for details.
+#' Default = \code{"TML"}.\cr
+#' \code{P1} \tab A positive integer corresponding to the number of cosine
+#' terms used in the flexible Fourier specification of the periodicity
+#' function, see Andersen et al. (1997) for details. Default = 5. \cr
+#' \code{P2} \tab Same as \code{P1}, but for the sine terms. Default = 5.\cr
+#' \code{dummies} \tab Boolean: in case it is \code{TRUE}, the parametric
+#' estimator of periodic standard deviation specifies the periodicity function
+#' as the sum of dummy variables corresponding to each intraday period. If it
+#' is \code{FALSE}, the parametric estimator uses the flexible Fourier
+#' specification. Default = \code{FALSE}.
+#' }
+#' Outputs (see 'Value' for a full description of each component):
+#' \itemize{
+#' \item{\code{spot}}
+#' \item{\code{daily}}
+#' \item{\code{periodic}}
+#' }
+#' The spot volatility is decomposed into a deterministic periodic factor
+#' \eqn{f_{i}} (identical for every day in the sample) and a daily factor
+#' \eqn{s_{t}} (identical for all observations within a day). Both components
+#' are then estimated separately. For more details, see Taylor and Xu (1997)
+#' and Andersen and Bollerslev (1997). The jump robust versions by Boudt et al.
+#' (2011) have also been implemented.
+#'
+#' \strong{Stochastic periodicity method (\code{"stochper"})}
+#'
+#' Parameters:
+#' \tabular{ll}{
+#' \code{P1} \tab A positive integer corresponding to the number of cosine
+#' terms used in the flexible Fourier specification of the periodicity
+#' function. Default = 5. \cr
+#' \code{P2} \tab Same as \code{P1}, but for the sine terms. Default = 5.\cr
+#' \code{init} \tab A named list of initial values to be used in the
+#' optimization routine (\code{"BFGS"} in \code{\link{optim}}). Default =
+#' \code{list(sigma = 0.03, sigma_mu = 0.005, sigma_h = 0.005, sigma_k = 0.05,
+#' phi = 0.2, rho = 0.98, mu = c(2, -0.5), delta_c = rep(0, max(1,P1)),
+#' delta_s = rep(0, max(1,P2)))}. See Beltratti & Morana (2001) for a definition
+#' of each parameter. \code{init} can contain any number of these parameters.
+#' For parameters not specified in \code{init}, the default initial value will
+#' be used.\cr
+#' \code{control} \tab A list of options to be passed down to
+#' \code{\link{optim}}.
+#' }
+#' Outputs (see 'Value' for a full description of each component):
+#' \itemize{
+#' \item{\code{spot}}
+#' \item{\code{par}}
+#' }
+#' This method by Beltratti and Morana (2001) assumes the periodicity factor to
+#' be stochastic. The spot volatility estimation is split into four components:
+#' a random walk, an autoregressive process, a stochastic cyclical process and
+#' a deterministic cyclical process. The model is estimated using a
+#' quasi-maximum likelihood method based on the Kalman Filter. The package
+#' \code{\link[FKF:fkf]{FKF}} is used to apply the Kalman filter. In addition to
+#' the spot volatility estimates, all parameter estimates are returned.
+#'
+#' \strong{Nonparametric filtering (\code{"kernel"})}
+#'
+#' Parameters:
+#' \tabular{ll}{
+#' \code{type} \tab String specifying the type of kernel to be used. Options
+#' include \code{"gaussian", "epanechnikov", "beta"}. Default =
+#' \code{"gaussian"}.\cr
+#' \code{h} \tab Scalar or vector specifying the bandwidth(s) to be used in
+#' the kernel.
+#' If \code{h} is a scalar, it will be assumed equal throughout the sample. If
+#' it is a vector, it should contain bandwidths for each day. If left empty,
+#' it will be estimated. Default = \code{NULL}. \cr
+#' \code{est} \tab String specifying the bandwidth estimation method. Possible
+#' values include \code{"cv", "quarticity"}. Method \code{"cv"} equals
+#' cross-validation, which chooses the bandwidth that minimizes the Integrated
+#' Square Error. \code{"quarticity"} multiplies the simple plug-in estimator
+#' by a factor based on the daily quarticity of the returns. \code{est} is
+#' ignored if \code{h} has already been specified by the user. Default =
+#' \code{"cv"}.\cr
+#' \code{lower} \tab Lower bound to be used in bandwidth optimization routine,
+#' when using cross-validation method. Default is \eqn{0.1n^{-0.2}}. \cr
+#' \code{upper} \tab Upper bound to be used in bandwidth optimization routine,
+#' when using cross-validation method. Default is \eqn{n^{-0.2}}. \cr
+#' }
+#' Outputs (see 'Value' for a full description of each component):
+#' \itemize{
+#' \item{\code{spot}}
+#' \item{\code{par}}
+#' }
+#' This method by Kristensen (2010) filters the spot volatility in a
+#' nonparametric way by applying kernel weights to the standard realized
+#' volatility estimator. Different kernels and bandwidths can
+#' be used to focus on specific characteristics of the volatility process.
+#'
+#' Estimation results heavily depend on the bandwidth parameter \eqn{h}, so it
+#' is important that this parameter is well chosen. However, it is difficult to
+#' come up with a method that determines the optimal bandwidth for any kind of
+#' data or kernel that can be used. Although some estimation methods are
+#' provided, it is advised that you specify \eqn{h} yourself, or verify that
+#' the estimation results are appropriate.
+#'
+#' One way to estimate \eqn{h} is by cross-validation. For each day in
+#' the sample, \eqn{h} is chosen so as to minimize the Integrated Square
+#' Error, which is a function of \eqn{h}. However, this function often has
+#' multiple local minima, or no minimum at all (\eqn{h \rightarrow \infty}).
+#' To ensure a reasonable optimum is reached, strict boundaries have to be
+#' imposed on \eqn{h}. These can be specified by \code{lower} and
+#' \code{upper}, which by default are \eqn{0.1n^{-0.2}} and \eqn{n^{-0.2}}
+#' respectively, where \eqn{n} is the number of observations in a day.
+#'
+#' When using the method \code{"kernel"}, in addition to the spot volatility
+#' estimates, all used values of the bandwidth \eqn{h} are returned.
+#'
+#' \strong{Piecewise constant volatility (\code{"piecewise"})}
+#'
+#' Parameters:
+#' \tabular{ll}{
+#' \code{type} \tab String specifying the type of test to be used. Options
+#' include \code{"MDa", "MDb", "DM"}. See Fried (2012) for details. Default =
+#' \code{"MDa"}.\cr
+#' \code{m} \tab Number of observations to include in reference window.
+#' Default = \code{40}. \cr
+#' \code{n} \tab Number of observations to include in test window.
+#' Default = \code{20}. \cr
+#' \code{alpha} \tab Significance level to be used in the tests. Note that the
+#' test will be executed many times (roughly equal to the total number of
+#' observations), so it is advised to use a small value for \code{alpha} to
+#' avoid too many false positives. Default = \code{0.005}. \cr
+#' \code{volest} \tab String specifying the realized volatility estimator to be
+#' used in local windows. Possible values are \code{"bipower", "rv", "medrv"}.
+#' Default = \code{"bipower"}. \cr
+#' \code{online} \tab Boolean indicating whether estimations at a certain point
+#' \eqn{t} should be done online (using only information available at
+#' \eqn{t-1}), or ex post (using all observations between two change points).
+#' Default = \code{TRUE}. \cr
+#' }
+#' Outputs (see 'Value' for a full description of each component):
+#' \itemize{
+#' \item{\code{spot}}
+#' \item{\code{cp}}
+#' }
+#'
+#' This nonparametric method by Fried (2012) assumes the volatility to be
+#' piecewise constant over local windows. Robust two-sample tests are applied to
+#' detect changes in variability between subsequent windows. The spot volatility
+#' can then be estimated by evaluating regular realized volatility estimators
+#' within each local window.
+#'
+#' Along with the spot volatility estimates, this method will return the
+#' detected change points in the volatility level. When plotting a
+#' \code{spotvol} object containing \code{cp}, these change points will be
+#' visualized.
+#'
+#' \strong{GARCH models with intraday seasonality (\code{"garch"})}
+#'
+#' Parameters:
+#' \tabular{ll}{
+#' \code{model} \tab String specifying the type of GARCH model to be used.
+#' Options include \code{"sGARCH", "eGARCH"}. See \code{\link{ugarchspec}} in
+#' the \code{\link{rugarch}} package. Default = \code{"eGARCH"}. \cr
+#' \code{garchorder} \tab Numeric vector of length 2, containing the order of
+#' the GARCH model to be estimated. Default = \code{c(1,1)}. \cr
+#' \code{dist} \tab String specifying the distribution to be assumed on the
+#' innovations. See \code{distribution.model} in \code{\link{ugarchspec}} for
+#' possible options. Default = \code{"norm"}. \cr
+#' \code{solver.control} \tab List containing solver options.
+#' See \code{\link{ugarchfit}} for possible values. Default = \code{list()}. \cr
+#' \code{P1} \tab A positive integer corresponding to the number of cosine
+#' terms used in the flexible Fourier specification of the periodicity
+#' function. Default = 5. \cr
+#' \code{P2} \tab Same as \code{P1}, but for the sine terms. Default = 5.\cr
+#' }
+#' Outputs (see 'Value' for a full description of each component):
+#' \itemize{
+#' \item{\code{spot}}
+#' \item{\code{ugarchfit}}
+#' }
+#' This method generates the external regressors needed to model the intraday
+#' seasonality with a flexible Fourier form. The \code{\link{rugarch}} package
+#' is then employed to estimate the specified GARCH model.
+#'
+#' Along with the spot volatility estimates, this method will return the
+#' \code{\link{ugarchfit}} object used by the \code{\link{rugarch}} package.
+#'
+#' @param data Either an \code{\link{xts}} object, containing price data, or a
+#' \code{matrix} containing returns. For price data, irregularly spaced
+#' observations are allowed. They will be aggregated to the level specified by
+#' parameters \code{on} and \code{k}. For return data, the observations are
+#' assumed to be equispaced, with the time between them specified by \code{on}
+#' and \code{k}. Return data should be in matrix form, where each row
+#' corresponds to a day, and each column to an intraday period. The output
+#' will be in the same form as the input (\code{xts} or
+#' \code{matrix}/\code{numeric}).
+#' @param method specifies which method will be used to estimate the spot
+#' volatility. Options are: \code{"detper", "stochper", "kernel", "piecewise",
+#' "garch"}. See 'Details'.
+#' @param on string indicating the time scale in which \code{k} is expressed.
+#' Possible values are: \code{"secs", "seconds", "mins", "minutes", "hours"}.
+#' @param k positive integer, indicating the number of periods to aggregate
+#' over. E.g. to aggregate an \code{xts} object to the 5 minute frequency, set
+#' \code{k = 5} and \code{on = "minutes"}.
+#' @param marketopen the market opening time. This should be in the time zone
+#' specified by \code{tz}. By default, \code{marketopen = "09:30:00"}.
+#' @param marketclose the market closing time. This should be in the time zone
+#' specified by \code{tz}. By default, \code{marketclose = "16:00:00"}.
+#' @param tz string specifying the time zone to which the times in \code{data}
+#' and/or \code{marketopen}/ \code{marketclose} belong. Default = \code{"GMT"}.
+#' @param ... method-specific parameters (see 'Details').
+#'
+#' @return
+#' A \code{spotvol} object, which is a list containing one or more of the
+#' following outputs, depending on the method used:
+#'
+#' \code{spot}
+#'
+#' An \code{xts} or \code{matrix} object (depending on the input) containing
+#' spot volatility estimates \eqn{\sigma_{t,i}}, reported for each interval
+#' \eqn{i} between \code{marketopen} and \code{marketclose} for every day
+#' \eqn{t} in \code{data}. The length of the intervals is specified by
+#' \code{k} and \code{on}. Methods that provide this output: All.
+#'
+#' \code{daily}
+#'
+#' An \code{xts} or \code{numeric} object (depending on the input) containing
+#' estimates of the daily volatility levels for each day \eqn{t} in \code{data},
+#' if the used method decomposed spot volatility into a daily and an intraday
+#' component. Methods that provide this output: \code{"detper"}.
+#'
+#' \code{periodic}
+#'
+#' An \code{xts} or \code{numeric} object (depending on the input) containing
+#' estimates of the intraday periodicity factor for each intraday period \eqn{i}
+#' between \code{marketopen} and \code{marketclose}, if the spot volatility was
+#' decomposed into a daily and an intraday component. If the output is in
+#' \code{xts} format, this periodicity factor will be dated to the first day of
+#' the input data, but it is identical for each day in the sample. Methods that
+#' provide this output: \code{"detper"}.
+#'
+#' \code{par}
+#'
+#' A named list containing parameter estimates, for methods that estimate one
+#' or more parameters. Methods that provide this output:
+#' \code{"stochper", "kernel"}.
+#'
+#' \code{cp}
+#'
+#' A vector containing the change points in the volatility, i.e. the observation
+#' indices after which the volatility level changed, according to the applied
+#' tests. The vector starts with a 0. Methods that provide this output:
+#' \code{"piecewise"}.
+#'
+#' \code{ugarchfit}
+#'
+#' A \code{\link{ugarchfit}} object, as used by the \code{\link{rugarch}}
+#' package, containing all output from fitting the GARCH model to the data.
+#' Methods that provide this output: \code{"garch"}.
+#'
+#' @export
+#' @examples
+#' data(sample_real5minprices)
+#'
+#' # default method, deterministic periodicity
+#' vol1 <- spotvol(sample_real5minprices)
+#' par.def <- par(no.readonly = TRUE)
+#' plot(vol1)
+#' par(par.def)
+#'
+#' # compare to stochastic periodicity
+#' init = list(sigma = 0.03, sigma_mu = 0.005, sigma_h = 0.007,
+#' sigma_k = 0.06, phi = 0.194, rho = 0.986, mu = c(1.87,-0.42),
+#' delta_c = c(0.25, -0.05, -0.2, 0.13, 0.02), delta_s = c(-1.2,
+#' 0.11, 0.26, -0.03, 0.08))
+#' # next method will take around 110 iterations
+#' vol2 <- spotvol(sample_real5minprices, method = "stochper", init = init)
+#' plot(as.numeric(vol1$spot[1:780]), type="l")
+#' lines(as.numeric(vol2$spot[1:780]), col="red")
+#' legend("topright", c("detper", "stochper"), col = c("black", "red"), lty=1)
+#'
+#' # various kernel estimates
+#' h1 = bw.nrd0((1:nrow(sample_returns_5min))*(5*60))
+#' vol3 <- spotvol(sample_returns_5min, method = "kernel", h = h1)
+#' vol4 <- spotvol(sample_returns_5min, method = "kernel", est = "quarticity")
+#' vol5 <- spotvol(sample_returns_5min, method = "kernel", est = "cv")
+#' plot(vol3, length = 2880)
+#' lines(as.numeric(t(vol4$spot))[1:2880], col="red")
+#' lines(as.numeric(t(vol5$spot))[1:2880], col="blue")
+#' legend("topright", c("h = simple estimate", "h = quarticity corrected",
+#' "h = crossvalidated"), col = c("black", "red", "blue"), lty=1)
+#' par(par.def)
+#'
+#' # piecewise constant volatility, using an example from Fried (2012)
+#' simdata <- matrix(sqrt(5/3)*rt(3000, df = 5), ncol = 500, byrow = TRUE)
+#' simdata <- c(1, 1, 1.5, 1.5, 2, 1)*simdata
+#' # the volatility of the simulated data now changes at observations
+#' # 1000, 2000 and 2500
+#' vol6 <- spotvol(simdata, method = "piecewise", m = 200, n = 100,
+#' online = FALSE)
+#' plot(vol6)
+#'
+#' # compare regular GARCH(1,1) model to eGARCH, both with external regressors
+#' vol7 <- spotvol(sample_returns_5min, method = "garch", model = "sGARCH")
+#' vol8 <- spotvol(sample_returns_5min, method = "garch", model = "eGARCH")
+#' plot(as.numeric(t(vol7$spot)), type = "l")
+#' lines(as.numeric(t(vol8$spot)), col = "red")
+#' legend("topleft", c("GARCH", "eGARCH"), col = c("black", "red"), lty=1)
+#'
+#' @template references
+spotvol <- function(data, method = "detper", ..., on = "minutes", k = 5,
+ marketopen = "09:30:00", marketclose = "16:00:00",
+ tz = "GMT")
+{
+ require(timeDate)
+ require(chron)
+ if (on == "seconds" | on == "secs")
+ delta <- k
+ if (on == "minutes" | on == "mins")
+ delta <- k*60
+ if (on == "hours")
+ delta <- k*3600
+
+ if (inherits(data, what = "xts")) {
+ data <- xts(data, order.by = as.POSIXct(time(data), tz = tz), tzone = tz)
+ dates <- unique(format(time(data), "%Y-%m-%d"))
+ cDays <- length(dates)
+ rdata <- mR <- c()
+ intraday <- seq(from = times(marketopen), to = times(marketclose),
+ by = times(delta/(24*3600)))
+ if (as.character(tail(intraday, 1)) != marketclose)
+ intraday <- c(intraday, marketclose)
+ intraday <- intraday[2:length(intraday)]
+ for (d in 1:cDays) {
+ datad <- data[as.character(dates[d])]
+ if (!all(format(time(datad), format = "%Z") == tz))
+ stop(paste("Not all data on ", dates[d], " is in time zone \"", tz,
+ "\". This may be due to daylight saving time. Try using a",
+ " time zone without daylight saving, such as GMT.",
+ sep = ""))
+ datad <- aggregatePrice(datad, on = on, k = k , marketopen = marketopen,
+ marketclose = marketclose, tz = tz)
+ z <- xts(rep(1, length(intraday)), tzone = tz,
+ order.by = as.POSIXct(paste(dates[d], as.character(intraday),
+ sep=" "), tz = tz))
+ datad <- merge.xts(z, datad)$datad
+ datad <- na.locf(datad)
+ rdatad <- makeReturns(datad)
+ rdatad <- rdatad[time(rdatad) > min(time(rdatad))]
+ rdata <- rbind(rdata, rdatad)
+ mR <- rbind(mR, as.numeric(rdatad))
+ }
+ } else if (class(data) == "matrix") {
+ mR <- data
+ rdata <- NULL
+ } else stop("Input data has to consist of either of the following:
+ 1. An xts object containing price data
+ 2. A matrix containing return data")
+
+ options <- list(...)
+ out <- switch(method,
+ detper = detper(mR, rdata = rdata, options = options),
+ stochper = stochper(mR, rdata = rdata, options = options),
+ kernel = kernelestim(mR, rdata = rdata, delta, options = options),
+ piecewise = piecewise(mR, rdata = rdata, options = options),
+ garch = garch_s(mR, rdata = rdata, options = options))
+ return(out)
+}
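+
+# Illustrative sketch (hypothetical helper, not part of the package): build
+# a return matrix in the shape spotvol() expects for matrix input -- one row
+# per day, one column per intraday period (e.g. 78 five-minute periods in a
+# 6.5 hour trading day).
+.demo_return_matrix <- function(days = 60, M = 78) {
+  matrix(rnorm(days*M, sd = 0.001), nrow = days, ncol = M)
+}
+# e.g.: spotvol(.demo_return_matrix(), method = "detper", on = "minutes", k = 5)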
+
+# Deterministic periodicity model
+#
+# Modified spotVol function from highfrequency package
+detper <- function(mR, rdata = NULL, options = list())
+{
+ # default options, replace if user-specified
+ op <- list(dailyvol = "bipower", periodicvol = "TML", dummies = FALSE,
+ P1 = 5, P2 = 5)
+ op[names(options)] <- options
+
+ cDays <- nrow(mR)
+ M <- ncol(mR)
+ if (cDays == 1 & is.null(rdata)) {
+ mR <- as.numeric(mR)
+ estimdailyvol <- switch(op$dailyvol,
+ bipower = rBPCov(mR),
+ medrv = medRV(mR),
+ rv = rCov(mR))
+ } else {
+ if (is.null(rdata)) {
+ estimdailyvol <- switch(op$dailyvol,
+ bipower = apply(mR, 1, "rBPCov"),
+ medrv = apply(mR, 1, "medRV"),
+ rv = apply(mR, 1, "rCov"))
+ } else {
+ estimdailyvol <- switch(op$dailyvol,
+ bipower = apply.daily(rdata, rBPCov),
+ medrv = apply.daily(rdata, medRV),
+ rv = apply.daily(rdata, rCov))
+      dates <- time(estimdailyvol)
+ }
+ }
+ if (cDays <= 50) {
+ print("Periodicity estimation requires at least 50 observations.
+ Periodic component set to unity")
+ estimperiodicvol = rep(1, M)
+ } else {
+ mstdR <- mR/sqrt(as.numeric(estimdailyvol) * (1/M))
+ selection <- c(1:M)[ (nrow(mR)-apply(mR,2,'countzeroes')) >=20]
+ # preferably no na is between
+ selection <- c( min(selection) : max(selection) )
+ mstdR <- mstdR[,selection]
+ estimperiodicvol_temp <- diurnal(stddata = mstdR, method = op$periodicvol,
+ dummies = op$dummies, P1 = op$P1,
+ P2 = op$P2)[[1]]
+ estimperiodicvol <- rep(1,M)
+ estimperiodicvol[selection] <- estimperiodicvol_temp
+ mfilteredR <- mR/matrix(rep(estimperiodicvol, cDays), byrow = T,
+ nrow = cDays)
+ estimdailyvol <- switch(op$dailyvol,
+ bipower = apply(mfilteredR, 1, "rBPCov"),
+ medrv = apply(mfilteredR, 1, "medRV"),
+ rv = apply(mfilteredR, 1, "rCov"))
+ spot <- rep(sqrt(as.numeric(estimdailyvol) * (1/M)), each = M) *
+ rep(estimperiodicvol, cDays)
+ if (is.null(rdata)) {
+ spot <- matrix(spot, nrow = cDays, ncol = M, byrow = TRUE)
+ } else {
+ spot <- xts(spot, order.by = time(rdata))
+ estimdailyvol <- xts(estimdailyvol, order.by = dates)
+ estimperiodicvol <- xts(estimperiodicvol, order.by = time(rdata[1:M]))
+ }
+ out <- list(spot = spot, daily = estimdailyvol, periodic = estimperiodicvol)
+ class(out) <- "spotvol"
+ return(out)
+ }
+}
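+
+# Illustrative sketch (mirrors the recombination step in detper above, not
+# used by the package): given daily variance estimates and a periodicity
+# factor, the spot volatility of day t in period i is sqrt(dailyvar_t/M)*f_i.
+.demo_detper_recombine <- function(dailyvar, f) {
+  M <- length(f)
+  rep(sqrt(dailyvar/M), each = M) * rep(f, length(dailyvar))
+}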
+
+# Stochastic periodicity model
+#
+# This function estimates the spot volatility by using the stochastic
+# periodicity model of Beltratti & Morana (2001)
+stochper <- function(mR, rdata = NULL, options = list())
+{
+ require(FKF)
+ # default options, replace if user-specified
+ op <- list(init = list(), P1 = 5, P2 = 5, control = list(trace=1, maxit=500))
+ op[names(options)] <- options
+
+ N <- ncol(mR)
+ days <- nrow(mR)
+ mR[mR == 0] <- NA
+ logr2 <- log(mR^2)
+ rvector <- as.vector(t(logr2))
+ lambda <- (2*pi)/N;
+
+ # default starting values of parameters
+ sp <- list(sigma = 0.03,
+ sigma_mu = 0.005,
+ sigma_h = 0.005,
+ sigma_k = 0.05,
+ phi = 0.2,
+ rho = 0.98,
+ mu = c(2, -0.5),
+ delta_c = rep(0, max(1,op$P1)),
+ delta_s = rep(0, max(1,op$P2)))
+
+ # replace if user has specified different values
+ sp[names(op$init)] <- op$init
+
+ # check input
+ for (i in c("sigma", "sigma_mu", "sigma_h", "sigma_k", "phi", "rho")) {
+    if (length(sp[[i]]) != 1) stop(paste(i, "must be a scalar"))
+ }
+ if (length(sp$mu) != 2)
+ stop("mu must have length 2")
+ if (length(sp$delta_c) != op$P1 & op$P1 > 0)
+ stop("delta_c must have length equal to P1")
+ if (length(sp$delta_s) != op$P2 & op$P2 > 0)
+ stop("delta_s must have length equal to P2")
+ if (length(sp$delta_c) < 1)
+ stop("delta_c must at least have length 1")
+ if (length(sp$delta_s) < 1)
+ stop("delta_s must at least have length 1")
+
+ # transform parameters to allow for unrestricted optimization
+ # (domain -Inf to Inf)
+ par_t <- c(sigma = log(sp$sigma), sigma_mu = log(sp$sigma_mu),
+ sigma_h = log(sp$sigma_h), sigma_k = log(sp$sigma_k),
+ phi = log(sp$phi/(1-sp$phi)), rho = log(sp$rho/(1-sp$rho)),
+ mu = sp$mu, delta_c = sp$delta_c, delta_s = sp$delta_s)
+
+ opt <- optim(par_t, loglikBM, yt = rvector, N = N, days = days, P1 = op$P1,
+ P2 = op$P2, method="BFGS", control = op$control)
+
+ # recreate model to obtain volatility estimates
+ ss <- ssmodel(opt$par, days, N, P1 = op$P1, P2 = op$P2)
+ kf <- fkf(a0 = ss$a0, P0 = ss$P0, dt = ss$dt, ct = ss$ct, Tt = ss$Tt,
+ Zt = ss$Zt, HHt = ss$HHt, GGt = ss$GGt,
+ yt = matrix(rvector, ncol = length(rvector)))
+ sigmahat <- as.vector(exp((ss$Zt%*%kf$at[,1:(N*days)] + ss$ct + 1.27)/2))
+
+ # transform parameter estimates back
+ estimates <- c(exp(opt$par["sigma"]), exp(opt$par["sigma_mu"]),
+ exp(opt$par["sigma_h"]), exp(opt$par["sigma_k"]),
+ exp(opt$par["phi"])/(1+exp(opt$par["phi"])),
+ exp(opt$par["rho"])/(1+exp(opt$par["rho"])), opt$par[-(1:6)])
+
+ if (is.null(rdata)) {
+ spot <- matrix(sigmahat, nrow = days, ncol = N, byrow = TRUE)
+ } else {
+ spot <- xts(sigmahat, order.by = time(rdata))
+ }
+ out <- list(spot = spot, par = estimates)
+ class(out) <- "spotvol"
+ return(out)
+}
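+
+# Illustrative sketch (not used by the package): the transforms applied in
+# stochper() map restricted parameters to the real line so optim() can run
+# unconstrained -- log() for positive scales, the logit for phi and rho in
+# (0, 1) -- and are inverted after optimization. Toy values are hypothetical.
+.demo_param_transform <- function(sigma = 0.03, rho = 0.98) {
+  tr <- c(sigma = log(sigma), rho = log(rho/(1 - rho)))  # to the real line
+  c(sigma = exp(tr[["sigma"]]),                          # and back again
+    rho = exp(tr[["rho"]])/(1 + exp(tr[["rho"]])))
+}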
+
+# Calculate log likelihood using Kalman Filter
+#
+# This function returns the average log likelihood value of the stochastic
+# periodicity model, given the input parameters.
+loglikBM <- function(par_t, yt, days, N = 288, P1 = 5, P2 = 5)
+{
+ ss <- ssmodel(par_t, days, N, P1 = P1, P2 = P2)
+ yt <- matrix(yt, ncol = length(yt))
+ kf <- fkf(a0 = ss$a0, P0 = ss$P0, dt = ss$dt, ct = ss$ct, Tt = ss$Tt,
+ Zt = ss$Zt, HHt = ss$HHt, GGt = ss$GGt, yt = yt)
+ return(-kf$logLik/length(yt))
+}
+
+# Generate state space model
+#
+# This function creates the state space matrices from the input parameters.
+# The output is in the format used by the FKF package.
+ssmodel <- function(par_t, days, N = 288, P1 = 5, P2 = 5)
+{
+ par <- c(exp(par_t["sigma"]), exp(par_t["sigma_mu"]), exp(par_t["sigma_h"]),
+ exp(par_t["sigma_k"]), exp(par_t["phi"])/(1+exp(par_t["phi"])),
+ exp(par_t["rho"])/(1+exp(par_t["rho"])), par_t[-(1:6)])
+ lambda <- (2*pi)/288
+ a0 <- c(0, 0, par["delta_c1"], par["delta_s1"])
+ if (P1 == 0)
+ a0[3] <- par["delta_c"]
+ if (P2 == 0)
+ a0[4] <- par["delta_s"]
+ m <- length(a0)
+ P0 <- Tt <- Ht <- matrix(0, m, m)
+ diag(Tt) <- c(1, par["phi"], rep(par["rho"]*cos(lambda), 2))
+ Tt[3,4] <- par["rho"]*sin(lambda)
+ Tt[4,3] <- par["rho"]*-sin(lambda)
+ Zt <- matrix(c(1, 1, 1, 0), ncol = m)
+ Gt <- sqrt(0.5*pi^2)
+ GGt <- Gt %*% t(Gt)
+ diag(Ht) <- c(par["sigma_mu"], par["sigma_h"], rep(par["sigma_k"], 2))
+ HHt <- Ht %*% t(Ht)
+ dt <- matrix(0, nrow = m)
+ ct <- log(par["sigma"]^2) - 1.270363
+
+ # calculate deterministic part c2, add to ct
+ n <- 1:N
+ M1 <- (2*n)/(N+1)
+ M2 <- (6*n^2)/((N+1)*(N+2))
+ c2 <- par["mu1"]*M1 + par["mu2"]*M2
+ if (P1 > 1) {
+ for (k in 2:P1) {
+ c2 <- c2 + par[paste("delta_c", k, sep="")]*cos(k*lambda*n)
+ }
+ }
+ if (P2 > 1) {
+ for (p in 2:P2) {
+ c2 <- c2 + par[paste("delta_s", p, sep="")]*sin(p*lambda*n)
+ }
+ }
+ ct <- matrix(ct + c2, ncol = N*days)
+
+ return(list(a0 = a0, P0 = P0, Tt = Tt, Zt = Zt, GGt = GGt, HHt = HHt,
+ dt = dt, ct = ct))
+}
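+
+# Illustrative sketch (quick numerical check, not used by the package): the
+# constant 1.270363 subtracted in ct above is -E[log(z^2)] for z ~ N(0,1),
+# and the measurement variance GGt = pi^2/2 is Var[log(z^2)]; a short Monte
+# Carlo run reproduces both moments.
+.demo_logchisq_moments <- function(n = 1e6) {
+  x <- log(rnorm(n)^2)
+  c(mean = mean(x), var = var(x))  # approximately -1.2704 and 4.9348
+}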
+
+# Kernel estimation method
+#
+# See Kristensen (2010)
+kernelestim <- function(mR, rdata = NULL, delta = 300, options = list())
+{
+ # default options, replace if user-specified
+ op <- list(type = "gaussian", h = NULL, est = "cv", lower = NULL,
+ upper = NULL)
+ op[names(options)] <- options
+
+ D <- nrow(mR)
+ N <- ncol(mR)
+ if (N < 100 & op$est == "cv")
+ warning("Cross-validation may not return optimal results in small samples.")
+ if (op$type == "beta" & op$est == "quarticity" ) {
+ warning("No standard estimator available for Beta kernel bandwidth.
+ Cross-validation will be used instead.")
+ op$est = "cv"
+ }
+ t <- (1:N)*delta
+ S <- N*delta
+ if (is.null(op$h)) {
+ h <- numeric(D)
+ } else {
+ h <- rep(op$h, length.out = D)
+ }
+ sigma2hat <- matrix(NA, nrow = D, ncol = N)
+ for(d in 1:D) {
+ if (is.null(op$h)) {
+ quarticity <- (N/3)*rowSums(mR^4)
+ qscale <- quarticity^0.2
+ qmult <- qscale/sqrt((1/D)*sum(qscale^2))
+ if (op$est == "cv")
+ cat(paste("Estimating optimal bandwidth for day", d, "of", D, "...\n"))
+ h[d] <- estbandwidth(mR[d, ], delta = delta, qmult = qmult[d],
+ type = op$type, est = op$est, lower = op$lower,
+ upper = op$upper)
+ }
+ for(n in 1:N) {
+ if (op$type == "beta") {
+ K <- kernelk(t/S, type = op$type, b = h[d], y = t[n]/S)
+ } else {
+ K <- kernelk((t-t[n])/h[d], type = op$type)/h[d]
+ }
+ K <- K/sum(K)
+ sigma2hat[d, n] <- K %*% (mR[d, ]^2)
+ }
+ }
+ spot <- as.vector(t(sqrt(sigma2hat)))
+ if (is.null(rdata)) {
+ spot <- matrix(spot, nrow = D, ncol = N, byrow = TRUE)
+ } else {
+ spot <- xts(spot, order.by = time(rdata))
+ }
+ out <- list(spot = spot, par = list(h = h))
+ class(out) <- "spotvol"
+ return(out)
+}
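+
+# Illustrative sketch (condenses the weighting step of kernelestim above,
+# not used by the package): Kristensen's filtered variance at intraday time
+# index n is a kernel-weighted average of that day's squared returns.
+.demo_filtered_var <- function(r, n, h, delta = 300) {
+  t <- seq_along(r)*delta
+  K <- dnorm((t - t[n])/h)/h  # Gaussian kernel weights
+  sum(K/sum(K) * r^2)         # weighted average of squared returns
+}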
+
+# calculate values of certain kernels
+# arguments b and y only needed for type == "beta"
+kernelk <- function(x, type = "gaussian", b = 1, y = 1)
+{
+ if (type == "gaussian")
+ return(dnorm(x))
+ if (type == "epanechnikov") {
+ z <- (3/4)*(1-x^2)
+ z[abs(x) > 1] <- 0
+ return(z)
+ }
+ if (type == "beta")
+ return(dbeta(x, y/b + 1, (1-y)/b + 1))
+}
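+
+# Illustrative sketch (not used by the package): the Epanechnikov kernel has
+# bounded support, so points further than one bandwidth from the target time
+# receive zero weight, while the Gaussian kernel downweights smoothly.
+.demo_kernel_support <- function(x = seq(-2, 2, by = 0.5)) {
+  cbind(x = x,
+        gaussian = kernelk(x, type = "gaussian"),
+        epanechnikov = kernelk(x, type = "epanechnikov"))
+}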
+
+# estimate optimal bandwidth parameter h
+# by default, this is done through cross-validation (cv),
+# otherwise the formula for h_opt in Kristensen (2010) is approximated
+estbandwidth <- function(x, delta = 300, qmult = 1, type = "gaussian",
+ est = "cv", lower = NULL, upper = NULL)
+{
+ N <- length(x)
+ S <- N*delta
+ default <- bw.nrd0((1:N)*delta)
+ if (type == "epanechnikov")
+ default <- default*2.34
+ if (est == "quarticity")
+ h <- default*qmult
+ if (est == "cv") {
+ if (type == "beta") {
+ if (is.null(lower))
+ lower <- 0.0001
+ if (is.null(upper))
+ upper <- 1
+ } else {
+ if (is.null(lower))
+ lower <- default/3
+ if (is.null(upper))
+ upper <- default*3
+ }
[TRUNCATED]
To get the complete diff run:
svnlook diff /svnroot/highfrequency -r 76