[Highfrequency-commits] r76 - pkg/highfrequency/R

noreply at r-forge.r-project.org
Thu Aug 28 17:25:23 CEST 2014


Author: kboudt
Date: 2014-08-28 17:25:22 +0200 (Thu, 28 Aug 2014)
New Revision: 76

Added:
   pkg/highfrequency/R/spotvol.r
Log:


Added: pkg/highfrequency/R/spotvol.r
===================================================================
--- pkg/highfrequency/R/spotvol.r	                        (rev 0)
+++ pkg/highfrequency/R/spotvol.r	2014-08-28 15:25:22 UTC (rev 76)
@@ -0,0 +1,1269 @@
+#' Spot volatility estimation
+#' 
+#' The \code{spotvolatility} package offers several methods to estimate spot 
+#' volatility and its intraday seasonality, using high-frequency data.
+#' 
+#' @details
+#' The following spot volatility estimation methods have been implemented:
+#' 
+#' @section Deterministic periodicity:
+#' The spot volatility is decomposed into a deterministic periodic factor 
+#' \eqn{f_{i}} (identical for every day in the sample) and a daily factor 
+#' \eqn{s_{t}} (identical for all observations within a day). Both components 
+#' are then estimated separately. For more details, see Taylor and Xu (1997) 
+#' and Andersen and Bollerslev (1997). The jump robust versions by Boudt et 
+#' al. (2011) have also been implemented.
+#' 
+#' @section Stochastic periodicity:
+#' This method by Beltratti and Morana (2001) assumes the periodicity factor to
+#' be stochastic. The spot volatility estimation is split into four components:
+#' a random walk, an autoregressive process, a stochastic cyclical process and
+#' a deterministic cyclical process. The model is estimated using a 
+#' quasi-maximum likelihood method based on the Kalman Filter. The package
+#' \code{\link[=fkf]{FKF}} is used to apply the Kalman filter.
+#' 
+#' @section Nonparametric filtering:
+#' This method by Kristensen (2010) filters the spot volatility in a
+#' nonparametric way by applying kernel weights to the standard realized
+#' volatility estimator. Different kernels and bandwidths can be used to focus
+#' on specific characteristics of the volatility process.
+#' 
+#' @section Piecewise constant volatility:
+#' Another nonparametric method is that of Fried (2012), which assumes the 
+#' volatility to be piecewise constant over local windows. Robust two-sample
+#' tests are applied to detect changes in variability between subsequent 
+#' windows. The spot volatility can then be estimated by evaluating 
+#' regular realized volatility estimators within each local window.
+#' 
+#' @section GARCH models with intraday seasonality:
+#' The package also includes an option to apply GARCH models, implemented by
+#' the \code{\link{rugarch}} package, to estimate spot volatility from intraday 
+#' data. This is done by including external regressors in the model. These 
+#' regressors are based on a flexible Fourier form, which was also used in the 
+#' stochastic and deterministic periodicity estimation methods.
+#' 
+#' @template references
+#' 
+#' @docType package
+#' @name spotvolatility
+NULL
+
+#' Spot volatility estimation
+#' 
+#' @description
+#'
+#' The \code{spotvol} function offers several methods to estimate spot 
+#' volatility and its intraday seasonality, using high-frequency data. It 
+#' returns an object of class \code{spotvol}, which can contain various outputs,
+#' depending on the method used. See 'Details' for a description of each method.
+#' In any case, the output will contain the spot volatility estimates. 
+#' 
+#' The input can consist of price data or return data, either tick by tick or 
+#' sampled at set intervals. The data will be converted to equispaced 
+#' high-frequency returns \eqn{r_{t,i}} (read: the \eqn{i}th return on day 
+#' \eqn{t}). 
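+#' 
+#' As a sketch of the matrix input convention (hypothetical data, for 
+#' illustration only):
+#' \preformatted{
+#' # 60 days of 78 equispaced 5-minute returns:
+#' # rows are days, columns are intraday periods
+#' rmat <- matrix(rnorm(60*78, sd = 0.001), nrow = 60, ncol = 78)
+#' vol <- spotvol(rmat, on = "minutes", k = 5)
+#' }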
+#' 
+#' @details The following estimation methods can be specified in \code{method}:
+#' 
+#' \strong{Deterministic periodicity method (\code{"detper"})}
+#' 
+#' Parameters:
+#' \tabular{ll}{
+#' \code{dailyvol} \tab A string specifying the estimation method for the daily
+#' component \eqn{s_t}. Possible values are \code{"bipower", "rv", "medrv"}.
+#' Default = \code{"bipower"}. \cr
+#' \code{periodicvol} \tab A string specifying the estimation method for the 
+#' component of intraday volatility that depends in a deterministic way on the
+#' intraday time at which the return is observed. Possible values are 
+#' \code{"TML", "SD", "WSD", "OLS"}. See Boudt et al. (2011) for details. 
+#' Default = \code{"TML"}.\cr
+#' \code{P1} \tab A positive integer corresponding to the number of cosine
+#' terms used in the flexible Fourier specification of the periodicity 
+#' function; see Andersen et al. (1997) for details. Default = 5. \cr
+#' \code{P2} \tab Same as \code{P1}, but for the sine terms. Default = 5.\cr
+#' \code{dummies} \tab Boolean: if \code{TRUE}, the parametric estimator of 
+#' periodic standard deviation specifies the periodicity function as the sum 
+#' of dummy variables corresponding to each intraday period. If \code{FALSE}, 
+#' the parametric estimator uses the flexible Fourier 
+#' specification. Default = \code{FALSE}.
+#' }
+#' Outputs (see 'Value' for a full description of each component):
+#' \itemize{
+#' \item{\code{spot}}
+#' \item{\code{daily}}
+#' \item{\code{periodic}}
+#' }
+#' The spot volatility is decomposed into a deterministic periodic factor
+#' \eqn{f_{i}} (identical for every day in the sample) and a daily factor 
+#' \eqn{s_{t}} (identical for all observations within a day). Both components
+#' are then estimated separately. For more details, see Taylor and Xu (1997)
+#' and Andersen and Bollerslev (1997). The jump robust versions by Boudt et al.
+#' (2011) have also been implemented.
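+#' That is, the spot volatility is assumed to factorize as 
+#' \deqn{\sigma_{t,i} = s_{t}f_{i},}
+#' with \eqn{s_{t}} capturing the volatility level of day \eqn{t} and 
+#' \eqn{f_{i}} the intraday pattern.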
+#' 
+#' \strong{Stochastic periodicity method (\code{"stochper"})}
+#' 
+#' Parameters:
+#' \tabular{ll}{
+#' \code{P1} \tab A positive integer corresponding to the number of cosine 
+#' terms used in the flexible Fourier specification of the periodicity function. 
+#' Default = 5. \cr
+#' \code{P2} \tab Same as \code{P1}, but for the sine terms. Default = 5.\cr
+#' \code{init} \tab A named list of initial values to be used in the
+#' optimization routine (\code{"BFGS"} in \code{\link{optim}}). Default = 
+#' \code{list(sigma = 0.03, sigma_mu = 0.005, sigma_h = 0.005, sigma_k = 0.05, 
+#' phi = 0.2, rho = 0.98, mu = c(2, -0.5), delta_c = rep(0, max(1,P1)), 
+#' delta_s = rep(0, max(1,P2)))}. See Beltratti & Morana (2001) for a definition
+#' of each parameter. \code{init} can contain any number of these parameters. 
+#' For parameters not specified in \code{init}, the default initial value will 
+#' be used.\cr
+#' \code{control} \tab A list of options to be passed down to 
+#' \code{\link{optim}}.
+#' }
+#' Outputs (see 'Value' for a full description of each component):
+#' \itemize{
+#' \item{\code{spot}}
+#' \item{\code{par}}
+#' }
+#' This method by Beltratti and Morana (2001) assumes the periodicity factor to
+#' be stochastic. The spot volatility estimation is split into four components:
+#' a random walk, an autoregressive process, a stochastic cyclical process and
+#' a deterministic cyclical process. The model is estimated using a 
+#' quasi-maximum likelihood method based on the Kalman Filter. The package
+#' \code{\link[=fkf]{FKF}} is used to apply the Kalman filter. In addition to 
+#' the spot volatility estimates, all parameter estimates are returned.
+#' 
+#' \strong{Nonparametric filtering (\code{"kernel"})}
+#' 
+#' Parameters:
+#' \tabular{ll}{
+#' \code{type} \tab String specifying the type of kernel to be used. Options 
+#' include \code{"gaussian", "epanechnikov", "beta"}. Default = 
+#' \code{"gaussian"}.\cr
+#' \code{h} \tab Scalar or vector specifying the bandwidth(s) to be used in the kernel. 
+#' If \code{h} is a scalar, it will be assumed equal throughout the sample. If
+#' it is a vector, it should contain bandwidths for each day. If left empty, 
+#' it will be estimated. Default = \code{NULL}. \cr
+#' \code{est} \tab String specifying the bandwidth estimation method. Possible
+#' values include \code{"cv", "quarticity"}. Method \code{"cv"} equals 
+#' cross-validation, which chooses the bandwidth that minimizes the Integrated
+#' Square Error. \code{"quarticity"} multiplies the simple plug-in estimator
+#' by a factor based on the daily quarticity of the returns. \code{est} is
+#' ignored if \code{h} has already been specified by the user. Default = 
+#' \code{"cv"}.\cr
+#' \code{lower} \tab Lower bound to be used in bandwidth optimization routine, 
+#' when using cross-validation method. Default is \eqn{0.1n^{-0.2}}. \cr
+#' \code{upper} \tab Upper bound to be used in bandwidth optimization routine,
+#' when using cross-validation method. Default is \eqn{n^{-0.2}}. \cr
+#' }
+#' Outputs (see 'Value' for a full description of each component):
+#' \itemize{
+#' \item{\code{spot}}
+#' \item{\code{par}}
+#' }
+#' This method by Kristensen (2010) filters the spot volatility in a 
+#' nonparametric way by applying kernel weights to the standard realized 
+#' volatility estimator. Different kernels and bandwidths can 
+#' be used to focus on specific characteristics of the volatility process.
+#' 
+#' Estimation results heavily depend on the bandwidth parameter \eqn{h}, so it
+#' is important that this parameter is well chosen. However, it is difficult to
+#' come up with a method that determines the optimal bandwidth for any kind of 
+#' data or kernel that can be used. Although some estimation methods are 
+#' provided, it is advised that you specify \eqn{h} yourself, or make sure that
+#' the estimation results are appropriate.
+#' 
+#' One way to estimate \eqn{h} is by using cross-validation. For each day in
+#' the sample, \eqn{h} is chosen so as to minimize the Integrated Square 
+#' Error, which is a function of \eqn{h}. However, this function often has 
+#' multiple local minima, or no minimum at all (\eqn{h \to \infty}). To 
+#' ensure a reasonable optimum is reached, strict boundaries have to be 
+#' imposed on \eqn{h}. These can be specified by \code{lower} and 
+#' \code{upper}, which by default are \eqn{0.1n^{-0.2}} and \eqn{n^{-0.2}} 
+#' respectively, where \eqn{n} is the number of observations in a day.
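+#' For example, for 5-minute returns on a 6.5 hour trading day 
+#' (\eqn{n = 78}), these default bounds evaluate to roughly 0.042 and 0.42.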
+#' 
+#' When using the method \code{"kernel"}, in addition to the spot volatility 
+#' estimates, all used values of the bandwidth \eqn{h} are returned.
+#' 
+#' \strong{Piecewise constant volatility (\code{"piecewise"})}
+#' 
+#' Parameters:
+#' \tabular{ll}{
+#' \code{type} \tab String specifying the type of test to be used. Options 
+#' include \code{"MDa", "MDb", "DM"}. See Fried (2012) for details. Default = 
+#' \code{"MDa"}.\cr
+#' \code{m} \tab Number of observations to include in reference window. 
+#' Default = \code{40}. \cr
+#' \code{n} \tab Number of observations to include in test window. 
+#' Default = \code{20}. \cr
+#' \code{alpha} \tab Significance level to be used in tests. Note that the test
+#' will be executed many times (roughly equal to the total number of 
+#' observations), so it is advised to use a small value for \code{alpha}, to
+#' avoid a lot of false positives. Default = \code{0.005}. \cr
+#' \code{volest} \tab String specifying the realized volatility estimator to be
+#' used in local windows. Possible values are \code{"bipower", "rv", "medrv"}. 
+#' Default = \code{"bipower"}. \cr
+#' \code{online} \tab Boolean indicating whether estimations at a certain point
+#' \eqn{t} should be done online (using only information available at
+#' \eqn{t-1}), or ex post (using all observations between two change points). 
+#' Default = \code{TRUE}.  \cr
+#' }
+#' Outputs (see 'Value' for a full description of each component):
+#' \itemize{
+#' \item{\code{spot}}
+#' \item{\code{cp}}
+#' }
+#' 
+#' This nonparametric method by Fried (2012) assumes the volatility to be 
+#' piecewise constant over local windows. Robust two-sample tests are applied to
+#' detect changes in variability between subsequent windows. The spot volatility
+#' can then be estimated by evaluating regular realized volatility estimators 
+#' within each local window.
+#' 
+#' Along with the spot volatility estimates, this method will return the 
+#' detected change points in the volatility level. When plotting a 
+#' \code{spotvol} object containing \code{cp}, these change points will be 
+#' visualized.
+#' 
+#' \strong{GARCH models with intraday seasonality  (\code{"garch"})}
+#' 
+#' Parameters:
+#' \tabular{ll}{
+#' \code{model} \tab String specifying the type of GARCH model to be used. Options 
+#' include \code{"sGARCH", "eGARCH"}. See \code{\link{ugarchspec}} in the 
+#' \code{\link{rugarch}} package. Default = \code{"eGARCH"}. \cr
+#' \code{garchorder} \tab Numeric vector of length 2, containing the order of 
+#' the GARCH model to be estimated. Default = \code{c(1,1)}. \cr
+#' \code{dist} \tab String specifying the distribution to be assumed on the
+#' innovations. See \code{distribution.model} in \code{\link{ugarchspec}} for 
+#' possible options. Default = \code{"norm"}. \cr
+#' \code{solver.control} \tab List containing solver options. 
+#' See \code{\link{ugarchfit}} for possible values. Default = \code{list()}. \cr
+#' \code{P1} \tab A positive integer corresponding to the number of cosine 
+#' terms used in the flexible Fourier specification of the periodicity function. 
+#' Default = 5. \cr
+#' \code{P2} \tab Same as \code{P1}, but for the sine terms. Default = 5.\cr
+#' }
+#' Outputs (see 'Value' for a full description of each component):
+#' \itemize{
+#' \item{\code{spot}}
+#' \item{\code{ugarchfit}}
+#' }
+#' This method generates the external regressors needed to model the intraday 
+#' seasonality with a Flexible Fourier form. The \code{\link{rugarch}} package 
+#' is then employed to estimate the specified GARCH model.
+#' 
+#' Along with the spot volatility estimates, this method will return the
+#' \code{\link{ugarchfit}} object used by the \code{\link{rugarch}} package.
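+#' 
+#' As an illustration of such regressors (a sketch of the flexible Fourier 
+#' form, with \eqn{N} intraday periods and \eqn{\lambda = 2\pi/N}; the exact 
+#' internal construction may differ):
+#' \preformatted{
+#' n  <- 1:N
+#' M1 <- (2*n)/(N + 1)
+#' M2 <- (6*n^2)/((N + 1)*(N + 2))
+#' X  <- cbind(M1, M2,
+#'             sapply(1:P1, function(k) cos(k*lambda*n)),
+#'             sapply(1:P2, function(p) sin(p*lambda*n)))
+#' }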
+#' 
+#' @param data Either an \code{\link{xts}} object, containing price data, or a 
+#' \code{matrix} containing returns. For price data, irregularly spaced 
+#' observations are allowed. They will be aggregated to the level specified by 
+#' parameters \code{on} and \code{k}. For return data, the observations are
+#' assumed to be equispaced, with the time between them specified by \code{on}
+#' and \code{k}. Return data should be in matrix form, where each row 
+#' corresponds to a day, and each column to an intraday period. The output 
+#' will be in the same form as the input (\code{xts} or 
+#' \code{matrix}/\code{numeric}).
+#' @param method specifies which method will be used to estimate the spot 
+#' volatility. Options include \code{"detper"} and \code{"stochper"}. 
+#' See 'Details'.
+#' @param on string indicating the time scale in which \code{k} is expressed.
+#' Possible values are: \code{"secs", "seconds", "mins", "minutes", "hours"}.
+#' @param k positive integer, indicating the number of periods to aggregate
+#' over. E.g. to aggregate an \code{xts} object to the 5 minute frequency, set 
+#' \code{k = 5} and \code{on = "minutes"}.
+#' @param marketopen the market opening time. This should be in the time zone 
+#' specified by \code{tz}. By default, \code{marketopen = "09:30:00"}.
+#' @param marketclose the market closing time. This should be in the time zone 
+#' specified by \code{tz}. By default, \code{marketclose = "16:00:00"}.
+#' @param tz string specifying the time zone to which the times in \code{data} 
+#' and/or \code{marketopen}/ \code{marketclose} belong. Default = \code{"GMT"}.
+#' @param ... method-specific parameters (see 'Details').
+#' 
+#' @return
+#' A \code{spotvol} object, which is a list containing one or more of the 
+#' following outputs, depending on the method used:
+#' 
+#' \code{spot}
+#' 
+#' An \code{xts} or \code{matrix} object (depending on the input) containing 
+#' spot volatility estimates \eqn{\sigma_{t,i}}, reported for each interval 
+#' \eqn{i} between \code{marketopen} and \code{marketclose} for every day 
+#' \eqn{t} in \code{data}. The length of the intervals is specified by \code{k} 
+#' and \code{on}. Methods that provide this output: All.
+#' 
+#' \code{daily}
+#' 
+#' An \code{xts} or \code{numeric} object (depending on the input) containing 
+#' estimates of the daily volatility levels for each day \eqn{t} in \code{data},
+#' if the used method decomposed spot volatility into a daily and an intraday 
+#' component. Methods that provide this output: \code{"detper"}.
+#' 
+#' \code{periodic}
+#' 
+#' An \code{xts} or \code{numeric} object (depending on the input) containing 
+#' estimates of the intraday periodicity factor for each intraday interval 
+#' \eqn{i} between \code{marketopen} and \code{marketclose}, if the spot 
+#' volatility was
+#' decomposed into a daily and an intraday component. If the output is in 
+#' \code{xts} format, this periodicity factor will be dated to the first day of
+#' the input data, but it is identical for each day in the sample. Methods that
+#' provide this output: \code{"detper"}.
+#' 
+#' \code{par}
+#' 
+#' A named list containing parameter estimates, for methods that estimate one
+#' or more parameters. Methods that provide this output: 
+#' \code{"stochper", "kernel"}.
+#' 
+#' \code{cp}
+#' 
+#' A vector containing the change points in the volatility, i.e. the observation
+#' indices after which the volatility level changed, according to the applied
+#' tests. The vector starts with a 0. Methods that provide this output: 
+#' \code{"piecewise"}. 
+#' 
+#' \code{ugarchfit}
+#' 
+#' A \code{\link{ugarchfit}} object, as used by the \code{\link{rugarch}} 
+#' package, containing all output from fitting the GARCH model to the data. 
+#' Methods that provide this output: \code{"garch"}.
+#' 
+#' @export
+#' @examples
+#' data(sample_real5minprices)
+#' 
+#' # default method, deterministic periodicity
+#' vol1 <- spotvol(sample_real5minprices)
+#' par.def <- par(no.readonly = TRUE)
+#' plot(vol1)
+#' par(par.def)
+#' 
+#' # compare to stochastic periodicity
+#' init = list(sigma = 0.03, sigma_mu = 0.005, sigma_h = 0.007,
+#'             sigma_k = 0.06, phi = 0.194, rho = 0.986, mu = c(1.87,-0.42),
+#'             delta_c = c(0.25, -0.05, -0.2, 0.13, 0.02), delta_s = c(-1.2, 
+#'             0.11, 0.26, -0.03, 0.08))
+#' # next method will take around 110 iterations
+#' vol2 <- spotvol(sample_real5minprices, method = "stochper", init = init)
+#' plot(as.numeric(vol1$spot[1:780]), type="l")
+#' lines(as.numeric(vol2$spot[1:780]), col="red")
+#' legend("topright", c("detper", "stochper"), col = c("black", "red"), lty=1)
+#' 
+#' # various kernel estimates
+#' h1 = bw.nrd0((1:nrow(sample_returns_5min))*(5*60))
+#' vol3 <- spotvol(sample_returns_5min, method = "kernel", h = h1)
+#' vol4 <- spotvol(sample_returns_5min, method = "kernel", est = "quarticity")
+#' vol5 <- spotvol(sample_returns_5min, method = "kernel", est = "cv")
+#' plot(vol3, length = 2880)
+#' lines(as.numeric(t(vol4$spot))[1:2880], col="red")
+#' lines(as.numeric(t(vol5$spot))[1:2880], col="blue")
+#' legend("topright", c("h = simple estimate", "h = quarticity corrected",
+#'        "h = crossvalidated"), col = c("black", "red", "blue"), lty=1)
+#' par(par.def)
+#' 
+#' # piecewise constant volatility, using an example from Fried (2012)
+#' simdata <- matrix(sqrt(5/3)*rt(3000, df = 5), ncol = 500, byrow = TRUE)
+#' simdata <- c(1, 1, 1.5, 1.5, 2, 1)*simdata
+#' # the volatility of the simulated data now changes at 1000, 2000 and 2500
+#' vol6 <- spotvol(simdata, method = "piecewise", m = 200, n  = 100, 
+#'                 online = FALSE)
+#' plot(vol6)
+#' 
+#' # compare regular GARCH(1,1) model to eGARCH, both with external regressors
+#' vol7 <- spotvol(sample_returns_5min, method = "garch", model = "sGARCH")
+#' vol8 <- spotvol(sample_returns_5min, method = "garch", model = "eGARCH")
+#' plot(as.numeric(t(vol7$spot)), type = "l")
+#' lines(as.numeric(t(vol8$spot)), col = "red")
+#' legend("topleft", c("GARCH", "eGARCH"), col = c("black", "red"), lty=1)
+#'
+#' @template references
+spotvol <- function(data, method = "detper", ..., on = "minutes", k = 5,
+                    marketopen = "09:30:00", marketclose = "16:00:00",
+                    tz = "GMT")  
+{
+  require(timeDate)
+  require(chron)
+  if (on == "seconds" | on == "secs") 
+    delta <- k 
+  if (on == "minutes" | on == "mins") 
+    delta <- k*60  
+  if (on == "hours") 
+    delta <- k*3600 
+  
+  if (inherits(data, what = "xts")) {
+    data <- xts(data, order.by = as.POSIXct(time(data), tz = tz), tzone = tz)
+    dates <- unique(format(time(data), "%Y-%m-%d"))
+    cDays <- length(dates)
+    rdata <- mR <- c()
+    intraday <- seq(from = times(marketopen), to = times(marketclose), 
+                    by = times(delta/(24*3600))) 
+    if (as.character(tail(intraday, 1)) != marketclose) 
+      intraday <- c(intraday, marketclose)
+    intraday <- intraday[2:length(intraday)]
+    for (d in 1:cDays) {
+      datad <- data[as.character(dates[d])]
+      if (!all(format(time(datad), format = "%Z") == tz)) 
+        stop(paste("Not all data on ", dates[d], " is in time zone \"", tz,
+                   "\". This may be due to daylight saving time. Try using a",
+                   " time zone without daylight saving, such as GMT.", 
+                   sep = ""))
+      datad <- aggregatePrice(datad, on = on, k = k , marketopen = marketopen,
+                             marketclose = marketclose, tz = tz)
+      z <- xts(rep(1, length(intraday)), tzone = tz, 
+              order.by = as.POSIXct(paste(dates[d], as.character(intraday), 
+                                    sep=" "), tz = tz))
+      datad <- merge.xts(z, datad)$datad
+      datad <- na.locf(datad)
+      rdatad <- makeReturns(datad)
+      rdatad <- rdatad[time(rdatad) > min(time(rdatad))]
+      rdata <- rbind(rdata, rdatad)
+      mR <- rbind(mR, as.numeric(rdatad))
+    }
+  } else if (is.matrix(data)) {
+    mR <- data
+    rdata <- NULL
+  } else stop("Input data must be one of the following: 
+            1. An xts object containing price data
+            2. A matrix containing return data")
+
+  options <- list(...)
+  out <- switch(method, 
+           detper = detper(mR, rdata = rdata, options = options), 
+           stochper = stochper(mR, rdata = rdata, options = options),
+           kernel = kernelestim(mR, rdata = rdata, delta, options = options),
+           piecewise = piecewise(mR, rdata = rdata, options = options),
+           garch = garch_s(mR, rdata = rdata, options = options))  
+  return(out)
+}
+
+# Deterministic periodicity model
+# 
+# Modified spotVol function from highfrequency package
+detper <- function(mR, rdata = NULL, options = list()) 
+{
+  # default options, replace if user-specified
+  op <- list(dailyvol = "bipower", periodicvol = "TML", dummies = FALSE, 
+             P1 = 5, P2 = 5)
+  op[names(options)] <- options 
+  
+  cDays <- nrow(mR)
+  M <- ncol(mR)
+  if (cDays == 1 && is.null(rdata)) { 
+    mR <- as.numeric(mR)
+    estimdailyvol <- switch(op$dailyvol, 
+                           bipower = rBPCov(mR), 
+                           medrv = medRV(mR), 
+                           rv = rCov(mR))
+  } else {
+    if (is.null(rdata)) {
+      estimdailyvol <- switch(op$dailyvol, 
+                             bipower = apply(mR, 1, "rBPCov"),
+                             medrv = apply(mR, 1, "medRV"),
+                             rv = apply(mR, 1, "rCov"))
+    } else {
+      estimdailyvol <- switch(op$dailyvol, 
+                              bipower = apply.daily(rdata, rBPCov),
+                              medrv = apply.daily(rdata, medRV),
+                              rv = apply.daily(rdata, rCov))
+      dates = time(estimdailyvol)
+    }
+  }  
+  if (cDays <= 50) {
+    print("Periodicity estimation requires at least 50 observations. 
+          Periodic component set to unity")
+    estimperiodicvol = rep(1, M)
+  } else {
+    mstdR <- mR/sqrt(as.numeric(estimdailyvol) * (1/M))
+    selection <- c(1:M)[ (nrow(mR)-apply(mR,2,'countzeroes')) >=20] 
+    # preferably no na is between
+    selection <- c( min(selection) : max(selection) )
+    mstdR <- mstdR[,selection]
+    estimperiodicvol_temp <- diurnal(stddata = mstdR, method = op$periodicvol, 
+                                     dummies = op$dummies, P1 = op$P1, 
+                                     P2 = op$P2)[[1]]
+    estimperiodicvol <- rep(1,M)
+    estimperiodicvol[selection] <- estimperiodicvol_temp
+    mfilteredR <- mR/matrix(rep(estimperiodicvol, cDays), byrow = T, 
+                            nrow = cDays)
+    estimdailyvol <- switch(op$dailyvol, 
+                            bipower = apply(mfilteredR, 1, "rBPCov"),
+                            medrv = apply(mfilteredR, 1, "medRV"), 
+                            rv = apply(mfilteredR, 1, "rCov"))
+    spot <- rep(sqrt(as.numeric(estimdailyvol) * (1/M)), each = M) * 
+              rep(estimperiodicvol, cDays)
+    if (is.null(rdata)) {
+      spot <- matrix(spot, nrow = cDays, ncol = M, byrow = TRUE)
+    } else {
+      spot <- xts(spot, order.by = time(rdata))
+      estimdailyvol <- xts(estimdailyvol, order.by = dates)
+      estimperiodicvol <- xts(estimperiodicvol, order.by = time(rdata[1:M]))
+    }
+    out <- list(spot = spot, daily = estimdailyvol, periodic = estimperiodicvol)
+    class(out) <- "spotvol"
+    return(out)
+  }
+}
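+
+# Consistency sketch (matrix input, more than 50 days of data): the spot
+# estimates above are the product of the scaled daily component and the
+# periodic component, e.g. for the first day
+#   vol <- spotvol(sample_returns_5min)
+#   M <- ncol(vol$spot)
+#   all.equal(as.numeric(vol$spot[1, ]),
+#             sqrt(vol$daily[1]/M) * vol$periodic)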
+
+# Stochastic periodicity model
+# 
+# This function estimates the spot volatility by using the stochastic 
+# periodicity model of Beltratti & Morana (2001)
+stochper <- function(mR, rdata = NULL, options = list()) 
+{
+  require(FKF)
+  # default options, replace if user-specified
+  op <- list(init = list(), P1 = 5, P2 = 5, control = list(trace=1, maxit=500))
+  op[names(options)] <- options 
+  
+  N <- ncol(mR)
+  days <- nrow(mR)
+  mR[mR == 0] <- NA
+  logr2 <- log(mR^2)
+  rvector <- as.vector(t(logr2)) 
+  lambda <- (2*pi)/N
+  
+  # default starting values of parameters
+  sp <- list(sigma = 0.03,
+             sigma_mu = 0.005,
+             sigma_h = 0.005,
+             sigma_k = 0.05,
+             phi = 0.2,
+             rho = 0.98,
+             mu = c(2, -0.5),
+             delta_c = rep(0, max(1,op$P1)),
+             delta_s = rep(0, max(1,op$P2)))
+  
+  # replace if user has specified different values
+  sp[names(op$init)] <- op$init
+  
+  # check input
+  for (i in c("sigma", "sigma_mu", "sigma_h", "sigma_k", "phi", "rho")) {
+    if (sapply(sp, length)[i] != 1) stop(paste0(i, " must be a scalar"))  
+  }
+  if (length(sp$mu) != 2) 
+    stop("mu must have length 2")
+  if (length(sp$delta_c) != op$P1 & op$P1 > 0) 
+    stop("delta_c must have length equal to P1")
+  if (length(sp$delta_s) != op$P2 & op$P2 > 0) 
+    stop("delta_s must have length equal to P2")
+  if (length(sp$delta_c) < 1) 
+    stop("delta_c must at least have length 1")
+  if (length(sp$delta_s) < 1) 
+    stop("delta_s must at least have length 1")
+  
+  # transform parameters to allow for unrestricted optimization 
+  # (domain -Inf to Inf)
+  par_t <- c(sigma = log(sp$sigma), sigma_mu = log(sp$sigma_mu), 
+             sigma_h = log(sp$sigma_h), sigma_k = log(sp$sigma_k), 
+             phi = log(sp$phi/(1-sp$phi)), rho = log(sp$rho/(1-sp$rho)),
+             mu = sp$mu, delta_c = sp$delta_c, delta_s = sp$delta_s) 
+  
+  opt <- optim(par_t, loglikBM, yt = rvector, N = N, days = days, P1 = op$P1, 
+               P2 = op$P2, method="BFGS", control = op$control)
+  
+  # recreate model to obtain volatility estimates
+  ss <- ssmodel(opt$par, days, N, P1 = op$P1, P2 = op$P2)
+  kf <- fkf(a0 = ss$a0, P0 = ss$P0, dt = ss$dt, ct = ss$ct, Tt = ss$Tt, 
+            Zt = ss$Zt, HHt = ss$HHt, GGt = ss$GGt, 
+            yt = matrix(rvector, ncol = length(rvector)))
+  sigmahat <- as.vector(exp((ss$Zt%*%kf$at[,1:(N*days)] + ss$ct + 1.27)/2))
+  
+  # transform parameter estimates back
+  estimates <- c(exp(opt$par["sigma"]), exp(opt$par["sigma_mu"]), 
+                 exp(opt$par["sigma_h"]), exp(opt$par["sigma_k"]),
+                 exp(opt$par["phi"])/(1+exp(opt$par["phi"])), 
+                 exp(opt$par["rho"])/(1+exp(opt$par["rho"])), opt$par[-(1:6)])
+
+  if (is.null(rdata)) {
+    spot <- matrix(sigmahat, nrow = days, ncol = N, byrow = TRUE)
+  } else {
+    spot <- xts(sigmahat, order.by = time(rdata))
+  }
+  out <- list(spot = spot, par = estimates)
+  class(out) <- "spotvol"
+  return(out)
+}
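+
+# Sketch of the transformation logic used above: variance parameters are
+# log-transformed and phi, rho are logit-transformed, so that optim can
+# search an unrestricted domain; hypothetical helpers illustrating the map:
+#   logit    <- function(p) log(p/(1 - p))      # (0,1) -> (-Inf, Inf)
+#   invlogit <- function(y) exp(y)/(1 + exp(y)) # back-transform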
+
+# Calculate log likelihood using Kalman Filter
+# 
+# This function returns the negative average log-likelihood of the 
+# stochastic periodicity model, given the input parameters (this is the 
+# objective minimized by optim in stochper).
+loglikBM <- function(par_t, yt, days, N = 288, P1 = 5, P2 = 5)
+{
+  ss <- ssmodel(par_t, days, N, P1 = P1, P2 = P2)
+  yt <- matrix(yt, ncol = length(yt))
+  kf <- fkf(a0 = ss$a0, P0 = ss$P0, dt = ss$dt, ct = ss$ct, Tt = ss$Tt, 
+            Zt = ss$Zt, HHt = ss$HHt, GGt = ss$GGt, yt = yt)
+  return(-kf$logLik/length(yt))
+}
+
+# Generate state space model
+# 
+# This function creates the state space matrices from the input parameters.
+# The output is in the format used by the FKF package.
+ssmodel <- function(par_t, days, N = 288, P1 = 5, P2 = 5)
+{
+  par <- c(exp(par_t["sigma"]), exp(par_t["sigma_mu"]), exp(par_t["sigma_h"]), 
+           exp(par_t["sigma_k"]), exp(par_t["phi"])/(1+exp(par_t["phi"])), 
+           exp(par_t["rho"])/(1+exp(par_t["rho"])), par_t[-(1:6)])
+  lambda <- (2*pi)/N
+  a0 <- c(0, 0, par["delta_c1"], par["delta_s1"])
+  if (P1 == 0) 
+    a0[3] <- par["delta_c"]
+  if (P2 == 0) 
+    a0[4] <- par["delta_s"]   
+  m <- length(a0)
+  P0 <- Tt <- Ht <- matrix(0, m, m)
+  diag(Tt) <- c(1, par["phi"], rep(par["rho"]*cos(lambda), 2))
+  Tt[3,4] <- par["rho"]*sin(lambda)
+  Tt[4,3] <- par["rho"]*-sin(lambda)
+  Zt <- matrix(c(1, 1, 1, 0), ncol = m)
+  Gt <- sqrt(0.5*pi^2)
+  GGt <- Gt %*% t(Gt)
+  diag(Ht) <- c(par["sigma_mu"], par["sigma_h"], rep(par["sigma_k"], 2))
+  HHt <- Ht %*% t(Ht)
+  dt <- matrix(0, nrow = m)
+  ct <- log(par["sigma"]^2) - 1.270363
+  
+  # calculate deterministic part c2, add to ct
+  n <- 1:N
+  M1 <- (2*n)/(N+1)
+  M2 <- (6*n^2)/((N+1)*(N+2))
+  c2 <- par["mu1"]*M1 + par["mu2"]*M2
+  if (P1 > 1) {
+    for (k in 2:P1) {
+      c2 <- c2 + par[paste("delta_c", k, sep="")]*cos(k*lambda*n) 
+    }  
+  }
+  if (P2 > 1) {
+    for (p in 2:P2) {
+      c2 <- c2 + par[paste("delta_s", p, sep="")]*sin(p*lambda*n)
+    }  
+  }
+  ct <- matrix(ct + c2, ncol = N*days)
+  
+  return(list(a0 = a0, P0 = P0, Tt = Tt, Zt = Zt, GGt = GGt, HHt = HHt, 
+              dt = dt, ct = ct))
+}
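+
+# Model implied by the matrices above (summary of this code): with state
+# vector a_t = (mu_t, h_t, c_t, c*_t)', the measurement equation for
+# y_t = log r_t^2 is  y_t = Zt a_t + ct + eps_t,  Var(eps_t) = GGt = pi^2/2,
+# where ct collects log(sigma^2), the constant -1.270363 (the mean of a
+# log chi-squared(1) variable) and the deterministic Fourier part c2.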
+
+# Kernel estimation method
+# 
+# See Kristensen (2010)
+kernelestim <- function(mR, rdata = NULL, delta = 300, options = list())
+{
+  # default options, replace if user-specified
+  op <- list(type = "gaussian", h = NULL, est = "cv", lower = NULL, 
+             upper = NULL)
+  op[names(options)] <- options
+  
+  D <- nrow(mR)
+  N <- ncol(mR)
+  if (N < 100 && op$est == "cv") 
+    warning("Cross-validation may not return optimal results in small samples.")
+  if (op$type == "beta" && op$est == "quarticity") {
+    warning("No standard estimator available for Beta kernel bandwidth.
+                Cross-validation will be used instead.")
+    op$est <- "cv" 
+  }
+  t <- (1:N)*delta
+  S <- N*delta
+  if (is.null(op$h)) { 
+    h <- numeric(D) 
+  } else {
+    h <- rep(op$h, length.out = D)
+  }
+  sigma2hat <- matrix(NA, nrow = D, ncol = N)
+  for(d in 1:D) {
+    if (is.null(op$h)) {
+      quarticity <- (N/3)*rowSums(mR^4)
+      qscale <- quarticity^0.2
+      qmult <- qscale/sqrt((1/D)*sum(qscale^2))
+      if (op$est == "cv") 
+        cat(paste("Estimating optimal bandwidth for day", d, "of", D, "...\n"))
+      h[d] <- estbandwidth(mR[d, ], delta = delta, qmult = qmult[d], 
+                           type = op$type, est = op$est, lower = op$lower, 
+                           upper = op$upper)
+    }
+    for(n in 1:N) {
+      if (op$type == "beta") {
+        K <- kernelk(t/S, type = op$type, b = h[d], y = t[n]/S)
+      } else {
+        K <- kernelk((t-t[n])/h[d], type = op$type)/h[d]
+      }
+      K <- K/sum(K)
+      sigma2hat[d, n] <- K %*% (mR[d, ]^2)
+    }
+  }
+  spot <- as.vector(t(sqrt(sigma2hat)))
+  if (is.null(rdata)) {
+    spot <- matrix(spot, nrow = D, ncol = N, byrow = TRUE)
+  } else {
+    spot <- xts(spot, order.by = time(rdata))
+  }
+  out <- list(spot = spot, par = list(h = h))
+  class(out) <- "spotvol"
+  return(out)
+}
+
+# calculate values of certain kernels
+# arguments b and y only needed for type == "beta"
+kernelk <- function(x, type = "gaussian", b = 1, y = 1)
+{
+  if (type == "gaussian") 
+    return(dnorm(x))  
+  if (type == "epanechnikov") {
+    z <- (3/4)*(1-x^2)
+    z[abs(x) > 1] <- 0
+    return(z)
+  }
+  if (type == "beta") 
+    return(dbeta(x, y/b + 1, (1-y)/b + 1))
+}
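+
+# Quick sketch: kernelestim renormalizes these kernel values to sum to one,
+# making the spot variance a weighted average of squared returns. The kernels
+# themselves integrate to one, e.g.
+#   x <- seq(-3, 3, by = 0.01)
+#   sum(kernelk(x, type = "epanechnikov")*0.01)  # approximately 1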
+
+# estimate optimal bandwidth parameter h
+# by default, this is done through cross-validation (cv)
+# else the formula for h_opt in Kristensen(2010) is approximated
+estbandwidth <- function(x, delta = 300, qmult = 1, type = "gaussian", 
+                         est = "cv", lower = NULL, upper = NULL)
+{
+  N <- length(x)
+  S <- N*delta
+  default <- bw.nrd0((1:N)*delta)
+  if (type == "epanechnikov") 
+    default <- default*2.34
+  if (est == "quarticity")  
+    h <- default*qmult 
+  if (est == "cv") {
+    if (type == "beta") {
+      if (is.null(lower)) 
+        lower <- 0.0001
+      if (is.null(upper)) 
+        upper <- 1
+    } else {
+      if (is.null(lower)) 
+        lower <- default/3
+      if (is.null(upper)) 
+        upper <- default*3
+    }
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/highfrequency -r 76

