[Dplr-commits] r693 - in branches/redfit: R man
noreply at r-forge.r-project.org
noreply at r-forge.r-project.org
Tue Sep 24 12:38:31 CEST 2013
Author: mvkorpel
Date: 2013-09-24 12:38:31 +0200 (Tue, 24 Sep 2013)
New Revision: 693
Modified:
branches/redfit/R/redfit.R
branches/redfit/man/redfit.Rd
Log:
New formulas for rcritlo and rcrithi (limits of acceptance region of
runs test). Uses a normal approximation, fixed to exact values for
nfreq <= 16000.
Modified: branches/redfit/R/redfit.R
===================================================================
--- branches/redfit/R/redfit.R 2013-09-17 16:30:12 UTC (rev 692)
+++ branches/redfit/R/redfit.R 2013-09-24 10:38:31 UTC (rev 693)
@@ -319,23 +319,227 @@
ci99 <- NULL
}
- ## Test equality of theoretical AR1 and estimated spectrum using a
- ## runs test (Bendat and Piersol, 1986, p. 95). The empirical
- ## equations for calculating critical values for 5-% significance
- ## were derived from the tabulated critical values in B&P.
if (iwin2 == 0 && ofac == 1 && dn50 == 1) {
spectrcomp <- rep.int(0, nfreq)
spectrcomp[gxxc - gredth >= 0] <- 1
rcnt <- 1 + sum(diff(spectrcomp) != 0)
- ## dplR: NOTE! Integer division is used in REDFIT. This should be
- ## checked (by finding a copy of Bendat and Piersol). For now, we
- ## can assume that real(nout/2) was supposed to be real(nout)/2.
- ## sqrtHalfNfreq <- sqrt(nfreq %/% 2)
- sqrtHalfNfreq <- sqrt(nfreq / 2)
- ## dplR: NOTE! Is round() the right function to use? Maybe ceiling
- ## for the lower limit and floor for the higher limit?
- rcritlo <- round((-0.79557086 + 1.0088719 * sqrtHalfNfreq)^2)
- rcrithi <- round(( 0.75751462 + 0.9955133 * sqrtHalfNfreq)^2)
+
+ ## dplR: Old formulas for rcritlo, rcrithi (REDFIT).
+ ## ## Test equality of theoretical AR1 and estimated spectrum
+ ## ## using a runs test (Bendat and Piersol, 1986, p. 95). The
+ ## ## empirical equations for calculating critical values for
+ ## ## 5-% significance were derived from the tabulated
+ ## ## critical values in B&P. dplR: NOTE! Integer division is
+ ## ## used in REDFIT. We can assume that real(nout/2) was
+ ## ## supposed to be real(nout)/2.
+ ## ## sqrtHalfNfreq <- sqrt(nfreq %/% 2)
+ ## sqrtHalfNfreq <- sqrt(nfreq / 2)
+ ## rcritlo <- round((-0.79557086 + 1.0088719 * sqrtHalfNfreq)^2)
+ ## rcrithi <- round(( 0.75751462 + 0.9955133 * sqrtHalfNfreq)^2)
+
+ ## dplR: Updated formulas for rcritlo, rcrithi.
+ ##
+ ## The REDFIT formulas seem to be very, very inexact, with
+ ## either definition of sqrtHalfNfreq. For example, the width
+ ## of the acceptance region increases, then decreases (*), and
+ ## goes to <= 0 at nfreq >= 27144 (original sqrtHalfNfreq) or
+ ## at nfreq >= 27144 || nfreq %in% seq(from = 27051, by = 2,
+ ## to = 27143) (updated sqrtHalfNfreq). Another example:
+ ## rcritlo is 0 (impossible number of runs) at some small
+ ## values of nfreq.
+ ##
+ ## (*) The increase and decrease are general trends, but there
+ ## are local fluctuations against the trend.
+ ##
+ ## About the new formulas:
+ ##
+ ## Exact values were computed up to nfreq == 16000. For nfreq
+ ## > 16000, a normal approximation is used. For nfreq <=
+ ## 16000, the acceptance region obtained using the normal
+ ## approximation differs from the exact acceptance region at
+ ## 52 values of nfreq (diffIdx). The problem at hand
+ ## (comparison of two spectra) is analogous to studying the
+ ## number of runs of heads and tails with nfreq tosses of a
+ ## fair coin (p == 0.5).
+ ##
+ ## The sequence of acceptance region widths is non-decreasing
+ ## for both odd and even nfreq, individually. However,
+ ## because of the symmetric distribution, if rcritlo increases
+ ## by 1 going from nfreq to nfreq + 1, rcrithi does not
+ ## change, and the change in the width is -1.
+ ##
+ ## Reference: Bradley, J. V. (1968) Distribution-Free
+ ## Statistical Tests. Prentice-Hall. p. 253--254, 259--263.
+ ##
+ ## Code for computing the exact rcritlo, rcrithi can be found
+ ## below. It was used for obtaining 'diffIdx' and the
+ ## corrections +1 and -1. Requires the "gmp" package
+ ## (arbitrary precision numbers). Runs very slowly for large
+ ## values of n.
+ ##
+ ## library(gmp)
+ ## runprobZ <- function(k, n) {
+ ## invhalfpowM1 <- as.bigz(2)^(n - 1)
+ ## if (k %% 2 == 0) {
+ ## ## even number of runs
+ ## r <- k / 2
+ ## n1 <- seq(from = r, by = 1, to = n - r)
+ ## nn1 <- length(n1)
+ ## halfn1 <- nn1 %/% 2
+ ## if (nn1 %% 2 == 1) {
+ ## probsum <- chooseZ(n1[halfn1 + 1] - 1, r - 1)
+ ## probsum <- probsum * probsum
+ ## } else {
+ ## probsum <- 0
+ ## }
+ ## lown1 <- n1[seq_len(halfn1)]
+ ## if (length(lown1) > 0) {
+ ## lown2 <- n - lown1
+ ## probsum <- probsum + 2 * sum(chooseZ(lown1 - 1, r - 1) *
+ ## chooseZ(lown2 - 1, r - 1))
+ ## }
+ ## probsum / invhalfpowM1
+ ## } else if (k == 1) {
+ ## ## one run
+ ## as.bigq(2)^(1 - n)
+ ## } else {
+ ## ## odd number of runs
+ ## r <- (k - 1) / 2
+ ## n1 <- seq(from = r + 1, by = 1, to = n - r)
+ ## n2 <- n - n1
+ ## probsum <- sum(chooseZ(n1 - 1, r) * chooseZ(n2 - 1, r - 1))
+ ## probsum / invhalfpowM1
+ ## }
+ ## }
+ ## newcrit <- function(n, crit = 0.05, verbose = FALSE) {
+ ## stopifnot(is.numeric(n), length(n) >= 1, is.finite(n),
+ ## round(n) == n, n > 0, is.numeric(crit),
+ ## length(crit) == 1, is.finite(crit), crit < 1)
+ ## verbose2 <- isTRUE(verbose)
+ ## halfcrit <- crit / 2
+ ## nn <- length(n)
+ ## res <- matrix(NA_real_, 2, nn)
+ ## for (j in seq_len(nn)) {
+ ## thisn <- n[j]
+ ## if (verbose2) {
+ ## cat("n = ", thisn, " (", j, " / ", nn, ")\n", sep = "")
+ ## }
+ ## halfn <- thisn %/% 2
+ ## oddn <- thisn %% 2
+ ## complength <- halfn + oddn
+ ## lowaccept <- NA_real_
+ ## if (oddn == 1) {
+ ## csum <- (as.bigq(1) - runprobZ(complength, thisn)) / 2
+ ## if (as.numeric(csum) <= halfcrit) {
+ ## lowaccept <- complength
+ ## } else {
+ ## for (k in seq(from = complength - 1, by = -1,
+ ## length.out = complength - 1)) {
+ ## csum <- csum - runprobZ(k, thisn)
+ ## if (csum <= halfcrit) {
+ ## lowaccept <- k
+ ## break
+ ## }
+ ## }
+ ## }
+ ## } else {
+ ## csum <- as.bigq(1, 2)
+ ## for (k in seq(from = complength, by = -1,
+ ## length.out = complength)) {
+ ## csum <- csum - runprobZ(k, thisn)
+ ## if (csum <= halfcrit) {
+ ## lowaccept <- k
+ ## break
+ ## }
+ ## }
+ ## }
+ ## highaccept <- thisn - lowaccept + 1
+ ## res[, j] <- c(lowaccept, highaccept)
+ ## }
+ ## drop(res)
+ ## }
+
+ ## dplR: Empirical mean and standard deviation (variance) of
+ ## the number of runs distribution, and code for computing
+ ## them.
+ ## library(gmp)
+ ## runtableZ <- function(n) {
+ ## stopifnot(is.numeric(n), length(n) == 1,
+ ## is.finite(n), round(n) == n, n > 0)
+ ## halfn <- n %/% 2
+ ## oddn <- n %% 2
+ ## res <- numeric(n)
+ ## invhalfpowM1 <- as.bigz(2)^(n - 1)
+ ## ## Symmetric distribution. Compute first half only.
+ ## complength <- halfn + oddn
+ ## evenlength <- complength %/% 2
+ ## oddlength <- evenlength + complength %% 2 - 1
+ ## ## one run
+ ## res[1] <- 0.5^(n - 1)
+ ## ## odd number of runs (>= 3)
+ ## oddseq <- seq(from = 3, by = 2, length.out = oddlength)
+ ## for (k in oddseq) {
+ ## r <- (k - 1) / 2
+ ## n1 <- seq(from = r + 1, by = 1, to = n - r)
+ ## n2 <- n - n1
+ ## probsum <- sum(chooseZ(n1 - 1, r) * chooseZ(n2 - 1, r - 1))
+ ## res[k] <- as.numeric(probsum / invhalfpowM1)
+ ## }
+ ## ## even number of runs
+ ## evenseq <- seq(from = 2, by = 2, length.out = evenlength)
+ ## for (k in evenseq) {
+ ## r <- k / 2
+ ## n1 <- seq(from = r, by = 1, to = n - r)
+ ## nn1 <- length(n1)
+ ## halfn1 <- nn1 %/% 2
+ ## if (nn1 %% 2 == 1) {
+ ## probsum <- chooseZ(n1[halfn1 + 1] - 1, r - 1)
+ ## probsum <- probsum * probsum
+ ## } else {
+ ## probsum <- 0
+ ## }
+ ## leftn1 <- n1[seq_len(halfn1)]
+ ## if (length(leftn1) > 0) {
+ ## leftn2 <- n - leftn1
+ ## probsum <- probsum + 2 * sum(chooseZ(leftn1 - 1, r - 1) *
+ ## chooseZ(leftn2 - 1, r - 1))
+ ## }
+ ## res[k] <- as.numeric(probsum / invhalfpowM1)
+ ## }
+ ## ## Last half is mirror image of first half
+ ## res[seq(from = n, by = -1, length.out = halfn)] <-
+ ## res[seq_len(halfn)]
+ ## res / sum(res)
+ ## }
+ ## meanvar <- function(n, crit = 0.05) {
+ ## nn <- length(n)
+ ## res <- matrix(NA_real_, 2, nn)
+ ## for (k in seq_len(nn)) {
+ ## thisn <- n[k]
+ ## rtable <- runtableZ(thisn)
+ ## nseq <- 1:thisn
+ ## res[1, k] <- sum(rtable * nseq)
+ ## res[2, k] <- sum(rtable * (nseq - res[1, k])^2)
+ ## }
+ ## drop(res)
+ ## }
+ nMean <- 0.5 * nfreq + 0.5
+ nSd <- sqrt(0.25 * nfreq - 0.25)
+
+ diffIdx <-
+ c(18, 45, 68, 95, 139, 191, 268, 302, 397, 439,
+ 552, 652, 847, 1002, 1101, 1709, 1838, 2063, 2110, 2157,
+ 2763, 2926, 3325, 3504, 3626, 3750, 3876, 4004, 4134,
+ 4333, 4816, 4887, 5031, 5550, 6095, 6500, 6749, 7613,
+ 8624, 8719, 9202, 10414, 10941, 11048, 11591, 13415,
+ 13772, 14500, 14623, 14871, 15627, 15883)
+ diffMatch <- match(nfreq, diffIdx)
+ rcritlo <- floor(qnorm(0.025, mean = nMean, sd = nSd) + 0.5)
+ rcrithi <- floor(qnorm(0.975, mean = nMean, sd = nSd) + 0.5)
+ if (!is.na(diffMatch)) {
+ rcritlo <- rcritlo + 1
+ rcrithi <- rcrithi - 1
+ }
} else {
rcnt <- NULL
rcritlo <- NULL
Modified: branches/redfit/man/redfit.Rd
===================================================================
--- branches/redfit/man/redfit.Rd 2013-09-17 16:30:12 UTC (rev 692)
+++ branches/redfit/man/redfit.Rd 2013-09-24 10:38:31 UTC (rev 693)
@@ -87,9 +87,12 @@
between \code{redfit} and REDFIT with respect to the number of
points per segment and the overlap of consecutive segments.
- \item The critical values of the runs test may differ between
- \code{redfit} and REDFIT due to a different interpretation of the
- related equations.
+ \item The critical values of the runs test differ between
+ \code{redfit} and REDFIT. The equations in REDFIT are flawed,
+ particularly when the number of frequencies is large. For example,
+ the lower limit of the acceptance region exceeds the upper limit
+ when the number of frequencies exceeds a threshold, which is clearly
+ wrong.
}
}
@@ -111,20 +114,25 @@
\item{rcnt }{ a \code{numeric} value giving the number of runs in a
statistical test studying the difference between a theoretical AR1
spectrum and the bias-corrected spectrum estimated from the data.
- Requires that \code{\var{iwin} == 0} (\code{"rectangular"}),
+ Null hypothesis: the two spectra \dQuote{agree}, i.e. the
+ probability of either being larger than the other is 0.5 at every
+ point. Requires that \code{\var{iwin} == 0} (\code{"rectangular"}),
\code{\var{ofac} == 1} and \code{\var{n50} == 1}. Otherwise the
- test is not performed and \var{rcnt} is \code{NULL}. See Bendat and
- Piersol, p. 95. }
+ test is not performed and \var{rcnt} is \code{NULL}. See Bradley,
+ p. 253\enc{–}{--}254, 259\enc{–}{--}263 (probability \var{p} equals
+ 0.5). }
\item{rcritlo }{ a \code{numeric} critical low value for
- \code{\var{rcnt}}. Approximately 2.5 percent of the null
- distribution lies below this value (TO BE CHECKED). Is \code{NULL}
- when \code{\var{rcnt}} is \code{NULL}. }
+ \code{\var{rcnt}}, i.e. the lowest value for accepting the null
+ hyphothesis. Approximately 2.5 percent of the null distribution
+ lies below this value. Is \code{NULL} when \code{\var{rcnt}} is
+ \code{NULL}. }
- \item{rcritlo }{ a \code{numeric} critical high value for
- \code{\var{rcnt}}. Approximately 2.5 percent of the null
- distribution lies above this value (TO BE CHECKED). Is \code{NULL}
- when \code{\var{rcnt}} is \code{NULL}.}
+ \item{rcrithi }{ a \code{numeric} critical high value for
+ \code{\var{rcnt}}, i.e. the highest value for accepting the null
+ hyphothesis. Approximately 2.5 percent of the null distribution
+ lies above this value. Is \code{NULL} when \code{\var{rcnt}} is
+ \code{NULL}.}
\item{freq }{ the frequencies used. A \code{numeric} vector. The
other numeric vectors have the same length, i.e. one value for each
@@ -210,8 +218,8 @@
\href{http://www.ncdc.noaa.gov/paleo/softlib/redfit/redfit.html}{REDFIT},
which is in the public domain.
- Bendat, J. S. and Piersol, A. G. (1986) \emph{Random Data: Analysis
- and Measurement Procedures.} Wiley. \acronym{ISBN}: 0-471-04000-2.
+ Bradley, J. V. (1968) \emph{Distribution-Free Statistical
+ Tests}. Prentice-Hall.
Schulz, M. and Mudelsee, M. (2002) REDFIT: estimating red-noise
spectra directly from unevenly spaced paleoclimatic time series.
More information about the Dplr-commits
mailing list