[Dplr-commits] r697 - in branches/redfit: . R man

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Tue Oct 1 14:59:27 CEST 2013


Author: mvkorpel
Date: 2013-10-01 14:59:27 +0200 (Tue, 01 Oct 2013)
New Revision: 697

Modified:
   branches/redfit/ChangeLog
   branches/redfit/DESCRIPTION
   branches/redfit/NAMESPACE
   branches/redfit/R/redfit.R
   branches/redfit/man/redfit.Rd
Log:
- Import package "gmp"
- Exports new function runcrit(): acceptance region of number of runs test
- redfit() has changes inspired by differences between REDFIT 3.5 and 3.8e:
  * Ability to average data (x) at duplicate input times (t)
    (new elements "t" and "x" in return value, NULL if no duplicate times)
  * Ability to compute number-of-runs acceptance regions with user-specified
    significance levels (new argument p)
  * New architecture for obtaining the acceptance regions, implemented
    in runcrit() and a few non-exported functions:
    1. check if the result has been precomputed (non-exported list runPrecomp)
    2. if not, try to compute exact result. Limits for maximum allowed
       computing time and largest n for which to try this are set in
       new arguments 'maxTime' and 'nLimit'.
    3. fall back to normal approximation if needed
    4. new element "rcritexact" in the return value tells if the
       result is exact or an approximation
  * Make sure that tau resulting from an estimated rho is non-negative
  * In redfitTrig(), a robust method for computing arg2


Modified: branches/redfit/ChangeLog
===================================================================
--- branches/redfit/ChangeLog	2013-10-01 12:07:30 UTC (rev 696)
+++ branches/redfit/ChangeLog	2013-10-01 12:59:27 UTC (rev 697)
@@ -1,5 +1,16 @@
 * CHANGES IN dplR VERSION 1.5.7
 
+File: DESCRIPTION
+-----------------
+
+- Import gmp (>= 0.5-2)
+
+File: NAMESPACE
+---------------
+
+- New imports from gmp and utils
+- Export redfit() and runcrit()
+
 Various .R files
 ----------------
 
@@ -41,7 +52,8 @@
 Files: redfit.R, redfit.c
 -------------------------
 
-- New function redfit() based on REDFIT by Schulz and Mudelsee
+- New function redfit() based on REDFIT by Schulz and Mudelsee.  Also
+  another exported function runcrit().
 
 
 * CHANGES IN dplR VERSION 1.5.6

Modified: branches/redfit/DESCRIPTION
===================================================================
--- branches/redfit/DESCRIPTION	2013-10-01 12:07:30 UTC (rev 696)
+++ branches/redfit/DESCRIPTION	2013-10-01 12:59:27 UTC (rev 697)
@@ -3,7 +3,7 @@
 Type: Package
 Title: Dendrochronology Program Library in R
 Version: 1.5.7
-Date: 2013-09-24
+Date: 2013-10-01
 Authors at R: c(person(c("Andrew", "G."), "Bunn", role = c("aut", "cph",
         "cre", "trl"), email = "andrew.bunn at wwu.edu"), person("Mikko",
         "Korpela", role = c("aut", "trl")), person("Franco", "Biondi",
@@ -17,8 +17,9 @@
 Copyright: Authors and Aalto University (for work of M. Korpela)
 Maintainer: Andy Bunn <andy.bunn at wwu.edu>
 Depends: R (>= 2.15.0)
-Imports: graphics, grDevices, grid, stats, utils, digest (>= 0.2.3),
-        lattice (>= 0.13-6), stringr (>= 0.4), XML (>= 2.1-0)
+Imports: gmp (>= 0.5-2), graphics, grDevices, grid, stats, utils,
+        digest (>= 0.2.3), lattice (>= 0.13-6), stringr (>= 0.4), XML
+        (>= 2.1-0)
 Suggests: foreach, iterators, RUnit (>= 0.4.25)
 Description: This package contains functions for performing tree-ring
         analyses, IO, and graphics.

Modified: branches/redfit/NAMESPACE
===================================================================
--- branches/redfit/NAMESPACE	2013-10-01 12:07:30 UTC (rev 696)
+++ branches/redfit/NAMESPACE	2013-10-01 12:59:27 UTC (rev 697)
@@ -7,6 +7,8 @@
 
 importFrom(digest, digest)
 
+importFrom(gmp, as.bigq, as.bigz, chooseZ, is.bigq)
+
 importFrom(grDevices, rainbow)
 
 importFrom(grid, gpar, grid.lines, grid.newpage, grid.polygon,
@@ -19,7 +21,7 @@
 importFrom(stringr, str_pad, str_trim)
 
 importFrom(utils, head, installed.packages, read.fwf, tail,
-           packageVersion)
+           packageVersion, write.table)
 
 importFrom(XML, xmlEventParse)
 
@@ -29,7 +31,7 @@
        gini.coef, glk, hanning, i.detrend, i.detrend.series, morlet,
        po.to.wc, pointer, powt, print.redfit, rcs, read.compact,
        read.crn, read.fh, read.ids, read.rwl, read.tridas,
-       read.tucson, redfit, rwi.stats, rwi.stats.legacy,
+       read.tucson, redfit, runcrit, rwi.stats, rwi.stats.legacy,
        rwi.stats.running, rwl.stats, sea, seg.plot, sens1, sens2,
        series.rwl.plot, skel.plot, spag.plot, strip.rwl, tbrm,
        tridas.vocabulary, uuid.gen, wavelet.plot, wc.to.po,

Modified: branches/redfit/R/redfit.R
===================================================================
--- branches/redfit/R/redfit.R	2013-10-01 12:07:30 UTC (rev 696)
+++ branches/redfit/R/redfit.R	2013-10-01 12:59:27 UTC (rev 697)
@@ -1,8 +1,8 @@
 ### This part of dplR was (arguably non-trivially) translated and
-### adapted from public domain Fortran program REDFIT (Michael Schulz
-### and Manfred Mudelsee). The possibly non-free parts of REDFIT
-### derived from Numerical Recipes were not used.
-### http://www.ncdc.noaa.gov/paleo/softlib/redfit/redfit.html
+### adapted from public domain Fortran program REDFIT, version 3.8e
+### (Michael Schulz and Manfred Mudelsee). The possibly non-free parts
+### of REDFIT derived from Numerical Recipes were not used.
+### http://www.geo.uni-bremen.de/geomod/staff/mschulz/
 ### Author of the dplR version is Mikko Korpela.
 ###
 ### Copyright (C) 2013 Aalto University
@@ -31,11 +31,12 @@
 ##
 ## Main Assumptions:
 ## -----------------
-##     - The noise background can be apprximated by an AR(1) process.
+##     - The noise background can be approximated by an AR(1) process.
 ##     - The distribution of data points along the time axis is not
 ##       too clustered.
 ##     - The potential effect of harmonic signal components on the
 ##       estimation procedure is neglected.
+##     - The time series has to be weakly stationary.
 ##
 ## The first-order autoregressive model, AR(1) model, which is used
 ## to describe the noise background in a time series x(t_i), reads
@@ -98,9 +99,35 @@
 ##
 ## 11. Scale theorectical AR(1) spectrum for various significance levels.
 
+## Notes:
+## ------
+## * A linear trend is subtracted from each WOSA segment.
+##
+## * tau is estimated separately for each WOSA segment and subsequently
+##   averaged.
+##
+## * Default max. frequency = avg. Nyquist freq. (hifac = 1.0).
+##
+## dplR note: Authors of REDFIT
+## Authors: Michael Schulz, MARUM and Faculty of Geosciences, Univ. Bremen
+## -------- Klagenfurter Str., D-28334 Bremen
+##          mschulz at marum.de
+##          www.geo.uni-bremen.de/~mschulz
+##
+##          Manfred Mudelsee, Inst. of Meteorology, Univ. Leipzig
+##          Stephanstr. 3, D-04103 Leipzig
+##          Mudelsee at rz.uni-leipzig.de
+##
+## Reference: Schulz, M. and Mudelsee, M. (2002) REDFIT: Estimating
+## ---------- red-noise spectra directly from unevenly spaced paleoclimatic
+##            time series. Computers and Geosciences, 28, 421-426.
+
+
 redfit <- function(x, t, tType = c("time", "age"), nsim = 1000, mctest = TRUE,
-                   ofac = 4, hifac = 1, n50 = 3, rhopre = NULL, iwin = 2,
-                   txOrdered = FALSE, verbose = FALSE, seed = NULL) {
+                   ofac = 4, hifac = 1, n50 = 3, rhopre = NULL,
+                   p = c(0.10, 0.05, 0.02), iwin = 2,
+                   txOrdered = FALSE, verbose = FALSE, seed = NULL,
+                   maxTime = 10, nLimit = 10000) {
     cl <- match.call()
     if (!is.null(seed)) {
         set.seed(seed)
@@ -115,9 +142,10 @@
     NSIM_LIMIT <- 21
     ## dplR: Check
     tType2 <- match.arg(tType)
+    tTime <- tType2 == "time"
     stopifnot(is.numeric(x))
     if (!is.null(rhopre)) {
-        stopifnot(is.numeric(rhopre), length(rhopre) == 1, is.finite(rhopre))
+        stopifnot(is.numeric(rhopre), length(rhopre) == 1, rhopre <= 1)
     }
     stopifnot(is.numeric(ofac), length(ofac) == 1, is.finite(ofac))
     if (ofac < 1) {
@@ -127,10 +155,16 @@
     if (hifac <= 0) {
         stop("'hifac' must be positive")
     }
-    stopifnot(is.numeric(n50), length(n50) == 1, is.finite(n50),
-              round(n50) == n50, n50 >= 1)
-    stopifnot(is.numeric(nsim), length(nsim) == 1, is.finite(nsim),
-              round(nsim) == nsim, nsim >= 1)
+    stopifnot(is.numeric(n50), length(n50) == 1, is.finite(n50), n50 >= 1,
+              round(n50) == n50)
+    stopifnot(is.numeric(nsim), length(nsim) == 1, is.finite(nsim), nsim >= 1,
+              round(nsim) == nsim)
+    if (length(p) > 0) {
+        stopifnot(is.numeric(p) || is.bigq(p), p > 0, p < 1)
+    }
+    stopifnot(is.numeric(maxTime), length(maxTime) == 1, maxTime >= 0)
+    stopifnot(is.numeric(nLimit), length(nLimit) == 1, nLimit >= 0,
+              round(nLimit) == nLimit)
     stopifnot(identical(txOrdered, TRUE) || identical(txOrdered, FALSE))
     stopifnot(identical(verbose, TRUE) || identical(verbose, FALSE))
     stopifnot(identical(mctest, TRUE) || identical(mctest, FALSE))
@@ -192,28 +226,79 @@
         stop(gettextf("too few points (%.0f), at least %.0f needed",
                       np, MIN_POINTS, domain = "R-dplR"), domain = NA)
     }
+    duplT <- FALSE
     if (tGiven && !txOrdered) {
         idx <- order(t2)
         t2 <- t2[idx]
         x2 <- x2[idx]
+        dupl <- duplicated(t2)
+        if (any(dupl)) {
+            duplT <- TRUE
+            if (tTime) {
+                warning("Duplicate times in 't', averaging data")
+            } else {
+                warning("Duplicate ages in 't', averaging data")
+            }
+            if (verbose) {
+                if (tTime) {
+                    cat(gettext("Number of duplicates by time,\n",
+                                domain = "R-dplR"), file = stderr())
+                } else {
+                    cat(gettext("Number of duplicates by age,\n",
+                                domain = "R-dplR"), file = stderr())
+                }
+                cat(gettext("'k' duplicates means 'k + 1' total obsevations:\n",
+                            domain = "R-dplR"), file = stderr())
+                dtable <- table(t2[dupl])
+                if (tTime) {
+                    dtable <- data.frame(time = as.numeric(names(dtable)),
+                                         duplicates = as.vector(dtable))
+                } else {
+                    dtable <- data.frame(age = as.numeric(names(dtable)),
+                                         duplicates = as.vector(dtable))
+                }
+                write.table(dtable, row.names = FALSE, file = stderr())
+            }
+            notdupl <- !dupl
+            nunique <- sum(notdupl)
+            xnew <- numeric(nunique)
+            currentid <- 1
+            currentstart <- 1
+            for (k in 2:np) {
+                if (notdupl[k]) {
+                    xnew[currentid] <- mean(x2[currentstart:(k - 1)])
+                    currentid <- currentid + 1
+                    currentstart <- k
+                }
+            }
+            if (currentid == nunique) {
+                xnew[nunique] <- mean(x2[currentstart:np])
+            }
+            t2 <- t2[notdupl]
+            x2 <- xnew
+            np <- nunique
+            if (np < MIN_POINTS) {
+                stop(gettextf("too few points (%.0f), at least %.0f needed",
+                              np, MIN_POINTS, domain = "R-dplR"), domain = NA)
+            }
+        }
     }
     ## dplR: The rest of the function assumes that t2 is age, not time
-    if (tType2 == "time") {
+    t2NoRev <- t2
+    x2NoRev <- x2
+    if (tTime) {
         t2 <- -rev(t2)
         x2 <- rev(x2)
     }
     if (tGiven) {
         difft <- diff(t2)
-        if (!txOrdered && any(difft == 0)) {
-            stop("duplicated values in 't'")
-        }
     } else {
         difft <- rep.int(1.0, np)
     }
     ## dplR: Setup
     params <- redfitSetdim(MIN_POINTS, t2, ofac, hifac, n50, verbose,
                            iwin = iwin2, nsim = nsim, mctest = mctest,
-                           rhopre = rhopre)
+                           rhopre = rhopre, p = p)
     avgdt <- params[["avgdt"]]
     nseg <- params[["nseg"]]
     fnyq <- params[["fnyq"]]
@@ -238,6 +323,13 @@
     ## dplR: estimate lag-1 autocorrelation coefficient unless prescribed
     if (is.null(rhopre) || rhopre < 0) {
         rho <- redfitGetrho(t2, x2, dn50, nseg, segskip, lmfitfun)
+        ## make sure that tau is non-negative
+        if (rho > 1) {
+            warning(gettext("redfitGetrho returned rho = %f, forced to zero",
+                            rho, domain = "R-dplR"),
+                    domain = NA)
+            rho <- 0
+        }
     } else {
         rho <- rhopre
     }
@@ -325,43 +417,49 @@
         rcnt <- 1 + sum(diff(spectrcomp) != 0)
 
         ## dplR: Old formulas for rcritlo, rcrithi (REDFIT).
-        ## ## Test equality of theoretical AR1 and estimated spectrum
+        ##
+        ## ## test equality of theoretical AR1 and estimated spectrum
         ## ## using a runs test (Bendat and Piersol, 1986, p. 95). The
         ## ## empirical equations for calculating critical values for
-        ## ## 5-% significance were derived from the tabulated
-        ## ## critical values in B&P.  dplR: NOTE! Integer division is
-        ## ## used in REDFIT.  We can assume that real(nout/2) was
-        ## ## supposed to be real(nout)/2.
-        ## ## sqrtHalfNfreq <- sqrt(nfreq %/% 2)
-        ## sqrtHalfNfreq <- sqrt(nfreq / 2)
-        ## rcritlo <- round((-0.79557086 + 1.0088719 * sqrtHalfNfreq)^2)
-        ## rcrithi <- round(( 0.75751462 + 0.9955133 * sqrtHalfNfreq)^2)
+        ## ## different significanes were derived from the tabulated
+        ## ## critical values in B&P.
+        ## sqrtHalfNfreq <- sqrt(nfreq %/% 2)
+        ## ## REDFIT >= 3.8a (at least until 3.8e)
+        ## 10-% level of significance
+        ## rcritlo10 <- round((-0.62899892 + 1.0030933 * sqrtHalfNfreq)^2)
+        ## rcrithi10 <- round(( 0.66522732 + 0.9944506 * sqrtHalfNfreq)^2)
+        ## 5-% level of significance
+        ## rcritlo5 <- round((-0.78161838 + 1.0069634 * sqrtHalfNfreq)^2)
+        ## rcrithi5 <- round(( 0.75701059 + 0.9956021 * sqrtHalfNfreq)^2)
+        ## 2-% level of significance
+        ## rcritlo2 <- round((-0.92210867 + 1.0064993 * sqrtHalfNfreq)^2)
+        ## rcrithi2 <- round(( 0.82670832 + 1.0014299 * sqrtHalfNfreq)^2)
 
         ## dplR: Updated formulas for rcritlo, rcrithi.
         ##
-        ## The REDFIT formulas seem to be very, very inexact, with
-        ## either definition of sqrtHalfNfreq.  For example, the width
-        ## of the acceptance region increases, then decreases (*), and
-        ## goes to <= 0 at nfreq >= 27144 (original sqrtHalfNfreq) or
-        ## at nfreq >= 27144 || nfreq %in% seq(from = 27051, by = 2,
-        ## to = 27143) (updated sqrtHalfNfreq).  Another example:
-        ## rcritlo is 0 (impossible number of runs) at some small
-        ## values of nfreq.
+        ## The REDFIT formulas seem to be quite inexact.  For example,
+        ## the width of the acceptance region increases, then
+        ## decreases (*), and goes to <= 0 at nfreq >= 27144 (5 %
+        ## significance).  Another example: rcritlo is 0 (impossible
+        ## number of runs) at some small values of nfreq.
         ##
         ## (*) The increase and decrease are general trends, but there
         ## are local fluctuations against the trend.
         ##
         ## About the new formulas:
         ##
-        ## Exact values were computed up to nfreq == 16000.  For nfreq
-        ## > 16000, a normal approximation is used.  For nfreq <=
-        ## 16000, the acceptance region obtained using the normal
-        ## approximation differs from the exact acceptance region at
-        ## 52 values of nfreq (diffIdx).  The problem at hand
-        ## (comparison of two spectra) is analogous to studying the
-        ## number of runs of heads and tails with nfreq tosses of a
-        ## fair coin (p == 0.5).
+        ## Exact values were computed for nfreq <= NMAX (possibly
+        ## depending on significance levels, see redfitTablecrit() for
+        ## up-to-date values) at a selected few significance levels.
+        ## For non-tabulated significance levels, the exact solution
+        ## is computed if time permits and nfreq is not too large
+        ## (maxTime, nLimit), or finally a normal approximation is
+        ## used.
         ##
+        ## The problem at hand (comparison of two spectra) is
+        ## analogous to studying the number of runs of heads and tails
+        ## with nfreq tosses of a fair coin (p == 0.5).
+        ##
         ## The sequence of acceptance region widths is non-decreasing
         ## for both odd and even nfreq, individually.  However,
         ## because of the symmetric distribution, if rcritlo increases
@@ -370,180 +468,15 @@
         ##
         ## Reference: Bradley, J. V. (1968) Distribution-Free
         ## Statistical Tests. Prentice-Hall. p. 253--254, 259--263.
-        ##
-        ## Code for computing the exact rcritlo, rcrithi can be found
-        ## below.  It was used for obtaining 'diffIdx' and the
-        ## corrections +1 and -1.  Requires the "gmp" package
-        ## (arbitrary precision numbers).  Runs very slowly for large
-        ## values of n.
-        ##
-        ## library(gmp)
-        ## runprobZ <- function(k, n) {
-        ##     invhalfpowM1 <- as.bigz(2)^(n - 1)
-        ##     if (k %% 2 == 0) {
-        ##         ## even number of runs
-        ##         r <- k / 2
-        ##         n1 <- seq(from = r, by = 1, to = n - r)
-        ##         nn1 <- length(n1)
-        ##         halfn1 <- nn1 %/% 2
-        ##         if (nn1 %% 2 == 1) {
-        ##             probsum <- chooseZ(n1[halfn1 + 1] - 1, r - 1)
-        ##             probsum <- probsum * probsum
-        ##         } else {
-        ##             probsum <- 0
-        ##         }
-        ##         lown1 <- n1[seq_len(halfn1)]
-        ##         if (length(lown1) > 0) {
-        ##             lown2 <- n - lown1
-        ##             probsum <- probsum + 2 * sum(chooseZ(lown1 - 1, r - 1) *
-        ##                                          chooseZ(lown2 - 1, r - 1))
-        ##         }
-        ##         probsum / invhalfpowM1
-        ##     } else if (k == 1) {
-        ##         ## one run
-        ##         as.bigq(2)^(1 - n)
-        ##     } else {
-        ##         ## odd number of runs
-        ##         r <- (k - 1) / 2
-        ##         n1 <- seq(from = r + 1, by = 1, to = n - r)
-        ##         n2 <- n - n1
-        ##         probsum <- sum(chooseZ(n1 - 1, r) * chooseZ(n2 - 1, r - 1))
-        ##         probsum / invhalfpowM1
-        ##     }
-        ## }
-        ## newcrit <- function(n, crit = 0.05, verbose = FALSE) {
-        ##     stopifnot(is.numeric(n), length(n) >= 1, is.finite(n),
-        ##               round(n) == n, n > 0, is.numeric(crit),
-        ##               length(crit) == 1, is.finite(crit), crit < 1)
-        ##     verbose2 <- isTRUE(verbose)
-        ##     halfcrit <- crit / 2
-        ##     nn <- length(n)
-        ##     res <- matrix(NA_real_, 2, nn)
-        ##     for (j in seq_len(nn)) {
-        ##         thisn <- n[j]
-        ##         if (verbose2) {
-        ##             cat("n = ", thisn, " (", j, " / ", nn, ")\n", sep = "")
-        ##         }
-        ##         halfn <- thisn %/% 2
-        ##         oddn <- thisn %% 2
-        ##         complength <- halfn + oddn
-        ##         lowaccept <- NA_real_
-        ##         if (oddn == 1) {
-        ##             csum <- (as.bigq(1) - runprobZ(complength, thisn)) / 2
-        ##             if (as.numeric(csum) <= halfcrit) {
-        ##                 lowaccept <- complength
-        ##             } else {
-        ##                 for (k in seq(from = complength - 1, by = -1,
-        ##                               length.out = complength - 1)) {
-        ##                     csum <- csum - runprobZ(k, thisn)
-        ##                     if (csum <= halfcrit) {
-        ##                         lowaccept <- k
-        ##                         break
-        ##                     }
-        ##                 }
-        ##             }
-        ##         } else {
-        ##             csum <- as.bigq(1, 2)
-        ##             for (k in seq(from = complength, by = -1,
-        ##                           length.out = complength)) {
-        ##                 csum <- csum - runprobZ(k, thisn)
-        ##                 if (csum <= halfcrit) {
-        ##                     lowaccept <- k
-        ##                     break
-        ##                 }
-        ##             }
-        ##         }
-        ##         highaccept <- thisn - lowaccept + 1
-        ##         res[, j] <- c(lowaccept, highaccept)
-        ##     }
-        ##     drop(res)
-        ## }
-
-        ## dplR: Empirical mean and standard deviation (variance) of
-        ## the number of runs distribution, and code for computing
-        ## them.
-        ## library(gmp)
-        ## runtableZ <- function(n) {
-        ##     stopifnot(is.numeric(n), length(n) == 1,
-        ##               is.finite(n), round(n) == n, n > 0)
-        ##     halfn <- n %/% 2
-        ##     oddn <- n %% 2
-        ##     res <- numeric(n)
-        ##     invhalfpowM1 <- as.bigz(2)^(n - 1)
-        ##     ## Symmetric distribution. Compute first half only.
-        ##     complength <- halfn + oddn
-        ##     evenlength <- complength %/% 2
-        ##     oddlength <- evenlength + complength %% 2 - 1
-        ##     ## one run
-        ##     res[1] <- 0.5^(n - 1)
-        ##     ## odd number of runs (>= 3)
-        ##     oddseq <- seq(from = 3, by = 2, length.out = oddlength)
-        ##     for (k in oddseq) {
-        ##         r <- (k - 1) / 2
-        ##         n1 <- seq(from = r + 1, by = 1, to = n - r)
-        ##         n2 <- n - n1
-        ##         probsum <- sum(chooseZ(n1 - 1, r) * chooseZ(n2 - 1, r - 1))
-        ##         res[k] <- as.numeric(probsum / invhalfpowM1)
-        ##     }
-        ##     ## even number of runs
-        ##     evenseq <- seq(from = 2, by = 2, length.out = evenlength)
-        ##     for (k in evenseq) {
-        ##         r <- k / 2
-        ##         n1 <- seq(from = r, by = 1, to = n - r)
-        ##         nn1 <- length(n1)
-        ##         halfn1 <- nn1 %/% 2
-        ##         if (nn1 %% 2 == 1) {
-        ##             probsum <- chooseZ(n1[halfn1 + 1] - 1, r - 1)
-        ##             probsum <- probsum * probsum
-        ##         } else {
-        ##             probsum <- 0
-        ##         }
-        ##         leftn1 <- n1[seq_len(halfn1)]
-        ##         if (length(leftn1) > 0) {
-        ##             leftn2 <- n - leftn1
-        ##             probsum <- probsum + 2 * sum(chooseZ(leftn1 - 1, r - 1) *
-        ##                                          chooseZ(leftn2 - 1, r - 1))
-        ##         }
-        ##         res[k] <- as.numeric(probsum / invhalfpowM1)
-        ##     }
-        ##     ## Last half is mirror image of first half
-        ##     res[seq(from = n, by = -1, length.out = halfn)] <-
-        ##         res[seq_len(halfn)]
-        ##     res / sum(res)
-        ## }
-        ## meanvar <- function(n, crit = 0.05) {
-        ##     nn <- length(n)
-        ##     res <- matrix(NA_real_, 2, nn)
-        ##     for (k in seq_len(nn)) {
-        ##         thisn <- n[k]
-        ##         rtable <- runtableZ(thisn)
-        ##         nseq <- 1:thisn
-        ##         res[1, k] <- sum(rtable * nseq)
-        ##         res[2, k] <- sum(rtable * (nseq - res[1, k])^2)
-        ##     }
-        ##     drop(res)
-        ## }
-        nMean <- 0.5 * nfreq + 0.5
-        nSd <- sqrt(0.25 * nfreq - 0.25)
-
-        diffIdx <-
-            c(18, 45, 68, 95, 139, 191, 268, 302, 397, 439,
-              552, 652, 847, 1002, 1101, 1709, 1838, 2063, 2110, 2157,
-              2763, 2926, 3325, 3504, 3626, 3750, 3876, 4004, 4134,
-              4333, 4816, 4887, 5031, 5550, 6095, 6500, 6749, 7613,
-              8624, 8719, 9202, 10414, 10941, 11048, 11591, 13415,
-              13772, 14500, 14623, 14871, 15627, 15883)
-        diffMatch <- match(nfreq, diffIdx)
-        rcritlo <- floor(qnorm(0.025, mean = nMean, sd = nSd) + 0.5)
-        rcrithi <- floor(qnorm(0.975, mean = nMean, sd = nSd) + 0.5)
-        if (!is.na(diffMatch)) {
-            rcritlo <- rcritlo + 1
-            rcrithi <- rcrithi - 1
-        }
+        tmp <- runcrit(nfreq, p, maxTime, nLimit)
+        rcritlo <- tmp[[1]]
+        rcrithi <- tmp[[2]]
+        rcritexact <- tmp[[3]]
     } else {
         rcnt <- NULL
         rcritlo <- NULL
         rcrithi <- NULL
+        rcritexact <- NULL
     }
 
     ## dplR: Elements of the list returned from this function:
@@ -551,8 +484,9 @@
     ##  rho       average autocorrelation coefficient (estimated or prescribed)
     ##  tau       average tau, tau == -avgdt / log(rho)
     ##  rcnt      runs count, test of equality of theoretical and data spectrum
-    ##  rcritlo   critical low value for rcnt
-    ##  rcrithi   critical high value for rcnt
+    ##  rcritlo   critical low value(s) for rcnt, one for each p
+    ##  rcrithi   critical high value(s) for rcnt, one for each p
+    ##  rcritexact   are the critical values (limits of acceptance region) exact?
     ##  freq      frequency vector
     ##  gxx       autospectrum of input data
     ##  gxxc      corrected autospectrum of input data
@@ -567,6 +501,8 @@
     ##  params    parameters dependent on the command line arguments
     ##  vers      version of dplR containing the function
     ##  seed      if not NULL, value used for set.seed(seed)
+    ##  t         t with duplicates removed (times or ages) or NULL
+    ##  x         x averaged over duplicate values of t or NULL
     dplrNS <- tryCatch(getNamespace("dplR"), error = function(...) NULL)
     if (!is.null(dplrNS) && exists("redfit", dplrNS) &&
         identical(match.fun(as.list(cl)[[1]]), get("redfit", dplrNS))) {
@@ -575,11 +511,12 @@
         vers <- NULL
     }
     res <- list(varx = varx, rho = rho, tau = tau, rcnt = rcnt,
-                rcritlo = rcritlo, rcrithi = rcrithi,
+                rcritlo = rcritlo, rcrithi = rcrithi, rcritexact = rcritexact,
                 freq = freq, gxx = gxx, gxxc = gxxc, grravg = grravg,
                 gredth = gredth, corr = corr,
                 ci80 = ci80, ci90 = ci90, ci95 = ci95, ci99 = ci99,
-                call = cl, params = params, vers = vers, seed = seed)
+                call = cl, params = params, vers = vers, seed = seed,
+                t = if (duplT) t2NoRev, x = if (duplT) x2NoRev)
     class(res) <- "redfit"
     res
 }
@@ -733,13 +670,26 @@
         precat(rep.int("-", nchar(gtxt)))
         rcnt <- x[["rcnt"]]
         if (!is.null(rcnt)) {
-            gtxt <- gettext("5-% acceptance region:", domain = "R-dplR")
-            precat(gtxt, newline = FALSE)
-            cat(" rcritlo = ", format(x[["rcritlo"]], digits = digits), "\n",
-                sep = "")
-            precat(rep.int(" ", nchar(gtxt)), newline = FALSE)
-            cat(" rcrithi = ", format(x[["rcrithi"]], digits = digits), "\n",
-                sep = "")
+            runP <- params[["p"]]
+            nP <- length(runP)
+            if (nP > 0) {
+                gtxt <- gettextf("%s-%% acceptance region:",
+                                 format(as.numeric(100 * (1 - runP)),
+                                        digits = digits),
+                                 domain = "R-dplR")
+                nC <- nchar(gtxt[1])
+                rcritlo <- x[["rcritlo"]]
+                rcrithi <- x[["rcrithi"]]
+            }
+            for (k in seq_len(nP)) {
+                precat(gtxt[k], newline = FALSE)
+                cat(" rcritlo = ", format(rcritlo[k], digits = digits), "\n",
+                    sep = "")
+                precat(rep.int(" ", nC), newline = FALSE)
+                cat(" rcrithi = ", format(rcrithi[k], digits = digits), "\n",
+                    sep = "")
+                precat()
+            }
             precat("r_test = ", format(rcnt, digits = digits))
         } else {
             if (iwin != 0) {
@@ -789,6 +739,503 @@
     invisible(x)
 }
 
+## dplR: Utility function.
+redfitRunprobZ <- function(k, n) {
+    invhalfpowM1 <- as.bigz(2)^(n - 1)
+    if (k %% 2 == 0) {
+        ## even number of runs
+        r <- k / 2
+        n1 <- seq(from = r, by = 1, to = n - r)
+        nn1 <- length(n1)
+        halfn1 <- nn1 %/% 2
+        if (nn1 %% 2 == 1) {
+            probsum <- chooseZ(n1[halfn1 + 1] - 1, r - 1)
+            probsum <- probsum * probsum
+        } else {
+            probsum <- 0
+        }
+        lown1 <- n1[seq_len(halfn1)]
+        if (length(lown1) > 0) {
+            lown2 <- n - lown1
+            probsum <- probsum + 2 * sum(chooseZ(lown1 - 1, r - 1) *
+                                         chooseZ(lown2 - 1, r - 1))
+        }
+        probsum / invhalfpowM1
+    } else if (k == 1) {
+        ## one run
+        as.bigq(2)^(1 - n)
+    } else {
+        ## odd number of runs
+        r <- (k - 1) / 2
+        n1 <- seq(from = r + 1, by = 1, to = n - r)
+        n2 <- n - n1
+        probsum <- sum(chooseZ(n1 - 1, r) * chooseZ(n2 - 1, r - 1))
+        probsum / invhalfpowM1
+    }
+}
+
+## dplR: Utility function.
+redfitRuncsum <- function(n, crit, verbose = FALSE,
+                          timelimit = Inf) {
+    if (is.bigq(crit)) {
+        halfcrit <- crit / 2
+    } else {
+        halfcrit <- as.bigq(crit, 2)
+    }
+    verbose2 <- isTRUE(verbose)
+    nn <- length(n)
+    csums <- vector(mode = "list", length = nn)
+    for (j in seq_len(nn)) {
+        thisn <- n[j]
+        if (verbose2) {
+            cat("n = ", thisn, " (", j, " / ", nn, ")\n", sep = "")
+        }
+        halfn <- thisn %/% 2
+        oddn <- thisn %% 2
+        complength <- halfn + oddn
+        tmpcsums <- as.bigq(rep.int(NA_real_, complength))
+        if (oddn == 1) {
+            st <- system.time({
+                csum <- (as.bigq(1) - redfitRunprobZ(complength,
+                                                     thisn)) / 2
+                tmpcsums[[complength]] <- csum
+            })
+        } else {
+            st <- system.time({
+                csum <- as.bigq(1, 2) - redfitRunprobZ(complength, thisn)
+                tmpcsums[[complength]] <- csum
+            })
+        }
+        if (st[3] > timelimit) {
+            stop("timelimit exceeded")
+        }
+        if (csum > halfcrit) {
+            finalk <- 1
+            for (k in seq(from = complength - 1, by = -1,
+                          length.out = max(0, complength - 2))) {
+                csum <- csum - redfitRunprobZ(k, thisn)
+                tmpcsums[[k]] <- csum
+                if (csum <= halfcrit) {
+                    finalk <- k
+                    break
+                }
+            }
+        } else {
+            finalk <- complength
+        }
+        seqstart <- max(2, finalk)
+        seqlength <- complength - seqstart + 1
+        ## store n, drop 0 and NA
+        csums[[j]] <- c(as.bigq(thisn),
+                        tmpcsums[seq(from = seqstart, by = 1,
+                                     length.out = seqlength)])
+    }
+    if (nn == 1) {
+        csums <- csums[[1]]
+    }
+    csums
+}
+
+## dplR: Utility function.
+## crit can be bigq or numeric
+redfitCsumtocrit <- function(csums, crit, limits = FALSE) {
+    if (is.list(csums)) {
+        csums2 <- csums
+    } else {
+        csums2 <- list(csums)
+    }
+    nn <- length(csums2)
+    ## Our own sorting function iSort can handle bigq
+    tmp <- iSort(crit, decreasing = TRUE)
+    halfcrit <- tmp[[1]] / 2
+    ncrit <- length(halfcrit)
+    if (limits) {
+        lowcrit <- matrix(NA_real_, ncrit, nn)
+        highcrit <- matrix(NA_real_, ncrit, nn)
+        tmp2 <- as.character(as.numeric(rev(tmp[[1]])))
+        rownames(lowcrit) <- tmp2
+        rownames(highcrit) <- tmp2
+    }
+    noZeroCrit <- halfcrit[ncrit] != 0
+    res <- matrix(NA_real_, 2 * ncrit, nn)
+    rownames(res) <- as.character(as.numeric(c(rev(halfcrit), 1 - halfcrit)))
+    for (j in seq_len(nn)) {
+        Csums <- csums2[[j]]
+        nthis <- length(Csums)
+        n <- as.numeric(Csums[[1]])
+        complength <- n %/% 2 + n %% 2
+        if (complength == 1) {
+            res[, j] <- rep(c(1, n), each = ncrit)
+            if (limits) {
+                lowcrit[, j] <- rep.int(0, ncrit)
+                highcrit[, j] <- rep.int(0.5, ncrit)
+            }
+        } else {
+            Csums <- c(Csums[seq(from = nthis, by = -1,
+                                 length.out = nthis - 1)],
+                       rep.int(NA_real_, complength - nthis),
+                       Csums[[1]])
+            lowaccept <- rep.int(NA_real_, ncrit)
+            lowlow <- rep.int(NA_real_, ncrit)
+            lowhigh <- rep.int(NA_real_, ncrit)
+            allGood <- nthis == complength
+            l <- 1
+            for (k in seq_len(ncrit)) {
+                thisHalfcrit <- halfcrit[k]
+                l <- l - 1 + which.max(Csums[l:complength] <= thisHalfcrit)
+                if (Csums[[l]] <= thisHalfcrit) {
+                    lowaccept[k] <- complength + 1 - l
+                    lowlow[k] <- as.numeric(Csums[[l]])
+                    if (l > 1) {
+                        lowhigh[k] <- as.numeric(Csums[[l - 1]])
+                    } else {
+                        lowhigh[k] <- 0.5
+                    }
+                } else if (allGood || thisHalfcrit == 0) {
+                    lowaccept[k] <- 1
+                    lowlow[k] <- 0
+                    lowhigh[k] <- as.numeric(Csums[[complength - 1]])
+                } else if (noZeroCrit) {
+                    break
+                }
+            }
+            highaccept <- n + 1 - lowaccept
+            res[, j] <- c(rev(lowaccept), highaccept)
+            if (limits) {
+                lowcrit[, j] <- rev(lowlow)
+                highcrit[, j] <- rev(lowhigh)
+            }
+        }
+    }
+    ## When limits = TRUE, we also return the limits pmin and pmax such
+    ## that res holds when pmin < crit < pmax.
+    if (limits) {
+        list(drop(res),
+             pmin = 2 * apply(lowcrit, 1, max),
+             pmax = 2 * apply(highcrit, 1, min))
+    } else {
+        drop(res)
+    }
+}
+
+## dplR: Normal approximation of the acceptance region of the number
+## of runs test.
+## p must be numeric. If limits is TRUE, length(p) must be 1.
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/dplr -r 697


More information about the Dplr-commits mailing list