[datatable-help] How to speed up grouping time series, help please

Matthew Dowle mdowle at mdowle.plus.com
Wed Apr 6 00:48:18 CEST 2011


pls explain the big picture. I don't recognise why you're attempting to
do.call merge.
> Thanks for Your reply Matthew,
> On 10 ts, 10000 values each, it takes 5.7 seconds to reshape, I'm willing
> to reduce time at least by the half reducing my total batch time by 10
> minutes approximately (over 70 minutes total).
>
> I'm trying to do something like:
> do.call(mergfe, x[, .SD, by=ID]) but data.table is not designed to work
> this way (return a data.table), there is no problem in data.table itself.
> I'm trying to extract K (10) data.table from  a data.table with keys ID,
> DATE and then CJ.
>
>
> Thanks in advance for any help.
> Best regards,
> Daniele
>
> -----Original Message-----
> From: datatable-help-bounces at r-forge.wu-wien.ac.at
> [mailto:datatable-help-bounces at r-forge.wu-wien.ac.at] On Behalf Of Matthew
> Dowle
> Sent: 05 April 2011 14:20
> To: datatable-help at r-forge.wu-wien.ac.at
> Subject: Re: [datatable-help] How to speed up grouping time series,help
> please
>
> It's easier to help if you provide timings along with your example
> reproducible code, please.
> How long is it taking, and how long do you think it should take?
> Please also try to avoid phrases such as "without success". Does that mean
> you got an error message (if so, what was it) or wrong result (if so, what
> was wrong)?
> Matthew
>
> "Daniele Amberti" <daniele.amberti at ors.it> wrote in message
> news:5C57984CA179A247803E12AAB0F7ABA6DB20979608 at adorsmail01.ors.local...
>>I retrieve for a few hundred times a group of time series (10-15 ts
>>with 10000 values each), on every group I do some calculation, graphs
>>etc. I wonder if there is a faster method than what presented below to
>>get an appropriate timeseries object.
>>
>> Making a query with RODBC for every group I get a data frame like this:
>>
>>> X
>>  ID                DATE     VALUE
>> 14  3 2000-01-01 00:00:03 0.5726334
>> 4   1 2000-01-01 00:00:03 0.8830174
>> 1   1 2000-01-01 00:00:00 0.2875775
>> 15  3 2000-01-01 00:00:04 0.1029247
>> 11  3 2000-01-01 00:00:00 0.9568333
>> 9   2 2000-01-01 00:00:03 0.5514350
>> 7   2 2000-01-01 00:00:01 0.5281055
>> 6   2 2000-01-01 00:00:00 0.0455565
>> 12  3 2000-01-01 00:00:01 0.4533342
>> 8   2 2000-01-01 00:00:02 0.8924190
>> 3   1 2000-01-01 00:00:02 0.4089769
>> 13  3 2000-01-01 00:00:02 0.6775706
>>
>> And I want to get a timeSeries object or xts object like this:
>>
>>                           1         2         3
>> 2000-01-01 00:00:00 0.2875775 0.0455565 0.9568333
>> 2000-01-01 00:00:01        NA 0.5281055 0.4533342
>> 2000-01-01 00:00:02 0.4089769 0.8924190 0.6775706
>> 2000-01-01 00:00:03 0.8830174 0.5514350 0.5726334
>> 2000-01-01 00:00:04        NA        NA 0.1029247
>>
>> Both classes accept a matrix so if I can create a matrix like the one
>> represented above and an array of characters representing dates faster
>> than what possible with xts:::merge, for example, I will have a faster
>> implementation, this is the reason why I'm writing to datatable-help;
>> I red vignettes, tests and did tests trying to generate a set of
>> data.table (using .SD and by = ID) an then CJ but without success up
>> to now, any input to test this approach will be really appreciate.
>>
>> Input data can be sorted or unsorted (the most complicated case is in
>> the example, unsorted and missing data) in the sense that I can  sort
>> in query if I can take an advantage from this.
>>
>> Below some code to generate the test case above.
>>
>> Thanks in advance for any input, best regards, Daniele
>>
>>
>> set.seed(123)
>> N <- 100 # number of observations, use 5 to replicate test case above
>> K <- 3   # number of timeseries ID
>>
>> X <- data.frame(
>> ID = rep(1:K, each = N),
>> DATE = as.character(rep(as.POSIXct("2000-01-01", tz = "GMT")+ 0:(N-1),
>> K)),
>> VALUE = runif(N*K), stringsAsFactors = FALSE)
>>
>> X <- X[sample(1:(N*K), N*K),] # sample observations to get random order
>> (optional)
>> X <- X[-(sample(1:nrow(X), floor(nrow(X)*0.2))),] # 20% missing
>>
>> head(X, 15)
>>
>>
>> # an implementation in xts:
>> xtsSplit <- function(x)
>> {
>> library(xts)
>> x <- xts(x[,c("ID","VALUE")], as.POSIXct(x[,"DATE"]))
>> x <- do.call(merge, split(x$VALUE,x$ID))
>> return(x)
>> }
>>
>> xtsSplitTime <- replicate(50,
>> system.time(xtsSplit(X))[[1]])
>> median(xtsTime)
>>
>>
>> ORS Srl
>>
>> Via Agostino Morando 1/3 12060 Roddi (Cn) - Italy
>> Tel. +39 0173 620211
>> Fax. +39 0173 620299 / +39 0173 433111
>> Web Site www.ors.it
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> Qualsiasi utilizzo non autorizzato del presente messaggio e dei suoi
>> allegati è vietato e potrebbe costituire reato.
>> Se lei avesse ricevuto erroneamente questo messaggio, Le saremmo grati
>> se
>> provvedesse alla distruzione dello stesso
>> e degli eventuali allegati.
>> Opinioni, conclusioni o altre informazioni riportate nella e-mail, che
>> non
>> siano relative alle attività e/o
>> alla missione aziendale di O.R.S. Srl si intendono non  attribuibili
>> alla
>> società stessa, né la impegnano in alcun modo.
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>
>
>
>
> ORS Srl
>
> Via Agostino Morando 1/3 12060 Roddi (Cn) - Italy
> Tel. +39 0173 620211
> Fax. +39 0173 620299 / +39 0173 433111
> Web Site www.ors.it
>
> ------------------------------------------------------------------------------------------------------------------------
> Qualsiasi utilizzo non autorizzato del presente messaggio e dei suoi
> allegati è vietato e potrebbe costituire reato.
> Se lei avesse ricevuto erroneamente questo messaggio, Le saremmo grati se
> provvedesse alla distruzione dello stesso
> e degli eventuali allegati.
> Opinioni, conclusioni o altre informazioni riportate nella e-mail, che non
> siano relative alle attività e/o
> alla missione aziendale di O.R.S. Srl si intendono non  attribuibili alla
> società stessa, né la impegnano in alcun modo.
>




More information about the datatable-help mailing list