[datatable-help] How to speed up grouping time series, help please

Matthew Dowle mdowle at mdowle.plus.com
Thu Apr 7 10:39:17 CEST 2011


Hi Daniele,

Maybe someone else on the list can see what you mean better than I can :-)

Otherwise, you're stuck with me!  ...

What I mean by 'big picture' is what you're really trying to do i.e. the end 
result.

For example: " I have measurements of 10 thousands things observed very
often and sometimes twice in the same second. I would like the mean
measurement every 5 minutes, but when several measurements occur in the
same second I would like to include just the maximum of those ones. Here
is some code you can paste into R to generate some data, and here is my
attempt so far ..."

The problems with your last post include  i) what is X, a data.frame or
or a data.table?   ii) which reshape are you using the stats one or the
reshape package?  iii) there is nothing I can actually paste into an R
prompt and  iv) what is the final result you really want to achieve?

Also, please read the section headed "describe the goal, not the step" here 
:
      http://www.catb.org/~esr/faqs/smart-questions.html
which is the last link in the r-help posting guide.

Thanks,
Matthew


"Daniele Amberti" <daniele.amberti at ors.it> wrote in message 
news:5C57984CA179A247803E12AAB0F7ABA6EB77009594 at adorsmail01.ors.local...
Sorry for delayed answer Matthew and thanks for Your interest in the topic.
I can get the appropriate matrix shape from
> X
  ID                DATE     VALUE
14  3 2000-01-01 00:00:03 0.5726334
4   1 2000-01-01 00:00:03 0.8830174
1   1 2000-01-01 00:00:00 0.2875775
15  3 2000-01-01 00:00:04 0.1029247
11  3 2000-01-01 00:00:00 0.9568333
9   2 2000-01-01 00:00:03 0.5514350
7   2 2000-01-01 00:00:01 0.5281055
6   2 2000-01-01 00:00:00 0.0455565
12  3 2000-01-01 00:00:01 0.4533342
8   2 2000-01-01 00:00:02 0.8924190
3   1 2000-01-01 00:00:02 0.4089769
13  3 2000-01-01 00:00:02 0.6775706

With
reshape(X, idvar = "DATE", timevar = "ID", direction = "wide")
                  DATE   VALUE.3   VALUE.1   VALUE.2
14 2000-01-01 00:00:03 0.5726334 0.8830174 0.5514350
1  2000-01-01 00:00:00 0.9568333 0.2875775 0.0455565
15 2000-01-01 00:00:04 0.1029247        NA        NA
7  2000-01-01 00:00:01 0.4533342        NA 0.5281055
8  2000-01-01 00:00:02 0.6775706 0.4089769 0.8924190
and then sorting by date.

What I was trying to do is to:
 - DT <- data.table(X)
 - setKey(DT, ID, DATE)
... than start the staff I'm not able to manage:
My idea was to have 3 data tables in a list and then merge, the code You 
refer to (do.call(merge ...)) is not working as described and so is not 
really relevant here. Also
Also data.table do not implement a merge method so this approach it's 
probably not optimal.
Another option can be data.table and reshape, a discussion already took 
place:
http://www.mail-archive.com/r-help@r-project.org/msg102833.html
but I didn't find a solution to the problem from that.

Since data.table is so fast in sorting and grouping I'm wondering if there 
is the possibility to reshape data as described in the example faster then 
what actually I'm able to do leveraging xts, split and merge functions.
 X <- xts(X[,c("ID","VALUE")], as.POSIXct(X[,"DATE"]))
 X <- do.call(merge, split(X$VALUE,X$ID))

My understanding is that data.table was created with time series in mind 
(also supported by Your presentation at LondonR), since most advanced time 
series objects in R (xts, timeSeries or zoo) need to be created as described 
in previous post, to have a really fast approach to reshape data and get a 
suitable format for specialized time series objects would be a very nice 
feature.


Thanks in advance,
Daniele

-----Original Message-----
From: Matthew Dowle [mailto:mdowle at mdowle.plus.com]
Sent: 06 April 2011 00:48
To: Daniele Amberti
Cc: datatable-help at r-forge.wu-wien.ac.at; Matthew Dowle
Subject: RE: [datatable-help] How to speed up grouping time series,help 
please

pls explain the big picture. I don't recognise why you're attempting to
do.call merge.
> Thanks for Your reply Matthew,
> On 10 ts, 10000 values each, it takes 5.7 seconds to reshape, I'm willing
> to reduce time at least by the half reducing my total batch time by 10
> minutes approximately (over 70 minutes total).
>
> I'm trying to do something like:
> do.call(mergfe, x[, .SD, by=ID]) but data.table is not designed to work
> this way (return a data.table), there is no problem in data.table itself.
> I'm trying to extract K (10) data.table from  a data.table with keys ID,
> DATE and then CJ.
>
>
> Thanks in advance for any help.
> Best regards,
> Daniele
>
> -----Original Message-----
> From: datatable-help-bounces at r-forge.wu-wien.ac.at
> [mailto:datatable-help-bounces at r-forge.wu-wien.ac.at] On Behalf Of Matthew
> Dowle
> Sent: 05 April 2011 14:20
> To: datatable-help at r-forge.wu-wien.ac.at
> Subject: Re: [datatable-help] How to speed up grouping time series,help
> please
>
> It's easier to help if you provide timings along with your example
> reproducible code, please.
> How long is it taking, and how long do you think it should take?
> Please also try to avoid phrases such as "without success". Does that mean
> you got an error message (if so, what was it) or wrong result (if so, what
> was wrong)?
> Matthew
>
> "Daniele Amberti" <daniele.amberti at ors.it> wrote in message
> news:5C57984CA179A247803E12AAB0F7ABA6DB20979608 at adorsmail01.ors.local...
>>I retrieve for a few hundred times a group of time series (10-15 ts
>>with 10000 values each), on every group I do some calculation, graphs
>>etc. I wonder if there is a faster method than what presented below to
>>get an appropriate timeseries object.
>>
>> Making a query with RODBC for every group I get a data frame like this:
>>
>>> X
>>  ID                DATE     VALUE
>> 14  3 2000-01-01 00:00:03 0.5726334
>> 4   1 2000-01-01 00:00:03 0.8830174
>> 1   1 2000-01-01 00:00:00 0.2875775
>> 15  3 2000-01-01 00:00:04 0.1029247
>> 11  3 2000-01-01 00:00:00 0.9568333
>> 9   2 2000-01-01 00:00:03 0.5514350
>> 7   2 2000-01-01 00:00:01 0.5281055
>> 6   2 2000-01-01 00:00:00 0.0455565
>> 12  3 2000-01-01 00:00:01 0.4533342
>> 8   2 2000-01-01 00:00:02 0.8924190
>> 3   1 2000-01-01 00:00:02 0.4089769
>> 13  3 2000-01-01 00:00:02 0.6775706
>>
>> And I want to get a timeSeries object or xts object like this:
>>
>>                           1         2         3
>> 2000-01-01 00:00:00 0.2875775 0.0455565 0.9568333
>> 2000-01-01 00:00:01        NA 0.5281055 0.4533342
>> 2000-01-01 00:00:02 0.4089769 0.8924190 0.6775706
>> 2000-01-01 00:00:03 0.8830174 0.5514350 0.5726334
>> 2000-01-01 00:00:04        NA        NA 0.1029247
>>
>> Both classes accept a matrix so if I can create a matrix like the one
>> represented above and an array of characters representing dates faster
>> than what possible with xts:::merge, for example, I will have a faster
>> implementation, this is the reason why I'm writing to datatable-help;
>> I red vignettes, tests and did tests trying to generate a set of
>> data.table (using .SD and by = ID) an then CJ but without success up
>> to now, any input to test this approach will be really appreciate.
>>
>> Input data can be sorted or unsorted (the most complicated case is in
>> the example, unsorted and missing data) in the sense that I can  sort
>> in query if I can take an advantage from this.
>>
>> Below some code to generate the test case above.
>>
>> Thanks in advance for any input, best regards, Daniele
>>
>>
>> set.seed(123)
>> N <- 100 # number of observations, use 5 to replicate test case above
>> K <- 3   # number of timeseries ID
>>
>> X <- data.frame(
>> ID = rep(1:K, each = N),
>> DATE = as.character(rep(as.POSIXct("2000-01-01", tz = "GMT")+ 0:(N-1),
>> K)),
>> VALUE = runif(N*K), stringsAsFactors = FALSE)
>>
>> X <- X[sample(1:(N*K), N*K),] # sample observations to get random order
>> (optional)
>> X <- X[-(sample(1:nrow(X), floor(nrow(X)*0.2))),] # 20% missing
>>
>> head(X, 15)
>>
>>
>> # an implementation in xts:
>> xtsSplit <- function(x)
>> {
>> library(xts)
>> x <- xts(x[,c("ID","VALUE")], as.POSIXct(x[,"DATE"]))
>> x <- do.call(merge, split(x$VALUE,x$ID))
>> return(x)
>> }
>>
>> xtsSplitTime <- replicate(50,
>> system.time(xtsSplit(X))[[1]])
>> median(xtsTime)
>>
>>
>> ORS Srl
>>
>> Via Agostino Morando 1/3 12060 Roddi (Cn) - Italy
>> Tel. +39 0173 620211
>> Fax. +39 0173 620299 / +39 0173 433111
>> Web Site www.ors.it
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> Qualsiasi utilizzo non autorizzato del presente messaggio e dei suoi
>> allegati è vietato e potrebbe costituire reato.
>> Se lei avesse ricevuto erroneamente questo messaggio, Le saremmo grati
>> se
>> provvedesse alla distruzione dello stesso
>> e degli eventuali allegati.
>> Opinioni, conclusioni o altre informazioni riportate nella e-mail, che
>> non
>> siano relative alle attività e/o
>> alla missione aziendale di O.R.S. Srl si intendono non  attribuibili
>> alla
>> società stessa, né la impegnano in alcun modo.
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>
>
>
>
> ORS Srl
>
> Via Agostino Morando 1/3 12060 Roddi (Cn) - Italy
> Tel. +39 0173 620211
> Fax. +39 0173 620299 / +39 0173 433111
> Web Site www.ors.it
>
> ------------------------------------------------------------------------------------------------------------------------
> Qualsiasi utilizzo non autorizzato del presente messaggio e dei suoi
> allegati è vietato e potrebbe costituire reato.
> Se lei avesse ricevuto erroneamente questo messaggio, Le saremmo grati se
> provvedesse alla distruzione dello stesso
> e degli eventuali allegati.
> Opinioni, conclusioni o altre informazioni riportate nella e-mail, che non
> siano relative alle attività e/o
> alla missione aziendale di O.R.S. Srl si intendono non  attribuibili alla
> società stessa, né la impegnano in alcun modo.
>



ORS Srl

Via Agostino Morando 1/3 12060 Roddi (Cn) - Italy
Tel. +39 0173 620211
Fax. +39 0173 620299 / +39 0173 433111
Web Site www.ors.it

------------------------------------------------------------------------------------------------------------------------
Qualsiasi utilizzo non autorizzato del presente messaggio e dei suoi 
allegati è vietato e potrebbe costituire reato.
Se lei avesse ricevuto erroneamente questo messaggio, Le saremmo grati se 
provvedesse alla distruzione dello stesso
e degli eventuali allegati.
Opinioni, conclusioni o altre informazioni riportate nella e-mail, che non 
siano relative alle attività e/o
alla missione aziendale di O.R.S. Srl si intendono non  attribuibili alla 
società stessa, né la impegnano in alcun modo. 





More information about the datatable-help mailing list