[datatable-help] Integer date and time classes for use with data.table

Matthew Dowle mdowle at mdowle.plus.com
Wed Apr 14 23:20:45 CEST 2010


Yes, looks useful. I'll try and take a look too.

Does anyone know why Date is numeric (i.e. double) rather than integer
anyway? In ?Date it says this:

     It is intended that the date should be an integer, but this is not
     enforced in the internal representation.  Fractional days will be
     ignored when printing.  It is possible to produce fractional days
     via the ‘mean’ method or by adding or subtracting (see
     ‘Ops.Date’).

It still seems strange that Date can't be plain integer. When I create a
Date, it is stored as double rather than integer:

> storage.mode(as.Date("2009-01-01"))
[1] "double"
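A short base-R check of the ?Date remark above, with illustrative variable names: a Date rebuilt from an integer day count keeps integer storage, but adding a double such as `1` promotes it back to double. This is one easy way for integer storage to be lost.

```r
# Parsing a string gives a Date stored as double
d <- as.Date("2009-01-01")
storage.mode(d)        # "double"

# The same day count wrapped as an integer still has class "Date"
di <- structure(as.integer(unclass(d)), class = "Date")
storage.mode(di)       # "integer"
print(di)              # prints like any other Date

# Arithmetic with a double (1, not 1L) promotes the storage to double
storage.mode(di + 1)   # "double"
storage.mode(di + 1L)  # "integer"
```

So an integer-backed date class has to guard against this promotion in its arithmetic methods, or accept it.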

We could allow double in keys now, btw; it's on the list for after v1.4.
But as there's no radix sort for doubles, an integer date would still be
useful/needed.
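To illustrate why small-range integer keys sort quickly, here is a minimal stable LSD radix sort in base R for non-negative integers such as day counts. It is a hypothetical teaching sketch (`radix_order` is a made-up name), not data.table's internal sort. It returns an ordering vector, like `order()`, with ties kept in their original order.

```r
# Hypothetical sketch: stable LSD radix sort, 8 bits per pass, returning an
# ordering vector (like order()) for non-negative integers.
radix_order <- function(x, bits = 8L) {
  stopifnot(is.integer(x), !anyNA(x), all(x >= 0L))
  if (length(x) == 0L) return(integer(0))
  nbuckets <- bitwShiftL(1L, bits)
  mask <- nbuckets - 1L
  maxval <- max(x)
  ord <- seq_along(x)
  shift <- 0L
  repeat {
    # current digit of each element, taken in the current order
    digit <- bitwAnd(bitwShiftR(x[ord], shift), mask)
    # counting sort on that digit: first free output slot per bucket
    counts <- tabulate(digit + 1L, nbins = nbuckets)
    next_slot <- cumsum(c(1L, counts))[seq_len(nbuckets)]
    out <- integer(length(ord))
    for (i in seq_along(ord)) {
      b <- digit[i] + 1L
      out[next_slot[b]] <- ord[i]
      next_slot[b] <- next_slot[b] + 1L
    }
    ord <- out
    shift <- shift + bits
    # stop once the largest value has no non-zero digits left
    if (shift >= 31L || bitwShiftR(maxval, shift) == 0L) break
  }
  ord
}
```

Each pass is linear in `length(x)`, and values below 100,000 (the range Tom gives for IDate and ITime below) need at most three 8-bit passes.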

Just to throw out some random thoughts: what if it could incorporate
something like this:

http://en.wikipedia.org/wiki/ISO_8601#Time_intervals

Maybe it could be extended a bit, e.g.:

data[CJ(ids,"2007-07:W:2008-01"),sd(diff(log(price))),roll=TRUE]

where W means weekly, and 2007-07 automatically means month end. This
would be weekly from the last day of 2007-07 (a Tuesday). If you wanted
weekly on Fridays, it would be "2007-07-27:W:2008-01".
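A minimal sketch of how the "start:W:end" idea could expand into Dates, assuming a bare "YYYY-MM" means the last day of that month as proposed above. The helpers `month_end`, `to_date` and `weekly_span` are hypothetical and not part of data.table or xts.

```r
# Hypothetical helpers: expand "start:W:end" into weekly Dates.
# "YYYY-MM" is read as the last day of that month.
month_end <- function(ym) {
  first <- as.Date(paste0(ym, "-01"))
  # first day of the following month, minus one day
  seq(first, by = "month", length.out = 2L)[2L] - 1L
}

to_date <- function(s) {
  if (nchar(s) == 7L) month_end(s) else as.Date(s)
}

weekly_span <- function(spec) {
  parts <- strsplit(spec, ":", fixed = TRUE)[[1L]]
  stopifnot(length(parts) == 3L, parts[2L] == "W")
  seq(to_date(parts[1L]), to_date(parts[3L]), by = "week")
}

w <- weekly_span("2007-07:W:2008-01")     # starts 2007-07-31 (a Tuesday)
f <- weekly_span("2007-07-27:W:2008-01")  # starts 2007-07-27 (a Friday)
```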

Something like that, anyway. The xts package uses ISO 8601, and using a
standard seems a good idea. Or we could use/extend xts.

"20070701T090000:1H:20070705T1700" would be hourly from 9am on the first
day to 5pm on the last day, running through the full 24-hour clock every
day.
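A sketch of parsing the ISO 8601 "basic" timestamps in that string and producing the hourly sequence. The function names are hypothetical, "1H" is only checked rather than interpreted, and UTC is assumed so that daylight-saving gaps don't affect the example.

```r
# Hypothetical parser for basic-format timestamps such as "20070701T090000"
# or "20070705T1700". strptime() ignores trailing input, so the longest
# format is tried first.
parse_basic <- function(s, tz = "UTC") {
  for (fmt in c("%Y%m%dT%H%M%S", "%Y%m%dT%H%M", "%Y%m%d")) {
    t <- strptime(s, fmt, tz = tz)
    if (!is.na(t)) return(as.POSIXct(t))
  }
  stop("unrecognised timestamp: ", s)
}

hourly_span <- function(spec, tz = "UTC") {
  parts <- strsplit(spec, ":", fixed = TRUE)[[1L]]
  stopifnot(length(parts) == 3L, parts[2L] == "1H")
  seq(parse_basic(parts[1L], tz), parse_basic(parts[3L], tz), by = "hour")
}

h <- hourly_span("20070701T090000:1H:20070705T1700")
# 2007-07-01 09:00 through 2007-07-05 17:00 UTC: 104 hours, 105 stamps
```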

"20070701:WD:20070705T0900:1H:1700" would be hourly between 9am and 5pm
each day, on the weekdays (WD) between those dates. 2007-07-01 was a
Sunday, so it would start on the Monday at 9am.
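The vector that a "WD" spec like this would produce can be sketched directly, leaving out the string parsing. The helper `weekday_hours` is hypothetical, and it assumes both 9am and 5pm are included and that times are in UTC.

```r
# Hypothetical helper: hourly stamps from first_hour to last_hour (both
# included, an assumption) on Monday..Friday between two dates.
weekday_hours <- function(from, to, first_hour = 9L, last_hour = 17L,
                          tz = "UTC") {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  wday <- as.POSIXlt(days)$wday            # 0 = Sunday ... 6 = Saturday
  days <- days[wday >= 1L & wday <= 5L]    # keep Monday..Friday only
  hours <- first_hour:last_hour
  # every (day, hour) pair, day-major so the result is chronological
  day_str <- rep(format(days), each = length(hours))
  hour_num <- rep(hours, times = length(days))
  as.POSIXct(paste(day_str, sprintf("%02d:00:00", hour_num)), tz = tz)
}

x <- weekday_hours("2007-07-01", "2007-07-05")
# starts Monday 2007-07-02 09:00; 4 weekdays x 9 hours = 36 stamps
```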

The timespan string could be passed to data.table, and internally it
would create the appropriate type of vector (IDate/POSIXct/Date, etc.) to
join to the table.
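With `roll=TRUE`, as in the join above, data.table carries the last observation forward to each requested date. For a single id, that lookup can be sketched in base R with `findInterval()` on integer day numbers. `roll_lookup` and the sample prices are hypothetical.

```r
# Hypothetical sketch of a roll-forward lookup for one series: for each
# query date, take the value of the latest observation on or before it.
roll_lookup <- function(obs_dates, obs_values, query_dates) {
  o <- order(obs_dates)
  # findInterval() needs a non-decreasing vector; compare integer day numbers
  idx <- findInterval(as.integer(query_dates), as.integer(obs_dates[o]))
  out <- rep(NA_real_, length(query_dates))
  hit <- idx > 0L                  # 0 means before the first observation
  out[hit] <- obs_values[o][idx[hit]]
  out
}

obs <- as.Date(c("2007-07-09", "2007-07-02", "2007-07-05"))
val <- c(12, 10, 11)
q <- as.Date(c("2007-07-01", "2007-07-03", "2007-07-05", "2007-07-31"))
roll_lookup(obs, val, q)           # NA 10 11 12
```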

Anyway, just a straw man...


On Mon, 2010-04-12 at 17:05 -0500, Rob Forler wrote:
> This looks quite beneficial.
> 
> I find myself doing the following a lot:
> 
> dt$Date <- as.integer(format(dt$Date, "%Y%m%d"))
> 
> So this should take care of this problem.
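The trade-off between yyyymmdd integers, as in the idiom above, and an integer day count, which is what an integer-backed Date holds, shows up in a few lines of base R. The dates are illustrative.

```r
d <- as.Date(c("2010-04-30", "2010-05-01"))

# yyyymmdd packed into an integer: readable and sorts correctly
ymd <- as.integer(format(d, "%Y%m%d"))
ymd                 # 20100430 20100501
diff(ymd)           # 71: consecutive days are not consecutive integers

# days since 1970-01-01 as an integer: consecutive days differ by 1
days <- as.integer(d)
diff(days)          # 1

# converting yyyymmdd back to Date
as.Date(as.character(ymd), format = "%Y%m%d")
```

Both sort the same way, but only the day count supports date arithmetic without converting back first.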
> 
> I'll see where I can use this and try to test it out,
> Rob
> 
> On Mon, Apr 12, 2010 at 4:08 PM, Short, Tom <TShort at epri.com> wrote:
>         See enclosed for a draft of classes that implement dates and
>         times with integer storage. The IDate class is a simple wrapper
>         around the Date class that tries to keep an integer storage
>         format. The ITime class, the time of day, is stored as the
>         number of seconds since midnight.
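The seconds-since-midnight representation can be sketched with two small base-R helpers. These are hypothetical and not the ITime code from the attachment. Every time of day maps to an integer from 0 to 86399.

```r
# Hypothetical helpers (not the ITime implementation):
# "HH:MM:SS" <-> integer seconds since midnight (0..86399).
hms_to_secs <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.integer(p) * c(3600L, 60L, 1L)),
         integer(1))
}

secs_to_hms <- function(s) {
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

hms_to_secs("10:45:04")   # 38704
secs_to_hms(38704L)       # "10:45:04"
```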
>         
>         Because IDate and ITime are stored as integers with ranges of
>         less than 100,000, data.table indexing and sorting are fast.
>         Also included are conversions to and from POSIXct and chron
>         formats.
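When converting from POSIXct, the date part and the time-of-day part must be computed in the same time zone. For example, 8pm on 2001-01-01 in New York is 01:00 on 2001-01-02 in UTC, and `as.Date()` on a POSIXct has its own `tz` argument that is easy to leave inconsistent with the time part. A hypothetical sketch that takes both parts in one `tz`:

```r
# Hypothetical helpers: split POSIXct into an integer day number and integer
# seconds since midnight, both taken in the same time zone `tz`.
split_posixct <- function(x, tz = "UTC") {
  lt <- as.POSIXlt(x, tz = tz)
  date <- as.integer(as.Date(format(lt, "%Y-%m-%d")))
  time <- lt$hour * 3600L + lt$min * 60L + as.integer(lt$sec)
  data.frame(date = date, time = time)
}

# Rebuild the POSIXct from wall-clock parts in the same time zone.
join_posixct <- function(date, time, tz = "UTC") {
  stamp <- sprintf("%s %02d:%02d:%02d",
                   format(as.Date(date, origin = "1970-01-01")),
                   time %/% 3600L, (time %% 3600L) %/% 60L, time %% 60L)
  as.POSIXct(stamp, tz = tz)
}

x <- as.POSIXct("2001-01-01 20:00:00", tz = "America/New_York")
split_posixct(x, "America/New_York")   # 2001-01-01, 72000 s (20:00)
split_posixct(x, "UTC")                # 2001-01-02, 3600 s (01:00)
```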
>         
>         Comments and tests are welcome.
>         
>         Examples:
>         
>         >     (t <- as.ITime("10:45:04"))
>         [1] "10:45:04"
>         >     (d <- as.IDate("2001-01-01"))
>         [1] "2001-01-01"
>         >
>         >     datetime <- seq(as.POSIXct("2001-01-01"),
>         +                     as.POSIXct("2001-01-03"), by = "5 hour")
>         >
>         >     (a <- data.table(IDateTime(datetime), a = rep(1:2, 5),
>         +                      key = "a,date,time"))
>                    date     time a
>          [1,] 2001-01-01 00:00:00 1
>          [2,] 2001-01-01 10:00:00 1
>          [3,] 2001-01-02 06:00:00 1
>          [4,] 2001-01-02 16:00:00 1
>          [5,] 2001-01-02 20:00:00 1
>          [6,] 2001-01-01 05:00:00 2
>          [7,] 2001-01-01 15:00:00 2
>          [8,] 2001-01-02 01:00:00 2
>          [9,] 2001-01-02 11:00:00 2
>         [10,] 2001-01-03 21:00:00 2
>         >
>         >     a[, mean(a), by = "date"]
>                   date  V1
>         [1,] 2001-01-01 1.5
>         [2,] 2001-01-02 1.4
>         [3,] 2001-01-03 2.0
>         >
>         >     as.POSIXct(af$date, af$time)
>         [1] "2001-01-01 00:00:00 EST" "2001-01-01 18:00:00 EST"
>         [3] "2001-01-02 12:00:00 EST" "2001-01-01 06:00:00 EST"
>         [5] "2001-01-02 00:00:00 EST" "2001-01-02 18:00:00 EST"
>         [7] "2001-01-01 12:00:00 EST" "2001-01-02 06:00:00 EST"
>         [9] "2001-01-03 00:00:00 EST"
>         
>         - Tom
>         
>         Tom Short
>         Electric Power Research Institute (EPRI)
>         
>         
>         
>         _______________________________________________
>         datatable-help mailing list
>         datatable-help at lists.r-forge.r-project.org
>         https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>         