[datatable-help] Dealing with dateTime

colin umansky statquant at gmail.com
Thu Jan 3 17:58:45 CET 2013


OK, but sorting on POSIXct (a double) should be less efficient than sorting
on int64 (via a radix sort), shouldn't it?

Additionally, what do you think of adding an IMonth class (displayed like
"2011-02")? When grouping, we can currently use month, but it does not
distinguish the year. IMonth could be quick and useful for statistics
computed by group.
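
To illustrate the kind of grouping I mean, here is a minimal sketch using
what data.table already offers. The sample data and the column names (date,
value, ym) are made up for illustration. The integer key year*100 + month is
only one possible encoding, not an existing IMonth class.

```r
library(data.table)

# Hypothetical sample data: a few dated values spanning two years.
DT <- data.table(
  date  = as.IDate(c("2011-02-01", "2011-02-15", "2012-02-01", "2012-03-10")),
  value = c(1, 2, 10, 20)
)

# Group by a character year-month label such as "2011-02".
by_label <- DT[, list(total = sum(value)),
               by = list(ym = format(date, "%Y-%m"))]

# Group by an integer year-month key such as 201102.
# This keeps the year and stays an integer, unlike month() alone.
by_int <- DT[, list(total = sum(value)),
             by = list(ym = year(date) * 100L + month(date))]

print(by_label)
print(by_int)
```

Both forms keep February 2011 and February 2012 apart, whereas grouping on
month(date) alone would merge them.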

Regards

2013/1/3 Matthew Dowle <mdowle at mdowle.plus.com>

> Hi,
>
> One reason 'double' type was added to setkey was to allow POSIXct in keys.
> That was as recently as v1.8.2 :
>
> o   Numeric columns (type 'double') are now allowed in keys and ad hoc
>         by. J() and SJ() no longer coerce 'double' to 'integer'. i join columns
>         which mismatch on numeric type are coerced silently to match
>         the type of x's join column. Two floating point values
>         are considered equal (by grouping and binary search joins) if their
>         difference is within sqrt(.Machine$double.eps), by default. See example
>         in ?unique.data.table. Completes FRs #951, #1609 and #1075. This paves the
>         way for other atomic types which use 'double' (such as POSIXct and bit64).
>         Thanks to Chris Neff for beta testing and finding problems with keys
>         of two numeric columns (bug #2004), fixed and tests added.
>
> So POSIXct is one option; using integer64 to store YYYYMMDDHHMMSSmmm is
> another possibility (having no epoch has pros as well as cons); or date and
> time can be held in separate columns.
>
> The thinking is, rightly or wrongly, that R already supports milliseconds
> in various ways. data.table doesn't aim to prescribe which datetime class
> you place in the data.table; it's up to you what you use.  It only has
> IDate because Date in R is (oddly) stored as numeric rather than integer,
> which I, at least, have never really understood.  For a long time
> data.table only supported integer columns in keys and joins (including
> factors which are integers/enumerations).  But now double (and character)
> are fine in keys too.
>
> So to answer your question as asked:  as.POSIXct("2010-01-03
> 09:34:54.342697")  already works.  But note :
>
>
> http://stackoverflow.com/questions/10931972/r-issue-with-rounding-milliseconds
>
> http://stackoverflow.com/questions/11136340/zoo-xts-microsecond-read-issue
>
>
> http://stackoverflow.com/questions/8889554/milliseconds-puzzle-when-calling-strptime-in-r
>
> http://stackoverflow.com/questions/2150138/how-to-parse-milliseconds-in-r
>
> HTH. See also:
>
> http://stackoverflow.com/a/14063077/403310
>
> But yes I'm sure we can do better, just not quite sure precisely how.
>
> Matthew
>
>
>
> On 03.01.2013 11:17, colin umansky wrote:
>
> Hello,
> I have been thinking about how data.table deals with date-times and would
> like to share my questions and opinions.
>
> Where I think data.table is (I am likely to be wrong :)):
> At the moment data.table deals independently with IDate and ITime
> (%H:%M:%S), which are simple (in Matthew Dowle's words) derived classes. As
> I understand it, they are stored as integers to enable fast radix sorting,
> etc.
> There is no milli/micro/nanosecond resolution, which is a problem as far as
> financial time series are concerned.
>
> Suggestions:
> Would it be possible to store an IDateTime as the number of microseconds
> since the epoch?
> An IDateTime object would be created like a=as.IDateTime("2010-01-03
> 09:34:54.342697"), then
> year: as.IYear(a); # would display "2010"
> month: as.IMonth(a); # would display "2010-01"
> date: as.IDate(a); # would display "2010-01-03"
> etc.
> Having all those built-in types would probably be useful for efficient
> grouping.
>
> PS:
> The best software I have used for dealing with time-series data is kdb (
> http://kx.com/).
> I particularly like the way date-times are handled (
> http://code.kx.com/wiki/JB:QforMortals/atoms#time); it may be a source of
> inspiration.