[datatable-help] Dealing with dateTime

Matthew Dowle mdowle at mdowle.plus.com
Thu Jan 3 13:19:32 CET 2013



One reason the 'double' type was added to setkey was to allow
POSIXct in keys. That was as recently as v1.8.2:

o Numeric columns (type 'double') are now allowed in keys and ad hoc by.
  J() and SJ() no longer coerce 'double' to 'integer'. i join columns which
  mismatch on numeric type are coerced silently to match the type of x's
  join column. Two floating point values are considered equal (by grouping
  and binary search joins) if their difference is within
  sqrt(.Machine$double.eps), by default. See example in ?unique.data.table.
  Completes FRs #951, #1609 and #1075. This paves the way for other atomic
  types which use 'double' (such as POSIXct and bit64). Thanks to Chris Neff
  for beta testing and finding problems with keys of two numeric columns
  (bug #2004), fixed and tests added.
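
To make the tolerance rule in that NEWS item concrete, here is a minimal base R sketch of the comparison as the item words it. The helper within_tol is hypothetical and is not part of data.table; the package's own implementation is in C and its behaviour may differ by version.

```r
# Absolute tolerance quoted in the NEWS item above
tol <- sqrt(.Machine$double.eps)   # about 1.49e-08

# Hypothetical helper, not part of data.table: TRUE when two doubles
# differ by less than the tolerance
within_tol <- function(x, y) abs(x - y) < tol

within_tol(0.1 + 0.2, 0.3)         # TRUE: differs only by rounding error
within_tol(1.000001, 1.000002)     # FALSE: 1e-06 is well above the tolerance
```

Read this way, two timestamps one microsecond apart (a difference of 1e-06 seconds) would still count as distinct values.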

So POSIXct is one option. Using integer64 to store YYYYMMDDHHMMSSmmm is
another possibility (having no epoch has some pros as well as cons), as is
holding date and time in separate columns.
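
A rough base R illustration of why the YYYYMMDDHHMMSSmmm form calls for integer64 rather than plain double: a 17-digit stamp lies beyond the range in which doubles hold every integer exactly. The stamp value is made up for the example, and the substr() splitting is only one assumed way to read the fields back.

```r
stamp <- "20100103093454342"   # YYYYMMDDHHMMSSmmm, 17 digits

# Doubles represent every integer exactly only up to 2^53 (about 9.0e15)
as.numeric(stamp) > 2^53       # TRUE: the stamp is beyond that range
2^53 + 1 == 2^53               # TRUE: adjacent integers collapse up there

# Fields can be split off the text form without any epoch arithmetic
date_part <- substr(stamp, 1, 8)     # "20100103"
time_part <- substr(stamp, 9, 14)    # "093454"
ms_part   <- substr(stamp, 15, 17)   # "342"
```

With the bit64 package, the same string could be stored as a 64-bit integer via as.integer64(stamp), which keeps all 17 digits.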

The thinking is, rightly or wrongly, that R already supports milliseconds
in various ways. data.table doesn't aim to prescribe which datetime class
you place in the data.table; it's up to you what you use. It only has IDate
because Date in R is (oddly) stored as numeric rather than integer, which I
(at least) have never really understood. For a long time data.table only
supported integer columns in keys and joins (including factors, which are
integers/enumerations). But now double (and character) are fine in keys
too.
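
As a small sketch of a double-backed class in a key, the following sets a key on a POSIXct column that carries fractional seconds. The column names and values are invented for illustration, and it assumes data.table v1.8.2 or later is installed.

```r
library(data.table)

# Illustrative data: three timestamps with sub-second fractions (made up)
ts <- as.POSIXct(c("2010-01-03 09:34:54.342697",
                   "2010-01-03 09:34:53.100000",
                   "2010-01-03 09:34:55.900000"), tz = "UTC")
DT <- data.table(time = ts, price = c(10.2, 10.1, 10.3))

setkey(DT, time)    # POSIXct is stored as double, so this needs v1.8.2 or later
key(DT)             # "time"
DT$price            # rows are now ordered by time: 10.1 10.2 10.3
```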

So to answer your question as asked:
as.POSIXct("2010-01-03 09:34:54.342697") already works. But note :





HTH, also : 


But yes, I'm sure we can do better, just not quite sure precisely how.


On 03.01.2013 11:17, colin umansky wrote: 

> Hello,
> I have been thinking about how data.table deals with dateTime and would
> like to share my questions/opinions.
> Where I think data.table is (likely to be wrong :))
> At the moment data.table deals independently with IDate and ITime
> (%H:%M:%S), which are simple (Matthew Dowle's words) derived classes. As I
> understand it, they are stored as integers to enable fast radix sorting
> etc...
> There is no milli/micro/nano, which is a problem as far as financial time
> series are concerned.
> Suggestions:
> Would it be possible to store an IDateTime as the number of microseconds
> since epoch time?
> An IDateTime object would be represented like
> a=as.IDateTime("2010-01-03 09:34:54.342697"), then
> year: as.IYear(a); #would display "2010"
> month: as.IMonth(a); #would display "2010-01"
> date: as.IDate(a); #would display "2010-01-03"
> etc...
> Having all those built-in types would probably be useful for efficient
> grouping.
> PS:
> The best software I have used for dealing with time series data is kdb
> (http://kx.com/ [1])
> I particularly like the way datetimes are handled
> (http://code.kx.com/wiki/JB:QforMortals/atoms#time [2]); it may be a
> source of inspiration...


[1] http://kx.com/
[2] http://code.kx.com/wiki/JB:QforMortals/atoms#time