[datatable-help] join results aren't always sorted?

Sam Steingold sds at gnu.org
Thu May 30 21:48:52 CEST 2013


Hi,
I have a table:
--8<---------------cut here---------------start------------->8---
> str(dates.dt)
Classes ‘data.table’ and 'data.frame':	1343 obs. of  4 variables:
 $ sid  : chr  "missing" "missing" "missing" "missing" ...
 $ s.c  : chr  "CLICK" "CLICK" "CLICK" "CLICK" ...
 $ count: int  70559 71555 79985 84385 88147 94130 100195 109031 116890 129726 ...
 $ time : POSIXct, format: "2013-05-15 00:00:00" "2013-05-15 01:00:00" ...
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "sorted")= chr  "sid" "s.c" "time"
> dates.dt
          sid   s.c count                time
   1: missing CLICK 70559 2013-05-15 00:00:00
   2: missing CLICK 71555 2013-05-15 01:00:00
   3: missing CLICK 79985 2013-05-15 02:00:00
   4: missing CLICK 84385 2013-05-15 03:00:00
   5: missing CLICK 88147 2013-05-15 04:00:00
  ---                                        
1339: present SHARE 35295 2013-05-28 19:00:00
1340: present SHARE 36284 2013-05-28 20:00:00
1341: present SHARE 69504 2013-05-28 21:00:00
1342: present SHARE 67037 2013-05-28 22:00:00
1343: present SHARE 61014 2013-05-28 23:00:00
--8<---------------cut here---------------end--------------->8---
I summarise them by various fields:
--8<---------------cut here---------------start------------->8---
> shares <- dates.dt[s.c=="SHARE", list(sum(count)) , by="time"]
> clicks <- dates.dt[s.c=="CLICK", list(sum(count)) , by="time"]
> str(shares)
Classes ‘data.table’ and 'data.frame':	336 obs. of  2 variables:
 $ time: POSIXct, format: "2013-05-15 00:00:00" "2013-05-15 01:00:00" ...
 $ V1  : int  60531 57837 67495 76716 83465 86822 91318 100520 112352 124784 ...
 - attr(*, ".internal.selfref")=<externalptr> 
> str(clicks)
Classes ‘data.table’ and 'data.frame':	336 obs. of  2 variables:
 $ time: POSIXct, format: "2013-05-15 00:00:00" "2013-05-15 01:00:00" ...
 $ V1  : int  129450 137222 157721 171319 183720 195652 216003 238295 260715 279235 ...
 - attr(*, "sorted")= chr "time"
 - attr(*, ".internal.selfref")=<externalptr> 
--8<---------------cut here---------------end--------------->8---
why is clicks but not shares sorted by time?
(if I make "time" the first key in dates.dt, the problem goes away, so,
I guess, this is expected).

What I actually want is a single data table keyed by time with columns
shares,clicks,missing,present,missing/clicks &c
I can, obviously, construct it by hand:
--8<---------------cut here---------------start------------->8---
setkeyv(shares,"time")
stopifnot(identical(shares$time,clicks$time))
dt <- data.table(time=shares$time, clicks=clicks$V1, shares=shares$V1)
--8<---------------cut here---------------end--------------->8---
but I was wondering if there is a better way.
Thanks.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 13.04 (raring) X 11.0.11303000
http://www.childpsy.net/ http://pmw.org.il http://dhimmi.com
http://jihadwatch.org http://www.memritv.org http://honestreporting.com
Garbage In, Gospel Out


More information about the datatable-help mailing list