[datatable-help] join results aren't always sorted?
Sam Steingold
sds at gnu.org
Thu May 30 21:48:52 CEST 2013
Hi,
I have a table:
--8<---------------cut here---------------start------------->8---
> str(dates.dt)
Classes ‘data.table’ and 'data.frame': 1343 obs. of 4 variables:
$ sid : chr "missing" "missing" "missing" "missing" ...
$ s.c : chr "CLICK" "CLICK" "CLICK" "CLICK" ...
$ count: int 70559 71555 79985 84385 88147 94130 100195 109031 116890 129726 ...
$ time : POSIXct, format: "2013-05-15 00:00:00" "2013-05-15 01:00:00" ...
- attr(*, ".internal.selfref")=<externalptr>
- attr(*, "sorted")= chr "sid" "s.c" "time"
> dates.dt
sid s.c count time
1: missing CLICK 70559 2013-05-15 00:00:00
2: missing CLICK 71555 2013-05-15 01:00:00
3: missing CLICK 79985 2013-05-15 02:00:00
4: missing CLICK 84385 2013-05-15 03:00:00
5: missing CLICK 88147 2013-05-15 04:00:00
---
1339: present SHARE 35295 2013-05-28 19:00:00
1340: present SHARE 36284 2013-05-28 20:00:00
1341: present SHARE 69504 2013-05-28 21:00:00
1342: present SHARE 67037 2013-05-28 22:00:00
1343: present SHARE 61014 2013-05-28 23:00:00
--8<---------------cut here---------------end--------------->8---
I summarise them by various fields:
--8<---------------cut here---------------start------------->8---
> shares <- dates.dt[s.c=="SHARE", list(sum(count)) , by="time"]
> clicks <- dates.dt[s.c=="CLICK", list(sum(count)) , by="time"]
> str(shares)
Classes ‘data.table’ and 'data.frame': 336 obs. of 2 variables:
$ time: POSIXct, format: "2013-05-15 00:00:00" "2013-05-15 01:00:00" ...
$ V1 : int 60531 57837 67495 76716 83465 86822 91318 100520 112352 124784 ...
- attr(*, ".internal.selfref")=<externalptr>
> str(clicks)
Classes ‘data.table’ and 'data.frame': 336 obs. of 2 variables:
$ time: POSIXct, format: "2013-05-15 00:00:00" "2013-05-15 01:00:00" ...
$ V1 : int 129450 137222 157721 171319 183720 195652 216003 238295 260715 279235 ...
- attr(*, "sorted")= chr "time"
- attr(*, ".internal.selfref")=<externalptr>
--8<---------------cut here---------------end--------------->8---
Why is clicks, but not shares, sorted by time (i.e., why does only clicks
carry the "sorted" attribute)?
(If I make "time" the first key column in dates.dt, the problem goes away,
so I guess this is expected.)
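A minimal sketch of one way to avoid depending on whether the key happens to be retained: ask for it explicitly with keyby= (or call setkeyv() on the result). The toy table and its values below are made up for illustration and only stand in for dates.dt; this does not explain why the two results above differ.

```r
library(data.table)

# Hypothetical stand-in for dates.dt, deliberately NOT in time order
toy <- data.table(
  s.c   = c("SHARE", "CLICK", "SHARE", "CLICK"),
  count = c(4L, 40L, 3L, 30L),
  time  = as.POSIXct(c("2013-05-15 01:00:00", "2013-05-15 01:00:00",
                       "2013-05-15 00:00:00", "2013-05-15 00:00:00"),
                     tz = "UTC")
)

# keyby= groups like by= and then sets the key on the grouping column(s)
shares <- toy[s.c == "SHARE", list(V1 = sum(count)), keyby = "time"]
clicks <- toy[s.c == "CLICK", list(V1 = sum(count)), keyby = "time"]

# Both results now report "time" as their key
key(shares)
key(clicks)
```

With both results keyed the same way, a check such as identical(shares$time, clicks$time) no longer depends on the row order of the input.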
What I actually want is a single data table keyed by time, with columns
shares, clicks, missing, present, missing/clicks, etc.
I can, obviously, construct it by hand:
--8<---------------cut here---------------start------------->8---
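# key shares by time so its rows are in the same order as clicks
setkeyv(shares,"time")
# both summaries must cover exactly the same hours, in the same order
stopifnot(identical(shares$time,clicks$time))
dt <- data.table(time=shares$time, clicks=clicks$V1, shares=shares$V1)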
--8<---------------cut here---------------end--------------->8---
but I was wondering if there is a better way.
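One possible alternative, offered only as a sketch: compute all the per-hour sums in a single grouped pass and then set the key, so no separate shares/clicks tables need to be aligned. The toy table below is made-up data standing in for dates.dt, the output column names are illustrative, and reading "missing/clicks" as the ratio of the two sums is an assumption.

```r
library(data.table)

# Hypothetical stand-in for dates.dt (values invented for illustration)
toy <- data.table(
  sid   = rep(c("missing", "present"), each = 4),
  s.c   = rep(rep(c("CLICK", "SHARE"), each = 2), times = 2),
  count = c(10L, 20L, 1L, 2L, 30L, 40L, 3L, 4L),
  time  = rep(as.POSIXct(c("2013-05-15 00:00:00", "2013-05-15 01:00:00"),
                         tz = "UTC"), times = 4)
)

# One row per hour; each column sums count over the matching rows only
wide <- toy[, list(
  clicks  = sum(count[s.c == "CLICK"]),
  shares  = sum(count[s.c == "SHARE"]),
  missing = sum(count[sid == "missing"]),
  present = sum(count[sid == "present"])
), by = "time"]

# Key the result explicitly rather than relying on the input's key
setkeyv(wide, "time")

# Assumed meaning of "missing/clicks": ratio of the two per-hour sums
wide[, missing.per.click := missing / clicks]
```

Because everything is derived from the same grouping, the columns are aligned by construction and the stopifnot() check on the time columns is not needed.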
Thanks.
--
Sam Steingold (http://sds.podval.org/) on Ubuntu 13.04 (raring) X 11.0.11303000
http://www.childpsy.net/ http://pmw.org.il http://dhimmi.com
http://jihadwatch.org http://www.memritv.org http://honestreporting.com
Garbage In, Gospel Out