[datatable-help] variable column names
Sam Steingold
sds at gnu.org
Fri Apr 26 17:45:22 CEST 2013
I am still missing something:
--8<---------------cut here---------------start------------->8---
> dt <- data.table(user=c(rep(4, 5),rep(3, 5)), behavior=c(rep(FALSE,5),rep(TRUE,5)),
country=c(rep(1,4),rep(2,6)), language=c(rep(6,6),rep(5,4)),
event=1:10, key=c("user","country","language"))
> dt
    user behavior country language event
 1:    3     TRUE       2        5     7
 2:    3     TRUE       2        5     8
 3:    3     TRUE       2        5     9
 4:    3     TRUE       2        5    10
 5:    3     TRUE       2        6     6
 6:    4    FALSE       1        6     1
 7:    4    FALSE       1        6     2
 8:    4    FALSE       1        6     3
 9:    4    FALSE       1        6     4
10:    4    FALSE       2        6     5
> users <- dt[, sum(behavior) > 0, by=user]
Finding groups (bysameorder=TRUE) ... done in 0secs. bysameorder=TRUE and o__ is length 0
Detected that j uses these columns: behavior
Optimization is on but j left unchanged as 'sum(behavior) > 0'
Starting dogroups ... done dogroups in 0 secs
> users
   user    V1
1:    3  TRUE
2:    4 FALSE
> setnames(users, "V1", "behavior")
--8<---------------cut here---------------end--------------->8---
Now I want to do the same thing as in
http://stackoverflow.com/questions/16200815/summarize-a-data-table-with-unreliable-data
for both fields
> fields <- c("country","language")
Here is what I have tried so far:
--8<---------------cut here---------------start------------->8---
dt[, .N, .SDcols=fields, by=eval(list("user",fields))]
Error in `[.data.table`(dt, , .N, .SDcols = fields, by = eval(list("user", :
The items in the 'by' or 'keyby' list are length (1,2). Each must be same length as rows in x or number of rows returned by i (10).
Calls: [ -> [.data.table
--8<---------------cut here---------------end--------------->8---
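The error message suggests that `by` received a list of two character vectors (lengths 1 and 2) rather than a list of columns. A minimal sketch of one thing worth trying, assuming a data.table version in which `by` accepts a character vector of column names: pass `c("user", fields)` directly. `.SDcols` is left out here because `j` is just `.N`; the name `counts` is only illustrative.

```r
library(data.table)

# Same toy data as in the transcript above
dt <- data.table(user     = c(rep(4, 5), rep(3, 5)),
                 behavior = c(rep(FALSE, 5), rep(TRUE, 5)),
                 country  = c(rep(1, 4), rep(2, 6)),
                 language = c(rep(6, 6), rep(5, 4)),
                 event    = 1:10,
                 key      = c("user", "country", "language"))
fields <- c("country", "language")

# by= is given a plain character vector of column names,
# so neither eval() nor list() is involved
counts <- dt[, .N, by = c("user", fields)]
print(counts)
```

This groups by all three columns at once, which is not yet the per-field summary from the Stack Overflow question, but it shows how variable column names can be handed to `by`.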
The idea is to do something like this:
--8<---------------cut here---------------start------------->8---
> dt.out <- dt[, .N, by=list(user,country)][, list(country[which.max(N)], max(N)/sum(N)), by=user]
> setnames(dt.out, c("V1", "V2"), paste0("country",c(".name", ".support")))
> users <- users[dt.out]
   user behavior country.name country.support
1:    3     TRUE            2             1.0
2:    4    FALSE            1             0.8
--8<---------------cut here---------------end--------------->8---
except that I do not want to hard-code the literal column names "country"
and "language", and I am sure there is a way to avoid copying users in
> users <- users[dt.out]
by using a ":=" trick.
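For illustration, here is one possible sketch of both wishes, not necessarily the idiomatic answer. It assumes a recent data.table in which `(cols) := ...` and the `i.` prefix are available inside a keyed join; the names `counts`, `best`, `value` and `cols` are hypothetical helpers introduced only for this example.

```r
library(data.table)

# Same toy data as in the transcript above
dt <- data.table(user     = c(rep(4, 5), rep(3, 5)),
                 behavior = c(rep(FALSE, 5), rep(TRUE, 5)),
                 country  = c(rep(1, 4), rep(2, 6)),
                 language = c(rep(6, 6), rep(5, 4)),
                 event    = 1:10,
                 key      = c("user", "country", "language"))
fields <- c("country", "language")

users <- dt[, list(behavior = sum(behavior) > 0), by = user]
setkey(users, user)

for (f in fields) {
  # Count events per (user, value of field f)
  counts <- dt[, .N, by = c("user", f)]
  # Rename the variable column so the next step needs no literal field name
  setnames(counts, f, "value")
  # Most frequent value per user and its share of that user's events
  best <- counts[, list(name    = value[which.max(N)],
                        support = max(N) / sum(N)), by = user]
  setkey(best, user)
  # Keyed join on user; := adds the two columns to users by reference
  cols <- paste0(f, c(".name", ".support"))
  users[best, (cols) := list(i.name, i.support)]
}
print(users)
```

Because `:=` modifies `users` in place, the `users <- users[dt.out]` reassignment is not needed in this version.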
Thanks.
> * Matthew Dowle <zqbjyr at zqbjyr.cyhf.pbz> [2013-04-24 21:54:17 +0100]:
>
> where ... is eval(myid)
> iiuc
>> Or:
>> DT[,lapply(.SD,sum),by=...,.SDcols=myvars]
--
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.10 (quantal) X 11.0.11300000
http://www.childpsy.net/ http://palestinefacts.org http://ffii.org
http://jihadwatch.org http://thereligionofpeace.com
Morning is too early for anything but sleep.