[datatable-help] variable column names

Fri Apr 26 18:00:27 CEST 2013

> dt[, sum(behavior) > 0, by=user]
    user    V1
1:    3  TRUE
2:    4 FALSE
> dt[, any(behavior), by=user]     # same
    user    V1
1:    3  TRUE
2:    4 FALSE
> dt[, list(behavior = any(behavior)), by=user]   # the same, without needing setnames afterwards
    user behavior
1:    3     TRUE
2:    4    FALSE
> fields <- c("country","language")
> dt[, list(behavior = any(behavior)), by=c("user",fields)]   # by may be a character vector of column names
    user country language behavior
1:    3       2        5     TRUE
2:    3       2        6     TRUE
3:    4       1        6    FALSE
4:    4       2        6    FALSE
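
For the per-field summary further down in the quoted message, one possible sketch is a loop over fields that adds the two result columns by reference. This is only an illustration, assuming a recent data.table release where `(cols) := ...` and the `i.` prefix are available; the names counts, top, cols, name and support are made up for the example.

```r
library(data.table)

# Same toy data as in the quoted message.
dt <- data.table(user=c(rep(4, 5),rep(3, 5)),
                 behavior=c(rep(FALSE,5),rep(TRUE,5)),
                 country=c(rep(1,4),rep(2,6)),
                 language=c(rep(6,6),rep(5,4)),
                 event=1:10, key=c("user","country","language"))
fields <- c("country","language")

# One row per user; keyby leaves the result keyed on user for the joins below.
users <- dt[, list(behavior = any(behavior)), keyby=user]

for (f in fields) {
  # Row counts per (user, value of field f); by takes a character vector.
  counts <- dt[, .N, by=c("user", f)]
  # Most frequent value per user and its share of that user's rows.
  top <- counts[, list(name = get(f)[which.max(N)],
                       support = max(N)/sum(N)), keyby=user]
  # Add both columns to users by reference instead of users <- users[top].
  cols <- paste0(f, c(".name", ".support"))
  users[top, (cols) := list(i.name, i.support)]
}
users
```

For country this should give the same country.name and country.support values as in the table quoted below; language gets its own pair of columns in the same way.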

HTH
Matthew


On 26.04.2013 16:45, Sam Steingold wrote:
> I am still missing something:
>
> --8<---------------cut here---------------start------------->8---
>> dt <- data.table(user=c(rep(4, 5),rep(3, 5)),
>                  behavior=c(rep(FALSE,5),rep(TRUE,5)),
>                  country=c(rep(1,4),rep(2,6)),
>                  language=c(rep(6,6),rep(5,4)),
>                  event=1:10, key=c("user","country","language"))
>> dt
>     user behavior country language event
>  1:    3     TRUE       2        5     7
>  2:    3     TRUE       2        5     8
>  3:    3     TRUE       2        5     9
>  4:    3     TRUE       2        5    10
>  5:    3     TRUE       2        6     6
>  6:    4    FALSE       1        6     1
>  7:    4    FALSE       1        6     2
>  8:    4    FALSE       1        6     3
>  9:    4    FALSE       1        6     4
> 10:    4    FALSE       2        6     5
>>   users <- dt[, sum(behavior) > 0, by=user]
> Finding groups (bysameorder=TRUE) ... done in 0secs. bysameorder=TRUE and o__ is length 0
> Detected that j uses these columns: behavior
> Optimization is on but j left unchanged as 'sum(behavior) > 0'
> Starting dogroups ... done dogroups in 0 secs
>> users
>    user    V1
> 1:    3  TRUE
> 2:    4 FALSE
>> setnames(users, "V1", "behavior")
> --8<---------------cut here---------------end--------------->8---
>
> Now I want to do the same thing as in
> 
> http://stackoverflow.com/questions/16200815/summarize-a-data-table-with-unreliable-data
> for both fields
>> fields <- c("country","language")
>
> here is what I tried so far:
>
> --8<---------------cut here---------------start------------->8---
> dt[, .N, .SDcols=fields, by=eval(list("user",fields))]
> Error in `[.data.table`(dt, , .N, .SDcols = fields, by = eval(list("user",  :
>   The items in the 'by' or 'keyby' list are length (1,2). Each must be same length as rows in x or number of rows returned by i (10).
> Calls: [ -> [.data.table
> --8<---------------cut here---------------end--------------->8---
>
> the idea is to do something like
>
> --8<---------------cut here---------------start------------->8---
>> dt.out <- dt[, .N, by=list(user,country)][,
>               list(country[which.max(N)], max(N)/sum(N)), by=user]
>> setnames(dt.out, c("V1", "V2"), paste0("country",c(".name", ".support")))
>> users <- users[dt.out]
>    user behavior country.name country.support
> 1:    3     TRUE            2             1.0
> 2:    4    FALSE            1             0.8
> --8<---------------cut here---------------end--------------->8---
>
> except that I do not want to hard-code the literals "country" and "language",
> and I am sure there is a way to avoid copying users in
>> users <- users[dt.out]
> by using a ":=" trick.
>
> Thanks.
>
>> * Matthew Dowle <zqbjyr at zqbjyr.cyhf.pbz> [2013-04-24 21:54:17 +0100]:
>>
>> where ... is eval(myid)
>> iigc
>>> Or:
>>> DT[,lapply(.SD,sum),by=...,.SDcols=myvars]