[datatable-help] variable column names
Matthew Dowle
mdowle at mdowle.plus.com
Fri Apr 26 18:00:27 CEST 2013
> dt[, sum(behavior) > 0, by=user]
   user    V1
1:    3  TRUE
2:    4 FALSE
> dt[, any(behavior), by=user] # same
   user    V1
1:    3  TRUE
2:    4 FALSE
> dt[, list(behavior = any(behavior)), by=user] # how to do the same without setnames afterwards
   user behavior
1:    3     TRUE
2:    4    FALSE
> fields <- c("country","language")
> dt[, list(behavior = any(behavior)), by=c("user",fields)] # by may be a character vector of column names
   user country language behavior
1:    3       2        5     TRUE
2:    3       2        6     TRUE
3:    4       1        6    FALSE
4:    4       2        6    FALSE
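For the other part of the question (looping over fields without the literal "country" and "language", and adding the columns to users by := rather than users[dt.out]), something along these lines might work. It is only a sketch under assumptions: the names counts, best, idx and new_cols are made up for illustration, and it uses match() to line up the rows, which is one possible way and not necessarily the fastest.

```r
library(data.table)

# Same example data as in the quoted message.
dt <- data.table(user     = c(rep(4, 5), rep(3, 5)),
                 behavior = c(rep(FALSE, 5), rep(TRUE, 5)),
                 country  = c(rep(1, 4), rep(2, 6)),
                 language = c(rep(6, 6), rep(5, 4)),
                 event    = 1:10,
                 key      = c("user", "country", "language"))

fields <- c("country", "language")

# One row per user, naming the result column directly in j.
users <- dt[, list(behavior = any(behavior)), by = user]

for (f in fields) {
  # Row counts per (user, value of f); by takes a character vector.
  counts <- dt[, .N, by = c("user", f)]

  # Per user: the most frequent value of f and its share of that user's rows.
  best <- counts[, list(name    = get(f)[which.max(N)],
                        support = max(N) / sum(N)),
                 by = user]

  # Line best up with users, then add both columns by reference.
  idx      <- match(users$user, best$user)
  new_cols <- paste0(f, c(".name", ".support"))
  users[, (new_cols) := list(best$name[idx], best$support[idx])]
}

print(users)
```

The parentheses in (new_cols) := make data.table take the column names from the variable, so nothing in the loop body mentions a particular field.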
HTH
Matthew
On 26.04.2013 16:45, Sam Steingold wrote:
> I am still missing something:
>
> --8<---------------cut here---------------start------------->8---
>> dt <- data.table(user=c(rep(4, 5),rep(3, 5)),
>> behavior=c(rep(FALSE,5),rep(TRUE,5)),
> country=c(rep(1,4),rep(2,6)),
> language=c(rep(6,6),rep(5,4)),
> event=1:10, key=c("user","country","language"))
>> dt
>     user behavior country language event
>  1:    3     TRUE       2        5     7
>  2:    3     TRUE       2        5     8
>  3:    3     TRUE       2        5     9
>  4:    3     TRUE       2        5    10
>  5:    3     TRUE       2        6     6
>  6:    4    FALSE       1        6     1
>  7:    4    FALSE       1        6     2
>  8:    4    FALSE       1        6     3
>  9:    4    FALSE       1        6     4
> 10:    4    FALSE       2        6     5
>> users <- dt[, sum(behavior) > 0, by=user]
> Finding groups (bysameorder=TRUE) ... done in 0secs. bysameorder=TRUE
> and o__ is length 0
> Detected that j uses these columns: behavior
> Optimization is on but j left unchanged as 'sum(behavior) > 0'
> Starting dogroups ... done dogroups in 0 secs
>> users
>    user    V1
> 1:    3  TRUE
> 2:    4 FALSE
>> setnames(users, "V1", "behavior")
> --8<---------------cut here---------------end--------------->8---
>
> Now I want to do the same thing as in
>
> http://stackoverflow.com/questions/16200815/summarize-a-data-table-with-unreliable-data
> for both fields
>> fields <- c("country","language")
>
> here is what I tried so far:
>
> --8<---------------cut here---------------start------------->8---
> dt[, .N, .SDcols=fields, by=eval(list("user",fields))]
> Error in `[.data.table`(dt, , .N, .SDcols = fields, by =
> eval(list("user", :
> The items in the 'by' or 'keyby' list are length (1,2). Each must
> be same length as rows in x or number of rows returned by i (10).
> Calls: [ -> [.data.table
> --8<---------------cut here---------------end--------------->8---
>
> the idea is to do something like
>
> --8<---------------cut here---------------start------------->8---
>> dt.out <- dt[, .N, by=list(user,country)][,
>> list(country[which.max(N)], max(N)/sum(N)), by=user]
>> setnames(dt.out, c("V1", "V2"), paste0("country",c(".name",
>> ".support")))
>> users <- users[dt.out]
>    user behavior country.name country.support
> 1:    3     TRUE            2             1.0
> 2:    4    FALSE            1             0.8
> --8<---------------cut here---------------end--------------->8---
>
> except that I do not want to have the literal "country" and "language",
> and that I am sure there is a way to avoid copying users in
>> users <- users[dt.out]
> by a ":=" trick.
>
> Thanks.
>
>> * Matthew Dowle <zqbjyr at zqbjyr.cyhf.pbz> [2013-04-24 21:54:17
>> +0100]:
>>
>> where ... is eval(myid)
>> iigc
>>> Or:
>>> DT[,lapply(.SD,sum),by=...,.SDcols=myvars]