[datatable-help] Idea/feature request

Andreas Borg andreas.borg at unimedizin-mainz.de
Wed May 11 10:24:13 CEST 2011

Hi Steve,

> Now that you've brought this back up, what do you think you would
> prefer? For example, using my (admittedly contrived) original example:
> result <- some.big.data.table[, by=list(colA, colB), {
>  ## Sometimes I want to know what the current values of
>  ## colA and colB are in here to get some more info. Maybe
>  ## we can have .BY:
>  xref <- more.data[J(.BY[1], .BY[2]), mult='all'] ## or something
>  ## ...
> }]
> Should it be `J(.BY[1], .BY[2])` or is something like `J(colA, colB)`
> more natural, you think?
'J(colA, colB)' is perfect if you know the column names in advance. This 
is not true in my case. I created a minimal example for a possible 
application for a '.BY' construct:

 > dt <- data.table(x=c(0,1,0,1), y=c(1,0,1,0))
 > dt
     x y
[1,] 0 1
[2,] 1 0
[3,] 0 1
[4,] 1 0

From this table, I want the row sum for each group, i.e. "select x + y 
from dt group by x, y" in SQL. In data.table, this would be:

 > setkey(dt, x, y)
 > dt[,sum(x[1], y[1]), by=list(x,y)]
     x y V1
[1,] 0 1  1
[2,] 1 0  1

But what if dt can have an arbitrary number of (grouping) columns with 
arbitrary names? If the grouping columns are given as

groupCols <- c("x", "y")

then the following is possible:

 > expr <- parse(text = sprintf("sum(%s)",
 +     paste(groupCols, "[1]", sep="", collapse=", ")))
 > dt[,eval(expr), by=groupCols]
     x y V1
[1,] 0 1  1
[2,] 1 0  1

Now, this is certainly uglier than

 > dt[, sum(.BY), by = groupCols]
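
As a side note on the eval(parse(...)) workaround above: the string pasting can be avoided by building the same expression from call objects. This is only a minimal sketch, assuming the same dt and groupCols as in the example; the names args, expr and res are illustrative, and the approach is a variation on the workaround, not a substitute for a '.BY' construct.

```r
library(data.table)

dt <- data.table(x = c(0, 1, 0, 1), y = c(1, 0, 1, 0))
groupCols <- c("x", "y")

# Build one `col[1]` call per grouping column, without any string parsing.
args <- lapply(groupCols, function(col) call("[", as.name(col), 1))

# Assemble sum(x[1], y[1]) as an unevaluated call object.
expr <- as.call(c(as.name("sum"), args))

# Evaluate the constructed call once per group.
res <- dt[, eval(expr), by = groupCols]
print(res)
```

The constructed expr deparses to the same code that the sprintf/paste version produces, so the grouping call itself is unchanged.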

My actual application is applying decision tree models (rpart) to a 
large number of binary patterns. To save computation time, I classify 
each distinct pattern only once: I group by all attributes and apply 
the model once to each group.
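
The "classify each distinct pattern only once" idea can be sketched as follows. This is not the actual application code: the toy data, the column names a, b and label, and the helper names patterns, predicted and result are all made up for illustration, and the sketch uses unique() plus merge() instead of a grouped call.

```r
library(data.table)
library(rpart)

# Hypothetical toy data: two binary attributes and a made-up class label.
train <- data.table(a = rep(c(0, 1, 0, 1), 50), b = rep(c(0, 0, 1, 1), 50))
train[, label := factor(ifelse(a == 1 & b == 1, "match", "nonmatch"))]

# Fit a classification tree on the toy data.
model <- rpart(label ~ a + b, data = train, method = "class")

groupCols <- c("a", "b")

# Reduce the table to one row per distinct pattern.
patterns <- unique(train[, groupCols, with = FALSE])

# Apply the model once per distinct pattern instead of once per row.
predicted <- predict(model, newdata = as.data.frame(patterns), type = "class")
patterns[, pred := predicted]

# Carry the per-pattern predictions back to every original row.
result <- merge(train, patterns, by = groupCols)
```

With many rows but few distinct patterns, predict() only sees the short patterns table, which is where the time saving would come from.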


Andreas Borg
Medizinische Informatik

der Johannes Gutenberg-Universität
Institut für Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Straße 69, 55131 Mainz

Telefon +49 (0) 6131 175062
E-Mail: borg at imbei.uni-mainz.de
