[datatable-help] by row

Gabor Grothendieck ggrothendieck at gmail.com
Sun Jun 29 22:58:50 CEST 2014


There was some discussion of an .EACHI facility for data.table.  I am
not sure what came of that, but I have an example that might be
useful:

http://stackoverflow.com/questions/24472254/splitting-a-column-by-factor-within-a-data-frame/24472571#24472571

which shows the following code, where DT has columns v1, v2 and v3:

DT[, split(v2, v1), by = names(DT)]

It works well if the rows of DT are unique.  If they are not, identical
rows collapse into a single group, so one must do something ugly like
appending a uniquifying column of 1:nrow(DT), say, including that in
by, and finally removing it again at the end.
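
A minimal sketch of that workaround, assuming a small made-up DT in
which v1 is a factor and rows 1 and 2 are identical.  The data and the
helper column name rowid are illustrative only and not from the linked
answer.  data.table may warn that zero-length items in j are padded
with NA:

```r
library(data.table)

# Hypothetical data: rows 1 and 2 are exact duplicates
DT <- data.table(v1 = factor(c("a", "a", "b")),
                 v2 = c(1, 1, 2),
                 v3 = c("x", "x", "y"))

# Grouping by every column merges the duplicate rows into one group
collapsed <- DT[, split(v2, v1), by = names(DT)]

# Workaround: add a unique row id (hypothetical name), group by it as
# well, then drop it from the result
tmp <- copy(DT)[, rowid := seq_len(.N)]
res <- tmp[, split(v2, v1), by = names(tmp)][, rowid := NULL]

nrow(collapsed)   # one row per distinct row of DT
nrow(res)         # one row per original row of DT
```

Working on a copy keeps the helper column out of DT itself, at the
cost of duplicating the data.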

This suggests two features:

1. The ability to group by row, i.e. to treat every row as its own group
2. The ability to selectively omit by variables from the output

For example, if one could use a pseudo column .I in by, and if -.I
meant "do not include it in the output", then one could write:

DT[, split(v2, v1), by = c(names(DT), -.I)]

Other syntaxes are possible; the main suggestion here is the possible
need for these features rather than any specific syntax.

(By the way, is there an intention to move to the issue system on
GitHub for things like this?)

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

