[datatable-help] checking an approach to filtering rows in a data.table

Vincent Carey stvjc at channing.harvard.edu
Mon Mar 10 04:33:12 CET 2014


I have looked around for code on row filtering with data.table, but have
not found anything addressing this use case.

Within each group, I want to retrieve the rows satisfying a certain
condition, in this case the row having the maximum value of a specific
variable.  The following seems to work, but I wonder if there is a more
direct approach.

rowsWmaxVinG = function(dt, V, by) {
#
# filter dt to the rows possessing the max value of
# variable V within groups formed using by
# (one row per group is returned, even if the max is tied)
#
# example: library(data.table); data(mtcars)
# ddt = data.table(mtcars)
#> rowsWmaxVinG( ddt, by="cyl", V="mpg")
#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1: 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#2: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#3: 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#
 setkeyv(dt, c(by, V)) # sort by group, then by V (this re-keys dt by reference)
 dt[ cumsum(dt[, .N, by=by]$N), ]  # take last row from each group
}
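
For comparison, here is one possibly more direct formulation. It is only a
sketch, not a confirmed answer from the list. It assumes that picking the
first maximum per group with which.max is acceptable, and the function name
maxRowPerGroup is made up for illustration. It collects row numbers with .I
instead of re-keying the table, so dt is left in its original order.

```r
library(data.table)

# Hypothetical helper (name not from the original post): returns one row per
# group, namely the row holding the maximum of column V.
maxRowPerGroup = function(dt, V, by) {
  # .SDcols = V restricts .SD to the single column V, so .SD[[1L]] is that column.
  # .I holds the row numbers of dt for the current group; which.max picks the
  # position of the first maximum within the group.
  idx = dt[, .I[which.max(.SD[[1L]])], by = by, .SDcols = V]$V1
  dt[idx]
}

ddt = data.table(mtcars)
res = maxRowPerGroup(ddt, V = "mpg", by = "cyl")
```

Unlike the setkeyv version, the rows come back in order of each group's first
appearance in dt rather than sorted by the grouping variable, and a tie is
resolved in favour of the first tied row in the group.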