[datatable-help] number of rows selected in .SD subset
Ben Tupper
btupper at bigelow.org
Thu Jan 22 18:11:34 CET 2015
Hello,
I have been learning to use data.table by studying the introductory vignette located here:
https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-intro-vignette.html
Section 2f shows how to subset a data.table to select an arbitrary number of rows from each group's .SD. That's really handy.
2. Aggregations
   f. Subset .SD for each group:
      ans <- flights[, head(.SD, 2), by = month]
In a similar way, I can get the last row of each .SD using tail(), nrow(), or dim() (I don't think it matters much, but dim() seems to be slightly faster*).
ans <- flights[,.SD[dim(.SD)[1]], by=month]
I got to wondering whether the number of rows in .SD is exposed in each grouping iteration. Is there an equivalent of .N for the subset data.table .SD, something like .SDN?
Thanks for data.table!
Ben
* After reading this discussion http://r.789695.n4.nabble.com/What-is-the-fastest-way-to-determine-that-data-table-is-empty-td4638348.html#a4638451 I tried three methods for getting the last row of each group: tail(), dim(), and nrow().
# using tail
> microbenchmark( last1 <- flights[, tail(.SD, 1), by=month] )
Unit: milliseconds
                                         expr      min       lq     mean   median       uq      max neval
 last1 <- flights[, tail(.SD, 1), by = month] 16.65898 16.89704 18.26415 17.37007 19.20147 40.12966   100
# using dim
> microbenchmark( last2 <- flights[,.SD[dim(.SD)[1]], by=month] )
Unit: milliseconds
                                             expr      min       lq     mean   median       uq      max neval
 last2 <- flights[, .SD[dim(.SD)[1]], by = month] 15.51243 15.87788 17.40978 16.19426 17.83308 59.22429   100
# using nrow
> microbenchmark( last3 <- flights[,.SD[nrow(.SD)], by=month] )
Unit: milliseconds
                                           expr      min       lq     mean   median       uq      max neval
 last3 <- flights[, .SD[nrow(.SD)], by = month] 15.63919 15.92073 17.28836 16.52588 18.33867 24.92624   100
> identical(last1, last2)
[1] TRUE
> identical(last1, last3)
[1] TRUE
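For anyone without the vignette's flights data at hand, the same three ways of taking the last row of each group can be checked on a tiny table. This is a minimal sketch: the table `dt` and its `delay` column are hypothetical stand-ins for `flights`, and it uses all.equal() rather than identical() as a slightly more forgiving equality check.

```r
library(data.table)

# Hypothetical toy stand-in for the vignette's `flights` table
dt <- data.table(
  month = c(1L, 1L, 1L, 2L, 2L, 3L),
  delay = c(5, -2, 10, 0, 7, 3)
)

# Last row of .SD within each month, three ways
last1 <- dt[, tail(.SD, 1), by = month]          # via tail()
last2 <- dt[, .SD[dim(.SD)[1]], by = month]      # via dim()
last3 <- dt[, .SD[nrow(.SD)], by = month]        # via nrow()

print(last1)

# All three approaches should give the same result
stopifnot(isTRUE(all.equal(last1, last2)),
          isTRUE(all.equal(last1, last3)))
```

On a table this small the timing differences are meaningless; the point is only that the three expressions select the same rows.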
Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org