[datatable-help] number of rows selected in .SD subset

Ben Tupper btupper at bigelow.org
Thu Jan 22 18:11:34 CET 2015


Hello,

I have been learning to use data.table and studying the vignette located here...

https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-intro-vignette.html

Section 2f. shows how to subset a data.table to select an arbitrary number of rows in each .SD.  That's really handy.

2. Aggregations
  f. Subset .SD for each group:    ans <- flights[, head(.SD, 2), by=month]

In a similar way, I can get the last row of each .SD using tail, nrow or dim (I don't think it matters much, but dim seems to be marginally faster*).

  ans <- flights[,.SD[dim(.SD)[1]], by=month]

That got me wondering whether the number of rows in .SD is exposed in each grouping iteration.  Is there an equivalent of .N for the subset data.table, .SD, something like .SDN?
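
As a quick check of whether .N might already serve this purpose, the sketch below compares .N with nrow(.SD) inside each group. As far as I can tell from the documentation of the special symbols, .N used in j together with by is the number of rows in the current group. The toy table dt and its column delay are made up for illustration and are not part of the flights data.

```r
library(data.table)

# Hypothetical toy table standing in for `flights`; the column names are made up.
dt <- data.table(month = c(1L, 1L, 1L, 2L, 2L),
                 delay = c(5, 7, 9, 11, 13))

# With by=, .N is the row count of the current group, so .SD[.N] picks its last row.
last_n    <- dt[, .SD[.N], by = month]
last_nrow <- dt[, .SD[nrow(.SD)], by = month]

# Compare .N with nrow(.SD) group by group.
counts <- dt[, .(n = .N, sd_rows = nrow(.SD)), by = month]
print(counts)
```

If the two columns of counts agree, then .SD[.N] would be a shorter spelling of .SD[nrow(.SD)].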

Thanks for data.table!

Ben

* After reading this discussion, http://r.789695.n4.nabble.com/What-is-the-fastest-way-to-determine-that-data-table-is-empty-td4638348.html#a4638451 , I tried a few ways of getting the last row of each group, using tail(), dim() and nrow().

# using tail
> microbenchmark( last1 <- flights[, tail(.SD, 1), by=month] )
Unit: milliseconds
                                         expr      min       lq     mean   median       uq      max neval
 last1 <- flights[, tail(.SD, 1), by = month] 16.65898 16.89704 18.26415 17.37007 19.20147 40.12966   100

# using dim
> microbenchmark( last2 <- flights[, .SD[dim(.SD)[1]], by=month] )
Unit: milliseconds
                                             expr      min       lq     mean   median       uq      max neval
 last2 <- flights[, .SD[dim(.SD)[1]], by = month] 15.51243 15.87788 17.40978 16.19426 17.83308 59.22429   100

# using nrow
> microbenchmark( last3 <- flights[, .SD[nrow(.SD)], by=month] )
Unit: milliseconds
                                           expr      min       lq     mean   median       uq      max neval
 last3 <- flights[, .SD[nrow(.SD)], by = month] 15.63919 15.92073 17.28836 16.52588 18.33867 24.92624   100

> identical(last1, last2)
[1] TRUE
> identical(last1, last3)
[1] TRUE
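
For a side-by-side comparison, the three variants could also be timed in a single microbenchmark call. This is only a sketch: dt is a synthetic table (hypothetical, not the flights data), and the labels tail_sd, dim_sd and nrow_sd are names chosen here, so the timings will not match the ones above.

```r
library(data.table)
library(microbenchmark)

# Synthetic stand-in for `flights`: 10,000 rows spread over 12 months.
set.seed(1)
dt <- data.table(month = sample(1:12, 1e4, replace = TRUE),
                 delay = rnorm(1e4))

# Time the three ways of taking the last row of each group in one call.
timings <- microbenchmark(
  tail_sd = dt[, tail(.SD, 1), by = month],
  dim_sd  = dt[, .SD[dim(.SD)[1]], by = month],
  nrow_sd = dt[, .SD[nrow(.SD)], by = month],
  times = 20
)
print(timings)
```

Timing all three in one call puts them in a single table, which makes small differences easier to judge against run-to-run noise.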

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org