[datatable-help] Filtering Based on Previous Observation

Gabor Grothendieck ggrothendieck at gmail.com
Wed Apr 30 14:00:10 CEST 2014


On Tue, Apr 29, 2014 at 10:04 AM, Michael Smith <my.r.help at gmail.com> wrote:
> All,
>
> Is there some data.table-idiomatic way to filter based on a previous
> observation/row? For example, I want to remove a row if
> DT$a[row]==DT$a[row-1].
>
> It could be done by first calculating the lag and then filtering based
> on that, but I wonder if there's a more direct way.
>
> The following example works, but my feeling is there should be a more
> elegant solution:
>
> ( DT <- data.table(a = c(1, 2, 2, 3), b = 8:5) )
> DT[, L.a := c(NA, head(a, -1))][a != L.a | is.na(L.a)][, L.a := NULL][]

If equal values of `a` always appear in consecutive runs, then the
following would work.

(That is satisfied, for example, if `a` is sorted in ascending order, as
in the example, or in descending order. If DT is keyed on 'a', it is
always the case.)

DT[ !duplicated(a) ]

Note that 'a' need not be numeric.
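The caveat matters because duplicated() removes every later occurrence of
a value, not only the one directly after it. Below is a small sketch that
contrasts it with a direct "compare with the previous row" filter. That
filter needs no helper column and no sorting. The helper name
keep_changes and the second table DT2 are hypothetical and exist only for
illustration. The sketch also assumes `a` contains no NA values.

```r
library(data.table)

# Data from the question, plus a table where "x" recurs non-consecutively.
DT  <- data.table(a = c(1, 2, 2, 3), b = 8:5)
DT2 <- data.table(a = c("x", "y", "y", "x"), b = 1:4)

# duplicated(): keeps only the first occurrence of each value anywhere.
dup1 <- DT[!duplicated(a)]    # rows 1, 2, 4 (runs are consecutive, so fine)
dup2 <- DT2[!duplicated(a)]   # rows 1, 2 only: the final "x" is dropped too

# Hypothetical helper: TRUE for row 1, then TRUE wherever a value
# differs from the value in the row before it.
keep_changes <- function(x) c(TRUE, x[-1L] != x[-length(x)])

run1 <- DT[keep_changes(a)]   # rows 1, 2, 4
run2 <- DT2[keep_changes(a)]  # rows 1, 2, 4: the final "x" is kept
```

Both filters agree on the sorted example. They only differ when a value
reappears after a different one, as in DT2.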

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

