[datatable-help] calculating lagged values of panel data using data.table

Fri Jul 15 11:33:39 CEST 2011

Dear all,

I found out about the data.table package yesterday, and I've been very
impressed with the speed so far.

I'm currently trying to calculate lagged values of a variable for some
panel data set. The approach I take is the one below, but I'm not sure
whether that is correct, and whether it is safe to do in this way
(e.g. if data is re-ordered).

Hopefully, one of you will be able to guide me in the right direction.

Many thanks in advance,
Jelmer

# create some data, with missing periods
DT <- data.table( 'id'=c(1,1,1,1,2,2,2), 'time'=c(1,2,4,5,2,3,4),
'value'=c(7,6,5,4,3,2,1) )
setkey( DT, id, time )

# extract id and time values to construct data.table with keys,
including missing periods
id.vec <- unique( DT$id )
time.vec <- unique( DT$time )
num.periods <- max( time.vec ) - min( time.vec ) + 1
tmp.keys <- data.table( 'id'=rep( id.vec, each=num.periods ),
'time'=rep( time.vec, length(id.vec) ) )
setkey( tmp.keys, id, time )

# show data
DT                                                  # original data
DT[ tmp.keys ]                                      # fill in missing
periods with NA

# shift 'value' column by id to get lagged 'value'
DT[ tmp.keys ][,
list(time,value,'value.lag'=c(NA,value[-length(value)])), by=id ]