[datatable-help] data.table and reshape

Dennis Murphy djmuser at gmail.com
Tue Aug 2 03:14:13 CEST 2011


Hi:

An alternative to the reshape() function is the reshape package (or
the enhanced reshape2 package). Since a data table also inherits from
the data.frame class, the reshape package plays nicely with it. Here's
how it would look using the cast() function in the reshape package:

library(reshape)
cast(out, x ~ y, value = 'SUM')
  x  AA  BB
1 a  72 123
2 b  84 119
3 c 162  96

The variables you want in the rows (the 'id' variables) are listed on
the LHS of the formula, the 'timevar' variable, whose values become the
new column names (AA and BB here), goes on the RHS, and the value
variable is the 'dependent' variable, for lack of a better term.
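
For comparison, the same roles can be spelled out with base reshape(),
where idvar plays the part of the formula's LHS and timevar the part of
the RHS. This is only a minimal sketch: the original `out` is not shown
in this thread, so the data below is a hypothetical long-format stand-in
reconstructed from the printed result, built as a plain data frame.

```r
# Hypothetical stand-in for `out` (assumed from the printed table, not the
# poster's actual object)
out <- data.frame(
  x   = rep(c("a", "b", "c"), each = 2),
  y   = rep(c("AA", "BB"), times = 3),
  SUM = c(72L, 123L, 84L, 119L, 162L, 96L)
)

# idvar ~ LHS of the cast formula, timevar ~ RHS;
# SUM is the only column left over, so it becomes the value variable
wide <- reshape(out, direction = "wide", idvar = "x", timevar = "y")
names(wide)
```

With this input the value columns come back prefixed with the value
variable's name (SUM.AA, SUM.BB), whereas cast() uses the bare levels
of y.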

The dcast() function in the reshape2 package is preferred because it
has a few extra options that come in handy on occasion - e.g., a means
of optionally setting a value when a cell in the reshaped data frame
is empty rather than filling it with NA. The code in this case is
almost identical:

> dcast(out, x ~ y, value_var = 'SUM')
  x  AA  BB
1 a  72 123
2 b  84 119
3 c 162  96
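
To illustrate the empty-cell option mentioned above, here is a sketch
using dcast()'s fill argument. Assumptions: the data is a hypothetical
stand-in for `out` with the (c, BB) row deliberately left out, and a
current reshape2 release is installed, where as far as I know the value
column argument is spelled value.var rather than the value_var used in
this 2011 session.

```r
library(reshape2)  # assumes a current reshape2 release is installed

# Hypothetical long data: the (c, BB) combination is missing on purpose
out2 <- data.frame(
  x   = c("a", "a", "b", "b", "c"),
  y   = c("AA", "BB", "AA", "BB", "AA"),
  SUM = c(72L, 123L, 84L, 119L, 162L)
)

# Default behaviour: the empty cell is filled with NA
with_na <- dcast(out2, x ~ y, value.var = "SUM")

# fill = 0 puts a 0 in the empty cell instead
with_zero <- dcast(out2, x ~ y, value.var = "SUM", fill = 0)
```

Comparing with_na and with_zero, only the BB entry in the row for c
should differ.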

There are some differences between the output of dcast() and that of
the base reshape() function, though:

> str(dcast(out, x ~ y, value = 'SUM'))
Using SUM as value column: use value_var to override.
'data.frame':   3 obs. of  3 variables:
 $ x : Factor w/ 3 levels "a","b","c": 1 2 3
 $ AA: int  72 84 162
 $ BB: int  123 119 96
> str(reshape(out, direction='wide', idvar='x', timevar='y'))
Classes ‘data.table’ and 'data.frame':  3 obs. of  3 variables:
 $ x     : Factor w/ 3 levels "a","b","c": 1 2 3
 $ SUM.AA: int  72 84 162
 $ SUM.BB: int  123 119 96
 - attr(*, "reshapeWide")=List of 5
  ..$ v.names: NULL
  ..$ timevar: chr "y"
  ..$ idvar  : chr "x"
  ..$ times  : Factor w/ 2 levels "AA","BB": 1 2
  ..$ varying: chr [1, 1:2] "SUM.AA" "SUM.BB"
> str(as.data.table( dcast(out, x ~ y, value = 'SUM')))
Using SUM as value column: use value_var to override.
Classes ‘data.table’ and 'data.frame':  3 obs. of  3 variables:
 $ x : Factor w/ 3 levels "a","b","c": 1 2 3
 $ AA: int  72 84 162
 $ BB: int  123 119 96

As you can see, the last call, which wraps dcast() in as.data.table(),
retains both classes but does not create the reshapeWide attribute
that the reshape() function does. You can decide which best suits your
purposes.
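
If the reshape() result is otherwise what you want, the extra attribute
and the SUM. prefix can be stripped afterwards with base R. A minimal
sketch, again on a hypothetical plain-data-frame stand-in for `out`
(the real object in this thread is a data.table and is not shown):

```r
# Hypothetical stand-in for `out`, reconstructed from the printed table
out <- data.frame(
  x   = rep(c("a", "b", "c"), each = 2),
  y   = rep(c("AA", "BB"), times = 3),
  SUM = c(72L, 123L, 84L, 119L, 162L, 96L)
)

wide <- reshape(out, direction = "wide", idvar = "x", timevar = "y")

attr(wide, "reshapeWide") <- NULL               # drop reshape()'s bookkeeping attribute
names(wide) <- sub("^SUM\\.", "", names(wide))  # SUM.AA -> AA, SUM.BB -> BB
rownames(wide) <- NULL                          # renumber the rows 1..n
str(wide)
```

After these three steps the result has the same column names as the
dcast() output and no reshapeWide attribute.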

HTH,
Dennis

