[datatable-help] data.table and reshape

Matthew Dowle mdowle at mdowle.plus.com
Thu Aug 4 04:47:12 CEST 2011


Hi,

I don't know reshape/dcast/melt well, so thanks Dennis. I've linked this
thread to the feature request (FR) on it. This area seems to be coming up
quite a bit.

https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1055&group_id=240&atid=978

Matthew

On Mon, 2011-08-01 at 18:14 -0700, Dennis Murphy wrote:
> Hi:
> 
> An alternative to the reshape() function is the reshape package (or
> the enhanced reshape2 package). Since data tables also inherit from
> the data.frame class, the reshape package plays nicely with them.
> Here's how it would look using the cast() function in the reshape
> package:
> 
> library(reshape)
> cast(out, x ~ y, value = 'SUM')
>   x  AA  BB
> 1 a  72 123
> 2 b  84 119
> 3 c 162  96
> 
> The variables you want in the rows ('id' variables) are listed on the
> LHS of the formula, and the 'timevar' variable is on the RHS. The
> value variable is the 'dependent' variable, for lack of a better
> term.
> 
> The dcast() function in the reshape2 package is preferred because it
> has a few extra options that come in handy on occasion - e.g., a means
> of optionally setting a value when a cell in the reshaped data frame
> is empty rather than filling it with NA. The code in this case is
> almost identical:
> 
> > dcast(out, x ~ y, value_var = 'SUM')
>   x  AA  BB
> 1 a  72 123
> 2 b  84 119
> 3 c 162  96
> 
> There are some differences between the output of dcast() and that of
> the base reshape() function, though:
> 
> > str(dcast(out, x ~ y, value = 'SUM'))
> Using SUM as value column: use value_var to override.
> 'data.frame':   3 obs. of  3 variables:
>  $ x : Factor w/ 3 levels "a","b","c": 1 2 3
>  $ AA: int  72 84 162
>  $ BB: int  123 119 96
> > str(reshape(out, direction='wide', idvar='x', timevar='y'))
> Classes ‘data.table’ and 'data.frame':  3 obs. of  3 variables:
>  $ x     : Factor w/ 3 levels "a","b","c": 1 2 3
>  $ SUM.AA: int  72 84 162
>  $ SUM.BB: int  123 119 96
>  - attr(*, "reshapeWide")=List of 5
>   ..$ v.names: NULL
>   ..$ timevar: chr "y"
>   ..$ idvar  : chr "x"
>   ..$ times  : Factor w/ 2 levels "AA","BB": 1 2
>   ..$ varying: chr [1, 1:2] "SUM.AA" "SUM.BB"
> > str(as.data.table( dcast(out, x ~ y, value = 'SUM')))
> Using SUM as value column: use value_var to override.
> Classes ‘data.table’ and 'data.frame':  3 obs. of  3 variables:
>  $ x : Factor w/ 3 levels "a","b","c": 1 2 3
>  $ AA: int  72 84 162
>  $ BB: int  123 119 96
> 
> As you can see, the last call (dcast() wrapped in as.data.table())
> retains both classes but does not create the "reshapeWide" attribute
> that the reshape() function does. You can decide which best suits your
> purposes.
> 
> HTH,
> Dennis
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
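
A minimal sketch of the formula layout and the empty-cell option described
in the quoted message. The `out` object is not shown in this thread, so the
data frame below is a hypothetical stand-in rebuilt from the printed
results, and `sparse` is an illustrative name. It assumes a recent reshape2
release, where the argument is spelled value.var (the 2011 release quoted
above used value_var).

```r
library(reshape2)

# Hypothetical stand-in for `out`, rebuilt from the printed results in the
# quoted message; the original object came from earlier in the thread.
out <- data.frame(
  x   = factor(c("a", "a", "b", "b", "c", "c")),
  y   = factor(c("AA", "BB", "AA", "BB", "AA", "BB")),
  SUM = c(72L, 123L, 84L, 119L, 162L, 96L)
)

# Drop the (a, BB) row so the wide table has one empty cell.
sparse <- out[-2, ]

# x runs down the rows (LHS), the levels of y become columns (RHS),
# and SUM supplies the cell values.
wide_na   <- dcast(sparse, x ~ y, value.var = "SUM")            # empty cell left as NA
wide_zero <- dcast(sparse, x ~ y, value.var = "SUM", fill = 0)  # empty cell set to 0

# Convert back if a data.table is wanted (requires the data.table package):
# wide_dt <- data.table::as.data.table(wide_zero)
```

With fill left at its default the missing combination comes back as NA;
passing fill = 0 replaces it with 0 instead.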



