[datatable-help] using sample() in data.table

Matthew Dowle mdowle at mdowle.plus.com
Tue Jun 19 15:30:25 CEST 2012


Hi,

Welcome to the list.

Rather than picking a column and calling length() on it, .N is a little
more convenient (and faster if that column isn't otherwise used, as in
this example). Search ?data.table for the string ".N" to find out more.
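
As a minimal sketch of that equivalence, assuming the data.table package is installed: the toy table below is invented for illustration (it is not the real SPF data), and the names by_length and by_N are hypothetical. The counts are compared by position, since the name of the count column may differ between versions.

```r
library(data.table)

# Hypothetical toy data standing in for SPF; the values are invented.
SPF <- data.table(
  cat     = 1:12,
  pnvid   = rep(1:3, times = 4),
  wdpaint = rep(1:2, each = 6)
)

# Count rows per group by taking length() of a column ...
by_length <- SPF[, length(cat), by = "wdpaint,pnvid"]

# ... or with .N, which does not need to touch any column.
by_N <- SPF[, .N, by = "wdpaint,pnvid"]

# The third column holds the counts in both results.
same_counts <- identical(by_length[[3]], by_N[[3]])
print(same_counts)
```
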

And to group by expressions of column names, wrap with list().  So,

    SPF[, .N, by=list(sample(wdpaint,replace=FALSE),pnvid)]

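Note that the shuffle leaves the totals per wdpaint value and per pnvid
value unchanged; only the pairing between the two columns, and so the
count in each combination, can differ. Seems like just an example, rather
than the real task, if I understand correctly, which is fine of course.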
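
A small sketch of grouping by an expression, again on invented toy data rather than the real SPF. The name wdpaint_shuffled and the set.seed() call are assumptions added only to make the output readable and reproducible. It checks two things that hold for any shuffle: every row is still counted once, and the totals per wdpaint value are unchanged.

```r
library(data.table)

set.seed(42)  # assumption: only here so the shuffle is reproducible

# Hypothetical toy data standing in for SPF; the values are invented.
SPF <- data.table(
  cat     = 1:12,
  pnvid   = rep(1:3, times = 4),
  wdpaint = rep(1:2, each = 6)
)

# Group by an expression: wrap it in list() and, optionally, name it.
shuffled <- SPF[, .N,
                by = list(wdpaint_shuffled = sample(wdpaint, replace = FALSE),
                          pnvid)]

# Every row is still counted exactly once (third column holds the counts).
total_rows <- sum(shuffled[[3]])

# Totals per wdpaint value, before and after the shuffle.
marginal_shuffled <- tapply(shuffled[[3]], shuffled[[1]], sum)
marginal_original <- table(SPF$wdpaint)

print(shuffled)
```
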

Matthew


> Hi, I am new to this package and not sure how to implement the sample()
> function with data.table.
>
> I have a data frame SPF with three columns: cat, pnvid and wdpaint. The
> pnvid variable has values 1:3 and wdpaint has values 1:10. I am
> interested in the count of all combinations of wdpaint and pnvid in my
> data set, which can be calculated using table or tapply (I use the latter
> in the example code below).
>
> Normally I would use something like:
>
>     c <- tapply(SPF$cat, list(as.factor(SPF$pnvid), as.factor(SPF$wdpaint)),
>                 function(x) length(x))
>
> If I understand correctly, I would use the below when working with data
> tables:
>
>     f <- SPF[,length(cat),by="wdpaint,pnvid"]
>
> But what if I want to reshuffle the column wdpaint first? When using
> tapply, it would be something along the lines of:
>
>     a <- list(as.factor(SPF$pnvid),
>               as.factor(sample(SPF$wdpaint, replace=FALSE)))
>     c <- tapply(SPF$cat, a, function(x) length(x))
>
>
> But how to do this with data.table?
>
> Paulo



