[datatable-help] using sample() in data.table

Paulo van Breugel p.vanbreugel at gmail.com
Tue Jun 19 16:13:44 CEST 2012


Thanks Matthew

I am not sure I understand the code (actually, I am sure I do not :-( ).
More specifically, I would expect the two expressions below to yield tables
of the same dimension (basically all combinations of wdpaint and pnvid):

aa <- SPFdt[, .N, by=list(sample(wdpaint,replace=FALSE),pnvid)]
dim(aa)
> 254  3
bb <- SPFdt[, .N, by=list(wdpaint,pnvid)]
dim(bb)
> 170 3

What I am looking for is a cross table of pnvid and wdpaint, i.e.,
the frequency or number of occurrences of each combination of pnvid and
wdpaint. Shuffling wdpaint should in that case give a different frequency
distribution, as in the example below:

table(c(1,1,2,2), c(3,3,4,4))
table(c(2,2,1,1), c(3,3,4,4))

Basically, what I want to do is run X permutations on a data set, which I
will then use to create a confidence interval on the frequency distribution
of sample points over wdpaint and pnvid.
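
To make the intended workflow concrete, here is a minimal sketch of such a permutation run. It is only an illustration under stated assumptions: the toy SPFdt, the seed, the number of permutations (n_perm), the helper perm_counts() and the 95% quantile interval are all hypothetical choices, not part of my real data or of the data.table package. It relies on the by= expression being evaluated on the whole table, so sample(wdpaint) shuffles wdpaint relative to pnvid before the groups are counted.

```r
library(data.table)

set.seed(42)  # hypothetical seed, only so the sketch is reproducible

# Toy stand-in for SPFdt (assumed layout: cat, pnvid in 1:3, wdpaint in 1:10)
SPFdt <- data.table(cat     = 1:300,
                    pnvid   = sample(1:3, 300, replace = TRUE),
                    wdpaint = sample(1:10, 300, replace = TRUE))

# Observed count for each combination of wdpaint and pnvid
obs <- SPFdt[, .N, by = list(wdpaint, pnvid)]

# One permutation: shuffle wdpaint over all rows, then count the combinations
perm_counts <- function(dt) {
  dt[, .N, by = list(wdpaint = sample(wdpaint), pnvid)]
}

n_perm <- 100  # hypothetical number of permutations (the "X" above)
perms <- rbindlist(lapply(seq_len(n_perm), function(i) {
  one <- perm_counts(SPFdt)
  one[, perm := i]  # remember which permutation each row came from
  one
}))

# Per-combination 2.5% and 97.5% quantiles of the permuted counts
ci <- perms[, list(lo = as.numeric(quantile(N, 0.025)),
                   hi = as.numeric(quantile(N, 0.975))),
            by = list(wdpaint, pnvid)]
```

One caveat with this sketch: a combination that does not occur in a given permutation is simply absent from that permutation's result rather than counted as zero, so the quantiles for rare combinations would be biased upwards unless the zeros are filled in first.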

Cheers,

Paulo





On Tue, Jun 19, 2012 at 3:30 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

>
> Hi,
>
> Welcome to the list.
>
> Rather than picking a column and calling length() on it, .N is a little
> more convenient (and faster if that column isn't otherwise used, as in
> this example). Search ?data.table for the string ".N" to find out more.
>
> And to group by expressions of column names, wrap with list().  So,
>
>    SPF[, .N, by=list(sample(wdpaint,replace=FALSE),pnvid)]
>
> But that won't calculate any different statistics, just return the groups
> in a different order. Seems like just an example, rather than the real
> task, iiuc, which is fine of course.
>
> Matthew
>
>
> > Hi, I am new to this package and not sure how to implement the sample()
> > function with data.table.
> >
> > I have a data frame SPF with three columns: cat, pnvid and wdpaint. The
> > pnvid variable has values 1:3, and wdpaint has values 1:10. I am
> > interested in the count of all combinations of wdpaint and pnvid in my
> > data set, which can be calculated using table or tapply (I use the latter
> > in the example code below).
> >
> > Normally I would use something like:
> >
> > *c <- tapply(SPF$cat, list(as.factor(SPF$pnvid), as.factor(SPF$wdpaint)),
> > function(x) length(x))*
> >
> > If I understand correctly, I would use the below when working with data
> > tables:
> >
> > *f <- SPF[,length(cat),by="wdpaint,pnvid"]*
> >
> > But what if I want to reshuffle the column wdpaint first? When using
> > tapply, it would be something along the lines of:
> >
> > *a <- list(as.factor(SPF$pnvid), as.factor(sample(SPF$wdpaint,
> > replace=F)))
> > c <- tapply(SPF$cat, a, function(x) length(x))*
> >
> >
> > But how to do this with data.table?
> >
> > Paulo
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> >
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>