[datatable-help] using sample() in data.table

Tue Jun 19 16:24:12 CEST 2012

The shuffling can form a different number of groups, can't it?

table(c(1,1,2,2), c(3,3,4,4))   # 2 groups
table(c(2,2,1,1), c(3,3,4,4))   # 2 groups
table(c(2,1,2,1), c(3,3,4,4))   # 4 groups
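
The same effect can be reproduced in data.table itself. This is only a minimal sketch on toy data: DT, the shuffled column and the fixed permutation c(3, 1, 4, 2) are made up for illustration (a fixed permutation stands in for sample() so the result is reproducible), and none of it comes from the real SPFdt.

```r
library(data.table)

# Toy data (hypothetical, not the poster's SPFdt)
DT <- data.table(wdpaint = c(1, 1, 2, 2), pnvid = c(3, 3, 4, 4))

# Unshuffled: only the combinations (1,3) and (2,4) occur
orig <- DT[, .N, by = list(wdpaint, pnvid)]

# A fixed permutation standing in for sample(wdpaint); gives 2, 1, 2, 1
DT[, shuffled := wdpaint[c(3, 1, 4, 2)]]

# Shuffled: all four combinations now occur, once each
perm <- DT[, .N, by = list(shuffled, pnvid)]

nrow(orig)   # number of groups before shuffling
nrow(perm)   # number of groups after shuffling
sum(perm$N)  # total count is unchanged by shuffling
```

Grouping with by= only returns the combinations that are actually present, so the number of rows depends on how the shuffle happens to pair the values.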

> Thanks Matthew
>
> I am not sure I understand the code (actually, I am sure I do not :-( ).
> More specifically, I would expect the two expressions below to yield tables
> of the same dimension (basically all combinations of wdpaint and pnvid):
>
> aa <- SPFdt[, .N, by=list(sample(wdpaint,replace=FALSE),pnvid)]
> dim(aa)
>> 254  3
> bb <- SPFdt[, .N, by=list(wdpaint,pnvid)]
> dim(bb)
>> 170 3
>
> What I am looking for is creating a cross table of pnvid and wdpaint, i.e.,
> the frequency or number of occurrences of each combination of pnvid and
> wdpaint. Shuffling wdpaint should give in that case a different frequency
> distribution, like in the example below:
>
> table(c(1,1,2,2), c(3,3,4,4))
> table(c(2,2,1,1), c(3,3,4,4))
>
> Basically, what I want to do is run X permutations on the data set, which I
> will then use to create a confidence interval on the frequency distribution
> of sample points over wdpaint and pnvid.
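
One possible way to set up such a permutation run in data.table is sketched below. It is a sketch under assumptions, not a tested recipe for the real data: the simulated SPFdt, n_perm, and the names grid, perms, full and ci are all invented here, and the 2.5%/97.5% quantiles are just one common choice of interval. Because by= drops combinations that do not occur in a given shuffle, the counts are joined onto a full grid from CJ() and the missing ones are set to zero, so every permutation has the same dimension.

```r
library(data.table)

set.seed(1)  # only to make the sketch reproducible

# Hypothetical stand-in for SPFdt: wdpaint in 1:10, pnvid in 1:3
SPFdt <- data.table(wdpaint = sample(1:10, 200, replace = TRUE),
                    pnvid   = sample(1:3,  200, replace = TRUE))

n_perm <- 100L  # number of permutations (assumed value)

# Count each observed (wdpaint, pnvid) combination after shuffling wdpaint
perms <- rbindlist(lapply(seq_len(n_perm), function(i) {
  tmp <- data.table(wdpaint = sample(SPFdt$wdpaint), pnvid = SPFdt$pnvid)
  cnt <- tmp[, .N, by = list(wdpaint, pnvid)]
  data.table(perm = i, cnt)
}))

# Full grid of permutation x combination, so absent combinations become 0
grid <- CJ(perm = seq_len(n_perm), wdpaint = 1:10, pnvid = 1:3)
full <- merge(grid, perms, by = c("perm", "wdpaint", "pnvid"), all.x = TRUE)
full[is.na(N), N := 0L]

# Permutation interval of the count for each combination
ci <- full[, list(lower = quantile(N, 0.025),
                  upper = quantile(N, 0.975)),
           by = list(wdpaint, pnvid)]
```

The observed counts from SPFdt[, .N, by=list(wdpaint,pnvid)] could then be compared against lower and upper for each combination.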
>
> Cheers,
>
> Paulo
>
> On Tue, Jun 19, 2012 at 3:30 PM, Matthew Dowle
> <mdowle at mdowle.plus.com> wrote:
>
>>
>> Hi,
>>
>> Welcome to the list.
>>
>> Rather than picking a column and calling length() on it, .N is a little
>> more convenient (and faster if that column isn't otherwise used, as in
>> this example). Search ?data.table for the string ".N" to find out more.
>>
>> And to group by expressions of column names, wrap with list().  So,
>>
>>    SPF[, .N, by=list(sample(wdpaint,replace=FALSE),pnvid)]
>>
>> But that won't calculate any different statistics, just return the groups
>> in a different order. Seems like just an example, rather than the real
>> task, iiuc, which is fine of course.
>>
>> Matthew
>>
>>
>> > Hi, I am new to this package and not sure how to implement the sample()
>> > function with data.table.
>> >
>> > I have a data frame SPF with three columns: cat, pnvid and wdpaint. The
>> > pnvid variable has values 1:3, and wdpaint has values 1:10. I am
>> > interested in the count of all combinations of wdpaint and pnvid in my
>> > data set, which can be calculated using table or tapply (I use the latter
>> > in the example code below).
>> >
>> > Normally I would use something like:
>> >
>> >     c <- tapply(SPF$cat,
>> >                 list(as.factor(SPF$pnvid), as.factor(SPF$wdpaint)),
>> >                 function(x) length(x))
>> >
>> > If I understand correctly, I would use the below when working with data
>> > tables:
>> >
>> >     f <- SPF[,length(cat),by="wdpaint,pnvid"]
>> >
>> > But what if I want to reshuffle the column wdpaint first? When using
>> > tapply, it would be something along the lines of:
>> >
>> >     a <- list(as.factor(SPF$pnvid),
>> >               as.factor(sample(SPF$wdpaint, replace=FALSE)))
>> >     c <- tapply(SPF$cat, a, function(x) length(x))
>> >
>> >
>> > But how to do this with data.table?
>> >
>> > Paulo