Thanks Matthew<br><br>I am not sure I understand the code (actually, I am sure I do not :-( ). More specifically, I would expect the two expressions below to yield tables of the same dimensions (basically all combinations of wdpaint and pnvid):<br>

<br>aa <- SPFdt[, .N, by=list(sample(wdpaint,replace=FALSE),pnvid)]<br>dim(aa)<br>> 254  3<br>bb <- SPFdt[, .N, by=list(wdpaint,pnvid)]<br>dim(bb)<br>> 170 3<br><br>What I am looking for is a cross table of pnvid and wdpaint, i.e., the frequency (number of occurrences) of each combination of pnvid and wdpaint. Shuffling wdpaint should in that case give a different frequency distribution, as in the example below:<br>

<br>table(c(1,1,2,2), c(3,3,4,4))<br>table(c(2,2,1,1), c(3,3,4,4))<br><br>Basically, what I want to do is run X permutations on the data set and then use the results to build a confidence interval for the frequency distribution of sample points over wdpaint and pnvid.<br>
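<br>The permutation idea above could be sketched roughly as follows. This is only a minimal sketch under assumptions: toy is a made-up stand-in for SPFdt, nperm stands for the "X" permutations, and the 2.5% and 97.5% quantiles are just one possible choice of interval. Combinations that never occur in any permutation get no row at all.<br>

```r
library(data.table)

set.seed(1)  # only to make the toy data reproducible

# Hypothetical stand-in for SPFdt: pnvid in 1:3, wdpaint in 1:10
toy <- data.table(pnvid   = sample(1:3,  500, replace = TRUE),
                  wdpaint = sample(1:10, 500, replace = TRUE))

nperm <- 200  # assumed number of permutations (the "X" above)

# Count every (wdpaint, pnvid) combination after shuffling wdpaint once.
count_shuffled <- function(p) {
  shuffled <- copy(toy)[, wdpaint := sample(wdpaint)]  # permute wdpaint only
  shuffled[, .N, by = list(wdpaint, pnvid)][, perm := p]
}

# One shuffled cross table per permutation, stacked in long format.
perms <- rbindlist(lapply(seq_len(nperm), count_shuffled))

# Per-combination interval over the permutations. A combination missing
# from a given permutation has no row there, so pad with zeros first.
ci <- perms[, {
  counts <- c(N, rep(0L, nperm - .N))
  list(lower = quantile(counts, 0.025, names = FALSE),
       upper = quantile(counts, 0.975, names = FALSE))
}, by = list(wdpaint, pnvid)]
```

<br>The observed counts from SPFdt[, .N, by=list(wdpaint,pnvid)] could then be compared against lower and upper for each combination.<br>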

<br>Cheers,<br><br>Paulo<br><br><div class="gmail_quote">On Tue, Jun 19, 2012 at 3:30 PM, Matthew Dowle <span dir="ltr">&lt;<a href="mailto:mdowle@mdowle.plus.com" target="_blank">mdowle@mdowle.plus.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

Hi,<br>

<br>

Welcome to the list.<br>

<br>

Rather than picking a column and calling length() on it, .N is a little<br>

more convenient (and faster if that column isn't otherwise used, as in<br>

this example). Search ?data.table for the string ".N" to find out more.<br>

<br>

And to group by expressions of column names, wrap with list().  So,<br>

<br>

    SPF[, .N, by=list(sample(wdpaint,replace=FALSE),pnvid)]<br>

<br>

But that won't calculate any different statistics, just return the groups<br>

in a different order. Seems like just an example, rather than the real<br>

task, iiuc, which is fine of course.<br>

<br>

Matthew<br>

<div class="im"><br>

<br>

> Hi, I am new to this package and not sure how to implement the sample()<br>

> function with data.table.<br>

><br>

> I have a data frame SPF with three columns: cat, pnvid and wdpaint. The<br>

> pnvid variable has values 1:3 and wdpaint has values 1:10. I am<br>

> interested in the count of all combinations of wdpaint and pnvid in my<br>

> data set, which can be calculated using table or tapply (I use the latter<br>

> in the example code below).<br>

><br>

> Normally I would use something like:<br>

><br>

</div>> *c <- tapply(SPF$cat, list(as.factor(SPF$pnvid), as.factor(SPF$wdpaint)),<br>

> function(x) length(x))*<br>

<div class="im">><br>

> If I understand correctly, I would use the below when working with data<br>

> tables:<br>

><br>

</div>> *f <- SPF[,length(cat),by="wdpaint,pnvid"]*<br>

<div class="im">><br>

> But what if I want to reshuffle the column wdpaint first? When using<br>

> tapply, it would be something along the lines of:<br>

><br>

</div>> *a <- list(as.factor(SPF$pnvid), as.factor(sample(SPF$wdpaint,<br>

> replace=F)))<br>

> c <- tapply(SPF$cat, a, function(x) length(x))*<br>

<div class="im">><br>

><br>

> But how to do this with data.table?<br>

><br>

> Paulo<br>

</div>> _______________________________________________<br>

> datatable-help mailing list<br>

> <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>

> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>

<br>

<br>

</blockquote></div><br>