Thanks Matthew<br><br>I am not sure I understand the code (actually, I am sure I do not :-( ). More specifically, I would expect the two expressions below to yield tables of the same dimensions (basically all combinations of wdpaint and pnvid): <br>
<br>aa <- SPFdt[, .N, by=list(sample(wdpaint,replace=FALSE),pnvid)]<br>dim(aa)<br>> 254 3<br>bb <- SPFdt[, .N, by=list(wdpaint,pnvid)]<br>dim(bb)<br>> 170 3<br><br>What I am looking for is a cross table of pnvid and wdpaint, i.e., the number of occurrences of each combination of pnvid and wdpaint. In that case, shuffling wdpaint should give a different frequency distribution, as in the example below:<br>
<br>table(c(1,1,2,2), c(3,3,4,4))<br>table(c(2,2,1,1), c(3,3,4,4))<br><br>Basically, what I want to do is run X permutations on the data set, and then use the results to create a confidence interval on the frequency distribution of sample points over wdpaint and pnvid.<br>
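<br>To make the goal concrete, here is a rough sketch of the permutation loop I have in mind. It uses a made-up toy table in place of SPFdt; n_perm and perm_id are names invented for the illustration, and rbindlist() is assumed to be available in the installed data.table version:<br>

```r
library(data.table)

set.seed(1)  # only so the toy example is reproducible

# Toy stand-in for SPFdt (made-up values, not the real data)
SPFdt <- data.table(cat     = 1:60,
                    pnvid   = sample(1:3, 60, replace = TRUE),
                    wdpaint = sample(1:10, 60, replace = TRUE))

n_perm <- 100  # hypothetical number of permutations ("X")

# For each permutation, shuffle wdpaint, count the rows per
# (wdpaint, pnvid) combination, and tag the result with its permutation id
perm <- rbindlist(lapply(seq_len(n_perm), function(k) {
  SPFdt[, .N, by = list(wdpaint = sample(wdpaint), pnvid)][, perm_id := k]
}))
```

Combinations that do not occur in a given permutation are simply absent from perm rather than counted as zero, so they would probably need filling in before taking quantiles per combination.<br>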
<br>Cheers,<br><br>Paulo<br><br><br><br><br><br><div class="gmail_quote">On Tue, Jun 19, 2012 at 3:30 PM, Matthew Dowle <span dir="ltr"><<a href="mailto:mdowle@mdowle.plus.com" target="_blank">mdowle@mdowle.plus.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Hi,<br>
<br>
Welcome to the list.<br>
<br>
Rather than picking a column and calling length() on it, .N is a little<br>
more convenient (and faster if that column isn't otherwise used, as in<br>
this example). Search ?data.table for the string ".N" to find out more.<br>
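<br>
For instance, on a made-up toy table (the values are invented purely for<br>
illustration, and SPF here is only a stand-in for your data), the two forms<br>
should give the same counts:<br>

```r
library(data.table)

# Toy data; a hypothetical stand-in for the real SPF table
SPF <- data.table(cat     = 1:6,
                  pnvid   = c(1, 1, 2, 2, 3, 3),
                  wdpaint = c(1, 1, 1, 2, 2, 2))

# Count rows per group by calling length() on a column (result column V1)
f1 <- SPF[, length(cat), by = "wdpaint,pnvid"]

# Same counts with .N, which does not need any column (result column N)
f2 <- SPF[, .N, by = list(wdpaint, pnvid)]
```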
<br>
And to group by expressions of column names, wrap with list(). So,<br>
<br>
SPF[, .N, by=list(sample(wdpaint,replace=FALSE),pnvid)]<br>
<br>
But that won't calculate any different statistics, just return the groups<br>
in a different order. Seems like just an example, rather than the real<br>
task, iiuc, which is fine of course.<br>
<br>
Matthew<br>
<div class="im"><br>
<br>
> Hi, I am new to this package and not sure how to implement the sample()<br>
> function with data.table.<br>
><br>
> I have a data frame SPF with three columns: cat, pnvid and wdpaint. The<br>
> pnvid variable has values 1:3, the wdpaint variable has values 1:10. I am<br>
> interested in the count of all combinations of wdpaint and pnvid in my<br>
> data set, which can be calculated using table or tapply (I use the latter<br>
> in the example code below).<br>
><br>
> Normally I would use something like:<br>
><br>
</div>> *c <- tapply(SPF$cat, list(as.factor(SPF$pnvid), as.factor(SPF$wdpaint)),<br>
> function(x) length(x))*<br>
<div class="im">><br>
> If I understand correctly, I would use the following when working with<br>
> data tables:<br>
><br>
</div>> *f <- SPF[,length(cat),by="wdpaint,pnvid"]*<br>
<div class="im">><br>
> But what if I want to reshuffle the column wdpaint first? When using<br>
> tapply, it would be something along the lines of:<br>
><br>
</div>> *a <- list(as.factor(SPF$pnvid), as.factor(sample(SPF$wdpaint,<br>
> replace=F)))<br>
> c <- tapply(SPF$cat, a, function(x) length(x))*<br>
<div class="im">><br>
><br>
> But how to do this with data.table?<br>
><br>
> Paulo<br>
</div>> _______________________________________________<br>
> datatable-help mailing list<br>
> <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
<br>
<br>
</blockquote></div><br>