[datatable-help] What's your opinion on the feature request: add option mult="random"

Matthew Dowle mdowle at mdowle.plus.com
Fri Jan 6 09:34:49 CET 2012


Very keen on direct contributions like that. Happy to help you with
svn etc., and with you joining the project.

In this particular example, how about :

    rawData[sample(rawData[J("eu"), which=TRUE],size=1)]

This avoids the inefficiency of the first step, i.e.,
    intDT <- rawData[J("eu"), mult="all"]
which copies the matching rows of every column, while leaving the user
free to sample 2 rows instead, or to use any other R method to select a
random subset.
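
A minimal sketch of that flexibility, reusing the toy table from the
original post (the names idx, two, k, picked and one are only
illustrative). One caveat worth flagging: when its first argument is a
single number, sample() draws from 1:x rather than from that one value,
so a group with exactly one matching row is safer with the sample.int()
form shown at the end.

```r
library(data.table)

# Same toy table as in the original post
rawData <- data.table(fundID  = letters,
                      compGeo = rep(c("us", "eu"), each = 13))
setkey(rawData, "compGeo")

# Row numbers of the "eu" group within rawData; no columns are copied here
idx <- rawData[J("eu"), which = TRUE]

# Draw two distinct rows from that group
two    <- sample(idx, size = 2)
picked <- rawData[two]

# Draw one row by sampling a position in idx, which stays correct
# even when idx has length 1
k   <- idx[sample.int(length(idx), size = 1)]
one <- rawData[k]
```

Only the sampled rows are materialised; the full "eu" subset is never
copied.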

Because of potential scoping conflicts (say a column were called
"rawData", i.e. the same name as the table), a more robust form is :

    x = sample(rawData[J("eu"), which=TRUE],size=1)
    rawData[x]

This is slightly different because when i is a single name (x in this
case), data.table knows the caller must mean the x in calling scope, not
the column called "x" (if any).  Is two steps like this ok?  I'm
guessing it was really the inefficiency that was the motivation?
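
To illustrate the scoping rule just described, here is a small sketch
with a made-up table DT that has a column named x, alongside a variable
x in the calling scope. DT and its contents are hypothetical and only
serve to show the difference between a bare symbol and an expression
in i.

```r
library(data.table)

# Hypothetical table with a column that shares its name with a variable
DT <- data.table(x = c(10L, 20L, 30L), y = c("a", "b", "c"))
x  <- 3L   # variable in the calling scope

# i is a single name: the calling-scope x is used, so this is row 3
by_symbol <- DT[x]

# i is an expression: x is looked up as a column of DT first
by_expr <- DT[x == 20L]
```

This is why computing the row number in a separate step and then
passing a bare name as i sidesteps a clash with column names.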

Matthew

On Fri, 2012-01-06 at 00:20 +0100, Christoph Jäckel wrote:
> Hi all,
> 
> 
> I run a Monte Carlo simulation on a data.table and do that currently
> with a loop: on every run, I choose a subset of rows subject to
> certain criteria and from those rows I take a random element.
> Currently, I do the following: Let's say I have funds from two regions
> ("eu" and "us") and I want to choose a random fund from "eu" (could be
> "us" in the next run and a different region in the third):
> 
> 
> library(data.table)
> rawData <- data.table(fundID  = letters,
>                       compGeo = rep(c("us", "eu"), each=13))
> setkey(rawData, "compGeo")
> intDT <- rawData[J("eu"), mult="all"]
> intDT[sample.int(nrow(intDT), size=1)]
> 
> 
> So my idea is to just give the user the option mult="random", which
> does this in one step. What do you think about that feature request? 
> 
> 
> With respect to the implementation: I changed a few lines in the
> function '[.data.table' and got this to run on my local data.table
> version, so I guess I could implement it (as far as I can see, one
> just needs to change some R code). However, I haven't done extensive
> testing and I'm not an expert on shared projects and subversion (never
> did that actually), so I guess I would need some help to start with
> and confirmation that I haven't broken anything ;-)
> 
> 
> Christoph
> 
> 
> 
> 



