[datatable-help] What's your opinion on the feature request: add option mult="random"
Matthew Dowle
mdowle at mdowle.plus.com
Fri Jan 6 09:34:49 CET 2012
I'm very keen on direct contributions of that kind, and happy to help you
with svn etc. and with joining the project.
In this particular example, how about :
rawData[sample(rawData[J("eu"), which=TRUE],size=1)]
This avoids the inefficiency of the first step, i.e.,
intDT <- rawData[J("eu"), mult="all"]
which copies every column of the matching subset, whilst retaining
flexibility: the user can easily sample 2 rows, or use any other R method
to select a random subset.
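
As a rough sketch of that flexibility, here is the same idea sampling 2 rows
instead of 1, using the example table from the quoted message below. The
names idx and two are only illustrative. One assumption on my part:
sample.int() on positions is used rather than sample() on idx directly,
because sample(v, ...) treats a single number v as 1:v, which would go wrong
if a group ever matched exactly one row.

```r
library(data.table)

# Example data from the original question
rawData <- data.table(fundID = letters,
                      compGeo = rep(c("us", "eu"), each = 13))
setkey(rawData, compGeo)

# which=TRUE returns the matching row numbers only; no columns are copied
idx <- rawData[J("eu"), which = TRUE]

# Draw 2 distinct positions within idx, then subset the table once
two <- rawData[idx[sample.int(length(idx), size = 2)]]
two
```

Any other way of choosing from idx (weights, stratification, and so on)
would slot into the same place.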
Because of potential scoping conflicts (say a column were called
"rawData", i.e. the same name as the table), a more robust version is :
x = sample(rawData[J("eu"), which=TRUE],size=1)
rawData[x]
This is slightly different because when i is a single name (x in this
case), data.table knows the caller must mean the x in calling scope, not
the column called "x" (if any). Is two steps like this ok? I'm
guessing it was really the inefficiency that was the motivation?
Matthew
On Fri, 2012-01-06 at 00:20 +0100, Christoph Jäckel wrote:
> Hi all,
>
>
> I run a Monte Carlo simulation on a data.table and do that currently
> with a loop: on every run, I choose a subset of rows subject to
> certain criteria and from those rows I take a random element.
> Currently, I do the following: Let's say I have funds from two regions
> ("eu" and "us") and I want to choose a random fund from "eu" (could be
> "us" in the next run and a different region in the third):
>
>
> library(data.table)
> rawData <- data.table(fundID = letters,
> compGeo = rep(c("us", "eu"), each=13))
> setkey(rawData, "compGeo")
> intDT <- rawData[J("eu"), mult="all"]
> intDT[sample.int(nrow(intDT), size=1)]
>
>
> So my idea is to just give the user the option mult="random", which
> does this in one step. What do you think about that feature request?
>
>
> With respect to the implementation: I changed a few lines in the
> function '[.data.table' and got this to run on my local data.table
> version, so I guess I could implement it (as far as I can see, one
> just needs to change some R code). However, I haven't done extensive
> testing and I'm not an expert on shared projects and subversion (never
> did that actually), so I guess I would need some help to start with
> and the confirmation I couldn't break anything ;-)
>
>
> Christoph