[datatable-help] (no subject)

John Kerpel john.kerpel2 at gmail.com
Wed Aug 7 03:50:57 CEST 2013


Wow, thx!  I didn't think it would be straightforward - but to your point I
will try to set up my data differently to see if I can simplify the process.


On Tue, Aug 6, 2013 at 5:45 PM, Steve Lianoglou <
mailinglist.honeypot at gmail.com> wrote:

> Hi John,
>
> (resending because I was bounced from list due to sending from wrong
> email address)
>
> Please use "reply-all" when replying to emails on this list so that
> discussion stays "on list" and others can help with and benefit from
> the discussion.
>
> Comments below:
>
> On Aug 6, 2013, at 2:40 PM, John Kerpel <john.kerpel2 at gmail.com> wrote:
>
> > Steve:
> >
> > To follow up on my question from a couple of days ago, assuming the
> > following:
>
> > DT = data.table(a=c(4:13),y=c(1,1,2,2,2,3,3,3,4,4),x=1:10,z=c(1,1,1,1,2,2,2,2,3,3),zz=c(1,1,1,1,1,2,2,2,2,2))
> > setkeyv(DT,cols=c("a","x","y","z","zz"))
> > #DT[,if(.N>=4) {list(predict(smooth.spline(x,y),a)$y)} ,by=c('z', 'zz')]
> >
> > a=c(4:13)
> > y=c(1,1,2,2,2,3,3,3,4,4)
> > x=1:10
> > predict(smooth.spline(x[1:4],y[1:4]),a[1:5])$y
> > [1] 2.1 2.5 2.9 3.3 3.7
> > predict(smooth.spline(x[5:8],y[5:8]),a[6:10])$y
> > [1] 2.954664 2.909333 2.864003 2.818672 2.773341
>
> > So in this example the predictor a is indexed by zz and (x,y) is indexed
> > by z.  Is there a way to do this in the "by" statement?  I've got a
> > workaround that uses clusterMap, but I'd like to use data.table instead
> > via some statement like what is commented out above.
>
> > Thanks for your help.
>
> The data seems to be set up in a rather strange way: you'd like
> objects (smooth splines) trained on one set of rows (grouped by z) to
> predict on elements (the `a`s) that are grouped differently (by zz).
> There's no "natural" way to use the same table for both training and
> prediction by iterating over both groupings at the same time.
>
> Perhaps you provided a toy example which isn't how your real data is
> set up, but if not, I'd recommend having two different tables (with
> your z's and your zz's split between them), e.g.:
>
> # placeholders, not real objects:
> train <- data.table(x=x.values, y=y.values, z=z.index)
> predict.on <- data.table(a=a.values, z=z.index)
>
> Anyway, I'll just leave the code that uses data.table with your
> current data below with no further comment -- it'll do what you want.
>
> library(data.table)
>
> a <- c(4:13)
> y <- c(1,1,2,2,2,3,3,3,4,4)
> x <- 1:10
> z <- c(1,1,1,1,2,2,2,2,3,3)
> zz <- c(1,1,1,1,1,2,2,2,2,2)
> DT <- data.table(a=a, y=y, x=x, z=z, zz=zz)
> setkeyv(DT, 'z')
> # distinct z values; unique(DT) compares all columns in data.table >= 1.9.8
> Zs <- unique(DT$z)
>
> splines <- lapply(Zs, function(zval) {
>  dt <- DT[J(zval)]
>  if (nrow(dt) >= 4) {
>    ss <- smooth.spline(dt$x, dt$y)
>  } else {
>    ss <- NULL
>  }
>  data.table(zz=zval, ss=list(ss), is.spline=!is.null(ss))
> })
> splines <- rbindlist(splines)[is.spline == TRUE]
> setkeyv(splines, 'zz')
> setkeyv(DT, 'zz')
>
> # by=.EACHI (data.table >= 1.9.4) evaluates j once per row of DT
> splines[DT, list(preds=predict(ss[[1]], a)$y), by=.EACHI]
>     zz    preds
>  1:  1 2.100000
>  2:  1 2.500000
>  3:  1 2.900000
>  4:  1 3.300000
>  5:  1 3.700000
>  6:  2 2.954664
>  7:  2 2.909333
>  8:  2 2.864003
>  9:  2 2.818672
> 10:  2 2.773341
>
> HTH,
> -steve
>
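
For reference, here is a minimal sketch of the two-table layout suggested above, using the toy data from this thread. The column name `grp` and the `fits` list are hypothetical names chosen for illustration; they do not appear in the original messages. It assumes a current data.table and is only one possible way to organise the data, not necessarily what either poster ended up using.

```r
library(data.table)

# Training rows: (x, y) pairs labelled with the group they are fitted on
# (the original z column; `grp` is a hypothetical name).
train <- data.table(
  x   = 1:10,
  y   = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
  grp = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3)
)

# Prediction rows: the `a` values labelled with the group whose spline
# should score them (the original zz column).
predict.on <- data.table(
  a   = 4:13,
  grp = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# Fit one spline per group that has at least 4 training rows.
fits <- list()
for (g in unique(train$grp)) {
  rows <- train[grp == g]
  if (nrow(rows) >= 4) {
    fits[[as.character(g)]] <- smooth.spline(rows$x, rows$y)
  }
}

# Predict each group's `a` values with that group's own spline;
# groups without a fitted spline are dropped first.
preds <- predict.on[
  as.character(grp) %in% names(fits),
  list(a = a, pred = predict(fits[[as.character(.BY$grp)]], a)$y),
  by = grp
]
print(preds)
```

With the two tables sharing one group label, training and prediction become two ordinary grouped steps, and the spline objects live in a plain named list rather than in a list column.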