[datatable-help] (no subject)

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Aug 7 00:45:36 CEST 2013


Hi John,

(Resending because the list bounced my message: I sent it from the
wrong email address.)

Please use "reply-all" when replying to emails on this list so that
discussion stays "on list" and others can help with and benefit from
the discussion.

Comments below:

On Aug 6, 2013, at 2:40 PM, John Kerpel <john.kerpel2 at gmail.com> wrote:

> Steve:
>
> To follow up on my question from a couple of days ago, assuming the
> following:

> DT = data.table(a=c(4:13),y=c(1,1,2,2,2,3,3,3,4,4),x=1:10,z=c(1,1,1,1,2,2,2,2,3,3),zz=c(1,1,1,1,1,2,2,2,2,2))
> setkeyv(DT,cols=c("a","x","y","z","zz"))
> #DT[,if(.N>=4) {list(predict(smooth.spline(x,y),a)$y)} ,by=c('z', 'zz')]
>
> a=c(4:13)
> y=c(1,1,2,2,2,3,3,3,4,4)
> x=1:10
> predict(smooth.spline(x[1:4],y[1:4]),a[1:5])$y
> [1] 2.1 2.5 2.9 3.3 3.7
> predict(smooth.spline(x[5:8],y[5:8]),a[6:10])$y
> [1] 2.954664 2.909333 2.864003 2.818672 2.773341

> So in this example the predictor a is indexed by zz and (x,y) is indexed by
> z.  Is there a way to do this in the "by" statement?  I've got a workaround
> that uses clusterMap, but I'd like to use data.table instead via some
> statement like what is commented out above.

> Thanks for your help.

Your data seems to be set up in a rather strange way: you want each
smooth spline to be trained on one subset of rows (grouped by `z`) and
then predict on the `a` values of a different subset (grouped by `zz`).
Since the training and prediction subsets are defined by different
columns, there's no "natural" way to iterate over both at the same
time with a single `by`.

Perhaps this is a toy example that doesn't reflect how your real data
is set up. If it does reflect it, I'd recommend splitting the data into
two tables, one for training and one for prediction, each carrying the
index of the spline it belongs to, e.g.:

train <- data.table(x=..., y=..., z=...)    # z: the spline each (x, y) row trains
predict.on <- data.table(a=..., z=...)      # z: the spline that predicts each a
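For concreteness, here is a minimal runnable sketch of that layout using your toy values. It assumes the prediction table's `z` column holds what is currently `zz`. The `train`, `predict.on` and `preds` names are only illustrative. It fits the splines with a grouped `by` and joins with `on=` instead of setting keys:

```r
library(data.table)

# Training rows: each (x, y) pair tagged with the spline (z) it belongs to
train <- data.table(x = 1:10,
                    y = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
                    z = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3))
# Prediction rows: each a tagged with the spline that should predict it
# (assumption: these z values are the current zz column)
predict.on <- data.table(a = 4:13,
                         z = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

# One spline per z group with at least 4 points, stored in a list column;
# groups where j returns NULL (here z == 3) are dropped from the result
splines <- train[, if (.N >= 4) list(ss = list(smooth.spline(x, y))), by = z]

# Join each prediction row to its group's spline and predict at its a;
# by = .EACHI evaluates j once per row of predict.on
preds <- splines[predict.on, list(preds = predict(ss[[1]], a)$y),
                 on = "z", by = .EACHI]
print(preds)
```

The result should line up with the per-group `predict()` calls in your question: the five rows with zz == 1 use the z == 1 spline, and the rest use the z == 2 spline.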

Anyway, here is code that does what you want using data.table with
your current data layout:

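library(data.table)

a <- 4:13
y <- c(1,1,2,2,2,3,3,3,4,4)
x <- 1:10
z <- c(1,1,1,1,2,2,2,2,3,3)
zz <- c(1,1,1,1,1,2,2,2,2,2)
DT <- data.table(a=a, y=y, x=x, z=z, zz=zz)
setkeyv(DT, 'z')
# Use unique(DT$z) rather than unique(DT)$z: newer data.table versions
# de-duplicate on all columns by default, not just on the key
Zs <- unique(DT$z)

# Fit one spline per z group that has at least 4 points. Label it with
# zz = zval because the spline trained on z == k predicts the rows with zz == k
splines <- lapply(Zs, function(zval) {
  dt <- DT[J(zval)]
  if (nrow(dt) >= 4) {
    ss <- smooth.spline(dt$x, dt$y)
  } else {
    ss <- NULL
  }
  # list(ss) stores the spline object in a list column
  data.table(zz=zval, ss=list(ss), is.spline=!is.null(ss))
})
splines <- rbindlist(splines)[is.spline == TRUE]
setkeyv(splines, 'zz')
setkeyv(DT, 'zz')

# Join each row of DT to the spline for its zz and predict at that row's a;
# current data.table versions need by=.EACHI to evaluate j once per row of DT
splines[DT, list(preds=predict(ss[[1]], a)$y), by=.EACHI]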
    zz    preds
 1:  1 2.100000
 2:  1 2.500000
 3:  1 2.900000
 4:  1 3.300000
 5:  1 3.700000
 6:  2 2.954664
 7:  2 2.909333
 8:  2 2.864003
 9:  2 2.818672
10:  2 2.773341

HTH,
-steve

