[datatable-help] Best way to apply function to set of columns to create new columns where the function requires other columns from data.table

Frank Erickson fperickson at wisc.edu
Thu Feb 12 16:40:10 CET 2015


Hi Marc,

I think the set function is a good fit:

# set() adds each column by reference; the scalar mean is recycled down the column
for (j0 in varnames)
  set(dt, j = paste0(j0, "_mean"), value = wtd.mean(dt[[j0]], dt[["weight"]]))
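
Here is a self-contained sketch of that loop, in case it helps to run it end to end. It assumes the example data from the original post. The seed is my own addition, and base R's weighted.mean stands in for Hmisc's wtd.mean so the block needs only data.table (the two should agree when there are no missing values).

```r
library(data.table)

set.seed(1)  # hypothetical seed, only to make the example repeatable
dt <- data.table(a = runif(10), b = runif(10), weight = runif(10))
varnames <- c("a", "b")

for (j0 in varnames) {
  # Add <name>_mean by reference; the length-1 value is recycled to every row.
  # weighted.mean() is a stand-in for Hmisc::wtd.mean() (assumption, see above).
  set(dt, j = paste0(j0, "_mean"),
      value = weighted.mean(dt[[j0]], dt[["weight"]]))
}

names(dt)
```

Referring to the weight column by name rather than by position keeps the loop correct if the column order ever changes.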

I'd guess this is noticeably more efficient than nested [ calls on .SD if
your data is large. If your data.table is really big, though, maybe you
want to store the weighted means somewhere else...? They're just scalars, so
you probably don't need each one recycled down a full column of the data.table.
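
As a rough sketch of keeping the scalars outside the table: the names wmeans and wmeans_vec are hypothetical, and base R's weighted.mean again stands in for Hmisc's wtd.mean (an assumption made only so the block has no extra dependency). It also relies on the fact that columns not listed in .SDcols are still visible by name inside j.

```r
library(data.table)

dt <- data.table(a = runif(10), b = runif(10), weight = runif(10))
varnames <- c("a", "b")

# One-row data.table of weighted means. `weight` is found in dt even though
# it is not part of .SDcols, so .SD does not need to be subset again.
wmeans <- dt[, lapply(.SD, weighted.mean, w = weight), .SDcols = varnames]

# The same scalars as a plain named numeric vector (names come from varnames).
wmeans_vec <- sapply(varnames,
                     function(v) weighted.mean(dt[[v]], dt[["weight"]]))
```

Either object holds one number per variable, and dt itself is left with its original three columns.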

--Frank


On Tue, Feb 10, 2015 at 1:58 PM, Marc Halperin <Halperin at outins.com> wrote:

> I want to add new columns to a data.table that are the weighted averages of
> existing columns, using a weight variable.  This is a general problem I run
> into when using .SDcols but also needing another variable from the data.table
> to be available within the function passed to lapply.  Without including that
> variable within .SDcols (in this case the weight variable), I don't have
> access to it in the lapply function argument.  Is it a bad idea to subset
> .SD the way I've done it?
>
> library(data.table)
> library(Hmisc)
>
> dt <- data.table(a=runif(10), b= runif(10), weight=runif(10))
>
> varnames <- c("a","b")
>
> dt[ , ( paste( "mean", varnames, sep = "_" ) ) :=
>       lapply( .SD[ , .SD, .SDcols = -"weight" ], wtd.mean, weight ),
>     .SDcols = c("weight",varnames) ]
>
> Thanks
>
> -Marc