[datatable-help] convenience function for transforming variables, and adding them to the table

Fri May 28 11:38:07 CEST 2010

There used to be an incbycols argument which would prevent the duplicate B.1 column :

   dt[,transform(.SD,D=mean(A)),by=B,incbycols=FALSE]

Shall we bring it back ?  incbycols was removed to reduce the number of
arguments, since the original reason for it went away.  Seems like we do
have a good reason for it after all.

Also see feature request #200 (raised in Aug 08) :

https://r-forge.r-project.org/tracker/index.php?func=detail&aid=200&group_id=240&atid=978

Seems like back then I was thinking that syntax like :

   DT[J(2),{b=3;c=a+5}]

would allow fast update by binary search if the assignments to b and c
were updates to the table rather than local scope. NA would be added in
columns b and c for the non-matching rows.  But those are local scope
assignments currently, as you know.
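
To make the intended behaviour concrete, here is a minimal sketch. It
assumes a data.table version that provides the := operator for assignment
by reference, which is not the {b=3;c=a+5} syntax discussed above. The
table and its column names (id, a) are made up for illustration.

```r
library(data.table)

# Illustrative table keyed on id (names are assumptions for this sketch)
DT <- data.table(id = c(1L, 2L, 2L, 3L), a = c(10, 20, 30, 40))
setkey(DT, id)

# Update only the rows whose key matches 2; the keyed lookup in i
# is done by binary search, and b and c are added by reference
DT[J(2L), `:=`(b = 3, c = a + 5)]

# Rows that did not match hold NA in the new columns b and c
print(DT)
```

Only the matching rows are touched, which is what makes the keyed update
cheap compared with rebuilding the table.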

So maybe a switch ?

   DT[,{D=mean(A)},by=B,update=TRUE]

or just remove local scope (?) so that :

   DT[,{D=mean(A)},by=B]

would do the transform. That syntax is more natural in my mind. A local
temporary assignment could then take the more awkward syntax (as it is
rarer to need) :

   DT[,{T(E)=A+2;F=E*2},by=B]

or something like that where T (temporary) could be named L (local) in the
spirit of I(). The above would create a new F column but not an E,
although F uses E along the way.
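
As a rough sketch of the two behaviours described above (a grouped
transform that adds a column, and a temporary that does not become a
column), the following assumes a data.table version with the := operator.
It is not the T()/L() syntax floated here, just one way to get the same
effect with ordinary braces.

```r
library(data.table)

DT <- data.table(A = rep(1:3, each = 4), B = rep(1:4, each = 3))

# Grouped transform: add D (mean of A within each B) as a new column
DT[, D := mean(A), by = B]

# E is only a local variable inside the braces; the value of the last
# expression is what gets stored, so only F becomes a column
DT[, F := { E <- A + 2; E * 2 }, by = B]

print(names(DT))
```

The temporary E is visible only while j is being evaluated, so no special
marker is needed to keep it out of the table.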

Thanks for raising it. Will give it some more thought too.

Matthew

>> Sasha Goodman wrote:
>>
>> I'm trying to make a simple convenience function for the
>> following common procedure, where one variable is transformed
>> with an arbitrary function and merged as a variable to the table:
>>
>> dt <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3),
>>                  C = rep(1:2, 6))
>> dt[, transform(.SD,D=mean(A)), by="B"]
>>
>> Here is my first attempt, but it won't run because of scoping issues.
>>
>> dt.groupby <- function(data,grouping, ...) {
>> data[, transform(.SD,expr=...), by=grouping]
>> }
>>
>
> Tough problem. The best I could do was:
>
> dt.groupby <- function(data,grouping, ...) {
>     eval(bquote(data[, transform(.SD, ...),
>                      by = .(substitute(grouping))]))
> }
>
> It relies on some funky language manipulation that I always have
> a tough time with. It also fails for multiple groupings:
>
> dt.groupby(dt, B, D = mean(A), E = median(A))
> dt.groupby(dt, "B", D = mean(A), E = median(A))
> dt.groupby(dt, list(B,C), D = mean(A), E = median(A)) # fails
>
>
>
>> Any suggestions? It would also be nice if duplicate columns were
>> not created, such as the "B.1" the first procedure adds.
>
> Yes, it would. We probably don't want to take those columns out of
> .SD because they might be useful. I'm not sure how to get transform
> to ignore them.
>
> - Tom
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
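
For the dt.groupby wrapper quoted above, one possible variant that also
handles multiple groupings is sketched below. It swaps transform(.SD, ...)
for := on a copy, so it assumes a data.table version with the := operator.
The name dt_groupby and the convention that grouping is a character vector
of column names are assumptions of this sketch, not part of data.table.
E uses max(A) rather than median(A) so that every group returns the same
type.

```r
library(data.table)

dt <- data.table(A = rep(1:3, each = 4), B = rep(1:4, each = 3),
                 C = rep(1:2, 6))

# Hypothetical helper (not part of data.table).
# grouping: character vector of column names; ...: named expressions.
dt_groupby <- function(data, grouping, ...) {
  exprs <- as.list(substitute(list(...)))[-1L]  # capture unevaluated
  out <- copy(data)                             # leave the input untouched
  for (nm in names(exprs)) {
    # Build e.g. out[, "D" := mean(A), by = c("B", "C")] and evaluate it
    eval(bquote(out[, .(nm) := .(exprs[[nm]]), by = .(grouping)]))
  }
  out[]
}

res <- dt_groupby(dt, c("B", "C"), D = mean(A), E = max(A))
print(res)
```

Because the new columns are added to a copy of the input, the grouping
columns appear only once and the original row order is kept.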