[datatable-help] := suggestions

Damian Betebenner dbetebenner at nciea.org
Mon May 14 10:10:31 CEST 2012


One last wrinkle to iron out:

Does assignment by reference work with a class that has a slot that is a data.table?

I have defined a new class where one of the slots is a data.table. However, when I apply:

for (i in c("VALID_CASE", "Y", "Z")) my_object@test.dt[, i := my.additional.table[[i]][NA][1], with=FALSE]



Nothing "sticks". That is, none of the variables I'm attempting to assign by reference using := are created.


It does work when done outside of the class:


for (i in c("VALID_CASE", "Y", "Z")) test.dt[, i := my.additional.table[[i]][NA][1], with=FALSE]


Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH   03821-0351
 
Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910

dbetebenner at nciea.org
www.nciea.org




-----Original Message-----
From: Matthew Dowle [mailto:mdowlenoreply at virginmedia.com] On Behalf Of Matthew Dowle
Sent: Friday, May 11, 2012 9:33 PM
To: Damian Betebenner
Cc: datatable-help at lists.r-forge.r-project.org
Subject: Re: := suggestions

On Fri, 2012-05-11 at 13:15 -0500, Damian Betebenner wrote:
> All,
> 
> Trying to use := well but get errors and warnings and am looking for 
> an elegant way to subset and use := together when multiple variables 
> are being created and factors are involved.
> 
> Here’s some code showing what I’m trying to do. Any help in doing this 
> better greatly appreciated:
> 
> require(data.table)
> 
> ### Base data.table
> 
> test.dt <- data.table(ID=rep(1:10, 2), 
> CONTENT_AREA=as.factor(rep(c("MATH", "READ"), each=10)), X=rnorm(10))
> 
> setkeyv(test.dt, c("ID", "CONTENT_AREA"))
> test.dt
>
> ### Values to be looked up
> 
> my.lookup <- data.table(ID=1:5,  CONTENT_AREA=as.factor("MATH")) 
> my.lookup
>  
> ### Data table to be added to the original data.table
>
> my.additional.table <- data.table(my.lookup, VALID_CASE=factor(1, 
> levels=1:2, labels=c("VALID_CASE", "INVALID_CASE")), 
> Y=as.factor(letters[1:5]), Z=101:105)
> my.additional.table
> 
> ### First attempt with error
>
> test.dt[my.lookup, names(my.new.table) := my.additional.table, 
> with=FALSE, mult="first"]

I get:
Error in eval(expr, envir, enclos) : object 'my.new.table' not found

but assuming that was a typo, then with:

test.dt[my.lookup, names(my.additional.table) := my.additional.table, with=FALSE, mult="first"]

I get:

Error in `[.data.table`(test.dt, my.lookup, `:=`(names(my.additional.table),  : 
  Attempt to add new column(s) and set subset of rows at the same time.
Create the new column(s) first, and then you'll be able to assign to a subset. If i is set to 1:nrow(x) then please remove that (no need, it's faster without).

That error was meant to say "for now", oops. Will try to implement that in 1.8.1 (automatically adding the new column, padding with NA the rows that the sub-assigning := doesn't touch). More comments below...
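For comparison, the two-step route that the error message asks for (create the column first, then assign into a subset) can be sketched in base R on a plain data.frame. This is an analogy only: it does not use data.table or :=, and the data.frame `df` and the `rows` selector are hypothetical stand-ins for test.dt and my.lookup.

```r
# Base-R analogy of "create the new column first, then assign to a subset".
# Plain data.frame, not data.table; df and rows are hypothetical stand-ins.
df <- data.frame(
  ID = rep(1:10, 2),
  CONTENT_AREA = factor(rep(c("MATH", "READ"), each = 10))
)

# Rows to update: IDs 1-5 in MATH (mirrors my.lookup in this thread)
rows <- df$ID %in% 1:5 & df$CONTENT_AREA == "MATH"

# Step 1: add the new column, padded with NA of the intended type
df$Z <- NA_integer_

# Step 2: assign only the selected rows; every other row stays NA
df$Z[rows] <- 101:105
```

The automatic behaviour planned for 1.8.1 would fold step 1 into the sub-assignment itself.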
 
> 
> ### Create the variables in test.dt using := (but gives warnings and
> ### is cumbersome to have to specify the class of the variables that are
> ### going to be created)
> 
> for (i in c("VALID_CASE", "Y", "Z")) {
>                 test.dt[, i := NA_integer_, with=FALSE, mult="first"]
>                 class(test.dt[[i]]) <- class(my.additional.table[[i]])
>                 if (is.factor(test.dt[[i]])) levels(test.dt[[i]]) <-
> levels(my.additional.table[[i]])
> }

Yes, I get this warning (twice) too:
Warning messages:
1: In `[.data.table`(test.dt, , `:=`(i, NA_integer_), with = FALSE,  :
  Invalid .internal.selfref detected and fixed by taking a copy of the whole table, so that := can add this new column by reference. At an earlier point, this data.table has been copied by R. Avoid key<-, names<- and attr<- which in R currently (and oddly) all copy the whole data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr(). If this message doesn't help, please report to datatable-help so the root cause can be fixed.

I guess that one or both of class<- and levels<- are copying the whole table. That's consistent with the first iteration running without a warning, followed by warnings on the 2nd and 3rd.
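One plausible reason such lines can touch the whole table: R evaluates a nested replacement like `class(test.dt[[i]]) <- v` by rewriting it so that the outer `[[<-` is called on the whole object (see "Subset assignment" in the R Language Definition). The base-R sketch below, on a plain list with hypothetical names `x`, `y1`, `y2`, checks that the sugared form and a hand-expanded version give the same result. Whether a particular `[[<-` method then copies a whole data.table depends on the class and package version, which this sketch does not show.

```r
# A plain list as a stand-in for a table (hypothetical example data)
x <- list(a = 1:3, b = letters[1:3])
i <- "a"

# Sugared nested replacement, as in the loop quoted above
y1 <- x
class(y1[[i]]) <- "myclass"

# Roughly how R expands that line: the inner `class<-` result is written
# back through `[[<-`, which receives and returns the whole object
y2 <- x
`*tmp*` <- y2
y2 <- `[[<-`(`*tmp*`, i, value = `class<-`(`*tmp*`[[i]], "myclass"))
rm(`*tmp*`)

stopifnot(identical(y1, y2))
```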

Just for now, until it's automatic (and it might be useful for other tasks too): empty factor columns can be created with factor(NA), and := is factor-level aware, so you can add new levels just by assigning a character value to an item (:= modifies the factor levels by reference for you). So:

for (i in c("VALID_CASE", "Y", "Z"))
    test.dt[, i := if (is.factor(my.additional.table[[i]])) factor(NA) else NA_integer_, with=FALSE] # No warnings

or,

for (i in c("VALID_CASE", "Y", "Z"))
    test.dt[, i := my.additional.table[[i]][NA], with=FALSE]

which copes with more types and also retains all levels.
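Why `my.additional.table[[i]][NA]` keeps the type and levels can be checked in plain base R: indexing a vector or factor by `NA` returns NAs of the same class, and a factor keeps its full set of levels. A logical `NA` index is recycled to the length of the vector being indexed, so it returns one NA per element, while `NA_integer_` (or `[NA][1]`) returns a single NA. The last part shows, in base-R terms, the level bookkeeping that := is described above as doing for you: a base-R factor needs the new level added before a character value can be assigned. All names here (`valid`, `z`, `f`, `g`, `warned`, ...) are hypothetical stand-ins.

```r
# Hypothetical stand-ins for two columns of my.additional.table
valid <- factor(rep("VALID_CASE", 5), levels = c("VALID_CASE", "INVALID_CASE"))
z <- 101:105

# Indexing by NA keeps the class and, for factors, all levels
empty_valid <- valid[NA]       # logical NA is recycled: 5 NAs
empty_valid1 <- valid[NA][1]   # a single NA, levels still intact
empty_z <- z[NA_integer_]      # integer NA index: a single NA_integer_

# An "empty" factor built with factor(NA) starts with no levels at all
f <- factor(rep(NA, 3))

# In base R, assigning an unknown level gives NA plus a warning ...
g <- f
warned <- tryCatch({
  g[1] <- "VALID_CASE"
  FALSE
}, warning = function(w) TRUE)

# ... so the level has to be added first, then the value assigned
levels(f) <- c(levels(f), "VALID_CASE")
f[2] <- "VALID_CASE"
```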

>  
> 
> ### Successfully perform the variable creation on the rows indicated by my.lookup
> 
> test.dt[my.lookup, names(my.additional.table) := my.additional.table, 
> with=FALSE, mult="first"]