[datatable-help] := suggestions

Matthew Dowle mdowle at mdowle.plus.com
Sat May 12 03:32:37 CEST 2012


On Fri, 2012-05-11 at 13:15 -0500, Damian Betebenner wrote:
> All,
> 
> Trying to use := well but get errors and warnings and am looking for
> an elegant way to subset and use := together when multiple variables
> are being created and factors are involved.
> 
> Here’s some code showing what I’m trying to do. Any help in doing this
> better greatly appreciated:
> 
> require(data.table)
> 
> ### Base data.table
> 
> test.dt <- data.table(ID=rep(1:10, 2),
>     CONTENT_AREA=as.factor(rep(c("MATH", "READ"), each=10)), X=rnorm(10))
> 
> setkeyv(test.dt, c("ID", "CONTENT_AREA"))
> test.dt
>
> ### Values to be looked up 
> 
> my.lookup <- data.table(ID=1:5,  CONTENT_AREA=as.factor("MATH"))
> my.lookup
>  
> ### Data table to be added to the original data.table
>
> my.additional.table <- data.table(my.lookup,
>     VALID_CASE=factor(1, levels=1:2, labels=c("VALID_CASE", "INVALID_CASE")),
>     Y=as.factor(letters[1:5]), Z=101:105)
> my.additional.table
> 
> ### First attempt with error
>
> test.dt[my.lookup, names(my.new.table) := my.additional.table,
>     with=FALSE, mult="first"]

I get:
Error in eval(expr, envir, enclos) : object 'my.new.table' not found

but assuming that was a typo, then with:

test.dt[my.lookup, names(my.additional.table) := my.additional.table,
    with=FALSE, mult="first"]

I get:

Error in `[.data.table`(test.dt, my.lookup,
`:=`(names(my.additional.table),  : 
  Attempt to add new column(s) and set subset of rows at the same time.
Create the new column(s) first, and then you'll be able to assign to a
subset. If i is set to 1:nrow(x) then please remove that (no need, it's
faster without).

That error message was meant to say "for now", oops. I'll try to
implement that in 1.8.1: automatically adding the new column, padded
with NA in the rows that the subassigning := doesn't touch. More
comments below...
 
> 
> ### Create the variables in test.dt using := (but gives warnings and
> ### is cumbersome to have to specify the class of the variables that
> ### are going to be created)
> 
> for (i in c("VALID_CASE", "Y", "Z")) {
>     test.dt[, i := NA_integer_, with=FALSE, mult="first"]
>     class(test.dt[[i]]) <- class(my.additional.table[[i]])
>     if (is.factor(test.dt[[i]]))
>         levels(test.dt[[i]]) <- levels(my.additional.table[[i]])
> }

Yes, I get this warning (twice) too:
Warning messages:
1: In `[.data.table`(test.dt, , `:=`(i, NA_integer_), with = FALSE,  :
  Invalid .internal.selfref detected and fixed by taking a copy of the
whole table, so that := can add this new column by reference. At an
earlier point, this data.table has been copied by R. Avoid key<-,
names<- and attr<- which in R currently (and oddly) all copy the whole
data.table. Use set* syntax instead to avoid copying: setkey(),
setnames() and setattr(). If this message doesn't help, please report to
datatable-help so the root cause can be fixed.

I guess that one or both of class<- and levels<- are copying the whole
table. That's consistent with the first iteration running without a
warning, followed by warnings on the 2nd and 3rd.
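If a column's attributes really do need changing after it has been added, `setattr()` (one of the set* functions the warning recommends) changes them in place rather than via a `levels<-` replacement call. A minimal sketch on a small hypothetical table `dt`; the replacement levels are illustrative only.

```r
library(data.table)

# Hypothetical table for illustration
dt <- data.table(ID = 1:3, Y = factor(c("a", "b", "a")))

# Rename the factor levels in place. Unlike levels(dt[["Y"]]) <- ...,
# as used in the loop above, setattr() modifies the existing column
# rather than assigning a modified copy back into the table.
setattr(dt[["Y"]], "levels", c("A", "B"))

# dt itself was not copied, so := can still add a column by reference.
dt[, Z := 101:103]
```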

Just for now, until it's automatic (and it might be useful for other
tasks), empty factor columns can be created with factor(NA). := is
factor-level aware, so you can add new levels just by assigning a
character value to an item; := modifies the factor levels by reference
for you. So:

for (i in c("VALID_CASE", "Y", "Z"))
    test.dt[, i := if (is.factor(my.additional.table[[i]])) factor(NA)
                   else NA_integer_, with=FALSE]
# No warnings

Or:

for (i in c("VALID_CASE", "Y", "Z"))
    test.dt[, i := my.additional.table[[i]][NA], with=FALSE]

which copes with more column types and also retains all the factor levels.
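A small sketch of both points above, on a hypothetical 10-row table `dt`. It indexes with `NA_integer_` rather than `NA`: a logical `NA` index is recycled to the full length of the vector (5 values here), whereas `NA_integer_` returns a single NA that still carries the factor's class and levels and that `:=` recycles to every row. The level-adding behaviour is as described above; treat the exact level order as an implementation detail.

```r
library(data.table)

my.additional.table <- data.table(ID = 1:5, Y = factor(letters[1:5]))
dt <- data.table(ID = 1:10)  # hypothetical target table

# Base R: a logical NA index is recycled, an integer NA index is not.
length(my.additional.table[["Y"]][NA])           # 5
length(my.additional.table[["Y"]][NA_integer_])  # 1

# Empty factor column that carries all of Y's levels.
dt[, Y := my.additional.table[["Y"]][NA_integer_]]

# := is factor-level aware: assigning a character value that is not yet
# a level adds it to the column's levels.
dt[1L, Y := "z"]
```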

>  
> 
> ### Successfully perform the variable creation on the rows indicated
> ### by my.lookup
> 
> test.dt[my.lookup, names(my.additional.table) := my.additional.table,
>     with=FALSE, mult="first"]
> 
> Damian Betebenner
> Center for Assessment