[datatable-help] := suggestions

Matthew Dowle mdowle at mdowle.plus.com
Mon May 14 10:52:48 CEST 2012


I'm not too hot on S4 I'm afraid. In principle it should work I guess. If
you run .Internal(inspect(my_object)) before and after, that should reveal
what happened. It seems that merely instantiating the class copies its
arguments.

> setClass("test", representation(x="integer",y="data.table")
+ )
[1] "test"
> x = new("test", x=1:4, y=data.table(a=1:3,b=4:6))
> data.table:::selfrefok(x@y)
[1] 0    # i.e., it's been copied already, by new() I guess
> x@y
     a b
[1,] 1 4
[2,] 2 5
[3,] 3 6
> x@y[,c:=7:9]
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Warning message:
In `[.data.table`(x@y, , `:=`(c, 7:9)) :
  Invalid .internal.selfref detected and fixed by taking a copy of the
whole table, so that := can add this new column by reference. At an
earlier point, this data.table has been copied by R. Avoid key<-,
names<- and attr<- which in R currently (and oddly) all copy the whole
data.table. Use set* syntax instead to avoid copying: setkey(),
setnames() and setattr(). If this message doesn't help, please report to
datatable-help so the root cause can be fixed.
> x
An object of class "test"
Slot "x":
[1] 1 2 3 4

Slot "y":
     a b
[1,] 1 4
[2,] 2 5
[3,] 3 6

> data.table:::selfrefok(x@y)
[1] 1
> x@y[,c:=7:9]
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> # no warning this time, but still hasn't updated by reference :
> x@y
     a b
[1,] 1 4
[2,] 2 5
[3,] 3 6
>
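
A lighter-weight version of the before-and-after check suggested above, as a
rough sketch only: it assumes a data.table version that exports address(),
and it uses a plain data.table rather than a slot. The variable names are
made up for illustration.

```r
library(data.table)

dt <- data.table(a = 1:3, b = 4:6)
addr_before <- address(dt)

# := adds the column in place, so the table should stay at the same address
dt[, c := 7:9]
same_after_assign <- identical(addr_before, address(dt))

# copy() makes a new table, so its address should differ from the original
dt_copy <- copy(dt)
differs_after_copy <- !identical(address(dt), address(dt_copy))

print(c(same_after_assign, differs_after_copy))
```

Comparing address() on the table passed to new() with address() on the slot
afterwards should show in the same way whether new() took a copy.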

If you'd like this then please raise a feature request. The first := on
the slot would generate the warning about a previous copy but then assign
back to the slot by reference.  That warning could be switched off in the
case of slots.
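
Until something like that exists, one possible workaround (a sketch, not a
tested recommendation) is to assign the result of := back to the slot
explicitly. It reuses the "test" class from the transcript above; the
explicit copy() is my assumption about the simplest way to get a table with
a valid self-reference, at the cost of one copy.

```r
library(data.table)

# Same class definition as in the transcript above
setClass("test", representation(x = "integer", y = "data.table"))
x <- new("test", x = 1:4, y = data.table(a = 1:3, b = 4:6))

# Take an explicit copy, add the column to that copy by reference,
# then store the result back in the slot with an ordinary assignment
x@y <- copy(x@y)[, c := 7:9]

names(x@y)
```

This gives up the by-reference benefit for the slot itself, so it is only
reasonable when the table is small enough that the copy does not matter.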

But if new() copies its arguments and there's no way to stop or avoid
that, then I wonder if it makes sense to include a large data.table inside
an S4 class at all?  Where else does S4 copy?
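
One alternative that might sidestep the copying, offered only as a sketch:
keep the data.table in an environment and put the environment in the slot.
Environments are not copied by R, so := on the table inside should stick.
The class name "EnvHolder" and the slot name "store" are hypothetical.

```r
library(data.table)

# Hypothetical class: the slot holds an environment, not the table itself
setClass("EnvHolder", representation(store = "environment"))

store <- new.env()
assign("dt", data.table(a = 1:3, b = 4:6), envir = store)
h <- new("EnvHolder", store = store)

# The environment is shared rather than copied, so := updates the one table
h@store$dt[, c := 7:9]

names(h@store$dt)
```

The trade-off is that two objects sharing the same environment also share
the table, which is not the usual S4 copy semantics.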

Matthew


> One last wrinkle to iron out:
>
> Does assignment by reference work with a class that has a slot that is a
> data.table?
>
> I have defined a new class where one of the slots is a data.table.
> However, when I apply:
>
> for (i in c("VALID_CASE", "Y", "Z")) my_object@test.dt[, i :=
> my.additional.table[[i]][NA][1], with=FALSE]
>
>
>
> Nothing "sticks". That is, none of the variables I'm attempting to assign
> by reference using := are created.
>
>
> It does work when done outside of the class:
>
>
> for (i in c("VALID_CASE", "Y", "Z")) test.dt[, i :=
> my.additional.table[[i]][NA][1], with=FALSE]
>
>
> Damian Betebenner
> Center for Assessment
> PO Box 351
> Dover, NH   03821-0351
>  
> Phone (office): (603) 516-7900
> Phone (cell): (857) 234-2474
> Fax: (603) 516-7910
>
> dbetebenner at nciea.org
> www.nciea.org
>
>
>
>
> -----Original Message-----
> From: Matthew Dowle [mailto:mdowlenoreply at virginmedia.com] On Behalf Of
> Matthew Dowle
> Sent: Friday, May 11, 2012 9:33 PM
> To: Damian Betebenner
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: := suggestions
>
> On Fri, 2012-05-11 at 13:15 -0500, Damian Betebenner wrote:
>> All,
>>
>> Trying to use := well but get errors and warnings and am looking for
>> an elegant way to subset and use := together when multiple variables
>> are being created and factors are involved.
>>
>> Here’s some code showing what I’m trying to do. Any help in doing this
>> better greatly appreciated:
>>
>> require(data.table)
>>
>> ### Base data.table
>>
>> test.dt <- data.table(ID=rep(1:10, 2),
>> CONTENT_AREA=as.factor(rep(c("MATH", "READ"), each=10)), X=rnorm(10))
>>
>> setkeyv(test.dt, c("ID", "CONTENT_AREA"))
>> test.dt
>>
>> ### Values to be looked up
>>
>> my.lookup <- data.table(ID=1:5,  CONTENT_AREA=as.factor("MATH"))
>> my.lookup
>>
>> ### Data table to be added to the original data.table
>>
>> my.additional.table <- data.table(my.lookup, VALID_CASE=factor(1,
>> levels=1:2, labels=c("VALID_CASE", "INVALID_CASE")),
>> Y=as.factor(letters[1:5]), Z=101:105)
>> my.additional.table
>>
>> ### First attempt with error
>>
>> test.dt[my.lookup, names(my.new.table) := my.additional.table,
>> with=FALSE, mult="first"]
>
> I get :
> Error in eval(expr, envir, enclos) : object 'my.new.table' not found
>
> but assuming that was a typo, then with :
>
> test.dt[my.lookup, names(my.additional.table) := my.additional.table,
> with=FALSE, mult="first"]
>
> I get :
>
> Error in `[.data.table`(test.dt, my.lookup,
> `:=`(names(my.additional.table),  :
>   Attempt to add new column(s) and set subset of rows at the same time.
> Create the new column(s) first, and then you'll be able to assign to a
> subset. If i is set to 1:nrow(x) then please remove that (no need, it's
> faster without).
>
> That error was meant to say "for now", oops. Will try and implement that
> in 1.8.1 (automatic adding of new column, padding with NA where the sub
> assigning := doesn't touch).  More comments below ...
>
>>
>> ### Create the variables in test.dt using := (but gives warnings and
>> is cumbersome to have to specify the class of the variables that are
>> going to be created)
>>
>> for (i in c("VALID_CASE", "Y", "Z")) {
>>                 test.dt[, i := NA_integer_, with=FALSE, mult="first"]
>>                 class(test.dt[[i]]) <- class(my.additional.table[[i]])
>>                 if (is.factor(test.dt[[i]])) levels(test.dt[[i]]) <-
>> levels(my.additional.table[[i]])
>> }
>
> Yes I get this warning (twice) too :
> Warning messages:
> 1: In `[.data.table`(test.dt, , `:=`(i, NA_integer_), with = FALSE,  :
>   Invalid .internal.selfref detected and fixed by taking a copy of the
> whole table, so that := can add this new column by reference. At an
> earlier point, this data.table has been copied by R. Avoid key<-,
> names<- and attr<- which in R currently (and oddly) all copy the whole
> data.table. Use set* syntax instead to avoid copying: setkey(),
> setnames() and setattr(). If this message doesn't help, please report to
> datatable-help so the root cause can be fixed.
>
> I guess that one or both of class<- and levels<- are copying the whole
> table. That is consistent with the first iteration working without warning,
> followed by warnings on the 2nd and 3rd.
>
> Just for now until it's automatic, and it might be useful for other tasks,
> empty factor columns can be created with factor(NA), and := is factor
> level aware so you can add new levels just by assigning a character value
> to an item (:= modifies the factor levels by reference
> for you).   So :
>
> for (i in c("VALID_CASE", "Y", "Z"))
>     test.dt[, i := if (is.factor(my.additional.table[[i]])) factor(NA) else
> NA_integer_, with=FALSE] # No warnings
>
> or,
>
> for (i in c("VALID_CASE", "Y", "Z"))
>     test.dt[, i := my.additional.table[[i]][NA], with=FALSE]
>
> which copes with more types and also retains all levels.
>
>>
>>
>> ### Successfully perform the variable creation on the rows indicated by
>> my.lookup
>>
>> test.dt[my.lookup, names(my.additional.table) := my.additional.table,
>> with=FALSE, mult="first"]
>>
>>
>>
>>
>> Damian Betebenner
>>
>> Center for Assessment
>>
>> PO Box 351
>>
>> Dover, NH   03821-0351
>>
>>
>>
>> Phone (office): (603) 516-7900
>>
>> Phone (cell): (857) 234-2474
>>
>> Fax: (603) 516-7910
>>
>>
>>
>> dbetebenner at nciea.org
>>
>> www.nciea.org
>>
>>
>
>
>
>



