[datatable-help] := unclarity and possible bug?

Matthew Dowle mdowle at mdowle.plus.com
Thu Aug 4 14:50:24 CEST 2011


That still doesn't look like the latest version: DT[,z:=5] should add the
column (and that's tested).
Otherwise this is correct and intended behaviour, although an informative
warning needs adding when 5 gets coerced to the type of the column (i.e.
logical) - thanks for spotting. Remember as.logical(5) is TRUE without
warning. So, try creating the column with NA_integer_ or NA_real_ instead.
Once the column type is set, that's it. Columns aren't coerced to match the
type of the RHS, unlike data.frame [which, if you think about it, is a big
hit if the data is large].
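
As a minimal sketch of the type point above (base R only, so it does not
depend on which data.table version is installed; the data frame name df is
just illustrative):

```r
# Plain NA is a logical constant; the typed NAs carry their own type.
t_na   <- typeof(NA)            # "logical"
t_int  <- typeof(NA_integer_)   # "integer"
t_real <- typeof(NA_real_)      # "double"

# Coercing 5 to logical succeeds silently, with no warning.
five_as_logical <- as.logical(5)   # TRUE

# data.frame re-types the whole column to match the RHS on assignment.
df <- data.frame(x = 1:3)
df$z <- NA
class_before <- class(df$z)     # "logical"
df$z <- 5
class_after <- class(df$z)      # "numeric"
```

Creating the column from NA_real_ (or NA_integer_) rather than plain NA
therefore gives it a double (or integer) type from the start.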

"Chris Neff" <caneff at gmail.com> wrote in message 
news:CAAuY0RXT7q+cm91PJ8KGkMwDApwFxM_EALb-Yu=P6ndp+LEfXg at mail.gmail.com...
Ignore this second one: restarting and refreshing my data.table
install now gives the proper error message when I try that. Sorry, I'm
not used to being on the bleeding edge of these things and I forget to
update. However, the first question is still mostly relevant:

> DT <- data.table(x=1:10, y=rep(1:2,5))
> DT[,z:=5]
       x y
 [1,]  1 1
 [2,]  2 2
 [3,]  3 1
 [4,]  4 2
 [5,]  5 1
 [6,]  6 2
 [7,]  7 1
 [8,]  8 2
 [9,]  9 1
[10,] 10 2
> DT[1:nrow(DT),z:=5]
Error in `[.data.table`(DT, 1:nrow(DT), `:=`(z, 5)) :
  Attempt to add new column(s) and set subset of rows at the same
time. Create the new column(s) first, and then you'll be able to
assign to a subset. If i is set to 1:nrow(x) then please remove that
(no need, it's faster without).
> DT$z <- NA
> DT[, z:=5]
       x y    z
 [1,]  1 1 TRUE
 [2,]  2 2 TRUE
 [3,]  3 1 TRUE
 [4,]  4 2 TRUE
 [5,]  5 1 TRUE
 [6,]  6 2 TRUE
 [7,]  7 1 TRUE
 [8,]  8 2 TRUE
 [9,]  9 1 TRUE
[10,] 10 2 TRUE



The return on DT[,z:=5] when I haven't initialized DT$z yet is now
different, but it is still less informative than what I get from
DT[1:nrow(DT), z:=5].  And the DT$z <- NA issue is still there.

Thanks!


On 4 August 2011 08:18, Chris Neff <caneff at gmail.com> wrote:
> A second question while I'm playing with it. It seems from the FRs
> that it doesn't support multiple := in one select, but:
>
> DT <- data.table(x=1:10, y=rep(1:2,5))
> DT$a = 0
> DT$z = 0
>
> DT[, list(a := y/sum(y), z := 5)]
>
> works just fine for me. An error gets thrown but afterwards the
> columns are modified as intended. Why the error?
>
>> DT[,list(z:=5,a:=y/sum(y))]
> z
> [1] 5
> [1] TRUE
> a
> y/sum(y)
> [1] TRUE
> Error in data.table(`:=`(z, 5), `:=`(a, y/sum(y))) :
> column or argument 1 is NULL
>> DT
> x y z a
> [1,] 1 1 5 0.06666667
> [2,] 2 2 5 0.13333333
> [3,] 3 1 5 0.06666667
> [4,] 4 2 5 0.13333333
> [5,] 5 1 5 0.06666667
> [6,] 6 2 5 0.13333333
> [7,] 7 1 5 0.06666667
> [8,] 8 2 5 0.13333333
> [9,] 9 1 5 0.06666667
> [10,] 10 2 5 0.13333333
>
> -Chris
>
> On 4 August 2011 08:12, Chris Neff <caneff at gmail.com> wrote:
>> Hi all,
>>
>> If I do:
>>
>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>
>> Then try the following
>>
>> DT[, z:=5]
>>
>> I get:
>>
>>> DT[, z:=5]
>> z
>> [1] 5
>> [1] TRUE
>> NULL
>>
>> and if I were to do DT <- DT[, z:=5], then DT gets set to NULL.
>> Alternatively if I do
>>
>> DT[1:10, z:=5]
>>
>> I get
>>
>>> DT=DT[1:nrow(DT),z:=5]
>> z
>> [1] 5
>> [1] 1 2 3 4 5 6 7 8 9 10
>> Error in `:=`(z, 5) :
>> Attempt to add new column(s) and set subset of rows at the same
>> time. Create the new column(s) first, and then you'll be able to
>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>> (no need, it's faster without).
>>
>>
>> Which is more informative. So I do as it instructs:
>>
>> DT$z <- NA
>>
>> DT[, z:=5]
>>
>> And as output I get:
>>
>>> DT
>> x y z
>> [1,] 1 1 TRUE
>> [2,] 2 2 TRUE
>> [3,] 3 1 TRUE
>> [4,] 4 2 TRUE
>> [5,] 5 1 TRUE
>> [6,] 6 2 TRUE
>> [7,] 7 1 TRUE
>> [8,] 8 2 TRUE
>> [9,] 9 1 TRUE
>> [10,] 10 2 TRUE
>>
>>
>> Why isn't z 5 as assigned? I think it is because I assigned it as
>> NA, and data.table didn't know to change it to integer (although why
>> it changed it to logical is another puzzle). If I instead do
>>
>> DT$z <- 0
>>
>> DT[, z:=5]
>>
>> It works fine.
>>
>> So my two points are:
>>
>> A) Doing DT[,z:=5] should be as informative as doing DT[1:nrow(DT),
>> z:=5] with the error message.
>>
>> B) What went wrong with the NA assignment I did?
>>
>> Thanks!
>> Chris
>>
> 




