[datatable-help] := unclarity and possible bug?

Chris Neff caneff at gmail.com
Thu Aug 4 15:09:31 CEST 2011


I've run the following 3 different times in new sessions:

install.packages("data.table",
repos="http://R-Forge.R-project.org",type="source")

and still DT[,z:=5] does nothing.  Is there something I can check to make
sure that the latest version is loaded?


As for the coercion stuff, it seems somewhat inconsistent
right now. For instance:

> DT <- data.table(x=1:10, y=1:10)

> DT$y <- TRUE

> sapply(DT, class)

        x         y
"integer" "integer"

> DT$y <- rnorm(10)
> sapply(DT, class)
        x         y
"integer" "numeric"

So in the first case y silently coerces the logical to an integer
without warning, but in the second case y happily turns into a numeric
when need be.  Why the difference?

When I do something like DT$y <- foo, I expect that y should turn into
foo regardless of what y was before.  If there is some reason why DT[,
y:=foo] should be different than DT$y <- foo, that is a secondary
matter, but I get mightily confused when DT$y <- foo doesn't behave
like data.frame.
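
For reference, the data.frame behaviour described above, plus two base R facts that come up in the quoted messages below (the type of a bare NA, and as.logical(5)), can be checked with a minimal sketch like the following. It uses base R only and makes no data.table calls, so it says nothing about how any particular data.table version behaves. The variable names are illustrative.

```r
# Base R only: what `$<-` does to a column of a plain data.frame.
df <- data.frame(x = 1:10, y = 1:10)
class_before <- sapply(df, class)         # both columns start as integer

# Assigning through `$<-` replaces the column outright, so y takes the
# type of the right-hand side (the length-1 value is recycled to 10 rows).
df$y <- TRUE
class_after_logical <- sapply(df, class)

df$y <- rnorm(10)
class_after_numeric <- sapply(df, class)

# A bare NA is a logical constant; the typed NA constants carry their type.
na_classes <- c(class(NA), class(NA_integer_), class(NA_real_))

# Coercing a nonzero number to logical gives TRUE, with no warning.
five_as_logical <- as.logical(5)

print(class_before)
print(class_after_logical)
print(class_after_numeric)
print(na_classes)
print(five_as_logical)
```

The NA line is why a column created with `DT$z <- NA` starts out as logical rather than integer or numeric.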

On 4 August 2011 08:50, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> Still doesn't seem to be latest version:  DT[,z:=5] should add column (and
> that's tested).
> Otherwise correct and intended behaviour (although an informative warning
> needs adding when 5 gets coerced to type of column (i.e. logical) - thanks
> for spotting). Remember as.logical(5) is TRUE without warning.   So, try
> creating column with NA_integer_ or NA_real_ instead.  Once the column type
> is set,  that's it.  Columns aren't coerced to match type of RHS, unlike
> data.frame  [which if you think about it is a big hit if the data is large].
>
> "Chris Neff" <caneff at gmail.com> wrote in message
> news:CAAuY0RXT7q+cm91PJ8KGkMwDApwFxM_EALb-Yu=P6ndp+LEfXg at mail.gmail.com...
> Ignore this second one, restarting and refreshing my data.table
> install now gives the proper error message when I try that. Sorry I'm
> not used to being on the bleeding edge of these things and I forget to
> update. However the first question is still mainly relevant:
>
>> DT <- data.table(x=1:10, y=rep(1:2,5))
>> DT[,z:=5]
>       x y
>  [1,]  1 1
>  [2,]  2 2
>  [3,]  3 1
>  [4,]  4 2
>  [5,]  5 1
>  [6,]  6 2
>  [7,]  7 1
>  [8,]  8 2
>  [9,]  9 1
> [10,] 10 2
>> DT[1:nrow(DT),z:=5]
> Error in `[.data.table`(DT, 1:nrow(DT), `:=`(z, 5)) :
>  Attempt to add new column(s) and set subset of rows at the same
> time. Create the new column(s) first, and then you'll be able to
> assign to a subset. If i is set to 1:nrow(x) then please remove that
> (no need, it's faster without).
>> DT$z <- NA
>> DT[, z:=5]
>       x y    z
>  [1,]  1 1 TRUE
>  [2,]  2 2 TRUE
>  [3,]  3 1 TRUE
>  [4,]  4 2 TRUE
>  [5,]  5 1 TRUE
>  [6,]  6 2 TRUE
>  [7,]  7 1 TRUE
>  [8,]  8 2 TRUE
>  [9,]  9 1 TRUE
> [10,] 10 2 TRUE
>
>
>
> The return on DT[,z:=5] when I haven't initialized DT$z yet is
> different, but still less informative than it is when I do
> DT[1:nrow(DT), z:=5].  And the DT$z <- NA issue is still there.
>
> Thanks!
>
>
> On 4 August 2011 08:18, Chris Neff <caneff at gmail.com> wrote:
>> A second question while I'm playing with it. It seems from the FRs
>> that it doesn't support multiple := in one select, but:
>>
>> DT <- data.table(x=1:10, y=rep(1:2,5))
>> DT$a = 0
>> DT$z = 0
>>
>> DT[, list(a := y/sum(y), z := 5)]
>>
>> works just fine for me. An error gets thrown but afterwards the
>> columns are modified as intended. Why the error?
>>
>>> DT[,list(z:=5,a:=y/sum(y))]
>> z
>> [1] 5
>> [1] TRUE
>> a
>> y/sum(y)
>> [1] TRUE
>> Error in data.table(`:=`(z, 5), `:=`(a, y/sum(y))) :
>> column or argument 1 is NULL
>>> DT
>> x y z a
>> [1,] 1 1 5 0.06666667
>> [2,] 2 2 5 0.13333333
>> [3,] 3 1 5 0.06666667
>> [4,] 4 2 5 0.13333333
>> [5,] 5 1 5 0.06666667
>> [6,] 6 2 5 0.13333333
>> [7,] 7 1 5 0.06666667
>> [8,] 8 2 5 0.13333333
>> [9,] 9 1 5 0.06666667
>> [10,] 10 2 5 0.13333333
>>
>> -Chris
>>
>> On 4 August 2011 08:12, Chris Neff <caneff at gmail.com> wrote:
>>> Hi all,
>>>
>>> If I do:
>>>
>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>
>>> Then try the following
>>>
>>> DT[, z:=5]
>>>
>>> I get:
>>>
>>>> DT[, z:=5]
>>> z
>>> [1] 5
>>> [1] TRUE
>>> NULL
>>>
>>> and if I were to do DT <- DT[, z:=5], then DT gets set to NULL.
>>> Alternatively if I do
>>>
>>> DT[1:10, z:=5]
>>>
>>> I get
>>>
>>>> DT=DT[1:nrow(DT),z:=5]
>>> z
>>> [1] 5
>>> [1] 1 2 3 4 5 6 7 8 9 10
>>> Error in `:=`(z, 5) :
>>> Attempt to add new column(s) and set subset of rows at the same
>>> time. Create the new column(s) first, and then you'll be able to
>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>> (no need, it's faster without).
>>>
>>>
>>> Which is more informative. So I do as it instructs:
>>>
>>> DT$z <- NA
>>>
>>> DT[, z:=5]
>>>
>>> And as output I get:
>>>
>>>> DT
>>> x y z
>>> [1,] 1 1 TRUE
>>> [2,] 2 2 TRUE
>>> [3,] 3 1 TRUE
>>> [4,] 4 2 TRUE
>>> [5,] 5 1 TRUE
>>> [6,] 6 2 TRUE
>>> [7,] 7 1 TRUE
>>> [8,] 8 2 TRUE
>>> [9,] 9 1 TRUE
>>> [10,] 10 2 TRUE
>>>
>>>
>>> Why isn't z 5 as assigned? I think it is because I assigned it as
>>> NA, and data.table didn't know to change it to integer (although why
>>> it changed it to logical is another puzzle). If I instead do
>>>
>>> DT$z <- 0
>>>
>>> DT[, z:=5]
>>>
>>> It works fine.
>>>
>>> So my two points are:
>>>
>>> A) Doing DT[,z:=5] should be as informative as doing DT[1:nrow(DT),
>>> z:=5] with the error message.
>>>
>>> B) What went wrong with the NA assignment I did?
>>>
>>> Thanks!
>>> Chris
>>>
>>
>
>
>

