[datatable-help] := unclarity and possible bug?

Matthew Dowle mdowle at mdowle.plus.com
Thu Aug 4 15:56:19 CEST 2011


Try cacheOK=FALSE (passed on to download.file() via ...)

Or, sessionInfo() please. Is it Windows (DLL not being refreshed)? Reboot,
clear out, R --vanilla install, clear out browser cache manually. Failing
all that, download the file manually and install from file.
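
One rough way to check which copy of a package R will pick up is sketched below. The helper name pkg_info is made up for illustration and is not part of data.table; it only uses base/utils calls. It reads the version from the installed library on disk, so a session that loaded the package before a reinstall may still be running the old code until R is restarted.

```r
# Hypothetical helper (not part of data.table): report the installed version,
# the library path it was found in, and whether it is loaded in this session.
pkg_info <- function(pkg) {
  path <- system.file(package = pkg)  # "" when the package is not installed
  if (!nzchar(path)) {
    return(list(package = pkg, version = NA_character_,
                path = NA_character_, loaded = FALSE))
  }
  list(package = pkg,
       version = as.character(utils::packageVersion(pkg)),  # read from disk
       path    = path,
       loaded  = pkg %in% loadedNamespaces())
}

str(pkg_info("data.table"))
```

If the reported path is not the library that install.packages() wrote to, an older copy earlier in .libPaths() is probably shadowing the new one.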

The rnorm(10) is already a vector as long as the table itself, so it invokes
a "replace" of the column. Since you, the user, already created the vector, it
is plonked right into the column, bang.
The other case is recycling or subassign, and that preserves the column's
type (for speed, unlike data.frame).
So, intended behaviour, just not what you expected.
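
The type side of this can be illustrated in plain base R, without data.table. The sketch below is only an analogy for "the column keeps its type and the value is coerced to it"; coerce_to_column_type is a made-up helper, not how data.table does it internally.

```r
# The bare NA constant is logical, which is why a column created with
# DT$z <- NA starts life as a logical column.
stopifnot(class(NA) == "logical")
stopifnot(class(NA_integer_) == "integer")
stopifnot(class(NA_real_) == "numeric")

# Hypothetical helper: coerce a value to the storage type of an existing
# column, mimicking "the column's type wins".
coerce_to_column_type <- function(value, column) {
  as.vector(value, mode = typeof(column))
}

z_logical <- rep(NA, 10)        # logical column
y_integer <- 1:10               # integer column

print(coerce_to_column_type(5, z_logical))     # 5 becomes TRUE, silently
print(coerce_to_column_type(TRUE, y_integer))  # TRUE becomes 1L
```

Creating the column with NA_integer_ or NA_real_ instead of NA fixes the type up front, so a later 5 is stored as a number rather than as TRUE.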


"Chris Neff" <caneff at gmail.com> wrote in message 
news:CAAuY0RWYJ8+wEbVG8XvYYqhSNFPV6XxgdQo8smRy+DRWGj5sfg at mail.gmail.com...
I've run the following 3 different times in new sessions:

install.packages("data.table",
repos="http://R-Forge.R-project.org",type="source")

and still DT[,z:=5] does nothing.  Is there something I can check to make
sure that the latest version is loaded?


As for the coercion stuff, it feels somewhat inconsistent
right now. For instance:

> DT <- data.table(x=1:10, y=1:10)

> DT$y <- TRUE

> sapply(DT, class)

        x         y
"integer" "integer"

> DT$y <- rnorm(10)
> sapply(DT, class)
        x         y
"integer" "numeric"

So in the first case y silently coerces the logical to an integer
without warning, but in the second case y happily turns into a numeric
when need be.  Why the difference?

When I do something like DT$y <- foo, I expect that y should turn into
foo regardless of what y was before.  If there is some reason why DT[,
y:=foo] should be different than DT$y <- foo, that is a secondary
matter, but I get mightily confused when DT$y <- foo doesn't behave
like data.frame.

On 4 August 2011 08:50, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> Still doesn't seem to be latest version: DT[,z:=5] should add column (and
> that's tested).
> Otherwise correct and intended behaviour (although an informative warning
> needs adding when 5 gets coerced to type of column (i.e. logical) - thanks
> for spotting). Remember as.logical(5) is TRUE without warning. So, try
> creating column with NA_integer_ or NA_real_ instead. Once the column type
> is set, that's it. Columns aren't coerced to match type of RHS, unlike
> data.frame [which if you think about it is a big hit if the data is 
> large].
>
> "Chris Neff" <caneff at gmail.com> wrote in message
> news:CAAuY0RXT7q+cm91PJ8KGkMwDApwFxM_EALb-Yu=P6ndp+LEfXg at mail.gmail.com...
> Ignore this second one, restarting and refreshing my data.table
> install now gives the proper error message when I try that. Sorry I'm
> not used to being on the bleeding edge of these things and I forget to
> update. However the first question is still mainly relevant:
>
>> DT <- data.table(x=1:10, y=rep(1:2,5))
>> DT[,z:=5]
> x y
> [1,] 1 1
> [2,] 2 2
> [3,] 3 1
> [4,] 4 2
> [5,] 5 1
> [6,] 6 2
> [7,] 7 1
> [8,] 8 2
> [9,] 9 1
> [10,] 10 2
>> DT[1:nrow(DT),z:=5]
> Error in `[.data.table`(DT, 1:nrow(DT), `:=`(z, 5)) :
> Attempt to add new column(s) and set subset of rows at the same
> time. Create the new column(s) first, and then you'll be able to
> assign to a subset. If i is set to 1:nrow(x) then please remove that
> (no need, it's faster without).
>> DT$z <- NA
>> DT[, z:=5]
> x y z
> [1,] 1 1 TRUE
> [2,] 2 2 TRUE
> [3,] 3 1 TRUE
> [4,] 4 2 TRUE
> [5,] 5 1 TRUE
> [6,] 6 2 TRUE
> [7,] 7 1 TRUE
> [8,] 8 2 TRUE
> [9,] 9 1 TRUE
> [10,] 10 2 TRUE
>
>
>
> The return on DT[,z:=5] when I haven't initialized DT$z yet is
> different, but still more uninformative than it is when I do
> DT[1:nrow(DT), z:=5]. And the DT$z <- NA issue is still there.
>
> Thanks!
>
>
> On 4 August 2011 08:18, Chris Neff <caneff at gmail.com> wrote:
>> A second question while I'm playing with it. It seems from the FRs
>> that it doesn't support multiple := in one select, but:
>>
>> DT <- data.table(x=1:10, y=rep(1:2,5))
>> DT$a = 0
>> DT$z = 0
>>
>> DT[, list(a := y/sum(y), z := 5)]
>>
>> works just fine for me. An error gets thrown but afterwards the
>> columns are modified as intended. Why the error?
>>
>>> DT[,list(z:=5,a:=y/sum(y))]
>> z
>> [1] 5
>> [1] TRUE
>> a
>> y/sum(y)
>> [1] TRUE
>> Error in data.table(`:=`(z, 5), `:=`(a, y/sum(y))) :
>> column or argument 1 is NULL
>>> DT
>> x y z a
>> [1,] 1 1 5 0.06666667
>> [2,] 2 2 5 0.13333333
>> [3,] 3 1 5 0.06666667
>> [4,] 4 2 5 0.13333333
>> [5,] 5 1 5 0.06666667
>> [6,] 6 2 5 0.13333333
>> [7,] 7 1 5 0.06666667
>> [8,] 8 2 5 0.13333333
>> [9,] 9 1 5 0.06666667
>> [10,] 10 2 5 0.13333333
>>
>> -Chris
>>
>> On 4 August 2011 08:12, Chris Neff <caneff at gmail.com> wrote:
>>> Hi all,
>>>
>>> If I do:
>>>
>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>
>>> Then try the following
>>>
>>> DT[, z:=5]
>>>
>>> I get:
>>>
>>>> DT[, z:=5]
>>> z
>>> [1] 5
>>> [1] TRUE
>>> NULL
>>>
>>> and if I were to do DT <- DT[, z:=5], then DT gets set to NULL.
>>> Alternatively if I do
>>>
>>> DT[1:10, z:=5]
>>>
>>> I get
>>>
>>>> DT=DT[1:nrow(DT),z:=5]
>>> z
>>> [1] 5
>>> [1] 1 2 3 4 5 6 7 8 9 10
>>> Error in `:=`(z, 5) :
>>> Attempt to add new column(s) and set subset of rows at the same
>>> time. Create the new column(s) first, and then you'll be able to
>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>> (no need, it's faster without).
>>>
>>>
>>> Which is more informative. So I do as it instructs:
>>>
>>> DT$z <- NA
>>>
>>> DT[, z:=5]
>>>
>>> And as output I get:
>>>
>>>> DT
>>> x y z
>>> [1,] 1 1 TRUE
>>> [2,] 2 2 TRUE
>>> [3,] 3 1 TRUE
>>> [4,] 4 2 TRUE
>>> [5,] 5 1 TRUE
>>> [6,] 6 2 TRUE
>>> [7,] 7 1 TRUE
>>> [8,] 8 2 TRUE
>>> [9,] 9 1 TRUE
>>> [10,] 10 2 TRUE
>>>
>>>
>>> Why isn't z 5 like assigned? I think it is because I assigned it as
>>> NA, and data table didn't know to change it to integer (although why
>>> it changed it to logical is another puzzle). If I instead do
>>>
>>> DT$z <- 0
>>>
>>> DT[, z:=5]
>>>
>>> It works fine.
>>>
>>> So my two points are:
>>>
>>> A) Doing DT[,z:=5] should be as informative as doing DT[1:nrow(DT),
>>> z:=5] with the error message.
>>>
>>> B) What went wrong with the NA assignment I did?
>>>
>>> Thanks!
>>> Chris
>>>
>>
>
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> 




