[datatable-help] := unclarity and possible bug?

Chris Neff caneff at gmail.com
Thu Aug 4 16:06:10 CEST 2011


cacheOK=FALSE didn't fix it. I'm running on Ubuntu (so does anything in
that paragraph apply to me?). Session info:

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=C                 LC_COLLATE=C
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.6.3


On 4 August 2011 09:56, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Try cacheOK=FALSE (passed on to download.file() via ...)
>
> Or, sessionInfo() please. Is it Windows (dll not being refreshed)? Reboot,
> clear out, R --vanilla install, clear out browser cache manually. Failing
> all that, download the file manually and install from file.
>
> The rnorm(10) is already a vector as long as the table itself, which invokes
> a "replace" of the column. Since you, the user, already created it, it is
> plonked right into the column, bang.
> The other case is recycling or subassign, and that preserves the column's
> type (for speed, unlike data.frame).
> So: intended behaviour, just not what you expected.
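The two paths described above can be mimicked on plain R vectors. This is a minimal sketch, not data.table's code: `assign_column` is a hypothetical helper, and it assumes the rule is simply "a full-length RHS replaces the column; anything shorter is coerced to the column's existing type and recycled". Under that assumption it reproduces the two results in Chris's `DT$y` example.

```r
# Hypothetical helper, NOT part of data.table: sketches the two assignment
# paths for putting `value` into an existing column `col`.
assign_column <- function(col, value) {
  if (length(value) == length(col)) {
    # "replace": a full-length vector the user built is plonked in as-is,
    # so the column takes on the type of `value`
    value
  } else {
    # recycling / subassign: `value` is first coerced to the column's
    # existing storage type, then recycled to the column's length
    storage.mode(value) <- storage.mode(col)
    rep_len(value, length(col))
  }
}

y <- 1:10
class(assign_column(y, TRUE))       # "integer": TRUE was coerced to 1L
class(assign_column(y, rnorm(10)))  # "numeric": column replaced wholesale
```

The same assumed rule also accounts for the `DT$z <- NA` puzzle further down the thread: `assign_column(c(NA, NA), 5)` returns `c(TRUE, TRUE)`, because the placeholder column is logical.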
>
>
> "Chris Neff" <caneff at gmail.com> wrote in message
> news:CAAuY0RWYJ8+wEbVG8XvYYqhSNFPV6XxgdQo8smRy+DRWGj5sfg at mail.gmail.com...
> I've run the following 3 different times in new sessions:
>
> install.packages("data.table",
> repos="http://R-Forge.R-project.org",type="source")
>
> and still DT[,z:=5] does nothing.  Is there something I check to make
> sure that the latest version is loaded?
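One way to answer the "which version is loaded?" question with base R: `packageVersion()` reads the version recorded in the installed package's files, while `getNamespaceVersion()` reports the version of the namespace already loaded in memory. A minimal sketch; `report_version` is a hypothetical helper name. If the two values disagree, the session is most likely still running the code loaded before the reinstall and needs a restart.

```r
# Hypothetical helper: report the installed version of a package and the
# version of its namespace as loaded in this R session.
report_version <- function(pkg = "data.table") {
  # requireNamespace() loads the namespace if needed; FALSE if not installed
  if (!requireNamespace(pkg, quietly = TRUE)) return(NULL)
  c(installed = as.character(packageVersion(pkg)),  # from the files on disk
    loaded    = unname(getNamespaceVersion(pkg)))   # from the loaded namespace
}

report_version("data.table")
```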
>
>
> As for the coercion stuff, I feel it is somewhat inconsistent
> right now. For instance:
>
>> DT <- data.table(x=1:10, y=1:10)
>
>> DT$y <- TRUE
>
>> sapply(DT, class)
>
>        x         y
> "integer" "integer"
>
>> DT$y <- rnorm(10)
>> sapply(DT, class)
>        x         y
> "integer" "numeric"
>
> So in the first case y silently coerces the logical to an integer
> without warning, but in the second case y happily turns into a numeric
> when need be.  Why the difference?
>
> When I do something like DT$y <- foo, I expect that y should turn into
> foo regardless of what y was before.  If there is some reason why DT[,
> y:=foo] should be different than DT$y <- foo, that is a secondary
> matter, but I get mightily confused when DT$y <- foo doesn't behave
> like data.frame.
>
> On 4 August 2011 08:50, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>> Still doesn't seem to be the latest version: DT[,z:=5] should add the
>> column (and that's tested).
>> Otherwise this is correct and intended behaviour (although an informative
>> warning needs adding when 5 gets coerced to the type of the column, i.e.
>> logical; thanks for spotting). Remember as.logical(5) is TRUE without
>> warning. So, try creating the column with NA_integer_ or NA_real_ instead.
>> Once the column type is set, that's it. Columns aren't coerced to match
>> the type of the RHS, unlike data.frame [which, if you think about it, is a
>> big hit if the data is large].
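The advice to create the column with `NA_integer_` or `NA_real_` rests on the fact that each kind of NA in R carries its own type, and a bare `NA` is logical. A short base-R sketch, independent of data.table, showing the type of each placeholder and what 5 becomes when coerced to that type:

```r
# Each NA placeholder has its own type; coerce 5 to each one in turn
for (p in list(NA, NA_integer_, NA_real_)) {
  v <- 5
  storage.mode(v) <- storage.mode(p)   # coerce 5 to the placeholder's type
  cat(sprintf("%s -> %s\n", class(p), format(v)))
}
# logical -> TRUE
# integer -> 5
# numeric -> 5

# as.logical() turns any non-zero number into TRUE, with no warning
as.logical(5)   # TRUE
```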
>>
>> "Chris Neff" <caneff at gmail.com> wrote in message
>> news:CAAuY0RXT7q+cm91PJ8KGkMwDApwFxM_EALb-Yu=P6ndp+LEfXg at mail.gmail.com...
>> Ignore this second one; restarting and refreshing my data.table
>> install now gives the proper error message when I try that. Sorry, I'm
>> not used to being on the bleeding edge of these things and I forget to
>> update. However, the first question is still mainly relevant:
>>
>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>> DT[,z:=5]
>> x y
>> [1,] 1 1
>> [2,] 2 2
>> [3,] 3 1
>> [4,] 4 2
>> [5,] 5 1
>> [6,] 6 2
>> [7,] 7 1
>> [8,] 8 2
>> [9,] 9 1
>> [10,] 10 2
>>> DT[1:nrow(DT),z:=5]
>> Error in `[.data.table`(DT, 1:nrow(DT), `:=`(z, 5)) :
>> Attempt to add new column(s) and set subset of rows at the same
>> time. Create the new column(s) first, and then you'll be able to
>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>> (no need, it's faster without).
>>> DT$z <- NA
>>> DT[, z:=5]
>> x y z
>> [1,] 1 1 TRUE
>> [2,] 2 2 TRUE
>> [3,] 3 1 TRUE
>> [4,] 4 2 TRUE
>> [5,] 5 1 TRUE
>> [6,] 6 2 TRUE
>> [7,] 7 1 TRUE
>> [8,] 8 2 TRUE
>> [9,] 9 1 TRUE
>> [10,] 10 2 TRUE
>>
>>
>>
>> The return value of DT[,z:=5] when I haven't initialized DT$z yet is
>> different, but still less informative than the error I get from
>> DT[1:nrow(DT), z:=5]. And the DT$z <- NA issue is still there.
>>
>> Thanks!
>>
>>
>> On 4 August 2011 08:18, Chris Neff <caneff at gmail.com> wrote:
>>> A second question while I'm playing with it. It seems from the FRs
>>> that it doesn't support multiple := in one select, but:
>>>
>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>> DT$a = 0
>>> DT$z = 0
>>>
>>> DT[, list(a := y/sum(y), z := 5)]
>>>
>>> works for me, in a way: an error gets thrown, but afterwards the
>>> columns are modified as intended. Why the error?
>>>
>>>> DT[,list(z:=5,a:=y/sum(y))]
>>> z
>>> [1] 5
>>> [1] TRUE
>>> a
>>> y/sum(y)
>>> [1] TRUE
>>> Error in data.table(`:=`(z, 5), `:=`(a, y/sum(y))) :
>>> column or argument 1 is NULL
>>>> DT
>>> x y z a
>>> [1,] 1 1 5 0.06666667
>>> [2,] 2 2 5 0.13333333
>>> [3,] 3 1 5 0.06666667
>>> [4,] 4 2 5 0.13333333
>>> [5,] 5 1 5 0.06666667
>>> [6,] 6 2 5 0.13333333
>>> [7,] 7 1 5 0.06666667
>>> [8,] 8 2 5 0.13333333
>>> [9,] 9 1 5 0.06666667
>>> [10,] 10 2 5 0.13333333
>>>
>>> -Chris
>>>
>>> On 4 August 2011 08:12, Chris Neff <caneff at gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> If I do:
>>>>
>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>
>>>> Then try the following
>>>>
>>>> DT[, z:=5]
>>>>
>>>> I get:
>>>>
>>>>> DT[, z:=5]
>>>> z
>>>> [1] 5
>>>> [1] TRUE
>>>> NULL
>>>>
>>>> and if I were to do DT <- DT[, z:=5], then DT gets set to NULL.
>>>> Alternatively if I do
>>>>
>>>> DT[1:10, z:=5]
>>>>
>>>> I get
>>>>
>>>>> DT=DT[1:nrow(DT),z:=5]
>>>> z
>>>> [1] 5
>>>> [1] 1 2 3 4 5 6 7 8 9 10
>>>> Error in `:=`(z, 5) :
>>>> Attempt to add new column(s) and set subset of rows at the same
>>>> time. Create the new column(s) first, and then you'll be able to
>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>> (no need, it's faster without).
>>>>
>>>>
>>>> Which is more informative. So I do as it instructs:
>>>>
>>>> DT$z <- NA
>>>>
>>>> DT[, z:=5]
>>>>
>>>> And as output I get:
>>>>
>>>>> DT
>>>> x y z
>>>> [1,] 1 1 TRUE
>>>> [2,] 2 2 TRUE
>>>> [3,] 3 1 TRUE
>>>> [4,] 4 2 TRUE
>>>> [5,] 5 1 TRUE
>>>> [6,] 6 2 TRUE
>>>> [7,] 7 1 TRUE
>>>> [8,] 8 2 TRUE
>>>> [9,] 9 1 TRUE
>>>> [10,] 10 2 TRUE
>>>>
>>>>
>>>> Why isn't z 5, as assigned? I think it is because I assigned it as
>>>> NA, and data.table didn't know to change it to integer (although why
>>>> it changed it to logical is another puzzle). If I instead do
>>>>
>>>> DT$z <- 0
>>>>
>>>> DT[, z:=5]
>>>>
>>>> It works fine.
>>>>
>>>> So my two points are:
>>>>
>>>> A) Doing DT[,z:=5] should be as informative as doing DT[1:nrow(DT),
>>>> z:=5] with the error message.
>>>>
>>>> B) What went wrong with the NA assignment I did?
>>>>
>>>> Thanks!
>>>> Chris
>>>>
>>>
>>
>>
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>

