[datatable-help] := unclarity and possible bug?

Matthew Dowle mdowle at mdowle.plus.com
Thu Aug 4 16:16:28 CEST 2011


Does test.data.table() work? How many tests does it run? Does the startup
banner state 1.6.4?
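
A rough way to see which copy of a package a session would pick up, in case the startup banner has scrolled past. This is only a sketch: installed_version() is a made-up helper name, not part of data.table, and it uses nothing beyond base R's system.file() and packageDescription().

```r
# Hypothetical helper (not a data.table function): report the version of the
# first copy of `pkg` found on the library search path, or NA if none exists.
installed_version <- function(pkg) {
  path <- system.file(package = pkg)   # "" when the package is not installed
  if (!nzchar(path)) {
    return(NA_character_)
  }
  # Read the Version field from that copy's DESCRIPTION file
  as.character(packageDescription(pkg, fields = "Version"))
}

# A stale copy earlier in .libPaths() shadows a newer install further down,
# so print the search order alongside the version that was found.
print(.libPaths())
print(installed_version("data.table"))
```

If this prints an older version than the one just installed, the fresh install probably went into a different library directory from the one being loaded.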

"Chris Neff" <caneff at gmail.com> wrote in message 
news:CAAuY0RX3er-gSB52am1AZo4Ws9=fbXzqigaJnmfKqPxF_mn-8g at mail.gmail.com...
I thought this might be the problem (from the R-forge download page):
"In order to successfully install the packages provided on R-Forge,
you have to switch to the most recent version of R or, alternatively,
install from the package sources (.tar.gz) in older versions of R"

I'm running 2.12.1 because it is built with internal company compilers
for extra internal support. I tried just downloading the
data.table_1.6.3.tar.gz file from r-forge and doing R CMD INSTALL, but
still DT[,z:=5] doesn't work.



On 4 August 2011 10:06, Chris Neff <caneff at gmail.com> wrote:
> cacheOK=FALSE didn't fix it. Running on Ubuntu (so should I do
> anything in that paragraph?). Session info:
>
>> sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
> [3] LC_TIME=C LC_COLLATE=C
> [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
> [7] LC_PAPER=en_US.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] data.table_1.6.3
>
>
> On 4 August 2011 09:56, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>
>> Try cacheOK=FALSE (passed on to download.file() via ...)
>>
>> Or, sessionInfo() please. Is it Windows (dll not being refreshed)? Reboot,
>> clear out, R --vanilla install, clear out browser cache manually. Failing
>> all that, download file manually and install from file.
>>
>> The rnorm(10) is already a vector as long as the table itself => invokes
>> "replace" column. Since you the user already created it, it is plonked
>> right into the column, bang.
>> The other case is recycling or subassign, and that preserves the column's
>> type (for speed, unlike data.frame).
>> So, intended behaviour, just not what you expected.
>>
>>
>> "Chris Neff" <caneff at gmail.com> wrote in message
>> news:CAAuY0RWYJ8+wEbVG8XvYYqhSNFPV6XxgdQo8smRy+DRWGj5sfg at mail.gmail.com...
>> I've run the following 3 different times in new sessions:
>>
>> install.packages("data.table",
>> repos="http://R-Forge.R-project.org",type="source")
>>
>> and still DT[,z:=5] does nothing. Is there something I can check to make
>> sure that the latest version is loaded?
>>
>>
>> As for the coercion stuff, it feels somewhat inconsistent
>> right now. For instance:
>>
>>> DT <- data.table(x=1:10, y=1:10)
>>
>>> DT$y <- TRUE
>>
>>> sapply(DT, class)
>>
>> x y
>> "integer" "integer"
>>
>>> DT$y <- rnorm(10)
>>> sapply(DT, class)
>> x y
>> "integer" "numeric"
>>
>> So in the first case y silently coerces the logical to an integer
>> without warning, but in the second case y happily turns into a numeric
>> when need be. Why the difference?
>>
>> When I do something like DT$y <- foo, I expect that y should turn into
>> foo regardless of what y was before. If there is some reason why DT[,
>> y:=foo] should be different than DT$y <- foo, that is a secondary
>> matter, but I get mightily confused when DT$y <- foo doesn't behave
>> like data.frame.
>>
>> On 4 August 2011 08:50, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>> Still doesn't seem to be latest version: DT[,z:=5] should add column (and
>>> that's tested).
>>> Otherwise correct and intended behaviour (although an informative warning
>>> needs adding when 5 gets coerced to type of column (i.e. logical) - thanks
>>> for spotting). Remember as.logical(5) is TRUE without warning. So, try
>>> creating column with NA_integer_ or NA_real_ instead. Once the column type
>>> is set, that's it. Columns aren't coerced to match type of RHS, unlike
>>> data.frame [which if you think about it is a big hit if the data is
>>> large].
>>>
>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>> news:CAAuY0RXT7q+cm91PJ8KGkMwDApwFxM_EALb-Yu=P6ndp+LEfXg at mail.gmail.com...
>>> Ignore this second one, restarting and refreshing my data.table
>>> install now gives the proper error message when I try that. Sorry I'm
>>> not used to being on the bleeding edge of these things and I forget to
>>> update. However the first question is still mainly relevant:
>>>
>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>> DT[,z:=5]
>>> x y
>>> [1,] 1 1
>>> [2,] 2 2
>>> [3,] 3 1
>>> [4,] 4 2
>>> [5,] 5 1
>>> [6,] 6 2
>>> [7,] 7 1
>>> [8,] 8 2
>>> [9,] 9 1
>>> [10,] 10 2
>>>> DT[1:nrow(DT),z:=5]
>>> Error in `[.data.table`(DT, 1:nrow(DT), `:=`(z, 5)) :
>>> Attempt to add new column(s) and set subset of rows at the same
>>> time. Create the new column(s) first, and then you'll be able to
>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>> (no need, it's faster without).
>>>> DT$z <- NA
>>>> DT[, z:=5]
>>> x y z
>>> [1,] 1 1 TRUE
>>> [2,] 2 2 TRUE
>>> [3,] 3 1 TRUE
>>> [4,] 4 2 TRUE
>>> [5,] 5 1 TRUE
>>> [6,] 6 2 TRUE
>>> [7,] 7 1 TRUE
>>> [8,] 8 2 TRUE
>>> [9,] 9 1 TRUE
>>> [10,] 10 2 TRUE
>>>
>>>
>>>
>>> The return on DT[,z:=5] when I haven't initialized DT$z yet is
>>> different, but still less informative than it is when I do
>>> DT[1:nrow(DT), z:=5]. And the DT$z <- NA issue is still there.
>>>
>>> Thanks!
>>>
>>>
>>> On 4 August 2011 08:18, Chris Neff <caneff at gmail.com> wrote:
>>>> A second question while I'm playing with it. It seems from the FRs
>>>> that it doesn't support multiple := in one select, but:
>>>>
>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>> DT$a = 0
>>>> DT$z = 0
>>>>
>>>> DT[, list(a := y/sum(y), z := 5)]
>>>>
>>>> works just fine for me. An error gets thrown but afterwards the
>>>> columns are modified as intended. Why the error?
>>>>
>>>>> DT[,list(z:=5,a:=y/sum(y))]
>>>> z
>>>> [1] 5
>>>> [1] TRUE
>>>> a
>>>> y/sum(y)
>>>> [1] TRUE
>>>> Error in data.table(`:=`(z, 5), `:=`(a, y/sum(y))) :
>>>> column or argument 1 is NULL
>>>>> DT
>>>> x y z a
>>>> [1,] 1 1 5 0.06666667
>>>> [2,] 2 2 5 0.13333333
>>>> [3,] 3 1 5 0.06666667
>>>> [4,] 4 2 5 0.13333333
>>>> [5,] 5 1 5 0.06666667
>>>> [6,] 6 2 5 0.13333333
>>>> [7,] 7 1 5 0.06666667
>>>> [8,] 8 2 5 0.13333333
>>>> [9,] 9 1 5 0.06666667
>>>> [10,] 10 2 5 0.13333333
>>>>
>>>> -Chris
>>>>
>>>> On 4 August 2011 08:12, Chris Neff <caneff at gmail.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> If I do:
>>>>>
>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>
>>>>> Then try the following
>>>>>
>>>>> DT[, z:=5]
>>>>>
>>>>> I get:
>>>>>
>>>>>> DT[, z:=5]
>>>>> z
>>>>> [1] 5
>>>>> [1] TRUE
>>>>> NULL
>>>>>
>>>>> and if I were to do DT <- DT[, z:=5], then DT gets set to NULL.
>>>>> Alternatively if I do
>>>>>
>>>>> DT[1:10, z:=5]
>>>>>
>>>>> I get
>>>>>
>>>>>> DT=DT[1:nrow(DT),z:=5]
>>>>> z
>>>>> [1] 5
>>>>> [1] 1 2 3 4 5 6 7 8 9 10
>>>>> Error in `:=`(z, 5) :
>>>>> Attempt to add new column(s) and set subset of rows at the same
>>>>> time. Create the new column(s) first, and then you'll be able to
>>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>>> (no need, it's faster without).
>>>>>
>>>>>
>>>>> Which is more informative. So I do as it instructs:
>>>>>
>>>>> DT$z <- NA
>>>>>
>>>>> DT[, z:=5]
>>>>>
>>>>> And as output I get:
>>>>>
>>>>>> DT
>>>>> x y z
>>>>> [1,] 1 1 TRUE
>>>>> [2,] 2 2 TRUE
>>>>> [3,] 3 1 TRUE
>>>>> [4,] 4 2 TRUE
>>>>> [5,] 5 1 TRUE
>>>>> [6,] 6 2 TRUE
>>>>> [7,] 7 1 TRUE
>>>>> [8,] 8 2 TRUE
>>>>> [9,] 9 1 TRUE
>>>>> [10,] 10 2 TRUE
>>>>>
>>>>>
>>>>> Why isn't z 5 as assigned? I think it is because I assigned it as
>>>>> NA, and data.table didn't know to change it to integer (although why
>>>>> it changed it to logical is another puzzle). If I instead do
>>>>>
>>>>> DT$z <- 0
>>>>>
>>>>> DT[, z:=5]
>>>>>
>>>>> It works fine.
>>>>>
>>>>> So my two points are:
>>>>>
>>>>> A) Doing DT[,z:=5] should be as informative as doing DT[1:nrow(DT),
>>>>> z:=5] with the error message.
>>>>>
>>>>> B) What went wrong with the NA assignment I did?
>>>>>
>>>>> Thanks!
>>>>> Chris
>>>>>
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>
> 
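
The logical column that shows up in the thread after DT$z <- NA can be reproduced with plain vectors in base R, without data.table at all. The lines below are only an illustration of the coercion rules mentioned above (a bare NA is logical, and as.logical(5) is TRUE); the vectors a and b are made up for the example, and the last step imitates "keep the existing type" by coercing by hand rather than showing what data.table does internally.

```r
# A bare NA is a logical constant; the typed variants carry their own type.
class(NA)            # "logical"
class(NA_integer_)   # "integer"
class(NA_real_)      # "numeric"

# Coercing a non-zero number to logical gives TRUE, with no warning.
as.logical(5)        # TRUE

# Base R sub-assignment promotes the vector to fit the new value
# (the data.frame-like behaviour described in the thread).
a <- rep(NA, 3)
a[] <- 5
class(a)             # "numeric"

# Keeping the existing type instead means coercing the right-hand side,
# which is how a logical column ends up holding TRUE rather than 5.
b <- rep(NA, 3)
b[] <- as.logical(5)
b                    # TRUE TRUE TRUE

# Starting from a typed NA avoids the surprise.
d <- rep(NA_real_, 3)
d[] <- 5
d                    # 5 5 5
```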