[datatable-help] := unclarity and possible bug?

Chris Neff caneff at gmail.com
Thu Aug 4 16:27:16 CEST 2011


> test.data.table()
Running /home/caneff/R/x86_64-pc-linux-gnu-library/2.12/data.table/tests/tests.R
Loading required package: ggplot2
Loading required package: reshape

Attaching package: 'reshape'

The following object(s) are masked from 'package:plyr':

    rename, round_any

Loading required package: grid
Loading required package: proto
Test 304 Error in try(x, TRUE) : could not find function "haskey"
Error in eval(expr, envir, enclos) : 1 errors in test.data.table()

And the banner says 1.6.3.  I got it from the link after "Download:"
here: https://r-forge.r-project.org/R/?group_id=240

Should I be clicking somewhere else for 1.6.4? Why won't
install.packages with type="source" get 1.6.4 for me either?
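
A quick way to confirm which version R actually finds on disk, separate from the startup banner. This is only a sketch: `installed_at_least` is a made-up helper name, not part of data.table, and "1.6.4" is simply the version asked about in this thread. It relies on `packageVersion()` from utils, which exists from R 2.12.0 on.

```r
# Hypothetical helper (not part of data.table): TRUE if `pkg` is installed
# in the current library paths at version `min_version` or later.
installed_at_least <- function(pkg, min_version) {
  # system.file() returns "" when the package is not installed at all.
  if (!nzchar(system.file(package = pkg))) return(FALSE)
  utils::packageVersion(pkg) >= package_version(min_version)
}

# Library directories are searched in this order; the first copy of a
# package found here is the one library() loads.
print(.libPaths())

# Is the installed data.table at least 1.6.4?
print(installed_at_least("data.table", "1.6.4"))
```

If this prints FALSE right after an install that reported success, comparing the install location against `.libPaths()` is a reasonable next step.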

On 4 August 2011 10:16, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> does test.data.table() work?  How many tests does it run?  Does the startup
> banner state 1.6.4 ?
>
> "Chris Neff" <caneff at gmail.com> wrote in message
> news:CAAuY0RX3er-gSB52am1AZo4Ws9=fbXzqigaJnmfKqPxF_mn-8g at mail.gmail.com...
> I thought this might be the problem (from the R-forge download page):
> "In order to successfully install the packages provided on R-Forge,
> you have to switch to the most recent version of R or, alternatively,
> install from the package sources (.tar.gz) in older versions of R"
>
> I'm running 2.12.1 because it is built with internal company compilers
> for extra internal support. I tried just downloading the
> data.table_1.6.3.tar.gz file from r-forge and doing R CMD INSTALL, but
> still DT[,z:=5] doesn't work.
>
>
>
> On 4 August 2011 10:06, Chris Neff <caneff at gmail.com> wrote:
>> cacheOK=FALSE didn't fix it. Running on Ubuntu (so should I do
>> anything in that paragraph?). Session info:
>>
>>> sessionInfo()
>> R version 2.12.1 (2010-12-16)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
>> [3] LC_TIME=C LC_COLLATE=C
>> [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
>> [7] LC_PAPER=en_US.utf8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] data.table_1.6.3
>>
>>
>> On 4 August 2011 09:56, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>
>>> Try cacheOK=FALSE (passed on to download.file() via ...)
>>>
>>> Or, sessionInfo() please. Is it Windows (dll not being refreshed)? Reboot,
>>> clear out, R --vanilla install, clear out browser cache manually. Failing
>>> all that, download file manually and install from file.
>>>
>>> The rnorm(10) is already a vector as long as the table itself => invokes
>>> "replace" column. Since you the user already created it, it is plonked
>>> right into the column, bang.
>>> The other case is recycling or subassign, and that preserves the column's
>>> type (for speed, unlike data.frame).
>>> So, intended behaviour, just not what you expected.
>>>
>>>
>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>> news:CAAuY0RWYJ8+wEbVG8XvYYqhSNFPV6XxgdQo8smRy+DRWGj5sfg at mail.gmail.com...
>>> I've run the following 3 different times in new sessions:
>>>
>>> install.packages("data.table",
>>> repos="http://R-Forge.R-project.org",type="source")
>>>
>>> and still DT[,z:=5] does nothing. Is there something I can check to make
>>> sure that the latest version is loaded?
>>>
>>>
>>> As for the coercion stuff, it feels somewhat inconsistent
>>> right now. For instance:
>>>
>>>> DT <- data.table(x=1:10, y=1:10)
>>>
>>>> DT$y <- TRUE
>>>
>>>> sapply(DT, class)
>>>
>>> x y
>>> "integer" "integer"
>>>
>>>> DT$y <- rnorm(10)
>>>> sapply(DT, class)
>>> x y
>>> "integer" "numeric"
>>>
>>> So in the first case y silently coerces the logical to an integer
>>> without warning, but in the second case y happily turns into a numeric
>>> when need be. Why the difference?
>>>
>>> When I do something like DT$y <- foo, I expect that y should turn into
>>> foo regardless of what y was before. If there is some reason why DT[,
>>> y:=foo] should be different than DT$y <- foo, that is a secondary
>>> matter, but I get mightily confused when DT$y <- foo doesn't behave
>>> like data.frame.
>>>
>>> On 4 August 2011 08:50, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>> Still doesn't seem to be latest version: DT[,z:=5] should add column (and
>>>> that's tested).
>>>> Otherwise correct and intended behaviour (although an informative warning
>>>> needs adding when 5 gets coerced to type of column (i.e. logical) - thanks
>>>> for spotting). Remember as.logical(5) is TRUE without warning. So, try
>>>> creating column with NA_integer_ or NA_real_ instead. Once the column type
>>>> is set, that's it. Columns aren't coerced to match type of RHS, unlike
>>>> data.frame [which if you think about it is a big hit if the data is large].
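
The type rules described above can be checked in base R without data.table. A small sketch: the variable names are only for illustration, and the last part merely imitates "the column keeps its type" with explicit conversions. It is not data.table's actual code path.

```r
# A bare NA is a logical constant, so a column created from NA is logical.
class(NA)            # "logical"
class(NA_integer_)   # "integer"
class(NA_real_)      # "numeric"

# Coercing 5 to logical gives TRUE, and base R does so without a warning.
as.logical(5)        # TRUE

# Imitation of "the column keeps its type": the right-hand side is
# converted to the existing column's type before it is stored.
z_logical <- rep(NA, 10)          # like creating the column from NA
z_logical[] <- as.logical(5)      # every element becomes TRUE

z_real <- rep(NA_real_, 10)       # typed NA from the start
z_real[] <- as.numeric(5)         # every element becomes 5
```

Starting the column from a typed NA is what makes the later assignment of 5 come out as a number rather than TRUE.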
>>>>
>>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>>> news:CAAuY0RXT7q+cm91PJ8KGkMwDApwFxM_EALb-Yu=P6ndp+LEfXg at mail.gmail.com...
>>>> Ignore this second one, restarting and refreshing my data.table
>>>> install now gives the proper error message when I try that. Sorry I'm
>>>> not used to being on the bleeding edge of these things and I forget to
>>>> update. However the first question is still mainly relevant:
>>>>
>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>> DT[,z:=5]
>>>> x y
>>>> [1,] 1 1
>>>> [2,] 2 2
>>>> [3,] 3 1
>>>> [4,] 4 2
>>>> [5,] 5 1
>>>> [6,] 6 2
>>>> [7,] 7 1
>>>> [8,] 8 2
>>>> [9,] 9 1
>>>> [10,] 10 2
>>>>> DT[1:nrow(DT),z:=5]
>>>> Error in `[.data.table`(DT, 1:nrow(DT), `:=`(z, 5)) :
>>>> Attempt to add new column(s) and set subset of rows at the same
>>>> time. Create the new column(s) first, and then you'll be able to
>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>> (no need, it's faster without).
>>>>> DT$z <- NA
>>>>> DT[, z:=5]
>>>> x y z
>>>> [1,] 1 1 TRUE
>>>> [2,] 2 2 TRUE
>>>> [3,] 3 1 TRUE
>>>> [4,] 4 2 TRUE
>>>> [5,] 5 1 TRUE
>>>> [6,] 6 2 TRUE
>>>> [7,] 7 1 TRUE
>>>> [8,] 8 2 TRUE
>>>> [9,] 9 1 TRUE
>>>> [10,] 10 2 TRUE
>>>>
>>>>
>>>>
>>>> The return on DT[,z:=5] when I haven't initialized DT$z yet is
>>>> different, but still less informative than the error I get from
>>>> DT[1:nrow(DT), z:=5]. And the DT$z <- NA issue is still there.
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On 4 August 2011 08:18, Chris Neff <caneff at gmail.com> wrote:
>>>>> A second question while I'm playing with it. It seems from the FRs
>>>>> that it doesn't support multiple := in one select, but:
>>>>>
>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>> DT$a = 0
>>>>> DT$z = 0
>>>>>
>>>>> DT[, list(a := y/sum(y), z := 5)]
>>>>>
>>>>> works just fine for me. An error gets thrown but afterwards the
>>>>> columns are modified as intended. Why the error?
>>>>>
>>>>>> DT[,list(z:=5,a:=y/sum(y))]
>>>>> z
>>>>> [1] 5
>>>>> [1] TRUE
>>>>> a
>>>>> y/sum(y)
>>>>> [1] TRUE
>>>>> Error in data.table(`:=`(z, 5), `:=`(a, y/sum(y))) :
>>>>> column or argument 1 is NULL
>>>>>> DT
>>>>> x y z a
>>>>> [1,] 1 1 5 0.06666667
>>>>> [2,] 2 2 5 0.13333333
>>>>> [3,] 3 1 5 0.06666667
>>>>> [4,] 4 2 5 0.13333333
>>>>> [5,] 5 1 5 0.06666667
>>>>> [6,] 6 2 5 0.13333333
>>>>> [7,] 7 1 5 0.06666667
>>>>> [8,] 8 2 5 0.13333333
>>>>> [9,] 9 1 5 0.06666667
>>>>> [10,] 10 2 5 0.13333333
>>>>>
>>>>> -Chris
>>>>>
>>>>> On 4 August 2011 08:12, Chris Neff <caneff at gmail.com> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> If I do:
>>>>>>
>>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>>
>>>>>> Then try the following
>>>>>>
>>>>>> DT[, z:=5]
>>>>>>
>>>>>> I get:
>>>>>>
>>>>>>> DT[, z:=5]
>>>>>> z
>>>>>> [1] 5
>>>>>> [1] TRUE
>>>>>> NULL
>>>>>>
>>>>>> and if I were to do DT <- DT[, z:=5], then DT gets set to NULL.
>>>>>> Alternatively if I do
>>>>>>
>>>>>> DT[1:10, z:=5]
>>>>>>
>>>>>> I get
>>>>>>
>>>>>>> DT=DT[1:nrow(DT),z:=5]
>>>>>> z
>>>>>> [1] 5
>>>>>> [1] 1 2 3 4 5 6 7 8 9 10
>>>>>> Error in `:=`(z, 5) :
>>>>>> Attempt to add new column(s) and set subset of rows at the same
>>>>>> time. Create the new column(s) first, and then you'll be able to
>>>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>>>> (no need, it's faster without).
>>>>>>
>>>>>>
>>>>>> Which is more informative. So I do as it instructs:
>>>>>>
>>>>>> DT$z <- NA
>>>>>>
>>>>>> DT[, z:=5]
>>>>>>
>>>>>> And as output I get:
>>>>>>
>>>>>>> DT
>>>>>> x y z
>>>>>> [1,] 1 1 TRUE
>>>>>> [2,] 2 2 TRUE
>>>>>> [3,] 3 1 TRUE
>>>>>> [4,] 4 2 TRUE
>>>>>> [5,] 5 1 TRUE
>>>>>> [6,] 6 2 TRUE
>>>>>> [7,] 7 1 TRUE
>>>>>> [8,] 8 2 TRUE
>>>>>> [9,] 9 1 TRUE
>>>>>> [10,] 10 2 TRUE
>>>>>>
>>>>>>
>>>>>> Why isn't z 5 as assigned? I think it is because I assigned it as
>>>>>> NA, and data.table didn't know to change it to integer (although why
>>>>>> it changed it to logical is another puzzle). If I instead do
>>>>>>
>>>>>> DT$z <- 0
>>>>>>
>>>>>> DT[, z:=5]
>>>>>>
>>>>>> It works fine.
>>>>>>
>>>>>> So my two points are:
>>>>>>
>>>>>> A) Doing DT[,z:=5] should be as informative as doing DT[1:nrow(DT),
>>>>>> z:=5] with the error message.
>>>>>>
>>>>>> B) What went wrong with the NA assignment I did?
>>>>>>
>>>>>> Thanks!
>>>>>> Chris
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> datatable-help mailing list
>>>> datatable-help at lists.r-forge.r-project.org
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>>
>>>
>>>
>>>
>>>
>>
>
>
>
>

