[datatable-help] := unclarity and possible bug?

Chris Neff caneff at gmail.com
Thu Aug 4 19:43:49 CEST 2011


>> How can I make sure I install something that is the latest developer
>> build?
> No data.table user has been so keen to be so up to date, as you, before ;)
> So, short answer is I don't know.  Is it possible that the tar.gz source
> file is only created on R-Forge once per day, then?  I always thought
> type="source" fetched the latest commit, but perhaps it just gets the last
> nightly snapshot of tar.gz. If you could find out from R-Forge (maybe it is
> documented there, see its documentation) please let us all know how.
> Worst case you can just 'svn up' and 'R CMD build' the package yourself.
> That's what we do as developers. Not ideal though, is it. More of a question
> for R-Forge support.

All this seems to just point to a stale R-Forge file. I can dig into
it a bit. I want to stay up to date so I'm not spamming the list with
bugs that are already fixed.
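
One quick way to tell whether the installed copy is stale is to compare
its version with the one expected. A minimal sketch, assuming a reasonably
recent R; is_at_least is a hypothetical helper, not part of data.table:

```r
# Hypothetical helper (not part of data.table): is the installed copy of
# `pkg` at least version `wanted`? Returns NA if the package is not installed.
is_at_least <- function(pkg, wanted) {
  if (!(pkg %in% rownames(installed.packages()))) return(NA)
  packageVersion(pkg) >= package_version(wanted)
}

# Version strings are compared component by component.
utils::compareVersion("1.6.3", "1.6.4")   # -1, i.e. 1.6.3 is older

# In a fresh session, after installing:
is_at_least("data.table", "1.6.4")
```

Run it in a new R session, since an already loaded package keeps
reporting the version that was loaded.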

>
> Matthew
>
>
> On 4 August 2011 11:15, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>> Relief. Yes that's definitely not the latest version. That error is fixed
>> (otherwise it wouldn't be accepted to CRAN). The source tar.gz is on CRAN
>> now, that should work.
>>
>> I don't know why type="source" from R-Forge is so behind. I thought it was
>> bang up to date that way. If you could ask Stefan on R-Forge I'd be most
>> grateful. Or maybe someone else on the list knows. Might the package
>> have gotten locked in your install? Read up about 00LOCK. Absolutely sure
>> no error messages on install? Did you require(data.table) in the "sudo R"
>> after the install just to tickle it? Under some conditions it silently
>> rolls back to the previous version (I never worked out which). If other R
>> processes are using the package (even zombies) it might get confused, so
>> kill all R sessions and use a sudo R --vanilla to install. Maybe an 'svn
>> up' way to get latest, but I never thought that was necessary. You did
>> "sudo R" to install, right? Otherwise it asks if you want to install
>> somewhere else, and you don't want to do that.
>>
>>
>> "Chris Neff" <caneff at gmail.com> wrote in message
>> news:CAAuY0RXA1-9pzL3SR=uF5iJw4MnU_nF40Lw-P1VpiN6Yn=cuDA at mail.gmail.com...
>>> test.data.table()
>> Running
>> /home/caneff/R/x86_64-pc-linux-gnu-library/2.12/data.table/tests/tests.R
>> Loading required package: ggplot2
>> Loading required package: reshape
>>
>> Attaching package: 'reshape'
>>
>> The following object(s) are masked from 'package:plyr':
>>
>> rename, round_any
>>
>> Loading required package: grid
>> Loading required package: proto
>> Test 304 Error in try(x, TRUE) : could not find function "haskey"
>> Error in eval(expr, envir, enclos) : 1 errors in test.data.table()
>>
>> And the banner says 1.6.3. I got it from the link after "Download:"
>> here: https://r-forge.r-project.org/R/?group_id=240
>>
>> Should I be clicking somewhere else for 1.6.4? Why won't
>> install.packages with type="source" get 1.6.4 for me either?
>>
>> On 4 August 2011 10:16, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>> Does test.data.table() work? How many tests does it run? Does the startup
>>> banner state 1.6.4 ?
>>>
>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>> news:CAAuY0RX3er-gSB52am1AZo4Ws9=fbXzqigaJnmfKqPxF_mn-8g at mail.gmail.com...
>>> I thought this might be the problem (from the R-forge download page):
>>> "In order to successfully install the packages provided on R-Forge,
>>> you have to switch to the most recent version of R or, alternatively,
>>> install from the package sources (.tar.gz) in older versions of R"
>>>
>>> I'm running 2.12.1 because it is built with internal company compilers
>>> for extra internal support. I tried just downloading the
>>> data.table_1.6.3.tar.gz file from r-forge and doing R CMD INSTALL, but
>>> still DT[,z:=5] doesn't work.
>>>
>>>
>>>
>>> On 4 August 2011 10:06, Chris Neff <caneff at gmail.com> wrote:
>>>> cacheOK=FALSE didn't fix it. Running on Ubuntu (so should I do
>>>> anything in that paragraph?). Session info:
>>>>
>>>>> sessionInfo()
>>>> R version 2.12.1 (2010-12-16)
>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
>>>> [3] LC_TIME=C LC_COLLATE=C
>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
>>>> [7] LC_PAPER=en_US.utf8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] data.table_1.6.3
>>>>
>>>>
>>>> On 4 August 2011 09:56, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>>>
>>>>> Try cacheOK=FALSE (passed on to download.file() via ...)
>>>>>
>>>>> Or, sessionInfo() please. Is it Windows (dll not being refreshed)?
>>>>> Reboot, clear out, R --vanilla install, clear out browser cache
>>>>> manually. Failing all that, download file manually and install from
>>>>> file.
>>>>>
>>>>> The rnorm(10) is already a vector as long as the table itself =>
>>>>> invokes "replace" column. Since you the user already created it, it is
>>>>> plonked right into the column, bang.
>>>>> The other case is recycling or subassign, and that preserves the
>>>>> column's type (for speed, unlike data.frame).
>>>>> So, intended behaviour, just not what you expected.
>>>>>
>>>>>
>>>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>>>> news:CAAuY0RWYJ8+wEbVG8XvYYqhSNFPV6XxgdQo8smRy+DRWGj5sfg at mail.gmail.com...
>>>>> I've run the following 3 different times in new sessions:
>>>>>
>>>>> install.packages("data.table",
>>>>> repos="http://R-Forge.R-project.org",type="source")
>>>>>
>>>>> and still DT[,z:=5] does nothing. Is there something I can check to make
>>>>> sure that the latest version is loaded?
>>>>>
>>>>>
>>>>> As for the coercion stuff, it feels somewhat inconsistent
>>>>> right now. For instance:
>>>>>
>>>>>> DT <- data.table(x=1:10, y=1:10)
>>>>>
>>>>>> DT$y <- TRUE
>>>>>
>>>>>> sapply(DT, class)
>>>>>
>>>>> x y
>>>>> "integer" "integer"
>>>>>
>>>>>> DT$y <- rnorm(10)
>>>>>> sapply(DT, class)
>>>>> x y
>>>>> "integer" "numeric"
>>>>>
>>>>> So in the first case y silently coerces the logical to an integer
>>>>> without warning, but in the second case y happily turns into a numeric
>>>>> when need be. Why the difference?
>>>>>
>>>>> When I do something like DT$y <- foo, I expect that y should turn into
>>>>> foo regardless of what y was before. If there is some reason why DT[,
>>>>> y:=foo] should be different than DT$y <- foo, that is a secondary
>>>>> matter, but I get mightily confused when DT$y <- foo doesn't behave
>>>>> like data.frame.
>>>>>
>>>>> On 4 August 2011 08:50, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>>>> Still doesn't seem to be latest version: DT[,z:=5] should add column
>>>>>> (and that's tested).
>>>>>> Otherwise correct and intended behaviour (although an informative
>>>>>> warning needs adding when 5 gets coerced to type of column (i.e.
>>>>>> logical) - thanks for spotting). Remember as.logical(5) is TRUE
>>>>>> without warning. So, try creating column with NA_integer_ or NA_real_
>>>>>> instead. Once the column type is set, that's it. Columns aren't
>>>>>> coerced to match type of RHS, unlike data.frame [which if you think
>>>>>> about it is a big hit if the data is large].
>>>>>>
>>>>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>>>>> news:CAAuY0RXT7q+cm91PJ8KGkMwDApwFxM_EALb-Yu=P6ndp+LEfXg at mail.gmail.com...
>>>>>> Ignore this second one, restarting and refreshing my data.table
>>>>>> install now gives the proper error message when I try that. Sorry I'm
>>>>>> not used to being on the bleeding edge of these things and I forget to
>>>>>> update. However the first question is still mainly relevant:
>>>>>>
>>>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>>> DT[,z:=5]
>>>>>> x y
>>>>>> [1,] 1 1
>>>>>> [2,] 2 2
>>>>>> [3,] 3 1
>>>>>> [4,] 4 2
>>>>>> [5,] 5 1
>>>>>> [6,] 6 2
>>>>>> [7,] 7 1
>>>>>> [8,] 8 2
>>>>>> [9,] 9 1
>>>>>> [10,] 10 2
>>>>>>> DT[1:nrow(DT),z:=5]
>>>>>> Error in `[.data.table`(DT, 1:nrow(DT), `:=`(z, 5)) :
>>>>>> Attempt to add new column(s) and set subset of rows at the same
>>>>>> time. Create the new column(s) first, and then you'll be able to
>>>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>>>> (no need, it's faster without).
>>>>>>> DT$z <- NA
>>>>>>> DT[, z:=5]
>>>>>> x y z
>>>>>> [1,] 1 1 TRUE
>>>>>> [2,] 2 2 TRUE
>>>>>> [3,] 3 1 TRUE
>>>>>> [4,] 4 2 TRUE
>>>>>> [5,] 5 1 TRUE
>>>>>> [6,] 6 2 TRUE
>>>>>> [7,] 7 1 TRUE
>>>>>> [8,] 8 2 TRUE
>>>>>> [9,] 9 1 TRUE
>>>>>> [10,] 10 2 TRUE
>>>>>>
>>>>>>
>>>>>>
>>>>>> The return on DT[,z:=5] when I haven't initialized DT$z yet is
>>>>>> different, but still less informative than it is when I do
>>>>>> DT[1:nrow(DT), z:=5]. And the DT$z <- NA issue is still there.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> On 4 August 2011 08:18, Chris Neff <caneff at gmail.com> wrote:
>>>>>>> A second question while I'm playing with it. It seems from the FRs
>>>>>>> that it doesn't support multiple := in one select, but:
>>>>>>>
>>>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>>> DT$a = 0
>>>>>>> DT$z = 0
>>>>>>>
>>>>>>> DT[, list(a := y/sum(y), z := 5)]
>>>>>>>
>>>>>>> works just fine for me. An error gets thrown but afterwards the
>>>>>>> columns are modified as intended. Why the error?
>>>>>>>
>>>>>>>> DT[,list(z:=5,a:=y/sum(y))]
>>>>>>> z
>>>>>>> [1] 5
>>>>>>> [1] TRUE
>>>>>>> a
>>>>>>> y/sum(y)
>>>>>>> [1] TRUE
>>>>>>> Error in data.table(`:=`(z, 5), `:=`(a, y/sum(y))) :
>>>>>>> column or argument 1 is NULL
>>>>>>>> DT
>>>>>>> x y z a
>>>>>>> [1,] 1 1 5 0.06666667
>>>>>>> [2,] 2 2 5 0.13333333
>>>>>>> [3,] 3 1 5 0.06666667
>>>>>>> [4,] 4 2 5 0.13333333
>>>>>>> [5,] 5 1 5 0.06666667
>>>>>>> [6,] 6 2 5 0.13333333
>>>>>>> [7,] 7 1 5 0.06666667
>>>>>>> [8,] 8 2 5 0.13333333
>>>>>>> [9,] 9 1 5 0.06666667
>>>>>>> [10,] 10 2 5 0.13333333
>>>>>>>
>>>>>>> -Chris
>>>>>>>
>>>>>>> On 4 August 2011 08:12, Chris Neff <caneff at gmail.com> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> If I do:
>>>>>>>>
>>>>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>>>>
>>>>>>>> Then try the following
>>>>>>>>
>>>>>>>> DT[, z:=5]
>>>>>>>>
>>>>>>>> I get:
>>>>>>>>
>>>>>>>>> DT[, z:=5]
>>>>>>>> z
>>>>>>>> [1] 5
>>>>>>>> [1] TRUE
>>>>>>>> NULL
>>>>>>>>
>>>>>>>> and if I were to do DT <- DT[, z:=5], then DT gets set to NULL.
>>>>>>>> Alternatively if I do
>>>>>>>>
>>>>>>>> DT[1:10, z:=5]
>>>>>>>>
>>>>>>>> I get
>>>>>>>>
>>>>>>>>> DT=DT[1:nrow(DT),z:=5]
>>>>>>>> z
>>>>>>>> [1] 5
>>>>>>>> [1] 1 2 3 4 5 6 7 8 9 10
>>>>>>>> Error in `:=`(z, 5) :
>>>>>>>> Attempt to add new column(s) and set subset of rows at the same
>>>>>>>> time. Create the new column(s) first, and then you'll be able to
>>>>>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>>>>>> (no need, it's faster without).
>>>>>>>>
>>>>>>>>
>>>>>>>> Which is more informative. So I do as it instructs:
>>>>>>>>
>>>>>>>> DT$z <- NA
>>>>>>>>
>>>>>>>> DT[, z:=5]
>>>>>>>>
>>>>>>>> And as output I get:
>>>>>>>>
>>>>>>>>> DT
>>>>>>>> x y z
>>>>>>>> [1,] 1 1 TRUE
>>>>>>>> [2,] 2 2 TRUE
>>>>>>>> [3,] 3 1 TRUE
>>>>>>>> [4,] 4 2 TRUE
>>>>>>>> [5,] 5 1 TRUE
>>>>>>>> [6,] 6 2 TRUE
>>>>>>>> [7,] 7 1 TRUE
>>>>>>>> [8,] 8 2 TRUE
>>>>>>>> [9,] 9 1 TRUE
>>>>>>>> [10,] 10 2 TRUE
>>>>>>>>
>>>>>>>>
>>>>>>>> Why isn't z 5, as assigned? I think it is because I assigned it as
>>>>>>>> NA, and data.table didn't know to change it to integer (although why
>>>>>>>> it changed it to logical is another puzzle). If I instead do
>>>>>>>>
>>>>>>>> DT$z <- 0
>>>>>>>>
>>>>>>>> DT[, z:=5]
>>>>>>>>
>>>>>>>> It works fine.
>>>>>>>>
>>>>>>>> So my two points are:
>>>>>>>>
>>>>>>>> A) Doing DT[,z:=5] should be as informative as doing DT[1:nrow(DT),
>>>>>>>> z:=5] with the error message.
>>>>>>>>
>>>>>>>> B) What went wrong with the NA assignment I did?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Chris
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> datatable-help mailing list
>>>>>> datatable-help at lists.r-forge.r-project.org
>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>
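
The type puzzle in the quoted thread (DT$z <- NA followed by z:=5 giving
TRUE) can be reproduced in base R alone, because a bare NA is logical. The
sketch below only imitates the "keep the column's type, coerce the RHS"
rule described above; assign_keep_type is a hypothetical helper and is not
how data.table implements it:

```r
# A bare NA is logical; the typed NAs give integer or double columns.
class(NA)            # "logical"
class(NA_integer_)   # "integer"
class(NA_real_)      # "numeric"
as.logical(5)        # TRUE, with no warning

# Hypothetical helper: coerce `value` to the existing column's type,
# then fill the column with it (recycling a length-1 value).
assign_keep_type <- function(col, value) {
  col[] <- as.vector(value, mode = typeof(col))
  col
}

assign_keep_type(rep(NA, 3), 5)           # TRUE TRUE TRUE
assign_keep_type(rep(NA_integer_, 3), 5)  # 5 5 5 (integer)
assign_keep_type(rep(NA_real_, 3), 5)     # 5 5 5 (double)
```

Creating the column with NA_integer_ or NA_real_, as suggested in the
thread, fixes the type up front so the later assignment keeps the 5.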

