[datatable-help] := unclarity and possible bug?

Matthew Dowle mdowle at mdowle.plus.com
Thu Aug 4 19:29:21 CEST 2011


>"Chris Neff" <caneff at gmail.com> wrote in message 
>news:CAAuY0RVQNGPZ9uW2-eE8Nvh=rBQS0wpDAB6c3GFx7prtAGy3hQ at mail.gmail.com...
> The package absolutely installed. I did have 00Lock issues

Ok, that was why then.

> and deleted it to make it work.

Perhaps updating the same version of a package is more likely to generate
the 00Lock.  If you can work out how to avoid the 00Lock in the first place,
please let us all know how :-)

> For what it's worth, the R-Forge build seems to be failing the same
> way: 
> https://r-forge.r-project.org/R/?group_id=240&log=check_x86_64_linux&pkg=data.table&flavor=devel

That's to be expected because R-Forge builds nightly (see its schedule
on its website), so it can be many hours behind.

> I took the tar from CRAN and installed it. Now DT[,z:=5] works.

Yeehah! Relief.
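
Now that DT[,z:=5] behaves, the coercion point from earlier in this thread can be reproduced without data.table at all. The snippet below is only a base R sketch of the idea, not data.table's own code: the names z and z_int are made up for illustration, and the explicit as.logical() call merely mimics "keep the column's existing type".

```r
# A bare NA is logical, so a column created from it starts out logical.
z <- rep(NA, 10)
class(z)                      # "logical"

# Coercing 5 to logical gives TRUE and raises no warning.
as.logical(5)                 # TRUE

# Mimic "the column keeps its type": coerce the RHS before assigning.
z[] <- as.logical(5)

# Starting from a typed NA avoids the surprise.
z_int <- rep(NA_integer_, 10)
z_int[] <- 5L
class(z_int)                  # "integer"
```

This is why creating the column with NA_integer_ or NA_real_, rather than a plain NA, gives the expected result.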

> However that still means I'm not running the latest developer build.

That's ok. Nothing has been committed to it yet, so you're not missing out
currently.

> How can I make sure I install something that is the latest developer
> build?

No data.table user has been as keen as you to be so up to date before ;)
So, short answer is I don't know.  Is it possible that the tar.gz source
file is only created on R-Forge once per day, then?  I always thought
type="source" fetched the latest commit, but perhaps it just gets the last
nightly snapshot of tar.gz. If you could find out from R-Forge (maybe it is
documented there, see its documentation) please let us all know.
Worst case you can just 'svn up' and 'R CMD build' the package yourself.
That's what we do as developers. Not ideal though, is it? More of a question
for R-Forge support.
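
Whichever route the tarball comes from, it may help to confirm from inside R which version actually got installed. A minimal sketch follows, assuming a helper name (installed_at_least) that is made up here and is not part of data.table; the tarball file name in the commented-out lines is likewise an assumption about what 'R CMD build' would produce.

```r
# Hypothetical helper (not part of data.table): TRUE when the installed
# version of `pkg` is at least `wanted`.
installed_at_least <- function(pkg, wanted) {
  utils::packageVersion(pkg) >= package_version(wanted)
}

# "stats" ships with R, so this call runs anywhere.
installed_at_least("stats", "1.0.0")

# Not run: install a locally built tarball, then check it.
# The file name below is an assumption for illustration.
# install.packages("data.table_1.6.4.tar.gz", repos = NULL, type = "source")
# installed_at_least("data.table", "1.6.4")
```

Running the check in a fresh R session avoids being misled by an older copy that is still loaded.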

Matthew


On 4 August 2011 11:15, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> Relief. Yes that's definitely not the latest version. That error is fixed
> (otherwise it wouldn't be accepted to CRAN). The source tar.gz is on CRAN
> now, that should work.
>
> I don't know why type="source" from R-Forge is so behind. I thought it was
> bang up to date that way. If you could ask Stefan on R-Forge I'd be most
> grateful. Or maybe someone else on the list knows. Might the package
> have gotten locked in your install? Read up about 00Lock. Absolutely sure
> no error messages on install? Did you require(data.table) in the "sudo R"
> after the install just to tickle it? There are some conditions under
> which it silently rolls back to the previous version (I never worked out
> which). If other R processes are using the package (even zombies) it
> might get confused, so kill all R sessions and use "sudo R --vanilla" to
> install. There may be an 'svn up' way to get the latest, but I never
> thought that was necessary. You did
> "sudo R" to install, right? Otherwise it asks if you want to install
> somewhere else, and you don't want to do that.
>
>
> "Chris Neff" <caneff at gmail.com> wrote in message
> news:CAAuY0RXA1-9pzL3SR=uF5iJw4MnU_nF40Lw-P1VpiN6Yn=cuDA at mail.gmail.com...
>> test.data.table()
> Running
> /home/caneff/R/x86_64-pc-linux-gnu-library/2.12/data.table/tests/tests.R
> Loading required package: ggplot2
> Loading required package: reshape
>
> Attaching package: 'reshape'
>
> The following object(s) are masked from 'package:plyr':
>
> rename, round_any
>
> Loading required package: grid
> Loading required package: proto
> Test 304 Error in try(x, TRUE) : could not find function "haskey"
> Error in eval(expr, envir, enclos) : 1 errors in test.data.table()
>
> And the banner says 1.6.3. I got it from the link after "Download:"
> here: https://r-forge.r-project.org/R/?group_id=240
>
> Should I be clicking somewhere else for 1.6.4? Why won't
> install.packages with type="source" get 1.6.4 for me either?
>
> On 4 August 2011 10:16, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>> Does test.data.table() work? How many tests does it run? Does the startup
>> banner state 1.6.4?
>>
>> "Chris Neff" <caneff at gmail.com> wrote in message
>> news:CAAuY0RX3er-gSB52am1AZo4Ws9=fbXzqigaJnmfKqPxF_mn-8g at mail.gmail.com...
>> I thought this might be the problem (from the R-forge download page):
>> "In order to successfully install the packages provided on R-Forge,
>> you have to switch to the most recent version of R or, alternatively,
>> install from the package sources (.tar.gz) in older versions of R"
>>
>> I'm running 2.12.1 because it is built with internal company compilers
>> for extra internal support. I tried just downloading the
>> data.table_1.6.3.tar.gz file from r-forge and doing R CMD INSTALL, but
>> still DT[,z:=5] doesn't work.
>>
>>
>>
>> On 4 August 2011 10:06, Chris Neff <caneff at gmail.com> wrote:
>>> cacheOK=FALSE didn't fix it. Running on Ubuntu (so should I do
>>> anything in that paragraph?). Session info:
>>>
>>>> sessionInfo()
>>> R version 2.12.1 (2010-12-16)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
>>> [3] LC_TIME=C LC_COLLATE=C
>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
>>> [7] LC_PAPER=en_US.utf8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] data.table_1.6.3
>>>
>>>
>>> On 4 August 2011 09:56, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>>
>>>> Try cacheOK=FALSE (passed on to download.file() via ...)
>>>>
>>>> Or, sessionInfo() please. Is it Windows (dll not being refreshed)?
>>>> Reboot, clear out, R --vanilla install, clear out browser cache
>>>> manually. Failing all that, download file manually and install from
>>>> file.
>>>>
>>>> The rnorm(10) is already a vector as long as the table itself =>
>>>> invokes "replace" column. Since you, the user, already created it, it
>>>> is plonked right into the column, bang.
>>>> The other case is recycling or subassign, and that preserves the
>>>> column's type (for speed, unlike data.frame).
>>>> So, intended behaviour, just not what you expected.
>>>>
>>>>
>>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>>> news:CAAuY0RWYJ8+wEbVG8XvYYqhSNFPV6XxgdQo8smRy+DRWGj5sfg at mail.gmail.com...
>>>> I've run the following 3 different times in new sessions:
>>>>
>>>> install.packages("data.table",
>>>> repos="http://R-Forge.R-project.org",type="source")
>>>>
>>>> and still DT[,z:=5] does nothing. Is there something I can check to
>>>> make sure that the latest version is loaded?
>>>>
>>>>
>>>> As for the coercion stuff, it feels somewhat inconsistent
>>>> right now. For instance:
>>>>
>>>>> DT <- data.table(x=1:10, y=1:10)
>>>>
>>>>> DT$y <- TRUE
>>>>
>>>>> sapply(DT, class)
>>>>
>>>> x y
>>>> "integer" "integer"
>>>>
>>>>> DT$y <- rnorm(10)
>>>>> sapply(DT, class)
>>>> x y
>>>> "integer" "numeric"
>>>>
>>>> So in the first case y silently coerces the logical to an integer
>>>> without warning, but in the second case y happily turns into a numeric
>>>> when need be. Why the difference?
>>>>
>>>> When I do something like DT$y <- foo, I expect that y should turn into
>>>> foo regardless of what y was before. If there is some reason why DT[,
>>>> y:=foo] should be different than DT$y <- foo, that is a secondary
>>>> matter, but I get mightily confused when DT$y <- foo doesn't behave
>>>> like data.frame.
>>>>
>>>> On 4 August 2011 08:50, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>>> Still doesn't seem to be latest version: DT[,z:=5] should add column
>>>>> (and that's tested).
>>>>> Otherwise correct and intended behaviour (although an informative
>>>>> warning needs adding when 5 gets coerced to type of column (i.e.
>>>>> logical) - thanks for spotting). Remember as.logical(5) is TRUE
>>>>> without warning. So, try creating column with NA_integer_ or NA_real_
>>>>> instead. Once the column type is set, that's it. Columns aren't
>>>>> coerced to match type of RHS, unlike data.frame [which if you think
>>>>> about it is a big hit if the data is large].
>>>>>
>>>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>>>> news:CAAuY0RXT7q+cm91PJ8KGkMwDApwFxM_EALb-Yu=P6ndp+LEfXg at mail.gmail.com...
>>>>> Ignore this second one, restarting and refreshing my data.table
>>>>> install now gives the proper error message when I try that. Sorry I'm
>>>>> not used to being on the bleeding edge of these things and I forget to
>>>>> update. However the first question is still mainly relevant:
>>>>>
>>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>> DT[,z:=5]
>>>>> x y
>>>>> [1,] 1 1
>>>>> [2,] 2 2
>>>>> [3,] 3 1
>>>>> [4,] 4 2
>>>>> [5,] 5 1
>>>>> [6,] 6 2
>>>>> [7,] 7 1
>>>>> [8,] 8 2
>>>>> [9,] 9 1
>>>>> [10,] 10 2
>>>>>> DT[1:nrow(DT),z:=5]
>>>>> Error in `[.data.table`(DT, 1:nrow(DT), `:=`(z, 5)) :
>>>>> Attempt to add new column(s) and set subset of rows at the same
>>>>> time. Create the new column(s) first, and then you'll be able to
>>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>>> (no need, it's faster without).
>>>>>> DT$z <- NA
>>>>>> DT[, z:=5]
>>>>> x y z
>>>>> [1,] 1 1 TRUE
>>>>> [2,] 2 2 TRUE
>>>>> [3,] 3 1 TRUE
>>>>> [4,] 4 2 TRUE
>>>>> [5,] 5 1 TRUE
>>>>> [6,] 6 2 TRUE
>>>>> [7,] 7 1 TRUE
>>>>> [8,] 8 2 TRUE
>>>>> [9,] 9 1 TRUE
>>>>> [10,] 10 2 TRUE
>>>>>
>>>>>
>>>>>
>>>>> The return on DT[,z:=5] when I haven't initialized DT$z yet is
>>>>> different, but still less informative than it is when I do
>>>>> DT[1:nrow(DT), z:=5]. And the DT$z <- NA issue is still there.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> On 4 August 2011 08:18, Chris Neff <caneff at gmail.com> wrote:
>>>>>> A second question while I'm playing with it. It seems from the FRs
>>>>>> that it doesn't support multiple := in one select, but:
>>>>>>
>>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>> DT$a = 0
>>>>>> DT$z = 0
>>>>>>
>>>>>> DT[, list(a := y/sum(y), z := 5)]
>>>>>>
>>>>>> works just fine for me. An error gets thrown but afterwards the
>>>>>> columns are modified as intended. Why the error?
>>>>>>
>>>>>>> DT[,list(z:=5,a:=y/sum(y))]
>>>>>> z
>>>>>> [1] 5
>>>>>> [1] TRUE
>>>>>> a
>>>>>> y/sum(y)
>>>>>> [1] TRUE
>>>>>> Error in data.table(`:=`(z, 5), `:=`(a, y/sum(y))) :
>>>>>> column or argument 1 is NULL
>>>>>>> DT
>>>>>> x y z a
>>>>>> [1,] 1 1 5 0.06666667
>>>>>> [2,] 2 2 5 0.13333333
>>>>>> [3,] 3 1 5 0.06666667
>>>>>> [4,] 4 2 5 0.13333333
>>>>>> [5,] 5 1 5 0.06666667
>>>>>> [6,] 6 2 5 0.13333333
>>>>>> [7,] 7 1 5 0.06666667
>>>>>> [8,] 8 2 5 0.13333333
>>>>>> [9,] 9 1 5 0.06666667
>>>>>> [10,] 10 2 5 0.13333333
>>>>>>
>>>>>> -Chris
>>>>>>
>>>>>> On 4 August 2011 08:12, Chris Neff <caneff at gmail.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> If I do:
>>>>>>>
>>>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>>>
>>>>>>> Then try the following
>>>>>>>
>>>>>>> DT[, z:=5]
>>>>>>>
>>>>>>> I get:
>>>>>>>
>>>>>>>> DT[, z:=5]
>>>>>>> z
>>>>>>> [1] 5
>>>>>>> [1] TRUE
>>>>>>> NULL
>>>>>>>
>>>>>>> and if I were to do DT <- DT[, z:=5], then DT gets set to NULL.
>>>>>>> Alternatively if I do
>>>>>>>
>>>>>>> DT[1:10, z:=5]
>>>>>>>
>>>>>>> I get
>>>>>>>
>>>>>>>> DT=DT[1:nrow(DT),z:=5]
>>>>>>> z
>>>>>>> [1] 5
>>>>>>> [1] 1 2 3 4 5 6 7 8 9 10
>>>>>>> Error in `:=`(z, 5) :
>>>>>>> Attempt to add new column(s) and set subset of rows at the same
>>>>>>> time. Create the new column(s) first, and then you'll be able to
>>>>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>>>>> (no need, it's faster without).
>>>>>>>
>>>>>>>
>>>>>>> Which is more informative. So I do as it instructs:
>>>>>>>
>>>>>>> DT$z <- NA
>>>>>>>
>>>>>>> DT[, z:=5]
>>>>>>>
>>>>>>> And as output I get:
>>>>>>>
>>>>>>>> DT
>>>>>>> x y z
>>>>>>> [1,] 1 1 TRUE
>>>>>>> [2,] 2 2 TRUE
>>>>>>> [3,] 3 1 TRUE
>>>>>>> [4,] 4 2 TRUE
>>>>>>> [5,] 5 1 TRUE
>>>>>>> [6,] 6 2 TRUE
>>>>>>> [7,] 7 1 TRUE
>>>>>>> [8,] 8 2 TRUE
>>>>>>> [9,] 9 1 TRUE
>>>>>>> [10,] 10 2 TRUE
>>>>>>>
>>>>>>>
>>>>>>> Why isn't z 5, as assigned? I think it is because I assigned it as
>>>>>>> NA, and data.table didn't know to change it to integer (although why
>>>>>>> it changed it to logical is another puzzle). If I instead do
>>>>>>>
>>>>>>> DT$z <- 0
>>>>>>>
>>>>>>> DT[, z:=5]
>>>>>>>
>>>>>>> It works fine.
>>>>>>>
>>>>>>> So my two points are:
>>>>>>>
>>>>>>> A) Doing DT[,z:=5] should be as informative as doing DT[1:nrow(DT),
>>>>>>> z:=5] with the error message.
>>>>>>>
>>>>>>> B) What went wrong with the NA assignment I did?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Chris
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> datatable-help mailing list
>>>>> datatable-help at lists.r-forge.r-project.org
>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help




