[datatable-help] := unclarity and possible bug?

Chris Neff caneff at gmail.com
Fri Aug 5 13:47:58 CEST 2011


Now that I've played with := for a little bit, what is the rationale
for the following?

> DT <- data.table(x=1:10, y=1:10)
> out <- DT[, z:=1:10]
> out
       x  y
 [1,]  1  1
 [2,]  2  2
 [3,]  3  3
 [4,]  4  4
 [5,]  5  5
 [6,]  6  6
 [7,]  7  7
 [8,]  8  8
 [9,]  9  9
[10,] 10 10
> DT
       x  y  z
 [1,]  1  1  1
 [2,]  2  2  2
 [3,]  3  3  3
 [4,]  4  4  4
 [5,]  5  5  5
 [6,]  6  6  6
 [7,]  7  7  7
 [8,]  8  8  8
 [9,]  9  9  9
[10,] 10 10 10


I would have expected the return from DT[, z:=1:10] to be either A)
nothing, which is what I think is the preferred thing if you are
really trying to drill home the idea of in place assignment, or B) the
newly updated version of DT with z in it (but I think that muddles
what := does).  Why does it return what it does?
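
For anyone who wants to check this on their own install, here is a minimal sketch of the comparison above. It assumes only that data.table is installed and loadable. It deliberately makes no claim about what the call returns, since earlier in this thread the same call returned NULL on an older build.

```r
library(data.table)

DT <- data.table(x = 1:10, y = 1:10)

# Capture the value of the := call rather than letting it auto-print
out <- DT[, z := 1:10]

# Was DT itself modified in place?
"z" %in% names(DT)

# What kind of object came back, and does it carry the new column?
class(out)
"z" %in% names(out)

# Column names of both objects, for a side-by-side comparison
names(DT)
names(out)
```

Comparing names(DT) with names(out) shows directly whether the returned object is the updated table, a stale copy, or nothing at all.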

On 4 August 2011 13:43, Chris Neff <caneff at gmail.com> wrote:
>>> How can I make sure I install something that is the latest developer
>>> build?
>> No data.table user has been so keen to be so up to date, as you, before ;)
>> So, short answer is I don't know.  Is it possible that the tar.gz source
>> file is only created on R-Forge once per day, then?  I always thought
>> type="source" fetched the latest commit, but perhaps it just gets the last
>> nightly snapshot of tar.gz. If you could find out from R-Forge (maybe it is
>> documented there, see its documentation) please let us all know how.
>> Worst case you can just 'svn up' and 'R CMD build' the package yourself.
>> That's what we do as developers. Not ideal though, is it. More of a question
>> for R-Forge support.
>
> All this seems to just point to a stale R-Forge file. I can dig into
> it a bit.  Want to stay up to date to make sure I'm not spamming the
> list with bugs that are fixed.
>
>>
>> Matthew
>>
>>
>> On 4 August 2011 11:15, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>> Relief. Yes that's definitely not the latest version. That error is fixed
>>> (otherwise it wouldn't be accepted to CRAN). The source tar.gz is on CRAN
>>> now, that should work.
>>>
>>> I don't know why type="source" from R-Forge is so behind. I thought it was
>>> bang up to date that way. If you could ask Stefan on R-Forge I'd be most
>>> grateful. Or maybe someone else on the list knows. Might the package
>>> have gotten locked in your install? Read up about 00LOCK. Absolutely sure
>>> no error messages on install? Did you require(data.table) in the "sudo R"
>>> after the install just to tickle it? There are some conditions it rolls
>>> back to previous (silently, but I never worked it out). If other R
>>> processes are using the package (even zombies) it might get confused, so
>>> kill all R sessions and use a sudo R --vanilla to install. Maybe an 'svn
>>> up' way to get latest, but I never thought that was necessary. You did
>>> "sudo R" to install, right? Otherwise it asks if you want to install
>>> somewhere else, and you don't want to do that.
>>>
>>>
>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>> news:CAAuY0RXA1-9pzL3SR=uF5iJw4MnU_nF40Lw-P1VpiN6Yn=cuDA at mail.gmail.com...
>>>> test.data.table()
>>> Running
>>> /home/caneff/R/x86_64-pc-linux-gnu-library/2.12/data.table/tests/tests.R
>>> Loading required package: ggplot2
>>> Loading required package: reshape
>>>
>>> Attaching package: 'reshape'
>>>
>>> The following object(s) are masked from 'package:plyr':
>>>
>>> rename, round_any
>>>
>>> Loading required package: grid
>>> Loading required package: proto
>>> Test 304 Error in try(x, TRUE) : could not find function "haskey"
>>> Error in eval(expr, envir, enclos) : 1 errors in test.data.table()
>>>
>>> And the banner says 1.6.3. I got it from the link after "Download:"
>>> here: https://r-forge.r-project.org/R/?group_id=240
>>>
>>> Should I be clicking somewhere else for 1.6.4? Why won't
>>> install.packages with type="source" get 1.6.4 for me either?
>>>
>>> On 4 August 2011 10:16, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>> does test.data.table() work? How many tests does it run? Does the startup
>>>> banner state 1.6.4 ?
>>>>
>>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>>> news:CAAuY0RX3er-gSB52am1AZo4Ws9=fbXzqigaJnmfKqPxF_mn-8g at mail.gmail.com...
>>>> I thought this might be the problem (from the R-forge download page):
>>>> "In order to successfully install the packages provided on R-Forge,
>>>> you have to switch to the most recent version of R or, alternatively,
>>>> install from the package sources (.tar.gz) in older versions of R"
>>>>
>>>> I'm running 2.12.1 because it is built with internal company compilers
>>>> for extra internal support. I tried just downloading the
>>>> data.table_1.6.3.tar.gz file from r-forge and doing R CMD INSTALL, but
>>>> still DT[,z:=5] doesn't work.
>>>>
>>>>
>>>>
>>>> On 4 August 2011 10:06, Chris Neff <caneff at gmail.com> wrote:
>>>>> cacheOK=FALSE didn't fix it. Running on Ubuntu (so should I do
>>>>> anything in that paragraph?). Session info:
>>>>>
>>>>>> sessionInfo()
>>>>> R version 2.12.1 (2010-12-16)
>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
>>>>> [3] LC_TIME=C LC_COLLATE=C
>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
>>>>> [7] LC_PAPER=en_US.utf8 LC_NAME=C
>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] data.table_1.6.3
>>>>>
>>>>>
>>>>> On 4 August 2011 09:56, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>>>>
>>>>>> Try cacheOK=FALSE (passed on to download.file() via ...)
>>>>>>
>>>>>> Or, sessionInfo() please. Is it Windows (dll not being refreshed)?
>>>>>> Reboot,
>>>>>> clear out, R --vanilla install, clear out browser cache manually.
>>>>>> Failing
>>>>>> all that, download file manually and install from file.
>>>>>>
>>>>>> The rnorm(10) is already a vector as long as the table itself =>
>>>>>> invokes
>>>>>> "replace" column. Since you the user already created it, it is plonked
>>>>>> right into the column, bang.
>>>>>> The other case is recycling or subassign, and that preserves the
>>>>>> column's
>>>>>> type (for speed, unlike data.frame).
>>>>>> So, intended behaviour, just not what you expected.
>>>>>>
>>>>>>
>>>>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>>>>> news:CAAuY0RWYJ8+wEbVG8XvYYqhSNFPV6XxgdQo8smRy+DRWGj5sfg at mail.gmail.com...
>>>>>> I've run the following 3 different times in new sessions:
>>>>>>
>>>>>> install.packages("data.table",
>>>>>> repos="http://R-Forge.R-project.org",type="source")
>>>>>>
>>>>>> and still DT[,z:=5] does nothing. Is there something I can check to make
>>>>>> sure that the latest version is loaded?
>>>>>>
>>>>>>
>>>>>> As for the coercion stuff, it seems somewhat inconsistent to me
>>>>>> right now. For instance:
>>>>>>
>>>>>>> DT <- data.table(x=1:10, y=1:10)
>>>>>>
>>>>>>> DT$y <- TRUE
>>>>>>
>>>>>>> sapply(DT, class)
>>>>>>
>>>>>>         x         y
>>>>>> "integer" "integer"
>>>>>>
>>>>>>> DT$y <- rnorm(10)
>>>>>>> sapply(DT, class)
>>>>>>         x         y
>>>>>> "integer" "numeric"
>>>>>>
>>>>>> So in the first case y silently coerces the logical to an integer
>>>>>> without warning, but in the second case y happily turns into a numeric
>>>>>> when need be. Why the difference?
>>>>>>
>>>>>> When I do something like DT$y <- foo, I expect that y should turn into
>>>>>> foo regardless of what y was before. If there is some reason why DT[,
>>>>>> y:=foo] should be different than DT$y <- foo, that is a secondary
>>>>>> matter, but I get mightily confused when DT$y <- foo doesn't behave
>>>>>> like data.frame.
>>>>>>
>>>>>> On 4 August 2011 08:50, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>>>>> Still doesn't seem to be latest version: DT[,z:=5] should add column
>>>>>>> (and
>>>>>>> that's tested).
>>>>>>> Otherwise correct and intended behaviour (although an informative
>>>>>>> warning
>>>>>>> needs adding when 5 gets coerced to type of column (i.e. logical) -
>>>>>>> thanks
>>>>>>> for spotting). Remember as.logical(5) is TRUE without warning. So, try
>>>>>>> creating column with NA_integer_ or NA_real_ instead. Once the column
>>>>>>> type
>>>>>>> is set, that's it. Columns aren't coerced to match type of RHS, unlike
>>>>>>> data.frame [which if you think about it is a big hit if the data is
>>>>>>> large].
>>>>>>>
>>>>>>> "Chris Neff" <caneff at gmail.com> wrote in message
>>>>>>> news:CAAuY0RXT7q+cm91PJ8KGkMwDApwFxM_EALb-Yu=P6ndp+LEfXg at mail.gmail.com...
>>>>>>> Ignore this second one, restarting and refreshing my data.table
>>>>>>> install now gives the proper error message when I try that. Sorry I'm
>>>>>>> not used to being on the bleeding edge of these things and I forget to
>>>>>>> update. However the first question is still mainly relevant:
>>>>>>>
>>>>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>>>> DT[,z:=5]
>>>>>>>        x y
>>>>>>>  [1,]  1 1
>>>>>>>  [2,]  2 2
>>>>>>>  [3,]  3 1
>>>>>>>  [4,]  4 2
>>>>>>>  [5,]  5 1
>>>>>>>  [6,]  6 2
>>>>>>>  [7,]  7 1
>>>>>>>  [8,]  8 2
>>>>>>>  [9,]  9 1
>>>>>>> [10,] 10 2
>>>>>>>> DT[1:nrow(DT),z:=5]
>>>>>>> Error in `[.data.table`(DT, 1:nrow(DT), `:=`(z, 5)) :
>>>>>>> Attempt to add new column(s) and set subset of rows at the same
>>>>>>> time. Create the new column(s) first, and then you'll be able to
>>>>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>>>>> (no need, it's faster without).
>>>>>>>> DT$z <- NA
>>>>>>>> DT[, z:=5]
>>>>>>>        x y    z
>>>>>>>  [1,]  1 1 TRUE
>>>>>>>  [2,]  2 2 TRUE
>>>>>>>  [3,]  3 1 TRUE
>>>>>>>  [4,]  4 2 TRUE
>>>>>>>  [5,]  5 1 TRUE
>>>>>>>  [6,]  6 2 TRUE
>>>>>>>  [7,]  7 1 TRUE
>>>>>>>  [8,]  8 2 TRUE
>>>>>>>  [9,]  9 1 TRUE
>>>>>>> [10,] 10 2 TRUE
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The return on DT[,z:=5] when I haven't initialized DT$z yet is
>>>>>>> different, but still less informative than the error I get when I do
>>>>>>> DT[1:nrow(DT), z:=5]. And the DT$z <- NA issue is still there.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>> On 4 August 2011 08:18, Chris Neff <caneff at gmail.com> wrote:
>>>>>>>> A second question while I'm playing with it. It seems from the FRs
>>>>>>>> that it doesn't support multiple := in one select, but:
>>>>>>>>
>>>>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>>>> DT$a = 0
>>>>>>>> DT$z = 0
>>>>>>>>
>>>>>>>> DT[, list(a := y/sum(y), z := 5)]
>>>>>>>>
>>>>>>>> works just fine for me. An error gets thrown but afterwards the
>>>>>>>> columns are modified as intended. Why the error?
>>>>>>>>
>>>>>>>>> DT[,list(z:=5,a:=y/sum(y))]
>>>>>>>> z
>>>>>>>> [1] 5
>>>>>>>> [1] TRUE
>>>>>>>> a
>>>>>>>> y/sum(y)
>>>>>>>> [1] TRUE
>>>>>>>> Error in data.table(`:=`(z, 5), `:=`(a, y/sum(y))) :
>>>>>>>> column or argument 1 is NULL
>>>>>>>>> DT
>>>>>>>>        x y z          a
>>>>>>>>  [1,]  1 1 5 0.06666667
>>>>>>>>  [2,]  2 2 5 0.13333333
>>>>>>>>  [3,]  3 1 5 0.06666667
>>>>>>>>  [4,]  4 2 5 0.13333333
>>>>>>>>  [5,]  5 1 5 0.06666667
>>>>>>>>  [6,]  6 2 5 0.13333333
>>>>>>>>  [7,]  7 1 5 0.06666667
>>>>>>>>  [8,]  8 2 5 0.13333333
>>>>>>>>  [9,]  9 1 5 0.06666667
>>>>>>>> [10,] 10 2 5 0.13333333
>>>>>>>>
>>>>>>>> -Chris
>>>>>>>>
>>>>>>>> On 4 August 2011 08:12, Chris Neff <caneff at gmail.com> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> If I do:
>>>>>>>>>
>>>>>>>>> DT <- data.table(x=1:10, y=rep(1:2,5))
>>>>>>>>>
>>>>>>>>> Then try the following
>>>>>>>>>
>>>>>>>>> DT[, z:=5]
>>>>>>>>>
>>>>>>>>> I get:
>>>>>>>>>
>>>>>>>>>> DT[, z:=5]
>>>>>>>>> z
>>>>>>>>> [1] 5
>>>>>>>>> [1] TRUE
>>>>>>>>> NULL
>>>>>>>>>
>>>>>>>>> and if I were to do DT <- DT[, z:=5], then DT gets set to NULL.
>>>>>>>>> Alternatively if I do
>>>>>>>>>
>>>>>>>>> DT[1:10, z:=5]
>>>>>>>>>
>>>>>>>>> I get
>>>>>>>>>
>>>>>>>>>> DT=DT[1:nrow(DT),z:=5]
>>>>>>>>> z
>>>>>>>>> [1] 5
>>>>>>>>> [1] 1 2 3 4 5 6 7 8 9 10
>>>>>>>>> Error in `:=`(z, 5) :
>>>>>>>>> Attempt to add new column(s) and set subset of rows at the same
>>>>>>>>> time. Create the new column(s) first, and then you'll be able to
>>>>>>>>> assign to a subset. If i is set to 1:nrow(x) then please remove that
>>>>>>>>> (no need, it's faster without).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Which is more informative. So I do as it instructs:
>>>>>>>>>
>>>>>>>>> DT$z <- NA
>>>>>>>>>
>>>>>>>>> DT[, z:=5]
>>>>>>>>>
>>>>>>>>> And as output I get:
>>>>>>>>>
>>>>>>>>>> DT
>>>>>>>>>        x y    z
>>>>>>>>>  [1,]  1 1 TRUE
>>>>>>>>>  [2,]  2 2 TRUE
>>>>>>>>>  [3,]  3 1 TRUE
>>>>>>>>>  [4,]  4 2 TRUE
>>>>>>>>>  [5,]  5 1 TRUE
>>>>>>>>>  [6,]  6 2 TRUE
>>>>>>>>>  [7,]  7 1 TRUE
>>>>>>>>>  [8,]  8 2 TRUE
>>>>>>>>>  [9,]  9 1 TRUE
>>>>>>>>> [10,] 10 2 TRUE
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Why isn't z 5, as assigned? I think it is because I assigned it as
>>>>>>>>> NA, and data table didn't know to change it to integer (although why
>>>>>>>>> it changed it to logical is another puzzle). If I instead do
>>>>>>>>>
>>>>>>>>> DT$z <- 0
>>>>>>>>>
>>>>>>>>> DT[, z:=5]
>>>>>>>>>
>>>>>>>>> It works fine.
>>>>>>>>>
>>>>>>>>> So my two points are:
>>>>>>>>>
>>>>>>>>> A) Doing DT[,z:=5] should be as informative as doing DT[1:nrow(DT),
>>>>>>>>> z:=5] with the error message.
>>>>>>>>>
>>>>>>>>> B) What went wrong with the NA assignment I did?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> datatable-help mailing list
>>>>>>> datatable-help at lists.r-forge.r-project.org
>>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

