[datatable-help] can I count on data.table supporting syntactically invalid column names?

Matthew Dowle mdowle at mdowle.plus.com
Thu Aug 9 01:16:35 CEST 2012


Then somehow you don't have the CRAN version of v1.8.2 installed. By any
chance did you install 1.8.2 from R-Forge during the few days when a
slightly earlier build of 1.8.2 existed there? R-Forge also happened to be
stale in that time window. The first submission of 1.8.2 to CRAN was
reverted due to some difficulties, so it needed a second attempt and took
longer than usual.

Please uninstall data.table and reinstall from any CRAN mirror (not
R-Forge) to make sure. A difference between 714 and 717 indicates a
problem with the data.table installation, not with R itself. With v1.8.2,
test.data.table() must report exactly 717 tests.
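
A minimal sketch of that check, under stated assumptions: the helper name
extract_test_count is hypothetical (it is not part of data.table), the
mirror URL is just one example of a CRAN mirror, and the summary line is
copied from the output quoted below. The reinstall commands are left
commented out because they modify your library.

```r
# Hypothetical helper: pull the test count out of the summary line that
# test.data.table() prints, e.g. "All 714 tests in ... completed ok".
extract_test_count <- function(output_lines) {
  pattern <- "^All ([0-9]+) tests in test\\.data\\.table\\(\\) completed ok.*$"
  hit <- grep(pattern, output_lines, value = TRUE)
  if (length(hit) == 0L) return(NA_integer_)  # no summary line found
  as.integer(sub(pattern, "\\1", hit[1L]))
}

# Reinstall from CRAN (assumed mirror URL; any CRAN mirror will do):
# remove.packages("data.table")
# install.packages("data.table", repos = "https://cloud.r-project.org")

# Then, in a fresh R session:
# library(data.table)
# packageVersion("data.table")   # should be 1.8.2
# test.data.table()              # summary line should mention 717 tests

# Checking the summary line reported in this thread:
summary_line <- "All 714 tests in test.data.table() completed ok in 15.272sec"
n_tests <- extract_test_count(summary_line)
if (!identical(n_tests, 717L)) {
  message("Got ", n_tests, " tests, expected 717: reinstall from CRAN")
}
```

Restarting R before re-running the tests matters here, since an old copy
of the package may otherwise still be loaded in memory.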

Another way to identify the exact build would be to include the SVN
revision number in the package version, but I haven't found a way to do
that for packages yet. R itself does that, of course, but I don't know how
for packages. Since all changes in data.table are accompanied by new
tests, the current approach is to use the number of tests. And actually
running all the tests on your own hardware is a stronger check that
everything is working as intended.
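
As a complementary check, the installed package's DESCRIPTION metadata can
hint at where a build came from. The sketch below is an illustration under
assumptions, not an established data.table procedure: install_origin is a
hypothetical helper, and the "Repository" and
"Repository/R-Forge/Revision" field names are what CRAN and R-Forge builds
are believed to record, so verify them against your own installation.

```r
# Hypothetical helper: summarise where an installed package appears to
# come from, using only utils::packageDescription().
install_origin <- function(pkg) {
  desc <- suppressWarnings(utils::packageDescription(pkg))
  if (!inherits(desc, "packageDescription")) return(NULL)  # not installed

  # Fields that are absent from DESCRIPTION come back as NULL.
  field <- function(name) {
    value <- desc[[name]]
    if (is.null(value)) NA_character_ else as.character(value)
  }

  list(
    version    = field("Version"),
    repository = field("Repository"),                  # assumed: "CRAN" for CRAN builds
    revision   = field("Repository/R-Forge/Revision")  # assumed: set by R-Forge builds only
  )
}

# Usage (needs data.table installed, so left commented out):
# str(install_origin("data.table"))
```

A repository other than CRAN for a package reporting version 1.8.2 would
be consistent with the stale R-Forge build described above, but the test
count remains the check this thread relies on.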


> The test.data.table() routine returns 714, not 717.
>
> I'm running data.table 1.8.2.
>
> The only thing not bleeding edge (I think) is R itself which is at 2.15.0.
>
> A search for "merge" on R-Forge gets two hits, neither of which is
> related; a search for setcolorder gets no hits. Should I file a bug
> report (or two)?
>
> Here's my output from test.data.table() and sessionInfo():
>
>> test.data.table()
> Running .../tests.Rraw
> Loading required package: hexbin
> Loading required package: grid
> Loading required package: lattice
> x =  10,000 sample from 100 strings (quick test to save load on CRAN
> servers where tests run every day. In dev we increase n and m a lot for
> meaningful times.
> 0.002 : f=factor(x) [high up front cost, plus storage and maintenance of
> levels]
> 0.000 : sort.list(,'radix') on f
> 0.000 : u=unique(x)
> 0.000 : .Internal(order(u))
> 0.000 : sort.list(,'radix') on fsorted
> -vs-
> 0.000 : char group on x (ad hoc by)  [slower than radix on f but without
> up front cost]
> 0.000 : char sort on x (setkey)  [lower up front cost than factor(x)]
> 0.000 : char group on xsorted (keyed by)  [faster than sort.list(,'radix')
> on fsorted, same result]
> All 714 tests in test.data.table() completed ok in 15.272sec
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] hexbin_1.26.0    lattice_0.20-6   nlme_3.1-103     ggplot2_0.9.1
> [5] reshape_0.8.4    plyr_1.7.1       data.table_1.8.2
>
> loaded via a namespace (and not attached):
>  [1] colorspace_1.1-1   dichromat_1.2-4    digest_0.5.2       labeling_0.1
>  [5] MASS_7.3-17        memoise_0.1        munsell_0.3        proto_0.3-9.2
>  [9] RColorBrewer_1.0-5 reshape2_1.2.1     scales_0.2.1       stringr_0.6.1
>
> -----Original Message-----
> From: Matthew Dowle [mailto:mdowle at mdowle.plus.com]
> Sent: Wednesday, August 08, 2012 4:49 AM
> To: Kaupas, George
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] can I count on data.table supporting
> syntactically invalid column names?
>
>
> Meant to write 2nd paragraph as follows :
>
>>
>> Hi. Yes you should be able to rely on that. It's useful to have
>> special characters in column names for latex formatting, and spaces
>> are allowed too. There are tests for these things. If you need to
>> refer to such column names as variables, then it's up to you to wrap
>> them in backticks; e.g., by=`Illegal(name%)`+1.
>>
>> So yes, if you find problems with special characters, please report
>> them as bugs. Suggestions for where the documentation needs improving
>> would be great too.
>>
>> I seem to remember a bug fix in this regard, and in particular in
>> merge (so my first thought is to ask you if you've recently upgraded
>> to 1.8.2 and if test.data.table returns 717), but as you say R-Forge
>> is currently down for maintenance...
>>
>> That neworder error looks familiar too. Are you sure you have 1.8.2
>> running in memory? (Run test.data.table() to see if it returns 717).
>>
>> Matthew
>>
>>> I'm taking advantage of a feature in data.table which lets me get
>>> away with naming columns with characters that would not survive a
>>> call to make.names(), e.g.:
>>>
>>>> DT1 = data.table(a=letters[1:5], "Illegal(name%)"=1:5, key="a")
>>>> DT1
>>>    a Illegal(name%)
>>> 1: a              1
>>> 2: b              2
>>> 3: c              3
>>> 4: d              4
>>> 5: e              5
>>>
>>> (The dcast function from the reshape2 package will also create
>>> columns named "illegally".)
>>>
>>> But when using merge.data.table, I get two side-effects; either the
>>> merge works, but the column names appear to be run through
>>> make.names(), or the merge fails in setcolorder():
>>>
>>>> DT1 = data.table(a=letters[1:5], "Illegal(name%)"=1:5, key="a")
>>>> DT2 = data.table(a=letters[1:5], b=6L, key="a")
>>>
>>>> merge(DT1,DT2)
>>>    a Illegal.name.. b
>>> 1: a              1 6
>>> 2: b              2 6
>>> 3: c              3 6
>>> 4: d              4 6
>>> 5: e              5 6
>>>
>>>> merge(DT2,DT1)
>>> Error in setcolorder(dt, c(setdiff(names(dt), end), end)) :
>>>   neworder is length 4 but x has 3 columns.
>>>
>>> I can't get to datatable.r-forge.r-project.org - getting a 504.
>>>
>>> So... should I NOT rely on being able to use special characters in
>>> column names?
>>>
>>> Thanks
>>> George
>>>
>>>> sessionInfo()
>>> R version 2.15.0 (2012-03-30)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>> [1] data.table_1.8.2
>
>



