[datatable-help] can I count on data.table supporting syntactically invalid column names?

Kaupas, George George.Kaupas at spansion.com
Wed Aug 8 20:39:32 CEST 2012


The test.data.table() routine returns 714, not 717.

I'm running data.table 1.8.2.

The only thing not bleeding edge (I think) is R itself, which is at 2.15.0.

A search for "merge" on r-forge gets two hits, neither of which is related; a search for "setcolorder" gets no hits. Should I file a bug report (or two)?

Here's my output from test.data.table() and sessionInfo():

> test.data.table()
Running .../tests.Rraw
Loading required package: hexbin
Loading required package: grid
Loading required package: lattice
x =  10,000 sample from 100 strings (quick test to save load on CRAN servers where tests run every day. In dev we increase n and m a lot for meaningful times.
0.002 : f=factor(x) [high up front cost, plus storage and maintenance of levels]
0.000 : sort.list(,'radix') on f
0.000 : u=unique(x)
0.000 : .Internal(order(u))
0.000 : sort.list(,'radix') on fsorted
-vs-
0.000 : char group on x (ad hoc by)  [slower than radix on f but without up front cost]
0.000 : char sort on x (setkey)  [lower up front cost than factor(x)]
0.000 : char group on xsorted (keyed by)  [faster than sort.list(,'radix') on fsorted, same result]
All 714 tests in test.data.table() completed ok in 15.272sec

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] hexbin_1.26.0    lattice_0.20-6   nlme_3.1-103     ggplot2_0.9.1
[5] reshape_0.8.4    plyr_1.7.1       data.table_1.8.2

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1   dichromat_1.2-4    digest_0.5.2       labeling_0.1
 [5] MASS_7.3-17        memoise_0.1        munsell_0.3        proto_0.3-9.2
 [9] RColorBrewer_1.0-5 reshape2_1.2.1     scales_0.2.1       stringr_0.6.1
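
A note on the "running in memory" question from the quoted reply below: the following is a minimal sketch of one way to compare the data.table version loaded in the current session with the one installed on disk. It assumes data.table is installed; the variable names are illustrative only. The two can differ if the package was upgraded after the session started.

```r
library(data.table)

# Version of the namespace actually loaded in this R session
loaded <- unname(getNamespaceVersion("data.table"))

# Version recorded in the installed package's DESCRIPTION on disk
installed <- as.character(packageVersion("data.table"))

# If these differ, the session predates the upgrade; restart R and reload
same_version <- identical(loaded, installed)
print(c(loaded = loaded, installed = installed))
print(same_version)
```

If the two versions differ, restarting R and rerunning test.data.table() is the simplest way to make sure the upgraded code is the one being tested.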

-----Original Message-----
From: Matthew Dowle [mailto:mdowle at mdowle.plus.com] 
Sent: Wednesday, August 08, 2012 4:49 AM
To: Kaupas, George
Cc: datatable-help at lists.r-forge.r-project.org
Subject: Re: [datatable-help] can I count on data.table supporting syntactically invalid column names?


Meant to write 2nd paragraph as follows:

>
> Hi. Yes you should be able to rely on that. It's useful to have 
> special characters in column names for latex formatting, and spaces 
> are allowed too. There are tests for these things. If you need to 
> refer to such column names as variables, then it's up to you to wrap 
> them in backticks (``); e.g., by=`Illegal(name%)`+1.
>
> So yes, if you find problems with special characters, please report them 
> as bugs; suggestions on where the documentation needs improving would be 
> great too.
>
> I seem to remember a bug fix in this regard, and in particular in 
> merge (so my first thought is to ask you if you've recently upgraded 
> to 1.8.2 and if test.data.table returns 717), but as you say R-Forge 
> is currently down for maintenance...
>
> That neworder error looks familiar too. Are you sure you have 1.8.2 
> running in memory? (Run test.data.table() to see if it returns 717).
>
> Matthew
>
>> I'm taking advantage of a feature in data.table which lets me get 
>> away with naming columns with characters that would not survive a 
>> call to make.names(), e.g.:
>>
>>> DT1 = data.table(a=letters[1:5], "Illegal(name%)"=1:5, key="a")
>>> DT1
>>    a Illegal(name%)
>> 1: a              1
>> 2: b              2
>> 3: c              3
>> 4: d              4
>> 5: e              5
>>
>> (The dcast function from the reshape2 package will also create 
>> columns named "illegally".)
>>
>> But when using merge.data.table, I get one of two side effects: either 
>> the merge works but the column names appear to be run through 
>> make.names(), or the merge fails in setcolorder():
>>
>>> DT1 = data.table(a=letters[1:5], "Illegal(name%)"=1:5, key="a")
>>> DT2 = data.table(a=letters[1:5], b=6L, key="a")
>>
>>> merge(DT1,DT2)
>>    a Illegal.name.. b
>> 1: a              1 6
>> 2: b              2 6
>> 3: c              3 6
>> 4: d              4 6
>> 5: e              5 6
>>
>>> merge(DT2,DT1)
>> Error in setcolorder(dt, c(setdiff(names(dt), end), end)) :
>>   neworder is length 4 but x has 3 columns.
>>
>> I can't get to datatable.r-forge.r-project.org - getting a 504.
>>
>> So... should I NOT rely on being able to use special characters in 
>> column names?
>>
>> Thanks
>> George
>>
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>> [1] data.table_1.8.2 
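
The backtick advice in the quoted reply can be sketched as follows. This is only an illustration, assuming data.table is installed; `DT1` mirrors the table from the thread, and the names `mangled`, `total`, `counts` and `plus1` are made up for the example. It does not exercise merge(), so it says nothing about the behaviour reported above.

```r
library(data.table)

# Same table as in the thread: one column name is not syntactically valid
DT1 = data.table(a = letters[1:5], "Illegal(name%)" = 1:5, key = "a")

# data.table stores the name exactly as given
names(DT1)

# make.names() shows what the name would be mangled to
mangled = make.names(names(DT1))   # "a" "Illegal.name.."

# Wrap the name in backticks to use it as a variable in j or by
total  = DT1[, sum(`Illegal(name%)`)]
counts = DT1[, .N, by = list(plus1 = `Illegal(name%)` + 1)]

print(mangled)
print(total)
print(counts)
```

Comparing names(DT1) before and after an operation against make.names(names(DT1)) is a quick way to spot where a name has been altered.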