[datatable-help] can I count on data.table supporting syntactically invalid column names?

Matthew Dowle mdowle at mdowle.plus.com
Thu Aug 9 22:42:49 CEST 2012


Great.  Yes, bug.report(package="data.table") please. R-Forge is back
online now.
Thanks.
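
For the report, a short self-contained check along these lines may be useful. It is only a sketch: it assumes data.table is installed, and check_merge is a hypothetical helper name used for illustration, not part of data.table. It runs the merge from the thread in both directions and records whether the special column name survives, without assuming what any particular version will do.

```r
library(data.table)

bad <- "Illegal(name%)"

# Base R's make.names() sanitises the name; this matches the altered
# column name shown in the merge output in this thread.
make.names(bad)   # "Illegal.name.."

DT1 <- data.table(a = letters[1:5], "Illegal(name%)" = 1:5, key = "a")
DT2 <- data.table(a = letters[1:5], b = 6L, key = "a")

# Hypothetical helper: try merge(x, y) and describe what happened to `name`.
check_merge <- function(x, y, name) {
  res <- tryCatch(merge(x, y), error = function(e) e)
  if (inherits(res, "error")) {
    return(paste("error:", conditionMessage(res)))
  }
  if (name %in% names(res)) "name preserved" else "name changed"
}

check_merge(DT1, DT2, bad)
check_merge(DT2, DT1, bad)

# Worth quoting in the report alongside sessionInfo()
packageVersion("data.table")
```

Pasting the two check_merge() results together with the package version gives a compact record of which direction fails and on which build.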

> Good catch; I did indeed snag 1.8.2 from R-Forge because I needed
> something in that version but didn't see it on CRAN at the time; never
> occurred to me the version would change.
>
> I uninstalled and installed the CRAN version. I get 717 from the tests.
> However the merge behavior is the same; in one direction it succeeds but
> changes the column names; in the other direction it fails in setcolorder.
>
> So I should open bug reports, then, eh?
>
>> test.data.table()
> Running .../tests.Rraw
> x =  10,000 sample from 100 strings (quick test to save load on CRAN
> servers where tests run every day. In dev we increase n and m a lot for
> meaningful times.
> 0.001 : f=factor(x) [high up front cost, plus storage and maintenance of
> levels]
> 0.000 : sort.list(,'radix') on f
> 0.000 : u=unique(x)
> 0.000 : .Internal(order(u))
> 0.001 : sort.list(,'radix') on fsorted
> -vs-
> 0.000 : char group on x (ad hoc by)  [slower than radix on f but without
> up front cost]
> 0.000 : char sort on x (setkey)  [lower up front cost than factor(x)]
> 0.000 : char group on xsorted (keyed by)  [faster than sort.list(,'radix')
> on fsorted, same result]
> All 717 tests in test.data.table() completed ok in 15.697sec
>
>> DT1 = data.table(a=letters[1:5], "Illegal(name%)"=1:5, key="a")
>> DT2 = data.table(a=letters[1:5], b=6L, key="a")
>> merge(DT1,DT2)
>    a Illegal.name.. b
> 1: a              1 6
> 2: b              2 6
> 3: c              3 6
> 4: d              4 6
> 5: e              5 6
>> merge(DT2,DT1)
> Error in setcolorder(dt, c(setdiff(names(dt), end), end)) :
>   neworder is length 4 but x has 3 columns.
>
> -----Original Message-----
> From: Matthew Dowle [mailto:mdowle at mdowle.plus.com]
> Sent: Wednesday, August 08, 2012 6:17 PM
> To: Kaupas, George
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: RE: [datatable-help] can I count on data.table supporting
> syntactically invalid column names?
>
>
> Then somehow you don't have the CRAN version of v1.8.2 installed. By any
> chance did you install 1.8.2 from R-Forge in the few days a slightly
> earlier version of 1.8.2 existed on R-Forge?  R-Forge also happened to be
> stale in that time window. The first submission of 1.8.2 to CRAN was
> reverted due to some difficulties, so it needed a 2nd attempt and took
> longer than usual.
>
> Please uninstall data.table and reinstall from any CRAN mirror (not
> R-Forge) to make sure. A difference between 714 and 717 indicates an
> installation problem of data.table, not R itself. test.data.table() v1.8.2
> must return 717 precisely.
>
> Another way would be to include the SVN rev number in the package version.
> But I haven't found a way to do that for packages yet. R itself does that
> of course, but I don't know how for packages. Since all changes in
> data.table are accompanied by new tests, the current approach is using the
> number of tests. And actually running all the tests on your hardware etc.
> is a stronger test that everything is working as intended.
>
>
>> The test.data.table() routine returns 714, not 717.
>>
>> I'm running data.table 1.8.2.
>>
>> The only thing not bleeding edge (I think) is R itself which is at
>> 2.15.0.
>>
>> A search for "merge" on r-forge gets two hits, neither of them related;
>> a search for setcolorder gets no hits. Should I file a bug report (or
>> two)?
>>
>> Here's my output from test.data.table() and sessionInfo():
>>
>>> test.data.table()
>> Running .../tests.Rraw
>> Loading required package: hexbin
>> Loading required package: grid
>> Loading required package: lattice
>> x =  10,000 sample from 100 strings (quick test to save load on CRAN
>> servers where tests run every day. In dev we increase n and m a lot
>> for meaningful times.
>> 0.002 : f=factor(x) [high up front cost, plus storage and maintenance
>> of levels]
>> 0.000 : sort.list(,'radix') on f
>> 0.000 : u=unique(x)
>> 0.000 : .Internal(order(u))
>> 0.000 : sort.list(,'radix') on fsorted
>> -vs-
>> 0.000 : char group on x (ad hoc by)  [slower than radix on f but
>> without up front cost]
>> 0.000 : char sort on x (setkey)  [lower up front cost than factor(x)]
>> 0.000 : char group on xsorted (keyed by)  [faster than
>> sort.list(,'radix') on fsorted, same result]
>> All 714 tests in test.data.table() completed ok in 15.272sec
>>
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] grid      stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] hexbin_1.26.0    lattice_0.20-6   nlme_3.1-103     ggplot2_0.9.1
>> [5] reshape_0.8.4    plyr_1.7.1       data.table_1.8.2
>>
>> loaded via a namespace (and not attached):
>>  [1] colorspace_1.1-1   dichromat_1.2-4    digest_0.5.2       labeling_0.1
>>  [5] MASS_7.3-17        memoise_0.1        munsell_0.3        proto_0.3-9.2
>>  [9] RColorBrewer_1.0-5 reshape2_1.2.1     scales_0.2.1       stringr_0.6.1
>>
>> -----Original Message-----
>> From: Matthew Dowle [mailto:mdowle at mdowle.plus.com]
>> Sent: Wednesday, August 08, 2012 4:49 AM
>> To: Kaupas, George
>> Cc: datatable-help at lists.r-forge.r-project.org
>> Subject: Re: [datatable-help] can I count on data.table supporting
>> syntactically invalid column names?
>>
>>
>> Meant to write 2nd paragraph as follows :
>>
>>>
>>> Hi. Yes you should be able to rely on that. It's useful to have
>>> special characters in column names for latex formatting, and spaces
>>> are allowed too. There are tests for these things. If you need to
>>> refer to such column names as variables, then it's up to you to wrap
>>> with ``; e.g., by=`Illegal(name%)`+1.
>>>
>>> So yes, if you find problems with special characters, please report
>>> them as bugs; suggestions on where the documentation needs improving
>>> would be great too.
>>>
>>> I seem to remember a bug fix in this regard, and in particular in
>>> merge (so my first thought is to ask you if you've recently upgraded
>>> to 1.8.2 and if test.data.table returns 717), but as you say R-Forge
>>> is currently down for maintenance...
>>>
>>> That neworder error looks familiar too. Are you sure you have 1.8.2
>>> running in memory? (Run test.data.table() to see if it returns 717).
>>>
>>> Matthew
>>>
>>>> I'm taking advantage of a feature in data.table which lets me get
>>>> away with naming columns with characters that would not survive a
>>>> call to make.names(), e.g.:
>>>>
>>>>> DT1 = data.table(a=letters[1:5], "Illegal(name%)"=1:5, key="a")
>>>>> DT1
>>>>    a Illegal(name%)
>>>> 1: a              1
>>>> 2: b              2
>>>> 3: c              3
>>>> 4: d              4
>>>> 5: e              5
>>>>
>>>> (The dcast function from the reshape2 package will also create
>>>> columns named "illegally".)
>>>>
>>>> But when using merge.data.table, I get two side-effects; either the
>>>> merge works, but the column names appear to be run through
>>>> make.names(), or the merge fails in setcolorder():
>>>>
>>>>> DT1 = data.table(a=letters[1:5], "Illegal(name%)"=1:5, key="a")
>>>>> DT2 = data.table(a=letters[1:5], b=6L, key="a")
>>>>
>>>>> merge(DT1,DT2)
>>>>    a Illegal.name.. b
>>>> 1: a              1 6
>>>> 2: b              2 6
>>>> 3: c              3 6
>>>> 4: d              4 6
>>>> 5: e              5 6
>>>>
>>>>> merge(DT2,DT1)
>>>> Error in setcolorder(dt, c(setdiff(names(dt), end), end)) :
>>>>   neworder is length 4 but x has 3 columns.
>>>>
>>>> I can't get to datatable.r-forge.r-project.org - getting a 504.
>>>>
>>>> So... should I NOT rely on being able to use special characters in
>>>> column names?
>>>>
>>>> Thanks
>>>> George
>>>>
>>>>> sessionInfo()
>>>> R version 2.15.0 (2012-03-30)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>> [1] data.table_1.8.2
>>
>>
>
>
>
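
As a footnote to the backtick advice quoted above, here is a small sketch of ways to refer to a non-syntactic column name. It assumes data.table is installed; the table mirrors DT1 from the thread, and the grp label and the variable names are illustrative only.

```r
library(data.table)

DT <- data.table(a = letters[1:5], "Illegal(name%)" = 1:5, key = "a")

# In j, wrap the name in backticks so R parses it as a single symbol
total <- DT[, sum(`Illegal(name%)`)]

# The same applies inside by; here rows are grouped by odd/even value
counts <- DT[, .N, by = list(grp = `Illegal(name%)` %% 2L)]

# When a string is more convenient, [[ extracts the column by name
vals <- DT[["Illegal(name%)"]]
```

Backticks are needed wherever the name is used as a variable in an expression; the string form works wherever a column is looked up by name.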



