[datatable-help] Preserving comments in [.data.table?

Matthew Dowle mdowle at mdowle.plus.com
Tue Sep 11 13:30:19 CEST 2012


>
> On 11/09/12 11:00, Matthew Dowle wrote:
>> Hi. I didn't know about base::comment. Should use unknownr::unk(),
>> shouldn't I!
>>
>> Please raise a new feature request, or add it to this one :
>>
>> #2197 "A simple labels attribute like in the Hmisc package for variable
>> descriptions"
>> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2197&group_id=240&atid=978
>>
>> Do we want labels, or comments, or both? Griffith Rees (cc'd)?
>
> I think we want column attributes. As in: all of them. The comment is
> just one attribute. R uses attributes for all sorts of sneaky things so
> I think it is a bad idea if we lose some of them some of the time.

Oh I see. Please file a report via bug.report(package="data.table"), titled
something like: "Column attributes (such as 'comment') are sometimes lost."

>
>> Since comment<- will copy the whole table,
>
> Even for column comments? I am not quite sure how that works.

Hm. Maybe not actually, since it's internal :

> get("comment<-")
function (x, value)
.Internal(`comment<-`(x, value))

But, even so, tracemem reports that it does copy (a simple vector at
least), so yes :

> x = 1:10
> .Internal(inspect(x))
@0x0000000004167c58 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 1,2,3,4,5,...
> tracemem(x)
[1] "<0x0000000004167c58>"
> comment(x) <- "hello"
tracemem[0x0000000004167c58 -> 0x0000000004167c00]: comment<-
> .Internal(inspect(x))
@0x0000000004167c00 13 INTSXP g0c4 [NAM(1),TR,ATT] (len=10, tl=0)
[ ... snip attributes ... ]


>> we'd need setcomment() [or
>> setlabel(), or both] to avoid that copy; adding to the set* family.
>
> A set attributes that allows setting column attributes may suffice.

Already exists: try setattr(). Works on anything. [Which makes my worrying
in response to yesterday's request for setnames to work for data.frame
seem off track. You can probably set names by reference already on a
data.frame, going via setattr().]
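
A minimal sketch of that, assuming a reasonably recent data.table is
installed. DT, DF and the column names are made up for illustration, and
the details may differ between versions:

```r
library(data.table)

# Hypothetical example table; any data.table would do.
DT <- data.table(id = 1:3, A = c("x", "y", "z"))

# Remember where the column lives before touching its attributes.
before <- address(DT$A)

# Set the 'comment' attribute on column A by reference, i.e. without
# going through comment<- and the copy it triggers.
setattr(DT$A, "comment", "Documenting A column")

comment(DT$A)                      # the comment just attached
identical(address(DT$A), before)   # same memory address, so no copy

# setattr() is not restricted to data.table: here it renames the
# columns of a plain data.frame in place.
DF <- data.frame(a = 1:3, b = 4:6)
setattr(DF, "names", c("x", "y"))
names(DF)
```

This only covers attaching attributes without a copy; whether they then
survive [.data.table is the separate issue discussed above.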


>
>>
>>> PS: is stopifnot(identical(DT1[DT1]$A, DT1[DT1]$A.1)) a buglet?
>> Seems ok :
>
> Sorry, my bad (because str(DT1[DT1]) lists them as having different
> storage classes).
>
> Allan
>
>>
>>> DT1 <- data.table(id = seq.int(1, 10), A = LETTERS[1:10], key = "id")
>>> stopifnot(identical(DT1[DT1]$A, DT1[DT1]$A.1))
>>> comment(DT1$A) <- "A"
>>> stopifnot(identical(DT1[DT1]$A, DT1[DT1]$A.1))
>> Error: identical(DT1[DT1]$A, DT1[DT1]$A.1) is not TRUE
>>> str(DT1[DT1]$A)
>>   chr [1:10] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
>>> str(DT1[DT1]$A.1)
>>   atomic [1:10] A B C D ...
>>   - attr(*, "comment")= chr "A"
>>
>>
>>> On 11/09/12 09:47, I wrote:
>>>> I like comments, but [.data.table drops them, e.g.
>>> More specifically, it drops them for the 'outer' data.table but not the
>>> inner (!?) when doing the join:
>>>
>>> DT1 <- data.table(id = seq.int(1, 10), A = LETTERS[1:10], key = "id")
>>> comment(DT1$A) <- "A"
>>> DT2 <- data.table(id = seq.int(2, 10, 2), b = letters[1:5], key = "id")
>>> comment(DT2$b) <- "b"
>>> str(DT1[DT2]) # No comment on A
>>> str(DT2[DT1]) # No comment on b
>>>
>>> Allan
>>>
>>> PS: is stopifnot(identical(DT1[DT1]$A, DT1[DT1]$A.1)) a buglet?
>>>
>>>> DT1 <- data.table(A = 1:10)
>>>> comment(DT1$A) <- "Documenting A column"
>>>> DT2 <- DT1[A %% 2]
>>>> stopifnot(identical(comment(DT1$A), comment(DT2$A)))
>>>>
>>>> Is there any way of preserving the comments? (My normal use case is
>>>> DT1 <- DT1[...] so copying from one to the other is a little ...
>>>> tedious?)
>>>>
>>>> Allan
>>>>
>>>
>>
>
>