[datatable-help] Suggest a cool feature: Use data.table like a sorted / indexed data.list?

Branson Owen branson.owen at gmail.com
Mon Sep 20 16:24:50 CEST 2010


Alright, I did it again. I was on data.table 1.4, where data.table
doesn't inherit from data.frame. I have three versions of R on my
machine and I mean to keep one of them on the old version for
comparison and emergencies. My idea and experiment came together very
quickly, and I did not pay enough attention to other things. Sorry
about that.

This is great. It looks like 1.5 has, by design or by accident,
implemented a long-time wish from the list!

I found that most of the syntax works as I expect, but not this one
(following Tom's example):

> dt[,paste(b),by = list(a)]
Error in `[.data.table`(dt, , paste(b), by = list(a)) :
 only integer,double,logical and character vectors are allowed so
far. Type 19 would need to be added.

However, the alternative of grouping via the i expression (faster?) works:
> dt[J(unique(a)),paste(b)]
[1] "1:2" "3:5" "r"   "6:9"

Is there any performance issue? It looks like data.table can already
support non-vector columns (probably because it now inherits from
data.frame), but the "by" syntax automatically blocks the non-vector
type. Do you think we could disable the automatic type check, since it
should work in 1.5? (Not sure, though.)
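
For comparison, here is a minimal sketch of the same per-group operation,
assuming a recent data.table release rather than the 1.5 discussed here
(the names `lens` and `strs` are only illustrative). In recent versions, as
far as I know, a list column can be used in j together with "by", and each
group sees its own slice of the list:

```r
library(data.table)

# Same table as in Matthew's example: column b is a list column.
dt <- data.table(a = 1:4, b = list(1:2, 3:5, "r", 6:9))
setkey(dt, a)

# Within each group, b is a list holding that group's elements,
# so b[[1]] is the single element belonging to the group here.
lens <- dt[, .(n = length(b[[1]])), by = a]

# paste() on a list converts each element to a string, which is the
# same conversion used in the J(unique(a)) call above.
strs <- dt[, .(s = paste(b)), by = a]
```

The b[[1]] shortcut only holds because each value of a occurs once; with
repeated keys, something like sapply(b, length) would be needed instead.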

Another example that works for data.frame but not data.table. (This
seems to be only a display problem. Other data.table features, except
for the "by" syntax, still work if we are careful about the data type.)

> dt$c = dt
> dt
Error in rep("", ncol(xi)) : invalid 'times' argument

> class(dt)
[1] "data.table" "data.frame"
> class(dt) = "data.frame"
> dt
 a          b c.a        c.b
1 1       1, 2   1       1, 2
2 2    3, 4, 5   2    3, 4, 5
3 3          r   3          r
4 4 6, 7, 8, 9   4 6, 7, 8, 9

I am sorry I was not aware of the existing old issue and was reckless
about the version. :-P Thanks a lot for the correction and references.
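
For reference, Tom's suggestion (quoted below) of using one data table to
index a separate list could be sketched roughly as follows. This assumes a
recent data.table release, and the names `payload`, `idx`, `pos` and `hit`
are hypothetical, chosen only for this illustration:

```r
library(data.table)

# The variable-size values live in an ordinary list.
payload <- list(1:2, 3:5, "r", 6:9)

# A keyed data.table maps each key value to a position in that list.
idx <- data.table(a = 1:4, pos = seq_along(payload))
setkey(idx, a)

# Keyed lookup of the position, then plain list indexing.
hit <- idx[J(2L), pos]
value <- payload[[hit]]
```

The cost is the extra bookkeeping Tom mentions: the list and the index
table have to be kept in sync by hand.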

Best regards,

2010/9/17 Matthew Dowle <mdowle at mdowle.plus.com>:
> This seems to work :
>
>> dt = data.table(a=1:4)
>> dt$b = list(1:2,3:5,"r",6:9)
>> dt
>     a          b
> [1,] 1       1, 2
> [2,] 2    3, 4, 5
> [3,] 3          r
> [4,] 4 6, 7, 8, 9
>> setkey(dt,a)
>> dt[J(2)]
>     a       b
> [1,] 2 3, 4, 5
>> dt[J(2),b]
> [[1]]
> [1] 3 4 5
>>
>
> Branson - what didn't you get to work? Thanks.
>
> Yes FR#202 was about the $C in the example (being picky about it) i.e.
> allowing non-vectors as a column (such as matrix, data.table and
> data.frame which themselves have columns) but the $D is slightly
> different since a list is a vector, too :
>
>> is.vector(list(i=1:6, j = 1,k="?"))
> [1] TRUE
>>
>
> Matthew
>
>
> On Fri, 2010-09-17 at 17:00 -0400, Tom Short wrote:
>> Branson, that's been on the wishlist for a while:
>>
>> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=202&group_id=240&atid=978
>>
>> It hasn't been an urgent enough need for anyone to dig into it. You
>> can always use one data table to index a list. It may take more memory
>> and a bit more bookkeeping for the user, but it's not that hard.
>>
>> - Tom
>>
>> On Fri, Sep 17, 2010 at 1:47 PM, Branson Owen <branson.owen at gmail.com> wrote:
>> > I believe you are already aware of what I know; I am just adding
>> > some suggestions here.
>> >
>> > My understanding is that a data.frame is a list of column VECTORs,
>> > and so is a data.table. What I just learned is that a data.frame can
>> > now be a list of anything?
>> >
>> >> DF = data.frame(A = 1:3, B = rnorm(3))
>> >> DF$C = data.frame(a=1:3,b=rnorm(3))
>> >> DF$D = list(i=1:6, j = 1,k="?")
>> >> print(DF)
>> >
>> >  A         B     C.a        C.b                D
>> > 1 1 -0.949565   1 -0.5815717 1, 2, 3, 4, 5, 6
>> > 2 2 -1.903233   2 -0.5087712                1
>> > 3 3  1.559566   3  1.4596933                ?
>> >
>> >> class(DF$C)
>> > [1] "data.frame"
>> >
>> >> class(DF$D)
>> > [1] "list"
>> >
>> > This is very cool to me! I can think of many benefits from this feature.
>> >
>> > A very common example: D is a function of B but with variable
>> > output size, and I want to do fast grouping or sorting based on key A.
>> > Before I knew this, I would have had to save them as separate objects,
>> > adding complexity to my code. So far this only adds coding and
>> > management sugar; there is no performance benefit yet.
>> >
>> > But I think data.table can make a difference here, just as it does
>> > for data.frame! There is no sorted / indexed list object yet, right?
>> > If my variable-size outputs number in the millions, any aggregating
>> > operation on a less structured object like that will be painful.
>> > Technically, data.table could make it a sorted list, so that it
>> > benefits from data.table's high performance and syntax.
>> >
>> > I did some tests using data.table as a data.list, but most of the
>> > syntax that works for data.frame doesn't work for data.table.
>> >
>> > I would expect this to be an easy feature, since data.frame already
>> > supports it fairly smoothly. Just a suggestion. *^^*
>> >
>> > Best regards,
>> > _______________________________________________
>> > datatable-help mailing list
>> > datatable-help at lists.r-forge.r-project.org
>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>> >

