[datatable-help] Suggest a cool feature: Use data.table like a sorted / indexed data.list?

Branson Owen branson.owen at gmail.com
Fri Sep 17 19:47:12 CEST 2010


I believe you are already aware of most of what follows. I just want to
add some suggestions here.

My understanding was that a data.frame is a list of column VECTORs, and
so is a data.table. What I just learned is that a data.frame column can
now be almost anything, such as another data.frame or a list?

> DF = data.frame(A = 1:3, B = rnorm(3))
> DF$C = data.frame(a=1:3,b=rnorm(3))
> DF$D = list(i=1:6, j = 1,k="?")
> print(DF)

  A         B     C.a        C.b                D
1 1 -0.949565   1 -0.5815717 1, 2, 3, 4, 5, 6
2 2 -1.903233   2 -0.5087712                1
3 3  1.559566   3  1.4596933                ?

> class(DF$C)
[1] "data.frame"

> class(DF$D)
[1] "list"

This is very cool to me! I can think of many benefits from this feature.

A very common example: D is a function of B with variable output
size, and I want to do fast grouping or sorting based on key A.
Before I knew this, I would have had to save B and D as separate
objects, which adds complexity to my code. Keeping them in one object
is only coding and management sugar; it brings no performance benefit
yet.
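
For illustration, here is a minimal sketch of that example in base R.
The function f and the sample values are made up for this sketch; only
the idea of storing D as a list column next to A and B comes from the
text above.

```r
# Hypothetical function whose output length depends on its input.
f <- function(b) seq_len(b)

DF <- data.frame(A = c(2L, 1L, 3L), B = c(3L, 1L, 2L))
DF$D <- lapply(DF$B, f)      # list column: one variable-length vector per row

DF <- DF[order(DF$A), ]      # sort rows by key A; the list column follows along

sapply(DF$D, length)         # 1 3 2
sapply(DF$D, sum)            # 1 6 3
```

Everything stays in one object, but the sort is an ordinary data.frame
reorder, so this is convenience only, not speed.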

But I think data.table can make a difference here, just as it does
for data.frame! There is no sorted / indexed list object yet, right?
If my variable-size outputs run to millions of elements, any
aggregating operation on a less structured object like a plain list
will be painful. Technically, data.table could keep such a list sorted
by key, so that it benefits from data.table's high performance and
syntax.
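
To make the "sorted list" idea concrete, here is a rough base R sketch.
It is not how data.table implements keys; the names keys, vals and
lookup are hypothetical, and the keys are assumed to be unique.

```r
# Assumed toy data: one list element per (unique) key.
keys <- c(30L, 10L, 20L)
vals <- list(1:5, "a", c(2.5, 3.5))

o <- order(keys)
keys <- keys[o]              # sorted keys: 10, 20, 30
vals <- vals[o]              # list elements reordered to match the keys

# findInterval() locates k in the sorted keys without a linear scan.
lookup <- function(k) {
  i <- findInterval(k, keys)
  if (i > 0L && keys[i] == k) vals[[i]] else NULL
}

lookup(20L)                  # 2.5 3.5
lookup(15L)                  # NULL, no such key
```

Once the keys are sorted, each lookup is a search in the key vector
instead of a scan over the whole list.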

I did some tests using a data.table as a data.list, but most of the
syntax that works for data.frame doesn't work for data.table.

I would expect this to be an easy feature, since data.frame already
supports it fairly smoothly. Just a suggestion. *^^*

Best regards,

