[datatable-help] Suggest a cool feature: Use data.table like a sorted / indexed data.list?

Matthew Dowle mdowle at mdowle.plus.com
Sat Sep 18 03:20:56 CEST 2010


This seems to work :

> dt = data.table(a=1:4)
> dt$b = list(1:2,3:5,"r",6:9)
> dt
     a          b
[1,] 1       1, 2
[2,] 2    3, 4, 5
[3,] 3          r
[4,] 4 6, 7, 8, 9
> setkey(dt,a)
> dt[J(2)]
     a       b
[1,] 2 3, 4, 5
> dt[J(2),b]
[[1]]
[1] 3 4 5
> 
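
For comparison, Tom's suggestion (quoted below) of using one data table to index a separate list could look roughly like the following. This is only a sketch: the names `lst`, `idx` and `pos` are made up for illustration, and it assumes a data.table version where `idx[J(2)]$pos` returns a plain vector.

```r
library(data.table)

# The variable-length values live in an ordinary list (hypothetical name)
lst <- list("r", 1:2, 6:9, 3:5)

# A small keyed table maps each key value to its position in lst
idx <- data.table(a = c(3L, 1L, 4L, 2L), pos = 1:4)
setkey(idx, a)          # sorts idx by a; pos still points into lst

# Keyed lookup gives the position, which is then used to index the list
p <- idx[J(2)]$pos
val <- lst[[p]]
print(val)
```

The extra bookkeeping is that `lst` and `idx` have to be kept in sync by hand, which is what the list column above avoids.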

Branson - what didn't you get to work? Thanks.

Yes, FR#202 was about the $C in the example (to be picky about it), i.e.
allowing non-vectors as a column (such as matrix, data.table and
data.frame, which themselves have columns). The $D is slightly
different, though, since a list is a vector too :

> is.vector(list(i=1:6, j = 1,k="?"))
[1] TRUE
>
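
To make that distinction concrete, here is a small base R sketch (illustration only; the object names are made up). is.vector() is TRUE only for objects with no attributes other than names, so a plain list passes while a data.frame or a matrix does not:

```r
# Hypothetical objects, for illustration only
l  <- list(i = 1:6, j = 1, k = "?")   # only a names attribute
df <- data.frame(a = 1:3, b = 4:6)    # also has class and row.names
m  <- matrix(1:6, nrow = 3)           # has a dim attribute

is.vector(l)    # TRUE
is.vector(df)   # FALSE
is.vector(m)    # FALSE
```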

Matthew


On Fri, 2010-09-17 at 17:00 -0400, Tom Short wrote:
> Branson, that's been on the wishlist for a while:
> 
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=202&group_id=240&atid=978
> 
> It hasn't been an urgent enough need for anyone to dig into it. You
> can always use one data table to index a list. It may take more memory
> and a bit more bookkeeping for the user, but it's not that hard.
> 
> - Tom
> 
> On Fri, Sep 17, 2010 at 1:47 PM, Branson Owen <branson.owen at gmail.com> wrote:
> > I believe you are already aware of most of this; I just want to add
> > some suggestions here.
> >
> > My understanding is that a data.frame is a list of column VECTORs, and
> > so is a data.table. What I just learned is that a data.frame can now
> > be a list of anything?
> >
> >> DF = data.frame(A = 1:3, B = rnorm(3))
> >> DF$C = data.frame(a=1:3,b=rnorm(3))
> >> DF$D = list(i=1:6, j = 1,k="?")
> >> print(DF)
> >
> >   A         B C.a        C.b                D
> > 1 1 -0.949565   1 -0.5815717 1, 2, 3, 4, 5, 6
> > 2 2 -1.903233   2 -0.5087712                1
> > 3 3  1.559566   3  1.4596933                ?
> >
> >> class(DF$C)
> > [1] "data.frame"
> >
> >> class(DF$D)
> > [1] "list"
> >
> > This is very cool to me! I can think of many benefits from this feature.
> >
> > A very common example: D is a function of B but with variable output
> > size, and I want to do fast grouping or sorting based on key A.
> > Before I knew this, I would have had to save them as separate objects,
> > which adds complexity to my code. So far this just adds coding and
> > management sugar; there is no performance benefit yet.
> >
> > But I think data.table can make a difference here, just as it does
> > for data.frame! There is no sorted / indexed list object yet, right?
> > If my variable-size outputs number in the millions, any aggregating
> > operation on a less structured object like that will be painful.
> > Technically, data.table could make it a sorted list that enjoys
> > data.table's high performance and syntax.
> >
> > I did some tests using data.table as a data.list, but most of the
> > syntax that works for data.frame doesn't work for data.table.
> >
> > I would expect this to be an easy feature, since data.frame already
> > supports it fairly smoothly. Just a suggestion. *^^*
> >
> > Best regards,
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >
