[datatable-help] Comments on data.table

Matthew Dowle mdowle at mdowle.plus.com
Wed May 23 11:59:13 CEST 2012


Hi Tim,

Thanks for your comments. They're all valid, and quite common to hear from 
long-time R users. I've recently realised that it might help to say a bit about 
my background, and perhaps some name-dropping won't hurt. I started using S-PLUS 
in 1999. I sat in the same office as Pat Burns, my mentor. He gave me a copy of 
S Poetry and guided me through my first ever s-help post. I'm not claiming 
data.table's design decisions are correct, or that they aren't dangerous. My 
*only* claim is that they have been thought through, rightly or wrongly, by a 
long-time R/S user. The very first version of data.table was on CRAN in 2006.

Chris has already got most points spot on. I've answered most of the others 
before, on this list, on r-help, in the FAQ or on Stack Overflow. To draw it all 
together, here is a quick strategy for working through all of your points, in 
this order:

1. Open the 15 reviews on Crantastic and read them in full. There are nice hints 
and tips there.
   http://crantastic.org/packages/data-table

2. Sort the Stack Overflow data.table tag by votes and scroll through the most 
voted questions:
   http://stackoverflow.com/questions/tagged/data.table?sort=votes&pagesize=50

In particular, the second one tackles the concern about the design of `j`:
   http://stackoverflow.com/questions/7768686/r-self-reference

and this one shows j's design being used twice in one query:
   http://stackoverflow.com/questions/10705290/r-select-a-value-for-based-on-a-highest-value-in-another-column

There's a danger in concentrating only on the most popular questions, so also 
sort by newest. Those may discuss more recent features, which by definition 
haven't yet had time to be asked about or voted on.

3. On the danger of departing from [.data.frame syntax, see this:
   http://stackoverflow.com/questions/10527072/using-data-table-package-inside-my-own-package
(the question may not seem relevant to that point, but the answer is)

4. Read the first section of vignette("datatable-faq") in order, starting at 
1.1. It's structured with your concerns in mind.

5. Run example(data.table) and follow the results through at the prompt. Don't 
actually read ?data.table yet.

6. On the danger of a data.table not being copied: as Chris says, that's very 
deliberate. See ?copy and run example(copy) at the prompt.

I suspect the two key features of data.table you're missing are := and cedta(), 
which the links above will hopefully explain. I've listed the steps in this 
order so that, hopefully, they're quick to get through.
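For a quick taste of the copy semantics in point 6, here is a minimal sketch, 
assuming a version of data.table that has both := and copy(). The table DT and 
the columns a, b and c are made up purely for illustration; example(copy) 
remains the authoritative demonstration.

```r
library(data.table)

# Hypothetical toy table, for illustration only.
DT <- data.table(a = 1:3)

# Plain assignment does not copy: DT and DT2 refer to the same table.
DT2 <- DT
DT2[, b := a * 2]          # := adds column b by reference
print("b" %in% names(DT))  # DT sees the new column too

# copy() makes an independent table, so later changes don't propagate back.
DT3 <- copy(DT)
DT3[, c := 0L]
print("c" %in% names(DT))  # DT is untouched by the change to DT3
```

The point of the design is that := never copies the whole table just to add or 
update a column; when you do want an independent copy, you ask for one 
explicitly with copy().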

Matthew
