[datatable-help] Comments on data.table
Matthew Dowle
mdowle at mdowle.plus.com
Wed May 23 11:59:13 CEST 2012
Hi Tim,
Thanks for your comments. They're all valid and quite common to hear from long
time R users. I've recently realised that it might help to say a bit about my
background, and perhaps some name dropping won't hurt. I started using S-plus in
1999. I sat in the same office as Pat Burns, my mentor. He gave me a copy of S
Poetry, and guided me through my first ever s-help post. I'm not claiming that
data.table's design decisions are correct, or that they aren't dangerous. My
*only* claim is that they have been thought through by a long time R/S user,
rightly or wrongly. The very first version of data.table was on CRAN in 2006.
Chris has already got most points spot on. I've already answered most of these
points before, on this list, on r-help, in the FAQs or on Stack Overflow. To
draw it all together, perhaps the following would be a quick strategy for
tackling all your points, in this order:
1. Open the details of the 15 reviews on Crantastic. There are nice hints and
tips there.
http://crantastic.org/packages/data-table
2. Sort the Stack Overflow data.table tag by votes and scroll through the most
voted questions:
http://stackoverflow.com/questions/tagged/data.table?sort=votes&pagesize=50
In particular, the 2nd one tackles the concern about the design of `j`:
http://stackoverflow.com/questions/7768686/r-self-reference
and this one highlights j's design being used twice in one query:
http://stackoverflow.com/questions/10705290/r-select-a-value-for-based-on-a-highest-value-in-another-column
There's a danger in concentrating only on the most popular, so also sort by
newest: those questions may cover more recent features, which by definition
haven't yet had time to be asked about or voted for.
3. On the danger of departing from [.data.frame syntax, see this:
http://stackoverflow.com/questions/10527072/using-data-table-package-inside-my-own-package
(the question may not seem relevant to that point, but the answer is)
4. Read the first section of vignette("datatable-faq") in order starting at
1.1. It's structured with your concerns in mind.
5. Run example(data.table) and follow the results through at the prompt. Don't
actually read ?data.table yet.
6. On the danger of a data.table not being copied: that's very deliberate, as
Chris says. See ?copy and run example(copy) at the prompt.
I suspect the two key features of data.table you're missing are := and cedta(),
which the links above hopefully explain. I've listed things in this order so
that, hopefully, it's very quick to get through.
Matthew