[datatable-help] Is data.table ready for prime-time and sensitive work?

Rob Forler rforler at uchicago.edu
Tue Dec 7 14:48:03 CET 2010


I can attest that I used data.table very extensively for several months on
large financial datasets. I was replacing a fair amount of poorly coded
data.frame, SQL, plyr, and apply code, and was able to match the previous
numbers and do a significant amount of new analysis because of the ease of
using data.table.

If I were still coding in R on a regular basis, you can guarantee I'd use
data.table every day.

Thanks,
Rob

On Tue, Dec 7, 2010 at 7:30 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

> Just to clarify also about the date of first release - March 2010? Any
> chance Mel you looked at the CRAN archive page and read off the last row?
> Oldest is first not last on that page :
>
> http://cran.r-project.org/src/contrib/Archive/data.table/
>
> v1.0 was released April 2006 but that was removed from CRAN happily because
> base quickly (within weeks) included features that removed the need for
> data.table.  It was re-released in Aug 2008 with new functionality so
> that's the relevant release date for your purpose.
>
> Feel free to post the puzzling results. You've done well to use it for 2
> weeks without posting, so you can probably tilt towards using this list
> more (on a new thread please). If we can get you over those hurdles first,
> then reconsider whether the 'robustness' question still stands.
>
> Other info which you may not have found yet ...
>
> Crantastic has 5 detailed user reviews of data.table. It also states that
> v1.1 was released over 2 years ago, which leads me to guess you may have
> missed the link to crantastic on the data.table homepage.
>
> There are some oddities in the ranking formula but if you look at
> http://crantastic.org/popcon and realise that the batch near the bottom
> starting with reshape, ggplot2 and plyr should be at the top (seems like a
> bug, I'll let them know) then data.table appears to be around the 8th most
> popular CRAN package with average score 4.7/5 and 10 users, compared to
> ggplot2's 39 users.  So crantastic itself is not popular since everyone
> knows that ggplot2 has many more than 39 users, and some very popular and
> stable packages don't have any votes at all. Even so perhaps this small
> amount of data may be useful in your assessment generally.  "data.table" is
> not the easiest to google for.
>
> The NEWS file (link on the homepage) also says, at the bottom, that v1.2
> was released in Aug 2008, along with what changed in each release since
> then.
>
> Matthew
>
> "Tom Short" <tshort.rlists at gmail.com> wrote in message
> news:AANLkTik=0j5da9j8_zVaW4DZhygKg6oqRP1Pg+JG3TFg at mail.gmail.com...
> > On Mon, Dec 6, 2010 at 10:54 PM, mbacou <mel at mbacou.com> wrote:
> >> My question is: is data.table ready for production? Would you rely on it
> >> for sensitive publications?
> >
> > If you have tight time deadlines, you may want to go with what you
> > have experience with, especially if it involves complicated queries or
> > manipulations. If you've already tried the data.table features you'll
> > need for "production", then using data.table may help you get things
> > done faster.
> >
> > Data.table has been robust for me on 6-GB datasets on a machine with
> > 24 GB of ram. With data.table, as with most tools, user error is more
> > likely than a tool bug, so you need to test/check your data and your
> > results.
> >
> > - Tom