[datatable-help] Is data.table ready for prime-time and sensitive work?

Matthew Dowle mdowle at mdowle.plus.com
Tue Dec 7 15:30:07 CET 2010


Interesting. I don't know python but it's been on the radar a few times.
  "Rob Forler" <rforler at uchicago.edu> wrote in message news:AANLkTikCZL=WhE_UoWiUvoQ+TLXj8dgrJ6iG4GhdOSdv at mail.gmail.com...
  I'm coding in python now. The group I'm in now has a similar tool (closed source)  to data.table but in python and is based on numpy.

  The api isn't as beautiful as data.table's, but has similar functionality. 

  -Rob

  On Tue, Dec 7, 2010 at 7:54 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

    Thanks Rob. That begs an obvious question then ... what are you coding in now ?
      "Rob Forler" <rforler at uchicago.edu> wrote in message news:AANLkTikSsoy8F6aKuQYXW5GWeMLiE+Y07D_3X+K7AeX=@mail.gmail.com...
      I can attest that I used data.table very extensively for several months on large datasets (financial). I was replacing a fair amount of poorly coded data.frame, sql, plyr, apply code, and was able to match the previous numbers and do a significant amount of new analysis because of the ease of using data.tables.

      If I was still coding in R on a regular basis you can guarantee I'd use data.table every day.

      Thanks,
      Rob


      On Tue, Dec 7, 2010 at 7:30 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

        Just to clarify also about the date of first release - March 2010? Any
        chance Mel you looked at the CRAN archive page and read off the last row?
        Oldest is first not last on that page :

        http://cran.r-project.org/src/contrib/Archive/data.table/

        v1.0 was released April 2006 but that was removed from CRAN happily because
        base quickly (within weeks) included features that removed the need for
        data.table.  It was re-released in Aug 2008 with new functionality so that's
        the relevant release date for your purpose.

        Feel free to post the puzzling results. You've done well to use it for 2
        weeks without posting, so you can probably tilt towards using this list more
        (on a new thread please). If we can get you over those hurdles first then
        reconsider if the 'robustness' question still stands.

        Other info which you may not have found yet ...

        Crantastic has 5 detailed user reviews of data.table. It does state that
        v1.1 was released over 2 years ago, too, so leads me to guess you may have
        missed the link to crantastic on the data.table homepage.

        There are some oddities in the ranking formula but if you look at
        http://crantastic.org/popcon and realise that the batch near the bottom
        starting with reshape, ggplot2 and plyr should be at the top (seems like a
        bug, I'll let them know) then data.table appears to be around the 8th most
        popular CRAN package with average score 4.7/5 and 10 users, compared to
        ggplot2's 39 users.  So crantastic itself is not popular since everyone
        knows that ggplot2 has many more than 39 users, and some very popular and
        stable packages don't have any votes at all. Even so perhaps this small
        amount of data may be useful in your assessment generally.  "data.table" is
        not the easiest to google for.

        The NEWS file (link on the homepage) says that v1.2 was released in Aug
        2008, too, at the bottom, along with what changed in each release since
        then.

        Matthew

        "Tom Short" <tshort.rlists at gmail.com> wrote in message
        news:AANLkTik=0j5da9j8_zVaW4DZhygKg6oqRP1Pg+JG3TFg at mail.gmail.com...

        > On Mon, Dec 6, 2010 at 10:54 PM, mbacou <mel at mbacou.com> wrote:
        >> My question is: is data.table ready for production? Would you rely on it
        >> for
        >> sensitive publications?
        >
        > If you have tight time deadlines, you may want to go with what you
        > have experience with, especially if it involves complicated queries or
        > manipulations. If you've already tried the data.table features you'll
        > need for "production", then using data.table may help you get things
        > done faster.
        >
        > Data.table has been robust for me on 6-GB datasets on a machine with
        > 24 GB of ram. With data.table, as with most tools, user error is more
        > likely than a tool bug, so you need to test/check your data and your
        > results.
        >
        > - Tom



        _______________________________________________
        datatable-help mailing list
        datatable-help at lists.r-forge.r-project.org
        https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help




