[datatable-help] Is data.table ready for prime-time and sensitive work?

Rob Forler rforler at uchicago.edu
Tue Dec 7 18:07:14 CET 2010


I don't do anything that mixes R and python so I can't comment on using the
two interchangeably.

There are some APIs that I've played around with that seem okay, but they
probably buckle under a reasonable amount of pressure.

On Tue, Dec 7, 2010 at 10:47 AM, Santosh Srinivas <
santosh.srinivas at gmail.com> wrote:

> First ... thanks for data.table .... even I (with average R skills) am
> able to do some cool stuff on large datasets and FAST!
>
> I've heard that Python's integrability with R is quite decent?
> Python is next on my list but I hope my experience will be complementary.
>
>
> On Tue, Dec 7, 2010 at 10:10 PM, Rob Forler <rforler at uchicago.edu> wrote:
> > It's not a hard language to pick up, but definitely one of the challenges
> > with replacing R with python is many of the stat and data packages that
> > exist in R.
> > Python has strong scientific and data packages, but not at the same
> > level.
> >
> > On the other hand Python is much stronger from a language point of view
> > (good OO, etc.), and has many more operational tools.
> >
> > -Rob
> >
> > On Tue, Dec 7, 2010 at 8:30 AM, Matthew Dowle <mdowle at mdowle.plus.com>
> > wrote:
> >>
> >> Interesting. I don't know python but it's been on the radar a few times.
> >>
> >> "Rob Forler" <rforler at uchicago.edu> wrote in message
> >> news:AANLkTikCZL=WhE_UoWiUvoQ+TLXj8dgrJ6iG4GhdOSdv at mail.gmail.com...
> >> I'm coding in Python now. The group I'm in now has a similar tool (closed
> >> source) to data.table but in Python and is based on NumPy.
> >>
> >> The API isn't as beautiful as data.table's, but has similar functionality.
> >>
> >> -Rob
> >>
> >> On Tue, Dec 7, 2010 at 7:54 AM, Matthew Dowle <mdowle at mdowle.plus.com>
> >> wrote:
> >>>
> >>> Thanks Rob. That begs an obvious question then ... what are you coding in
> >>> now?
> >>>
> >>> "Rob Forler" <rforler at uchicago.edu> wrote in message
> >>> news:AANLkTikSsoy8F6aKuQYXW5GWeMLiE+Y07D_3X+K7AeX=@mail.gmail.com...
> >>> I can attest that I used data.table very extensively for several months
> >>> on large datasets (financial). I was replacing a fair amount of poorly coded
> >>> data.frame, SQL, plyr, apply code, and was able to match the previous
> >>> numbers and do a significant amount of new analysis because of the ease of
> >>> using data.tables.
> >>>
> >>> If I were still coding in R on a regular basis you can guarantee I'd use
> >>> data.table every day.
> >>>
> >>> Thanks,
> >>> Rob
> >>>
> >>> On Tue, Dec 7, 2010 at 7:30 AM, Matthew Dowle <mdowle at mdowle.plus.com>
> >>> wrote:
> >>>>
> >>>> Just to clarify also about the date of first release - March 2010? Any
> >>>> chance Mel you looked at the CRAN archive page and read off the last
> >>>> row?
> >>>> Oldest is first not last on that page :
> >>>>
> >>>> http://cran.r-project.org/src/contrib/Archive/data.table/
> >>>>
> >>>> v1.0 was released April 2006 but that was removed from CRAN happily because
> >>>> base quickly (within weeks) included features that removed the need for
> >>>> data.table.  It was re-released in Aug 2008 with new functionality so that's
> >>>> the relevant release date for your purpose.
> >>>>
> >>>> Feel free to post the puzzling results. You've done well to use it for 2
> >>>> weeks without posting, so you can probably tilt towards using this list more
> >>>> (on a new thread please). If we can get you over those hurdles first then
> >>>> reconsider if the 'robustness' question still stands.
> >>>>
> >>>> Other info which you may not have found yet ...
> >>>>
> >>>> Crantastic has 5 detailed user reviews of data.table. It does state that
> >>>> v1.1 was released over 2 years ago, too, so leads me to guess you may have
> >>>> missed the link to crantastic on the data.table homepage.
> >>>>
> >>>> There are some oddities in the ranking formula but if you look at
> >>>> http://crantastic.org/popcon and realise that the batch near the bottom
> >>>> starting with reshape, ggplot2 and plyr should be at the top (seems like a
> >>>> bug, I'll let them know) then data.table appears to be around the 8th most
> >>>> popular CRAN package with average score 4.7/5 and 10 users, compared to
> >>>> ggplot2's 39 users.  So crantastic itself is not popular since everyone
> >>>> knows that ggplot2 has many more than 39 users, and some very popular and
> >>>> stable packages don't have any votes at all. Even so perhaps this small
> >>>> amount of data may be useful in your assessment generally. "data.table" is
> >>>> not the easiest to google for.
> >>>>
> >>>> The NEWS file (link on the homepage) says that v1.2 was released in Aug
> >>>> 2008, too, at the bottom, along with what changed in each release since
> >>>> then.
> >>>>
> >>>> Matthew
> >>>>
> >>>> "Tom Short" <tshort.rlists at gmail.com> wrote in message
> >>>> news:AANLkTik=0j5da9j8_zVaW4DZhygKg6oqRP1Pg+JG3TFg at mail.gmail.com...
> >>>> > On Mon, Dec 6, 2010 at 10:54 PM, mbacou <mel at mbacou.com> wrote:
> >>>> >> My question is: is data.table ready for production? Would you rely on it
> >>>> >> for sensitive publications?
> >>>> >
> >>>> > If you have tight time deadlines, you may want to go with what you
> >>>> > have experience with, especially if it involves complicated queries or
> >>>> > manipulations. If you've already tried the data.table features you'll
> >>>> > need for "production", then using data.table may help you get things
> >>>> > done faster.
> >>>> >
> >>>> > Data.table has been robust for me on 6-GB datasets on a machine with
> >>>> > 24 GB of RAM. With data.table, as with most tools, user error is more
> >>>> > likely than a tool bug, so you need to test/check your data and your
> >>>> > results.
> >>>> >
> >>>> > - Tom
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> datatable-help mailing list
> >>>> datatable-help at lists.r-forge.r-project.org
> >>>>
> >>>>
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help