[datatable-help] Is data.table ready for prime-time and sensitive work?

Santosh Srinivas santosh.srinivas at gmail.com
Tue Dec 7 17:47:11 CET 2010


First... thanks for data.table! Even I (with average R skills) am
able to do some cool stuff on large datasets, and FAST!

I've heard that Python's integration with R is quite decent?
Python is next on my list, but I hope my experience will be complementary.


On Tue, Dec 7, 2010 at 10:10 PM, Rob Forler <rforler at uchicago.edu> wrote:
> It's not a hard language to pick up, but definitely one of the challenges
> with replacing R with python is many of the stat and data packages that
> exist in R.
> Python has strong scientific and data packages, but not at the same level.
>
> On the other hand, Python is much stronger from a language point of view
> (good OO, etc.), and has many more operational tools.
>
> -Rob
>
> On Tue, Dec 7, 2010 at 8:30 AM, Matthew Dowle <mdowle at mdowle.plus.com>
> wrote:
>>
>> Interesting. I don't know python but it's been on the radar a few times.
>>
>> "Rob Forler" <rforler at uchicago.edu> wrote in message
>> news:AANLkTikCZL=WhE_UoWiUvoQ+TLXj8dgrJ6iG4GhdOSdv at mail.gmail.com...
>> I'm coding in Python now. The group I'm in now has a closed-source tool
>> similar to data.table, but written in Python and based on numpy.
>>
>> The API isn't as beautiful as data.table's, but it has similar functionality.
>>
>> -Rob
>>
>> On Tue, Dec 7, 2010 at 7:54 AM, Matthew Dowle <mdowle at mdowle.plus.com>
>> wrote:
>>>
>>> Thanks Rob. That begs an obvious question then... what are you coding in
>>> now?
>>>
>>> "Rob Forler" <rforler at uchicago.edu> wrote in message
>>> news:AANLkTikSsoy8F6aKuQYXW5GWeMLiE+Y07D_3X+K7AeX=@mail.gmail.com...
>>> I can attest that I used data.table very extensively for several months
>>> on large (financial) datasets. I was replacing a fair amount of poorly
>>> coded data.frame, SQL, plyr and apply code, and was able to match the
>>> previous numbers and do a significant amount of new analysis because of
>>> the ease of using data.tables.
>>>
>>> If I were still coding in R on a regular basis, you can guarantee I'd use
>>> data.table every day.
>>>
>>> Thanks,
>>> Rob
>>>
>>> On Tue, Dec 7, 2010 at 7:30 AM, Matthew Dowle <mdowle at mdowle.plus.com>
>>> wrote:
>>>>
>>>> Just to clarify also about the date of first release: March 2010? Any
>>>> chance, Mel, that you looked at the CRAN archive page and read off the
>>>> last row? Oldest is first, not last, on that page:
>>>>
>>>> http://cran.r-project.org/src/contrib/Archive/data.table/
>>>>
>>>> v1.0 was released in April 2006, but it was removed from CRAN, happily,
>>>> because base R quickly (within weeks) included features that removed the
>>>> need for data.table. It was re-released in Aug 2008 with new
>>>> functionality, so that's the relevant release date for your purpose.
>>>>
>>>> Feel free to post the puzzling results. You've done well to use it for 2
>>>> weeks without posting, so you can probably tilt towards using this list
>>>> more (on a new thread, please). If we can get you over those hurdles
>>>> first, then reconsider whether the 'robustness' question still stands.
>>>>
>>>> Other info which you may not have found yet...
>>>>
>>>> Crantastic has 5 detailed user reviews of data.table. It also states that
>>>> v1.1 was released over 2 years ago, which leads me to guess you may have
>>>> missed the link to crantastic on the data.table homepage.
>>>>
>>>> There are some oddities in the ranking formula, but if you look at
>>>> http://crantastic.org/popcon and realise that the batch near the bottom
>>>> starting with reshape, ggplot2 and plyr should be at the top (seems like
>>>> a bug, I'll let them know), then data.table appears to be around the 8th
>>>> most popular CRAN package, with an average score of 4.7/5 and 10 users,
>>>> compared to ggplot2's 39 users. So crantastic itself is not popular,
>>>> since everyone knows that ggplot2 has many more than 39 users, and some
>>>> very popular and stable packages don't have any votes at all. Even so,
>>>> perhaps this small amount of data may be useful in your assessment
>>>> generally. "data.table" is not the easiest term to google for.
>>>>
>>>> The NEWS file (link on the homepage) also says, at the bottom, that v1.2
>>>> was released in Aug 2008, and lists what changed in each release since
>>>> then.
>>>>
>>>> Matthew
>>>>
>>>> "Tom Short" <tshort.rlists at gmail.com> wrote in message
>>>> news:AANLkTik=0j5da9j8_zVaW4DZhygKg6oqRP1Pg+JG3TFg at mail.gmail.com...
>>>> > On Mon, Dec 6, 2010 at 10:54 PM, mbacou <mel at mbacou.com> wrote:
>>>> >> My question is: is data.table ready for production? Would you rely on
>>>> >> it for sensitive publications?
>>>> >
>>>> > If you have tight time deadlines, you may want to go with what you
>>>> > have experience with, especially if it involves complicated queries or
>>>> > manipulations. If you've already tried the data.table features you'll
>>>> > need for "production", then using data.table may help you get things
>>>> > done faster.
>>>> >
>>>> > Data.table has been robust for me on 6-GB datasets on a machine with
>>>> > 24 GB of RAM. With data.table, as with most tools, user error is more
>>>> > likely than a tool bug, so you need to test/check your data and your
>>>> > results.
>>>> >
>>>> > - Tom
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> datatable-help mailing list
>>>> datatable-help at lists.r-forge.r-project.org
>>>>
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>
>

