[datatable-help] Response to dplyr baseball vignette benchmarks

Chris Neff caneff at gmail.com
Wed Jan 22 21:17:08 CET 2014


When you do move to larger data sets, where it will matter, I think
highlighting the in-place vs. copying differences more strongly will be key.
Yes, standard benchmarking should compare things as closely as possible, but
I think mimicking dplyr with copying sells data.table a bit short.  You show
this a bit in the mutate example, but even in the arrange example the copy is
slowing things down.  The data is so small that it doesn't make much
difference in this case, but with 10m rows the copying becomes a large,
noticeable difference between data.table's by-reference functions and
standard data.frame methods, e.g. setnames vs names<-.
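
To make the distinction concrete, here is a minimal sketch of the two styles.
The toy table and its column names are made up for illustration and are not
taken from the benchmark; it only uses setnames(), :=, copy() and address()
from data.table.

```r
library(data.table)

# Hypothetical toy table; the column names are invented for this example.
DT <- data.table(id = 1:5, hits = c(10L, 20L, 30L, 40L, 50L))
addr_before <- address(DT)

# In-place style: setnames() and := change DT by reference,
# so the table itself is not duplicated.
setnames(DT, "hits", "h")
DT[, h2 := h * 2L]
same_object <- identical(addr_before, address(DT))

# Copying style: copy() duplicates the whole table first, then the
# data.frame-style replacement function names<- is applied to the copy.
DT2 <- copy(DT)
names(DT2)[names(DT2) == "h"] <- "hits"

print(same_object)   # whether DT is still the same object after the in-place edits
print(names(DT))     # DT keeps the in-place names
print(names(DT2))    # only the copy was renamed back
```

On a table this small the copy costs nothing measurable; the point is only
that the second style duplicates the whole table, which is what starts to
show up at 10m rows.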




On Wed, Jan 22, 2014 at 3:09 PM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:

> Chris,
>
> Thanks. Yes that's the plan (the last line in the link). Once the next
> version of data.table is out on CRAN, the benchmarks should come out.
>
> Arun
> ------------------------------
> From: Chris Neff <caneff at gmail.com>
> Reply: Chris Neff caneff at gmail.com
> Date: January 22, 2014 at 9:07:34 PM
> To: Arunkumar Srinivasan aragorn168b at gmail.com
> Subject:  Re: [datatable-help] Response to dplyr baseball vignette
> benchmarks
>
> Thank you for responding to this so fast to get out ahead of the
> misleading aspects.
>
> As another comparison, it would definitely be constructive to also use a
> data set that is larger than 10 MB.  Something in the 1m+ row range perhaps.
>
>
> On Wed, Jan 22, 2014 at 2:54 PM, Arunkumar Srinivasan <
> aragorn168b at gmail.com> wrote:
>
>>  Hello,
>>
>>  Matthew and I have redone the benchmarks and posted a response to
>>  dplyr's baseball vignette benchmark here:
>> http://arunsrinivasan.github.io/dplyr_benchmark/
>>
>> Have a look and let us know what you think!
>>
>>  Arun
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>
>