[datatable-help] rbind() vs. rbindlist() behavior/warning

Eduard Antonyan eduard.antonyan at gmail.com
Sat Nov 9 19:25:00 CET 2013


Re speed: last I checked, new rbindlist was about 5% slower in no-coercion
cases and was quite a bit faster in cases where there was coercion.
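
To make the two cases concrete, here is a minimal sketch (not the benchmark I ran; the table names dt_int and dt_num are made up for illustration). It assumes a data.table version where rbindlist looks at every input's column type rather than only the first one's:

```r
library(data.table)

# Two hypothetical tables whose column x has different types.
dt_int <- data.table(x = 1:3)           # integer column
dt_num <- data.table(x = c(0.5, 1.5))   # double column

# No-coercion case: every input has the same column type.
same <- rbindlist(list(dt_int, dt_int))
class(same$x)

# Coercion case: an integer column is combined with a double column,
# so the result has to be promoted to the wider type.
mixed <- rbindlist(list(dt_int, dt_num))
class(mixed$x)
```

With the look-down in place, the second result should come back as numeric instead of being forced to the first table's integer type.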

do.call(rbind, ...) is indeed much slower than rbindlist, and even if
.rbind.data.table took no time at all, it would still be much slower than
rbindlist because of all the dispatching before it gets to
.rbind.data.table. That said, I'm pretty sure the current rbind is faster
than the rbind in 1.8.10 in all cases.
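
If you want to check this on your own machine, something along these lines should do. It is a rough sketch only: the list lst and its contents are invented for illustration, and the timings will vary with data.table version and hardware, so I am not quoting any numbers.

```r
library(data.table)

# Hypothetical input: a list of many small data.tables.
lst <- lapply(1:2000, function(i) data.table(id = i, value = c(1.5, 2.5)))

# do.call() goes through rbind's dispatch before reaching data.table's method.
t_rbind <- system.time(res_rbind <- do.call(rbind, lst))

# rbindlist() takes the list directly.
t_rbindlist <- system.time(res_rbindlist <- rbindlist(lst))

# Both calls should produce the same rows; only the elapsed time differs.
same_result <- isTRUE(all.equal(as.data.frame(res_rbind),
                                as.data.frame(res_rbindlist)))
print(same_result)
print(c(do.call_rbind = t_rbind[["elapsed"]],
        rbindlist     = t_rbindlist[["elapsed"]]))
```

A single system.time() call is noisy, so repeat it a few times before drawing conclusions.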


On Sat, Nov 9, 2013 at 11:49 AM, G See <gsee000 at gmail.com> wrote:

> I really meant that I thought that do.call(rbind, list(a, b)) would be
> slower than rbindlist(list(a, b)).  e.g. when you don't know the
> length of the list of data.tables
>
> On Sat, Nov 9, 2013 at 11:44 AM, Arunkumar Srinivasan
> <aragorn168b at gmail.com> wrote:
> > I am not aware of the status now after eddi's recent edits. "rbindlist"
> > initially only checked the type of the first data.table's columns. But
> > now I guess with eddi's changes, it does look-down and decide based on
> > class hierarchy. That is, if column 1 of dt1 is integer, but of dt2 is
> > numeric, it's now "numeric", but before it was "integer". I guess this'll
> > affect the speed. I've not done any benchmarking yet. But I'm guessing
> > it'll be slower than at least the previous version.
> >
> > Eddi, any thoughts on this?
> >
> > Arun
> >
> > On Saturday, November 9, 2013 at 6:38 PM, G See wrote:
> >
> > Isn't rbindlist(myList) faster than do.call(rbind, myList)?
> >
> > Garrett
> >
> > On Sat, Nov 9, 2013 at 11:33 AM, Arunkumar Srinivasan
> > <aragorn168b at gmail.com> wrote:
> >
> > GSee, I find this a bit confusing at the moment as well - the convergence
> > of "rbind" and "rbindlist" and therefore the future of "rbindlist".
> >
> > `rbindlist` gained speed (to some extent) by assuming things like this and
> > skipping checks in the first place. So, should we include checks like this?
> > Also, if "rbind" and/or "rbindlist" are made to do the exact same thing,
> > then, what's the purpose of "rbindlist"?
> >
> > Any thoughts?
> >
> > Arun
> >
> > On Saturday, November 9, 2013 at 6:29 PM, Eduard Antonyan wrote:
> >
> > Fyi, it's not well documented, but setting use.names=FALSE in rbind would
> > replicate rbindlist behavior.
> >
> > I think it's a reasonable FR - if/when all of rbind code goes into C, it
> > would be trivial to add.
> >
> > On Nov 9, 2013 10:51 AM, "G See" <gsee000 at gmail.com> wrote:
> >
> > Hi,
> >
> > Please note the inconsistency between the behavior of rbind() and
> > rbindlist() below.
> >
> > m1 <- as.data.table(mtcars)
> > m2 <- copy(m1)
> > rbind(m1[, .SD[1], by=cyl], m2) # Gives warning and binds by name
> > rbindlist(list(m1[, .SD[1], by=cyl], m2)) # no warning, and does NOT bind by name
> >
> > What do you think about making them have the same behavior and/or
> > warning? Personally, I prefer the behavior of rbind(), and would
> > prefer to see a warning if column names are ignored like they are with
> > rbindlist().
> >
> > Thanks,
> > Garrett
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >
>

