[datatable-help] rbindlist

Eduard Antonyan eduard.antonyan at gmail.com
Tue Dec 3 18:22:28 CET 2013


I took a cursory look at your code - the new rbind does everything you want
(check the use.names and fill arguments), and you may want to take a look
at its code.
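In current data.table releases, both rbindlist() and the rbind() method for data.tables accept use.names and fill arguments. The following minimal sketch reproduces the examples from this thread with them. Defaults have changed between versions, so the arguments are spelled out explicitly, and the late-2013 development version referred to here may have behaved somewhat differently.

```r
library(data.table)

x <- data.table(a = 1, b = 2)
y <- data.table(b = 4, a = 3)

# use.names = TRUE matches columns by name instead of by position,
# so y's a = 3 ends up in column a.
by_name <- rbindlist(list(x, y), use.names = TRUE)
print(by_name)

# The rbind method for data.tables takes the same argument.
by_name2 <- rbind(x, y, use.names = TRUE)

# fill = TRUE also adds columns missing from an entry, filled with NA.
# Plain named lists such as list(c = 3) are accepted as entries too.
filled <- rbindlist(list(x, list(c = 3), data.table(d = "foo")),
                    use.names = TRUE, fill = TRUE)
print(filled)
```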


On Tue, Dec 3, 2013 at 11:05 AM, Alexandre Sieira <
alexandre.sieira at gmail.com> wrote:

> To whom it may concern, I wrote a (rather bulky) wrapper around rbindlist
> that:
>
> - checks that the classes of columns with the same name match;
> - fills in any missing columns with NAs of the appropriate type;
> - reorders columns for consistency;
> - calls rbindlist on the results of this preprocessing.
>
> The code is here: https://gist.github.com/asieira/7772953
>
> The results would be as follows:
>
> > smartrbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))
>    a b
> 1: 1 2
> 2: 3 4
>
> > smartrbindlist(list(data.table(a=1, b=2), list(c=3), data.table(d="foo")))
>     a  b  c   d
> 1:  1  2 NA  NA
> 2: NA NA  3  NA
> 3: NA NA NA foo
>
> > smartrbindlist(list(data.table(a=1L, b=2), list(a=10)))
> Error in smartrbindlist(list(data.table(a = 1L, b = 2), list(a = 10)))
>   smartrbindlist: column a has different classes in entry 2 [numeric] and its predecessors [integer]
>
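The four preprocessing steps listed in this message can be sketched as follows. This is a hypothetical re-implementation written for illustration (the name smartrbindlist_sketch is made up here); it is not the code in the linked gist. It assumes every entry is a data.table or a named list that as.data.table() accepts, and it compares the full class() of same-named columns.

```r
library(data.table)

# Hypothetical sketch of the four steps; NOT the code in the linked gist.
smartrbindlist_sketch <- function(l) {
  l <- lapply(l, as.data.table)

  # Step 1: keep a zero-length prototype of each column name, in order of
  # first appearance, and stop if a later entry uses a different class.
  protos <- list()
  for (i in seq_along(l)) {
    for (nm in names(l[[i]])) {
      col <- l[[i]][[nm]]
      if (is.null(protos[[nm]])) {
        protos[[nm]] <- col[0]
      } else if (!identical(class(col), class(protos[[nm]]))) {
        stop(sprintf("column %s has class [%s] in entry %d but [%s] earlier",
                     nm, class(col)[1], i, class(protos[[nm]])[1]))
      }
    }
  }
  all_names <- names(protos)

  # Steps 2 and 3: rebuild every entry with all columns in one fixed order;
  # indexing a zero-length prototype with NA yields NA of that same type.
  l <- lapply(l, function(dt) {
    cols <- lapply(all_names, function(nm) {
      if (nm %in% names(dt)) {
        dt[[nm]]
      } else {
        protos[[nm]][rep(NA_integer_, nrow(dt))]
      }
    })
    names(cols) <- all_names
    as.data.table(cols)
  })

  # Step 4: names and order now agree, so binding by position is safe.
  rbindlist(l)
}
```

Applied to the three calls above, this sketch should return the same two tables and stop on the third call, though its error message is worded differently. Building NA columns from a prototype (rather than, say, a bare NA) keeps the column's type, including factor levels, so rbindlist never sees a mismatch.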
> Hope this helps anyone else out there.
>
> --
> Alexandre Sieira
> CISA, CISSP, ISO 27001 Lead Auditor
>
> "The truth is rarely pure and never simple."
> Oscar Wilde, The Importance of Being Earnest, 1895, Act I
>
> On 3 December 2013 at 14:46:08, G See (gsee000 at gmail.com) wrote:
>
> I agree. Here is a related thread:
> http://thread.gmane.org/gmane.comp.lang.r.datatable/2231
>
> Garrett
>
>
> On Tue, Dec 3, 2013 at 8:26 AM, Alexandre Sieira
> <alexandre.sieira at gmail.com> wrote:
> > I have come across some behavior in rbindlist that looks unexpected to me:
> >
> >> rbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))
> >    a b
> > 1: 1 2
> > 2: 4 3
> >
> > So it appears to assume (without checking) that all objects have not only
> > the same column names but also the same column order. As a result, the
> > value assigned to column ‘a’ in the second object was used for column ‘b’
> > in the result (and vice versa).
> >
> > I know the documentation says rbindlist uses the column types from the first
> > entry of the list, but I didn’t see any mention of column order or names
> > anywhere.
> >
> > I suggest that columns be matched by name, even if they are not in the same
> > order. Perhaps a ‘use.names’ parameter could request this behavior, so as
> > to avoid breaking backward compatibility.
> >
> > Or, at the very least, I suggest the documentation of rbindlist be updated
> > to explicitly mention that columns are matched by position only, and that
> > callers need to ensure the column order of all objects matches exactly. I
> > also suggest that rbindlist issue a warning when the column names don’t
> > match.
> >
> > --
> > Alexandre Sieira
> > CISA, CISSP, ISO 27001 Lead Auditor
> >
> > "The truth is rarely pure and never simple."
> > Oscar Wilde, The Importance of Being Earnest, 1895, Act I
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>

