[datatable-help] rbindlist
Alexandre Sieira
alexandre.sieira at gmail.com
Tue Dec 3 18:05:59 CET 2013
To whom it may concern, I wrote a (rather bulky) wrapper around rbindlist that:
- checks that the classes of columns with the same name match;
- fills in any missing columns with NAs of the appropriate type;
- reorders columns for consistency;
- calls rbindlist on the results of this preprocessing.
The code is here: https://gist.github.com/asieira/7772953
The results would be as follows:
> smartrbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))
   a b
1: 1 2
2: 3 4
> smartrbindlist(list(data.table(a=1, b=2), list(c=3), data.table(d="foo")))
    a  b  c   d
1:  1  2 NA  NA
2: NA NA  3  NA
3: NA NA NA foo
> smartrbindlist(list(data.table(a=1L, b=2), list(a=10)))
Error in smartrbindlist(list(data.table(a = 1L, b = 2), list(a = 10))) :
smartrbindlist: column a has different classes in entry 2 [numeric] and its predecessors [integer]
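For readers who do not want to open the gist, the four steps listed above can be sketched roughly as follows. This is only a minimal illustration and not the code from the gist: the name bind_by_name is hypothetical, and it assumes every list entry can be converted with as.data.table and that columns are plain vectors.

```r
library(data.table)

# Hypothetical helper for illustration only; NOT the smartrbindlist code
# from the gist.
bind_by_name <- function(entries) {
  tables <- lapply(entries, as.data.table)

  # Step 1: keep a zero-length prototype of each column the first time its
  # name appears, and stop if a later entry uses a different class.
  protos <- list()
  for (i in seq_along(tables)) {
    for (col in names(tables[[i]])) {
      x <- tables[[i]][[col]]
      if (is.null(protos[[col]])) {
        protos[[col]] <- x[0L]
      } else if (!identical(class(x), class(protos[[col]]))) {
        stop("column ", col, " has different classes in entry ", i,
             " [", paste(class(x), collapse = "/"),
             "] and its predecessors [",
             paste(class(protos[[col]]), collapse = "/"), "]")
      }
    }
  }
  all_cols <- names(protos)

  tables <- lapply(tables, function(dt) {
    dt <- copy(dt)  # avoid modifying the caller's tables by reference
    # Step 2: add each missing column as NAs of the prototype's type
    # (indexing a vector with NA_integer_ yields NA of that vector's type).
    for (col in setdiff(all_cols, names(dt))) {
      set(dt, j = col, value = protos[[col]][rep(NA_integer_, nrow(dt))])
    }
    # Step 3: put the columns in one consistent order.
    setcolorder(dt, all_cols)
    dt
  })

  # Step 4: every table now has the same names in the same order, so
  # binding by position is safe.
  rbindlist(tables)
}
```

Called on the three example inputs above, this sketch should bind the first two by column name and stop with an error on the third, where column a is integer in one entry and numeric in the next.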
Hope this helps anyone else out there.
--
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor
"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I
On December 3, 2013 at 14:46:08, G See (gsee000 at gmail.com) wrote:
I agree. Here is a related thread:
http://thread.gmane.org/gmane.comp.lang.r.datatable/2231
Garrett
On Tue, Dec 3, 2013 at 8:26 AM, Alexandre Sieira
<alexandre.sieira at gmail.com> wrote:
> I have come across some behavior in rbindlist that looks unexpected to me:
>
>> rbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))
>    a b
> 1: 1 2
> 2: 4 3
>
> So it appears to assume (without checking) that all objects have not only
> the same column names but also the same column order. So a value assigned
> to column ‘a’ in the second object was used for column ‘b’ in the end result
> (and vice-versa).
>
> I know the documentation says rbindlist uses the column types from the first
> entry of the list, but I didn’t see any mention of column order or names
> anywhere.
>
> I suggest that column names are matched, even if they are not in the same
> order. Perhaps a ‘use.names’ parameter could be used to ask for this
> behavior to avoid breaking backwards compatibility.
>
> Or, at the very least, I suggest the documentation of rbindlist be updated to
> explicitly mention that the columns will be considered by position only, and
> that callers need to ensure the column orders of all objects match exactly.
> I also suggest that rbindlist issue a warning when the column names don’t
> match.
>
> --
> Alexandre Sieira
> CISA, CISSP, ISO 27001 Lead Auditor
>
> "The truth is rarely pure and never simple."
> Oscar Wilde, The Importance of Being Earnest, 1895, Act I
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help