[datatable-help] rbindlist

Alexandre Sieira alexandre.sieira at gmail.com
Tue Dec 3 18:05:59 CET 2013


To whom it may concern: I wrote a (rather bulky) wrapper around rbindlist that:

- checks that the classes of columns with the same name match;
- fills in any missing columns with NAs of the appropriate type;
- reorders columns for consistency;
- calls rbindlist on the results of this preprocessing.

The code is here: https://gist.github.com/asieira/7772953

The results would be as follows:

> smartrbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))
   a b
1: 1 2
2: 3 4

> smartrbindlist(list(data.table(a=1, b=2), list(c=3), data.table(d="foo")))
    a  b  c   d
1:  1  2 NA  NA
2: NA NA  3  NA
3: NA NA NA foo

> smartrbindlist(list(data.table(a=1L, b=2), list(a=10)))
Error in smartrbindlist(list(data.table(a = 1L, b = 2), list(a = 10))) :
  smartrbindlist: column a has different classes in entry 2 [numeric] and its predecessors [integer]
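
For readers who only want the idea, the four steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code from the gist: the function name bind_by_name is hypothetical, it compares only the first class of each column, and it assumes every list entry can be converted with as.data.table.

```r
library(data.table)

# Hypothetical helper, NOT the gist's smartrbindlist: bind entries by column
# name, filling columns that an entry lacks with NAs of the matching type.
bind_by_name <- function(items) {
  # Work on copies so the caller's data.tables are not modified by reference.
  items <- lapply(items, function(x) copy(as.data.table(x)))

  # Step 1: remember a zero-length template of each column the first time its
  # name is seen, and stop if a later entry uses a different class.
  templates <- list()
  for (i in seq_along(items)) {
    for (nm in names(items[[i]])) {
      col <- items[[i]][[nm]]
      if (is.null(templates[[nm]])) {
        templates[[nm]] <- col[0L]
      } else if (class(templates[[nm]])[1L] != class(col)[1L]) {
        stop("column ", nm, " has different classes in entry ", i,
             " [", class(col)[1L], "] and its predecessors [",
             class(templates[[nm]])[1L], "]")
      }
    }
  }

  all_cols <- names(templates)
  filled <- lapply(items, function(dt) {
    n <- nrow(dt)
    # Step 2: indexing a zero-length vector with NA yields typed NAs.
    for (nm in setdiff(all_cols, names(dt))) {
      set(dt, j = nm, value = templates[[nm]][rep(NA_integer_, n)])
    }
    # Step 3: put the columns in one consistent order.
    setcolorder(dt, all_cols)
    dt
  })

  # Step 4: positions now agree with names, so rbindlist is safe to call.
  rbindlist(filled)
}
```

Calling bind_by_name(list(data.table(a=1, b=2), data.table(b=4, a=3))) should then pair the values by name rather than by position.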

Hope this helps anyone else out there.

-- 
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor

"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I

On 3 December 2013 at 14:46:08, G See (gsee000 at gmail.com) wrote:

I agree. Here is a related thread:  
http://thread.gmane.org/gmane.comp.lang.r.datatable/2231  

Garrett  


On Tue, Dec 3, 2013 at 8:26 AM, Alexandre Sieira  
<alexandre.sieira at gmail.com> wrote:  
> I have come across some behavior in rbindlist that looks unexpected to me:
>  
>> rbindlist(list(data.table(a=1, b=2), data.table(b=4, a=3)))  
> a b  
> 1: 1 2  
> 2: 4 3  
>  
> So it appears to assume (without checking) that all objects have not only  
> the same column names but also the same column order. So a value assigned  
> to column ‘a’ in the second object was used for column ‘b’ in the end result  
> (and vice-versa).  
>  
> I know the documentation says rbindlist uses the column types from the first  
> entry of the list, but I didn’t see any mention of column order or names
> anywhere.  
>  
> I suggest that column names are matched, even if they are not in the same  
> order. Perhaps a ‘use.names’ parameter could be used to ask for this  
> behavior to avoid breaking backwards compatibility.  
>  
> Or, at the very least, I suggest the documentation of rbindlist be updated to
> explicitly mention that the columns will be considered by position only, and
> that callers need to ensure the column orders of all objects match exactly.
> I also suggest that rbindlist issue a warning when the column names don’t match.
>  
> --  
> Alexandre Sieira  
> CISA, CISSP, ISO 27001 Lead Auditor  
>  
> "The truth is rarely pure and never simple."  
> Oscar Wilde, The Importance of Being Earnest, 1895, Act I  