[datatable-help] FR #5249 - rbindlist gains use.names and fill arguments

Arunkumar Srinivasan aragorn168b at gmail.com
Tue May 20 22:07:00 CEST 2014


Hi Gabor,

Thanks for the quick response. Just to be clear, you don’t have to set use.names=TRUE when fill=TRUE. If you set only fill=TRUE while use.names is FALSE, rbindlist automatically sets use.names to TRUE and issues a message/warning, which you can safely ignore. Do you still find this ugly? You’ll get the warning whenever you call rbindlist with just fill=TRUE, because use.names=FALSE by default.
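
A minimal sketch of the call described above, using made-up tables (dt1, dt2 and their columns are illustrative only). Whether a message or warning is printed depends on the data.table version installed; the bound result should be the same either way:

```r
library(data.table)

# Two illustrative tables that share only column "b"
dt1 <- data.table(a = 1:2, b = c("x", "y"))
dt2 <- data.table(b = "z", c = 3.5)

# Only fill = TRUE is supplied; columns are matched by name and
# columns missing from an item are filled with NA
res <- rbindlist(list(dt1, dt2), fill = TRUE)
print(res)
```

The result has the union of the columns (a, b, c) and three rows, with NA wherever an input table lacked a column.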


Arun

From: Gabor Grothendieck ggrothendieck at gmail.com
Reply: Gabor Grothendieck ggrothendieck at gmail.com
Date: May 20, 2014 at 10:04:21 PM
To: Arunkumar Srinivasan aragorn168b at gmail.com
Cc: datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject:  Re: [datatable-help] FR #5249 - rbindlist gains use.names and fill arguments  

The requirement to set use.names to TRUE if fill is TRUE seems ugly.  
I suggest that fill be the default for use.names.  
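
To make the suggestion concrete, here is a rough sketch of a signature in which use.names defaults to the value of fill. The wrapper name rbindlist_fill is hypothetical and is not part of data.table; it only illustrates the proposed default:

```r
library(data.table)

# Hypothetical wrapper (not a data.table function): use.names defaults to
# whatever fill is, so fill = TRUE implies binding by name
rbindlist_fill <- function(l, fill = FALSE, use.names = fill) {
  rbindlist(l, use.names = use.names, fill = fill)
}

# Same column names, different order
l <- list(data.table(a = 1L, b = 2L), data.table(b = 3L, a = 4L))

by_pos  <- rbindlist_fill(l)               # use.names = FALSE: bind by position
by_name <- rbindlist_fill(l, fill = TRUE)  # use.names = TRUE: bind by name
```

With this default, a caller who asks for fill never has to think about use.names, while the positional default is unchanged.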

On Tue, May 20, 2014 at 3:27 PM, Arunkumar Srinivasan  
<aragorn168b at gmail.com> wrote:  
> Hello everyone,  
>  
> With the latest commit #1266, the extra functionality offered via rbind
> (use.names and fill) is now also available to rbindlist. In addition, the
> implementation has been moved entirely to C, and is therefore tremendously fast,
> especially for cases where one has to bind with use.names=TRUE and/or
> fill=TRUE. I’ll try to put out a benchmark comparing speed differences
> with the older implementation ASAP.
>  
> Note that this change comes at a very low cost to the default speed of
> rbindlist (with use.names=FALSE and fill=FALSE). As an example, binding
> 10,000 data.tables with 20 columns each took 0.107 seconds with the new
> version, whereas the older version took 0.095 seconds.
>  
> In addition, the documentation for ?rbindlist has also been improved (#5158
> from Alexander). Here’s the change log from NEWS:
>  
> o 'rbindlist' gains 'use.names' and 'fill' arguments and is now
>   implemented entirely in C. Closes #5249
>     -> use.names by default is FALSE for backwards compatibility
>        (doesn't bind by names by default)
>     -> rbind(...) now just calls rbindlist() internally, except that
>        'use.names' is TRUE by default, for compatibility with base
>        (and backwards compatibility).
>     -> fill by default is FALSE. If fill is TRUE, use.names has to be
>        TRUE.
>     -> At least one item of the input list has to have non-null column
>        names.
>     -> Duplicate columns are bound in the order of occurrence, like
>        base.
>     -> Attributes that might exist in individual items would be lost in
>        the bound result.
>     -> Columns are coerced to the highest SEXPTYPE, if they are
>        different, if/when possible.
>     -> And incredibly fast ;).
>     -> Documentation updated in much detail. Closes DR #5158.
>   Eddi's (excellent) work on finding factor levels, type coercion of
>   columns etc. are all retained.
>  
> Please try it and write back if things aren’t working as they were before. The
> tests that had to be fixed cover extremely rare cases, so I suspect there will
> be minimal issues, if any, in this version. However, I do find that the changes
> here bring consistency to the function.
>  
> One (very rare) feature that is not available due to this implementation is  
> the ability to recycle.  
>  
> dt1 <- data.table(x=1:3, y=4:6, z=list(1:2, 1:3, 1:4))  
> lst1 <- list(x=4, y=5, z=as.list(1:3))  
>  
> rbind(dt1, lst1)  
> #    x y       z
> # 1: 1 4     1,2
> # 2: 2 5   1,2,3
> # 3: 3 6 1,2,3,4
> # 4: 4 5       1
> # 5: 4 5       2
> # 6: 4 5       3
>  
> The 4,5 are recycled very nicely here. This is not possible at the moment.
> The earlier rbind implementation got recycling by converting each item with
> as.data.table, but that takes a copy (very inefficient on huge /
> many tables). I’d love to add this feature in C as well, as it would help
> incredibly for use within [.data.table (now that we can fill columns and
> bind by names faster). Will add a FR.
>  
> In summary, I think there should be minimal issues, if any, and it should be
> much faster (for rbind cases). Please write back with what you think if you
> happen to try it out.
>  
>  
>  
> Arun  
>  
>  
> _______________________________________________  
> datatable-help mailing list  
> datatable-help at lists.r-forge.r-project.org  
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help  
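
For readers who want to try two of the NEWS items quoted above (type coercion and rbind() binding by name by default), here is a small sketch. The table and column names are made up for illustration:

```r
library(data.table)

# Column v is integer in one table and double in the other
ints <- data.table(id = 1:2, v = 1:2)
dbls <- data.table(id = 3L, v = 2.5)

# The bound column is coerced to the higher type (double here)
coerced <- rbindlist(list(ints, dbls))
print(class(coerced$v))

# rbind() on data.tables matches columns by name by default,
# so the differing column order does not matter
ab <- data.table(a = 1L, b = 2L)
ba <- data.table(b = 3L, a = 4L)
named <- rbind(ab, ba)
print(named)
```

In the second result, column a holds 1 and 4 and column b holds 2 and 3, because the values are matched by column name rather than by position.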



--  
Statistics & Software Consulting  
GKX Group, GKX Associates Inc.  
tel: 1-877-GKX-GROUP  
email: ggrothendieck at gmail.com  