[datatable-help] probably undesirable function of `rbindlist`

Arunkumar Srinivasan aragorn168b at gmail.com
Sun Jul 28 12:16:09 CEST 2013


Ricardo, 

Thanks for your reply. Yes, the question comes down to this: is it better to retain the type of the first input, or the most general type among the inputs? If only one of the data.tables has a factor column, is it better to retain "factor" instead of "character"? And if one of them has a numeric column, is it better to retain "numeric" even when the first data.table has an integer column?
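
To make the two options concrete, here is a minimal base-R sketch. It does not call data.table at all; `y1` and `y2` are just stand-in names for the `y` columns of DT1 and DT2 from my original post below, and the `as.integer()` line only illustrates what forcing the first input's type does to the values, not how `rbindlist` is implemented.

```r
# Stand-ins for the two y columns (hypothetical names, base R only).
y1 <- 1:5        # integer, like DT1$y
y2 <- 1:5 / 10   # numeric (double), like DT2$y

# Option 1: keep the most general type, as c() does.
combined <- c(y1, y2)
class(combined)      # "numeric"
combined[6:10]       # 0.1 0.2 0.3 0.4 0.5

# Option 2: force the type of the first input (integer).
# Converting the doubles to integer drops the fractional part,
# which is consistent with the zeros in the rbindlist output.
forced <- c(y1, as.integer(y2))
class(forced)        # "integer"
forced[6:10]         # 0 0 0 0 0
```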

And if a division operation in the first data.table happened to yield integers, this will cause an issue unless one converts the column type manually. data.table is consistent, all right, but a warning or a message would be nice.
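
For reference, the manual conversion I have in mind looks roughly like the sketch below. It assumes that promoting the integer column with `as.numeric()` before binding is acceptable for the data at hand; `res` is just an illustrative name.

```r
library(data.table)

DT1 <- data.table(x = 1:5, y = 1:5)
DT2 <- data.table(x = 6:10, y = 1:5 / 10)

# Promote y in the first table by hand so that both inputs
# already agree on the column type before they are bound.
DT1[, y := as.numeric(y)]

res <- rbindlist(list(DT1, DT2))
class(res$y)   # "numeric"
res$y[6:10]    # the fractional values from DT2 are kept
```

This only sidesteps the problem for columns one already knows about, which is why a warning or message would still help.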

Arun


On Sunday, July 28, 2013 at 5:39 AM, Ricardo Saporta wrote:

> Arun, 
> 
> I'm pretty sure `rbindlist` determines each column's class from the first data.table in the list.
> 
> Compare
>   rbindlist(list(DT2, DT1))
> with
>   rbindlist(list(DT1, DT2))
> 
> 
> 
> I agree with you, though, that a better behavior would be one that mimics `c()`.
> 
> 
> -Rick 
> 
> On Sat, Jul 27, 2013 at 3:07 PM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
> > Hi all, 
> > 
> > Here's a behaviour of `rbindlist` that I came across and think is undesirable. If the columns to be bound are of type "integer" and "numeric", in that order, the class "integer" is retained, which gives a different result than intended.
> > 
> > require(data.table)
> > DT1 <- data.table(x = 1:5, y = 1:5)
> >    x y
> > 1: 1 1
> > 2: 2 2
> > 3: 3 3
> > 4: 4 4
> > 5: 5 5
> > 
> > 
> > DT2 <- data.table(x = 6:10, y = 1:5/10)
> >     x   y
> > 1:  6 0.1
> > 2:  7 0.2
> > 3:  8 0.3
> > 4:  9 0.4
> > 5: 10 0.5
> > 
> > 
> > sapply(DT1, class) 
> >         x         y 
> > "integer" "integer" 
> > 
> > 
> > sapply(DT2, class)
> >         x         y 
> > "integer" "numeric" 
> > 
> > 
> > rbindlist(list(DT1, DT2)) 
> >      x y
> >  1:  1 1
> >  2:  2 2
> >  3:  3 3
> >  4:  4 4
> >  5:  5 5
> >  6:  6 0 <~~~~ from here, y should be 0.1 to 0.5 for the next 5 rows.
> >  7:  7 0
> >  8:  8 0
> >  9:  9 0
> > 10: 10 0
> > 
> > 
> > Is this behaviour unexpected, or do we have to take care of this manually? To me it seems more proper for it to be handled internally.
> > 
> > Best,
> > Arun.
