[datatable-help] probably undesirable function of `rbindlist`

Arunkumar Srinivasan aragorn168b at gmail.com
Mon Jul 29 16:10:56 CEST 2013


Ricardo, 

I feel the same way about "numeric" versus "integer": "numeric" should be preserved.

I don't mind if I get back a "character" or a "factor" as long as the data is right. "character" may be faster, and it lets the user decide whether to convert to "factor" or not, but I don't mind either way here.
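
For anyone who needs a workaround in the meantime, here is a minimal sketch (it is not a description of how `rbindlist` works internally). It first shows that base `c()` promotes integer to double, then converts the integer column by hand before binding. `combined`, `DT1_num` and `res` are names made up for the illustration; `DT1` and `DT2` are the tables from my original post below.

```r
library(data.table)

DT1 <- data.table(x = 1:5, y = 1:5)        # y is integer
DT2 <- data.table(x = 6:10, y = 1:5 / 10)  # y is double ("numeric")

# Base R's c() promotes to the more general type, so no fractions are lost.
combined <- c(DT1$y, DT2$y)
class(combined)

# Manual workaround: convert the integer column on a copy (so DT1 itself
# is not modified by reference), then bind.
DT1_num <- copy(DT1)[, y := as.numeric(y)]
res <- rbindlist(list(DT1_num, DT2))
sapply(res, class)
```

With both `y` columns already double, the result should not depend on which table comes first in the list.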

Arun


On Monday, July 29, 2013 at 3:29 PM, Ricardo Saporta wrote:

> << the question comes down to: is it better to retain the type of the first input or the most general input? >>
> 
> My personal preference is to use the class that preserves the most information.  Between numeric & integer, that is clearly numeric.  (Between factor and character, there is the question of losing the levels).  
> 
> I'm not sure how others feel, but I wouldn't mind seeing a change in rbindlist where 
> * For each column, all elements are coerced to the most generic class
> * An optional flag where factors will not be coerced into characters (this might end up being useless, and in the end better for the user to preserve the levels and then reapply them as needed). 
> 
> -Rick
> 
> 
> On Sun, Jul 28, 2013 at 6:16 AM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
> > Ricardo, 
> > 
> > Thanks for your reply. Yes, the question comes down to: is it better to retain the type of the first input or the most general input? Even if 1 data.table has a factor input, is it better to retain "factor" instead of "character"? If one of them has a numeric column, then is it better to retain numeric even if the first data.table has an integer column? 
> > 
> > And if the first data.table, through a division operation, yielded integers, then this will cause an issue unless one manually converts the type. data.table is consistent, all right. But maybe a "warning" or a "message" would be nice. 
> > 
> > Arun
> > 
> > 
> > On Sunday, July 28, 2013 at 5:39 AM, Ricardo Saporta wrote:
> > 
> > > Arun, 
> > > 
> > > I'm pretty sure `rbindlist` identifies column class based on the first argument.   
> > > 
> > > compare 
> > >   rbindlist(list(DT2, DT1))
> > > 
> > >   rbindlist(list(DT1, DT2))
> > > 
> > > 
> > > 
> > > I agree with you though that a more ideal behavior would be one that mimics `c( )`
> > > 
> > > 
> > > -Rick 
> > > 
> > > On Sat, Jul 27, 2013 at 3:07 PM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
> > > > Hi all, 
> > > > 
> > > > Here's a behaviour of `rbindlist` that I came across that I think is undesirable. If the columns to be bound are of type "integer" and "numeric", then the class "integer" is retained, which gives different results than intended. 
> > > > 
> > > > require(data.table)
> > > > DT1 <- data.table(x = 1:5, y = 1:5)
> > > >    x y
> > > > 1: 1 1
> > > > 2: 2 2
> > > > 3: 3 3
> > > > 4: 4 4
> > > > 5: 5 5
> > > > 
> > > > 
> > > > DT2 <- data.table(x = 6:10, y = 1:5/10)
> > > >     x   y
> > > > 1:  6 0.1
> > > > 2:  7 0.2
> > > > 3:  8 0.3
> > > > 4:  9 0.4
> > > > 5: 10 0.5
> > > > 
> > > > 
> > > > sapply(DT1, class) 
> > > >         x         y 
> > > > "integer" "integer" 
> > > > 
> > > > 
> > > > sapply(DT2, class)
> > > >         x         y 
> > > > "integer" "numeric" 
> > > > 
> > > > 
> > > > rbindlist(list(DT1, DT2)) 
> > > >      x y
> > > >  1:  1 1
> > > >  2:  2 2
> > > >  3:  3 3
> > > >  4:  4 4
> > > >  5:  5 5
> > > >  6:  6 0 <~~~~ from here, the result should be 0.1 to 0.5 for the next 5 rows of y.
> > > >  7:  7 0
> > > >  8:  8 0
> > > >  9:  9 0
> > > > 10: 10 0
> > > > 
> > > > 
> > > > Is this behaviour unintended, or do we have to take care of this manually? It seems more proper to me for this to be taken care of internally. 
> > > > 
> > > > Best,
> > > > Arun.
