[datatable-help] probably undesirable function of `rbindlist`

Ricardo Saporta saporta at scarletmail.rutgers.edu
Mon Jul 29 15:29:32 CEST 2013


<< the question comes down to: is it better to retain the type of the first
input or the most general input? >>

My personal preference is to use the class that preserves the most
information.  Between numeric & integer, that is clearly numeric.
 (Between factor and character, there is the question of losing the
levels).

I'm not sure how others feel, but I wouldn't mind seeing a change in
rbindlist where
* For each column, all elements are coerced to the most generic class
* An optional flag where factors will not be coerced into characters (this
might end up being useless, and in the end better for the user to preserve
the levels and then reapply them as needed).
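
To make the first bullet concrete, here is a rough sketch in base R only.
It is not how rbindlist is implemented; `combine_general` is a made-up
helper name, and it simply leans on the type promotion that `c()` already
does. Factors are converted to character first, which is an assumption on
my part about what "most generic" should mean for them.

```r
# Hypothetical helper (not part of data.table): combine the pieces of one
# column and let c() promote them to the most general type.
combine_general <- function(...) {
  pieces <- list(...)
  # Assumption: factors become character, so their labels (not their
  # integer codes) survive. The levels attribute itself is lost.
  pieces <- lapply(pieces, function(v) if (is.factor(v)) as.character(v) else v)
  do.call(c, pieces)
}

y1 <- 1:5        # integer, like DT1$y
y2 <- 1:5 / 10   # numeric, like DT2$y

y <- combine_general(y1, y2)
class(y)         # "numeric": the fractional values are kept

f <- combine_general(factor(c("a", "b")), "c")
class(f)         # "character"
```

With this rule the order of the inputs no longer decides the result type,
at the cost of dropping factor levels unless they are reapplied afterwards.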

-Rick


On Sun, Jul 28, 2013 at 6:16 AM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:

>  Ricardo,
>
> Thanks for your reply. Yes, the question comes down to: is it better to
> retain the type of the first input or the most general input? Even if 1
> data.table has a factor input, is it better to retain "factor" instead of
> "character"? If one of them has a numeric column, then is it better to
> retain numeric even if the first data.table has integer column?
>
> And if the first data.table happened to yield integers from a division
> operation, then this will cause an issue unless one converts the type
> manually. data.table is consistent, all right. But maybe a "warning" or a
> "message" would be nice.
>
> Arun
>
> On Sunday, July 28, 2013 at 5:39 AM, Ricardo Saporta wrote:
>
> Arun,
>
> I'm pretty sure `rbindlist` identifies column class based on the first
> argument.
>
> compare
>   rbindlist(list(DT2, DT1))
>   rbindlist(list(DT1, DT2))
>
>
> I agree with you though that a more ideal behavior would be one that
> mimics `c( )`
>
>
> -Rick
>
>
> On Sat, Jul 27, 2013 at 3:07 PM, Arunkumar Srinivasan <
> aragorn168b at gmail.com> wrote:
>
>  Hi all,
>
> Here's a behaviour of `rbindlist` that I came across that I think is
> undesirable. If the columns to be bound are of type "integer" and
> "numeric", then the class "integer" is retained, so the numeric values are
> truncated and the result is not what was intended.
>
> require(data.table)
> DT1 <- data.table(x = 1:5, y = 1:5)
>    x y
> 1: 1 1
> 2: 2 2
> 3: 3 3
> 4: 4 4
> 5: 5 5
>
> DT2 <- data.table(x = 6:10, y = 1:5/10)
>     x   y
> 1:  6 0.1
> 2:  7 0.2
> 3:  8 0.3
> 4:  9 0.4
> 5: 10 0.5
>
> sapply(DT1, class)
>         x         y
> "integer" "integer"
>
> sapply(DT2, class)
>         x         y
> "integer" "numeric"
>
> rbindlist(list(DT1, DT2))
>      x y
>  1:  1 1
>  2:  2 2
>  3:  3 3
>  4:  4 4
>  5:  5 5
>  6:  6 0 <~~~~ from here on, y should be 0.1 to 0.5 for the next 5 rows
>  7:  7 0
>  8:  8 0
>  9:  9 0
> 10: 10 0
>
> Is this behaviour unintended, or do we have to take care of this manually?
> It seems more proper to me for this to be handled internally.
>
> Best,
> Arun.