[datatable-help] Unexpected behavior in setnames()

Steve Lianoglou lianoglou.steve at gene.com
Thu Nov 7 00:01:05 CET 2013


On Wed, Nov 6, 2013 at 2:50 PM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> Eddi,
>
> 1) We can still allow duplicate names in "fread" and during creation of
> data.table with the data.table() command.
> 2) There's really no loss of data as we can allow "setnames" to set
> duplicate names/unduplicate them (and they anyways have the data as they
> load that into R using fread). There's therefore no *real* loss of data.
> 3) The point is to decide upon where duplicate names are allowed and where
> it should give an error…
>
> As I said before, I think it's essential to allow duplicate names while
> loading a file (and therefore for consistency during creation of data.table
> as well). However, all grouping/aggregating/subsetting etc., where ambiguity
> can arise should end in error. At least this is my stance so far. Are we
> agreeing on this?

Add "evaluation in `j`" to the things that should throw an error, and
I'm OK with Arun's stance too, since we should stay as close to
data.frame as possible (even though I still think duplicate column
names are "wrong" in principle).

setnames needs smarter handling too, since it currently fails if the
target data.table has any duplicate names (I assume this has already
come up, but I'm only half tuned in to this discussion).

I also think the output of the aggregation example Eddi used earlier
should be changed, i.e.:

R> x <- data.table(V1=sample(letters[1:3], 10, replace=TRUE), B=rnorm(10))
R> x[, sum(B), by=V1]
   V1         V1
1:  b -0.8581098
2:  a  0.8762710
3:  c  1.3274762

Just feels wrong for the `sum`ed column to also be V1, but maybe this
is an FR for another day.
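
In the meantime, the clash can be sidestepped by naming the result in
`j` explicitly. A minimal sketch, assuming a data.table version where
`list(name = expr)` in `j` sets the output column name; the name
`total` and the seed are made up here for illustration:

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
x <- data.table(V1 = sample(letters[1:3], 10, replace = TRUE), B = rnorm(10))

# Unnamed j: the aggregated column gets an auto-generated default name
clash <- x[, sum(B), by = V1]
names(clash)

# Named j: wrap the expression in list() and name it explicitly
# ("total" is a hypothetical name chosen for this example)
named <- x[, list(total = sum(B)), by = V1]
names(named)
```

The second form leaves the grouping column as V1 and puts the sum in
its own distinctly named column, so nothing downstream has to guess
which V1 is meant.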

-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

