[datatable-help] Unexpected behavior in setnames()

Fri Nov 8 15:03:12 CET 2013

Hi Simon,

On Fri, Nov 8, 2013 at 5:30 AM, Simon O'Hanlon
<simon.ohanlon at imperial.ac.uk> wrote:

> I am not particularly opposed, or otherwise, to duplicate column names,
> although I do see the issues they create.
>
> I think that whatever you, as custodians of data.table decide with respect
> to column names, the behaviour of numeric indices to indicate columns
> included in .SD needs to be fixed when duplicate column names are present.
> As a user I'd expect the following to return two columns with the values 2
> and 6 respectively:
>
> Example:
>
> dt <- data.table( 1,2,3,4 )
> setnames(dt , rep( c("a", "b") , 2 ) )
>    a b a b
> 1: 1 2 3 4
>
> dt[ , lapply( .SD ,function(x) x*2 ) , .SDcols = c(1,3) ]
>    a a
> 1: 2 2
>
> I hope that contributes in some small way to your decision making process.
> This is lifted from a question I asked on Stack Overflow here:
>
> http://stackoverflow.com/questions/19811644/can-data-table-handle-identical-column-names-when-using-sdcols

I agree -- when using numeric column indices in .SDcols, this is clearly
wrong, and I would expect an answer of 2 and 6.
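
For what it's worth, here is a base R sketch of one way a result like
the one above could arise. It assumes (and this is only my guess, not
something checked against the data.table source) that the numeric
indices get translated to column names and are then matched back
against the names. All variable names below are made up for
illustration:

```r
# Hypothetical stand-ins for the one-row table in the quoted example.
col_names  <- c("a", "b", "a", "b")
col_values <- list(1, 2, 3, 4)
wanted     <- c(1, 3)   # the numeric indices passed to .SDcols

# Purely positional selection: picks columns 1 and 3.
by_position <- sapply(col_values[wanted], function(x) x * 2)

# Assumed round trip: index -> name -> first matching position.
# match() returns only the FIRST match, so both "a"s resolve to column 1.
round_trip <- match(col_names[wanted], col_names)
by_name    <- sapply(col_values[round_trip], function(x) x * 2)

print(by_position)  # 2 6
print(round_trip)   # 1 1
print(by_name)      # 2 2
```

If something like this is what happens, keeping the selection
positional all the way through would give the 2 and 6 we both expect.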

I'm curious, however, what you would expect when you pass the names of
the columns to .SDcols instead.

If you ask .SDcols="a" would you expect the first "a" column to be
used, or all of them? To use all of them, would you expect to use
.SDcols=c('a', 'a')?
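
To make the options concrete, here is a small base R sketch of the two
lookup rules I have in mind. It only illustrates the candidate
semantics on a plain character vector and says nothing about what
data.table currently does; the variable names are made up:

```r
col_names <- c("a", "b", "a", "b")

# Option 1: "a" means the first column named "a".
first_match <- match("a", col_names)

# Option 2: "a" means every column named "a".
all_matches <- which(col_names %in% "a")

# Under first-match lookup, repeating the name just selects the
# first "a" column twice rather than reaching the second one.
repeated <- match(c("a", "a"), col_names)

print(first_match)  # 1
print(all_matches)  # 1 3
print(repeated)     # 1 1
```

So with a first-match rule, c('a', 'a') would not get you both "a"
columns; it would need some extra convention on top.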

-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech