[datatable-help] Wonder whether there is an easier way to change part of data.table values

Branson Owen branson.owen at gmail.com
Tue Aug 3 15:18:29 CEST 2010


Wow, thanks a lot Tom!

2010/8/2 Short, Tom <TShort at epri.com>:
> I've just checked in versions of [<-.data.table and $<-.data.table that
> check for the columns adjusted and reset the key if appropriate. This
> brings up some incompatibilities:
> (*) KEYS -- Before, you could do:
> dt$key_column = anything
> And it wouldn't change the status of the key. Now, the key will be nullified.

I think that is the expected behavior. A key column is not supposed to
be changed while it is still the key.
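
To make the point concrete, here is a minimal sketch of that behavior. It
assumes a current data.table release (the development version discussed
in this thread may differ), and the table and column names are made up
for illustration:

```r
library(data.table)

# Hypothetical toy table keyed on "id" (names are illustrative only).
dt <- data.table(id = c("a", "b", "c"), somecol = c(1, 2, 3), key = "id")
key_before <- key(dt)        # the table is keyed on "id" here

# Overwrite the key column: the stored sort order can no longer be trusted.
dt$id <- c("c", "b", "a")
key_after <- key(dt)         # the key has been dropped

# Restore the key explicitly; setkey() re-sorts the rows by "id".
setkey(dt, id)
key_restored <- key(dt)
```

Re-setting the key this way costs a sort, which is the price of keeping
the key honest after the column has changed.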

> (*) ASSIGNMENT DIFFERENCE
> Before: dt["a"] <- "b" meant change column a.
> Now:    dt[,"a"] <- "b"
> Now, you can do
> dt[J("a"), "somecol"] <- 33
> which assigns 33 to the column "somecol" in the rows where the key is equal to "a".

This is cool. dt[,"a"] is clearer than dt["a"] to me.
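
For the archive, a small sketch of the two forms as I understand them. It
assumes a current data.table release and a made-up table, so treat it as
an illustration rather than a description of Tom's check-in:

```r
library(data.table)

# Hypothetical toy table keyed on "id" (names are illustrative only).
dt <- data.table(id = c("a", "b", "c"), somecol = c(1, 2, 3), key = "id")

# Column assignment: empty i, column name in j, so every row is changed.
dt[, "somecol"] <- 0

# Keyed row assignment: J("a") selects the rows whose key equals "a",
# and only those rows of "somecol" are changed.
dt[J("a"), "somecol"] <- 33
```

Neither assignment touches the key column, so nothing here should require
re-sorting the table.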

> (*) QUESTIONS
> - Do we need a "keep.key" argument for cases where we don't want the key
> nullified (because the user knows the order is unaffected)? This isn't
> really possible for dt$a[1:4] <- something.

I think advanced users would prefer to have this option if it's not too
hard to provide. So far we trust the speed of re-setting the key, but
once the data size reaches a certain amount, this might become an issue.

> - Is it a good idea to use data.table-style indexing for the i part of
> [<-.data.table? I was skeptical when Branson first asked (preferring
> data.frame compatibility), but it makes more sense now that I think
> about it.

I think that would be a very sweet feature, and it would make data.table
a more complete replacement for data.frame. I think all data.table users
feel that they can completely replace data.frame with data.table, but
this is a case where we currently have to fall back to data.frame.

> Finally, we should put a warning in the documentation somewhere that
> functions that re-arrange or assign values to a data frame may "corrupt"
> the key of a data.table. Data.table-aware functions should account for
> this, but other functions may not.

A short list of the cases (syntaxes) in which the key will be
automatically removed would be great!

