[datatable-help] new key argument to [.data.table in 1.8.11
Eduard Antonyan
eduard.antonyan at gmail.com
Sun Sep 29 15:47:36 CEST 2013
There wasn't a 'key' argument before, and yes, it will change the key
regardless of whether you're merging or not. Initially I added it just for
merges, but then realized that there is no conceptual reason to
restrict it to merges.
FYI, the reason you probably thought there was a 'key' argument before is
that R allows partial matching of argument names, so you were actually
using 'keyby' (which has not changed).
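
To illustrate the partial matching mentioned above, here is a minimal base-R sketch. The function `subset_demo` and its body are hypothetical stand-ins, not the real `[.data.table`; only the formal names `by` and `keyby` are borrowed from the discussion.

```r
# Hypothetical function with a `keyby` formal but no `key` formal.
subset_demo <- function(x, i, j, by, keyby) {
  if (missing(keyby)) "keyby not supplied" else paste("keyby =", keyby)
}

# `key` is a unique prefix of `keyby`, so R matches it to that formal.
res_partial <- subset_demo(1, key = "B")
res_full    <- subset_demo(1, keyby = "B")

identical(res_partial, res_full)
```

Partial matching only applies when the abbreviated name is a prefix of exactly one formal, which is why `key=` silently resolved to `keyby` before a real `key` argument existed.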
You raise a good point that I hadn't thought of: copying can be faster
than sorting. I will check when that's true. The copy version is easy to
implement; I implemented it without the copy because I assumed that was the
faster option, but if it's not, then I might as well copy and do this for
merges only.
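
One rough way to compare the two costs is sketched below, assuming the data.table package is installed. The table size, column names, and choice of key columns are made up for illustration, and timings will vary with the data and the key.

```r
library(data.table)

set.seed(1)
n <- 1e5  # illustrative size only
x <- data.table(A = sample(n), B = sample(n), v = rnorm(n))
setkey(x, A)

# Option 1: copy x and key the copy by B; x keeps its key on A
# (one copy, one key setting).
t_copy <- system.time({
  tmp <- copy(x)
  setkey(tmp, B)
})

# Option 2: re-key x itself by B, then restore the key on A afterward
# (zero copies, two key settings).
t_rekey <- system.time({
  setkey(x, B)
  setkey(x, A)
})

t_copy
t_rekey
```

Which option wins likely depends on how expensive the sort is relative to the copy, for example when the key spans many columns.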
On Sep 28, 2013 11:50 PM, "Frank Erickson" <FErickson at psu.edu> wrote:
> Hi,
>
> I'm just continuing a discussion with @eddi that would not fit in an SO
> comment. If you want to catch up, the references are...
>
> http://r-forge.r-project.org/tracker/index.php?func=detail&aid=4675&group_id=240&atid=978
> http://stackoverflow.com/a/19074195/1191259
> The SO question (scroll up on the second link) was whether there was a way
> to use a "temporary" key for X in an X[Y] join.
>
> @eddi:
>
> +1. Yeah, I like this new option and will probably use it.
>
> Will this also overwrite the key when using [.data.table without doing
> joins? That might be backward incompatible I guess, since `key` is already
> an argument to `[.data.table`. That is, will x[i,,key='B'] do anything? I
> don't think that type of command has had much use until now, and adding a j
> argument (that doesn't start with `:=`) always makes a copy (right?), so
> maybe backward compatibility would not be an issue there.
>
> Regarding whether it's a reasonable compromise, ... well, I'll be using
> it, anyway! I don't know what the feasibility constraints are on
> implementing what I initially had in mind, so I'll defer to you and the
> developers on that. If "secondary keys" are implemented down the road, that
> would solve this problem in most cases.
>
> As far as when I will use it, I guess it depends on the relative cost of
> making a copy vs resetting the key on x. If I use the old syntax, I make a
> copy, but don't have to change x's key back at the end (one copy, one key
> setting). With the new syntax, I'd have to change the key on x back
> afterward (zero copies, two key settings). If I know the sorting takes a
> long time (e.g., because the key is the whole set of columns), I might
> still go with copying.
>
> Best,
>
> Frank
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>