[datatable-help] best way to set keys, when you don't know in advance which fields you will use

Chris Neff caneff at gmail.com
Thu Aug 25 17:17:59 CEST 2011


You don't necessarily have to use keys at all.  When you aggregate and
give the by columns, they don't have to be keys of the data table.
This is called an "ad-hoc by".  The grouping itself is slightly slower
than grouping on key columns, but my intuition says that overall it
isn't really any slower once you count the time spent setting the
key.
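
A minimal sketch of an ad-hoc by, to show the idea.  The toy table and
the numeric column `value` are assumptions made for illustration (the
original question doesn't name the column being summed); f2, f4 and f5
stand in for the factor fields.

```r
library(data.table)

# Hypothetical toy data: f2, f4, f5 stand in for the factor fields in the
# question; `value` is an assumed numeric column to sum.
dt_data <- data.table(
  f2 = factor(c("a", "a", "b", "b", "a", "b")),
  f4 = factor(c("x", "y", "x", "y", "x", "x")),
  f5 = factor(c("p", "p", "q", "q", "p", "q")),
  value = c(1, 2, 3, 4, 5, 6)
)

# Ad-hoc by: no key is set; the grouping column is named directly in `by`.
by_f2 <- dt_data[, list(total = sum(value)), by = f2]

# A different grouping on the same, still unkeyed, table.
by_f4_f5 <- dt_data[, list(total = sum(value)), by = c("f4", "f5")]

# Filtering also works without a key, via an ordinary logical expression.
only_a <- dt_data[f2 == "a"]
```

Since no key is ever set, the table is never re-sorted between the
different groupings.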

When you add a key you sort by those fields.  You incur a time cost
for that. If you are consistently doing things with those keys then
you may make up for that time cost further on.  But for multiple
different groupings the ad-hoc by is probably faster.  Do some timings
to see.  In the simple ones I ran, the sort alone took longer than the
ad-hoc by.
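
One way such a timing comparison could be set up is sketched below.
The data is randomly generated, and the row count `n` and the column
`value` are assumptions; results will depend on the data.table version,
the data and the machine, so treat it as a starting point rather than a
benchmark.

```r
library(data.table)

set.seed(1)
n <- 1e6  # assumed size; raise it towards the real row count
dt_data <- data.table(
  f4 = factor(sample(letters, n, replace = TRUE)),
  f5 = factor(sample(LETTERS, n, replace = TRUE)),
  value = runif(n)  # assumed numeric column to sum
)

# Option 1: ad-hoc by on the unkeyed table.
t_adhoc <- system.time(
  res_adhoc <- dt_data[, list(total = sum(value)), by = c("f4", "f5")]
)

# Option 2: sort by setting the key first, then group on the key columns.
keyed <- copy(dt_data)  # setkeyv() modifies in place, so work on a copy
t_setkey <- system.time(setkeyv(keyed, c("f4", "f5")))
t_keyed <- system.time(
  res_keyed <- keyed[, list(total = sum(value)), by = c("f4", "f5")]
)

# Elapsed seconds for each step.
print(c(
  adhoc_by    = t_adhoc[["elapsed"]],
  setkey_only = t_setkey[["elapsed"]],
  keyed_by    = t_keyed[["elapsed"]]
))
```

The fair comparison is adhoc_by against setkey_only plus keyed_by,
repeated for each new set of grouping fields.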

On 25 August 2011 11:05, Jean Jacques Dureau <jj.dureau at gmail.com> wrote:
> Hi,
> I have a data.table (10,000k rows) with 20 (factor) fields and I
> need to filter the data according to some of them.
> I use this data.table inside a function and I don't know in advance
> which fields I'll use to filter and sum the data.
>
> So, for example, consider a data.table (named dt_data) with 20 fields,
> named f1, f2, ..., f20.
>
> I use this approach: I set the key on the field I have to use, for
> example f2. Then I "filter" the data and use the result to do some
> computations.
>
> Subsequently, from these computations, I discover which fields I have
> to filter on next, for example f4 and f5. Now I set the key of dt_data
> to (f4,f5), and so on ...
>
> I use this approach because I don't know if it's possible to set the
> key on all fields f1, f2, ..., f20 in advance and then use only some of
> them!
>
> Is there a better way to use data.table?
>
> thanks
>
> jj

