[datatable-help] Something seems funky. I think with character-to-factor conversion for keys (?)
Matthew Dowle
mdowle at mdowle.plus.com
Tue Mar 8 04:18:06 CET 2011
Maybe. The slowdown could be fairly significant, though. Although the
levels vector is contiguous in memory, the global character hash (the
memory that the character pointers point to) isn't. The cost isn't the
string comparison as such, it's the page fetches. Also, it might repeat
this check over and over for the same levels vectors (very wasteful).
Remember that [.data.table is recursive in places, although only once, I
think.
Did you find out what created the out-of-order levels? This check won't
help you find out where that occurred, or will it?
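
For reference, a minimal sketch of the check under discussion. The helper
name check_key_levels is made up for illustration, and
datatable.check.factor.levels is only the option name proposed in this
thread, not an existing data.table option. The check is off by default so
that the cost described above is only paid when asked for:

```r
# A factor whose levels were deliberately supplied out of order.
f <- factor(c("b", "a", "c"), levels = c("b", "a", "c"))
is.unsorted(levels(f))   # detects the out-of-order levels

# Hypothetical helper: warn when a factor column's levels are unsorted.
# It only runs when the (proposed, not real) option is set to TRUE.
check_key_levels <- function(x, name = "key") {
  if (!isTRUE(getOption("datatable.check.factor.levels", FALSE)))
    return(invisible(TRUE))              # check disabled: do nothing
  if (is.factor(x) && is.unsorted(levels(x))) {
    warning("levels of factor column '", name, "' are not sorted")
    return(invisible(FALSE))
  }
  invisible(TRUE)
}
```

Calling check_key_levels(f, "f") after
options(datatable.check.factor.levels = TRUE) raises the warning. As noted
above, this reports that the levels are out of order but not where that
happened.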
On Mon, 2011-03-07 at 21:39 -0500, Steve Lianoglou wrote:
> On Mon, Mar 7, 2011 at 8:50 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> > Btw :
> >
> >> a small utility at the C level to scan through the levels() of
> >> factor-keys to test for them being in order and
> >> breaking/short-circuiting as soon as it finds one level that's out of
> >> order?
> >
> > That's base::is.unsorted(), which is done in C.
>
> Aww -- was looking forward to writing some C code ...
>
> It looks like you were right, though -- the problematic data.table has
> a (factor) key where `is.unsorted(levels(the_key_column))` is TRUE.
>
> So I guess we're talking about having something like
> options(datatable.check.factor.levels=TRUE) check at the top of the
> [.data.table function that fires a warning() when the levels are
> unsorted, yeah?
>
> -steve
>