[datatable-help] An implicit factor for a key?

Matthew Dowle mdowle at mdowle.plus.com
Thu Aug 25 21:50:36 CEST 2011


Hi Chris,

I see where your coworker is coming from. FRs 1224 and 1493 are next on
the agenda for v1.6.7. Sort the FR tracker by descending priority
(5 = highest) to see what we plan to do next. Fast sorting of character
vectors has been the obstacle.

You might want to look at countingcharacterorder.c, which is already part
of the package but not yet hooked up. Basically, we plan to allow
character columns in keys, since we now have a fast way to sort
character vectors. data.table() will no longer coerce character to
factor, and a known performance issue with a very large number of levels
should be fixed at the same time.
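
To give a rough idea of why a counting approach can sort character
quickly when there are few distinct values, here is a minimal sketch in
Python. It is an illustration only, not the package's C code, and
countingcharacterorder.c may work quite differently. The function name
counting_order is hypothetical. The idea shown: sort only the distinct
strings, then place every row with one counting pass.

```python
def counting_order(values):
    """Return the 0-based permutation that sorts `values`.

    Illustrative counting sort over distinct strings; an assumption about
    the general technique, not data.table's actual implementation.
    """
    # 1. Sort only the distinct strings (cheap when there are few of them).
    levels = sorted(set(values))
    code = {s: i for i, s in enumerate(levels)}

    # 2. Count how many rows carry each distinct string.
    counts = [0] * len(levels)
    for v in values:
        counts[code[v]] += 1

    # 3. Prefix sums give the first output slot for each distinct string.
    starts = [0] * len(levels)
    total = 0
    for i, c in enumerate(counts):
        starts[i] = total
        total += c

    # 4. Drop each row index into its slot; a left-to-right scan keeps ties stable.
    order = [0] * len(values)
    for row, v in enumerate(values):
        k = code[v]
        order[starts[k]] = row
        starts[k] += 1
    return order


ids = ["b", "a", "c", "a", "b"]
order = counting_order(ids)
print(order)                    # [1, 3, 0, 4, 2]
print([ids[i] for i in order])  # ['a', 'a', 'b', 'b', 'c']
```

The user only ever sees strings here; the integer codes exist just for
the duration of the sort, which is roughly the "implicit" behaviour
being asked for.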

It's all a bit tricky because it's closely related to how R's global
string cache works.  I'm thinking of doing it in 1.6.7, which would then
become v1.7.  Don't hold your breath though, it is tricky.

Matthew


On Thu, 2011-08-25 at 10:38 -0400, Chris Neff wrote:
> Hi all,
> 
> I've been pondering the following. One of my coworkers doesn't like
> data.table because he doesn't like factors: for example, adding a new
> value to a factor field chokes because the value isn't one of the
> levels.  Also, the variable is often something like a list of
> subnested categories, and sometimes he will do a substitution to go up
> a level in the categories. This is a pain when they are factors.
> 
> Suffice it to say, his workflow just makes a lot more sense to him when
> they are characters and he doesn't have to worry about underlying
> levels and the like.
> 
> How hard would an "implicit factor" be?  Something that to the user
> behaves exactly like a normal character variable, but internally
> data.frame is keeping the mapping of character values to integer codes
> somewhere behind the scene.
> 
> This is my thrust towards a hack at allowing character vectors to be
> keys.  If the real right way is much simpler than what this would take
> please ignore me.
> 
> -Chris