[datatable-help] Behavior of setkey with factors

Damian Betebenner dbetebenner at nciea.org
Tue Aug 10 12:49:45 CEST 2010


All,

I was wondering how setkey orders a factor: does it respect the order of the factor's levels (or the fact that the factor is ordered), or does it simply sort the labels alphabetically?

I would like the key to observe the order of a factor's levels. For example, a course-taken field may run from 1 to 5 with 1=Basic Math, 2=Calculus, 3=Geometry,
4=Algebra I and 5=Algebra II. I would like the sort imposed by data.table to "respect" the canonical ordering of the classes, not an alphabetical ordering.

I can't, however, seem to get the key to behave the way I want.

Here's an example:

library(data.table)
set.seed(123)
my.course.sample <- sample(1:5, 10, replace=TRUE)

X <- 1:10
Y <- factor(my.course.sample, levels=1:5, labels=c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II"))

my.dt <- data.table(ID=X, COURSE=Y)

> my.dt
      ID     COURSE
 [1,]  1 Algebra II
 [2,]  2  Algebra I
 [3,]  3  Algebra I
 [4,]  4 Algebra II
 [5,]  5   Geometry
 [6,]  6  Algebra I
 [7,]  7   Geometry
 [8,]  8   Calculus
 [9,]  9  Algebra I
[10,] 10   Geometry


setkey(my.dt, COURSE)

> my.dt
      ID     COURSE
 [1,]  2  Algebra I
 [2,]  3  Algebra I
 [3,]  6  Algebra I
 [4,]  9  Algebra I
 [5,]  1 Algebra II
 [6,]  4 Algebra II
 [7,]  8   Calculus
 [8,]  5   Geometry
 [9,]  7   Geometry
[10,] 10   Geometry


###
### The COURSE key is alphabetizing based upon the labels
###

###
### Now try to impose a different ordering
###

Y <- factor(my.course.sample, levels=c(1,4,3,5,2), labels=c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II"))

my.dt <- data.table(ID=X, COURSE=Y)

> my.dt
      ID     COURSE
 [1,]  1  Algebra I
 [2,]  2   Calculus
 [3,]  3   Calculus
 [4,]  4  Algebra I
 [5,]  5   Geometry
 [6,]  6   Calculus
 [7,]  7   Geometry
 [8,]  8 Algebra II
 [9,]  9   Calculus
[10,] 10   Geometry

setkey(my.dt, COURSE)

> my.dt
      ID     COURSE
 [1,]  1  Algebra I
 [2,]  3  Algebra I
 [3,]  9  Algebra I
 [4,]  2 Algebra II
 [5,]  4 Algebra II
 [6,]  8 Algebra II
 [7,]  7 Basic Math
 [8,]  5   Calculus
 [9,]  6   Calculus
[10,] 10   Geometry


Y <- factor(my.course.sample, levels=c(1,4,3,5,2), labels=c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II"), ordered=TRUE)

my.dt <- data.table(ID=X, COURSE=Y)

> my.dt
      ID     COURSE
 [1,]  1  Algebra I
 [2,]  2   Calculus
 [3,]  3   Calculus
 [4,]  4  Algebra I
 [5,]  5   Geometry
 [6,]  6   Calculus
 [7,]  7   Geometry
 [8,]  8 Algebra II
 [9,]  9   Calculus
[10,] 10   Geometry

setkey(my.dt, COURSE)

> my.dt
      ID     COURSE
 [1,]  1  Algebra I
 [2,]  4  Algebra I
 [3,]  8 Algebra II
 [4,]  2   Calculus
 [5,]  3   Calculus
 [6,]  6   Calculus
 [7,]  9   Calculus
 [8,]  5   Geometry
 [9,]  7   Geometry
[10,] 10   Geometry


### Setting COURSE as the key, even for an ordered factor, seems to override the ordering associated with the factor and impose an alphabetical order.


I'd like the key to respect the order associated with the factor.
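
To make the desired result concrete, here is a minimal sketch in base R only (no data.table), under the assumption that sorting a plain data.frame is an acceptable stand-in for the keyed table. The codes vector is typed in by hand to mirror the first table above, and the names codes, course, df and sorted are just for illustration. As far as I understand, order() on a factor sorts by the underlying integer codes, so the rows come out in level order rather than alphabetical order:

```r
# Illustration only: course codes typed in to mirror the first table above.
codes <- c(5, 4, 4, 5, 3, 4, 3, 2, 4, 3)

course <- factor(codes, levels = 1:5,
                 labels = c("Basic Math", "Calculus", "Geometry",
                            "Algebra I", "Algebra II"))

df <- data.frame(ID = 1:10, COURSE = course)

# order() on a factor uses its integer codes, i.e. the level order,
# not the alphabetical order of the labels.
sorted <- df[order(df$COURSE), ]

# Courses in the order they appear after sorting:
# Calculus, Geometry, Algebra I, Algebra II
print(as.character(unique(sorted$COURSE)))
```

This is the row order I would like the key to produce.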


Any help with this is greatly appreciated.


Best regards,



Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH   03821-0351
 
Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910

dbetebenner at nciea.org
www.nciea.org



