[datatable-help] Behavior of setkey with factors
Damian Betebenner
dbetebenner at nciea.org
Tue Aug 10 12:49:45 CEST 2010
All,
I was wondering how setkey orders a factor: does it respect the order of the factor's levels, or does it simply sort the labels alphabetically?
I would like the key to observe the order of a factor. For example, a course-taken field may run from 1 to 5, with 1=Basic Math, 2=Calculus, 3=Geometry,
4=Algebra I and 5=Algebra II. I would like the sort imposed by data.table to "respect" the canonical ordering of the classes, not an alphabetical ordering.
I can't, however, seem to get the key to behave the way I want.
Here's an example:
library(data.table)
set.seed(123)
my.course.sample <- sample(1:5, 10, replace=TRUE)
X <- 1:10
Y <- factor(my.course.sample, levels=1:5, labels=c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II"))
my.dt <- data.table(ID=X, COURSE=Y)
> my.dt
ID COURSE
[1,] 1 Algebra II
[2,] 2 Algebra I
[3,] 3 Algebra I
[4,] 4 Algebra II
[5,] 5 Geometry
[6,] 6 Algebra I
[7,] 7 Geometry
[8,] 8 Calculus
[9,] 9 Algebra I
[10,] 10 Geometry
setkey(my.dt, COURSE)
> my.dt
ID COURSE
[1,] 2 Algebra I
[2,] 3 Algebra I
[3,] 6 Algebra I
[4,] 9 Algebra I
[5,] 1 Algebra II
[6,] 4 Algebra II
[7,] 8 Calculus
[8,] 5 Geometry
[9,] 7 Geometry
[10,] 10 Geometry
###
### The COURSE key is alphabetizing based upon the labels
###
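For comparison, the ordering being asked for is the one base R's order() applies to a factor, since it sorts on the integer level codes rather than on the label text. The sketch below uses base R only; the course vector and the names by_level and by_label are made up for illustration and are not the sample from this post.

```r
# Illustrative data, not the poster's sample.
labs <- c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II")
course <- factor(c(5, 4, 3, 2, 1, 4), levels = 1:5, labels = labs)

levels(course)      # levels in the declared order, not alphabetical
as.integer(course)  # underlying integer codes: 5 4 3 2 1 4

# order() on a factor sorts by the integer codes (level order).
by_level <- as.character(course[order(course)])

# Sorting the labels as plain strings gives the alphabetical order instead.
by_label <- as.character(course[order(as.character(course))])

print(by_level)
print(by_label)
```

If a keyed table comes back in the by_label order rather than the by_level order, the key is being built from the label text and not from the level codes.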
###
### Now try to impose a different ordering
###
Y <- factor(my.course.sample, levels=c(1,4,3,5,2), labels=c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II"))
my.dt <- data.table(ID=X, COURSE=Y)
> my.dt
ID COURSE
[1,] 1 Algebra I
[2,] 2 Calculus
[3,] 3 Calculus
[4,] 4 Algebra I
[5,] 5 Geometry
[6,] 6 Calculus
[7,] 7 Geometry
[8,] 8 Algebra II
[9,] 9 Calculus
[10,] 10 Geometry
setkey(my.dt, COURSE)
> my.dt
ID COURSE
[1,] 1 Algebra I
[2,] 3 Algebra I
[3,] 9 Algebra I
[4,] 2 Algebra II
[5,] 4 Algebra II
[6,] 8 Algebra II
[7,] 7 Basic Math
[8,] 5 Calculus
[9,] 6 Calculus
[10,] 10 Geometry
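A side note on the factor() call used in this attempt, based on base R's documented behaviour and not on anything specific to data.table: permuting levels while leaving labels fixed changes which label each value receives, but the order of the resulting levels is still the order of labels. A minimal sketch, with f1, f2 and f3 as illustrative names:

```r
x <- 1:5
labs <- c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II")

f1 <- factor(x, levels = 1:5, labels = labs)
f2 <- factor(x, levels = c(1, 4, 3, 5, 2), labels = labs)

# Same level order in both factors...
identical(levels(f1), levels(f2))

# ...but the values now map to different labels.
as.character(f1)
as.character(f2)

# To change the level order while keeping the value-to-label mapping,
# permute levels and labels together.
perm <- c(1, 4, 3, 5, 2)
f3 <- factor(x, levels = perm, labels = labs[perm])
as.character(f3)
levels(f3)
```

This would explain why the unkeyed table changes between the two attempts: the data were relabelled, while the level order stayed the same.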
Y <- factor(my.course.sample, levels=c(1,4,3,5,2), labels=c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II"), ordered=TRUE)
my.dt <- data.table(ID=X, COURSE=Y)
> my.dt
ID COURSE
[1,] 1 Algebra I
[2,] 2 Calculus
[3,] 3 Calculus
[4,] 4 Algebra I
[5,] 5 Geometry
[6,] 6 Calculus
[7,] 7 Geometry
[8,] 8 Algebra II
[9,] 9 Calculus
[10,] 10 Geometry
setkey(my.dt, COURSE)
> my.dt
ID COURSE
[1,] 1 Algebra I
[2,] 4 Algebra I
[3,] 8 Algebra II
[4,] 2 Calculus
[5,] 3 Calculus
[6,] 6 Calculus
[7,] 9 Calculus
[8,] 5 Geometry
[9,] 7 Geometry
[10,] 10 Geometry
### Setting COURSE as the key for an ordered factor seems to override the ordering associated with the factor and impose an alphabetical order.
I'd like the key to respect the order associated with the factor.
Any help with this is greatly appreciated.
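One possible workaround, offered as a sketch and not as confirmed data.table behaviour for the version used here: store the factor's integer codes in a separate column and key on that column, so the sort never looks at the label text. The column name COURSE_CODE and the data are hypothetical, and the := operator assumes a data.table release newer than the one in this post. How factor keys are handled may also differ between releases, so it is worth checking the installed version.

```r
library(data.table)

# Illustrative data, not the sample above.
labs <- c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II")
course <- factor(c(5, 4, 3, 2, 1, 4), levels = 1:5, labels = labs)
dt <- data.table(ID = 1:6, COURSE = course)

# COURSE_CODE (hypothetical name) holds the level codes 1..5.
dt[, COURSE_CODE := as.integer(COURSE)]

# Keying on the integer column sorts rows in level order.
setkey(dt, COURSE_CODE)
print(dt)
```

The labels stay available in COURSE for display, while joins and lookups go through COURSE_CODE.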
Best regards,
Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH 03821-0351
Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910
dbetebenner at nciea.org
www.nciea.org