[datatable-help] rbindlist and factors
Alexandre Sieira
alexandre.sieira at gmail.com
Tue May 21 20:06:25 CEST 2013
I think I found unexpected behavior in rbindlist when columns are factors:
> dt1 = data.table(a=as.factor(c("a", "a", "a")))
> dt1
   a
1: a
2: a
3: a
> str(dt1)
Classes ‘data.table’ and 'data.frame': 3 obs. of 1 variable:
$ a: Factor w/ 1 level "a": 1 1 1
- attr(*, ".internal.selfref")=<externalptr>
> dt2 = data.table(a=as.factor(c("b", "b", "b")))
> dt2
   a
1: b
2: b
3: b
> str(dt2)
Classes ‘data.table’ and 'data.frame': 3 obs. of 1 variable:
$ a: Factor w/ 1 level "b": 1 1 1
- attr(*, ".internal.selfref")=<externalptr>
If I rbind them, I get the expected result: a table with 6 rows, 3 of which have value "a" and 3 of which have value "b":
> rbind(dt1, dt2)
   a
1: a
2: a
3: a
4: b
5: b
6: b
So if I do rbindlist(list(dt1, dt2)), I would expect to get the exact same result, only faster. Unfortunately, that is not the case:
> rbindlist(list(dt1, dt2))
   a
1: a
2: a
3: a
4: a
5: a
6: a
> str(rbindlist(list(dt1, dt2)))
Classes ‘data.table’ and 'data.frame': 6 obs. of 1 variable:
$ a: Factor w/ 1 level "a": 1 1 1 1 1 1
- attr(*, ".internal.selfref")=<externalptr>
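A possible workaround, sketched under the assumption that the problem is limited to factor columns: convert factors to character before calling rbindlist, then rebuild the factor on the combined table. The helper name factors_to_character is hypothetical (it is not part of data.table), and I have not confirmed this is the recommended approach.

```r
library(data.table)

dt1 <- data.table(a = as.factor(c("a", "a", "a")))
dt2 <- data.table(a = as.factor(c("b", "b", "b")))

# Hypothetical helper (not part of data.table): returns a copy of `dt`
# with every factor column converted to character.
factors_to_character <- function(dt) {
  dt <- copy(dt)  # work on a copy so the input table is left untouched
  factor_cols <- names(dt)[vapply(dt, is.factor, logical(1))]
  for (col in factor_cols) {
    set(dt, j = col, value = as.character(dt[[col]]))
  }
  dt
}

# Bind the character versions, then rebuild the factor so its levels
# are computed from the combined values.
combined <- rbindlist(lapply(list(dt1, dt2), factors_to_character))
combined[, a := factor(a)]
```

The extra conversions cost some time, so this gives up part of the speed advantage of rbindlist over rbind.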
This was run with R 3.0.1 and data.table 1.8.8 on Mac OS X 10.8.3.
Is this expected behavior? Am I missing something?
--
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor
"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I