[datatable-help] rbindlist on list of data.frames with factor column
Ricardo Saporta
saporta at scarletmail.rutgers.edu
Thu Mar 28 18:34:29 CET 2013
My apologies, I made a mistake in my previous email. (I forgot that
data.table does not coerce strings to factors.)
It looks like the observed `rbindlist` behavior occurs for *both* a list
of data.tables and a list of data.frames (assuming, of course, that a
factor column is present).
library(data.table)
# sample data, using data.frame
set.seed(1)
sampleList.DF <- lapply(LETTERS[1:5], function(L)
    data.frame(Val1=rnorm(3), Val2=runif(3), FactorCol=factor(L)) )
sampleList.DF <- lapply(sampleList.DF, function(x)
    {x$StringCol <- as.character(x$FactorCol); x})
# sample data, using data.table
set.seed(1)
sampleList.DT <- lapply(LETTERS[1:5], function(L)
    data.table(Val1=rnorm(3), Val2=runif(3), FactorCol=factor(L)) )
sampleList.DT <- lapply(sampleList.DT, function(x)
    x[, StringCol := as.character(FactorCol)])
# rbindlist results:
rbindlist(sampleList.DT)
rbindlist(sampleList.DF)
# expected behavior similar to do.call(rbind, LIST)
do.call(rbind, sampleList.DF)
do.call(rbind, sampleList.DT)
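For reference, here is a minimal base-R sketch of what "numeric equivalent" means for the sample data. It does not call `rbindlist`. It only assumes (this is not confirmed anywhere in this thread) that the symptom comes from stacking each factor's internal integer codes without its levels. The names `fList`, `codes`, and `labels` are illustrative only.

```r
# Five single-level factors, one per list element, mirroring FactorCol above
# (fList, codes and labels are hypothetical names used only for this sketch)
fList <- lapply(LETTERS[1:5], factor)

# Each factor stores its only level as integer code 1,
# so combining the raw codes loses the distinction between "A".."E"
codes <- unlist(lapply(fList, as.integer))
codes

# Converting to character before combining keeps the labels
labels <- unlist(lapply(fList, as.character))
labels
```

If that assumption holds, converting factor columns with `as.character()` before calling `rbindlist` might sidestep the problem, at the cost of ending up with a character column rather than a factor.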
On Thu, Mar 28, 2013 at 12:52 PM, Ricardo Saporta <
saporta at scarletmail.rutgers.edu> wrote:
> Hello,
>
> I found that when using `rbindlist` on a list of data.frames with factor
> columns, the factor column is concatenated as its numeric equivalent.
>
> This, of course, does not happen when using a list of data.tables.
>
> # sample data, using data.frame
> sampleList.DF <- lapply(LETTERS[1:5], function(L)
> data.frame(Val1=rnorm(3), Val2=runif(3), FactorCol=L) )
>
> sampleList.DF <- lapply(sampleList.DF, function(x)
> {x$StringCol <- as.character(x$FactorCol); x})
>
> # sample data, using data.table
> sampleList.DT <- lapply(LETTERS[1:5], function(L)
> data.table(Val1=rnorm(3), Val2=runif(3), FactorCol=L) )
> sampleList.DT <- lapply(sampleList.DT, function(x)
> x[, StringCol := as.character(FactorCol)])
>
>
> # Compare the column `FactorCol`:
>
> rbindlist(sampleList.DT)
> rbindlist(sampleList.DF)
> do.call(rbind, sampleList.DF)
>
> Interestingly, I originally thought it was levels dependent:
> (I would have expected, for example, the following to allow for the levels
> of the third list element, but it does not).
>
> sampleList.DF[[1]][, "FactorCol"] <- factor(c("A", "C", "A"))
>
> # all the levels in third element are present in the first
> all(levels(sampleList.DF[[3]][, "FactorCol"]) %in%
> levels(sampleList.DF[[1]][, "FactorCol"]))
> # [1] TRUE
>
> But...
>
> rbindlist(sampleList.DF)
>
> However:
>
> sampleList.DF[[1]][, "FactorCol"] <- factor(c("C", "A", "A"),
> levels=c("C", "A"))
> rbindlist(sampleList.DF)
>
> Is the above behavior intended?
>
> Cheers,
> Rick
>
>