[datatable-help] rbindlist on list of data.frames with factor column

Ricardo Saporta saporta at scarletmail.rutgers.edu
Thu Mar 28 17:52:57 CET 2013


Hello,

I found that when `rbindlist` is used on a list of data.frames with factor
columns, the factor column is concatenated as its underlying integer codes
rather than as its labels.

This does not happen, of course, when using a list of data.tables.
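
To make clear what I mean by the "numeric equivalent" of a factor, here is a small base-R sketch. It only illustrates how a factor is stored (integer codes plus a vector of levels); it is not a claim about what `rbindlist` does internally. The variable names are just for illustration.

```r
# A factor stores integer codes plus a character vector of levels.
f <- factor(c("A", "C", "A"))

codes  <- as.integer(f)    # the underlying integer codes
labels <- as.character(f)  # the labels those codes map to

print(codes)
print(labels)
print(levels(f))
```

If only the codes survive a concatenation, the labels are lost and the column looks like plain integers.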

    library(data.table)

    # sample data, using data.frame
    # stringsAsFactors = TRUE makes FactorCol a factor
    # (this was the default before R 4.0.0)
    sampleList.DF <- lapply(LETTERS[1:5], function(L)
      data.frame(Val1=rnorm(3), Val2=runif(3), FactorCol=L,
                 stringsAsFactors=TRUE) )

    # add a character copy of the factor column
    sampleList.DF <- lapply(sampleList.DF, function(x)
      {x$StringCol <- as.character(x$FactorCol); x})

    # sample data, using data.table
    sampleList.DT <- lapply(LETTERS[1:5], function(L)
      data.table(Val1=rnorm(3), Val2=runif(3), FactorCol=L) )
    sampleList.DT <- lapply(sampleList.DT, function(x)
       x[, StringCol := as.character(FactorCol)])


# Compare the column `FactorCol`:

    rbindlist(sampleList.DT)
    rbindlist(sampleList.DF)
    do.call(rbind, sampleList.DF)

Interestingly, I originally thought the behavior depended on the levels.
I would have expected, for example, the following to accommodate the levels
of the third list element, but it does not:

    sampleList.DF[[1]][, "FactorCol"] <- factor(c("A", "C", "A"))

    # all the levels in third element are present in the first
    all(levels(sampleList.DF[[3]][, "FactorCol"]) %in%
        levels(sampleList.DF[[1]][, "FactorCol"]))
    # [1] TRUE

But...

    rbindlist(sampleList.DF)

However:

    sampleList.DF[[1]][, "FactorCol"] <- factor(c("C", "A", "A"),
                                                levels=c("C", "A"))
    rbindlist(sampleList.DF)
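
The reason I tried reordering the levels: the same label can map to a different integer code depending on the order of the levels. The base-R sketch below shows this for the two factors assigned above. The names `f1`, `f2` and `codeOfC` are only for illustration, and whether this is what drives the `rbindlist` result is my assumption, not something I have confirmed.

```r
# Same set of labels, different level order.
f1 <- factor(c("A", "C", "A"))                        # default levels: "A", "C"
f2 <- factor(c("C", "A", "A"), levels = c("C", "A"))  # explicit order: "C", "A"

print(as.integer(f1))
print(as.integer(f2))

# The code assigned to the label "C" differs between the two factors.
codeOfC <- c(f1 = match("C", levels(f1)), f2 = match("C", levels(f2)))
print(codeOfC)
```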

Is the above behavior intended?
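
In the meantime, one possible workaround is to convert factor columns to character before binding. This is a minimal sketch, assuming it is acceptable for the combined column to be character rather than factor. `factorsToCharacter` is a hypothetical helper of my own, not part of data.table.

```r
library(data.table)

# Hypothetical helper: convert every factor column of a data.frame to character.
factorsToCharacter <- function(df) {
  isFactor <- vapply(df, is.factor, logical(1))
  for (nm in names(df)[isFactor]) {
    df[[nm]] <- as.character(df[[nm]])
  }
  df
}

# Sample data: a list of data.frames, each with one factor column.
sampleList.DF <- lapply(LETTERS[1:5], function(L)
  data.frame(Val1 = rnorm(3), Val2 = runif(3), FactorCol = factor(L)))

# Convert first, then bind; FactorCol ends up as a character column.
combined <- rbindlist(lapply(sampleList.DF, factorsToCharacter))
print(combined)
```

The column can be turned back into a factor afterwards with `factor()` if that is needed.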

Cheers,
Rick