[datatable-help] rbindlist on list of data.frames with factor column

Ricardo Saporta saporta at scarletmail.rutgers.edu
Thu Mar 28 18:34:29 CET 2013


My apologies, there was a mistake in my previous email: I forgot that
data.table does not coerce strings to factors, so the data.table example
there never contained a factor column.
It looks like the observed `rbindlist` behavior occurs for *both* a list
of data.tables and a list of data.frames (assuming, of course, that a
factor column is present).

    library(data.table)

    # sample data, using data.frame
    set.seed(1)
    sampleList.DF <- lapply(LETTERS[1:5], function(L)
      data.frame(Val1=rnorm(3), Val2=runif(3), FactorCol=factor(L)) )
    sampleList.DF <- lapply(sampleList.DF, function(x)
      {x$StringCol <- as.character(x$FactorCol); x})

    # sample data, using data.table
    set.seed(1)
    sampleList.DT <- lapply(LETTERS[1:5], function(L)
      data.table(Val1=rnorm(3), Val2=runif(3), FactorCol=factor(L)) )
    sampleList.DT <- lapply(sampleList.DT, function(x)
       x[, StringCol := as.character(FactorCol)])


# rbindlist results:

    rbindlist(sampleList.DT)
    rbindlist(sampleList.DF)

# expected behavior, similar to do.call(rbind, LIST):

    do.call(rbind, sampleList.DF)
    do.call(rbind, sampleList.DT)
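
To make the comparison explicit rather than eyeballing the printed tables, something like the following could be used. This is only a rough sketch: the names `expected` and `actual` are mine, and what `rbindlist` returns for the factor column may well depend on the data.table version, so I am not stating an expected result here.

```r
library(data.table)

# Rebuild a small list of data.frames, each with a single-level factor column
set.seed(1)
sampleList.DF <- lapply(LETTERS[1:5], function(L)
  data.frame(Val1 = rnorm(3), Val2 = runif(3), FactorCol = factor(L)))

# base rbind merges the factor levels across the list elements
expected <- as.character(do.call(rbind, sampleList.DF)$FactorCol)

# the same column as returned by rbindlist (its class is what is in question)
actual <- rbindlist(sampleList.DF)$FactorCol

class(actual)
identical(as.character(actual), expected)
```

If the last line returns FALSE, the factor labels were lost somewhere in `rbindlist`; `class(actual)` should then show what the column was turned into.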




On Thu, Mar 28, 2013 at 12:52 PM, Ricardo Saporta <
saporta at scarletmail.rutgers.edu> wrote:

> Hello,
>
> I found that when using `rbindlist` on a list of data.frames with factor
> columns, the factor column is getting concat'd as its numeric equivalent.
>
> This of course, does not happen when using a list of data.tables.
>
>     # sample data, using data.frame
>     sampleList.DF <- lapply(LETTERS[1:5], function(L)
>       data.frame(Val1=rnorm(3), Val2=runif(3), FactorCol=L) )
>
>     sampleList.DF <- lapply(sampleList.DF, function(x)
>       {x$StringCol <- as.character(x$FactorCol); x})
>
>     # sample data, using data.table
>     sampleList.DT <- lapply(LETTERS[1:5], function(L)
>       data.table(Val1=rnorm(3), Val2=runif(3), FactorCol=L) )
>     sampleList.DT <- lapply(sampleList.DT, function(x)
>        x[, StringCol := as.character(FactorCol)])
>
>
> # Compare the column `FactorCol`:
>
>     rbindlist(sampleList.DT)
>     rbindlist(sampleList.DF)
>     do.call(rbind, sampleList.DF)
>
> Interestingly, I originally thought it was levels dependent:
> (I would have expected, for example, the following to allow for the levels
> of the third list element, but it does not).
>
>     sampleList.DF[[1]][, "FactorCol"] <- factor(c("A", "C", "A"))
>
>     # all the levels in third element are present in the first
>     all(levels(sampleList.DF[[3]][, "FactorCol"]) %in%
>         levels(sampleList.DF[[1]][, "FactorCol"]))
>     # [1] TRUE
>
> But...
>
>     rbindlist(sampleList.DF)
>
> However:
>
>     sampleList.DF[[1]][, "FactorCol"] <- factor(c("C", "A", "A"),
>                                                levels=c("C", "A"))
>     rbindlist(sampleList.DF)
>
> Is the above behavior intended?
>
> Cheers,
> Rick
>
>