<div>Hello, </div><div><br></div><div>I found that when using `rbindlist` on a list of data.frames with factor columns, the factor column is concatenated as its underlying integer codes rather than as its labels. </div><div><br></div><div>This, of course, does not happen when using a list of data.tables. </div>
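<div><br></div><div>To make "integer codes" concrete, here is a minimal base-R sketch. The vector `f` is a made-up toy example and is not part of the sample data in this message; it only shows how a factor stores integer codes that index into its levels, which is what I believe ends up in the combined column: </div>

```r
# Hypothetical toy factor, used only for illustration
f <- factor(c("B", "A", "B"))

# Levels are the distinct labels, sorted alphabetically by default
lv <- levels(f)

# The underlying integer codes index into the levels
codes <- as.integer(f)

# Mapping the codes back through the levels recovers the original labels
labels <- lv[codes]

print(lv)
print(codes)
print(labels)
```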
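<div><br></div><div> library(data.table)</div><div><br></div><div> # sample data, using data.frame</div><div> # (stringsAsFactors=TRUE is needed on R >= 4.0, where it defaults to FALSE)</div><div> sampleList.DF <- lapply(LETTERS[1:5], function(L) </div><div> data.frame(Val1=rnorm(3), Val2=runif(3), FactorCol=L, stringsAsFactors=TRUE) )</div><div><br></div><div>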
sampleList.DF <- lapply(sampleList.DF, function(x) </div><div> {x$StringCol <- as.character(x$FactorCol); x})</div><div><br></div><div> # sample data, using data.table</div><div> sampleList.DT <- lapply(LETTERS[1:5], function(L) </div>
<div> data.table(Val1=rnorm(3), Val2=runif(3), FactorCol=L) )</div><div> sampleList.DT <- lapply(sampleList.DT, function(x) </div><div> x[, StringCol := as.character(FactorCol)])</div><div><br></div><div>
<br></div><div># Compare the column `FactorCol`: </div><div><br></div><div> rbindlist(sampleList.DT)</div><div> rbindlist(sampleList.DF)</div><div> do.call(rbind, sampleList.DF)</div><div><br></div><div>Interestingly, I originally thought the behavior depended on the factor levels: </div>
<div>(For example, I would have expected the following to accommodate the levels of the third list element, but it does not.)</div><div><br></div><div> sampleList.DF[[1]][, "FactorCol"] <- factor(c("A", "C", "A"))</div>
<div> </div><div> # all the levels in third element are present in the first</div><div> all(levels(sampleList.DF[[3]][, "FactorCol"]) %in% levels(sampleList.DF[[1]][, "FactorCol"]))</div><div>
# [1] TRUE</div><div><br></div><div>But... </div><div><br></div><div> rbindlist(sampleList.DF)</div><div><br></div><div>However: </div><div><br></div><div> sampleList.DF[[1]][, "FactorCol"] <- factor(c("C", "A", "A"), levels=c("C", "A"))</div>
<div> rbindlist(sampleList.DF)</div><div> </div><div>Is the above behavior intended? </div><div><br></div><div>Cheers, </div><div>Rick</div><div><br></div>