[datatable-help] rbindlist on list of data.frames with factor column
Matthew Dowle
mdowle at mdowle.plus.com
Fri Mar 29 02:04:32 CET 2013
Well spotted. Looking at the C source just now, it looks like I never
considered factor columns in rbindlist(). At the time I needed
rbindlist, I needed it quickly for something I was doing, which didn't
use factor columns.

Please file as a bug report. It should be fairly easy to implement, and
quick in C. It would populate the column as if it were character
(without actually converting to a new character vector for the column
of each item in l) and then call factor() at R level afterwards to
refactor it.
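
A rough R-level sketch of that idea, for illustration only. It is not the C implementation and not part of data.table: the helper name combine_factor_col and the sample data are made up here, and unlike the C approach described above it does build an intermediate character vector for each item.

```r
# Hypothetical helper (not part of data.table): combine one factor column
# from a list of data.frames by going through the character labels and
# calling factor() once at the end.
combine_factor_col <- function(l, col) {
  chr <- unlist(lapply(l, function(x) {
    f <- x[[col]]
    # levels(f)[f] maps each integer code back to its label
    if (is.factor(f)) levels(f)[f] else as.character(f)
  }), use.names = FALSE)
  factor(chr)  # re-factor once over the combined values
}

# Made-up sample data: the two items have different level sets
l <- list(
  data.frame(Val = 1:2, FactorCol = factor(c("a", "b"))),
  data.frame(Val = 3:4, FactorCol = factor(c("c", "a")))
)

combined <- combine_factor_col(l, "FactorCol")
print(combined)
```

The levels of the result are the union of the labels seen across all items, which is roughly what do.call(rbind, ...) gives for a factor column.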
Matthew
On 28.03.2013 17:34, Ricardo Saporta wrote:
>
> My apologies, I made a mistake in my previous email. (I forgot that
> data.table does not coerce strings to factor.)
> It looks like the `rbindlist` behavior observed occurs for _BOTH_ a list
> of data.tables and a list of data.frames (assuming, of course, that there
> is a factor column present).
> # sample data, using data.frame
> set.seed(1)
> sampleList.DF
> data.frame(Val1=rnorm(3), Val2=runif(3), FactorCol=factor(L)) )
> sampleList.DF
> {x$StringCol
> # sample data, using data.table
> set.seed(1)
> sampleList.DT
> data.table(Val1=rnorm(3), Val2=runif(3), FactorCol=factor(L)) )
> sampleList.DT
> x[, StringCol := as.character(FactorCol)])
> # rbindlist results:
> rbindlist(sampleList.DT)
> rbindlist(sampleList.DF)
> # expected behavior similar to do.call(rbind, LIST)
> do.call(rbind, sampleList.DF)
> do.call(rbind, sampleList.DT)
>
> On Thu, Mar 28, 2013 at 12:52 PM, Ricardo Saporta <saporta at scarletmail.rutgers.edu [1]> wrote:
>
>> Hello,
>> I found that when using `rbindlist` on a list of data.frames with factor
>> columns, the factor column gets concat'd as its numeric equivalent
>> (the underlying integer codes rather than the labels).
>> This, of course, does not happen when using a list of data.tables.
>> # sample data, using data.frame
>> sampleList.DF
>> data.frame(Val1=rnorm(3), Val2=runif(3), FactorCol=L) )
>> sampleList.DF
>> {x$StringCol
>> # sample data, using data.table
>> sampleList.DT
>> data.table(Val1=rnorm(3), Val2=runif(3), FactorCol=L) )
>> sampleList.DT
>> x[, StringCol := as.character(FactorCol)])
>> # Compare the column `FactorCol`:
>> rbindlist(sampleList.DT)
>> rbindlist(sampleList.DF)
>> do.call(rbind, sampleList.DF)
>>
>> Interestingly, I originally thought it was levels dependent:
>> (I would have expected, for example, the following to allow for the
>> levels of the third list element, but it does not).
>> sampleList.DF[[1]][, "FactorCol"]
>>
>> # all the levels in third element are present in the first
>> all(levels(sampleList.DF[[3]][, "FactorCol"]) %in% levels(sampleList.DF[[1]][, "FactorCol"]))
>> # [1] TRUE
>> But...
>> rbindlist(sampleList.DF)
>> However:
>> sampleList.DF[[1]][, "FactorCol"]
>> rbindlist(sampleList.DF)
>>
>> Is the above behavior intended?
>> Cheers,
>> Rick
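
For reference, the symptom reported in the quoted message (a factor column concat'd as its numeric equivalent) can be reproduced in base R without data.table. This is only a sketch with made-up values, not the data from the thread: a factor is stored as integer codes plus a levels attribute, so concatenating the codes alone makes labels from different items collide.

```r
# Two factors whose level sets differ (made-up values)
f1 <- factor(c("a", "b"))
f2 <- factor(c("c", "d"))

# Concatenating the integer codes drops the labels:
# "a" and "c" both become 1, "b" and "d" both become 2
codes <- c(as.integer(f1), as.integer(f2))
print(codes)

# Going through character first keeps every label
relabelled <- factor(c(as.character(f1), as.character(f2)))
print(relabelled)
```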
Links:
------
[1] mailto:saporta at scarletmail.rutgers.edu