[datatable-help] rbindlist on list of data.frames with factor column

Matthew Dowle mdowle at mdowle.plus.com
Fri Mar 29 02:04:32 CET 2013


 

Well spotted. Looking at the C source just now, it looks like I never
considered factor columns in rbindlist(). When I wrote rbindlist I
needed it quickly for something I was doing, which didn't use factor
columns.

Please file this as a bug report. It should be fairly easy to
implement, and quick in C. It would populate the column as if it were
character (without actually converting that column to a new character
vector for each item of the list) and then call factor() at R level
afterwards to refactor it.
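
A rough R-level sketch of that idea, for illustration only. It uses base
R, and unlike the C approach described above it does build intermediate
character vectors. The names items, codes, labels and combined are
hypothetical and not part of data.table.

```r
# Two hypothetical list items whose factor columns have different level sets.
items <- list(
  data.frame(Val = 1:2, FactorCol = factor(c("a", "b"))),
  data.frame(Val = 3:4, FactorCol = factor(c("c", "a")))
)

# What goes wrong: concatenating the underlying integer codes loses the
# labels, because each item's codes refer to that item's own levels.
codes <- unlist(lapply(items, function(x) as.integer(x$FactorCol)))

# Sketch of the proposed fix: gather the labels as character across all
# items, then call factor() once on the combined vector.
labels <- unlist(lapply(items, function(x) as.character(x$FactorCol)))
combined <- factor(labels)

print(codes)
print(combined)
```

Here "b" in the first item and "c" in the second share the code 2, so the
concatenated codes cannot tell them apart, while the refactored column
keeps all three labels.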

Matthew 

On 28.03.2013 17:34, Ricardo Saporta wrote: 

> My apologies, I had a mistake in my previous email. (I forgot that
> data.table does not coerce strings to factor.)
> It looks like the `rbindlist` behavior observed occurs for _BOTH_ a
> list of data.tables and a list of data.frames (assuming, of course,
> that there is a factor column present).
> # sample data, using data.frame
> set.seed(1)
> sampleList.DF
> data.frame(Val1=rnorm(3), Val2=runif(3), FactorCol=factor(L)) )
> sampleList.DF
> {x$StringCol
> # sample data, using data.table
> set.seed(1)
> sampleList.DT
> data.table(Val1=rnorm(3), Val2=runif(3), FactorCol=factor(L)) )
> sampleList.DT
> x[, StringCol := as.character(FactorCol)])
> # rbindlist results:
> rbindlist(sampleList.DT)
> rbindlist(sampleList.DF)
> # expected behavior similar to do.call(rbind, LIST)
> do.call(rbind, sampleList.DF)
> do.call(rbind, sampleList.DT)
> 
> On Thu, Mar 28, 2013 at 12:52 PM, Ricardo Saporta <saporta at scarletmail.rutgers.edu [1]> wrote:
> 
>> Hello,
>> I found that when using `rbindlist` on a list of data.frames with
>> factor columns, the factor column is getting concat'd as its numeric
>> equivalent.
>> This, of course, does not happen when using a list of data.tables.
>> # sample data, using data.frame
>> sampleList.DF
>> data.frame(Val1=rnorm(3), Val2=runif(3), FactorCol=L) )
>> sampleList.DF
>> {x$StringCol
>> # sample data, using data.table
>> sampleList.DT
>> data.table(Val1=rnorm(3), Val2=runif(3), FactorCol=L) )
>> sampleList.DT
>> x[, StringCol := as.character(FactorCol)])
>> # Compare the column `FactorCol`:
>> rbindlist(sampleList.DT)
>> rbindlist(sampleList.DF)
>> do.call(rbind, sampleList.DF)
>> Interestingly, I originally thought it was levels dependent:
>> (I would have expected, for example, the following to allow for the
>> levels of the third list element, but it does not).
>> sampleList.DF[[1]][, "FactorCol"]
>>
>> # all the levels in third element are present in the first
>> all(levels(sampleList.DF[[3]][, "FactorCol"]) %in% levels(sampleList.DF[[1]][, "FactorCol"]))
>> # [1] TRUE 
>> But... 
>> rbindlist(sampleList.DF)
>> However:
>> sampleList.DF[[1]][, "FactorCol"]
>> rbindlist(sampleList.DF)
>>
>> Is the above behavior intended?
>> Cheers, 
>> Rick

 

Links:
------
[1] mailto:saporta at scarletmail.rutgers.edu