[datatable-help] rbinding an empty data.table and a non-empty data.table

Matthew Dowle mdowle at mdowle.plus.com
Wed Nov 7 23:46:54 CET 2012


Well spotted. It's a new bug. rbind calls rbindlist internally, so that fits.

rbindlist's result has column types taken from the first non-NULL item.
I'd coded for that at least; see the "First non-NULL" comment in the C
source:

https://r-forge.r-project.org/scm/viewvc.php/pkg/src/rbindlist.c?view=markup&root=datatable

But the types _should_ be taken from the first _non-empty_ data.table
item, since it's read.csv that's creating the first data.frame with
empty logical(0) columns:

> sapply(read.csv("~/tmp/new.csv"),class)
        A         B
"logical" "logical"
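
A minimal base R sketch of why the bound values show up as TRUE. It
assumes read.csv(text = ...) behaves like the header-only file above,
and that the numeric values from the second file are coerced to the
logical type of the first item. It does not reproduce rbindlist's
internals.

```r
# Base R only; read.csv(text = ...) stands in for the header-only file.
empty <- read.csv(text = "A,B")
col_classes <- sapply(empty, class)   # both columns come back as logical
print(col_classes)

# Coercing the numeric data from the second file to logical:
coerced <- as.logical(c(1, 2))        # any non-zero number becomes TRUE
print(coerced)
```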

All known bugs had just been cleared, sigh ;)  Will fix and include in
release to CRAN ...
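
Until a fixed release is available, one possible workaround is to drop
the zero-row items before binding. This is only a sketch: it assumes
data.table is installed, bind_non_empty is a hypothetical helper name
(not part of data.table), and the list holds at least one item.

```r
library(data.table)

# Hypothetical helper: bind only the items that actually have rows.
bind_non_empty <- function(items) {
  # NROW() returns 0 for NULL, so NULL items are dropped as well.
  non_empty <- Filter(function(d) NROW(d) > 0L, items)
  # If every item is empty, return the first one so column names survive.
  if (length(non_empty) == 0L) return(items[[1L]])
  rbindlist(non_empty)
}

empty  <- as.data.table(read.csv(text = "A,B"))
filled <- as.data.table(read.csv(text = "A,B\n1,2"))
res <- bind_non_empty(list(empty, filled))
print(sapply(res, class))
```

The column types then come from the first item that has data, which is
the behaviour described above as the intended one.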

Thanks!

Matthew


> Thanks. I actually discovered it using rbindlist; it suffers the same
> problem.
>
> Garrett
>
>
> On Wed, Nov 7, 2012 at 4:07 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>
>> Not sure, but try data.table::rbindlist like this (it should be faster
>> too):
>>
>> rbindlist(lapply(c("~/tmp/new.csv", "~/tmp/new2.csv"), function(x)
>> as.data.table(read.csv(x))))
>>
>> ~Malcolm
>>
>> *From:* datatable-help-bounces at lists.r-forge.r-project.org [mailto:
>> datatable-help-bounces at lists.r-forge.r-project.org] *On Behalf Of *G See
>> *Sent:* Wednesday, November 07, 2012 3:53 PM
>> *To:* datatable-help at lists.r-forge.r-project.org
>> *Subject:* [datatable-help] rbinding an empty data.table and a non-empty
>> data.table
>>
>> When I try to rbind an empty data.table to a non-empty data.table, all
>> my data are converted to logical. Here's an example:
>>
>> # create a directory and put 2 csv files in it
>> dir.create("~/tmp")
>> # this csv only has headers; no data
>> system("echo 'A,B' > ~/tmp/new.csv")
>> # this one has a header and 1 row
>> write.csv(data.frame(A=1, B=2), row.names=FALSE, file='~/tmp/new2.csv')
>>
>> lapply(c("~/tmp/new.csv", "~/tmp/new2.csv"), read.csv)
>> #[[1]]
>> #[1] A B
>> #<0 rows> (or 0-length row.names)
>> #
>> #[[2]]
>> #  A B
>> #1 1 2
>>
>> # now rbind them, and we're left with the data from the non-empty csv
>> do.call(rbind, lapply(c("~/tmp/new.csv", "~/tmp/new2.csv"),
>> read.csv))
>> #  A B
>> #1 1 2
>>
>> # Now let's try to work with data.tables instead of data.frames
>> lapply(c("~/tmp/new.csv", "~/tmp/new2.csv"), function(x)
>> as.data.table(read.csv(x)))
>> #[[1]]
>> #Empty data.table (0 rows) of 2 cols: A,B
>> #
>> #[[2]]
>> #   A B
>> #1: 1 2
>>
>> # Ok, but look at what happens when we rbind them
>> do.call(rbind, lapply(c("~/tmp/new.csv", "~/tmp/new2.csv"), function(x)
>> as.data.table(read.csv(x))))
>>       A    B
>> 1: TRUE TRUE
>>
>> What's going on here?
>>
>> Thanks,
>> Garrett
>>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


