[datatable-help] data.table segfaulting, need help verifying the reason

Chris Neff caneff at gmail.com
Wed Sep 11 11:17:50 CEST 2013


Indeed, it shows that k1 and k2 both have names of length 2, and both times
the value of names is just the variable names.

The names are being added by apply. The issue with data.table is that it
does not discard names from short variables when it recycles them. I now
have a small reproducible example I can share:

library(data.table)

d <- data.frame(x=1:5)

f <- function(x) {data.table(x=x, y=1:10)}

l <- apply(d, 1, f)

lapply(l, function(x) lapply(x, names)) # All values of x have a name

a <- rbindlist(l) # a$x will segfault after this
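
To see where the names come from without involving data.table at all, here
is a minimal base-R sketch. The variable name row_names is only for
illustration. It checks what apply hands to the function for each row of a
one-column data.frame:

```r
# One-column data.frame, as in the example above
d <- data.frame(x = 1:5)

# Collect the names of the vector that apply passes in for each row.
# apply converts d to a matrix first, so each row arrives as a
# length-1 vector carrying the column name.
row_names <- apply(d, 1, function(row) names(row))

print(row_names)
```

If every element comes back as "x", then each value passed to f carries a
name, which is what ends up being recycled inside data.table.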


The underlying issue is what data.table and data.frame do with rownames and
recycling. Look at this simple case:

x <- 1:5
names(x) <- letters[1:5]

df <- data.frame(x=x, y=1:10)
#Warning message:
#  In data.frame(x = x, y = 1:10) :
#  row names were found from a short variable and have been discarded

lapply(df, names) # no names

dt <- data.table(x=x, y=1:10) # No warning

lapply(dt, names) # x has names, and they get recycled.


So data.table needs to follow data.frame logic for discarding row names
when they would otherwise be recycled.
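
Until that changes, one possible workaround is to strip the names before
the vector reaches data.table. This is only a sketch under my own
assumptions: f_unnamed is a hypothetical variant of f above, and I have not
confirmed that it avoids the segfault on every version.

```r
library(data.table)

d <- data.frame(x = 1:5)

# Hypothetical variant of f: drop the names that apply attached,
# so there is nothing for data.table to recycle.
f_unnamed <- function(x) {
  data.table(x = unname(x), y = 1:10)
}

l <- apply(d, 1, f_unnamed)

# No column in any of the tables should carry names now
lapply(l, function(tbl) lapply(tbl, names))

a <- rbindlist(l)
```

unname is base R, so this does not depend on any change to data.table.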


Bug submitted here:
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=4890&group_id=240&atid=975
I'm surprised this has never arisen before; it seems like something that
has been around forever.