[datatable-help] data.table segfaulting, need help verifying the reason

Arunkumar Srinivasan aragorn168b at gmail.com
Wed Sep 11 11:33:06 CEST 2013


Chris, 
It's not filed as a FR, IIRC. It's filed under "Internals".

Arun


On Wednesday, September 11, 2013 at 11:31 AM, Chris Neff wrote:

> Yes, dropping names altogether in data.table would fix this, and would be the cleanest thing overall since as is said in that thread data.table doesn't really work with rownames in mind anyway.
> 
> Except it is less of a FR now and more of a bad bug because you can get segfaults from it.
> 
> 
> On Wed, Sep 11, 2013 at 5:24 AM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
> > Most likely, this (https://r-forge.r-project.org/tracker/index.php?func=detail&aid=4882&group_id=240&atid=5335), when fixed, will take care of it? 
> > 
> > Arun
> > 
> > 
> > On Wednesday, September 11, 2013 at 11:17 AM, Chris Neff wrote:
> > 
> > 
> > 
> > > Indeed, it shows that k1 and k2 both have names of length 2, and both times the value of names is just the variable names.
> > > 
> > > The names are being added by apply. The issue with data.table is that it does not discard names from short variables. I now have a small reproducible example I can share: 
> > > 
> > > library(data.table)
> > > 
> > > d <- data.frame(x=1:5)
> > > f <- function(x) {data.table(x=x, y=1:10)}
> > > l <- apply(d, 1, f) # apply() passes each row to f as a named vector, e.g. c(x = 1)
> > > lapply(l, function(x) lapply(x, names)) # All values of x have a name 
> > > a <- rbindlist(l) # a$x will segfault after this
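A possible workaround until the bug is fixed, sketched under the assumption that the names on `x` carry no information here: strip them with base R's `unname()` inside `f`, so `data.table()` never receives a named short column. This is not a fix in data.table itself, only a way to avoid triggering the problem.

```r
library(data.table)

d <- data.frame(x = 1:5)

# apply() hands each row to f as a named vector such as c(x = 1L).
# unname() removes that names attribute before x is recycled to 10 rows,
# so no recycled names end up in the data.table.
f <- function(x) data.table(x = unname(x), y = 1:10)

l <- apply(d, 1, f)
a <- rbindlist(l)

lapply(a, names)  # names should be NULL for both columns
```

Iterating with `lapply(d$x, f)` instead of `apply()` should also avoid the names, even with the original `f`, because the elements of `d$x` are unnamed.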
> > > 
> > > 
> > > The underlying issue is what data.table and data.frame do with rownames and recycling. Look at this simple case: 
> > > 
> > > x <- 1:5
> > > names(x) <- letters[1:5]
> > > 
> > > df <- data.frame(x=x, y=1:10) 
> > > #Warning message:
> > > #  In data.frame(x = x, y = 1:10) :
> > > #  row names were found from a short variable and have been discarded
> > > 
> > > lapply(df, names) # no names
> > > 
> > > dt <- data.table(x=x, y=1:10) # No warning
> > > 
> > > lapply(dt, names) # x has names, and they get recycled.
> > > 
> > > 
> > > So data.table needs to follow data.frame logic for discarding row names when they would otherwise be recycled. 
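A rough sketch of that data.frame rule applied by hand before calling `data.table()`. `drop_recycled_names()` is a hypothetical helper, not part of data.table; it only reproduces the discarding step, stripping names from every column shorter than the longest one (i.e. the columns that will be recycled).

```r
library(data.table)

# Hypothetical helper (not a data.table function): mimic data.frame() by
# dropping the names of any "short variable" that would be recycled.
# Columns of maximal length keep their names.
drop_recycled_names <- function(...) {
  cols <- list(...)
  n <- max(lengths(cols))
  lapply(cols, function(col) if (length(col) < n) unname(col) else col)
}

x <- 1:5
names(x) <- letters[1:5]

# do.call() passes the cleaned, still-named list as data.table() arguments.
dt <- do.call(data.table, drop_recycled_names(x = x, y = 1:10))
lapply(dt, names)  # x should no longer carry recycled names
```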
> > > 
> > > 
> > > Bug submitted here: https://r-forge.r-project.org/tracker/index.php?func=detail&aid=4890&group_id=240&atid=975
> > > 
> > > I'm surprised this has never come up before; it seems like something that has been around forever.
> > > 
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > datatable-help mailing list
> > > datatable-help at lists.r-forge.r-project.org
> > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> 


