[datatable-help] data.table segfaulting, need help verifying the reason

Frank Erickson FErickson at psu.edu
Wed Sep 11 15:52:00 CEST 2013


@Chris: If your application is like the example given, you might consider
using

CJ(x=1:5,y=1:10)

which is a data.table analogue to

expand.grid(x=1:5,y=1:10)

that automatically sets a key of c("x","y") on the result.
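
For what it's worth, here is a minimal sketch of the difference, assuming
data.table is installed. The names grid_dt and grid_df are only for
illustration and do not come from your code.

```r
library(data.table)

# CJ() builds the cross join as a data.table and keys it by all columns
grid_dt <- CJ(x = 1:5, y = 1:10)
key(grid_dt)    # "x" "y"
nrow(grid_dt)   # 50

# expand.grid() gives the same 50 combinations as a plain data.frame;
# its rows vary x fastest, whereas the keyed CJ() result is sorted by x, then y
grid_df <- expand.grid(x = 1:5, y = 1:10)
nrow(grid_df)   # 50
```

Since no named vector is recycled here, this should also sidestep the
names problem discussed below.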

--Frank


On Wed, Sep 11, 2013 at 5:55 AM, Chris Neff <caneff at gmail.com> wrote:

> Oh okay, sorry. Either way it is more than just a slight improvement :)
>  But yes that would fix everything.
>
>
> On Wed, Sep 11, 2013 at 5:33 AM, Arunkumar Srinivasan <
> aragorn168b at gmail.com> wrote:
>
>>  Chris,
>> It's not filed as a FR, IIRC. It's filed under "Internals".
>>
>> Arun
>>
>> On Wednesday, September 11, 2013 at 11:31 AM, Chris Neff wrote:
>>
>> Yes, dropping names altogether in data.table would fix this, and would be
>> the cleanest thing overall since, as is said in that thread, data.table
>> doesn't really work with rownames in mind anyway.
>>
>> Except it is less of an FR now and more of a bad bug, because you can get
>> segfaults from it.
>>
>>
>> On Wed, Sep 11, 2013 at 5:24 AM, Arunkumar Srinivasan <
>> aragorn168b at gmail.com> wrote:
>>
>>  Most likely, this
>> (https://r-forge.r-project.org/tracker/index.php?func=detail&aid=4882&group_id=240&atid=5335),
>> when fixed, will take care of it?
>>
>> Arun
>>
>> On Wednesday, September 11, 2013 at 11:17 AM, Chris Neff wrote:
>>
>> Indeed, it shows that k1 and k2 both have names of length 2, and both
>> times the value of names is just the variable names.
>>
>> The names are being added by apply. The issue with data.table is that it
>> does not ignore names from short variables. I now have a small
>> reproducible example I can share:
>>
>> d <- data.frame(x=1:5)
>>
>> f <- function(x) {data.table(x=x, y=1:10)}
>>
>> l <- apply(d, 1, f)
>>
>> lapply(l, function(x) lapply(x, names)) # All values of x have a name
>>
>> a <- rbindlist(l) # a$x will segfault after this
>>
>>
>> The underlying issue is what data.table and data.frame do with rownames
>> and recycling. Look at this simple case:
>>
>> x <- 1:5
>> names(x) <- letters[1:5]
>>
>> df <- data.frame(x=x, y=1:10)
>> #Warning message:
>> #  In data.frame(x = x, y = 1:10) :
>> #  row names were found from a short variable and have been discarded
>>
>> lapply(df, names) # no names
>>
>> dt <- data.table(x=x, y=1:10) # No warning
>>
>> lapply(dt, names) # x has names, and they get recycled.
>>
>>
>> So data.table needs to follow data.frame logic for discarding row names
>> when they would otherwise be recycled.
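
In the meantime, one possible workaround is to strip the names with base
R's unname() before the vector reaches data.table(). This is only a sketch
on the user side, not a fix inside data.table, and the argument name row
is just for illustration.

```r
library(data.table)

# Simple case: drop the names before the short vector is recycled
x <- 1:5
names(x) <- letters[1:5]
dt <- data.table(x = unname(x), y = 1:10)
lapply(dt, names)   # neither column carries names

# Same idea applied to the apply() example from the report
d <- data.frame(x = 1:5)
f <- function(row) data.table(x = unname(row), y = 1:10)
l <- apply(d, 1, f)
a <- rbindlist(l)
nrow(a)             # 50
```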
>>
>>
>> Bug submitted here:
>> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=4890&group_id=240&atid=975
>> I'm surprised this has never arisen before; it seems like something that
>> has been around forever.
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help