[datatable-help] Random segfaults

Chris Neff caneff at gmail.com
Fri Dec 16 16:36:27 CET 2011


Sorry, but this is crashing too often for me to keep trying with it. I
can't get a reliably reproducible example, so I'll just describe the
circumstances that seem to make it happen:

1) It always seems to need a really large dataset. For instance, mine
was about 2.5 million rows and 20 columns.
2) My dataset's key is a factor that is unique to every row, so a
factor with 2.5 million levels (I don't know if that matters, but I'm
throwing it out there).
3) Crashes seem to happen most often when creating a new column and,
bizarrely, when using ggplot. A lot of crashes happen when I try to
plot subsets of the data with ggplot.

As one more piece of data, I tried taking a subset of my data.table
and running str on that subset, so

d <- DT[x<10]

str(d)

and got this error:

 *** caught segfault ***
address (nil), cause 'unknown'

Traceback:
 1: encodeString(lev.att, na.encode = FALSE, quote = "\"")
 2: str.default(object[[i]], nest.lev = nest.lev + 1, indent.str = paste(indent.str,     ".."), nchar.max = nchar.max, max.level = max.level, vec.len = vec.len,     digits.d = digits.d, give.attr = give.attr, give.head = give.head,     give.length = give.length, width = width, envir = envir,     list.len = list.len)
 3: str(object[[i]], nest.lev = nest.lev + 1, indent.str = paste(indent.str,     ".."), nchar.max = nchar.max, max.level = max.level, vec.len = vec.len,     digits.d = digits.d, give.attr = give.attr, give.head = give.head,     give.length = give.length, width = width, envir = envir,     list.len = list.len)
 4: str.default(d, give.length = FALSE)
 5: NextMethod("str", give.length = FALSE, ...)
 6: str.data.frame(d)
 7: str(d)


Once again it is hard to reproduce though.

At this point I have to get some real work done, so I'm reverting to
1.7.1 until someone comes up with a new fix or something else for me to try.

On 16 December 2011 10:20, Chris Neff <caneff at gmail.com> wrote:
> Just posting things as I find them.  I run my script (and it gets
> through with no complaints), but then I try to modify it slightly
> more, like:
>
> DT[, w := x*y]
>
> where x,y are both integer columns of DT (and w doesn't previously
> exist), and I get the following:
>
> Error in match(as.vector(x), y, 0L) :
>  'translateCharUTF8' must be called on a CHARSXP
>
> If I then try to print DT again, I get the same error as in my
> previous message:
>
> Error in do.call("cbind", lapply(x, format, justify = justify, ...)) :
>  'getCharCE' must be called on a CHARSXP
>
>
> The problem is I can't get this to reproduce on simpler code, so I
> just have to tell you what I see when I see it.
>
> On 16 December 2011 09:38, Chris Neff <caneff at gmail.com> wrote:
>> On the latest SVN build, with debugging enabled as described below,
>> I get the following when I try merely to print the contents of a
>> data.table:
>>
>> Error in do.call("cbind", lapply(x, format, justify = justify, ...)) :
>>   'getCharCE' must be called on a CHARSXP
>>
>> I never saw this error without debugging. I tried printing a few
>> times in a row, got the same error each time, and then on about the
>> 4th try it segfaulted.
>>
>> Having a hard time reproducing that, but at least it is something?
>>
>>
>> On 15 December 2011 15:05, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>
>>> One thought: how about turning on debugging? That way, when it
>>> crashes, you can at least report the file and line number. Btw, I've
>>> installed 2.12.0 on 64bit in case that reproduced it, but it still
>>> works OK for me, as do 32bit 2.12.0 and both 32 and 64bit 2.14.0. So
>>> we're left with you debugging at your end, but that should be fairly
>>> easy ...
>>>
>>> sudo MAKEFLAGS='CFLAGS=-O0\ -g\ -Wall\ -pedantic' R CMD INSTALL data.table_1.7.7.tar.gz
>>>
>>> R -d gdb
>>>
>>> run
>>>
>>> Do the stuff that crashes it.  Does it report a C file and line number?
>>>
>>> Just to rule out possible svn / R CMD build strangeness, please also
>>> use the data.table_1.7.7.tar.gz that's on CRAN. CRAN still hasn't run
>>> its checks for 1.7.7, so I'm on tenterhooks for that.
>>>
>>>
>>>
>>> On Thu, 2011-12-15 at 12:26 -0500, Chris Neff wrote:
>>>> Just to come back, it still crashes at seemingly random times.   I'm
>>>> reverting back to an earlier version (1.7.1) to see if that fixes my
>>>> problem.
>>>>
>>>> On 15 December 2011 11:08, Chris Neff <caneff at gmail.com> wrote:
>>>> > It's an internal build of R, and I can't upgrade until they do. I
>>>> > think it's unlikely we'll see 2.14 any time soon.
>>>> >
>>>> > On 15 December 2011 10:50, Steve Lianoglou
>>>> > <mailinglist.honeypot at gmail.com> wrote:
>>>> >> Hi,
>>>> >>
>>>> >> Out of curiosity, is it impossible for you to upgrade R to the latest version?
>>>> >>
>>>> >> -steve
>>>> >>
>>>> >>
>>>> >> On Thu, Dec 15, 2011 at 10:42 AM, Chris Neff <caneff at gmail.com> wrote:
>>>> >>> I always use svn up. I'll reboot and reinstall just to make sure. As
>>>> >>> for a reproducible example, it still doesn't seem to crash in any
>>>> >>> consistent place, but I'll give it a stronger try with a test data set.
>>>> >>>
>>>> >>> All 480 tests in test.data.table() completed ok in 7.395sec
>>>> >>> R version 2.12.1 (2010-12-16)
>>>> >>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>> >>>
>>>> >>> locale:
>>>> >>>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C              LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>>>> >>>  [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8    LC_PAPER=en_US.utf8       LC_NAME=C
>>>> >>>  [9] LC_ADDRESS=C              LC_TELEPHONE=C            LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>>> >>>
>>>> >>> attached base packages:
>>>> >>> [1] stats     graphics  grDevices utils     datasets  grid      methods   base
>>>> >>>
>>>> >>> other attached packages:
>>>> >>> [1] hexbin_1.26.0      lattice_0.19-33    RColorBrewer_1.0-5 data.table_1.7.8   ggplot2_0.8.9      reshape_0.8.4
>>>> >>> [6] plyr_1.6
>>>> >>>
>>>> >>> On 15 December 2011 09:52, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>> >>>>
>>>> >>>> And you did an 'svn up' (or equivalent)? Grabbing the daily tar.gz
>>>> >>>> snapshot from R-Forge won't include the fix yet. So svn up, then R CMD
>>>> >>>> build, then R CMD INSTALL, right? (Just checking quick basics first.)
>>>> >>>>
>>>> >>>>> Please send the result of test.data.table() and sessionInfo(), and
>>>> >>>>> confirm it's a clean install after a reboot, to make sure no old .so
>>>> >>>>> is still knocking around somehow. Definitely installed to the right
>>>> >>>>> library? If it's crashing a lot, then it should be reproducible?
>>>> >>>>> Still waiting for CRAN check results for 1.7.7 in old-rel. If it's not
>>>> >>>>> fixed there either, that'll help to know....
>>>> >>>>>
>>>> >>>>>> Latest SVN version, no alloccol set, still crashing a lot. I don't
>>>> >>>>>> use [<- or $<-; the only times I modify a data.table are with := or
>>>> >>>>>> by doing DT=merge(DT,blah).
>>>> >>>>>>
>>>> >>>>>> Any more info I can provide?
>>>> >>>>>>
>>>> >>>>>> On 15 December 2011 08:32, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>> >>>>>>> Great, fingers and toes crossed. If you could unset the alloccol option
>>>> >>>>>>> just to be sure, please, that would be great. You're our best hope of
>>>> >>>>>>> confirming it's fixed, since it was biting you several times an hour.
>>>> >>>>>>> If you use [<- or $<- syntax, then R will copy via *tmp*, and at that
>>>> >>>>>>> point the *tmp* data.table is similar to a data.table loaded from disk
>>>> >>>>>>> in that it isn't over-allocated anymore, I realised. Also, a copy()
>>>> >>>>>>> will lose over-allocation until the next column addition. That
>>>> >>>>>>> 'should' all be fine now in both <=2.13.2 and >=2.14.0, although the
>>>> >>>>>>> bug was something simpler.
>>>> >>>>>>>
>>>> >>>>>>> 1.7.7 is on CRAN now and has been built for Windows, so if the CRAN
>>>> >>>>>>> check results tick over from "ERROR" to "OK" later today (for both
>>>> >>>>>>> Windows and Mac old-rel), and you're OK too, then it's fixed.
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>> I've updated to the latest SVN version, and I'll be sure to let you
>>>> >>>>>>>> know if it still crashes (however I do have the alloccol option set to
>>>> >>>>>>>> 1000, so I shouldn't be bumping into reallocation very often). Thanks
>>>> >>>>>>>> for finding the bug so fast!
>>>> >>>>>>>>
>>>> >>>>>>>> On 14 December 2011 19:56, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>> Hm. Sounds like it could be a different problem then, if it was in R
>>>> >>>>>>>>> 2.14. There have been quite a few fixes since 1.7.4, so if you can
>>>> >>>>>>>>> reproduce with 1.7.7 that would be great. Or: we've sometimes seen
>>>> >>>>>>>>> that, just after a package upgrade, a clean re-install can fix things.
>>>> >>>>>>>>> Perhaps the .so was in use by another R process or a zombie, or
>>>> >>>>>>>>> something. R seems to report data.table v1.7.4 (say), but it hasn't
>>>> >>>>>>>>> fully installed it properly and is still (perhaps partially) at 1.7.3.
>>>> >>>>>>>>> So quit all R sessions (reboot to clear zombies too, perhaps) and try
>>>> >>>>>>>>> reinstalling using R CMD INSTALL. Next time it happens, I mean. You
>>>> >>>>>>>>> can also run test.data.table() to check the install.
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Wed, 2011-12-14 at 17:40 +0000, Timothée Carayol wrote:
>>>> >>>>>>>>>> Hi --
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I was having many unreproducible bugs with R 2.14, data.table 1.7.4
>>>> >>>>>>>>>> and Ubuntu 64-bit about 10 days ago. Data was getting corrupted,
>>>> >>>>>>>>>> and then R crashed. I had to go back to data.frame for the bits of
>>>> >>>>>>>>>> code affected. I was doing a lot of rather unsafe manipulations
>>>> >>>>>>>>>> with row names, rbinds and cbinds.
>>>> >>>>>>>>>> I didn't file a report or flag it, as it was occurring seemingly
>>>> >>>>>>>>>> at random, and I was doing operations that aren't really what
>>>> >>>>>>>>>> data.table was made for (tons of little manipulations on small
>>>> >>>>>>>>>> data); still, I guess I should now report that 2.14 didn't fix
>>>> >>>>>>>>>> everything for me. I do not know whether the bugs persist in
>>>> >>>>>>>>>> post-1.7.4 versions.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> t
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> On Wed, Dec 14, 2011 at 5:31 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>> >>>>>>>>>> >
>>>> >>>>>>>>>> > Maybe, worth a try. Are you loading any data.table objects from disk?
>>>> >>>>>>>>>> >
>>>> >>>>>>>>>> >> 64 bit 2.12.1 linux.
>>>> >>>>>>>>>> >>
>>>> >>>>>>>>>> >> Is there an option I can set in my session in order to work
>>>> >>>>>>>>>> >> around the truelength issue? I don't care if I lose some of
>>>> >>>>>>>>>> >> the over-allocation niceties if it stops things from
>>>> >>>>>>>>>> >> crashing. Looking at the truelength help, would just doing:
>>>> >>>>>>>>>> >>
>>>> >>>>>>>>>> >> options(datatable.alloc=quote(1000))
>>>> >>>>>>>>>> >>
>>>> >>>>>>>>>> >> stop this? I never have more than about 50 columns at a time.
>>>> >>>>>>>>>> >>
>>>> >>>>>>>>>> >> On 14 December 2011 11:43, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>>> >>>>>>>>>> >>>
>>>> >>>>>>>>>> >>> You're R < 2.14.0, right?  I'm really struggling in R < 2.14.0 to
>>>> >>>>>>>>>> >>> make over-allocation work because R only started to initialize
>>>> >>>>>>>>>> >>> truelength to 0 in R 2.14.0+. Before that it's uninitialized
>>>> >>>>>>>>>> >>> (random). Trouble is, my attempts in R < 2.14.0 to work around that
>>>> >>>>>>>>>> >>> work fine for me in Linux 32bit when I test in R 2.13.2, and I even
>>>> >>>>>>>>>> >>> test in 2.12.0 too. I test on 64bit too, but just 2.14.0. CRAN is
>>>> >>>>>>>>>> >>> also showing errors on 2.13.2 (old-rel) for both Mac and Windows.
>>>> >>>>>>>>>> >>>
>>>> >>>>>>>>>> >>> So, this is a pre-2.14.0 (only) problem that I'll continue to try
>>>> >>>>>>>>>> >>> to fix.
>>>> >>>>>>>>>> >>>
>>>> >>>>>>>>>> >>> Are you 64bit pre-2.14.0? Which OS?  If you are on 64bit Linux,
>>>> >>>>>>>>>> >>> then it adds weight to me installing pre-2.14.0 on my 64bit
>>>> >>>>>>>>>> >>> instance in an effort to reproduce.
>>>> >>>>>>>>>> >>>
>>>> >>>>>>>>>> >>>
>>>> >>>>>>>>>> >>>> This will be a crappy help request because I can't seem to
>>>> >>>>>>>>>> >>>> reproduce it, but the past few days I've been getting a lot of
>>>> >>>>>>>>>> >>>> segfaults. The only common thing between every crash is that it
>>>> >>>>>>>>>> >>>> happens when I do
>>>> >>>>>>>>>> >>>>
>>>> >>>>>>>>>> >>>> DT[, z := x]
>>>> >>>>>>>>>> >>>>
>>>> >>>>>>>>>> >>>> where z was not a column that existed in DT before, and x is
>>>> >>>>>>>>>> >>>> either an existing column of DT or a separate variable, doesn't
>>>> >>>>>>>>>> >>>> matter. Beyond that I can't reproduce a set of steps that gets R
>>>> >>>>>>>>>> >>>> to crash. This is with the latest SVN version.
>>>> >>>>>>>>>> >>>>
>>>> >>>>>>>>>> >>>> Is there more information I can provide to help track this down?
>>>> >>>>>>>>>> >>>> _______________________________________________
>>>> >>>>>>>>>> >>>> datatable-help mailing list
>>>> >>>>>>>>>> >>>> datatable-help at lists.r-forge.r-project.org
>>>> >>>>>>>>>> >>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Steve Lianoglou
>>>> >> Graduate Student: Computational Systems Biology
>>>> >>  | Memorial Sloan-Kettering Cancer Center
>>>> >>  | Weill Medical College of Cornell University
>>>> >> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>>

