[datatable-help] Random segfaults
Matthew Dowle
mdowle at mdowle.plus.com
Fri Dec 16 16:43:39 CET 2011
Great, thanks. I have seen this quite a bit; see FAQ 4.3. It indicates that
an earlier memory corruption happened, which could have been at any point.
It has nothing to do with locale or CHARSXP. The next step is to follow all
the steps in section 4.3 of R-exts: turn on gctorture, --use-gct,
--enable-strict-barrier and, especially, valgrind. The goal is to detect
where the earlier corruption is happening.
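To illustrate why the place where it crashes says little about the cause, here is a minimal C sketch. It is not R or data.table code: the names `arena`, `arena_reset`, `write_name` and `sum_counts` are made up for the illustration, and the two "allocations" live inside one array so that the overrun stays well defined. The bad write happens in one call and is only noticed by a later, innocent call.

```c
#include <stddef.h>
#include <string.h>

/* One arena standing in for the heap: bytes 0..7 belong to a "name"
 * buffer, bytes 8..15 to a "counts" buffer. Keeping both regions in a
 * single array keeps the overrun well defined for this demo. */
enum { NAME_LEN = 8, COUNTS_LEN = 8, ARENA_LEN = NAME_LEN + COUNTS_LEN };

static unsigned char arena[ARENA_LEN];

/* Reset the arena so that the name is empty and every count is 1. */
void arena_reset(void) {
    memset(arena, 0, NAME_LEN);
    memset(arena + NAME_LEN, 1, COUNTS_LEN);
}

/* Buggy writer: copies n bytes into the name region without checking
 * n <= NAME_LEN. With n > NAME_LEN it silently overwrites the counts. */
void write_name(const char *src, size_t n) {
    if (n > ARENA_LEN) n = ARENA_LEN;   /* stay inside the demo arena */
    memcpy(arena, src, n);
}

/* Innocent reader, called later: it only sums the counts, yet this is
 * where the damage first becomes visible. */
int sum_counts(void) {
    int total = 0;
    for (size_t i = 0; i < COUNTS_LEN; i++) total += arena[NAME_LEN + i];
    return total;
}
```

After `arena_reset()`, a call such as `write_name("abcdefghXY", 10)` returns without any complaint, and it is only the later `sum_counts()` that no longer returns 8. Tools like gctorture and valgrind aim to flag the problem nearer to the bad write than to the later read.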
On the tenterhooks front, 1.7.7 is now passing CRAN checks for oldrel (both
mac and windows) fully OK, which means the last fix definitely fixed the
problem I found. So that's some progress.
But since 1.7.7+ doesn't fix it for you, it means either:
i) you've found a new corruption that could happen in 2.14.0+ too,
or,
ii) you've found a new problem in my workaround attempts for
uninitialized truelength in <=2.13.2. Tracking that down might turn up
unexpected information that leads to improvements in 2.14.0+ as well.
So either way it's worth following this trail, if you're ok to do so. Fast
techniques to debug the corruptions (e.g. valgrind) might come in handy in
future anyway.
Only other thought ... your special internal build of R ... does it
increase R_len_t on 64bit to allow vectors longer than 2^31, by any
chance? I've used R_len_t quite a bit in data.table to future proof for
when that happens, but if you've already done it in your build then that
would help to know, since afaik it has never been tested with
R_len_t != int on 64bit. I'm also assuming R_len_t is signed. If your R
has R_len_t as unsigned, I would need to know.
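As a rough illustration of why the signedness and width of the length type matter, here is a small C sketch. It does not include R's headers: `signed_len_t` and `unsigned_len_t` are hypothetical stand-ins for the two ways R_len_t could be defined, and the helper functions are made up for the example rather than taken from data.table.

```c
#include <limits.h>
#include <stdint.h>

typedef int signed_len_t;            /* stand-in: length type as signed int */
typedef unsigned int unsigned_len_t; /* stand-in: same width, but unsigned  */

/* Index of the last element, written the way code that assumes a signed
 * length would write it. For n == 0 this is -1, so a loop guarded by
 * "i <= last" simply does not run. */
long long last_index_signed(signed_len_t n) {
    return (long long)(n - 1);
}

/* The same expression with an unsigned length wraps around to UINT_MAX
 * for n == 0, so the same loop would walk far past the end. */
long long last_index_unsigned(unsigned_len_t n) {
    return (long long)(n - 1u);
}

/* A 64-bit length only survives being stored in an int while it fits.
 * Returns 1 if the value fits, 0 if narrowing would lose information. */
int fits_in_int(int64_t len) {
    return len >= INT_MIN && len <= INT_MAX;
}
```

With these definitions, `last_index_signed(0)` is -1 while `last_index_unsigned(0)` is `UINT_MAX`, and `fits_in_int` rejects any length above `INT_MAX`. Those are the cases that a build with a wider or unsigned length type would exercise for the first time.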
> On the current latest SVN build, with debugging enabled as listed
> below, I get the following when trying to even print the contents of a
> data.table:
>
> Error in do.call("cbind", lapply(x, format, justify = justify, ...)) :
> 'getCharCE' must be called on a CHARSXP
>
> Never saw this error without debugging. I tried printing a few times
> in a row, got this same error, and then like the 4th time it
> segfaulted.
>
> Having a hard time reproducing that, but at least it is something?
>
>
> On 15 December 2011 15:05, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>
>> One thought ... how about turning on debugging. That way when it crashes
>> at least you can report the file and line number. Btw, I've installed
>> 2.12.0 on 64bit in case that managed to reproduce, but it still works
>> for me ok as does 32bit 2.12.0, and both 32 and 64bit 2.14.0. So we're
>> left with you debugging at your end, but should be fairly easy ...
>>
>> sudo MAKEFLAGS='CFLAGS=-O0\ -g\ -Wall\ -pedantic' R CMD INSTALL data.table_1.7.7.tar.gz
>>
>> R -d gdb
>>
>> run
>>
>> Do the stuff that crashes it. Does it report a C file and line number?
>>
>> Just to rule out possible svn / R CMD build strangeness, please also use
>> the data.table_1.7.7.tar.gz that's on CRAN. It still hasn't run checks
>> for 1.7.7 so on tenterhooks for that.
>>
>>
>>
>> On Thu, 2011-12-15 at 12:26 -0500, Chris Neff wrote:
>>> Just to come back, it still crashes at seemingly random times. I'm
>>> reverting back to an earlier version (1.7.1) to see if that fixes my
>>> problem.
>>>
>>> On 15 December 2011 11:08, Chris Neff <caneff at gmail.com> wrote:
>>> > Internal build of R. Can't upgrade until they do. I think it is
>>> > unlikely to see 2.14 any time soon.
>>> >
>>> > On 15 December 2011 10:50, Steve Lianoglou
>>> > <mailinglist.honeypot at gmail.com> wrote:
>>> >> Hi,
>>> >>
>>> >> Out of curiosity, is it impossible for you to upgrade R to the
>>> >> latest, or?
>>> >>
>>> >> -steve
>>> >>
>>> >>
>>> >> On Thu, Dec 15, 2011 at 10:42 AM, Chris Neff <caneff at gmail.com> wrote:
>>> >>> I always use svn up. I'll reboot and reinstall just to make sure.
>>> >>> As for reproducible, it still doesn't seem to crash in any
>>> >>> consistent place but I'll give it a stronger try with a test data
>>> >>> set.
>>> >>>
>>> >>> All 480 tests in test.data.table() completed ok in 7.395sec
>>> >>> R version 2.12.1 (2010-12-16)
>>> >>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> >>>
>>> >>> locale:
>>> >>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
>>> >>> LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
>>> >>> [5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
>>> >>> LC_PAPER=en_US.utf8 LC_NAME=C
>>> >>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> >>> LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>> >>>
>>> >>> attached base packages:
>>> >>> [1] stats graphics grDevices utils datasets grid
>>> >>> methods base
>>> >>>
>>> >>> other attached packages:
>>> >>> [1] hexbin_1.26.0 lattice_0.19-33 RColorBrewer_1.0-5
>>> >>> data.table_1.7.8 ggplot2_0.8.9 reshape_0.8.4
>>> >>> [6] plyr_1.6
>>> >>>
>>> >>> On 15 December 2011 09:52, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>> >>>>
>>> >>>> And you did an 'svn up' (or equivalent)? Grabbing daily tar.gz
>>> >>>> snapshot from R-Forge won't include the fix yet. So svn up, then
>>> >>>> R CMD build, then R CMD INSTALL, right? (Just checking quick
>>> >>>> basics first).
>>> >>>>
>>> >>>>> Result of test.data.table(), sessionInfo() and confirm it's a
>>> >>>>> clean install after a reboot to make sure no old .so is still
>>> >>>>> knocking around somehow please. Definitely installed to the right
>>> >>>>> library? If it's crashing a lot then it should be reproducible?
>>> >>>>> Still waiting for CRAN check results for 1.7.7 in old-rel. If
>>> >>>>> it's not fixed there either that'll help to know....
>>> >>>>>
>>> >>>>>
>>> >>>>>> Latest SVN version, no alloccol set, still crashing a lot. I
>>> >>>>>> don't use [<- or $<-, the only times I modify a data.table are
>>> >>>>>> with := or by doing DT=merge(DT,blah).
>>> >>>>>>
>>> >>>>>> Any more info I can provide?
>>> >>>>>>
>>> >>>>>> On 15 December 2011 08:32, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>> >>>>>>> Great fingers and toes crossed. If you could unset alloccol
>>> >>>>>>> option just to be sure please, that would be great. You're our
>>> >>>>>>> best hope of confirming it's fixed since it was biting you
>>> >>>>>>> several times an hour. If you use [<- or $<- syntax then R will
>>> >>>>>>> copy via *tmp* and at that point the *tmp* data.table is
>>> >>>>>>> similar to a data.table loaded from disk in that it isn't
>>> >>>>>>> over-allocated anymore, I realised. Also a copy() will lose
>>> >>>>>>> over-allocation until the next column addition. That 'should'
>>> >>>>>>> all be fine now in both <=2.13.2 and >=2.14.0, although the bug
>>> >>>>>>> was something simpler.
>>> >>>>>>>
>>> >>>>>>> 1.7.7 is on CRAN now and been built for windows so if CRAN
>>> >>>>>>> check results tick over from "ERROR" to "OK" later today (for
>>> >>>>>>> both windows and mac old-rel), and, you're ok too, then it's
>>> >>>>>>> fixed.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>> I've updated to the latest SVN version, and I'll be sure to
>>> >>>>>>>> let you know if it still crashes (however I do have the
>>> >>>>>>>> alloccol option set to 1000, so I shouldn't be bumping into
>>> >>>>>>>> reallocation very often). Thanks for finding the bug so fast!
>>> >>>>>>>>
>>> >>>>>>>> On 14 December 2011 19:56, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> Hm. Sounds like it could be a different problem then if it
>>> >>>>>>>>> was in R 2.14. There have been quite a few fixes since 1.7.4
>>> >>>>>>>>> so if you can reproduce with 1.7.7 would be great. Or, we've
>>> >>>>>>>>> sometimes seen that just after a package upgrade that a clean
>>> >>>>>>>>> re-install can often fix things. Perhaps if the .so was in
>>> >>>>>>>>> use by another R process or a zombie, or something. R seems
>>> >>>>>>>>> to report data.table v1.7.4 (say) but it hasn't fully
>>> >>>>>>>>> installed it properly and is still (perhaps partially) at
>>> >>>>>>>>> 1.7.3. So quit all R (reboot to clear zombies too perhaps)
>>> >>>>>>>>> and try reinstalling using R CMD INSTALL. Next time it
>>> >>>>>>>>> happens I mean. Can also run test.data.table() to check the
>>> >>>>>>>>> install.
>>> >>>>>>>>>
>>> >>>>>>>>> On Wed, 2011-12-14 at 17:40 +0000, Timothée Carayol wrote:
>>> >>>>>>>>>> Hi --
>>> >>>>>>>>>>
>>> >>>>>>>>>> I have been having many unreproducible bugs with R 2.14,
>>> >>>>>>>>>> data.table 1.7.4 and ubuntu 64 bits about 10 days ago. Data
>>> >>>>>>>>>> was getting corrupted, and then R crashed. I had to go back
>>> >>>>>>>>>> to data.frame for the bits of code affected. I was doing a
>>> >>>>>>>>>> lot of rather unsafe manipulations with row names, rbind and
>>> >>>>>>>>>> cbinds.
>>> >>>>>>>>>> I didn't file a report, nor signal it, as it was occurring
>>> >>>>>>>>>> seemingly at random, and I was doing operations which aren't
>>> >>>>>>>>>> really what data.table was made for (tons of little
>>> >>>>>>>>>> manipulations on small data); still I guess I should now
>>> >>>>>>>>>> signal that 2.14 didn't fix everything for me. I do not know
>>> >>>>>>>>>> whether bugs subsist on post-1.7.4 versions.
>>> >>>>>>>>>>
>>> >>>>>>>>>> t
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Wed, Dec 14, 2011 at 5:31 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>> >>>>>>>>>> >
>>> >>>>>>>>>> > Maybe, worth a try. Are you loading any data.table objects
>>> >>>>>>>>>> > from disk?
>>> >>>>>>>>>> >
>>> >>>>>>>>>> >> 64 bit 2.12.1 linux.
>>> >>>>>>>>>> >>
>>> >>>>>>>>>> >> Is there an option I can set in my session in order to
>>> >>>>>>>>>> >> work around the truelength issue? I don't care if I lose
>>> >>>>>>>>>> >> some of the over-allocation niceties if it stops things
>>> >>>>>>>>>> >> from crashing. Looking at the truelength help, would just
>>> >>>>>>>>>> >> doing:
>>> >>>>>>>>>> >>
>>> >>>>>>>>>> >> options(datatable.alloc=quote(1000))
>>> >>>>>>>>>> >>
>>> >>>>>>>>>> >> stop this? I never have more than about 50 columns at a
>>> >>>>>>>>>> >> time.
>>> >>>>>>>>>> >>
>>> >>>>>>>>>> >> On 14 December 2011 11:43, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>>> >>>>>>>>>> >>>
>>> >>>>>>>>>> >>> You're R < 2.14.0, right? I'm really struggling in R <
>>> >>>>>>>>>> >>> 2.14.0 to make over-allocation work because R only
>>> >>>>>>>>>> >>> started to initialize truelength to 0 in R 2.14.0+.
>>> >>>>>>>>>> >>> Before that it's uninitialized (random). Trouble is my
>>> >>>>>>>>>> >>> attempts in R < 2.14.0 to work around that work fine for
>>> >>>>>>>>>> >>> me in linux 32bit when I test in R 2.13.2, and I even
>>> >>>>>>>>>> >>> test in 2.12.0 too. I test on 64bit too but just 2.14.0.
>>> >>>>>>>>>> >>> CRAN is also showing errors on 2.13.2 (old-rel) for both
>>> >>>>>>>>>> >>> mac and windows.
>>> >>>>>>>>>> >>>
>>> >>>>>>>>>> >>> So, this is a pre-2.14.0 (only) problem that I'll
>>> >>>>>>>>>> >>> continue to try and fix.
>>> >>>>>>>>>> >>>
>>> >>>>>>>>>> >>> Are you 64bit pre-2.14.0? Which OS? If you are 64bit
>>> >>>>>>>>>> >>> linux then it adds weight to me installing pre-2.14.0 on
>>> >>>>>>>>>> >>> my 64bit instance in an effort to reproduce.
>>> >>>>>>>>>> >>>
>>> >>>>>>>>>> >>>
>>> >>>>>>>>>> >>>> This will be a crappy help request because I can't seem
>>> >>>>>>>>>> >>>> to reproduce it, but the past few days I've been
>>> >>>>>>>>>> >>>> getting a lot of segfaults. The only common thing
>>> >>>>>>>>>> >>>> between every crash is that it happens when I do
>>> >>>>>>>>>> >>>>
>>> >>>>>>>>>> >>>> DT[, z := x]
>>> >>>>>>>>>> >>>>
>>> >>>>>>>>>> >>>> where z was not a column that existed in DT before, and
>>> >>>>>>>>>> >>>> x is either an existing column of DT or a separate
>>> >>>>>>>>>> >>>> variable, doesn't matter. Beyond that I can't reproduce
>>> >>>>>>>>>> >>>> a set of steps that gets R to crash. This is with the
>>> >>>>>>>>>> >>>> latest SVN version.
>>> >>>>>>>>>> >>>>
>>> >>>>>>>>>> >>>> Is there more information I can provide to help track
>>> >>>>>>>>>> >>>> this down?
>>> >>>>>>>>>> >>>> _______________________________________________
>>> >>>>>>>>>> >>>> datatable-help mailing list
>>> >>>>>>>>>> >>>> datatable-help at lists.r-forge.r-project.org
>>> >>>>>>>>>> >>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>> >> --
>>> >> Steve Lianoglou
>>> >> Graduate Student: Computational Systems Biology
>>> >> | Memorial Sloan-Kettering Cancer Center
>>> >> | Weill Medical College of Cornell University
>>> >> Contact Info: http://cbio.mskcc.org/~lianos/contact