[GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

[GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
           Reporter:  osa1           |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:  8.6.1
          Component:  Runtime        |           Version:  8.5
  System                             |
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:  #15508
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 While debugging #15508 I found a case where eager blackholing in AP_STACK
 causes `closure_sizeW()` to return incorrect size, which in turn causes
 incorrect slop zeroing by `OVERWRITING_CLOSURE()`, which breaks sanity
 checks.

 To reproduce, cd into `testsuite/tests/concurrent/prog001`, then:

 {{{
 $ ghc-stage2 Mult.hs -fforce-recomp -debug -rtsopts
 $ ./Mult +RTS -DS
 Mult: internal error: checkClosure: stack frame
     (GHC version 8.7.20180825 for x86_64_unknown_linux)
     Please report this as a GHC bug:
 http://www.haskell.org/ghc/reportabug
 zsh: abort (core dumped)  ./Mult +RTS -DS
 }}}

 Here's how the problem occurs:

 1. Allocate an AP_STACK in a generation during a GC.

 2. Evaluate the AP_STACK. The entry code first WHITEHOLEs and then eagerly
    BLACKHOLEs it. At this point size of the STACK becomes 2 because that's
 the
    size of (eager or not) BLACKHOLE.

 3. To start a GC the thread does `threadPaused`, which in line 342
 actually
    BLACKHOLEs the eager blackhole (is this part really correct?) and zeros
 the
    slop, but because the eager blackhole has the same size as BLACKHOLE it
    doesn't actually zero the stack frames in the original AP_STACK's
 payload.

 4. In the next GC, in pre-GC sanity check we check the whole heap. When
    checking the generation that the BLACKHOLE (the AP_STACK that became a
    BLACKHOLE in step (2)) resides in we check the closure, and then check
    `closure + 2` (2 is the size of BLACKHOLE) instead of `closure + <size
 of the stack>`, and end up checking a stack frame of the original
 AP_STACK.
    This causes the sanity check to fail because we don't expect to see a
 stack
    frame outside of a stack.

 In summary, normally when blackhole an object we zero the space after the
 blackhole (i.e. some part of the original object's payload) so that in
 sanity
 checks we can skip over that space, but we can't do this when eagerly
 blackholing (because the payload of the original object will be used)
 which
 causes sanity check failures.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Runtime System    |              Version:  8.5
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15508            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonmar):

 Isn't this a problem in general for eager blackholing? I can't think of a
 good fix off the top of my head.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Runtime System    |              Version:  8.5
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15508            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by osa1):

 We discussed this. Simon is right in comment:1 that this is problem with
 eager
 blackholing in general. However, eager blackholing is optional, so if
 you're
 working on the RTS and you need sanity checks can always not use it (which
 is
 the default). The problem with AP_STACK eager blackholing is more serious
 as it
 can't be disabled (expect perhaps in non-threaded runtime?)

 Anyways, we came up with this plan:

 - Implement a new stack frame AP_STACK_EAGER_BLACKHOLE which is exactly
 like
   the EAGER_BLACKHOLE but indicates that the object that became an eager
   blackhole was an AP_STACK.

 - (Only in debug runtime) Allocate one more word when allocating an
 AP_STACK
   and record its size. Note that this is only possible because we only
 allocate
   AP_STACKs in runtime (and not in generated code).

 - When eagerly blackholing an AP_STACK use AP_STACK_EAGER_BLACKHOLE
 instead of
   EAGER_BLACKHOLE and record the AP_STACK's size.

 - When we actually BLACKHOLE the AP_STACK_EAGER_BLACKHOLE in
 `threadPaused` we
   look at the size of the object and we can correctly return the
 `AP_STACK`
   size in `closure_sizeW()` because we recorded it in step (2).

 Simon, did I get this right? One thing I'm not sure (and forgot to ask at
 the
 meeting) is (3) in the bug report. I don't know if we're supposed to zero
 the
 slop when blackholing an AP_STACK eager blackhole in `threadPaused`. We
 enter
 the AP_STACK right after eagerly blackholing it, but I'm not sure if we
 can
 call `threadPaused` before being done with the stack. Can you comment on
 this?

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Runtime System    |              Version:  8.5
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15508            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonmar):

 Hmm, I wonder if there's a much simpler way. The AP_STACK entry code
 simply copies the payload of the object onto the stack, after doing a
 stack check. So why don't we zero the slop immediately after copying the
 payload? That way we don't overwrite any data, and we don't need any new
 objects or stack frames.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571#comment:3>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Runtime System    |              Version:  8.5
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15508            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonmar):

 Zero the slop right here:
 https://phabricator.haskell.org/diffusion/GHC/browse/master/rts%2FApply.cmm$682

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571#comment:4>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  patch
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Runtime System    |              Version:  8.5
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15508            |  Differential Rev(s):  Phab:D5165
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by osa1):

 * status:  new => patch
 * differential:   => Phab:D5165


--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  patch
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Runtime System    |              Version:  8.5
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15508            |  Differential Rev(s):  Phab:D5165
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by osa1):

 I submitted a patch.

 Simon, looking at this note

 {{{
    Note [zeroing slop]

    In some scenarios we write zero words into "slop"; memory that is
    left unoccupied after we overwrite a closure in the heap with a
    smaller closure.

    Zeroing slop is required for:

     - full-heap sanity checks (DEBUG, and +RTS -DS)
     - LDV profiling (PROFILING, and +RTS -hb)

    Zeroing slop must be disabled for:

     - THREADED_RTS with +RTS -N2 and greater, because we cannot
       overwrite slop when another thread might be reading it.

    Hence, slop is zeroed when either:

     - PROFILING && era <= 0 (LDV is on)
     - !THREADED_RTS && DEBUG

    And additionally:

     - LDV profiling and +RTS -N2 are incompatible
     - full-heap sanity checks are disabled for THREADED_RTS
 }}}

 This says that we can't zero slops in threaded runtime and heap sanity
 checks
 are disabled in threaded runtime, but that's not true. I can see in gdb
 that we
 do full heap sanity check in threaded programs too. Do you know what
 changed
 since this comment? I'm surprised that sanity checks work at all in
 threaded
 runtime because we don't zero the slops...

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  patch
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Runtime System    |              Version:  8.5
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15508            |  Differential Rev(s):  Phab:D5165
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonmar):

 If I recall correctly we disable the part of the sanity check that
 requires zeroing the slop when THREADED_RTS is on. See here:
 https://phabricator.haskell.org/diffusion/GHC/browse/master/rts%2Fsm%2FSanity.c$738-743

 So full heap sanity check only happens after a major GC in the threaded
 runtime, and we don't do slop zeroing.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571#comment:7>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  patch
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Runtime System    |              Version:  8.5
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15508            |  Differential Rev(s):  Phab:D5165
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by Ömer Sinan Ağacan <omeragacan@…>):

 In [changeset:"66c17293648fd03a04aabfd807b3c8336e8f843a/ghc"
 66c17293/ghc]:
 {{{
 #!CommitTicketReference repository="ghc"
 revision="66c17293648fd03a04aabfd807b3c8336e8f843a"
 Fix slop zeroing for AP_STACK eager blackholes in debug build

 As #15571 reports, eager blackholing breaks sanity checks as we can't
 zero the payload when eagerly blackholing (because we'll be using the
 payload after blackholing), but by the time we blackhole a previously
 eagerly blackholed object (in `threadPaused()`) we don't have the
 correct size information for the object (because the object's type
 becomes BLACKHOLE when we eagerly blackhole it) so can't properly zero
 the slop.

 This problem can be solved for AP_STACK eager blackholing (which unlike
 eager blackholing in general, is not optional) by zeroing the payload
 after entering the stack. This patch implements this idea.

 Fixes #15571.

 Test Plan:
 Previously concprog001 when compiled and run with sanity checks

     ghc-stage2 Mult.hs -debug -rtsopts
     ./Mult +RTS -DS

 was failing with

     Mult: internal error: checkClosure: stack frame
         (GHC version 8.7.20180821 for x86_64_unknown_linux)
         Please report this as a GHC bug:
 http://www.haskell.org/ghc/reportabug

 thic patch fixes this panic. The test still panics, but it runs for a
 while
 before panicking (instead of directly panicking as before), and the new
 problem
 seems unrelated:

     Mult: internal error: ASSERTION FAILED: file rts/sm/Sanity.c, line 296
         (GHC version 8.7.20180919 for x86_64_unknown_linux)
         Please report this as a GHC bug:
 http://www.haskell.org/ghc/reportabug

 The new problem will be fixed in another diff.

 I also tried slow validate (which requires D5164): this does not introduce
 any
 new failures.

 Reviewers: simonmar, bgamari, erikd

 Reviewed By: simonmar

 Subscribers: rwbarton, carter

 GHC Trac Issues: #15571

 Differential Revision: https://phabricator.haskell.org/D5165
 }}}

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  closed
        Priority:  normal            |            Milestone:
       Component:  Runtime System    |              Version:  8.5
      Resolution:  fixed             |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15508            |  Differential Rev(s):  Phab:D5165
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by osa1):

 * status:  patch => closed
 * resolution:   => fixed
 * milestone:  8.6.1 =>


--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571#comment:9>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  merge
        Priority:  normal            |            Milestone:  8.6.2
       Component:  Runtime System    |              Version:  8.5
      Resolution:  fixed             |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15508            |  Differential Rev(s):  Phab:D5165
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by bgamari):

 * status:  closed => merge
 * milestone:   => 8.6.2


--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571#comment:10>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  closed
        Priority:  normal            |            Milestone:  8.6.2
       Component:  Runtime System    |              Version:  8.5
      Resolution:  fixed             |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #15508            |  Differential Rev(s):  Phab:D5165
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by bgamari):

 * status:  merge => closed


Comment:

 Merged to `ghc-8.6` with 10e3125d4f6ea0684cbe1315b35a2a213d2765cd.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15571#comment:11>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets