[GHC] #15155: How untagged pointers sneak into banged fields

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

[GHC] #15155: How untagged pointers sneak into banged fields

GHC - devs mailing list
#15155: How untagged pointers sneak into banged fields
-------------------------------------+-------------------------------------
           Reporter:  heisenbug      |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:  8.6.1
          Component:  Compiler       |           Version:  8.4.2
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  None/Unknown
  Unknown/Multiple                   |
          Test Case:                 |        Blocked By:
           Blocking:  14677          |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 (''N.B.'' I am writing this up from memory, and cannot verify it just now,
 maybe someone can lend a hand, otherwise I'll do it ASAP!)

 Here is a way how untagged pointers to strict data can be created in
 banged (strict) constructor fields.
 This reproduction recipe **depends on the patch from #14677 applied**.

 We have 3 modules `A`, `B` and `C`:

 {{{#!hs
 module A where
 data A = X | Y | Z
 a = Z
 }}}

 {{{#!hs
 module B where
 import A
 newtype B = B A
 b = B a
 }}}

 {{{#!hs
 {-# language MagicHash #-}

 module C where
 import A
 import B
 import GHC.Exts

 data C = C !B
 c = C b

 main = do print (I# (reallyUnsafePtrEquality# a (coerce b))) -- prints 0,
 b is softlink
           print (I# (dataToTag# c)) -- prints 0: not entered yet
           print (case c of C b' -> I# (dataToTag# b')) -- prints 0?
           print (case c of C (B a') -> I# (dataToTag# a')) -- prints 3
 }}}

 -------------------
 == Why this happens

 `B.b` is a newtype to `A.a` so one would expect that both alias the same
 memory location (a ''hardlink'' in filesystem parlance). But currently
 reexports are implemented with a special type of closure `IND_STATIC` (a
 ''softlink'') which needs to be entered to obtain the actual (tagged
 pointer). The `IND_STATIC` closure's pointer is never tagged (otherwise it
 would never be entered, instead interpreted as a honest-to-goodness `A.A`,
 which causes the symptoms seen in #14677).

 With #14677 applied to GHC, the unfolding of `B.b` is correctly read when
 compiling `C` (with `-O1` and better) and thus the compiler knows that it
 should be a tagged pointer value. Thus the construction of `C.c` shortcuts
 the entering of `B.b` when filling the strict field, and (because `B.b`
 being a softlink, thus untagged) the field ends up carrying a 0 tag.

 -------------------------
 == How can this be fixed?

 I see two possibilities one conservative and one invasive.

 === Conservative

 When seeing a coercion unfolding of a tagged value being used to
 initialise a strict field, do not skip the evaluatedness check, but cater
 for the possibility of an `IND_STATIC` closure. Check the closure type,
 and if confirmed, steal the pointee and use that.

 === Invasive

 Get rid of the `IND_STATIC` closures altogether. For ''intra-module''
 softlinks we can have proper hardlinks (assembler `.equiv` directives, or
 LLVM `alias`es). ''Inter-module'' softlinks can also be eliminated by
 linker scripts. This would however cause more build artifacts, so I don't
 know how hairy it would turn out.

 OTOH, it would reduce binary size by eliminating indirection closures and
 potential dereferencing code.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15155>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15155: How untagged pointers sneak into banged fields

GHC - devs mailing list
#15155: How untagged pointers sneak into banged fields
-------------------------------------+-------------------------------------
        Reporter:  heisenbug         |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.4.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:  14677
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):

 I'm lost in a maze of twisty passages.  We have

 * #14677 Code generator does not correctly tag a pointer.  I'm not sure if
 this is a serious bug, or just the loss of a potential optimisation.  (I
 think the latter.)

 * #14626 No need to enter a scrutinised value.  Apparently blocked on
 #14677

 * #14373 Introduce PTR-tagging for big constructor families.   This has
 Phab:D4267.  I think it may be blocked (in some way) on #15155.

 * #15155 (this ticket).

 I'm not sure how these tickets inter-relate beyond the blocking notes I've
 put here.  Which is most urgent?  Gabor, are you actively working on any
 of them?

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15155#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15155: How untagged pointers sneak into banged fields

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15155: How untagged pointers sneak into banged fields
-------------------------------------+-------------------------------------
        Reporter:  heisenbug         |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.4.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:  14677
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by heisenbug):

 Replying to [comment:1 simonpj]:
 > I'm lost in a maze of twisty passages.  We have

 Sorry, but this was how the story unfolded with time. Believe me, I am
 also starting to get lost. Will try to clarify the relations, to my best
 knowledge, below.

 >
 > * #14677 Code generator does not correctly tag a pointer.  I'm not sure
 if this is a serious bug, or just the loss of a potential optimisation.
 (I think the latter.)

 Performance-only bug. Leads to unnecessarily generated code and extra
 runtime because closures are entered redundantly.

 >
 > * #14626 No need to enter a scrutinised value.  Apparently blocked on
 #14677

 Does not seem to block #14373 but could make it more effective.

 >
 > * #14373 Introduce PTR-tagging for big constructor families.   This has
 Phab:D4267.  I think it may be blocked (in some way) on #15155.

 All the above is about making pointer tagging more expressive and reducing
 closure entries.

 The very bottom of this stack is

 * #13861 Take more advantage of STG representation invariance (follows up
 #9291)


 >
 > * #15155 (this ticket).
 >
 > I'm not sure how these tickets inter-relate beyond the blocking notes
 I've put here.  Which is most urgent?  Gabor, are you actively working on
 any of them?

 I think #15155 (this ticket) is the one that needs to be resolved first,
 so that #14677 can land. Then #14626 and #14373 can be resolved, unless
 new problems surface. #13861 has a proof-of-concept implementation, which
 will need some love, but with extended PTR-tagging (#14373) it'd cover
 more cases.

 Since I dreamed up the "conservative" solution just last night, I have no
 fix for this one yet, but could start with an implementation in early June
 latest. (In January I implemented intra-module "hardlinks" in the NCG.) I
 think it would be nice to have a discussion about `IND_STATIC` closures
 too, resp. their elimination.

 Overall I think it is worrisome that strict fields get initialised with
 non-tagged pointers, even if the compiler statically knows that the
 corresponding data is in WHNF.

 TL/DR: This is a stack of mostly dependent tickets (newer ones need to be
 resolved before older ones), each unlocking a new optimisation in its own
 right. Hopefully no more monsters lurking.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15155#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #15155: How untagged pointers sneak into banged fields

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#15155: How untagged pointers sneak into banged fields
-------------------------------------+-------------------------------------
        Reporter:  heisenbug         |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.4.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:  14677
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonmar):

 I didn't really follow the description here.  Naively I'd expect `c` to
 compile into a CAF, like `c = case b of b' -> C b'` and then we'd be fine.
 If the compiler is assuming that `b` is already a value and thus avoiding
 evaluating it, that would be a false assumption. Where is the false
 assumption being made?

 I think the `IND_STATIC` stuff is a red herring. There's a top-level
 binding `b = a` which is compiled into an `IND_STATIC` as an optimisation,
 but it could also be compiled into a CAF, this is just a back-end code-
 generation choice.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15155#comment:3>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets