[GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

[GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
           Reporter:  nh2            |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:
          Component:  Compiler       |           Version:  8.2.2
           Keywords:                 |  Operating System:  Unknown/Multiple
       Architecture:                 |   Type of failure:  Runtime
  Unknown/Multiple                   |  performance bug
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
 {{{#!hs
 selectVectorDestructive2 :: (PrimMonad m, VG.Vector v e, Ord e, Show e,
 VG.Mutable v ~ vm)
   => Int -> v e ->           vm (PrimState m) e -> Int -> Int -> m ()
 -- slow

 selectVectorDestructive2 :: (PrimMonad m, VG.Vector v e, Ord e, Show e)
   => Int -> v e -> VG.Mutable v (PrimState m) e -> Int -> Int -> m ()
 -- fast
 }}}

 These two functions are identical except one has `VG.Mutable v ~ vm` as a
 constraint, the other one has it in the type signature right of the `=>`.

 The second function is 10x faster.

 I would expect them to be equally fast.

 The code of the functions is identical, I change only the type
 declaration.

 The slowness happens because with the first function, inlining of
 primitives like `unsafeRead` does not happen, and thus also it boxes the
 `Int#`s back to `Int`s when calling `unsafeRead`.

 In particular, in `-ddump-simpl`, the slow version has

 {{{#!hs
 $wpartitionLoop2_rgEy
   :: forall (m :: * -> *) (vm :: * -> * -> *) e.
      (PrimMonad m, MVector vm e, Ord e) =>
      vm (PrimState m) e
      -> GHC.Prim.Int#
      -> e
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> m Int
 $wpartitionLoop2_rgEy
   = \... ->
           let {
             endIndex_a7Ay :: Int
             ...
             endIndex_a7Ay = GHC.Types.I# ww2_sfZn } in
   ...
                                      (VGM.basicUnsafeRead
                                         ...
                                         endIndex_a7Ay)
   ...
 }}}

 while the fast version has

 {{{#!hs
 $w$spartitionLoop2_rgUN
   :: GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.MutableByteArray# GHC.Prim.RealWorld
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.State# GHC.Prim.RealWorld
      -> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)

 ...

 $spartitionLoop2_rgUP
   :: VG.Mutable VU.Vector (PrimState IO) Int
      -> Int
      -> Int
      -> Int
      -> Int
      -> Int
      -> GHC.Prim.State# GHC.Prim.RealWorld
      -> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)
 }}}

 So with `VG.Mutable v (PrimState m) e` in the type signature, GHC managed
 to inline + specialise it all the way down to concrete types
 (`VUM.MVector`, `IO`, and `Int` as the element type), and consequently
 inline `basicUnsafeRead` on unboxed `Int#`.

 But with `VG.Mutable v ~ vm`, ghc keeps `vm (PrimState m) e` all the way,
 passes around dictionaries, thus eventually cannot inline
 `basicUnsafeRead` and re-packs already unboxed values, like `endIndex_a7Ay
 = GHC.Types.I# ww2_sfZn`, before passing them into the non-inlined call of
 `basicUnsafeRead`, thus making a tight loop allocate that normally
 wouldn't allocate.

 Why might rewriting the type signature in such a trivial way make this
 happen?

 ----

 I have tested this on GHC 8.0.2, GHC 8.2.2, and GHC 8.5 HEAD commit
 cc4677c36ee.

 Reproducer:

 * https://github.com/nh2/haskell-quickselect-median-of-
 medians/blob/0efd6293e779fda2d864ec3d75329fb16b8af6d9/Main.hs#L506
 * Running instructions are [https://github.com/nh2/haskell-quickselect-
 median-of-medians/commit/7a49d673990dfaebdb0ba837c3fbbaae0455dba0 in this
 commit message]; for short: `stack exec -- ghc -O --make Main.hs -rtsopts
 -ddump-simpl -dsuppress-coercions -fforce-recomp -ddump-to-file -fno-full-
 laziness && ./Main +RTS -sstderr`
 * For that file, I have pregenerated `-dverbose-core2core` output here:
 https://github.com/nh2/haskell-quickselect-median-of-
 medians/tree/0efd6293e779fda2d864ec3d75329fb16b8af6d9/slowness-analysis
 * I originally just wanted to write a fast median-of-medians
 implementation on ZuriHac 2017, but got totally derailed by this
 performance problem. The version of it that I link here is a total mess
 because of me mauling it to track down this performance issue.

 Trying to increase `-funfolding-use-threshold` or `-funfolding-keeness-
 factor` did not change the situation.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by mpickering):

 I can reproduce this but I had to delete the criterion dependency in the
 cabal file.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941#comment:1>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Description changed by nh2:

Old description:

> {{{#!hs
> selectVectorDestructive2 :: (PrimMonad m, VG.Vector v e, Ord e, Show e,
> VG.Mutable v ~ vm)
>   => Int -> v e ->           vm (PrimState m) e -> Int -> Int -> m ()
> -- slow
>
> selectVectorDestructive2 :: (PrimMonad m, VG.Vector v e, Ord e, Show e)
>   => Int -> v e -> VG.Mutable v (PrimState m) e -> Int -> Int -> m ()
> -- fast
> }}}
>
> These two functions are identical except one has `VG.Mutable v ~ vm` as a
> constraint, the other one has it in the type signature right of the `=>`.
>
> The second function is 10x faster.
>
> I would expect them to be equally fast.
>
> The code of the functions is identical, I change only the type
> declaration.
>
> The slowness happens because with the first function, inlining of
> primitives like `unsafeRead` does not happen, and thus also it boxes the
> `Int#`s back to `Int`s when calling `unsafeRead`.
>
> In particular, in `-ddump-simpl`, the slow version has
>
> {{{#!hs
> $wpartitionLoop2_rgEy
>   :: forall (m :: * -> *) (vm :: * -> * -> *) e.
>      (PrimMonad m, MVector vm e, Ord e) =>
>      vm (PrimState m) e
>      -> GHC.Prim.Int#
>      -> e
>      -> GHC.Prim.Int#
>      -> GHC.Prim.Int#
>      -> GHC.Prim.Int#
>      -> m Int
> $wpartitionLoop2_rgEy
>   = \... ->
>           let {
>             endIndex_a7Ay :: Int
>             ...
>             endIndex_a7Ay = GHC.Types.I# ww2_sfZn } in
>   ...
>                                      (VGM.basicUnsafeRead
>                                         ...
>                                         endIndex_a7Ay)
>   ...
> }}}
>
> while the fast version has
>
> {{{#!hs
> $w$spartitionLoop2_rgUN
>   :: GHC.Prim.Int#
>      -> GHC.Prim.Int#
>      -> GHC.Prim.MutableByteArray# GHC.Prim.RealWorld
>      -> GHC.Prim.Int#
>      -> GHC.Prim.Int#
>      -> GHC.Prim.Int#
>      -> GHC.Prim.Int#
>      -> GHC.Prim.Int#
>      -> GHC.Prim.State# GHC.Prim.RealWorld
>      -> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)
>
> ...
>
> $spartitionLoop2_rgUP
>   :: VG.Mutable VU.Vector (PrimState IO) Int
>      -> Int
>      -> Int
>      -> Int
>      -> Int
>      -> Int
>      -> GHC.Prim.State# GHC.Prim.RealWorld
>      -> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)
> }}}
>
> So with `VG.Mutable v (PrimState m) e` in the type signature, GHC managed
> to inline + specialise it all the way down to concrete types
> (`VUM.MVector`, `IO`, and `Int` as the element type), and consequently
> inline `basicUnsafeRead` on unboxed `Int#`.
>
> But with `VG.Mutable v ~ vm`, ghc keeps `vm (PrimState m) e` all the way,
> passes around dictionaries, thus eventually cannot inline
> `basicUnsafeRead` and re-packs already unboxed values, like
> `endIndex_a7Ay = GHC.Types.I# ww2_sfZn`, before passing them into the
> non-inlined call of `basicUnsafeRead`, thus making a tight loop allocate
> that normally wouldn't allocate.
>
> Why might rewriting the type signature in such a trivial way make this
> happen?
>
> ----
>
> I have tested this on GHC 8.0.2, GHC 8.2.2, and GHC 8.5 HEAD commit
> cc4677c36ee.
>
> Reproducer:
>
> * https://github.com/nh2/haskell-quickselect-median-of-
> medians/blob/0efd6293e779fda2d864ec3d75329fb16b8af6d9/Main.hs#L506
> * Running instructions are [https://github.com/nh2/haskell-quickselect-
> median-of-medians/commit/7a49d673990dfaebdb0ba837c3fbbaae0455dba0 in this
> commit message]; for short: `stack exec -- ghc -O --make Main.hs -rtsopts
> -ddump-simpl -dsuppress-coercions -fforce-recomp -ddump-to-file -fno-
> full-laziness && ./Main +RTS -sstderr`
> * For that file, I have pregenerated `-dverbose-core2core` output here:
> https://github.com/nh2/haskell-quickselect-median-of-
> medians/tree/0efd6293e779fda2d864ec3d75329fb16b8af6d9/slowness-analysis
> * I originally just wanted to write a fast median-of-medians
> implementation on ZuriHac 2017, but got totally derailed by this
> performance problem. The version of it that I link here is a total mess
> because of me mauling it to track down this performance issue.
>
> Trying to increase `-funfolding-use-threshold` or `-funfolding-keeness-
> factor` did not change the situation.
New description:

 {{{#!hs
 selectVectorDestructive2 :: (PrimMonad m, VG.Vector v e, Ord e, Show e,
 VG.Mutable v ~ vm)
   => Int -> v e ->           vm (PrimState m) e -> Int -> Int -> m ()
 -- slow

 selectVectorDestructive2 :: (PrimMonad m, VG.Vector v e, Ord e, Show e)
   => Int -> v e -> VG.Mutable v (PrimState m) e -> Int -> Int -> m ()
 -- fast
 }}}

 These two functions are identical except one has `VG.Mutable v ~ vm` as a
 constraint, the other one has it in the type signature right of the `=>`.

 The second function is 10x faster.

 I would expect them to be equally fast.

 The code of the functions is identical, I change only the type
 declaration.

 The slowness happens because with the first function, inlining of
 primitives like `unsafeRead` does not happen, and thus also it boxes the
 `Int#`s back to `Int`s when calling `unsafeRead`.

 In particular, in `-ddump-simpl`, the slow version has

 {{{#!hs
 $wpartitionLoop2_rgEy
   :: forall (m :: * -> *) (vm :: * -> * -> *) e.
      (PrimMonad m, MVector vm e, Ord e) =>
      vm (PrimState m) e
      -> GHC.Prim.Int#
      -> e
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> m Int
 $wpartitionLoop2_rgEy
   = \... ->
           let {
             endIndex_a7Ay :: Int
             ...
             endIndex_a7Ay = GHC.Types.I# ww2_sfZn } in
   ...
                                      (VGM.basicUnsafeRead
                                         ...
                                         endIndex_a7Ay)
   ...
 }}}

 while the fast version has

 {{{#!hs
 $w$spartitionLoop2_rgUN
   :: GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.MutableByteArray# GHC.Prim.RealWorld
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.Int#
      -> GHC.Prim.State# GHC.Prim.RealWorld
      -> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)

 ...

 $spartitionLoop2_rgUP
   :: VG.Mutable VU.Vector (PrimState IO) Int
      -> Int
      -> Int
      -> Int
      -> Int
      -> Int
      -> GHC.Prim.State# GHC.Prim.RealWorld
      -> (# GHC.Prim.State# GHC.Prim.RealWorld, Int #)
 }}}

 So with `VG.Mutable v (PrimState m) e` in the type signature, GHC managed
 to inline + specialise it all the way down to concrete types
 (`VUM.MVector`, `IO`, and `Int` as the element type), and consequently
 inline `basicUnsafeRead` on unboxed `Int#`.

 But with `VG.Mutable v ~ vm`, ghc keeps `vm (PrimState m) e` all the way,
 passes around dictionaries, thus eventually cannot inline
 `basicUnsafeRead` and re-packs already unboxed values, like `endIndex_a7Ay
 = GHC.Types.I# ww2_sfZn`, before passing them into the non-inlined call of
 `basicUnsafeRead`, thus making a tight loop allocate that normally
 wouldn't allocate.

 Why might rewriting the type signature in such a trivial way make this
 happen?

 ----

 I have tested this on GHC 8.0.2, GHC 8.2.2, and GHC 8.5 HEAD commit
 cc4677c36ee (edit: in which case I commented out the quickcheck and
 criterion related stuff because their deps don't build there yet).

 Reproducer:

 * https://github.com/nh2/haskell-quickselect-median-of-
 medians/blob/0efd6293e779fda2d864ec3d75329fb16b8af6d9/Main.hs#L506
 * Running instructions are [https://github.com/nh2/haskell-quickselect-
 median-of-medians/commit/7a49d673990dfaebdb0ba837c3fbbaae0455dba0 in this
 commit message]; for short: `stack exec -- ghc -O --make Main.hs -rtsopts
 -ddump-simpl -dsuppress-coercions -fforce-recomp -ddump-to-file -fno-full-
 laziness && ./Main +RTS -sstderr`
 * For that file, I have pregenerated `-dverbose-core2core` output here:
 https://github.com/nh2/haskell-quickselect-median-of-
 medians/tree/0efd6293e779fda2d864ec3d75329fb16b8af6d9/slowness-analysis
 * I originally just wanted to write a fast median-of-medians
 implementation on ZuriHac 2017, but got totally derailed by this
 performance problem. The version of it that I link here is a total mess
 because of me mauling it to track down this performance issue.

 Trying to increase `-funfolding-use-threshold` or `-funfolding-keeness-
 factor` did not change the situation.

--

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by mpickering):

 I copied the core of `selectVectorDestructive2` from the good and bad
 examples and diffed them. They looked basically the same apart from lots
 and lots of code like the following:

 In bad:

 {{{
 1# ->
                                   case $wlvl_rnHf
                                          @ VUM.MVector
                                          @ Int
                                          @ IO
 Data.Vector.Unboxed.Base.$fMVectorMVectorInt
 ((Data.Vector.Primitive.Mutable.MVector
                                              @ ghc-
 prim-0.5.2.0:GHC.Prim.RealWorld
                                              @ Int
                                              ww1_smP2
                                              ww2_smP3
                                              ww3_smP4)
                                           `cast` (Sym
 (Data.Vector.Unboxed.Base.N:R:MVectorsInt[0]
                                                            <ghc-
 prim-0.5.2.0:GHC.Prim.RealWorld>_N) ; Sym
 (Data.Vector.Unboxed.Base.DR:MVectorsInt0[0]
 (Control.Monad.Primitive.D:R:PrimStateIO[0]))
                                                   ::
 (Data.Vector.Primitive.Mutable.MVector
                                                         ghc-
 prim-0.5.2.0:GHC.Prim.RealWorld
                                                         Int :: *)
                                                      ~R# (VUM.MVector
 (PrimState IO) Int :: *)))
                                          ww4_smP8
                                   of wild_00 {
                                   }
 }}}

 In good:

 {{{
 1# -> case $wlvl_rnRh ww2_smWB ww3_smWC ww4_smWG of wild_00 { }
 }}}

 So it looks like on every call the `MVector` is being constructed rather
 than created one and shared? Anyone know immediately why this is such a
 problem?

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941#comment:3>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by nh2):

 This seems to be unrelated to type families and only a problem with `~` in
 general, because:

 I now also found a case where using `(val ~ Int) =>` produces slower code
 than subsituting `val` with `Int` syntactically, or using `SPECIALISE`.

 {{{
 myfun :: forall val . (Eq val, Show val, val ~ Int) => Mytype1 val ->
 Mytype2 val -> Mytype3 -> Mytype4 -> Mytype5 -> Mytype6 -> IO ()
 }}}

 is slow, but

 {{{
 {-# SPECIALIZE myfun :: Mytype1 Int -> Mytype2 Int -> Mytype3 -> Mytype4
 -> Mytype5 -> Mytype6 -> IO () #-}

 myfun :: forall val . (Eq val, Show val, val ~ Int) => Mytype1 val ->
 Mytype2 val -> Mytype3 -> Mytype4 -> Mytype5 -> Mytype6 -> IO ()
 }}}

 is fast (in my case, a 30% performance difference).

 Diffing the output of `-ddump-simpl -ddump-to-file -dsuppress-idinfo
 -dsuppress-coercions -dsuppress-module-prefixes -dsuppress-uniques` shows
 how the call looks different from a module that calls `myfun`:

 `val ~ Int` version:

 {{{
 $d~~ :: (Int :: *) ~~ (Int :: *)
 $d~~ = Eq# @ * @ * @ Int @ Int @~ ...

 ...
                        $wmyfun
                          @ Int
                          $fEqInt
                          ($d~~ `cast` ...)
                          y
                          wild
                          mything1
                          ipv7
                          mything2
                          ww1
                          w9)
 ...
 }}}

 `SPECIALISE`d `val ~ Int` version:

 {{{
                        myfun
                          y
                          wild
                          mything1
                          ipv7
                          mything2
                          ww1
 }}}

 I would have expect that after the simplifier has run, GHC would have used
 all information it has to produce the best core (e.g. use the fact that it
 knows `val ~ Int` even without `SPECIALISE`), but it did not.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941#comment:4>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):

 I think I see. Can you offer a small example?

 I think something like this is happening.  We have
 {{{
 ...(\(g :: Int ~ val).   ...f (d |> Num g)... ) ...

 where d :: Num Int
 }}}
 Now, we can't float that call up to the definition of `f` because it
 mentions `g`.  But we could in principle specialise `f` for `Num Int`, and
 then use that specialised version at the call.

 I'm not certain this is exactly what is happening for you.  Hence wanting
 a test case.  (You don't need to exhibit worse perf; just that a
 specialisation is created and used without the equality, but not with.)

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  davide
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by davide):

 * owner:  (none) => davide


--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  davide
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by davide):

 I've created a simple case where this happens:
 {{{#!haskell
 {-# LANGUAGE ScopedTypeVariables #-}
 {-# LANGUAGE TypeFamilies #-}

 module F (f) where

 f :: Int -> Int                           -- FAST
 -- f :: forall a. (a ~ Int) => a -> a     -- SLOW
 f x = x + x
 {-# NOINLINE f #-}
 }}}

 Compiling with:
 {{{
 $ ghc-8.4.3 -O -ddump-simpl -dsuppress-coercions F.hs
 }}}
 I get this core (note worker-wrapper transformation):
 {{{
 -- RHS size: {terms: 4, types: 1, coercions: 0, joins: 0/0}
 F.$wf [InlPrag=NOINLINE] :: GHC.Prim.Int# -> GHC.Prim.Int#
 [GblId, Arity=1, Caf=NoCafRefs, Str=<S,U>]
 F.$wf = \ (ww_s1ay :: GHC.Prim.Int#) -> GHC.Prim.+# ww_s1ay ww_s1ay

 -- RHS size: {terms: 10, types: 4, coercions: 0, joins: 0/0}
 f [InlPrag=NOUSERINLINE[0]] :: Int -> Int
 [GblId,
  Arity=1,
  Caf=NoCafRefs,
  Str=<S(S),1*U(U)>m,
  Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
          WorkFree=True, Expandable=True,
          Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
          Tmpl= \ (w_s1av [Occ=Once!] :: Int) ->
                  case w_s1av of { GHC.Types.I# ww1_s1ay [Occ=Once] ->
                  case F.$wf ww1_s1ay of ww2_s1aC { __DEFAULT ->
                  GHC.Types.I# ww2_s1aC
                  }
                  }}]
 f = \ (w_s1av :: Int) ->
       case w_s1av of { GHC.Types.I# ww1_s1ay ->
       case F.$wf ww1_s1ay of ww2_s1aC { __DEFAULT ->
       GHC.Types.I# ww2_s1aC
       }
       }
 }}}
 Swapping to `f :: forall a. (a ~ Int) => a -> a` gives:
 {{{
 -- RHS size: {terms: 10, types: 21, coercions: 12, joins: 0/0}
 f [InlPrag=NOINLINE] :: forall a. ((a :: *) ~ (Int :: *)) => a -> a
 [GblId, Arity=2, Caf=NoCafRefs, Str=<S(S),1*U(1*U)><S,U>m]
 f = \ (@ a_a13U)
       ($d~_a13W :: (a_a13U :: *) ~ (Int :: *))
       (eta_B1 :: a_a13U) ->
       case GHC.Types.HEq_sc
              @ * @ * @ a_a13U @ Int ($d~_a13W `cast` <Co:5>)
       of co_a14c
       { __DEFAULT ->
       (GHC.Num.$fNumInt_$c+
          (eta_B1 `cast` <Co:2>) (eta_B1 `cast` <Co:2>))
       `cast` <Co:3>
       }
 }}}
 I ran a benchmark to confirm the performance difference:
 {{{
 benchmarking Int -> Int
 time                 12.47 ns   (12.43 ns .. 12.55 ns)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 12.53 ns   (12.48 ns .. 12.59 ns)
 std dev              173.4 ps   (140.2 ps .. 239.2 ps)
 variance introduced by outliers: 17% (moderately inflated)

 benchmarking (a ~ Int) => a -> a
 time                 15.72 ns   (15.69 ns .. 15.76 ns)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 15.80 ns   (15.74 ns .. 16.01 ns)
 std dev              327.8 ps   (135.0 ps .. 691.2 ps)
 variance introduced by outliers: 32% (moderately inflated)
 }}}

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941#comment:7>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  davide
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by davide):

 * Attachment "simple_case.zip" added.

 Simple case and benchmark.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  davide
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by davide):

 == Regarding simple example

 `f :: forall a. (a ~ Int) => a -> a`, the difference in performance is
 somewhat expected. This may be a different issue than the example given in
 the ticket description. In short, `a ~ Int` is a proof that type `a` is
 equal to type `Int`. In core, `a ~ Int` is a regular ''boxed'' GADT
 meaning that it could be bottom i.e. an invalid prove (this is the main
 mechanism behind
 [https://downloads.haskell.org/~ghc/8.0.2/docs/html/users_guide/glasgow_exts.html?highlight=defered%20type%20error
 #deferring-type-errors-to-runtime -fdefer-type-errors]). Unboxing `a ~ b`
 at corresponds to checking the proof which is required to coerce the input
 binding from `a` to `Int`. Normally the `(a ~ Int)` would be optimized
 away (as described [http://dreixel.net/research/pdf/epdtecp.pdf here] in
 section 7.3), but that requires a worker wrapper transformation that never
 happens. Removing `NOINLINE` allows `f` to be optimized across modules,
 which closes the performance gap.

 == Regarding original example

 Unlike my simple example, all the code is in one module, so I expect the
 equality proof `VG.Mutable v ~ vm` to be optimized away (again see
 [http://dreixel.net/research/pdf/epdtecp.pdf here] section 7.3). With ghc
 3.2.2, when compiling the slow version, I see `selectVectorDestructive2`
 is specialized to
 `$sselectVectorDestructive2 :: Int -> Vector Int -> MVector (PrimState IO)
 Int -> Int -> Int -> IO ()` (pass 2). This is good, but for some reason
 myread and partitionLoop2 are not specialized even though they are used by
 `$sselectVectorDestructive2`:
 {{{#!haskell
 $sselectVectorDestructive2 =
 ...
     let

         $dMVector =
           Data.Vector.Generic.Base.$p1Vector
             @Vector
             @Int
             Data.Vector.Unboxed.Base.$fVectorVectorInt
     in
 ...
           (Main.myread
                 @IO
                 @MVector
                 @Int
                 Control.Monad.Primitive.$fPrimMonadIO
                 $dMVector
                 GHC.Classes.$fOrdInt
                 GHC.Show.$fShowInt
                 v
                 begin)
 ...
           (Main.partitionLoop2
             @IO
             @MVector
             @Int
             Control.Monad.Primitive.$fPrimMonadIO
             $dMVector
             GHC.Classes.$fOrdInt
             GHC.Show.$fShowInt
             v
             begin
             pivot
             (GHC.Types.I# ...)
 }}}

 In the fast version, myread and partitionLoop2 are specialized in this
 pass. I noticed 2 other differences:
 * fast version floats `$dMVector` to a top level binding.
 * fast version specializes to `Mutable Vector (PrimState IO) Int` instead
 of `MVector (PrimState IO) Int`. Note `Mutable` is a type family and
 `Mutable Vector = MVector`

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  davide
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):

 Thanks for the simpler example.  I think I now understand what
 is going on.  It's all to do with strictness analysis.

 Consider this function, and suppose it is strict in x:
 {{{
 f1 :: Int -> blah
 f1 x = ...(case x of I# y -> ...)...
 }}}
 The strictness analysis sees that `f1` is strict in `x` and we get a w/w
 split
 {{{
 f1 :: Int -> blah
 f1 x = case x of I# y -> $wf y

 $wf1 :: Int# -> blah
 $wf1 y = let x = I# y in ...original body of f...
 }}}
 Now suppose that we have
 {{{
 type family F a
 type instance F Bool = Int

 f2 :: F Bool -> blah
 f2 x = ...same as before...
 }}}
 In fact the strictness analysis still sees that `f2` is strict in `x`,
 and worker wrapper works too -- all by using the family instances.
 We get this
 {{{
 f2 :: F Bool -> blah
 f2 x = case (x |> g1) of I# y -> $wf2 y

 $wf2 :: Int# -> blah
 $wf2 y = let x = (I# y) |> sym g
          in ..as before...
 }}}
 Here `g` is a coercion `g :: F Bool ~ Int`, constructed from the family
 instances.  This coersionn is generated by `topNormaliseType_maybe`,
 itself called from `deepSplitCprType_maybe` in `WwLib`, during
 worker/wrapper
 generation.

 But it's harder if the coercion is purely local. Let's start with this
 (yes I know you can't write `(~#)` in source Haskell but imagine this
 is Core:
 {{{
 f3 :: forall a. (a ~# Int) => a -> blah
 f3 a (g :: (a ~# Int) (x :: a) = ...as before...
 }}}
 What we'd like is this:
 {{{
 f3 :: forall a. (a ~# Int) => a -> blah
 f3 a (g :: (a ~# Int) (x :: a) = case (x |> g) of I# y -> $wf3 a g y

 $wf3 :: forall a. (a ~# Int) => Int# -> blah
 $wf3 a g y = let x = (I# y) |> sym g
             in ...as before...
 }}}
 but alas neither the demand analyser, nor the worker/wrapper generator
 are clever enough to do this.   We need a kind of `normaliseType` that
 can take some local "givens" as well as the top-level family instance
 envs.

 This has the same ring as something Ryan wanted in the coverage
 checker.

 It's not obvious exactly what "should work".  Eg what about `(Int ~# a)`,
 or even `([Int] ~# [a])`?

 Finally, there is the question of `(~)` vs `(~#)`.  I think we are very
 aggressive about unboxing `(~)` constraints, so I'd like this:
 {{{
 f4 :: forall a. (a ~ Int) => a -> blah
 f4 a (g :: (a ~ Int) (x :: a)
   = case g of MkTwiddle (g2 :: a ~# Int) ->
     case (x |> g2) of I# y -> $wf4 a g2 y

 $wf4 :: forall a. (a ~# Int) => Int# -> blah
 $wf4 a g2 y = let x = (I# y) |> sym g2
                   g = MkTwiddle g2
             in ...as before...
 }}}
 Making all this happen will take some work though.

 How important is it.  I'm tempted to say "don't write types like that"!
 (Although the type checker goes to some trouble to support them well.)

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941#comment:9>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets
Reply | Threaded
Open this post in threaded view
|

Re: [GHC] #14941: Switching direct type family application to EqPred (~) prevents inlining in code using vector (10x slowdown)

GHC - devs mailing list
In reply to this post by GHC - devs mailing list
#14941: Switching direct type family application to EqPred (~) prevents inlining in
code using vector (10x slowdown)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  davide
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by nh2):

 Replying to [comment:9 simonpj]:
 > How important is it?  I'm tempted to say "don't write types like that"!

 From the user's perspective, I can put forward two use cases:

 1. Type level let bindings

   `(..., VG.Mutable v ~ vm) => ... -> vm (PrimState m) e -> ...`

   From the issue description. This is essentially type-signature
 `let`-binding, allowing me to write the type signature more
 clearly/legibly, especially if `vm` is used multiple times.

 2. Refactorings

   For the code in comment 4, `(val ~ Int) =>`, this arose from a
 refactoring where I replaced `Int` by `val` across a large code base, and
 then in the top-level-ish function set `val ~ Int` to check if the
 performance was unimpacted.

 I'd probably be OK without this being fixed, now that I know what's going
 on, but if it's not fixed, we somehow have to do a much better job at
 giving warnings or educating people that you can't just apply a
 substitution principle when "cleaning up" type signaturees with `~`. I
 incorrectly assumed this would be type-level only, and as a result had to
 start a multi-day investigation of where my performance went because I
 expected any mistake on my side but not this. Especially when writing code
 using `vector` whose type family heavy API, in my opinion, almost begs for
 `~` to be used to achieve readable signatures.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14941#comment:10>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets