[GHC] #8763: forM_ [1..N] does not get fused (10 times slower than go function)

GHC - devs mailing list
#8763: forM_ [1..N] does not get fused (10 times slower than go function)
------------------------------+--------------------------------------------
       Reporter:  nh2         |             Owner:
           Type:  bug         |            Status:  new
       Priority:  normal      |         Milestone:
      Component:  Compiler    |           Version:  7.6.3
       Keywords:              |  Operating System:  Unknown/Multiple
   Architecture:              |   Type of failure:  Runtime performance bug
  Unknown/Multiple            |         Test Case:
     Difficulty:  Unknown     |          Blocking:
     Blocked By:              |
Related Tickets:              |
------------------------------+--------------------------------------------
 Apparently idiomatic code like {{{forM_ [1.._N]}}} does not get fused
 away.

 This can give serious performance problems when unnoticed.

 {{{
 -- Slow:
 forM_ [0.._N-1] $ \i -> do ...

 -- Around 10 times faster:
 loop _N $ \i -> do ...

 -- Needs the BangPatterns extension for the strict argument `!n`.
 {-# INLINE loop #-}
 loop :: (Monad m) => Int -> (Int -> m ()) -> m ()
 loop bex f = go 0
   where
     go !n | n == bex  = return ()
           | otherwise = f n >> go (n+1)
 }}}

 Full code example: https://gist.github.com/nh2/8905997 - the relevant
 alternatives are commented.
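
For readers without access to the gist, here is a minimal, self-contained sketch of the two alternatives. It is not the reporter's benchmark: the helper names `sumForM` and `sumLoop` and the use of an `IORef` accumulator are assumptions made purely for illustration. Both functions compute the same sum, so any speed difference comes only from how the iteration is expressed.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hand-written counting loop from the report: runs f on 0 .. bex-1.
-- Note: it assumes bex >= 0, since it stops only when n == bex.
{-# INLINE loop #-}
loop :: Monad m => Int -> (Int -> m ()) -> m ()
loop bex f = go 0
  where
    go !n
      | n == bex  = return ()
      | otherwise = f n >> go (n + 1)

-- Hypothetical helper: sum 0 .. n-1 by iterating over an enumerated list.
sumForM :: Int -> IO Int
sumForM n = do
  ref <- newIORef 0
  forM_ [0 .. n - 1] $ \i -> modifyIORef' ref (+ i)
  readIORef ref

-- Hypothetical helper: the same sum using the explicit loop.
sumLoop :: Int -> IO Int
sumLoop n = do
  ref <- newIORef 0
  loop n $ \i -> modifyIORef' ref (+ i)
  readIORef ref

main :: IO ()
main = do
  a <- sumForM 1000
  b <- sumLoop 1000
  print (a, b)  -- prints (499500,499500)
```

A loop this small may well be optimised equally in both forms, so it demonstrates the two shapes rather than the slowdown itself.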

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
ghc-tickets mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/ghc-tickets

Re: [GHC] #8763: forM_ [1..N] does not get fused (10 times slower than go function)

GHC - devs mailing list
#8763: forM_ [1..N] does not get fused (10 times slower than go function)
--------------------------------------------+------------------------------
        Reporter:  nh2                      |            Owner:
            Type:  bug                      |           Status:  new
        Priority:  normal                   |        Milestone:
       Component:  Compiler                 |          Version:  7.6.3
      Resolution:                           |         Keywords:
Operating System:  Unknown/Multiple         |     Architecture:
 Type of failure:  Runtime performance bug  |  Unknown/Multiple
       Test Case:                           |       Difficulty:  Unknown
        Blocking:                           |       Blocked By:
                                            |  Related Tickets:
--------------------------------------------+------------------------------

Comment (by nh2):

 People on #ghc think that it is because the {{{[1.._N]}}} gets floated out
 as a top-level constant expression:


 ----


 '''nomeata:''' nh2: I’d consider that a bug
 '''nomeata:''' I believe the problem is that [0..512] does not depend on
 any local values
 '''nomeata:''' so it is floated out as a top-level value
 '''nomeata:''' and there it is not matched by any rules
 '''thoughtpolice:''' let floating strikes again
 '''thoughtpolice:''' (well, floating-out being a non-optimization,
 anyway.)
 '''Fuuzetsu:''' does this mean that if I use [0 .. 512] twice in a program
 in different places, it will only be computed once?
 '''hvr:''' Fuuzetsu: there's a chance, yes
 '''Fuuzetsu:''' neat
 '''thoughtpolice:''' well, not so neat. in cases like this you really
 don't want to float out some things, because it hurts later opportunities
 to optimize sometimes (e.g. float out a binding that otherwise would have
 triggered a RULE or fusion, perhaps)
 '''thoughtpolice:''' unfortunately floating like this is one of the harder
 things to 'fight against' when you don't want it, from what i've seen.
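
The effect described in this exchange can be pictured at source level. The sketch that follows is only an illustration of the idea, not actual compiler output: the binding name `lvl` and the two wrapper functions are hypothetical, and the `NOINLINE` pragma is used here merely to mimic a list that stays a separate top-level value.

```haskell
module Main (main) where

import Control.Monad (forM_)

-- As written: the enumeration sits directly under forM_,
-- where a rewrite rule for list fusion can see it.
written :: (Int -> IO ()) -> IO ()
written body = forM_ [0 .. 512] body

-- Hypothetical shape after floating: the list has become a shared
-- top-level value, so forM_ only sees a variable, not an enumeration.
lvl :: [Int]
lvl = [0 .. 512]
{-# NOINLINE lvl #-}

floated :: (Int -> IO ()) -> IO ()
floated body = forM_ lvl body

main :: IO ()
main = do
  written (\_ -> return ())
  floated (\_ -> return ())
  putStrLn "both variants visit the same indices"
```

In the second form the list is computed once and shared, which is the trade-off Fuuzetsu asks about, at the cost of a real list being traversed.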

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:1>

Comment (by darchon):

 I believe let-floating happens due to the full-laziness transform. Do you
 get better performance with -fno-full-laziness ?
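
One way to try this suggestion on a single module is an `OPTIONS_GHC` pragma, sketched here. The function `twoPasses` is a hypothetical stand-in for code that uses the same range in two places; whether the flag actually changes the generated code for a given program is exactly what the thread is trying to establish.

```haskell
{-# OPTIONS_GHC -O2 -fno-full-laziness #-}
module Main (main) where

import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical test function: the same constant range appears twice,
-- which makes it a candidate for being floated out and shared.
twoPasses :: IO Int
twoPasses = do
  ref <- newIORef 0
  forM_ [0 .. 511] $ \i -> modifyIORef' ref (+ i)
  forM_ [0 .. 511] $ \i -> modifyIORef' ref (+ i)
  readIORef ref

main :: IO ()
main = twoPasses >>= print
```

Compiling once with the pragma and once without it, then comparing run times, gives a quick answer to the question for this module.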

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:2>

Comment (by nomeata):

 A comment in `SetLevels` (which I just came across) in the code indicates
 that this problem should have been taken care of:

 {{{
           -- We are keen to float something to the top level, even if it does not
           -- escape a lambda, because then it needs no allocation.  But it's controlled
           -- by a flag, because doing this too early loses opportunities for RULES
           -- which (needless to say) are important in some nofib programs
           -- (gcd is an example).
 }}}


 So either my assumption is wrong, or this does not work as desired.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:3>

Comment (by nomeata):

 It turns out that this comment is obsolete; the flag is never set. I quote
 from `SimplCore`

 {{{
                 -- Was: gentleFloatOutSwitches
                 --
                 -- I have no idea why, but not floating constants to
                 -- top level is very bad in some cases.
                 --
                 -- Notably: p_ident in spectral/rewrite
                 --          Changing from "gentle" to "constantsOnly"
                 --          improved rewrite's allocation by 19%, and
                 --          made 0.0% difference to any other nofib
                 --          benchmark
 }}}

 This comment was introduced in eaeca51efc0be3ff865c4530137bfbe9f8553549
 (2009) by SPJ.

 Maybe rule matching should look through unfoldings more easily (at the
 risk of losing sharing)? There is no point in worrying about sharing
 `[0..N]` in a rule application whose purpose is to eliminate that list.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:4>

Comment (by nh2):

 @nomeata

 Regarding your suspicion that it gets floated out as a constant, I don't
 see an improvement when getting the upper bound m of `[1..m]` from an IO
 action. What do you think?

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:5>

Comment (by nomeata):

 It might still get floated out of some local function. Have you tried
 coming up with a minimal example?

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:6>

Comment (by nh2):

 @nomeata

 There is an example I made for this, mentioned in the bug description.

 The performance I measure for that is:

 * using `forM_` with `ghc -O`: 2.0 s
 * using `loop ` with `ghc -O`: 1.6 s
 * using `forM_` with `ghc -O2`: 0.9 s
 * using `loop ` with `ghc -O2`: 0.3 s
 * using `forM_` with `ghc -O2 -fllvm`: 0.75 s
 * using `loop ` with `ghc -O2 -fllvm`: 0.15 s

 I tried to make an even smaller benchmark
 (https://gist.github.com/nh2/11333427), but there the performance is
 identical, even though I make the same change (`forM_` versus `loop`) as
 before.

 Could you try my two benchmarks and see if you get the same behaviour?

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:7>

Comment (by nh2):

 I have updated the gist at https://gist.github.com/nh2/11333427 to contain
 both the matmult example (where the difference between `forM_` and `loop`
 is big) and the simple example (where no difference can be measured).

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:8>

Comment (by daniel.is.fischer):

 Replying to [comment:8 nh2]:
 > I have updated the gist at https://gist.github.com/nh2/11333427 to
 > contain both the matmult example (where the difference between `forM_` and
 > `loop` is big) and the simple example (where no difference can be
 > measured).

 The simple example doesn't use the same list in different places, so GHC
 is capable of eliminating it and giving you a loop on unboxed `Int#`s, at
 least with `-O2`. In the matmult example, you need to conceal the fact
 that both lists are the same from GHC to get a loop on unboxed `Int#`s.

 So in principle, GHC can do the desired thing, just the sharing gets in
 the way. Can somebody think of a situation where sharing is beneficial for
 `forM_ [a .. b] $ \n -> do ...` code? If not, perhaps special-casing
 `enumFromTo` arguments for `forM_` etc. is, at least for standard integer
 types, something to be considered.
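
To make the "same list in different places" situation concrete, here is a small sketch of nested loops over one range, which is the shape a matrix multiplication has. It is not the matmult code from the gist: `pairsForM`, `pairsLoop` and the `IORef` accumulator are hypothetical names chosen for illustration, and `loop` is the helper from the ticket description.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Counting loop from the ticket description: runs f on 0 .. bex-1.
{-# INLINE loop #-}
loop :: Monad m => Int -> (Int -> m ()) -> m ()
loop bex f = go 0
  where
    go !n
      | n == bex  = return ()
      | otherwise = f n >> go (n + 1)

-- Hypothetical example: the same range [0 .. n-1] is written twice,
-- once for the outer and once for the inner index.
pairsForM :: Int -> IO Int
pairsForM n = do
  ref <- newIORef 0
  forM_ [0 .. n - 1] $ \i ->
    forM_ [0 .. n - 1] $ \j ->
      modifyIORef' ref (+ (i * j))
  readIORef ref

-- The same computation with explicit loops, so no list can be shared.
pairsLoop :: Int -> IO Int
pairsLoop n = do
  ref <- newIORef 0
  loop n $ \i ->
    loop n $ \j ->
      modifyIORef' ref (+ (i * j))
  readIORef ref

main :: IO ()
main = do
  a <- pairsForM 100
  b <- pairsLoop 100
  print (a == b)
```

Both versions compute the same value; the open question in this thread is only whether the first one is compiled to a loop over unboxed integers or to a traversal of a shared list.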

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:9>

Comment (by nh2):

 Preventing it from being shared sounds sensible to me: if the `[a .. b]`
 were something expensive to compute (a list of primes, say), I suspect any
 sane person would easily share it manually by declaring it top-level.
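
Manual sharing of the kind meant here might look like the following sketch. The `primes` definition is a deliberately naive trial-division sieve chosen only to have something expensive to share; it is an illustrative assumption, not code from the ticket.

```haskell
module Main (main) where

import Control.Monad (forM_)

-- An expensive list, shared explicitly by giving it a top-level name.
-- Naive trial-division sieve, used here only as an illustration.
primes :: [Int]
primes = sieve [2 ..]
  where
    sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
    sieve []       = []

main :: IO ()
main = do
  -- Both loops traverse the same top-level list; elements computed
  -- for the first loop are reused by the second.
  forM_ (take 5 primes) print
  forM_ (take 5 primes) print
```

Here the programmer opts in to sharing by naming the list, which is the opposite default from the compiler floating a cheap range like `[a .. b]` on its own.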

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:10>

Comment (by nh2):

 Replying to [comment:9 daniel.is.fischer]:
 > The simple example doesn't use the same list in different places

 Unfortunately, I still get no difference in performance even if I
 duplicate the loops to

   forM_ [1.._MAX] $ \i -> do
     UM.unsafeWrite v 0 i
   forM_ [1.._MAX] $ \i -> do
     UM.unsafeWrite v 0 i

 Any idea?

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:11>
--------------------------------------------+------------------------------
Changes (by pivo):

 * cc: pivo@… (added)


--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:12>

Comment (by simonpj):

 I've got lost in this thread.  If someone thinks there is a real bug here,
 in GHC 7.8, could you re-articulate what you think it is?

 Simon

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:13>

Comment (by nh2):

 @simonpj: Summary:

 1) I reported that my manually written loop is much faster than `forM_
 [1..n]` in some cases, suggesting that in some cases optimizing the list
 away doesn't work well.

 2) nomeata said some technical things that are a bit beyond me.

 3) I submit two benchmarks in the gist at
 https://gist.github.com/nh2/11333427, a "matmult" benchmark where there
 is a big difference between `forM_` and the hand-written `loop`, and a
 "simple" benchmark where they are equally fast.

 4) Daniel suspects the slow case comes from using the same syntactical
 list twice, and that in this case GHC floats it out to share it, which
 breaks eliminating it. He suggests we might special-case `enumFromTo` when
 used with `forM_` to prevent it.

 5) I give a counterexample to his suspicion by changing my "simple"
 benchmark: there, using the same list twice gives the same good
 performance as using it once.

 I get the same behaviour for 7.6 and 7.8.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:14>

Comment (by daniel.is.fischer):

 Slight correction, @Niklas: that the list is floated out to the top level,
 and that the loop consequently runs over the list instead of over unboxed
 `Int#`s, is not a suspicion. It can be read directly from the Core
 (`-ddump-simpl` is your friend in all matters related to performance). The
 question is under which exact circumstances GHC floats the list out. To
 answer that, you need somebody who knows how GHC works.
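
As a rough sketch of how to look at the Core for one module, the flags can be set in a pragma so that the output is printed during compilation. The function `sumTo` is a hypothetical example, and `-dsuppress-all` is added here only to make the dump easier to read.

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -dsuppress-all #-}
module Main (main) where

import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical function whose Core is worth inspecting: check whether
-- the compiled loop counts on an unboxed Int# or pattern-matches on
-- list constructors.
sumTo :: Int -> IO Int
sumTo n = do
  ref <- newIORef 0
  forM_ [1 .. n] $ \i -> modifyIORef' ref (+ i)
  readIORef ref

main :: IO ()
main = sumTo 100 >>= print  -- prints 5050
```

In the dump, a case analysis on `:` and `[]` inside the loop indicates that a real list is being traversed, while a worker taking an `Int#` argument indicates that the list was eliminated.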

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:15>

Comment (by simonpj):

 I've tried with 7.8.2.  I get the same performance for `matmultForM_` and
 `matMultLoop`.

 I'm not using criterion; just using `+RTS -s`, with `_SIZE = 300`.

 So I can't reproduce the problem.

 Simon

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:16>

Comment (by daniel.is.fischer):

 Replying to [comment:16 simonpj]:
 > I've tried with 7.8.2.  I get the same performance for `matmultForM_`
 > and `matmultLoop`.

 Are you compiling with `-O`? You need `-O2` for a significant difference
 to appear (with only `-O`, both versions are slow).
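
 For anyone who wants to check this without the full gist, here is a
 minimal sketch of the two loop shapes being compared. The `loop` helper
 follows the one in the ticket description; `sumForM` and `sumLoop` are
 hypothetical stand-ins for the matrix benchmark, not code from the gist,
 so any timings may differ from those reported here.

```haskell
{-# LANGUAGE BangPatterns #-}

import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hand-written counting loop, following the ticket description.
-- Runs f on 0, 1, .., bex-1 without building a list.
-- Assumes bex >= 0, since the stop test is n == bex.
{-# INLINE loop #-}
loop :: Monad m => Int -> (Int -> m ()) -> m ()
loop bex f = go 0
  where
    go !n
      | n == bex  = return ()
      | otherwise = f n >> go (n + 1)

-- Hypothetical workload: sum 0..n-1 with forM_ over an enumerated list.
-- Whether the list is fused away depends on the GHC version and flags.
sumForM :: Int -> IO Int
sumForM n = do
  ref <- newIORef 0
  forM_ [0 .. n - 1] $ \i -> modifyIORef' ref (+ i)
  readIORef ref

-- The same hypothetical workload written with the explicit loop.
sumLoop :: Int -> IO Int
sumLoop n = do
  ref <- newIORef 0
  loop n $ \i -> modifyIORef' ref (+ i)
  readIORef ref
```

 Calling both from a `main` with a large `n`, building once with `-O` and
 once with `-O2` (plus `-rtsopts`), and running with `+RTS -s` should show
 whether the two versions diverge on a given compiler.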

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:17>

Re: [GHC] #8763: forM_ [1..N] does not get fused (10 times slower than go function)

GHC - devs mailing list
#8763: forM_ [1..N] does not get fused (10 times slower than go function)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                   Owner:
            Type:  bug               |                  Status:  new
        Priority:  normal            |               Milestone:
       Component:  Compiler          |                 Version:  7.6.3
      Resolution:                    |                Keywords:
Operating System:  Unknown/Multiple  |            Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |               Test Case:
      Blocked By:                    |                Blocking:
 Related Tickets:                    |  Differential Revisions:
-------------------------------------+-------------------------------------

Comment (by George):

 With 7.10.1 I see a factor of 2 difference in performance:

 benchmarking matmultForM_
 time                 10.90 μs   (10.89 μs .. 10.91 μs)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 10.89 μs   (10.89 μs .. 10.91 μs)
 std dev              32.72 ns   (18.98 ns .. 65.42 ns)

 benchmarking matmultLoop
 time                 5.404 μs   (5.387 μs .. 5.419 μs)
                      1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 5.409 μs   (5.398 μs .. 5.420 μs)
 std dev              37.99 ns   (33.64 ns .. 44.26 ns)

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:18>

Re: [GHC] #8763: forM_ [1..N] does not get fused (10 times slower than go function)

GHC - devs mailing list
#8763: forM_ [1..N] does not get fused (10 times slower than go function)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                   Owner:
            Type:  bug               |                  Status:  new
        Priority:  normal            |               Milestone:  7.12.1
       Component:  Compiler          |                 Version:  7.6.3
      Resolution:                    |                Keywords:
Operating System:  Unknown/Multiple  |            Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |               Test Case:
      Blocked By:                    |                Blocking:
 Related Tickets:                    |  Differential Revisions:
-------------------------------------+-------------------------------------
Changes (by George):

 * milestone:   => 7.12.1


--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:19>