Early inline

GHC - devs mailing list

Ben, David, Reid

I have been working for months (on and off, mostly off, but very ON for the last week or two) on a very simple idea: the simplifier should inline things even in the “gentle” phase.

It seems so simple.  And it is: the key patch is tiny.

But it stressed corners of the optimiser that were not stressed before; and digging into it showed opportunities I did not know about before.
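For readers less familiar with the simplifier's phases, here is a minimal source-level sketch of the pragmas involved. It assumes GHC's usual phase ordering (an initial "gentle" phase, then numbered phases counting down to 0); the functions `square` and `sumSquares` are hypothetical and are not taken from the patch series.

```haskell
module Main (main) where

-- Hypothetical helper. A plain INLINE pragma asks GHC to inline the
-- function at its call sites; the idea discussed in this thread is to
-- honour such requests already in the initial "gentle" phase.
square :: Int -> Int
square x = x * x
{-# INLINE square #-}

-- Phase control: INLINE [1] asks GHC not to inline before phase 1,
-- which is the documented way to delay inlining on purpose.
sumSquares :: [Int] -> Int
sumSquares = foldr (\x acc -> square x + acc) 0
{-# INLINE [1] sumSquares #-}

main :: IO ()
main = print (sumSquares [1 .. 10])
```

Compiling a module like this with `-O -ddump-inlinings` is one way to see which inlining decisions the simplifier makes, although the exact output depends on the GHC version.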

So I have ended up with a whole series of patches, which are on the wip/spj-early-inline branch:

7f14d15c0e5fc2c9a81db3d0f0b01d85857b1d87 Error message wibbles accumulated from the preceding patches

0499c65d9fa45e7879e1e1264fdaa15274adcba6 Improve SetLevels for join points

3b2fc0827ff6cafa34836c2d9dc710b628c990b6 Change -ddump-tc-trace output in TcErrors, slightly

9ffdf62b0ca72c4f35579f9d6f31a9beebf23025 Improve pretty-printing of types

3f346eac06399a79adf48425018ee949cee245bf Add VarSet.anyDVarSet, allDVarSet

912e71eb3b4ec91e805ecf2236d1033e55e2933a The Early Inline Patch

7188cd13f8e54efa764d52ca016b87b3669b29f5 Small changes to expression sizing in CoreUnfold

bfc6fa3f377d11bdfcdbf82b65bf2f39cb00b90c Fix SetLevels for makeStaticPtr

8b1cfea089faacb5b95ffcc3511e05faeabb8076 Extend CSE to handle recursive bindings

50411995641802568bb27c867afe804f91e0524c Combine identical case alternatives in CSE

2e077ccc736a0b2a622b7f42b7929966bddb4ded Inline data constructor wrappers in phase 2 only

b868de53dd19f639c1070089ecff21948ff33e0d Make Specialise work with casts

c767ae5f04a09ef71dcb8f67a17225a52c2cc5d2 Stop uniques ending up in SPEC rule names

b49ed1f0102f93ca7f62632c436b41bd240b501f Occurrence-analyse the result of rule firings

607a735dfb99bb8f0edf466ccb01e732218c42ec Add -fspec-constr-keen

67a0c1872c0515f1f12ea68097a84e02da92f45b Refactor floating of bindings (fiBind)

e90f4d7c6d3003039fa1647a3da3dafcaa75527b More tracing in SpecConstr

 

Much to my surprise, we get some jolly nice improvements in compiler perf:

3%   perf/compiler/T5837.run            T5837 [stat too good] (normal)

7%   perf/compiler/parsing001.run       parsing001 [stat too good] (normal)

9%   perf/compiler/T12234.run           T12234 [stat too good] (optasm)

35%  perf/compiler/T9020.run            T9020 [stat too good] (optasm)

9%   perf/compiler/T3064.run            T3064 [stat too good] (normal)

13%  perf/compiler/T9961.run            T9961 [stat too good] (normal)

20%  perf/compiler/T13056.run           T13056 [stat too good] (optasm)

5%   perf/compiler/T9872d.run           T9872d [stat too good] (normal)

5%   perf/compiler/T9872c.run           T9872c [stat too good] (normal)

5%   perf/compiler/T9872b.run           T9872b [stat too good] (normal)

7%   perf/compiler/T9872a.run           T9872a [stat too good] (normal)

5%   perf/compiler/T783.run             T783 [stat too good] (normal)

35%   perf/compiler/T12227.run           T12227 [stat too good] (normal)

20%   perf/compiler/T1969.run            T1969 [stat too good] (normal)

5%   perf/should_run/lazy-bs-alloc.run  lazy-bs-alloc [stat too good] (normal)

5%   perf/compiler/T12707.run         T12707 [stat too good] (normal)

 

4%   perf/compiler/T3294.run            T3294 [stat too good] (normal)

1.5% perf/space_leaks/T4029.run         T4029 [stat too good] (ghci)

 

So what is left?  I have sunk so much time into this and am still not QUITE out of the woods.   I was left with

Unexpected failures:

   codeGen/should_compile/debug.run              debug [bad stdout] (normal)

   concurrent/should_run/T4030.run               T4030 [bad exit code] (normal)

I’m re-validating having pulled from HEAD, but I THINK that’s all.

Now

·         I don’t know how to Phab these individually

·         I have not sweated through which patch is responsible for which perf improvements.  Maybe Gipeda can tell?

·         I have not put each error message change with the correct patch.  I don’t know how much that matters.

So this is to say: anything you guys can do to help get this actually Done would be really helpful.   I’m out of time till Monday at least.

It would be great to collect those performance improvements!

Thanks!

Simon

 


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Early inline

Mikolaj Konarski-2
Yay! Is that related to the following ("I also want to investigate
making INLINE pragmas fire in the "gentle" phase, on the grounds
that that's what the programmer said.")?

https://ghc.haskell.org/trac/ghc/ticket/12603#comment:30




RE: Early inline

GHC - devs mailing list
| Yay! Is that related to the following ("I also want to investigate making
| INLINE pragmas fire in the "gentle" phase, on the grounds that that's
| what the programmer said.")?
|

Yes, precisely

Simon


Re: Early inline

David Feuer-2
In reply to this post by GHC - devs mailing list
Yes, we definitely want these. Are you wanting each of these submitted as a
separate differential *in order*? Or do you want a more complex mix-and-match?
Also, are there any commits you think should be squashed?


Re: Early inline

Joachim Breitner-2
In reply to this post by GHC - devs mailing list
Hi,

On Friday, 17.02.2017 at 16:41 +0000, Simon Peyton Jones via ghc-devs wrote:
> ·         I have not sweated through which patch is responsible for
> which perf improvements.  Maybe Gipeda can tell?

yes it can! It does not draw nice graphs for branches yet, but it will
(try to) build all the commits on the branch. Once that is done (can
take a while), the branch will show up under “Branches” on
https://perf.haskell.org/ghc/

Clicking on the hash next to the branch will show you the latest commit
on that branch, together with its performance changes. That page also
has a “parent” link that you can click to look at the previous patches
in sequence.

I can have a look once the patches are built.

Greetings,
Joachim

--
Joachim “nomeata” Breitner
  [hidden email] • https://www.joachim-breitner.de/
  XMPP: [hidden email] • OpenPGP-Key: 0xF0FBF51F
  Debian Developer: [hidden email]

RE: Early inline

GHC - devs mailing list
In reply to this post by David Feuer-2
I can see that

- it'd be nice to associate the perf improvements with the right patch
- it'd be nice to associate the error-message wibbles with the right patch
- it'd be nice to Phab them all so others can comment

But life is short, so I'd be perfectly happy if we were able to just commit them, provided they validate collectively.    It's up to you guys.

There may be some more error message wibbles when you do a full run (I didn't have time to do that before leaving).

Don't squash them: each patch does something separate; it's not a stream of successive fixes to the same thing.  I've already done the squashing.

The SetLevels changes strictly subsume everything in the separate patch I sent Ben (cc ghc-devs) fixing #13255, and will conflict with it.  If so, ignore the latter.

Simon



Re: Early inline

Joachim Breitner-2
In reply to this post by GHC - devs mailing list
Hi,

perf.haskell.org has built all but the last patch in this sequence, so
I can now see what it has to say about where the performance changes
came from:

On Friday, 17.02.2017 at 16:41 +0000, Simon Peyton Jones via ghc-devs wrote:
> So I have ended up with a whole series of patches, which are on the
> wip/spj-early-inline branch
> 7f14d15c0e5fc2c9a81db3d0f0b01d85857b1d87 Error message wibbles accumulated from the preceding patches

Not built yet, but probably not interesting.

> 0499c65d9fa45e7879e1e1264fdaa15274adcba6 Improve SetLevels for join points
nofib/time/hidden  0.376  + 5.59%  0.397  seconds

> 3b2fc0827ff6cafa34836c2d9dc710b628c990b6 Change -ddump-tc-trace output in TcErrors, slightly
no change

> 9ffdf62b0ca72c4f35579f9d6f31a9beebf23025 Improve pretty-printing of types
no change

> 3f346eac06399a79adf48425018ee949cee245bf Add VarSet.anyDVarSet, allDVarSet
no change

> 912e71eb3b4ec91e805ecf2236d1033e55e2933a The Early Inline Patch


> 7188cd13f8e54efa764d52ca016b87b3669b29f5 Small changes to expression sizing in CoreUnfold
> bfc6fa3f377d11bdfcdbf82b65bf2f39cb00b90c Fix SetLevels for makeStaticPtr
> 8b1cfea089faacb5b95ffcc3511e05faeabb8076 Extend CSE to handle recursive bindings
> 50411995641802568bb27c867afe804f91e0524c Combine identical case alternatives in CSE
> 2e077ccc736a0b2a622b7f42b7929966bddb4ded Inline data constructor wrappers in phase 2 only
> b868de53dd19f639c1070089ecff21948ff33e0d Make Specialise work with casts
> c767ae5f04a09ef71dcb8f67a17225a52c2cc5d2 Stop uniques ending up in SPEC rule names
> b49ed1f0102f93ca7f62632c436b41bd240b501f Occurrence-analyse the result of rule firings
> 607a735dfb99bb8f0edf466ccb01e732218c42ec Add -fspec-constr-keen
> 67a0c1872c0515f1f12ea68097a84e02da92f45b Refactor floating of bindings (fiBind)
These patches cannot be distinguished because all but the last one
failed to build:

    compiler/simplCore/SimplCore.hs:435:48: error:
                • Couldn't match type ‘CoreM ModGuts’
                                 with ‘CoreProgram -> CoreProgram’
                  Expected type: DynFlags -> CoreProgram -> CoreProgram
                    Actual type: ModGuts -> CoreM ModGuts
                • In the first argument of ‘doPassD’, namely ‘floatInwards’
                  In the expression: doPassD floatInwards
                  In the expression:
                    {-# SCC "FloatInwards" #-} (doPassD floatInwards)
    https://github.com/nomeata/ghc-speed-logs/blob/ae1b6dcd32fd2c8578ef3eee4c6f8926d845cb97/bfc6fa3f377d11bdfcdbf82b65bf2f39cb00b90c.log.broken
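
The shape of that type error can be reproduced in miniature. The sketch below uses stand-in types only: `DynFlags`, `CoreProgram`, `ModGuts`, `CoreM`, and the `floatInwardsOld`/`floatInwardsNew` names are hypothetical simplifications, and the signature given to `doPassD` is an assumption based solely on the "Expected type" line in the log above. It illustrates one plausible reading, namely that a pass and its call site changed in different commits; the log itself does not confirm this.

```haskell
module Main (main) where

-- Stand-in types (assumptions for illustration; GHC's real ones are richer).
data DynFlags = DynFlags
type CoreProgram = [String]
newtype ModGuts = ModGuts { mgBinds :: CoreProgram }
newtype CoreM a = CoreM { runCoreM :: DynFlags -> a }

-- Hypothetical driver helper: lifts a pure, flag-dependent pass over the
-- bindings into a whole-module pass. Its argument type matches the
-- "Expected type" reported in the build log.
doPassD :: (DynFlags -> CoreProgram -> CoreProgram) -> ModGuts -> CoreM ModGuts
doPassD pass guts =
  CoreM (\dflags -> guts { mgBinds = pass dflags (mgBinds guts) })

-- A pass with the type the call site expects.
floatInwardsOld :: DynFlags -> CoreProgram -> CoreProgram
floatInwardsOld _ = map ("floated: " ++)

-- A pass with the type the log reports as "Actual type".
floatInwardsNew :: ModGuts -> CoreM ModGuts
floatInwardsNew guts =
  CoreM (\dflags -> guts { mgBinds = floatInwardsOld dflags (mgBinds guts) })

main :: IO ()
main = do
  let guts = ModGuts ["x = 1"]
  -- Old-style pass goes through the helper.
  print (mgBinds (runCoreM (doPassD floatInwardsOld guts) DynFlags))
  -- New-style pass is already a whole-module pass and is called directly.
  print (mgBinds (runCoreM (floatInwardsNew guts) DynFlags))
  -- Writing `doPassD floatInwardsNew` here would be rejected with the same
  -- kind of mismatch as in the log.
```

In this toy model the fix is simply to stop wrapping the new-style pass; whether the branch's later commit does the same is not shown in the log.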

The overall effect of this patch was (as you already know):

nofib/time/binary-trees    0.751  - 4.79%   0.715  seconds
nofib/time/fannkuch-redux  4.751  - 3.85%   4.568  seconds
nofib/time/integer         1.276  + 19.04%  1.519  seconds

All sizes increase by 3 or 4%.

tests/alloc/T10547  32406096  - 4.48%  30953160  bytes
tests/alloc/T10858  259699544  - 4.94%  246866000  bytes
tests/alloc/T12227  1654153320  - 35.87%  1060777528  bytes
tests/alloc/T12234  75197448  - 7.02%  69918192  bytes
tests/alloc/T12707  1309049328  - 5.06%  1242803272  bytes
tests/alloc/T13035  90082344  - 4.04%  86438544  bytes
tests/alloc/T13056  512447048  - 20.21%  408873760  bytes
tests/alloc/T1969  756392264  - 19%  612713624  bytes
tests/alloc/T3064  287429088  - 8.9%  261860968  bytes
tests/alloc/T3294  2715661784  - 3.51%  2620404344  bytes
tests/alloc/T4801  412672008  - 5.77%  388841920  bytes
tests/alloc/T5321FD  470413728  - 3.67%  453148744  bytes
tests/alloc/T5321Fun  500839840  - 3.11%  485276616  bytes
tests/alloc/T5642  836251056  - 5.19%  792875648  bytes
tests/alloc/T5837  51684016  - 3.97%  49631216  bytes
tests/alloc/T6048  98489944  + 3.4%  101835168  bytes
tests/alloc/T783  462334328  - 5.21%  438237272  bytes
tests/alloc/T9020  775878448  - 35.27%  502248184  bytes
tests/alloc/T9872a  3136944168  - 6.81%  2923428352  bytes
tests/alloc/T9872b  3964092608  - 5.85%  3732226832  bytes
tests/alloc/T9872c  3603773864  - 5.49%  3405843000  bytes
tests/alloc/T9872d  466420232  - 5.1%  442644168  bytes
tests/alloc/T9961  575612760  - 13.15%  499917080  bytes
tests/alloc/lazy-bs-all 436680  - 3.77%  420224  bytes
tests/alloc/parsing001  499038992  - 6.77%  465237088  bytes
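
The percentage column can be recomputed from the two byte counts in each row. A minimal sketch, assuming the column is the relative change from the first number (before the branch) to the second (after it); the function name `percentChange` is made up for this example.

```haskell
module Main (main) where

import Text.Printf (printf)

-- Relative change in percent from a baseline to a new measurement.
-- Negative means the new value is smaller (fewer bytes allocated).
percentChange :: Double -> Double -> Double
percentChange before after = (after - before) / before * 100

-- A few rows copied from the table above: (test, before, after).
rows :: [(String, Double, Double)]
rows =
  [ ("T12227", 1654153320, 1060777528)
  , ("T9020",   775878448,  502248184)
  , ("T6048",    98489944,  101835168)
  ]

main :: IO ()
main = mapM_ report rows
  where
    report (name, before, after) =
      putStrLn (printf "%-8s %8.2f%%" name (percentChange before after))
```

Under that assumption the T12227 row comes out at roughly -35.87%, in line with the figure in the table, and T6048 is the one row in this sample that regresses.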


> e90f4d7c6d3003039fa1647a3da3dafcaa75527b More tracing in SpecConstr
no changes.

Well, less helpful than expected, but hard to do better given a patch
series where not every patch builds.

Greetings,
Joachim
--
Joachim “nomeata” Breitner
  [hidden email] • https://www.joachim-breitner.de/
  XMPP: [hidden email] • OpenPGP-Key: 0xF0FBF51F
  Debian Developer: [hidden email]