Question about the new codegen in GHC

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Question about the new codegen in GHC

Jan Stolarek
Hi Simon,

as part of my internship at MSR I'm working on optimizing the new code generator. My starting point is Krzysztof Wo?'s project "Low-level code optimizations in the Glasgow Haskell Compiler" [1]. Right now I'm trying to figure out which of the described optimisations are currently implemented and which ones are not. Looking at a simple factorial program, which was used throughout Krzysztof's project, I see that the code gets optimized properly. Does this mean that all the optimizations described in that work are implemented? Two things I'm particularily interested in are:
  * do tail calls always call the entry code, or can it happen that a tail call is done indirectly (e.g. via another label that does immediate jump to the entry code)?
  * commentary on the wiki says that Common Block Elimination "essentially implements the Adams optimisation, we believe". Is that really the case, or are there situations when two similar blocks could be eliminated by Adams optimization, but will not be eliminated by CBE?

Janek

[1] http://research.microsoft.com/en-us/um/people/simonpj/tmp/wos-diss-draft.pdf


Reply | Threaded
Open this post in threaded view
|

Question about the new codegen in GHC

Simon Marlow-7
The "Adams optimisation" is now performed explicitly at
code-generation time, rather than generating the extra code and
relying on common block elimination to clean it up.  I did it this way
for a few reasons: (1) I wanted to be disable common-block-elimination
when -O is off without generating bloated code, and (2) to improve
compilation time (not generating too much code that we have to clean
up later), and (3) because we're less likely to accidentally break it
later without noticing.  For more information see Note [sharing
continuations] in StgCmmMonad.hs.

A tail call might be indirect, if the function being called is unknown
at compile time (Simon will be able to explain this to you in more
detail).

Most of the optimisation that we do on Cmm is in the CmmSink pass
(cmm/CmmSink.hs). This does code motion and constant folding, and is
designed to get most of the low-hanging fruit in the code we generate.
 It doesn't handle loops at all (i.e. it's pessimistic and bails out),
so that might be one thing you could look at.

On the other hand, we shouldn't be putting significant effort into
optimisations in GHC that LLVM can already do.  So what's the goal
here? Not to replace LLVM presumably, so I'm guessing you want to do
some optimisation at the Cmm level that can't be done with LLVM.  LLVM
already turns tail calls into loops, but whether it needs any help
with that I'm not sure. One area where LLVM gets stuck on
GHC-generated code is aliasing:
http://hackage.haskell.org/trac/ghc/ticket/5567.  Another issue is
that LLVM sometimes generates *worse* code than the NCG, see
http://hackage.haskell.org/trac/ghc/ticket/4308.  Fixing those would
be good, especially for platforms where we have no NCG (ARM).

Cheers,
Simon

On 4 July 2013 13:44, Jan Stolarek <jan.stolarek at p.lodz.pl> wrote:
> Hi Simon,
>
> as part of my internship at MSR I'm working on optimizing the new code generator. My starting point is Krzysztof Wo?'s project "Low-level code optimizations in the Glasgow Haskell Compiler" [1]. Right now I'm trying to figure out which of the described optimisations are currently implemented and which ones are not. Looking at a simple factorial program, which was used throughout Krzysztof's project, I see that the code gets optimized properly. Does this mean that all the optimizations described in that work are implemented? Two things I'm particularily interested in are:
>   * do tail calls always call the entry code, or can it happen that a tail call is done indirectly (e.g. via another label that does immediate jump to the entry code)?
>   * commentary on the wiki says that Common Block Elimination "essentially implements the Adams optimisation, we believe". Is that really the case, or are there situations when two similar blocks could be eliminated by Adams optimization, but will not be eliminated by CBE?
>
> Janek
>
> [1] http://research.microsoft.com/en-us/um/people/simonpj/tmp/wos-diss-draft.pdf