Slower Compilation on GHC 7.8.4 (vs. 7.6.3)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Slower Compilation on GHC 7.8.4 (vs. 7.6.3)

ozataman
Hello,

Apologies upfront for the long post below, but I thought our experience on
this particular point might be worth sharing. I present some observations
first and then mention the particular problem around "cabal repl"
facilitated development workflow. Any ideas/feedback on the latter point
would be much appreciated and please let me know if there are other
metrics, info or output I can produce here for further
investigation/clarification.

Recently at work, we upgraded a decent size project from GHC 7.6.3 to GHC
7.8.4 and as a result observed a very significant slowdown in compile
times. Code changes were very minimal and should not be a factor. Any
mistakes below are mine, but I can fairly confidently say that there is a
visible-to-the-eye slowdown in compile times.

We made a few measurements to quantify the issue and here is the data:

   - This project has 220 direct modules (non-dependency) that are compiled
   with each "cabal build" and we always use sandboxes
   - Tests were performed on OS X, though we saw similar results on Linux
   - Single threaded (-j=1) wall-clock compile time has gone up by around
   30% for the overall project
   - With O1 and -j=N, the overall wall-clock time is approximately the
   same (around 8 min), but the CPU time spent is a staggering 7.5X higher (!)
   - With O0 and -j=N the overall wall-clock time is 40% higher, but the
   CPU time spent is a staggering 7.5X higher (!)
   - With -O0, "cabal repl" load times have gone up by a staggering 2.5X (!)
      - 7.8.4: 3 min. 35 seconds
      - 7.6.3: 1 min. 2 seconds
   - Measuring compile times for individual modules, we see that those that
   are heavily loaded with lots of types, TH-facilitated type and typeclass
   derivations and those that contain large "blobs" of values directly at the
   top level now take much longer to compile:
      - We have 10 modules that each take over 10 seconds
      - We have 3 modules that each take over 35 seconds
   - Sidenote observation: In general, parallel builds with -j appear to
   cause a very significant overhead under the "system" part of timing:
      - Example with 7.6.3, O0: cabal build  140.57s user 13.25s system
      100% cpu 2:33.70 total
      - Example with 7.8.4, O0: cabal build  507.83s user 655.43s system
      549% cpu 3:31.59 total


The main hurt here has been the infeasibility of using "cabal repl / ghci"
 fast-feedback style development (via emacs, vim, command-line, etc.),
since:

   - Unlike "cabal build", "cabal repl" re-compiles from scratch with every
   invocation ":l App.Foo.Bar", which itself is the necessary first step when
   starting to hack on a module (we can then use :r to somewhat help the
   situation)
   - The slowdown is about 2.5X as stated above: What used to take a minute
   now takes 3.5 minutes.
   - cabal repl does not seem to benefit from -j at all
   - cabal repl 7.8.4 appears to be hurt particularly by the heavier modules
   - The heavier modules are often at the top of the compile tree (types,
   derivations, etc) and are practically loaded on the critical path all the
   time

Let me know if I'm missing anything here and any/all feedback is much
appreciated!

Cheers,
Oz
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20150405/69542f0f/attachment.html>

Reply | Threaded
Open this post in threaded view
|

Slower Compilation on GHC 7.8.4 (vs. 7.6.3)

Simon Peyton Jones
Oz, and others,

Thanks for this data point.  Two things:


?         Oz: is there any chance you could boil out a test case that highlights the regression most starkly?  It is sad to know that GHC?s compilation time is increasing, but hard to tackle.  A test case, preferably one that doesn?t rely on zillions of other libraries, increases the chances of something getting done by a factor of 10 or 100.

Do make a ticket for it as well.  Otherwise it just gets lost.


?         Everyone.  It?s really hard to keep focus on keeping GHC?s compilation time and space down.  Everyone is usually focused on bugs and features.  It would be incredibly helpful if someone, or a small group, could build a profiled version of GHC 7.8, 7.10, and HEAD, and track down what is happening.  Usually there are multiple causes, but a factor of 7.5 ought not to be hard to nail down.


Please!

Thanks.

Simon

From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Ozgun Ataman
Sent: 05 April 2015 19:01
To: ghc-devs at haskell.org
Subject: Slower Compilation on GHC 7.8.4 (vs. 7.6.3)

Hello,

Apologies upfront for the long post below, but I thought our experience on this particular point might be worth sharing. I present some observations first and then mention the particular problem around "cabal repl" facilitated development workflow. Any ideas/feedback on the latter point would be much appreciated and please let me know if there are other metrics, info or output I can produce here for further investigation/clarification.

Recently at work, we upgraded a decent size project from GHC 7.6.3 to GHC 7.8.4 and as a result observed a very significant slowdown in compile times. Code changes were very minimal and should not be a factor. Any mistakes below are mine, but I can fairly confidently say that there is a visible-to-the-eye slowdown in compile times.

We made a few measurements to quantify the issue and here is the data:

  *   This project has 220 direct modules (non-dependency) that are compiled with each "cabal build" and we always use sandboxes
  *   Tests were performed on OS X, though we saw similar results on Linux
  *   Single threaded (-j=1) wall-clock compile time has gone up by around 30% for the overall project
  *   With O1 and -j=N, the overall wall-clock time is approximately the same (around 8 min), but the CPU time spent is a staggering 7.5X higher (!)
  *   With O0 and -j=N the overall wall-clock time is 40% higher, but the CPU time spent is a staggering 7.5X higher (!)
  *   With -O0, "cabal repl" load times have gone up by a staggering 2.5X (!)

     *   7.8.4: 3 min. 35 seconds
     *   7.6.3: 1 min. 2 seconds

  *   Measuring compile times for individual modules, we see that those that are heavily loaded with lots of types, TH-facilitated type and typeclass derivations and those that contain large "blobs" of values directly at the top level now take much longer to compile:

     *   We have 10 modules that each take over 10 seconds
     *   We have 3 modules that each take over 35 seconds

  *   Sidenote observation: In general, parallel builds with -j appear to cause a very significant overhead under the "system" part of timing:

     *   Example with 7.6.3, O0: cabal build  140.57s user 13.25s system 100% cpu 2:33.70 total
     *   Example with 7.8.4, O0: cabal build  507.83s user 655.43s system 549% cpu 3:31.59 total

The main hurt here has been the infeasibility of using "cabal repl / ghci"  fast-feedback style development (via emacs, vim, command-line, etc.), since:

  *   Unlike "cabal build", "cabal repl" re-compiles from scratch with every invocation ":l App.Foo.Bar", which itself is the necessary first step when starting to hack on a module (we can then use :r to somewhat help the situation)
  *   The slowdown is about 2.5X as stated above: What used to take a minute now takes 3.5 minutes.
  *   cabal repl does not seem to benefit from -j at all
  *   cabal repl 7.8.4 appears to be hurt particularly by the heavier modules
  *   The heavier modules are often at the top of the compile tree (types, derivations, etc) and are practically loaded on the critical path all the time
Let me know if I'm missing anything here and any/all feedback is much appreciated!

Cheers,
Oz
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20150406/a7d88c6f/attachment-0001.html>

Reply | Threaded
Open this post in threaded view
|

Slower Compilation on GHC 7.8.4 (vs. 7.6.3)

Richard Eisenberg-2
>  
> ?         Everyone.  It?s really hard to keep focus on keeping GHC?s compilation time and space down.  Everyone is usually focused on bugs and features.  It would be incredibly helpful if someone, or a small group, could build a profiled version of GHC 7.8, 7.10, and HEAD, and track down what is happening.  Usually there are multiple causes, but a factor of 7.5 ought not to be hard to nail down.
>  
> Please!

This is a great newcomer-to-ghc-hacking task. It's both relatively easy to do and gives a newcomer a chance to tour the codebase. The person doing the profiling doesn't also have to fix the bug. Just knowing what module or component of GHC is causing the trouble helps immensely.

As a further plug: when a change I made introduced a performance regression, I went into the debugging process very warily, expecting profiling GHC to be painful. It was just the opposite -- remarkably straightforward and able to produce results without terribly much effort. On the other hand, it does generally require having multiple (profiled) builds of GHC to get meaningful comparisons.

Thanks to whoever takes this on!

Richard
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20150406/11425aac/attachment.html>

Reply | Threaded
Open this post in threaded view
|

Slower Compilation on GHC 7.8.4 (vs. 7.6.3)

Simon Marlow-7
In reply to this post by ozataman
Just a comment on this one point:

On 05/04/2015 19:01, Ozgun Ataman wrote:
>   * With O0 and -j=N the overall wall-clock time is 40% higher, but the
>     CPU time spent is a staggering 7.5X higher (!)

When looking at Haskell programs executing in parallel, it's normal to
see a high CPU time, because the GC threads spin looking for work.  The
extra CPU time is just time spent spinning, it doesn't imply that the
compiler was doing extra work.  What matters is the overall wall-clock
time - a worse wall-clock time indicates that something has gone wrong.

We should look at compile-time regression independently of performance
issues with -j.  Parallel compilation is a new feature and mostly likely
needs a lot of tuning.  Note that cabal's -j feature is different from
GHC's (and is likely to give much more reliable improvements) because it
runs multiple compiler instances in parallel.

Cheers,
Simon


>   * With -O0, "cabal repl" load times have gone up by a staggering 2.5X (!)
>       o 7.8.4: 3 min. 35 seconds
>       o 7.6.3: 1 min. 2 seconds
>   * Measuring compile times for individual modules, we see that those
>     that are heavily loaded with lots of types, TH-facilitated type and
>     typeclass derivations and those that contain large "blobs" of values
>     directly at the top level now take much longer to compile:
>       o We have 10 modules that each take over 10 seconds
>       o We have 3 modules that each take over 35 seconds
>   * Sidenote observation: In general, parallel builds with -j appear to
>     cause a very significant overhead under the "system" part of timing:
>       o Example with 7.6.3, O0: cabal build  140.57s user 13.25s system
>         100% cpu 2:33.70 total
>       o Example with 7.8.4, O0: cabal build  507.83s user 655.43s system
>         549% cpu 3:31.59 total
>
>
> The main hurt here has been the infeasibility of using "cabal repl /
> ghci"  fast-feedback style development (via emacs, vim, command-line,
> etc.), since:
>
>   * Unlike "cabal build", "cabal repl" re-compiles from scratch with
>     every invocation ":l App.Foo.Bar", which itself is the necessary
>     first step when starting to hack on a module (we can then use :r to
>     somewhat help the situation)
>   * The slowdown is about 2.5X as stated above: What used to take a
>     minute now takes 3.5 minutes.
>   * cabal repl does not seem to benefit from -j at all
>   * cabal repl 7.8.4 appears to be hurt particularly by the heavier modules
>   * The heavier modules are often at the top of the compile tree (types,
>     derivations, etc) and are practically loaded on the critical path
>     all the time
>
> Let me know if I'm missing anything here and any/all feedback is much
> appreciated!
>
> Cheers,
> Oz
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>

Reply | Threaded
Open this post in threaded view
|

Slower Compilation on GHC 7.8.4 (vs. 7.6.3)

ozataman
Hello all,

Thank you for the comments and apologies that it took a bit of time to get
back to you. My colleague Michael and I have taken a stab at a package
demonstrating the slowdown. Our hypothesis was that the presence of type
declarations, large values/literals, typeclass instantiations and
engagement of TH caused at least some of the slowdown we were seeing in our
app code. It turns out this was only partially true:

   - 500 random type declarations were a modest 10-25% slower.
   - Simply instantiating types with a given class did not contribute that
   much to the slowdown (around 5% on top of type declarations)
   - There was no notable slowdown on large values/literals (false
   hypothesis)
   - TH-enabled derivations were over 100% slower (lens and safecopy used
   as test subjects)
      - lens: ~100% slower
      - safecopy: ~125% slower
   - The slowdowns appeared to be super-linear in at least some cases.
   Doubling the declaration count to 1000 increased the % of the slowdown:
      - Declarations: 30% slower
      - lens TH slowdown: ~100% slower (stayed the same)
      - safecopy TH slowdown: ~140% slower


I'd be happy to create a ticket if this is sufficient support material.
It's hard to say these conclusively isolate *all* culprits for the full
slowdown we observe in our codebase, but are hopefully at least a
significant part.

Here's the package:
https://dl.dropboxusercontent.com/u/58525/ghc-slowdown.tar.gz

Best,
Oz



On Tue, Apr 7, 2015 at 3:39 PM, Simon Marlow <marlowsd at gmail.com> wrote:

> Just a comment on this one point:
>
> On 05/04/2015 19:01, Ozgun Ataman wrote:
>
>>   * With O0 and -j=N the overall wall-clock time is 40% higher, but the
>>     CPU time spent is a staggering 7.5X higher (!)
>>
>
> When looking at Haskell programs executing in parallel, it's normal to see
> a high CPU time, because the GC threads spin looking for work.  The extra
> CPU time is just time spent spinning, it doesn't imply that the compiler
> was doing extra work.  What matters is the overall wall-clock time - a
> worse wall-clock time indicates that something has gone wrong.
>
> We should look at compile-time regression independently of performance
> issues with -j.  Parallel compilation is a new feature and mostly likely
> needs a lot of tuning.  Note that cabal's -j feature is different from
> GHC's (and is likely to give much more reliable improvements) because it
> runs multiple compiler instances in parallel.
>
> Cheers,
> Simon
>
>
>    * With -O0, "cabal repl" load times have gone up by a staggering 2.5X
>> (!)
>>       o 7.8.4: 3 min. 35 seconds
>>       o 7.6.3: 1 min. 2 seconds
>>   * Measuring compile times for individual modules, we see that those
>>     that are heavily loaded with lots of types, TH-facilitated type and
>>     typeclass derivations and those that contain large "blobs" of values
>>     directly at the top level now take much longer to compile:
>>       o We have 10 modules that each take over 10 seconds
>>       o We have 3 modules that each take over 35 seconds
>>   * Sidenote observation: In general, parallel builds with -j appear to
>>     cause a very significant overhead under the "system" part of timing:
>>       o Example with 7.6.3, O0: cabal build  140.57s user 13.25s system
>>         100% cpu 2:33.70 total
>>       o Example with 7.8.4, O0: cabal build  507.83s user 655.43s system
>>         549% cpu 3:31.59 total
>>
>>
>> The main hurt here has been the infeasibility of using "cabal repl /
>> ghci"  fast-feedback style development (via emacs, vim, command-line,
>> etc.), since:
>>
>>   * Unlike "cabal build", "cabal repl" re-compiles from scratch with
>>     every invocation ":l App.Foo.Bar", which itself is the necessary
>>     first step when starting to hack on a module (we can then use :r to
>>     somewhat help the situation)
>>   * The slowdown is about 2.5X as stated above: What used to take a
>>     minute now takes 3.5 minutes.
>>   * cabal repl does not seem to benefit from -j at all
>>   * cabal repl 7.8.4 appears to be hurt particularly by the heavier
>> modules
>>   * The heavier modules are often at the top of the compile tree (types,
>>     derivations, etc) and are practically loaded on the critical path
>>     all the time
>>
>> Let me know if I'm missing anything here and any/all feedback is much
>> appreciated!
>>
>> Cheers,
>> Oz
>>
>>
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20150414/64a4f750/attachment-0001.html>

Reply | Threaded
Open this post in threaded view
|

Slower Compilation on GHC 7.8.4 (vs. 7.6.3)

Andrés Sicard-Ramírez-2
On 14 April 2015 at 14:56, Ozgun Ataman <ozataman at gmail.com> wrote:
> 500 random type declarations were a modest 10-25% slower.

I don't think this is a modest percent by only adding type
declarations. Is it an expected behaviour?


--
Andr?s

Reply | Threaded
Open this post in threaded view
|

Slower Compilation on GHC 7.8.4 (vs. 7.6.3)

Simon Peyton Jones
In reply to this post by ozataman
This is all very helpful Oz.  Thanks.

What would be even more helpful would be ticket(s) that focus on individual bad aspects, especially ones that are non-linear.  Eg ?Here are modules with 500, 1000, 2000 declarations, and compile time goes up non-linearly?.

The two big ones seem to be:

?         Type declarations

?         TH-enabled derivations (whatever they are)

The simpler the better.   Eg safecopy probably has fewer dependencies than lens.  And maybe it?s nothing to do with safecopy, but any old derivation will do?

What?s ideal is to be able to turn up the size knob, with a profiled compiler, and see which bits of the compiler are going non-linear.

Thanks!

Simon

From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Ozgun Ataman
Sent: 14 April 2015 20:56
To: Simon Marlow
Cc: michael.xavier at soostone.com; ghc-devs at haskell.org
Subject: Re: Slower Compilation on GHC 7.8.4 (vs. 7.6.3)

Hello all,

Thank you for the comments and apologies that it took a bit of time to get back to you. My colleague Michael and I have taken a stab at a package demonstrating the slowdown. Our hypothesis was that the presence of type declarations, large values/literals, typeclass instantiations and engagement of TH caused at least some of the slowdown we were seeing in our app code. It turns out this was only partially true:

  *   500 random type declarations were a modest 10-25% slower.
  *   Simply instantiating types with a given class did not contribute that much to the slowdown (around 5% on top of type declarations)
  *   There was no notable slowdown on large values/literals (false hypothesis)
  *   TH-enabled derivations were over 100% slower (lens and safecopy used as test subjects)

     *   lens: ~100% slower
     *   safecopy: ~125% slower

  *   The slowdowns appeared to be super-linear in at least some cases. Doubling the declaration count to 1000 increased the % of the slowdown:

     *   Declarations: 30% slower
     *   lens TH slowdown: ~100% slower (stayed the same)
     *   safecopy TH slowdown: ~140% slower

I'd be happy to create a ticket if this is sufficient support material. It's hard to say these conclusively isolate *all* culprits for the full slowdown we observe in our codebase, but are hopefully at least a significant part.

Here's the package: https://dl.dropboxusercontent.com/u/58525/ghc-slowdown.tar.gz

Best,
Oz



On Tue, Apr 7, 2015 at 3:39 PM, Simon Marlow <marlowsd at gmail.com<mailto:marlowsd at gmail.com>> wrote:
Just a comment on this one point:

On 05/04/2015 19:01, Ozgun Ataman wrote:
  * With O0 and -j=N the overall wall-clock time is 40% higher, but the
    CPU time spent is a staggering 7.5X higher (!)

When looking at Haskell programs executing in parallel, it's normal to see a high CPU time, because the GC threads spin looking for work.  The extra CPU time is just time spent spinning, it doesn't imply that the compiler was doing extra work.  What matters is the overall wall-clock time - a worse wall-clock time indicates that something has gone wrong.

We should look at compile-time regression independently of performance issues with -j.  Parallel compilation is a new feature and mostly likely needs a lot of tuning.  Note that cabal's -j feature is different from GHC's (and is likely to give much more reliable improvements) because it runs multiple compiler instances in parallel.

Cheers,
Simon

  * With -O0, "cabal repl" load times have gone up by a staggering 2.5X (!)
      o 7.8.4: 3 min. 35 seconds
      o 7.6.3: 1 min. 2 seconds
  * Measuring compile times for individual modules, we see that those
    that are heavily loaded with lots of types, TH-facilitated type and
    typeclass derivations and those that contain large "blobs" of values
    directly at the top level now take much longer to compile:
      o We have 10 modules that each take over 10 seconds
      o We have 3 modules that each take over 35 seconds
  * Sidenote observation: In general, parallel builds with -j appear to
    cause a very significant overhead under the "system" part of timing:
      o Example with 7.6.3, O0: cabal build  140.57s user 13.25s system
        100% cpu 2:33.70 total
      o Example with 7.8.4, O0: cabal build  507.83s user 655.43s system
        549% cpu 3:31.59 total


The main hurt here has been the infeasibility of using "cabal repl /
ghci"  fast-feedback style development (via emacs, vim, command-line,
etc.), since:

  * Unlike "cabal build", "cabal repl" re-compiles from scratch with
    every invocation ":l App.Foo.Bar", which itself is the necessary
    first step when starting to hack on a module (we can then use :r to
    somewhat help the situation)
  * The slowdown is about 2.5X as stated above: What used to take a
    minute now takes 3.5 minutes.
  * cabal repl does not seem to benefit from -j at all
  * cabal repl 7.8.4 appears to be hurt particularly by the heavier modules
  * The heavier modules are often at the top of the compile tree (types,
    derivations, etc) and are practically loaded on the critical path
    all the time

Let me know if I'm missing anything here and any/all feedback is much
appreciated!

Cheers,
Oz


_______________________________________________
ghc-devs mailing list
ghc-devs at haskell.org<mailto:ghc-devs at haskell.org>
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20150415/6263453f/attachment-0001.html>