Problem with compiler perf tests

Ömer Sinan Ağacan
Hi,

Currently we have a bunch of tests in testsuite/tests/perf/compiler for keeping
compile-time allocations, max residency etc. within the expected ranges, to
avoid introducing accidental compile-time performance regressions.
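
To make this concrete, an entry in those all.T files looks roughly like the
sketch below (written from memory, so treat the exact test name, metric names
and tolerances as illustrative): each test says which compiler stats to
collect and how far they may drift from the recorded baseline before the test
fails.

    # Illustrative perf test entry, testsuite/tests/perf/compiler/all.T style.
    # The numbers are the allowed deviation from the baseline, in percent.
    test('T1969',
         [collect_compiler_stats('bytes allocated', 1),   # compile-time allocations
          collect_compiler_stats('max_bytes_used', 15)],  # max residency
         compile,
         [''])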

This has a problem: we expect every MR to keep the compile-time stats within
the specified ranges, but sometimes a patch fixes an issue or does something
right (removes hacks, refactors bad code, etc.) and also increases the numbers,
because doing it right sometimes means doing more work or keeping more things
in memory (e.g. !1747, and !2100, which is required by !1304).

We then spend hours or days trying to shave a few bytes off in those patches,
because the previous hacky/buggy code set the standard. It doesn't make sense
to compare bad/buggy code with good code and expect them to perform the same.

A second problem is that it forces the developer to focus on a tiny part of
the compiler just to bring the numbers back to where they were. If they looked
at the bigger picture instead, they might see room for improvement in other
places, which could lead to a much more efficient use of developer time.

I think what we should do instead is this: once it's clear that a patch does
not introduce *accidental* increases in the numbers (e.g. in !2100 I checked
and explained the increase in average residency, and showed that the increase
makes sense and is not a leak) and that it's the right thing to do, we should
merge it and track the remaining performance issues in a separate ticket. The
CI should still run the perf tests, but they should be allowed to fail.
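
Concretely, I imagine something like a new modifier in the test driver that
keeps collecting and reporting the stats but never turns a deviation into a
CI failure. The following is only a sketch of the idea; nothing called
perf_non_fatal exists today, and the flag it sets is made up:

    # Hypothetical modifier: stats are still collected and drift is still
    # reported in the CI log, but a deviation no longer fails the build.
    def perf_non_fatal(name, opts):
        opts.perf_failures_are_warnings = True   # hypothetical flag

    test('T1969',
         [collect_compiler_stats('bytes allocated', 1),
          perf_non_fatal],
         compile,
         [''])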

Any opinions on this?

Ömer

Re: Problem with compiler perf tests

Ben Gamari-3


On November 17, 2019 3:22:59 AM EST, "Ömer Sinan Ağacan" <[hidden email]> wrote:

> [...]

To be clear, our policy is not that GHC's performance tests should never regress. You are quite right that it is sometimes unrealistic to expect a patch fixing previously-incorrect behavior to do so without introducing additional cost.

However, we do generally want to ensure that we aren't introducing low-hanging regressions. This is most easily done before the merge request is merged, when a developer has the relevant bits of the design (both new and old) fresh in mind and a clear picture of the structure of their implementation.

This is of course a tradeoff. At some point we must conclude that the marginal benefit of investigating any potential regressions is outweighed by that of using that effort elsewhere in the compiler. This is inevitably a judgement call.


In the specific case of !2100 I think we probably have crossed this threshold. The overall ~0.1% compile time regression that you report seems reasonable and I doubt that further work on this particular patch will eliminate this.

However, it also seems that in this particular case there are outstanding design questions which have yet to be addressed (specifically the exchange between you and Simon regarding PartialModIface, which has more to do with implementation clarity than performance). I agree with Simon that we should avoid committing this patch in two pieces unless there is a good reason. Perhaps you have such a reason?

Cheers,

- Ben

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Problem with compiler perf tests

Andreas Klebinger

Ömer Sinan Ağacan wrote on 17.11.2019 at 09:22:
> I think what we should do instead is this: once it's clear that a patch does
> not introduce *accidental* increases in the numbers (e.g. in !2100 I checked
> and explained the increase in average residency, and showed that the increase
> makes sense and is not a leak) and that it's the right thing to do, we should
> merge it
But isn't that what we do already? We don't expect changes to be entirely free
of performance implications, as long as the regressions can be argued for.

However, it's easy for "insignificant" changes to compound into a significant
slowdown, so I don't think we are being too careful currently. I've never seen
anyone care about "a few bytes".
Assuming we get six MRs a year that each regress a metric by 1%, that adds up
quickly: after three years we will be about 20% worse. So I think we are right
to be cautious about these things.
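
Back of the envelope, assuming those six 1% regressions a year for three
years:

    # Eighteen compounding 1% regressions over three years.
    regressions = 6 * 3
    slowdown = 1.01 ** regressions - 1
    print(f"{slowdown:.1%}")   # prints 19.6%, i.e. roughly 20% worse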

It's just that people sometimes (as in !2100 initially) disagree on what the
right thing to do is. But I don't see a way around that, no matter where we
set the thresholds. That can only be resolved by discourse.

What I don't agree with is pushing that discussion into separate tickets in
general. That would just mean we get a bunch of performance regressions, plus
a bunch of tickets documenting them.

Which is better than not documenting them at all! And sometimes that will be
the best course of action. But if there is a chance to resolve a performance
issue while the patch is still being worked on, that will in general be the
better solution.

At least that's my opinion on the general case.

Cheers,
Andreas
