Testing of GHC extensions & optimizations

Rodrigo Stevaux
Hi,

For those familiar with GHC source code & internals, how are extensions & optimizations tested? And what are the quality policies for accepting new code into GHC?

I am interested in random testing of compilers in general. Is it used for GHC?




Re: Testing of GHC extensions & optimizations

Ömer Sinan Ağacan
Hi,

Here are a few things we do regarding compiler/runtime performance:

- Each commit goes through a set of tests, some of which also check max.
  residency, total allocations, etc. of the compiler or of the compiled program,
  and fail if those numbers exceed the allowed thresholds. See [1] for an
  example.

- There's https://perf.haskell.org/ghc/ which does some testing on every
  commit. I don't know exactly what it's doing (it's hard to tell from the web
  page, but I guess it's only running a few select tests/benchmarks?). I've
  personally never used it; I just know that it exists.

- Most of the time, if a patch is expected to change compiler or runtime
  performance, the author submits nofib results and updates the perf tests in
  the test suite with the new numbers. This process is manual, and reviewers
  sometimes ask contributors for nofib numbers. See [2,3] for nofib.

We currently don't use random testing.

[1]: https://github.com/ghc/ghc/blob/565ef4cc036905f9f9801c1e775236bb007b026c/testsuite/tests/perf/compiler/all.T#L30
[2]: https://github.com/ghc/nofib
[3]: https://ghc.haskell.org/trac/ghc/wiki/Building/RunningNoFib

Ömer

Rodrigo Stevaux <[hidden email]> wrote on Fri, 31 Aug 2018 at 20:54:

> [...]

Re: Testing of GHC extensions & optimizations

Rodrigo Stevaux
Hi Omer, thanks for the reply. The tests you run are for regression testing, that is, for non-functional aspects; is my understanding right? What about testing that optimizations and extensions are functionally correct?

On Sat, 1 Sep 2018 at 08:32, Ömer Sinan Ağacan <[hidden email]> wrote:

> [...]

Re: Testing of GHC extensions & optimizations

Sven Panne
On Sun, 2 Sep 2018 at 20:05, Rodrigo Stevaux <[hidden email]> wrote:
> Hi Omer, thanks for the reply. The tests you run are for regression testing, that is, for non-functional aspects; is my understanding right? [...]

Quite the opposite, the usual steps are:

   * A bug is reported.
   * A regression test is added to GHC's test suite, reproducing the bug (https://ghc.haskell.org/trac/ghc/wiki/Building/RunningTests/Adding).
   * The bug is fixed.

This way we make sure that the bug doesn't come back later. Do this for a few decades, and you have a very comprehensive test suite for functional aspects. :-) The reasoning behind this: blindly adding tests is wasted effort most of the time, because that way you often test things which only very rarely break. Bugs, OTOH, point very concretely at problematic/tricky/complicated parts of your SW.
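
To make that concrete: such a regression test is usually just a tiny program plus its expected output. A sketch with made-up names (hypothetical ticket number and files; the real thing also needs a one-line entry in the directory's all.T, see the link above):

    -- T99999.hs: hypothetical minimal reproducer for a made-up ticket
    -- #99999, in which an optimization allegedly broke a case alternative.
    {-# NOINLINE f #-}           -- keep f from being inlined away
    f :: Int -> Int
    f 0 = 1
    f n = n * 2

    main :: IO ()
    main = print (map f [0, 1, 2])
    -- expected output, kept next to the test in T99999.stdout: [1,2,4]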

Catching increases in runtime/memory consumption is a slightly different story, because you have to come up with "typical" scenarios to make useful comparisons. You can have synthetic scenarios for very specific parts of the compiler, too, like pattern matching with tons of constructors, or using gigantic literals, or type checking deeply nested tricky things, etc., but I am not sure if such things are usually called "regression tests".
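
Such a synthetic scenario can even be generated mechanically. A throwaway sketch of my own (not something from GHC's test suite):

    -- Emit a module containing one data type with n constructors and a
    -- complete case over it, to stress pattern-match compilation.
    main :: IO ()
    main = putStr (genModule 500)

    genModule :: Int -> String
    genModule n = unlines $
         ["module Stress where", "", "data T"]
      ++ ["  " ++ bar i ++ " C" ++ show i | i <- [1 .. n]]
      ++ ["", "f :: T -> Int"]
      ++ ["f C" ++ show i ++ " = " ++ show i | i <- [1 .. n]]
      where
        bar i = if i == 1 then "=" else "|"

One would then watch how compile time and compiler allocations scale with n.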

Cheers,
   S.




Re: Testing of GHC extensions & optimizations

Joachim Durchholz
On 02.09.2018 at 21:58, Sven Panne wrote:
> Quite the opposite, the usual steps are:
>
>     * A bug is reported.
>     * A regression test is added to GHC's test suite, reproducing the
> bug (https://ghc.haskell.org/trac/ghc/wiki/Building/RunningTests/Adding).
>     * The bug is fixed.
>
> This way it is made sure that the bug doesn't come back later.

That's just the... non-thinking aspect, and more about embarrassment
avoidance. It's the first level of automated testing.

> Do this
> for a few decades, and you have a very comprehensive test suite for
> functional aspects. :-) The reasoning behind this: blindly adding tests
> is wasted effort most of the time, because that way you often test things
> which only very rarely break. Bugs, OTOH, point very concretely at
> problematic/tricky/complicated parts of your SW.

Well, you have to *think*.
You can't just blindly add tests for every bug that was ever reported;
you get an ever-growing pile of test code, and if the spec changes you
need to change the tests. So you need a strategy to curate the test
code, and you very much prefer to test for the thing that actually went
wrong, not the thing that was reported.

I'm pretty sure the GHC guys do, actually; I'm just speaking up so that
people don't take this "just add a test whenever a bug occurs" at face
value; there's much more to it.

> Catching increases in runtime/memory consumption is a slightly different
> story, because you have to come up with "typical" scenarios to make
> useful comparisons.

It's just a case where you cannot blindly add a test for every
performance regression you see; you have to set up testing beforehand.
Which is the exact opposite of what you recommend, so maybe the
recommendation shouldn't be taken at face value ;-P

> You can have synthetic scenarios for very specific
> parts of the compiler, too, like pattern matching with tons of
> constructors, or using gigantic literals, or type checking deeply nested
> tricky things, etc., but I am not sure if such things are usually called
> "regression tests".

It's a matter of definition and common usage, but indeed many people
associate the term "regression testing" with "let's write a test case
whenever we see a bug".

This is one of the reasons why I prefer the term "automated testing".
It's both more general and encompasses all the things that one does.

Oh, and sometimes you even add a test blindly due to a bug report. It's
still a good first line of defense; it's just not what you should always
do, and never without thinking about an alternative.

Regards,
Jo

Re: Testing of GHC extensions & optimizations

Rodrigo Stevaux
In reply to this post by Sven Panne
Thanks for the clarification.

What I am hinting at is that the Csmith project caught many bugs in C compilers by using random testing -- feeding compilers random programs and checking whether the optimizations preserved program behavior.

GHC, having dozens of optimizations, could be a good candidate for the same technique.

I have no familiarity with GHC or with compilers in general; I am just looking for something to study.

My question, in its most direct form, is: in your view, could GHC optimizations hide bugs that could potentially be revealed by exploring the program space?
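
In miniature, the idea looks something like the QuickCheck sketch below: a toy expression language, a toy "optimizer", and a property saying the optimizer must preserve meaning. (My own illustration, nowhere near Csmith's scale; Csmith generates real C programs and diffs the compilers' outputs.)

    import Test.QuickCheck

    -- A tiny expression language standing in for "programs".
    data Expr = Lit Int | Add Expr Expr | Mul Expr Expr
      deriving Show

    instance Arbitrary Expr where
      arbitrary = sized gen
        where
          gen 0 = Lit <$> arbitrary
          gen n = oneof
            [ Lit <$> arbitrary
            , Add <$> gen (n `div` 2) <*> gen (n `div` 2)
            , Mul <$> gen (n `div` 2) <*> gen (n `div` 2) ]

    -- Reference semantics.
    eval :: Expr -> Int
    eval (Lit n)   = n
    eval (Add a b) = eval a + eval b
    eval (Mul a b) = eval a * eval b

    -- A toy optimizer: constant folding plus two identity rewrites.
    optimize :: Expr -> Expr
    optimize (Add a b) = case (optimize a, optimize b) of
      (Lit 0, b')    -> b'
      (a', Lit 0)    -> a'
      (Lit x, Lit y) -> Lit (x + y)
      (a', b')       -> Add a' b'
    optimize (Mul a b) = case (optimize a, optimize b) of
      (Lit 1, b')    -> b'
      (a', Lit 1)    -> a'
      (Lit x, Lit y) -> Lit (x * y)
      (a', b')       -> Mul a' b'
    optimize e = e

    -- The property that Csmith-style testing checks at full scale:
    -- optimization must not change observable behavior.
    prop_optPreservesMeaning :: Expr -> Bool
    prop_optPreservesMeaning e = eval (optimize e) == eval e

    main :: IO ()
    main = quickCheck prop_optPreservesMeaning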

On Sun, 2 Sep 2018 at 16:58, Sven Panne <[hidden email]> wrote:

> [...]

Re: Testing of GHC extensions & optimizations

Sven Panne
In reply to this post by Joachim Durchholz
On Sun, 2 Sep 2018 at 22:44, Joachim Durchholz <[hidden email]> wrote:
> That's just the... non-thinking aspect, and more about embarrassment
> avoidance. It's the first level of automated testing.

Well, even avoiding embarrassing bugs is extremely valuable. The vast majority of bugs in real-world SW *are* actually highly embarrassing, and even worse: similar bugs have probably been introduced before. Getting some tricky algorithm wrong is the exception, for at least two reasons: the majority of code is typically very mundane and boring, and people are usually more awake and concentrated when they know that they are writing non-trivial stuff. Of course your mileage may vary, depending on the domain, the experience of the programmers, deadline pressure, etc.
 
> Do this
> for a few decades, and you have a very comprehensive test suite for
> functional aspects. :-) The reasoning behind this: blindly adding tests
> is wasted effort most of the time, because that way you often test things
> which only very rarely break. Bugs, OTOH, point very concretely at
> problematic/tricky/complicated parts of your SW.

> Well, you have to *think*.
> You can't just blindly add tests for every bug that was ever reported;
> you get an ever-growing pile of test code, and if the spec changes you
> need to change the tests. So you need a strategy to curate the test
> code, and you very much prefer to test for the thing that actually went
> wrong, not the thing that was reported.

Two things here: I never proposed to add the exact code from the bug report to a test suite. Bug reports are usually too big and too unspecific, so of course you add a minimal, focused test triggering the buggy behavior. Furthermore: if the spec changes, your tests *must* break, by all means; otherwise, what are the tests actually testing, if not the spec? Of course only those tests should break which test the changed part of the spec.
 
> It's just a case where you cannot blindly add a test for every
> performance regression you see; you have to set up testing beforehand.
> Which is the exact opposite of what you recommend, so maybe the
> recommendation shouldn't be taken at face value ;-P

This is exactly why I said that these tests are a different story. For performance measurements there is no binary "passed" or "failed" outcome, because typically many tradeoffs are involved (space vs. time etc.). Therefore you have to define what you consider important, measure that, and guard it against regressions.

> It's a matter of definition and common usage, but indeed many people
> associate the term "regression testing" with "let's write a test case
> whenever we see a bug". [...]

This sounds far too disparaging, and quite a few companies have a rule like "no bug fix gets committed without an accompanying regression test" for a good reason. People usually have no real clue where their most problematic code is (just as they have no clue where the most performance-critical part is), so having *some* hint (a bug report) is far better than guessing without any hint.

Cheers,
   S.


Re: Testing of GHC extensions & optimizations

Emil Axelsson
In reply to this post by Rodrigo Stevaux
Have a look at Michal Palka's Ph.D. thesis:

https://research.chalmers.se/publication/195849

IIRC, his testing revealed several strictness bugs in GHC when compiling
with optimization.
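
To give a flavor of what "strictness bug" means here, a toy illustration of my own (not Palka's actual harness, which generates random lambda terms and compares behavior across optimization levels):

    import Control.Exception (SomeException, evaluate, try)

    -- head . map (+1) must not force the tail of its argument. If a
    -- (hypothetical) overly strict optimization forced the list spine,
    -- the 'undefined' below would be hit; lazily, the answer is just 2.
    main :: IO ()
    main = do
      r <- try (evaluate (head (map (+ 1) (1 : undefined))))
             :: IO (Either SomeException Int)
      putStrLn $ case r of
        Right 2 -> "OK: as lazy as the semantics require"
        Right n -> "unexpected value: " ++ show n
        Left _  -> "FAIL: too strict, the partial input was forced"

Compiling the same test with -O0 and with -O2 and comparing the output is the essence of the approach.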

/ Emil

On 2018-09-03 at 03:40, Rodrigo Stevaux wrote:

> [...]


Re: Testing of GHC extensions & optimizations

Rodrigo Stevaux
OK, this is the kind of stuff I'm looking for; this is great. Many thanks for the insight.

On Mon, 3 Sep 2018 at 04:08, Emil Axelsson <[hidden email]> wrote:

> [...]