Perf tests which are better than expected on perf builds


Perf tests which are better than expected on perf builds

Edward Z. Yang
These tests have been doing better than expected in the nightlies
for some while.

> Unexpected failures:
>    perf/compiler  T3064 [stat too good] (normal)
>    perf/compiler  T3294 [stat too good] (normal)
>    perf/compiler  T5642 [stat too good] (normal)
>    perf/haddock   haddock.Cabal [stat too good] (normal)
>    perf/haddock   haddock.base [stat too good] (normal)

Unfortunately, fixing them is not a simple matter of shifting
the ranges, since the tests only exceed expectations on
a /perf/ build; on a normal build such as 'quick', these
tests all pass normally.

I could widen the bounds so that the builder stops bleating
about them; or perhaps we could do something more complicated where the
expected performance depends on what level of optimization GHC was built
with (but I don't know how to implement this).
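
To make the second idea concrete, here is a rough sketch of the sort of
check I have in mind (the per-flavour bounds table and the way of
detecting the compiler's build flavour are inventions for illustration;
this is not the actual testsuite API):

    # Hypothetical sketch: choose perf-test bounds based on how GHC
    # itself was built. Nothing here is the real testsuite interface.

    # expected 'bytes allocated' and allowed deviation (%), per flavour
    expected_allocs = {
        'perf':  (9000000000, 5),    # an optimised compiler allocates less
        'quick': (11000000000, 5),
    }

    def compiler_build_flavour():
        # would have to come from the compiler's build settings,
        # e.g. something 'ghc --info' could be taught to report
        return 'quick'               # placeholder

    def check_allocs(actual):
        expected, dev = expected_allocs[compiler_build_flavour()]
        lo = expected * (100 - dev) // 100
        hi = expected * (100 + dev) // 100
        if actual < lo:
            return 'stat too good'
        if actual > hi:
            return 'stat too bad'
        return 'ok'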

Thoughts?

Cheers,
Edward




Perf tests which are better than expected on perf builds

Ian Lynagh
On Sat, Jul 20, 2013 at 11:26:10AM -0700, Edward Z. Yang wrote:

> These tests have been doing better than expected in the nightlies
> for some while.
>
> > Unexpected failures:
> >    perf/compiler  T3064 [stat too good] (normal)
> >    perf/compiler  T3294 [stat too good] (normal)
> >    perf/compiler  T5642 [stat too good] (normal)
> >    perf/haddock   haddock.Cabal [stat too good] (normal)
> >    perf/haddock   haddock.base [stat too good] (normal)
>
> Unfortunately, fixing them is not a simple matter of shifting
> the ranges, since the tests only exceed expectations on
> a /perf/ build; on a normal build such as 'quick', these
> tests all pass normally.
>
> I could widen the bounds so that the builder stops bleating
> about them; or perhaps we could do something more complicated where the
> expected performance depends on what level of optimization GHC was built
> with (but I don't know how to implement this).
>
> Thoughts?

The problem with just widening the bounds to cover two different types of
build is that it increases the chance that performance changes won't
actually be noticed by the person responsible.

Having different bounds for different build configurations is a pain,
because (a) the testsuite has to work out which set of bounds to use,
and (b) you now have even more wobbly values to keep up-to-date.

I think perhaps the best thing would be to add some sort of (per-test?)
fudge factor for non-validate builds. That way validate will still find
performance regressions, like it does today, but other builds are less
likely to give false positives.
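
Roughly something like this, as a sketch (the names and the default
factor are made up; the real driver would need some way of knowing
whether it is running a validate build):

    # Hypothetical sketch of a per-test fudge factor for non-validate
    # builds; the names and the default factor are illustrative only.

    DEFAULT_FUDGE = 2.0   # widen the tolerance 2x outside validate

    def allowed_deviation(base_dev_pct, is_validate, fudge=DEFAULT_FUDGE):
        # validate keeps the tight bounds, so it still catches
        # regressions; other builds get slacker bounds to reduce
        # false positives
        return base_dev_pct if is_validate else base_dev_pct * fudge

    # e.g. a test allowing +/-5% on validate would allow +/-10% elsewhere:
    # allowed_deviation(5, False)  ==>  10.0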


Thanks
Ian
--
Ian Lynagh, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/




Perf tests which are better than expected on perf builds

Edward Z. Yang
OK, I ticket-ified this conversation; at some point I'll get around
to this.

Excerpts from Ian Lynagh's message of Sun Jul 21 03:25:50 -0700 2013:

> [...]
> The problem with just widening the bounds to cover two different types of
> build is that it increases the chance that performance changes won't
> actually be noticed by the person responsible.
>
> Having different bounds for different build configurations is a pain,
> because (a) the testsuite has to work out which set of bounds to use,
> and (b) you now have even more wobbly values to keep up-to-date.
>
> I think perhaps the best thing would be to add some sort of (per-test?)
> fudge factor for non-validate builds. That way validate will still find
> performance regressions, like it does today, but other builds are less
> likely to give false positives.
>
>
> Thanks
> Ian