nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2



Johan Tibell-2
Hi all,

I haven't had much time to do performance tsar work yet, but I did run
nofib on the last few GHC releases to see the current trend. The benchmarks
were run on my 64-bit Core i7-3770 @ 3.40GHz Linux machine. Here are the
results:

7.0.4 to 7.4.2:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
            Min          -1.6%    -57.3%    -39.1%    -36.4%    -25.0%
            Max         +21.5%   +121.5%    +24.5%    +25.4%   +300.0%
 Geometric Mean          +8.5%     -0.7%     -7.1%     -5.2%     +2.0%
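(The "Geometric Mean" rows summarise per-program ratios rather than a plain average of the percentages. A minimal sketch of the idea, using a hypothetical helper rather than nofib-analyse's actual code:)

```haskell
-- Combine per-program percentage changes (e.g. +2.0, -7.1) by taking
-- the geometric mean of the corresponding ratios (new/old) and
-- re-expressing the result as a percentage change.
-- Assumes a non-empty input list.
geometricMeanChange :: [Double] -> Double
geometricMeanChange pcts = (gm - 1) * 100
  where
    ratios = map (\p -> 1 + p / 100) pcts
    gm     = product ratios ** (1 / fromIntegral (length pcts))
```

(For example, a +100% and a -50% change cancel to a 0% geometric mean, whereas a plain average would report +25%.)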

The big loser here in terms of runtime is "kahan", which I added to test
tight loops involving unboxed arrays and floating point arithmetic. I
believe there was a regression in fromIntegral RULES during this release,
which meant that some conversions between fixed-width types went via
Integer, causing unnecessary allocation.
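(To illustrate the sort of code affected, a hypothetical sketch rather than the actual nofib "kahan" program: in a loop like the one below, if the fromIntegral rules don't fire, the Word32-to-Double conversion can be routed through Integer, allocating on every iteration instead of compiling to a primitive conversion.)

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Word (Word32)

-- Sum 0.0 + 1.0 + ... + fromIntegral (n - 1), driving the loop with a
-- fixed-width counter.  With working fromIntegral rules the conversion
-- is a single primop and the loop runs without allocating.
sumConverted :: Word32 -> Double
sumConverted n = go 0 0
  where
    go :: Word32 -> Double -> Double
    go !i !acc
      | i >= n    = acc
      | otherwise = go (i + 1) (acc + fromIntegral i)
```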

7.4.2 to 7.6.1:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
            Min          -5.1%    -23.8%    -11.8%    -12.9%    -50.0%
            Max          +5.3%   +225.5%     +7.2%     +8.8%   +200.0%
 Geometric Mean          -0.4%     +2.1%     +0.3%     +0.2%     +0.7%

The biggest loser here in terms of runtime is "integrate". I haven't looked
into why yet.

7.6.1 to 7.6.2:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
            Min          -2.9%     +0.0%     -4.8%     -4.4%     -1.9%
            Max          +0.0%     +1.0%     +4.5%     +6.4%    +20.8%
 Geometric Mean          -1.7%     +0.0%     +0.1%     +0.3%     +0.2%

I have two takeaways:

 * It's worthwhile running nofib before releases as it does find some
programs that regressed.
 * There are some other regressions out there (i.e. in code on Hackage)
that aren't reflected here, suggesting that we need to add more programs to
nofib.

Cheers,
Johan


Austin Seipp
I'm +1 for this. Eyal Lotem and I were just discussing this on IRC a
few minutes ago, and he suffered a rather large (~25%) performance hit
when upgrading to 7.6.1, which is unfortunate.

Committers are typically very good about recording nofib results in
their commits and being performance-conscious, but I'm not sure there's
ever been a longer-term view of GHC performance over multiple
releases like this, or even over a few months. At least not recently. On
top of that, his application was a type checker, which may well
stress different performance points than nofib does. Once we get
performance bots set up, I've got a small set of machines I'm willing
to throw at it.

Thanks for the results, Johan!

--
Regards,
Austin



Simon Peyton Jones
I'm +10.  This is precisely the reason we have our supreme Performance Tsars: to keep us honest.  GHC leadership is becoming increasingly decentralised, and I am truly grateful to Bryan and Johan for picking up this particular challenge.

My guess is that regressions are accidental and readily fixed, but we can't fix them if we don't know about them.

Johan mentions more nofib benchmarks: yes please!  But someone has to put them in.

Austin, a 25% performance regression moving to 7.6 is not AT ALL what I expect; I generally expect modest performance improvements.  Can you characterise more precisely what is happening?  The place I always start is to compile the entire thing with -ticky and see where allocation is changing.  (Using -prof affects the optimiser too much.)

Simon




Nicolas Frisby
In reply to this post by Johan Tibell-2
Is anyone familiar with the "fibon" directory within the nofib.git
repository?

http://darcs.haskell.org/nofib/fibon/

Johan, this at least seems like a potential home for the additional
programs you suggested adding. In particular, it has Repa, Dph, Shootout,
and Hackage subdirectories.

I'm doing a GHC HQ internship at the moment, and one of
the just-needs-to-happen tasks on my (growing) todo list is to look into
fibon.

SPJ recalls that not all of the various build infrastructures were
getting along. Anyone know the story? Thanks!




Simon Peyton Jones
I believe fibon/ was helpfully added by someone, but never integrated into the nofib build system.  It just needs doing, I think.

Simon



Simon Marlow-7
On 05/02/13 10:13, Simon Peyton-Jones wrote:
> I believe fibon/ was helpfully added by someone, but never integrated
> into the nofib build system.  Just needs doing, I think

Right - I think it was even integrated into the build system, but it
wasn't turned on by default.  I tried it once and something didn't work,
and I didn't have the time to fix it then.

There are some other collections of programs in nofib that aren't run by
default:

nofib/gc

My GC benchmarks (some of these overlap with the rest of nofib, but
might have different inputs/parameters).  I usually run these when I
change something in the GC.

nofib/smp

The concurrency benchmarks.  Edward is using these to tune his new
scheduler.  These could be enabled by default.

nofib/parallel

The parallel benchmarks.  It wouldn't hurt to run these by default too,
on at least 1 core and maybe more.  I generally run them on 8 cores when
I change something in the RTS.

Cheers,
        Simon






David Terei
In reply to this post by Nicolas Frisby
On 5 February 2013 01:24, Nicolas Frisby <nicolas.frisby at gmail.com> wrote:
> Is anyone familiar with the "fibon" directory within the nofib.git
> repository?
>
> http://darcs.haskell.org/nofib/fibon/

Yes. They are from here: https://github.com/dmpots/fibon

Fibon is a newer, alternative benchmarking suite for Haskell, by
David M. Peixotto. I've used it at times but sadly haven't had much
luck; it always seems to take many hours to run on my machine.




David Terei
In reply to this post by Simon Peyton Jones
On 5 February 2013 02:13, Simon Peyton-Jones <simonpj at microsoft.com> wrote:
> I believe fibon/ was helpfully added by someone, but never integrated into
> the nofib build system.  Just needs doing, I think

No, I spent a fair amount of effort fixing this up about 9 months back.
At that stage it worked fine. I haven't run it for 6 months, so I'm not
sure any more, but it should be at least close to working.




Nicolas Frisby
In reply to this post by Austin Seipp
I'd like to investigate the "other regressions out there".

Do you have more info? Perhaps a list? Maybe even benchmarking code?

Thanks.


On Tue, Feb 5, 2013 at 4:22 AM, Austin Seipp <mad.one at gmail.com> wrote:

> I'm +1 for this. Eyal Lotem and I were just discussing this on IRC a
> few minutes ago, and he suffered a rather large (~25%) performance hit
> when upgrading to 7.6.1, which is unfortunate.
>
> Committers are typically very good about recording nofib results in
> their commit and being performance-courteous, but I'm not sure there's
> ever been a longer-scale view of GHC performance over multiple
> releases like this - or even a few months. At least not recently. On
> top of that, his application was a type checker, which may certainly
> stress different performance points than what nofib might. Once we get
> performance bots set up, I've got a small set of machines I'm willing
> to throw at it.
>
> Thanks for the results, Johan!


nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

Tim Watson
We have some benchmarks for Cloud Haskell and its underlying network-transport
infrastructure that I'm in the process of trying to automate. I'd be very
interested to see how these fare against various GHC releases, though I suspect
we'll have to tweak the dependencies considerably in order to make the
automation happen.

I don't know if that fits into the 'other regressions' category or not?

Cheers,
Tim

On 5 Feb 2013, at 14:24, Nicolas Frisby wrote:

> I'd like to investigate the "other regressions out there".
>  
> Do you have more info? Perhaps a list? Maybe even benchmarking code?
>  
> Thanks.




nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

Austin Seipp
In reply to this post by Simon Peyton Jones
On Tue, Feb 5, 2013 at 2:54 AM, Simon Peyton-Jones
<simonpj at microsoft.com> wrote:
> Austin, a 25% performance regression moving to 7.6 is not AT ALL what I expect. I generally expect modest performance improvements. Can you characterise more precisely what is happening?  The place I always start is to compile the entire thing with -ticky and see where allocation is changing.  (Using -prof affects the optimiser too much.)

I have CC'd Eyal just in case. The discussion was informal but he can
hopefully provide more context and rigor. Offhand, I think this
occurred in a rather large-ish application of his (Lamdu?), so
tracking down precise reasons may prove difficult. The most likely
explanation is that many 'small cuts' accumulate and add up badly for
this particular case - and that's really the worst kind of 'bug
report' of all!

hashable/lens alone, for example, could certainly make a sizable
impact here when added up; [1] is a recent example of an alleged perf
anomaly. And the OS could certainly be relevant.[2] All the more
reason to expand nofib and get those bots up!

[1] https://github.com/tibbe/hashable/issues/57

[2] Just thinking out loud, but, whenever this happens we really need
to characterize results on a per-OS/hardware basis if possible in the
future, with some relatively detailed hardware info, to be
unambiguous. In terms of raw CPU speed, a lot of benchmarks probably
won't stand out due to the OS. But OS X is scheduled to get worse in
the SMP case soon[3] for example, and if we inevitably try and start
doing things like latency or I/O benchmarks, I'm more than certain
things will pop up here.

[3] See this ticket: http://hackage.haskell.org/trac/ghc/ticket/7602
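For reference, the -ticky workflow Simon suggests looks roughly like this. The flag names follow the GHC user guide, but the file and program names here are made up for illustration:

```shell
# Compile with ticky-ticky counters (per-closure entry/allocation counts).
ghc -O2 -rtsopts -ticky TypeChecker.hs -o typechecker

# Run, dumping the counters to a file; this report can then be diffed
# between the 7.4-built and 7.6-built binaries to see where allocation moved.
./typechecker +RTS -rtypechecker.ticky
```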



--
Regards,
Austin



nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

Johan Tibell-2
In reply to this post by David Terei
On Tue, Feb 5, 2013 at 3:19 AM, David Terei <davidterei at gmail.com> wrote:

> On 5 February 2013 02:13, Simon Peyton-Jones <simonpj at microsoft.com>
> wrote:
> > I believe fibon/ was helpfully added by someone, but never integrated
> into
> > the nofib build system.  Just needs doing, I think
>
> No I spent a fair amount of effort fixing this up about 9 months back.
> At that stage it worked fine, I haven't run for 6 months so not sure
> any more but they should be close to working at the least.


Instead of trying to get fibon to work I'll try to get some of the shootout
benchmarks into nofib. These are small micro benchmarks that shouldn't
require anything special to run.

-- Johan


nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

David Terei
On 5 February 2013 09:34, Johan Tibell <johan.tibell at gmail.com> wrote:

> On Tue, Feb 5, 2013 at 3:19 AM, David Terei <davidterei at gmail.com> wrote:
>>
>> On 5 February 2013 02:13, Simon Peyton-Jones <simonpj at microsoft.com>
>> wrote:
>> > I believe fibon/ was helpfully added by someone, but never integrated
>> > into
>> > the nofib build system.  Just needs doing, I think
>>
>> No I spent a fair amount of effort fixing this up about 9 months back.
>> At that stage it worked fine, I haven't run for 6 months so not sure
>> any more but they should be close to working at the least.
>
>
> Instead of trying to get fibon to work I'll try to get some of the shootout
> benchmarks into nofib. These are small micro benchmarks that shouldn't
> require anything special to run.

Agreed. The issue with the fibon folder as a whole is that a lot of the
benchmarks have substantial dependencies, as they are taken from
Hackage to represent real-world programs. This is handled in a very
ugly fashion right now by just including a copy of the source of all
dependencies, so over time it will always break as GHC and base
change.

Shootout and some of them though don't have dependencies, so we should
look at moving them out of the fibon folder and enabling them by
default. After that we can look at better ways to handle the
dependencies of the remaining fibon benchmarks.

Why are you creating new shootout benchmarks, though, rather than simply
moving the existing Shootout folder from fibon/Shootout to the top level
and fixing the makefile?

Some of this discussion going forward may make more sense on trac.
There is a trac ticket for improving nofib in general here:
http://hackage.haskell.org/trac/ghc/ticket/5793

Cheers,
David




nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

Johan Tibell-2
On Tue, Feb 5, 2013 at 2:11 PM, David Terei <davidterei at gmail.com> wrote:

> Why are you creating new shootout benchmarks, though, rather than simply
> moving the existing Shootout folder from fibon/Shootout to the top level
> and fixing the makefile?
>

I discussed this with David offline. The summary is that the shootout
benchmarks in fibon have bitrotted to the point that they no longer
correspond to the shootout benchmarks on the official site, so there's
nothing really gained by modifying the current ones. In addition, I've made
sure to closely mirror the compilation settings and input sizes used in the
shootout.


nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

Johan Tibell-2
I've now added the shootout programs that could be added without modifying
the programs themselves. I described why some programs weren't added in
nofib/shootout/README.

For the curious, here's the change in these benchmarks from 7.0.4 to 7.6.2:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
   binary-trees          +2.6%     -0.6%     -2.8%     -2.8%    -22.3%
 fannkuch-redux          +1.4%+11514445.     +0.2%     +0.2%     +0.0%
         n-body          +3.8%     +0.0%     +4.4%     +4.4%     +0.0%
       pidigits          +2.2%     -6.9%     -1.7%     -1.2%    -20.0%
  spectral-norm          +2.1%    -61.3%    -54.8%    -54.8%     +0.0%
--------------------------------------------------------------------------------
            Min          +1.4%    -61.3%    -54.8%    -54.8%    -22.3%
            Max          +3.8%+11514445.     +4.4%     +4.4%     +0.0%
 Geometric Mean          +2.4%   +737.6%    -14.7%    -14.6%     -9.1%

Some interesting differences here (and some really good ones)!

I looked into fannkuch-redux (nofib/shootout/fannkuch-redux) and confirmed
the allocation difference:

7.0.4:

          93,680 bytes allocated in the heap
           2,880 bytes copied during GC
          43,784 bytes maximum residency (1 sample(s))
          21,752 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:     0 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   38.53s  ( 38.56s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   38.53s  ( 38.56s elapsed)

  %GC time       0.0%  (0.0% elapsed)

  Alloc rate    2,431 bytes per MUT second

  Productivity 100.0% of total user, 99.9% of total elapsed

7.6.2:

  10,538,113,312 bytes allocated in the heap
         819,304 bytes copied during GC
          44,416 bytes maximum residency (2 sample(s))
          25,216 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     20177 colls,     0 par    0.06s    0.05s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0002s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   38.76s  ( 38.82s elapsed)
  GC      time    0.06s  (  0.05s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   38.83s  ( 38.88s elapsed)

  %GC     time       0.2%  (0.1% elapsed)

  Alloc rate    271,864,153 bytes per MUT second

  Productivity  99.8% of total user, 99.7% of total elapsed

We're going from an essentially non-allocating program to an allocating one.
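For anyone wanting to reproduce this, the numbers above come straight from GHC's RTS statistics output. A rough sketch follows; the compiler binary names and the input size are hypothetical, adjust to your installation:

```shell
# Build the benchmark with each compiler and compare the '+RTS -s' output,
# in particular the "bytes allocated in the heap" line.
ghc-7.0.4 -O2 -rtsopts fannkuch-redux.hs -o fannkuch-704
ghc-7.6.2 -O2 -rtsopts fannkuch-redux.hs -o fannkuch-762
./fannkuch-704 10 +RTS -s
./fannkuch-762 10 +RTS -s
```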

Aside: I didn't use -fllvm, which is what the shootout normally uses.

-- Johan


nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

Simon Peyton Jones
In reply to this post by Johan Tibell-2
Instead of trying to get fibon to work I'll try to get some of the shootout benchmarks into nofib. These are small micro benchmarks that shouldn't require anything special to run.

Thank you!




nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

Simon Marlow-7
In reply to this post by Johan Tibell-2
On 05/02/13 23:48, Johan Tibell wrote:

> I've now added the shootout programs that could be added without
> modifying the programs itself. I described why some programs weren't
> added in nofib/shootout/README.
>
> For the curious, here's the change in these benchmarks from 7.0.4 to 7.6.2:
>
> --------------------------------------------------------------------------------
>          Program           Size    Allocs   Runtime   Elapsed  TotalMem
> --------------------------------------------------------------------------------
>     binary-trees          +2.6%     -0.6%     -2.8%     -2.8%    -22.3%
>   fannkuch-redux          +1.4%+11514445.     +0.2%     +0.2%     +0.0%
>           n-body          +3.8%     +0.0%     +4.4%     +4.4%     +0.0%
>         pidigits          +2.2%     -6.9%     -1.7%     -1.2%    -20.0%
>    spectral-norm          +2.1%    -61.3%    -54.8%    -54.8%     +0.0%
> --------------------------------------------------------------------------------
>              Min          +1.4%    -61.3%    -54.8%    -54.8%    -22.3%
>              Max          +3.8%+11514445.     +4.4%     +4.4%     +0.0%
>   Geometric Mean          +2.4%   +737.6%    -14.7%    -14.6%     -9.1%

This is slightly off topic, but I wanted to plant this thought in
people's brains: we shouldn't place much significance in the average of
a bunch of benchmarks (even the geometric mean), because it assumes that
the benchmarks have a sensible distribution, and we have no reason to
expect that to be the case.  For example, in the results above, we
wouldn't expect a 14.7% reduction in runtime to be seen in a typical
program.

Using the median might be slightly more useful, which here would be
something around 0% for runtime, though still technically dodgy.  When I
get around to it I'll modify nofib-analyse to report medians instead of GMs.
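The contrast between the two summaries can be sketched in a few lines of Haskell. The numbers are invented to mimic the table above, where one benchmark improves a lot and the rest barely move:

```haskell
import Data.List (sort)

-- Geometric mean of normalised runtime ratios (new/old); 1.0 = no change.
geomean :: [Double] -> Double
geomean xs = product xs ** (1 / fromIntegral (length xs))

-- Median: middle element (or mean of the two middle elements) when sorted.
median :: [Double] -> Double
median xs
  | odd n     = s !! mid
  | otherwise = (s !! (mid - 1) + s !! mid) / 2
  where
    s   = sort xs
    n   = length s
    mid = n `div` 2

main :: IO ()
main = do
  -- Four benchmarks unchanged, one 55% faster (like spectral-norm above):
  let ratios = [1.0, 1.0, 1.0, 1.0, 0.45]
  print (geomean ratios)  -- roughly 0.85, i.e. "15% faster on average"
  print (median ratios)   -- 1.0: the typical program did not change
```

This is only a sketch, of course: the point is that the two statistics answer different questions about the same data.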

Cheers,
        Simon




nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

Johan Tibell-2

Using the geometric mean as a way to summarize the results isn't that bad.
See "How not to lie with statistics: the correct way to summarize benchmark
results" (http://ece.uprm.edu/~nayda/Courses/Icom6115F06/Papers/paper4.pdf).

That being said, I think the most useful thing to do is to look at the big
losers, as they're often regressions. Making some class of programs much
worse while improving the geometric mean overall is often worse
than changing nothing at all.

-- Johan


nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

Simon Marlow-7
On 06/02/13 16:04, Johan Tibell wrote:

> On Wed, Feb 6, 2013 at 2:09 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>
>     This is slightly off topic, but I wanted to plant this thought in
>     people's brains: we shouldn't place much significance in the average
>     of a bunch of benchmarks (even the geometric mean), because it
>     assumes that the benchmarks have a sensible distribution, and we
>     have no reason to expect that to be the case.  For example, in the
>     results above, we wouldn't expect a 14.7% reduction in runtime to be
>     seen in a typical program.
>
>     Using the median might be slightly more useful, which here would be
>     something around 0% for runtime, though still technically dodgy.
>       When I get around to it I'll modify nofib-analyse to report
>     medians instead of GMs.
>
>
> Using the geometric mean as a way to summarize the results isn't that
> bad. See "How not to lie with statistics: the correct way to summarize
> benchmark results"
> (http://ece.uprm.edu/~nayda/Courses/Icom6115F06/Papers/paper4.pdf).

Yes - our current usage of GM is because we read that paper :)  I've
reported GMs of nofib programs in several papers.  I'm not saying the
paper is wrong - the GM is definitely more correct than the AM for
averaging normalised results.

The problem is that we're attributing equal weight to all of our
benchmarks, without any reason to expect that they are representative.
We collect as many benchmarks as we can and hope they are
representative, but in fact it's rarely the case: often a particular
optimisation or regression will hit just one or two benchmarks.  So all
I'm saying is that we shouldn't expect the GM to be representative.
Often there's no sensible mean at all - saying "some programs get a lot
better but most don't change" is far more informative than "on average
programs got faster by 1.2%".

> That being said, I think the most useful thing to do is to look at the
> big losers, as they're often regressions. Making some class of programs
> much worse while improving the geometric mean overall is often worse
> than changing nothing at all.

Absolutely.

Cheers,
        Simon




nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

itkovian
In reply to this post by Johan Tibell-2
Hi Johan,

On 06 Feb 2013, at 17:04, Johan Tibell <johan.tibell at gmail.com> wrote:

> On Wed, Feb 6, 2013 at 2:09 AM, Simon Marlow <marlowsd at gmail.com> wrote:
> This is slightly off topic, but I wanted to plant this thought in people's brains: we shouldn't place much significance in the average of a bunch of benchmarks (even the geometric mean), because it assumes that the benchmarks have a sensible distribution, and we have no reason to expect that to be the case.  For example, in the results above, we wouldn't expect a 14.7% reduction in runtime to be seen in a typical program.
>
> Using the median might be slightly more useful, which here would be something around 0% for runtime, though still technically dodgy.  When I get around to it I'll modify nofib-analyse to report medians instead of GMs.

No.

> Using the geometric mean as a way to summarize the results isn't that bad. See "How not to lie with statistics: the correct way to summarize benchmark results" (http://ece.uprm.edu/~nayda/Courses/Icom6115F06/Papers/paper4.pdf).

I would argue the exact opposite. The geometric mean has absolutely no meaning whatsoever.

See e.g.,

- L. Eeckhout. Computer Architecture Performance Evaluation Methods. Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, June 2010.
- T. Kalibera and R. Jones. Quantifying Performance Changes with Effect Size Confidence Intervals. Technical report, 2012.
- D. J. Lilja. Measuring Computer Performance: A Practitioner's Guide. 2005.
- R. Jain. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. 1991.

> That being said, I think the most useful thing to do is to look at the big losers, as they're often regressions. Making some class of programs much worse while improving the geometric mean overall is often worse than changing nothing at all.

Yes.

Regards,
-- Andy Georges

PS. I wrote this a while back for the Evaluate collaborator


# The Mean Is Not A Simple Average

## Application domain

Aggregating measurements. The anti-pattern discusses a single example, though for all uses of an average it is important to consider the right mean to use. Examples of applicable means are: (weighted) arithmetic, (weighted) harmonic, each with respect to the proper weighting factors.

## Premise

You have a set of benchmarks and you wish to quantify your Shiny New Idea (SNI). To make the results easy to grok, you decide to aggregate the impact of your SNI into a single performance number: the mean of the various measurements. Readers of your paper can then easily compare single numbers: the one for the baseline system, those for the other enhancements you compare against, and, of course, the one for your SNI.

## Description

You have implemented your SNI and you wish to conduct a comparison study to show that your SNI outperforms existing work and improves execution time (or other metrics, such as energy consumption) by X% compared to a baseline system. You design an experiment with a set of benchmarks from an applicable benchmark suite and you assemble performance numbers for each benchmark and for each scenario (baseline, your SNI, other work, ...). For example, you assemble execution times (the metric of choice for single-program workloads) and you wish to assess the speedup.

Since people prefer single numbers they can compare to see which one is bigger, you must aggregate your data into an average value. While this contains less information than the original data set, it is an easy way to see whether your SNI improves things and to prove it to your readers or users.

You should choose a mean that allows you to: (i) directly compare the alternatives to each other by canceling out the baseline, and (ii) make sure (relevant) outliers do not influence your average too much. Clearly, the geometric mean is perfectly suited for this purpose. Without further ado, you determine the per-benchmark speedup for each scenario and compute the various geometric means.

The resulting average values immediately allow you to see whether your SNI improves on the other scenarios derived from existing work, and by how much: simply divide their geometric means by the geometric mean of your SNI. Do not worry: the formula for the geometric mean makes sure that the baseline values are canceled out, and you effectively get the average speedup of your SNI compared to existing work.
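The cancellation property invoked above is real and easy to verify. A minimal Python sketch (the benchmark times are made up for illustration):

```python
import math

def geomean(xs):
    """Geometric mean: the n-th root of the product of the values."""
    return math.prod(xs) ** (1.0 / len(xs))

# Hypothetical per-benchmark execution times (baseline vs. SNI).
baseline = [10.0, 20.0, 30.0]
sni      = [ 8.0, 25.0, 15.0]

speedups = [b / s for b, s in zip(baseline, sni)]

# The baseline cancels out: the geometric mean of the per-benchmark
# speedups equals the ratio of the two geometric means.
assert abs(geomean(speedups) - geomean(baseline) / geomean(sni)) < 1e-12
```

This is exactly why the geometric mean is so tempting: ratios against any common baseline divide out. The next section explains why that convenience does not make it the right summary statistic.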

Now go ahead and publish these numbers that support your SNI.

## Why this is a bad idea

There may be specific circumstances where the use of a geometric mean is warranted, but producing the average of a performance metric over some benchmark suite is not one of them. Typically, the geometric mean can be used when the final aggregate performance number results from multiplying individual numbers. For example, when making several enhancements to a system, the average improvement per enhancement can be expressed as the geometric mean of the speedups resulting from the individual improvements. However, for any benchmark suite (regardless of the relative importance one attaches to each benchmark in the suite), the aggregate result comes from adding the individual results, as is the case for, e.g., overall speedup. In practically all cases, either the (weighted) arithmetic mean or the (weighted) harmonic mean is the correct way to compute and report an average.
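The multiplicative case where the geometric mean genuinely applies can be made concrete. In this sketch (speedup figures invented for illustration), two enhancements are applied in sequence, so their speedups compound by multiplication:

```python
import math

# Two hypothetical enhancements applied in sequence to the same system.
speedup_1 = 1.8   # first enhancement alone
speedup_2 = 1.2   # second enhancement, on top of the first

# Sequential speedups compound multiplicatively.
overall = speedup_1 * speedup_2

# Here the geometric mean IS meaningful: the "average" per-enhancement
# speedup g is the number that, applied twice, reproduces the overall gain.
g = math.sqrt(speedup_1 * speedup_2)
assert abs(g * g - overall) < 1e-12
```

Benchmark suites are the opposite situation: total suite time is a sum of per-benchmark times, so the aggregate is additive, not multiplicative.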

While it is true that the geometric mean is less sensitive to outliers than the other means, one should always investigate outliers, and disregard them only if there is an indication that the data is wrong. Otherwise, they can provide valuable insight. Moreover, by choosing appropriate weights, one can easily reduce the impact of outliers.

## Example

Suppose you have 5 benchmarks, B1 ... B5. The baseline system has the following measurements: 10, 15, 7, 12, and 16, which yields a total execution time of 60. Hence, the aggregate score is the sum of the individual scores. Suppose now you wish to compare two different enhancements. The first enhancement yields the measurements 8, 10, 6, 11, and 12, adding up to 47; the second yields 7, 12, 5, 10, and 14, adding up to 48.

If we look at the overall improvement achieved, it is 60/47 = 1.2766 for enhancement 1 and 60/48 = 1.25 for enhancement 2. Therefore we conclude that, by a small margin, enhancement 1 outperforms enhancement 2 for this particular set of benchmarks.

However, the geometric means of the per-benchmark speedups are 1.2605 and 1.2794, respectively. From these numbers we would conclude the opposite, namely that enhancement 2 outperforms enhancement 1.
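The rank reversal in this example is easy to reproduce; a short Python check using the numbers above:

```python
import math

def geomean(xs):
    """Geometric mean: the n-th root of the product of the values."""
    return math.prod(xs) ** (1.0 / len(xs))

baseline = [10, 15, 7, 12, 16]   # total execution time: 60
enh1     = [ 8, 10, 6, 11, 12]   # total: 47
enh2     = [ 7, 12, 5, 10, 14]   # total: 48

# True overall suite speedups: enhancement 1 wins.
overall1 = sum(baseline) / sum(enh1)   # 60/47, about 1.2766
overall2 = sum(baseline) / sum(enh2)   # 60/48 = 1.25
assert overall1 > overall2

# Geometric means of the per-benchmark speedups: enhancement 2 "wins".
gm1 = geomean([b / e for b, e in zip(baseline, enh1)])   # about 1.26
gm2 = geomean([b / e for b, e in zip(baseline, enh2)])   # about 1.28
assert gm2 > gm1
```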

Which mean then yields the correct result? The answer depends on which system you weigh against. If we weigh against the enhanced system, giving the benchmarks weights that correspond to their relative execution time within the complete suite (on that same configuration), then we need to use the weighted arithmetic mean. If we weigh against the baseline system, the correct answer is the weighted harmonic mean.
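Both weighting schemes can be checked against the example's numbers. A minimal Python sketch, using enhancement 1:

```python
baseline = [10, 15, 7, 12, 16]   # total execution time: 60
enh1     = [ 8, 10, 6, 11, 12]   # total: 47

speedups = [b / e for b, e in zip(baseline, enh1)]
true_speedup = sum(baseline) / sum(enh1)   # 60/47

# Weights from the enhanced system's time shares -> weighted arithmetic mean.
w_enh = [e / sum(enh1) for e in enh1]
wam = sum(w * s for w, s in zip(w_enh, speedups))

# Weights from the baseline's time shares -> weighted harmonic mean.
w_base = [b / sum(baseline) for b in baseline]
whm = 1.0 / sum(w / s for w, s in zip(w_base, speedups))

# Both weighted means recover the true overall suite speedup exactly.
assert abs(wam - true_speedup) < 1e-12
assert abs(whm - true_speedup) < 1e-12
```

The algebra behind this is simple: with those weights, the per-benchmark times cancel and both means reduce to total baseline time divided by total enhanced time.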

Of course, the use of weights is often disregarded. If we assume all benchmarks are equally important, then we likely will not weight them. In that case, all three means yield the same conclusion, but none of them accurately reflects the true speedup achieved over the entire suite.

## Why is this pattern relevant

The geometric mean is still widely used and accepted by researchers. It can be found in papers published at top venues such as OOPSLA, PLDI, CGO, etc., and it is used by benchmark suites such as VMmark and SPEC CPU, among others. The argument about its reduced sensitivity to outliers is brought forth on multiple occasions, even though there are other ways to deal with outliers.

## References

- [1] J. E. Smith. Characterizing computer performance with a single number. CACM 31(10), 1988.
- [2] D. A. Patterson and J. L. Hennessy. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann.



