A small but useful tool for performance characterisation

Ben Gamari-3
Hi everyone,

I have recently been doing a fair amount of performance characterisation
and have long wanted a convenient means of collecting GHC runtime
statistics for later analysis. For this I quickly developed a small
wrapper utility [1].

To see what it does, let's consider an example. Say we made a change to
GHC which we believe might affect the runtime performance of Program.hs.
We could quickly check this by running,

    $ ghc-before/_build/stage1/bin/ghc -O Program.hs
    $ ghc_perf.py -o before.json ./Program
    $ ghc-after/_build/stage1/bin/ghc -O Program.hs
    $ ghc_perf.py -o after.json ./Program

This will produce two files, before.json and after.json, which contain
the various runtime statistics emitted by +RTS -s --machine-readable.
These files are in the same format as is used by my nofib branch [2] and
therefore can be compared using `nofib-compare` from that branch.
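
For instance, the comparison might look something like the following
(this assumes nofib-compare takes the two result files as positional
arguments; check the branch for the exact interface):

    $ nofib-compare before.json after.json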

In addition to collecting runtime metrics, ghc_perf can also collect
hardware performance counters (Linux only) using perf. For instance,

    $ ghc_perf.py -o program.json \
        -e instructions,cycles,cache-misses ./Program

will produce program.json containing not only RTS statistics but also
counts for the perf events instructions, cycles, and cache-misses.
Alternatively, passing just `--perf` enables a reasonable default set
of events (namely instructions, cycles, cache-misses, branches, and
branch-misses).

Finally, ghc_perf can also handle repeated runs. For instance,

    $ ghc_perf.py -o program.json -r 5 --summarize \
         -e instructions,cycles,cache-misses ./Program

will run Program 5 times, emit all of the collected samples to
program.json, and produce a (very basic) statistical summary of what it
collected on stdout.
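
If the built-in summary is too coarse, the samples can also be
post-processed by hand. Here is a minimal sketch in Python, assuming
(hypothetically) that the output file holds a JSON list of per-run
records mapping metric names to numbers; the actual layout may differ:

    import json
    import statistics

    # Load the collected samples; assumes (hypothetically) a JSON
    # list of per-run dicts mapping metric names to numeric values.
    with open("program.json") as f:
        samples = json.load(f)

    # Report mean and standard deviation of each metric across runs.
    for metric in samples[0]:
        values = [s[metric] for s in samples]
        print(f"{metric}: mean={statistics.mean(values):.4g}, "
              f"stdev={statistics.stdev(values):.4g}")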

Note that there are a few possible TODOs that I've been considering:

 * I chose JSON as the output format to accommodate structured data
   (e.g. to capture experimental parameters in a structured way).
   However, in practice this choice has led to significantly more
   inconvenience than I would like, especially given that so far I've
   only used the format to capture basic key/value pairs. Perhaps
   reverting to CSV would be preferable (see the sketch after this
   list).

 * It might be nice to also add support for cachegrind.
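
On the CSV point above, the conversion itself would be trivial given
the key/value shape of the data; a minimal sketch, again assuming the
same hypothetical list-of-flat-dicts layout as in the summary sketch:

    import csv
    import json

    # Load the samples (same hypothetical layout as the sketch above).
    with open("program.json") as f:
        samples = json.load(f)

    # Write one CSV row per run, one column per metric.
    with open("program.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=sorted(samples[0]))
        writer.writeheader()
        writer.writerows(samples)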
 
Anyway, I hope that others find this as useful as I have.

Cheers,

- Ben


[1] https://gitlab.haskell.org/bgamari/ghc-utils/blob/master/ghc_perf.py
[2] https://gitlab.haskell.org/ghc/nofib/merge_requests/24

Re: A small but useful tool for performance characterisation

Richard Eisenberg-5
Hi Ben,

This sounds great. Is there a place on the wiki to catalog tools like this?

Thanks for telling us about it!
Richard

Re: A small but useful tool for performance characterisation

Ben Gamari-3
There is the "useful tools" page [1], which has, for a few years now,
mentioned the ghc-utils repository where the aforementioned script
lives. That being said, I get the impression that not many people have
found it via this page; everyone I know of who has used anything in
ghc-utils discovered it via word of mouth.

I'm not sure what to do about this. The page isn't *that* buried: from the wiki home page one arrives at it via the link path Working Conventions/Various tools.

Cheers,

- Ben
