GHC perf

GHC perf

GHC - devs mailing list

Ben, David

I’m still baffled by how to reliably get GHC perf metrics on my local machine.

The wiki page https://gitlab.haskell.org/ghc/ghc/wikis/building/running-tests/performance-tests helps, but not enough!

  • There are two things going on:
    1. CI perf measurements
    2. Local machine perf measurements

I think that they are somehow handled differently (why?) but they are all muddled up on the wiki page.

  • My goal is this:
    • Start with a master commit, say from Dec 2019.
    • Implement some change, on a branch.
    • sh validate --legacy (or something else if you like)
    • Look at perf regressions.
  • I believe I have first to utter the incantation

$ git fetch https://gitlab.haskell.org/ghc/ghc-performance-notes.git refs/notes/perf:refs/notes/ci/perf

  • But then:
    • How do I ensure that the baseline perf numbers I get relate to the master commit I started from, back in Dec 2019?  I don’t want numbers from Jan 2020.
    • If I rebase my branch on top of HEAD, say, how do I update the perf baseline numbers to be for HEAD?
    • Generally, how can I tell the commit to which the baseline numbers relate?
  • Also, in my tree I have a series of incremental changes; I want to see if any of them have perf regressions.    How do I do that?

Thanks

Simon


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: GHC perf

David Eichmann
Hi Simon,
  • There are two things going on:
    1. CI perf measurements
    2. Local machine perf measurements

I think that they are somehow handled differently (why?) but they are all muddled up on the wiki page.

They are handled differently because we do not want to compare local metrics with CI metrics. The exception is when local metrics don't exist, in which case we fall back to CI metrics as a baseline (see How baseline metrics are calculated).
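
As a rough illustration of that fallback rule only (not the test runner's actual code, which is described under "How baseline metrics are calculated" and is more involved), here is a minimal Python sketch. The function name and the environment label "ci" are hypothetical; "local" matches the `--test-env local` value used elsewhere in this thread.

```python
from typing import Dict, List, Optional, Tuple


def choose_baseline(
    metrics_by_env: Dict[str, List[float]]
) -> Optional[Tuple[str, List[float]]]:
    """Prefer locally recorded metrics; fall back to CI metrics.

    `metrics_by_env` maps an environment label to the values recorded
    for one test on one commit. The label "ci" is a placeholder for
    whatever environment name the CI metrics actually carry.
    """
    for env in ("local", "ci"):
        values = metrics_by_env.get(env)
        if values:  # skip missing or empty entries
            return env, values
    return None  # no baseline available at all
```

With both kinds of metrics present the local ones win, so a local run is never compared against CI numbers unless there is nothing else to compare with.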

  • My goal is this:
    • Start with a master commit, say from Dec 2019.
    • Implement some change, on a branch.
    • sh validate --legacy (or something else if you like)
    • Look at perf regressions.
Getting to the *raw data* should be easy:
  1. Check out the <baseline> commit.
  2. Use `git status` to double check git sees a clean working tree.
  3. Run the performance tests.
  4. Check out your <target> branch.
  5. Use `git status` to double check git sees a clean working tree (else commit any changes)
  6. Run the performance tests.
  7. Compare metrics (filtering for `local` metrics and outputting a chart):

            python3 testsuite/driver/perf_notes.py --chart chart.html --test-env local <baseline> <target>

See `python3 testsuite/driver/perf_notes.py --help` for more filtering options. This doesn't detect regressions automatically; it only shows you the raw data. Ideally we'd add an option to the test runner to let you specify a baseline commit manually. I suspect that would be close to what you're looking for.
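
The seven steps above could be scripted roughly as follows. This is a sketch under assumptions, not an official tool: it assumes the perf tests are run with `./hadrian/build.sh test --only-perf` and that submodules are refreshed after each checkout (both commands appear later in this thread), and the helper names are made up for illustration.

```python
import subprocess
from typing import List

# Test command as quoted later in this thread; adjust for your build setup.
PERF_TESTS = ["./hadrian/build.sh", "test", "--only-perf"]


def is_clean(porcelain_output: str) -> bool:
    """True if `git status --porcelain` printed nothing.

    This also counts untracked files as "dirty", which may be stricter
    than whatever check the test runner itself performs.
    """
    return porcelain_output.strip() == ""


def comparison_commands(baseline: str, target: str,
                        chart: str = "chart.html") -> List[List[str]]:
    """Commands for steps 1-7, in order, as argument lists."""
    commands: List[List[str]] = []
    for commit in (baseline, target):
        commands.append(["git", "checkout", commit])
        commands.append(["git", "submodule", "update", "--init"])
        commands.append(PERF_TESTS)
    commands.append(["python3", "testsuite/driver/perf_notes.py",
                     "--chart", chart, "--test-env", "local",
                     baseline, target])
    return commands


def run_comparison(baseline: str, target: str) -> None:
    """Run the whole procedure, refusing to measure a dirty tree."""
    for command in comparison_commands(baseline, target):
        if command == PERF_TESTS:
            status = subprocess.run(["git", "status", "--porcelain"],
                                    check=True, capture_output=True,
                                    text=True).stdout
            if not is_clean(status):
                raise RuntimeError("working tree is not clean; commit first")
        subprocess.run(command, check=True)
```

Calling `run_comparison("<baseline>", "<target>")` from the root of the GHC tree would then leave `chart.html` behind, with `<baseline>` and `<target>` being git commits or branch names.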
  • I believe I have first to utter the incantation

$ git fetch https://gitlab.haskell.org/ghc/ghc-performance-notes.git refs/notes/perf:refs/notes/ci/perf

Yes, this fetches the latest CI metrics into your git notes.
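
Note where that refspec puts things: the fetched CI notes land under `refs/notes/ci/perf`, while the `git notes --ref perf show XX` command quoted later in this thread reads `refs/notes/perf`. A small Python sketch of inspecting either one; the helper name is hypothetical, and it relies only on git's documented rule that `--ref ci/perf` expands to `refs/notes/ci/perf`.

```python
from typing import List

LOCAL_REF = "perf"    # refs/notes/perf: the ref shown by `git notes --ref perf`
CI_REF = "ci/perf"    # refs/notes/ci/perf: destination of the fetch above


def show_notes_command(commit: str, ci: bool = False) -> List[str]:
    """Build the `git notes` command that prints one commit's metrics."""
    ref = CI_REF if ci else LOCAL_REF
    return ["git", "notes", "--ref", ref, "show", commit]
```

Passing the result to `subprocess.run(..., check=True)` prints the raw metrics; `ci=True` selects the fetched CI copy.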

  • But then:
    • How do I ensure that the baseline perf numbers I get relate to the master commit I started from, back in Dec 2019?  I don’t want numbers from Jan 2020.
see above.
    • If I rebase my branch on top of HEAD, say, how do I update the perf baseline numbers to be for HEAD
The test runner should use HEAD's metrics automatically (see How baseline metrics are calculated), though you will need to fetch CI metrics or run the perf tests locally on HEAD to get the relevant metrics.
    • Generally, how can I tell the commit to which the baseline numbers relate?
The test runner will output (per test) which baseline commit is used, e.g. "... from local baseline @ HEAD~2" says the baseline was a local run from 2 commits ago.
  • Also, in my tree I have a series of incremental changes; I want to see if any of them have perf regressions.    How do I do that?

You can run the perf tests on each commit *in commit order*, and the previous commit will always be used as the baseline. You can also then chart the results:

            python3 testsuite/driver/perf_notes.py --chart chart.html --test-env local <oldestCommit>..<newestCommit>
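
One possible way to walk a series of commits in commit order, sketched in Python. It assumes a linear history between the two commits and the `./hadrian/build.sh test --only-perf` command that appears later in this thread; the function names are hypothetical. Rebuilding on every commit is slow, so this is only practical for short series.

```python
import subprocess
from typing import List


def oldest_first(oldest: str, rev_list_output: str) -> List[str]:
    """Prepend the starting commit to `git rev-list --reverse` output.

    `oldest..newest` excludes `oldest` itself, so it is added back to
    give the first change a locally measured baseline.
    """
    return [oldest] + rev_list_output.split()


def list_commits(oldest: str, newest: str) -> List[str]:
    """Ask git for the commits from `oldest` to `newest`, oldest first."""
    out = subprocess.run(
        ["git", "rev-list", "--reverse", f"{oldest}..{newest}"],
        check=True, capture_output=True, text=True).stdout
    return oldest_first(oldest, out)


def perf_commands(commits: List[str]) -> List[List[str]]:
    """Check out and measure each commit in the order given."""
    commands: List[List[str]] = []
    for commit in commits:
        commands.append(["git", "checkout", commit])
        commands.append(["git", "submodule", "update", "--init"])
        commands.append(["./hadrian/build.sh", "test", "--only-perf"])
    return commands
```

Running each command from `perf_commands(list_commits(old, new))` with `subprocess.run(cmd, check=True)` measures the commits in order, after which the chart command above can be used on the same range.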

Sorry if this is a bit suboptimal, but I hope that helps.

- David E



-- 
David Eichmann, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com

Registered in England & Wales, OC335890
118 Wymering Mansions, Wymering Road, London W9 2NF, England 


RE: GHC perf

GHC - devs mailing list

David

 

Thanks.   Concerning this:

  1. Check out the <baseline> commit.
  2. Use `git status` to double check git sees a clean working tree.
  3. Run the performance tests.
  4. Check out your <target> branch.
  5. Use `git status` to double check git sees a clean working tree (else commit any changes)
  6. Run the performance tests.
  7. Compare metrics (filtering for `local` metrics and outputting a chart):

            python3 testsuite/driver/perf_notes.py --chart chart.html --test-env local <baseline> <target>

I believe that

  • This compares two local builds
  • It does not require fetching CI perf data; in fact it is 100% independent of the CI system
  • It does require two separate build trees (that is fine)

 

Is that right?  If so, two questions

  • In that Python command line (step 7) is “<baseline>” the path to the root of the baseline tree, or to some file within that tree?
  • Is this process (and what it does) written up on some wiki page somewhere?  Where? Rather than replying to me individually, it’d be better to use this conversation to produce better guidance for everyone.

Thanks

 

Simon

 

 

From: David Eichmann <[hidden email]>
Sent: 20 January 2020 10:37
To: Simon Peyton Jones <[hidden email]>; Ben Gamari <[hidden email]>
Cc: ghc-devs <[hidden email]>
Subject: Re: GHC perf

 



RE: GHC perf

GHC - devs mailing list

Thanks

 

This information is a bit spread out over the wiki page.

 

Which wiki page?   Yes, it’d be fantastic to write this out clearly.  Thanks!

 

$ git checkout a12b34c56 && git submodule update --init
$ ./hadrian/build.sh test --only-perf
$ git checkout x98y76z54 && git submodule update --init
$ ./hadrian/build.sh test --only-perf
$ python3 testsuite/driver/perf_notes.py --chart chart.html --test-env local a12b34c56 x98y76z54
$ firefox chart.html

 

Ah.  Now I’m lost.  Somehow the second and fourth line must be recording info, locally in my tree, but two distinct batches of information.   Perhaps kept distinct by the current commit?  Where is the info actually stored?

 

OK, suppose I start from commit XX, and make some local changes.   Then I do the --only-perf thing.  Presumably that’ll be recorded tagged with XX.  That’s fine; just want it to be clear.  Worth adding this info to the wiki page, so we have a clear mental model.

 

Thanks

 

Simon

 

 

 

From: David Eichmann <[hidden email]>
Sent: 23 January 2020 11:19
To: Simon Peyton Jones <[hidden email]>
Subject: Re: GHC perf

 

Simon

  • This compares two local builds

Yes

  • It does not require fetching CI perf data; in fact it 100% independent of the CI system

Yes

  • It does require two separate build trees (that is fine)

No, this does not require different build trees, <baseline> and <target> are git commits (or similar e.g. branch name). The actual process might look like:

$ git checkout a12b34c56 && git submodule update --init
$ ./hadrian/build.sh test --only-perf
$ git checkout x98y76z54 && git submodule update --init
$ ./hadrian/build.sh test --only-perf
$ python3 testsuite/driver/perf_notes.py --chart chart.html --test-env local a12b34c56 x98y76z54
$ firefox chart.html

This information is a bit spread out over the wiki page. Perhaps a "quick start" section describing this use case would be helpful.

On 1/22/20 10:54 AM, Simon Peyton Jones wrote:



RE: GHC perf

GHC - devs mailing list

We store the metrics in git notes *per-commit*. All metrics for commit XX are stored on the git note for commit XX. You can even view the raw data with this command (where XX is the commit hash):

OK.   But the master repo *already* has perf notes for that commit (I assume).  Do mine somehow overwrite the master copy?

 

So suppose, on my local machine, I do

$ git checkout a12b34c56 && git submodule update --init
$ ./hadrian/build.sh test --only-perf

Now you say that I’m going to create git notes for a12b34c56.  But those are purely for my local machine!  Maybe my compiler is built with -DDEBUG.  I don’t want them to accidentally land in the main repo as the canonical perf figures for a12b34c56.

 

How do I avoid accidentally pushing them?

I should stress one caveat: we do not save metrics if you have uncommitted changes.

Oh wow.  Put that in MASSIVE BOLD CAPITALS.   You mean that the entire exercise will (silently) be bogus if I have any uncommitted changes?   That’s a bit of a pain if I make a change, run some perf tests, make another change, run again.  But I can live with it if I know.

 

Simon

 

 

From: David Eichmann <[hidden email]>
Sent: 23 January 2020 14:48
To: Simon Peyton Jones <[hidden email]>
Subject: Re: GHC perf

 

Which wiki page?

https://gitlab.haskell.org/ghc/ghc/wikis/building/running-tests/performance-tests

Ah.  Now I’m lost.  Somehow the second and fourth line must be recording info, locally in my tree, but two distinct batches of information.   Perhaps kept distinct by the current commit?  Where is the info actually stored?

All metric results are stored in git notes. This is a feature of git that lets you attach arbitrary text to a commit (without affecting the commit's hash). It's mentioned here. Whenever you run a performance test, the raw metrics will be appended to the git note for the current commit in a simple tab-separated value (TSV) format.
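
For anyone who wants to poke at the raw notes, a minimal Python sketch of reading that tab-separated text. The five-column layout used here (test environment, test name, way, metric name, value) is an assumption rather than something stated in this thread, so check it against a real note before relying on it; the names `Metric` and `parse_perf_note` are made up for illustration.

```python
from typing import List, NamedTuple


class Metric(NamedTuple):
    # Assumed column order; verify against an actual note.
    test_env: str
    test: str
    way: str
    metric: str
    value: float


def parse_perf_note(note_text: str) -> List[Metric]:
    """Parse note text, skipping lines that do not fit the assumed layout."""
    metrics: List[Metric] = []
    for line in note_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 5:
            continue  # blank or unexpected line
        env, test, way, metric, value = fields
        try:
            metrics.append(Metric(env, test, way, metric, float(value)))
        except ValueError:
            continue  # value column was not numeric
    return metrics
```

Filtering the result with `[m for m in metrics if m.test_env == "local"]` would mirror the `--test-env local` option of `perf_notes.py`.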

OK, suppose I start from commit XX, and make some local changes.   Then I do the --only-perf thing.  Presumably that’ll be recorded tagged with XX.  That’s fine; just want it to be clear.  Worth adding this info to the wiki page, so we have a clear mental model.

We store the metrics in git notes *per-commit*. All metrics for commit XX are stored on the git note for commit XX. You can even view the raw data with this command (where XX is the commit hash):

$ git notes --ref perf show XX

NOTE: `--only-perf` is optional. It limits the test runner to only run performance tests, but the performance metrics will be stored regardless of this option. So, if you've ever run performance tests locally, chances are the metrics will have been recorded without you even knowing.

I should stress one caveat: we do not save metrics if you have uncommitted changes.
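
Since it may not be obvious that a run went unrecorded, one way to check is to count the lines in the commit's note before and after running the tests: metrics are appended, so an unchanged count suggests nothing was saved. A minimal Python sketch; the function name and the injectable `run` parameter (there only to make the function testable without a repository) are hypothetical.

```python
import subprocess
from typing import Callable


def perf_note_line_count(commit: str = "HEAD",
                         run: Callable = subprocess.run) -> int:
    """Count non-blank lines in the local perf note for `commit`.

    `git notes show` exits with a non-zero status when the commit has
    no note, which is reported here as zero lines.
    """
    result = run(["git", "notes", "--ref", "perf", "show", commit],
                 capture_output=True, text=True)
    if result.returncode != 0:
        return 0
    return sum(1 for line in result.stdout.splitlines() if line.strip())
```

Taking `before = perf_note_line_count()` ahead of the test run and comparing it with the count afterwards gives a quick sanity check that the run was actually recorded.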
