Scaling back CI (for now)?

Scaling back CI (for now)?

Matthew Pickering
Hi all,

Everyone has probably noticed that getting anything merged is a real
effort at the moment. The main problem is that CI takes in the region
of 5-7 hours and then spuriously fails at the end. After 5-7 hours you
have to rebase and run CI again and so on. Therefore I propose to run
just these four jobs on every MR:

validate-x86_64-linux-deb9
validate-x86_64-linux-deb8-hadrian
validate-x86_64-windows
validate-x86_64-darwin

The reasoning is as follows:

validate-x86_64-linux-deb9
validate-x86_64-linux-deb8-hadrian

These run first and are reliable and finish within an hour. Then we
have lots of less reliable, lower priority jobs.

Two Windows jobs which take forever to run.

validate-x86_64-windows
validate-x86_64-windows-hadrian

One Darwin job

validate-x86_64-darwin

Many more Linux jobs

validate-x86_64-linux-deb9-unreg
validate-x86_64-linux-deb9-integer-simple
validate-x86_64-linux-fedora27
validate-x86_64-linux-deb9-llvm
validate-x86_64-linux-deb8
validate-i386-linux-deb9
validate-aarch64-linux-deb9

I don't dispute that these are important to test, but at the moment
they produce too much friction on every commit through a combination
of a lack of resources and long run times.

Further to this, we really don't need to test fedora27, deb9 and deb8
for every build. When was the last time we broke one of these
platforms but not the others? It's rare!

So the concrete proposal is to slim back the per-commit validation to four jobs:

validate-x86_64-linux-deb9
validate-x86_64-linux-deb8-hadrian
validate-x86_64-windows
validate-x86_64-darwin

which will test on the three major platforms.

All the other flavours should be run once the commit reaches master.
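
As a rough illustration of the split proposed above, here is a minimal Python sketch that partitions the jobs named in this message into the four per-MR jobs and the flavours deferred to master. The job names are copied from the message; the function name `jobs_for` and the pipeline kinds `"merge_request"` and `"master"` are hypothetical, and this is not how the GHC CI configuration is actually written.

```python
# Job names are copied from the thread. The function name and the two
# pipeline kinds ("merge_request", "master") are hypothetical.
ALL_JOBS = [
    "validate-x86_64-linux-deb9",
    "validate-x86_64-linux-deb8-hadrian",
    "validate-x86_64-windows",
    "validate-x86_64-windows-hadrian",
    "validate-x86_64-darwin",
    "validate-x86_64-linux-deb9-unreg",
    "validate-x86_64-linux-deb9-integer-simple",
    "validate-x86_64-linux-fedora27",
    "validate-x86_64-linux-deb9-llvm",
    "validate-x86_64-linux-deb8",
    "validate-i386-linux-deb9",
    "validate-aarch64-linux-deb9",
]

# The four jobs proposed for every merge request.
PER_MR_JOBS = frozenset({
    "validate-x86_64-linux-deb9",
    "validate-x86_64-linux-deb8-hadrian",
    "validate-x86_64-windows",
    "validate-x86_64-darwin",
})


def jobs_for(pipeline_kind):
    """Return the job names to run for the given kind of pipeline."""
    if pipeline_kind == "merge_request":
        return [job for job in ALL_JOBS if job in PER_MR_JOBS]
    if pipeline_kind == "master":
        # Assumption: master runs only the deferred flavours; the thread
        # does not say whether the four per-MR jobs are repeated there.
        return [job for job in ALL_JOBS if job not in PER_MR_JOBS]
    raise ValueError("unknown pipeline kind: " + repr(pipeline_kind))
```

Under these assumptions, `jobs_for("merge_request")` yields the four jobs listed above and `jobs_for("master")` yields the remaining eight.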

Thoughts?

Cheers,

Matt
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Scaling back CI (for now)?

Sebastian Graf
Hi,

On Sat, Feb 2, 2019 at 4:09 PM Matthew Pickering <[hidden email]> wrote:

> All the other flavours should be run once the commit reaches master.
>
> Thoughts?

That's even better than my idea of only running them as nightlies. In favor!


Re: Scaling back CI (for now)?

Matthew Pickering
It has been established today that Marge is failing to run in batch
mode for some reason which means it takes at least as long as CI takes
to complete for each commit to be merged. The rate is about 4
commits/day with the current configuration.
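
The figure of about 4 commits/day is consistent with a back-of-the-envelope bound: if every merge needs its own CI run of 5-7 hours and runs happen back to back, at most 24 divided by the CI duration merges fit in a day. The Python sketch below makes that arithmetic explicit. It ignores queueing and spurious failures, and the `batch_size` parameter is a hypothetical stand-in for however many MRs a working batch mode would land per run; the thread does not give that number.

```python
def merges_per_day(ci_hours, batch_size=1):
    """Upper bound on merges per day.

    Assumes CI runs back to back, never fails, and each run lands
    `batch_size` merge requests. Both are simplifying assumptions.
    """
    if ci_hours <= 0 or batch_size < 1:
        raise ValueError("ci_hours must be positive and batch_size at least 1")
    return 24.0 / ci_hours * batch_size


# One merge per CI run, for the 5-7 hour range quoted in the thread.
serial_bounds = {hours: merges_per_day(hours) for hours in (5, 6, 7)}

# Hypothetical: a working batch mode that lands 5 MRs per 6-hour run.
batched_bound = merges_per_day(6, batch_size=5)
```

With a 6-hour run and no batching the bound is 4 merges per day; batching multiplies it by the batch size, which is why a broken batch mode matters so much here.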

On Sat, Feb 2, 2019 at 7:57 PM Sebastian Graf <[hidden email]> wrote:

> That's even better than my idea of only running them as nightlies. In favor!

Re: Scaling back CI (for now)?

Phyx
That aside, the CIs don't seem stable at all. Frequent timeouts even before they start. I have been trying to merge 3 changes for a while now and every time one of them times out and I have to restart the timed-out ones. Then there are merge conflicts and I have to start over.

This is "bot whack-a-mole" :)

On Sun, Feb 3, 2019, 13:56 Matthew Pickering <[hidden email]> wrote:
> It has been established today that Marge is failing to run in batch
> mode for some reason which means it takes at least as long as CI takes
> to complete for each commit to be merged. The rate is about 4
> commits/day with the current configuration.

Re: Scaling back CI (for now)?

Matthew Pickering
This evening I have fixed the batch mode.

For example: https://gitlab.haskell.org/ghc/ghc/merge_requests/302

Hopefully it should be smoother sailing now.

Matt

On Tue, Feb 5, 2019 at 7:36 PM Phyx <[hidden email]> wrote:

> That aside, the CIs don't seem stable at all. Frequent timeouts even before they start. I have been trying to merge 3 changes for a while now and every time one of them times out and I have to restart the timed-out ones. Then there are merge conflicts and I have to start over.
>
> This is "bot whack-a-mole" :)

Re: Scaling back CI (for now)?

Ben Gamari-2
Phyx <[hidden email]> writes:

> That aside, the CIs don't seem stable at all. Frequent timeouts even before
> they start. I have been trying to merge 3 changes for a while now and
> every time one of them times out and I have to restart the timed-out ones.
> Then there are merge conflicts and I have to start over.
>
Indeed Marge was causing a remarkable amount of CI traffic, leading to
long queues, and eventually build timeouts. Thankfully Matthew
investigated why Marge's batch mode wasn't batching and consequently
things should now be much better.

Sorry for the previous inconvenience!

Cheers,

- Ben


Re: Scaling back CI (for now)?

Richard Eisenberg-4
So, just checking: is the recommended route to merging now to use the Marge Bot instructions posted previously? (That is, get 1+ approvals and then assign to Marge.)

Thanks,
Richard

> On Feb 6, 2019, at 5:23 PM, Ben Gamari <[hidden email]> wrote:
> Indeed Marge was causing a remarkable amount of CI traffic, leading to
> long queues, and eventually build timeouts. Thankfully Matthew
> investigated why Marge's batch mode wasn't batching and consequently
> things should now be much better.

Re: Scaling back CI (for now)?

Ben Gamari-2
Richard Eisenberg <[hidden email]> writes:

> So, just checking: is the recommended route to merging now to use the
> Marge Bot instructions posted previously? (That is, get 1+ approvals
> and then assign to Marge.)
>
Indeed. I have just sent an email reiterating the previous guidance to
the list.

Cheers,

- Ben

