CI on forked projects: Darwin woes

Kevin Buhr
Over the past few days, I've submitted several merge requests from
branches on my forked project (mostly because I didn't even realize
pushing to a branch on the main project was an alternative).

When those MRs run under CI, I've had a bunch of failures due to
timeouts waiting on a darwin-x86_64 runner.  I was a little mystified
that no other pipelines besides mine seemed to be having this problem,
but I've come to understand that MRs submitted from branches on the main
project use a different, larger set of runners than the shared runners
used by MRs from branches on forked projects.

Under my project, I can view the available shared runners under the
"Settings" -> "CI/CD" -> "Runners" tab, and the problem seems to be that
there's only one darwin runner ("b4bc6410" /
mac-mini-x86_64-darwin-davxkc).  This machine is a trooper, but it
unfortunately shares a circuit breaker with a toaster oven, so it goes
offline every time someone wants a bagel, and the rest of the time it
must be running CI for a few hundred GHC forks.

I ended up deleting an (unreviewed) MR sourced from my branch, and
pushing it to the main project and resubmitting just to get the CI to
run.  (Admittedly, it failed, but at least not on darwin!)  I obviously
don't want to do this with the merge requests that have already been
reviewed.

Is this a temporary problem?  Is there anything I can do other than keep
retrying the darwin jobs every couple days?

Also, is there a better place than "ghc-devs" to send these sorts of
GitLab/CI issues?  I thought there might be a project dedicated to it,
but if so I couldn't find it.


--
Kevin Buhr <[hidden email]>

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: CI on forked projects: Darwin woes

Iavor Diatchki
I think there was the ghc-devops-group list, but I don't know if it is
still active, and I kind of like to not have to follow too many lists.

For example, I had also not realized that it is an option to push to
branches on the main project, and have been using my own fork,
so thanks for posting this here!

-Iavor


On Wed, May 8, 2019 at 11:40 AM Kevin Buhr <[hidden email]> wrote:

> [ . . . ]

Re: CI on forked projects: Darwin woes

Carter Schonwald
I’m the root caretaker of the Mac CI box.

One issue here is that while forks and branches both get CI, only branches are visible to non-admin roles. So there could be a kajillion other folks’ forks running at the same time or something.

Timeouts sound like a GitLab-side thing. I’ve definitely had to restart jobs before.

For what it’s worth: it’s hosted at MacStadium, so it’s actually in a data center. Plus, after Ben added some disk space cleanup scripts to the CI, it’s had zero administrative interventions for months. It’s also configured to have a working GitLab runner even if a reboot happens (it took a while to figure out that bit of Mac admin).

Failures on the Mac mini side tend to have more informative failure modes. Timeouts are a GitLab runner thing, and I’ve definitely had to nudge restarts on my own patches.

Next time you hit a failure, could you share it with the devs list and/or #ghc on IRC?

On Wed, May 8, 2019 at 2:59 PM Iavor Diatchki <[hidden email]> wrote:
[ . . . ]

Re: CI on forked projects: Darwin woes

Carter Schonwald
Cool. I recommend IRC and the devs list, plus URLs or copies of error messages.

It’s hard to debug a timeout if the literal URL or error messages aren’t shared!

-Carter
 

From: Kevin Buhr <[hidden email]>
Sent: Sunday, May 12, 2019 11:01 AM
To: Carter Schonwald
Cc: Iavor Diatchki
Subject: Re: CI on forked projects: Darwin woes
 
Thanks!  I'll send a note if it starts happening again.


On 5/12/19 7:23 AM, Carter Schonwald wrote:
>
[ . . . ]
> Next time you hit a failure could you share with the devs list and or
> #ghc irc ?

--
Kevin Buhr <[hidden email]>



Re: CI on forked projects: Darwin woes

Ben Gamari-2
Carter Schonwald <[hidden email]> writes:

> Cool. I recommend IRC and the devs list, plus URLs or copies of error messages.
>
> It’s hard to debug a timeout if the literal URL or error messages aren’t shared!
>
For what it's worth, I suspect these timeouts are simply because we are
somewhat lacking in Darwin builder capacity. There are rarely fewer than
five builds queued to run on our two Darwin machines, and this number can
sometimes spike well beyond what the machines can run within the 10-hour
build timeout.

Cheers,

- Ben
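
The capacity argument above can be made concrete with a rough back-of-the-envelope sketch in Python. It is not taken from the GHC CI configuration: the function names, the first-come-first-served scheduling, the idea that the timeout applies to time spent waiting for a runner, and the 3-hour build duration are all assumptions for illustration. Only the two machines and the 10-hour timeout come from the message.

```python
def queue_wait_hours(position, machines, build_hours):
    """Hours a job waits before it starts, given its 1-based queue position.

    Assumes (hypothetically) a FIFO queue, one build per machine at a time,
    and that every build takes the same `build_hours`.
    """
    # Builds ahead of us are served in rounds of `machines` at a time.
    rounds_ahead = (position - 1) // machines
    return rounds_ahead * build_hours


def waits_past_timeout(position, machines, build_hours, timeout_hours):
    """True if the job is still waiting for a runner when the timeout expires."""
    return queue_wait_hours(position, machines, build_hours) > timeout_hours


if __name__ == "__main__":
    # Two Darwin machines and a 10-hour timeout are from the thread;
    # the 3-hour build duration is an assumed figure.
    for position in (1, 5, 9):
        wait = queue_wait_hours(position, machines=2, build_hours=3.0)
        late = waits_past_timeout(position, 2, 3.0, timeout_hours=10.0)
        print(f"position {position}: waits {wait:.0f} h, times out: {late}")
```

Under these assumed numbers, a job that is fifth in line still gets a runner in time, while one that is ninth in line does not, which is consistent with timeouts appearing only when the queue spikes.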



Re: CI on forked projects: Darwin woes

Carter Schonwald
Yeah. That’s my current theory. It doesn’t help that the queue length isn’t visible.

On Mon, May 13, 2019 at 8:43 AM Ben Gamari <[hidden email]> wrote:
[ . . . ]