Allowing Windows CI to fail

Allowing Windows CI to fail

Ben Gamari-3
Hi everyone,

After multiple weeks of struggling to get Windows CI into a stable
condition, I'm sorry to say that we're going to need to revert to
allowing it to fail for a bit longer. The status quo is essentially
holding up the entire merge queue and we still seem quite far from
resolving the issues.

I have summarised the current state-of-play in #17777. In short, the
gcc toolchain likely can't be used reliably on Windows due to its
ubiquitous use of `exec`, which cannot be reliably implemented on
Windows.
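
To illustrate the `exec` point above: on POSIX, `exec` replaces the
program running in the current process, so the PID stays the same and
anything waiting on that PID sees the final program's exit. The sketch
below is Python, not part of GHC or its CI, and the helper name
`pids_before_and_after_exec` is made up for this illustration. It checks
that property on a POSIX system. Windows has no equivalent primitive;
emulations there are generally understood to start a new process
instead, so the two PIDs would be expected to differ.

```python
import subprocess
import sys

# Script run in a child interpreter: print our PID, then exec a fresh
# interpreter that prints its own PID. flush=True matters because exec
# discards any unflushed output buffers.
CHILD = r"""
import os, sys
print(os.getpid(), flush=True)
os.execv(sys.executable, [sys.executable, "-c", "import os; print(os.getpid())"])
"""


def pids_before_and_after_exec():
    """Run CHILD and return (pid_before_exec, pid_after_exec).

    pid_after_exec is None if the waiter returned before the exec'd
    program produced any output.
    """
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.split()
    before = int(out[0])
    after = int(out[1]) if len(out) > 1 else None
    return before, after


if __name__ == "__main__":
    before, after = pids_before_and_after_exec()
    print("same process after exec:", before == after)
```

A tool that relies on the POSIX behaviour (waiting on one PID and
trusting its exit code) has nothing equivalent to wait on when `exec` is
emulated by spawning.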

Switching to LLVM as our native toolchain was my (initially promising)
last-ditch attempt at avoiding this issue but sadly this looks to be a
long road. My current attempt is stuck on an inscrutable loader error.

For the short term, I am afraid I have run out of time for this effort.
My current plan is to merge what I can from my wip/windows-ci branch but
again enable the Windows CI jobs' allow_failure flag so that their
unreliable nature doesn't hold up otherwise-passing CI jobs.

While it's unfortunate that we still lack reliable CI on Windows,
I think the efforts of the last few weeks were quite worthwhile. We now
have:

 * A much better understanding of the issues affecting us on Windows
 * Significantly better documentation and automation for producing our
   mingw toolchain artifacts
 * Better scripting for setting up Windows CI runners
 * Fixes for several bugs in the ghc-jailbreak library used to work
   around the Windows MAX_PATH limitation

Many thanks to Tamar Christina for his many hours of patient help.
Without him, GHC's Windows support would be in significantly worse shape
than it is.

Users of GHC should note that the CI issues we are struggling with *do
not* affect compiled code. These bugs manifest only as (rare) failed
compilations (particularly when building GHC itself); however, once
compilation succeeds the program that results is correct and reliable.

Cheers,

- Ben

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

RE: Allowing Windows CI to fail

GHC - devs mailing list
Ben

This sounds like a good decision to me, thanks.

Is there a possibility to have a slow CI-on-windows job (not part of the "this must pass before merging" step), which will slowly, but reliably, fail if the Windows build fails? E.g. does it help to make the build 100% sequential?

Or is there currently no way to build GHC at all on Windows in a way that won't fail?  (That would be surprising to me.  Until relatively recently I was *only* building on Windows.)

Simon

| -----Original Message-----
| From: ghc-devs <[hidden email]> On Behalf Of Ben Gamari
| Sent: 03 February 2020 16:03
| To: GHC developers <[hidden email]>
| Subject: Allowing Windows CI to fail

RE: Allowing Windows CI to fail

Ben Gamari-3
Simon Peyton Jones via ghc-devs <[hidden email]> writes:

> Ben
>
> This sounds like a good decision to me, thanks.
>
> Is there a possibility to have a slow CI-on-windows job (not part of
> the "this must pass before merging" step), which will slowly, but
> reliably, fail if the Windows build fails? E.g. does it help to make
> the build 100% sequential?
>
Sadly that won't fix the underlying problem.

> Or is there currently no way to build GHC at all on Windows in a way
> that won't fail? (That would be surprising to me. Until relatively
> recently I was *only* building on Windows.)
>
There is currently no way to build GHC on Windows that won't have a
chance of failing. Indeed, Phyx and I also find it quite surprising that
the probability of failure seems to be higher now than in the past.
However, we both agree that the status quo, when it works, works only
accidentally (if the Win32 API documentation is to be believed).

What is especially intriguing is that mingw32 GNU Make should also be
affected by the same `exec` issue that we are struggling with, does none
of the job object headstands that we are doing, and yet *appears* to be
quite reliable. Tamar has a hypothesis for why this might be, which he
will test when he has time.

Cheers,

- Ben

