a better workflow?

Richard Eisenberg-5
Hi devs,

Having gotten back to spending more time on GHC, I've found myself frequently hitting capacity limits on my machine. At one point, I could use a server at work that was a workhorse, but that's not possible any more (for boring reasons). It was great, and I miss it. So I started wondering about renting an AWS instance to help, but I quickly got overwhelmed by choice in setting that up. It's now pretty clear that their free services won't serve me, even as a trial prototype. So before diving deeper, I thought I'd ask: has anyone tried this? Or does anyone have a workflow that they like?

Problems I have in want of a solution:
 - Someone submits an MR and I'm reviewing it. I want to interact with it. This invariably means building from scratch and waiting 45 minutes.
 - I work on a patch for a few weeks, on and off. It's ready, but I want to rebase. So I build from scratch and wait 45 minutes.
 - I make a controversial change and want to smoke out any programs that fail. So I run the testsuite and wait over an hour.

This gets tiresome quickly. Most days of GHC hacking require at least one forced task-switch due to these wait times. If I had a snappy server, perhaps these times would be lessened.

By the way, I'm aware of ghc-artefact-nix, but I don't know how to use it. I tried it twice. The first time, I think it worked. But by the second time, it had been revamped (ghc-head-from), and I think I needed to go into two subshells to get it working... and then the ghc I had didn't include the MR code. I think. It's hard to be sure when you're not sure whether or not the patch itself is working. Part of the problem is that I don't use Nix and mostly don't know what I'm doing when I follow the ghc-artefact-nix instructions, which seem to target Nix users.

Thanks!
Richard
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: a better workflow?

Jan Stolarek
Hi Richard,

I think it's been around two years since I last built GHC, but back in the day I could get a full
build down to around 17 minutes on my laptop. I'm not sure how much build times have increased since
then, but I suspect you should be able to build GHC in less than 45 minutes. The trick I used
wasn't really much of a trick; it was simply a matter of having good hardware: an SSD, a good CPU
(I have a Xeon), and lots of RAM. On top of that I had:

1. several separate source trees. This means being able to work on your own stuff in one source
tree and being able to review MRs in another without a need to do a full rebuild when you want to
switch between the two (or more). The downside of this setup showed up when you wanted to bootstrap
different source trees with different GHC versions, but with enough scripting this is definitely
doable.

2. build trees separated from the source trees. If I really wanted to squeeze out maximum performance, I
would map the build tree onto a ramdisk; that's why you want lots of RAM. It definitely made the
build faster, but I can't recall how much it improved the testsuite runs. The downside, of course,
is that you lose the build when you switch off your machine, so I simply wouldn't switch off
mine, only suspend it to RAM.
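Both points combine naturally. Below is a minimal shell sketch, not Jan's actual setup: it assumes a Hadrian-based GHC tree (Hadrian's `--build-root` option moves build products out of the default `_build` directory; the wrapper script's name has varied between GHC versions) and that `/dev/shm` is a RAM-backed tmpfs, as it is on many Linux systems. `SRC_ROOT`, `BUILD_ROOT` and the helper names are hypothetical.

```shell
# Hypothetical layout: one source tree per task under SRC_ROOT, each with a
# matching out-of-tree build directory under BUILD_ROOT. /dev/shm is a tmpfs
# on many Linux systems, so builds there live in RAM and vanish on power-off.
SRC_ROOT=${SRC_ROOT:-$HOME/ghc-trees}
BUILD_ROOT=${BUILD_ROOT:-/dev/shm/ghc-build}

# Print the build directory that belongs to the source tree named $1.
build_dir_for() {
    printf '%s/%s\n' "$BUILD_ROOT" "$1"
}

# Build the tree named $1 into its own build directory; any further
# arguments are passed straight through to Hadrian.
build_tree() {
    name=$1
    shift
    out=$(build_dir_for "$name")
    mkdir -p "$out" &&
        (cd "$SRC_ROOT/$name" && hadrian/build --build-root="$out" "$@")
}
```

For example, `build_tree review-mr -j` would build `$SRC_ROOT/review-mr` into `/dev/shm/ghc-build/review-mr`, leaving the other trees' builds untouched.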

Janek

PS. A friend of mine recently told me that his company was considering using AWS, but after
calculating the costs it turned out that buying and maintaining their own servers would be
cheaper.
 

---
Politechnika Łódzka
Lodz University of Technology

This email contains information intended solely for the use of the individual to whom it is addressed.
If you are not the intended recipient or if you have received this message in error,
please notify the sender and delete it from your system.



Re: a better workflow?

Ben Gamari-2
In reply to this post by Richard Eisenberg-5
Richard Eisenberg <[hidden email]> writes:

> Hi devs,
>
> Having gotten back to spending more time on GHC, I've found myself
> frequently hitting capacity limits on my machine. At one point, I
> could use a server at work that was a workhorse, but that's not
> possible any more (for boring reasons). It was great, and I miss it.
> So I started wondering about renting an AWS instance to help, but I
> quickly got overwhelmed by choice in setting that up. It's now pretty
> clear that their free services won't serve me, even as a trial
> prototype. So before diving deeper, I thought I'd ask: has anyone
> tried this? Or does anyone have a workflow that they like?
>
> Problems I have in want of a solution:
>  - Someone submits an MR and I'm reviewing it. I want to interact with
>  it. This invariably means building from scratch and waiting 45
>  minutes.
>  - I work on a patch for a few weeks, on and off. It's ready, but I
>  want to rebase. So I build from scratch and wait 45 minutes.
>  - I make a controversial change and want to smoke out any programs
>  that fail. So I run the testsuite and wait over an hour.
>
> This gets tiresome quickly. Most days of GHC hacking require at least
> one forced task-switch due to these wait times. If I had a snappy
> server, perhaps these times would be lessened.
>
Indeed. I can't imagine working on GHC without my build server. As you
likely know, having a fast machine with plenty of storage always
available has a few nice consequences:

 * I can keep around as many GHC trees (often already built) as I have
   concurrent projects

 * I can leave a tmux session running for each of those projects with
   build environment, an editor session, and whatever else might be
   relevant

 * working from my laptop is no problem, even when running on
   battery: just SSH home and pick up where I left off
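The one-session-per-tree habit is easy to script. A minimal sketch using only standard tmux options (`new-session -A` attaches if the session already exists, `-c` sets the starting directory); the helper names are hypothetical.

```shell
# Derive a tmux session name from a tree's directory name. '.' and ':' have
# special meaning in tmux target names, so replace them with '_'.
session_name_for() {
    basename "$1" | tr '.:' '__'
}

# Attach to the session for tree $1, creating it (starting in that
# directory) if it does not exist yet.
ghc_session() {
    tmux new-session -A -s "$(session_name_for "$1")" -c "$1"
}
```

From a laptop, `ssh -t buildbox tmux new-session -A -s ghc-review` (with `buildbox` standing in for your server's hostname) then reattaches to the same session wherever you left it.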

Compared to human-hours, even a snappy computer is cheap.

A few years ago I tried using an AWS instance for my development
environment instead of self-hosting. In the end this experiment didn't
last long for a few reasons:

 * reasonably fast cloud instances are expensive so keeping the machine
   up all the time simply wasn't economical (compared to the cost of
   running the machine myself). The performance of one AWS "vCPU" tends
   to be pretty anemic relative to a single modern core.

   Anyone who uses cloud services for long enough will eventually make a
   mistake which puts this cost into perspective. In my case this
   mistake was inadvertently leaving a moderate-size instance running
   for ten days a few years ago. At that point I realized that with the
   cost incurred by this one mistake I could have purchased around a
   quarter of a far more capable computer.

 * having to rebuild your development environment every time you need to
   do a build is expensive in time, even when automated. Indeed some of
   the steps necessary to build a branch aren't even readily automated
   (e.g. ensuring that you remember to set your build flavour
   correctly). This inevitably results in mistakes, resulting in yet
   more rebuilds.
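One way to take the "remember to set your build flavour" step out of human hands is to record the flavour inside each tree. A minimal sketch assuming a Hadrian-based tree, where the flavour is passed as `--flavour=`; the `.ghc-flavour` file name, the `quick` default and the helper names are hypothetical.

```shell
# Hypothetical convention: each GHC tree stores its Hadrian flavour in a
# top-level file called .ghc-flavour; trees without one fall back to "quick".
tree_flavour() {
    if [ -f "$1/.ghc-flavour" ]; then
        cat "$1/.ghc-flavour"
    else
        echo quick
    fi
}

# Build tree $1 with its recorded flavour; any further arguments are passed
# through to Hadrian unchanged (the wrapper's name may differ in older trees).
build_with_flavour() {
    tree=$1
    shift
    flavour=$(tree_flavour "$tree")  # resolve before cd so relative paths work
    (cd "$tree" && hadrian/build --flavour="$flavour" "$@")
}
```

With `echo validate > ~/ghc-trees/my-patch/.ghc-flavour`, every later `build_with_flavour ~/ghc-trees/my-patch -j` uses the same flavour, whichever shell it is run from.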

Admittedly self-hosting does have its costs:

 * You need a reasonably reliable internet connection and power

 * You must configure your local router to allow traffic into the box

 * You must configure a dynamic DNS service so you can reliably reach
   your box

 * You must live with the knowledge that you are turning >10W of
   perfectly good electricity into heat and carbon dioxide 24 hours per
   day, seven days per week.

   (Of course, considering how many dead dinosaurs I will vaporize
   getting to Berlin in a few weeks, I suspect I have bigger fish to
   fry [1])
 

> By the way, I'm aware of ghc-artefact-nix, but I don't know how to use
> it. I tried it twice. The first time, I think it worked. But by the
> second time, it had been revamped (ghc-head-from), and I think I
> needed to go into two subshells to get it working... and then the ghc
> I had didn't include the MR code. I think. It's hard to be sure when
> you're not sure whether or not the patch itself is working. Part of
> the problem is that I don't use Nix and mostly don't know what I'm
> doing when I follow the ghc-artefact-nix instructions, which seem to
> target Nix users.
>
We should try to improve this. I think ghc-artefact-nix could be a
great tool to enable the consumption of CI-prepared bindists. I'll try
to have a look and document this when I finish my second head.hackage
blog post.

I personally use NixOS both on my laptop and my build server. This is
quite nice since the environments are guaranteed to be reasonably
consistent. Furthermore, bringing up a development environment on
another machine is straightforward:

    $ git clone https://github.com/alpmestan/ghc.nix
    $ nix-shell ghc.nix
    $ git clone --recursive https://gitlab.haskell.org/ghc/ghc
    $ cd ghc
    $ ./validate

Of course, Nix is far from perfect and it doesn't always realize its
goal of guaranteed reproducibility. However, it is in my opinion a step
up from the ad-hoc Debian configuration that I used up until a couple of
years ago.

Naturally, your mileage may vary.

Cheers,

- Ben


[1] I was curious about the numbers here:

    The distance from New Hampshire to Berlin is around 3000 nautical
    miles. A typical commercial flight of this distance has a burn rate
    per seat [2] of around 3L/100km.

    Burning one liter of jet fuel will evolve [3] roughly 2.5 kg of
    CO_2. Consequently, this single trip (both ways) will cost roughly
    800 kg CO_2 eq.

    By contrast, the carbon intensity of electricity production in my
    region [4] is 280 gCO_2 eq/kWh. Consequently, assuming an average
    power of 50W, running my server for one year would cost around
    100 kg CO_2 eq.

    Indeed it's not as negligible as I thought, but still not awful.

[2] https://en.wikipedia.org/wiki/Fuel_economy_in_aircraft#Long-haul_flights
[3] https://www.eia.gov/environment/emissions/co2_vol_mass.php 
[4] https://www.electricitymap.org/?page=country&solar=false&remote=true&wind=false&countryCode=US-NEISO
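The footnote's arithmetic is easy to re-run. This awk sketch uses only the inputs stated in the footnote plus 1 nautical mile = 1.852 km, and prints both figures rounded to the nearest kilogram: about 833 kg for the round trip and about 123 kg for a year of the server, the same orders of magnitude as the estimates above.

```shell
# Inputs from the footnote: 3000 nmi each way, 3 L/100 km per seat,
# ~2.5 kg CO2 per litre of jet fuel, a 50 W average server load for one
# year, and a grid intensity of 280 g CO2-eq per kWh.
co2_estimate() {
    awk 'BEGIN {
        km     = 3000 * 1.852          # nautical miles to kilometres
        litres = 2 * km * 3 / 100      # both ways at 3 L per 100 km
        flight = litres * 2.5          # kg CO2
        kwh    = 50 * 24 * 365 / 1000  # 50 W for one year, in kWh
        server = kwh * 280 / 1000      # kg CO2-eq
        printf "flight: %.0f kg, server: %.0f kg\n", flight, server
    }'
}
```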


Re: a better workflow?

Richard Eisenberg-5
This is very helpful information. I've long thought about doing something like this, but never quite had the crying need until now. And given my short-term peripateticism (summer at my in-laws' in Massachusetts, followed by a year's stint in Cambridge, UK, followed by another month's visit to my in-laws', all while my main home is rented out), this is not viable for now. But it does drive home the advantages quite well. And it describes exactly the trouble I thought I might get into with AWS, once I realized how big a machine I would need to make it worthwhile -- and how manual my interactions with it would have to be.

Thanks for writing this up. It convinces me to give up on AWS and either find another solution or live with what I have now.

Richard

On Jul 23, 2019, at 9:06 PM, Ben Gamari <[hidden email]> wrote:


Re: a better workflow?

Daniel Gröber
Hi,

On Tue, Jul 23, 2019 at 09:42:37PM -0400, Richard Eisenberg wrote:
> Thanks for writing this up. It convinces me to give up on AWS and
> either find another solution or live with what I have now.

I don't think you ever mentioned -- are you already using `git
worktree` to get multiple source checkouts or are you working off a
single build tree? I find using it essential to reducing context
switching overhead.

Also, AWS is far from the only game in town when it comes to server
hosting. If you don't mind paying month-to-month rather than hourly,
bog-standard server hosting providers are probably a much cheaper
option. They don't offer the fancy managed cloud features, but you're
unlikely to need those anyway.

I can recommend Hetzner on price: if you don't mind getting some
old(ish) 4-core, 8-thread hardware, they have some really affordable
options in the 30 EUR/month range (look for the "Server Auctions"
stuff).

--Daniel

Re: a better workflow?

Steven Shaw
Hi Richard,

I'd second Hetzner. They are in Europe, so latency should be pretty good from England. I don't build GHC regularly, but I have just purchased a machine similar to this one from Hetzner (with only a single 500GB Samsung 970 EVO Plus), and it makes a meal of my client's application with its many dependencies. We use a Hetzner machine at work as a CI server and it hasn't let us down yet.

Note that I used to use GCP because my MacBook Air wasn't really up to the task. I used tmux and Emacs, so things were pretty good (on a free trial with preemptible instances, and shutting the machine down when you're not using it, it can be pretty cheap). However, SSD speeds are not like those you get with a dedicated server: IIRC 300Gbps vs 1000Gbps.

Cheers,
Steve.



Re: a better workflow?

Carter Schonwald
Also: depending on how long you'll need the machine, it sometimes makes sense to just have a mini-ITX / micro tower / etc. at home! I don't have any build recommendations, but I'm sure folks like Ben have suggestions :)

On Tue, Jul 23, 2019 at 10:59 PM Steven Shaw <[hidden email]> wrote:

Re: a better workflow?

Richard Eisenberg-5
In reply to this post by Daniel Gröber


On Jul 23, 2019, at 10:48 PM, Daniel Gröber <[hidden email]> wrote:

> I don't think you ever mentioned -- are you already using `git
> worktree` to get multiple source checkouts or are you working off a
> single build tree? I find using it essential to reducing context
> switching overhead.

This is a good point. No, I'm not currently. Some post I read (actually, I think the manpage) said that `git worktree` and submodules don't mix, so I got scared off. Regardless, I don't think worktree will solve my problem exactly. It eliminates the annoyance of shuttling commits from one checkout to another, but that's not really a pain point for me. (Yes, it's a small annoyance, but I hit it only rarely, and it's quick to sort out.) Perhaps I'm missing something though about worktree that will allow more, e.g., sharing of build products. Am I?

Thanks!
Richard


Re: a better workflow?

Sebastian Graf
I found that git worktree works rather well, even with submodules (well, mostly; even when it doesn't for some reason, you can still init and update the submodules manually, losing sharing in the process).
See https://stackoverflow.com/a/31872051, in particular the GitHub links to the `wtas` alias.

I mostly do this:

$ cd ~/code/hs/ghc
$ cd pristine
$ git wtas ../pmcheck

and mostly just hack away. From time to time I seem to have issues because of confused submodule references, but as I said above doing a `git submodule update --init --recursive` fixes that. Cloning the root GHC checkout is the most time-consuming step, after all.
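For anyone who would rather not install the `wtas` alias, roughly the same workflow works with plain git. A minimal sketch; the function name and paths are hypothetical. Initialising submodules in the new worktree clones them afresh, which is the loss of sharing mentioned above.

```shell
# Create a new GHC worktree from an existing clone and populate its
# submodules.
#   $1: existing GHC checkout   $2: path for the new worktree   $3: commit-ish
new_ghc_worktree() {
    main=$1
    dest=$2
    ref=$3
    # A non-branch commit-ish such as origin/master gives a detached HEAD,
    # which suits reviewing someone else's branch.
    git -C "$main" worktree add "$dest" "$ref" &&
        git -C "$dest" submodule update --init --recursive
}
```

For example, `new_ghc_worktree ~/code/hs/ghc/pristine ~/code/hs/ghc/pmcheck origin/master` would give a fresh `pmcheck` checkout next to `pristine`.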

Also, I'm currently in the rather comfortable situation of having an 8-core Azure VM just for GHC development, which is pretty amazing. I do the same as Ben here: I keep a tmux session open with one (or more) tabs per checkout I'm working on in parallel. VSCode is my editor of choice and seamlessly picks up any SSH connection I throw at it. I can highly recommend that when you're on a rather weak machine like a laptop or convertible.

On Wed, 24 Jul 2019 at 14:03, Richard Eisenberg <[hidden email]> wrote:



Re: a better workflow?

Ben Gamari-3
In reply to this post by Richard Eisenberg-5
Richard Eisenberg <[hidden email]> writes:

>> On Jul 23, 2019, at 10:48 PM, Daniel Gröber <[hidden email]> wrote:
>>
>> I don't think you ever mentioned -- are you already using `git
>> worktree` to get multiple source checkouts or are you working off a
>> single build tree? I find using it essential to reducing context
>> switching overhead.
>
> This is a good point. No, I'm not currently. Some post I read
> (actually, I think the manpage) said that `git worktree` and
> submodules don't mix, so I got scared off. Regardless, I don't think
> worktree will solve my problem exactly. It eliminates the annoyance of
> shuttling commits from one checkout to another, but that's not really
> a pain point for me. (Yes, it's a small annoyance, but I hit it only
> rarely, and it's quick to sort out.) Perhaps I'm missing something
> though about worktree that will allow more, e.g., sharing of build
> products. Am I?
>
Sadly no. Recently we (specifically David Eichmann) invested quite some
effort in trying to enable Shake's support for caching in Hadrian which
would have allowed sharing of build artifacts between trees.
Unfortunately the challenges here were significantly greater than we
expected. David summarized the effort in his recent blog post [1].

Cheers,

- Ben


[1] https://www.haskell.org/ghc/blog/20190731-hadrian-cloud-builds.html
