parallelizing ghc


parallelizing ghc

Evan Laforge
I recently switched from ghc --make to a parallelized build system.  I
was looking forward to faster builds, and while they are much faster
at figuring out what has to be rebuilt (which is most of the time for
a small rebuild, since ld dominates), compilation of the whole system
is either the same or slightly slower than the single threaded ghc
--make version.  My guess is that the overhead of starting up lots of
individual ghcs, each of which has to read all the .hi files all over
again, just about cancels out the parallelism gains.

So one way around that would be parallelizing --make, which has been a
TODO for a long time.  However, I believe that's never going to be
satisfactory for a project involving various different languages,
because ghc itself is never going to be a general purpose build
system.

So ghc --make provides two things: a dependency chaser and a way to
keep the compiler resident as it compiles new files.  Since the
dependency chaser will never be as powerful as a real build system, it
occurs to me that the only reasonable way forward is to split out the
second part, by adding an --interactive flag to ghc.  It would then
read filenames on stdin, compiling each one in turn, only exiting when
it sees EOF.
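The loop itself would be tiny - something like this sketch, where compileOne is a hypothetical stand-in for the real compilation step (the point of a persistent ghc being that it keeps its .hi cache in memory between calls):

```haskell
import System.IO (hFlush, isEOF, stdout)

-- Hypothetical stand-in for the real compilation step.
compileOne :: FilePath -> IO Bool
compileOne _src = return True  -- pretend every module compiles

-- Read one filename per line, compile it, report a status line,
-- and exit on EOF.
workerLoop :: IO ()
workerLoop = do
    done <- isEOF
    if done
        then return ()
        else do
            src <- getLine
            ok <- compileOne src
            putStrLn $ (if ok then "OK " else "FAIL ") ++ src
            hFlush stdout
            workerLoop

main :: IO ()
main = workerLoop
```

The status line per module is what lets a wrapper know when its file is done.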

Then a separate program, ghc-fe, can wrap ghc and act as a drop-in
replacement for it.

It would be nice if ghc could atomically read one line from its input;
then you could just start a bunch of ghcs behind a named pipe and each
would steal its own work.  But I don't think that's possible with unix
pipes, and of course there are still a few non-unix systems out there.
And ghc-fe has to wait for the compilation to finish, so ghc would have
to print a status line when it completes (or fails) a module.  But it
can still be done with an external distributor program that acts like a
server: it starts up n ghcs, distributes src files between them, and
shuts them down when given the command:

data Status = Free | Busy

data Ghc = Ghc { status :: IORef Status, input :: Handle, output :: Handle, pid :: Int }

main = do
    origFlags <- getArgs
    ghcs <- mapM (startup origFlags) [0 .. cpus - 1]
    socket <- accept
    whileM $ readRequest socket >>= \case
        Quit -> return False
        Compile ghcFlags src -> do
            forkIO $ do
                unless (ghcFlags == origFlags) $ error "flags changed"
                result <- bracket (findFreeAndMarkBusy ghcs) markFree $ \ghc -> do
                    tellGhc ghc src
                    readResult ghc
                writeResponse socket result
            return True
    mapM_ shutdown ghcs

ghc-fe then starts a distributor if one is not already running, sends
it a src file, and waits for the response, acting like a drop-in
replacement for the ghc cmdline.  Build systems just call ghc-fe and
have one extra responsibility: call ghc-fe --quit when they are done.
And if they know how many files they want to rebuild, it won't be
worth starting a server below a certain threshold.
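For what it's worth, the findFreeAndMarkBusy/markFree pair in the sketch above falls out almost for free from a Chan used as a queue of idle workers.  A minimal sketch, with Pool and withWorker as made-up names:

```haskell
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Exception (bracket)

-- A queue of idle workers: taking one blocks until a worker is free,
-- putting it back marks it free again.
newtype Pool a = Pool (Chan a)

newPool :: [a] -> IO (Pool a)
newPool workers = do
    chan <- newChan
    mapM_ (writeChan chan) workers
    return (Pool chan)

-- Grab a free worker, run the job, and return the worker to the pool
-- even if the job throws.
withWorker :: Pool a -> (a -> IO b) -> IO b
withWorker (Pool chan) = bracket (readChan chan) (writeChan chan)
```

The distributor's Compile branch then just becomes something like withWorker ghcs $ \ghc -> tellGhc ghc src >> readResult ghc.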


So I'm wondering, does this seem reasonable and feasible?  Is there a
better way to do it?  Even if it could be done, would it be worth it?
If the answers are "yes", "maybe not", and "maybe yes", then how hard
would this be to do and where should I start looking?  I'm assuming
start at GhcMake.hs and work outwards from there...

I'm not entirely sure it would be worth it to me even if it did make
full builds, say, 1.5x faster on my dual core i5, but it's interesting
to think about all the same.

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: parallelizing ghc

Mikhail Glushenkov-2
Hi,

On Tue, Jan 24, 2012 at 4:53 AM, Evan Laforge <[hidden email]> wrote:
> [...]
>
> So ghc --make provides two things: a dependency chaser and a way to
> keep the compiler resident as it compiles new files.  Since the
> dependency chaser will never be as powerful as a real build system, it
> occurs to me that the only reasonable way forward is to split out the
> second part, by adding an --interactive flag to ghc.  It would then
> read filenames on stdin, compiling each one in turn, only exiting when
> it sees EOF.

There is in fact an '--interactive' flag already, 'ghc --interactive'
is a synonym for 'ghci'.

> So I'm wondering, does this seem reasonable and feasible?  Is there a
> better way to do it?  Even if it could be done, would it be worth it?
> If the answers are "yes", "maybe not", and "maybe yes", then how hard
> would this be to do and where should I start looking?  I'm assuming
> start at GhcMake.hs and work outwards from there...

I'm also interested in a "build server" mode for ghc. I have written a
parallel wrapper for 'ghc --make' [1], but the speed gains are not as
impressive [2] as I hoped because of the duplicated work.


[1] https://github.com/23Skidoo/ghc-parmake
[2] https://gist.github.com/1360470

--
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments


Re: parallelizing ghc

Mikhail Glushenkov-2
In reply to this post by Evan Laforge
Hi,

On Tue, Jan 24, 2012 at 4:53 AM, Evan Laforge <[hidden email]> wrote:

> So ghc --make provides two things: a dependency chaser and a way to
> keep the compiler resident as it compiles new files.  Since the
> dependency chaser will never be as powerful as a real build system, it
> occurs to me that the only reasonable way forward is to split out the
> second part, by adding an --interactive flag to ghc.  It would then
> read filenames on stdin, compiling each one in turn, only exiting when
> it sees EOF.
>
> Then a separate program, ghc-fe, can wrap ghc and acts like a drop-in
> replacement for ghc.

One immediate problem I see with this is linking - 'ghc --make
Main.hs' is able to figure out what packages a program depends on,
while 'ghc Main.o ... -o Main' requires the user to specify them
manually with -package. So you'll either need to pass this information
back to the parent process, or use 'ghc --make' for linking (which
adds more overhead).

--
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments


Re: parallelizing ghc

Evan Laforge
> One immediate problem I see with this is linking - 'ghc --make
> Main.hs' is able to figure out what packages a program depends on,
> while 'ghc Main.o ... -o Main' requires the user to specify them
> manually with -package. So you'll either need to pass this information
> back to the parent process, or use 'ghc --make' for linking (which
> adds more overhead).

Well, figuring out dependencies is the job of the build system.  I'd
be perfectly happy to just invoke ghc with a hardcoded package list as
I do currently, or as you said, invoke --make just to figure out the
package list for me.  The time is going to be dominated by linking,
which is single threaded anyway, so either way works.

It would be a neat feature to be able to ask ghc to figure out the
packages needed for a particular file and emit them for the build
system (or is there already a way to do that currently?), but it's
orthogonal I think.  Probably not hard though, just stick a knob on
--make that prints the link line instead of running it.

> There is in fact an '--interactive' flag already, 'ghc --interactive'
> is a synonym for 'ghci'.

Oh right, well some other name then :)

> I'm also interested in a "build server" mode for ghc. I have written a
> parallel wrapper for 'ghc --make' [1], but the speed gains are not as
> impressive [2] as I hoped because of the duplicated work.

Was the duplicated work rereading .hi files, or was there something else?


Re: parallelizing ghc

Mikhail Glushenkov-2
Hi,

On Tue, Jan 24, 2012 at 7:04 PM, Evan Laforge <[hidden email]> wrote:
>> I'm also interested in a "build server" mode for ghc. I have written a
>> parallel wrapper for 'ghc --make' [1], but the speed gains are not as
>> impressive [2] as I hoped because of the duplicated work.
>
> Was the duplicated work rereading .hi files, or was there something else?

I think so - according to the GHC manual, the main speed improvement
comes from caching the information between compilations.

--
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments


Re: parallelizing ghc

Ryan Newton
In reply to this post by Evan Laforge
> package list for me.  The time is going to be dominated by linking,
> which is single threaded anyway, so either way works.

What is the state of incremental linkers?  I thought those existed now.



Re: parallelizing ghc

Evan Laforge
On Wed, Jan 25, 2012 at 11:42 AM, Ryan Newton <[hidden email]> wrote:
>> package list for me.  The time is going to be dominated by linking,
>> which is single threaded anyway, so either way works.
>
> What is the state of incremental linkers?  I thought those existed now.

I think in some specific cases.  I've heard there's a microsoft one?
It would be windows only of course.  Is anyone using that with ghc?

gold is supposed to be multi-threaded and fast (don't know about
incremental), but once again it's ELF-only.  I've heard a few people
talking about gold with ghc, but I don't know what the results were.

Unfortunately I'm on OS X, I don't know about any incremental or
multithreaded linking here.


Re: parallelizing ghc

Simon Marlow-7
In reply to this post by Evan Laforge
On 24/01/2012 03:53, Evan Laforge wrote:
> I recently switched from ghc --make to a parallelized build system.  I
> was looking forward to faster builds, and while they are much faster
> at figuring out what has to be rebuilt (which is most of the time for
> a small rebuild, since ld dominates), compilation of the whole system
> is either the same or slightly slower than the single threaded ghc
> --make version.  My guess is that the overhead of starting up lots of
> individual ghcs, each of which has to read all the .hi files all over
> again, just about cancels out the parallelism gains.

I'm slightly surprised by this - in my experience parallel builds beat
--make as long as the parallelism is a factor of 2 or more.  Is your
dependency graph very narrow, or do you have lots of very small modules?

> So I'm wondering, does this seem reasonable and feasible?  Is there a
> better way to do it?  Even if it could be done, would it be worth it?
> If the answers are "yes", "maybe not", and "maybe yes", then how hard
> would this be to do and where should I start looking?  I'm assuming
> start at GhcMake.hs and work outwards from there...

I like the idea!  And it should be possible to build this without
modifying GHC at all, on top of the GHC API.  As you say, you'll need a
server process, which accepts command lines, executes them, and sends
back the results.  A local socket should be fine (and will work on both
Unix and Windows).

The server process can either do the compilation itself, or have several
workers.  Unfortunately the workers would have to be separate processes,
because the GHC API is single threaded.

When a worker gets too large, just kill it and start a new one.

Cheers,
        Simon


Re: parallelizing ghc

John Lato-2
In reply to this post by Evan Laforge
> From: Evan Laforge <[hidden email]>
>
> On Wed, Jan 25, 2012 at 11:42 AM, Ryan Newton <[hidden email]> wrote:
>>> package list for me.  The time is going to be dominated by linking,
>>> which is single threaded anyway, so either way works.
>>
>> What is the state of incremental linkers?  I thought those existed now.
>
> I think in some specific cases.  I've heard there's a microsoft one?
> It would be windows only of course.  Is anyone using that with ghc?
>
> gold is supposed to be multi-threaded and fast (don't know about
> incremental), but once again it's ELF-only.  I've heard a few people
> talking about gold with ghc, but I don't know what the results were.
>
> Unfortunately I'm on OS X, I don't know about any incremental or
> multithreaded linking here.

Neither do I.  On my older machine with 2GB RAM, builds are often
dominated by ld because it starts thrashing.  And not many linkers
target Mach-O.

I've been toying with building my own ld replacement.  I don't know
anything about linkers, but I'd say at least even odds that I can do
better than this.

John L.


Re: parallelizing ghc

Evan Laforge
In reply to this post by Simon Marlow-7
> I'm slightly surprised by this - in my experience parallel builds beat
> --make as long as the parallelism is a factor of 2 or more.  Is your
> dependency graph very narrow, or do you have lots of very small modules?

I get full parallelism, 4 threads at once: this is a 2-core i5 with 2
hyperthreads per core, and an SSD.  Maybe I should try with just 2
threads.  I only ever get 200% CPU at most, so it seems like the
hyperthreads are not really much like a whole core.

The modules are usually around 150-250 lines.  Here are the timings
for an older run:

    from scratch (191 modules):
    runghc Shake/Shakefile.hs build/debug/seq  128.43s user 20.04s system 178% cpu 1:23.01 total
    no link: runghc Shake/Shakefile.hs build/debug/seq  118.92s user 19.21s system 249% cpu 55.383 total
    make -j3 build/seq  68.81s user 9.98s system 98% cpu 1:19.60 total

    modify nothing:
    runghc Shake/Shakefile.hs build/debug/seq  0.65s user 0.10s system 96% cpu 0.780 total
    make -j3 build/seq  6.05s user 1.21s system 85% cpu 8.492 total

    modify one file:
    runghc Shake/Shakefile.hs build/debug/seq  19.50s user 2.37s system 94% cpu 23.166 total
    make -j3 build/seq  12.81s user 1.85s system 94% cpu 15.586 total

From scratch, --make (which is what 'make -j3' winds up calling) wins
slightly.  --make loses handily at detecting that nothing needs to be
done :)  And as expected, modifying one file is all about the linking,
though it's odd that --make was faster.

> I like the idea!  And it should be possible to build this without modifying
> GHC at all, on top of the GHC API.  As you say, you'll need a server
> process, which accepts command lines, executes them, and sends back the
> results.  A local socket should be fine (and will work on both Unix and
> Windows).
>
> The server process can either do the compilation itself, or have several
> workers.  Unfortunately the workers would have to be separate processes,
> because the GHC API is single threaded.
>
> When a worker gets too large, just kill it and start a new one.

A benefit of real processes: I'm pretty confident all the memory will
be GCed after the whole process is killed :)

I'll start looking into the ghc api.  I have no experience with it,
but I assume I can look at what GhcMake.hs is doing and learn from
that.


Re: parallelizing ghc

Evan Laforge
In reply to this post by John Lato-2
> I've been toying with building my own ld replacement.  I don't know
> anything about linkers, but I'd say at least even odds that I can do
> better than this.

I'm guessing linkers are hard, but gold proves that if you keep the
scope small and use modern techniques you can get really good
improvements over gnu ld.  I think the gold author had quite a bit of
linker experience though.

I'd think apple would care about linker performance... I'm even a
little surprised Xcode doesn't have something better than a lightly
hacked gnu ld.


Re: parallelizing ghc

Nathan Howell-2
On Thu, Jan 26, 2012 at 3:44 PM, Evan Laforge <[hidden email]> wrote:
I'd think apple would care about linker performance... I'm even a
little surprised Xcode doesn't have something better than a lightly
hacked gnu ld.

Someone mentioned that it was on their wish-list at the LLVM 2010 conference... it's hinted at here too: http://llvm.org/devmtg/2010-11/Spencer-ObjectFiles.pdf. The author might know if anyone is actually working on one.

-n


Re: parallelizing ghc

Simon Marlow-7
In reply to this post by Evan Laforge
On 26/01/2012 23:37, Evan Laforge wrote:

>> I'm slightly surprised by this - in my experience parallel builds beat
>> --make as long as the parallelism is a factor of 2 or more.  Is your
>> dependency graph very narrow, or do you have lots of very small modules?
>
> I get full parallelism, 4 threads at once on a 2 core machine * 2
> hyperthread/whatever core i5, and SSD.  Maybe I should try with just 2
> threads.  I only ever get 200% CPU at most, so it seems like the
> hyperthreads are not really much like a whole core.
>
> The modules are usually around 150-250 lines.  Here are the timings
> for an older run:
>
>      from scratch (191 modules):
>      runghc Shake/Shakefile.hs build/debug/seq  128.43s user 20.04s system 178% cpu 1:23.01 total
>      no link: runghc Shake/Shakefile.hs build/debug/seq  118.92s user 19.21s system 249% cpu 55.383 total
>      make -j3 build/seq  68.81s user 9.98s system 98% cpu 1:19.60 total

This looks a bit suspicious.  The Shake build is doing nearly twice as
much work as the --make build in terms of CPU time, but because it is
getting nearly 2x parallelism it comes in a close second.

I'd investigate this further.  Are you sure there's no swapping going
on?  How many processes is the Shake build creating - perhaps too many?

Cheers,
        Simon


Re: parallelizing ghc

Neil Mitchell
Hi Simon,

I have found that a factor of 2 parallelism is required on Linux to
draw with ghc --make. In particular:

GHC --make = 7.688
Shake -j1 = 11.828 (of which 11.702 is spent running system commands)
Shake full -j4 = 7.414 (of which 12.906 is spent running system commands)

This is for a Haskell program which has several bottlenecks; you can
see a graph of spawned processes here:
http://community.haskell.org/~ndm/darcs/shake/academic/icfp2012/profile.eps
Everything above the 1 mark is more than one process running in
parallel.  It gets to 4 processes, but not all the time - roughly an
average of ~2x parallelism.

On Windows the story is much worse.  With -j4, the time spent
executing system commands shoots up from ~15s to around ~25s, since
even on a 4 core machine the contention between the processes is high.
I tried investigating this, checking for things like a locked file
(none I can find) or disk/CPU/memory contention (it's basically taking
no system resources), but couldn't find anything.

If you specify -O2 then the parallel performance also goes down - I
suspect because each ghc process needs to read inline information for
packages that are imported multiple times, and ghc --make gets away
with doing that once?

> This looks a bit suspicious.  The Shake build is doing nearly twice as much
> work as the --make build, in terms of CPU time, but because it is getting
> nearly 2x parallelism it comes in a close second.  How many processes is the
> Shake build using?

Shake uses at most the number of processes you specify - it never
exceeds the -j flag, so in the above example it caps out at 4.  It is
very good at getting parallelism (I believe it to be perfect, but the
code is 150 lines of IORef twiddling, so I wouldn't guarantee it), and
very safe about never exceeding the cap you specify (I think I can
even prove that, for some value of proof).  The profiling makes it
easy to verify these claims after the fact.

Thanks, Neil


Re: parallelizing ghc

Evan Laforge
In reply to this post by Simon Marlow-7
> I like the idea!  And it should be possible to build this without modifying
> GHC at all, on top of the GHC API.  As you say, you'll need a server
> process, which accepts command lines, executes them, and sends back the
> results.  A local socket should be fine (and will work on both Unix and
> Windows).

I took a whack at this, but I'm having to backtrack a bit now because
I don't fully understand the GHC API, so I thought I should explain my
understanding to make sure I'm on the right track.

It appears the cached information I want to preserve between compiles
is in HscEnv.  At first I thought I could just do what --make does,
but what it does is call 'GHC.load', which maintains the HscEnv (which
mostly means loading already compiled modules into the
HomePackageTable, since the other cache entries are apparently loaded
on demand by DriverPipeline.compileFile).  But actually it does a lot
of things, such as detecting that a module doesn't need recompilation
and directly loading the interface in that case.  So I thought it
would be quickest to just use it: add a new target to the set of
targets and call load again.

However, there are problems with that.  The first is it doesn't pay
attention to DynFlags.outputFile, which makes sense because it's
expecting to compile multiple files.  The bigger problem is that it
apparently wants to reload the whole set each time, so it winds up
being slower rather than faster.  I guess 'load' is really set up to
figure out dependencies on its own and compile a set of modules, so
I'm talking at the wrong level.

So I think I need to rewrite the HPT-maintaining parts of GHC.load and
write my own compileFile that *does* maintain the HPT.  And also
figure out what other parts of the HscEnv should be updated, if any.
Sound about right?


Along the way I ran into the problem that it's impossible to re-parse
GHC flags to compare them to previous runs, because static flags only
export a parsing function that mutates global variables and can only
be called once.  So I parse out the dynamic flags, strip out the *.hs
args, and assume the rest are static flags.  I noticed comments about
converting them all to dynamic, I guess that might make a nice
housekeeping project some day.
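The crude separation I'm doing amounts to something like this, where splitArgs is a made-up helper (the real check also runs the dynamic-flag parser over the non-source arguments):

```haskell
import Data.List (isSuffixOf, partition)

-- Separate source files from the remaining (presumed static) flags, so
-- two ghc command lines can be compared for compatibility.
splitArgs :: [String] -> ([FilePath], [String])
splitArgs = partition isSource
  where isSource arg = ".hs" `isSuffixOf` arg || ".lhs" `isSuffixOf` arg
```

E.g. splitArgs ["-O2", "-Wall", "Foo.hs"] gives (["Foo.hs"], ["-O2", "-Wall"]).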


Re: parallelizing ghc

Simon Marlow-7
On 10/02/2012 08:01, Evan Laforge wrote:

>> I like the idea!  And it should be possible to build this without modifying
>> GHC at all, on top of the GHC API.  As you say, you'll need a server
>> process, which accepts command lines, executes them, and sends back the
>> results.  A local socket should be fine (and will work on both Unix and
>> Windows).
>
> I took a whack at this, but I'm having to backtrack a bit now because
> I don't fully understand the GHC API, so I thought I should explain my
> understanding to make sure I'm on the right track.
>
> It appears the cached information I want to preserve between compiles
> is in HscEnv.  At first I thought I could just do what --make does,
> but what it does is call 'GHC.load', which maintains the HscEnv (which
> mostly means loading already compiled modules into the
> HomePackageTable, since the other cache entries are apparently loaded
> on demand by DriverPipeline.compileFile).  But actually it does a lot
> of things, such as detecting that a module doesn't need recompilation
> and directly loading the interface in that case.  So I thought it
> would be quickest to just use it: add a new target to the set of
> targets and call load again.
>
> However, there are problems with that.  The first is it doesn't pay
> attention to DynFlags.outputFile, which makes sense because it's
> expecting to compile multiple files.  The bigger problem is that it
> apparently wants to reload the whole set each time, so it winds up
> being slower rather than faster.  I guess 'load' is really set up to
> figure out dependencies on its own and compile a set of modules, so
> I'm talking at the wrong level.
>
> So I think I need to rewrite the HPT-maintaining parts of GHC.load and
> write my own compileFile that *does* maintain the HPT.  And also
> figure out what other parts of the HscEnv should be updated, if any.
> Sound about right?

What you're trying to do is mimic the operation of 'ghc -c Foo.hs ..'
but cache any loaded interface files and re-use them.  This means you
need to retain the contents of HscEnv (as you say), because that
contains the cached information.

However, the GHC API doesn't provide a way to do this directly (I hadn't
really thought about this when I suggested the idea before, sorry).  The
GHC API provides support for compiling multiple modules in the way that
GHCi and --make work; each module is added to the HPT as it is compiled.
  But when compiling single modules, GHC doesn't normally use the HPT -
interfaces for modules in the home package are normally demand-loaded in
the same way as interfaces for package modules, and added to the PIT.
The crucial difference between the HPT and the PIT is that the PIT
supports demand-loading of interfaces, but the HPT is supposed to be
populated in the right order by the compilation manager - home package
modules are assumed to be present in the HPT when they are required.

For 'ghc -c Foo.hs' you want to demand-load interfaces for other modules
in the same package (and cache them), but you want them to not get mixed
up with interfaces from other packages that may be being compiled
simultaneously by other clients.  There's no easy way to solve this.
You could avoid the problem by not caching home-package interfaces, but
that may throw away a lot of the benefit of doing this.  Or you could
maintain some kind of session state with the client over multiple
compilations, and only discard the home package interfaces if another
client connects.

There are further complications in that certain flags can invalidate the
information you have cached: changing the package flags, for instance.

So I think some additions to the API are almost certainly needed.  But
this is as far as I have got in thinking about the problem...

Cheers,
        Simon



>
> Along the way I ran into the problem that it's impossible to re-parse
> GHC flags to compare them to previous runs, because static flags only
> export a parsing function that mutates global variables and can only
> be called once.  So I parse out the dynamic flags, strip out the *.hs
> args, and assume the rest are static flags.  I noticed comments about
> converting them all to dynamic, I guess that might make a nice
> housekeeping project some day.



Re: parallelizing ghc

Evan Laforge
> However, the GHC API doesn't provide a way to do this directly (I hadn't
> really thought about this when I suggested the idea before, sorry).  The GHC
> API provides support for compiling multiple modules in the way that GHCi and
> --make work; each module is added to the HPT as it is compiled.  But when
> compiling single modules, GHC doesn't normally use the HPT - interfaces for
> modules in the home package are normally demand-loaded in the same way as
> interfaces for package modules, and added to the PIT. The crucial difference
> between the HPT and the PIT is that the PIT supports demand-loading of
> interfaces, but the HPT is supposed to be populated in the right order by
> the compilation manager - home package modules are assumed to be present in
> the HPT when they are required.

Yah, that's what I don't understand about HscEnv.  The HPT doc says
that in one-shot mode, the HPT is empty and even local modules are
demand-cached in the ExternalPackageState (which the PIT belongs to).
And the EPS doc itself reinforces that where it says in one-shot mode
"home-package modules accumulate in the external package state".

So why not just ignore the HPT, and run multiple "one-shot" compiles,
and let all the info accumulate in the PIT?

A fair amount of work in GhcMake is concerned with trimming old data
out of the HPT, I assume this is for ghci that wants to reload changed
modules but keep unchanged ones.  I don't actually care about that
since I can assume the modules will be unchanged over one run.

So I tried just calling compileFile multiple times in the same
GhcMonad, assuming the mutable bits of the HscEnv get updated
appropriately.  Here are the results for a build of about 200 modules:

with persistent server:
no link:
3.30s user 1.60s system 12% cpu 38.323 total
3.50s user 1.66s system 13% cpu 38.368 total
link:
21.66s user 4.13s system 35% cpu 1:11.62 total
21.59s user 4.54s system 38% cpu 1:08.13 total
21.82s user 4.70s system 35% cpu 1:14.56 total

without server (ghc -c):
no link:
109.25s user 19.90s system 240% cpu 53.750 total
109.11s user 19.23s system 243% cpu 52.794 total
link:
128.10s user 21.66s system 201% cpu 1:14.29 total

ghc --make (with linking since I can't turn that off):
42.57s user 5.83s system 74% cpu 1:05.15 total

The 'user' is low for the server because it doesn't count time spent
by the subprocesses on the other end of the socket, but excluding
linking it looks like I can shave about 25% off compile time.
Unfortunately it winds up being just about the same as ghc --make, so
it seems too low.  Perhaps I should be using the HPT?  I'm also
falling back to plain ghc for linking, maybe --make can link faster
when it has everything cached?  I guess it shouldn't, because it
presumably just dispatches to ld.

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: parallelizing ghc

Simon Marlow-7
On 17/02/2012 01:59, Evan Laforge wrote:

>> However, the GHC API doesn't provide a way to do this directly (I hadn't
>> really thought about this when I suggested the idea before, sorry).  The GHC
>> API provides support for compiling multiple modules in the way that GHCi and
>> --make work; each module is added to the HPT as it is compiled.  But when
>> compiling single modules, GHC doesn't normally use the HPT - interfaces for
>> modules in the home package are normally demand-loaded in the same way as
>> interfaces for package modules, and added to the PIT. The crucial difference
>> between the HPT and the PIT is that the PIT supports demand-loading of
>> interfaces, but the HPT is supposed to be populated in the right order by
>> the compilation manager - home package modules are assumed to be present in
>> the HPT when they are required.
>
> Yah, that's what I don't understand about HscEnv.  The HPT doc says
> that in one-shot mode, the HPT is empty and even local modules are
> demand-cached in the ExternalPackageState (which the PIT belongs to).
> And the EPS doc itself reinforces that where it says in one-shot mode
> "home-package modules accumulate in the external package state".
>
> So why not just ignore the HPT, and run multiple "one-shot" compiles,
> and let all the info accumulate in the PIT?

Sure, except that if the server is to be used by multiple clients, you
will get clashes in the PIT when say two clients both try to compile a
module with the same name.

The PIT is indexed by Module, which is basically the pair
(package,modulename), and the package for the main program is always the
same: "main".

This will work fine if you spin up a new server for each program you
want to build - maybe that's fine for your use case?

Don't forget to make sure the GhcMode is set to OneShot, not
CompManager, BTW.
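
In other words, roughly this (a sketch only: compileFile's home module
and signature have moved around between releases, and GHC.Paths comes
from the separate ghc-paths package, so treat the details as
illustrative rather than exact):

```haskell
import GHC
import GHC.Paths (libdir)            -- from the ghc-paths package
import DriverPipeline (compileFile)  -- location/signature varies by GHC version
import DriverPhases (Phase (StopLn))
import MonadUtils (liftIO)

-- Compile each file one-shot inside a single persistent session, so
-- home-package interfaces accumulate in the PIT instead of the HPT.
compileOneShot :: [FilePath] -> IO ()
compileOneShot files = runGhc (Just libdir) $ do
    dflags <- getSessionDynFlags
    _ <- setSessionDynFlags dflags { ghcMode = OneShot }  -- not CompManager
    hsc_env <- getSession
    liftIO $ mapM_ (\src -> compileFile hsc_env StopLn (src, Nothing)) files
```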

> A fair amount of work in GhcMake is concerned with trimming old data
> out of the HPT; I assume this is for ghci, which wants to reload
> changed modules but keep unchanged ones.  I don't actually care about
> that since I can assume the modules will be unchanged over one run.
>
> So I tried just calling compileFile multiple times in the same
> GhcMonad, assuming the mutable bits of the HscEnv get updated
> appropriately.  Here are the results for a build of about 200 modules:
>
> with persistent server:
> no link:
> 3.30s user 1.60s system 12% cpu 38.323 total
> 3.50s user 1.66s system 13% cpu 38.368 total
> link:
> 21.66s user 4.13s system 35% cpu 1:11.62 total
> 21.59s user 4.54s system 38% cpu 1:08.13 total
> 21.82s user 4.70s system 35% cpu 1:14.56 total
>
> without server (ghc -c):
> no link:
> 109.25s user 19.90s system 240% cpu 53.750 total
> 109.11s user 19.23s system 243% cpu 52.794 total
> link:
> 128.10s user 21.66s system 201% cpu 1:14.29 total
>
> ghc --make (with linking since I can't turn that off):
> 42.57s user 5.83s system 74% cpu 1:05.15 total

Yep, it seems to be doing the right thing.

> The 'user' is low for the server because it doesn't count time spent
> by the subprocesses on the other end of the socket, but excluding
> linking it looks like I can shave about 25% off compile time.
> Unfortunately it winds up being just about the same as ghc --make, so
> it seems too low.

But that's what you expect, isn't it?

> Perhaps I should be using the HPT?  I'm also
> falling back to plain ghc for linking, maybe --make can link faster
> when it has everything cached?  I guess it shouldn't, because it
> presumably just dispatches to ld.

--make has a slight advantage for linking in that it knows which
packages it needs to link against, whereas plain ghc will link against
all the packages on the command line.

Cheers,
        Simon


Re: parallelizing ghc

Evan Laforge
> Sure, except that if the server is to be used by multiple clients, you will
> get clashes in the PIT when say two clients both try to compile a module
> with the same name.
>
> The PIT is indexed by Module, which is basically the pair
> (package,modulename), and the package for the main program is always the
> same: "main".
>
> This will work fine if you spin up a new server for each program you want to
> build - maybe that's fine for your use case?

Yep, I have a new compile server for each CPU.  So compiling one
program will start up (say) 4 compile servers and one distributor.
Then shake will start throwing source files at the distributor, in the
proper dependency order, and the distributor will hand the input files
out among the 4 compile servers.  Each compile server is
single-threaded so I don't have to worry about calling GHC functions
reentrantly.
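
Stripped of the GHC parts, the distribution scheme is essentially a
shared job queue feeding N single-threaded workers.  A minimal sketch
(the compile action here is a stand-in for handing a file to a
resident GHC):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
import Control.Monad (replicateM, replicateM_)

-- One shared job queue feeding n single-threaded workers.  Results come
-- back in completion order, not submission order.
distribute :: Int -> (FilePath -> IO Bool) -> [FilePath] -> IO [(FilePath, Bool)]
distribute n compile srcs = do
    jobs    <- newChan
    results <- newChan
    mapM_ (writeChan jobs . Just) srcs      -- enqueue all the work first,
    replicateM_ n (writeChan jobs Nothing)  -- then one stop token per worker
    let worker = do
            mjob <- readChan jobs
            case mjob of
                Nothing  -> return ()       -- stop token: this worker is done
                Just src -> do
                    ok <- compile src
                    writeChan results (src, ok)
                    worker
    replicateM_ n (forkIO worker)
    replicateM (length srcs) (readChan results)
```

Since all the real jobs are enqueued before the stop tokens, and a Chan
is FIFO, no worker can exit while work remains.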

But --make is single-threaded as well, so why does it bother with all
that HPT stuff instead of just calling compileFile repeatedly?  Is it
just for ghci?

>> The 'user' is low for the server because it doesn't count time spent
>> by the subprocesses on the other end of the socket, but excluding
>> linking it looks like I can shave about 25% off compile time.
>> Unfortunately it winds up being just about the same as ghc --make, so
>> it seems too low.
>
> But that's what you expect, isn't it?

It's surprising to me that the serial --make is just about the same
speed as a parallelized one.  The whole point was to compile faster!

Granted, each interface has to be loaded once per processor while
--make only needs to load it once, but once loaded they should stay
loaded, and I'd expect the benefit of multiple processors to win out
pretty quickly.

> --make has a slight advantage for linking in that it knows which packages it
> needs to link against, whereas plain ghc will link against all the packages
> on the command line.

Ohh, so maybe with --make it can omit some packages and do less work
at link time.  Let me try minimizing the -package flags and see if
that helps.

As an aside, it would be handy to be able to ask ghc "given this main
module, which -packages should the final program get?" but not
actually compile anything.  Is there a way to do that, short of
writing my own with the ghc api?  Would it be a reasonable ghc flag,
along the lines of -M but for packages?


BTW, in case anyone is interested, a darcs repo is at
http://ofb.net/~elaforge/ghc-server/


Re: parallelizing ghc

Simon Marlow-7
On 17/02/2012 18:12, Evan Laforge wrote:

>> Sure, except that if the server is to be used by multiple clients, you will
>> get clashes in the PIT when say two clients both try to compile a module
>> with the same name.
>>
>> The PIT is indexed by Module, which is basically the pair
>> (package,modulename), and the package for the main program is always the
>> same: "main".
>>
>> This will work fine if you spin up a new server for each program you want to
>> build - maybe that's fine for your use case?
>
> Yep, I have a new compile server for each CPU.  So compiling one
> program will start up (say) 4 compile servers and one distributor.
> Then shake will start throwing source files at the distributor, in the
> proper dependency order, and the distributor will hand the input files
> out among the 4 compile servers.  Each compile server is
> single-threaded so I don't have to worry about calling GHC functions
> reentrantly.
>
> But --make is single-threaded as well, so why does it bother with all
> that HPT stuff instead of just calling compileFile repeatedly?  Is it
> just for ghci?

That might be true, but I'm not completely sure.  The HPT stuff was
added with a continuous edit-recompile cycle in mind (i.e. for GHCi),
and we added --make at the same time because it fitted nicely.  It might
be that just calling compileFile repeatedly works, and it would end up
storing the interfaces for the home-package modules in the
PackageIfaceTable, but we never considered this use case.  One thing
that worries me: will it be reading the .hi file for a module off the
disk after compiling it?  I suspect it might, whereas the HPT method
will be caching the iface in the HPT.

>>> The 'user' is low for the server because it doesn't count time spent
>>> by the subprocesses on the other end of the socket, but excluding
>>> linking it looks like I can shave about 25% off compile time.
>>> Unfortunately it winds up being just about the same as ghc --make, so
>>> it seems too low.
>>
>> But that's what you expect, isn't it?
>
> It's surprising to me that the serial --make is just about the same
> speed as a parallelized one.  The whole point was to compile faster!

Ah, so maybe the problem is that the compileFile method is re-reading
.hi files off the disk (and typechecking them), and that is making it
slower.

> Granted, each interface has to be loaded once per processor while
> --make only needs to load it once, but once loaded they should stay
> loaded, and I'd expect the benefit of multiple processors to win out
> pretty quickly.
>
>> --make has a slight advantage for linking in that it knows which packages it
>> needs to link against, whereas plain ghc will link against all the packages
>> on the command line.
>
> Ohh, so maybe with --make it can omit some packages and do less work
> at link time.  Let me try minimizing the -package flags and see if
> that helps.
>
> As an aside, it would be handy to be able to ask ghc "given this main
> module, which -packages should the final program get?" but not
> actually compile anything.  Is there a way to do that, short of
> writing my own with the ghc api?  Would it be a reasonable ghc flag,
> along the lines of -M but for packages?

I don't think we can calculate the package dependencies without knowing
the ModIface, which is generated by compiling (or at least typechecking)
each module.

Cheers,
        Simon


>
> BTW, in case anyone is interested, a darcs repo is at
> http://ofb.net/~elaforge/ghc-server/

