GHC API: memory usage of loaded modules

GHC API: memory usage of loaded modules

Evan Laforge
I have a program that uses the GHC API to provide a REPL.  It winds up
taking up 200mb in RAM, as measured by GHC.Stats.currentBytesUsed, but
without the GHC API it's 5mb.  If I turn on verbose, I can see that
GHC is loading 255 modules, all loaded binary ("skipping M ( M.hs,
M.hs.o )") except the toplevel, and the memory use is zooming up as it
loads them.

I expect some memory usage from loading modules, but 195mb seems like
a lot.  If I do a 'du' on the entire obj directory (which has 401
*.hs.o files... the REPL doesn't expose everything), it's only 76mb on
disk.  How do loaded modules wind up consuming space, and is there any
way to use less space?

The thing is, all those loaded modules are part of the application
itself, so presumably they've already been linked into the binary and
loaded into memory.  The ideal would be that I could somehow reuse
that.  I imagine I could do that by writing my own Haskell interpreter
and making a big symbol table of all the callable functions, but I'd
rather not write my own interpreter if I can use an existing one!
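
For context, the load step in my program boils down to roughly the
following minimal sketch, with "App.Repl" standing in for my real
toplevel module:

import qualified GHC
import GHC.Paths (libdir)  -- from the ghc-paths package
import DynFlags (defaultFatalMessager, defaultFlushOut)

-- Start a Ghc session against the installed GHC's libdir, then load
-- the toplevel module and everything it transitively imports.
loadSession :: IO ()
loadSession =
    GHC.defaultErrorHandler defaultFatalMessager defaultFlushOut $
    GHC.runGhc (Just libdir) $ do
        dflags <- GHC.getSessionDynFlags
        _ <- GHC.setSessionDynFlags dflags
        target <- GHC.guessTarget "App.Repl" Nothing
        GHC.setTargets [target]
        result <- GHC.load GHC.LoadAllTargets
        case result of
            GHC.Succeeded -> return ()
            GHC.Failed -> error "GHC.load failed"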

Re: GHC API: memory usage of loaded modules

Reid Barton
On Mon, Nov 28, 2016 at 8:54 PM, Evan Laforge <[hidden email]> wrote:

> I have a program that uses the GHC API to provide a REPL.  It winds up
> taking up 200mb in RAM, as measured by GHC.Stats.currentBytesUsed, but
> without the GHC API it's 5mb.  If I turn on verbose, I can see that
> GHC is loading 255 modules, all loaded binary ("skipping M ( M.hs,
> M.hs.o )") except the toplevel, and the memory use is zooming up as it
> loads them.
>
> I expect some memory usage from loading modules, but 195mb seems like
> a lot.  If I do a 'du' on the entire obj directory (which has 401
> *.hs.o files... the REPL doesn't expose everything), it's only 76mb on
> disk.  How do loaded modules wind up consuming space, and is there any
> way to use less space?
>
> The thing is, all those loaded modules are part of the application
> itself, so presumably they've already been linked into the binary and
> loaded into memory.  The ideal would be that I could somehow reuse
> that.  I imagine I could do that by writing my own Haskell interpreter
> and making a big symbol table of all the callable functions, but I'd
> rather not write my own interpreter if I can use an existing one!

You'd probably find that you also want to, for example, type check the
expressions that you are interpreting. The information needed to do so
is not contained in your executable at all; it's in the .hi files that
were built alongside your program and its dependencies, and the ones
that came with the libraries bundled into GHC. I assume the in-memory
representation of these interface files is not very efficient, and
they probably account for a lot of the space usage of your program.

I'm not sure offhand, but perhaps using -fignore-interface-pragmas
when you invoke the GHC API would reduce the amount of space used
while loading interface files, and if you're using the bytecode
interpreter then you probably don't care about any of the information
it will discard (which mostly has to do with optimizations).
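
In GHC API terms that flag is just a GeneralFlag you can set on the
session's DynFlags before loading anything; a minimal sketch, assuming
the GHC 8.0-era DynFlags API:

import qualified GHC
import DynFlags (GeneralFlag(Opt_IgnoreInterfacePragmas), gopt_set)

-- The programmatic equivalent of -fignore-interface-pragmas: set the
-- flag on the session's DynFlags before any call to GHC.load.
ignoreInterfacePragmas :: GHC.Ghc ()
ignoreInterfacePragmas = do
    dflags <- GHC.getSessionDynFlags
    _ <- GHC.setSessionDynFlags
        (gopt_set dflags Opt_IgnoreInterfacePragmas)
    return ()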

If you build your executable dynamically then the GHC API should also
reuse the same shared libraries and executable image rather than
loading a second copy of the object code. If that doesn't work then it
would be helpful if you could produce a minimal reproducer of it not
working. (The potential disadvantage is that you have to load the
entirety of each of your dependencies, rather than just the parts you
actually use.)

Regards,
Reid Barton

Re: GHC API: memory usage of loaded modules

Evan Laforge
Sorry about the delay; I got distracted by an unrelated memory leak.

On Tue, Nov 29, 2016 at 9:35 AM, Reid Barton <[hidden email]> wrote:
> You'd probably find that you also want to, for example, type check the
> expressions that you are interpreting. The information needed to do so
> is not contained in your executable at all; it's in the .hi files that
> were built alongside your program and its dependencies, and the ones
> that came with the libraries bundled into GHC. I assume the in-memory
> representation of these interface files is not very efficient, and
> they probably account for a lot of the space usage of your program.

That's true, but the .hi files on disk take up about 20k of that 76mb.
If the .o files are loaded basically directly as binary, then that
would mean 20k of .hi files turn into around 124mb in memory, which is
quite an expansion.  But then there's all of the libraries I use and
then all their libraries... perhaps those need to be loaded too?  If
so, there is more than I'm counting.  I'm not sure how to count those,
since they're not included in the "upsweep" log msgs when you do a
GHC.load.

ghci itself takes about 200mb when it loads all that stuff, so I
imagine the memory use is "working as intended", not me just using the
API wrong.

> I'm not sure offhand, but perhaps using -fignore-interface-pragmas
> when you invoke the GHC API would reduce the amount of space used
> while loading interface files, and if you're using the bytecode
> interpreter then you probably don't care about any of the information
> it will discard (which mostly has to do with optimizations).

I tried it, and I recall it helped at the time, but now it makes no
difference whether I try with ghci or my own GHC-API-using program.
E.g. I have:

import qualified GHC.Stats
import qualified System.Mem

newtype Bytes = Bytes Int deriving (Show)

memory_used :: IO Bytes
memory_used = do
    System.Mem.performMajorGC  -- settle the heap before sampling
    stats <- GHC.Stats.getGCStats  -- requires +RTS -T
    return $ Bytes $ fromIntegral $ GHC.Stats.currentBytesUsed stats

in a module that loads a lot of stuff.  When I run that with ghci or
ghci -fignore-interface-pragmas, memory use is about the same.

> If you build your executable dynamically then the GHC API should also
> reuse the same shared libraries and executable image rather than
> loading a second copy of the object code. If that doesn't work then it
> would be helpful if you could produce a minimal reproducer of it not
> working. (The potential disadvantage is that you have to load the
> entirety of each of your dependencies, rather than just the parts you
> actually use.)

I do build dynamically, since it's the only option nowadays to load .o
files, but I guess what you mean is link the application as a shared
library, and then link it to the Main module for the app, and pass it
to GHC.parseDynamicFlags for the REPL?  That's a good idea.  But I'd
still be loading all those .hi files, and if the majority of the
memory use is actually from those, it might not help, right?

I don't fully understand the "have to load the entirety of your
dependencies" part.  If I'm using the same code linked into the main
application, then isn't it a given that I'm loading everything in the
application in the first place?  Or do you mean load all the .hi
files, even if I'm not exposing functions from them?  If the size of
in-memory .hi files dwarfs the binary size, then that might be a net
loss.  Though if my guess is correct about most .hi files being loaded
from external packages, then maybe there won't be much difference.

Re: GHC API: memory usage of loaded modules

Brandon Allbery

On Tue, Dec 13, 2016 at 1:21 PM, Evan Laforge <[hidden email]> wrote:
> If I'm using the same code linked into the main
> application, then isn't it a given that I'm loading everything in the
> application in the first place?

It's not necessarily accessible in a useful form for use by
demand-loaded modules; this is a common issue, and it led to things
like Apache and Perl moving most of their implementation into shared
objects specifically so this kind of sharing would work.
Additionally, since the demand-loaded environment is a separate
evaluation environment, all data will necessarily be duplicated.  (In
theory, sharing initialized data with copy-on-write is possible, but
in practice it's a lot of work and demand-loading overhead for very
little practical gain, at least for most C/C++ programs; ghc
initialized data might differ.)


Re: GHC API: memory usage of loaded modules

Reid Barton
On Tue, Dec 13, 2016 at 1:21 PM, Evan Laforge <[hidden email]> wrote:

> Sorry about the delay; I got distracted by an unrelated memory leak.
>
> On Tue, Nov 29, 2016 at 9:35 AM, Reid Barton <[hidden email]> wrote:
>> You'd probably find that you also want to, for example, type check the
>> expressions that you are interpreting. The information needed to do so
>> is not contained in your executable at all; it's in the .hi files that
>> were built alongside your program and its dependencies, and the ones
>> that came with the libraries bundled into GHC. I assume the in-memory
>> representation of these interface files is not very efficient, and
>> they probably account for a lot of the space usage of your program.
>
> That's true, but the .hi files on disk take up about 20k of that 76mb.
> If the .o files are loaded basically directly as binary, then that
> would mean 20k of .hi files turn into around 124mb in memory, which is
> quite an expansion.  But then there's all of the libraries I use and
> then all their libraries... perhaps those need to be loaded too?  If
> so, there is more than I'm counting.  I'm not sure how to count those,
> since they're not included in the "upsweep" log msgs when you do a
> GHC.load.

GHCi definitely needs to load some .hi files of your dependencies.
Your .hi files contain the types of your functions, needed to type
check expressions that use them. Let's say the type of one of your
functions involves ByteString. Then GHCi has to read the interface
file that defines ByteString, so that there is something in the
compiler for the type of your function to refer to.
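
For instance, even a trivial module like this one (made up for
illustration) records ByteString in its .hi file:

import qualified Data.ByteString as ByteString

-- This module's interface says that 'slurp' involves ByteString, so
-- before GHCi can type check an expression like 'slurp "x"' it must
-- also read the interface file that defines ByteString.
slurp :: FilePath -> IO ByteString.ByteString
slurp = ByteString.readFile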

I'm not sure how to predict what exact set of .hi files GHCi will need
to load, but you could run your program under strace (or equivalent)
to see which .hi files it is loading. Then I would guess the expansion
factor when converting into the compiler's internal types is maybe
around 10x. However there's also some kind of lazy loading of .hi
files, and I'm not sure how that works or what granularity it has.

By the way, you can use `ghc --show-iface` to examine .hi files
manually, which might be illuminating.

>> If you build your executable dynamically then the GHC API should also
>> reuse the same shared libraries and executable image rather than
>> loading a second copy of the object code. If that doesn't work then it
>> would be helpful if you could produce a minimal reproducer of it not
>> working. (The potential disadvantage is that you have to load the
>> entirety of each of your dependencies, rather than just the parts you
>> actually use.)
>
> I do build dynamically, since it's the only option nowadays to load .o
> files, but I guess what you mean is link the application as a shared
> library, and then link it to the Main module for the app, and pass it
> to GHC.parseDynamicFlags for the REPL?  That's a good idea.  But I'd
> still be loading all those .hi files, and if the majority of the
> memory use is actually from those, it might not help, right?

I'm pretty sure the old way of linking your program statically, which
will cause the RTS to use its own linker to load .o files, is still
supposed to work. It has the same limitations it has always had, of
course. The new thing is that you need to build dynamically in order
to link object files into the ghc compiler itself; but that's just
because the ghc binary shipped in the binary distribution was built
dynamically; this isn't a constraint on your own GHC API use. (And you
can choose to build ghc statically, too. Windows builds still work
that way.)

I really just meant building your executable dynamically, i.e., with
-dynamic. If the code size is a small proportion of the total memory
use then it won't make a big difference, as you say. However, I'm not
sure that is really the case considering that the GHC library itself
is already about 74 MB on-disk.

I'm not sure why you are looking at the GHC.Stats.currentBytesUsed
number; be aware that it only measures the size of the GCed heap. Many
things that contribute to the total memory usage of your program (such
as its code size, or anything allocated by malloc or mmap) will not
show up there.
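
If you do want numbers for the GCed heap, a few other GCStats fields
are worth printing alongside it; a sketch (getGCStats requires running
the program with +RTS -T):

import qualified GHC.Stats

-- Compare the current live heap against the peak figures the RTS has
-- recorded so far.
printHeapStats :: IO ()
printHeapStats = do
    stats <- GHC.Stats.getGCStats
    putStrLn $ "current live bytes: "
        ++ show (GHC.Stats.currentBytesUsed stats)
    putStrLn $ "max live bytes seen: "
        ++ show (GHC.Stats.maxBytesUsed stats)
    putStrLn $ "peak MB allocated from the OS: "
        ++ show (GHC.Stats.peakMegabytesAllocated stats)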

> I don't fully understand the "have to load the entirety of your
> dependencies" part.  If I'm using the same code linked into the main
> application, then isn't it a given that I'm loading everything in the
> application in the first place?

Let me explain what I meant with an example. If I build a hello world
program statically, I get a 1.2M executable. Let's assume most of that
size comes from the base package. If I build the same hello world
program dynamically, I get an 18K executable dynamically linked
against an 11M base shared library! At runtime, the dynamic loader
will map that whole 11M file into my process's memory space. Whether
you want to count that as part of the space usage of your program is
up to you; the code segments will be shared between multiple
simultaneous instances of your program (or other programs compiled by
GHC), but if you only run one copy of your program at a time, that
doesn't help you. It certainly won't be counted by currentBytesUsed.

The base library is composed of many individual .o files. When I
linked the hello world statically, the linker took only the .o files
that were actually needed for my program, which is why it was only
1.2M when the base library is 11M. Your real program probably uses
most of base, but may have other dependencies that you use only a
small part of (lens?).

Now when you use the RTS linker in a statically linked program,
although some of the code you need is linked into your program
already, it's not in a usable form for the RTS linker, so it has to
load the .o files itself, effectively creating a second copy of the
code. If you used dynamic linking, then the RTS calls dlopen which
should reuse the mappings that were made when your program was loaded.
The tradeoff is that if you use very little of your dependencies then
it still might be cheaper to store two copies of only the code that
you actually do use.

Regards,
Reid Barton

Re: GHC API: memory usage of loaded modules

Evan Laforge
On Wed, Dec 14, 2016 at 8:21 AM, Reid Barton <[hidden email]> wrote:
> On Tue, Dec 13, 2016 at 1:21 PM, Evan Laforge <[hidden email]> wrote:
> GHCi definitely needs to load some .hi files of your dependencies.
> Your .hi files contain the types of your functions, needed to type
> check expressions that use them. Let's say the type of one of your
> functions involves ByteString. Then GHCi has to read the interface
> file that defines ByteString, so that there is something in the
> compiler for the type of your function to refer to.

Right, that makes sense.  When I enable verbose logging, I see that in
the upsweep phase it collects the imports of all of the transitively
loaded modules.  I assume it loads all the local .hi files, and then
it also has to load the package dependency .hi files (--show-iface
also shows a "package dependencies" section).  I can't tell if it does
that lazily, but it would make sense because surely I'm not using
every single module exported from every single package.  Certainly
packages themselves can be loaded lazily; I frequently see ghci wait
until I try to evaluate an expression before linking in a bunch of
external packages.

> I'm not sure how to predict what exact set of .hi files GHCi will need
> to load, but you could run your program under strace (or equivalent)
> to see which .hi files it is loading. Then I would guess the expansion
> factor when converting into the compiler's internal types is maybe
> around 10x. However there's also some kind of lazy loading of .hi
> files, and I'm not sure how that works or what granularity it has.

I guess it would be dtrace on OS X; I'll look into it and see what I
can learn.  Then I can divide the size of the loaded .hi files by the
increase in memory size and see what the ratio actually is.

> By the way, you can use `ghc --show-iface` to examine .hi files
> manually, which might be illuminating.

That is pretty interesting, thanks.  There's quite a lot of stuff in
there, including some I didn't expect, like apparently lots of Show
instance implementations for concrete types:

bac9698d086d969aebee0847bf123997
  $s$fShow(,)_$s$fShow(,)_$cshowList ::
    [(Writable, SaveFile)] -> ShowS

In this case, both Writable and SaveFile are defined elsewhere, but I
do show a list of them in that module so maybe the instances get
inlined in here?

But it's not a crazy amount of stuff, and I wouldn't expect there to
be, since the .hi files themselves are not unreasonably large.

>> I do build dynamically, since it's the only option nowadays to load .o
>> files, but I guess what you mean is link the application as a shared
>> library, and then link it to the Main module for the app, and pass it
>> to GHC.parseDynamicFlags for the REPL?  That's a good idea.  But I'd
>> still be loading all those .hi files, and if the majority of the
>> memory use is actually from those, it might not help, right?
>
> I'm pretty sure the old way of linking your program statically, which
> will cause the RTS to use its own linker to load .o files, is still
> supposed to work. It has the same limitations it has always had, of
> course. The new thing is that you need to build dynamically in order
> to link object files into the ghc compiler itself; but that's just
> because the ghc binary shipped in the binary distribution was built
> dynamically; this isn't a constraint on your own GHC API use. (And you
> can choose to build ghc statically, too. Windows builds still work
> that way.)

I see from my darcs history that I added -dynamic to all builds except
profiling back in July 2014, I think after upgrading to 7.8.  From the
comment, I did that because otherwise ghci wouldn't load the .o files.
And I remember lots of talk on trac around 7.8 about finally
abandoning the home-grown linker.  This is on OS X, so maybe it's
platform dependent.

> I really just meant building your executable dynamically, i.e., with
> -dynamic. If the code size is a small proportion of the total memory
> use then it won't make a big difference, as you say. However, I'm not
> sure that is really the case considering that the GHC library itself
> is already about 74 MB on-disk.

In that case, I must already be doing that.  But how would that work
for my own application's binary?  When I do otool -L I see that indeed
all the cabal libraries like libHSbase and libHSghc are dynamic
libraries, so presumably those will be shared across the whole OS.
But the application's binary is linked via 'ghc -dynamic -package=... A.o
B.o C.o etc'.  The .o files are built with -dynamic and I assume ghci
itself uses the OS's loader for them, but they seem to be linked into
the binary in the traditional static way.

It's confusing to me because traditionally -dynamic is a link-only
flag, but ghc also uses it for building .o files... I assume because
of the ghci loading thing.  I always assumed it used the OS's
low-level shared-object loading, but not the whole dynamic library
mechanism.

> I'm not sure why you are looking at the GHC.Stats.currentBytesUsed
> number; be aware that it only measures the size of the GCed heap. Many
> things that contribute to the total memory usage of your program (such
> as its code size, or anything allocated by malloc or mmap) will not
> show up there.

I just picked it out of GCStats as having a promising-looking name.
It's the one that goes up while the GHC API is loading its modules, so
I assumed it was the most useful one.

currentBytesUsed reports 200mb, but the system process viewer shows
350mb, so clearly some isn't being counted.  But that looks like
2-space GC overhead, and indeed if I do +RTS -c, the system usage goes
down to 240mb while currentBytesUsed stays around 200mb (it goes up a
bit actually).  So perhaps most of the allocation is indeed in the
GCed heap, and the extra space is mostly GC overhead.  Does the GHC
API use malloc or mmap internally?  I wouldn't be surprised if .o
files are loaded with mmap.

Another thing that occurred to me, if the GC heap is really mostly
loaded .hi files, then maybe I should increase the number of
generations since most of the heap is immortal.  Or maybe when the
compact regions stuff stabilizes, all the .hi data could go into a
compact region.  I guess that might require non-trivial ghc hacking
though.

>> I don't fully understand the "have to load the entirety of your
>> dependencies" part.  If I'm using the same code linked into the main
>> application, then isn't it a given that I'm loading everything in the
>> application in the first place?
>
> Let me explain what I meant with an example. If I build a hello world
> program statically, I get a 1.2M executable. Let's assume most of that
> size comes from the base package. If I build the same hello world
> program dynamically, I get an 18K executable dynamically linked
> against an 11M base shared library! At runtime, the dynamic loader
> will map that whole 11M file into my process's memory space. Whether
> you want to count that as part of the space usage of your program is
> up to you; the code segments will be shared between multiple
> simultaneous instances of your program (or other programs compiled by
> GHC), but if you only run one copy of your program at a time, that
> doesn't help you. It certainly won't be counted by currentBytesUsed.
>
> The base library is composed of many individual .o files. When I
> linked the hello world statically, the linker took only the .o files
> that were actually needed for my program, which is why it was only
> 1.2M when the base library is 11M. Your real program probably uses
> most of base, but may have other dependencies that you use only a
> small part of (lens?).

Oh ok, that makes sense.  In that case, since I'm dynamically linking
cabal packages, I'm certainly already getting that sharing.  I was
mostly concerned with the code from my program itself.
The REPL is only loading about 255 of the 401 local .o files, and
presumably if I link those 401 modules into a local dynlib, and then
both link that into the Main binary and have the GHC API load it, I'd
share the local code too.  Since my program uses all of its own
code kind of by definition, it's loaded no matter what, even if the
REPL doesn't need all of it.

Re: GHC API: memory usage of loaded modules

Brandon Allbery

On Thu, Dec 15, 2016 at 1:55 AM, Evan Laforge <[hidden email]> wrote:
> It's confusing to me because traditionally -dynamic is a link-only
> flag, but ghc also uses it for building .o files... I assume because
> of the ghci loading thing.

There may also be some OS X-specific behavior here; OS X doesn't like
static objects much, due to its PPC heritage (the PPC ABI pretty much
restricts "normal" position-dependent static objects to the kernel).
