slow load and typecheck

fommil
Hello all,

I am writing an interactive tool using the ghc api. It is able to load
and typecheck a source file in a user's package.

I obtain the flags that cabal uses to compile the user's package via the
hie-bios trick, and I `parseDynamicFlagsCmdLine' them inside my tool,
then I `setTargets' all the home modules (with targetAllowObjCode=True).

I use HscNothing and NoLink because I only want access to the trees, I
don't want to produce any output files.

For the file that I wish to inspect, I `removeTarget' the module and
`addTarget' it again but this time providing the full path to the file
and don't allow object code.

Then I LoadUpTo and typecheck. For the sake of simplicity, let's assume
that the file under inspection only has a module definition and no
imports or top levels.
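
A minimal sketch of the setup described above, for readers who want to
reproduce it. It assumes GHC 8.6/8.8-era module names (`GHC`, `DynFlags`)
and the three-field `Target` record of those releases. The function name
and its arguments are hypothetical, and the target list is built directly
rather than via removeTarget/addTarget, so this is an approximation of the
tool and not its actual code.

```haskell
-- Sketch only: assumes the GHC 8.6/8.8 API (module names changed in 9.x).
import Control.Monad (void)
import DynFlags (parseDynamicFlagsCmdLine)
import GHC

-- | Load the home modules needed by one file, then typecheck that file.
-- All five arguments are hypothetical inputs supplied by the caller.
typecheckOne
  :: FilePath   -- ^ GHC libdir (e.g. the output of `ghc --print-libdir`)
  -> [String]   -- ^ flags cabal would pass to ghc (e.g. from hie-bios)
  -> [String]   -- ^ names of all home modules
  -> String     -- ^ module name of the file under inspection
  -> FilePath   -- ^ full path to the file under inspection
  -> IO TypecheckedModule
typecheckOne libdir cabalFlags homeModules inspectedName inspectedFile =
  runGhc (Just libdir) $ do
    dflags0 <- getSessionDynFlags
    (dflags1, _leftovers, _warnings) <-
      parseDynamicFlagsCmdLine dflags0 (map noLoc cabalFlags)
    -- Only the trees are wanted, so request no code and no linking.
    void $ setSessionDynFlags
      dflags1 { hscTarget = HscNothing, ghcLink = NoLink }

    let inspected = mkModuleName inspectedName
        -- Dependencies: referenced by module name, object code allowed.
        depTarget name = Target
          { targetId           = TargetModule (mkModuleName name)
          , targetAllowObjCode = True
          , targetContents     = Nothing
          }
        -- File under inspection: referenced by path, no object code.
        fileTarget = Target
          { targetId           = TargetFile inspectedFile Nothing
          , targetAllowObjCode = False
          , targetContents     = Nothing
          }
    setTargets
      (map depTarget (filter (/= inspectedName) homeModules) ++ [fileTarget])

    -- The SuccessFlag is ignored here; a real tool should check it.
    _ <- load (LoadUpTo inspected)
    summary <- getModSummary inspected
    parseModule summary >>= typecheckModule
```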

Functionally, my code is working great and I am able to do what I want
with the typechecked tree.

However, load is very slow (~10 seconds user time) on large projects.
Here is a cpu time trace of my program (milliseconds):

  main              1
  parse flags      93
  load          20436
  typecheck     20437

I can enable a bit more ghc timing info via -Rghc-timings and I see

  !!! Chasing dependencies: finished in 157.20 milliseconds, allocated
      528.112 megabytes

This seems fine, anything sub-second is ok.

But then I see a bunch of home modules in CodeGen that I was not expecting:

   !!! CodeGen [My.Module.Dependency]:
       finished in 3335.62 milliseconds, allocated 270.615 megabytes

So it looks like the targetAllowObjCode is being ignored... is there any
way to force it? Actually I'd prefer to fail fast than to ever compile
or codegen a dependency module.


I know that it should be possible to load the module a lot faster
because if I make a small change in the file under inspection and ask
cabal to recompile the module it is super fast (less than a second).

Could somebody who understands how incremental/partial compiles work
please help me out?


PS: If this textual description is confusing, I could put together a
minimal reproduction and example project but it will take me some time
to do that.

--
Best regards,
Sam


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: slow load and typecheck

Brandon Allbery
I think the only path for loading a dependency that doesn't involve loading object code of some kind is the {-# SOURCE #-} hack as part of .hs-boot files, which isn't general enough to be reused here as I understand it. A decent chunk of the compiler would need to be duplicated to avoid this, and it might use a fair amount of memory and end up generating at least part of the object into memory.

Also recall that if any TH or quasiquotation is involved, it'll need to load object code in support of that; and it might well need to prepare for this in the general case rather than again having to duplicate a bunch of code to support different no-TH and TH paths.

Cabal will build all that stuff the first time and then reuse it the next, so it's not quite the same thing. Since you told ghc no object code, it discards what it generates here and may not use existing compiled modules; or you may have specified settings incompatible with any it did find.

In short, you may want to rethink this; ghc is a compiler, not an IDE, and doesn't quite work the way you had hoped.

On Tue, Oct 8, 2019 at 10:15 AM Sam Halliday <[hidden email]> wrote:
> [...]

--
brandon s allbery kf8nh

Re: slow load and typecheck

fommil
In reply to this post by fommil
A quick follow-up to this, Rahul Muttineni gave me some advice to try
out

  HscInterpreted / LinkInMemory

instead of

  HscNothing / NoLink

and now I am no longer seeing home modules being compiled, and
everything is a lot faster. Woohoo!
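
For reference, the switch described above amounts to changing two
DynFlags fields on the session. This is a sketch assuming the GHC
8.6/8.8 API; the helper name `useInterpreter` is hypothetical.

```haskell
-- Sketch only: assumes the GHC 8.6/8.8 API (field and constructor
-- names changed in later releases).
import Control.Monad (void)
import GHC

-- | Replace HscNothing / NoLink with HscInterpreted / LinkInMemory
-- on the current session. Call this before 'load'.
useInterpreter :: GhcMonad m => m ()
useInterpreter = do
  dflags <- getSessionDynFlags
  void $ setSessionDynFlags
    dflags { hscTarget = HscInterpreted, ghcLink = LinkInMemory }
```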


But I have no idea why this speeds things up... my code isn't using
TemplateHaskell so HscNothing should really mean "don't do any codegen".
Something is causing the HscNothing to be ignored. I'd still really like
to get to the bottom of this so if anybody knows how the batch compiler
is able to avoid recompiling home modules then please let me know... I
would like to continue using HscNothing instead of HscInterpreted.


--
Best regards,
Sam


Re: slow load and typecheck

fommil
In reply to this post by Brandon Allbery
Thanks Brandon,

Brandon Allbery <[hidden email]> writes:
> Cabal will build all that stuff the first time and then reuse it the next,
> so it's not quite the same thing. Since you told ghc no object code,

Sorry, I meant that I used targetAllowObjCode=True for everything,
except the file under inspection. Do you mean that if I used
targetAllowObjCode=False for just one module it will invalidate the
object code for everything it depends on? That is unexpected.


> In short, you may want to rethink this; ghc is a compiler, not an IDE, and
> doesn't quite work the way you had hoped.

How would you suggest rethinking it? Bear in mind that the api is
working exactly the way I want from a functional point of view (just
slow) with HscNothing... and seems to work exactly the way I want with
HscInterpreted (but with all the ghci caveats like unboxed tuples etc).

--
Best regards,
Sam


Re: slow load and typecheck

Brandon Allbery
It's doing what you — but not ghc — consider "extra work", though. ghc expects to be compiling code, and doesn't have a separate code path for "load symbols from an external module by parsing its source code" instead of "load symbols from an external module by loading its .hsc file and object code", aside from HscInterpreted.

On Tue, Oct 8, 2019 at 10:37 AM Sam Halliday <[hidden email]> wrote:
> [...]


--
brandon s allbery kf8nh

Re: slow load and typecheck

Matthew Pickering
In reply to this post by fommil
Are you writing interface files (-fwrite-interface)? It makes no sense
for HscInterpreted to be faster than HscNothing.

Cheers,

Matt

On Tue, Oct 8, 2019 at 3:30 PM Sam Halliday <[hidden email]> wrote:
> [...]

Re: slow load and typecheck

fommil
Matthew Pickering <[hidden email]> writes:

> Are you writing interface files (-fwrite-interface)? It makes no sense
> for HscInterpreted to be faster than HscNothing.

Nope, not writing anything like that (I just checked the ghc flags from
hie-bios to confirm)... and I agree that this makes no sense.


--
Best regards,
Sam


Re: slow load and typecheck

Brandon Allbery
I already mentioned needing .hi (I may have said hsc, whoops; Haskell Interface files) from dependencies; you really want to turn that part on, at least. And possibly ensure your other options are compatible with existing .hi files, so they can be loaded directly. I think the .o isn't used until link time, which should be irrelevant for you; but you really do want those .hi files, otherwise it must compile the dependency module to generate one.
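
When driving GHC through the API, the -fwrite-interface flag mentioned
earlier in the thread corresponds to a general flag on DynFlags. The
sketch below shows one way to turn it on, assuming the GHC 8.6/8.8 API;
the helper name `enableWriteInterface` is hypothetical. Whether this
removes the unexpected CodeGen work is not established in this thread.

```haskell
-- Sketch only: assumes the GHC 8.6/8.8 API, where 'gopt_set' and
-- 'GeneralFlag' live in the DynFlags module.
import Control.Monad (void)
import DynFlags (GeneralFlag (Opt_WriteInterface), gopt_set)
import GHC

-- | Ask GHC to write .hi files even when no code is being generated
-- (the programmatic form of -fwrite-interface). Call before 'load'.
enableWriteInterface :: GhcMonad m => m ()
enableWriteInterface = do
  dflags <- getSessionDynFlags
  void $ setSessionDynFlags (dflags `gopt_set` Opt_WriteInterface)
```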

On Tue, Oct 8, 2019 at 10:51 AM Sam Halliday <[hidden email]> wrote:
> [...]


--
brandon s allbery kf8nh

Re: slow load and typecheck

fommil
In reply to this post by Brandon Allbery
Brandon Allbery <[hidden email]> writes:

> It's doing what you — but not ghc — consider "extra work", though. ghc
> expects to be compiling code, and doesn't have a separate code path for
> "load symbols from an external module by parsing its source code" instead
> of "load symbols from an external module by loading its .hsc file and
> object code", aside from HscInterpreted.


I'm confused: it sounds like you are saying that only HscInterpreted can
load symbols of dependencies from object code. Then how does cabal+ghc
do this when I make a change to one file in my project and do a
recompile of the package?

BTW, I am seeing modules going through CodeGen that are not part of the
file's dependency graph... LoadUpTo is behaving more like LoadAll.

--
Best regards,
Sam


Re: slow load and typecheck

fommil
In reply to this post by Brandon Allbery
Brandon Allbery <[hidden email]> writes:

> you really do want those .hi files, otherwise it must compile the
> dependency module to generate one.

Right, exactly! But I thought that's what targetAllowObjCode=True was
doing, is it not? Is there another setting that I'm missing?

Should I use that for all my modules and not just the dependencies?

--
Best regards,
Sam


Re: slow load and typecheck

Brandon Allbery
In reply to this post by fommil
It reuses the .hi files already built for other modules. Those aren't in the source directory but under a build directory. If they don't exist there, it will build the dependencies to create them.

On Tue, Oct 8, 2019 at 10:57 AM Sam Halliday <[hidden email]> wrote:
> [...]


--
brandon s allbery kf8nh

Re: slow load and typecheck

Brandon Allbery
In reply to this post by fommil
If they are loading each other, they likewise need .hi files. .o files are optional if you aren't linking them.

On Tue, Oct 8, 2019 at 10:59 AM Sam Halliday <[hidden email]> wrote:
> [...]


--
brandon s allbery kf8nh

Re: slow load and typecheck

fommil
In reply to this post by Brandon Allbery
Brandon Allbery <[hidden email]> writes:
> It reuses the .hi files already built for other modules. Those aren't in
> the source directory but under a build directory. If they don't exist
> there, it will build the dependencies to create them.

The .hi files exist in the target directory and my tool has informed the
ghc api about that location, but it's not using them and I don't know
why... I guess I'm asking "how can I make ghc use the .hi files instead
of compiling the .hs files?". It seems to work fine when I use
HscInterpreted instead of HscNothing.

BTW I tried using targetAllowObjCode=True for everything, but it makes
no difference.

--
Best regards,
Sam


Re: slow load and typecheck

Evan Laforge
In reply to this post by fommil
I'm not sure if I'm doing the same thing as you, but I use a GHC repl
for my program.  It loads 200-300 modules in under a second, and is
able to reload changed ones dynamically, just like ghci.

The source is https://github.com/elaforge/karya/blob/work/Cmd/ReplGhc.hs,
see 'parse_flags' and its call in 'interpreter'.

The main thing is getting ghc to load the .o files, but if ghci will
do it, then the ghc API will do it.  You just have to get the flags to
be the same, and ghc is pretty opaque about why it doesn't want to
load.  There is a -ddump-something flag but it doesn't say what flags
actually changed.  I actually wound up patching ghc to add that
feature.

On Tue, Oct 8, 2019 at 7:15 AM Sam Halliday <[hidden email]> wrote:
> [...]

Re: slow load and typecheck

fommil
Thanks Evan,

Evan Laforge writes:
> https://github.com/elaforge/karya/blob/work/Cmd/ReplGhc.hs

Yes what you're doing is very similar. I'm also adding args_left as
modules because they tend to be RTS and home modules. But it looks like
you always use False for object code. You're also using ghcMode so I'll
investigate if I need to force a mode.


> ghc is pretty opaque about why it doesn't want to load.

:-D

--
Best regards,
Sam

