Recompilation avoidance questions

Ömer Sinan Ağacan
Hi all,

I'm currently reading the "recompilation avoidance" wiki page [1], and I have a
few questions about the current design.

The wiki page says (in the paragraph "Suppose the change to D ...") if a module
B re-exports x from module D, changing x in D does not cause any changes in B's
interface.

I'm wondering why this is the case. To me this doesn't make sense. Anything that
can potentially affect users of B should be a part of B's interface. This
includes re-exports. I don't understand why there is a difference between normal
exports and re-exports. As far as users of the module are concerned, there's no
difference. So I'd expect any changes in re-exports to make a difference in B's
interface.

The wiki page says (in "Why not do (1)", where (1) refers to making D.x part of
B's interface) that this is because sometimes changes in D.x should not cause
recompiling B's users. I don't understand why (1) would cause this problem. If
we make x a part of B, as if it's defined in B, similar to how we can avoid
recompilation of users of B when a definition of B changes but the interface is
the same, we could avoid recompiling users when D.x changes.

For example,

    -- B.hs
    module B where

    b = 123123

    -- Main.hs
    import B

    main = print b


    $ ghc-stage1 Main.hs
    [1 of 2] Compiling B                ( B.hs, B.o )
    [2 of 2] Compiling Main             ( Main.hs, Main.o )
    Linking Main ...

Now if I update B and recompile I'll only link Main, won't recompile it:

    -- B.hs
    module B where

    b = 123123 + 12308

    $ ghc-stage1 Main.hs
    [1 of 2] Compiling B                ( B.hs, B.o )
    Linking Main ...

Now suppose B.b was a re-export from D. I don't understand why changing it in D
would cause recompiling Main if we make b a part of B's interface. I think what
would happen is: because D's interface hash won't change we won't recompile B.
No problems at all.

Finally, I'm a bit confused about this part

> To ensure that A is recompiled, we therefore have two options:
> ...
> (2) arrange to touch B.hi and C.hi even if they haven't changed.

I don't understand how touching is relevant. As far as I understand, touching
can't force recompilation. Example:

    $ ghc-stage1 Main.hs
    [1 of 3] Compiling A                ( A.hs, A.o )
    [2 of 3] Compiling B                ( B.hs, B.o )
    [3 of 3] Compiling Main             ( Main.hs, Main.o )
    Linking Main ...
    $ touch A.hi
    $ ghc-stage1 Main.hs
    $ touch B.hi
    $ ghc-stage1 Main.hs

Am I missing anything?

Thanks,

Ömer

[1]: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/recompilation-avoidance
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Recompilation avoidance questions

Simon Marlow
On Tue, 21 Apr 2020 at 11:38, Ömer Sinan Ağacan <[hidden email]> wrote:
> Hi all,
>
> I'm currently reading the "recompilation avoidance" wiki page [1], and I have a
> few questions about the current design.
>
> The wiki page says (in the paragraph "Suppose the change to D ...") if a module
> B re-exports x from module D, changing x in D does not cause any changes in B's
> interface.
>
> I'm wondering why this is the case. To me this doesn't make sense. Anything that
> can potentially affect users of B should be a part of B's interface. This
> includes re-exports. I don't understand why there is a difference between normal
> exports and re-exports. As far as users of the module are concerned, there's no
> difference. So I'd expect any changes in re-exports to make a difference in B's
> interface.

Yes, that's already the case. Under "Deciding whether to recompile", we say:

* If anything else has changed in a way that would affect the results of compiling this module, we must recompile.

so that's the basic requirement.

We don't want to include the *definitions* of things that are re-exported, because that would bloat interface files a lot. Consider that an interface would have to contain the unfoldings for every exported identifier, and the unfoldings of anything referred to by those unfoldings, and so on. Imagine the size of Prelude.hi! (Historical note: it did work this way a long time ago; I think GHC 2.x was when it changed.)

> The wiki page says (in "Why not do (1)", where (1) refers to making D.x part of
> B's interface)

Here (1) refers to

1. arrange that make knows about the dependency of A on D.

which is not the same as making D.x part of B's interface.

This section of the wiki page is about "make", incidentally.
 
> that this is because sometimes changes in D.x should not cause
> recompiling B's users. I don't understand why (1) would cause this problem. If
> we make x a part of B, as if it's defined in B, similar to how we can avoid
> recompilation of users of B when a definition of B changes but the interface is
> the same, we could avoid recompiling users when D.x changes.
>
> For example,
>
>     -- B.hs
>     module B where
>
>     b = 123123
>
>     -- Main.hs
>     import B
>
>     main = print b
>
>
>     $ ghc-stage1 Main.hs
>     [1 of 2] Compiling B                ( B.hs, B.o )
>     [2 of 2] Compiling Main             ( Main.hs, Main.o )
>     Linking Main ...
>
> Now if I update B and recompile I'll only link Main, won't recompile it:
>
>     -- B.hs
>     module B where
>
>     b = 123123 + 12308
>
>     $ ghc-stage1 Main.hs
>     [1 of 2] Compiling B                ( B.hs, B.o )
>     Linking Main ...
>
> Now suppose B.b was a re-export from D. I don't understand why changing it in D
> would cause recompiling Main if we make b a part of B's interface. I think what
> would happen is: because D's interface hash won't change we won't recompile B.
> No problems at all.

I think this all stems from the confusion above.
 

> Finally, I'm a bit confused about this part
>
> > To ensure that A is recompiled, we therefore have two options:
> > ...
> > (2) arrange to touch B.hi and C.hi even if they haven't changed.
>
> I don't understand how touching is relevant. As far as I understand, touching
> can't force recompilation. Example:
>
>     $ ghc-stage1 Main.hs
>     [1 of 3] Compiling A                ( A.hs, A.o )
>     [2 of 3] Compiling B                ( B.hs, B.o )
>     [3 of 3] Compiling Main             ( Main.hs, Main.o )
>     Linking Main ...
>     $ touch A.hi
>     $ ghc-stage1 Main.hs
>     $ touch B.hi
>     $ ghc-stage1 Main.hs
>
> Am I missing anything?

Touching is relevant to "make" only, not ghc --make. Under "Why do we need recompilation avoidance?" there are two sections: "GHCi and --make" and "make", but the formatting doesn't make the structure very clear here. Perhaps this got worse when we migrated to GitLab? Maybe adding an outline would help make the structure clearer?
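
To make the difference concrete, here is a rough Haskell sketch of the two kinds of staleness check. It is a toy model only: the File type and the functions makeStale, hashStale and touch are hypothetical names invented for this illustration, and the real ghc --make check looks at more than a single recorded fingerprint.

```haskell
module Main where

-- Toy model of a file: a modification time plus its contents.
-- Hypothetical type, not anything from GHC or make.
data File = File { mtime :: Int, contents :: String }

-- make-style check: the target is stale when any prerequisite
-- has a newer timestamp than the target.
makeStale :: File -> [File] -> Bool
makeStale target deps = any (\d -> mtime d > mtime target) deps

-- Simplified "ghc --make"-style check: the target is stale when the
-- fingerprints recorded at the last compile no longer match. Here the
-- "fingerprint" is just the file contents, to keep the sketch short.
hashStale :: [String] -> [File] -> Bool
hashStale recorded deps = recorded /= map contents deps

-- touch updates the timestamp and leaves the contents alone.
touch :: Int -> File -> File
touch now f = f { mtime = now }

main :: IO ()
main = do
  let aObj     = File 10 "object code of A"
      bHi      = File 5  "interface of B"
      recorded = [contents bHi]   -- what A saw when it was last compiled
      bTouched = touch 20 bHi
  print (makeStale aObj [bHi])        -- False: B.hi is older than A.o
  print (makeStale aObj [bTouched])   -- True: touching B.hi makes A.o look stale to make
  print (hashStale recorded [bTouched]) -- False: contents unchanged, no recompile
```

In this model, touching B.hi only changes the answer of the timestamp-based check, which matches the experiment quoted above where touching .hi files had no effect under ghc --make.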

Cheers
Simon
 


Re: Recompilation avoidance questions

Ömer Sinan Ağacan
Thanks Simon,

> We don't want to include the *definitions* of things that are re-exported,
> because that would bloat interface files a lot.

I think by definition you mean unfoldings, pragmas, annotations, and rules,
right?

I'm a bit surprised by this, because this would require tracking transitive
dependencies, which is the opposite of what we want to do in #16885.

If M1 re-exports something from M2 and M0 imports M1 then I think we could
consider M2 a direct import, but that complicates the story a little bit. I
think we don't have to track *all* transitive deps though, only tracking
re-export paths should be enough. So maybe this is not too bad.

Ömer


Re: Recompilation avoidance questions

Simon Marlow
On Thu, 23 Apr 2020 at 09:17, Ömer Sinan Ağacan <[hidden email]> wrote:
> Thanks Simon,
>
> > We don't want to include the *definitions* of things that are re-exported,
> > because that would bloat interface files a lot.
>
> I think by definition you mean unfoldings, pragmas, annotations, and rules,
> right?

And the types of bindings, and the definitions of types. Everything that is not the name, basically.
 
> I'm a bit surprised by this, because this would require tracking transitive
> dependencies, which is the opposite of what we want to do in #16885.

Not really. It's just a tradeoff between copying all the definitions (recursively) of things we need into the current module vs. leaving the definitions in the interface of the original module where the entity was defined.

Even if we were to copy the definitions of things we depend on into the current module's interface, we still have to know where they came from, and to know when the original definition changes so that we can recompile. So I don't think there would be any difference in which modules we have to list in the current module's interface file usage list.

Note: the "usages" in the interface file are different from the "dependencies". We're not proposing to change how "usages" work. The difference is explained in https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/recompilation-avoidance#deciding-whether-to-recompile
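
As an illustration of the tradeoff, the following Haskell sketch models an interface that records a re-export by name only, with the importing module keeping a usage entry that points at the defining module. Everything here (Export, Iface, Usage, lookupDecl, needsRecompile) is a hypothetical toy model made up for this example, not GHC's actual data types, and the "fingerprint" is simply the declaration text.

```haskell
module Main where

import Data.Maybe (listToMaybe)

type ModuleName = String

-- An export is either defined in this module (the interface carries its
-- declaration) or re-exported (the interface carries only the name and
-- where it was originally defined).
data Export
  = Defined String String       -- name, declaration (type, unfolding, ...)
  | ReExport ModuleName String  -- defining module, name

newtype Iface = Iface [Export]

-- What an importing module remembers about an entity it used:
-- the defining module, the name, and the declaration it saw.
data Usage = Usage ModuleName String String

-- Find the declaration of a name in the module that defines it.
lookupDecl :: [(ModuleName, Iface)] -> ModuleName -> String -> Maybe String
lookupDecl env m n = do
  Iface es <- lookup m env
  listToMaybe [d | Defined n' d <- es, n' == n]

-- Recompile when any recorded usage no longer matches the current
-- declaration in the defining module.
needsRecompile :: [(ModuleName, Iface)] -> [Usage] -> Bool
needsRecompile env = any stale
  where stale (Usage m n old) = lookupDecl env m n /= Just old

main :: IO ()
main = do
  let dOld = Iface [Defined "x" "x :: Int; x = 1"]
      dNew = Iface [Defined "x" "x :: Bool; x = True"]
      b    = Iface [ReExport "D" "x"]  -- identical before and after the change to D
      -- Main imported x via B, but its usage points at D, where x is defined.
      usagesOfMain = [Usage "D" "x" "x :: Int; x = 1"]
  print (needsRecompile [("D", dOld), ("B", b)] usagesOfMain)  -- False
  print (needsRecompile [("D", dNew), ("B", b)] usagesOfMain)  -- True
```

In this model B's interface stays the same when D.x changes, yet the importer is still recompiled, because the usage entry names the module where the entity was defined.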

> If M1 re-exports something from M2 and M0 imports M1 then I think we could
> consider M2 a direct import, but that complicates the story a little bit. I
> think we don't have to track *all* transitive deps though, only tracking
> re-export paths should be enough. So maybe this is not too bad.

I think we already arrived at a reasonable design on #16885, what do you think of it? Also, David already listed all the places that would potentially need to change if we no longer include transitive dependencies in `dep_mods`: https://gitlab.haskell.org/ghc/ghc/issues/16885#note_215715


There was some subsequent discussion on #16885 about how to handle boot modules, and a proposal to fix that. Aside from that, the idea is to just remove transitive dependencies from `dep_mods` and fix up the places that used it, which David listed in that comment.

Cheers
Simon
 
