Minimizing cascading rebuilds

joshchia
Hi,

In my project, I have multiple packages. One of the packages, packageA, is very fundamental and depended on directly and indirectly by almost all the other packages. It has functions that use some hard-coded data (a ByteString top-level variable) also defined within packageA.

This hard-coded data is appended to regularly, causing packageA to be rebuilt and thus almost all the other packages to be rebuilt, and building takes a painfully long time. I know I can move this hard-coded data to a file that's read at run-time, but that means one more item to plumb in at run-time (where to find the file), and IO (preventing the functions from being pure), so I would like to keep it hard-coded.

Is there an elegant way to prevent or minimize the cascading rebuild of the dependent packages just because the hard-coded data in packageA changed?

By analogy, in C or C++, source code gets compiled to .o files, one for each .cpp source file. Multiple .o files get linked into executables. So, unless the interface (.hpp files) also changes, an implementation (.cpp file) change does not cause dependents to be recompiled to get new .o files, although dependent executables get relinked. I'm not familiar with the compilation and linking logic in GHC, so maybe it has additional complications.

BTW, I'm using stack, in case it makes any difference to the nature of the problem.

Josh

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Re: Minimizing cascading rebuilds

Dan Burton
I don't believe this is currently possible with ghc, due to the way ghc handles optimizations. I would love to be proven wrong on that.

On Wed, Mar 28, 2018, 22:11 ☂Josh Chia (謝任中) <[hidden email]> wrote:
Hi,

In my project, I have multiple packages. One of the packages, packageA, is very fundamental and depended on directly and indirectly by almost all the other packages. It has functions that use some hard-coded data (a ByteString top-level variable) also defined within packageA.

This hard-coded data is appended to regularly, causing packageA to be rebuilt and thus almost all the other packages to be rebuilt, and building takes a painfully long time. I know I can move this hard-coded data to a file that's read at run-time, but that means one more item to plumb in at run-time (where to find the file), and IO (preventing the functions from being pure), so I would like to keep it hard-coded.

Is there an elegant way to prevent or minimize the cascading rebuild of the dependent packages just because the hard-coded data in packageA changed?

By analogy, in C or C++, source code gets compiled to .o files, one for each .cpp source file. Multiple .o files get linked into executables. So, unless the interface (.hpp files) also changes, an implementation (.cpp file) change does not cause dependents to be recompiled to get new .o files, although dependent executables get relinked. I'm not familiar with the compilation and linking logic in GHC, so maybe it has additional complications.

BTW, I'm using stack, in case it makes any difference to the nature of the problem.

Josh

Re: Minimizing cascading rebuilds

Brandon Allbery
In reply to this post by joshchia
The logic has *lots* of additional complications; Haskell isn't C. https://wiki.haskell.org/Shared_libraries_and_GHC isn't quite about this (shared libraries have even more complications), but has a reasonable overview.

Hypothetically, this specific case *could* be handled better. But it doesn't come up often enough, and the whole thing is tangled enough that it's only very recently that ghc's dependency handling started to play along well with nix's idea of how things work, much less anything trickier.

Although this might be one of the rare cases where -split-objs could be worth the cost (-split-sections is generally preferred with recent ghc, but wouldn't help with this specific case).

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

Re: Minimizing cascading rebuilds

Brandon Allbery
In reply to this post by Dan Burton
Optimizations shouldn't matter here: the problems there are caused by inlining, and if this data were short enough to get inlined then it also wouldn't be causing problems.

Although I should note that ByteString "literals" aren't actually literals in the way the OP thinks, and this will also cause problems. The external data file is actually preferable for this reason.
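One possible reading of the remark about "literals", sketched under the assumption that the data is written as a string literal with OverloadedStrings enabled: such a literal is an ordinary String literal passed through `fromString`, so the bytes come from a packing step rather than from a ByteString constant, and any character above U+00FF is silently truncated. The names `viaLiteral`, `viaFromString`, and `truncated` are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.String (fromString)

-- Written as a literal; with OverloadedStrings this means
-- fromString applied to a String literal.
viaLiteral :: ByteString
viaLiteral = "hard-coded data"

-- The same value with the conversion spelled out.
viaFromString :: ByteString
viaFromString = fromString "hard-coded data"

-- The IsString instance keeps only the low 8 bits of each Char,
-- so U+8B1D becomes the single byte 0x1D.
truncated :: ByteString
truncated = "\x8B1D"

main :: IO ()
main = do
  print (viaLiteral == viaFromString)
  print (BS.unpack truncated)
```

Whether GHC turns the packing into a static constant depends on the bytestring version and optimization level, so this only illustrates the semantics, not the generated code.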

Re: Minimizing cascading rebuilds

Theodore Lief Gannon
In reply to this post by joshchia
(sorry for duplicate, failed to reply to list)

Haskell is going to be unwieldy if you try to code in a non-functional style! Your data, however static, should be passed as an argument to functions that need it. In this case, "as an argument" should probably be read as "with the Reader monad."

If you want to avoid loading via IO, you can still put your data set in a separate package. It will depend on packageA to define the data type, but nothing but your final executable has to depend on it -- in fact, it can probably live in the executable's source tree. Put all of your logic that depends on that data (directly or indirectly) in a Reader, and invoke it at the top level.

Once you've got that down, it's very much worth reading up on transformers and mtl. I'd also suggest looking at the Rio[1] prelude (currently not quite to a stable release, but will be inside the month) which is built around encouraging current best practices. Once you know the tools, this style becomes convenient, extremely versatile, and helps avoid unnecessary dependencies all over the place.
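A minimal sketch of the Reader approach described above, assuming the mtl package is available. The names `Env`, `envData`, `entryCount`, `hasEntry`, `report`, and `hardCodedData` are hypothetical. In a real project the logic would live in library packages, and only the executable would define `hardCodedData` and call `runReader`, so editing the data would touch nothing but the executable.

```haskell
import Control.Monad.Reader (Reader, asks, runReader)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC

-- Environment carrying the data; library code depends only on this type.
newtype Env = Env { envData :: ByteString }

-- Library logic: pure, but parameterised over the data via Reader.
entryCount :: Reader Env Int
entryCount = asks (length . BC.lines . envData)

hasEntry :: ByteString -> Reader Env Bool
hasEntry key = asks (elem key . BC.lines . envData)

report :: Reader Env String
report = do
  n <- entryCount
  found <- hasEntry (BC.pack "beta")
  pure (show n ++ " entries; beta present: " ++ show found)

-- Executable side: the only place that mentions the concrete data.
hardCodedData :: ByteString
hardCodedData = BC.pack "alpha\nbeta\ngamma\n"

main :: IO ()
main = putStrLn (runReader report (Env hardCodedData))
```

No IO is needed to obtain the data here; the functions stay pure and simply receive it at the top level.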


Re: Minimizing cascading rebuilds

Vo Minh Thu
Hi,

Maybe switching to another linker can help, e.g. gold or the LLVM one.

Cheers,
Thu

Re: Minimizing cascading rebuilds

Rahul Muttineni
In reply to this post by joshchia
Hi Josh,

I just tried a quick experiment with stack resolver lts-11.2 and I'd like to share the results as they are interesting:

1. Consider the following module setup that's a simplified version of your situation.
Dependencies:
- Main depends on Hi and Hee
- Hi depends on Hum
- Hee depends on Hum

Main.hs:
```
module Main where

import Hi
import Hee

main :: IO ()
main = print $ hi ++ hee ++ "!"
```

Hee.hs:
```
module Hee (hee) where

import Hum (hum)

hee :: String
hee = "hee1" ++ hum
```

Hi.hs:
```
module Hi (hi) where

import Hum (hum)

hi :: String
hi = "hi1" ++ hum
```

Hum.hs:
```
module Hum (hum) where

hum :: String
hum = "hum"
```

2. Now build it once with `stack build`.
3. Now change "hum" to "hum1" and run `stack build`. Notice that all 4 modules will recompile.
4. Now add {-# NOINLINE hum #-} just above hum :: String and run `stack build`.
5. Change hum again and run `stack build`.
6. Only Hum will recompile!

Lesson: Add NOINLINE to any function/value that you change frequently and don't want to trigger massive recompilations. This does come with a performance tradeoff, since GHC will not be able to inline whatever you added that pragma to, but your compile time will be saved. In your case of hard-coded data, I think you won't be able to measure any performance penalty.
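Applied to a ByteString value like the one in the original question, the pragma placement might look like the sketch below. The names `hardCodedData` and `entryCount` are hypothetical, and everything is in one file only to keep the example self-contained; in practice the definition would sit in a module of packageA. Whether dependents are really skipped will depend on the GHC version and flags.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC

-- NOINLINE asks GHC not to expose this definition's body (its unfolding)
-- to importing modules, so edits to the data should leave the
-- module's interface unchanged.
{-# NOINLINE hardCodedData #-}
hardCodedData :: ByteString
hardCodedData = "alpha\nbeta\n"

-- Functions over the data refer to it by name only.
entryCount :: Int
entryCount = length (BC.lines hardCodedData)

main :: IO ()
main = print entryCount
```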

Hope that helps,
Rahul


Re: Minimizing cascading rebuilds

arrowdodger
In reply to this post by joshchia


Since you mentioned C++, why not just use the preprocessor? For "developer" builds you can use a runtime file, and for "release" builds an embedded file.


Re: Minimizing cascading rebuilds

joshchia
In reply to this post by Rahul Muttineni
Rahul's idea works within a package to prevent cascading builds of modules, but when the base package needs to be rebuilt, it is unregistered first as are its direct and indirect dependents, so unfortunately the idea works on an intra-package level but not an inter-package level.

Regarding Gleb's comment, isn't this CPP distinction between embedded and external file going to affect a lot of code including their function signatures? (One version takes a filename for the data file and does IO and another version doesn't) Sounds hard to maintain.

Re: Minimizing cascading rebuilds

arrowdodger


Well, you can wrap your data in IO:

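{-# LANGUAGE CPP, TemplateHaskell #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.FileEmbed (embedFile)  -- from the file-embed package

#ifdef DEVEL_BUILD
-- Developer builds: read the file at run time, so editing it triggers no rebuild.
constantData :: IO ByteString
constantData = BS.readFile "blabla"
#else
-- Release builds: embed the file contents at compile time.
constantData' :: ByteString
constantData' = $(embedFile "blabla")

constantData :: IO ByteString
constantData = return constantData'
#endif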
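On the worry about function signatures: one way this could be kept manageable, sketched here with hypothetical names, is to leave every function that uses the data pure and have it take the ByteString as an argument, so only the entry point runs `constantData`. The `constantData` below is a stand-in for whichever CPP branch is active, and `countEntries` is made up for illustration.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC

-- Stand-in for either CPP branch; both share the type IO ByteString.
constantData :: IO ByteString
constantData = pure (BC.pack "alpha\nbeta\n")

-- Pure logic: the signature is the same in developer and release builds.
countEntries :: ByteString -> Int
countEntries = length . BC.lines

main :: IO ()
main = do
  d <- constantData        -- the only place where IO is involved
  print (countEntries d)
```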


Re: Minimizing cascading rebuilds

Alex Rozenshteyn
Might this be something Backpack can help with? Write a signature (an interface file) for your data library and only link the actual implementation in at your top-level application? This doesn't help the total build time, but it helps tighten the feedback loop for the intermediate packages.

This is tangentially related to Theodore's suggestion in that your data doesn't get combined in until late in the build.

On Thu, Mar 29, 2018 at 10:59 AM Gleb Popov <[hidden email]> wrote:
On Thu, Mar 29, 2018 at 7:31 PM, ☂Josh Chia (謝任中) <[hidden email]> wrote:
Rahul's idea works within a package to prevent cascading builds of modules, but when the base package needs to be rebuilt, it is unregistered first as are its direct and indirect dependents, so unfortunately the idea works on an intra-package level but not an inter-package level.

Regarding Gleb's comment, isn't this CPP distinction between embedded and external file going to affect a lot of code including their function signatures? (One version takes a filename for the data file and does IO and another version doesn't) Sounds hard to maintain.

Well, you can wrap your data in IO:

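{-# LANGUAGE CPP, TemplateHaskell #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.FileEmbed (embedFile)  -- from the file-embed package

#ifdef DEVEL_BUILD
-- The contents of "blabla" are baked in at compile time.
constantData' :: ByteString
constantData' = $(embedFile "blabla")

constantData :: IO ByteString
constantData = return constantData'
#else
-- The contents of "blabla" are read at run time.
constantData :: IO ByteString
constantData = B.readFile "blabla"
#endif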
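
A minimal sketch of how the rest of the code could stay pure with this approach, assuming the data is loaded once near start-up. The names `hasEntry` and `withConstantData` are hypothetical and not from the thread; the idea is only that functions take the ByteString as an argument, so their signatures are the same whether `constantData` embeds the file or reads it.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Pure consumer (hypothetical): it receives the data as an argument, so its
-- type does not depend on how the bytes were obtained.
hasEntry :: BC.ByteString -> BC.ByteString -> Bool
hasEntry table key = key `elem` BC.lines table

-- The only IO-aware piece (hypothetical): run a pure function on whatever
-- the loader yields. The loader would be `constantData` from the snippet above.
withConstantData :: IO BC.ByteString -> (BC.ByteString -> a) -> IO a
withConstantData load use = fmap use load
```

With this shape, only the call site that invokes `withConstantData constantData ...` lives in IO; everything below it keeps a single, pure signature in both build modes.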


On Thu, Mar 29, 2018 at 2:14 PM, Rahul Muttineni <[hidden email]> wrote:
Hi Josh,

I just tried a quick experiment with stack resolver lts-11.2 and I'd like to share the results as they are interesting:

1. Consider the following module setup, a simplified version of your situation.
Dependencies:
- Main depends on Hi and Hee
- Hi depends on Hum
- Hee depends on Hum

Main.hs:
```
module Main where

import Hi
import Hee

main :: IO ()
main = print $ hi ++ hee ++ "!"
```

Hee.hs:
```
module Hee (hee) where

import Hum (hum)

hee :: String
hee = "hee1" ++ hum
```

Hi.hs:
```
module Hi (hi) where

import Hum (hum)

hi :: String
hi = "hi1" ++ hum
```

Hum.hs:
```
module Hum (hum) where

hum :: String
hum = "hum"
```

2. Now build it once with `stack build`.
3. Now change "hum" to "hum1" and run `stack build`. Notice that all 4 modules recompile.
4. Now add {-# NOINLINE hum #-} just above hum :: String and run `stack build`.
5. Change hum again and run `stack build`.
6. Only Hum will recompile!

Lesson: Add NOINLINE to any function or value that you change frequently and don't want to trigger massive recompilations. This comes with a performance trade-off, since GHC will not be able to inline whatever you attach the pragma to, but it will save you compile time. In your case of hard-coded data, I don't think you will be able to measure any performance penalty.
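
Applied to the original question, the pragma placement might look like the sketch below. The names `hardCodedData` and `entryCount` are hypothetical stand-ins, and the literal is placeholder data; only the position of the pragma relative to the type signature follows the experiment above.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical stand-in for the frequently edited hard-coded data.
-- NOINLINE asks GHC not to inline this binding into its use sites, which is
-- what kept the dependent modules from recompiling in the experiment.
{-# NOINLINE hardCodedData #-}
hardCodedData :: BC.ByteString
hardCodedData = BC.pack "entry1\nentry2\nentry3"

-- A pure function that uses the data; no IO appears in its type.
entryCount :: Int
entryCount = length (BC.lines hardCodedData)
```

Appending another entry to `hardCodedData` changes only the body of that one binding, which is the kind of edit step 5 makes to `hum`.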

Hope that helps,
Rahul






--
Rahul Muttineni

