Suggestion for resolving the Cabal/GHC dependency problems


Duncan Coutts-4
All,

I was discussing this with Yuri earlier and I had an idea that I think
may resolve our problems.

Firstly, what are the problems:

     1. ghc devs and users grumble because the ghc library depends on
        Cabal, making it hard to use the ghc lib with a later Cabal.
     2. ghc devs grumble generally that Cabal seems quite big but they
        only need small parts of it
     3. Cabal devs complain that they cannot add useful dependencies
        (like a parser with error messages) because ghc depends on
        Cabal.

Secondly, let us recall why it is that ghc does use Cabal, and where:

     1. it's used by ghc-pkg to read/write the external representation
        of installed package files (external rep is defined by Cabal
        spec, and implemented in the Cabal lib)
     2. it's used by ghc to read the ghc package database files/dirs.
        These databases use the same external representation, and ghc &
        ghc-pkg use the InstalledPackageInfo type internally
        (InstalledPackageInfo is defined in the Cabal lib).
     3. it's used by the ghc build system to help with building all the
        libraries that ship with ghc. I believe that this part uses more
        of the build system part of Cabal, not just the types and
        external formats.
     4. ghc comes with Cabal pre-installed so that users can run
        Setup.hs scripts to install other packages. This was part of the
        original Cabal design: that all compilers would use the
        installed package info format defined by Cabal, and all
        compilers would ship Cabal to users so the Setup.hs mechanism
        will work.

Now, as far as I know, nobody is suggesting that ghc stop shipping
Cabal, nor that it stop using it as part of the build system.

The problems all centre around use number 2, where the ghc library
package depends on Cabal. Number 1 isn't really a problem because
ghc-pkg is an executable.

So my suggestion is quite simple: eliminate the dependency in case 2
above, but keep it in the other three cases. Specifically:

      * ghc will use a new internal type to represent info coming from
        the ghc-pkg databases, ie not InstalledPackageInfo. This can be
        smaller as ghc doesn't care about the metadata.
      * The InstalledPackageInfo and the current need for ghc to read
        its external representation is the main reason the ghc lib
        depends on Cabal. Other dependencies should be minor and easy to
        remove.
      * ghc and ghc-pkg will agree on a new on-disk representation of
        the installed package info.
      * ghc-pkg will continue to depend on Cabal and will continue to
        use the types and parsers defined by Cabal to read/write the
        InstalledPackageInfo. It will translate from
        InstalledPackageInfo into the on-disk representation that ghc &
        ghc-pkg share.
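
To make the points above concrete, here is a minimal sketch of the translation step. It is only an illustration under stated assumptions: FullPackageInfo is a hypothetical stand-in for Cabal's InstalledPackageInfo (the real type has many more fields and richer field types), and GhcPackageInfo, its fields, and toGhcPackageInfo are invented names, not anything that exists in ghc or Cabal.

```haskell
-- Hypothetical stand-in for Cabal's InstalledPackageInfo. Only a few
-- fields are modelled here, and with simplified types.
data FullPackageInfo = FullPackageInfo
  { fullName           :: String
  , fullVersion        :: [Int]
  , fullLicense        :: String     -- metadata the compiler does not need
  , fullMaintainer     :: String     -- metadata the compiler does not need
  , fullDescription    :: String     -- metadata the compiler does not need
  , fullExposedModules :: [String]
  , fullLibraryDirs    :: [FilePath]
  , fullDepends        :: [String]
  } deriving (Show, Eq)

-- Hypothetical smaller record: just what the compiler would need in
-- order to find and link a package.
data GhcPackageInfo = GhcPackageInfo
  { pkgName           :: String
  , pkgVersion        :: [Int]
  , pkgExposedModules :: [String]
  , pkgLibraryDirs    :: [FilePath]
  , pkgDepends        :: [String]
  } deriving (Show, Eq)

-- The translation ghc-pkg would perform: keep the fields the compiler
-- uses and drop the descriptive metadata.
toGhcPackageInfo :: FullPackageInfo -> GhcPackageInfo
toGhcPackageInfo full = GhcPackageInfo
  { pkgName           = fullName full
  , pkgVersion        = fullVersion full
  , pkgExposedModules = fullExposedModules full
  , pkgLibraryDirs    = fullLibraryDirs full
  , pkgDepends        = fullDepends full
  }
```

In this arrangement only the code on the ghc-pkg side would ever mention the full type; the compiler side would see GhcPackageInfo alone.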

So what might the on-disk representation for the ghc-pkg databases look
like? Currently they use the external format of InstalledPackageInfo
because this is convenient using Cabal.

One simple option is just to store both formats for all packages.
Another option would be that ghc never reads package dbs where the cache
is out of date. Then it only ever reads the cache and never has to look
at the other files. In principle the cache should never be out of date:
there are two ways to update the db: calling ghc-pkg, or placing the
file in the db directly and calling ghc-pkg recache (distros often use the
latter as it is simpler for them). In either case the db cache will be
up to date. (In fact calling it a cache is not really correct.)
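
The "never read a stale cache" option could be sketched roughly as follows. This is an assumption-laden illustration, not how ghc or ghc-pkg actually decide freshness: the ModTime representation, the CacheStatus type, and the per-file comparison in cacheStatus are all hypothetical.

```haskell
-- Modification times as seconds since the epoch (an assumption made
-- to keep the sketch free of extra dependencies).
type ModTime = Integer

-- Result of checking a package db: either the cache can be trusted, or
-- we report which package files are newer than it.
data CacheStatus
  = CacheFresh
  | CacheStale [FilePath]
  deriving (Show, Eq)

-- Compare the cache's modification time against every package file in
-- the db. A file modified after the cache was written makes it stale.
cacheStatus :: ModTime -> [(FilePath, ModTime)] -> CacheStatus
cacheStatus cacheTime entries =
  case [ path | (path, modTime) <- entries, modTime > cacheTime ] of
    []    -> CacheFresh
    stale -> CacheStale stale

-- Under this policy the compiler refuses a stale db instead of falling
-- back to parsing the individual package files.
checkDb :: ModTime -> [(FilePath, ModTime)] -> Either String ()
checkDb cacheTime entries =
  case cacheStatus cacheTime entries of
    CacheFresh       -> Right ()
    CacheStale paths ->
      Left ("package cache out of date; run ghc-pkg recache (newer: "
            ++ unwords paths ++ ")")
```

The point of the design is that the failure path only produces an error message, so the compiler never needs a parser for the per-package files.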

So this is a better solution than the one previously proposed to split
out some small part of Cabal, because in this proposal, ghc doesn't
depend on Cabal at all, not even some smaller common lib.

It's also better from the point of view of the Cabal folks because it
does not involve splitting Cabal in unnatural ways. The Cabal folks do
want to split the Cabal lib, but not in a way that is especially helpful
to ghc. This suggestion is orthogonal to any Cabal lib splits.

Further, if only ghc-pkg and the ghc build system depend on Cabal, then
it is easier for Cabal to add more dependencies, since they do not have
to be installed with ghc (due to the ghc lib depending on them). In
particular the Cabal folks would like to use a proper parser and have
suggested adding dependencies on parsec, mtl and transformers. If only
ghc-pkg depends on Cabal, then these dependencies only need to be used
at build time, and do not have to be installed (which also means they
don't have to be kept quite so up to date).


Note that this would not address SPJ's complaint that the start of
building ghc involves building 60+ modules of Cabal. The ghc-cabal tool
still uses Cabal and I am not suggesting changing that now. It's
plausible that, when the Cabal lib is split, the ghc-cabal tool could
depend on just the smaller of the two (someone would need to look at how
much functionality from the "Simple" build system it uses). I don't see
that this is a big priority however.

Duncan




Duncan Coutts-4
On Wed, 2013-09-11 at 17:28 +0100, Duncan Coutts wrote:

> Further, if only ghc-pkg and the ghc build system depend on Cabal, then
> it is easier for Cabal to add more dependencies, since they do not have
> to be installed with ghc (due to the ghc lib depending on them). In
> particular the Cabal folks would like to use a proper parser and have
> suggested adding dependencies on parsec, mtl and transformers. If only
> ghc-pkg depends on Cabal, then these dependencies only need to be used
> at build time, and do not have to be installed (which also means they
> don't have to be kept quite so up to date).

Actually, this is not quite right. Since ghc would still ship Cabal (but
not depend on it), it would also ship its dependencies including parsec,
mtl and transformers. So they would need to be up to date and installed,
it's just that ghc itself would not depend on them.

If that's really inconvenient, it's plausible to have a minimal set
which is just the things ghc depends on, so long as what gets shipped to
users is the useful set, including Cabal.

Duncan




Johan Tibell-2
On Wed, Sep 11, 2013 at 12:19 PM, Duncan Coutts <
duncan.coutts at googlemail.com> wrote:

> Actually, this is not quite right. Since ghc would still ship Cabal (but
>  not depend on it), it would also ship its dependencies including parsec,
> mtl and transformers. So they would need to be up to date and installed,
> it's just that ghc itself would not depend on them.
>
> If that's really inconvenient, it's plausible to have a minimal set
> which is just the things ghc depends on, so long as what gets shipped to
> users is the useful set, including Cabal.


I don't quite like how GHC's dependencies leak out to the rest of the
world. It makes it impossible for us to decide what version we want to ship
in the platform of those libraries. I guess we don't have a good technical
solution to this problem though.

Carter Schonwald
Wasn't there an effort to have a mini private variant of attoparsec for the
parser combinator deps?


On Wed, Sep 11, 2013 at 4:03 PM, Johan Tibell <johan.tibell at gmail.com> wrote:

> On Wed, Sep 11, 2013 at 12:19 PM, Duncan Coutts <
> duncan.coutts at googlemail.com> wrote:
>
>> Actually, this is not quite right. Since ghc would still ship Cabal (but
>>  not depend on it), it would also ship its dependencies including parsec,
>> mtl and transformers. So they would need to be up to date and installed,
>> it's just that ghc itself would not depend on them.
>>
>> If that's really inconvenient, it's plausible to have a minimal set
>> which is just the things ghc depends on, so long as what gets shipped to
>> users is the useful set, including Cabal.
>
>
> I don't quite like how GHC's dependencies leak out to the rest of the
> world. It makes it impossible for us to decide what version we want to ship
> in the platform of those libraries. I guess we don't have a good technical
> solution to this problem though.

Simon Marlow-7
In reply to this post by Duncan Coutts-4
On 11/09/13 17:28, Duncan Coutts wrote:

> All,
>
> I was discussing this with Yuri earlier and I had an idea that I think
> may resolve our problems.
>
> Firstly, what are the problems:
>
>       1. ghc devs and users grumble because the ghc library depends on
>          Cabal, making it hard to use the ghc lib with a later Cabal.
>       2. ghc devs grumble generally that Cabal seems quite big but they
>          only need small parts of it
>       3. Cabal devs complain that they cannot add useful dependencies
>          (like a parser with error messages) because ghc depends on
>          Cabal.
>
> Secondly, let us recall why it is that ghc does use Cabal, and where:
>
>       1. it's used by ghc-pkg to read/write the external representation
>          of installed package files (external rep is defined by Cabal
>          spec, and implemented in the Cabal lib)
>       2. it's used by ghc to read the ghc package database files/dirs.
>          These databases use the same external representation, and ghc &
>          ghc-pkg use the InstalledPackageInfo type internally
>          (InstalledPackageInfo is defined in the Cabal lib).
>       3. it's used by the ghc build system to help with building all the
>          libraries that ship with ghc. I believe that this part uses more
>          of the build system part of Cabal, not just the types and
>          external formats.
>       4. ghc comes with Cabal pre-installed so that users can run
>          Setup.hs scripts to install other packages. This was part of the
>          original Cabal design: that all compilers would use the
>          installed package info format defined by Cabal, and all
>          compilers would ship Cabal to users so the Setup.hs mechanism
>          will work.
>
> Now, as far as I know, nobody is suggesting that ghc stop shipping
> Cabal, nor that it stop using it as part of the build system.
>
> The problems all centre around use number 2, where the ghc library
> package depends on Cabal. Number 1 isn't really a problem because
> ghc-pkg is an executable.
>
> So my suggestion is quite simple, eliminate the dependency in case 2
> above, but keep it in the other three cases. Specifically:
>
>        * ghc will use a new internal type to represent info coming from
>          the ghc-pkg databases, ie not InstalledPackageInfo. This can be
>          smaller as ghc doesn't care about the metadata.
>        * The InstalledPackageInfo and the current need for ghc to read
>          its external representation is the main reason the ghc lib
>          depends on Cabal. Other dependencies should be minor and easy to
>          remove.
>        * ghc and ghc-pkg will agree on a new on-disk representation of
>          the installed package info.
>        * ghc-pkg will continue to depend on Cabal, it will continue to
>          use the types and parsers defined by Cabal to read/write the
>          InstalledPackageInfo. It will translate from
>          InstalledPackageInfo into the on-disk representation that ghc &
>          ghc-pkg share.
>
> So what might the on-disk representation for the ghc-pkg databases look
> like? Currently they use the external format of InstalledPackageInfo
> because this is convenient using Cabal.
>
> One simple option is just to store both formats for all packages.
> Another option would be that ghc never reads package dbs where the cache
> is out of date. Then it only ever reads the cache and never has to look
> at the other files. In principle the cache should never be out of date:
> there are two options for updating the db, calling ghc-pkg, or putting
> the file directly and calling ghc-pkg recache (distros often use the
> latter as it is simpler for them). In either case the db cache will be
> up to date. (In fact calling it a cache is not really correct.)

GHC currently always reads the binary cache, even if it is out of date
(I just checked).  However, it still also supports the legacy format of
package databases using the Read instance of InstalledPackageInfo.  I'm
not sure whether this is still used at all.

We certainly could make another type similar to InstalledPackageInfo,
derive Binary for it, and use that as the package database format.  I
think you're right that it's probably easier to do this than to split
out InstalledPackageInfo from Cabal.  We would need to make a small
package for this that would be shared by ghc-pkg and GHC.
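
As a rough sketch of what such a shared package might contain, assuming the binary library's Generic-based instances: the PackageRecord type, its fields, and the serialiseDb/deserialiseDb names are hypothetical, chosen only for illustration.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decodeOrFail, encode)
import qualified Data.ByteString.Lazy as BL
import GHC.Generics (Generic)

-- Hypothetical record, similar in spirit to InstalledPackageInfo but
-- holding only a handful of fields.
data PackageRecord = PackageRecord
  { recName           :: String
  , recVersion        :: [Int]
  , recExposedModules :: [String]
  , recDepends        :: [String]
  } deriving (Show, Eq, Generic)

-- An empty instance body picks up the Generic default implementation,
-- so no hand-written put/get is needed.
instance Binary PackageRecord

-- What ghc-pkg would write to the db.
serialiseDb :: [PackageRecord] -> BL.ByteString
serialiseDb = encode

-- What the compiler would read back. decodeOrFail reports malformed
-- input as a Left value instead of throwing.
deserialiseDb :: BL.ByteString -> Either String [PackageRecord]
deserialiseDb bytes =
  case decodeOrFail bytes of
    Left (_, _, err)      -> Left err
    Right (_, _, records) -> Right records
```

Both sides would import this one module, so the on-disk format is whatever the derived instance produces and neither side needs Cabal to read it.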

Cheers,
        Simon


> So this is a better solution than the one previously proposed to split
> out some small part of Cabal, because in this proposal, ghc doesn't
> depend on Cabal at all, not even some smaller common lib.
>
> It's also better from the point of view of the Cabal folks because it
> does not involve splitting Cabal in unnatural ways. The Cabal folks do
> want to split the Cabal lib, but not in a way that is especially helpful
> to ghc. This suggestion is orthogonal to any Cabal lib splits.
>
> Further, if only ghc-pkg and the ghc build system depend on Cabal, then
> it is easier for Cabal to add more dependencies, since they do not have
> to be installed with ghc (due to the ghc lib depending on them). In
> particular the Cabal folks would like to use a proper parser and have
> suggested adding dependencies on parsec, mtl and transformers. If only
> ghc-pkg depends on Cabal, then these dependencies only need to be used
> at build time, and do not have to be installed (which also means they
> don't have to be kept quite so up to date).
>
>
> Note that this would not address SPJ's complaint that the start of
> building ghc involves building 60+ modules of Cabal. The ghc-cabal tool
> still uses Cabal and I am not suggesting changing that now. It's
> plausible that when the Cabal lib is split that the ghc-cabal tool could
> depend on just the smaller of the two (someone would need to look at how
> much functionality from the "Simple" build system it uses). I don't see
> that this is a big priority however.
>
> Duncan