Advance notice that I'd like to make Cabal depend on parsec


Duncan Coutts-4
Hi folks,

I want to give you advance notice that I would like to make Cabal depend
on parsec. The implication is that GHC would therefore depend on parsec
and thus it would become a core package, rather than just a HP package.
So this would affect both GHC and the HP, though I hope not too much.

The rationale is that Cabal needs to parse things, like .cabal files and
currently we do not have a decent parser in the core libraries. By
decent I mean one that can produce error messages with source locations
and that doesn't have unpredictable memory use. The only parser in the
core libraries at the moment is Text.ParserCombinators.ReadP from the
base package and that fails my "decent" criteria on both counts. Its
idea of an error message is (), and on some largish .cabal files we take
100s of MB to parse (I realise that the ReadP in the base package is a
cutdown version so I don't mean to malign all ReadP-style libs out
there).
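
To illustrate the "error message is ()" point, here is a minimal sketch using the ReadP from base. The `field` parser and its "name: value" syntax are a toy made up for this example, not the real .cabal grammar or Cabal's actual code.

```haskell
import Text.ParserCombinators.ReadP

-- Characters allowed in a field name in this toy grammar (an assumption
-- for the sketch, not the real .cabal rules).
isNameChar :: Char -> Bool
isNameChar c = c `elem` (['a'..'z'] ++ "-")

-- A toy "name: value" field parser.
field :: ReadP (String, String)
field = do
  name <- munch1 isNameChar
  skipSpaces
  _ <- char ':'
  skipSpaces
  value <- munch1 (/= '\n')
  eof
  return (name, value)

-- ReadP returns a list of (result, remaining input) pairs.
-- Keep only the parses that consumed the whole input.
parseField :: String -> [(String, String)]
parseField input = [r | (r, "") <- readP_to_S field input]
```

A successful parse yields a one-element list. A failed parse, such as `parseField "name foo"`, yields just the empty list: nothing says what went wrong or where.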

Partly due to the performance problem, the terrible .cabal file error
messages, and partly because Doaitse Swierstra keeps asking me if .cabal
files have a grammar, I've been writing a new .cabal parser. It uses an
alex lexer and a parsec parser. It's fast and the error messages are
pretty good. I have reverse engineered a grammar that closely matches
the existing parser and .cabal files in the wild, though I'm not sure
Doaitse will be satisfied with the approach I've taken to handling
layout.
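
As a rough idea of what source locations in errors look like with parsec, here is a small sketch. It is not the new parser described above (that one uses an alex lexer and handles layout); the `field` and `fields` parsers, the "name: value" syntax and the file name "example.cabal" are all assumptions made for the example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One "name: value" line, terminated by a newline (toy grammar).
field :: Parser (String, String)
field = do
  name <- many1 (lower <|> char '-') <?> "field name"
  skipMany (char ' ')
  _ <- char ':'
  skipMany (char ' ')
  value <- many1 (noneOf "\n") <?> "field value"
  _ <- newline
  return (name, value)

-- A whole file is a sequence of fields.
fields :: Parser [(String, String)]
fields = many field <* eof

-- Report the (line, column) of the first parse error, if there is one.
errorLocation :: String -> Maybe (Int, Int)
errorLocation input =
  case parse fields "example.cabal" input of
    Left err -> Just (sourceLine (errorPos err), sourceColumn (errorPos err))
    Right _  -> Nothing
```

For the input "name: foo\nversion 1.0\n" the missing colon is reported at line 2, column 9, and showing the `ParseError` also lists what the parser expected at that point.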

Why did I choose parsec? Practicality dictates that I can only use
things in the core libraries, and the nearest thing we have to that is
the parser lib that is in the HP. I tried to use happy but I could not
construct a grammar/lexer combo to handle the layout (also, happy is not
exactly known for its great error messages).

I've been doing regression testing against hackage and I'm satisfied
that the new parser matches closely enough. I've uncovered all kinds of
horrors with .cabal files in the wild relying on quirks of the old
parser. I've made adjustments for most of them but I will be breaking a
half dozen old packages (most of those don't actually build correctly
because though their syntax errors are not picked up by the parser, they
do cause failure eventually).

So far I've just done the outline parser, not the individual field
parsers. I'll be doing those next and then integrating them. So this
change is still a bit of a ways off, but I thought it'd be useful to
warn people now.

Duncan




Duncan Coutts-3
On Thu, 2013-03-14 at 14:53 +0000, Duncan Coutts wrote:
> Hi folks,
>
> I want to give you advance notice that I would like to make Cabal depend
> on parsec. The implication is that GHC would therefore depend on parsec
> and thus it would become a core package, rather than just a HP package.
> So this would affect both GHC and the HP, though I hope not too much.

It's already been pointed out to me that this also implies the following
dependencies:

text, deepseq, mtl, transformers

deepseq is a core package already I think, though ghc doesn't actually
depend on it currently.

I should also say that I want to make Cabal depend on bytestring and
text too.
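
The way one new dependency drags in others can be sketched as a transitive closure over a dependency graph. The edges in `directDeps` below are assumptions chosen to match the packages named in this thread; they are not read from any package's real .cabal file and the real graph has more entries.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Package = String

-- Direct dependencies (illustrative assumptions only).
directDeps :: Map.Map Package [Package]
directDeps = Map.fromList
  [ ("ghc",    ["Cabal"])
  , ("Cabal",  ["parsec", "bytestring", "text"])
  , ("parsec", ["text", "mtl"])
  , ("text",   ["deepseq", "bytestring"])
  , ("mtl",    ["transformers"])
  ]

-- Every package reachable from the root, excluding the root itself
-- (assuming the graph has no cycle back to it).
closure :: Package -> Set.Set Package
closure root = go Set.empty (deps root)
  where
    deps p = Map.findWithDefault [] p directDeps
    go seen [] = seen
    go seen (p : ps)
      | p `Set.member` seen = go seen ps
      | otherwise           = go (Set.insert p seen) (deps p ++ ps)
```

With these assumed edges, `closure "ghc"` contains parsec, text, deepseq, mtl and transformers as well as Cabal and bytestring, which is the knock-on effect described above.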

--
Duncan Coutts, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/




Gregory Collins-3
In reply to this post by Duncan Coutts-4
On Thu, Mar 14, 2013 at 3:53 PM, Duncan Coutts <duncan.coutts at googlemail.com> wrote:

> Hi folks,
>
> I want to give you advance notice that I would like to make Cabal depend
> on parsec. The implication is that GHC would therefore depend on parsec
> and thus it would become a core package, rather than just a HP package.
> So this would affect both GHC and the HP, though I hope not too much.


+1 from me, although the amount of potential knock-on work might be
discouraging. The current cabal-install bootstrap process (which is
currently pretty easy and is necessary at times) will get a bunch more deps
as a result of this change, no?

--
Gregory Collins <greg at gregorycollins.net>

Duncan Coutts-4
On Thu, 2013-03-14 at 16:06 +0100, Gregory Collins wrote:

> On Thu, Mar 14, 2013 at 3:53 PM, Duncan Coutts <duncan.coutts at googlemail.com> wrote:
>
> > Hi folks,
> >
> > I want to give you advance notice that I would like to make Cabal depend
> > on parsec. The implication is that GHC would therefore depend on parsec
> > and thus it would become a core package, rather than just a HP package.
> > So this would affect both GHC and the HP, though I hope not too much.
>
>
> +1 from me, although the amount of potential knock-on work might be
> discouraging. The current cabal-install bootstrap process (which is
> currently pretty easy and is necessary at times) will get a bunch more deps
> as a result of this change, no?

Yes it will, but given that we do have a script it's not too bad I
think. And overall I think it's worth it to have the better error
messages, performance and memory use. Do you have any idea how slow it
is to parse all the .cabal files on hackage, and how much memory that
takes? You'd be horrified :-)

Duncan




Administrator
This GHC dependency on Cabal is putting a rather troubling constraint
on Cabal's evolution, which in my opinion is a serious problem. When I
first took a look at the dependencies between GHC and Cabal I found it
a bit strange that GHC would depend on Cabal as I would expect GHC to
be as low in the dependency tree as possible to avoid exactly these
kinds of problems.

These GHC dependencies on Cabal are in fact small (see
http://hackage.haskell.org/trac/ghc/attachment/ticket/7740/ghc-2.png
for a summary) and with a little bit of refactoring it would be
possible to split these dependencies into a very small shared package
with minimal or no further dependencies. This would liberate Cabal to
make the necessary refactoring.

IMHO, the addition of these new dependencies to Cabal should go
together with splitting the GHC-Cabal shared dependencies into a
separate package so that there would be no additional coordination
needed from then on between these two development efforts (except when
dealing with this new package).





Duncan Coutts-4
On Thu, 2013-03-14 at 12:22 -0300, Administrator wrote:
> This GHC dependency on Cabal is putting a rather troubling constraint
> in Cabal's evolution, which in my opinion is a serious problem. When I
> first took a look at the dependencies between GHC and Cabal I found it
> a bit strange that GHC would depend on Cabal as I would expect GHC to
> be as low in the dependency tree as possible to avoid exactly these
> kinds of problems.

The problem is that a compiler is a rather sophisticated application and
so though you'd like it to have minimal deps, it needs to do so much
stuff that it ends up needing lots of deps to support its features.

Things would be easier if that were not the case, and it's made harder
by the fact that ghc is not just a program, but it's exposed as a
library, which exposes all of its dependencies.

> These GHC dependencies on Cabal are in fact small (see
> http://hackage.haskell.org/trac/ghc/attachment/ticket/7740/ghc-2.png
> for a summary) and with a little bit of refactoring it would be
> possible to split these dependencies into a very small shared package
> with minimal or no further dependencies. This would liberate Cabal to
> make the necessary refactoring.

Except that the bits of Cabal that ghc needs are exactly the bits that
will now need parsec, text etc. The shared part would be the part that
defines the InstalledPackageInfo and the parser for that.

Also, though the ghc library has only relatively small dependencies on
Cabal, the ghc build process uses Cabal extensively, and currently the
system is that libraries that ghc needs to build get included as core
libraries and shipped with ghc. That itself could change but it's also
more work.

> IMHO, the addition of these new dependencies to Cabal should go
> together with splitting the GHC-Cabal shared dependencies into a
> separate package so that there would be no additional coordination
> needed from then on between these two development efforts (except when
> dealing with this new package).

So I would consider this if I thought it'd make a difference. In
particular at some point we'll want to split the Cabal lib into the bit
that just defines types and parsers etc, and the part that is a build
system. But even that wouldn't save us any dependencies in this
situation.

Duncan




Simon Peyton Jones
In reply to this post by Administrator
Yes I think that'd be a great plan.  It's bizarre that GHC depends on *all* of Cabal, but only uses a tiny part of it (more or less the Package data type I think).

Simon

|  -----Original Message-----
|  From: cabal-devel-bounces at haskell.org [mailto:cabal-devel-bounces at haskell.org]
|  On Behalf Of Administrator
|  Sent: 14 March 2013 15:23
|  To: Duncan Coutts
|  Cc: Lentczner; cabal-devel; Haskell Libraries; ghc-devs at haskell.org
|  Subject: Re: Advance notice that I'd like to make Cabal depend on parsec
|  [..]



Duncan Coutts-4
On Thu, 2013-03-14 at 16:44 +0000, Simon Peyton-Jones wrote:
> Yes I think that'd be a great plan.  It's bizarre that GHC depends on
> *all* of Cabal, but only uses a tiny part of it (more or less the
> Package data type I think).

The sensible way to split it (I think) would be like this:

cabal-lib:
  Distribution.*
  -- containing definitions of types and parsers & pretty printers
  -- including the InstalledPackageInfo

cabal-build-simple
  Distribution.Simple.*
  -- the build system for "Simple" packages

cabal
  -- the program, what is currently called cabal-install

And then the ghc package would only depend on the cabal-lib package. But
it's that package that is going to use bytestring, text, parsec etc, for
its type definitions and parser.

The InstalledPackageInfo and its parser is what ghc and ghc-pkg
primarily use (though there's the opportunity to share code for handling
package indexes) and that type and that parser are also going to end up
using text and parsec etc.

It'd be possible to split things out further and have
InstalledPackageInfo and the types it uses and a special parser just for
that with fewer dependencies, but I'm not sure that's really worth it
and it would duplicate things (the types and/or parsers shared by
InstalledPackageInfo and the source package description).

So all in all, the split I suggest above makes sense for its own reasons
but it wouldn't help ghc here, and a further split just to help ghc
would be rather annoying.

Duncan







Roman Cheplyaka-2
* Duncan Coutts <duncan.coutts at googlemail.com> [2013-03-14 17:12:14+0000]
> The InstalledPackageInfo and its parser is what ghc and ghc-pkg
> primarily use (though there's the opportunity to share code for handling
> package indexes) and that type and that parser are also going to end up
> using text and parsec etc.

Correct me if I'm wrong, but isn't it just a strange coincidence that
InstalledPackageInfo is serialised in the format similar to .cabal
format?

InstalledPackageInfos aren't supposed to be edited by hand and do not
need good error reporting. They can be serialized using any
serialization library.

(Then again, "any serialization library" like aeson would probably bring
more dependencies than you're considering...)

Roman



Duncan Coutts-4
On Thu, 2013-03-14 at 21:29 +0200, Roman Cheplyaka wrote:
> * Duncan Coutts <duncan.coutts at googlemail.com> [2013-03-14 17:12:14+0000]
> > The InstalledPackageInfo and its parser is what ghc and ghc-pkg
> > primarily use (though there's the opportunity to share code for handling
> > package indexes) and that type and that parser are also going to end up
> > using text and parsec etc.
>
> Correct me if I'm wrong, but isn't it just a strange coincidence that
> InstalledPackageInfo is serialised in the format similar to .cabal
> format?

It's not a very strange coincidence. The type is not specific to ghc,
it's defined in a compiler-neutral way by the original Cabal spec. Since
both the source package and installed package info were defined in the
Cabal spec, using the same kind of external syntax and sharing many of
the same types, they both ended up in the Cabal lib and share the same
parsers & pretty printers.

> InstalledPackageInfos aren't supposed to be edited by hand and do not
> need good error reporting. They can be serialized using any
> serialization library.

Right, it doesn't need good error reporting (though it's nice if it's
fast, which it isn't currently). The main advantage of the current
arrangement is that the source and installed package descriptions get to
share the same types and parser/pretty printer.

I think there's a slightly more general point here though. Why is it
that we don't have any good parser in the core packages? It's not just
Cabal that needs to parse things. We have two useless parsers in the
base package, ReadS and ReadP. Haskell is famous for its parser
combinators and yet our core infrastructure is stuck with only useless
ones!

Duncan




Roman Cheplyaka-2
In reply to this post by Duncan Coutts-4
* Heinrich Apfelmus <apfelmus at quantentunnel.de> [2013-03-15 10:38:37+0100]

> Duncan Coutts wrote:
> >Hi folks,
> >
> >I want to give you advance notice that I would like to make Cabal depend
> >on parsec. The implication is that GHC would therefore depend on parsec
> >and thus it would become a core package, rather than just a HP package.
> >So this would affect both GHC and the HP, though I hope not too much.
> >
> >[..]
> >
> >Why did I choose parsec? Practicality dictates that I can only use
> >things in the core libraries, and the nearest thing we have to that is
> >the parser lib that is in the HP. I tried to use happy but I could not
> >construct a grammar/lexer combo to handle the layout (also, happy is not
> >exactly known for its great error messages).
>
> Reuse is good, but the implication I'm worried about is this: Can I
> upgrade the parsec package installed on my system by doing a user
> install from hackage? Without an implementation of more flexible
> package installations (multiple versions installed simultaneously),
> any dependency of GHC has its version number essentially set into
> stone.

We've had that working for a long time. Right now I even have multiple
installed versions of Cabal-the-library itself.

It's not that Parsec would be automatically linked into each executable.
It's just that ghc-the-program would have Parsec linked into it.

Roman



Roman Cheplyaka-2
* Ivan Lazar Miljenovic <ivan.miljenovic at gmail.com> [2013-03-15 22:12:47+1100]

> On 15 March 2013 22:05, Roman Cheplyaka <roma at ro-che.info> wrote:
> > * Heinrich Apfelmus <apfelmus at quantentunnel.de> [2013-03-15 10:38:37+0100]
> >> Duncan Coutts wrote:
> >> >Hi folks,
> >> >
> >> >I want to give you advance notice that I would like to make Cabal depend
> >> >on parsec. The implication is that GHC would therefore depend on parsec
> >> >and thus it would become a core package, rather than just a HP package.
> >> >So this would affect both GHC and the HP, though I hope not too much.
> >> >
> >> >[..]
> >> >
> >> >Why did I choose parsec? Practicality dictates that I can only use
> >> >things in the core libraries, and the nearest thing we have to that is
> >> >the parser lib that is in the HP. I tried to use happy but I could not
> >> >construct a grammar/lexer combo to handle the layout (also, happy is not
> >> >exactly known for its great error messages).
> >>
> >> Reuse is good, but the implication I'm worried about is this: Can I
> >> upgrade the parsec package installed on my system by doing a user
> >> install from hackage? Without an implementation of more flexible
> >> package installations (multiple versions installed simultaneously),
> >> any dependency of GHC has its version number essentially set into
> >> stone.
> >
> > We've had that working for a long time. Right now I even have multiple
> > installed versions of Cabal-the-library itself.
> >
> > It's not that Parsec would be automatically linked into each executable.
> > It's just that ghc-the-program would have Parsec linked into it.
>
> And ghc-the-library, which means that anything that uses
> ghc-as-a-library (and indeed even Cabal-as-a-library) no longer has a
> choice of which version of parsec they use.

Right. But in this regard, GHC API and Cabal are no different from any
other libraries that suffer from the same issue. (Except that it's hard
to recompile GHC to use an alternative Parsec version.) And these are
not exactly the most popular libraries either, so I doubt this change
will have a large impact.

Roman



Malcolm Wallace-2
In reply to this post by Duncan Coutts-4

On 14 Mar 2013, at 14:53, Duncan Coutts wrote:

> Why did I choose parsec? Practicality dictates that I can only use
> things in the core libraries, and the nearest thing we have to that is
> the parser lib that is in the HP.

I fully agree that a real parser is needed for Cabal files.  I implemented one myself, many years ago, using the polyparse library, and using a hand-written lexer.  Feel free to reuse it (attached, together with a sample program) if you like, although I expect it has bit-rotted a little over time.

Regards,
    Malcolm


-------------- next part --------------
A non-text attachment was scrubbed...
Name: cabal-parse2.hs
Type: application/octet-stream
Size: 1348 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130315/3c1dc0a9/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CabalParse2.hs
Type: application/octet-stream
Size: 16073 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130315/3c1dc0a9/attachment-0001.obj>


Duncan Coutts-4
On Fri, 2013-03-15 at 12:57 +0000, Malcolm Wallace wrote:

> On 14 Mar 2013, at 14:53, Duncan Coutts wrote:
>
> > Why did I choose parsec? Practicality dictates that I can only use
> > things in the core libraries, and the nearest thing we have to that is
> > the parser lib that is in the HP.
>
> I fully agree that a real parser is needed for Cabal files.  I
> implemented one myself, many years ago, using the polyparse library,
> and using a hand-written lexer.  Feel free to reuse it (attached,
> together with a sample program) if you like, although I expect it has
> bit-rotted a little over time.

Thanks Malcolm. I should point out that I would also be perfectly happy
to use polyparse. The practical constraint is that Cabal can only depend
on other Core libs. My assumption was that moving parsec from HP to core
was easier than adding polyparse into core. But if someone wanted to
suggest ripping ReadP out of base and replacing it with polyparse, I
would certainly not complain.

Duncan




Duncan Coutts-4
In reply to this post by Duncan Coutts-4
On Fri, 2013-03-15 at 12:37 +0800, Conrad Parker wrote:

> On 14 March 2013 22:53, Duncan Coutts <duncan.coutts at googlemail.com> wrote:
> >
> > I've been doing regression testing against hackage and I'm satisfied
> > that the new parser matches close enough. I've uncovered all kinds of
> > horrors with .cabal files in the wild relying on quirks of the old
> > parser. I've made adjustments for most of them but I will be breaking a
> > half dozen old packages
>
> When you say you've "made adjustments for" dodgy .cabal files in the
> wild, do you mean that you'll send those maintainers patches that make
> their cabal files less dodgy, or do you mean you've added hacks to
> your parser to reproduce the quirky behaviour?

The latter, but the egregiousness of the hacks is actually not too bad
in the end. I don't find it revolting. For the worst examples I didn't
make adjustments and those ones will break. I think I've made a
reasonable judgement about where to draw the line between the two.

I can look into generating warnings in those cases (which is probably
better than me emailing them).
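
One possible shape for "accept the quirk but warn about it" is a parser that returns its result together with a list of warnings. This is only a sketch: the `Warning` type, the `parseLine` function and the quirk itself (a ';' written where ':' is expected) are invented for illustration and are not among the quirks actually found on hackage.

```haskell
import Data.Char (isSpace)

-- A warning carries the line number and a message (hypothetical type).
data Warning = Warning Int String
  deriving (Show, Eq)

-- Split one line into a (name, value) field. A ';' separator is tolerated
-- but reported as a warning; a line with no separator at all, or with an
-- empty field name, is a hard error.
parseLine :: Int -> String -> Either String ((String, String), [Warning])
parseLine n line =
  case break (`elem` ":;") line of
    (name, sep : rest)
      | null (trim name) -> Left ("line " ++ show n ++ ": empty field name")
      | otherwise ->
          Right ( (trim name, trim rest)
                , [Warning n "';' used instead of ':'" | sep == ';'] )
    (_, []) -> Left ("line " ++ show n ++ ": missing ':'")
  where
    -- Strip whitespace from both ends.
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse
```

The caller can then print the warnings and carry on, while the worst cases still fail with a `Left`.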

Duncan




Mark Lentczner-2
In reply to this post by Duncan Coutts-4
This thread is raising all sorts of questions for me:

Is it essential, or even sensible, that the serialization format GHC needs
for storing package info bear any relation to the human-authored form? If
not, the split out of the package types could be accomplished in a way
where GHC uses simple show/read(P) style serialization for storage of
package info, whereas cabal-lib would use a lovely parsec parser for
humans. I'd like this approach.
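
A minimal sketch of what show/read style storage could look like, using only base. The `InstalledInfo` record and its fields are a cut-down stand-in invented for this example; the real InstalledPackageInfo type has many more fields.

```haskell
import Text.Read (readMaybe)

-- A simplified, hypothetical stand-in for InstalledPackageInfo.
data InstalledInfo = InstalledInfo
  { pkgName        :: String
  , pkgVersion     :: [Int]
  , exposedModules :: [String]
  , pkgDepends     :: [String]
  } deriving (Show, Read, Eq)

-- Storage format is simply the derived Show output.
serialise :: InstalledInfo -> String
serialise = show

-- Reading back uses the derived Read instance; failure is just Nothing.
deserialise :: String -> Maybe InstalledInfo
deserialise = readMaybe

example :: InstalledInfo
example = InstalledInfo
  { pkgName        = "parsec"
  , pkgVersion     = [3, 1, 3]
  , exposedModules = ["Text.Parsec"]
  , pkgDepends     = ["base", "mtl", "text"]
  }
```

The round trip `deserialise (serialise example)` gives back `Just example`. The trade-off is that a corrupt entry yields only `Nothing` with no location, which matters less for a machine-written file than for a hand-edited .cabal file.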

The issue of putting yet one more HP package into GHC's core packages
is increasing the exposure of the difficulty of the current GHC/HP
relationship. See also the threads on the HP mailing list about why we
can't bump some packages in GHC's core set for the next HP release. The
split arrangement is strange because we have two groups making up what is
in the HP, but they have different processes and aims. The complex technical
relationship between the moving parts only heightens the difficulty.

Perhaps the major cause is that because GHC is shipped as a library itself,
it exposes all its package dependencies. And as it is a large, and
growing, piece of software, the list only wants to grow. But I wonder how
often GHC is used as a library itself? If not often, then perhaps GHC
should be shipped as two parts: Just a compiler (plus the small number of
packages that the compiler forces), and ghc-lib as an optional,
even separate, package, perhaps even one with a traditional way of
depending on other packages. In other words, users that wanted to
incorporate the ghc-lib into their programs would depend on, download,
configure, and build ghc-lib independently of the GHC binaries installed on
their system. Perhaps then, GHC, the compiler, built from ghc-lib, would be
bootstrapped not from the past compiler, but from the past HP.....

Okay, perhaps that is all just fantasy. But, no other programming system
operates the way we do. They all fall into one of two camps:

   - The dominant implementation is maintained, built, and shipped along
   with a large collection of "common packages". Examples: Python, Ruby, PHP,
   Java.
   - The dominant implementation is shipped as a bare tool, and large
   common libraries are maintained and shipped independently. Examples: C++
   (think g++ and boost), JavaScript (think browsers, and jQuery).

We are in the middle and, I think, experiencing growing pains because of it.

- Mark


On Sat, Mar 16, 2013 at 3:42 PM, dag.odenhall at gmail.com <dag.odenhall at gmail.com> wrote:

> I'd love to have a proper parser and source-location-aware AST for sake of
> editor/IDE tools, so +1 from me. If you don't end up doing this after all,
> I'd still like to see your parser in a separate package, although I
> understand if you don't feel like maintaining two parsers especially given
> the tedious process for verifying they work similarly. I guess it could
> still be useful in the same way we find haskell-src-exts useful despite
> some incompatibilities with GHC.
>
>
> On Thu, Mar 14, 2013 at 3:53 PM, Duncan Coutts <
> duncan.coutts at googlemail.com> wrote:
>
>> Hi folks,
>>
>> I want to give you advance notice that I would like to make Cabal depend
>> on parsec. The implication is that GHC would therefore depend on parsec
>> and thus it would become a core package, rather than just a HP package.
>> So this would affect both GHC and the HP, though I hope not too much.
>>
>> The rationale is that Cabal needs to parse things, like .cabal files and
>> currently we do not have a decent parser in the core libraries. By
>> decent I mean one that can produce error messages with source locations
>> and that doesn't have unpredictable memory use. The only parser in the
>> core libraries at the moment is Text.ParserCombinators.ReadP from the
>> base package and that fails my "decent" criteria on both counts. Its
>> idea of an error message is (), and on some largish .cabal files we take
>> 100s of MB to parse (I realise that the ReadP in the base package is a
>> cutdown version so I don't mean to malign all ReadP-style libs out
>> there).
>>
>> Partly due to the performance problem, the terrible .cabal file error
>> messages, and partly because Doaitse Swierstra keeps asking me if .cabal
>> files have a grammar, I've been writing a new .cabal parser. It uses an
>> alex lexer and a parsec parser. It's fast and the error messages are
>> pretty good. I have reverse engineered a grammar that closely matches
>> the existing parser and .cabal files in the wild, though I'm not sure
>> Doaitse will be satisfied with the approach I've taken to handling
>> layout.
>>
>> Why did I choose parsec? Practicality dictates that I can only use
>> things in the core libraries, and the nearest thing we have to that is
>> the parser lib that is in the HP. I tried to use happy but I could not
>> construct a grammar/lexer combo to handle the layout (also, happy is not
>> exactly known for its great error messages).
>>
>> I've been doing regression testing against hackage and I'm satisfied
>> that the new parser matches closely enough. I've uncovered all kinds of
>> horrors with .cabal files in the wild relying on quirks of the old
>> parser. I've made adjustments for most of them but I will be breaking a
>> half dozen old packages (most of those don't actually build correctly
>> because though their syntax errors are not picked up by the parser, they
>> do cause failure eventually).
>>
>> So far I've just done the outline parser, not the individual field
>> parsers. I'll be doing those next and then integrate. So this change is
>> still a bit of a ways off, but I thought it'd be useful to warn people
>> now.
>>
>> Duncan

Advance notice that I'd like to make Cabal depend on parsec

Ian Lynagh-2
On Sun, Mar 17, 2013 at 09:57:25AM -0700, Mark Lentczner wrote:
>
> Is it essential, or even sensical, that the serialization format GHC needs
> for storing package info bear any relation to the human-authored form? If
> not, the split out of the package types could be accomplished in a way
> where GHC uses simple show/read(P) style serialization for storage of
> package info, whereas cabal-lib would use a lovely parsec parser for
> humans. I'd like this approach.

I think it would be feasible to stop GHC itself from using the human
readable format. The only place I can think of it being used is in the
package database, but we could use either Read/Show for that, or just
exclusively use the binary format.

It would be a little less user-friendly, but maybe worth it to remove
the ghc library dependencies on most-of-Cabal, mtl and parsec.
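
To make the Read/Show option concrete, here is a minimal sketch of what
such storage could look like. Everything in it is an assumption made for
illustration: PackageInfo is a hypothetical, cut-down record (not the
actual Cabal or ghc-pkg type), and the encode/decode helper names are
made up. It only shows that derived instances from base are enough for a
round trip, with no parser library involved.

```haskell
module Main where

import Text.Read (readMaybe)

-- Hypothetical, heavily simplified stand-in for an installed-package
-- record. The real type used by Cabal/ghc-pkg has many more fields.
data PackageInfo = PackageInfo
  { pkgName    :: String
  , pkgVersion :: [Int]
  , pkgDepends :: [String]
  } deriving (Eq, Show, Read)

-- Serialise using the derived Show instance.
encodePackageInfo :: PackageInfo -> String
encodePackageInfo = show

-- Parse using the derived Read instance. The result is Nothing on any
-- malformed input; no error message or source location is available.
decodePackageInfo :: String -> Maybe PackageInfo
decodePackageInfo = readMaybe

-- Made-up sample data, not a real package description.
example :: PackageInfo
example = PackageInfo
  { pkgName    = "example-pkg"
  , pkgVersion = [1, 0]
  , pkgDepends = ["base"]
  }

main :: IO ()
main = do
  let stored = encodePackageInfo example
  putStrLn stored                                 -- the stored form
  print (decodePackageInfo stored == Just example) -- round-trip check
  print (decodePackageInfo "not a package record") -- malformed input
```

The trade-off is the one discussed above: the stored form stays readable
after a fashion, but a failed parse yields only Nothing, which is
tolerable for machine-written data and poor for files edited by hand.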

> Perhaps the major cause is that because GHC is shipped as a library itself,
> it exposes all its package dependencies.

Yes.

> In other words, users that wanted to
> incorporate the ghc-lib into their programs would depend, and download, and
> configure, and build, ghc-lib independent of the GHC binaries

I think this would create more problems than it solves.

> Okay, perhaps that is all just fantasy. But, no other programming system
> operates the way we do. They all fall into one of two camps:
>
>    - The dominant implementation is maintained, built, and shipped along
>    with a large collection of "common packages". Examples: Python, Ruby, PHP,
>    Java.
>    - The dominant implementation is shipped as a bare tool, and large
>    common libraries are maintained and shipped independently. Examples: C++
>    (think g++ and boost), JavaScript (think browsers, and jQuery).
>
> We are in the middle and, I think, experiencing growing pains because of it.

I would say that we are doing the first option, in the form of the HP.
It's just that the core gets frozen (i.e., ghc + libs gets released)
earlier than the higher level libraries. I don't think that moving
(back) to trying to freeze/release everything all at once would be an
improvement.

You just need to remain strong, and keep saying "no"  :-)
(you're doing a great job, BTW!)


Thanks
Ian



Advance notice that I'd like to make Cabal depend on parsec

Henning Thielemann

On Sun, 17 Mar 2013, Ian Lynagh wrote:

> I think it would be feasible to stop GHC itself from using the human
> readable format. The only place I can think of it being used is in the
> package database, but we could use either Read/Show for that, or just
> exclusively use the binary format.

I have already needed the human-readable format in order to check what
information a custom configure file generated.


Advance notice that I'd like to make Cabal depend on parsec

Ian Lynagh-2
On Sun, Mar 17, 2013 at 09:04:58PM +0100, Henning Thielemann wrote:

>
> On Sun, 17 Mar 2013, Ian Lynagh wrote:
>
> >I think it would be feasible to stop GHC itself from using the human
> >readable format. The only place I can think of it being used is in the
> >package database, but we could use either Read/Show for that, or just
> >exclusively use the binary format.
>
> I already needed the human readable format in order to check what
> information a custom configure file generated.

You can use "ghc-pkg describe p" for that.

I don't think you should ever need the human readable format unless you
need to alter the package database by hand.


--
Ian Lynagh, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/


Advance notice that I'd like to make Cabal depend on parsec

Henning Thielemann

On Sun, 17 Mar 2013, Ian Lynagh wrote:

> On Sun, Mar 17, 2013 at 09:04:58PM +0100, Henning Thielemann wrote:
>>
>> On Sun, 17 Mar 2013, Ian Lynagh wrote:
>>
>>> I think it would be feasible to stop GHC itself from using the human
>>> readable format. The only place I can think of it being used is in the
>>> package database, but we could use either Read/Show for that, or just
>>> exclusively use the binary format.
>>
>> I already needed the human readable format in order to check what
>> information a custom configure file generated.
>
> You can use "ghc-pkg describe p" for that.
>
> I don't think you should ever need the human readable format unless you
> need to alter the package database by hand.

I think I also altered these package descriptions in order to check what
the correct content should be.

