Parser.y rewrite with parser combinators


Parser.y rewrite with parser combinators

Vladislav Zavialov
Hello devs,

Recently I've been working on a couple of parsing-related issues in
GHC. I implemented support for the -XStarIsType extension, fixed
parsing of the (!) type operator (Trac #15457), allowed using type
operators in existential contexts (Trac #15675).

Doing these tasks required far more engineering effort than my prior
experience with parsers led me to expect, due to the complexities of
GHC's grammar.

In the last couple of days, I've been working on Trac #1087 - a
12-year-old parsing bug. After trying out a couple of approaches, to
my dismay I realised that fixing it properly (including support for
bang patterns inside infix constructors, etc) would require a complete
rewrite of expression and pattern parsing logic.

Worse yet, most of the work would be done not in Parser.y but in
Haskell code in the RdrHsSyn helpers. When I try to keep the logic
inside Parser.y, every design direction leads to reduce/reduce
conflicts.

The reduce/reduce conflicts are the worst.

Perhaps it is finally time to admit that Haskell syntax, with all of
the GHC extensions, cannot fit into an LALR grammar?

The extent of hacks that we have right now just to make parsing
possible is astonishing. For instance, we have dedicated constructors
in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
EViewPat, ELazyPat). That is, one of the fundamental types (that the
type checker operates on) has four additional constructors that exist
due to a reduce/reduce conflict between patterns and expressions.

I propose a complete rewrite of GHC's parser to use recursive descent
parsing with monadic parser combinators.

1. We could significantly simplify parsing logic by doing things in a
more direct manner. For instance, instead of parsing patterns as
expressions and then post-processing them, we could have separate
parsing logic for patterns and expressions.

2. We could fix long-standing parsing bugs like Trac #1087 because
recursive descent offers more expressive power than LALR (at the cost
of support for left recursion, which is not much of a loss in
practice).

3. New extensions to the grammar would require less engineering effort.
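As a sketch of point 1, here is a toy combinator parser (invented types and names, not GHC's actual AST or a proposed design) in which expressions and patterns have separate parsing logic and separate result types, so no post-processing pass is needed:

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isAlphaNum, isLower, isUpper)

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> do (a, s') <- p s; pure (f a, s')

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> do
    (f, s')  <- pf s
    (a, s'') <- pa s'
    pure (f a, s'')

instance Alternative Parser where
  empty = Parser (const Nothing)
  Parser p <|> Parser q = Parser $ \s -> p s <|> q s

-- Patterns and expressions are distinct types with distinct parsers,
-- so no "expression that is secretly a pattern" constructors exist.
data Expr = EVar String | EApp Expr Expr   deriving (Eq, Show)
data Pat  = PVar String | PCon String [Pat] deriving (Eq, Show)

-- A lexeme: skip spaces, then take an identifier whose first
-- character satisfies the given predicate.
ident :: (Char -> Bool) -> Parser String
ident firstOk = Parser $ \s ->
  case span isAlphaNum (dropWhile (== ' ') s) of
    (w@(c:_), rest) | firstOk c -> Just (w, rest)
    _                           -> Nothing

-- Expressions: left-nested application of names.
pExpr :: Parser Expr
pExpr = foldl1 EApp <$> some (EVar <$> ident (\c -> isLower c || isUpper c))

-- Patterns: a constructor applied to variable patterns, or a variable.
pPat :: Parser Pat
pPat = PCon <$> ident isUpper <*> many (PVar <$> ident isLower)
   <|> PVar <$> ident isLower
```

For example, `runParser pPat "Just x"` yields `Just (PCon "Just" [PVar "x"], "")`, while the same input parsed with `pExpr` yields an application node; the two readings never share a representation.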

Of course, this rewrite is a huge chunk of work, so before I start, I
would like to know that this work would be accepted if done well.
Here's what I want to achieve:

* Comparable performance. The new parser could turn out to be faster
because it would do less post-processing, but it could be slower
because 'happy' does all sorts of low-level optimisations. I will
consider this project a success only if comparable performance is
achieved.

* Correctness. The new parser should handle 100% of the syntactic
constructs that the current parser can handle.

* Error messages. The new error messages should be of equal or better
quality than existing ones.

* Elegance. The new parser should bring simplification to other parts
of the compiler (e.g. removal of pattern constructors from HsExpr).
And one of the design principles is to represent things by dedicated
data structures, in contrast to the current state of affairs where we
represent patterns as expressions, data constructor declarations as
types (before D5180), etc.

Let me know if this is a good/acceptable direction of travel. That's
definitely something that I personally would like to see happen.

All the best,
- Vladislav
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

RE: Parser.y rewrite with parser combinators

GHC - devs mailing list
I'm no parser expert, but a parser that was easier to understand and modify, and was as fast as the current one, sounds good to me.

It's a tricky area though; e.g. the layout rule.

Worth talking to Simon Marlow.

Simon




Re: Parser.y rewrite with parser combinators

Iavor Diatchki
Hello,

my experience with complex parsers written using parser combinators is that they tend to be quite difficult to modify, and it is hard to have any assurance that you haven't broken something else. While reduce/reduce errors are indeed annoying, you at least know that there is some sort of issue you need to address. With a combinator-based parser, you basically have to do program verification or, more pragmatically, have a large test suite and hope that you tested everything.

I think the current approach is actually quite reasonable: use the Happy grammar to parse out the basic structure of the program, without trying to be completely precise, and then have a separate pass that validates and fixes up the results. While this has the drawback of some constructors being in the "wrong place", there are also benefits: namely, we can report better parse errors. Also, with the new rewrite of HsSyn, we should be able to mark such constructors as usable only in the parsing pass, so later passes wouldn't need to worry about them.
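The shape of that two-phase approach can be sketched like this (toy types with invented names; GHC's real code lives in RdrHsSyn and is considerably more involved). The expression type carries pattern-only constructors, mirroring EWildPat and EAsPat, and a later pass checks whether a parsed tree is actually a valid pattern:

```haskell
-- Expressions, extended with pattern-only constructors so that the
-- parser can parse patterns "as expressions" first.
data Expr
  = Var String
  | App Expr Expr
  | AsPat String Expr   -- pattern-only, like GHC's EAsPat
  | WildPat             -- pattern-only, like GHC's EWildPat
  deriving (Eq, Show)

data Pat
  = PVar String
  | PCon String [Pat]
  | PAs String Pat
  | PWild
  deriving (Eq, Show)

-- The validation/fix-up pass: succeeds only on trees that denote
-- patterns, reporting an error otherwise.
checkPattern :: Expr -> Either String Pat
checkPattern e = case e of
  WildPat   -> Right PWild
  AsPat v p -> PAs v <$> checkPattern p
  Var v@(c : _)
    | c `elem` ['A' .. 'Z'] -> Right (PCon v [])
    | otherwise             -> Right (PVar v)
  App f x -> do
    pf <- checkPattern f
    px <- checkPattern x
    case pf of
      PCon con args -> Right (PCon con (args ++ [px]))
      _             -> Left "pattern application head must be a constructor"
  _ -> Left "not a valid pattern"
```

Because `checkPattern` runs after parsing, it can phrase its failures as domain errors ("head must be a constructor") rather than generic parse errors, which is the better-messages benefit mentioned above.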

-Iavor
On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs <[hidden email]> wrote:

RE: Parser.y rewrite with parser combinators

GHC - devs mailing list

> use the Happy grammar to parse out the basic structure of the program, without trying to be completely precise, and then have a separate pass that validates and fixes up the results.

 

Incidentally, we use this approach for operator fixity and precedence, where the fixup is done in the renamer, and for that purpose it works really well.
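That fixity fixup can be pictured like this (a toy sketch with invented names, not the renamer's actual code): the parser produces a uniformly left-nested chain of infix applications, and a later pass flattens the chain and rebuilds it using the real precedences:

```haskell
data Expr = Lit Int | Op String Expr Expr deriving (Eq, Show)

-- Toy fixity table: '*' binds tighter than '+' and '-'.
prec :: String -> Int
prec "*" = 7
prec _   = 6

reassociate :: Expr -> Expr
reassociate e = go (flat e)
  where
    -- The parser's output is uniformly left-nested; flatten it into
    -- the first operand followed by (operator, operand) pairs.
    flat (Op o l r) = flat l ++ [(o, r)]
    flat x          = [("", x)]

    go ((_, x) : rest) = fst (climb x 0 rest)
    go []              = error "impossible: empty operator chain"

    -- Precedence climbing; for brevity every operator is treated as
    -- left-associative here.
    climb lhs minP ((o, r) : rest)
      | prec o >= minP =
          let (rhs, rest') = climb r (prec o + 1) rest
          in  climb (Op o lhs rhs) minP rest'
    climb lhs _ rest = (lhs, rest)
```

So the chain parsed from `1 + 2 * 3`, which arrives left-nested as `Op "*" (Op "+" (Lit 1) (Lit 2)) (Lit 3)`, is rebuilt as `Op "+" (Lit 1) (Op "*" (Lit 2) (Lit 3))` once the fixities are consulted.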

 

 

 


Re: Parser.y rewrite with parser combinators

Alan & Kim Zimmerman
In reply to this post by GHC - devs mailing list
I am not against this proposal, but I want to raise a possible future concern.

As part of improving the Haskell tooling environment I am keen on making GHC incremental, and I have started a proof of concept based on the same techniques as used in the tree-sitter library.

This is achieved by modifying happy, and it requires minimal changes to the existing Parser.y.

It would be unfortunate if this possibility were prevented by this rewrite.

Alan


Re: Parser.y rewrite with parser combinators

Vladislav Zavialov
In reply to this post by Iavor Diatchki
> complex parsers written using parsing combinators is that they tend to be quite difficult to modify and have any kind of assurance that now you haven't broken something else

That's true regardless of implementation technique; parsers are rather
delicate. An LALR-based parser generator does provide more information
when it detects shift/reduce and reduce/reduce conflicts, but I never
found this information useful. It was always quite the opposite of
helpful: an indication that the LALR parser could not handle my
change and that I had to look for workarounds.

> With a combinator based parser, you basically have to do program verification, or more pragmatically, have a large test suite and hope that you tested everything.

Even when doing modifications to Parser.y, I relied mainly on the test
suite to determine whether my change was right (and the test suite
always caught many issues). A large test suite is the best approach
both for 'happy'-based parsers and for combinator-based parsers.

> and then have a separate pass that validates and fixes up the results

That's where my concern lies. This separate pass is confusing (at
least for me - it's not the most straightforward thing to parse
something incorrectly and then restructure it), it is hard to modify,
and it does not handle corner cases (e.g. #1087).

Since we have all this Haskell code that does a significant portion of
processing, why even bother with having a LALR pass before it?

> namely we can report better parse errors

I don't think that's true, we can achieve better error messages with
recursive descent.

> Also, with the new rewrite of HsSyn, we should be able to mark such constructors as only usable in the parsing pass, so later passes wouldn't need to worry about them.

Not completely true, GhcPs-parametrized structures are the final
output of parsing, so at least the renamer will face these
constructors.

On Tue, Oct 9, 2018 at 1:00 AM Iavor Diatchki <[hidden email]> wrote:


Re: Parser.y rewrite with parser combinators

Vladislav Zavialov
In reply to this post by Alan & Kim Zimmerman
That is a very good point, thank you! I have not thought about
incremental parsing. That's something I need to research before I
start the rewrite.
On Tue, Oct 9, 2018 at 1:06 AM Alan & Kim Zimmerman <[hidden email]> wrote:


Re: Parser.y rewrite with parser combinators

Richard Eisenberg
I, too, have wondered about this.

A pair of students this summer were working on merging the type-level and term-level parsers, in preparation for, e.g., visible dependent quantification in terms (not to mention dependent types). If successful, this would have been an entirely internal refactor. In any case, it seemed impossible to do in an LALR parser, so the students instead parsed into a new datatype Term, which then got converted either to an HsExpr, an HsPat, or an HsType. The students never finished. But the experience suggests that moving away from LALR might be a good move.

All that said, I'm not sure how going to parser combinators stops us from needing an intermediate datatype to parse expressions/patterns into before we can tell whether they are expressions or patterns. For example, if we see `do K x y z ...`, we don't know whether we're parsing an expression or a pattern before we can see what's in the ..., which is arbitrarily later than the ambiguity starts. Of course, while we can write a backtracking parser with combinators, doing so doesn't seem like a particularly swell idea. This isn't an argument against using parser combinators, but fixing the pattern/expression ambiguity was a "pro" listed for them -- except I don't think this is correct.

Come to think of it, the problem with parsing expressions vs. types would persist just as much in the combinator style as it does in the LALR style, so perhaps I've talked myself into a corner. Nevertheless, it seems awkward to do half the parsing in one language (happy) and half in another.

Richard
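[Editor's note: Richard's `do K x y z` point can be made concrete with a tiny runnable sketch (standard library only, nothing GHC-internal; the bindings are invented for illustration). The same `Just n` phrase is a pattern in one statement and an expression in the next, and only a token that appears later decides which.]

```haskell
module Main where

import Text.Read (readMaybe)

main :: IO ()
main = do
  -- `Just n` is a PATTERN here: the `<-` that follows decides it,
  -- but a parser cannot know that while still reading `Just n`.
  Just n <- pure (readMaybe "42" :: Maybe Int)
  -- `Just n` is an EXPRESSION here: no `<-` follows.
  print (Just n)
```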

> On Oct 8, 2018, at 6:38 PM, Vladislav Zavialov <[hidden email]> wrote:
>
> That is a very good point, thank you! I have not thought about
> incremental parsing. That's something I need to research before I
> start the rewrite.
> On Tue, Oct 9, 2018 at 1:06 AM Alan & Kim Zimmerman <[hidden email]> wrote:
>>
>> I am not against this proposal,  but want to raise a possible future concern.
>>
>> As part of improving the haskell tooling environment I am keen on making GHC incremental, and have started a proof of concept based in the same techniques as used in the tree-sitter library.
>>
>> This is achieved by modifying happy, and requires minimal changes to the existing Parser.y.
>>
>> It would be unfortunate if this possibility was prevented by this rewrite.
>>
>> Alan
> _______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Parser.y rewrite with parser combinators

Niklas Hambüchen
In reply to this post by Vladislav Zavialov
Another thing that may be of interest is that parser generators can guarantee complexity bounds on parsing time (as usual, the goal is linear).

The conflicts that annoy us about parser generators are often hints about exactly this; if the parser generator succeeds without conflicts, you are guaranteed a linear-time parser.

If backtracking is allowed in parser combinators, it is comparatively easy to get that guarantee wrong.

Niklas
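[Editor's note: Niklas's warning can be illustrated with a toy instrumented parser (a sketch written for this thread, not GHC code; all names invented). A single backtracking choice already rescans the shared prefix of the input twice, and nesting such choices compounds the rescanning.]

```haskell
module Main where

import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- A toy combinator parser instrumented to count character inspections.
type P a = IORef Int -> String -> IO (Maybe (a, String))

-- Consume one character satisfying the predicate, bumping the counter.
sat :: (Char -> Bool) -> P Char
sat f ref s = do
  modifyIORef' ref (+ 1)
  case s of
    (c : cs) | f c -> pure (Just (c, cs))
    _              -> pure Nothing

-- Zero or more occurrences (never fails).
many0 :: P a -> P [a]
many0 p ref s = do
  r <- p ref s
  case r of
    Nothing      -> pure (Just ([], s))
    Just (a, s') -> do
      Just (as, s'') <- many0 p ref s'
      pure (Just (a : as, s''))

-- Backtracking choice: if p fails, rewind and run q on the same input.
orElse :: P a -> P a -> P a
orElse p q ref s = do
  r <- p ref s
  case r of
    Just _  -> pure r
    Nothing -> q ref s

-- branch end: many 'a's followed by the given terminator.
branch :: Char -> P ()
branch end ref s = do
  r <- many0 (sat (== 'a')) ref s
  case r of
    Nothing      -> pure Nothing
    Just (_, s') -> do
      r2 <- sat (== end) ref s'
      pure (fmap (\(_, s'') -> ((), s'')) r2)

main :: IO ()
main = do
  let input = replicate 1000 'a' ++ "y"
  ref <- newIORef 0
  _ <- (branch 'x' `orElse` branch 'y') ref input
  n <- readIORef ref
  -- The failing 'x' branch scanned all 1000 'a's before giving up,
  -- and the 'y' branch then scanned them all again.
  print (n > 2000)
```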
Re: Parser.y rewrite with parser combinators

Vanessa McHale
In reply to this post by Vladislav Zavialov

I actually have some experience in this department, having authored both madlang and language-ats. Parsers using combinators alone are more brittle than parsers using Happy, at least for human-facing languages.

I'm also not sure what exactly parser combinators provide over Happy, which has macros that can emulate e.g. between and many. Drawing up a minimal example might be a good idea.


On 10/08/2018 05:24 PM, Vladislav Zavialov wrote:
> complex parsers written using parsing combinators is that they tend to be quite difficult to modify and have any kind of assurance that now you haven't broken something else

That's true regardless of implementation technique, parsers are rather
delicate. A LALR-based parser generator does provide more information
when it detects shift/reduce and reduce/reduce conflicts, but I never
found this information useful. It was always quite the opposite of
being helpful - an indication that a LALR parser could not handle my
change and I had to look for workarounds.

> With a combinator based parser, you basically have to do program verification, or more pragmatically, have a large test suite and hope that you tested everything.

Even when doing modifications to Parser.y, I relied mainly on the test
suite to determine whether my change was right (and the test suite
always caught many issues). A large test suite is the best approach
both for 'happy'-based parsers and for combinator-based parsers.

> and then have a separate pass that validates and fixes up the results

That's where my concern lies. This separate pass is confusing (at
least for me - it's not the most straightforward thing to parse
something incorrectly and then restructure it), it is hard to modify,
and it does not handle corner cases (e.g. #1087).

Since we have all this Haskell code that does a significant portion of
the processing, why even bother with having a LALR pass before it?

> namely we can report better parse errors

I don't think that's true; we can achieve better error messages with
recursive descent.

> Also, with the new rewrite of HsSyn, we should be able to mark such constructors as only usable in the parsing pass, so later passes wouldn't need to worry about them.

Not completely true: GhcPs-parametrized structures are the final
output of parsing, so at least the renamer will face these
constructors.
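[Editor's note: for readers unfamiliar with the "separate pass" being debated, here is a hedged miniature of the idea (all types and names invented for illustration; GHC's real version works over the much richer HsExpr and lives in RdrHsSyn). Everything is first parsed as an expression, with `EWild` standing in for the EWildPat-style placeholder constructors, and a second pass reinterprets the result as a pattern.]

```haskell
module Main where

-- Invented miniature types; GHC's real HsExpr and Pat are far richer.
data Expr = EVar String | ECon String | EApp Expr Expr | EWild
  deriving Show

data Pat = PVar String | PCon String [Pat] | PWild
  deriving Show

-- The "separate pass": reinterpret an already-parsed expression as a
-- pattern, failing on forms that are not valid patterns.
exprToPat :: Expr -> Either String Pat
exprToPat (EVar v)     = Right (PVar v)
exprToPat (ECon c)     = Right (PCon c [])
exprToPat EWild        = Right PWild
exprToPat e@(EApp _ _) = go e []
  where
    go (EApp f x) args = do p <- exprToPat x; go f (p : args)
    go (ECon c)   args = Right (PCon c args)
    go _          _    = Left "not a valid pattern"

main :: IO ()
main = print (exprToPat (EApp (EApp (ECon "K") (EVar "x")) EWild))
```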

On Tue, Oct 9, 2018 at 1:00 AM Iavor Diatchki [hidden email] wrote:
Hello,

my experience with complex parsers written using parsing combinators is that they tend to be quite difficult to modify and have any kind of assurance that now you haven't broken something else.   While reduce-reduce errors are indeed annoying, you at least know that there is some sort of issue you need to address.   With a combinator based parser, you basically have to do program verification, or more pragmatically, have a large test suite and hope that you tested everything.

I think the current approach is actually quite reasonable:  use the Happy grammar to parse out the basic structure of the program, without trying to be completely precise, and then have a separate pass that validates and fixes up the results.   While this has the draw-back of some constructors being in the "wrong place", there are also benefits---namely we can report better parse errors.  Also, with the new rewrite of HsSyn, we should be able to mark such constructors as only usable in the parsing pass, so later passes wouldn't need to worry about them.

-Iavor











On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs [hidden email] wrote:
I'm no parser expert, but a parser that was easier to understand and modify, and was as fast as the current one, sounds good to me.

It's a tricky area though; e.g. the layout rule.

Worth talking to Simon Marlow.

Simon



| -----Original Message-----
| From: ghc-devs [hidden email] On Behalf Of Vladislav
| Zavialov
| Sent: 08 October 2018 21:44
| To: ghc-devs [hidden email]
| Subject: Parser.y rewrite with parser combinators
|
| Hello devs,
|
| Recently I've been working on a couple of parsing-related issues in
| GHC. I implemented support for the -XStarIsType extension, fixed
| parsing of the (!) type operator (Trac #15457), allowed using type
| operators in existential contexts (Trac #15675).
|
| Doing these tasks required way more engineering effort than I expected
| from my prior experience working with parsers due to complexities of
| GHC's grammar.
|
| In the last couple of days, I've been working on Trac #1087 - a
| 12-year old parsing bug. After trying out a couple of approaches, to
| my dismay I realised that fixing it properly (including support for
| bang patterns inside infix constructors, etc) would require a complete
| rewrite of expression and pattern parsing logic.
|
| Worse yet, most of the work would be done outside Parser.y in Haskell
| code instead, in RdrHsSyn helpers. When I try to keep the logic inside
| Parser.y, in every design direction I face reduce/reduce conflicts.
|
| The reduce/reduce conflicts are the worst.
|
| Perhaps it is finally time to admit that Haskell syntax with all of
| the GHC extensions cannot fit into an LALR grammar?
|
| The extent of hacks that we have right now just to make parsing
| possible is astonishing. For instance, we have dedicated constructors
| in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
| EViewPat, ELazyPat). That is, one of the fundamental types (that the
| type checker operates on) has four additional constructors that exist
| due to a reduce/reduce conflict between patterns and expressions.
|
| I propose a complete rewrite of GHC's parser to use recursive descent
| parsing with monadic parser combinators.
|
| 1. We could significantly simplify parsing logic by doing things in a
| more direct manner. For instance, instead of parsing patterns as
| expressions and then post-processing them, we could have separate
| parsing logic for patterns and expressions.
|
| 2. We could fix long-standing parsing bugs like Trac #1087 because
| recursive descent offers more expressive power than LALR (at the cost
| of support for left recursion, which is not much of a loss in
| practice).
|
| 3. New extensions to the grammar would require less engineering effort.
|
| Of course, this rewrite is a huge chunk of work, so before I start, I
| would like to know that this work would be accepted if done well.
| Here's what I want to achieve:
|
| * Comparable performance. The new parser could turn out to be faster
| because it would do less post-processing, but it could be slower
| because 'happy' does all the sorts of low-level optimisations. I will
| consider this project a success only if comparable performance is
| achieved.
|
| * Correctness. The new parser should handle 100% of the syntactic
| constructs that the current parser can handle.
|
| * Error messages. The new error messages should be of equal or better
| quality than existing ones.
|
| * Elegance. The new parser should bring simplification to other parts
| of the compiler (e.g. removal of pattern constructors from HsExpr).
| And one of the design principles is to represent things by dedicated
| data structures, in contrast to the current state of affairs where we
| represent patterns as expressions, data constructor declarations as
| types (before D5180), etc.
|
| Let me know if this is a good/acceptable direction of travel. That's
| definitely something that I personally would like to see happen.
|
| All the best,
| - Vladislav
| _______________________________________________
| ghc-devs mailing list
| [hidden email]
| http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Parser.y rewrite with parser combinators

Vladislav Zavialov
> I'm also not sure what exactly parser combinators provide over Happy.

Parser combinators offer backtracking. With 'happy' we get the
guarantee that we parse in linear time, but we lose it because of
post-processing that is not guaranteed to be linear. I think it'd be
easier to backtrack in the parser itself rather than in a later pass.


On Tue, Oct 9, 2018 at 6:47 AM Vanessa McHale <[hidden email]> wrote:

> [...]
Re: Parser.y rewrite with parser combinators

Vladislav Zavialov
In reply to this post by Richard Eisenberg-4
> For example, if we see `do K x y z ...`, we don't know whether we're parsing an expression or a pattern before we can see what's in the ..., which is arbitrarily later than the ambiguity starts. Of course, while we can write a backtracking parser with combinators, doing so doesn't seem like a particularly swell idea.

Backtracking is exactly what I wanted to do here. Perhaps it is a lack
of theoretical background on my part showing, but I do not see
downsides to it. It supposedly robs us of the linear-time guarantee, but
consider this.

With 'happy' and post-processing we:

1. Parse into an expression (linear in the number of tokens)
2. If it turns out we needed a pattern, rejig (linear in the size of the expression)

With parser combinators:

1. Parse into an expression (linear in the number of tokens)
2. If it turns out we needed a pattern, backtrack and parse into a
pattern (linear in the number of tokens)

Doesn't the post-processing that we do today mean that we don't
actually take advantage of the linearity guarantee?
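[Editor's note: the two strategies can be contrasted with a toy token-level sketch (invented grammar and names, far short of real pattern syntax). It implements the backtracking variant: try to read a statement as `pat <- rhs`; if no `<-` turns up, rewind to the start and read it as a plain expression. Each attempt is a single linear scan over the tokens.]

```haskell
module Main where

-- Invented miniature statement type, not GHC's.
data Stmt = Bind [String] [String] | Body [String]
  deriving Show

type P a = [String] -> Maybe (a, [String])

-- Consume one expected token.
tok :: String -> P String
tok t (x : xs) | x == t = Just (t, xs)
tok _ _                 = Nothing

-- Consume any token other than "<-".
anyTok :: P String
anyTok (x : xs) | x /= "<-" = Just (x, xs)
anyTok _                    = Nothing

-- Zero or more occurrences (never fails).
many0 :: P a -> P [a]
many0 p ts = case p ts of
  Nothing       -> Just ([], ts)
  Just (a, ts') -> do (as, ts'') <- many0 p ts'; Just (a : as, ts'')

-- First attempt: `pat <- rhs`.
bind :: P Stmt
bind ts = do
  (pat, ts1) <- many0 anyTok ts
  (_,   ts2) <- tok "<-" ts1
  (rhs, ts3) <- many0 anyTok ts2
  pure (Bind pat rhs, ts3)

stmt :: P Stmt
stmt ts = case bind ts of
  Just r  -> Just r                          -- it was `pat <- rhs`
  Nothing -> do (e, ts') <- many0 anyTok ts  -- backtrack: reparse from
                pure (Body e, ts')           -- the start as a body

main :: IO ()
main = do
  print (fst <$> stmt (words "K x y z <- foo"))
  print (fst <$> stmt (words "K x y z"))
```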

On Tue, Oct 9, 2018 at 3:31 AM Richard Eisenberg <[hidden email]> wrote:

> [...]
Re: Parser.y rewrite with parser combinators

Sven Panne-2
In reply to this post by Vladislav Zavialov
On Tue, Oct 9, 2018 at 12:25 AM Vladislav Zavialov <[hidden email]> wrote:
> [...] That's true regardless of implementation technique, parsers are rather
> delicate.

I think it's not the parsers themselves which are delicate, it is the language that they should recognize.
 
> A LALR-based parser generator does provide more information
> when it detects shift/reduce and reduce/reduce conflicts, but I never
> found this information useful. It was always quite the opposite of
> being helpful - an indication that a LALR parser could not handle my
> change and I had to look for workarounds. [...]

Not that this would help at this point, but: The conflicts reported by parser generators like Happy are *extremely* valuable, they hint at tricky/ambiguous points in the grammar, which in turn is a strong hint that the language you're trying to parse has dark corners. IMHO every language designer and e.g. everybody proposing a syntactic extension to GHC should try to fit this into a grammar for Happy *before* proposing that extension. If you get conflicts, it is a very strong hint that the language is hard to parse by *humans*, too, which is the most important thing to consider. Haskell already has tons of syntactic warts which can only be parsed by infinite lookahead, which is only a minor technical problem, but a major usability problem. "Programs are meant to be read by humans and only incidentally for computers to execute." (D.E.K.) </rant> ;-)

The situation is a bit strange: We all love strong guarantees offered by type checking, but somehow most people shy away from "syntactic type checking" offered by parser generators. Parser combinators are the Python of parsing: Easy to use initially, but a maintenance hell in the long run for larger projects...

Cheers,
   S.

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Parser.y rewrite with parser combinators

Sven Panne-2
In reply to this post by Vladislav Zavialov
On Tue, Oct 9, 2018 at 9:18 AM Vladislav Zavialov <[hidden email]> wrote:
> [...] With parser combinators
>
> 1. Parse into an expression (linear in the number of tokens)
> 2. If it turns out we needed a pattern, backtrack and parse into a
> pattern (linear in the number of tokens) [...]

In a larger grammar implemented by parser combinators it is quite hard to guarantee that you don't backtrack while backtracking, which would easily result in exponential runtime. And given the size of the language GHC recognizes, I can almost guarantee that this will happen unless you use formal methods. :-)

Cheers,
   S.

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: Parser.y rewrite with parser combinators

Vladislav Zavialov
In reply to this post by Sven Panne-2
> which in turn is a strong hint that the language you're trying to parse has dark corners. IMHO every language designer and e.g. everybody proposing a syntactic extension to GHC should try to fit this into a grammar for Happy *before* proposing that extension

I do agree here! Having a language that has a context-free grammar
would be superb. The issue is that Haskell with GHC extensions is
already far from this point, and it doesn't help to first pretend that
it is and then do half of the parsing in post-processing, which is
under no such constraints.
On Tue, Oct 9, 2018 at 10:23 AM Sven Panne <[hidden email]> wrote:

> [...]
Re: Parser.y rewrite with parser combinators

Vladislav Zavialov
In reply to this post by Sven Panne-2
>  backtrack while backtracking <...> I can almost guarantee that this will happen unless you use formal methods

That is a great idea, I can track backtracking depth in a type-level
natural number and make sure it doesn't go over 1 (or add
justification with performance analysis when it does). Formal methods
for the win :-)
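[Editor's note: a hedged sketch of what the type-level depth tracking Vladislav half-jokingly proposes might look like (toy code with invented combinators, not a worked-out design). Each parser is indexed by an upper bound on its nested backtracking depth; the runner only accepts parsers within the budget.]

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}
module Main where

import GHC.TypeLits (Nat, type (+), type (<=))

-- A parser indexed by a bound on its nested backtracking depth.
newtype P (d :: Nat) a = P { runP :: String -> Maybe (a, String) }

-- Primitive: one expected character; no backtracking involved.
char :: Char -> P 0 Char
char c = P $ \s -> case s of
  (x : xs) | x == c -> Just (c, xs)
  _                 -> Nothing

-- Sequencing two same-depth parsers keeps the depth.
andThen :: P d a -> P d b -> P d (a, b)
andThen (P p) (P q) = P $ \s -> do
  (a, s')  <- p s
  (b, s'') <- q s'
  pure ((a, b), s'')

-- Backtracking choice costs one depth level.
orElse :: P d a -> P d a -> P (d + 1) a
orElse (P p) (P q) = P $ \s -> maybe (q s) Just (p s)

-- Only parsers within the backtracking budget may be run.
runBounded :: (d <= 1) => P d a -> String -> Maybe (a, String)
runBounded = runP

main :: IO ()
main = do
  let p = char 'a' `orElse` char 'b'  -- depth 1: accepted
  print (runBounded p "b!")
  -- let deep = p `orElse` p          -- depth 2: runBounded would
  -- print (runBounded deep "b!")     --   reject it at compile time
```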
On Tue, Oct 9, 2018 at 10:27 AM Sven Panne <[hidden email]> wrote:

> [...]
RE: Parser.y rewrite with parser combinators

GHC - devs mailing list
In reply to this post by Sven Panne-2

> We all love strong guarantees offered by type checking, but somehow most people shy away from "syntactic type checking" offered by parser generators. Parser combinators are the Python of parsing: Easy to use initially, but a maintenance hell in the long run for larger projects...

I’d never thought of it that way before – interesting.

 

Simon

 

From: ghc-devs <[hidden email]> On Behalf Of Sven Panne
Sent: 09 October 2018 08:23
To: [hidden email]
Cc: GHC developers <[hidden email]>
Subject: Re: Parser.y rewrite with parser combinators

 

Am Di., 9. Okt. 2018 um 00:25 Uhr schrieb Vladislav Zavialov <[hidden email]>:

[...] That's true regardless of implementation technique, parsers are rather
delicate.

 

I think it's not the parsers themselves which are delicate, it is the language that they should recognize.

 

A LALR-based parser generator does provide more information
when it detects shift/reduce and reduce/reduce conflicts, but I never
found this information useful. It was always quite the opposite of
being helpful - an indication that a LALR parser could not handle my
change and I had to look for workarounds. [...]

 

Not that this would help at this point, but: The conflicts reported by parser generators like Happy are *extremely* valuable: they hint at tricky/ambiguous points in the grammar, which in turn is a strong hint that the language you're trying to parse has dark corners. IMHO every language designer and e.g. everybody proposing a syntactic extension to GHC should try to fit this into a grammar for Happy *before* proposing that extension. If you get conflicts, it is a very strong hint that the language is hard to parse by *humans*, too, which is the most important thing to consider. Haskell already has tons of syntactic warts which can only be parsed by infinite lookahead, which is only a minor technical problem, but a major usability problem. "Programs are meant to be read by humans and only incidentally for computers to execute." (D.E.K.) </rant> ;-)
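(A hypothetical toy fragment, not from GHC's Parser.y, showing the kind of hint Happy gives: the classic dangling-else grammar below makes Happy report a shift/reduce conflict, flagging the same construct that confuses human readers:)

```
-- Toy Happy production; Happy reports "shift/reduce conflicts: 1".
-- After parsing "if e then if e then x", the token 'else' can either
-- be shifted (attaching to the inner 'if') or the inner Expr can be
-- reduced (attaching the 'else' to the outer 'if').
Expr : 'if' Expr 'then' Expr              { If $2 $4 Nothing   }
     | 'if' Expr 'then' Expr 'else' Expr  { If $2 $4 (Just $6) }
     | 'x'                                { Var }
```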

 

The situation is a bit strange: We all love strong guarantees offered by type checking, but somehow most people shy away from "syntactic type checking" offered by parser generators. Parser combinators are the Python of parsing: Easy to use initially, but a maintenance hell in the long run for larger projects...

 

Cheers,

   S.



Re: Parser.y rewrite with parser combinators

Vladislav Zavialov
It's a nice way to look at the problem, and we're facing the same
issues as with insufficiently powerful type systems. LALR is the Go of
parsing in this case :)

I'd rather write Python and have a larger test suite than deal with
lack of generics in Go, if you allow me to take the analogy that far.

In fact, we do have a fair share of boilerplate in our current grammar
due to lack of parametrisation. That's another issue that would be
solved by parser combinators (or by a fancier parser generator, but
I'm not aware of one).
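(To make the parametrisation point concrete, here is a hedged sketch using `Text.ParserCombinators.ReadP` from base; the grammar it parses is invented. One parametrised rule, `commaList`, serves any element parser, where an LALR grammar would need a separate nonterminal per element type:)

```haskell
module Main where

import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP

-- One parametrised "rule" instead of comma_idents, comma_numbers, ...
commaList :: ReadP a -> ReadP [a]
commaList p = p `sepBy1` char ','

ident :: ReadP String
ident = munch1 isAlpha

number :: ReadP Int
number = read <$> munch1 isDigit

-- Keep only parses that consume the whole input.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case [ x | (x, "") <- readP_to_S p s ] of
  (x : _) -> Just x
  _       -> Nothing

main :: IO ()
main = do
  print (parseAll (commaList ident) "foo,bar,baz")  -- Just ["foo","bar","baz"]
  print (parseAll (commaList number) "1,2,3")       -- Just [1,2,3]
```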

On Tue, Oct 9, 2018 at 1:52 PM Simon Peyton Jones <[hidden email]> wrote:

>
> We all love strong guarantees offered by type checking, but somehow most people shy away from "syntactic type checking" offered by parser generators. Parser combinators are the Python of parsing: Easy to use initially, but a maintenance hell in the long run for larger projects...
>
> I’d never thought of it that way before – interesting.
>
>
>
> Simon
>
> [...]

Re: Parser.y rewrite with parser combinators

Richard Eisenberg
I think one problem is that we don't even have bounded levels of backtracking, because (with view patterns) you can put expressions into patterns.

Consider

> f = do K x (z -> ...

Do we have a constructor pattern with a view pattern inside it? Or do we have an expression with a required visible type application and a function type? (This last bit will be possible only with visible dependent quantification in terms, but I'm confident that Vlad will appreciate the example.) We'll need nested backtracking to sort this disaster out -- especially if we have another `do` in the ...
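(For concreteness, here is the first reading fleshed out into a runnable sketch; the data type `T`, the function `z`, and the completion of the pattern are all invented for illustration. The prefix parses as a constructor pattern whose second field is a view pattern. The second reading, an expression, cannot even be written in GHC today:)

```haskell
{-# LANGUAGE ViewPatterns #-}
module Main where

-- Hypothetical definitions so that the ambiguous prefix
-- `f = do K x (z -> ...` completes as a *pattern*:
data T = K Int (Maybe Int)

z :: Maybe Int -> Maybe Int
z = id

f :: Maybe (Int, Int)
f = do
  -- Constructor pattern containing a view pattern: apply `z` to the
  -- second field, then match the result against `Just y`.
  K x (z -> Just y) <- Just (K 3 (Just 4))
  pure (x, y)

main :: IO ()
main = print f  -- prints: Just (3,4)
```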

What I'm trying to say here is that tracking the backtracking level in types doesn't seem like it will fly (tempting though it may be).

Richard

> On Oct 9, 2018, at 7:08 AM, Vladislav Zavialov <[hidden email]> wrote:
>
> It's a nice way to look at the problem, and we're facing the same
> issues as with insufficiently powerful type systems. LALR is the Go of
> parsing in this case :)
>
> I'd rather write Python and have a larger test suite than deal with
> lack of generics in Go, if you allow me to take the analogy that far.
>
> In fact, we do have a fair share of boilerplate in our current grammar
> due to lack of parametrisation. That's another issue that would be
> solved by parser combinators (or by a fancier parser generator, but
> I'm not aware of such one).
>
> On Tue, Oct 9, 2018 at 1:52 PM Simon Peyton Jones <[hidden email]> wrote:
>> [...]


Re: Parser.y rewrite with parser combinators

Vladislav Zavialov
I agree with you. This example puts a nail in the coffin of the
backtracking approach.

I will have to think of something else, and at this point a full
rewrite to parser combinators does not seem as appealing.

Thanks!

On Tue, Oct 9, 2018 at 4:45 PM Richard Eisenberg <[hidden email]> wrote:

>
> I think one problem is that we don't even have bounded levels of backtracking, because (with view patterns) you can put expressions into patterns.
>
> Consider
>
> > f = do K x (z -> ...
>
> Do we have a constructor pattern with a view pattern inside it? Or do we have an expression with a required visible type application and a function type? (This last bit will be possible only with visible dependent quantification in terms, but I'm confident that Vlad will appreciate the example.) We'll need nested backtracking to sort this disaster out -- especially if we have another `do` in the ...
>
> What I'm trying to say here is that tracking the backtracking level in types doesn't seem like it will fly (tempting though it may be).
>
> Richard
>
> > On Oct 9, 2018, at 7:08 AM, Vladislav Zavialov <[hidden email]> wrote:
> > [...]
>