Parsing funny arrows

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Parsing funny arrows

Csongor Kiss
Hello devs,

I am trying to modify GHC's parser to allow the following syntax in types:

  a -> @m b

but my naive attempt was unsuccessful:

type :: { LHsType GhcPs }
        : btype                        { $1 }
        | btype '->' PREFIX_AT btype ctype  ...

For example when I try to parse the following code (and turn on the lexer debug log):
  
  test :: a -> @m b
  test = undefined

I get the following 

token: ITvarid "test"
token: ITdcolon NormalSyntax
token: ITvarid "a"
token: ITrarrow NormalSyntax
token: ITtypeApp
token: ITvarid "m"
token: ITvarid "b"
token: ITsemi

Parse.hs:2:1: error:
    parse error (possibly incorrect indentation or mismatched brackets)
  |
2 | test = undefined


I don't have much experience with hacking on the parser so I'm likely missing something obvious.
Could someone please point at what I might be doing wrong?

Thanks in advance.

Cheers,
Csongor

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: Parsing funny arrows

Vladislav Zavialov-2
Hi Csongor,

I believe the reason for this failure is that  a -> @m b  gets parsed as  a -> @(m b).
Why is that? Because a ‘btype’ includes type-level application.

If you replace the ‘btype’ after PREFIX_AT with an ‘atype’, this particular issue should go away. At least that’s my hypothesis, I haven’t tested it.

- Vlad

> On 29 Aug 2020, at 01:32, Csongor Kiss <[hidden email]> wrote:
>
> Hello devs,
>
> I am trying to modify GHC's parser to allow the following syntax in types:
>
>   a -> @m b
>
> but my naive attempt was unsuccessful:
>
> type :: { LHsType GhcPs }
>         : btype                        { $1 }
>         | btype '->' PREFIX_AT btype ctype  ...
>
> For example when I try to parse the following code (and turn on the lexer debug log):
>  
>   test :: a -> @m b
>   test = undefined
>
> I get the following
>
> token: ITvarid "test"
> token: ITdcolon NormalSyntax
> token: ITvarid "a"
> token: ITrarrow NormalSyntax
> token: ITtypeApp
> token: ITvarid "m"
> token: ITvarid "b"
> token: ITsemi
>
> Parse.hs:2:1: error:
>     parse error (possibly incorrect indentation or mismatched brackets)
>   |
> 2 | test = undefined
>
>
> I don't have much experience with hacking on the parser so I'm likely missing something obvious.
> Could someone please point at what I might be doing wrong?
>
> Thanks in advance.
>
> Cheers,
> Csongor
> _______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: Parsing funny arrows

Shayne Fletcher


On Fri, Aug 28, 2020 at 7:38 PM Vladislav Zavialov <[hidden email]> wrote:
Hi Csongor,

I believe the reason for this failure is that  a -> @m b  gets parsed as  a -> @(m b).
Why is that? Because a ‘btype’ includes type-level application.

If you replace the ‘btype’ after PREFIX_AT with an ‘atype’, this particular issue should go away. At least that’s my hypothesis, I haven’t tested it.


I confirm that this is correct and with that change the example string reduces as hoped.


- Vlad
 
--
Shayne Fletcher

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: Parsing funny arrows

Shayne Fletcher


On Fri, Aug 28, 2020 at 7:48 PM Shayne Fletcher <[hidden email]> wrote:


On Fri, Aug 28, 2020 at 7:38 PM Vladislav Zavialov <[hidden email]> wrote:
Hi Csongor,

I believe the reason for this failure is that  a -> @m b  gets parsed as  a -> @(m b).
Why is that? Because a ‘btype’ includes type-level application.

If you replace the ‘btype’ after PREFIX_AT with an ‘atype’, this particular issue should go away. At least that’s my hypothesis, I haven’t tested it.


I confirm that this is correct and with that change the example string reduces as hoped.


- Vlad

Also, with that correction there are no new shift/reduce conflicts. The original rule gave rise to 3.

--
Shayne Fletcher

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: Parsing funny arrows

Csongor Kiss
Thanks a lot Vlad and Shayne, that indeed did the trick!

Out of curiosity, how could I have figured out that this was the culprit? The parse
error I got was a bit puzzling, and I couldn't find any flags that would give more information
(I think I was looking for the parser equivalent of -ddump-tc-trace).

Best,
Csongor

On 29 Aug 2020, at 00:51, Shayne Fletcher <[hidden email]> wrote:



On Fri, Aug 28, 2020 at 7:48 PM Shayne Fletcher <[hidden email]> wrote:


On Fri, Aug 28, 2020 at 7:38 PM Vladislav Zavialov <[hidden email]> wrote:
Hi Csongor,

I believe the reason for this failure is that  a -> @m b  gets parsed as  a -> @(m b).
Why is that? Because a ‘btype’ includes type-level application.

If you replace the ‘btype’ after PREFIX_AT with an ‘atype’, this particular issue should go away. At least that’s my hypothesis, I haven’t tested it.


I confirm that this is correct and with that change the example string reduces as hoped.


- Vlad

Also, with that correction there are no new shift/reduce conflicts. The original rule gave rise to 3.

-- 
Shayne Fletcher


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: Parsing funny arrows

Vladislav Zavialov-2
The lexer produces only as many tokens as the parser requires. In the ‘lexerDbg’ dump that you included in the message, there were these lines:

  token: ITvarid "m"
  token: ITvarid "b"
  token: ITsemi

So I knew that the parser consumed the entire string, up to the virtual semicolon. I also recognized the parse error as the one produced by ‘happy’ rather than by a later validation pass. So even though the parser consumed the entire string, it failed. Therefore, it didn’t expect this string to end so abruptly, it expected it to continue.

But what did it expect to find? To figure it out, we need to know which grammar production is involved in the failure. The only grammar production that could’ve consumed the ‘->’ PREFIX_AT sequence successfully and proceed to process the rest of the string is this one:

   btype '->' PREFIX_AT btype ctype

By inspecting the definitions of ‘btype’ and ‘ctype’, one can see that neither of those accept the empty string, and both of those accept type-level function application. Thus it’s possible that ‘btype’ consumed “m b” as an application, and ‘ctype’ failed because it didn’t accept the remaining empty string:

  btype = “m b”
  ctype = parse error (nothing to consume)

But what you wanted instead was:

  btype = “m”
  ctype = “b”

The solution is to use ‘atype’ instead of ‘btype’, as ‘atype’ does not accept type-level application.

By the way, there’s also the input string on which the original grammar would’ve succeeded (or at least I think so):

  test :: a -> @m forall b. b
  test = undefined

That’s because ‘btype’ wouldn’t have consumed the ‘forall’, it would’ve stopped at this point. And then ‘ctype’ could’ve consumed “forall b. b”.

I don’t think there’s a parser equivalent of -ddump-tc-trace. You’ll need to figure this stuff out by reading the grammar and keeping in mind that ‘happy’ generates a shift-reduce parser that does not backtrack. The ‘lexerDbg’ output is useful to see how far the parser got. And there’s also this command in case you want to go low level and inspect the state machine generated by ‘happy’:

    happy -agc --strict compiler/GHC/Parser.y -idetailed-info

Hope this helps,
- Vlad

> On 29 Aug 2020, at 10:16, Csongor Kiss <[hidden email]> wrote:
>
> Thanks a lot Vlad and Shayne, that indeed did the trick!
>
> Out of curiosity, how could I have figured out that this was the culprit? The parse
> error I got was a bit puzzling, and I couldn't find any flags that would give more information
> (I think I was looking for the parser equivalent of -ddump-tc-trace).
>
> Best,
> Csongor

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: Parsing funny arrows

Csongor Kiss
Thanks once again for your detailed analysis, it's really helpful.
Especially the forall example - I did try constructing some string that would parse, but I was shooting in the dark.

Best,
Csongor

> On 29 Aug 2020, at 09:05, Vladislav Zavialov <[hidden email]> wrote:
>
> The lexer produces only as many tokens as the parser requires. In the ‘lexerDbg’ dump that you included in the message, there were these lines:
>
>  token: ITvarid "m"
>  token: ITvarid "b"
>  token: ITsemi
>
> So I knew that the parser consumed the entire string, up to the virtual semicolon. I also recognized the parse error as the one produced by ‘happy’ rather than by a later validation pass. So even though the parser consumed the entire string, it failed. Therefore, it didn’t expect this string to end so abruptly, it expected it to continue.
>
> But what did it expect to find? To figure it out, we need to know which grammar production is involved in the failure. The only grammar production that could’ve consumed the ‘->’ PREFIX_AT sequence successfully and proceed to process the rest of the string is this one:
>
>   btype '->' PREFIX_AT btype ctype
>
> By inspecting the definitions of ‘btype’ and ‘ctype’, one can see that neither of those accept the empty string, and both of those accept type-level function application. Thus it’s possible that ‘btype’ consumed “m b” as an application, and ‘ctype’ failed because it didn’t accept the remaining empty string:
>
>  btype = “m b”
>  ctype = parse error (nothing to consume)
>
> But what you wanted instead was:
>
>  btype = “m”
>  ctype = “b”
>
> The solution is to use ‘atype’ instead of ‘btype’, as ‘atype’ does not accept type-level application.
>
> By the way, there’s also the input string on which the original grammar would’ve succeeded (or at least I think so):
>
>  test :: a -> @m forall b. b
>  test = undefined
>
> That’s because ‘btype’ wouldn’t have consumed the ‘forall’, it would’ve stopped at this point. And then ‘ctype’ could’ve consumed “forall b. b”.
>
> I don’t think there’s a parser equivalent of -ddump-tc-trace. You’ll need to figure this stuff out by reading the grammar and keeping in mind that ‘happy’ generates a shift-reduce parser that does not backtrack. The ‘lexerDbg’ output is useful to see how far the parser got. And there’s also this command in case you want to go low level and inspect the state machine generated by ‘happy’:
>
>    happy -agc --strict compiler/GHC/Parser.y -idetailed-info
>
> Hope this helps,
> - Vlad
>
>> On 29 Aug 2020, at 10:16, Csongor Kiss <[hidden email]> wrote:
>>
>> Thanks a lot Vlad and Shayne, that indeed did the trick!
>>
>> Out of curiosity, how could I have figured out that this was the culprit? The parse
>> error I got was a bit puzzling, and I couldn't find any flags that would give more information
>> (I think I was looking for the parser equivalent of -ddump-tc-trace).
>>
>> Best,
>> Csongor
>

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: Parsing funny arrows

Brandon Allbery
In reply to this post by Csongor Kiss
Another way to figure it out is the shift/reduce conflict on @, which tells you it had two ways to recognize it. "Reduce" here means returning to your parser rule, so "shift" means btype wanted to recognize the @. Inspecting btype would then have shown that it was looking for a type application.

On Sat, Aug 29, 2020, 03:17 Csongor Kiss <[hidden email]> wrote:
Thanks a lot Vlad and Shayne, that indeed did the trick!

Out of curiosity, how could I have figured out that this was the culprit? The parse
error I got was a bit puzzling, and I couldn't find any flags that would give more information
(I think I was looking for the parser equivalent of -ddump-tc-trace).

Best,
Csongor

On 29 Aug 2020, at 00:51, Shayne Fletcher <[hidden email]> wrote:



On Fri, Aug 28, 2020 at 7:48 PM Shayne Fletcher <[hidden email]> wrote:


On Fri, Aug 28, 2020 at 7:38 PM Vladislav Zavialov <[hidden email]> wrote:
Hi Csongor,

I believe the reason for this failure is that  a -> @m b  gets parsed as  a -> @(m b).
Why is that? Because a ‘btype’ includes type-level application.

If you replace the ‘btype’ after PREFIX_AT with an ‘atype’, this particular issue should go away. At least that’s my hypothesis, I haven’t tested it.


I confirm that this is correct and with that change the example string reduces as hoped.


- Vlad

Also, with that correction there are no new shift/reduce conflicts. The original rule gave rise to 3.

-- 
Shayne Fletcher

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: Parsing funny arrows

Vladislav Zavialov-2
Hi Brandon, I’m afraid your analysis is not entirely correct. The shift/reduce conflict is not on @ but after it.

- Vlad

> On 29 Aug 2020, at 14:42, Brandon Allbery <[hidden email]> wrote:
>
> Another way to figure it out is the shift/reduce conflict on @, which tells you it had two ways to recognize it. "Reduce" here means returning to your parser rule, so "shift" means btype wanted to recognize the @. Inspecting btype would then have shown that it was looking for a type application.
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs