Parsing Haskell with Layout

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Parsing Haskell with Layout

Benjamin Redelings
Hi,

Section 10.3 of the Haskell 2010 report describes the layout rule for
Haskell in terms of rules for inserting curly braces and semicolons into
the token stream.

I'm not sure of the theory behind implementing Note 5 in section 10.3. 
This says roughly that we insert a "}" token after token t[n] if the
tokens t[1]...t[n+1] have no valid parses, and the sequence of tokens
t[1] ... t[n] followed by "}" DO have a valid parse.

I think this allows you to write

f z = let x=2;y=3 in x+y+z

However, except for Note 5, it seems like you could do a simple
preprocessing step on the token stream to insert "{", ";", and "}"
before parsing.  Note 5 seems more complicated.  (I feel like I read a
note somewhere that said "read this and weep" and then failed to
implement Note 5.  But now I can't remember where I read this.)

In simple explanations like Write you a haskell
(http://dev.stephendiehl.com/fun/008_extended_parser.html) it seems that
the approach is simply to ignore Note 5.

Can anyone fill me in on

(a) what concepts are assumed by the report that would make it "obvious"
how Note 5 is supposed to be implemented?

(b) a simple (hopefully) implementation strategy for implementing it?
I'm looking for a language-independent approach for parsing.

I did see some mention of error-correcting parsers in the Parsec paper,
which might allow adding the "}" at a parse error.  But I see there is a
paper called " An error correcting parser for context free grammars that
takes less than cubic time".  The "less than cubic time" seems rather
worrisome, so I don't know if this is the right approach.

Also, when using parsec, it seems that you might get an idea how many
tokens are a valid prefix of the grammer when the token stream fails to
parse.  Hence, you could try inserting a "}" into the token stream at
the point of a parse error.  However, a naive approach would then
require re-parsing the whole stream from the beginning until each parse
error, which seems very expensive.

Thanks for any help!

-BenRI

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Parsing Haskell with Layout

Brandon Allbery
In the context of parsec, you'd probably behave as if you'd seen a closing brace (adjusting the generated AST accordingly) instead of actually inserting one. And likewise with respect to inserting semicolons. The spec isn't written in terms of that because it specifies effective behavior in terms of an unspecified parser design, instead of committing the implementor to a specific parser design.

On Tue, Aug 28, 2018 at 3:53 PM Benjamin Redelings <[hidden email]> wrote:
Hi,

Section 10.3 of the Haskell 2010 report describes the layout rule for
Haskell in terms of rules for inserting curly braces and semicolons into
the token stream.

I'm not sure of the theory behind implementing Note 5 in section 10.3. 
This says roughly that we insert a "}" token after token t[n] if the
tokens t[1]...t[n+1] have no valid parses, and the sequence of tokens
t[1] ... t[n] followed by "}" DO have a valid parse.

I think this allows you to write

f z = let x=2;y=3 in x+y+z

However, except for Note 5, it seems like you could do a simple
preprocessing step on the token stream to insert "{", ";", and "}"
before parsing.  Note 5 seems more complicated.  (I feel like I read a
note somewhere that said "read this and weep" and then failed to
implement Note 5.  But now I can't remember where I read this.)

In simple explanations like Write you a haskell
(http://dev.stephendiehl.com/fun/008_extended_parser.html) it seems that
the approach is simply to ignore Note 5.

Can anyone fill me in on

(a) what concepts are assumed by the report that would make it "obvious"
how Note 5 is supposed to be implemented?

(b) a simple (hopefully) implementation strategy for implementing it?
I'm looking for a language-independent approach for parsing.

I did see some mention of error-correcting parsers in the Parsec paper,
which might allow adding the "}" at a parse error.  But I see there is a
paper called " An error correcting parser for context free grammars that
takes less than cubic time".  The "less than cubic time" seems rather
worrisome, so I don't know if this is the right approach.

Also, when using parsec, it seems that you might get an idea how many
tokens are a valid prefix of the grammer when the token stream fails to
parse.  Hence, you could try inserting a "}" into the token stream at
the point of a parse error.  However, a naive approach would then
require re-parsing the whole stream from the beginning until each parse
error, which seems very expensive.

Thanks for any help!

-BenRI

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


--
brandon s allbery kf8nh

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Parsing Haskell with Layout

J. Garrett Morris-5
In reply to this post by Benjamin Redelings
On Tue, Aug 28, 2018 at 2:53 PM Benjamin Redelings
<[hidden email]> wrote:
> I'm not sure of the theory behind implementing Note 5 in section 10.3.
> This says roughly that we insert a "}" token after token t[n] if the
> tokens t[1]...t[n+1] have no valid parses, and the sequence of tokens
> t[1] ... t[n] followed by "}" DO have a valid parse.
>
> I think this allows you to write
>
> f z = let x=2;y=3 in x+y+z

Yes.

> (a) what concepts are assumed by the report that would make it "obvious"
> how Note 5 is supposed to be implemented?

I don't think the report is intending to suggest a particular
implementation approach.  GHC's is described at
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Parser.

> (b) a simple (hopefully) implementation strategy for implementing it?
> I'm looking for a language-independent approach for parsing.

Perhaps you would find Michael Adams "Principled parsing for
indentation-sensitive languages: Revisiting Landin’s offside rule"
helpful.  http://dx.doi.org/10.1145/2429069.2429129

That approach is implemented by the "indentation" library on Hackage,
for either Parsec or Trifecta.

 /g

--
Prosperum ac felix scelus virtus vocatur
 -- Seneca
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Parsing Haskell with Layout

Benjamin Redelings
Great, thanks!


On 08/29/2018 12:32 PM, J. Garrett Morris wrote:

> On Tue, Aug 28, 2018 at 2:53 PM Benjamin Redelings
> <[hidden email]> wrote:
>> I'm not sure of the theory behind implementing Note 5 in section 10.3.
>> This says roughly that we insert a "}" token after token t[n] if the
>> tokens t[1]...t[n+1] have no valid parses, and the sequence of tokens
>> t[1] ... t[n] followed by "}" DO have a valid parse.
>>
>> I think this allows you to write
>>
>> f z = let x=2;y=3 in x+y+z
> Yes.
>
>> (a) what concepts are assumed by the report that would make it "obvious"
>> how Note 5 is supposed to be implemented?
> I don't think the report is intending to suggest a particular
> implementation approach.  GHC's is described at
> https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Parser.
>
>> (b) a simple (hopefully) implementation strategy for implementing it?
>> I'm looking for a language-independent approach for parsing.
> Perhaps you would find Michael Adams "Principled parsing for
> indentation-sensitive languages: Revisiting Landin’s offside rule"
> helpful.  http://dx.doi.org/10.1145/2429069.2429129
>
> That approach is implemented by the "indentation" library on Hackage,
> for either Parsec or Trifecta.
>
>   /g
>

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.