How to do reversible parsing?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

How to do reversible parsing?

Haskell - Haskell-Cafe mailing list
Hi. How to do deterministic reversible parsing in Haskell? Please, point me to some libs or tools.
Or give some advice on how to write such lib or tool. (Also, I am interested in solutions for C or C++.)

Now let me explain my task in detail.

I am interested in formal math. I am going to develop various proof checkers. In particular, I want
to develop my PTS checker ( https://en.wikipedia.org/wiki/Pure_type_system ). Here is example of
existing PTS checker: http://web.archive.org/web/20180924133846/http://www.cs.kun.nl/~janz/yarrow/ .
Here is example of input this checker accepts (taken from manual):

Var (+) : Nat -> Nat -> Nat
Def double := \x:Nat. x+x
Def S := (+) one

Also, I want to develop something similar to Isabelle ( https://isabelle.in.tum.de/ ). Example of
Isabelle input: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html .

Also, I am going to write converters between various languages. For example, between input for some
existing PTS checker and my PTS checker.

In short, I want to write lot of parsers and pretty-printers for various languages. And thus I need
some parsing solution.

The solution should have the following features:

1. The parsing should be divided into lexing and parsing. I. e. usual lex + yacc model. Parsing should
be done using usual context-free grammars (CFGs). Left-recursive grammars should be supported (this
leaves out parser combinator libraries). All languages I am going to deal with fit this model. I. e.
I will deal with simple to parse languages, such as Pascal. Not complex ones, like C or C++.

2. Parsing should be deterministic. I. e. parser should check that CFG is unambiguous. Yes, I know
that this task is impossible to do on Turing machine. But at very least some partial solution will
go. For example, some algorithm, which will check that some reasonable set of CFGs is deterministic.
Or even brute force up to some limit will go. (Yes, this will not give absolutely correct results,
but this will go.) This is example of grammar for PTS in one of my checkers (start symbol is t0):

t1000 = id
t1000 = "(" t0")"
t999 = t999 t1000
t3 = t4 "::" t3
t3 = "%" id "::" t0 "." t3
t0 = "!!" id "::" t0 "." t0
t1 = t2 "==>" t1
t0 = t1
t1 = t2
t2 = t3
t3 = t4
t4 = t999
t999 = t1000

If I remember correctly, this grammar is deterministic (checked using brute force up to a limit).
I want solution which will verify that this grammar is deterministic.

All stages of parsing should be proved as deterministic (i. e. scanner too).

3. Parsing should be reversible. I. e. pretty-printing should be supported, too. And the tool should
guarantee that parsing and pretty-printing match. Now let me give more precise definition. Let's
assume we have these functions:

parse :: String -> Maybe X
print :: X -> Maybe String

As you can see, not every internal representation is printable. This is OK. For example, if internal
representation contains variable names that happen to equal reserved words, then, of course, this is
unprintable representation.

The tool should guarantee that:
a. Result of parsing should be printable, but printing not necessary should give same string. I. e.
if "parse s == Just x", then "print x /= Nothing". (But not necessary "print x == Just s".)
b. Parsing of result of printing should give same representation. I. e. if "print x == Just s", then
"parse s == Just x"

4. Not only parsing should generate parse tree (i. e. exact representation of derivation), but also
AST. AST differs from parse tree. For example, AST should not contain braces. I. e. "2 + (3 * 4)" and
"2 + 3 * 4" should give same AST. And the tool should be able parse string to AST and pretty print
AST back to string.

5. It should be possible to use CFG not known in compile time. Consider Isabelle language. Here is
again example of Isabelle text: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html . As you can
see the language is divided into inner language and outer language. Strings in inner language are
quoted. Outer language can add productions to inner language. Inner language cannot change itself.
"notation" and "translations" commands add productions to inner language. So inner language is not
known beforehand, it is created dynamically. There is no need to dynamically create scanner, i. e.
it should be possible to dynamically create parser (for inner language), but there is no such
requirement for scanner. Also, parser will not change in the middle of parsing given string.

Again: let's assume I write parser for my Isabelle analog. Scanner and parser for outer language is
fixed. Scanner for inner language is fixed, too. But parser for inner language is not. I read input
document and add productions to CFG of inner language on the fly. Every time I add production,
resulting CFG should be checked for ambiguity. And when I find inner language string in input, I
process it using just constructed parser for inner language.

--

Points 2 and 3 are the most important in this list. If there is solution which contains them alone,
this already will be very good.

Now let me list considered solutions.

* Usual combinator libraries (such as parsec) will not go. They don't work with left-recursive CFGs.
They don't check grammars for ambiguities. They don't do reversible parsing. But they allow creating
language on the fly, and this is good thing.

* Happy and alex will not go. As well as I know Happy can prove unambiguity of some very simple grammars
(i. e. for some set of simple grammars it will not report any conflicts and thus will prove they unambiguity).
It seems this is LR(1) set. But, as well as I know, Happy will report conflicts on PTS grammar given
above, i. e. Happy cannot prove its unambiguity. Also, Happy and Alex cannot pretty-print. And Happy
and Alex don't allow supplying CFG on the fly.

* Package "earley". Handle arbitrary CFGs, including left-recursive. This is good thing. It seems CFG
can be supplied in runtime. This is good thing, too. Gives all parse trees, thus we can verify that
count of parse trees equals to 1. This is good thing, too. Unfortunately, earley cannot verify CFS's
unambiguity before supplying input. Also, earley doesn't support pretty-printing.

* PEG libraries. Will not go, I want CFGs.

* I searched on Hackage reversible parsing libraries. Unfortunately, none of them guarantee
reversibility. I. e. it is very easy to fool these libraries and create parsers and printers not
satisfying equations I gave above. Also, this libraries doesn't have other features I mentioned

* http://augeas.net/ is very cool project. Unlike nearly everything I found in internet, Augeas
provides guaranteed reversible parsing and pretty-printing (!!!!!). Unfortunately, set of supported
target languages is very limited. Regular languages are supported. And some very limited set of CFGs
is supported. For example, it is possible to construct parser for JSON. But parser for Pascal seems
impossible. Augeas not designed for parsing programming languages. Only simple configuration
languages like JSON, XML, etc. Mutually recursive nonterminals are not supported. Also, Augeas
provides C API, not Haskell one.

* https://www.brics.dk/xsugar.html is very good. It provides guaranteed reversible translation
between arbitrary language and XML. And XSugar even verifies unambiguity of the CFG using some
sophisticated algorithm. Unfortunately, this algorithm still cannot prove unambiguity of PTS grammar
given above. Also, XSugar is not Haskell library. It is standalone Java app. So, I need to convert
some language to XML, then load this XML to my Haskell program and parse. Also, there is similar lib
https://www.brics.dk/Xact.html from same author, but it is for Java, not for Haskell.

I was able to find only 2 projects, which do guaranteed reversible parsing: mentioned Augeas and
XSugar/Xact. Unfortunately, both are not for Haskell. And both cannot prove unambiguity for my PTS
grammar.

So, please tell me about solution. Or give advice how to write one.

Also, let me describe my experience with various algorithms for checking ambiguity of grammar.

1. Try to construct LR(1) table. If there is no conflicts, then the grammar is unambiguous. It seems
this is algorithm used by bison and happy. Unfortunately it is too weak and cannot prove unambiguity
of my PTS grammar.
2. Try to construct LR(k) table. This algorithm is stronger than LR(1). I was unable to find
implementation of this algorithm in any language. I don't want to write such algorithm.
3. Schmitz's algorithm. http://www.lsv.fr/~schmitz/pub/expamb.pdf . Even stronger than LR(k). I
downloaded his bison fork. Unfortunately this bison fork can prove unambiguity for same set of
grammars original bison can (i. e. LR(1)). I. e. it seems Schmitz did not implemented his algorithm.
I don't want to implement his algorithm either.
4. Horizontal and vertical ambiguity checking. https://www.brics.dk/grammar/kruse.pdf . This is
algorithm used in XSugar. Unfortunately, it doesn't handle my PTS grammar.
5. Brute force up to some limit. Doesn't give unambiguity guarantee. But I like it, and I will use
it if I don't find anything else. It is the only algorithm which can handle my PTS grammar (and
give enough amount of guarantee), not counting LR(k) and Schmitz's algorithm (I was not able to find
implementations of both).

I will try to write tool I want, I will write about my progress to this list.

==
Askar Safin
https://github.com/safinaskar
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: How to do reversible parsing?

Johannes Waldmann-2
Rendell and Ostermann,
Invertible syntax descriptions,
Haskell Symposium '10
https://doi.org/10.1145/1863523.1863525
https://hackage.haskell.org/package/invertible-syntax

For the general background, see, e.g.,

Grohne and Voigtländer 2017
Formalizing semantic bidirectionalization  ...
https://doi.org/10.1016/j.jlamp.2016.04.002

And there's a workshop series
on Bidirectional Transformations (from 2012 onwards)

- J.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: How to do reversible parsing?

Michal J Gajda
In reply to this post by Haskell - Haskell-Cafe mailing list
Hi Askar,

> Hi. How to do deterministic reversible parsing in Haskell? Please, point me to some libs or tools.
> Or give some advice on how to write such lib or tool. (Also, I am interested in solutions for C or C++.)
>
> Now let me explain my task in detail.

I liked the bidirectional applicatives and monads described in here:

1. Composing bidirectional programs monadically (with appendices) -
Li-yao Xia, Dominic Orchard, Meng Wang
https://arxiv.org/abs/1902.06950

You can use it for both bidirectional parsing/pretty pretting and type
inference/type checking.

2. BiGuL is a bit more complicated:
https://bitbucket.org/prl_tokyo/bigul/src/master/
--
  Cheers
    Michał
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: How to do reversible parsing?

Johannes Waldmann-2
In reply to this post by Johannes Waldmann-2
Perhaps you don't even need to pretty-print?

If you're just echoing parts of the input,
then have the parser add source range annotations to the AST.
(Example: see Section A.2 of https://arxiv.org/abs/2009.01326)

Of course, if you are generating code, then this does not work.

- J.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: How to do reversible parsing?

Mario
In reply to this post by Haskell - Haskell-Cafe mailing list
That's one ambitious research program you've taken on. I'm not aware of
any complete solution, nor even anything close to it. I can only provide
a couple of potentially useful pieces. I'm the author of the
grammatical-parsers and construct Haskell libraries. The former can
handle left-recursive grammars, the latter is bidirectional. I hope you
can find some inspiration there.

[1] http://hackage.haskell.org/package/grammatical-parsers
[2] http://hackage.haskell.org/package/construct


On 2021-01-02 6:59 p.m., Askar Safin via Haskell-Cafe wrote:

> Hi. How to do deterministic reversible parsing in Haskell? Please, point me to some libs or tools.
> Or give some advice on how to write such lib or tool. (Also, I am interested in solutions for C or C++.)
>
> Now let me explain my task in detail.
>
> I am interested in formal math. I am going to develop various proof checkers. In particular, I want
> to develop my PTS checker ( https://en.wikipedia.org/wiki/Pure_type_system ). Here is example of
> existing PTS checker: http://web.archive.org/web/20180924133846/http://www.cs.kun.nl/~janz/yarrow/ .
> Here is example of input this checker accepts (taken from manual):
>
> Var (+) : Nat -> Nat -> Nat
> Def double := \x:Nat. x+x
> Def S := (+) one
>
> Also, I want to develop something similar to Isabelle ( https://isabelle.in.tum.de/ ). Example of
> Isabelle input: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html .
>
> Also, I am going to write converters between various languages. For example, between input for some
> existing PTS checker and my PTS checker.
>
> In short, I want to write lot of parsers and pretty-printers for various languages. And thus I need
> some parsing solution.
>
> The solution should have the following features:
>
> 1. The parsing should be divided into lexing and parsing. I. e. usual lex + yacc model. Parsing should
> be done using usual context-free grammars (CFGs). Left-recursive grammars should be supported (this
> leaves out parser combinator libraries). All languages I am going to deal with fit this model. I. e.
> I will deal with simple to parse languages, such as Pascal. Not complex ones, like C or C++.
>
> 2. Parsing should be deterministic. I. e. parser should check that CFG is unambiguous. Yes, I know
> that this task is impossible to do on Turing machine. But at very least some partial solution will
> go. For example, some algorithm, which will check that some reasonable set of CFGs is deterministic.
> Or even brute force up to some limit will go. (Yes, this will not give absolutely correct results,
> but this will go.) This is example of grammar for PTS in one of my checkers (start symbol is t0):
>
> t1000 = id
> t1000 = "(" t0")"
> t999 = t999 t1000
> t3 = t4 "::" t3
> t3 = "%" id "::" t0 "." t3
> t0 = "!!" id "::" t0 "." t0
> t1 = t2 "==>" t1
> t0 = t1
> t1 = t2
> t2 = t3
> t3 = t4
> t4 = t999
> t999 = t1000
>
> If I remember correctly, this grammar is deterministic (checked using brute force up to a limit).
> I want solution which will verify that this grammar is deterministic.
>
> All stages of parsing should be proved as deterministic (i. e. scanner too).
>
> 3. Parsing should be reversible. I. e. pretty-printing should be supported, too. And the tool should
> guarantee that parsing and pretty-printing match. Now let me give more precise definition. Let's
> assume we have these functions:
>
> parse :: String -> Maybe X
> print :: X -> Maybe String
>
> As you can see, not every internal representation is printable. This is OK. For example, if internal
> representation contains variable names that happen to equal reserved words, then, of course, this is
> unprintable representation.
>
> The tool should guarantee that:
> a. Result of parsing should be printable, but printing not necessary should give same string. I. e.
> if "parse s == Just x", then "print x /= Nothing". (But not necessary "print x == Just s".)
> b. Parsing of result of printing should give same representation. I. e. if "print x == Just s", then
> "parse s == Just x"
>
> 4. Not only parsing should generate parse tree (i. e. exact representation of derivation), but also
> AST. AST differs from parse tree. For example, AST should not contain braces. I. e. "2 + (3 * 4)" and
> "2 + 3 * 4" should give same AST. And the tool should be able parse string to AST and pretty print
> AST back to string.
>
> 5. It should be possible to use CFG not known in compile time. Consider Isabelle language. Here is
> again example of Isabelle text: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html . As you can
> see the language is divided into inner language and outer language. Strings in inner language are
> quoted. Outer language can add productions to inner language. Inner language cannot change itself.
> "notation" and "translations" commands add productions to inner language. So inner language is not
> known beforehand, it is created dynamically. There is no need to dynamically create scanner, i. e.
> it should be possible to dynamically create parser (for inner language), but there is no such
> requirement for scanner. Also, parser will not change in the middle of parsing given string.
>
> Again: let's assume I write parser for my Isabelle analog. Scanner and parser for outer language is
> fixed. Scanner for inner language is fixed, too. But parser for inner language is not. I read input
> document and add productions to CFG of inner language on the fly. Every time I add production,
> resulting CFG should be checked for ambiguity. And when I find inner language string in input, I
> process it using just constructed parser for inner language.
>
> --
>
> Points 2 and 3 are the most important in this list. If there is solution which contains them alone,
> this already will be very good.
>
> Now let me list considered solutions.
>
> * Usual combinator libraries (such as parsec) will not go. They don't work with left-recursive CFGs.
> They don't check grammars for ambiguities. They don't do reversible parsing. But they allow creating
> language on the fly, and this is good thing.
>
> * Happy and alex will not go. As well as I know Happy can prove unambiguity of some very simple grammars
> (i. e. for some set of simple grammars it will not report any conflicts and thus will prove they unambiguity).
> It seems this is LR(1) set. But, as well as I know, Happy will report conflicts on PTS grammar given
> above, i. e. Happy cannot prove its unambiguity. Also, Happy and Alex cannot pretty-print. And Happy
> and Alex don't allow supplying CFG on the fly.
>
> * Package "earley". Handle arbitrary CFGs, including left-recursive. This is good thing. It seems CFG
> can be supplied in runtime. This is good thing, too. Gives all parse trees, thus we can verify that
> count of parse trees equals to 1. This is good thing, too. Unfortunately, earley cannot verify CFS's
> unambiguity before supplying input. Also, earley doesn't support pretty-printing.
>
> * PEG libraries. Will not go, I want CFGs.
>
> * I searched on Hackage reversible parsing libraries. Unfortunately, none of them guarantee
> reversibility. I. e. it is very easy to fool these libraries and create parsers and printers not
> satisfying equations I gave above. Also, this libraries doesn't have other features I mentioned
>
> * http://augeas.net/ is very cool project. Unlike nearly everything I found in internet, Augeas
> provides guaranteed reversible parsing and pretty-printing (!!!!!). Unfortunately, set of supported
> target languages is very limited. Regular languages are supported. And some very limited set of CFGs
> is supported. For example, it is possible to construct parser for JSON. But parser for Pascal seems
> impossible. Augeas not designed for parsing programming languages. Only simple configuration
> languages like JSON, XML, etc. Mutually recursive nonterminals are not supported. Also, Augeas
> provides C API, not Haskell one.
>
> * https://www.brics.dk/xsugar.html is very good. It provides guaranteed reversible translation
> between arbitrary language and XML. And XSugar even verifies unambiguity of the CFG using some
> sophisticated algorithm. Unfortunately, this algorithm still cannot prove unambiguity of PTS grammar
> given above. Also, XSugar is not Haskell library. It is standalone Java app. So, I need to convert
> some language to XML, then load this XML to my Haskell program and parse. Also, there is similar lib
> https://www.brics.dk/Xact.html from same author, but it is for Java, not for Haskell.
>
> I was able to find only 2 projects, which do guaranteed reversible parsing: mentioned Augeas and
> XSugar/Xact. Unfortunately, both are not for Haskell. And both cannot prove unambiguity for my PTS
> grammar.
>
> So, please tell me about solution. Or give advice how to write one.
>
> Also, let me describe my experience with various algorithms for checking ambiguity of grammar.
>
> 1. Try to construct LR(1) table. If there is no conflicts, then the grammar is unambiguous. It seems
> this is algorithm used by bison and happy. Unfortunately it is too weak and cannot prove unambiguity
> of my PTS grammar.
> 2. Try to construct LR(k) table. This algorithm is stronger than LR(1). I was unable to find
> implementation of this algorithm in any language. I don't want to write such algorithm.
> 3. Schmitz's algorithm. http://www.lsv.fr/~schmitz/pub/expamb.pdf . Even stronger than LR(k). I
> downloaded his bison fork. Unfortunately this bison fork can prove unambiguity for same set of
> grammars original bison can (i. e. LR(1)). I. e. it seems Schmitz did not implemented his algorithm.
> I don't want to implement his algorithm either.
> 4. Horizontal and vertical ambiguity checking. https://www.brics.dk/grammar/kruse.pdf . This is
> algorithm used in XSugar. Unfortunately, it doesn't handle my PTS grammar.
> 5. Brute force up to some limit. Doesn't give unambiguity guarantee. But I like it, and I will use
> it if I don't find anything else. It is the only algorithm which can handle my PTS grammar (and
> give enough amount of guarantee), not counting LR(k) and Schmitz's algorithm (I was not able to find
> implementations of both).
>
> I will try to write tool I want, I will write about my progress to this list.
>
> ==
> Askar Safin
> https://github.com/safinaskar
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: How to do reversible parsing?

Haskell - Haskell-Cafe mailing list
I poked a bit at this problem late last year[1][2]. I tried to go all the way to an abstraction for monoidal functors, but that might have been aiming too high. Instead, it could be worthwhile to create subclasses of Invariant[3] which stand in for Applicative/Divisible and Alternative/Decidable (and probably their unitless superclasses also). This would give you something like Pickler Combinators[4] on a foundation that looks a bit more like normal Applicative parsers.

The other thing I looked at last year was profunctors `P a a` where you use the same type variable in both positions. Instead of following the "monadic profunctors" work of Xia, Orchard, and Wang[5], I had some interesting discussions with Tom Ellis about expanding the hierarchy of the `product-profunctors` package[6][7][8]. (Which I really should revisit and help tie off.)

[1]: http://jackkelly.name/blog/archives/2020/08/19/abstracting_over_applicative_alternative_divisible_and_decidable
[2]: http://jackkelly.name/blog/archives/2020/11/03/profunctor_decoders_optical_decoders
[3]: https://hackage.haskell.org/package/invariant
[4]: https://www.microsoft.com/en-us/research/wp-content/uploads/2004/01/picklercombinators.pdf
[5]: https://poisson.chat/esop19/composing-bidir-prog-monadically.pdf
[6]: https://github.com/tomjaguarpaw/product-profunctors/issues/51
[7]: https://github.com/tomjaguarpaw/product-profunctors/pull/53
[8]: https://github.com/tomjaguarpaw/product-profunctors/pull/54

HTH,

-- Jack

January 5, 2021 4:23 AM, "Mario" <[hidden email]> wrote:

> That's one ambitious research program you've taken on. I'm not aware of
> any complete solution, nor even anything close to it. I can only provide
> a couple of potentially useful pieces. I'm the author of the
> grammatical-parsers and construct Haskell libraries. The former can
> handle left-recursive grammars, the latter is bidirectional. I hope you
> can find some inspiration there.
>
> [1] http://hackage.haskell.org/package/grammatical-parsers
> [2] http://hackage.haskell.org/package/construct
>
> On 2021-01-02 6:59 p.m., Askar Safin via Haskell-Cafe wrote:
>
>> Hi. How to do deterministic reversible parsing in Haskell? Please, point me to some libs or tools.
>> Or give some advice on how to write such lib or tool. (Also, I am interested in solutions for C or
>> C++.)
>>
>> Now let me explain my task in detail.
>>
>> I am interested in formal math. I am going to develop various proof checkers. In particular, I want
>> to develop my PTS checker ( https://en.wikipedia.org/wiki/Pure_type_system ). Here is example of
>> existing PTS checker: http://web.archive.org/web/20180924133846/http://www.cs.kun.nl/~janz/yarrow .
>> Here is example of input this checker accepts (taken from manual):
>>
>> Var (+) : Nat -> Nat -> Nat
>> Def double := \x:Nat. x+x
>> Def S := (+) one
>>
>> Also, I want to develop something similar to Isabelle ( https://isabelle.in.tum.de ). Example of
>> Isabelle input: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html .
>>
>> Also, I am going to write converters between various languages. For example, between input for some
>> existing PTS checker and my PTS checker.
>>
>> In short, I want to write lot of parsers and pretty-printers for various languages. And thus I need
>> some parsing solution.
>>
>> The solution should have the following features:
>>
>> 1. The parsing should be divided into lexing and parsing. I. e. usual lex + yacc model. Parsing
>> should
>> be done using usual context-free grammars (CFGs). Left-recursive grammars should be supported (this
>> leaves out parser combinator libraries). All languages I am going to deal with fit this model. I.
>> e.
>> I will deal with simple to parse languages, such as Pascal. Not complex ones, like C or C++.
>>
>> 2. Parsing should be deterministic. I. e. parser should check that CFG is unambiguous. Yes, I know
>> that this task is impossible to do on Turing machine. But at very least some partial solution will
>> go. For example, some algorithm, which will check that some reasonable set of CFGs is
>> deterministic.
>> Or even brute force up to some limit will go. (Yes, this will not give absolutely correct results,
>> but this will go.) This is example of grammar for PTS in one of my checkers (start symbol is t0):
>>
>> t1000 = id
>> t1000 = "(" t0")"
>> t999 = t999 t1000
>> t3 = t4 "::" t3
>> t3 = "%" id "::" t0 "." t3
>> t0 = "!!" id "::" t0 "." t0
>> t1 = t2 "==>" t1
>> t0 = t1
>> t1 = t2
>> t2 = t3
>> t3 = t4
>> t4 = t999
>> t999 = t1000
>>
>> If I remember correctly, this grammar is deterministic (checked using brute force up to a limit).
>> I want solution which will verify that this grammar is deterministic.
>>
>> All stages of parsing should be proved as deterministic (i. e. scanner too).
>>
>> 3. Parsing should be reversible. I. e. pretty-printing should be supported, too. And the tool
>> should
>> guarantee that parsing and pretty-printing match. Now let me give more precise definition. Let's
>> assume we have these functions:
>>
>> parse :: String -> Maybe X
>> print :: X -> Maybe String
>>
>> As you can see, not every internal representation is printable. This is OK. For example, if
>> internal
>> representation contains variable names that happen to equal reserved words, then, of course, this
>> is
>> unprintable representation.
>>
>> The tool should guarantee that:
>> a. Result of parsing should be printable, but printing not necessary should give same string. I. e.
>> if "parse s == Just x", then "print x /= Nothing". (But not necessary "print x == Just s".)
>> b. Parsing of result of printing should give same representation. I. e. if "print x == Just s",
>> then
>> "parse s == Just x"
>>
>> 4. Not only parsing should generate parse tree (i. e. exact representation of derivation), but also
>> AST. AST differs from parse tree. For example, AST should not contain braces. I. e. "2 + (3 * 4)"
>> and
>> "2 + 3 * 4" should give same AST. And the tool should be able parse string to AST and pretty print
>> AST back to string.
>>
>> 5. It should be possible to use CFG not known in compile time. Consider Isabelle language. Here is
>> again example of Isabelle text: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html . As you
>> can
>> see the language is divided into inner language and outer language. Strings in inner language are
>> quoted. Outer language can add productions to inner language. Inner language cannot change itself.
>> "notation" and "translations" commands add productions to inner language. So inner language is not
>> known beforehand, it is created dynamically. There is no need to dynamically create scanner, i. e.
>> it should be possible to dynamically create parser (for inner language), but there is no such
>> requirement for scanner. Also, parser will not change in the middle of parsing given string.
>>
>> Again: let's assume I write parser for my Isabelle analog. Scanner and parser for outer language is
>> fixed. Scanner for inner language is fixed, too. But parser for inner language is not. I read input
>> document and add productions to CFG of inner language on the fly. Every time I add production,
>> resulting CFG should be checked for ambiguity. And when I find inner language string in input, I
>> process it using just constructed parser for inner language.
>>
>> --
>>
>> Points 2 and 3 are the most important in this list. If there is solution which contains them alone,
>> this already will be very good.
>>
>> Now let me list considered solutions.
>>
>> * Usual combinator libraries (such as parsec) will not go. They don't work with left-recursive
>> CFGs.
>> They don't check grammars for ambiguities. They don't do reversible parsing. But they allow
>> creating
>> language on the fly, and this is good thing.
>>
>> * Happy and alex will not go. As well as I know Happy can prove unambiguity of some very simple
>> grammars
>> (i. e. for some set of simple grammars it will not report any conflicts and thus will prove they
>> unambiguity).
>> It seems this is LR(1) set. But, as well as I know, Happy will report conflicts on PTS grammar
>> given
>> above, i. e. Happy cannot prove its unambiguity. Also, Happy and Alex cannot pretty-print. And
>> Happy
>> and Alex don't allow supplying CFG on the fly.
>>
>> * Package "earley". Handle arbitrary CFGs, including left-recursive. This is good thing. It seems
>> CFG
>> can be supplied in runtime. This is good thing, too. Gives all parse trees, thus we can verify that
>> count of parse trees equals to 1. This is good thing, too. Unfortunately, earley cannot verify
>> CFS's
>> unambiguity before supplying input. Also, earley doesn't support pretty-printing.
>>
>> * PEG libraries. Will not go, I want CFGs.
>>
>> * I searched on Hackage reversible parsing libraries. Unfortunately, none of them guarantee
>> reversibility. I. e. it is very easy to fool these libraries and create parsers and printers not
>> satisfying equations I gave above. Also, this libraries doesn't have other features I mentioned
>>
>> * http://augeas.net is very cool project. Unlike nearly everything I found in internet, Augeas
>> provides guaranteed reversible parsing and pretty-printing (!!!!!). Unfortunately, set of supported
>> target languages is very limited. Regular languages are supported. And some very limited set of
>> CFGs
>> is supported. For example, it is possible to construct parser for JSON. But parser for Pascal seems
>> impossible. Augeas not designed for parsing programming languages. Only simple configuration
>> languages like JSON, XML, etc. Mutually recursive nonterminals are not supported. Also, Augeas
>> provides C API, not Haskell one.
>>
>> * https://www.brics.dk/xsugar.html is very good. It provides guaranteed reversible translation
>> between arbitrary language and XML. And XSugar even verifies unambiguity of the CFG using some
>> sophisticated algorithm. Unfortunately, this algorithm still cannot prove unambiguity of PTS
>> grammar
>> given above. Also, XSugar is not Haskell library. It is standalone Java app. So, I need to convert
>> some language to XML, then load this XML to my Haskell program and parse. Also, there is similar
>> lib
>> https://www.brics.dk/Xact.html from same author, but it is for Java, not for Haskell.
>>
>> I was able to find only 2 projects, which do guaranteed reversible parsing: mentioned Augeas and
>> XSugar/Xact. Unfortunately, both are not for Haskell. And both cannot prove unambiguity for my PTS
>> grammar.
>>
>> So, please tell me about solution. Or give advice how to write one.
>>
>> Also, let me describe my experience with various algorithms for checking ambiguity of grammar.
>>
>> 1. Try to construct LR(1) table. If there is no conflicts, then the grammar is unambiguous. It
>> seems
>> this is algorithm used by bison and happy. Unfortunately it is too weak and cannot prove
>> unambiguity
>> of my PTS grammar.
>> 2. Try to construct LR(k) table. This algorithm is stronger than LR(1). I was unable to find
>> implementation of this algorithm in any language. I don't want to write such algorithm.
>> 3. Schmitz's algorithm. http://www.lsv.fr/~schmitz/pub/expamb.pdf . Even stronger than LR(k). I
>> downloaded his bison fork. Unfortunately this bison fork can prove unambiguity for same set of
>> grammars original bison can (i. e. LR(1)). I. e. it seems Schmitz did not implemented his
>> algorithm.
>> I don't want to implement his algorithm either.
>> 4. Horizontal and vertical ambiguity checking. https://www.brics.dk/grammar/kruse.pdf . This is
>> algorithm used in XSugar. Unfortunately, it doesn't handle my PTS grammar.
>> 5. Brute force up to some limit. Doesn't give unambiguity guarantee. But I like it, and I will use
>> it if I don't find anything else. It is the only algorithm which can handle my PTS grammar (and
>> give enough amount of guarantee), not counting LR(k) and Schmitz's algorithm (I was not able to
>> find
>> implementations of both).
>>
>> I will try to write tool I want, I will write about my progress to this list.
>>
>> ==
>> Askar Safin
>> https://github.com/safinaskar
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: How to do reversible parsing?

Jon Purdy
In reply to this post by Haskell - Haskell-Cafe mailing list
You may be interested in the profunctor technique of bidirectional parsing/serialisation, a basic version of which is implemented in the ‘codec’ package: <https://hackage.haskell.org/package/codec>. It was originally intended for binary and textual data formats, but could be used for a conventional textual language as well. I have also used this technique at work to develop a similar library, and found it quite effective & performant once you get the (somewhat delicate) internals correct.

As you suggest, it’s generally not feasible to have “encode linearity” (‘decode . encode = id’; printing omits no information from output) and “decode linearity” (‘encode . decode = id’; parsing discards no info from input). However, it’s worth verifying both “encode preservation” (‘encode . decode . encode = encode’; parsing preserves all info that printing does) and “decode preservation” (‘decode . encode . decode = decode’; printing preserves all info that parsing does).

On Sat, Jan 2, 2021 at 4:02 PM Askar Safin via Haskell-Cafe <[hidden email]> wrote:
Hi. How to do deterministic reversible parsing in Haskell? Please, point me to some libs or tools.
Or give some advice on how to write such lib or tool. (Also, I am interested in solutions for C or C++.)

Now let me explain my task in detail.

I am interested in formal math. I am going to develop various proof checkers. In particular, I want
to develop my PTS checker ( https://en.wikipedia.org/wiki/Pure_type_system ). Here is example of
existing PTS checker: http://web.archive.org/web/20180924133846/http://www.cs.kun.nl/~janz/yarrow/ .
Here is example of input this checker accepts (taken from manual):

Var (+) : Nat -> Nat -> Nat
Def double := \x:Nat. x+x
Def S := (+) one

Also, I want to develop something similar to Isabelle ( https://isabelle.in.tum.de/ ). Example of
Isabelle input: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html .

Also, I am going to write converters between various languages. For example, between input for some
existing PTS checker and my PTS checker.

In short, I want to write lot of parsers and pretty-printers for various languages. And thus I need
some parsing solution.

The solution should have the following features:

1. The parsing should be divided into lexing and parsing. I. e. usual lex + yacc model. Parsing should
be done using usual context-free grammars (CFGs). Left-recursive grammars should be supported (this
leaves out parser combinator libraries). All languages I am going to deal with fit this model. I. e.
I will deal with simple to parse languages, such as Pascal. Not complex ones, like C or C++.

2. Parsing should be deterministic. I. e. parser should check that CFG is unambiguous. Yes, I know
that this task is impossible to do on Turing machine. But at very least some partial solution will
go. For example, some algorithm, which will check that some reasonable set of CFGs is deterministic.
Or even brute force up to some limit will go. (Yes, this will not give absolutely correct results,
but this will go.) This is example of grammar for PTS in one of my checkers (start symbol is t0):

t1000 = id
t1000 = "(" t0")"
t999 = t999 t1000
t3 = t4 "::" t3
t3 = "%" id "::" t0 "." t3
t0 = "!!" id "::" t0 "." t0
t1 = t2 "==>" t1
t0 = t1
t1 = t2
t2 = t3
t3 = t4
t4 = t999
t999 = t1000

If I remember correctly, this grammar is deterministic (checked using brute force up to a limit).
I want solution which will verify that this grammar is deterministic.

All stages of parsing should be proved as deterministic (i. e. scanner too).

3. Parsing should be reversible. I. e. pretty-printing should be supported, too. And the tool should
guarantee that parsing and pretty-printing match. Now let me give more precise definition. Let's
assume we have these functions:

parse :: String -> Maybe X
print :: X -> Maybe String

As you can see, not every internal representation is printable. This is OK. For example, if internal
representation contains variable names that happen to equal reserved words, then, of course, this is
unprintable representation.

The tool should guarantee that:
a. Result of parsing should be printable, but printing not necessary should give same string. I. e.
if "parse s == Just x", then "print x /= Nothing". (But not necessary "print x == Just s".)
b. Parsing of result of printing should give same representation. I. e. if "print x == Just s", then
"parse s == Just x"

4. Not only parsing should generate parse tree (i. e. exact representation of derivation), but also
AST. AST differs from parse tree. For example, AST should not contain braces. I. e. "2 + (3 * 4)" and
"2 + 3 * 4" should give same AST. And the tool should be able parse string to AST and pretty print
AST back to string.

5. It should be possible to use CFG not known in compile time. Consider Isabelle language. Here is
again example of Isabelle text: https://isabelle.in.tum.de/dist/library/HOL/HOL/HOL.html . As you can
see the language is divided into inner language and outer language. Strings in inner language are
quoted. Outer language can add productions to inner language. Inner language cannot change itself.
"notation" and "translations" commands add productions to inner language. So inner language is not
known beforehand, it is created dynamically. There is no need to dynamically create scanner, i. e.
it should be possible to dynamically create parser (for inner language), but there is no such
requirement for scanner. Also, parser will not change in the middle of parsing given string.

Again: let's assume I write parser for my Isabelle analog. Scanner and parser for outer language is
fixed. Scanner for inner language is fixed, too. But parser for inner language is not. I read input
document and add productions to CFG of inner language on the fly. Every time I add production,
resulting CFG should be checked for ambiguity. And when I find inner language string in input, I
process it using just constructed parser for inner language.

--

Points 2 and 3 are the most important in this list. If there is solution which contains them alone,
this already will be very good.

Now let me list considered solutions.

* Usual combinator libraries (such as parsec) will not go. They don't work with left-recursive CFGs.
They don't check grammars for ambiguities. They don't do reversible parsing. But they allow creating
language on the fly, and this is good thing.

* Happy and alex will not go. As well as I know Happy can prove unambiguity of some very simple grammars
(i. e. for some set of simple grammars it will not report any conflicts and thus will prove they unambiguity).
It seems this is LR(1) set. But, as well as I know, Happy will report conflicts on PTS grammar given
above, i. e. Happy cannot prove its unambiguity. Also, Happy and Alex cannot pretty-print. And Happy
and Alex don't allow supplying CFG on the fly.

* Package "earley". Handle arbitrary CFGs, including left-recursive. This is good thing. It seems CFG
can be supplied in runtime. This is good thing, too. Gives all parse trees, thus we can verify that
count of parse trees equals to 1. This is good thing, too. Unfortunately, earley cannot verify CFS's
unambiguity before supplying input. Also, earley doesn't support pretty-printing.

* PEG libraries. Will not go, I want CFGs.

* I searched on Hackage reversible parsing libraries. Unfortunately, none of them guarantee
reversibility. I. e. it is very easy to fool these libraries and create parsers and printers not
satisfying equations I gave above. Also, this libraries doesn't have other features I mentioned

* http://augeas.net/ is very cool project. Unlike nearly everything I found in internet, Augeas
provides guaranteed reversible parsing and pretty-printing (!!!!!). Unfortunately, set of supported
target languages is very limited. Regular languages are supported. And some very limited set of CFGs
is supported. For example, it is possible to construct parser for JSON. But parser for Pascal seems
impossible. Augeas not designed for parsing programming languages. Only simple configuration
languages like JSON, XML, etc. Mutually recursive nonterminals are not supported. Also, Augeas
provides C API, not Haskell one.

* https://www.brics.dk/xsugar.html is very good. It provides guaranteed reversible translation
between arbitrary language and XML. And XSugar even verifies unambiguity of the CFG using some
sophisticated algorithm. Unfortunately, this algorithm still cannot prove unambiguity of PTS grammar
given above. Also, XSugar is not Haskell library. It is standalone Java app. So, I need to convert
some language to XML, then load this XML to my Haskell program and parse. Also, there is similar lib
https://www.brics.dk/Xact.html from same author, but it is for Java, not for Haskell.

I was able to find only 2 projects, which do guaranteed reversible parsing: mentioned Augeas and
XSugar/Xact. Unfortunately, both are not for Haskell. And both cannot prove unambiguity for my PTS
grammar.

So, please tell me about solution. Or give advice how to write one.

Also, let me describe my experience with various algorithms for checking ambiguity of grammar.

1. Try to construct LR(1) table. If there is no conflicts, then the grammar is unambiguous. It seems
this is algorithm used by bison and happy. Unfortunately it is too weak and cannot prove unambiguity
of my PTS grammar.
2. Try to construct LR(k) table. This algorithm is stronger than LR(1). I was unable to find
implementation of this algorithm in any language. I don't want to write such algorithm.
3. Schmitz's algorithm. http://www.lsv.fr/~schmitz/pub/expamb.pdf . Even stronger than LR(k). I
downloaded his bison fork. Unfortunately this bison fork can prove unambiguity for same set of
grammars original bison can (i. e. LR(1)). I. e. it seems Schmitz did not implemented his algorithm.
I don't want to implement his algorithm either.
4. Horizontal and vertical ambiguity checking. https://www.brics.dk/grammar/kruse.pdf . This is
algorithm used in XSugar. Unfortunately, it doesn't handle my PTS grammar.
5. Brute force up to some limit. Doesn't give unambiguity guarantee. But I like it, and I will use
it if I don't find anything else. It is the only algorithm which can handle my PTS grammar (and
give enough amount of guarantee), not counting LR(k) and Schmitz's algorithm (I was not able to find
implementations of both).

I will try to write tool I want, I will write about my progress to this list.

==
Askar Safin
https://github.com/safinaskar
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: How to do reversible parsing?

Haskell - Haskell-Cafe mailing list
In reply to this post by Johannes Waldmann-2
> Rendell and Ostermann,
> Invertible syntax descriptions,
> Haskell Symposium '10
> https://doi.org/10.1145/1863523.1863525
> https://hackage.haskell.org/package/invertible-syntax
Hi. Thanks a lot for answer.

This package does not build with ghc 8.8.4, because it did not implemented MonadFail proposal.
This shows that (unfortunately) nobody cares that reversible parsing actually works.
But I was able to build the package with ghc 8.0.1.
I tried to break the package, i. e. I tried to create grammar such that parsing and printing will not match.
I was unable to do this, so it seems the package really guarantees reversible parsing.
It seems I overlooked it when I searched Hackage.
I will probably use this package.
But it doesn't check grammars for ambiguity. I will probably use this package together with my brute force
algorithm.

>Perhaps you don't even need to pretty-print?
I need printing.

==
Askar Safin
https://github.com/safinaskar
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: How to do reversible parsing?

Haskell - Haskell-Cafe mailing list
In reply to this post by Johannes Waldmann-2
> https://hackage.haskell.org/package/invertible-syntax
Hi. My first impression about this package was wrong. The package doesn't check
that parsing is deterministic. Parser returns list of results instead of just one result.
I. e. we cannot be sure about determinism before actual parsing. So, this package
will not go
==
Askar Safin
https://github.com/safinaskar
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: How to do reversible parsing?

Johannes Waldmann-2
> (unfortunately) nobody cares that reversible parsing actually works.

Well what about
https://github.com/kolmodin/binary/blob/master/tests/QC.hs#L50
is this not "caring for reversibility"? (in a similar context)


> we cannot be sure about determinism

Assume you come up with a deterministic grammar for Isabelle (say).
How do you want to be sure it is the right one?

Would it not be a good idea to use the given grammar
(extract from sources of tools, e.g.,
https://isabelle-dev.sketis.net/source/isabelle/browse/default/src/Pure/Isar/parse_spec.ML
- this is already a combinator parser? or from
specification/documentation, e.g.,
http://isabelle.in.tum.de/dist/Isabelle2020/doc/isar-ref.pdf ?)


Best regards, J.


NB: full Isabelle seems quite the challenge to parse: isar-ref 8.4.5
"... parsing may be ambiguous. Isabelle lets the Earley parser
enumerate all possible parse trees... Terms that cannot be
type-checked are filtered out. Unlike regular type reconstruction, ...
the filtering stage only ..."
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: How to do reversible parsing?

Haskell - Haskell-Cafe mailing list
Hi.

> How do you want to be sure it is the right one?
I don't want to parse Isabelle code. I want to create language similar to Isabelle.

==
Askar Safin
https://github.com/safinaskar
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: How to do reversible parsing?

Haskell - Haskell-Cafe mailing list
In reply to this post by Haskell - Haskell-Cafe mailing list
Hi. How to do reversible converting between parse tree and AST?

My final task can be thought as the following reversible computation:

stream of chars <---> stream of tokens <---> parse tree <---> AST

On first piece (chars <---> tokens): I will probably just give up on
reversible lexing, so the problem is gone. :) I will write separate lexer and
printer.

On second piece (tokens <---> parse tree): printing is easy. For parsing I will
use package "Earley" from Hackage. And I will use brute force to ensure
unambiguity.

On third piece (parse tree <---> AST): it seems to be last remaining piece. I don't
know how to get both translators in the same time.

Consider usual arithmetic language, i. e. a * (b + c). This is how parse tree may look:

data Term = Id Id | Braces LBrace Expr RBrace
data Product = Term Term | Times Product Times Term
data Expr = Product Product | Plus Expr Plus Product

As you can see this data types are direct mapping of grammar.

And this is how corresponding AST looks:

data AExpr = AId Id | ATimes AExpr AExpr | APlus AExpr AExpr

Is there some way to simultaneously write translators in both directions?

==
Askar Safin
https://github.com/safinaskar
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.