Why is Megaparsec treating these two operators differently?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Why is Megaparsec treating these two operators differently?

Jeffrey Brown
I used Text.Megaparsec.Expr to write a minimal (56 lines) 2-operator recursive parser. It's on github[1]. It outputs a binary tree of the following form:

    data AExpr = Var String | Pair AExpr AExpr deriving (Show)

The operator table supplied to makeExprParser defines two operators, # and ##. ## binds after #, but both of them refer to the same function, the Pair constructor of the AExpr data type:

    aOperators :: [[Operator Parser AExpr]]
    aOperators =
      [ [ InfixN # symbol "#" *> pure (Pair) ]
      , [ InfixN # symbol "##" *> pure (Pair) ]
      ]

The # operator works in isolation:

    > parseMaybe aExpr "a # b"
    Just (Pair (Var "a") (Var "b"))

Parentheses work with the # operator:

    > parseMaybe aExpr "(a # b) # (c # d)"
    Just (Pair (Pair (Var "a") (Var "b")) -- whitespace added by hand
               (Pair (Var "c") (Var "d")))

And the # and ## operators work together as intended:

    > parseMaybe aExpr "a # b ## c # d"
    Just (Pair (Pair (Var "a") (Var "b")) -- whitespace added by hand
               (Pair (Var "c") (Var "d")))

But the ## operator in isolation does not parse!

    > parseMaybe aExpr "a ## b"
    Nothing



--
Jeff Brown | Jeffrey Benjamin Brown
Website   |   Facebook   |   LinkedIn(I often miss messages here)   |   Github

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Why is Megaparsec treating these two operators differently?

Brandon Allbery

On Sun, Oct 23, 2016 at 4:15 PM, Jeffrey Brown <[hidden email]> wrote:
      [ [ InfixN # symbol "#" *> pure (Pair) ]
      , [ InfixN # symbol "##" *> pure (Pair) ]
      ]

Combinator parsers can't rearrange themselves to do longest token matching. So the ## operator will take the first case, match against `symbol "#"` and aOperator will succeed; the the next token match will hit the unconsumed "#" and fail. If you place "##" first then it will match "##" but not "#", which would the match the second rule.

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Why is Megaparsec treating these two operators differently?

Jeffrey Brown
Thanks, Brandon! How did you know that?

I changed them to "#1" and "#2" and now it works[1].

But before making that change, why would "a # b ## c # d" evaluate, even though "a ## b" would not?


The corrected file is called "experim.hs"; the old one, uncorrected, is called "experim.buggy.hs".

On Sun, Oct 23, 2016 at 2:03 PM, Brandon Allbery <[hidden email]> wrote:

On Sun, Oct 23, 2016 at 4:15 PM, Jeffrey Brown <[hidden email]> wrote:
      [ [ InfixN # symbol "#" *> pure (Pair) ]
      , [ InfixN # symbol "##" *> pure (Pair) ]
      ]

Combinator parsers can't rearrange themselves to do longest token matching. So the ## operator will take the first case, match against `symbol "#"` and aOperator will succeed; the the next token match will hit the unconsumed "#" and fail. If you place "##" first then it will match "##" but not "#", which would the match the second rule.

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net



--
Jeff Brown | Jeffrey Benjamin Brown
Website   |   Facebook   |   LinkedIn(I often miss messages here)   |   Github

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Why is Megaparsec treating these two operators differently?

Brandon Allbery
Aha. I had forgotten some details.

If you want to have an operator that is a prefix of another operator in the table, use the following (or similar) wrapper instead of plain symbol:
op n = (lexeme . try) (string n <* notFollowedBy punctuationChar)

So you actually need to be a little clever for those two operators to work; it's not as simple as I had recalled it (which would have been correct for a basic manual combinator setup). I am going to guess that something in there is not using `try` and silently consuming the extra "#", but I'd have to study the `makeExprParser` code in Megaparsec to be certain.

On Sun, Oct 23, 2016 at 5:38 PM, Jeffrey Brown <[hidden email]> wrote:
Thanks, Brandon! How did you know that?

I changed them to "#1" and "#2" and now it works[1].

But before making that change, why would "a # b ## c # d" evaluate, even though "a ## b" would not?


The corrected file is called "experim.hs"; the old one, uncorrected, is called "experim.buggy.hs".

On Sun, Oct 23, 2016 at 2:03 PM, Brandon Allbery <[hidden email]> wrote:

On Sun, Oct 23, 2016 at 4:15 PM, Jeffrey Brown <[hidden email]> wrote:
      [ [ InfixN # symbol "#" *> pure (Pair) ]
      , [ InfixN # symbol "##" *> pure (Pair) ]
      ]

Combinator parsers can't rearrange themselves to do longest token matching. So the ## operator will take the first case, match against `symbol "#"` and aOperator will succeed; the the next token match will hit the unconsumed "#" and fail. If you place "##" first then it will match "##" but not "#", which would the match the second rule.

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net



--
Jeff Brown | Jeffrey Benjamin Brown
Website   |   Facebook   |   LinkedIn(I often miss messages here)   |   Github



--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Why is Megaparsec treating these two operators differently?

Jeffrey Brown
The problem's not solved! Check out this weirdness. |+| and |-| are supposed to be identical, mapping to the same function, the Pair constructor.

    > mapM_ putStrLn exprs
    a |+| b
    a |-| b
    a |+| b |-| c
    a |-| b |+| c
    a |+| b   |-|  c |+| d
    > mapM_ putStrLn $ map (show . parseMaybe aExpr) exprs
    Just (Pair (Var "a") (Var "b"))
    Nothing
    Just (Pair (Pair (Var "a") (Var "b")) (Var "c"))
    Nothing
    Just (Pair (Pair (Var "a") (Var "b")) (Pair (Var "c") (Var "d")))

Why the two failures?

If I change them from InfixN to InfixR or InfixL, then only the first of those five expressions parses.

Here are two relevant definitions:

    data AExpr = Var String | Pair AExpr AExpr deriving (Show)

    aOperators :: [[Operator Parser AExpr]]
    aOperators =
      [ [ InfixN $ symbol "|+|" *> pure (Pair) ]
      , [ InfixN $ symbol "|-|" *> pure (Pair) ] -- binds last, I think
      ]

And here is the code in full[1].

Thanks,
Jeff




On Sun, Oct 23, 2016 at 2:50 PM, Brandon Allbery <[hidden email]> wrote:
Aha. I had forgotten some details.

If you want to have an operator that is a prefix of another operator in the table, use the following (or similar) wrapper instead of plain symbol:
op n = (lexeme . try) (string n <* notFollowedBy punctuationChar)

So you actually need to be a little clever for those two operators to work; it's not as simple as I had recalled it (which would have been correct for a basic manual combinator setup). I am going to guess that something in there is not using `try` and silently consuming the extra "#", but I'd have to study the `makeExprParser` code in Megaparsec to be certain.

On Sun, Oct 23, 2016 at 5:38 PM, Jeffrey Brown <[hidden email]> wrote:
Thanks, Brandon! How did you know that?

I changed them to "#1" and "#2" and now it works[1].

But before making that change, why would "a # b ## c # d" evaluate, even though "a ## b" would not?


The corrected file is called "experim.hs"; the old one, uncorrected, is called "experim.buggy.hs".

On Sun, Oct 23, 2016 at 2:03 PM, Brandon Allbery <[hidden email]> wrote:

On Sun, Oct 23, 2016 at 4:15 PM, Jeffrey Brown <[hidden email]> wrote:
      [ [ InfixN # symbol "#" *> pure (Pair) ]
      , [ InfixN # symbol "##" *> pure (Pair) ]
      ]

Combinator parsers can't rearrange themselves to do longest token matching. So the ## operator will take the first case, match against `symbol "#"` and aOperator will succeed; the the next token match will hit the unconsumed "#" and fail. If you place "##" first then it will match "##" but not "#", which would the match the second rule.

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net



--
Jeff Brown | Jeffrey Benjamin Brown
Website   |   Facebook   |   LinkedIn(I often miss messages here)   |   Github



--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net



--
Jeff Brown | Jeffrey Benjamin Brown
Website   |   Facebook   |   LinkedIn(spammy, so I often miss messages here)   |   Github   

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Why is Megaparsec treating these two operators differently?

David Turner-2
The "code in full" link has operators called `#1` and `#2` rather than `|+|` and `|-|`, but I see one of your test cases fail there too. You're using quite an old version of megaparsec, 4.3.0 (stackage lts-5.5) and it looks like something affecting this was fixed in 4.4.0 (e.g. stackage lts-6.9):

$ stack --resolver lts-5.5 ghci --no-load --no-build
Configuring GHCi with the following packages: Dwt
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
Prelude> :l howto/megaparsec/minimal.hs
[1 of 1] Compiling Experim          ( howto/megaparsec/minimal.hs, interpreted )
Ok, modules loaded: Experim.
*Experim> test
Just (Pair (Var "a") (Var "b"))
Nothing
Just (Pair (Pair (Var "a") (Var "b")) (Pair (Var "c") (Var "d")))
Just (Pair (Pair (Var "a") (Var "b")) (Pair (Var "c") (Var "d")))
*Experim>
Leaving GHCi.
$ stack --resolver lts-6.9 ghci --no-load --no-build
Configuring GHCi with the following packages: Dwt
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
Prelude> :l howto/megaparsec/minimal.hs
[1 of 1] Compiling Experim          ( howto/megaparsec/minimal.hs, interpreted )
Ok, modules loaded: Experim.
*Experim> test
Just (Pair (Var "a") (Var "b"))
Just (Pair (Var "a") (Var "b"))
Just (Pair (Pair (Var "a") (Var "b")) (Pair (Var "c") (Var "d")))
Just (Pair (Pair (Var "a") (Var "b")) (Pair (Var "c") (Var "d")))
*Experim>
Leaving GHCi.


On 14 May 2017 at 07:53, Jeffrey Brown <[hidden email]> wrote:
The problem's not solved! Check out this weirdness. |+| and |-| are supposed to be identical, mapping to the same function, the Pair constructor.

    > mapM_ putStrLn exprs
    a |+| b
    a |-| b
    a |+| b |-| c
    a |-| b |+| c
    a |+| b   |-|  c |+| d
    > mapM_ putStrLn $ map (show . parseMaybe aExpr) exprs
    Just (Pair (Var "a") (Var "b"))
    Nothing
    Just (Pair (Pair (Var "a") (Var "b")) (Var "c"))
    Nothing
    Just (Pair (Pair (Var "a") (Var "b")) (Pair (Var "c") (Var "d")))

Why the two failures?

If I change them from InfixN to InfixR or InfixL, then only the first of those five expressions parses.

Here are two relevant definitions:

    data AExpr = Var String | Pair AExpr AExpr deriving (Show)

    aOperators :: [[Operator Parser AExpr]]
    aOperators =
      [ [ InfixN $ symbol "|+|" *> pure (Pair) ]
      , [ InfixN $ symbol "|-|" *> pure (Pair) ] -- binds last, I think
      ]

And here is the code in full[1].

Thanks,
Jeff




On Sun, Oct 23, 2016 at 2:50 PM, Brandon Allbery <[hidden email]> wrote:
Aha. I had forgotten some details.

If you want to have an operator that is a prefix of another operator in the table, use the following (or similar) wrapper instead of plain symbol:
op n = (lexeme . try) (string n <* notFollowedBy punctuationChar)

So you actually need to be a little clever for those two operators to work; it's not as simple as I had recalled it (which would have been correct for a basic manual combinator setup). I am going to guess that something in there is not using `try` and silently consuming the extra "#", but I'd have to study the `makeExprParser` code in Megaparsec to be certain.

On Sun, Oct 23, 2016 at 5:38 PM, Jeffrey Brown <[hidden email]> wrote:
Thanks, Brandon! How did you know that?

I changed them to "#1" and "#2" and now it works[1].

But before making that change, why would "a # b ## c # d" evaluate, even though "a ## b" would not?


The corrected file is called "experim.hs"; the old one, uncorrected, is called "experim.buggy.hs".

On Sun, Oct 23, 2016 at 2:03 PM, Brandon Allbery <[hidden email]> wrote:

On Sun, Oct 23, 2016 at 4:15 PM, Jeffrey Brown <[hidden email]> wrote:
      [ [ InfixN # symbol "#" *> pure (Pair) ]
      , [ InfixN # symbol "##" *> pure (Pair) ]
      ]

Combinator parsers can't rearrange themselves to do longest token matching. So the ## operator will take the first case, match against `symbol "#"` and aOperator will succeed; the the next token match will hit the unconsumed "#" and fail. If you place "##" first then it will match "##" but not "#", which would the match the second rule.

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net



--
Jeff Brown | Jeffrey Benjamin Brown
Website   |   Facebook   |   LinkedIn(I often miss messages here)   |   Github



--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net



--
Jeff Brown | Jeffrey Benjamin Brown
Website   |   Facebook   |   LinkedIn(spammy, so I often miss messages here)   |   Github   

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Why is Megaparsec treating these two operators differently?

Jeffrey Brown
Awww yiss! Thanks, David!

On Sun, May 14, 2017 at 12:13 AM, David Turner <[hidden email]> wrote:
The "code in full" link has operators called `#1` and `#2` rather than `|+|` and `|-|`, but I see one of your test cases fail there too. You're using quite an old version of megaparsec, 4.3.0 (stackage lts-5.5) and it looks like something affecting this was fixed in 4.4.0 (e.g. stackage lts-6.9):

$ stack --resolver lts-5.5 ghci --no-load --no-build
Configuring GHCi with the following packages: Dwt
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
Prelude> :l howto/megaparsec/minimal.hs
[1 of 1] Compiling Experim          ( howto/megaparsec/minimal.hs, interpreted )
Ok, modules loaded: Experim.
*Experim> test
Just (Pair (Var "a") (Var "b"))
Nothing
Just (Pair (Pair (Var "a") (Var "b")) (Pair (Var "c") (Var "d")))
Just (Pair (Pair (Var "a") (Var "b")) (Pair (Var "c") (Var "d")))
*Experim>
Leaving GHCi.
$ stack --resolver lts-6.9 ghci --no-load --no-build
Configuring GHCi with the following packages: Dwt
GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
Prelude> :l howto/megaparsec/minimal.hs
[1 of 1] Compiling Experim          ( howto/megaparsec/minimal.hs, interpreted )
Ok, modules loaded: Experim.
*Experim> test
Just (Pair (Var "a") (Var "b"))
Just (Pair (Var "a") (Var "b"))
Just (Pair (Pair (Var "a") (Var "b")) (Pair (Var "c") (Var "d")))
Just (Pair (Pair (Var "a") (Var "b")) (Pair (Var "c") (Var "d")))
*Experim>
Leaving GHCi.


On 14 May 2017 at 07:53, Jeffrey Brown <[hidden email]> wrote:
The problem's not solved! Check out this weirdness. |+| and |-| are supposed to be identical, mapping to the same function, the Pair constructor.

    > mapM_ putStrLn exprs
    a |+| b
    a |-| b
    a |+| b |-| c
    a |-| b |+| c
    a |+| b   |-|  c |+| d
    > mapM_ putStrLn $ map (show . parseMaybe aExpr) exprs
    Just (Pair (Var "a") (Var "b"))
    Nothing
    Just (Pair (Pair (Var "a") (Var "b")) (Var "c"))
    Nothing
    Just (Pair (Pair (Var "a") (Var "b")) (Pair (Var "c") (Var "d")))

Why the two failures?

If I change them from InfixN to InfixR or InfixL, then only the first of those five expressions parses.

Here are two relevant definitions:

    data AExpr = Var String | Pair AExpr AExpr deriving (Show)

    aOperators :: [[Operator Parser AExpr]]
    aOperators =
      [ [ InfixN $ symbol "|+|" *> pure (Pair) ]
      , [ InfixN $ symbol "|-|" *> pure (Pair) ] -- binds last, I think
      ]

And here is the code in full[1].

Thanks,
Jeff




On Sun, Oct 23, 2016 at 2:50 PM, Brandon Allbery <[hidden email]> wrote:
Aha. I had forgotten some details.

If you want to have an operator that is a prefix of another operator in the table, use the following (or similar) wrapper instead of plain symbol:
op n = (lexeme . try) (string n <* notFollowedBy punctuationChar)

So you actually need to be a little clever for those two operators to work; it's not as simple as I had recalled it (which would have been correct for a basic manual combinator setup). I am going to guess that something in there is not using `try` and silently consuming the extra "#", but I'd have to study the `makeExprParser` code in Megaparsec to be certain.

On Sun, Oct 23, 2016 at 5:38 PM, Jeffrey Brown <[hidden email]> wrote:
Thanks, Brandon! How did you know that?

I changed them to "#1" and "#2" and now it works[1].

But before making that change, why would "a # b ## c # d" evaluate, even though "a ## b" would not?


The corrected file is called "experim.hs"; the old one, uncorrected, is called "experim.buggy.hs".

On Sun, Oct 23, 2016 at 2:03 PM, Brandon Allbery <[hidden email]> wrote:

On Sun, Oct 23, 2016 at 4:15 PM, Jeffrey Brown <[hidden email]> wrote:
      [ [ InfixN # symbol "#" *> pure (Pair) ]
      , [ InfixN # symbol "##" *> pure (Pair) ]
      ]

Combinator parsers can't rearrange themselves to do longest token matching. So the ## operator will take the first case, match against `symbol "#"` and aOperator will succeed; the the next token match will hit the unconsumed "#" and fail. If you place "##" first then it will match "##" but not "#", which would the match the second rule.

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net



--
Jeff Brown | Jeffrey Benjamin Brown
Website   |   Facebook   |   LinkedIn(I often miss messages here)   |   Github



--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net



--
Jeff Brown | Jeffrey Benjamin Brown
Website   |   Facebook   |   LinkedIn(spammy, so I often miss messages here)   |   Github   

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.




--
Jeff Brown | Jeffrey Benjamin Brown
Website   |   Facebook   |   LinkedIn(spammy, so I often miss messages here)   |   Github   

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.