Parsec operator with letter problem

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Parsec operator with letter problem

Reto Kramer
I ran into the following problem with parsec's handling of operators.  
I hope someone on the list can show me a trick that resolve my  
current issue. At the end of this message is the full code for the  
reproducer.

The language I'm parsing has infix operators of two forms. Some are  
special characters (e.g. a dot) and some are LaTeX like (e.g. \in).  
The letters that appear in the e.g. \in lead to problems with  
expression parsing.

The following is what I expect:
   $ cat input1
   a.b
   $ ./test input1
   InfixExpr OP_Dot (Ident "a") (Ident "b")
Good!

Then I try (change the b to an n)
   $ cat input2
   a.n
   $ ./test input2
   Ident "a"
OUTSCH!  Changing the name of an identifier changed the expression!

Then I try (add a space right after the dot)
   $ cat input3
   a. n
   $ ./test input2
   InfixExpr OP_Dot (Ident "a") (Ident "n")
Good!

What is going on?

The 'n' is part of the "\in" operator (see reservedOpNames in test.hs  
program at the end of message) and confuses the parsec expression  
parser's ability to determine the end of expressions/identifiers.  
Adding a space right after the '.' operator resolves the issue, but  
that's not a suitable option for my users.

Unfortunately I cannot remove the "\in" operator from the  
reservedOpNames list since otherwise "\in" is not recognized as an  
infix operator itself anymore.

QUESTION:
how can I get parsec's expression parser to work with infix operators  
of the form "\in"?

Thanks,
- Reto


-- complete reproducer (test.hs)
-- compile with: ghc -o test -package parsec test.hs

module Main where

import List (nub)
import System (getArgs)
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Language( emptyDef )
import qualified Text.ParserCombinators.Parsec.Token as P

main :: IO ()
main = do{ args <- getArgs
         ; let fname = args !! 0
         ; input <- readFile fname
         ; case parse spec fname input of
              Left err -> do{ putStr "parse error at "
                            ; print err
                           }
              Right x  -> print x
         }

data Op = OP_Dot | OP_In deriving (Show, Eq, Ord)
data Expr = InfixExpr Op Expr Expr
          | Ident String
          | Number Integer deriving (Show, Eq, Ord)

spec :: Parser Expr
spec = do { whiteSpace
          ; e <- expression
          ; return e
          }

expression :: Parser Expr
expression = buildExpressionParser table basicExpr <?> "expression"

op_infix :: Op -> Expr  -> Expr -> Expr
op_infix op a b = InfixExpr op a b

table :: OperatorTable Char () Expr
table = [ [binary "."  (op_infix OP_Dot) AssocLeft]
         , [binary "\\in" (op_infix OP_In) AssocLeft] ]
binary  name fun assoc = Infix (do{ reservedOp name; return fun }) assoc

basicExpr :: Parser Expr
basicExpr = choice [
               do{ i <- identifier
                ; return $ Ident i
                }
             ]

lexer = P.makeTokenParser testdef

testdef = emptyDef {
   P.identStart      = letter <|> char '_'
, P.identLetter     = alphaNum <|> char '_'
, P.opStart         = oneOf $ nub $
                         map (\s -> head s) $ P.reservedOpNames testdef
, P.opLetter        = oneOf (concat (P.reservedOpNames testdef))
, P.reservedOpNames = [ ".", "\\in" ] }

dot             = P.dot lexer
parens          = P.parens lexer
braces          = P.braces lexer
squares         = P.squares lexer
semiSep         = P.semiSep lexer
semiSep1        = P.semiSep1 lexer
commaSep        = P.commaSep lexer
commaSep1       = P.commaSep1 lexer
brackets        = P.brackets lexer
whiteSpace      = P.whiteSpace lexer
symbol          = P.symbol lexer
identifier      = P.identifier lexer
reserved        = P.reserved lexer
reservedOp      = P.reservedOp lexer
integer         = P.integer lexer
natural         = P.natural lexer
charLiteral     = P.charLiteral lexer
stringLiteral   = P.stringLiteral lexer

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parsec operator with letter problem

Daniel Fischer-4
Hi,

probably somebody else has already come up with something better, but still...

I surmise that you have two kinds of infix-operators,
1. dot-like operators, made up entirely of symbols (^!$%&/\,.:;#+-~* ...)
2. LaTeX-command-like operators, starting with a backslash and then followed
by a nonempty sequence of letters (or possibly alphanumeric characters).

Then the following helps:

import Data.Char (isAlpha)

lexer = lexer0{P.reservedOp = rOp}
          where
            lexer0      = P.makeTokenParser testdef
            resOp0      = P.reservedOp lexer0
            resOp1 name =
              case name of
                ('\\':cs@(_:_))
                    | all isAlpha cs -> do string name
                                           notFollowedBy letter <?>
                                                ("end of " ++ show name)
                _   -> fail (show name ++ " no good reservedOp")
            rOp name = lexeme $ try $ resOp0 name <|> resOp1 name
            lexeme p = do { x <- p; P.whiteSpace lexer0; return x }

testdef = emptyDef
    { P.identStart      = letter <|> char '_'
    , P.identLetter     = alphaNum <|> char '_'
    , P.opStart         = oneOf $ nub $
                            map (\s -> head s) $ P.reservedOpNames testdef
 -- , P.opLetter        = oneOf (concat (P.reservedOpNames testdef))
    , P.opLetter = oneOf symbs
    , P.reservedOpNames = [ ".", "\\in" ] }
    where
      symbs = filter (not . isAlpha) . concat $ P.reservedOpNames testdef
---------------------------------------------------------------------
dafis@linux:~/Documents/haskell/Reto> cat input
a.n
dafis@linux:~/Documents/haskell/Reto> reto input
InfixExpr OP_Dot (Ident "a") (Ident "n")

If you have more complicated infix operators (e.g. \foo#bar:, :ouch:),
it won't be so easy, anyway, you have to change the definition of reservedOp.

Cheers,
Daniel

Am Freitag, 31. März 2006 01:15 schrieb Reto Kramer:

> I ran into the following problem with parsec's handling of operators.
> I hope someone on the list can show me a trick that resolve my
> current issue. At the end of this message is the full code for the
> reproducer.
>
> The language I'm parsing has infix operators of two forms. Some are
> special characters (e.g. a dot) and some are LaTeX like (e.g. \in).
> The letters that appear in the e.g. \in lead to problems with
> expression parsing.
>
> The following is what I expect:
>    $ cat input1
>    a.b
>    $ ./test input1
>    InfixExpr OP_Dot (Ident "a") (Ident "b")
> Good!
>
> Then I try (change the b to an n)
>    $ cat input2
>    a.n
>    $ ./test input2
>    Ident "a"
> OUTSCH!  Changing the name of an identifier changed the expression!
>
> Then I try (add a space right after the dot)
>    $ cat input3
>    a. n
>    $ ./test input2
>    InfixExpr OP_Dot (Ident "a") (Ident "n")
> Good!
>
> What is going on?
>
> The 'n' is part of the "\in" operator (see reservedOpNames in test.hs
> program at the end of message) and confuses the parsec expression
> parser's ability to determine the end of expressions/identifiers.
> Adding a space right after the '.' operator resolves the issue, but
> that's not a suitable option for my users.
>
> Unfortunately I cannot remove the "\in" operator from the
> reservedOpNames list since otherwise "\in" is not recognized as an
> infix operator itself anymore.
>
> QUESTION:
> how can I get parsec's expression parser to work with infix operators
> of the form "\in"?
>
> Thanks,
> - Reto
>
>
> -- complete reproducer (test.hs)
> -- compile with: ghc -o test -package parsec test.hs
>
> module Main where
>
> import List (nub)
> import System (getArgs)
> import Text.ParserCombinators.Parsec
> import Text.ParserCombinators.Parsec.Expr
> import Text.ParserCombinators.Parsec.Language( emptyDef )
> import qualified Text.ParserCombinators.Parsec.Token as P
>
> main :: IO ()
> main = do{ args <- getArgs
> ; let fname = args !! 0
> ; input <- readFile fname
> ; case parse spec fname input of
>               Left err -> do{ putStr "parse error at "
>                             ; print err
>   }
>               Right x  -> print x
> }
>
> data Op = OP_Dot | OP_In deriving (Show, Eq, Ord)
> data Expr = InfixExpr Op Expr Expr
>
>  | Ident String
>  | Number Integer deriving (Show, Eq, Ord)
>
> spec :: Parser Expr
> spec = do { whiteSpace
>  ; e <- expression
>  ; return e
>  }
>
> expression :: Parser Expr
> expression = buildExpressionParser table basicExpr <?> "expression"
>
> op_infix :: Op -> Expr  -> Expr -> Expr
> op_infix op a b = InfixExpr op a b
>
> table :: OperatorTable Char () Expr
> table = [ [binary "."  (op_infix OP_Dot) AssocLeft]
>          , [binary "\\in" (op_infix OP_In) AssocLeft] ]
> binary  name fun assoc = Infix (do{ reservedOp name; return fun }) assoc
>
> basicExpr :: Parser Expr
> basicExpr = choice [
>                do{ i <- identifier
> ; return $ Ident i
> }
>              ]
>
> lexer = P.makeTokenParser testdef
>
> testdef = emptyDef {
>    P.identStart      = letter <|> char '_'
> , P.identLetter     = alphaNum <|> char '_'
> , P.opStart         = oneOf $ nub $
>                          map (\s -> head s) $ P.reservedOpNames testdef
> , P.opLetter        = oneOf (concat (P.reservedOpNames testdef))
> , P.reservedOpNames = [ ".", "\\in" ] }
>
> dot             = P.dot lexer
> parens          = P.parens lexer
> braces          = P.braces lexer
> squares         = P.squares lexer
> semiSep         = P.semiSep lexer
> semiSep1        = P.semiSep1 lexer
> commaSep        = P.commaSep lexer
> commaSep1       = P.commaSep1 lexer
> brackets        = P.brackets lexer
> whiteSpace      = P.whiteSpace lexer
> symbol          = P.symbol lexer
> identifier      = P.identifier lexer
> reserved        = P.reserved lexer
> reservedOp      = P.reservedOp lexer
> integer         = P.integer lexer
> natural         = P.natural lexer
> charLiteral     = P.charLiteral lexer
> stringLiteral   = P.stringLiteral lexer
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe

--

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
        -- Blair P. Houghton

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parsec operator with letter problem

Daniel Fischer-4
Am Freitag, 31. März 2006 15:24 schrieb Daniel Fischer:

> Hi,
>
> probably somebody else has already come up with something better, but
> still...
>
> I surmise that you have two kinds of infix-operators,
> 1. dot-like operators, made up entirely of symbols (^!$%&/\,.:;#+-~* ...)
> 2. LaTeX-command-like operators, starting with a backslash and then
> followed by a nonempty sequence of letters (or possibly alphanumeric
> characters).
>
> Then the following helps:
>
> import Data.Char (isAlpha)
>
> lexer = lexer0{P.reservedOp = rOp}
>           where
>             lexer0      = P.makeTokenParser testdef
>             resOp0      = P.reservedOp lexer0
>             resOp1 name =
>               case name of
>                 ('\\':cs@(_:_))
>
>                     | all isAlpha cs -> do string name
>
>                                            notFollowedBy letter <?>
>                                                 ("end of " ++ show name)
>                 _   -> fail (show name ++ " no good reservedOp")
>             rOp name = lexeme $ try $ resOp0 name <|> resOp1 name
>             lexeme p = do { x <- p; P.whiteSpace lexer0; return x }

Noho, that's not right, that parses "a\inn" as
InfixExpr OP_In (Ident "a") (Ident "n"),
because resOp1 is never used, which we don't want, so:

lexer = lexer0{P.reservedOp = rOp}
          where
            lexer0      = P.makeTokenParser testdef
            resOp0      = P.reservedOp lexer0
            resOp1 name = do string name
                             notFollowedBy letter <?> ("end of " ++ show name)
            rOp name = lexeme $ try $
                          case name of
                            ('\\':cs@(_:_)) | all isAlpha cs -> resOp1 name
                            _ -> resOp0 name
            lexeme p = do { x <- p; P.whiteSpace lexer0; return x }

Now:
dafis@linux:~/Documents/haskell/Reto> cat input
a.n
dafis@linux:~/Documents/haskell/Reto> reto input
InfixExpr OP_Dot (Ident "a") (Ident "n")
dafis@linux:~/Documents/haskell/Reto> cat input
a\inn
dafis@linux:~/Documents/haskell/Reto> reto input
Ident "a"
dafis@linux:~/Documents/haskell/Reto> cat input
a\in n
dafis@linux:~/Documents/haskell/Reto> reto input
InfixExpr OP_In (Ident "a") (Ident "n")

That's better.

>
> testdef = emptyDef
>     { P.identStart      = letter <|> char '_'
>     , P.identLetter     = alphaNum <|> char '_'
>     , P.opStart         = oneOf $ nub $
>                             map (\s -> head s) $ P.reservedOpNames testdef
>  -- , P.opLetter        = oneOf (concat (P.reservedOpNames testdef))
>     , P.opLetter = oneOf symbs
>     , P.reservedOpNames = [ ".", "\\in" ] }
>     where
>       symbs = filter (not . isAlpha) . concat $ P.reservedOpNames testdef
> ---------------------------------------------------------------------
> dafis@linux:~/Documents/haskell/Reto> cat input
> a.n
> dafis@linux:~/Documents/haskell/Reto> reto input
> InfixExpr OP_Dot (Ident "a") (Ident "n")
>
> If you have more complicated infix operators (e.g. \foo#bar:, :ouch:),
> it won't be so easy, anyway, you have to change the definition of
> reservedOp.
>
> Cheers,
> Daniel

--

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
        -- Blair P. Houghton

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parsec operator with letter problem

Reto Kramer
Great!  Thanks for the revision Daniel.  If you're ever in San  
Francisco, please do ping me - I sure owe you lunch!

- Reto

On Mar 31, 2006, at 3:14 PM, Daniel Fischer wrote:

> Am Freitag, 31. März 2006 15:24 schrieb Daniel Fischer:
>> Hi,
>>
>> probably somebody else has already come up with something better, but
>> still...
>>
>> I surmise that you have two kinds of infix-operators,
>> 1. dot-like operators, made up entirely of symbols (^!$%&/\,.:;#+-
>> ~* ...)
>> 2. LaTeX-command-like operators, starting with a backslash and then
>> followed by a nonempty sequence of letters (or possibly alphanumeric
>> characters).
>>
>> Then the following helps:
>>
>> import Data.Char (isAlpha)
>>
>> lexer = lexer0{P.reservedOp = rOp}
>>           where
>>             lexer0      = P.makeTokenParser testdef
>>             resOp0      = P.reservedOp lexer0
>>             resOp1 name =
>>               case name of
>>                 ('\\':cs@(_:_))
>>
>>                     | all isAlpha cs -> do string name
>>
>>                                            notFollowedBy letter <?>
>>                                                 ("end of " ++ show  
>> name)
>>                 _   -> fail (show name ++ " no good reservedOp")
>>             rOp name = lexeme $ try $ resOp0 name <|> resOp1 name
>>             lexeme p = do { x <- p; P.whiteSpace lexer0; return x }
>
> Noho, that's not right, that parses "a\inn" as
> InfixExpr OP_In (Ident "a") (Ident "n"),
> because resOp1 is never used, which we don't want, so:
>
> lexer = lexer0{P.reservedOp = rOp}
>           where
>             lexer0      = P.makeTokenParser testdef
>             resOp0      = P.reservedOp lexer0
>             resOp1 name = do string name
>                              notFollowedBy letter <?> ("end of " ++  
> show name)
>             rOp name = lexeme $ try $
>                           case name of
>                             ('\\':cs@(_:_)) | all isAlpha cs ->  
> resOp1 name
>                             _ -> resOp0 name
>             lexeme p = do { x <- p; P.whiteSpace lexer0; return x }
>
> Now:
> dafis@linux:~/Documents/haskell/Reto> cat input
> a.n
> dafis@linux:~/Documents/haskell/Reto> reto input
> InfixExpr OP_Dot (Ident "a") (Ident "n")
> dafis@linux:~/Documents/haskell/Reto> cat input
> a\inn
> dafis@linux:~/Documents/haskell/Reto> reto input
> Ident "a"
> dafis@linux:~/Documents/haskell/Reto> cat input
> a\in n
> dafis@linux:~/Documents/haskell/Reto> reto input
> InfixExpr OP_In (Ident "a") (Ident "n")
>
> That's better.
>>
>> testdef = emptyDef
>>     { P.identStart      = letter <|> char '_'
>>     , P.identLetter     = alphaNum <|> char '_'
>>     , P.opStart         = oneOf $ nub $
>>                             map (\s -> head s) $ P.reservedOpNames  
>> testdef
>>  -- , P.opLetter        = oneOf (concat (P.reservedOpNames testdef))
>>     , P.opLetter = oneOf symbs
>>     , P.reservedOpNames = [ ".", "\\in" ] }
>>     where
>>       symbs = filter (not . isAlpha) . concat $ P.reservedOpNames  
>> testdef
>> ---------------------------------------------------------------------
>> dafis@linux:~/Documents/haskell/Reto> cat input
>> a.n
>> dafis@linux:~/Documents/haskell/Reto> reto input
>> InfixExpr OP_Dot (Ident "a") (Ident "n")
>>
>> If you have more complicated infix operators (e.g.  
>> \foo#bar:, :ouch:),
>> it won't be so easy, anyway, you have to change the definition of
>> reservedOp.
>>
>> Cheers,
>> Daniel
>
> --
>
> "In My Egotistical Opinion, most people's C programs should be
> indented six feet downward and covered with dirt."
> -- Blair P. Houghton
>

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe