Need help with learning Parsec

C K Kashyap
Dear gentle Haskellers,

I was trying to sharpen my Haskell by using Parsec today to parse XML. Here's the code I came up with.

I would like some help with the "gettext" parser that I've written. I had to put a dummy "char ' '" in there just to satisfy the "many" used in the xml parser. I'd appreciate it very much if someone could give me some feedback.


import Text.ParserCombinators.Parsec

data XML =  Node String [XML]
          | Body String deriving Show

gettext = do
             x <- many (letter <|> digit )
             if (length x) > 0 then 
                return (Body x) 
             else (char ' ' >> (return $ Body ""))

xml :: Parser XML
xml = do {
          name <- openTag
        ; innerXML <- many innerXML
        ; endTag name
        ; return (Node name innerXML) 
         } 

innerXML = do
         x <- (try xml <|> gettext)
         return x

openTag :: Parser String
openTag = do
        char '<'
        content <- many (noneOf ">")
        char '>'
        return content

endTag :: String -> Parser String
endTag str = do
        char '<'
        char '/'
        string str
        char '>'
        return str

h1 = parse xml "" "<a>A</a>"
h2 = parse xml "" "<a><b>A</b></a>"
h3 = parse xml "" "<a><b><c></c></b></a>"
h4 = parse xml "" "<a><b></b><c></c></a>"

Regards,
Kashyap

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Need help with learning Parsec

Christian Maeder-2
On 19.07.2012 at 14:53, C K Kashyap wrote:
> Dear gentle Haskellers,
>
> I was trying to whet my Haskell by trying out Parsec today to try and
> parse out XML. Here's the code I came up with -
>
> I wanted some help with the "gettext" parser that I've written. I had to
> do a dummy "char ' '" in there just to satisfy the "many" used in the
> xml parser. I'd appreciate it very much if someone could give me some
> feedback.

You don't want empty bodies! So use many1 in gettext.

   gettext = fmap Body $ many1 $ letter <|> digit

If you have spaces in your bodies, skip them or allow them with
noneOf "<".

HTH Christian
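
A minimal sketch of the many1 suggestion, assuming the Parsec 3 module names Text.Parsec and Text.Parsec.String. Parsec's many raises a runtime error when its argument can succeed without consuming input, which is presumably why the dummy char ' ' was needed; many1 sidesteps that. The helper name bodies is hypothetical, and noneOf "<" is used here so that spaces are allowed in bodies.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data XML = Node String [XML]
         | Body String
         deriving (Show, Eq)

-- many1 needs at least one character, so on "<" this parser fails
-- without consuming input instead of returning an empty Body.
gettext :: Parser XML
gettext = Body <$> many1 (noneOf "<")

-- Hypothetical helper: `many` is safe here because gettext always
-- consumes input when it succeeds.
bodies :: Parser [XML]
bodies = many gettext
```

With these definitions, parse bodies "" "<a>" should succeed with an empty list rather than abort, because gettext never accepts the empty string.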

Re: Need help with learning Parsec

Christian Maeder-2
On 19.07.2012 at 14:53, C K Kashyap wrote:
> innerXML = do
>           x <- (try xml <|> gettext)
>           return x

Omit "try" (and return).
xml always starts with "<" whereas gettext never does.

C.



Re: Need help with learning Parsec

Sai Hemanth K
gettext =  (many1 $ noneOf "><") >>= (return . Body)

works for your case.






--
I drink I am thunk.

Re: Need help with learning Parsec

Christian Maeder-2
On 19.07.2012 at 15:14, Christian Maeder wrote:
> On 19.07.2012 at 14:53, C K Kashyap wrote:
>> innerXML = do
>>           x <- (try xml <|> gettext)
>>           return x
>
> Omit "try" (and return).
> xml always starts with "<" whereas gettext never does.

I was wrong, you do not want to swallow an endTag as openTag.

openTag should start with:
        try $ char '<' >> notFollowedBy (char '/')

and endTag should start with:
         try $ string "</"

C.
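
Put together, the correction above might look like the following sketch. It assumes the Parsec 3 module names Text.Parsec and Text.Parsec.String, uses noneOf "<" for bodies, and ignores attributes; it is one possible arrangement, not the only one.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data XML = Node String [XML]
         | Body String
         deriving (Show, Eq)

gettext :: Parser XML
gettext = Body <$> many1 (noneOf "<")

-- The try covers only the "<" and the look-ahead, so seeing "</"
-- fails without consuming input and lets `many` stop cleanly.
openTag :: Parser String
openTag = do
  try (char '<' >> notFollowedBy (char '/'))
  name <- many (noneOf ">")
  _ <- char '>'
  return name

endTag :: String -> Parser String
endTag name = string "</" *> string name <* char '>'

xml :: Parser XML
xml = do
  name <- openTag
  children <- many (xml <|> gettext)
  _ <- endTag name
  return (Node name children)
```

For example, parse xml "" "<a><b>A</b></a>" should give Node "a" [Node "b" [Body "A"]].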

Re: Need help with learning Parsec

Simon Hengel
On Thu, Jul 19, 2012 at 06:45:05PM +0530, Sai Hemanth K wrote:
> gettext =  (many1 $ noneOf "><") >>= (return . Body)

You can simplify this to:


    -- also hide many and optional, which would otherwise clash with Parsec's
    import Control.Applicative hiding ((<|>), many, optional)

    gettext = Body <$> many1 (noneOf "><")


And some of your other parsers can be simplified as well:

    innerXML = xml <|> gettext

    openTag :: Parser String
    openTag = char '<' *> many (noneOf ">") <* char '>'

    endTag :: String -> Parser String
    endTag str = string "</" *> string str <* char '>'
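
One thing to watch for with these simplified parsers: without a try in innerXML, an openTag written this way will happily read "</a>" as an opening tag named "/a". The sketch below assembles the definitions above unchanged (with the Parsec 3 imports assumed, and a monadic xml because endTag needs the tag name) to show the effect; failsOn is a hypothetical helper.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data XML = Node String [XML]
         | Body String
         deriving (Show, Eq)

gettext :: Parser XML
gettext = Body <$> many1 (noneOf "><")

innerXML :: Parser XML
innerXML = xml <|> gettext

-- Accepts any tag name, including one that starts with '/'.
openTag :: Parser String
openTag = char '<' *> many (noneOf ">") <* char '>'

endTag :: String -> Parser String
endTag str = string "</" *> string str <* char '>'

xml :: Parser XML
xml = do
  name <- openTag
  children <- many innerXML
  _ <- endTag name
  return (Node name children)

-- Hypothetical helper: True when the input is rejected.
failsOn :: String -> Bool
failsOn = either (const True) (const False) . parse xml ""
```

Here failsOn "<a>A</a>" is True: the closing tag is consumed as an opening tag and the parser then runs out of input while looking for "</".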

Re: Need help with learning Parsec

Simon Hengel
>     gettext = Body <$> many1 (noneOf "><")

Note that this is the same as:

    gettext = Body `fmap` many1 (noneOf "><")

Re: Need help with learning Parsec

Simon Hengel
On Thu, Jul 19, 2012 at 03:34:47PM +0200, Simon Hengel wrote:
>     openTag :: Parser String
>     openTag = char '<' *> many (noneOf ">") <* char '>'
>
>     endTag :: String -> Parser String
>     endTag str = string "</" *> string str <* char '>'

Well, yes, but modified along the lines of what Christian Maeder just suggested.

Cheers,
Simon

Re: Need help with learning Parsec

Christian Maeder-2
On 19.07.2012 at 15:26, Christian Maeder wrote:
> On 19.07.2012 at 15:14, Christian Maeder wrote:
>> On 19.07.2012 at 14:53, C K Kashyap wrote:
>>> innerXML = do
>>>           x <- (try xml <|> gettext)
>>>           return x
>>
>> Omit "try" (and return).
>> xml always starts with "<" whereas gettext never does.
>
> I was wrong, you do not want to swallow an endTag as openTag.
>
> openTag should start with:
>      try $ char '<' >> notFollowedBy (char '/')
>
> and endTag should start with:
>          try $ string "</"

Strictly speaking, the try in endTag is not necessary (only the one in openTag is).

C.

Re: Need help with learning Parsec

Christian Maeder-2
On 19.07.2012 at 15:41, Simon Hengel wrote:
> On Thu, Jul 19, 2012 at 03:34:47PM +0200, Simon Hengel wrote:
>>      openTag :: Parser String
>>      openTag = char '<' *> many (noneOf ">") <* char '>'

If you disallow empty tags and "/" within tags, then you can avoid the
notFollowedBy construct with:

        openTag = try (char '<' *> many1 (noneOf "/>")) <* char '>'

C.
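
A small sketch of how this variant behaves, assuming the Parsec 3 imports below. The parser tagOrClose is hypothetical and exists only to show that openTag gives up on "</" without consuming input, so an alternative can still match.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- many1 (noneOf "/>") rejects both "<>" and "</...", and the try
-- rewinds the '<' when that happens.
openTag :: Parser String
openTag = try (char '<' *> many1 (noneOf "/>")) <* char '>'

-- Hypothetical demo: fall back to the start of a closing tag.
tagOrClose :: Parser String
tagOrClose = openTag <|> string "</"
```

Note that the try does not cover the final char '>', so an input such as "<a/>" still fails after consuming input.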

Re: Need help with learning Parsec

C K Kashyap
Thank you so much ... I've updated my monad version here - 



The applicative version, however, does not seem to work.

Is there a good Parsec tutorial that I can look up? I am checking out http://legacy.cs.uu.nl/daan/download/parsec/parsec.html, but I am looking for a tutorial in which a complex parser is built from the ground up.

Next I'd like to take care of escaped angle brackets.

Regards,
Kashyap


Re: Need help with learning Parsec

C K Kashyap
I've updated the parser here -  https://github.com/ckkashyap/LearningPrograms/blob/master/Haskell/Parsing/xml_3.hs 
The whole thing is less than 100 lines and it can handle comments as well.

I have an outstanding question - What's the second parameter of the parse function really for?

Regards,
Kashyap

Re: Need help with learning Parsec

Simon Hengel
> I have an outstanding question - What's the second parameter of the
> parse function really for?

It is the name of the source (typically a file name), which is used to refer to the source file in parse errors.

Cheers,
Simon

Re: Need help with learning Parsec

C K Kashyap
What's the function to access it?

Re: Need help with learning Parsec

Antoine Latter-2
On Sun, Jul 22, 2012 at 11:00 AM, C K Kashyap <[hidden email]> wrote:
> What's the function to access it?
>

The function 'runParser' returns either a result or a ParseError. You
can extract the error position with the 'errorPos' function, and then
you can extract the name of the file from the position with
'sourceName'.

The 'Show' instance of ParseError does this.
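
As a rough illustration, assuming the Parsec 3 module Text.Parsec: the name passed as the second argument to parse can be read back from a failed parse with errorPos and sourceName. The names errorSource and digits are hypothetical, and "input.txt" is just a label here; no file is read.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: the source name recorded in a parse error.
errorSource :: ParseError -> SourceName
errorSource = sourceName . errorPos

-- A deliberately simple parser so that "abc" is rejected.
digits :: Parser String
digits = many1 digit

-- The second argument of parse is only a label for error positions.
result :: Either ParseError String
result = parse digits "input.txt" "abc"
```

For the failing input above, errorSource applied to the error yields "input.txt", and sourceLine and sourceColumn of the same position give the line and column.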

Antoine

Re: Need help with learning Parsec

C K Kashyap
Thanks a lot, Antoine and Simon.

Regards,
Kashyap
Re: Need help with learning Parsec

Christian Maeder-2
On 22.07.2012 at 17:21, C K Kashyap wrote:
> I've updated the parser here -
> https://github.com/ckkashyap/LearningPrograms/blob/master/Haskell/Parsing/xml_3.hs
>
> The whole thing is less than 100 lines and it can handle comments as well.

This code is still not nice: there is duplicate code in openTag and
withoutExplictCloseTag.

The "toplevel-try" in
   try withoutExplictCloseTag <|>  withExplicitCloseTag
should be avoided by factoring out the common prefix.

Again, I would avoid notFollowedBy by using many1.

   tag <- try(char '<' >> many1 (letter <|> digit))
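
A sketch of what factoring out the common prefix could look like, assuming the Parsec 3 imports below and tags without attributes. All names here (tagName, element, selfClosing, withChildren, text) are hypothetical and are not taken from the linked file.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data XML = Node String [XML]
         | Body String
         deriving (Show, Eq)

-- Shared prefix: "<" followed by a non-empty name. The try rewinds
-- the '<' when a closing tag ("</") is seen instead.
tagName :: Parser String
tagName = try (char '<' >> many1 (letter <|> digit))

-- Parse the prefix once, then decide between "/>" and ">...</name>".
element :: Parser XML
element = do
  name <- tagName
  selfClosing name <|> withChildren name

selfClosing :: String -> Parser XML
selfClosing name = Node name [] <$ string "/>"

withChildren :: String -> Parser XML
withChildren name = do
  _ <- char '>'
  children <- many (element <|> text)
  _ <- string "</" >> string name >> char '>'
  return (Node name children)

text :: Parser XML
text = Body <$> many1 (noneOf "<")
```

Because both alternatives start after the shared tagName, no try is needed around either branch.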

In quotedChar you want to escape not only the quote but at least the
backslash, too. You could allow any character to be escaped with a
backslash using:
   quotedChar c =
     try (char '\\' >> anyChar) <|> noneOf [c, '\\']
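
For illustration, here is that quotedChar inside a small quoted-string parser. The Parsec 3 imports are assumed, and quotedString is a hypothetical name that accepts either single or double quotes.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A backslash escapes whatever character follows it; otherwise any
-- character except the closing quote and the backslash is accepted.
quotedChar :: Char -> Parser Char
quotedChar c =
  try (char '\\' >> anyChar) <|> noneOf [c, '\\']

-- Hypothetical wrapper: the closing quote must match the opening one.
quotedString :: Parser String
quotedString = do
  q <- oneOf "\"'"
  s <- many (quotedChar q)
  _ <- char q
  return s
```

Parsing the six characters "a\"b" (with the surrounding double quotes) should then yield the three characters a"b.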

Writing a separate parser stripLeadingSpaces is overkill. Just use
   "spaces >> parseXML"

(or apply "dropWhile isSpace" to the input string)
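
Both options can be sketched as follows, assuming the Parsec 3 imports below. The parser word is only a stand-in for parseXML, and the other names are hypothetical.

```haskell
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for the real document parser.
word :: Parser String
word = many1 letter

-- Option 1: skip leading whitespace inside the parser.
skipThenParse :: Parser String
skipThenParse = spaces >> word

-- Option 2: trim the input string before parsing.
parseTrimmed :: String -> Either ParseError String
parseTrimmed = parse word "" . dropWhile isSpace
```

One difference worth keeping in mind: with the second option, error positions are counted from the start of the trimmed string rather than the original input.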

C.

[...]

Re: Need help with learning Parsec

C K Kashyap
Thank you so much, Christian, for your feedback ... I shall incorporate it.

Regards,
Kashyap
