Dear gentle Haskellers,
I was trying to whet my Haskell skills today by using Parsec to parse XML. Here's the code I came up with. I wanted some help with the "gettext" parser that I've written: I had to put a dummy "char ' '" in there just to satisfy the "many" used in the xml parser. I'd very much appreciate any feedback.
    -- provides Parser, parse and all the combinators used below
    import Text.ParserCombinators.Parsec

    data XML = Node String [XML]
             | Body String deriving Show

    gettext = do
        x <- many (letter <|> digit )
        if (length x) > 0 then
            return (Body x)
        else (char ' ' >> (return $ Body ""))

    xml :: Parser XML
    xml = do {
         name <- openTag
        ; innerXML <- many innerXML
        ; endTag name
        ; return (Node name innerXML)
        }

    innerXML = do
            x <- (try xml <|> gettext)
            return x

    openTag :: Parser String
    openTag = do
        char '<'
        content <- many (noneOf ">")
        char '>'
        return content

    endTag :: String -> Parser String
    endTag str = do
        char '<'
        char '/'
        string str
        char '>'
        return str

    h1 = parse xml "" "<a>A</a>"
    h2 = parse xml "" "<a><b>A</b></a>"
    h3 = parse xml "" "<a><b><c></c></b></a>"
    h4 = parse xml "" "<a><b></b><c></c></a>"
Regards, Kashyap
_______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
On 19.07.2012 14:53, C K Kashyap wrote:
> I wanted some help with the "gettext" parser that I've written. I had to
> do a dummy "char ' ') in there just to satisfy the "many" used in the
> xml parser. I'd appreciate it very much if someone could give me some
> feedback.

You don't want empty bodies! So use many1 in gettext.

    gettext = fmap Body $ many1 $ letter <|> digit

If you have spaces in your bodies, skip them or allow them with noneOf "<".

HTH Christian
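A minimal sketch of Christian's `many1` suggestion applied to the original program, assuming parsec 3 (the `Text.Parsec` and `Text.Parsec.String` modules) rather than the older `Text.ParserCombinators.Parsec` interface. Parsec's `many` refuses a parser that can succeed without consuming input, which is what forced the dummy `char ' '`; `many1` never does that, so the workaround can go. The `try` in `innerXML` stays here, because the unchanged `openTag` still accepts `</a>` as a tag named `/a`. The `parseXML` helper is hypothetical, added only so results can be compared.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data XML = Node String [XML]
         | Body String
         deriving (Show, Eq)

-- many1 never succeeds on empty input, so no dummy "char ' '" is needed
gettext :: Parser XML
gettext = fmap Body $ many1 $ letter <|> digit

xml :: Parser XML
xml = do
  name  <- openTag
  inner <- many innerXML
  endTag name
  return (Node name inner)

-- 'try' is still required: openTag below also reads "</a>" as a tag "/a",
-- and 'try' rewinds the input when that mistaken parse fails later on
innerXML :: Parser XML
innerXML = try xml <|> gettext

openTag :: Parser String
openTag = do
  char '<'
  content <- many (noneOf ">")
  char '>'
  return content

endTag :: String -> Parser String
endTag str = do
  char '<'
  char '/'
  string str
  char '>'
  return str

-- Hypothetical helper: render parse errors as strings so results compare with ==
parseXML :: String -> Either String XML
parseXML input = either (Left . show) Right (parse xml "" input)
```

In GHCi, `parseXML "<a><b></b><c></c></a>"` evaluates to `Right (Node "a" [Node "b" [],Node "c" []])`.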
|
In reply to this post by C K Kashyap
On 19.07.2012 14:53, C K Kashyap wrote:
> innerXML = do
>         x <- (try xml <|> gettext)
>         return x

Omit "try" (and return). xml always starts with "<" whereas gettext never does.

C.
|
In reply to this post by Christian Maeder-2
gettext = (many1 $ noneOf "><") >>= (return . Body)
works for your case.

On Thu, Jul 19, 2012 at 6:37 PM, Christian Maeder <[hidden email]> wrote:
> On 19.07.2012 14:53, C K Kashyap wrote:

I drink I am thunk.
|
In reply to this post by Christian Maeder-2
On 19.07.2012 15:14, Christian Maeder wrote:
> On 19.07.2012 14:53, C K Kashyap wrote:
>> innerXML = do
>>         x <- (try xml <|> gettext)
>>         return x
>
> Omit "try" (and return).
> xml always starts with "<" whereas gettext never does.

I was wrong: you do not want to swallow an endTag as an openTag (the current openTag happily reads "</a>" as an opening tag named "/a").

openTag should start with:

    try $ char '<' >> notFollowedBy (char '/')

and endTag should start with:

    try $ string "</"

C.
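The sketch below puts Christian's corrected `openTag` and `endTag` together with Sai's `noneOf`-based `gettext`, again assuming parsec 3. Because `openTag` now fails without consuming anything when it sees `</`, `innerXML` can be a plain `xml <|> gettext`. `parseXML` is a hypothetical helper for comparing results.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data XML = Node String [XML]
         | Body String
         deriving (Show, Eq)

-- Any run of characters other than angle brackets, so bodies may contain spaces
gettext :: Parser XML
gettext = fmap Body (many1 (noneOf "><"))

xml :: Parser XML
xml = do
  name  <- openTag
  inner <- many innerXML
  endTag name
  return (Node name inner)

-- No 'try' needed: openTag fails without consuming input on "</"
innerXML :: Parser XML
innerXML = xml <|> gettext

-- "<" followed by "/" is not an opening tag; 'try' gives the '<' back
-- when notFollowedBy rejects the '/'
openTag :: Parser String
openTag = do
  try (char '<' >> notFollowedBy (char '/'))
  content <- many (noneOf ">")
  char '>'
  return content

endTag :: String -> Parser String
endTag str = do
  try (string "</")
  string str
  char '>'
  return str

-- Hypothetical helper: render parse errors as strings so results compare with ==
parseXML :: String -> Either String XML
parseXML input = either (Left . show) Right (parse xml "" input)
```

For example, `parseXML "<a>hello world</a>"` gives `Right (Node "a" [Body "hello world"])`, while mismatched tags such as `"<a></b>"` give a `Left`.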
|
In reply to this post by Sai Hemanth K
On Thu, Jul 19, 2012 at 06:45:05PM +0530, Sai Hemanth K wrote:
> gettext = (many1 $ noneOf "><") >>= (return . Body)

You can simplify this to:

    -- Parsec defines its own (<|>) and many, so hide Applicative's versions
    import Control.Applicative hiding ((<|>), many)

    gettext = Body <$> many1 (noneOf "><")

And some of your other parsers can be simplified as well:

    innerXML = xml <|> gettext

    openTag :: Parser String
    openTag = char '<' *> many (noneOf ">") <* char '>'

    endTag :: String -> Parser String
    endTag str = string "</" *> string str <* char '>'
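A self-contained sketch of Simon's Applicative-style parsers, with Christian's "not followed by `/`" guard folded into `openTag`; it assumes parsec 3. `Control.Applicative` exports a `many` as well as a `(<|>)`, and both clash with Parsec's own, so this version hides both. In `p <* q` both parsers run and `p`'s result is kept, while `p *> q` keeps `q`'s. `parseXML` is a hypothetical helper.

```haskell
-- Parsec has its own (<|>) and many; hide the Applicative ones to avoid ambiguity
import Control.Applicative hiding ((<|>), many)
import Text.Parsec
import Text.Parsec.String (Parser)

data XML = Node String [XML]
         | Body String
         deriving (Show, Eq)

gettext :: Parser XML
gettext = Body <$> many1 (noneOf "><")

-- Parse the children, then require the matching close tag but keep the node
xml :: Parser XML
xml = do
  name <- openTag
  Node name <$> many innerXML <* endTag name

innerXML :: Parser XML
innerXML = xml <|> gettext

-- The '<' is only committed to once we know no '/' follows it
openTag :: Parser String
openTag = try (char '<' <* notFollowedBy (char '/')) *> many (noneOf ">") <* char '>'

-- Returns the result of 'string str', i.e. the tag name itself
endTag :: String -> Parser String
endTag str = string "</" *> string str <* char '>'

-- Hypothetical helper: render parse errors as strings so results compare with ==
parseXML :: String -> Either String XML
parseXML input = either (Left . show) Right (parse xml "" input)
```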
|
> gettext = Body <$> many1 (noneOf "><")
Note that this is the same as:

    gettext = Body `fmap` many1 (noneOf "><")
|
In reply to this post by Simon Hengel
On Thu, Jul 19, 2012 at 03:34:47PM +0200, Simon Hengel wrote:
> openTag :: Parser String
> openTag = char '<' *> many (noneOf ">") <* char '>'
>
> endTag :: String -> Parser String
> endTag str = string "</" *> string str <* char '>'

Well yes, modified according to what Christian Maeder just suggested.

Cheers,
Simon
|
In reply to this post by Christian Maeder-2
On 19.07.2012 15:26, Christian Maeder wrote:
> On 19.07.2012 15:14, Christian Maeder wrote:
> I was wrong, you do not want to swallow an endTag as openTag.
>
> openTag should start with:
>     try $ char '<' >> notFollowedBy (char '/')
>
> and endTag should start with:
>     try $ string "</"

Strictly, the try in endTag is not necessary (only the one in openTag is).

C.
|
In reply to this post by Simon Hengel
On 19.07.2012 15:41, Simon Hengel wrote:
> On Thu, Jul 19, 2012 at 03:34:47PM +0200, Simon Hengel wrote:
>> openTag :: Parser String
>> openTag = char '<' *> many (noneOf ">") <* char '>'

If you disallow empty tags and "/" within tags, then you can avoid the notFollowedBy construct by:

    openTag = try (char '<' *> many1 (noneOf "/>")) <* char '>'

C.
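To check that Christian's `many1` variant of `openTag` really fails without consuming input on a closing tag, the sketch below (assuming parsec 3) pairs it with a fallback through `<|>`: Parsec only tries the right-hand alternative when the left one failed without consuming anything. The `probe` helper and its marker string are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Tag names must be non-empty and may not contain '/', so "</a>" fails
-- inside the 'try' and no input is consumed
openTag :: Parser String
openTag = try (char '<' *> many1 (noneOf "/>")) <* char '>'

-- Hypothetical helper: fall back to a marker whenever openTag fails
-- without consuming input (the only case in which <|> tries the fallback)
probe :: String -> Either String String
probe input =
  either (Left . show) Right (parse (openTag <|> return "no open tag") "" input)
```

`probe "<a>"` gives `Right "a"` and `probe "</a>"` gives `Right "no open tag"`. As stated above, the trade-off is that a self-closing tag such as `<br/>` now fails with a parse error, since `/` is no longer allowed inside a tag.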
|
Thank you so much ... I've updated my monad version here -
and the Applicative version here -
https://github.com/ckkashyap/LearningPrograms/blob/master/Haskell/Parsing/xml_2.hs

The Applicative version, however, does not seem to work.

Is there a good Parsec tutorial that I can look up? I am checking out http://legacy.cs.uu.nl/daan/download/parsec/parsec.html, but I am looking for a tutorial in which a complex parser is built from the ground up.

Next I'd like to take care of escaped angle brackets.

Regards,
Kashyap

On Thu, Jul 19, 2012 at 7:40 PM, Christian Maeder <[hidden email]> wrote:
> On 19.07.2012 15:41, Simon Hengel wrote:
|
I've updated the parser here -
https://github.com/ckkashyap/LearningPrograms/blob/master/Haskell/Parsing/xml_3.hs
The whole thing is fewer than 100 lines, and it can handle comments as well.
I have an outstanding question: what's the second parameter of the parse function really for?

Regards,
Kashyap

On Thu, Jul 19, 2012 at 8:31 PM, C K Kashyap <[hidden email]> wrote:
> Thank you so much ... I've updated my monad version here -
|
> I have an outstanding question - What's the second parameter of the
> parse function really for?

It's used to refer to the source file on parse errors.

Cheers,
Simon
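A small illustration of Simon's answer, assuming parsec 3: the label passed as the second argument of `parse` appears at the start of the error message. `simpleTag` and `describe` are stand-ins made up for this example, not code from the thread.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A tiny stand-in parser that fails at a predictable column on "<1>"
simpleTag :: Parser String
simpleTag = char '<' *> many1 letter <* char '>'

-- The second argument of 'parse' is just a label, normally a file name;
-- Parsec stores it in each source position and prints it in error messages
describe :: String -> String -> String
describe label input = either show id (parse simpleTag label input)
```

`describe "input.xml" "<1>"` should produce a message beginning with `"input.xml" (line 1, column 2)`, whereas an empty label leaves only `(line 1, column 2)`.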
|
What's the function to access it?
On Sun, Jul 22, 2012 at 9:16 PM, Simon Hengel <[hidden email]> wrote:
|
On Sun, Jul 22, 2012 at 11:00 AM, C K Kashyap <[hidden email]> wrote:
> What's the function to access it?

The function 'runParser' returns either a result or a ParseError. You can extract the error position with the 'errorPos' function, and then you can extract the name of the file from the position with 'sourceName'. The 'Show' instance of ParseError does exactly this when it prints an error.

Antoine
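Antoine's recipe, sketched with parsec 3: `errorPos` turns a `ParseError` into a `SourcePos`, and `sourceName`, `sourceLine` and `sourceColumn` read the individual fields of that position. `runParser` is `parse` with an extra user-state argument, passed as `()` here. `simpleTag`, `errorLocation` and `locate` are hypothetical names used only for this illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A small stand-in parser that can fail at a known position
simpleTag :: Parser String
simpleTag = char '<' *> many1 letter <* char '>'

-- Take a ParseError apart instead of relying on its Show instance
errorLocation :: ParseError -> (String, Int, Int)
errorLocation err = (sourceName pos, sourceLine pos, sourceColumn pos)
  where
    pos = errorPos err

-- runParser takes a user state; with () it behaves like parse
locate :: String -> String -> Maybe (String, Int, Int)
locate name input =
  either (Just . errorLocation) (const Nothing) (runParser simpleTag () name input)
```

`locate "input.xml" "<1>"` should give `Just ("input.xml",1,2)`, and a successful parse gives `Nothing`.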
|
Thanks a lot, Antoine and Simon.
Regards, Kashyap
On Mon, Jul 23, 2012 at 12:15 AM, Antoine Latter <[hidden email]> wrote:
|
In reply to this post by C K Kashyap
On 22.07.2012 17:21, C K Kashyap wrote:
> I've updated the parser here -
> https://github.com/ckkashyap/LearningPrograms/blob/master/Haskell/Parsing/xml_3.hs
>
> The whole thing is less than 100 lines and it can handle comments as well.

This code is still not nice:

Duplicate code in openTag and withoutExplictCloseTag.

The "toplevel-try" in

    try withoutExplictCloseTag <|> withExplicitCloseTag

should be avoided by factoring out the common prefix.

Again, I would avoid notFollowedBy by using many1:

    tag <- try(char '<' >> many1 (letter <|> digit))

In quotedChar you want to escape not only the quote but at least the backslash, too. You could allow any character to be escaped by a backslash using:

    quotedChar c = try (char '\\' >> anyChar) <|> noneOf [c, '\\']

Writing a separate parser stripLeadingSpaces is overkill. Just use "spaces >> parseXML" (or apply "dropWhile isSpace" to the input string).

C.
[...]
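Kashyap's xml_3.hs is not reproduced in the thread, so the following is only a hypothetical sketch (parsec 3 assumed) of how Christian's suggestions might fit together: the common prefix `<name` is parsed once, after which a single character decides between a self-closing `/>` and a `>` followed by children and a closing tag; `quotedChar` is Christian's definition; leading whitespace is handled with `spaces`. The names `element`, `content`, `closeTag`, `document`, `quotedString` and `parseDoc` are made up for this example, and attributes are not handled.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data XML = Node String [XML]
         | Body String
         deriving (Show, Eq)

-- A backslash escapes any character (including the quote and the backslash);
-- otherwise any character except the quote or a lone backslash is allowed
quotedChar :: Char -> Parser Char
quotedChar c = try (char '\\' >> anyChar) <|> noneOf [c, '\\']

-- Hypothetical helper: a string delimited by the quote character c
quotedString :: Char -> Parser String
quotedString c = char c *> many (quotedChar c) <* char c

-- The shared prefix "<name" is parsed once; many1 makes "</..." fail inside
-- the try. Then "/>" or ">" decides the form, so no top-level try is needed.
element :: Parser XML
element = do
  name <- try (char '<' >> many1 (letter <|> digit))
  spaces
  (string "/>" >> return (Node name []))
    <|> (char '>' *> (Node name <$> many content) <* closeTag name)

content :: Parser XML
content = element <|> (Body <$> many1 (noneOf "<>"))

closeTag :: String -> Parser String
closeTag name = string "</" *> string name <* char '>'

-- Surrounding whitespace is skipped with 'spaces', not a separate parser
document :: Parser XML
document = spaces *> element <* spaces <* eof

-- Hypothetical helper: render parse errors as strings so results compare with ==
parseDoc :: String -> Either String XML
parseDoc input = either (Left . show) Right (parse document "" input)
```

With this factoring, `parseDoc "<br />"` yields `Right (Node "br" [])`, and `parse (quotedString '"') ""` reads `"a\"b"` as the three characters `a"b`.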
|
Thank you so much, Christian, for your feedback ... I shall incorporate it.
Regards, Kashyap
On Mon, Jul 23, 2012 at 3:17 PM, Christian Maeder <[hidden email]> wrote:
> On 22.07.2012 17:21, C K Kashyap wrote:
