attoparsec: How to use the many1 combinator

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

attoparsec: How to use the many1 combinator

Roland Senn
I'm trying to do a first very simple example with attoparsec.

The file "test.txt" contains the line:

START 111 2222 333 END

The following code works and gives the result: Done "\n"
["111","2222","333"]

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import Control.Applicative
import Data.Attoparsec.ByteString as P
main :: IO()
main = do
     bs <- BS.readFile "test.txt"
     parseTest pTest bs
pTest :: Parser [BS.ByteString]
pTest = do
     string "START"
     n1 <- pNumber
     n2 <- pNumber
     n3 <- pNumber
     string "END"
     return [n1, n2, n3]
pNumber :: Parser BS.ByteString
pNumber = do
     pSkipSpaces
     term <- P.takeWhile (\c -> c >= 0x31 && c <= 0x39)
     pSkipSpaces
     return term
pSkipSpaces :: Parser ()
pSkipSpaces = do
     P.takeWhile (\c -> c == 0x20)
     return ()

Unfortunately I must have exactly three numbers between START and END.
To make this more flexible, I changed pTest:

pTest :: Parser [BS.ByteString]
pTest = do
     string "START"
     ns <- P.many1 $ pNumber <* (string "END")
     return ns

Now the program fails with: Fail "2222 333 END\n" [] "Failed reading:
takeWith"
Why ??

Many thanks for your help!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test.hs
Type: text/x-haskell
Size: 589 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/beginners/attachments/20140621/3be9d117/attachment.hs>
-------------- next part --------------
START 111 2222 333 END

Reply | Threaded
Open this post in threaded view
|

attoparsec: How to use the many1 combinator

David McBride
You have a couple problems here.

P.many1 $ pNumber <* (string "END") ... which is the same as ... P.many1
(pNumber <* string "END")

In other words it matches 111 END 222 END 333 END.  Try this:

ns <- (P.many1 pNumber) <* string "END"

This almost works, but when you actually run it you'll get an infinite
loop.  The reason is because pNumber if it doesn't find a match it will
never actually fail, therefore many1 will continue using it forever
attempting to find a match that will never occur.  The reason why it never
fails is that it is composed of combinators that never fail, pSkipSpaces
and itself are both composed entirely of takeWhiles, which if they don't
find a match they just continue on without doing anything.  If you make a
small change though:

term <- P.takeWhile1 (\c -> c >= 0x31 && c <= 0x39)

then it works fine.




On Sat, Jun 21, 2014 at 8:47 AM, Roland Senn <rsx at bluewin.ch> wrote:

> I'm trying to do a first very simple example with attoparsec.
>
> The file "test.txt" contains the line:
>
> START 111 2222 333 END
>
> The following code works and gives the result: Done "\n"
> ["111","2222","333"]
>
> {-# LANGUAGE OverloadedStrings #-}
> import qualified Data.ByteString as BS
> import Control.Applicative
> import Data.Attoparsec.ByteString as P
> main :: IO()
> main = do
>     bs <- BS.readFile "test.txt"
>     parseTest pTest bs
> pTest :: Parser [BS.ByteString]
> pTest = do
>     string "START"
>     n1 <- pNumber
>     n2 <- pNumber
>     n3 <- pNumber
>     string "END"
>     return [n1, n2, n3]
> pNumber :: Parser BS.ByteString
> pNumber = do
>     pSkipSpaces
>     term <- P.takeWhile (\c -> c >= 0x31 && c <= 0x39)
>     pSkipSpaces
>     return term
> pSkipSpaces :: Parser ()
> pSkipSpaces = do
>     P.takeWhile (\c -> c == 0x20)
>     return ()
>
> Unfortunately I must have exactly three numbers between START and END. To
> make this more flexible, I changed pTest:
>
> pTest :: Parser [BS.ByteString]
> pTest = do
>     string "START"
>     ns <- P.many1 $ pNumber <* (string "END")
>     return ns
>
> Now the program fails with: Fail "2222 333 END\n" [] "Failed reading:
> takeWith"
> Why ??
>
> Many thanks for your help!
>
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/beginners/attachments/20140621/f10c0ea8/attachment.html>

Reply | Threaded
Open this post in threaded view
|

attoparsec: How to use the many1 combinator

Roland Senn
David,

Many thanks! Yes I had the infinite loop to0.

Regards
    Roland

Am 21.06.2014 15:33, schrieb David McBride:

> You have a couple problems here.
>
> P.many1 $ pNumber <* (string "END") ... which is the same as ...
> P.many1 (pNumber <* string "END")
>
> In other words it matches 111 END 222 END 333 END.  Try this:
>
> ns <- (P.many1 pNumber) <* string "END"
>
> This almost works, but when you actually run it you'll get an infinite
> loop.  The reason is because pNumber if it doesn't find a match it
> will never actually fail, therefore many1 will continue using it
> forever attempting to find a match that will never occur.  The reason
> why it never fails is that it is composed of combinators that never
> fail, pSkipSpaces and itself are both composed entirely of takeWhiles,
> which if they don't find a match they just continue on without doing
> anything.  If you make a small change though:
>
> term <- P.takeWhile1 (\c -> c >= 0x31 && c <= 0x39)
>
> then it works fine.
>
>
>
>
> On Sat, Jun 21, 2014 at 8:47 AM, Roland Senn <rsx at bluewin.ch
> <mailto:rsx at bluewin.ch>> wrote:
>
>     I'm trying to do a first very simple example with attoparsec.
>
>     The file "test.txt" contains the line:
>
>     START 111 2222 333 END
>
>     The following code works and gives the result: Done "\n"
>     ["111","2222","333"]
>
>     {-# LANGUAGE OverloadedStrings #-}
>     import qualified Data.ByteString as BS
>     import Control.Applicative
>     import Data.Attoparsec.ByteString as P
>     main :: IO()
>     main = do
>         bs <- BS.readFile "test.txt"
>         parseTest pTest bs
>     pTest :: Parser [BS.ByteString]
>     pTest = do
>         string "START"
>         n1 <- pNumber
>         n2 <- pNumber
>         n3 <- pNumber
>         string "END"
>         return [n1, n2, n3]
>     pNumber :: Parser BS.ByteString
>     pNumber = do
>         pSkipSpaces
>         term <- P.takeWhile (\c -> c >= 0x31 && c <= 0x39)
>         pSkipSpaces
>         return term
>     pSkipSpaces :: Parser ()
>     pSkipSpaces = do
>         P.takeWhile (\c -> c == 0x20)
>         return ()
>
>     Unfortunately I must have exactly three numbers between START and
>     END. To make this more flexible, I changed pTest:
>
>     pTest :: Parser [BS.ByteString]
>     pTest = do
>         string "START"
>         ns <- P.many1 $ pNumber <* (string "END")
>         return ns
>
>     Now the program fails with: Fail "2222 333 END\n" [] "Failed
>     reading: takeWith"
>     Why ??
>
>     Many thanks for your help!
>
>     _______________________________________________
>     Beginners mailing list
>     Beginners at haskell.org <mailto:Beginners at haskell.org>
>     http://www.haskell.org/mailman/listinfo/beginners
>
>
>
>
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/beginners/attachments/20140621/beb15b8b/attachment.html>