Parsec and Validation


Vladimir Solmon
I'm attempting to use Parsec to write a parser for times and dates.
I'm aware of Data.Time.Format, but it doesn't offer the flexibility I
need for this project.

My current code (see below) uses Control.Monad.guard to validate the
numeric values for hours and minutes inside the parser. The problem
I'm running into is that checking for errors in the parsing code
causes Text.Parsec.Combinator.choice (in the time function below) to
fail and return an error if it cannot parse input using the first
option (tTimeHourMin), even if the input should match the second
option (t24hrClock).

I have several questions:

Why is choice failing in the time function below?

Is there a better way to do numeric validation while parsing?

Should I only use Parsec to validate that the input is syntactically
correct and do the numeric validation elsewhere?

What are some good examples of validating input using Parsec?


Here is a bit of the code I'm working with:

time :: Parser TimeOfDay
time = choice [ tTimeHourMin
              , t24hrClock
              ]

tTimeHourMin :: Parser TimeOfDay
tTimeHourMin = do
    hour <- range (0, 23)
    oneOf " :,."
    min <- range (0, 59)
    return (TimeOfDay hour min 0)

t24hrClock :: Parser TimeOfDay
t24hrClock = do
    (h, m) <- splitAt 2 <$> count 4 digit
    let hour = read h
    let min = read m
    guard (hour >= 0 && hour <= 23 && min >= 0 && min <= 59)
        <?> printf "24hr time represented as hhmm"
    return (TimeOfDay hour min 0)

range :: (Int,Int) -> Parser Int
range (lower, upper) = do
    t <- read <$> many1 digit
    guard (t >= lower && t <= upper)
        <?> printf "integer in range [%d,%d]" lower upper
    return t


Thanks for any suggestions,

vladimir

Ozgur Akgun
Could you try it with try applied to each alternative:

time :: Parser TimeOfDay
time = choice $ map try [ tTimeHourMin, t24hrClock ]

Loosely speaking, if tTimeHourMin consumes some input before failing,
Parsec does not backtrack, so choice never attempts t24hrClock.
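The consume-then-fail behaviour can be seen in isolation with a much smaller grammar. The following is only a sketch: the parsers ab and ac and the helper succeeds are hypothetical names made up for the illustration, not part of the code in this thread.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical parsers: both alternatives start with 'a', so the first one
-- consumes a character before it can notice that the input is really "ac".
ab, ac :: Parser String
ab = (\x y -> [x, y]) <$> char 'a' <*> char 'b'
ac = (\x y -> [x, y]) <$> char 'a' <*> char 'c'

-- Without try, a failure after consuming input is final for choice.
withoutTry :: Parser String
withoutTry = choice [ab, ac]

-- With try, the input is rewound and the second alternative gets a chance.
withTry :: Parser String
withTry = choice [try ab, ac]

-- Hypothetical helper: True when the parser accepts the input.
succeeds :: Parser a -> String -> Bool
succeeds p = either (const False) (const True) . parse p ""
```

In GHCi, succeeds withoutTry "ac" should be False while succeeds withTry "ac" should be True; both accept "ab".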

Best,
Ozgur

On 31 July 2010 20:57, Vladimir Solmon <[hidden email]> wrote:

> time :: Parser TimeOfDay
> time = choice [ tTimeHourMin,
>                           t24hrClock
>                        ]
>

Brandon S Allbery KF8NH
In reply to this post by Vladimir Solmon

On 7/31/10 15:57 , Vladimir Solmon wrote:
> time :: Parser TimeOfDay
> time = choice [ tTimeHourMin,
>                           t24hrClock
>                        ]

If the parse of tTimeHourMin fails after reading some characters (for a
t24hrClock value such as 1230, the many1 digit in range reads all four
digits and the range check then fails), those characters remain read.
Parsec treats a failure after consuming input as final, so choice does not
go on to try t24hrClock at all.  To prevent this, resetting the input to
where tTimeHourMin started its parse, wrap it in a try:

> time = choice [ try tTimeHourMin
> , t24hrClock
> ]
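As a rough check of this fix, here is a self-contained sketch built from the definitions in the original question. It assumes the parsec and time packages; the printf labels are replaced by plain strings, min is renamed to minute to avoid shadowing the Prelude function, and parseTime is a hypothetical helper added only for experimenting.

```haskell
import Control.Monad (guard)
import Data.Time.LocalTime (TimeOfDay (..))
import Text.Parsec
import Text.Parsec.String (Parser)

-- try rewinds the input when tTimeHourMin fails after reading digits.
time :: Parser TimeOfDay
time = choice [try tTimeHourMin, t24hrClock]

tTimeHourMin :: Parser TimeOfDay
tTimeHourMin = do
  hour <- range (0, 23)
  _ <- oneOf " :,."
  minute <- range (0, 59)
  return (TimeOfDay hour minute 0)

t24hrClock :: Parser TimeOfDay
t24hrClock = do
  (h, m) <- splitAt 2 <$> count 4 digit
  let hour = read h :: Int
      minute = read m :: Int
  guard (hour <= 23 && minute <= 59)
    <?> "24hr time represented as hhmm"
  return (TimeOfDay hour minute 0)

range :: (Int, Int) -> Parser Int
range (lower, upper) = do
  t <- read <$> many1 digit
  guard (t >= lower && t <= upper)
    <?> ("integer in range [" ++ show lower ++ "," ++ show upper ++ "]")
  return t

-- Hypothetical helper: Nothing on any parse error.
parseTime :: String -> Maybe TimeOfDay
parseTime = either (const Nothing) Just . parse time ""
```

With this version both "12:30" and "1230" should parse to the same TimeOfDay, while "2500" is rejected by the guard in t24hrClock.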

--
brandon s. allbery     [linux,solaris,freebsd,perl]      [hidden email]
system administrator  [openafs,heimdal,too many hats]  [hidden email]
electrical and computer engineering, carnegie mellon university      KF8NH

Stephen Tetley-2
Hello Vladimir

In general it is better to avoid the try combinator and left factor
the grammar instead.

Because parsers are functions, you can abuse left factoring a bit:
parse the prefix in a top-level combinator and pass the result to
the sub-parsers:

leftFactoredTime :: Parser TimeOfDay
leftFactoredTime = do
  hh   <- width2Num
  sep  <- optionMaybe (oneOf " :,.")
  case sep of
    Nothing -> t24hrClock hh
    Just _  -> tTimeHourMin hh

tTimeHourMin :: Int -> Parser TimeOfDay
tTimeHourMin hh = do
  mm <- width2Num
  return (TimeOfDay hh mm 0)


t24hrClock :: Int -> Parser TimeOfDay
t24hrClock hh = do
  mm <- width2Num
  return (TimeOfDay hh mm 0)


However, in this case the 24-hour clock and TimeHourMin parsers are
identical functions and the separator is a MacGuffin [*], so:

betterTime :: Parser TimeOfDay
betterTime = do
  hh    <- rangeP 0 23 width2Num
  _sep  <- optionMaybe (oneOf " :,.")
  mm    <- rangeP 0 59 width2Num
  return (TimeOfDay hh mm 0)

To parse ranges I would make a new combinator that takes a range plus
a number parser and returns a new number parser:


rangeP :: Int -> Int -> Parser Int -> Parser Int
rangeP lo hi p = do
  a <- p
  if (lo <= a && a <= hi) then return a else fail "out-of-range"
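Because the number parser is an argument, the same range check can wrap parsers of different shapes. The sketch below assumes the lower bound is passed first, as in the rangeP 0 23 calls above; natural, month and accepts are hypothetical names used only for illustration.

```haskell
import Data.Char (digitToInt)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Range check around any number parser; lower bound first (an assumption
-- chosen to match calls such as rangeP 0 23).
rangeP :: Int -> Int -> Parser Int -> Parser Int
rangeP lo hi p = do
  a <- p
  if lo <= a && a <= hi then return a else fail "out-of-range"

-- Hypothetical variable-width number parser that avoids read.
natural :: Parser Int
natural = foldl (\acc d -> 10 * acc + digitToInt d) 0 <$> many1 digit

-- Hypothetical use for the date side of the problem.
month :: Parser Int
month = rangeP 1 12 natural

-- Hypothetical helper: True when the parser accepts the input.
accepts :: Parser a -> String -> Bool
accepts p = either (const False) (const True) . parse p ""
```

Here accepts month "7" and accepts month "12" should hold, while "0" and "13" are rejected.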

Finally, it is good to avoid read when using Parsec. Parsec has
token parsers to read numbers, but here read can be avoided with this
one:

width2Num :: Parser Int
width2Num = do
   a <- digit
   b <- digit
   return $ (10*digitToInt a) + digitToInt b


digitToInt is in the Data.Char module.
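For anyone who wants to load the pieces into GHCi, they might be assembled as follows. This is a sketch under a few assumptions: the parsec and time packages are available, rangeP takes the lower bound first, the separator set is the one from the original question, and parseTime (with its eof check) is a hypothetical helper that is not part of the code above.

```haskell
import Data.Char (digitToInt)
import Data.Time.LocalTime (TimeOfDay (..))
import Text.Parsec
import Text.Parsec.String (Parser)

-- Exactly two digits, converted without read.
width2Num :: Parser Int
width2Num = do
  a <- digit
  b <- digit
  return (10 * digitToInt a + digitToInt b)

-- Range check around a number parser; lower bound first (an assumption).
rangeP :: Int -> Int -> Parser Int -> Parser Int
rangeP lo hi p = do
  a <- p
  if lo <= a && a <= hi then return a else fail "out-of-range"

-- The separator is optional, so "1230" and "12:30" take the same path.
betterTime :: Parser TimeOfDay
betterTime = do
  hh <- rangeP 0 23 width2Num
  _ <- optionMaybe (oneOf " :,.")
  mm <- rangeP 0 59 width2Num
  return (TimeOfDay hh mm 0)

-- Hypothetical helper: requires the whole input to be consumed.
parseTime :: String -> Maybe TimeOfDay
parseTime = either (const Nothing) Just . parse (betterTime <* eof) ""
```

Since no alternative is ever abandoned after consuming input, this version needs no try. Note that width2Num insists on two digits, so an input like "9:30" is rejected here even though the many1-based range parser in the original question would accept it.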

[*] A plot device used in Alfred Hitchcock's films to throw the viewer
off the scent.