question, chapter 10 Real World Haskell

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

question, chapter 10 Real World Haskell

Michael Mossey
I'll looking at the parser example, page 242 in Chapter 10 of Real World Haskell, and
they are defining a type of monadic parser with the help of an operator they call ==>

You can find chapter 10 online. This ebook doesn't have page numbers, but you can
find the example I'm looking at in the second called "A more interesting parser",
about 40% of the way down:

<http://book.realworldhaskell.org/read/code-case-study-parsing-a-binary-data-format.html>

The authors have defined their parser by chaining together functions with ==>. The
first function is "getState". What confuses me is: they use getState to "get the
state out of the Parser," but a Parser is by definition a function that takes the
parse state as its lone argument. I don't understand why they can't drop getState
entirely. Maybe it's a simply a way to avoid wrapping the entire function in Parser
(...)?

Some of this stuff looks "inefficient" to me, but I realize that in a lazy language
with an optimizing compiler you can often write long chains of functions (many of
which discard their results) and not impede efficiency.


Thanks,
Mike

Reply | Threaded
Open this post in threaded view
|

question, chapter 10 Real World Haskell

Quentin Moser
On Sat, 11 Apr 2009 02:02:04 -0700
Michael Mossey <[hidden email]> wrote:

> I'll looking at the parser example, page 242 in Chapter 10 of Real
> World Haskell, and they are defining a type of monadic parser with
> the help of an operator they call ==>
>
> You can find chapter 10 online. This ebook doesn't have page numbers,
> but you can find the example I'm looking at in the second called "A
> more interesting parser", about 40% of the way down:
>
> <http://book.realworldhaskell.org/read/code-case-study-parsing-a-binary-data-format.html>
>
> The authors have defined their parser by chaining together functions
> with ==>. The first function is "getState". What confuses me is: they
> use getState to "get the state out of the Parser," but a Parser is by
> definition a function that takes the parse state as its lone
> argument. I don't understand why they can't drop getState entirely.
> Maybe it's a simply a way to avoid wrapping the entire function in
> Parser (...)?
>
> Some of this stuff looks "inefficient" to me, but I realize that in a
> lazy language with an optimizing compiler you can often write long
> chains of functions (many of which discard their results) and not
> impede efficiency.
>
>
> Thanks,
> Mike
>
> _______________________________________________
> Beginners mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/beginners


This is a matter of good coding style (in the sense of modularity,
maintanability, etc.).

As you've seen it, the authors have first
started with a (>>?) function that allows them to combine Maybes, that
is to combine functions with a possibility of error.

We therefore had the following functions and types:

> data Maybe a     -- The type of parsing results
> (>>?) :: Maybe a -> (a -> Maybe b) -> Maybe b
>                  -- To combine parsing functions
>
> Nothing :: Maybe a    -- To signal an error
> Just :: a -> Maybe a  -- To return a result without error

Code written using this (i.e. parseP5_take2) makes explicit use of the
representation of parsing results as Maybe values.

When they added implicit state and error messages to the machinery,
they had to modify their parsing code by renaming things around. More
importantly, the changes affected all their code, even the parts that
wouldn't make use of the new functionality. If they had already written
a huge body of code that could work perfectly well without reporting
errors or accessing state, they'd still have needed to refactor it.

Now assume they had, instead, treated the type of parsing results as
abstract from the beginning:

> newtype Parse a = Parse (Maybe a)
>                  -- The type of parsing results
> (===>) :: Parse a -> (a -> Parse b) -> Parse b
> p ===> f = case p of
>              Parse Nothing -> Parse Nothing
>              Parse (Just a) -> f a
>
> failure :: Parse a
> failure = Parse Nothing
>
> result :: a -> Parse a
> result a = Parse (Just a)

Writing their code in terms of Parse, (===>), failure and result rather
than Maybe, (>>?), Nothing and Just wouldn't have been any more
difficult. But they could then have added implicit state and error
messages by simpliy replacing the definitions:

> newtype Parse a = Parse (ParseState -> Either String (a, ParseState))
>                  -- The type of parsing results
> (===>) :: Parse a -> (a -> Parse b) -> Parse b
> (===>) = ...
>
> failure = bail "generic error"
> result = identity
>
> -- And the new primitives:
> bail :: String -> Parse a
> getState :: Parse ParseState
> putState :: ParseState -> Parse ()

Doing so, code that didn't make use of the new functionality
wouldn't have needed to be modified _at all_. Code that made use of it
would simply have called the new functions bail, getState and putState
at appropriate points. Given the magnitude of the change in
functionality, this is actually pretty incredible.



Now, and to finally answer your question:

Besides state and errors, there are a number of other useful facilities
that can fit in the framework of a (Parse a) type and a (===>) function
as above. Real World Haskell will introduce them in a systematic way
starting with Chapter 14 (Monads).

As long as our parsing code treats the Parse type as abstract, adding
another one of these facilities (like, say, backtracking) will be as
straightforward as adding state and error messages was in my example
above, and won't require any change in code that doesn't make use of
it. If we started explicitly using the fact that (Parse a) is a
function on ParseState, things would be different.

It is, generally, a matter of good style in all programming paradigms
to treat values as abstract as often as possible and only access them
through a well-defined interface that isn't dependent on their actual
representations. This is even more true in cases like this, when the
interface in question conforms to a well-known pattern (in this
particular case, it's a Monad) that can accomodate a large number of
different capabilities and uses.
Reply | Threaded
Open this post in threaded view
|

question, chapter 10 Real World Haskell

Michael Mossey


Quentin Moser wrote:

> On Sat, 11 Apr 2009 02:02:04 -0700
> Michael Mossey <[hidden email]> wrote:
>
>
> This is a matter of good coding style (in the sense of modularity,
> maintanability, etc.).
>
> As you've seen it, the authors have first
> started with a (>>?) function that allows them to combine Maybes, that
> is to combine functions with a possibility of error.
...

Thanks for the detailed explanation. I like how you show that the original >>?
function, and the method of signaling either an error or a result, could have been
written with ==> notation.

But, I'm still confused on a point. Let me put it this way. The authors wrote:

-- file: ch10/Parse.hs
-- import the Word8 type from Data.Word
parseByte :: Parse Word8
parseByte =
     getState ==> \initState ->
     case L.uncons (string initState) of
       Nothing ->
           bail "no more input"
       Just (byte,remainder) ->
           putState newState ==> \_ ->
           identity byte
         where newState = initState { string = remainder,
                                      offset = newOffset }
               newOffset = offset initState + 1


Why couldn't they have avoided getState by writing

parseByte = Parse (
   \initState ->
   case L.unconcs (string initState) of
     ...
     ...
               newOffset = offset initState +1
   )

...because a parser is by definition a function that takes ParseState as its input. I
understand you are saying that in future chapters they may introduce new capabilities
that fit within this framework, so maybe that's what I'm not seeing. Maybe they will
redefine getState (or something equivalent) so that it does more than grab the
unchanged state from the Parse.

But I'm curious to know if my second implementation works the same as the first, in
theory.

Thanks
Reply | Threaded
Open this post in threaded view
|

question, chapter 10 Real World Haskell

Daniel Fischer-4
Am Samstag 11 April 2009 16:41:50 schrieb Michael Mossey:

> Quentin Moser wrote:
> > On Sat, 11 Apr 2009 02:02:04 -0700
> > Michael Mossey <[hidden email]> wrote:
> >
> >
> > This is a matter of good coding style (in the sense of modularity,
> > maintanability, etc.).
> >
> > As you've seen it, the authors have first
> > started with a (>>?) function that allows them to combine Maybes, that
> > is to combine functions with a possibility of error.
>
> ...
>
> Thanks for the detailed explanation. I like how you show that the original
> >>? function, and the method of signaling either an error or a result,
> could have been written with ==> notation.
>
> But, I'm still confused on a point. Let me put it this way. The authors
> wrote:
>
> -- file: ch10/Parse.hs
> -- import the Word8 type from Data.Word
> parseByte :: Parse Word8
> parseByte =
>      getState ==> \initState ->
>      case L.uncons (string initState) of
>        Nothing ->
>            bail "no more input"
>        Just (byte,remainder) ->
>            putState newState ==> \_ ->
>            identity byte
>          where newState = initState { string = remainder,
>                                       offset = newOffset }
>                newOffset = offset initState + 1
>
>
> Why couldn't they have avoided getState by writing
>
> parseByte = Parse (
>    \initState ->
>    case L.unconcs (string initState) of
>      ...
>      ...
>                newOffset = offset initState +1
>    )
>
> ...because a parser is by definition a function that takes ParseState as
> its input. I understand you are saying that in future chapters they may
> introduce new capabilities that fit within this framework, so maybe that's
> what I'm not seeing. Maybe they will redefine getState (or something
> equivalent) so that it does more than grab the unchanged state from the
> Parse.

In a real library, the constructor Parse would not be exported, to allow later changing
the implementation of the Parse type without breaking user code.
So user code like parseByte cannot access the constructor and must use the exported API
(getState, putState, identity, bail, ...).

>
> But I'm curious to know if my second implementation works the same as the
> first, in theory.

As long as the constructor is accessible, yes.

>
> Thanks

Reply | Threaded
Open this post in threaded view
|

question, chapter 10 Real World Haskell

Michael Mossey


Daniel Fischer wrote:
>
> In a real library, the constructor Parse would not be exported, to allow later changing
> the implementation of the Parse type without breaking user code.
> So user code like parseByte cannot access the constructor and must use the exported API
> (getState, putState, identity, bail, ...).

thanks for the point, I get it. One other question. Later in the chapter they define
peekByte as follows:

-- file: ch10/Parse.hs
peekByte :: Parse (Maybe Word8)
peekByte = (fmap fst . L.uncons . string) <$> getState

Here they are accessing the 'string' field of the state. So whomever writes this
function needs to have the accessor functions. At this point I'm wondering how much
state is really getting hidden. Or maybe peekByte would only be written inside the
original library.

I was just playing around and discovered you can choose to export the accessor
functions or not. Let's say we have

--------
module Mod where

data What = What { acc :: Int64 }
--------

now in a separate module we write:

-------------------
-- All fine:
import Mod
x1 = What 3
x2 = What { acc = 3 }
x3 = acc x1

---------------

Now we revisit the first module and write
module Mod ( What(What) ) where

Now,
import Mod
x1 = What 3             -- Fine
x2 = What { acc = 3 }   -- Error
x3 = acc x1             -- Error

So apparently this exports the constructor but not the accessor fields. The item can
be constructed in the normal way, but not in record syntax way.

Finally
module Mod ( What(acc) ) where
   --> This exports the accessors which can be used to access, but not to construct

module Mod (What(..)) where
    --> export everything


So the lesson is, if we are going to allow users to write functions like peekByte, we
have to export the accessor functions, but not necessarily any constructors.
Reply | Threaded
Open this post in threaded view
|

question, chapter 10 Real World Haskell

Quentin Moser
On Sat, 11 Apr 2009 09:37:53 -0700
Michael Mossey <[hidden email]> wrote:

> One other question. Later in the
> chapter they define peekByte as follows:
>
> -- file: ch10/Parse.hs
> peekByte :: Parse (Maybe Word8)
> peekByte = (fmap fst . L.uncons . string) <$> getState
>
> Here they are accessing the 'string' field of the state. So whomever
> writes this function needs to have the accessor functions. At this
> point I'm wondering how much state is really getting hidden. Or maybe
> peekByte would only be written inside the original library.

Even before worrying about the accessor functions, a parsing library
would in the first place not even export getState, putState, or the
ParseState type. It would instead provide functions like parseByte and
peekByte as primitives from which all complex parsers will be built.
Reply | Threaded
Open this post in threaded view
|

question, chapter 10 Real World Haskell

Daniel Fischer-4
Am Samstag 11 April 2009 19:04:22 schrieb Quentin Moser:

> On Sat, 11 Apr 2009 09:37:53 -0700
>
> Michael Mossey <[hidden email]> wrote:
> > One other question. Later in the
> > chapter they define peekByte as follows:
> >
> > -- file: ch10/Parse.hs
> > peekByte :: Parse (Maybe Word8)
> > peekByte = (fmap fst . L.uncons . string) <$> getState
> >
> > Here they are accessing the 'string' field of the state. So whomever
> > writes this function needs to have the accessor functions. At this
> > point I'm wondering how much state is really getting hidden. Or maybe
> > peekByte would only be written inside the original library.

The point is not so much hiding the state, but hiding the details of the Parse type.

newtype Parse a = Parse { runParse :: ParseState -> Either String (a,ParseState) }

is, except for the names, StateT ParseState (Either String) a.

What peekByte assumes is that there is an

instance MonadState ParseState Parse where...

, implicitly at least. peekByte won't break if the implementation of Parse is changed, as
long as that is maintained.
If you look at the haddock docs for Parsec, you'll see that there the State is exposed
while GenParser is only exported as an abstract type, allowing the details to be changed
if deemed advantageous.

>
> Even before worrying about the accessor functions, a parsing library
> would in the first place not even export getState, putState, or the
> ParseState type. It would instead provide functions like parseByte and
> peekByte as primitives from which all complex parsers will be built.

Parsec does:
getParserState
setParserState