Very simple parser

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Very simple parser

Gregory Propf
As a programming exercise I'm trying to use the State monad to create a simple parser.  It's for a very simple assembly language for a simple virtual machine.  The state is a string of instructions.  I want to be able to call something like getNextInstruction to pull out the next instruction and then update the state (string).  I know I can do this non-monadically by just passing the string explicitly each time but I'd like to learn more about the State monad.  I also know about Parsec and Happy and so forth but this is just an exercise so I want to do it this way.  Any ideas?  I can't seem to get anything to work.  I've tried different things but I suspect I'm just missing something basic.  Can someone post a simple prototype for this?  Just assume the instructions are integers.


Pinpoint customers who are looking for what you sell.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Very simple parser

Arie Peterson
Gregory Propf wrote:

> As a programming exercise I'm trying to use the State monad to create a
> simple parser.  It's for a very simple assembly language for a simple
> virtual machine.  The state is a string of instructions.  I want to be
> able to call something like getNextInstruction to pull out the next
> instruction and then update the state (string).  I know I can do this
> non-monadically by just passing the string explicitly each time but I'd
> like to learn more about the State monad.  I also know about Parsec and
> Happy and so forth but this is just an exercise so I want to do it this
> way.  Any ideas?  I can't seem to get anything to work.  I've tried
> different things but I suspect I'm just missing something basic.  Can
> someone post a simple prototype for this?  Just assume the instructions
> are integers.

Did you look at the documentation for the State monad?
<http://haskell.org/ghc/docs/latest/html/libraries/mtl/Control-Monad-State-Lazy.html>
It also contains some examples.

Getting the next instruction sounds like a job for 'get'; to remove it
from the state you might use 'put' or 'modify'.

If this doesn't help, can you show something you tried?


Greetings,

Arie

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Very simple parser

Stefan O'Rear
In reply to this post by Gregory Propf
vim: set ft=lhaskell:

On Mon, Jul 02, 2007 at 02:25:57PM -0700, Gregory Propf wrote:
| As a programming exercise I'm trying to use the State monad to create
| a simple parser.  It's for a very simple assembly language for a
| simple virtual machine.  The state is a string of instructions.  I
| want to be able to call something like getNextInstruction to pull out
| the next instruction and then update the state (string).  I know I can
| do this non-monadically by just passing the string explicitly each
| time but I'd like to learn more about the State monad.  I also know
| about Parsec and Happy and so forth but this is just an exercise so I
| want to do it this way.  Any ideas?  I can't seem to get anything to
| work.  I've tried different things but I suspect I'm just missing
| something basic.  Can someone post a simple prototype for this?  Just
| assume the instructions are integers.

For an example, here is the simplest parser type I know of; LL(1) a la
Crenshaw.

Our parser is simply a stateful computation using the rest of the input.

> import Char
> import Control.Monad.State

> type P = State [Char]

Simple primitives.  We need to be able to see what the next character
is.  Notice that we return Maybe because there might not be a next
character.  Also note the use of the State data constructor to modify
the value and return at the same time.

> look :: P (Maybe Char)
> look = State $ \ st -> case st of
>     []      -> (Nothing,  [])
>     (c:cs)  -> (Just c,   c:cs)

We need to do tests occasionally on the values.

> isDigit' = maybe False isDigit
> digitToInt' = maybe 0 digitToInt

getc is similar, but it removes the character.  This is typically done
after making a decision based on look.

> getc :: P (Maybe Char)
> getc = State $ \ st -> case st of
>     []      -> (Nothing,  [])
>     (c:cs)  -> (Just c,   cs)

If we find inconsistent input, we signal a fatal error using the fail
function already defined for State.  A more featureful monad such as
ErrorT (State [Char]) could be used to make error conditions non-fatal
for the program.

Often, we know what the lookahead will be; we can use this for better
error messages.  (We should also store the text position in the state,
but for pedagogical reasons I will ignore that).

For instance, if we are expecting something but don't find it, we use
expected:

> expected str = do
>     context <- gets (show . take 20)
>     fail $ str ++ " expected; found " ++ context ++ " instead"

Skipping whitespace is very important to handle our lines.  Note that
we also handle #-comments.

> white = look >>= \ch ->
>     case ch of  Just '#'   -> line  >> white
>                 Just ' '   -> getc  >> white
>                 Just '\t'  -> getc  >> white
>                 _          -> return ()
>
> line = look >>= \ch ->
>     case ch of  Just '\n'  -> return ()
>                 Nothing    -> return ()
>                 _          -> getc >> line

Another common pattern is to skip over a noise character while
verifying that it was correct.  Notice the use of expected to handle
error message formatting.

Also note that we skip whitespace afterward - from here on we will
maintain the invariant that the cursor does not point to spaces.

> match ch = look >>= \realch ->
>     if (realch == Just ch)
>         then getc
>         else expected (show ch)

Parsing integers is one of the problem requirements; it is handled
by dispatching on the first character, then parsing a natural.

> number = look >>= \ch -> case ch of
>     Just '+'  -> match '+'  >> natural 0
>     Just '-'  -> match '-'  >> negate `fmap` natural 0
>     _         -> natural 0

Naturals can be handled using a simple loop.  Notice that we check
lookahead at the *end*, this is necessary to avoid parsing eg "xx" as
a natural 0.  We use an accumulating parameter.

> natural acc = look >>= \ch -> do
>     unless (isDigit' ch) $ expected "number"
>
>     getc
>     let acc' = digitToInt' ch + acc * 10
>
>     look >>= \lk -> if isDigit' lk
>         then  natural acc'
>         else  white >> return acc'

A line of input to our assembler consists of horizontal whitespace, an
optional number, more horizontal whitespace, and a newline.

> inputLine = do
>     white
>     look >>= \ch -> if isDigit' ch
>         then  do  number
>                   white
>         else  return ()
>     match '\n'

All our input consists of input lines for as long as there is more.

> input = look >>= maybe (return ()) (\_ -> inputLine >> input)

> main = interact $ show . runState input

Stefan
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Very simple parser

Gregory Propf
In reply to this post by Gregory Propf
I hadn't seen that before, thanks.  My code wouldn't be that useful as it depends on some of my datatypes.  I don't know some basic things here since I'm so new to Haskell.  For example, am I to assume that I need to create my own instance of State and then define get and put for it?  Some of the examples seem to just use functions that manipulate a state but do not create a new datatype.

----- Original Message ----
From: Arie Peterson <[hidden email]>
To: [hidden email]
Sent: Monday, July 2, 2007 2:46:14 PM
Subject: Re: [Haskell-cafe] Very simple parser



Did you look at the documentation for the State monad?
<http://haskell.org/ghc/docs/latest/html/libraries/mtl/Control-Monad-State-Lazy.html>
It also contains some examples.

Getting the next instruction sounds like a job for 'get'; to remove it
from the state you might use 'put' or 'modify'.

If this doesn't help, can you show something you tried?


Greetings,

Arie

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe



Take the Internet to Go: Yahoo!Go puts the Internet in your pocket: mail, news, photos & more.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Very simple parser

Hugh Perkins
In reply to this post by Stefan O'Rear
Graham Hutton has some great tutorials on parsing.  Check out the "Are parsers monodic?" thread (not exact name) for a good reference.

There's also a good tutorial at http://www.cs.nott.ac.uk/~gmh/book.html In Section "Slides", click on "8 Functional parsers", but you may just want to start from 1.  They're really quick and painless.

Graham Hutton's tutorials are about the only tutorials on monads that make sense to me.  YMMV of course.


Other than that... a list is an instance of State, I think (?), so you can do something like (writing this in directly, without trying to compile):

processor :: State a
processor = do value <- gets head
                       case value of
                            "blah" -> return blah
                            "foo" -> return foo

dotest = evalState( processor )["blah","foo"]


Note that I'm a total newbie, and I didnt check this compiles (almost certainly doesnt) so take this with a wodge of salt


I cant say I really like the way I have a case that selects on strings to decide which function to call.  If someone knows a more elegant/spelling-safe way of doing this that'd be really useful generally.

For example something like this could be more spelling safe (if it worked) (maybe it does???):

case value of
   (showConstr $ toConstr $ blah) -> return blah
   (showConstr $ toConstr $ foo) -> return foo


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Very simple parser

Arie Peterson
In reply to this post by Gregory Propf
Gregory Propf wrote:

| [...] For example, am I to assume that I need to
| create my own instance of State and then define get and put for it?

No, there is a 'State s' monad provided (for arbitrary state type 's'),
which implements the 'get' and 'put' methods. In other words, 'State s' is
an instance of the 'MonadState s' class. This terminology can be really
confusing at first.

For now, you may forget about the MonadState class. Simply use 'get' &
friends and everything will work fine.

To get you going, start with the example from the documentation, modified
slightly:

> tick :: State Int String
> tick = do
>   n <- get
>   put (n+1)
>   return (show n)

If you want to actually run 'tick', use the 'runState' function:

> runTick :: Int -> (String,Int)
> runTick = runState tick

The argument of 'runTick' is used as initial state.
The returned String is the return value, the Int is the final state.

Now, put this code in a module and load it in an interpreter (ghci,
hugs,...). Unleash 'runTick' on an integer number of your choice, and
convince yourself that the result is what you would expect from the
definition of 'tick'.

Next, try to make some variants of 'tick': replace the return value by
something else, or use a list [Int] as state, instead of a single number.


Shout if you need more or other help.


Kind regards,

Arie


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Very simple parser

Gregory Propf
In reply to this post by Gregory Propf
This was a bit baffling too.  It appears that there's an implied argument to runTick.  This also works and makes it more explicit.  I suppose the compiler just works out that the only place to put the 'n' is after tick.

runTick :: Int -> (String,Int)
runTick n = runState tick n

----- Original Message ----
From: Arie Peterson <[hidden email]>
To: [hidden email]
Sent: Monday, July 2, 2007 4:51:59 PM
Subject: Re: [Haskell-cafe] Very simple parser


To get you going, start with the example from the documentation, modified
slightly:

> tick :: State Int String
> tick = do
>   n <- get
>   put (n+1)
>   return (show n)

If you want to actually run 'tick', use the 'runState' function:

> runTick :: Int -> (String,Int)
> runTick = runState tick






Be a PS3 game guru.
Get your game face on with the latest PS3 news and previews at Yahoo! Games.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Very simple parser

Tim Chevalier
On 7/2/07, Gregory Propf <[hidden email]> wrote:
>
>
>
> This was a bit baffling too.  It appears that there's an implied argument to runTick.  This also works and makes it more explicit.  I suppose the compiler just works out that the only place to put the 'n' is after tick.
>
> runTick :: Int -> (String,Int)
> runTick n = runState tick n
>

Not exactly. Look up "currying". (Writing out the same definition with
the argument "n" specified explicitly like you did is called
"eta-expansion", by the way.)

Cheers,
Tim

--
Tim Chevalier* catamorphism.org *Often in error, never in doubt
"My writing is all completely autobiographical. That's why I write
lesbian vampire stories." -- Jewelle Gomez
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Very simple parser

Gregory Propf
In reply to this post by Gregory Propf
OK, I get the reference to currying now.  A function of 2 variables can be seen as a function of one that returns another function of one.  So a function of one variable can be seen as a function of no variables that returns a function of one.  Very nice.

----- Original Message ----
From: Tim Chevalier <[hidden email]>
To: Gregory Propf <[hidden email]>
Cc: [hidden email]
Sent: Monday, July 2, 2007 5:37:41 PM
Subject: Re: [Haskell-cafe] Very simple parser

On 7/2/07, Gregory Propf <[hidden email]> wrote:
>
>
>
> This was a bit baffling too.  It appears that there's an implied argument to runTick.  This also works and makes it more explicit.  I suppose the compiler just works out that the only place to put the 'n' is after tick.
>
> runTick :: Int -> (String,Int)
> runTick n = runState tick n
>

Not exactly. Look up "currying". (Writing out the same definition with
the argument "n" specified explicitly like you did is called
"eta-expansion", by the way.)

Cheers,
Tim

--
Tim Chevalier* catamorphism.org *Often in error, never in doubt
"My writing is all completely autobiographical. That's why I write
lesbian vampire stories." -- Jewelle Gomez



Looking for a deal? Find great prices on flights and hotels with Yahoo! FareChase.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Very simple parser

Alexis Hazell
In reply to this post by Arie Peterson
On Tuesday 03 July 2007 09:51, Arie Peterson wrote:

> No, there is a 'State s' monad provided (for arbitrary state type 's'),
> which implements the 'get' and 'put' methods. In other words, 'State s' is
> an instance of the 'MonadState s' class. This terminology can be really
> confusing at first.
>
> For now, you may forget about the MonadState class. Simply use 'get' &
> friends and everything will work fine.

This may be a stupid question, but i don't understand how (indeed, if) one can
maintain multiple states using the State monad, since 'get' etc. don't seem
to require that one specify which particular copy of a State monad one wishes
to 'get' from, 'put' to etc.? Does one have to use (say) a tuple or a list to
contain all the states, and when one wishes to change only one of those
states, pass that entire tuple or list to 'put' with one element changed and
the rest unchanged?


Alexis.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Very simple parser

Arie Peterson
Alexis Hazell wrote:

| This may be a stupid question, but i don't understand how (indeed, if) one
| can
| maintain multiple states using the State monad, since 'get' etc. don't
| seem
| to require that one specify which particular copy of a State monad one
| wishes
| to 'get' from, 'put' to etc.? Does one have to use (say) a tuple or a list
| to
| contain all the states, and when one wishes to change only one of those
| states, pass that entire tuple or list to 'put' with one element changed
| and
| the rest unchanged?

There are (at least) two ways to do this:

1.

If the pieces of state are in any way related, I would try to put them
together in some data structure. Ideally, this structure is useful beyond
the state monad alone.

The 'gets' and 'modify' functions from Control.Monad.State can help to
keep low overhead.


If you use something like "functional references" (see
<http://www.mail-archive.com/haskell-cafe@.../msg24704.html>), and
define helper functions

> getR = gets . Data.Ref.select
> modifyR = modify . Data.Ref.update

, you may deal with state like this:

> data S = S
>  {
>    foo :: Foo
>    bar :: Int
>  }
>
> $(deriveRecordRefs ''S)
>
> do
>   foo <- getR fooRef
>   modifyR barRef succ
>   return foo

The syntax can be improved, but it works OK.

2.

If the pieces of state are unrelated, and especially if the different
states are not always present at the same time, I would stack multiple
StateT monad transformers. There was a question about this recently, see
<http://www.nabble.com/advice:-instantiating-duplicating-modules-t4000935.html>.


Greetings,

Arie

--

Rules to a drinking game concerning this badge will be forthcoming.


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Very simple parser

Ilya Tsindlekht
In reply to this post by Alexis Hazell
On Thu, Jul 05, 2007 at 12:58:06AM +1000, Alexis Hazell wrote:

> On Tuesday 03 July 2007 09:51, Arie Peterson wrote:
>
> > No, there is a 'State s' monad provided (for arbitrary state type 's'),
> > which implements the 'get' and 'put' methods. In other words, 'State s' is
> > an instance of the 'MonadState s' class. This terminology can be really
> > confusing at first.
> >
> > For now, you may forget about the MonadState class. Simply use 'get' &
> > friends and everything will work fine.
>
> This may be a stupid question, but i don't understand how (indeed, if) one can
> maintain multiple states using the State monad, since 'get' etc. don't seem
> to require that one specify which particular copy of a State monad one wishes
> to 'get' from, 'put' to etc.? Does one have to use (say) a tuple or a list to
> contain all the states, and when one wishes to change only one of those
> states, pass that entire tuple or list to 'put' with one element changed and
> the rest unchanged?
>
>
> Alexis.
A value of type 'State t' contains an incapsulated function can be
de-encapsulated using runState and when evaluated, performs the actual
computation. This function maintains state internally, so for each
invocation of this function (such functions from other values of type
'State t') state is preserved separately.

Hope this clarifies your confusion.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe