monadic DSL for compile-time parser generator, not possible?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

monadic DSL for compile-time parser generator, not possible?

Jeremy Shaw-3
It would be pretty damn cool if you could create a data type for
generically describing a monadic parser, and then use template haskell
to generate a concrete parser from that data type. That would allow
you to create your specification in a generic way and then target
different parsers like parsec, attoparsec, etc. There is some code
coming up in a few paragraphs that should describe this idea more
clearly.

I would like to suggest that while it would be cool, it is
impossible. As proof, I will attempt to create a simple monadic parser
that has only one combinator:

    anyChar :: ParserSpec Char

We need the GADTs extension and some imports:

> {-# LANGUAGE GADTs, TemplateHaskell #-}
> import Control.Monad              (join)
> import qualified Text.Parsec.Char as P
> import Language.Haskell.TH        (ExpQ, appE)
> import Language.Haskell.TH.Syntax (Lift(lift))
> import Text.Parsec                (parseTest)
> import qualified Text.Parsec.Char as P
> import Text.Parsec.String         (Parser)

Next we define a type that has a constructor for each of the different
combinators we want to support, plus constructors for the functor and
monad methods:

> data ParserSpec a where
>     AnyChar :: ParserSpec Char
>     Return  :: a -> ParserSpec a
>     Join    :: ParserSpec (ParserSpec a) -> ParserSpec a
>     FMap    :: (a -> b) -> ParserSpec a -> ParserSpec b
>
> instance Lift (ParserSpec a) where
>     lift _ = error "not defined because we are screwed later anyway."

In theory, we would extend that type with things like `Many`, `Some`,
`Choice`, etc.

In Haskell, we are used to seeing a `Monad` defined in terms of
`return` and `>>=`. But, we can also define a monad in terms of
`fmap`, `return` and `join`. We will do that in `ParserSpec`, because
it makes the fatal flaw more obvious.

Now we can define the `Functor` and `Monad` instances:

> instance Functor ParserSpec where
>     fmap f p = FMap f p

> instance Monad ParserSpec where
>     return a = Return a
>     m >>= f  = Join ((FMap f) m)

and the `anyChar` combinator:

> anyChar :: ParserSpec Char
> anyChar = AnyChar

And now we can define a simple parser that parses two characters and
returns them:

> charPair :: ParserSpec (Char, Char)
> charPair =
>     do a <- anyChar
>        b <- anyChar
>        return (a, b)

Now, we just need to define a template haskell function that generates
a `Parser` from a `ParserSpec`:

> genParsec :: (Lift a) => ParserSpec a -> ExpQ
> genParsec AnyChar    = [| anyChar |]
> genParsec (Return a) = [| return a |]
> genParsec (Join p)   = genParsec p
> -- genParsec (FMap f p) = appE [| f |] (genParsec p) -- uh-oh

Looking at the `FMap` case we see the fatal flaw. In order to
generate the parser we would need some way to transform any arbitrary
Haskell function of type `a -> b` into Template Haskell. Obviously,
that is impossible (for some definition of obvious).

Therefore, we can assume that it is not possible to use Template
Haskell to generate a monadic parser from a monadic specification.

We can also assume that `Applicative` is not available either. Seems
likely that `Category` based parsers would also be out.

Now, we can, of course, do the transformation at runtime:

> interpretParsec :: ParserSpec a -> Parser a
> interpretParsec AnyChar    = P.anyChar
> interpretParsec (Return a) = return a
> interpretParsec (FMap f a) = fmap f (interpretParsec a)
> interpretParsec (Join mm)  = join (fmap interpretParsec (interpretParsec mm))

> test = parseTest (interpretParsec charPair) "ab"

My fear is that doing that will result in added runtime overhead. One
reason for wanting to create a compile-time parser generator is to have
the opportunity to generate very fast parsing code. It seems like here
we can only be slower than the parser we are targeting? Though..
perhaps not? Perhaps the parser returned by `interpretParsec` has all
the interpret stuff removed and is as fast as if we have constructed
it by hand? I don't have an intuitive feel for it.. I guess criterion
would know..

Any thoughts?

- jeremy

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: monadic DSL for compile-time parser generator, not possible?

Jacques Carette
On 13-03-12 04:06 PM, Jeremy Shaw wrote:
> It would be pretty damn cool if you could create a data type for
> generically describing a monadic parser, and then use template haskell
> to generate a concrete parser from that data type. [...]
> I would like to suggest that while it would be cool, it is
> impossible.

Impossibility proofs are notoriously difficult.  You showed that this
approach:

>> data ParserSpec a where
>>      AnyChar :: ParserSpec Char
>>      Return  :: a -> ParserSpec a
>>      Join    :: ParserSpec (ParserSpec a) -> ParserSpec a
>>      FMap    :: (a -> b) -> ParserSpec a -> ParserSpec b

does not work.  The flaw is indeed in FMap.  It should not take a
function as first argument, but rather a *description* of a function
(the same way ParserSpec gives you a description of a parser).  Then you
can make it work, if your 'description' language is adequate.

For some strange reason, I am biased towards 'finally tagless'
descriptions, but YMMV.

Jacques

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: monadic DSL for compile-time parser generator, not possible?

Dag Odenhall
In reply to this post by Jeremy Shaw-3
Why not the parsers package [1]? Write the parser against the Parsing class and then use trifecta or write instances for attoparsec or parsec. With enough inlining perhaps the overhead of the class gets optimized away?

[1] http://hackage.haskell.org/package/parsers


On Tue, Mar 12, 2013 at 9:06 PM, Jeremy Shaw <[hidden email]> wrote:
It would be pretty damn cool if you could create a data type for
generically describing a monadic parser, and then use template haskell
to generate a concrete parser from that data type. That would allow
you to create your specification in a generic way and then target
different parsers like parsec, attoparsec, etc. There is some code
coming up in a few paragraphs that should describe this idea more
clearly.

I would like to suggest that while it would be cool, it is
impossible. As proof, I will attempt to create a simple monadic parser
that has only one combinator:

    anyChar :: ParserSpec Char

We need the GADTs extension and some imports:

> {-# LANGUAGE GADTs, TemplateHaskell #-}
> import Control.Monad              (join)
> import qualified Text.Parsec.Char as P
> import Language.Haskell.TH        (ExpQ, appE)
> import Language.Haskell.TH.Syntax (Lift(lift))
> import Text.Parsec                (parseTest)
> import qualified Text.Parsec.Char as P
> import Text.Parsec.String         (Parser)

Next we define a type that has a constructor for each of the different
combinators we want to support, plus constructors for the functor and
monad methods:

> data ParserSpec a where
>     AnyChar :: ParserSpec Char
>     Return  :: a -> ParserSpec a
>     Join    :: ParserSpec (ParserSpec a) -> ParserSpec a
>     FMap    :: (a -> b) -> ParserSpec a -> ParserSpec b
>
> instance Lift (ParserSpec a) where
>     lift _ = error "not defined because we are screwed later anyway."

In theory, we would extend that type with things like `Many`, `Some`,
`Choice`, etc.

In Haskell, we are used to seeing a `Monad` defined in terms of
`return` and `>>=`. But, we can also define a monad in terms of
`fmap`, `return` and `join`. We will do that in `ParserSpec`, because
it makes the fatal flaw more obvious.

Now we can define the `Functor` and `Monad` instances:

> instance Functor ParserSpec where
>     fmap f p = FMap f p

> instance Monad ParserSpec where
>     return a = Return a
>     m >>= f  = Join ((FMap f) m)

and the `anyChar` combinator:

> anyChar :: ParserSpec Char
> anyChar = AnyChar

And now we can define a simple parser that parses two characters and
returns them:

> charPair :: ParserSpec (Char, Char)
> charPair =
>     do a <- anyChar
>        b <- anyChar
>        return (a, b)

Now, we just need to define a template haskell function that generates
a `Parser` from a `ParserSpec`:

> genParsec :: (Lift a) => ParserSpec a -> ExpQ
> genParsec AnyChar    = [| anyChar |]
> genParsec (Return a) = [| return a |]
> genParsec (Join p)   = genParsec p
> -- genParsec (FMap f p) = appE [| f |] (genParsec p) -- uh-oh

Looking at the `FMap` case we see the fatal flaw. In order to
generate the parser we would need some way to transform any arbitrary
Haskell function of type `a -> b` into Template Haskell. Obviously,
that is impossible (for some definition of obvious).

Therefore, we can assume that it is not possible to use Template
Haskell to generate a monadic parser from a monadic specification.

We can also assume that `Applicative` is not available either. Seems
likely that `Category` based parsers would also be out.

Now, we can, of course, do the transformation at runtime:

> interpretParsec :: ParserSpec a -> Parser a
> interpretParsec AnyChar    = P.anyChar
> interpretParsec (Return a) = return a
> interpretParsec (FMap f a) = fmap f (interpretParsec a)
> interpretParsec (Join mm)  = join (fmap interpretParsec (interpretParsec mm))

> test = parseTest (interpretParsec charPair) "ab"

My fear is that doing that will result in added runtime overhead. One
reason for wanting to create a compile-time parser generator is to have
the opportunity to generate very fast parsing code. It seems like here
we can only be slower than the parser we are targeting? Though..
perhaps not? Perhaps the parser returned by `interpretParsec` has all
the interpret stuff removed and is as fast as if we have constructed
it by hand? I don't have an intuitive feel for it.. I guess criterion
would know..

Any thoughts?

- jeremy

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: monadic DSL for compile-time parser generator, not possible?

Jeremy Shaw-3
In reply to this post by Jacques Carette
On Tue, Mar 12, 2013 at 3:32 PM, Jacques Carette <[hidden email]> wrote:
> On 13-03-12 04:06 PM, Jeremy Shaw wrote:

>>> data ParserSpec a where
>>>      AnyChar :: ParserSpec Char
>>>      Return  :: a -> ParserSpec a
>>>      Join    :: ParserSpec (ParserSpec a) -> ParserSpec a
>>>      FMap    :: (a -> b) -> ParserSpec a -> ParserSpec b
>
>
> does not work.  The flaw is indeed in FMap.  It should not take a function
> as first argument, but rather a *description* of a function (the same way
> ParserSpec gives you a description of a parser).  Then you can make it work,
> if your 'description' language is adequate.

Right. But, then I would not be able to use Haskell's existing do
notation -- and I would have to poorly recreate a subset of Haskell.
And, I think, ParsecSpec would not be a real monad. But.. that is sort
of the conclusion -- if you want to do compile-time generation, then
the data-type can not contain any function values -- at least none
that would need to be lifted into the generated code. And, there is no
way to make a type with a real Monad instance which does not contain
such a function.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: monadic DSL for compile-time parser generator, not possible?

Jeremy Shaw-3
In reply to this post by Dag Odenhall
Mostly, because I want to do other sorts of compile-time inspections
on the parser. Being able to generate the parser is just the easiest
part to get started with.

- jeremy

On Tue, Mar 12, 2013 at 3:36 PM, [hidden email]
<[hidden email]> wrote:

> Why not the parsers package [1]? Write the parser against the Parsing class
> and then use trifecta or write instances for attoparsec or parsec. With
> enough inlining perhaps the overhead of the class gets optimized away?
>
> [1] http://hackage.haskell.org/package/parsers
>
>
> On Tue, Mar 12, 2013 at 9:06 PM, Jeremy Shaw <[hidden email]> wrote:
>>
>> It would be pretty damn cool if you could create a data type for
>> generically describing a monadic parser, and then use template haskell
>> to generate a concrete parser from that data type. That would allow
>> you to create your specification in a generic way and then target
>> different parsers like parsec, attoparsec, etc. There is some code
>> coming up in a few paragraphs that should describe this idea more
>> clearly.
>>
>> I would like to suggest that while it would be cool, it is
>> impossible. As proof, I will attempt to create a simple monadic parser
>> that has only one combinator:
>>
>>     anyChar :: ParserSpec Char
>>
>> We need the GADTs extension and some imports:
>>
>> > {-# LANGUAGE GADTs, TemplateHaskell #-}
>> > import Control.Monad              (join)
>> > import qualified Text.Parsec.Char as P
>> > import Language.Haskell.TH        (ExpQ, appE)
>> > import Language.Haskell.TH.Syntax (Lift(lift))
>> > import Text.Parsec                (parseTest)
>> > import qualified Text.Parsec.Char as P
>> > import Text.Parsec.String         (Parser)
>>
>> Next we define a type that has a constructor for each of the different
>> combinators we want to support, plus constructors for the functor and
>> monad methods:
>>
>> > data ParserSpec a where
>> >     AnyChar :: ParserSpec Char
>> >     Return  :: a -> ParserSpec a
>> >     Join    :: ParserSpec (ParserSpec a) -> ParserSpec a
>> >     FMap    :: (a -> b) -> ParserSpec a -> ParserSpec b
>> >
>> > instance Lift (ParserSpec a) where
>> >     lift _ = error "not defined because we are screwed later anyway."
>>
>> In theory, we would extend that type with things like `Many`, `Some`,
>> `Choice`, etc.
>>
>> In Haskell, we are used to seeing a `Monad` defined in terms of
>> `return` and `>>=`. But, we can also define a monad in terms of
>> `fmap`, `return` and `join`. We will do that in `ParserSpec`, because
>> it makes the fatal flaw more obvious.
>>
>> Now we can define the `Functor` and `Monad` instances:
>>
>> > instance Functor ParserSpec where
>> >     fmap f p = FMap f p
>>
>> > instance Monad ParserSpec where
>> >     return a = Return a
>> >     m >>= f  = Join ((FMap f) m)
>>
>> and the `anyChar` combinator:
>>
>> > anyChar :: ParserSpec Char
>> > anyChar = AnyChar
>>
>> And now we can define a simple parser that parses two characters and
>> returns them:
>>
>> > charPair :: ParserSpec (Char, Char)
>> > charPair =
>> >     do a <- anyChar
>> >        b <- anyChar
>> >        return (a, b)
>>
>> Now, we just need to define a template haskell function that generates
>> a `Parser` from a `ParserSpec`:
>>
>> > genParsec :: (Lift a) => ParserSpec a -> ExpQ
>> > genParsec AnyChar    = [| anyChar |]
>> > genParsec (Return a) = [| return a |]
>> > genParsec (Join p)   = genParsec p
>> > -- genParsec (FMap f p) = appE [| f |] (genParsec p) -- uh-oh
>>
>> Looking at the `FMap` case we see the fatal flaw. In order to
>> generate the parser we would need some way to transform any arbitrary
>> Haskell function of type `a -> b` into Template Haskell. Obviously,
>> that is impossible (for some definition of obvious).
>>
>> Therefore, we can assume that it is not possible to use Template
>> Haskell to generate a monadic parser from a monadic specification.
>>
>> We can also assume that `Applicative` is not available either. Seems
>> likely that `Category` based parsers would also be out.
>>
>> Now, we can, of course, do the transformation at runtime:
>>
>> > interpretParsec :: ParserSpec a -> Parser a
>> > interpretParsec AnyChar    = P.anyChar
>> > interpretParsec (Return a) = return a
>> > interpretParsec (FMap f a) = fmap f (interpretParsec a)
>> > interpretParsec (Join mm)  = join (fmap interpretParsec (interpretParsec
>> > mm))
>>
>> > test = parseTest (interpretParsec charPair) "ab"
>>
>> My fear is that doing that will result in added runtime overhead. One
>> reason for wanting to create a compile-time parser generator is to have
>> the opportunity to generate very fast parsing code. It seems like here
>> we can only be slower than the parser we are targeting? Though..
>> perhaps not? Perhaps the parser returned by `interpretParsec` has all
>> the interpret stuff removed and is as fast as if we have constructed
>> it by hand? I don't have an intuitive feel for it.. I guess criterion
>> would know..
>>
>> Any thoughts?
>>
>> - jeremy
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> [hidden email]
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: monadic DSL for compile-time parser generator, not possible?

oleg-30
In reply to this post by Jeremy Shaw-3

Jeremy Shaw wrote:
> It would be pretty damn cool if you could create a data type for
> generically describing a monadic parser, and then use template haskell
> to generate a concrete parser from that data type. That would allow
> you to create your specification in a generic way and then target
> different parsers like parsec, attoparsec, etc. There is some code
> coming up in a few paragraphs that should describe this idea more
> clearly.

After rather mild and practical restrictions, the problem is
solvable. (BTW, even the problem of lifting arbitrary functional
values, let alone handles and other stuff, is solvable in realistic
settings -- or even completely, although less practically.)

Rather than starting from the top -- implementing monadic or
applicative parser, let's start from the bottom and figure out what we
really need. It seems that many real-life parsers aren't using the
full power of Applicative, let alone monad. So why to pay, a whole
lot, for what we don't use.

Any parser combinator library has to be able to combine parsers. It
seems the applicative rule
        <*> :: Parser (a->b) -> Parser a -> Parser b
is very popular. It is indeed very useful -- although not the only
thing possible. One can come up with a set of combinators that are
used for realistic parsing. For example,
        *> :: Parser a -> Parser b -> Parser b
for sequential composition, although expressible via <*>, could be
defined as primitive. Many other such combinators can be defined as
primitives.

In other words: the great advantage of Applicative parser combinators
is letting the user supply semantic actions, and executing those
actions as parsing progresses. There is also a traditional approach:
the parser produces an AST or a stream of parsing events, which the
user consumes and semantically processes any way they wish.  Think of
XML parsing: often people parse XML and get a DOM tree, and process it
afterwards. An XML parser can be incremental: SAX.  Parsers that
produce AST need only a small fixed set of combinators. We never need
to lift arbitrary functions since those parsers don't accept arbitrary
semantic actions from the user. For that reason, these parsers are
also much easy to analyze.

Let's take the high road however, applicative parsers. The <*> rule
is not problematic: it neatly maps to code. Consider

        newtype Code a = Code Exp
which is the type-annotated TH Code. We can easily define

        app_code :: Code (a->b) -> Code a -> Code b
        app_code (Code f) (Code x) = Code $ AppE f x

So, Code is almost applicative. Almost -- because we only have a
restricted pure:
        pureR :: Lift a => a -> Code a
with a Lift constraint. Alas, this is not sufficient for realistic
parsers, because often we have to lift functions, as in the example of
parsing a pair of characters:

        pure (\x y -> (x,y)) <*> anyChar <*> anyChar

But aren't functions really unliftable? They are unliftable by value,
but we can also lift by reference.

Here is an example, using tagless final framework, since it is
extensible. We define the basic minor Applicative

> class Sym repr where
>     pureR :: Lift a => a -> repr a
>     app   :: repr (a->b) -> repr a -> repr b
>
> infixl 4 `app`


And a primitive parser, with only one primitive parser.

> class Sym repr => Parser repr where
>     anychar :: repr Char

For our example, parsing two characters and returning them as a pair,
we need pairs. So, we extend our parser with three higher-order
_constants_.

> class Sym repr => Pair repr where
>     pair :: repr (a -> b -> (a,b))
>     prj1 :: repr ((a,b) -> a)
>     prj2 :: repr ((a,b) -> b)


And here is the example.

> test1 = pair `app` anychar `app` anychar

One interpretation of Sym is to generate code (another one could
analyze the parsers)

> data C a = C{unC :: Q Exp}

Most interesting is the instance of pairs. Actually, it is not that
interesting: we just lift functions by reference.

> pair0 x y = (x,y)
>
> instance Pair C where
>     pair = C [e| pure pair0 |]
>     prj1 = C [e| pure fst |]
>     prj2 = C [e| pure snd |]

Because tagless-final is so extensible, any time we need a new
functional constant, we can easily introduce it and define its code,
either by building a code expression or by referring to a global name
that is bound to the desired value. The latter is `lift by reference'
(which is what dynamic linking does).


The obvious limitation of this approach is that all functions to
lift must be named -- because we lift by reference. We can also build
anonymous functions, if we just add lambda to our language. If we go
this way we obtain something like

        http://okmij.org/ftp/meta-programming/index.html#meta-haskell

(which has lam, let, arrays, loops, etc.)

Sample code, for reference

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

module P where

import Language.Haskell.TH
import Language.Haskell.TH.Syntax
import Language.Haskell.TH.Ppr

import Control.Applicative

import Text.ParserCombinators.ReadP


class Sym repr where
    pureR :: Lift a => a -> repr a
    app   :: repr (a->b) -> repr a -> repr b

infixl 4 `app`

class Sym repr => Parser repr where
    anychar :: repr Char

-- Higher-order constants
class Sym repr => Pair repr where
    pair :: repr (a -> b -> (a,b))
    prj1 :: repr ((a,b) -> a)
    prj2 :: repr ((a,b) -> b)


-- parse two characters and return them as a pair
test1 = pair `app` anychar `app` anychar



-- Implementations

-- we don't need Q monad actually, neither here
-- nor anywhere!
-- It's a bummer that lift has the signature t -> Q Exp
-- rather than t -> Exp!                        

data C a = C{unC :: Q Exp}

instance Sym C where
    pureR   = C . lift
    app f x = C $ appE (appE (varE '(Control.Applicative.<*>)) (unC f)) (unC x)

instance Parser C where
    anychar = C . varE $ 'get


pair0 x y = (x,y)
 
instance Pair C where
    pair = C [e| pure pair0 |]
    prj1 = C [e| pure fst |]
    prj2 = C [e| pure snd |]

printC :: C a -> IO String
printC m = runQ (fmap pprint $ unC m )

test1C = printC test1
{-
"(Control.Applicative.<*>) ((Control.Applicative.<*>)
   (Control.Applicative.pure P.pair0)
     Text.ParserCombinators.ReadP.get) Text.ParserCombinators.ReadP.get"
-}


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: monadic DSL for compile-time parser generator, not possible?

Dominique Devriese-2
All,

2013/3/13  <[hidden email]>:
> So, Code is almost applicative. Almost -- because we only have a
> restricted pure:
>         pureR :: Lift a => a -> Code a
> with a Lift constraint. Alas, this is not sufficient for realistic
> parsers, because often we have to lift functions, as in the example of
> parsing a pair of characters:

I've previously used an approach like this in the grammar-combinators
library.  See http://hackage.haskell.org/packages/archive/grammar-combinators/0.2.7/doc/html/Text-GrammarCombinators-Base-ProductionRule.html#t:LiftableProductionRule
and http://hackage.haskell.org/packages/archive/grammar-combinators/0.2.7/doc/html/Text-GrammarCombinators-Utils-LiftGrammar.html.

The approach uses a restricted pure like this:

class ProductionRule p => LiftableProductionRule p where
  epsilonL :: a -> Q Exp -> p aSource

and associated
  epsilonLS :: (Lift v, LiftableProductionRule p) => v -> p v
  epsilonLS v = epsilonL v $ lift v

There is a function liftGrammar which lifts a grammar that uses the
type class to a list of declarations using TH.

This allowed me to start from a context-free grammar, transform it to
a non-left-recursive grammar, optimize it and then lift it using TH.
In some tests, I found that this improved performance significantly
over using the transformed grammar directly, even when I try to force
the transformation to happen before the benchmark.  I assume this is
because the lifted grammar is optimised better by the compiler.

Regards,
Dominique

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: monadic DSL for compile-time parser generator, not possible?

Dominique Devriese-2
2013/3/13 Dominique Devriese <[hidden email]>:
> class ProductionRule p => LiftableProductionRule p where
>   epsilonL :: a -> Q Exp -> p aSource
>
> and associated
>   epsilonLS :: (Lift v, LiftableProductionRule p) => v -> p v
>   epsilonLS v = epsilonL v $ lift v

Note that the point of providing epsilonL as primitive and not just
epsilonLS is that I can then still lift most functions I use:

  epsilonL (,) [| (,) |]

Even though functions are not necessarily liftable. This is an
alternative to Oleg's adding of e.g. pair etc. as DSL primitives.

Dominique

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: monadic DSL for compile-time parser generator, not possible?

Josef Svenningsson
In reply to this post by Jeremy Shaw-3
Jeremy,

The problem you're trying to solve might seem tricky but it is in fact quite solvable. In Feldspar[1] we use monads quite frequently and generate code from them, in a similar fashion to what you're trying to do. We've written a paper about how we do it[2] that I welcome you to read. If you have any questions regarding the paper I'd be happy to try to answer them. There are two parts to the trick. One is to use the continuation monad to get a monad instance. The other trick is to restrict any functions you have in your data type (like FMap in your example) so that they can be reified into something that can be compiled, which would be Template Haskell in your case.

To help you along the way I've prepared some code to give you an idea of how this can be done. This code only shows the continuation monad trick but I hope it's useful nonetheless.

\begin{code}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE GADTs #-}
module MonadReify where

newtype Cont r a = C { unC :: (a -> r) -> r }

instance Monad (Cont r) where
  return a = C $ \k -> k a
  m >>= f  = C $ \k -> unC m (\a -> unC (f a) k)

data ParserSpec a where
  AnyChar :: ParserSpec Char
  Return  :: a -> ParserSpec a
  Join    :: ParserSpec (ParserSpec a) -> ParserSpec a
  FMap    :: (a -> b) -> ParserSpec a -> ParserSpec b

bindSpec p f = Join (FMap f p)

newtype Parser a = P { unP :: forall r. Cont (ParserSpec r) a }

instance Monad Parser where
  return a = P (return a)
  m >>= f  = P (unP m >>= \a -> unP (f a))

anyChar :: Parser Char
anyChar = P (C $ \k -> bindSpec AnyChar k)

reifyParser :: Parser a -> (forall r. ParserSpec r -> b) -> b
reifyParser (P (C f)) g = g (f (\a -> Return a))
\end{code}

Cheers,

Josef


On Tue, Mar 12, 2013 at 9:06 PM, Jeremy Shaw <[hidden email]> wrote:
It would be pretty damn cool if you could create a data type for
generically describing a monadic parser, and then use template haskell
to generate a concrete parser from that data type. That would allow
you to create your specification in a generic way and then target
different parsers like parsec, attoparsec, etc. There is some code
coming up in a few paragraphs that should describe this idea more
clearly.

I would like to suggest that while it would be cool, it is
impossible. As proof, I will attempt to create a simple monadic parser
that has only one combinator:

    anyChar :: ParserSpec Char

We need the GADTs extension and some imports:

> {-# LANGUAGE GADTs, TemplateHaskell #-}
> import Control.Monad              (join)
> import qualified Text.Parsec.Char as P
> import Language.Haskell.TH        (ExpQ, appE)
> import Language.Haskell.TH.Syntax (Lift(lift))
> import Text.Parsec                (parseTest)
> import qualified Text.Parsec.Char as P
> import Text.Parsec.String         (Parser)

Next we define a type that has a constructor for each of the different
combinators we want to support, plus constructors for the functor and
monad methods:

> data ParserSpec a where
>     AnyChar :: ParserSpec Char
>     Return  :: a -> ParserSpec a
>     Join    :: ParserSpec (ParserSpec a) -> ParserSpec a
>     FMap    :: (a -> b) -> ParserSpec a -> ParserSpec b
>
> instance Lift (ParserSpec a) where
>     lift _ = error "not defined because we are screwed later anyway."

In theory, we would extend that type with things like `Many`, `Some`,
`Choice`, etc.

In Haskell, we are used to seeing a `Monad` defined in terms of
`return` and `>>=`. But, we can also define a monad in terms of
`fmap`, `return` and `join`. We will do that in `ParserSpec`, because
it makes the fatal flaw more obvious.

Now we can define the `Functor` and `Monad` instances:

> instance Functor ParserSpec where
>     fmap f p = FMap f p

> instance Monad ParserSpec where
>     return a = Return a
>     m >>= f  = Join ((FMap f) m)

and the `anyChar` combinator:

> anyChar :: ParserSpec Char
> anyChar = AnyChar

And now we can define a simple parser that parses two characters and
returns them:

> charPair :: ParserSpec (Char, Char)
> charPair =
>     do a <- anyChar
>        b <- anyChar
>        return (a, b)

Now, we just need to define a template haskell function that generates
a `Parser` from a `ParserSpec`:

> genParsec :: (Lift a) => ParserSpec a -> ExpQ
> genParsec AnyChar    = [| anyChar |]
> genParsec (Return a) = [| return a |]
> genParsec (Join p)   = genParsec p
> -- genParsec (FMap f p) = appE [| f |] (genParsec p) -- uh-oh

Looking at the `FMap` case we see the fatal flaw. In order to
generate the parser we would need some way to transform any arbitrary
Haskell function of type `a -> b` into Template Haskell. Obviously,
that is impossible (for some definition of obvious).

Therefore, we can assume that it is not possible to use Template
Haskell to generate a monadic parser from a monadic specification.

We can also assume that `Applicative` is not available either. Seems
likely that `Category` based parsers would also be out.

Now, we can, of course, do the transformation at runtime:

> interpretParsec :: ParserSpec a -> Parser a
> interpretParsec AnyChar    = P.anyChar
> interpretParsec (Return a) = return a
> interpretParsec (FMap f a) = fmap f (interpretParsec a)
> interpretParsec (Join mm)  = join (fmap interpretParsec (interpretParsec mm))

> test = parseTest (interpretParsec charPair) "ab"

My fear is that doing that will result in added runtime overhead. One
reason for wanting to create a compile-time parser generator is to have
the opportunity to generate very fast parsing code. It seems like here
we can only be slower than the parser we are targeting? Though..
perhaps not? Perhaps the parser returned by `interpretParsec` has all
the interpret stuff removed and is as fast as if we have constructed
it by hand? I don't have an intuitive feel for it.. I guess criterion
would know..

Any thoughts?

- jeremy

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe