Parsec memory behavior

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Parsec memory behavior

Juan Carlos Arevalo Baeza-2
   Try this simple program:

import Text.ParserCombinators.Parsec

ppAny = tokenPrim show (\p _ _ -> p) (\t -> Just t)
ppTest = many ppAny

p s =
    case runParser ppTest True "" s of
        Left error -> show error
        Right result -> result


main =  do  let fname = "C:/main.i"
            f <- readFile fname

            let tokens = p f

            putStrLn . show $ head tokens

   This is the simplest expression of using Parsec, it just returns the
input unchanged. But it already seems to have a big memory leak. When
parsing a 2 MB file, I'm seeing the program grow in size up to about 160
MB before it ends.

   I assume this is known behavior. I believe it's because of the way
Parsec deals with errors: when you chain two parsers in sequence, the
result of the first one (in this case the head of the list) is not
available until it knows if the second one failed or not.

   Is there any known alternative that doesn't exhibit this behavior? It
would have to somehow return errors inline or on a "side channel". I'll
be toying with this sort of thing for a while.

   Thanx!

JCAB

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parsec memory behavior

Greg Fitzgerald
> big memory leak ... two parsers in sequence

A related question, are these two equivalents?

foldl2 f g y z lst = (foldl f y lst, foldl g z lst)
foldl2' f g y z lst = foldl (\(i, j) x -> (f i x, g j x)) (y, z) lst

If no, is there some strictness that can be added to make them equivalent?  And if so, could a compiler make this optimization as to avoid these sorts of space leaks?

Thanks,
Greg


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parsec memory behavior

Jeremy Shaw
In reply to this post by Juan Carlos Arevalo Baeza-2
At Wed, 17 May 2006 11:36:00 -0700,
Juan Carlos Arevalo Baeza wrote:

>    Is there any known alternative that doesn't exhibit this behavior? It
> would have to somehow return errors inline or on a "side channel". I'll
> be toying with this sort of thing for a while.

You might try reading,

Polish Parsers, Step By Step
by R. John M. Hughes and S. Doaise Swierstra

http://www.cs.uu.nl/docs/vakken/afp/Literature/p224-swierstra.pdf

From the abstract:
---
We present the derivation of a space efficient parser combin-
ator library: the constructed parsers do not keep unneces-
sary references to the input, produce online results and effi-
ciently handle ambiguous grammars. The underlying tech-
niques can be applied in many contexts where traditionally
backtracking is used.
---

By 'online' they mean able to starting producing results before the
whole thing is parsed.

j.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parsec memory behavior

Juan Carlos Arevalo Baeza-2
   Very interesting, thanx.

   I haven't finished grokking this yet, but... Basically, this one is
of the "somehow return errors inline" category. The errors are returned
inline, together with the expression that computes the result, so that
you can extract either the errors and/or the result whenever convenient.

   I'd be worried about the performance you can get from this
breadth-first approach. I sort of like the fine-tuning control that the
"try" approach gives in parsec. I'll finish the paper before giving this
any more thought, though.

JCAB

Jeremy Shaw wrote:

> At Wed, 17 May 2006 11:36:00 -0700,
> Juan Carlos Arevalo Baeza wrote:
>
>  
>>    Is there any known alternative that doesn't exhibit this behavior? It
>> would have to somehow return errors inline or on a "side channel". I'll
>> be toying with this sort of thing for a while.
>>    
>
> You might try reading,
>
> Polish Parsers, Step By Step
> by R. John M. Hughes and S. Doaise Swierstra
>
> http://www.cs.uu.nl/docs/vakken/afp/Literature/p224-swierstra.pdf
>
> From the abstract:
> ---
> We present the derivation of a space efficient parser combin-
> ator library: the constructed parsers do not keep unneces-
> sary references to the input, produce online results and effi-
> ciently handle ambiguous grammars. The underlying tech-
> niques can be applied in many contexts where traditionally
> backtracking is used.
> ---
>
> By 'online' they mean able to starting producing results before the
> whole thing is parsed.
>
> j.
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>  

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parsec memory behavior

Jeremy Shaw
At Thu, 18 May 2006 02:37:56 -0700,
Juan Carlos Arevalo Baeza wrote:

>    I'd be worried about the performance you can get from this
> breadth-first approach. I sort of like the fine-tuning control that the
> "try" approach gives in parsec. I'll finish the paper before giving this
> any more thought, though.

In section 7 they claim that their parser performs at "about half the
speed of off-line generated parsers, while also preparing for and
performing full error repair and error reporting."

I am guessing that 'off-line generated parsers' means things like
lex/yacc -- but I am not positive.

j.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parsec memory behavior

Christian Maeder
Jeremy Shaw wrote:
> I am guessing that 'off-line generated parsers' means things like
> lex/yacc -- but I am not positive.

Yes, happy is the parser generator for haskell. (and alex the
corresponding lexer)

Christian
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe