Alternative instance for non-backtracking parsers

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Alternative instance for non-backtracking parsers

Olaf Klinke
Dear Cafe,

can anyone explain the use of an Alternative instance for a parser monad where the parser (p <|> q) does not backtrack automatically when p fails?
Consider megaparsec for example. When p is a parser that does not backtrack automatically, then (p <|> q) never succeeds with q. While I do understand that fine-grained control of backtracking is desirable for performance reasons in general, it escapes me why such an Alternative instance could be useful. See also "A note on backtracking" in the parser-combinators package documentation [1].

Minimal code example and context below.

Thanks,
Olaf

[1] http://hackage.haskell.org/package/parser-combinators-1.0.0/docs/Control-Applicative-Combinators.html

import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Void

p :: Parsec Void String Bool
p = do
  char '-'
  string "True"
  return True
q :: Parsec Void String Bool
q = do
  char '-'
  string "False"
  return False

>>> parseTest (p <|> q) "-True"
True
>>>  parseTest (p <|> q) "-False"
1:2:
unexpected "Fals"
expecting "True"
-- Since q can parse "-False", I would expect (p <|> q) to parse "-False", too.

Context:
Of course the example above can be mitigated by moving the char '-' outside the alternative. But we might not have control over whether the two parsers share a common prefix. I am trying to write some code generic in the parser monad and I am having difficulties making my code work for both backtracking and non-backtracking parsers. If possible, I'd like to keep my parser type class devoid of a 'try' combinator, which would be the identity function for parser monads that always backtrack.  
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

S. Doaitse Swierstra
A common situation is where the “q" part is actually the parser which parses the empty string. It is not uncommon to make the choice for “p” once “p” has made some progress. Consider e.g. the situation where you parse a list of parameters:

(expr1, expr2, )

with a parser "pChainr pComma pExpr”

Obviously the example does not parse, but the question is what kind of error message do you expect? I would prefer the message that an expression is missing after the comma. Of course we could backtrack to the point before the comma and then report that we cannot handle the comma. For me this is counterintuitive.

In the uu-parsinglib you can choose between two versions of pChainR:

-- * Combinators for chained structures
-- ** Treating the operator as right associative
pChainr    :: IsParser p => p (c -> c -> c) -> p c -> p c
pChainr    op x    =   must_be_non_empties "pChainr"    op   x r where r = x <??> (flip <$> op <*> r)
pChainr_ng :: IsParser p => p (c -> c -> c) -> p c -> p c
pChainr_ng op x    =   must_be_non_empties "pChainr_ng" op   x r where r = x <**> ((flip <$> op <*> r)  <|> pure id)

Do not pay attention to the function “must_be_non_empties” which just performs a check that not both the operator and the operand can recognise the empty string, which would build a non-sensical parser. The non-greedy parser does not commit to the choice made after seeing the next recognisable symbol. This may make a difference. What is done in our example erroneous example depends on the specified costs for inserting and deleting a specific symbol.

 Doaitse


> Op 23 aug. 2018, om 21:41  heeft Olaf Klinke <[hidden email]> het volgende geschreven:
>
> Dear Cafe,
>
> can anyone explain the use of an Alternative instance for a parser monad where the parser (p <|> q) does not backtrack automatically when p fails?
> Consider megaparsec for example. When p is a parser that does not backtrack automatically, then (p <|> q) never succeeds with q. While I do understand that fine-grained control of backtracking is desirable for performance reasons in general, it escapes me why such an Alternative instance could be useful. See also "A note on backtracking" in the parser-combinators package documentation [1].
>
> Minimal code example and context below.
>
> Thanks,
> Olaf
>
> [1] http://hackage.haskell.org/package/parser-combinators-1.0.0/docs/Control-Applicative-Combinators.html
>
> import Text.Megaparsec
> import Text.Megaparsec.Char
> import Data.Void
>
> p :: Parsec Void String Bool
> p = do
>  char '-'
>  string "True"
>  return True
> q :: Parsec Void String Bool
> q = do
>  char '-'
>  string "False"
>  return False
>
>>>> parseTest (p <|> q) "-True"
> True
>>>> parseTest (p <|> q) "-False"
> 1:2:
> unexpected "Fals"
> expecting "True"
> -- Since q can parse "-False", I would expect (p <|> q) to parse "-False", too.
>
> Context:
> Of course the example above can be mitigated by moving the char '-' outside the alternative. But we might not have control over whether the two parsers share a common prefix. I am trying to write some code generic in the parser monad and I am having difficulties making my code work for both backtracking and non-backtracking parsers. If possible, I'd like to keep my parser type class devoid of a 'try' combinator, which would be the identity function for parser monads that always backtrack.  
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Paul
In reply to this post by Olaf Klinke
 >  Of course the example above can be mitigated by moving the char '-'
outside the alternative.

Right, this works fine:

   pq :: Parsec Void String Bool
   pq = do
     char '-'
     (q <|> p)

which for me, means that Haskell has not real backtracking. Btw,
`Alternative` is not backtracking (as well as List permutations). After
"do" should happen context saving. I think it's possible to be
implemented manually... As workaround, may be to pick char '-', but
don't consume it (and to fail if it is not expecting '-') ? Or to switch
to another parser libs: arrow based, happy, bindings to good known
libs... For me, parsers combinators absolutely ever are big problem: a
lot of tricks, low performance, may be it's typical for any "elegant"
approach.


23.08.2018 22:41, Olaf Klinke wrote:

> Dear Cafe,
>
> can anyone explain the use of an Alternative instance for a parser monad where the parser (p <|> q) does not backtrack automatically when p fails?
> Consider megaparsec for example. When p is a parser that does not backtrack automatically, then (p <|> q) never succeeds with q. While I do understand that fine-grained control of backtracking is desirable for performance reasons in general, it escapes me why such an Alternative instance could be useful. See also "A note on backtracking" in the parser-combinators package documentation [1].
>
> Minimal code example and context below.
>
> Thanks,
> Olaf
>
> [1] http://hackage.haskell.org/package/parser-combinators-1.0.0/docs/Control-Applicative-Combinators.html
>
> import Text.Megaparsec
> import Text.Megaparsec.Char
> import Data.Void
>
> p :: Parsec Void String Bool
> p = do
>    char '-'
>    string "True"
>    return True
> q :: Parsec Void String Bool
> q = do
>    char '-'
>    string "False"
>    return False
>
>>>> parseTest (p <|> q) "-True"
> True
>>>>   parseTest (p <|> q) "-False"
> 1:2:
> unexpected "Fals"
> expecting "True"
> -- Since q can parse "-False", I would expect (p <|> q) to parse "-False", too.
>
> Context:
> Of course the example above can be mitigated by moving the char '-' outside the alternative. But we might not have control over whether the two parsers share a common prefix. I am trying to write some code generic in the parser monad and I am having difficulties making my code work for both backtracking and non-backtracking parsers. If possible, I'd like to keep my parser type class devoid of a 'try' combinator, which would be the identity function for parser monads that always backtrack.
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

S. Doaitse Swierstra


> Op 24 aug. 2018, om 7:33  heeft Paul <[hidden email]> het volgende geschreven:
>
> >  Of course the example above can be mitigated by moving the char '-' outside the alternative.
>
> Right, this works fine:
>
>   pq :: Parsec Void String Bool
>   pq = do
>     char '-'
>     (q <|> p)
>
> which for me, means that Haskell has not real backtracking.

You are confusing the language “Haskell” with the library “Parsec”. Parsec is just one of the many parser combinator libraries.

> Btw, `Alternative` is not backtracking (as well as List permutations).

The class “Alternative” does in no way specify how parsers should behave; it defines an interface. One might expect <|> to be associative, but even that is in no way guaranteed.

> After "do" should happen context saving. I think it's possible to be implemented manually... As workaround, may be to pick char '-', but don't consume it (and to fail if it is not expecting '-') ? Or to switch to another parser libs: arrow based, happy, bindings to good known libs... For me, parsers combinators absolutely ever are big problem: a lot of tricks, low performance, may be it's typical for any "elegant" approach.

 - Some libraries do not need “try” constructs to set backtrack points, but may be a bit slower; other libraries are more greedy and a bit faster.
 - Some parsers keep track of where you are in the input at some cost, others just parse as fast as possible without having to update line and position counters (handy when you know the input is correct since it was machine-generated).
 - Some libraries provide nice error messages about what was expected at some cost, other libraries just report that they did not find a parse.
 - Some libraries try to “repair” the input and continue parsing when they cannot proceed otherwise, some others just stop when they get stuck.
 - Some libraries hang on to the input so they may backtrack to a point far back, some libraries work in an online way and pursue all parses “in parallel”, i.e. they perform a breadth first search for a successful parse.
 - Some libraries analyse the grammar before parsing starts and warn for non-sensical parsers, others just do not.
 - Some libraries already return part of the computed result during the parsing process -and thus exhibit online behaviour-, other libraries (usually the monadic ones) construct a huge closure describing the result, which only becomes available when a succesful parse was found.
 - Some libraries analyse a grammar for left-recursion and transform them into an equivalent non-left-recusrive one at some costs.
 - Some libraries memoise the parsing, and may be faster, but in return cannot handle real monadic constructs.
 - Etc., Etc.

My impression is that with respect to parser combinators you just jump a bit too fast to a conclusion.

Doaitse


>
>
> 23.08.2018 22:41, Olaf Klinke wrote:
>> Dear Cafe,
>>
>> can anyone explain the use of an Alternative instance for a parser monad where the parser (p <|> q) does not backtrack automatically when p fails?
>> Consider megaparsec for example. When p is a parser that does not backtrack automatically, then (p <|> q) never succeeds with q. While I do understand that fine-grained control of backtracking is desirable for performance reasons in general, it escapes me why such an Alternative instance could be useful. See also "A note on backtracking" in the parser-combinators package documentation [1].
>>
>> Minimal code example and context below.
>>
>> Thanks,
>> Olaf
>>
>> [1] http://hackage.haskell.org/package/parser-combinators-1.0.0/docs/Control-Applicative-Combinators.html
>>
>> import Text.Megaparsec
>> import Text.Megaparsec.Char
>> import Data.Void
>>
>> p :: Parsec Void String Bool
>> p = do
>>   char '-'
>>   string "True"
>>   return True
>> q :: Parsec Void String Bool
>> q = do
>>   char '-'
>>   string "False"
>>   return False
>>
>>>>> parseTest (p <|> q) "-True"
>> True
>>>>>  parseTest (p <|> q) "-False"
>> 1:2:
>> unexpected "Fals"
>> expecting "True"
>> -- Since q can parse "-False", I would expect (p <|> q) to parse "-False", too.
>>
>> Context:
>> Of course the example above can be mitigated by moving the char '-' outside the alternative. But we might not have control over whether the two parsers share a common prefix. I am trying to write some code generic in the parser monad and I am having difficulties making my code work for both backtracking and non-backtracking parsers. If possible, I'd like to keep my parser type class devoid of a 'try' combinator, which would be the identity function for parser monads that always backtrack.
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Paul
may be because I always hit similar problems and needs different tricks
with them, so I an not sure that parser combinators are good solution.
Btw, usually I use standard parser combinator which is related to GHC
itself (in the 'base' package), so I'm not sure how all those P.C.
libraries are similar (may be MegaParsec has different behavior). But
it's my IMHO: I'm not sure that P.C. libraries are good for serious
parsing tasks: you should be concentrated on the task and not to spend
time in solving of such surprises like that.

Btw, some time ago I found this list of libs
https://www.reddit.com/r/haskell/comments/46u45o/what_is_the_current_state_of_parser_libraries_in/ 
and Trifecta looks interesting but I did not try it


24.08.2018 12:44, Doaitse Swierstra wrote:

>
>> Op 24 aug. 2018, om 7:33  heeft Paul <[hidden email]> het volgende geschreven:
>>
>>>   Of course the example above can be mitigated by moving the char '-' outside the alternative.
>> Right, this works fine:
>>
>>    pq :: Parsec Void String Bool
>>    pq = do
>>      char '-'
>>      (q <|> p)
>>
>> which for me, means that Haskell has not real backtracking.
> You are confusing the language “Haskell” with the library “Parsec”. Parsec is just one of the many parser combinator libraries.
>
>> Btw, `Alternative` is not backtracking (as well as List permutations).
> The class “Alternative” does in no way specify how parsers should behave; it defines an interface. One might expect <|> to be associative, but even that is in no way guaranteed.
>
>> After "do" should happen context saving. I think it's possible to be implemented manually... As workaround, may be to pick char '-', but don't consume it (and to fail if it is not expecting '-') ? Or to switch to another parser libs: arrow based, happy, bindings to good known libs... For me, parsers combinators absolutely ever are big problem: a lot of tricks, low performance, may be it's typical for any "elegant" approach.
>   - Some libraries do not need “try” constructs to set backtrack points, but may be a bit slower; other libraries are more greedy and a bit faster.
>   - Some parsers keep track of where you are in the input at some cost, others just parse as fast as possible without having to update line and position counters (handy when you know the input is correct since it was machine-generated).
>   - Some libraries provide nice error messages about what was expected at some cost, other libraries just report that they did not find a parse.
>   - Some libraries try to “repair” the input and continue parsing when they cannot proceed otherwise, some others just stop when they get stuck.
>   - Some libraries hang on to the input so they may backtrack to a point far back, some libraries work in an online way and pursue all parses “in parallel”, i.e. they perform a breadth first search for a successful parse.
>   - Some libraries analyse the grammar before parsing starts and warn for non-sensical parsers, others just do not.
>   - Some libraries already return part of the computed result during the parsing process -and thus exhibit online behaviour-, other libraries (usually the monadic ones) construct a huge closure describing the result, which only becomes available when a succesful parse was found.
>   - Some libraries analyse a grammar for left-recursion and transform them into an equivalent non-left-recusrive one at some costs.
>   - Some libraries memoise the parsing, and may be faster, but in return cannot handle real monadic constructs.
>   - Etc., Etc.
>
> My impression is that with respect to parser combinators you just jump a bit too fast to a conclusion.
>
> Doaitse
>
>
>>
>> 23.08.2018 22:41, Olaf Klinke wrote:
>>> Dear Cafe,
>>>
>>> can anyone explain the use of an Alternative instance for a parser monad where the parser (p <|> q) does not backtrack automatically when p fails?
>>> Consider megaparsec for example. When p is a parser that does not backtrack automatically, then (p <|> q) never succeeds with q. While I do understand that fine-grained control of backtracking is desirable for performance reasons in general, it escapes me why such an Alternative instance could be useful. See also "A note on backtracking" in the parser-combinators package documentation [1].
>>>
>>> Minimal code example and context below.
>>>
>>> Thanks,
>>> Olaf
>>>
>>> [1] http://hackage.haskell.org/package/parser-combinators-1.0.0/docs/Control-Applicative-Combinators.html
>>>
>>> import Text.Megaparsec
>>> import Text.Megaparsec.Char
>>> import Data.Void
>>>
>>> p :: Parsec Void String Bool
>>> p = do
>>>    char '-'
>>>    string "True"
>>>    return True
>>> q :: Parsec Void String Bool
>>> q = do
>>>    char '-'
>>>    string "False"
>>>    return False
>>>
>>>>>> parseTest (p <|> q) "-True"
>>> True
>>>>>>   parseTest (p <|> q) "-False"
>>> 1:2:
>>> unexpected "Fals"
>>> expecting "True"
>>> -- Since q can parse "-False", I would expect (p <|> q) to parse "-False", too.
>>>
>>> Context:
>>> Of course the example above can be mitigated by moving the char '-' outside the alternative. But we might not have control over whether the two parsers share a common prefix. I am trying to write some code generic in the parser monad and I am having difficulties making my code work for both backtracking and non-backtracking parsers. If possible, I'd like to keep my parser type class devoid of a 'try' combinator, which would be the identity function for parser monads that always backtrack.
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> To (un)subscribe, modify options or view archives go to:
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>> Only members subscribed via the mailman list are allowed to post.
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Isaac Elliott
Throughout the day I've been thinking about how to best respond to this thread, but I've no satisfying answer so I'm just going to throw out my thoughts on the matter.

Non-backtracking choice is adequate for many parsing tasks. Applicative with (non-backtracking) Alternative roughly corresponds to the LL(k) class of grammars, which can be parsed very quickly with low memory overhead (precisely because they lack any ability to backtrack). Because it is sufficiently expressive as well as very efficient, non-backtracking choice is a good default for LL-style parser combinators.

Backtracking has a performance cost, but we shouldn't have to pay a price for things we don't use. If I were to implement an LL(k) grammar using a parser combinator library that had always-backtracking choice, I would still take a penalty even though I know that this parser will never backtrack.

With 'try', we get opt-in backtracking. Pay for what you use. I can implement my LL(1) grammar and have it nice and speedy, and you can implement a complex grammar that needs to backtrack by using 'try' when appropriate.

> Of course the example above can be mitigated by moving the char '-' outside the alternative. But we might not have control over whether the two parsers share a common prefix

In a majority of use cases we can do the kind of factoring you suggest. But if you really, definitely can't, then LL-style parser combinators may be inappropriate for the task. I'd suggest looking at PEGs and packrat parsing, which can deal with the examples you laid out.

And regarding Paul's most recent reply:

I think one needs to better understand how these parsers work before they can comment on "whether parser combinators are good for serious parsing tasks". I don't consider the things mentioned in this thread "problems", nor their solutions to be "tricks". If certain behaviours are surprising, it's because there is a lack of understanding of the tool, and not an issue with the tool itself.

Also, trifecta has the same backtracking semantics as megaparsec.

On Fri, Aug 24, 2018 at 8:10 PM Paul <[hidden email]> wrote:
may be because I always hit similar problems and needs different tricks
with them, so I an not sure that parser combinators are good solution.
Btw, usually I use standard parser combinator which is related to GHC
itself (in the 'base' package), so I'm not sure how all those P.C.
libraries are similar (may be MegaParsec has different behavior). But
it's my IMHO: I'm not sure that P.C. libraries are good for serious
parsing tasks: you should be concentrated on the task and not to spend
time in solving of such surprises like that.

Btw, some time ago I found this list of libs
https://www.reddit.com/r/haskell/comments/46u45o/what_is_the_current_state_of_parser_libraries_in/
and Trifecta looks interesting but I did not try it


24.08.2018 12:44, Doaitse Swierstra wrote:
>
>> Op 24 aug. 2018, om 7:33  heeft Paul <[hidden email]> het volgende geschreven:
>>
>>>   Of course the example above can be mitigated by moving the char '-' outside the alternative.
>> Right, this works fine:
>>
>>    pq :: Parsec Void String Bool
>>    pq = do
>>      char '-'
>>      (q <|> p)
>>
>> which for me, means that Haskell has not real backtracking.
> You are confusing the language “Haskell” with the library “Parsec”. Parsec is just one of the many parser combinator libraries.
>
>> Btw, `Alternative` is not backtracking (as well as List permutations).
> The class “Alternative” does in no way specify how parsers should behave; it defines an interface. One might expect <|> to be associative, but even that is in no way guaranteed.
>
>> After "do" should happen context saving. I think it's possible to be implemented manually... As workaround, may be to pick char '-', but don't consume it (and to fail if it is not expecting '-') ? Or to switch to another parser libs: arrow based, happy, bindings to good known libs... For me, parsers combinators absolutely ever are big problem: a lot of tricks, low performance, may be it's typical for any "elegant" approach.
>   - Some libraries do not need “try” constructs to set backtrack points, but may be a bit slower; other libraries are more greedy and a bit faster.
>   - Some parsers keep track of where you are in the input at some cost, others just parse as fast as possible without having to update line and position counters (handy when you know the input is correct since it was machine-generated).
>   - Some libraries provide nice error messages about what was expected at some cost, other libraries just report that they did not find a parse.
>   - Some libraries try to “repair” the input and continue parsing when they cannot proceed otherwise, some others just stop when they get stuck.
>   - Some libraries hang on to the input so they may backtrack to a point far back, some libraries work in an online way and pursue all parses “in parallel”, i.e. they perform a breadth first search for a successful parse.
>   - Some libraries analyse the grammar before parsing starts and warn for non-sensical parsers, others just do not.
>   - Some libraries already return part of the computed result during the parsing process -and thus exhibit online behaviour-, other libraries (usually the monadic ones) construct a huge closure describing the result, which only becomes available when a succesful parse was found.
>   - Some libraries analyse a grammar for left-recursion and transform them into an equivalent non-left-recusrive one at some costs.
>   - Some libraries memoise the parsing, and may be faster, but in return cannot handle real monadic constructs.
>   - Etc., Etc.
>
> My impression is that with respect to parser combinators you just jump a bit too fast to a conclusion.
>
> Doaitse
>
>
>>
>> 23.08.2018 22:41, Olaf Klinke wrote:
>>> Dear Cafe,
>>>
>>> can anyone explain the use of an Alternative instance for a parser monad where the parser (p <|> q) does not backtrack automatically when p fails?
>>> Consider megaparsec for example. When p is a parser that does not backtrack automatically, then (p <|> q) never succeeds with q. While I do understand that fine-grained control of backtracking is desirable for performance reasons in general, it escapes me why such an Alternative instance could be useful. See also "A note on backtracking" in the parser-combinators package documentation [1].
>>>
>>> Minimal code example and context below.
>>>
>>> Thanks,
>>> Olaf
>>>
>>> [1] http://hackage.haskell.org/package/parser-combinators-1.0.0/docs/Control-Applicative-Combinators.html
>>>
>>> import Text.Megaparsec
>>> import Text.Megaparsec.Char
>>> import Data.Void
>>>
>>> p :: Parsec Void String Bool
>>> p = do
>>>    char '-'
>>>    string "True"
>>>    return True
>>> q :: Parsec Void String Bool
>>> q = do
>>>    char '-'
>>>    string "False"
>>>    return False
>>>
>>>>>> parseTest (p <|> q) "-True"
>>> True
>>>>>>   parseTest (p <|> q) "-False"
>>> 1:2:
>>> unexpected "Fals"
>>> expecting "True"
>>> -- Since q can parse "-False", I would expect (p <|> q) to parse "-False", too.
>>>
>>> Context:
>>> Of course the example above can be mitigated by moving the char '-' outside the alternative. But we might not have control over whether the two parsers share a common prefix. I am trying to write some code generic in the parser monad and I am having difficulties making my code work for both backtracking and non-backtracking parsers. If possible, I'd like to keep my parser type class devoid of a 'try' combinator, which would be the identity function for parser monads that always backtrack.
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> To (un)subscribe, modify options or view archives go to:
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>> Only members subscribed via the mailman list are allowed to post.
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Paul
> behaviours are surprising, it's because there is a lack of understanding of the tool, and not an issue with the tool itself.
Hello, Issac. Yes, it will be the most correct statement :)

Also, trifecta has the same backtracking semantics as megaparsec.

On Fri, Aug 24, 2018 at 8:10 PM Paul <[hidden email]> wrote:
may be because I always hit similar problems and needs different tricks
with them, so I an not sure that parser combinators are good solution.
Btw, usually I use standard parser combinator which is related to GHC
itself (in the 'base' package), so I'm not sure how all those P.C.
libraries are similar (may be MegaParsec has different behavior). But
it's my IMHO: I'm not sure that P.C. libraries are good for serious
parsing tasks: you should be concentrated on the task and not to spend
time in solving of such surprises like that.

Btw, some time ago I found this list of libs
https://www.reddit.com/r/haskell/comments/46u45o/what_is_the_current_state_of_parser_libraries_in/
and Trifecta looks interesting but I did not try it


24.08.2018 12:44, Doaitse Swierstra wrote:
>
>> Op 24 aug. 2018, om 7:33  heeft Paul <[hidden email]> het volgende geschreven:
>>
>>>   Of course the example above can be mitigated by moving the char '-' outside the alternative.
>> Right, this works fine:
>>
>>    pq :: Parsec Void String Bool
>>    pq = do
>>      char '-'
>>      (q <|> p)
>>
>> which for me, means that Haskell has not real backtracking.
> You are confusing the language “Haskell” with the library “Parsec”. Parsec is just one of the many parser combinator libraries.
>
>> Btw, `Alternative` is not backtracking (as well as List permutations).
> The class “Alternative” does in no way specify how parsers should behave; it defines an interface. One might expect <|> to be associative, but even that is in no way guaranteed.
>
>> After "do" should happen context saving. I think it's possible to be implemented manually... As workaround, may be to pick char '-', but don't consume it (and to fail if it is not expecting '-') ? Or to switch to another parser libs: arrow based, happy, bindings to good known libs... For me, parsers combinators absolutely ever are big problem: a lot of tricks, low performance, may be it's typical for any "elegant" approach.
>   - Some libraries do not need “try” constructs to set backtrack points, but may be a bit slower; other libraries are more greedy and a bit faster.
>   - Some parsers keep track of where you are in the input at some cost, others just parse as fast as possible without having to update line and position counters (handy when you know the input is correct since it was machine-generated).
>   - Some libraries provide nice error messages about what was expected at some cost, other libraries just report that they did not find a parse.
>   - Some libraries try to “repair” the input and continue parsing when they cannot proceed otherwise, some others just stop when they get stuck.
>   - Some libraries hang on to the input so they may backtrack to a point far back, some libraries work in an online way and pursue all parses “in parallel”, i.e. they perform a breadth first search for a successful parse.
>   - Some libraries analyse the grammar before parsing starts and warn for non-sensical parsers, others just do not.
>   - Some libraries already return part of the computed result during the parsing process -and thus exhibit online behaviour-, other libraries (usually the monadic ones) construct a huge closure describing the result, which only becomes available when a succesful parse was found.
>   - Some libraries analyse a grammar for left-recursion and transform them into an equivalent non-left-recusrive one at some costs.
>   - Some libraries memoise the parsing, and may be faster, but in return cannot handle real monadic constructs.
>   - Etc., Etc.
>
> My impression is that with respect to parser combinators you just jump a bit too fast to a conclusion.
>
> Doaitse
>
>
>>
>> 23.08.2018 22:41, Olaf Klinke wrote:
>>> Dear Cafe,
>>>
>>> can anyone explain the use of an Alternative instance for a parser monad where the parser (p <|> q) does not backtrack automatically when p fails?
>>> Consider megaparsec for example. When p is a parser that does not backtrack automatically, then (p <|> q) never succeeds with q. While I do understand that fine-grained control of backtracking is desirable for performance reasons in general, it escapes me why such an Alternative instance could be useful. See also "A note on backtracking" in the parser-combinators package documentation [1].
>>>
>>> Minimal code example and context below.
>>>
>>> Thanks,
>>> Olaf
>>>
>>> [1] http://hackage.haskell.org/package/parser-combinators-1.0.0/docs/Control-Applicative-Combinators.html
>>>
>>> import Text.Megaparsec
>>> import Text.Megaparsec.Char
>>> import Data.Void
>>>
>>> p :: Parsec Void String Bool
>>> p = do
>>>    char '-'
>>>    string "True"
>>>    return True
>>> q :: Parsec Void String Bool
>>> q = do
>>>    char '-'
>>>    string "False"
>>>    return False
>>>
>>>>>> parseTest (p <|> q) "-True"
>>> True
>>>>>>   parseTest (p <|> q) "-False"
>>> 1:2:
>>> unexpected "Fals"
>>> expecting "True"
>>> -- Since q can parse "-False", I would expect (p <|> q) to parse "-False", too.
>>>
>>> Context:
>>> Of course the example above can be mitigated by moving the char '-' outside the alternative. But we might not have control over whether the two parsers share a common prefix. I am trying to write some code generic in the parser monad and I am having difficulties making my code work for both backtracking and non-backtracking parsers. If possible, I'd like to keep my parser type class devoid of a 'try' combinator, which would be the identity function for parser monads that always backtrack.
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> To (un)subscribe, modify options or view archives go to:
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>> Only members subscribed via the mailman list are allowed to post.
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Olaf Klinke
In reply to this post by Olaf Klinke
Thanks to Doaitse for the example. I yet have to wrap my head around it (since I don't know how pChainr is related to <|>) but playing with q parsing the empty string is a good start.
When you said

> One might expect <|> to be associative, but even that is in no way guaranteed.

I suppose you meant that type class laws can not be enforced, because the documentation of Alternative states that <|> should be associative. It would be quite horrible if the Alternative instances of parsers weren't.

@Paul: I think that parser combinator libraries are one of the most wonderful things functional programming has given us. How else would you parse your input data? Anything cobbled together manually with the help of some Read instances is bound to be unmaintainable and does not yield good error messages. What would you suggest as a replacement? How would you write composable parsers for a complicated file type?

Today I added a 'try' method to my MonadParse type class. This lets me write (try p <|> q) generically. An instance for Attoparsec would declare try=id, since it always backtracks. We'll see how this goes.

What I learned form this thread is:
* arrange your usages of <|> so that both sides can never share a common prefix, then no backtracking will be necessary
* for that reason, some parsers like Megaparsec define their Alternative instances the way they do, without backtracking.

Please allow one more question:
Is the backtracking performance penalty linear in the amount of backtracking? That is, suppose that
p = char '-' >> someHugeParser
q = char '-' >> anotherHugeParser
Could (try p <|> q) decide after seeing two characters that p fails and commit to the second branch, freeing memory?

Thanks,
Olaf
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Paul
Hello, Olaf. I have some distrust of elegant solutions (one of them are
C.P. libs). For example, I read that they have problems with memory
consuming (famous article about arrow-based parsers), also your example
(my own experience includes a lot of tricks which I don't like)... So
(but this is my IMHO) for big and serious project with parsing of
something complex I will use bindings to good known parser libraries... 
Maybe something like this: https://github.com/markwright/antlrc.  Btw,
Happy looks still live (https://github.com/simonmar/happy)

I use "Read"-style parsers for small constructions only, so fail of such
small parser is only case where error is producing and I do it
explicitly (as biggest example: header of Git patch).

Olaf, what do you think about
   1. http://hackage.haskell.org/package/frisby
   2. http://tanakh.github.io/Peggy
?



25.08.2018 01:15, Olaf Klinke wrote:

> Thanks to Doaitse for the example. I yet have to wrap my head around it (since I don't know how pChainr is related to <|>) but playing with q parsing the empty string is a good start.
> When you said
>
>> One might expect <|> to be associative, but even that is in no way guaranteed.
> I suppose you meant that type class laws can not be enforced, because the documentation of Alternative states that <|> should be associative. It would be quite horrible if the Alternative instances of parsers weren't.
>
> @Paul: I think that parser combinator libraries are one of the most wonderful things functional programming has given us. How else would you parse your input data? Anything cobbled together manually with the help of some Read instances is bound to be unmaintainable and does not yield good error messages. What would you suggest as a replacement? How would you write composable parsers for a complicated file type?
>
> Today I added a 'try' method to my MonadParse type class. This lets me write (try p <|> q) generically. An instance for Attoparsec would declare try=id, since it always backtracks. We'll see how this goes.
>
> What I learned form this thread is:
> * arrange your usages of <|> so that both sides can never share a common prefix, then no backtracking will be necessary
> * for that reason, some parsers like Megaparsec define their Alternative instances the way they do, without backtracking.
>
> Please allow one more question:
> Is the backtracking performance penalty linear in the amount of backtracking? That is, suppose that
> p = char '-' >> someHugeParser
> q = char '-' >> anotherHugeParser
> Could (try p <|> q) decide after seeing two characters that p fails and commit to the second branch, freeing memory?
>
> Thanks,
> Olaf
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Olaf Klinke
In reply to this post by Olaf Klinke
>Hello, Olaf. I have some distrust of elegant solutions (one of them are
>C.P. libs).
I am a mathematician, I trust a lot in elegance and beauty. As long as
the performance penalty is not prohibitively large, I'd prefer a beautiful
implementation over an aggressively optimized one. At least when
the code does not live in some real-time system. Certainly there are
cases where you absolutely need speed or low memory footprint.

> For example, I read that they have problems with memory
>consuming (famous article about arrow-based parsers), also your example
>(my own experience includes a lot of tricks which I don't like)... So
>(but this is my IMHO) for big and serious project with parsing of
>something complex I will use bindings to good known parser libraries... 
I consider Parsec and it descendants good and known. I could not write
something like that. Currently I rely on megaparsec because of its
informative error messages, knowing that I can switch to one of the
siblings without much effort when I need more speed.

>Maybe something like this: https://github.com/markwright/antlrc.  Btw,
>Happy looks still live (https://github.com/simonmar/happy)
Have you used any of the above?

>
>I use "Read"-style parsers for small constructions only, so fail of such
>small parser is only case where error is producing and I do it
>explicitly (as biggest example: header of Git patch).
>
>Olaf, what do you think about
>   1. http://hackage.haskell.org/package/frisby
Frisby looks quite old, as it has idiosyncratic combinators for stuff now
in Data.Functor. I've never used it, so I can't tell what its strengths
are. Can you?
>   2. http://tanakh.github.io/Peggy
Notice that in the examples of bothy frisby and peggy, parsing a number
only parses a string of digits, then passes it to 'read', i.e. another
parser. Is that efficient, elegant parsing?

I'm nevertheless intrigued by PEG parsers, because I once read about the
connection between grammars and compression: A set of production rules can
be quite a bit smaller than the actual text. This turns parsing upside
down: The text is constant, the grammar unknown.

Olaf
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Ben Franksen
In reply to this post by Paul
On 08/27/2018 11:52 AM, Paul wrote:
> I have some distrust of elegant solutions

Then why are interested in Haskell at all?

Cheers
Ben

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Olaf Klinke
In reply to this post by Olaf Klinke
> Hello, Olaf. I have some distrust of elegant solutions (one of them are
> C.P. libs).

I have a program that parses several CSV files, one of them 50MB in size, and writes its result as HTML. When I started optimizing, the execution time was 144 seconds. Profiling (thanks to Jasper Van der Jeugt for writing profiteur!) revealed that most of the time was spent parsing and postprocessing the 50MB CSV file. Changing the data structure of the postprocessing stage cut down the execution time to 32 seconds, but still the majority is spent on parsing.
Then I realized that (StateT String Maybe) is a parser which conveniently has all the class instances one needs, most notably its Alternative instance make it a backtracking parser. After defining a few combinators I was able to swap out my megaparsec parser against the new parser, which slashed execution time in half. Now most of the parsing time is dedicated to transforming text to numbers and dates. I doubt that parsing time can be reduced much further [*]. The new parser was identical to the old parser, only the combinators now come from another module. That is the elegant thing about monadic parser libraries.
I will now use the fast parser by default, and if it returns a Nothing, the program will suggest a command line flag that switches to the original megaparsec parser, exactly telling the user where the parse failed and why.
I am not sure whether there is another family of parsers that have interfaces so similar that switching from one package to another is as effortless as monadic parsers.

Cheers
Olaf

[*] To the parser experts on this list: How much time should a parser take that processes a 50MB, 130000-line text file, extracting 5 values (String, UTCTime, Int, Double) from each line?
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Bardur Arantsson-2
On 30/08/2018 20.21, Olaf Klinke wrote:
>> Hello, Olaf. I have some distrust of elegant solutions (one of them are
>> C.P. libs).
>
> [*] To the parser experts on this list: How much time should a parser take that processes a 50MB, 130000-line text file, extracting 5 values (String, UTCTime, Int, Double) from each line?

Not an expert, but for something as (relatively!) standard as CSV, I'd
probably go for a specialized solution like 'cassava', which seems like
it does quite well according to https://github.com/haskell-perf/csv

Based purely the lines/second numbers on that page and the number you've
given, I'd guesstimate that your parsing could potentially be as fast as
(3.185ms / 1000 lines) * 130000 lines = 414.05ms = 0.4 s.

(Of coure that still doesn't account for extracting the Int, Double,
etc., but there are also specialized solutions for that which should be
pretty hard to beat, see e.g. bytestring-lexing.)

It's also probably a bit less elegant than a generic parsec-like thing,
but that's to be expected for a more special-case solution.

Regards,

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Bryan Richter-2
On 08/30/2018 03:43 PM, Bardur Arantsson wrote:

> On 30/08/2018 20.21, Olaf Klinke wrote:
>
>>> Hello, Olaf. I have some distrust of elegant solutions (one of
>>> them are C.P. libs).
>>
>> [*] To the parser experts on this list: How much time should
>> a parser take that processes a 50MB, 130000-line text file,
>> extracting 5 values (String, UTCTime, Int, Double) from each line?
>
> Not an expert, but for something as (relatively!) standard
> as CSV, I'd probably go for a specialized solution like
> 'cassava', which seems like it does quite well according to
> https://github.com/haskell-perf/csv
Playing the devil's advocate here....

If parser combinators aren't great for "relatively standard" things
such as CSV, then what *are* they good for?


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Bardur Arantsson-2
On 31/08/2018 14.50, Bryan Richter wrote:

> On 08/30/2018 03:43 PM, Bardur Arantsson wrote:
>
>> On 30/08/2018 20.21, Olaf Klinke wrote:
>>
>>>> Hello, Olaf. I have some distrust of elegant solutions (one of
>>>> them are C.P. libs).
>>>
>>> [*] To the parser experts on this list: How much time should
>>> a parser take that processes a 50MB, 130000-line text file,
>>> extracting 5 values (String, UTCTime, Int, Double) from each line?
>>
>> Not an expert, but for something as (relatively!) standard
>> as CSV, I'd probably go for a specialized solution like
>> 'cassava', which seems like it does quite well according to
>> https://github.com/haskell-perf/csv
>
> Playing the devil's advocate here....
>
> If parser combinators aren't great for "relatively standard" things
> such as CSV, then what *are* they good for?
>

Off the top of my head:

a) They're often much more readable than many alternatives if you
*don't* actually care *too* much about performance. (Especially if you
have semantic actions, I find that they're *much* more readable.)

b) They're usually a lot better for "tricky" languages which e.g. might
not be context-free.

c) Embedding the "grammar" directly in Haskell means that there's no
weirdness around integrating generated code/types with the rest of the
program.

(Probably several other obvious things, I'm forgetting.)

Regards,

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Peter Simons-2
In reply to this post by Bryan Richter-2
Hi Bryan,

 > If parser combinators aren't great for "relatively standard" things
 > such as CSV, then what *are* they good for?

there is no such thing as a "parser combinator library". There are many
different libraries for writing parsers, and these libraries each have
vastly different design goals. Attoparsec, for example, is specifically
designed to be fast, and I don't see any reason why it shouldn't be
possible to implement a high-performance CSV parsing library on top of
that package. In fact, I believe Cassava does exactly that.

Parsec and the newer Megaparsec, on the other hand, put more emphasis on
good error messages than an performance. Now, good error messages matter
when you're parsing a sophisticated input language with many different
non-trivial constructs, but for a CSV parser there's really not much to be
done in terms of "good error messages" because the syntax is so simple.
Therefore, Parsec might not be the best choice for that particular
use-case.

Anyway, I don't think that it makes much sense to talk about all those
libraries as if they were all the same and had all the same properties,
because they don't.

Best regards,
Peter

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Brandon Allbery
In reply to this post by Bryan Richter-2
I'm tempted to say that for things like HTML and CSV, the real reason not to use a parser combinator library is that "standard" turns out to mean pretty much nothing whatsoever. And you want to use someone else's battle-tested workarounds developed over years for a "standard format" that's more or less a free-for-all.

On Fri, Aug 31, 2018 at 8:51 AM Bryan Richter <[hidden email]> wrote:
On 08/30/2018 03:43 PM, Bardur Arantsson wrote:

> On 30/08/2018 20.21, Olaf Klinke wrote:
>
>>> Hello, Olaf. I have some distrust of elegant solutions (one of
>>> them are C.P. libs).
>>
>> [*] To the parser experts on this list: How much time should
>> a parser take that processes a 50MB, 130000-line text file,
>> extracting 5 values (String, UTCTime, Int, Double) from each line?
>
> Not an expert, but for something as (relatively!) standard
> as CSV, I'd probably go for a specialized solution like
> 'cassava', which seems like it does quite well according to
> https://github.com/haskell-perf/csv

Playing the devil's advocate here....

If parser combinators aren't great for "relatively standard" things
such as CSV, then what *are* they good for?

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


--
brandon s allbery kf8nh

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Ben Franksen
In reply to this post by Peter Simons-2
Am 31.08.2018 um 16:55 schrieb Peter Simons:
> Parsec and the newer Megaparsec, on the other hand, put more emphasis on
> good error messages than an performance. Now, good error messages matter
> when you're parsing a sophisticated input language with many different
> non-trivial constructs, but for a CSV parser there's really not much to be
> done in terms of "good error messages" because the syntax is so simple.
> Therefore, Parsec might not be the best choice for that particular
> use-case.

Indeed, since CSV is a regular language I'd exploit that by using e.g.
regex-applicative and enjoy guaranteed linear time parsing.

Cheers
Ben

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Mario Blažević
In reply to this post by Olaf Klinke
On 2018-08-27 04:55 PM, Olaf Klinke wrote:

>> Olaf, what do you think about
>>   1. http://hackage.haskell.org/package/frisby
> Frisby looks quite old, as it has idiosyncratic combinators for stuff
> now in Data.Functor. I've never used it, so I can't tell what its
> strengths are. Can you?
>>   2. http://tanakh.github.io/Peggy
> Notice that in the examples of bothy frisby and peggy, parsing a number
> only parses a string of digits, then passes it to 'read', i.e. another
> parser. Is that efficient, elegant parsing?

        If you're looking for a newer implementation of PEGs, there's my
grammatical-parsers library. It's also closer to the formal grammar
notation and it uses the standard parsing combinators, so it seems like
it should tick off a few of your boxes.


> I'm nevertheless intrigued by PEG parsers, because I once read about the
> connection between grammars and compression: A set of production rules
> can be quite a bit smaller than the actual text. This turns parsing
> upside down: The text is constant, the grammar unknown.

        This is a nice idea, but it's unlikely to ever have a reasonably fast
compressor implementation. Most interesting questions you can ask about
context-free grammars are undecidable, and PEGs are even more complex
from the theoretical point of view.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Alternative instance for non-backtracking parsers

Olaf Klinke
In reply to this post by Olaf Klinke
Thanks Bardur for the pointer to bytesting-lexing and cassava. The problem is that my csv files are a little bit idiosyncratic. That is, they have BOM, semicolons as separators, The fractional numbers are in the format 1.234,567 and there are dates to parse, too. I tried parseTimeM from the time package, but that is slower than my own parser. That said, my megaparsec parser seems to spend quite some time skipping over text with regex ';"[^"]*"', that is, fields whose content does not concern the application at hand. Hence using a CSV library for tokenizing might be a good idea.

Thanks to Peter Simons for pointing out that cassava indeed has attoparsec as a dependency.

Maybe CSV is a red herring, after all. It's just a some text-based syntax where the count of semicolons indicate where in the line I can find the data I'm interested in. I'm more curious about how to make number conversion fast.
I've looked at the source of megaparsec-6.5.0, attoparsec-0.13.2.2, bytestring-lexing-0.5.0.2 and base-4.11.1.0. They all do the same thing: Convert digits to numbers individually, then fold the list of digits as follows:

f x digit = x * 10 + value digit
number = foldl' f 0 digits

For 'value' above, Megaparsec uses Data.Char.digitToInt while Attoparsec uses Data.Char.ord.
I also rolled my own Double parser for locale reasons. Are there any libraries that handle all the formats

1,234.567
1234.567
1.234,567
1234,567

maybe by specifying a locale up front? It can't be done without, since 123.456 on its own is ambigous. Maybe the locale can be guessed from the context, maybe not, but that is certainly an expensive operation. MS Excel guesses eagerly, with occasional amusing consequences.

Thanks to all who contributed to this thread so far!
Olaf
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
12