A simple attoparsec question

Robert Clausecker
Hi Haskellers!

I'm currently trying to write an assembler for Knuth's MMIXAL language.
Currently, I'm writing the parser using attoparsec, and have one
question:

In MMIXAL, a string literal starts with a ", is followed by any number
of arbitrary characters other than the newline character, and ends
with another ". For example: "Hello, World!" is a string, "ſ"
is a string too, but "\n" (\n is a newline) isn't. Here's the parser,
which parses a string and other kinds of constants:

  parseConstant = Reference <$> try parseLocLabel
              <|> PlainNum <$> decimal
              <|> char '#' *> fmap PlainNum hexadecimal
              <|> char '\'' *> (CharLit <$> notChar '\n') <* char '\''
              <|> try (char '"' *> (StringLit . B.pack <$>
                    manyTill (notChar '\n') (char '"')))
              <?> "constant"

The problem is that attoparsec just silently fails on strings like the
last one and tries the other parsers afterwards, which leads to strange
results. Is there a way to force the whole parser to fail, even if
there's an alternative parser afterwards?

I hope you understand my question.

Yours, Robert Clausecker

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: A simple attoparsec question

Evan Laforge
>  parseConstant = Reference <$> try parseLocLabel
>              <|> PlainNum <$> decimal
>              <|> char '#' *> fmap PlainNum hexadecimal
>              <|> char '\'' *> (CharLit <$> notChar '\n') <* char '\''
>              <|> try $ (char '"' *> (StringLit . B.pack <$>
>                    manyTill (notChar '\n') (char '"')))
>              <?> "constant"
>
> The problem is, that attoparsec just silently fails on this kind of
> strings and tries other parsers afterwards, which leads to strange
> results. Is there a way to force the whole parser to fail, even if
> there's an alternative parser afterwards?

If none of the alternatives consume any characters, then the next
alternative will be tried.  But this is a question for your grammar,
i.e. it sounds like you have 'parseConstant <|> parseSomethingElse'
and you want parseSomethingElse to not be tried?  Then omit it!
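
A note on the backtracking behaviour that matters here: unlike Parsec, attoparsec's (<|>) rewinds the input when the left branch fails, even if that branch already consumed characters, so the next alternative is always tried from the same position. A minimal sketch to confirm this (the names abOrAc and demo are made up for illustration):

```haskell
import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8 (Parser, char, parseOnly)
import qualified Data.ByteString.Char8 as B

-- The left branch consumes 'a' and then fails on 'c'.
-- attoparsec rewinds, so the right branch sees the input from the start.
abOrAc :: Parser Char
abOrAc = (char 'a' *> char 'b') <|> (char 'a' *> char 'c')

-- Hypothetical demo value: parse the two-character input "ac".
demo :: Either String Char
demo = parseOnly abOrAc (B.pack "ac")
```

In Parsec the same parser would fail on "ac" unless the left branch were wrapped in 'try'.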

If your string parser isn't working how you want, I recommend breaking
out the different kinds of literals, like chars and strings, and
testing at the REPL to make sure they work on their own.  BTW, if you
use takeWhile you can avoid the extra pack.  I like to use a between
combinator: 'between a b mid = a >> mid <* b'.  It's hard to read the
applicative soup above, and I wouldn't trust it to be totally correct,
instead I'd simplify with functions and test interactively.

The 'try' is also redundant, I think.
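
To make the suggestions above concrete, here is a rough sketch of a standalone string-literal parser using takeWhile and a 'between' helper. It assumes the Char8 flavour of attoparsec; the names stringLit and the argument order of between are illustrative choices, not taken from the original code. No 'try' is needed because attoparsec always backtracks on failure.

```haskell
import Data.Attoparsec.ByteString.Char8 (Parser, char, parseOnly)
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B

-- Run 'mid' between an opening and a closing parser, keeping only mid's result.
between :: Parser a -> Parser b -> Parser c -> Parser c
between open close mid = open *> mid <* close

-- A string literal: a quote, any characters except quote and newline, a quote.
-- takeWhile returns a ByteString directly, so no B.pack is required.
stringLit :: Parser B.ByteString
stringLit =
  between (char '"') (char '"')
          (A.takeWhile (\c -> c /= '"' && c /= '\n'))
```

At the REPL, something like parseOnly stringLit (B.pack "\"Hello, World!\"") should give the literal's contents, while an input with a newline before the closing quote should give a Left.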

Re: A simple attoparsec question

Daniel Fischer
In reply to this post by Robert Clausecker
On Tuesday 01 March 2011 22:15:38, Robert Clausecker wrote:
>
> I hope, you understand my question.

Not sure. If I understand correctly, if you have

someParser = foo
    <|> bar
    <|> parseConstant
    <|> baz
    <|> quux

and invoke someParser on something like

"\"Line\nLine\""

it tries baz and quux but you want it to fail without trying those?
How catastrophic should the failure be?

manyTill (notChar '\n') (char '"' <|> (char '\n' >> error "Newline"))
would be simple but rather extreme.

Since the point of (<|>) is that the second parser is tried upon failure of
the first, I don't see how you could avoid that with less drastic measures,
so you'd have to return a pseudo-success for malformed string literals and
check for that after someParser.
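
A minimal sketch of the pseudo-success idea, under some assumptions: the Token type, the BadStringLit constructor, the catch-all Other alternative and the function names are all hypothetical, chosen only to show the shape. The string parser never fails once it has seen the opening quote; a malformed literal is returned as a marked value and rejected in a separate check after parsing.

```haskell
import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8 (Parser, char, parseOnly)
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B

-- Hypothetical result type; BadStringLit is the "pseudo-success" marker.
data Token
  = StringLit B.ByteString
  | BadStringLit B.ByteString
  | Other B.ByteString
  deriving (Eq, Show)

-- After the opening quote this parser always succeeds:
-- either with a proper literal or with a marked malformed one.
stringOrBad :: Parser Token
stringOrBad = do
  _ <- char '"'
  body <- A.takeWhile (\c -> c /= '"' && c /= '\n')
  (StringLit body <$ char '"') <|> pure (BadStringLit body)

-- Stand-in for "foo <|> bar <|> parseConstant <|> baz <|> quux":
-- the catch-all alternative is never reached for input starting with a quote.
someParser :: Parser Token
someParser = stringOrBad <|> (Other <$> A.takeByteString)

-- The check performed after someParser has run.
validate :: Token -> Either String Token
validate (BadStringLit body) =
  Left ("unterminated string literal after: " ++ B.unpack body)
validate tok = Right tok

parseChecked :: B.ByteString -> Either String Token
parseChecked input = parseOnly someParser input >>= validate
```

With this shape, an input such as "\"Line\nLine\"" is reported as an unterminated literal instead of being handed to the later alternatives.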

Re: A simple attoparsec question

Steve Schafer
In reply to this post by Robert Clausecker
On Tue, 01 Mar 2011 22:15:38 +0100, you wrote:

>The problem is, that attoparsec just silently fails on this kind of
>strings and tries other parsers afterwards, which leads to strange
>results.

Can you give a concrete example of the problem that you're seeing here?
(Basically, you're describing exactly how a parser is supposed to work,
so it's not clear what the problem is...)

-Steve Schafer

Re: A simple attoparsec question

Malcolm Wallace-2
In reply to this post by Evan Laforge

On 1 Mar 2011, at 21:58, Evan Laforge wrote:

>>  parseConstant = Reference <$> try parseLocLabel
>>              <|> PlainNum <$> decimal
>>              <|> char '#' *> fmap PlainNum hexadecimal
>>              <|> char '\'' *> (CharLit <$> notChar '\n') <* char '\''
>>              <|> try $ (char '"' *> (StringLit . B.pack <$>
>>                    manyTill (notChar '\n') (char '"')))
>>              <?> "constant"
>>
>> The problem is, that attoparsec just silently fails on this kind of
>> strings and tries other parsers afterwards, which leads to strange
>> results. Is there a way to force the whole parser to fail, even if
>> there's an alternative parser afterwards?

I _think_ what the original poster is worried about is that, having  
consumed an initial portion of a constant, e.g. the leading # or ' or  
", if the input does not complete the token sequence in a valid way,  
then the other alternatives are tried anyway (and hopelessly).  This  
can lead to very poor error messages.

The technique advocated by the polyparse library is to explicitly  
annotate the knowledge that when a certain sequence has been seen  
already, then no other alternative can possibly match.  The combinator  
is called 'commit'.  This locates the errors much more precisely.

For instance, (in some hybrid of polyparse/attoparsec combinators)

>>  parseConstant = Reference <$> try parseLocLabel
>>              <|> PlainNum <$> decimal
>>              <|> char '#' *> commit (fmap PlainNum hexadecimal)
>>              <|> char '\'' *> commit ((CharLit <$> notChar '\n') <* char '\'')
>>              <|> char '"' *> commit ((StringLit . B.pack <$>
>>                    manyTill (notChar '\n') (char '"')))
>>              <?> "constant"


Regards,
     Malcolm
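
For readers who have not seen 'commit' before, here is a toy illustration of the idea, written from scratch so that it needs no library. This is not polyparse's implementation and attoparsec itself offers no such combinator as far as I know; the types Result and P and the helpers orElse, charP and takeWhileP are invented for this sketch. The point is only that a third outcome, a fatal failure, is not caught by the choice operator.

```haskell
-- Three outcomes: success with remaining input, an ordinary failure that
-- choice may recover from, and a fatal failure that choice must propagate.
data Result a = Ok a String | Fail String | Fatal String
  deriving (Eq, Show)

newtype P a = P { runP :: String -> Result a }

instance Functor P where
  fmap f (P p) = P $ \s -> case p s of
    Ok a rest -> Ok (f a) rest
    Fail e    -> Fail e
    Fatal e   -> Fatal e

instance Applicative P where
  pure a = P (Ok a)
  P pf <*> P pa = P $ \s -> case pf s of
    Ok f rest -> case pa rest of
      Ok a rest' -> Ok (f a) rest'
      Fail e     -> Fail e
      Fatal e    -> Fatal e
    Fail e  -> Fail e
    Fatal e -> Fatal e

-- Choice: only an ordinary failure falls through to the second parser.
orElse :: P a -> P a -> P a
orElse (P p) (P q) = P $ \s -> case p s of
  Fail _ -> q s
  r      -> r

-- commit: turn an ordinary failure of the wrapped parser into a fatal one.
commit :: P a -> P a
commit (P p) = P $ \s -> case p s of
  Fail e -> Fatal e
  r      -> r

charP :: Char -> P Char
charP c = P $ \s -> case s of
  (x:xs) | x == c -> Ok x xs
  _               -> Fail ("expected " ++ show c)

takeWhileP :: (Char -> Bool) -> P String
takeWhileP ok = P $ \s -> let (a, b) = span ok s in Ok a b

-- Once the opening quote is seen, nothing else can match.
stringLit :: P String
stringLit = charP '"' *> commit (takeWhileP (\c -> c /= '"' && c /= '\n') <* charP '"')

-- The second alternative is a hypothetical catch-all, standing in for "baz <|> quux".
constant :: P String
constant = stringLit `orElse` takeWhileP (const True)
```

Running runP constant on a well-formed literal gives an Ok, on input that does not start with a quote it falls through to the catch-all, and on a literal broken by a newline it gives a Fatal pointing at the missing closing quote rather than trying the catch-all.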

