Parsing words with parsec

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Parsing words with parsec

paolino
Hi,
        I had a bad time trying to parse the words of a text.
I suspect I miss some parsec knowledge.

In the end it seems working, though I haven't tested much and this example
contains the main features I was looking.

*Main> parseTest (parseLine eof) "paolo@gmail sara,mimmo! 9ab a9b ab9 cd\n"
["paolo@gmail","sara","mimmo","cd"]

---------------------
manyTillT body terminator joiner = liftM2 joiner (manyTill body (lookAhead  
terminator)) terminator

wordChar = letter <|> oneOf "_@" <?> "a valid word character"

nonSeparator = wordChar <|> digit

wordEnd = do
             x <- wordChar
             notFollowedBy nonSeparator
             return x

word = manyTillT wordChar (try wordEnd) (\b t -> b ++ [t]) <?> "a word"

wordStart = do
               (try nonSeparator >> unexpected "non separator") <|> anyChar
               lookAhead wordChar

nextWord =  manyTill anyChar (try wordStart) >> (try word <|> nextWord)

parseLine end = do
                   f <- option [] $ return `fmap` try word
                   r <- many $ try nextWord
                   manyTill anyChar end
                   return (f ++ r)              

-----------

Any comment to simplify this code is welcome.


Paolino.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parsing words with parsec

Stefan O'Rear
On Fri, Mar 30, 2007 at 05:43:34AM +0200, paolino wrote:
> Hi,
>         I had a bad time trying to parse the words of a text.
> I suspect I miss some parsec knowledge.

I'd start by not sextuple-posting, it just sextuples the ugliness ;-)

Anyway, I think parsec is *far* too big a hammer for the nail you're trying to hit.

import Char( isAlpha )
import List( groupBy )

equating f x y = f x == f y  -- in Data.Eq, iff you have GHC 6.7

isLetter x = isAlpha x || x == '_' || x == '@'

myWords = filter (isLetter . head) . groupBy (equating isLetter)

Stefan
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parsing words with parsec

paolino
On 3/30/07, Stefan O'Rear <[hidden email]> wrote:
> On Fri, Mar 30, 2007 at 05:43:34AM +0200, paolino wrote:
> > Hi,
> >         I had a bad time trying to parse the words of a text.
> > I suspect I miss some parsec knowledge.
>
> I'd start by not sextuple-posting, it just sextuples the ugliness ;-)
Mhh, still I don't see any them in my inbox mails , probably something
buggy in gmail configuration, sorry :/.

>
>
> import Char( isAlpha )
> import List( groupBy )
>
> equating f x y = f x == f y  -- in Data.Eq, iff you have GHC 6.7
>
> isLetter x = isAlpha x || x == '_' || x == '@'
>
> myWords = filter (isLetter . head) . groupBy (equating isLetter)
>

Testing your code, it misses the words with numbers inside exclusion
and uses the number as separators.

!runhaskell prova.hs
["paolo@gmail","sara","mimmo","ab","a","b","ab","cd"]

Thanks
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

RE: sending from Gmail (was Parsing words with parsec)

Bayley, Alistair-2
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Paolino
> >
> > I'd start by not sextuple-posting, it just sextuples the
> ugliness ;-)
> Mhh, still I don't see any them in my inbox mails , probably something
> buggy in gmail configuration, sorry :/.

Are you expecting to see your sent message eventually arrive in your
inbox? gmail doesn't do that by default (and I don't see an obvious
setting to change it). gmail seems to be fairly keen on removing
duplicate messages.

Alistair
*****************************************************************
Confidentiality Note: The information contained in this message,
and any attachments, may contain confidential and/or privileged
material. It is intended solely for the person(s) or entity to
which it is addressed. Any review, retransmission, dissemination,
or taking of any action in reliance upon this information by
persons or entities other than the intended recipient(s) is
prohibited. If you received this in error, please contact the
sender and delete the material from any computer.
*****************************************************************
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: sending from Gmail (was Parsing words with parsec)

paolino
On Friday 30 March 2007 11:44, Bayley, Alistair wrote:

> > From: [hidden email]
> > [mailto:[hidden email]] On Behalf Of Paolino
> >
> > > I'd start by not sextuple-posting, it just sextuples the
> >
> > ugliness ;-)
> > Mhh, still I don't see any them in my inbox mails , probably something
> > buggy in gmail configuration, sorry :/.
>
> Are you expecting to see your sent message eventually arrive in your
> inbox? gmail doesn't do that by default (and I don't see an obvious
> setting to change it). gmail seems to be fairly keen on removing
> duplicate messages.

Then probably I was expecting haskell-cafe to send messages sent by me to me
also so I can have my messages in the thread automatically. Or , I have to
copy them from the sent box to the haskell-cafe filter box ?
I'm confused.

Paolino
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: sending from Gmail (was Parsing words with parsec)

Dougal Stanton
In reply to this post by Bayley, Alistair-2
On 30/03/07, Bayley, Alistair <[hidden email]> wrote:

> > Mhh, still I don't see any them in my inbox mails , probably something
> > buggy in gmail configuration, sorry :/.
>
> Are you expecting to see your sent message eventually arrive in your
> inbox? gmail doesn't do that by default (and I don't see an obvious
> setting to change it). gmail seems to be fairly keen on removing
> duplicate messages.

The ML software has the option of not sending you your own messages,
but I can't remember if it is on by default or not.

Cheers,

Dougal.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: sending from Gmail (was Parsing words with parsec)

paolino
On Friday 30 March 2007 11:54, Dougal Stanton wrote:
> On 30/03/07, Bayley, Alistair <[hidden email]>
wrote:

> > > Mhh, still I don't see any them in my inbox mails , probably something
> > > buggy in gmail configuration, sorry :/.
> >
> > Are you expecting to see your sent message eventually arrive in your
> > inbox? gmail doesn't do that by default (and I don't see an obvious
> > setting to change it). gmail seems to be fairly keen on removing
> > duplicate messages.
>
> The ML software has the option of not sending you your own messages,
> but I can't remember if it is on by default or not.

Just controlled , it's set to Yes on receiving copy of my posts. Thanks
anyway.

Paolino
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parsing words with parsec

paolino
In reply to this post by Stefan O'Rear
On Friday 30 March 2007 06:59, Stefan O'Rear wrote:

> Anyway, I think parsec is *far* too big a hammer for the nail you're trying
> to hit.

In the end , the big hammer solution has become

parseLine = fmap (map fst. filter snd) $ many parser
  where parser = do w <- option ("",False) parseAWord  
                    anyChar -- skip the separator
                    return w
        parseAWord = try positive <|> (many1 nonSeparator >> return
("",False))
        positive = do c <- wordChar
                      (cs,tn) <- option ("",True) parseAWord
                      return (c:cs,tn)
   
wordChar = letter <|> oneOf "_@" <?> "a word-character"
                                         
nonSeparator = wordChar <|> digit <?> "a non-separator"

while your, corrected not parsec solution is

wordsOfLine isNonSeparator isWordChar = (filter (all isWordChar)).
     groupBy (\x y -> (isNonSeparator x) == (isNonSeparator y))

Still ,I wonder if the parsec solution can be simplified.

Thanks.

(PS. I put an option on the ML software which sends me an ack on posting , so
at least I know I sent the mail :) )
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe