regex and Regular Expressions Libraries

Chris Dornan
Dear Haskell Cafe,

With the regex announcement, I wanted to try to get other folks' perspectives
on what has been happening with the Haskell regular expression libraries.

I have been by turns impressed at how good the engineering of the Regex
packages is and gobsmacked by how difficult the traditional Text.Regex API
is to use.

In this blog post,

    http://engineers.irisconnect.net/posts/2017-03-07-regex.html
       
I rather cheekily speculate that Haskellers have perhaps been a bit
disdainful of regular expressions (not important in a language capable
of doing proper parsing, etc.).

What do you think?

Chris




_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Re: regex and Regular Expressions Libraries

Brandon Allbery

On Thu, Mar 9, 2017 at 2:06 PM, Chris Dornan <[hidden email]> wrote:
> In this blog post,
>
>     http://engineers.irisconnect.net/posts/2017-03-07-regex.html
>
> I rather cheekily speculate that the Haskellers perhaps have been a bit
> disdainful of regular expressions (not important in a language capable
> of doing proper parsing, etc.).
>
> What do you think?

I've voiced that opinion in #haskell a few times, that the API's designed to scare people toward parsers.

--
brandon s allbery kf8nh                               sine nomine associates
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net


Re: regex and Regular Expressions Libraries

Evan Laforge
In reply to this post by Chris Dornan
That's always been my canonical example of how you can overuse
typeclasses to turn a simple job into an impenetrable documentation
hunt.  Long ago I wrapped it in a simple API and have always used
that.  Much later, pcre-heavy showed up and I just switched my wrapper
over to it; I kept the wrapper because pcre-heavy was still a little
too typeclass-happy for my taste.

On Thu, Mar 9, 2017 at 11:06 AM, Chris Dornan <[hidden email]> wrote:

> I have been in turn impressed at how good the engineering of the Regex
> packages while gobsmacked by how difficult the traditional Text.Regex API
> is to use.

Re: regex and Regular Expressions Libraries

Chris Dornan
Hi Evan,

By the sound of it, regex should help with this: each match operator is
available in an un-overloaded form. Does this API work for you?
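
For readers unfamiliar with the tension being discussed, here is a base-only sketch contrasting the two styles. The operator `=~~`, the `MatchResult` class and the literal-substring "engine" are all hypothetical stand-ins (no real regex is involved, and this is not the regex package's API): the overloaded operator picks its behaviour from the result type the caller expects, while the monomorphic wrappers give each behaviour its own findable name.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.List (isPrefixOf, tails)

-- Hypothetical stand-in for a regex engine: start offsets of every
-- (possibly overlapping) occurrence of a literal needle.
matchOffsets :: String -> String -> [Int]
matchOffsets needle hay =
  [i | (i, suffix) <- zip [0 ..] (tails hay), needle `isPrefixOf` suffix]

-- Overloaded style: the result type the caller asks for decides
-- what gets computed from the match offsets.
class MatchResult r where
  fromOffsets :: [Int] -> r

instance MatchResult Bool where
  fromOffsets = not . null      -- "is there any match?"

instance MatchResult Int where
  fromOffsets = length          -- "how many matches?"

instance MatchResult [Int] where
  fromOffsets = id              -- "where do the matches start?"

(=~~) :: MatchResult r => String -> String -> r
hay =~~ needle = fromOffsets (matchOffsets needle hay)

-- Un-overloaded style: one monomorphic function per question,
-- each easy to find in the documentation.
matchesAny :: String -> String -> Bool
matchesAny = (=~~)

countMatches :: String -> String -> Int
countMatches = (=~~)

matchStarts :: String -> String -> [Int]
matchStarts = (=~~)
```

In GHCi, `"banana" =~~ "an" :: Int` and `countMatches "banana" "an"` both give 2; only the second works without a type annotation, which is the point of offering un-overloaded variants.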

Chris

On 2017-03-10, 05:12, "Evan Laforge" <[hidden email]> wrote:

    That's always been my canonical example about how you can overuse
    typeclasses to make a simple job into an impenetrable documentation
    hunt.  Long ago I wrapped it in a simple API and have always used
    that.  Much later, pcre-heavy showed up and I just switched my wrapper
    to use that, since it was still a little too typeclass happy for my
    taste.
   

Re: regex and Regular Expressions Libraries

Evan Laforge
On Fri, Mar 10, 2017 at 2:24 AM, Chris Dornan <[hidden email]> wrote:
> By the sounds of it regex should help with this – each match operator being
> available in an un-overloaded format. Does this API work for you?

I looked at the tutorial and... maybe not so much?  I hardcode to
Text + PCRE since that's all I need, but that combination seems to be
unsupported.  As a light user of regexes, I won't remember much of the
API between uses, so I'm just looking to find the 'Regex -> Text ->
Bool' function as fast as possible, and a bunch of polymorphic
operators I'll never remember would just get in the way.  Also for the
same reason I'd be worried about any deviation from "standard" PCRE,
e.g. $(..) for groups.  However, I'm a lightweight user, so don't take
me too seriously, and I made my own tiny little bikeshed anyway.
Which is to say don't let me rain on your parade :)

For what it's worth, I mostly used regexes in Python, and it gets
along fine with hardcoded Text + PCRE, no operators, and basically
three functions: match, get groups, and substitute groups.  So it's no
surprise my wrapper basically looks like that:

compileOptions :: [Option] -> String -> Either String Regex

matches :: Regex -> Text -> Bool

-- | Return (complete_match, [group_match]).
groups :: Regex -> Text -> [(Text, [Text])]

-- | Half-open ranges of where the regex matches.
groupRanges :: Regex -> Text -> [((Int, Int), [(Int, Int)])]
    -- ^ (entire, [group])

substitute :: Regex -> (Text -> [Text] -> Text)
    -- ^ (complete_match -> groups -> replacement)
    -> Text -> Text
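
One way to see how these signatures relate: a `substitute` can be built on top of the ranges a `groupRanges`-style function returns. A minimal sketch, assuming the ranges are half-open character offsets, non-overlapping and sorted by start; `substituteRanges`, `slice` and `Range` are hypothetical names, not part of Evan's wrapper or any library.

```haskell
import qualified Data.Text as T

-- Half-open [start, end) range in characters (assumption: not bytes).
type Range = (Int, Int)

-- Extract the text covered by a range.
slice :: Range -> T.Text -> T.Text
slice (s, e) = T.take (e - s) . T.drop s

-- Rebuild the input, replacing each whole match with the callback's result.
-- Assumes non-overlapping ranges sorted by start; groups that did not
-- participate in a match are not handled.
substituteRanges
  :: [(Range, [Range])]                 -- (entire, [group]) per match
  -> (T.Text -> [T.Text] -> T.Text)     -- complete_match -> groups -> replacement
  -> T.Text -> T.Text
substituteRanges matches f t = T.concat (go 0 matches)
  where
    go pos [] = [T.drop pos t]
    go pos ((whole@(s, e), grps) : rest) =
      slice (pos, s) t : f (slice whole t) (map (`slice` t) grps) : go e rest
```

Given the single match `((0, 10), [(0, 4), (5, 7), (8, 10)])` over "2017-03-10" and a callback that joins the groups in reverse with "/", this produces "10/03/2017".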

I also added a Show instance that shows the regex rather than hex, and
the mysteriously missing:

-- | Escape a string so the regex matches it literally.
escape :: String -> String
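
A possible sketch of those last two pieces, assuming a hypothetical `Regex` record that keeps the source pattern next to whatever compiled value the backend produces, and PCRE-style escaping, where a backslash before any non-alphanumeric character makes it literal. Extended mode, where whitespace and `#` are significant, is not handled.

```haskell
-- Hypothetical wrapper: keep the source pattern next to the backend's
-- compiled value so Show can print something readable.
data Regex c = Regex
  { regexSource   :: String
  , regexCompiled :: c
  }

instance Show (Regex c) where
  show r = "Regex " ++ show (regexSource r)

-- Backslash-escape every PCRE metacharacter so the pattern matches the
-- input literally. Over-escaping ']' or '}' is harmless in PCRE.
escape :: String -> String
escape = concatMap esc
  where
    esc c
      | c `elem` "\\^$.|?*+()[]{}" = ['\\', c]
      | otherwise                  = [c]
```

For example, `escape "a.b*c"` yields the pattern `a\.b\*c`.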


The QuasiQuote stuff seems neat, but I'm sort of scared of TH, and if
the regex gets complicated enough that would make it worth it, I
probably already switched to a parser.  Or I get regexes from user
input because of how succinct they are and that's runtime anyway.

Re: regex and Regular Expressions Libraries

Chris Dornan
Thanks Evan,

That feedback is really valuable and I understand why you would have no reason
to switch to regex.

On the use of ‘$’, as far as I know this extension will not clash with any of
the PCRE extensions (if anybody knows of any problems please give me a shout),
though for sure you will have to fill out the numbers when converting between
the two text-replacement schemes.
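
As an illustration of converting between the two replacement schemes, here is a hypothetical helper (not part of regex) that rewrites traditional `\1`-style back-references into `${1}` form. It assumes the `$`-style templates accept `${n}` for the n-th group; other backslash sequences are copied through unchanged and a literal `$` in the input is not escaped, so treat it as a sketch rather than a complete converter.

```haskell
import Data.Char (isDigit)

-- Hypothetical helper: "\1"-style back-references become "${1}".
backrefToDollar :: String -> String
backrefToDollar ('\\' : rest)
  | not (null digits) = "${" ++ digits ++ "}" ++ backrefToDollar rest'
  where
    (digits, rest') = span isDigit rest
-- Any other escape (including "\\\\" for a literal backslash) is copied as-is.
backrefToDollar ('\\' : c : rest) = '\\' : c : backrefToDollar rest
backrefToDollar (c : rest) = c : backrefToDollar rest
backrefToDollar [] = []
```

So the traditional template `\2-\1` becomes `${2}-${1}`.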

As for Text and PCRE – that is at the top of my list, but it will need some
coordination with the upstream regex-pcre maintainers.

I do have escape functions though I haven’t included them in the tutorial yet.

Being able to recover the text of the REs would be great and I would like to include
it in a future release, but again that will need some coordination with the regex-base
maintainers.

I will raise those issues.

Fantastic feedback!

Cheers,

Chris



On 10/03/2017, 17:08, "Evan Laforge" <[hidden email]> wrote:

    I looked at the tutorial and.... maybe not so much?  I hardcode to
    Text + PCRE since that's all I need, but that combination seems to be
    unsupported.

Re: regex and Regular Expressions Libraries

Evan Laforge
On Fri, Mar 10, 2017 at 9:30 AM, Chris Dornan <[hidden email]> wrote:
> On the use of ‘$’, as far as I know this extension will not clash with any of
> the PCRE extensions (if anybody knows of any problems please give me shout),
> though for sure you will have to fill out the numbers when converting between
> the two text-replacement schemes.

Oh ok, I was worried that ()s would become non-capturing and you'd
have to use $() to capture.  I think your scheme with $() for groups
and replacement is actually nicer than the traditional (xyz) and
(?:xyz) and \# for replacement, but you know tradition hangs heavy on
the minds of us regex cargo-culters :)

> As for Text and PCRE – that is on the top my list but it will need some
> coordination with the upstream regex-pcre maintainers.

Nowadays I inherit that from pcre-heavy, but of course if you're
already on another backend then maybe not so simple.  libpcre takes
bytestrings, so I'll bet the "Text interface" amounts to sticking
". encodeUtf8" on the front and turning on the UTF8 flag.
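
A sketch of that adaptation, assuming a hypothetical byte-oriented backend that returns the half-open byte range of the first match. The wrinkle worth noting is that offsets come back in bytes, so anything like `groupRanges` has to convert them to character offsets. Here a literal substring search stands in for libpcre; `ByteMatcher`, `textRange` and `literalMatcher` are illustrative names only.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8, encodeUtf8)

-- Hypothetical byte-oriented backend: half-open byte range of the first match.
type ByteMatcher = B.ByteString -> Maybe (Int, Int)

-- Lift a byte matcher to Text: encode to UTF-8, run the matcher, then turn
-- byte offsets back into character offsets. Assumes the backend only reports
-- offsets on character boundaries (which is what UTF-8 mode is for).
textRange :: ByteMatcher -> T.Text -> Maybe (Int, Int)
textRange matcher t = fmap toChars (matcher bytes)
  where
    bytes = encodeUtf8 t
    charsBefore n = T.length (decodeUtf8 (B.take n bytes))
    toChars (s, e) = (charsBefore s, charsBefore e)

-- Stand-in for libpcre in this sketch: a literal substring search, not a regex.
literalMatcher :: B.ByteString -> ByteMatcher
literalMatcher needle hay
  | B.null needle = Just (0, 0)
  | B.null rest   = Nothing
  | otherwise     = Just (start, start + B.length needle)
  where
    (before, rest) = B.breakSubstring needle hay
    start = B.length before
```

Searching for "b" in "éb" reports bytes (2, 3) from the backend but characters (1, 2) through `textRange`, since "é" takes two bytes in UTF-8.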

Re: regex and Regular Expressions Libraries

Chris Dornan
In reply to this post by Chris Dornan
I must correct myself. I said:

> Being able to recover the text of the REs would be great and I would like to include
> it in a future release, but again that will need some coordination with the regex-base
> maintainers.

Sorry, that isn’t right at all. regex already allows you to recover the text from a compiled
RE via the reSource function:

  reSource :: RE -> String

Evan said:

> Nowadays I inherit that from pcre-heavy, but of course if you're
> already on another backend then maybe not so simple.

Yes, regex is built on top of regex-base and the regex-tdfa + regex-pcre back ends.

> but you know tradition hangs heavy

Indeed so!

Chris




Re: regex and Regular Expressions Libraries

Adam Bergmark
Hi Chris,

The combination of parser combinators and the old regex libraries has meant that I've avoided regexes as much as possible.

regex looks very promising, so I'll be sure to try it out next time I need something similar!

Will you add regex to Stackage? Having to muck around with extra-deps gets tiring. I would also like to see Text support, but having to pack/unpack wouldn't be a dealbreaker for me.

Have you considered doing anything fancy to make capture groups safer to use? If I could get a compile error when I'm using the wrong number/wrongly named groups I'd be very excited.

Cheers,
Adam



On Fri, 10 Mar 2017 at 19:06 Chris Dornan <[hidden email]> wrote:
> Sorry, that isn’t right at all. Regex already allows you to recover the text from a compiled
> RE via the reSource function:
>
>   reSource :: RE -> String

Re: regex and Regular Expressions Libraries

Chris Dornan
Adam Bergmark sez:

> Will you add regex to stackage?

Absolutely – on my list for this weekend.

> Have you considered doing anything fancy to make capture groups safer to use? If i could get a compile error
> when i'm using the wrong number/wrongly named groups I'd be very excited.

I totally agree! The only reason this has not been done is that it is not easy to do with the current
structure of regex and the way it fits into regex-base. I am open to suggestions though; I have just opened an
issue for it: https://github.com/iconnect/regex/issues/60.
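
For what it's worth, here is one possible shape for the checked-group idea, sketched with GHC type-level naturals. `Groups` and `group` are hypothetical and unrelated to regex's actual types; the sketch assumes the group count `n` is already known in the type, which is exactly the hard part (a real library would have to compute it from the pattern, e.g. in a quasiquoter). Asking for a group outside `1..n` is rejected at compile time.

```haskell
{-# LANGUAGE AllowAmbiguousTypes, ConstraintKinds, DataKinds, FlexibleContexts #-}
{-# LANGUAGE KindSignatures, ScopedTypeVariables, TypeApplications, TypeOperators #-}
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal, type (<=))

-- Hypothetical match result whose type records the number of capture
-- groups n. Nothing here checks that the list really has n entries; a
-- real library would have to derive n from the pattern itself.
newtype Groups (n :: Nat) = Groups [String]

-- Fetch group i (1-based). The constraints 1 <= i and i <= n are solved
-- at compile time, so an out-of-range index is a type error.
group :: forall i n. (KnownNat i, 1 <= i, i <= n) => Groups n -> String
group (Groups gs) = gs !! (fromInteger (natVal (Proxy :: Proxy i)) - 1)
```

With this, `group @2` on a `Groups 3` value type-checks, while `group @4` on the same value is rejected by the type checker.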

Cheers,

Chris



