Code runs 7x FASTER with profiling

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Code runs 7x FASTER with profiling

Neil Mayhew

I was writing a simple utility and I decided to use regexps to parse filenames. (I know, now I have two problems :-) )

I was surprised at how slow it ran, so I did a profiling build. The profiled code runs reasonably quickly, and is 7x faster, which makes it a bit hard to figure out where the slowdown is happening in the non-profiled code. I’m wondering if I’m doing something wrong, or if there’s a bug in regex-tdfa or in ghc.

I’ve pared my code down to just the following:

import Text.Regex.TDFA ((=~))

main :: IO ()
main = do
    entries <- map parseFilename . lines <$> getContents
    let check (Right (_, t)) = last t == 'Z'
        check _ = False
    print $ all check entries

parseFilename :: String -> Either String (String, String)
parseFilename fn = case (fn =~ pattern :: [[String]]) of
    [[_, full, _, time]] -> Right $ (full, time)
    _ -> Left fn
  where
    pattern =
        "^\\./duplicity-(full|inc|new)(-signatures)?\\.\
        \([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z)\\."

The relevant part of my .cabal file looks like this:

executable DuplicityAnalyzer
    main-is: DuplicityAnalyzer.hs
    build-depends:
        base >=4.6 && <4.11,
        regex-tdfa >= 1.0 && <1.3
    default-language: Haskell2010
    ghc-options: -Wall -rtsopts

To run the profiling, I do:

cabal clean
cabal configure --enable-profiling
cabal build
dist/build/DuplicityAnalyzer/DuplicityAnalyzer <names.in +RTS -sprofiling-summary.out -p

The MUT time in the non-profiling build is 7x bigger, and the %GC time goes from 8% to 21%. I’ve put the actual output in a gist. I’ve also put my test input file there, in case anyone wants to try this themselves.

I’ve done my testing with NixOS (ghc 8.0.2) and Debian with the Haskell Platform (ghc 8.2.1) and the results are basically the same. I even tried using Docker containers with Debian Jessie and Debian Stretch, just to eliminate any OS influence, and the results are still the same. I’ve tried it on an i5-2500K, i5-3317U and Xeon E5-1620.

I also wrote a dummy implementation of =~ that ignores the regex pattern and does a hard-coded manual parse, and that produces times just slightly better than the profiled ones. So I don’t think there’s a problem in my outer code that uses =~.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Code runs 7x FASTER with profiling

Jonas Scholl
Looking at the produced core of both versions reveals that in the
profiled build a closure of type Regex is floated to top level. The
non-profiled build doesn't do this, thus it recompiles the regex for
every iteration. This is most likely the source of the slowdown of the
non-profiled build.

On 12/07/2017 07:09 AM, Neil Mayhew wrote:

> I was writing a simple utility and I decided to use regexps to parse
> filenames. (I know, now I have two problems :-) )
>
> I was surprised at how slow it ran, so I did a profiling build. The
> profiled code runs reasonably quickly, and is 7x faster, which makes it
> a bit hard to figure out where the slowdown is happening in the
> non-profiled code. I’m wondering if I’m doing something wrong, or if
> there’s a bug in |regex-tdfa| or in ghc.
>
> I’ve pared my code down to just the following:
>
> |import Text.Regex.TDFA ((=~)) main :: IO () main = do entries <- map
> parseFilename . lines <$> getContents let check (Right (_, t)) = last t
> == 'Z' check _ = False print $ all check entries parseFilename :: String
> -> Either String (String, String) parseFilename fn = case (fn =~ pattern
> :: [[String]]) of [[_, full, _, time]] -> Right $ (full, time) _ -> Left
> fn where pattern = "^\\./duplicity-(full|inc|new)(-signatures)?\\.\
> \([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z)\\."
> |
>
> The relevant part of my |.cabal| file looks like this:
>
> |executable DuplicityAnalyzer main-is: DuplicityAnalyzer.hs
> build-depends: base >=4.6 && <4.11, regex-tdfa >= 1.0 && <1.3
> default-language: Haskell2010 ghc-options: -Wall -rtsopts |
>
> To run the profiling, I do:
>
> |cabal clean cabal configure --enable-profiling cabal build
> dist/build/DuplicityAnalyzer/DuplicityAnalyzer <names.in +RTS
> -sprofiling-summary.out -p |
>
> The |MUT| time in the non-profiling build is 7x bigger, and the |%GC|
> time goes from 8% to 21%. I’ve put the actual output in a gist
> <https://gist.github.com/neilmayhew/247a30738c0e294902e7c2830ca2c6f5>.
> I’ve also put my test input file there, in case anyone wants to try this
> themselves.
>
> I’ve done my testing with NixOS (ghc 8.0.2) and Debian with the Haskell
> Platform (ghc 8.2.1) and the results are basically the same. I even
> tried using Docker containers with Debian Jessie and Debian Stretch,
> just to eliminate any OS influence, and the results are still the same.
> I’ve tried it on an i5-2500K, i5-3317U and Xeon E5-1620.
>
> I also wrote a dummy implementation of |=~| that ignores the regex
> pattern and does a hard-coded manual parse, and that produces times just
> slightly better than the profiled ones. So I don’t think there’s a
> problem in my outer code that uses |=~|.
>
> ​
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
>


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

signature.asc (499 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Code runs 7x FASTER with profiling

Neil Mayhew

On 2017-12-07 12:49 AM, Jonas Scholl wrote:

Looking at the produced core of both versions reveals that in the profiled build a closure of type Regex is floated to top level. The non-profiled build doesn’t do this, thus it recompiles the regex for every iteration. This is most likely the source of the slowdown of the non-profiled build.

Thanks, Jonas. This does indeed seem to be the problem. I changed the code to use a compiled regex (with makeRegex and match instead of =~) but in the non-profiling case the run-time doesn’t improve unless I float the compiled regex myself:

parseFilename :: String -> Either String (String, String)
parseFilename fn = case (pattern `match` fn :: [[String]]) of
    [[_, full, _, time]] -> Right $ (full, time)
    _ -> Left fn

pattern :: Regex
pattern = makeRegex
    "^\\./duplicity-(full|inc|new)(-signatures)?\\.\
    \([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z)\\."

Then it runs 2-3x faster than the profiled code.

The question remains, however: why doesn’t the ghc optimizer spot this fairly obvious loop-invariant in the non-profiled build when it does manage to spot it in the profiled one? In other words, when I make pattern a local definition of parseFilename, why isn’t it treated as a CAF that’s evaluated only once (‘floated to the top level’)? Enabling profiling shouldn’t change the meaning of a program.

I remember back in the day having to be careful with regexes in Python to make sure they were always precompiled outside of loops and functions, but one of the nice things about Haskell is that one can usually let the compiler take care of this. (Nowadays Python gets around this by caching compiled regexes, but I prefer Haskell’s statically-optimized approach.)


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Code runs 7x FASTER with profiling

Brandon Allbery
Bugs, usually. I think there was one such in 8.2.1 and if so it may not have been completely fixed (trying to read around the edges of some dev discussion, since I'm kinda far from expert when it comes to ghc internals).

On Thu, Dec 7, 2017 at 4:25 PM, Neil Mayhew <[hidden email]> wrote:

On 2017-12-07 12:49 AM, Jonas Scholl wrote:

Looking at the produced core of both versions reveals that in the profiled build a closure of type Regex is floated to top level. The non-profiled build doesn’t do this, thus it recompiles the regex for every iteration. This is most likely the source of the slowdown of the non-profiled build.

Thanks, Jonas. This does indeed seem to be the problem. I changed the code to use a compiled regex (with makeRegex and match instead of =~) but in the non-profiling case the run-time doesn’t improve unless I float the compiled regex myself:

parseFilename :: String -> Either String (String, String)
parseFilename fn = case (pattern `match` fn :: [[String]]) of
    [[_, full, _, time]] -> Right $ (full, time)
    _ -> Left fn

pattern :: Regex
pattern = makeRegex
    "^\\./duplicity-(full|inc|new)(-signatures)?\\.\
    \([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z)\\."

Then it runs 2-3x faster than the profiled code.

The question remains, however: why doesn’t the ghc optimizer spot this fairly obvious loop-invariant in the non-profiled build when it does manage to spot it in the profiled one? In other words, when I make pattern a local definition of parseFilename, why isn’t it treated as a CAF that’s evaluated only once (‘floated to the top level’)? Enabling profiling shouldn’t change the meaning of a program.

I remember back in the day having to be careful with regexes in Python to make sure they were always precompiled outside of loops and functions, but one of the nice things about Haskell is that one can usually let the compiler take care of this. (Nowadays Python gets around this by caching compiled regexes, but I prefer Haskell’s statically-optimized approach.)


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.



--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Code runs 7x FASTER with profiling

Neil Mayhew
On 2017-12-07 03:05 PM, Brandon Allbery wrote:
 > Bugs, usually. I think there was one such in 8.2.1 and if so it may
not have been completely fixed

So if I was being a good citizen I'd try this out with ghc master built
from source and report a bug if it's not fixed? Or do you thin it would
be sufficient to use a binary installer of ghc 8.2.2?
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Code runs 7x FASTER with profiling

Brandon Allbery
I'd do both, both to get information on what it looks like in the latest release and to verify something hasn't changed in 8.3/future 8.4 (which is moving forward). Certainly the behavior here sounds wrong and should be reported.

On Thu, Dec 7, 2017 at 5:12 PM, Neil Mayhew <[hidden email]> wrote:
On 2017-12-07 03:05 PM, Brandon Allbery wrote:
> Bugs, usually. I think there was one such in 8.2.1 and if so it may not have been completely fixed

So if I was being a good citizen I'd try this out with ghc master built from source and report a bug if it's not fixed? Or do you thin it would be sufficient to use a binary installer of ghc 8.2.2?

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.



--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Code runs 7x FASTER with profiling

Sven Panne-2
In reply to this post by Neil Mayhew
2017-12-07 22:25 GMT+01:00 Neil Mayhew <[hidden email]>:

[...] The question remains, however: why doesn’t the ghc optimizer spot this fairly obvious loop-invariant in the non-profiled build when it does manage to spot it in the profiled one? In other words, when I make pattern a local definition of parseFilename, why isn’t it treated as a CAF that’s evaluated only once (‘floated to the top level’)?

Because making something a CAF is not always an optimization: If your evaluated CAF uses e.g. hundreds of MB it might be preferable to *not* have it as a CAF, but to to recompute it instead. If I haven't missed something recently, CAFs can't be garbage collected, but I would be happy to be corrected here. :-)

 

Enabling profiling shouldn’t change the meaning of a program.

It doesn't change the meaning, that would really be a desaster, it just changes the performance characteristics. This is still not nice, but not much different from using different levels of optimization: Doing or not doing e.g. strictness analysis might change the space complexity etc.
 

I remember back in the day having to be careful with regexes in Python to make sure they were always precompiled outside of loops and functions, but one of the nice things about Haskell is that one can usually let the compiler take care of this. (Nowadays Python gets around this by caching compiled regexes, but I prefer Haskell’s statically-optimized approach.)

I think totally relying on the compiler for performance is a common misconception, because things are often more tricky than initially thought. In our example: Time vs. space tradeoff, and there is no universally "right" solution.

Cheers,
   S.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Code runs 7x FASTER with profiling

Neil Mayhew
In reply to this post by Brandon Allbery

On 2017-12-07 03:33 PM, Brandon Allbery wrote:

Certainly the behavior here sounds wrong and should be reported.

https://ghc.haskell.org/trac/ghc/ticket/14564 - there’s already been a response (from Simon P-J) and it is being treated as a bug.

I get the slowdown with 8.0.2, 8.2.1 and 8.2.2 but 7.10.3 is OK.

Since 8.2.2 is very recent (20 Nov) I didn’t go to the trouble of building ghc from source.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Code runs 7x FASTER with profiling

Neil Mayhew
In reply to this post by Sven Panne-2

On 2017-12-08 01:52 AM, Sven Panne wrote:

… making something a CAF is not always an optimization: If your evaluated CAF uses e.g. hundreds of MB it might be preferable to not have it as a CAF, but to to recompute it instead.

Good point. However, I think it would be very hard for the compiler to detect which CAFs use a lot of memory and aren’t referred to very often.

If I haven’t missed something recently, CAFs can’t be garbage collected, but I would be happy to be corrected here. :-)

There’s a helpful discussion on the Haskell Wiki here. In particular, it says:

A CAF … may only be accessible from within the code of one or more functions. In order for the garbage collector to be able to reclaim such structures, we associate with each function a list of the CAFs to which it refers. When garbage collecting a reference to the function we collect the CAFs on its list.

In my case, since the function itself is top-level, I assume the CAF can’t be garbage-collected.

Enabling profiling shouldn’t change the meaning of a program.

It doesn’t change the meaning, that would really be a desaster, it just changes the performance characteristics. This is still not nice, but not much different from using different levels of optimization: Doing or not doing e.g. strictness analysis might change the space complexity etc.

Yes, I see you’re right. However, I guess I was using ‘meaning’ loosely to mean ‘is a CAF’ and I assumed a CAF would always be floated. The above-mentioned wiki page says, “A CAF can always be lifted to the top level of the program.” However, I realize “can” is not the same as “should”.

I remember back in the day having to be careful with regexes in Python to make sure they were always precompiled outside of loops and functions, but one of the nice things about Haskell is that one can usually let the compiler take care of this. (Nowadays Python gets around this by caching compiled regexes, but I prefer Haskell’s statically-optimized approach.)

I think totally relying on the compiler for performance is a common misconception, because things are often more tricky than initially thought. In our example: Time vs. space tradeoff, and there is no universally “right” solution.

Good point. Having great optimization isn’t a justification for complete mental laziness on the part of the programmer! However, I did find the behaviour in this case surprising and unintuitive, given ghc’s usual ability to optimize things, and having it change on me when I enabled profiling left me wondering where to go next. The code I presented here is considerably simplified from the original program, and represents a lot of work already expended trying to get to the root of the problem.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Code runs 7x FASTER with profiling

Brandon Allbery
On Fri, Dec 8, 2017 at 9:18 AM, Neil Mayhew <[hidden email]> wrote:

Good point. Having great optimization isn’t a justification for complete mental laziness on the part of the programmer! However, I did find the behaviour in this case surprising and unintuitive, given ghc’s usual ability to optimize things, and having it change on me when I enabled profiling left me wondering where to go next. The code I presented here is considerably simplified from the original program, and represents a lot of work already expended trying to get to the root of the problem.

This is actually why I (and likely SPJ) am inclined to consider it a bug; while it might be arguable as Sven does, the counterintuitive effect when you turn on profiling suggests that it's not intended or expected. (Although maybe it should be; the intuition is "profiling disables optimization" but what it really does is a bit more subtle than that overly simplistic notion.)

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.