Performance: MD5

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
48 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Performance: MD5

Andrew Coppin
Hi folks.

OK, try this:

  darcs get http://darcs.orphi.me.uk/MyMD5
  cd MyMD5
  ghc -O2 --make md5sum
  md5sum <some large filename>

On my PC, it takes 3.5 seconds to compute the MD5 hash of a 10 MB file.
For comparison, the md5.exe I downloaded from the Internet does it
instantaneously. It's literally so fast I can't even time it. As far as
I know, my program always produces the *correct* answer, it just takes
far too long to do it.

If anybody has any interesting insights as to why my version is so
damned slow, I'd like to hear them. (Indeed, a description of the
process for tracking the problem down would also be useful!)

[Much to my bemusement, when I had a look at the code, I discovered that
tucked away in a corner somewhere, it reads a ByteString from disk and
instantly converts it to [Word8]. Uh... oops? Anyway, eliminating this
changed the numbers I get from the profiler, but it's still far too slow.]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Bulat Ziganshin-2
Hello Andrew,

Saturday, May 17, 2008, 6:51:27 PM, you wrote:

> If anybody has any interesting insights as to why my version is so
> damned slow, I'd like to hear them.

i equally will be interesting to hear why you think what your program
should be as fast as C version? you wrote it in purely functional
style. as Don wrote, if you want to reach unoptimized C-like
performance, you need to write in C style - with explicit loop
processing primitive types. so

1) -funbox-strict-fields
2) don't use tuples as arguments - they are lazy. you may implement
strict tuples instead
3) function as a parameter is very bad unless it will be inlined

but these are only half-decisions. if you need maximum efficiency,
drop all this high-level code and map md5.c to haskell as it was done
in Don's bloag


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Thomas DuBuisson
In reply to this post by Andrew Coppin
Andrew,
I spent a reasonable amount of time making pureMD5 (available on
hackage) faster, which mainly ment strictness annoitations and unboxing
strict fields, but I also spent a good deal of time with the profiler.
One of my early versions was fairly slow due to the converting of the
LPS to blocks (an Isabelle proven 'blocks' function) caused O(n) space
requirement.  I've been meaning to revisit this to optimize more and
look closly at GHC core.

I'm on vacation now, but when I get back I'll try to make time to look
at your code (unless its moot by then).  Finally, I encourage anyone
interested to make reasonably fast bytestring implementations of SHA
algorithms as well - Haskell/Hackage has a number of pure and FFI
implementations of MD5 but is a bit lacking in any other cryptographic
algorithm.

TomMD

On Sat, 2008-05-17 at 15:51 +0100, Andrew Coppin wrote:

> Hi folks.
>
> OK, try this:
>
>   darcs get http://darcs.orphi.me.uk/MyMD5
>   cd MyMD5
>   ghc -O2 --make md5sum
>   md5sum <some large filename>
>
> On my PC, it takes 3.5 seconds to compute the MD5 hash of a 10 MB file.
> For comparison, the md5.exe I downloaded from the Internet does it
> instantaneously. It's literally so fast I can't even time it. As far as
> I know, my program always produces the *correct* answer, it just takes
> far too long to do it.
>
> If anybody has any interesting insights as to why my version is so
> damned slow, I'd like to hear them. (Indeed, a description of the
> process for tracking the problem down would also be useful!)
>
> [Much to my bemusement, when I had a look at the code, I discovered that
> tucked away in a corner somewhere, it reads a ByteString from disk and
> instantly converts it to [Word8]. Uh... oops? Anyway, eliminating this
> changed the numbers I get from the profiler, but it's still far too slow.]
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Andrew Coppin
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin wrote:
> if you need maximum efficiency,
> drop all this high-level code and map md5.c to haskell as it was done
> in Don's blog
>  

Wait... you're telling me to give up? To not even try??

So all that bravado about Haskell enabling higher-level optimisations to
produce a result faster than C is actually complete nonesense?

What an awesome "community" this language has behind it... I'm not sure
why I bother.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Performance: MD5

Bulat Ziganshin-2
Hello Andrew,

Sunday, May 18, 2008, 12:40:20 PM, you wrote:

> So all that bravado about Haskell enabling higher-level optimisations to
> produce a result faster than C is actually complete nonesense?

yes. look at bytestring implementation, for example. sorry, but you
shouldn't believe in something until you tried it yourself :)


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Neil Mitchell
In reply to this post by Andrew Coppin
Hi

>> if you need maximum efficiency,
>> drop all this high-level code and map md5.c to haskell as it was done
>> in Don's blog
>
> Wait... you're telling me to give up? To not even try??

He's saying use the resources the community have already provided,  to
write it in a lower-level style, following much the same pattern as
Don did.

> So all that bravado about Haskell enabling higher-level optimisations to
> produce a result faster than C is actually complete nonesense?

Nope. If you just code up a simple algorithm in C and in Haskell, it
is pretty unlikely the Haskell one will go as fast as the C one. But
Haskell (in particular GHC) does have powerful low-level features that
means you can probably draw with C.

One day, I hope that high-level Haskell will outperform C all the
time. It's already true with some of the ByteString stuff, but
hopefully one day it will be a normal result. It's pretty much the
goal of this optimiser: http://www-users.cs.york.ac.uk/~ndm/supero/

Thanks

Neil
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Performance: MD5

Bulat Ziganshin-2
Hello Neil,

Sunday, May 18, 2008, 12:54:30 PM, you wrote:

> One day, I hope that high-level Haskell will outperform C all the
> time. It's already true with some of the ByteString stuff, but

i don't think so. all the tests where Haskell "outperform" C i've seen
was just unfair - for example they compare results of
MANUALLY-unrolled Haskell loops with rolled C ones - totally ignoring
the fact that gcc can unroll loops just by enabling option while with
ghc this can be done only by hand. many tests that show "equal
performance" just limited by memory bandwidth. and so on

it's hard to write program that outperforms gcc-generated code even
using assembler, and ghc is limited to this fact, too :)


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Don Stewart-2
In reply to this post by Andrew Coppin
andrewcoppin:

> Bulat Ziganshin wrote:
> >if you need maximum efficiency,
> >drop all this high-level code and map md5.c to haskell as it was done
> >in Don's blog
> >  
>
> Wait... you're telling me to give up? To not even try??
>
> So all that bravado about Haskell enabling higher-level optimisations to
> produce a result faster than C is actually complete nonesense?

What optimisations were operating in your md5 implementation that you
could expect it do match the C version?

> What an awesome "community" this language has behind it... I'm not sure
> why I bother.

Andrew, take the time to understand what you're code actually does when
you write it. If you want to write code that matches some heavily
optimised md5 in C (did you look at the C implementation?) then your
Haskell will need to compile down to similar data and control structures.

Does it?

So, as Bulat says, explain why your code should have similar performance
to the C version. It would be much more useful than complaining about
the community who patiently help you day in and out.

-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Andrew Coppin
Don Stewart wrote:

> andrewcoppin:
>  
>> Wait... you're telling me to give up? To not even try??
>>
>> So all that bravado about Haskell enabling higher-level optimisations to
>> produce a result faster than C is actually complete nonesense?
>>    
>
> What optimisations were operating in your md5 implementation that you
> could expect it do match the C version?
>  

I was hoping that things would be inlined, small loops unrolled, and I'd
end up with more or less the same imperative loops as the C version has.
I wasn't expecting to *beat* C in this particular case, but I expected
to get somewhere fairly near it.

> If you want to write code that matches some heavily
> optimised md5 in C then your
> Haskell will need to compile down to similar data and control structures.
>  

Agreed.

> (did you look at the C implementation?)
>  

I can't read C. (FWIW, I think I did briefly stare at the sources, but
eventually gave up because I simply had no clue what's going on.)

> Does it?
>  

I'm still figuring out how to answe that question.

[As in, "the profiler says 60% of my time is spent inside one function,
I've now tried writing that function 6 different ways, and currently it
compiles to about 3 pages of Core that I'm still attempting to
decypher". Given the size of the Core for this function, I'd say it's
probably inlining and unrolling just fine...]

> So, as Bulat says, explain why your code should have similar performance
> to the C version.

Because it executes the same algorithm? I mean, there's essentially only
one way round that you can perform the MD5 algorithm, so it just comes
down to how efficiently you execute the steps involved.

> It would be much more useful than complaining about
> the community who patiently help you day in and out.
>  

Telling somebody "don't bother trying, it's impossible" doesn't strike
me as terribly helpful. It doesn't make my code go any faster, and it
certainly doesn't help me learn anything.

I guess my responce was a little harsh. But when you invest a lot of
time and effort in something, and somebody off-handedly tells you you're
wasting your time... it's quite upsetting.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Duncan Coutts

On Sun, 2008-05-18 at 11:20 +0100, Andrew Coppin wrote:

> > So, as Bulat says, explain why your code should have similar performance
> > to the C version.
>
> Because it executes the same algorithm? I mean, there's essentially only
> one way round that you can perform the MD5 algorithm, so it just comes
> down to how efficiently you execute the steps involved.

Ah if only computation were that simple.

One might say that the ultimate perfect compiler would have this
property. However it's provably impossible. See for example the Full
Employment Theorem for Compiler Writers[1].

Duncan

[1] http://en.wikipedia.org/wiki/Full_employment_theorem


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Andrew Coppin
In reply to this post by Andrew Coppin
Andrew Coppin wrote:
> If anybody has any interesting insights as to why my version is so
> damned slow, I'd like to hear them.

OK, so I am now officially talking to myself.

Profiling indicates that 60% of my program's time is spent inside one
function:

  make_block_bs :: ByteString -> Block

In other words, the program spends 60% of its running time converting an
array of Word8 into an array of Word32. [Note that the MD5 algorithm
incorrectly specifies that little-endian ordering should be used, so I
do the byte-swapping in here too.] The code for breading the file into
correctly-sized pieces and adding padding is a ful 3% of the runtime, so
it's not even the copying overhead. It's *all* in the array conversions.

How much do you want to bet that the C version just reads a chunk of
data into RAM and then does a simple type-cast? (If I'm not very much
mistaken, in C a type-cast is an O(0) operation.)

Anyway, wading through the 3-page Core output for this function, I'm
seeing a lot of stuff of the form

  case GHC.Prim.<=# 0 whuhrjhsd of wild3784_hguhse {
    GHC.Base.False ->
      (GHC.Err.error @ GHC.Base.Int MD5.Types.lvl
      `cast` (CoUnsafe ...

I'd be lying if I had I had any real idea what that does. But it occurs
to me that the code is probably performing runtime checks that every
single array access is within bounds, and that every single bit shift
operation is similarly within bounds. This is obviously very *safe*, but
possibly not very performant.

Isn't there some function kicking around somewhere for acessing arrays
without the bounds check? Similarly, I'm sure I remember the Data.Bits
bounds check being mentioned before, and somebody saying they had [or
were going to?] make an unsafe variant available. I've looked around the
library documentation and I'm not seeing anything... any hints?

(Obviously I could attempt to dig through GHC.Prim and find what I'm
after there, but... it looks pretty scary in there! I'm hoping for
something slightly higher-level if possible.)

The alternative, obviously, is to just perform an evil type cast.
Trouble is, that'll work just fine on any architecture with a
little-endian data order [conspicuously IA32 and AMD64], and malfunction
horribly on any architecture that works correctly. (At which point I
start wondering how C manages to overcome this problem...)

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Performance: MD5

Bulat Ziganshin-2
Hello Andrew,

Sunday, May 18, 2008, 3:42:23 PM, you wrote:

> How much do you want to bet that the C version just reads a chunk of
> data into RAM and then does a simple type-cast? (If I'm not very much
> mistaken, in C a type-cast is an O(0) operation.)

try unsafeCoerce# for that. or just use hGetArray to read directly
into array of type you need


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Performance: MD5

Bulat Ziganshin-2
In reply to this post by Andrew Coppin
Hello Andrew,

Sunday, May 18, 2008, 3:42:23 PM, you wrote:

> Isn't there some function kicking around somewhere for acessing arrays
> without the bounds check? Similarly, I'm sure I remember the Data.Bits
> bounds check being mentioned before, and somebody saying they had [or
> were going to?] make an unsafe variant available. I've looked around the
> library documentation and I'm not seeing anything... any hints?

unsafeRead/Write

(I# a) <<# (I# b) = (I# (a `iShiftL#` b))
(I# a) >># (I# b) = (I# (a `uncheckedIShiftRL#` b))


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Andrew Coppin
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin wrote:

> Hello Andrew,
>
> Sunday, May 18, 2008, 3:42:23 PM, you wrote:
>
>  
>> How much do you want to bet that the C version just reads a chunk of
>> data into RAM and then does a simple type-cast? (If I'm not very much
>> mistaken, in C a type-cast is an O(0) operation.)
>>    
>
> try unsafeCoerce# for that.

Yeah, I might end up being forced to do that. [Although obviously it'll
break on big-endian architectures.]

> or just use hGetArray to read directly into array of type you need
>  

hGetArray :: Handle -> IOUArray Int Word8 -> Int -> IO Int

(But yeah, I can coerce it afterwards.)

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Andrew Coppin
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin wrote:

> Hello Andrew,
>
> Sunday, May 18, 2008, 3:42:23 PM, you wrote:
>
>  
>> Isn't there some function kicking around somewhere for acessing arrays
>> without the bounds check? Similarly, I'm sure I remember the Data.Bits
>> bounds check being mentioned before, and somebody saying they had [or
>> were going to?] make an unsafe variant available. I've looked around the
>> library documentation and I'm not seeing anything... any hints?
>>    
>
> unsafeRead/Write
>  

Found in Data.Array.Base, apparently. (I managed to find an example in
the Gtk2hs "fastdraw" demo.)

Oddly, this seems to make virtually no performance difference. I find
this highly surprising. But then, I perform 16 write to the array, and
64 reads from the ByteString, so maybe that's where I should be worrying
about bounds checks...

> (I# a) <<# (I# b) = (I# (a `iShiftL#` b))
> (I# a) >># (I# b) = (I# (a `uncheckedIShiftRL#` b))
>  

Does it make a difference if I want Word32 instead of Int?

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Brandon S Allbery KF8NH
In reply to this post by Andrew Coppin

On 2008 May 18, at 4:40, Andrew Coppin wrote:

> Bulat Ziganshin wrote:
>> if you need maximum efficiency,
>> drop all this high-level code and map md5.c to haskell as it was done
>> in Don's blog
>
> Wait... you're telling me to give up? To not even try??

Bulat is the naysayer of the group; according to him Haskell is a nice  
idea that will never actually work (but he uses it anyway, who knows  
how he rationalizes it to himself).

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [hidden email]
system administrator [openafs,heimdal,too many hats] [hidden email]
electrical and computer engineering, carnegie mellon university    KF8NH


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Andrew Coppin
Brandon S. Allbery KF8NH wrote:
>
> Bulat is the naysayer of the group; according to him Haskell is a nice
> idea that will never actually work (but he uses it anyway, who knows
> how he rationalizes it to himself).
>

Bulat is apparently not alone in this belief. I seem to spend all my
time on other forums dealing with people who have exactly the same opinion.

Of course, being able to come up with clean, elegant code which never
the less performs well is a good way to counter this. Pity I haven't
succeeded in producing any yet. :-S

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Performance: MD5

Bulat Ziganshin-2
In reply to this post by Brandon S Allbery KF8NH
Hello Brandon,

Sunday, May 18, 2008, 5:48:48 PM, you wrote:

>> Bulat Ziganshin wrote:
>>> if you need maximum efficiency,
>>> drop all this high-level code and map md5.c to haskell as it was done
>>> in Don's blog

> Bulat is the naysayer of the group; according to him Haskell is a nice
> idea that will never actually work (but he uses it anyway, who knows  
> how he rationalizes it to himself).

it significantly depends on the group, actually :) a few weeks ago i
tried to convince managers of large software company to use haskell
for development of complex algorithms. but they don't believe that
using new language instead of C++ can raise productivity several times
and told me that for real programmer it should be no matter which
language to use. OTOH when i say here that haskell produce slow code,
you can't belive me and saying that haskell is so great that it can't
have any shortages. well, i use Haskell for complex algorithms and
C++ for performance-critical parts of program which resulted in
development of world-fastest archiver. if you know how to write
high-level haskell programs that are as fast as C++ code - please tell
us just now


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Brandon S Allbery KF8NH
In reply to this post by Andrew Coppin

On 2008 May 18, at 9:55, Andrew Coppin wrote:

> Brandon S. Allbery KF8NH wrote:
>>
>> Bulat is the naysayer of the group; according to him Haskell is a  
>> nice idea that will never actually work (but he uses it anyway, who  
>> knows how he rationalizes it to himself).
>>
>
> Bulat is apparently not alone in this belief. I seem to spend all my  
> time on other forums dealing with people who have exactly the same  
> opinion.

There's a difference.  When I started (GHC 6.4.2 was current), Bulat  
was spending his mailing list time denying the possibility of what DPH  
does now, and claiming that what GHC 6.8.2 does now was unlikely.

(Meanwhile, I can't say much for "oh, i didn't understand that code,  
but surely we should be able to do at least as good without  
performance hacks?" when the code you didn't understand is all  
performance hacks....  You really have no business trying to draw  
comparisons when you don't even know when you're comparing apples to  
aardvarks, let alone apples to oranges.)

Optimization is hard.  Don't like it?  Become a compiler researcher  
and find better ways to do it.  Note that C / procedural language  
optimization research has at least a 20-year head start on functional  
programming optimization, and that the problems are very different:  
the C world would love to be at the point where optimizing the C  
equivalent of "sum xs / length xs" is worth thinking about; they're  
still not really in a position to *detect* it unless the language is  
simple enough to make such reasoning relatively easy (e.g. FORTRAN).

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [hidden email]
system administrator [openafs,heimdal,too many hats] [hidden email]
electrical and computer engineering, carnegie mellon university    KF8NH


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Performance: MD5

Bulat Ziganshin-2
Hello Brandon,

Sunday, May 18, 2008, 6:25:31 PM, you wrote:

> There's a difference.  When I started (GHC 6.4.2 was current), Bulat
> was spending his mailing list time denying the possibility of what DPH
> does now, and claiming that what GHC 6.8.2 does now was unlikely.

what DPH and 6.8.2 does? i said that ghc can't reach the same level
of optimization as gcc and it's still true - for example, it can't
unroll loops. ghc 6.6 generates very simple code which holds all the
vars in the memory, 6.8 can hold them in registers. gcc implements
much more sophisticated optimizations, just try to implement something
larger than one-liners in imperative haskell and C and compare
results. it seems that you never developed highly-optimized programs
in C, so all your reasoning is just product of haskell-fanatic imagination

show me your code. or shut up, please


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
123