Performance: MD5

classic Classic list List threaded Threaded
48 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: Re: Reduceron (was Performance: MD5)

Andrew Coppin
Achim Schneider wrote:

> "Neil Mitchell" <[hidden email]> wrote:
>
>  
>>> How would a Haskell compiler look like that targets a FPGA? That is,
>>> compiling down to configware, not to a RTS built on top of it.
>>>      
>> http://www-users.cs.york.ac.uk/~mfn/reduceron2/
>>
>>    
> I'm only on page 5 and already drooling.
>  

Damnit, this paper is making me hungry! :-(

I almost want to run out into the street and purchase some FPGA hardware
right now! [I mean, I wanted to before, but this makes me want to even
more.] But hey, let's not get crazy here. (Yet.)

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: Reduceron (was Performance: MD5)

Sebastian Sylvan


On Sun, May 18, 2008 at 9:09 PM, Andrew Coppin <[hidden email]> wrote:
Achim Schneider wrote:
"Neil Mitchell" <[hidden email]> wrote:

 
How would a Haskell compiler look like that targets a FPGA? That is,
compiling down to configware, not to a RTS built on top of it.
     
http://www-users.cs.york.ac.uk/~mfn/reduceron2/

   
I'm only on page 5 and already drooling.
 

Damnit, this paper is making me hungry! :-(

I almost want to run out into the street and purchase some FPGA hardware right now! [I mean, I wanted to before, but this makes me want to even more.] But hey, let's not get crazy here. (Yet.)



--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Aaron Denney
In reply to this post by Achim Schneider
On 2008-05-18, Achim Schneider <[hidden email]> wrote:

> Aaron Denney <[hidden email]> wrote:
>
>> Go read K&R[1].  It shouldn't take more than a week's worth of spare
>> time.
>>
> HELL NO!
>
> There's a reason why my lecturer always refered to it as "Knall & Rauch
> C" (Bang and Smoke C).
>
> Get the Harbison & Steele instead:
> http://careferencemanual.com/

C is a little language.  It doesn't /need/ a 500 page tome.  K&R is not
something to learn programming, and of course, does quite poorly when
used as a textbook for that purpose.  It is, instead, for programmers to
learn C.

--
Aaron Denney
-><-

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Salvatore Insalaco
In reply to this post by Andrew Coppin
2008/5/18 Andrew Coppin <[hidden email]>:

>> unsafeRead/Write
>>
>
> Found in Data.Array.Base, apparently. (I managed to find an example in the
> Gtk2hs "fastdraw" demo.)
>
> Oddly, this seems to make virtually no performance difference. I find this
> highly surprising. But then, I perform 16 write to the array, and 64 reads
> from the ByteString, so maybe that's where I should be worrying about bounds
> checks...

Hi Andrew,
just a profiling suggestion: did you try to use the SCC cost-centre
annotations for profiling?
If you want to know precisely what takes 60% of time, you can try:
        bn = {-# SCC "IntegerConversion" #-} 4 * fromIntegral wn
        b0 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+0)
        b1 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+1)
        b2 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+2)
        b3 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+3)
        w  = foldl' (\w b -> shiftL w 8 .|. fromIntegral b) 0 [b3,b2,b1,b0]
      in {-# SCC "ArrayWriting" #-} unsafeWrite temp wn w

In profiling the time of all expressions with the same SCC "name" will
be summed.
You can get more information about SCC here:
http://www.haskell.org/ghc/docs/latest/html/users_guide/profiling.html#cost-centres

Obviously, even ultra-optimizing this function to be 60 times faster
(assuming that this is possible) will only make your program go about
2 time faster.

One advice: I've seen various md5sum implementations in C, all using
about the same algorithms and data structures, and they performed even
with 10x time differences between them; md5summing fast is not a
really simple problem. If I were you I would drop the comparison with
ultra-optimized C and concentrate on "does my
high-level-good-looking-super-readable implementation perform
acceptably?".

What "acceptably" means is left as an exercise to the reader.

Salvatore
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Jules Bean
In reply to this post by Andrew Coppin
Andrew Coppin wrote:
>
> So all that bravado about Haskell enabling higher-level optimisations to
> produce a result faster than C is actually complete nonesense?

Nonsense? No.

Something that actually exists? No.

Haskell enables optimisations. Writing a SufficientlySmartCompiler (
http://c2.com/cgi/wiki?SufficientlySmartCompiler ) still requires work.

Jules

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Andrew Coppin
In reply to this post by Salvatore Insalaco
Salvatore Insalaco wrote:

> Hi Andrew,
> just a profiling suggestion: did you try to use the SCC cost-centre
> annotations for profiling?
> If you want to know precisely what takes 60% of time, you can try:
>         bn = {-# SCC "IntegerConversion" #-} 4 * fromIntegral wn
>         b0 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+0)
>         b1 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+1)
>         b2 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+2)
>         b3 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+3)
>         w  = foldl' (\w b -> shiftL w 8 .|. fromIntegral b) 0 [b3,b2,b1,b0]
>       in {-# SCC "ArrayWriting" #-} unsafeWrite temp wn w
>
> In profiling the time of all expressions with the same SCC "name" will
> be summed.
> You can get more information about SCC here:
> http://www.haskell.org/ghc/docs/latest/html/users_guide/profiling.html#cost-centres
>  

OK, I'll give that a try...

> One advice: I've seen various md5sum implementations in C, all using
> about the same algorithms and data structures, and they performed even
> with 10x time differences between them; md5summing fast is not a
> really simple problem. If I were you I would drop the comparison with
> ultra-optimized C and concentrate on "does my
> high-level-good-looking-super-readable implementation perform
> acceptably?".
>
> What "acceptably" means is left as an exercise to the reader.
>  

Well, so long as it can hash 500 MB of data in a few minutes without
using absurd amounts of RAM, that'll do for me. ;-)

[I actually wanted to do this for a project at work. When I discovered
that none of the available Haskell implementations are fast enough, I
tried to write my own. But that wasn't fast enough either. So I ended up
being forced to call md5sum.exe and attempt to parse its output.
Obviously having a real Haskell function would be infinitely more
portable and convinient...]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Don Stewart-2
andrewcoppin:

> Salvatore Insalaco wrote:
> >Hi Andrew,
> >just a profiling suggestion: did you try to use the SCC cost-centre
> >annotations for profiling?
> >If you want to know precisely what takes 60% of time, you can try:
> >        bn = {-# SCC "IntegerConversion" #-} 4 * fromIntegral wn
> >        b0 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+0)
> >        b1 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+1)
> >        b2 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+2)
> >        b3 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+3)
> >        w  = foldl' (\w b -> shiftL w 8 .|. fromIntegral b) 0
> >        [b3,b2,b1,b0]
> >      in {-# SCC "ArrayWriting" #-} unsafeWrite temp wn w
> >
> >In profiling the time of all expressions with the same SCC "name"
> >will
> >be summed.
> >You can get more information about SCC here:
> >http://www.haskell.org/ghc/docs/latest/html/users_guide/profiling.html#cost-centres
> >  
>
> OK, I'll give that a try...
>
> >One advice: I've seen various md5sum implementations in C, all using
> >about the same algorithms and data structures, and they performed
> >even
> >with 10x time differences between them; md5summing fast is not a
> >really simple problem. If I were you I would drop the comparison with
> >ultra-optimized C and concentrate on "does my
> >high-level-good-looking-super-readable implementation perform
> >acceptably?".
> >
> >What "acceptably" means is left as an exercise to the reader.
> >  
>
> Well, so long as it can hash 500 MB of data in a few minutes without
> using absurd amounts of RAM, that'll do for me. ;-)
>
> [I actually wanted to do this for a project at work. When I
> discovered that none of the available Haskell implementations are
> fast enough,

How hard did you look?
   
    import System.Environment
    import Data.Digest.OpenSSL.MD5
    import System.IO.Posix.MMap

    main = do
        [f] <- getArgs
        putStrLn . md5sum =<< mmapFile f

Take the md5 of a 600M file:

    $ time ./A /home/dons/tmp/600M  
    24a04fdf3f629a42b5baed52ed793a51
    ./A /home/dons/tmp/600M  3.61s user 1.65s system 20% cpu 25.140 total

Easy.


-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Performance: MD5

Thomas DuBuisson
In reply to this post by Andrew Coppin
Andrew,
What is "fast enough"?  My informal understanding of the implementations
are:

1) Unoptimized [Word8] implementations (constant space, two or three
orders of magnatude slower than C)
2) Bindings to 'C' routines (typically O(n) space due to strict
ByteStrings and no split out of initialContext/md5Update/md5Finialize)
3) Native Lazy.ByteString implementation (perhaps more than one?) ~3-6
times slower than 'C'.

Tom

On Tue, 2008-05-20 at 19:44 +0100, Andrew Coppin wrote:

> Salvatore Insalaco wrote:
> > Hi Andrew,
> > just a profiling suggestion: did you try to use the SCC cost-centre
> > annotations for profiling?
> > If you want to know precisely what takes 60% of time, you can try:
> >         bn = {-# SCC "IntegerConversion" #-} 4 * fromIntegral wn
> >         b0 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+0)
> >         b1 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+1)
> >         b2 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+2)
> >         b3 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+3)
> >         w  = foldl' (\w b -> shiftL w 8 .|. fromIntegral b) 0 [b3,b2,b1,b0]
> >       in {-# SCC "ArrayWriting" #-} unsafeWrite temp wn w
> >
> > In profiling the time of all expressions with the same SCC "name" will
> > be summed.
> > You can get more information about SCC here:
> > http://www.haskell.org/ghc/docs/latest/html/users_guide/profiling.html#cost-centres
> >  
>
> OK, I'll give that a try...
>
> > One advice: I've seen various md5sum implementations in C, all using
> > about the same algorithms and data structures, and they performed even
> > with 10x time differences between them; md5summing fast is not a
> > really simple problem. If I were you I would drop the comparison with
> > ultra-optimized C and concentrate on "does my
> > high-level-good-looking-super-readable implementation perform
> > acceptably?".
> >
> > What "acceptably" means is left as an exercise to the reader.
> >  
>
> Well, so long as it can hash 500 MB of data in a few minutes without
> using absurd amounts of RAM, that'll do for me. ;-)
>
> [I actually wanted to do this for a project at work. When I discovered
> that none of the available Haskell implementations are fast enough, I
> tried to write my own. But that wasn't fast enough either. So I ended up
> being forced to call md5sum.exe and attempt to parse its output.
> Obviously having a real Haskell function would be infinitely more
> portable and convinient...]
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
123