MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Salvatore Insalaco
2008/5/17 Andrew Coppin <[hidden email]>:
> Hi folks.
>
> OK, try this:
>
>  darcs get http://darcs.orphi.me.uk/MyMD5
>  cd MyMD5
>  ghc -O2 --make md5sum
>  md5sum <some large filename>

I've got some time to take a look at the code. It's very nice,
readable and declarative, but obviously not optimized for "raw speed".
There're a few things to do to have a speed-up of 2x, without going "low-level".

1) There's a lazyness leak in Pad.hs. The sum in:
          else make_block_bs bs0 : work (n + block_size_bytes) bs1
is not strict. With a very large file (e.g. 100 mb) it means stack overflow.

[roxas:~/Desktop/test2/MyMD5] kirby% ghc -O2 --make md5sum.hs

[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize' to increase it.

To solve it just add a bang (!) before the n parameter:

    work !n bs =

You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file
too. There're solutions that don't imply BangPatterns, and use only
seq, but I like bang patterns!

Now it works:
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
E6542028D538BAEC180A96A5D1E6EC3A
11.958u 0.210s 0:12.19 99.7% 0+0k 0+2io 0pf+0w

2) make_block_bs is sub-optimal, and very critical to performance. I
decided to use Data.Binary for it (it's also more readable, and you
get rid of the unsafeWrite):
import Data.Binary
import Data.Binary.Get
import Control.Monad

// ...

instance Binary Block where
    put _ = undefined
    get = do
        xs <- replicateM 16 getWord32le
        return $ Block $ listArray (0, 15) xs

make_block_bs :: B.ByteString -> Block
make_block_bs = decode

Now we are getting faster:
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
E6542028D538BAEC180A96A5D1E6EC3A
8.063u 0.174s 0:08.23 100.0% 0+0k 0+0io 0pf+0w

3) You are doing a lot of access to fields of a strict data type
(State). You can at least ask the compiler to help you a bit with
-funbox-strict-fields.

Now we have:
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
E6542028D538BAEC180A96A5D1E6EC3A
6.780u 0.133s 0:06.91 100.0% 0+0k 0+1io 0pf+0w

We have got a good, nearly 2x, speed-up with very few optimizations,
and we run in a very small constant amount of memory.

Probably compiling with -fvia-C could help even more, but strangely:

[roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields
-O2 --make md5sum.hs
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
Segmentation fault

Is this supposed to happen? There're no "unsafe" functions or imports
used in the program. Maybe a bug in GHC?

Salvatore
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Don Stewart-2
Thanks for taking a look at this.

kirby81:

> 2008/5/17 Andrew Coppin <[hidden email]>:
> > Hi folks.
> >
> > OK, try this:
> >
> >  darcs get http://darcs.orphi.me.uk/MyMD5
> >  cd MyMD5
> >  ghc -O2 --make md5sum
> >  md5sum <some large filename>
>
> I've got some time to take a look at the code. It's very nice,
> readable and declarative, but obviously not optimized for "raw speed".
> There're a few things to do to have a speed-up of 2x, without going "low-level".
>
> 1) There's a lazyness leak in Pad.hs. The sum in:
>           else make_block_bs bs0 : work (n + block_size_bytes) bs1
> is not strict. With a very large file (e.g. 100 mb) it means stack overflow.
>
> [roxas:~/Desktop/test2/MyMD5] kirby% ghc -O2 --make md5sum.hs
>
> [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
> Stack space overflow: current size 8388608 bytes.
> Use `+RTS -Ksize' to increase it.
>
> To solve it just add a bang (!) before the n parameter:
>
>     work !n bs =
>
> You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file
> too. There're solutions that don't imply BangPatterns, and use only
> seq, but I like bang patterns!
>
> Now it works:
> [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
> E6542028D538BAEC180A96A5D1E6EC3A
> 11.958u 0.210s 0:12.19 99.7% 0+0k 0+2io 0pf+0w

Good idea. I saw some lazy left folds in there too, which look
suspicious.

>
> 2) make_block_bs is sub-optimal, and very critical to performance. I
> decided to use Data.Binary for it (it's also more readable, and you
> get rid of the unsafeWrite):
> import Data.Binary
> import Data.Binary.Get
> import Control.Monad
>
> // ...
>
> instance Binary Block where
>     put _ = undefined
>     get = do
>         xs <- replicateM 16 getWord32le
>         return $ Block $ listArray (0, 15) xs
>
> make_block_bs :: B.ByteString -> Block
> make_block_bs = decode
>
> Now we are getting faster:
> [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
> E6542028D538BAEC180A96A5D1E6EC3A
> 8.063u 0.174s 0:08.23 100.0% 0+0k 0+0io 0pf+0w

Definitely a good idea.
 

> 3) You are doing a lot of access to fields of a strict data type
> (State). You can at least ask the compiler to help you a bit with
> -funbox-strict-fields.
>
> Now we have:
> [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
> E6542028D538BAEC180A96A5D1E6EC3A
> 6.780u 0.133s 0:06.91 100.0% 0+0k 0+1io 0pf+0w
>
> We have got a good, nearly 2x, speed-up with very few optimizations,
> and we run in a very small constant amount of memory.

Great.

>
> Probably compiling with -fvia-C could help even more, but strangely:
>
> [roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields
> -O2 --make md5sum.hs
> [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
> Segmentation fault
>
> Is this supposed to happen? There're no "unsafe" functions or imports
> used in the program. Maybe a bug in GHC?

That could be a gcc bug, in fact. Can you provide the final source?
Does changing the gcc optimisation level help? (i.e. -fvia-C -optc-O2 )

Andrew, I hope you look carefully at the suggestions here, and use them
to improve the code you were working on.

-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: MD5 performance optimizations

Andrew Coppin
Don Stewart wrote:
> Andrew, I hope you look carefully at the suggestions here, and use them
> to improve the code you were working on.
>  

Indeed, this is just the sort of stuff I was hoping to get... [Top marks
for spotting the laziness leak in "pad" BTW! I totally missed that.]

While I'm here, can I just clarify *exactly* what -funbox-strict-fields
actually does?

I was under the impression that if you have a single-constructor type
with all strict fields, GHC would automatically try to avoid boxing and
unboxing it. However, the existence of this flag seems to indicate that
it only performs this particular optimisation if you ask for it. But on
the third hand, turning this flag on or off makes no measurable
performance difference. (At least, in the tests I did. Maybe that's
because it was swamped by all the other inefficiencies mentioned?) Some
clarification?

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: MD5 performance optimizations

Don Stewart-2
andrewcoppin:

> Don Stewart wrote:
> >Andrew, I hope you look carefully at the suggestions here, and use them
> >to improve the code you were working on.
> >  
>
> Indeed, this is just the sort of stuff I was hoping to get... [Top marks
> for spotting the laziness leak in "pad" BTW! I totally missed that.]
>
> While I'm here, can I just clarify *exactly* what -funbox-strict-fields
> actually does?

If in doubt, ask the user's guide:

    http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#unpack-pragma

-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Andrew Coppin
In reply to this post by Salvatore Insalaco
Salvatore Insalaco wrote:
> I've got some time to take a look at the code. It's very nice,
> readable and declarative, but obviously not optimized for "raw speed".
> There're a few things to do to have a speed-up of 2x, without going "low-level".
>  

You have my individed attention...

> 1) There's a lazyness leak in Pad.hs. The sum in:
>           else make_block_bs bs0 : work (n + block_size_bytes) bs1
> is not strict. With a very large file (e.g. 100 mb) it means stack overflow.
>  

Oops! Well spotted...

> To solve it just add a bang (!) before the n parameter:
>
>     work !n bs =
>
> You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file
> too. There're solutions that don't imply BangPatterns, and use only
> seq, but I like bang patterns!
>  

Ah, so *that's* what bang patterns do?

> 2) make_block_bs is sub-optimal, and very critical to performance.

Yeah, so I see. (60% time spent here... ouch!) I'll bet that's where C
is beating me...

> I decided to use Data.Binary for it

I'm not familiar with that library (i.e. what it does or how you use it).

> import Data.Binary
> import Data.Binary.Get
> import Control.Monad
>
> // ...
>
> instance Binary Block where
>     put _ = undefined
>     get = do
>         xs <- replicateM 16 getWord32le
>         return $ Block $ listArray (0, 15) xs
>
> make_block_bs :: B.ByteString -> Block
> make_block_bs = decode
>  

Mmm, OK. Doesn't look too bad...

> 3) You are doing a lot of access to fields of a strict data type
> (State). You can at least ask the compiler to help you a bit with
> -funbox-strict-fields.
>  

I did try that, but it didn't seem to make any difference for me. [Maybe
it does now because of your other improvements? Which version of GHC and
which platform are you running on? I'm GHC 6.8.2 on Windows...]

> We have got a good, nearly 2x, speed-up with very few optimizations,
> and we run in a very small constant amount of memory.
>  

I'm liking the sound of that. :-D

> Probably compiling with -fvia-C could help even more

...oh yeah, that's no longer default for -O2. I forgot about that!

> but strangely:
>
> Segmentation fault
>
> Is this supposed to happen? There're no "unsafe" functions or imports
> used in the program. Maybe a bug in GHC?
>  

GAH! o_O That's not good(tm)...

Any chance you could email me a Darcs patch (or 2 seperate ones?) I'll
add it to the repo on my website, and test to see what numbers I get on
my PC.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Bulat Ziganshin-2
Hello Andrew,

Tuesday, May 20, 2008, 11:05:52 PM, you wrote:

>> -funbox-strict-fields.
>>  

> I did try that, but it didn't seem to make any difference for me. [Maybe

it may be that ghc just not recompiled program when you supplied this
switch. as i wrote, this switch by itself made your original program
1.5x faster on my box. try to delete .o/.exe before rebuilding

and, without this switch representation for !Int32 is the same as for
Int32 - only difference is that when data is assigned to such field
they are evaluated first (and then boxed)

it is not enabled by default, because for *non-primitive* datatypes
such as B below automatic unboxing of strict fields of this type may
decrease sharing and thus memory/performance. imagine for example:

data A = A !B !B
data B = B !Int !Int !Int !Int !Int

b = B 1 1 1 1 1
a = A b b

jhc automatically unboxes strict fields of primitive datatypes, which
is guaranteed to provide only positive effects. may be somesay the
same will be added to ghc

--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: MD5 performance optimizations

Andrew Coppin
In reply to this post by Don Stewart-2
Don Stewart wrote:

> andrewcoppin:
>  
>> While I'm here, can I just clarify *exactly* what -funbox-strict-fields
>> actually does?
>>    
>
> If in doubt, ask the user's guide:
>
>     http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#unpack-pragma
>  

...so making a field strict causes it to be automatically forced when
the constructor is forced, but doesn't actually unbox it? So there's
still a level of indirection, and no register allocation?

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: MD5 performance optimizations

Andrew Coppin
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin wrote:

> Hello Andrew,
>
> Tuesday, May 20, 2008, 11:05:52 PM, you wrote:
>
>  
>>> -funbox-strict-fields.
>>>  
>>>      
>
>  
>> I did try that, but it didn't seem to make any difference for me. [Maybe
>>    
>
> it may be that ghc just not recompiled program when you supplied this
> switch.
>  

Possibly. I notice that adding *some* switches makes GHC recompile
everything, while others don't. [It's probably documented in the manual
somewhere...]

> and, without this switch representation for !Int32 is the same as for
> Int32 - only difference is that when data is assigned to such field
> they are evaluated first (and then boxed)
>  

Right, OK - so strict evaluation, but some bitpattern in RAM.

> it is not enabled by default, because for *non-primitive* datatypes
> such as B below automatic unboxing of strict fields of this type may
> decrease sharing and thus memory/performance.

Yeah, I can imagine...

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Salvatore Insalaco
In reply to this post by Don Stewart-2
2008/5/20 Don Stewart <[hidden email]>:

>>
>> Probably compiling with -fvia-C could help even more, but strangely:
>>
>> [roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields
>> -O2 --make md5sum.hs
>> [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
>> Segmentation fault
>>
>> Is this supposed to happen? There're no "unsafe" functions or imports
>> used in the program. Maybe a bug in GHC?
>
> That could be a gcc bug, in fact. Can you provide the final source?
> Does changing the gcc optimisation level help? (i.e. -fvia-C -optc-O2 )
I tried with and without gcc optimizations, but it always segfaults.
It happened to me even before my optimizations.

I've got GHC 6.8.2 on Mac OS X 10.5.2, GCC 4.0.1. -fvia-C works well
for other libraries and applications, so I don't think it's a problem
of my specific setting.

Salvatore

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

MyMD5.tar.gz (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Re: MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Don Stewart-2
In reply to this post by Bulat Ziganshin-2
bulat.ziganshin:

> Hello Andrew,
>
> Tuesday, May 20, 2008, 11:05:52 PM, you wrote:
>
> >> -funbox-strict-fields.
> >>  
>
> > I did try that, but it didn't seem to make any difference for me. [Maybe
>
> it may be that ghc just not recompiled program when you supplied this
> switch. as i wrote, this switch by itself made your original program
> 1.5x faster on my box. try to delete .o/.exe before rebuilding
>
> and, without this switch representation for !Int32 is the same as for
> Int32 - only difference is that when data is assigned to such field
> they are evaluated first (and then boxed)
>
> it is not enabled by default, because for *non-primitive* datatypes
> such as B below automatic unboxing of strict fields of this type may
> decrease sharing and thus memory/performance. imagine for example:
>
> data A = A !B !B
> data B = B !Int !Int !Int !Int !Int
>
> b = B 1 1 1 1 1
> a = A b b
>
> jhc automatically unboxes strict fields of primitive datatypes, which
> is guaranteed to provide only positive effects. may be somesay the
> same will be added to ghc

I'm confused. GHC of course unboxes strict fields of primitive data types.

    {-# OPTIONS -O2 -fvia-C -optc-O2 -funbox-strict-fields #-}

    data A = A !B !B
    data B = B !Int !Int !Int !Int !Int

    b = B 1 2 3 4 5
    a = A b b

    go :: Int -> A
    go 0 = a
    go n = go (n-1)

    main = print $ case go 10 of
               A (B a b c d e) (B a' b' c' d' e') -> a + b + c + d + e
                                                   + a' + b' + c' + d' + e'

For example, go compiles to:

        $wgo :: Int#
                     -> (# Int#,
                           Int#,
                           Int#,
                           Int#,
                           Int#,
                           Int#,
                           Int#,
                           Int#,
                           Int#,
                           Int# #)

        $wgo =
          \ (ww_svf :: Int#) ->
            case ww_svf of ds_Xmw {
              __DEFAULT -> $wgo (-# ds_Xmw 1);
              0 -> (# 1, 2, 3, 4, 5, 1, 2, 3, 4, 5 #)

Values are passed in registers, the structures are entirely unpacked into
registers or stack for return.

Were you not aware of this Bulat?

-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Brandon S Allbery KF8NH

On May 20, 2008, at 16:13 , Don Stewart wrote:

> bulat.ziganshin:
>> Hello Andrew,
>>
>> Tuesday, May 20, 2008, 11:05:52 PM, you wrote:
>>
>>>> -funbox-strict-fields.
>>>>
>>
>>> I did try that, but it didn't seem to make any difference for me.  
>>> [Maybe
>>
>> jhc automatically unboxes strict fields of primitive datatypes, which
>> is guaranteed to provide only positive effects. may be somesay the
>> same will be added to ghc
>
> I'm confused. GHC of course unboxes strict fields of primitive data  
> types.
>
>     {-# OPTIONS -O2 -fvia-C -optc-O2 -funbox-strict-fields #-}

I think "automatically" means "without -funbox-strict-fields"....

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [hidden email]
system administrator [openafs,heimdal,too many hats] [hidden email]
electrical and computer engineering, carnegie mellon university    KF8NH


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Salvatore Insalaco
In reply to this post by Andrew Coppin
2008/5/20 Andrew Coppin <[hidden email]>:
> Ah, so *that's* what bang patterns do?

Exactly. If you have a pattern (even a simple one like 'n'), and you
prefix it with a bang (!), the function will force its evaluation.

>> 2) make_block_bs is sub-optimal, and very critical to performance.
>
> Yeah, so I see. (60% time spent here... ouch!) I'll bet that's where C is
> beating me...

Maybe that's a bit optimistic, but it's a start :).

>
>> I decided to use Data.Binary for it
>
> I'm not familiar with that library (i.e. what it does or how you use it).

It's a library to serialize / deserialize haskell values. It uses
ByteStrings, and it's very fast. I suggest to take a look at it (it's
on Hackage).

> I did try that, but it didn't seem to make any difference for me. [Maybe it
> does now because of your other improvements? Which version of GHC and which
> platform are you running on? I'm GHC 6.8.2 on Windows...]

If you recompiled the executable correctly, maybe you were doing
profiling? Profiling slows some simple operations so much, that the
benefits of a low-level optimization like this can be lost.

Salvatore
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Isaac Dupree
In reply to this post by Don Stewart-2
Don Stewart wrote:
> I'm confused. GHC of course unboxes strict fields of primitive data types.
>
>     {-# OPTIONS -O2 -fvia-C -optc-O2 -funbox-strict-fields #-}

... but only when you give -funbox-strict-fields, as there, or UNPACK.
The point is that it never loses sharing to unbox a strict Int field
[1], so it should do that with -O, even without either of the above
overrides.

[1] I'm not sure if this is true... if it has to rebox the Int, you get
another copy floating around, not the original, right?

-Isaac
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Don Stewart-2
isaacdupree:

> Don Stewart wrote:
> >I'm confused. GHC of course unboxes strict fields of
> >primitive data types.
> >
> >    {-# OPTIONS -O2 -fvia-C -optc-O2 -funbox-strict-fields
> >    #-}
>
> ... but only when you give -funbox-strict-fields, as there,
> or UNPACK. The point is that it never loses sharing to unbox
> a strict Int field [1], so it should do that with -O, even
> without either of the above overrides.
>
> [1] I'm not sure if this is true... if it has to rebox the
> Int, you get another copy floating around, not the original,
> right?

See section 8.12.10. UNPACK pragma:

    "if the T constructor is scrutinised and the floats passed to a
    non-strict function for example, they will have to be reboxed"

That said, the strictness analyser is beefy enough, and strict data
structures  commone enough, that this seems to almost ubiquitously be a
win.

Enabling it at -O or -O2 makes perfect sense.

-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Re: MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Bulat Ziganshin-2
In reply to this post by Don Stewart-2
Hello Don,

Wednesday, May 21, 2008, 12:13:14 AM, you wrote:

>> it is not enabled by default, because for *non-primitive* datatypes

> I'm confused. GHC of course unboxes strict fields of primitive data types.

>     {-# OPTIONS -O2 -fvia-C -optc-O2 -funbox-strict-fields #-}

it is not enabled by default...

--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Re: MD5 performance optimizations

Bulat Ziganshin-2
In reply to this post by Andrew Coppin
Hello Andrew,

Wednesday, May 21, 2008, 12:09:39 AM, you wrote:

> Right, OK - so strict evaluation, but some bitpattern in RAM.

yes. in particular this allows to designate fields that should de
initialized:

data A = A {a :: Int, b :: !Int}
main = print A{}

>> it is not enabled by default, because for *non-primitive* datatypes
>> such as B below automatic unboxing of strict fields of this type may
>> decrease sharing and thus memory/performance.

> Yeah, I can imagine...

sometime i've seriously optimized my program by removing this switch
and adding manual unboxing directives to the sources. it will be great
if ghc will catch jhc here


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: MD5 performance optimizations

Don Stewart-2
bulat.ziganshin:

> Hello Andrew,
>
> Wednesday, May 21, 2008, 12:09:39 AM, you wrote:
>
> > Right, OK - so strict evaluation, but some bitpattern in RAM.
>
> yes. in particular this allows to designate fields that should de
> initialized:
>
> data A = A {a :: Int, b :: !Int}
> main = print A{}
>
> >> it is not enabled by default, because for *non-primitive* datatypes
> >> such as B below automatic unboxing of strict fields of this type may
> >> decrease sharing and thus memory/performance.
>
> > Yeah, I can imagine...
>
> sometime i've seriously optimized my program by removing this switch
> and adding manual unboxing directives to the sources. it will be great
> if ghc will catch jhc here

Would you like to file a feature request?

    http://hackage.haskell.org/trac/ghc/newticket?type=bug

Particularly if you have an example where JHC has done a better job
optimising some simple example. So far I've seen no ticke

Unless tickets are opened, progress will not be made.

-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: MD5 performance optimizations

Andrew Coppin
In reply to this post by Salvatore Insalaco
Woo!

Salvatore kindly sent me a Darcs patch, and applying it does indeed make it run faster. Yay!

[Note that -fvia-c works just fine for me. It doesn't appear to produce a huge speed difference, but it compiles just fine.]

Thanks for the tips, guys! :-D The changes are in the online Darcs repo.



_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: MD5 performance optimizations

Salvatore Insalaco
2008/5/21 Andrew Coppin <[hidden email]>:
> Woo!
>
> Salvatore kindly sent me a Darcs patch, and applying it does indeed make it
> run faster. Yay!

Hi Andrew,
I'm glad that -fvia-c works for you: maybe it's a Mac OS X specific bug?

Anyway, did you compile with -fvia-c -optc-O3? I expect
register-intensive code like this to be at least 20% faster with gcc.

Salvatore
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

RE: Re: MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Simon Peyton Jones
In reply to this post by Isaac Dupree
| > I'm confused. GHC of course unboxes strict fields of primitive data types.
| >
| >     {-# OPTIONS -O2 -fvia-C -optc-O2 -funbox-strict-fields #-}
|
| ... but only when you give -funbox-strict-fields, as there, or UNPACK.
| The point is that it never loses sharing to unbox a strict Int field
| [1], so it should do that with -O, even without either of the above
| overrides.
|
| [1] I'm not sure if this is true... if it has to rebox the Int, you get
| another copy floating around, not the original, right?

Correct.  If you have

        data T = MkT {-# UNPACK #-} !Int
then given
        case x of { MkT y -> h y }
then GHC must re-box the 'y' before passing it to h.  That's why unpacking strict fields isn't an unambiguous win.  Perhaps it should be the default, though, which can be switched off, rather than the reverse.

Simon
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
12