Counting bits: Sanity Check

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Counting bits: Sanity Check

David Place
Hi All,

Since it seems that real applications need more than just  union,  
intersection, difference and complement to be fast to make EnumSet  
useful, I've been looking into the less naive approaches to the other  
things.   In particular, size seems to find itself in the inner  
loop.   I've made a comparison of various approaches to bit  
counting.  It seems I was too hasty to declare Bulat's suggestion of  
table lookup (table,table32) the winner.  It seems Robert's  
suggestion of Kernighan's (kern) method is faster.

I also implemented the method described in pages 187-188 of Software  
Optimization Guide for AMD Athlon™ 64 and Opteron™ Processors.  
(ones32)  It's slower on my powerbook, but may be the winner on 64bit  
processors.  Here are the results:

[Marcel:~/devl/EnumSet] david% time ./bits 2000000 3000000 ones32
21
1.788u 0.136s 0:03.30 57.8%     0+0k 0+0io 0pf+0w
[Marcel:~/devl/EnumSet] david% time ./bits 2000000 3000000 table
21
2.404u 0.164s 0:04.96 51.6%     0+0k 0+1io 0pf+0w
[Marcel:~/devl/EnumSet] david% time ./bits 2000000 3000000 table32
21
2.067u 0.140s 0:04.27 51.5%     0+0k 0+0io 0pf+0w
[Marcel:~/devl/EnumSet] david% time ./bits 2000000 3000000 kern
21
1.729u 0.137s 0:03.25 56.9%     0+0k 0+1io 0pf+0w

If you'd like to give it a whirl on your fancy modern computers,  
here's the code:



ghc -O3 -optc-O3 -o bits BitTwiddle.hs


Of course, if I've done something lame-brained and skewed the  
results, please let me know.

Cheers, David


--------------------------------
David F. Place
mailto:[hidden email]


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

BitTwiddle.hs (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Counting bits: Sanity Check

Bugzilla from robdockins@fastmail.fm

On Apr 11, 2006, at 10:09 AM, David F. Place wrote:

> Hi All,
>
> Since it seems that real applications need more than just  union,  
> intersection, difference and complement to be fast to make EnumSet  
> useful, I've been looking into the less naive approaches to the  
> other things.   In particular, size seems to find itself in the  
> inner loop.   I've made a comparison of various approaches to bit  
> counting.  It seems I was too hasty to declare Bulat's suggestion  
> of table lookup (table,table32) the winner.  It seems Robert's  
> suggestion of Kernighan's (kern) method is faster.
>
> I also implemented the method described in pages 187-188 of  
> Software Optimization Guide for AMD Athlon™ 64 and Opteron™  
> Processors. (ones32)  It's slower on my powerbook, but may be the  
> winner on 64bit processors.  Here are the results:
>
> [Marcel:~/devl/EnumSet] david% time ./bits 2000000 3000000 ones32
> 21
> 1.788u 0.136s 0:03.30 57.8%     0+0k 0+0io 0pf+0w
> [Marcel:~/devl/EnumSet] david% time ./bits 2000000 3000000 table
> 21
> 2.404u 0.164s 0:04.96 51.6%     0+0k 0+1io 0pf+0w
> [Marcel:~/devl/EnumSet] david% time ./bits 2000000 3000000 table32
> 21
> 2.067u 0.140s 0:04.27 51.5%     0+0k 0+0io 0pf+0w
> [Marcel:~/devl/EnumSet] david% time ./bits 2000000 3000000 kern
> 21
> 1.729u 0.137s 0:03.25 56.9%     0+0k 0+1io 0pf+0w
>
> If you'd like to give it a whirl on your fancy modern computers,  
> here's the code:
>
> <BitTwiddle.hs>
>
> ghc -O3 -optc-O3 -o bits BitTwiddle.hs


I get similar results on my machine (PPC powerbook).  'ones32'  
barely edges out 'kern' and 'table' and 'table32' come in behind.

Average 'user' time over three runs:

ones32:  0.540s
kern: 0.545s
table: 0.730s
table32: 0.632s




> Of course, if I've done something lame-brained and skewed the  
> results, please let me know.
>
> Cheers, David
>
>
> --------------------------------
> David F. Place
> mailto:[hidden email]
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe


Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG



_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Counting bits: Sanity Check

Daniel McAllansmith
In reply to this post by David Place
On Wednesday 12 April 2006 02:09, David F. Place wrote:
> If you'd like to give it a whirl on your fancy modern computers,

Averages of user time for three runs on an Athlon64 running 64bit linux:

kern 0.29700
ones32 0.30733
table32 0.37333
table 0.38400

I ran a whole lot more of kern and ones32... kern was consistently faster than
ones32.  Curious.

Daniel
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Counting bits: Sanity Check

David Place

On Apr 12, 2006, at 3:30 PM, Daniel McAllansmith wrote:

> Averages of user time for three runs on an Athlon64 running 64bit  
> linux:
>
> kern 0.29700
> ones32 0.30733
> table32 0.37333
> table 0.38400
>
> I ran a whole lot more of kern and ones32... kern was consistently  
> faster than
> ones32.  Curious.
Yes, especially curious since the algorithm is taken from AMD's  
optimization guide for the Athlon and Opteron series.  I'm not good  
enough at reading core syntax to be able to see what GHC is doing  
with it.

I wonder how this other crazy algorithm I found will work on your 64  
bit omputer.   It is much slower on my 32 bit PPC powerbook for  
obvious reasons.   If you'd like to try it, I'll include an updated  
BitTwiddle.hs .

Usage:
time ./bits 2000000 3000000 64



--------------------------------
David F. Place
mailto:[hidden email]


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

BitTwiddle.hs (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Counting bits: Sanity Check

Daniel McAllansmith
On Thursday 13 April 2006 08:55, David F. Place wrote:

> On Apr 12, 2006, at 3:30 PM, Daniel McAllansmith wrote:
> > Averages of user time for three runs on an Athlon64 running 64bit
> > linux:
> >
> > kern 0.29700
> > ones32 0.30733
> > table32 0.37333
> > table 0.38400
> >
> > I ran a whole lot more of kern and ones32... kern was consistently
> > faster than
> > ones32.  Curious.
>
> Yes, especially curious since the algorithm is taken from AMD's
> optimization guide for the Athlon and Opteron series.  I'm not good
> enough at reading core syntax to be able to see what GHC is doing
> with it.
>
> I wonder how this other crazy algorithm I found will work on your 64
> bit omputer.   It is much slower on my 32 bit PPC powerbook for
> obvious reasons.   If you'd like to try it, I'll include an updated
> BitTwiddle.hs .
>
> Usage:
> time ./bits 2000000 3000000 64

Averages of user time of five runs on an Athlon64 running 64bit linux:

64 0.1974
kern 0.2980
ones32 0.3240
table32 0.3754
table 0.3798

64 looks to be a good bit faster.

You didn't change anything in the ones32 algorithm did you?  The other
algorithms are taking roughly what they did last time, but ones32 seems
consistently slower now.

Daniel
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Counting bits: Sanity Check

David Place

On Apr 12, 2006, at 5:21 PM, Daniel McAllansmith wrote:

>
> Averages of user time of five runs on an Athlon64 running 64bit linux:
>
> 64 0.1974
> kern 0.2980
> ones32 0.3240
> table32 0.3754
> table 0.3798
>
> 64 looks to be a good bit faster.
>
> You didn't change anything in the ones32 algorithm did you?  The other
> algorithms are taking roughly what they did last time, but ones32  
> seems
> consistently slower now.

Thanks for running it.  Looks like "64" is worth the trouble.  If the  
32 bit wide algorithm is so fast,  the 12 bit wide one must be  
blazingly fast.  (See my other thread "The Marriage of Heaven and  
Hell.")     I didn't make any changes to ones32 according to "diff."

Cheers, David

--------------------------------
David F. Place
mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Counting bits: Sanity Check

Bulat Ziganshin-2
In reply to this post by David Place
Hello David,

Thursday, April 13, 2006, 12:55:05 AM, you wrote:

> Yes, especially curious since the algorithm is taken from AMD's
> optimization guide for the Athlon and Opteron series.  I'm not good  
> enough at reading core syntax to be able to see what GHC is doing  
> with it.

optimization for GHC is far away from low-level asm optimization so it
is not surprise that this don't work



--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe