# Understanding allocation behavior Classic List Threaded 13 messages Open this post in threaded view
|

## Understanding allocation behavior

 After reading Daniel Fischer's message about trying to use EnumSet in   his Sudoku.hs and finding that it was slower when processing a large   data set, I decided to do some profiling.  I rewrote his solver to   use EnumSets and profiled it.  The culprit turns out to be the   following function which is responsible for 22% of the allocating in   the run.  Is there a more efficient way to write this function? foldBits :: Bits c => (a -> Int -> a) -> a -> c -> a foldbits _ z 0  = z foldBits f z bs = foldBits' f 0 bs z foldBits' :: Bits c => (a -> Int -> a) -> Int -> c -> a -> a foldBits' f i bs z      | bs == 0 = z      | otherwise = z' `seq` foldBits' f i' bs' z'      where z' | 1 == bs .&. 1 = f z i                      | otherwise =  z                  i' = i + 1                 bs' = bs `shiftR` 1 ps.  I was impressed with how hairy DF's algorithm is and I am not   really enough interested in Sudoku to spend the  time needed to grok   it.  So, I decided to try an experiment to see if I could restructure   it without understanding it very deeply. Step 1. comment out all the type signatures. Step 2. find the main place that I wanted to change from [Int]  to   (Set Int) Step 3. compile; make obvious edits; repeat until 0 errors I had it running in a few minutes.  I can't imagine doing that in any   other programming environment! Cheers, David -------------------------------- David F. Place mailto:[hidden email] _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

## Re: Understanding allocation behavior

 Am Freitag, 7. April 2006 22:57 schrieb David F. Place: > After reading Daniel Fischer's message about trying to use EnumSet in > his Sudoku.hs and finding that it was slower when processing a large > data set, I decided to do some profiling.  I rewrote his solver to > use EnumSets and profiled it.  The culprit turns out to be the The main & evil culprit, methinks now, was DiffArray and the small allocation area. Care to re-profile with my SudokuSet.hs ? Unless I overlooked something, I use foldBits only via size (though that's used a lot). > following function which is responsible for 22% of the allocating in > the run.  Is there a more efficient way to write this function? > > foldBits :: Bits c => (a -> Int -> a) -> a -> c -> a > foldbits _ z 0  = z > foldBits f z bs = foldBits' f 0 bs z > > foldBits' :: Bits c => (a -> Int -> a) -> Int -> c -> a -> a > foldBits' f i bs z > >      | bs == 0 = z >      | otherwise = z' `seq` foldBits' f i' bs' z' > >      where z' | 1 == bs .&. 1 = f z i                 ^^^^^^^^^^^^^^^^^ testbit bs 0 ? and foldbits(') is only used for c = Word, so why the polymorphism? > >                      | otherwise =  z > >                  i' = i + 1 >                 bs' = bs `shiftR` 1 > > ps.  I was impressed with how hairy DF's algorithm is and I am not Now there are comments, I hope they explain what I do. > really enough interested in Sudoku to spend the  time needed to grok > it.  So, I decided to try an experiment to see if I could restructure > it without understanding it very deeply. > > Step 1. comment out all the type signatures. > Step 2. find the main place that I wanted to change from [Int]  to > (Set Int) > Step 3. compile; make obvious edits; repeat until 0 errors > > I had it running in a few minutes.  I can't imagine doing that in any > other programming environment! Great! Triple Cheer for Haskell!!! I wonder how different your translation is to mine (I've no experience with bit-twiddling, but I tried to be as cheap as possible) > > Cheers, David > Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt."         -- Blair P. Houghton _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

## Re: Understanding allocation behavior

 Hello Daniel, Saturday, April 8, 2006, 4:21:14 AM, you wrote: > Unless I overlooked something, I use foldBits only via size (though that's > used a lot). size of set? there is much faster method - use a table [0..255] -> number of bits in this number seen as set then we split Word to the bytes and count total size of set by adding number of bits set in each byte foldBits can be made faster (may be) by adding strict annotations: foldBits :: Bits c => (a -> Int -> a) -> a -> c -> a foldbits _ z bs | z `seq` bs `seq` False  = undefined foldBits' :: Bits c => (a -> Int -> a) -> Int -> c -> a -> a foldbits' _ i bs z | i `seq` bs `seq` z `seq` False  = undefined moreover, GHC don't inline recursive functions! so foldbits' is out of luck and it seems that GHC generates polymorphic version that is of course very-very slow. what you can do? 1. use SPECIALIZE pragma. this allow to make faster version at least for typical cases (a=c=Int, for example) 2. use recursion on the internal foldbits' function. may be this will help to inline and therefore specialize each call to foldbits'. it's better to ask Simon Marlow about this -- Best regards,  Bulat                            mailto:[hidden email] _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

## Re: Re: Understanding allocation behavior

Open this post in threaded view
|

## Re: Re: Understanding allocation behavior

 Thanks Bulat and Robert.  I implemented Bulat's idea as the   following.  It tests faster than Roberts.  I use Robert's to compute   the table.  The performance seems satisfactory now. size :: Set a -> Int size (Set w) = countBits w      where        countBits w            | w == 0 = 0            | otherwise = countBits (w `shiftR` 8) + bitsTable!(w .&.   0xFF) bitsTable :: Array Word Int bitsTable = array (0,255) \$ [(i,bitcount i) | i <- [0..255]] bitcount :: Word -> Int bitcount 0 = 0 bitcount x = 1 + bitcount (x .&. (x-1)) On Apr 8, 2006, at 1:21 PM, Robert Dockins wrote: > On Apr 8, 2006, at 4:24 AM, Bulat Ziganshin wrote: > >> Hello Daniel, >> >> Saturday, April 8, 2006, 4:21:14 AM, you wrote: >> >>> Unless I overlooked something, I use foldBits only via size   >>> (though that's >>> used a lot). >> >> size of set? there is much faster method - use a table >> >> [0..255] -> number of bits in this number seen as set > > Or: > > bitcount :: Word -> Int > bitcount 0 = 0 > bitcount x = 1 + bitcount (x .&. (x-1)) > > -- | /O(1)/. The number of elements in the set. > size :: Set a -> Int > size (Set w) = bitcount w -------------------------------- David F. Place mailto:[hidden email] _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

## Re: Re: Understanding allocation behavior

 In reply to this post by Bulat Ziganshin-2 On Apr 8, 2006, at 4:24 AM, Bulat Ziganshin wrote: > foldBits can be made faster (may be) by adding strict annotations: > > foldBits :: Bits c => (a -> Int -> a) -> a -> c -> a > foldbits _ z bs | z `seq` bs `seq` False  = undefined > > foldBits' :: Bits c => (a -> Int -> a) -> Int -> c -> a -> a > foldbits' _ i bs z | i `seq` bs `seq` z `seq` False  = undefined Indeed, I had tried this.  It is slower for reasons that are   mysterious to me. -------------------------------- David F. Place mailto:[hidden email] _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

## Re: Understanding allocation behavior

 In reply to this post by David Place Hello David, Saturday, April 8, 2006, 9:58:56 PM, you wrote: > bitsTable :: Array Word Int > bitsTable = array (0,255) \$ [(i,bitcount i) | i <- [0..255]] bitsTable :: UArray Word Int bitsTable = listArray (0,255) \$ map bitcount [0..255] UArray is much faster than Array but can be used only for simple datatypes (integral, floating point, Char, Bool). more info at http://haskell.org/haskellwiki/Arrays  btw: you can use "UArray a Bool" to represent sets of ANY type `a` belonging to class Ix. It uses only one bit per elemnt in current implementation -- Best regards,  Bulat                            mailto:[hidden email] _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

## Re: Re: Understanding allocation behavior

 In reply to this post by David Place On Apr 8, 2006, at 1:58 PM, David F. Place wrote: > Thanks Bulat and Robert.  I implemented Bulat's idea as the   > following.  It tests faster than Roberts.  I use Robert's to   > compute the table.  The performance seems satisfactory now. > > size :: Set a -> Int > size (Set w) = countBits w >     where >       countBits w >           | w == 0 = 0 >           | otherwise = countBits (w `shiftR` 8) + bitsTable!(w .&.   > 0xFF) > > bitsTable :: Array Word Int > bitsTable = array (0,255) \$ [(i,bitcount i) | i <- [0..255]] > > bitcount :: Word -> Int > bitcount 0 = 0 > bitcount x = 1 + bitcount (x .&. (x-1)) There's a couple of other nice bit-twiddily things you can do: countBits :: Word -> Int countBits w     | w == 0 = 0     | otherwise = countBits (w `shiftR` 8) + bitsTable!(w .&. 0xFF) bitsTable :: Array Word Int bitsTable = array (0,255) \$ [(i,bitcount i) | i <- [0..255]] bitcount :: Word -> Int bitcount 0 = 0 bitcount x = 1 + bitcount (x .&. (x-1)) lsb :: Word -> Int lsb x = countBits ((x-1) .&. (complement x)) -- stolen from http://aggregate.org/MAGIC/msb :: Word -> Int msb x0 = let       x1 = x0 .|. (x0 `shiftR` 1)       x2 = x1 .|. (x1 `shiftR` 2)       x3 = x2 .|. (x2 `shiftR` 4)       x4 = x3 .|. (x3 `shiftR` 8)       x5 = x4 .|. (x4 `shiftR` 16)       in countBits x5 - 1 findMinIndex :: Word -> Int findMinIndex 0 =      error "EnumSet.findMin: empty set has no minimal element" findMinIndex w = lsb w findMaxIndex :: Word -> Int findMaxIndex 0 =      error "EnumSet.findMax: empty set has no maximal element" findMaxIndex w = msb w Which should make all access to the greatest or least element O(1).   I guess, come to think of it, all operations on EnumSet are O(1) by   virtue of the set size being upper-bounded.  At any rate this turns   recursion into unboxable straight-line code and I think it does less   allocations. Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank.            -- TMBG _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

## Re: Understanding allocation behavior

 Hum, oddly, these actually slow things down. While the new size brought the sudoku17 time from ~570s down to ~490s, the new findMinIndex/findMaxIndex increased the time to ~515s, although hardly used. Why? Cheers, Daniel Am Sonntag, 9. April 2006 00:54 schrieb Robert Dockins: > On Apr 8, 2006, at 1:58 PM, David F. Place wrote: > > Thanks Bulat and Robert.  I implemented Bulat's idea as the > > following.  It tests faster than Roberts.  I use Robert's to > > compute the table.  The performance seems satisfactory now. > > > > size :: Set a -> Int > > size (Set w) = countBits w > >     where > >       countBits w > > > >           | w == 0 = 0 > >           | otherwise = countBits (w `shiftR` 8) + bitsTable!(w .&. 0xFF) > > > > bitsTable :: Array Word Int > > bitsTable = array (0,255) \$ [(i,bitcount i) | i <- [0..255]] > > > > bitcount :: Word -> Int > > bitcount 0 = 0 > > bitcount x = 1 + bitcount (x .&. (x-1)) > > There's a couple of other nice bit-twiddily things you can do: > > countBits :: Word -> Int > countBits w > >     | w == 0 = 0 >     | otherwise = countBits (w `shiftR` 8) + bitsTable!(w .&. 0xFF) > > bitsTable :: Array Word Int > bitsTable = array (0,255) \$ [(i,bitcount i) | i <- [0..255]] > > bitcount :: Word -> Int > bitcount 0 = 0 > bitcount x = 1 + bitcount (x .&. (x-1)) > > lsb :: Word -> Int > lsb x = countBits ((x-1) .&. (complement x)) > > -- stolen from http://aggregate.org/MAGIC/> msb :: Word -> Int > msb x0 = let >       x1 = x0 .|. (x0 `shiftR` 1) >       x2 = x1 .|. (x1 `shiftR` 2) >       x3 = x2 .|. (x2 `shiftR` 4) >       x4 = x3 .|. (x3 `shiftR` 8) >       x5 = x4 .|. (x4 `shiftR` 16) >       in countBits x5 - 1 > > > findMinIndex :: Word -> Int > findMinIndex 0 = >      error "EnumSet.findMin: empty set has no minimal element" > findMinIndex w = lsb w > > findMaxIndex :: Word -> Int > findMaxIndex 0 = >      error "EnumSet.findMax: empty set has no maximal element" > findMaxIndex w = msb w > > > > Which should make all access to the greatest or least element O(1). > I guess, come to think of it, all operations on EnumSet are O(1) by > virtue of the set size being upper-bounded.  At any rate this turns > recursion into unboxable straight-line code and I think it does less > allocations. > > > > Rob Dockins > > Speak softly and drive a Sherman tank. > Laugh hard; it's a long way to the bank. >            -- TMBG > _______________________________________________ > Haskell-Cafe mailing list > [hidden email] > http://www.haskell.org/mailman/listinfo/haskell-cafe-- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt."         -- Blair P. Houghton _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

## Re: Understanding allocation behavior

 In reply to this post by Bugzilla from robdockins@fastmail.fm Hello Robert, Sunday, April 9, 2006, 2:54:58 AM, you wrote: > findMinIndex :: Word -> Int > findMaxIndex :: Word -> Int on the other side, these procedures can use the same divide-to-bytes technique as `size` findMinIndex 0 = undefined findMinIndex n = case (n `shiftR` 8) of                    0 -> minIndexInByte ! (n .&. 255)                    b -> 8 + findMinIndex b something like this should work -- Best regards,  Bulat                            mailto:[hidden email] _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe
Open this post in threaded view
|

## Re: Re: Understanding allocation behavior

 Hi Bulat, On Apr 9, 2006, at 6:31 AM, Bulat Ziganshin wrote: > on the other side, these procedures can use the same divide-to-bytes > technique as `size` > > findMinIndex 0 = undefined > findMinIndex n = case (n `shiftR` 8) of >                    0 -> minIndexInByte ! (n .&. 255) >                    b -> 8 + findMinIndex b > > something like this should work In an email to me, Jean-Philippe Bernardy expressed a concern that a   large table could needlessly fill the data cache.  He proposed   checking 4 bits at a time and using a small table of 16 elements.   Not surprisingly, it isn't as fast.  I wonder what you think of   this.  Also, I wonder what would be a good test to demonstrate this   possible interaction with the cache. Cheers, David ps.  Thanks for the tip about UArray. -------------------------------- David F. Place mailto:[hidden email] _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe