Int vs Word performance?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Int vs Word performance?

Claus Reinke
Looking at prelude/PrelRules.hs has reminded me of an old
conundrum: if I switch from Int to Word, should I expect any
performance differences?

A while ago, I needed lots of fairly small positive numbers,
together with a small number of flags for each, so I thought
I'd switch from Int to Word, and map the flags to bits. But
the performance dropped so drastically that I went back
to Int, slightly complicating the bitmaps. I didn't expect that,
and I can't see any builtin rules for Int that would have no
Word counterpart.

Here is a trivial example with drastic difference between
T = Int and T = Word (~2.5x here):

    main = print $ foldl' (+) 0 [1..100000000::T]

What am I missing here?

Also, in the real code I ended up seeing things like this in
the -ddump-simpl output for the bit-fiddling code:

    GHC.Prim.word2Int#
        (GHC.Prim.and#
            (GHC.Prim.int2Word# wild13_XbE)
            (GHC.Prim.int2Word# y#_a4EZ))

Is that likely to cost me a lot or are these conversions cheap?

Claus

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: Int vs Word performance?

Don Stewart-2
claus.reinke:

> Looking at prelude/PrelRules.hs has reminded me of an old
> conundrum: if I switch from Int to Word, should I expect any
> performance differences?
>
> A while ago, I needed lots of fairly small positive numbers,
> together with a small number of flags for each, so I thought
> I'd switch from Int to Word, and map the flags to bits. But the
> performance dropped so drastically that I went back
> to Int, slightly complicating the bitmaps. I didn't expect that,
> and I can't see any builtin rules for Int that would have no
> Word counterpart.
>
> Here is a trivial example with drastic difference between
> T = Int and T = Word (~2.5x here):
>
>    main = print $ foldl' (+) 0 [1..100000000::T]
>
> What am I missing here?
>
> Also, in the real code I ended up seeing things like this in
> the -ddump-simpl output for the bit-fiddling code:
>
>    GHC.Prim.word2Int#
>        (GHC.Prim.and#
>            (GHC.Prim.int2Word# wild13_XbE)
>            (GHC.Prim.int2Word# y#_a4EZ))
>
> Is that likely to cost me a lot or are these conversions cheap?

Those guys are no-ops, and in general you should never see a performance
difference. If you do, it is a bug!  There are some known cases where
rules are missing however:

  * Conversions to Int from Double for ceil, floor, round are missing rules.
            http://hackage.haskell.org/trac/ghc/ticket/2271

  * gcd only has a specialised implementation for Int,
            http://hackage.haskell.org/trac/ghc/ticket/2270

Some others I'm aware of are product/sum/maximum/minimum
on lists have specialisations for some atomic types (Int, Integer) but
not all (needs a ticket for this too).

I'm not aware of any remaining "theoretically noop" conversions that
aren't in fact implemented noops now. If you find them, please open a
ticket.

-- Don
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: Int vs Word performance?

Malcolm Wallace
In reply to this post by Claus Reinke
"Claus Reinke" <[hidden email]> wrote:

> A while ago, I needed lots of fairly small positive numbers,
> together with a small number of flags for each, so I thought
> I'd switch from Int to Word, and map the flags to bits.

Since there are few guarantees about the size of a Word (or Int), surely
it would be better to choose a definitely sized basic type, e.g. Word8
or Word16?  I vaguely recall that ghc used to generate better code for
definitely sized WordN than the generic unguaranteed-size Word.

Regards,
    Malcolm
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: Int vs Word performance?

Claus Reinke
In reply to this post by Don Stewart-2
>> Here is a trivial example with drastic difference between
>> T = Int and T = Word (~2.5x here):
>>
>>    main = print $ foldl' (+) 0 [1..100000000::T]
..

>>    GHC.Prim.word2Int#
>>        (GHC.Prim.and#
>>            (GHC.Prim.int2Word# wild13_XbE)
>>            (GHC.Prim.int2Word# y#_a4EZ))
>>
>> Is that likely to cost me a lot or are these conversions cheap?
>
> Those guys are no-ops, and in general you should never see a performance
> difference. If you do, it is a bug!  There are some known cases where
> rules are missing however:

Thanks, that is one thing less to worry about. Btw, is there a "guide to
reading Core" somewhere, with emphasis on performance aspects (what
to look for when optimizing time or space usage, what to ignore, how to
make it more readable, etc)?

Until I stumbled over CORE annotations, I found it near impossible even
to find the pieces of interest for non-trivial programs, things like
-dsuppress-uniques help a little with diffs, some things look big but
are noops, etc. - that kind of helpful pragmatic knowledge (why does
it look as if source variable names aren't always preserved; why does
it use random uniques instead of de Bruijn-style disambiguation, which
wouldn't interfere with diffs and would have static semantic content;
why do the outputs look different for core2core vs dump-simpl, ..).

> Some others I'm aware of are product/sum/maximum/minimum
> on lists have specialisations for some atomic types (Int, Integer) but
> not all (needs a ticket for this too).

A quick grep shows almost no specialization at all for Word, or for
IntXX/WordXX (see below). Still, none of that seems to explain the
example repeated at the top of this message.

Claus

$ find libraries/ -name _darcs -prune -o -name *hs | xargs grep SPECIAL | grep '\<Int\|\<Word'
libraries/base/Data/List.hs:{-# SPECIALISE sum     :: [Int] -> Int #-}
libraries/base/Data/List.hs:{-# SPECIALISE sum     :: [Integer] -> Integer #-}
libraries/base/Data/List.hs:{-# SPECIALISE product :: [Int] -> Int #-}
libraries/base/Data/List.hs:{-# SPECIALISE product :: [Integer] -> Integer #-}
libraries/base/GHC/Arr.lhs:    {-# SPECIALISE instance Ix (Int,Int) #-}
libraries/base/GHC/Arr.lhs:    {-# SPECIALISE instance Ix (Int,Int,Int) #-}
libraries/base/GHC/Float.lhs:    {-# SPECIALIZE properFraction :: Float -> (Int, Float) #-}
libraries/base/GHC/Float.lhs:    {-# SPECIALIZE round    :: Float -> Int #-}
libraries/base/GHC/Float.lhs:    {-# SPECIALIZE properFraction :: Float  -> (Integer, Float) #-}
libraries/base/GHC/Float.lhs:    {-# SPECIALIZE round    :: Float -> Integer #-}
libraries/base/GHC/Float.lhs:    {-# SPECIALIZE properFraction :: Double -> (Int, Double) #-}
libraries/base/GHC/Float.lhs:    {-# SPECIALIZE round    :: Double -> Int #-}
libraries/base/GHC/Float.lhs:    {-# SPECIALIZE properFraction :: Double -> (Integer, Double) #-}
libraries/base/GHC/Float.lhs:    {-# SPECIALIZE round    :: Double -> Integer #-}
libraries/base/GHC/Real.lhs:{-# SPECIALISE (%) :: Integer -> Integer -> Rational #-}
libraries/base/GHC/Real.lhs:{-# SPECIALISE reduce :: Integer -> Integer -> Rational #-}
libraries/base/GHC/Real.lhs:{-# SPECIALISE lcm :: Int -> Int -> Int #-}
libraries/bytestring/Data/ByteString/Internal.hs:{-# SPECIALIZE unpackWith :: (Word8 -> Char) ->
ByteString -> [Char] #-}
libraries/bytestring/Data/ByteString/Internal.hs:{-# SPECIALIZE packWith :: (Char -> Word8) ->
[Char] -> ByteString #-}
libraries/bytestring/Data/ByteString/Lazy.hs:{-# SPECIALIZE packWith :: (Char -> Word8) -> [Char] ->
ByteString #-}
libraries/bytestring/Data/ByteString/Lazy.hs:{-# SPECIALIZE unpackWith :: (Word8 -> Char) ->
ByteString -> [Char] #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE lookupTree :: Int -> FingerTree (Elem a) ->
Place (Elem a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE lookupTree :: Int -> FingerTree (Node a) ->
Place (Node a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE lookupNode :: Int -> Node (Elem a) -> Place
(Elem a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE lookupNode :: Int -> Node (Node a) -> Place
(Node a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE lookupDigit :: Int -> Digit (Elem a) -> Place
(Elem a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE lookupDigit :: Int -> Digit (Node a) -> Place
(Node a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE adjustTree :: (Int -> Elem a -> Elem a) ->
Int -> FingerTree (Elem a) -> FingerTree (Elem a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE adjustTree :: (Int -> Node a -> Node a) ->
Int -> FingerTree (Node a) -> FingerTree (Node a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE adjustNode :: (Int -> Elem a -> Elem a) ->
Int -> Node (Elem a) -> Node (Elem a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE adjustNode :: (Int -> Node a -> Node a) ->
Int-> Node (Node a) -> Node (Node a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE adjustDigit :: (Int -> Elem a -> Elem a) ->
Int -> Digit (Elem a) -> Digit (Elem a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE adjustDigit :: (Int -> Node a -> Node a) ->
Int -> Digit (Node a) -> Digit (Node a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE splitTree :: Int -> FingerTree (Elem a) ->
Split (FingerTree (Elem a)) (Elem a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE splitTree :: Int -> FingerTree (Node a) ->
Split (FingerTree (Node a)) (Node a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE splitNode :: Int -> Node (Elem a) -> Split
(Maybe (Digit (Elem a))) (Elem a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE splitNode :: Int -> Node (Node a) -> Split
(Maybe (Digit (Node a))) (Node a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE splitDigit :: Int -> Digit (Elem a) -> Split
(Maybe (Digit (Elem a))) (Elem a) #-}
libraries/containers/Data/Sequence.hs:{-# SPECIALIZE splitDigit :: Int -> Digit (Node a) -> Split
(Maybe (Digit (Node a))) (Node a) #-}
libraries/parallel/Control/Parallel/Strategies.hs:{-# SPECIALISE seqListN :: Int -> Strategy b ->
Strategy [b] #-}


_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: Int vs Word performance?

Claus Reinke
In reply to this post by Malcolm Wallace
>> A while ago, I needed lots of fairly small positive numbers,
>> together with a small number of flags for each, so I thought
>> I'd switch from Int to Word, and map the flags to bits.
>
> Since there are few guarantees about the size of a Word (or Int), surely
> it would be better to choose a definitely sized basic type, e.g. Word8
> or Word16?  

Good point in principle, and I would indeed prefer a specific size.
Unfortunately, I found just the opposite of this

> I vaguely recall that ghc used to generate better code for
> definitely sized WordN than the generic unguaranteed-size Word.

to be true (although I don't recall whether I checked with IntN or
WordN, and I don't have a small example for this issue): Even
just replacing Int with Int32 on a system where that should be
the same was liable to reduce performance..

Given Don's point about SPECIALI[ZS]E, and the lack of
specialisations for IntN/WordN, that might explain it?

Claus

PS. perhaps on newer 64 bit machines, explicitly selecting
    32 bits can offer savings? I'd certainly expect selecting
    64 bits on a 32 bit machine to lead to slowdowns.

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: Int vs Word performance?

Simon Marlow-7
In reply to this post by Claus Reinke
Claus Reinke wrote:
>>> Here is a trivial example with drastic difference between
>>> T = Int and T = Word (~2.5x here):
>>>
>>>    main = print $ foldl' (+) 0 [1..100000000::T]
> ..

> A quick grep shows almost no specialization at all for Word, or for
> IntXX/WordXX (see below). Still, none of that seems to explain the
> example repeated at the top of this message.

The Enum instance for Int uses specialised implementations of enumFromTo
and friends, whereas the Word version uses the generic integralEnumFromTo.

Cheers,
        Simon
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

RE: Int vs Word performance?

Simon Peyton Jones
In reply to this post by Claus Reinke
| Until I stumbled over CORE annotations, I found it near impossible even
| to find the pieces of interest for non-trivial programs, things like
| -dsuppress-uniques help a little with diffs, some things look big but
| are noops, etc. - that kind of helpful pragmatic knowledge (why does
| it look as if source variable names aren't always preserved; why does
| it use random uniques instead of de Bruijn-style disambiguation, which
| wouldn't interfere with diffs and would have static semantic content;
| why do the outputs look different for core2core vs dump-simpl, ..).

Many of these things might be fixable if someone thought about the specification carefully.  The current core pretty-printer was initially designed only for GHC internals hackers, rather than Joe User.

A first step might be for a posse of Joe Users to specify what they want, as precisely as possible.  Then this same posse might even write a Core pretty-printer to achieve it. I am happy to advise.  Then we could substitute new for old.

| A quick grep shows almost no specialization at all for Word, or for
| IntXX/WordXX (see below). Still, none of that seems to explain the
| example repeated at the top of this message.

We'd be delighted to apply suitable library patches.

Simon
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

RE: Int vs Word performance?

Simon Peyton Jones
| | A quick grep shows almost no specialization at all for Word, or for
| | IntXX/WordXX (see below). Still, none of that seems to explain the
| | example repeated at the top of this message.
|
| We'd be delighted to apply suitable library patches.

PS: in the case that no one gets around to creating such a patch, creating a ticket that documents the problem and points to the needed specialisations would be a start
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: Int vs Word performance?

Simon Marlow-7
In reply to this post by Claus Reinke
Claus Reinke wrote:
> PS. perhaps on newer 64 bit machines, explicitly selecting
>    32 bits can offer savings? I'd certainly expect selecting
>    64 bits on a 32 bit machine to lead to slowdowns.

Unlikely, I'd have thought.  We implement all the explicitly sized integral
types by zero-extending or sign-extending to the size of an Int/Word.
That's why there are no primitive types or operations for Word8#, Word16#, etc.

Cheers,
        Simon
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: Int vs Word performance?

Don Stewart-2
In reply to this post by Claus Reinke
claus.reinke:

>>> Here is a trivial example with drastic difference between
>>> T = Int and T = Word (~2.5x here):
>>>
>>>    main = print $ foldl' (+) 0 [1..100000000::T]
> ..
>>>    GHC.Prim.word2Int#
>>>        (GHC.Prim.and#
>>>            (GHC.Prim.int2Word# wild13_XbE)
>>>            (GHC.Prim.int2Word# y#_a4EZ))
>>>
>>> Is that likely to cost me a lot or are these conversions cheap?
>>
>> Those guys are no-ops, and in general you should never see a performance
>> difference. If you do, it is a bug!  There are some known cases where
>> rules are missing however:
>
> Thanks, that is one thing less to worry about. Btw, is there a "guide to
> reading Core" somewhere, with emphasis on performance aspects (what
> to look for when optimizing time or space usage, what to ignore, how to
> make it more readable, etc)?
>
> Until I stumbled over CORE annotations, I found it near impossible even
> to find the pieces of interest for non-trivial programs, things like
> -dsuppress-uniques help a little with diffs, some things look big but
> are noops, etc. - that kind of helpful pragmatic knowledge (why does
> it look as if source variable names aren't always preserved; why does
> it use random uniques instead of de Bruijn-style disambiguation, which
> wouldn't interfere with diffs and would have static semantic content;
> why do the outputs look different for core2core vs dump-simpl, ..).

I use the ghc-core tool:

    http://hackage.haskell.org/cgi-bin/hackage-scripts/package/ghc-core
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: Int vs Word performance?

Don Stewart-2
In reply to this post by Claus Reinke
claus.reinke:

>>> Here is a trivial example with drastic difference between
>>> T = Int and T = Word (~2.5x here):
>>>
>>>    main = print $ foldl' (+) 0 [1..100000000::T]
> ..
>>>    GHC.Prim.word2Int#
>>>        (GHC.Prim.and#
>>>            (GHC.Prim.int2Word# wild13_XbE)
>>>            (GHC.Prim.int2Word# y#_a4EZ))
>>>
>>> Is that likely to cost me a lot or are these conversions cheap?
>>
>> Those guys are no-ops, and in general you should never see a performance
>> difference. If you do, it is a bug!  There are some known cases where
>> rules are missing however:
>
> Thanks, that is one thing less to worry about. Btw, is there a "guide to
> reading Core" somewhere, with emphasis on performance aspects (what
> to look for when optimizing time or space usage, what to ignore, how to
> make it more readable, etc)?
>
> Until I stumbled over CORE annotations, I found it near impossible even
> to find the pieces of interest for non-trivial programs, things like
> -dsuppress-uniques help a little with diffs, some things look big but
> are noops, etc. - that kind of helpful pragmatic knowledge (why does
> it look as if source variable names aren't always preserved; why does
> it use random uniques instead of de Bruijn-style disambiguation, which
> wouldn't interfere with diffs and would have static semantic content;
> why do the outputs look different for core2core vs dump-simpl, ..).
>
>> Some others I'm aware of are product/sum/maximum/minimum
>> on lists have specialisations for some atomic types (Int, Integer) but
>> not all (needs a ticket for this too).
>
> A quick grep shows almost no specialization at all for Word, or for
> IntXX/WordXX (see below). Still, none of that seems to explain the
> example repeated at the top of this message.

We do need to decide on if we want to add specializations for all atomic
types in general, and if so, then let'd do that intentionally.

Does anyone see a reason not to do it in the libraries, via rules?
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: Int vs Word performance?

Don Stewart-2
In reply to this post by Simon Marlow-7
marlowsd:

> Claus Reinke wrote:
>>>> Here is a trivial example with drastic difference between
>>>> T = Int and T = Word (~2.5x here):
>>>>
>>>>    main = print $ foldl' (+) 0 [1..100000000::T]
>> ..
>
>> A quick grep shows almost no specialization at all for Word, or for
>> IntXX/WordXX (see below). Still, none of that seems to explain the
>> example repeated at the top of this message.
>
> The Enum instance for Int uses specialised implementations of enumFromTo  
> and friends, whereas the Word version uses the generic
> integralEnumFromTo.

Another good reason to use uvector,

    import Data.Array.Vector
    import Data.Word

    main = print $ foldlU (+) (0::T) (enumFromToU 1 (100000000::T))

type T = Word


    $wfold :: Word# -> Word# -> Word#

    $wfold =
      \ (ww_s1cg :: Word#) (ww1_s1ck :: Word#) ->
        case gtWord# ww1_s1ck __word 100000000 of wild_a19p {
          False ->
            $wfold
              (plusWord# ww_s1cg ww1_s1ck)
              (plusWord# ww1_s1ck __word 1);
          True -> ww_s1cg

Yields:

    Main_zdwfold_info:
    .Lc1e1:
      cmpq $100000000,%rdi
      ja .Lc1e4
      leaq 1(%rdi),%rax
      addq %rdi,%rsi
      movq %rax,%rdi
      jmp Main_zdwfold_info

While at

    type T = Int

We get:

    $wfold :: Int# -> Int# -> Int#

    $wfold =
      \ (ww_s144 :: Int#) (ww1_s148 :: Int#) ->
        case ># ww1_s148 100000000 of wild_a11q {
          False ->
            $wfold
              (+# ww_s144 ww1_s148) (+# ww1_s148 1);
          True -> ww_s144

And *identical assembly*

    Main_zdwfold_info:
    .Lc15E:
      cmpq $100000000,%rdi
      jg .Lc15H
      leaq 1(%rdi),%rax
      addq %rdi,%rsi
      movq %rax,%rdi
      jmp Main_zdwfold_info

-- Don
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Reply | Threaded
Open this post in threaded view
|

Re: Int vs Word performance?

Claus Reinke
In reply to this post by Simon Peyton Jones
| | A quick grep shows almost no specialization at all for Word, or for
| | IntXX/WordXX (see below). Still, none of that seems to explain the
| | example repeated at the top of this message.
|
| We'd be delighted to apply suitable library patches.

>PS: in the case that no one gets around to creating such a patch,
>creating a ticket that documents the problem and points to the
>needed specialisations would be a start

Since more that just SPECIALI[SZ]E pragmas seem involved,
I've created a ticket to summarize this thread's findings:

http://hackage.haskell.org/trac/ghc/ticket/3055

Claus

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users