Integer constant folding in the presence of new primops

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Jan Stolarek
I'm finishing my patches for T6135, but I still have some work to do on integer-gmp and
integer-simpl libraries. I changed the previously existing functions that compare Integers:

{-# NOINLINE #-}
eqInteger :: Integer -> Integer -> Bool
eqInteger = ...

to something like this:

{-# NOINLINE eqIntegerPrim #-}
eqIntegerPrim :: Integer -> Integer -> Int#
eqIntegerPrim ... -- same as before, but uses new primops

eqInteger :: Integer -> Integer -> Bool
eqInteger  a b = tagToEnum# (a `eqIntegerPrim`  b)

I noticed that in some cases this implementation prevents constant folding for Integers. I'm
trying to figure out the proper way to fix this. I am considering two ways of doing it:

- add new folding rules for eqIntegerPrim and other functions, leave the existing rules for
eqInteger. This should fold: (100012 :: Integer) == 100012 to True (as it does now)

- add new folding rules for eqIntegerPrim and other functions, remove the existing rules for
eqInteger. This would fold (100012 :: Integer) == 100012 to tagToEnum# 1#

Which approach is better?

Another question related to this: how do I run validation with integer-simpl enabled? Wiki
describes how to enable integer-simpl for the build, but I think that validation is always run
against integer-gmp? Or am I wrong here?

Janek


Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Simon Peyton Jones
All the same issues arise for Int# too, right?  


| {-# NOINLINE eqIntegerPrim #-}
| eqIntegerPrim :: Integer -> Integer -> Int#
| eqIntegerPrim ... -- same as before, but uses new primops
|
| eqInteger :: Integer -> Integer -> Bool
| eqInteger  a b = tagToEnum# (a `eqIntegerPrim`  b)
|
| I noticed that in some cases this implementation prevents constant
| folding for Integers.

Why?  Because eqInteger is now inlined, so the rule doesn't get a chance to fire?

| A add new folding rules for eqIntegerPrim and other functions, leave the
| existing rules for
| eqInteger. This should fold: (100012 :: Integer) == 100012 to True (as
| it does now)
|
| B add new folding rules for eqIntegerPrim and other functions, remove
| the existing rules for
| eqInteger. This would fold (100012 :: Integer) == 100012 to tagToEnum#
| 1#

I'm not sure I understand, but I think that the difference between (A) and (B) is whether the existing rules for eqInteger remain or are removed. I'd remove them if you can; simpler, only one way for things to happen

S



Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Jan Stolarek
> All the same issues arise for Int# too, right?
I guess you're right :) I removed the rules for wrappers (eqInteger) and created rules
for "primOps" (eqIntegerPrim, which I think I'll rename to eqInteger#)

> Why?  Because eqInteger is now inlined, so the rule doesn't get a chance to
> fire?
There's something very subtle going on with inlining and I can't grasp what it is. At the moment I
have this in prelude/PrelNames.lhs:

eqIntegerPrimIdKey      = mkPreludeMiscIdUnique 70
eqIntegerPrimName     = varQual gHC_INTEGER_TYPE (fsLit "eqIntegerPrim")     eqIntegerPrimIdKey

and in prelude/PrelRules.lhs I have this:

builtinIntegerRules = [
  ...
   rule_binop_Prim     "eqIntegerPrim"       eqIntegerPrimName       (==),
  ...]
    where
          rule_binop_Prim str name op
           = BuiltinRule { ru_name = fsLit str, ru_fn = name, ru_nargs = 2,
                           ru_try = match_Integer_binop_Prim op }

match_Integer_binop_Prim binop dflags _ id_unf [xl, yl]
  | Just (LitInteger x _) <- exprIsLiteral_maybe id_unf xl
  , Just (LitInteger y _) <- exprIsLiteral_maybe id_unf yl
  = Just (if x `binop` y then trueValInt dflags else falseValInt dflags)
match_Integer_binop_Prim _ _ _ _ _ = Nothing

My understanding is that this rule will fire if it finds a function "eqIntegerPrim" with two
Integer literals as parameters. Given my definitions in integer-gmp library:

{-# NOINLINE eqIntegerPrim #-}
eqIntegerPrim :: Integer -> Integer -> Int#
eqIntegerPrim ... -- same as before, but uses new primops

{-# INLINE eqInteger #-}
eqInteger :: Integer -> Integer -> Bool
eqInteger ?a b = tagToEnum# (a `eqIntegerPrim` ?b)

my understanding is that in expression:

    eqIntegerE = (100012 :: Integer) == 100012

the (==) gets inlined to eqInteger:

    eqIntegerE = (100012 :: Integer) `eqInteger` 100012

which in turn gets inlined to eqIntegerPrim:

    eqIntegerE = tagToEnum# ((100012 :: Integer) `eqIntegerPrim` 100012)

At this point inling stops (because of NOINLINE) and my rule fires giving:

    eqIntegerE = tagToEnum# 1#

which in turns allows the rule for tagToEnum# to fire, giving:

    eqIntegerE = GHC.Prim.True

Now here's the tricky part. I'm testing this with test lib/integer/integerConstantFolding in the
testsuite and the test fails because rules for quotRemInteger, divModInteger, quotInteger and
remInteger don't fire, leaving the constants unfolded. I noticed that if I mark eqInteger with
NOINLINE, then these rules fire, but then obviously comparisons like (100012 :: Integer) ==
100012 don't get folded and the test fails anyway. I'm analyzing how the function quotInteger and
others use eqInteger, but I don't see anything that would be obvious.

Janek


Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Ian Lynagh-2
On Fri, May 17, 2013 at 11:49:26AM +0200, Jan Stolarek wrote:
>
> Now here's the tricky part. I'm testing this with test lib/integer/integerConstantFolding in the
> testsuite and the test fails because rules for quotRemInteger, divModInteger, quotInteger and
> remInteger don't fire, leaving the constants unfolded. I noticed that if I mark eqInteger with
> NOINLINE, then these rules fire, but then obviously comparisons like (100012 :: Integer) ==
> 100012 don't get folded and the test fails anyway. I'm analyzing how the function quotInteger and
> others use eqInteger, but I don't see anything that would be obvious.

If you remove everything but the quotInteger test from
integerConstantFolding and compile with -ddump-rule-rewrites then you'll
see that the eqInteger rule fires before quotInteger. This is presumably
comparing against 0, as the definition of quot for Integer (in GHC.Real)
is
    _ `quot` 0 = divZeroError
    n `quot` d = n `quotInteger` d

> eqIntegerPrimIdKey      = mkPreludeMiscIdUnique 70
> eqIntegerPrimName     = varQual gHC_INTEGER_TYPE (fsLit "eqIntegerPrim")     eqIntegerPrimIdKey

Do you also still have eqInteger wired in? It sounds like you might have
given them both the same unique?


Thanks
Ian
--
Ian Lynagh, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/


Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Jan Stolarek
> If you remove everything but the quotInteger test from
> integerConstantFolding and compile with -ddump-rule-rewrites then you'll
> see that the eqInteger rule fires before quotInteger. This is presumably
> comparing against 0, as the definition of quot for Integer (in GHC.Real)
> is
>     _ `quot` 0 = divZeroError
>     n `quot` d = n `quotInteger` d
Yes, I noticed these two rules firing together - perhaps that's the explanation why. I created a
small program for testing:

main = print quotInt
quotInt :: Integer
quotInt = 100063 `quot` 156

I noticed that when I define eqInteger wrapper to be NOINLINE, the call to quot is translated to
Core as:

Main.quotInt =
  GHC.Real.$fIntegralInteger_$cquot
    (__integer 100063) (__integer 156)

but when I change the wrapper to INLINE I get:

Main.quotInt =
  GHC.Real.$fNumRatio_$cquot             <-------- NumRatio instead of IntegralInteger
    (__integer 100063) (__integer 156)

All rule firing happens later (I used -ddump-simpl-iterations -ddump-rule-firings), except that
for $fNumRatio_$cquot the quot rules don't fire.

> Do you also still have eqInteger wired in? It sounds like you might have
> given them both the same unique?
No, they didn't have the same unique. I modified the existing rules to work on the new primops and
ignore their wrappers. At the moment I reverted these changes so that I can make progress and
leave this problem for later.

Janek


Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Simon Peyton Jones
Jan

I'm conscious that we have not closed the loop on this thread, below.  (I think the attached email is also relevant.)

If it's not sorted out, can you open a ticket, put in the relevant info (so we don't need to look at the email trail), and we can tackle it when you get here.

Simon

-----Original Message-----
From: ghc-devs-bounces at haskell.org [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Jan Stolarek
Sent: 20 May 2013 12:35
To: Ian Lynagh
Cc: ghc-devs at haskell.org
Subject: Re: Integer constant folding in the presence of new primops

> If you remove everything but the quotInteger test from
> integerConstantFolding and compile with -ddump-rule-rewrites then
> you'll see that the eqInteger rule fires before quotInteger. This is
> presumably comparing against 0, as the definition of quot for Integer
> (in GHC.Real) is
>     _ `quot` 0 = divZeroError
>     n `quot` d = n `quotInteger` d
Yes, I noticed these two rules firing together - perhaps that's the explanation why. I created a small program for testing:

main = print quotInt
quotInt :: Integer
quotInt = 100063 `quot` 156

I noticed that when I define eqInteger wrapper to be NOINLINE, the call to quot is translated to Core as:

Main.quotInt =
  GHC.Real.$fIntegralInteger_$cquot
    (__integer 100063) (__integer 156)

but when I change the wrapper to INLINE I get:

Main.quotInt =
  GHC.Real.$fNumRatio_$cquot             <-------- NumRatio instead of IntegralInteger
    (__integer 100063) (__integer 156)

All rule firing happens later (I used -ddump-simpl-iterations -ddump-rule-firings), except that for $fNumRatio_$cquot the quot rules don't fire.

> Do you also still have eqInteger wired in? It sounds like you might
> have given them both the same unique?
No, they didn't have the same unique. I modified the existing rules to work on the new primops and ignore their wrappers. At the moment I reverted these changes so that I can make progress and leave this problem for later.

Janek

_______________________________________________
ghc-devs mailing list
ghc-devs at haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs
-------------- next part --------------
An embedded message was scrubbed...
From: Jan Stolarek <jan.stolarek at p.lodz.pl>
Subject: Re: Integer constant folding in the presence of new primops
Date: Fri, 17 May 2013 09:49:26 +0000
Size: 5187
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130619/ace0b4bd/attachment.eml>

Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Jan Stolarek
> If it's not sorted out, can you open a ticket, put in the relevant info (so
> we don't need to look at the email trail), and we can tackle it when you
> get here.
Currently there's a temporary workaround: I'm using new folding rules for all primitive types,
except for Integer, in which case I left the old folding rules unchanged. This of course should
be modified to make all rules uniform, but for now it at least passes validation. I didn't fill
the ticket, because the bug does not exist yet :) It only manifests itself in my patches, which
have not been applied yet. I'll add all the information from this discussion to my github fork of
GHC and then move it to Trac once the bug makes it to HEAD.

What worries me more about my patches is the performance regression in kahan, because I see no
obvious differences in the generated assembly.

Janek

>
> Simon
>
> -----Original Message-----
> From: ghc-devs-bounces at haskell.org [mailto:ghc-devs-bounces at haskell.org] On
> Behalf Of Jan Stolarek Sent: 20 May 2013 12:35
> To: Ian Lynagh
> Cc: ghc-devs at haskell.org
> Subject: Re: Integer constant folding in the presence of new primops
>
> > If you remove everything but the quotInteger test from
> > integerConstantFolding and compile with -ddump-rule-rewrites then
> > you'll see that the eqInteger rule fires before quotInteger. This is
> > presumably comparing against 0, as the definition of quot for Integer
> > (in GHC.Real) is
> >     _ `quot` 0 = divZeroError
> >     n `quot` d = n `quotInteger` d
>
> Yes, I noticed these two rules firing together - perhaps that's the
> explanation why. I created a small program for testing:
>
> main = print quotInt
> quotInt :: Integer
> quotInt = 100063 `quot` 156
>
> I noticed that when I define eqInteger wrapper to be NOINLINE, the call to
> quot is translated to Core as:
>
> Main.quotInt =
>   GHC.Real.$fIntegralInteger_$cquot
>     (__integer 100063) (__integer 156)
>
> but when I change the wrapper to INLINE I get:
>
> Main.quotInt =
>   GHC.Real.$fNumRatio_$cquot             <-------- NumRatio instead of
> IntegralInteger (__integer 100063) (__integer 156)
>
> All rule firing happens later (I used -ddump-simpl-iterations
> -ddump-rule-firings), except that for $fNumRatio_$cquot the quot rules
> don't fire.
>
> > Do you also still have eqInteger wired in? It sounds like you might
> > have given them both the same unique?
>
> No, they didn't have the same unique. I modified the existing rules to work
> on the new primops and ignore their wrappers. At the moment I reverted
> these changes so that I can make progress and leave this problem for later.
>
> Janek
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs




Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Nicolas Frisby
I'm also seeing performance regressions in the shootout benchmarks that I
can't identify in the asm. The new asm looks better but performs worse,
with a ~15% slowdown.

I fired up the performance counters in my CPU and the free Intel code for
inspecting them showed that my CPU utilization took about a 10% hit, even
while executing fewer total instructions.

  1) Jan, perhaps we're seeing the same sort of behavior ? the shootout
benchmarks have extremely hot loops (hundreds of millions of iterations
IIRC). I used ticky profiling too, and saw no suspicious changes in any
counters.

  2) Dear Low-level Gurus: How feasible is it that a ~15% slowdown in a
program with a very hot loop is due to incidentally inhibiting some caching
behavior (instr? data?)? Or perhaps effecting alignment? FTR my CPU is a
Core i7-2620M, Sandy Bridge.

Thanks all.

On Wed, Jun 19, 2013 at 9:27 AM, Jan Stolarek <jan.stolarek at p.lodz.pl>wrote:

> > If it's not sorted out, can you open a ticket, put in the relevant info
> (so
> > we don't need to look at the email trail), and we can tackle it when you
> > get here.
> Currently there's a temporary workaround: I'm using new folding rules for
> all primitive types,
> except for Integer, in which case I left the old folding rules unchanged.
> This of course should
> be modified to make all rules uniform, but for now it at least passes
> validation. I didn't fill
> the ticket, because the bug does not exist yet :) It only manifests itself
> in my patches, which
> have not been applied yet. I'll add all the information from this
> discussion to my github fork of
> GHC and then move it to Trac once the bug makes it to HEAD.
>
> What worries me more about my patches is the performance regression in
> kahan, because I see no
> obvious differences in the generated assembly.
>
> Janek
>
> >
> > Simon
> >
> > -----Original Message-----
> > From: ghc-devs-bounces at haskell.org [mailto:ghc-devs-bounces at haskell.org]
> On
> > Behalf Of Jan Stolarek Sent: 20 May 2013 12:35
> > To: Ian Lynagh
> > Cc: ghc-devs at haskell.org
> > Subject: Re: Integer constant folding in the presence of new primops
> >
> > > If you remove everything but the quotInteger test from
> > > integerConstantFolding and compile with -ddump-rule-rewrites then
> > > you'll see that the eqInteger rule fires before quotInteger. This is
> > > presumably comparing against 0, as the definition of quot for Integer
> > > (in GHC.Real) is
> > >     _ `quot` 0 = divZeroError
> > >     n `quot` d = n `quotInteger` d
> >
> > Yes, I noticed these two rules firing together - perhaps that's the
> > explanation why. I created a small program for testing:
> >
> > main = print quotInt
> > quotInt :: Integer
> > quotInt = 100063 `quot` 156
> >
> > I noticed that when I define eqInteger wrapper to be NOINLINE, the call
> to
> > quot is translated to Core as:
> >
> > Main.quotInt =
> >   GHC.Real.$fIntegralInteger_$cquot
> >     (__integer 100063) (__integer 156)
> >
> > but when I change the wrapper to INLINE I get:
> >
> > Main.quotInt =
> >   GHC.Real.$fNumRatio_$cquot             <-------- NumRatio instead of
> > IntegralInteger (__integer 100063) (__integer 156)
> >
> > All rule firing happens later (I used -ddump-simpl-iterations
> > -ddump-rule-firings), except that for $fNumRatio_$cquot the quot rules
> > don't fire.
> >
> > > Do you also still have eqInteger wired in? It sounds like you might
> > > have given them both the same unique?
> >
> > No, they didn't have the same unique. I modified the existing rules to
> work
> > on the new primops and ignore their wrappers. At the moment I reverted
> > these changes so that I can make progress and leave this problem for
> later.
> >
> > Janek
> >
> > _______________________________________________
> > ghc-devs mailing list
> > ghc-devs at haskell.org
> > http://www.haskell.org/mailman/listinfo/ghc-devs
>
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130619/16b12385/attachment-0001.htm>

Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Austin Seipp-4
I mean, it certainly *seems* reasonable a 15% hit could come from
pipelining changes or cache behavior or something. I don't think
alignment would really be a huge issue; post-Nehalem I believe
non-aligned writes/reads are extremely cheap. Non-intuitive behavior
can totally happen too: I've seen cases of adding instructions to a
loop which speeds things up (e.g. by taking the extra step, you may
mitigate a dependency stall, which massively helps pipelining across
the loop body etc.)

Nicolas, can I ask what benchmark you're looking at? And what
performance tools are you using, Intels'? If you're on Linux, the
'perf' tool on a modern kernel can be used to quickly get an overview
of how many cache misses/hits your process has, how many pipeline
stalls occur, etc. You can then use it to drill down a bit into the
assembly that's problematic.

That might not give you an exact culprit (it could be many changes and
accumulative hits,) but it's a start.

On Wed, Jun 19, 2013 at 10:43 AM, Nicolas Frisby
<nicolas.frisby at gmail.com> wrote:

> I'm also seeing performance regressions in the shootout benchmarks that I
> can't identify in the asm. The new asm looks better but performs worse, with
> a ~15% slowdown.
>
> I fired up the performance counters in my CPU and the free Intel code for
> inspecting them showed that my CPU utilization took about a 10% hit, even
> while executing fewer total instructions.
>
>   1) Jan, perhaps we're seeing the same sort of behavior ? the shootout
> benchmarks have extremely hot loops (hundreds of millions of iterations
> IIRC). I used ticky profiling too, and saw no suspicious changes in any
> counters.
>
>   2) Dear Low-level Gurus: How feasible is it that a ~15% slowdown in a
> program with a very hot loop is due to incidentally inhibiting some caching
> behavior (instr? data?)? Or perhaps effecting alignment? FTR my CPU is a
> Core i7-2620M, Sandy Bridge.
>
> Thanks all.
>
> On Wed, Jun 19, 2013 at 9:27 AM, Jan Stolarek <jan.stolarek at p.lodz.pl>
> wrote:
>>
>> > If it's not sorted out, can you open a ticket, put in the relevant info
>> > (so
>> > we don't need to look at the email trail), and we can tackle it when you
>> > get here.
>> Currently there's a temporary workaround: I'm using new folding rules for
>> all primitive types,
>> except for Integer, in which case I left the old folding rules unchanged.
>> This of course should
>> be modified to make all rules uniform, but for now it at least passes
>> validation. I didn't fill
>> the ticket, because the bug does not exist yet :) It only manifests itself
>> in my patches, which
>> have not been applied yet. I'll add all the information from this
>> discussion to my github fork of
>> GHC and then move it to Trac once the bug makes it to HEAD.
>>
>> What worries me more about my patches is the performance regression in
>> kahan, because I see no
>> obvious differences in the generated assembly.
>>
>> Janek
>>
>> >
>> > Simon
>> >
>> > -----Original Message-----
>> > From: ghc-devs-bounces at haskell.org [mailto:ghc-devs-bounces at haskell.org]
>> > On
>> > Behalf Of Jan Stolarek Sent: 20 May 2013 12:35
>> > To: Ian Lynagh
>> > Cc: ghc-devs at haskell.org
>> > Subject: Re: Integer constant folding in the presence of new primops
>> >
>> > > If you remove everything but the quotInteger test from
>> > > integerConstantFolding and compile with -ddump-rule-rewrites then
>> > > you'll see that the eqInteger rule fires before quotInteger. This is
>> > > presumably comparing against 0, as the definition of quot for Integer
>> > > (in GHC.Real) is
>> > >     _ `quot` 0 = divZeroError
>> > >     n `quot` d = n `quotInteger` d
>> >
>> > Yes, I noticed these two rules firing together - perhaps that's the
>> > explanation why. I created a small program for testing:
>> >
>> > main = print quotInt
>> > quotInt :: Integer
>> > quotInt = 100063 `quot` 156
>> >
>> > I noticed that when I define eqInteger wrapper to be NOINLINE, the call
>> > to
>> > quot is translated to Core as:
>> >
>> > Main.quotInt =
>> >   GHC.Real.$fIntegralInteger_$cquot
>> >     (__integer 100063) (__integer 156)
>> >
>> > but when I change the wrapper to INLINE I get:
>> >
>> > Main.quotInt =
>> >   GHC.Real.$fNumRatio_$cquot             <-------- NumRatio instead of
>> > IntegralInteger (__integer 100063) (__integer 156)
>> >
>> > All rule firing happens later (I used -ddump-simpl-iterations
>> > -ddump-rule-firings), except that for $fNumRatio_$cquot the quot rules
>> > don't fire.
>> >
>> > > Do you also still have eqInteger wired in? It sounds like you might
>> > > have given them both the same unique?
>> >
>> > No, they didn't have the same unique. I modified the existing rules to
>> > work
>> > on the new primops and ignore their wrappers. At the moment I reverted
>> > these changes so that I can make progress and leave this problem for
>> > later.
>> >
>> > Janek
>> >
>> > _______________________________________________
>> > ghc-devs mailing list
>> > ghc-devs at haskell.org
>> > http://www.haskell.org/mailman/listinfo/ghc-devs
>>
>>
>>
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> http://www.haskell.org/mailman/listinfo/ghc-devs
>
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>



--
Regards,
Austin - PGP: 4096R/0x91384671


Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Nicolas Frisby
Thanks Austin.

The program exhibiting these behaviors is shootout/reverse-complement. The
performance monitoring I used was Intel's pcm from

http://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization

I've been working only on my MBP, so no perfmon yet. I plan to investigate
this with different architectures/machines when this issue percolates back
up my todo list.


On Wed, Jun 19, 2013 at 11:39 AM, Austin Seipp <aseipp at pobox.com> wrote:

> I mean, it certainly *seems* reasonable a 15% hit could come from
> pipelining changes or cache behavior or something. I don't think
> alignment would really be a huge issue; post-Nehalem I believe
> non-aligned writes/reads are extremely cheap. Non-intuitive behavior
> can totally happen too: I've seen cases of adding instructions to a
> loop which speeds things up (e.g. by taking the extra step, you may
> mitigate a dependency stall, which massively helps pipelining across
> the loop body etc.)
>
> Nicolas, can I ask what benchmark you're looking at? And what
> performance tools are you using, Intels'? If you're on Linux, the
> 'perf' tool on a modern kernel can be used to quickly get an overview
> of how many cache misses/hits your process has, how many pipeline
> stalls occur, etc. You can then use it to drill down a bit into the
> assembly that's problematic.
>
> That might not give you an exact culprit (it could be many changes and
> accumulative hits,) but it's a start.
>
> On Wed, Jun 19, 2013 at 10:43 AM, Nicolas Frisby
> <nicolas.frisby at gmail.com> wrote:
> > I'm also seeing performance regressions in the shootout benchmarks that I
> > can't identify in the asm. The new asm looks better but performs worse,
> with
> > a ~15% slowdown.
> >
> > I fired up the performance counters in my CPU and the free Intel code for
> > inspecting them showed that my CPU utilization took about a 10% hit, even
> > while executing fewer total instructions.
> >
> >   1) Jan, perhaps we're seeing the same sort of behavior ? the shootout
> > benchmarks have extremely hot loops (hundreds of millions of iterations
> > IIRC). I used ticky profiling too, and saw no suspicious changes in any
> > counters.
> >
> >   2) Dear Low-level Gurus: How feasible is it that a ~15% slowdown in a
> > program with a very hot loop is due to incidentally inhibiting some
> caching
> > behavior (instr? data?)? Or perhaps effecting alignment? FTR my CPU is a
> > Core i7-2620M, Sandy Bridge.
> >
> > Thanks all.
> >
> > On Wed, Jun 19, 2013 at 9:27 AM, Jan Stolarek <jan.stolarek at p.lodz.pl>
> > wrote:
> >>
> >> > If it's not sorted out, can you open a ticket, put in the relevant
> info
> >> > (so
> >> > we don't need to look at the email trail), and we can tackle it when
> you
> >> > get here.
> >> Currently there's a temporary workaround: I'm using new folding rules
> for
> >> all primitive types,
> >> except for Integer, in which case I left the old folding rules
> unchanged.
> >> This of course should
> >> be modified to make all rules uniform, but for now it at least passes
> >> validation. I didn't fill
> >> the ticket, because the bug does not exist yet :) It only manifests
> itself
> >> in my patches, which
> >> have not been applied yet. I'll add all the information from this
> >> discussion to my github fork of
> >> GHC and then move it to Trac once the bug makes it to HEAD.
> >>
> >> What worries me more about my patches is the performance regression in
> >> kahan, because I see no
> >> obvious differences in the generated assembly.
> >>
> >> Janek
> >>
> >> >
> >> > Simon
> >> >
> >> > -----Original Message-----
> >> > From: ghc-devs-bounces at haskell.org [mailto:
> ghc-devs-bounces at haskell.org]
> >> > On
> >> > Behalf Of Jan Stolarek Sent: 20 May 2013 12:35
> >> > To: Ian Lynagh
> >> > Cc: ghc-devs at haskell.org
> >> > Subject: Re: Integer constant folding in the presence of new primops
> >> >
> >> > > If you remove everything but the quotInteger test from
> >> > > integerConstantFolding and compile with -ddump-rule-rewrites then
> >> > > you'll see that the eqInteger rule fires before quotInteger. This is
> >> > > presumably comparing against 0, as the definition of quot for
> Integer
> >> > > (in GHC.Real) is
> >> > >     _ `quot` 0 = divZeroError
> >> > >     n `quot` d = n `quotInteger` d
> >> >
> >> > Yes, I noticed these two rules firing together - perhaps that's the
> >> > explanation why. I created a small program for testing:
> >> >
> >> > main = print quotInt
> >> > quotInt :: Integer
> >> > quotInt = 100063 `quot` 156
> >> >
> >> > I noticed that when I define eqInteger wrapper to be NOINLINE, the
> call
> >> > to
> >> > quot is translated to Core as:
> >> >
> >> > Main.quotInt =
> >> >   GHC.Real.$fIntegralInteger_$cquot
> >> >     (__integer 100063) (__integer 156)
> >> >
> >> > but when I change the wrapper to INLINE I get:
> >> >
> >> > Main.quotInt =
> >> >   GHC.Real.$fNumRatio_$cquot             <-------- NumRatio instead of
> >> > IntegralInteger (__integer 100063) (__integer 156)
> >> >
> >> > All rule firing happens later (I used -ddump-simpl-iterations
> >> > -ddump-rule-firings), except that for $fNumRatio_$cquot the quot rules
> >> > don't fire.
> >> >
> >> > > Do you also still have eqInteger wired in? It sounds like you might
> >> > > have given them both the same unique?
> >> >
> >> > No, they didn't have the same unique. I modified the existing rules to
> >> > work
> >> > on the new primops and ignore their wrappers. At the moment I reverted
> >> > these changes so that I can make progress and leave this problem for
> >> > later.
> >> >
> >> > Janek
> >> >
> >> > _______________________________________________
> >> > ghc-devs mailing list
> >> > ghc-devs at haskell.org
> >> > http://www.haskell.org/mailman/listinfo/ghc-devs
> >>
> >>
> >>
> >> _______________________________________________
> >> ghc-devs mailing list
> >> ghc-devs at haskell.org
> >> http://www.haskell.org/mailman/listinfo/ghc-devs
> >
> >
> >
> > _______________________________________________
> > ghc-devs mailing list
> > ghc-devs at haskell.org
> > http://www.haskell.org/mailman/listinfo/ghc-devs
> >
>
>
>
> --
> Regards,
> Austin - PGP: 4096R/0x91384671
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130619/d5fbd875/attachment.htm>

Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Jan Stolarek
In reply to this post by Nicolas Frisby
Nicolas, I kinda like that explanation, because it relieves me of any responsibility for this
problem :) Still, I have reasons to suspect that this might actually be my fault. Generated Core
is slightly different - the generated worker function accepts parameters in different order - and
I don't know why that happens. I also don't see why this would impact performance. Looks like I
will need to become familiar with the profiling tools that you mentioned.

Janek

Dnia ?roda, 19 czerwca 2013, Nicolas Frisby napisa?:

> I'm also seeing performance regressions in the shootout benchmarks that I
> can't identify in the asm. The new asm looks better but performs worse,
> with a ~15% slowdown.
>
> I fired up the performance counters in my CPU and the free Intel code for
> inspecting them showed that my CPU utilization took about a 10% hit, even
> while executing fewer total instructions.
>
>   1) Jan, perhaps we're seeing the same sort of behavior ? the shootout
> benchmarks have extremely hot loops (hundreds of millions of iterations
> IIRC). I used ticky profiling too, and saw no suspicious changes in any
> counters.
>
>   2) Dear Low-level Gurus: How feasible is it that a ~15% slowdown in a
> program with a very hot loop is due to incidentally inhibiting some caching
> behavior (instr? data?)? Or perhaps effecting alignment? FTR my CPU is a
> Core i7-2620M, Sandy Bridge.
>
> Thanks all.
>
> On Wed, Jun 19, 2013 at 9:27 AM, Jan Stolarek <jan.stolarek at p.lodz.pl>wrote:
> > > If it's not sorted out, can you open a ticket, put in the relevant info
> >
> > (so
> >
> > > we don't need to look at the email trail), and we can tackle it when
> > > you get here.
> >
> > Currently there's a temporary workaround: I'm using new folding rules for
> > all primitive types,
> > except for Integer, in which case I left the old folding rules unchanged.
> > This of course should
> > be modified to make all rules uniform, but for now it at least passes
> > validation. I didn't fill
> > the ticket, because the bug does not exist yet :) It only manifests
> > itself in my patches, which
> > have not been applied yet. I'll add all the information from this
> > discussion to my github fork of
> > GHC and then move it to Trac once the bug makes it to HEAD.
> >
> > What worries me more about my patches is the performance regression in
> > kahan, because I see no
> > obvious differences in the generated assembly.
> >
> > Janek
> >
> > > Simon
> > >
> > > -----Original Message-----
> > > From: ghc-devs-bounces at haskell.org
> > > [mailto:ghc-devs-bounces at haskell.org]
> >
> > On
> >
> > > Behalf Of Jan Stolarek Sent: 20 May 2013 12:35
> > > To: Ian Lynagh
> > > Cc: ghc-devs at haskell.org
> > > Subject: Re: Integer constant folding in the presence of new primops
> > >
> > > > If you remove everything but the quotInteger test from
> > > > integerConstantFolding and compile with -ddump-rule-rewrites then
> > > > you'll see that the eqInteger rule fires before quotInteger. This is
> > > > presumably comparing against 0, as the definition of quot for Integer
> > > > (in GHC.Real) is
> > > >     _ `quot` 0 = divZeroError
> > > >     n `quot` d = n `quotInteger` d
> > >
> > > Yes, I noticed these two rules firing together - perhaps that's the
> > > explanation why. I created a small program for testing:
> > >
> > > main = print quotInt
> > > quotInt :: Integer
> > > quotInt = 100063 `quot` 156
> > >
> > > I noticed that when I define eqInteger wrapper to be NOINLINE, the call
> >
> > to
> >
> > > quot is translated to Core as:
> > >
> > > Main.quotInt =
> > >   GHC.Real.$fIntegralInteger_$cquot
> > >     (__integer 100063) (__integer 156)
> > >
> > > but when I change the wrapper to INLINE I get:
> > >
> > > Main.quotInt =
> > >   GHC.Real.$fNumRatio_$cquot             <-------- NumRatio instead of
> > > IntegralInteger (__integer 100063) (__integer 156)
> > >
> > > All rule firing happens later (I used -ddump-simpl-iterations
> > > -ddump-rule-firings), except that for $fNumRatio_$cquot the quot rules
> > > don't fire.
> > >
> > > > Do you also still have eqInteger wired in? It sounds like you might
> > > > have given them both the same unique?
> > >
> > > No, they didn't have the same unique. I modified the existing rules to
> >
> > work
> >
> > > on the new primops and ignore their wrappers. At the moment I reverted
> > > these changes so that I can make progress and leave this problem for
> >
> > later.
> >
> > > Janek
> > >
> > > _______________________________________________
> > > ghc-devs mailing list
> > > ghc-devs at haskell.org
> > > http://www.haskell.org/mailman/listinfo/ghc-devs
> >
> > _______________________________________________
> > ghc-devs mailing list
> > ghc-devs at haskell.org
> > http://www.haskell.org/mailman/listinfo/ghc-devs




Reply | Threaded
Open this post in threaded view
|

Integer constant folding in the presence of new primops

Nicolas Frisby
The relief comes when we can confirm, explain, and hopefully avoid it :)
On Jun 19, 2013 3:20 PM, "Jan Stolarek" <jan.stolarek at p.lodz.pl> wrote:

> Nicolas, I kinda like that explanation, because it relieves me of any
> responsibility for this
> problem :) Still, I have reasons to suspect that this might actually be my
> fault. Generated Core
> is slightly different - the generated worker function accepts parameters
> in different order - and
> I don't know why that happens. I also don't see why this would impact
> performance. Looks like I
> will need to become familiar with the profiling tools that you mentioned.
>
> Janek
>
> Dnia ?roda, 19 czerwca 2013, Nicolas Frisby napisa?:
> > I'm also seeing performance regressions in the shootout benchmarks that I
> > can't identify in the asm. The new asm looks better but performs worse,
> > with a ~15% slowdown.
> >
> > I fired up the performance counters in my CPU and the free Intel code for
> > inspecting them showed that my CPU utilization took about a 10% hit, even
> > while executing fewer total instructions.
> >
> >   1) Jan, perhaps we're seeing the same sort of behavior -- the shootout
> > benchmarks have extremely hot loops (hundreds of millions of iterations
> > IIRC). I used ticky profiling too, and saw no suspicious changes in any
> > counters.
> >
> >   2) Dear Low-level Gurus: How feasible is it that a ~15% slowdown in a
> > program with a very hot loop is due to incidentally inhibiting some
> caching
> > behavior (instr? data?)? Or perhaps effecting alignment? FTR my CPU is a
> > Core i7-2620M, Sandy Bridge.
> >
> > Thanks all.
> >
> > On Wed, Jun 19, 2013 at 9:27 AM, Jan Stolarek <jan.stolarek at p.lodz.pl
> >wrote:
> > > > If it's not sorted out, can you open a ticket, put in the relevant
> info
> > >
> > > (so
> > >
> > > > we don't need to look at the email trail), and we can tackle it when
> > > > you get here.
> > >
> > > Currently there's a temporary workaround: I'm using new folding rules
> for
> > > all primitive types,
> > > except for Integer, in which case I left the old folding rules
> unchanged.
> > > This of course should
> > > be modified to make all rules uniform, but for now it at least passes
> > > validation. I didn't fill
> > > the ticket, because the bug does not exist yet :) It only manifests
> > > itself in my patches, which
> > > have not been applied yet. I'll add all the information from this
> > > discussion to my github fork of
> > > GHC and then move it to Trac once the bug makes it to HEAD.
> > >
> > > What worries me more about my patches is the performance regression in
> > > kahan, because I see no
> > > obvious differences in the generated assembly.
> > >
> > > Janek
> > >
> > > > Simon
> > > >
> > > > -----Original Message-----
> > > > From: ghc-devs-bounces at haskell.org
> > > > [mailto:ghc-devs-bounces at haskell.org]
> > >
> > > On
> > >
> > > > Behalf Of Jan Stolarek Sent: 20 May 2013 12:35
> > > > To: Ian Lynagh
> > > > Cc: ghc-devs at haskell.org
> > > > Subject: Re: Integer constant folding in the presence of new primops
> > > >
> > > > > If you remove everything but the quotInteger test from
> > > > > integerConstantFolding and compile with -ddump-rule-rewrites then
> > > > > you'll see that the eqInteger rule fires before quotInteger. This
> is
> > > > > presumably comparing against 0, as the definition of quot for
> Integer
> > > > > (in GHC.Real) is
> > > > >     _ `quot` 0 = divZeroError
> > > > >     n `quot` d = n `quotInteger` d
> > > >
> > > > Yes, I noticed these two rules firing together - perhaps that's the
> > > > explanation why. I created a small program for testing:
> > > >
> > > > main = print quotInt
> > > > quotInt :: Integer
> > > > quotInt = 100063 `quot` 156
> > > >
> > > > I noticed that when I define eqInteger wrapper to be NOINLINE, the
> call
> > >
> > > to
> > >
> > > > quot is translated to Core as:
> > > >
> > > > Main.quotInt =
> > > >   GHC.Real.$fIntegralInteger_$cquot
> > > >     (__integer 100063) (__integer 156)
> > > >
> > > > but when I change the wrapper to INLINE I get:
> > > >
> > > > Main.quotInt =
> > > >   GHC.Real.$fNumRatio_$cquot             <-------- NumRatio instead
> of
> > > > IntegralInteger (__integer 100063) (__integer 156)
> > > >
> > > > All rule firing happens later (I used -ddump-simpl-iterations
> > > > -ddump-rule-firings), except that for $fNumRatio_$cquot the quot
> rules
> > > > don't fire.
> > > >
> > > > > Do you also still have eqInteger wired in? It sounds like you might
> > > > > have given them both the same unique?
> > > >
> > > > No, they didn't have the same unique. I modified the existing rules
> to
> > >
> > > work
> > >
> > > > on the new primops and ignore their wrappers. At the moment I
> reverted
> > > > these changes so that I can make progress and leave this problem for
> > >
> > > later.
> > >
> > > > Janek
> > > >
> > > > _______________________________________________
> > > > ghc-devs mailing list
> > > > ghc-devs at haskell.org
> > > > http://www.haskell.org/mailman/listinfo/ghc-devs
> > >
> > > _______________________________________________
> > > ghc-devs mailing list
> > > ghc-devs at haskell.org
> > > http://www.haskell.org/mailman/listinfo/ghc-devs
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130619/3efd7377/attachment.htm>