Common Context transformation for join points

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Common Context transformation for join points

Joachim Breitner-2
Hi all,

I just came up with this, and unless I write it down I doubt I?ll be
able to fall asleep.

The problem I?m trying to solve is the bad interaction of the CPR worker
wrapper transformation and join points. Consider, as a running example:

f x =
 let $j n = Just (Int# (n +# 1))
 in case x of
      A -> $j 1
      B -> $j 2
      C -> Just undefined

The join point $j has the important property that in the final code, it
can be jumped to, because its result is also the result of f, so no
function closure has to be allocated for it. It seems that this is
crucial for performance.

But now lets do CPR, and just for demonstration, do nested CPR (the
effect can also occur with non-nested CPR). Doing w/w we get

$wf x = case (
 let $w$j n = n +# 1
     $j n = Just (Int# ($w$j n))
 in case x of
      A -> $j 1
      B -> $j 2
      C -> Just undefined
 ) of Just n -> n
f x = Just ($wf x)

now we simplify, i.e. move the worker transformation inside, and inline
$j?s wrapper. (I hope I am doing this right; but this is what I expect
to happen):

$wf x =
 let $w$j n = n+1
 in case x of
      A -> Int# ($w$j 1)
      B -> Int# ($w$j 2)
      C -> undefined
f x = Just ($wf x)

So f?s worker transformation has nicely canceled with the invocations of
Just, but it still leaves a "Int#" invocation around $w$j, so we lose
the no-let-escape property of it and get worse code in the backend. This
will always happen if the join point function gets a richer CPR property
than the function that it belongs to, and is the reason for some hacks
in the CPR code (i.e. no CPR for locally stuff that returns a sum
type... not nice.)

I think I have a solution for this. One observation is that these
$j-functions are not really functions of their own. For this
transformation (and possibly for others), one should try to treat them
so that they behave as if they were inlined. So here is the idea:

We do not do any w/w for $j at first, on the premise that it is not a
proper function of its own. We do w/w for f as before, and simplify as
before, ending up with

$wf x =
 let $j n = Just (Int# (n +# 1))
 in case x of
      A -> case $j 1 of Just n -> n
      B -> case $j 2 of Just n -> n
      C -> undefined
f x = Just ($wf x)

so we temporarily broke the property. Now a new simplifier step kicks
in: For a let that looks like it was or should be a join point (hand
waving here ? one could possibly do it for all non-recursive lets, maybe
there is even more to gain), we find out

        for all uses of $j applied to its arity number of arguments,
        what is the largest common context?
       
where a context is
      * ?, the trivial context,
      * case c of ..., where c is a context, and ... does not contain a
        call to $j,
      * f c, where c is a context and f does not contain a call to $j,
        or
      * c x, where c is a context and f does not contain a call to $j.
In particular we do not include code here where $j is part of an
alternative of a case. (Casts should ok fine as well, ignoring them for
now.)

In our example, the context would be "case ? of Just n -> n".

Once we have found that, I believe it will be always safe and beneficial
to replace the body e of $j by c[e], and in the body of the let replace
every c[$j args..] by $j args.

In our example, we would obtain

$wf x =
 let ?j n = case Just (Int# (n +# 1)) of Just n -> n
 in case x of
      A -> $j 1
      B -> $j 2
      C -> undefined
f x = Just ($wf x)

which then in further steps simplifies to the in all respects ideal code

$wf x =
 let ?j n = Int# (n +# 1))
 in case x of
      A -> $j 1
      B -> $j 2
      C -> undefined
f x = Just ($wf x)


Implementationwise I figured that this will need two traversals (down
and up each) of the let body: Starting from a let binding that looks
promising (e.g. one-shot non-recursive lambda), walk down the body until
invocations of the right arity are found. Then walk back returning a
growing contest, and at case-expressions lub these contexts by taking
their common prefix (or suffix, depends on how they are
represented :-)). Then decide whether the transformation is useful (i.e.
the context is non-trivial), and if so, walk back down to find the $j
invocations again, and then walk back, removing the final context while
doing so.


I believe this would make some let-no-escape-related reluctanceness in
the CPR-analysis obsolete, which would be great. And it might be a
worthwhile transformation on its own (if it actually does occur), as it
reduces code size without any downside (as far as I can see).


Now it is likely that I thought of something existing here, or of
something that has been tried and found not useful or feasible. If so,
please let me know.


Good night,
Joachim


--
Joachim ?nomeata? Breitner
  mail at joachim-breitner.de ? http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  ? GPG-Key: 0x4743206C
  Debian Developer: nomeata at debian.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20131212/8b54b8ff/attachment.sig>

Reply | Threaded
Open this post in threaded view
|

Common Context transformation for join points

Joachim Breitner-2
Hi,

I created a first prototype of this optimization (branch
wip/common-context), and the results are a clear, although very small
win in some benchmarks, and no regression (ignoring the flaky
cacheprof):

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
      cacheprof          -0.0%     +0.1%     +2.4%     +2.4%     +0.0%
       fibheaps          -0.0%     -0.3%      0.02      0.02     +0.0%
          fluid          -0.1%     -0.0%      0.01      0.01     +0.0%
         gamteb          -0.0%     -0.3%      0.04      0.04     +0.0%
            ida          -0.0%     -1.4%      0.06      0.06     +0.0%
            scs          -0.0%     -0.2%     -0.5%     -0.5%     +0.0%
         simple          -0.0%     -0.8%      0.15      0.15     +0.0%
--------------------------------------------------------------------------------
            Min          -0.1%     -1.4%     -4.5%     -4.5%     +0.0%
            Max          -0.0%     +0.1%     +3.3%     +3.3%     +6.7%
 Geometric Mean          -0.0%     -0.0%     +0.2%     +0.2%     +0.1%

I don?t have much experience with performance numbers in GHC yet, but I
guess a geometric mean of -0.0% is not something to get excited about.
Maybe in conjunction with CPR and nested CPR, there will be a combined
benefit.

Not sure why size does not go down generally, as I expected the
optimization to take code that is generated more than once, and move it
to where it is generated only once.

Greetings,
Joachim

--
Joachim ?nomeata? Breitner
  mail at joachim-breitner.de ? http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  ? GPG-Key: 0x4743206C
  Debian Developer: nomeata at debian.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20131218/4b70f350/attachment.sig>

Reply | Threaded
Open this post in threaded view
|

Common Context transformation for join points

Simon Peyton Jones
In reply to this post by Joachim Breitner-2
Joachim

Interesting.  We can discuss when you get back, but let me jot down one comment. You write:

| This will always happen if the join point function gets a richer
| CPR property than the function that it belongs to

Good observation.  Maybe we can exploit it directly rather than figuring out "common contexts"?  If we knew, for each local function, whether it was a join point, and if so for what, we could just make its inferred CPR property no richer than its "parent", perhaps.

Simon


| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
| Joachim Breitner
| Sent: 12 December 2013 01:29
| To: ghc-devs at haskell.org
| Subject: Common Context transformation for join points
|
| Hi all,
|
| I just came up with this, and unless I write it down I doubt I?ll be
| able to fall asleep.
|
| The problem I?m trying to solve is the bad interaction of the CPR worker
| wrapper transformation and join points. Consider, as a running example:
|
| f x =
|  let $j n = Just (Int# (n +# 1))
|  in case x of
|       A -> $j 1
|       B -> $j 2
|       C -> Just undefined
|
| The join point $j has the important property that in the final code, it
| can be jumped to, because its result is also the result of f, so no
| function closure has to be allocated for it. It seems that this is
| crucial for performance.
|
| But now lets do CPR, and just for demonstration, do nested CPR (the
| effect can also occur with non-nested CPR). Doing w/w we get
|
| $wf x = case (
|  let $w$j n = n +# 1
|      $j n = Just (Int# ($w$j n))
|  in case x of
|       A -> $j 1
|       B -> $j 2
|       C -> Just undefined
|  ) of Just n -> n
| f x = Just ($wf x)
|
| now we simplify, i.e. move the worker transformation inside, and inline
| $j?s wrapper. (I hope I am doing this right; but this is what I expect
| to happen):
|
| $wf x =
|  let $w$j n = n+1
|  in case x of
|       A -> Int# ($w$j 1)
|       B -> Int# ($w$j 2)
|       C -> undefined
| f x = Just ($wf x)
|
| So f?s worker transformation has nicely canceled with the invocations of
| Just, but it still leaves a "Int#" invocation around $w$j, so we lose
| the no-let-escape property of it and get worse code in the backend. This
| will always happen if the join point function gets a richer CPR property
| than the function that it belongs to, and is the reason for some hacks
| in the CPR code (i.e. no CPR for locally stuff that returns a sum
| type... not nice.)
|
| I think I have a solution for this. One observation is that these $j-
| functions are not really functions of their own. For this transformation
| (and possibly for others), one should try to treat them so that they
| behave as if they were inlined. So here is the idea:
|
| We do not do any w/w for $j at first, on the premise that it is not a
| proper function of its own. We do w/w for f as before, and simplify as
| before, ending up with
|
| $wf x =
|  let $j n = Just (Int# (n +# 1))
|  in case x of
|       A -> case $j 1 of Just n -> n
|       B -> case $j 2 of Just n -> n
|       C -> undefined
| f x = Just ($wf x)
|
| so we temporarily broke the property. Now a new simplifier step kicks
| in: For a let that looks like it was or should be a join point (hand
| waving here ? one could possibly do it for all non-recursive lets, maybe
| there is even more to gain), we find out
|
|         for all uses of $j applied to its arity number of arguments,
|         what is the largest common context?
|
| where a context is
|       * ?, the trivial context,
|       * case c of ..., where c is a context, and ... does not contain a
|         call to $j,
|       * f c, where c is a context and f does not contain a call to $j,
|         or
|       * c x, where c is a context and f does not contain a call to $j.
| In particular we do not include code here where $j is part of an
| alternative of a case. (Casts should ok fine as well, ignoring them for
| now.)
|
| In our example, the context would be "case ? of Just n -> n".
|
| Once we have found that, I believe it will be always safe and beneficial
| to replace the body e of $j by c[e], and in the body of the let replace
| every c[$j args..] by $j args.
|
| In our example, we would obtain
|
| $wf x =
|  let ?j n = case Just (Int# (n +# 1)) of Just n -> n  in case x of
|       A -> $j 1
|       B -> $j 2
|       C -> undefined
| f x = Just ($wf x)
|
| which then in further steps simplifies to the in all respects ideal code
|
| $wf x =
|  let ?j n = Int# (n +# 1))
|  in case x of
|       A -> $j 1
|       B -> $j 2
|       C -> undefined
| f x = Just ($wf x)
|
|
| Implementationwise I figured that this will need two traversals (down
| and up each) of the let body: Starting from a let binding that looks
| promising (e.g. one-shot non-recursive lambda), walk down the body until
| invocations of the right arity are found. Then walk back returning a
| growing contest, and at case-expressions lub these contexts by taking
| their common prefix (or suffix, depends on how they are
| represented :-)). Then decide whether the transformation is useful (i.e.
| the context is non-trivial), and if so, walk back down to find the $j
| invocations again, and then walk back, removing the final context while
| doing so.
|
|
| I believe this would make some let-no-escape-related reluctanceness in
| the CPR-analysis obsolete, which would be great. And it might be a
| worthwhile transformation on its own (if it actually does occur), as it
| reduces code size without any downside (as far as I can see).
|
|
| Now it is likely that I thought of something existing here, or of
| something that has been tried and found not useful or feasible. If so,
| please let me know.
|
|
| Good night,
| Joachim
|
|
| --
| Joachim ?nomeata? Breitner
|   mail at joachim-breitner.de ? http://www.joachim-breitner.de/
|   Jabber: nomeata at joachim-breitner.de  ? GPG-Key: 0x4743206C
|   Debian Developer: nomeata at debian.org

Reply | Threaded
Open this post in threaded view
|

Common Context transformation for join points

Joachim Breitner-2
Hi,

Am Dienstag, den 31.12.2013, 11:33 +0000 schrieb Simon Peyton-Jones:

> Interesting.  We can discuss when you get back, but let me jot down
> one comment. You write:
>
> | This will always happen if the join point function gets a richer
> | CPR property than the function that it belongs to
>
> Good observation.  Maybe we can exploit it directly rather than
> figuring out "common contexts"?  If we knew, for each local function,
> whether it was a join point, and if so for what, we could just make
> its inferred CPR property no richer than its "parent", perhaps.

correct, I thought about this as well. It could be a post-processing
step, after demand analysis, but before worker-wrapper, that goes
through the file and reduces CPR information where it is found to be
non-helpful.

There is also a related issue with CPR?ing a function whose call is used
in a argument position:
  let f x = ... --
  in ... g (f z) ...
When is CPR?ing f useful? Only if g will get w/w?ed in a way that
cancels with the CPR wrapper of f.
And when is that the case? If the demand of g on its first argument is
as least as rich as f CPR?s wrapper. Again, if f?s CPR information is
richer than g?s demand CPR is not useful.

It seems that a generalization of the idea ?fix CPR information to be no
richer than useful? will need some kind of demand information on
arguments.

(In this case, doing CPR of f and then Common Context will clean up any
useless unwrapping, if all uses of f have the same problem.)


Greetings from Karlsruhe,
Joachim
--
Joachim ?nomeata? Breitner
  mail at joachim-breitner.de ? http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  ? GPG-Key: 0x4743206C
  Debian Developer: nomeata at debian.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20131231/35d4b713/attachment.sig>