Common Context transformation for join points

4 messages
Open this post in threaded view
|

Common Context transformation for join points

 Hi all, I just came up with this, and unless I write it down I doubt I?ll be able to fall asleep. The problem I?m trying to solve is the bad interaction of the CPR worker wrapper transformation and join points. Consider, as a running example: f x =  let \$j n = Just (Int# (n +# 1))  in case x of       A -> \$j 1       B -> \$j 2       C -> Just undefined The join point \$j has the important property that in the final code, it can be jumped to, because its result is also the result of f, so no function closure has to be allocated for it. It seems that this is crucial for performance. But now lets do CPR, and just for demonstration, do nested CPR (the effect can also occur with non-nested CPR). Doing w/w we get \$wf x = case (  let \$w\$j n = n +# 1      \$j n = Just (Int# (\$w\$j n))  in case x of       A -> \$j 1       B -> \$j 2       C -> Just undefined  ) of Just n -> n f x = Just (\$wf x) now we simplify, i.e. move the worker transformation inside, and inline \$j?s wrapper. (I hope I am doing this right; but this is what I expect to happen): \$wf x =  let \$w\$j n = n+1  in case x of       A -> Int# (\$w\$j 1)       B -> Int# (\$w\$j 2)       C -> undefined f x = Just (\$wf x) So f?s worker transformation has nicely canceled with the invocations of Just, but it still leaves a "Int#" invocation around \$w\$j, so we lose the no-let-escape property of it and get worse code in the backend. This will always happen if the join point function gets a richer CPR property than the function that it belongs to, and is the reason for some hacks in the CPR code (i.e. no CPR for locally stuff that returns a sum type... not nice.) I think I have a solution for this. One observation is that these \$j-functions are not really functions of their own. For this transformation (and possibly for others), one should try to treat them so that they behave as if they were inlined. So here is the idea: We do not do any w/w for \$j at first, on the premise that it is not a proper function of its own. We do w/w for f as before, and simplify as before, ending up with \$wf x =  let \$j n = Just (Int# (n +# 1))  in case x of       A -> case \$j 1 of Just n -> n       B -> case \$j 2 of Just n -> n       C -> undefined f x = Just (\$wf x) so we temporarily broke the property. Now a new simplifier step kicks in: For a let that looks like it was or should be a join point (hand waving here ? one could possibly do it for all non-recursive lets, maybe there is even more to gain), we find out         for all uses of \$j applied to its arity number of arguments,         what is the largest common context?         where a context is       * ?, the trivial context,       * case c of ..., where c is a context, and ... does not contain a         call to \$j,       * f c, where c is a context and f does not contain a call to \$j,         or       * c x, where c is a context and f does not contain a call to \$j. In particular we do not include code here where \$j is part of an alternative of a case. (Casts should ok fine as well, ignoring them for now.) In our example, the context would be "case ? of Just n -> n". Once we have found that, I believe it will be always safe and beneficial to replace the body e of \$j by c[e], and in the body of the let replace every c[\$j args..] by \$j args. In our example, we would obtain \$wf x =  let ?j n = case Just (Int# (n +# 1)) of Just n -> n  in case x of       A -> \$j 1       B -> \$j 2       C -> undefined f x = Just (\$wf x) which then in further steps simplifies to the in all respects ideal code \$wf x =  let ?j n = Int# (n +# 1))  in case x of       A -> \$j 1       B -> \$j 2       C -> undefined f x = Just (\$wf x) Implementationwise I figured that this will need two traversals (down and up each) of the let body: Starting from a let binding that looks promising (e.g. one-shot non-recursive lambda), walk down the body until invocations of the right arity are found. Then walk back returning a growing contest, and at case-expressions lub these contexts by taking their common prefix (or suffix, depends on how they are represented :-)). Then decide whether the transformation is useful (i.e. the context is non-trivial), and if so, walk back down to find the \$j invocations again, and then walk back, removing the final context while doing so. I believe this would make some let-no-escape-related reluctanceness in the CPR-analysis obsolete, which would be great. And it might be a worthwhile transformation on its own (if it actually does occur), as it reduces code size without any downside (as far as I can see). Now it is likely that I thought of something existing here, or of something that has been tried and found not useful or feasible. If so, please let me know. Good night, Joachim -- Joachim ?nomeata? Breitner   mail at joachim-breitner.de ? http://www.joachim-breitner.de/  Jabber: nomeata at joachim-breitner.de  ? GPG-Key: 0x4743206C   Debian Developer: nomeata at debian.org -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL:
Open this post in threaded view
|

Common Context transformation for join points

 Hi, I created a first prototype of this optimization (branch wip/common-context), and the results are a clear, although very small win in some benchmarks, and no regression (ignoring the flaky cacheprof): --------------------------------------------------------------------------------         Program           Size    Allocs   Runtime   Elapsed  TotalMem --------------------------------------------------------------------------------       cacheprof          -0.0%     +0.1%     +2.4%     +2.4%     +0.0%        fibheaps          -0.0%     -0.3%      0.02      0.02     +0.0%           fluid          -0.1%     -0.0%      0.01      0.01     +0.0%          gamteb          -0.0%     -0.3%      0.04      0.04     +0.0%             ida          -0.0%     -1.4%      0.06      0.06     +0.0%             scs          -0.0%     -0.2%     -0.5%     -0.5%     +0.0%          simple          -0.0%     -0.8%      0.15      0.15     +0.0% --------------------------------------------------------------------------------             Min          -0.1%     -1.4%     -4.5%     -4.5%     +0.0%             Max          -0.0%     +0.1%     +3.3%     +3.3%     +6.7%  Geometric Mean          -0.0%     -0.0%     +0.2%     +0.2%     +0.1% I don?t have much experience with performance numbers in GHC yet, but I guess a geometric mean of -0.0% is not something to get excited about. Maybe in conjunction with CPR and nested CPR, there will be a combined benefit. Not sure why size does not go down generally, as I expected the optimization to take code that is generated more than once, and move it to where it is generated only once. Greetings, Joachim -- Joachim ?nomeata? Breitner   mail at joachim-breitner.de ? http://www.joachim-breitner.de/  Jabber: nomeata at joachim-breitner.de  ? GPG-Key: 0x4743206C   Debian Developer: nomeata at debian.org -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: