Fusion of lists and chunky sequences


Fusion of lists and chunky sequences

Henning Thielemann

On the one hand I like to use lists in my code because element types can
be freely chosen and maximum laziness allows feedback and other tricks. On
the other hand there is ByteString.Lazy, and one could build chunky
sequences from other fast array implementations. These have constraints on
the element types, like Storable in
  http://code.haskell.org/~sjanssen/storablevector
 or the IArray multi-parameter type class for UArray.

 I would like to continue writing code for lists, and let the optimizer
replace it by operations on chunky sequences wherever possible. E.g.
  Chunky.fromList (List.map f (List.unfoldr g x))
 might be transformed to
  Chunky.unfoldr (f ... g) x
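
 Written as GHC rewrite rules, the transformation I have in mind might look
roughly like this (only a sketch: Chunky.map is hypothetical here, the last
rule shows one possible concrete form of the 'f ... g' composition, and the
Storable constraint on the element type makes rule matching harder in
practice):

  {-# RULES
    "fromList/map" forall f xs.
        Chunky.fromList (List.map f xs) = Chunky.map f (Chunky.fromList xs) ;
    "fromList/unfoldr" forall g x.
        Chunky.fromList (List.unfoldr g x) = Chunky.unfoldr g x ;
    "map/unfoldr" forall f g x.
        Chunky.map f (Chunky.unfoldr g x) =
          Chunky.unfoldr (\s -> fmap (\(a, s') -> (f a, s')) (g s)) x
    #-}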

Chunky.fromList serves two purposes:
 1. I obtain the result in a data structure that can be quickly accessed
by further operations.
 2. It tells the optimizer that element-wise laziness is not
needed and that the element type fulfills the constraint of the fast array
type, and thus fusion can proceed safely. (As far as I can see.)

 Is there some framework which fuses lists and chunky sequences? When I
write fusion rules like the one above myself, they interfere with the
Prelude's fusion rules (and they would probably also interfere with those
of an alternative list fusion framework). The 'map' and 'unfoldr' are
certainly already fused to the internal 'build' or to another auxiliary
function. As far as I know, I cannot disable the Prelude rules and give
the List-Chunk rules higher priority than the Prelude's.
 I hoped to be able to apply application-specific fusion rules by defining
a newtype wrapper around the chunky sequence type, while keeping the rest
of the list code unchanged. You might argue that code cannot be
application-specific if it still relies on the generic list type. Maybe
it is best to wrap the list type in a newtype, lift all of the
application-relevant list functions to this type, and then define fusion
rules on the lifted functions.
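
 As a sketch of what I mean (all names are hypothetical; fromAppList would
be the analogue of Chunky.fromList for the wrapped type):

  newtype AppList a = AppList { unAppList :: [a] }

  amap :: (a -> b) -> AppList a -> AppList b
  amap f (AppList xs) = AppList (map f xs)
  {-# NOINLINE amap #-}  -- keep 'amap' visible so the rule below can match

  {-# RULES
    "fromAppList/amap" forall f xs.
        fromAppList (amap f xs) = Chunky.map f (fromAppList xs)
    #-}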

Re: Fusion of lists and chunky sequences

Don Stewart
lemming:

>
> On the one hand I like to use lists in my code because element types can
> be freely chosen and maximum laziness allows feedback and other tricks. On
> the other hand there is ByteString.Lazy, and one could build chunky
> sequences from other fast array implementations. These have constraints on
> the element types, like Storable in
>   http://code.haskell.org/~sjanssen/storablevector
>  or the IArray multi-parameter type class for UArray.
>
>  I would like to continue writing code for lists, and let the optimizer
> replace it by operations on chunky sequences wherever possible. E.g.
>   Chunky.fromList (List.map f (List.unfoldr g x))
>  might be transformed to
>   Chunky.unfoldr (f ... g) x
>
> Chunky.fromList serves two purposes:
>  1. I obtain the result in a data structure that can be quickly accessed
> by further operations.
>  2. It tells the optimizer that element-wise laziness is not
> needed and that the element type fulfills the constraint of the fast array
> type, and thus fusion can proceed safely. (As far as I can see.)
>
>  Is there some framework which fuses lists and chunky sequences? When I
> write fusion rules like the one above myself, they interfere with the
> Prelude's fusion rules (and they would probably also interfere with those
> of an alternative list fusion framework). The 'map' and 'unfoldr' are
> certainly already fused to the internal 'build' or to another auxiliary
> function. As far as I know, I cannot disable the Prelude rules and give
> the List-Chunk rules higher priority than the Prelude's.

You can, with some caveats, use a single fusion system across data
structures, and avoid the built-in build/foldr system.

I'd start by installing the stream-fusion list library from Hackage,
which gives you the list API and a fusion mechanism.

To avoid the built-in fusion system, you need to:

    * avoid list comprehensions
    * avoid .. (use Stream.enumFromTo instead)

Then you can write fusion rules for your structure in terms of streams,
and they'll fuse with list operations as well.
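
For a new structure the glue might look something like this (just a
sketch: 'Chunky' is a toy type here, and I'm assuming Data.Stream exports
stream and unstream as in the stream-fusion paper):

    import qualified Data.Stream as Stream  -- stream-fusion package

    newtype Chunky a = Chunky [[a]]         -- toy chunked sequence

    streamChunky :: Chunky a -> Stream.Stream a
    streamChunky (Chunky css) = Stream.stream (concat css)
    {-# NOINLINE streamChunky #-}

    unstreamChunky :: Stream.Stream a -> Chunky a
    unstreamChunky s = Chunky [Stream.unstream s]
    {-# NOINLINE unstreamChunky #-}

    {-# RULES
      "Chunky stream/unstream" forall s.
          streamChunky (unstreamChunky s) = s
      #-}

Operations defined as 'unstreamChunky . Stream.map f . streamChunky' and
friends then fuse with each other, and with list operations written
against the stream-fusion list API.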

Duncan, Roman and I plan to have strict and lazy bytestrings fusing
on top of the stream-fusion package in Q1 this year, but you can start
looking at other data structures now.

>  I hoped to be able to apply application specific fusion rules by defining
> a newtype wrapper around the chunky sequence type, while keeping the rest
> of the list code unchanged. You might argue, that code cannot be
> application specific if it still relies on the generic list type. Maybe
> it's the best to wrap the list type in a newtype and lift all of the
> application relevant list functions to this type and then define fusion
> rules on the lifted functions.

-- Don

Fusion vs. inlining (Was: Fusion of lists and chunky sequences)

Henning Thielemann

On Thu, 3 Jan 2008, Don Stewart wrote:

> You can, with some caveats, use a single fusion system across data
> structures, and avoid the built-in build/foldr system.
>
> I'd start by installing the stream-fusion list library from Hackage,
> which gives you the list API and a fusion mechanism.
>
> To avoid the built-in fusion system, you need to:

By 'built-in' you mean the fusion system of the Prelude?

>     * avoid list comprehensions
>     * avoid .. (use Stream.enumFromTo instead)

No problem.

> Then you can write fusion rules for your structure in terms of streams,
> and they'll fuse with list operations as well.

But then, in order to fuse your streams with my chunky sequences, I have
to know how stream-fusion fuses streams, right?



Anyway, I tried to wrap Prelude lists in a newtype and thus got GHC (still
6.4.1) to invoke my rules instead of the Prelude rules. But I encountered
the following problem: I define something like

  nonFusable x y = fusable (aux x y)

 where fusion rules are defined for 'fusable', but not for 'nonFusable'. I
hoped that 'nonFusable' would be inlined and 'fusable' would then be fused
with other expressions. This does not happen. If I state the function
definition also as a rule, then GHC fuses eagerly.
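
 (With "state the function definition also as a rule" I mean adding

   {-# RULES
     "nonFusable" forall x y.
        nonFusable x y = fusable (aux x y)
     #-}

 in addition to the plain definition above.)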
 Analogously I observed that usage of ($) and (.) blocks fusion, and when
I add the rules

  "unfold-dollar" forall f x.
     f $ x = f x ;

  "unfold-dot" forall f g.
     f . g  =  \x -> f (g x) ;

 then fusion takes place as expected.

Am I doing something wrong? Do I have to dig into phase control? I would
rather not, since it seems to be too fragile.

Re: Fusion vs. inlining (Was: Fusion of lists and chunky sequences)

Roman Leshchinskiy
Henning Thielemann wrote:

>
> Anyway, I tried to wrap Prelude lists in a newtype and thus got GHC (still
> 6.4.1) to invoke my rules instead of the Prelude rules. But I encountered
> the following problem: I define something like
>
>   nonFusable x y = fusable (aux x y)
>
>  where fusion rules are defined for 'fusable', but not for 'nonFusable'. I
> hoped that 'nonFusable' would be inlined and 'fusable' would then be fused
> with other expressions. This does not happen. If I state the function
> definition also as a rule, then GHC fuses eagerly.

I suspect that fusable and/or aux are inlined into nonFusable when the
latter is compiled. That's too early - you want nonFusable (with the
simple definition above) to be inlined into the client code first. Adding

   {-# INLINE nonFusable #-}

should take care of this.

>  Analogously I observed that usage of ($) and (.) blocks fusion, and when
> I add the rules
>
>   "unfold-dollar" forall f x.
>      f $ x = f x ;
>
>   "unfold-dot" forall f g.
>      f . g  =  \x -> f (g x) ;
>
>  then fusion takes place as expected.

That shouldn't be necessary; these two ought to be inlined. Do you have
a concrete example where this happens?

Roman


Re: Fusion vs. inlining (Was: Fusion of lists and chunky sequences)

Henning Thielemann

On Tue, 8 Jan 2008, Roman Leshchinskiy wrote:

> Henning Thielemann wrote:
> >
> > Anyway, I tried to wrap Prelude lists in a newtype and thus got GHC (still
> > 6.4.1) to invoke my rules instead of the Prelude rules. But I encountered
> > the following problem: I define something like
> >
> >   nonFusable x y = fusable (aux x y)
> >
> >  where fusion rules are defined for 'fusable', but not for 'nonFusable'. I
> > hoped that 'nonFusable' would be inlined and 'fusable' would then be fused
> > with other expressions. This does not happen. If I state the function
> > definition also as a rule, then GHC fuses eagerly.
>
> I suspect that fusable and/or aux are inlined into nonFusable when the
> latter is compiled. That's too early - you want nonFusable (with the
> simple definition above) to be inlined into the client code first. Adding
>
>    {-# INLINE nonFusable #-}
>
> should take care of this.

I forgot to mention that I had already declared
  {-# INLINE nonFusable #-}

As you guessed, it seems that 'fusable' was inlined and thus was no longer
available for fusion. So I set INLINE for nonFusable and NOINLINE for
fusable. Now I see the foldr/build counter decrease in the simplifier
statistics, while the counter for my custom rules increases.
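
Schematically, the combination that works for me is:

  {-# INLINE   nonFusable #-}
  {-# NOINLINE fusable    #-}

  nonFusable x y = fusable (aux x y)
  -- fusion rules are defined on 'fusable' as before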


> >  Analogously I observed that usage of ($) and (.) blocks fusion, and when
> > I add the rules
> >
> >   "unfold-dollar" forall f x.
> >      f $ x = f x ;
> >
> >   "unfold-dot" forall f g.
> >      f . g  =  \x -> f (g x) ;
> >
> >  then fusion takes place as expected.
>
> That shouldn't be necessary; these two ought to be inlined. Do you have
> a concrete example where this happens?

 When I constructed a small example, I encountered the behaviour you
mentioned. However, there are still several cases where I expected fusion
to take place but it didn't. Maybe I can track them down to very
simple cases. Since the functions on the chunky structure can be faster
than the list functions by a factor of 30, I'm very keen on getting the
list functions eliminated completely.

Properties of optimizer rule application?

Henning Thielemann
In reply to this post by Roman Leshchinskiy

Reading various papers and the Wiki about GHC optimizer rules, I got the
impression that there are not many properties I can rely on, and I wonder
how I can write a reliable fusion framework under this constraint.
 I read about the strategy of replacing functions early by fusable
implementations and replacing them back by fast low-level implementations if
fusion was not possible. However, can I rely on the back-translation if I
have no guarantee that the corresponding rule is applied? Is there some
guarantee that rules are applied as long as applicable rules are available,
or is the optimizer free to decide that it has worked enough for today?
 I see several phases with a fixed number of iterations in the output of
-ddump-simpl-iterations. Is there some idea behind these phases, or are the
structure and number rather arbitrary? If there is only a fixed number of
simplifier runs, how can I rely on complete fusion of arbitrarily large
expressions?
 Somewhere I read that the order of application of rules is arbitrary.
I would like to have some guarantee that more specific rules are applied
before more general ones. That is, if rule X is applicable wherever Y is
applicable, then Y shall be tried before X. This is currently not assured,
right?
 Another text passage says that simplification proceeds inside-out through
expressions. Such a fixed direction would make the design of rules definitely
easier. Having both directions, maybe alternating between runs of the
simplifier, would also be nice. I could then design transforms of the
kind:
   toFastStructure . slowA . slowB . slowC . slowWithNoFastCounterpart
   fastA . toFastStructure . slowB . slowC . slowWithNoFastCounterpart
   fastA . fastB . toFastStructure . slowC . slowWithNoFastCounterpart
   fastA . fastB . fastC . toFastStructure . slowWithNoFastCounterpart
   fastA . fastBC . toFastStructure . slowWithNoFastCounterpart
   fastABC . toFastStructure . slowWithNoFastCounterpart

 On the one hand, the body of a function may not be available for fusion if
the INLINE pragma is omitted. As far as I know, inlining may take place
even without the INLINE pragma, but I have no guarantee. Can I rely on
functions with an INLINE pragma being inlined? Somewhere I read that
functions are not inlined if there is still an applicable rule that uses
the function on the left-hand side. Altogether I'm uncertain how inlining
is interleaved with rule application. It was said that rules are just
alternative function definitions. In this sense a function definition with
INLINE is a more aggressively applied simplifier rule, right?
 On the other hand, if I set the INLINE pragma, then the body of the
function is not fused. If it were, I could guide the
optimizer to fuse certain sub-expressions before others. Say,
  doubleMap f g = map f . map g
 could be fused to
  doubleMap f g = map (f . g)
 and then this fused version could be fused further in the context of the
caller. The current situation seems to be that {-# INLINE doubleMap #-}
switches off local fusion and allows global fusion, whereas omitting the
INLINE pragma switches on local fusion and disallows global fusion. How
can I have both of them?

RE: Properties of optimizer rule application?

Simon Peyton Jones
GHC has one main mechanism for controlling the application of rules, namely simplifier "phases".  You can say "apply this rule only after phase N" or "apply this rule only before phase N".  Similarly for INLINE pragmas.  The manual describes this in detail.
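
For example, the "replace early by a fusable version, rewrite back later" idiom mentioned earlier in this thread can be staged like this (an illustrative sketch only, with made-up names):

        slow, fast :: [Int] -> [Int]
        slow = map (+1)    -- the ordinary implementation
        fast = map (+1)    -- stands for a fusable implementation
        {-# NOINLINE slow #-}
        {-# NOINLINE fast #-}

        -- "slow->fast" is active only before phase 1 (the [~1]);
        -- "fast->slow" only from phase 1 onwards (the [1]).
        {-# RULES
          "slow->fast" [~1] forall xs. slow xs = fast xs ;
          "fast->slow" [1]  forall xs. fast xs = slow xs
          #-}

So 'fast' gets its chance to fuse in the early phases, and whatever has not fused by phase 1 is rewritten back to 'slow'.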

I urge against relying on "top-down" or "bottom-up" guarantees, because they are fragile: if you miss a single opportunity to apply rule A, then rule B may kick in; but a later inlining or other simplification might make rule A applicable.  Phases are the way to go.

That said, GHC has much too rigid a notion of phases at the moment. There are precisely 3, namely 2 then 1 then 0, and that does not give enough control.  Really we should let you give arbitrary names to phases, express constraints (A must be before B), and run a constraint solver to map phase names to a linear ordering.  The current system is horribly non-modular.

There's scope for an intern project here.

Simon


Re: Properties of optimizer rule application?

Roman Leshchinskiy
In reply to this post by Henning Thielemann
Henning Thielemann wrote:
> Reading various papers and the Wiki about GHC optimizer rules, I got the
> impression that there are not many properties I can rely on, and I wonder
> how I can write a reliable fusion framework under this constraint.

That depends on your definition of reliable. You can't have a framework
which fuses everything that can be fused but then, I don't think that's
even theoretically possible. You can, however, have a framework which
does a pretty good job.

>  I read about the strategy of replacing functions early by fusable
> implementations and replacing them back by fast low-level implementations if
> fusion was not possible. However, can I rely on the back-translation if I
> have no guarantee that the corresponding rule is applied? Is there some
> guarantee that rules are applied as long as applicable rules are available,
> or is the optimizer free to decide that it has worked enough for today?
>  I see several phases with a fixed number of iterations in the output of
> -ddump-simpl-iterations. Is there some idea behind these phases, or are the
> structure and number rather arbitrary? If there is only a fixed number of
> simplifier runs, how can I rely on complete fusion of arbitrarily large
> expressions?

In general, you can't. You can control the number of simplifier phases
with -fsimplifier-phases (in the HEAD only) and the number of iterations
in each phase with -fmax-simplifier-iterations.

That said, there are other things that break fusion (such as code
getting between two functions you want to fuse). Again, you can only try
to make your framework good enough; it'll never be perfect.

>  Somewhere I read that the order of application of rules is arbitrary.
> I would like to have some guarantee that more specific rules are applied
> before more general ones. That is, if rule X is applicable wherever Y is
> applicable, then Y shall be tried before X. This is currently not assured,
> right?

IIRC, ghc tries more specific rules first but that's somewhat
unreliable. You can make rule X inactive in simplifier phase 2, however.
Then, only rule Y will be tried in phase 2; both rules will be tried in
subsequent phases.

I suspect, though, that ordering requirements on rules might indicate a
problem in the design of the fusion framework. I think they are best
avoided.

>  Another text passage says that simplification proceeds inside-out through
> expressions. Such a fixed direction would make the design of rules definitely
> easier. Having both directions, maybe alternating between runs of the
> simplifier, would also be nice. I could then design transforms of the
> kind:
>    toFastStructure . slowA . slowB . slowC . slowWithNoFastCounterpart
>    fastA . toFastStructure . slowB . slowC . slowWithNoFastCounterpart
>    fastA . fastB . toFastStructure . slowC . slowWithNoFastCounterpart
>    fastA . fastB . fastC . toFastStructure . slowWithNoFastCounterpart
>    fastA . fastBC . toFastStructure . slowWithNoFastCounterpart
>    fastABC . toFastStructure . slowWithNoFastCounterpart

Again, I don't think you really want to rely on the order of
simplification. For your example, you only need the following rules:

toFastStructure (slow{A|B|C} x) = fast{A|B|C} (toFastStructure x)
fastB (fastC x) = fastBC x
fastA (fastBC x) = fastABC x

They do not require any specific traversal order.
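
Spelled out as actual pragmas, the shorthand expands to:

{-# RULES
  "toFast/slowA" forall x. toFastStructure (slowA x) = fastA (toFastStructure x) ;
  "toFast/slowB" forall x. toFastStructure (slowB x) = fastB (toFastStructure x) ;
  "toFast/slowC" forall x. toFastStructure (slowC x) = fastC (toFastStructure x) ;
  "fastB/fastC"  forall x. fastB (fastC x)  = fastBC x ;
  "fastA/fastBC" forall x. fastA (fastBC x) = fastABC x
  #-}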

>  On the one hand, the body of a function may not be available for fusion if
> the INLINE pragma is omitted. As far as I know, inlining may take place
> even without the INLINE pragma, but I have no guarantee. Can I rely on
> functions with an INLINE pragma being inlined?

No. The inliner still uses heuristics to determine if inlining really is
beneficial. If you want to be sure, use rewrite rules.

 > Somewhere I read that
> functions are not inlined if there is still an applicable rule that uses
> the function on the left-hand side. Altogether I'm uncertain how inlining
> is interleaved with rule application. It was said that rules are just
> alternative function definitions. In this sense a function definition with
> INLINE is a more aggressively applied simplifier rule, right?

No, rules are "more aggressive" since they are applied unconditionally.

>  On the other hand, if I set the INLINE pragma, then the body of the
> function is not fused. If it were, I could guide the
> optimizer to fuse certain sub-expressions before others. Say,
>   doubleMap f g = map f . map g
>  could be fused to
>   doubleMap f g = map (f . g)
>  and then this fused version could be fused further in the context of the
> caller. The current situation seems to be that {-# INLINE doubleMap #-}
> switches off local fusion and allows global fusion, whereas omitting the
> INLINE pragma switches on local fusion and disallows global fusion. How
> can I have both of them?

If you say {-# INLINE doubleMap #-}, you really expect doubleMap to be
inlined and never to be called explicitly; therefore, you don't really
care too much what actually happens to it. You can, however, do
something like:

{-# NOINLINE doubleMap #-}
doubleMap f g = map f . map g   -- will be fused

{-# RULES
   "doubleMap" forall f g. doubleMap f g = map f . map g
#-}

This makes sure that the rhs of doubleMap will be optimised for those
cases when the rule doesn't fire (e.g., when doubleMap is applied to
fewer than two arguments).

Roman


RE: Properties of optimizer rule application?

Henning Thielemann
In reply to this post by Simon Peyton Jones

On Wed, 16 Jan 2008, Simon Peyton-Jones wrote:

> GHC has one main mechanism for controlling the application of rules,
> namely simplifier "phases".  You can say "apply this rule only after
> phase N" or "apply this rule only before phase N".  Similarly for INLINE
> pragmas.  The manual describes this in detail.

Indeed. But since it does not mention the number of phases, nor the number
of iterations per phase, nor what actually is performed per iteration,
this appeared to me to be an internal issue of GHC which should not be
relied on.

> I urge against relying on "top-down" or "bottom-up" guarantees, because
> they are fragile: if you miss a single opportunity to apply rule A, then
> rule B may kick in; but a later inlining or other simplification might
> make rule A applicable.  Phases are the way to go.

I see.

> That said, GHC has much too rigid a notion of phases at the moment.
> There are precisely 3, namely 2 then 1 then 0, and that does not give enough control.

What about the 'gentle' phase in the dump?

> Really we should let you give arbitrary names to phases, express
> constraints (A must be before B), and run a constraint solver to map
> phase names to a linear ordering.

Sounds like a topological sort. Reminds me of precedence control of infix
operators.

It seems to me that you already have something more sophisticated in mind.
What you sketch would allow application-specific code to defer
optimization rules from the standard libraries. E.g. I could write rules
for lists that are designed for my application and that can be applied
without interference from Data.List. When no more of my rules can be
applied, the Data.List rules can fuse the rest.


It's interesting to think about how to integrate this into the Haskell
language. When you want to state "phase A before phase B" you may have to
refer to phases defined in other modules. You have to be able to import
them from other modules, and you cannot use the regular 'import' syntax,
since phase identifiers are not part of the Haskell language. Maybe you
must enclose those imports in pragmas, too. You would need new module
dependency checking, since more dependencies can be introduced when
optimization is switched on, or you would have to restrict phase imports
to modules that are imported anyway.

{-# RULES
  import qualified Data.List as List
  #-}

> There's scope for an intern project here.

I could take the opportunity.

Re: Properties of optimizer rule application?

Henning Thielemann
In reply to this post by Roman Leshchinskiy

On Wed, 16 Jan 2008, Roman Leshchinskiy wrote:

> Henning Thielemann wrote:
> > Reading various papers and the Wiki about GHC optimizer rules, I got the
> > impression that there are not many properties I can rely on, and I wonder
> > how I can write a reliable fusion framework under this constraint.
>
> That depends on your definition of reliable. You can't have a framework
> which fuses everything that can be fused but then, I don't think that's
> even theoretically possible.

At least I expect that it fuses greedily and does not stop as long as
rules are applicable. Then, thinking of the intermediate fusable function
replacements, I can be sure that the rules which prevent me from making
things worse by my optimization attempts are actually invoked.

> >  I read about the strategy of replacing functions early by fusable
> > implementations and replacing them back by fast low-level implementations if
> > fusion was not possible. However, can I rely on the back-translation if I
> > have no guarantee that the corresponding rule is applied? Is there some
> > guarantee that rules are applied as long as applicable rules are available,
> > or is the optimizer free to decide that it has worked enough for today?
> >  I see several phases with a fixed number of iterations in the output of
> > -ddump-simpl-iterations. Is there some idea behind these phases, or are the
> > structure and number rather arbitrary? If there is only a fixed number of
> > simplifier runs, how can I rely on complete fusion of arbitrarily large
> > expressions?
>
> In general, you can't.

To give a precise example: if I have a sequence of 'map's
  map f0 . map f1 . ... . map fn
 then is there some length beyond which this is no longer collapsed to a
single 'map'? But then I wonder how it is possible to make the compiler
go into an infinite loop with the rule

   "loop"   forall x,y.  f x y = f y x

 as stated in the GHC manual:
   http://haskell.org/ghc/docs/latest/html/users_guide/rewrite-rules.html

 I'm still uncertain how much is done in one iteration in one phase, since
there seem to be several rules that can fire in one iteration.


> You can control the number of simplifier phases with -fsimplifier-phases
> (in the HEAD only) and the number of iterations in each phase with
> -fmax-simplifier-iterations.

Good to know.

> That said, there are other things that break fusion (such as code
> getting between two functions you want to fuse). Again, you can only try
> to make your framework good enough; it'll never be perfect.

It would be nice to have a flag which alters the rule application order of
the compiler randomly in order to see whether the fusion framework
implicitly relies on a particular behaviour of the current compiler
version.

> >  Another text passage says that simplification proceeds inside-out through
> > expressions. Such a fixed direction would make the design of rules definitely
> > easier. Having both directions, maybe alternating between runs of the
> > simplifier, would also be nice. I could then design transforms of the
> > kind:
> >    toFastStructure . slowA . slowB . slowC . slowWithNoFastCounterpart
> >    fastA . toFastStructure . slowB . slowC . slowWithNoFastCounterpart
> >    fastA . fastB . toFastStructure . slowC . slowWithNoFastCounterpart
> >    fastA . fastB . fastC . toFastStructure . slowWithNoFastCounterpart
> >    fastA . fastBC . toFastStructure . slowWithNoFastCounterpart
> >    fastABC . toFastStructure . slowWithNoFastCounterpart
>
> Again, I don't think you really want to rely on the order of
> simplification. For your example, you only need the following rules:
>
> toFastStructure (slow{A|B|C} x) = fast{A|B|C} (toFastStructure x)
> fastB (fastC x) = fastBC x
> fastA (fastBC x) = fastABC x
>
> They do not require any specific traversal order.

Ok, this was a bad example. Try this one:
   project . project . foo
 with the rules
   project (project x) = project x
   project (foo     x) = projectFoo x

Both rules can be applied to the expression, but you get one more fusion
if you use the first one first. Let me guess: in order to solve that, I
should restrict the first rule to an earlier phase than the second rule.




Thanks for the detailed answer, and thanks to the busy people who have
created the optimizer and written all the papers and Wiki pages on
making use of this feature. I don't know of another language where it is
possible to control the optimizer in this way.

Re: Properties of optimizer rule application?

Roman Leshchinskiy
Henning Thielemann wrote:
>
> To give a precise example: if I have a sequence of 'map's
>   map f0 . map f1 . ... . map fn
>  then is there some length beyond which this is no longer collapsed to a
> single 'map'?

No. After applying a rule, the simplifier optimises the result of the
rewriting. This means that with (map f (map g x) = map (f . g) x),

   map f (map g (map h xs))

is first rewritten to

   map (f . g) (map h xs)

and then immediately to

   map (f . g . h) xs

Rewriting does not shift the "focus" of the simplifier.
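
(As a pragma, the rule above would read

{-# RULES
  "map/map" forall f g xs. map f (map g xs) = map (f . g) xs
  #-}

though note that the actual Prelude achieves this effect indirectly through
its build/foldr rules rather than with a direct map/map rule.)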

>    project . project . foo
>  with the rules
>    project (project x) = project x
>    project (foo     x) = projectFoo x
>
> Both rules can be applied to the expression, but you get one more fusion
> if you use the first one first. Let me guess: in order to solve that, I
> should restrict the first rule to an earlier phase than the second rule.

That's one possibility. It would be vastly preferable, however, to add
the rule

   project (projectFoo x) = projectFoo x

In general, you want your rewrite system to be confluent. I suspect that
non-confluence always indicates a design problem. This is within one
set of rules, of course - explicitly staged things like "rewriting back"
of stuff which didn't fuse are different.

Roman


RE: Re: Properties of optimizer rule application?

Simon Peyton Jones
In reply to this post by Henning Thielemann

| Ok, this was a bad example. Try this one:
|    project . project . foo
|  with the rules
|    project (project x) = project x
|    project (foo     x) = projectFoo x
|
| Both rules can be applied to the expression, but you get one more fusion
| if you use the first one first. Let me guess: in order to solve that, I
| should restrict the first rule to an earlier phase than the second rule.

As you point out, this set of rules is not confluent:
        project (project (foo x))
can reduce to
        --->  project (projectFoo x)
or to
        ---> projectFoo x
depending on the order of application.

The conventional solution is not to apply the rules very carefully (which is extremely hard in general), but rather to "complete" the rules, by adding
        project (projectFoo x) = projectFoo x

Now it doesn't matter which order you apply them in.

You can do this by hand, although it'd be quite a nice thing to automate it in GHC.
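
In concrete pragma form the completed set reads:

        {-# RULES
          "project/project"    forall x. project (project x)    = project x ;
          "project/foo"        forall x. project (foo x)        = projectFoo x ;
          "project/projectFoo" forall x. project (projectFoo x) = projectFoo x
          #-}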


| To give a precise example: if I have a sequence of 'map's
|   map f0 . map f1 . ... . map fn
|  then is there some length beyond which this is no longer collapsed to a
| single 'map'?

(a) GHC tries to do as much as possible in a single iteration of the simplifier; I think it uses an outermost-first strategy for this.

(b) For each phase it runs the simplifier until nothing changes, or a maximum of N times, where N is settable by a command-line-flag -fmax-simplifier-iterations.  After N it stops running that phase, even if the simplification has not terminated.

| But then I wonder how it is possible to make the compiler
| go into an infinite loop with the rule
|
|    "loop"   forall x,y.  f x y = f y x

Yes, it's possible.  Remember (a) does "as much as possible", which in your rule means rather a lot.


In this thread Roman and I have described stuff that isn't in the manual.  Henning, would you feel like elaborating the Wiki page
        http://haskell.org/haskellwiki/GHC/Using_rules
(which already has a lot of info) to reflect what you've learned?  That way it's preserved for others.

thanks

Simon


RE: Re: Properties of optimizer rule application?

Henning Thielemann

On Thu, 17 Jan 2008, Simon Peyton-Jones wrote:

> | To give a precise example: if I have a sequence of 'map's
> |   map f0 . map f1 . ... . map fn
> |  then is there some length beyond which this is no longer collapsed to a
> | single 'map'?
>
> (a) GHC tries to do as much as possible in a single iteration of the
> simplifier; I think it uses an outermost-first strategy for this.
>
> (b) For each phase it runs the simplifier until nothing changes, or a
> maximum of N times, where N is settable by a command-line-flag
> -fmax-simplifier-iterations.  After N it stops running that phase, even
> if the simplification has not terminated.

This means that the simplifier follows a specific direction (outermost to
innermost or vice versa). I shall not rely on the order, but I must expect
that there is an order, which restricts the application of rules. If it
really did "as much as possible in a single iteration", then there
would be nothing left to do after one iteration, since the set of
applicable rules remains the same within one phase.

> | But then I wonder how it is possible to make the compiler
> | go into an infinite loop with the rule
> |
> |    "loop"   forall x,y.  f x y = f y x
>
> Yes, it's possible.  Remember (a) does "as much as possible", which in your rule means rather a lot.

"as much as possible" in a particular order (which I shall not rely on),
right?

> In this thread Roman and I have described stuff that isn't in the
> manual.  Henning, would you feel like elaborating the Wiki page
>         http://haskell.org/haskellwiki/GHC/Using_rules
> (which already has a lot of info) to reflect what you've learned?  That way it's preserved for others.

I have added many points (I hope I haven't made things more confusing
than they were), set more links, and added more articles to
  http://www.haskell.org/haskellwiki/Category:Program_transformation


I think I also found a typo:

http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#phase-control

The last line in the list should certainly be
  "NOINLINE[~k] f" means: be willing to inline f until phase k, but from phase k onwards do not inline it.
   ^^


Recently I found that specialisation interacts in an unexpected way with
explicit RULES (and with inlining). I used a function multiple times, and
this seemed to make GHC specialise this function to the particular type
(although I did not use a SPECIALISE pragma). But then this function
was no longer available for fusion. It reminds me of the sharing problem:
it is not always an optimization to share common sub-expressions. In this
case the common sub-expression was a function which was used with the same
type (and thus the same class dictionary) for each call.

Also, in one case declaring a function 'foo' as INLINE [0] prevented fusion
of 'foo', whereas with NOINLINE [0] the fusion happened in phase 2. I assumed
that these two pragmas were identical in the phases before 0.

Summarized, I think it is not only required to have better control over the
phases of the optimizer, but also to have a clear unifying concept of the
several kinds of program transformations, namely SPECIALISE, INLINE, and RULES.

RE: Re: Properties of optimizer rule application?

Simon Peyton Jones
| I think I also found a typo:

Quite right, thanks -- now fixed.

| Recently I found that specialisation interacts in an unexpected way with
| explicit RULES (and with inlining). I used a function multiple times, and
| this seemed to make GHC specialise this function to the particular type
| (although I did not use a SPECIALISE pragma). But then this function
| was no longer available for fusion.

Yes, specialisation generates new RULES.  These new rules apply all the time, so they may rewrite your call to f before your fusion rules run.  This is a bad thing.  Really, specialisation should generate rules for a particular phase 'spec', and you should be able to add your fusion rules to precede or follow 'spec'.

| Also, in one case declaring a function 'foo' as INLINE [0] prevented fusion
| of 'foo', whereas with NOINLINE [0] the fusion happened in phase 2. I assumed
| that these two pragmas were identical in the phases before 0.

I'm not sure what is going on here.

| Summarized, I think it is not only required to have better control over the
| phases of the optimizer, but also to have a clear unifying concept of the
| several kinds of program transformations, namely SPECIALISE, INLINE, and RULES.

I can hardly argue with a "clear unifying concept".  But I'm not quite sure what you mean.  Here's a stab.  You could put this on the wiki and refine it with help from this end.

* SPECIALISE generates a rule
* RULES specifies a rule
* INLINE is just like a rule (lhs = rhs)

The latter two can be controlled to some extent by the "phase" mechanism.  The first should be.


Did you mean more than that?

Simon

Re: Properties of optimizer rule application?

Henning Thielemann

On Mon, 21 Jan 2008, Simon Peyton-Jones wrote:

> | Recently I found that specialisation interacts in an unexpected way with
> | explicit RULES (and with inlining). I used a function multiple times, and
> | this seemed to make GHC specialise this function to the particular type
> | (although I did not use a SPECIALISE pragma). But then this function
> | was no longer available for fusion.
>
> Yes, specialisation generates new RULES.  These new rules apply all the
> time, so they may rewrite your call to f before your fusion rules run.
> This is a bad thing.  Really, specialisation should generate rules for a
> particular phase 'spec', and you should be able to add your fusion rules
> to precede or follow 'spec'.

One could also make RULES and INLINE confluent with SPECIALISE by
generating specialised RULES and INLINE rules.

> | Summarized, I think it is not only required to have better control over the
> | phases of the optimizer, but also to have a clear unifying concept of the
> | several kinds of program transformations, namely SPECIALISE, INLINE, and RULES.
>
> I can hardly argue with a "clear unifying concept".  But I'm not quite
> sure what you mean.  Here's a stab.  You could put this on the wiki and
> refine it with help from this end.
>
> * SPECIALISE generates a rule
> * RULES specifies a rule
> * INLINE is just like a rule (lhs = rhs)
>
> The latter two can be controlled to some extend by the "phase" mechanism.  The first should be.
>
>
> Did you mean more than that?

Yes, this and more. Today

* SPECIALISE has no phase control and is applied conditionally
* RULES has phase control and is applied unconditionally
* INLINE has phase control and is applied conditionally depending on some cost measure;
     INLINE disables application of RULES within the function definition body,
     although it is not guaranteed that the function is always inlined

Phase control for SPECIALISE would be one way to greater uniformity. Is it
sufficient to restrict specialisation to one phase? Shall rules and
specialisations be controlled by costs? I don't know.

  I learnt in this thread that there are two ways to resolve conflicts:
 1. Make rules confluent, that is, merge what fits together.
 2. Assign rules to different phases, that is, keep separated what cannot be merged.
  It seems to me that establishing confluence is to be preferred, because
if two rules are active only in separate phases they cannot be applied
alternately multiple times. Thus an improved optimizer should somehow
support making fusion systems confluent when they consist of all three kinds
of rules.