Hello all,
First of all, I'm sorry that this email is so absurdly long. But it's not easy to explain the problem at hand, so I took a step-by-step approach. The executive summary is: GHC can do a great job with inlining, but often it doesn't, and I don't understand why. So I have some questions, which are highlighted in the text below. In general, any insights regarding inlining or improving the performance of generics are welcome. My final goal is to be able to state that generic functions (in particular using GHC.Generics) will have no runtime overhead whatsoever when compared to a handwritten type-specific version. The setting Generic programming is based on representing datatypes in a uniform way using a small set of representation types. Functions defined on those representation types can then be applied to all datatypes, because we can convert between datatypes and their representations. However, generic functions tend to be slower than their specialised counterparts, because they have to deal with the conversions. But clever inlining (together with other compiler optimisations) can completely remove this overhead. The problem I'm tackling is how to tell GHC exactly what it should in the particular case of optimisation of generic code. Simplified example I'll focus on the problem of optimising a non-trivial function for generic enumeration of terms. My experience shows that GHC does quite good at optimising simple functions, especially consumers (like generic equality). But producers are trickier. First, we'll need some auxiliary functions: -- | Interleave elements from two lists. Similar to (++), but swap left and The particular implementation of these functions doesn't really matter. What's important is that we have a way to interleave lists (|||) and a way to diagonalise a matrix into a list (diag). We mark these functions as NOINLINE because inlining them will only make the core code more complicated (and may prevent rules from firing). Suppose we have a type of Peano natural numbers: data Nat = Ze | Su Nat deriving Eq Implementing enumeration on this type is simple: enumNat :: [Nat] Now, a generic representation of Nat in terms of sums and products could look something like this: type RepNat = Either () Nat That is, either a singleton (for the Ze case) or a Nat (for the Su case). Note that I am building a shallow representation, since at the leaves we have Nat, and not RepNat. This mimics the situation with current generic programming libraries (in particular GHC.Generics). We'll need a way to convert between RepNat and Nat: toNat :: RepNat -> Nat (In fact, since we're only dealing with a generic producer we won't need the fromNat function.) To get an enumeration for RepNat, we first need to know how to enumerate units and sums: enumU :: [()] Now we can define an enumeration for RepNat: enumRepNat :: [RepNat] With the conversion function toNat, we can use enumRepNat to get an enumeration for Nat directly: enumNatFromRep :: [Nat] First, convince yourself that enumNatFromRep and enumNat are equivalent functions: take 100 enumNat == take 100 enumNatFromRep Now, what I want is that enumNatFromRep generates the same core code as enumNat. That should be possible; here are the necessary steps: map toNat enumRepNat Now let's see what the compiler generates. I'm using GHC-7.4.1. Let's compile with -O1 and use -ddump-simpl to see the final simplifier output (core code) for enumNatFromRep: EnumAlone.enumNatFromRep :: [EnumAlone.Nat] Ah, it didn't even inline enumRepNat because it made it a loop breaker. We certainly want to inline it, so let's add a pragma: {-# INLINE enumRepNat #-} Recompiling, we get: EnumAlone.enumRepNat [InlPrag=INLINE (sat-args=0)] So no real difference in the generated code, other than the reassignment of loop breakers. For some reason enumRepNat still doesn't get inlined. Question: why won't GHC inline enumRepNat, even when I tell it to do so with an INLINE pragma? Well, let's inline it ourselves, then. We redefine enumNatFromRep to: enumNatFromRep = map toNat (enumEither enumU enumNatFromRep) This however doesn't help much. We get the following core: lvl6_rw5 :: [Data.Either.Either () EnumAlone.Nat] GHC is really keen on floating that (|||) out. Let's be very explicit about inlining: {-# INLINE toNat #-} Also, maybe it's floating it out because it doesn't have anything else to do to it. Let's add the free theorem of (|||) as a rule: {-# RULES "ft |||" forall f a b. map f (a ||| b) = map f a ||| map f b #-} We needed this in our manual derivation, so GHC should need it too. Recompiling, we see we've made some progress: lvl5_ryv :: [Data.Either.Either () EnumAlone.Nat] enumNatFromRep finally starts with (|||) directly. But its second argument, lvl6_ryw, is a map of lvl5_ryv, which is itself a map! At this stage I expected GHC to be aware of the fusion law for map, but it seems that it isn't. Question: why is map fusion not happening automatically? Let's add it as a rule: {-# RULES "map/map1" forall f g l. map f (map g l) = map (f . g) l #-} And now we're in a much better situation: lvl3_ryD :: [Data.Either.Either () EnumAlone.Nat] Note how toNat is entirely gone from the second part of the enumeration (lvl5_ryF). Strangely enough, the enumerator for Ze (lvl4_ryE) is still very complicated: map toNat ([Left ()]). Why doesn't GHC simplify this to just [Ze]? Apparently because GHC doesn't simplify map over a single element list. Question: why doesn't GHC optimise map f [x] to [f x]? Let's tell it to do so: {-# RULES "map/map2" forall f x. map f (x:[]) = (f x):[] #-} Now we're finally where we wanted: lvl_ryA :: [EnumAlone.Nat] This is what I wanted: no more representation types (Either or ()), and the code looks exactly like what is generated for the handwritten enumNat. More realistic generic programming Now let's see if we can transport this to the setting of a generic programming library. I'll use a bare-bones version of GHC.Generics: infixr 5 :+: Let's represent Nat in this library: instance Representable Nat where (Note, in particular, that we do not need INLINE pragmas on the from/to methods. This might just be because GHC thinks these are small and inlines them anyway, but in general we want to make sure they are inlined, so we typically use pragmas there.) Now we need to implement enumeration generically. We do this by giving an instance for each representation type: class GEnum' a where We explicitly tell GHC to inline each case, as before. Note that for products I'm not using the more natural list comprehension syntax because I don't quite understand how that gets translated into core. In the cases for Var and Rec we use genum from the GEnum class: class GEnum a where GEnum' is the class used for instantiating the generic representation types, and GEnum is used for user types. We use a default signature to provide a default method that can be used when we have a Representable instance for the type in question. This makes instantiating Nat very easy: instance GEnum Nat Unfortunately, the core code generated in this situation (with the same RULES as before) is not nice at all: Main.$fGEnumNat_$cgenum [Occ=LoopBreaker] :: [Base.Nat] We see a map of the `to` function, which is definitely not what we want. Oddly enough, if we give an explicit definition of genum for Nat, with the inlined default... instance GEnum Nat where genum = map to genum' ... then we get the optimised code we want: lvl34_r79p :: [Base.Nat] Again, no representation types, no `to`, just the same code that is generated for enumNat. Perfect. But we had to avoid using the default definition, which is a pity. Question: why won't GHC inline the default method of a class, even when I have a pragma telling it to do so? Let's look at one more datatype, because Nat does not use products. So let's consider trees: data Tree a = Leaf | Bin a (Tree a) (Tree a) We give a GEnum instance using the same trick as before: instance GEnum (Tree Int) where genum = map to genum' (For simplicity only for trees of integers.) The generated code for trees is unfortunately not as nice: a2_r79M Note how lvl9_r79N is a map over a2_r79M, and a2_r79M is a `diag`. Ok, we need the free theorem of `diag` to tell GHC how to commute the `diag` with map: {-# RULES "ft/diag" forall f l. map f (diag l) = diag (map (map f) l) #-} Unfortunately this doesn't change the generated core code. With some more debugging looking at the generated code at each simplifier iteration, I believe that this is because a2_r79M got lifted out too soon, prevent the rule from applying. With some imagination I decided to try the -fno-full-laziness flag to prevent let-floating. I'm not sure this is a good idea in general, but in this particular case it gives much better results: a1_r72i :: [Base.Rec (Base.Tree GHC.Types.Int)] Note how our enum is now of the shape `[Leaf] ||| diag y`, which is good. The only catch is that there are some `Rec`s still laying around, with their associated newtype coercions, and a function a1_r72i that basically wraps the recursive enumeration in a Rec, only to be unwrapped in the body of `$fGEnumTree_$cgenum`. I don't know how to get GHC to simplify this code any further. Question: why do I need -fno-full-laziness for the ft/diag rule to apply? Question: why is GHC not getting rid of the Rec newtype in this case? I have also played with -O2, in particular because of the SpecConstr optimisation, but found that it does not affect these particular examples (perhaps it only becomes important with larger datatypes). I have also experimented with phase control in the rewrite rules and the inline pragmas, but didn't find it necessary for this example. In general, anyway, my experience with the inliner is that it is extremely fragile, especially across different GHC versions, and it's hard to get any guarantees of optimisation. I have also played with the -funfolding-* options before, with mixed results. [1] It's also a pity that certain flags are not explained in detail in the user's manual [2,3], like -fliberate-case, and -fspec-constr-count and threshold, for instance. Thank you for reading this. Any insights are welcome. In particular, I'm wondering if I might be missing some details regarding strictness. Cheers, Pedro [1] http://dreixel.net/research/pdf/ogie.pdf [2] http://www.haskell.org/ghc/docs/latest/html/users_guide/flag-reference.html [3] http://www.haskell.org/ghc/docs/latest/html/users_guide/options-optimise.html#options-f _______________________________________________ Glasgow-haskell-users mailing list [hidden email] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users |
Pedro Some responses to your long message! (Which I attach for background) Your example was unusual in that it used a lot of top-level definitions. GHC treats them slightly specially. Given: x = g 4 y = f x GHC does not transform into this: y = f (g 4) which it would do in a nested let. Why not? Because the latter will generate code that dynamically allocates a thunk for (g 4), while the former will make
a static thunk. (An alternative would be to treat them uniformly and only pull out those nested thunks at the very last minute; but GHC doesn’t do that right now.) A disadvantage is that it’s not statically visible to the simplifier that x is used once. If we have a RULE for f (g n), it might not fire -- because of the
worry that someone else might be sharing x. I think this is the root cause of much of your trouble. Incidentally , it makes no difference giving x an INLINE pragma. GHC is very cautious about duplicating non-values and currently
not even INLINE will make it less cautious. That’s another thing we could consider changing. I’ll respond to part 2 (about generic programming) separately Simon Hello all, First of all, I'm sorry that this email is so absurdly long. But it's not easy to explain the problem at hand, so I took a step-by-step approach. The executive summary is: GHC can do a great job with inlining, but often it doesn't, and I don't understand why. So I have some questions, which are highlighted in the text below. In general, any insights regarding inlining or improving the performance of generics are welcome. My final goal is to be able to state that generic functions (in particular using GHC.Generics) will have no runtime overhead whatsoever when compared to a handwritten type-specific version. The setting Generic programming is based on representing datatypes in a uniform way using a small set of representation types. Functions defined on those representation types can then be applied to all datatypes, because we can convert between datatypes and their representations. However, generic functions tend to be slower than their specialised counterparts, because they have to deal with the conversions. But clever inlining (together with other compiler optimisations) can completely remove this overhead. The problem I'm tackling is how to tell GHC exactly what it should in the particular case of optimisation of generic code. Simplified example I'll focus on the problem of optimising a non-trivial function for generic enumeration of terms. My experience shows that GHC does quite good at optimising simple functions, especially consumers (like generic equality). But producers are trickier. First, we'll need some auxiliary functions: -- | Interleave elements from two lists. Similar to (++), but swap left and The particular implementation of these functions doesn't really matter. What's important is that we have a way to interleave lists (|||) and a way to diagonalise a matrix into a list (diag). We mark these functions as NOINLINE because inlining them will only make the core code more complicated (and may prevent rules from firing). Suppose we have a type of Peano natural numbers: data Nat = Ze | Su Nat deriving Eq Implementing enumeration on this type is simple: enumNat :: [Nat] Now, a generic representation of Nat in terms of sums and products could look something like this: type RepNat = Either () Nat That is, either a singleton (for the Ze case) or a Nat (for the Su case). Note that I am building a shallow representation, since at the leaves we have Nat, and not RepNat. This mimics the situation with current generic programming libraries (in particular GHC.Generics). We'll need a way to convert between RepNat and Nat: toNat :: RepNat -> Nat (In fact, since we're only dealing with a generic producer we won't need the fromNat function.) To get an enumeration for RepNat, we first need to know how to enumerate units and sums: enumU :: [()] Now we can define an enumeration for RepNat: enumRepNat :: [RepNat] With the conversion function toNat, we can use enumRepNat to get an enumeration for Nat directly: enumNatFromRep :: [Nat] First, convince yourself that enumNatFromRep and enumNat are equivalent functions: take 100 enumNat == take 100 enumNatFromRep Now, what I want is that enumNatFromRep generates the same core code as enumNat. That should be possible; here are the necessary steps: map toNat enumRepNat Now let's see what the compiler generates. I'm using GHC-7.4.1. Let's compile with -O1 and use -ddump-simpl to see the final simplifier output (core code) for enumNatFromRep: EnumAlone.enumNatFromRep :: [EnumAlone.Nat] Ah, it didn't even inline enumRepNat because it made it a loop breaker. We certainly want to inline it, so let's add a pragma: {-# INLINE enumRepNat #-} Recompiling, we get: EnumAlone.enumRepNat [InlPrag=INLINE (sat-args=0)] So no real difference in the generated code, other than the reassignment of loop breakers. For some reason enumRepNat still doesn't get inlined. Question: why won't GHC inline enumRepNat, even when I tell it to do so with an INLINE pragma? Well, let's inline it ourselves, then. We redefine enumNatFromRep to: enumNatFromRep = map toNat (enumEither enumU enumNatFromRep) This however doesn't help much. We get the following core: lvl6_rw5 :: [Data.Either.Either () EnumAlone.Nat] GHC is really keen on floating that (|||) out. Let's be very explicit about inlining: {-# INLINE toNat #-} Also, maybe it's floating it out because it doesn't have anything else to do to it. Let's add the free theorem of (|||) as a rule: {-# RULES "ft |||" forall f a b. map f (a ||| b) = map f a ||| map f b #-} We needed this in our manual derivation, so GHC should need it too. Recompiling, we see we've made some progress: lvl5_ryv :: [Data.Either.Either () EnumAlone.Nat] enumNatFromRep finally starts with (|||) directly. But its second argument, lvl6_ryw, is a map of lvl5_ryv, which is itself a map! At this stage I expected GHC to be aware of the fusion law for map, but it seems that it isn't. Question: why is map fusion not happening automatically? Let's add it as a rule: {-# RULES "map/map1" forall f g l. map f (map g l) = map (f . g) l #-} And now we're in a much better situation: lvl3_ryD :: [Data.Either.Either () EnumAlone.Nat] Note how toNat is entirely gone from the second part of the enumeration (lvl5_ryF). Strangely enough, the enumerator for Ze (lvl4_ryE) is still very complicated: map toNat ([Left ()]). Why doesn't GHC simplify this to just [Ze]? Apparently because GHC doesn't simplify map over a single element list. Question: why doesn't GHC optimise map f [x] to [f x]? Let's tell it to do so: {-# RULES "map/map2" forall f x. map f (x:[]) = (f x):[] #-} Now we're finally where we wanted: lvl_ryA :: [EnumAlone.Nat] This is what I wanted: no more representation types (Either or ()), and the code looks exactly like what is generated for the handwritten enumNat. More realistic generic programming Now let's see if we can transport this to the setting of a generic programming library. I'll use a bare-bones version of GHC.Generics: infixr 5 :+: Let's represent Nat in this library: instance Representable Nat where (Note, in particular, that we do not need INLINE pragmas on the from/to methods. This might just be because GHC thinks these are small and inlines them anyway, but in general we want to make sure they are inlined, so we typically use pragmas there.) Now we need to implement enumeration generically. We do this by giving an instance for each representation type: class GEnum' a where We explicitly tell GHC to inline each case, as before. Note that for products I'm not using the more natural list comprehension syntax because I don't quite understand how that gets translated into core. In the cases for Var and Rec we use genum from the GEnum class: class GEnum a where GEnum' is the class used for instantiating the generic representation types, and GEnum is used for user types. We use a default signature to provide a default method that can be used when we have a Representable instance for the type in question. This makes instantiating Nat very easy: instance GEnum Nat Unfortunately, the core code generated in this situation (with the same RULES as before) is not nice at all: Main.$fGEnumNat_$cgenum [Occ=LoopBreaker] :: [Base.Nat] We see a map of the `to` function, which is definitely not what we want. Oddly enough, if we give an explicit definition of genum for Nat, with the inlined default... instance GEnum Nat where genum = map to genum' ... then we get the optimised code we want: lvl34_r79p :: [Base.Nat] Again, no representation types, no `to`, just the same code that is generated for enumNat. Perfect. But we had to avoid using the default definition, which is a pity. Question: why won't GHC inline the default method of a class, even when I have a pragma telling it to do so? Let's look at one more datatype, because Nat does not use products. So let's consider trees: data Tree a = Leaf | Bin a (Tree a) (Tree a) We give a GEnum instance using the same trick as before: instance GEnum (Tree Int) where genum = map to genum' (For simplicity only for trees of integers.) The generated code for trees is unfortunately not as nice: a2_r79M Note how lvl9_r79N is a map over a2_r79M, and a2_r79M is a `diag`. Ok, we need the free theorem of `diag` to tell GHC how to commute the `diag` with map: {-# RULES "ft/diag" forall f l. map f (diag l) = diag (map (map f) l) #-} Unfortunately this doesn't change the generated core code. With some more debugging looking at the generated code at each simplifier iteration, I believe that this is because a2_r79M got lifted out too soon, prevent the rule from applying. With some imagination I decided to try the -fno-full-laziness flag to prevent let-floating. I'm not sure this is a good idea in general, but in this particular case it gives much better results: a1_r72i :: [Base.Rec (Base.Tree GHC.Types.Int)] Note how our enum is now of the shape `[Leaf] ||| diag y`, which is good. The only catch is that there are some `Rec`s still laying around, with their associated newtype coercions, and a function a1_r72i that basically wraps the recursive enumeration in a Rec, only to be unwrapped in the body of `$fGEnumTree_$cgenum`. I don't know how to get GHC to simplify this code any further. Question: why do I need -fno-full-laziness for the ft/diag rule to apply? Question: why is GHC not getting rid of the Rec newtype in this case? I have also played with -O2, in particular because of the SpecConstr optimisation, but found that it does not affect these particular examples (perhaps it only becomes important with larger datatypes). I have also experimented with phase control in the rewrite rules and the inline pragmas, but didn't find it necessary for this example. In general, anyway, my experience with the inliner is that it is extremely fragile, especially across different GHC versions, and it's hard to get any guarantees of optimisation. I have also played with the -funfolding-* options before, with mixed results. [1] It's also a pity that certain flags are not explained in detail in the user's manual [2,3], like -fliberate-case, and -fspec-constr-count and threshold, for instance. Thank you for reading this. Any insights are welcome. In particular, I'm wondering if I might be missing some details regarding strictness. Cheers, Pedro [1] http://dreixel.net/research/pdf/ogie.pdf [2] http://www.haskell.org/ghc/docs/latest/html/users_guide/flag-reference.html [3] http://www.haskell.org/ghc/docs/latest/html/users_guide/options-optimise.html#options-f _______________________________________________ Glasgow-haskell-users mailing list [hidden email] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users |
In reply to this post by José Pedro Magalhães
Pedro
The second part of your message (attached), about generics, turned out to be *much* more subtle. You wondered why it made a difference whether you said instance GEnum Nat where genum = map to genum' or instance GEnum Nat -- Fill in from default method Well, it turns out that the difference is largely accidental. Here are the types of the functions involved: to :: Representable a => Rep a -> a genum' :: GEnum' a => [a] type instance Rep Nat = RepNat type RepNat = U :+: (Rec Nat) Consider the instance definition genum = map to genum' There are two different ways of typing it: (A) map @RepNat @Nat (to @Nat dReprNat |> g1) (genum' @RepNat dGEnum'_RepNat) where g1 :: Rep Nat -> Nat ~ RepNat -> Nat dReprNat :: Representable Nat dGEnum'Nat :: GEnum' RepNat or (B) map @(Rep Nat) @Nat (to @Nat dReprNat) (genum' @(Rep Nat) dGEnum'_Rep_Nat) where dReprNat :: Representable Nat dGEnum'Nat :: GEnum' (Rep Nat) Which of these is chosen depends on accidental things in the constraint solver; it's not supposed to matter. But it DOES affect whether the map/(|||) rule fires. {-# RULES "ft |||" forall f a b. map f (a ||| b) = map f a ||| map f b #-} It makes a difference because in (A) we have an instance for GEnum' RepNat that uses ||| directly, instance (GEnum' f, GEnum' g) => GEnum' (f :+: g) where genum' = map L genum' ||| map R genum' so we get map ... (blah1 ||| blah2) But in (B) we need an instance for GEnum' (Rep Nat) and that has an extra cast, so we get map ... ((blah1 ||| blah2) |> g) And the cast stops the RULE for map/(|||) firing. =============== Parametricity to the rescue ============= Note that (|||) :: [a] -> [a] -> [a] So by parametricity we know that if g :: [T1] ~ [T2] then ((|||) @T1 xs ys |> g) = ((|||) @T2 (xs |> g) (ys |> g) If we used that to push the cast inwards, the RULE would match. Likewise, map is polymorphic: map :: (a->b) -> [a] -> [b] So by parametricity if :: [T1] -> [T2] then map @T2 @TR f (xs |> g)] = map @T1 @TR (f |> sym (right g) -> TR) xs If we used that to move the cast out of the way, the RULE would match too. But GHC is nowhere near clever enough to do either of these things. And it's far from obvious what to do in general. ================= Bottom line: the choices made by the constraint solver can affect exactly where casts are inserted into the code. GHC knows how to move casts around to stop them getting in the way of its own transformations, but is helpless if they get in the way of RULES. I am really not sure how to deal with this. But it is very interesting! Simon Hello all, First of all, I'm sorry that this email is so absurdly long. But it's not easy to explain the problem at hand, so I took a step-by-step approach. The executive summary is: GHC can do a great job with inlining, but often it doesn't, and I don't understand why. So I have some questions, which are highlighted in the text below. In general, any insights regarding inlining or improving the performance of generics are welcome. My final goal is to be able to state that generic functions (in particular using GHC.Generics) will have no runtime overhead whatsoever when compared to a handwritten type-specific version. The setting Generic programming is based on representing datatypes in a uniform way using a small set of representation types. Functions defined on those representation types can then be applied to all datatypes, because we can convert between datatypes and their representations. However, generic functions tend to be slower than their specialised counterparts, because they have to deal with the conversions. But clever inlining (together with other compiler optimisations) can completely remove this overhead. The problem I'm tackling is how to tell GHC exactly what it should in the particular case of optimisation of generic code. Simplified example I'll focus on the problem of optimising a non-trivial function for generic enumeration of terms. My experience shows that GHC does quite good at optimising simple functions, especially consumers (like generic equality). But producers are trickier. First, we'll need some auxiliary functions: -- | Interleave elements from two lists. Similar to (++), but swap left and The particular implementation of these functions doesn't really matter. What's important is that we have a way to interleave lists (|||) and a way to diagonalise a matrix into a list (diag). We mark these functions as NOINLINE because inlining them will only make the core code more complicated (and may prevent rules from firing). Suppose we have a type of Peano natural numbers: data Nat = Ze | Su Nat deriving Eq Implementing enumeration on this type is simple: enumNat :: [Nat] Now, a generic representation of Nat in terms of sums and products could look something like this: type RepNat = Either () Nat That is, either a singleton (for the Ze case) or a Nat (for the Su case). Note that I am building a shallow representation, since at the leaves we have Nat, and not RepNat. This mimics the situation with current generic programming libraries (in particular GHC.Generics). We'll need a way to convert between RepNat and Nat: toNat :: RepNat -> Nat (In fact, since we're only dealing with a generic producer we won't need the fromNat function.) To get an enumeration for RepNat, we first need to know how to enumerate units and sums: enumU :: [()] Now we can define an enumeration for RepNat: enumRepNat :: [RepNat] With the conversion function toNat, we can use enumRepNat to get an enumeration for Nat directly: enumNatFromRep :: [Nat] First, convince yourself that enumNatFromRep and enumNat are equivalent functions: take 100 enumNat == take 100 enumNatFromRep Now, what I want is that enumNatFromRep generates the same core code as enumNat. That should be possible; here are the necessary steps: map toNat enumRepNat Now let's see what the compiler generates. I'm using GHC-7.4.1. Let's compile with -O1 and use -ddump-simpl to see the final simplifier output (core code) for enumNatFromRep: EnumAlone.enumNatFromRep :: [EnumAlone.Nat] Ah, it didn't even inline enumRepNat because it made it a loop breaker. We certainly want to inline it, so let's add a pragma: {-# INLINE enumRepNat #-} Recompiling, we get: EnumAlone.enumRepNat [InlPrag=INLINE (sat-args=0)] So no real difference in the generated code, other than the reassignment of loop breakers. For some reason enumRepNat still doesn't get inlined. Question: why won't GHC inline enumRepNat, even when I tell it to do so with an INLINE pragma? Well, let's inline it ourselves, then. We redefine enumNatFromRep to: enumNatFromRep = map toNat (enumEither enumU enumNatFromRep) This however doesn't help much. We get the following core: lvl6_rw5 :: [Data.Either.Either () EnumAlone.Nat] GHC is really keen on floating that (|||) out. Let's be very explicit about inlining: {-# INLINE toNat #-} Also, maybe it's floating it out because it doesn't have anything else to do to it. Let's add the free theorem of (|||) as a rule: {-# RULES "ft |||" forall f a b. map f (a ||| b) = map f a ||| map f b #-} We needed this in our manual derivation, so GHC should need it too. Recompiling, we see we've made some progress: lvl5_ryv :: [Data.Either.Either () EnumAlone.Nat] enumNatFromRep finally starts with (|||) directly. But its second argument, lvl6_ryw, is a map of lvl5_ryv, which is itself a map! At this stage I expected GHC to be aware of the fusion law for map, but it seems that it isn't. Question: why is map fusion not happening automatically? Let's add it as a rule: {-# RULES "map/map1" forall f g l. map f (map g l) = map (f . g) l #-} And now we're in a much better situation: lvl3_ryD :: [Data.Either.Either () EnumAlone.Nat] Note how toNat is entirely gone from the second part of the enumeration (lvl5_ryF). Strangely enough, the enumerator for Ze (lvl4_ryE) is still very complicated: map toNat ([Left ()]). Why doesn't GHC simplify this to just [Ze]? Apparently because GHC doesn't simplify map over a single element list. Question: why doesn't GHC optimise map f [x] to [f x]? Let's tell it to do so: {-# RULES "map/map2" forall f x. map f (x:[]) = (f x):[] #-} Now we're finally where we wanted: lvl_ryA :: [EnumAlone.Nat] This is what I wanted: no more representation types (Either or ()), and the code looks exactly like what is generated for the handwritten enumNat. More realistic generic programming Now let's see if we can transport this to the setting of a generic programming library. I'll use a bare-bones version of GHC.Generics: infixr 5 :+: Let's represent Nat in this library: instance Representable Nat where (Note, in particular, that we do not need INLINE pragmas on the from/to methods. This might just be because GHC thinks these are small and inlines them anyway, but in general we want to make sure they are inlined, so we typically use pragmas there.) Now we need to implement enumeration generically. We do this by giving an instance for each representation type: class GEnum' a where We explicitly tell GHC to inline each case, as before. Note that for products I'm not using the more natural list comprehension syntax because I don't quite understand how that gets translated into core. In the cases for Var and Rec we use genum from the GEnum class: class GEnum a where GEnum' is the class used for instantiating the generic representation types, and GEnum is used for user types. We use a default signature to provide a default method that can be used when we have a Representable instance for the type in question. This makes instantiating Nat very easy: instance GEnum Nat Unfortunately, the core code generated in this situation (with the same RULES as before) is not nice at all: Main.$fGEnumNat_$cgenum [Occ=LoopBreaker] :: [Base.Nat] We see a map of the `to` function, which is definitely not what we want. Oddly enough, if we give an explicit definition of genum for Nat, with the inlined default... instance GEnum Nat where genum = map to genum' ... then we get the optimised code we want: lvl34_r79p :: [Base.Nat] Again, no representation types, no `to`, just the same code that is generated for enumNat. Perfect. But we had to avoid using the default definition, which is a pity. Question: why won't GHC inline the default method of a class, even when I have a pragma telling it to do so? Let's look at one more datatype, because Nat does not use products. So let's consider trees: data Tree a = Leaf | Bin a (Tree a) (Tree a) We give a GEnum instance using the same trick as before: instance GEnum (Tree Int) where genum = map to genum' (For simplicity only for trees of integers.) The generated code for trees is unfortunately not as nice: a2_r79M Note how lvl9_r79N is a map over a2_r79M, and a2_r79M is a `diag`. Ok, we need the free theorem of `diag` to tell GHC how to commute the `diag` with map: {-# RULES "ft/diag" forall f l. map f (diag l) = diag (map (map f) l) #-} Unfortunately this doesn't change the generated core code. With some more debugging looking at the generated code at each simplifier iteration, I believe that this is because a2_r79M got lifted out too soon, prevent the rule from applying. With some imagination I decided to try the -fno-full-laziness flag to prevent let-floating. I'm not sure this is a good idea in general, but in this particular case it gives much better results: a1_r72i :: [Base.Rec (Base.Tree GHC.Types.Int)] Note how our enum is now of the shape `[Leaf] ||| diag y`, which is good. The only catch is that there are some `Rec`s still laying around, with their associated newtype coercions, and a function a1_r72i that basically wraps the recursive enumeration in a Rec, only to be unwrapped in the body of `$fGEnumTree_$cgenum`. I don't know how to get GHC to simplify this code any further. Question: why do I need -fno-full-laziness for the ft/diag rule to apply? Question: why is GHC not getting rid of the Rec newtype in this case? I have also played with -O2, in particular because of the SpecConstr optimisation, but found that it does not affect these particular examples (perhaps it only becomes important with larger datatypes). I have also experimented with phase control in the rewrite rules and the inline pragmas, but didn't find it necessary for this example. In general, anyway, my experience with the inliner is that it is extremely fragile, especially across different GHC versions, and it's hard to get any guarantees of optimisation. I have also played with the -funfolding-* options before, with mixed results. [1] It's also a pity that certain flags are not explained in detail in the user's manual [2,3], like -fliberate-case, and -fspec-constr-count and threshold, for instance. Thank you for reading this. Any insights are welcome. In particular, I'm wondering if I might be missing some details regarding strictness. Cheers, Pedro [1] http://dreixel.net/research/pdf/ogie.pdf [2] http://www.haskell.org/ghc/docs/latest/html/users_guide/flag-reference.html [3] http://www.haskell.org/ghc/docs/latest/html/users_guide/options-optimise.html#options-f _______________________________________________ Glasgow-haskell-users mailing list [hidden email] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users |
Simon,
Thanks a lot for looking into this. One question regarding maps that I still don't understand: can you explain me if it is indeed to be expected that GHC won't fuse `map f . map g` into `map (f . g)` by default? Also, same for `map f [x]` ~> `[f x]`? Regarding your reply: 2012/3/9 Simon Peyton-Jones <[hidden email]> Cheers,You wondered why it made a difference whether you said I understand your explanation for why this can result in different behaviour. But I think we should try to find a way to address this. We came up with DefaultSignatures to simplify instantiating generic functions, but now it turns out that using them can make the code slower! This is rather unexpected. Could we perhaps have a way to let users specify rewrite rules involving `cast`? Pedro _______________________________________________ Glasgow-haskell-users mailing list [hidden email] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users |
Powered by Nabble | Edit this page |