Nested constructed product results?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Nested constructed product results?

Alexis King
Hi all,

I spent some time today looking into the performance of a program
involving a parser type that looks something like this:

    data AnnotatedParser a = AnnotatedParser
      { annotation :: Annotation
      , parser :: String -> (Maybe a, String)
      }

The `Annotation` records metadata about the structure of an
`AnnotatedParser` that can be accessed statically (that is, without
having to run the parser on some input). `AnnotatedParser`s are built
from various primitive constructors and composed using various
combinators. These combinators end up looking something like this:

    (<+>) :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)
    AnnotatedParser ann1 f <+> AnnotatedParser ann2 g = AnnotatedParser
      { annotation = Seq ann1 ann2
      , parser = \s1 ->
          let !(a, s2) = f s1
              !(b, s3) = g s2
          in ((,) <$> a <*> b, s3)
      }

Use of these combinators leads to the construction and subsequent case
analysis of numerous `AnnotatedParser` closures. Happily, constructed
product result[1] analysis kicks in and rewrites such combinators to cut
down on the needless boxing, leading to worker/wrapper splits like this:

    $w<+> :: Annotation
          -> (String -> (Maybe a, String))
          -> Annotation
          -> (String -> (Maybe b, String))
          -> (# Annotation, String -> (Maybe (a, b), String) #)
    $w<+> ann1 f ann2 g =
      (# Seq ann1 ann2
       , \s1 -> let !(a, s2) = f s1
                    !(b, s3) = g s2
                in ((,) <$> a <*> b, s3) #)

    <+> :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)
    <+> (AnnotatedParser ann1 f) (AnnotatedParser ann2 g) =
      case $w<+> ann1 f ann2 g of
        (# a, b #) -> AnnotatedParser a b
    {-# INLINE <+> #-}

This is great, and it cuts down on allocation significantly, but there
is still something unsatisfying about it: the `parser` function inside
the record is not affected by CPR! This is a shame, because
essentially all use sites immediately deconstruct the pair, making it
a prime candidate for unboxing. Ideally, we’d like to get this, instead:

    $w<+> :: Annotation
          -> (String -> (Maybe a, String))
          -> Annotation
          -> (String -> (Maybe b, String))
          -> (# Annotation, String -> (# Maybe (a, b), String #) #)
    $w<+> ann1 f ann2 g =
      (# Seq ann1 ann2
       , \s1 -> let !(a, s2) = f s1
                    !(b, s3) = g s2
                in (# (,) <$> a <*> b, s3 #) #)

In practice, little combinators like `$w<+>` are marked INLINE, so `f`
and `g` are usually known rather than unknown calls. This nested CPR
transformation would allow the tuple construction to fuse with the
tuple deconstruction, eliminating quite a lot of unnecessary
boxing/unboxing.

Unfortunately, it seems as though GHC’s implementation of CPR is
entirely first-order: although function arguments are given rich
demand signatures, results are only described one level deep. But as
the above example hopefully illustrates, that’s leaving significant
optimization opportunities on the table! Hence, my questions:

    1. Has this notion of “nested CPR” been explored at all before?
    2. Does such an extension to CPR sound worth its weight?

I peeked a little at GHC.Types.Cpr and GHC.Core.Opt.CprAnal, and it
seems quite manageable to me, but I haven’t actually looked into an
implementation attempt just yet. I’m mostly interested in whether
others have thought about something like this and/or run into similar
issues in the past, or if this is really an unusual construction.

Thanks,
Alexis

[1]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/cpr.pdf

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: Nested constructed product results?

Sebastian Graf
Hi Alexis,

that's a very interesting example you have there!

So far, what we referred to as Nested CPR concerned unboxing for returned nested *records*, e.g., the `annotation` field in your example. That's what I try to exploit in !1866, which after a rebase that I'll hopefully be doing this week, some more sleuthing and then documenting what I did will finally make it into master.

CPR'ing the Lambda, i.e., what is returned for `parser`, on the other hand, is a surprising new opportunity for what Nested CPR could do beyond unboxing records! And it's pretty simple, too: Because it's a function, we don't care about subtleties such as whether all callers actually evaluate the pair that deep (actually, that's wrong, as I realise below). I think it's entirely within the reach of !1866 today. So we could transform (provided that `(,) <$> a <*> b` inlines `<$>` and `<*>` and then will actually have the CPR property)

    AnnotatedParser ann1 f <+> AnnotatedParser ann2 g = AnnotatedParser
      { annotation = Seq ann1 ann2
      , parser = \s1 ->
          let !(a, s2) = f s1
              !(b, s3) = g s2
          in ((,) <$> a <*> b, s3)
      }

to

    $w<+> :: Annotation
          -> (String -> (Maybe a, String))
          -> Annotation
          -> (String -> (Maybe b, String))
          -> (# Annotation, String -> (# Maybe (a, b), String #) #)
    $w<+> ann1 f ann2 g =
      (# Seq ann1 ann2
       , \s1 -> case (\s1 -> let !(a, s2) = f s1
                    !(b, s3) = g s2
                in ((,) <$> a <*> b) s1 of (p, q) -> (#p, q#), s3) #)

    <+> :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)
    <+> (AnnotatedParser ann1 f) (AnnotatedParser ann2 g) =
      case $w<+> ann1 f ann2 g of
        (# a, b #) -> AnnotatedParser (\s1 -> case a s1 of (# p, q#) -> (p, q)) b
    {-# INLINE <+> #-}

Actually writing out the transformation tells me that this isn't always a win: We now have to allocate a lambda in the wrapper. That is only a win if that lambda cancels away at call sites! So we have to make sure that all call sites of the wrapper actually call the `parser`, so that the lambda simplifies away. If it doesn't, we have a situation akin to reboxing. So I was wrong above when I said "we don't care about subtleties such as whether all callers actually evaluate the pair that deep": We very much need to know whether all call sites call the lambda. Luckily, I implemented just that for exploitation by Nested CPR! That's the reason why I need to rebase !1866 now. I'll ḱeep you posted.

---

You might wonder why CPR today doesn't care for lambdas. Well, they only make sense in nested scenarios (otherwise the function wasn't eta-expanded that far, for good reasons) and CPR currently doesn't bother unboxing records nestedly, which is what #18174 discusses and what !1866 tries to fix.

Cheers,
Sebastian

Am Di., 15. Dez. 2020 um 06:52 Uhr schrieb Alexis King <[hidden email]>:
Hi all,

I spent some time today looking into the performance of a program
involving a parser type that looks something like this:

    data AnnotatedParser a = AnnotatedParser
      { annotation :: Annotation
      , parser :: String -> (Maybe a, String)
      }

The `Annotation` records metadata about the structure of an
`AnnotatedParser` that can be accessed statically (that is, without
having to run the parser on some input). `AnnotatedParser`s are built
from various primitive constructors and composed using various
combinators. These combinators end up looking something like this:

    (<+>) :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)
    AnnotatedParser ann1 f <+> AnnotatedParser ann2 g = AnnotatedParser
      { annotation = Seq ann1 ann2
      , parser = \s1 ->
          let !(a, s2) = f s1
              !(b, s3) = g s2
          in ((,) <$> a <*> b, s3)
      }

Use of these combinators leads to the construction and subsequent case
analysis of numerous `AnnotatedParser` closures. Happily, constructed
product result[1] analysis kicks in and rewrites such combinators to cut
down on the needless boxing, leading to worker/wrapper splits like this:

    $w<+> :: Annotation
          -> (String -> (Maybe a, String))
          -> Annotation
          -> (String -> (Maybe b, String))
          -> (# Annotation, String -> (Maybe (a, b), String) #)
    $w<+> ann1 f ann2 g =
      (# Seq ann1 ann2
       , \s1 -> let !(a, s2) = f s1
                    !(b, s3) = g s2
                in ((,) <$> a <*> b, s3) #)

    <+> :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b)
    <+> (AnnotatedParser ann1 f) (AnnotatedParser ann2 g) =
      case $w<+> ann1 f ann2 g of
        (# a, b #) -> AnnotatedParser a b
    {-# INLINE <+> #-}

This is great, and it cuts down on allocation significantly, but there
is still something unsatisfying about it: the `parser` function inside
the record is not affected by CPR! This is a shame, because
essentially all use sites immediately deconstruct the pair, making it
a prime candidate for unboxing. Ideally, we’d like to get this, instead:

    $w<+> :: Annotation
          -> (String -> (Maybe a, String))
          -> Annotation
          -> (String -> (Maybe b, String))
          -> (# Annotation, String -> (# Maybe (a, b), String #) #)
    $w<+> ann1 f ann2 g =
      (# Seq ann1 ann2
       , \s1 -> let !(a, s2) = f s1
                    !(b, s3) = g s2
                in (# (,) <$> a <*> b, s3 #) #)

In practice, little combinators like `$w<+>` are marked INLINE, so `f`
and `g` are usually known rather than unknown calls. This nested CPR
transformation would allow the tuple construction to fuse with the
tuple deconstruction, eliminating quite a lot of unnecessary
boxing/unboxing.

Unfortunately, it seems as though GHC’s implementation of CPR is
entirely first-order: although function arguments are given rich
demand signatures, results are only described one level deep. But as
the above example hopefully illustrates, that’s leaving significant
optimization opportunities on the table! Hence, my questions:

    1. Has this notion of “nested CPR” been explored at all before?
    2. Does such an extension to CPR sound worth its weight?

I peeked a little at GHC.Types.Cpr and GHC.Core.Opt.CprAnal, and it
seems quite manageable to me, but I haven’t actually looked into an
implementation attempt just yet. I’m mostly interested in whether
others have thought about something like this and/or run into similar
issues in the past, or if this is really an unusual construction.

Thanks,
Alexis

[1]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/cpr.pdf

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs