Question on optimization

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Question on optimization

Todd Wilson
Dear Cafe,

In code of the form
    case as ++ bs of
      [] -> ...
      x:xs -> ...
under what circumstances is the overhead of cons-cell creation for the
elements of `as` that would happen in an eager language avoided?
Always? Is the answer the same if `f` is defined by simple list
recursion and we call `f (as ++ bs)`? What if `f` is defined in a
different module (or comes from the Prelude)?

I know that questions of this type can be answered in specific cases
by looking at the (intermediate) code generated by the compiler, but
more generally, are there some succinct principles that everyday
programmers can hang on their cubicle walls that summarize a lot of
what they can expect from the compiler in terms of optimizations?

Todd Wilson
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question on optimization

Viktor Dukhovni
On Tue, Dec 15, 2020 at 02:05:13PM -0800, Todd Wilson wrote:

> In code of the form
>     case as ++ bs of
>       [] -> ...
>       x:xs -> ...
>
> under what circumstances is the overhead of cons-cell creation for the
> elements of `as` that would happen in an eager language avoided?

In Haskell, even without optimisation, I'd expect the above to
run in constant space and time regardless of the length of `as`.

> Always? Is the answer the same if `f` is defined by simple list
> recursion and we call `f (as ++ bs)`? What if `f` is defined in a
> different module (or comes from the Prelude)?

For example, ghci with no optimisation:

    λ> let f n = [n..n*n]
    λ> case f 1000000 ++ f 10 of { [] -> "empty"; x:xs -> show x }
    "1000000"
    (0.01 secs, 69,680 bytes)
    λ> case f 10000000000000 ++ f 10 of { [] -> "empty"; x:xs -> show x }
    "10000000000000"
    (0.01 secs, 75,384 bytes)

    λ> let x = (1 : undefined) ++ [2]
    (0.01 secs, 63,200 bytes)
    λ> case x of { [] -> "empty"; x:xs -> show x }
    "1"
    (0.01 secs, 64,808 bytes)

So at least with "++" the result is expected to be lazy in the tail of
`as`.  If you interpose some list-generating function:

    f :: [a] -> [b]

called as `f (as ++ bs)`, the result rather depends on how strict `f` is
in the tail of its input.  If `f` can return an initial segment of its
output from just an initial segment of its input, then the same sort
of constant space/time behaviour would be expected from `f`.

> by looking at the (intermediate) code generated by the compiler, but
> more generally, are there some succinct principles that everyday
> programmers can hang on their cubicle walls that summarize a lot of
> what they can expect from the compiler in terms of optimizations?

This feels more like strict/lazy question to me, than optimised vs.
unoptimised.  Haskell promises lazy evaluation even when unoptimised.

    λ> head $ map (+1) $ (1 : undefined) ++ [2]
    2

Here, `map` is but one example of an `f` that is lazy in the tail of its
input:

    map _ [] = []
    map f (x:xs) = f x : map f xs

Generally, any list-to-list function that returns incrementally
consumable output given an infinite list, or a usable initial segment
from a list with a divergent term is one that necessarily only
evaluates some finite initial segment of its input, and is not
sensitive to what lies beyond.

A naive mental model for (++) is then:

    [] ++ bs = bs
    (a:as) ++ bs = a : (as ++ bs)

which is easily seen to be lazy in `as`.

--
    Viktor.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question on optimization

Todd Wilson
Thanks, Viktor, for your response, but I think you misunderstood my question:

On Tue, Dec 15, 2020 at 3:14 PM Viktor Dukhovni <[hidden email]> wrote:

>
> On Tue, Dec 15, 2020 at 02:05:13PM -0800, Todd Wilson wrote:
>
> > In code of the form
> >     case as ++ bs of
> >       [] -> ...
> >       x:xs -> ...
> >
> > under what circumstances is the overhead of cons-cell creation for the
> > elements of `as` that would happen in an eager language avoided?
>
> In Haskell, even without optimisation, I'd expect the above to
> run in constant space and time regardless of the length of `as`.

Of course this is constant time and space, but that wasn't what I was
asking. Suppose HNF(as) = a : ys. Then the `case` will cause the
definition of (++) to unfold once and produce a : (ys ++ bs), taking
the second branch with x = a and xs = ys ++ bs, the latter a thunk. My
question was whether a cons-cell containing `a` is created in the heap
or not -- that's the "overhead" I was asking about. In this case, it
seems that the compiler could easily avoid this overhead by inlining
the definition of (++) and doing some basic simplification to arrive
at

> case as of
>   [] -> case bs of
>           [] -> -- 1st branch above
>           x:xs -> -- 2nd branch above
>   x:ys -> let xs = ys ++ bs in -- 2nd branch above

My follow-up questions were about whether this overhead would also be
avoided in `f (as ++ bs)` if f was defined in the same module by list
recursion, or in another module or the Prelude (say f = length).

Here is a similar, but more sophisticated example involving flattening
of nested lists, coded three ways:

> data Nest a = Nil | Cons a (Nest a) | Nest (Nest a) (Nest a)
>
> flatten :: Nest a -> [a]
> flatten Nil = []
> flatten (Cons x n) = x : flatten n
> flatten (Nest n1 n2) = flatten n1 ++ flatten n2
>
> flatten2 n = fl [n] where
>   fl :: [Nest a] -> [a]
>   fl [] = []
>   fl (Nil : ns) = fl ns
>   fl (Cons x n : ns) = x : fl (n:ns)
>   fl (Nest n1 n2 : ns) = fl (n1:n2:ns)
>
> flatten3 n = fl n [] where
>   fl :: Nest a -> [Nest a] -> [a]
>   fl Nil [] = []
>   fl Nil (n:ns) = fl n ns
>   fl (Cons x n) ns = x : fl n ns
>   fl (Nest n1 n2) ns = fl n1 (n2:ns)

On the surface, the difference between these three functions is how
many cons-cells they create, in decreasing order. In `flatten`,
elements are also copied during (++) operations a number of times
equal to their depth in the original nested list; the other two each
copy the original elements once. But does (or can) the compiler
simplify `flatten2` more or less to `flatten3`? It seems to hinge on
how nested patterns are handled.

Again, I wish I had some general principles about what kind of
compiler optimizations are applied, so that I could answer such
questions in the abstract and write code in an easier-to-understand
but seemingly more expensive way knowing that the extra expense was
going to be compiled away.

--Todd
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question on optimization

Viktor Dukhovni
On Wed, Dec 16, 2020 at 05:04:53PM -0800, Todd Wilson wrote:

> Thanks, Viktor, for your response, but I think you misunderstood my question:

Indeed I read much too little into it.  Sorry about that.

> Here is a similar, but more sophisticated example involving flattening
> of nested lists, coded three ways:
>
> > data Nest a = Nil | Cons a (Nest a) | Nest (Nest a) (Nest a)
> >
> > flatten :: Nest a -> [a]
> > flatten Nil = []
> > flatten (Cons x n) = x : flatten n
> > flatten (Nest n1 n2) = flatten n1 ++ flatten n2

I would have gone with a right fold:

    flatten = foldr k []
      where
        k Nil a = a
        k (Cons x n) a = x : foldr k a n
        k (Nest x y) a = foldr k (foldr k a y) x

In fact, I'd be rather inclined to do this indirectly by instead
defining an instance of Foldable for this structure, and then just
using that to do the flattening:

    instance Foldable Nest where
        foldr _ z Nil = z
        foldr f z (Cons x n) = f x (foldr f z n)
        foldr f z (Nest x y) = foldr f (foldr f z y) x

    flatten = foldr (:) []

> Again, I wish I had some general principles about what kind of
> compiler optimizations are applied, so that I could answer such
> questions in the abstract and write code in an easier-to-understand
> but seemingly more expensive way knowing that the extra expense was
> going to be compiled away.

I don't have an answer for your original question, but I think
you'll find that flattening foldable/traversable data structures
is often optimally done with some sort of right fold, which avoids
the costs of (++), by producing the first element directly, and
then lazily the rest of the structure.

The Traversable instance in this case would be:

    {-# LANGUAGE DerivingFunctor, StandaloneDeriving #-}
    data Nest a = Nil
                | Cons a (Nest a)
                | Nest (Nest a) (Nest a)
        deriving (Show, Functor)

    instance Traversable Nest where
        traverse f Nil = pure Nil
        traverse f (Cons x n) = Cons <$> f x <*> traverse f n
        traverse f (Nest x y) = Nest <$> traverse f x <*> traverse f y

A complete demo program is below my signature, which produces the
expected output:

    [1,2,3,10,4,20]
    Just (Cons 2 (Cons 3 (Nest (Cons 4 (Cons 11 Nil)) (Cons 5 (Cons 21 Nil)))))

I don't know whether indirection via "foldr = go" with `go`
recursive and `foldr` "INLINABLE" and ditto for `traverse` would
create further opportunities for optimisation when the instances
are in a separate module.  Someone else might clear that up, or
you could benchmark and see.  IIRC, recursive functions don't
directly inline without such tweaks, but the tweaks might not
help much.

--
    Viktor.

{-# LANGUAGE DeriveFunctor #-}

module Main (main) where

data Nest a = Nil
            | Cons a (Nest a)
            | Nest (Nest a) (Nest a)
    deriving (Show, Functor)

instance Foldable Nest where
    foldr _ z Nil = z
    foldr f z (Cons x n) = f x (foldr f z n)
    foldr f z (Nest x y) = foldr f (foldr f z y) x

instance Traversable Nest where
    traverse f Nil = pure Nil
    traverse f (Cons x n) = Cons <$> f x <*> traverse f n
    traverse f (Nest x y) = Nest <$> traverse f x <*> traverse f y

flatten :: Nest a -> [a]
flatten = foldr (:) []

nested :: Nest Int
nested =
    Cons 1
    $ Cons 2
    $ Nest (Cons 3 $ Cons 10 Nil)
           (Cons 4 $ Cons 20 Nil)

main :: IO ()
main = do
    print $ flatten nested
    print $ traverse (\i -> Just (i + 1)) nested
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question on optimization

Viktor Dukhovni
On Thu, Dec 17, 2020 at 12:15:19AM -0500, Viktor Dukhovni wrote:

> I don't know whether indirection via "foldr = go" with `go`
> recursive and `foldr` "INLINABLE" and ditto for `traverse` would
> create further opportunities for optimisation when the instances
> are in a separate module.  Someone else might clear that up, or
> you could benchmark and see.  IIRC, recursive functions don't
> directly inline without such tweaks, but the tweaks might not
> help much.

Naive experiments seem to suggest that inlining via a `go` indirection
that captures `f` is somewhat helpful, but the real win is using a right
fold, vs.  your original flatten implementation.  Here's a comparison
with a somewhat branchy deep structure.

Flatten via Foldable:
---------------------

       2,281,024 bytes allocated in the heap
           3,312 bytes copied during GC
          44,408 bytes maximum residency (1 sample(s))
          25,224 bytes maximum slop
              17 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0005s    0.0005s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.002s  (  0.002s elapsed)
  GC      time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.003s  (  0.003s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    1,234,320,346 bytes per MUT second

  Productivity  68.0% of total user, 68.0% of total elapsed

The posted flatten (1000x more allocations! 200x runtime):
--------------------

   2,651,027,056 bytes allocated in the heap
      11,386,968 bytes copied during GC
         164,424 bytes maximum residency (2 sample(s))
          29,320 bytes maximum slop
              27 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       104 colls,     0 par    0.015s   0.015s     0.0001s    0.0004s
  Gen  1         2 colls,     0 par    0.001s   0.001s     0.0007s    0.0010s

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    0.715s  (  0.715s elapsed)
  GC      time    0.016s  (  0.016s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.732s  (  0.732s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    3,705,952,327 bytes per MUT second

  Productivity  97.7% of total user, 97.7% of total elapsed

--
    Viktor.

{-# LANGUAGE DeriveFunctor #-}

module Nest
    ( Nest(..)
    ) where

data Nest a = Nil
            | Cons a (Nest a)
            | Nest (Nest a) (Nest a)
    deriving (Show, Functor)

instance Foldable Nest where
    {-# INLINE foldr #-}
    foldr f = go
      where
        go z Nil = z
        go z (Cons x n) = f x (go z n)
        go z (Nest x y) = go (go z y) x

instance Traversable Nest where
    {-# INLINE traverse #-}
    traverse f = go
      where
        go Nil = pure Nil
        go (Cons x n) = Cons <$> f x <*> go n
        go (Nest x y) = Nest <$> go x <*> go y
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.