shared local definitions

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

shared local definitions

Alberto Ruiz-2
Hi all,

I have a question about optimization of shared values in local definitions. I
frequently use this scheme:

fun a b c d x = r where
    q = costly computation depending only on a, b, c, and d
    r = depends only on q and x

g1 = fun 1 2 3 4
g2 = fun 5 4 2 7
(etc.)

When I compute (using ghc -O) things like

map g1 [1 .. 1000]

the common q is evaluated only once, which is very nice. But the problem
is that in some strange cases this kind of optimization is not applied, and
the same q is evaluated 1000 times. Curiously, this happens if I add just:

module Main where

to one of my programs. Optimization is also lost if in the same program I use
two partially applied functions:

map g1 [1 .. 1000]
map g2 [1 .. 1000]

And in some cases optimization is only applied if the local definition is
"easy" enough. For example:

---------------------------------------------------
fun1 :: Int -> Int -> Int
fun1 a x = q*x where
    q = {-# SCC "easy" #-} a+1+a^2   -- OK
 
fun2 :: Int -> Int -> Int
fun2 a x = q*x where
    q = {-# SCC "hard" #-} a+1+a^2 +a^3+(2*a)  -- NO

fun3 :: Int -> Int -> Int
fun3 a x = r where
    q = local a
    r = q*x
    local u = {-# SCC "local easy" #-} u+1  -- OK
 
fun4 :: Int -> Int -> Int
fun4 a x = r where
    q = local a
    r = q*x
    local u = {-# SCC "local hard" #-} u+1+u^2  -- NO
 
test h = print $ sum $ map h [1 .. 100]    
     
main = do
    test (fun1 3)
    test (fun2 3)
    test (fun3 3)
    test (fun4 3)
--------------------------------------------------
COST CENTRE              MODULE         no.    entries

MAIN                     MAIN            1           0
 main                    Main          154         101
  fun4                   Main          166         300
   local hard            Main          167         100   NO
  fun3                   Main          163         100  
  fun2                   Main          161         100    
   hard                  Main          162         100   NO
  fun1                   Main          157         100          
  test                   Main          155           4
 CAF                     Main          148           6
  fun4                   Main          168           0
  main                   Main          158           0
   fun3                  Main          164           1
    local easy           Main          165           1   OK
   fun1                  Main          159           0
    easy                 Main          160           1   OK
  test                   Main          156           0
 CAF                     System.IO     103           1
 CAF                     GHC.Handle    101           3
------------------------------------------------------------

Where can I find information about this topic? I have made some searches but
probably using wrong keywords. Should I use some individual optimization
flags?

Thanks,

Alberto
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: shared local definitions

Duncan Coutts
On Thu, 2006-05-18 at 11:00 +0200, Alberto Ruiz wrote:

> When I compute (using ghc -O) things like
>
> map g1 [1 .. 1000]
>
> the common q is evaluated only once, which is very nice. But the problem
> is that in some strange cases this kind of optimization is not applied, and
> the same q is evaluated 1000 times. Curiously, this happens if I add just:
>
> module Main where

This is explained by how it affects what functions your module exports.
When you have no module header, the default is that it is the Main
module and that nothing other than the function main is exported. When
you add "module Main where" then everything in the module is exported.
The compiler can often optimise things better if it knows that they do
not need to be exported.

For example if a function is not exported and only used once then the
compiler will probably decide that it is beneficial to inline. If it
needs to keep a copy of the function to export from the module then it
may decide that the code duplication is not worth it and not inline.

Duncan

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

RE: shared local definitions

Simon Peyton Jones
In reply to this post by Alberto Ruiz-2
Consider
        f x y = let r = expensive x in r+y
        g vs = map (f 2) vs

You are expecting (expensive 2) to be computed just once.  That is
indeed what will happen if you write

        f_opt x = let r = expensive x in \y -> r+y
        g_opt vs = map (f_opt 2) vs

It's easy enough to transform f into f_opt.  (This is called the "full
laziness" transformation.)  BUT in the cases when f is fully-applied,
f_opt is *less* efficient than f; consider
        h ys zs = zipWith f_opt ys zs
Reason: it's much less efficient to have separate lambdas than one
compound lambda \xy -> e.

So the best way to transform f depends on how it is used.  When it's
used locally and just once, GHC inlines it at the call site and all is
good.  But when it's exported or called many times, GHC never "floats" a
let *between* two lambdas.  So it won't transform f into f_opt. On the
other hand, if you write f_opt, GHC will keep it that way.

I've added a FAQ entry about this.

S


| -----Original Message-----
| From: [hidden email]
[mailto:[hidden email]] On Behalf Of
| Alberto Ruiz
| Sent: 18 May 2006 10:00
| To: [hidden email]
| Subject: [Haskell-cafe] shared local definitions
|
| Hi all,
|
| I have a question about optimization of shared values in local
definitions. I
| frequently use this scheme:
|
| fun a b c d x = r where
|     q = costly computation depending only on a, b, c, and d
|     r = depends only on q and x
|
| g1 = fun 1 2 3 4
| g2 = fun 5 4 2 7
| (etc.)
|
| When I compute (using ghc -O) things like
|
| map g1 [1 .. 1000]
|
| the common q is evaluated only once, which is very nice. But the
problem
| is that in some strange cases this kind of optimization is not
applied, and
| the same q is evaluated 1000 times. Curiously, this happens if I add
just:
|
| module Main where
|
| to one of my programs. Optimization is also lost if in the same
program I use
| two partially applied functions:
|
| map g1 [1 .. 1000]
| map g2 [1 .. 1000]
|
| And in some cases optimization is only applied if the local definition
is
| "easy" enough. For example:
|
| ---------------------------------------------------
| fun1 :: Int -> Int -> Int
| fun1 a x = q*x where
|     q = {-# SCC "easy" #-} a+1+a^2   -- OK
|
| fun2 :: Int -> Int -> Int
| fun2 a x = q*x where
|     q = {-# SCC "hard" #-} a+1+a^2 +a^3+(2*a)  -- NO
|
| fun3 :: Int -> Int -> Int
| fun3 a x = r where
|     q = local a
|     r = q*x
|     local u = {-# SCC "local easy" #-} u+1  -- OK
|
| fun4 :: Int -> Int -> Int
| fun4 a x = r where
|     q = local a
|     r = q*x
|     local u = {-# SCC "local hard" #-} u+1+u^2  -- NO
|
| test h = print $ sum $ map h [1 .. 100]
|
| main = do
|     test (fun1 3)
|     test (fun2 3)
|     test (fun3 3)
|     test (fun4 3)
| --------------------------------------------------
| COST CENTRE              MODULE         no.    entries
|
| MAIN                     MAIN            1           0
|  main                    Main          154         101
|   fun4                   Main          166         300
|    local hard            Main          167         100   NO
|   fun3                   Main          163         100
|   fun2                   Main          161         100
|    hard                  Main          162         100   NO
|   fun1                   Main          157         100
|   test                   Main          155           4
|  CAF                     Main          148           6
|   fun4                   Main          168           0
|   main                   Main          158           0
|    fun3                  Main          164           1
|     local easy           Main          165           1   OK
|    fun1                  Main          159           0
|     easy                 Main          160           1   OK
|   test                   Main          156           0
|  CAF                     System.IO     103           1
|  CAF                     GHC.Handle    101           3
| ------------------------------------------------------------
|
| Where can I find information about this topic? I have made some
searches but
| probably using wrong keywords. Should I use some individual
optimization
| flags?
|
| Thanks,
|
| Alberto
| _______________________________________________
| Haskell-Cafe mailing list
| [hidden email]
| http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

RE: shared local definitions

ajb@spamcop.net
G'day all.

Quoting Simon Peyton-Jones <[hidden email]>:

> So the best way to transform f depends on how it is used.  When it's
> used locally and just once, GHC inlines it at the call site and all is
> good.  But when it's exported or called many times, GHC never "floats" a
> let *between* two lambdas.  So it won't transform f into f_opt. On the
> other hand, if you write f_opt, GHC will keep it that way.

While this reasoning makes perfect sense, it does tend to violate the
principle of least surprise.  I would expect a Haskell implementation
either to provide full laziness or not.  (Possibly with a compiler
switch.)  This looks more like a quirk of the STG Core.  An intermediate
representation which treated multiple lambdas separately in its
intermediate language would have far fewer qualms about applying
let-floating in this case.  Actually, it might be mildly amusing to
see if Gofer runs this code faster than Hugs or GHCi.

(As an aside, this issue bit me not so long ago.  I was trying to
unroll a recursive function at run-time.  It took quite a bit of
eta-conversion to get it right, and it was only in reading this mail
that I finally worked out what was going wrong.)

Perhaps Haskell' might like to look into this.  H98 demands laziness,
not full laziness, but it seems to me that this is exactly the sort of
thing that a programmer might unconsciously rely on that becomes a hard
to track down performance bug when switching implementations.

Cheers,
Andrew Bromage
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe