[Haskell-begin] Maximum of a List?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

[Haskell-begin] Maximum of a List?

Steve Klabnik
Hello everyone-

I'm working on some project Euler problems today, and I'm stuck on one. It's
not the problem itself that's the problem, it's that finding the maximum of
a list makes me run out of heap space!

http://projecteuler.net/index.php?section=problems&id=14

my code:

import Data.List

f :: Int -> Int -> Int
f acc x
    | x == 1 = acc
        | even x = f (acc + 1) (x `div` 2)
        | otherwise = f (acc + 1) (3 * x + 1)

answer = (foldl' max 0) $ map (f 1) [1 .. 999999]

I tried using 'foldl' max ' instead of 'maximum' because I thought that
foldl' was supposed to work better than foldl or something...I could be
confused on that point. Anyway, here's what I get...

  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.4, for Haskell 98.
/ /_\\/ __  / /___| |      http://www.haskell.org/ghc/
\____/\/ /_/\____/|_|      Type :? for help.

Loading package base-1.0 ... linking ... done.
Prelude> :l ..\test.hs
Compiling Main             ( ..\test.hs, interpreted )
Ok, modules loaded: Main.
*Main> answer
GHC's heap exhausted: current limit is 268435456 bytes;
Use the `-M<size>' option to increase the total heap size.

I also tried writing a simple 'max' function that I thought was tail
recursive, and that that would help. Same error. I'm finding it hard to
believe that finding a maximum of a list of a million small integers would
cause this kind of overflow...is it really that big of a problem?

Any help would be appreciated. I have a feeling I'll run into this problem
again in the future.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20080726/b7645d1a/attachment.htm
Reply | Threaded
Open this post in threaded view
|

[Haskell-begin] Maximum of a List?

Braden Shepherdson-2
Steve Klabnik wrote:

> Hello everyone-
>
> I'm working on some project Euler problems today, and I'm stuck on one.
> It's not the problem itself that's the problem, it's that finding the
> maximum of a list makes me run out of heap space!
>
> http://projecteuler.net/index.php?section=problems&id=14 
> <http://projecteuler.net/index.php?section=problems&id=14>
>
> my code:
>
> import Data.List
>
> f :: Int -> Int -> Int
> f acc x
>     | x == 1 = acc
>         | even x = f (acc + 1) (x `div` 2)
>         | otherwise = f (acc + 1) (3 * x + 1)
>
> answer = (foldl' max 0) $ map (f 1) [1 .. 999999]
>
> I tried using 'foldl' max ' instead of 'maximum' because I thought that
> foldl' was supposed to work better than foldl or something...I could be
> confused on that point. Anyway, here's what I get...
>
>   / _ \ /\  /\/ __(_)
>  / /_\// /_/ / /  | |      GHC Interactive, version 6.4, for Haskell 98.
> / /_\\/ __  / /___| |      http://www.haskell.org/ghc/
> \____/\/ /_/\____/|_|      Type :? for help.
>
> Loading package base-1.0 ... linking ... done.
> Prelude> :l ..\test.hs
> Compiling Main             ( ..\test.hs, interpreted )
> Ok, modules loaded: Main.
> *Main> answer
> GHC's heap exhausted: current limit is 268435456 bytes;
> Use the `-M<size>' option to increase the total heap size.
>
> I also tried writing a simple 'max' function that I thought was tail
> recursive, and that that would help. Same error. I'm finding it hard to
> believe that finding a maximum of a list of a million small integers
> would cause this kind of overflow...is it really that big of a problem?

Hello Steve,

First, is there a reason to use such an old GHC? Newer versions produce
much better code.

That said, certainly 6.4 is sufficient for this problem. foldl' is the
right choice here, it is strict in the accumulating parameter and that
is what you want. The problem is the two parameters in f. When you make
the recursive call 'f (acc + 1) ...', a thunk is being created to hold
that calculation (acc + 1). It isn't being evaluated right away, so the
thunk remains. On the next iteration, acc is incremented again. You are
effectively building up the expression 'acc + 1 + 1 + 1 + 1 ...',
perhaps quite deeply. This takes up heap space; apparently too much
space. Similarly, the (x `div` 2) and (3 * x + 1) calculations are being
suspended as thunks, though x gets forced on the next call, since it is
tested by the guards.

Adding the pragma
{-# LANGUAGE BangPatterns #-} at the top of the file, you could then
annotate acc:

f !acc x

this seems to run in constant space for me.

Note that there is a much more concise way to write f, though I doubt it
would be much faster. Consider 'iterate' and 'takeWhile'.

Finally, compiling to a binary with ghc -O2 will yield much better speed
than ghci. And GHC 6.8.3 would provide a further 20%+ speed boost, if
installing it is an option on your environment.


Hope this helps,

Braden Shepherdson
shepheb

Reply | Threaded
Open this post in threaded view
|

[Haskell-begin] Maximum of a List?

Niels Aan de Brugh-3
In reply to this post by Steve Klabnik
Steve Klabnik wrote:

> Hello everyone-
>
> I'm working on some project Euler problems today, and I'm stuck on
> one. It's not the problem itself that's the problem, it's that finding
> the maximum of a list makes me run out of heap space!
>
> http://projecteuler.net/index.php?section=problems&id=14 
> <http://projecteuler.net/index.php?section=problems&id=14>
>
> my code:
>
> import Data.List
>
> f :: Int -> Int -> Int
> f acc x
>     | x == 1 = acc
>         | even x = f (acc + 1) (x `div` 2)
>         | otherwise = f (acc + 1) (3 * x + 1)
>
> answer = (foldl' max 0) $ map (f 1) [1 .. 999999]
>
> I tried using 'foldl' max ' instead of 'maximum' because I thought
> that foldl' was supposed to work better than foldl or something...I
> could be confused on that point. Anyway, here's what I get...
>
I've had the same problem when solving that particular puzzle.

The code that solves the puzzle brute-force is following:

f x | even x    = x `div` 2
     | otherwise = 3*x + 1
col n = (takeWhile (/=1) $ iterate f n) ++ [1]
ls = [(length $ col n, n) | n <- [1..1000000]]

Now, with

  maximumBy (\(a,_) (b,_) -> compare a b) ls

I run out of stack space. If I'm not mistaken, maxmimumBy uses foldl, so
maybe it blows up in the number of comparisons it stacks up (rather than
reduce these immediately).

Using foldl' or a custom made function (I used one so I could print
intermediate results):

  m' x [] = x
  m' (l,e) ((l',e'):xs) | l > l' = m' (l,e) xs
                              | otherwise = trace ("max: " ++ show
(l',e')) (m' (l',e') xs)

the problem can be solved. It does take a little while (over two
minutes). For all Euler puzzles I've solved so far I've always tried to
minimize "run time" + "development time", not just the run time.

The solution you gave (using foldl') works on my PC (GHC(i) 6.8.2
(Ubuntu package), Linux, 2GB of RAM). That is: it doesn't run out of
stack space. Unfortunately I wasn't able to get an answer out of it
either before I had to kill the process. I'm not sure which function is
so greedy on memory. Maybe you could try making the arguments for f strict.

The thing could probably be heavily optimized (like many Euler puzzles)
by adding state and keeping intermediate results (for f) in an array.

Niels
Reply | Threaded
Open this post in threaded view
|

[Haskell-begin] Maximum of a List?

Niels Aan de Brugh-3
In reply to this post by Steve Klabnik
Steve Klabnik wrote:
> f :: Int -> Int -> Int
> f acc x
>     | x == 1 = acc
>         | even x = f (acc + 1) (x `div` 2)
>         | otherwise = f (acc + 1) (3 * x + 1)
One more remark: you're using Ints. This type is limited to
[-2^31..2^31]. Maybe you should use Integers, or only specify that the
in- and output should be numeric, or perhaps even omit the type
annotation altogether.

Try running: f 1 113383
(with Int and Integer/no type annotation)

Niels
Reply | Threaded
Open this post in threaded view
|

Fwd: [Haskell-begin] Maximum of a List?

Rafael Gustavo da Cunha Pereira Pinto
I tried to solve the same problem and came out with a O(log n) solution. It
is pretty quick, since I keep track of the steps on a map structure.

Even though it runs really fast, it hogs on memory just like Steve's
version.

Following is the full source, for your consideration:


======================Euler14.hs======================
import qualified Data.Map as Map
import qualified Data.List as List

type Table=Map.Map Integer Integer

rank::Table->Integer->(Table,Integer)
rank s n= (s',r)
    where
        nxt=if even n then (n `div` 2) else (3*n+1)
        r=case (Map.lookup n s) of
            Just a -> a
            Nothing-> (1 + (snd $ rank s nxt))
        s'=Map.insert n r s

search::Integer->Integer
search n = case List.findIndex (\a -> a==ms) sw of
                Just a -> toInteger(a) +1
                Nothing -> -1
    where s=Map.singleton 1 1
          sw=searchWork s [1..n]
          ms=maximum sw

searchWork::Table->[Integer]->[Integer]
searchWork s []=[]
searchWork s (i:is)= r:(searchWork s' is)
    where
        (s',r)=rank s i

main = do
    print $ search 1000000
======================Euler14.hs======================
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20080727/fbb73a34/attachment.htm
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell-begin] Maximum of a List?

Steve Klabnik
In reply to this post by Niels Aan de Brugh-3
First of all, thanks for pointing out the old ghci. I had installed it on a
lab computer by just googling 'ghci windows' and that was what came up. My
home machine has the newest one, and is also considerably beefier
hardware-wise.

I think you're onto something with the Int/Integer thing....when using a
type signature of "Integer", "f  1 113383" gives "248" immediately.
Compiling and running the code with "Integer" types on my home machine
yields "525".... which Euler says isn't the right answer?

Also, on the strictness annotations: Do you put them in the type
declaration? Or in the pattern match on the lhs of the declaration?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20080728/2786ad77/attachment-0001.htm
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell-begin] Maximum of a List?

Daniel Fischer-4
Am Montag, 28. Juli 2008 06:01 schrieb Steve Klabnik:
> First of all, thanks for pointing out the old ghci. I had installed it on a
> lab computer by just googling 'ghci windows' and that was what came up. My
> home machine has the newest one, and is also considerably beefier
> hardware-wise.
>
> I think you're onto something with the Int/Integer thing....when using a
> type signature of "Integer", "f  1 113383" gives "248" immediately.
> Compiling and running the code with "Integer" types on my home machine
> yields "525".... which Euler says isn't the right answer?

You might want to re-read the question :)
>
> Also, on the strictness annotations: Do you put them in the type
> declaration? Or in the pattern match on the lhs of the declaration?

You want the strictness annotations in the pattern match. Strictness
annotations can also appear in data declarations/definitions, but AFAIK not
in type signatures.

Cheers,
Daniel

Reply | Threaded
Open this post in threaded view
|

Re: [Haskell-begin] Maximum of a List?

Niels Aan de Brugh-3
In reply to this post by Steve Klabnik
Steve Klabnik wrote:
> I think you're onto something with the Int/Integer thing....when using
> a type signature of "Integer", "f  1 113383" gives "248" immediately.
> Compiling and running the code with "Integer" types on my home machine
> yields "525".... which Euler says isn't the right answer?
Because you're not providing the right number. :-)
> Also, on the strictness annotations: Do you put them in the type
> declaration? Or in the pattern match on the lhs of the declaration?
I've tested your code with strictness annotations and it appears to not
make a difference. GHC employs several optimization techniques, one of
those being strictness analysis, so maybe it is already using a strict,
unboxed integer.

The real speed-up (a non-linear one) here is not to re-calculate every
sequence over and over again, but keep it in a map/array (as suggested
by Rafael and me). I've found some Euler puzzles are impossible to solve
without this technique.

Niels

P.S.: If you're really going for speed, compile (not interpret) the code
(using -O -fvia-C, and there's some more stuff in the manual) using the
latest greatest version of GHC.
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell-begin] Maximum of a List?

Daniel Fischer-4
Am Montag, 28. Juli 2008 18:15 schrieb Niels Aan de Brugh:

> Steve Klabnik wrote:
> > I think you're onto something with the Int/Integer thing....when using
> > a type signature of "Integer", "f  1 113383" gives "248" immediately.
> > Compiling and running the code with "Integer" types on my home machine
> > yields "525".... which Euler says isn't the right answer?
>
> Because you're not providing the right number. :-)
>
> > Also, on the strictness annotations: Do you put them in the type
> > declaration? Or in the pattern match on the lhs of the declaration?
>
> I've tested your code with strictness annotations and it appears to not
> make a difference. GHC employs several optimization techniques, one of
> those being strictness analysis, so maybe it is already using a strict,
> unboxed integer.

Using -O2, GHC produces the same core with or without strictness annotations,
the 'acc' parameter of f is an unboxed Int, so the optimiser indeed sees it
is needed. The Integer parameter unfortunately can't be unboxed. A native
64-bit integer type might gain something.

>
> The real speed-up (a non-linear one) here is not to re-calculate every
> sequence over and over again, but keep it in a map/array (as suggested
> by Rafael and me).

Using a Map, though easier to code, doesn't buy you much here, indeed less
than a few easy considerations where the maximal length cannot occur. The
problem is that you have to update the Map often, which is rather costly,
also the Map takes a lot of memory. Using an array is more convoluted, but
much faster (if done right).

> I've found some Euler puzzles are impossible to solve
> without this technique.

At least, it would take far longer to get the solution.
>
> Niels
>
> P.S.: If you're really going for speed, compile (not interpret) the code
> (using -O -fvia-C, and there's some more stuff in the manual) using the
> latest greatest version of GHC.

Nowadays, the native code generator is not necessarily slower than -fvia-C,
often faster. So the best should be to test both, -O2 and -O2 -fvia-C
-optc-On, for some medium sized inputs and see which actually is faster.

Cheers,
Daniel

Reply | Threaded
Open this post in threaded view
|

Re: [Haskell-begin] Maximum of a List?

Steve Klabnik
In reply to this post by Niels Aan de Brugh-3
On Mon, Jul 28, 2008 at 12:15 PM, Niels Aan de Brugh <[hidden email]>wrote:

>
> Because you're not providing the right number. :-)
>

You know, I realized that this morning after I woke up....ha ha. Sometimes a
good night's sleep is all it takes...


> I've tested your code with strictness annotations and it appears to not
> make a difference. GHC employs several optimization techniques, one of those
> being strictness analysis, so maybe it is already using a strict, unboxed
> integer.
>

Ah. I figured that I probably didn't have to worry about it manually...


>
> The real speed-up (a non-linear one) here is not to re-calculate every
> sequence over and over again, but keep it in a map/array (as suggested by
> Rafael and me). I've found some Euler puzzles are impossible to solve
> without this technique.
>
> Niels
>
> P.S.: If you're really going for speed, compile (not interpret) the code
> (using -O -fvia-C, and there's some more stuff in the manual) using the
> latest greatest version of GHC.
>


I'll keep this in mind. Right now I'm going for just doing it, but I'll heed
your advice for future problems.



Finally, what I ended up doing:

f :: Integer -> Integer -> Integer
f acc x
    | x == 1 = acc
        | even x = f (acc + 1) (x `div` 2)
        | otherwise = f (acc + 1) (3 * x + 1)

g :: Integer -> (Integer, Integer)
g x = (f 1 x, x)

answer = (foldl' max (0,0)) $ map g [1 .. 999999]


main = putStrLn( show answer)


Again, not the most efficient, but pretty clean. Runs in just under a minute
on my machine. Thanks again!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20080728/855e6787/attachment.htm
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell-begin] Maximum of a List?

Niels Aan de Brugh-3
In reply to this post by Daniel Fischer-4
Daniel Fischer wrote:
> Using -O2, GHC produces the same core with or without strictness annotations,
> the 'acc' parameter of f is an unboxed Int, so the optimiser indeed sees it
> is needed. The Integer parameter unfortunately can't be unboxed. A native
> 64-bit integer type might gain something.
>  
Right. For this puzzle (n up to 1000000) it's quite safe to say an Int64
will never overflow. :-)

I was wondering how the Integer type is implemented. If I were to
implement such a type my first approach would be to keep a list of
digits (and perform calculation like I do by hand). But operations on
Integers are very fast, even on very big numbers, so I guess that's not
the approach taken by the GHC developers. (For example, calculating the
faculties of [1..1000] takes a little under 300ms to perform on my Intel
PC; if I don't pipe to /dev/null it takes about 10s because of all the
digits being printed.)

Having a type like Integer on board makes many of the puzzles on Project
Euler a shoo-in. One of the questions is indeed to calculate the last n
digits of 1000!.
> Using a Map, though easier to code, doesn't buy you much here, indeed less
> than a few easy considerations where the maximal length cannot occur. The
> problem is that you have to update the Map often, which is rather costly,
> also the Map takes a lot of memory. Using an array is more convoluted, but
> much faster (if done right).
>  
The problem with an array is finding a good upper limit. One could of
course copy the whole array if a new upper bound is found, the costs
would amortize over the whole run. Having a really big and sparse array
is no good either (no/bad locality of reference).

Oh well, the puzzle itself definitely isn't worth the effort, but maybe
some custom made heap structure would perform best.
> At least, it would take far longer to get the solution.
>  
Yes, but sometimes the difference is several days vs. a couple of minutes.
> Nowadays, the native code generator is not necessarily slower than -fvia-C,
> often faster. So the best should be to test both, -O2 and -O2 -fvia-C
> -optc-On, for some medium sized inputs and see which actually is faster.
>  
Interesting. In which release has the back-end been reworked? Does it
work equally well for architectures other than Intel?

Niels

Reply | Threaded
Open this post in threaded view
|

Re: [Haskell-begin] Maximum of a List?

Daniel Fischer-4
Am Montag, 28. Juli 2008 23:35 schrieb Niels Aan de Brugh:
> Right. For this puzzle (n up to 1000000) it's quite safe to say an Int64
> will never overflow. :-)

Fairly small risk. You can't be sure beforehand, though.

>
> I was wondering how the Integer type is implemented.

Two constructors, using native machine integers and their arithmetics when
possible because it's way faster, using GMP for larger numbers. There's work
going on to replace GMP, but I don't know the state.

-- | Arbitrary-precision integers.
data Integer    
   = S# Int#                            -- small integers
#ifndef ILX
   | J# Int# ByteArray#                 -- large integers
#else
   | J# Void BigInteger                 -- .NET big ints

foreign type dotnet "BigInteger" BigInteger
#endif

> If I were to implement such a type my first approach would be to keep
> a list of digits (and perform calculation like I do by hand).

I'd do something along those lines, too - of course I wouldn't use decimal
digits, rather use base 2^16 (easy to control overflow then). But if I had to
do it in earnest, I'd research some algorithms first.

> Having a type like Integer on board makes many of the puzzles on Project
> Euler a shoo-in. One of the questions is indeed to calculate the last n
> digits of 1000!.

Don't you mean the sum of the digits of 100! ?
The idea behind those questions was to have people write their own BigInt
library, from the days when they weren't so widespread. Nowadays everything
fits into 64 bits (overflow prevention may be necessary, though).

>
> > Using a Map, though easier to code, doesn't buy you much here, indeed
> > less than a few easy considerations where the maximal length cannot
> > occur. The problem is that you have to update the Map often, which is
> > rather costly, also the Map takes a lot of memory. Using an array is more
> > convoluted, but much faster (if done right).
>
> The problem with an array is finding a good upper limit.

1000000 is a good upper limit. What makes the code convoluted is how you
handle the values above :)

> One could of
> course copy the whole array if a new upper bound is found, the costs
> would amortize over the whole run. Having a really big and sparse array
> is no good either (no/bad locality of reference).
>
> Oh well, the puzzle itself definitely isn't worth the effort, but maybe
> some custom made heap structure would perform best.
>
> > At least, it would take far longer to get the solution.
>
> Yes, but sometimes the difference is several days vs. a couple of minutes.

Or even years vs. seconds.

>
> > Nowadays, the native code generator is not necessarily slower than
> > -fvia-C, often faster. So the best should be to test both, -O2 and -O2
> > -fvia-C -optc-On, for some medium sized inputs and see which actually is
> > faster.
>
> Interesting. In which release has the back-end been reworked?

I think it was the 6.8 branch. I can't remember which, but I had programmes
that performed better when compiled via C, others with the native code
generator, though I never found big differences. But when going via C, don't
forget -optc-On (I habitually use n = 3), otherwise gcc won't do much with
the generated C.
Generally everything got faster with the 6.8 branch, let's see what 6.10 does.

> Does it work equally well for architectures other than Intel?

No Idea, you might ask on glasgow-haskell-users, somebody there should know.
>
> Niels

Cheers,
Daniel

Reply | Threaded
Open this post in threaded view
|

Re: [Haskell-begin] Maximum of a List?

Rafael Gustavo da Cunha Pereira Pinto
In reply to this post by Steve Klabnik
Since this is the "beginners" list, could someone explain me why using g
made everything run like the wind, with almost no memory?

I am puzzled! :-(

On Mon, Jul 28, 2008 at 17:32, Steve Klabnik <[hidden email]>wrote:

>
>
> Finally, what I ended up doing:
>
> f :: Integer -> Integer -> Integer
> f acc x
>     | x == 1 = acc
>         | even x = f (acc + 1) (x `div` 2)
>         | otherwise = f (acc + 1) (3 * x + 1)
>
> g :: Integer -> (Integer, Integer)
> g x = (f 1 x, x)
>
> answer = (foldl' max (0,0)) $ map g [1 .. 999999]
>
>
> main = putStrLn( show answer)
>
>
>


--
Rafael Gustavo da Cunha Pereira Pinto
Electronic Engineer, MSc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20080728/b7cdf249/attachment.htm
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell-begin] Maximum of a List?

Daniel Fischer-4
Am Dienstag, 29. Juli 2008 01:53 schrieb Rafael Gustavo da Cunha Pereira
Pinto:
> Since this is the "beginners" list, could someone explain me why using g
> made everything run like the wind, with almost no memory?

I'd rate "run like the wind" as an exaggeration since the completely
unoptimised C version runs in 1.6 seconds vs. 54 seconds for the code below
(49 seconds if using Int for the chain length) on my box. And the "g" isn't
important for speed or memory, just for getting the correct result.

Two factors make it run in constant memory without stack overflow:
1. strictness
2. laziness.

The strictness of foldl' is the reason that every g k is evaluated without
delay and can immediately be thrown away (unless it's the maximum so far, in
which case the previous maximum can be discarded) and the laziness of map and
enumFromTo (remember that [a .. b] is syntactic sugar for enumFromTo a b)
allows the list to be consumed as it is generated.

Let us trace a few evaluation steps of answer:
answer
---> foldl' max (0,0) $ map g (enumFromTo 1 999999)
-- First, we must see whether the list is empty or not,
-- 1 <= 999999, so the list is not empty
---> foldl' max (0,0) $ map g (1 : enumFromTo (1+1) 999999)
---> foldl' max (0,0) $ g 1 : map g (enumFromTo (1+1) 999999)
---> foldl' max (max (0,0) (g 1)) $ map g (enumFromTo (1+1) 999999)
-- Now, due to the strictness of foldl', max (0,0) (g 1) has to be evaluated
-- far enough to know if it is _|_ or not. For that, we obviously have to
-- compare (0,0) and g 1, hence we must evaluate g 1 far enough to see if
-- (0,0) <= g 1. Now g 1 = (f 1 1, 1) and we must check whether 0 <= f 1 1.
-- However, f 1 1 = 1, so yes, and max (0,0) (g 1) turns out to be (1,1)
---> foldl' max (1,1) $ map g (enumFromTo (1+1) 999999)
-- Now to decide if the list is empty, (1+1) must be compared to 999999,
-- for that it must be evaluated, so
---> foldl' max (1,1) $ map g (2 : enumFromTo (2+1) 999999)
---> foldl' max (1,1) $ g 2 : map g (enumFromTo (2+1) 999999)
---> foldl' max (max (1,1) (g 2)) $ map g (enumFromTo (2+1) 999999)
-- Is max (1,1) (g 2) _|_ or not? To find out evaluate g 2 = (2,2)
---> foldl' max (2,2) $ map g (enumFromTo (2+1) 999999)
---> foldl' max (2,2) $ map g (3 : enumFromTo (3+1) 999999)
---> foldl' max (2,2) $ g 3 : map g (enumFromTo (3+1) 999999)
---> foldl' max (max (2,2) (g 3)) $ map g (enumFromTo (3+1) 999999)
---> foldl' max (8,3) $ map g (enumFromTo (3+1) 999999)
...
---> foldl' max (max (8,3) (g 4)) $ map g (enumFromTo (4+1) 999999)
---> foldl' max (8,3) $ map g (enumFromTo (4+1) 999999)

and so on. Hardly any memory needed.
But you have more than 100,000,000 calls to f.
Now if you store previous results in a Map, you save many of these calls (to
rank in your code), but you pay for it with memory usage (and a couple of
million expensive insertions into the Map). For every entry in the Map, apart
from the space for key and value, you also have an Int holding the size of
the subtree and pointers to the two subtrees of the node, on a 32-bit box
that's a theoretical minimal overhead of 12 bytes per entry, probably rather
20 in reality, so a Map Integer Integer with one million entries would use
some 30 MB memory.

Hope that helps,
Daniel

>
> I am puzzled! :-(
>
> On Mon, Jul 28, 2008 at 17:32, Steve Klabnik <[hidden email]>wrote:
> > Finally, what I ended up doing:
> >
> > f :: Integer -> Integer -> Integer
> > f acc x
> >
> >     | x == 1 = acc
> >     |
> >         | even x = f (acc + 1) (x `div` 2)
> >         | otherwise = f (acc + 1) (3 * x + 1)
> >
> > g :: Integer -> (Integer, Integer)
> > g x = (f 1 x, x)
> >
> > answer = (foldl' max (0,0)) $ map g [1 .. 999999]
> >
> >
> > main = putStrLn( show answer)

Reply | Threaded
Open this post in threaded view
|

Re: [Haskell-begin] Maximum of a List?

Rafael Gustavo da Cunha Pereira Pinto
Thanks Daniel!

I only asked, because I was comparing the versions:

1) Steve's first version, without g x took forever to run and ate all the
stack.
2) Mine (with Map) ran somewhat fast, but it takes almost 2 minutes running
and eats 48MB of stack
3) Steve's last version (with g x) took less than 30s on my machine and
needed no more than 8MB of stack.


If I understood it right, using (g x) on foldl' forced the strictness of the
thunk (acc+1), without the need of using the bang pattern.

I'll try to play with the strictness on my code, trying to take advantage of
both the Map and strictness.

Rafael


On Mon, Jul 28, 2008 at 23:15, Daniel Fischer <[hidden email]>wrote:

> Am Dienstag, 29. Juli 2008 01:53 schrieb Rafael Gustavo da Cunha Pereira
> Pinto:
> > Since this is the "beginners" list, could someone explain me why using g
> > made everything run like the wind, with almost no memory?
>
> I'd rate "run like the wind" as an exaggeration since the completely
> unoptimised C version runs in 1.6 seconds vs. 54 seconds for the code below
> (49 seconds if using Int for the chain length) on my box. And the "g" isn't
> important for speed or memory, just for getting the correct result.
>
> Two factors make it run in constant memory without stack overflow:
> 1. strictness
> 2. laziness.
>
> The strictness of foldl' is the reason that every g k is evaluated without
> delay and can immediately be thrown away (unless it's the maximum so far,
> in
> which case the previous maximum can be discarded) and the laziness of map
> and
> enumFromTo (remember that [a .. b] is syntactic sugar for enumFromTo a b)
> allows the list to be consumed as it is generated.
>
> Let us trace a few evaluation steps of answer:
> answer
> ---> foldl' max (0,0) $ map g (enumFromTo 1 999999)
> -- First, we must see whether the list is empty or not,
> -- 1 <= 999999, so the list is not empty
> ---> foldl' max (0,0) $ map g (1 : enumFromTo (1+1) 999999)
> ---> foldl' max (0,0) $ g 1 : map g (enumFromTo (1+1) 999999)
> ---> foldl' max (max (0,0) (g 1)) $ map g (enumFromTo (1+1) 999999)
> -- Now, due to the strictness of foldl', max (0,0) (g 1) has to be
> evaluated
> -- far enough to know if it is _|_ or not. For that, we obviously have to
> -- compare (0,0) and g 1, hence we must evaluate g 1 far enough to see if
> -- (0,0) <= g 1. Now g 1 = (f 1 1, 1) and we must check whether 0 <= f 1 1.
> -- However, f 1 1 = 1, so yes, and max (0,0) (g 1) turns out to be (1,1)
> ---> foldl' max (1,1) $ map g (enumFromTo (1+1) 999999)
> -- Now to decide if the list is empty, (1+1) must be compared to 999999,
> -- for that it must be evaluated, so
> ---> foldl' max (1,1) $ map g (2 : enumFromTo (2+1) 999999)
> ---> foldl' max (1,1) $ g 2 : map g (enumFromTo (2+1) 999999)
> ---> foldl' max (max (1,1) (g 2)) $ map g (enumFromTo (2+1) 999999)
> -- Is max (1,1) (g 2) _|_ or not? To find out evaluate g 2 = (2,2)
> ---> foldl' max (2,2) $ map g (enumFromTo (2+1) 999999)
> ---> foldl' max (2,2) $ map g (3 : enumFromTo (3+1) 999999)
> ---> foldl' max (2,2) $ g 3 : map g (enumFromTo (3+1) 999999)
> ---> foldl' max (max (2,2) (g 3)) $ map g (enumFromTo (3+1) 999999)
> ---> foldl' max (8,3) $ map g (enumFromTo (3+1) 999999)
> ...
> ---> foldl' max (max (8,3) (g 4)) $ map g (enumFromTo (4+1) 999999)
> ---> foldl' max (8,3) $ map g (enumFromTo (4+1) 999999)
>
> and so on. Hardly any memory needed.
> But you have more than 100,000,000 calls to f.
> Now if you store previous results in a Map, you save many of these calls
> (to
> rank in your code), but you pay for it with memory usage (and a couple of
> million expensive insertions into the Map). For every entry in the Map,
> apart
> from the space for key and value, you also have an Int holding the size of
> the subtree and pointers to the two subtrees of the node, on a 32-bit box
> that's a theoretical minimal overhead of 12 bytes per entry, probably
> rather
> 20 in reality, so a Map Integer Integer with one million entries would use
> some 30 MB memory.
>
> Hope that helps,
> Daniel
>
> >
> > I am puzzled! :-(
> >
> > On Mon, Jul 28, 2008 at 17:32, Steve Klabnik <[hidden email]
> >wrote:
> > > Finally, what I ended up doing:
> > >
> > > f :: Integer -> Integer -> Integer
> > > f acc x
> > >
> > >     | x == 1 = acc
> > >     |
> > >         | even x = f (acc + 1) (x `div` 2)
> > >         | otherwise = f (acc + 1) (3 * x + 1)
> > >
> > > g :: Integer -> (Integer, Integer)
> > > g x = (f 1 x, x)
> > >
> > > answer = (foldl' max (0,0)) $ map g [1 .. 999999]
> > >
> > >
> > > main = putStrLn( show answer)
>
>


--
Rafael Gustavo da Cunha Pereira Pinto
Electronic Engineer, MSc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/beginners/attachments/20080729/49097dd9/attachment.htm
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell-begin] Maximum of a List?

Daniel Fischer-4
Am Dienstag, 29. Juli 2008 13:12 schrieb Rafael Gustavo da Cunha Pereira
Pinto:

> Thanks Daniel!
>
> I only asked, because I was comparing the versions:
>
> 1) Steve's first version, without g x took forever to run and ate all the
> stack.
> 2) Mine (with Map) ran somewhat fast, but it takes almost 2 minutes running
> and eats 48MB of stack
> 3) Steve's last version (with g x) took less than 30s on my machine and
> needed no more than 8MB of stack.

Odd, I get quite different results.
Steve's first, with type signature
f :: Int -> Integer -> Int
runs fine, although it takes 760 seconds and allocates *tons* when
interpreted, takes 50 seconds when compiled (with -O2, that's my standard).
I suspect you ran it with type signature
f :: Int -> Int -> Int ?
Then you hit a loop containing -68 for f 1 113383, no wonder it doesn't finish
:-)

Yours does far better here:
./rafa +RTS -sstderr
1,794,748,252 bytes allocated in the heap
110,383,616 bytes copied during GC (scavenged)
 45,186,920 bytes copied during GC (not scavenged)
 35,016,704 bytes maximum residency (7 sample(s))

       3424 collections in generation 0 (  2.74s)
          7 collections in generation 1 (  1.26s)

         68 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   22.79s  ( 25.17s elapsed)
  GC    time    4.00s  (  4.28s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   26.79s  ( 29.45s elapsed)

  %GC time      14.9%  (14.5% elapsed)

  Alloc rate    78,751,568 bytes per MUT second

  Productivity  85.1% of total user, 77.4% of total elapsed

Though that difference may partly be due to different compiling options (-O2
for me). But I doubt it consumes much stack, the Map, which takes a lot of
memory, should live on the heap.

Steve's new version doesn't differ much from the first with respect to running
time and memory usage, takes about 50 seconds here and runs happily within
1MB.

>
>
> If I understood it right, using (g x) on foldl' forced the strictness of
> the thunk (acc+1), without the need of using the bang pattern.

No, the strictness of the acc parameter was found by the strictness analyser
(at least with optimised compilation), neither foldl' nor g play a role here.

If my reading of the core is correct, without optimisations, the strictness
isn't inferred and I would've thought that is largely responsible for the
approximately doubled running time and more than doubled allocation figures
vs. -O2. Then each of the ((...(1+1)+1...)+1) thunks would only be evaluated
when the strictness of foldl' forces the evaluation of max (a,b) (g k). Since
these thunks contain some 100 nested additions on average, that should cost
considerable time and allocation, but as none contains more than a few
hundred additions, it'd be no problem for the stack.
However, things apparently aren't so simple, because with -O0, a strictifying
acc with a bang makes no difference :-(

What blows the stack pretty fast is a gigantic thunk of
max (max ... max (max (0,0) (g 1)) (g 2) ... (g 999999))
which is constructed when using foldl instead of foldl' and compiling without
optimisations (with -O1 already, all necessary strictness is seen, and the
compiler makes foldl behave like foldl' here).

The essential thing here is to avoid that thunk of maxes. Best done by using
foldl' and compiling with optimisations.

>
> I'll try to play with the strictness on my code, trying to take advantage
> of both the Map and strictness.
>
> Rafael

Cheers,
Daniel