Question about STRef

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Question about STRef

Antoine Rimlet
Hi list,

I try to get the following little program (a slightly modified "Man or boy", it prints -14254067) work "as expected", that is, without consuming lots of memory:

 import Data.STRef
 a k x1 x2 x3 x4 x5 =
   do kk <- newSTRef k
      let b = do k <- modifySTRef kk pred >> readSTRef kk; a k b x1 x2 x3 x4
      if k <= 0 then do x3' <- x3; x4' <- x4; return (x3' + x4')
                else do x5' <- x5; b' <- b; return (x5' + b')
 main = print (runST (a 22 (return 1) (return (-1)) (return (-1)) (return 1) (return 0)))

I use GHC 7.8.4, and executing this program uses about 2.5 GB of memory (or about 3.5 GB with runghc).  However, the "equivalent" program in OCaml only needs 4 MB (or 9 MB with the interpreter):

let rec a k x1 x2 x3 x4 x5 =
  let kk = ref k in let rec b () = begin decr kk; a !kk b x1 x2 x3 x4 end
  in if k <= 0 then let v = x3 () in v + x4 () else let v = x5 () in v + b ();;
print_int (a 22 (fun () -> 1) (fun () -> -1) (fun () -> -1) (fun () -> 1) (fun () -> 0));; print_newline ();;

Therefore I suspect I'm doing something wrong, but I can't see what.  I did try to use the strict version modifySTRef' as indicated in the manual, but with no visible improvement.

Thanks,

Antoine


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question about STRef

Johannes Waldmann-2
Thanks for pointing out this (Knuth's) nice test case.
I should use this as an exam question ...

Allocation and run-time can be reduced
by replacing  return (x3' + x4')
with  let x = (x3' + x4') in x `seq` return x
And, modifySTRef' *does* help.

I *do* notice a regression
(for the seq-ed and primed version)

ghc-8.0.2 : 2.6 GB alloc, 2.3 sec
ghc-6.10.4: 2.0 GB alloc, 1.7 sec

(measured on X5365  @ 3.00GHz )

Numbers are for total allocation,
residency is small  (it runs with  +RTS -H10k -A10k
but then the number of GCs goes up)

- J.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question about STRef

Robin Palotai
Alternatively, just use "return $! ", which has the same effect as seq-ing, and is generally a good practice (unless you want to be explicitly lazy). See http://johantibell.com/files/haskell-performance-patterns.html.

the-thing +RTS -s     (with GHC 8)

  3,965,758,088 bytes allocated in the heap
      46,006,592 bytes copied during GC
         207,792 bytes maximum residency (5 sample(s))             <---- down to 200K from 1.5G
         182,864 bytes maximum slop
               8 MB total memory in use (0 MB lost due to fragmentation)

Robin

2017-01-26 12:42 GMT+01:00 Johannes Waldmann <[hidden email]>:
Thanks for pointing out this (Knuth's) nice test case.
I should use this as an exam question ...

Allocation and run-time can be reduced
by replacing  return (x3' + x4')
with  let x = (x3' + x4') in x `seq` return x
And, modifySTRef' *does* help.

I *do* notice a regression
(for the seq-ed and primed version)

ghc-8.0.2 : 2.6 GB alloc, 2.3 sec
ghc-6.10.4: 2.0 GB alloc, 1.7 sec

(measured on X5365  @ 3.00GHz )

Numbers are for total allocation,
residency is small  (it runs with  +RTS -H10k -A10k
but then the number of GCs goes up)

- J.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question about STRef

Johannes Waldmann-2
Yes. Here's a table with runtime performance by ghc version:
http://www.imn.htwk-leipzig.de/~waldmann/etc/mob/
The winner is  6.12.3 (1.56 sec)  (where 8.0.2 : 2.30 sec) - J.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question about STRef

Michael Snoyman
In reply to this post by Antoine Rimlet
I made a few modifications to your code, and found that replacing `return (x3' + x4')` with `return $! x3' + x4'` reduced maximum residency down to 64kb. This forces evaluation earlier. You can see the progression here:


I took it one step further, and used the mutable-containers package to use an unboxed reference instead of the boxed STRef type. In other words: it avoids allocating a heap object. Here's that version of the code:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc exec --package mutable-containers -- ghc -O2 -with-rtsopts=-s
import Data.Mutable

a :: Int
  -> ST s Int
  -> ST s Int
  -> ST s Int
  -> ST s Int
  -> ST s Int
  -> ST s Int
a k x1 x2 x3 x4 x5 =
   do kk <- fmap asURef $ newRef k
      let b = do k0 <- readRef kk
                 let k1 = k0 - 1
                 writeRef kk k1
                 a k1 b x1 x2 x3 x4
      if k <= 0 then do x3' <- x3; x4' <- x4; return $! x3' + x4'
                else do x5' <- x5; b' <- b; return $! x5' + b'

main = print (runST (a 22 (return 1) (return (-1)) (return (-1)) (return 1) (return 0)))

It knocked down total allocations from 2.7GB to 1.8GB, which is an improvement, but I think there's still some more low hanging fruit here.

On Thu, Jan 26, 2017 at 12:26 PM, Antoine Rimlet <[hidden email]> wrote:
Hi list,

I try to get the following little program (a slightly modified "Man or boy", it prints -14254067) work "as expected", that is, without consuming lots of memory:

 import Data.STRef
 a k x1 x2 x3 x4 x5 =
   do kk <- newSTRef k
      let b = do k <- modifySTRef kk pred >> readSTRef kk; a k b x1 x2 x3 x4
      if k <= 0 then do x3' <- x3; x4' <- x4; return (x3' + x4')
                else do x5' <- x5; b' <- b; return (x5' + b')
 main = print (runST (a 22 (return 1) (return (-1)) (return (-1)) (return 1) (return 0)))

I use GHC 7.8.4, and executing this program uses about 2.5 GB of memory (or about 3.5 GB with runghc).  However, the "equivalent" program in OCaml only needs 4 MB (or 9 MB with the interpreter):

let rec a k x1 x2 x3 x4 x5 =
  let kk = ref k in let rec b () = begin decr kk; a !kk b x1 x2 x3 x4 end
  in if k <= 0 then let v = x3 () in v + x4 () else let v = x5 () in v + b ();;
print_int (a 22 (fun () -> 1) (fun () -> -1) (fun () -> -1) (fun () -> 1) (fun () -> 0));; print_newline ();;

Therefore I suspect I'm doing something wrong, but I can't see what.  I did try to use the strict version modifySTRef' as indicated in the manual, but with no visible improvement.

Thanks,

Antoine


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question about STRef

Johannes Waldmann-2
In reply to this post by Antoine Rimlet
What Michael did not mention in the mail, but it's in his code:
numerals defaulted to Integer, and the story changes (somewhat)
if we force Int.

I updated my measurements:
http://www.imn.htwk-leipzig.de/~waldmann/etc/mob/
and ghc-8 looks better now - but not quite wins.

> there's still some more low hanging fruit here

... for whom? What could we reasonably expect of the compiler here?
It could do the strictification? the unboxing?

It certainly could not replace Integer by Int,
since the range of numerical values has no obvious bound.

But then, all these Integers are small, but we pay quite an overhead
(for represeting Int as Integer) which has increased for current ghcs?

- J.
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question about STRef

Michael Snoyman


On Thu, Jan 26, 2017 at 3:05 PM, Johannes Waldmann <[hidden email]> wrote:
What Michael did not mention in the mail, but it's in his code:
numerals defaulted to Integer, and the story changes (somewhat)
if we force Int.

I updated my measurements:
http://www.imn.htwk-leipzig.de/~waldmann/etc/mob/
and ghc-8 looks better now - but not quite wins.

> there's still some more low hanging fruit here

... for whom? What could we reasonably expect of the compiler here?
It could do the strictification? the unboxing?

It certainly could not replace Integer by Int,
since the range of numerical values has no obvious bound.

But then, all these Integers are small, but we pay quite an overhead
(for represeting Int as Integer) which has increased for current ghcs?


I meant that, if someone wanted to bring down allocations significantly, I would think there's still room for optimization. I was not commenting on what GHC should or shouldn't be capable of doing here. It would certainly be nice, for example, if it was capable of noticing the strictness involved and automatically unbox the STRefs, but that would be quite a feat.

I didn't investigate further myself, so I could be mistaken about the possibility to further reduce allocations. It was just a guess (thus the "I think").

Michael

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question about STRef

Michael Snoyman


On Thu, Jan 26, 2017 at 3:15 PM, Michael Snoyman <[hidden email]> wrote:


On Thu, Jan 26, 2017 at 3:05 PM, Johannes Waldmann <[hidden email]> wrote:
What Michael did not mention in the mail, but it's in his code:
numerals defaulted to Integer, and the story changes (somewhat)
if we force Int.

I updated my measurements:
http://www.imn.htwk-leipzig.de/~waldmann/etc/mob/
and ghc-8 looks better now - but not quite wins.

> there's still some more low hanging fruit here

... for whom? What could we reasonably expect of the compiler here?
It could do the strictification? the unboxing?

It certainly could not replace Integer by Int,
since the range of numerical values has no obvious bound.

But then, all these Integers are small, but we pay quite an overhead
(for represeting Int as Integer) which has increased for current ghcs?


I meant that, if someone wanted to bring down allocations significantly, I would think there's still room for optimization. I was not commenting on what GHC should or shouldn't be capable of doing here. It would certainly be nice, for example, if it was capable of noticing the strictness involved and automatically unbox the STRefs, but that would be quite a feat.

I didn't investigate further myself, so I could be mistaken about the possibility to further reduce allocations. It was just a guess (thus the "I think").

Michael

As an example, upon reviewing the code I realized it was unnecessarily allocating a reference in the case of `k <= 0`. Fixing that reduces allocations to 800MB (from 1.9GB). This was a pretty easy change to make. A more involved change would be to share a mutable vector of larger size instead of allocating a new reference each time. I'm now intrigued enough by the problem that I'll probably do that next :)

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc exec --package mutable-containers -- ghc -O2 -with-rtsopts=-s
import Data.Mutable

a :: Int
  -> ST s Int
  -> ST s Int
  -> ST s Int
  -> ST s Int
  -> ST s Int
  -> ST s Int
a k x1 x2 x3 x4 x5
    | k <= 0 = do
        x3' <- x3
        x4' <- x4
        return $! x3' + x4'
    | otherwise = do
        kRef <- fmap asURef $ newRef k

        let b = do
                k0 <- readRef kRef
                let k1 = k0 - 1
                writeRef kRef k1
                a k1 b x1 x2 x3 x4

        x5' <- x5
        b' <- b
        return $! x5' + b'


main = print $ runST $ a
    22
    (return 1)
    (return (-1))
    (return (-1))
    (return 1)
    (return 0)

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question about STRef

Frank Staals
In reply to this post by Antoine Rimlet
Antoine Rimlet <[hidden email]> writes:

> Hi list,
>
> I try to get the following little program (a slightly modified "Man or
> boy", it prints -14254067) work "as expected", that is, without consuming
> lots of memory

Are you sure "-14254067" is correct for k=22? Wikipedia [1] and
RosettaCode [2] both seem to suggest that -865 609 is the right
answer. Similarly, for k=10 it is supposed to return -67 rather than -577.

[1] https://en.wikipedia.org/wiki/Man_or_boy_test
[2] http://rosettacode.org/wiki/Man_or_boy_test

--

- Frank
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Question about STRef

Antoine Rimlet
Thanks to all for your answers!
With "return $!" the program now works "as expected".
Forcing Int, and moving the "let b" in the "else" part, also brings some (but comparatively far less) improvement.
Note that, as indicated in my initial post, this program is a *modified* version of "Man or boy", and therefore it does not return the same values.
Best,
Antoine

2017-01-26 14:32 GMT+01:00 Frank Staals <[hidden email]>:
Antoine Rimlet <[hidden email]> writes:

> Hi list,
>
> I try to get the following little program (a slightly modified "Man or
> boy", it prints -14254067) work "as expected", that is, without consuming
> lots of memory

Are you sure "-14254067" is correct for k=22? Wikipedia [1] and
RosettaCode [2] both seem to suggest that -865 609 is the right
answer. Similarly, for k=10 it is supposed to return -67 rather than -577.

[1] https://en.wikipedia.org/wiki/Man_or_boy_test
[2] http://rosettacode.org/wiki/Man_or_boy_test

--

- Frank


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.