bizarre memory usage with data.binary

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

bizarre memory usage with data.binary

Anatoly Yakovenko
i am getting some weird memory usage out of this program:


module Main where

import Data.Binary
import Data.List(foldl')


main = do
   let sum' = foldl' (+) 0
   let list::[Int] = decode $ encode $ ([1..] :: [Int])
   print $ sum' list
   print "done"

it goes up to 500M and down to 17M on windows.  Its build with ghc
6.6.1 with the latest data.binary

Any ideas what could be causing the memory usage to jump around so much?


Thanks,
Anatoly
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: bizarre memory usage with data.binary

Roel van Dijk-3
Does it terminate?

Looks like you are summing all the natural numbers. On a turing
machine it should run forever,   on a real computer it should run out
of memory. Unless I am missing something obvious :-)
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: bizarre memory usage with data.binary

Stefan O'Rear
On Wed, Oct 03, 2007 at 01:22:25AM +0200, Roel van Dijk wrote:
> Does it terminate?
>
> Looks like you are summing all the natural numbers. On a turing
> machine it should run forever,   on a real computer it should run out
> of memory. Unless I am missing something obvious :-)

There are only about 4 billion distinct values of type Int, 2 billion of
which are positive.  Integer is required for bigger values

Stefan

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

signature.asc (196 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: bizarre memory usage with data.binary

Stefan O'Rear
In reply to this post by Anatoly Yakovenko
On Tue, Oct 02, 2007 at 04:08:01PM -0700, Anatoly Yakovenko wrote:

> i am getting some weird memory usage out of this program:
>
>
> module Main where
>
> import Data.Binary
> import Data.List(foldl')
>
>
> main = do
>    let sum' = foldl' (+) 0
>    let list::[Int] = decode $ encode $ ([1..] :: [Int])
>    print $ sum' list
>    print "done"
>
> it goes up to 500M and down to 17M on windows.  Its build with ghc
> 6.6.1 with the latest data.binary
>
> Any ideas what could be causing the memory usage to jump around so much?
Only 500M?  encode for lists is strict, I would have expected around
80GB usage...  What does -ddump-simpl-stats say?

Stefan

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

signature.asc (196 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: bizarre memory usage with data.binary

Anatoly Yakovenko
In reply to this post by Anatoly Yakovenko
Program1:

module Main where

import Data.Binary
import Data.List(foldl')


main = do
  let sum' = foldl' (+) 0
  let list::[Int] = decode $ encode $ ([1..] :: [Int])
  print $ sum' list
  print "done"

vs

Program2:

module Main where

import Data.Binary
import Data.List(foldl')


main = do
  let sum' = foldl' (+) 0
  let list::[Int] = [1..]
  print $ sum' list
  print "done"

neither program is expected to terminate.  The point of these examples
is to demonstrate that Data.Binary encode and decode have some strange
memory allocation patters.

If you run Program1, it will run forever, but its memory usage on my
machine goes to 500M then back down to 17M then back up to 500M then
back down to 17M... repeatedly.  I don't think this has anything to do
with running out of space in a 32 bit integer.

Program2 on the other hand runs at constant memory at around 2M.

Anatoly

On 10/2/07, Anatoly Yakovenko <[hidden email]> wrote:

> i am getting some weird memory usage out of this program:
>
>
> module Main where
>
> import Data.Binary
> import Data.List(foldl')
>
>
> main = do
>    let sum' = foldl' (+) 0
>    let list::[Int] = decode $ encode $ ([1..] :: [Int])
>    print $ sum' list
>    print "done"
>
> it goes up to 500M and down to 17M on windows.  Its build with ghc
> 6.6.1 with the latest data.binary
>
> Any ideas what could be causing the memory usage to jump around so much?
>
>
> Thanks,
> Anatoly
>
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Don Stewart-2
aeyakovenko:

> Program1:
>
> module Main where
>
> import Data.Binary
> import Data.List(foldl')
>
>
> main = do
>   let sum' = foldl' (+) 0
>   let list::[Int] = decode $ encode $ ([1..] :: [Int])
>   print $ sum' list
>   print "done"
>
> vs
>
> Program2:
>
> module Main where
>
> import Data.Binary
> import Data.List(foldl')
>
>
> main = do
>   let sum' = foldl' (+) 0
>   let list::[Int] = [1..]
>   print $ sum' list
>   print "done"
>
> neither program is expected to terminate.  The point of these examples
> is to demonstrate that Data.Binary encode and decode have some strange
> memory allocation patters.

The encode instance for lists is fairly strict:

    instance Binary a => Binary [a] where
        put l  = put (length l) >> mapM_ put l
        get    = do n <- get :: Get Int
                    replicateM n get

This is ok, since typically you aren't serialising infinite structures.

Use a newtype, and a lazier instance, if you need to do this.

-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Dan Weston
In reply to this post by Anatoly Yakovenko
Maybe what you are observing is that the operational semantics of
undefined is undefined. The program can halt, run forever, use no
memory, use all the memory.

Although I doubt what GHC does with this code is a random process, I
don't think it's too meaningful to ask what are the space usage patterns
of a program returning bottom.

Anatoly Yakovenko wrote:

> Program1:
>
> module Main where
>
> import Data.Binary
> import Data.List(foldl')
>
>
> main = do
>   let sum' = foldl' (+) 0
>   let list::[Int] = decode $ encode $ ([1..] :: [Int])
>   print $ sum' list
>   print "done"
>
> vs
>
> Program2:
>
> module Main where
>
> import Data.Binary
> import Data.List(foldl')
>
>
> main = do
>   let sum' = foldl' (+) 0
>   let list::[Int] = [1..]
>   print $ sum' list
>   print "done"
>
> neither program is expected to terminate.  The point of these examples
> is to demonstrate that Data.Binary encode and decode have some strange
> memory allocation patters.
>
> If you run Program1, it will run forever, but its memory usage on my
> machine goes to 500M then back down to 17M then back up to 500M then
> back down to 17M... repeatedly.  I don't think this has anything to do
> with running out of space in a 32 bit integer.
>
> Program2 on the other hand runs at constant memory at around 2M.
>
> Anatoly
>
> On 10/2/07, Anatoly Yakovenko <[hidden email]> wrote:
>> i am getting some weird memory usage out of this program:
>>
>>
>> module Main where
>>
>> import Data.Binary
>> import Data.List(foldl')
>>
>>
>> main = do
>>    let sum' = foldl' (+) 0
>>    let list::[Int] = decode $ encode $ ([1..] :: [Int])
>>    print $ sum' list
>>    print "done"
>>
>> it goes up to 500M and down to 17M on windows.  Its build with ghc
>> 6.6.1 with the latest data.binary
>>
>> Any ideas what could be causing the memory usage to jump around so much?
>>
>>
>> Thanks,
>> Anatoly
>>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Anatoly Yakovenko
In reply to this post by Don Stewart-2
On 10/2/07, Don Stewart <[hidden email]> wrote:

> aeyakovenko:
> > Program1:
> >
> > module Main where
> >
> > import Data.Binary
> > import Data.List(foldl')
> >
> >
> > main = do
> >   let sum' = foldl' (+) 0
> >   let list::[Int] = decode $ encode $ ([1..] :: [Int])
> >   print $ sum' list
> >   print "done"

> The encode instance for lists is fairly strict:
>
>     instance Binary a => Binary [a] where
>         put l  = put (length l) >> mapM_ put l
>         get    = do n <- get :: Get Int
>                     replicateM n get
>
> This is ok, since typically you aren't serialising infinite structures.

hmm, this doesn't make sense to me, it goes up to 500M then down then
back up, then back down, so it doesn't just run out of memory because
of (length l) forces you to evaluate the entire list.

> Use a newtype, and a lazier instance, if you need to do this.

Thanks for the tip.  this runs at a constant 4M

module Main where

import Data.Binary
import Data.List(foldl')

data Foo = Foo Int Foo | Null

instance Binary Foo where
   put (Foo i f) = do put (0 :: Word8)
                      put i
                      put f
   put (Null)  = do put (1 :: Word8)
   get = do t <- get :: Get Word8
            case t of
               0 -> do i <- get
                       f <- get
                       return (Foo i f)
               1 -> do return Null

sumFoo zz (Null) = zz
sumFoo zz (Foo ii ff) = sumFoo (zz + ii) ff

fooBar i = Foo i (fooBar (i + 1))

main = do
   print $ sumFoo 0 $ decode $ encode $ fooBar 1
   print "done"
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Anatoly Yakovenko
In reply to this post by Dan Weston
servers never terminate, pretend that i have a server that reads a
list encoded with data.binary from a socket, and sums it up and
returns the current sum.  i would expect it to run in constant memory,
never terminate, and do useful work.

which is basically the problem that I am facing right now.  my program
seems to grow randomly in memory use when marshaling large data types
encoded using data.binary.

On 10/2/07, Dan Weston <[hidden email]> wrote:

> Maybe what you are observing is that the operational semantics of
> undefined is undefined. The program can halt, run forever, use no
> memory, use all the memory.
>
> Although I doubt what GHC does with this code is a random process, I
> don't think it's too meaningful to ask what are the space usage patterns
> of a program returning bottom.
>
> Anatoly Yakovenko wrote:
> > Program1:
> >
> > module Main where
> >
> > import Data.Binary
> > import Data.List(foldl')
> >
> >
> > main = do
> >   let sum' = foldl' (+) 0
> >   let list::[Int] = decode $ encode $ ([1..] :: [Int])
> >   print $ sum' list
> >   print "done"
> >
> > vs
> >
> > Program2:
> >
> > module Main where
> >
> > import Data.Binary
> > import Data.List(foldl')
> >
> >
> > main = do
> >   let sum' = foldl' (+) 0
> >   let list::[Int] = [1..]
> >   print $ sum' list
> >   print "done"
> >
> > neither program is expected to terminate.  The point of these examples
> > is to demonstrate that Data.Binary encode and decode have some strange
> > memory allocation patters.
> >
> > If you run Program1, it will run forever, but its memory usage on my
> > machine goes to 500M then back down to 17M then back up to 500M then
> > back down to 17M... repeatedly.  I don't think this has anything to do
> > with running out of space in a 32 bit integer.
> >
> > Program2 on the other hand runs at constant memory at around 2M.
> >
> > Anatoly
> >
> > On 10/2/07, Anatoly Yakovenko <[hidden email]> wrote:
> >> i am getting some weird memory usage out of this program:
> >>
> >>
> >> module Main where
> >>
> >> import Data.Binary
> >> import Data.List(foldl')
> >>
> >>
> >> main = do
> >>    let sum' = foldl' (+) 0
> >>    let list::[Int] = decode $ encode $ ([1..] :: [Int])
> >>    print $ sum' list
> >>    print "done"
> >>
> >> it goes up to 500M and down to 17M on windows.  Its build with ghc
> >> 6.6.1 with the latest data.binary
> >>
> >> Any ideas what could be causing the memory usage to jump around so much?
> >>
> >>
> >> Thanks,
> >> Anatoly
> >>
> > _______________________________________________
> > Haskell-Cafe mailing list
> > [hidden email]
> > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >
> >
>
>
>
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Felipe Lessa
In reply to this post by Don Stewart-2
On 10/2/07, Don Stewart <[hidden email]> wrote:

> The encode instance for lists is fairly strict:
>
>     instance Binary a => Binary [a] where
>         put l  = put (length l) >> mapM_ put l
>         get    = do n <- get :: Get Int
>                     replicateM n get
>
> This is ok, since typically you aren't serialising infinite structures.
>
> Use a newtype, and a lazier instance, if you need to do this.

Maybe something like this (WARNING: ugly code, as in "not elegant", follows):

newtype List a = List [a]

split255 :: [a] -> (Word8, [a], [a])
split255 = s 0 where
    s 255 xs   = (255, [], xs)
    s n (x:xs) = let (n', ys, zs) = s (n+1) xs in (n', x:ys, zs)
    s n []     = (n, [], [])

instance Binary a => Binary (List a) where
  put (List l) = let (n, xs, ys) = split255 l
                 in do putWord8 n
                       mapM_ put xs
                       when (n == 255) (put $ List ys)
  get = do n <- getWord8
           xs <- replicateM (fromEnum n) get
           if n == 255
             then get >>= \(List ys) -> return (List $ xs ++ ys)
             else return (List xs)


It uses chunks of 255 elements and so doesn't traverse the whole list
until starting to output something. OTOH, there's some data overhead
(1 byte every 255 elements). Seems to run your example fine and in
constant memory.

HTH,

--
Felipe.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Don Stewart-2
In reply to this post by Anatoly Yakovenko
aeyakovenko:
> servers never terminate, pretend that i have a server that reads a
> list encoded with data.binary from a socket, and sums it up and
> returns the current sum.  i would expect it to run in constant memory,
> never terminate, and do useful work.
>
> which is basically the problem that I am facing right now.  my program
> seems to grow randomly in memory use when marshaling large data types
> encoded using data.binary.

If its specifically the list instance, where we currently trade laziness
for efficiency of encoding (which may or may not be the right thing),
I'd suggest a fully lazy encoding instance?

-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Anatoly Yakovenko
> If its specifically the list instance, where we currently trade laziness
> for efficiency of encoding (which may or may not be the right thing),
> I'd suggest a fully lazy encoding instance?

Its not really a list, its more of a tree that has shared nodes, so
something like this:

        A
     /     \
    B      C
      \   /
       D
     /   \
    E    F

I suspect that maybe after encode/decode i end up with something like

        A
     /     \
    B      C
   /          \
  D           D
 /   \        /   \
E    F    E     F
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Spencer Janssen-2
On Tuesday 02 October 2007 19:51:47 Anatoly Yakovenko wrote:

> > If its specifically the list instance, where we currently trade laziness
> > for efficiency of encoding (which may or may not be the right thing),
> > I'd suggest a fully lazy encoding instance?
>
> Its not really a list, its more of a tree that has shared nodes, so
> something like this:
>
>         A
>      /     \
>     B      C
>       \   /
>        D
>      /   \
>     E    F
>
> I suspect that maybe after encode/decode i end up with something like
>
>         A
>      /     \
>     B      C
>    /          \
>   D           D
>  /   \        /   \
> E    F    E     F

That is correct, binary doesn't attempt to share substructures.  If you'd like
to do this, you'll need to do it by hand.


Cheers,
Spencer Janssen

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Jules Bean
Spencer Janssen wrote:

> On Tuesday 02 October 2007 19:51:47 Anatoly Yakovenko wrote:
>>> If its specifically the list instance, where we currently trade laziness
>>> for efficiency of encoding (which may or may not be the right thing),
>>> I'd suggest a fully lazy encoding instance?
>> Its not really a list, its more of a tree that has shared nodes, so
>> something like this:
>>
>>         A
>>      /     \
>>     B      C
>>       \   /
>>        D
>>      /   \
>>     E    F
>>
>> I suspect that maybe after encode/decode i end up with something like
>>
>>         A
>>      /     \
>>     B      C
>>    /          \
>>   D           D
>>  /   \        /   \
>> E    F    E     F
>
> That is correct, binary doesn't attempt to share substructures.  If you'd like
> to do this, you'll need to do it by hand.

...and indeed it can't be done, except by the naive brute-force method
of comparing every subtree, possibly optimised by cryptographically
hashing a representation of every subtree, since sharing isn't an
observable property.

Of course, hashing doesn't actually observe the real sharing present,
rather, it computes maximal sharing. There are some applications where
this could be a worthwhile win.

Jules

PS Well, except unsafePtrEquality but I don't really want to go there...

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Anatoly Yakovenko
> ...and indeed it can't be done, except by the naive brute-force method
> of comparing every subtree, possibly optimized by cryptographically
> hashing a representation of every subtree, since sharing isn't an
> observable property.

i was thinking that instead of having a reference to a node, each node
just holds an index from an array of nodes.  Traversal would take an
extra step, but it should fix the problem with encode/decode.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Thomas Conway
In reply to this post by Jules Bean
On 10/4/07, Jules Bean <[hidden email]> wrote:
> ...and indeed it can't be done, except by the naive brute-force method
> of comparing every subtree, possibly optimised by cryptographically
> hashing a representation of every subtree, since sharing isn't an
> observable property.

At least one Prolog implementation (I forget which, I'm sorry), had a
[de]serialisation library which used a hash-consing approach.
Basically, it did its serialization using a post-order traversal and
emitted references to previous values when the same value had already
been emitted. Not rocket science. Actually, I've heard a Prolog guy -
Bart Demoen - talk about doing pretty much this during GC to improve
sharing.

cheers,
T.
 --
Thomas Conway
[hidden email]

Silence is the perfectest herald of joy:
I were but little happy, if I could say how much.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: bizarre memory usage with data.binary

Jules Bean
Thomas Conway wrote:

> On 10/4/07, Jules Bean <[hidden email]> wrote:
>> ...and indeed it can't be done, except by the naive brute-force method
>> of comparing every subtree, possibly optimised by cryptographically
>> hashing a representation of every subtree, since sharing isn't an
>> observable property.
>
> At least one Prolog implementation (I forget which, I'm sorry), had a
> [de]serialisation library which used a hash-consing approach.
> Basically, it did its serialization using a post-order traversal and
> emitted references to previous values when the same value had already
> been emitted. Not rocket science. Actually, I've heard a Prolog guy -
> Bart Demoen - talk about doing pretty much this during GC to improve
> sharing.

Not rocket science at all, but relatively expensive. A time/space
tradeoff. And these days, with memory and disks feeling cheap, most
people want to trade time for space, not the other way around.

Not everyone, of course.

Jules
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: bizarre memory usage with data.binary

Aaron Denney
On 2007-10-04, Jules Bean <[hidden email]> wrote:

> Thomas Conway wrote:
>> On 10/4/07, Jules Bean <[hidden email]> wrote:
>>> ...and indeed it can't be done, except by the naive brute-force method
>>> of comparing every subtree, possibly optimised by cryptographically
>>> hashing a representation of every subtree, since sharing isn't an
>>> observable property.
>>
>> At least one Prolog implementation (I forget which, I'm sorry), had a
>> [de]serialisation library which used a hash-consing approach.
>> Basically, it did its serialization using a post-order traversal and
>> emitted references to previous values when the same value had already
>> been emitted. Not rocket science. Actually, I've heard a Prolog guy -
>> Bart Demoen - talk about doing pretty much this during GC to improve
>> sharing.
>
> Not rocket science at all, but relatively expensive. A time/space
> tradeoff. And these days, with memory and disks feeling cheap, most
> people want to trade time for space, not the other way around.

Caches are still limited sizes, and that can make a huge difference for
time.

--
Aaron Denney
-><-

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe