i am getting some weird memory usage out of this program:
module Main where import Data.Binary import Data.List(foldl') main = do let sum' = foldl' (+) 0 let list::[Int] = decode $ encode $ ([1..] :: [Int]) print $ sum' list print "done" it goes up to 500M and down to 17M on windows. Its build with ghc 6.6.1 with the latest data.binary Any ideas what could be causing the memory usage to jump around so much? Thanks, Anatoly _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
Does it terminate?
Looks like you are summing all the natural numbers. On a turing machine it should run forever, on a real computer it should run out of memory. Unless I am missing something obvious :-) _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
On Wed, Oct 03, 2007 at 01:22:25AM +0200, Roel van Dijk wrote:
> Does it terminate? > > Looks like you are summing all the natural numbers. On a turing > machine it should run forever, on a real computer it should run out > of memory. Unless I am missing something obvious :-) There are only about 4 billion distinct values of type Int, 2 billion of which are positive. Integer is required for bigger values Stefan _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe signature.asc (196 bytes) Download Attachment |
In reply to this post by Anatoly Yakovenko
On Tue, Oct 02, 2007 at 04:08:01PM -0700, Anatoly Yakovenko wrote:
> i am getting some weird memory usage out of this program: > > > module Main where > > import Data.Binary > import Data.List(foldl') > > > main = do > let sum' = foldl' (+) 0 > let list::[Int] = decode $ encode $ ([1..] :: [Int]) > print $ sum' list > print "done" > > it goes up to 500M and down to 17M on windows. Its build with ghc > 6.6.1 with the latest data.binary > > Any ideas what could be causing the memory usage to jump around so much? 80GB usage... What does -ddump-simpl-stats say? Stefan _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe signature.asc (196 bytes) Download Attachment |
In reply to this post by Anatoly Yakovenko
Program1:
module Main where import Data.Binary import Data.List(foldl') main = do let sum' = foldl' (+) 0 let list::[Int] = decode $ encode $ ([1..] :: [Int]) print $ sum' list print "done" vs Program2: module Main where import Data.Binary import Data.List(foldl') main = do let sum' = foldl' (+) 0 let list::[Int] = [1..] print $ sum' list print "done" neither program is expected to terminate. The point of these examples is to demonstrate that Data.Binary encode and decode have some strange memory allocation patters. If you run Program1, it will run forever, but its memory usage on my machine goes to 500M then back down to 17M then back up to 500M then back down to 17M... repeatedly. I don't think this has anything to do with running out of space in a 32 bit integer. Program2 on the other hand runs at constant memory at around 2M. Anatoly On 10/2/07, Anatoly Yakovenko <[hidden email]> wrote: > i am getting some weird memory usage out of this program: > > > module Main where > > import Data.Binary > import Data.List(foldl') > > > main = do > let sum' = foldl' (+) 0 > let list::[Int] = decode $ encode $ ([1..] :: [Int]) > print $ sum' list > print "done" > > it goes up to 500M and down to 17M on windows. Its build with ghc > 6.6.1 with the latest data.binary > > Any ideas what could be causing the memory usage to jump around so much? > > > Thanks, > Anatoly > Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
aeyakovenko:
> Program1: > > module Main where > > import Data.Binary > import Data.List(foldl') > > > main = do > let sum' = foldl' (+) 0 > let list::[Int] = decode $ encode $ ([1..] :: [Int]) > print $ sum' list > print "done" > > vs > > Program2: > > module Main where > > import Data.Binary > import Data.List(foldl') > > > main = do > let sum' = foldl' (+) 0 > let list::[Int] = [1..] > print $ sum' list > print "done" > > neither program is expected to terminate. The point of these examples > is to demonstrate that Data.Binary encode and decode have some strange > memory allocation patters. The encode instance for lists is fairly strict: instance Binary a => Binary [a] where put l = put (length l) >> mapM_ put l get = do n <- get :: Get Int replicateM n get This is ok, since typically you aren't serialising infinite structures. Use a newtype, and a lazier instance, if you need to do this. -- Don _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Anatoly Yakovenko
Maybe what you are observing is that the operational semantics of
undefined is undefined. The program can halt, run forever, use no memory, use all the memory. Although I doubt what GHC does with this code is a random process, I don't think it's too meaningful to ask what are the space usage patterns of a program returning bottom. Anatoly Yakovenko wrote: > Program1: > > module Main where > > import Data.Binary > import Data.List(foldl') > > > main = do > let sum' = foldl' (+) 0 > let list::[Int] = decode $ encode $ ([1..] :: [Int]) > print $ sum' list > print "done" > > vs > > Program2: > > module Main where > > import Data.Binary > import Data.List(foldl') > > > main = do > let sum' = foldl' (+) 0 > let list::[Int] = [1..] > print $ sum' list > print "done" > > neither program is expected to terminate. The point of these examples > is to demonstrate that Data.Binary encode and decode have some strange > memory allocation patters. > > If you run Program1, it will run forever, but its memory usage on my > machine goes to 500M then back down to 17M then back up to 500M then > back down to 17M... repeatedly. I don't think this has anything to do > with running out of space in a 32 bit integer. > > Program2 on the other hand runs at constant memory at around 2M. > > Anatoly > > On 10/2/07, Anatoly Yakovenko <[hidden email]> wrote: >> i am getting some weird memory usage out of this program: >> >> >> module Main where >> >> import Data.Binary >> import Data.List(foldl') >> >> >> main = do >> let sum' = foldl' (+) 0 >> let list::[Int] = decode $ encode $ ([1..] :: [Int]) >> print $ sum' list >> print "done" >> >> it goes up to 500M and down to 17M on windows. Its build with ghc >> 6.6.1 with the latest data.binary >> >> Any ideas what could be causing the memory usage to jump around so much? >> >> >> Thanks, >> Anatoly >> > _______________________________________________ > Haskell-Cafe mailing list > [hidden email] > http://www.haskell.org/mailman/listinfo/haskell-cafe > > _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Don Stewart-2
On 10/2/07, Don Stewart <[hidden email]> wrote:
> aeyakovenko: > > Program1: > > > > module Main where > > > > import Data.Binary > > import Data.List(foldl') > > > > > > main = do > > let sum' = foldl' (+) 0 > > let list::[Int] = decode $ encode $ ([1..] :: [Int]) > > print $ sum' list > > print "done" > The encode instance for lists is fairly strict: > > instance Binary a => Binary [a] where > put l = put (length l) >> mapM_ put l > get = do n <- get :: Get Int > replicateM n get > > This is ok, since typically you aren't serialising infinite structures. hmm, this doesn't make sense to me, it goes up to 500M then down then back up, then back down, so it doesn't just run out of memory because of (length l) forces you to evaluate the entire list. > Use a newtype, and a lazier instance, if you need to do this. Thanks for the tip. this runs at a constant 4M module Main where import Data.Binary import Data.List(foldl') data Foo = Foo Int Foo | Null instance Binary Foo where put (Foo i f) = do put (0 :: Word8) put i put f put (Null) = do put (1 :: Word8) get = do t <- get :: Get Word8 case t of 0 -> do i <- get f <- get return (Foo i f) 1 -> do return Null sumFoo zz (Null) = zz sumFoo zz (Foo ii ff) = sumFoo (zz + ii) ff fooBar i = Foo i (fooBar (i + 1)) main = do print $ sumFoo 0 $ decode $ encode $ fooBar 1 print "done" _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Dan Weston
servers never terminate, pretend that i have a server that reads a
list encoded with data.binary from a socket, and sums it up and returns the current sum. i would expect it to run in constant memory, never terminate, and do useful work. which is basically the problem that I am facing right now. my program seems to grow randomly in memory use when marshaling large data types encoded using data.binary. On 10/2/07, Dan Weston <[hidden email]> wrote: > Maybe what you are observing is that the operational semantics of > undefined is undefined. The program can halt, run forever, use no > memory, use all the memory. > > Although I doubt what GHC does with this code is a random process, I > don't think it's too meaningful to ask what are the space usage patterns > of a program returning bottom. > > Anatoly Yakovenko wrote: > > Program1: > > > > module Main where > > > > import Data.Binary > > import Data.List(foldl') > > > > > > main = do > > let sum' = foldl' (+) 0 > > let list::[Int] = decode $ encode $ ([1..] :: [Int]) > > print $ sum' list > > print "done" > > > > vs > > > > Program2: > > > > module Main where > > > > import Data.Binary > > import Data.List(foldl') > > > > > > main = do > > let sum' = foldl' (+) 0 > > let list::[Int] = [1..] > > print $ sum' list > > print "done" > > > > neither program is expected to terminate. The point of these examples > > is to demonstrate that Data.Binary encode and decode have some strange > > memory allocation patters. > > > > If you run Program1, it will run forever, but its memory usage on my > > machine goes to 500M then back down to 17M then back up to 500M then > > back down to 17M... repeatedly. I don't think this has anything to do > > with running out of space in a 32 bit integer. > > > > Program2 on the other hand runs at constant memory at around 2M. > > > > Anatoly > > > > On 10/2/07, Anatoly Yakovenko <[hidden email]> wrote: > >> i am getting some weird memory usage out of this program: > >> > >> > >> module Main where > >> > >> import Data.Binary > >> import Data.List(foldl') > >> > >> > >> main = do > >> let sum' = foldl' (+) 0 > >> let list::[Int] = decode $ encode $ ([1..] :: [Int]) > >> print $ sum' list > >> print "done" > >> > >> it goes up to 500M and down to 17M on windows. Its build with ghc > >> 6.6.1 with the latest data.binary > >> > >> Any ideas what could be causing the memory usage to jump around so much? > >> > >> > >> Thanks, > >> Anatoly > >> > > _______________________________________________ > > Haskell-Cafe mailing list > > [hidden email] > > http://www.haskell.org/mailman/listinfo/haskell-cafe > > > > > > > Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Don Stewart-2
On 10/2/07, Don Stewart <[hidden email]> wrote:
> The encode instance for lists is fairly strict: > > instance Binary a => Binary [a] where > put l = put (length l) >> mapM_ put l > get = do n <- get :: Get Int > replicateM n get > > This is ok, since typically you aren't serialising infinite structures. > > Use a newtype, and a lazier instance, if you need to do this. Maybe something like this (WARNING: ugly code, as in "not elegant", follows): newtype List a = List [a] split255 :: [a] -> (Word8, [a], [a]) split255 = s 0 where s 255 xs = (255, [], xs) s n (x:xs) = let (n', ys, zs) = s (n+1) xs in (n', x:ys, zs) s n [] = (n, [], []) instance Binary a => Binary (List a) where put (List l) = let (n, xs, ys) = split255 l in do putWord8 n mapM_ put xs when (n == 255) (put $ List ys) get = do n <- getWord8 xs <- replicateM (fromEnum n) get if n == 255 then get >>= \(List ys) -> return (List $ xs ++ ys) else return (List xs) It uses chunks of 255 elements and so doesn't traverse the whole list until starting to output something. OTOH, there's some data overhead (1 byte every 255 elements). Seems to run your example fine and in constant memory. HTH, -- Felipe. _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Anatoly Yakovenko
aeyakovenko:
> servers never terminate, pretend that i have a server that reads a > list encoded with data.binary from a socket, and sums it up and > returns the current sum. i would expect it to run in constant memory, > never terminate, and do useful work. > > which is basically the problem that I am facing right now. my program > seems to grow randomly in memory use when marshaling large data types > encoded using data.binary. If its specifically the list instance, where we currently trade laziness for efficiency of encoding (which may or may not be the right thing), I'd suggest a fully lazy encoding instance? -- Don _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
> If its specifically the list instance, where we currently trade laziness
> for efficiency of encoding (which may or may not be the right thing), > I'd suggest a fully lazy encoding instance? Its not really a list, its more of a tree that has shared nodes, so something like this: A / \ B C \ / D / \ E F I suspect that maybe after encode/decode i end up with something like A / \ B C / \ D D / \ / \ E F E F _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
On Tuesday 02 October 2007 19:51:47 Anatoly Yakovenko wrote:
> > If its specifically the list instance, where we currently trade laziness > > for efficiency of encoding (which may or may not be the right thing), > > I'd suggest a fully lazy encoding instance? > > Its not really a list, its more of a tree that has shared nodes, so > something like this: > > A > / \ > B C > \ / > D > / \ > E F > > I suspect that maybe after encode/decode i end up with something like > > A > / \ > B C > / \ > D D > / \ / \ > E F E F That is correct, binary doesn't attempt to share substructures. If you'd like to do this, you'll need to do it by hand. Cheers, Spencer Janssen _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
Spencer Janssen wrote:
> On Tuesday 02 October 2007 19:51:47 Anatoly Yakovenko wrote: >>> If its specifically the list instance, where we currently trade laziness >>> for efficiency of encoding (which may or may not be the right thing), >>> I'd suggest a fully lazy encoding instance? >> Its not really a list, its more of a tree that has shared nodes, so >> something like this: >> >> A >> / \ >> B C >> \ / >> D >> / \ >> E F >> >> I suspect that maybe after encode/decode i end up with something like >> >> A >> / \ >> B C >> / \ >> D D >> / \ / \ >> E F E F > > That is correct, binary doesn't attempt to share substructures. If you'd like > to do this, you'll need to do it by hand. ...and indeed it can't be done, except by the naive brute-force method of comparing every subtree, possibly optimised by cryptographically hashing a representation of every subtree, since sharing isn't an observable property. Of course, hashing doesn't actually observe the real sharing present, rather, it computes maximal sharing. There are some applications where this could be a worthwhile win. Jules PS Well, except unsafePtrEquality but I don't really want to go there... _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
> ...and indeed it can't be done, except by the naive brute-force method
> of comparing every subtree, possibly optimized by cryptographically > hashing a representation of every subtree, since sharing isn't an > observable property. i was thinking that instead of having a reference to a node, each node just holds an index from an array of nodes. Traversal would take an extra step, but it should fix the problem with encode/decode. _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Jules Bean
On 10/4/07, Jules Bean <[hidden email]> wrote:
> ...and indeed it can't be done, except by the naive brute-force method > of comparing every subtree, possibly optimised by cryptographically > hashing a representation of every subtree, since sharing isn't an > observable property. At least one Prolog implementation (I forget which, I'm sorry), had a [de]serialisation library which used a hash-consing approach. Basically, it did its serialization using a post-order traversal and emitted references to previous values when the same value had already been emitted. Not rocket science. Actually, I've heard a Prolog guy - Bart Demoen - talk about doing pretty much this during GC to improve sharing. cheers, T. -- Thomas Conway [hidden email] Silence is the perfectest herald of joy: I were but little happy, if I could say how much. _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
Thomas Conway wrote:
> On 10/4/07, Jules Bean <[hidden email]> wrote: >> ...and indeed it can't be done, except by the naive brute-force method >> of comparing every subtree, possibly optimised by cryptographically >> hashing a representation of every subtree, since sharing isn't an >> observable property. > > At least one Prolog implementation (I forget which, I'm sorry), had a > [de]serialisation library which used a hash-consing approach. > Basically, it did its serialization using a post-order traversal and > emitted references to previous values when the same value had already > been emitted. Not rocket science. Actually, I've heard a Prolog guy - > Bart Demoen - talk about doing pretty much this during GC to improve > sharing. Not rocket science at all, but relatively expensive. A time/space tradeoff. And these days, with memory and disks feeling cheap, most people want to trade time for space, not the other way around. Not everyone, of course. Jules _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
On 2007-10-04, Jules Bean <[hidden email]> wrote:
> Thomas Conway wrote: >> On 10/4/07, Jules Bean <[hidden email]> wrote: >>> ...and indeed it can't be done, except by the naive brute-force method >>> of comparing every subtree, possibly optimised by cryptographically >>> hashing a representation of every subtree, since sharing isn't an >>> observable property. >> >> At least one Prolog implementation (I forget which, I'm sorry), had a >> [de]serialisation library which used a hash-consing approach. >> Basically, it did its serialization using a post-order traversal and >> emitted references to previous values when the same value had already >> been emitted. Not rocket science. Actually, I've heard a Prolog guy - >> Bart Demoen - talk about doing pretty much this during GC to improve >> sharing. > > Not rocket science at all, but relatively expensive. A time/space > tradeoff. And these days, with memory and disks feeling cheap, most > people want to trade time for space, not the other way around. Caches are still limited sizes, and that can make a huge difference for time. -- Aaron Denney -><- _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
Free forum by Nabble | Edit this page |