binary package: memory problem decoding an IntMap

Manlio Perillo-3
Hi.

I'm having memory problems decoding a big IntMap.

The data structure is:

IntMap (UArr (Word16 :*: Word8))


There are 480189 keys, and a total of 100480507 elements
(Netflix Prize).
The size of the encoded (and compressed) data is 184 MB.

When I load data from the Netflix Prize data set, total memory usage is
1030 MB.

However when I try to decode the data, memory usage grows too much (even
using the -F1.1 option in the RTS).


The problem seems to be with the `fromAscList` function, which is implemented in terms of `fromList`, defined as:

fromList :: [(Key,a)] -> IntMap a
fromList xs
   = foldlStrict ins empty xs
   where
     ins t (k,x)  = insert k x t

(By the way, why doesn't the IntMap module use Data.List.foldl'?)

The `ins` function is not strict.
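
To illustrate the point about `ins`, here is a minimal sketch of a value-strict variant. The name `fromListStrict` is hypothetical (it is not part of Data.IntMap); it only assumes `foldl'` from Data.List and `seq` to force each value to weak head normal form before it is inserted. Later releases of containers ship a Data.IntMap.Strict module that may make such a helper unnecessary.

```haskell
import qualified Data.IntMap as IntMap
import Data.IntMap (IntMap)
import Data.List (foldl')

-- Hypothetical helper, not part of Data.IntMap: forces each value to
-- weak head normal form before inserting it, so no thunk for the value
-- is stored in the map.
fromListStrict :: [(Int, a)] -> IntMap a
fromListStrict = foldl' ins IntMap.empty
  where
    ins t (k, x) = x `seq` IntMap.insert k x t
```

As with `fromList`, a later entry for the same key replaces an earlier one.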



This seems a hard problem to solve.
First of all, IntMap should provide strict variants of the functions it
implements.
And the binary package should choose whether to use the strict or the lazy version.


For me, the simplest solution is to serialize the association list
obtained from the `toAscList` function, instead of serializing the
IntMap directly.

The question is: can I "reuse" the data already serialized?
Is the binary format of `IntMap a` and `[(Int, a)]` compatible?
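
One way to check the compatibility question empirically is to encode a small IntMap and decode the same bytes as an association list. This is only a sketch: `roundTripAsList` is a hypothetical name, and `Word8` stands in for the real element type. As far as I know, the Binary instance for IntMap writes the size followed by the `toAscList` pairs, which is the same layout the list instance uses, but this is worth verifying against the version of binary that wrote the data.

```haskell
import Data.Binary (decode, encode)
import qualified Data.IntMap as IntMap
import Data.IntMap (IntMap)
import Data.Word (Word8)

-- Hypothetical check: encode with the IntMap instance, then decode the
-- same bytes with the instance for lists of pairs.
roundTripAsList :: IntMap Word8 -> [(Int, Word8)]
roundTripAsList = decode . encode
```

If the result equals `IntMap.toAscList` of the input, the existing files can be read back as `[(Int, a)]` without re-encoding.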



Thanks  Manlio Perillo
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: binary package: memory problem decoding an IntMap

Manlio Perillo-3
Manlio Perillo wrote:

> Hi.
>
> I'm having memory problems decoding a big IntMap.
>
> The data structure is:
>
> IntMap (UArr (Word16 :*: Word8))
>
>
> There are 480189 keys, and a total of 100480507 elements
> (Netflix Prize).
> The size of the encoded (and compressed) data is 184 MB.
>
> When I load data from the Netflix Prize data set, total memory usage is
> 1030 Mb.
>

It seems there is a problem with tuples, too.

I have a:
     [(Word16, UArr (Word32 :*: Word8))]

This eats more memory than it should, since tuples are decoded lazily.



Manlio
Re: binary package: memory problem decoding an IntMap

Nicolas Pouillard-2
Excerpts from Manlio Perillo's message of Sun Apr 05 22:41:57 +0200 2009:

> Manlio Perillo wrote:
> > Hi.
> >
> > I'm having memory problems decoding a big IntMap.
> >
> > The data structure is:
> >
> > IntMap (UArr (Word16 :*: Word8))
> >
> >
> > There are 480189 keys, and a total of 100480507 elements
> > (Netflix Prize).
> > The size of the encoded (and compressed) data is 184 MB.
> >
> > When I load data from the Netflix Prize data set, total memory usage is
> > 1030 Mb.
> >
>
> It seems there is a problem with tuples, too.
>
> I have a:
>      [(Word16, UArr (Word32 :*: Word8))]
>
> This eats more memory than it should, since tuples are decoded lazily.

Why not switch to [(Word16 :*: UArr (Word32 :*: Word8))] then?

--
Nicolas Pouillard
Re: binary package: memory problem decoding an IntMap

Manlio Perillo-3
In reply to this post by Manlio Perillo-3
Manlio Perillo wrote:
> [...]
>
> It seems there is a problem with tuples, too.
>
> I have a:
>     [(Word16, UArr (Word32 :*: Word8))]
>
> This eats more memory than it should, since tuples are decoded lazily.
>

My bad, sorry.

I solved it simply by using a strict consumer (foldl' instead of foldl).
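
For readers hitting the same issue, a minimal sketch of such a strict consumer follows. The function `summarize` and what it computes are hypothetical, chosen only to show the pattern: `foldl'` forces the accumulator at each step, and the explicit `seq` calls force both fields of the accumulator pair so no chain of thunks builds up.

```haskell
import Data.List (foldl')
import Data.Word (Word16)

-- Hypothetical consumer: counts the entries and sums the keys in a
-- single pass over the decoded list.
summarize :: [(Word16, a)] -> (Int, Int)
summarize = foldl' step (0, 0)
  where
    step (n, s) (k, _) =
      let n' = n + 1
          s' = s + fromIntegral k
      in n' `seq` s' `seq` (n', s')
```

With plain `foldl`, the same code would accumulate unevaluated applications of `step` until the end of the list.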


Manlio
Re: binary package: memory problem decoding an IntMap

Manlio Perillo-3
In reply to this post by Nicolas Pouillard-2
Nicolas Pouillard wrote:

> Excerpts from Manlio Perillo's message of Sun Apr 05 22:41:57 +0200 2009:
>> Manlio Perillo wrote:
>>> Hi.
>>>
> [...]
>>
>> I have a:
>>      [(Word16, UArr (Word32 :*: Word8))]
>>
>> This eats more memory than it should, since tuples are decoded lazily.
>
> Why not switch to [(Word16 :*: UArr (Word32 :*: Word8))] then?
>

I finally made some tests today, and I can confirm that using :*:
reduces memory usage.
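
A rough sketch of why a strict pair helps: when both fields carry strictness annotations, evaluating the pair constructor also evaluates its components, so decoded elements do not linger as thunks. The `StrictPair` type below is a hypothetical stand-in for the `:*:` pair from uvector, with a hand-written Binary instance; it is not the library's own definition.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Word (Word16, Word8)

-- Hypothetical stand-in for uvector's strict pair: both fields are
-- evaluated as soon as the constructor itself is evaluated.
data StrictPair a b = !a :*: !b
  deriving (Eq, Show)

instance (Binary a, Binary b) => Binary (StrictPair a b) where
  -- Write the two fields one after the other.
  put (a :*: b) = put a >> put b
  -- Read them back in the same order.
  get = do
    a <- get
    b <- get
    return (a :*: b)

-- Encode a pair and decode it again.
roundTrip :: StrictPair Word16 Word8 -> StrictPair Word16 Word8
roundTrip = decode . encode
```

This instance writes the first field followed by the second, with nothing in between.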


Thanks  Manlio
