Measuring memory usage

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Measuring memory usage

Vlatko Basic

Hello,

I've come to some strange results using Weigh package.

It shows that HashMap inside 'data' is using much, much more memory.

The strange thing is that I'm seeing too large mem usage in my app as well (several "MapData" like in records), and trying to figure out with 'weigh' what's keeping the mem.

Noticed that when I change the code to use HashMap directly (not inside 'data', that's the only change), the mem usage observed with top drops down for ~60M, from 850M to 790M.


These are the test results for 10K, 5K and 3.3K items for "data MapData k v = MapData (HashMap k v)" (at the end is the full runnable example.)

Case           Allocated  GCs
HashMap          262,824    0
HashMap half      58,536    0
HashMap third     17,064    0
MapData        4,242,208    4

I tested by changing the order, disabling all but one etc., and the results were the same. Same 'weigh' behaviour with IntMap and Map.


So, if anyone knows and has some experience with such issues, my questions are:

1. Is 'weigh' package reliable/usable, at least to some extent? (the results do show diff between full, half and third)

2. How do you measure mem consumptions of your large data/records?

3. If the results are even approximately valid, what could cause such large discrepancies with 'data'?

4. Is there a way to see if some record has been freed from memory, GCed?



module Main where

import Prelude

import Control.DeepSeq     (NFData)
import Data.HashMap.Strict (HashMap, fromList)
import GHC.Generics        (Generic)
import Weigh               (mainWith, value)


data MapData k v = MapData (HashMap k v) deriving Generic
instance (NFData k, NFData v) => NFData (MapData k v)

full, half, third :: Int
full  = 10000
half  =  5000
third =  3333

main :: IO ()
main = mainWith $ do
  value "HashMap"       (          mkHMList full)
  value "HashMap half"  (          mkHMList half)
  value "HashMap third" (          mkHMList third)
  value "MapData"       (MapData $ mkHMList full)

mkHMList :: Int -> HashMap Int String
mkHMList n = fromList . zip [1..n] $ replicate n "some text"




_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Measuring memory usage

Vanessa McHale

You might consider profiling your application or making an event log instead. The event log should show GC activity and the heap profile should show memory usage more accurately.


On 06/29/2018 07:31 AM, Vlatko Basic wrote:

Hello,

I've come to some strange results using Weigh package.

It shows that HashMap inside 'data' is using much, much more memory.

The strange thing is that I'm seeing too large mem usage in my app as well (several "MapData" like in records), and trying to figure out with 'weigh' what's keeping the mem.

Noticed that when I change the code to use HashMap directly (not inside 'data', that's the only change), the mem usage observed with top drops down for ~60M, from 850M to 790M.


These are the test results for 10K, 5K and 3.3K items for "data MapData k v = MapData (HashMap k v)" (at the end is the full runnable example.)

Case           Allocated  GCs
HashMap          262,824    0
HashMap half      58,536    0
HashMap third     17,064    0
MapData        4,242,208    4

I tested by changing the order, disabling all but one etc., and the results were the same. Same 'weigh' behaviour with IntMap and Map.


So, if anyone knows and has some experience with such issues, my questions are:

1. Is 'weigh' package reliable/usable, at least to some extent? (the results do show diff between full, half and third)

2. How do you measure mem consumptions of your large data/records?

3. If the results are even approximately valid, what could cause such large discrepancies with 'data'?

4. Is there a way to see if some record has been freed from memory, GCed?



module Main where

import Prelude

import Control.DeepSeq     (NFData)
import Data.HashMap.Strict (HashMap, fromList)
import GHC.Generics        (Generic)
import Weigh               (mainWith, value)


data MapData k v = MapData (HashMap k v) deriving Generic
instance (NFData k, NFData v) => NFData (MapData k v)

full, half, third :: Int
full  = 10000
half  =  5000
third =  3333

main :: IO ()
main = mainWith $ do
  value "HashMap"       (          mkHMList full)
  value "HashMap half"  (          mkHMList half)
  value "HashMap third" (          mkHMList third)
  value "MapData"       (MapData $ mkHMList full)

mkHMList :: Int -> HashMap Int String
mkHMList n = fromList . zip [1..n] $ replicate n "some text"





_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

signature.asc (499 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Measuring memory usage

Claude Heiland-Allen-3
In reply to this post by Vlatko Basic
Hi Vlatko,

On 29/06/18 13:31, Vlatko Basic wrote:
>
> Hello,
>
> I've come to some strange results using Weigh package.
>
> It shows that HashMap inside 'data' is using much, much more memory.
>
This seems to be astrictness issue - you may be measuring the size of a
thunk instead of the resulting evaluated data.

To confirm that this is the case, you can replace:

data MapData k v = MapData (HashMap k v) deriving Generic

with

data MapData k v = MapData !(HashMap k v) deriving Generic

Or replace:

   value "MapData"       (MapData $ mkHMList full)

with

   value "MapData"       (MapData $! mkHMList full)

Either of these changes gave me results like this:

Case           Allocated  GCs
HashMap          262,824    0
HashMap half      58,536    0
HashMap third     17,064    0
MapData          263,416    0

The real issue seems to be NFData not doing what you expect. I'm not
sure what the generic NFData instance is supposed to do, as there is no
instance Generic (HashMap k v), so maybe you need to write your own rnf
if you don't like either of the above workarounds.

Claude

>
> The strange thing is that I'm seeing too large mem usage in my app as
> well (several "MapData" like in records), and trying to figure out
> with 'weigh' what's keeping the mem.
>
> Noticed that when I change the code to use HashMap directly (not
> inside 'data', that's the only change), the mem usage observed with
> top drops down for ~60M, from 850M to 790M.
>
>
> These are the test results for 10K, 5K and 3.3K items for "data
> MapData k v = MapData (HashMap k v)" (at the end is the full runnable
> example.)
>
> Case           Allocated  GCs
> HashMap          262,824    0
> HashMap half      58,536    0
> HashMap third     17,064    0
> MapData        4,242,208    4
>
> I tested by changing the order, disabling all but one etc., and the
> results were the same. Same 'weigh' behaviour with IntMap and Map.
>
>
> So, if anyone knows and has some experience with such issues, my
> questions are:
>
> 1. Is 'weigh' package reliable/usable, at least to some extent? (the
> results do show diff between full, half and third)
>
> 2. How do you measure mem consumptions of your large data/records?
>
> 3. If the results are even approximately valid, what could cause such
> large discrepancies with 'data'?
>
> 4. Is there a way to see if some record has been freed from memory, GCed?
>
>
>
> module Main where
>
> import Prelude
>
> import Control.DeepSeq     (NFData)
> import Data.HashMap.Strict (HashMap, fromList)
> import GHC.Generics        (Generic)
> import Weigh               (mainWith, value)
>
>
> data MapData k v = MapData (HashMap k v) deriving Generic
> instance (NFData k, NFData v) => NFData (MapData k v)
>
> full, half, third :: Int
> full  = 10000
> half  =  5000
> third =  3333
>
> main :: IO ()
> main = mainWith $ do
>   value "HashMap"       (          mkHMList full)
>   value "HashMap half"  (          mkHMList half)
>   value "HashMap third" (          mkHMList third)
>   value "MapData"       (MapData $ mkHMList full)
>
> mkHMList :: Int -> HashMap Int String
> mkHMList n = fromList . zip [1..n] $ replicate n "some text"
>
>
--
https://mathr.co.uk

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Measuring memory usage

Vlatko Basic
In reply to this post by Vanessa McHale
Good advice for eventlog. I read about it long ago and completely forgot we have it. :-)



-------- Original Message --------
Subject: Re: [Haskell-cafe] Measuring memory usage
From: Vanessa McHale [hidden email]
To: [hidden email]
Date: 29/06/18 15:34


You might consider profiling your application or making an event log instead. The event log should show GC activity and the heap profile should show memory usage more accurately.


On 06/29/2018 07:31 AM, Vlatko Basic wrote:

Hello,

I've come to some strange results using Weigh package.

It shows that HashMap inside 'data' is using much, much more memory.

The strange thing is that I'm seeing too large mem usage in my app as well (several "MapData" like in records), and trying to figure out with 'weigh' what's keeping the mem.

Noticed that when I change the code to use HashMap directly (not inside 'data', that's the only change), the mem usage observed with top drops down for ~60M, from 850M to 790M.


These are the test results for 10K, 5K and 3.3K items for "data MapData k v = MapData (HashMap k v)" (at the end is the full runnable example.)

Case           Allocated  GCs
HashMap          262,824    0
HashMap half      58,536    0
HashMap third     17,064    0
MapData        4,242,208    4

I tested by changing the order, disabling all but one etc., and the results were the same. Same 'weigh' behaviour with IntMap and Map.


So, if anyone knows and has some experience with such issues, my questions are:

1. Is 'weigh' package reliable/usable, at least to some extent? (the results do show diff between full, half and third)

2. How do you measure mem consumptions of your large data/records?

3. If the results are even approximately valid, what could cause such large discrepancies with 'data'?

4. Is there a way to see if some record has been freed from memory, GCed?



module Main where

import Prelude

import Control.DeepSeq     (NFData)
import Data.HashMap.Strict (HashMap, fromList)
import GHC.Generics        (Generic)
import Weigh               (mainWith, value)


data MapData k v = MapData (HashMap k v) deriving Generic
instance (NFData k, NFData v) => NFData (MapData k v)

full, half, third :: Int
full  = 10000
half  =  5000
third =  3333

main :: IO ()
main = mainWith $ do
  value "HashMap"       (          mkHMList full)
  value "HashMap half"  (          mkHMList half)
  value "HashMap third" (          mkHMList third)
  value "MapData"       (MapData $ mkHMList full)

mkHMList :: Int -> HashMap Int String
mkHMList n = fromList . zip [1..n] $ replicate n "some text"





_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Measuring memory usage

Vlatko Basic
In reply to this post by Claude Heiland-Allen-3
Indeed bang solves the issue. I didn't try it because the docs says value
doesn't have to be forced for validateFunc (which is used for value), but
obviously only to whnf.

Thanks. :-)
Been wasting whole morning on this.


-------- Original Message  --------
Subject: Re: [Haskell-cafe] Measuring memory usage
From: Claude Heiland-Allen <[hidden email]>
To: [hidden email]
Date: 29/06/18 15:37

> Hi Vlatko,
>
> On 29/06/18 13:31, Vlatko Basic wrote:
>>
>> Hello,
>>
>> I've come to some strange results using Weigh package.
>>
>> It shows that HashMap inside 'data' is using much, much more memory.
>>
> This seems to be astrictness issue - you may be measuring the size of a thunk
> instead of the resulting evaluated data.
>
> To confirm that this is the case, you can replace:
>
> data MapData k v = MapData (HashMap k v) deriving Generic
>
> with
>
> data MapData k v = MapData !(HashMap k v) deriving Generic
>
> Or replace:
>
>    value "MapData"       (MapData $ mkHMList full)
>
> with
>
>    value "MapData"       (MapData $! mkHMList full)
>
> Either of these changes gave me results like this:
>
> Case           Allocated  GCs
> HashMap          262,824    0
> HashMap half      58,536    0
> HashMap third     17,064    0
> MapData          263,416    0
>
> The real issue seems to be NFData not doing what you expect. I'm not sure what
> the generic NFData instance is supposed to do, as there is no instance Generic
> (HashMap k v), so maybe you need to write your own rnf if you don't like either
> of the above workarounds.
>
> Claude
>>
>> The strange thing is that I'm seeing too large mem usage in my app as well
>> (several "MapData" like in records), and trying to figure out with 'weigh'
>> what's keeping the mem.
>>
>> Noticed that when I change the code to use HashMap directly (not inside
>> 'data', that's the only change), the mem usage observed with top drops down
>> for ~60M, from 850M to 790M.
>>
>>
>> These are the test results for 10K, 5K and 3.3K items for "data MapData k v =
>> MapData (HashMap k v)" (at the end is the full runnable example.)
>>
>> Case           Allocated  GCs
>> HashMap          262,824    0
>> HashMap half      58,536    0
>> HashMap third     17,064    0
>> MapData        4,242,208    4
>>
>> I tested by changing the order, disabling all but one etc., and the results
>> were the same. Same 'weigh' behaviour with IntMap and Map.
>>
>>
>> So, if anyone knows and has some experience with such issues, my questions are:
>>
>> 1. Is 'weigh' package reliable/usable, at least to some extent? (the results
>> do show diff between full, half and third)
>>
>> 2. How do you measure mem consumptions of your large data/records?
>>
>> 3. If the results are even approximately valid, what could cause such large
>> discrepancies with 'data'?
>>
>> 4. Is there a way to see if some record has been freed from memory, GCed?
>>
>>
>>
>> module Main where
>>
>> import Prelude
>>
>> import Control.DeepSeq     (NFData)
>> import Data.HashMap.Strict (HashMap, fromList)
>> import GHC.Generics        (Generic)
>> import Weigh               (mainWith, value)
>>
>>
>> data MapData k v = MapData (HashMap k v) deriving Generic
>> instance (NFData k, NFData v) => NFData (MapData k v)
>>
>> full, half, third :: Int
>> full  = 10000
>> half  =  5000
>> third =  3333
>>
>> main :: IO ()
>> main = mainWith $ do
>>   value "HashMap"       (          mkHMList full)
>>   value "HashMap half"  (          mkHMList half)
>>   value "HashMap third" (          mkHMList third)
>>   value "MapData"       (MapData $ mkHMList full)
>>
>> mkHMList :: Int -> HashMap Int String
>> mkHMList n = fromList . zip [1..n] $ replicate n "some text"
>>
>>
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Measuring memory usage

Claude Heiland-Allen-3
On 2018-06-29 15:14, Vlatko Basic wrote:
> Indeed bang solves the issue. I didn't try it because the docs says
> value doesn't have to be forced for validateFunc (which is used for
> value), but obviously only to whnf.

I think the issue is something to do with the two default
implementations for rnf in the NFData class.  Historically, `rnf a = seq
a ()` was the default implementation (ie just WHNF), but more recently
there is a Generic-based version that should automatically reduce to
normal form.  I don't know why the Generic version is either 1. not used
at all, or 2. not working properly, but I suspect lack of instance
Generic (HashMap k v), or possibly instance Generic1/2 MapData (if they
are things?), may have something to do with it.  I don't know why there
is no instance, but maybe it would allow breaking internal data
structure invariants?


Claude

>
> Thanks. :-)
> Been wasting whole morning on this.
>
>
> -------- Original Message  --------
> Subject: Re: [Haskell-cafe] Measuring memory usage
> From: Claude Heiland-Allen <[hidden email]>
> To: [hidden email]
> Date: 29/06/18 15:37
>
>> Hi Vlatko,
>>
>> On 29/06/18 13:31, Vlatko Basic wrote:
>>>
>>> Hello,
>>>
>>> I've come to some strange results using Weigh package.
>>>
>>> It shows that HashMap inside 'data' is using much, much more memory.
>>>
>> This seems to be astrictness issue - you may be measuring the size of
>> a thunk instead of the resulting evaluated data.
>>
>> To confirm that this is the case, you can replace:
>>
>> data MapData k v = MapData (HashMap k v) deriving Generic
>>
>> with
>>
>> data MapData k v = MapData !(HashMap k v) deriving Generic
>>
>> Or replace:
>>
>>    value "MapData"       (MapData $ mkHMList full)
>>
>> with
>>
>>    value "MapData"       (MapData $! mkHMList full)
>>
>> Either of these changes gave me results like this:
>>
>> Case           Allocated  GCs
>> HashMap          262,824    0
>> HashMap half      58,536    0
>> HashMap third     17,064    0
>> MapData          263,416    0
>>
>> The real issue seems to be NFData not doing what you expect. I'm not
>> sure what the generic NFData instance is supposed to do, as there is
>> no instance Generic (HashMap k v), so maybe you need to write your own
>> rnf if you don't like either of the above workarounds.
>>
>> Claude
>>>
>>> The strange thing is that I'm seeing too large mem usage in my app as
>>> well (several "MapData" like in records), and trying to figure out
>>> with 'weigh' what's keeping the mem.
>>>
>>> Noticed that when I change the code to use HashMap directly (not
>>> inside 'data', that's the only change), the mem usage observed with
>>> top drops down for ~60M, from 850M to 790M.
>>>
>>>
>>> These are the test results for 10K, 5K and 3.3K items for "data
>>> MapData k v = MapData (HashMap k v)" (at the end is the full runnable
>>> example.)
>>>
>>> Case           Allocated  GCs
>>> HashMap          262,824    0
>>> HashMap half      58,536    0
>>> HashMap third     17,064    0
>>> MapData        4,242,208    4
>>>
>>> I tested by changing the order, disabling all but one etc., and the
>>> results were the same. Same 'weigh' behaviour with IntMap and Map.
>>>
>>>
>>> So, if anyone knows and has some experience with such issues, my
>>> questions are:
>>>
>>> 1. Is 'weigh' package reliable/usable, at least to some extent? (the
>>> results do show diff between full, half and third)
>>>
>>> 2. How do you measure mem consumptions of your large data/records?
>>>
>>> 3. If the results are even approximately valid, what could cause such
>>> large discrepancies with 'data'?
>>>
>>> 4. Is there a way to see if some record has been freed from memory,
>>> GCed?
>>>
>>>
>>>
>>> module Main where
>>>
>>> import Prelude
>>>
>>> import Control.DeepSeq     (NFData)
>>> import Data.HashMap.Strict (HashMap, fromList)
>>> import GHC.Generics        (Generic)
>>> import Weigh               (mainWith, value)
>>>
>>>
>>> data MapData k v = MapData (HashMap k v) deriving Generic
>>> instance (NFData k, NFData v) => NFData (MapData k v)
>>>
>>> full, half, third :: Int
>>> full  = 10000
>>> half  =  5000
>>> third =  3333
>>>
>>> main :: IO ()
>>> main = mainWith $ do
>>>   value "HashMap"       (          mkHMList full)
>>>   value "HashMap half"  (          mkHMList half)
>>>   value "HashMap third" (          mkHMList third)
>>>   value "MapData"       (MapData $ mkHMList full)
>>>
>>> mkHMList :: Int -> HashMap Int String
>>> mkHMList n = fromList . zip [1..n] $ replicate n "some text"
>>>
>>>

--
https://mathr.co.uk
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Measuring memory usage

Vlatko Basic
In reply to this post by Vlatko Basic
Hello again,

I'm still trying to find some method to predict memory usage and came upon this page: https://wiki.haskell.org/GHC/Memory_Footprint

I tried to manually calculate how much memory will the record below consume (64-bit, 1W = 8B):

newtype Id   = Id    Text deriving (Generic, Data, NFData) -- = 6W
newtype Uid  = Uid   Int  deriving (Generic, Data, NFData) -- = 2W
newtype XUid = XUid  Uid  deriving (Generic, Data, NFData) -- = 2W
newtype YUid = YUid  Uid  deriving (Generic, Data, NFData) -- = 2W
data    Dir  = Yes | No   deriving (Generic, Data)         -- = 2W

data X = X
  { a :: XUid                -- =  2W       -- Int
  , b :: YUid                -- =  2W       -- Int
  , c :: Id                  -- =  6W + 8B  -- Text len 4
  , d :: Either Dir Dir      -- =  1W + 2W  -- Either + Dir + No/Yes
  , e :: Text                -- =  6W + 8B  -- Text len 4
  } deriving (Generic, Data) -- = 19W + 16B = 152 + 16 = 168 B

and calculated the assumed sizes of few lists with different number of elements:

Expected list sizes    ([v] = (1 + 3N) words + N * sizeof(v))
  30: 1 + 3W *   30 + (19W *   30 + 16 *   30) =   5,761 B
 600: 1 + 3W *  600 + (19W *  600 + 16 *  600) = 115,201 B
5000: 1 + 3W * 5000 + (19W * 5000 + 16 * 5000) = 960,001 B

I also compared these sizes with three libs (Data.Generics.Schemes.gsize, GHC.DataSize.recursiveSizeNF, Weigh) and the results were:

#items recursiveSizeNF    gSize      Weigh    Expected  Diff/recursiveSizeNF
   1:            1,416       18        696         168          -
  30:            8,008      931     20,880       5,761        28%
 600:          135,688   18,601    417,600     115,201        15%
5000:        1,121,288  155,001  3,480,000     960,001        14%

As you can see, the results are more than surprising (to me), with recursiveSizeNF coming closest. They all measure the same variable.


What am I missing?


For completeness, here are relevant parts of code for creating elements (with excessive forcing):

  let mkX i = force X{ a = XUid $ Uid i
                     , b = YUid $ Uid i
                     , c = Id $ tshow i
                     , d = if even i then (Left Yes) else (Right No)
                     , e = T.reverse (tshow i)
                     }
      xs30  = force . map mkX $ take   30 $ randomRs (1000,1030) (mkStdGen 0)
      xs600 = force . map mkX $ take  600 $ randomRs (1000,1600) (mkStdGen 0)
      xs5K  = force . map mkX $ take 5000 $ randomRs (1000,5000) (mkStdGen 0)

  dataSize <- recursiveSizeNF $!! {a}
  let gSize = gsize $!! mkX 0



-------- Original Message --------
Subject: Measuring memory usage
From: Vlatko Basic [hidden email]
To: haskell-cafe [hidden email]
Date: 29/06/18 14:31


Hello,

I've come to some strange results using Weigh package.

It shows that HashMap inside 'data' is using much, much more memory.

The strange thing is that I'm seeing too large mem usage in my app as well (several "MapData" like in records), and trying to figure out with 'weigh' what's keeping the mem.

Noticed that when I change the code to use HashMap directly (not inside 'data', that's the only change), the mem usage observed with top drops down for ~60M, from 850M to 790M.


These are the test results for 10K, 5K and 3.3K items for "data MapData k v = MapData (HashMap k v)" (at the end is the full runnable example.)

Case           Allocated  GCs
HashMap          262,824    0
HashMap half      58,536    0
HashMap third     17,064    0
MapData        4,242,208    4

I tested by changing the order, disabling all but one etc., and the results were the same. Same 'weigh' behaviour with IntMap and Map.


So, if anyone knows and has some experience with such issues, my questions are:

1. Is 'weigh' package reliable/usable, at least to some extent? (the results do show diff between full, half and third)

2. How do you measure mem consumptions of your large data/records?

3. If the results are even approximately valid, what could cause such large discrepancies with 'data'?

4. Is there a way to see if some record has been freed from memory, GCed?



module Main where

import Prelude

import Control.DeepSeq     (NFData)
import Data.HashMap.Strict (HashMap, fromList)
import GHC.Generics        (Generic)
import Weigh               (mainWith, value)


data MapData k v = MapData (HashMap k v) deriving Generic
instance (NFData k, NFData v) => NFData (MapData k v)

full, half, third :: Int
full  = 10000
half  =  5000
third =  3333

main :: IO ()
main = mainWith $ do
  value "HashMap"       (          mkHMList full)
  value "HashMap half"  (          mkHMList half)
  value "HashMap third" (          mkHMList third)
  value "MapData"       (MapData $ mkHMList full)

mkHMList :: Int -> HashMap Int String
mkHMList n = fromList . zip [1..n] $ replicate n "some text"





_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Measuring memory usage

Li-yao Xia-2
Hi Vlatko,

I don't get the same numbers with ghc-datasize

recursiveSize $! xs1   -- 240 (list of length 1)
recursiveSize $! xs30  -- 6296

compiled on GHC 8.0.2 with optimizations (-O)

whereas recursiveSizeNF gives me the size of the thunk (force xs1),
which depends on whether xs1 is evaluated or not.

The size of X is missing:

- 1 word for the X constructor
- 5 words for the fields (they are not unpacked, this must be explicitly
required with the {-# UNPACK #-} pragma)
- 8B for one of the text fields (which for some reason takes 64B instead
of 56B)

That's 56B extra, for a total of 224B. Add in 4W for a singleton list
and that's 256B. Why are 16B missing in the above 240B figure? Note that
in mkX, there are two fields equal to (Uid i), so they get shared
(recursiveSize doesn't recount shared structures). (It's hardly an
optimization: if we take out the newtypes, the code looks like X {a = i,
b = i, ...}, so the compiler simply puts the same pointer in the two
fields.)

Notice also that nullary constructors (in particular Yes/No) will always
be shared. The compiler may also float out the whole "Left Yes" and
"Right No" to the toplevel, further reducing the size of longer lists.

gsize counts constructors. In particular, each newtype constructor
counts as one, and primitives like Int and Char also count as one (the
value they box is not visible to Data), and Text has a dummy Data
instance to make it seem like a newtype around [Char].

You can use this snippet to see a trace of the generic traversal:

     everywhereM (\x -> print (dataTypeOf x) >> return x) xs1

Finally, "weigh" counts allocations, which are generally a superset of
the actual space taken by a data structure once it is fully computed.

Li-yao


On 07/04/2018 07:12 AM, Vlatko Basic wrote:

> Hello again,
>
> I'm still trying to find some method to predict memory usage and came
> upon this page: https://wiki.haskell.org/GHC/Memory_Footprint
>
> I tried to manually calculate how much memory will the record below
> consume (64-bit, 1W = 8B):
>
> newtype Id   = Id    Text deriving (Generic, Data, NFData) -- = 6W
> newtype Uid  = Uid   Int  deriving (Generic, Data, NFData) -- = 2W
> newtype XUid = XUid  Uid  deriving (Generic, Data, NFData) -- = 2W
> newtype YUid = YUid  Uid  deriving (Generic, Data, NFData) -- = 2W
> data    Dir  = Yes | No   deriving (Generic, Data)         -- = 2W
>
> data X = X
>    { a :: XUid                -- =  2W       -- Int
>    , b :: YUid                -- =  2W       -- Int
>    , c :: Id                  -- =  6W + 8B  -- Text len 4
>    , d :: Either Dir Dir      -- =  1W + 2W  -- Either + Dir + No/Yes
>    , e :: Text                -- =  6W + 8B  -- Text len 4
>    } deriving (Generic, Data) -- = 19W + 16B = 152 + 16 = 168B
>
> and calculated the assumed sizes of few lists with different number of
> elements:
>
> Expected list sizes    ([v] = (1 + 3N) words + N * sizeof(v))
>    30: 1 + 3W *   30 + (19W *   30 + 16 *   30) =   5,761 B
>   600: 1 + 3W *  600 + (19W *  600 + 16 *  600) = 115,201 B
> 5000: 1 + 3W * 5000 + (19W * 5000 + 16 * 5000) = 960,001 B
>
> I also compared these sizes with three libs
> (Data.Generics.Schemes.gsize, GHC.DataSize.recursiveSizeNF, Weigh) and
> the results were:
>
> #items recursiveSizeNF gSize      Weigh    Expected  Diff/recursiveSizeNF
>     1:            1,416       18        696 168          -
>    30:            8,008      931     20,880 5,761        28%
>   600:          135,688   18,601    417,600 115,201        15%
> 5000:        1,121,288  155,001  3,480,000 960,001        14%
>
> As you can see, the results are more than surprising (to me), with
> recursiveSizeNF coming closest. They all measure the same variable.
>
>
> What am I missing?
>
>
> For completeness, here are relevant parts of code for creating elements
> (with excessive forcing):
>
>    let mkX i = force X{ a = XUid $ Uid i
>                       , b = YUid $ Uid i
>                       , c = Id $ tshow i
>                       , d = if even i then (Left Yes) else (Right No)
>                       , e = T.reverse (tshow i)
>                       }
>        xs30  = force . map mkX $ take   30 $ randomRs (1000,1030)
> (mkStdGen 0)
>        xs600 = force . map mkX $ take  600 $ randomRs (1000,1600)
> (mkStdGen 0)
>        xs5K  = force . map mkX $ take 5000 $ randomRs (1000,5000)
> (mkStdGen 0)
>
>    dataSize <- recursiveSizeNF $!! {a}
>    let gSize = gsize $!! mkX 0
>
>
>
>> -------- Original Message --------
>> Subject: Measuring memory usage
>> From: Vlatko Basic <[hidden email]>
>> To: haskell-cafe <[hidden email]>
>> Date: 29/06/18 14:31
>>
>>
>> Hello,
>>
>> I've come to some strange results using Weigh package.
>>
>> It shows that HashMap inside 'data' is using much, much more memory.
>>
>> The strange thing is that I'm seeing too large mem usage in my app as
>> well (several "MapData" like in records), and trying to figure out
>> with 'weigh' what's keeping the mem.
>>
>> Noticed that when I change the code to use HashMap directly (not
>> inside 'data', that's the only change), the mem usage observed with
>> top drops down for ~60M, from 850M to 790M.
>>
>>
>> These are the test results for 10K, 5K and 3.3K items for "data
>> MapData k v = MapData (HashMap k v)" (at the end is the full runnable
>> example.)
>>
>> Case           Allocated  GCs
>> HashMap          262,824    0
>> HashMap half      58,536    0
>> HashMap third     17,064    0
>> MapData        4,242,208    4
>>
>> I tested by changing the order, disabling all but one etc., and the
>> results were the same. Same 'weigh' behaviour with IntMap and Map.
>>
>>
>> So, if anyone knows and has some experience with such issues, my
>> questions are:
>>
>> 1. Is 'weigh' package reliable/usable, at least to some extent? (the
>> results do show diff between full, half and third)
>>
>> 2. How do you measure mem consumptions of your large data/records?
>>
>> 3. If the results are even approximately valid, what could cause such
>> large discrepancies with 'data'?
>>
>> 4. Is there a way to see if some record has been freed from memory, GCed?
>>
>>
>>
>> module Main where
>>
>> import Prelude
>>
>> import Control.DeepSeq     (NFData)
>> import Data.HashMap.Strict (HashMap, fromList)
>> import GHC.Generics        (Generic)
>> import Weigh               (mainWith, value)
>>
>>
>> data MapData k v = MapData (HashMap k v) deriving Generic
>> instance (NFData k, NFData v) => NFData (MapData k v)
>>
>> full, half, third :: Int
>> full  = 10000
>> half  =  5000
>> third =  3333
>>
>> main :: IO ()
>> main = mainWith $ do
>>   value "HashMap"       (          mkHMList full)
>>   value "HashMap half"  (          mkHMList half)
>>   value "HashMap third" (          mkHMList third)
>>   value "MapData"       (MapData $ mkHMList full)
>>
>> mkHMList :: Int -> HashMap Int String
>> mkHMList n = fromList . zip [1..n] $ replicate n "some text"
>>
>>
>>
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
>
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Measuring memory usage

Vlatko Basic
Hi Li-yao,

Thanks for taking the time to test and explain in so much details. It is much
clearer to me now. :-)

I changed 'b' to "b = YUid $ Uid (i * 2)" so they are different now.

Thanks to your correction, I added 6W to sizeOf X and calculated that its mem
usage is now 216, but I do not understand which text field are you talking
about. I added 8 (i.e. 2N) to both Text fields (c and e). I checked xs30 and
both text fields are 4 chars long.

data X = X                     -- = 1W + 5W
   { a :: XUid                  -- = 2W       -- Int
   , b :: YUid                  -- = 2W       -- Int
   , c :: Id                    -- = 6W + 8B  -- Text len 4
   , d :: Either Dir Dir        -- = W  + 2W
   , e :: Text                  -- = 6W + 8B  -- Text len 4
   } deriving (Generic, D.Data) -- = 25W + 16B = 200 + 16 = 216

so xs1 would be 216 + 4W = 248

The diff is now minimal, except for xs1.

What        ds    Expected   Diff
xs1:       272         248   0.88%    -- overhead 24B - 3W
xs30:    7,232       7,208   0.04%    -- overhead 24B - 3W
xs600: 144,032     144,008   0.00%    -- overhead 24B - 3W


Seems like a constant induced by measurement. For recursiveSizeNF it is also
constant, 64B - 8W.

I'm very satisfied with the proof that recursiveSize(NF) shows the correct size,
less 24 (or 64).

I'm on GHC 8.2.2 and saw no diff with different levels of optimization, or
without any.


Have a nice day,
vlatko



-------- Original Message  --------
Subject: Re: [Haskell-cafe] Measuring memory usage
From: Li-yao Xia <[hidden email]>
To: [hidden email]
Cc: haskell-cafe <[hidden email]>
Date: 04/07/18 15:12

> Hi Vlatko,
>
> I don't get the same numbers with ghc-datasize
>
> recursiveSize $! xs1   -- 240 (list of length 1)
> recursiveSize $! xs30  -- 6296
>
> compiled on GHC 8.0.2 with optimizations (-O)
>
> whereas recursiveSizeNF gives me the size of the thunk (force xs1), which
> depends on whether xs1 is evaluated or not.
>
> The size of X is missing:
>
> - 1 word for the X constructor
> - 5 words for the fields (they are not unpacked, this must be explicitly
> required with the {-# UNPACK #-} pragma)
> - 8B for one of the text fields (which for some reason takes 64B instead of 56B)
>
> That's 56B extra, for a total of 224B. Add in 4W for a singleton list and that's
> 256B. Why are 16B missing in the above 240B figure? Note that in mkX, there are
> two fields equal to (Uid i), so they get shared (recursiveSize doesn't recount
> shared structures). (It's hardly an optimization: if we take out the newtypes,
> the code looks like X {a = i, b = i, ...}, so the compiler simply puts the same
> pointer in the two fields.)
>
> Notice also that nullary constructors (in particular Yes/No) will always be
> shared. The compiler may also float out the whole "Left Yes" and "Right No" to
> the toplevel, further reducing the size of longer lists.
>
> gsize counts constructors. In particular, each newtype constructor counts as
> one, and primitives like Int and Char also count as one (the value they box is
> not visible to Data), and Text has a dummy Data instance to make it seem like a
> newtype around [Char].
>
> You can use this snippet to see a trace of the generic traversal:
>
>      everywhereM (\x -> print (dataTypeOf x) >> return x) xs1
>
> Finally, "weigh" counts allocations, which are generally a superset of the
> actual space taken by a data structure once it is fully computed.
>
> Li-yao
>
>
> On 07/04/2018 07:12 AM, Vlatko Basic wrote:
>> Hello again,
>>
>> I'm still trying to find some method to predict memory usage and came upon
>> this page: https://wiki.haskell.org/GHC/Memory_Footprint
>>
>> I tried to manually calculate how much memory will the record below consume
>> (64-bit, 1W = 8B):
>>
>> newtype Id   = Id    Text deriving (Generic, Data, NFData) -- = 6W
>> newtype Uid  = Uid   Int  deriving (Generic, Data, NFData) -- = 2W
>> newtype XUid = XUid  Uid  deriving (Generic, Data, NFData) -- = 2W
>> newtype YUid = YUid  Uid  deriving (Generic, Data, NFData) -- = 2W
>> data    Dir  = Yes | No   deriving (Generic, Data)         -- = 2W
>>
>> data X = X
>>    { a :: XUid                -- =  2W       -- Int
>>    , b :: YUid                -- =  2W       -- Int
>>    , c :: Id                  -- =  6W + 8B  -- Text len 4
>>    , d :: Either Dir Dir      -- =  1W + 2W  -- Either + Dir + No/Yes
>>    , e :: Text                -- =  6W + 8B  -- Text len 4
>>    } deriving (Generic, Data) -- = 19W + 16B = 152 + 16 = 168B
>>
>> and calculated the assumed sizes of few lists with different number of elements:
>>
>> Expected list sizes    ([v] = (1 + 3N) words + N * sizeof(v))
>>    30: 1 + 3W *   30 + (19W *   30 + 16 *   30) =   5,761 B
>>   600: 1 + 3W *  600 + (19W *  600 + 16 *  600) = 115,201 B
>> 5000: 1 + 3W * 5000 + (19W * 5000 + 16 * 5000) = 960,001 B
>>
>> I also compared these sizes with three libs (Data.Generics.Schemes.gsize,
>> GHC.DataSize.recursiveSizeNF, Weigh) and the results were:
>>
>> #items recursiveSizeNF gSize      Weigh    Expected  Diff/recursiveSizeNF
>>     1:            1,416       18        696 168          -
>>    30:            8,008      931     20,880 5,761        28%
>>   600:          135,688   18,601    417,600 115,201        15%
>> 5000:        1,121,288  155,001  3,480,000 960,001        14%
>>
>> As you can see, the results are more than surprising (to me), with
>> recursiveSizeNF coming closest. They all measure the same variable.
>>
>>
>> What am I missing?
>>
>>
>> For completeness, here are relevant parts of code for creating elements (with
>> excessive forcing):
>>
>>    let mkX i = force X{ a = XUid $ Uid i
>>                       , b = YUid $ Uid i
>>                       , c = Id $ tshow i
>>                       , d = if even i then (Left Yes) else (Right No)
>>                       , e = T.reverse (tshow i)
>>                       }
>>        xs30  = force . map mkX $ take   30 $ randomRs (1000,1030) (mkStdGen 0)
>>        xs600 = force . map mkX $ take  600 $ randomRs (1000,1600) (mkStdGen 0)
>>        xs5K  = force . map mkX $ take 5000 $ randomRs (1000,5000) (mkStdGen 0)
>>
>>    dataSize <- recursiveSizeNF $!! {a}
>>    let gSize = gsize $!! mkX 0
>>
>>
>>
>>> -------- Original Message --------
>>> Subject: Measuring memory usage
>>> From: Vlatko Basic <[hidden email]>
>>> To: haskell-cafe <[hidden email]>
>>> Date: 29/06/18 14:31
>>>
>>>
>>> Hello,
>>>
>>> I've come to some strange results using Weigh package.
>>>
>>> It shows that HashMap inside 'data' is using much, much more memory.
>>>
>>> The strange thing is that I'm seeing too large mem usage in my app as well
>>> (several "MapData" like in records), and trying to figure out with 'weigh'
>>> what's keeping the mem.
>>>
>>> Noticed that when I change the code to use HashMap directly (not inside
>>> 'data', that's the only change), the mem usage observed with top drops down
>>> for ~60M, from 850M to 790M.
>>>
>>>
>>> These are the test results for 10K, 5K and 3.3K items for "data MapData k v =
>>> MapData (HashMap k v)" (at the end is the full runnable example.)
>>>
>>> Case           Allocated  GCs
>>> HashMap          262,824    0
>>> HashMap half      58,536    0
>>> HashMap third     17,064    0
>>> MapData        4,242,208    4
>>>
>>> I tested by changing the order, disabling all but one etc., and the results
>>> were the same. Same 'weigh' behaviour with IntMap and Map.
>>>
>>>
>>> So, if anyone knows and has some experience with such issues, my questions are:
>>>
>>> 1. Is 'weigh' package reliable/usable, at least to some extent? (the results
>>> do show diff between full, half and third)
>>>
>>> 2. How do you measure mem consumptions of your large data/records?
>>>
>>> 3. If the results are even approximately valid, what could cause such large
>>> discrepancies with 'data'?
>>>
>>> 4. Is there a way to see if some record has been freed from memory, GCed?
>>>
>>>
>>>
>>> module Main where
>>>
>>> import Prelude
>>>
>>> import Control.DeepSeq     (NFData)
>>> import Data.HashMap.Strict (HashMap, fromList)
>>> import GHC.Generics        (Generic)
>>> import Weigh               (mainWith, value)
>>>
>>>
>>> data MapData k v = MapData (HashMap k v) deriving Generic
>>> instance (NFData k, NFData v) => NFData (MapData k v)
>>>
>>> full, half, third :: Int
>>> full  = 10000
>>> half  =  5000
>>> third =  3333
>>>
>>> main :: IO ()
>>> main = mainWith $ do
>>>   value "HashMap"       (          mkHMList full)
>>>   value "HashMap half"  (          mkHMList half)
>>>   value "HashMap third" (          mkHMList third)
>>>   value "MapData"       (MapData $ mkHMList full)
>>>
>>> mkHMList :: Int -> HashMap Int String
>>> mkHMList n = fromList . zip [1..n] $ replicate n "some text"
>>>
>>>
>>>
>>
>>
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.
>>
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.