Removing latency spikes. Garbage collector related?


Removing latency spikes. Garbage collector related?

Will Sewell
Hi, I was told in the #haskell IRC channel that this would be a good
place to ask this question, so here goes!

We’re writing a low-latency messaging system. The problem is we are
getting a lot of latency spikes. See this image:
http://i.imgur.com/GZ0Ek98.png (yellow to red is the 90th percentile),
which shows end-to-end latency of messages through the system.

I have tried to eliminate the problem by removing parts of the system
that I suspected to be expensive, but the spikes are still there.

I’m now thinking that it’s the GC. As you can see in this output from
ghc-events-analyze, work on the GC thread (red) seems to be blocking
work on the main program thread (green) http://i.imgur.com/4YO5q4U.png
(x axis is time, darkness of buckets is % CPU time).

*Note: the graphs are not of the same run, but are typical*

Do you think the GC is the most likely culprit?
Is there anything I can do to confirm this hypothesis? (I looked into
turning off the GC, but this seems tricky)
If it is the GC, then is there anything that can be done about it?

Thanks!
Will
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users

Re: Removing latency spikes. Garbage collector related?

Gregory Collins-3

On Mon, Sep 28, 2015 at 9:08 AM, Will Sewell <[hidden email]> wrote:
If it is the GC, then is there anything that can be done about it?
  • Increase value of -A (the default is too small) -- best value for this is L3 cache size of the chip
  • Increase value of -H (total heap size) -- this will use more ram but you'll run GC less often
  • This will sound flip, but: generate less garbage. Frequency of GC runs is proportional to the amount of garbage being produced, so if you can lower mutator allocation rate then you will also increase net productivity. Built-up thunks can transparently hide a lot of allocation so fire up the profiler and tighten those up (there's an 80-20 rule here). Reuse output buffers if you aren't already, etc.
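[Editor's note: a hypothetical invocation combining these flags; the binary name and sizes are placeholders, and -A should be tuned to the actual L3 cache size of your chip.]

```shell
# Build with RTS options and eventlog support enabled.
ghc -O2 -threaded -rtsopts -eventlog Main.hs -o server

# Hypothetical run: -A sized near L3 cache, -s to print GC statistics on exit.
./server +RTS -N -A8m -s -RTS
```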
G

--
Gregory Collins <[hidden email]>


Re: Removing latency spikes. Garbage collector related?

John Lato-2
Try Greg's recommendations first.  If you still need to do more investigation, I'd recommend that you look at some samples with either threadscope or dumping the eventlog to text.  I really like ghc-events-analyze, but it doesn't provide quite the same level of detail.  You may also want to dump some of your metrics into the eventlog, because then you'll be able to see exactly how high latency episodes line up with GC pauses.
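[Editor's note: ghc-events-analyze picks up user events of the form "START <label>" / "STOP <label>"; the same events are visible in the raw eventlog. A sketch of bracketing each message with such events (withMessageEvents and the label format are illustrative, not from the project):]

```haskell
import Debug.Trace (traceEventIO)

-- Hypothetical wrapper: emit user events around each message so that
-- per-message latency can be lined up against GC events in the eventlog.
withMessageEvents :: String -> IO a -> IO a
withMessageEvents msgId action = do
  traceEventIO ("START msg " ++ msgId)
  r <- action
  traceEventIO ("STOP msg " ++ msgId)
  return r
```

The events only reach the log when the program is built with -eventlog and run with +RTS -l.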


Fwd: Removing latency spikes. Garbage collector related?

Will Sewell
In reply to this post by Gregory Collins-3
Thanks for the reply Greg. I have already tried tweaking these values
a bit, and this is what I found:

* I first tried -A256k because the L2 cache is that size (Simon Marlow
mentioned this can lead to good performance
http://stackoverflow.com/a/3172704/1018290)
* I then tried a value of -A2048k because he also said "using a very
large young generation size might outweigh the cache benefits". I
don't exactly know what he meant by "a very large young generation
size", so I guessed at this value. Is it in the right ballpark?
* With -H, I tried values of -H8m, -H32m, -H128m, -H512m, -H1024m

But all led to worse performance compared with the defaults (and -H
didn't really have much effect at all).

I will try your suggestion of setting -A to the L3 cache size.

Are there any other values I should try setting these at?

As for your final point, I have run space profiling, and it looks like
>90% of the memory is used for our message index, which is a temporary
store of messages that have gone through the system. These messages
are stored in aligned chunks in memory that are merged together. I
initially thought this was causing the spikes, but they were still
there even after I removed the component. I will try and run space
profiling in the build with the message index.

Thanks again.


Re: Removing latency spikes. Garbage collector related?

Will Sewell
In reply to this post by John Lato-2
Thanks for the reply John. I will have a go at doing that. What do you
mean exactly by dumping metrics? Do you mean measuring the latency
within the program, and dumping it if it exceeds a certain threshold?

And from the answers I'm assuming you believe it is the GC that is
most likely causing these spikes. I've never profiled Haskell code, so
I'm not used to seeing what the effects of the GC actually are.


Re: Removing latency spikes. Garbage collector related?

Neil Davies
In reply to this post by Will Sewell
Will

is your issue with the spikes in response time, rather than the mean values?

If so, once you’ve reduced the amount of unnecessary mutation, you might want
to take more control over when the GC takes place: disable the idle-GC
timer (-I0) and force GC to occur at points you select - we found this useful.
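[Editor's note: a minimal sketch of that pattern, assuming a batch-style processing loop; runBatches and processBatch are illustrative names, not from the project. Run with +RTS -I0 so the idle-GC timer does not fire on its own.]

```haskell
import Control.Monad (forM_)
import System.Mem (performGC)

-- Sketch: with the idle-GC timer disabled (+RTS -I0), force a major
-- collection at a point we choose - e.g. after each batch of messages
-- has been flushed - rather than in the middle of latency-sensitive work.
runBatches :: [batch] -> (batch -> IO ()) -> IO ()
runBatches batches processBatch =
  forM_ batches $ \b -> do
    processBatch b
    performGC  -- collect now, while no messages are in flight
```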

Lastly, is the arrival pattern (and distribution pattern) of messages constant or
variable? Just making sure that you are not trying to fight basic queueing theory here.


Neil


Re: Removing latency spikes. Garbage collector related?

Will Sewell
Thank you for the reply Neil.

The spikes are in response time. The graph I linked to shows the
distribution of response times in a given window of time (the darkness
of a square is the number of messages in a particular response-time
bucket). So the spikes are in the mean and also the max response time.
Having said that, I'm not exactly sure what you mean by "mean values".

I will have a look into -I0.

Yes, the arrival rate of messages is constant. This graph shows the
number of messages that have been published to the system:
http://i.imgur.com/ADzMPIp.png


Re: Removing latency spikes. Garbage collector related?

Neil Davies
Will

I was trying to get a feeling for what those coloured squares actually
denoted - typically we examine this sort of performance information
as CDFs (cumulative distribution functions [1]), trying to pull apart
the issues that are “mean”-affecting (i.e. the typical path through the
code/system) and those that are “tail”-affecting (i.e. exceptions - and
a GC run can be seen as an “exception”, one that you can manage and
shift in time).

I’m assuming that messages have a similar “cost” (i.e. similar work
to complete) - so that a uniform arrival rate equates to a uniform
rate of arriving work.

Neil
[1] We plot the CDFs in two ways: the “usual” way for the major part
of the probability mass, and then as (1 - CDF) on a log-log scale to
expose the tail behaviour.
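[Editor's note: as an illustration (not Neil's code), the empirical CDF and the survival function he describes can be computed directly from raw latency samples:]

```haskell
import Data.List (sort)

-- Empirical CDF of latency samples: for each sample x, the fraction
-- of samples <= x.
ecdf :: [Double] -> [(Double, Double)]
ecdf xs = [ (x, fromIntegral i / n) | (i, x) <- zip [1 :: Int ..] (sort xs) ]
  where n = fromIntegral (length xs)

-- Survival function (1 - CDF); plotted on a log-log scale this
-- exposes the tail behaviour Neil mentions.
survival :: [Double] -> [(Double, Double)]
survival xs = [ (x, 1 - p) | (x, p) <- ecdf xs ]
```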


Re: Removing latency spikes. Garbage collector related?

Will Sewell
That's interesting. I have not done this kind of work before, and had
not come across CDFs. I can see why it makes sense to look at the mean
and the tail.

Your assumption is correct. The messages have a similar cost, which is
why the graph I posted is relatively flat most of the time. The spikes
suggest to me that it is a tail-affecting issue, because the messages
follow the same code path as when the system is running normally.


Re: Removing latency spikes. Garbage collector related?

Gregory Collins-3
In reply to this post by Will Sewell

On Tue, Sep 29, 2015 at 2:03 AM, Will Sewell <[hidden email]> wrote:
* I then tried a value of -A2048k because he also said "using a very
large young generation size might outweigh the cache benefits". I
don't exactly know what he meant by "a very large young generation
size", so I guessed at this value. Is it in the right ballpark?

I usually use 2-8M for this value, depending on the chip. Most values in the young generation are going to be garbage, and collection is O(num_live_objects), so as long as you can keep this buffer and your working set (i.e. the long-lived stuff that doesn't get GC'ed) in L3 cache, higher values are better. I expect there is another such phase transition as you set -A around the L2 cache size, but everything depends on what your program is actually doing. Keeping a smaller young generation will mean that those cache lines are hotter than they would be if you set it larger, and that means increasing L2 cache pressure and potentially evicting working set, so maybe you make average GC pause time faster (helping with tail latency) at the expense of doing GC more often and maybe reducing the amount of L2 cache available.

* With -H, I tried values of -H8m, -H32m, -H128m, -H512m, -H1024m

But all lead to worse performance over the defaults (and -H didn't
really have much affect at all).

What you should expect to see as you increase -H is that major GC pauses become less frequent, but average GC times go up. Dumping +RTS -S output for us would also help us understand your GC behaviour, since I wouldn't expect to see 1s pauses on any but the largest heaps. Are you using large MutableArrays?
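[Editor's note on the MutableArrays point: large *boxed* mutable arrays hold pointers that the GC may have to scan, while unboxed arrays hold raw values the GC never traverses, which can matter for pause times. A hedged sketch using the array package that ships with GHC:]

```haskell
import Data.Array.IO (IOArray, IOUArray, newArray, readArray, writeArray)

-- A boxed mutable array: every slot is a pointer the GC may scan.
mkBoxed :: Int -> IO (IOArray Int Int)
mkBoxed n = newArray (0, n - 1) 0

-- An unboxed mutable array: raw Ints, invisible to the GC's scavenger.
mkUnboxed :: Int -> IO (IOUArray Int Int)
mkUnboxed n = newArray (0, n - 1) 0
```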

--
Gregory Collins <[hidden email]>


Re: Removing latency spikes. Garbage collector related?

John Lato-2
In reply to this post by Will Sewell

By dumping metrics, I mean essentially the same as the ghc-events-analyze annotations, but with any additional information that is useful for the investigation. In particular, if you have a message id, include it. You may also want to annotate thread names with GHC.Conc.labelThread, and to add more annotations to drill down if you uncover a problem area.

If I were investigating, I would take e.g. the five largest outliers, then look in the (text) eventlog for those message ids, and see what happened between the start and stop.  You'll likely want to track the thread states (which is why I suggested you annotate the thread names).
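[Editor's note: labelling a thread is a one-liner at fork time; a sketch of a hypothetical forkLabeled helper (the name is illustrative):]

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import GHC.Conc (labelThread)

-- Hypothetical helper: forked threads show up in threadscope and the
-- eventlog under a readable name instead of an anonymous thread id.
forkLabeled :: String -> IO () -> IO ThreadId
forkLabeled name act = forkIO $ do
  tid <- myThreadId
  labelThread tid name
  act
```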

I'm not convinced it's entirely the GC: the latencies are larger than I would expect from a GC pause (although lots of factors can affect that). I suspect that either you have something causing abnormal GC spikes, or there's a different cause.


On 04:15, Tue, Sep 29, 2015 Will Sewell <[hidden email]> wrote:
Thanks for the reply John. I will have a go at doing that. What do you
mean exactly by dumping metrics, do you mean measuring the latency
within the program, and dumping it if it exceeds a certain threshold?

And from the answers I'm assuming you believe it is the GC that is
most likely causing these spikes. I've never profiled Haskell code, so
I'm not used to seeing what the effects of the GC actually are.

On 28 September 2015 at 19:31, John Lato <[hidden email]> wrote:
> Try Greg's recommendations first.  If you still need to do more
> investigation, I'd recommend that you look at some samples with either
> threadscope or dumping the eventlog to text.  I really like
> ghc-events-analyze, but it doesn't provide quite the same level of detail.
> You may also want to dump some of your metrics into the eventlog, because
> then you'll be able to see exactly how high latency episodes line up with GC
> pauses.

_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users
Re: Removing latency spikes. Garbage collector related?

Will Sewell
Thank you both for the suggestions.

After some more profiling, I've come to realise that most of the
garbage is created by allocations when serialising messages to write
to the socket. I am going to try to reduce this next. If that does
not help reduce the latency spikes, I will work through each of your
suggestions.
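One common way to cut serialisation garbage is to build output with `Data.ByteString.Builder`, which fills output chunks incrementally instead of allocating an intermediate `ByteString` per fragment. A sketch with a made-up message format (the `msgId:payload` layout is purely illustrative, not the system's actual wire format):

```haskell
-- Build the serialised message as a Builder; toLazyByteString (or
-- hPutBuilder, which writes straight into a handle's buffer) then
-- renders it without per-fragment intermediate ByteStrings.
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Monoid ((<>))

encodeMsg :: Int -> L.ByteString -> B.Builder
encodeMsg msgId payload =
     B.intDec msgId        -- id rendered directly into the buffer
  <> B.char8 ':'
  <> B.lazyByteString payload

main :: IO ()
main = L.putStr (B.toLazyByteString (encodeMsg 7 (L.pack "hello\n")))
```

When writing to a socket handle, `B.hPutBuilder` avoids even the lazy `ByteString` step by filling the handle's own buffer.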

On 29 September 2015 at 16:47, John Lato <[hidden email]> wrote:

> By dumping metrics, I mean essentially the same as the ghc-events-analyze
> annotations but with any more information that is useful for the
> investigation.  In particular,  if you have a message id, include that. You
> may also want to annotate thread names with GHC.Conc.labelThread. You may
> also want to add more annotations to drill down if you uncover a problem
> area.
>
> If I were investigating, I would take e.g. the five largest outliers, then
> look in the (text) eventlog for those message ids, and see what happened
> between the start and stop.  You'll likely want to track the thread states
> (which is why I suggested you annotate the thread names).
>
> I'm not convinced it's entirely the GC, the latencies are larger than I
> would expect from a GC pause (although lots of factors can affect that). I
> suspect that either you have something causing abnormal GC spikes, or
> there's a different cause.
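Annotating thread names as suggested above is a one-liner with `GHC.Conc.labelThread`. A minimal sketch (the thread name "publisher" is a hypothetical example, not from the original system):

```haskell
-- Label a forked worker so its thread shows up by name in ThreadScope
-- and in the text eventlog, instead of as an anonymous thread id.
import Control.Concurrent (forkIO, myThreadId, threadDelay)
import GHC.Conc (labelThread)

main :: IO ()
main = do
  _ <- forkIO $ do
         tid <- myThreadId
         labelThread tid "publisher"  -- name recorded in the eventlog
         putStrLn "publisher running"
  threadDelay 100000  -- crude: give the worker time to finish
```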
>
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users