GHC Threads affinity


GHC Threads affinity

Michael Baikov
Greetings


Currently GHC supports two kinds of threads: those pinned to a specific capability (bound threads) and those it can migrate between any capabilities (unbound threads). For the purpose of achieving lower latency in Haskell applications it would be nice to have something in between: threads GHC can migrate, but only within a certain subset of capabilities.

I'm developing a program that contains several kinds of threads: those that do little work but are sensitive to latency, and those that can use more CPU time and are less latency sensitive. I looked into several cases of increased latency in the sensitive threads (using the GHC eventlog), and in all of them the sensitive threads were waiting for non-sensitive threads to finish working. I was able to reduce worst-case latency by a factor of 10 by pinning every thread in the program to a specific capability, but manually distributing threads (60+ of them) across capabilities (on several different machines with different numbers of cores) seems very fragile. Stop-the-world GC is still a problem, but at least in my case it occurs much less frequently.

It would be nice to be able to allow GHC runtime to migrate a thread between a subset of capabilities using interface similar to this one:

-- Creates a thread that GHC is allowed to migrate between capabilities
-- according to the following rule: the thread may run on the Nth
-- capability iff bit (N `mod` wordSize) of the mask is set.
forkOn' :: Int -> IO () -> IO ThreadId
forkOn' mask act = undefined

This would allow up to 64 (or 32 on 32-bit platforms) distinct groups, and would let users break their threads down into a larger number of potentially intersecting groups by specifying things like: capability 0 does latency-sensitive work, caps 1..5 less sensitive work, caps 6..7 bulk work.
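To make the grouping concrete, the masks could be built with ordinary bit operations (a sketch only: `forkOn'` is the proposed, non-existent function; `Data.Bits` is real):

```haskell
import Data.Bits (bit, testBit, (.|.))

-- Hypothetical affinity masks for the groups described above:
latencyMask, normalMask, bulkMask :: Int
latencyMask = bit 0                          -- capability 0 only
normalMask  = foldr ((.|.) . bit) 0 [1..5]   -- capabilities 1..5
bulkMask    = bit 6 .|. bit 7                -- capabilities 6..7

main :: IO ()
main = print [n | n <- [0..7], testBit normalMask n]
```

Masks for different groups may overlap, which is the point: a thread forked with `latencyMask .|. normalMask` could run on capabilities 0..5.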

Is there anything obvious I'm missing? Any recommendations on how to implement this?

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: GHC Threads affinity

Simon Marlow
On 10 September 2017 at 04:03, Michael Baikov <[hidden email]> wrote:
> Greetings
>
> Currently GHC supports two kinds of threads - pinned to a specific capability (bound threads) and those it can migrate between any capabilities (unbound threads). For purposes of achieving lower latency in Haskell applications it would be nice to have something in between - threads GHC can migrate but within a certain subset of capabilities only.

That's not correct actually: a bound thread is associated with a particular OS thread, but it can migrate between capabilities just like unbound threads.
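This can be observed directly (a minimal sketch; requires building with -threaded):

```haskell
import Control.Concurrent

main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkOS $ do                  -- bound: runs on its own OS thread...
         b <- isCurrentThreadBound
         putMVar done b
  bound <- takeMVar done            -- ...yet the RTS may still run it on any capability
  print bound                       -- prints True
```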
 
> I'm developing a program that contains several kinds of threads - those that do little work and sensitive to latency and those that can spend more CPU time and less latency sensitive. I looked into several cases of increased latency in those sensitive threads (using GHC eventlog) and in all cases sensitive threads were waiting for non-sensitive threads to finish working. I was able to reduce worst case latency by factor of 10 by pinning all the threads in the program to specific capability but manually distributing threads (60+ of them) between capabilities (several different machines with different numbers of cores available) seems very fragile. World stopping GC is still a problem but at least in my case is much less frequently so.
 
If you have a fixed set of threads you might just want to use -N<threads> -qn<cores>, and then pin every thread to a different capability.  This gives you 1:1 scheduling at the GHC level, delegating the scheduling job to the OS.  You will also want to use nursery chunks with something like -n2m, so you don't waste too much nursery space on the idle capabilities.

Even if your set of threads isn't fixed you might be able to use a hybrid scheme with -N<large> -qn<cores> and pin the high-priority threads on their own capability, while putting all the low-priority threads on a single capability, or a few separate ones.
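For illustration, the hybrid scheme might look like this (capability numbers and workloads are placeholders; `forkOn` and `getNumCapabilities` are the real functions from Control.Concurrent):

```haskell
import Control.Concurrent

-- Run with e.g.: ./app +RTS -N16 -qn8 -n2m
--   -N16 : 16 capabilities (more than there are cores)
--   -qn8 : parallel GC uses only 8 threads
--   -n2m : 2 MB nursery chunks, so idle capabilities waste less nursery
main :: IO ()
main = do
  n <- getNumCapabilities
  _ <- forkOn 0 highPriority        -- latency-sensitive thread gets capability 0 to itself
  -- spread the low-priority threads over the remaining capabilities
  mapM_ (\i -> forkOn (1 + i `mod` max 1 (n - 1)) lowPriority) [0 .. 9 :: Int]
  threadDelay 100000
  where
    highPriority = return ()        -- placeholder workloads
    lowPriority  = return ()
```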

> It would be nice to be able to allow GHC runtime to migrate a thread between a subset of capabilities using interface similar to this one:
>
> -- creates a thread that is allowed to migrate between capabilities according to following rule: ghc is allowed to run this thread on Nth capability if Nth `mod` size_of_word bit in mask is set.
> forkOn' :: Int -> IO () -> IO ThreadId
> forkOn' mask act = undefined
>
> This should allow to define up to 64 (32) distinct groups and allow user to break down their threads into bigger number of potentially intersecting groups by specifying things like capability 0 does latency sensitive things, caps 1..5 - less  sensitive things, caps 6-7 bulk things.

We could do this, but it would add some complexity to the scheduler and load balancer (which has already been quite hard to get right; I fixed a handful of bugs there recently). I'd be happy to review a patch if you want to try it, though.

Cheers
Simon
 

> Anything obvious I'm missing? Any recommendations to how to implement this?

 


Re: GHC Threads affinity

Michael Baikov

>> I'm developing a program that contains several kinds of threads - those that do little work and sensitive to latency and those that can spend more CPU time and less latency sensitive. I looked into several cases of increased latency in those sensitive threads (using GHC eventlog) and in all cases sensitive threads were waiting for non-sensitive threads to finish working. I was able to reduce worst case latency by factor of 10 by pinning all the threads in the program to specific capability but manually distributing threads (60+ of them) between capabilities (several different machines with different numbers of cores available) seems very fragile. World stopping GC is still a problem but at least in my case is much less frequently so.
>  
> If you have a fixed set of threads you might just want to use -N<threads> -qn<cores>, and then pin every thread to a different capability.  This gives you 1:1 scheduling at the GHC level, delegating the scheduling job to the OS.  You will also want to use nursery chunks with something like -n2m, so you don't waste too much nursery space on the idle capabilities.
>
> Even if your set of threads isn't fixed you might be able to use a hybrid scheme with -N<large> -qn<cores> and pin the high-priority threads on their own capability, while putting all the low-priority threads on a single capability, or a few separate ones.

There are about 80 threads right now, and some of them are very short-lived. Most are low priority and require lots of CPU, which means I'd have to manually distribute them over several capabilities - a process I'd like to avoid.

>> It would be nice to be able to allow GHC runtime to migrate a thread between a subset of capabilities using interface similar to this one:
>>
>> -- creates a thread that is allowed to migrate between capabilities according to following rule: ghc is allowed to run this thread on Nth capability if Nth `mod` size_of_word bit in mask is set.
>> forkOn' :: Int -> IO () -> IO ThreadId
>> forkOn' mask act = undefined
>>
>> This should allow to define up to 64 (32) distinct groups and allow user to break down their threads into bigger number of potentially intersecting groups by specifying things like capability 0 does latency sensitive things, caps 1..5 - less  sensitive things, caps 6-7 bulk things.
>
>
> We could do this, but it would add some complexity to the scheduler and load balancer (which has already been quite hard to get right, I fixed a handful of bugs there recently). I'd be happy review a patch if you want to try it though.


I guess I'll start by studying the scheduler and load balancer in more detail. Thank you for your input, Simon!


Re: GHC Threads affinity

Takenobu Tani
Hi,

Here is a simple diagram of forkIO, forkOn and forkOS:

  https://takenobu-hs.github.io/downloads/haskell_ghc_illustrated.pdf#page=69

Regards,
Takenobu



Re: GHC Threads affinity

Niklas Hambüchen
In reply to this post by Michael Baikov
Hey Michael, greetings!

Here's a little side issue that may also be of interest to you in case
you've got HyperThreading on:

  https://ghc.haskell.org/trac/ghc/ticket/10229

Niklas

Re: GHC Threads affinity

Michael Baikov
Hi Niklas

This does indeed look interesting, and I think I've seen behavior similar to this one. At the moment I'm working through the ghc-events code to get a better understanding of what is going on in the thread scheduler, and to build a tool that can handle the event stream incrementally. Once I'm done with that, I'll see what can be done about that ticket.

On Sun, Oct 1, 2017 at 7:51 AM, Niklas Hambüchen <[hidden email]> wrote:
> Hey Michael, greetings!
>
> Here's a little side issue that may also be of interest to you in case
> you've got HyperThreading on:
>
>   https://ghc.haskell.org/trac/ghc/ticket/10229
>
> Niklas

Re: GHC Threads affinity

Boespflug, Mathieu
Note that the (AFAIK unreleased) version of ghc-events on the master
branch of the upstream repo can parse event streams incrementally, if
that's what you meant.
--
Mathieu Boespflug
Founder at http://tweag.io.



Re: GHC Threads affinity

Michael Baikov
On Sun, Oct 1, 2017 at 8:09 PM, Boespflug, Mathieu <[hidden email]> wrote:
> Note that the (AFAIK unreleased) version of ghc-events on the master
> branch of the upstream repo can parse event streams incrementally, if
> that's what you meant.

It can, but it has some problems. For one, the only thing the incremental parser can do is print the output - everything else still uses the old parser - and the output of the incremental parser is partially out of order due to the way event blocks are stored. Anyway, I already have my own version that does proper incremental parsing, provides an interface to a streaming library, and collects some info that wasn't available in the original version. Now it's mostly a matter of shuffling things around, cleaning up and testing.

Re: GHC Threads affinity

Boespflug, Mathieu
You might want to have a look at
https://github.com/mboes/ghc-events/tree/streaming. Similar to what
you mention, it uses the "streaming" package for the incremental
parsing. I ran into an issue with binary that I wasn't able to track
down: even medium-sized buffers make the parsing slower (significantly
so) rather than faster (my suspicion: something somewhere that should
be constant time is actually linear).

Re: GHC Threads affinity

Michael Baikov
Hmmm... I'll take a look, but from what I see it uses the same code as ghc-events for decoding, and all the streaming was done in a single short commit - so it must suffer from the same bug. It is not a single stream of events; it's several of them, one per capability, interleaved due to the caching done by the RTS, so you need to decode several streams at once and merge the results.
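The merge Michael describes could be sketched as an ordinary k-way merge of timestamp-sorted streams (illustrative only; not the actual ghc-events API):

```haskell
-- Merge two streams that are each already sorted by timestamp.
mergeOn :: Ord k => (a -> k) -> [a] -> [a] -> [a]
mergeOn key = go
  where
    go [] ys = ys
    go xs [] = xs
    go (x:xs) (y:ys)
      | key x <= key y = x : go xs (y:ys)
      | otherwise      = y : go (x:xs) ys

-- Merge one sorted stream per capability into a single ordered stream.
mergeStreams :: Ord k => (a -> k) -> [[a]] -> [a]
mergeStreams key = foldr (mergeOn key) []

main :: IO ()
main = print (mergeStreams fst [[(1,"a"),(4,"b")], [(2,"c")], [(3,"d")]])
```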


Re: GHC Threads affinity

Ben Gamari
In reply to this post by Boespflug, Mathieu
"Boespflug, Mathieu" <[hidden email]> writes:

> You might want to have a look at
> https://github.com/mboes/ghc-events/tree/streaming. Similar to what
> you mention, it uses the "streaming" package for the incremental
> parsing. I ran into an issue with binary that I wasn't able to track
> down: even medium sized buffers make the parsing slower (significantly
> so) rather than faster (my suspicion: something somewhere that should
> be constant time is actually linear).

Indeed, there was a rather terrible bug leading to unexpected asymptotic
performance issues in `binary` versions prior to 0.8.4, IIRC.
See https://github.com/kolmodin/binary/pull/115.
Perhaps this is what you are hitting?

Cheers,

- Ben
