GHC Threads affinity


Michael Baikov
Greetings


Currently GHC supports two kinds of threads: those pinned to a specific capability (bound threads) and those that it can migrate between any capabilities (unbound threads). For the purpose of achieving lower latency in Haskell applications, it would be nice to have something in between: threads that GHC can migrate, but only within a certain subset of capabilities.

I'm developing a program that contains several kinds of threads: those that do little work and are sensitive to latency, and those that can spend more CPU time and are less latency-sensitive. I looked into several cases of increased latency in the sensitive threads (using the GHC eventlog), and in all cases the sensitive threads were waiting for non-sensitive threads to finish working. I was able to reduce worst-case latency by a factor of 10 by pinning all the threads in the program to specific capabilities, but manually distributing threads (60+ of them) between capabilities (on several different machines with different numbers of cores available) seems very fragile. Stop-the-world GC is still a problem, but at least in my case it happens much less frequently.
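A minimal sketch of the manual pinning workaround described above, assuming threads are pinned round-robin by their index (the helpers `staticPlan` and `forkPinned` are hypothetical, not taken from the original program). It only uses `forkOn` and `getNumCapabilities` from `Control.Concurrent`. The two printed plans illustrate the fragility: the same six threads are grouped differently on a 4-core and an 8-core machine.

```haskell
import Control.Concurrent
import Control.Monad (forM, forM_)

-- Hypothetical static plan: thread i is pinned to capability (i `mod` numCaps).
staticPlan :: Int -> Int -> [Int]
staticPlan numCaps numThreads = [i `mod` numCaps | i <- [0 .. numThreads - 1]]

-- Pin every action to the capability chosen by the static plan.
-- forkOn locks the thread there, so the RTS never migrates it.
forkPinned :: [IO ()] -> IO [ThreadId]
forkPinned acts = do
  n <- getNumCapabilities
  forM (zip (staticPlan n (length acts)) acts) (uncurry forkOn)

main :: IO ()
main = do
  print (staticPlan 4 6)  -- [0,1,2,3,0,1]: threads 0/4 and 1/5 share a core
  print (staticPlan 8 6)  -- [0,1,2,3,4,5]: every thread gets its own core
  done <- newEmptyMVar
  _ <- forkPinned (replicate 6 (putMVar done ()))
  forM_ [1 .. 6 :: Int] (\_ -> takeMVar done)
```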

It would be nice to allow the GHC runtime to migrate a thread within a subset of capabilities, using an interface similar to this one:

import Control.Concurrent (ThreadId)

-- Creates a thread that GHC is allowed to migrate between capabilities according
-- to the following rule: GHC may run this thread on capability N if bit
-- (N `mod` size_of_word) of mask is set.
forkOn' :: Int -> IO () -> IO ThreadId
forkOn' mask act = undefined
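To make the proposed masking rule concrete, here is a small pure sketch of which capabilities a given mask would permit. `capAllowed` and `capsFromMask` are hypothetical names; nothing like them exists in `Control.Concurrent`. The sketch uses `finiteBitSize` from `Data.Bits` for `size_of_word`, so the wrap-around happens at 64 on 64-bit GHC and at 32 on 32-bit GHC.

```haskell
import Data.Bits (finiteBitSize, testBit)

-- Proposed rule: capability n is allowed iff bit (n `mod` size_of_word) of mask is set.
capAllowed :: Int -> Int -> Bool
capAllowed mask n = testBit mask (n `mod` finiteBitSize mask)

-- Every capability in [0 .. numCaps - 1] that the mask permits.
capsFromMask :: Int -> Int -> [Int]
capsFromMask numCaps mask = filter (capAllowed mask) [0 .. numCaps - 1]

main :: IO ()
main = do
  print (capsFromMask 8 0x3E)                      -- [1,2,3,4,5]: bits 1..5 set
  print (capAllowed 1 (finiteBitSize (0 :: Int)))  -- True: wraps around to bit 0
```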

This would allow defining up to 64 (or 32, depending on the word size) distinct groups, and let users break their threads down into a larger number of potentially intersecting groups, for example: capability 0 does latency-sensitive work, capabilities 1-5 do less sensitive work, and capabilities 6-7 do bulk work.
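The grouping example above could be encoded as masks roughly like this. It is a sketch only: `maskFromCaps` and `overlaps` are hypothetical helpers. Intersecting groups are simply masks that share bits, which `.&.` can detect.

```haskell
import Data.Bits (finiteBitSize, setBit, (.&.))

-- Build a mask with one bit per listed capability, wrapping at the word size
-- the same way the proposed forkOn' rule does.
maskFromCaps :: [Int] -> Int
maskFromCaps = foldl (\m c -> setBit m (c `mod` finiteBitSize m)) 0

latencyMask, normalMask, bulkMask :: Int
latencyMask = maskFromCaps [0]       -- capability 0: latency-sensitive work
normalMask  = maskFromCaps [1 .. 5]  -- capabilities 1-5: less sensitive work
bulkMask    = maskFromCaps [6, 7]    -- capabilities 6-7: bulk work

-- Two groups intersect when they share at least one capability bit.
overlaps :: Int -> Int -> Bool
overlaps a b = a .&. b /= 0

main :: IO ()
main = do
  print (latencyMask, normalMask, bulkMask)         -- (1,62,192)
  print (overlaps normalMask (maskFromCaps [5 .. 7])) -- True: both include cap 5
```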

Anything obvious I'm missing? Any recommendations on how to implement this?

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: GHC Threads affinity

Simon Marlow-7
On 10 September 2017 at 04:03, Michael Baikov <[hidden email]> wrote:
> Greetings
>
> Currently GHC supports two kinds of threads: those pinned to a specific capability (bound threads) and those that it can migrate between any capabilities (unbound threads). For the purpose of achieving lower latency in Haskell applications, it would be nice to have something in between: threads that GHC can migrate, but only within a certain subset of capabilities.

That's not correct, actually: a bound thread is associated with a particular OS thread, but it can migrate between capabilities just like unbound threads.
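One way to observe this distinction from Haskell is to ask each new thread whether it is bound (`isCurrentThreadBound`) and whether it is locked to its capability (the second component of `threadCapability`). This is a sketch: the helper `flagsOf` is hypothetical, and the program must be linked with `-threaded` for `forkOS` to work. With the threaded RTS, the expectation is that only the `forkOn` thread reports being locked, while the `forkOS` thread reports being bound but not locked.

```haskell
import Control.Concurrent

-- Run a probe in a freshly forked thread and return (bound, lockedToCapability).
flagsOf :: (IO () -> IO ThreadId) -> IO (Bool, Bool)
flagsOf fork = do
  out <- newEmptyMVar
  _ <- fork $ do
    bound <- isCurrentThreadBound
    (_, locked) <- threadCapability =<< myThreadId
    putMVar out (bound, locked)
  takeMVar out

main :: IO ()
main = do
  flagsOf forkIO >>= \fs -> putStrLn ("forkIO:   " ++ show fs)
  flagsOf (forkOn 0) >>= \fs -> putStrLn ("forkOn 0: " ++ show fs)
  -- forkOS fails on the non-threaded RTS, so check support first.
  if rtsSupportsBoundThreads
    then flagsOf forkOS >>= \fs -> putStrLn ("forkOS:   " ++ show fs)
    else putStrLn "forkOS:   requires linking with -threaded"
```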
 
> I'm developing a program that contains several kinds of threads: those that do little work and are sensitive to latency, and those that can spend more CPU time and are less latency-sensitive. I looked into several cases of increased latency in the sensitive threads (using the GHC eventlog), and in all cases the sensitive threads were waiting for non-sensitive threads to finish working. I was able to reduce worst-case latency by a factor of 10 by pinning all the threads in the program to specific capabilities, but manually distributing threads (60+ of them) between capabilities (on several different machines with different numbers of cores available) seems very fragile. Stop-the-world GC is still a problem, but at least in my case it happens much less frequently.
 
If you have a fixed set of threads, you might just want to use -N<threads> -qn<cores> and then pin every thread to a different capability. This gives you 1:1 scheduling at the GHC level, delegating the scheduling job to the OS. You will also want to use nursery chunks with something like -n2m, so you don't waste too much nursery space on the idle capabilities.
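A sketch of the 1:1 setup described above, assuming the program is built with `ghc -threaded -rtsopts` and run with something like `+RTS -N8 -qn4 -n2m` (the numbers are placeholders for the thread and core counts). Each capability gets exactly one thread via `forkOn`, and each thread checks that it is running, locked, on the capability it was given. `allPinned` is a hypothetical helper name.

```haskell
-- Build: ghc -O2 -threaded -rtsopts Main.hs
-- Run:   ./Main +RTS -N8 -qn4 -n2m -RTS
--   -N8   eight capabilities, one per Haskell thread
--   -qn4  four threads for parallel GC (roughly one per core)
--   -n2m  2MB nursery chunks, so idle capabilities waste less nursery
import Control.Concurrent
import Control.Monad (forM)

-- Fork one thread per capability and check that each runs, locked, where it was put.
allPinned :: IO Bool
allPinned = do
  n <- getNumCapabilities
  dones <- forM [0 .. n - 1] $ \cap -> do
    done <- newEmptyMVar
    _ <- forkOn cap $ do
      (actual, locked) <- threadCapability =<< myThreadId
      putMVar done (actual == cap && locked)
    return done
  and <$> mapM takeMVar dones

main :: IO ()
main = do
  n <- getNumCapabilities
  ok <- allPinned
  putStrLn (show n ++ " capabilities, one pinned thread each: " ++ show ok)
```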

Even if your set of threads isn't fixed, you might be able to use a hybrid scheme with -N<large> -qn<cores>: pin each high-priority thread on its own capability, and put all the low-priority threads on a single capability, or on a few separate ones.
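The hybrid scheme could look roughly like this, assuming `+RTS -N<large>` and a hypothetical `Placement` helper. The first `hi` capabilities are reserved for one latency-sensitive thread each, and low-priority threads are pinned round-robin onto the remaining capabilities. If the program runs with no more than `hi` capabilities, `forkOn` wraps the number modulo the capability count, so low-priority threads would then share reserved capabilities.

```haskell
import Control.Concurrent
import Data.IORef

-- Hypothetical placement state: hiCaps capabilities reserved for high-priority
-- threads, plus a counter for round-robin low-priority placement.
data Placement = Placement { hiCaps :: Int, loNext :: IORef Int }

newPlacement :: Int -> IO Placement
newPlacement hi = Placement hi <$> newIORef 0

-- Capability for the k-th low-priority thread, given hi reserved slots and n caps.
lowCapFor :: Int -> Int -> Int -> Int
lowCapFor hi n k = hi + (k `mod` max 1 (n - hi))

-- High-priority thread: pinned to its own reserved capability.
forkHigh :: Placement -> Int -> IO () -> IO ThreadId
forkHigh p slot act
  | slot < 0 || slot >= hiCaps p = ioError (userError "forkHigh: no such slot")
  | otherwise = forkOn slot act

-- Low-priority thread: pinned round-robin outside the reserved range.
forkLow :: Placement -> IO () -> IO ThreadId
forkLow p act = do
  n <- getNumCapabilities
  k <- atomicModifyIORef' (loNext p) (\i -> (i + 1, i))
  forkOn (lowCapFor (hiCaps p) n k) act

main :: IO ()
main = do
  print (map (lowCapFor 2 8) [0 .. 7])  -- [2,3,4,5,6,7,2,3]
  p <- newPlacement 1
  done <- newEmptyMVar
  _ <- forkHigh p 0 (putMVar done ())
  mapM_ (\_ -> forkLow p (putMVar done ())) [1 .. 3 :: Int]
  mapM_ (\_ -> takeMVar done) [1 .. 4 :: Int]
  putStrLn "all threads finished"
```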

> It would be nice to allow the GHC runtime to migrate a thread within a subset of capabilities, using an interface similar to this one:
>
> import Control.Concurrent (ThreadId)
>
> -- Creates a thread that GHC is allowed to migrate between capabilities according
> -- to the following rule: GHC may run this thread on capability N if bit
> -- (N `mod` size_of_word) of mask is set.
> forkOn' :: Int -> IO () -> IO ThreadId
> forkOn' mask act = undefined
>
> This would allow defining up to 64 (or 32, depending on the word size) distinct groups, and let users break their threads down into a larger number of potentially intersecting groups, for example: capability 0 does latency-sensitive work, capabilities 1-5 do less sensitive work, and capabilities 6-7 do bulk work.

We could do this, but it would add some complexity to the scheduler and load balancer (which have already been quite hard to get right; I fixed a handful of bugs there recently). I'd be happy to review a patch if you want to try it, though.
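To illustrate where some of the extra complexity would come from: the load balancer could no longer push an arbitrary spare thread to an idle capability, only one whose mask permits that capability. The following is a toy Haskell model of such a push step. It is not GHC's scheduler, which is C code in the RTS; `Thread`, `allowedOn` and `pushWork` are hypothetical names.

```haskell
import Data.Bits (finiteBitSize, testBit)

-- A run-queue entry in this toy model: an id plus the proposed affinity mask.
data Thread = Thread { threadNo :: Int, affinity :: Int }
  deriving (Show, Eq)

allowedOn :: Int -> Thread -> Bool
allowedOn cap t = testBit (affinity t) (cap `mod` finiteBitSize (affinity t))

-- Give each idle capability at most one spare thread that its mask permits.
-- Returns the (capability, thread) moves and the threads that stay put.
pushWork :: [Int] -> [Thread] -> ([(Int, Thread)], [Thread])
pushWork [] ts = ([], ts)
pushWork _ [] = ([], [])
pushWork (c : cs) ts =
  case break (allowedOn c) ts of
    (before, t : after) ->
      let (moved, kept) = pushWork cs (before ++ after)
      in ((c, t) : moved, kept)
    (_, []) -> pushWork cs ts  -- no spare thread may run on c; try the next one

main :: IO ()
main = do
  let spare = [Thread 1 1, Thread 2 62, Thread 3 192]
  -- Capabilities 1 and 6 are idle; thread 1 may only run on capability 0.
  print (pushWork [1, 6] spare)
```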

Cheers
Simon
 

> Anything obvious I'm missing? Any recommendations on how to implement this?

 


Re: GHC Threads affinity

Michael Baikov

>> I'm developing a program that contains several kinds of threads: those that do little work and are sensitive to latency, and those that can spend more CPU time and are less latency-sensitive. I looked into several cases of increased latency in the sensitive threads (using the GHC eventlog), and in all cases the sensitive threads were waiting for non-sensitive threads to finish working. I was able to reduce worst-case latency by a factor of 10 by pinning all the threads in the program to specific capabilities, but manually distributing threads (60+ of them) between capabilities (on several different machines with different numbers of cores available) seems very fragile. Stop-the-world GC is still a problem, but at least in my case it happens much less frequently.
>  
> If you have a fixed set of threads, you might just want to use -N<threads> -qn<cores> and then pin every thread to a different capability. This gives you 1:1 scheduling at the GHC level, delegating the scheduling job to the OS. You will also want to use nursery chunks with something like -n2m, so you don't waste too much nursery space on the idle capabilities.
>
> Even if your set of threads isn't fixed, you might be able to use a hybrid scheme with -N<large> -qn<cores>: pin each high-priority thread on its own capability, and put all the low-priority threads on a single capability, or on a few separate ones.

There are about 80 threads right now, and some of them are very short-lived. Most of them are low priority and require lots of CPU, which means having to distribute them manually over several capabilities, a process I'd like to avoid.
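A possible user-level middle ground for this case, sketched under the assumption that pinning at fork time is acceptable (threads still never migrate afterwards): keep a live-thread count per capability in a group, and `forkOn` each new thread onto the least-loaded capability of its group. `Group`, `newGroup`, `forkInGroup` and the other helpers are hypothetical, not part of `base`.

```haskell
import Control.Concurrent
import Control.Exception (finally, mask_)
import Control.Monad (forM_)
import Data.IORef

-- Hypothetical capability group with a live-thread count per capability.
data Group = Group { groupCaps :: [Int], liveCounts :: IORef [Int] }

newGroup :: [Int] -> IO Group
newGroup caps = Group caps <$> newIORef (map (const 0) caps)

-- Index of the smallest count; ties go to the lowest index. Needs a non-empty list.
leastLoaded :: [Int] -> Int
leastLoaded counts = snd (minimum (zip counts [0 ..]))

adjustAt :: Int -> (Int -> Int) -> [Int] -> [Int]
adjustAt j f xs = [if k == j then f x else x | (k, x) <- zip [0 ..] xs]

-- Pin the new thread to the least-loaded capability in the group and
-- decrement that capability's count when the thread finishes (even on exception).
forkInGroup :: Group -> IO () -> IO ThreadId
forkInGroup g act = mask_ $ do
  i <- atomicModifyIORef' (liveCounts g) $ \cs ->
         let j = leastLoaded cs in (adjustAt j (+ 1) cs, j)
  forkOnWithUnmask (groupCaps g !! i) $ \unmask ->
    unmask act `finally`
      atomicModifyIORef' (liveCounts g) (\cs -> (adjustAt i (subtract 1) cs, ()))

main :: IO ()
main = do
  bulk <- newGroup [1 .. 5]
  gate <- newEmptyMVar
  -- Ten threads that block until the gate opens, so all stay live for now.
  forM_ [1 .. 10 :: Int] $ \_ -> forkInGroup bulk (readMVar gate)
  readIORef (liveCounts bulk) >>= print  -- [2,2,2,2,2]
  putMVar gate ()
```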

>> It would be nice to allow the GHC runtime to migrate a thread within a subset of capabilities, using an interface similar to this one:
>>
>> import Control.Concurrent (ThreadId)
>>
>> -- Creates a thread that GHC is allowed to migrate between capabilities according
>> -- to the following rule: GHC may run this thread on capability N if bit
>> -- (N `mod` size_of_word) of mask is set.
>> forkOn' :: Int -> IO () -> IO ThreadId
>> forkOn' mask act = undefined
>>
>> This would allow defining up to 64 (or 32, depending on the word size) distinct groups, and let users break their threads down into a larger number of potentially intersecting groups, for example: capability 0 does latency-sensitive work, capabilities 1-5 do less sensitive work, and capabilities 6-7 do bulk work.
>
>
> We could do this, but it would add some complexity to the scheduler and load balancer (which have already been quite hard to get right; I fixed a handful of bugs there recently). I'd be happy to review a patch if you want to try it, though.


I guess I'll start by studying the scheduler and load balancer in more detail. Thank you for your input, Simon!


Re: GHC Threads affinity

Takenobu Tani
Hi,

Here is a simple diagram of forkIO, forkOn and forkOS:

  https://takenobu-hs.github.io/downloads/haskell_ghc_illustrated.pdf#page=69

Regards,
Takenobu


2017-09-11 21:54 GMT+09:00 Michael Baikov <[hidden email]>:
