Joels Time Leak

33 messages
Joels Time Leak

Adrian Hey
Hello,

I haven't followed everything that's happened on the Binary IO
thread, but has anybody else actually tried Joel's code?

 http://wagerlabs.com/timeleak.tgz

I can reproduce the problem (ghc/Linux), but can't explain it. It
seems very strange that frigging about with an otherwise irrelevant
(AFAICT) MVar fixes the problem.

Regards
--
Adrian Hey
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

RE: Joels Time Leak

Simon Peyton Jones
Thanks for looking, Adrian,

It'd be great if someone was able to find out more about what's going on
here.   Bandwidth at GHC HQ is always tight, so the more precisely
someone can pinpoint what's happening, the faster we can fix it.  Joel
has done a lot by making a repro case.  Maybe others can help to narrow
it down?

Simon

| -----Original Message-----
| From: [hidden email]
[mailto:[hidden email]] On Behalf Of
| Adrian Hey
| Sent: 29 December 2005 13:03
| To: [hidden email]
| Subject: [Haskell-cafe] Joels Time Leak
|
| Hello,
|
| I haven't followed everything that's happened on the Binary IO
| thread, but has anybody else actually tried Joels code? ..
|
|  http://wagerlabs.com/timeleak.tgz
|
| I can reproduce the problem (ghc/Linux), but can't explain it. It
| seems very strange that friggin about with an otherwise irrelevant
| (AFAICT) MVar fixes the problem.
|
| Regards
| --
| Adrian Hey
| _______________________________________________
| Haskell-Cafe mailing list
| [hidden email]
| http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Joels Time Leak

Tomasz Zielonka
In reply to this post by Adrian Hey
On Thu, Dec 29, 2005 at 01:02:45PM +0000, Adrian Hey wrote:
> I haven't followed everything that's happened on the Binary IO
> thread, but has anybody else actually tried Joels code? ..
>
>  http://wagerlabs.com/timeleak.tgz

I have and I am puzzled too.

> I can reproduce the problem (ghc/Linux), but can't explain it. It
> seems very strange that friggin about with an otherwise irrelevant
> (AFAICT) MVar fixes the problem.

Could be something with GHC's thread scheduler (is it fair?).
I have some ideas for workarounds, but I didn't get around
to trying them. For example, I think that unpickling too many
messages simultaneously puts too much pressure on memory, cache and GC.

I don't really have the time to test my serialization libraries with
Joel's code - mostly it's the Unicode handling that scares me off
(should it?). Maybe I'll try to simplify things on my own.

Best regards
Tomasz

--
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

Re: Joels Time Leak

Joel Reymont
In reply to this post by Adrian Hey
Adrian,

There's no mystery here.

Threads take a while to unpickle the large server info packet when
they gang up on it all together. By adding the MVar you are basically
installing a door in front of the packet and only letting one thread
through at a time.

The end result is that you are pushing the timeout somewhere else as  
threads spend a lot of time queued up and waiting to get through the  
door.
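Joel's "door" can be sketched in a few lines (a minimal illustration, not the timeleak code itself; `unpickle` here is a hypothetical stand-in for the real deserialisation):

```haskell
import Control.Concurrent
import Control.Monad (forM_)

-- Hypothetical stand-in for deserialising the server info packet.
unpickle :: Int -> IO Int
unpickle n = return (n * 2)

main :: IO ()
main = do
  door <- newMVar ()          -- the "door": a one-slot lock
  done <- newEmptyMVar
  forM_ [1 .. 10 :: Int] $ \i ->
    forkIO $ do
      -- Only one thread at a time gets past the door; the rest block
      -- here instead of all unpickling concurrently.
      r <- withMVar door $ \() -> unpickle i
      putMVar done r
  results <- mapM (const (takeMVar done)) [1 .. 10 :: Int]
  print (sum results)         -- 2*(1+...+10) = 110
```

Without the `withMVar`, all ten threads would run `unpickle` concurrently; with it, they queue at the door one at a time, which is exactly the serialisation the MVar change introduces.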

It takes a fraction of a second for a thread to unpickle the server
info packet. It needs to make an FFI call to ZLib's uncompress, and
that's a blocking call since it's marked unsafe. I think that happens
rather quickly, otherwise alerts would be showing up here. What takes
time is unpickling the packet after it's uncompressed.

Why does it take a fraction of a second for one thread to unpickle,
and several seconds per thread when several threads do it at the same
time? I think this is where the mystery lies.

        Joel

On Dec 29, 2005, at 1:02 PM, Adrian Hey wrote:

> I can reproduce the problem (ghc/Linux), but can't explain it. It
> seems very strange that friggin about with an otherwise irrelevant
> (AFAICT) MVar fixes the problem.

--
http://wagerlabs.com/






Re: Joels Time Leak

Tomasz Zielonka
On Thu, Dec 29, 2005 at 01:20:41PM +0000, Joel Reymont wrote:
> Why does it take a fraction of a second for 1 thread to unpickle and  
> several seconds per thread for several threads to do it at the same  
> time? I think this is where the mistery lies.

Have you considered any of this:

- too much memory pressure: more memory means more frequent and more
  expensive GCs, and 1000 threads using so much memory means bad cache
  performance
- a deficiency of GHC's thread scheduler - giving too much time to one
  thread steals it from others (Simons, don't get angry at me - I am
  probably wrong here ;-)

Best regards
Tomasz

--
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

Re: Joels Time Leak

Joel Reymont

On Dec 29, 2005, at 1:23 PM, Tomasz Zielonka wrote:

> - a deficiency of GHC's thread scheduler - giving too much time one
>   thread steals it from others (Simons, don't get angry at me - I am
>   probably wrong here ;-)

I would finger the scheduler, at least partially. There's no magic in  
this world and my Erlang version does not fare much better. Erlang  
too uses a GC and when a lot of threads load all that data into  
memory things are bound to get nasty.

The results for Erlang are more uniform, though. The delays are
all within 3-4s if I have 3s as the cut-off time, whereas with GHC the
delays are all over the place.

        Joel

--
http://wagerlabs.com/






Re: Joels Time Leak

Cale Gibbard
In reply to this post by Simon Peyton Jones
Well, there's actually a more interesting problem hidden in here too.

The issue with it taking too long seems to basically be as Joel said,
only one of the threads can take that MVar at a time. Even if the time
that it's taken is fairly short, if one is running faster than the
others, it tries to take the MVar more often, which means that it runs
a higher risk of being blocked and slowed down, letting other threads
take its place. It essentially just forces the scheduler to be more
fair.

The more interesting issue seems to be when one replaces the forkIO in
Util.hs, line 36, with a forkOS and compiles with -threaded. You'll get
no alerts for quite some time, and just when you think it's working:

unstuff: user error (ORANGE ALERT: 0s, 4s, SrvServerInfo, ix1: 6, size: 49722)

unstuff: internal error: scavenge_stack: weird activation record found
on stack: 63280
    Please report this as a bug to [hidden email],
    or http://www.sourceforge.net/projects/ghc/
cale@zaphod[~/timeleak]$

This seems to happen consistently on at least 3 platforms (Linux,
OpenBSD, Windows) (with sometimes a red alert rather than orange). I
filed a bug in trac corresponding to it.
(http://cvs.haskell.org/trac/ghc/ticket/641)

 - Cale

On 29/12/05, Simon Peyton-Jones <[hidden email]> wrote:

> Thanks for looking, Adrian,
>
> It'd be great if someone was able to find out more about what's going on
> here.   Bandwidth at GHC HQ is always tight, so the more precisely
> someone can pinpoint what's happening, the faster we can fix it.  Joel
> has done a lot by making a repro case.  Maybe others can help to narrow
> it down?
>
> Simon
>
> | -----Original Message-----
> | From: [hidden email]
> [mailto:[hidden email]] On Behalf Of
> | Adrian Hey
> | Sent: 29 December 2005 13:03
> | To: [hidden email]
> | Subject: [Haskell-cafe] Joels Time Leak
> |
> | Hello,
> |
> | I haven't followed everything that's happened on the Binary IO
> | thread, but has anybody else actually tried Joels code? ..
> |
> |  http://wagerlabs.com/timeleak.tgz
> |
> | I can reproduce the problem (ghc/Linux), but can't explain it. It
> | seems very strange that friggin about with an otherwise irrelevant
> | (AFAICT) MVar fixes the problem.
> |
> | Regards
> | --
> | Adrian Hey
> | _______________________________________________
> | Haskell-Cafe mailing list
> | [hidden email]
> | http://www.haskell.org/mailman/listinfo/haskell-cafe
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>

Re: Joels Time Leak

Tomasz Zielonka
On Thu, Dec 29, 2005 at 03:34:10PM -0500, Cale Gibbard wrote:
> The issue with it taking too long seems to basically be as Joel said,
> only one of the threads can take that MVar at a time. Even if the time
> that it's taken is fairly short, if one is running faster than the
> others, it tries to take the MVar more often, which means that it runs
> a higher risk of being blocked and slowed down, letting other threads
> take its place. It essentially just forces the scheduler to be more
> fair.

I get results that confirm scheduler unfairness. I have numbered the
threads, and every thread prints its number before starting a "read".
When there are ORANGE or RED alerts, the output looks odd - below is
the result of "sort -n | uniq -c":

     53 1
     53 2
     ...
     21 46
     21 47
     ...
      8 109
      8 110
      3 111
      ...
      2 998
      2 999
      2 1000

So thread number 1 managed to run at least 52 or 53 "read"s, but thread
number 1000 only 1 or 2 "read"s.
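Tomasz's instrumentation can be approximated like this (a sketch under assumptions: `oneRead` is a hypothetical stand-in for the timeleak "read", and the counts are scaled down from the real 1000 threads):

```haskell
import Control.Concurrent
import Control.Monad (forM_, replicateM_)

-- Hypothetical stand-in for one timed "read"/unpickle step; the real
-- experiment deserialises a SrvServerInfo packet here.
oneRead :: IO ()
oneRead = yield

main :: IO ()
main = do
  done <- newEmptyMVar
  let nThreads = 10   -- the real experiment used 1000 threads
      nReads   = 5
  forM_ [1 .. nThreads] $ \i ->
    forkIO $ do
      replicateM_ nReads $ do
        print (i :: Int)   -- tag each read with this thread's number
        oneRead
      putMVar done ()
  replicateM_ nThreads (takeMVar done)
```

Piping the output through `sort -n | uniq -c` then counts how many reads each thread managed, as in the figures above.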

Best regards
Tomasz

--
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

Re[2]: Joels Time Leak

Bulat Ziganshin
Hello Tomasz,

Friday, December 30, 2005, 12:36:37 AM, you wrote:

TZ> I get results that confirm scheduler unfairness. I have numbered the

TZ> So thread number 1 managed to run at least 52 or 53 "read"s, but thread
TZ> number 1000 only 1 or 2 "read"s.

It may just be due to slowness in thread creation. Try to count only
from the moment when the 1000th thread prints something.


--
Best regards,
 Bulat                            mailto:[hidden email]




Re: Joels Time Leak

Tomasz Zielonka
In reply to this post by Tomasz Zielonka
On Thu, Dec 29, 2005 at 10:36:37PM +0100, Tomasz Zielonka wrote:

> On Thu, Dec 29, 2005 at 03:34:10PM -0500, Cale Gibbard wrote:
> > The issue with it taking too long seems to basically be as Joel said,
> > only one of the threads can take that MVar at a time. Even if the time
> > that it's taken is fairly short, if one is running faster than the
> > others, it tries to take the MVar more often, which means that it runs
> > a higher risk of being blocked and slowed down, letting other threads
> > take its place. It essentially just forces the scheduler to be more
> > fair.
>
> I get results that confirm scheduler unfairness.

I am taking it back. It seems that these results were caused by threads
starting at different moments. I made every thread wait until all threads
were created, and now I can't reproduce it anymore.

Best regards
Tomasz

--
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

Re: Joels Time Leak

Simon Marlow-4
In reply to this post by Tomasz Zielonka
Tomasz Zielonka wrote:

> On Thu, Dec 29, 2005 at 01:20:41PM +0000, Joel Reymont wrote:
>
>>Why does it take a fraction of a second for 1 thread to unpickle and  
>>several seconds per thread for several threads to do it at the same  
>>time? I think this is where the mistery lies.
>
>
> Have you considered any of this:
>
> - too big memory pressure: more memory means more frequent and more
>   expensive GCs, 1000 threads using so much memory means bad cache
>   performance
> - a deficiency of GHC's thread scheduler - giving too much time one
>   thread steals it from others (Simons, don't get angry at me - I am
>   probably wrong here ;-)

I don't think there's anything really strange going on here.

The default context switch interval in GHC is 0.02 seconds, measured in
CPU time by default.  GHC's scheduler is strictly round-robin, so with
100 threads in the system it can be 2 seconds between a thread being
descheduled and scheduled again.

I measured the time taken to unpickle those large 50k packets as 0.3
seconds on my amd64 box (program compiled *without* optimisation), so
the thread can get descheduled twice while unpickling a large packet,
giving a >4s delay with 100 threads running.

The actual context switch interval seems to often be larger than 0.02
seconds; I'm not sure exactly why this is - it might be due to delays in
the OS delivering the signal.  This does mean that the timeleak program
reports alerts for as little as 50 threads, though.
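Simon's arithmetic can be made explicit with a back-of-envelope model (my own simplification, not the RTS's actual accounting): under strict round-robin with quantum q and n runnable threads, a descheduled thread waits while the other n - 1 threads each take a turn.

```haskell
-- Worst-case gap between a thread being descheduled and running
-- again under strict round-robin: the other (n - 1) runnable
-- threads each get one quantum of q seconds first.
worstCaseGap :: Int -> Double -> Double
worstCaseGap n q = fromIntegral (n - 1) * q

main :: IO ()
main = do
  print (worstCaseGap 100 0.02)  -- ~2s: Simon's figure for 100 threads
  print (worstCaseGap  50 0.02)  -- ~1s nominal; longer actual switch
                                 -- intervals can push 50 threads past
                                 -- a 3s alert threshold
```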

Cheers,
        Simon


Haskell vs. Erlang: The scheduler

Joel Reymont
On Jan 3, 2006, at 2:30 PM, Simon Marlow wrote:

> The default context switch interval in GHC is 0.02 seconds,  
> measured in CPU time by default.  GHC's scheduler is stricly round-
> robin, so therefore with 100 threads in the system it can be 2  
> seconds between a thread being descheduled and scheduled again.
>
> I measured the time taken to unpickle those large 50k packets as  
> 0.3 seconds on my amd64 box (program compiled *without*  
> optimisation), so the thread can get descheduled twice during while  
> unpickling a large packet, giving a >4s delay with 100 threads  
> running.

Is it impractical then to implement this type of app in Haskell?  
Based on the nature of Haskell scheduling I would be inclined to say  
yes. I'm including information on the Erlang scheduler below.

I think it's possible to emulate the workings of the Erlang scheduler
in Haskell by using delimited continuations à la the Zipper File
Server/OS. A single delimited continuation (a request in Zipper FS
parlance?) would be a scheduling unit, and a programmer could then tune
the "scheduler" to their heart's content.

Apart from putting a lot of burden on the programmer this becomes  
quite troublesome when multiple sockets or file descriptors are  
concerned. There's no easy way to plug into the select facility of  
the Haskell runtime to receive notifications of input available. You  
will notice the Zipper FS spending quite a few lines of code to roll  
its own select facility.

The Erlang scheduler is based on reduction count, where one reduction
is roughly equivalent to a function call. See
http://www.erlang.org/ml-archive/erlang-questions/200104/msg00072.html
for more detail.

There's also this helpful bit of information:

--
erlang:bump_reductions(Reductions) -> void()

Types: Reductions = int()

This implementation-dependent function increments the reduction
counter for the calling process. In the Beam emulator, the reduction
counter is normally incremented by one for each function and BIF
call, and a context switch is forced when the counter reaches 1000.
--

Regarding the issue of why a logger process in Erlang does not get
overwhelmed, this is the reply I got from Raimo Niskanen (Erlang team
at Ericsson):

There is a small fix in the scheduler for the standard
producer/consumer problem: A process that sends to a
receiver having a large receive queue gets punished
with a large reduction (number of function calls)
count for the send operation, and will therefore
get smaller scheduling slots.
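GHC has no reduction counter, but the Erlang behaviour can be crudely approximated in a library (a hypothetical sketch built on `Control.Concurrent.yield`; `bumpReductions` and `countTo` are illustrative names, not an existing API):

```haskell
import Control.Concurrent (yield)
import Control.Monad (when)
import Data.IORef

-- A crude analogue of Erlang's reduction counting: bump a counter on
-- every "call" and offer the scheduler a switch point when it reaches
-- the budget, so a busy loop can't hog its whole timeslice.
bumpReductions :: IORef Int -> Int -> IO ()
bumpReductions ref budget = do
  n <- atomicModifyIORef' ref (\c -> (c + 1, c + 1))
  when (n >= budget) $ do
    writeIORef ref 0
    yield                      -- voluntary context-switch point

-- Example: a loop that yields every 1000 steps, like Beam's
-- switch-at-1000-reductions rule.
countTo :: Int -> IO Int
countTo limit = do
  ref <- newIORef 0
  let go acc
        | acc >= limit = return acc
        | otherwise    = bumpReductions ref 1000 >> go (acc + 1)
  go 0
```

Note that `yield` only offers the scheduler a switch point rather than forcing one, so this is cooperative and much coarser than Beam's accounting.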

        Thanks, Joel

--
http://wagerlabs.com/






Re: Re: Joels Time Leak

Sebastian Sylvan
In reply to this post by Simon Marlow-4
On 1/3/06, Simon Marlow <[hidden email]> wrote:

> Tomasz Zielonka wrote:
> > On Thu, Dec 29, 2005 at 01:20:41PM +0000, Joel Reymont wrote:
> >
> >>Why does it take a fraction of a second for 1 thread to unpickle and
> >>several seconds per thread for several threads to do it at the same
> >>time? I think this is where the mistery lies.
> >
> >
> > Have you considered any of this:
> >
> > - too big memory pressure: more memory means more frequent and more
> >   expensive GCs, 1000 threads using so much memory means bad cache
> >   performance
> > - a deficiency of GHC's thread scheduler - giving too much time one
> >   thread steals it from others (Simons, don't get angry at me - I am
> >   probably wrong here ;-)
>
> I don't think there's anything really strange going on here.
>
> The default context switch interval in GHC is 0.02 seconds, measured in
> CPU time by default.  GHC's scheduler is stricly round-robin, so
> therefore with 100 threads in the system it can be 2 seconds between a
> thread being descheduled and scheduled again.

According to this:
http://www.haskell.org/ghc/docs/latest/html/users_guide/sec-using-parallel.html#parallel-rts-opts

The minimum time between context switches is 20 milliseconds.

Is there any good reason why 0.02 seconds is the best that you can get
here? Couldn't GHC's internal timer tick at a _much_ faster rate (like
50-100µs or so)?
Apart from meaning big trouble for applications with a large number of
threads (such as Joel's), it'll also make life difficult for any sort of
real-time application. For instance, if you want to use HOpenGL to
render a simulation engine and you split it up into tons of concurrent
processes (say one for each dynamic entity in the engine), the 20ms
granularity would make it quite hard to achieve 60 frames per second
in that case...

/S

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862

Re: Re: Joels Time Leak

haskell-2
In reply to this post by Simon Marlow-4
General follow-up questions:

Would adding Control.Concurrent.yield commands cause a context switch
more often than every 0.02 seconds?

Is there any command in GHC to allow a thread to prevent itself from
being rescheduled while computing something?

Another comment: between 1000s of threads and writing a custom
continuation-based scheduler, what about using a thread pool?  Does
anyone have a library with a "fork-IO-Pool" command?
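One minimal shape such a "fork-IO-Pool" could take (a hypothetical sketch; `newPool` is an illustrative name, not an existing library function): k worker threads pull queued jobs off a shared `Chan`, so at most k jobs run, and compete for round-robin quanta, at once.

```haskell
import Control.Concurrent
import Control.Monad (forever, forM_, replicateM_)

-- A minimal thread pool: k workers drain a shared job channel.
-- The returned action queues a job for the pool ("forkIOPool").
newPool :: Int -> IO (IO () -> IO ())
newPool k = do
  jobs <- newChan
  replicateM_ k $ forkIO $ forever $ do
    job <- readChan jobs
    job                        -- run one queued job to completion
  return (writeChan jobs)

main :: IO ()
main = do
  submit <- newPool 4
  done <- newEmptyMVar
  forM_ [1 .. 20 :: Int] $ \i ->
    submit (putMVar done i)
  xs <- mapM (const (takeMVar done)) [1 .. 20 :: Int]
  print (sum xs)               -- 210, whatever the completion order
```

A real pool would also want exception handling around `job` so a crashing job doesn't kill its worker; that is omitted here for brevity.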

--
Chris

Simon Marlow wrote:

> Tomasz Zielonka wrote:
>
>> On Thu, Dec 29, 2005 at 01:20:41PM +0000, Joel Reymont wrote:
>>
>>> Why does it take a fraction of a second for 1 thread to unpickle and
>>> several seconds per thread for several threads to do it at the same
>>> time? I think this is where the mistery lies.
>>
>>
>>
>> Have you considered any of this:
>>
>> - too big memory pressure: more memory means more frequent and more
>>   expensive GCs, 1000 threads using so much memory means bad cache
>>   performance
>> - a deficiency of GHC's thread scheduler - giving too much time one
>>   thread steals it from others (Simons, don't get angry at me - I am
>>   probably wrong here ;-)
>
>
> I don't think there's anything really strange going on here.
>
> The default context switch interval in GHC is 0.02 seconds, measured in
> CPU time by default.  GHC's scheduler is stricly round-robin, so
> therefore with 100 threads in the system it can be 2 seconds between a
> thread being descheduled and scheduled again.
>
> I measured the time taken to unpickle those large 50k packets as 0.3
> seconds on my amd64 box (program compiled *without* optimisation), so
> the thread can get descheduled twice during while unpickling a large
> packet, giving a >4s delay with 100 threads running.
>
> The actual context switch interval seems to often be larger than 0.02
> seconds; I'm not sure exactly why this is, it might be due to delays in
> the OS delivering the signal.  This does mean that the timeleak program
> reports alerts for as little as 50 threads, though.
>
> Cheers,
>     Simon
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>


RE: Haskell vs. Erlang: The scheduler

Simon Marlow
In reply to this post by Joel Reymont
On 03 January 2006 15:13, Joel Reymont wrote:

> On Jan 3, 2006, at 2:30 PM, Simon Marlow wrote:
>> The default context switch interval in GHC is 0.02 seconds,
>> measured in CPU time by default.  GHC's scheduler is stricly round-
>> robin, so therefore with 100 threads in the system it can be 2
>> seconds between a thread being descheduled and scheduled again.
>>
>> I measured the time taken to unpickle those large 50k packets as
>> 0.3 seconds on my amd64 box (program compiled *without*
>> optimisation), so the thread can get descheduled twice during while
>> unpickling a large packet, giving a >4s delay with 100 threads
>> running.
>
> Is it impractical then to implement this type of app in Haskell?
> Based on the nature of Haskell scheduling I would be inclined to say
> yes.

Absolutely not!

Apart from the problem you have with a space leak caused by the input
buffer of the logging thread overflowing, which is easily fixed by using
a bounded channel, I don't know why you want priorities.  Is there
something else?

We could easily add Erlang-style priorities (though without the overhead
of counting function calls, I'd do it by counting allocations), but I'd
rather not if we can avoid it.

Cheers,
        Simon

RE: Re: Joels Time Leak

Simon Marlow
In reply to this post by Adrian Hey
On 03 January 2006 15:37, Sebastian Sylvan wrote:

> On 1/3/06, Simon Marlow <[hidden email]> wrote:
>> Tomasz Zielonka wrote:
>>> On Thu, Dec 29, 2005 at 01:20:41PM +0000, Joel Reymont wrote:
>>>
>>>> Why does it take a fraction of a second for 1 thread to unpickle
>>>> and several seconds per thread for several threads to do it at the
>>>> same time? I think this is where the mistery lies.
>>>
>>>
>>> Have you considered any of this:
>>>
>>> - too big memory pressure: more memory means more frequent and more
>>>   expensive GCs, 1000 threads using so much memory means bad cache
>>> performance - a deficiency of GHC's thread scheduler - giving too
>>>   much time one thread steals it from others (Simons, don't get
>>>   angry at me - I am probably wrong here ;-)
>>
>> I don't think there's anything really strange going on here.
>>
>> The default context switch interval in GHC is 0.02 seconds, measured
>> in CPU time by default.  GHC's scheduler is stricly round-robin, so
>> therefore with 100 threads in the system it can be 2 seconds between
>> a thread being descheduled and scheduled again.
>
> According to this:
> http://www.haskell.org/ghc/docs/latest/html/users_guide/sec-using-parallel.html#parallel-rts-opts
>
> The minimum time between context switches is 20 milliseconds.
>
> Is there any good reason why 0.02 seconds is the best that you can get
> here? Couldn't GHC's internal timer tick at a _much_ faster rate (like
> 50-100µs or so)?

Sure, there's no reason why we couldn't do this.  Of course, even idle Haskell processes will be ticking away in the background, so there's a reason not to make the interval too short.  What do you think is reasonable?

> Apart from meaning big trouble for applications with a large number of
> threads (such as Joels) it'll also make life difficult for any sort of
> real-time application. For instance if you want to use HOpenGL to
> render a simulation engine and you split it up into tons of concurrent
> processes (say one for each dynamic entity in the engine), the 20ms
> granularity would make it quite hard to achieve 60 frames per second
> in that case...

The reason things are the way they are is that a large number of *running* threads is not a workload we've optimised for.  In fact, Joel's program is the first one I've seen with a lot of running threads, apart from our testsuite.  And I suspect that when Joel uses a better binary I/O implementation a lot of that CPU usage will disappear.

Cheers,
        Simon

Re: Re: Joels Time Leak

Sebastian Sylvan
On 1/3/06, Simon Marlow <[hidden email]> wrote:

> On 03 January 2006 15:37, Sebastian Sylvan wrote:
>
> > On 1/3/06, Simon Marlow <[hidden email]> wrote:
> >> Tomasz Zielonka wrote:
> >>> On Thu, Dec 29, 2005 at 01:20:41PM +0000, Joel Reymont wrote:
> >>>
> >>>> Why does it take a fraction of a second for 1 thread to unpickle
> >>>> and several seconds per thread for several threads to do it at the
> >>>> same time? I think this is where the mistery lies.
> >>>
> >>>
> >>> Have you considered any of this:
> >>>
> >>> - too big memory pressure: more memory means more frequent and more
> >>>   expensive GCs, 1000 threads using so much memory means bad cache
> >>> performance - a deficiency of GHC's thread scheduler - giving too
> >>>   much time one thread steals it from others (Simons, don't get
> >>>   angry at me - I am probably wrong here ;-)
> >>
> >> I don't think there's anything really strange going on here.
> >>
> >> The default context switch interval in GHC is 0.02 seconds, measured
> >> in CPU time by default.  GHC's scheduler is stricly round-robin, so
> >> therefore with 100 threads in the system it can be 2 seconds between
> >> a thread being descheduled and scheduled again.
> >
> > According to this:
> > http://www.haskell.org/ghc/docs/latest/html/users_guide/sec-using-parallel.html#parallel-rts-opts
> >
> > The minimum time between context switches is 20 milliseconds.
> >
> > Is there any good reason why 0.02 seconds is the best that you can get
> > here? Couldn't GHC's internal timer tick at a _much_ faster rate (like
> > 50-100µs or so)?
>
> Sure, there's no reason why we couldn't do this.  Of course, even idle Haskell processes will be ticking away in the background, so there's a reason not to make the interval too short.  What do you think is reasonable?

Not sure. Could it be configurable via a command line flag? If the
profiler could report the % of time spent doing context switches (or
maybe it already does?) the user could fine tune this to his liking.

For the (hypothetical) real-time simulation app I would *guess* that
something along the lines of 500µs would be more than enough not to
introduce any unnecessary lag in rendering (seeing as the target frame
time would be around 15ms, and you'd want a good number of
context switches to allow some of the next frame to be computed in
parallel with all the render-surface optimizations etc. for the current
frame).

But then again, there may be other apps which need it to be even
lower.. So a command line flag sure would be nice.

/S

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862

Re: Re: Joels Time Leak

Joel Reymont
In reply to this post by Simon Marlow
Simon,

I don't think CPU usage is the issue. An individual thread will take  
a fraction of a second to deserialize a large packet. The issue is  
that, as you pointed out, you can get alerts even with 50 threads.  
Those fractions of a second add up in a certain way that's  
detrimental to the performance of the app.

The timeleak code uses Ptr Word8 to pickle, which should be very
efficient. I believe the delay comes from the way 'sequ' is compiled
by GHC. I'll take the liberty of quoting Andrew Kennedy (your
colleague from MS Research) who wrote the picklers:

--
My original pickler implementation was for SML. It was used in the  
MLj compiler, and is still used in the SML.NET compiler, and has  
acceptable performance (few ms pickling/unpickling for typical  
intermediate language object files). I must admit that I've not used  
the Haskell variant in anger. Apart from the inherent slowdown  
associated with laziness, is there a particular reason for poor  
performance?
--

'sequ' by itself does not seem like a big deal, but when used to model
records it builds a large nested lambda-list, and I don't think that
list is being compiled efficiently. I would appreciate it if you could
look at that and issue a verdict, now that Andrew confirms using the
picklers in a real-life environment without major problems.

Suppose I chose a different implementation of binary IO and disposed  
of pickler combinators.  Suppose I gained a 2x speed-up by doing so.  
I would now be getting alerts with 100 threads instead of 50, no?  
That's still far from ideal.

        Joel

On Jan 3, 2006, at 4:43 PM, Simon Marlow wrote:

> The reason things are the way they are is that a large number of  
> *running* threads is not a workload we've optimised for.  In fact,  
> Joel's program is the first one I've seen with a lot of running  
> threads, apart from our testsuite.  And I suspect that when Joel  
> uses a better binary I/O implementation a lot of that CPU usage  
> will disappear.

--
http://wagerlabs.com/






Re: Joels Time Leak

haskell-2
In reply to this post by haskell-2
Thanks for the answer, but I should have written a longer comment. I
have added one below:

Simon Marlow wrote:

> Chris Kuklewicz wrote:
>
>> Another comment: between 1000's of threads and writing a custom
>> continuation based scheduler, what about using a thread pool?  Does
>> anyone have a library with a "fork-IO-Pool" command?
>
>
> You don't need a thread pool, because threads are so cheap.  Thread
> pools are just a workaround for lack of lightweight concurrency.
>
> Cheers,
>     Simon
>

Since the round-robin scheduler has (0.02 * N) seconds of delay for N
threads, one could trade off latency between time spent waiting for
the thread pool to start a job and time spent running the job and
getting interrupted.

In the limit of 1 worker thread, all the latency is waiting to get run,
and there are no interruptions, so the time taken *while running* is
very short.  With 10 threads, there can be a delay to start, and each
interruption adds 0.2 seconds to the job's run time once it has started.

For a server, the client requests queue up and wait for room in the
thread pool, and the pool is kept small enough that the round-robin
scheduler delay keeps requests from timing out while being serviced.
Otherwise 1000 client requests would cause 20 seconds of reschedule
penalty for all threads and they could all time out.  With a thread
pool, one can drop requests that have been waiting for too long instead
of running them, so those requests will time out, but the pool keeps
servicing at least some of the client requests on time.

All hypothetical to me, of course.

--
Chris

Re[2]: Re: Joels Time Leak

Bulat Ziganshin
In reply to this post by Simon Marlow
Hello Simon,

Tuesday, January 03, 2006, 7:43:21 PM, you wrote:

>> The minimum time between context switches is 20 milliseconds.
>>
>> Is there any good reason why 0.02 seconds is the best that you can get
>> here? Couldn't GHC's internal timer tick at a _much_ faster rate (like
>> 50-100µs or so)?

SM> Sure, there's no reason why we couldn't do this.  Of course, even
SM> idle Haskell processes will be ticking away in the background, so
SM> there's a reason not to make the interval too short.  What do  
SM> you think is reasonable?

Simon, the talk is about changing the GHC _tick_, which is the _minimal_
possible context switch interval. So we want to decrease this tick
and retain the current 20 ms _default_ switch interval. This will make
it possible to decrease the switch interval for programs that really
need it, which is currently entirely impossible.


-C[<s>]:
Sets the context switch interval to <s> seconds. A context switch will
occur at the next heap block allocation after the timer expires (a
heap block allocation occurs every 4k of allocation). With -C0 or -C,
context switches will occur as often as possible (at every heap block
allocation). By default, context switches occur every 20
milliseconds. Note that GHC's internal timer ticks every 20ms, and the
context switch timer is always a multiple of this timer, so 20ms is
the maximum granularity available for timed context switches.
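So, per the excerpt above, the switch interval is already tunable per run, down to the 20ms tick floor; for example (assuming a binary named `timeleak`):

```shell
# Switch as often as possible: at every heap block allocation.
./timeleak +RTS -C0 -RTS

# Ask for a 40ms switch interval (a multiple of the 20ms tick).
./timeleak +RTS -C0.04 -RTS
```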


--
Best regards,
 Bulat                            mailto:[hidden email]


