Parallel combinator, performance advice

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Parallel combinator, performance advice

Neil Mitchell
Hi,

I've written a parallel_ function, code attached. I'm looking for
criticism, suggestions etc on how to improve the performance and
fairness of this parallel construct. (If it turns out this construct
is already in a library somewhere, I'd be interested in that too!)

The problem I'm trying to solve is running system commands in
parallel. Importantly (unlike other Haskell parallel stuff) I'm not
expecting computationally heavy Haskell to be running in the threads,
and only want a maximum of n commands to fire at a time. The way I'm
trying to implement this is with a parallel_ function:

parallel_ :: [IO a] -> IO ()

The semantics are that after parallel_ returns each action will have
been executed exactly once. The implementation (attached) creates a
thread pool of numCapabililties-1 threads, each of which reads from a
task pool and attempts to do some useful work. I use an idempotent
function to ensure that all work is done at most one, and a sequence_
to ensure all work is done at least once.

Running a benchmark of issuing 1 million trivial tasks (create,
modify, read and IO ref) the version without any parallelism is really
fast (< 0.1 sec), and the version with parallelism is slow (> 10 sec).
This could be entirely due to space leaks etc when queueing many
tasks.

I'm useful for any thoughts people might have!

Thanks in advance,

Neil

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Parallel.hs (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Parallel combinator, performance advice

haskell-2
You create one MVar for each task in order to ensure all the tasks are done.
This is pretty heavyweight.

You could create a single Control.Concurrent.QSemN to count the completed tasks,
starting with 0.

Each task is followed by signalQSemN with a value of 1.  (I would use "finally").

As parallel_ launches the tasks it can count their number, then it would call
waitQSemN for that quantity to have finished.

--
Chris

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parallel combinator, performance advice

Bulat Ziganshin-2
In reply to this post by Neil Mitchell
Hello Neil,

Tuesday, April 7, 2009, 2:25:12 PM, you wrote:

> The problem I'm trying to solve is running system commands in
> parallel.

"system commands" means execution of external commands or just system
calls inside Haskell?

> Running a benchmark of issuing 1 million trivial tasks (create,
> modify, read and IO ref) the version without any parallelism is really
> fast (< 0.1 sec), and the version with parallelism is slow (> 10 sec).
> This could be entirely due to space leaks etc when queueing many
> tasks.

i think it's just because use of MVar/Chan is much slower than IORef
activity. once i checked that on 1GHz cpu and got 2 million withMVar-s
per second

i don't understood exactly what you need, but my first shot is
to create N threads executing commands from channel:

para xs = do
  done <- newEmptyMVar
  chan <- newChan
  writeList2Chan chan (map Just xs ++ [Nothing])
 
  replicateM_ numCapabilities $ do
    forkIO $ do
      forever $ do
        x <- readChan chan
        case x of
          Just cmd -> cmd
          Nothing -> putMVar done ()
  takeMVar done
 


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parallel combinator, performance advice

haskell-2
In reply to this post by Neil Mitchell
Neil Mitchell wrote:
> Sorry, accidentally sent before finishing the response! I also noticed
> you sent this directly to me, not to -cafe, was that intentional?

The mail/news gateway makes it look like that, but I also sent to the mailing list.

> You mean something like:
>
> parallel_ xs =
>    sem <- createSemapore (length xs)
>    enqueue [x >> signalSemapore sem | x <- xs]
>    waitOnSemaphore sem
>
> I thought of something like this, but then the thread that called
> parallel_ is blocked, which means if you fire off N threads you only
> get N-1 executing. If you have nested calls to parallel, then you end
> up with thread exhaustion. Is there a way to avoid that problem?
>
> Thanks
>
> Neil

Your parallel_ does not return until all operations are finished.

> parallel_ (x:xs) = do
>     ys <- mapM idempotent xs
>     mapM_ addParallel ys
>     sequence_ $ x : reverse ys

By the way, there is no obvious reason to insert "reverse" there.

What I meant was something like:
>
> para [] = return ()
> para [x] = x
> para xs = do
>   q <- newQSemN 0
>   let wrap x = finally x (signalQSemN q 1)
>       go [y] n = wrap x >> waitQSemN q (succ n)
>       go (y:ys) n = addParallel (wrap y) >> go ys $! succ n
>   go xs 0

This is nearly identical to your code, and avoid creating the MVar for each
operation.  I use "finally" to ensure the count is correct, but if a worker
threads dies then bas things will happen.  You can replace finally with (>>) if
speed is important.

This is also lazy since the length of the list is not forced early.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parallel combinator, performance advice

Neil Mitchell
Hi

> > The problem I'm trying to solve is running system commands in
> > parallel.

> "system commands" means execution of external commands or just system
> calls inside Haskell?

Calls to System.Cmd.system, i.e. running external console processes.
It's a make system I'm writing, so virtually all the time is spent in
calls to ghc etc.

To Bulat: I should have been clearer with the spec. The idea is that
multiple calls to paralell_ can execute, and a function executing
inside parallel_ can itself call parallel_. For this reason I need one
top-level thread pool, which requires unsafePerformIO. If I create a
thread pool every time, I end up with more threads than I want.

> Your parallel_ does not return until all operations are finished.
>
>> parallel_ (x:xs) = do
>>     ys <- mapM idempotent xs
>>     mapM_ addParallel ys
>>     sequence_ $ x : reverse ys
>
> By the way, there is no obvious reason to insert "reverse" there.

There is a reason :-)

Imagine I do parallel_ [a,b,c]

That's roughly doing (if b' is idempotent b):

enqueue b'
enqueue c'
a
b'
c'

If while executing a the thread pool starts on b', then after I've
finished a, I end up with both threads waiting for b', and nothing
doing c'. If I do a reverse, then the thread pool and me are starting
at different ends, so if we lock then I know it's something important
to me that the thread pool started first. It's still not idea, but it
happens less often.

> What I meant was something like:
>>
>> para [] = return ()
>> para [x] = x
>> para xs = do
>>   q <- newQSemN 0
>>   let wrap x = finally x (signalQSemN q 1)
>>       go [y] n = wrap x >> waitQSemN q (succ n)
>>       go (y:ys) n = addParallel (wrap y) >> go ys $! succ n
>>   go xs 0
>
> This is nearly identical to your code, and avoid creating the MVar for each
> operation.  I use "finally" to ensure the count is correct, but if a worker
> threads dies then bas things will happen.  You can replace finally with (>>) if
> speed is important.

Consider a thread pool with 2 threads and the call parallel_ [parallel_ [b,c],a]

You get the sequence:
enqueue (parallel_ [b,c])
a
wait on parallel_ [b,c]

While you are executing a, a thread pool starts:
enqueue b
c
wait for b

Now you have all the threads waiting, and no one dealing with the
thread pool. This results in deadlock.

I guess the "nested calls to parallel_" bit is the part of the spec
that makes everything much harder!

Thanks

Neil
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Parallel combinator, performance advice

Bulat Ziganshin-2
Hello Neil,

Tuesday, April 7, 2009, 6:13:29 PM, you wrote:

> Calls to System.Cmd.system, i.e. running external console processes.
> It's a make system I'm writing, so virtually all the time is spent in
> calls to ghc etc.

> To Bulat: I should have been clearer with the spec. The idea is that
> multiple calls to paralell_ can execute, and a function executing
> inside parallel_ can itself call parallel_. For this reason I need one
> top-level thread pool, which requires unsafePerformIO. If I create a
> thread pool every time, I end up with more threads than I want.

this is smth new to solve

i propose to use concept similar to Capability of GHC RTS:

we have one Capability provided by thread calling para and N-1
Capabilities provided by your thread pool. all that we need is to
reuse current thread Capability as part of pool!

para xs = do
  sem <- newQSem
  for xs $ \x -> do
    writeChan chan (x `finally` signalQSem sem)
  tid <- forkIO (executing commands from chan...)
  waitQSem sem
  killThread tid

instead of killThread we really should send pseudo-job (like my
Nothing value) that will led to self-killing of job that gets this
signal

this solution still may lead to a bit more or less than N threads
executed at the same time. your turn!




--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[3]: Parallel combinator, performance advice

Bulat Ziganshin-2
Hello Bulat,

Tuesday, April 7, 2009, 6:50:14 PM, you wrote:

>   tid <- forkIO (executing commands from chan...)
>   waitQSem sem
>   killThread tid

> instead of killThread we really should send pseudo-job (like my
> Nothing value) that will led to self-killing of job that gets this
> signal

> this solution still may lead to a bit more or less than N threads
> executed at the same time. your turn!

solved! every job should go together with Bool flag `killItself`.
last job should have this flag set to True. thread will execute job
and kill itself if this flag is True. so we get strong guarantees that
there are exactly N threads in the system:

para xs = do
  sem <- newQSem
  for (init xs) $ \x -> do
    writeChan chan (x `finally` signalQSem sem, False)
  writeChan chan (last x `finally` signalQSem sem, True)
  --
  tid <- forkIO $ do
           let cycle = do
             (x,flag) <- readChan chan
             x
             unless flag cycle
           cycle
  --
  waitQSem sem


btw, this problem looks a great contribution into "Haskell way"
book of exercises

--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parallel combinator, performance advice

haskell-2
In reply to this post by Neil Mitchell
Neil Mitchell wrote:
>
> I guess the "nested calls to parallel_" bit is the part of the spec
> that makes everything much harder!
>
> Thanks
>
> Neil

Yes.  Much more annoying.

But the problem here is generic.  To avoid it you must never allow all thread to
block at once.

The parallel_ function is such a job, so you solved this with the 'idempotent'
trick.  You solution works by blocking all but 1 thread.

1a) Some worker thread 1 executes parallel_ with some jobs
1b) These get submitted the work queue 'chan'
1c) worker thread 1 starts on those same jobs, ignoring the queue
1d) worker thread 1 reaches the job being processed by thread 2
1e) worker thread 1 blocks until the jobs is finished in modifyMVar

2a) Worker thread 2 grabs a job posted by thread 1, that calls parallel_
2b) This batch of jobs gets submitted to the work queue 'chan'
2c) worker thread 2 starts on those same jobs, ignoring the queue
1d) worker thread 2 reaches the job being processed by thread 3
1e) worker thread 2 blocks until the jobs is finished in modifyMVar

3...4...5...

And now only 1 thread is still working, and it has to work in series.

I think I can fix this...

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Parallel combinator, performance advice

Sebastian Sylvan
In reply to this post by Neil Mitchell
This is a random idea, that's probably not going to work, but I don't have a way of testing it so I'll just post it!
How about using unsafeInterleaveIO to get a lazy suspension of the result of each action, and then using par to spark off each of them? If that works you can reuse the existing task-parallel system of GHC to do the heavily lifting for you, instead of having to write your own.

On Tue, Apr 7, 2009 at 11:25 AM, Neil Mitchell <[hidden email]> wrote:
Hi,

I've written a parallel_ function, code attached. I'm looking for
criticism, suggestions etc on how to improve the performance and
fairness of this parallel construct. (If it turns out this construct
is already in a library somewhere, I'd be interested in that too!)

The problem I'm trying to solve is running system commands in
parallel. Importantly (unlike other Haskell parallel stuff) I'm not
expecting computationally heavy Haskell to be running in the threads,
and only want a maximum of n commands to fire at a time. The way I'm
trying to implement this is with a parallel_ function:

parallel_ :: [IO a] -> IO ()

The semantics are that after parallel_ returns each action will have
been executed exactly once. The implementation (attached) creates a
thread pool of numCapabililties-1 threads, each of which reads from a
task pool and attempts to do some useful work. I use an idempotent
function to ensure that all work is done at most one, and a sequence_
to ensure all work is done at least once.

Running a benchmark of issuing 1 million trivial tasks (create,
modify, read and IO ref) the version without any parallelism is really
fast (< 0.1 sec), and the version with parallelism is slow (> 10 sec).
This could be entirely due to space leaks etc when queueing many
tasks.

I'm useful for any thoughts people might have!

Thanks in advance,

Neil

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe




--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: Parallel combinator, performance advice

Bulat Ziganshin-2
In reply to this post by Neil Mitchell
Hello Neil,

Tuesday, April 7, 2009, 6:13:29 PM, you wrote:

> Consider a thread pool with 2 threads and the call parallel_ [parallel_ [b,c],a]

> You get the sequence:
> enqueue (parallel_ [b,c])
> a
> wait on parallel_ [b,c]

> While you are executing a, a thread pool starts:
> enqueue b
> c
> wait for b

> Now you have all the threads waiting, and no one dealing with the
> thread pool. This results in deadlock.

i think the only way to solve this problem is to create one more
thread each time. let's see: on every call to para you need to alloc
one thread to wait for jobs completion. so on each nested call to para
you have minus one worker thread. finally you will eat them all!

so you need to make fork: one thread should serve jobs and another one
wait for completion of this jobs bucket. and with killItself flag you
will finish superfluous thread JIT


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re[2]: Parallel combinator, performance advice

Neil Mitchell
Hi

Sebastian:

> How about using unsafeInterleaveIO to get a lazy suspension of the result of each action,
>  and then using par to spark off each of them? If that works you can reuse the existing
> task-parallel system of GHC to do the heavily lifting for you, instead of having to write your
> own.

par is likely to spark all the computations, and then switch between
them - which will mean I've got more than N things running in
parallel.

> i think the only way to solve this problem is to create one more
> thread each time. let's see: on every call to para you need to alloc
> one thread to wait for jobs completion. so on each nested call to para
> you have minus one worker thread. finally you will eat them all!
>
> so you need to make fork: one thread should serve jobs and another one
> wait for completion of this jobs bucket. and with killItself flag you
> will finish superfluous thread JIT

You are right, your previous solution was running at N-1 threads if
the order was a little unlucky. I've attached a new version which I
think gives you N threads always executing at full potential. It's
basically your idea from the last post, with the main logic being:

parallel_ (x1:xs) = do
    sem <- newQSem $ 1 - length xs
    forM_ xs $ \x ->
        writeChan queue (x >> signalQSem sem, False)
    x1
    addWorker
    waitQSem sem
    writeChan queue (signalQSem sem, True)
    waitQSem sem

Where the second flag being True = kill, as you suggested. I think
I've got the semaphore logic right - anyone want to see if I missed
something?

With this new version running 1000000 items takes ~1 second, instead
of ~10 seconds before, so an order of magnitude improvement, and
greater fairness. Very nice, thanks for all the help!

Thanks

Neil

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Parallel3.hs (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re[2]: Parallel combinator, performance advice

Bulat Ziganshin-2
In reply to this post by Neil Mitchell
Hello Neil,

Tuesday, April 7, 2009, 6:13:29 PM, you wrote:

> Calls to System.Cmd.system, i.e. running external console processes.
> It's a make system I'm writing, so virtually all the time is spent in
> calls to ghc etc.

btw, if all that you need is to limit amount of simultaneous
System.Cmd.system calls, you may go from opposite side: wrap this call
into semaphore:

sem = unsafePerformIO$ newQSem numCapabilities

mysystem = bracket_ (waitQSem sem) (signalQSem sem) . system

and implement para as simple thread population:

para = mapM_ forkIO


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[4]: Parallel combinator, performance advice

Bulat Ziganshin-2
In reply to this post by Neil Mitchell
Hello Neil,

Tuesday, April 7, 2009, 7:33:25 PM, you wrote:
>> How about using unsafeInterleaveIO to get a lazy suspension of the result of each action,
>>  and then using par to spark off each of them? If that works you can reuse the existing
>> task-parallel system of GHC to do the heavily lifting for you, instead of having to write your
>> own.

> par is likely to spark all the computations, and then switch between
> them - which will mean I've got more than N things running in
> parallel.

par/GHC RTS limits amount of Haskell threads running simultaneously.
with a system call marked as safe, Capability will be freed while we
execute external program so nothing will be limited except for amount
of tasks *starting* (as opposite to running) simultaneously :)))


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re[2]: Parallel combinator, performance advice

Neil Mitchell
In reply to this post by Bulat Ziganshin-2
Hi Bulat,

> btw, if all that you need is to limit amount of simultaneous
> System.Cmd.system calls, you may go from opposite side: wrap this call
> into semaphore:
>
> sem = unsafePerformIO$ newQSem numCapabilities
>
> mysystem = bracket_ (waitQSem sem) (signalQSem sem) . system
>
> and implement para as simple thread population:
>
> para = mapM_ forkIO

My main motivation is to limit the number of system calls, but it's
also useful from a user point of view if the system is doing a handful
of things at a time - it makes it easier to track what's going on.

I might try that tomorrow and see if it makes a difference to the
performance. While the majority of computation is in system calls,
quite a few of the threads open files etc, and having them all run in
parallel would end up with way too many open handles etc.

Thanks

Neil
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[4]: Parallel combinator, performance advice

Bulat Ziganshin-2
In reply to this post by Neil Mitchell
Hello Neil,

Tuesday, April 7, 2009, 7:33:25 PM, you wrote:

> parallel_ (x1:xs) = do
>     sem <- newQSem $ 1 - length xs
>     forM_ xs $ \x ->
>         writeChan queue (x >> signalQSem sem, False)
>     x1
>     addWorker
>     waitQSem sem
>     writeChan queue (signalQSem sem, True)
>     waitQSem sem

> Where the second flag being True = kill, as you suggested. I think
> I've got the semaphore logic right - anyone want to see if I missed
> something?

Neil, executing x1 directly in parallel_ is incorrect idea. you should
have N worker threads, not N-1 threads plus one job executed in main
thread. imagine that you have 1000 jobs and N=4. that you will got
here is 3 threads each executed 333 jobs and 1 job executed by main
thread

so you still need to detach one more worker job and finish it just
before we are ready to finish waiting for QSem and continue in main
thread which is sole reason why we need killItself flag. in this code
snipped this flag is completely useless, btw


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[4]: Parallel combinator, performance advice

Bulat Ziganshin-2
In reply to this post by Neil Mitchell
Hello Neil,

Tuesday, April 7, 2009, 7:47:17 PM, you wrote:

>> para = mapM_ forkIO

> I might try that tomorrow and see if it makes a difference to the
> performance. While the majority of computation is in system calls,
> quite a few of the threads open files etc, and having them all run in
> parallel would end up with way too many open handles etc.

if you have too much threads, you may replace forkIO with one more
QSem-enabled call:

semIO = unsafePerformIO$ newQSem 100

myForkIO = bracket_ (waitQSem semIO) (signalQSem semIO) . forkIO

this limit may be much higher than for System.Cmd.system


or you may go further and replace it with thread pool approach. the
main problem behind is raw calls to forkIO since these increases
amount of threads capable to call System.Cmd.system without any
control from us



--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[5]: Parallel combinator, performance advice

Bulat Ziganshin-2
In reply to this post by Bulat Ziganshin-2
Hello Bulat,

Tuesday, April 7, 2009, 7:50:08 PM, you wrote:

>> parallel_ (x1:xs) = do
>>     sem <- newQSem $ 1 - length xs
>>     forM_ xs $ \x ->
>>         writeChan queue (x >> signalQSem sem, False)
>>     x1
>>     addWorker
>>     waitQSem sem
>>     writeChan queue (signalQSem sem, True)
>>     waitQSem sem

> Neil, executing x1 directly in parallel_ is incorrect idea.

forget this. but it still a bit suboptimal: after everything was
finished, we schedule one more empty job and wait while some worker
thread will pick up it. it will go into Chan after all jobs scheduled
at the time our jobs was executed so that we are doing here is
eventually don't do any internal activity while we have all N external
programs running

instead, my solution packed this flag together with last job so once
last job is finished we are immediately returned from parallel_ so
other internal activity may go on

--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re[5]: Parallel combinator, performance advice

Neil Mitchell
Hi

> par is likely to spark all the computations, and then switch between
> them - which will mean I've got more than N things running in
> parallel.

| par/GHC RTS limits amount of Haskell threads running simultaneously.
| with a system call marked as safe, Capability will be freed while we
| execute external program so nothing will be limited except for amount
| of tasks *starting* (as opposite to running) simultaneously :)))

Yeah, I misspoke - I want to avoid starting N things.

>>> parallel_ (x1:xs) = do
>>>     sem <- newQSem $ 1 - length xs
>>>     forM_ xs $ \x ->
>>>         writeChan queue (x >> signalQSem sem, False)
>>>     x1
>>>     addWorker
>>>     waitQSem sem
>>>     writeChan queue (signalQSem sem, True)
>>>     waitQSem sem
>
>> Neil, executing x1 directly in parallel_ is incorrect idea.

It's a very slight optimisation, as it saves us queueing and
dequeueing x1, since we know the worker we're about the spawn on the
line below will grab x1 immediately.

> forget this. but it still a bit suboptimal: after everything was
> finished, we schedule one more empty job and wait while some worker
> thread will pick up it. it will go into Chan after all jobs scheduled
> at the time our jobs was executed so that we are doing here is
> eventually don't do any internal activity while we have all N external
> programs running
>
> instead, my solution packed this flag together with last job so once
> last job is finished we are immediately returned from parallel_ so
> other internal activity may go on

There is no guarantee that the last job finishes last. If the first
job takes longer than the last job we'll be one thread short while
waiting on the first job. It's a shame, since removing that additional
writeChan isn't particularly useful.

Thanks

Neil
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[6]: Parallel combinator, performance advice

Bulat Ziganshin-2
In reply to this post by Bulat Ziganshin-2
Hello Bulat,

Tuesday, April 7, 2009, 8:10:43 PM, you wrote:

>>> parallel_ (x1:xs) = do
>>>     sem <- newQSem $ 1 - length xs
>>>     forM_ xs $ \x ->
>>>         writeChan queue (x >> signalQSem sem, False)
>>>     x1
>>>     addWorker
>>>     waitQSem sem
>>>     writeChan queue (signalQSem sem, True)
>>>     waitQSem sem

>> Neil, executing x1 directly in parallel_ is incorrect idea.

> forget this. but it still a bit suboptimal...

i think i realized why you use this schema. my solution may lead to
N-1 worker threads in the system if last job is too small - after its
execution we finish one thread and have just N-1 working threads until
parallel_ will be finished

but problem i mentioned in previous letter may also take place
although it looks like less important. we may solve both problems by
allowing worker thread to actively select its death time: it should
die only at the moment when *last* job in bucket was finished - this
guarantees us exactly N worker threads at any time. so:

parallel_ (x1:xs) = do
    sem <- newQSem $ - length xs
    jobsLast <- newMVar (length xs)
    addWorker
    forM_ (x1:xs) $ \x -> do
        writeChan queue $ do
           x
           signalQSem sem
           modifyMVar jobsLast $ \jobs -> do
               return (jobs-1, jobs==0)
    --
    waitQSem sem

   
and modify last 3 lines of addWorker:

addWorker :: IO ()
addWorker = do
    forkIO $ f `E.catch` \(e :: SomeException) ->
        throwTo mainThread $ ErrorCall "Control.Concurrent.Parallel: parallel thread died."
    return ()
    where
        f :: IO ()
        f = do
            act <- readChan queue
            kill <- act
            unless kill f



--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re[6]: Parallel combinator, performance advice

Neil Mitchell
Hi

>>>> parallel_ (x1:xs) = do
>>>>     sem <- newQSem $ 1 - length xs
>>>>     forM_ xs $ \x ->
>>>>         writeChan queue (x >> signalQSem sem, False)
>>>>     x1
>>>>     addWorker
>>>>     waitQSem sem
>>>>     writeChan queue (signalQSem sem, True)
>>>>     waitQSem sem
>
>>> Neil, executing x1 directly in parallel_ is incorrect idea.
>
>> forget this. but it still a bit suboptimal...
>
> i think i realized why you use this schema. my solution may lead to
> N-1 worker threads in the system if last job is too small - after its
> execution we finish one thread and have just N-1 working threads until
> parallel_ will be finished
>
> but problem i mentioned in previous letter may also take place
> although it looks like less important. we may solve both problems by
> allowing worker thread to actively select its death time: it should
> die only at the moment when *last* job in bucket was finished - this
> guarantees us exactly N worker threads at any time. so:
>
> parallel_ (x1:xs) = do
>    sem <- newQSem $ - length xs
>    jobsLast <- newMVar (length xs)
>    addWorker
>    forM_ (x1:xs) $ \x -> do
>        writeChan queue $ do
>           x
>           signalQSem sem
>           modifyMVar jobsLast $ \jobs -> do
>               return (jobs-1, jobs==0)

Yes, this saves us adding a kill job to the queue, but requires an
extra MVar. I guess which one of these is to be preferred depends on
performance measures.

I've just found that QSem _ISN'T_ valid below 0, so the above code
won't actually work with QSem as it stands. I've reported this on
ghc-users@. I've also spotted that QSem creates plenty of MVar's
itself, so the logic of moving to QSem instead of MVar isn't really
valid (although the new approach is nicer, so that's good).

Thanks

Neil
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
12