Concurrency performance problem

Łukasz Dąbek
Hello Cafe!

I have a problem with the following code: http://hpaste.org/83460. It is a
simple Monte Carlo integration. The problem is that when I run my
program with +RTS -N1 I get:
Multi
693204.039020917 8.620632s
Single
693204.039020917 8.574839s
End

And with +RTS -N4 (I have four CPU cores):
Multi
693204.0390209169 11.877143s
Single
693204.039020917 11.399888s
End

I have two questions:
 1) Why does performance decrease when I add more cores to my program?
 2) Why does the performance of the single-threaded integration also
change with the number of cores?

Thanks for any answers,
Łukasz Dąbek.
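The source at http://hpaste.org/83460 is not reproduced in this thread. For readers without it, here is a rough, hypothetical sketch of what a simple single-threaded Monte Carlo integration in Haskell can look like: the integral of f over [a, b] is estimated as (b - a) times the mean of f at n uniformly drawn points. The names monteCarlo, nextSeed and toUnit are invented for illustration, and a hand-rolled 64-bit linear congruential generator stands in for System.Random so the example needs only base.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Bits (shiftR)
import Data.Word (Word64)

-- Hypothetical 64-bit linear congruential generator (Knuth's MMIX
-- constants), used instead of System.Random so only base is needed.
-- Word64 arithmetic wraps around, which gives the "mod 2^64" for free.
nextSeed :: Word64 -> Word64
nextSeed s = 6364136223846793005 * s + 1442695040888963407

-- Turn a seed into a Double in [0, 1) using its top 53 bits.
toUnit :: Word64 -> Double
toUnit s = fromIntegral (s `shiftR` 11) / 9007199254740992 -- 2^53

-- Estimate the integral of f over [a, b] as (b - a) * mean of f at n
-- uniformly distributed sample points, starting from seed0.
monteCarlo :: (Double -> Double) -> Double -> Double -> Int -> Word64 -> Double
monteCarlo f a b n seed0 = (b - a) * go 0 seed0 0 / fromIntegral n
  where
    -- Strict loop: the bang patterns stop acc from becoming a chain of thunks.
    go :: Int -> Word64 -> Double -> Double
    go !i !s !acc
      | i >= n = acc
      | otherwise =
          let s' = nextSeed s
              x = a + (b - a) * toUnit s'
           in go (i + 1) s' (acc + f x)

main :: IO ()
main = print (monteCarlo (\x -> x * x) 0 1 1000000 42)
```

For x * x on [0, 1] the printed estimate should be close to 1/3.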

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Concurrency performance problem

Don Stewart

Depends on your code...

Re: Concurrency performance problem

Łukasz Dąbek
What exactly do you mean? I have included a link to the full source listing:
http://hpaste.org/83460.

--
Łukasz Dąbek

Re: Concurrency performance problem

Don Stewart

Apologies, didn't see the link on my phone :)

As the comment on the link shows, you're accidentally migrating unevaluated work to the main thread, hence no speedup.

Be very careful with evaluation strategies (esp. lazy expressions) around MVar and TVar points. It's too easy to put a thunk in one.
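A minimal sketch of the pitfall described above, assuming a hypothetical CPU-bound function work in place of the original integration code. In lazyWorker, putMVar stores the unevaluated expression work k, so the forked thread finishes almost immediately and whichever thread later calls takeMVar does the real computation. strictWorker forces the value with Control.Exception.evaluate inside the worker before publishing it. To see any parallel speedup, the program has to be built with -threaded and run with +RTS -N.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Control.Monad (forM)

-- Hypothetical CPU-bound job standing in for one chunk of the integration.
work :: Int -> Double
work k = sum [fromIntegral ((i * k) `mod` 7) | i <- [1 .. 1000000 :: Int]]

-- Pitfall: putMVar stores the thunk `work k`; the worker returns at once
-- and the thread that takes the value ends up doing the work.
lazyWorker :: MVar Double -> Int -> IO ()
lazyWorker mv k = putMVar mv (work k)

-- Fix: evaluate the result in the worker thread before handing it over.
strictWorker :: MVar Double -> Int -> IO ()
strictWorker mv k = evaluate (work k) >>= putMVar mv

-- Fork one worker per job and sum the results in the calling thread.
runJobs :: (MVar Double -> Int -> IO ()) -> [Int] -> IO Double
runJobs worker ks = do
  vars <- forM ks $ \k -> do
    mv <- newEmptyMVar
    _ <- forkIO (worker mv k)
    return mv
  results <- mapM takeMVar vars
  return (sum results)

main :: IO ()
main = do
  lazyTotal <- runJobs lazyWorker [1 .. 4]
  strictTotal <- runJobs strictWorker [1 .. 4]
  -- Same answer either way; only the thread doing the computation differs.
  print (lazyTotal == strictTotal)
```

putMVar mv $! work k is an equivalent, shorter way to write strictWorker.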

The strict-concurrency package is one attempt to invert the conventional lazy box, to better match the most common case.
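The code below does not use the strict-concurrency package; it is only a hypothetical illustration of the "strict box" idea: a newtype around MVar whose put operation forces its argument before storing it, so a thunk cannot be handed to another thread by accident. The names StrictMVar, newEmptyStrictMVar, putStrictMVar and takeStrictMVar are invented for this sketch and are not that package's API.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Hypothetical strict box: values can only go in through putStrictMVar,
-- which forces them to weak head normal form first.
newtype StrictMVar a = StrictMVar (MVar a)

newEmptyStrictMVar :: IO (StrictMVar a)
newEmptyStrictMVar = StrictMVar <$> newEmptyMVar

-- ($!) evaluates x to WHNF in the calling thread before putMVar runs.
putStrictMVar :: StrictMVar a -> a -> IO ()
putStrictMVar (StrictMVar mv) x = putMVar mv $! x

takeStrictMVar :: StrictMVar a -> IO a
takeStrictMVar (StrictMVar mv) = takeMVar mv

main :: IO ()
main = do
  box <- newEmptyStrictMVar
  -- The sum is computed by the forked thread, not by main.
  _ <- forkIO (putStrictMVar box (sum [1 .. 1000000 :: Int]))
  takeStrictMVar box >>= print
```

Like seq, ($!) only reaches weak head normal form: for an Int or Double that is the whole value, but a list or tuple stored this way can still contain thunks, and forcing those fully needs something like force from the deepseq package.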

Re: Concurrency performance problem

Łukasz Dąbek
Thank you for your help! This solved my performance problem :)

Anyway, the second question remains. Why is the performance of the
single-threaded calculation affected by the RTS -N parameter? Is GHC
doing some parallelization behind the scenes?

--
Łukasz Dąbek.

Re: Concurrency performance problem

Johan Tibell
On Mon, Mar 4, 2013 at 11:39 AM, Łukasz Dąbek <[hidden email]> wrote:
> Thank you for your help! This solved my performance problem :)
>
> Anyway, the second question remains. Why is the performance of the
> single-threaded calculation affected by the RTS -N parameter? Is GHC
> doing some parallelization behind the scenes?

I believe it's because -N makes GHC use the threaded RTS, which is different from the non-threaded RTS and therefore has some overhead.
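The threaded and non-threaded runtimes are separate RTS variants selected when the program is linked (the -threaded flag), and a program can ask which one it got through rtsSupportsBoundThreads from Control.Concurrent. A minimal sketch; Main.hs in the comments is a placeholder file name.

```haskell
import Control.Concurrent (rtsSupportsBoundThreads)

-- True when this executable was linked against the threaded RTS.
-- Typical builds to compare (Main.hs is a placeholder file name):
--   ghc -O2 -rtsopts Main.hs             non-threaded RTS
--   ghc -O2 -threaded -rtsopts Main.hs   threaded RTS; only this one supports +RTS -N
rtsReport :: String
rtsReport = "threaded RTS: " ++ show rtsSupportsBoundThreads

main :: IO ()
main = putStrLn rtsReport
```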

Re: Concurrency performance problem

Łukasz Dąbek
2013/3/4 Johan Tibell <[hidden email]>:
> I believe it's because -N makes GHC use the threaded RTS, which is different
> from the non-threaded RTS and therefore has some overhead.

That's interesting. Can you recommend some reading material about
this? Besides the GHC source, of course ;) An explanation of why the
decrease in performance is proportional to the number of cores would be great.

--
Łukasz Dąbek

Re: Concurrency performance problem

Edward Z. Yang
In reply to this post by Łukasz Dąbek
If you just pass -N, GHC automatically sets the number of threads
based on the number of cores on your machine. Do you mean -threaded?
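A small sketch for checking what a bare -N did on a given machine, assuming a binary built with -threaded -rtsopts: getNumProcessors (from GHC.Conc) reports the processors the RTS can see, and getNumCapabilities reports how many capabilities are actually in use. Running it as ./Main +RTS -N, the two would normally agree, while +RTS -N1 (or a non-threaded build) gives one capability.

```haskell
import Control.Concurrent (getNumCapabilities)
import GHC.Conc (getNumProcessors)

-- Compare the processor count seen by the RTS with the number of
-- capabilities (virtual processors that run Haskell threads) in use.
main :: IO ()
main = do
  procs <- getNumProcessors
  caps <- getNumCapabilities
  putStrLn ("processors: " ++ show procs ++ ", capabilities: " ++ show caps)
```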

Re: Concurrency performance problem

briand
In reply to this post by Łukasz Dąbek
On Mon, 4 Mar 2013 20:39:43 +0100
Łukasz Dąbek <[hidden email]> wrote:

> Thank you for your help! This solved my performance problem :)
>

Do you have a link to the new code?

It should be very instructive to see the differences.

Brian


Re: Concurrency performance problem

Łukasz Dąbek
2013/3/4  <[hidden email]>:
> Do you have a link to the new code?

The diff is at the bottom of the original code: http://hpaste.org/83460.

> If you just pass -N, GHC automatically sets the number of threads
> based on the number of cores on your machine.

Yes, I know that. I am just wondering why a seemingly single-threaded computation (look at singleThreadIntegrate in the source code from the first post) runs slower as the number of available cores (set through the -N option) increases.

--
Łukasz Dąbek


Re: Concurrency performance problem

Nathan Howell
Depends on the application, of course. The (on by default) parallel GC tends to kill performance for me... you might try running both with "+RTS -sstderr" to see if GC time is significantly higher, and try adding "+RTS -qg1" if it is.
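Besides reading the +RTS -sstderr summary, a program can query the same counters itself through GHC.Stats. A minimal sketch, assuming GHC 8.2 or later (where getRTSStats superseded the older getGCStats), a binary built with -rtsopts, and a run with +RTS -T or -s so that statistics are collected. The summarize workload is a hypothetical stand-in that keeps a large list alive so the GC has something to copy. Comparing GC CPU time against mutator CPU time under +RTS -N1, +RTS -N4 and +RTS -N4 -qg1 is one way to see whether the parallel collector is the cost.

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Hypothetical workload: xs is used twice, so the whole list must be
-- built and kept alive, which gives the garbage collector work to do.
summarize :: Int -> (Int, Int)
summarize n = (sum xs, length xs)
  where
    xs = [1 .. n]

main :: IO ()
main = do
  print (summarize 1000000)
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "statistics disabled: run with +RTS -T (or -s)"
    else do
      s <- getRTSStats
      -- RTSStats reports times in nanoseconds; convert to milliseconds.
      let ms ns = fromIntegral ns / 1e6 :: Double
      putStrLn ("GCs:         " ++ show (gcs s))
      putStrLn ("mutator CPU: " ++ show (ms (mutator_cpu_ns s)) ++ " ms")
      putStrLn ("GC CPU:      " ++ show (ms (gc_cpu_ns s)) ++ " ms")
```

Per the GHC user's guide, -qgN uses the parallel collector only for generation N and older, so -qg1 keeps minor (generation 0) collections sequential, while a bare -qg turns parallel GC off entirely.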


Re: Concurrency performance problem

Łukasz Dąbek
2013/3/5 Nathan Howell <[hidden email]>
> Depends on the application, of course. The (on by default) parallel GC tends to kill performance for me... you might try running both with "+RTS -sstderr" to see if GC time is significantly higher, and try adding "+RTS -qg1" if it is.
 
You are correct: the parallel GC is slowing the computation down. After some experiments I can produce two behaviors: either use the single-threaded GC (the multithreaded version is slowed down by a factor of 5, but the single-threaded one is back to normal) or increase the heap size (the multithreaded version slows down by a factor of 2, and the single-threaded version runs normally). I guess I must live with this ;)

--
Łukasz Dąbek

 

