Design discussion for atomic primops to land in 7.8

Design discussion for atomic primops to land in 7.8

Ryan Newton
There's a ticket that describes the design here:
    http://ghc.haskell.org/trac/ghc/ticket/8157#comment:1
It is a fairly simple extension of the casMutVar# primop that has been in
GHC since 7.2.  The implementation is currently on the `atomics` branch.
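
For concreteness, here is a minimal sketch of the style of code that
casMutVar# already supports: a lock-free modify loop over the MutVar# inside
an IORef.  The wrapper and the flag convention (0# meaning the swap
succeeded) are based on a reading of the current primop documentation, so
treat the details as illustrative rather than normative:

    {-# LANGUAGE MagicHash, UnboxedTuples #-}
    module CasSketch (atomicModifyViaCAS) where

    import GHC.Exts  (casMutVar#, readMutVar#)
    import GHC.IO    (IO (..))
    import GHC.IORef (IORef (..))
    import GHC.STRef (STRef (..))

    -- Apply a pure function to the contents of an IORef using only
    -- compare-and-swap, retrying whenever another thread wins the race.
    atomicModifyViaCAS :: IORef a -> (a -> a) -> IO ()
    atomicModifyViaCAS (IORef (STRef mv)) f = IO loop
      where
        loop s0 =
          case readMutVar# mv s0 of
            (# s1, old #) ->
              case casMutVar# mv old (f old) s1 of
                (# s2, 0#, _ #) -> (# s2, () #)   -- 0# = swap succeeded
                (# s2, _,  _ #) -> loop s2        -- lost the race; retry

The proposal in the ticket is a fairly simple extension of this style of
operation; the sketch above is only meant to show the baseline we are
starting from.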

Feel free to add your views either here or on that task's comments.

One example of an alternative design would be Carter's proposal to expose
something closer to the full LLVM concurrency ops
(http://llvm.org/docs/Atomics.html):

Carter Schonwald <carter.schonwald at gmail.com> wrote:

> I'm kinda thinking that we should do the analogue of exposing all the
> different memory-model-level choices (because it's not that hard to add
> that), and when the person building it has an old version of GCC, it falls
> back to the legacy atomic operations?
>
> This also gives a nice path for how to upgrade to the inline asm approach.

These LLVM ops include many parameterized configurations of loads, stores,
cmpxchg, atomicrmw, and barriers.  In fact, LLVM implements much more than
is natively supported by most hardware, but it provides a uniform
abstraction.
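
To make the "big set" concrete, here is a purely hypothetical sketch of what
an LLVM-mirroring interface might look like if it were surfaced at the
Haskell level.  None of these names are taken from GHC or any existing
library; they are invented for this sketch, only to show how many knobs
(orderings, read-modify-write flavours, separate success/failure orderings
for cmpxchg) such an interface drags in:

    -- Hypothetical only: nothing below exists in GHC today.
    module BigSetSketch where

    import Foreign.Ptr (Ptr)

    -- Mirrors LLVM's atomic ordering lattice (NotAtomic omitted).
    data MemoryOrder
      = Unordered | Monotonic | Acquire | Release | AcqRel | SeqCst
      deriving (Eq, Show)

    -- A subset of the atomicrmw flavours LLVM supports.
    data RMWOp = RMWXchg | RMWAdd | RMWSub | RMWAnd | RMWOr | RMWXor
      deriving (Eq, Show)

    -- One instance per element type/width we would be willing to support.
    -- Fences/barriers would be yet another family of operations.
    class AtomicElt a where
      atomicLoad  :: MemoryOrder -> Ptr a -> IO a
      atomicStore :: MemoryOrder -> Ptr a -> a -> IO ()
      atomicRMW   :: MemoryOrder -> RMWOp -> Ptr a -> a -> IO a
      -- cmpxchg takes separate success/failure orderings, as in LLVM.
      atomicCAS   :: MemoryOrder -> MemoryOrder
                  -> Ptr a -> a -> a -> IO (Bool, a)

Even before worrying about which orderings are legal for which operations,
that is a lot of surface area to specify, implement twice (LLVM backend and
NCG), and test.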

My original thought was that any kind of abstraction like that would be
built and maintained as a Haskell library, and only the most rudimentary
operations (required to get access to processor features) would be exposed
as primops.  Let's call this the "small" set of concurrency ops.

If we want the "big set", I think we're doomed to *reproduce* the logic that
maps LLVM concurrency abstractions onto machine ops, irrespective of whether
those abstractions are implemented as Haskell functions or as primops:

   - If the former, then the Haskell library must map the full set of ops
   to the reduced small set, just like LLVM does internally (sketched below).
   - If we instead have a large set of LLVM-isomorphic primops, then
   supporting the same primops *in the native code backend* will, again,
   require reimplementing all configurations of all operations.

Unless... we want to make concurrency ops something that requires the LLVM
backend?
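
As a concrete illustration of the first bullet above, here is roughly what
the library-side mapping looks like for a single rich operation: an
LLVM-style atomicrmw add rebuilt as a retry loop over a plain
compare-and-swap.  The smallCas stand-in is modelled with atomicModifyIORef'
purely so the sketch runs; a real primop would be a single word-level CAS
(and would compare pointers or machine words rather than using Eq):

    module SmallSetSketch where

    import Data.IORef (IORef, atomicModifyIORef', readIORef)

    -- Stand-in for a small CAS primop.  Modelled on an IORef only so the
    -- example runs; a real primop would be one hardware cmpxchg.
    smallCas :: Eq a => IORef a -> a -> a -> IO Bool
    smallCas ref expected new =
      atomicModifyIORef' ref $ \cur ->
        if cur == expected then (new, True) else (cur, False)

    -- An LLVM-style `atomicrmw add`, expressed as a CAS retry loop.  The
    -- library layer would need a loop like this (plus whatever ordering
    -- logic applies) for every rich operation it re-exports.
    fetchAdd :: IORef Int -> Int -> IO Int
    fetchAdd ref n = loop
      where
        loop = do
          old <- readIORef ref
          ok  <- smallCas ref old (old + n)
          if ok then return old else loop

Multiply that by every combination of operation, width, and ordering in the
LLVM table and you get a feel for how much of LLVM's internal lowering logic
the library (or the NCG) would end up duplicating.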

Right now there is not a *performance* disadvantage to supporting a smaller
rather than a larger set of concurrency ops (LLVM has to emulate these
things anyway, or "round up" to more expensive ops).  The scenario where it
would be good to target ALL of LLVM's interface would be if processors and
LLVM improved in the future, and we automatically got the benefit of better
HW support for some op on some arch.

I'm a bit skeptical of that proposition itself, however.  I personally
don't really like a world where we program with "virtual operations" that
don't really exist (and thus can't properly be *tested* against).  Absent
formal verification, it seems hard to get this code right anyway, and errors
will be undetectable on existing architectures.

  -Ryan

Design discussion for atomic primops to land in 7.8

Carter Schonwald
Hey Ryan,
You raise some very good points.

The most important point you raise (I think) is this: it would be very nice,
where feasible, to add analogous machinery to the native code gen, so that
it's not falling behind the LLVM backend quite as much.

At least for these atomic operations (unlike the SIMD ones), it may be worth
investigating what's needed to add them to the native code gen as well.

(Adding SIMD support to the native codegen would be nice too, but probably
*substantially* more work.)

Design discussion for atomic primops to land in 7.8

Ryan Newton
Well, what's the long-term plan?  Is the LLVM backend going to become the
only backend at some point?

Design discussion for atomic primops to land in 7.8

Ben Lippmeier-2

On 23/08/2013, at 3:52 AM, Ryan Newton wrote:

> Well, what's the long term plan?  Is the LLVM backend going to become the only backend at some point?

I wouldn't argue against ditching the NCG entirely. It's hard to justify fixing NCG performance problems when fixing them won't make the NCG faster than LLVM, and everyone uses LLVM anyway.

We're going to need more and more SIMD support when processors supporting the Larrabee New Instructions (LRBni) appear on people's desks. At that time there still won't be a good enough reason to implement those instructions in the NCG.

Ben.

Design discussion for atomic primops to land in 7.8

Niklas Larsson
2013/8/26 Ben Lippmeier <benl at ouroborus.net>:

> We're going to need more and more SIMD support when processors supporting
> the Larrabee New Instructions (LRBni) appear on people's desks. At that
> time there still won't be a good enough reason to implement those
> instructions in the NCG.
I hope to implement SIMD support for the native code gen soon. It's not a
huge task and having feature parity between LLVM and NCG would be good.

Niklas


Design discussion for atomic primops to land in 7.8

John Lato-2
In reply to this post by Ben Lippmeier-2
On Sun, Aug 25, 2013 at 11:01 PM, Ben Lippmeier <benl at ouroborus.net> wrote:

> On 23/08/2013, at 3:52 AM, Ryan Newton wrote:
>
> > Well, what's the long term plan?  Is the LLVM backend going to become
> > the only backend at some point?
>
> I wouldn't argue against ditching the NCG entirely. It's hard to justify
> fixing NCG performance problems when fixing them won't make the NCG faster
> than LLVM, and everyone uses LLVM anyway.

This is not true.  I don't believe I've ever seen the LLVM backend compile
more quickly than the NCG; it usually takes significantly longer, and for
at least some (most?) projects it produces worse output.

I don't have anything against the LLVM backend in principle [1], but at
present it's not as good as the NCG for us.

> We're going to need more and more SIMD support when processors supporting
> the Larrabee New Instructions (LRBni) appear on people's desks. At that
> time there still won't be a good enough reason to implement those
> instructions in the NCG.


How about that the NCG is better than LLVM? ;)

In all seriousness, I'm quite sympathetic to the desire to support only one
backend, and LLVM can offer a lot (SIMD fallbacks, target architectures,
etc.).  But at present, in my experience the LLVM backend doesn't really
live up to what I've seen claimed for it.  Given that, I think it's a bit
premature to talk of dropping the NCG.

My $0.02,
John

[1] OK, I do have one issue with LLVM.  It's always struck me as very
brittle, with a lot of breakages between versions.  Given that I just tried
ghc -fllvm with LLVM 3.3 and the compiler bailed out due to a bad object
file, my impression of brittleness doesn't seem likely to change any time
soon.  And since LLVM releases major versions predictably often, I don't
know that I want GHC devs spending time chasing after them.  But in
principle it seems the right thing to do.

Design discussion for atomic primops to land in 7.8

Ben Lippmeier-2
In reply to this post by Niklas Larsson

> I hope to implement SIMD support for the native code gen soon. It's not a huge task and having feature parity between LLVM and NCG would be good.

Will you also update the SIMD support, register allocators, and calling conventions in 2015 when AVX-512 lands on the desktop? On all supported platforms? What about support for the x86 vcompress and vexpand instructions with mask registers? What about when someone finally asks for packed conversions between 16xWord8s and 16xFloat32s where you need to split the result into four separate registers? LLVM does that automatically.

I've been down this path before. In 2007 I implemented a separate graph colouring register allocator in the NCG to supposedly improve GHC's numeric performance, but the LLVM backend subsumed that work and now having two separate register allocators is more of a maintenance burden than a help to anyone. At the time, LLVM was just becoming well known, so it wasn't obvious that implementing a new register allocator was largely a redundant piece of work -- but I think it's clear now. I was happy to work on the project at the time, and I learned a lot from it, but when starting new projects now I also try to imagine the system that will replace the one I'm dreaming of.

Of course, you should do what interests you -- I'm just pointing out a strategic consideration.

Ben


Design discussion for atomic primops to land in 7.8

Simon Marlow-7
On 26/08/13 08:17, Ben Lippmeier wrote:

> Of course, you should do what interests you -- I'm just pointing out a
> strategic consideration.

The existence of LLVM is definitely an argument not to put any more
effort into backend optimisation in GHC, at least for those
optimisations that LLVM can already do.

But as for whether the NCG is needed at all - there are a few ways that
the LLVM backend needs to be improved before it can be considered a
complete replacement for the NCG:

1. Compilation speed.  LLVM approximately doubles compilation time.
Avoiding going via the textual intermediate syntax would probably help here.

2. Shared library support (#4210, #5786).  It works (or worked?) on a
couple of platforms.  But even on those platforms it generated worse
code than the NCG due to using dynamic references for *all* symbols,
whereas the NCG knows which symbols live in a separate package and need
to use dynamic references.

3. Some low-level optimisation problems (#4308, #5567).  The LLVM
backend generates bad code for certain critical bits of the runtime,
perhaps due to lack of good aliasing information.  This hasn't been
revisited in the light of the new codegen, so perhaps it's better now.

Someone should benchmark the LLVM backend against the NCG with the new
codegen in GHC 7.8.  It's possible that the new codegen is giving the NCG
a slight boost because it doesn't have to split up proc points, so it can
do better code generation for let-no-escapes. (It's also possible that
LLVM is being penalised a bit for the same reason - I spent more time
peering at NCG-generated code than LLVM-generated code.)

These are some good places to start if you want to see GHC drop the NCG.

Cheers,
        Simon

Design discussion for atomic primops to land in 7.8

Austin Seipp-4
To do this, IMO we'd also really have to start shipping our own copy
of LLVM. The current situation (use whatever is configured or in
$PATH) won't really remain feasible later on.

On platforms like ARM where there is no NCG, the mismatches can become
super painful, and it makes depending on certain features of the IR or
compiler toolchain (like an advanced, ISA-aware vectorizer in LLVM
3.3+) way more difficult, aside from being a management nightmare.

Fixing it does require taking a hit on things like build times,
though. Or we could use binary releases, but we occasionally may want
to tweak and/or fix things. If we ship our own LLVM for example, it's
reasonable to assume sometime in the future we'll want to change the
ABI during a release.

This does bring other benefits. Max Bolingbroke had an old alias
analysis plugin for LLVM that made a noticeable improvement on certain
kinds of programs, but shipping it against an arbitrary LLVM is
infeasible. Stuff like this could now be possible too.

In a way, I think there's some merit to having a simple, integrated
code generator that does the correct thing, with a high-performance
option as we have now. LLVM is a huge project, and there's definitely
some part of me that thinks this may not lower our complexity budget
as much as we think, only shift parts of it around ('second rate'
platforms like PPC/ARM expose way more bugs in my experience, and
tracking them across such a massive surface area can be quite
difficult). It's very stable and well tested, but an unequivocal
dependency on hundreds of thousands of lines of deeply complex code is
a big question no matter what.

But the current NCG isn't that 'simple correct thing' either. I think
it's easily one of the least understood parts of the compiler, with a
long history; it's rarely refactored or modified (very unlike other
parts), and it's maintained only as necessary. Which doesn't bode well
for its future in any case.



--
Regards,
Austin - PGP: 4096R/0x91384671



Design discussion for atomic primops to land in 7.8

Ben Lippmeier-2

I've collected the main points of this discussion on the wiki.

http://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/LLVM/ReplacingNCG

Ben.
