LLVM calling convention for AVX2 and AVX512 registers

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

LLVM calling convention for AVX2 and AVX512 registers

Edward Kmett-2
Back around 2013, Geoff raised a discussion about fixing up the GHC ABI so that the LLVM calling convention could pass 256 bit vector types in YMM (and, i suppose now 512 bit vector types in ZMM).

As I recall, this was blocked by some short term concerns about which LLVM release was imminent or what have you. Four years on, the exact same sort of arguments could be dredged up, but yet in the meantime nobody is really using those types for anything.

This still creates a pain point around trying to use these wide types today. Spilling rather than passing them in registers adds a LOT of overhead to any attempt to use them that virtually erases any benefit to having them in the first place.

I started experimenting with writing some custom primops directly in llvm so I could do meaningful amounts of work with our SIMD vector types by just banging out the code that we can't write in haskell directly using llvm assembly, and hoping I could trick LLVM to do link time optimization to perhaps inline it, but I'm basically dead in the water over the overhead of our current calling convention, before I even start, it seems, as if we're spilling them there is no way that inlining / LTO could hope to figure out what we're doing out as part of the spill to erase that call entirely.

It is rather frustrating that I can't even cheat. =/

What do we need to do to finally fix this?

-Edward

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Geoffrey Mainland
We would need to get a patch to LLVM accepted to change the GHC calling
convention.

Now that we commit to a particular version of LLVM, this might be less
of an issue than it once was since we wouldn't have to support versions
of LLVM that didn't support the new calling convention.

So...how do we get a patch into LLVM? I believe I once had such a patch
ready to go...I will dig around for it, but the change is very small and
easily recreated.

It would be even better if we could *also* teach the native back end
about SSE instructions. Is there anyone who might be willing to work on
that?

Geoff

On 3/9/17 2:30 PM, Edward Kmett wrote:

> Back around 2013, Geoff raised a discussion about fixing up the GHC
> ABI so that the LLVM calling convention could pass 256 bit vector
> types in YMM (and, i suppose now 512 bit vector types in ZMM).
>
> As I recall, this was blocked by some short term concerns about which
> LLVM release was imminent or what have you. Four years on, the exact
> same sort of arguments could be dredged up, but yet in the meantime
> nobody is really using those types for anything.
>
> This still creates a pain point around trying to use these wide types
> today. Spilling rather than passing them in registers adds a LOT of
> overhead to any attempt to use them that virtually erases any benefit
> to having them in the first place.
>
> I started experimenting with writing some custom primops directly in
> llvm so I could do meaningful amounts of work with our SIMD vector
> types by just banging out the code that we can't write in haskell
> directly using llvm assembly, and hoping I could trick LLVM to do link
> time optimization to perhaps inline it, but I'm basically dead in the
> water over the overhead of our current calling convention, before I
> even start, it seems, as if we're spilling them there is no way that
> inlining / LTO could hope to figure out what we're doing out as part
> of the spill to erase that call entirely.
>
> It is rather frustrating that I can't even cheat. =/
>
> What do we need to do to finally fix this?
>
> -Edward

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Carter Schonwald
the patch is still on TRAC,


we need to do changes to both the 32bit and 64bit ABIs, and I think thats where I got stalled from lack of feedback

that aside:

heres the original email thread on the llvm commits thread 

and theres links from there to the iterating on the test suite plus the original patch

i'm more than happy to take a weekend to do the leg work, it was pretty fun last time.

BUT, we need to agree on what ABI to do, and make sure that those ABI changes dont create a performance regression for some unexpected reason.

On Thu, Mar 9, 2017 at 3:11 PM, Geoffrey Mainland <[hidden email]> wrote:
We would need to get a patch to LLVM accepted to change the GHC calling
convention.

Now that we commit to a particular version of LLVM, this might be less
of an issue than it once was since we wouldn't have to support versions
of LLVM that didn't support the new calling convention.

So...how do we get a patch into LLVM? I believe I once had such a patch
ready to go...I will dig around for it, but the change is very small and
easily recreated.

It would be even better if we could *also* teach the native back end
about SSE instructions. Is there anyone who might be willing to work on
that?

Geoff

On 3/9/17 2:30 PM, Edward Kmett wrote:
> Back around 2013, Geoff raised a discussion about fixing up the GHC
> ABI so that the LLVM calling convention could pass 256 bit vector
> types in YMM (and, i suppose now 512 bit vector types in ZMM).
>
> As I recall, this was blocked by some short term concerns about which
> LLVM release was imminent or what have you. Four years on, the exact
> same sort of arguments could be dredged up, but yet in the meantime
> nobody is really using those types for anything.
>
> This still creates a pain point around trying to use these wide types
> today. Spilling rather than passing them in registers adds a LOT of
> overhead to any attempt to use them that virtually erases any benefit
> to having them in the first place.
>
> I started experimenting with writing some custom primops directly in
> llvm so I could do meaningful amounts of work with our SIMD vector
> types by just banging out the code that we can't write in haskell
> directly using llvm assembly, and hoping I could trick LLVM to do link
> time optimization to perhaps inline it, but I'm basically dead in the
> water over the overhead of our current calling convention, before I
> even start, it seems, as if we're spilling them there is no way that
> inlining / LTO could hope to figure out what we're doing out as part
> of the spill to erase that call entirely.
>
> It is rather frustrating that I can't even cheat. =/
>
> What do we need to do to finally fix this?
>
> -Edward



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Carter Schonwald
zooming out: 

what *should* the new ABI be?

Ed was suggesting we make all 16 xmm/ymm/ lower 16 zmm registers (depending on how they're being used) caller save, 

(what about all 32 zmm registers? would they be float only, or also for ints/words? simd has lots of nice int support!)

a) if this doesn't cause any perf regressions i've no objections

b) currently we only support passing floats/doubles and simd vectors of , do we wanna support int/word data there too? (or are the GPR / general purpose registers enough for those? )

c) other stuff i'm probably overlooking

d) lets do this!

On Thu, Mar 9, 2017 at 3:31 PM, Carter Schonwald <[hidden email]> wrote:
the patch is still on TRAC,


we need to do changes to both the 32bit and 64bit ABIs, and I think thats where I got stalled from lack of feedback

that aside:

heres the original email thread on the llvm commits thread 

and theres links from there to the iterating on the test suite plus the original patch

i'm more than happy to take a weekend to do the leg work, it was pretty fun last time.

BUT, we need to agree on what ABI to do, and make sure that those ABI changes dont create a performance regression for some unexpected reason.

On Thu, Mar 9, 2017 at 3:11 PM, Geoffrey Mainland <[hidden email]> wrote:
We would need to get a patch to LLVM accepted to change the GHC calling
convention.

Now that we commit to a particular version of LLVM, this might be less
of an issue than it once was since we wouldn't have to support versions
of LLVM that didn't support the new calling convention.

So...how do we get a patch into LLVM? I believe I once had such a patch
ready to go...I will dig around for it, but the change is very small and
easily recreated.

It would be even better if we could *also* teach the native back end
about SSE instructions. Is there anyone who might be willing to work on
that?

Geoff

On 3/9/17 2:30 PM, Edward Kmett wrote:
> Back around 2013, Geoff raised a discussion about fixing up the GHC
> ABI so that the LLVM calling convention could pass 256 bit vector
> types in YMM (and, i suppose now 512 bit vector types in ZMM).
>
> As I recall, this was blocked by some short term concerns about which
> LLVM release was imminent or what have you. Four years on, the exact
> same sort of arguments could be dredged up, but yet in the meantime
> nobody is really using those types for anything.
>
> This still creates a pain point around trying to use these wide types
> today. Spilling rather than passing them in registers adds a LOT of
> overhead to any attempt to use them that virtually erases any benefit
> to having them in the first place.
>
> I started experimenting with writing some custom primops directly in
> llvm so I could do meaningful amounts of work with our SIMD vector
> types by just banging out the code that we can't write in haskell
> directly using llvm assembly, and hoping I could trick LLVM to do link
> time optimization to perhaps inline it, but I'm basically dead in the
> water over the overhead of our current calling convention, before I
> even start, it seems, as if we're spilling them there is no way that
> inlining / LTO could hope to figure out what we're doing out as part
> of the spill to erase that call entirely.
>
> It is rather frustrating that I can't even cheat. =/
>
> What do we need to do to finally fix this?
>
> -Edward




_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Edward Kmett-2
If we only turn on ymm and zmm for passing explicit 256bit and 512bit vector types then changing the ABI would have basically zero effect on any code anybody is actually using today. Everything would remain abi compatible unless it involves the new types that nobody is using. 

This also has the benefit that turning on avx2 or avx512 wouldn't change the calling convention of any code, making it much safer to link code compiled with it on with code compiled with it off. That seems like a big deal.

Moreover, if we start passing normal floats, etc. through them then our lack of shuffles and ways to get data in/out of them becomes quite a pain point.

As for passing int/word data, passing the vectors of them through the ymm and zmm registers should be sufficient for the same reasons.

-Edward

On Thu, Mar 9, 2017 at 3:55 PM, Carter Schonwald <[hidden email]> wrote:
zooming out: 

what *should* the new ABI be?

Ed was suggesting we make all 16 xmm/ymm/ lower 16 zmm registers (depending on how they're being used) caller save, 

(what about all 32 zmm registers? would they be float only, or also for ints/words? simd has lots of nice int support!)

a) if this doesn't cause any perf regressions i've no objections

b) currently we only support passing floats/doubles and simd vectors of , do we wanna support int/word data there too? (or are the GPR / general purpose registers enough for those? )

c) other stuff i'm probably overlooking

d) lets do this!

On Thu, Mar 9, 2017 at 3:31 PM, Carter Schonwald <[hidden email]> wrote:
the patch is still on TRAC,


we need to do changes to both the 32bit and 64bit ABIs, and I think thats where I got stalled from lack of feedback

that aside:

heres the original email thread on the llvm commits thread 

and theres links from there to the iterating on the test suite plus the original patch

i'm more than happy to take a weekend to do the leg work, it was pretty fun last time.

BUT, we need to agree on what ABI to do, and make sure that those ABI changes dont create a performance regression for some unexpected reason.

On Thu, Mar 9, 2017 at 3:11 PM, Geoffrey Mainland <[hidden email]> wrote:
We would need to get a patch to LLVM accepted to change the GHC calling
convention.

Now that we commit to a particular version of LLVM, this might be less
of an issue than it once was since we wouldn't have to support versions
of LLVM that didn't support the new calling convention.

So...how do we get a patch into LLVM? I believe I once had such a patch
ready to go...I will dig around for it, but the change is very small and
easily recreated.

It would be even better if we could *also* teach the native back end
about SSE instructions. Is there anyone who might be willing to work on
that?

Geoff

On 3/9/17 2:30 PM, Edward Kmett wrote:
> Back around 2013, Geoff raised a discussion about fixing up the GHC
> ABI so that the LLVM calling convention could pass 256 bit vector
> types in YMM (and, i suppose now 512 bit vector types in ZMM).
>
> As I recall, this was blocked by some short term concerns about which
> LLVM release was imminent or what have you. Four years on, the exact
> same sort of arguments could be dredged up, but yet in the meantime
> nobody is really using those types for anything.
>
> This still creates a pain point around trying to use these wide types
> today. Spilling rather than passing them in registers adds a LOT of
> overhead to any attempt to use them that virtually erases any benefit
> to having them in the first place.
>
> I started experimenting with writing some custom primops directly in
> llvm so I could do meaningful amounts of work with our SIMD vector
> types by just banging out the code that we can't write in haskell
> directly using llvm assembly, and hoping I could trick LLVM to do link
> time optimization to perhaps inline it, but I'm basically dead in the
> water over the overhead of our current calling convention, before I
> even start, it seems, as if we're spilling them there is no way that
> inlining / LTO could hope to figure out what we're doing out as part
> of the spill to erase that call entirely.
>
> It is rather frustrating that I can't even cheat. =/
>
> What do we need to do to finally fix this?
>
> -Edward





_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Siddhanathan Shanmugam
It would be even better if we could *also* teach the native back end about SSE instructions. Is there anyone who might be willing to work on that?

Yes. Though, it would be better if someone with more experience than me decides to pick this up instead.

On Thu, Mar 9, 2017 at 7:00 PM, Edward Kmett <[hidden email]> wrote:
If we only turn on ymm and zmm for passing explicit 256bit and 512bit vector types then changing the ABI would have basically zero effect on any code anybody is actually using today. Everything would remain abi compatible unless it involves the new types that nobody is using. 

This also has the benefit that turning on avx2 or avx512 wouldn't change the calling convention of any code, making it much safer to link code compiled with it on with code compiled with it off. That seems like a big deal.

Moreover, if we start passing normal floats, etc. through them then our lack of shuffles and ways to get data in/out of them becomes quite a pain point.

As for passing int/word data, passing the vectors of them through the ymm and zmm registers should be sufficient for the same reasons.

-Edward

On Thu, Mar 9, 2017 at 3:55 PM, Carter Schonwald <[hidden email]> wrote:
zooming out: 

what *should* the new ABI be?

Ed was suggesting we make all 16 xmm/ymm/ lower 16 zmm registers (depending on how they're being used) caller save, 

(what about all 32 zmm registers? would they be float only, or also for ints/words? simd has lots of nice int support!)

a) if this doesn't cause any perf regressions i've no objections

b) currently we only support passing floats/doubles and simd vectors of , do we wanna support int/word data there too? (or are the GPR / general purpose registers enough for those? )

c) other stuff i'm probably overlooking

d) lets do this!

On Thu, Mar 9, 2017 at 3:31 PM, Carter Schonwald <[hidden email]> wrote:
the patch is still on TRAC,


we need to do changes to both the 32bit and 64bit ABIs, and I think thats where I got stalled from lack of feedback

that aside:

heres the original email thread on the llvm commits thread 

and theres links from there to the iterating on the test suite plus the original patch

i'm more than happy to take a weekend to do the leg work, it was pretty fun last time.

BUT, we need to agree on what ABI to do, and make sure that those ABI changes dont create a performance regression for some unexpected reason.

On Thu, Mar 9, 2017 at 3:11 PM, Geoffrey Mainland <[hidden email]> wrote:
We would need to get a patch to LLVM accepted to change the GHC calling
convention.

Now that we commit to a particular version of LLVM, this might be less
of an issue than it once was since we wouldn't have to support versions
of LLVM that didn't support the new calling convention.

So...how do we get a patch into LLVM? I believe I once had such a patch
ready to go...I will dig around for it, but the change is very small and
easily recreated.

It would be even better if we could *also* teach the native back end
about SSE instructions. Is there anyone who might be willing to work on
that?

Geoff

On 3/9/17 2:30 PM, Edward Kmett wrote:
> Back around 2013, Geoff raised a discussion about fixing up the GHC
> ABI so that the LLVM calling convention could pass 256 bit vector
> types in YMM (and, i suppose now 512 bit vector types in ZMM).
>
> As I recall, this was blocked by some short term concerns about which
> LLVM release was imminent or what have you. Four years on, the exact
> same sort of arguments could be dredged up, but yet in the meantime
> nobody is really using those types for anything.
>
> This still creates a pain point around trying to use these wide types
> today. Spilling rather than passing them in registers adds a LOT of
> overhead to any attempt to use them that virtually erases any benefit
> to having them in the first place.
>
> I started experimenting with writing some custom primops directly in
> llvm so I could do meaningful amounts of work with our SIMD vector
> types by just banging out the code that we can't write in haskell
> directly using llvm assembly, and hoping I could trick LLVM to do link
> time optimization to perhaps inline it, but I'm basically dead in the
> water over the overhead of our current calling convention, before I
> even start, it seems, as if we're spilling them there is no way that
> inlining / LTO could hope to figure out what we're doing out as part
> of the spill to erase that call entirely.
>
> It is rather frustrating that I can't even cheat. =/
>
> What do we need to do to finally fix this?
>
> -Edward





_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Ben Gamari-3
Siddhanathan Shanmugam <[hidden email]> writes:

>> It would be even better if we could *also* teach the native back end about
> SSE instructions. Is there anyone who might be willing to work on that?
>
> Yes. Though, it would be better if someone with more experience than me
> decides to pick this up instead.
>
I would be happy to advise if you would like to pick this up. I think it
would be great if the NCG were to learn about SSE and GHC could really
use more people knowledgable about its backend. The best way to learn is
by doing.

Cheers,

- Ben

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

signature.asc (497 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Edward Kmett-2
That, rather tangentially, reminds me: If we do start to teach the code generator about how to produce these sorts of things from simpler parts, e.g. via enabling something like LLVM's vectorization pass, or some internal future ghc compiler pass that checks for, say, Superword-Level Parallelism in the style of Jaewook Shin, then we need to differentiate between flags for what ghc/llvm is allowed to produce via optimization, etc. and what the end user is allowed to explicitly emit. e.g. in my own code I can safely call avx2 primitives after I set up guards to check that I'm on a CPU that supports them, but I can only currently emit that code after I tell GHC that I want it to allow the avx2 instructions. If I build a complicated dispatch mechanism in Haskell for picking the right ISA and emitting code for several of them, I'm going to need to tell ghc to let me build with all sorts of instruction sets that the machine the final executable runs on may not fully support. We should be careful not to conflate these two things.

-Edward

On Mon, Mar 13, 2017 at 2:44 PM, Ben Gamari <[hidden email]> wrote:
Siddhanathan Shanmugam <[hidden email]> writes:

>> It would be even better if we could *also* teach the native back end about
> SSE instructions. Is there anyone who might be willing to work on that?
>
> Yes. Though, it would be better if someone with more experience than me
> decides to pick this up instead.
>
I would be happy to advise if you would like to pick this up. I think it
would be great if the NCG were to learn about SSE and GHC could really
use more people knowledgable about its backend. The best way to learn is
by doing.

Cheers,

- Ben


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Ben Gamari-3
Edward Kmett <[hidden email]> writes:

> That, rather tangentially, reminds me: If we do start to teach the code
> generator about how to produce these sorts of things from simpler parts,
> e.g. via enabling something like LLVM's vectorization pass, or some
> internal future ghc compiler pass that checks for, say, Superword-Level
> Parallelism
> <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.4663&rep=rep1&type=pdf>
> in the style of Jaewook Shin, then we need to differentiate between flags
> for what ghc/llvm is allowed to produce via optimization, etc. and what the
> end user is allowed to explicitly emit. e.g. in my own code I can safely
> call avx2 primitives after I set up guards to check that I'm on a CPU that
> supports them, but I can only currently emit that code after I tell GHC
> that I want it to allow the avx2 instructions. If I build a complicated
> dispatch mechanism in Haskell for picking the right ISA and emitting code
> for several of them, I'm going to need to tell ghc to let me build with all
> sorts of instruction sets that the machine the final executable runs on may
> not fully support. We should be careful not to conflate these two things.
>
Indeed this is tricky.

The obvious stop-gap solution is to simply move your various platform
dependent implementations into multiple modules. However, as you say
this quickly breaks down once GHC itself starts to learn vectorisation.
At that point you will need to draw the distinction you mention,
separating the ISA available to the user and that available to the
compiler.

Another related question is whether you eventually want a way to specify
an ISA per-function (via pragma, for instance). This would allow you to
set a conservative `-march` for the module on the whole, but allow use
of ISA extensions precisely when necessary. This is a bit tricky in the
face of inlining; perhaps you want to require only `NOINLINE` functions
can be decorated with such a thing.

I suspect in the case of LLVM this will require breaking modules up into
multiple compilation units and linking together the resulting objects.
This will certainly require a fair bit of engineering effort but nothing
terribly difficult.

Regarding dispatch, GCC has a function multi-versioning mechanism [1]
which is seems relevant to mention here. However, it's not entirely
clear to me whether the complexity here is worthwhile for GHC.

Anyways, there are plenty of possible options here; it would be helpful
to have a feature request ticket for the "user/compiler ISA" idea you
propose where we can collect ideas. Perhaps you could open one?

Cheers,

- Ben


[1] https://lwn.net/Articles/691666/

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

signature.asc (497 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Edward Kmett-2
Hrmm. In C/C++ I can tell individual functions to turn on additional ISA feature sets with compiler-specific __attribute__((target("avx2"))) tricks. This avoids complains from the compiler when I call builtins that aren't available at my current compilation feature level. Perhaps pragmas for the codegen along those lines is what we'd ultimately need? Alternately, if we simply distinguish between what the ghc codegen produces with one set of options and what we're allowed to ask for explicitly with another then user-land tricks like I employ would remain sound.

-Edward

On Mon, Mar 13, 2017 at 7:26 PM, Ben Gamari <[hidden email]> wrote:
Edward Kmett <[hidden email]> writes:

> That, rather tangentially, reminds me: If we do start to teach the code
> generator about how to produce these sorts of things from simpler parts,
> e.g. via enabling something like LLVM's vectorization pass, or some
> internal future ghc compiler pass that checks for, say, Superword-Level
> Parallelism
> <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.4663&rep=rep1&type=pdf>
> in the style of Jaewook Shin, then we need to differentiate between flags
> for what ghc/llvm is allowed to produce via optimization, etc. and what the
> end user is allowed to explicitly emit. e.g. in my own code I can safely
> call avx2 primitives after I set up guards to check that I'm on a CPU that
> supports them, but I can only currently emit that code after I tell GHC
> that I want it to allow the avx2 instructions. If I build a complicated
> dispatch mechanism in Haskell for picking the right ISA and emitting code
> for several of them, I'm going to need to tell ghc to let me build with all
> sorts of instruction sets that the machine the final executable runs on may
> not fully support. We should be careful not to conflate these two things.
>
Indeed this is tricky.

The obvious stop-gap solution is to simply move your various platform
dependent implementations into multiple modules. However, as you say
this quickly breaks down once GHC itself starts to learn vectorisation.
At that point you will need to draw the distinction you mention,
separating the ISA available to the user and that available to the
compiler.

Another related question is whether you eventually want a way to specify
an ISA per-function (via pragma, for instance). This would allow you to
set a conservative `-march` for the module on the whole, but allow use
of ISA extensions precisely when necessary. This is a bit tricky in the
face of inlining; perhaps you want to require only `NOINLINE` functions
can be decorated with such a thing.

I suspect in the case of LLVM this will require breaking modules up into
multiple compilation units and linking together the resulting objects.
This will certainly require a fair bit of engineering effort but nothing
terribly difficult.

Regarding dispatch, GCC has a function multi-versioning mechanism [1]
which is seems relevant to mention here. However, it's not entirely
clear to me whether the complexity here is worthwhile for GHC.

Anyways, there are plenty of possible options here; it would be helpful
to have a feature request ticket for the "user/compiler ISA" idea you
propose where we can collect ideas. Perhaps you could open one?

Cheers,

- Ben


[1] https://lwn.net/Articles/691666/


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Ben Gamari-3
Edward Kmett <[hidden email]> writes:

> Hrmm. In C/C++ I can tell individual functions to turn on additional ISA
> feature sets with compiler-specific __attribute__((target("avx2"))) tricks.
> This avoids complains from the compiler when I call builtins that aren't
> available at my current compilation feature level. Perhaps pragmas for the
> codegen along those lines is what we'd ultimately need? Alternately, if we
> simply distinguish between what the ghc codegen produces with one set of
> options and what we're allowed to ask for explicitly with another then
> user-land tricks like I employ would remain sound.
>
I'm actually not sure that simply distinguishing between the user- and
codegen-allowed ISA extensions is quite sufficient. Afterall, AFAIK LLVM
doesn't make such a distinction itself: AFAIK if you write a vector
primitive and compile for a target that doesn't have an appropriate
instruction the code-generator will lower it with software emulation.

However, adding a pragma to allow per-function target annotations seems
quite reasonable and easily doable. Moreover, contrary to my previous
assertion, it shouldn't require any splitting of compilation units. I
ran a quick experiment, compiling this program,

    __attribute__((target("sse2"))) int hello() {
      return 1;
    }

With clang. It produced something like,

    define i32 @hello() #0 {
      ret i32 1
    }

    attributes #0 = { "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" ... }

So it seems LLVM is perfectly capable of expressing this; in hindsight
I'm not sure why I ever doubted this.

There are a number of details that would need to be worked out regarding
how such a pragma should behave. Does the general direction sound
reasonable? I've opened #13427 [1] to track this idea.

Cheers,

- Ben


[1] https://ghc.haskell.org/trac/ghc/ticket/13427

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

signature.asc (497 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Geoffrey Mainland
On 03/14/2017 04:02 PM, Ben Gamari wrote:

> Edward Kmett <[hidden email]> writes:
>
>> Hrmm. In C/C++ I can tell individual functions to turn on additional ISA
>> feature sets with compiler-specific __attribute__((target("avx2"))) tricks.
>> This avoids complains from the compiler when I call builtins that aren't
>> available at my current compilation feature level. Perhaps pragmas for the
>> codegen along those lines is what we'd ultimately need? Alternately, if we
>> simply distinguish between what the ghc codegen produces with one set of
>> options and what we're allowed to ask for explicitly with another then
>> user-land tricks like I employ would remain sound.
>>
> I'm actually not sure that simply distinguishing between the user- and
> codegen-allowed ISA extensions is quite sufficient. Afterall, AFAIK LLVM
> doesn't make such a distinction itself: AFAIK if you write a vector
> primitive and compile for a target that doesn't have an appropriate
> instruction the code-generator will lower it with software emulation.

This would mean that Haskell libraries compiled with different flags
would not be ABI compatible.

Our original paper exposed a Multi type class that was meant to be the
programmer interface to the primops. A Multi a would be the widest
vector type supported on the current architecture, so code that used a
Multi Double would always be guaranteed to work at the widest vector
type available for Double's.

The Multi approach explicitly eschewed lowering, but I would argue that
if performance is the goal, then automatic lowering is not what you
want. I would rather have the system pick the correct vector width for
me based on the current architecture.

This does nothing to solved the problem of ABI compatibility, which is
one reason I didn't push to get this upstreamed.

Is the Multi approach desirable? I think it would be nice to be able to
at least provide such a solution even if it isn't some sort of default.
Do we really want lowering of wider vector types?

Geoff

> However, adding a pragma to allow per-function target annotations seems
> quite reasonable and easily doable. Moreover, contrary to my previous
> assertion, it shouldn't require any splitting of compilation units. I
> ran a quick experiment, compiling this program,
>
>     __attribute__((target("sse2"))) int hello() {
>       return 1;
>     }
>
> With clang. It produced something like,
>
>     define i32 @hello() #0 {
>       ret i32 1
>     }
>
>     attributes #0 = { "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" ... }
>
> So it seems LLVM is perfectly capable of expressing this; in hindsight
> I'm not sure why I ever doubted this.
>
> There are a number of details that would need to be worked out regarding
> how such a pragma should behave. Does the general direction sound
> reasonable? I've opened #13427 [1] to track this idea.
>
> Cheers,
>
> - Ben
>
>
> [1] https://ghc.haskell.org/trac/ghc/ticket/13427



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Carter Schonwald
This thread is getting into a broader discussion about target specific intrincsics as user prims vs compiler generated. 

@ben - ed is talking about stuff like a function call that's using a specific avx2 intrinsic, not the parameterized vector abstraction. LLvm shouldn't be lowering those. ... or clang has issues :/

On Tue, Mar 14, 2017 at 4:33 PM Geoffrey Mainland <[hidden email]> wrote:
On 03/14/2017 04:02 PM, Ben Gamari wrote:
> Edward Kmett <[hidden email]> writes:
>
>> Hrmm. In C/C++ I can tell individual functions to turn on additional ISA
>> feature sets with compiler-specific __attribute__((target("avx2"))) tricks.
>> This avoids complains from the compiler when I call builtins that aren't
>> available at my current compilation feature level. Perhaps pragmas for the
>> codegen along those lines is what we'd ultimately need? Alternately, if we
>> simply distinguish between what the ghc codegen produces with one set of
>> options and what we're allowed to ask for explicitly with another then
>> user-land tricks like I employ would remain sound.
>>
> I'm actually not sure that simply distinguishing between the user- and
> codegen-allowed ISA extensions is quite sufficient. Afterall, AFAIK LLVM
> doesn't make such a distinction itself: AFAIK if you write a vector
> primitive and compile for a target that doesn't have an appropriate
> instruction the code-generator will lower it with software emulation.

This would mean that Haskell libraries compiled with different flags
would not be ABI compatible.

Our original paper exposed a Multi type class that was meant to be the
programmer interface to the primops. A Multi a would be the widest
vector type supported on the current architecture, so code that used a
Multi Double would always be guaranteed to work at the widest vector
type available for Double's.

The Multi approach explicitly eschewed lowering, but I would argue that
if performance is the goal, then automatic lowering is not what you
want. I would rather have the system pick the correct vector width for
me based on the current architecture.

This does nothing to solved the problem of ABI compatibility, which is
one reason I didn't push to get this upstreamed.

Is the Multi approach desirable? I think it would be nice to be able to
at least provide such a solution even if it isn't some sort of default.
Do we really want lowering of wider vector types?

Geoff

> However, adding a pragma to allow per-function target annotations seems
> quite reasonable and easily doable. Moreover, contrary to my previous
> assertion, it shouldn't require any splitting of compilation units. I
> ran a quick experiment, compiling this program,
>
>     __attribute__((target("sse2"))) int hello() {
>       return 1;
>     }
>
> With clang. It produced something like,
>
>     define i32 @hello() #0 {
>       ret i32 1
>     }
>
>     attributes #0 = { "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" ... }
>
> So it seems LLVM is perfectly capable of expressing this; in hindsight
> I'm not sure why I ever doubted this.
>
> There are a number of details that would need to be worked out regarding
> how such a pragma should behave. Does the general direction sound
> reasonable? I've opened #13427 [1] to track this idea.
>
> Cheers,
>
> - Ben
>
>
> [1] https://ghc.haskell.org/trac/ghc/ticket/13427



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Siddhanathan Shanmugam
> I would be happy to advise if you would like to pick this up.

Thanks Ben!

> This would mean that Haskell libraries compiled with different flags
> would not be ABI compatible.

Wait, can we not maintain ABI compatibility if we limit the target
features using a compiler flag? Sometimes (for performance reasons)
it's reasonable to request the compiler to only generate SSE
instructions, even if AVX2 is available on the target. On GCC we can
use the flag -msse to do just that.

On Tue, Mar 14, 2017 at 5:49 PM, Carter Schonwald
<[hidden email]> wrote:

> This thread is getting into a broader discussion about target specific
> intrincsics as user prims vs compiler generated.
>
> @ben - ed is talking about stuff like a function call that's using a
> specific avx2 intrinsic, not the parameterized vector abstraction. LLvm
> shouldn't be lowering those. ... or clang has issues :/
>
> On Tue, Mar 14, 2017 at 4:33 PM Geoffrey Mainland <[hidden email]>
> wrote:
>>
>> On 03/14/2017 04:02 PM, Ben Gamari wrote:
>> > Edward Kmett <[hidden email]> writes:
>> >
>> >> Hrmm. In C/C++ I can tell individual functions to turn on additional
>> >> ISA
>> >> feature sets with compiler-specific __attribute__((target("avx2")))
>> >> tricks.
>> >> This avoids complains from the compiler when I call builtins that
>> >> aren't
>> >> available at my current compilation feature level. Perhaps pragmas for
>> >> the
>> >> codegen along those lines is what we'd ultimately need? Alternately, if
>> >> we
>> >> simply distinguish between what the ghc codegen produces with one set
>> >> of
>> >> options and what we're allowed to ask for explicitly with another then
>> >> user-land tricks like I employ would remain sound.
>> >>
>> > I'm actually not sure that simply distinguishing between the user- and
>> > codegen-allowed ISA extensions is quite sufficient. Afterall, AFAIK LLVM
>> > doesn't make such a distinction itself: AFAIK if you write a vector
>> > primitive and compile for a target that doesn't have an appropriate
>> > instruction the code-generator will lower it with software emulation.
>>
>> This would mean that Haskell libraries compiled with different flags
>> would not be ABI compatible.
>>
>> Our original paper exposed a Multi type class that was meant to be the
>> programmer interface to the primops. A Multi a would be the widest
>> vector type supported on the current architecture, so code that used a
>> Multi Double would always be guaranteed to work at the widest vector
>> type available for Double's.
>>
>> The Multi approach explicitly eschewed lowering, but I would argue that
>> if performance is the goal, then automatic lowering is not what you
>> want. I would rather have the system pick the correct vector width for
>> me based on the current architecture.
>>
>> This does nothing to solved the problem of ABI compatibility, which is
>> one reason I didn't push to get this upstreamed.
>>
>> Is the Multi approach desirable? I think it would be nice to be able to
>> at least provide such a solution even if it isn't some sort of default.
>> Do we really want lowering of wider vector types?
>>
>> Geoff
>>
>> > However, adding a pragma to allow per-function target annotations seems
>> > quite reasonable and easily doable. Moreover, contrary to my previous
>> > assertion, it shouldn't require any splitting of compilation units. I
>> > ran a quick experiment, compiling this program,
>> >
>> >     __attribute__((target("sse2"))) int hello() {
>> >       return 1;
>> >     }
>> >
>> > With clang. It produced something like,
>> >
>> >     define i32 @hello() #0 {
>> >       ret i32 1
>> >     }
>> >
>> >     attributes #0 = { "target-cpu"="x86-64"
>> > "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" ... }
>> >
>> > So it seems LLVM is perfectly capable of expressing this; in hindsight
>> > I'm not sure why I ever doubted this.
>> >
>> > There are a number of details that would need to be worked out regarding
>> > how such a pragma should behave. Does the general direction sound
>> > reasonable? I've opened #13427 [1] to track this idea.
>> >
>> > Cheers,
>> >
>> > - Ben
>> >
>> >
>> > [1] https://ghc.haskell.org/trac/ghc/ticket/13427
>>
>>
>>
>> _______________________________________________
>> ghc-devs mailing list
>> [hidden email]
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
>
> _______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Ben Gamari-2
Siddhanathan Shanmugam <[hidden email]> writes:

>> I would be happy to advise if you would like to pick this up.
>
> Thanks Ben!
>
>> This would mean that Haskell libraries compiled with different flags
>> would not be ABI compatible.
>
> Wait, can we not maintain ABI compatibility if we limit the target
> features using a compiler flag? Sometimes (for performance reasons)
> it's reasonable to request the compiler to only generate SSE
> instructions, even if AVX2 is available on the target. On GCC we can
> use the flag -msse to do just that.
>
I think the reasoning here is the following (please excuse the rather
contrived example): Consider a function f with two variants,

    module AvxImpl where
    {-# OPTIONS_GHC -mavx #-}
    f :: DoubleX4# -> DoubleX4# -> Double

    module SseImpl where
    {-# OPTIONS_GHC -msse #-}
    f :: DoubleX4# -> DoubleX4# -> Double

If we allow GHC to pass arguments with SIMD registers we now have a bit
of a conundrum: The calling convention for AvxImpl.f will require that
we pass the two arguments in YMM registers, whereas SseImpl.f will
be via passed some other means (perhaps two pairs of XMM registers).

In the C world this isn't a problem AFAIK since intrinsic types map
directly to register classes. Consequently, I can look at a C
declaration type,

    double f(__m256 x, __m256 y);

and tell you precisely the calling convention that would be used. In
GHC, however, we have an abstract vector model and therefore the calling
convention is determined by which ISA the compiler is targetting.

I really don't know how to fix this "correctly". Currently we assume
that there is a static mapping between STG registers and machine
registers. Giving this up sounds quite painful.

Cheers,

- Ben

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

signature.asc (497 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Carter Schonwald
solution: lets call these registers what they are, instead of pretending they're portable. we are not going to find the right abstraction in the first go. lets not do that. first get it working sanely, then figure out proper abstractions

On Wed, Mar 15, 2017 at 10:27 AM, Ben Gamari <[hidden email]> wrote:
Siddhanathan Shanmugam <[hidden email]> writes:

>> I would be happy to advise if you would like to pick this up.
>
> Thanks Ben!
>
>> This would mean that Haskell libraries compiled with different flags
>> would not be ABI compatible.
>
> Wait, can we not maintain ABI compatibility if we limit the target
> features using a compiler flag? Sometimes (for performance reasons)
> it's reasonable to request the compiler to only generate SSE
> instructions, even if AVX2 is available on the target. On GCC we can
> use the flag -msse to do just that.
>
I think the reasoning here is the following (please excuse the rather
contrived example): Consider a function f with two variants,

    module AvxImpl where
    {-# OPTIONS_GHC -mavx #-}
    f :: DoubleX4# -> DoubleX4# -> Double

    module SseImpl where
    {-# OPTIONS_GHC -msse #-}
    f :: DoubleX4# -> DoubleX4# -> Double

If we allow GHC to pass arguments with SIMD registers we now have a bit
of a conundrum: The calling convention for AvxImpl.f will require that
we pass the two arguments in YMM registers, whereas SseImpl.f will
be via passed some other means (perhaps two pairs of XMM registers).

In the C world this isn't a problem AFAIK since intrinsic types map
directly to register classes. Consequently, I can look at a C
declaration type,

    double f(__m256 x, __m256 y);

and tell you precisely the calling convention that would be used. In
GHC, however, we have an abstract vector model and therefore the calling
convention is determined by which ISA the compiler is targetting.

I really don't know how to fix this "correctly". Currently we assume
that there is a static mapping between STG registers and machine
registers. Giving this up sounds quite painful.

Cheers,

- Ben


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Carter Schonwald
In reply to this post by Siddhanathan Shanmugam
Ok so 

1) xmm when not using fancy features 

2) lets not have types that vary with the abi then!

i genuinely think that this is one of those domains where "no abstraction" is a better starting point than "wrong abstraction"

I believe both edward kmett and I genuinely want to be users of simd on ghc, and i think in both our cases, it would be markedly simpler to ground the initial work in the ISA / CPU feature level operations/ register flavors rather than trying to get ghc to do the "right abstraction" when we have no experience even trying to bundle it up as a library. Lets get stuff off the ground that doesn't mis-abstract, before we start hunting for the right higher level tools on top.  No matter *how* ghc ultimately bundles simd for high level programming, it *will* have to bottom out into these target specific operations at code gen time, and LLVM is *not* an abstraction for it.


On Fri, Mar 10, 2017 at 12:50 AM Siddhanathan Shanmugam <[hidden email]> wrote:
It would be even better if we could *also* teach the native back end about SSE instructions. Is there anyone who might be willing to work on that?

Yes. Though, it would be better if someone with more experience than me decides to pick this up instead.

On Thu, Mar 9, 2017 at 7:00 PM, Edward Kmett <[hidden email]> wrote:
If we only turn on ymm and zmm for passing explicit 256bit and 512bit vector types then changing the ABI would have basically zero effect on any code anybody is actually using today. Everything would remain abi compatible unless it involves the new types that nobody is using. 

This also has the benefit that turning on avx2 or avx512 wouldn't change the calling convention of any code, making it much safer to link code compiled with it on with code compiled with it off. That seems like a big deal.

Moreover, if we start passing normal floats, etc. through them then our lack of shuffles and ways to get data in/out of them becomes quite a pain point.

As for passing int/word data, passing the vectors of them through the ymm and zmm registers should be sufficient for the same reasons.

-Edward

On Thu, Mar 9, 2017 at 3:55 PM, Carter Schonwald <[hidden email]> wrote:
zooming out: 

what *should* the new ABI be?

Ed was suggesting we make all 16 xmm/ymm/ lower 16 zmm registers (depending on how they're being used) caller save, 

(what about all 32 zmm registers? would they be float only, or also for ints/words? simd has lots of nice int support!)

a) if this doesn't cause any perf regressions i've no objections

b) currently we only support passing floats/doubles and simd vectors of , do we wanna support int/word data there too? (or are the GPR / general purpose registers enough for those? )

c) other stuff i'm probably overlooking

d) lets do this!

On Thu, Mar 9, 2017 at 3:31 PM, Carter Schonwald <[hidden email]> wrote:
the patch is still on TRAC,


we need to do changes to both the 32bit and 64bit ABIs, and I think thats where I got stalled from lack of feedback

that aside:

heres the original email thread on the llvm commits thread 

and theres links from there to the iterating on the test suite plus the original patch

i'm more than happy to take a weekend to do the leg work, it was pretty fun last time.

BUT, we need to agree on what ABI to do, and make sure that those ABI changes dont create a performance regression for some unexpected reason.

On Thu, Mar 9, 2017 at 3:11 PM, Geoffrey Mainland <[hidden email]> wrote:
We would need to get a patch to LLVM accepted to change the GHC calling
convention.

Now that we commit to a particular version of LLVM, this might be less
of an issue than it once was since we wouldn't have to support versions
of LLVM that didn't support the new calling convention.

So...how do we get a patch into LLVM? I believe I once had such a patch
ready to go...I will dig around for it, but the change is very small and
easily recreated.

It would be even better if we could *also* teach the native back end
about SSE instructions. Is there anyone who might be willing to work on
that?

Geoff

On 3/9/17 2:30 PM, Edward Kmett wrote:
> Back around 2013, Geoff raised a discussion about fixing up the GHC
> ABI so that the LLVM calling convention could pass 256 bit vector
> types in YMM (and, i suppose now 512 bit vector types in ZMM).
>
> As I recall, this was blocked by some short term concerns about which
> LLVM release was imminent or what have you. Four years on, the exact
> same sort of arguments could be dredged up, but yet in the meantime
> nobody is really using those types for anything.
>
> This still creates a pain point around trying to use these wide types
> today. Spilling rather than passing them in registers adds a LOT of
> overhead to any attempt to use them that virtually erases any benefit
> to having them in the first place.
>
> I started experimenting with writing some custom primops directly in
> llvm so I could do meaningful amounts of work with our SIMD vector
> types by just banging out the code that we can't write in haskell
> directly using llvm assembly, and hoping I could trick LLVM to do link
> time optimization to perhaps inline it, but I'm basically dead in the
> water over the overhead of our current calling convention, before I
> even start, it seems, as if we're spilling them there is no way that
> inlining / LTO could hope to figure out what we're doing out as part
> of the spill to erase that call entirely.
>
> It is rather frustrating that I can't even cheat. =/
>
> What do we need to do to finally fix this?
>
> -Edward





_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Edward Kmett-2
In reply to this post by Ben Gamari-2
Currently if you try to use a DoubleX4# and don't have AVX2 turned on, it deliberately crashes out during code generation, no? So this is very deliberately not a problem with the current setup as I understand it. It only becomes one if we reverse the decision and decide to add terribly inefficient shims for this functionality at the primop level rather than have a higher level make the right call to just not use functionality that isn't present on the target platform.

-Edward


On Wed, Mar 15, 2017 at 10:27 AM, Ben Gamari <[hidden email]> wrote:
Siddhanathan Shanmugam <[hidden email]> writes:

>> I would be happy to advise if you would like to pick this up.
>
> Thanks Ben!
>
>> This would mean that Haskell libraries compiled with different flags
>> would not be ABI compatible.
>
> Wait, can we not maintain ABI compatibility if we limit the target
> features using a compiler flag? Sometimes (for performance reasons)
> it's reasonable to request the compiler to only generate SSE
> instructions, even if AVX2 is available on the target. On GCC we can
> use the flag -msse to do just that.
>
I think the reasoning here is the following (please excuse the rather
contrived example): Consider a function f with two variants,

    module AvxImpl where
    {-# OPTIONS_GHC -mavx #-}
    f :: DoubleX4# -> DoubleX4# -> Double

    module SseImpl where
    {-# OPTIONS_GHC -msse #-}
    f :: DoubleX4# -> DoubleX4# -> Double

If we allow GHC to pass arguments with SIMD registers we now have a bit
of a conundrum: The calling convention for AvxImpl.f will require that
we pass the two arguments in YMM registers, whereas SseImpl.f will
be via passed some other means (perhaps two pairs of XMM registers).

In the C world this isn't a problem AFAIK since intrinsic types map
directly to register classes. Consequently, I can look at a C
declaration type,

    double f(__m256 x, __m256 y);

and tell you precisely the calling convention that would be used. In
GHC, however, we have an abstract vector model and therefore the calling
convention is determined by which ISA the compiler is targetting.

I really don't know how to fix this "correctly". Currently we assume
that there is a static mapping between STG registers and machine
registers. Giving this up sounds quite painful.

Cheers,

- Ben

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Carter Schonwald
agreed. and the generic vector size stuff in llvm is both pretty naive, AND not the sane/tractable way to add SIMD support to the NCG, 

i'm totally ok with my vector sizes that are available depending on the target CPU or whatever. Operating systems have very sane errors for that sort of mishap, 

On Wed, Mar 15, 2017 at 3:29 PM, Edward Kmett <[hidden email]> wrote:
Currently if you try to use a DoubleX4# and don't have AVX2 turned on, it deliberately crashes out during code generation, no? So this is very deliberately not a problem with the current setup as I understand it. It only becomes one if we reverse the decision and decide to add terribly inefficient shims for this functionality at the primop level rather than have a higher level make the right call to just not use functionality that isn't present on the target platform.

-Edward


On Wed, Mar 15, 2017 at 10:27 AM, Ben Gamari <[hidden email]> wrote:
Siddhanathan Shanmugam <[hidden email]> writes:

>> I would be happy to advise if you would like to pick this up.
>
> Thanks Ben!
>
>> This would mean that Haskell libraries compiled with different flags
>> would not be ABI compatible.
>
> Wait, can we not maintain ABI compatibility if we limit the target
> features using a compiler flag? Sometimes (for performance reasons)
> it's reasonable to request the compiler to only generate SSE
> instructions, even if AVX2 is available on the target. On GCC we can
> use the flag -msse to do just that.
>
I think the reasoning here is the following (please excuse the rather
contrived example): Consider a function f with two variants,

    module AvxImpl where
    {-# OPTIONS_GHC -mavx #-}
    f :: DoubleX4# -> DoubleX4# -> Double

    module SseImpl where
    {-# OPTIONS_GHC -msse #-}
    f :: DoubleX4# -> DoubleX4# -> Double

If we allow GHC to pass arguments with SIMD registers we now have a bit
of a conundrum: The calling convention for AvxImpl.f will require that
we pass the two arguments in YMM registers, whereas SseImpl.f will
be via passed some other means (perhaps two pairs of XMM registers).

In the C world this isn't a problem AFAIK since intrinsic types map
directly to register classes. Consequently, I can look at a C
declaration type,

    double f(__m256 x, __m256 y);

and tell you precisely the calling convention that would be used. In
GHC, however, we have an abstract vector model and therefore the calling
convention is determined by which ISA the compiler is targetting.

I really don't know how to fix this "correctly". Currently we assume
that there is a static mapping between STG registers and machine
registers. Giving this up sounds quite painful.

Cheers,

- Ben

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs




_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: LLVM calling convention for AVX2 and AVX512 registers

Carter Schonwald
to reiterate: any automated lowering / shimming scheme will hurt any serious user of simd who isn't treating it as some black box abstraction. And those are the very users who are equipped to write / design libraries / ghc improvements that let still *other* users pretend to have a mostly decent black box abstraction. Our compiler engineering bandwidth is not enough to start with any automagic in this problem domain that isn't validated with a model implementation in user space. 

On Wed, Mar 15, 2017 at 3:31 PM, Carter Schonwald <[hidden email]> wrote:
agreed. and the generic vector size stuff in llvm is both pretty naive, AND not the sane/tractable way to add SIMD support to the NCG, 

i'm totally ok with my vector sizes that are available depending on the target CPU or whatever. Operating systems have very sane errors for that sort of mishap, 

On Wed, Mar 15, 2017 at 3:29 PM, Edward Kmett <[hidden email]> wrote:
Currently if you try to use a DoubleX4# and don't have AVX2 turned on, it deliberately crashes out during code generation, no? So this is very deliberately not a problem with the current setup as I understand it. It only becomes one if we reverse the decision and decide to add terribly inefficient shims for this functionality at the primop level rather than have a higher level make the right call to just not use functionality that isn't present on the target platform.

-Edward


On Wed, Mar 15, 2017 at 10:27 AM, Ben Gamari <[hidden email]> wrote:
Siddhanathan Shanmugam <[hidden email]> writes:

>> I would be happy to advise if you would like to pick this up.
>
> Thanks Ben!
>
>> This would mean that Haskell libraries compiled with different flags
>> would not be ABI compatible.
>
> Wait, can we not maintain ABI compatibility if we limit the target
> features using a compiler flag? Sometimes (for performance reasons)
> it's reasonable to request the compiler to only generate SSE
> instructions, even if AVX2 is available on the target. On GCC we can
> use the flag -msse to do just that.
>
I think the reasoning here is the following (please excuse the rather
contrived example): Consider a function f with two variants,

    module AvxImpl where
    {-# OPTIONS_GHC -mavx #-}
    f :: DoubleX4# -> DoubleX4# -> Double

    module SseImpl where
    {-# OPTIONS_GHC -msse #-}
    f :: DoubleX4# -> DoubleX4# -> Double

If we allow GHC to pass arguments with SIMD registers we now have a bit
of a conundrum: The calling convention for AvxImpl.f will require that
we pass the two arguments in YMM registers, whereas SseImpl.f will
be via passed some other means (perhaps two pairs of XMM registers).

In the C world this isn't a problem AFAIK since intrinsic types map
directly to register classes. Consequently, I can look at a C
declaration type,

    double f(__m256 x, __m256 y);

and tell you precisely the calling convention that would be used. In
GHC, however, we have an abstract vector model and therefore the calling
convention is determined by which ISA the compiler is targetting.

I really don't know how to fix this "correctly". Currently we assume
that there is a static mapping between STG registers and machine
registers. Giving this up sounds quite painful.

Cheers,

- Ben

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs





_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
12