The Curious Case of T6084 -or- Register Confusion with LLVM

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

The Curious Case of T6084 -or- Register Confusion with LLVM

Moritz Angermann-2
Hi *,

TLDR: The LLVM backend might confuse floating registers in GHC.

# Demo (Ticket #14251)

Let Demo.hs be the following short program (a minor modification from T6084):
```
{-# LANGUAGE MagicHash, BangPatterns #-}
module Main where

import GHC.Exts

{-# NOINLINE f #-}
f :: (Int# -> Float# -> Double# -> Float# -> Double# -> String) -> String
f g = g 3# 4.0# 5.0## 6.0# 6.9## ++ " World!"

{-# NOINLINE q #-}
q :: Int# -> Float# -> Double# -> Float# -> Double# -> String
q i j k l m = "Hello " ++ show (F# l) ++ " " ++ show (D# m)

main = putStrLn (f $ q)
```

What happens if we compile them with the NCG and LLVM?

$ ghc -fasm -fforce-recomp Demo.hs -O2 -o Demo-ncg && ./Demo-ncg
Hello 6.0 6.9 World!

$ ghc -fllvm -fforce-recomp Demo.hs -O2 -o Demo-llvm && ./Demo-llvm
Hello 4.0 5.0 World!

# Discussion

What is happening here?  The LLVM backend passes the registers in arguments,
which are then mapped to registers via the GHC calling convention we added
to LLVM.

As the LLVM backend takes off from Cmm, we produce function that always hold
the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim)
and appends those registers that are live throughout the function call: in the
case of `q` this is one Float and one Double register. Let’s assume these are
F3 and D4.  Thus the function signature we generate looks like:

ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)

And expect the passed arguments to represent the following registers:

              base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4

as we found that f1 and d1 are not live.

Yet, when we call `q` in the form of `g` in the body of `f`. We will pass it 14 arguments
instead of 12.  To make this “typecheck” in LLVM, we

@q' = bitcast @q to (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double)

and call @q’(base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f1, d2, f3, d4).

at this point, we now assign f3 <- f1 and d4 <- d2; while silently ignoring
the passed arguments f3 and d4.

(This is where my llvmng backend fell over, as it does not bitcast function
 signatures but tries to unify them.)

# Solution?

Initially, Ben and I though we could simply always pass all registers as
arguments in LLVM and call it a day with the downside of create more verbose
but correct code.  As I found out, that comes with a few complications.  For
some reason, all active stg registers for my machine give me

  Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim,
  F1, D1, F2, D2, F3, D3,
  XMM1,XMM2,XMM3,XMM4,XMM5,XMM6,
  YMM1,YMM2,YMM3,YMM4,YMM5,YMM6,
  ZMM1,ZMM2,ZMM3,ZMM4,ZMM5,ZMM6

I should not have the YMM*, and ZMM* registers as I don’t have any AVX nor AVX512;
that looks like only a patch away.  However we try to optimize our register, such
that we can pass up to six doubles or six floats or any combination of both if needed
in registers, without having to allocate them on the stack, by assuming overlapping
registers (See Note [Overlapping global registers]).

And as such a full function signature in LLVM would as opposed to one that’s based on
the “live” registers as we have right now, would consist of 12 float/double registers,
and LLVM only maps 6.  My current idea is to, pass only the explicit F1,D1,…,F3,D3
and try to disable the register overlapping for LLVM.  This would probably force more
floating values to be stack allocated rather than passed via registers, but would
likely guarantee that the registers match up.  The other option I can think of is to
define some viertual generic floating registers in the llvm code gen: V1,…,V6
and then perform something like

  F1 <- V1 as float
  D1 <- V1 as double

in the body of the function, while trying to use the `live` information at the call site
to decide which of F1 or D1 to pass as V1.

Ideas?

Cheers,
 Moritz

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: The Curious Case of T6084 -or- Register Confusion with LLVM

Ben Gamari-3
Moritz Angermann <[hidden email]> writes:

[snip]

>
> I should not have the YMM*, and ZMM* registers as I don’t have any AVX nor AVX512;
> that looks like only a patch away.  However we try to optimize our register, such
> that we can pass up to six doubles or six floats or any combination of both if needed
> in registers, without having to allocate them on the stack, by assuming overlapping
> registers (See Note [Overlapping global registers]).
>
> And as such a full function signature in LLVM would as opposed to one that’s based on
> the “live” registers as we have right now, would consist of 12 float/double registers,
> and LLVM only maps 6.  My current idea is to, pass only the explicit F1,D1,…,F3,D3
> and try to disable the register overlapping for LLVM.  This would probably force more
> floating values to be stack allocated rather than passed via registers, but would
> likely guarantee that the registers match up.  The other option I can think of is to
> define some viertual generic floating registers in the llvm code gen: V1,…,V6
> and then perform something like
>
>   F1 <- V1 as float
>   D1 <- V1 as double
>
> in the body of the function, while trying to use the `live` information at the call site
> to decide which of F1 or D1 to pass as V1.
>
Arguably the fundamental problem here is the assumption that all STG
entry-points have the same machine-level calling convention. As you
point out, our calling conventions in fact change due to things like
register overlap. Ideally the LLVM we produce would reflect this.

One way to make this happen would be for C-- call nodes to carry
information about the calling convention of the target (e.g. how many
arguments of each type the function expects; in the same way identifiers
in Core carry their type). Unfortunately a brief look at the code
generator suggests that this may require a fair amount of plumbing.

It's important to note though that this overlap problem is something
that will need to be addressed eventually if we are are to have proper
SIMD support (due to overlap between XMM, YMM, and ZMM).

Cheers,

- Ben

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

signature.asc (497 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: The Curious Case of T6084 -or- Register Confusion with LLVM

GHC - devs mailing list
In reply to this post by Moritz Angermann-2
Moritz

Talk to Kavon.  He was thinking about passing a struct instead of a huge list of registers, and only initialising the live fields of the struct.  I don't know whether he ultimately discarded the idea, but it sounded promising.

Simon

|  -----Original Message-----
|  From: ghc-devs [mailto:[hidden email]] On Behalf Of Moritz
|  Angermann
|  Sent: 20 September 2017 10:45
|  To: GHC developers <[hidden email]>
|  Subject: The Curious Case of T6084 -or- Register Confusion with LLVM
|  
|  Hi *,
|  
|  TLDR: The LLVM backend might confuse floating registers in GHC.
|  
|  # Demo (Ticket #14251)
|  
|  Let Demo.hs be the following short program (a minor modification from
|  T6084):
|  ```
|  {-# LANGUAGE MagicHash, BangPatterns #-} module Main where
|  
|  import GHC.Exts
|  
|  {-# NOINLINE f #-}
|  f :: (Int# -> Float# -> Double# -> Float# -> Double# -> String) -> String f
|  g = g 3# 4.0# 5.0## 6.0# 6.9## ++ " World!"
|  
|  {-# NOINLINE q #-}
|  q :: Int# -> Float# -> Double# -> Float# -> Double# -> String q i j k l m =
|  "Hello " ++ show (F# l) ++ " " ++ show (D# m)
|  
|  main = putStrLn (f $ q)
|  ```
|  
|  What happens if we compile them with the NCG and LLVM?
|  
|  $ ghc -fasm -fforce-recomp Demo.hs -O2 -o Demo-ncg && ./Demo-ncg Hello 6.0
|  6.9 World!
|  
|  $ ghc -fllvm -fforce-recomp Demo.hs -O2 -o Demo-llvm && ./Demo-llvm Hello
|  4.0 5.0 World!
|  
|  # Discussion
|  
|  What is happening here?  The LLVM backend passes the registers in arguments,
|  which are then mapped to registers via the GHC calling convention we added
|  to LLVM.
|  
|  As the LLVM backend takes off from Cmm, we produce function that always hold
|  the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3,
|  R4, R5, R6, SpLim) and appends those registers that are live throughout the
|  function call: in the case of `q` this is one Float and one Double register.
|  Let’s assume these are
|  F3 and D4.  Thus the function signature we generate looks like:
|  
|  ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float,
|  double)
|  
|  And expect the passed arguments to represent the following registers:
|  
|                base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4
|  
|  as we found that f1 and d1 are not live.
|  
|  Yet, when we call `q` in the form of `g` in the body of `f`. We will pass it
|  14 arguments instead of 12.  To make this “typecheck” in LLVM, we
|  
|  @q' = bitcast @q to (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64,
|  float, double, float, double)
|  
|  and call @q’(base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f1, d2, f3, d4).
|  
|  at this point, we now assign f3 <- f1 and d4 <- d2; while silently ignoring
|  the passed arguments f3 and d4.
|  
|  (This is where my llvmng backend fell over, as it does not bitcast function
|  signatures but tries to unify them.)
|  
|  # Solution?
|  
|  Initially, Ben and I though we could simply always pass all registers as
|  arguments in LLVM and call it a day with the downside of create more verbose
|  but correct code.  As I found out, that comes with a few complications.  For
|  some reason, all active stg registers for my machine give me
|  
|    Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim,
|    F1, D1, F2, D2, F3, D3,
|    XMM1,XMM2,XMM3,XMM4,XMM5,XMM6,
|    YMM1,YMM2,YMM3,YMM4,YMM5,YMM6,
|    ZMM1,ZMM2,ZMM3,ZMM4,ZMM5,ZMM6
|  
|  I should not have the YMM*, and ZMM* registers as I don’t have any AVX nor
|  AVX512; that looks like only a patch away.  However we try to optimize our
|  register, such that we can pass up to six doubles or six floats or any
|  combination of both if needed in registers, without having to allocate them
|  on the stack, by assuming overlapping registers (See Note [Overlapping
|  global registers]).
|  
|  And as such a full function signature in LLVM would as opposed to one that’s
|  based on the “live” registers as we have right now, would consist of 12
|  float/double registers, and LLVM only maps 6.  My current idea is to, pass
|  only the explicit F1,D1,…,F3,D3 and try to disable the register overlapping
|  for LLVM.  This would probably force more floating values to be stack
|  allocated rather than passed via registers, but would likely guarantee that
|  the registers match up.  The other option I can think of is to define some
|  viertual generic floating registers in the llvm code gen: V1,…,V6 and then
|  perform something like
|  
|    F1 <- V1 as float
|    D1 <- V1 as double
|  
|  in the body of the function, while trying to use the `live` information at
|  the call site to decide which of F1 or D1 to pass as V1.
|  
|  Ideas?
|  
|  Cheers,
|   Moritz
|  
|  _______________________________________________
|  ghc-devs mailing list
|  [hidden email]
|  https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.haskell
|  .org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-
|  devs&data=02%7C01%7Csimonpj%40microsoft.com%7C89f152a90a1b43caa39408d5000c4b
|  8c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636414975169335863&sdata=Dda
|  hjmHVAKIaK3YVrmX7lS8s3OswoeLoP5sDRV060eE%3D&reserved=0
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

RE: The Curious Case of T6084 -or- Register Confusion with LLVM

GHC - devs mailing list
In reply to this post by Ben Gamari-3
|  One way to make this happen would be for C-- call nodes to carry information
|  about the calling convention of the target (e.g. how many arguments of each
|  type the function expects; in the same way identifiers in Core carry their
|  type).

That's be entirely possible for "known" calls, where the target is known, but not for "unknown" (i.e higher order) ones where the target of the call varies.

The "Making a fast curry" paper goes into this in some detail.  I think we already have different entry points for these two cases.  So maybe they could have different entry conventions...

Simon

|  -----Original Message-----
|  From: ghc-devs [mailto:[hidden email]] On Behalf Of Ben Gamari
|  Sent: 20 September 2017 16:54
|  To: Moritz Angermann <[hidden email]>; GHC developers <ghc-
|  [hidden email]>
|  Subject: Re: The Curious Case of T6084 -or- Register Confusion with LLVM
|  
|  Moritz Angermann <[hidden email]> writes:
|  
|  [snip]
|  >
|  > I should not have the YMM*, and ZMM* registers as I don’t have any AVX
|  > nor AVX512; that looks like only a patch away.  However we try to
|  > optimize our register, such that we can pass up to six doubles or six
|  > floats or any combination of both if needed in registers, without
|  > having to allocate them on the stack, by assuming overlapping registers
|  (See Note [Overlapping global registers]).
|  >
|  > And as such a full function signature in LLVM would as opposed to one
|  > that’s based on the “live” registers as we have right now, would
|  > consist of 12 float/double registers, and LLVM only maps 6.  My
|  > current idea is to, pass only the explicit F1,D1,…,F3,D3 and try to
|  > disable the register overlapping for LLVM.  This would probably force
|  > more floating values to be stack allocated rather than passed via
|  > registers, but would likely guarantee that the registers match up.
|  > The other option I can think of is to define some viertual generic
|  > floating registers in the llvm code gen: V1,…,V6 and then perform
|  > something like
|  >
|  >   F1 <- V1 as float
|  >   D1 <- V1 as double
|  >
|  > in the body of the function, while trying to use the `live`
|  > information at the call site to decide which of F1 or D1 to pass as V1.
|  >
|  Arguably the fundamental problem here is the assumption that all STG entry-
|  points have the same machine-level calling convention. As you point out, our
|  calling conventions in fact change due to things like register overlap.
|  Ideally the LLVM we produce would reflect this.
|  
|  One way to make this happen would be for C-- call nodes to carry information
|  about the calling convention of the target (e.g. how many arguments of each
|  type the function expects; in the same way identifiers in Core carry their
|  type). Unfortunately a brief look at the code generator suggests that this
|  may require a fair amount of plumbing.
|  
|  It's important to note though that this overlap problem is something that
|  will need to be addressed eventually if we are are to have proper SIMD
|  support (due to overlap between XMM, YMM, and ZMM).
|  
|  Cheers,
|  
|  - Ben
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: The Curious Case of T6084 -or- Register Confusion with LLVM

Kavon Farvardin
In reply to this post by Moritz Angermann-2
Responses are inline below:

> As the LLVM backend takes off from Cmm, we produce function that always hold
> the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim)
> and appends those registers that are live throughout the function call: in the
> case of `q` this is one Float and one Double register.

To be more precise, we append only the live floating point or vector arguments to this always live list. We need to do this because of overlapping register usage in our calling convention on x86-64 (F1 and D1 are both put in XMM1). See Note [Overlapping global registers] for details.


> Let’s assume these are F3 and D4.  Thus the function signature we generate looks like:
>
> ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)
>
> And expect the passed arguments to represent the following registers:
>
>              base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4
>
> as we found that f1 and d1 are not live.


I think it's wrong to assume that `q` accepts its first two floating-point arguments in F3 and D4, because I'm pretty sure the standard Cmm calling convention assigns them to F1 and D2, respectively. Are we actually outputting `q` such that F3 and D4 are used?


> (This is where my llvmng backend fell over, as it does not bitcast function
> signatures but tries to unify them.)


I think to solve this problem, we'll want to bitcast functions whenever we call them, because the type of an LLVM function is important for us to get the calling convention correct.


~kavon


> On Sep 20, 2017, at 4:44 AM, Moritz Angermann <[hidden email]> wrote:
>
> Hi *,
>
> TLDR: The LLVM backend might confuse floating registers in GHC.
>
> # Demo (Ticket #14251)
>
> Let Demo.hs be the following short program (a minor modification from T6084):
> ```
> {-# LANGUAGE MagicHash, BangPatterns #-}
> module Main where
>
> import GHC.Exts
>
> {-# NOINLINE f #-}
> f :: (Int# -> Float# -> Double# -> Float# -> Double# -> String) -> String
> f g = g 3# 4.0# 5.0## 6.0# 6.9## ++ " World!"
>
> {-# NOINLINE q #-}
> q :: Int# -> Float# -> Double# -> Float# -> Double# -> String
> q i j k l m = "Hello " ++ show (F# l) ++ " " ++ show (D# m)
>
> main = putStrLn (f $ q)
> ```
>
> What happens if we compile them with the NCG and LLVM?
>
> $ ghc -fasm -fforce-recomp Demo.hs -O2 -o Demo-ncg && ./Demo-ncg
> Hello 6.0 6.9 World!
>
> $ ghc -fllvm -fforce-recomp Demo.hs -O2 -o Demo-llvm && ./Demo-llvm
> Hello 4.0 5.0 World!
>
> # Discussion
>
> What is happening here?  The LLVM backend passes the registers in arguments,
> which are then mapped to registers via the GHC calling convention we added
> to LLVM.
>
> As the LLVM backend takes off from Cmm, we produce function that always hold
> the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim)
> and appends those registers that are live throughout the function call: in the
> case of `q` this is one Float and one Double register. Let’s assume these are
> F3 and D4.  Thus the function signature we generate looks like:
>
> ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)
>
> And expect the passed arguments to represent the following registers:
>
>              base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4
>
> as we found that f1 and d1 are not live.
>
> Yet, when we call `q` in the form of `g` in the body of `f`. We will pass it 14 arguments
> instead of 12.  To make this “typecheck” in LLVM, we
>
> @q' = bitcast @q to (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double)
>
> and call @q’(base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f1, d2, f3, d4).
>
> at this point, we now assign f3 <- f1 and d4 <- d2; while silently ignoring
> the passed arguments f3 and d4.
>
> (This is where my llvmng backend fell over, as it does not bitcast function
> signatures but tries to unify them.)
>
> # Solution?
>
> Initially, Ben and I though we could simply always pass all registers as
> arguments in LLVM and call it a day with the downside of create more verbose
> but correct code.  As I found out, that comes with a few complications.  For
> some reason, all active stg registers for my machine give me
>
>  Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim,
>  F1, D1, F2, D2, F3, D3,
>  XMM1,XMM2,XMM3,XMM4,XMM5,XMM6,
>  YMM1,YMM2,YMM3,YMM4,YMM5,YMM6,
>  ZMM1,ZMM2,ZMM3,ZMM4,ZMM5,ZMM6
>
> I should not have the YMM*, and ZMM* registers as I don’t have any AVX nor AVX512;
> that looks like only a patch away.  However we try to optimize our register, such
> that we can pass up to six doubles or six floats or any combination of both if needed
> in registers, without having to allocate them on the stack, by assuming overlapping
> registers (See Note [Overlapping global registers]).
>
> And as such a full function signature in LLVM would as opposed to one that’s based on
> the “live” registers as we have right now, would consist of 12 float/double registers,
> and LLVM only maps 6.  My current idea is to, pass only the explicit F1,D1,…,F3,D3
> and try to disable the register overlapping for LLVM.  This would probably force more
> floating values to be stack allocated rather than passed via registers, but would
> likely guarantee that the registers match up.  The other option I can think of is to
> define some viertual generic floating registers in the llvm code gen: V1,…,V6
> and then perform something like
>
>  F1 <- V1 as float
>  D1 <- V1 as double
>
> in the body of the function, while trying to use the `live` information at the call site
> to decide which of F1 or D1 to pass as V1.
>
> Ideas?
>
> Cheers,
> Moritz
>
> _______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: The Curious Case of T6084 -or- Register Confusion with LLVM

Kavon Farvardin
Let me elaborate a bit more because I clearly missed some points you already made in your original message. Sorry about that:


I don't think we need a heavyweight solution to this problem (the suggestions of: disabling overlapping registers for LLVM, or adding a new virtual register class Vx).

Instead, let's first remember how the type of the called function pointer corresponds to its calling convention when it is lowered to assembly in LLVM. In our GHC calling convention in LLVM, we can specify that

    if type == float OR type == double, use:
        XMM1,XMM2,XMM3,XMM4,XMM5,XMM6

When a calling convention is being determined by LLVM for any function definition or call, it goes in order from left to right in the list of parameters, and assigns float or double arguments to the first currently available register in that XMM list.

So, if `q` were indeed using F3 and D4 to accept its first two floating point arguments, the function signature we generate,

    ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)

is wrong. The registers for the `float, double` arguments will be assigned to XMM1 and XMM2 by LLVM. Since F3 and D4 use XMM3 and XMM4, respectively, we should have padded out the type of `q` in LLVM to be:


    ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double)

where the first `float, double` parameters are now unused. We would also perform the same type of padding at every call site where the first two float arguments are F3 and D4, so that they end up in the right physical registers.
We pass `undef` for the first two `float, double` arguments.



> On Sep 21, 2017, at 12:32 PM, Kavon Farvardin <[hidden email]> wrote:
>
> Responses are inline below:
>
>> As the LLVM backend takes off from Cmm, we produce function that always hold
>> the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim)
>> and appends those registers that are live throughout the function call: in the
>> case of `q` this is one Float and one Double register.
>
> To be more precise, we append only the live floating point or vector arguments to this always live list. We need to do this because of overlapping register usage in our calling convention on x86-64 (F1 and D1 are both put in XMM1). See Note [Overlapping global registers] for details.
>
>
>> Let’s assume these are F3 and D4.  Thus the function signature we generate looks like:
>>
>> ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)
>>
>> And expect the passed arguments to represent the following registers:
>>
>>             base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4
>>
>> as we found that f1 and d1 are not live.
>
>
> I think it's wrong to assume that `q` accepts its first two floating-point arguments in F3 and D4, because I'm pretty sure the standard Cmm calling convention assigns them to F1 and D2, respectively. Are we actually outputting `q` such that F3 and D4 are used?
>
>
>> (This is where my llvmng backend fell over, as it does not bitcast function
>> signatures but tries to unify them.)
>
>
> I think to solve this problem, we'll want to bitcast functions whenever we call them, because the type of an LLVM function is important for us to get the calling convention correct.
>
>
> ~kavon
>
>
>> On Sep 20, 2017, at 4:44 AM, Moritz Angermann <[hidden email]> wrote:
>>
>> Hi *,
>>
>> TLDR: The LLVM backend might confuse floating registers in GHC.
>>
>> # Demo (Ticket #14251)
>>
>> Let Demo.hs be the following short program (a minor modification from T6084):
>> ```
>> {-# LANGUAGE MagicHash, BangPatterns #-}
>> module Main where
>>
>> import GHC.Exts
>>
>> {-# NOINLINE f #-}
>> f :: (Int# -> Float# -> Double# -> Float# -> Double# -> String) -> String
>> f g = g 3# 4.0# 5.0## 6.0# 6.9## ++ " World!"
>>
>> {-# NOINLINE q #-}
>> q :: Int# -> Float# -> Double# -> Float# -> Double# -> String
>> q i j k l m = "Hello " ++ show (F# l) ++ " " ++ show (D# m)
>>
>> main = putStrLn (f $ q)
>> ```
>>
>> What happens if we compile them with the NCG and LLVM?
>>
>> $ ghc -fasm -fforce-recomp Demo.hs -O2 -o Demo-ncg && ./Demo-ncg
>> Hello 6.0 6.9 World!
>>
>> $ ghc -fllvm -fforce-recomp Demo.hs -O2 -o Demo-llvm && ./Demo-llvm
>> Hello 4.0 5.0 World!
>>
>> # Discussion
>>
>> What is happening here?  The LLVM backend passes the registers in arguments,
>> which are then mapped to registers via the GHC calling convention we added
>> to LLVM.
>>
>> As the LLVM backend takes off from Cmm, we produce function that always hold
>> the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim)
>> and appends those registers that are live throughout the function call: in the
>> case of `q` this is one Float and one Double register. Let’s assume these are
>> F3 and D4.  Thus the function signature we generate looks like:
>>
>> ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)
>>
>> And expect the passed arguments to represent the following registers:
>>
>>             base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4
>>
>> as we found that f1 and d1 are not live.
>>
>> Yet, when we call `q` in the form of `g` in the body of `f`. We will pass it 14 arguments
>> instead of 12.  To make this “typecheck” in LLVM, we
>>
>> @q' = bitcast @q to (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double)
>>
>> and call @q’(base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f1, d2, f3, d4).
>>
>> at this point, we now assign f3 <- f1 and d4 <- d2; while silently ignoring
>> the passed arguments f3 and d4.
>>
>> (This is where my llvmng backend fell over, as it does not bitcast function
>> signatures but tries to unify them.)
>>
>> # Solution?
>>
>> Initially, Ben and I though we could simply always pass all registers as
>> arguments in LLVM and call it a day with the downside of create more verbose
>> but correct code.  As I found out, that comes with a few complications.  For
>> some reason, all active stg registers for my machine give me
>>
>> Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim,
>> F1, D1, F2, D2, F3, D3,
>> XMM1,XMM2,XMM3,XMM4,XMM5,XMM6,
>> YMM1,YMM2,YMM3,YMM4,YMM5,YMM6,
>> ZMM1,ZMM2,ZMM3,ZMM4,ZMM5,ZMM6
>>
>> I should not have the YMM*, and ZMM* registers as I don’t have any AVX nor AVX512;
>> that looks like only a patch away.  However we try to optimize our register, such
>> that we can pass up to six doubles or six floats or any combination of both if needed
>> in registers, without having to allocate them on the stack, by assuming overlapping
>> registers (See Note [Overlapping global registers]).
>>
>> And as such a full function signature in LLVM would as opposed to one that’s based on
>> the “live” registers as we have right now, would consist of 12 float/double registers,
>> and LLVM only maps 6.  My current idea is to, pass only the explicit F1,D1,…,F3,D3
>> and try to disable the register overlapping for LLVM.  This would probably force more
>> floating values to be stack allocated rather than passed via registers, but would
>> likely guarantee that the registers match up.  The other option I can think of is to
>> define some viertual generic floating registers in the llvm code gen: V1,…,V6
>> and then perform something like
>>
>> F1 <- V1 as float
>> D1 <- V1 as double
>>
>> in the body of the function, while trying to use the `live` information at the call site
>> to decide which of F1 or D1 to pass as V1.
>>
>> Ideas?
>>
>> Cheers,
>> Moritz
>>
>> _______________________________________________
>> ghc-devs mailing list
>> [hidden email]
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: The Curious Case of T6084 -or- Register Confusion with LLVM

Moritz Angermann-2
The issue is at the function definition. In the price point splitting code we determine that the F1 and D2 registers are not actually used in the body of `q`.  And as such optimize the set of live register tees from R1, F1, D2, F3, D4 to R1, F3, D4.

I simply retain the live registers of the top proc instead of updating them to the optimized set.

As such we generate the correct function signature in the llvm backend.

Sent from my iPhone

On 22 Sep 2017, at 2:08 AM, Kavon Farvardin <[hidden email]> wrote:

Let me elaborate a bit more because I clearly missed some points you already made in your original message. Sorry about that:


I don't think we need a heavyweight solution to this problem (the suggestions of: disabling overlapping registers for LLVM, or adding a new virtual register class Vx).

Instead, let's first remember how the type of the called function pointer corresponds to its calling convention when it is lowered to assembly in LLVM. In our GHC calling convention in LLVM, we can specify that

   if type == float OR type == double, use:
       XMM1,XMM2,XMM3,XMM4,XMM5,XMM6

When a calling convention is being determined by LLVM for any function definition or call, it goes in order from left to right in the list of parameters, and assigns float or double arguments to the first currently available register in that XMM list.

So, if `q` were indeed using F3 and D4 to accept its first two floating point arguments, the function signature we generate,

   ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)

is wrong. The registers for the `float, double` arguments will be assigned to XMM1 and XMM2 by LLVM. Since F3 and D4 use XMM3 and XMM4, respectively, we should have padded out the type of `q` in LLVM to be:


   ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double)

where the first `float, double` parameters are now unused. We would also perform the same type of padding at every call site where the first two float arguments are F3 and D4, so that they end up in the right physical registers.
We pass `undef` for the first two `float, double` arguments.



On Sep 21, 2017, at 12:32 PM, Kavon Farvardin <[hidden email]> wrote:

Responses are inline below:

As the LLVM backend takes off from Cmm, we produce function that always hold
the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim)
and appends those registers that are live throughout the function call: in the
case of `q` this is one Float and one Double register.

To be more precise, we append only the live floating point or vector arguments to this always live list. We need to do this because of overlapping register usage in our calling convention on x86-64 (F1 and D1 are both put in XMM1). See Note [Overlapping global registers] for details.


Let’s assume these are F3 and D4.  Thus the function signature we generate looks like:

ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)

And expect the passed arguments to represent the following registers:

           base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4

as we found that f1 and d1 are not live.


I think it's wrong to assume that `q` accepts its first two floating-point arguments in F3 and D4, because I'm pretty sure the standard Cmm calling convention assigns them to F1 and D2, respectively. Are we actually outputting `q` such that F3 and D4 are used?


(This is where my llvmng backend fell over, as it does not bitcast function
signatures but tries to unify them.)


I think to solve this problem, we'll want to bitcast functions whenever we call them, because the type of an LLVM function is important for us to get the calling convention correct.


~kavon


On Sep 20, 2017, at 4:44 AM, Moritz Angermann <[hidden email]> wrote:

Hi *,

TLDR: The LLVM backend might confuse floating registers in GHC.

# Demo (Ticket #14251)

Let Demo.hs be the following short program (a minor modification from T6084):
```
{-# LANGUAGE MagicHash, BangPatterns #-}
module Main where

import GHC.Exts

{-# NOINLINE f #-}
f :: (Int# -> Float# -> Double# -> Float# -> Double# -> String) -> String
f g = g 3# 4.0# 5.0## 6.0# 6.9## ++ " World!"

{-# NOINLINE q #-}
q :: Int# -> Float# -> Double# -> Float# -> Double# -> String
q i j k l m = "Hello " ++ show (F# l) ++ " " ++ show (D# m)

main = putStrLn (f $ q)
```

What happens if we compile them with the NCG and LLVM?

$ ghc -fasm -fforce-recomp Demo.hs -O2 -o Demo-ncg && ./Demo-ncg
Hello 6.0 6.9 World!

$ ghc -fllvm -fforce-recomp Demo.hs -O2 -o Demo-llvm && ./Demo-llvm
Hello 4.0 5.0 World!

# Discussion

What is happening here?  The LLVM backend passes the registers in arguments,
which are then mapped to registers via the GHC calling convention we added
to LLVM.

As the LLVM backend takes off from Cmm, we produce function that always hold
the always live registers (on x86_64 these are: Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim)
and appends those registers that are live throughout the function call: in the
case of `q` this is one Float and one Double register. Let’s assume these are
F3 and D4.  Thus the function signature we generate looks like:

ghccc void @q(i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double)

And expect the passed arguments to represent the following registers:

           base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f3, d4

as we found that f1 and d1 are not live.

Yet, when we call `q` in the form of `g` in the body of `f`. We will pass it 14 arguments
instead of 12.  To make this “typecheck” in LLVM, we

@q' = bitcast @q to (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double)

and call @q’(base, sp, hp, r1, r2, r3, r4, r5, r6, spLim, f1, d2, f3, d4).

at this point, we now assign f3 <- f1 and d4 <- d2; while silently ignoring
the passed arguments f3 and d4.

(This is where my llvmng backend fell over, as it does not bitcast function
signatures but tries to unify them.)

# Solution?

Initially, Ben and I though we could simply always pass all registers as
arguments in LLVM and call it a day with the downside of create more verbose
but correct code.  As I found out, that comes with a few complications.  For
some reason, all active stg registers for my machine give me

Base, Sp, Hp, R1, R2, R3, R4, R5, R6, SpLim,
F1, D1, F2, D2, F3, D3,
XMM1,XMM2,XMM3,XMM4,XMM5,XMM6,
YMM1,YMM2,YMM3,YMM4,YMM5,YMM6,
ZMM1,ZMM2,ZMM3,ZMM4,ZMM5,ZMM6

I should not have the YMM*, and ZMM* registers as I don’t have any AVX nor AVX512;
that looks like only a patch away.  However we try to optimize our register, such
that we can pass up to six doubles or six floats or any combination of both if needed
in registers, without having to allocate them on the stack, by assuming overlapping
registers (See Note [Overlapping global registers]).

And as such a full function signature in LLVM would as opposed to one that’s based on
the “live” registers as we have right now, would consist of 12 float/double registers,
and LLVM only maps 6.  My current idea is to, pass only the explicit F1,D1,…,F3,D3
and try to disable the register overlapping for LLVM.  This would probably force more
floating values to be stack allocated rather than passed via registers, but would
likely guarantee that the registers match up.  The other option I can think of is to
define some viertual generic floating registers in the llvm code gen: V1,…,V6
and then perform something like

F1 <- V1 as float
D1 <- V1 as double

in the body of the function, while trying to use the `live` information at the call site
to decide which of F1 or D1 to pass as V1.

Ideas?

Cheers,
Moritz

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs