GHC's internal confusion about Ints and Words

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

GHC's internal confusion about Ints and Words

Moritz Angermann-2
Hi there!

So there is a procedure calling convention that for reasons I did not fully understand, but seem to be historically grown, uses packed arguments for those that are spilled onto the stack. On top of that, CInt is 32bit, Word is 64bits. This provides the following spectacle:

While we know in STG that the CInt is 32bits wide, when lowered into Cmm, it's represented as I64 in the arguments to the C function.  Thus packing based on the format of the Cmm type would yield 8 bytes. And now, all further packed arguments have the wrong offset (by four).

Specifically in GHC.Cmm.Utils we find:

primRepCmmType :: Platform -> PrimRep -> CmmType
primRepCmmType platform IntRep = bWord platform

mkIntCLit :: Platform -> Int -> CmmLit
mkIntCLit platform i = CmmInt (toInteger i) (wordWidth platform)

The naive idea to just fix this and make them return cIntWidth instead, seemingly produces the correct Cmm expressions at a local level, but produces a broken compiler.

A second approach could be to extend the Hints into providing sizes, and using those during the foreign call generation to pack spilled arguments.  This however appears to be more of a patching up of some fundamental underlying issue, instead of rectifying it properly.

Maybe I'll have to go down the Hint path, it does however break current Eq assumptions, as they are sized now, and what was equal before, is only equal now if they represent the same size.

From a cursory glance at the issues with naively fixing the width for Int, it seems that GHC internally assumes sizeof(Int) = sizeof(Word).  Maybe there is a whole level of HsInt vs CInt discrimination missing?

Cheers,
 Moritz

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

RE: GHC's internal confusion about Ints and Words

GHC - devs mailing list

Moritz

 

I’m afraid I don’t understand any of this.  Not your fault, but  I just don’t have enough context to know what you mean.

 

Is there a current bug?  If so, can you demonstrate it?   If not, what is the problem you want to solve?  Examples are always helpful.

 

Maybe it’s worth opening a ticket too?

 

Thanks!

 

Simon

 

From: ghc-devs <[hidden email]> On Behalf Of Moritz Angermann
Sent: 20 October 2020 02:51
To: ghc-devs <[hidden email]>
Subject: GHC's internal confusion about Ints and Words

 

Hi there!

 

So there is a procedure calling convention that for reasons I did not fully understand, but seem to be historically grown, uses packed arguments for those that are spilled onto the stack. On top of that, CInt is 32bit, Word is 64bits. This provides the following spectacle:

 

While we know in STG that the CInt is 32bits wide, when lowered into Cmm, it's represented as I64 in the arguments to the C function.  Thus packing based on the format of the Cmm type would yield 8 bytes. And now, all further packed arguments have the wrong offset (by four).

 

Specifically in GHC.Cmm.Utils we find:

primRepCmmType :: Platform -> PrimRep -> CmmType

primRepCmmType platform IntRep = bWord platform

 

mkIntCLit :: Platform -> Int -> CmmLit
mkIntCLit platform i = CmmInt (toInteger i) (wordWidth platform)

 

The naive idea to just fix this and make them return cIntWidth instead, seemingly produces the correct Cmm expressions at a local level, but produces a broken compiler.

 

A second approach could be to extend the Hints into providing sizes, and using those during the foreign call generation to pack spilled arguments.  This however appears to be more of a patching up of some fundamental underlying issue, instead of rectifying it properly.

 

Maybe I'll have to go down the Hint path, it does however break current Eq assumptions, as they are sized now, and what was equal before, is only equal now if they represent the same size.

 

From a cursory glance at the issues with naively fixing the width for Int, it seems that GHC internally assumes sizeof(Int) = sizeof(Word).  Maybe there is a whole level of HsInt vs CInt discrimination missing?

 

Cheers,

 Moritz


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: GHC's internal confusion about Ints and Words

Moritz Angermann-2
Alright, let me expand a bit.  I've been looking at aarch64 NCG for ghc.  The Linux side of things is looking really good,
so I've moved onto the macOS side (I'm afraid I don't have any Windows aarch64 hardware, nor much windows knowledge
to even attempt a Windows version yet).

When calling C functions, the usual approach is to pass the first few arguments in registers, and then arguments that exceed
the argument passing slots on the stack.  The Arm AArch64 Procedure Call Standard (aapcs) for C does this by assigning 8byte
slots to each overflow argument on the stack.  A company I won't name, has decided to implement a slightly different variation of
the Procedure Call Standard, often referred to as darwinpcs.  This deviates from the aapcs for vargs, as well as for handling of
spilled arguments on the stack.

The aapcs allows us to generate calls to C functions without knowing the actual prototype of the function, as all arguments are
simply spilled into 8byte slots on the stack.  The darwinpcs however requires us to know the size of the arguments, so we can
properly pack them onto the stack.  Ints have 4 bytes, so we need to pack them into 4byte slots.

In the process library we have this rather fun foreign import:
foreign import ccall unsafe "runInteractiveProcess"
  c_runInteractiveProcess
        ::  Ptr CString
        -> CString
        -> Ptr CString
        -> FD
        -> FD
        -> FD
        -> Ptr FD
        -> Ptr FD
        -> Ptr FD
        -> Ptr CGid
        -> Ptr CUid
        -> CInt                         -- reset child's SIGINT & SIGQUIT handlers
        -> CInt                         -- flags
        -> Ptr CString
        -> IO PHANDLE


with the corresponding C declaration:

extern ProcHandle runInteractiveProcess( char *const args[],
                                         char *workingDirectory,
                                         char **environment,
                                         int fdStdIn,
                                         int fdStdOut,
                                         int fdStdErr,
                                         int *pfdStdInput,
                                         int *pfdStdOutput,
                                         int *pfdStdError,
                                         gid_t *childGroup,
                                         uid_t *childUser,
                                         int reset_int_quit_handlers,
                                         int flags,
                                         char **failed_doing);
This function thus takes 14 arguments. We pass only the first 8 arguments in registers, and the others on the stack.
Argument 12 and 13 are of type int.  On linux using the aapcs, we can pass those in 8byte slots on the stack. That is
both of them are effectively 64bits wide when passed.  However for darwinpcs, it is expected that these adhere to their
size and are packed as such. Therefore Argument 12 and 13 need to be passed as 4byte slots each on the stack.

This yields a moderate 8byte saving on the stack for the same function call on darwinpcs compared to aapcs.

Now onto GHC.  When we generate function calls for foreign C functions, we deal with something like:

genCCall
    :: ForeignTarget      -- function to call
    -> [CmmFormal]        -- where to put the result
    -> [CmmActual]        -- arguments (of mixed type)
    -> BlockId            -- The block we are in
    -> NatM (InstrBlock, Maybe BlockId)


based on Cmm Nodes of the form CmmUnsafeForeignCall target result_regs args

The CmmActual in the care of runInteractiveProcess hold the arguments for the function, however contrary to the function
declaration, it contains I64 slots for Argument 12 and 13. Thus computing the space needed for them based on their Cmm
Representations yields 8bytes, when they should really be 32bit and consume only 4 byte.

To illustrate this a bit better: here is what we see in the pretty printed cmm:

(_s6w3::I64) = call "ccall" arg hints:  [PtrHint, PtrHint, PtrHint, signed, signed, signed, PtrHint, PtrHint, PtrHint, PtrHint, PtrHint, signed, signed, PtrHint]  result hints:  [signed] _runInteractiveProcess(I64[Sp + 96], I64[Sp + 88], I64[Sp + 104], I64[Sp + 112], I64[Sp + 120], I64[Sp + 56], I64[Sp + 64], I64[Sp + 72], I64[Sp + 24], 0, 0, I64[Sp + 8], I64[Sp + 40] | I64[Sp + 48] | I64[Sp + 80] | 3, I64[R1 + 7]);

I've added size information to the ForeignHints (NoHint, AddrHint, SignedHint) we have, and computed both, which yields:

[(CmmReg (CmmLocal (LocalReg s6Gi (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gk (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gm (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Go (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6Gq (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6Gs (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6Gu (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gw (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gy (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Cp (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6FU (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6GA (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6GR (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6GM (CmmType BitsCat W64))),AddrHint)]

Thus, while we *do* know the right size from STG (which is what the Hints are computed from), we loose this information when lowering
into Cmm, where we represent them with W64. This is what I was alluding to in the previous email. In primRepCmmType, and mkIntCLit, we set their type to 64bit for Ints; which on this platform does not hold.

Now I've gone ahead and effectively assume Cmm is lying to me when generating Foreign Function Calls, and rely on the (new) sized
hints to produce the appropriate argument packing on the stack.  However I believe the correct way would be for GHC not to conflate Ints
and Words in Cmm; implicitly assuming they are the same width.  Sadly it's not as simple as having primRepCmmType and mkIntCLit produce 32bit types. I fear GHC internally assumes "Int" means 64bit Integer, and then just happens to make the Int ~ CInt assumption.

Cheers,
 Moritz

On Tue, Oct 20, 2020 at 3:33 PM Simon Peyton Jones <[hidden email]> wrote:

Moritz

 

I’m afraid I don’t understand any of this.  Not your fault, but  I just don’t have enough context to know what you mean.

 

Is there a current bug?  If so, can you demonstrate it?   If not, what is the problem you want to solve?  Examples are always helpful.

 

Maybe it’s worth opening a ticket too?

 

Thanks!

 

Simon

 

From: ghc-devs <[hidden email]> On Behalf Of Moritz Angermann
Sent: 20 October 2020 02:51
To: ghc-devs <[hidden email]>
Subject: GHC's internal confusion about Ints and Words

 

Hi there!

 

So there is a procedure calling convention that for reasons I did not fully understand, but seem to be historically grown, uses packed arguments for those that are spilled onto the stack. On top of that, CInt is 32bit, Word is 64bits. This provides the following spectacle:

 

While we know in STG that the CInt is 32bits wide, when lowered into Cmm, it's represented as I64 in the arguments to the C function.  Thus packing based on the format of the Cmm type would yield 8 bytes. And now, all further packed arguments have the wrong offset (by four).

 

Specifically in GHC.Cmm.Utils we find:

primRepCmmType :: Platform -> PrimRep -> CmmType

primRepCmmType platform IntRep = bWord platform

 

mkIntCLit :: Platform -> Int -> CmmLit
mkIntCLit platform i = CmmInt (toInteger i) (wordWidth platform)

 

The naive idea to just fix this and make them return cIntWidth instead, seemingly produces the correct Cmm expressions at a local level, but produces a broken compiler.

 

A second approach could be to extend the Hints into providing sizes, and using those during the foreign call generation to pack spilled arguments.  This however appears to be more of a patching up of some fundamental underlying issue, instead of rectifying it properly.

 

Maybe I'll have to go down the Hint path, it does however break current Eq assumptions, as they are sized now, and what was equal before, is only equal now if they represent the same size.

 

From a cursory glance at the issues with naively fixing the width for Int, it seems that GHC internally assumes sizeof(Int) = sizeof(Word).  Maybe there is a whole level of HsInt vs CInt discrimination missing?

 

Cheers,

 Moritz


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: GHC's internal confusion about Ints and Words

Carter Schonwald
... are you talking about Haskell Int and word? Those are always the same size in bits and should match native point size. That is definitely an assumption of ghc 

It sounds like some information that is dropped after core is needed to correctly do something in stg/cmm in the context of the ARM64 ncg that was recently added to handle cint being 32bit in this context ?


On Tue, Oct 20, 2020 at 5:49 AM Moritz Angermann <[hidden email]> wrote:
Alright, let me expand a bit.  I've been looking at aarch64 NCG for ghc.  The Linux side of things is looking really good,
so I've moved onto the macOS side (I'm afraid I don't have any Windows aarch64 hardware, nor much windows knowledge
to even attempt a Windows version yet).

When calling C functions, the usual approach is to pass the first few arguments in registers, and then arguments that exceed
the argument passing slots on the stack.  The Arm AArch64 Procedure Call Standard (aapcs) for C does this by assigning 8byte
slots to each overflow argument on the stack.  A company I won't name, has decided to implement a slightly different variation of
the Procedure Call Standard, often referred to as darwinpcs.  This deviates from the aapcs for vargs, as well as for handling of
spilled arguments on the stack.

The aapcs allows us to generate calls to C functions without knowing the actual prototype of the function, as all arguments are
simply spilled into 8byte slots on the stack.  The darwinpcs however requires us to know the size of the arguments, so we can
properly pack them onto the stack.  Ints have 4 bytes, so we need to pack them into 4byte slots.

In the process library we have this rather fun foreign import:
foreign import ccall unsafe "runInteractiveProcess"
  c_runInteractiveProcess
        ::  Ptr CString
        -> CString
        -> Ptr CString
        -> FD
        -> FD
        -> FD
        -> Ptr FD
        -> Ptr FD
        -> Ptr FD
        -> Ptr CGid
        -> Ptr CUid
        -> CInt                         -- reset child's SIGINT & SIGQUIT handlers
        -> CInt                         -- flags
        -> Ptr CString
        -> IO PHANDLE


with the corresponding C declaration:

extern ProcHandle runInteractiveProcess( char *const args[],
                                         char *workingDirectory,
                                         char **environment,
                                         int fdStdIn,
                                         int fdStdOut,
                                         int fdStdErr,
                                         int *pfdStdInput,
                                         int *pfdStdOutput,
                                         int *pfdStdError,
                                         gid_t *childGroup,
                                         uid_t *childUser,
                                         int reset_int_quit_handlers,
                                         int flags,
                                         char **failed_doing);
This function thus takes 14 arguments. We pass only the first 8 arguments in registers, and the others on the stack.
Argument 12 and 13 are of type int.  On linux using the aapcs, we can pass those in 8byte slots on the stack. That is
both of them are effectively 64bits wide when passed.  However for darwinpcs, it is expected that these adhere to their
size and are packed as such. Therefore Argument 12 and 13 need to be passed as 4byte slots each on the stack.

This yields a moderate 8byte saving on the stack for the same function call on darwinpcs compared to aapcs.

Now onto GHC.  When we generate function calls for foreign C functions, we deal with something like:

genCCall
    :: ForeignTarget      -- function to call
    -> [CmmFormal]        -- where to put the result
    -> [CmmActual]        -- arguments (of mixed type)
    -> BlockId            -- The block we are in
    -> NatM (InstrBlock, Maybe BlockId)


based on Cmm Nodes of the form CmmUnsafeForeignCall target result_regs args

The CmmActual in the care of runInteractiveProcess hold the arguments for the function, however contrary to the function
declaration, it contains I64 slots for Argument 12 and 13. Thus computing the space needed for them based on their Cmm
Representations yields 8bytes, when they should really be 32bit and consume only 4 byte.

To illustrate this a bit better: here is what we see in the pretty printed cmm:

(_s6w3::I64) = call "ccall" arg hints:  [PtrHint, PtrHint, PtrHint, signed, signed, signed, PtrHint, PtrHint, PtrHint, PtrHint, PtrHint, signed, signed, PtrHint]  result hints:  [signed] _runInteractiveProcess(I64[Sp + 96], I64[Sp + 88], I64[Sp + 104], I64[Sp + 112], I64[Sp + 120], I64[Sp + 56], I64[Sp + 64], I64[Sp + 72], I64[Sp + 24], 0, 0, I64[Sp + 8], I64[Sp + 40] | I64[Sp + 48] | I64[Sp + 80] | 3, I64[R1 + 7]);

I've added size information to the ForeignHints (NoHint, AddrHint, SignedHint) we have, and computed both, which yields:

[(CmmReg (CmmLocal (LocalReg s6Gi (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gk (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gm (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Go (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6Gq (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6Gs (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6Gu (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gw (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gy (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Cp (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6FU (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6GA (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6GR (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6GM (CmmType BitsCat W64))),AddrHint)]

Thus, while we *do* know the right size from STG (which is what the Hints are computed from), we loose this information when lowering
into Cmm, where we represent them with W64. This is what I was alluding to in the previous email. In primRepCmmType, and mkIntCLit, we set their type to 64bit for Ints; which on this platform does not hold.

Now I've gone ahead and effectively assume Cmm is lying to me when generating Foreign Function Calls, and rely on the (new) sized
hints to produce the appropriate argument packing on the stack.  However I believe the correct way would be for GHC not to conflate Ints
and Words in Cmm; implicitly assuming they are the same width.  Sadly it's not as simple as having primRepCmmType and mkIntCLit produce 32bit types. I fear GHC internally assumes "Int" means 64bit Integer, and then just happens to make the Int ~ CInt assumption.

Cheers,
 Moritz

On Tue, Oct 20, 2020 at 3:33 PM Simon Peyton Jones <[hidden email]> wrote:

Moritz

 

I’m afraid I don’t understand any of this.  Not your fault, but  I just don’t have enough context to know what you mean.

 

Is there a current bug?  If so, can you demonstrate it?   If not, what is the problem you want to solve?  Examples are always helpful.

 

Maybe it’s worth opening a ticket too?

 

Thanks!

 

Simon

 

From: ghc-devs <[hidden email]> On Behalf Of Moritz Angermann
Sent: 20 October 2020 02:51
To: ghc-devs <[hidden email]>
Subject: GHC's internal confusion about Ints and Words

 

Hi there!

 

So there is a procedure calling convention that for reasons I did not fully understand, but seem to be historically grown, uses packed arguments for those that are spilled onto the stack. On top of that, CInt is 32bit, Word is 64bits. This provides the following spectacle:

 

While we know in STG that the CInt is 32bits wide, when lowered into Cmm, it's represented as I64 in the arguments to the C function.  Thus packing based on the format of the Cmm type would yield 8 bytes. And now, all further packed arguments have the wrong offset (by four).

 

Specifically in GHC.Cmm.Utils we find:

primRepCmmType :: Platform -> PrimRep -> CmmType

primRepCmmType platform IntRep = bWord platform

 

mkIntCLit :: Platform -> Int -> CmmLit
mkIntCLit platform i = CmmInt (toInteger i) (wordWidth platform)

 

The naive idea to just fix this and make them return cIntWidth instead, seemingly produces the correct Cmm expressions at a local level, but produces a broken compiler.

 

A second approach could be to extend the Hints into providing sizes, and using those during the foreign call generation to pack spilled arguments.  This however appears to be more of a patching up of some fundamental underlying issue, instead of rectifying it properly.

 

Maybe I'll have to go down the Hint path, it does however break current Eq assumptions, as they are sized now, and what was equal before, is only equal now if they represent the same size.

 

From a cursory glance at the issues with naively fixing the width for Int, it seems that GHC internally assumes sizeof(Int) = sizeof(Word).  Maybe there is a whole level of HsInt vs CInt discrimination missing?

 

Cheers,

 Moritz

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: GHC's internal confusion about Ints and Words

Moritz Angermann-2
Yes, that's right. I'm not sure it's in core though, as the width information still seems to be available in Stg. However the lowering from
stg into cmm widens it.

On Tue, Oct 20, 2020 at 9:57 PM Carter Schonwald <[hidden email]> wrote:
... are you talking about Haskell Int and word? Those are always the same size in bits and should match native point size. That is definitely an assumption of ghc 

It sounds like some information that is dropped after core is needed to correctly do something in stg/cmm in the context of the ARM64 ncg that was recently added to handle cint being 32bit in this context ?


On Tue, Oct 20, 2020 at 5:49 AM Moritz Angermann <[hidden email]> wrote:
Alright, let me expand a bit.  I've been looking at aarch64 NCG for ghc.  The Linux side of things is looking really good,
so I've moved onto the macOS side (I'm afraid I don't have any Windows aarch64 hardware, nor much windows knowledge
to even attempt a Windows version yet).

When calling C functions, the usual approach is to pass the first few arguments in registers, and then arguments that exceed
the argument passing slots on the stack.  The Arm AArch64 Procedure Call Standard (aapcs) for C does this by assigning 8byte
slots to each overflow argument on the stack.  A company I won't name, has decided to implement a slightly different variation of
the Procedure Call Standard, often referred to as darwinpcs.  This deviates from the aapcs for vargs, as well as for handling of
spilled arguments on the stack.

The aapcs allows us to generate calls to C functions without knowing the actual prototype of the function, as all arguments are
simply spilled into 8byte slots on the stack.  The darwinpcs however requires us to know the size of the arguments, so we can
properly pack them onto the stack.  Ints have 4 bytes, so we need to pack them into 4byte slots.

In the process library we have this rather fun foreign import:
foreign import ccall unsafe "runInteractiveProcess"
  c_runInteractiveProcess
        ::  Ptr CString
        -> CString
        -> Ptr CString
        -> FD
        -> FD
        -> FD
        -> Ptr FD
        -> Ptr FD
        -> Ptr FD
        -> Ptr CGid
        -> Ptr CUid
        -> CInt                         -- reset child's SIGINT & SIGQUIT handlers
        -> CInt                         -- flags
        -> Ptr CString
        -> IO PHANDLE


with the corresponding C declaration:

extern ProcHandle runInteractiveProcess( char *const args[],
                                         char *workingDirectory,
                                         char **environment,
                                         int fdStdIn,
                                         int fdStdOut,
                                         int fdStdErr,
                                         int *pfdStdInput,
                                         int *pfdStdOutput,
                                         int *pfdStdError,
                                         gid_t *childGroup,
                                         uid_t *childUser,
                                         int reset_int_quit_handlers,
                                         int flags,
                                         char **failed_doing);
This function thus takes 14 arguments. We pass only the first 8 arguments in registers, and the others on the stack.
Argument 12 and 13 are of type int.  On linux using the aapcs, we can pass those in 8byte slots on the stack. That is
both of them are effectively 64bits wide when passed.  However for darwinpcs, it is expected that these adhere to their
size and are packed as such. Therefore Argument 12 and 13 need to be passed as 4byte slots each on the stack.

This yields a moderate 8byte saving on the stack for the same function call on darwinpcs compared to aapcs.

Now onto GHC.  When we generate function calls for foreign C functions, we deal with something like:

genCCall
    :: ForeignTarget      -- function to call
    -> [CmmFormal]        -- where to put the result
    -> [CmmActual]        -- arguments (of mixed type)
    -> BlockId            -- The block we are in
    -> NatM (InstrBlock, Maybe BlockId)


based on Cmm Nodes of the form CmmUnsafeForeignCall target result_regs args

The CmmActual in the care of runInteractiveProcess hold the arguments for the function, however contrary to the function
declaration, it contains I64 slots for Argument 12 and 13. Thus computing the space needed for them based on their Cmm
Representations yields 8bytes, when they should really be 32bit and consume only 4 byte.

To illustrate this a bit better: here is what we see in the pretty printed cmm:

(_s6w3::I64) = call "ccall" arg hints:  [PtrHint, PtrHint, PtrHint, signed, signed, signed, PtrHint, PtrHint, PtrHint, PtrHint, PtrHint, signed, signed, PtrHint]  result hints:  [signed] _runInteractiveProcess(I64[Sp + 96], I64[Sp + 88], I64[Sp + 104], I64[Sp + 112], I64[Sp + 120], I64[Sp + 56], I64[Sp + 64], I64[Sp + 72], I64[Sp + 24], 0, 0, I64[Sp + 8], I64[Sp + 40] | I64[Sp + 48] | I64[Sp + 80] | 3, I64[R1 + 7]);

I've added size information to the ForeignHints (NoHint, AddrHint, SignedHint) we have, and computed both, which yields:

[(CmmReg (CmmLocal (LocalReg s6Gi (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gk (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gm (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Go (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6Gq (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6Gs (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6Gu (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gw (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Gy (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6Cp (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6FU (CmmType BitsCat W64))),AddrHint)
,(CmmReg (CmmLocal (LocalReg s6GA (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6GR (CmmType BitsCat W64))),SignedHint W32)
,(CmmReg (CmmLocal (LocalReg s6GM (CmmType BitsCat W64))),AddrHint)]

Thus, while we *do* know the right size from STG (which is what the Hints are computed from), we loose this information when lowering
into Cmm, where we represent them with W64. This is what I was alluding to in the previous email. In primRepCmmType, and mkIntCLit, we set their type to 64bit for Ints; which on this platform does not hold.

Now I've gone ahead and effectively assume Cmm is lying to me when generating Foreign Function Calls, and rely on the (new) sized
hints to produce the appropriate argument packing on the stack.  However I believe the correct way would be for GHC not to conflate Ints
and Words in Cmm; implicitly assuming they are the same width.  Sadly it's not as simple as having primRepCmmType and mkIntCLit produce 32bit types. I fear GHC internally assumes "Int" means 64bit Integer, and then just happens to make the Int ~ CInt assumption.

Cheers,
 Moritz

On Tue, Oct 20, 2020 at 3:33 PM Simon Peyton Jones <[hidden email]> wrote:

Moritz

 

I’m afraid I don’t understand any of this.  Not your fault, but  I just don’t have enough context to know what you mean.

 

Is there a current bug?  If so, can you demonstrate it?   If not, what is the problem you want to solve?  Examples are always helpful.

 

Maybe it’s worth opening a ticket too?

 

Thanks!

 

Simon

 

From: ghc-devs <[hidden email]> On Behalf Of Moritz Angermann
Sent: 20 October 2020 02:51
To: ghc-devs <[hidden email]>
Subject: GHC's internal confusion about Ints and Words

 

Hi there!

 

So there is a procedure calling convention that for reasons I did not fully understand, but seem to be historically grown, uses packed arguments for those that are spilled onto the stack. On top of that, CInt is 32bit, Word is 64bits. This provides the following spectacle:

 

While we know in STG that the CInt is 32bits wide, when lowered into Cmm, it's represented as I64 in the arguments to the C function.  Thus packing based on the format of the Cmm type would yield 8 bytes. And now, all further packed arguments have the wrong offset (by four).

 

Specifically in GHC.Cmm.Utils we find:

primRepCmmType :: Platform -> PrimRep -> CmmType

primRepCmmType platform IntRep = bWord platform

 

mkIntCLit :: Platform -> Int -> CmmLit
mkIntCLit platform i = CmmInt (toInteger i) (wordWidth platform)

 

The naive idea to just fix this and make them return cIntWidth instead, seemingly produces the correct Cmm expressions at a local level, but produces a broken compiler.

 

A second approach could be to extend the Hints into providing sizes, and using those during the foreign call generation to pack spilled arguments.  This however appears to be more of a patching up of some fundamental underlying issue, instead of rectifying it properly.

 

Maybe I'll have to go down the Hint path, it does however break current Eq assumptions, as they are sized now, and what was equal before, is only equal now if they represent the same size.

 

From a cursory glance at the issues with naively fixing the width for Int, it seems that GHC internally assumes sizeof(Int) = sizeof(Word).  Maybe there is a whole level of HsInt vs CInt discrimination missing?

 

Cheers,

 Moritz

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

RE: GHC's internal confusion about Ints and Words

GHC - devs mailing list

Now I've gone ahead and effectively assume Cmm is lying to me when generating Foreign Function Calls, and rely on the (new) sized hints to produce the appropriate argument packing on the stack

 

Why not instead just stop Cmm lying?

 

S

 

From: Moritz Angermann <[hidden email]>
Sent: 20 October 2020 15:03
To: Carter Schonwald <[hidden email]>
Cc: Simon Peyton Jones <[hidden email]>; ghc-devs <[hidden email]>
Subject: Re: GHC's internal confusion about Ints and Words

 

Yes, that's right. I'm not sure it's in core though, as the width information still seems to be available in Stg. However the lowering from

stg into cmm widens it.

 

On Tue, Oct 20, 2020 at 9:57 PM Carter Schonwald <[hidden email]> wrote:

... are you talking about Haskell Int and word? Those are always the same size in bits and should match native point size. That is definitely an assumption of ghc 

 

It sounds like some information that is dropped after core is needed to correctly do something in stg/cmm in the context of the ARM64 ncg that was recently added to handle cint being 32bit in this context ?

 

 

On Tue, Oct 20, 2020 at 5:49 AM Moritz Angermann <[hidden email]> wrote:

Alright, let me expand a bit.  I've been looking at aarch64 NCG for ghc.  The Linux side of things is looking really good,

so I've moved onto the macOS side (I'm afraid I don't have any Windows aarch64 hardware, nor much windows knowledge

to even attempt a Windows version yet).

 

When calling C functions, the usual approach is to pass the first few arguments in registers, and then arguments that exceed

the argument passing slots on the stack.  The Arm AArch64 Procedure Call Standard (aapcs) for C does this by assigning 8byte

slots to each overflow argument on the stack.  A company I won't name, has decided to implement a slightly different variation of

the Procedure Call Standard, often referred to as darwinpcs.  This deviates from the aapcs for vargs, as well as for handling of

spilled arguments on the stack.

 

The aapcs allows us to generate calls to C functions without knowing the actual prototype of the function, as all arguments are

simply spilled into 8byte slots on the stack.  The darwinpcs however requires us to know the size of the arguments, so we can

properly pack them onto the stack.  Ints have 4 bytes, so we need to pack them into 4byte slots.

 

In the process library we have this rather fun foreign import:

foreign import ccall unsafe "runInteractiveProcess"
  c_runInteractiveProcess
        ::  Ptr CString
        -> CString
        -> Ptr CString
        -> FD
        -> FD
        -> FD
        -> Ptr FD
        -> Ptr FD
        -> Ptr FD
        -> Ptr CGid
        -> Ptr CUid
        -> CInt                         -- reset child's SIGINT & SIGQUIT handlers
        -> CInt                         -- flags
        -> Ptr CString
        -> IO PHANDLE

 

with the corresponding C declaration:

extern ProcHandle runInteractiveProcess( char *const args[],
                                         char *workingDirectory,
                                         char **environment,
                                         int fdStdIn,
                                         int fdStdOut,
                                         int fdStdErr,
                                         int *pfdStdInput,
                                         int *pfdStdOutput,
                                         int *pfdStdError,
                                         gid_t *childGroup,
                                         uid_t *childUser,
                                         int reset_int_quit_handlers,
                                         int flags,
                                         char **failed_doing);

This function thus takes 14 arguments. We pass only the first 8 arguments in registers, and the others on the stack.

Argument 12 and 13 are of type int.  On linux using the aapcs, we can pass those in 8byte slots on the stack. That is

both of them are effectively 64bits wide when passed.  However for darwinpcs, it is expected that these adhere to their

size and are packed as such. Therefore Argument 12 and 13 need to be passed as 4byte slots each on the stack.

 

This yields a moderate 8byte saving on the stack for the same function call on darwinpcs compared to aapcs.

 

Now onto GHC.  When we generate function calls for foreign C functions, we deal with something like:

genCCall
    :: ForeignTarget      -- function to call
    -> [CmmFormal]        -- where to put the result
    -> [CmmActual]        -- arguments (of mixed type)
    -> BlockId            -- The block we are in
    -> NatM (InstrBlock, Maybe BlockId)

 

based on Cmm Nodes of the form CmmUnsafeForeignCall target result_regs args

 

The CmmActual in the care of runInteractiveProcess hold the arguments for the function, however contrary to the function

declaration, it contains I64 slots for Argument 12 and 13. Thus computing the space needed for them based on their Cmm

Representations yields 8bytes, when they should really be 32bit and consume only 4 byte.

 

To illustrate this a bit better: here is what we see in the pretty printed cmm:

 

(_s6w3::I64) = call "ccall" arg hints:  [PtrHint, PtrHint, PtrHint, signed, signed, signed, PtrHint, PtrHint, PtrHint, PtrHint, PtrHint, signed, signed, PtrHint]  result hints:  [signed] _runInteractiveProcess(I64[Sp + 96], I64[Sp + 88], I64[Sp + 104], I64[Sp + 112], I64[Sp + 120], I64[Sp + 56], I64[Sp + 64], I64[Sp + 72], I64[Sp + 24], 0, 0, I64[Sp + 8], I64[Sp + 40] | I64[Sp + 48] | I64[Sp + 80] | 3, I64[R1 + 7]);

 

I've added size information to the ForeignHints (NoHint, AddrHint, SignedHint) we have, and computed both, which yields:

[(CmmReg (CmmLocal (LocalReg s6Gi (CmmType BitsCat W64))),AddrHint)

,(CmmReg (CmmLocal (LocalReg s6Gk (CmmType BitsCat W64))),AddrHint)

,(CmmReg (CmmLocal (LocalReg s6Gm (CmmType BitsCat W64))),AddrHint)

,(CmmReg (CmmLocal (LocalReg s6Go (CmmType BitsCat W64))),SignedHint W32)

,(CmmReg (CmmLocal (LocalReg s6Gq (CmmType BitsCat W64))),SignedHint W32)

,(CmmReg (CmmLocal (LocalReg s6Gs (CmmType BitsCat W64))),SignedHint W32)

,(CmmReg (CmmLocal (LocalReg s6Gu (CmmType BitsCat W64))),AddrHint)

,(CmmReg (CmmLocal (LocalReg s6Gw (CmmType BitsCat W64))),AddrHint)

,(CmmReg (CmmLocal (LocalReg s6Gy (CmmType BitsCat W64))),AddrHint)

,(CmmReg (CmmLocal (LocalReg s6Cp (CmmType BitsCat W64))),AddrHint)

,(CmmReg (CmmLocal (LocalReg s6FU (CmmType BitsCat W64))),AddrHint)

,(CmmReg (CmmLocal (LocalReg s6GA (CmmType BitsCat W64))),SignedHint W32)

,(CmmReg (CmmLocal (LocalReg s6GR (CmmType BitsCat W64))),SignedHint W32)

,(CmmReg (CmmLocal (LocalReg s6GM (CmmType BitsCat W64))),AddrHint)]

 

Thus, while we *do* know the right size from STG (which is what the Hints are computed from), we loose this information when lowering

into Cmm, where we represent them with W64. This is what I was alluding to in the previous email. In primRepCmmType, and mkIntCLit, we set their type to 64bit for Ints; which on this platform does not hold.

 

Now I've gone ahead and effectively assume Cmm is lying to me when generating Foreign Function Calls, and rely on the (new) sized

hints to produce the appropriate argument packing on the stack.  However I believe the correct way would be for GHC not to conflate Ints

and Words in Cmm; implicitly assuming they are the same width.  Sadly it's not as simple as having primRepCmmType and mkIntCLit produce 32bit types. I fear GHC internally assumes "Int" means 64bit Integer, and then just happens to make the Int ~ CInt assumption.

 

Cheers,

 Moritz

 

On Tue, Oct 20, 2020 at 3:33 PM Simon Peyton Jones <[hidden email]> wrote:

Moritz

 

I’m afraid I don’t understand any of this.  Not your fault, but  I just don’t have enough context to know what you mean.

 

Is there a current bug?  If so, can you demonstrate it?   If not, what is the problem you want to solve?  Examples are always helpful.

 

Maybe it’s worth opening a ticket too?

 

Thanks!

 

Simon

 

From: ghc-devs <[hidden email]> On Behalf Of Moritz Angermann
Sent: 20 October 2020 02:51
To: ghc-devs <[hidden email]>
Subject: GHC's internal confusion about Ints and Words

 

Hi there!

 

So there is a procedure calling convention that for reasons I did not fully understand, but seem to be historically grown, uses packed arguments for those that are spilled onto the stack. On top of that, CInt is 32bit, Word is 64bits. This provides the following spectacle:

 

While we know in STG that the CInt is 32bits wide, when lowered into Cmm, it's represented as I64 in the arguments to the C function.  Thus packing based on the format of the Cmm type would yield 8 bytes. And now, all further packed arguments have the wrong offset (by four).

 

Specifically in GHC.Cmm.Utils we find:

primRepCmmType :: Platform -> PrimRep -> CmmType

primRepCmmType platform IntRep = bWord platform

 

mkIntCLit :: Platform -> Int -> CmmLit
mkIntCLit platform i = CmmInt (toInteger i) (wordWidth platform)

 

The naive idea to just fix this and make them return cIntWidth instead, seemingly produces the correct Cmm expressions at a local level, but produces a broken compiler.

 

A second approach could be to extend the Hints into providing sizes, and using those during the foreign call generation to pack spilled arguments.  This however appears to be more of a patching up of some fundamental underlying issue, instead of rectifying it properly.

 

Maybe I'll have to go down the Hint path, it does however break current Eq assumptions, as they are sized now, and what was equal before, is only equal now if they represent the same size.

 

From a cursory glance at the issues with naively fixing the width for Int, it seems that GHC internally assumes sizeof(Int) = sizeof(Word).  Maybe there is a whole level of HsInt vs CInt discrimination missing?

 

Cheers,

 Moritz

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: GHC's internal confusion about Ints and Words

Shao, Cheng
In reply to this post by Moritz Angermann-2
Indeed STG to Cmm lowering drops the correct size information for
ccall arguments, there's even a TODO comment that has been around for
quite a few years:
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/StgToCmm/Foreign.hs#L83

This has been an annoyance for Asterius as well. When we try to
translate a CmmUnsafeForeignCall node to a wasm function call, a CInt
argument (which should be i32 in wasm) can be mistyped as i64 which
causes a validation error. We have to insert wrap/extend opcodes based
on the callee function signature, but if we preserve correct argument
size in Cmm (or at least enrich the hints to include it), we won't
need such a hack.

On Tue, Oct 20, 2020 at 4:05 PM Moritz Angermann
<[hidden email]> wrote:

>
> Yes, that's right. I'm not sure it's in core though, as the width information still seems to be available in Stg. However the lowering from
> stg into cmm widens it.
>
> On Tue, Oct 20, 2020 at 9:57 PM Carter Schonwald <[hidden email]> wrote:
>>
>> ... are you talking about Haskell Int and word? Those are always the same size in bits and should match native point size. That is definitely an assumption of ghc
>>
>> It sounds like some information that is dropped after core is needed to correctly do something in stg/cmm in the context of the ARM64 ncg that was recently added to handle cint being 32bit in this context ?
>>
>>
>> On Tue, Oct 20, 2020 at 5:49 AM Moritz Angermann <[hidden email]> wrote:
>>>
>>> Alright, let me expand a bit.  I've been looking at aarch64 NCG for ghc.  The Linux side of things is looking really good,
>>> so I've moved onto the macOS side (I'm afraid I don't have any Windows aarch64 hardware, nor much windows knowledge
>>> to even attempt a Windows version yet).
>>>
>>> When calling C functions, the usual approach is to pass the first few arguments in registers, and then arguments that exceed
>>> the argument passing slots on the stack.  The Arm AArch64 Procedure Call Standard (aapcs) for C does this by assigning 8byte
>>> slots to each overflow argument on the stack.  A company I won't name, has decided to implement a slightly different variation of
>>> the Procedure Call Standard, often referred to as darwinpcs.  This deviates from the aapcs for vargs, as well as for handling of
>>> spilled arguments on the stack.
>>>
>>> The aapcs allows us to generate calls to C functions without knowing the actual prototype of the function, as all arguments are
>>> simply spilled into 8byte slots on the stack.  The darwinpcs however requires us to know the size of the arguments, so we can
>>> properly pack them onto the stack.  Ints have 4 bytes, so we need to pack them into 4byte slots.
>>>
>>> In the process library we have this rather fun foreign import:
>>> foreign import ccall unsafe "runInteractiveProcess"
>>>   c_runInteractiveProcess
>>>         ::  Ptr CString
>>>         -> CString
>>>         -> Ptr CString
>>>         -> FD
>>>         -> FD
>>>         -> FD
>>>         -> Ptr FD
>>>         -> Ptr FD
>>>         -> Ptr FD
>>>         -> Ptr CGid
>>>         -> Ptr CUid
>>>         -> CInt                         -- reset child's SIGINT & SIGQUIT handlers
>>>         -> CInt                         -- flags
>>>         -> Ptr CString
>>>         -> IO PHANDLE
>>>
>>> with the corresponding C declaration:
>>>
>>> extern ProcHandle runInteractiveProcess( char *const args[],
>>>                                          char *workingDirectory,
>>>                                          char **environment,
>>>                                          int fdStdIn,
>>>                                          int fdStdOut,
>>>                                          int fdStdErr,
>>>                                          int *pfdStdInput,
>>>                                          int *pfdStdOutput,
>>>                                          int *pfdStdError,
>>>                                          gid_t *childGroup,
>>>                                          uid_t *childUser,
>>>                                          int reset_int_quit_handlers,
>>>                                          int flags,
>>>                                          char **failed_doing);
>>> This function thus takes 14 arguments. We pass only the first 8 arguments in registers, and the others on the stack.
>>> Argument 12 and 13 are of type int.  On linux using the aapcs, we can pass those in 8byte slots on the stack. That is
>>> both of them are effectively 64bits wide when passed.  However for darwinpcs, it is expected that these adhere to their
>>> size and are packed as such. Therefore Argument 12 and 13 need to be passed as 4byte slots each on the stack.
>>>
>>> This yields a moderate 8byte saving on the stack for the same function call on darwinpcs compared to aapcs.
>>>
>>> Now onto GHC.  When we generate function calls for foreign C functions, we deal with something like:
>>>
>>> genCCall
>>>     :: ForeignTarget      -- function to call
>>>     -> [CmmFormal]        -- where to put the result
>>>     -> [CmmActual]        -- arguments (of mixed type)
>>>     -> BlockId            -- The block we are in
>>>     -> NatM (InstrBlock, Maybe BlockId)
>>>
>>> based on Cmm Nodes of the form CmmUnsafeForeignCall target result_regs args
>>>
>>> The CmmActual in the care of runInteractiveProcess hold the arguments for the function, however contrary to the function
>>> declaration, it contains I64 slots for Argument 12 and 13. Thus computing the space needed for them based on their Cmm
>>> Representations yields 8bytes, when they should really be 32bit and consume only 4 byte.
>>>
>>> To illustrate this a bit better: here is what we see in the pretty printed cmm:
>>>
>>> (_s6w3::I64) = call "ccall" arg hints:  [PtrHint, PtrHint, PtrHint, signed, signed, signed, PtrHint, PtrHint, PtrHint, PtrHint, PtrHint, signed, signed, PtrHint]  result hints:  [signed] _runInteractiveProcess(I64[Sp + 96], I64[Sp + 88], I64[Sp + 104], I64[Sp + 112], I64[Sp + 120], I64[Sp + 56], I64[Sp + 64], I64[Sp + 72], I64[Sp + 24], 0, 0, I64[Sp + 8], I64[Sp + 40] | I64[Sp + 48] | I64[Sp + 80] | 3, I64[R1 + 7]);
>>>
>>> I've added size information to the ForeignHints (NoHint, AddrHint, SignedHint) we have, and computed both, which yields:
>>>
>>> [(CmmReg (CmmLocal (LocalReg s6Gi (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gk (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gm (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Go (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gq (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gs (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gu (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gw (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gy (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Cp (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6FU (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6GA (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6GR (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6GM (CmmType BitsCat W64))),AddrHint)]
>>>
>>> Thus, while we *do* know the right size from STG (which is what the Hints are computed from), we loose this information when lowering
>>> into Cmm, where we represent them with W64. This is what I was alluding to in the previous email. In primRepCmmType, and mkIntCLit, we set their type to 64bit for Ints; which on this platform does not hold.
>>>
>>> Now I've gone ahead and effectively assume Cmm is lying to me when generating Foreign Function Calls, and rely on the (new) sized
>>> hints to produce the appropriate argument packing on the stack.  However I believe the correct way would be for GHC not to conflate Ints
>>> and Words in Cmm; implicitly assuming they are the same width.  Sadly it's not as simple as having primRepCmmType and mkIntCLit produce 32bit types. I fear GHC internally assumes "Int" means 64bit Integer, and then just happens to make the Int ~ CInt assumption.
>>>
>>> Cheers,
>>>  Moritz
>>>
>>> On Tue, Oct 20, 2020 at 3:33 PM Simon Peyton Jones <[hidden email]> wrote:
>>>>
>>>> Moritz
>>>>
>>>>
>>>>
>>>> I’m afraid I don’t understand any of this.  Not your fault, but  I just don’t have enough context to know what you mean.
>>>>
>>>>
>>>>
>>>> Is there a current bug?  If so, can you demonstrate it?   If not, what is the problem you want to solve?  Examples are always helpful.
>>>>
>>>>
>>>>
>>>> Maybe it’s worth opening a ticket too?
>>>>
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>> Simon
>>>>
>>>>
>>>>
>>>> From: ghc-devs <[hidden email]> On Behalf Of Moritz Angermann
>>>> Sent: 20 October 2020 02:51
>>>> To: ghc-devs <[hidden email]>
>>>> Subject: GHC's internal confusion about Ints and Words
>>>>
>>>>
>>>>
>>>> Hi there!
>>>>
>>>>
>>>>
>>>> So there is a procedure calling convention that for reasons I did not fully understand, but seem to be historically grown, uses packed arguments for those that are spilled onto the stack. On top of that, CInt is 32bit, Word is 64bits. This provides the following spectacle:
>>>>
>>>>
>>>>
>>>> While we know in STG that the CInt is 32bits wide, when lowered into Cmm, it's represented as I64 in the arguments to the C function.  Thus packing based on the format of the Cmm type would yield 8 bytes. And now, all further packed arguments have the wrong offset (by four).
>>>>
>>>>
>>>>
>>>> Specifically in GHC.Cmm.Utils we find:
>>>>
>>>> primRepCmmType :: Platform -> PrimRep -> CmmType
>>>>
>>>> primRepCmmType platform IntRep = bWord platform
>>>>
>>>>
>>>>
>>>> mkIntCLit :: Platform -> Int -> CmmLit
>>>> mkIntCLit platform i = CmmInt (toInteger i) (wordWidth platform)
>>>>
>>>>
>>>>
>>>> The naive idea to just fix this and make them return cIntWidth instead, seemingly produces the correct Cmm expressions at a local level, but produces a broken compiler.
>>>>
>>>>
>>>>
>>>> A second approach could be to extend the Hints into providing sizes, and using those during the foreign call generation to pack spilled arguments.  This however appears to be more of a patching up of some fundamental underlying issue, instead of rectifying it properly.
>>>>
>>>>
>>>>
>>>> Maybe I'll have to go down the Hint path, it does however break current Eq assumptions, as they are sized now, and what was equal before, is only equal now if they represent the same size.
>>>>
>>>>
>>>>
>>>> From a cursory glance at the issues with naively fixing the width for Int, it seems that GHC internally assumes sizeof(Int) = sizeof(Word).  Maybe there is a whole level of HsInt vs CInt discrimination missing?
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>>  Moritz
>>>
>>> _______________________________________________
>>> ghc-devs mailing list
>>> [hidden email]
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
> _______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: GHC's internal confusion about Ints and Words

Moritz Angermann-2
Hi *,

so, after some discussion with Simon and Simon, as well as Ben, we are all in agreement that using sized hints
is a band-aid solution for the real underlying problem.  Where the underlying problem is that we have CInt ~ Int32,
and we represent Int32 as I32# Int#.  And the proper solution would not likely be to represent Int32 as I32# Int32#.

After some trial and error (mostly be being too aggressive on changing Ints to sized ones, unnecessarily -- thanks
Ben for helping me stay on course!), I've produce what mostly amounts to this patch[1].

It also requires some additional narrow/extend calls to a few Data.Array.Base signatures to make them typecheck.

However I've got plenty of failures in the testsuite now. Hooray!

Most of them are of this form:

*** Core Lint errors : in result of Desugar (before optimization) ***
T12010.hsc:34:1: warning:
    Argument value doesn't match argument type:
    Fun type: Int# -> Int#
    Arg type: Int32#
    Arg: ds_d1B3
    In the RHS of c_socket :: CInt -> CInt -> CInt -> IO CInt
    In the body of lambda with binder ds_d1AU :: Int32
    In the body of lambda with binder ds_d1AV :: Int32
    In the body of lambda with binder ds_d1AW :: Int32
    In a case alternative: (I32# ds_d1AY :: Int32#)
    In a case alternative: (I32# ds_d1B0 :: Int32#)
    In a case alternative: (I32# ds_d1B2 :: Int32#)
    In the body of lambda with binder ds_d1B5 :: State# RealWorld
    In a case alternative: ((#,#) ds_d1B4 :: State# RealWorld,
                                  ds_d1B3 :: Int32#)
    Substitution: [TCvSubst
                     In scope: InScope {}
                     Type env: []
                     Co env: []]



Some other minor ones are test that now need explicit narrow/extending where it didn't need before.

As well as this beauty:

-- RHS size: {terms: 16, types: 0, coercions: 0, joins: 0/0}
i32 :: Int32
[GblId,
 Cpr=m1,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 23 10}]
i32
  = GHC.Int.I32#
      (GHC.Prim.narrowInt32#
         (GHC.Prim.andI#
            (GHC.Prim.extendInt32#
               (GHC.Prim.narrowInt32#
                  (GHC.Prim.extendInt32# (GHC.Prim.narrowInt32# 1#))))
            (GHC.Prim.extendInt32#
               (GHC.Prim.narrowInt32#
                  (GHC.Prim.notI#
                     (GHC.Prim.extendInt32#
                        (GHC.Prim.narrowInt32#
                           (GHC.Prim.extendInt32# (GHC.Prim.narrowInt32# 1#)))))))))


This clearly needs some clean up.

Apart from that the rest seems to be mostly working. Any input would be appreciated. I'll need to do the same for
Word as well I'm afraid.

Cheers,
 Moritz
--

On Tue, Oct 20, 2020 at 10:34 PM Cheng Shao <[hidden email]> wrote:
Indeed STG to Cmm lowering drops the correct size information for
ccall arguments, there's even a TODO comment that has been around for
quite a few years:
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/StgToCmm/Foreign.hs#L83

This has been an annoyance for Asterius as well. When we try to
translate a CmmUnsafeForeignCall node to a wasm function call, a CInt
argument (which should be i32 in wasm) can be mistyped as i64 which
causes a validation error. We have to insert wrap/extend opcodes based
on the callee function signature, but if we preserve correct argument
size in Cmm (or at least enrich the hints to include it), we won't
need such a hack.

On Tue, Oct 20, 2020 at 4:05 PM Moritz Angermann
<[hidden email]> wrote:
>
> Yes, that's right. I'm not sure it's in core though, as the width information still seems to be available in Stg. However the lowering from
> stg into cmm widens it.
>
> On Tue, Oct 20, 2020 at 9:57 PM Carter Schonwald <[hidden email]> wrote:
>>
>> ... are you talking about Haskell Int and word? Those are always the same size in bits and should match native point size. That is definitely an assumption of ghc
>>
>> It sounds like some information that is dropped after core is needed to correctly do something in stg/cmm in the context of the ARM64 ncg that was recently added to handle cint being 32bit in this context ?
>>
>>
>> On Tue, Oct 20, 2020 at 5:49 AM Moritz Angermann <[hidden email]> wrote:
>>>
>>> Alright, let me expand a bit.  I've been looking at aarch64 NCG for ghc.  The Linux side of things is looking really good,
>>> so I've moved onto the macOS side (I'm afraid I don't have any Windows aarch64 hardware, nor much windows knowledge
>>> to even attempt a Windows version yet).
>>>
>>> When calling C functions, the usual approach is to pass the first few arguments in registers, and then arguments that exceed
>>> the argument passing slots on the stack.  The Arm AArch64 Procedure Call Standard (aapcs) for C does this by assigning 8byte
>>> slots to each overflow argument on the stack.  A company I won't name, has decided to implement a slightly different variation of
>>> the Procedure Call Standard, often referred to as darwinpcs.  This deviates from the aapcs for vargs, as well as for handling of
>>> spilled arguments on the stack.
>>>
>>> The aapcs allows us to generate calls to C functions without knowing the actual prototype of the function, as all arguments are
>>> simply spilled into 8byte slots on the stack.  The darwinpcs however requires us to know the size of the arguments, so we can
>>> properly pack them onto the stack.  Ints have 4 bytes, so we need to pack them into 4byte slots.
>>>
>>> In the process library we have this rather fun foreign import:
>>> foreign import ccall unsafe "runInteractiveProcess"
>>>   c_runInteractiveProcess
>>>         ::  Ptr CString
>>>         -> CString
>>>         -> Ptr CString
>>>         -> FD
>>>         -> FD
>>>         -> FD
>>>         -> Ptr FD
>>>         -> Ptr FD
>>>         -> Ptr FD
>>>         -> Ptr CGid
>>>         -> Ptr CUid
>>>         -> CInt                         -- reset child's SIGINT & SIGQUIT handlers
>>>         -> CInt                         -- flags
>>>         -> Ptr CString
>>>         -> IO PHANDLE
>>>
>>> with the corresponding C declaration:
>>>
>>> extern ProcHandle runInteractiveProcess( char *const args[],
>>>                                          char *workingDirectory,
>>>                                          char **environment,
>>>                                          int fdStdIn,
>>>                                          int fdStdOut,
>>>                                          int fdStdErr,
>>>                                          int *pfdStdInput,
>>>                                          int *pfdStdOutput,
>>>                                          int *pfdStdError,
>>>                                          gid_t *childGroup,
>>>                                          uid_t *childUser,
>>>                                          int reset_int_quit_handlers,
>>>                                          int flags,
>>>                                          char **failed_doing);
>>> This function thus takes 14 arguments. We pass only the first 8 arguments in registers, and the others on the stack.
>>> Argument 12 and 13 are of type int.  On linux using the aapcs, we can pass those in 8byte slots on the stack. That is
>>> both of them are effectively 64bits wide when passed.  However for darwinpcs, it is expected that these adhere to their
>>> size and are packed as such. Therefore Argument 12 and 13 need to be passed as 4byte slots each on the stack.
>>>
>>> This yields a moderate 8byte saving on the stack for the same function call on darwinpcs compared to aapcs.
>>>
>>> Now onto GHC.  When we generate function calls for foreign C functions, we deal with something like:
>>>
>>> genCCall
>>>     :: ForeignTarget      -- function to call
>>>     -> [CmmFormal]        -- where to put the result
>>>     -> [CmmActual]        -- arguments (of mixed type)
>>>     -> BlockId            -- The block we are in
>>>     -> NatM (InstrBlock, Maybe BlockId)
>>>
>>> based on Cmm Nodes of the form CmmUnsafeForeignCall target result_regs args
>>>
>>> The CmmActual in the care of runInteractiveProcess hold the arguments for the function, however contrary to the function
>>> declaration, it contains I64 slots for Argument 12 and 13. Thus computing the space needed for them based on their Cmm
>>> Representations yields 8bytes, when they should really be 32bit and consume only 4 byte.
>>>
>>> To illustrate this a bit better: here is what we see in the pretty printed cmm:
>>>
>>> (_s6w3::I64) = call "ccall" arg hints:  [PtrHint, PtrHint, PtrHint, signed, signed, signed, PtrHint, PtrHint, PtrHint, PtrHint, PtrHint, signed, signed, PtrHint]  result hints:  [signed] _runInteractiveProcess(I64[Sp + 96], I64[Sp + 88], I64[Sp + 104], I64[Sp + 112], I64[Sp + 120], I64[Sp + 56], I64[Sp + 64], I64[Sp + 72], I64[Sp + 24], 0, 0, I64[Sp + 8], I64[Sp + 40] | I64[Sp + 48] | I64[Sp + 80] | 3, I64[R1 + 7]);
>>>
>>> I've added size information to the ForeignHints (NoHint, AddrHint, SignedHint) we have, and computed both, which yields:
>>>
>>> [(CmmReg (CmmLocal (LocalReg s6Gi (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gk (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gm (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Go (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gq (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gs (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gu (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gw (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gy (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Cp (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6FU (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6GA (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6GR (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6GM (CmmType BitsCat W64))),AddrHint)]
>>>
>>> Thus, while we *do* know the right size from STG (which is what the Hints are computed from), we loose this information when lowering
>>> into Cmm, where we represent them with W64. This is what I was alluding to in the previous email. In primRepCmmType, and mkIntCLit, we set their type to 64bit for Ints; which on this platform does not hold.
>>>
>>> Now I've gone ahead and effectively assume Cmm is lying to me when generating Foreign Function Calls, and rely on the (new) sized
>>> hints to produce the appropriate argument packing on the stack.  However I believe the correct way would be for GHC not to conflate Ints
>>> and Words in Cmm; implicitly assuming they are the same width.  Sadly it's not as simple as having primRepCmmType and mkIntCLit produce 32bit types. I fear GHC internally assumes "Int" means 64bit Integer, and then just happens to make the Int ~ CInt assumption.
>>>
>>> Cheers,
>>>  Moritz
>>>
>>> On Tue, Oct 20, 2020 at 3:33 PM Simon Peyton Jones <[hidden email]> wrote:
>>>>
>>>> Moritz
>>>>
>>>>
>>>>
>>>> I’m afraid I don’t understand any of this.  Not your fault, but  I just don’t have enough context to know what you mean.
>>>>
>>>>
>>>>
>>>> Is there a current bug?  If so, can you demonstrate it?   If not, what is the problem you want to solve?  Examples are always helpful.
>>>>
>>>>
>>>>
>>>> Maybe it’s worth opening a ticket too?
>>>>
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>> Simon
>>>>
>>>>
>>>>
>>>> From: ghc-devs <[hidden email]> On Behalf Of Moritz Angermann
>>>> Sent: 20 October 2020 02:51
>>>> To: ghc-devs <[hidden email]>
>>>> Subject: GHC's internal confusion about Ints and Words
>>>>
>>>>
>>>>
>>>> Hi there!
>>>>
>>>>
>>>>
>>>> So there is a procedure calling convention that for reasons I did not fully understand, but seem to be historically grown, uses packed arguments for those that are spilled onto the stack. On top of that, CInt is 32bit, Word is 64bits. This provides the following spectacle:
>>>>
>>>>
>>>>
>>>> While we know in STG that the CInt is 32bits wide, when lowered into Cmm, it's represented as I64 in the arguments to the C function.  Thus packing based on the format of the Cmm type would yield 8 bytes. And now, all further packed arguments have the wrong offset (by four).
>>>>
>>>>
>>>>
>>>> Specifically in GHC.Cmm.Utils we find:
>>>>
>>>> primRepCmmType :: Platform -> PrimRep -> CmmType
>>>>
>>>> primRepCmmType platform IntRep = bWord platform
>>>>
>>>>
>>>>
>>>> mkIntCLit :: Platform -> Int -> CmmLit
>>>> mkIntCLit platform i = CmmInt (toInteger i) (wordWidth platform)
>>>>
>>>>
>>>>
>>>> The naive idea to just fix this and make them return cIntWidth instead, seemingly produces the correct Cmm expressions at a local level, but produces a broken compiler.
>>>>
>>>>
>>>>
>>>> A second approach could be to extend the Hints into providing sizes, and using those during the foreign call generation to pack spilled arguments.  This however appears to be more of a patching up of some fundamental underlying issue, instead of rectifying it properly.
>>>>
>>>>
>>>>
>>>> Maybe I'll have to go down the Hint path, it does however break current Eq assumptions, as they are sized now, and what was equal before, is only equal now if they represent the same size.
>>>>
>>>>
>>>>
>>>> From a cursory glance at the issues with naively fixing the width for Int, it seems that GHC internally assumes sizeof(Int) = sizeof(Word).  Maybe there is a whole level of HsInt vs CInt discrimination missing?
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>>  Moritz
>>>
>>> _______________________________________________
>>> ghc-devs mailing list
>>> [hidden email]
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
> _______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: GHC's internal confusion about Ints and Words

John Ericson-2

That sounds like a great change, and beautifully dovetails with !3658. (In fact an earlier version of that PR also attempted the Int32 change.)

I would just try to finish that and/or reuse the techniques. Sylvain has been doing all the work lately as I've been starved for time/ideas, so talk to him.

John

On 10/22/20 9:45 AM, Moritz Angermann wrote:
Hi *,

so, after some discussion with Simon and Simon, as well as Ben, we are all in agreement that using sized hints
is a band-aid solution for the real underlying problem.  Where the underlying problem is that we have CInt ~ Int32,
and we represent Int32 as I32# Int#.  And the proper solution would not likely be to represent Int32 as I32# Int32#.

After some trial and error (mostly be being too aggressive on changing Ints to sized ones, unnecessarily -- thanks
Ben for helping me stay on course!), I've produce what mostly amounts to this patch[1].

It also requires some additional narrow/extend calls to a few Data.Array.Base signatures to make them typecheck.

However I've got plenty of failures in the testsuite now. Hooray!

Most of them are of this form:

*** Core Lint errors : in result of Desugar (before optimization) ***
T12010.hsc:34:1: warning:
    Argument value doesn't match argument type:
    Fun type: Int# -> Int#
    Arg type: Int32#
    Arg: ds_d1B3
    In the RHS of c_socket :: CInt -> CInt -> CInt -> IO CInt
    In the body of lambda with binder ds_d1AU :: Int32
    In the body of lambda with binder ds_d1AV :: Int32
    In the body of lambda with binder ds_d1AW :: Int32
    In a case alternative: (I32# ds_d1AY :: Int32#)
    In a case alternative: (I32# ds_d1B0 :: Int32#)
    In a case alternative: (I32# ds_d1B2 :: Int32#)
    In the body of lambda with binder ds_d1B5 :: State# RealWorld
    In a case alternative: ((#,#) ds_d1B4 :: State# RealWorld,
                                  ds_d1B3 :: Int32#)
    Substitution: [TCvSubst
                     In scope: InScope {}
                     Type env: []
                     Co env: []]



Some other minor ones are test that now need explicit narrow/extending where it didn't need before.

As well as this beauty:

-- RHS size: {terms: 16, types: 0, coercions: 0, joins: 0/0}
i32 :: Int32
[GblId,
 Cpr=m1,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 23 10}]
i32
  = GHC.Int.I32#
      (GHC.Prim.narrowInt32#
         (GHC.Prim.andI#
            (GHC.Prim.extendInt32#
               (GHC.Prim.narrowInt32#
                  (GHC.Prim.extendInt32# (GHC.Prim.narrowInt32# 1#))))
            (GHC.Prim.extendInt32#
               (GHC.Prim.narrowInt32#
                  (GHC.Prim.notI#
                     (GHC.Prim.extendInt32#
                        (GHC.Prim.narrowInt32#
                           (GHC.Prim.extendInt32# (GHC.Prim.narrowInt32# 1#)))))))))


This clearly needs some clean up.

Apart from that the rest seems to be mostly working. Any input would be appreciated. I'll need to do the same for
Word as well I'm afraid.

Cheers,
 Moritz
--

On Tue, Oct 20, 2020 at 10:34 PM Cheng Shao <[hidden email]> wrote:
Indeed STG to Cmm lowering drops the correct size information for
ccall arguments, there's even a TODO comment that has been around for
quite a few years:
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/StgToCmm/Foreign.hs#L83

This has been an annoyance for Asterius as well. When we try to
translate a CmmUnsafeForeignCall node to a wasm function call, a CInt
argument (which should be i32 in wasm) can be mistyped as i64 which
causes a validation error. We have to insert wrap/extend opcodes based
on the callee function signature, but if we preserve correct argument
size in Cmm (or at least enrich the hints to include it), we won't
need such a hack.

On Tue, Oct 20, 2020 at 4:05 PM Moritz Angermann
<[hidden email]> wrote:
>
> Yes, that's right. I'm not sure it's in core though, as the width information still seems to be available in Stg. However the lowering from
> stg into cmm widens it.
>
> On Tue, Oct 20, 2020 at 9:57 PM Carter Schonwald <[hidden email]> wrote:
>>
>> ... are you talking about Haskell Int and word? Those are always the same size in bits and should match native point size. That is definitely an assumption of ghc
>>
>> It sounds like some information that is dropped after core is needed to correctly do something in stg/cmm in the context of the ARM64 ncg that was recently added to handle cint being 32bit in this context ?
>>
>>
>> On Tue, Oct 20, 2020 at 5:49 AM Moritz Angermann <[hidden email]> wrote:
>>>
>>> Alright, let me expand a bit.  I've been looking at aarch64 NCG for ghc.  The Linux side of things is looking really good,
>>> so I've moved onto the macOS side (I'm afraid I don't have any Windows aarch64 hardware, nor much windows knowledge
>>> to even attempt a Windows version yet).
>>>
>>> When calling C functions, the usual approach is to pass the first few arguments in registers, and then arguments that exceed
>>> the argument passing slots on the stack.  The Arm AArch64 Procedure Call Standard (aapcs) for C does this by assigning 8byte
>>> slots to each overflow argument on the stack.  A company I won't name, has decided to implement a slightly different variation of
>>> the Procedure Call Standard, often referred to as darwinpcs.  This deviates from the aapcs for vargs, as well as for handling of
>>> spilled arguments on the stack.
>>>
>>> The aapcs allows us to generate calls to C functions without knowing the actual prototype of the function, as all arguments are
>>> simply spilled into 8byte slots on the stack.  The darwinpcs however requires us to know the size of the arguments, so we can
>>> properly pack them onto the stack.  Ints have 4 bytes, so we need to pack them into 4byte slots.
>>>
>>> In the process library we have this rather fun foreign import:
>>> foreign import ccall unsafe "runInteractiveProcess"
>>>   c_runInteractiveProcess
>>>         ::  Ptr CString
>>>         -> CString
>>>         -> Ptr CString
>>>         -> FD
>>>         -> FD
>>>         -> FD
>>>         -> Ptr FD
>>>         -> Ptr FD
>>>         -> Ptr FD
>>>         -> Ptr CGid
>>>         -> Ptr CUid
>>>         -> CInt                         -- reset child's SIGINT & SIGQUIT handlers
>>>         -> CInt                         -- flags
>>>         -> Ptr CString
>>>         -> IO PHANDLE
>>>
>>> with the corresponding C declaration:
>>>
>>> extern ProcHandle runInteractiveProcess( char *const args[],
>>>                                          char *workingDirectory,
>>>                                          char **environment,
>>>                                          int fdStdIn,
>>>                                          int fdStdOut,
>>>                                          int fdStdErr,
>>>                                          int *pfdStdInput,
>>>                                          int *pfdStdOutput,
>>>                                          int *pfdStdError,
>>>                                          gid_t *childGroup,
>>>                                          uid_t *childUser,
>>>                                          int reset_int_quit_handlers,
>>>                                          int flags,
>>>                                          char **failed_doing);
>>> This function thus takes 14 arguments. We pass only the first 8 arguments in registers, and the others on the stack.
>>> Argument 12 and 13 are of type int.  On linux using the aapcs, we can pass those in 8byte slots on the stack. That is
>>> both of them are effectively 64bits wide when passed.  However for darwinpcs, it is expected that these adhere to their
>>> size and are packed as such. Therefore Argument 12 and 13 need to be passed as 4byte slots each on the stack.
>>>
>>> This yields a moderate 8byte saving on the stack for the same function call on darwinpcs compared to aapcs.
>>>
>>> Now onto GHC.  When we generate function calls for foreign C functions, we deal with something like:
>>>
>>> genCCall
>>>     :: ForeignTarget      -- function to call
>>>     -> [CmmFormal]        -- where to put the result
>>>     -> [CmmActual]        -- arguments (of mixed type)
>>>     -> BlockId            -- The block we are in
>>>     -> NatM (InstrBlock, Maybe BlockId)
>>>
>>> based on Cmm Nodes of the form CmmUnsafeForeignCall target result_regs args
>>>
>>> The CmmActual in the care of runInteractiveProcess hold the arguments for the function, however contrary to the function
>>> declaration, it contains I64 slots for Argument 12 and 13. Thus computing the space needed for them based on their Cmm
>>> Representations yields 8bytes, when they should really be 32bit and consume only 4 byte.
>>>
>>> To illustrate this a bit better: here is what we see in the pretty printed cmm:
>>>
>>> (_s6w3::I64) = call "ccall" arg hints:  [PtrHint, PtrHint, PtrHint, signed, signed, signed, PtrHint, PtrHint, PtrHint, PtrHint, PtrHint, signed, signed, PtrHint]  result hints:  [signed] _runInteractiveProcess(I64[Sp + 96], I64[Sp + 88], I64[Sp + 104], I64[Sp + 112], I64[Sp + 120], I64[Sp + 56], I64[Sp + 64], I64[Sp + 72], I64[Sp + 24], 0, 0, I64[Sp + 8], I64[Sp + 40] | I64[Sp + 48] | I64[Sp + 80] | 3, I64[R1 + 7]);
>>>
>>> I've added size information to the ForeignHints (NoHint, AddrHint, SignedHint) we have, and computed both, which yields:
>>>
>>> [(CmmReg (CmmLocal (LocalReg s6Gi (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gk (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gm (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Go (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gq (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gs (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gu (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gw (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Gy (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6Cp (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6FU (CmmType BitsCat W64))),AddrHint)
>>> ,(CmmReg (CmmLocal (LocalReg s6GA (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6GR (CmmType BitsCat W64))),SignedHint W32)
>>> ,(CmmReg (CmmLocal (LocalReg s6GM (CmmType BitsCat W64))),AddrHint)]
>>>
>>> Thus, while we *do* know the right size from STG (which is what the Hints are computed from), we loose this information when lowering
>>> into Cmm, where we represent them with W64. This is what I was alluding to in the previous email. In primRepCmmType, and mkIntCLit, we set their type to 64bit for Ints; which on this platform does not hold.
>>>
>>> Now I've gone ahead and effectively assume Cmm is lying to me when generating Foreign Function Calls, and rely on the (new) sized
>>> hints to produce the appropriate argument packing on the stack.  However I believe the correct way would be for GHC not to conflate Ints
>>> and Words in Cmm; implicitly assuming they are the same width.  Sadly it's not as simple as having primRepCmmType and mkIntCLit produce 32bit types. I fear GHC internally assumes "Int" means 64bit Integer, and then just happens to make the Int ~ CInt assumption.
>>>
>>> Cheers,
>>>  Moritz
>>>
>>> On Tue, Oct 20, 2020 at 3:33 PM Simon Peyton Jones <[hidden email]> wrote:
>>>>
>>>> Moritz
>>>>
>>>>
>>>>
>>>> I’m afraid I don’t understand any of this.  Not your fault, but  I just don’t have enough context to know what you mean.
>>>>
>>>>
>>>>
>>>> Is there a current bug?  If so, can you demonstrate it?   If not, what is the problem you want to solve?  Examples are always helpful.
>>>>
>>>>
>>>>
>>>> Maybe it’s worth opening a ticket too?
>>>>
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>> Simon
>>>>
>>>>
>>>>
>>>> From: ghc-devs <[hidden email]> On Behalf Of Moritz Angermann
>>>> Sent: 20 October 2020 02:51
>>>> To: ghc-devs <[hidden email]>
>>>> Subject: GHC's internal confusion about Ints and Words
>>>>
>>>>
>>>>
>>>> Hi there!
>>>>
>>>>
>>>>
>>>> So there is a procedure calling convention that for reasons I did not fully understand, but seem to be historically grown, uses packed arguments for those that are spilled onto the stack. On top of that, CInt is 32bit, Word is 64bits. This provides the following spectacle:
>>>>
>>>>
>>>>
>>>> While we know in STG that the CInt is 32bits wide, when lowered into Cmm, it's represented as I64 in the arguments to the C function.  Thus packing based on the format of the Cmm type would yield 8 bytes. And now, all further packed arguments have the wrong offset (by four).
>>>>
>>>>
>>>>
>>>> Specifically in GHC.Cmm.Utils we find:
>>>>
>>>> primRepCmmType :: Platform -> PrimRep -> CmmType
>>>>
>>>> primRepCmmType platform IntRep = bWord platform
>>>>
>>>>
>>>>
>>>> mkIntCLit :: Platform -> Int -> CmmLit
>>>> mkIntCLit platform i = CmmInt (toInteger i) (wordWidth platform)
>>>>
>>>>
>>>>
>>>> The naive idea to just fix this and make them return cIntWidth instead, seemingly produces the correct Cmm expressions at a local level, but produces a broken compiler.
>>>>
>>>>
>>>>
>>>> A second approach could be to extend the Hints into providing sizes, and using those during the foreign call generation to pack spilled arguments.  This however appears to be more of a patching up of some fundamental underlying issue, instead of rectifying it properly.
>>>>
>>>>
>>>>
>>>> Maybe I'll have to go down the Hint path, it does however break current Eq assumptions, as they are sized now, and what was equal before, is only equal now if they represent the same size.
>>>>
>>>>
>>>>
>>>> From a cursory glance at the issues with naively fixing the width for Int, it seems that GHC internally assumes sizeof(Int) = sizeof(Word).  Maybe there is a whole level of HsInt vs CInt discrimination missing?
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>>  Moritz
>>>
>>> _______________________________________________
>>> ghc-devs mailing list
>>> [hidden email]
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
> _______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs