High-level Cmm code and stack allocation

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Herbert Valerio Riedel
Hello,

According to Note [Syntax of .cmm files],

| There are two ways to write .cmm code:
|
|  (1) High-level Cmm code delegates the stack handling to GHC, and
|      never explicitly mentions Sp or registers.
|
|  (2) Low-level Cmm manages the stack itself, and must know about
|      calling conventions.
|
| Whether you want high-level or low-level Cmm is indicated by the
| presence of an argument list on a procedure.

However, while working on integer-gmp I've been noticing in
integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm procedures
have been converted to high-level Cmm, they still reference the 'Sp'
register, e.g.


    #define GMP_TAKE1_RET1(name,mp_fun)       \
    name (W_ ws1, P_ d1)                      \
    {                                         \
      W_ mp_tmp1;                             \
      W_ mp_result1;                          \
                                              \
    again:                                    \
      STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
      MAYBE_GC(again);                        \
                                              \
      mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
      mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
      ...                                     \


So is this valid high-level Cmm code? What's the proper way to allocate
Stack (and/or Heap) memory from high-level Cmm code?

Cheers,
  hvr

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Carter Schonwald
hey Herbert,
I generally start with looking at the primops.cmm file for examples
https://github.com/ghc/ghc/blob/master/rts/PrimOps.cmm#L572-L588

otoh, the comments in cmmparse.y indicate that's not quite "kosher"? or
maybe the comments are a lie?
https://github.com/ghc/ghc/blob/master/compiler/cmm/CmmParse.y#L24-L28


On Sat, Jan 4, 2014 at 6:26 PM, Herbert Valerio Riedel <hvr at gnu.org> wrote:

> Hello,
>
> According to Note [Syntax of .cmm files],
>
> | There are two ways to write .cmm code:
> |
> |  (1) High-level Cmm code delegates the stack handling to GHC, and
> |      never explicitly mentions Sp or registers.
> |
> |  (2) Low-level Cmm manages the stack itself, and must know about
> |      calling conventions.
> |
> | Whether you want high-level or low-level Cmm is indicated by the
> | presence of an argument list on a procedure.
>
> However, while working on integer-gmp I've been noticing in
> integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm procedures
> have been converted to high-level Cmm, they still reference the 'Sp'
> register, e.g.
>
>
>     #define GMP_TAKE1_RET1(name,mp_fun)       \
>     name (W_ ws1, P_ d1)                      \
>     {                                         \
>       W_ mp_tmp1;                             \
>       W_ mp_result1;                          \
>                                               \
>     again:                                    \
>       STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
>       MAYBE_GC(again);                        \
>                                               \
>       mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
>       mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
>       ...                                     \
>
>
> So is this valid high-level Cmm code? What's the proper way to allocate
> Stack (and/or Heap) memory from high-level Cmm code?
>
> Cheers,
>   hvr
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140104/0d191d6a/attachment.html>

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Herbert Valerio Riedel
In reply to this post by Herbert Valerio Riedel
On 2014-01-05 at 00:26:52 +0100, Herbert Valerio Riedel wrote:

[...]

> So is this valid high-level Cmm code? What's the proper way to allocate
> Stack (and/or Heap) memory from high-level Cmm code?

PS: ...are function calls supposed to work as advertised in

    https://github.com/ghc/ghc/blob/master/compiler/cmm/CmmParse.y#L76 

?

I've tried using `(ret1,ret2) = call stg_fun (arg1,arg2);` in a Cmm
file, but I get a parser error on `call` with GHC HEAD; only when leave
out the return value assignement, i.e. when I use only `call stg_fun
(arg1,arg2);`, it gets parsed succesfully. Is this a bug in the
CmmParser?

Cheers,
  hvr

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Carter Schonwald
i'm inclined to assume that its a parser error.
instead of (v)= call fun(args...argn);, did you try v = call fun(args1...n)
; ?


On Sat, Jan 4, 2014 at 7:27 PM, Herbert Valerio Riedel <hvr at gnu.org> wrote:

> On 2014-01-05 at 00:26:52 +0100, Herbert Valerio Riedel wrote:
>
> [...]
>
> > So is this valid high-level Cmm code? What's the proper way to allocate
> > Stack (and/or Heap) memory from high-level Cmm code?
>
> PS: ...are function calls supposed to work as advertised in
>
>     https://github.com/ghc/ghc/blob/master/compiler/cmm/CmmParse.y#L76
>
> ?
>
> I've tried using `(ret1,ret2) = call stg_fun (arg1,arg2);` in a Cmm
> file, but I get a parser error on `call` with GHC HEAD; only when leave
> out the return value assignement, i.e. when I use only `call stg_fun
> (arg1,arg2);`, it gets parsed succesfully. Is this a bug in the
> CmmParser?
>
> Cheers,
>   hvr
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140104/4648c5cc/attachment.html>

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Herbert Valerio Riedel-3
In reply to this post by Carter Schonwald
On 2014-01-05 at 01:15:53 +0100, Carter Schonwald wrote:
> hey Herbert,
> I generally start with looking at the primops.cmm file for examples
> https://github.com/ghc/ghc/blob/master/rts/PrimOps.cmm#L572-L588

    stg_decodeFloatzuIntzh ( F_ arg )
    {
        W_ p, mp_tmp1, W_ mp_tmp_w;
   
        STK_CHK_GEN_N (WDS(2));
   
        mp_tmp1  = Sp - WDS(1);
        mp_tmp_w = Sp - WDS(2);
   
        ccall __decodeFloat_Int(mp_tmp1 "ptr", mp_tmp_w "ptr", arg);
   
        return (W_[mp_tmp1], W_[mp_tmp_w]);
    }
   
that function in particular is compiled to

    [stg_decodeFloatzuIntzh() //  [F1]
             { info_tbl: []
               stack_info: arg_space: 8 updfr_space: Just 8
             }
         {offset
           cc: _c0::F32 = F1;
               goto c4;
           c4: if ((old + 0) - 2 * 8 < SpLim) goto c6; else goto c7;
           c6: I64[(young<c5> + 8)] = c5;
               call stg_gc_noregs() returns to c5, args: 8, res: 8, upd: 8;
           c5: goto c4;
           c7: _c2::I64 = Sp - 1 * 8;
               _c3::I64 = Sp - 2 * 8;
               _c8::I64 = __decodeFloat_Int;
               _c9::I64 = _c2::I64;
               _ca::I64 = _c3::I64;
               _cb::F32 = _c0::F32;
               call "ccall" arg hints:  [PtrHint, PtrHint,]  result hints:  []
                    (_c8::I64)(_c9::I64, _ca::I64, _cb::F32);
               R2 = I64[_c3::I64];
               R1 = I64[_c2::I64];
               call (P64[(old + 8)])(R2, R1) args: 8, res: 0, upd: 8;
         }
     }]
   
But I see no effort to adjust Sp (i.e. `Sp = Sp - 16`) right before the
call to __decodeFloat_Int; how is it ensured that __decodeFloat_Int
doesn't use the locations Sp-8 and Sp-16 for as its local stack?


> otoh, the comments in cmmparse.y indicate that's not quite "kosher"? or
> maybe the comments are a lie?
> https://github.com/ghc/ghc/blob/master/compiler/cmm/CmmParse.y#L24-L28

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Herbert Valerio Riedel-3
In reply to this post by Carter Schonwald
On 2014-01-05 at 01:32:07 +0100, Carter Schonwald wrote:
> i'm inclined to assume that its a parser error.
> instead of (v)= call fun(args...argn);, did you try v = call fun(args1...n)
> ; ?

I've looked more closely at the parser, and the relevant productions...

    | 'call' expr '(' exprs0 ')' ';'
            { doCall $2 [] $4 }
    | '(' formals ')' '=' 'call' expr '(' exprs0 ')' ';'
            { doCall $6 $2 $8 }

...actually require the return values to be newly declared registers,
therefore the following works:

    foo()
    {
       W_ arg1, arg2;
     
       arg1 = 1; arg2 = 2;
   
       (W_ ret1, W_ ret2) = call stg_fun (arg1,arg2);
   
       return (ret2, ret1);
    }

Cheers,
  hvr

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Marlow-7
In reply to this post by Herbert Valerio Riedel
On 04/01/2014 23:26, Herbert Valerio Riedel wrote:

> Hello,
>
> According to Note [Syntax of .cmm files],
>
> | There are two ways to write .cmm code:
> |
> |  (1) High-level Cmm code delegates the stack handling to GHC, and
> |      never explicitly mentions Sp or registers.
> |
> |  (2) Low-level Cmm manages the stack itself, and must know about
> |      calling conventions.
> |
> | Whether you want high-level or low-level Cmm is indicated by the
> | presence of an argument list on a procedure.
>
> However, while working on integer-gmp I've been noticing in
> integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm procedures
> have been converted to high-level Cmm, they still reference the 'Sp'
> register, e.g.
>
>
>      #define GMP_TAKE1_RET1(name,mp_fun)       \
>      name (W_ ws1, P_ d1)                      \
>      {                                         \
>        W_ mp_tmp1;                             \
>        W_ mp_result1;                          \
>                                                \
>      again:                                    \
>        STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
>        MAYBE_GC(again);                        \
>                                                \
>        mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
>        mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
>        ...                                     \
>
>
> So is this valid high-level Cmm code? What's the proper way to allocate
> Stack (and/or Heap) memory from high-level Cmm code?

Yes, this is technically wrong but luckily works.  I'd very much like to
have a better solution, preferably one that doesn't add any extra overhead.

The problem here is that we need to allocate a couple of temporary words
and take their address; that's an unusual thing to do in Cmm, so it only
occurs in a few places (mainly interacting with gmp).  Usually if you
want some temporary storage you can use local variables or some
heap-allocated memory.

Cheers,
Simon

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Marlow-7
In reply to this post by Herbert Valerio Riedel-3
On 05/01/2014 11:46, Herbert Valerio Riedel wrote:

> On 2014-01-05 at 01:15:53 +0100, Carter Schonwald wrote:
>> hey Herbert,
>> I generally start with looking at the primops.cmm file for examples
>> https://github.com/ghc/ghc/blob/master/rts/PrimOps.cmm#L572-L588
>
>      stg_decodeFloatzuIntzh ( F_ arg )
>      {
>          W_ p, mp_tmp1, W_ mp_tmp_w;
>
>          STK_CHK_GEN_N (WDS(2));
>
>          mp_tmp1  = Sp - WDS(1);
>          mp_tmp_w = Sp - WDS(2);
>
>          ccall __decodeFloat_Int(mp_tmp1 "ptr", mp_tmp_w "ptr", arg);
>
>          return (W_[mp_tmp1], W_[mp_tmp_w]);
>      }
>
> that function in particular is compiled to
>
>      [stg_decodeFloatzuIntzh() //  [F1]
>               { info_tbl: []
>                 stack_info: arg_space: 8 updfr_space: Just 8
>               }
>           {offset
>             cc: _c0::F32 = F1;
>                 goto c4;
>             c4: if ((old + 0) - 2 * 8 < SpLim) goto c6; else goto c7;
>             c6: I64[(young<c5> + 8)] = c5;
>                 call stg_gc_noregs() returns to c5, args: 8, res: 8, upd: 8;
>             c5: goto c4;
>             c7: _c2::I64 = Sp - 1 * 8;
>                 _c3::I64 = Sp - 2 * 8;
>                 _c8::I64 = __decodeFloat_Int;
>                 _c9::I64 = _c2::I64;
>                 _ca::I64 = _c3::I64;
>                 _cb::F32 = _c0::F32;
>                 call "ccall" arg hints:  [PtrHint, PtrHint,]  result hints:  []
>                      (_c8::I64)(_c9::I64, _ca::I64, _cb::F32);
>                 R2 = I64[_c3::I64];
>                 R1 = I64[_c2::I64];
>                 call (P64[(old + 8)])(R2, R1) args: 8, res: 0, upd: 8;
>           }
>       }]
>
> But I see no effort to adjust Sp (i.e. `Sp = Sp - 16`) right before the
> call to __decodeFloat_Int; how is it ensured that __decodeFloat_Int
> doesn't use the locations Sp-8 and Sp-16 for as its local stack?

__decodeFloat_Int is a C function, so it will not touch the Haskell stack.

Cheers,
        Simon

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Herbert Valerio Riedel
In reply to this post by Simon Marlow-7
Hello Simon,

On 2014-01-07 at 17:04:52 +0100, Simon Marlow wrote:

[...]

> Yes, this is technically wrong but luckily works.  

...but only as long as the code-generator doesn't try to push something
on the stack, like e.g. when performing native 'call's which need to
push the return-location on the stack...?

> I'd very much like
> to have a better solution, preferably one that doesn't add any extra
> overhead.

I see... I've noticed there's a 'push() { ... }' construct that allows
to push items on the stack; couldn't we have generalized version of
that, taking a size-argument, declaring that specified amount of
stack-space is user-allocated/controlled within the '{ ... }' scope?

Greetings,
  hvr

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Marlow-7
On 07/01/2014 16:14, Herbert Valerio Riedel wrote:

> Hello Simon,
>
> On 2014-01-07 at 17:04:52 +0100, Simon Marlow wrote:
>
> [...]
>
>> Yes, this is technically wrong but luckily works.
>
> ...but only as long as the code-generator doesn't try to push something
> on the stack, like e.g. when performing native 'call's which need to
> push the return-location on the stack...?

Right - in principle the code generator is in control of the stack so it
can move the stack pointer whenever it likes, but in practice we know it
only does this in certain places, like when making native calls, so
these naughty functions just avoid doing that.

>> I'd very much like
>> to have a better solution, preferably one that doesn't add any extra
>> overhead.
>
> I see... I've noticed there's a 'push() { ... }' construct that allows
> to push items on the stack; couldn't we have generalized version of
> that, taking a size-argument, declaring that specified amount of
> stack-space is user-allocated/controlled within the '{ ... }' scope?

We could push a stack frame, like we do for an update frame, but the
problem is that we need a way to take the address of those stack
locations.  Taking the address of stack locations is also dodgy, because
stacks move (say, during a native call).  So it would still be unsafe.
Also pushing a stack frame would incur an extra memory write for the
info pointer, which is annoying.

Cheers,
Simon

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Peyton Jones
In reply to this post by Simon Marlow-7
| Yes, this is technically wrong but luckily works.  I'd very much like
| to
| have a better solution, preferably one that doesn't add any extra
| overhead.

| __decodeFloat_Int is a C function, so it will not touch the Haskell
| stack.

This all seems terribly fragile to me.  At least it ought to be surrounded with massive comments pointing out how terribly fragile it is, breaking all the rules that we carefully document elsewhere.

Can't we just allocate a Cmm "area"? The address of an area is a perfectly well-defined Cmm value.

Simon

| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Simon
| Marlow
| Sent: 07 January 2014 16:05
| To: Herbert Valerio Riedel; ghc-devs at haskell.org
| Subject: Re: High-level Cmm code and stack allocation
|
| On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
| > Hello,
| >
| > According to Note [Syntax of .cmm files],
| >
| > | There are two ways to write .cmm code:
| > |
| > |  (1) High-level Cmm code delegates the stack handling to GHC, and
| > |      never explicitly mentions Sp or registers.
| > |
| > |  (2) Low-level Cmm manages the stack itself, and must know about
| > |      calling conventions.
| > |
| > | Whether you want high-level or low-level Cmm is indicated by the
| > | presence of an argument list on a procedure.
| >
| > However, while working on integer-gmp I've been noticing in
| > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm
| procedures
| > have been converted to high-level Cmm, they still reference the 'Sp'
| > register, e.g.
| >
| >
| >      #define GMP_TAKE1_RET1(name,mp_fun)       \
| >      name (W_ ws1, P_ d1)                      \
| >      {                                         \
| >        W_ mp_tmp1;                             \
| >        W_ mp_result1;                          \
| >                                                \
| >      again:                                    \
| >        STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
| >        MAYBE_GC(again);                        \
| >                                                \
| >        mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
| >        mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
| >        ...                                     \
| >
| >
| > So is this valid high-level Cmm code? What's the proper way to
| allocate
| > Stack (and/or Heap) memory from high-level Cmm code?
|
| Yes, this is technically wrong but luckily works.  I'd very much like
| to
| have a better solution, preferably one that doesn't add any extra
| overhead.
|
| The problem here is that we need to allocate a couple of temporary
| words
| and take their address; that's an unusual thing to do in Cmm, so it
| only
| occurs in a few places (mainly interacting with gmp).  Usually if you
| want some temporary storage you can use local variables or some
| heap-allocated memory.
|
| Cheers,
| Simon
| _______________________________________________
| ghc-devs mailing list
| ghc-devs at haskell.org
| http://www.haskell.org/mailman/listinfo/ghc-devs

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Marlow-7
On 07/01/14 22:53, Simon Peyton Jones wrote:

> | Yes, this is technically wrong but luckily works.  I'd very much like
> | to
> | have a better solution, preferably one that doesn't add any extra
> | overhead.
>
> | __decodeFloat_Int is a C function, so it will not touch the Haskell
> | stack.
>
> This all seems terribly fragile to me.  At least it ought to be surrounded with massive comments pointing out how terribly fragile it is, breaking all the rules that we carefully document elsewhere.
>
> Can't we just allocate a Cmm "area"? The address of an area is a perfectly well-defined Cmm value.

It is fragile, yes.  We can't use static memory because it needs to be
thread-local.  This particular hack has gone through several iterations
over the years: first we had static memory, which broke when we did the
parallel runtime, then we had special storage in the Capability, which
we gave up when GMP was split out into a separate library, because it
didn't seem right to have magic fields in the Capability for one library.

I'm looking into whether we can do temporary allocation on the heap for
this instead.

Cheers,
Simon


> Simon
>
> | -----Original Message-----
> | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Simon
> | Marlow
> | Sent: 07 January 2014 16:05
> | To: Herbert Valerio Riedel; ghc-devs at haskell.org
> | Subject: Re: High-level Cmm code and stack allocation
> |
> | On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
> | > Hello,
> | >
> | > According to Note [Syntax of .cmm files],
> | >
> | > | There are two ways to write .cmm code:
> | > |
> | > |  (1) High-level Cmm code delegates the stack handling to GHC, and
> | > |      never explicitly mentions Sp or registers.
> | > |
> | > |  (2) Low-level Cmm manages the stack itself, and must know about
> | > |      calling conventions.
> | > |
> | > | Whether you want high-level or low-level Cmm is indicated by the
> | > | presence of an argument list on a procedure.
> | >
> | > However, while working on integer-gmp I've been noticing in
> | > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm
> | procedures
> | > have been converted to high-level Cmm, they still reference the 'Sp'
> | > register, e.g.
> | >
> | >
> | >      #define GMP_TAKE1_RET1(name,mp_fun)       \
> | >      name (W_ ws1, P_ d1)                      \
> | >      {                                         \
> | >        W_ mp_tmp1;                             \
> | >        W_ mp_result1;                          \
> | >                                                \
> | >      again:                                    \
> | >        STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
> | >        MAYBE_GC(again);                        \
> | >                                                \
> | >        mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
> | >        mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
> | >        ...                                     \
> | >
> | >
> | > So is this valid high-level Cmm code? What's the proper way to
> | allocate
> | > Stack (and/or Heap) memory from high-level Cmm code?
> |
> | Yes, this is technically wrong but luckily works.  I'd very much like
> | to
> | have a better solution, preferably one that doesn't add any extra
> | overhead.
> |
> | The problem here is that we need to allocate a couple of temporary
> | words
> | and take their address; that's an unusual thing to do in Cmm, so it
> | only
> | occurs in a few places (mainly interacting with gmp).  Usually if you
> | want some temporary storage you can use local variables or some
> | heap-allocated memory.
> |
> | Cheers,
> | Simon
> | _______________________________________________
> | ghc-devs mailing list
> | ghc-devs at haskell.org
> | http://www.haskell.org/mailman/listinfo/ghc-devs
>


Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Peyton Jones
| > Can't we just allocate a Cmm "area"? The address of an area is a
| perfectly well-defined Cmm value.

What about this idea?

Simon

| -----Original Message-----
| From: Simon Marlow [mailto:marlowsd at gmail.com]
| Sent: 08 January 2014 09:26
| To: Simon Peyton Jones; Herbert Valerio Riedel
| Cc: ghc-devs at haskell.org
| Subject: Re: High-level Cmm code and stack allocation
|
| On 07/01/14 22:53, Simon Peyton Jones wrote:
| > | Yes, this is technically wrong but luckily works.  I'd very much
| > | like to have a better solution, preferably one that doesn't add any
| > | extra overhead.
| >
| > | __decodeFloat_Int is a C function, so it will not touch the Haskell
| > | stack.
| >
| > This all seems terribly fragile to me.  At least it ought to be
| surrounded with massive comments pointing out how terribly fragile it
| is, breaking all the rules that we carefully document elsewhere.
| >
| > Can't we just allocate a Cmm "area"? The address of an area is a
| perfectly well-defined Cmm value.
|
| It is fragile, yes.  We can't use static memory because it needs to be
| thread-local.  This particular hack has gone through several iterations
| over the years: first we had static memory, which broke when we did the
| parallel runtime, then we had special storage in the Capability, which
| we gave up when GMP was split out into a separate library, because it
| didn't seem right to have magic fields in the Capability for one
| library.
|
| I'm looking into whether we can do temporary allocation on the heap for
| this instead.
|
| Cheers,
| Simon
|
|
| > Simon
| >
| > | -----Original Message-----
| > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
| > | Simon Marlow
| > | Sent: 07 January 2014 16:05
| > | To: Herbert Valerio Riedel; ghc-devs at haskell.org
| > | Subject: Re: High-level Cmm code and stack allocation
| > |
| > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
| > | > Hello,
| > | >
| > | > According to Note [Syntax of .cmm files],
| > | >
| > | > | There are two ways to write .cmm code:
| > | > |
| > | > |  (1) High-level Cmm code delegates the stack handling to GHC,
| and
| > | > |      never explicitly mentions Sp or registers.
| > | > |
| > | > |  (2) Low-level Cmm manages the stack itself, and must know about
| > | > |      calling conventions.
| > | > |
| > | > | Whether you want high-level or low-level Cmm is indicated by the
| > | > | presence of an argument list on a procedure.
| > | >
| > | > However, while working on integer-gmp I've been noticing in
| > | > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm
| > | procedures
| > | > have been converted to high-level Cmm, they still reference the
| 'Sp'
| > | > register, e.g.
| > | >
| > | >
| > | >      #define GMP_TAKE1_RET1(name,mp_fun)       \
| > | >      name (W_ ws1, P_ d1)                      \
| > | >      {                                         \
| > | >        W_ mp_tmp1;                             \
| > | >        W_ mp_result1;                          \
| > | >                                                \
| > | >      again:                                    \
| > | >        STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
| > | >        MAYBE_GC(again);                        \
| > | >                                                \
| > | >        mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
| > | >        mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
| > | >        ...                                     \
| > | >
| > | >
| > | > So is this valid high-level Cmm code? What's the proper way to
| > | allocate
| > | > Stack (and/or Heap) memory from high-level Cmm code?
| > |
| > | Yes, this is technically wrong but luckily works.  I'd very much
| > | like to have a better solution, preferably one that doesn't add any
| > | extra overhead.
| > |
| > | The problem here is that we need to allocate a couple of temporary
| > | words and take their address; that's an unusual thing to do in Cmm,
| > | so it only occurs in a few places (mainly interacting with gmp).
| > | Usually if you want some temporary storage you can use local
| > | variables or some heap-allocated memory.
| > |
| > | Cheers,
| > | Simon
| > | _______________________________________________
| > | ghc-devs mailing list
| > | ghc-devs at haskell.org
| > | http://www.haskell.org/mailman/listinfo/ghc-devs
| >


Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Marlow-7
On 08/01/2014 10:07, Simon Peyton Jones wrote:
> | > Can't we just allocate a Cmm "area"? The address of an area is a
> | perfectly well-defined Cmm value.
>
> What about this idea?

We don't really have a general concept of areas (any more), and areas
aren't exposed in the concrete Cmm syntax at all.  The current semantics
is that areas may overlap with each other, so there should only be one
active area at any point.  I found that this was important to ensure
that we could generate good code from the stack layout algorithm,
otherwise it had to make pessimistic assumptions and use too much stack.

You're going to ask me where this is documented, and I think I have to
admit to slacking off, sorry :-)  We did discuss it at the time, and I
made copious notes, but I didn't transfer those to the code.  I'll add a
Note.

Cheers,
Simon


> Simon
>
> | -----Original Message-----
> | From: Simon Marlow [mailto:marlowsd at gmail.com]
> | Sent: 08 January 2014 09:26
> | To: Simon Peyton Jones; Herbert Valerio Riedel
> | Cc: ghc-devs at haskell.org
> | Subject: Re: High-level Cmm code and stack allocation
> |
> | On 07/01/14 22:53, Simon Peyton Jones wrote:
> | > | Yes, this is technically wrong but luckily works.  I'd very much
> | > | like to have a better solution, preferably one that doesn't add any
> | > | extra overhead.
> | >
> | > | __decodeFloat_Int is a C function, so it will not touch the Haskell
> | > | stack.
> | >
> | > This all seems terribly fragile to me.  At least it ought to be
> | surrounded with massive comments pointing out how terribly fragile it
> | is, breaking all the rules that we carefully document elsewhere.
> | >
> | > Can't we just allocate a Cmm "area"? The address of an area is a
> | perfectly well-defined Cmm value.
> |
> | It is fragile, yes.  We can't use static memory because it needs to be
> | thread-local.  This particular hack has gone through several iterations
> | over the years: first we had static memory, which broke when we did the
> | parallel runtime, then we had special storage in the Capability, which
> | we gave up when GMP was split out into a separate library, because it
> | didn't seem right to have magic fields in the Capability for one
> | library.
> |
> | I'm looking into whether we can do temporary allocation on the heap for
> | this instead.
> |
> | Cheers,
> | Simon
> |
> |
> | > Simon
> | >
> | > | -----Original Message-----
> | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
> | > | Simon Marlow
> | > | Sent: 07 January 2014 16:05
> | > | To: Herbert Valerio Riedel; ghc-devs at haskell.org
> | > | Subject: Re: High-level Cmm code and stack allocation
> | > |
> | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
> | > | > Hello,
> | > | >
> | > | > According to Note [Syntax of .cmm files],
> | > | >
> | > | > | There are two ways to write .cmm code:
> | > | > |
> | > | > |  (1) High-level Cmm code delegates the stack handling to GHC,
> | and
> | > | > |      never explicitly mentions Sp or registers.
> | > | > |
> | > | > |  (2) Low-level Cmm manages the stack itself, and must know about
> | > | > |      calling conventions.
> | > | > |
> | > | > | Whether you want high-level or low-level Cmm is indicated by the
> | > | > | presence of an argument list on a procedure.
> | > | >
> | > | > However, while working on integer-gmp I've been noticing in
> | > | > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm
> | > | procedures
> | > | > have been converted to high-level Cmm, they still reference the
> | 'Sp'
> | > | > register, e.g.
> | > | >
> | > | >
> | > | >      #define GMP_TAKE1_RET1(name,mp_fun)       \
> | > | >      name (W_ ws1, P_ d1)                      \
> | > | >      {                                         \
> | > | >        W_ mp_tmp1;                             \
> | > | >        W_ mp_result1;                          \
> | > | >                                                \
> | > | >      again:                                    \
> | > | >        STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
> | > | >        MAYBE_GC(again);                        \
> | > | >                                                \
> | > | >        mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
> | > | >        mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
> | > | >        ...                                     \
> | > | >
> | > | >
> | > | > So is this valid high-level Cmm code? What's the proper way to
> | > | allocate
> | > | > Stack (and/or Heap) memory from high-level Cmm code?
> | > |
> | > | Yes, this is technically wrong but luckily works.  I'd very much
> | > | like to have a better solution, preferably one that doesn't add any
> | > | extra overhead.
> | > |
> | > | The problem here is that we need to allocate a couple of temporary
> | > | words and take their address; that's an unusual thing to do in Cmm,
> | > | so it only occurs in a few places (mainly interacting with gmp).
> | > | Usually if you want some temporary storage you can use local
> | > | variables or some heap-allocated memory.
> | > |
> | > | Cheers,
> | > | Simon
> | > | _______________________________________________
> | > | ghc-devs mailing list
> | > | ghc-devs at haskell.org
> | > | http://www.haskell.org/mailman/listinfo/ghc-devs
> | >
>

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Peyton Jones
That documentation would be good, yes!  I don't know what it means to say "we don't really have a general concept of areas any more".  We did before, and I didn't know that it had gone away.  Urk!  We can have lots of live areas, notably the old area (for the current call/return parameters, the call area for a call we are preparing, and the one-slot areas for variables we are saving on the stack.

Here's he current story https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAreas

I agree that we have no concrete syntax for talking about areas, but that is something we could fix.  But I'm worried that they may not mean what they used to mean.

Simon

| -----Original Message-----
| From: Simon Marlow [mailto:marlowsd at gmail.com]
| Sent: 09 January 2014 08:39
| To: Simon Peyton Jones; Herbert Valerio Riedel
| Cc: ghc-devs at haskell.org
| Subject: Re: High-level Cmm code and stack allocation
|
| On 08/01/2014 10:07, Simon Peyton Jones wrote:
| > | > Can't we just allocate a Cmm "area"? The address of an area is a
| > | perfectly well-defined Cmm value.
| >
| > What about this idea?
|
| We don't really have a general concept of areas (any more), and areas
| aren't exposed in the concrete Cmm syntax at all.  The current semantics
| is that areas may overlap with each other, so there should only be one
| active area at any point.  I found that this was important to ensure
| that we could generate good code from the stack layout algorithm,
| otherwise it had to make pessimistic assumptions and use too much stack.
|
| You're going to ask me where this is documented, and I think I have to
| admit to slacking off, sorry :-)  We did discuss it at the time, and I
| made copious notes, but I didn't transfer those to the code.  I'll add a
| Note.
|
| Cheers,
| Simon
|
|
| > Simon
| >
| > | -----Original Message-----
| > | From: Simon Marlow [mailto:marlowsd at gmail.com]
| > | Sent: 08 January 2014 09:26
| > | To: Simon Peyton Jones; Herbert Valerio Riedel
| > | Cc: ghc-devs at haskell.org
| > | Subject: Re: High-level Cmm code and stack allocation
| > |
| > | On 07/01/14 22:53, Simon Peyton Jones wrote:
| > | > | Yes, this is technically wrong but luckily works.  I'd very much
| > | > | like to have a better solution, preferably one that doesn't add
| > | > | any extra overhead.
| > | >
| > | > | __decodeFloat_Int is a C function, so it will not touch the
| > | > | Haskell stack.
| > | >
| > | > This all seems terribly fragile to me.  At least it ought to be
| > | surrounded with massive comments pointing out how terribly fragile
| > | it is, breaking all the rules that we carefully document elsewhere.
| > | >
| > | > Can't we just allocate a Cmm "area"? The address of an area is a
| > | perfectly well-defined Cmm value.
| > |
| > | It is fragile, yes.  We can't use static memory because it needs to
| > | be thread-local.  This particular hack has gone through several
| > | iterations over the years: first we had static memory, which broke
| > | when we did the parallel runtime, then we had special storage in the
| > | Capability, which we gave up when GMP was split out into a separate
| > | library, because it didn't seem right to have magic fields in the
| > | Capability for one library.
| > |
| > | I'm looking into whether we can do temporary allocation on the heap
| > | for this instead.
| > |
| > | Cheers,
| > | Simon
| > |
| > |
| > | > Simon
| > | >
| > | > | -----Original Message-----
| > | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf
| > | > | Of Simon Marlow
| > | > | Sent: 07 January 2014 16:05
| > | > | To: Herbert Valerio Riedel; ghc-devs at haskell.org
| > | > | Subject: Re: High-level Cmm code and stack allocation
| > | > |
| > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
| > | > | > Hello,
| > | > | >
| > | > | > According to Note [Syntax of .cmm files],
| > | > | >
| > | > | > | There are two ways to write .cmm code:
| > | > | > |
| > | > | > |  (1) High-level Cmm code delegates the stack handling to
| > | > | > | GHC,
| > | and
| > | > | > |      never explicitly mentions Sp or registers.
| > | > | > |
| > | > | > |  (2) Low-level Cmm manages the stack itself, and must know
| about
| > | > | > |      calling conventions.
| > | > | > |
| > | > | > | Whether you want high-level or low-level Cmm is indicated by
| > | > | > | the presence of an argument list on a procedure.
| > | > | >
| > | > | > However, while working on integer-gmp I've been noticing in
| > | > | > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm
| > | > | procedures
| > | > | > have been converted to high-level Cmm, they still reference
| > | > | > the
| > | 'Sp'
| > | > | > register, e.g.
| > | > | >
| > | > | >
| > | > | >      #define GMP_TAKE1_RET1(name,mp_fun)       \
| > | > | >      name (W_ ws1, P_ d1)                      \
| > | > | >      {                                         \
| > | > | >        W_ mp_tmp1;                             \
| > | > | >        W_ mp_result1;                          \
| > | > | >                                                \
| > | > | >      again:                                    \
| > | > | >        STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
| > | > | >        MAYBE_GC(again);                        \
| > | > | >                                                \
| > | > | >        mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
| > | > | >        mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
| > | > | >        ...                                     \
| > | > | >
| > | > | >
| > | > | > So is this valid high-level Cmm code? What's the proper way to
| > | > | allocate
| > | > | > Stack (and/or Heap) memory from high-level Cmm code?
| > | > |
| > | > | Yes, this is technically wrong but luckily works.  I'd very much
| > | > | like to have a better solution, preferably one that doesn't add
| > | > | any extra overhead.
| > | > |
| > | > | The problem here is that we need to allocate a couple of
| > | > | temporary words and take their address; that's an unusual thing
| > | > | to do in Cmm, so it only occurs in a few places (mainly
| interacting with gmp).
| > | > | Usually if you want some temporary storage you can use local
| > | > | variables or some heap-allocated memory.
| > | > |
| > | > | Cheers,
| > | > | Simon
| > | > | _______________________________________________
| > | > | ghc-devs mailing list
| > | > | ghc-devs at haskell.org
| > | > | http://www.haskell.org/mailman/listinfo/ghc-devs
| > | >
| >

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Marlow-7
There are no one-slot areas any more, I ditched those when I rewrote the
stack allocator.  There is only ever one live area: either the old area
or the young area for a call we are about to make or have just made.
(see the data type: I removed the one-slot areas)

I struggled for a long time with this.  The problem is that with the
semantics of non-overlapping areas, code motion optimisations would tend
to increase the stack requirements of the function by overlapping the
live ranges of the areas.  I concluded that actually what we wanted was
areas that really do overlap, and optimisations that respect that, so
that we get more efficient stack usage.

Cheers,
        Simon

On 10/01/2014 15:22, Simon Peyton Jones wrote:

> That documentation would be good, yes!  I don't know what it means to say "we don't really have a general concept of areas any more".  We did before, and I didn't know that it had gone away.  Urk!  We can have lots of live areas, notably the old area (for the current call/return parameters, the call area for a call we are preparing, and the one-slot areas for variables we are saving on the stack.
>
> Here's he current story https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAreas
>
> I agree that we have no concrete syntax for talking about areas, but that is something we could fix.  But I'm worried that they may not mean what they used to mean.
>
> Simon
>
> | -----Original Message-----
> | From: Simon Marlow [mailto:marlowsd at gmail.com]
> | Sent: 09 January 2014 08:39
> | To: Simon Peyton Jones; Herbert Valerio Riedel
> | Cc: ghc-devs at haskell.org
> | Subject: Re: High-level Cmm code and stack allocation
> |
> | On 08/01/2014 10:07, Simon Peyton Jones wrote:
> | > | > Can't we just allocate a Cmm "area"? The address of an area is a
> | > | perfectly well-defined Cmm value.
> | >
> | > What about this idea?
> |
> | We don't really have a general concept of areas (any more), and areas
> | aren't exposed in the concrete Cmm syntax at all.  The current semantics
> | is that areas may overlap with each other, so there should only be one
> | active area at any point.  I found that this was important to ensure
> | that we could generate good code from the stack layout algorithm,
> | otherwise it had to make pessimistic assumptions and use too much stack.
> |
> | You're going to ask me where this is documented, and I think I have to
> | admit to slacking off, sorry :-)  We did discuss it at the time, and I
> | made copious notes, but I didn't transfer those to the code.  I'll add a
> | Note.
> |
> | Cheers,
> | Simon
> |
> |
> | > Simon
> | >
> | > | -----Original Message-----
> | > | From: Simon Marlow [mailto:marlowsd at gmail.com]
> | > | Sent: 08 January 2014 09:26
> | > | To: Simon Peyton Jones; Herbert Valerio Riedel
> | > | Cc: ghc-devs at haskell.org
> | > | Subject: Re: High-level Cmm code and stack allocation
> | > |
> | > | On 07/01/14 22:53, Simon Peyton Jones wrote:
> | > | > | Yes, this is technically wrong but luckily works.  I'd very much
> | > | > | like to have a better solution, preferably one that doesn't add
> | > | > | any extra overhead.
> | > | >
> | > | > | __decodeFloat_Int is a C function, so it will not touch the
> | > | > | Haskell stack.
> | > | >
> | > | > This all seems terribly fragile to me.  At least it ought to be
> | > | surrounded with massive comments pointing out how terribly fragile
> | > | it is, breaking all the rules that we carefully document elsewhere.
> | > | >
> | > | > Can't we just allocate a Cmm "area"? The address of an area is a
> | > | perfectly well-defined Cmm value.
> | > |
> | > | It is fragile, yes.  We can't use static memory because it needs to
> | > | be thread-local.  This particular hack has gone through several
> | > | iterations over the years: first we had static memory, which broke
> | > | when we did the parallel runtime, then we had special storage in the
> | > | Capability, which we gave up when GMP was split out into a separate
> | > | library, because it didn't seem right to have magic fields in the
> | > | Capability for one library.
> | > |
> | > | I'm looking into whether we can do temporary allocation on the heap
> | > | for this instead.
> | > |
> | > | Cheers,
> | > | Simon
> | > |
> | > |
> | > | > Simon
> | > | >
> | > | > | -----Original Message-----
> | > | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf
> | > | > | Of Simon Marlow
> | > | > | Sent: 07 January 2014 16:05
> | > | > | To: Herbert Valerio Riedel; ghc-devs at haskell.org
> | > | > | Subject: Re: High-level Cmm code and stack allocation
> | > | > |
> | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
> | > | > | > Hello,
> | > | > | >
> | > | > | > According to Note [Syntax of .cmm files],
> | > | > | >
> | > | > | > | There are two ways to write .cmm code:
> | > | > | > |
> | > | > | > |  (1) High-level Cmm code delegates the stack handling to
> | > | > | > | GHC,
> | > | and
> | > | > | > |      never explicitly mentions Sp or registers.
> | > | > | > |
> | > | > | > |  (2) Low-level Cmm manages the stack itself, and must know
> | about
> | > | > | > |      calling conventions.
> | > | > | > |
> | > | > | > | Whether you want high-level or low-level Cmm is indicated by
> | > | > | > | the presence of an argument list on a procedure.
> | > | > | >
> | > | > | > However, while working on integer-gmp I've been noticing in
> | > | > | > integer-gmp/cbits/gmp-wrappers.cmm that even though all Cmm
> | > | > | procedures
> | > | > | > have been converted to high-level Cmm, they still reference
> | > | > | > the
> | > | 'Sp'
> | > | > | > register, e.g.
> | > | > | >
> | > | > | >
> | > | > | >      #define GMP_TAKE1_RET1(name,mp_fun)       \
> | > | > | >      name (W_ ws1, P_ d1)                      \
> | > | > | >      {                                         \
> | > | > | >        W_ mp_tmp1;                             \
> | > | > | >        W_ mp_result1;                          \
> | > | > | >                                                \
> | > | > | >      again:                                    \
> | > | > | >        STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
> | > | > | >        MAYBE_GC(again);                        \
> | > | > | >                                                \
> | > | > | >        mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
> | > | > | >        mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
> | > | > | >        ...                                     \
> | > | > | >
> | > | > | >
> | > | > | > So is this valid high-level Cmm code? What's the proper way to
> | > | > | allocate
> | > | > | > Stack (and/or Heap) memory from high-level Cmm code?
> | > | > |
> | > | > | Yes, this is technically wrong but luckily works.  I'd very much
> | > | > | like to have a better solution, preferably one that doesn't add
> | > | > | any extra overhead.
> | > | > |
> | > | > | The problem here is that we need to allocate a couple of
> | > | > | temporary words and take their address; that's an unusual thing
> | > | > | to do in Cmm, so it only occurs in a few places (mainly
> | interacting with gmp).
> | > | > | Usually if you want some temporary storage you can use local
> | > | > | variables or some heap-allocated memory.
> | > | > |
> | > | > | Cheers,
> | > | > | Simon
> | > | > | _______________________________________________
> | > | > | ghc-devs mailing list
> | > | > | ghc-devs at haskell.org
> | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs
> | > | >
> | >
>

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Peyton Jones
Oh, ok.  Alas, a good chunk of my model of Cmm has just gone out of the window.  I thought that areas were such a lovely, well-behaved abstraction.  I was thrilled when we came up with them, and I'm very sorry to see them go.

There are no many things that I no longer understand.  I now have no idea how we save live variables over a call, or how multiple returned values from one call (returned on the stack) stay right where they are if they are live across the next call.

What was the actual problem?  That functions used too much stack, so the stack was getting too big?  But a one slot area corresponds exactly to a live variable, so I don't see how the area abstraction could possibly increase stack size.  And is stack size a crucial issue anyway?

Apart from anything else, areas would have given a lovely solution to the problem this thread started with!

I guess we can talk about this when you next visit?  But some documentation would be welcome.

Simon

| -----Original Message-----
| From: Simon Marlow [mailto:marlowsd at gmail.com]
| Sent: 10 January 2014 16:24
| To: Simon Peyton Jones; Herbert Valerio Riedel
| Cc: ghc-devs at haskell.org
| Subject: Re: High-level Cmm code and stack allocation
|
| There are no one-slot areas any more, I ditched those when I rewrote the
| stack allocator.  There is only ever one live area: either the old area
| or the young area for a call we are about to make or have just made.
| (see the data type: I removed the one-slot areas)
|
| I struggled for a long time with this.  The problem is that with the
| semantics of non-overlapping areas, code motion optimisations would tend
| to increase the stack requirements of the function by overlapping the
| live ranges of the areas.  I concluded that actually what we wanted was
| areas that really do overlap, and optimisations that respect that, so
| that we get more efficient stack usage.
|
| Cheers,
| Simon
|
| On 10/01/2014 15:22, Simon Peyton Jones wrote:
| > That documentation would be good, yes!  I don't know what it means to
| say "we don't really have a general concept of areas any more".  We did
| before, and I didn't know that it had gone away.  Urk!  We can have lots
| of live areas, notably the old area (for the current call/return
| parameters, the call area for a call we are preparing, and the one-slot
| areas for variables we are saving on the stack.
| >
| > Here's he current story
| > https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAreas
| >
| > I agree that we have no concrete syntax for talking about areas, but
| that is something we could fix.  But I'm worried that they may not mean
| what they used to mean.
| >
| > Simon
| >
| > | -----Original Message-----
| > | From: Simon Marlow [mailto:marlowsd at gmail.com]
| > | Sent: 09 January 2014 08:39
| > | To: Simon Peyton Jones; Herbert Valerio Riedel
| > | Cc: ghc-devs at haskell.org
| > | Subject: Re: High-level Cmm code and stack allocation
| > |
| > | On 08/01/2014 10:07, Simon Peyton Jones wrote:
| > | > | > Can't we just allocate a Cmm "area"? The address of an area is
| > | > | > a
| > | > | perfectly well-defined Cmm value.
| > | >
| > | > What about this idea?
| > |
| > | We don't really have a general concept of areas (any more), and
| > | areas aren't exposed in the concrete Cmm syntax at all.  The current
| > | semantics is that areas may overlap with each other, so there should
| > | only be one active area at any point.  I found that this was
| > | important to ensure that we could generate good code from the stack
| > | layout algorithm, otherwise it had to make pessimistic assumptions
| and use too much stack.
| > |
| > | You're going to ask me where this is documented, and I think I have
| > | to admit to slacking off, sorry :-)  We did discuss it at the time,
| > | and I made copious notes, but I didn't transfer those to the code.
| > | I'll add a Note.
| > |
| > | Cheers,
| > | Simon
| > |
| > |
| > | > Simon
| > | >
| > | > | -----Original Message-----
| > | > | From: Simon Marlow [mailto:marlowsd at gmail.com]
| > | > | Sent: 08 January 2014 09:26
| > | > | To: Simon Peyton Jones; Herbert Valerio Riedel
| > | > | Cc: ghc-devs at haskell.org
| > | > | Subject: Re: High-level Cmm code and stack allocation
| > | > |
| > | > | On 07/01/14 22:53, Simon Peyton Jones wrote:
| > | > | > | Yes, this is technically wrong but luckily works.  I'd very
| > | > | > | much like to have a better solution, preferably one that
| > | > | > | doesn't add any extra overhead.
| > | > | >
| > | > | > | __decodeFloat_Int is a C function, so it will not touch the
| > | > | > | Haskell stack.
| > | > | >
| > | > | > This all seems terribly fragile to me.  At least it ought to
| > | > | > be
| > | > | surrounded with massive comments pointing out how terribly
| > | > | fragile it is, breaking all the rules that we carefully document
| elsewhere.
| > | > | >
| > | > | > Can't we just allocate a Cmm "area"? The address of an area is
| > | > | > a
| > | > | perfectly well-defined Cmm value.
| > | > |
| > | > | It is fragile, yes.  We can't use static memory because it needs
| > | > | to be thread-local.  This particular hack has gone through
| > | > | several iterations over the years: first we had static memory,
| > | > | which broke when we did the parallel runtime, then we had
| > | > | special storage in the Capability, which we gave up when GMP was
| > | > | split out into a separate library, because it didn't seem right
| > | > | to have magic fields in the Capability for one library.
| > | > |
| > | > | I'm looking into whether we can do temporary allocation on the
| > | > | heap for this instead.
| > | > |
| > | > | Cheers,
| > | > | Simon
| > | > |
| > | > |
| > | > | > Simon
| > | > | >
| > | > | > | -----Original Message-----
| > | > | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On
| > | > | > | Behalf Of Simon Marlow
| > | > | > | Sent: 07 January 2014 16:05
| > | > | > | To: Herbert Valerio Riedel; ghc-devs at haskell.org
| > | > | > | Subject: Re: High-level Cmm code and stack allocation
| > | > | > |
| > | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
| > | > | > | > Hello,
| > | > | > | >
| > | > | > | > According to Note [Syntax of .cmm files],
| > | > | > | >
| > | > | > | > | There are two ways to write .cmm code:
| > | > | > | > |
| > | > | > | > |  (1) High-level Cmm code delegates the stack handling to
| > | > | > | > | GHC,
| > | > | and
| > | > | > | > |      never explicitly mentions Sp or registers.
| > | > | > | > |
| > | > | > | > |  (2) Low-level Cmm manages the stack itself, and must
| > | > | > | > | know
| > | about
| > | > | > | > |      calling conventions.
| > | > | > | > |
| > | > | > | > | Whether you want high-level or low-level Cmm is
| > | > | > | > | indicated by the presence of an argument list on a
| procedure.
| > | > | > | >
| > | > | > | > However, while working on integer-gmp I've been noticing
| > | > | > | > in integer-gmp/cbits/gmp-wrappers.cmm that even though all
| > | > | > | > Cmm
| > | > | > | procedures
| > | > | > | > have been converted to high-level Cmm, they still
| > | > | > | > reference the
| > | > | 'Sp'
| > | > | > | > register, e.g.
| > | > | > | >
| > | > | > | >
| > | > | > | >      #define GMP_TAKE1_RET1(name,mp_fun)       \
| > | > | > | >      name (W_ ws1, P_ d1)                      \
| > | > | > | >      {                                         \
| > | > | > | >        W_ mp_tmp1;                             \
| > | > | > | >        W_ mp_result1;                          \
| > | > | > | >                                                \
| > | > | > | >      again:                                    \
| > | > | > | >        STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
| > | > | > | >        MAYBE_GC(again);                        \
| > | > | > | >                                                \
| > | > | > | >        mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
| > | > | > | >        mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
| > | > | > | >        ...                                     \
| > | > | > | >
| > | > | > | >
| > | > | > | > So is this valid high-level Cmm code? What's the proper
| > | > | > | > way to
| > | > | > | allocate
| > | > | > | > Stack (and/or Heap) memory from high-level Cmm code?
| > | > | > |
| > | > | > | Yes, this is technically wrong but luckily works.  I'd very
| > | > | > | much like to have a better solution, preferably one that
| > | > | > | doesn't add any extra overhead.
| > | > | > |
| > | > | > | The problem here is that we need to allocate a couple of
| > | > | > | temporary words and take their address; that's an unusual
| > | > | > | thing to do in Cmm, so it only occurs in a few places
| > | > | > | (mainly
| > | interacting with gmp).
| > | > | > | Usually if you want some temporary storage you can use local
| > | > | > | variables or some heap-allocated memory.
| > | > | > |
| > | > | > | Cheers,
| > | > | > | Simon
| > | > | > | _______________________________________________
| > | > | > | ghc-devs mailing list
| > | > | > | ghc-devs at haskell.org
| > | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs
| > | > | >
| > | >
| >

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Marlow-7
So stack areas are still a great abstraction, the only change is that
they now overlap.  It's not just about stack getting too big, I've
copied the notes I made about it below (which I will paste into the code
in due course).  The nice property that we can generate well-defined Cmm
without knowing explicit stack offsets is intact.

What is different is that there used to be an intermediate state where
live variables were saved to abstract stack areas across calls, but Sp
was still not manifest.  This intermediate state doesn't exist any more,
the stack layout algorithm does it all in one pass.  To me this was far
simpler, and I think it ended up being fewer lines of code than the old
multi-phase stack layout algorithm (it's also much faster).

Of course you can always change this.  My goal was to get code that was
at least as good as the old code generator and in a reasonable amount of
time, and this was the shortest path I could find to that goal.

Cheers,
Simon

e.g. if we had

     x = Sp[old + 8]
     y = Sp[old + 16]

     Sp[young(L) + 8]  = L
     Sp[young(L) + 16] = y
     Sp[young(L) + 24] = x
     call f() returns to L

if areas semantically do not overlap, then we might optimise this to

     Sp[young(L) + 8]  = L
     Sp[young(L) + 16] = Sp[old + 8]
     Sp[young(L) + 24] = Sp[old + 16]
     call f() returns to L

and now young(L) cannot be allocated at the same place as old, and we
are doomed to use more stack.

   - old+8  conflicts with young(L)+8
   - old+16 conflicts with young(L)+16 and young(L)+8

so young(L)+8 == old+24 and we get

     Sp[-8]  = L
     Sp[-16] = Sp[8]
     Sp[-24] = Sp[0]
     Sp -= 24
     call f() returns to L

However, if areas are defined to be "possibly overlapping" in the
semantics, then we cannot commute any loads/stores of old with
young(L), and we will be able to re-use both old+8 and old+16 for
young(L).

     x = Sp[8]
     y = Sp[0]

     Sp[8] = L
     Sp[0] = y
     Sp[-8] = x
     Sp = Sp - 8
     call f() returns to L

Now, the assignments of y go away,

     x = Sp[8]
     Sp[8] = L
     Sp[-8] = x
     Sp = Sp - 8
     call f() returns to L


Conclusion:

   - T[old+N] aliases with U[young(L)+M] for all T, U, L, N and M
   - T[old+N] aliases with U[old+M] only if the areas actually overlap

this ensures that we will not commute any accesses to old with
young(L) or young(L) with young(L'), and the stack allocator will get
the maximum opportunity to overlap these areas, keeping the stack use
to a minimum and possibly avoiding some assignments.



On 10/01/2014 16:35, Simon Peyton Jones wrote:

> Oh, ok.  Alas, a good chunk of my model of Cmm has just gone out of the window.  I thought that areas were such a lovely, well-behaved abstraction.  I was thrilled when we came up with them, and I'm very sorry to see them go.
>
> There are no many things that I no longer understand.  I now have no idea how we save live variables over a call, or how multiple returned values from one call (returned on the stack) stay right where they are if they are live across the next call.
>
> What was the actual problem?  That functions used too much stack, so the stack was getting too big?  But a one slot area corresponds exactly to a live variable, so I don't see how the area abstraction could possibly increase stack size.  And is stack size a crucial issue anyway?
>
> Apart from anything else, areas would have given a lovely solution to the problem this thread started with!
>
> I guess we can talk about this when you next visit?  But some documentation would be welcome.
>
> Simon
>
> | -----Original Message-----
> | From: Simon Marlow [mailto:marlowsd at gmail.com]
> | Sent: 10 January 2014 16:24
> | To: Simon Peyton Jones; Herbert Valerio Riedel
> | Cc: ghc-devs at haskell.org
> | Subject: Re: High-level Cmm code and stack allocation
> |
> | There are no one-slot areas any more, I ditched those when I rewrote the
> | stack allocator.  There is only ever one live area: either the old area
> | or the young area for a call we are about to make or have just made.
> | (see the data type: I removed the one-slot areas)
> |
> | I struggled for a long time with this.  The problem is that with the
> | semantics of non-overlapping areas, code motion optimisations would tend
> | to increase the stack requirements of the function by overlapping the
> | live ranges of the areas.  I concluded that actually what we wanted was
> | areas that really do overlap, and optimisations that respect that, so
> | that we get more efficient stack usage.
> |
> | Cheers,
> | Simon
> |
> | On 10/01/2014 15:22, Simon Peyton Jones wrote:
> | > That documentation would be good, yes!  I don't know what it means to
> | say "we don't really have a general concept of areas any more".  We did
> | before, and I didn't know that it had gone away.  Urk!  We can have lots
> | of live areas, notably the old area (for the current call/return
> | parameters, the call area for a call we are preparing, and the one-slot
> | areas for variables we are saving on the stack.
> | >
> | > Here's he current story
> | > https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAreas
> | >
> | > I agree that we have no concrete syntax for talking about areas, but
> | that is something we could fix.  But I'm worried that they may not mean
> | what they used to mean.
> | >
> | > Simon
> | >
> | > | -----Original Message-----
> | > | From: Simon Marlow [mailto:marlowsd at gmail.com]
> | > | Sent: 09 January 2014 08:39
> | > | To: Simon Peyton Jones; Herbert Valerio Riedel
> | > | Cc: ghc-devs at haskell.org
> | > | Subject: Re: High-level Cmm code and stack allocation
> | > |
> | > | On 08/01/2014 10:07, Simon Peyton Jones wrote:
> | > | > | > Can't we just allocate a Cmm "area"? The address of an area is
> | > | > | > a
> | > | > | perfectly well-defined Cmm value.
> | > | >
> | > | > What about this idea?
> | > |
> | > | We don't really have a general concept of areas (any more), and
> | > | areas aren't exposed in the concrete Cmm syntax at all.  The current
> | > | semantics is that areas may overlap with each other, so there should
> | > | only be one active area at any point.  I found that this was
> | > | important to ensure that we could generate good code from the stack
> | > | layout algorithm, otherwise it had to make pessimistic assumptions
> | and use too much stack.
> | > |
> | > | You're going to ask me where this is documented, and I think I have
> | > | to admit to slacking off, sorry :-)  We did discuss it at the time,
> | > | and I made copious notes, but I didn't transfer those to the code.
> | > | I'll add a Note.
> | > |
> | > | Cheers,
> | > | Simon
> | > |
> | > |
> | > | > Simon
> | > | >
> | > | > | -----Original Message-----
> | > | > | From: Simon Marlow [mailto:marlowsd at gmail.com]
> | > | > | Sent: 08 January 2014 09:26
> | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel
> | > | > | Cc: ghc-devs at haskell.org
> | > | > | Subject: Re: High-level Cmm code and stack allocation
> | > | > |
> | > | > | On 07/01/14 22:53, Simon Peyton Jones wrote:
> | > | > | > | Yes, this is technically wrong but luckily works.  I'd very
> | > | > | > | much like to have a better solution, preferably one that
> | > | > | > | doesn't add any extra overhead.
> | > | > | >
> | > | > | > | __decodeFloat_Int is a C function, so it will not touch the
> | > | > | > | Haskell stack.
> | > | > | >
> | > | > | > This all seems terribly fragile to me.  At least it ought to
> | > | > | > be
> | > | > | surrounded with massive comments pointing out how terribly
> | > | > | fragile it is, breaking all the rules that we carefully document
> | elsewhere.
> | > | > | >
> | > | > | > Can't we just allocate a Cmm "area"? The address of an area is
> | > | > | > a
> | > | > | perfectly well-defined Cmm value.
> | > | > |
> | > | > | It is fragile, yes.  We can't use static memory because it needs
> | > | > | to be thread-local.  This particular hack has gone through
> | > | > | several iterations over the years: first we had static memory,
> | > | > | which broke when we did the parallel runtime, then we had
> | > | > | special storage in the Capability, which we gave up when GMP was
> | > | > | split out into a separate library, because it didn't seem right
> | > | > | to have magic fields in the Capability for one library.
> | > | > |
> | > | > | I'm looking into whether we can do temporary allocation on the
> | > | > | heap for this instead.
> | > | > |
> | > | > | Cheers,
> | > | > | Simon
> | > | > |
> | > | > |
> | > | > | > Simon
> | > | > | >
> | > | > | > | -----Original Message-----
> | > | > | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On
> | > | > | > | Behalf Of Simon Marlow
> | > | > | > | Sent: 07 January 2014 16:05
> | > | > | > | To: Herbert Valerio Riedel; ghc-devs at haskell.org
> | > | > | > | Subject: Re: High-level Cmm code and stack allocation
> | > | > | > |
> | > | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
> | > | > | > | > Hello,
> | > | > | > | >
> | > | > | > | > According to Note [Syntax of .cmm files],
> | > | > | > | >
> | > | > | > | > | There are two ways to write .cmm code:
> | > | > | > | > |
> | > | > | > | > |  (1) High-level Cmm code delegates the stack handling to
> | > | > | > | > | GHC,
> | > | > | and
> | > | > | > | > |      never explicitly mentions Sp or registers.
> | > | > | > | > |
> | > | > | > | > |  (2) Low-level Cmm manages the stack itself, and must
> | > | > | > | > | know
> | > | about
> | > | > | > | > |      calling conventions.
> | > | > | > | > |
> | > | > | > | > | Whether you want high-level or low-level Cmm is
> | > | > | > | > | indicated by the presence of an argument list on a
> | procedure.
> | > | > | > | >
> | > | > | > | > However, while working on integer-gmp I've been noticing
> | > | > | > | > in integer-gmp/cbits/gmp-wrappers.cmm that even though all
> | > | > | > | > Cmm
> | > | > | > | procedures
> | > | > | > | > have been converted to high-level Cmm, they still
> | > | > | > | > reference the
> | > | > | 'Sp'
> | > | > | > | > register, e.g.
> | > | > | > | >
> | > | > | > | >
> | > | > | > | >      #define GMP_TAKE1_RET1(name,mp_fun)       \
> | > | > | > | >      name (W_ ws1, P_ d1)                      \
> | > | > | > | >      {                                         \
> | > | > | > | >        W_ mp_tmp1;                             \
> | > | > | > | >        W_ mp_result1;                          \
> | > | > | > | >                                                \
> | > | > | > | >      again:                                    \
> | > | > | > | >        STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
> | > | > | > | >        MAYBE_GC(again);                        \
> | > | > | > | >                                                \
> | > | > | > | >        mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
> | > | > | > | >        mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
> | > | > | > | >        ...                                     \
> | > | > | > | >
> | > | > | > | >
> | > | > | > | > So is this valid high-level Cmm code? What's the proper
> | > | > | > | > way to
> | > | > | > | allocate
> | > | > | > | > Stack (and/or Heap) memory from high-level Cmm code?
> | > | > | > |
> | > | > | > | Yes, this is technically wrong but luckily works.  I'd very
> | > | > | > | much like to have a better solution, preferably one that
> | > | > | > | doesn't add any extra overhead.
> | > | > | > |
> | > | > | > | The problem here is that we need to allocate a couple of
> | > | > | > | temporary words and take their address; that's an unusual
> | > | > | > | thing to do in Cmm, so it only occurs in a few places
> | > | > | > | (mainly
> | > | interacting with gmp).
> | > | > | > | Usually if you want some temporary storage you can use local
> | > | > | > | variables or some heap-allocated memory.
> | > | > | > |
> | > | > | > | Cheers,
> | > | > | > | Simon
> | > | > | > | _______________________________________________
> | > | > | > | ghc-devs mailing list
> | > | > | > | ghc-devs at haskell.org
> | > | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs
> | > | > | >
> | > | >
> | >
>

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Peyton Jones
Thanks.  Reading what you write below, I can see two possible motivations.

1.  Reduce stack sizes.  
2.  Eliminate memory moves

For (1) do we have any data to show that the non-overlap of areas was giving rise to unacceptably big stacks?

For (2) that is indeed clever, but it's pretty serendipitous: it relies on the overlap being just so, so that coincidentally y gets stored in the same place as it was loaded from.  I imagine that you don't plan the stack layout to cause that to happen; it's just a coincidence.  Do we have any data to show that the coincidence happens with any frequency?

Also, as you note, we lose the opportunity for certain sorts of code motion, perhaps increasing register pressure a lot.  So there is a downside too.

You seldom do things without a very good reason, so I feel I must be missing something.

Simon

| -----Original Message-----
| From: Simon Marlow [mailto:marlowsd at gmail.com]
| Sent: 10 January 2014 17:00
| To: Simon Peyton Jones; Herbert Valerio Riedel
| Cc: ghc-devs at haskell.org
| Subject: Re: High-level Cmm code and stack allocation
|
| So stack areas are still a great abstraction, the only change is that
| they now overlap.  It's not just about stack getting too big, I've
| copied the notes I made about it below (which I will paste into the code
| in due course).  The nice property that we can generate well-defined Cmm
| without knowing explicit stack offsets is intact.
|
| What is different is that there used to be an intermediate state where
| live variables were saved to abstract stack areas across calls, but Sp
| was still not manifest.  This intermediate state doesn't exist any more,
| the stack layout algorithm does it all in one pass.  To me this was far
| simpler, and I think it ended up being fewer lines of code than the old
| multi-phase stack layout algorithm (it's also much faster).
|
| Of course you can always change this.  My goal was to get code that was
| at least as good as the old code generator and in a reasonable amount of
| time, and this was the shortest path I could find to that goal.
|
| Cheers,
| Simon
|
| e.g. if we had
|
|      x = Sp[old + 8]
|      y = Sp[old + 16]
|
|      Sp[young(L) + 8]  = L
|      Sp[young(L) + 16] = y
|      Sp[young(L) + 24] = x
|      call f() returns to L
|
| if areas semantically do not overlap, then we might optimise this to
|
|      Sp[young(L) + 8]  = L
|      Sp[young(L) + 16] = Sp[old + 8]
|      Sp[young(L) + 24] = Sp[old + 16]
|      call f() returns to L
|
| and now young(L) cannot be allocated at the same place as old, and we
| are doomed to use more stack.
|
|    - old+8  conflicts with young(L)+8
|    - old+16 conflicts with young(L)+16 and young(L)+8
|
| so young(L)+8 == old+24 and we get
|
|      Sp[-8]  = L
|      Sp[-16] = Sp[8]
|      Sp[-24] = Sp[0]
|      Sp -= 24
|      call f() returns to L
|
| However, if areas are defined to be "possibly overlapping" in the
| semantics, then we cannot commute any loads/stores of old with young(L),
| and we will be able to re-use both old+8 and old+16 for young(L).
|
|      x = Sp[8]
|      y = Sp[0]
|
|      Sp[8] = L
|      Sp[0] = y
|      Sp[-8] = x
|      Sp = Sp - 8
|      call f() returns to L
|
| Now, the assignments of y go away,
|
|      x = Sp[8]
|      Sp[8] = L
|      Sp[-8] = x
|      Sp = Sp - 8
|      call f() returns to L
|
|
| Conclusion:
|
|    - T[old+N] aliases with U[young(L)+M] for all T, U, L, N and M
|    - T[old+N] aliases with U[old+M] only if the areas actually overlap
|
| this ensures that we will not commute any accesses to old with
| young(L) or young(L) with young(L'), and the stack allocator will get
| the maximum opportunity to overlap these areas, keeping the stack use to
| a minimum and possibly avoiding some assignments.
|
|
|
| On 10/01/2014 16:35, Simon Peyton Jones wrote:
| > Oh, ok.  Alas, a good chunk of my model of Cmm has just gone out of
| the window.  I thought that areas were such a lovely, well-behaved
| abstraction.  I was thrilled when we came up with them, and I'm very
| sorry to see them go.
| >
| > There are no many things that I no longer understand.  I now have no
| idea how we save live variables over a call, or how multiple returned
| values from one call (returned on the stack) stay right where they are
| if they are live across the next call.
| >
| > What was the actual problem?  That functions used too much stack, so
| the stack was getting too big?  But a one slot area corresponds exactly
| to a live variable, so I don't see how the area abstraction could
| possibly increase stack size.  And is stack size a crucial issue anyway?
| >
| > Apart from anything else, areas would have given a lovely solution to
| the problem this thread started with!
| >
| > I guess we can talk about this when you next visit?  But some
| documentation would be welcome.
| >
| > Simon
| >
| > | -----Original Message-----
| > | From: Simon Marlow [mailto:marlowsd at gmail.com]
| > | Sent: 10 January 2014 16:24
| > | To: Simon Peyton Jones; Herbert Valerio Riedel
| > | Cc: ghc-devs at haskell.org
| > | Subject: Re: High-level Cmm code and stack allocation
| > |
| > | There are no one-slot areas any more, I ditched those when I rewrote
| > | the stack allocator.  There is only ever one live area: either the
| > | old area or the young area for a call we are about to make or have
| just made.
| > | (see the data type: I removed the one-slot areas)
| > |
| > | I struggled for a long time with this.  The problem is that with the
| > | semantics of non-overlapping areas, code motion optimisations would
| > | tend to increase the stack requirements of the function by
| > | overlapping the live ranges of the areas.  I concluded that actually
| > | what we wanted was areas that really do overlap, and optimisations
| > | that respect that, so that we get more efficient stack usage.
| > |
| > | Cheers,
| > | Simon
| > |
| > | On 10/01/2014 15:22, Simon Peyton Jones wrote:
| > | > That documentation would be good, yes!  I don't know what it means
| > | > to
| > | say "we don't really have a general concept of areas any more".  We
| > | did before, and I didn't know that it had gone away.  Urk!  We can
| > | have lots of live areas, notably the old area (for the current
| > | call/return parameters, the call area for a call we are preparing,
| > | and the one-slot areas for variables we are saving on the stack.
| > | >
| > | > Here's he current story
| > | > https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAre
| > | > as
| > | >
| > | > I agree that we have no concrete syntax for talking about areas,
| > | > but
| > | that is something we could fix.  But I'm worried that they may not
| > | mean what they used to mean.
| > | >
| > | > Simon
| > | >
| > | > | -----Original Message-----
| > | > | From: Simon Marlow [mailto:marlowsd at gmail.com]
| > | > | Sent: 09 January 2014 08:39
| > | > | To: Simon Peyton Jones; Herbert Valerio Riedel
| > | > | Cc: ghc-devs at haskell.org
| > | > | Subject: Re: High-level Cmm code and stack allocation
| > | > |
| > | > | On 08/01/2014 10:07, Simon Peyton Jones wrote:
| > | > | > | > Can't we just allocate a Cmm "area"? The address of an
| > | > | > | > area is a
| > | > | > | perfectly well-defined Cmm value.
| > | > | >
| > | > | > What about this idea?
| > | > |
| > | > | We don't really have a general concept of areas (any more), and
| > | > | areas aren't exposed in the concrete Cmm syntax at all.  The
| > | > | current semantics is that areas may overlap with each other, so
| > | > | there should only be one active area at any point.  I found that
| > | > | this was important to ensure that we could generate good code
| > | > | from the stack layout algorithm, otherwise it had to make
| > | > | pessimistic assumptions
| > | and use too much stack.
| > | > |
| > | > | You're going to ask me where this is documented, and I think I
| > | > | have to admit to slacking off, sorry :-)  We did discuss it at
| > | > | the time, and I made copious notes, but I didn't transfer those
| to the code.
| > | > | I'll add a Note.
| > | > |
| > | > | Cheers,
| > | > | Simon
| > | > |
| > | > |
| > | > | > Simon
| > | > | >
| > | > | > | -----Original Message-----
| > | > | > | From: Simon Marlow [mailto:marlowsd at gmail.com]
| > | > | > | Sent: 08 January 2014 09:26
| > | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel
| > | > | > | Cc: ghc-devs at haskell.org
| > | > | > | Subject: Re: High-level Cmm code and stack allocation
| > | > | > |
| > | > | > | On 07/01/14 22:53, Simon Peyton Jones wrote:
| > | > | > | > | Yes, this is technically wrong but luckily works.  I'd
| > | > | > | > | very much like to have a better solution, preferably one
| > | > | > | > | that doesn't add any extra overhead.
| > | > | > | >
| > | > | > | > | __decodeFloat_Int is a C function, so it will not touch
| > | > | > | > | the Haskell stack.
| > | > | > | >
| > | > | > | > This all seems terribly fragile to me.  At least it ought
| > | > | > | > to be
| > | > | > | surrounded with massive comments pointing out how terribly
| > | > | > | fragile it is, breaking all the rules that we carefully
| > | > | > | document
| > | elsewhere.
| > | > | > | >
| > | > | > | > Can't we just allocate a Cmm "area"? The address of an
| > | > | > | > area is a
| > | > | > | perfectly well-defined Cmm value.
| > | > | > |
| > | > | > | It is fragile, yes.  We can't use static memory because it
| > | > | > | needs to be thread-local.  This particular hack has gone
| > | > | > | through several iterations over the years: first we had
| > | > | > | static memory, which broke when we did the parallel runtime,
| > | > | > | then we had special storage in the Capability, which we gave
| > | > | > | up when GMP was split out into a separate library, because
| > | > | > | it didn't seem right to have magic fields in the Capability
| for one library.
| > | > | > |
| > | > | > | I'm looking into whether we can do temporary allocation on
| > | > | > | the heap for this instead.
| > | > | > |
| > | > | > | Cheers,
| > | > | > | Simon
| > | > | > |
| > | > | > |
| > | > | > | > Simon
| > | > | > | >
| > | > | > | > | -----Original Message-----
| > | > | > | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On
| > | > | > | > | Behalf Of Simon Marlow
| > | > | > | > | Sent: 07 January 2014 16:05
| > | > | > | > | To: Herbert Valerio Riedel; ghc-devs at haskell.org
| > | > | > | > | Subject: Re: High-level Cmm code and stack allocation
| > | > | > | > |
| > | > | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
| > | > | > | > | > Hello,
| > | > | > | > | >
| > | > | > | > | > According to Note [Syntax of .cmm files],
| > | > | > | > | >
| > | > | > | > | > | There are two ways to write .cmm code:
| > | > | > | > | > |
| > | > | > | > | > |  (1) High-level Cmm code delegates the stack
| > | > | > | > | > | handling to GHC,
| > | > | > | and
| > | > | > | > | > |      never explicitly mentions Sp or registers.
| > | > | > | > | > |
| > | > | > | > | > |  (2) Low-level Cmm manages the stack itself, and
| > | > | > | > | > | must know
| > | > | about
| > | > | > | > | > |      calling conventions.
| > | > | > | > | > |
| > | > | > | > | > | Whether you want high-level or low-level Cmm is
| > | > | > | > | > | indicated by the presence of an argument list on a
| > | procedure.
| > | > | > | > | >
| > | > | > | > | > However, while working on integer-gmp I've been
| > | > | > | > | > noticing in integer-gmp/cbits/gmp-wrappers.cmm that
| > | > | > | > | > even though all Cmm
| > | > | > | > | procedures
| > | > | > | > | > have been converted to high-level Cmm, they still
| > | > | > | > | > reference the
| > | > | > | 'Sp'
| > | > | > | > | > register, e.g.
| > | > | > | > | >
| > | > | > | > | >
| > | > | > | > | >      #define GMP_TAKE1_RET1(name,mp_fun)       \
| > | > | > | > | >      name (W_ ws1, P_ d1)                      \
| > | > | > | > | >      {                                         \
| > | > | > | > | >        W_ mp_tmp1;                             \
| > | > | > | > | >        W_ mp_result1;                          \
| > | > | > | > | >                                                \
| > | > | > | > | >      again:                                    \
| > | > | > | > | >        STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
| > | > | > | > | >        MAYBE_GC(again);                        \
| > | > | > | > | >                                                \
| > | > | > | > | >        mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
| > | > | > | > | >        mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
| > | > | > | > | >        ...                                     \
| > | > | > | > | >
| > | > | > | > | >
| > | > | > | > | > So is this valid high-level Cmm code? What's the
| > | > | > | > | > proper way to
| > | > | > | > | allocate
| > | > | > | > | > Stack (and/or Heap) memory from high-level Cmm code?
| > | > | > | > |
| > | > | > | > | Yes, this is technically wrong but luckily works.  I'd
| > | > | > | > | very much like to have a better solution, preferably one
| > | > | > | > | that doesn't add any extra overhead.
| > | > | > | > |
| > | > | > | > | The problem here is that we need to allocate a couple of
| > | > | > | > | temporary words and take their address; that's an
| > | > | > | > | unusual thing to do in Cmm, so it only occurs in a few
| > | > | > | > | places (mainly
| > | > | interacting with gmp).
| > | > | > | > | Usually if you want some temporary storage you can use
| > | > | > | > | local variables or some heap-allocated memory.
| > | > | > | > |
| > | > | > | > | Cheers,
| > | > | > | > | Simon
| > | > | > | > | _______________________________________________
| > | > | > | > | ghc-devs mailing list
| > | > | > | > | ghc-devs at haskell.org
| > | > | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs
| > | > | > | >
| > | > | >
| > | >
| >

Reply | Threaded
Open this post in threaded view
|

High-level Cmm code and stack allocation

Simon Marlow-7
Using more stack generally (but not always) implies extra memory
traffic.  I noticed it happening a lot, but I didn't make measurements -
we never had a way to generate code with just this one thing changed,
because the new code generator had lots of issues with bad code, and
this was just one.

We can think of stack allocation as a black box: it takes Cmm in which
(a) variables live across calls and (b) stack references are to [Old+n]
or [Young+n], and returns Cmm in which (a) variables do not live across
calls, and (b) all stack references are explicit offsets from Sp.

The internals of this box are what has changed. Most users of Cmm don't
need to care, because you can write optimisations on both the
pre-stack-allocated Cmm and the post-stack-allocated Cmm without knowing
anything about how stack allocation works. Indeed CmmSink (now) works on
both forms.

The stack area idea exposed some of the internals of this box; I don't
think that's necessarily a good thing.  There was *another* form of Cmm,
in which (a) variables do not live across calls, and (b) stack
references are to [Old+n], [Young+n] or [Sp(var)].  There was a
(beautifully simple) spill pass using Hoopl that inserted spills at the
definition site; unfortunately to generate good code you often have to
move the spills somewhere else.  And that's really hard, because code
motion interacts in complex ways with stack layout: making a bad code
motion decision can increase your stack requirements.  This is a pretty
good summary of what I was finding difficult here.  It was not possible
to generate good code without doing some optimisation on this
intermediate stage, yet by doing stack allocation in a different way it
was much easier to get good code.  So the new stack allocator just walks
through the code spilling, reloading, and allocating stack frames as it
goes and making intelligent decisions about not spilling things if
they're already on the stack.  This does a really good job, and it was
easy to add a couple of important special cases for common things.

There's plenty of room to do something better.  However, what we have
now generates good code from the kind of things that the code generator
generates (since that's what I tuned it for, by peering at lots of Cmm
and tweaking things), so any improvements won't see much benefit for
typical Haskell code.

I have some more docs for the stack layout code that I'll push shortly.

Cheers,
Simon

On 13/01/2014 15:20, Simon Peyton Jones wrote:

> Thanks.  Reading what you write below, I can see two possible motivations.
>
> 1.  Reduce stack sizes.
> 2.  Eliminate memory moves
>
> For (1) do we have any data to show that the non-overlap of areas was giving rise to unacceptably big stacks?
>
> For (2) that is indeed clever, but it's pretty serendipitous: it relies on the overlap being just so, so that coincidentally y gets stored in the same place as it was loaded from.  I imagine that you don't plan the stack layout to cause that to happen; it's just a coincidence.  Do we have any data to show that the coincidence happens with any frequency?
>
> Also, as you note, we lose the opportunity for certain sorts of code motion, perhaps increasing register pressure a lot.  So there is a downside too.
>
> You seldom do things without a very good reason, so I feel I must be missing something.
>
> Simon
>
> | -----Original Message-----
> | From: Simon Marlow [mailto:marlowsd at gmail.com]
> | Sent: 10 January 2014 17:00
> | To: Simon Peyton Jones; Herbert Valerio Riedel
> | Cc: ghc-devs at haskell.org
> | Subject: Re: High-level Cmm code and stack allocation
> |
> | So stack areas are still a great abstraction, the only change is that
> | they now overlap.  It's not just about stack getting too big, I've
> | copied the notes I made about it below (which I will paste into the code
> | in due course).  The nice property that we can generate well-defined Cmm
> | without knowing explicit stack offsets is intact.
> |
> | What is different is that there used to be an intermediate state where
> | live variables were saved to abstract stack areas across calls, but Sp
> | was still not manifest.  This intermediate state doesn't exist any more,
> | the stack layout algorithm does it all in one pass.  To me this was far
> | simpler, and I think it ended up being fewer lines of code than the old
> | multi-phase stack layout algorithm (it's also much faster).
> |
> | Of course you can always change this.  My goal was to get code that was
> | at least as good as the old code generator and in a reasonable amount of
> | time, and this was the shortest path I could find to that goal.
> |
> | Cheers,
> | Simon
> |
> | e.g. if we had
> |
> |      x = Sp[old + 8]
> |      y = Sp[old + 16]
> |
> |      Sp[young(L) + 8]  = L
> |      Sp[young(L) + 16] = y
> |      Sp[young(L) + 24] = x
> |      call f() returns to L
> |
> | if areas semantically do not overlap, then we might optimise this to
> |
> |      Sp[young(L) + 8]  = L
> |      Sp[young(L) + 16] = Sp[old + 8]
> |      Sp[young(L) + 24] = Sp[old + 16]
> |      call f() returns to L
> |
> | and now young(L) cannot be allocated at the same place as old, and we
> | are doomed to use more stack.
> |
> |    - old+8  conflicts with young(L)+8
> |    - old+16 conflicts with young(L)+16 and young(L)+8
> |
> | so young(L)+8 == old+24 and we get
> |
> |      Sp[-8]  = L
> |      Sp[-16] = Sp[8]
> |      Sp[-24] = Sp[0]
> |      Sp -= 24
> |      call f() returns to L
> |
> | However, if areas are defined to be "possibly overlapping" in the
> | semantics, then we cannot commute any loads/stores of old with young(L),
> | and we will be able to re-use both old+8 and old+16 for young(L).
> |
> |      x = Sp[8]
> |      y = Sp[0]
> |
> |      Sp[8] = L
> |      Sp[0] = y
> |      Sp[-8] = x
> |      Sp = Sp - 8
> |      call f() returns to L
> |
> | Now, the assignments of y go away,
> |
> |      x = Sp[8]
> |      Sp[8] = L
> |      Sp[-8] = x
> |      Sp = Sp - 8
> |      call f() returns to L
> |
> |
> | Conclusion:
> |
> |    - T[old+N] aliases with U[young(L)+M] for all T, U, L, N and M
> |    - T[old+N] aliases with U[old+M] only if the areas actually overlap
> |
> | this ensures that we will not commute any accesses to old with
> | young(L) or young(L) with young(L'), and the stack allocator will get
> | the maximum opportunity to overlap these areas, keeping the stack use to
> | a minimum and possibly avoiding some assignments.
> |
> |
> |
> | On 10/01/2014 16:35, Simon Peyton Jones wrote:
> | > Oh, ok.  Alas, a good chunk of my model of Cmm has just gone out of
> | the window.  I thought that areas were such a lovely, well-behaved
> | abstraction.  I was thrilled when we came up with them, and I'm very
> | sorry to see them go.
> | >
> | > There are no many things that I no longer understand.  I now have no
> | idea how we save live variables over a call, or how multiple returned
> | values from one call (returned on the stack) stay right where they are
> | if they are live across the next call.
> | >
> | > What was the actual problem?  That functions used too much stack, so
> | the stack was getting too big?  But a one slot area corresponds exactly
> | to a live variable, so I don't see how the area abstraction could
> | possibly increase stack size.  And is stack size a crucial issue anyway?
> | >
> | > Apart from anything else, areas would have given a lovely solution to
> | the problem this thread started with!
> | >
> | > I guess we can talk about this when you next visit?  But some
> | documentation would be welcome.
> | >
> | > Simon
> | >
> | > | -----Original Message-----
> | > | From: Simon Marlow [mailto:marlowsd at gmail.com]
> | > | Sent: 10 January 2014 16:24
> | > | To: Simon Peyton Jones; Herbert Valerio Riedel
> | > | Cc: ghc-devs at haskell.org
> | > | Subject: Re: High-level Cmm code and stack allocation
> | > |
> | > | There are no one-slot areas any more, I ditched those when I rewrote
> | > | the stack allocator.  There is only ever one live area: either the
> | > | old area or the young area for a call we are about to make or have
> | just made.
> | > | (see the data type: I removed the one-slot areas)
> | > |
> | > | I struggled for a long time with this.  The problem is that with the
> | > | semantics of non-overlapping areas, code motion optimisations would
> | > | tend to increase the stack requirements of the function by
> | > | overlapping the live ranges of the areas.  I concluded that actually
> | > | what we wanted was areas that really do overlap, and optimisations
> | > | that respect that, so that we get more efficient stack usage.
> | > |
> | > | Cheers,
> | > | Simon
> | > |
> | > | On 10/01/2014 15:22, Simon Peyton Jones wrote:
> | > | > That documentation would be good, yes!  I don't know what it means
> | > | > to
> | > | say "we don't really have a general concept of areas any more".  We
> | > | did before, and I didn't know that it had gone away.  Urk!  We can
> | > | have lots of live areas, notably the old area (for the current
> | > | call/return parameters, the call area for a call we are preparing,
> | > | and the one-slot areas for variables we are saving on the stack.
> | > | >
> | > | > Here's he current story
> | > | > https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/StackAre
> | > | > as
> | > | >
> | > | > I agree that we have no concrete syntax for talking about areas,
> | > | > but
> | > | that is something we could fix.  But I'm worried that they may not
> | > | mean what they used to mean.
> | > | >
> | > | > Simon
> | > | >
> | > | > | -----Original Message-----
> | > | > | From: Simon Marlow [mailto:marlowsd at gmail.com]
> | > | > | Sent: 09 January 2014 08:39
> | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel
> | > | > | Cc: ghc-devs at haskell.org
> | > | > | Subject: Re: High-level Cmm code and stack allocation
> | > | > |
> | > | > | On 08/01/2014 10:07, Simon Peyton Jones wrote:
> | > | > | > | > Can't we just allocate a Cmm "area"? The address of an
> | > | > | > | > area is a
> | > | > | > | perfectly well-defined Cmm value.
> | > | > | >
> | > | > | > What about this idea?
> | > | > |
> | > | > | We don't really have a general concept of areas (any more), and
> | > | > | areas aren't exposed in the concrete Cmm syntax at all.  The
> | > | > | current semantics is that areas may overlap with each other, so
> | > | > | there should only be one active area at any point.  I found that
> | > | > | this was important to ensure that we could generate good code
> | > | > | from the stack layout algorithm, otherwise it had to make
> | > | > | pessimistic assumptions
> | > | and use too much stack.
> | > | > |
> | > | > | You're going to ask me where this is documented, and I think I
> | > | > | have to admit to slacking off, sorry :-)  We did discuss it at
> | > | > | the time, and I made copious notes, but I didn't transfer those
> | to the code.
> | > | > | I'll add a Note.
> | > | > |
> | > | > | Cheers,
> | > | > | Simon
> | > | > |
> | > | > |
> | > | > | > Simon
> | > | > | >
> | > | > | > | -----Original Message-----
> | > | > | > | From: Simon Marlow [mailto:marlowsd at gmail.com]
> | > | > | > | Sent: 08 January 2014 09:26
> | > | > | > | To: Simon Peyton Jones; Herbert Valerio Riedel
> | > | > | > | Cc: ghc-devs at haskell.org
> | > | > | > | Subject: Re: High-level Cmm code and stack allocation
> | > | > | > |
> | > | > | > | On 07/01/14 22:53, Simon Peyton Jones wrote:
> | > | > | > | > | Yes, this is technically wrong but luckily works.  I'd
> | > | > | > | > | very much like to have a better solution, preferably one
> | > | > | > | > | that doesn't add any extra overhead.
> | > | > | > | >
> | > | > | > | > | __decodeFloat_Int is a C function, so it will not touch
> | > | > | > | > | the Haskell stack.
> | > | > | > | >
> | > | > | > | > This all seems terribly fragile to me.  At least it ought
> | > | > | > | > to be
> | > | > | > | surrounded with massive comments pointing out how terribly
> | > | > | > | fragile it is, breaking all the rules that we carefully
> | > | > | > | document
> | > | elsewhere.
> | > | > | > | >
> | > | > | > | > Can't we just allocate a Cmm "area"? The address of an
> | > | > | > | > area is a
> | > | > | > | perfectly well-defined Cmm value.
> | > | > | > |
> | > | > | > | It is fragile, yes.  We can't use static memory because it
> | > | > | > | needs to be thread-local.  This particular hack has gone
> | > | > | > | through several iterations over the years: first we had
> | > | > | > | static memory, which broke when we did the parallel runtime,
> | > | > | > | then we had special storage in the Capability, which we gave
> | > | > | > | up when GMP was split out into a separate library, because
> | > | > | > | it didn't seem right to have magic fields in the Capability
> | for one library.
> | > | > | > |
> | > | > | > | I'm looking into whether we can do temporary allocation on
> | > | > | > | the heap for this instead.
> | > | > | > |
> | > | > | > | Cheers,
> | > | > | > | Simon
> | > | > | > |
> | > | > | > |
> | > | > | > | > Simon
> | > | > | > | >
> | > | > | > | > | -----Original Message-----
> | > | > | > | > | From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On
> | > | > | > | > | Behalf Of Simon Marlow
> | > | > | > | > | Sent: 07 January 2014 16:05
> | > | > | > | > | To: Herbert Valerio Riedel; ghc-devs at haskell.org
> | > | > | > | > | Subject: Re: High-level Cmm code and stack allocation
> | > | > | > | > |
> | > | > | > | > | On 04/01/2014 23:26, Herbert Valerio Riedel wrote:
> | > | > | > | > | > Hello,
> | > | > | > | > | >
> | > | > | > | > | > According to Note [Syntax of .cmm files],
> | > | > | > | > | >
> | > | > | > | > | > | There are two ways to write .cmm code:
> | > | > | > | > | > |
> | > | > | > | > | > |  (1) High-level Cmm code delegates the stack
> | > | > | > | > | > | handling to GHC,
> | > | > | > | and
> | > | > | > | > | > |      never explicitly mentions Sp or registers.
> | > | > | > | > | > |
> | > | > | > | > | > |  (2) Low-level Cmm manages the stack itself, and
> | > | > | > | > | > | must know
> | > | > | about
> | > | > | > | > | > |      calling conventions.
> | > | > | > | > | > |
> | > | > | > | > | > | Whether you want high-level or low-level Cmm is
> | > | > | > | > | > | indicated by the presence of an argument list on a
> | > | procedure.
> | > | > | > | > | >
> | > | > | > | > | > However, while working on integer-gmp I've been
> | > | > | > | > | > noticing in integer-gmp/cbits/gmp-wrappers.cmm that
> | > | > | > | > | > even though all Cmm
> | > | > | > | > | procedures
> | > | > | > | > | > have been converted to high-level Cmm, they still
> | > | > | > | > | > reference the
> | > | > | > | 'Sp'
> | > | > | > | > | > register, e.g.
> | > | > | > | > | >
> | > | > | > | > | >
> | > | > | > | > | >      #define GMP_TAKE1_RET1(name,mp_fun)       \
> | > | > | > | > | >      name (W_ ws1, P_ d1)                      \
> | > | > | > | > | >      {                                         \
> | > | > | > | > | >        W_ mp_tmp1;                             \
> | > | > | > | > | >        W_ mp_result1;                          \
> | > | > | > | > | >                                                \
> | > | > | > | > | >      again:                                    \
> | > | > | > | > | >        STK_CHK_GEN_N (2 * SIZEOF_MP_INT);      \
> | > | > | > | > | >        MAYBE_GC(again);                        \
> | > | > | > | > | >                                                \
> | > | > | > | > | >        mp_tmp1    = Sp - 1 * SIZEOF_MP_INT;    \
> | > | > | > | > | >        mp_result1 = Sp - 2 * SIZEOF_MP_INT;    \
> | > | > | > | > | >        ...                                     \
> | > | > | > | > | >
> | > | > | > | > | >
> | > | > | > | > | > So is this valid high-level Cmm code? What's the
> | > | > | > | > | > proper way to
> | > | > | > | > | allocate
> | > | > | > | > | > Stack (and/or Heap) memory from high-level Cmm code?
> | > | > | > | > |
> | > | > | > | > | Yes, this is technically wrong but luckily works.  I'd
> | > | > | > | > | very much like to have a better solution, preferably one
> | > | > | > | > | that doesn't add any extra overhead.
> | > | > | > | > |
> | > | > | > | > | The problem here is that we need to allocate a couple of
> | > | > | > | > | temporary words and take their address; that's an
> | > | > | > | > | unusual thing to do in Cmm, so it only occurs in a few
> | > | > | > | > | places (mainly
> | > | > | interacting with gmp).
> | > | > | > | > | Usually if you want some temporary storage you can use
> | > | > | > | > | local variables or some heap-allocated memory.
> | > | > | > | > |
> | > | > | > | > | Cheers,
> | > | > | > | > | Simon
> | > | > | > | > | _______________________________________________
> | > | > | > | > | ghc-devs mailing list
> | > | > | > | > | ghc-devs at haskell.org
> | > | > | > | > | http://www.haskell.org/mailman/listinfo/ghc-devs
> | > | > | > | >
> | > | > | >
> | > | >
> | >
>