NCG lowering of sqrt

NCG lowering of sqrt

Kavon Farvardin
Given a Cmm expression such as

    (_c8Gq::F64) = call MO_F64_Sqrt(_s8oX::F64);   // CmmUnsafeForeignCall

the native code generator produces an actual call to the C sqrt function. As a side effect, every live floating-point register has to be spilled to the stack around the call, since they are all caller-saved. In the nbody benchmark, this is particularly bad for a rather hot piece of code (see below).

Ideally the NCG would recognize this foreign call and instead use the `sqrtsd` SSE instruction when targeting x86-64.  

Does anyone know if the NCG can produce this instruction? I think it would be beneficial, as the assembly below would turn into one or two instructions.

Other math functions such as sin/cos require x87 FPU instructions, which as far as I know we're not using.


;;;;;;;;;;;
; NCG generates this in parts of the nbody benchmark
; to compute the sqrt
;
    subq $8,%rsp
    movsd %xmm9,176(%rsp)  ; all floating-point registers
    movsd %xmm1,184(%rsp)  ; are caller-saved in SysV ABI
    movsd %xmm2,192(%rsp)
    movsd %xmm3,200(%rsp)
    movq %rdi,208(%rsp)
    movq %rcx,216(%rsp)
    movq %rsi,224(%rsp)
    movsd %xmm4,232(%rsp)
    movsd %xmm5,240(%rsp)
    movsd %xmm6,248(%rsp)
    movsd %xmm7,256(%rsp)
    movsd %xmm8,264(%rsp)
    movsd %xmm11,272(%rsp)
    call _sqrt
    ;; the loads
    ;; below are interleaved
    ;; with computations
    addq $8,%rsp
    movsd 264(%rsp),%xmm1
    movsd 240(%rsp),%xmm2
    movsd 224(%rsp),%xmm2
    movsd 232(%rsp),%xmm4
    movq 200(%rsp),%rax
    movsd 248(%rsp),%xmm4
    movsd 256(%rsp),%xmm4
    movq 216(%rsp),%rcx
    movsd 192(%rsp),%xmm2
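
For comparison, the single-instruction form can be sketched in C with the SSE2 intrinsics from `<emmintrin.h>`. This is only an illustration of what `sqrtsd` computes, not a patch to the NCG; `sqrt_inline` is a hypothetical name, and the `__SSE2__` check assumes a GCC- or Clang-style compiler (other compilers fall through to the libm call).

```c
#include <math.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_sqrt_sd */
#endif

/* Square root of a double without an out-of-line call where SSE2 is
 * available. _mm_sqrt_sd maps to the sqrtsd instruction, so no
 * caller-saved registers need to be spilled around it. */
double sqrt_inline(double x)
{
#if defined(__SSE2__)
    __m128d v = _mm_set_sd(x);                /* x in the low lane, 0.0 above */
    return _mm_cvtsd_f64(_mm_sqrt_sd(v, v));  /* sqrt of the low lane */
#else
    return sqrt(x);                           /* fallback: ordinary libm call */
#endif
}
```

With this in place of a call to `sqrt`, the values held in xmm registers can stay there across the operation.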


~kavon

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: NCG lowering of sqrt

Ryan Yates
Hi Kavon,

I looked a bit and it does not appear that there is an SSE sqrt in the native code gen.  It should be easy to add (see a similar addition here: https://phabricator.haskell.org/D3265).  The x87 version was available for 32-bit.  I think if you use the LLVM backend it will give you the SSE sqrt.

Ryan

On Fri, Apr 28, 2017 at 9:27 AM, Kavon Farvardin <[hidden email]> wrote:
[...]



Re: NCG lowering of sqrt

Ben Gamari
Ryan Yates <[hidden email]> writes:

> Hi Kavon,
>
> I looked a bit and it does not appear that there is an SSE sqrt in the
> native code gen.  It should be easy to add (see a similar addition here:
> https://phabricator.haskell.org/D3265).  The x87 version was available for
> 32-bit.  I think if you use the LLVM backend it will give you the SSE sqrt.
>
Indeed. I pushed a starting point to D3508; I haven't validated it, but
there's a chance it will work. If not, it would be great if someone could
pick it up. Otherwise I'll return to it when I have time.

Cheers,

- Ben

