Vector primops sizes


Mikhail Baykov
The recently merged vector primops support only 16-byte operands: Int32
x 4, Double x 2, and so on. Current AVX instructions support 256-bit
operands, and with simple cut'n'paste work it's possible to support at
least Double x 4 operands. I made those changes, and GHC (using LLVM)
generates proper AVX code using ymm registers. It might also make
sense to support primops for vector types larger than any currently
supported primitive type. I have those changes in my branch as well,
and LLVM generates pretty good code for them too. Those changes might
be useful for providing access to LLVM's shufflevector instruction, or
for writing high-performance processing of large vectors with less
potential overhead.
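
To make the sizes concrete: the lane count is simply the register width divided by the element width. The following is a minimal sketch in plain Haskell; `ElemType`, `elemBits` and `lanesFor` are hypothetical names invented for this illustration and are not part of GHC's primop interface.

```haskell
module Main (main) where

-- Hypothetical element types, used only for this illustration.
data ElemType = Int32Elem | FloatElem | DoubleElem
  deriving (Eq, Show)

-- Width of a single element in bits.
elemBits :: ElemType -> Int
elemBits Int32Elem  = 32
elemBits FloatElem  = 32
elemBits DoubleElem = 64

-- Number of lanes of the given element type that fit in a register
-- of the given width (in bits).
lanesFor :: Int -> ElemType -> Int
lanesFor registerBits e = registerBits `div` elemBits e

main :: IO ()
main = do
  print (lanesFor 128 Int32Elem)   -- 4 lanes: Int32 x 4 in 16 bytes
  print (lanesFor 128 DoubleElem)  -- 2 lanes: Double x 2 in 16 bytes
  print (lanesFor 256 DoubleElem)  -- 4 lanes: Double x 4 in a ymm register
```

So going from 128-bit to 256-bit operands doubles the lanes per instruction for every element type.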

Do we want to support larger vectors directly, or should GHC be made
smart enough to fuse operations on vector primops performed in
parallel into larger vectors/registers for LLVM? Do we want to provide
access to LLVM's shufflevector instruction?
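
For readers unfamiliar with shufflevector: it builds a result vector by picking lanes, by index, out of the concatenation of two input vectors of the same type. Below is a rough list-based model of that selection rule only. `shuffleModel` is a hypothetical name, lists stand in for vector registers, and details of the real instruction such as constant or undefined mask elements are not modelled.

```haskell
module Main (main) where

-- List-based model of a two-input shuffle: result lane i is taken from
-- position (mask !! i) of the concatenation v1 ++ v2.
-- Returns Nothing if the inputs differ in length or an index is out of range.
shuffleModel :: [a] -> [a] -> [Int] -> Maybe [a]
shuffleModel v1 v2 mask
  | length v1 /= length v2 = Nothing
  | any outOfRange mask    = Nothing
  | otherwise              = Just (map (combined !!) mask)
  where
    combined     = v1 ++ v2
    outOfRange i = i < 0 || i >= length combined

main :: IO ()
main = do
  let a = [1, 2, 3, 4] :: [Double]
      b = [5, 6, 7, 8] :: [Double]
  -- Interleave the low halves of a and b.
  print (shuffleModel a b [0, 4, 1, 5])
  -- Reverse a; lanes of b are simply never selected.
  print (shuffleModel a b [3, 2, 1, 0])
```

A primop exposing this would presumably take the mask as a compile-time constant; how that would look in GHC is an open design question here.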



Carter Schonwald
Yes please! Having these (for valid target arches/CPU targets) would be
really, really valuable for me.

On Feb 13, 2013 12:07 AM, "Michael Baikov" <manpacket at gmail.com> wrote:

> [...]


Carter Schonwald
In reply to this post by Mikhail Baykov
By which I mean having this family of proposed primops. It's not obvious,
to me at least, how GHC could intelligently infer/use these implicitly for
the end user/library writer.


Mikhail Baykov
> By which I mean having this family of proposed primops. It's not obvious,
> to me at least, how GHC could intelligently infer/use these implicitly for
> the end user/library writer.

I have a couple of ideas for how to implement this, but having an
explicit set of primops will make using the vector instructions less
magical.

As for exposing only the set of primops valid for a given arch/CPU
target: that would make things much more complicated. LLVM takes care
of implementing a vector operation from smaller instructions, so
operations on DoubleX16 primitive types get compiled into something like

plusDoubleX16# :: DoubleX16# -> DoubleX16# -> DoubleX16#

        movq    %r13, 616(%rsp)
        movq    %rbp, 608(%rsp)
        movq    %r12, 600(%rsp)
        movq    %rbx, 592(%rsp)
        movq    %r15, 544(%rsp)
        movq    592(%rsp), %rax
        movq    %rax, 344(%rsp)
        movq    608(%rsp), %rax
        vmovups (%rax), %ymm0
        vmovups 32(%rax), %ymm1
        vmovups 64(%rax), %ymm2
        vmovups 96(%rax), %ymm3
        vmovaps %ymm3, 224(%rsp)
        vmovaps %ymm2, 192(%rsp)
        vmovaps %ymm1, 160(%rsp)
        vmovaps %ymm0, 128(%rsp)
        movq    608(%rsp), %rax
        vmovups 128(%rax), %ymm0
        vmovups 160(%rax), %ymm1
        vmovups 192(%rax), %ymm2
        vmovups 224(%rax), %ymm3
        vmovaps %ymm3, 96(%rsp)
        vmovaps %ymm2, 64(%rsp)
        vmovaps %ymm1, 32(%rsp)
        vmovaps %ymm0, (%rsp)
        movq    344(%rsp), %rbx
        movq    %rbx, 592(%rsp)
        movq    544(%rsp), %r15
        movq    600(%rsp), %r12
        movq    608(%rsp), %rax
        movq    616(%rsp), %r13
        movq    %rax, %rbp
        vzeroupper



(Still, it should be possible to compile this with fewer moves.)
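
The idea that a wide vector operation can be implemented from narrower native ones can be sketched in plain Haskell. This is only a model, with lists standing in for registers; `chunksOf` and `plusWide` are hypothetical helpers written for this illustration and say nothing about how LLVM actually performs the splitting.

```haskell
module Main (main) where

-- Split a list into consecutive chunks of n elements
-- (the last chunk may be shorter).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0    = error "chunksOf: chunk size must be positive"
  | null xs   = []
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf n t

-- Lane-wise addition of two wide vectors, carried out as a sequence of
-- additions on chunks of nativeLanes elements each.
plusWide :: Int -> [Double] -> [Double] -> [Double]
plusWide nativeLanes xs ys =
  concat (zipWith (zipWith (+)) (chunksOf nativeLanes xs) (chunksOf nativeLanes ys))

main :: IO ()
main = do
  let xs = map fromIntegral [1 .. 16 :: Int] :: [Double]
      ys = replicate 16 0.5 :: [Double]
  -- A 16-lane Double vector needs four 4-lane (256-bit) pieces.
  print (length (chunksOf 4 xs))
  -- The chunked addition agrees with plain lane-wise addition.
  print (plusWide 4 xs ys == zipWith (+) xs ys)
```

This matches the listing above, where each 16-lane operand is moved as four ymm-sized pieces.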



Simon Peyton Jones
In reply to this post by Carter Schonwald
I believe Geoff is working on adding AVX.  I expect he'd be interested in your patches.

Simon

From: ghc-devs-bounces at haskell.org [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Carter Schonwald
Sent: 13 February 2013 05:59
To: Michael Baikov
Cc: ghc-devs at haskell.org
Subject: Re: Vector primops sizes


Yes please! Having these (for valid target arches/CPU targets) would be really, really valuable for me.



Geoffrey Mainland
I haven't seen Michael's patches (where are they btw?), but there is
some extra work to be done to ensure that 256-bit values are passed in
registers. Otherwise adding support for wider vector types is fairly
straightforward.

The current plan is for 256-bit wide vector primops to always be
available. The programmer can test for the __AVX__ CPP symbol, which
indicates that these primops will be compiled to efficient code. I am
not inclined to add wider vector primops, as there is no current
platform where they can be compiled efficiently.
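
A minimal sketch of the kind of CPP test described above, assuming the compiler defines `__AVX__` when AVX code generation is enabled. The constants `preferredVectorBits` and `preferredDoubleLanes` are hypothetical names chosen for this example; real code would select between the 128-bit and 256-bit primops in the same way.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- Hypothetical constant: the vector width (in bits) this module will use.
-- The choice is made at compile time from the __AVX__ CPP symbol.
preferredVectorBits :: Int
#if defined(__AVX__)
preferredVectorBits = 256
#else
preferredVectorBits = 128
#endif

-- Number of Double lanes (64 bits each) in the preferred vector width.
preferredDoubleLanes :: Int
preferredDoubleLanes = preferredVectorBits `div` 64

main :: IO ()
main =
  putStrLn ("Using " ++ show preferredDoubleLanes ++ " Double lanes per vector")
```

The 256-bit primops would still be available without `__AVX__`; the symbol only indicates whether they are expected to compile to efficient code.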

Most programmers should use the Multi type family instead of working
with primops (or their boxed wrappers) directly. For example, by using
Multi Double instead of DoubleX2, the programmer will get 256-bit wide
vectors on platforms that support AVX, and 128-bit wide vectors
otherwise. See https://github.com/mainland/primitive for details.
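
To illustrate the general idea of a width-selecting type family (this is not the actual API of the package linked above, which should be consulted directly), here is a small self-contained sketch. The boxed stand-in types, the `Multi` family as declared here, and the `Lanes` class are all assumptions made for this example.

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE TypeFamilies #-}
module Main (main) where

import Data.Proxy (Proxy (..))

-- Hypothetical boxed stand-ins for 128-bit and 256-bit Double vectors.
data DoubleX2 = DoubleX2 !Double !Double
data DoubleX4 = DoubleX4 !Double !Double !Double !Double

-- Maps an element type to the widest vector type the target handles well.
type family Multi a

#if defined(__AVX__)
type instance Multi Double = DoubleX4
#else
type instance Multi Double = DoubleX2
#endif

-- Number of lanes in a vector type.
class Lanes v where
  laneCount :: Proxy v -> Int

instance Lanes DoubleX2 where
  laneCount _ = 2

instance Lanes DoubleX4 where
  laneCount _ = 4

-- Code written against Multi Double picks up the width automatically.
multiDoubleLanes :: Int
multiDoubleLanes = laneCount (Proxy :: Proxy (Multi Double))

main :: IO ()
main = print multiDoubleLanes
```

Code that mentions only `Multi Double` needs no changes when the target gains wider vectors; only the type instance moves.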

Geoff

On 02/13/2013 07:44 AM, Simon Peyton-Jones wrote:

> I believe Geoff is working on adding AVX.  I expect he'd be interested
> in your patches.





Alexander Kjeldaas
I mentioned this in another thread, but Xeon Phi chips have 512-bit AVX,
and Intel has apparently implemented support for it in LLVM for the ispc
compiler. Apparently this hasn't been merged back yet, but I guess it is
only a matter of time.
The Intel MIC architecture isn't quite x86, though.

https://github.com/ispc/ispc/issues/367
http://gpuscience.com/software/ispc-a-spmd-compiler-with-xeon-and-xeon-phi-support/

Alexander


On Thu, Feb 14, 2013 at 12:29 AM, Geoffrey Mainland <mainland at apeiron.net> wrote:

> [...]


Geoffrey Mainland
Thanks for pointing this out. Certainly when LLVM supports 512-bit AVX
instructions, we should add the appropriate primops!

Geoff

On 02/14/2013 11:44 AM, Alexander Kjeldaas wrote:

> [...]