WordX/IntX wrap Word#/Int#?

Michal Terepeta
Hi all,

I've just noticed that all `WordX` (and `IntX`) data types are
actually implemented as wrappers around `Word#` (and `Int#`). This
probably doesn't matter much if it's stored on the heap (due to
pointer indirection and heap alignment), but it also means that:
```
data Foo = Foo {-# UNPACK #-} !Word8 {-# UNPACK #-} !Int8
```
will actually take *a lot* of space: on a 64-bit platform we'd need 8 bytes
for the header, 8 bytes for the `Word8`, and 8 bytes for the `Int8`.
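
To make the arithmetic above concrete, here is a minimal sketch that only does the byte counting; it does not inspect GHC in any way. The names `sizeWordPerField` and `sizePacked` are made up for illustration, and `sizePacked` models a hypothetical layout in which sub-word fields share a word, which is not what the compiler did at the time of this thread.

```haskell
-- Back-of-the-envelope heap sizes for
--   data Foo = Foo {-# UNPACK #-} !Word8 {-# UNPACK #-} !Int8
-- on a 64-bit platform. This is plain arithmetic, not a query of GHC.

wordBytes :: Int
wordBytes = 8

-- One header word, then one full word per unpacked field
-- (the layout described in the message).
sizeWordPerField :: Int -> Int
sizeWordPerField nFields = wordBytes * (1 + nFields)

-- Hypothetical layout: one header word, then the field bytes packed
-- together and rounded up to a whole number of words.
sizePacked :: [Int] -> Int
sizePacked fieldBytes = wordBytes * (1 + payloadWords)
  where
    payloadWords = (sum fieldBytes + wordBytes - 1) `div` wordBytes

main :: IO ()
main = do
  print (sizeWordPerField 2)  -- 24
  print (sizePacked [1, 1])   -- 16
```

Under these assumptions the two one-byte fields cost 24 bytes today and would cost 16 bytes if they could share a single payload word.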

Is there any reason for this? The only thing I can see is that this
avoids having to add things like `Word8#` primitives into the
compiler. (also the codegen would need to emit zero-extend moves when
loading from memory, like `movzb{l,q}`)

If we had things like `Word8#`, we could also consider changing `Bool`
to just wrap it (with the obvious encoding). That would allow us to both
UNPACK `Bool` *and* save space within the struct. (Alternatively,
one could imagine a `Bool#` that would be just a byte.)

I couldn't find any discussion about this, so any pointers would be
welcome. :)

Thanks,
Michal

PS.  I've had a look at it after reading about the recent
implementation of struct field reordering optimization in rustc:


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Re: WordX/IntX wrap Word#/Int#?

Ben Gamari-2


On June 11, 2017 8:03:10 AM EDT, Michal Terepeta <[hidden email]> wrote:

>Hi all,
>
>I've just noticed that all `WordX` (and `IntX`) data types are
>actually implemented as wrappers around `Word#` (and `Int#`). This
>probably doesn't matter much if it's stored on the heap (due to
>pointer indirection and heap alignment), but it also means that:
>```
>data Foo = Foo {-# UNPACK #-} !Word8 {-# UNPACK #-} !Int8
>```
>will actually take *a lot* of space: on 64 bit we'd need 8 bytes for
>header, 8 bytes for `Word8`, 8 bytes for `Int8`.
>
>Is there any reason for this? The only thing I can see is that this
>avoids having to add things like `Word8#` primitives into the
>compiler. (also the codegen would need to emit zero-extend moves when
>loading from memory, like `movzb{l,q}`)
>
This is certainly one consideration. Another is that you would also need to teach the garbage collector to understand closures with sub-word-size fields. Currently we can encode whether each field of a closure is a pointer or not with a simple bitmap. If we naively allowed smaller fields we would need to increase the granularity of this representation to encode bytes.
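
As a rough illustration of the kind of per-word bitmap described here, the following sketch keeps one bit per word-sized field and sets it when the field is a pointer. The `Field` type, `mkBitmap` and `isPointer` are invented for this example, and the bit polarity is an assumption; it is not meant to mirror the runtime system's actual data structures.

```haskell
import Data.Bits (setBit, testBit)
import Data.List (foldl')
import Data.Word (Word64)

-- Illustrative only: each field occupies exactly one word and is either
-- a pointer or a non-pointer.
data Field = Ptr | NonPtr
  deriving (Eq, Show)

-- Bit i is set when field i is a pointer (polarity chosen arbitrarily).
mkBitmap :: [Field] -> Word64
mkBitmap fields = foldl' mark 0 (zip [0 ..] fields)
  where
    mark bm (i, Ptr)    = setBit bm i
    mark bm (_, NonPtr) = bm

isPointer :: Word64 -> Int -> Bool
isPointer = testBit

main :: IO ()
main = do
  let bm = mkBitmap [Ptr, NonPtr, Ptr]
  print bm                              -- 5
  print (map (isPointer bm) [0, 1, 2])  -- [True,False,True]
```

Because the bitmap has one bit per word, it cannot say anything about a field that covers only part of a word, which is the granularity problem raised above.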

Of course, one way to work around this would be to impose an invariant that guarantees that pointers are always word-aligned. Then we would probably want to shuffle sub-word sized fields, allowing two Word16s to inhabit a single word.
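
A small sketch of what "two Word16s inhabiting a single word" could look like at the value level, assuming a 64-bit word and an arbitrary choice of which half goes where. `pack16` and `unpack16` are hypothetical helper names, not GHC primitives.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word16, Word64)

-- Pack two 16-bit values into one word: 'hi' in bits 16..31,
-- 'lo' in bits 0..15. The remaining bits stay zero.
pack16 :: Word16 -> Word16 -> Word64
pack16 hi lo = (fromIntegral hi `shiftL` 16) .|. fromIntegral lo

-- Recover both halves; 'fromIntegral' to Word16 keeps the low 16 bits.
unpack16 :: Word64 -> (Word16, Word16)
unpack16 w = (fromIntegral (w `shiftR` 16), fromIntegral w)

main :: IO ()
main = print (unpack16 (pack16 0xABCD 0x1234))  -- (43981,4660)
```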

As you mention, this would no doubt require a bit of engineering. In particular, while x86 has robust support for sub-word-size operations, I don't believe all the platforms we support do. In these cases we would need to perform, for instance, aligned word-sized loads and stores and mask as appropriate. I may be wrong, however.
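
The "word-sized load, then mask" fallback can be sketched on plain values as follows. This models the bit manipulation only, not real memory access; `loadByte` and `storeByte` are hypothetical names, and numbering bytes from the least significant end is an assumption made for the example.

```haskell
import Data.Bits (complement, shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64, Word8)

-- Read byte number i (0 = least significant) out of an already-loaded word.
loadByte :: Word64 -> Int -> Word8
loadByte w i = fromIntegral ((w `shiftR` (8 * i)) .&. 0xFF)

-- Replace byte number i, leaving the other seven bytes untouched:
-- clear the target byte with a mask, then OR in the new value.
storeByte :: Word64 -> Int -> Word8 -> Word64
storeByte w i b = (w .&. complement mask) .|. (fromIntegral b `shiftL` (8 * i))
  where
    mask = 0xFF `shiftL` (8 * i)

main :: IO ()
main = do
  let w = storeByte 0 2 0xAB
  print w               -- 11206656
  print (loadByte w 2)  -- 171
  print (loadByte w 0)  -- 0
```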

Another consideration is that the byte code interpreter would need to learn to understand these closures.

Regardless, Simon Marlow began some work in this direction a few years ago. There is a mostly complete patch in D38. All it needs is rebasing, fixing of the byte code interpreter, and then perhaps introduction of Word8# and friends. I think it would be great if we could make our heap representation a bit more space-conscious. Perhaps you could open a ticket so we collect these tidbits?

Another somewhat related issue that would be good to think about in parallel is the treatment of the word-size dependence of Word. See #11953.

Cheers,

- Ben



Re: WordX/IntX wrap Word#/Int#?

Joachim Breitner-2
Hi,

On Sunday, 11 June 2017, at 10:44 -0400, Ben Gamari wrote:

> This is certainly one consideration. Another is that you would also
> need to teach the garbage collector to understand closures with sub-
> word-size fields. Currently we can encode whether each field of a
> closure is a pointer or not with a simple bitmap. If we naively
> allowed smaller fields we would need to increase the granularity of
> this representation to encode bytes.
>
> Of course, one way to work around this would be to impose an
> invariant that guarantees that pointers are always word-aligned. Then
> we would probably want to shuffle sub-word sized fields, allowing two
> Word16s to inhabit a single word.
That is not an issue; we already sort fields into pointers first, and
non-pointers later. So all pointers are at the beginning and nicely
aligned, and all the non-pointer data can follow in whatever weird
format. The GC only needs to know how many words in total are used by
the non-pointer data.
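
The pointers-first layout described above can be sketched like this: split the fields, put every pointer in its own word at the front, and report only a word count for everything else. The `Field` type and `layout` function are invented for illustration, a 64-bit word is assumed, and alignment padding between non-pointer fields is ignored for simplicity.

```haskell
import Data.List (partition)

-- A field is either a pointer or a non-pointer of some size in bytes.
data Field = Ptr | NonPtr Int
  deriving (Eq, Show)

wordBytes :: Int
wordBytes = 8  -- assuming a 64-bit platform

isPtr :: Field -> Bool
isPtr Ptr = True
isPtr _   = False

-- Returns (pointer words, non-pointer words). Pointers come first, one
-- word each; non-pointer bytes are packed back to back and rounded up
-- to whole words.
layout :: [Field] -> (Int, Int)
layout fields = (length ptrs, nonPtrWords)
  where
    (ptrs, nonPtrs) = partition isPtr fields
    nonPtrBytes     = sum [n | NonPtr n <- nonPtrs]
    nonPtrWords     = (nonPtrBytes + wordBytes - 1) `div` wordBytes

main :: IO ()
main = print (layout [NonPtr 1, Ptr, NonPtr 2, Ptr, NonPtr 1])  -- (2,1)
```

In this model a collector needs only the two counts: it scans the first group as pointers and skips over the second group without looking inside it.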


Greetings,
Joachim
--
Joachim “nomeata” Breitner
  [hidden email] • https://www.joachim-breitner.de/
  XMPP: [hidden email] • OpenPGP-Key: 0xF0FBF51F
  Debian Developer: [hidden email]

Re: WordX/IntX wrap Word#/Int#?

Ben Gamari-2
Joachim Breitner <[hidden email]> writes:

> Hi,
>
> On Sunday, 11 June 2017, at 10:44 -0400, Ben Gamari wrote:
>> This is certainly one consideration. Another is that you would also
>> need to teach the garbage collector to understand closures with sub-
>> word-size fields. Currently we can encode whether each field of a
>> closure is a pointer or not with a simple bitmap. If we naively
>> allowed smaller fields we would need to increase the granularity of
>> this representation to encode bytes.
>>
>> Of course, one way to work around this would be to impose an
>> invariant that guarantees that pointers are always word-aligned. Then
>> we would probably want to shuffle sub-word sized fields, allowing two
>> Word16s to inhabit a single word.
>
> that is not an issue; we already sort field into pointers first, and
> non-pointers later. So all pointers are at the beginning and nicely
> aligned, and all the non-pointer data can follow in whatever weird
> format. The GC only needs to know how many words in total are used by
> the non-pointer data.
>
Ahh, great point. I stand corrected.

Cheers,

- Ben



Re: WordX/IntX wrap Word#/Int#?

Michal Terepeta
Thanks a lot for the replies & links!

I'll try to finish Simon's diff (and probably ask silly questions if I get stuck ;)

Cheers,
Michal



Re: WordX/IntX wrap Word#/Int#?

Michal Terepeta
Just for the record, I've opened:
to track this.

Cheers,
Michal

On Mon, Jun 12, 2017 at 8:45 PM Michal Terepeta <[hidden email]> wrote:
Thanks a lot for the replies & links!

I'll try to finish Simon's diff (and probably ask silly questions if I get stuck ;)

Cheers,
Michal



Re: WordX/IntX wrap Word#/Int#?

Simon Marlow-7
In reply to this post by Joachim Breitner-2
On 11 June 2017 at 22:44, Joachim Breitner <[hidden email]> wrote:
> Hi,
>
> On Sunday, 11 June 2017, at 10:44 -0400, Ben Gamari wrote:
>> This is certainly one consideration. Another is that you would also
>> need to teach the garbage collector to understand closures with sub-
>> word-size fields. Currently we can encode whether each field of a
>> closure is a pointer or not with a simple bitmap. If we naively
>> allowed smaller fields we would need to increase the granularity of
>> this representation to encode bytes.
>>
>> Of course, one way to work around this would be to impose an
>> invariant that guarantees that pointers are always word-aligned. Then
>> we would probably want to shuffle sub-word sized fields, allowing two
>> Word16s to inhabit a single word.
>
> that is not an issue; we already sort field into pointers first, and
> non-pointers later. So all pointers are at the beginning and nicely
> aligned, and all the non-pointer data can follow in whatever weird
> format. The GC only needs to know how many words in total are used by
> the non-pointer data.

But the compiler has no support for sub-word-sized fields yet.  I made a partial patch to support it a while ago: https://phabricator.haskell.org/D38 

Cheers
Simon 


> Greetings,
> Joachim
> --
> Joachim “nomeata” Breitner
>   [hidden email] • https://www.joachim-breitner.de/
>   XMPP: [hidden email] • OpenPGP-Key: 0xF0FBF51F
>   Debian Developer: [hidden email]
