`binary` serializing datatypes deriving Generic wrong on arm (32bit) with GHC HEAD

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

`binary` serializing datatypes deriving Generic wrong on arm (32bit) with GHC HEAD

Moritz Angermann-2
Hi!

while trying to make sure cross compilation with Template Haskell works properly
with 8.4, I ran into the following situation:

When serializing data types, e.g. `Name OccName NameFlavour` in the transmission
of Template Haskell Splice results from a 32bit arm device to the x86_64 host ghc.

We expect to see:
```
  .- 0 (first constructor)         .- 1  .- 4  (fifth constructor)             .- 4                                 .- 3
  v                                v     v                                     v                                    v
\NUL \NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOHf \EOT \NUL \NUL\NUL\NUL\NUL\NUL\NUL\NUL\EOTmain \NUL\NUL\NUL\NUL\NUL\NUL\NUL\ETXTmp
                                            '--' '----------------------------------' '---------------------------------'
                                            NameSpace       PkgName                               ModName
     '-------------------------------' '--------------------------------------------------------------------------------'
                  OccName                       NameG NameSpace PkgName ModName :: NameFlavour
'-----------------------------------------------------------------------------------------------------------------------'
                                Name OccName NameFlavour :: Name
```

However, the `NameSpace` on the 32bit arm ends up being 8 bytes. Even though the full
`Namespace` data type can be fully serialized in a single byte.

The `binary` package tries to compute the size it needs for a generic data type, using
the following logic (from binary/src/Data/Binary/Generic.hs):

```
class SumSize f where
  sumSize :: Tagged f Word64

newtype Tagged (s :: * -> *) b = Tagged {unTagged :: b}

instance (SumSize a, SumSize b) => SumSize (a :+: b) where

sumSize = Tagged $ unTagged (sumSize :: Tagged a Word64) +
                   unTagged (sumSize :: Tagged b Word64)

instance SumSize (C1 c a) where
  sumSize = Tagged 1
```


Thus for a simple sum type `data X = A | B` we should get a `sumSize` of 2.
The arm32 device however ends up getting 2^33, because `sumSize :: Tagged a Word64`
and `sumSize :: Tagged b Word64` each end up being 2^32.

With some help from the nice folks in #ghc, I was able to conjure up the following
condensed test case:

```
{-# LANGUAGE DeriveGeneric, KindSignatures, PolyKinds, CPP,
    ScopedTypeVariables, TypeOperators, TypeSynonymInstances,
    FlexibleInstances  #-}
{-# OPTIONS_GHC -O2 #-}

import GHC.Generics

import Data.Word
import Debug.Trace

data X = A | B deriving (Show, Generic)

main :: IO ()
main = print (sumSize :: Tagged (Rep X))

-- like traceShowId, but allows us to prepend a message.
t :: Show a => String -> a -> a
#if TRACE
t msg x = traceShow (msg ++ show x) x
#else
t _ = id
#endif

class SumSize f where
    sumSize :: Tagged f

newtype Tagged (s :: * -> *) = Tagged {unTagged :: WORD} deriving Show

instance (SumSize a, SumSize b) => SumSize (a :+: b) where
    sumSize = t "SumSize (a :+: b): " $ Tagged $ unTagged (t "a :+: b => sumSize :: Tagged a: " $ sumSize :: Tagged a) +
                                                 unTagged (t "a :+: b => sumSize :: Tagged b: " $ sumSize :: Tagged b)

instance SumSize (C1 c a) where
    sumSize = t "SumSize (C1 c a): " $ Tagged 1

instance SumSize a => SumSize (M1 D c a) where
    sumSize = t "SumSize (M1 D c a): " $ Tagged . unTagged $ (sumSize :: Tagged a)

```

compiling this with `-DWORD=Word32 -DTRACE=1` yields the correct result (=2), with `-DWORD=Word64 -DTRACE=0` as well.
With `-DWORD=Word64 -DTRACE=2` the wrong result (=2^33)

Optimization flags seem not to play any role when everything is in a single module (as the test case).

As such I have attached the `-ddump-simple -dsuppress-all` files for the Word64 and Word32 with TRACE=1, as well
as the diff between the Word32 and Word64 dump.

The output with WORD=Word32, TRACE=1 is:
```
"SumSize (C1 c a): Tagged {unTagged = 1}"
"a :+: b => sumSize :: Tagged a: Tagged {unTagged = 4294967296}"
"a :+: b => sumSize :: Tagged b: Tagged {unTagged = 4294967296}"
"SumSize (a :+: b): Tagged {unTagged = 8589934592}"
"SumSize (M1 D c a): Tagged {unTagged = 8589934592}"
Tagged {unTagged = 8589934592}
```

with WORD=Word64, TRACE=1 is:
```
"SumSize (C1 c a): Tagged {unTagged = 1}"
"a :+: b => sumSize :: Tagged a: Tagged {unTagged = 1}"
"a :+: b => sumSize :: Tagged b: Tagged {unTagged = 1}"
"SumSize (a :+: b): Tagged {unTagged = 2}"
"SumSize (M1 D c a): Tagged {unTagged = 2}"
Tagged {unTagged = 2}
```

Any help with this would be greatly appreciated!

Cheers,
 Moritz

PS: I'm not absolutely sure, but this might also be related to https://ghc.haskell.org/trac/ghc/ticket/13513


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Main.word32.dump-simpl (18K) Download Attachment
Main.word64.dump-simpl (23K) Download Attachment
Main.dump-simpl.diff (23K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Word64 from int64_t on 32bit arm seems off.

Moritz Angermann-2

Given the following sample program:

```
import Data.Word

f :: a -> a
f x = x
{-# NOINLINE f #-}

x, y, x', y' :: Word64
x = 1
y = 1
x' = 4294967296 + 2 -- upper:1 + lower:2
y' = 4294967296 + 2 -- upper:1 + lower:2

main :: IO ()
main = let z = f x + f y
           z' = f x' + f y'
       in do { print (z == x + y)
             ; print z
             ; print (z' == x' + y')
             ; print z'
             }
```

This produces:

```
True
2
True
8589934596
```
when compiled with `-O0` for me.

and
```
False
8589934592
True
8589934596
```
when compiled with `-O1` for me.

Thus, if we start out with two Word64 that fit into the lower byte, we end up
with the sum of both in the upper byte (with `-O1`).

The difference between -O0 and -O1 is that -O1 goes through the primOps, and as
such we end up with code like:
```
  %26 = tail call i64 @hs_word64ToInt64(i64 2)
  %27 = tail call i64 @hs_word64ToInt64(i64 4294967296)
  %28 = tail call i64 @hs_plusInt64(i64 %27, i64 %26)
  %29 = tail call i64 @hs_int64ToWord64(i64 %28)
```
after which interestingly the result is correct.  However the subsequent
invocation of `@base_GHCziWord_W64zh_con_info`, seems to pick the wrong bytes
for when reconstructing the Word64.

If anyone got any idea, I'd be happy to know. Otherwise I guess I'd have to start
adding debug information into the rts?

Cheers,
 Moritz


> On Nov 28, 2017, at 12:52 PM, Moritz Angermann <[hidden email]> wrote:
>
> Hi!
>
> while trying to make sure cross compilation with Template Haskell works properly
> with 8.4, I ran into the following situation:
>
> When serializing data types, e.g. `Name OccName NameFlavour` in the transmission
> of Template Haskell Splice results from a 32bit arm device to the x86_64 host ghc.
>
> We expect to see:
> ```
>  .- 0 (first constructor)         .- 1  .- 4  (fifth constructor)             .- 4                                 .- 3
>  v                                v     v                                     v                                    v
> \NUL \NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOHf \EOT \NUL \NUL\NUL\NUL\NUL\NUL\NUL\NUL\EOTmain \NUL\NUL\NUL\NUL\NUL\NUL\NUL\ETXTmp
>                                            '--' '----------------------------------' '---------------------------------'
>                                            NameSpace       PkgName                               ModName
>     '-------------------------------' '--------------------------------------------------------------------------------'
>                  OccName                       NameG NameSpace PkgName ModName :: NameFlavour
> '-----------------------------------------------------------------------------------------------------------------------'
>                                Name OccName NameFlavour :: Name
> ```
>
> However, the `NameSpace` on the 32bit arm ends up being 8 bytes. Even though the full
> `Namespace` data type can be fully serialized in a single byte.
>
> The `binary` package tries to compute the size it needs for a generic data type, using
> the following logic (from binary/src/Data/Binary/Generic.hs):
>
> ```
> class SumSize f where
>  sumSize :: Tagged f Word64
>
> newtype Tagged (s :: * -> *) b = Tagged {unTagged :: b}
>
> instance (SumSize a, SumSize b) => SumSize (a :+: b) where
>
> sumSize = Tagged $ unTagged (sumSize :: Tagged a Word64) +
>                   unTagged (sumSize :: Tagged b Word64)
>
> instance SumSize (C1 c a) where
>  sumSize = Tagged 1
> ```
>
>
> Thus for a simple sum type `data X = A | B` we should get a `sumSize` of 2.
> The arm32 device however ends up getting 2^33, because `sumSize :: Tagged a Word64`
> and `sumSize :: Tagged b Word64` each end up being 2^32.
>
> With some help from the nice folks in #ghc, I was able to conjure up the following
> condensed test case:
>
> ```
> {-# LANGUAGE DeriveGeneric, KindSignatures, PolyKinds, CPP,
>    ScopedTypeVariables, TypeOperators, TypeSynonymInstances,
>    FlexibleInstances  #-}
> {-# OPTIONS_GHC -O2 #-}
>
> import GHC.Generics
>
> import Data.Word
> import Debug.Trace
>
> data X = A | B deriving (Show, Generic)
>
> main :: IO ()
> main = print (sumSize :: Tagged (Rep X))
>
> -- like traceShowId, but allows us to prepend a message.
> t :: Show a => String -> a -> a
> #if TRACE
> t msg x = traceShow (msg ++ show x) x
> #else
> t _ = id
> #endif
>
> class SumSize f where
>    sumSize :: Tagged f
>
> newtype Tagged (s :: * -> *) = Tagged {unTagged :: WORD} deriving Show
>
> instance (SumSize a, SumSize b) => SumSize (a :+: b) where
>    sumSize = t "SumSize (a :+: b): " $ Tagged $ unTagged (t "a :+: b => sumSize :: Tagged a: " $ sumSize :: Tagged a) +
>                                                 unTagged (t "a :+: b => sumSize :: Tagged b: " $ sumSize :: Tagged b)
>
> instance SumSize (C1 c a) where
>    sumSize = t "SumSize (C1 c a): " $ Tagged 1
>
> instance SumSize a => SumSize (M1 D c a) where
>    sumSize = t "SumSize (M1 D c a): " $ Tagged . unTagged $ (sumSize :: Tagged a)
>
> ```
>
> compiling this with `-DWORD=Word32 -DTRACE=1` yields the correct result (=2), with `-DWORD=Word64 -DTRACE=0` as well.
> With `-DWORD=Word64 -DTRACE=2` the wrong result (=2^33)
>
> Optimization flags seem not to play any role when everything is in a single module (as the test case).
>
> As such I have attached the `-ddump-simple -dsuppress-all` files for the Word64 and Word32 with TRACE=1, as well
> as the diff between the Word32 and Word64 dump.
>
> The output with WORD=Word32, TRACE=1 is:
> ```
> "SumSize (C1 c a): Tagged {unTagged = 1}"
> "a :+: b => sumSize :: Tagged a: Tagged {unTagged = 4294967296}"
> "a :+: b => sumSize :: Tagged b: Tagged {unTagged = 4294967296}"
> "SumSize (a :+: b): Tagged {unTagged = 8589934592}"
> "SumSize (M1 D c a): Tagged {unTagged = 8589934592}"
> Tagged {unTagged = 8589934592}
> ```
>
> with WORD=Word64, TRACE=1 is:
> ```
> "SumSize (C1 c a): Tagged {unTagged = 1}"
> "a :+: b => sumSize :: Tagged a: Tagged {unTagged = 1}"
> "a :+: b => sumSize :: Tagged b: Tagged {unTagged = 1}"
> "SumSize (a :+: b): Tagged {unTagged = 2}"
> "SumSize (M1 D c a): Tagged {unTagged = 2}"
> Tagged {unTagged = 2}
> ```
>
> Any help with this would be greatly appreciated!
>
> Cheers,
> Moritz
>
> PS: I'm not absolutely sure, but this might also be related to https://ghc.haskell.org/trac/ghc/ticket/13513
>
> <Main.word32.dump-simpl><Main.word64.dump-simpl><Main.dump-simpl.diff>_______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs