How to speedup generically parsing sum types?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

How to speedup generically parsing sum types?

Bas van Dijk-2
Hello,

I recently added default generic implementations of toJSON and
parseJSON to the aeson package. Now I'm optimizing them. Here are some
benchmark results that compare:

* th: toJSON and fromJSON generated by template-haskell. Can be
compared to hand-written code. Should be the fastest of all.

* syb: toJSON and fromJSON from the Data.Aeson.Generic module. Uses
the Data type class.

* generic: my toJSON and fromJSON using GHC Generics.

The benchmark itself can be found here:
https://github.com/basvandijk/aeson/blob/optimizations/benchmarks/AesonCompareAutoInstances.hs

toJSON
======

D/toJSON/th                 3.631734 us
D/toJSON/syb                32.66679 us
D/toJSON/generic            3.371868 us

BigRecord/toJSON/th         8.982990 us
BigRecord/toJSON/syb        48.90737 us
BigRecord/toJSON/generic    8.971597 us

BigProduct/toJSON/th        1.578259 us
BigProduct/toJSON/syb       29.21153 us
BigProduct/toJSON/generic   1.623115 us

BigSum/toJSON/th            51.81214 ns
BigSum/toJSON/syb           1.256708 us
BigSum/toJSON/generic       71.32851 ns


fromJSON
========

D/fromJSON/th               7.017204 us
D/fromJSON/syb              23.46567 us
D/fromJSON/generic          7.968974 us

BigRecord/fromJSON/th       8.513789 us
BigRecord/fromJSON/syb      36.64501 us
BigRecord/fromJSON/generic  10.07809 us

BigProduct/fromJSON/th      2.430677 us
BigProduct/fromJSON/syb     17.97764 us
BigProduct/fromJSON/generic 2.201130 us

BigSum/fromJSON/th          414.8699 ns
BigSum/fromJSON/syb         4.113170 us
BigSum/fromJSON/generic     13.62614 us !!!


As can be seen, in most cases the GHC Generics implementation is much
faster than SYB and just as fast as TH. I'm impressed by how well GHC
optimizes the code!

Unfortunately the last benchmark, generically parsing a big sum type,
is much slower. The code for parsing sums, which can be found here:

https://github.com/basvandijk/aeson/blob/optimizations/Data/Aeson/Types/Internal.hs#L1059

is basically this:


instance (GFromSum a, GFromSum b) => GFromJSON (a :+: b) where
    gParseJSON (Object (M.toList -> [keyVal])) = gParseSum keyVal
    gParseJSON v = typeMismatch "sum (:+:)" v
    {-# INLINE gParseJSON #-}


class GFromSum f where
    gParseSum :: Pair -> Parser (f a)

instance (GFromSum a, GFromSum b) => GFromSum (a :+: b) where
    gParseSum keyVal = (L1 <$> gParseSum keyVal) <|>
                       (R1 <$> gParseSum keyVal)
    {-# INLINE gParseSum #-}

instance (Constructor c, GFromJSON a, ConsFromJSON a) =>
    GFromSum (C1 c a) where
    gParseSum (key, value)
        | key == pack (conName (undefined :: t c a p)) =
            gParseJSON value
        | otherwise = notFound $ unpack key
    {-# INLINE gParseSum #-}


notFound :: String -> Parser a
notFound key = fail $ "The key \"" ++ key ++ "\" was not found"
{-# INLINE notFound #-}


Any idea how to make it faster?

Regards,

Bas

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to speedup generically parsing sum types?

José Pedro Magalhães
Hi Bas,

First of all, thanks for these numbers. I have previously compared the performance of GP libs [1] and your results confirm what I would expect, except for that last one, BigSum/fromJSON/generic.

It's good that you're using INLINE pragmas on the generic function already. What I would also try:
- Compile with -O2 and -fno-spec-constr-count (this last one is particularly important)
- Add {-# INLINE [1] #-} pragmas to the to/from methods of your Generic instances.

To add these INLINE pragmas you will have to give your own instances of Generic (so you can't derive them). I'd suggest you get hold of them with -ddump-deriv, copy-paste and add the pragmas, just for testing purposes. The phase is important: you first want to make sure you inline the generic function definition, and only then the from/to.

Please keep me posted on the effects of these suggestions. In particular, if the INLINE pragmas on the from/to methods are essential, I'll be happy to add them to the derived instances.


Cheers,
Pedro

[1] http://dreixel.net/research/pdf/ogie.pdf

2011/11/3 Bas van Dijk <[hidden email]>
Hello,

I recently added default generic implementations of toJSON and
parseJSON to the aeson package. Now I'm optimizing them. Here are some
benchmark results that compare:

* th: toJSON and fromJSON generated by template-haskell. Can be
compared to hand-written code. Should be the fastest of all.

* syb: toJSON and fromJSON from the Data.Aeson.Generic module. Uses
the Data type class.

* generic: my toJSON and fromJSON using GHC Generics.

The benchmark itself can be found here:
https://github.com/basvandijk/aeson/blob/optimizations/benchmarks/AesonCompareAutoInstances.hs

toJSON
======

D/toJSON/th                 3.631734 us
D/toJSON/syb                32.66679 us
D/toJSON/generic            3.371868 us

BigRecord/toJSON/th         8.982990 us
BigRecord/toJSON/syb        48.90737 us
BigRecord/toJSON/generic    8.971597 us

BigProduct/toJSON/th        1.578259 us
BigProduct/toJSON/syb       29.21153 us
BigProduct/toJSON/generic   1.623115 us

BigSum/toJSON/th            51.81214 ns
BigSum/toJSON/syb           1.256708 us
BigSum/toJSON/generic       71.32851 ns


fromJSON
========

D/fromJSON/th               7.017204 us
D/fromJSON/syb              23.46567 us
D/fromJSON/generic          7.968974 us

BigRecord/fromJSON/th       8.513789 us
BigRecord/fromJSON/syb      36.64501 us
BigRecord/fromJSON/generic  10.07809 us

BigProduct/fromJSON/th      2.430677 us
BigProduct/fromJSON/syb     17.97764 us
BigProduct/fromJSON/generic 2.201130 us

BigSum/fromJSON/th          414.8699 ns
BigSum/fromJSON/syb         4.113170 us
BigSum/fromJSON/generic     13.62614 us !!!


As can be seen, in most cases the GHC Generics implementation is much
faster than SYB and just as fast as TH. I'm impressed by how well GHC
optimizes the code!

Unfortunately the last benchmark, generically parsing a big sum type,
is much slower. The code for parsing sums, which can be found here:

https://github.com/basvandijk/aeson/blob/optimizations/Data/Aeson/Types/Internal.hs#L1059

is basically this:


instance (GFromSum a, GFromSum b) => GFromJSON (a :+: b) where
   gParseJSON (Object (M.toList -> [keyVal])) = gParseSum keyVal
   gParseJSON v = typeMismatch "sum (:+:)" v
   {-# INLINE gParseJSON #-}


class GFromSum f where
   gParseSum :: Pair -> Parser (f a)

instance (GFromSum a, GFromSum b) => GFromSum (a :+: b) where
   gParseSum keyVal = (L1 <$> gParseSum keyVal) <|>
                      (R1 <$> gParseSum keyVal)
   {-# INLINE gParseSum #-}

instance (Constructor c, GFromJSON a, ConsFromJSON a) =>
   GFromSum (C1 c a) where
   gParseSum (key, value)
       | key == pack (conName (undefined :: t c a p)) =
           gParseJSON value
       | otherwise = notFound $ unpack key
   {-# INLINE gParseSum #-}


notFound :: String -> Parser a
notFound key = fail $ "The key \"" ++ key ++ "\" was not found"
{-# INLINE notFound #-}


Any idea how to make it faster?

Regards,

Bas


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to speedup generically parsing sum types?

Bas van Dijk-2
2011/11/3 José Pedro Magalhães <[hidden email]>:
> - Compile with -O2 and -fno-spec-constr-count (this last one is particularly
> important)

I already compiled with -O2. Adding -fno-spec-constr-count does not
change the results.

> - Add {-# INLINE [1] #-} pragmas to the to/from methods of your Generic
> instances.

I tried:

BigSum/toJSON/generic goes from 70 ns to 52 ns! So inlining 'from' is
an improvement.

Unfortunately BigSum/fromJSON/generic stays at 13 us.

Bas

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to speedup generically parsing sum types?

Bas van Dijk-2
For those who find this interesting. Here's the code of the BigSum benchmark
with a manual Generic instance with inlined 'from' and 'to':
https://gist.github.com/1336426

José, I was thinking about the following idea. Say GHC generates the
following instance for BigSum:

instance Generic BigSum where
  type Rep BigSum = D1 D1BigSum SumOfBigSum
  ...

type SumOfBigSum =
       (   (   (    C1 C1_0BigSum U1
               :+: (C1 C1_1BigSum U1 :+: C1 C1_2BigSum U1)
               )
           :+: (    C1 C1_3BigSum U1
               :+: (C1 C1_4BigSum U1 :+: C1 C1_5BigSum U1)
               )
           )
       :+: (   (    C1 C1_6BigSum U1
               :+: (C1 C1_7BigSum U1 :+: C1 C1_8BigSum U1)
               )
           :+: (    C1 C1_9BigSum  U1
               :+: (C1 C1_10BigSum U1 :+: C1 C1_11BigSum U1)
               )
           )
       )
  :+: (   (   (    C1 C1_12BigSum U1
              :+: (C1 C1_13BigSum U1 :+: C1 C1_14BigSum U1)
              )
          :+: (    C1 C1_15BigSum U1
              :+: (C1 C1_16BigSum U1 :+: C1 C1_17BigSum U1)
              )
          )
      :+: (   (    C1 C1_18BigSum U1
              :+: (C1 C1_19BigSum U1 :+: C1 C1_20BigSum U1)
              )
          :+: (   (C1 C1_21BigSum U1 :+: C1 C1_22BigSum U1)
              :+: (C1 C1_23BigSum U1 :+: C1 C1_24BigSum U1)
              )
          )
      )

It also generates the following function (or method): (I haven't
figured out the correct type yet. A correct version might need to use
type families or functional dependencies)

conPath :: String -> Maybe (C1 ? ? ? -> SumOfBigSum)
conPath "F01" = Just $ L1 . L1 . L1 . L1
conPath "F02" = Just $ L1 . L1 . L1 . R1 . L1
conPath "F03" = Just $ L1 . L1 . L1 . R1 . R1
conPath "F04" = Just $ L1 . L1 . R1 . L1
conPath "F05" = Just $ L1 . L1 . R1 . R1 . L1
conPath "F06" = Just $ L1 . L1 . R1 . R1 . R1
conPath "F07" = Just $ L1 . R1 . L1 . L1
conPath "F08" = Just $ L1 . R1 . L1 . R1 . L1
conPath "F09" = Just $ L1 . R1 . L1 . R1 . R1
conPath "F10" = Just $ L1 . R1 . R1 . L1
conPath "F11" = Just $ L1 . R1 . R1 . R1 . L1
conPath "F12" = Just $ L1 . R1 . R1 . R1 . R1
conPath "F13" = Just $ R1 . L1 . L1 . L1
conPath "F14" = Just $ R1 . L1 . L1 . R1 . L1
conPath "F15" = Just $ R1 . L1 . L1 . R1 . R1
conPath "F16" = Just $ R1 . L1 . R1 . L1
conPath "F17" = Just $ R1 . L1 . R1 . R1 . L1
conPath "F18" = Just $ R1 . L1 . R1 . R1 . R1
conPath "F19" = Just $ R1 . R1 . L1 . L1
conPath "F20" = Just $ R1 . R1 . L1 . R1 . L1
conPath "F21" = Just $ R1 . R1 . L1 . R1 . R1
conPath "F22" = Just $ R1 . R1 . R1 . L1 . L1
conPath "F23" = Just $ R1 . R1 . R1 . L1 . R1
conPath "F24" = Just $ R1 . R1 . R1 . R1 . L1
conPath "F25" = Just $ R1 . R1 . R1 . R1 . R1
conPath _     = Nothing

conPath is given the name of the constructor. If it's a valid name it
will return a function that constructs a SumOfBigSum given the
corresponding constructor. Of course, since the types of the
constructors can vary (not in this case) coming up with a correct
implementation is a challenge.

Using conPath in my gParseJSON is easy:

instance (GFromJSON a, GFromJSON b) => GFromJSON (a :+: b) where
    gParseJSON (Object (M.toList -> [(key, val)])) =
        case conPath key of
          Nothing   -> mzero
          Just path -> path <$> gParseJSON val

    gParseJSON v = typeMismatch "sum (:+:)" v
    {-# INLINE gParseJSON #-}

I suspect this to be much more efficient.

Bas

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to speedup generically parsing sum types?

Claus Reinke
In reply to this post by Bas van Dijk-2
> * syb: toJSON and fromJSON from the Data.Aeson.Generic module. Uses
> the Data type class.
>..
> As can be seen, in most cases the GHC Generics implementation is much
> faster than SYB and just as fast as TH. I'm impressed by how well GHC
> optimizes the code!

Not that it matters much if you're going with other tools, but your
SYB code has a long linear chain of type rep comparisons, at every
level of the recursive traversals. That is partially inherent in the SYB
design (reducing everything to cast), but could probably be improved?

Claus

 

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to speedup generically parsing sum types?

Bas van Dijk-2
2011/11/3 Claus Reinke <[hidden email]>:
> Not that it matters much if you're going with other tools, but your SYB code
> has a long linear chain of type rep comparisons, at every
> level of the recursive traversals. That is partially inherent in the SYB
> design (reducing everything to cast), but could probably be improved?

I'm not an expert on the SYB code. So if you have an idea how to
improve it please file a ticket or even better send a pull request.

Cheers,

Bas

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to speedup generically parsing sum types?

Twan van Laarhoven
In reply to this post by Bas van Dijk-2
On 03/11/11 11:16, Bas van Dijk wrote:

> ...
> instance (Constructor c, GFromJSON a, ConsFromJSON a) => GFromSum (C1 c a) where
>      gParseSum (key, value)
>          | key == pack (conName (undefined :: t c a p)) =
>              gParseJSON value
>          | otherwise = notFound $ unpack key
>      {-# INLINE gParseSum #-}
>
>
> notFound :: String ->  Parser a
> notFound key = fail $ "The key \"" ++ key ++ "\" was not found"
> {-# INLINE notFound #-}

Perhaps relying on Attoparsec backtracking for picking out the right
alternative from the sum is the problem. You could try it with Maybe:


class GFromSum f where
     gParseSum :: Pair -> Maybe (Parser (f a))

instance (Constructor c, GFromJSON a, ConsFromJSON a)
         => GFromSum (C1 c a) where
     gParseSum (key, value)
         | key == pack (conName (undefined :: t c a p))
                     = Just (gParseJSON value)
         | otherwise = Nothing

instance (GFromSum a, GFromSum b) => GFromSum (a :+: b) where
     gParseSum keyVal = (fmap L1 <$> gParseSum keyVal)
                    <|> (fmap R1 <$> gParseSum keyVal)
     {-# INLINE gParseSum #-}

instance (GFromSum a, GFromSum b) => GFromJSON (a :+: b) where
     gParseJSON (Object (M.toList -> [keyVal]))
         | Just p <- gParseSum keyVal -> p
     gParseJSON v = typeMismatch "sum (:+:)" v



Twan

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: How to speedup generically parsing sum types?

Bas van Dijk-2
On 3 November 2011 17:38, Twan van Laarhoven <[hidden email]> wrote:
> Perhaps relying on Attoparsec backtracking for picking out the right
> alternative from the sum is the problem. You could try it with Maybe:

Good idea. I implemented and committed it and the
BigSum/fromJSON/generic benchmark goes from 13.6 us to 11.3 us. Still
a lot slower than the 4.1 us of SYB or the 414.9 ns of TH but an
improvement nonetheless.

Thanks for the idea!

Bas

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe