Fast JSON validation - reducing allocations

Fast JSON validation - reducing allocations

David Turner-2
Dear Café,

We have a stream of ~900-byte JSON documents (lazy ByteStrings) that we would like to validate (NB not parse) as quickly as possible. We just need to check that the bytes are a syntactically valid JSON document and then pass them elsewhere; we do not care what the content is. The documents are encoded in UTF-8 and have no extraneous whitespace.

We have made a number of attempts:

1. Use Aeson to parse the documents and then discard the result. This was actually pretty good: Criterion clocked it at ~30us per document.

2. A hand-written Attoparsec parser. This did quite a bit worse: ~70us per document.

3. Use Get (from the binary package), which was a bit messier as it was factored to avoid all backtracking, but managed to get down to around ~20us per document.

4. Build a table-based, Word8-labelled DFA (with a minor hack to deal with values-within-values) and run it with ByteString.Lazy.foldl', as sketched below. Also around ~20us per document.
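
For concreteness, the driver for option 4 has roughly the following shape. This is a minimal sketch rather than our actual code: the toy table here accepts only non-empty digit strings and the names are illustrative, but the run-a-table-with-foldl' structure is the same:

    {-# LANGUAGE BangPatterns #-}

    import qualified Data.Array.Unboxed as AU
    import qualified Data.ByteString.Lazy as BL
    import Data.Word (Word8)

    -- Toy transition table: state 0 = start, 1 = accepting, 2 = reject sink.
    transitions :: AU.UArray (Word8, Word8) Word8
    transitions = AU.array ((0, 0), (2, 255))
      [ ((s, b), next s b) | s <- [0 .. 2], b <- [0 .. 255] ]
      where
        next 2 _ = 2                              -- stay in the sink state
        next _ b | 0x30 <= b && b <= 0x39 = 1     -- bytes '0'..'9'
                 | otherwise              = 2

    -- Drive the table over every byte of the document.
    isValid :: BL.ByteString -> Bool
    isValid = (== 1) . BL.foldl' step 0
      where step !s b = transitions AU.! (s, b)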

We didn't pursue the Attoparsec idea any further, but the other three all allocate ~85kB per document, and this now seems to be the limiting factor for performance. In more detail, looking at the threadscope output, we see it parse 5 or 6 documents in ~150us (allocating ~500kB) and then run the GC, which takes about another 150us. The GC stats indicate that it's only copying a couple of kB of live data each time, so most of that 500kB is junk.

Now 85kB per document is ~96B per input byte, which is only a pointer-and-a-half per input bit (96B per byte = 12B per bit = one and a half 8-byte words). It's not entirely surprising that there's this much allocation going on each time round the loop, but we'd like to try and get rid of it. I've tried the usual strictness tricks and have peered in a mystified fashion at the output of -ddump-simpl but am none the wiser.

There's a copy of the relevant code for option 4 at https://github.com/DaveCTurner/json-validator. I've hacked around with it a bit since producing the numbers above, so it's now apparently a bit slower than Aeson but allocates less too (~65kB).

Could anyone help, e.g. by pointing me at the bit in the Core that is allocating within the main loop?

Many thanks,

David




Re: Fast JSON validation - reducing allocations

Bardur Arantsson-2
On 2017-05-11 18:12, David Turner wrote:
> Dear Café,
>
> We have a stream of ~900byte JSON documents (lazy ByteStrings) that we
> would like to validate (NB not parse) as quickly as possible. We just
> need to check that the bytes are a syntactically valid JSON document and
> then pass them elsewhere; we do not care what the content is. The
> documents are encoded in UTF-8 and have no extraneous whitespace.
>

No particular recommendations, but you might want to look into
semi-indexing[1] as a strategy. It looks plausible that it would be
possible to do that without a lot of allocation; see the paper for
details. (I believe there's also a demo implementation in Python on GitHub.)

[1] http://www.di.unipi.it/~ottavian/files/semi_index_cikm.pdf


Re: Fast JSON validation - reducing allocations

David Turner-2
Interesting, thanks for the link. However, we're trying to do _even less_ than that - this is just the "scan the document to produce the Elias-Fano encoding" step, except without needing to keep a hold of it for later. It's not quite as trivial as that paper makes out, as (a) it doesn't mention the possibility that the documents might not be well-formed, and (b) it doesn't really talk about dealing with the special delimiting characters within string literals, for which you need to drag along a bunch of state while you're scanning.
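
For instance, the scanner has to know whether it's inside a string literal and whether the previous byte was an escaping backslash. A minimal sketch of that bookkeeping (my own illustration, not the repository code):

    import Data.Word (Word8)

    -- Are we inside a string literal, and did we just see a backslash?
    data StrState = Outside | Inside | Escaped

    stringStep :: StrState -> Word8 -> StrState
    stringStep Outside 0x22 = Inside    -- '"' opens a string literal
    stringStep Outside _    = Outside
    stringStep Inside  0x5C = Escaped   -- '\\' escapes the next byte
    stringStep Inside  0x22 = Outside   -- an unescaped '"' closes it
    stringStep Inside  _    = Inside
    stringStep Escaped _    = Inside    -- consume the escaped byte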

This'd be pretty trivial to do in a C-like language now we've got the DFA tables built, and I may resort to that at some point if we can't work out how to avoid the allocations in Haskell-land, but I'd quite like to be able to solve problems of this form without unnecessarily resorting to C.

Cheers,

On 11 May 2017 at 17:30, Bardur Arantsson <[hidden email]> wrote:
> On 2017-05-11 18:12, David Turner wrote:
> > Dear Café,
> >
> > We have a stream of ~900byte JSON documents (lazy ByteStrings) that we
> > would like to validate (NB not parse) as quickly as possible. We just
> > need to check that the bytes are a syntactically valid JSON document and
> > then pass them elsewhere; we do not care what the content is. The
> > documents are encoded in UTF-8 and have no extraneous whitespace.
> >
>
> No particular recommendations, but you might want to look into
> semi-indexing[1] as a strategy. It looks plausible that it would be
> possible to do that without a lot of allocation; see the paper for
> details. (I believe there's also a demo implementation in Python on GitHub.)
>
> [1] http://www.di.unipi.it/~ottavian/files/semi_index_cikm.pdf


Re: Fast JSON validation - reducing allocations

Bardur Arantsson-2
On 2017-05-11 18:59, David Turner wrote:
> Interesting, thanks for the link. However we're trying to do _even less_
> than that - this is just the "scan the document to produce the
> Elias-Fano encoding" step, except without needing to keep a hold of it
> for later. It's not quite as trivial as that paper makes out as (a) it
> doesn't mention the possibility that the documents might not be
> well-formed, and (b) it doesn't really talk about dealing with the
> special delimiting characters within string literals, for which you need
> to drag along a bunch of state while you're scanning.

My thinking was actually that it could be used to reduce the allocations
since everything (AFAIR, it's been a while) would basically just be
linear data structures and indexes.

> This'd be pretty trivial to do in a C-like language now we've got the
> DFA tables built, and I may resort to that at some point if we can't
> work out how to avoid the allocations in Haskell-land, but I'd quite
> like to be able to solve problems of this form without unnecessarily
> resorting to C.

Well, there's always

     https://hackage.haskell.org/package/inline-c

:)

Regards,


Re: Fast JSON validation - reducing allocations

Eric Seidel-3

On Thu, May 11, 2017, at 09:12, David Turner wrote:
> Could anyone help, e.g. by pointing me at the bit in the Core that is
> allocating within the main loop?

GHC has a -ticky flag that tracks precisely where allocations are
happening. It's quite low level (you may have to stare at the STG in
addition to the Core to interpret the results) but I've found it
incredibly useful when trying to understand performance swings in GHC's
own benchmarks.

https://ghc.haskell.org/trac/ghc/wiki/Debugging/TickyTicky

You can enable it per module, which is a nice break from the
rebuild-the-world -prof, but the flip side IIRC is that if the
allocations are happening in a module that wasn't compiled with -ticky
(e.g. somewhere inside Data.Array) you might not see them.
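
(Enabling it per module just means passing the flag when compiling that
one module, e.g. via an OPTIONS_GHC pragma at the top of the file. I
believe something like the following works, though treat it as a sketch:)

    {-# OPTIONS_GHC -ticky -ticky-allocd #-}
    module Automaton where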

Hope this helps!
Eric

Re: Fast JSON validation - reducing allocations

Ben Gamari-2
Eric Seidel <[hidden email]> writes:

> On Thu, May 11, 2017, at 09:12, David Turner wrote:
>> Could anyone help, e.g. by pointing me at the bit in the Core that is
>> allocating within the main loop?
>
> GHC has a -ticky flag that tracks precisely where allocations are
> happening. It's quite low level (you may have to stare at the STG in
> addition to the Core to interpret the results) but I've found it
> incredibly useful when trying to understand performance swings in GHC's
> own benchmarks.
>
> https://ghc.haskell.org/trac/ghc/wiki/Debugging/TickyTicky
>
> You can enable it per module, which is a nice break from the
> rebuild-the-world -prof, but the flip side IIRC is that if the
> allocations are happening in a module that wasn't compiled with -ticky
> (e.g. somewhere inside Data.Array) you might not see them.
>
Indeed, -ticky is wonderful for this sort of thing since it doesn't
affect Core-to-Core optimization, meaning that the instrumented program
will behave very similarly to the uninstrumented version (just a bit
slower due to counter overhead).

I had a quick look at your example over lunch. I started like this
(after commenting out the aeson benchmark in Main),

    $ ghc app/Main.hs -isrc  -O2 -ddump-stg -fforce-recomp \
          -ddump-to-file -dsuppress-idinfo -ddump-simpl -rtsopts \
          -ticky -ticky-allocd
    $ app/Main +RTS -rticky.out

This produces a Ticky report (named ticky.out) in the current directory.
This includes a table which in my case looked something like,

        Entries       Alloc      Alloc'd  Non-void Arguments   STG Name
    --------------------------------------------------------------------------------
            123         192           48   1 L                 go9{v slyX} (main@main:Automaton)
              2           0           32   1 L                 go8{v slyH} (main@main:Automaton)
        1465016    58600640    146501600   0                   $w$j1{v slMc} (main@main:Automaton) in slHB
      161884268           0  11761148448   0                   $w$j1{v slJT} (main@main:Automaton) in slHA
              0           0     11720128   4 ppww              $s$wgo2{v slHz} (main@main:Automaton) in r4s
      163532411 11790448768     13185144   5 wwLpp             $wgo7{v slHA} (main@main:Automaton) in r4s
        1831270   167011824     13185144   6 ppwLww            $s$wgo1{v slHB} (main@main:Automaton) in r4s
         183127           0     13185144   0                   $w$j{v slGN} (main@main:Automaton) in r4s
            992        8624        10944   1 L                 go7{v slwV} (main@main:Automaton) in r4m
              3          72            0   1 C                 main@main:Automaton.showGraph120{v r3O}
            681           0            0   2 ML                go6{v rkSd} (main@main:Automaton)

The important things we care about here are Alloc (that is, how many
bytes each name allocates) and Alloc'd (that is, how many bytes' worth
of each closure get allocated). [Aside: we explicitly enabled the
Alloc'd column during compilation with the -ticky-allocd flag; if you
omit it, this column will contain only zeros. In general I find it
helpful to have, so I generally enable it despite the overhead it
introduces.]

The table suggests a few nice starting points: $w$j1_slJT is allocated
quite a bit, probably by $wgo7_slHA. Next we can turn to the STG output
(src/Automaton.dump-stg) to try to understand why this is. Reading STG
generally tends to be quite hard due to the rather large, deeply nested
trees that most programs compile to. That being said, a good editor can
help immensely (I also have my ghc-dump [1] package which I've been
meaning to add STG support to, which would enable some interesting
analysis techniques).

Unfortunately, my lunch is now all gone so I'll have to drop this here
for now. However, I'll try to pick it up again tonight. Do keep us
apprised of your progress!

Cheers,

- Ben


[1] https://github.com/bgamari/ghc-dump


Re: Fast JSON validation - reducing allocations

Ben Gamari-2
Ccing Luke Maurer under the assumption that he will appreciate seeing
the fruits of his labor.


David Turner <[hidden email]> writes:

> Dear Café,
>
(snip)
>
> There's a copy of the relevant code for option 4 at
> https://github.com/DaveCTurner/json-validator. I've hacked around with it a
> bit since producing the numbers above, so it's now apparently a bit slower
> than Aeson but allocates less too (~65kB).
>
> Could anyone help, e.g. by pointing me at the bit in the Core that is
> allocating within the main loop?
>
While turning this over in my head I realized that this is the sort of
program which may be helped significantly by GHC 8.2's improved join
point handling. Indeed the timings seem to support this hypothesis:

GHC 8.0.2

    benchmarking json-validator/Automaton/testEvent
    time                 22.84 μs   (22.76 μs .. 22.94 μs)
                         1.000 R²   (1.000 R² .. 1.000 R²)
    mean                 22.84 μs   (22.76 μs .. 22.94 μs)
    std dev              297.4 ns   (221.8 ns .. 378.8 ns)

    Alloc rate     4,015,256,078,038 bytes per MUT second

GHC 8.2

    benchmarking json-validator/Automaton/testEvent
    time                 9.221 μs   (9.141 μs .. 9.318 μs)
                         0.998 R²   (0.996 R² .. 1.000 R²)
    mean                 9.163 μs   (9.084 μs .. 9.356 μs)
    std dev              399.8 ns   (193.0 ns .. 745.4 ns)
    variance introduced by outliers: 54% (severely inflated)

    Alloc rate           123,141,635 bytes per MUT second


Wow! I suspect your allocations have now vanished.
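
(Roughly speaking, a join point is a local function that is only ever
tail-called, like `done` in this illustrative fragment of mine, not
code from the validator; GHC 8.2 can compile such calls as plain jumps
instead of heap-allocating a closure:)

    -- `done` is used only in tail position, so GHC 8.2 can turn it into
    -- a join point: a jump with arguments rather than an allocated closure.
    classify :: Int -> Int
    classify x =
      let done r = 2 * r + 1
      in if x > 0 then done x else done (negate x)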

I didn't verify that the improvement really was due to more join points,
but it seems quite likely. Well done Luke!

Cheers,

- Ben



Re: Fast JSON validation - reducing allocations

David Turner-2
Ben, Eric, thanks for your help. A whole new world of low-level statistics opens up before me...

On 11 May 2017 at 18:57, Ben Gamari <[hidden email]> wrote:
> Eric Seidel <[hidden email]> writes:
> > On Thu, May 11, 2017, at 09:12, David Turner wrote:
> >> Could anyone help, e.g. by pointing me at the bit in the Core that is
> >> allocating within the main loop?
> >
> > GHC has a -ticky flag that tracks precisely where allocations are
> > happening. It's quite low level (you may have to stare at the STG in
> > addition to the Core to interpret the results) but I've found it
> > incredibly useful when trying to understand performance swings in GHC's
> > own benchmarks.
> >
> >
> > You can enable it per module, which is a nice break from the
> > rebuild-the-world -prof, but the flip side IIRC is that if the
> > allocations are happening in a module that wasn't compiled with -ticky
> > (e.g. somewhere inside Data.Array) you might not see them.
> Indeed, -ticky is wonderful for this sort of thing since it doesn't
> affect Core-to-Core optimization, meaning that the instrumented program
> will behave very similarly to the uninstrumented version (just a bit
> slower due to counter overhead).
> I had a quick look at your example over lunch. I started like this
> (after commenting out the aeson benchmark in Main), 
>     $ ghc app/Main.hs -isrc  -O2 -ddump-stg -fforce-recomp \
>           -ddump-to-file -dsuppress-idinfo -ddump-simpl -rtsopts \
>           -ticky -ticky-allocd
>     $ app/Main +RTS -rticky.out


Ok, I was just about to ask about the Alloc'd column as mine was all zeroes, but the -ticky-allocd was what I needed, thanks.

> The table suggests a few nice starting points: $w$j1_slJT is allocated
> quite a bit, probably by $wgo7_slHA. Next we can turn to the STG output
> (src/Automaton.dump-stg) to try to understand why this is. Reading STG
> generally tends to be quite hard due to the rather large, deeply nested
> trees that most programs compile to.

You're not wrong.

I've trawled through the STG as best as I can for a newbie. The function with a large number in the 'Alloc' column looks pretty much to be the heart of the `step` function. There's a lot of nesting but it looks largely to be just checking the bounds on array indices, which I don't think allocates anything.

However, I see this little snippet at the very bottom of the nesting:

case
    indexArray# [ds1_sl1b
                 y2_sl5H]
of
_ [Occ=Dead]
{ (##) ipv8_sl60 [Occ=Once!] ->
      case
          ipv8_sl60
      of
      _ [Occ=Dead]
      { GHC.Word.W8# dt6_sl62 [Occ=Once] ->
            case
                ss_sl5v
            of
            dt7_sl63
            { __DEFAULT ->
                  (#,,#) [__word 1
                          dt6_sl62
                          dt7_sl63];
            };
      };
};


I'm guessing that the `indexArray` call is the line that reads `nextState = aTransitionsTable makeAutomaton AU.! (currentState, nextByte)`, and to me it looks like it might be unboxing the thing it's getting out of the array. Not sure that counts as an allocation, but I'm surprised anyway.

I'll keep digging anyway, but it'd be good to hear of any progress at your end too. Hope it didn't spoil your lunch!

Cheers,

> While turning this over in my head I realized that this is the sort of
> program which may be helped significantly by GHC 8.2's improved join
> point handling. Indeed the timings seem to support this hypothesis:
> GHC 8.0.2
>     benchmarking json-validator/Automaton/testEvent
>     time                 22.84 μs   (22.76 μs .. 22.94 μs)
>                          1.000 R²   (1.000 R² .. 1.000 R²)
>     mean                 22.84 μs   (22.76 μs .. 22.94 μs)
>     std dev              297.4 ns   (221.8 ns .. 378.8 ns)
>     Alloc rate     4,015,256,078,038 bytes per MUT second
> GHC 8.2
>     benchmarking json-validator/Automaton/testEvent
>     time                 9.221 μs   (9.141 μs .. 9.318 μs)
>                          0.998 R²   (0.996 R² .. 1.000 R²)
>     mean                 9.163 μs   (9.084 μs .. 9.356 μs)
>     std dev              399.8 ns   (193.0 ns .. 745.4 ns)
>     variance introduced by outliers: 54% (severely inflated)
>     Alloc rate           123,141,635 bytes per MUT second
> Wow! I suspect your allocations have now vanished.

Ooo, that's more like it.

Could you run again using the following to get Criterion's estimate of the allocations-per-call?

json-validator-exe --regress allocated:iters +RTS -T

Cheers,

David


Re: Fast JSON validation - reducing allocations

Ben Gamari-2
David Turner <[hidden email]> writes:

> Ben, Eric, thanks for your help. A whole new world of low-level statistics
> opens up before me...
>
No worries!

> You're not wrong.
>
> I've trawled through the STG as best as I can for a newbie. The function
> with a large number in the 'Alloc' column looks pretty much to be the heart
> of the `step` function. There's a lot of nesting but it looks largely to be
> just checking the bounds on array indices, which I don't think allocates
> anything.
>
Something that I should have mentioned earlier is that STG has the nice
property that all allocation is syntactically obvious: allocated
closures manifest as `let`s. This makes it fairly easy to pick out
possible allocation sites, even in large dumps.
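
(A tiny made-up example, not one from the Automaton dumps: in the
function below, `y` is bound lazily and shared, so in STG it shows up
as a `let`, i.e. one thunk allocation per call; inlining or forcing
such a binding makes the `let`, and hence the allocation, disappear.)

    -- `y` is stored unevaluated in the returned pair, so STG allocates
    -- a thunk for it: it appears as `let { y = ... } in (y, y)`.
    pairUp :: Int -> (Int, Int)
    pairUp x = let y = x + 1 in (y, y)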

> However, I see this little snippet at the very bottom of the nesting:
>
> case
>     indexArray# [ds1_sl1b
>                  y2_sl5H]
> of
> _ [Occ=Dead]
> { (##) ipv8_sl60 [Occ=Once!] ->
>       case
>           ipv8_sl60
>       of
>       _ [Occ=Dead]
>       { GHC.Word.W8# dt6_sl62 [Occ=Once] ->
>             case
>                 ss_sl5v
>             of
>             dt7_sl63
>             { __DEFAULT ->
>                   (#,,#) [__word 1
>                           dt6_sl62
>                           dt7_sl63];
>             };
>       };
> };
>
>
> I'm guessing that the `indexArray` call is the line that reads `nextState =
> aTransitionsTable makeAutomaton AU.! (currentState, nextByte)`, and to me
> it looks like it might be unboxing the thing it's getting out of the array.
> Not sure that counts as an allocation, but I'm surprised anyway.
>
The unboxing should just involve looking at the value field of the W8#
closure (e.g. a memory dereference) and putting the value in a machine
register. There is no allocation here as far as I can see.


> I'll keep digging anyway, but it'd be good to hear of any progress at your
> end too. Hope it didn't spoil your lunch!

Far from it! It was nice to have a puzzle to ponder.

Cheers,

- Ben


Re: Fast JSON validation - reducing allocations

David Turner-2
Morning all,

On 11 May 2017 at 20:32, Ben Gamari <[hidden email]> wrote:
> Something that I should have mentioned earlier is that STG has the nice
> property that all allocation is syntactically obvious: allocated
> closures manifest as `let`s. This makes it fairly easy to pick out
> possible allocation sites, even in large dumps.


Ah, that's very useful to know!

Armed with that knowledge, I came to the conclusion that the allocation was for the sharing of the `nextState` variable. Inlining it brings it down to 20us and 22kB per iteration:

https://github.com/DaveCTurner/json-validator/commit/ec994ec9226ca7bc2e76f19bef98f42e0b233524

Getting closer, but it looks like waiting for 8.2 is a better answer. Looking forward to it!

Cheers,



Re: Fast JSON validation - reducing allocations

Arjen
On Fri, 2017-05-12 at 09:10 +0100, David Turner wrote:

> Morning all,
>
> On 11 May 2017 at 20:32, Ben Gamari <[hidden email]> wrote:
> > Something that I should have mentioned earlier is that STG has the
> > nice
> > property that all allocation is syntactically obvious: allocated
> > closures manifest as `let`s. This makes it fairly easy to pick out
> > possible allocation sites, even in large dumps.
>
>
> Ah, that's very useful to know!
>
> Armed with that knowledge, I came to the conclusion that the
> allocation was for the sharing of the `nextState` variable. Inlining
> it brings it down to 20us and 22kB per iteration.
>
> https://github.com/DaveCTurner/json-validator/commit/ec994ec9226ca7bc2e76f19bef98f42e0b233524
>
> Getting closer, but it looks like waiting for 8.2 is a better answer.
> Looking forward to it!
>
> Cheers,
>

Maybe this is a silly question, and please let me know why if so, but:

Has anyone thought about parallelizing it for multiple messages in
order to "produce garbage faster"? While reducing allocation will make
the single validations faster, doing multiple ones might improve the
throughput per GC ratio. This assumes that the amount of live data in
the heap is small, making GC sort of constant time, and having multiple
cores available.

I wonder whether a few strategically placed par's and pseq's might
allow you to scale horizontally without requiring invasive changes to
the program's structure. Apologies for not trying to do this myself
first ;-).

kind regards, Arjen

Re: Fast JSON validation - reducing allocations

David Turner-2
On 12 May 2017 at 09:27, Arjen <[hidden email]> wrote:

> Maybe this is a silly question, and please let me know why if so, but:
>
> Has anyone thought about parallelizing it for multiple messages in
> order to "produce garbage faster"? While reducing allocation will make
> the single validations faster, doing multiple ones might improve the
> throughput per GC ratio. This assumes that the amount of live data in
> the heap is small, making GC sort of constant time, and having multiple
> cores available.

Not a silly question at all. Adding the following incantation:

    `using` parListChunk 100 rseq

does quite happily spread things across all 4 cores on my development machine, and it's certainly a bit faster. To give some stats, it processes ~24 events between GCs rather than ~6, and collects ~2MB rather than ~500kB. The throughput becomes a lot less consistent, however, at least partly due to some bigger GC pauses along the way. As Ben's statistics showed, our allocation rate on one thread is around 4TBps, which is already stressing the GC out a bit, and parallelising it doesn't make that problem any easier.
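
Concretely, the incantation hangs off the batch like this (a sketch with made-up names; `validate` stands in for the validator itself):

    import Control.Parallel.Strategies (parListChunk, rseq, using)
    import qualified Data.ByteString.Lazy as BL

    -- Validate a batch of documents in parallel, 100 documents per spark.
    validateAll :: (BL.ByteString -> Bool) -> [BL.ByteString] -> [Bool]
    validateAll validate docs = map validate docs `using` parListChunk 100 rseq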

I know in the OP I said "we have a stream" (accidental MLK misquote) but in fact there are a bunch of parallel streams whose number exceeds the number of available cores, so we don't anticipate any enormous benefit from spreading the processing of any one stream across multiple cores: single-threaded performance is what we think we should be concentrating on.

Cheers,

David





Re: Fast JSON validation - reducing allocations

Arjen
On Fri, 2017-05-12 at 10:27 +0200, Arjen wrote:

> On Fri, 2017-05-12 at 09:10 +0100, David Turner wrote:
> > Morning all,
> >
> > On 11 May 2017 at 20:32, Ben Gamari <[hidden email]> wrote:
> > > Something that I should have mentioned earlier is that STG has the
> > > nice property that all allocation is syntactically obvious:
> > > allocated closures manifest as `let`s. This makes it fairly easy
> > > to pick out possible allocation sites, even in large dumps.
> >
> > Ah, that's very useful to know!
> >
> > Armed with that knowledge, I came to the conclusion that the
> > allocation was for the sharing of the `nextState` variable. Inlining
> > it brings it down to 20us and 22kB per iteration.
> >
> > https://github.com/DaveCTurner/json-validator/commit/ec994ec9226ca7bc2e76f19bef98f42e0b233524
> >
> > Getting closer, but it looks like waiting for 8.2 is a better answer.
> > Looking forward to it!
> >
> > Cheers,
>
> Maybe this is a silly question, and please let me know why if so, but:
>
> Has anyone thought about parallelizing it for multiple messages in
> order to "produce garbage faster"? While reducing allocation will make
> the single validations faster, doing multiple ones might improve the
> throughput per GC ratio. This assumes that the amount of live data in
> the heap is small, making GC sort of constant time, and having multiple
> cores available.
>
> I wonder whether a few strategically placed par's and pseq's might
> allow you to scale horizontally without requiring invasive changes to
> the program's structure. Apologies for not trying to do this myself
> first ;-).
>

Unfortunately, this does not seem to help. I tried it, and while I did
see 3.5 and 3.95 CPUs used with parList and parListChunk 8 respectively,
the timings show it to be slower than sequential evaluation of a list of
testEvent. Thanks for the problem to try this on :-)

Please let me know if I made a mistake in the testing code below.

kind regards, Arjen

---8<---

benchmarking json-validator/Automaton/testEvent
time                 26.57 μs   (26.07 μs .. 27.12 μs)
                     0.987 R²   (0.973 R² .. 0.995 R²)
mean                 31.01 μs   (28.17 μs .. 44.49 μs)
std dev              15.65 μs   (5.258 μs .. 35.77 μs)
variance introduced by outliers: 99% (severely inflated)

benchmarking json-validator/Automaton-List/testEventList
time                 65.11 ms   (61.54 ms .. 69.54 ms)
                     0.982 R²   (0.951 R² .. 0.996 R²)
mean                 59.18 ms   (56.28 ms .. 63.22 ms)
std dev              5.801 ms   (4.344 ms .. 8.238 ms)
variance introduced by outliers: 32% (moderately inflated)

benchmarking json-validator/Automaton-ParList/testEventList
time                 243.9 ms   (214.9 ms .. 270.6 ms)
                     0.996 R²   (0.986 R² .. 1.000 R²)
mean                 253.7 ms   (243.6 ms .. 260.8 ms)
std dev              10.32 ms   (6.781 ms .. 13.19 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking json-validator/Automaton-ParListChunk/testEventList
time                 211.4 ms   (193.1 ms .. 232.3 ms)
                     0.997 R²   (0.990 R² .. 1.000 R²)
mean                 200.3 ms   (193.0 ms .. 206.6 ms)
std dev              9.256 ms   (7.106 ms .. 10.21 ms)
variance introduced by outliers: 14% (moderately inflated)

  19,225,632,224 bytes allocated in the heap
     109,968,376 bytes copied during GC
       2,062,736 bytes maximum residency (20 sample(s))
       2,250,352 bytes maximum slop
              10 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     28722 colls, 28722 par   17.829s   1.591s     0.0001s    0.0160s
  Gen  1        20 colls,    19 par    0.255s   0.054s     0.0027s    0.0058s

  Parallel GC work balance: 7.41% (serial 0%, perfect 100%)

  TASKS: 13 (1 bound, 12 peak workers (12 total), using -N4)

  SPARKS: 25625 (25570 converted, 0 overflowed, 0 dud, 20 GC'd, 35 fizzled)

  INIT    time    0.001s  (  0.002s elapsed)
  MUT     time   44.403s  ( 22.013s elapsed)
  GC      time   18.084s  (  1.645s elapsed)
  EXIT    time    0.001s  (  0.001s elapsed)
  Total   time   62.490s  ( 23.660s elapsed)

  Alloc rate    432,981,654 bytes per MUT second

  Productivity  71.1% of total user, 93.0% of total elapsed

gc_alloc_block_sync: 25370
whitehole_spin: 0
gen[0].sync: 405
gen[1].sync: 22

---8<---

{-# LANGUAGE OverloadedStrings #-}

module Main where

import           Criterion.Main
import           Data.Aeson
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Encoding   as T
import qualified Data.Text.IO as T
import qualified Data.Text as T (append, pack)
import qualified Automaton
import Control.Parallel.Strategies

testEvent :: Int -> BL.ByteString
testEvent i = BL.fromStrict $ T.encodeUtf8 (T.append
  "{\"stream\":\"actuals-stream\",\"submitter\":{\"type\":\"other\",\"description\":\"redactedredac\"},\"driverActivities\":[{\"driverActivity\":{\"journey\":{\"headcode\":\"1A01\",\"description\":\"redactedredactedredactedredactedredactedredacte\"},\"activity\":[{\"arrivalTime\":null,\"sequence\":1,\"tiploc\":\"REDACTE\",\"stop\":true,\"departureTime\":\"2016-06-09T18:22:28.000000000000Z\"},{\"arrivalTime\":\"2016-06-09T18:24:43.000000000000Z\",\"sequence\":2,\"tiploc\":\"REDACTE\",\"stop\":true,\"departureTime\":\"2016-06-09T18:25:51.000000000000Z\"},{\"arrivalTime\":\"2016-06-09T18:26:58.000000000000Z\",\"sequence\":3,\"tiploc\":\"REDACT\",\"stop\":true,\"departureTime\":\"2016-06-09T18:28:08.000000000000Z\"},{\"arrivalTime\":\"2016-06-09T18:29:57.000000000000Z\",\"sequence\":4,\"tiploc\":\"REDACTE\",\"stop\":true,\"departureTime\":null}]},\"activityUserId\":\"521be60a-02f2-4892-b468-c17d9c1c4fcf\"}],\"submissionTime\":\"2016-06-09T18:36:45.831486000000Z\",\"type\":\"driverActivityLogged"
  (T.append (T.pack (show i)) "\"}"))

data AnyJSON = AnyJSON
  deriving (Show, Eq)

instance FromJSON AnyJSON where
  parseJSON _ = pure AnyJSON

main :: IO ()
main = print (testEventList [0]) >> defaultMain
  [ bgroup "json-validator/Automaton"
    [ bench "testEvent" $ whnf Automaton.isValidJson (testEvent 0)
    ]
  , bgroup "json-validator/Automaton-List"
    [ bench "testEventList" $ whnf (\es -> and (testEventList es `using` rseq)) [1..1000]
    ]
  , bgroup "json-validator/Automaton-ParList"
    [ bench "testEventList" $ whnf (\es -> and (testEventList es `using` parList rseq)) [1..1000]
    ]
  , bgroup "json-validator/Automaton-ParListChunk"
    [ bench "testEventList" $ whnf (\es -> and (testEventList es `using` parListChunk 8 rseq)) [1..1000]
    ]
  ]
  where
    testEventList = map (Automaton.isValidJson . testEvent)


Re: Fast JSON validation - reducing allocations

Arjen
On Fri, 2017-05-12 at 10:48 +0100, David Turner wrote:

> On 12 May 2017 at 09:27, Arjen <[hidden email]> wrote:
> > Maybe this is a silly question, and please let me know why if so, but:
> >
> > Has anyone thought about parallelizing it for multiple messages in
> > order to "produce garbage faster"? While reducing allocation will
> > make the single validations faster, doing multiple ones might improve
> > the throughput per GC ratio. This assumes that the amount of live
> > data in the heap is small, making GC sort of constant time, and
> > having multiple cores available.
>
> Not a silly question at all. Adding the following incantation:
>
>     `using` parListChunk 100 rseq
>
> does quite happily spread things across all 4 cores on my development
> machine, and it's certainly a bit faster. To give some stats, it
> processes ~24 events between GCs rather than ~6, and collects ~2MB
> rather than ~500kB. The throughput becomes a lot less consistent,
> however, at least partly due to some bigger GC pauses along the way.
> As Ben's statistics showed, our allocation rate on one thread is
> around 4TBps, which is already stressing the GC out a bit, and
> parallelising it doesn't make that problem any easier.
>
> I know in the OP I said "we have a stream" (accidental MLK misquote)
> but in fact there are a bunch of parallel streams whose number
> exceeds the number of available cores, so we don't anticipate any
> enormous benefit from spreading the processing of any one stream
> across multiple cores: single-threaded performance is what we think
> we should be concentrating on.
>
> Cheers,
>
> David
>
Happy to read that you have more success than I with parListChunk.
I expect(ed) the sparks to be cheap, so this might help if the various
incoming streams are very unbalanced in length and/or message sizes.
I agree that multi-processing a single stream does not make much sense
if you already have multiple concurrent streams.

Nice to see that adding on parallelism in Haskell (GHC) is that easy
(in this case) and with a very good speed-up factor!

kind regards, Arjen

Re: Fast JSON validation - reducing allocations

Arjen
On Fri, 2017-05-12 at 10:48 +0100, David Turner wrote:

> On 12 May 2017 at 09:27, Arjen <[hidden email]> wrote:
> > Maybe this is a silly question, and please let me know why if so, but:
> >
> > Has anyone thought about parallelizing it for multiple messages in
> > order to "produce garbage faster"? While reducing allocation will
> > make the single validations faster, doing multiple ones might improve
> > the throughput per GC ratio. This assumes that the amount of live
> > data in the heap is small, making GC sort of constant time, and
> > having multiple cores available.
>
> Not a silly question at all. Adding the following incantation:
>
>     `using` parListChunk 100 rseq
>
> does quite happily spread things across all 4 cores on my development
> machine, and it's certainly a bit faster. To give some stats, it
> processes ~24 events between GCs rather than ~6, and collects ~2MB
> rather than ~500kB. The throughput becomes a lot less consistent,
> however, at least partly due to some bigger GC pauses along the way.
> As Ben's statistics showed, our allocation rate on one thread is
> around 4TBps, which is already stressing the GC out a bit, and
> parallelising it doesn't make that problem any easier.
>
> I know in the OP I said "we have a stream" (accidental MLK misquote)
> but in fact there are a bunch of parallel streams whose number
> exceeds the number of available cores, so we don't anticipate any
> enormous benefit from spreading the processing of any one stream
> across multiple cores: single-threaded performance is what we think
> we should be concentrating on.
>
> Cheers,
>
> David
>
Apologies for spamming, but if you want more mutator time between
garbage collections, try increasing the nursery size. I get a lot less
time in GC and also many fewer syncs on GC (according to -s). It seems
to reduce the execution time by a third using something like -A8M or
-A64M (quite extreme). In a large program, this effect might be smaller.
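
For example (standard GHC RTS flags, assuming the binary was linked
with -rtsopts: -A sets the nursery size, -N the number of capabilities,
-s prints a GC summary):

    $ json-validator-exe +RTS -N4 -A64M -s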

kind regards, Arjen



Re: Fast JSON validation - reducing allocations

David Turner-2
On 12 May 2017 at 11:00, Arjen <[hidden email]> wrote:
> Happy to read that you have more success than I with parListChunk.
> I expect(ed) the sparks to be cheap, so this might help if the various
> incoming streams are very unbalanced in length and/or message sizes.
> I agree that multi-processing a single stream does not make much sense
> if you already have multiple concurrent streams.
>
> Nice to see that adding on parallelism in Haskell (GHC) is that easy
> (in this case) and with a very good speed-up factor!
>
> kind regards, Arjen

Ah, ok, this is strange. The production system is definitely faster with the parallel strategy (on one stream) but my Criterion benchmarks are much slower. I've updated the code and results on Github (https://github.com/DaveCTurner/json-validator). Probably not going to be able to look at the parallel version much more, but thought you might like a look.

Cheers,



Re: Fast JSON validation - reducing allocations

Ben Gamari-2
David Turner <[hidden email]> writes:

> On 11 May 2017 at 19:40, Ben Gamari <[hidden email]> wrote:
>
>> Ccing Luke Maurer under the assumption that he will appreciate seeing
>> the fruits of his labor.
>>
>>
>> David Turner <[hidden email]> writes:
>>
>> > Dear Café,
>> >
>> (snip)
>> >
>> > There's a copy of the relevant code for option 4 at
>> > https://github.com/DaveCTurner/json-validator. I've hacked around with
>> it a
>> > bit since producing the numbers above, so it's now apparently a bit
>> slower
>> > than Aeson but allocates less too (~65kB).
>> >
>> > Could anyone help, e.g. by pointing me at the bit in the Core that is
>> > allocating within the main loop?
>> >
>> While turning this over in my head I realized that this is the sort of
>> program which may be helped significantly by GHC 8.2's improved join
>> point handling. Indeed the timings seem to support this hypothesis:
>>
>> GHC 8.0.2
>>
>>     benchmarking json-validator/Automaton/testEvent
>>     time                 22.84 μs   (22.76 μs .. 22.94 μs)
>>                          1.000 R²   (1.000 R² .. 1.000 R²)
>>     mean                 22.84 μs   (22.76 μs .. 22.94 μs)
>>     std dev              297.4 ns   (221.8 ns .. 378.8 ns)
>>
>>     Alloc rate     4,015,256,078,038 bytes per MUT second
>>
>> GHC 8.2
>>
>>     benchmarking json-validator/Automaton/testEvent
>>     time                 9.221 μs   (9.141 μs .. 9.318 μs)
>>                          0.998 R²   (0.996 R² .. 1.000 R²)
>>     mean                 9.163 μs   (9.084 μs .. 9.356 μs)
>>     std dev              399.8 ns   (193.0 ns .. 745.4 ns)
>>     variance introduced by outliers: 54% (severely inflated)
>>
>>     Alloc rate           123,141,635 bytes per MUT second
>>
>>
>> Wow! I suspect your allocations have now vanished.
>>
>
>
> Ooo, that's more like it.
>
> Could you run again using the following to get Criterion's estimate of the
> allocations-per-call?
>
> json-validator-exe --regress allocated:iters +RTS -T
>
Here it is,

GHC 8.0.2

    benchmarking json-validator/Automaton/testEvent
    time                 34.81 μs   (33.48 μs .. 36.10 μs)
                         0.991 R²   (0.988 R² .. 0.994 R²)
    mean                 34.04 μs   (33.28 μs .. 34.99 μs)
    std dev              2.828 μs   (2.423 μs .. 3.247 μs)
    allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
      iters              65924.651  (65895.364 .. 65951.510)
      y                  -4175.284  (-46736.295 .. 39814.235)
    variance introduced by outliers: 78% (severely inflated)


GHC 8.2

    benchmarking json-validator/Automaton/testEvent
    time                 9.021 μs   (8.923 μs .. 9.162 μs)
                         0.998 R²   (0.994 R² .. 1.000 R²)
    mean                 8.970 μs   (8.905 μs .. 9.232 μs)
    std dev              341.4 ns   (94.90 ns .. 711.7 ns)
    allocated:           0.973 R²   (0.956 R² .. 0.984 R²)
      iters              343.033    (332.692 .. 354.606)
      y                  -7326.339  (-57269.638 .. 47222.231)
    variance introduced by outliers: 47% (moderately inflated)

Cheers,

- Ben


Re: Fast JSON validation - reducing allocations

Mikolaj Konarski-2
Truly impressive. Amazing.

I wonder what style of coding of inner loops
leads to such good results in 8.2. Is it easy to describe?
Or is the answer "any" or "simplest" or "natural"? If not,
can it be captured as some recursion combinators perhaps?

>> json-validator-exe --regress allocated:iters +RTS -T
>
> Here it is,
>
> GHC 8.0.2
>
>     benchmarking json-validator/Automaton/testEvent
>     time                 34.81 μs   (33.48 μs .. 36.10 μs)
>                          0.991 R²   (0.988 R² .. 0.994 R²)
>     mean                 34.04 μs   (33.28 μs .. 34.99 μs)
>     std dev              2.828 μs   (2.423 μs .. 3.247 μs)
>     allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
>       iters              65924.651  (65895.364 .. 65951.510)
>       y                  -4175.284  (-46736.295 .. 39814.235)
>     variance introduced by outliers: 78% (severely inflated)
>
>
> GHC 8.2
>
>     benchmarking json-validator/Automaton/testEvent
>     time                 9.021 μs   (8.923 μs .. 9.162 μs)
>                          0.998 R²   (0.994 R² .. 1.000 R²)
>     mean                 8.970 μs   (8.905 μs .. 9.232 μs)
>     std dev              341.4 ns   (94.90 ns .. 711.7 ns)
>     allocated:           0.973 R²   (0.956 R² .. 0.984 R²)
>       iters              343.033    (332.692 .. 354.606)
>       y                  -7326.339  (-57269.638 .. 47222.231)
>     variance introduced by outliers: 47% (moderately inflated)

Re: Fast JSON validation - reducing allocations

Ben Gamari-3
David Turner <[hidden email]> writes:

> On 12 May 2017 at 09:27, Arjen <[hidden email]> wrote:
>
>>
>> Maybe this is a silly question, and please let me know why if so, but:
>>
>> Has anyone thought about parallelizing it for multiple messages in
>> order to "produce garbage faster"? While reducing allocation will make
>> the single validations faster, doing multiple ones might improve the
>> throughput per GC ratio. This assumes that the amount of live data in
>> the heap is small, making GC sort of constant time, and having multiple
>> cores available.
>>
>
> Not a silly question at all. Adding the following incantation:
>
>     `using` parListChunk 100 rseq
>
> does quite happily spread things across all 4 cores on my development
> machine, and it's certainly a bit faster. To give some stats, it processes
> ~24 events between GCs rather than ~6, and collects ~2MB rather than
> ~500kB. The throughput becomes a lot less consistent, however, at least
> partly due to some bigger GC pauses along the way. As Ben's statistics
> showed, our allocation rate on one thread is around 4TBps, which is already
> stressing the GC out a bit, and parallelising it doesn't make that problem
> any easier.
>
To elaborate on this a bit: in my experience allocation rate becomes
even more important for performance (both latency and throughput) when
introducing parallelism. The reason is that our GC is a stop-the-world
collector and consequently whenever one thread needs to GC, it must stop
and wait for all others to stop as well. This means that your program
will be synchronizing constantly, even if your threads are working on
completely non-intersecting sets of heap objects, greatly limiting your
usable parallelism. You can observe this behavior using the eventlog.

It is possible to tune the GC to trade latency for throughput without
changing your program. Increasing the size of the nursery will mean that
each thread will take longer to fill its nursery and consequently
require less frequent GC. However, this means that each minor GC will
take longer.

For this reason, if your program is truly embarrassingly parallel, it is
usually preferable performance-wise to run multiple processes
concurrently than to use pure parallelism (a sad truth in a harsh
reality, in my opinion).


Note that there have been efforts to help this issue in the past: Simon
Marlow has a paper [1] examining a GC scheme with thread-local heaps,
allowing threads to perform minor collections while other threads
continue to run. Unfortunately this introduced a significant amount of
complexity into the RTS which the performance numbers just didn't
justify.

The new compact regions feature also bears on this issue, as it gives
users tools for better managing the effect of their long-lived live data
on GC performance. That being said, this wouldn't help in your case as
you have little long-lived live data.

Some form of the linear types proposal also may some day offer some help
here as it could in principle allow alternatives to today's GC'd heap.
For instance, in your case you may want to use an arena allocation
strategy: when starting a document allocate an arena which will serve as
your nursery, run your computation, copy your result out to the GC'd
heap at the end, and throw away the arena.

Finally, I hope that someone will some day pick up the RTS work again.
This might mean trying another approach to thread-local heaps, allowing
multiple independent heaps per RTS, or even allowing multiple RTSs per
process. It is sad that our parallelism story is limited in the ways
that it is and there are still a number of stones that remain unturned.

Cheers,

 - Ben


[1] http://community.haskell.org/~simonmar/local-gc.pdf


Re: Fast JSON validation - reducing allocations

David Turner-2

On 12 May 2017 at 14:44, Ben Gamari <[hidden email]> wrote:
> For this reason, if your program is truly embarrassingly parallel, it is
> usually preferable performance-wise to run multiple processes
> concurrently than to use pure parallelism (a sad truth in a harsh
> reality, in my opinion).


Hmm, an interesting idea. It's not _all_ embarrassingly parallel, but the bottleneck is. You're right, we could have separate front-end processes just doing the validation (indeed they could even be distributed across the network). I don't think we'll do that just yet, particularly not when 8.2's results are so promising, but it sounds a useful idea to keep in the toolbox.

Thanks again,

David

