ANNOUNCE: The Fibon benchmark suite (v0.2.0)


David Peixotto
I'm pleased to announce the release of the Fibon benchmark tools and suite.

Fibon is a set of tools for running and analyzing benchmark programs in
Haskell. Most importantly, it includes an optional set of benchmark
programs including many programs taken from the Hackage open source
repository.

The source code for the tools and benchmarks is available on GitHub.


The Fibon tools (without the benchmarks) are available on Hackage.


The package needs to be unpacked and built in place to be able to run any
benchmarks. It can be used with the official Fibon benchmarks or you can
create your own suite and just use Fibon to run and analyze your benchmark
programs.

Some more documentation is available on the Fibon wiki.


Fibon Tools
===================================================================
Fibon is a pure Haskell framework for running and analyzing benchmark
programs. Cabal is used for building the benchmarks. The benchmark
harness, configuration files, and benchmark descriptions are all written in
Haskell. The benchmark descriptions and run configurations are all statically
compiled into the benchmark runner to ensure that configuration errors are
found at compile time.
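
As a rough illustration of why compiling the configuration into the runner catches mistakes early, here is a minimal sketch. The types and names (`Benchmark`, `RunConfig`, `defaultConfig`) are hypothetical and are not Fibon's actual definitions; the point is only that a misspelled benchmark name or a wrongly typed setting is rejected by the compiler rather than discovered mid-run.

```haskell
-- Hypothetical types for illustration only; not Fibon's real API.
data Benchmark = Dotp | Qsort | Fannkuch
  deriving (Show, Eq, Enum, Bounded)

data InputSize = Test | Ref
  deriving (Show, Eq)

data RunConfig = RunConfig
  { configName :: String
  , benchmarks :: [Benchmark]
  , inputSize  :: InputSize
  , iterations :: Int
  , ghcFlags   :: [String]
  } deriving (Show, Eq)

-- Writing Qsrot instead of Qsort, or iterations = "3", would fail to
-- compile, so the error surfaces before any benchmark is run.
defaultConfig :: RunConfig
defaultConfig = RunConfig
  { configName = "default"
  , benchmarks = [minBound .. maxBound]  -- every constructor of Benchmark
  , inputSize  = Ref
  , iterations = 3
  , ghcFlags   = ["-O2"]
  }

main :: IO ()
main = mapM_ print (benchmarks defaultConfig)
```

With a plain-text configuration file, the same typo would typically only be noticed when the runner parses the file.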

The Fibon tools are not tied to any compiler infrastructure and can build
benchmarks using any compiler supported by cabal. However, there are some
extra features available when using GHC to build the benchmarks:

  * Support in config files for using an in-place GHC HEAD build
  * Support in `fibon-run` for collecting GC stats from GHC-compiled programs
  * Support in `fibon-analyse` for reading GC stats from Fibon result files

The Fibon Benchmark Suite
===================================================================
The Fibon benchmark suite currently contains 34 benchmarks from a variety of
sources. The individual benchmarks and lines of code are given below.

Dph
  _DphLib                    316
  Dotp                       308
  Qsort                      236
  QuickHull                  680
  Sumsq                       72
  ------------------------------
  TOTAL                     1612

Hackage
  Agum                       786
  Bzlib                      432
  Cpsa                     11582
  Crypto                    4486
  Fgl                       3834
  Fst                       4532
  Funsat                   16085
  Gf                       23970
  HaLeX                     4035
  Happy                     5833
  Hgalib                     819
  Palindromes                496
  Pappy                     7313
  QuickCheck                4495
  Regex                     6873
  Simgi                     5134
  TernaryTrees               722
  Xsact                     2783
  ------------------------------
  TOTAL                   104210

Repa
  _RepaLib                  8775
  Blur                        77
  FFT2d                       89
  FFT3d                      103
  Laplace                    274
  MMult                      133
  ------------------------------
  TOTAL                     9451

Shootout
  BinaryTrees                 63
  ChameneosRedux              96
  Fannkuch                    27
  Mandelbrot                  68
  Nbody                      192
  Pidigits                    26
  SpectralNorm                97
  ------------------------------
  TOTAL                      569


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell] ANNOUNCE: The Fibon benchmark suite (v0.2.0)

Jason Dagit-2


On Tue, Nov 9, 2010 at 1:24 PM, David Peixotto <[hidden email]> wrote:
> I'm pleased to announce the release of the Fibon benchmark tools and suite.
> [...]

Congrats on the release!  It looks like you've invested a lot of time and put in some hard work.

I have a few questions:
  * What differentiates fibon from criterion?  I see both use the statistics package.
  * Does it track memory statistics?  I glanced at the FAQ but didn't see anything about it.
  * Are the numbers in the sample output seconds or milliseconds?  What is the stddev (e.g., what does the distribution of run-times look like)?

Thanks,
Jason


Re: [Haskell] ANNOUNCE: The Fibon benchmark suite (v0.2.0)

David Peixotto

On Nov 9, 2010, at 3:45 PM, Jason Dagit wrote:
> I have a few questions:
>   * What differentiates fibon from criterion?  I see both use the statistics package.

I think the two packages have different benchmarking targets.

Criterion lets you easily benchmark individual functions and gives some help with benchmarking in the presence of lazy evaluation. If a piece of code runs too briefly to time reliably, it runs it multiple times to get sensible timings. Criterion does a much more sophisticated statistical analysis of the results, but I hope to incorporate that into the Fibon analysis in the future.

Fibon is a more traditional benchmarking suite like SPEC or nofib. My interest is using it to test compiler optimizations. It can only benchmark at the whole program level by running an executable. It checks that the program produces the correct output, can collect extra metrics generated by the program, separates collecting results from analyzing results, and generates tables directly comparing the results from different benchmark runs.
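
To make the whole-program approach concrete, here is a minimal sketch of running an executable once, timing it, and checking its output. It is not Fibon's implementation: `RunResult` and `runOnce` are hypothetical names, the sketch relies on the `process` and `time` packages that ship with GHC, and the example command assumes a Unix-like system with `echo` on the path.

```haskell
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- Outcome of one benchmark run (hypothetical type, for illustration).
data RunResult
  = Success Double      -- wall-clock seconds
  | WrongOutput String  -- the program ran, but stdout did not match
  | Crashed Int         -- non-zero exit code
  deriving Show

-- Run an executable to completion, time it, and compare its stdout
-- against the expected output.
runOnce :: FilePath -> [String] -> String -> IO RunResult
runOnce exe args expected = do
  start <- getCurrentTime
  -- readProcessWithExitCode waits for the process and reads all output.
  (code, out, _err) <- readProcessWithExitCode exe args ""
  end <- getCurrentTime
  pure $ case code of
    ExitFailure n -> Crashed n
    ExitSuccess
      | out == expected -> Success (realToFrac (diffUTCTime end start))
      | otherwise       -> WrongOutput out

main :: IO ()
main = runOnce "echo" ["hello"] "hello\n" >>= print
```

Keeping the result as a plain data value is one way to separate collecting results from analyzing them: the values can be written out first and compared across runs later.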

>   * Does it track memory statistics?  I glanced at the FAQ but didn't see anything about it.

Yes, it can read memory statistics dumped by the GHC runtime. It has built-in support for reading the stats dumped by `+RTS -t --machine-readable`, which include things like bytes allocated and time spent in GC.
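
For readers curious what reading those stats might involve, here is a small sketch. It is not Fibon's parser. It assumes the runtime's machine-readable output is a Haskell-readable list of string pairs, possibly preceded by a line echoing the command line when the stats go to a file; the sample text and its values are made up for illustration, and `parseStats` and `lookupStat` are hypothetical helper names.

```haskell
import Text.Read (readMaybe)

type GcStats = [(String, String)]

-- Parse the stats text. Lines before the one that opens the list
-- (for example an echoed command line) are skipped; that leading line
-- is an assumption about the file layout.
parseStats :: String -> Maybe GcStats
parseStats raw = readMaybe (unlines (dropWhile (not . startsList) (lines raw)))
  where
    startsList l = take 1 (dropWhile (== ' ') l) == "["

-- Look up one statistic and read it at the requested type.
lookupStat :: Read a => String -> GcStats -> Maybe a
lookupStat key stats = lookup key stats >>= readMaybe

-- Illustrative sample only; not real measurements.
sample :: String
sample = unlines
  [ "./Dotp +RTS -t --machine-readable"
  , " [(\"bytes allocated\", \"51480\")"
  , " ,(\"num_GCs\", \"1\")"
  , " ,(\"GC_cpu_seconds\", \"0.000\")"
  , " ]"
  ]

main :: IO ()
main = case parseStats sample of
  Nothing    -> putStrLn "could not parse stats"
  Just stats -> do
    print (lookupStat "bytes allocated" stats :: Maybe Integer)
    print (lookupStat "GC_cpu_seconds" stats :: Maybe Double)
```

Values are kept as strings until a caller asks for a type, so a stat that is missing or malformed shows up as `Nothing` instead of a crash.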

>   * Are the numbers in the sample output seconds or milliseconds?  What is the stddev (e.g., what does the distribution of run-times look like)?

I'm not sure which results you are referring to exactly (the numbers in the announcement were lines of code). I picked benchmarks that all ran for at least a second (and hopefully longer) with compiler optimizations enabled. On an 8-core Xeon, the median time over all benchmarks is 8.43 seconds, mean time is 12.57 seconds and standard deviation is 14.56 seconds.
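
A median well below the mean, with a standard deviation larger than the mean, indicates a right-skewed distribution: most benchmarks are short and a few are much longer. The sketch below shows how such summary figures can be computed. The run times in `times` are invented for illustration, and the sample (n - 1) form of the standard deviation is an assumption, since the message does not say which form was used.

```haskell
import Data.List (sort)

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Median: middle element, or the average of the two middle elements.
median :: [Double] -> Double
median xs
  | null xs   = error "median: empty list"
  | odd n     = sorted !! mid
  | otherwise = (sorted !! (mid - 1) + sorted !! mid) / 2
  where
    sorted = sort xs
    n      = length xs
    mid    = n `div` 2

-- Sample standard deviation (divides by n - 1); needs at least two values.
stddev :: [Double] -> Double
stddev xs = sqrt (sum [(x - m) ^ (2 :: Int) | x <- xs] / fromIntegral (length xs - 1))
  where
    m = mean xs

-- Hypothetical per-benchmark run times in seconds.
times :: [Double]
times = [2, 4, 9, 40]

main :: IO ()
main = do
  putStrLn ("median: " ++ show (median times))
  putStrLn ("mean:   " ++ show (mean times))
  putStrLn ("stddev: " ++ show (stddev times))
```

In the invented data a single long-running benchmark pulls the mean above the median, the same shape as the figures quoted above.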

-David



Re: [Haskell] ANNOUNCE: The Fibon benchmark suite (v0.2.0)

Jason Dagit-2


On Tue, Nov 9, 2010 at 5:47 PM, David Peixotto <[hidden email]> wrote:

> Yes, it can read memory statistics dumped by the GHC runtime. It has built-in support for reading the stats dumped by `+RTS -t --machine-readable`, which include things like bytes allocated and time spent in GC.

Oh, I see.  In that case, it's more similar to darcs-benchmark, except that darcs-benchmark is tailored specifically to benchmarking darcs.  The two overlap in parsing the RTS statistics, running the whole program, and producing tabular reports.  Darcs-benchmark adds to that an embedded DSL for specifying operations to perform on the repository between benchmarks (and for translating those operations to runnable shell snippets).

I wonder if Fibon and darcs-benchmark could share common infrastructure beyond the statistics package.  It sure sounds like it to me.  Perhaps some collaboration is in order.


>   * Are the numbers in the sample output seconds or milliseconds?  What is the stddev (e.g., what does the distribution of run-times look like)?

> I'm not sure which results you are referring to exactly (the numbers in the announcement were lines of code). I picked benchmarks that all ran for at least a second (and hopefully longer) with compiler optimizations enabled. On an 8-core Xeon, the median time over all benchmarks is 8.43 seconds, mean time is 12.57 seconds and standard deviation is 14.56 seconds.

I probably read your email too fast, sorry.  Thanks for the clarification.

Thanks,
Jason


Re: [Haskell] ANNOUNCE: The Fibon benchmark suite (v0.2.0)

David Peixotto
Hi Jason,

Sorry for the delayed response. Thanks for pointing out the darcs-benchmark
package. I had not seen that before and there may be some room for sharing
infrastructure. Parsing the runtime stats is pretty easy, but comparing
different runs, computing statistics, and generating tables should be a
common task.

On a related note, when I uploaded the fibon package, I put it in a new
"Benchmarking" category as opposed to the existing "Testing" category. In my
mind testing is more for correctness and benchmarking is for performance. I
think it would be useful to include other benchmarking packages
(darcs-benchmark, criterion) in that category.



--------------------------------------------------
From: "Jason Dagit" <[hidden email]>
Sent: Tuesday, November 09, 2010 7:58 PM
To: "David Peixotto" <[hidden email]>
Cc: <[hidden email]>; <[hidden email]>
Subject: Re: [Haskell] ANNOUNCE: The Fibon benchmark suite (v0.2.0)
