Haskell version of ray tracer code is much slower than the original ML

Phil Armstrong-2
In odd spare moments, I took John Harrop's simple ray tracer[1] & made a
Haskell version:

  http://www.kantaka.co.uk/cgi-bin/darcsweb.cgi?r=ray

  darcs get http://www.kantaka.co.uk/darcs/ray

It's pretty much a straight translation into idiomatic Haskell (as far
as my Haskell is idiomatic anyway).

Unfortunately, it's a lot slower than the ML version, despite turning
all the optimisation options up as far as they'll go. Profiling
suggests that much of the time is spent in the intersection' function,
and that the code is creating (and garbage collecting) an awful lot of
(-|) vector subtraction thunks. Trying to make intersection' or
ray_sphere stricter (with seq) appears to have no effect whatsoever:
the output of -ddump-simpl is unchanged (with the arguments all
staying lazy).
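
For readers following along, here is a minimal sketch of the kind of strictness annotation being discussed. It is not the code from the darcs repository: the names Vec, sub, dot and raySphere are hypothetical, and the formula is the usual quadratic ray-sphere test. It combines strict constructor fields with bang patterns (a GHC extension) on the arguments and local bindings, which is one way to stop the subtraction being left behind as a thunk.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical 3-vector with strict fields; not the type from the darcs repo.
data Vec = Vec !Double !Double !Double
  deriving (Eq, Show)

-- Component-wise subtraction. Because the fields are strict, forcing the
-- result to weak head normal form also forces all three components.
sub :: Vec -> Vec -> Vec
sub (Vec a b c) (Vec x y z) = Vec (a - x) (b - y) (c - z)

dot :: Vec -> Vec -> Double
dot (Vec a b c) (Vec x y z) = a * x + b * y + c * z

-- Distance along a ray (origin `orig`, unit direction `dir`) to a sphere
-- (`center`, `radius`), or infinity when the ray misses. The bang patterns
-- force v, b and disc before the guards are tested.
raySphere :: Vec -> Vec -> Vec -> Double -> Double
raySphere !orig !dir !center !radius
  | disc < 0  = infinity
  | t2 < 0    = infinity
  | t1 > 0    = t1
  | otherwise = t2
  where
    !v    = center `sub` orig
    !b    = v `dot` dir
    !disc = b * b - v `dot` v + radius * radius
    root  = sqrt disc          -- left lazy: only needed when disc >= 0
    t1    = b - root
    t2    = b + root
    infinity = 1 / 0

main :: IO ()
main = do
  let orig = Vec 0 0 (-4)
  print (raySphere orig (Vec 0 0 1) (Vec 0 0 0) 1)  -- ray aimed at the sphere
  print (raySphere orig (Vec 0 1 0) (Vec 0 0 0) 1)  -- ray aimed away from it
```

Whether this changes the profile depends on what GHC's strictness analysis had already worked out; comparing the -ddump-simpl output before and after is the way to tell.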

Am I missing anything obvious? I don't want to carry out herculean
code rewriting efforts: that wouldn't really be in the spirit of the
thing.

cheers, Phil

[1] http://www.ffconsultancy.com/languages/ray_tracer/comparison.html

--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Haskell version of ray tracer code is much slower than the original ML

Phil Armstrong-2
On Thu, Jun 21, 2007 at 12:25:44PM +0100, Sebastian Sylvan wrote:
>Try using floats for the vector, and strict fields (add a ! to the
>fields in the data declaration).

Because the optimisation page on the haskell wiki is very explicit
about never using Float when you can use Double, that's why. An older
revision used Float and it was slower than the current one. Making the
datatypes strict also makes no difference.

I have tried the obvious things :)

>That's the simplest possible thing I can think of after about two
>seconds of looking anyway. It appears that the ML version uses float
>so I don't understand why you would use Double for the Haskell version
>at all, and then think you could do any sort of valid comparisons
>between them...

OCaML floats are Doubles, at least on x86.

cheers, Phil

Re[2]: Haskell version of ray tracer code is much slower than the original ML

Bulat Ziganshin-2
Hello Philip,

Thursday, June 21, 2007, 3:36:27 PM, you wrote:

> revision used Float and it was slower than the current one. Making the
> datatypes strict also makes no difference.

don't forget to use either -funpack-strict-fields or the {-# UNPACK #-} pragma



--
Best regards,
 Bulat                            mailto:[hidden email]

Re: Haskell version of ray tracer code is much slower than the original ML

Jon Harrop
In reply to this post by Phil Armstrong-2

Awesome stuff!

On Thursday 21 June 2007 12:36:27 Philip Armstrong wrote:
> On Thu, Jun 21, 2007 at 12:25:44PM +0100, Sebastian Sylvan wrote:
> >Try using floats for the vector, and strict fields (add a ! to the
> >fields in the data declaration).
>
> Because the optimisation page on the haskell wiki is very explicit
> about never using Float when you can use Double, that's why. An older
> revision used Float and it was slower than the current one. Making the
> datatypes strict also makes no difference.

Where exactly do the !s go and what do they do?

> >That's the simplest possible thing I can think of after about two
> >seconds of looking anyway. It appears that the ML version uses float
> >so I don't understand why you would use Double for the Haskell version
> >at all, and then think you could do any sort of valid comparisons
> >between them...
>
> OCaML floats are Doubles, at least on x86.

Yes. OCaml doesn't have a 32-bit float storage format, apart from an
entirely-float big array. Also, the ML in OCaml doesn't stand for
metalanguage. ;-)

There is probably some benefit to laziness in this example because many of the
spheres are occluded in the final image, so parts of the scene tree that are
eagerly generated in the other implementations may actually never be
traversed by the renderer.
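
A small sketch of that effect, under assumptions: the Scene type, create and visitFirst below are hypothetical stand-ins, not the benchmark's definitions. The scene is built recursively, but a traversal that only follows one branch never forces the other branches, so they are never constructed.

```haskell
-- Hypothetical scene type: a leaf sphere (reduced to a radius here) or a
-- group of child scenes guarded by a bounding radius.
data Scene
  = Sphere Double
  | Group Double [Scene]

-- Build a scene `level` levels deep, with four children under each group.
-- Under lazy evaluation each child is an unevaluated thunk until a
-- traversal asks for it.
create :: Int -> Double -> Scene
create level r
  | level <= 1 = Sphere r
  | otherwise  = Group (3 * r) [create (level - 1) (r / 2) | _ <- [1 .. 4 :: Int]]

-- Count the nodes visited when only the first child of each group is
-- followed (a stand-in for "the other children were occluded").
visitFirst :: Scene -> Int
visitFirst (Sphere _)        = 1
visitFirst (Group _ (c : _)) = 1 + visitFirst c
visitFirst (Group _ [])      = 1

main :: IO ()
main = print (visitFirst (create 30 1))
```

A fully built 30-level tree would hold 4^29 leaves, which an eager implementation could not allocate; the lazy traversal above touches one node per level.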

I take it you saw the whole language comparison:

  http://www.ffconsultancy.com/languages/ray_tracer/

I'll be uploading concurrent implementations ASAP. Haskell should do well
there... :-)

PS: You spelled my name wrong!
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?e

Re: Haskell version of ray tracer code is much slower than the original ML

Phil Armstrong-2
In reply to this post by Bulat Ziganshin-2
On Thu, Jun 21, 2007 at 04:23:37PM +0400, Bulat Ziganshin wrote:
>Thursday, June 21, 2007, 3:36:27 PM, you wrote:
>> revision used Float and it was slower than the current one. Making the
>> datatypes strict also makes no difference.
>
>don't forget to use either -funpack-strict-fields or the {-# UNPACK #-} pragma

ahem:

  http://www.kantaka.co.uk/cgi-bin/darcsweb.cgi?r=ray;a=headblob;f=/Makefile

(Assuming you mean -funbox-strict-fields; -funpack-strict-fields
doesn't appear in the ghc 6.6.1 makefile as far as I can see.)
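
For reference, a sketch of the per-field alternative to the command-line flag. The pragma is spelled {-# UNPACK #-} and goes in front of a strict field. The Vector type mirrors the one used in this thread, but the definitions here are illustrative rather than copied from the repository.

```haskell
-- Each {-# UNPACK #-} asks GHC to store the Double's raw bits directly in
-- the constructor instead of a pointer to a separately allocated box.
-- The pragma only applies to strict (!) fields.
data Vector = V {-# UNPACK #-} !Double
                {-# UNPACK #-} !Double
                {-# UNPACK #-} !Double
  deriving (Eq, Show)

-- Vector subtraction, named (-|) after the operator mentioned in the thread.
(-|) :: Vector -> Vector -> Vector
V a b c -| V x y z = V (a - x) (b - y) (c - z)
infixl 6 -|

main :: IO ()
main = print (V 3 2 1 -| V 1 1 1)
```

Compiling with -funbox-strict-fields has the same effect for every strict field in the module, without the per-field pragmas.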

As I said, I've tried the obvious things & they didn't make any
difference. Now I could go sprinkling $!, ! and seq around like
confetti but that seems like giving up really.

Phil

Re: Haskell version of ray tracer code is much slower than the original ML

Phil Armstrong-2
In reply to this post by Jon Harrop
On Thu, Jun 21, 2007 at 01:39:24PM +0100, Jon Harrop wrote:

>Awesome stuff!
>
>On Thursday 21 June 2007 12:36:27 Philip Armstrong wrote:
>> On Thu, Jun 21, 2007 at 12:25:44PM +0100, Sebastian Sylvan wrote:
>> >Try using floats for the vector, and strict fields (add a ! to the
>> >fields in the data declaration).
>>
>> Because the optimisation page on the haskell wiki is very explicit
>> about never using Float when you can use Double, that's why. An older
>> revision used Float and it was slower than the current one. Making the
>> datatypes strict also makes no difference.
>
>Where exactly do the !s go and what do they do?

On the datatypes:

data Vector = V !Double !Double !Double

for instance. They tell the compiler to make those fields strict
rather than lazy. This may or may not help things...
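
A short sketch of the difference, with hypothetical names (LazyV, StrictV, blowsUp). A lazy field can hold an unevaluated expression, so building the value succeeds even when the field would fail if forced; a strict field is evaluated when the constructor is applied.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

data LazyV   = LazyV Double Double Double       -- fields may hold thunks
data StrictV = StrictV !Double !Double !Double  -- fields forced on construction

-- Turn the result of `try` into "did an `error` call fire?".
failed :: Either ErrorCall a -> Bool
failed = either (const True) (const False)

-- Evaluate a value to weak head normal form (outermost constructor only)
-- and report whether doing so raised an `error` call.
blowsUp :: a -> IO Bool
blowsUp x = failed <$> try (evaluate x)

main :: IO ()
main = do
  -- The bad field is never looked at, so nothing goes wrong.
  blowsUp (LazyV (error "field forced") 0 0) >>= print
  -- Building the constructor forces the bad field.
  blowsUp (StrictV (error "field forced") 0 0) >>= print
```

The first line printed is False and the second is True: only the strict constructor evaluates its first field.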

>> OCaML floats are Doubles, at least on x86.
>
>Yes. OCaml doesn't have a 32-bit float storage format, apart from an
>entirely-float big array. Also, the ML in OCaml doesn't stand for
>metalanguage. ;-)

Point!

>I take it you saw the whole language comparison:
>
>  http://www.ffconsultancy.com/languages/ray_tracer/

Yup. I'd been meaning to run off a haskell version for a
while.

>I'll be uploading concurrent implementations ASAP. Haskell should do well
>there... :-)

I'll be on the lookout.

>PS: You spelled my name wrong!

Sorry!

Phil

Re: Haskell version of ray tracer code is much slower than the original ML

Derek Elkins
In reply to this post by Jon Harrop
On Thu, 2007-06-21 at 13:39 +0100, Jon Harrop wrote:

> Awesome stuff!
>
> On Thursday 21 June 2007 12:36:27 Philip Armstrong wrote:
> > On Thu, Jun 21, 2007 at 12:25:44PM +0100, Sebastian Sylvan wrote:
> > >Try using floats for the vector, and strict fields (add a ! to the
> > >fields in the data declaration).
> >
> > Because the optimisation page on the haskell wiki is very explicit
> > about never using Float when you can use Double, that's why. An older
> > revision used Float and it was slower than the current one. Making the
> > datatypes strict also makes no difference.
>
> Where exactly do the !s go and what do they do?

> > >That's the simplest possible thing I can think of after about two
> > >seconds of looking anyway. It appears that the ML version uses float
> > >so I don't understand why you would use Double for the Haskell version
> > >at all, and then think you could do any sort of valid comparisons
> > >between them...
> >
> > OCaML floats are Doubles, at least on x86.
>
> Yes. OCaml doesn't have a 32-bit float storage format, apart from an
> entirely-float big array. Also, the ML in OCaml doesn't stand for
> metalanguage. ;-)

To be technical, it should be OCAML.

Re: Haskell version of ray tracer code is much slower than the original ML

Mark T.B. Carroll-2
In reply to this post by Phil Armstrong-2
Philip Armstrong <[hidden email]> writes:
(snip)
> Because the optimisation page on the haskell wiki is very explicit
> about never using Float when you can use Double, that's why.
(snip)

Is that still true if you use -fexcess-precision ?

-- Mark

RE: Haskell version of ray tracer code is much slower than the original ML

bf3
In reply to this post by Phil Armstrong-2
So float math is *slower* than double math in Haskell? That is interesting.
Why is that?

BTW, does Haskell support 80-bit "long double"s? The Intel CPU seems to use
that format internally.

Re: Haskell version of ray tracer code is much slower than the original ML

Phil Armstrong-2
In reply to this post by Mark T.B. Carroll-2
On Thu, Jun 21, 2007 at 01:29:56PM -0400, Mark T.B. Carroll wrote:
>Philip Armstrong <[hidden email]> writes:
>(snip)
>> Because the optimisation page on the haskell wiki is very explicit
>> about never using Float when you can use Double, that's why.
>(snip)
>
>Is that still true if you use -fexcess-precision ?

Why on earth would you use -fexcess-precision if you're using Floats?
The excess precision only applies to Doubles held in registers on x86
IIRC. (If you spill a Double from a register to memory, then you lose
the extra precision bits in the process).

Unless -fexcess-precision with ghc does something completely different
to the analogous gcc setting that is.

Phil

Re: Haskell version of ray tracer code is much slower than the original ML

Phil Armstrong-2
In reply to this post by bf3
On Thu, Jun 21, 2007 at 08:15:36PM +0200, peterv wrote:
>So float math is *slower* than double math in Haskell? That is interesting.
>Why is that?
>
>BTW, does Haskell support 80-bit "long double"s? The Intel CPU seems to use
>that format internally.

As I understand things, that is the effect of using -fexcess-precision.

Obviously this means that the behaviour of your program can change
with seemingly trivial code rearrangements, but if you're messing with
floating point numbers then you ought to know what you're doing anyway.

Phil

Re: Haskell version of ray tracer code is much slower than the original ML

Mark T.B. Carroll-2
In reply to this post by Phil Armstrong-2
Philip Armstrong <[hidden email]> writes:
(snip)
> Why on earth would you use -fexcess-precision if you're using Floats?
> The excess precision only applies to Doubles held in registers on x86
> IIRC. (If you spill a Double from a register to memory, then you lose
> the extra precision bits in the process).

Some googling suggests that point 2 on
http://www.haskell.org/hawiki/FasterFloatingPointWithGhc
might have been what I was thinking of.

-- Mark

Re: Haskell version of ray tracer code is much slower than the original ML

Phil Armstrong-2
On Thu, Jun 21, 2007 at 03:29:17PM -0400, Mark T.B. Carroll wrote:

>Philip Armstrong <[hidden email]> writes:
>(snip)
>> Why on earth would you use -fexcess-precision if you're using Floats?
>> The excess precision only applies to Doubles held in registers on x86
>> IIRC. (If you spill a Double from a register to memory, then you lose
>> the extra precision bits in the process).
>
>Some googling suggests that point 2 on
>http://www.haskell.org/hawiki/FasterFloatingPointWithGhc
>might have been what I was thinking of.

That's the old wiki. The new one gives the opposite advice! (As does
the ghc manual):

  http://www.haskell.org/ghc/docs/latest/html/users_guide/faster.html
  http://www.haskell.org/haskellwiki/Performance/Floating_Point

Phil

Re: Haskell version of ray tracer code is much slower than the original ML

Phil Armstrong-2
On Thu, Jun 21, 2007 at 08:42:57PM +0100, Philip Armstrong wrote:

>On Thu, Jun 21, 2007 at 03:29:17PM -0400, Mark T.B. Carroll wrote:
>>Philip Armstrong <[hidden email]> writes:
>>(snip)
>>>Why on earth would you use -fexcess-precision if you're using Floats?
>>>The excess precision only applies to Doubles held in registers on x86
>>>IIRC. (If you spill a Double from a register to memory, then you lose
>>>the extra precision bits in the process).
>>
>>Some googling suggests that point 2 on
>>http://www.haskell.org/hawiki/FasterFloatingPointWithGhc
>>might have been what I was thinking of.
>
>That's the old wiki. The new one gives the opposite advice! (As does
>the ghc manual):
>
>  http://www.haskell.org/ghc/docs/latest/html/users_guide/faster.html
>  http://www.haskell.org/haskellwiki/Performance/Floating_Point

Incidentally, the latter page implies that ghc is being overly
pessimistic when compiling FP code without -fexcess-precision:

 "On x86 (and other platforms with GHC prior to version 6.4.2), use
  the -fexcess-precision flag to improve performance of floating-point
  intensive code (up to 2x speedups have been seen). This will keep
  more intermediates in registers instead of memory, at the expense of
  occasional differences in results due to unpredictable rounding."

IIRC, it is possible to issue an instruction to the x86 FP unit which
makes all operations work on 64-bit Doubles, even though there are
80-bits available internally. Which then means there's no requirement
to spill intermediate results to memory in order to get the rounding
correct.

Ideally, -fexcess-precision should just affect whether the FP unit
uses 80 or 64 bit Doubles. It shouldn't make any performance
difference, although obviously the generated results may be different.

As an aside, if you use the -optc-mfpmath=sse option, then you only
get 64-bit Doubles anyway (on x86).

cheers, Phil

Re: Haskell version of ray tracer code is much slower than the original ML

Simon Marlow-5
Philip Armstrong wrote:

> On Thu, Jun 21, 2007 at 08:42:57PM +0100, Philip Armstrong wrote:
>> On Thu, Jun 21, 2007 at 03:29:17PM -0400, Mark T.B. Carroll wrote:
>
>> That's the old wiki. The new one gives the opposite advice! (As does
>> the ghc manual):
>>
>>  http://www.haskell.org/ghc/docs/latest/html/users_guide/faster.html
>>  http://www.haskell.org/haskellwiki/Performance/Floating_Point
>
> Incidentally, the latter page implies that ghc is being overly
> pessimistic when compiling FP code without -fexcess-precision:
>
> "On x86 (and other platforms with GHC prior to version 6.4.2), use
>  the -fexcess-precision flag to improve performance of floating-point
>  intensive code (up to 2x speedups have been seen). This will keep
>  more intermediates in registers instead of memory, at the expense of
>  occasional differences in results due to unpredictable rounding."
>
> IIRC, it is possible to issue an instruction to the x86 FP unit which
> makes all operations work on 64-bit Doubles, even though there are
> 80-bits available internally. Which then means there's no requirement
> to spill intermediate results to memory in order to get the rounding
> correct.

For some background on why GHC doesn't do this, see the comment "MORE FLOATING
POINT MUSINGS..." in

   http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs

The main problem is floats: even if you put the FPU into 64-bit mode, your Float
operations will still be done at 64-bit rather than 32-bit precision.  There are
other technical problems that we found with doing this; the comment above elaborates.

GHC passes -ffloat-store to GCC, unless you give the flag -fexcess-precision.
The idea is to try to get reproducible floating-point results.  The native code
generator is unaffected by -fexcess-precision, but it produces rubbish
floating-point code on x86 anyway.

> Ideally, -fexcess-precision should just affect whether the FP unit
> uses 80 or 64 bit Doubles. It shouldn't make any performance
> difference, although obviously the generated results may be different.
>
> As an aside, if you use the -optc-mfpmath=sse option, then you only
> get 64-bit Doubles anyway (on x86).

You probably want SSE2.  If I ever get around to finishing it, the GHC native
code generator will be able to generate SSE2 code on x86 someday, like it
currently does for x86-64.  For now, to get good FP performance on x86, you
probably want

   -fvia-C -fexcess-precision -optc-mfpmath=sse2

Cheers,
        Simon

Re: Haskell version of ray tracer code is much slower than the original ML

Simon Marlow-5
In reply to this post by Phil Armstrong-2
Philip Armstrong wrote:

> On Thu, Jun 21, 2007 at 08:15:36PM +0200, peterv wrote:
>> So float math is *slower* than double math in Haskell? That is
>> interesting.
>> Why is that?
>>
>> BTW, does Haskell support 80-bit "long double"s? The Intel CPU seems
>> to use
>> that format internally.
>
> As I understand things, that is the effect of using -fexcess-precision.
>
> Obviously this means that the behaviour of your program can change
> with seemingly trivial code rearrangements,

Not just code rearrangements: your program will give different results depending
on the optimisation settings, whether you compile with -fvia-C or -fasm, and the
results will be different from those on a machine using fixed 32-bit or 64-bit
precision floating point operations.
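
The general point, that the precision used for intermediates changes the answer, can be seen without any x87 involvement. The sketch below does not reproduce 80-bit register behaviour; it only sums the same ten values with 32-bit and with 64-bit intermediates. The names sumFloat and sumDouble are made up for the illustration.

```haskell
-- Ten additions of 0.1 carried out in single precision, widened afterwards.
sumFloat :: Double
sumFloat = realToFrac (sum (replicate 10 (0.1 :: Float)))

-- The same ten additions carried out in double precision throughout.
sumDouble :: Double
sumDouble = sum (replicate 10 (0.1 :: Double))

main :: IO ()
main = do
  print sumFloat
  print sumDouble
  print (sumFloat == sumDouble)  -- the two results are not bit-identical
```

The two sums agree to about seven significant digits and differ after that, which is the kind of discrepancy to expect between builds that keep intermediates at different widths.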

Cheers,
        Simon

Re: Haskell version of ray tracer code is much slower than the original ML

Phil Armstrong-2
In reply to this post by Simon Marlow-5
On Fri, Jun 22, 2007 at 01:16:54PM +0100, Simon Marlow wrote:

>Philip Armstrong wrote:
>>IIRC, it is possible to issue an instruction to the x86 FP unit which
>>makes all operations work on 64-bit Doubles, even though there are
>>80-bits available internally. Which then means there's no requirement
>>to spill intermediate results to memory in order to get the rounding
>>correct.
>
>For some background on why GHC doesn't do this, see the comment "MORE
>FLOATING POINT MUSINGS..." in
>
>   http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs

Twisty. I guess 'slow, but correct, with switches to go faster at the
price of correctness' is about the best option.

>You probably want SSE2.  If I ever get around to finishing it, the GHC
>native code generator will be able to generate SSE2 code on x86 someday,
>like it currently does for x86-64.  For now, to get good FP performance on
>x86, you probably want
>
>   -fvia-C -fexcess-precision -optc-mfpmath=sse2

Reading the gcc manpage, I think you mean -optc-msse2
-optc-mfpmath=sse. -mfpmath=sse2 doesn't appear to be an option.

(I note in passing that the ghc darcs head produces binaries from
ray.hs which are about 15% slower than ghc 6.6.1 ones btw. Same
optimisation options used both times.)

cheers, Phil

Re: Re: Haskell version of ray tracer code is much slower than the original ML

Claus Reinke
In reply to this post by Simon Marlow-5
>   -fvia-C -fexcess-precision -optc-mfpmath=sse2

is there, or should there be a way to define -O "profiles" for ghc?
so that -O would refer to the standard profile, -Ofp would refer
to the combination above as a floating point optimisation profile,
other profiles might include things like -funbox-strict-fields, and
-Omy42 would refer to my own favourite combination of flags..

perhaps this should be generalised to ghc flag profiles, to cover
things like '-fno-monomorphism-restriction -fno-mono-pat-binds'
or '-fglasgow-exts -fallow-undecidable-instances' and the like?

just a thought,
claus

Re: Re: Haskell version of ray tracer code is much slower than the original ML

Dougal Stanton
On 22/06/07, Claus Reinke <[hidden email]> wrote:

> perhaps this should be generalised to ghc flag profiles, to cover
> things like '-fno-monomorphism-restriction -fno-mono-pat-binds'
> or '-fglasgow-exts -fallow-undecidable-instances' and the like?

You just *know* someone's gonna abuse that to make a genuine
-funroll-loops, right? ;-)

D.

Re: Re: Haskell version of ray tracer code is much slower than the original ML

Claus Reinke
In reply to this post by Claus Reinke
on second thought, user-defined profiles are a two-edged sword,
negating the documentation advantages of in-source flags. better to
handle that in the editor/ide. but predefined flag profiles would still
seem to make sense?
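
As an illustration of what "in-source flags" means here, a sketch using GHC's OPTIONS_GHC file-header pragma. The choice of flags and the Vector and norm definitions are only examples, not a recommended profile.

```haskell
{-# OPTIONS_GHC -funbox-strict-fields -fexcess-precision #-}
-- The pragma above has to appear at the top of the file, before any module
-- header. It records in the source file itself flags that would otherwise
-- live in a Makefile or on the command line.

-- With -funbox-strict-fields in effect, these strict fields are unpacked
-- without needing a per-field UNPACK pragma.
data Vector = V !Double !Double !Double

norm :: Vector -> Double
norm (V x y z) = sqrt (x * x + y * y + z * z)

main :: IO ()
main = print (norm (V 3 4 0))
```

Anyone reading the module can see which flags it was meant to be built with, which is the documentation advantage a user-defined profile name would hide.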

there is something wrong about this wealth of options. it is great
that one has all that control over details, but it also makes it more
difficult to get things right (eg, i was surprised that -O doesn't
unbox strict fields by default). even a formula one driver doesn't
control every lever himself, that's up to the team.

for optimisations, i used to have a simple picture in mind (from
my c days, i guess?), when ghci is no longer fast enough, that is:

no -O: standard executables are fast enough, thank you

-O: standard executables aren't fast enough, do something
    about it, but don't bother me with the details

-O2: i need your best _safe_ optimisation efforts, and i'm
    prepared to pay for that with longer compilation times

-O3: i need your absolute best optimisation efforts, and i'm
    prepared to verify myself that optimisations that cannot
    automatically be checked for safety have no serious negative
    effect on the results (it would be nice if you told me which
    potentially unsafe optimisations you used in compilation)

on top of that, as an alternative to -O3, specific tradeoffs would
be useful, where i specify whether i want to optimize for space
or for time, or which kinds of optimization opportunities the
compiler should pay attention to, such as strictness, unboxing,
floating point ops, etc.. but even here i wouldn't want to give
platform-specific options, i'd want the compiler to choose the
most appropriate options, given my specified tradeoffs and
emphasis, taking into account platform and self-knowledge.

so, i'd say -Ofp, and the compiler might pick:

>>   -fvia-C -fexcess-precision -optc-mfpmath=sse2

if i'm on a platform and compiler version where that is an
appropriate selection of flags to get the best floating point
performance. and it might pick a different selection of flags
on a different platform, or with a different compiler version.

> perhaps this should be generalised to ghc flag profiles, to cover
> things like '-fno-monomorphism-restriction -fno-mono-pat-binds'
> or '-fglasgow-exts -fallow-undecidable-instances' and the like?

that is a slightly different story, and it might be useful (a) to
provide flag groups (-fno-mono*) and (b) to specify implication
(just about every language extension flag implies -fglasgow-exts,
so there's no need to specify that again, and there might be
other opportunities for reducing groups of options with a
single maximum in the implication order; one might even
introduce pseudo-flags for grouping, such as -fhaskell2;-).

claus
