Number 1, at least for now

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Number 1, at least for now

Donald Bruce Stewart
Haskell is now ranked number 1 on the Great Language Shootout!
   
    http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all

Hooray :)

-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

Sebastian Sylvan
On 2/1/06, Donald Bruce Stewart <[hidden email]> wrote:
> Haskell is now ranked number 1 on the Great Language Shootout!
>
>     http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all
>
> Hooray :)
>

That is neat. Mostly for dispelling the "pure lazy fp is inherently
slow" argument. It's not likely to last though, as soon as someone
implements the three missing C programs we'll be bumped down again.
But it's still pretty cool!

/S

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

Ben Lippmeier
Sebastian Sylvan wrote:
>>Haskell is now ranked number 1 on the Great Language Shootout!
>>
> That is neat. Mostly for dispelling the "pure lazy fp is inherently
> slow" argument.

Ha! I don't think "pure lazy fp" would be the phrase I would choose to
describe this code.

An example (from fannkuch):

t <- IO $ \s ->
      case readIntOffAddr# p1# 0# s of
        (# s, p1 #) -> case readIntOffAddr# p1# (n# -# 1#) s of
            (# s, pn #) -> (# s, not (p1 ==# 0# || pn ==# (n# -# 1#)) #)

Ben.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

Donald Bruce Stewart
Ben.Lippmeier:

> Sebastian Sylvan wrote:
> >>Haskell is now ranked number 1 on the Great Language Shootout!
> >>
> >That is neat. Mostly for dispelling the "pure lazy fp is inherently
> >slow" argument.
>
> Ha! I don't think "pure lazy fp" would be the phrase I would choose to
> describe this code.
>
> An example (from fannkuch):
>
> t <- IO $ \s ->
>      case readIntOffAddr# p1# 0# s of
>        (# s, p1 #) -> case readIntOffAddr# p1# (n# -# 1#) s of
>            (# s, pn #) -> (# s, not (p1 ==# 0# || pn ==# (n# -# 1#)) #)
>

Ok, I'll bite ;) This is a very atypical "example", as fannkuch is the
only entry written this way (and it could be rewritten, with careful                                  attention to the Core). There are also many lovely pure, lazy entries.

A great thing about pure, lazy Haskell is that you can write impure
strict code if you have to.

Cheers,
  Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

Ben Lippmeier
Donald Bruce Stewart wrote:

>>Ha! I don't think "pure lazy fp" would be the phrase I would choose to
>>describe this code.
>>
>>An example (from fannkuch):
>>
>>t <- IO $ \s ->
>>     case readIntOffAddr# p1# 0# s of
>>       (# s, p1 #) -> case readIntOffAddr# p1# (n# -# 1#) s of
>>           (# s, pn #) -> (# s, not (p1 ==# 0# || pn ==# (n# -# 1#)) #)
>
> Ok, I'll bite ;) This is a very atypical "example", as fannkuch is the
> only entry written this way (and it could be rewritten, with careful
 > attention to the Core). There are also many lovely pure, lazy entries.


Not so atypical.

More examples (I didn't look /that/ hard.. :))

-----------------------------------
* From reverse-complement

reverseit iubc strand i l s =
     if i >=# l
         then (# s, (I# i, I# l) #)
         else case readWord8OffAddr# strand i s  of { (# s, c #) ->
              case readWord8OffAddr# strand l s  of { (# s, x #) ->
              case readWord8OffAddr# iubc (word2Int# x) s  of { (# s, y#)
[snip]


* From k-nucleotide.

eqmem i ptr1 ptr2 s = if i ==# 0# then (# s , True #) else
     case readInt8OffAddr# ptr1 0# s of { (# s, i8a #) ->
     case readInt8OffAddr# ptr2 0# s of { (# s, i8b #) ->
     if i8a ==# i8b
         then eqmem (i -# 1#) (plusAddr# ptr1 1#) (plusAddr# ptr2 1#) s
         else (# s, False #) } }


* From n-body.

kineticE i = let i' = (.|. (shiftL i 3))
              in do m <- mass i
                    vx <- unsafeRead b (i' vx)
                    vy <- unsafeRead b (i' vy)
                    vz <- unsafeRead b (i' vz)
                    return $! 0.5 * m * (vx*vx + vy*vy + vz*vz)

----------------------

*I* am certainly not implying that Haskell is anything less than the
most wonderous language in the entire world.

I'm saying that there's a stark difference in style between the programs
submitted to the shootout, and the ones I would show to people that I
myself was trying to introduce to the wonders of purely lazy functional
programming. :).

I think there's a big-fat-lesson about the tension between abstraction
and implementation in these entries.

On one hand we've got "This is what I want", on the other it's "What do
I have to do to implement it".


Ben.



_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

John Meacham
On Thu, Feb 02, 2006 at 02:01:28PM +1100, Ben Lippmeier wrote:
> I think there's a big-fat-lesson about the tension between abstraction
> and implementation in these entries.

though, I think this is a great oprotunity to improve ghc's optimizer.
compare the core output from the hand optimized ones with the naturally
written ones (that use the exact same algorithms, just not with all the
crazy hand-unboxing action) and look for places where ghc is not
optimizing something it could be. or at least, add some 'seq's until it
generates the same core and submit that program as it is more idiomatic
haskell if somewhat hand-optimized. (but not so much as to use explicit
boxing)

        John

--
John Meacham - ⑆repetae.net⑆john⑈
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

Donald Bruce Stewart
In reply to this post by Ben Lippmeier
Ben.Lippmeier:

> Donald Bruce Stewart wrote:
>
> >>Ha! I don't think "pure lazy fp" would be the phrase I would choose to
> >>describe this code.
> >>
> >>An example (from fannkuch):
> >>
> >>t <- IO $ \s ->
> >>    case readIntOffAddr# p1# 0# s of
> >>      (# s, p1 #) -> case readIntOffAddr# p1# (n# -# 1#) s of
> >>          (# s, pn #) -> (# s, not (p1 ==# 0# || pn ==# (n# -# 1#)) #)
> >
> >Ok, I'll bite ;) This is a very atypical "example", as fannkuch is the
> >only entry written this way (and it could be rewritten, with careful
> > attention to the Core). There are also many lovely pure, lazy entries.
>
> Not so atypical.
>
> More examples (I didn't look /that/ hard.. :))
>
> -----------------------------------
> * From reverse-complement
> * From k-nucleotide.
> * From n-body.

Yeah, these were the hard entries to improve, and we tended towards
brute force. The problems were either IO intensive, or requiring mutable
state by spec definition, so we end up with tightly optimised inner loops,
and this is how we do that in Haskell :)

Still, the majority of code is not written this way -- problems not
doing a lot of IO, or needing mutable arrays, are written at a higher
level.

The plan is that most of these ugly entries disappear once packed
strings are in the base library, which solves the IO issue. A good
packed string regex library would also be useful.

> *I* am certainly not implying that Haskell is anything less than the
> most wonderous language in the entire world.
>
> I'm saying that there's a stark difference in style between the programs
> submitted to the shootout, and the ones I would show to people that I
> myself was trying to introduce to the wonders of purely lazy functional
> programming. :).

Hehe, Maybe I should write "Learning Haskell: The Shootout Way" :)

But sometimes elegance and speed do coincide. You could show them:
    * chameneos
    * pidigits
    * binary-trees
    * cheap-concurrency
    * partial-sums

rather than the nasty IO problems.
 
> I think there's a big-fat-lesson about the tension between abstraction
> and implementation in these entries.
>
> On one hand we've got "This is what I want", on the other it's "What do
> I have to do to implement it".

Fair enough.

I think my point is that the examples you point to illustrate the main
problem we've had: fast, heavy duty IO and fast mutable unboxed arrays.

We can expect improvements on both these issues in the next few months.

-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

haskell-2
In reply to this post by Ben Lippmeier
Everyone is welcome to write 'better' code for the shootout.  Its a good way to
learn useful techniques, and share them.  I will add notes about the benefits
that were to be had from using this ugly code.

Ben Lippmeier wrote:

> Donald Bruce Stewart wrote:
>
>>> Ha! I don't think "pure lazy fp" would be the phrase I would choose
>>> to describe this code.
>>>
>>> An example (from fannkuch):
>>>
>>> t <- IO $ \s ->
>>>     case readIntOffAddr# p1# 0# s of
>>>       (# s, p1 #) -> case readIntOffAddr# p1# (n# -# 1#) s of
>>>           (# s, pn #) -> (# s, not (p1 ==# 0# || pn ==# (n# -# 1#)) #)
>>
>> Ok, I'll bite ;) This is a very atypical "example", as fannkuch is the
>> only entry written this way (and it could be rewritten, with careful
>> attention to the Core). There are also many lovely pure, lazy entries.
>
>

There are two Haskell entries for fannkuch.  The "normal" one has this code:

> rotate 2 (x1:x2:xs) = x2:x1:xs
> rotate 3 (x1:x2:x3:xs) = x2:x3:x1:xs
> rotate 4 (x1:x2:x3:x4:xs) = x2:x3:x4:x1:xs
> rotate 5 (x1:x2:x3:x4:x5:xs) = x2:x3:x4:x5:x1:xs
> rotate 6 (x1:x2:x3:x4:x5:x6:xs) = x2:x3:x4:x5:x6:x1:xs
> rotate 7 (x1:x2:x3:x4:x5:x6:x7:xs) = x2:x3:x4:x5:x6:x7:x1:xs
> rotate 8 (x1:x2:x3:x4:x5:x6:x7:x8:xs) = x2:x3:x4:x5:x6:x7:x8:x1:xs
> rotate 9 (x1:x2:x3:x4:x5:x6:x7:x8:x9:xs) = x2:x3:x4:x5:x6:x7:x8:x9:x1:xs
> rotate 10 (x1:x2:x3:x4:x5:x6:x7:x8:x9:x10:xs) = x2:x3:x4:x5:x6:x7:x8:x9:x10:x1:xs
> rotate n (x:xs) = rotate' n xs
>     where rotate' 1 xs     = x:xs
>           rotate' n (x:xs) = x:rotate' (n-1) xs

Which does not scale particularly well.  The "normal" entry is 9.0x slower than
the leader, which the readIntOffAddr# is 3.1x slower.  So the gain from explicit
unboxing and state passing was substantial.


> Not so atypical.
>
> More examples (I didn't look /that/ hard.. :))
>
> -----------------------------------
> * From reverse-complement
>
> reverseit iubc strand i l s =
>     if i >=# l
>         then (# s, (I# i, I# l) #)
>         else case readWord8OffAddr# strand i s  of { (# s, c #) ->
>              case readWord8OffAddr# strand l s  of { (# s, x #) ->
>              case readWord8OffAddr# iubc (word2Int# x) s  of { (# s, y#)
>              case readWord8OffAddr# iubc   (word2Int# c) s  of { (# s, z #) ->
>              case writeWord8OffAddr# strand i y s of { s ->
>              case writeWord8OffAddr# strand l z s of { s ->
>              reverseit iubc strand (i +# 1#) (l -# 1#) s
>         } } } } } }



Again, there is a "normal" entry, with a screen full of the definition of

> complement :: Base -> Base
> complement 'A' = 'T'
> complement 'a' = 'T'
> complement 'C' = 'G'
> complement 'c' = 'G'
> complement 'G' = 'C'
> ...

The "normal" entry was 32x slower than the leader and used 31x the memory.  The
optimized entry runs 6.3x slower and  uses 1.1x the memory.  The fast entry uses
only the state & unboxing seen above, the rest of the program is normal Haskell,
e.g. the one line that replaces a screen full:

> pairs = map (c2w *** c2w) [('A','T'),('C','G'),('B','V'),('D','H'),('K','M'),('R','Y'),('\0','\0')]

>
>
> * From k-nucleotide.
>
> eqmem i ptr1 ptr2 s = if i ==# 0# then (# s , True #) else
>     case readInt8OffAddr# ptr1 0# s of { (# s, i8a #) ->
>     case readInt8OffAddr# ptr2 0# s of { (# s, i8b #) ->
>     if i8a ==# i8b
>         then eqmem (i -# 1#) (plusAddr# ptr1 1#) (plusAddr# ptr2 1#) s
>         else (# s, False #) } }
>

I wrote that, copying the technique from Don.  That is what a fast packed string
library which compares ascii bytes should do internally.  In c-code it is exacly
the 'memcmp' function.  And yet the GHC 6.4.1 distribution does not have this
function.  This is an example of the library missing things that it should have.

The "normal" Haskell entry was 26x slower and 29x bigger than the leader.  The
replacement code above is 22x and 7.1x instead.  Note that I did *not* get a big
speed increase.  This is because I must use the provided Data.Hashtable which is
simply not competitive to other languages.  This shootout benchmark has
highlighted the deficiency.

>
> * From n-body.
>
> kineticE i = let i' = (.|. (shiftL i 3))
>              in do m <- mass i
>                    vx <- unsafeRead b (i' vx)
>                    vy <- unsafeRead b (i' vy)
>                    vz <- unsafeRead b (i' vz)
>                    return $! 0.5 * m * (vx*vx + vy*vy + vz*vz)
>

I wrote that out of desperation.  Einar has a previous entry that is beautiful
Haskell, creating an "open mutable records" system just for this benchmark.  It
ran in 18x time, 5.8 space.  Many of us put entries on the wiki which were
different ways to approach it (arrays of mutable data; mutable arrays of
constant data).  None were able to beat Einar's entry.  Eventually I and Don
wrote a flat ptr to array of double version, doing it's own offset computation
using (shiftL) and (.!.) as you see above.  This runs in 6.6x time and 4.1x
space.  So without resorting to state & unboxing we got nearly triple the speed.

Though we are still half the speed of OCaml here.


> ----------------------
>
> *I* am certainly not implying that Haskell is anything less than the
> most wonderous language in the entire world.

The neat thing is that for speed we do not drop out into c-code or assembly.  We
just use different parts of Haskell.  The mutable state for the n-body code is
simple using the IO features to act on a ptr to machine doubles.

>
> I'm saying that there's a stark difference in style between the programs
> submitted to the shootout, and the ones I would show to people that I
> myself was trying to introduce to the wonders of purely lazy functional
> programming. :).

True.  The ugly code above is not wondrous.

I will assert that the use of explicit-IO-state passing and unboxed code is good
for teaching.  If you want to explain the IO monad, you will end up talking
about the state of the "real world" and how nice it is that you don't have to
pass it around so that pure functions can peek/poke at it.  By showing someone
the examples above, they get to read what their IO-do-notation means once it has
been de-sugared to a lower level than (>>=).

Another way to look at it is that "readIntOffAddr# p1# 0# s" is making use of a
nominally pure function, where peekPtr / pokePtr are impure.

>
> I think there's a big-fat-lesson about the tension between abstraction
> and implementation in these entries.
>
> On one hand we've got "This is what I want", on the other it's "What do
> I have to do to implement it".
>
> Ben.

You have the truth of it.

Haskell wins the lines-of-code metric by a large margin from the abstraction.
But entries that run into big performance problems get parts rewritten in lower
level Haskell.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

Gour-2
In reply to this post by John Meacham
On Wed, 2006-02-01 at 19:14 -0800, John Meacham wrote:

> though, I think this is a great oprotunity to improve ghc's optimizer.

Huh, that would be the best thing with the whole shootout endeavour..


Sincerely,
Gour

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

signature.asc (196 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Number 1, at least for now

Simon Peyton Jones
In reply to this post by Donald Bruce Stewart
| > though, I think this is a great oprotunity to improve ghc's
optimizer.
|
| Huh, that would be the best thing with the whole shootout endeavour..

Yes indeed.  

One thing that would be really helpful, as a first step, would be to
identify a bunch of concrete examples that GHC "should" have optimised,
or perhaps "could" have optimised, and attach them to the GHC wiki
somewhere.  One place would be here
        http://haskell.org/haskellwiki/Performance
or perhaps somewhere on the developers wiki, or as a task.
        http://hackage.haskell.org/trac/ghc

I'm not promising that anything will happen fast (unless someone else
does it!) but it's always much more likely to happen if there's a
specific place that advertises low-hanging performance fruit.
 

This is closely related to the "advice to programmers" that Don
circulated yesterday, in which he explained how he optimised one
particular case.  But not quite the same.

Simon
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

Wolfgang Jeltsch
In reply to this post by Donald Bruce Stewart
Am Mittwoch, 1. Februar 2006 08:22 schrieb Donald Bruce Stewart:
> Haskell is now ranked number 1 on the Great Language Shootout!
>
>     http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all
>
> Hooray :)
>
> -- Don

It seems to be number 2 at the moment.

Best wishes,
Wolfgang
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

Wolfgang Jeltsch
In reply to this post by Donald Bruce Stewart
Am Donnerstag, 2. Februar 2006 04:26 schrieb Donald Bruce Stewart:
> [...]

> A good packed string regex library would also be useful.

But only one that gives us regular expressions which are parsed at compile
time instead of runtime.

> [...]

Best wishes,
Wolfgang
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

Sebastian Sylvan
In reply to this post by Wolfgang Jeltsch
On 2/2/06, Wolfgang Jeltsch <[hidden email]> wrote:

> Am Mittwoch, 1. Februar 2006 08:22 schrieb Donald Bruce Stewart:
> > Haskell is now ranked number 1 on the Great Language Shootout!
> >
> >     http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all
> >
> > Hooray :)
> >
> > -- Don
>
> It seems to be number 2 at the moment.
>

It looks like it, all of a sudden, has one missing benchmark. Did
something break?

/S

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Number 1, at least for now

Isaac Gouy-2
--- Sebastian Sylvan <[hidden email]>
wrote:
> > It seems to be number 2 at the moment.
> >
>
> It looks like it, all of a sudden, has one missing
> benchmark. Did
> something break?

Previously the GHC program was shown incorrectly as
completing regex-dna within the timeout - now it's
shown correctly.

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com 
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe