Shootout favoring imperative code

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
30 messages Options
12
Reply | Threaded
Open this post in threaded view
|

x86 code generation going wrong?

haskell-2
Hello,

  I need to ask for some help to test x86 code generation.

There is a factor of two runtime difference between the code I am
benchmarking on my OS X powerbook G4 (ghc 6.4.1) and shootout's speed on
a linux x86 machine (ghc 6.4.1).

Could someone else running on x86 test the three versions pasted below
before I think about submitting another one to the shootout?

To compile "ghc --make filename.hs -o program"

To run "cat input-file | time ./program"

where to save space, the gzip'd input file is at

http://paradosso.mit.edu/~ckuklewicz/sum-file-test-input.gz

-------------------------------------------------------------------------
-- Original version
{-# OPTIONS -O2 #-}
import Char( ord )

main :: IO ()
main = getContents >>= print . accP 0 0

accP :: Int -> Int -> String -> Int
accP before this  []       =       before+this
accP before this ('\n':xs) = accP (before+this) 0                        xs
accP before this ('-' :xs) = accN  before       this                     xs
accP before this (x   :xs) = accP  before      (this*10+ord(x)-ord('0')) xs

accN :: Int -> Int -> String -> Int
accN before this  []       =       before-this
accN before this ('\n':xs) = accP (before-this) 0                        xs
accN before this (x   :xs) = accN  before      (this*10+ord(x)-ord('0')) xs

-------------------------------------------------------------------------
-- Faster on G4, 2x slower on x86
{-# OPTIONS -O2 -funbox-strict-fields #-}
import GHC.Base

data I = I !Int

main = print . new (I 0) =<< getContents

new (I i) []       = i
new (I i) ('-':xs) = neg (I 0) xs
    where neg (I n) ('\n':xs) = new (I (i - n)) xs
          neg (I n) (x   :xs) = neg (I (parse x + (10 * n))) xs
new (I i) (x:xs) = pos (I (parse x)) xs
    where pos (I n) ('\n':xs) = new (I (i + n)) xs
          pos (I n) (x   :xs) = pos (I (parse x + (10 * n))) xs

parse c = ord c - ord '0'

-------------------------------------------------------------------------
-- Explicitly unboxed proposal, faster on G4
{-# OPTIONS -fglasgow-exts -O2 #-}

import GHC.Base

main = print . sumFile =<< getContents
    where sumFile = (\rest -> newLine rest 0#)

newLine [] rt = (I# rt)
newLine ('-':rest) rt = negLine rest 0#
    where negLine ('\n':rest) soFar = newLine rest (rt -# soFar)
          negLine ( x  :rest) soFar = negLine rest (d2i x +# (10# *# soFar))
newLine (x:rest) rt = posLine rest (d2i x)
    where posLine ('\n':rest) soFar = newLine rest (rt +# soFar)
          posLine ( x  :rest) soFar = posLine rest (d2i x +# (10# *# soFar))

d2i (C# c) = (ord# c) -# z
    where z = ord# '0'#
-------------------------------------------------------------------------

Thanks,
  Chris

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

RE: x86 code generation going wrong?

Branimir Maksimovic-2



>From: Chris Kuklewicz <[hidden email]>
>To: [hidden email]
>Subject: [Haskell-cafe] x86 code generation going wrong?
>Date: Sat, 07 Jan 2006 16:18:59 +0000
>
>Hello,
>
>   I need to ask for some help to test x86 code generation.
>
>There is a factor of two runtime difference between the code I am
>benchmarking on my OS X powerbook G4 (ghc 6.4.1) and shootout's speed on
>a linux x86 machine (ghc 6.4.1).
>
>Could someone else running on x86 test the three versions pasted below
>before I think about submitting another one to the shootout?

Here are the tests on P4 2.4 ghz and athlon 64 3000 linux test1-3 in
respective order
of appearance (note:OPTIONS didn't do anything I have to
compile -O2 -fglasgow-exts explicitely, because I've got compile error for
test3.hs )

[bmaxa@maxa ~/haskell/myhaskell] $ time ./test1 < sum-file-test-input
4000000

real    0m3.550s
user    0m3.440s
sys     0m0.080s
[bmaxa@maxa ~/haskell/myhaskell] $ time ./test2 < sum-file-test-input
4000000

real    0m3.708s
user    0m3.660s
sys     0m0.060s
[bmaxa@maxa ~/haskell/myhaskell] $ time ./test3 < sum-file-test-input
4000000

real    0m3.678s
user    0m3.620s
sys     0m0.050s

This is on athlon64 3000 , linux :

[bmaxa@devel64 ~]$ time ./test1 < sum-file-test-input
4000000

real    0m5.782s
user    0m5.724s
sys     0m0.056s

[bmaxa@devel64 ~]$ time ./test2 < sum-file-test-input
4000000

real    0m5.953s
user    0m5.900s
sys     0m0.052s
[bmaxa@devel64 ~]$ time ./test3 < sum-file-test-input
4000000

real    0m5.403s
user    0m5.332s
sys     0m0.072s

Greetings, Bane.

>
>To compile "ghc --make filename.hs -o program"
>
>To run "cat input-file | time ./program"
>
>where to save space, the gzip'd input file is at
>
>http://paradosso.mit.edu/~ckuklewicz/sum-file-test-input.gz
>
>-------------------------------------------------------------------------
>-- Original version
>{-# OPTIONS -O2 #-}
>import Char( ord )
>
>main :: IO ()
>main = getContents >>= print . accP 0 0
>
>accP :: Int -> Int -> String -> Int
>accP before this  []       =       before+this
>accP before this ('\n':xs) = accP (before+this) 0                        xs
>accP before this ('-' :xs) = accN  before       this                     xs
>accP before this (x   :xs) = accP  before      (this*10+ord(x)-ord('0')) xs
>
>accN :: Int -> Int -> String -> Int
>accN before this  []       =       before-this
>accN before this ('\n':xs) = accP (before-this) 0                        xs
>accN before this (x   :xs) = accN  before      (this*10+ord(x)-ord('0')) xs
>
>-------------------------------------------------------------------------
>-- Faster on G4, 2x slower on x86
>{-# OPTIONS -O2 -funbox-strict-fields #-}
>import GHC.Base
>
>data I = I !Int
>
>main = print . new (I 0) =<< getContents
>
>new (I i) []       = i
>new (I i) ('-':xs) = neg (I 0) xs
>     where neg (I n) ('\n':xs) = new (I (i - n)) xs
>           neg (I n) (x   :xs) = neg (I (parse x + (10 * n))) xs
>new (I i) (x:xs) = pos (I (parse x)) xs
>     where pos (I n) ('\n':xs) = new (I (i + n)) xs
>           pos (I n) (x   :xs) = pos (I (parse x + (10 * n))) xs
>
>parse c = ord c - ord '0'
>
>-------------------------------------------------------------------------
>-- Explicitly unboxed proposal, faster on G4
>{-# OPTIONS -fglasgow-exts -O2 #-}
>
>import GHC.Base
>
>main = print . sumFile =<< getContents
>     where sumFile = (\rest -> newLine rest 0#)
>
>newLine [] rt = (I# rt)
>newLine ('-':rest) rt = negLine rest 0#
>     where negLine ('\n':rest) soFar = newLine rest (rt -# soFar)
>           negLine ( x  :rest) soFar = negLine rest (d2i x +# (10# *#
>soFar))
>newLine (x:rest) rt = posLine rest (d2i x)
>     where posLine ('\n':rest) soFar = newLine rest (rt +# soFar)
>           posLine ( x  :rest) soFar = posLine rest (d2i x +# (10# *#
>soFar))
>
>d2i (C# c) = (ord# c) -# z
>     where z = ord# '0'#
>-------------------------------------------------------------------------
>
>Thanks,
>   Chris
>
>_______________________________________________
>Haskell-Cafe mailing list
>[hidden email]
>http://www.haskell.org/mailman/listinfo/haskell-cafe

_________________________________________________________________
FREE pop-up blocking with the new MSN Toolbar - get it now!
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re[2]: x86 code generation going wrong?

Bulat Ziganshin
Hello Branimir,

Sunday, January 08, 2006, 1:57:06 PM, you wrote:
BM> of appearance (note:OPTIONS didn't do anything I have to
BM> compile -O2 -fglasgow-exts explicitely, because I've got compile error for
BM> test3.hs )

use

{-# OPTIONS_GHC -O2 -fglasgow-exts #-}


--
Best regards,
 Bulat                            mailto:[hidden email]



_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: x86 code generation going wrong?

haskell-2
In reply to this post by haskell-2
Brian Sniffen wrote:
> The first couldn't even complete on my 2.26 GHz Celeron! It's only got
> 512 MB of RAM,  which may be part of the problem.

I should not leak memory but it may be an optimization problem.

Try explicitly using "ghc -O2 -funbox-strict-fields".

>
> Stack space overflow: current size 1048576 bytes.
> Use `+RTS -Ksize' to increase it.
> ./test1 < sum-file-test-input  25.33s user 2.89s system 18% cpu 2:32.02 total
>
> ./test2 < sum-file-test-input  4.79s user 0.20s system 94% cpu 5.276 total
>
> ./test3 < sum-file-test-input  4.46s user 0.14s system 99% cpu 4.623 total
>
> --
> Brian T. Sniffen
> [hidden email]    or    [hidden email]
> http://www.evenmere.org/~bts
>

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: x86 code generation going wrong?

Branimir Maksimovic-2



>From: Chris Kuklewicz <[hidden email]>
>To: [hidden email], Haskell Cafe <[hidden email]>
>Subject: Re: [Haskell-cafe] x86 code generation going wrong?
>Date: Sun, 08 Jan 2006 20:33:57 +0000
>
>Brian Sniffen wrote:
> > The first couldn't even complete on my 2.26 GHz Celeron! It's only got
> > 512 MB of RAM,  which may be part of the problem.
>
>I should not leak memory but it may be an optimization problem.
>
>Try explicitly using "ghc -O2 -funbox-strict-fields".
>
On p4. 2.4 ghz 512mb first example takes about 2.5 mb of ram.
I've compiled explicitelly with -O2, because without optimisations it takes
ridicilously large amount of RAM.
I've changed {# OPTIONS to OPTIONS_GHC as Bulat sugested
but still no effect. Options have to be specified on command line in
order to work.


Greetings, Bane.

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: x86 code generation going wrong?

Donald Bruce Stewart
bmaxa:

>
>
>
> >From: Chris Kuklewicz <[hidden email]>
> >To: [hidden email], Haskell Cafe <[hidden email]>
> >Subject: Re: [Haskell-cafe] x86 code generation going wrong?
> >Date: Sun, 08 Jan 2006 20:33:57 +0000
> >
> >Brian Sniffen wrote:
> >> The first couldn't even complete on my 2.26 GHz Celeron! It's only got
> >> 512 MB of RAM,  which may be part of the problem.
> >
> >I should not leak memory but it may be an optimization problem.
> >
> >Try explicitly using "ghc -O2 -funbox-strict-fields".
> >
> On p4. 2.4 ghz 512mb first example takes about 2.5 mb of ram.
> I've compiled explicitelly with -O2, because without optimisations it takes
> ridicilously large amount of RAM.
> I've changed {# OPTIONS to OPTIONS_GHC as Bulat sugested
> but still no effect. Options have to be specified on command line in
> order to work.

Ensure that the {-# OPTIONS ... #-} lines is the *first* line of the
file, and that no comments precede it.

-- Don
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: x86 code generation going wrong?

Branimir Maksimovic-2



>From: [hidden email] (Donald Bruce Stewart)
>To: Branimir Maksimovic <[hidden email]>
>CC: [hidden email],
>[hidden email],[hidden email]
>Subject: Re: [Haskell-cafe] x86 code generation going wrong?
>Date: Mon, 9 Jan 2006 11:15:51 +1100
>
>bmaxa:
> >
> >
> >
> > >From: Chris Kuklewicz <[hidden email]>
> > >To: [hidden email], Haskell Cafe <[hidden email]>
> > >Subject: Re: [Haskell-cafe] x86 code generation going wrong?
> > >Date: Sun, 08 Jan 2006 20:33:57 +0000
> > >
> > >Brian Sniffen wrote:
> > >> The first couldn't even complete on my 2.26 GHz Celeron! It's only
>got
> > >> 512 MB of RAM,  which may be part of the problem.
> > >
> > >I should not leak memory but it may be an optimization problem.
> > >
> > >Try explicitly using "ghc -O2 -funbox-strict-fields".
> > >
> > On p4. 2.4 ghz 512mb first example takes about 2.5 mb of ram.
> > I've compiled explicitelly with -O2, because without optimisations it
>takes
> > ridicilously large amount of RAM.
> > I've changed {# OPTIONS to OPTIONS_GHC as Bulat sugested
> > but still no effect. Options have to be specified on command line in
> > order to work.
>
>Ensure that the {-# OPTIONS ... #-} lines is the *first* line of the
>file, and that no comments precede it.
>

Aaah, I didn't knew that. Now this works, thanks!


Greetings, Bane.

_________________________________________________________________
FREE pop-up blocking with the new MSN Toolbar - get it now!
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Shootout favoring imperative code

Simon Marlow
In reply to this post by Sebastian Sylvan
Sebastian Sylvan wrote:

> It would be neat if the PackedString library contained functions such
> as hGetLine etc. It does have a function for reading from a buffer,
> but it won't stop at a newline...
> But yeah, fast string manipulation is difficult when using a
> linked-list representation...

My version of the packed string library does have an hGetLine.  Don
Stewart was merging my version with his fps at some point, Don - any
news on that?

Cheers,
        Simon

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Shootout favoring imperative code

Einar Karttunen
On 09.01 11:32, Simon Marlow wrote:

> Sebastian Sylvan wrote:
>
> >It would be neat if the PackedString library contained functions such
> >as hGetLine etc. It does have a function for reading from a buffer,
> >but it won't stop at a newline...
> >But yeah, fast string manipulation is difficult when using a
> >linked-list representation...
>
> My version of the packed string library does have an hGetLine.  Don
> Stewart was merging my version with his fps at some point, Don - any
> news on that?

Getting a fast FastPackedString will solve the problems with many
benchmarks. A similar thing for arrays would be nice - although
this is more about inteface:

> module Data.Array.UnsafeOps where
>
> import Data.Array.Base hiding((!))
>
> {-# INLINE (!) #-}
> (!) :: MArray a e m => a Int e -> Int -> m e
> (!) = unsafeRead
>
> {-# INLINE set #-}
> set :: MArray a e m => a Int e -> Int -> e -> m ()
> set = unsafeWrite
>
> {-# INLINE swap #-}
> swap :: MArray a e m => a Int e -> Int -> Int -> m ()
> swap arr x y = do xv <- arr ! x
>                   yv <- arr ! y
>                   set arr x yv
>                   set arr y xv
>
> {-# INLINE combineTo #-}
> combineTo :: MArray a e m => a Int e -> Int -> (e -> e -> e) -> a Int e -> Int -> m ()
> combineTo a0 i0 f a1 i1 = do v0 <- a0 ! i0
>                              v1 <- a1 ! i1
>                              set a0 i0 $! f v0 v1

and so forth. Usually imperative solutions have something like
"a[i] += b[i]", which currently is quite tedious and ugly to
translate to MArrays. Now it would become "combineTo a i (+) b i".

- Einar Karttunen
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Re: Shootout favoring imperative code

Bulat Ziganshin
Hello Einar,

Wednesday, January 11, 2006, 6:06:56 PM, you wrote:

>> My version of the packed string library does have an hGetLine.  Don
>> Stewart was merging my version with his fps at some point, Don - any
>> news on that?

EK> Getting a fast FastPackedString will solve the problems with many
EK> benchmarks.

btw, JHC's version of FPS uses slightly less memory (i don't remember,
8 or 12 bytes per each string) and i think must be faster (because it
uses ByteArray# instead of Addr#). so, the best variant is to add hGetLine
to John's library

>>                   set arr x yv

(arr,x) =: yv

looks better ;)

EK> and so forth. Usually imperative solutions have something like
EK> "a[i] += b[i]", which currently is quite tedious and ugly to
EK> translate to MArrays. Now it would become "combineTo a i (+) b i".

you are definitely a Hal Daume's client, look at http://www.isi.edu/~hdaume/STPP/


--
Best regards,
 Bulat                            mailto:[hidden email]



_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
12