I see lots of shootout examples where Haskell programs seem to perform
comparably with C programs, but I find it hard to reproduce anything like those figures when testing with my own code. So here's a simple case: I have this C program: #include <stdio.h> #define n 100000000 double a[n]; int main() { int i; for (i = 0; i<n; ++i) { a[i] = 1.0f; } for (i = 0; i<n-1; ++i) { a[i+1] = a[i]+a[i+1]; } printf("%f\n",a[n-1]); } And this Haskell program: import Data.Array.IO import Control.Monad n = 10000000 main = do a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a (i+1); writeArray a (i+1) (x+y) } x <- readArray a (n-1) print x Even though 'n' is 10 times bigger in the C program it runs much faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. I've tried lots of different combinations of flags that I've found in various postings to haskell-cafe but to no avail. What flags do I need to get at least within a factor of 2 or 3 of C? Am I using the wrong kind of array? Or is Haskell always going to be 15-20 times slower for this kind of numerical work? -- Dan _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
Hello Dan,
Thursday, November 8, 2007, 9:33:12 PM, you wrote: > main = do > a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) > forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a > (i+1); writeArray a (i+1) (x+y) } > x <- readArray a (n-1) > print x 1. ghc doesn't implement loop unrolling, even in -O2 mode 2. forM_ may have its own overheads, i'm not sure 3. add strictness annotations: forM_ [0..n-2] $ \i -> do { return $! i; x <- readArray a i; return $! x; y <- readArray a (i+1); return $! y; writeArray a (i+1) (x+y) } such cycle should be approx. 3-10 times slower than C one -- Best regards, Bulat mailto:[hidden email] _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Dan Piponi-2
"Dan Piponi" <[hidden email]> writes:
> Even though 'n' is 10 times bigger in the C program it runs much > faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. > I've tried lots of different combinations of flags that I've found in > various postings to haskell-cafe but to no avail. import Data.List main = do print $ foldl' (+) 0 $ take 100000000 [1.0,1.0..] works 10 times faster than your C version. You just need to adapt to the radically different style of programming. Real Fortran programmer may write Fortran programs on any programming language :) -- JID: [hidden email] _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Dan Piponi-2
On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote:
> I see lots of shootout examples where Haskell programs seem to perform > comparably with C programs, but I find it hard to reproduce anything > like those figures when testing with my own code. So here's a simple > case: > > I have this C program: > > #include <stdio.h> > > #define n 100000000 > > double a[n]; > > int main() > { > int i; > for (i = 0; i<n; ++i) > { > a[i] = 1.0f; > } > for (i = 0; i<n-1; ++i) > { > a[i+1] = a[i]+a[i+1]; > } > printf("%f\n",a[n-1]); > } > > And this Haskell program: > > import Data.Array.IO > import Control.Monad > > n = 10000000 > > main = do > a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) > forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a > (i+1); writeArray a (i+1) (x+y) } > x <- readArray a (n-1) > print x > > Even though 'n' is 10 times bigger in the C program it runs much > faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. > I've tried lots of different combinations of flags that I've found in > various postings to haskell-cafe but to no avail. > > What flags do I need to get at least within a factor of 2 or 3 of C? > Am I using the wrong kind of array? Or is Haskell always going to be > 15-20 times slower for this kind of numerical work? I tried both, but it would be really helpful to see your command line options. I used: $ gcc -O3 ghc-bench.c -o ghcbC ghc-bench.c: In function ‘main’: ghc-bench.c:16: warning: incompatible implicit declaration of built-in function ‘printf’ $ ghc --make -O2 ghc-bench.hs and got: $ time ./ghc-bench 2.0e7 real 0m0.714s user 0m0.576s sys 0m0.132s $ time ./ghcbC 20000000.000000 real 0m0.305s user 0m0.164s sys 0m0.132s This is on a first-gen Macbook running Ubuntu. 1GB RAM. 1.83Ghz Core Duo $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.0.20071019 $ gcc --version gcc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Mikhail Gusarov-3
On Fri, 2007-11-09 at 00:51 +0600, Mikhail Gusarov wrote:
> "Dan Piponi" <[hidden email]> writes: > > > Even though 'n' is 10 times bigger in the C program it runs much > > faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. > > I've tried lots of different combinations of flags that I've found in > > various postings to haskell-cafe but to no avail. > > import Data.List > > main = do > print $ foldl' (+) 0 $ take 100000000 [1.0,1.0..] > > works 10 times faster than your C version. You just need to adapt to the > radically different style of programming. > > Real Fortran programmer may write Fortran programs on any programming > language :) Good point, but this is not true for the OP. Take a look at Dan's blog. It's linked on planet.haskell.org. BTW. I prefer: print 10000000.0 ;) _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Bulat Ziganshin-2
Bulat,
The strictness gave me something like a 10% performance increase making the Haskell code more than 10 times slower than the C. Is this the right type of array to use for performance? -- Dan On Nov 8, 2007 10:36 AM, Bulat Ziganshin <[hidden email]> wrote: > Hello Dan, > > Thursday, November 8, 2007, 9:33:12 PM, you wrote: > > > main = do > > a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) > > forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a > > (i+1); writeArray a (i+1) (x+y) } > > x <- readArray a (n-1) > > print x > > 1. ghc doesn't implement loop unrolling, even in -O2 mode > 2. forM_ may have its own overheads, i'm not sure > 3. add strictness annotations: > > forM_ [0..n-2] $ \i -> do { return $! i; > x <- readArray a i; return $! x; > y <- readArray a (i+1); return $! y; > writeArray a (i+1) (x+y) } > > such cycle should be approx. 3-10 times slower than C one > > -- > Best regards, > Bulat mailto:[hidden email] > > Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Dan Piponi-2
On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote:
> I see lots of shootout examples where Haskell programs seem to perform > comparably with C programs, but I find it hard to reproduce anything > like those figures when testing with my own code. So here's a simple > case: > > I have this C program: > > #include <stdio.h> > > #define n 100000000 > > double a[n]; > > int main() > { > int i; > for (i = 0; i<n; ++i) > { > a[i] = 1.0f; > } > for (i = 0; i<n-1; ++i) > { > a[i+1] = a[i]+a[i+1]; > } > printf("%f\n",a[n-1]); > } > > And this Haskell program: > > import Data.Array.IO > import Control.Monad > > n = 10000000 > > main = do > a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) > forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a > (i+1); writeArray a (i+1) (x+y) } > x <- readArray a (n-1) > print x > > Even though 'n' is 10 times bigger in the C program it runs much > faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. > I've tried lots of different combinations of flags that I've found in > various postings to haskell-cafe but to no avail. > > What flags do I need to get at least within a factor of 2 or 3 of C? > Am I using the wrong kind of array? Or is Haskell always going to be > 15-20 times slower for this kind of numerical work? Wow. You should *really* try using GHC 6.8.1: GHC 6.6.1 (-O2) real 0m7.062s user 0m5.464s sys 0m0.288s GHC 6.8.0.20071019 real 0m0.723s user 0m0.616s sys 0m0.100s result is "2.0e7" for both _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Mikhail Gusarov-3
Mikhail,
> main = do > print $ foldl' (+) 0 $ take 100000000 [1.0,1.0..] > > works 10 times faster than your C version. You just need to adapt to the > radically different style of programming. My wasn't intended to represent the problem that I'm trying to solve, but the approach I want to take. The problems that I do want to solve don't lend themselves to this kind of approach. My real situation is that I want to write code that has both a high-level component and a low-level number-crunching component that works on large dense and sparse arrays. Idiomatic Haskell is great for the high-level component. But the question is whether or not its worth trying to write the low-level code in Haskell or whether I should just write that code in C and use the FFI. There are still advantages to using Haskell even if the code is highly unidiomatic: you have the freedom of throwing in higher order functions from time to time in the low-level code, and writing mixed-language code is a pain. -- Dan _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Thomas Schilling-2
On Nov 8, 2007 11:24 AM, Thomas Schilling <[hidden email]> wrote:
> Wow. You should *really* try using GHC 6.8.1: I was hoping you weren't going to say that :-) As soon as I find a suitable 64-bit Intel binary for MacOSX, or can bootstrap my way out of happy needing happy in my attempted source build, I'll report back on how my code performs. -- Dan _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Mikhail Gusarov-3
Can you show us your compilation options and timings?
-- _jsn _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin <[hidden email]> writes:
> Hello Dan, > > Thursday, November 8, 2007, 9:33:12 PM, you wrote: > >> main = do >> a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) >> forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a >> (i+1); writeArray a (i+1) (x+y) } >> x <- readArray a (n-1) >> print x > > 1. ghc doesn't implement loop unrolling, even in -O2 mode > 2. forM_ may have its own overheads, i'm not sure > 3. add strictness annotations: > > forM_ [0..n-2] $ \i -> do { return $! i; > x <- readArray a i; return $! x; > y <- readArray a (i+1); return $! y; > writeArray a (i+1) (x+y) } > > such cycle should be approx. 3-10 times slower than C one I didn't know you can do that. Anyway, I tried OP's C and Haskell versions and the one with your strictness annotations. It seems that the `$!' thing doesn't do much, or it's not doing anything. OP's C code: $ time ./test 100000000.000000 real 0m1.128s user 0m0.572s sys 0m0.555s OP's Haskell code with n=1e8: $ time ./test-ghc 1.0e8 real 0m2.065s user 0m1.505s sys 0m0.558s OP's Haskell code with n=1e8 and your strictness annotations: $ time ./test-ghc 1.0e8 real 0m2.064s user 0m1.472s sys 0m0.591s However, bad performance observed by OP is not observed by me. $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.1 $ gcc --version gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. -- c/* __o/* <\ * (__ */\ < _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Jason Dusek
On Nov 8, 2007 11:34 AM, Jason Dusek <[hidden email]> wrote:
> Can you show us your compilation options and timings? I was simply using -O3. I tried a bunch of other flags (copied from the shootout examples) but they made no appreciable difference. I was getting about 1.5s for the Haskell program and about 0.08s for the C one with the same n=10,000,000. I now see that my ghc is in fact an i386 build whose binary was downloaded from the ghc binary web page. So I'm going to try to make a better comparison with a 64 bit 6.8.1. The batteries on my laptop will probably drop to zero before then however, so it'll be tomorrow at the earliest. -- Dan _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Dan Piponi-2
dpiponi:
> I see lots of shootout examples where Haskell programs seem to perform > comparably with C programs, but I find it hard to reproduce anything > like those figures when testing with my own code. So here's a simple > case: > > I have this C program: > > #include <stdio.h> > > #define n 100000000 > > double a[n]; > > int main() > { > int i; > for (i = 0; i<n; ++i) > { > a[i] = 1.0f; > } > for (i = 0; i<n-1; ++i) > { > a[i+1] = a[i]+a[i+1]; > } > printf("%f\n",a[n-1]); > } > > And this Haskell program: > > import Data.Array.IO > import Control.Monad > > n = 10000000 > > main = do > a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) > forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a > (i+1); writeArray a (i+1) (x+y) } > x <- readArray a (n-1) > print x Be really careful with Doubles (i.e. check what Int arrays also do), since you need special flags to get good performance from unboxed Double arrays. The best effort I know of is: http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=all which took some time to work out the right path through the compiler. In essence: * some gcc flags: -optc-O3 -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2 -optc-ffast-math * Foreign.Marshal.Array * type Reals = Ptr Double * inlinePerformIO in the end I'm stunned at how well we did there, since Double math has been a long term issue. Perhaps there are other lessons we can extract from that code? -- Don _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Dan Piponi-2
On Nov 8, 2007 11:36 AM, Paul Brown <[hidden email]> wrote:
> All that said, I'm not sure where I got the GHC that I used to build > the 6.6.1 via MacPorts; I think it shipped with MacOS once upon a time. > sudo port install ghc . . . configure: error: GHC is required unless bootstrapping from .hc files. But I do have a plan... -- Dan _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Dan Piponi-2
"Dan Piponi" <[hidden email]> writes:
> My wasn't intended to represent the problem that I'm trying to solve, > but the approach I want to take. The problems that I do want to solve > don't lend themselves to this kind of approach. > > My real situation is that I want to write code that has both a > high-level component and a low-level number-crunching component that > works on large dense and sparse arrays. Idiomatic Haskell is great for > the high-level component. But the question is whether or not its worth > trying to write the low-level code in Haskell or whether I should just > write that code in C and use the FFI. I've written a bit number-crunching code in haskell. My experience is that usually the haskell code runs about 2 to 5 times slower -- It depends on whether I use the best optimisation scheme or not -- I do not know if I am doing the correct thing most of the time, since there is not much literature on-line about numerical computing in haskell, or in FP in general. There is one in ocaml, though. > There are still advantages to using Haskell even if the > code is highly unidiomatic: you have the freedom of > throwing in higher order functions from time to time in > the low-level code, and writing mixed-language code is a > pain. -- Dan I fully agree. Xiao-Yong -- c/* __o/* <\ * (__ */\ < _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Dan Piponi-2
dpiponi:
> On Nov 8, 2007 11:34 AM, Jason Dusek <[hidden email]> wrote: > > Can you show us your compilation options and timings? > > I was simply using -O3. I tried a bunch of other flags (copied from > the shootout examples) but they made no appreciable difference. Argh, -O2 please. -O3 does nothing more, and sometime does less, depending on the compiler :) Double math always needs the gcc backend, and -optc flags (see the spectral-norm post). > > I was getting about 1.5s for the Haskell program and about 0.08s for > the C one with the same n=10,000,000. I'm sure we can do better than that! > I now see that my ghc is in fact an i386 build whose binary was > downloaded from the ghc binary web page. So I'm going to try to make a > better comparison with a 64 bit 6.8.1. The batteries on my laptop will > probably drop to zero before then however, so it'll be tomorrow at the > earliest. If you can post the code somewhere, that would be great, with examples of how to reproduce your timings. -- don _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
On Nov 8, 2007 12:16 PM, Don Stewart <[hidden email]> wrote:
> If you can post the code somewhere, that would be great, with examples > of how to reproduce your timings. The code is exactly what I posted originally (but nore that n is 10 times larger in the C code). I compiled using ghc -O3 -o test test.hs. I timed the resulting executable with unix 'time'. I think I used the tiger Intel binary here: http://www.haskell.org/ghc/download_ghc_661.html (ghc --version reports only "The Glorious...verison 6.6.1", file ~/lib/ghc-6.6.1/ghc-6.6.1 reports "Mach-O executable i386".) This was all run on a 2.4GHz MacBook Pro with Leopard and 2Gb RAM. Anything else you need? I was investigating the plausibility of implementing something like a fluid dynamics solver completely in Haskell. I have some working C code so I thought I'd port it. But only if there's a chance of it being within a factor of 2 or 3 of the original speed. -- Dan _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
dpiponi:
> On Nov 8, 2007 12:16 PM, Don Stewart <[hidden email]> wrote: > > > If you can post the code somewhere, that would be great, with examples > > of how to reproduce your timings. > > The code is exactly what I posted originally (but nore that n is 10 > times larger in the C code). I compiled using ghc -O3 -o test test.hs. > I timed the resulting executable with unix 'time'. I think I used the > tiger Intel binary here: > http://www.haskell.org/ghc/download_ghc_661.html (ghc --version > reports only "The Glorious...verison 6.6.1", file > ~/lib/ghc-6.6.1/ghc-6.6.1 reports "Mach-O executable i386".) This was > all run on a 2.4GHz MacBook Pro with Leopard and 2Gb RAM. Anything > else you need? > > I was investigating the plausibility of implementing something like a > fluid dynamics solver completely in Haskell. I have some working C > code so I thought I'd port it. But only if there's a chance of it > being within a factor of 2 or 3 of the original speed. Can you start by retrying with flags from the spectral-norm benchmark: http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=ghc&id=0 The interaction with gcc here is quite important, so forcing -fvia-C will matter. -- Don _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Thomas Schilling-2
nominolo:
> On Thu, 2007-11-08 at 10:33 -0800, Dan Piponi wrote: > > I see lots of shootout examples where Haskell programs seem to perform > > comparably with C programs, but I find it hard to reproduce anything > > like those figures when testing with my own code. So here's a simple > > case: > > > > I have this C program: > > > > #include <stdio.h> > > > > #define n 100000000 > > > > double a[n]; > > > > int main() > > { > > int i; > > for (i = 0; i<n; ++i) > > { > > a[i] = 1.0f; > > } > > for (i = 0; i<n-1; ++i) > > { > > a[i+1] = a[i]+a[i+1]; > > } > > printf("%f\n",a[n-1]); > > } > > > > And this Haskell program: > > > > import Data.Array.IO > > import Control.Monad > > > > n = 10000000 > > > > main = do > > a <- newArray (0,n-1) 1.0 :: IO (IOUArray Int Double) > > forM_ [0..n-2] $ \i -> do { x <- readArray a i; y <- readArray a > > (i+1); writeArray a (i+1) (x+y) } > > x <- readArray a (n-1) > > print x > > > > Even though 'n' is 10 times bigger in the C program it runs much > > faster than the Haskell program on my MacBook Pro with Haskell 6.6.1. > > I've tried lots of different combinations of flags that I've found in > > various postings to haskell-cafe but to no avail. > > > > What flags do I need to get at least within a factor of 2 or 3 of C? > > Am I using the wrong kind of array? Or is Haskell always going to be > > 15-20 times slower for this kind of numerical work? > > Wow. You should *really* try using GHC 6.8.1: > > GHC 6.6.1 (-O2) > > real 0m7.062s > user 0m5.464s > sys 0m0.288s > > GHC 6.8.0.20071019 > > real 0m0.723s > user 0m0.616s > sys 0m0.100s > > result is "2.0e7" for both So that looks like a bug in the simplifier in 6.6.1 that has been fixed. This should go in the testsuite, perhaps, Simon? -- Don _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
In reply to this post by Don Stewart-2
On Nov 8, 2007 12:36 PM, Don Stewart <[hidden email]> wrote:
> dpiponi: > Can you start by retrying with flags from the spectral-norm benchmark: > http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=ghc&id=0 Actually, that was my starting point for investigating how to optimise Haskell numerical code and I initially used all of those flags. -fvia-C isn't listed among those flags but I also tried adding it anyway. 1.45s every time, they make no difference. (And I made sure I touched the source each time to force recompilation.) It looks like my whole question might become moot with ghc 6.8.1, but so far I've been unable to build it due to the cyclic happy dependency. -- Dan _______________________________________________ Haskell-Cafe mailing list [hidden email] http://www.haskell.org/mailman/listinfo/haskell-cafe |
Free forum by Nabble | Edit this page |