|
Hello haskell-cafe,
since there are no objective tests comparing ghc to gcc, i made my own. these are 3 programs, calculating a sum in c++ and haskell:

  main = print $ sum[1..10^9::Int]


  {-# LANGUAGE BangPatterns #-}  -- needed for the !x / !acc patterns below
  main = print $ sum0 (10^9) 0

  sum0 :: Int -> Int -> Int
  sum0 0 !acc = acc
  sum0 !x !acc = sum0 (x-1) (acc+x)


  int main()
  {
    int sum=0;
    //for(int j=0; j<100;j++)
    for(int i=0; i<1000*1000*1000;i++)
      sum += i;
    return sum;
  }

execution times:
sum:
  ghc 6.6.1 -O2 : 12.433 secs
  ghc 6.10.1 -O2 : 12.792 secs
sum-fast:
  ghc 6.6.1 -O2 : 1.919 secs
  ghc 6.10.1 -O2 : 1.856 secs
  ghc 6.10.1 -O2 -fvia-C : 1.966 secs
C++:
  gcc 3.4.5 -O3 -funroll-loops: 0.062 secs

--
Best regards,
Bulat mailto:[hidden email]

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
|
Ahem. Seems like you've included time spent on the runtime loading.
My results:

MigMit:~ MigMit$ gcc -o test -O3 -funroll-loops test.c && time ./test
-1243309312

real 0m0.066s
user 0m0.063s
sys 0m0.002s
MigMit:~ MigMit$ rm test; ghc -O2 --make test.hs && time ./test
Linking test ...
-243309312

real 0m3.201s
user 0m3.165s
sys 0m0.017s

While 3.201 vs. 0.066 seem to be a huge difference, 0.017 vs. 0.002 is not that bad.
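The two printed values are consistent with the true sums wrapping around in a 32-bit signed integer. The following is a minimal C++ sketch of that arithmetic, assuming a 32-bit two's-complement accumulator on the test machine. The helper names (`wrap_to_int32`, `sum_0_to_n_minus_1`, `sum_1_to_n`) are made up for illustration and do not come from the thread. Signed overflow is formally undefined in C++, so this only models the wrapping that was evidently observed.

```cpp
#include <cstdint>

// Hypothetical helper: reduce a 64-bit value modulo 2^32 and reinterpret it
// as a signed 32-bit quantity, i.e. what a wrapping 32-bit accumulator
// would end up holding. The result is returned as int64 to avoid any
// implementation-defined narrowing conversion.
std::int64_t wrap_to_int32(std::int64_t x) {
    const std::int64_t two32 = std::int64_t{1} << 32;
    std::int64_t r = x % two32;      // in (-2^32, 2^32)
    if (r < 0) r += two32;           // now in [0, 2^32)
    if (r >= two32 / 2) r -= two32;  // now in [-2^31, 2^31)
    return r;
}

// Closed form for the C++ loop in the original post: 0 + 1 + ... + (n-1).
std::int64_t sum_0_to_n_minus_1(std::int64_t n) { return n * (n - 1) / 2; }

// Closed form for the Haskell programs: 1 + 2 + ... + n.
std::int64_t sum_1_to_n(std::int64_t n) { return n * (n + 1) / 2; }
```

With n = 10^9, wrapping the first closed form gives -1243309312 and wrapping the second gives -243309312, matching the two outputs above. The difference of 10^9 between them reflects the different ranges the two programs sum over.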
|
Forget it, my bad.
|
In reply to this post by MigMit
Hello Miguel,
Friday, February 20, 2009, 4:48:15 PM, you wrote:

> Ahem. Seems like you've included time spent on the runtime loading.

for C, i've used an additional 100x loop

> sys 0m0.002s
> sys 0m0.017s

> While 3.201 vs. 0.066 seem to be a huge difference, 0.017 vs. 0.002 is
> not that bad.

do you know what "sys" time means? :)

--
Best regards,
Bulat mailto:[hidden email]
|
In reply to this post by Bulat Ziganshin-2
---- Test.hs ----
import Prelude hiding (sum, enumFromTo)
import Data.List.Stream (sum, unfoldr)

enumFromTo m n = unfoldr f m
 where
 f k | k <= n    = Just (k,k+1)
     | otherwise = Nothing

main = print . sum $ enumFromTo 1 (10^9 :: Int)

---- snip ----

dolio@zeke % time ./Test
500000000500000000
./Test 3.12s user 0.03s system 80% cpu 3.922 total
dolio@zeke % time ./Test-sum0
500000000500000000
./Test-sum0 3.47s user 0.02s system 80% cpu 4.348 total
dolio@zeke % time ./Test-sum0
500000000500000000
./Test-sum0 3.60s user 0.02s system 90% cpu 4.009 total
dolio@zeke % time ./Test
500000000500000000
./Test 3.11s user 0.02s system 81% cpu 3.846 total

---- snip ----

"Test-sum0" is with the sum0 function. "Test" is the code at the top of this mail.

-fvia-C -optc-O3 didn't seem to make a big difference with either Haskell example, so they're both with the default backend.

Your C++ code runs slowly on my system (around 1 second), but that's because it uses 32-bit ints, I guess (switching to long int sped it up).

-- Dan
|
Sorry for replying to myself, but I got suspicious about the 6ms runtime of the 64-bit C++ code on my machine. So I looked at the assembly and found this:

.LCFI1:
        movabsq $499999999500000000, %rsi
        movl    $_ZSt4cout, %edi
        pushq   %r12

I'm no assembly guru, but that makes me think that there's no actual computation going on in the runtime for the 64-bit C++ program, whereas the 32-bit one is clearly doing work on my system, since it takes around 1 second.

Not that I'd be sad if GHC could reduce that whole constant at compile time, but GCC isn't doing 1 billion adds in 6 (or even 60) milliseconds.
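The constant in the `movabsq` is exactly the closed-form value of 0 + 1 + ... + (10^9 - 1), that is n(n-1)/2 for n = 10^9. One common way to keep a compiler from folding the whole loop into a literal is to make the bound a run-time value. The following is a hedged sketch; the function names are hypothetical and it is not the code anyone in the thread actually ran. An optimizer may still rewrite such a loop (for example into a closed form), so inspecting the generated assembly remains the only reliable check.

```cpp
#include <cstdint>

// Sum of 0 .. n-1 computed by an explicit loop. When n is only known at
// run time (e.g. parsed from argv or stdin by the caller), the compiler
// cannot replace the whole computation with a single precomputed literal.
std::int64_t sum_below(std::int64_t n) {
    std::int64_t sum = 0;
    for (std::int64_t i = 0; i < n; ++i)
        sum += i;
    return sum;
}

// Closed form n(n-1)/2, useful for checking the loop's result and for
// recognising a folded constant in compiler output.
std::int64_t sum_below_closed_form(std::int64_t n) {
    return n * (n - 1) / 2;
}
```

For n = 1000000000 the closed form evaluates to 499999999500000000, the same literal that appears in the assembly above.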
|
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin <[hidden email]> wrote:
> execution times:
> sum:
>   ghc 6.6.1 -O2 : 12.433 secs
>   ghc 6.10.1 -O2 : 12.792 secs
> sum-fast:
>   ghc 6.6.1 -O2 : 1.919 secs
>   ghc 6.10.1 -O2 : 1.856 secs
>   ghc 6.10.1 -O2 -fvia-C : 1.966 secs
> C++:
>   gcc 3.4.5 -O3 -funroll-loops: 0.062 secs
>
Nice! Now we know that gcc can calculate faster than Haskell can calculate and print. Next time, use exitWith, please.

--
(c) this sig last receiving data processing entity. Inspect headers for copyright history. All rights reserved. Copying, hiring, renting, performance and/or quoting of this signature prohibited.
|
In reply to this post by Bulat Ziganshin-2
On a G4:
s.hs (which does not need bang patterns) is:

> main = seq (sum0 (10^9) 0) (return ())
>
> sum0 :: Int -> Int -> Int
> sum0 0 acc = acc
> sum0 x acc = sum0 (x-1) $! (acc+x)

And s.c is (actually including 10^9, which Bulat's did not):

> main()
> {
>   int sum=0;
>   for(int i=1000*1000*1000; i>0; i--)
>     sum += i;
> }

I compiled them with

ghc --make -O2 s.hs -o shs
gcc -o sc -std=c99 -O3 -funroll-loops s.c

And timed them:

$ time ./shs

real 0m3.309s
user 0m3.008s
sys 0m0.026s

$ time ./sc

real 0m0.411s
user 0m0.316s
sys 0m0.006s

So C is 9.4 times faster.

And via-C did not help:

$ ghc -fvia-C -optc "-O3 -funroll-loops" --make -O2 s.hs -o shs-via-C

$ time ./shs-via-C

real 0m7.051s
user 0m3.010s
sys 0m0.050s

--
Chris
|
In reply to this post by Dan Doel
Hello Dan,
Friday, February 20, 2009, 5:39:25 PM, you wrote:

> Not that I'd be sad if GHC could reduce that whole constant at compile time,
> but GCC isn't doing 1 billion adds in 6 (or even 60) milliseconds.

yes, that's what was done actually:

  22 0020 8D44D01C   leal 28(%eax,%edx,8), %eax
  23 0024 83C208     addl $8, %edx

so, i rechecked with multiplies:

mult.hs       12.667
mult-fast.hs   2.512
mult.cpp       0.938

and xors:

mult.hs       12.605
mult-fast.hs   1.856
xor.cpp        0.339

--
Best regards,
Bulat mailto:[hidden email]
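The quoted `leal`/`addl` pair reads as eight loop iterations folded into a single step: since i + (i+1) + ... + (i+7) = 8*i + 28, the accumulator gains 8*i + 28 and the counter then advances by 8. The following C++ sketch illustrates that reading of the two instructions. The function names are hypothetical, unsigned arithmetic is used so that wraparound is well defined, and the folded version assumes n is a multiple of 8.

```cpp
#include <cstdint>

// Plain loop: one addition per iteration, summing 0 .. n-1 (mod 2^32).
std::uint32_t sum_plain(std::uint32_t n) {
    std::uint32_t sum = 0;
    for (std::uint32_t i = 0; i < n; ++i)
        sum += i;
    return sum;
}

// Eight iterations folded into one step, mirroring the quoted assembly.
// Assumes n is a multiple of 8 (true for 10^9).
std::uint32_t sum_folded_by_8(std::uint32_t n) {
    std::uint32_t sum = 0;
    for (std::uint32_t i = 0; i < n; i += 8)  // addl $8, %edx
        sum += 8 * i + 28;                    // leal 28(%eax,%edx,8), %eax
    return sum;
}
```

Both functions return the same value for any n that is a multiple of 8, but the second executes one eighth as many loop iterations, which goes some way towards explaining the very short C++ timings.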
|
In reply to this post by Achim Schneider
When I change the C++ program into:
int n;
scanf("%d", &n);
for(i=0; i<n;i++) {
  sum += i;
}

GCC needs 100 milliseconds on my 3.0GHz new Xeon with loop unrolling enabled. Without loop unrolling GCC needs about 635 ms.

Visual C++ does it in 577 ms, generating the following code:

loop:
  add rbx,rax
  inc rax
  cmp rax,rcx
  jl loop

GHC with -O2 -fvia-C (the fastest I could make it) needs

13075 ms for the naive sum
2100 ms with sum0
2018 ms using the stream-fusion

Interesting to see that the stream-fusion was slower when not doing -fvia-C (more than twice as slow with -O).

So GHC is about 3 to 4 times slower than Visual C++ / GCC without loop unrolling, which is not too bad since GHC does not perform register optimization and loop unrolling yet, no?
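Several replies in this thread turn on what exactly is being timed: process startup, the loop itself, or printing the result. The following is a minimal C++ sketch of timing only the loop, with the bound supplied at run time in the spirit of the `scanf` change above. All names here (`TimedSum`, `sum_loop`, `time_sum_loop`) are hypothetical and this is not the harness anyone in the thread used. An optimizer may still simplify or move the loop, so the measurement should be read together with the generated assembly.

```cpp
#include <chrono>
#include <cstdint>

// Result of one timed run: the computed sum plus elapsed wall-clock time.
struct TimedSum {
    std::int64_t value;
    double millis;
};

// The loop under test: sum of 0 .. n-1.
std::int64_t sum_loop(std::int64_t n) {
    std::int64_t sum = 0;
    for (std::int64_t i = 0; i < n; ++i)
        sum += i;
    return sum;
}

// Time only the call to sum_loop, so that process startup and any later
// printing of the result fall outside the measured interval.
TimedSum time_sum_loop(std::int64_t n) {
    const auto start = std::chrono::steady_clock::now();
    const std::int64_t value = sum_loop(n);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return TimedSum{value, elapsed.count()};
}
```

A caller would typically parse n from the command line or standard input, call `time_sum_loop(n)`, and print both fields afterwards. Printing the value also keeps the compiler from discarding the computation as unused.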
|
In reply to this post by Achim Schneider
Hello Achim,
Friday, February 20, 2009, 5:44:44 PM, you wrote:

> Nice! Now we know that gcc can calculate faster than Haskell can
> calculate and print. Next time, use exitWith, please.

it was done in order to simplify sources. do you really believe that ghc needs more than 1 millisecond to print one number? :)

--
Best regards,
Bulat mailto:[hidden email]
|
In reply to this post by Peter Verswyvelen-2
>>>>> "Peter" == Peter Verswyvelen <[hidden email]> writes:
Peter> So GHC is about 3 to 4 times slower than Visual C++ / GCC
Peter> without loop unrolling, which is not too bad since GHC does
Peter> not perform register optimization and loop unrolling yet
Peter> no?

I would call it rather poor.

And I don't accept a since of that form as valid mitigation.

--
Colin Adams
Preston Lancashire
|
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin <[hidden email]> wrote:
> Hello Achim,
>
> Friday, February 20, 2009, 5:44:44 PM, you wrote:
>
> > Nice! Now we know that gcc can calculate faster than Haskell can
> > calculate and print. Next time, use exitWith, please.
>
> it was done in order to simplify sources. do you really believe that
> ghc needs more than 1 millisecond to print one number? :)
>
Well, I know that (Show a) is about as slow as you can get.
|
In reply to this post by Peter Verswyvelen-2
Hello Peter,
Friday, February 20, 2009, 6:18:50 PM, you wrote:

> So GHC is about 3 to 4 times slower than Visual C++ / GCC without
> loop unrolling

why stop at disabling loop unrolling? there are a lot of options we can use if we want to make gcc slower :D

--
Best regards,
Bulat mailto:[hidden email]
|
In reply to this post by Colin Paul Adams
Well C# does it with a for loop in 2300ms, and when using an IEnumerable sequence it needs 19936ms. Very much like the Haskell code. But of course the Haskell code could optimize the sum I guess, I assume it is using the lazy version of sum by default.

Anyway it was more of a question. Does GHC perform register allocation (e.g. using graph colouring) and loop unrolling?
|
In reply to this post by Achim Schneider
Hello Achim,
Friday, February 20, 2009, 6:25:31 PM, you wrote:

>> it was done in order to simplify sources. do you really believe that
>> ghc needs more than 1 millisecond to print one number? :)
>>
> Well, I know that (Show a) is about as slow as you can get.

yes, but it's printed only once against 10^9 computations

--
Best regards,
Bulat mailto:[hidden email]
|
In reply to this post by Peter Verswyvelen-2
Hello Peter,
Friday, February 20, 2009, 6:34:04 PM, you wrote:

> Well C# does it with a for loop in 2300ms, and when using an
> IEnumerable sequence it needs 19936ms. Very much like the Haskell
> code. But of course the Haskell code could optimize the sum I guess,
> I assume it is using the lazy version of sum by default.

the question is what is natural for each language

> Anyway it was more of a question. Does GHC perform register
> allocation (e.g. using graph colouring) and loop unrolling?

afaik, ghc can be compared with 20-year-old C compilers. it uses registers for performing tight loops but has a very simple register allocation procedure. also it doesn't unroll loops

--
Best regards,
Bulat mailto:[hidden email]
|
In reply to this post by Dan Doel
On Fri, Feb 20, 2009 at 6:39 AM, Dan Doel <[hidden email]> wrote:

> Sorry for replying to myself, but I got suspicious about the 6ms runtime of
> the 64-bit C++ code on my machine.

The GCC optimizer must know that you can't return a value that large to user space as a return result.
In Haskell you're printing it... why not print it in C++?
|
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin <[hidden email]> wrote:
> Hello Peter,
>
> Friday, February 20, 2009, 6:34:04 PM, you wrote:
>
> > Well C# does it with a for loop in 2300ms, and when using an
> > IEnumerable sequence it needs 19936ms. Very much like the Haskell
> > code. But of course the Haskell code could optimize the sum I guess,
> > I assume it is using the lazy version of sum by default.
>
> the question is what is natural for each language
>
> > Anyway it was more of a question. Does GHC perform register
> > allocation (e.g. using graph colouring) and loop unrolling?
>
> afaik, ghc can be compared with 20-year-old C compilers. it uses
> registers for performing tight loops but has a very simple register
> allocation procedure. also it doesn't unroll loops
>
I'm only asking because gcc fails to use _anything_ but plain registers.
|
In reply to this post by David Leimbach
On Friday 20 February 2009 10:52:03 am David Leimbach wrote:
> The GCC optimizer must know that you can't return a value that large to
> user space as a return result.
>
> In Haskell you're printing it... why not print it in C++?

I actually changed my local copy to print out the result (since I wanted to make sure it was using 64 bit ints). It didn't make a difference in the timing (of either the 32 or 64 bit version).
