Dear everyone, I'm always grateful for your help.
I have been assigned a complicated and growing task in which I'll perform a lot of discrete Fourier transforms, so I have measured the performance of several DFT libraries in Haskell:

http://en.pk.paraiso-lang.org/Hackage/what-is-the-fastest-dft-in-haskell/main

The raw result:

http://paraiso-lang.org/html/bench-dft-in-haskell.html

I'm sharing the result in the hope that some of you will also find it useful. Also, please let me know about any possible flaws or improvements in the benchmark process!

My observations are as follows:

* vector-fftw with wisdom was more than 1/2 times faster than fftw in C with wisdom (and with communication overhead.)
* vector-fftw without wisdom was significantly _faster_ than fftw in C without wisdom. I wonder why.
* vector-fftw over vector was faster than fft over CArray.
* any library that doesn't use fftw is much slower than those that do.

Best,
--
Takayuki MURANUSHI
The Hakubi Center for Advanced Research, Kyoto University
http://www.hakubi.kyoto-u.ac.jp/02_mem/h22/muranushi.html
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Takayuki Muranushi <[hidden email]> wrote:
> * vector-fftw with wisdom was more than 1/2 times faster than fftw in
>   C with wisdom (and with communication overhead.)
> * vector-fftw without wisdom was significantly _faster_ than fftw in C
>   without wisdom. I wonder why.
> * vector-fftw over vector was faster than fft over CArray.
> * any library that doesn't use fftw is much slower than those that
>   do.

I have no experience with FFTW, but in general a result like this often means that you may not have actually calculated the values themselves. One easy way to make sure they are calculated is to print out the whole result. If printing takes too much CPU time for a fair comparison, you need to force the result deeply, for example with deepseq.

Notably, Data.Vector (the boxed vector type) is lazy in its elements: if you force the vector itself, you are not forcing the individual values. For an FFT I would assume that the length of the resulting vector does not depend on any of the values.

Greets,
Ertugrul

--
Not to be or to be and (not to be or to be and (not to be or to be and
(not to be or to be and ... that is the list monad.
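The difference between shallow and deep forcing can be seen in a few lines. This is a minimal sketch, assuming a plain Haskell list as a stand-in for the transform's output (boxed Data.Vector elements are lazy in the same way); `lazyTransform` is a hypothetical name, not part of any FFT library. It needs only base and deepseq, which ships with GHC.

```haskell
import Control.DeepSeq (force)
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical stand-in for a transform result: the spine (and hence the
-- length) is cheap to produce, but every element is an unevaluated thunk.
-- Element 3 is a deliberate "bomb" that reveals when elements get evaluated.
lazyTransform :: Int -> [Double]
lazyTransform n =
  [ if k == 3 then error "element 3 was evaluated" else fromIntegral k
  | k <- [0 .. n - 1] ]

main :: IO ()
main = do
  let ys = lazyTransform 8
  -- Shallow: length walks the spine only and never touches element 3.
  print (length ys)
  -- WHNF: evaluate only forces the outermost constructor.
  _ <- evaluate ys
  putStrLn "WHNF evaluation succeeded"
  -- Deep: force (from deepseq) evaluates every element, so the bomb fires.
  r <- try (evaluate (force ys)) :: IO (Either ErrorCall [Double])
  putStrLn (either (const "deep force evaluated element 3") (const "no error") r)
```

Running it prints `8`, then `WHNF evaluation succeeded`, then `deep force evaluated element 3`: only the deep force actually computes the values, which is what a benchmark has to time.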
Ertugrul:
I might be missing something in translation, but if I understand the intent of Takayuki's message, everything needs to be calculated because the C-based FFTW library is called (eventually). Laziness doesn't really have an impact. The choice of underlying data structure and whether FFTW wisdom is used clearly have a significant impact.

FFTW and Intel's MKL are the acknowledged "state of the art" libraries for performing discrete Fourier transforms. I'm not sure there's anything better or faster for CPU implementations. (I know there's an O(1) implementation for map-reduce systems, and there is NVIDIA's CUDA-FFT. Note that the map-reduce approach has a preprocessing step that isn't O(1).)

It's interesting to note that much of the code for FFTW was initially generated using OCaml, to find optimal versions of code for particular problem sizes.

-scooter

On Sun, Aug 5, 2012 at 6:37 PM, Ertugrul Söylemez <[hidden email]> wrote:
Scott Michel <[hidden email]> wrote:
> I might be missing something in translation, but if I understand
> Takayuki's message's intent, everything needs to be calculated because
> the C-based FFTW library is called (eventually). Laziness doesn't
> really have an impact.
>
> The choice of underlying data structure and whether FFTW wisdom is
> used clearly has a significant impact.

If the Haskell wrapper library is a thick enough lazy layer around FFTW, the size of the result vector may not depend on any FFTW computation at all. Again, I have no experience at all with FFTW or any Haskell bindings to it. This is just a general remark that is worth keeping in mind.

Greets,
Ertugrul

--
Key-ID: E5DD8D11 "Ertugrul Soeylemez <[hidden email]>"
FPrint: BD28 3E3F BE63 BADD 4157 9134 D56A 37FA E5DD 8D11
Keysrv: hkp://subkeys.pgp.net/
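Ertugrul's scenario (the wrapper knows the output size without running the foreign call) can be made concrete with a toy model. This is a sketch under loud assumptions: `Result`, `lazyWrapper` and `expensiveCall` are hypothetical names, and this is not how vector-fftw or any other real binding is claimed to work. `expensiveCall` stands in for an FFI call and an IORef counts how often it actually runs.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical result type of a "thick" lazy wrapper.
data Result = Result
  { resultLength :: !Int      -- strict field: known from the input alone
  , resultValues :: [Double]  -- lazy field: filled in by a deferred call
  }

-- Stand-in for an expensive foreign call; the IORef counts how often it runs.
expensiveCall :: IORef Int -> [Double] -> IO [Double]
expensiveCall calls xs = do
  modifyIORef' calls (+ 1)
  return (map (* 2) xs)

-- The wrapper reports the output size without running the call; the call
-- only happens once somebody demands resultValues.
lazyWrapper :: IORef Int -> [Double] -> Result
lazyWrapper calls xs = Result (length xs) (unsafePerformIO (expensiveCall calls xs))
{-# NOINLINE lazyWrapper #-}

main :: IO ()
main = do
  calls <- newIORef 0
  let r = lazyWrapper calls [1, 2, 3]
  print (resultLength r)        -- the size is available...
  readIORef calls >>= print     -- ...but the call has not run yet
  print (sum (resultValues r))  -- demanding the values runs the call
  readIORef calls >>= print
```

It prints `3`, `0`, `12.0`, `1`: a benchmark that only inspected `resultLength` would time essentially nothing.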
Takayuki Muranushi wrote:
> * vector-fftw with wisdom was more than 1/2 times faster than fftw in
>   C with wisdom (and with communication overhead.)

I would be suspicious of that result. Calling a C function from a library should be slower from Haskell than from C.

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
Dear Ertugrul, Scott and Erik, thank you for your comments.
With respect to laziness: I made the solvers calculate the amplitude of the final FFT results (i.e. calculate the squared magnitude of the array elements and sum over them), compare it with the expected result, and cause side effects depending on the outcome of that test. This should cause the FFT chain to be fully evaluated.

>> * vector-fftw with wisdom was more than 1/2 times faster than fftw in
>>   C with wisdom (and with communication overhead.)
>
> I would be suspicious of that result. Calling a C function from a library
> should be slower from Haskell than from C.

Sorry for the confusion. What I meant is that the vector-fftw version takes more time than the C version, but less than twice as much. Please compare the two lines

* "fft/cpp 1 1048576 102"
* "fft/vector-fftw 0 1048576 102"

in http://paraiso-lang.org/html/bench-dft-in-haskell.html .

P.S. Including GPU contestants would be interesting!

2012/8/6 Erik de Castro Lopo <[hidden email]>:
> Takayuki Muranushi wrote:
>
>> * vector-fftw with wisdom was more than 1/2 times faster than fftw in
>>   C with wisdom (and with communication overhead.)
>
> I would be suspicious of that result. Calling a C function from a library
> should be slower from Haskell than from C.
>
> Erik

Best,
--
Takayuki MURANUSHI
The Hakubi Center for Advanced Research, Kyoto University
http://www.hakubi.kyoto-u.ac.jp/02_mem/h22/muranushi.html
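The checksum approach described above (sum the squared magnitudes, compare with an expected value, and make a side effect depend on the comparison) can be sketched in plain Haskell. Assumptions: `naiveDft` is a slow O(N^2) DFT standing in for the library under test, and the expected value comes from Parseval's theorem for the unnormalized forward DFT, sum |X_k|^2 = N * sum |x_n|^2; the thread does not say which expected results were actually used.

```haskell
import Data.Complex (Complex (..), cis, magnitude)
import System.Exit (exitFailure)

-- Naive O(N^2) forward DFT, a stand-in for the library under test:
-- X_k = sum_n x_n * exp(-2*pi*i*k*n/N), unnormalized.
naiveDft :: [Complex Double] -> [Complex Double]
naiveDft xs =
  [ sum [ x * cis (-2 * pi * fromIntegral (k * j) / n) | (j, x) <- zip [0 :: Int ..] xs ]
  | k <- [0 .. length xs - 1] ]
  where
    n = fromIntegral (length xs)

-- Sum of squared magnitudes; demanding this number forces every element.
checksum :: [Complex Double] -> Double
checksum = sum . map (\z -> magnitude z ^ (2 :: Int))

-- Parseval check with a relative tolerance for rounding error.
parsevalHolds :: [Complex Double] -> Bool
parsevalHolds xs = abs (lhs - rhs) <= 1e-9 * max 1 rhs
  where
    lhs = checksum (naiveDft xs)
    rhs = fromIntegral (length xs) * checksum xs

main :: IO ()
main = do
  let input = [ fromIntegral j :+ 0 | j <- [0 .. 63 :: Int] ]
  -- The side effect depends on the test result, so the whole chain must run.
  if parsevalHolds input
    then putStrLn "ok"
    else putStrLn "checksum mismatch" >> exitFailure
```

Because the branch taken depends on every output element, the compiler cannot skip any part of the transform.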
Takayuki Muranushi wrote:
> >> * vector-fftw with wisdom was more than 1/2 times faster than fftw in
> >>   C with wisdom (and with communication overhead.)
>
> > I would be suspicious of that result. Calling a C function from a library
> > should be slower from Haskell than from C.
>
> Sorry for the confusion. What I meant is that the vector-fftw version takes
> more time than the C version, but less than twice as much.

That makes much more sense. Whether you're calling fftw from C or from Haskell, it's still the fftw library doing most of the work. As you increase the FFT length, the relative difference between C and Haskell should decrease.

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
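A rough cost model makes this concrete. Assume (this is not measured anywhere in the thread) that FFTW's work grows like c N log2 N for both callers, and that the Haskell binding adds a fixed per-call cost b plus an O(N) cost a N for marshalling or copying the vectors:

```latex
T_{\mathrm{C}}(N) \approx c\,N\log_2 N, \qquad
T_{\mathrm{Hs}}(N) \approx c\,N\log_2 N + a\,N + b,

\frac{T_{\mathrm{Hs}}(N)}{T_{\mathrm{C}}(N)} \approx
1 + \frac{a}{c\,\log_2 N} + \frac{b}{c\,N\log_2 N}
\;\longrightarrow\; 1 \quad (N \to \infty).
```

Under this model the fixed overhead b becomes negligible quickly, but a per-element copying cost only decays like 1/log2 N, so the ratio approaches 1 rather slowly.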