Hi,
I'd like to have some points of view on this subject. I need to start a project on neural analysis and genetic algorithms that will process many millions of records in a Linux X86_64 box (some development also in a X32 computer). Maybe I'll use fuzzy logic, or case-based reasoning, or decision trees, too. Recently, I finished a small project on genetic algorithms with Python, but since run-time is very important, my next project will need a fast language. Either I'll go to C++ or similar language or I'll try a functiional language - Haskell. Apart from some features in Python, I've never programmed thinking in a functional way. I'd like to avoid having to learn C++. I'd like to concentrate in getting the work done and Haskell seems like a solution. But... My questions are: - Is Haskell able to read (also write to a point) data from databases in a fast and reliable way? (MySql or PostgreSQL) - how could I program something like this in Haskell: .. generate random population .. for each one of the population: .. for time period 1 to ten million: .. evaluate method 1, 2, 3, 4, 5, 6, .... ..evaluate fitness of each one .. generate new population based on results of previous generation It seems relatively intuitive for me to program this in an imperative language. But what about in Haskell? - Is Haskell suitable to process data like this in a fast way (aproximate to C++?) - In order for Haskell to be fast, coding is done in a 'natural' way or with use of special hidden details of the language? - Although I always liked math, I no longer have the knowledge I used to have several years ago. Is this important to help program in this funcional language? - Are there graphical packages available to plot results or is it easy to connect it to a Python (or C) library? - Is code easily reusable in different future projects? Since it has no objects... how can it be done? Sorry for all these questions, but I really need to know about this and that's why I want to read answers from knowledgeable people. Luis |
Hi. Why not? A friend of mine is a resercher in AI and is an expert in Neural network (solution spaces, matrices, etc). His programming language is Haskell. He is very happy to use Haskell (on MacOS X) in its reserach. Personally I use Haskell with Postgres using HDBC or calling the postgress function in pqlib using the Haskell FFI (may favourite way). I'm also an experienced C++ programmer (12 years) and I can say that C++ is a very complex language to learn from scratch and master it. Now I'm finding a little bit hard to switch from an imperative language to a functional language. Anyway Haskell is a very beautiful and elegant and powerful language that you can use for everything. About using a GUI, I've never used one on Mac. Sorry. I know that exist a Haskell extension called GTK2HS (see: http://www.haskell.org/gtk2hs/). If you want to use GHC (the de-facto compiler for Haskell), the code produced is very fast and you have a very powerful support for concurrency and parallelism. For further information go to: http://www.haskell.org/ghc/ where you can find a lot of useful documentation and answers to ur questions. Luca. > Date: Fri, 6 Nov 2009 13:58:16 +0000 > From: [hidden email] > To: [hidden email] > Subject: [Haskell-beginners] Is Haskell for me? > > Hi, > > I'd like to have some points of view on this subject. > > I need to start a project on neural analysis and genetic algorithms > that will process many millions of records in a Linux X86_64 box (some > development also in a X32 computer). Maybe I'll use fuzzy logic, or > case-based reasoning, or decision trees, too. > > Recently, I finished a small project on genetic algorithms with > Python, but since run-time is very important, my next project will > need a fast language. Either I'll go to C++ or similar language or > I'll try a functiional language - Haskell. > Apart from some features in Python, I've never programmed thinking in > a functional way. > I'd like to avoid having to learn C++. I'd like to concentrate in > getting the work done and Haskell seems like a solution. But... > > My questions are: > - Is Haskell able to read (also write to a point) data from databases > in a fast and reliable way? (MySql or PostgreSQL) > > - how could I program something like this in Haskell: > .. generate random population > .. for each one of the population: > .. for time period 1 to ten million: > .. evaluate method 1, 2, 3, 4, 5, 6, .... > ..evaluate fitness of each one > .. generate new population based on results of previous generation > > It seems relatively intuitive for me to program this in an imperative > language. But what about in Haskell? > > - Is Haskell suitable to process data like this in a fast way > (aproximate to C++?) > > - In order for Haskell to be fast, coding is done in a 'natural' way > or with use of special hidden details of the language? > > - Although I always liked math, I no longer have the knowledge I used > to have several years ago. Is this important to help program in this > funcional language? > > - Are there graphical packages available to plot results or is it easy > to connect it to a Python (or C) library? > > - Is code easily reusable in different future projects? Since it has > no objects... how can it be done? > > Sorry for all these questions, but I really need to know about this > and that's why I want to read answers from knowledgeable people. > > > Luis > _______________________________________________ > Beginners mailing list > [hidden email] > http://www.haskell.org/mailman/listinfo/beginners _________________________________________________________________ New Windows 7: Simplify what you do everyday. Find the right PC for you. http://www.microsoft.com/uk/windows/buy/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.haskell.org/pipermail/beginners/attachments/20091106/851a2e23/attachment.html |
In reply to this post by Luis P. Mendes
> - how could I program something like this in Haskell: ..
> generate random population Since Haskell is a "pure" language, after you attribute a value to a variable they are glued together forever. So, the language itself can't have such a thing as a random number. (Here, someone with technical knowledge will probably correct me, but you get the poing.) Dealing with this (and other "outside world" stuff) requires learning how to deal with a special type construct called "Monad". It's the major step you will have to as a Haskell begginer. It's, though, extremely interesting, and extremely powerfull after you understand it. > - Although I always liked math, I no longer have the knowledge > I used to have several years ago. Is this important to help > program in this funcional language? No. But if you like math, you're probably going to find links to really interesting math while you learn. > (...) is it easy to connect it to a Python (or C) library? To C libraries the answer is yes. Actually, if I want to do C, I prefer to do it in Haskell :) A friend of mine, who uses neural networks in his MSc work, was impressed that in half an hour I could get better results using a low-level binding from libfann to Haskell (link below) than his own hand written code. http://hackage.haskell.org/package/bindings-fann > - Is code easily reusable in different future projects? Since it > has no objects... how can it be done? Yes, to a point you may find too radical at first. Even the most basic constructs are usually combinations of other pieces. After some time, you will start any code you write by searching pieces to join together instead of writing everything yourself. It is, though, usually easier to write code from scratch in Haskell than reuse code in imperative languages :) Biased opinions, of course, but I hope they help. Best, Maur?cio |
I'm certainly not an expert on Haskell, and my studies over the last
year have prevented me from really working in Haskell (or any language) much, but... Maur??cio CA wrote: > > - how could I program something like this in Haskell: .. > > generate random population > > Since Haskell is a "pure" language, after you attribute a value > to a variable they are glued together forever. So, the language > itself can't have such a thing as a random number. (Here, someone > with technical knowledge will probably correct me, but you get > the poing.) Dealing with this (and other "outside world" stuff) > requires learning how to deal with a special type construct > called "Monad". It's the major step you will have to as a Haskell > begginer. It's, though, extremely interesting, and extremely > powerfull after you understand it. to write random code. You need a basic understanding of "monads as computations", and you need to know how to use do notation, which should feel very much like programming in an imperative language. On the other hand, the rest of the algorithm Luis mentioned is *very* natural in Haskell once you learn how to deal with lists. > > > - Although I always liked math, I no longer have the knowledge > > I used to have several years ago. Is this important to help > > program in this funcional language? > > No. But if you like math, you're probably going to find links to > really interesting math while you learn. I Definitely agree. Being a math student, one of the things I like about Haskell is how it's a "down to earth" example of some very high level concepts showing up-- without necessarily needing to learn all the math that's there. (I spend more time learning the math than the Haskell, but that's a result of my biases, not need.) Now, your other (related) questions: - Is Haskell suitable to process data like this in a fast way (aproximate to C++?) - In order for Haskell to be fast, coding is done in a 'natural' way or with use of special hidden details of the language? Idiomatic Haskell won't be as fast as idiomatic C++, but it will blow Python away. Since I assume you'll be working with large data sets, lazy evaluation may actually make your life much easier. If you really are pushing for lightening fast code, you can do some Don Stewart style hacking, or you can link to C using the foreign function interface. I have no experience with either, but I've heard nothing but the best about Haskell/C interaction. Cheers, Cory |
In reply to this post by Luis P. Mendes
Hi!
Am Freitag, den 06.11.2009, 13:58 +0000 schrieb Luis P. Mendes: > - Is Haskell able to read (also write to a point) data from databases > in a fast and reliable way? (MySql or PostgreSQL) You might want to check out these links: http://software.complete.org/software/projects/show/hdbc http://software.complete.org/static/hdbc-odbc/doc//HDBC-odbc/Database-HDBC-ODBC.html http://sites.google.com/site/haskell/notes/connecting-to-mysql-with-haskell http://www.volker-wysk.de/mysql-hs/ > - how could I program something like this in Haskell: > .. generate random population > .. for each one of the population: > .. for time period 1 to ten million: > .. evaluate method 1, 2, 3, 4, 5, 6, .... > ..evaluate fitness of each one > .. generate new population based on results of previous generation You will have to change your thinking a lot. Forget about "for each one of the population". Think "Map a function to the list which contains the population". When I started I thought I knew map from Python - and in fact I did underestimate "map" all the time in Python. For time 1 to ten million evaluate a method? No - map "evaluate fitness" to the list [1..10000000] (Don't worry about big numbers - it's all lazily evaluated. "Generate new population based on previous generation"? It's just recursion and where will you find better recursion than in a functional language? > - Are there graphical packages available to plot results or is it easy > to connect it to a Python (or C) library? There are some graphical packages, but if everything else fails: Do fast computing in Haskell and plot the results using Python. > - Is code easily reusable in different future projects? Since it has > no objects... how can it be done? You do not reuse objects, you reuse functions - that's why they call it functional and yes > Sorry for all these questions, but I really need to know about this > and that's why I want to read answers from knowledgeable people. Sorry, I'm a beginner myself and would not actually call myself "knowledgeable". However, the learning curve is steep and lots of things that seem to be very complicated are soon very logic! You'll need some time to get the new ideas. If you got that, it's really worthwhile! Greets, Bernhard |
In reply to this post by Cory Knapp
On Friday 06 November 2009 09:52:48 am Cory Knapp wrote:
> Idiomatic Haskell won't be as fast as idiomatic C++, but it will blow > Python away. Based on the little bit of stuff I've done, I think I'd characterize it this way: C++ will be maybe twice as fast as Haskell. Maybe a little more, maybe a little less, depending on a lot of details. For heavy computation, Python will be a couple orders of magnitude slower than both. IOW, Haskell is slower than C++ but it's in the same ballpark. Would anyone disagree? Shawn. |
WARNING: THIS IS A VERY LONG REPLY
Luis, If you are in a hurry skip to the point where I give advices ;-) ----------------------------------------SKIP IF YOU LIKE TO---------------------------------------------------- I'll ask some more questions!! :-P - Nothing beats Assembly in speed. Why isn't everyone programming in Assembly?? - Python gives me the flexibility for fast prototyping, because it easily connects to C and Java libraries Why isn't every enterprise service bus (ESB) implementation written in Python?? - Ruby has Rails!! One of most successful MVC frameworks ever. Why isn't every system written in Ruby on Rails? Basically, what I am trying to say is that even though Haskell is not blazing fast, it can be made fast enough for you, given you use the right algorithms and optimizations. For this to happen, you must first understand, the functional paradigm, the language and how the compiler optimizes your code. For the rest of your questions: - Is Haskell able to read (also write to a point) data from databases in a fast and reliable way? (MySql or PostgreSQL) Yes, there is a lot of ways. Look at Hackage (HackageDB)<http://hackage.haskell.org/packages/hackage.html>for database related packages - how could I program something like this in Haskell: .. generate random population .. for each one of the population: .. for time period 1 to ten million: .. evaluate method 1, 2, 3, 4, 5, 6, .... ..evaluate fitness of each one .. generate new population based on results of previous generation It seems relatively intuitive for me to program this in an imperative language. But what about in Haskell? It looks rather imperative to me, either. You can implement imperative things on the IO and State monads, but I would really suggest you rethink the algorithm. - Is Haskell suitable to process data like this in a fast way (aproximate to C++?) How fast, again, depends on the algorithm. Haskell programs look like a collection of mathematical functions. One could write a program like a composition of functions: main= count . words . readfile and then define functions individually... - In order for Haskell to be fast, coding is done in a 'natural' way or with use of special hidden details of the language? The natural way makes you code right, but you have to think every step you do, since a badly organized recursion, or the excess of lazyness can make your program hog on memory and/or go dead slow. - Although I always liked math, I no longer have the knowledge I used to have several years ago. Is this important to help program in this funcional language? The math needed in your first steps is not hard. After you started learning Haskell you will most certainly step on Monads. These might require some abstract algebra, if you want to understand what's behind the curtains. Refrain yourself from trying to understand them using math and take a look on Philip Wadler's "Monads for functional programming"<http://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/baastad.pdf> - Are there graphical packages available to plot results or is it easy to connect it to a Python (or C) library? Hackage is the way to go. - Is code easily reusable in different future projects? Since it has no objects... how can it be done? Like all other languages, it all depends on YOU. Haskell has no objects, but it has polymorphic functions, which are just as powerful. It also has type classes, which are almost like C++ object classes, but completely different. Type classes can restrict or increase polymorphism, depending on how you use them. There are also MANY extensions implemented on GHC that expand the type system to its limit. ---------------------------------------- END OF SKIP BLOCK---------------------------------------------------- Is Haskell for you? I can't answer, but I can give some advice: 1) If you are in a hurry, go for what you know (C++, Java, Python, Assembly...) 2) If you really want to dig in, dig in. Learning Haskell is a wonderful experience and can dramatically change your way of programming. Think of a mind altering experience!!! 3) Even though it is hard to write great programs at first, in the end it is very rewarding. As your programming style improves, you'll see how elegant algorithms are implemented in functional programming. I still consider myself a beginner, but I can assure you Haskell is a great language for functional programming. Best regards, Rafael Gustavo da Cunha Pereira Pinto -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.haskell.org/pipermail/beginners/attachments/20091106/82db49e7/attachment-0001.html |
In reply to this post by Shawn Willden
Am Freitag 06 November 2009 18:19:50 schrieb Shawn Willden:
> On Friday 06 November 2009 09:52:48 am Cory Knapp wrote: > > Idiomatic Haskell won't be as fast as idiomatic C++, but it will blow > > Python away. > > Based on the little bit of stuff I've done, I think I'd characterize it > this way: C++ will be maybe twice as fast as Haskell. Maybe a little > more, maybe a little less, depending on a lot of details. Difficult waters. Depending on a lot of things, (rather idiomatic) Haskell is between a little faster (that's rare, however) and *much* slower than C (or C++, but I can only strongly advise against using that; pick C, C# or Java if you want to use a fast, halfway sensible imperative language. C++ consistently chose what I find the worst parts of both worlds - YMMV). Two things should be noted: - it's easy, especially if one isn't yet experienced, to write Haskell code that is much slower than C because of the wrong choice of data structures or not making things strict in the right places. - but if that happens, you have a good chance of quickly getting help to get up to speed here or on #haskell. That said, a factor of 2-3 relative to C is usually achievable without low-level tweaking, sometimes better, sometimes worse. > For heavy > computation, Python will be a couple orders of magnitude slower than both. For not very large values of 'couple': don't expect a factor of more than 100 - that's extremely rare. > > IOW, Haskell is slower than C++ but it's in the same ballpark. Yes, as a general rule, that's it. > > Would anyone disagree? Bulat? > > Shawn. |
In reply to this post by Shawn Willden
On 6 Nov 2009, at 17:19, Shawn Willden wrote: > Based on the little bit of stuff I've done, I think I'd characterize > it this > way: C++ will be maybe twice as fast as Haskell. Maybe a little > more, maybe > a little less, depending on a lot of details. For heavy > computation, Python > will be a couple orders of magnitude slower than both. To be fair, Python offloads its heavy lifting to C libraries - NumPy and SciPy run at very close to full C speed on large datasets. This is also how Matlab works. Unladen Swallow is an upcoming JIT compiler for Python. Where Haskell shines for computation is when you can leverage lazy evaluation. Cheers, G |
On 11/6/09, Gaius Hammond <[hidden email]> wrote:
> To be fair, Python offloads its heavy lifting to C libraries - NumPy > and SciPy run at very close to full C speed on large datasets. This is > also how Matlab works. Unladen Swallow is an upcoming JIT compiler for > Python. > > Where Haskell shines for computation is when you can leverage lazy > evaluation. *If* you can offload most of your work to SciPy. Depending on what you do this is at least difficult. I don't know how much of a neural network can be represented as big fat matrix :). -- Felipe. |
On Fri, Nov 6, 2009 at 4:58 PM, Felipe Lessa <[hidden email]> wrote:
> On 11/6/09, Gaius Hammond <[hidden email]> wrote: >> To be fair, Python offloads its heavy lifting to C libraries - NumPy >> and SciPy run at very close to full C speed on large datasets. This is >> also how Matlab works. Unladen Swallow is an upcoming JIT compiler for >> Python. >> >> Where Haskell shines for computation is when you can leverage lazy >> evaluation. > > *If* you can offload most of your work to SciPy. Depending on what you > do this is at least difficult. I don't know how much of a neural > network can be represented as big fat matrix :). > > -- > Felipe. Neural networking can be easily handled in both Python and C. I've used my own Graphine graph theory package for it and found that it was quite easy and reasonably fast, and certainly LEDA is a very high performance, pretty easy-to-use library. There's no need to coerce ANNs into a matrix form if you don't want to. Geremy Condra |
In reply to this post by Shawn Willden
On Friday 06 November 2009 17:19:50 Shawn Willden wrote:
> On Friday 06 November 2009 09:52:48 am Cory Knapp wrote: > > Idiomatic Haskell won't be as fast as idiomatic C++, but it will blow > > Python away. > > Based on the little bit of stuff I've done, I think I'd characterize it > this way: C++ will be maybe twice as fast as Haskell. Maybe a little > more, maybe a little less, depending on a lot of details. For heavy > computation, Python will be a couple orders of magnitude slower than both. > > IOW, Haskell is slower than C++ but it's in the same ballpark. > > Would anyone disagree? The long-standing bug in GHC's garbage collector that makes writing a single boxed value into an array O(n) instead of O(1), because the GC dirties and retraverses the entire array, makes it impossible to write efficient Haskell code to solve many basic problems. Here is a simple dictionary benchmark where Python is 4x faster than Haskell because this bug means it is impossible to implement a competitively- performant dictionary: http://flyingfrogblog.blogspot.com/2009/04/f-vs-ocaml-vs-haskell-hash-table.html Haskell's celebrated idiomatic quicksort is actually 6,000x slower than a traditional implementation on this machine and consumes asymptotically more memory (making it useless for quite mundane problems): http://www.haskell.org/haskellwiki/Introduction#Quicksort_in_Haskell Although people have created array-based quicksorts in Haskell the same perf bug in the GC means that a generic quicksort in Haskell would be asymptotically slower if it were given an array of boxed values. As an aside, purity complicates the creation of a parallel generic quicksort because it is necessary to mutate different parts of the same array in parallel. AFAICT, writing an efficient parallel generic quicksort is an unsolved problem in Haskell. Suffice to say, I cannot agree that Haskell is in the same ballpark as C++ in terms of performance. :-) -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e |
On 21/11/2009, at 15:36 , Jon Harrop wrote: > The long-standing bug in GHC's garbage collector that makes writing a single > boxed value into an array O(n) instead of O(1), because the GC dirties and > retraverses the entire array, makes it impossible to write efficient Haskell > code to solve many basic problems. Are you talking about http://hackage.haskell.org/trac/ghc/ticket/650 ? The way I read that ticket is that writing a boxed value to a mutable array is still O(1), but the garbage collector traverses the whole array during scanning. That could certainly slow down GC cycles, but how does it make array update O(n)? (update of the standard Haskell "pure" arrays being a separate issue, of course). Ben. |
Excerpts from Ben Lippmeier's message of Sat Nov 21 07:38:09 +0100 2009:
> > On 21/11/2009, at 15:36 , Jon Harrop wrote: > > > The long-standing bug in GHC's garbage collector that makes writing a single > > boxed value into an array O(n) instead of O(1), because the GC dirties and > > retraverses the entire array, makes it impossible to write efficient Haskell > > code to solve many basic problems. > > Are you talking about http://hackage.haskell.org/trac/ghc/ticket/650 ? > > The way I read that ticket is that writing a boxed value to a mutable array is still O(1), but the garbage collector traverses the whole array during > scanning. That could certainly slow down GC cycles, but how does it make array update O(n)? He means in worst case. Writing once, will cause O(n) during the next GC. Of course if you do a lot of updates between two GCs their is no extra penalty. So if you make 'k' updates between 2 GCs it costs you k*O(1)+O(n) which is still O(n) but practically nicer than k*O(n). -- Nicolas Pouillard http://nicolaspouillard.fr |
On 21/11/2009, at 21:13 , Nicolas Pouillard wrote: > Excerpts from Ben Lippmeier's message of Sat Nov 21 07:38:09 +0100 2009: >> >> On 21/11/2009, at 15:36 , Jon Harrop wrote: >> >>> The long-standing bug in GHC's garbage collector that makes writing a single >>> boxed value into an array O(n) instead of O(1), because the GC dirties and >>> retraverses the entire array, makes it impossible to write efficient Haskell >>> code to solve many basic problems. >> >> Are you talking about http://hackage.haskell.org/trac/ghc/ticket/650 ? >> >> The way I read that ticket is that writing a boxed value to a mutable array is still O(1), but the garbage collector traverses the whole array during >> scanning. That could certainly slow down GC cycles, but how does it make array update O(n)? > > He means in worst case. Writing once, will cause O(n) during the next GC. Of > course if you do a lot of updates between two GCs their is no extra penalty. > So if you make 'k' updates between 2 GCs it costs you k*O(1)+O(n) which is > still O(n) but practically nicer than k*O(n). > Hmm. I'd be careful about conflating algorithmic complexity with memory management issues. By the above reasoning, if I were to run any program using arrays on a system with a two space garbage collector (which copies all live objects during each GC) I could say the worst case algorithmic complexity was O(n). That doesn't sound right. I could take this further and say that in a virtual memory system, there is a chance that the whole heap gets copied to the disk and back between each array update. I might again say this has O(n) complexity, but it wouldn't be very helpful... Ben. |
Excerpts from Ben Lippmeier's message of Sat Nov 21 12:56:09 +0100 2009:
> > On 21/11/2009, at 21:13 , Nicolas Pouillard wrote: > > > Excerpts from Ben Lippmeier's message of Sat Nov 21 07:38:09 +0100 2009: > >> > >> On 21/11/2009, at 15:36 , Jon Harrop wrote: > >> > >>> The long-standing bug in GHC's garbage collector that makes writing a single > >>> boxed value into an array O(n) instead of O(1), because the GC dirties and > >>> retraverses the entire array, makes it impossible to write efficient Haskell > >>> code to solve many basic problems. > >> > >> Are you talking about http://hackage.haskell.org/trac/ghc/ticket/650 ? > >> > >> The way I read that ticket is that writing a boxed value to a mutable array is still O(1), but the garbage collector traverses the whole array during > >> scanning. That could certainly slow down GC cycles, but how does it make array update O(n)? > > > > He means in worst case. Writing once, will cause O(n) during the next GC. Of > > course if you do a lot of updates between two GCs their is no extra penalty. > > So if you make 'k' updates between 2 GCs it costs you k*O(1)+O(n) which is > > still O(n) but practically nicer than k*O(n). > > > Hmm. I'd be careful about conflating algorithmic complexity with memory management issues. By the above reasoning, if I were to run any program using > arrays on a system with a two space garbage collector (which copies all live objects during each GC) I could say the worst case algorithmic complexity was > O(n). That doesn't sound right. > I could take this further and say that in a virtual memory system, there is a chance that the whole heap gets copied to the disk and back between each > array update. I might again say this has O(n) complexity, but it wouldn't be very helpful... Your algorithm is O(1) but the current run-time system can take a time closer to O(n). This could be the case with a two spaces GC or with swapping. -- Nicolas Pouillard http://nicolaspouillard.fr |
In reply to this post by Ben Lippmeier
On Saturday 21 November 2009 11:56:09 Ben Lippmeier wrote:
> Hmm. I'd be careful about conflating algorithmic complexity with memory > management issues. No need. Just look at how badly the performance scales for Haskell vs other languages. For example, inserting 1-16 million floating point key/values into a hash table: Haskell OCaml F# 1M: 3.198s 1.0x 1.129s 1.0x 0.080s 1.0x 2M: 8.498s 2.7x 2.313s 2.0x 0.138s 1.7x 4M: 25.697s 8.0x 4.567s 4.0x 0.281s 3.5x 8M: 97.994s 30.6x 10.450s 9.3x 0.637s 8.0x 16M: 388.080s 121.4x 23.261s 20.6x 1.338s 16.7x Note that Haskell is 290x slower than F# on that last test. In practice, you would turn to a purely functional dictionary in Haskell based upon balanced binary trees in order to work around this long-standing bug in the GC but those trees incur O(log n) indirections and typically run orders of magnitude slower than a decent hash table. Suffice to say, Haskell is nowhere near being in the ballpark of C++'s performance for basic functionality like dictionaries and sorting. > By the above reasoning, if I were to run any program > using arrays on a system with a two space garbage collector (which copies > all live objects during each GC) I could say the worst case algorithmic > complexity was O(n). That doesn't sound right. Can you write a program that demonstrates this effect as I did? > I could take this further and say that in a virtual memory system, there is > a chance that the whole heap gets copied to the disk and back between each > array update. Can you write a program that demonstrates this effect as I did? -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e |
In reply to this post by Luis P. Mendes
I'm not a Haskell expert-- in fact, I'm a beginner, and that's why I'm here,
but I read your blog post on the subject of this hash table program, and it seems to me from reading the comments in reply that you're just here trolling, because you're using an algorithm that is fundamentally not purely functional, and of course that's going to be slower, just like asking Joel Zumaya to pitch left handed. If you were being honest about your complaint, you'd make an apples-to-apples comparison, and a number of commenters on your blog have proposed implementations that perform much better than your example. Sorry if I'm just being a jerk, Nathan M. Holden, Haskell Beginner On Saturday 21 November 2009 12:55:03 pm [hidden email] wrote: > Message: 4 > Date: Sat, 21 Nov 2009 18:02:33 +0000 > From: Jon Harrop <[hidden email]> > Subject: Re: [Haskell-beginners] Re: Is Haskell for me? > To: Ben Lippmeier <[hidden email]> > Cc: beginners <[hidden email]> > Message-ID: <[hidden email]> > Content-Type: text/plain; charset="iso-8859-1" > > On Saturday 21 November 2009 11:56:09 Ben Lippmeier wrote: > > Hmm. I'd be careful about conflating algorithmic complexity with memory > > management issues. > > No need. Just look at how badly the performance scales for Haskell vs other > languages. For example, inserting 1-16 million floating point key/values > into a hash table: > > Haskell OCaml F# > 1M: 3.198s 1.0x 1.129s 1.0x 0.080s 1.0x > 2M: 8.498s 2.7x 2.313s 2.0x 0.138s 1.7x > 4M: 25.697s 8.0x 4.567s 4.0x 0.281s 3.5x > 8M: 97.994s 30.6x 10.450s 9.3x 0.637s 8.0x > 16M: 388.080s 121.4x 23.261s 20.6x 1.338s 16.7x > > Note that Haskell is 290x slower than F# on that last test. > > In practice, you would turn to a purely functional dictionary in Haskell > based upon balanced binary trees in order to work around this > long-standing bug in the GC but those trees incur O(log n) indirections > and typically run orders of magnitude slower than a decent hash table. > > Suffice to say, Haskell is nowhere near being in the ballpark of C++'s > performance for basic functionality like dictionaries and sorting. > > > By the above reasoning, if I were to run any program > > using arrays on a system with a two space garbage collector (which copies > > all live objects during each GC) I could say the worst case algorithmic > > complexity was O(n). That doesn't sound right. > > Can you write a program that demonstrates this effect as I did? > > > I could take this further and say that in a virtual memory system, there > > is a chance that the whole heap gets copied to the disk and back between > > each array update. > > Can you write a program that demonstrates this effect as I did? |
On Saturday 21 November 2009 18:57:21 you wrote:
> I'm not a Haskell expert-- in fact, I'm a beginner, and that's why I'm > here, but I read your blog post on the subject of this hash table program, > and it seems to me from reading the comments in reply that you're just here > trolling, because you're using an algorithm that is fundamentally not > purely functional, and of course that's going to be slower, just like > asking Joel Zumaya to pitch left handed. > > If you were being honest about your complaint, you'd make an > apples-to-apples comparison, and a number of commenters on your blog have > proposed implementations that perform much better than your example. > > Sorry if I'm just being a jerk, Not at all, that is a perfectly reasonable concern but I did already try to address it in my previous post: > > In practice, you would turn to a purely functional dictionary in Haskell > > based upon balanced binary trees in order to work around this > > long-standing bug in the GC but those trees incur O(log n) indirections > > and typically run orders of magnitude slower than a decent hash table. For example, the following Haskell program builds a purely functional Data.Map: module Main where import Prelude hiding (lookup) import Data.Map (empty, insert, lookup, size) n = 1000000 build m 0 = m build m n = build (insert x (1.0 / x) m) (n-1) where x = fromIntegral n :: Double main = do let m = build empty n (Just v) = lookup 100 m print $ size m print v Running this program with different "n" gives: Data.Map 1M: 2.797s 1.0x 2M: 6.090s 2.2x 4M: 14.226s 5.1x 8M: 28.449s 10.2x 16M: 83.171s 29.7x This is several times faster than Haskell's Data.Hashtable (because of the long-standing bug in their GC that I described) and is scaling better. However, if you compare with the timings I gave before: Data.Hashtable OCaml F# 1M: 3.198s 1.0x 1.129s 1.0x 0.080s 1.0x 2M: 8.498s 2.7x 2.313s 2.0x 0.138s 1.7x 4M: 25.697s 8.0x 4.567s 4.0x 0.281s 3.5x 8M: 97.994s 30.6x 10.450s 9.3x 0.637s 8.0x 16M: 388.080s 121.4x 23.261s 20.6x 1.338s 16.7x you'll see that the absolute performance of this idiomatic Haskell solution is still absolutely awful: consistently about 50x slower than the F#. This is also true in the context of sorting: Haskell's standard library routines for sorting are orders of magnitude slower than those found in most other compiled languages. Suffice to say, idiomatic Haskell is also nowhere near being in the same ballpark as C++ with respect to performance. Realistically, with enough expertise you should be able to optimize most of your Haskell programs to beat Python's performance but there are some important cases where you will not even be able to do that. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e |
In reply to this post by Jon Harrop
On 22/11/2009, at 5:02 , Jon Harrop wrote: > >> By the above reasoning, if I were to run any program >> using arrays on a system with a two space garbage collector (which copies >> all live objects during each GC) I could say the worst case algorithmic >> complexity was O(n). That doesn't sound right. > > Can you write a program that demonstrates this effect as I did? I believe your numbers, but I don't have a F# install to test against ATM. > Suffice to say, Haskell is nowhere near being in the ballpark of C++'s > performance for basic functionality like dictionaries and sorting. Sorry, I wasn't trying to start a pissing match. I just wanted to know what the issue was so I could avoid it in my own programs. Thanks! Ben. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.haskell.org/pipermail/beginners/attachments/20091121/76b2abd0/attachment-0001.html |
Free forum by Nabble | Edit this page |