Haskell Speed

RE: Re: Haskell Speed

Branimir Maksimovic-2


>From: Isaac Gouy <[hidden email]>
>To: [hidden email]
>Subject: [Haskell-cafe] Re: Haskell Speed
>Date: Tue, 27 Dec 2005 20:12:01 -0800 (PST)
>
>Branimir Maksimovic wrote:
> > Of course, first example uses [String] instead of
>Data.HashTable
> > as other languages do. Imagine C program does not
>use
> > hash,rather list, how it will perform:)
>
>And the author comments his program
>-- This is a purely functional solution to the
>problem.
>-- An alternative which keeps a mutable table of
>occurences would
>-- be faster.
>
Yes, I saw that, but that does not qualify as a benchmark with hash tables.
I mean the program is OK, but this is apples to oranges.

>We'll be happy to also show a Haskell program that
>uses Data.HashTable - first, someone needs to
>contribute that program.

OK, I've attached a hash table version. As I'm a Haskell newbie, I suppose
there is someone who can do much better.

>
>
> > I didn't look further after that.
>
>Ideal - you may criticize without the risk that others
>will criticize what you do.

Attachment follows; you can criticize my poor Haskell skills as much as you
want :)
This version is actually the D program translated to Haskell, plus the
file-parsing code from the original Haskell program. Now you can compare it,
no matter how slow it is :)

Greetings, Bane.


Attachment: knucleotide.hs (5K)

Re: Re: Haskell Speed

Albert Y. C. Lai
In reply to this post by Peter Simons
>   wc :: String -> (Int, Int, Int)
>   wc file = ( length (lines file)
>             , length (words file)
>             , length file
>             )

Most people tend to think that even imperative-programming novices
wouldn't start their obvious solutions from something as inefficient as
this.  Well, I beg to differ.

For almost a decade, most (I dare claim even all) Pascal and C
compilers were "three-pass" or "two-pass".  It means perhaps the
compiler reads the input two or three times (each for a different
task, just like the above), or perhaps the compiler reads the input
once, produces an intermediate form and saves it to disk, then reads
the intermediate form from disk, produces a second intermediate form
and saves it to disk, then reads the second intermediate form from
disk, then it can produce machine code.

It must have been the obvious method, since even though it was
obviously crappy, everyone was doing it, and "everyone" obviously
refers to both novices and gurus.  It must also have been pretty
non-obvious how to transition from the obvious slow method to a
one-pass fast method, since for almost a decade no one did it.  Most
of us had to wait until someone figured it out and then we had Turbo
Pascal.

And all of that happened in imperative programming.

Re: Haskell Speed

Isaac Gouy-2
In reply to this post by Daniel Carrera-2
--- Isaac Gouy <[hidden email]> wrote:
> We'll be happy to also show a Haskell program that
> uses Data.HashTable - first, someone needs to
> contribute that program.

Someone did:  k-nucleotide Haskell GHC #2
http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotide&lang=all


       
               

Re: Haskell Speed

Peter Simons
In reply to this post by Albert Y. C. Lai
Albert Lai writes:

 > For almost a decade, most (I dare claim even all) Pascal
 > and C compilers were "three-pass" or "two-pass". It means
 > perhaps the compiler reads the input two or three times
 > [...], or perhaps the compiler reads the input once,
 > produces an intermediate form and saves it to disk, then
 > reads the intermediate form from disk, produces a second
 > intermediate form and saves it to disk, then reads the
 > second intermediate form from disk, then it can produce
 > machine code.
 >
 > It must have been the obvious method, since even though
 > it was obviously crappy [...].

I beg to differ. Highly modular software is not necessarily
crappy if you're writing something as complex as a C or
Pascal compiler -- especially in times when RAM existed
only in minuscule amounts. A highly modularized algorithm
for counting lines, words, and characters in an input file,
however, is something altogether different. I doubt anyone
but the most inexperienced novices would have tried to do
that in three passes in a strict language.
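
For reference, such a one-pass version is a small strict fold; here is a
sketch (an illustration only, not code from the original posts):

{-# LANGUAGE BangPatterns #-}
import Data.Char (isSpace)
import Data.List (foldl')

-- one traversal instead of three: a strict fold carrying
-- (lines, words, chars) plus a flag for "inside a word"
wc1 :: String -> (Int, Int, Int)
wc1 s = (l, w, c)
  where
    (l, w, c, _) = foldl' step (0, 0, 0, False) s
    step (!ls, !ws, !cs, !inWord) ch
      | ch == '\n' = (ls + 1, ws, cs + 1, False)
      | isSpace ch = (ls, ws, cs + 1, False)
      | inWord     = (ls, ws, cs + 1, True)
      | otherwise  = (ls, ws + 1, cs + 1, True)

main :: IO ()
main = print (wc1 "hello world\nfoo\n")   -- (2,3,16)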

Peter


Re: Re: Haskell Speed

Bill Wood-3
In reply to this post by Albert Y. C. Lai
On Thu, 2005-12-29 at 15:56 -0500, Albert Lai wrote:
   . . .
> one-pass fast method, since for almost a decade no one did it.  Most
> of us had to wait until someone figured it out and then we had Turbo

Judging from comments by U. Ammann [1],[2], part of the original Pascal
implementation team at ETH Zurich, the first Pascal compiler for the CDC
6x00 family of computers was written in 1970-1971, and a second, developed
from scratch for the same computers but compiling the revised language,
was written in 1972-1974.  I think both of these compilers were one-pass.

 -- Bill Wood

[1] U. Ammann, "The Zurich Implementation", in D.W. Barron (ed.),
    _Pascal, The Language and its Implementation_, pp. 63-82, John Wiley
    and Sons, 1981.

[2] U. Ammann, "Code Generation for a Pascal Compiler", in D.W. Barron
    (ed.),  _Pascal, The Language and its Implementation_, pp. 83-124,
    John Wiley and Sons, 1981.



Re[2]: Re: Haskell Speed

Bulat Ziganshin
In reply to this post by Albert Y. C. Lai
Hello Albert,

Thursday, December 29, 2005, 11:56:12 PM, you wrote:

AL> For almost a decade, most (I dare claim even all) Pascal and C
AL> compilers were "three-pass" or "two-pass".

1) Pascal was designed as a one-pass compiled language;
highly optimizing compilers used additional passes to generate better
code

2) multiple passes were used in times when there was simply not enough
memory to perform all the passes in parallel

--
Best regards,
 Bulat                            mailto:[hidden email]




RE: Re: Haskell Speed

Branimir Maksimovic-2
In reply to this post by Isaac Gouy-2



>From: Isaac Gouy <[hidden email]>
>To: [hidden email]
>Subject: [Haskell-cafe] Re: Haskell Speed
>Date: Thu, 29 Dec 2005 13:00:15 -0800 (PST)
>
>--- Isaac Gouy <[hidden email]> wrote:
> > We'll be happy to also show a Haskell program that
> > uses Data.HashTable - first, someone needs to
> > contribute that program.
>
>Someone did:  k-nucleotide Haskell GHC #2
>http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotide&lang=all
>
>
To comment on some observations about this program:
Most of the pressure now is on Data.HashTable.
I suspected the large memory usage came from the substring-from-array
conversions, so I made a version with
data MyString = MakeMyString { buf :: Ptr Char, size :: Int },
and there were no substrings in the map or anywhere else, but the memory
consumption remained.
After eliminating inserts and updates into the HashTable, memory was OK.
Finally I realized that updates into the hash table actually increase
memory usage to a large extent instead of keeping memory roughly the same
on average. So I guess this is a bug in HashTable?
Second, in this test the hash function needs to be very strong,
as even with the following I got a longest chain of 16 elements:
myHashString = fromIntegral . ff''' . ff'' . ff' . foldr f 0
      where f c m = f'' $ f' (ord c + m)
            f' m = m + (m `shiftL` 10)
            f'' m = m `xor` (m `shiftR` 6)
            ff' m = m + (m `shiftL` 3)
            ff'' m = m `xor` (m `shiftR` 11)
            ff''' m = m + (m `shiftL` 15)
The default hashString has a longest chain of 18 elements.
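
For context, a custom hash like this plugs into the old Data.HashTable API
through new, which takes an equality test and a hash function returning
Int32. A minimal, self-contained sketch; the example usage is illustrative,
not the benchmark code from the attachment:

import qualified Data.HashTable as H
import Data.Bits (shiftL, shiftR, xor)
import Data.Char (ord)
import Data.Int (Int32)

-- the custom hash from above, reproduced with the imports it needs
myHashString :: String -> Int32
myHashString = fromIntegral . ff''' . ff'' . ff' . foldr f 0
      where f c m = f'' $ f' (ord c + m)
            f' m = m + (m `shiftL` 10)
            f'' m = m `xor` (m `shiftR` 6)
            ff' m = m + (m `shiftL` 3)
            ff'' m = m `xor` (m `shiftR` 11)
            ff''' m = m + (m `shiftL` 15)

main :: IO ()
main = do
  ht <- H.new (==) myHashString         -- equality test + custom hash
  H.insert ht "GGT" (1 :: Int)
  _ <- H.update ht "GGT" 2              -- update reports whether the key existed
  H.lookup ht "GGT" >>= print           -- Just 2
  H.longestChain ht >>= print . length  -- the chain-length diagnostic quoted above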
Perhaps someone can explain whether such memory leaks from HashTable updates
are normal, or whether they are bugs?
All in all, the functional version consumes less memory and is twice as
fast.

Greetings, Bane.


Re: Re: Haskell Speed

Jan-Willem Maessen

On Dec 29, 2005, at 7:44 PM, Branimir Maksimovic wrote:

> To comment on some observations about this program:
> Most of the pressure now is on Data.HashTable.
> I suspected the large memory usage came from the substring-from-array
> conversions,
> so I made a version with data MyString = MakeMyString { buf :: Ptr Char,
> size :: Int },
> and there were no substrings in the map or anywhere else, but the memory
> consumption remained.
> After eliminating inserts and updates into the HashTable, memory was OK.
> Finally I realized that updates into the hash table actually increase
> memory usage to a large extent instead of keeping memory roughly the same
> on average. So I guess this is a bug in HashTable?

Probably.  The minimum table chunk size was rather large.  I have  
been experimenting (tests are running even as I type) with alternate  
implementations of Data.HashTable.  So far the winning implementation  
is one based on multiplicative hashing and simple table doubling.  
This seems to be consistently competitive with / faster than  
Data.Map.  At this point my inclination is to make the whole suite  
available:

Data.HashTable.Class
Data.HashTable.DataMap
Data.HashTable.Doubling
Data.HashTable.Flat
Data.HashTable.Freeze
Data.HashTable.Modulus
Data.HashTable.Multiplicative
Data.HashTable.Original
Data.HashTable.Test
Data.HashTable

I've separated out the hashing technique (Modulus, Multiplicative)  
from the hash table implementation.  Note that a few of the above are  
bogus, and this is only a fraction of the implementations tried thus  
far.
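
For intuition, Knuth-style multiplicative hashing picks a bucket by
multiplying the key's hash by a constant derived from the golden ratio and
keeping the top bits, and table doubling just raises the bit count. A sketch
of the general technique only - Jan-Willem's actual implementation was not
posted in this thread:

import Data.Bits (shiftR)
import Data.Word (Word32)

-- Knuth's multiplicative method: for a table of 2^k buckets, multiply
-- by floor(2^32 / golden ratio) = 2654435769 and keep the top k bits.
-- Word32 arithmetic wraps modulo 2^32, which is exactly what we want.
bucket :: Int -> Word32 -> Word32
bucket k h = (h * 2654435769) `shiftR` (32 - k)

-- doubling the table from 2^k to 2^(k+1) buckets means rehashing
-- every stored key with the new k
main :: IO ()
main = print (bucket 10 12345, bucket 11 12345)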

I need to get distribution clearance for a bunch of this code from my  
employer, and figure out how to package it.  The latter may be  
tricky, as Data.HashTable is currently rather intimately tied to a
bunch of rather tricky bits of GHC.

-Jan-Willem Maessen

> Second, in this test the hash function needs to be very strong,
> as even with the following I got a longest chain of 16 elements:
> myHashString = fromIntegral . ff''' . ff'' . ff' . foldr f 0
>      where f c m = f'' $ f' (ord c + m)
>            f' m = m + (m `shiftL` 10)
>            f'' m = m `xor` (m `shiftR` 6)
>            ff' m = m + (m `shiftL` 3)
>            ff'' m = m `xor` (m `shiftR` 11)
>            ff''' m = m + (m `shiftL` 15)
> The default hashString has a longest chain of 18 elements.
> Perhaps someone can explain whether such memory leaks from HashTable
> updates
> are normal, or whether they are bugs?
> All in all, the functional version consumes less memory and is twice as
> fast.
>
> Greetings, Bane.
>


Re[2]: Re: Haskell Speed

Bulat Ziganshin
In reply to this post by Branimir Maksimovic-2
Hello Branimir,

Friday, December 30, 2005, 3:44:26 AM, you wrote:
BM> myHashString = fromIntegral . ff''' . ff'' . ff' . foldr f 0

I use the following hash function in my program:

filenameHash  =  fromIntegral . foldl (\h c -> h*37+(ord c)) 0

(it's written without an explicit parameter to try to help the GHC
machinery optimize this function better)
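
A hedged aside: foldl here is the lazy left fold, which builds one thunk per
character; a strict variant forces the accumulator as it goes. A sketch, not
Bulat's actual code:

import Data.Char (ord)
import Data.List (foldl')

-- same hash, but foldl' forces the accumulator at each step instead
-- of building a chain of thunks along the whole string
filenameHash' :: String -> Int
filenameHash' = foldl' (\h c -> h * 37 + ord c) 0

main :: IO ()
main = print (filenameHash' "knucleotide.hs")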

BM> All in all, the functional version consumes less memory and is twice as
BM> fast.

IOArrays are second-class citizens in GHC/Haskell: they are scanned on
_each_ GC, and this can substantially slow down a program which uses
large IOArrays. Below are two runs of almost the same program using
different array types. This program unpickles 10^6 Word32 values,
using

1) Array Int Word32
2) IOArray Int Word32

The first program just unpickles a list and then uses listArray to transform
it into an Array, while the second program writes the unpickled values directly
into an IOArray. The second solution uses less memory, creates less temporary
data, and obviously ought to be faster, but it actually runs 8 times slower
because of these dramatic GC times.

For your program, things will not be so bad, because the IOArray used inside
your HashTable contains far fewer than a million elements and
because your program has more to do itself. But try to compare the MUT
times of the functional and imperative versions - maybe your algorithm is
really faster and only the GC time makes the comparison unfair.

.. writing this message, I thought that reducing the number of GCs could
speed up my program and yours too. So there is a third variant -
using IOArray, but with "+RTS -A100m"
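
For concreteness, here is a sketch of just the two array constructions being
timed below; the put_/get pickling calls come from Bulat's own (unshown)
library and are not reconstructed here:

import Data.Array (Array, listArray, (!))
import Data.Array.IO (IOArray, newListArray, readArray)
import Data.Word (Word32)

main :: IO ()
main = do
  -- 1) unpickle into a list, then build an immutable Array from it
  let arr = listArray (0, 10^6 - 1) [1 .. 10^6] :: Array Int Word32
  print (arr ! 999999)
  -- 2) write the unpickled values directly into a mutable IOArray
  ioarr <- newListArray (0, 10^6 - 1) [1 .. 10^6] :: IO (IOArray Int Word32)
  readArray ioarr 999999 >>= print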


>1) Array Int Word32

make array: 1.392 secs (arr = listArray (0,10^6-1) [1..10^6] :: Array Int Word32)
put array: 1.833 secs  (put_ h arr)
get array: 2.163 secs  (a <- get h :: IO (Array Int Word32))

381,202,596 bytes allocated in the heap
143,766,944 bytes copied during GC
 24,937,228 bytes maximum residency (8 sample(s))

       1439 collections in generation 0 (  1.65s)
          8 collections in generation 1 (  0.77s)

         54 Mb total memory in use

  INIT  time    0.01s  (  0.00s elapsed)
  MUT   time    2.88s  (  2.93s elapsed)
  GC    time    2.42s  (  2.45s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    5.32s  (  5.39s elapsed)

  %GC time      45.6%  (45.5% elapsed)

  Alloc rate    131,721,698 bytes per MUT second

  Productivity  54.2% of total user, 53.5% of total elapsed



>2) IOArray Int Word32

make array: 6.569 secs (v <- newListArray (0,10^6-1) [1..10^6] :: IO (IOArray Int Word32))
put array: 14.110 secs (put_ h v)
get array: 23.874 secs (a <- get h :: IO (IOArray Int Word32))

360,377,080 bytes allocated in the heap
 87,007,920 bytes copied during GC
 18,237,472 bytes maximum residency (6 sample(s))

       1344 collections in generation 0 ( 40.77s)
          6 collections in generation 1 (  0.38s)

         36 Mb total memory in use

  INIT  time    0.01s  (  0.00s elapsed)
  MUT   time    3.15s  (  3.16s elapsed)
  GC    time   41.15s  ( 41.40s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   44.31s  ( 44.56s elapsed)

  %GC time      92.9%  (92.9% elapsed)

  Alloc rate    114,007,301 bytes per MUT second

  Productivity   7.1% of total user, 7.1% of total elapsed


 
>3) IOArray Int Word32 with "+RTS -A100m"

make: 0.391 secs
put: 2.343 secs
get: 2.003 secs

440,654,968 bytes allocated in the heap
 26,413,568 bytes copied during GC
 20,043,828 bytes maximum residency (2 sample(s))

          4 collections in generation 0 (  0.23s)
          2 collections in generation 1 (  0.64s)

        120 Mb total memory in use

  INIT  time    0.02s  (  0.00s elapsed)
  MUT   time    3.81s  (  3.83s elapsed)
  GC    time    0.87s  (  0.90s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    4.71s  (  4.74s elapsed)

  %GC time      18.6%  (19.1% elapsed)

  Alloc rate    114,963,466 bytes per MUT second

  Productivity  81.0% of total user, 80.5% of total elapsed
 

--
Best regards,
 Bulat                            mailto:[hidden email]




RE: Re[2]: Re: Haskell Speed

Branimir Maksimovic-2



>From: Bulat Ziganshin <[hidden email]>
>Reply-To: Bulat Ziganshin <[hidden email]>
>To: "Branimir Maksimovic" <[hidden email]>
>CC: [hidden email], [hidden email]
>Subject: Re[2]: [Haskell-cafe] Re: Haskell Speed
>Date: Fri, 30 Dec 2005 17:56:57 +0300
>
>Hello Branimir,
>
>Friday, December 30, 2005, 3:44:26 AM, you wrote:
>BM> myHashString = fromIntegral . ff''' . ff'' . ff' . foldr f 0
>
>I use the following hash function in my program:
>
>filenameHash  =  fromIntegral . foldl (\h c -> h*37+(ord c)) 0
>
>(it's written without an explicit parameter to try to help the GHC
>machinery optimize this function better)

I've found that the hashing function, and anything else that does calculation,
is fast enough (I compared an imported C function with Haskell and found no
real difference; Haskell is fast regarding calculations).
The problem is the memory leak.
In this case the hash function has to be strong in order to avoid
linear search. When the memory leak with HashTable disappears I will
use some even better hash functions.

>
>BM> All in all, the functional version consumes less memory and is twice as
>BM> fast.
>
>IOArrays are second-class citizens in GHC/Haskell: they are scanned on
>_each_ GC, and this can substantially slow down a program which uses
>large IOArrays.

Hm, there is the Hans Boehm GC for C and C++, and I have gcmalloc and
gcmalloc_atomic. Why isn't this present in Haskell? gcmalloc_atomic is useful
when allocating large arrays that do not contain any references/pointers.

>
>For your program, things will not be so bad, because the IOArray used inside
>your HashTable contains far fewer than a million elements and
>because your program has more to do itself. But try to compare the MUT
>times of the functional and imperative versions - maybe your algorithm is
>really faster and only the GC time makes the comparison unfair.

Hm, that could be solved if the hash table used gcmalloc_atomic, I guess
(if all the execution time goes to GC scans).

>
>.. writing this message, I thought that reducing the number of GCs could
>speed up my program and yours too. So there is a third variant -
>using IOArray, but with "+RTS -A100m"
>
>
Wow, that option almost doubled the speed of the HashTable program (the memory
leak remains).
Thank you for enlightening me about GC. After this I can think of
gc_add_scanrange, gc_remove_scanrange and gcmalloc_atomic
functions, as they are now common in other languages.
We need a low-level GC interface in order to produce fast
programs.

Greetings, Bane.


Re[4]: Re: Haskell Speed

Bulat Ziganshin
Hello Branimir,

Saturday, December 31, 2005, 4:55:51 AM, you wrote:
>>IOArrays are second-class citizens in GHC/Haskell: they are scanned on
>>_each_ GC, and this can substantially slow down a program which uses
>>large IOArrays.

BM> Hm, there is the Hans Boehm GC for C and C++, and I have gcmalloc and
BM> gcmalloc_atomic. Why isn't this present in Haskell? gcmalloc_atomic is
BM> useful
BM> when allocating large arrays that do not contain any references/pointers.

because these arrays CONTAIN references :)  they reference all the
values inserted into the array so far

there is also IOUArray, which holds plain unboxed values of simple
types (Int, Float, ...) and doesn't slow down the program
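
A minimal sketch of the unboxed flavour, whose element area the GC never has
to scan:

import Data.Array.IO (IOUArray, newArray, readArray, writeArray)
import Data.Word (Word32)

main :: IO ()
main = do
  -- the elements are raw Word32s, not pointers, so the GC skips them
  a <- newArray (0, 9) 0 :: IO (IOUArray Int Word32)
  writeArray a 3 42
  readArray a 3 >>= print   -- prints 42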

>>.. writing this message, I thought that reducing the number of GCs could
>>speed up my program and yours too. So there is a third variant -
>>using IOArray, but with "+RTS -A100m"
>>
>>
BM> Wow, that option almost doubled the speed of the HashTable program (the
BM> memory leak remains).

Use the option "+RTS -sstderr" to see the runtime of the program itself ("MUT
time") and the "GC time" separately. See section "4.14.2. RTS options to
control the garbage collector" of the GHC guide for more information.


--
Best regards,
 Bulat                            mailto:[hidden email]




Re: Haskell Speed

Brent Fulgham
In reply to this post by Daniel Carrera-2

> I recently stumbled across this thread, related to the Shootout  
> (which I run) and wanted to add a few comments.
>
> First, I agree that I am a Haskell novice; however, I rely on  
> people like yourselves to show me where I am doing things wrong.  
> If our solutions are horribly inefficient, please tell us what we  
> can do to resolve the problem.  But note that in some cases, we are  
> purposely being inefficient so that we can measure the cost of  
> certain basic operations.
>
> Doug Bagely started the Shootout as a comparison of scripting  
> languages.  After I revived it, I started focussing more on  
> functional and logic languages, since I was curious how these  
> languages would fare in comparison with the more typically-used  
> languages like C/C++.  Consequently, many of the tests show this  
> heritage and are highly imperative.  We have been working to relax  
> some of these rules, and have added many new tests that try to be  
> more amenable to comparison of lazy and 'strict' languages.
>
> In our defense I would like to plead that it is not easy to devise  
> short problems (e.g., that can be expressed in a page or so of code  
> -- we try to limit it to 100 lines of C++ code as a rough
> benchmark), and that also take enough run time for a meaningful  
> measurement.  So, many of the tests initially suffered from foolish  
> looping and repetition as a means of extending the run times  
> sufficiently to allow measurement.  We have been changing tests so  
> that fewer of them rely on these kinds of activities to generate  
> results.  We are always interested in new ideas, and would consider  
> any benchmark ideas you have.
>
> Finally, I don't agree at all that Haskell's scores show poorly.  
> The GHC compiler is one of the better performers, and regularly  
> trounces other languages when we let it do its job.  For example,  
> we had to jump through many hoops to convince the compiler to  
> actually invoke all the loops required to do the "same way" tests  
> on methodcall and similar benchmarks, because the compiler was so  
> sharp at identifying wasted effort and optimizing it away.  Perhaps  
> we don't say this strongly enough on the site, but I was truly  
> impressed at its ability to toss aside wasted calculations and  
> focus on the core problem.  I'll admit it was frustrating because  
> we wanted to see how long it would take for Haskell to make N  
> recursive calls, or what have you, but the fact that the compiler  
> saw that there was no need to do the recursion to get the result is  
> quite amazing!  I'm sure this would be extremely useful when  
> compiling real-world programs.
>
> There have been many cases where the "obvious" solution in Haskell  
> is a great performer, but we discover that it's so fast because it  
> has tossed aside wasted cycles and just done the work necessary to  
> arrive at a solution.  It's unfortunate, since we then have to  
> dirty the program with wasted steps to force it to exercise the  
> tasks required.  But in a "real world" test I think Haskell would  
> show quite well.
>
> As we evolve into the newer tests, I think you will see Haskell  
> make a better showing.  Simon Marlow (and others) have been a huge  
> help in optimizing the existing tests in a "fair" fashion, and as  
> we provide more "same thing" type tests, I think Haskell will pull  
> further ahead.
>
> Thanks,
>
> -Brent
>
> P.S.  I must say that I find Haskell to be the most esthetically  
> pleasing language -- it's quite beautiful when printed!


RE: Re: Haskell Speed

Simon Marlow
In reply to this post by Daniel Carrera-2
On 30 December 2005 01:23, Jan-Willem Maessen wrote:

> Probably.  The minimum table chunk size was rather large.  I have
> been experimenting (tests are running even as I type) with alternate
> implementations of Data.HashTable.  So far the winning implementation
> is one based on multiplicative hashing and simple table doubling.
> This seems to be consistently competitive with / faster than
> Data.Map.  At this point my inclination is to make the whole suite
> available:
>
> Data.HashTable.Class
> Data.HashTable.DataMap
> Data.HashTable.Doubling
> Data.HashTable.Flat
> Data.HashTable.Freeze
> Data.HashTable.Modulus
> Data.HashTable.Multiplicative
> Data.HashTable.Original
> Data.HashTable.Test
> Data.HashTable
>
> I've separated out the hashing technique (Modulus, Multiplicative)
> from the hash table implementation.  Note that a few of the above are
> bogus, and this is only a fraction of the implementations tried thus
> far.
>
> I need to get distribution clearance for a bunch of this code from my
> employer, and figure out how to package it.  The latter may be
> tricky, as Data.HashTable is currently rather intimately tied to a
> bunch of rather tricky bits of GHC.

I wonder if you could put together a drop-in replacement for
Data.HashTable that we can incorporate?  There's not much point in us
still providing the inefficient version as standard after you've done
all this work to figure out how to do it better.

Cheers,
        Simon

Re: Re: Haskell Speed

Jan-Willem Maessen

On Jan 4, 2006, at 5:30 AM, Simon Marlow wrote:

> On 30 December 2005 01:23, Jan-Willem Maessen wrote:
>
>> Probably.  The minimum table chunk size was rather large.  I have
>> been experimenting (tests are running even as I type) with alternate
>> implementations of Data.HashTable.  So far the winning implementation
>> is one based on multiplicative hashing and simple table doubling.
>> This seems to be consistently competitive with / faster than
>> Data.Map.  At this point my inclination is to make the whole suite
>> available:
>>
>> Data.HashTable.Class
>> Data.HashTable.DataMap
>> Data.HashTable.Doubling
>> Data.HashTable.Flat
>> Data.HashTable.Freeze
>> Data.HashTable.Modulus
>> Data.HashTable.Multiplicative
>> Data.HashTable.Original
>> Data.HashTable.Test
>> Data.HashTable
>>
>> I've separated out the hashing technique (Modulus, Multiplicative)
>> from the hash table implementation.  Note that a few of the above are
>> bogus, and this is only a fraction of the implementations tried thus
>> far.
>>
>> I need to get distribution clearance for a bunch of this code from my
>> employer, and figure out how to package it.  The latter may be
>> tricky, as Data.HashTable is currently rather intimately tied to a
>> bunch of rather tricky bits of GHC.
>
> I wonder if you could put together a drop-in replacement for
> Data.HashTable that we can incorporate?  There's not much point in us
> still providing the inefficient version as standard after you've done
> all this work to figure out how to do it better.

That'd be good---though what qualifies as the "efficient version"  
will, I think, change based on the GC changes you said you made (I  
haven't had the time/patience to try to bootstrap the development  
version of GHC).  This is one of the reasons I've been saving so many  
bread crumbs along the way.  All of the above will work as drop-in  
Data.HashTable replacements, with Data.HashTable simply re-exporting  
multiplicative hashing and table doubling.

The tricky bit, as you probably know, is the rather intimate ties  
between Data.HashTable and the rest of the builtin code.  This  
requires the shipping Data.HashTable library to import most things  
from the source, rather than from the place where they're made  
publicly available.  Do you have any suggestions as to how I might go  
about testing that integration?

-Jan

>
> Cheers,
> Simon
