FPS/Data.ByteString candidate

Re: Data.ByteString candidate 3

Ketil Malde-3

I hope you'll forgive me for re-advertising my FPS modifications.
I've started over from Don's sources (please don't use my old fps
repo), refactored, and reworked my changes into that.  

The refactored repo (all functionality and performance identical to
the original):

           http://www.ii.uib.no/~ketil/src/fps-wrapped

Repo with added Latin1 and ASCII support:

           http://www.ii.uib.no/~ketil/src/fps-i18n

The Latin1 functions are the same as Char8's, except that packing
chars > 255 raises an error.  ASCII does the same, but stores
characters > 127 out of harm's way.

Adding support for new character sets requires defining four functions
and three constants, and #include'ing a common file.

In addition, some nice properties hold, for instance:

        s1 > s2 => pack s1 > pack s2
        w2c . c2w == id   -- provided no error
        c2w . w2c == id   -- total function

Only the last of these holds for Char8.
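
To make the three properties concrete, here is a minimal sketch. It is
not the code from the fps-i18n repo: the names c2wChar8, c2wLatin1,
roundTripBytes, char8LosesInfo and char8BreaksOrder are made up for
illustration, and "pack" is modelled as a plain map over a String
rather than a real ByteString pack.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Char8-style conversion: silently truncates code points above 255.
c2wChar8 :: Char -> Word8
c2wChar8 = fromIntegral . ord

-- Hypothetical Latin1-style conversion (an assumption, not the repo's
-- code): rejects code points above 255 instead of truncating them.
c2wLatin1 :: Char -> Word8
c2wLatin1 c
  | n <= 255  = fromIntegral n
  | otherwise = error ("c2wLatin1: not a Latin1 character: " ++ show c)
  where n = ord c

-- Decoding is total: every byte is a valid Latin1 code point.
w2c :: Word8 -> Char
w2c = chr . fromIntegral

-- c2w . w2c == id, checked exhaustively over all 256 bytes.
roundTripBytes :: (Char -> Word8) -> Bool
roundTripBytes c2w = all (\w -> c2w (w2c w) == w) [minBound .. maxBound]

-- w2c . c2w == id fails for Char8 once a char above 255 is truncated.
char8LosesInfo :: Bool
char8LosesInfo = w2c (c2wChar8 '\x3bb') /= '\x3bb'

-- s1 > s2 => pack s1 > pack s2 fails for Char8: '\x100' truncates to 0.
char8BreaksOrder :: Bool
char8BreaksOrder = s1 > s2 && map c2wChar8 s1 < map c2wChar8 s2
  where
    s1 = "\x100"
    s2 = "a"
```

With the erroring conversion, a string either packs faithfully or not
at all, which is why the ordering and round-trip properties can hold
whenever no error is raised.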

Latin1 has been tested with the Char8 QC tests, and all the modules
have been run through the benchmark suite; results are at

           http://www.ii.uib.no/~ketil/src/bench.txt

(This is using /usr/share/dict/words)

Packing and unpacking aren't part of the benchmark, but are expected to
be around 10% slower than for Char8.  I have no explanation for why
'map' and 'split' are faster.

-k
--
If I haven't seen further, it is by standing in the footprints of giants

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Re: Data.ByteString candidate 3

John Meacham
In reply to this post by Simon Marlow-5
On Thu, Apr 27, 2006 at 10:25:53AM +0100, Simon Marlow wrote:
> Actually yes, I did intend to replace/extend Text.Regex with JRegex at
> some point.  Plus we can include PCRE, since it has a BSD license -
> maybe it can replace the POSIX regex implementation that we have in GHC
> right now (which was taken from FreeBSD's libc).

That would actually be nice: the PCRE that comes with most systems isn't
compiled with Unicode support enabled. However, you can set flags when
compiling PCRE to let it handle UTF-8 directly.

        John

--
John Meacham - ⑆repetae.net⑆john⑈