Re: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Bulat Ziganshin-2
Hello Donald,

Wednesday, April 19, 2006, 1:23:25 PM, you wrote:

> I'm pleased to announce version 0.2 of FPS, the fast, packed string
> library for Haskell.

i want to write some critique about current state of your lib.
something of this may be interesting only for me

1. it don't support Unicode. there are at least two libs (Simon's and from
JHC) that uses UTF-8 to do this. of course, they will be not so
efficient on some operations. i think that it is essential to
general-purpose library and Data.PackedString replacement. Simon's lib
already implements utf-8, latin-1, ucs-2 and ucs-4 encoding. may be
it's possible to join them all together in one lib that uses
prerocessing or some other technique to implement differences between
utf-8 and fixed-width encoding

2. it uses ForeignPtr, that is slow in ghc 6.4 and require more
memory than ByteArray# in any ghc version. in addition, you place two
Ints here. it's question of taste, of course - whether we want to have
better speed or memory requirements. i personally prefer to have
minimal memory overhead that will be accomplished by using ByteArray#
plus one Int to hold string's size. it's possible to write lib so it
will use any unboxed array and this will work both with ForeignPtr
and ByteArray# implementations. It will also allow to implement
general list-like interface for any immutable arrays what should be
useful, in particular for collections framework that is now written by
JP Bernardy. of course, using general Array instead of ForeignPtr will
make some functions (such as mmap) unimplementable for common case

3. i'm also have two libs that is intended to be included in GHC 6.6. one
of them is reworking of Array facilities, another replaces the Handles
functionality. there are some common interests between us here - adding
common ListEmulation interface to Arrays, then implementing
FastStrings on top of it, then implementing really fast I/O for these
FastStrings. we definitely should join our work at some moment to provide
consistent features set for ghc 6.6. currently i can say that i also tried
to implement fast I/O using packed strings from your lib and from
jhc's module and - yes, i found that without prior `findEOL` it's hard
to make it fast with your (ForeignPtr-based) implementation. it seems
that you encounter the same problem

disclaimer: i know that i say about huge changes in the library
structure, so it's just thoughts on this issue. the only really
required thing is to make my streams I/O work efficient with your lib
if both libs will go into 6.6



--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

John Meacham
On Wed, Apr 19, 2006 at 06:04:58PM +0400, Bulat Ziganshin wrote:
> 1. it don't support Unicode. there are at least two libs (Simon's and from
> JHC) that uses UTF-8 to do this. of course, they will be not so
> efficient on some operations. i think that it is essential to
> general-purpose library and Data.PackedString replacement. Simon's lib
> already implements utf-8, latin-1, ucs-2 and ucs-4 encoding. may be
> it's possible to join them all together in one lib that uses
> prerocessing or some other technique to implement differences between
> utf-8 and fixed-width encoding

Indeed. I was excited about the prospect of using FastPackedString until
I saw it didn't support the full character range. that is too bad.

I'd recommend just always using utf8 under the hood (since it shouldn't
matter what encoding is used internally) and have two integers stored
with the pointer, the number of bytes and the number of characters. when
these are the same you know you have straight ASCII, plus it gives you
O(1) length for free. I have very optimized utf8 fold operators in the
jhc version of PackedString you can steal. they get speed by assuming
the data is always valid UTF8 so don't do error checking, which the
constructors always enforce. (yay for ADTs)


        John

--
John Meacham - ⑆repetae.net⑆john⑈
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Marcin 'Qrczak' Kowalczyk
John Meacham <[hidden email]> writes:

> I'd recommend just always using utf8 under the hood

Or have two cases of the representation: an array of bytes if every
character is U+00FF or below, or an array of 32-bit words otherwise.

My Kogut compiler does that internally, and makes use of that when
interfacing with C (the default encoding is assumed to preserve ASCII;
if the narrow case has only ASCII chars, a pointer to its internals
is passed to a C function; for this reason it has an otherwise unused
'\0' after the last character).

AFAIK CLISP has 16-bit words as the third case.

--
   __("<         Marcin Kowalczyk
   \__/       [hidden email]
    ^^     http://qrnik.knm.org.pl/~qrczak/
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

John Meacham
On Thu, Apr 20, 2006 at 12:47:49AM +0200, Marcin 'Qrczak' Kowalczyk wrote:
> > I'd recommend just always using utf8 under the hood
>
> Or have two cases of the representation: an array of bytes if every
> character is U+00FF or below, or an array of 32-bit words otherwise.

The complexity of multiple cases and encodings never seemed worth it to
me. the code gets bigger and you have to have switches depending on the
representation that slows things down. Just plain old utf8 always seems
the best for a FastPackedString library at least. But others opinions
differ on the matter.

        John

--
John Meacham - ⑆repetae.net⑆john⑈
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Donald Bruce Stewart
In reply to this post by Bulat Ziganshin-2
bulat.ziganshin:

> Hello Donald,
>
> Wednesday, April 19, 2006, 1:23:25 PM, you wrote:
>
> > I'm pleased to announce version 0.2 of FPS, the fast, packed string
> > library for Haskell.
>
> i want to write some critique about current state of your lib.
> something of this may be interesting only for me
>
> 1. it don't support Unicode. there are at least two libs (Simon's and from
> JHC) that uses UTF-8 to do this. of course, they will be not so
> efficient on some operations. i think that it is essential to
> general-purpose library and Data.PackedString replacement. Simon's lib
> already implements utf-8, latin-1, ucs-2 and ucs-4 encoding. may be
> it's possible to join them all together in one lib that uses
> prerocessing or some other technique to implement differences between
> utf-8 and fixed-width encoding

I plan to do that at some point, yes, following the structure of Simon's lib.
It should be fairly easy to merge Simon's UTF code with fps, as the
data structure is the same.

> 2. it uses ForeignPtr, that is slow in ghc 6.4 and require more
> memory than ByteArray# in any ghc version. in addition, you place two
> Ints here. it's question of taste, of course - whether we want to have
> better speed or memory requirements. i personally prefer to have
> minimal memory overhead that will be accomplished by using ByteArray#
> plus one Int to hold string's size. it's possible to write lib so it
> will use any unboxed array and this will work both with ForeignPtr
> and ByteArray# implementations. It will also allow to implement
> general list-like interface for any immutable arrays what should be
> useful, in particular for collections framework that is now written by
> JP Bernardy. of course, using general Array instead of ForeignPtr will
> make some functions (such as mmap) unimplementable for common case
 
That's quite a big change. I'd like to see a prototype first, and some
numbers. As SPJ sometimes says: "show me the code".

Cheers,
  Don
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

"FastPackedString" considered harmful

Donald Bruce Stewart
In reply to this post by Bulat Ziganshin-2

On the topic of a good name for FPS, I think its fairly widely
considered that a packed byte "string" shouldn't be considered a String,
and thus FastPackedString is a potentially confusing misnomer.
It was always meant as a working title until something replaced
Data.PackedString anyway.

If this library is to be imported into the base libraries, along with
future PackedString.Unicode and so on, layers on top, we should probably
get the name right now.

I'm disinclined to call it a ByteArray module -- it doesn't really offer
array-like operations. And its got nothing much to do with the other
Array.* stuff.

Instead, how about: Data.ByteString  (with a connotation of IntMap) ?

That seems to suggest both the stringy-ness of the api, but also the
restriction to bytes.

-- Don
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: "FastPackedString" considered harmful

sethk
I agree with Don about the current name.  If FastPackString is confusing because it isn't a string, doesn't ByteString suffers from the same problem?

Seth

On Thu, 20 Apr 2006 13:26:27 +1000
[hidden email] (Donald Bruce Stewart) wrote:

>
> On the topic of a good name for FPS, I think its fairly widely
> considered that a packed byte "string" shouldn't be considered a String,
> and thus FastPackedString is a potentially confusing misnomer.
> It was always meant as a working title until something replaced
> Data.PackedString anyway.
>
> If this library is to be imported into the base libraries, along with
> future PackedString.Unicode and so on, layers on top, we should probably
> get the name right now.
>
> I'm disinclined to call it a ByteArray module -- it doesn't really offer
> array-like operations. And its got nothing much to do with the other
> Array.* stuff.
>
> Instead, how about: Data.ByteString  (with a connotation of IntMap) ?
>
> That seems to suggest both the stringy-ness of the api, but also the
> restriction to bytes.
>
> -- Don
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/libraries
>
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: "FastPackedString" considered harmful

Donald Bruce Stewart

seth:
> I agree with Don about the current name.  If FastPackString is confusing because it isn't a string, doesn't ByteString suffers from the same problem?

I'm not sure if you're agreeing with me Seth :)

I do think FastPackedString is not a suitable for a standard Haskell
library as long as it only works on byte strings. So, to make crystal
clear what the story is, I propose ByteString, following IntMap, to
state this. I think its important to keep 'String' in the name, since
its a string-mangling library.

> Seth
>
> On Thu, 20 Apr 2006 13:26:27 +1000
> [hidden email] (Donald Bruce Stewart) wrote:
>
> >
> > On the topic of a good name for FPS, I think its fairly widely
> > considered that a packed byte "string" shouldn't be considered a String,
> > and thus FastPackedString is a potentially confusing misnomer.
> > It was always meant as a working title until something replaced
> > Data.PackedString anyway.
> >
> > If this library is to be imported into the base libraries, along with
> > future PackedString.Unicode and so on, layers on top, we should probably
> > get the name right now.
> >
> > I'm disinclined to call it a ByteArray module -- it doesn't really offer
> > array-like operations. And its got nothing much to do with the other
> > Array.* stuff.
> >
> > Instead, how about: Data.ByteString  (with a connotation of IntMap) ?
> >
> > That seems to suggest both the stringy-ness of the api, but also the
> > restriction to bytes.
> >
> > -- Don
> > _______________________________________________
> > Libraries mailing list
> > [hidden email]
> > http://www.haskell.org/mailman/listinfo/libraries
> >
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/libraries
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: "FastPackedString" considered harmful

Donald Bruce Stewart
dons:
> I'm not sure if you're agreeing with me Seth :)
>
> I do think FastPackedString is not a suitable for a standard Haskell
> library as long as it only works on byte strings.

*cough* "not a suitable _name_ for a standard Haskell library..."
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re[2]: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Bulat Ziganshin-2
In reply to this post by John Meacham
Hello John,

Thursday, April 20, 2006, 1:52:53 AM, you wrote:

> Indeed. I was excited about the prospect of using FastPackedString until
> I saw it didn't support the full character range. that is too bad.

this lib should be slower than your on small strings due to ForeignPtr
inefficiency

and i think that Donald should mention in doc/announcement that his
lib is latin-1 only. it's not good that each of us should scan his
sources to rediscover this fact


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re[2]: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Bulat Ziganshin-2
In reply to this post by John Meacham
Hello John,

Thursday, April 20, 2006, 3:03:23 AM, you wrote:

> On Thu, Apr 20, 2006 at 12:47:49AM +0200, Marcin 'Qrczak' Kowalczyk wrote:
>> > I'd recommend just always using utf8 under the hood
>>
>> Or have two cases of the representation: an array of bytes if every
>> character is U+00FF or below, or an array of 32-bit words otherwise.

> The complexity of multiple cases and encodings never seemed worth it to
> me. the code gets bigger and you have to have switches depending on the
> representation that slows things down. Just plain old utf8 always seems
> the best for a FastPackedString library at least. But others opinions
> differ on the matter.

i'm not sure. utf-8 encoding has his own drawbacks (i mean complexity
of code and slowness). moreover if Donald will use Simon's technique
for implementing ucs1..ucs4 packed strings, this module will just
call existing functions. one problem, though, is what we will lose
utf-8 support for mapped files and any other C strings


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: "FastPackedString" considered harmful

Bulat Ziganshin-2
In reply to this post by Donald Bruce Stewart
Hello Donald,

Thursday, April 20, 2006, 7:26:27 AM, you wrote:

> Instead, how about: Data.ByteString  (with a connotation of IntMap) ?

Data.Str and type Str imho is a best thing. but you should support
utf-8. Data.Str.Latin1 .. Data.Str.UCS4 may be used for other forms of
packed strings


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: "FastPackedString" considered harmful

Udo Stenzel
In reply to this post by Donald Bruce Stewart
Donald Bruce Stewart wrote:
>
> I do think FastPackedString is not a suitable for a standard Haskell
> library as long as it only works on byte strings.

How about making it polymorphic then?  It could work on everything
Storable, with a bunch of specialized funktions for the common
Word8/Word16/Word32 and Char variants.


Udo.
--
When someone says "I want a programming language in which I need only
say what I wish done," give him a lollipop.

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries

signature.asc (196 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Tomasz Zielonka
In reply to this post by John Meacham
On Wed, Apr 19, 2006 at 02:52:53PM -0700, John Meacham wrote:
> I'd recommend just always using utf8 under the hood (since it shouldn't
> matter what encoding is used internally) and have two integers stored
> with the pointer, the number of bytes and the number of characters. when
> these are the same you know you have straight ASCII, plus it gives you
> O(1) length for free. I have very optimized utf8 fold operators in the
> jhc version of PackedString you can steal. they get speed by assuming
> the data is always valid UTF8 so don't do error checking, which the
> constructors always enforce. (yay for ADTs)

Nice trick, but you loose constant time indexing operations, don't you?

Best regards
Tomasz
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: "FastPackedString" considered harmful

Jean-Philippe Bernardy
In reply to this post by Donald Bruce Stewart
How about ByteSequence ?

This is inline with Ross' Data.Sequence and Data.IntMap.

Cheers,
JP.

On 4/20/06, Donald Bruce Stewart <[hidden email]> wrote:

>
> On the topic of a good name for FPS, I think its fairly widely
> considered that a packed byte "string" shouldn't be considered a String,
> and thus FastPackedString is a potentially confusing misnomer.
> It was always meant as a working title until something replaced
> Data.PackedString anyway.
>
> If this library is to be imported into the base libraries, along with
> future PackedString.Unicode and so on, layers on top, we should probably
> get the name right now.
>
> I'm disinclined to call it a ByteArray module -- it doesn't really offer
> array-like operations. And its got nothing much to do with the other
> Array.* stuff.
>
> Instead, how about: Data.ByteString  (with a connotation of IntMap) ?
>
> That seems to suggest both the stringy-ness of the api, but also the
> restriction to bytes.
>
> -- Don
> _______________________________________________
> Libraries mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/libraries
>
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Donald Bruce Stewart
In reply to this post by Bulat Ziganshin-2
bulat.ziganshin:
> Hello John,
>
> Thursday, April 20, 2006, 1:52:53 AM, you wrote:
>
> > Indeed. I was excited about the prospect of using FastPackedString until
> > I saw it didn't support the full character range. that is too bad.
>
> this lib should be slower than your on small strings due to ForeignPtr
> inefficiency

Do you have some sketches of ByteArray# represenations?
I'm very happy to change the representation if it makes things more time
or space efficient.

-- Don
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: "FastPackedString" considered harmful

Simon Marlow-5
In reply to this post by Jean-Philippe Bernardy
Data.ByteVector?

But I would expect a library called Data.Byte* to deal in terms of
Word8s, not Chars.  The objective of the operation is to provide a
speedier [Char], so I'd be happy with Data.PackedString(.Latin1).

Perhaps the ideal structure would be something like (taking up Udo
Stenzel's suggestion):

   Data.StorableVector
     - data {- Storable w => -} StorableVector w
     - implements all the FPS operations
     - convert between ForeignPtr, StorableVector, StorableArray

   Data.PackedString.Latin1
   Data.PackedString.UTF8
     - newtype PackedString = PS (StorableVector Word8)

Making all the appropriate specialisations happen might be a challenge,
though.  An intermediate position is to provide Data.ByteVector and
Data.PackedString.* defined in terms of that.

Cheers,
        Simon

Jean-Philippe Bernardy wrote:

> How about ByteSequence ?
>
> This is inline with Ross' Data.Sequence and Data.IntMap.
>
> Cheers,
> JP.
>
> On 4/20/06, Donald Bruce Stewart <[hidden email]> wrote:
>
>>On the topic of a good name for FPS, I think its fairly widely
>>considered that a packed byte "string" shouldn't be considered a String,
>>and thus FastPackedString is a potentially confusing misnomer.
>>It was always meant as a working title until something replaced
>>Data.PackedString anyway.
>>
>>If this library is to be imported into the base libraries, along with
>>future PackedString.Unicode and so on, layers on top, we should probably
>>get the name right now.
>>
>>I'm disinclined to call it a ByteArray module -- it doesn't really offer
>>array-like operations. And its got nothing much to do with the other
>>Array.* stuff.
>>
>>Instead, how about: Data.ByteString  (with a connotation of IntMap) ?
>>
>>That seems to suggest both the stringy-ness of the api, but also the
>>restriction to bytes.
>>
>>-- Don
>>_______________________________________________
>>Libraries mailing list
>>[hidden email]
>>http://www.haskell.org/mailman/listinfo/libraries
>>

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Simon Marlow-5
In reply to this post by Donald Bruce Stewart
Donald Bruce Stewart wrote:
> bulat.ziganshin:

>>2. it uses ForeignPtr, that is slow in ghc 6.4 and require more
>>memory than ByteArray# in any ghc version. in addition, you place two
>>Ints here. it's question of taste, of course - whether we want to have
>>better speed or memory requirements. i personally prefer to have
>>minimal memory overhead that will be accomplished by using ByteArray#
>>plus one Int to hold string's size. it's possible to write lib so it
>>will use any unboxed array and this will work both with ForeignPtr
>>and ByteArray# implementations. It will also allow to implement
>>general list-like interface for any immutable arrays what should be
>>useful, in particular for collections framework that is now written by
>>JP Bernardy. of course, using general Array instead of ForeignPtr will
>>make some functions (such as mmap) unimplementable for common case
>
> That's quite a big change. I'd like to see a prototype first, and some
> numbers. As SPJ sometimes says: "show me the code".

Optimising the library for GHC 6.4.x isn't a goal (for me, at least).
I'm convinced that ForeignPtr is the right choice.  No other
representation can give you all of these:

   - converting arbitrary Ptrs into PackedStrings without copying,
     including finalizers if necessary (useful for mmap()).

   - garbage-collectable PackedStrings can be allocated on the heap, no
     finalizer necessary

   - PackedStrings can be passed to foreign functions without copying

ByteArray# only gives you the second two.  You would need two variants
of the representation to get all three, and that means extra overhead.

Cheers,
        Simon
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re[2]: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Bulat Ziganshin-2
In reply to this post by Donald Bruce Stewart
Hello Donald,

Thursday, April 20, 2006, 1:06:38 PM, you wrote:

>> this lib should be slower than your on small strings due to ForeignPtr
>> inefficiency

> Do you have some sketches of ByteArray# represenations?
> I'm very happy to change the representation if it makes things more time
> or space efficient.

you should see at the jhc's PackedString implementation that uses
UArray (which is internally ByteArray#). his implementation also don't
use additional Ints - array size is stored as part of UArray structure
and O(1) substr operation is just not exists (instead, it is O(n) with
copying)

as i said, this can be extended to support List-like interface for ANY
arrays, that is the very useful thing. if you will implement this,
then we will only need utf-8 wrapper around UArray Int Word8 for
implementing packed Unicode strings (and Latin1/UCS-2/UCS-4 strings
will work just as unboxed arrays of Word8/Word16/Word32)

you can compare speed and memory requirements of ForeignPtr and
ByteArray# solutions by comparing jhc's module with Simon's utf-8
implementation - both are not such greatly optimized as your lib, both
uses utf-8. please use large number of typical strings (with size of
10-100) for real-world comparison, comparing on 20mb strings is not
indicative

i can write disk traversing algorithm (like ls-lR) for some sort of
real app, to make testing more realistic


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re[2]: [Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Bulat Ziganshin-2
In reply to this post by Simon Marlow-5
Hello Simon,

Thursday, April 20, 2006, 1:23:42 PM, you wrote:

>>>it's possible to write lib so it
>>>will use any unboxed array and this will work both with ForeignPtr
>>>and ByteArray# implementations

>    - converting arbitrary Ptrs into PackedStrings without copying,
>      including finalizers if necessary (useful for mmap()).

>    - garbage-collectable PackedStrings can be allocated on the heap, no
>      finalizer necessary

>    - PackedStrings can be passed to foreign functions without copying

> ByteArray# only gives you the second two.  You would need two variants
> of the representation to get all three, and that means extra overhead.

Simon, you skipped the whole idea - the lib should work with any array,
including StorableArray too. it should be just Array->ListLike
interface converter, with additional code specialized for unmovable
arrays. plus UTF-8 support at the top of this:

module ArrayToListLike
class ListLike
  head
  tail
  ...
instance (Array a e) => ListLike (a e)

module UTF8PackedString
import ArrayToListLike
newtype UTF8PackedString = UTF8PackedString (UArray Int Word8)
instance ListLike UTF8PackedString


well, i'm not sure that this can be implemented with classes. but at
least this can be implemented with preprocessor (that is Haskell's
module system ;) just like yourself implemented packed strings for
1/2/4-byte wide representations. here, array type and array elements
type will be parameters:

#define ARRAY UArray
#define ELEMENT Word8
#include "Str.hs"


#define ARRAY StorableArray
#define ELEMENT Word8
#include "Str.hs"


> Optimising the library for GHC 6.4.x isn't a goal (for me, at least).

1. i think that we will have really stable release of 6.6 at
Fall'2007, not earlier

2. ForeignPtr has the following drawbacks even for 6.6:

- need more memory. main goal of using packed string is to reduce
memory requirements so adding 10-15 bytes is bad idea

- can point only to the unmovable memory blocks that may increase
overall memory consumption for some apps. general Array can work with
any memory - movable heap, unmovable heap, or C-allocated

--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
12