FPS 0.3

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

FPS 0.3

Donald Bruce Stewart
Following the comments over the last few days, I've implemented and
tagged FPS 0.3. Changes include:

    * Clearer docs. Emphasis on support only for latin-1
    * Partition out the few Word8 functions
    * Merge in all of Simon's QC tests into a combined test/benchmark suite
    * Stress testing on huge data quantities
    * Space profiling, improved some functions
    * Clarify BSD license everything

I think its about ready to become a ByteSequence (or whatever) library:
a useful (actually, critical, imo) interim until we have a full
Unicode/UTF* layer, and a Storable a => Vector a, set up.  At that point,
the Word8/Char issues resolve themselves.

As usual, info is all here:
    http://www.cse.unsw.edu.au/~dons/fps.html

-- Don
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: FPS 0.3

Simon Marlow-5
Donald Bruce Stewart wrote:

> Following the comments over the last few days, I've implemented and
> tagged FPS 0.3. Changes include:
>
>     * Clearer docs. Emphasis on support only for latin-1
>     * Partition out the few Word8 functions
>     * Merge in all of Simon's QC tests into a combined test/benchmark suite
>     * Stress testing on huge data quantities
>     * Space profiling, improved some functions
>     * Clarify BSD license everything
>
> I think its about ready to become a ByteSequence (or whatever) library:
> a useful (actually, critical, imo) interim until we have a full
> Unicode/UTF* layer, and a Storable a => Vector a, set up.  At that point,
> the Word8/Char issues resolve themselves.
>
> As usual, info is all here:
>     http://www.cse.unsw.edu.au/~dons/fps.html

Looks good to me.  I propose we resolve the naming and put it into the
base package as a replacement for Data.PackedString.

Currently the module is called Data.FastPackedString and the type is
FastString.  I like either Data.ByteString(ByteString) or
Data.ByteSequence(ByteSequence) as a replacement,
preferences/objections/alternatives?

Cheers,
        Simon
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: FPS 0.3

Donald Bruce Stewart
simonmarhaskell:

> Donald Bruce Stewart wrote:
> >    http://www.cse.unsw.edu.au/~dons/fps.html
>
> Looks good to me.  I propose we resolve the naming and put it into the
> base package as a replacement for Data.PackedString.
>
> Currently the module is called Data.FastPackedString and the type is
> FastString.  I like either Data.ByteString(ByteString) or
> Data.ByteSequence(ByteSequence) as a replacement,
> preferences/objections/alternatives?

(+1) for Data.ByteString(ByteString)

-- Don
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re[2]: FPS 0.3

Bulat Ziganshin-2
In reply to this post by Simon Marlow-5
Hello Simon,

Friday, April 21, 2006, 12:40:17 PM, you wrote:

> Looks good to me.  I propose we resolve the naming and put it into the
> base package as a replacement for Data.PackedString.

are you sure that ByteString can replace PackedString? ;)


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

FPS, ForeignPtr and GHC 6.6

Donald Bruce Stewart
In reply to this post by Donald Bruce Stewart
Just did some benchmarking of FPS with ghc 6.6 from last night.
Most functions seem to have gotten faster (due to the improved ForeignPtr). In particular:

    words           1.583        2.115
    lines           0.028        0.421
    inits           0.777        5.350
    tails           0.836        6.634

All functions that generate many substrings.
Interestingly, unpack seems to have gone backwards.

    unpack          3.319        1.596

So I'll have to look at that. Possibly the fusion tricks are not working?
Happily this now means:

    main = print . length . P.lines =<<  P.hGetContents stdin

now runs at exactly 2x the speed of wc -l :)

    $ time ./a.out < 20M
    314080
    ./a.out < 20M  0.03s user 0.04s system 85% cpu 0.083 total

    pill19$ time wc -l < 20M
    314080
    wc -l < 20M  0.02s user 0.01s system 83% cpu 0.042 total

It was around 4-5x under 6.4.1. Good work Simon!

-- Don, happily benchmarking
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: FPS, ForeignPtr and GHC 6.6

Bulat Ziganshin-2
Hello Donald,

Friday, April 21, 2006, 2:09:36 PM, you wrote:

> Just did some benchmarking of FPS with ghc 6.6 from last night.
> Most functions seem to have gotten faster (due to the improved ForeignPtr). In particular:

it's the answer what will be the speed using ByteArray#. ForeignPtr in
ghc 6.6 is no magic, it just contains direct Ptr instead of boxed one
in 6.4



--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: FPS 0.3

Simon Marlow-5
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin wrote:

> Hello Simon,
>
> Friday, April 21, 2006, 12:40:17 PM, you wrote:
>
>
>>Looks good to me.  I propose we resolve the naming and put it into the
>>base package as a replacement for Data.PackedString.
>
>
> are you sure that ByteString can replace PackedString? ;)

are you referring to the fact that ByteString doesn't handle the full
Unicode range, but PackedString does?

My response to that would be to use [Char] if you want Unicode.
PackedString was slower than [Char] in many cases anyway :-)

Cheers,
        Simon
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: FPS, ForeignPtr and GHC 6.6

Simon Marlow-5
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin wrote:
> Friday, April 21, 2006, 2:09:36 PM, you wrote:
>
>>Just did some benchmarking of FPS with ghc 6.6 from last night.
>>Most functions seem to have gotten faster (due to the improved ForeignPtr). In particular:
>
>
> it's the answer what will be the speed using ByteArray#. ForeignPtr in
> ghc 6.6 is no magic, it just contains direct Ptr instead of boxed one
> in 6.4

The point is that you can't implement all the operations in FPS using a
ByteArray# representation, so that's a bogus comparison.  ForeignPtr is
the right thing!

BTW, we could improve the performance of ForeignPtr even further by
eliminating the IORef in its representation.  This means losing the
ordering property of addForeignPtrFinalizer, but if we did this just for
the ForeignPtrs created by FPS.pack, I doubt anyone would ever notice,
and you save 12 (24) bytes per PS.

Cheers,
        Simon
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re[2]: FPS 0.3

Bulat Ziganshin-2
In reply to this post by Simon Marlow-5
Hello Simon,

Friday, April 21, 2006, 2:22:28 PM, you wrote:

>> are you sure that ByteString can replace PackedString? ;)

> are you referring to the fact that ByteString doesn't handle the full
> Unicode range, but PackedString does?

yes

> My response to that would be to use [Char] if you want Unicode.
> PackedString was slower than [Char] in many cases anyway :-)

so why you say that it's a "replacement for Data.PackedString" ?

let's omit also arrays. ByteString is not replaces them but who
worries? :)


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re[2]: FPS, ForeignPtr and GHC 6.6

Bulat Ziganshin-2
In reply to this post by Simon Marlow-5
Hello Simon,

Friday, April 21, 2006, 3:44:17 PM, you wrote:

> The point is that you can't implement all the operations in FPS using a
> ByteArray# representation, so that's a bogus comparison.

that operations can't be implemented with pinned byte arrays? i think
only mmap support?

> ForeignPtr is the right thing!

they are slower in 6.4
they need more memory
memory fragmentation can't be eliminated by GC

moreover, implementation of ListLike interface for all arrays is
anyway useful. currently proposed ByteString is just one limited
implementation of this. why haskellers prefer to implement facilities
for just one concrete datatype instead of supporting whole classes??

moreover, Donald made most of its test with excessively large strings.
for strings of real size (10-100 bytes) speed difference will be
typically the same as in his `wc` test. so the current implementation
slows library at least 2 times and this can't be avoided without
switching to 6.6

> BTW, we could improve the performance of ForeignPtr even further by
> eliminating the IORef in its representation.  This means losing the
> ordering property of addForeignPtrFinalizer, but if we did this just for
> the ForeignPtrs created by FPS.pack, I doubt anyone would ever notice,
> and you save 12 (24) bytes per PS.

data ForeignPtrContents
  = PlainForeignPtr !(IORef [IO ()])
  | MallocPtr (MutableByteArray# RealWorld) !(IORef [IO ()])

you mean adding 3rd variant here:

  | PlainMallocPtr (MutableByteArray# RealWorld)

?

btw, i still wondering - that is the difference between
MutableByteArray# and ByteArray# ? :)


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: FPS, ForeignPtr and GHC 6.6

David Roundy
On Fri, Apr 21, 2006 at 04:53:42PM +0400, Bulat Ziganshin wrote:
> moreover, Donald made most of its test with excessively large strings.
> for strings of real size (10-100 bytes) speed difference will be
> typically the same as in his `wc` test. so the current implementation
> slows library at least 2 times and this can't be avoided without
> switching to 6.6

This is a module designed to efficiently handle large strings.  That's the
entire reason for its existence.  Which isn't to say one shouldn't also
work to make it fast on small strings, but small strings are cheap
regardless, so they aren't the point that needs to be optimized.
--
David Roundy
http://www.darcs.net
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: FPS, ForeignPtr and GHC 6.6

Simon Marlow-5
In reply to this post by Bulat Ziganshin-2
Bulat Ziganshin wrote:

> Hello Simon,
>
> Friday, April 21, 2006, 3:44:17 PM, you wrote:
>
>
>>The point is that you can't implement all the operations in FPS using a
>>ByteArray# representation, so that's a bogus comparison.
>
> that operations can't be implemented with pinned byte arrays? i think
> only mmap support?

Many of the operations in "Low-level constructors"

   http://www.cse.unsw.edu.au/~dons/fps/Data.FastPackedString.html#21

plus mmapFile.

>>ForeignPtr is the right thing!
>
> they are slower in 6.4

Performance with 6.4 isn't a priority, since the library is going into
6.6.  By all means write a version that works better with 6.4 in the
meantime.

> data ForeignPtrContents
>   = PlainForeignPtr !(IORef [IO ()])
>   | MallocPtr (MutableByteArray# RealWorld) !(IORef [IO ()])
>
> you mean adding 3rd variant here:
>
>   | PlainMallocPtr (MutableByteArray# RealWorld)
>
> ?

Yes

> btw, i still wondering - that is the difference between
> MutableByteArray# and ByteArray# ? :)

ByteArray# doesn't have the state parameter, it can be read outside of
the IO/ST monads.  The point is to add a little type safety, it doesn't
have any impact on the implementation.

Cheers,
        Simon
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re[2]: FPS, ForeignPtr and GHC 6.6

Bulat Ziganshin-2
Hello Simon,

Friday, April 21, 2006, 5:57:47 PM, you wrote:

> Performance with 6.4 isn't a priority, since the library is going into
> 6.6.  By all means write a version that works better with 6.4 in the
> meantime.

so this library

1. don't intended to be used with 6.4
2. don't support Unicode
3. don't intended to be used with small strings (<20mb)

extremely useful thing!

i still propose to make universal Array->ListLike interface converter,
with additional functionality for StorableArrays. this will solve
problems with efficiency in 6.4 and with memory fragmentation and will
provide us more universal solution that can be used with 2-byte and
4-byte packed strings and with all other combinations of array/element types


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries