Iteratee-like

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Iteratee-like

Permjacov Evgeniy
Ok, I think, I made it right now. I wrote two versions of the very same
module with roughly the same interface. It is minimalistic framework for
producing, transforming, zipping and folding streaming data (a sample
code that does file IO provided, but it is not well tested yet). One
version abuses CPS-style, requires rank2types, but should be slightly
faster and has cleaner code. Other version is plain haskell-98 and
should be reasonable portable, but is tricky at some plases and still
uses CPS for specifing stream processors. I want to add many things, but
currently I want to document code properly so it be easy to read (BTW,
has someone any article, produced from *.lhs with good typesetting and
with *.lhs itself avaible? )

current links

https://github.com/permeakra/Rank2Iteratee
https://github.com/permeakra/PassiveIteratee

The main difference from 'original' iteratees I read about is that both
do not use 'chunks' and pass data one-by-one. So, what I wrote may be
slower, but should be easier to maintain and more transparent for ghc
optimising facilities. I wanted as clean and simple code as possible,
but it is still very, very messy at some places and I want it cleaner.
Any suggestions? I also want to check, how good ghc does its work with
this messy modules. They may become interesting benchmarks.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Iteratee-like

John Lato-2
From: Permjacov Evgeniy <[hidden email]>

current links

https://github.com/permeakra/Rank2Iteratee
https://github.com/permeakra/PassiveIteratee

The main difference from 'original' iteratees I read about is that both
do not use 'chunks' and pass data one-by-one. So, what I wrote may be
slower, but should be easier to maintain and more transparent for ghc
optimising facilities. I wanted as clean and simple code as possible,
but it is still very, very messy at some places and I want it cleaner.
Any suggestions? I also want to check, how good ghc does its work with
this messy modules. They may become interesting benchmarks.

Have you tried comparing it to either iteratee or enumerator (which had mostly comparable performance last time I checked, with a slight edge to iteratee)?  Or to Oleg's library?  Try writing test cases, a simple byte-counting application, or similar, so you can compare the performance with the other versions.  Both enumerator and iteratee include demo programs that you could use as a starting point.

I agree that iteratees which work on a per-element level are very clean and should be amenable to optimization by GHC.  It also shows a very clear relationship with stream-fusion techniques.  Unfortunately when I last tried it I couldn't get acceptable performance.  I was using ghc-6.12.1 IIRC, so it could be different now.

John

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Iteratee-like

Permjacov Evgeniy
On 12/15/2010 05:48 PM, John Lato wrote:
From: Permjacov Evgeniy <[hidden email]>

current links

https://github.com/permeakra/Rank2Iteratee
https://github.com/permeakra/PassiveIteratee

The main difference from 'original' iteratees I read about is that both
do not use 'chunks' and pass data one-by-one. So, what I wrote may be
slower, but should be easier to maintain and more transparent for ghc
optimising facilities. I wanted as clean and simple code as possible,
but it is still very, very messy at some places and I want it cleaner.
Any suggestions? I also want to check, how good ghc does its work with
this messy modules. They may become interesting benchmarks.

Have you tried comparing it to either iteratee or enumerator (which had mostly comparable performance last time I checked, with a slight edge to iteratee)?  Or to Oleg's library?  Try writing test cases, a simple byte-counting application, or similar, so you can compare the performance with the other versions.  Both enumerator and iteratee include demo programs that you could use as a starting point.
I wrote a simple counter (attached,works for both variants of package), that literally counts bytes of given value in input. I got three-time slower result then with lazy io ( 0_o) with rank2types variant and six-seven time slower result with no CPS-version.  Looks like ghc is really good with list fusion... I'm still reading tutorial from iteratee package, so I have not compared with it yet. An equivalend lazy io programm attached. Can someone give an advice how to improve performance?

I agree that iteratees which work on a per-element level are very clean and should be amenable to optimization by GHC.  It also shows a very clear relationship with stream-fusion techniques.  Unfortunately when I last tried it I couldn't get acceptable performance.  I was using ghc-6.12.1 IIRC, so it could be different now.

John


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

testLIO.hs (324 bytes) Download Attachment
testR2I.hs (449 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Iteratee-like

Permjacov Evgeniy
In reply to this post by John Lato-2
On 12/15/2010 05:48 PM, John Lato wrote:
From: Permjacov Evgeniy <[hidden email]>

current links

https://github.com/permeakra/Rank2Iteratee
https://github.com/permeakra/PassiveIteratee

The main difference from 'original' iteratees I read about is that both
do not use 'chunks' and pass data one-by-one. So, what I wrote may be
slower, but should be easier to maintain and more transparent for ghc
optimising facilities. I wanted as clean and simple code as possible,
but it is still very, very messy at some places and I want it cleaner.
Any suggestions? I also want to check, how good ghc does its work with
this messy modules. They may become interesting benchmarks.

Have you tried comparing it to either iteratee or enumerator (which had mostly comparable performance last time I checked, with a slight edge to iteratee)?  Or to Oleg's library?  Try writing test cases, a simple byte-counting application, or similar, so you can compare the performance with the other versions.  Both enumerator and iteratee include demo programs that you could use as a starting point.
Ok, I tested with ByteString chunks and got roughly the same performance (less then 5 % difference) as with Data.Iteratee (as expected, as it is not a monad a bottlenec when using chunks). However, with Word8' streams I slows down to point six times slower then lazy IO. this is still may be acceptable if IO actions has to be performed while making nontrivial list fusions, but in general it is fail.

Well, ghc has another complicated case for compiler optimisation tests.

CPS-style with rank2 types provides boost to performance, but when using chunks it is insignificant, so haskell-98 version of iteratees may be used with no worries.

I agree that iteratees which work on a per-element level are very clean and should be amenable to optimization by GHC.  It also shows a very clear relationship with stream-fusion techniques.  Unfortunately when I last tried it I couldn't get acceptable performance.  I was using ghc-6.12.1 IIRC, so it could be different now.

John


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: Iteratee-like

John Lato-2
On Thu, Dec 16, 2010 at 12:23 AM, Permjacov Evgeniy <[hidden email]> wrote:
On 12/15/2010 05:48 PM, John Lato wrote:
From: Permjacov Evgeniy <[hidden email]>

current links

https://github.com/permeakra/Rank2Iteratee
https://github.com/permeakra/PassiveIteratee

The main difference from 'original' iteratees I read about is that both
do not use 'chunks' and pass data one-by-one. So, what I wrote may be
slower, but should be easier to maintain and more transparent for ghc
optimising facilities. I wanted as clean and simple code as possible,
but it is still very, very messy at some places and I want it cleaner.
Any suggestions? I also want to check, how good ghc does its work with
this messy modules. They may become interesting benchmarks.

Have you tried comparing it to either iteratee or enumerator (which had mostly comparable performance last time I checked, with a slight edge to iteratee)?  Or to Oleg's library?  Try writing test cases, a simple byte-counting application, or similar, so you can compare the performance with the other versions.  Both enumerator and iteratee include demo programs that you could use as a starting point.
Ok, I tested with ByteString chunks and got roughly the same performance (less then 5 % difference) as with Data.Iteratee (as expected, as it is not a monad a bottlenec when using chunks). However, with Word8' streams I slows down to point six times slower then lazy IO. this is still may be acceptable if IO actions has to be performed while making nontrivial list fusions, but in general it is fail.

Disappointing, but I'm not surprised.  I think getting good performance is possible in principle, but currently there's something missing from the implementations.  Whether work needs to be done on GHC's optimizer or the iteratee code, I can't say.  Honestly I'm not too interested in pursuing this myself now, but if somebody else wants to it could be fruitful.

John

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe