Re: using ghc with make

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: using ghc with make

Simon Marlow-5
[ moving to [hidden email] from [hidden email] ]

Bulat Ziganshin wrote:

> Wednesday, April 19, 2006, 4:45:19 PM, you wrote:
>
>>Believe me I've looked in detail at your streams library.
>>Performance-wise it is great but the design needs to be reworked IMO.
>
>>The main problem is that it doesn't have enough type structure.  There
>>are many combinations of stream transformers that don't make sense, and
>>should therefore be ruled out by the type system.  There are operations
>>that don't work on some streams.  There should at the least be a type
>>distinction between directly accessible memory streams, byte streams,
>>and text streams.  Additionally I would add separate classes for
>>seekable and buffered streams.  I believe these changes would improve
>>performance by reducing the size of dictionaries.
>
> you have written this in February, but this discussion was not
> finished due to my laziness. now i tried to split Stream interface to
> several parts. so
I've attached a sketched design.  It doesn't compile, but it illustrates
the structure I have in mind.  The main improvement since the new-io
library is the addition of memory streams.  This is an idea from your
library and I like it a lot, although I changed the type of the methods:

-- | An input stream accessed directly via a memory buffer.
-- Ordinary 'InputStream's may be converted to 'MemInputStream's by
-- adding buffering; see 'bufferInputStream'.
class MemInputStream s where
   -- | Consume some bytes from the memory stream.  The second argument
   -- is an IO action that is passed a buffer and its size (the size must
   -- be non-zero), and it should return the number of bytes consumed.
   withStreamInputBuffer :: s -> (Ptr Word8 -> Int -> IO Int) -> IO ()

-- | An output stream accessed directly via a memory buffer.
-- Ordinary 'OutputStream's may be converted to 'MemOutputStream's by
-- adding buffering; see 'bufferOutputStream'.
class MemOutputStream s where
   -- | Write some bytes to a memory stream.  The second argument
   -- is an IO action that is passed a buffer and its size (the size must
   -- be non-zero), and it should return the number of bytes written.
   withStreamOutputBuffer :: s -> (Ptr Word8 -> Int -> IO Int) -> IO ()

> 1) that you think - Stream should be base for all other stream classes
> or each Stream class should be independent? i.e.

Superclasses aren't necessary, but they might help to reduce the size of
contexts in practice.  We should probably experiment with both.

> 2) separation of Stream classes make some automatic definitions
> impossible. for example, released version contains vGetBuf
> implementation that is defined via vGetChar and works ok for streams
> that provide only vGetChar as base function.

No class should implement both reading bytes and reading chars.
Encoding/decoding should be a stream transformer that turns a byte
stream into a text stream.  So there's no duplication of these methods.

I believe splitting up the classes should lead to less duplication, not
more, partly because you don't have to implement a lot of methods that
don't do anything or are errors (eg. writing to an input stream).  I
admit I haven't actually written all the code, though.

> 3) the problems are substantially growed now - when i tried to
> separate input and output streams (the same will apply to detaching of
> seekable streams into the separate class). the problem is what i need
> either to provide 2 or 3 separate implementations for buffering of
> read-only, write-only and read-write streams or have some universal
> definition that should work even when base Stream don't provide part
> of operations. the last seems to be impossible - may be i don't
> understand enough Haskell's class system?
>
> let's see:
>
> data BufferedStream h = Buf h ....
>
> vClose (Buf h ...) = vPutBuf ... - flush buffer's contents
>
> how i can implement this if `h` may not support vPutBuf operation?
> especially to allow read/write streams to work???
You can only buffer a byte stream.  See my sketch design.

> 4) what you mean by "There are many combinations of stream
> transformers that don't make sense" ? splitting Stream class to the
> BlockStream/TextStream/ByteStream or something else?

Yes - adding decoding to a TextStream doesn't make sense.  Directly
accessing the memory of a byte stream doesn't make sense: you need to
buffer it first, or use a memory-mapped stream.

It is still possible to implement read/write files using this structure.
  There's nothing stopping you having an type that is an instance of
both InputStream and OutputStream (eg. a read/write file), and layering
buffering on top of this would yield a buffered input/output stream in
which the buffer contains only input or output data.

Cheers,
        Simon

{-# OPTIONS -fglasgow-exts -cpp #-}
-----------------------------------------------------------------------------
-- |
-- Module      :  System.IO.Stream
-- Copyright   :  (c) various
-- License     :  see libraries/base/LICENSE
--
-- Maintainer  :  [hidden email]
-- Stability   :  experimental
-- Portability :  non-portable (existentials, ghc extensions)
--
-- InputStreams and OutputStreams are classes of objects which support
-- input and output respectively.  Streams can be layered on top of various
-- underlying I/O objects (such as files or sockets).  Stream transformers
-- can be applied to turn streams of one type into streams of another type.
--
-----------------------------------------------------------------------------

module System.IO.Stream (
  -- * Streams
  {-class-} Stream(..),
  {-class-} InputStream(..),
  {-class-} OutputStream(..),
  streamGet, streamReadBuffer,
  streamPut, streamWriteBuffer,

  -- * Stream connections
  PipeInputStream, PipeOutputStream,
  streamPipe,
  streamConnect,

  -- * Memory streams
  {-class-} MemInputStream,
  {-class-} MemOutputStream,

  -- ** Converting memory streams to I/O streams
  MemToInputStream, memToInputStream,
  MemToOutputStream, memToOutputStream,

  -- * Buffering
  BufferMode(..),
  BufferedInputStream, bufferIntputStream,
  BufferedOutputStream, bufferOutputStream,
  ) where

import System.IO.Buffer

import Foreign
import Data.Word ( Word8 )
import System.IO ( BufferMode(..) )
import System.IO.Error  ( mkIOError, eofErrorType )
import Control.Exception ( assert )
import Control.Monad ( when, liftM )
import Control.Concurrent
import Data.IORef

import GHC.Exts
import GHC.Ptr ( Ptr(..) )
import GHC.IOBase ( IO(..), ioException )
import GHC.Handle ( ioe_EOF )

#define UPK  {-# UNPACK #-} !

-- -----------------------------------------------------------------------------
-- Streams

class Stream s where
   -- | closes a stream
   closeStream :: s -> IO ()
   
   -- | returns 'True' if the stream is open
   streamIsOpen :: s -> IO Bool

   -- | ToDo: objections have been raised about this method, and
   -- are still to be resolved.  It doesn't make as much sense
   -- for output streams as it does for input streams.
   streamIsEOS :: s -> IO Bool

   -- | Returns 'True' if there is data available to read from this
   -- stream.  Returns 'False' if either there is no data available, or
   -- the end of the stream has been reached.
   streamReady :: s -> IO Bool

   -- | Returns the number of bytes that can be transfered to/from
   -- this stream, if known.
   streamRemaining :: s -> IO (Maybe Integer)


-- | An 'InputStream' is a basic I/O object which supports reading a
-- stream of 'Word8's.  It is expected that 'InputStream's are unbuffered:
-- buffering is layered on top of one of these.
class Stream s => InputStream s where
   -- | Grabs data without blocking, but only if there is data available.
   -- If there is none, then waits for some.  This function may only
   -- return zero if either the requested length is zero or the end of stream
   -- has been reached.
   streamReadBufferNonBlocking  :: s -> Integer -> Ptr Word8 -> IO Integer


-- | Reads a single 'Word8' from a stream.
streamGet :: InputStream s => s -> IO Word8
streamGet s =
  alloca $ \p -> do
     r <- streamReadBufferNonBlocking s 1 p
     if r == 0 then ioe_EOF else peek p
   
-- | Reads data from the stream into a 'Buffer'.  Returns the
-- number of elements that were read, which may only be less than the
-- requested length if the end of the stream was reached.
streamReadBuffer :: InputStream s => s -> Integer -> Ptr Word8 -> IO Integer
streamReadBuffer s 0 buf = return 0
streamReadBuffer s len ptr = streamReadBufferLoop s ptr 0 (fromIntegral len)

streamReadBufferLoop
        :: InputStream s => s -> Ptr Word8
        -> Integer -> Integer -> IO Integer
streamReadBufferLoop s ptr off len = do
  r <- streamReadBufferNonBlocking s len ptr
  if r == 0
     then return (fromIntegral off)
     else if (r < len)
             then streamReadBufferLoop s
                        (ptr `plusPtr` fromIntegral r) (off+r) (len-r)
             else return (off+r)

-- -----------------------------------------------------------------------------
-- Output streams

class Stream s => OutputStream s where
   -- | Writes data to an output stream.  It will write at least one
   -- byte, but will only write further bytes if it can do so without
   -- blocking.
   --
   -- The result may never be 0 if the requested write size was > 0.
   -- If no bytes can be written to the stream, then the
   -- 'streamWriteBufferNonBlocking' should raise an exception
   -- indicating the cause of the problem (eg. the stream is closed).
   streamWriteBufferNonBlocking :: s -> Integer -> Ptr Word8 -> IO Integer
   

-- | Writes a single byte to an output stream.
streamPut :: OutputStream s => s -> Word8 -> IO ()
streamPut s word = with word $ \p -> streamWriteBuffer s 1 p

-- | Writes data to a stream, only returns when all the data has been
-- written.
streamWriteBuffer :: OutputStream s => s -> Integer -> Ptr Word8 -> IO ()
streamWriteBuffer s 0   ptr = return ()
streamWriteBuffer s len ptr = streamWriteBufferLoop s ptr 0 len

streamWriteBufferLoop
        :: OutputStream s => s -> Ptr Word8
        -> Integer -> Integer -> IO ()
streamWriteBufferLoop s ptr off len =
  seq off $ -- strictness hack
  if len == 0
        then return ()
        else do
  r <- streamWriteBufferNonBlocking s len ptr
  assert (r /= 0) $ do
  if (r < len)
        then streamWriteBufferLoop s (ptr `plusPtr` fromIntegral r)
                (off+r) (len-r)
        else return ()

-- ----------------------------------------------------------------------------
-- Connecting streams

data PipeInputStream
data PipeOutputStream

instance Stream PipeInputStream -- ToDo
instance InputStream PipeInputStream -- ToDo

instance Stream PipeOutputStream -- ToDo
instance OutputStream PipeOutputStream -- ToDo

streamPipe :: IO (PipeInputStream, PipeOutputStream)
streamPipe = error "unimplemented: streamOutputToInput"

-- | Takes an output stream and an input stream, and pipes all the
-- data from the former into the latter.
streamConnect :: (OutputStream o, InputStream i) => o -> i -> IO ()
streamConnect = error "unimplemented: streamInputToOutput"

-- ----------------------------------------------------------------------------
-- Memory streams

-- | An input stream accessed directly via a memory buffer.
-- Ordinary 'InputStream's may be converted to 'MemInputStream's by
-- adding buffering; see 'bufferInputStream'.
class MemInputStream s where
  -- | Consume some bytes from the memory stream.  The second argument
  -- is an IO action that is passed a buffer and its size (the size must
  -- be non-zero), and it should return the number of bytes consumed.
  withStreamInputBuffer :: s -> (Ptr Word8 -> Int -> IO Int) -> IO ()

-- | An output stream accessed directly via a memory buffer.
-- Ordinary 'OutputStream's may be converted to 'MemOutputStream's by
-- adding buffering; see 'bufferOutputStream'.
class MemOutputStream s where
  -- | Write some bytes to a memory stream.  The second argument
  -- is an IO action that is passed a buffer and its size (the size must
  -- be non-zero), and it should return the number of bytes written.
  withStreamOutputBuffer :: s -> (Ptr Word8 -> Int -> IO Int) -> IO ()

-- -----------------------------------------------------------------------------
-- A memory stream can be converted to an ordinary byte stream

newtype MemToInputStream  s = MemToInputStream  s
  deriving (Stream, MemInputStream)

newtype MemToOutputStream s = MemToOutputStream s
  deriving (Stream, MemOutputStream)

-- Rationale: what we really want is
--    instance MemInputStream s => InputStream s
-- but that overlaps.  So instead we provide a way to convert
-- every MemInputStream into something that is an instance of
-- InputStream.

instance InputStream  (MemToInputStream  s) where
instance OutputStream (MemToOutputStream s) where

memToInputStream  :: MemInputStream  s => MemToInputStream s
memToInputStream = MemToInputStream

memToOutputStream :: MemOutputStream s => MemToOutputStream s
memToOutputStream = MemToOutputStream

-- -----------------------------------------------------------------------------
-- Buffering

-- | Operations on a stream with a buffer
class Buffered s where
   -- | Sets the buffering mode on the stream.  Returns 'True' if
   -- the buffereing mode was set, or 'False' if it wasn't.  If the
   -- stream does not support buffering, it may return 'False'.
   setBufferMode :: s -> BufferMode -> IO Bool
   
   -- | Returns the current buffering mode for a stream.  On a
   -- stream that does not support buffering, the result will always
   -- be 'NoBuffering'.
   getBufferMode :: s -> IO BufferMode

   -- | Returns the number of bytes of data in the buffer
   countBufferedBytes :: s -> IO Int


-- | Operations on an output stream with a buffer
class OutputBuffered s where
   -- | Flushes the buffer to the operating system
   flush :: s -> IO ()
   
   -- | Flushes the buffered data as far as possible, even to the
   -- physical media if it can.  It returns 'True' if the data
   -- has definitely been flushed as far as it can go: to the
   -- disk for a disk file, to the screen for a terminal, and so on.
   sync :: s -> IO Bool


-- | Operations on an input stream with a buffer
class InputBuffered s where
   -- | Discards the input buffer
   discard :: s -> IO ()

   -- | Pushes back the buffered data, if possible.  Returns 'True' if
   -- the buffer could be pushed back, 'False' otherwise.
   pushback :: s -> IO Bool
        -- TODO: Seekable superclass allows pushback?


-- | An 'InputStream' with buffering added
data BufferedInputStream  s = BufferedInputStream  s BufferMode !(IORef (Buffer Word8))

-- | An 'OutputStream' with buffering added
data BufferedOutputStream s = BufferedOutputStream s BufferMode !(IORef (Buffer Word8))

bufferSize :: BufferMode -> Int
bufferSize (BlockBuffering (Just size)) = size
bufferSize _ = dEFAULT_BUFFER_SIZE

-- | Add buffering to an 'InputStream'
bufferInputStream :: InputStream  s => s -> BufferMode -> BufferedInputStream  s
bufferInputStream stream bmode = do
   buffer <- allocateBuffer (bufferSize bmode)
   r <- newIORef buffer
   return (BufferedInputStream stream bmode r)

-- | Add buffering to an 'OutputStream'
bufferOutputStream :: OutputStream s => s -> BufferedOutputStream s
bufferOutputStream stream bmode = do
   buffer <- allocateBuffer (bufferSize bmode)
   r <- newIORef buffer
   return (BufferedOutputStream stream bmode r)

instance Stream s => MemInputStream (BufferedInputStream s) where
  withStreamInputBuffer b@(BufferedInputStream s bmode ref) action = do
    buffer <- readIORef ref
    if emptyBuffer buffer
        then do buffer' <- fillReadBuffer s buffer
                writeIORef ref buffer'
                withStreamInputBuffer b action
        else do let used = bufferUsed buffer
                count <- withBuffer buffer $ \ptr ->
                                action (ptr `plusPtr` bufRPtr buffer) used
                let buffer' = bufferRemove (min used count)
                checkBufferInvariants buffer'
                writeIORef ref $! buffer'

instance Stream s => MemOutputStream (BufferedOutputStream s) where
  withStreamOutputBuffer b@(BufferedOutputStream s bmode ref) action = do
    buffer <- readIORef ref
    let avail = bufferAvailable buffer
    count <- withBuffer buffer $ \ptr ->
                        action (ptr `plusPtr` bufWPtr buffer) avail
    let buffer' = bufferAdd (max count avail)
    if fullBuffer buffer'
        then do writeBuffer s buffer; writeIORef ref (emptyBuffer buffer')
        else writeIORef ref buffer'

instance Stream s => Stream (BufferedInputStream s)
instance Stream s => Stream (BufferedOutputStream s)

instance Buffered (BufferedOutputStream s)
instance OutputBuffered (BufferedOutputStream s)

instance Buffered (BufferedInputStream s)
instance InputBuffered (BufferedInputStream s)

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: using ghc with make

Marcin 'Qrczak' Kowalczyk
Simon Marlow <[hidden email]> writes:

> You can only buffer a byte stream.  See my sketch design.

IMHO there should be buffered character streams too; of course in this
case the buffer holds characters. This allows to amortize the cost of
character recoding. Buffering should usually be the topmost layer.

--
   __("<         Marcin Kowalczyk
   \__/       [hidden email]
    ^^     http://qrnik.knm.org.pl/~qrczak/
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Streams internal structure

Bulat Ziganshin-2
In reply to this post by Simon Marlow-5
Hello Simon,

Thursday, April 20, 2006, 12:57:10 PM, you wrote:

>>>Believe me I've looked in detail at your streams library.
>>
>>>The main problem is that it doesn't have enough type structure.  There
>>>are many combinations of stream transformers that don't make sense, and
>>>should therefore be ruled out by the type system.  There are operations
>>>that don't work on some streams.  There should at the least be a type
>>>distinction between directly accessible memory streams, byte streams,
>>>and text streams.  Additionally I would add separate classes for
>>>seekable and buffered streams.  I believe these changes would improve
>>>performance by reducing the size of dictionaries.

well, your last answer shows that you don't understand my problems.
i'm entirely want to have precise classes, but when i run into
IMPLEMENTATION, Haskell restrictions bite me again and again. for
beginning, i should describe class structure what was modified
according to your critics (i yet not released version with these
modifications, but it available as
http://freearc.narod.ru/Binary_20060329184510.rar)

currently, Stream lib includes class Streams, what includes
"unclassified", minor operations, such as vIsEOF or vShow, and classes
BlockStream, MemoryStream, ByteStream, TextStream

class BlockStream supports reading and writing of memory blocks:

class (Stream m h) => BlockStream m h | h->m where
    vGetBuf :: h -> Ptr a -> Int -> m Int
    vPutBuf :: h -> Ptr a -> Int -> m ()

it's a natural class for low-level, "raw" streams such as FD or
network sockets:

instance BlockStream IO FD where
    vGetBuf = fdGetBuf
    vPutBuf = fdPutBuf

vGetBuf, as you suggested, should implement non-blocking reads, while
the vPutBuf should write the whole buffer.


MemoryStream class supports low-level, "raw" streams whose data are
ALREADY RESIDE IN MEMORY. examples are MemBuf, MemoryMappedFile,
circular buffer for inter-thread communication what i plan to
implement. the main difference against BlockStream is what data are
not read in user-supplied buffer, but already somewhere in memory, and
MemoryStream functions just give to user address and size of next part
of data (for reading) or next place to fill up (for writing). so:

class (Stream m h) => MemoryStream m h | h->m where
    -- | Receive next buffer which contains data / should be filled with data
    vReceiveBuf :: h -> ReadWrite -> m (Ptr a, Int)
    -- | Release buffer after reading `len` bytes / Send buffer filled with `len` bytes
    vSendBuf    :: h -> Ptr a -> Int -> Int -> m ()

data ReadWrite = READING | WRITING | UNKNOWN  deriving (Eq, Show)


this scheme will allow to create MemBuf not as one huge buffer, as in
current implementation and all other Binary/... libraries, but as list
of buffers of some fixed size. the same applies to MemoryMappedFile -
this scheme allows to map just fixed-size buffer each time instead of
mapping the whole file

next level is the ByteStream class, what is just the way to quickly
read/write one byte at time:

class (Stream m h) => ByteStream m h | h->m where
    vGetByte :: h -> m Word8
    vPutByte :: h -> Word8 -> m ()

each buffering transformer implements ByteStream via BlockStream or
MemoryStream:

instance (BlockStream IO h) => ByteStream IO (BufferedBlockStream h)
instance (MemoryStream IO h) => ByteStream IO (BufferedMemoryStream h)
instance (MemoryStream IO h) => ByteStream IO (UncheckedBufferedMemoryStream h)

These data type constructors (BufferedBlockStream...) joins raw stream
with buffer, r/w pointer and other data required to implement
buffering:

data BufferedBlockStream h = BBuf h                  -- raw stream
                                  (IOURef BytePtr)   -- buffer
                                  (IOURef BytePtr)   -- buffer end
                                  (IOURef BytePtr)   -- r/w pointer
                                  ....
type BytePtr = Ptr Word8



Next level is class TextStream that implements text I/O operations:

class (Stream m h) => TextStream m h | h->m where
    vGetChar :: h -> m Char
    vGetLine :: h -> m String
    vGetContents :: h -> m String
    vPutChar :: h -> Char -> m ()
    vPutStr :: h -> String -> m ()

Encoding transformer attaches encoding to the ByteStream that allows
to implement text I/O:

instance (ByteStream m h) => TextStream m (WithEncoding m h)

data WithEncoding m h = WithEncoding h (CharEncoding m)

where type "CharEncoding m" provides vGetByte->vGetChar and vPutByte->vPutChar
transformers operating in monad m


all these can be named a canonical Streams hierarchy and it already
works. i will be glad to add BufferedStream and SeekableStream classes
and split BlockStream..TextStream to the reading and writing ones, but
this is, as i said, limited by implementation issues


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Streams internal structure

Simon Marlow-5
Hi Bulat,

Bulat Ziganshin wrote:

> well, your last answer shows that you don't understand my problems.
> i'm entirely want to have precise classes, but when i run into
> IMPLEMENTATION, Haskell restrictions bite me again and again.

Ok, I think you need to describe these problems in more detail.  The
message you just wrote describes the structure of the library which I
think is mostly fine, and corresponds fairly well with what I had in mind.

> http://freearc.narod.ru/Binary_20060329184510.rar)

Could you provide a .zip or .tar.gz instead?

> currently, Stream lib includes class Streams, what includes
> "unclassified", minor operations, such as vIsEOF or vShow, and classes
> BlockStream, MemoryStream, ByteStream, TextStream
>
> class BlockStream supports reading and writing of memory blocks:
>
> class (Stream m h) => BlockStream m h | h->m where
>     vGetBuf :: h -> Ptr a -> Int -> m Int
>     vPutBuf :: h -> Ptr a -> Int -> m ()

fine, that's closely equivalent to my InputStream/OutputStream.

> it's a natural class for low-level, "raw" streams such as FD or
> network sockets:
>
> instance BlockStream IO FD where
>     vGetBuf = fdGetBuf
>     vPutBuf = fdPutBuf
>
> vGetBuf, as you suggested, should implement non-blocking reads, while
> the vPutBuf should write the whole buffer.

vPutBuf should be non-blocking too.

> MemoryStream class supports low-level, "raw" streams whose data are
> ALREADY RESIDE IN MEMORY. examples are MemBuf, MemoryMappedFile,
> circular buffer for inter-thread communication what i plan to
> implement. the main difference against BlockStream is what data are
> not read in user-supplied buffer, but already somewhere in memory, and
> MemoryStream functions just give to user address and size of next part
> of data (for reading) or next place to fill up (for writing). so:
>
> class (Stream m h) => MemoryStream m h | h->m where
>     -- | Receive next buffer which contains data / should be filled with data
>     vReceiveBuf :: h -> ReadWrite -> m (Ptr a, Int)
>     -- | Release buffer after reading `len` bytes / Send buffer filled with `len` bytes
>     vSendBuf    :: h -> Ptr a -> Int -> Int -> m ()

which is equivalent to my MemInputStream/MemOutputStream.  I believe the
with-style interface that I use is better though.

> data ReadWrite = READING | WRITING | UNKNOWN  deriving (Eq, Show)
>
> this scheme will allow to create MemBuf not as one huge buffer, as in
> current implementation and all other Binary/... libraries, but as list
> of buffers of some fixed size. the same applies to MemoryMappedFile -
> this scheme allows to map just fixed-size buffer each time instead of
> mapping the whole file
>
> next level is the ByteStream class, what is just the way to quickly
> read/write one byte at time:
>
> class (Stream m h) => ByteStream m h | h->m where
>     vGetByte :: h -> m Word8
>     vPutByte :: h -> Word8 -> m ()
>
> each buffering transformer implements ByteStream via BlockStream or
> MemoryStream:

I didn't have an equivalent to this class in my design.  Why is it
necessary?  vGetByte/vPutByte can be implemented for an arbitrary
BlockStream or indeed a MemoryStream.

> instance (BlockStream IO h) => ByteStream IO (BufferedBlockStream h)

BufferedBlockStream == my BufferedInputStream/BufferedOutputStream

> instance (MemoryStream IO h) => ByteStream IO (BufferedMemoryStream h)

what's a BufferedMemoryStream for?  Isn't a memory stream already
buffered by definition?

> instance (MemoryStream IO h) => ByteStream IO (UncheckedBufferedMemoryStream h)
 >

> These data type constructors (BufferedBlockStream...) joins raw stream
> with buffer, r/w pointer and other data required to implement
> buffering:
>
> data BufferedBlockStream h = BBuf h                  -- raw stream
>                                   (IOURef BytePtr)   -- buffer
>                                   (IOURef BytePtr)   -- buffer end
>                                   (IOURef BytePtr)   -- r/w pointer
>                                   ....
> type BytePtr = Ptr Word8
>
>
>
> Next level is class TextStream that implements text I/O operations:
>
> class (Stream m h) => TextStream m h | h->m where
>     vGetChar :: h -> m Char
>     vGetLine :: h -> m String
>     vGetContents :: h -> m String
>     vPutChar :: h -> Char -> m ()
>     vPutStr :: h -> String -> m ()

Ok, I wouldn't put all those method in the class, I think.  Also it is
necessary to have buffering at the TextStream level, as Marcin pointed
out.  I haven't thought through the design carefully here.

> Encoding transformer attaches encoding to the ByteStream that allows
> to implement text I/O:
>
> instance (ByteStream m h) => TextStream m (WithEncoding m h)
>
> data WithEncoding m h = WithEncoding h (CharEncoding m)

yes

> where type "CharEncoding m" provides vGetByte->vGetChar and vPutByte->vPutChar
> transformers operating in monad m
>
> all these can be named a canonical Streams hierarchy and it already
> works.

So what are the problems you were referring to?  eg. this from your
previous message:

 > 2) separation of Stream classes make some automatic definitions
 > impossible. for example, released version contains vGetBuf
 > implementation that is defined via vGetChar and works ok for streams
 > that provide only vGetChar as base function.

and

> i will be glad to add BufferedStream and SeekableStream classes
> and split BlockStream..TextStream to the reading and writing ones, but
> this is, as i said, limited by implementation issues

which implementation issues?

Cheers,
        Simon
_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Streams internal structure

Bulat Ziganshin-2
In reply to this post by Bulat Ziganshin-2
Hello Bulat,

Friday, April 21, 2006, 1:10:29 PM, you wrote:

> according to your critics (i yet not released version with these
> modifications, but it available as
> http://freearc.narod.ru/Binary_20060329184510.rar)

sorry, it is http://freearc.narod.ru/Binary_20060331174635.rar


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re[2]: Streams internal structure

Bulat Ziganshin-2
In reply to this post by Simon Marlow-5
Hello Simon,

Friday, April 21, 2006, 2:20:26 PM, you wrote:

>> well, your last answer shows that you don't understand my problems.
>> i'm entirely want to have precise classes, but when i run into
>> IMPLEMENTATION, Haskell restrictions bite me again and again.

> Ok, I think you need to describe these problems in more detail.  The
> message you just wrote describes the structure of the library which I
> think is mostly fine, and corresponds fairly well with what I had in mind.

> Could you provide a .zip or .tar.gz instead?

yes, the last version is http://freearc.narod.ru/StreamsBeta.zip

> vPutBuf should be non-blocking too.

so it should return number of bytes written? and it should be named
vPutBufNonBlocking? you are already said in February that vGetBuf
should be used in buffering transformer gently - i.e. i should use
just the returned number of bytes and don't enforce filling of whole
buffer. how should be the policy of using vPutBufNonBlocking? what
should be vGEtBuf/vPutBuf (i.e. functions provided to users) - are
they should be blocking or non-blocking?

> which is equivalent to my MemInputStream/MemOutputStream.  I believe the
> with-style interface that I use is better though.

main benefit of my structure is that MemoryStream and BlockStream are
very close so i hope to implement common buffering layer. your variant
makes this much harder. on the other side, i don't see any benefits in
using your scheme

>> each buffering transformer implements ByteStream via BlockStream or
>> MemoryStream:

> I didn't have an equivalent to this class in my design.  Why is it
> necessary?  vGetByte/vPutByte can be implemented for an arbitrary
> BlockStream or indeed a MemoryStream.

:) yes, you never mind about implementation issues! how the naked FD, for
example, can implement vGetByte? i'm not Copperfield to extract buffer
from nowhere. Buffering transformer is not just a bunch of functions,
it's a whole DATA STRUCTURE what joins stream handle and buffer
pointers together! see my previous letter what described this

the main idea of the whole Streams library is what low-level stream
types don't carry any data that will be used only in high-level
routines. FD don't carry buffer or CharEncoding information. it's just
can read or write block of data - it's all! all other data items are
added by corresponding transformers. so FD CAN'T implement vGetByte
with buffering and its implementation without buffering will be imho
just source of errors - users will constantly asking why it is so
slow. moreover i can't add vGetByte/vPutByte to the BlockStream or
MemoryStream class (because both can implement it), so it anyway
should be in separate class

>> instance (MemoryStream IO h) => ByteStream IO (BufferedMemoryStream h)

> what's a BufferedMemoryStream for?  Isn't a memory stream already
> buffered by definition?

we just use word "buffering" in different meanings. for me buffering
means that underlying stream give us data in large enough chunks and
this can be implemented inefficiently. buffering transformer efficiently
implements byte-oriented or char-oriented operations by working with
buffer (either provided by MemoryStream or allocated explicitly for
BlockStream) and make rather infrequent calls to the block i/o
operations of underlying stream

so, for me MemoryStream is not buffered, it just support block i/o
operations in slightly different way than BlockStream. going your way,
as i understand, we should call withStreamInputBuffer on each i/o
operation what mean that withStreamInputBuffer should be implemented
very efficiently in each MemoryStream. well, it's also possible but i
prefer to use the same algorithms (and i hope to join implementations)
for byte&char i/o over both MemoryStreams and BlockStreams

on the other side, now i think about making BlockStream->MemoryStream
transformer (it should just alloc it's own buffer) and then your's
withStreamInputBuffer may be used for any I/O through buffer (i.e.
both for BlockStreams and MemoryStreams). it will be great for implementing
functions requiring lookahead such as vGetLine

>> class (Stream m h) => TextStream m h | h->m where
>>     vGetChar :: h -> m Char
>>     vGetLine :: h -> m String
>>     vGetContents :: h -> m String
>>     vPutChar :: h -> Char -> m ()
>>     vPutStr :: h -> String -> m ()

> Ok, I wouldn't put all those method in the class, I think.

it is because you never mind about fast implementation :)  i already
moved vPutStrLn/vPrint out of this class.

> Also it is
> necessary to have buffering at the TextStream level, as Marcin pointed
> out.  I haven't thought through the design carefully here.

implementation is rather obvious and therefore boring :)  we should
convert data to the UCS-4 and then work with buffer containing 4-byte
chars. are you not done this in 6.6 compiler when you read
UTF-8 sources? we will lose vTell operation (on the other side, we
anyway lose it on text streams in windows :) )


>> all these can be named a canonical Streams hierarchy and it already
>> works.

> So what are the problems you were referring to?

i said only about working things. as you see, i don't mentioned,
for example, InTextStream or SeekableStream classes, because i had
problems with implementing this


 >> 2) separation of Stream classes make some automatic definitions
 >> impossible. for example, released version contains vGetBuf
 >> implementation that is defined via vGetChar and works ok for streams
 >> that provide only vGetChar as base function.

it's a minor problem, so you can skip it. just for completeness: my
first lib contained the following definitions:

class Stream m h where
  vGetChar :: ...
  vGetPuf :: ...

  -- default implementations
  vGetPuf = ... some code using vGetChar

instance Stream IO StringBuffer
  vGetChar = ...
instance Stream IO StringReader
  vGetChar = ...

and vGetBuf for StringBuffer and StringReader was defined
automatically. try to make this effect for current design. it should
define:

instance (TextStream m h) => BlockStream m h
  vGetPuf = ... some code using vGetChar

but such code seen by Haskell as potential overlapping instance
definition
 

>> i will be glad to add BufferedStream and SeekableStream classes
>> and split BlockStream..TextStream to the reading and writing ones, but
>> this is, as i said, limited by implementation issues

> which implementation issues?

well, let's we have 3 different FDs - ReadFD, WriteFD and ReadWriteFD.
it's essentially the same Ints, just supporting different sets of I/O
operations:

instance InBlockStream ReadFD  where ...      -- implements vGetBuf
instance InBlockStream ReadWriteFD where ...
instance OutBlockStream WriteFD where ...     -- implements vPutBuf
instance OutBlockStream ReadWriteFD where ...

now i'm going to implement InByteStream interface (vGetByte) and
OutByteStream (vPutByte) for buffered FDs. i want to use
the same data type constructor to buffer all 3 types of FD streams.
well, we can write something like:

instance (InBlockStream h) => InByteStream (Buffered h) where
  vGetByte = ... something involving vGetBuf at last

instance (OutBlockStream h) => OutByteStream (Buffered h) where
  vPutByte = ... something involving vPutBuf at last

at the first look, it seems fine. but we also need to implement some
common functionality:

instance (Stream h) => Stream (Buffered h) where
  vSeek = ...

let's examine vSeek implementation. it should flush buffer on output
streams, i.e. it should call vPutBuf. but that's impossible - vPutBuf
is not in Streams dictionary! well, let's replicate implementation of
Stream class:

instance (InBlockStream h) => Stream (Buffered h) where
  ...
instance (OutBlockStream h) => Stream (Buffered h) where
  ...

but this, besides repeating almost the same code two times, again creates
overlapped instances! what instance should use compiler for ReadWriteFD?

yes, i can create 3 different data types for buffering input, output
and input-output streams but this means even more code duplication.
and multiplying these 3 i/o modes to the 3 buffering transformers i
already have, i got limitless code growth!


the same problem strikes me on each transformation stage:
BlockStream->ByteStream, ByteStream->TextStream,
ByteStream->BitsStream. it also bites me when i try to separate
SeekableStream operations - vSeek operation required to discard input
buffer, what is required in vSetBuf, for example. implementation of
different Stream operations contains numerous dependencies from each
other. may be it is great that i discover these dependencies at the early
development stage instead of bothering with numerous errors when
library will be really used in various environments

btw, that you think about specifying buffering mode at the Stream
creation? so vSetBuf will just gone away. to be exact, it will be
still possible to use different buffering schemes, by creating new
buffered Streams over the one raw Stream:

h <- openRawFD "text"
h1 <- withBuffering (BlockBuffering 512) h
... working with h1
vFlush h1 -- discards buffer and returns `h` pointer to exact the
          -- position of first byte that is not consumed by `h1`
h2 <- withBuffering LineBuffering h
... working with h2


--
Best regards,
 Bulat                            mailto:[hidden email]

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries
Reply | Threaded
Open this post in threaded view
|

Re: Streams internal structure

Simon Marlow-5
So I think I've extracted the important points here:

1. seek needs to flush the buffer, so it needs to know whether it is on
a buffered handle or not.  My plan was to do it like this:

class Seekable s where
   seek :: s -> SeekLocation -> IO ()

instance Seekable s => Seekable (BufferedInputStream s) where
   seek (BufferedInputStream s buf) loc = do
     .. flush the buffer ..
     seek s loc


2. performance of putByte, getByte.  You have the ByteStream interface,
as far as I can tell, just to improve the performance of these
operations.  I didn't have such a class because I believe that
shovelling large numbers of bytes is more important from a performance
perspective than shovelling one byte at a time.  For instance, I gave
the byte reader that works on any InputStream earlier:

-- | Reads a single 'Word8' from a stream.
streamGet :: InputStream s => s -> IO Word8
streamGet s =
   alloca $ \p -> do
      r <- streamReadBufferNonBlocking s 1 p
      if r == 0 then ioe_EOF else peek p

Still, you can have full speed single byte operations by just adding
methods to the MemInputStream/MemOutputStream class:

class MemInputStream s where
   withStreamInputBuffer :: s -> (Ptr Word8 -> Int -> IO (a,Int)) -> IO a

   memStreamGet :: s -> IO Word8
   memStreamGet s =
    withStreamInputBuffer s $ \p i -> do
      x <- peek p
      return (x, i+1)

(the default obviously isn't very fast, but individual instances would
provide faster implementations).


3. code explosion.

Firstly, removing ByteStream will reduce the code explosion.

In my design I had separate types for BufferedInputStream and
BufferedOutputStream, and there is hardly any duplication because
buffering input is just different to buffering output.  Read/write
buffering is just a combination of these two.

I still don't see why you need another layer of buffering over a memory
stream.  It supports direct operations on the memory already.  As I said
above, if you want fast(er) single byte read on these, then add it to
the MemInputStream class.

There is some boilerplate: the need to translate Stream, Seekable,
Buffered, InputBuffered etc. through upper layers to lower layers means
some boilerplate wrapper instances.  But I don't think that's a big
deal, and splitting up these classes hasn't made it any worse.
Admittedly though, separating input and output has doubled the amount of
boilerplate wrappers.  I don't see a way around that, other than using
some preprocessing or TH to generate the instances.

--------------

So here's how I see the layering for a file reading stream using iconv:

   file or socket (fd-based) :: FileInputStream
      instance of: InputStream, Stream, Seekable

   byte buffer :: BufferedInputStream
      instance of: MemInputStream, Stream,
                   Seekable, Buffered, InputBuffered

   iconv :: BufferedTextInputStream
      instance of: MemTextInputStream, TextInputStream, Stream,
                   Seekable, Buffered, InputBuffered

   lock :: LockedTextInputStream
      instance of: MemTextInputStream, TextInputStream, Stream,
                   Seekable, Buffered, InputBuffered

Two layers of buffering aren't strictly necessary, because the byte
buffer will usually be emptied immediately by the decoder, except that
it might leave a few bytes when there isn't a complete character to
decode.  It is simpler to do it this way though, because the decoding
layer then works with any MemInputStream (eg. a memory mapped file or
ByteString, as well as a buffered file), and it doesn't lose any efficiency.

A memory-mapped file becomes a MemInputStream directly, so it only gets
3 layers instead of 4, similarly for a ByteString or UArray Word8.

I haven't said what the MemTextInputStream or TextInputStream classes
look like.  I imagine something like this:

class MemTextInputStream s where
   withCharInputBuffer :: s -> (Ptr Char -> Int -> IO (a,Int)) -> IO a

class TextInputStream s where
   getChar :: s -> IO Char
   getContents ...
   getLine ...

A StringReader can be made an instance of TextInputStream directly.

So do you see any problem with this design?  Performance should be fine:
getContents reads directly from the buffer filled by iconv, and all
other operations are done at buffer sizes.  Fast byte and multibyte
operations are supported on MemInputStreams so fast Binary I/O and
serialisation/deserialisation are possible.

Cheers,
        Simon

Bulat Ziganshin wrote:

> Hello Simon,
>
> Friday, April 21, 2006, 2:20:26 PM, you wrote:
>
>
>>>well, your last answer shows that you don't understand my problems.
>>>i'm entirely want to have precise classes, but when i run into
>>>IMPLEMENTATION, Haskell restrictions bite me again and again.
>
>
>>Ok, I think you need to describe these problems in more detail.  The
>>message you just wrote describes the structure of the library which I
>>think is mostly fine, and corresponds fairly well with what I had in mind.
>
>
>>Could you provide a .zip or .tar.gz instead?
>
>
> yes, the last version is http://freearc.narod.ru/StreamsBeta.zip
>
>
>>vPutBuf should be non-blocking too.
>
>
> so it should return number of bytes written? and it should be named
> vPutBufNonBlocking? you are already said in February that vGetBuf
> should be used in buffering transformer gently - i.e. i should use
> just the returned number of bytes and don't enforce filling of whole
> buffer. how should be the policy of using vPutBufNonBlocking? what
> should be vGEtBuf/vPutBuf (i.e. functions provided to users) - are
> they should be blocking or non-blocking?
>
>
>>which is equivalent to my MemInputStream/MemOutputStream.  I believe the
>>with-style interface that I use is better though.
>
>
> main benefit of my structure is that MemoryStream and BlockStream are
> very close so i hope to implement common buffering layer. your variant
> makes this much harder. on the other side, i don't see any benefits in
> using your scheme
>
>
>>>each buffering transformer implements ByteStream via BlockStream or
>>>MemoryStream:
>
>
>>I didn't have an equivalent to this class in my design.  Why is it
>>necessary?  vGetByte/vPutByte can be implemented for an arbitrary
>>BlockStream or indeed a MemoryStream.
>
>
> :) yes, you never mind about implementation issues! how the naked FD, for
> example, can implement vGetByte? i'm not Copperfield to extract buffer
> from nowhere. Buffering transformer is not just a bunch of functions,
> it's a whole DATA STRUCTURE what joins stream handle and buffer
> pointers together! see my previous letter what described this
>
> the main idea of the whole Streams library is what low-level stream
> types don't carry any data that will be used only in high-level
> routines. FD don't carry buffer or CharEncoding information. it's just
> can read or write block of data - it's all! all other data items are
> added by corresponding transformers. so FD CAN'T implement vGetByte
> with buffering and its implementation without buffering will be imho
> just source of errors - users will constantly asking why it is so
> slow. moreover i can't add vGetByte/vPutByte to the BlockStream or
> MemoryStream class (because both can implement it), so it anyway
> should be in separate class
>
>
>>>instance (MemoryStream IO h) => ByteStream IO (BufferedMemoryStream h)
>
>
>>what's a BufferedMemoryStream for?  Isn't a memory stream already
>>buffered by definition?
>
>
> we just use word "buffering" in different meanings. for me buffering
> means that underlying stream give us data in large enough chunks and
> this can be implemented inefficiently. buffering transformer efficiently
> implements byte-oriented or char-oriented operations by working with
> buffer (either provided by MemoryStream or allocated explicitly for
> BlockStream) and make rather infrequent calls to the block i/o
> operations of underlying stream
>
> so, for me MemoryStream is not buffered, it just support block i/o
> operations in slightly different way than BlockStream. going your way,
> as i understand, we should call withStreamInputBuffer on each i/o
> operation what mean that withStreamInputBuffer should be implemented
> very efficiently in each MemoryStream. well, it's also possible but i
> prefer to use the same algorithms (and i hope to join implementations)
> for byte&char i/o over both MemoryStreams and BlockStreams
>
> on the other side, now i think about making BlockStream->MemoryStream
> transformer (it should just alloc it's own buffer) and then your's
> withStreamInputBuffer may be used for any I/O through buffer (i.e.
> both for BlockStreams and MemoryStreams). it will be great for implementing
> functions requiring lookahead such as vGetLine
>
>
>>>class (Stream m h) => TextStream m h | h->m where
>>>    vGetChar :: h -> m Char
>>>    vGetLine :: h -> m String
>>>    vGetContents :: h -> m String
>>>    vPutChar :: h -> Char -> m ()
>>>    vPutStr :: h -> String -> m ()
>
>
>>Ok, I wouldn't put all those method in the class, I think.
>
>
> it is because you never mind about fast implementation :)  i already
> moved vPutStrLn/vPrint out of this class.
>
>
>>Also it is
>>necessary to have buffering at the TextStream level, as Marcin pointed
>>out.  I haven't thought through the design carefully here.
>
>
> implementation is rather obvious and therefore boring :)  we should
> convert data to the UCS-4 and then work with buffer containing 4-byte
> chars. are you not done this in 6.6 compiler when you read
> UTF-8 sources? we will lose vTell operation (on the other side, we
> anyway lose it on text streams in windows :) )
>
>
>
>>>all these can be named a canonical Streams hierarchy and it already
>>>works.
>
>
>>So what are the problems you were referring to?
>
>
> i said only about working things. as you see, i don't mentioned,
> for example, InTextStream or SeekableStream classes, because i had
> problems with implementing this
>
>
>  >> 2) separation of Stream classes make some automatic definitions
>  >> impossible. for example, released version contains vGetBuf
>  >> implementation that is defined via vGetChar and works ok for streams
>  >> that provide only vGetChar as base function.
>
> it's a minor problem, so you can skip it. just for completeness: my
> first lib contained the following definitions:
>
> class Stream m h where
>   vGetChar :: ...
>   vGetPuf :: ...
>
>   -- default implementations
>   vGetPuf = ... some code using vGetChar
>
> instance Stream IO StringBuffer
>   vGetChar = ...
> instance Stream IO StringReader
>   vGetChar = ...
>
> and vGetBuf for StringBuffer and StringReader was defined
> automatically. try to make this effect for current design. it should
> define:
>
> instance (TextStream m h) => BlockStream m h
>   vGetPuf = ... some code using vGetChar
>
> but such code seen by Haskell as potential overlapping instance
> definition
>  
>
>
>>>i will be glad to add BufferedStream and SeekableStream classes
>>>and split BlockStream..TextStream to the reading and writing ones, but
>>>this is, as i said, limited by implementation issues
>
>
>>which implementation issues?
>
>
> well, let's we have 3 different FDs - ReadFD, WriteFD and ReadWriteFD.
> it's essentially the same Ints, just supporting different sets of I/O
> operations:
>
> instance InBlockStream ReadFD  where ...      -- implements vGetBuf
> instance InBlockStream ReadWriteFD where ...
> instance OutBlockStream WriteFD where ...     -- implements vPutBuf
> instance OutBlockStream ReadWriteFD where ...
>
> now i'm going to implement InByteStream interface (vGetByte) and
> OutByteStream (vPutByte) for buffered FDs. i want to use
> the same data type constructor to buffer all 3 types of FD streams.
> well, we can write something like:
>
> instance (InBlockStream h) => InByteStream (Buffered h) where
>   vGetByte = ... something involving vGetBuf at last
>
> instance (OutBlockStream h) => OutByteStream (Buffered h) where
>   vPutByte = ... something involving vPutBuf at last
>
> at the first look, it seems fine. but we also need to implement some
> common functionality:
>
> instance (Stream h) => Stream (Buffered h) where
>   vSeek = ...
>
> let's examine vSeek implementation. it should flush buffer on output
> streams, i.e. it should call vPutBuf. but that's impossible - vPutBuf
> is not in Streams dictionary! well, let's replicate implementation of
> Stream class:
>
> instance (InBlockStream h) => Stream (Buffered h) where
>   ...
> instance (OutBlockStream h) => Stream (Buffered h) where
>   ...
>
> but this, besides repeating almost the same code two times, again creates
> overlapped instances! what instance should use compiler for ReadWriteFD?
>
> yes, i can create 3 different data types for buffering input, output
> and input-output streams but this means even more code duplication.
> and multiplying these 3 i/o modes to the 3 buffering transformers i
> already have, i got limitless code growth!
>
>
> the same problem strikes me on each transformation stage:
> BlockStream->ByteStream, ByteStream->TextStream,
> ByteStream->BitsStream. it also bites me when i try to separate
> SeekableStream operations - vSeek operation required to discard input
> buffer, what is required in vSetBuf, for example. implementation of
> different Stream operations contains numerous dependencies from each
> other. may be it is great that i discover these dependencies at the early
> development stage instead of bothering with numerous errors when
> library will be really used in various environments
>
> btw, that you think about specifying buffering mode at the Stream
> creation? so vSetBuf will just gone away. to be exact, it will be
> still possible to use different buffering schemes, by creating new
> buffered Streams over the one raw Stream:
>
> h <- openRawFD "text"
> h1 <- withBuffering (BlockBuffering 512) h
> ... working with h1
> vFlush h1 -- discards buffer and returns `h` pointer to exact the
>           -- position of first byte that is not consumed by `h1`
> h2 <- withBuffering LineBuffering h
> ... working with h2
>
>

_______________________________________________
Libraries mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/libraries