Opinion on two splitting functions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Opinion on two splitting functions

Daniel Díaz Carrete
Hi,

The pipes-group and streaming libraries both give you functions like "group", "groupBy" and "chunksOf" that let you segment a stream while preserving streaming. I was thinking of implementing these two additional functions:

delimit  :: (x -> a -> (x,NonEmpty [b])) 
         -> x 
         -> (x -> NonEmpty [b])
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

delimitM :: (x -> a -> Stream (Of b) m (Stream (Stream (Of b) m) m x))
         -> m x 
         -> (x -> Stream (Of b) m (Stream (Stream (Of b) m) m ()))
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

For "delimit", the idea is that every step produces a NonEmpty list of lists along with the new internal state. The list at the head would consist of elements belonging to the "current open group". If there are more lists afterwards, they represent new groups which have been detected in the stream. For streams comprising one single continuous group,  the NonEmptys would *always* contain just one list.

"delimitM" is the monadic version, although here the types start getting quite scary.

What do you think? Would "delimit" make sense, or at this level of complexity it would be better to build the splitting function by single-stepping the source stream?

(Extra semi-related question: could the FreeT trick used in pipes-group also be applied to Sources from the conduit library?)

--
Reply | Threaded
Open this post in threaded view
|

Re: Opinion on two splitting functions

Gabriel Gonzalez
What is the motivation for `delimit`/delimitM`?  Are these designed to be utilities for building group-like functions?

Also, the `FreeT` trick can be applied to `ConduitM`, although not `Source`

On Aug 17, 2017, at 5:47 PM, Daniel Díaz <[hidden email]> wrote:

Hi,

The pipes-group and streaming libraries both give you functions like "group", "groupBy" and "chunksOf" that let you segment a stream while preserving streaming. I was thinking of implementing these two additional functions:

delimit  :: (x -> a -> (x,NonEmpty [b])) 
         -> x 
         -> (x -> NonEmpty [b])
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

delimitM :: (x -> a -> Stream (Of b) m (Stream (Stream (Of b) m) m x))
         -> m x 
         -> (x -> Stream (Of b) m (Stream (Stream (Of b) m) m ()))
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

For "delimit", the idea is that every step produces a NonEmpty list of lists along with the new internal state. The list at the head would consist of elements belonging to the "current open group". If there are more lists afterwards, they represent new groups which have been detected in the stream. For streams comprising one single continuous group,  the NonEmptys would *always* contain just one list.

"delimitM" is the monadic version, although here the types start getting quite scary.

What do you think? Would "delimit" make sense, or at this level of complexity it would be better to build the splitting function by single-stepping the source stream?

(Extra semi-related question: could the FreeT trick used in pipes-group also be applied to Sources from the conduit library?)

--

--
Reply | Threaded
Open this post in threaded view
|

Re: Opinion on two splitting functions

Daniel Díaz Carrete
On Saturday, August 19, 2017 at 6:34:21 PM UTC+2, Gabriel Gonzalez wrote:
What is the motivation for `delimit`/delimitM`?  Are these designed to be utilities for building group-like functions?

Yes.

I'm trying to use Backpack to create an abstract common interface to pipes / streaming / conduit: https://github.com/danidiaz/streamy

I don't want to expose a single-stepping function like "next", but I would still like to be able to define complex splitters using only the abstract signature.


Also, the `FreeT` trick can be applied to `ConduitM`, although not `Source`

Interesting, so perhaps it would be possible to implement a conduit-group package...
 

On Aug 17, 2017, at 5:47 PM, Daniel Díaz <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="5Togig8GCQAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">diaz.c...@...> wrote:

Hi,

The pipes-group and streaming libraries both give you functions like "group", "groupBy" and "chunksOf" that let you segment a stream while preserving streaming. I was thinking of implementing these two additional functions:

delimit  :: (x -> a -> (x,NonEmpty [b])) 
         -> x 
         -> (x -> NonEmpty [b])
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

delimitM :: (x -> a -> Stream (Of b) m (Stream (Stream (Of b) m) m x))
         -> m x 
         -> (x -> Stream (Of b) m (Stream (Stream (Of b) m) m ()))
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

For "delimit", the idea is that every step produces a NonEmpty list of lists along with the new internal state. The list at the head would consist of elements belonging to the "current open group". If there are more lists afterwards, they represent new groups which have been detected in the stream. For streams comprising one single continuous group,  the NonEmptys would *always* contain just one list.

"delimitM" is the monadic version, although here the types start getting quite scary.

What do you think? Would "delimit" make sense, or at this level of complexity it would be better to build the splitting function by single-stepping the source stream?

(Extra semi-related question: could the FreeT trick used in pipes-group also be applied to Sources from the conduit library?)

--

--
Reply | Threaded
Open this post in threaded view
|

Re: Opinion on two splitting functions

Gabriel Gonzalez
So one thing I want to point out is that if you change the type of `delimit` to:

    delimit :: (x -> a -> x) -> x -> (x -> NonEmpty b) -> Stream (Of a) m r -> Stream (Stream (Of b) m) m r

... which might be equivalent depending on your needs, then you can simplify the type further to:

    delimit :: Fold a (NonEmpty b) -> Stream (Of a) m r -> Stream (Stream (Of b) m) m r

... where `Fold` is from the `foldl` library

You can similarly simplify `delimitM` using `FoldM` from the same library

On Aug 19, 2017, at 12:31 PM, Daniel Díaz <[hidden email]> wrote:

On Saturday, August 19, 2017 at 6:34:21 PM UTC+2, Gabriel Gonzalez wrote:
What is the motivation for `delimit`/delimitM`?  Are these designed to be utilities for building group-like functions?

Yes.

I'm trying to use Backpack to create an abstract common interface to pipes / streaming / conduit: https://github.com/danidiaz/streamy

I don't want to expose a single-stepping function like "next", but I would still like to be able to define complex splitters using only the abstract signature.


Also, the `FreeT` trick can be applied to `ConduitM`, although not `Source`

Interesting, so perhaps it would be possible to implement a conduit-group package...
 

On Aug 17, 2017, at 5:47 PM, Daniel Díaz <diaz.c...@gmail.com> wrote:

Hi,

The pipes-group and streaming libraries both give you functions like "group", "groupBy" and "chunksOf" that let you segment a stream while preserving streaming. I was thinking of implementing these two additional functions:

delimit  :: (x -> a -> (x,NonEmpty [b])) 
         -> x 
         -> (x -> NonEmpty [b])
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

delimitM :: (x -> a -> Stream (Of b) m (Stream (Stream (Of b) m) m x))
         -> m x 
         -> (x -> Stream (Of b) m (Stream (Stream (Of b) m) m ()))
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

For "delimit", the idea is that every step produces a NonEmpty list of lists along with the new internal state. The list at the head would consist of elements belonging to the "current open group". If there are more lists afterwards, they represent new groups which have been detected in the stream. For streams comprising one single continuous group,  the NonEmptys would *always* contain just one list.

"delimitM" is the monadic version, although here the types start getting quite scary.

What do you think? Would "delimit" make sense, or at this level of complexity it would be better to build the splitting function by single-stepping the source stream?

(Extra semi-related question: could the FreeT trick used in pipes-group also be applied to Sources from the conduit library?)

-- 

--
Reply | Threaded
Open this post in threaded view
|

Re: Opinion on two splitting functions

Daniel Díaz Carrete
One slight awkwardness of representing it as a Fold is that it gives the impression that the initial state may emit values downstream, which is odd for a splitter.

Anyway, I when ahead and implement the function. Here is the Backpack signature with some documentation. Here is the pipes implementation, here the streaming one, and here some simple tests.

I believe one could implement the pipes-text splitters (lines,words) this way.

On Saturday, August 19, 2017 at 11:53:21 PM UTC+2, Gabriel Gonzalez wrote:
So one thing I want to point out is that if you change the type of `delimit` to:

    delimit :: (x -> a -> x) -> x -> (x -> NonEmpty b) -> Stream (Of a) m r -> Stream (Stream (Of b) m) m r

... which might be equivalent depending on your needs, then you can simplify the type further to:

    delimit :: Fold a (NonEmpty b) -> Stream (Of a) m r -> Stream (Stream (Of b) m) m r

... where `Fold` is from the `foldl` library

You can similarly simplify `delimitM` using `FoldM` from the same library

On Aug 19, 2017, at 12:31 PM, Daniel Díaz <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="BpnTAXgXCQAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">diaz.c...@...> wrote:

On Saturday, August 19, 2017 at 6:34:21 PM UTC+2, Gabriel Gonzalez wrote:
What is the motivation for `delimit`/delimitM`?  Are these designed to be utilities for building group-like functions?

Yes.

I'm trying to use Backpack to create an abstract common interface to pipes / streaming / conduit: <a href="https://github.com/danidiaz/streamy" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdanidiaz%2Fstreamy\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNFslTxPNQ50kt-G7d4VFZJm6lSGeA&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\x3dhttps%3A%2F%2Fgithub.com%2Fdanidiaz%2Fstreamy\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNFslTxPNQ50kt-G7d4VFZJm6lSGeA&#39;;return true;">https://github.com/danidiaz/streamy

I don't want to expose a single-stepping function like "next", but I would still like to be able to define complex splitters using only the abstract signature.


Also, the `FreeT` trick can be applied to `ConduitM`, although not `Source`

Interesting, so perhaps it would be possible to implement a conduit-group package...
 

On Aug 17, 2017, at 5:47 PM, Daniel Díaz <diaz.c...@<a href="http://gmail.com/" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://gmail.com/&#39;;return true;" onclick="this.href=&#39;http://gmail.com/&#39;;return true;">gmail.com> wrote:

Hi,

The pipes-group and streaming libraries both give you functions like "group", "groupBy" and "chunksOf" that let you segment a stream while preserving streaming. I was thinking of implementing these two additional functions:

delimit  :: (x -> a -> (x,NonEmpty [b])) 
         -> x 
         -> (x -> NonEmpty [b])
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

delimitM :: (x -> a -> Stream (Of b) m (Stream (Stream (Of b) m) m x))
         -> m x 
         -> (x -> Stream (Of b) m (Stream (Stream (Of b) m) m ()))
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

For "delimit", the idea is that every step produces a NonEmpty list of lists along with the new internal state. The list at the head would consist of elements belonging to the "current open group". If there are more lists afterwards, they represent new groups which have been detected in the stream. For streams comprising one single continuous group,  the NonEmptys would *always* contain just one list.

"delimitM" is the monadic version, although here the types start getting quite scary.

What do you think? Would "delimit" make sense, or at this level of complexity it would be better to build the splitting function by single-stepping the source stream?

(Extra semi-related question: could the FreeT trick used in pipes-group also be applied to Sources from the conduit library?)

-- 

--
Reply | Threaded
Open this post in threaded view
|

Re: Opinion on two splitting functions

Gabriel Gonzalez
Note that this applies to the type that you gave, too!  There is no way to deduce from the type that the initial state won't be used to produce a value :)

If you do want that guarantee, then perhaps you could restructure the type as:

    delimit :: (x -> a -> (x, NonEmpty b)) -> x -> Stream (Of a) m r -> Stream (Stream (Of b) m) m r

This type has some nice properties, too.  For example, the first part is equivalent to a mealy machine:

    {-# LANGUAGE ExistentialQuantification #-}

    data Mealy a b = forall s . Mealy
        { begin :: s

        , step :: a -> s -> (b, s)
        -- ^ @a -> State s b@
        }

... which implements all sorts of type classes (like `Category`, `Applicative` and `Arrow`) and supports nice lens-based combinators analogous to the ones for `Fold`

Using that type, you would have:

    delimit :: Mealy a (NonEmpty b) -> Stream (Or a) m r -> Stream (Stream (Of b) m) m r

... and you can also generalize `Mealy` to a monadic version, too for `delimitM`

However, looking at this more closely I'm not sure that your original type for `delimit` is sufficiently general to handle most use cases for grouping a stream.  If I'm reading the type correctly I think this assumes that you have one group of `b`s per original `a` in the input stream, which might not be the case.  For example, consider this function that groups a stream of text chunks into lines:

    lines :: Monad m => Stream (Of Text) m r -> Stream (Stream (Of Text)) m r

I don't see how you could implement that in terms of `delimit`

I think if you really want a sufficiently interface you should expose a `next`-like function (similar to `list-transformer` and `pipes`)

    data Step f m r = Cons (f (Stream f m r)) | Nil r

    next :: Stream f m r -> m (Step f m r)

You might still provide higher-efficiency combinators for grouping, but I think you will need the less efficient `next` primitive exposed as a fallback for the general case of grouping things

On Sep 12, 2017, at 3:16 PM, Daniel Díaz <[hidden email]> wrote:

One slight awkwardness of representing it as a Fold is that it gives the impression that the initial state may emit values downstream, which is odd for a splitter.

Anyway, I when ahead and implement the function. Here is the Backpack signature with some documentation. Here is the pipes implementation, here the streaming one, and here some simple tests.

I believe one could implement the pipes-text splitters (lines,words) this way.

On Saturday, August 19, 2017 at 11:53:21 PM UTC+2, Gabriel Gonzalez wrote:
So one thing I want to point out is that if you change the type of `delimit` to:

    delimit :: (x -> a -> x) -> x -> (x -> NonEmpty b) -> Stream (Of a) m r -> Stream (Stream (Of b) m) m r

... which might be equivalent depending on your needs, then you can simplify the type further to:

    delimit :: Fold a (NonEmpty b) -> Stream (Of a) m r -> Stream (Stream (Of b) m) m r

... where `Fold` is from the `foldl` library

You can similarly simplify `delimitM` using `FoldM` from the same library

On Aug 19, 2017, at 12:31 PM, Daniel Díaz <diaz.c...@gmail.com> wrote:

On Saturday, August 19, 2017 at 6:34:21 PM UTC+2, Gabriel Gonzalez wrote:
What is the motivation for `delimit`/delimitM`?  Are these designed to be utilities for building group-like functions?

Yes.

I'm trying to use Backpack to create an abstract common interface to pipes / streaming / conduit: https://github.com/danidiaz/streamy

I don't want to expose a single-stepping function like "next", but I would still like to be able to define complex splitters using only the abstract signature.


Also, the `FreeT` trick can be applied to `ConduitM`, although not `Source`

Interesting, so perhaps it would be possible to implement a conduit-group package...
 

On Aug 17, 2017, at 5:47 PM, Daniel Díaz <diaz.c...@gmail.com> wrote:

Hi,

The pipes-group and streaming libraries both give you functions like "group", "groupBy" and "chunksOf" that let you segment a stream while preserving streaming. I was thinking of implementing these two additional functions:

delimit  :: (x -> a -> (x,NonEmpty [b])) 
         -> x 
         -> (x -> NonEmpty [b])
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

delimitM :: (x -> a -> Stream (Of b) m (Stream (Stream (Of b) m) m x))
         -> m x 
         -> (x -> Stream (Of b) m (Stream (Stream (Of b) m) m ()))
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

For "delimit", the idea is that every step produces a NonEmpty list of lists along with the new internal state. The list at the head would consist of elements belonging to the "current open group". If there are more lists afterwards, they represent new groups which have been detected in the stream. For streams comprising one single continuous group,  the NonEmptys would *always* contain just one list.

"delimitM" is the monadic version, although here the types start getting quite scary.

What do you think? Would "delimit" make sense, or at this level of complexity it would be better to build the splitting function by single-stepping the source stream?

(Extra semi-related question: could the FreeT trick used in pipes-group also be applied to Sources from the conduit library?)

-- 

--
Reply | Threaded
Open this post in threaded view
|

Re: Opinion on two splitting functions

Daniel Díaz Carrete
On Friday, September 15, 2017 at 7:17:33 PM UTC+2, Gabriel Gonzalez wrote:
Note that this applies to the type that you gave, too!  There is no way to deduce from the type that the initial state won't be used to produce a value :)


Ah, you're right, of course. The "done" function may emit stuff without any input. The problem is that if I remove it and go the Mealy machine route some ways of splitting seem to become impossible. For example, imagine splitting a text stream on a two-char separator (while removing the separator).
 

However, looking at this more closely I'm not sure that your original type for `delimit` is sufficiently general to handle most use cases for grouping a stream.  If I'm reading the type correctly I think this assumes that you have one group of `b`s per original `a` in the input stream, which might not be the case.  For example, consider this function that groups a stream of text chunks into lines:

    lines :: Monad m => Stream (Of Text) m r -> Stream (Stream (Of Text)) m r

I don't see how you could implement that in terms of `delimit`



Instead of the step function returning  NonEmpty b, it should return NonEmpty [b]:

delimit :: Monad m => (x -> a -> (x, NonEmpty [b])) -> (x -> NonEmpty [b]) -> x -> Stream a m r -> Groups b m r

I think this covers the case in which we encounter multiple newlines in a single block of text. It would be like:

lines :: Stream T.Text IO r -> Groups T.Text IO r
lines =
    Y.delimit (\nl txt ->
                  if T.null txt
                     then (nl,pure [])
                     else let nl' = T.last txt == '\n'
                          in case (nl,T.lines txt) of
                              (True,ls)    -> (nl', []  :| map pure ls)
                              (False,l:ls) -> (nl', [l] :| map pure ls)
                              (_,[])       -> error "never happens")
              (\_ -> pure [])
              True
 

--