
Messages delimited by multiple newlines?


Messages delimited by multiple newlines?

David McBride
I have a protocol in which each message consists of several headers, each terminated by a newline, followed by one extra newline that signals the end of the headers.  It is like the headers of an email, except that the stream of messages goes on forever.

header: value\n
header: value\n
\n
newmessage\n
header: value\n
anotherheader: value\n
\n
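
For reference, the framing described above can be sketched on a complete, in-memory `Text`, with no streaming involved. The names `Message` and `splitMessages` are hypothetical and exist only for illustration; the sketch assumes that blank lines carry no meaning other than ending a message.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Text (Text)
import qualified Data.Text as T

-- | One message: its lines in order, without the trailing newlines.
-- (Hypothetical type, for illustration only.)
type Message = [Text]

-- | Split a complete input into messages at each blank line ("\n\n").
-- Empty messages, such as the one after the final blank line, are dropped.
splitMessages :: Text -> [Message]
splitMessages =
    filter (not . null)
  . map (filter (not . T.null) . T.lines)
  . T.splitOn "\n\n"
```

This reads the whole input at once, so it only pins down the expected result; it is not a substitute for a streaming solution.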


To process the stream message by message, I ended up writing my own lens, shown below, based on Pipes.Parse.span.  It is a bit ugly.  Is there a more idiomatic way to do this using Pipes.Text.line(s), Pipes.Group.groupsBy, something else in Pipes.Parse, or something along those lines?

endline :: Monad m => Lens' (Producer Text m a) (Producer Text m (Producer Text m a))
endline k p0 = fmap join (k (go p0 ""))
  where
      go :: Monad m => Producer Text m a -> Text -> Producer Text m (Producer Text m a)
      go p accum = do
        x <- lift (next p)
        case x of
              Left   r        -> return (return r)
              Right ("", p')  -> go p' ""
              Right (txt, p') -> do
                let
                 (prefix, suffix) = T.breakOn "\n\n" (T.append accum txt)
                 prefixnotrailing = T.dropWhileEnd (== '\n') prefix
                 trailingeol = T.takeWhileEnd (== '\n') prefix
                if (not . T.null $ prefixnotrailing)
                  then yield prefixnotrailing
                  else return ()
                if T.null suffix
                  then go p' trailingeol
                  else return (yield (T.drop 2 (T.append trailingeol suffix)) >> p')


--

Re: Messages delimited by multiple newlines?

Michael Thompson
I think that, as a lens, this should reinsert the missing "\n\n":

     endline' :: Monad m => Lens' (Producer Text m a) (Producer Text m (Producer Text m a))
     endline' k p0 = fmap (>>= (yield "\n\n" >>)) (k (go p0 ""))   -- instead of just `join`

     >>> :set -XOverloadedStrings
     >>> Text.toLazyM $ over endline id $ yield "hello\nworld\n\ngoodbye\nworld"
     "hello\nworldgoodbye\nworld"
     >>> Text.toLazyM $ over endline' id $ yield "hello\nworld\n\ngoodbye\nworld"
     "hello\nworld\n\ngoodbye\nworld"

The latter is closer to the desired behavior.

Note that, as it stands, this silently accumulates everything 
before the double newline. One might try to avoid this, but if 
these are not foreign files it might not be worth worrying about.
If you don't want to accumulate, what is really missing is a pair 
of functions like `Data.Text.breakOn` and `Data.Text.splitOn` 
that break a text stream on a given pattern, here "\n\n". 
I remember trying to implement these, but it is surprisingly 
difficult to do in a non-plodding way. `text` uses an extremely 
complicated, but fast, method that collects a list of all indices 
at which the match text begins.
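
For the specific pattern "\n\n" (not an arbitrary pattern), a non-accumulating break can be sketched by holding back at most one trailing newline between chunks. The name `breakOnBlankLine` is hypothetical and is not part of `pipes-text`. Like `Data.Text.breakOn`, this sketch leaves the delimiter at the front of the remainder.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Monad (unless, when)
import           Data.Text     (Text)
import qualified Data.Text     as T
import           Pipes

-- | Stream everything before the first "\n\n" and return the remainder,
-- which still begins with "\n\n" when a delimiter was found.
-- Only a single trailing '\n' is ever withheld, so nothing accumulates
-- beyond the chunks that upstream already produces.
breakOnBlankLine
  :: Monad m => Producer Text m r -> Producer Text m (Producer Text m r)
breakOnBlankLine = go False
  where
    -- 'held' is True when the previous chunk ended in '\n' and that
    -- newline has not been yielded yet.
    go held p = do
      x <- lift (next p)
      case x of
        Left r -> do
          when held (yield "\n")          -- input ended; release the newline
          return (return r)
        Right (txt, p')
          | T.null txt -> go held p'      -- skip empty chunks
          | held && T.head txt == '\n' -> -- delimiter straddles two chunks
              return (yield (T.cons '\n' txt) >> p')
          | otherwise -> do
              when held (yield "\n")
              let (pre, suf) = T.breakOn "\n\n" txt
              if not (T.null suf)
                then do                   -- delimiter inside this chunk
                  unless (T.null pre) (yield pre)
                  return (yield suf >> p')
                else if T.last txt == '\n'
                  then do                 -- possible first half of a delimiter
                    unless (T.length txt == 1) (yield (T.init txt))
                    go True p'
                  else yield txt >> go False p'
```

Generalising this to an arbitrary pattern requires tracking partial matches across chunk boundaries, which is where the difficulty mentioned above comes in.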

One thing I wondered: are you going to repeat this across the 
length of the file? If so, and accumulating lines isn't an issue, 
then one might approach the problem by first accumulating lines:

     >>> :t PG.folds mappend mempty id . view Text.lines   -- I was using Pipes.Group = PG; Pipes.Text = Text
     PG.folds mappend mempty id . view Text.lines
      :: Monad m => Producer Text m r -> Producer Text m r

Now we have a producer of separate accumulated lines and can break on an empty line.

    >>> let accumLines = PG.folds mappend mempty id . view Text.lines
    >>> let txt = yield "hello\nworld\n\ngoodbye"
    >>> runEffect $ accumLines txt >-> P.print
    "hello"
    "world"
    ""
    "goodbye"

Now we are missing something like a `split :: a -> Producer a m r -> FreeT (Producer a m) m r`,
which I think should be in Pipes.Group. If all of the above is not completely wrong-headed,
we could try to write one. It should be pretty simple given `Pipes.Parse.span`. But note 
that with `Pipes.Parse.span` we are already close to the effect you wanted:

    >>> rest <- runEffect $ accumLines txt  ^. PP.span (/= mempty) >-> P.print
    "hello"
    "world"
    >>> runEffect $ rest >-> P.print
    ""
    "goodbye"

    >>> runEffect $ rest >-> P.drop 1 >-> P.print
    "goodbye"


You can collect the lines of the first record with `P.toListM'`:

     >>> (rec1,rest) <-  P.toListM' $  accumLines txt  ^. PP.span (/= mempty) 
     >>> rec1
     ["hello","world"]


Like I said, this may all be wrong-headed and uncomprehending; I'm partly just testing
ideas to see what you are intending.

--

Re: Messages delimited by multiple newlines?

Michael Thompson
Oh, I should have said, when I suggested we might propose a `split` 
to Gabriel for Pipes.Group, that it should be a lens, thus with the type

    split :: (Monad m)  => (a -> Bool) -> Lens' (Producer a m r) (FreeT (Producer a m) m r)

or maybe 

    split :: (Monad m, Eq a) => a -> Lens' (Producer a m r) (FreeT (Producer a m) m r)

It occurs to me that `Pipes.Group.groupsBy` permits things like this:

     >>> let cmp a b = a /=  mempty && b /= mempty 
     >>> let kludge p = PG.folds (\a b -> a <> "\n" <> b)  mempty id (accumLines p ^. PG.groupsBy cmp)
     >>>  runEffect $ kludge txt >-> P.filter (/= "\n") >-> P.print
     "\nhello\nworld"
     "\ngoodbye"



--

Re: Messages delimited by multiple newlines?

Michael Thompson
One more correction: it seems the type should be  

   split :: (Monad m, Eq a) => a -> Lens' (Producer a m r) (FreeT (Producer a m) m r)

since the lens should reinsert the thing we split on. If we use a 
predicate, we don't know what it was.
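
A minimal sketch of what such a `split` could look like, modelled on the general shape of the existing splitting lenses and built from `Pipes.Parse.span` and `Pipes.Group.intercalates`. Both `split` and the helper `splitOnElem` are hypothetical names and are not part of `pipes-group`. The local `Lens'` synonym and the use of `view` from `lens-family-core` are assumptions made to keep the block self-contained.

```haskell
{-# LANGUAGE RankNTypes #-}

import           Control.Monad.Trans.Free (FreeF (..), FreeT (..))
import           Lens.Family              (view)
import           Pipes
import qualified Pipes.Group              as PG
import qualified Pipes.Parse              as PP

-- Local van Laarhoven lens synonym (assumed; defined here for self-containment).
type Lens' a b = forall f. Functor f => (b -> f b) -> a -> f a

-- | Cut a stream into groups at every element equal to 'sep'.
-- The separators themselves are dropped. Empty groups are kept, so a
-- stream that ends with a separator yields a final empty group.
splitOnElem
  :: (Monad m, Eq a) => a -> Producer a m r -> FreeT (Producer a m) m r
splitOnElem sep = FreeT . start
  where
    -- First group: begin only if the stream has at least one element.
    start p = do
      x <- next p
      return $ case x of
        Left r        -> Pure r
        Right (a, p') -> Free (group (yield a >> p'))
    -- After a group, the next element (if any) is the separator; drop it.
    afterGroup p = do
      x <- next p
      return $ case x of
        Left r        -> Pure r
        Right (_, p') -> Free (group p')
    -- One group: everything up to, but not including, the next separator.
    group p = fmap (FreeT . afterGroup) (view (PP.span (/= sep)) p)

-- | Lens form: viewing splits the stream, and putting the groups back
-- reinserts 'sep' between them.
split
  :: (Monad m, Eq a) => a -> Lens' (Producer a m r) (FreeT (Producer a m) m r)
split sep k p = fmap (PG.intercalates (yield sep)) (k (splitOnElem sep p))
```

Taking the separator as a value rather than a predicate is what lets the setter direction reinsert it with `intercalates`. Applied to the accumulated lines from earlier in the thread, `view (split "")` would give one group of lines per message.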
