Decoding JSON stream where some values are needed before others

Sal
Hello,

I just posted this pipes-related question on Stack Overflow: http://stackoverflow.com/questions/37498848/decoding-json-stream-where-some-values-are-needed-before-others

If anyone has pointers on this, I would very much appreciate it. I control the JavaScript client that sends the JSON, if that makes the problem easier to solve.

--
Sal

Re: Decoding JSON stream where some values are needed before others

Sal
More on this, after some further investigation and feedback on Stack Overflow:

Problem: how to encode and decode a JSON object in a streaming fashion when we need some information from the JSON body before handling the rest of it. I have an object of type (Text, Lazy ByteString), where the text holds metadata (e.g., the file path to write to) that tells us how to handle the bytestring (e.g., an image that is streamed to a consumer/sink based on that metadata).

The text metadata needs to come before the bytestring so that it can be parsed before we start parsing the bytestring in a streaming fashion. So I picked the tuple, since it can be encoded as an ordered JSON array (which is also how aeson's ToJSON instances encode tuples). Another approach is probably concatenated JSON or newline-delimited JSON.
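To make the ordering concrete, here is a minimal Haskell sketch of what such a record could look like on the wire. In practice the JavaScript client would produce it; `encodeRecord` is a hypothetical helper that builds the array by hand rather than through aeson, assumes the path needs no JSON escaping, and base64-encodes the payload with the `base64-bytestring` package (a JSON string cannot carry raw binary):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString.Base64 as B64
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL

-- Encode (path, payload) as the ordered JSON array ["<path>","<base64>"].
-- The path is written first, so a streaming decoder sees the metadata
-- before any image bytes. Assumption: the path needs no JSON escaping.
encodeRecord :: ByteString -> ByteString -> BL.ByteString
encodeRecord path payload = B.toLazyByteString $
       B.byteString "[\""
    <> B.byteString path
    <> B.byteString "\",\""
    <> B.byteString (B64.encode payload)
    <> B.byteString "\"]"
```

For example, `encodeRecord "/tmp/a.png" "hello"` produces `["/tmp/a.png","aGVsbG8="]`. Since `B64.encode` works on a strict `ByteString`, this encoder is not itself streaming; it only fixes the order in which the decoder will see things.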

Now, what I am trying to figure out is the best way to solve this: should I use a custom parser with aeson ToJSON/FromJSON instances, pipes-parse, pipes-aeson, or something else? What would a good solution to this problem look like? I just need a JSON parser that can parse the metadata and then continue a streaming parse of the bytestring based on that metadata. That way, I could pipe it between a producer and a consumer (so that, if the parse is incomplete, we could undo the write).

Since I haven't used pipes before, I am unsure how to proceed here, so I would very much appreciate pointers. Pseudo-code or toy examples would be very helpful.


On Saturday, May 28, 2016 at 8:20:02 AM UTC-4, Sal wrote:

Re: Decoding JSON stream where some values are needed before others

Gabriel Gonzalez
In reply to this post by Sal
I believe you won't be able to reuse the `pipes-aeson` library, because it doesn't provide a way to stream over a nested field of a decoded JSON record, nor does it support cursor-like navigation of the structure. That means you will need to parse the skeleton of the JSON record by hand.

Also, some work needs to be done to wrap the `base64-bytestring` library in a `pipes`-like API with this type:

    -- Convert a base64-encoded stream to a raw byte stream
    decodeBase64
        :: Monad m
        => Producer ByteString m r
        -- ^ Base64-encoded bytes
        -> Producer ByteString m (Either SomeException (Producer ByteString m r))
        -- ^ Raw bytes

Note that, if decoding completes successfully, the result is a `Producer` for the remainder of the byte string (i.e. everything after the base64-encoded bytes). This lets you resume parsing where the image bytes end.
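One possible way to fill in that signature is sketched below. It is not part of any library; it assumes the `pipes` and `base64-bytestring` packages, and assumes the base64 text contains no whitespace and ends at the first `"` byte (the closing quote of the JSON string). It decodes complete 4-character groups as chunks arrive, carries any partial group over to the next chunk, and on success returns the rest of the stream starting at that closing quote:

```haskell
import Control.Exception (ErrorCall (..), SomeException, toException)
import Control.Monad (unless)
import Control.Monad.Trans.Class (lift)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Base64 as B64
import qualified Data.ByteString.Char8 as BC
import Pipes (Producer, next, yield)

-- Decode base64 text from the front of a byte stream, stopping at the first
-- '"' byte (assumed to be the closing quote of the JSON string). Decoded
-- bytes are yielded downstream as they become available; on success the
-- rest of the input, starting at that '"', is returned.
decodeBase64
    :: Monad m
    => Producer ByteString m r
    -> Producer ByteString m (Either SomeException (Producer ByteString m r))
decodeBase64 = go BS.empty
  where
    -- 'carry' holds the 0 to 3 characters of an incomplete 4-character
    -- group left over from the previous chunk
    go carry p = do
        step <- lift (next p)
        case step of
            Left _ -> return (failWith "input ended inside the base64 data")
            Right (chunk, p') -> do
                let (body, rest) = BC.break (== '"') chunk
                    input        = carry <> body
                if BS.null rest
                    then do
                        -- No closing quote yet: decode only complete groups
                        let n            = BS.length input - BS.length input `mod` 4
                            (now, later) = BS.splitAt n input
                        emit now (go later p')
                    else
                        -- Closing quote found: decode the final groups and
                        -- return the remaining stream, quote included
                        emit input (return (Right (yield rest >> p')))

    -- Decode one run of complete groups, yield the bytes, then continue
    emit encoded andThen
        | BS.null encoded = andThen
        | otherwise = case B64.decode encoded of
            Left err    -> return (failWith err)
            Right bytes -> do
                unless (BS.null bytes) (yield bytes)
                andThen

    failWith msg = Left (toException (ErrorCall msg))
```

Because only the current chunk plus at most three carried-over characters are held at any time, memory use is bounded by the upstream chunk size rather than by the size of the image.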

Assuming you have such a `decodeBase64` function, the rough outline of the code is that you'd have three parts:

* Parse the prefix of the record before the image bytes using a `binary` parser adapted to `pipes`
* Use the `decodeBase64` function to stream the decoded image bytes
* Parse the suffix of the record after the image bytes also using a `binary` parser adapted to `pipes`
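As an illustration of the first step, here is a sketch of a prefix parser written with `binary`'s `Get` monad. It assumes a hypothetical wire format in which the client sends the two-element array `["<path>","<base64 image>"]` with no whitespace between tokens and no escape sequences in the path; `literal`, `stringBody`, `parsePrefix`, and `runPrefix` are illustrative names. Unlike a pure skip, it returns the path, which is the metadata needed before the image bytes are streamed:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (unless)
import Data.Binary.Get (Get, getByteString, getWord8, runGetOrFail)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Consume exactly the expected bytes, failing if the input differs
literal :: ByteString -> Get ()
literal expected = do
    actual <- getByteString (BS.length expected)
    unless (actual == expected) $
        fail ("expected " ++ show expected ++ ", got " ++ show actual)

-- Consume bytes up to and including the next '"' (byte 34), returning the
-- bytes before it; escaped quotes are not handled
stringBody :: Get ByteString
stringBody = go []
  where
    go acc = do
        w <- getWord8
        if w == 34
            then return (BS.pack (reverse acc))
            else go (w : acc)

-- Parse the prefix ["<path>"," and return the path, leaving the input
-- positioned at the first base64 character of the image
parsePrefix :: Get ByteString
parsePrefix = do
    literal "[\""
    path <- stringBody
    literal ",\""
    return path

-- Run parsePrefix on an in-memory input, returning the path and the
-- unconsumed remainder
runPrefix :: BL.ByteString -> Either String (ByteString, BL.ByteString)
runPrefix input = case runGetOrFail parsePrefix input of
    Left (_, _, err)      -> Left err
    Right (rest, _, path) -> Right (path, rest)
```

In the `pipes` setting, a `Get` like `parsePrefix` is what you would run with `Pipes.Binary.decodeGet`, keeping the parsed path instead of discarding it; `runPrefix` only exercises the parser on an in-memory input.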

The types and implementation would look roughly like this:

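    import Control.Exception (SomeException, toException)
    import Control.Monad.Trans.Class (lift)
    import Control.Monad.Trans.State.Strict (runStateT)
    import Data.ByteString (ByteString)
    import qualified Data.Binary
    import Pipes (Producer)
    import Pipes.Binary (DecodingError)
    import qualified Pipes.Binary

    -- This would match the prefix of the JSON record up to the opening quote
    -- of the image field, e.g. {"id": "foo", "image": "
    skipPrefix :: Data.Binary.Get ()

    -- Run `skipPrefix` on the front of the stream; `runStateT` keeps both the
    -- parse result and the leftover stream, which is returned on success
    skipPrefix' :: Monad m => Producer ByteString m r -> m (Either DecodingError (Producer ByteString m r))
    skipPrefix' p0 = do
        (e, p1) <- runStateT (Pipes.Binary.decodeGet skipPrefix) p0
        return (fmap (const p1) e)

    -- This would match the suffix of the JSON record, e.g. "}
    skipSuffix :: Data.Binary.Get ()

    skipSuffix' :: Monad m => Producer ByteString m r -> m (Either DecodingError (Producer ByteString m r))
    skipSuffix' p0 = do
        (e, p1) <- runStateT (Pipes.Binary.decodeGet skipSuffix) p0
        return (fmap (const p1) e)

    streamImage
        ::  Monad m
        =>  Producer ByteString m r
        ->  Producer ByteString m (Either SomeException (Producer ByteString m r))
    streamImage p0 = do
        e0 <- lift (skipPrefix' p0)
        case e0 of
            Left exc -> return (Left (toException exc))
            Right p1 -> do
                e1 <- decodeBase64 p1
                case e1 of
                    Left exc -> return (Left exc)
                    Right p2 -> do
                        e2 <- lift (skipSuffix' p2)
                        case e2 of
                            Left exc -> return (Left (toException exc))
                            Right p3 -> return (Right p3)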

In other words, `streamImage` would take as input a `Producer` that begins at the first character of the JSON record and stream the decoded image bytes extracted from that record. If decoding succeeds, it would then return the remainder of the byte stream immediately after the JSON record.
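Regarding the "undo the write if the parse is incomplete" requirement, a consumer for a producer shaped like `streamImage` can write the chunks to a file and delete it when the producer ends with `Left`. The sketch below assumes the `pipes` and `directory` packages; `saveOrUndo` and `exampleImage` are hypothetical names, and exceptions thrown while writing are not handled:

```haskell
import Control.Monad (when)
import Control.Monad.Trans.Class (lift)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Pipes (Producer, for, runEffect, yield)
import System.Directory (doesFileExist, removeFile)
import System.IO (IOMode (WriteMode), withBinaryFile)

-- Write every chunk yielded by the producer to 'path'. If the producer
-- ends with Left (e.g. a malformed record), delete the partially written
-- file so that no truncated image is left behind.
saveOrUndo :: FilePath -> Producer ByteString IO (Either e rest) -> IO (Either e rest)
saveOrUndo path producer = do
    result <- withBinaryFile path WriteMode $ \h ->
        runEffect (for producer (lift . BS.hPut h))
    case result of
        Left _  -> do
            exists <- doesFileExist path
            when exists (removeFile path)
        Right _ -> return ()
    return result

-- Example producer standing in for streamImage: two chunks, then success
exampleImage :: Producer ByteString IO (Either String ())
exampleImage = do
    yield (BS.pack [137, 80, 78, 71])
    yield (BS.pack [13, 10, 26, 10])
    return (Right ())
```

With the outline above, this would be used as `saveOrUndo path (streamImage input)` with `m` instantiated to `IO`, where the file path comes from the record's metadata; on `Right`, the result carries the rest of the byte stream.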


Re: Decoding JSON stream where some values are needed before others

Sal
Thanks, Gabriel. Very helpful.

I am going to build a prototype (partly to get familiar with pipes; I already use turtle and io-streams) and will come back if I have any questions.

On Sunday, May 29, 2016 at 12:36:26 AM UTC-4, Gabriel Gonzalez wrote: