How to deal with last item with concatMapAccumC in Conduit.

jun zhang
Dear Cafe,

I'm using Conduit to parse a huge file, and I need to merge lines based on a condition.

I found that concatMapAccumC can do this, and I wrote the demo below (with conduit-combinators-1.0.6, lts-6.18).
The problem is that if the last item doesn't make the condition true, its data stays in the accumulator and never reaches the output stream.

Can anyone give me some advice?

Thanks


----------------------------
import Conduit

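-- Step function: element first, accumulator second.
-- Emit the running sum once it exceeds 5, then reset the accumulator to 0.
test' :: Int -> Int -> (Int, [Int])
test' a s = case (a + s) > 5 of
    True  -> (0, [a + s])
    False -> (a + s, [])

-- Yields [6,9,6]: the trailing 3 is left in the accumulator and dropped.
testlog :: IO [Int]
testlog = yieldMany [1,2,3,4,5,6,3] $= concatMapAccumC test' 0 $$ sinkList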

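A pure-list analogy may help make the lost value concrete. This sketch assumes that, on a finite list, `concatMapAccumC f acc` behaves like `Data.List.mapAccumL` followed by `concat` (an analogy, not conduit's implementation); `model` is a hypothetical name, and `test'` is the step function from the demo above, restated with guards.

```haskell
import Data.List (mapAccumL)

-- The demo's step function: element first, accumulator second.
test' :: Int -> Int -> (Int, [Int])
test' a s
  | a + s > 5 = (0, [a + s])
  | otherwise = (a + s, [])

-- Pure-list model of concatMapAccumC test' 0.
-- mapAccumL takes the accumulator first, hence the flip; fmap concat
-- flattens the per-element outputs and leaves the final accumulator alone.
model :: [Int] -> (Int, [Int])
model xs = fmap concat (mapAccumL (flip test') 0 xs)
```

Here `model [1,2,3,4,5,6,3]` evaluates to `(3,[6,9,6])`. The conduit pipeline emits only the `[6,9,6]` part; the `3` left in the accumulator is the value that goes missing.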


 


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Re: How to deal with last item with concatMapAccumC in Conduit.

Michael Snoyman
I'm afraid I don't follow what is meant by "the stream" here. Could you provide a complete, runnable example and indicate what the expected and actual outputs are?

(In reply to jun zhang, Mon, Jul 17, 2017 at 5:48 AM)

Re: How to deal with last item with concatMapAccumC in Conduit.

jun zhang
Dear all,

Here is a runnable example:

===================================================================
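import Conduit
import qualified Data.Text as T
import Text.Regex (matchRegex, mkRegex, Regex)

-- Matches the timestamp that opens each log entry, e.g. "2015-01-25 00:04:18,840 "
loghead :: Regex
loghead = mkRegex "^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} )"

-- Element first, accumulator second, as concatMapAccumC expects.
-- A timestamped line flushes the previous entry and starts a new one;
-- any other line is appended to the current entry with a "<br>" separator.
logMerge :: Regex -> String -> String -> (String, [String])
logMerge logregex str accum =
    case matchRegex logregex str of
        Just _  -> (str, [accum ++ "\n"])
        Nothing
            | null accum -> (str, [])
            | otherwise  -> (accum ++ "<br>" ++ str, [])

-- sourceFile/sinkFile work on ByteString, so decode to Text for
-- linesUnboundedC and convert to String for Text.Regex (and back).
runMerge :: FilePath -> FilePath -> IO ()
runMerge infile outfile =
    runResourceT $ sourceFile infile
        $= decodeUtf8C
        $= linesUnboundedC
        $= mapC T.unpack
        $= concatMapAccumC (logMerge loghead) ""
        $= mapC T.pack
        $= encodeUtf8C
        $$ sinkFile outfile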

================================================================

The example input file is:
---------
2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20  |  | errorCode: toString() = null
2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20  |   codsexception.getErrorCode(): toString() 
{
    errorCode = "UNEXPECTED_PROBLEM"
    severity = ""
}
2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20  |  | 
2015-01-25 00:03:45,331 | DEBUG | WebContainer : 20  |  | 
---------

The expected output is:
---------
2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20  |  | errorCode: toString() = null
2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20  |   codsexception.getErrorCode(): toString() <br>{<br>    errorCode =<br>"UNEXPECTED_PROBLEM"<br>    severity = ""<br>}
2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20  |  | 
2015-01-25 00:03:45,331 | DEBUG | WebContainer : 20  |  | 
---------

The actual output is below; the last log line is missing:
---------
2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20  |  | errorCode: toString() = null
2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20  |   codsexception.getErrorCode(): toString() <br>{<br>    errorCode =<br>"UNEXPECTED_PROBLEM"<br>    severity = ""<br>}
2015-01-25 00:03:44,331 | DEBUG | WebContainer : 20  |  | 
---------

Thanks 



(In reply to Michael Snoyman, Jul 19, 2017, 7:50 PM)

Re: How to deal with last item with concatMapAccumC in Conduit.

Michael Snoyman
I'll preface this by saying that this probably indicates the API for concatMapAccumC should be slightly different from what it currently is.

The problem is that there's no way to convert the final accumulator value into output, so when the input stream ends, that accumulator is simply dropped. One (pretty hacky) solution is to wrap every line in a `Just` and then send a final `Nothing` value to indicate that the stream has ended. This would look like:

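No code appears at this point in this copy of the thread. A minimal sketch of the `Just`/`Nothing` idea, applied to the `test'` demo from the first message and assuming conduit 1.3 or later (`ConduitT`, `.|`), might look like the following; `stepMaybe` and `mergeViaSentinel` are hypothetical names, not Michael's original code.

```haskell
import Conduit

-- Hypothetical step function: the demo's test' lifted to Maybe.
-- Just a is an ordinary element; Nothing is the end-of-stream marker.
stepMaybe :: Maybe Int -> Int -> (Int, [Int])
stepMaybe (Just a) s
  | a + s > 5 = (0, [a + s])
  | otherwise = (a + s, [])
stepMaybe Nothing s
  | s == 0    = (0, [])   -- assumes 0 means "nothing pending", as in the demo
  | otherwise = (0, [s])  -- flush the leftover accumulator

-- Wrap every element in Just, append one Nothing after upstream is
-- exhausted, then run the ordinary concatMapAccumC over the wrapped stream.
mergeViaSentinel :: Monad m => ConduitT Int Int m ()
mergeViaSentinel = (mapC Just >> yield Nothing) .| concatMapAccumC stepMaybe 0
```

Running `runConduit (yieldMany [1,2,3,4,5,6,3] .| mergeViaSentinel .| sinkList)` gives `[6,9,6,3]`: the trailing `3` is flushed when the `Nothing` arrives. The hacky part is that the step function has to be rewritten to understand the sentinel.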

Another option is to simply use the conduit primitives (await and yield) directly:

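Again, no code appears here in this copy of the thread. A sketch of the `await`/`yield` approach for the same demo, assuming conduit 1.3 or later, keeps the accumulator as an explicit loop argument; `mergeWithAwait` is a hypothetical name, and treating `acc == 0` as "nothing pending" is an assumption carried over from the demo.

```haskell
import Conduit

-- Hypothetical hand-written replacement for `concatMapAccumC test' 0`:
-- the accumulator is an explicit loop argument, so it can be flushed
-- when await reports that upstream is exhausted.
mergeWithAwait :: Monad m => ConduitT Int Int m ()
mergeWithAwait = loop 0
  where
    loop acc = do
      mx <- await
      case mx of
        -- End of input: emit the leftover instead of dropping it
        -- (0 is assumed to mean "nothing pending", as in the demo).
        Nothing
          | acc == 0  -> return ()
          | otherwise -> yield acc
        Just a
          | a + acc > 5 -> yield (a + acc) >> loop 0
          | otherwise   -> loop (a + acc)
```

With the same input this yields `[6,9,6,3]`. Because `await` returns `Nothing` at the end of input, the loop has a natural place to emit the leftover value, which is exactly what `concatMapAccumC` lacks.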

I'd lean towards the latter.

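Following the remark that the `concatMapAccumC` API could be slightly different, one hypothetical shape is a variant that also takes a flush function for the final accumulator. `concatMapAccumFlushC` below is an illustration built from `await` and `yieldMany` (assuming conduit 1.3 or later), not a function that conduit provides.

```haskell
import Conduit

-- Hypothetical variant of concatMapAccumC (not part of conduit): the same
-- step function, plus a flush function that turns the final accumulator
-- into output once upstream is exhausted.
concatMapAccumFlushC
  :: Monad m
  => (a -> accum -> (accum, [b]))  -- step: element first, accumulator second
  -> (accum -> [b])                -- flush: called once, at end of input
  -> accum                         -- initial accumulator
  -> ConduitT a b m ()
concatMapAccumFlushC step flush = loop
  where
    loop acc = do
      mx <- await
      case mx of
        Nothing -> yieldMany (flush acc)
        Just a  -> do
          let (acc', bs) = step a acc
          yieldMany bs
          loop acc'
```

With the demo's step function, `concatMapAccumFlushC test' (\s -> [s | s /= 0]) 0` produces `[6,9,6,3]` for the example input. For the log merger, `concatMapAccumFlushC (logMerge loghead) (\acc -> [acc ++ "\n" | not (null acc)]) ""` would write out the final pending entry instead of dropping it.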
(In reply to jun zhang, Thu, Jul 20, 2017 at 9:34 AM)

Re: How to deal with last item with concatMapAccumC in Conduit.

jun zhang
Thanks very much




(In reply to Michael Snoyman, Jul 21, 2017, 3:17 PM)