strict version of Haskell - does it exist?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|

strict version of Haskell - does it exist?

Marc Weber
A lot of work has been gone into GHC and its libraries.
However for some use cases C is still preferred, for obvious speed
reasons - because optimizing an Haskell application can take much time.

Is there any document describing why there is no ghc --strict flag
making all code strict by default?
Wouldn't this make it easier to apply Haskell to some additional fields
such as video processing etc?

Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc
compiler?

Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin
show that the idea is not new.

Eg some time ago I had to do some logfile analysis. I ended doing it in
PHP because optimizing the Haskell code took too much time.

Marc Weber

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Don Stewart
Generally strict Haskell means using strict data types - vectors, arrays, bytestrings, intmaps where required.

However, you usually don't want all code and data strict, all the time, since laziness/on-demand eval is critical for deferring non-essential work.

Summary; -fstrict wouldn't magically make your code good. Using the right balance of strict and lazy code, via the right choice of strict and lazy types, however, often does.

Id be interested to know what choices were made in your log file case led you into problems -- using something excessively lazy (like lazy lists) or something excessively strict (like strict bytestrings) would both be suboptimal for log analysis. A hybrid type like a lazy bytestring, would be more appropriate.

On Sunday, January 29, 2012, Marc Weber <[hidden email]> wrote:
> A lot of work has been gone into GHC and its libraries.
> However for some use cases C is still preferred, for obvious speed
> reasons - because optimizing an Haskell application can take much time.
>
> Is there any document describing why there is no ghc --strict flag
> making all code strict by default?
> Wouldn't this make it easier to apply Haskell to some additional fields
> such as video processing etc?
>
> Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc
> compiler?
>
> Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin
> show that the idea is not new.
>
> Eg some time ago I had to do some logfile analysis. I ended doing it in
> PHP because optimizing the Haskell code took too much time.
>
> Marc Weber
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Chris Wong
In reply to this post by Marc Weber
On Mon, Jan 30, 2012 at 10:13 AM, Marc Weber <[hidden email]> wrote:
> A lot of work has been gone into GHC and its libraries.
> However for some use cases C is still preferred, for obvious speed
> reasons - because optimizing an Haskell application can take much time.

As much as any other high-level language, I guess. Don't compare
apples to oranges and complain oranges aren't crunchy enough ;)

> Is there any document describing why there is no ghc --strict flag
> making all code strict by default?

Yes -- it's called the Haskell Report :)

GHC does a lot of optimization already. If making something strict
won't change how it behaves, it will, using a process called
strictness analysis.

The reason why there is no --strict flag is that strictness isn't just
something you turn on and off willy-nilly: it changes how the whole
language works. Structures such as infinite lists and Don Stewart's
lazy bytestrings *depend* on laziness for their performance.

> Wouldn't this make it easier to apply Haskell to some additional fields
> such as video processing etc?
>
> Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc
> compiler?

See above.

> Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin
> show that the idea is not new.

Not sure what that does, but I'll have a look at it.

> Eg some time ago I had to do some logfile analysis. I ended doing it in
> PHP because optimizing the Haskell code took too much time.

That probably because you're using linked lists for strings. For
intensive text processing, it's better to use the text package instead
[1]

Chris

[1] http://hackage.haskell.org/package/text

> Marc Weber
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Ertugrul Söylemez
In reply to this post by Marc Weber
Marc Weber <[hidden email]> wrote:

> A lot of work has been gone into GHC and its libraries.
> However for some use cases C is still preferred, for obvious speed
> reasons - because optimizing an Haskell application can take much
> time.
>
> Is there any document describing why there is no ghc --strict flag
> making all code strict by default?
> Wouldn't this make it easier to apply Haskell to some additional
> fields such as video processing etc?
>
> Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc
> compiler?
>
> Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin
> show that the idea is not new.
>
> Eg some time ago I had to do some logfile analysis. I ended doing it
> in PHP because optimizing the Haskell code took too much time.
First of all, /learning/ to optimize Haskell can be difficult.  The
optimizing itself is actually fairly easy in my experience, once you
understand how the language works.

Usually the nonstrictness is no bottleneck.  However, you have to know
that you are in a nonstrict language.  In fact, I find myself having
difficulties writing efficient code in a strict language.

Now to answer your question:  A strict-by-default Haskell comes with the
implication that you can throw away most of the libraries, including the
base library.  So yes, a strict-by-default Haskell is very well
possible, but the question is whether you actually want that.  I
wouldn't, because a lot of my code relies on the standard semantics.  I
would also expect problems with the way Haskell performs I/O, because it
would mean that

    forever (putStrLn "Hello world")

would cause a heap overflow, if Haskell were strict.  Note that we don't
have control structures.  We have combinators, and their nonstrictness
is essential.  The flag you are proposing would turn Haskell into a
language that is different enough that you couldn't do many useful
things with it.

If you want to save the time to learn how to write efficient Haskell
programs, you may want to have a look into the Disciple language.  You
will find that it has a different type system, which captures side
effects explicitly to make a pure strict language even possible.


Greets,
Ertugrul

--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

signature.asc (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Austin Seipp
In reply to this post by Chris Wong
The strict-ghc-plugin (under my maintenance) is just a continuation of
one of the original demos Max had for plugin support in the compiler.
The idea is fairly simple: 'let' and 'case' are the forms for creating
lazy/strict bindings in Core. It just systematically replaces all
occurrences of 'let' in Core with 'case'. So 'let b = e1 in e2'
becomes 'case e1 of { b -> e2 }', making 'b' strict. It also replaces
all applications of the form 'App e1 e2' (which is also lazy, as e2 is
evaluated on demand) with an equivalent binding like 'case e2 of { x
-> App e1 x }'. Pretty simple, and results in a totally strict
program.

The idea is just a proof of concept; in particular, I (and likely Max
although I cannot speak for him) am not using it as a position to say
that sometimes you want everything strict. You don't; at some point,
you're not even using Haskell anymore I suppose (remember: non-strict
semantics.) I can't think of any instance in which I would need or
want to use this plugin, honestly. But maybe someone else would - I
did refactor it to where you can strictify individual functions, as
opposed to full-blown modules, via annotations. So you could
selectively strictify things if you found it beneficial on certain
identifiers. But then there's the question of what affect that has on
the rest of GHC's optimizers, which I cant answer: the strictifier
modifies the pipeline to be the *first* pass, and the remaining ones
run afterwords. Compilers are built on heuristics and built for
'average' code. Sometimes these heuristics interact in odd ways,
especially with code that may deviate from 'the norm.' Once you're
fighting the optimizer, it can become a very difficult battle to win.
Careful analysis and selective optimization is probably going to take
you farther than hitting it with a giant hammer.

Having lazy and strict data structures and knowing when/where to use
them is crucial for good performance, and both have their merits (same
with every other thing under the sun, like by-ref/by-val semantics in
`data` types, which you can control with UNPACK etc.) I think we could
most certainly use better tools for analyzing low-level performance
details and the tradeoff between strictness/laziness and (especially
in large codebases,) but I don't think systematically making
everything strict is going to be the right idea in a vast majority of
situations.

On Sun, Jan 29, 2012 at 4:12 PM, Chris Wong
<[hidden email]> wrote:

> On Mon, Jan 30, 2012 at 10:13 AM, Marc Weber <[hidden email]> wrote:
>> A lot of work has been gone into GHC and its libraries.
>> However for some use cases C is still preferred, for obvious speed
>> reasons - because optimizing an Haskell application can take much time.
>
> As much as any other high-level language, I guess. Don't compare
> apples to oranges and complain oranges aren't crunchy enough ;)
>
>> Is there any document describing why there is no ghc --strict flag
>> making all code strict by default?
>
> Yes -- it's called the Haskell Report :)
>
> GHC does a lot of optimization already. If making something strict
> won't change how it behaves, it will, using a process called
> strictness analysis.
>
> The reason why there is no --strict flag is that strictness isn't just
> something you turn on and off willy-nilly: it changes how the whole
> language works. Structures such as infinite lists and Don Stewart's
> lazy bytestrings *depend* on laziness for their performance.
>
>> Wouldn't this make it easier to apply Haskell to some additional fields
>> such as video processing etc?
>>
>> Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc
>> compiler?
>
> See above.
>
>> Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin
>> show that the idea is not new.
>
> Not sure what that does, but I'll have a look at it.
>
>> Eg some time ago I had to do some logfile analysis. I ended doing it in
>> PHP because optimizing the Haskell code took too much time.
>
> That probably because you're using linked lists for strings. For
> intensive text processing, it's better to use the text package instead
> [1]
>
> Chris
>
> [1] http://hackage.haskell.org/package/text
>
>> Marc Weber
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> [hidden email]
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
> _______________________________________________
> Haskell-Cafe mailing list
> [hidden email]
> http://www.haskell.org/mailman/listinfo/haskell-cafe



--
Regards,
Austin

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Marc Weber
In reply to this post by Don Stewart
Excerpts from Don Stewart's message of Sun Jan 29 22:55:08 +0100 2012:
> Summary; -fstrict wouldn't magically make your code good.
No. you're right. I don't expect that to happen. I agree on it being
always the programmers fault using wrong tools or not knowing the tools
well enough to get a job done.

The PHP code looks like this:

    foreach(glob('*.txt') as $file){
      foreach(split(file_get_contents($file)) => $line){
        $parsed_line = json_decode($line);
        // get some unix timestamps, keep some hashes of seen clients
        // (cookie ids) and such
        // check how many minutes before a checkout the customer visited
        // the site - and whether he did so for a couple of days.
      }
    }

// print result

The files are about 300 MB in size. However memory usage grew and grew
and grew - I had to kill it or limit amount of files.
The PHP code runs in a couple of seconds (parsing json and loading
files).. the Haskell app took much longer. That PHP is fast is no
surprise: I expect json_decode and split to be implemented in C.

So yes - I used lazy lists. However 8GB of RAM should have been enough
to keep things in RAM. So maybe also the JSON parsing library kept too
many unevaluated things in memory. So I could start either writing my
own JSON parsing library (being more strict) or profile the application
many times - but I don't want to.
Ignoring the json parsing I also gave conduits a try - only counting
lines. I know its experimental - but from its description I concluded it
would prevent me being a stupid Haskell programmer from taking too much
memory even using bad Haskell code.
However it looked like splitting the lines only counting them (recognizing
utf-8 chars) took a lot longer than also doing the json parsing in PHP.
Then the conduit implementation looked funny: hGetLine is used to feed
a line each time ... (luckily - because the utf8-libraries don't provide
nice ways to parse incomplete chunks of utf-8 bytes such as returning
Either IncompleteMissingByte UTF8Chunk)..

Probably the split(,"\n") in PHP doesn't parse utf-8 chars - and luckily
it doesn't seem to matter because \n only uses one byte.

I know that I'm not an Haskell expert. I still got the impression that
getting nice performance would be a small challenge and require much
more time than I spend on the PHP version.

That's why I'm wondering why there is no -fstrict option for such simple
use cases because Haskell has so many optimizations other languages
dream about.

I mean lot's of "non lazy" language still get their jobs done. So it
always depends on the use case.

Isn't it easy to add a compiler flag to GHC adding those ! strictness annotations
everywhere possible? Then simple use case like this would not be a huge
challenge.

Maybe you're right: I should just prepare some dummy files and ask the
community to help. However optimizing the JSON parser and my code just
seemed to be too much effort ..

Marc Weber

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Herbert Valerio Riedel
On Sun, 2012-01-29 at 23:47 +0100, Marc Weber wrote:
> So maybe also the JSON parsing library kept too
> many unevaluated things in memory. So I could start either writing my
> own JSON parsing library (being more strict)

Jfyi, aeson has been added strict parser variants json' and value' [1]
some time ago, so you shouldn't have to write your own stricter JSON
parsing library...

 [1]: http://hackage.haskell.org/packages/archive/aeson/0.6.0.0/doc/html/Data-Aeson-Parser.html#g:2


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Felipe Lessa
On Mon, Jan 30, 2012 at 6:21 AM, Herbert Valerio Riedel <[hidden email]> wrote:
> On Sun, 2012-01-29 at 23:47 +0100, Marc Weber wrote:
>> So maybe also the JSON parsing library kept too
>> many unevaluated things in memory. So I could start either writing my
>> own JSON parsing library (being more strict)
>
> Jfyi, aeson has been added strict parser variants json' and value'
> some time ago, so you shouldn't have to write your own stricter JSON
> parsing library...

Also, besides using those variants, you may also use the
attoparsec-conduit library [1].  If you have

  processJson :: Value -> IO X

then you'd need just something like

  import Data.Aeson (Value, json')
  import Data.Attoparsec.Char8 (isSpace_w8)
  import qualified Data.ByteString as B
  import qualified Data.Conduit as C
  import qualified Data.Conduit.Attoparsec as CA
  import qualified Data.Conduit.Binary as CB
  import qualified Data.Conduit.List as CL

  main = do
    ...
    ret <- forM_ fileList $ \fp -> do
      C.runResourceT $
        CB.sourceFile fp C.$=
        jsonLines C.$=
        CL.mapM processJson C.$$
        CL.consume
    print ret

  jsonLines :: C.Resource m => C.Conduit B.ByteString m Value
  jsonLines = C.sequenceSink () $ do
    val <- CA.sinkParser json'
    CB.dropWhile isSpace_w8
    return $ C.Emit () [val]

This code is extremely resource-friendly, since (a) you can't leak
file descriptors and (b) you just have to make sure that  processJson
function isn't too lazy.  It should be quite fast as well.

Cheers! =)

[1] http://hackage.haskell.org/package/attoparsec-conduit

--
Felipe.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Alexander Bernauer
In reply to this post by Ertugrul Söylemez
On Sun, Jan 29, 2012 at 11:25:09PM +0100, Ertugrul Söylemez wrote:
> First of all, /learning/ to optimize Haskell can be difficult.  The
> optimizing itself is actually fairly easy in my experience, once you
> understand how the language works.

Given the fact that you have obviously mastered the learning part of
this, do you have any recommendations on what to read or on how to
proceed in order to learn how to optimize Haskell code?

I can imagine, it's not only about understanding the language itself,
but also about understanding how your compiler and its switches work,
being able to find the hot spots in your code, being able to measure the
effects of your changes, developing a good sense for the tradeoffs, etc.

So far, I have only stumpled upon chapter 25 of Real World Haskell.
Anything else you might recommend?

regards

Alex

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

signature.asc (205 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Malcolm Wallace-2
In reply to this post by Ertugrul Söylemez

On 29 Jan 2012, at 22:25, Ertugrul Söylemez wrote:
> A strict-by-default Haskell comes with the
> implication that you can throw away most of the libraries, including the
> base library.  So yes, a strict-by-default Haskell is very well
> possible, but the question is whether you actually want that.  I
> wouldn't, because a lot of my code relies on the standard semantics.

At work, we have a strict version of Haskell, and indeed we do not use the standard libraries, but have built up our own versions of the ones we use.  However, our compiler is smart enough to transform and optimise the source code *as if* it were non-strict: it is only at runtime that things are evaluated strictly.  This means that, in general, you can rely on the standard semantics to a surprisingly large extent.  For instance,

    maybe (error "foo") lines (Just "hello\nworld")

will succeed without calling error, just like in Haskell.  Even if the final argument is supplied only at runtime, not statically, it will still do the right thing.

However, the downside of being strict at runtime is frequently poorer performance.  Stack overflows are common if you use explicit recursion:  it is better to use higher-order functions (map, fold, until) that are implemented at a lower level in C (i.e. not directly using explicit recursion themselves).  This is a good thing of course - thinking of data structures in the aggregate, rather than piecemeal.
However, bulk operations do transform the entire data structure, not merely the fragments that are needed for the onward computation, so it can often be a net performance loss.  The standard lazy computational paradigm of generate-and-test is therefore hideously expensive, on occasion.

Regards,
    Malcolm
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Jan-Willem Maessen


On Mon, Jan 30, 2012 at 6:24 AM, Malcolm Wallace <[hidden email]> wrote:

On 29 Jan 2012, at 22:25, Ertugrul Söylemez wrote:
> A strict-by-default Haskell comes with the
> implication that you can throw away most of the libraries, including the
> base library.  So yes, a strict-by-default Haskell is very well
> possible, but the question is whether you actually want that.  I
> wouldn't, because a lot of my code relies on the standard semantics.

At work, we have a strict version of Haskell, and indeed we do not use the standard libraries, but have built up our own versions of the ones we use.  However, our compiler is smart enough to transform and optimise the source code *as if* it were non-strict: it is only at runtime that things are evaluated strictly.  This means that, in general, you can rely on the standard semantics to a surprisingly large extent.

I wanted to emphasize Malcolm's point here.  Optimizing using the original Haskell semantics turned out to be pretty important back when I was working on Eager Haskell.  For example, a lot of Haskell functions are written in the following style:

f a b c
  | guard1 d = g h i
  | guard2 e = h
  | guard3 f = i
  | otherwise = j
  where d = ...expensive...
        e = ...expensive...
        f = ...expensive...
        g = ...expensive...
        h = ...expensive...
        i = ...expensive...
        j = ... expensive...

An an ordinary procedural language, where function calls in g, h, i, and j might have side effects, we can't sink bindings down to the point of use.  Even in the absence of side effects, we have to account for the fact that some of these computations are used in some -- but not all -- right-hand sides, and that often we need to do some inlining to discover that a value isn't going to be used.  It turns out Haskell code relies on this sort of thing all over the place, and simply coding around it leads to deeply-nested let bindings that walk off the right-hand edge of the screen.

It's not difficult to rewrite most of the prelude functions in this style, but it's no longer pretty, and it's recognizably not idiomatic Haskell.

However, bulk operations do transform the entire data structure, not merely the fragments that are needed for the onward computation, so it can often be a net performance loss.  The standard lazy computational paradigm of generate-and-test is therefore hideously expensive, on occasion.

This was a huge issue in Eager Haskell.  By far our worst performance was on stream-like programs that generated infinite lists of results, and then sliced away the useless tail.  With completely strict evaluation, this of course doesn't work at all, but it can be tricky to bound the required sizes of inputs even if you know how much of the output you want (imagine a call to takeWhile or filter on an infinite list).

-Jan-Willem Maessen
 

Regards,
   Malcolm
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Ertugrul Söylemez
In reply to this post by Alexander Bernauer
Alexander Bernauer <[hidden email]> wrote:

> On Sun, Jan 29, 2012 at 11:25:09PM +0100, Ertugrul Söylemez wrote:
> > First of all, /learning/ to optimize Haskell can be difficult.  The
> > optimizing itself is actually fairly easy in my experience, once you
> > understand how the language works.
>
> Given the fact that you have obviously mastered the learning part of
> this, do you have any recommendations on what to read or on how to
> proceed in order to learn how to optimize Haskell code?
>
> I can imagine, it's not only about understanding the language itself,
> but also about understanding how your compiler and its switches work,
> being able to find the hot spots in your code, being able to measure
> the effects of your changes, developing a good sense for the
> tradeoffs, etc.
>
> So far, I have only stumpled upon chapter 25 of Real World Haskell.
> Anything else you might recommend?
That's the tricky part about this.  Unfortunately the monad tutorial
fallacy [1] applies here.  At some point it makes click and suddenly you
see all the data dependencies.  If you were schizophrenic, you would
probably see little arrows all over your code.

What helped me a bit was to read the Wikibooks page about Haskell's
denotational semantics [2], but other than that I pretty much learned it
by myself.

In any case it's helpful to use 'seq' explicitly instead of bang
patterns.  Personally I never use bang patterns, but always use seq,
strict patterns or guards with strict functions.  Example:

    add :: Integer -> Integer -> Integer
    add x 0 = x
    add x y = (add $! succ x) (pred y)

Here the pattern forces the second argument, so there is no need to
force it yourself.  However, nothing forces the first argument, so it
needs to be forced manually, in this case by using ($!).

[1]: <http://byorgey.wordpress.com/2009/01/12/
      abstraction-intuition-and-the-monad-tutorial-fallacy/>

[2]: <http://en.wikibooks.org/wiki/Haskell/Denotational_semantics>


Greets,
Ertugrul

--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

signature.asc (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Marc Weber
In reply to this post by Marc Weber
Replying to all replies at once:

> Malcolm Wallace
>  At work, we have a strict version of Haskell
:-) which proofs that it is worth thinking about it.

> Ertugrul
> If you want to save the time to learn how to write efficient Haskell
> programs, you may want to have a look into the Disciple language.
Yes - I will. Its on my TODO list for at least 12 month :(
Not sure whether there are parser combinator libraries yet.

@  Herbert Valerio Riedel  (suggesting aeson)
I gave it yet another try - see below.
It still fails.

@ Felipe Almeida Lessa  (suggesting conduits and atto parsec)
I mentioned that I already tried it. Counting lines only was a lot slower than
counting lines and parsing JSON using PHP.

@ Chris Wong
> flag is that strictness isn't just something you turn on and off willy-nilly
You're right. But those features are not required for all tasks :)
Eg have a look at Data.Map. Would a strict implementation look that different?




I came up with this now. Trying strict bytestrings and Aeson.
note the LB.fromChunks usage below to turn a strict into a lazy bytestring.

Result: PHP script doing the same runs in 33secs (using the
AFindCheckoutsAndContacts branch) The haskell program below - I stopped it
after 8 min. (At least it does no longer cause my system to swap ..

You're right: I could start profiling. I could learn about how to optimize it.)
But why? The conclusion is that if I ever use yesod - and if I ever want to
analyse logs - I should call PHP from yesod as external process !? :-(

Even if I had a core i7 (8 threads in parallel) I still would not be sure
whether Haskell would be the right choice. I agree that I assume that all data
fits into memory so that piecewise evaluation doesn't matter too much.

Thanks for all your help - it proofs once again that the Haskell
community is the most helpful I know about. Yet I think me being the
programmer Haskell is the wrong choice for this task.

Thanks Marc Weber

my new attempt - now I should start profiling.. Looks like I haven't
built all libs with profiling support ..



  import Data.Aeson.Types
  import Data.Aeson
  import Data.List
  import Control.Applicative
  import Debug.Trace
  import qualified Data.Map as M
  import Action
  import Data.Aeson.Parser as AP

  import qualified Data.ByteString.Lazy as LB

  import qualified Data.ByteString as BS
  import qualified Data.ByteString.Char8 as LBC8

  data Action = ACountLine | AFindCheckoutsAndContacts

  -- lines look like this:
  -- {"id":"4ee535f01550c","r":"","ua":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)","t":1323644400,"k":"XXX","v":"YY"}

  data Item = Item {
    id :: SB.ByteString,
    ua :: SB.ByteString,
    t  :: Int,
    k  :: SB.ByteString,
    v  :: SB.ByteString
  }

  instance FromJSON Item where
      parseJSON (Object v) = Item <$>
                             v .: "id" <*>
                             v .: "ua" <*>
                             v .: "t" <*>
                             v .: "k" <*>
                             v .: "v"
      parseJSON _ = empty


  run :: Action -> [FilePath] -> IO ()

  run AFindCheckoutsAndContacts files = do
    -- find all ids quering the server more than 100 times.
    -- do so by building a map counting queries

    (lines :: [BS.ByteString]) <- fmap (concat . (map LBC8.lines) ) $ mapM BS.readFile files
    let processLine :: (M.Map BS.ByteString Int) -> BS.ByteString -> (M.Map BS.ByteString Int)
        processLine st line = case decode' (LB.fromChunks [line]) of
                                  Nothing -> st -- traceShow ("bad line " ++ (LBC8.unpack line)) st
                                  Just (Item id ua t k v) -> M.insertWith (+) k 1 st
    let count_by_id = foldl' processLine (M.empty) lines
    mapM_ print $ M.toList $ M.filter (> 100) count_by_id

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Felipe Lessa
On Mon, Jan 30, 2012 at 2:12 PM, Marc Weber <[hidden email]> wrote:
> @ Felipe Almeida Lessa  (suggesting conduits and atto parsec)
> I mentioned that I already tried it. Counting lines only was a lot slower than
> counting lines and parsing JSON using PHP.

Then please take a deeper look into my code.  What you said that
you've tried is something else.

Cheers,

--
Felipe.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Roman Leshchinskiy
In reply to this post by Marc Weber
Marc Weber wrote:
> Replying to all replies at once:
>
>> Malcolm Wallace
>>  At work, we have a strict version of Haskell
> :-) which proofs that it is worth thinking about it.

But doesn't necessarily prove that it's a good idea.

>                                   Just (Item id ua t k v) -> M.insertWith
> (+) k 1 st

Does replacing this by insertWith' help?

Roman




_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Roman Leshchinskiy
In reply to this post by Marc Weber
Marc Weber wrote:
> Replying to all replies at once:
>
>> Malcolm Wallace
>>  At work, we have a strict version of Haskell
> :-) which proofs that it is worth thinking about it.

But doesn't necessarily prove that it's a good idea.

>                                   Just (Item id ua t k v) -> M.insertWith
> (+) k 1 st

Does replacing this by insertWith' help?

Roman




_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Joachim Breitner-2
In reply to this post by Alexander Bernauer
Hi,

Am Montag, den 30.01.2012, 10:52 +0100 schrieb Alexander Bernauer:

> On Sun, Jan 29, 2012 at 11:25:09PM +0100, Ertugrul Söylemez wrote:
> > First of all, /learning/ to optimize Haskell can be difficult.  The
> > optimizing itself is actually fairly easy in my experience, once you
> > understand how the language works.
>
> Given the fact that you have obviously mastered the learning part of
> this, do you have any recommendations on what to read or on how to
> proceed in order to learn how to optimize Haskell code?
>
> I can imagine, it's not only about understanding the language itself,
> but also about understanding how your compiler and its switches work,
> being able to find the hot spots in your code, being able to measure the
> effects of your changes, developing a good sense for the tradeoffs, etc.
>
> So far, I have only stumpled upon chapter 25 of Real World Haskell.
> Anything else you might recommend?
Although I would not claim that I have mastered this, I did recently
hold a talk that touched some of these issues, and also exhibits a case
where you want something more fine-grained than just strictness or
lazyness. From your name I think it is safe to point you to a German
document:
http://www.joachim-breitner.de/blog/archives/539-Guest-lecture-on-Haskell-performance.html

Greetings,
Joachim

--
Joachim "nomeata" Breitner
  [hidden email]  |  [hidden email]  |  GPG: 0x4743206C
  xmpp: [hidden email] | http://www.joachim-breitner.de/


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

signature.asc (205 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Marc Weber
In reply to this post by Felipe Lessa

Using insertWith' gets time down to 30-40 secs (thus only being 3-4
times slower than PHP).
PHP still is at 13 secs, does not require installing libraries - does
not require compilation and is trivial to write.

A trivial C++ application takes 11-12secs and even with some googling
was trivial to write.

Excerpts from Felipe Almeida Lessa's message of Mon Jan 30 17:36:46 +0100 2012:
> Then please take a deeper look into my code.  What you said that
> you've tried is something else.
I didn't say that I tried your code. I gave enumerator package a try
counting lines which I expected to behave similar to conduits
because both serve a similar purpose.
Then I hit the the "sourceFile" returns chunked lines issue (reported
it, got fixed) - ....

Anyway: My log files are a json dictionary on each line:

  { id : "foo", ... }
  { id : "bar", ... }

Now how do I use the conduit package to split a "chunked" file into lines?
Or should I create a new parser "many json >> newline" ?

Except that I think my processJson for this test should look like this
because I want to count how often the clients queried the server.
Probalby I should also be using CL.fold as shown in the test cases of
conduit. If you tell me how you'd cope with the "one json dict on each
line" issue I'll try to benchmark this solution as well.


-- probably existing library functions can be used here ..
processJson :: (M.Map T.Text Int) -> Value -> (M.Map T.Text Int)
processJson m value = case value of
                          Ae.Object hash_map ->
                            case HMS.lookup (T.pack "id") hash_map of
                              Just id_o ->
                                case id_o of
                                  Ae.String id -> M.insertWith' (+) id 1 m
                                  _ -> m
                              _ -> m
                          _ -> m

Marc Weber

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Felipe Lessa
On Tue, Jan 31, 2012 at 6:05 AM, Marc Weber <[hidden email]> wrote:

> I didn't say that I tried your code. I gave enumerator package a try
> counting lines which I expected to behave similar to conduits
> because both serve a similar purpose.
> Then I hit the the "sourceFile" returns chunked lines issue (reported
> it, got fixed) - ....
>
> Anyway: My log files are a json dictionary on each line:
>
>  { id : "foo", ... }
>  { id : "bar", ... }
>
> Now how do I use the conduit package to split a "chunked" file into lines?
> Or should I create a new parser "many json >> newline" ?

Currently there are two solutions.  The first one is what I wrote
earlier on this thread:

 jsonLines :: C.Resource m => C.Conduit B.ByteString m Value
 jsonLines = C.sequenceSink () $ do
   val <- CA.sinkParser json'
   CB.dropWhile isSpace_w8
   return $ C.Emit () [val]

This conduit will run the json' parser (from aeson) and then drop any
whitespace after that.  Note that it will correctly parse all of your
files but will also parse some files that don't conform to your
specification.  I assume that's fine.



The other solution is going to released with conduit 0.2, probably
today.  There's a lines conduit that splits the file into lines, so
you could write jsonLines above as:

 mapJson :: C.Resource m => C.Conduit B.ByteString m Value
 mapJson = C.sequenceSink () $ do
   val <- CA.sinkParser json'
   return $ C.Emit () [val]

which doesn't need to care about newlines, and then change main to

 main = do
   ...
   ret <- forM_ fileList $ \fp -> do
     C.runResourceT $
       CB.sourceFile fp C.$=
       CB.lines C.$=  -- new line is here
       mapJson C.$=
       CL.mapM processJson C.$$
       CL.consume
   print ret


I don't know which solution would be faster.  Either way, both
solutions will probably be faster with the new conduit 0.2.


> Except that I think my processJson for this test should look like this
> because I want to count how often the clients queried the server.
> Probalby I should also be using CL.fold as shown in the test cases of
> conduit. If you tell me how you'd cope with the "one json dict on each
> line" issue I'll try to benchmark this solution as well.

This issue was already being coped with in my previous e-mail =).

> -- probably existing library functions can be used here ..
> processJson :: (M.Map T.Text Int) -> Value -> (M.Map T.Text Int)
> processJson m value = case value of
>                          Ae.Object hash_map ->
>                            case HMS.lookup (T.pack "id") hash_map of
>                              Just id_o ->
>                                case id_o of
>                                  Ae.String id -> M.insertWith' (+) id 1 m
>                                  _ -> m
>                              _ -> m
>                          _ -> m

Looks like the perfect job for CL.fold.  Just change those three last
lines in main from

  ... C.$=
  CL.mapM processJson C.$$
  CL.consume

into

  ... C.$$
  CL.fold processJson

and you should be ready to go.

Cheers!

--
Felipe.

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
Reply | Threaded
Open this post in threaded view
|

Re: strict version of Haskell - does it exist?

Marc Weber
In reply to this post by Marc Weber
>  jsonLines :: C.Resource m => C.Conduit B.ByteString m Value
>  jsonLines = C.sequenceSink () $ do
>    val <- CA.sinkParser json'
>    CB.dropWhile isSpace_w8
>    return $ C.Emit () [val]

Adding a \state -> (the way Felipe Lessa told me) make is work and
it runs in about 20sec and that although some conduit overhead is likely
to take place.

omitting my custom data type using bytestrings operating on Value of
Aeson reduces running time to 16secs.

PHP/C++ still wins: less than 12secs.

Now I can imagine again that even a desktop multi core system is faster
than a single threaded C application.

Thanks for your help. Maybe I can setup profiling again to understand
why its still taking little bit more time.

Marc Weber

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe
12