Re: Haskell-Cafe Digest, Vol 180, Issue 32

MIMUW
I got 4.7s for a similar amount of data in 2013.
However, I was pretty sure that a fully inlined implementation could potentially go 5x faster.
http://hackage.haskell.org/package/hPDB

Please check the xeno XML parser benchmarks for another example.
https://hackage.haskell.org/package/xeno
On Fri, 31 Aug 2018 at 14:41, <[hidden email]> wrote:
Send Haskell-Cafe mailing list submissions to
        [hidden email]

To subscribe or unsubscribe via the World Wide Web, visit
        http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
or, via email, send a message with subject or body 'help' to
        [hidden email]

You can reach the person managing the list at
        [hidden email]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Haskell-Cafe digest..."
Today's Topics:

   1. Re: HDBC packages looking for maintainer (Tobias Dammers)
   2. Re: Alternative instance for non-backtracking parsers
      (Olaf Klinke)
   3. Re: Alternative instance for non-backtracking parsers
      (Bardur Arantsson)



---------- Forwarded message ----------
From: Tobias Dammers <[hidden email]>
To: [hidden email]
Cc: 
Bcc: 
Date: Thu, 30 Aug 2018 15:24:04 +0200
Subject: Re: [Haskell-cafe] HDBC packages looking for maintainer
Hi,

I'd be interested. I've used HDBC on a few projects, and my yeshql
library was originally built with HDBC as the only backend. It would be
a terrible shame to see this bitrot.

Cheers,

Tobias (tdammers on github etc.)

On Mon, Aug 13, 2018 at 12:07:38PM +0200, Erik Hesselink wrote:
> Hi all,
>
> I've been the maintainer for some of the HDBC packages for a while now.
> Sadly, I've mostly neglected them due to lack of time and usage. While the
> packages mostly work, there are occasional pull requests and updates for
> new compiler versions.
>
> Because of this I'm looking for someone who wants to take over HDBC and
> related packages [1]. If you use HDBC and would like to take over
> maintainership, please let me know and we can get things set up.
>
> Regards,
>
> Erik
>
> [1] https://github.com/hdbc

> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.


--
Tobias Dammers - [hidden email]




---------- Forwarded message ----------
From: Olaf Klinke <[hidden email]>
To: PY <[hidden email]>
Cc: haskell-cafe <[hidden email]>
Bcc: 
Date: Thu, 30 Aug 2018 20:21:07 +0200
Subject: Re: [Haskell-cafe] Alternative instance for non-backtracking parsers
> Hello, Olaf. I have some distrust of elegant solutions (one of them are
> C.P. libs).

I have a program that parses several CSV files, one of them 50MB in size, and writes its result as HTML. When I started optimizing, the execution time was 144 seconds. Profiling (thanks to Jasper Van der Jeugt for writing profiteur!) revealed that most of the time was spent parsing and postprocessing the 50MB CSV file. Changing the data structure of the postprocessing stage cut down the execution time to 32 seconds, but still the majority is spent on parsing.
Then I realized that (StateT String Maybe) is a parser which conveniently has all the class instances one needs; most notably, its Alternative instance makes it a backtracking parser. After defining a few combinators I was able to swap out my megaparsec parser for the new parser, which cut execution time in half. Now most of the parsing time is dedicated to transforming text to numbers and dates. I doubt that parsing time can be reduced much further [*]. The new parser was identical to the old parser, only the combinators now come from another module. That is the elegant thing about monadic parser libraries.
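To illustrate the idea, here is a minimal sketch of a backtracking parser built on (StateT String Maybe), using the transformers package. The combinator names (satisfy, char, eof, int, sepBy1, intLine, runParser) are assumptions made up for this example; they are not the combinators from the program described above.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad.Trans.State (StateT (..), evalStateT)
import Data.Char (isDigit)

-- The state is the remaining input; Nothing signals a parse failure.
type Parser = StateT String Maybe

-- Consume one character that satisfies the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = StateT $ \s -> case s of
  (c : cs) | p c -> Just (c, cs)
  _ -> Nothing

-- Consume exactly the given character.
char :: Char -> Parser Char
char c = satisfy (== c)

-- Succeed only when no input is left.
eof :: Parser ()
eof = StateT $ \s -> if null s then Just ((), s) else Nothing

-- One or more digits, read as an Int.
int :: Parser Int
int = read <$> some (satisfy isDigit)

-- One or more occurrences of p, separated by sep.
sepBy1 :: Parser a -> Parser b -> Parser [a]
sepBy1 p sep = (:) <$> p <*> many (sep *> p)

-- A whole line of comma-separated integers.
intLine :: Parser [Int]
intLine = sepBy1 int (char ',') <* eof

-- Run a parser, discarding the leftover input.
runParser :: Parser a -> String -> Maybe a
runParser = evalStateT
```

Backtracking comes for free: when the left branch of (<|>) fails, the right branch is run on the original input, because the Alternative instance of StateT passes the same starting state to both branches.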
I will now use the fast parser by default, and if it returns a Nothing, the program will suggest a command line flag that switches to the original megaparsec parser, exactly telling the user where the parse failed and why.
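The fallback scheme could be sketched roughly as follows. This is only an illustration under stated assumptions: the diagnostic parser is modelled as a plain function returning Either (a stand-in for the megaparsec parser, which is not used here), and the flag name --diagnose, the Outcome type, and the toy parsers fastX and slowX are hypothetical.

```haskell
import Control.Monad.Trans.State (StateT (..), evalStateT)

-- Fast parser: no error information, only success or failure.
type FastParser = StateT String Maybe

-- Hypothetical stand-in for a slower parser that reports where and why it failed.
type DiagnosticParser a = String -> Either String a

data Outcome a
  = Parsed a
  | Failed String -- message to show to the user
  deriving (Eq, Show)

-- Hint printed when the fast parser fails (the flag name is an assumption).
hint :: String
hint = "parse failed; rerun with --diagnose for a detailed error message"

-- Use the fast parser by default; use the diagnostic parser when requested.
parseWith :: Bool -> FastParser a -> DiagnosticParser a -> String -> Outcome a
parseWith useDiagnostic fast slow input
  | useDiagnostic = either Failed Parsed (slow input)
  | otherwise = maybe (Failed hint) Parsed (evalStateT fast input)

-- Toy parsers used only to exercise parseWith: both accept exactly "x".
fastX :: FastParser Char
fastX = StateT $ \s -> case s of
  "x" -> Just ('x', "")
  _ -> Nothing

slowX :: DiagnosticParser Char
slowX "x" = Right 'x'
slowX s = Left ("expected \"x\" but got " ++ show s)
```

With these definitions, parseWith False fastX slowX "y" yields only the hint, while parseWith True fastX slowX "y" yields the detailed message from the diagnostic parser.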
I am not sure whether there is another family of parsers whose interfaces are so similar that switching from one package to another is as effortless as it is with monadic parsers.

Cheers
Olaf

[*] To the parser experts on this list: How much time should a parser take that processes a 50MB, 130000-line text file, extracting 5 values (String, UTCTime, Int, Double) from each line?



---------- Forwarded message ----------
From: Bardur Arantsson <[hidden email]>
To: [hidden email]
Cc: 
Bcc: 
Date: Thu, 30 Aug 2018 21:43:55 +0200
Subject: Re: [Haskell-cafe] Alternative instance for non-backtracking parsers
On 30/08/2018 20.21, Olaf Klinke wrote:
>> Hello, Olaf. I have some distrust of elegant solutions (one of them are
>> C.P. libs).
>
> [*] To the parser experts on this list: How much time should a parser take that processes a 50MB, 130000-line text file, extracting 5 values (String, UTCTime, Int, Double) from each line?

Not an expert, but for something as (relatively!) standard as CSV, I'd
probably go for a specialized solution like 'cassava', which seems like
it does quite well according to https://github.com/haskell-perf/csv

Based purely on the lines/second numbers on that page and the number you've
given, I'd guesstimate that your parsing could potentially be as fast as
(3.185ms / 1000 lines) * 130000 lines = 414.05ms = 0.4 s.

(Of course that still doesn't account for extracting the Int, Double,
etc., but there are also specialized solutions for that which should be
pretty hard to beat, see e.g. bytestring-lexing.)

It's also probably a bit less elegant than a generic parsec-like thing,
but that's to be expected for a more special-case solution.

Regards,


Re: Haskell-Cafe Digest, Vol 180, Issue 32

Ben Franksen
Am 31.08.2018 um 14:53 schrieb Michal J Gajda:
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Haskell-Cafe digest..."

Sounds like a good idea to me ;-)
