ANN: lazy-csv - the fastest and most space-efficient parser for CSV

Malcolm Wallace-2
There are lots of Haskell CSV parsers out there.  Most have poor error-reporting, and do not scale to large inputs.  I am pleased to announce an industrial-strength library that is robust, fast, space-efficient, lazy, and scales to gigantic inputs with no loss of performance.

    http://code.haskell.org/lazy-csv/

Downloads from Hackage:

    http://hackage.haskell.org/package/lazy-csv

This library has been in industrial use for several years now, but this is the first public release.  No doubt the API is not as general as it could be, but it already serves many purposes very well.  I'm happy to receive bug reports and suggestions for improvements.

Regards,
    Malcolm


_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

Oliver Charles-3
On 02/25/2013 10:47 AM, Malcolm Wallace wrote:
> There are lots of Haskell CSV parsers out there.  Most have poor error-reporting, and do not scale to large inputs.  I am pleased to announce an industrial-strength library that is robust, fast, space-efficient, lazy, and scales to gigantic inputs with no loss of performance.
>
>     http://code.haskell.org/lazy-csv/
>
> Downloads from Hackage:
>
>     http://hackage.haskell.org/package/lazy-csv
>
> This library has been in industrial use for several years now, but this is the first public release.  No doubt the API is not as general as it could be, but it already serves many purposes very well.  I'm happy to receive bug reports and suggestions for improvements.
>
> Regards,
>     Malcolm

Obvious question: how does this compare to cassava, and especially to cassava's Data.Csv.Incremental module? I ask specifically because the website says that "It is lazier, faster, more space-efficient, and more flexible in its treatment of errors, than any other extant Haskell CSV library on Hackage", but there is no mention of cassava on the website.

- Ollie


Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

Malcolm Wallace-2

On 25 Feb 2013, at 11:14, Oliver Charles wrote:

> Obvious question: how does this compare to cassava, and especially to cassava's Data.Csv.Incremental module? I ask specifically because the website says that "It is lazier, faster, more space-efficient, and more flexible in its treatment of errors, than any other extant Haskell CSV library on Hackage", but there is no mention of cassava on the website.

Simple answer - I have never heard of cassava, and suspect it did not exist when I first did the benchmarking. I'd be happy to re-do my performance comparison, including cassava and any other recent-ish CSV libraries, if I can find them.

Regards,
    Malcolm

Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

Ivan Lazar Miljenovic
In reply to this post by Malcolm Wallace-2
On 25 February 2013 21:47, Malcolm Wallace <[hidden email]> wrote:
> There are lots of Haskell CSV parsers out there.  Most have poor error-reporting, and do not scale to large inputs.  I am pleased to announce an industrial-strength library that is robust, fast, space-efficient, lazy, and scales to gigantic inputs with no loss of performance.
>
>     http://code.haskell.org/lazy-csv/
>
> Downloads from Hackage:
>
>     http://hackage.haskell.org/package/lazy-csv

Note that on your website, you list the Hackage URL as having
"packages" rather than "package"...

>
> This library has been in industrial use for several years now, but this is the first public release.  No doubt the API is not as general as it could be, but it already serves many purposes very well.  I'm happy to receive bug reports and suggestions for improvements.
>
> Regards,
>     Malcolm



--
Ivan Lazar Miljenovic
[hidden email]
http://IvanMiljenovic.wordpress.com


Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

John Wiegley-3
In reply to this post by Malcolm Wallace-2
>>>>> Malcolm Wallace <[hidden email]> writes:

> Simple answer - I have never heard of cassava, and suspect it did not exist
> when I first did the benchmarking. I'd be happy to re-do my performance
> comparison, including cassava and any other recent-ish CSV libraries, if I
> can find them.

I would be very interested in those results, Malcolm.

Thanks,
--
John Wiegley
FP Complete                         Haskell tools, training and consulting
http://fpcomplete.com               johnw on #haskell/irc.freenode.net


Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

ozataman
I'd also like to point to a couple of CSV libraries that I released a long time ago and still maintain. Both target constant-space operation and aim for the best possible speed. I'd be very interested to know how they fare in a performance benchmark:

Latest, based on conduit: http://hackage.haskell.org/package/csv-conduit (just released the latest version)


Note that both are built on well-known IO streaming libraries, which gives them constant-space operation AND nice interoperability with the rest of that library's ecosystem. I have found this to be especially true in the case of conduit.

If you end up designing a benchmark, I'd be happy to get it working with my library.

- Oz

On Monday, February 25, 2013 at 5:16 PM, John Wiegley wrote:

> Malcolm Wallace <[hidden email]> writes:
>
>> Simple answer - I have never heard of cassava, and suspect it did not exist
>> when I first did the benchmarking. I'd be happy to re-do my performance
>> comparison, including cassava and any other recent-ish CSV libraries, if I
>> can find them.
>
> I would be very interested in those results, Malcolm.
>
> Thanks,
> --
> John Wiegley
> FP Complete Haskell tools, training and consulting



Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

Don Stewart
In reply to this post by Malcolm Wallace-2

Cassava is quite new, but has the same goals as lazy-csv.

It's about a year old now - http://blog.johantibell.com/2012/08/a-new-fast-and-easy-to-use-csv-library.html

I know Johan has been working on the benchmarks of late - it would be very good to know how the two compare in features.

On Feb 25, 2013 11:23 AM, "Malcolm Wallace" <[hidden email]> wrote:

> On 25 Feb 2013, at 11:14, Oliver Charles wrote:
>
>> Obvious question: how does this compare to cassava, and especially to cassava's Data.Csv.Incremental module? I ask specifically because the website says that "It is lazier, faster, more space-efficient, and more flexible in its treatment of errors, than any other extant Haskell CSV library on Hackage", but there is no mention of cassava on the website.
>
> Simple answer - I have never heard of cassava, and suspect it did not exist when I first did the benchmarking. I'd be happy to re-do my performance comparison, including cassava and any other recent-ish CSV libraries, if I can find them.
>
> Regards,
>     Malcolm

Re: ANN: lazy-csv - the fastest and most space-efficient parser for CSV

Johan Tibell-2
On Mon, Feb 25, 2013 at 2:32 PM, Don Stewart <[hidden email]> wrote:

> Cassava is quite new, but has the same goals as lazy-csv.
>
> It's about a year old now - http://blog.johantibell.com/2012/08/a-new-fast-and-easy-to-use-csv-library.html
>
> I know Johan has been working on the benchmarks of late - it would be very good to know how the two compare in features.


To run, check out the cassava repo on GitHub and run:

    cabal configure --enable-benchmarks && cabal build && cabal bench

Here are the results (all the normal caveats for benchmarking apply):

benchmarking positional/decode/presidents/without conversion
mean: 62.85965 us, lb 62.56705 us, ub 63.26101 us, ci 0.950
std dev: 1.751446 us, lb 1.371323 us, ub 2.295576 us, ci 0.950

benchmarking positional/decode/streaming/presidents/without conversion
mean: 93.81925 us, lb 91.14701 us, ub 98.19217 us, ci 0.950
std dev: 17.20842 us, lb 11.58690 us, ub 23.41786 us, ci 0.950

benchmarking comparison/lazy-csv
mean: 133.2609 us, lb 132.4415 us, ub 135.3085 us, ci 0.950
std dev: 6.193178 us, lb 3.123661 us, ub 12.83148 us, ci 0.950

The first two sets of numbers are for cassava (all-at-once and streaming mode, respectively). The last set is for lazy-csv.

The feature sets of the two libraries are quite different. Both do basic CSV parsing (with some extensions).

 * lazy-csv parses CSV data to something akin to [[ByteString]], but with a heavy focus on error recovery and precise error messages.
 * cassava parses CSV data to [a], where a is a user-defined type that represents a CSV record. There are options to recover from *type conversion* errors, but not from malformed CSV. cassava has several parsing modes: incremental for parsing interleaved with I/O, streaming for lazy parsing (with or without I/O), and all-at-once parsing for when you want to hold all the data in memory.
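
To make the difference in result shape concrete, here is a minimal sketch in plain Haskell that uses only base. It calls neither lazy-csv nor cassava, so it says nothing about either library's actual API. The names President, parseRows, toPresident and decodePresidents are hypothetical, and the comma splitting is deliberately naive (no quoting or escaping). The first stage produces untyped rows and recovers from a malformed row; the second converts each row to a user-defined type and reports conversion errors per row.

```haskell
import Text.Read (readMaybe)

-- One raw CSV row: either an error message or the list of fields.
type RawRow = Either String [String]

-- Hypothetical record type, standing in for the user-defined type 'a'.
data President = President { presNumber :: Int, presName :: String }
  deriving (Eq, Show)

-- Split one line on commas (naive: no quoting or escaping).
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnComma rest

-- Untyped, row-level parsing. A row with the wrong number of fields
-- becomes a 'Left' carrying its line number, and parsing continues
-- with the next row. Rows are produced lazily, one line at a time.
parseRows :: Int -> String -> [RawRow]
parseRows width input = zipWith check [1 :: Int ..] (lines input)
  where
    check n line
      | length fields == width = Right fields
      | otherwise =
          Left ("line " ++ show n ++ ": expected " ++ show width
                ++ " fields, got " ++ show (length fields))
      where
        fields = splitOnComma line

-- Typed conversion of a single row. Failure here is a type
-- conversion error, not a malformed-CSV error.
toPresident :: [String] -> Either String President
toPresident [num, name] = case readMaybe num of
  Just n  -> Right (President n name)
  Nothing -> Left ("not a number: " ++ show num)
toPresident fields = Left ("expected 2 fields, got " ++ show (length fields))

-- Both stages combined: every row yields its own result, so one bad
-- row does not abort the rest of the input.
decodePresidents :: String -> [Either String President]
decodePresidents = map (>>= toPresident) . parseRows 2
```

Loaded into GHCi, decodePresidents applied to three lines, where the second line has a single field and the third has a non-numeric first field, yields a Right for the first line and a separate Left for each of the other two, rather than failing on the first problem.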

-- Johan

