How efficient is read?

How efficient is read?

Tom Hawkins-2
I have a lot of structured data in a program written in a different
language, which I would like to read in and analyze with Haskell.  And
I'm free to format this data in any shape or form from the other
language.

Could I define a Haskell type for this data that derives the default
Read, then simply print out Haskell code from the program and 'read'
it in?  Would this be horribly inefficient?  It would save me some
time of writing a parser.

-Tom
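
For concreteness, the approach being asked about might look like the sketch below. The `Signal` type, its fields, and the sample text are hypothetical; the point is only that the other program has to print values in exactly the syntax the derived `Show`/`Read` instances use (record syntax included).

```haskell
import Text.Read (readMaybe)

-- Hypothetical record type; the other program would print values in
-- the same syntax that the derived Show instance produces.
data Signal = Signal
  { name    :: String
  , width   :: Int
  , samples :: [(Int, Bool)]
  } deriving (Show, Read, Eq)

-- Text as the other language might emit it (assumed format).
input :: String
input = "[Signal {name = \"clk\", width = 1, samples = [(0,False),(5,True)]}]"

-- readMaybe returns Nothing on malformed input instead of throwing.
signals :: Maybe [Signal]
signals = readMaybe input
```

Using `readMaybe` rather than `read` keeps a malformed dump from crashing the program, though it still says nothing about where the input went wrong.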
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: How efficient is read?

Don Stewart-2
tomahawkins:
> I have a lot of structured data in a program written in a different
> language, which I would like to read in and analyze with Haskell.  And
> I'm free to format this data in any shape or form from the other
> language.
>
> Could I define a Haskell type for this data that derives the default
> Read, then simply print out Haskell code from the program and 'read'
> it in?  Would this be horribly inefficient?  It would save me some
> time of writing a parser.

It would be easy but inefficient for more than say, 100k of data.
deriving Binary will be faster and almost as easy (the derive script is
in the binary/scripts dir).
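
A minimal sketch of the Binary route, assuming a version of the `binary` package that supports `GHC.Generics` (so the instance body can be left empty instead of running the derive script mentioned above). The `Sample` type is hypothetical. Note one caveat for this use case: the producing program would have to write the same byte layout that `encode` produces, which is less convenient from another language than printing text.

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical record type standing in for the real data.
data Sample = Sample
  { sampleId     :: Int
  , sampleName   :: String
  , sampleValues :: [Double]
  } deriving (Show, Eq, Generic)

-- Empty instance: put/get come from the Generic defaults.
instance Binary Sample

-- Serialise to a lazy ByteString and back again.
roundTrip :: Sample -> Sample
roundTrip = decode . encode
```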

Re: How efficient is read?

Pierre-Etienne Meunier-3
In reply to this post by Tom Hawkins-2
In fact, the time you'd spend writing read instances would not compare to the half hour required to learn parsec.
And your parser will be efficient (at least, according to the guys from the parser team ;-)

Cheers,
PE
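
To give an idea of the size of such a parser, here is a small sketch using Parsec 3's `Text.Parsec` and `Text.Parsec.String` modules. The `Entry` type and the `key: n, n, n` input format are made up for illustration; they are not from the original question.

```haskell
import Text.Parsec (ParseError, char, digit, eof, letter, many1, parse, sepBy1, spaces)
import Text.Parsec.String (Parser)

-- Hypothetical record: a name followed by a comma-separated list of integers.
data Entry = Entry String [Int] deriving (Show, Eq)

-- Parses input such as "temps: 20, 21, 19".
entry :: Parser Entry
entry = do
  key  <- many1 letter
  _    <- char ':'
  spaces
  vals <- number `sepBy1` (char ',' >> spaces)
  eof
  return (Entry key vals)

-- A run of digits, converted with read (safe here: only digits reach it).
number :: Parser Int
number = read <$> many1 digit

-- The second argument to parse is just a label used in error messages.
parseEntry :: String -> Either ParseError Entry
parseEntry = parse entry "<input>"
```

On bad input such as `"temps: 20, x"` the `Left` value carries a `ParseError` whose `show` output names the position and what was expected, which is the main practical gain over `read`.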


On 08/05/2010, at 23:32, Tom Hawkins wrote:

> I have a lot of structured data in a program written in a different
> language, which I would like to read in and analyze with Haskell.  And
> I'm free to format this data in any shape or form from the other
> language.
>
> Could I define a Haskell type for this data that derives the default
> Read, then simply print out Haskell code from the program and 'read'
> it in?  Would this be horribly inefficient?  It would save me some
> time of writing a parser.
>
> -Tom

Re: How efficient is read?

Daniel Gorín
In reply to this post by Tom Hawkins-2
On May 9, 2010, at 12:32 AM, Tom Hawkins wrote:

> I have a lot of structured data in a program written in a different
> language, which I would like to read in and analyze with Haskell.  And
> I'm free to format this data in any shape or form from the other
> language.
>
> Could I define a Haskell type for this data that derives the default
> Read, then simply print out Haskell code from the program and 'read'
> it in?  Would this be horribly inefficient?  It would save me some
> time of writing a parser.
>
> -Tom

If your types contain infix constructors, the derived Read instances  
may be almost unusable; see http://hackage.haskell.org/trac/ghc/ticket/1544
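
For reference, this is the kind of type in question. The `Expr` type is hypothetical, and the sketch only shows the shape: a round trip on a small value works fine. As I read the ticket, the complaint is about parse time growing quickly as such values nest more deeply, which I have not measured here.

```haskell
-- Hypothetical expression type with an infix constructor.
infixl 6 :+:

data Expr = Lit Int | Expr :+: Expr
  deriving (Show, Read, Eq)

-- A small nested value; derived Show adds the parentheses needed by Read.
sample :: Expr
sample = (Lit 1 :+: Lit 2) :+: (Lit 3 :+: Lit 4)

-- Round trip through the derived Show and Read instances.
roundTrip :: Expr -> Expr
roundTrip = read . show
```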


Re: How efficient is read?

paul r-2
In reply to this post by Pierre-Etienne Meunier-3


PEM> In fact, the time you'd spend writing read instances would not
PEM> compare to the half hour required to learn parsec.

maybe the wiki could be updated to give more clues for a newcomer.

  http://www.haskell.org/haskellwiki/Parsec

in particular :

 - link 1 points to the parsec site, with documentation that is almost
   10 years old, for a previous major release
 - link 3 is broken


The rest of the page is a bit terse as well. I'm really wondering what
one should start reading to learn how to parse a stream in haskell.

--
  Paul

Re: How efficient is read?

Malcolm Wallace
In reply to this post by Pierre-Etienne Meunier-3
> In fact, the time you'd spend writing read instances would not  
> compare to the half hour required to learn parsec.
> And your parser will be efficient (at least, according to the guys  
> from the parser team ;-)


I agree that Read is likely to be inefficient, but the more important  
aspect is that it gives you no useful error message if the parse fails.

Parser combinators are really rather easy to learn and use, and tend  
to give decent error reports when something goes wrong.  In fact, if  
you just want Read-like functionality for a set of Haskell datatypes,  
use polyparse: the DrIFT tool can derive polyparse's Text.Parse class  
(the equivalent of Read) for you, so you do not even need to write the  
parser yourself!

I would caution against using Parsec if your dataset is large.  Parsec  
does not return anything until it has seen the entire input, so can  
use a huge amount of memory.  The other day someone was observing on  
haskell-cafe that parsing a 9Mb XML file using a Parsec-based parser  
required >7Gb of memory, compared with 1.3Gb for a strict
polyparse-based parser (still too much), and the happy conclusion was
that the lazy polyparse variant uses a negligible amount by comparison.

(Declaration of interest: I wrote polyparse.)

Regards,
     Malcolm
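
The point about error messages is easy to see with `readEither` from `Text.Read` in base. The `Point` type is a made-up example.

```haskell
import Text.Read (readEither)

-- Hypothetical type with a derived Read instance.
data Point = Point Int Int deriving (Show, Read, Eq)

-- Well-formed input parses as expected.
good :: Either String Point
good = readEither "Point 3 4"

-- A typo in the second field: the Left value carries only a generic
-- message, with no position and no hint about what was expected.
bad :: Either String Point
bad = readEither "Point 3 x"
```

Here `bad` is a `Left` holding the fixed string `"Prelude.read: no parse"`, whatever the actual mistake was. With a few megabytes of generated data, that is not much to go on.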


Re: How efficient is read?

Ivan Lazar Miljenovic
Malcolm Wallace <[hidden email]> writes:
> (Declaration of interest: I wrote polyparse.)

For which I, for one, am grateful!

(So, when are you going to release an updated version with a fixed
definition of discard for the lazy parser? :p)

--
Ivan Lazar Miljenovic
[hidden email]
IvanMiljenovic.wordpress.com

Re: Re: How efficient is read?

Stephen Tetley-2
In reply to this post by paul r-2
On 9 May 2010 08:45, Paul R <[hidden email]> wrote:
[SNIP]

>  http://www.haskell.org/haskellwiki/Parsec
>
> in particular :
>
>  - link 1 points to the parsec site, with documentation that is almost
>   10 years old, for a previous major release
>  - link 3 is broken
>
>
> The rest of the page is a bit terse as well. I'm really wondering what
> one should start reading to learn how to parse a stream in haskell.

Hi Paul

The 10 year old documentation is very good though - for my taste,
Parsec 2.0 is the best documented Haskell lib I've seen.

If you want to parse a stream, you don't want Parsec, as it
isn't an online parser - online meaning 'streaming' i.e. it can
produce some results during the 'work' rather than a single result at
the end. From the descriptions on Hackage, Parsimony and uu-parsinglib
sound like better candidates; similarly one of the Polyparse modules
provides an online parser.

If you want to learn how to write a streaming parser, pick one of
those - start work and post back to this list if/when you have
problems. Remember that a non-streaming parser is simpler than a
streaming one: you might want to write a version that works on short
input first and your result type has to support streaming (probably
best if it is a list). Also for any parser, but especially an online
one you'll have to be careful to use backtracking sparingly.

Best wishes

Stephen
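
The difference between a whole-input parser and an online one can be sketched with plain lists and `readMaybe`. This deliberately uses none of the libraries named above (it is not how polyparse or uu-parsinglib are implemented); it only illustrates why the result type has to support streaming. All names are made up for the example.

```haskell
import Text.Read (readMaybe)

-- Whole-input style: nothing is returned until every line has been checked,
-- because a single bad line turns the whole result into a Left.
parseAll :: String -> Either String [Int]
parseAll = traverse parseLine . lines

-- Online style: each line's result is available as soon as that line is read.
parseOnline :: String -> [Either String Int]
parseOnline = map parseLine . lines

-- Parse one line as an Int, reporting the offending text on failure.
parseLine :: String -> Either String Int
parseLine s = maybe (Left ("bad line: " ++ s)) Right (readMaybe s)

-- An unbounded input; only the online version can produce output from it.
endless :: String
endless = unlines (map show [1 :: Int ..])

firstThree :: [Either String Int]
firstThree = take 3 (parseOnline endless)
```

Applying `parseAll` to `endless` would never return, whereas `firstThree` is computed after reading only three lines. The price is that errors are reported per element rather than once for the whole input.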

Re: Re: How efficient is read?

Ivan Lazar Miljenovic
Stephen Tetley <[hidden email]> writes:
> The 10 year old documentation is very good though - for my taste,
> Parsec 2.0 is the best documented Haskell lib I've seen.

But does it help with Parsec-3?

> If you want to parse a stream, you don't want Parsec, as it
> isn't an online parser - online meaning 'streaming' i.e. it can
> produce some results during the 'work' rather than a single result at
> the end.

I thought this was one of the new features in Parsec-3...

--
Ivan Lazar Miljenovic
[hidden email]
IvanMiljenovic.wordpress.com

Re: Re: How efficient is read?

Stephen Tetley-2
On 9 May 2010 11:42, Ivan Lazar Miljenovic <[hidden email]> wrote:

>> If you want to parse a stream, you don't want Parsec ___ as it
>> isn't an online parser - online meaning 'streaming' i.e. it can
>> produce some results during the 'work' rather than a single result at
>> the end.
>
> I thought this was one of the new features in Parsec-3...

Hi Ivan

Possibly?

If so, maybe the authors ought to mention it in the .cabal file /
package description. I know it can use bytestrings which have
efficiency advantages over String, but that doesn't make it online.


Best wishes

Stephen

Re: Re: How efficient is read?

Ivan Lazar Miljenovic
Stephen Tetley <[hidden email]> writes:

> On 9 May 2010 11:42, Ivan Lazar Miljenovic <[hidden email]> wrote:
>
>>> If you want to parse a stream, you don't want Parsec ___ as it
>>> isn't an online parser - online meaning 'streaming' i.e. it can
>>> produce some results during the 'work' rather than a single result at
>>> the end.
>>
>> I thought this was one of the new features in Parsec-3...
>
> Possibly?
>
> If so, maybe the authors ought to mention it in the .cabal file /
> package description. I know it can use bytestrings which have
> efficiency advantages over String, but that doesn't make it online.

Well, RWH talks about Parsec's "input stream", so maybe I'm just
confusing the terms: http://book.realworldhaskell.org/read/using-parsec.html

--
Ivan Lazar Miljenovic
[hidden email]
IvanMiljenovic.wordpress.com

Re: Re: How efficient is read?

Brandon S Allbery KF8NH
On May 9, 2010, at 06:53 , Ivan Lazar Miljenovic wrote:

> Stephen Tetley <[hidden email]> writes:
>> On 9 May 2010 11:42, Ivan Lazar Miljenovic  
>> <[hidden email]> wrote:
>>>> If you want to parse a stream, you don't want Parsec ___ as it
>>>> isn't an online parser - online meaning 'streaming' i.e. it can
>>>> produce some results during the 'work' rather than a single  
>>>> result at
>>>> the end.
>>>
>>> I thought this was one of the new features in Parsec-3...
>>
>> Possibly?
>>
>> If so, maybe the authors ought to mention it in the .cabal file /
>> package description. I know it can use bytestrings which have
>> efficiency advantages over String, but that doesn't make it online.
>
> Well, RWH talks about Parsec's "input stream", so maybe I'm just
> confusing the terms: http://book.realworldhaskell.org/read/using-parsec.html
Hm.  I'd understand that as referring to the fact that Parsec 3 can  
use arbitrary input types instead of [Char], not to streams as in  
stream fusion or lazy processing.

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [hidden email]
system administrator [openafs,heimdal,too many hats] [hidden email]
electrical and computer engineering, carnegie mellon university    KF8NH




Re: How efficient is read?

Tom Hawkins-2
In reply to this post by Malcolm Wallace
On Sun, May 9, 2010 at 3:36 AM, Malcolm Wallace
>
> (Declaration of interest: I wrote polyparse.)

Yes, I used polyparse in the VCD library.  It rocks!

I'll check out the DrIFT tool.

Thanks.

Re: Re: How efficient is read?

Stephen Tetley-2
In reply to this post by Brandon S Allbery KF8NH
On 9 May 2010 13:25, Brandon S. Allbery KF8NH <[hidden email]> wrote:
[SNIP]

> Hm.  I'd understand that as referring to the fact that Parsec 3 can use
> arbitrary input types instead of [Char], not to streams as in stream fusion
> or lazy processing.

Hi Brandon

Yes - that's my impression too.

There is a Parsec iteratee package on Hackage that would presumably
support streaming (that's to say online, or synonymously
'piecemeal' / lazy processing). However, unless I could find a
tutorial, I'd go with Polyparse or uu-parsinglib (Doaitse Swierstra
has a tech report that gives a very detailed guide to uu-parsinglib).

Best wishes

Stephen

Re: How efficient is read?

paul r-2
In reply to this post by Stephen Tetley-2
Hello Stephen,

Stephen> The 10 year old documentation is very good though - for my
Stephen> taste, Parsec 2.0 is the best documented Haskell lib I've seen.

Indeed the doc for 2.0 is really comprehensive, but didn't the library
evolve a lot between release 2.0 and 3.1 ?

Stephen> If you want to parse a stream, you don't want Parsec, as
Stephen> it isn't an online parser - online meaning
Stephen> 'streaming' i.e. it can produce some results during the 'work'
Stephen> rather than a single result at the end. From the descriptions
Stephen> on Hackage, Parsimony and uu-parsinglib sound like better
Stephen> candidates; similarly one of the Polyparse modules provides an
Stephen> online parser.

Thank you for this well detailed explanation. It was just me misusing
the word "stream", I was actually meaning a simple bounded string.

As a first shot I might try to add a new Reader to pandoc, which makes
use of Parsec 3, maybe a Textile one, which is not in yet.

regards,



--
  Paul

Re: How efficient is read?

Stephen Tetley-2
On 10 May 2010 09:32, Paul R <[hidden email]> wrote:
[SNIP]
>
> Indeed the doc for 2.0 is really comprehensive, but didn't the library
> evolve a lot between release 2.0 and 3.1 ?


Hi Paul

I think the internals evolved a lot more than the interface - so it
can handle parsing byte-strings etc. There was quite a long blog post
aggregated to Planet Haskell a few months ago detailing the new
internals; unfortunately I can't remember the author's name so can't
find you a reference (I'm sure it wasn't Derek Elkins, who maintains
Parsec).

I still use Parsec 2.1 myself (I've no need to parse large files where
byte-strings would be a clear advantage) so I'm not the best person to
comment, but I've just scanned the Haddock documentation on Hackage
and the interfaces look very similar. The modules have slightly
different namespaces - so imports will be different and one would have
to choose which text type to use (Text.Parsec.ByteString;
Text.Parsec.ByteString.Lazy or Text.Parsec.String) and import the
appropriate module to get the "parseFromFile" function.

Best wishes

Stephen
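
A minimal sketch of the import choice described above, using the `String` variant. The one-integer-per-line format and the `loadNumbers` name are made up for illustration; switching to `Text.Parsec.ByteString` or `Text.Parsec.ByteString.Lazy` should only change the second import, since each of those modules exports its own `Parser` type and `parseFromFile`.

```haskell
import Text.Parsec (ParseError, digit, endBy, eof, many1, newline)
import Text.Parsec.String (Parser, parseFromFile)

-- One non-negative integer per line, each line ending in a newline.
numbers :: Parser [Int]
numbers = (read <$> many1 digit) `endBy` newline <* eof

-- Reads and parses the file at the given (caller-supplied) path.
loadNumbers :: FilePath -> IO (Either ParseError [Int])
loadNumbers = parseFromFile numbers
```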

Re: Re: How efficient is read?

Brandon S Allbery KF8NH
In reply to this post by paul r-2
On May 10, 2010, at 04:32 , Paul R wrote:
> Stephen> If you want to parse a stream, you don't want Parsec, as
> Stephen> it isn't an online parser - online meaning
>
> Thank you for this well detailed explanation. It was just me misusing
> the word "stream", I was actually meaning a simple bounded string.

That's not misuse, it's just confusing the usual parser terminology  
with the usual Haskell terminology.  (Hence also the earlier confusion  
about token streams in Parsec 3.)

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [hidden email]
system administrator [openafs,heimdal,too many hats] [hidden email]
electrical and computer engineering, carnegie mellon university    KF8NH




Re: How efficient is read?

Chris Eidhof
In reply to this post by Tom Hawkins-2
There is the ChristmasTree package (http://hackage.haskell.org/package/ChristmasTree) which provides a very fast read alternative by deriving grammars for each datatype. If you want to know the speed differences, see http://www.cs.uu.nl/wiki/bin/view/Center/TTTAS for more information (it's in the Haskell Do You Read Me paper, see section 5 for a comparison of efficiency).

-chris

On 9 May 2010, at 05:32, Tom Hawkins wrote:

> I have a lot of structured data in a program written in a different
> language, which I would like to read in and analyze with Haskell.  And
> I'm free to format this data in any shape or form from the other
> language.
>
> Could I define a Haskell type for this data that derives the default
> Read, then simply print out Haskell code from the program and 'read'
> it in?  Would this be horribly inefficient?  It would save me some
> time of writing a parser.
>
> -Tom

Re: How efficient is read?

Tom Hawkins-2
In reply to this post by Malcolm Wallace
> In fact, if you just want
> Read-like functionality for a set of Haskell datatypes, use polyparse: the
> DrIFT tool can derive polyparse's Text.Parse class (the equivalent of Read)
> for you, so you do not even need to write the parser yourself!

cabal install DrIFT-cabalized complains.  What is the module "Rules"?
I've never seen it before.

Is there a quick fix?  I didn't see a "build-depends" line in my
~/.cabal/config file.



e0082888@e0082888-laptop:~$ cabal install DrIFT-cabalized
Resolving dependencies...
Configuring DrIFT-cabalized-2.2.3.1...
Preprocessing executables for DrIFT-cabalized-2.2.3.1...
Building DrIFT-cabalized-2.2.3.1...

src/DrIFT.hs:19:17:
    Could not find module `Rules':
      It is a member of the hidden package `ghc-6.12.2'.
      Perhaps you need to add `ghc' to the build-depends in your .cabal file.
      Use -v to see a list of the files searched for.
cabal: Error: some packages failed to install:
DrIFT-cabalized-2.2.3.1 failed during the building phase. The exception was:
ExitFailure 1

Re: How efficient is read?

Gwern Branwen
On Mon, May 10, 2010 at 4:50 PM, Tom Hawkins <[hidden email]> wrote:

>> In fact, if you just want
>> Read-like functionality for a set of Haskell datatypes, use polyparse: the
>> DrIFT tool can derive polyparse's Text.Parse class (the equivalent of Read)
>> for you, so you do not even need to write the parser yourself!
>
> Cabal install DrIFT-cabalized complains.  What is the module "Rules"?
> I've never seen it before.
>
> Is there a quick fix?  I didn't see a "build-depends" line in my
> ~/.cabal/config file.
>
>
>
> e0082888@e0082888-laptop:~$ cabal install DrIFT-cabalized
> Resolving dependencies...
> Configuring DrIFT-cabalized-2.2.3.1...
> Preprocessing executables for DrIFT-cabalized-2.2.3.1...
> Building DrIFT-cabalized-2.2.3.1...
>
> src/DrIFT.hs:19:17:
>    Could not find module `Rules':
>      It is a member of the hidden package `ghc-6.12.2'.
>      Perhaps you need to add `ghc' to the build-depends in your .cabal file.
>      Use -v to see a list of the files searched for.
> cabal: Error: some packages failed to install:
> DrIFT-cabalized-2.2.3.1 failed during the building phase. The exception was:
> ExitFailure 1

The tarball was missing its Rules.hs; as it happens, GHC has a module
named Rules.hs as well, hence the confusing error. I've uploaded a
fresh one that should work.

--
gwern