Haskell XML Parsers

Richard Senington
Dear all,

I have been looking at using XML for a little program I have been
writing. The file I am currently trying to load is about 9MB, and I have
now tried to use HaXml and HXT. Without any of my own code, just a simple
call to the basic parsers, they both use a huge amount of memory.
HXT is the worst, at about 7GB and climbing. HaXml uses 1.3GB.

The code I am using is
HXT
xml <- readFile file_name_here
k <- runX (parseXmlDocument True) xml
print k

and for HaXml
x <- readFile file_name_here
let (Document _ _ e _) = xmlParse "t" x
let t = myFilter $ CElem e
print $ length t


I have seen in previous posts to the cafe that other people have run
into this problem with HXT. Is this a general problem with XML in
Haskell (I know that XML parsing is a slow and bulky process, but this
seems excessive)? Is there a known solution? Does anyone have any advice?

Cheers

RS
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Haskell XML Parsers

Malcolm Wallace
> I have been looking at using XML for a little program I have been
> writing. The file I am currently trying to load is about 9MB, and I
> have now tried to use HaXml and HXT. Without any of my own code, just
> a simple call to the basic parsers, they both use a huge amount of
> memory. HXT is the worst, at about 7GB and climbing. HaXml uses 1.3GB.

Are you using Text.XML.HaXml.ParseLazy, or Text.XML.HaXml.Parse?  The  
lazy version should show much better space usage, provided your  
subsequent usage of the document is roughly a single-pass traversal.
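
As a rough sketch of what a single-pass use of the lazy parser could look like: the function below counts the element children of the document root and never looks at the tree again. It assumes a HaXml version in which CElem carries a position argument (1.20 or later, as far as I know); the name countRootChildren and the "input.xml" label are made up for illustration.

```haskell
import Text.XML.HaXml.ParseLazy (xmlParse)
import Text.XML.HaXml.Types (Content (..), Document (..), Element (..))

-- | Count the element children of the document root in one traversal.
-- (Hypothetical helper, not part of HaXml.)
countRootChildren :: String -> Int
countRootChildren input = length [() | CElem _ _ <- children]
  where
    -- The first argument to xmlParse is only a name used in error messages.
    Document _ _ (Elem _ _ children) _ = xmlParse "input.xml" input
```

It could be driven with something like `readFile path >>= print . countRootChildren`. Any space saving presumably depends on not keeping another reference to the parsed document alive after the traversal.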

Regards,
     Malcolm


Re: Haskell XML Parsers

Neil Mitchell
In reply to this post by Richard Senington
Hi,

You might want to take a look at TagSoup
(http://community.haskell.org/~ndm/tagsoup) - it parses XML/HTML
lazily, returning a flat stream of tags. It doesn't build a nested
tree, but it does have good memory usage.
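
A minimal sketch of the tag-stream style, assuming the tagsoup package is installed: parseTags turns the input into a list of tags, and the list is consumed once to count opening tags with a given name. The helper name countOpenTags is invented for this example.

```haskell
import Text.HTML.TagSoup (isTagOpenName, parseTags)

-- | Count the opening tags with the given name in one pass over the
-- tag stream. (Hypothetical helper, not part of TagSoup.)
countOpenTags :: String -> String -> Int
countOpenTags name = length . filter (isTagOpenName name) . parseTags
```

For a file this might be used as `readFile path >>= print . countOpenTags "item"`, where "item" stands in for whatever element name the document actually uses.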

Thanks, Neil

On Fri, Apr 30, 2010 at 11:35 AM, R Senington <[hidden email]> wrote:

> Dear all,
>
> I have been looking at using XML for a little program I have been writing. The file I am currently trying to load is about 9MB, and I have now tried to use
> HaXml and HXT. Without any of my own code, just a simple call to the basic parsers, they both use a huge amount of memory.
> HXT is the worst, at about 7GB and climbing. HaXml uses 1.3GB.

Re: Haskell XML Parsers

Gregory Collins
In reply to this post by Richard Senington
R Senington <[hidden email]> writes:

> Dear all,
>
> I have been looking at using XML for a little program I have been writing. The
> file I am currently trying to load is about 9MB, and I have now tried to use
> HaXml and HXT. Without any of my own code, just a simple call to the basic
> parsers, they both use a huge amount of memory.
> HXT is the worst, at about 7GB and climbing. HaXml uses 1.3GB.

If your needs are reasonably basic, you could consider trying:

  http://hackage.haskell.org/package/hexpat

which is an FFI binding to expat.
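
For a basic need such as counting elements, a sketch using hexpat's SAX-style interface might look like the following. It assumes a hexpat version that exports parse and defaultParseOptions from Text.XML.Expat.SAX (the 0.20 series does, as far as I know); countStartElements is a made-up name.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Text.XML.Expat.SAX (SAXEvent (..), defaultParseOptions, parse)

-- | Count the start-element events in a lazily parsed document.
-- (Hypothetical helper, not part of hexpat.)
countStartElements :: L.ByteString -> Int
countStartElements bytes = length [() | StartElement _ _ <- events]
  where
    -- Fix the tag and text types to String; other string types are possible.
    events :: [SAXEvent String String]
    events = parse defaultParseOptions bytes
```

A call such as `L.readFile path >>= print . countStartElements` would run it over a file. Note that a parse error shows up as a final FailDocument event, which this sketch simply ignores.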

G
--
Gregory Collins <[hidden email]>

Re: Haskell XML Parsers

Malcolm Wallace
>> I have been looking at using XML for a little program I have been
>> writing. The file I am currently trying to load is about 9MB, and I
>> have now tried to use HaXml and HXT. Without any of my own code, just
>> a simple call to the basic parsers, they both use a huge amount of
>> memory. HXT is the worst, at about 7GB and climbing. HaXml uses 1.3GB.

For the archives, the user took the suggestion of switching to HaXml's  
lazy parser, which solved the memory issue.

Regards,
     Malcolm
