HaXml: ampersand in attribute value

HaXml: ampersand in attribute value

Koen.Roelandt
HaXml seems to choke on finding an ampersand in an attribute value. Is
this normal? Is there any workaround?

Cheers,

Koen.
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: HaXml: ampersand in attribute value

Malcolm Wallace
[hidden email] wrote:
> HaXml seems to choke on finding an ampersand in an attribute value. Is
> this normal? Is there any workaround?

Yes, it is expected.  An ampersand indicates the start of a reference,
e.g. &lt; or &#20;  If there is no semicolon to indicate the end of the
reference, then it is a parse error.  The XML specification is quite
clear that neither & nor < are valid standalone characters in an
attribute value.

Regards,
     Malcolm
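
One possible workaround for input that contains stray ampersands is to escape them before the text reaches the parser. The following is a minimal sketch in plain Haskell, not part of HaXml: the names startsReference and escapeAttr are made up for illustration, and the name-character check is a simplification of the XML Name production. It leaves anything that already looks like a complete reference untouched and rewrites a bare & or < as &amp; or &lt;.

```haskell
import Data.Char (isAlphaNum, isDigit, isHexDigit)

-- | True if the text following an '&' is a complete reference body
-- terminated by ';', for example "lt;", "#235;" or "#xEB;".
-- Simplification: any mix of letters, digits and "_-.:" counts as a name.
startsReference :: String -> Bool
startsReference s =
  case span (/= ';') s of
    (body, ';' : _) -> validBody body
    _               -> False
  where
    validBody ('#' : 'x' : ds) = not (null ds) && all isHexDigit ds
    validBody ('#' : ds)       = not (null ds) && all isDigit ds
    validBody name             = not (null name) && all isNameChar name
    isNameChar c = isAlphaNum c || c `elem` "_-.:"

-- | Escape every '<' and every '&' that does not start a reference,
-- so the result can be placed inside a quoted attribute value.
-- Quote characters are not handled here.
escapeAttr :: String -> String
escapeAttr [] = []
escapeAttr ('&' : rest)
  | startsReference rest = '&' : escapeAttr rest
  | otherwise            = "&amp;" ++ escapeAttr rest
escapeAttr ('<' : rest)  = "&lt;" ++ escapeAttr rest
escapeAttr (c : rest)    = c : escapeAttr rest
```

Note that this only checks the shape of a reference; whether a named entity is actually declared is a separate question.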

Re: HaXml: ampersand in attribute value

Lennart Augustsson
But speaking of HaXml bugs, I'm pretty sure HaXml doesn't handle
% correctly.  It seems to treat % specially everywhere, but I think
it is only special inside DTDs.  I have many XML files produced by
other tools that the HaXml parser fails to process because of this.

        -- Lennart

Malcolm Wallace wrote:

> [hidden email] wrote:
>> HaXml seems to choke on finding an ampersand in an attribute value. Is
>> this normal? Is there any workaround?
>
> Yes, it is expected.  An ampersand indicates the start of a reference,
> e.g. &lt; or &#20;  If there is no semicolon to indicate the end of the
> reference, then it is a parse error.  The XML specification is quite
> clear that neither & nor < are valid standalone characters in an
> attribute value.
>
> Regards,
>     Malcolm

Re: HaXml: ampersand in attribute value

Koen.Roelandt
In reply to this post by Koen.Roelandt
> But speaking of HaXml bugs, I'm pretty sure HaXml doesn't handle
> % correctly.  It seems to treat % specially everywhere, but I think
> it is only special inside DTDs.  I have many XML files produced by
> other tools that the HaXml parser fails to process because of this.

I had a similar problem where the parser choked on % signs in attribute
values in the XML file. I solved it by playing around in Lex.hs. I could
check for the (ugly) solution...

> Yes, it is expected.  An ampersand indicates the start of a reference,
> e.g. &lt; or &#20;  If there is no semicolon to indicate the end of the
> reference, then it is a parse error.  The XML specification is quite
> clear that neither & nor < are valid standalone characters in an
> attribute value.

Which is exactly the problem. The & is part of a reference, namely &euml;.

Regards,

Koen.
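
For context: XML itself predefines only five entities (lt, gt, amp, apos, quot). A name such as euml comes from the HTML/Latin-1 entity sets and has to be declared in a DTD before an XML parser will accept it. Whether that is what tripped HaXml in this case is an assumption, but if it is, one workaround is to rewrite such names as numeric character references before parsing. The sketch below is plain Haskell with no HaXml dependency; numericRefs and the small namedEntities table are hypothetical names, and the table is deliberately incomplete.

```haskell
import Data.Char (ord)

-- | A few HTML/Latin-1 entity names (illustrative, not a complete table).
namedEntities :: [(String, Char)]
namedEntities =
  [ ("euml",   '\235')  -- U+00EB, e with diaeresis
  , ("eacute", '\233')  -- U+00E9, e with acute accent
  , ("nbsp",   '\160')  -- U+00A0, no-break space
  ]

-- | Replace each "&name;" whose name is in the table with the
-- equivalent decimal character reference. Everything else, including
-- the predefined XML entities, is passed through unchanged.
numericRefs :: String -> String
numericRefs [] = []
numericRefs ('&' : rest) =
  case span (/= ';') rest of
    (name, ';' : after)
      | Just c <- lookup name namedEntities ->
          "&#" ++ show (ord c) ++ ";" ++ numericRefs after
    _ -> '&' : numericRefs rest
numericRefs (c : rest) = c : numericRefs rest
```

For example, numericRefs applied to "Citro&euml;n" gives "Citro&#235;n", which needs no DTD declaration.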

Re: HaXml: ampersand in attribute value

Malcolm Wallace
In reply to this post by Lennart Augustsson
Lennart Augustsson wrote:
 > But speaking of HaXml bugs, I'm pretty sure HaXml doesn't handle
 > % correctly.  It seems to treat % specially everywhere, but I think
 > it is only special inside DTDs.  I have many XML files produced by
 > other tools that the HaXml parser fails to process because of this.

I believe I fixed at least one bug to do with % characters around
version 1.14.  But that is the development branch in darcs, not formally
released yet.  Nevertheless, if you know of such bugs, do report them;
even better if you can send a small test case.

Regards,
     Malcolm

Re: HaXml: ampersand in attribute value

Graham Klyne-2
In reply to this post by Lennart Augustsson
Lennart Augustsson wrote:
> But speaking of HaXml bugs, I'm pretty sure HaXml doesn't handle
> % correctly.  It seems to treat % specially everywhere, but I think
> it is only special inside DTDs.  I have many XML files produced by
> other tools that the HaXml parser fails to process because of this.

Indeed.  This is an area that required a fair amount of work in the
version of HaXml I was playing with some time ago.

The change log at the end of:
http://www.ninebynine.org/Software/HaskellUtils/HaXml-1.12/src/Text/XML/HaXml/Lex.hs
has some clues to what I had to do.  Notably:
[[
-- Revision 1.12  2004/06/04 21:59:13  graham
-- Work-in-progress:  creating intermediate filter to handle parameter
-- entity replacement.  Separated common features from parse module.
-- Created new module based on simplified use of parsing utilities
-- to detect and substitute PEs.  The result is a modified token sequence
-- passed to the main XML parser.
]]

The parameter entity filter is defined by:
http://www.ninebynine.org/Software/HaskellUtils/HaXml-1.12/src/Text/XML/HaXml/SubstitutePE.hs

The entity handling part of the code was not pretty, due mainly to the
somewhat quirky nature of XML syntax, especially where parameter
entities and general entities interact.

#g

--
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
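
To give a rough idea of what a parameter entity substitution pass over a token sequence can look like, here is a small self-contained sketch. It is not the code from SubstitutePE.hs: the Token type, the PEDecls association list and substitutePEs are all hypothetical and far simpler than a real XML lexer. It replaces each %name; token with the tokens of its declared replacement, expanding nested references and rejecting undeclared or recursive ones.

```haskell
-- | Hypothetical, much simplified token type (not HaXml's own).
data Token
  = Text String    -- any ordinary token
  | PERef String   -- a parameter entity reference, %name;
  deriving (Eq, Show)

-- | Declared parameter entities and their replacement tokens.
type PEDecls = [(String, [Token])]

-- | Expand every PERef, recursively, before the main parser runs.
-- Returns Left with a message for undeclared or recursive entities.
substitutePEs :: PEDecls -> [Token] -> Either String [Token]
substitutePEs decls = go []
  where
    -- 'active' holds the entities currently being expanded,
    -- which is how direct and indirect recursion is detected.
    go _ [] = Right []
    go active (PERef n : ts)
      | n `elem` active = Left ("recursive parameter entity: " ++ n)
      | otherwise =
          case lookup n decls of
            Nothing -> Left ("undeclared parameter entity: " ++ n)
            Just replacement -> do
              expanded <- go (n : active) replacement
              rest     <- go active ts
              Right (expanded ++ rest)
    go active (t : ts) = (t :) <$> go active ts
```

Running the substitution as a separate pass keeps the main parser free of PE handling, which is the design the change log above describes.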

Re: HaXml: ampersand in attribute value

Graham Klyne-2
In reply to this post by Malcolm Wallace
Malcolm Wallace wrote:

> Lennart Augustsson wrote:
>> But speaking of HaXml bugs, I'm pretty sure HaXml doesn't handle
>> % correctly.  It seems to treat % specially everywhere, but I think
>> it is only special inside DTDs.  I have many XML files produced by
>> other tools that the HaXml parser fails to process because of this.
>
> I believe I fixed at least one bug to do with % characters around
> version 1.14.  But that is the development branch in darcs, not formally
> released yet.  Nevertheless, if you know of such bugs, do report them;
> even better if you can send a small test case.

Malcolm,

Did you come across the HaXml test harness I created based on a subset of W3C
conformance tests?

http://www.ninebynine.org/Software/HaskellUtils/HaXml-1.12/test/

This covers all the parameter entity problems I fixed some time ago.

#g

--
Graham Klyne
For email:
http://www.ninebynine.org/#Contact

Re: HaXml: ampersand in attribute value

Malcolm Wallace
Graham Klyne <[hidden email]> wrote:

> Did you come across the HaXml test harness I created based on a subset
> of W3C conformance tests?
>     http://www.ninebynine.org/Software/HaskellUtils/HaXml-1.12/test/
> This covers all the parameter entity problems I fixed some time ago.

Indeed, and an excellent resource.  I have been wondering how to merge
it back into my version of HaXml ever since.

Regards,
    Malcolm