HaXml and the XHTML 1.0 Strict DTD

HaXml and the XHTML 1.0 Strict DTD

Peter Gammie
Is anyone using HaXml to validate XHTML Strict?

The old 1.13.2 version has some bugs in how it handles attributes that  
stop me from using it. It handled the DTD parsing fine.

The most-recent darcs version relies on a newer ByteString than I  
have, so it is not easy for me to test it.

A recent (this year) darcs version failed to parse the DTD, yielding  
this error:

validate: In a sequence:
   in content spec of ELEMENT decl: head
   When looking for a non-empty sequence with separators:
    In a sequence:
     Expected % but found |
       in file xhtml1  at line 252 col 50
     when looking for a content particle

   when looking for a content particle

This is the context:

<!--================ Document Head =======================================-->

<!ENTITY % head.misc "(script|style|meta|link|object)*">

<!-- content model is %head.misc; combined with a single
      title and an optional base element in any order -->

<!ELEMENT head (%head.misc;,
      ((title, %head.misc;, (base, %head.misc;)?) |
       (base, %head.misc;, (title, %head.misc;))))>
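For readers unfamiliar with parameter entities: each %head.misc; reference in the content model stands for the replacement text declared in the ENTITY line above it. The following is only a minimal sketch of that substitution, written as naive text replacement; `expandPE` is a hypothetical helper for illustration, not a HaXml function, and it ignores the finer points of the XML rules (such as nested references).

```haskell
import Data.Char (isAlphaNum)

-- | Replace every %name; reference whose name appears in the table with
--   its replacement text. Unknown references are left untouched.
--   Hypothetical helper for illustration; not part of HaXml.
expandPE :: [(String, String)] -> String -> String
expandPE table = go
  where
    go [] = []
    go ('%' : rest)
      | (name, ';' : rest') <- span isNameChar rest
      , Just text <- lookup name table
      = text ++ go rest'
    go (c : cs) = c : go cs

    -- Characters allowed in this sketch's entity names.
    isNameChar c = isAlphaNum c || c `elem` "._-:"
```

In GHCi, `expandPE [("head.misc", "(script|style|meta|link|object)*")]` applied to the head content model yields the fully spelled-out model that a DTD parser ultimately has to accept.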

I appreciate any advice on this.

cheers
peter
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: HaXml and the XHTML 1.0 Strict DTD

Malcolm Wallace
Peter Gammie <[hidden email]> wrote:

> The most-recent darcs version relies on a newer ByteString than I  
> have, so it is not easy for me to test it.

I believe there was a patch to fix this.  Apparently only one version of
the bytestring package (0.9.0.1) ever exported the 'join' function, and
a different version with the same number (but not exporting 'join') was
uploaded to Hackage!  'join' has since been replaced by 'intercalate',
which is available in all versions 0.9.x.
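As an illustration of the replacement function mentioned above, here is a small sketch using `intercalate` from Data.ByteString.Char8. The wrapper name `joinWith` is made up for this example and does not come from HaXml or bytestring.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Join the pieces with a separator, using 'B.intercalate' rather than
--   the short-lived 'join'. 'joinWith' is a hypothetical name.
joinWith :: String -> [String] -> B.ByteString
joinWith sep = B.intercalate (B.pack sep) . map B.pack
```

For instance, `joinWith ", " ["a", "b", "c"]` packs the three strings and places the separator between them.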

> A recent (this year) darcs version failed to parse the DTD, yielding  
> this error:

I didn't try the full XHTML DTD, but the fragment you included in your
message was parsed just fine by the darcs version of HaXml/DtdToHaskell.

Regards,
    Malcolm

Re: Re: HaXml and the XHTML 1.0 Strict DTD

Duncan Coutts

On Wed, 2008-04-30 at 11:32 +0100, Malcolm Wallace wrote:

> Peter Gammie <[hidden email]> wrote:
>
> > The most-recent darcs version relies on a newer ByteString than I  
> > have, so it is not easy for me to test it.
>
> I believe there was a patch to fix this.  Apparently only one version of
> the bytestring package (0.9.0.1) ever exported the 'join' function, and
> a different version with the same number (but not exporting 'join') was
> uploaded to Hackage!  'join' has since been replaced by 'intercalate',
> which is available in all versions 0.9.x.

Just goes to show that we need a tool to compare and check package APIs
so that packages that want to follow a versioning policy can check that
they really are. Doing these things manually is prone to mistakes like
this one (and another that I'm aware of in the same package).
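A very rough sketch of the kind of check such a tool might perform, assuming a rule in the spirit of the Haskell Package Versioning Policy: removing an exported name calls for a major version bump, and adding one calls for at least a minor bump. The names `Bump` and `requiredBump` are invented for this example, and the sketch compares exported names only, not their types.

```haskell
import Data.List ((\\))

-- | The smallest version change this sketch would accept.
data Bump = NoBump | MinorBump | MajorBump
  deriving (Eq, Show)

-- | Compare the exported names of an old and a new release.
--   Anything removed forces a major bump; pure additions need a minor one.
requiredBump :: [String] -> [String] -> Bump
requiredBump old new
  | not (null (old \\ new)) = MajorBump
  | not (null (new \\ old)) = MinorBump
  | otherwise               = NoBump
```

Under this rule, an export list that drops 'join' cannot keep the same version number as one that still has it.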

Duncan


Re: HaXml and the XHTML 1.0 Strict DTD

Marc Weber
In reply to this post by Peter Gammie
Also have a look at the HaXml page. A branch is listed there that passes
more tests, if I recall correctly. I don't know whether those changes have
been merged back yet. I haven't tried it.

Marc Weber

Re: HaXml and the XHTML 1.0 Strict DTD

Peter Gammie
In reply to this post by Malcolm Wallace
On 30/04/2008, at 5:32 PM, Malcolm Wallace wrote:

> Peter Gammie <[hidden email]> wrote:
>
>> The most-recent darcs version relies on a newer ByteString than I
>> have, so it is not easy for me to test it.
>
> I believe there was a patch to fix this.  Apparently only one version of
> the bytestring package (0.9.0.1) ever exported the 'join' function, and
> a different version with the same number (but not exporting 'join') was
> uploaded to Hackage!  'join' has since been replaced by 'intercalate',
> which is available in all versions 0.9.x.

Thanks. I don't doubt it works with a newer bytestring, I just can't  
readily use such a thing.

>> A recent (this year) darcs version failed to parse the DTD, yielding
>> this error:
>
> I didn't try the full XHTML DTD, but the fragment you included in your
> message was parsed just fine by the darcs version of HaXml/DtdToHaskell.

Can you please try the full XHTML 1.0 Strict DTD? At the same time,  
can you verify that it handles this part of it properly (circa line  
854):

<!ELEMENT table
      (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>

Using a slightly hacked HaXml v1.13.3, I get this from DtdToHaskell:

data Table = Table Table_Attrs (Maybe Caption)
                    (OneOf2 [Col] [Colgroup]) (Maybe Thead) (Maybe Tfoot)
                    (OneOf2 (List1 Tbody) (List1 Tr))
            deriving (Eq,Show)

My expectation is that we can have a <table> without a <col> or
<colgroup> child. The W3C validator seems to agree with that
interpretation. When I use the HaXml validator with this DTD I get
errors such as these:

Element <table> should contain (caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+)) but does not.

Element <table> should contain (col*|colgroup*) but does not.

cheers
peter

Re: HaXml and the XHTML 1.0 Strict DTD

Malcolm Wallace
Peter Gammie <[hidden email]> wrote:

> <!ELEMENT table
>       (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
>
> Using a slightly hacked HaXml v1.13.3, I get this from DtdToHaskell:
>
> data Table = Table Table_Attrs (Maybe Caption)
>                     (OneOf2 [Col] [Colgroup]) (Maybe Thead) (Maybe Tfoot)
>                     (OneOf2 (List1 Tbody) (List1 Tr))
>             deriving (Eq,Show)

This looks entirely correct to me.

> My expectation is that we can have a <table> without a <col> or  
> <colgroup> child.

Ah, yes I can see why that is permitted, but I guess HaXml's validator
is not yet smart enough to be able to choose whether it has seen an
empty list of <col> or an empty list of <colgroup>.  :-)
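To make the ambiguity concrete, here is a cut-down sketch of the generated type. The definitions of `OneOf2` and `List1` below are simplified stand-ins written for this example (HaXml ships its own), and the attributes, caption, thead and tfoot fields are omitted. A table with no column children has two equally valid values.

```haskell
-- Simplified stand-ins for HaXml's OneOf2 and List1 (assumptions).
data OneOf2 a b = OneOf2 a | TwoOf2 b
  deriving (Eq, Show)

newtype List1 a = NonEmpty [a]
  deriving (Eq, Show)

-- Placeholder element types with no content, for illustration only.
data Col      = Col      deriving (Eq, Show)
data Colgroup = Colgroup deriving (Eq, Show)
data Tbody    = Tbody    deriving (Eq, Show)
data Tr       = Tr       deriving (Eq, Show)

-- Cut-down version of the generated Table type.
data Table = Table (OneOf2 [Col] [Colgroup])
                   (OneOf2 (List1 Tbody) (List1 Tr))
  deriving (Eq, Show)

-- Both values describe a <table> containing a single <tr> and no
-- <col> or <colgroup>: the empty list can sit in either branch.
emptyCols, emptyColgroups :: Table
emptyCols      = Table (OneOf2 []) (TwoOf2 (NonEmpty [Tr]))
emptyColgroups = Table (TwoOf2 []) (TwoOf2 (NonEmpty [Tr]))
```

So the data type itself admits the document; the open question is only which branch a checker should pick when it sees neither element.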

Here is a suggested fix.  Let me know if it works for you.  In
src/Text/XML/HaXml/Validate.hs, around line 220, use the following diff
over the local defn of 'choice':

    choice elem ns cps =  -- return only those parses that don't give any errors
        [ rem | ([],rem) <- map (\cp-> checkCP elem (definite cp) ns) cps ]
+       ++ [ ns | all possEmpty cps ]
        where definite (TagName n Query)  = TagName n None
              definite (Choice cps Query) = Choice cps None
              definite (Seq cps Query)    = Seq cps None
              definite (TagName n Star)   = TagName n Plus
              definite (Choice cps Star)  = Choice cps Plus
              definite (Seq cps Star)     = Seq cps Plus
              definite x                  = x
+             possEmpty (TagName _ mod)   = mod `elem` [Query,Star]
+             possEmpty (Choice cps None) = all possEmpty cps
+             possEmpty (Choice _ mod)    = mod `elem` [Query,Star]
+             possEmpty (Seq cps None)    = all possEmpty cps
+             possEmpty (Seq _ mod)       = mod `elem` [Query,Star]
   
Are there other places, apart from the validator, where a similar
problem arises?
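For reference, the question the patch asks (can this content particle match no children at all?) can be sketched independently of HaXml. The `CP` and `Modifier` types below are stand-ins that mirror the constructor names used in the diff, with plain String names; `nullable` is a hypothetical function using the usual regular-expression rule. Note that it is slightly more permissive than `possEmpty` above: it treats a choice as possibly empty when any alternative is, and it looks inside a group marked with +.

```haskell
-- Stand-in types mirroring the constructors used in the diff (assumptions).
data Modifier = None | Query | Star | Plus
  deriving (Eq, Show)

data CP = TagName String Modifier
        | Choice [CP] Modifier
        | Seq [CP] Modifier
  deriving (Eq, Show)

-- | Can this content particle match an empty list of children?
nullable :: CP -> Bool
nullable cp = case cp of
    TagName _ m  -> optional m
    Choice cps m -> optional m || any nullable cps
    Seq cps m    -> optional m || all nullable cps
  where
    optional m = m `elem` [Query, Star]

-- The (col*|colgroup*) group from the table content model.
colOrColgroup :: CP
colOrColgroup = Choice [TagName "col" Star, TagName "colgroup" Star] None

-- The whole table content model from the DTD.
tableModel :: CP
tableModel = Seq
  [ TagName "caption" Query
  , colOrColgroup
  , TagName "thead" Query
  , TagName "tfoot" Query
  , Choice [TagName "tbody" Plus, TagName "tr" Plus] None
  ] None
```

Here `nullable colOrColgroup` holds, while `nullable tableModel` does not, because the final group requires at least one <tbody> or <tr>.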

Regards,
    Malcolm

Re: HaXml and the XHTML 1.0 Strict DTD

Peter Gammie
On 21/05/2008, at 5:44 PM, Malcolm Wallace wrote:

> Peter Gammie <[hidden email]> wrote:
>
>> <!ELEMENT table
>>      (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
>>
>> Using a slightly hacked HaXml v1.13.3, I get this from DtdToHaskell:
>>
>> data Table = Table Table_Attrs (Maybe Caption)
>>                    (OneOf2 [Col] [Colgroup]) (Maybe Thead) (Maybe Tfoot)
>>                    (OneOf2 (List1 Tbody) (List1 Tr))
>>            deriving (Eq,Show)
>
> This looks entirely correct to me.

I realised that as soon as I sent it. :-)

>> My expectation is that we can have a <table> without a <col> or
>> <colgroup> child.
>
> Ah, yes I can see why that is permitted, but I guess HaXml's validator
> is not yet smart enough to be able to choose whether it has seen an
> empty list of <col> or an empty list of <colgroup>.  :-)
>
> Here is a suggested fix.  Let me know if it works for you.  In
> src/Text/XML/HaXml/Validate.hs, around line 220, use the following diff
> over the local defn of 'choice':
>
>    choice elem ns cps =  -- return only those parses that don't give any errors
>        [ rem | ([],rem) <- map (\cp-> checkCP elem (definite cp) ns) cps ]
> +       ++ [ ns | all possEmpty cps ]
>        where definite (TagName n Query)  = TagName n None
>              definite (Choice cps Query) = Choice cps None
>              definite (Seq cps Query)    = Seq cps None
>              definite (TagName n Star)   = TagName n Plus
>              definite (Choice cps Star)  = Choice cps Plus
>              definite (Seq cps Star)     = Seq cps Plus
>              definite x                  = x
> +             possEmpty (TagName _ mod)   = mod `elem` [Query,Star]
> +             possEmpty (Choice cps None) = all possEmpty cps
> +             possEmpty (Choice _ mod)    = mod `elem` [Query,Star]
> +             possEmpty (Seq cps None)    = all possEmpty cps
> +             possEmpty (Seq _ mod)       = mod `elem` [Query,Star]

Fantastic, thanks, that seems to work fine. One nit: your use of `elem`
is meant to be Prelude.elem, but that name is shadowed by the `elem`
argument of 'choice', so I added the Prelude as a qualified import as P
and changed those shadowed references to `P.elem`.
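A minimal sketch of that shadowing workaround, outside HaXml. The `Modifier` type and the function `isOptional` are stand-ins invented for this example. Note that the ordinary `import Prelude` line is spelled out, because an explicit qualified import on its own would switch off the implicit unqualified one.

```haskell
import Prelude
import qualified Prelude as P

-- Stand-in for the modifier type used in the patch (assumption).
data Modifier = None | Query | Star | Plus
  deriving (Eq, Show)

-- | The first argument is named 'elem' (here, an element name), so within
--   the body a bare `elem` would refer to that String rather than to the
--   list-membership function. Qualifying the call sidesteps the clash.
isOptional :: String -> Modifier -> Bool
isOptional elem m = m `P.elem` [Query, Star]
```

Renaming the argument would work equally well; the qualified import simply leaves the existing code untouched.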

I will try to send you a patch against 1.13.3 with all these little  
bits and pieces, when my project is finished.

Can you lay out some kind of plan for HaXml? (is 1.13.x now dead, is  
1.19.x stable, ...?) This would help for new-ish projects like mine.

> Are there other places, apart from the validator, where a similar
> problem arises?

I do not know, I am merely using the DTD and HTML parsers, the CFilter  
combinators, the pretty printer and the validator. They all seem fine  
on a cursory check.

(In general HaXml has been working quite well. Thanks for producing  
such a long-lived and well-thought-out library.)

cheers
peter


Re: HaXml and the XHTML 1.0 Strict DTD

Malcolm Wallace
Peter Gammie <[hidden email]> wrote:

> Can you lay out some kind of plan for HaXml? (is 1.13.x now dead, is  
> 1.19.x stable, ...?) This would help for new-ish projects like mine.

The 1.13.x stable branch sees minimal maintenance only, mostly to repair
it to build after each new release of ghc breaks something.

Versions 1.14 - 1.19 (i.e. the darcs repo) introduce several API
changes.  I think those have now pretty much stabilised, but
unfortunately the work to realise the benefit of those changes
throughout the codebase is still incomplete in some places.  That is why
I have not frozen and released this branch as 2.0 yet.

For forward compatibility I would definitely recommend that a new
project using HaXml should start with the 1.19 branch, not 1.13.

Regards,
    Malcolm

Re: HaXml and the XHTML 1.0 Strict DTD

Peter Gammie
On 21/05/2008, at 7:40 PM, Malcolm Wallace wrote:

> Peter Gammie <[hidden email]> wrote:
>
>> Can you lay out some kind of plan for HaXml? (is 1.13.x now dead, is
>> 1.19.x stable, ...?) This would help for new-ish projects like mine.
>
> The 1.13.x stable branch sees minimal maintenance only, mostly to repair
> it to build after each new release of ghc breaks something.
>
> Versions 1.14 - 1.19 (i.e. the darcs repo) introduce several API
> changes.  I think those have now pretty much stabilised, but
> unfortunately the work to realise the benefit of those changes
> throughout the codebase is still incomplete in some places.  That is why
> I have not frozen and released this branch as 2.0 yet.
>
> For forward compatibility I would definitely recommend that a new
> project using HaXml should start with the 1.19 branch, not 1.13.

Thanks for your advice. Because ByteString cannot be upgraded under GHC
6.6.1, I have been slow to try the darcs version of HaXml. I am now
using GHC 6.8.2 and so can try it out.

My earlier-reported bug for the DTD parser stands:

$ ~/bin/DtdToHaskell xhtml1-strict.dtd
DtdToHaskell: In a sequence:
   in content spec of ELEMENT decl: head
   When looking for a non-empty sequence with separators:
    In a sequence:
     Expected % but found |
       in file xhtml1-strict.dtd  at line 252 col 50
     when looking for a content particle

   when looking for a content particle

That is the XHTML 1.0 Strict DTD from the W3C.

Do you have any ideas what might have caused this? If not, I will have  
a poke around. It did work fine in 1.13.3, as I remarked earlier.

cheers
peter