Lexical Syntax and Unicode


Manlio Perillo-3
Hi.

Reading the Haskell 98 Report (section 9.2), I have found a possible
problem.

The lexical syntax supports Unicode, but the definition of newline
does not:

newline -> return linefeed | return | linefeed | formfeed


The Unicode standard defines two additional line-terminating characters:

U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR

The Unicode Character Database also defines two general categories:
Zl = Separator, line
Zp = Separator, paragraph

The Zl category only contains the LINE SEPARATOR character and the Zp
category only contains the PARAGRAPH SEPARATOR character.
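
As a quick sanity check of the claim above, here is a minimal sketch that asks GHC's Data.Char which characters fall into each category. It assumes the Unicode tables shipped with the `base` library agree with the Unicode Character Database on these two categories; the helper name `charsIn` is made up for illustration.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR.
lineSep, paraSep :: Char
lineSep = '\x2028'
paraSep = '\x2029'

-- Hypothetical helper: every Char whose general category is the given one.
-- It scans the whole Char range, so it is only meant for a one-off check.
charsIn :: GeneralCategory -> [Char]
charsIn cat = filter ((== cat) . generalCategory) [minBound .. maxBound]
```

If the claim holds, `charsIn LineSeparator` should evaluate to `[lineSep]` and `charsIn ParagraphSeparator` to `[paraSep]`.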


So, IMHO, the lexical syntax should be changed to:

newline -> return linefeed | return | linefeed | formfeed
           | uniLine | uniPara
uniLine -> any Unicode character defined as line separator
uniPara -> any Unicode character defined as paragraph separator

or, alternatively:

uniLine -> LINE SEPARATOR
uniPara -> PARAGRAPH SEPARATOR
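
To make the proposal concrete, here is a rough sketch of a recognizer for the extended newline production (the first variant, based on general categories). It is not how any existing Haskell lexer is implemented; the names `newline` and `sourceLines` are only illustrative, and the category test relies on Data.Char.generalCategory.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Try to consume one newline at the start of the input, following the
-- proposed production. Returns the remaining input on success.
newline :: String -> Maybe String
newline ('\r' : '\n' : rest) = Just rest                 -- return linefeed
newline ('\r' : rest)        = Just rest                 -- return
newline ('\n' : rest)        = Just rest                 -- linefeed
newline ('\f' : rest)        = Just rest                 -- formfeed
newline (c : rest)
  | generalCategory c == LineSeparator      = Just rest  -- uniLine
  | generalCategory c == ParagraphSeparator = Just rest  -- uniPara
newline _ = Nothing

-- Split a source text into lines using the recognizer above.
sourceLines :: String -> [String]
sourceLines = go ""
  where
    go acc "" = [reverse acc]
    go acc s@(c : cs) = case newline s of
      Just rest -> reverse acc : go "" rest
      Nothing   -> go (c : acc) cs
```

With this sketch, `sourceLines "one\r\ntwo\x2028three"` would split at both the return/linefeed pair and the LINE SEPARATOR. The second variant of the proposal would simply compare `c` against '\x2028' and '\x2029' instead of testing the category.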



Manlio Perillo
_______________________________________________
Haskell-Cafe mailing list
http://www.haskell.org/mailman/listinfo/haskell-cafe