[GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |             Owner:
            Type:  bug               |            Status:  new
        Priority:  normal            |         Milestone:
       Component:  libraries/base    |           Version:  7.10.1
        Keywords:  unicode           |  Operating System:  Unknown/Multiple
    Architecture:  Unknown/Multiple  |   Type of failure:  None/Unknown
       Test Case:                    |        Blocked By:
        Blocking:                    |   Related Tickets:
Differential Revisions:              |
-------------------------------------+-------------------------------------
 {{{#!hs
 > isMark '\768'
 True

 > isAlphaNum '\768'
 True

 > (isAlpha '\768', isNumber '\768')
 (False,False)
 }}}

 This behavior comes from this piece in WCsubst.c:

 {{{
 unipred(u_iswalnum,(GENCAT_LT|GENCAT_LU|GENCAT_LL|GENCAT_LM|GENCAT_LO|
                     GENCAT_MC|GENCAT_ME|GENCAT_MN|
                     GENCAT_NO|GENCAT_ND|GENCAT_NL))
 }}}

 I'm not sure what should be done here. Is it a bug in isAlphaNum? Or in
 isAlpha? How does it correspond to iswalnum's behavior in C++?

 (And if it's a feature and not a bug, then it should definitely be
 documented.)
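
As a reading aid, the category set in the quoted macro can be mirrored in Haskell with `generalCategory`. This is only a sketch: `inQuotedAlnumSet` and `alphaOrNumber` are hypothetical names that are not part of `base`, and the mapping from `GENCAT_*` flags to `GeneralCategory` constructors is assumed from their Unicode abbreviations (LT for Lt, MN for Mn, and so on).

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isAlpha, isNumber)

-- | Hypothetical helper: membership in the category set listed in the
-- quoted u_iswalnum macro, written with Data.Char's GeneralCategory.
inQuotedAlnumSet :: Char -> Bool
inQuotedAlnumSet c =
  generalCategory c `elem`
    [ TitlecaseLetter, UppercaseLetter, LowercaseLetter
    , ModifierLetter, OtherLetter                          -- Lt Lu Ll Lm Lo
    , SpacingCombiningMark, EnclosingMark, NonSpacingMark  -- Mc Me Mn
    , OtherNumber, DecimalNumber, LetterNumber             -- No Nd Nl
    ]

-- | Hypothetical helper: the relationship the report expects isAlphaNum
-- to satisfy.
alphaOrNumber :: Char -> Bool
alphaOrNumber c = isAlpha c || isNumber c
```

Under those assumptions, `'\768'` (U+0300 COMBINING GRAVE ACCENT, category Mn) is in the quoted set but is rejected by `alphaOrNumber`, which is the mismatch shown in the GHCi session above.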

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
ghc-tickets mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-tickets

Re: [GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                   Owner:
            Type:  bug               |                  Status:  new
        Priority:  normal            |               Milestone:
       Component:  libraries/base    |                 Version:  7.10.1
      Resolution:                    |                Keywords:  unicode
Operating System:  Unknown/Multiple  |            Architecture:
 Type of failure:  None/Unknown      |  Unknown/Multiple
      Blocked By:                    |               Test Case:
 Related Tickets:                    |                Blocking:
                                     |  Differential Revisions:
-------------------------------------+-------------------------------------

Comment (by hvr):

 For the record, this was already an issue in GHC 7.8.4, and as far back as
 GHC 7.0.4:

 {{{
 GHCi, version 7.0.4: http://www.haskell.org/ghc/  :? for help
 λ> import Data.Char
 λ> length $ filter isMark  $ filter (\c -> isAlphaNum c /= (isAlpha c &&
 isNumber c)) ['\0'..]
 1281
 }}}

 {{{
 GHCi, version 7.8.4: http://www.haskell.org/ghc/  :? for help
 λ> import Data.Char
 λ> length $ filter isMark  $ filter (\c -> isAlphaNum c /= (isAlpha c &&
 isNumber c)) ['\0'..]
 1498
 }}}

 {{{
 GHCi, version 7.10.1.20150511: http://www.haskell.org/ghc/  :? for help
 λ> import Data.Char
 λ> length $ filter isMark  $ filter (\c -> isAlphaNum c /= (isAlpha c &&
 isNumber c)) ['\0'..]
 1830
 }}}
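
The same check can be written as a small definition to load into GHCi. This is a sketch rather than the exact expression from the sessions above: it compares against `isAlpha c || isNumber c`, which is the relationship the ticket is about, where the quoted sessions use `&&`. For mark characters neither `isAlpha` nor `isNumber` holds, so both forms should give the same count. `inconsistentMarks` is a hypothetical name.

```haskell
import Data.Char (isAlpha, isAlphaNum, isMark, isNumber)

-- | Hypothetical helper: every mark character on which isAlphaNum
-- disagrees with (isAlpha || isNumber). The result depends on the
-- Unicode tables shipped with the version of base in use.
inconsistentMarks :: [Char]
inconsistentMarks =
  [ c
  | c <- [minBound .. maxBound]
  , isMark c
  , isAlphaNum c /= (isAlpha c || isNumber c)
  ]
```

Evaluating `length inconsistentMarks` should reproduce the kind of count shown above on an affected GHC. No expected value is given here because it varies with the Unicode version.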

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:1>

Re: [GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:                    |             Keywords:  unicode
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by lelf):

 * cc: lelf (added)


--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:2>

Re: [GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:                    |             Keywords:  unicode,
                                     |  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by bgamari):

 * keywords:  unicode => unicode, newcomer


--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:3>

Re: [GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:                    |             Keywords:  unicode,
                                     |  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by sighingnow):

 `GENCAT_MC|GENCAT_ME|GENCAT_MN` has been included in `u_iswalnum` for
 more than 10 years. However, the documentation of `isAlphaNum` says
 "Selects alphabetic or numeric digit Unicode characters" and doesn't
 mention the "mark" characters.

 Should we fix the documentation of `isAlphaNum` to include "mark"
 characters or keep the documentation as it is and fix `u_iswalnum`?

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:4>

Re: [GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:                    |             Keywords:  unicode,
                                     |  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by Azel):

 From what I can see in various C and C++ documentation (e.g.
 [https://docs.microsoft.com/en-gb/cpp/c-runtime-library/reference/isalnum-iswalnum-isalnum-l-iswalnum-l Microsoft's],
 [https://www.gnu.org/software/libc/manual/html_node/Classification-of-Wide-Characters.html#Classification-of-Wide-Characters glibc's] or
 [http://en.cppreference.com/w/cpp/string/wide/iswalnum cppreference.com's],
 which refers us [http://www.open-std.org/JTC1/SC35/WG5/docs/30112d10.pdf
 here]), `iswalnum` should return true if either `iswalpha` or `iswdigit`
 does, so I guess `isAlphaNum` ought to do the same. That is, keep the
 documentation as it is and fix `u_iswalnum`.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:5>

Re: [GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:                    |             Keywords:  unicode,
                                     |  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by Azel):

 Looking a bit farther afield, every language I have looked at that has an
 `isAlphaNum` equivalent defines it as returning `True` if either its
 `isAlpha` or its `isNumber` equivalent does (e.g.
 [https://docs.oracle.com/javase/9/docs/api/java/lang/Character.html#isLetterOrDigit-int- Java's],
 [http://msdn.microsoft.com/en-gb/library/cay4xx2f(v=vs.110).aspx the .NET Framework's],
 [http://www.lispworks.com/documentation/HyperSpec/Body/13_ade.htm Common
 Lisp's], [https://docs.python.org/3/library/stdtypes.html#str.isalnum
 Python's] (with the peculiarity that Python's documentation lists three
 number-matching functions in `isalnum`'s description, although the first
 two are subsumed by the third) or
 [http://www.ada-auth.org/standards/12rm/html/RM-A-3-5.html Ada's]). So I'm
 willing to have a go at solving this ticket, and I would favour fixing
 `u_iswalnum` and keeping the doc mostly as it is. The doc states that
 `isAlphaNum` selects alphabetic or numeric digit Unicode characters, but
 even with the mark characters removed it would match more than that,
 because it also matches `GENCAT_NO` and `GENCAT_NL`.
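
The point about `GENCAT_NO` and `GENCAT_NL` can be illustrated with `generalCategory`. A minimal sketch, assuming those flags correspond to the `OtherNumber` and `LetterNumber` constructors; `numberKind` is a hypothetical helper, not a `base` function.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isNumber)

-- | Hypothetical helper: for a character accepted by isNumber, report which
-- numeric general category it belongs to; Nothing for everything else.
numberKind :: Char -> Maybe GeneralCategory
numberKind c
  | isNumber c = Just (generalCategory c)
  | otherwise  = Nothing
```

For instance, `numberKind '\x2160'` (ROMAN NUMERAL ONE) is `Just LetterNumber` and `numberKind '\xBD'` (VULGAR FRACTION ONE HALF) is `Just OtherNumber`. `isNumber` accepts both, although neither is a digit in the sense of `isDigit`.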

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:6>

Re: [GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  Azel
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:                    |             Keywords:  unicode,
                                     |  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by Azel):

 * owner:  (none) => Azel


--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:7>

Re: [GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  Azel
            Type:  bug               |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:                    |             Keywords:  unicode,
                                     |  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):  Phab:D4593
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by Azel):

 * status:  new => patch
 * differential:   => Phab:D4593


--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:8>

Re: [GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  Azel
            Type:  bug               |               Status:  patch
        Priority:  normal            |            Milestone:
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:                    |             Keywords:  unicode,
                                     |  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):  Phab:D4593
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by Ben Gamari <ben@…>):

 In [changeset:"a26983a3aef25b3fa5f66b4d68ea7240a6dd1543/ghc"
 a26983a3/ghc]:
 {{{
 #!CommitTicketReference repository="ghc"
 revision="a26983a3aef25b3fa5f66b4d68ea7240a6dd1543"
 Fixes isAlphaNum re. isAlpha/isNumber and doc fix (trac issue #10412)

 Corrects the inconsistency between Data.Char.isAlphaNum,
 Data.Char.isAlpha and Data.Char.isNumber. Indeed, isAlphaNum was
 returning True not only when isAlpha or isNumber returned True but
 also when isMark did. The selectors for the Mn, Mc and Me general
 categories were removed from the macro generating u_iswalnum in
 ubconfc.

 Also, Data.Char.isAlphaNum's documentation was changed to state that
 isAlphaNum returns true not only for Unicode number digits but for
 Unicode numbers in general in Unicode.hs.

 Signed-off-by: ARJANEN Loïc Jean David <[hidden email]>

 Reviewers: hvr, ekmett, lelf, bgamari

 Reviewed By: bgamari

 Subscribers: thomie, carter

 GHC Trac Issues: #10412

 Differential Revision: https://phabricator.haskell.org/D4593
 }}}
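
One way to observe the effect of this change on a given GHC is to list which general categories a predicate accepts. This is a sketch only: `acceptedCategories`, `documentedAlnum` and `currentAlnumCategories` are hypothetical helpers, not part of `base` and not the test added for this ticket.

```haskell
import Data.Char
  (GeneralCategory (..), generalCategory, isAlpha, isAlphaNum, isNumber)

-- | Hypothetical helper: the general categories that contain at least one
-- character accepted by the predicate, in constructor order.
acceptedCategories :: (Char -> Bool) -> [GeneralCategory]
acceptedCategories p =
  [ cat | cat <- [minBound .. maxBound], cat `elem` seen ]
  where
    -- Categories of all accepted characters (with repeats).
    seen = map generalCategory (filter p [minBound .. maxBound])

-- | The behaviour described in the commit message: alphabetic or numeric.
documentedAlnum :: Char -> Bool
documentedAlnum c = isAlpha c || isNumber c

-- | What isAlphaNum accepts on the GHC this is run with.
currentAlnumCategories :: [GeneralCategory]
currentAlnumCategories = acceptedCategories isAlphaNum
```

If the commit message is accurate, `currentAlnumCategories` and `acceptedCategories documentedAlnum` should differ only on versions of `base` that predate this change, and the difference should be the three mark categories.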

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:9>

Re: [GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  Azel
            Type:  bug               |               Status:  closed
        Priority:  normal            |            Milestone:  8.6.1
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:  fixed             |             Keywords:  unicode,
                                     |  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):  Phab:D4593
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by bgamari):

 * status:  patch => closed
 * resolution:   => fixed
 * milestone:   => 8.6.1


--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:10>

Re: [GHC] #10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do

GHC - devs mailing list
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  Azel
            Type:  bug               |               Status:  closed
        Priority:  normal            |            Milestone:  8.6.1
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:  fixed             |             Keywords:  unicode,
                                     |  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):  Phab:D4593
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by Ben Gamari <ben@…>):

 In [changeset:"da7438526e4bfb1821aa77a22ff66a4a80baf8c6/ghc" da74385/ghc]:
 {{{
 #!CommitTicketReference repository="ghc"
 revision="da7438526e4bfb1821aa77a22ff66a4a80baf8c6"
 base: Add a test for T10412

 Expects the current behavior, will be updated by D4593 to reflect
 desired behavior.

 Reviewers: hvr

 Subscribers: thomie, carter

 GHC Trac Issues: #10412

 Differential Revision: https://phabricator.haskell.org/D4610
 }}}

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10412#comment:11>