Unicode Haskell source -- Yippie!

Re: Unicode Haskell source -- Yippie!

David Fox-12
On Thu, Apr 24, 2014 at 10:27 AM, Kyle Murphy <[hidden email]> wrote:
It's an interesting feature, and nice if you want that sort of thing, but not something I'd personally want to see as the default. Deviating from the standard ASCII set of characters is just too much of a hurdle to usability of the language.

On the other hand, maybe if it's good enough for the entire field of Mathematics since forever, there might be some benefit in it for us.


Re: Unicode Haskell source -- Yippie!

Bardur Arantsson-2
On 2014-04-26 17:58, David Fox wrote:

> On Thu, Apr 24, 2014 at 10:27 AM, Kyle Murphy <[hidden email]> wrote:
>
>> It's an interesting feature, and nice if you want that sort of thing, but
>> not something I'd personally want to see as the default. Deviating from the
>> standard ASCII set of characters is just too much of a hurdle to usability
>> of the language.
>>
>
> On the other hand, maybe if its good enough for the entire field of
> Mathematics since forever there might be some benefit in it for us.
>

Typing into a computer != handwriting (they differ in various significant ways).
Most mathematical notation predates computers and typewriters. Just
compare writing a formula by hand with typing the same formula in (La)TeX.

Regards,


Re: Unicode Haskell source -- Yippie!

Carter Schonwald
In reply to this post by David Fox-12
the vast majority of math is written using LaTeX, which, while supporting Unicode, is mostly ASCII :)


On Sat, Apr 26, 2014 at 11:58 AM, David Fox <[hidden email]> wrote:
On Thu, Apr 24, 2014 at 10:27 AM, Kyle Murphy <[hidden email]> wrote:
It's an interesting feature, and nice if you want that sort of thing, but not something I'd personally want to see as the default. Deviating from the standard ASCII set of characters is just too much of a hurdle to usability of the language.

On the other hand, maybe if its good enough for the entire field of Mathematics since forever there might be some benefit in it for us. 


Re: Unicode Haskell source -- Yippie!

Ben Franksen
In reply to this post by Nickolay Kudasov
Nickolay Kudasov wrote:
>> eg I would like to see \ spelled as λ
>
> ​I have symbol substitution enabled in Vim. E.g. when I write \ (and it is
> syntactically lambda) I get λ. The same way composition (.) is replaced
> with ∘. The same trick can be enabled for other operators as well. So I
> have normal text and nice presentation in *my* text editor: it does not
> bother anyone but me.

I think this is the right approach. See also https://github.com/i-tu/Hasklig/
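(As an aside, and independently of editor rendering, GHC's UnicodeSyntax extension already accepts Unicode variants of a handful of reserved symbols in actual source; a minimal sketch, assuming a reasonably recent GHC:)

    {-# LANGUAGE UnicodeSyntax #-}

    -- With UnicodeSyntax, ∷, ⇒, → and ← may be written for ::, =>, -> and <-.
    compose ∷ (b → c) → (a → b) → a → c
    compose f g x = f (g x)

    pairs ∷ Eq a ⇒ [a] → [(a, a)]
    pairs xs = [ (x, y) | x ← xs, y ← xs, x /= y ]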

The main problem with special Unicode characters, as I see it, is that it is
no longer possible to distinguish characters unambiguously just by looking
at them. Apart from questions of maintainability, this is also a potential
security problem: it enables an attacker to slip in malicious code simply by
importing a module whose name looks like a well known safe module. In a big
and complex piece of software, such an attack might not be spotted for some
time.
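(A purely hypothetical illustration of the kind of lookalike import meant here; the module names are invented:)

    -- The second line spells "Crypto" with CYRILLIC CAPITAL LETTER ES
    -- (U+0421) in place of the Latin C; in most fonts the two imports
    -- are indistinguishable on screen.
    import Data.Crypto.Hash   -- the well-known module you meant
    import Data.Сrypto.Hash   -- a lookalike an attacker could publish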

Cheers
Ben
--
"Make it so they have to reboot after every typo." -- Scott Adams



Re: Unicode Haskell source -- Yippie!

Rustom Mody
In reply to this post by David Fox-12
On Sat, Apr 26, 2014 at 9:28 PM, David Fox <[hidden email]> wrote:
On Thu, Apr 24, 2014 at 10:27 AM, Kyle Murphy <[hidden email]> wrote:
It's an interesting feature, and nice if you want that sort of thing, but not something I'd personally want to see as the default. Deviating from the standard ASCII set of characters is just too much of a hurdle to usability of the language.

On the other hand, maybe if its good enough for the entire field of Mathematics since forever there might be some benefit in it for us. 

Chris spoke of his choice of Idris over Agda as being related to its not going overboard with Unicode. The FAQ he linked to has this to say:

| And I'm sure that in a few years time things will be different and software will
| cope better and it will make sense to revisit this. For now, however, I would
| prefer not to allow arbitrary unicode symbols in operators.


1. I'd like to underscore the 'arbitrary'.  Why is ASCII any less arbitrary -- apart from an increasingly irrelevant historical accident -- than Arabic, Bengali, Cyrillic, Deseret? [Hint: What's the A in ASCII?]  By contrast math may at least have some pretensions to universality?

2. Maybe it's a good time now to 'revisit'?  Otherwise, like clunky QWERTY, it may happen that when the technological justifications for an inefficient choice are long gone, social inertia will prevent any useful change.


On Sun, Apr 27, 2014 at 3:00 PM, Ben Franksen <[hidden email]> wrote:
The main problem with special Unicode characters, as I see it, is that it is
no longer possible to distinguish characters unambiguously just by looking
at them. Apart from questions of maintainability, this is also a potential
security problem: it enables an attacker to slip in malicious code simply by
importing a module whose name looks like a well known safe module. In a big
and complex piece of software, such an attack might not be spotted for some
time.

Bang on!

However, the Pandora's box is already open and the creepy-crawlies are all over us.

Witness:

GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> let а = 1
Prelude> a
<interactive>:11:1: Not in scope: `a'
Prelude>


In case you can't see it, the two a's are different Unicode characters:
CYRILLIC SMALL LETTER A
vs
LATIN SMALL LETTER A

Regards
Rusi



Re: Unicode Haskell source -- Yippie!

Bardur Arantsson-2
On 2014-04-27 13:45, Rustom Mody wrote:
>
> 1. I'd like to underscore the 'arbitrary'.  Why is ASCII any less arbitrary
> -- apart from an increasingly irrelevant historical accident -- than
> Arabic, Bengali, Cyrillic, Deseret? [Hint: Whats the A in ASCII?]  By
> contrast math may at least have some pretensions to universality?

The symbols in math are also mostly arbitrary. In effect they should be
considered as "parallel" to the Cyrillic, Latin or Greek alphabets. (Of
course math borrows quite a few symbols from the latter, but I digress.)

>
> 2. Maybe its a good time now to 'revisit'?  Otherwise like klunky-qwerty,
> it may happen that when the technological justifications for an inefficient
> choice are long gone, social inertia will prevent any useful change.
>

Billions of people have QWERTY keyboards. Unless you come up with
something *radically* better, they're not going to change. Inertia
has made anything but incremental change impossible. (I note that
Microsoft actually managed to change the QWERTY keyboard incrementally a
decade or two ago by adding the Windows and Context Menu keys. Of course
that didn't remove or change any of the existing functionality of the
basic QWERTY layout, so it was a relatively small change.)

Using "macros" like "\" (for lambda) or "\sum_{i=0}^{n} i" and having
the editor/IDE display that differently is at least semi-practical for
typing stuff into your computer using QWERTY.

Regards,


Re: Unicode Haskell source -- Yippie!

Brandon Allbery
In reply to this post by Rustom Mody
On Sun, Apr 27, 2014 at 7:45 AM, Rustom Mody <[hidden email]> wrote:
1. I'd like to underscore the 'arbitrary'.  Why is ASCII any less arbitrary -- apart from an increasingly irrelevant historical accident -- than Arabic, Bengali, Cyrillic, Deseret? [Hint: Whats the A in ASCII?]  By contrast math may at least have some pretensions to universality?

Math notations are not as universal as many would like to think, sadly.

And I am not sure the historical accident is really irrelevant; as the same "accident" was involved in most of the computer languages and protocols we use daily, I would not be at all surprised to find that there are subtle dependencies buried in the whole mess --- similar to how (most... sigh) humans pick up language and culture signals as children too young to apply any kind of critical analysis to it, and can have real problems trying to eradicate or modify them later. (Yes, languages can be fixed. But how many tools do you use when working with them? It's almost certainly more than the ones that immediately come to mind or are listed on e.g. Hackage. In particular, that ligature may be great in your editor and unfortunate when you pop a terminal and grep for it --- especially if you start extending this to other languages so you need a different set of ligatures [a different font!] for each language....)

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net


Re: Unicode Haskell source -- Yippie!

Rustom Mody

On Sun, Apr 27, 2014 at 9:02 PM, Brandon Allbery <[hidden email]> wrote:
On Sun, Apr 27, 2014 at 7:45 AM, Rustom Mody <[hidden email]> wrote:
1. I'd like to underscore the 'arbitrary'.  Why is ASCII any less arbitrary -- apart from an increasingly irrelevant historical accident -- than Arabic, Bengali, Cyrillic, Deseret? [Hint: Whats the A in ASCII?]  By contrast math may at least have some pretensions to universality?

Math notations are not as universal as many would like to think, sadly.

And I am not sure the historical accident is really irrelevant; as the same "accident" was involved in most of the computer languages and protocols we use daily, I would not be at all surprised to find that there are subtle dependencies buried in the whole mess --- similar to how (most... sigh) humans pick up language and culture signals as children too young to apply any kind of critical analysis to it, and can have real problems trying to eradicate or modify them later. (Yes, languages can be fixed. But how many tools do you use when working with them? It's almost certainly more than the ones that immediately come to mind or are listed on e.g. Hackage. In particular, that ligature may be great in your editor and unfortunate when you pop a terminal and grep for it --- especially if you start extending this to other languages so you need a different set of ligatures [a different font!] for each language....)


Nice point!

And as I said above, that Pandora's box is already wide open for current Haskell
[and Python and probably most modern languages]. Can we reverse it?


Witness:

----------------------
$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> let (fine,fine) = (1,2)
Prelude> (fine,fine)
(1,2)
Prelude>

---------------------

If you had the choice would you allow that f-i ligature to be thus confusable with the more normal fi?  I probably wouldn't but nobody is asking us and the water that's flowed under the bridge cannot be 'flowed' backwards (to the best of my knowledge!)

In case that seems far-fetched consider the scenario:
1. Somebody loads (maybe innocently) the code involving variables like 'fine'
into a 'ligature-happy 'IDE/editor'
2. The editor quietly changes all the fine to fine.
3. Since all those variables are in local scope nothing untoward is noticed
4. Until someone loads it into an 'old-fashioned' editor... and then...

Would you like to be on the receiving end of such 'fun'?

IOW the choice
"Ascii is the universal bedrock of computers -- best to stick with it"
vs
"Ascii is arbitrary and parochial and we SHOULD move on"

is not a choice at all. We (ie OSes, editors, languages) have all already moved on. And moved on in a particularly ill-considered way.

For example, there used to be the minor nuisance that Linux filesystems were typically case-sensitive and Windows case-insensitive.

Now with zillions of new confusables like the Latin vs Cyrillic а vs a -- well we have quite a mess!

Embracing math in a well-considered and systematic way does not increase the mess; it can even reduce it.

My 2 (truly American) ¢
Rusi

PS Someone spoke of APL and someone else said Agda/Idris may be more relevant.  I wonder how many of the younger generation have heard of squiggol?


Re: Unicode Haskell source -- Yippie!

Ian Tuomi
On 27 Apr 2014, at 19:58, Rustom Mody wrote:

> If you had the choice would you allow that f-i ligature to be thus
> confusable with the more normal fi?  I probably wouldn't but nobody is
> asking us and the water that's flowed under the bridge cannot be
> 'flowed'
> backwards (to the best of my knowledge!)
>
> In case that seems far-fetched consider the scenario:
> 1. Somebody loads (maybe innocently) the code involving variables like
> 'fine'
> into a 'ligature-happy 'IDE/editor'
> 2. The editor quietly changes all the fine to fine.
> 3. Since all those variables are in local scope nothing untoward is
> noticed
> 4. Until someone loads it into an 'old-fashioned' editor... and
> then...

I develop Hasklig, and have enjoyed the discussion about the pros and
cons of ligatures in coding fonts. However, I really must protest this
line of reasoning since it is based on false premises.

As an opentype feature, ligatures have nothing to do with the 'fi' and
'fl' unicode points, (which are legacy only, and heavily discouraged by
the unicode consortium), or with unicode at all. The encoding of the
file could be pure ASCII for all the ligatures care. The font used
changes how the text looks, and nothing else.

When speaking of special unicode symbols in code, I agree with most
objections raised against them :)

br,
Ian

P.S. Sorry for potential repost - I'm getting automatic rejects

Re: Unicode Haskell source -- Yippie!

Richard A. O'Keefe
In reply to this post by Rustom Mody

On 25/04/2014, at 5:15 AM, Rustom Mody wrote:
> x ÷ y   = divMod x y

This one looks wrong to me.
In common usage, ÷ indicates plain old division,
e.g., 3÷2 = 1½.
See for example http://en.wikipedia.org/wiki/Table_of_mathematical_symbols

One possibility would be

> x ÷ y = x / y :: Rational
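(A sketch of how that suggestion could be made concrete in Haskell; the signature chosen here is an assumption, one of several possible:)

    infixl 7 ÷
    (÷) :: Integer -> Integer -> Rational
    x ÷ y = fromInteger x / fromInteger y    -- 3 ÷ 2 evaluates to 3 % 2, i.e. 1½

    -- versus the divMod reading from the original list:
    -- x ÷ y = divMod x y                    -- 3 ÷ 2 would be (1, 1)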



Re: Unicode Haskell source -- Yippie!

Conrad Parker
In reply to this post by Ian Tuomi

On 28 April 2014 06:57, Ian Tuomi <[hidden email]> wrote:
On 27 Apr 2014, at 19:58, Rustom Mody wrote:
If you had the choice would you allow that f-i ligature to be thus
confusable with the more normal fi?  I probably wouldn't but nobody is
asking us and the water that's flowed under the bridge cannot be 'flowed'
backwards (to the best of my knowledge!)

In case that seems far-fetched consider the scenario:
1. Somebody loads (maybe innocently) the code involving variables like
'fine'
into a 'ligature-happy 'IDE/editor'
2. The editor quietly changes all the fine to fine.
3. Since all those variables are in local scope nothing untoward is noticed
4. Until someone loads it into an 'old-fashioned' editor... and then...

I develop Hasklig, and have enjoyed the discussion about the pros and cons of ligatures in coding fonts. However, I really must protest this line of reasoning since it is based on false premises.

As an opentype feature, ligatures have nothing to do with the 'fi' and 'fl' unicode points, (which are legacy only, and heavily discouraged by the unicode consortium), or with unicode at all. The encoding of the file could be pure ASCII for all the ligatures care. The font used changes how the text looks, and nothing else.

When speaking of special unicode symbols in code, I agree with most objections raised against them :)

Ian,

thanks for Hasklig. My first thought when I saw it was that hopefully it would assuage the annoying promoters of Unicode overreach.

Conrad. 


Re: Unicode Haskell source -- Yippie!

Richard A. O'Keefe
In reply to this post by Rustom Mody

On 26/04/2014, at 1:30 AM, Rustom Mody wrote:
> On Fri, Apr 25, 2014 at 6:32 PM, Chris Warburton <[hidden email]> wrote:
> Rustom Mody <[hidden email]> writes:
>
> > As for APL, it failed for various reasons eg
> > - mixing up assembly language (straight line code with gotos) with
> > functional idioms
> > - the character set was a major hurdle in the 60s. Thats not an issue today
> > when most OSes/editors are unicode compliant

I strongly suspect that the failure of APL had very little
to do with the character set.  When APL was introduced, the
character set was just a matter of dropping in a different
golf-ball.  Later, it was just bits on a screen.  Heck in
1984 I was using C and LaTeX on an IBM mainframe where the
terminals displayed curly braces as spaces, and oddly enough
that didn't kill C...  In any case, it was possible to enter
any arbitrary APL text using straight ASCII, so that was no
great problem.

There were a number of much more serious issues with APL.

(1) In "classic" APL everything is an n-dimensional array,
either an array of characters or an array of (complex) numbers.
An absolutely regular array.  Want to process a collection of
records where some of the fields are strings?  No can do.
Want to process a collection of strings of different length?
No can do: you must use a 2-dimensional array, padding all
the strings to the same length.  Want type checking?
Hysterical laughter.

APL2 "fixed" this by introducing nested arrays.  This is
powerful, but occasionally clumsy.  And it is positional,
not named.  You *can* represent trees, you can represent
records with mixed fields, you can do all sorts of stuff.
But it's positional, not named.

(2) There aren't _that_ many APL symbols, and it didn't take
too long to learn them, and once you did, they weren't that
hard to remember.  (Although the use of the horseshoe
symbols in APL2 strikes me as *ab*use.) Problem is, a whole
lot of other things were done with numbers.  Here are the
trig functions:
        0 ◦ x    sqrt(1-x**2)
        1 ◦ x    sin x
       ¯1 ◦ x    arcsin x
        2 ◦ x    cos x
       ¯2 ◦ x    arccos x
        3 ◦ x    tan x
       ¯3 ◦ x    arctan x
        4 ◦ x    sqrt(x**2+1)
       ¯4 ◦ x    sqrt(x**2-1)
        5 ◦ x    sinh x
       ¯5 ◦ x    arcsinh x
        6 ◦ x    cosh x
       ¯6 ◦ x    arccosh x
        7 ◦ x    tanh x
       ¯7 ◦ x    arctanh x
Who thought _that_ was a good idea?
Well, presumably it was the same person who introduced the
"I-beam functions".  A range of system functions (time of
day, cpu time used, space available, ...) were distinguished
by *numbers*.

(3) Which brings me to the dialect problem.  No two systems
had the *same* set of I-beam functions.  You couldn't even
rely on two systems having the same *kind* of approach to
files.  There were several commercial APL systems, and they
weren't priced for the hobbyist or student.



Re: Unicode Haskell source -- Yippie!

Richard A. O'Keefe
In reply to this post by Ben Franksen

On 27/04/2014, at 9:30 PM, Ben Franksen wrote:
> The main problem with special Unicode characters, as I see it, is that it is
> no longer possible to distinguish characters unambiguously just by looking
> at them.

"No longer"?
Hands up all the people old enough to have used "coding forms".

Yes, children, there was a time when programmers wrote their
programs on printed paper forms (sort of like A4 tipped sideways)
so that the keypunch girls (not my sexism, historical accuracy)
knew exactly which column each character went in.  And at the
top of each sheet was a row of boxes for you to show how you wrote
2 Z 7
1 I !
0 O
and the like.

For that matter, I recall a PhD thesis from the 80s in which the
author spent a page grumbling about the difficulty of telling
commas and semicolons apart...

> Apart from questions of maintainability, this is also a potential
> security problem: it enables an attacker to slip in malicious code simply by
> importing a module whose name looks like a well known safe module. In a big
> and complex piece of software, such an attack might not be spotted for some
> time.

Again, considering the possibilities of "1" "i" "l", I don't
think we actually have a new problem here.

Presumably this can be addressed by tools:
"here is are some modules, tell me what exactly they depend on"
not entirely unlike ldd(1).

Of course, the gotofail bug shows that it's not enough to _have_
tools like that, you have to use them and review the results
periodically.




Re: Unicode Haskell source -- Yippie!

Rustom Mody
In reply to this post by Richard A. O'Keefe



On Mon, Apr 28, 2014 at 6:46 AM, Richard A. O'Keefe <[hidden email]> wrote:

On 25/04/2014, at 5:15 AM, Rustom Mody wrote:
> x ÷ y   = divMod x y

This one looks wrong to me.
In common usage, ÷ indicates plain old division,
e.g., 3÷2 = 1½.
See for example http://en.wikipedia.org/wiki/Table_of_mathematical_symbols

One possibility would be

> x ÷ y = x / y :: Rational


Thanks, Richard, for (as usual!) looking at that list with a fine-toothed comb.

I started by writing a corresponding list for Python:
http://blog.languager.org/2014/04/unicoded-python.html

As you will see, I mention there that mapping ÷ to divMod is one possibility, but hardly the only one.

That list is mostly about math, not imperative features, and so carries over from Python to Haskell mostly unchanged.

Please (if you have 5 minutes) glance at it and give me your comments. I may then finish a similar one for Haskell.

Thanks
Rusi



--
http://www.the-magus.in
http://blog.languager.org



Re: Unicode Haskell source -- Yippie!

Richard A. O'Keefe
Before speaking of "APL's mistakes", one should be
clear about what exactly those mistakes *were*.
I should point out that the symbols of APL, as such,
were not a problem.  But the *number* of such symbols
was.  In order to avoid questions about operator
precedence, APL *hasn't* any.  In the same way,
Smalltalk has an extensible set of 'binary selectors'.
If you see an expression like

        a ÷> b ~@ c

which operator dominates which?  Smalltalk adopted the
same solution as APL:  no operator precedence.

Before Pascal, there was something approaching a
consensus in programming languages that
        ** tightest
        *,/,div,mod
        unary and binary +,-
        relational operators
        not
        and
        or
In order to make life easier with user-defined
operators, Algol 68 broke this by making unary
operators (including not and others you haven't
heard of like 'down' and 'upb') bind tightest.
As it turned out, this may have made life
easier for the compiler, but not for people.
In order, allegedly, to make life easier for
students, Pascal broke this by putting 'or'
and 'and' at the same level as '+' and '*'.
To this day, many years after Pascal vanished
(Think Pascal is dead, MrP is dead, MPW Pascal
is dead, IBM mainframe Pascal died so long ago
it doesn't smell any more, Sun Pascal is dead, ...)
a couple of generations of programmers believe
that you have to write
        (x > 0) && (x < n)
in C, because of what their Pascal-trained predecessors
taught them.

If we turn to Unicode, how should we read

        a ⊞ b ⟐ c

Maybe someone has a principled way to tell.  I don't.
And then we have to ask about a ⊞⟐ b ⟐⊞ c.

This is NOT a new problem.
Haskell already has way too many operators floating
around for me to remember their relative precedence,
and I have to follow a rule "when an expression
contains two operators from different 'semantic fields',
use parentheses."  Don't ask me to explain that!

Unicode does make the problem rather more pressing.
Instead of agonising over the difference between
< << <<< <<<< and the like, now we can agonise over
the difference between a couple of dozen variously
decorated and accompanied versions of the subset sign
as single characters.

Did you know that there is a single ⩵ character?
Distinct from ==?

I firmly believe that *careful* introduction of
mathematical symbols can be good, but that it needs
rather more care for consistency and readability than
Haskell operators have had so far.

I think wide consideration is necessary lest we end
up with things like x ÷ y, where x and y are numbers,
not giving a number.

>
> I started with writing a corresponding list for python:
> http://blog.languager.org/2014/04/unicoded-python.html

The "Math Space Advantage" there can be summarised as:
 "if you use Unicode symbols for operators you can
  omit even more spaces than you already do, wow!"

Never mind APL.  What about SETL?
For years I yearned to get my hands on SETL so that
I could write
        (∀x∈s)(∃y∈s)f(x, y)
The idea of using *different* symbols for testing and
binding (2.2, "Dis") strikes me as "Dis" indeed.  I want
to use the same character in both places because they
mean the same thing.  It's the ∀ and ∃ that mean "bind".
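(For comparison, the same quantified formula in plain Haskell, where all and any do the binding; the helper name is ours:)

    -- (∀x∈s)(∃y∈s) f(x,y), with a list standing in for the set s:
    forallExists :: (a -> a -> Bool) -> [a] -> Bool
    forallExists f s = all (\x -> any (f x) s) s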

The name space burden reduction argument won't fly either.
Go back and look at
http://en.wikipedia.org/wiki/Table_of_mathematical_symbols

≤  less than or equal to in a partial order
   is a subgroup of
   can be reduced to
×  multiplication  
   Cartesian product
   cross product
   (as superscript) group of units

In mathematics, the same meaning may be represented by
several different symbols.  And the same symbol may be
used for several different meanings.

(If Haskell allowed prefix and superscript operators,
think of the fun we could have keeping track of
the Hodge dual *v and the ordinary dual v*.)

Representing π as π seems like a clear win.
But do we want to denote c, e, G, α, γ and other constants
with familiar 1-character names by those characters?
What if someone is writing Haskell in Greek?
(Are you reading this, Kostis?)
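(On the first point: π is indeed already usable as an identifier, since any Unicode lowercase letter can start a Haskell name, so GHC accepts:)

    π :: Floating a => a
    π = pi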

I STRONGLY disagree that x÷y should violate the norms of
school by returning something other than a number.
When it comes to returning a quotient and remainder,
Haskell has two ways to do this and Common Lisp has four.
I don't know how many Python has, but in a situation of
such ambiguity, it would be disastrous NOT to use words
to make it clear which is meant.
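(Haskell's two ways are divMod and quotRem; a quick illustration of why spelling out which one you mean matters:)

    -- divMod rounds toward negative infinity, quotRem truncates toward zero:
    quotVsDiv :: ((Int, Int), (Int, Int))
    quotVsDiv = ( (-7) `divMod`  2      -- (-4,  1)
                , (-7) `quotRem` 2 )    -- (-3, -1)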

I find the use of double up arrow for exponentiation odd.
Back in the days of BASIC on a model 33 Teletype, one
used the single up arrow for that purpose.

As for floor and ceiling, it would be truer to mathematical
notation to use
        ⌊x⌋
for floor.  (I note that in Arial Unicode as this appears
on my screen these characters look horrible.  They should
have the same vertical extent as the square brackets they
are derived from.  Cambria Math and Lucida Sans are OK.)

The claim that "dicts are more fundamental to programming
than sets" appears to be falsified by SETL, in which
dicts were just a special case of sets.  (For that matter,
so they were in Smalltalk-80.)

For existing computational notations with rich sets of
mathematical symbols look at Z and B.  (B as in the B
method, not as in the ancestor of C.)

The claim that mathematical expressions cannot be written
in Lisp or COBOL is clearly false.  See Interlisp, which
allowed infix operators.  COBOL uses "-" for subtraction;
it just needs spaces around it, which is a Good Thing.
Using the centre dot as a word separator would have more
merit if it weren't so useful as an operator.

The reference to APL has switched the operands of take
and drop.  It should be
        number_to_keep ↑ vector
        number_to_lose ↓ vector

Re: Unicode Haskell source -- Yippie!

Rustom Mody
Hi Richard
Thanks for a vigorous and rigorous appraisal of my blog post:
http://blog.languager.org/2014/04/unicoded-python.html

However, this is a Haskell list, and since my post is not just a discussion about Python but some brainstorming about how Python could change, a detailed discussion of that is probably too off-topic here, don't you think?

So for now let me address just one of your points, which is appropriate for this forum.

I'd be pleased to discuss the other points you raise off list.

Also, while I've learnt a lot from this thread, I also see some confusions and fallacies.
So before drilling down into details and losing the forest for the trees, I'd prefer to start with a broad perspective rather than a narrow technological focus -- more at the end.


On Tue, Apr 29, 2014 at 11:04 AM, Richard A. O'Keefe <[hidden email]> wrote:
Before speaking of "Apl's mistakes", one should be
clear about what exactly those mistakes *were*.
I should point out that the symbols of APL, as such,
were not a problem.  But the *number* of such symbols
was.  In order to avoid questions about operator
precedence, APL *hasn't* any.  In the same way,
Smalltalk has an extensible set of 'binary selectors'.
If you see an expression like

        a ÷> b ~@ c

which operator dominates which?  Smalltalk adopted the
same solution as APL:  no operator precedence.

Before Pascal, there was something approaching a
consensus in programming languages that
        **                      tightest
        *,/,div,mod
        unary and binary +,-
        relational operators
        not
        and
        or
In order to make life easier with user-defined
operators, Algol 68 broke this by making unary
operators (including not and others you haven't
heard of like 'down' and 'upb') bind tightest.
As it turned out, this make have made life
easier for the compiler, but not for people.
In order, allegedly, to make life easier for
students, Pascal broke this by making 'or'
and 'and' at the same level as '+' and '*'.
To this day, many years after Pascal vanished
(Think Pascal is dead, MrP is dead, MPW Pascal
is dead, IBM mainframe Pascal died so long ago
it doesn't smell any more, Sun Pascal is dead, ...)
a couple of generations of programmers believe
that you have to write
        (x > 0) && (x < n)
in C, because of what their Pascal-trained predecessor
taught them.

If we turn to Unicode, how should we read

        a ⊞ b ⟐ c

Maybe someone has a principled way to tell.  I don't.

Without claiming to cover all cases, this is a 'principle':
If we have:
(⊞) :: a -> a -> b
(⟐) :: b -> b -> c

then ⊞'s precedence should be higher than ⟐'s.
This is what makes it natural to have the precedences of (+), (<), (&&) in decreasing order.

This is also why the bitwise operators in C have the wrong precedence:
x & 0xF == 0xF
has only 1 meaningful interpretation; C chooses the other!
The error comes (probably) from treating & as close to the logical operators like &&, whereas in fact it is more akin to arithmetic operators like +.
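(A small Haskell sketch of the principle against fixities that already exist in base; the example functions are invented for illustration:)

    import Data.Bits ((.&.))

    -- Standard fixities: infixl 6 (+), infix 4 (<) and (==), infixr 3 (&&).
    inRange :: Int -> Int -> Bool
    inRange x n = 0 < x && x < n          -- parses as (0 < x) && (x < n)

    -- Data.Bits declares infixl 7 (.&.), well above infix 4 (==), so the
    -- masking idiom that bites C programmers reads the expected way here:
    lowNibbleSet :: Int -> Bool
    lowNibbleSet x = x .&. 0xF == 0xF     -- parses as (x .&. 0xF) == 0xF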

There are of course other principles:
Dijkstra argued vigorously that boolean algebra being completely symmetric in
(∨,True)  (∧, False),  ∧, should have the same precedence.

Evidently not too many people agree with him!

----------------------
To come back to the broader questions.


and I find that the new unicode chars for -<< and >>- are missing.
Ok, a minor doc-bug perhaps?

Poking further into that web-page, I find that it has
charset=ISO-8859-1


Running w3's validator http://validator.w3.org/ on it one gets:
No DOCTYPE found!

What has this got to do with Unicode in Python source?
That depends on how one sees it.

When I studied C (nearly 30 years ago now!) we used gets as a matter of course.
Today we don't.

Are Kernighan and Ritchie wrong in teaching it?
Are today's teachers wrong in proscribing it?

I believe the only reasonable outlook is that truth changes with time: it was OK then; it's not today.

Likewise for a missing DOCTYPE and a charset other than UTF-8.
A random example showing how yesterday's right becomes today's wrong:
http://www.sitepoint.com/forums/showthread.php?660779-Content-type-iso-8859-1-or-utf-8

Unicode vs ASCII in program source is similar (I believe).
My thoughts on this (of a philosophical nature) are:
http://blog.languager.org/2014/04/unicode-and-unix-assumption.html

If we can get the broader agreements (disagreements!) out of the way to start with, we may then look at the details.

Thanks and regards,
Rusi


Re: Unicode Haskell source -- Yippie!

Daniel Fischer
On Wednesday 30 April 2014, 13:51:38, Rustom Mody wrote:
> Without claiming to cover all cases, this is a 'principle'
> If we have:
> (⊞) :: a -> a -> b
> (⟐) :: b -> b -> c
> then ⊞'s precedence should be higher than ⟐.
But what if (⟐) :: b -> b -> a?

> This is what makes it natural to have the precedences of (+) (<) (&&) in
> decreasing order.
>
> This is also why the bitwise operators in C have the wrong precedence:
> x & 0xF == 0xF
> has only 1 meaningful interpretation; C chooses the other!
> The error comes (probably) from treating & as close to the logical
> operators like && whereas in fact it is more kin to arithmetic operators
> like +.

That comes from `&` and `|` being logical operators in B. Quoth Dennis Ritchie
(http://cm.bell-labs.com/who/dmr/chist.html in the section "Neonatal C"):

> to make the conversion less painful, we decided to keep the precedence of
> the & operator the same relative to ==, and merely split the precedence of
> && slightly from &. Today, it seems that it would have been preferable to
> move the relative precedences of & and ==, and thereby simplify a common C
> idiom

Re: Unicode Haskell source -- Yippie!

Rustom Mody
On Wed, Apr 30, 2014 at 2:33 PM, Daniel Fischer <[hidden email]> wrote:
> x & 0xF == 0xF
> has only 1 meaningful interpretation; C chooses the other!
> The error comes (probably) from treating & as close to the logical
> operators like && whereas in fact it is more kin to arithmetic operators
> like +.

That comes from `&` and `|` being logical operators in B. Quoth Dennis Ritchie
(http://cm.bell-labs.com/who/dmr/chist.html in the section "Neonatal C"):

> to make the conversion less painful, we decided to keep the precedence of
> the & operator the same relative to ==, and merely split the precedence of
> && slightly from &. Today, it seems that it would have been preferable to
> move the relative precedences of & and ==, and thereby simplify a common C
> idiom

Nice! I learn a bit of history.
Hope we learn from it!

Viz.: some things which are easy in a state of transition become painful in a (more) steady state.



Re: Unicode Haskell source -- Yippie!

Rustom Mody
In reply to this post by Daniel Fischer
On Wed, Apr 30, 2014 at 2:33 PM, Daniel Fischer <[hidden email]> wrote:
On Wednesday 30 April 2014, 13:51:38, Rustom Mody wrote:
> Without claiming to cover all cases, this is a 'principle'
> If we have:
> (⊞) :: a -> a -> b
> (⟐) :: b -> b -> c
> then ⊞'s precedence should be higher than ⟐.
But what if (⟐) :: b -> b -> a?

Sorry, missed that question tucked away :-)
I did say a (not the) principle, not claiming to cover all cases!

I guess they should be non-associative (i.e. infix without l/r) at the same precedence?



Re: Unicode Haskell source -- Yippie!

Richard A. O'Keefe
In reply to this post by Rustom Mody
I wrote
>> If we turn to Unicode, how should we read
>>
>>         a ⊞ b ⟐ c
>>
>> Maybe someone has a principled way to tell.  I don't.

Rustom Mody wrote:
>
> Without claiming to cover all cases, this is a 'principle'
> If we have:
> (⊞) :: a -> a -> b
> (⟐) :: b -> b -> c
>
> then ⊞'s precedence should be higher than ⟐.

I always have trouble with "higher" and "lower" precedence,
because I've used languages where the operator with the bigger
number binds tighter and languages where the operator with the
bigger number gets to dominate the other.  Both are natural
enough, but with opposite meanings for "higher".

This principle does not explain why * binds tighter than +,
which means we need more than one principle.
It also means that if OP1 :: a -> a -> b and OP2 :: b -> b -> a
then OP1 should be higher than OP2 and OP2 should be higher
than OP1, which is a bit of a puzzler, unless perhaps you are
advocating a vaguely CGOL-ish asymmetric precedence scheme
where the precedence on the left and the precedence on the
right can be different.

For the record, let me stipulate that I had in mind a situation
where OP1, OP2 :: a -> a -> a.  For example, APL uses the floor
and ceiling operators infix to stand for max and min.  This
principle offers us no help in ordering max and min.

Or consider APL again, whence I'll borrow (using ASCII because
this is webmail tonight)
    take, rotate :: Int -> Vector t -> Vector t
Haskell applies operator precedence before it does type
checking, so how would it know to parse
    n `take` m `rotate` v
as (n `take` (m `rotate` v))?
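(A sketch of that in code -- the function names and the chosen fixity are invented; GHC resolves the parse from declared fixities alone, before any type checking:)

    -- Without a fixity declaration, a backticked function defaults to
    -- infixl 9, so  n `takeV` m `rotateV` v  would parse as
    -- (n `takeV` m) `rotateV` v and be rejected by the type checker.
    -- Only an explicit declaration changes the parse:
    infixr 5 `takeV`, `rotateV`

    takeV :: Int -> [a] -> [a]
    takeV = take

    rotateV :: Int -> [a] -> [a]
    rotateV _ [] = []
    rotateV k xs = drop r xs ++ take r xs  where r = k `mod` length xs

    example :: [Int]
    example = 2 `takeV` 1 `rotateV` [1,2,3,4]   -- == [2,3]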

I don't believe there was anything in my original example to
suggest that either operator had two operands of the same type,
so I must conclude that this principle fails to provide any
guidance in that case (like this one).


> This is what makes it natural to have the precedences of (+) (<) (&&) in
> decreasing order.
>
> This is also why the bitwise operators in C have the wrong precedence:

Oh, I agree with that!

> The error comes (probably) from treating & as close to the logical
> operators like && whereas in fact it is more kin to arithmetic operators
> like +.

The error comes from BCPL where & and && were the same operator
(similarly | and ||).  At some point in the evolution of C from BCPL
the operators were split apart but the bitwise ones left in the wrong
place.
>
> There are of course other principles:
> Dijkstra argued vigorously that boolean algebra being completely symmetric
> in
> (∨,True)  (∧, False),  ∧, ∨ should have the same precedence.
>
> Evidently not too many people agree with him!

Sadly, I am reading this in a web browser where the Unicode symbols
are completely garbled.  (More precisely, I think it's WebMail doing
it.)  Maybe Unicode isn't ready for prime time yet?

You might be interested to hear that in the Ada programming
language, you are not allowed to mix 'and' with 'or' (or
'and then' with 'or else') without using parentheses.  The
rationale is that the designers did not believe that enough
programmers understood the precedence of and/or.  The GNU C
compiler kvetches when you have p && q || r without otiose
parentheses.  Seems that there are plenty of designers out
there who agree with Dijkstra, not out of a taste for
well-engineered notation, but out of contempt for the
Average Programmer.

> When I studied C (nearly 30 years now!) we used gets as a matter of
> course.
> Today we dont.

Hmm.  I started with C in late 1979.  Ouch.  That's 34 and a half
years ago.  This was under Unix version 6+, with a slightly
"pre-classic" C.  A little later we got EUC Unix version 7, and a
'classic' C compiler that, oh joy, supported /\ (min) and \/ (max)
operators.  [With a bug in the code generator that I patched.]

> Are Kernighan and Ritchie wrong in teaching it?
> Are today's teacher's wrong in proscribing it?
>
> I believe the only reasonable outlook is that truth changes with time: it
> was ok then; its not today.

In this case, bull-dust!  gets() is rejected today because a
botch in its design makes it bug-prone.  Nothing has changed.
It was bug-prone 34 years ago.  It has ALWAYS been a bad idea
to use gets().  Amongst other things, the Unix manuals have
always presented the difference between gets() -- discards
the terminator -- and fgets() -- annoyingly retains the
terminator -- as a bug which they thought it was too late to
fix; after all, C had hundreds of users!  No, it was obvious
way back then:  you want to read a line?  Fine, WRITE YOUR OWN
FUNCTION, because there is NO C library function that does
quite what you want.  The great thing about C was that you
*could* write your own line-reading function without suffering.
Not only would your function do the right thing (whatever you
conceived that to be), it would be as fast, or nearly as fast,
as the built-in one.  Try doing *that* in PL/I!

No, in this case, *opinions* may have changed, peoples
*estimation* of and *tolerance for* the risks may have
changed, but the truth has not changed.
>
> Likewise DOCTYPE-missing and charset-other-than-UTF-8.
> Random example  showing how right yesterday becomes wrong today:
> http://www.sitepoint.com/forums/showthread.php?660779-Content-type-iso-8859-1-or-utf-8

Well, "missing" DOCTYPE is where it starts to get a bit technical.
An SGML document is basically made up of three parts:
  - an SGML declaration (meta-meta-data) that tells the
    parser, amongst other things, what characters to use for
    delimiters, whether various things are case sensitive,
    what the numeric limits are, and whether various features
    are enabled.
  - a Document Type Declaration (meta-data) that conforms to
    the lexical rules set up by the SGML declaration and
    defines (a) the grammar rules and (b) a bunch of macros.
  - a document (data).
The SGML declaration can be supplied to a parser as data (and
yes, I've done that), or it can be stipulated by convention
(as the HTML standards do).  In the same way, the DTD can be
  - completely declared in-line
  - defined by reference with local amendments
  - defined solely by reference
  - known by convention.
If there is a convention that a document without a DTD uses
a particular DTD, SGML is fine with that.  (It's all part of
"entity management", one of the minor arcana of SGML.)

As for the link in question, it doesn't show right turning into
wrong.  A quick summary of the sensible part of that thread:

   - If you use a <meta> tag to specify the encoding of your
     file, it had better be *right*.

     This has been true ever since <meta> tags first existed.

   - If you have a document in Latin 1 and any characters
     outside that range are written as character entity references
     or numeric character references, there is no need to change.

     No change of right to wrong here!

   - If you want to use English punctuation marks like dashes and
     curly quotes, using UTF-8 will let you write these characters
     without character entities or NCRs.

     This is only half true.  It will let you do this conveniently
     IF your local environment has fonts that include the characters.
     (Annoyingly, in Mac OS 10.6, which I'm typing on,
     Edit|Special characters is not only geographically confused,
     listing Coptic as a *European* script -- last time I checked
     Egypt was still in Africa -- but it doesn't display any Coptic
     characters.  In the Mac OS 10.7 system I normally use,
     Edit|Special characters got dramatically worse as an interface,
     but no more competent with Coptic characters.  Just because a
     character is in Unicode doesn't mean it can be *used*,
     practically speaking.)

     Instead of saying that what is wrong has become or is becoming
     right, I'd prefer to say that what was impossible is becoming
     possible and what was broken (Unicode font support) is gradually
     getting fixed.

   - Some Unicode characters, indeed, some Latin 1 characters, are
     so easy to confuse with other characters that it is advisable
     to use character entities.

     Again, nothing about wrong turning into right.  This was good
     advice as soon as Latin 1 came out.

> Unicode vs ASCII in program source is similar (I believe).

Well, not really.  People using specification languages like Z
routinely used characters way outside the ASCII range; one way
was to use LaTeX.  Another way was to have GUI systems that
let you key in using LaTeX character names or menus but see the
intended characters.  Back in about 1984 I was able to use a
16-bit character set on the Xerox Lisp Machines.  I've still
got a manual for the XNS character set somewhere.  In one of
the founding documents for the ISO Prolog standard, I
recommended, in 1984, that the Prolog standard.  That's THREE
YEARS before Unicode was a gleam in its founders' eyes.

This is NOT new.  As soon as there were bit-mapped displays
and laser printers, there was pressure to allow a wider range
of characters in programs.  Let me repeat that: 30 years ago
I was able to use non-ASCII characters in computer programs.
*Easily*, via virtual keyboards.

In 1987, the company I was working at in California revamped
their system to handle 16-bit characters and we bought a
terminal that could handle Japanese characters.  Of course
this was because we wanted to sell our system in Japan.
But this was shortly before X11 came out; the MIT window
system of the day was X10 and the operating system we were
using the 16-bit characters on was VMS.  That's 27 years ago.

This is not new.

So what _is_ new?

* A single standard.

  Wait, we DON'T have a single standard.  We have a single
  standard *provider* issuing a rapid series of revisions
  of an increasingly complex standard, where entire features
  are first rejected outright, then introduced, and then
  deprecated again.  Unicode 6.3 came out last year with
  five new characters (bringing the total to 110,122),
  over a thousand new character *variants*, two new normative
  properties, and a new BIDI algorithm which I don't yet
  understand.  And Unicode 7.0 is due out in 3 months.

  Because of this
  - different people WILL have tools that understand different
    versions of Unicode.  In fact, different tools in the same
    environment may do this.
  - your beautiful character WILL show up as garbage or even
    blank on someone's screen UNLESS it is an old or extremely
    popular (can you say Emoji?  I knew you could.  Can you
    teach me how to say it?) one.
  - when proposing to exploit Unicode characters, it is VITAL
    to understand what the Unicode "stability" rules are and
    which characters have what stable properties.

* With large cheap discs, large fonts are looking like a lot less
  of a problem.  (I failed to learn to read the Armenian letters,
  but do have those.  I succeeded in learning to read the Coptic
  letters -- but not the language(s)! -- but don't have those.
  Life is not fair.)

* We now have (a series of versions of) a standard character set
  containing a vast number of characters.  I very much doubt whether
  there is any one person who knows all the Unicode characters.

* Many of these characters are very similar.  I counted 64 "right
  arrow" characters before I gave up; this didn't include harpoons.
  Some of these are _very_ similar.  Some characters are visibly
  distinct, but normally regarded as mere stylistic differences.
  For example, <= has at least three variations (one bar, slanted;
  one bar, flat; two bars, flat) which people familiar with
  less than or equal have learned *not* to tell apart. But they
  are three different Unicode characters, from which we could
  make three different operators with different precedence or
  associativity, and of course type.

> My thoughts on this (of a philosophical nature) are:
> http://blog.languager.org/2014/04/unicode-and-unix-assumption.html
>
> If we can get the broader agreements (disagreements!) out of the way to
> start with, we may then look at the details.

I think Haskell can tolerate an experimental phase where people
try out a lot of things as long as everyone understands that it
*IS* an experimental phase, and as long as experimental operators
are kept out of Hackage, certainly out of the Platform, or at
least segregated into areas with big flashing "danger" signs.

I think a *small* number of "pretty" operators can be added to
Haskell, without the sky falling, and I'll probably quite like
the result.  (Does anyone know how to get a copy of the
collected The Squiggolist?)  Let's face it, if a program is
full of Armenian identifiers or Ogham ones I'm not going to
have a clue what it's about anyway.  But keeping the "standard"
-- as in used in core modules -- letter and operator sets smallish
is probably a good idea.
