Same compiled program behaving differently when called from ghci and shell

Bruno Damour
Hello,
I have a very strange (for me) problem that I managed to reduce to this:
I have a small program that reads a file containing a single character (è = 0xe8).
The program is ftest2.hs:

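import System.IO  -- originally "import IO" (the Haskell 98 module name)
import Data.Maybe

-- Look up the first character of the input in a two-entry table.
tfind s = lookup (head s) $ zip ['\xe8', '\xde'] "12"

main    = do
             h <- readFile "g:\\CODE\\rlib\\test.txt"
             putStrLn h
             print $ tfind h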

I compile it from command line :

ghc --make ftest2.hs

Now the weird results:

1/ cmd line:

ftest2.exe
è
Just '2'

2/ ghci

Prelude> :!ftest2.exe
è
Just '2'

3/ WinGHCi

Prelude> :! ftest2.exe
è
Just '1'

I tested different variants; there is always a difference.
Any idea how I can trace this behaviour?
_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Same compiled program behaving differently when called from ghci and shell

Bulat Ziganshin-2
Hello Bruno,

Sunday, November 21, 2010, 8:49:52 AM, you wrote:

> ghc --make ftest2.hs

Maybe your versions of ghc and (Win)GHCi are different? The behavior
was changed in the latest versions, AFAIK.


--
Best regards,
 Bulat                            mailto:[hidden email]


Re: Same compiled program behaving differently when called from ghci and shell

Bruno Damour
On 21/11/10 11:03, Bulat Ziganshin wrote:
> Hello Bruno,
>
> Sunday, November 21, 2010, 8:49:52 AM, you wrote:
>
>> ghc --make ftest2.hs
> may be your versions of ghc and (win)ghci are different? the behavior
> was changed in latest versions afaik
>
>
That would be surprising: I only installed Haskell Platform 2.0.0, plus a
couple of cabal-installed packages.

What is the change of behaviour you're referring to?

Re: Same compiled program behaving differently when called from ghci and shell

Manlio Perillo-3
In reply to this post by Bruno Damour
On 21/11/2010 06:49, Bruno Damour wrote:
> Hello,
> I have a very strange (for me) problem that I manage to reduce to this :
> I have a small program that reads a file with 1 only character (è = e8)
> The program is ftest2.hs :
>

> [...]

The only difference I can see is the codepage used.

On Western European systems, the Windows console uses codepage 850:
http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using

By contrast, the default Windows codepage for Western languages is 1252.


Now, as "fate" would have it (Python console):
>>> '\xe8'.decode('cp1252').encode('cp850')
'\x8a'
>>> '\xde'.decode('cp1252').encode('cp850')
'\xe8'


You can now see the possible cause of the problem: the byte 0xe8 means è
in codepage 1252, but Þ (U+00DE) in codepage 850.
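
The same mapping can be replayed in Haskell. This is only a minimal sketch: the two decoders are hypothetical helpers that cover just the bytes discussed in this thread (they are not real codepage tables), and tfind is a total variant of the lookup from the original program.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical mini-decoder for codepage 1252. Only trustworthy for the
-- bytes used in this thread (0xE8 and 0xDE), where CP1252 agrees with
-- Latin-1, i.e. byte value == Unicode code point.
decodeCp1252 :: Word8 -> Char
decodeCp1252 b = chr (fromIntegral b)

-- Hypothetical mini-decoder for codepage 850. Only the two bytes from the
-- Python session are mapped; the fallback is a placeholder, NOT the real table.
decodeCp850 :: Word8 -> Char
decodeCp850 0x8A = '\xe8'               -- è
decodeCp850 0xE8 = '\xde'               -- Þ
decodeCp850 b    = chr (fromIntegral b) -- placeholder only

-- Total variant of tfind from the original program.
tfind :: String -> Maybe Char
tfind (c:_) = lookup c (zip ['\xe8', '\xde'] "12")
tfind []    = Nothing

main :: IO ()
main = do
  let fileByte = 0xE8 :: Word8               -- the single byte stored in test.txt
  print (tfind [decodeCp1252 fileByte])      -- file read as codepage 1252
  print (tfind [decodeCp850 fileByte])       -- file read as codepage 850
```

Run with runghc, the first line corresponds to the WinGHCi result and the second to the cmd.exe result reported in the first message.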


Try changing the codepage of the console.
See also:
http://www.postgresql.org/docs/9.0/interactive/app-psql.html#AEN75686

> [...]



Regards   Manlio

Re: Same compiled program behaving differently when called from ghci and shell

Bruno Damour
On 21/11/10 17:21, Manlio Perillo wrote:

> On 21/11/2010 06:49, Bruno Damour wrote:
>> Hello,
>> I have a very strange (for me) problem that I manage to reduce to this :
>> I have a small program that reads a file with 1 only character (è = e8)
>> The program is ftest2.hs :
>>
>> [...]
> The only difference I can see is the codepage used.
>
> The Windows console use codepage 850:
> http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using
>
> Instead the default codepage of Windows for western languages is 1252.
>
>
> Now, "fate" is that (Python console):
> >>> '\xe8'.decode('cp1252').encode('cp850')
> '\x8a'
> >>> '\xde'.decode('cp1252').encode('cp850')
> '\xe8'
>
>
> You can now see the possible cause of the problem.
>
>
> Try to change the codepage of the console.
> See also:
> http://www.postgresql.org/docs/9.0/interactive/app-psql.html#AEN75686
>
>> [...]
>
>
> Regards   Manlio

Yes, I had begun to suspect that IO might depend on an environment setting.
That sounds a bit weird to me (newbie), that it should change the result of a
program depending on where it is launched... It's the same binary anyway,
isn't it?


Re: Same compiled program behaving differently when called from ghci and shell

Bruno Damour
In reply to this post by Manlio Perillo-3
On 21/11/10 17:21, Manlio Perillo wrote:
> On 21/11/2010 06:49, Bruno Damour wrote:
>> Hello,
>> I have a very strange (for me) problem that I manage to reduce to this :
>> I have a small program that reads a file with 1 only character (è = e8)
>> The program is ftest2.hs :

> The only difference I can see is the codepage used.
>
> The Windows console use codepage 850:
> http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using
>
> Instead the default codepage of Windows for western languages is 1252.

Of course you're right but that was a surprise to me...

G:\CODE\rlib>chcp 1252
Page de codes active: 1252

G:\CODE\rlib>ftest3.exe
è
Just '1'

G:\CODE\rlib>chcp 850
Page de codes active : 850

G:\CODE\rlib>ftest3.exe
è
Just '2'

Quite treacherous, IMHO. Or what?

Re: Same compiled program behaving differently when called from ghci and shell

Manlio Perillo-3
In reply to this post by Bruno Damour
On 21/11/2010 19:06, Bruno Damour wrote:

> On 21/11/10 17:21, Manlio Perillo wrote:
>> On 21/11/2010 06:49, Bruno Damour wrote:
>>> Hello,
>>> I have a very strange (for me) problem that I manage to reduce to this :
>>> I have a small program that reads a file with 1 only character (è = e8)
>>> The program is ftest2.hs :
>>>
> [...]
>> Now, "fate" is that (Python console):
>> >>> '\xe8'.decode('cp1252').encode('cp850')
>> '\x8a'
>> >>> '\xde'.decode('cp1252').encode('cp850')
>> '\xe8'
>>
> [...]
>
> yes I kind of began to figure that IO might use an environment setting.

Did you try running the program again with the console codepage set
to 1252?

> That sounds a bit weird to me (newbie), that it should change the result of a
> program depending on where it is launched... It's the same binary anyway,
> isn't it?

This is only a guess, but recent versions of the GHC I/O library decode
the bytes, at a low level, when reading a file in text mode.

This is the correct approach, since a Char is supposed to be a Unicode
character.

I assume that when reading a text file, the I/O library just checks the
system encoding and uses it.

In your case, you have a text file encoded with codepage 1252, but
GHC is trying to read it using codepage 850.

So, as in the example I posted, you have (using, again, Python syntax):
- the character u'è' - Unicode code point 0xe8
- the byte in the file, 0xe8; this is the result of
  u'è'.encode('cp1252')
- a Haskell Char '\xde'; this is the result of
  '\xe8'.decode('cp850')


There are 3 solutions:
1) open the file in binary mode
2) set the console codepage to 1252.

   I do this by changing the "Command Prompt" shortcut destination to:
     `%SystemRoot%\system32\cmd.exe /k chcp 1252`
3) explicitly set the encoding when reading the file in text mode

   Unfortunately this is currently a rather low-level and GHC-specific
   operation:

   http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/GHC-IO-Handle.html

   The Python API is, by the way:
   http://docs.python.org/dev/py3k/library/functions.html#open

   The GHC API is quite different (if I understand it correctly).
   You can change the encoding only after the file has been opened, and
   you can change it again after having read some data (in Python, by
   contrast, the file encoding is immutable).
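
Solutions 1 and 3 might look roughly like the sketch below. The helper names readFileWith and readFileBytes and the scratch file name are made up for illustration. latin1 is used as a stand-in because it agrees with codepage 1252 for the byte 0xe8 (not for every byte). System.IO also provides mkTextEncoding, which takes an encoding name, but which names it accepts depends on the platform.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Solution 3: open in text mode, then set the encoding explicitly instead
-- of relying on the locale/codepage default.
readFileWith :: TextEncoding -> FilePath -> IO String
readFileWith enc path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h enc
    s <- hGetContents h
    _ <- evaluate (length s)   -- force the lazy read before the handle closes
    return s

-- Solution 1: binary mode; every Char returned is one raw byte (0..255).
readFileBytes :: FilePath -> IO String
readFileBytes path =
  withBinaryFile path ReadMode $ \h -> do
    s <- hGetContents h
    _ <- evaluate (length s)
    return s

main :: IO ()
main = do
  let path = "test-e8.txt"     -- hypothetical scratch file, not the original test.txt
  withBinaryFile path WriteMode (\h -> hPutStr h "\xe8")  -- write the single byte 0xE8
  s <- readFileWith latin1 path
  b <- readFileBytes path
  print (s == "\xe8", b == "\xe8")
```

With either helper the result no longer depends on the console codepage, because the decoding is chosen by the program rather than by the environment.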



Regards   Manlio

Re: Same compiled program behaving differently when called from ghci and shell

Manlio Perillo-3
In reply to this post by Bruno Damour
On 21/11/2010 19:28, Bruno Damour wrote:

> [...]
> Of course you're right but that was a surprise to me...
>
> G:\CODE\rlib>chcp 1252
> Page de codes active: 1252
>
> G:\CODE\rlib>ftest3.exe
> è
> Just '1'
>
> G:\CODE\rlib>chcp 850
> Page de codes active : 850
>
> G:\CODE\rlib>ftest3.exe
> è
> Just '2'
>
> Quite treacherous, IMHO. Or what?

It is not treacherous at all.

When you open a file, GHC uses localeEncoding, which, as the name suggests,
depends on the system's current codepage (in the Windows case).

In your example, you are simply changing the codepage, and thus the
program's behaviour changes accordingly.
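
One way to see which encoding a given launch environment hands to the program is to print it. This is a small sketch: describeEncodings is a made-up helper name, and it assumes a version of base in which TextEncoding has a Show instance (newer than the base 4.2 discussed in this thread).

```haskell
import System.IO

-- Report the encodings the GHC runtime picked up from the environment.
describeEncodings :: IO [String]
describeEncodings = do
  outEnc <- hGetEncoding stdout   -- Nothing means the handle is in binary mode
  return
    [ "localeEncoding: " ++ show localeEncoding
    , "stdout:         " ++ maybe "binary" show outEnc
    ]

main :: IO ()
main = describeEncodings >>= mapM_ putStrLn
```

Running the same binary from cmd.exe after chcp 850, after chcp 1252, and from WinGHCi should make the difference between the environments visible.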


It is the same as when you have a program that prints some environment
variable (for example with System.Environment.getEnvironment).
Of course, if you change the OS environment from the console, that
program's behaviour will change.

And it is also the same when your program reads a file's contents.
If you change the file's contents from elsewhere, the program's behaviour
will change.



As for the original example, I just think that the GHC user guide
*should* clearly explain what it means to open a file in text mode
[1], and, if possible, add a note about the Windows console (as has been
done in the PostgreSQL documentation).


[1] right now I do not remember what the Haskell Report says



Regards   Manlio

Re: Same compiled program behaving differently when called from ghci and shell

Manlio Perillo-3
In reply to this post by Manlio Perillo-3
On 21/11/2010 21:51, Manlio Perillo wrote:

> [...]
> There are 3 solutions:
> 1) open the file in binary mode
> 2) set the console codepage to 1252.
>
>    I do this by changing the "Command Prompt" shortcut destination to:
>      `%SystemRoot%\system32\cmd.exe /k chcp 1252`
> 3) explicitly set the encoding when reading the file in text mode
>
>    Unfortunately this is now a rather low level and GHC specific
>    operation:
>
> http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/GHC-IO-Handle.html
>

Correction: encoding support is in System.IO (base 4.2 package), but it
is not documented in the Haskell 2010 Report.


By the way: what is the rationale for the TextEncoding data type not
containing the encoding name?


Regards   Manlio