why spawn (and safeSpawn, etc) use encodeString?

Platon Pronko
Hi!

I noticed that spawn mangles my Unicode characters: instead of the character I passed, the called program received garbage. Looking deeper, I found out that spawn pre-processes the string with the `encodeString` function.

`encodeString` first converts the [Char] into [Word8] (its UTF-8 bytes), and then converts each individual Word8 back into a Char. Since a non-ASCII Char is encoded as multiple Word8 values, the resulting string is quite different from the original.

Example:

Prelude> import Codec.Binary.UTF8.String
Prelude Codec.Binary.UTF8.String> encodeString "Ø"
"\195\152"
Prelude Codec.Binary.UTF8.String> putStrLn "Ø"
Ø
Prelude Codec.Binary.UTF8.String> putStrLn $ encodeString "Ø"
ÃPrelude Codec.Binary.UTF8.String>
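
To make the mechanism concrete, here is a minimal sketch of the same transformation using only `base`. The names `utf8Bytes`, `encodeLikeUtf8String` and `bytesAfterSecondEncoding` are hypothetical and are not part of xmonad or utf8-string. The last function rests on an assumption not stated in this thread: that the runtime encodes the argument as UTF-8 once more when it is handed to the child process.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- UTF-8 encode a single code point into 1 to 4 bytes.
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading byte: length tag plus the top bits of the code point.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- Hypothetical stand-in for what encodeString does:
-- every UTF-8 byte becomes its own Char.
encodeLikeUtf8String :: String -> String
encodeLikeUtf8String = map (chr . fromIntegral) . concatMap utf8Bytes

-- ASSUMPTION: if the already "encoded" String is UTF-8 encoded a second
-- time on its way to the child, these are the bytes the child would see.
bytesAfterSecondEncoding :: String -> [Word8]
bytesAfterSecondEncoding = concatMap utf8Bytes . encodeLikeUtf8String
```

Under that assumption, `bytesAfterSecondEncoding "\216"` (the "Ø" above) yields four bytes, `[195,131,194,152]`, rather than the two bytes `[195,152]` of the original character, which would match the garbage the called program received.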


Is there a reason why xmonad uses `encodeString` here?

I implemented a copy of the function that doesn't use `encodeString`; it seems to work okay:

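import Control.Monad (void)
import System.Posix.Process (createSession, executeFile, forkProcess)
import XMonad (MonadIO, io, uninstallSignalHandlers)

-- Like safeSpawn, but passes prog and args through without encodeString.
safeSpawnUnicode :: MonadIO m => FilePath -> [String] -> m ()
safeSpawnUnicode prog args = io $ void $ forkProcess $ do
   uninstallSignalHandlers
   _ <- createSession
   executeFile prog True args Nothing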

Best regards,
Platon Pronko
_______________________________________________
xmonad mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/xmonad

Re: why spawn (and safeSpawn, etc) use encodeString?

Gwern Branwen-2
As one of the people responsible for that, the backstory is that at
the time very long ago (2008?), it wasn't clear how to handle Unicode
text in a cross-distro bugfree way while passing through a Haskell
library like XMonad from a Prompt into X11 or the shell or
applications, no one had the appetite to make an in-depth study of the
various systems to figure out what exactly had to be done to handle
ASCII & Unicode in a way that would be safe everywhere, and
`encodeString` seemed to sorta work in most cases and be better than
what came before.

Since the Haskell and other ecosystems have gradually continued
evolving (one hopes), it's possible that many Unicode-related issues
have since quietly vanished, and XMonad could do something simpler and
more correct than it does now; but one would need to investigate
thoroughly on a couple systems before one could be sure it was safe to
update `spawn` and all downstream users of `encodeString` etc, and no
one has been willing to do so to the extent to make a change in the
(generally very stable) HEAD.

--
gwern
https://www.gwern.net

Re: why spawn (and safeSpawn, etc) use encodeString?

Platon Pronko
Unfortunately, I do not have access to many different systems, only the one I use now (Arch Linux), so I won't be able to test it thoroughly. But as a data point: on my machine Prompt returns UTF-8, and that UTF-8 can be safely passed to the executeFile call without encodeString.

Best regards,
Platon Pronko

On 2020-04-17 22:37, Gwern Branwen wrote:

> As one of the people responsible for that, the backstory is that at
> the time very long ago (2008?), it wasn't clear how to handle Unicode
> text in a cross-distro bugfree way while passing through a Haskell
> library like XMonad from a Prompt into X11 or the shell or
> applications, no one had the appetite to make an in-depth study of the
> various systems to figure out what exactly had to be done to handle
> ASCII & Unicode in a way that would be safe everywhere, and
> `encodeString` seemed to sorta work in most cases and be better than
> what came before.
>
> Since the Haskell and other ecosystems have gradually continued
> evolving (one hopes), it's possible that many Unicode-related issues
> have since quietly vanished, and XMonad could do something simpler and
> more correct than it does now; but one would need to investigate
> thoroughly on a couple systems before one could be sure it was safe to
> update `spawn` and all downstream users of `encodeString` etc, and no
> one has been willing to do so to the extent to make a change in the
> (generally very stable) HEAD.
>

Re: why spawn (and safeSpawn, etc) use encodeString?

Brandon Allbery
In 2020, my inclination is to encode the string if it has code points > 255 in it and leave it to the user otherwise; it's impossible to guess the right action.
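
A minimal sketch of that heuristic, for illustration only. The name `encodeIfWide` is hypothetical, and the encoder is passed in as a parameter (it could be `encodeString` from utf8-string) so that the sketch depends only on `base`. The idea as read here: a string whose code points all fit in one byte might already be a byte string, so it is passed through untouched.

```haskell
import Data.Char (ord)

-- Hypothetical helper: apply the given encoder only when the string
-- contains at least one code point above 255; otherwise return the
-- string unchanged and leave the decision to the caller.
encodeIfWide :: (String -> String) -> String -> String
encodeIfWide encoder s
  | any ((> 255) . ord) s = encoder s
  | otherwise             = s
```

With this rule, "Ø" (code point 216) from the first message would be left alone, while a string containing "€" (code point 8364) would be encoded.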

On Fri, Apr 17, 2020, 16:05 Platon Pronko <[hidden email]> wrote:
> Unfortunately I do not have access to many different systems - only the one I use now, Arch Linux. So I won't be able to test it thoroughly. But as a data point - on my machine Prompt returns UTF8 and that UTF8 can be safely passed into the executeFile call, without encodeString.
>
> Best regards,
> Platon Pronko
