Fwd: Re: [commit: ghc] ghc-7.10: base: fix #10298 & #7695 (25b8478)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Fwd: Re: [commit: ghc] ghc-7.10: base: fix #10298 & #7695 (25b8478)

Edward Z. Yang
This commit broke the HM builds: https://phabricator.haskell.org/B4147

When I validate locally, though, it works fine.

Edward

Excerpts from git's message of 2015-05-28 18:11:07 -0700:

> Repository : ssh://[hidden email]/ghc
>
> On branch  : ghc-7.10
> Link       : http://ghc.haskell.org/trac/ghc/changeset/25b84781ed950d59c7bffb77a576d3c43a883ca9/ghc
>
> >---------------------------------------------------------------
>
> commit 25b84781ed950d59c7bffb77a576d3c43a883ca9
> Author: Austin Seipp <[hidden email]>
> Date:   Tue May 19 04:56:40 2015 -0500
>
>     base: fix #10298 & #7695
>    
>     Summary:
>     This applies a patch from Reid Barton and Sylvain Henry, which fix a
>     disasterous infinite loop when iconv fails to load locale files, as
>     specified in #10298.
>    
>     The fix is a bit of a hack but should be fine - for the actual reasoning
>     behind it, see `Note [Disaster and iconv]` for more info.
>    
>     In addition to this fix, we also patch up the IO Encoding utilities to
>     recognize several variations of the 'ASCII' encoding (including its
>     aliases) directly so that GHC can do conversions without iconv. This
>     allows a static binary to sit in an initramfs.
>    
>     Authored-by: Reid Barton <[hidden email]>
>     Authored-by: Sylvain Henry <[hidden email]>
>     Signed-off-by: Austin Seipp <[hidden email]>
>    
>     Test Plan: Eyeballed it.
>    
>     Reviewers: rwbarton, hvr
>    
>     Subscribers: bgamari, thomie
>    
>     Differential Revision: https://phabricator.haskell.org/D898
>    
>     GHC Trac Issues: #10298, #7695
>    
>     (cherry picked from commit e28462de700240288519a016d0fe44d4360d9ffd)
>
> >---------------------------------------------------------------
>
> 25b84781ed950d59c7bffb77a576d3c43a883ca9
>  libraries/base/GHC/IO/Encoding.hs | 14 +++++++++++++-
>  libraries/base/GHC/TopHandler.hs  | 29 ++++++++++++++++++++++++++++-
>  2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/libraries/base/GHC/IO/Encoding.hs b/libraries/base/GHC/IO/Encoding.hs
> index 31683b4..014b61b 100644
> --- a/libraries/base/GHC/IO/Encoding.hs
> +++ b/libraries/base/GHC/IO/Encoding.hs
> @@ -235,7 +235,14 @@ mkTextEncoding e = case mb_coding_failure_mode of
>          _             -> Nothing
>  
>  mkTextEncoding' :: CodingFailureMode -> String -> IO TextEncoding
> -mkTextEncoding' cfm enc = case [toUpper c | c <- enc, c /= '-'] of
> +mkTextEncoding' cfm enc
> +  -- First, specifically match on ASCII encodings directly using
> +  -- several possible aliases (specified by RFC 1345 & co), which
> +  -- allows us to handle ASCII conversions without iconv at all (see
> +  -- trac #10298).
> +  | any (== enc) ansiEncNames = return (UTF8.mkUTF8 cfm)
> +  -- Otherwise, handle other encoding needs via iconv.
> +  | otherwise = case [toUpper c | c <- enc, c /= '-'] of
>      "UTF8"    -> return $ UTF8.mkUTF8 cfm
>      "UTF16"   -> return $ UTF16.mkUTF16 cfm
>      "UTF16LE" -> return $ UTF16.mkUTF16le cfm
> @@ -249,6 +256,11 @@ mkTextEncoding' cfm enc = case [toUpper c | c <- enc, c /= '-'] of
>  #else
>      _ -> Iconv.mkIconvEncoding cfm enc
>  #endif
> +  where
> +    ansiEncNames = -- ASCII aliases
> +      [ "ANSI_X3.4-1968", "iso-ir-6", "ANSI_X3.4-1986", "ISO_646.irv:1991"
> +      , "US-ASCII", "us", "IBM367", "cp367", "csASCII", "ASCII", "ISO646-US"
> +      ]
>  
>  latin1_encode :: CharBuffer -> Buffer Word8 -> IO (CharBuffer, Buffer Word8)
>  latin1_encode input output = fmap (\(_why,input',output') -> (input',output')) $ Latin1.latin1_encode input output -- unchecked, used for char8
> diff --git a/libraries/base/GHC/TopHandler.hs b/libraries/base/GHC/TopHandler.hs
> index d7c0038..e725196 100644
> --- a/libraries/base/GHC/TopHandler.hs
> +++ b/libraries/base/GHC/TopHandler.hs
> @@ -157,13 +157,40 @@ real_handler exit se = do
>             Just (ExitFailure n) -> exit n
>  
>             -- EPIPE errors received for stdout are ignored (#2699)
> -           _ -> case fromException se of
> +           _ -> catch (case fromException se of
>                  Just IOError{ ioe_type = ResourceVanished,
>                                ioe_errno = Just ioe,
>                                ioe_handle = Just hdl }
>                     | Errno ioe == ePIPE, hdl == stdout -> exit 0
>                  _ -> do reportError se
>                          exit 1
> +                ) (disasterHandler exit) -- See Note [Disaster with iconv]
> +
> +-- don't use errorBelch() directly, because we cannot call varargs functions
> +-- using the FFI.
> +foreign import ccall unsafe "HsBase.h errorBelch2"
> +   errorBelch :: CString -> CString -> IO ()
> +
> +disasterHandler :: (Int -> IO a) -> IOError -> IO a
> +disasterHandler exit _ =
> +  withCAString "%s" $ \fmt ->
> +    withCAString msgStr $ \msg ->
> +      errorBelch fmt msg >> exit 1
> +  where msgStr = "encountered an exception while trying to report an exception"
> +
> +{- Note [Disaster with iconv]
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When using iconv, it's possible for things like iconv_open to fail in
> +restricted environments (like an initram or restricted container), but
> +when this happens the error raised inevitably calls `peekCString`,
> +which depends on the users locale, which depends on using
> +`iconv_open`... which causes an infinite loop.
> +
> +This occurrence is also known as tickets #10298 and #7695. So to work
> +around it we just set _another_ error handler and bail directly by
> +calling the RTS, without iconv at all.
> +-}
>  
>  
>  -- try to flush stdout/stderr, but don't worry if we fail
>
--- End forwarded message ---
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: Re: [commit: ghc] ghc-7.10: base: fix #10298 & #7695 (25b8478)

Sylvain HENRY
It depends on your locale, this explains the different behavior on your local machine.

The build machine seems to be using an ASCII locale and GHC wants to print some Unicode characters (the tick and back-tick surrounding types in error messages). Before this patch, Iconv was used to do the conversion from Unicode to ASCII. It seems that it replaces the Unicode ticks with ASCII ticks (i.e. 0xE28098 in UTF-8 into 0x60 and 0xE28099 into 0x27).

If you enter:
cd ghc/testsuite/tests
grep -rn "match expected type" **/*.stderr

You can see that some .stderr have been generated with a ASCII locale and some others with a UTF-8 locale by looking at the ticks. Are the tests expecting ASCII output (e.g. driver/T2507) passing on platforms with UTF-8 locales? The output is not equal to the expected one except if it is converted to ASCII before the comparison.

With this patch, we don't use Iconv to convert from Unicode to ASCII because it may not be available in some contexts (docker containers, initramdisk, etc.). Instead we use the UTF-8 encoder to encode ASCII (ASCII characters are encoded in the same way in ASCII and in UTF-8) and we don't try to match Unicode only characters to ASCII ones.

Solutions:
1) change our patch to use our method only when Iconv cannot be used.
2) implement the Unicode to ASCII conversion as performed by Iconv
3) change the locale to a UTF-8 one on the build machine ;-)

Sylvain




2015-05-29 3:23 GMT+02:00 Edward Z. Yang <[hidden email]>:
This commit broke the HM builds: https://phabricator.haskell.org/B4147

When I validate locally, though, it works fine.

Edward

Excerpts from git's message of 2015-05-28 18:11:07 -0700:
> Repository : ssh://git@.../ghc
>
> On branch  : ghc-7.10
> Link       : http://ghc.haskell.org/trac/ghc/changeset/25b84781ed950d59c7bffb77a576d3c43a883ca9/ghc
>
> >---------------------------------------------------------------
>
> commit 25b84781ed950d59c7bffb77a576d3c43a883ca9
> Author: Austin Seipp <[hidden email]>
> Date:   Tue May 19 04:56:40 2015 -0500
>
>     base: fix #10298 & #7695
>
>     Summary:
>     This applies a patch from Reid Barton and Sylvain Henry, which fix a
>     disasterous infinite loop when iconv fails to load locale files, as
>     specified in #10298.
>
>     The fix is a bit of a hack but should be fine - for the actual reasoning
>     behind it, see `Note [Disaster and iconv]` for more info.
>
>     In addition to this fix, we also patch up the IO Encoding utilities to
>     recognize several variations of the 'ASCII' encoding (including its
>     aliases) directly so that GHC can do conversions without iconv. This
>     allows a static binary to sit in an initramfs.
>
>     Authored-by: Reid Barton <[hidden email]>
>     Authored-by: Sylvain Henry <[hidden email]>
>     Signed-off-by: Austin Seipp <[hidden email]>
>
>     Test Plan: Eyeballed it.
>
>     Reviewers: rwbarton, hvr
>
>     Subscribers: bgamari, thomie
>
>     Differential Revision: https://phabricator.haskell.org/D898
>
>     GHC Trac Issues: #10298, #7695
>
>     (cherry picked from commit e28462de700240288519a016d0fe44d4360d9ffd)
>
> >---------------------------------------------------------------
>
> 25b84781ed950d59c7bffb77a576d3c43a883ca9
>  libraries/base/GHC/IO/Encoding.hs | 14 +++++++++++++-
>  libraries/base/GHC/TopHandler.hs  | 29 ++++++++++++++++++++++++++++-
>  2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/libraries/base/GHC/IO/Encoding.hs b/libraries/base/GHC/IO/Encoding.hs
> index 31683b4..014b61b 100644
> --- a/libraries/base/GHC/IO/Encoding.hs
> +++ b/libraries/base/GHC/IO/Encoding.hs
> @@ -235,7 +235,14 @@ mkTextEncoding e = case mb_coding_failure_mode of
>          _             -> Nothing
>
>  mkTextEncoding' :: CodingFailureMode -> String -> IO TextEncoding
> -mkTextEncoding' cfm enc = case [toUpper c | c <- enc, c /= '-'] of
> +mkTextEncoding' cfm enc
> +  -- First, specifically match on ASCII encodings directly using
> +  -- several possible aliases (specified by RFC 1345 & co), which
> +  -- allows us to handle ASCII conversions without iconv at all (see
> +  -- trac #10298).
> +  | any (== enc) ansiEncNames = return (UTF8.mkUTF8 cfm)
> +  -- Otherwise, handle other encoding needs via iconv.
> +  | otherwise = case [toUpper c | c <- enc, c /= '-'] of
>      "UTF8"    -> return $ UTF8.mkUTF8 cfm
>      "UTF16"   -> return $ UTF16.mkUTF16 cfm
>      "UTF16LE" -> return $ UTF16.mkUTF16le cfm
> @@ -249,6 +256,11 @@ mkTextEncoding' cfm enc = case [toUpper c | c <- enc, c /= '-'] of
>  #else
>      _ -> Iconv.mkIconvEncoding cfm enc
>  #endif
> +  where
> +    ansiEncNames = -- ASCII aliases
> +      [ "ANSI_X3.4-1968", "iso-ir-6", "ANSI_X3.4-1986", "ISO_646.irv:1991"
> +      , "US-ASCII", "us", "IBM367", "cp367", "csASCII", "ASCII", "ISO646-US"
> +      ]
>
>  latin1_encode :: CharBuffer -> Buffer Word8 -> IO (CharBuffer, Buffer Word8)
>  latin1_encode input output = fmap (\(_why,input',output') -> (input',output')) $ Latin1.latin1_encode input output -- unchecked, used for char8
> diff --git a/libraries/base/GHC/TopHandler.hs b/libraries/base/GHC/TopHandler.hs
> index d7c0038..e725196 100644
> --- a/libraries/base/GHC/TopHandler.hs
> +++ b/libraries/base/GHC/TopHandler.hs
> @@ -157,13 +157,40 @@ real_handler exit se = do
>             Just (ExitFailure n) -> exit n
>
>             -- EPIPE errors received for stdout are ignored (#2699)
> -           _ -> case fromException se of
> +           _ -> catch (case fromException se of
>                  Just IOError{ ioe_type = ResourceVanished,
>                                ioe_errno = Just ioe,
>                                ioe_handle = Just hdl }
>                     | Errno ioe == ePIPE, hdl == stdout -> exit 0
>                  _ -> do reportError se
>                          exit 1
> +                ) (disasterHandler exit) -- See Note [Disaster with iconv]
> +
> +-- don't use errorBelch() directly, because we cannot call varargs functions
> +-- using the FFI.
> +foreign import ccall unsafe "HsBase.h errorBelch2"
> +   errorBelch :: CString -> CString -> IO ()
> +
> +disasterHandler :: (Int -> IO a) -> IOError -> IO a
> +disasterHandler exit _ =
> +  withCAString "%s" $ \fmt ->
> +    withCAString msgStr $ \msg ->
> +      errorBelch fmt msg >> exit 1
> +  where msgStr = "encountered an exception while trying to report an exception"
> +
> +{- Note [Disaster with iconv]
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When using iconv, it's possible for things like iconv_open to fail in
> +restricted environments (like an initram or restricted container), but
> +when this happens the error raised inevitably calls `peekCString`,
> +which depends on the users locale, which depends on using
> +`iconv_open`... which causes an infinite loop.
> +
> +This occurrence is also known as tickets #10298 and #7695. So to work
> +around it we just set _another_ error handler and bail directly by
> +calling the RTS, without iconv at all.
> +-}
>
>
>  -- try to flush stdout/stderr, but don't worry if we fail
>
--- End forwarded message ---
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: Re: [commit: ghc] ghc-7.10: base: fix #10298 & #7695 (25b8478)

Thomas Miedema
I ran 'make accept' on those two tests (they were failing for me locally), and now the build is green again. This was the change:


On Fri, May 29, 2015 at 5:35 PM, Sylvain Henry <[hidden email]> wrote:
It depends on your locale, this explains the different behavior on your local machine.

The build machine seems to be using an ASCII locale and GHC wants to print some Unicode characters (the tick and back-tick surrounding types in error messages). Before this patch, Iconv was used to do the conversion from Unicode to ASCII. It seems that it replaces the Unicode ticks with ASCII ticks (i.e. 0xE28098 in UTF-8 into 0x60 and 0xE28099 into 0x27).

If you enter:
cd ghc/testsuite/tests
grep -rn "match expected type" **/*.stderr

You can see that some .stderr have been generated with a ASCII locale and some others with a UTF-8 locale by looking at the ticks. Are the tests expecting ASCII output (e.g. driver/T2507) passing on platforms with UTF-8 locales? The output is not equal to the expected one except if it is converted to ASCII before the comparison.

With this patch, we don't use Iconv to convert from Unicode to ASCII because it may not be available in some contexts (docker containers, initramdisk, etc.). Instead we use the UTF-8 encoder to encode ASCII (ASCII characters are encoded in the same way in ASCII and in UTF-8) and we don't try to match Unicode only characters to ASCII ones.

Solutions:
1) change our patch to use our method only when Iconv cannot be used.
2) implement the Unicode to ASCII conversion as performed by Iconv
3) change the locale to a UTF-8 one on the build machine ;-)

Sylvain




2015-05-29 3:23 GMT+02:00 Edward Z. Yang <[hidden email]>:
This commit broke the HM builds: https://phabricator.haskell.org/B4147

When I validate locally, though, it works fine.

Edward

Excerpts from git's message of 2015-05-28 18:11:07 -0700:
> Repository : ssh://git@.../ghc
>
> On branch  : ghc-7.10
> Link       : http://ghc.haskell.org/trac/ghc/changeset/25b84781ed950d59c7bffb77a576d3c43a883ca9/ghc
>
> >---------------------------------------------------------------
>
> commit 25b84781ed950d59c7bffb77a576d3c43a883ca9
> Author: Austin Seipp <[hidden email]>
> Date:   Tue May 19 04:56:40 2015 -0500
>
>     base: fix #10298 & #7695
>
>     Summary:
>     This applies a patch from Reid Barton and Sylvain Henry, which fix a
>     disasterous infinite loop when iconv fails to load locale files, as
>     specified in #10298.
>
>     The fix is a bit of a hack but should be fine - for the actual reasoning
>     behind it, see `Note [Disaster and iconv]` for more info.
>
>     In addition to this fix, we also patch up the IO Encoding utilities to
>     recognize several variations of the 'ASCII' encoding (including its
>     aliases) directly so that GHC can do conversions without iconv. This
>     allows a static binary to sit in an initramfs.
>
>     Authored-by: Reid Barton <[hidden email]>
>     Authored-by: Sylvain Henry <[hidden email]>
>     Signed-off-by: Austin Seipp <[hidden email]>
>
>     Test Plan: Eyeballed it.
>
>     Reviewers: rwbarton, hvr
>
>     Subscribers: bgamari, thomie
>
>     Differential Revision: https://phabricator.haskell.org/D898
>
>     GHC Trac Issues: #10298, #7695
>
>     (cherry picked from commit e28462de700240288519a016d0fe44d4360d9ffd)
>
> >---------------------------------------------------------------
>
> 25b84781ed950d59c7bffb77a576d3c43a883ca9
>  libraries/base/GHC/IO/Encoding.hs | 14 +++++++++++++-
>  libraries/base/GHC/TopHandler.hs  | 29 ++++++++++++++++++++++++++++-
>  2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/libraries/base/GHC/IO/Encoding.hs b/libraries/base/GHC/IO/Encoding.hs
> index 31683b4..014b61b 100644
> --- a/libraries/base/GHC/IO/Encoding.hs
> +++ b/libraries/base/GHC/IO/Encoding.hs
> @@ -235,7 +235,14 @@ mkTextEncoding e = case mb_coding_failure_mode of
>          _             -> Nothing
>
>  mkTextEncoding' :: CodingFailureMode -> String -> IO TextEncoding
> -mkTextEncoding' cfm enc = case [toUpper c | c <- enc, c /= '-'] of
> +mkTextEncoding' cfm enc
> +  -- First, specifically match on ASCII encodings directly using
> +  -- several possible aliases (specified by RFC 1345 & co), which
> +  -- allows us to handle ASCII conversions without iconv at all (see
> +  -- trac #10298).
> +  | any (== enc) ansiEncNames = return (UTF8.mkUTF8 cfm)
> +  -- Otherwise, handle other encoding needs via iconv.
> +  | otherwise = case [toUpper c | c <- enc, c /= '-'] of
>      "UTF8"    -> return $ UTF8.mkUTF8 cfm
>      "UTF16"   -> return $ UTF16.mkUTF16 cfm
>      "UTF16LE" -> return $ UTF16.mkUTF16le cfm
> @@ -249,6 +256,11 @@ mkTextEncoding' cfm enc = case [toUpper c | c <- enc, c /= '-'] of
>  #else
>      _ -> Iconv.mkIconvEncoding cfm enc
>  #endif
> +  where
> +    ansiEncNames = -- ASCII aliases
> +      [ "ANSI_X3.4-1968", "iso-ir-6", "ANSI_X3.4-1986", "ISO_646.irv:1991"
> +      , "US-ASCII", "us", "IBM367", "cp367", "csASCII", "ASCII", "ISO646-US"
> +      ]
>
>  latin1_encode :: CharBuffer -> Buffer Word8 -> IO (CharBuffer, Buffer Word8)
>  latin1_encode input output = fmap (\(_why,input',output') -> (input',output')) $ Latin1.latin1_encode input output -- unchecked, used for char8
> diff --git a/libraries/base/GHC/TopHandler.hs b/libraries/base/GHC/TopHandler.hs
> index d7c0038..e725196 100644
> --- a/libraries/base/GHC/TopHandler.hs
> +++ b/libraries/base/GHC/TopHandler.hs
> @@ -157,13 +157,40 @@ real_handler exit se = do
>             Just (ExitFailure n) -> exit n
>
>             -- EPIPE errors received for stdout are ignored (#2699)
> -           _ -> case fromException se of
> +           _ -> catch (case fromException se of
>                  Just IOError{ ioe_type = ResourceVanished,
>                                ioe_errno = Just ioe,
>                                ioe_handle = Just hdl }
>                     | Errno ioe == ePIPE, hdl == stdout -> exit 0
>                  _ -> do reportError se
>                          exit 1
> +                ) (disasterHandler exit) -- See Note [Disaster with iconv]
> +
> +-- don't use errorBelch() directly, because we cannot call varargs functions
> +-- using the FFI.
> +foreign import ccall unsafe "HsBase.h errorBelch2"
> +   errorBelch :: CString -> CString -> IO ()
> +
> +disasterHandler :: (Int -> IO a) -> IOError -> IO a
> +disasterHandler exit _ =
> +  withCAString "%s" $ \fmt ->
> +    withCAString msgStr $ \msg ->
> +      errorBelch fmt msg >> exit 1
> +  where msgStr = "encountered an exception while trying to report an exception"
> +
> +{- Note [Disaster with iconv]
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When using iconv, it's possible for things like iconv_open to fail in
> +restricted environments (like an initram or restricted container), but
> +when this happens the error raised inevitably calls `peekCString`,
> +which depends on the users locale, which depends on using
> +`iconv_open`... which causes an infinite loop.
> +
> +This occurrence is also known as tickets #10298 and #7695. So to work
> +around it we just set _another_ error handler and bail directly by
> +calling the RTS, without iconv at all.
> +-}
>
>
>  -- try to flush stdout/stderr, but don't worry if we fail
>
--- End forwarded message ---
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs