Crash on Windows with large data

Simon Peyton Jones
Yitz and others

| Would you and your team be interested in looking at this?

In principle yes of course.   I wish we had a "team", as in a bunch of people whose job it is to make GHC fantastic.  However we certainly do have a team of volunteers who do amazing things; and one (totally heroic and overworked) Ian Lynagh who really does have it as his job.  So in practice it might be harder.

Does anyone feel able to help Yitz?  

Ian might, depending on how hard it was to put him into a position where he could reproduce it.  Plus it depends on how critical it is for you.  

Simon

| -----Original Message-----
| From: sefer.org at gmail.com [mailto:sefer.org at gmail.com] On Behalf Of
| Yitzchak Gale
| Sent: 21 March 2013 21:03
| To: Simon Peyton-Jones
| Subject: Crash on Windows with large data
|
| Hi Simon,
|
| At work, we have an app that ran into a BEX64 crash on Windows when
| processing a large and complex data set. I compiled a special version of
| our app with GHC 7.6.2 for Windows, because we suspect it needs to run
| on 64 bits.
|
| On Linux, the same app compiled with 7.4.2 runs to completion when
| processing the same data, although it does use as much as 15 GB of
| memory in the process. The Windows version crashed when it hit something
| over 5 GB of memory usage.
|
| Would you and your team be interested in looking at this?
|
| The problem is, both our app and our customer's data are proprietary.
| The customer is a well-known, very large corporation. Due to the nature of
| the problem, I doubt there is any reasonable way to isolate the problem.
| I imagine we would need to send you the original code and data. Is there
| some sort of arrangement that can be made for this situation?
|
| Thanks,
| Yitz



Crash on Windows with large data

Jason Dagit-3
I had never heard of BEX/BEX64 errors before. Naturally I wanted to know
what they were. Perhaps others here have not heard of them either? Just in
case, I'll share what I learned.

After a bit of digging I came across this:
http://technet.microsoft.com/en-us/library/cc738483(WS.10).aspx

I think that article is saying that Windows has a policy to prevent
arbitrary buffers from being executed. Perhaps the RTS needs to do
something special on Windows when it's allocating pages above a certain
address?

Of potential note is this section (I don't know the RTS at all, so maybe it
already takes care of the following):

What works differently?

Application Compatibility
Some application behaviors are expected to be incompatible with DEP.
Applications that perform dynamic code generation (such as just-in-time
code generation) and that do not explicitly mark generated code with
Execute permission might have compatibility problems with DEP. Applications
that are not built with SafeSEH must have their exception handlers located
in executable memory regions.

Applications that attempt to violate DEP will receive an exception with
status code STATUS_ACCESS_VIOLATION (0xC0000005). If an application
requires executable memory, it must explicitly set this attribute on the
appropriate memory by specifying PAGE_EXECUTE, PAGE_EXECUTE_READ,
PAGE_EXECUTE_READWRITE or PAGE_EXECUTE_WRITECOPY in the memory protection
argument of the Virtual* memory allocation functions. Heap allocations
using the malloc() and HeapAlloc() functions are non-executable.

Perhaps the problem is reproducible simply by forcing the 64-bit Windows
build of GHC to execute a thunk allocated above the 4 GB address range?


> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>


Crash on Windows with large data

Ian Lynagh-2
In reply to this post by Simon Peyton Jones
On Fri, Mar 22, 2013 at 01:55:42PM +0000, Simon Peyton-Jones wrote:
>
> Ian might, depending on how hard it was to put him into a position where he could reproduce it.  Plus it depends on how critical it is for you.  
>
> | memory in the process. The Windows version crashed when it hit something
> | over 5 GB of memory usage.

I'm afraid I don't currently have access to a Windows machine with >5G
of RAM.


Thanks
Ian




Fwd: Crash on Windows with large data

Yitzchak Gale
Please forgive me if you received this message twice.
I believe that my first attempt to send it was swallowed
due to my not being a subscriber to the list at that time.

We now have a Windows server with 23 GB of memory
temporarily available this week to work on this GHC crash.

Is there anyone interested in having a look at the problem?

Quick review: We have a proprietary app which crashes
on Windows with a BEX64 error when
it processes input data that causes it to use more than
about 4 GB of RAM. The same program when processing
the same data runs to successful completion on Linux.

Jason offered evidence that perhaps this is a general
problem with the RTS on Windows when more than 4 GB
of RAM is consumed.

Further information:

We are using 7.6.2 64-bit on Windows, and
7.4.2 (Haskell Platform) on Linux.

Our program also requires quite a bit of stack when
processing that large data set. I give it 128 MB.

I just tried reproducing this problem by running a
simple program that eats up lots of memory.
First I used a list, then I tried "chopping things
up" more by using a rose tree. In both cases, I also
tried creating an intentional memory leak to
eat up lots of stack. None of that succeeded
in reproducing the crash. I wouldn't eliminate
Jason's hypothesis just yet, but so far it looks
like it's not quite that simple.
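
For readers who want to try something similar, here is a minimal sketch of
the kind of memory-eating test described above. It is not the program that
was actually run, which was not posted; the names Rose, build, size, labels
and deepSum, and the default sizes, are hypothetical. It builds a rose tree
whose nodes all carry distinct labels (so that no subtree can be shared),
keeps the tree live across two traversals, and then sums the labels with a
deliberately non-tail-recursive function so that stack use grows with the
number of nodes.

```haskell
module Main (main) where

import Data.List (foldl')
import System.Environment (getArgs)

-- | A rose tree: a strict label plus any number of children.
data Rose = Rose !Int [Rose]

-- | Complete tree of the given depth and branching factor. Children of
-- the node labelled n are labelled n*width + 1 .. n*width + width, so
-- every node is distinct and no subtree can be shared.
build :: Int -> Int -> Int -> Rose
build depth width label
  | depth <= 0 = Rose label []
  | otherwise  =
      Rose label [ build (depth - 1) width (label * width + i)
                 | i <- [1 .. width] ]

-- | Node count; visiting every node forces the whole tree onto the heap.
size :: Rose -> Int
size (Rose _ kids) = foldl' (\acc k -> acc + size k) 1 kids

-- | All labels in pre-order.
labels :: Rose -> [Int]
labels (Rose x kids) = x : concatMap labels kids

-- | Intentionally not tail recursive: stack use grows with list length.
deepSum :: [Int] -> Int
deepSum []       = 0
deepSum (x : xs) = x + deepSum xs

main :: IO ()
main = do
  args <- getArgs
  let (depth, width) = case args of
        [d, w] -> (read d, read w)
        _      -> (4, 8)          -- small default; scale up by hand
      tree = build depth width 0
  putStrLn ("nodes:     " ++ show (size tree))
  -- 'tree' is still referenced below, so it stays resident in memory.
  putStrLn ("label sum: " ++ show (deepSum (labels tree)))
```

Built with "ghc -O2 -rtsopts" under a hypothetical name such as hog, a run
like "hog 6 8 +RTS -K128m -s" raises the maximum stack size to 128 MB and
prints heap statistics on exit. The depth and width arguments would have to
be tuned by hand to push memory use past 4 GB.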

Any ideas?

Thanks,
Yitz



Crash on Windows with large data

Simon Marlow-7
In reply to this post by Jason Dagit-3
On 22/03/13 21:26, Jason Dagit wrote:

> I had never heard of BEX/BEX64 errors before. Naturally I wanted to know
> what it was. Perhaps others here have never heard of this as well? Just
> in case, I'll share what I learned.
>
> After a bit of digging I came across this:
> http://technet.microsoft.com/en-us/library/cc738483(WS.10).aspx
>
> I think that article is saying that windows has a policy to prevent
> arbitrary buffers from being executed. Perhaps the RTS needs to do
> something special on windows when it's allocating pages above a certain
> address?
>
> Of potential note is this section (I don't know the RTS at all, so maybe
> it already takes care of the following):
>
> What works differently?
>
> Application Compatibility
>
> Some application behaviors are expected to be incompatible with DEP.
> Applications that perform dynamic code generation (such as just-in-time
> code generation) and that do not explicitly mark generated code with
> Execute permission might have compatibility problems with DEP.
> Applications that are not built with SafeSEH must have their exception
> handlers located in executable memory regions.
>
> Applications that attempt to violate DEP will receive an exception with
> status code STATUS_ACCESS_VIOLATION (0xC0000005). If an application
> requires executable memory, it must explicitly set this attribute on the
> appropriate memory by specifying PAGE_EXECUTE, PAGE_EXECUTE_READ,
> PAGE_EXECUTE_READWRITE or PAGE_EXECUTE_WRITECOPY in the memory
> protection argument of the Virtual* memory allocation functions. Heap
> allocations using the malloc() and HeapAlloc() functions are
> non-executable.
>
> Perhaps the problem is reproducible simply by forcing the 64bit windows
> build of ghc to execute a thunk allocated above the 4 GB address range?

We explicitly call VirtualProtect() to set the execute bit on pages that
we need to execute code from, and have done for some time (Windows has
had DEP since XP SP2).  In GHC this happens for foreign import
"wrapper", and when compiling a foreign call with GHCi.  It doesn't
happen for ordinary thunks: the code for these is statically compiled
into the binary.
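
To make the foreign import "wrapper" case concrete, the following is a
minimal sketch of a program that exercises it. The names mkCallback and
callCallback are made up for illustration. Each call to a "wrapper" import
makes the RTS generate a small stub of machine code at run time so that C
can call back into Haskell, which is why executable memory is needed; the
"dynamic" import is used here only to call that stub from Haskell without
writing any C. Whether this code path has anything to do with the crash
reported in this thread is not established.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CInt)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- | "wrapper": turn a Haskell function into a C function pointer.
-- The stub behind the pointer is created at run time by the RTS.
foreign import ccall "wrapper"
  mkCallback :: (CInt -> IO CInt) -> IO (FunPtr (CInt -> IO CInt))

-- | "dynamic": call through a C function pointer from Haskell.
foreign import ccall "dynamic"
  callCallback :: FunPtr (CInt -> IO CInt) -> CInt -> IO CInt

main :: IO ()
main = do
  fp <- mkCallback (\x -> return (x + 1))  -- allocates the run-time stub
  r  <- callCallback fp 41                 -- executes the generated stub
  print r
  freeHaskellFunPtr fp                     -- release the stub's memory
```

A program consisting only of ordinary Haskell code never takes this path,
since its thunks point at code that was compiled into the executable.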

So the bottom line is I don't know what's going wrong here, and
unfortunately I don't have the time to investigate right now.  Perhaps
Ian would be able to look into it, as he did the Win 64 port.

Cheers,
        Simon