Static values language extension proposal

Facundo Domínguez-3
Hello,
    With the support of Tweag I/O, Mathieu and I have been assembling
a design proposal for a language extension for static values that
will take Cloud Haskell a big step forward in usability. Please find
the proposal inline below.

    We look forward to discussing its feasibility and features
with the community.

Best,
Facundo

--

In these notes we discuss a design for the language extension proposed
in [1] for Cloud Haskell, that is, compiler support for producing
labels that identify Haskell top-level bindings across processes in a
network.

Static values
=============

Following [1], the extension consists of a new syntactic form `static
e`, along with a type constructor `StaticRef` and a function

unstatic :: StaticRef a -> a

The idea is that a value of type `StaticRef a` uniquely identifies a
value that can be referred to by a global name instead of being
serialized over the network. This works between processes that are
instances of a single binary, because all such processes share the same
top-level bindings.

Generating static references
============================

We start by introducing global names. A `GlobalName` is a symbol bound
in the top-level environment. It is much like a global name in Template
Haskell, but a `GlobalName` always refers to a term, and it includes a
package version.

data GlobalName = GlobalName PkgName PkgVersion ModName OccName

A `GlobalName` can be used as a reference to a static value.

newtype StaticRef a = StaticRef GlobalName

`StaticRef a` is to `GlobalName` what `Ptr a` is to `Addr#`: a wrapper
with a phantom type parameter that keeps track of the type of the
value that is referenced.
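A minimal sketch of these two declarations as ordinary Haskell follows. The proposal leaves `PkgName`, `PkgVersion`, `ModName` and `OccName` abstract, so representing them as `String` synonyms is an assumption made here, as are the hypothetical helpers `staticName` and `castStaticRef` and the example package `my-app`.

```haskell
-- Assumption: the proposal leaves these abstract; Strings are used here.
type PkgName    = String
type PkgVersion = String
type ModName    = String
type OccName    = String

-- A term bound in the top-level environment of a module of some package.
data GlobalName = GlobalName PkgName PkgVersion ModName OccName
  deriving (Eq, Show)

-- 'a' is a phantom parameter: it does not occur on the right-hand side and
-- only records the type of the referenced value, like the 'a' in 'Ptr a'.
newtype StaticRef a = StaticRef GlobalName
  deriving (Eq, Show)

-- Hypothetical helper: forget the phantom type and recover the name.
staticName :: StaticRef a -> GlobalName
staticName (StaticRef n) = n

-- Hypothetical helper, analogous to 'castPtr': changing the phantom type
-- is always possible but unchecked, so it must be used with care.
castStaticRef :: StaticRef a -> StaticRef b
castStaticRef (StaticRef n) = StaticRef n

-- Reference to a hypothetical binding 'double :: Int -> Int' defined in
-- module Main of package my-app-0.1.
doubleRef :: StaticRef (Int -> Int)
doubleRef = StaticRef (GlobalName "my-app" "0.1" "Main" "double")
```

Because the type parameter is only a phantom, nothing at runtime checks that the named binding really has type `Int -> Int`; that guarantee has to come from the way `static` constructs the reference.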

The special form

static e

is an expression of type `StaticRef a` where `e :: a` is a closed
expression (meaning any free variables in `e` are bound in the
top-level environment).

If `e` is an identifier, `static e` just refers to it. Otherwise, the
compiler needs to introduce a new top-level binding with a fresh name
and `e` as its right-hand side, and the static reference points to this
top-level binding instead.
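The effect of this desugaring can be approximated by hand. In the sketch below, the fresh name `static_lifted_0`, the package `my-app` and the module `Main` are hypothetical (the proposal does not say how fresh names are chosen), and the declarations are repeated with `String` standing in for the abstract name components so that the block stands alone.

```haskell
type PkgName    = String
type PkgVersion = String
type ModName    = String
type OccName    = String

data GlobalName = GlobalName PkgName PkgVersion ModName OccName
  deriving (Eq, Show)

newtype StaticRef a = StaticRef GlobalName
  deriving (Eq, Show)

-- Source program, using the proposed syntax:
--
--   doubleRef :: StaticRef (Int -> Int)
--   doubleRef = static (\x -> x * 2)
--
-- The argument of 'static' is not an identifier, so the compiler would
-- lift it into a fresh top-level binding (hypothetical name)...
static_lifted_0 :: Int -> Int
static_lifted_0 = \x -> x * 2

-- ...and make the reference name that binding instead.
doubleRef :: StaticRef (Int -> Int)
doubleRef = StaticRef (GlobalName "my-app" "0.1" "Main" "static_lifted_0")

-- By contrast, in
--
--   addRef y = static (\x -> x + y)
--
-- the argument is not closed: 'y' is a local variable, so no top-level
-- binding could be generated for it and the expression would be rejected.
```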

Looking up static references
============================

`unstatic` is implemented as a function that looks up the top-level
value named by a `GlobalName` and raises an exception if no such value
is found. It relies crucially on the system’s dynamic linker, so out of
the box it only works with dynamically linked binaries (but see below).
`unstatic` proceeds as follows:

  * It determines the name of the shared library from the package name
and the package version.

  * It determines the symbol of the value by Z-encoding the package name,
the module name and the value name.

  * It uses the system’s dynamic linker interface to obtain the address
of the symbol.

  * It converts that address to a Haskell value with `GHC.Prim.addrToAny#`.
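The first two steps are pure string manipulation and can be sketched as follows. The Z-encoding table is modelled on GHC's `zEncodeString` (its special case for tuple names is omitted). The symbol layout `<package>_<Module>_<name>_closure` and the Linux library name `libHS<package>-<version>-ghc<ghc version>.so` are assumptions about GHC's conventions of that era, which vary across GHC versions and platforms; `closureSymbol` and `sharedLibName` are hypothetical names.

```haskell
import Data.Char (isDigit, ord)
import Numeric (showHex)

-- Z-encoding, modelled on GHC's zEncodeString (tuple special case omitted).
zEncode :: String -> String
zEncode = concatMap encodeChar

encodeChar :: Char -> String
encodeChar c
  | unencoded c                  = [c]
  | Just code <- lookup c zTable = code
  | otherwise                    = unicodeEscape c

-- ASCII letters and digits pass through unchanged, except 'z' and 'Z',
-- which serve as the escape characters.
unencoded :: Char -> Bool
unencoded c =
  c /= 'z' && c /= 'Z'
    && (('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || ('0' <= c && c <= '9'))

zTable :: [(Char, String)]
zTable =
  [ ('(', "ZL"), (')', "ZR"), ('[', "ZM"), (']', "ZN"), (':', "ZC"), ('Z', "ZZ")
  , ('z', "zz"), ('&', "za"), ('|', "zb"), ('^', "zc"), ('$', "zd"), ('=', "ze")
  , ('>', "zg"), ('#', "zh"), ('.', "zi"), ('<', "zl"), ('-', "zm"), ('!', "zn")
  , ('+', "zp"), ('\'', "zq"), ('\\', "zr"), ('/', "zs"), ('*', "zt"), ('_', "zu")
  , ('%', "zv") ]

-- Any other character becomes 'z', its code point in hex, then 'U'
-- (with a leading '0' when the hex digits would start with a letter).
unicodeEscape :: Char -> String
unicodeEscape c = 'z' : padded
  where
    hex = showHex (ord c) "U"
    padded = case hex of
      (d : _) | isDigit d -> hex
      _                   -> '0' : hex

-- Assumed layout of the linker symbol for a top-level closure.
closureSymbol :: String -> String -> String -> String
closureSymbol pkgId modName occName =
  zEncode pkgId ++ "_" ++ zEncode modName ++ "_" ++ zEncode occName ++ "_closure"

-- Assumed Linux file name of a GHC-built shared library.
sharedLibName :: String -> String -> String -> String
sharedLibName ghcVersion pkgName pkgVersion =
  "libHS" ++ pkgName ++ "-" ++ pkgVersion ++ "-ghc" ++ ghcVersion ++ ".so"
```

For example, `zEncode "Data.Map"` yields `"DataziMap"`, and `closureSymbol "my-app-0.1" "Main" "double"` yields `"myzmappzm0zi1_Main_double_closure"`.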

In principle, only symbols in shared libraries can be found. However,
the dynamic linker is also able to find symbols in statically linked
modules if GHC is passed the option -optl-Wl,--export-dynamic. A
future enhancement could be to have GHC warn the user when modules
using the extension are linked statically and this option is not used
during linking.
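The last two lookup steps can be sketched with `dlsym` from the `unix` package's `System.Posix.DynamicLinker` module and the `addrToAny#` primop named above. This is a POSIX-only sketch, and the function name `lookupStaticValue` is hypothetical. Passing the `Default` handle searches the global symbol scope, which includes the symbols of a statically linked executable only if it was linked with -optl-Wl,--export-dynamic; alternatively, a handle obtained by calling `dlopen` on the shared library found in the first step can be passed.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Control.Exception (IOException, try)
import Foreign.Ptr (castFunPtrToPtr)
import GHC.Exts (addrToAny#)
import GHC.Ptr (Ptr (..))
import System.Posix.DynamicLinker (DL (..), dlsym)

-- Hypothetical helper: ask the dynamic linker for a symbol and reinterpret
-- its address as a Haskell value. The result type is NOT checked at
-- runtime; the caller must already know the binding's type (in the
-- proposal, that knowledge is the phantom type of 'StaticRef a').
lookupStaticValue :: DL -> String -> IO (Either String a)
lookupStaticValue handle symbol = do
  -- dlsym throws an IOException when the symbol cannot be found.
  found <- try (dlsym handle symbol)
  case found of
    Left err ->
      return (Left ("lookup of " ++ symbol ++ " failed: " ++ show (err :: IOException)))
    Right funPtr ->
      -- Unwrap the raw Addr# and reinterpret it as a closure.
      case castFunPtrToPtr funPtr of
        Ptr addr -> case addrToAny# addr of
          (# value #) -> return (Right value)
```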

GHC only defines symbols for exported definitions in modules, so
`unstatic` won’t be able to find the private bindings of a module. For
this reason, the implementation of `static` should also ensure that the
bindings it references appear in the symbol table even when they are
not exported by their defining modules.

Template Haskell support
========================

The `static` keyword needs to be made available in Template Haskell so
that the distributed-static package can benefit from this language
extension.

Rationale
=========

We want the language extension to meet the following requirements:

  1. It must be a practical alternative to the remoteTable functions
in the distributed-static package.

  2. It must not change the build scheme used for Haskell programs. It
should still be possible to link a collection of .o files produced from
Haskell source code with the system linking tools.

  3. It must not require all communicating processes using the
extension to be launched from the same binary.

  4. It must not significantly increase the binary size.

(1) is addressed by replacing remote tables with the symbol tables
produced by the compiler. Additionally, Template Haskell support is
included so that the existing distributed-static package can be
adapted and extended to include this extension.

(2) is addressed by choosing a scheme that does not require the
linker to perform any extension-specific procedure to collect the
static values in various modules. There’s a trade-off here, though,
since symbols in statically linked modules cannot be accessed unless
-optl-Wl,--export-dynamic is supplied during linking.

(3) is addressed by allowing programs to exchange static values for
any bindings found in the modules they share.

(4) is addressed by reusing the symbol tables produced by the compiler
in object files rather than creating separate remote tables.

About the need for using different binaries
===========================================

While using distributed-process we found some use cases for
communicating closures between multiple binaries.

One of these use cases involved a distributed application and a
monitoring tool. The monitoring tool would need to link in some
graphics libraries to display information on the screen, none of which
were required by the monitored application. Conversely, the monitored
application would link in some modules that the monitoring application
didn’t need. Crucially, the two applications are fairly loosely
coupled, even though both need to exchange static values referring to
bindings in some modules they share.

An analogous use case involved the distributed application and a
control application that would be used to change dynamic settings of
the former.

Further Work
============

Since the application now depends on shared libraries, a tool to
collect these libraries would be needed so that they can be distributed
together with the executable when deploying a Cloud Haskell application
to a cluster. We won’t delve further into this problem.

Another possible line of work is extending this approach so that a
process can pull shared objects from a remote peer when that peer sends
a static value defined in a shared object not available to the process.

Integration with distributed-static
===================================

The package distributed-static could either adopt this extension as
the only implementation of static values, or it could support many
notions of static references, say by using a type class to overload
`unstatic`.

class Static st s | s -> st where
   unstatic :: st -> s a -> Either String a

where the class parameter `st` is provided for backwards compatibility
with the existing scheme, which needs context-dependent information
(such as a remote table). The extension we present here does not depend
on this parameter, so `()` could be used for the `StaticRef` instance.

instance Static () StaticRef where ...
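To make the overloading concrete, here is a sketch of the class together with an instance for a label-based scheme, where the context parameter `st` is a table of dynamically typed values. `Label`, `RemoteTable` and `exampleTable` are simplified stand-ins invented for this sketch, not the actual distributed-static types; the `StaticRef` instance is left out because it needs the dynamic-linker lookup described in the section on looking up static references.

```haskell
{-# LANGUAGE FunctionalDependencies, GADTs #-}
import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.Typeable (Typeable)

-- The overloaded 'unstatic' from the proposal.
class Static st s | s -> st where
  unstatic :: st -> s a -> Either String a

-- Hypothetical label DSL: a label is a string name. The GADT constructor
-- stores the Typeable evidence for 'a', because the class method itself
-- carries no Typeable constraint.
data Label a where
  Label :: Typeable a => String -> Label a

-- Hypothetical context: a table of dynamically typed values.
newtype RemoteTable = RemoteTable [(String, Dynamic)]

instance Static RemoteTable Label where
  unstatic (RemoteTable table) (Label name) =
    case lookup name table of
      Nothing  -> Left ("unknown label: " ++ name)
      Just dyn -> case fromDynamic dyn of
        Nothing    -> Left ("type mismatch for label: " ++ name)
        Just value -> Right value

-- A table registering a single function under the label "double".
exampleTable :: RemoteTable
exampleTable = RemoteTable [("double", toDyn ((* 2) :: Int -> Int))]
```

Looking up `Label "double" :: Label (Int -> Int)` succeeds, while the same label at type `String -> String` yields a `Left`, which is the kind of runtime check a table of `Dynamic` values requires.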

References
==========

[1] Jeff Epstein, Andrew P. Black, and Simon Peyton-Jones. Towards
Haskell in the cloud. SIGPLAN Not., 46(12):118–129, September 2011.
ISSN 0362-1340.
_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: Static values language extension proposal

Carter Schonwald
Hey Facundo,

thanks for sharing this proposal. 

several questions:

0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so? 

1) what does this accomplish that can not be accomplished by having various nodes agree on a DSL, and sending ASTs to each other?
     1a) in fact, I'd argue (and some others agree, and I'll admit my opinions have been shaped by those more expert than me) that sending a wee AST you can interpret on the other side is much SAFER than "sending a function symbol that's hard coded hopefully into both programs in a way that it means the same thing".  I've had many educational conversations with 

2) how does it provide more type safety than the current TH based approach? (I've seen Tim and others hit very very gnarly bugs in cloud haskell based upon the "magic static values" approach). 

3) this proposal requires changes to linking etc. that would really make it useful only on systems and deployments that have Template Haskell AND dynamic linking.  (and also rules out any context where it'd be nice to deploy a static app or, say, use CH on iOS!)


to repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think it's a much simpler, safer, easier, more flexible and PORTABLE approach, though not one current CH uses (though the folks working on CH seem to be receptive to switching to such a strategy if someone validates it)

cheers
-Carter



On Fri, Jan 24, 2014 at 12:19 PM, Facundo Domínguez <[hidden email]> wrote:

Re: Static values language extension proposal

Tim Watson
In reply to this post by Facundo Domínguez-3
I don't have time to weigh in on this proposal right now, but I have several comments...

On 24 Jan 2014, at 17:19, Facundo Domínguez wrote:
> Rationale
> =======
>
> We want the language extension to meet the following requirements:
>
>  1. It must be a practical alternative to the remoteTable functions
> in the distributed-static package.
>

Agreed - this is vital!

>  2. It must not change the build scheme used for Haskell programs. A
> collection of .o files produced from Haskell source code should still
> be possible to link with the system linking tools.
>

Also vital.

>  3. It must not restrict all communicating processes using the
> extension to be launched from the same binary.
>

I personally think this is very valuable.

> About the need for using different binaries
> ==============================
>
> While using distributed-process we found some use cases for supporting
> communicating closures between multiple binaries.
>
> One of these use cases involved a distributed application and a
> monitoring tool. The monitoring tool would need to link in some
> graphics libraries to display information on the screen, none of which
> were required by the monitored application. Conversely, the monitored
> application would link in some modules that the monitoring application
> didn’t need. Crucially, both applications are fairly loosely coupled,
> even if they both need to exchange static values about bindings in
> some modules they shared.

Indeed - this is an almost canonical use-case, as are administrative (e.g., remote management) tools.

> As the application depends on shared libraries, now a tool to collect
> these libraries would be required so they can be distributed together
> with the executable binary when deploying a Cloud Haskell application
> in a cluster. We won’t delve further into this problem.

Great idea.

>
> Another possible line of work is extending this approach so a process
> can pull shared objects from a remote peer, when this remote peer
> sends a static value that is defined in a shared object not available
> to the process.

This would go a long way towards answering our questions about 'hot code upgrade' and be useful in many other areas too.


Re: Static values language extension proposal

Tim Watson
In reply to this post by Carter Schonwald
On 24 Jan 2014, at 17:59, Carter Schonwald wrote:
> 0) I think you could actually implement this proposal as a userland library, at least as you've described it. Have you tried doing so?
>

I didn't pick up on that at all - how would we be able to do that?

> 1) what does this accomplish that can not be accomplished by having various nodes agree on a DSL, and sending ASTs to each other?
>      1a) in fact, I'd argue (and some others agree, and i'll admit my opinions have been shaped by those more expert than me) that the sending a wee AST you can interpret on the other side is much SAFER than "sending a function symbol thats hard coded hopefully into both programs in a way that it means the same thing".  I've had many educational conversations with
>

I've still not seen a convincing example of how to do this though. It would help if someone explained what this would look like, running over two (or more) separate binaries and still shipping code. It's just that, afaict, that AST wouldn't be so "wee" once it had to represent any arbitrary expression. One could, of course, just ship source (or some intermediate representation), but that would also require compiler infrastructure to be installed on the target.

> 2) how does it provide more type safety than the current TH based approach? (I've seen Tim and others hit very very gnarly bugs in cloud haskell based upon the "magic static values" approach).
>

This is definitely true, but I see it as a problem related to our use of TH rather than our current use of closures and 'Static' per se. Having said that, it can be toe-curlingly difficult to work with closure/static sometimes, so *anything* that makes this easier sounds good to me.

>
> to repeat: have you considered defining an AST type + interpreter for the computations you want to send around, and doing that? I think its a much simpler, safer, easier, flexible and PORTABLE approach, though one current CH doesn't do (though the folks working on CH seem to be receptive to switching to such a strategy if someone validates it)
>

I/we are, I think, amenable to doing whatever makes the most sense. This could include doing more than one thing when it comes to dealing with 'statics'. Personally I think the proposal sounds interesting, though as I mentioned in my previous mail, I haven't had time to sit down and look at it in detail yet.

Cheers,
Tim

Re: Static values language extension proposal

Brandon Allbery
In reply to this post by Facundo Domínguez-3
On Fri, Jan 24, 2014 at 12:19 PM, Facundo Domínguez <[hidden email]> wrote:
> In principle, only symbols in shared libraries can be found. However,
> the dynamic linker is able to find symbols in modules that are linked
> statically if GHC is fed with the option -optl-Wl,--export-dynamic.

This strikes me as highly platform-specific, tied to the Linux and possibly FreeBSD implementations of ELF; it likely will not work with Solaris ELF, which handles dynamic symbols differently (or at least used to), it will not work with non-ELF platforms (OS X, Windows), and it probably won't work with a non-GNU ld such as the ones used on Solaris and OS X.

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net


Re: Static values language extension proposal

Mathieu Boespflug
In reply to this post by Carter Schonwald
[Sorry for the multiple reposts - couldn't quite figure out which
email address doesn't get refused by the list..]


Hi Carter,

thank you for the good points you raise. I'll try and address each of
them as best I can below.

> 0) I think you could actually implement this proposal as a userland library,
> at least as you've described it. Have you tried doing so?

Indeed, this could be done without touching the compiler at all. We
thought long and hard about a path that would ultimately make an
extension either unnecessary, or at any rate very small. At this
point, the only thing that we are proposing to add to the compiler is
the syntactic form "static e". Contrary to the presentation in the
paper, the 'unstatic' function can be implemented entirely as library
code and does not need to be a primop. Moreover, we do not need to
piece together any kind of global remote table at compile time or link
time, because we're piggybacking on the one already constructed by the
system linker.

The `static e` form could as well be a piece of Template Haskell, but
making it a proper extension means that the compiler can enforce more
invariants and be a bit more helpful to the user. In particular,
detecting situations where symbolic references cannot be generated
because e.g. the imported packages were not compiled as dynamically
linked libraries. Or seamlessly supporting calling `static f` on an
identifier `f` that is not exported by the module.

> 1) what does this accomplish that can not be accomplished by having various
> nodes agree on a DSL, and sending ASTs to each other?
>      1a) in fact, I'd argue (and some others agree, and i'll admit my
> opinions have been shaped by those more expert than me) that the sending a
> wee AST you can interpret on the other side is much SAFER than "sending a
> function symbol thats hard coded hopefully into both programs in a way that
> it means the same thing".

I very much subscribe to the idea of defining small DSLs for
exchanging code between nodes. And this proposal is compatible with
that idea.

One thing that might not have been so clear in the original email is
that we are proposing here to introduce just *one such DSL*. It's just
that it's a trivial one whose grammar only contains linker symbol
names.

As it happens, distributed-static today already supports two such
DSLs: a DSL of labels, which are arbitrary string names for
functions, and a small language for composing Static values together.
There is a patch lying around by Edsko proposing to add a third "DSL":
one that allows nodes to trade arbitrary Haskell strings that are then
eval'ed on the other end by the 'plugins' package.

As Facundo explains at the end of his email, the notion of a "static"
value ought to be a more general one than was first envisioned in the
paper: a static value is any closed denotation, denoted in any of a
choice of multiple small languages, some of which ship standard with
distributed-static. The user can define his own DSL for shipping code
around.

This is why we propose to make Static into a class. Each DSL is
generated by one datatype. Each such datatype has a Static instance.
If you would like to ship an AST around the cluster, you can make the
datatype for that AST an instance of Static, with 'unstatic' being
defined as an interpreter for your AST.

Concretely:

data HsExpr a = ...

instance Static () HsExpr where
  unstatic () e = Hs.interpret e
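For a slightly more concrete picture of an AST-based instance, the sketch below uses the two-parameter `Static` class from the proposal with a tiny typed expression language in place of `HsExpr`, and makes `unstatic` its interpreter. The `Expr` type and `eval` are invented for illustration, and serializing the AST for the network is not shown. Since the interpreter needs no context, `()` fills the `st` parameter, as it would for `StaticRef`.

```haskell
{-# LANGUAGE FunctionalDependencies, GADTs #-}

class Static st s | s -> st where
  unstatic :: st -> s a -> Either String a

-- A deliberately tiny typed expression language (hypothetical).
data Expr a where
  IntLit :: Int -> Expr Int
  Add    :: Expr Int -> Expr Int -> Expr Int
  Div    :: Expr Int -> Expr Int -> Expr Int
  Leq    :: Expr Int -> Expr Int -> Expr Bool
  If     :: Expr Bool -> Expr a -> Expr a -> Expr a

-- The interpreter; 'Left' reports runtime failures such as division by zero.
eval :: Expr a -> Either String a
eval (IntLit n) = Right n
eval (Add x y)  = (+) <$> eval x <*> eval y
eval (Div x y)  = do
  n <- eval x
  d <- eval y
  if d == 0 then Left "division by zero" else Right (n `div` d)
eval (Leq x y)  = (<=) <$> eval x <*> eval y
eval (If c t e) = do
  b <- eval c
  if b then eval t else eval e

-- No context is needed to interpret an Expr, hence '()'.
instance Static () Expr where
  unstatic () = eval
```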

> I've had many educational conversations with

... ?

> 2) how does it provide more type safety than the current TH based approach?
> (I've seen Tim and others hit very very gnarly bugs in cloud haskell based
> upon the "magic static values" approach).

The type safety of the current TH approach is reasonable I think. One
potential problem comes from managing dynamically typed values in the
remote table, which must be coerced to the right type and use the
right decoders if you don't use TH. With the approach we propose,
there is no remote table, so I guess this should help eliminate a
source of bugs.

> 3) this proposal requires changes to linking etc that would really make it
> useful only on systems and deployments that only have Template Haskell AND
> Dynamic linking.  (and also rules out any context where it'd be nice to
> deploy a static app or say, use CH in ios! )

I don't know about iOS. And it's very likely that there are contexts
in which this extension doesn't work. But as I said above, you are
always free to define your own DSLs that cover the particular use
case that you have in mind. The nice thing with this particular DSL is
that it requires little to no TH to generate label names, which can
always be a source of bugs, especially when you forget to include them
in the global remote table (which is something that TH doesn't and
can't help you with).

Furthermore, it was my understanding that GHC is heading towards a
world of "dynamic linkable by default", and it is by now something
that is supported on most platforms by GHC. See e.g.

https://ghc.haskell.org/trac/ghc/wiki/DynamicGhcPrograms

There are fairly good solutions to deploy self contained dynamically
linked apps these days, e.g. Docker. And in any case, with a few extra
flags we can still do away with the dynamic linking requirement on
some (all?) platforms.

> to repeat: have you considered defining an AST type + interpreter for the
> computations you want to send around, and doing that? I think its a much
> simpler, safer, easier, flexible and PORTABLE approach, though one current
> CH doesn't do (though the folks working on CH seem to be receptive to
> switching to such a strategy if someone validates it)

We have, and it's an option with different tradeoffs. Both solutions
could gainfully live side by side and are in fact complementary. I
contend that the solution described by Facundo has the advantage of
eliminating much of the syntactic overhead associated with sending
references to (higher-order) values across the cluster. We have more
ideas specific to distributed-process which we can discuss in a
separate thread to reduce the syntactic overhead even further, to
practically nothing.

Best,

Mathieu

Re: Static values language extension proposal

Carter Schonwald
anyways

1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a user land library.
 If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something!

But I really think insisting that the linker symbol names denote the "datum agreement" in a distributed system is punting on what should be handled at the application level. Simon Marlow put some improvements into GHC to help improve doing dynamic code (un)loading, stress test that!

 There are quite a few industrial Haskell shops that provide products / services where internally they do runtime dynamic loading of user-provided object files, so I'm sure that the core GHC support is there if you actually dig into the APIs! And they do this in a distributed systems context, sans CH.

2) I have a work in progress on specing out a proper (and sound :) ) static values type extension for GHC, that will perhaps be usable in your case (though by dint of being sound, it will preclude some of the things you think you want). BUT, any type system changes need to actually provide safety. My motivation for having a notion of static values comes from a desire to add compiler support for certain numerical computing operations that require compiler support to be usable in Haskell. BUT, much of the same work 

@tim: what on earth does "sending arbitrary code" mean? I feel like the more precise thing everyone here wants is "for a given application / infrastructure deployment, I would like to be able to send my application-specific computations over the network, using Cloud Haskell, and be sure that both sides think it's the same code".

As for *how* to send an AST fragment, edward kmett and other have some pretty nice typed AST models that are easy to adapt and extend for an application specific use case. Bound http://hackage.haskell.org/package/bound is one nice one. 

heres a really really good school of haskell exposition https://www.fpcomplete.com/user/edwardk/bound

And there's a generalization that supports strong typing that I've copied from an hpaste, https://gist.github.com/cartazio/5727196, where it's notable that the AST data type is called "Remote" :).
I think that's a hint it's meant to be a Haskell-manipulable way of constructing a typed DSL you can serialize using a finally tagless style API approach (i.e., have a set of type class instances/operations that you use to run the computation and/or construct the AST you can send over the wire).
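
The gist itself isn't reproduced here, but as a rough illustration of the finally tagless idea just described, here is a minimal sketch. The class `ExprSym` and the interpreters `Eval` and `Wire` are hypothetical names, not taken from the gist: the same program can be run locally or rendered into a string that could be sent over the wire (parsing that string back on the receiving node is not shown).

```haskell
module Main where

-- Finally tagless encoding (hypothetical names): the DSL is a type class,
-- and each instance of the class is one interpreter.
class ExprSym repr where
  lit :: Int -> repr Int
  add :: repr Int -> repr Int -> repr Int
  leq :: repr Int -> repr Int -> repr Bool
  if_ :: repr Bool -> repr a -> repr a -> repr a

-- Interpreter 1: run the computation locally.
newtype Eval a = Eval { runEval :: a }

instance ExprSym Eval where
  lit = Eval
  add (Eval x) (Eval y) = Eval (x + y)
  leq (Eval x) (Eval y) = Eval (x <= y)
  if_ (Eval c) t e = if c then t else e

-- Interpreter 2: render a wire format (here an S-expression string).
-- The type parameter is phantom; it only keeps the DSL well typed.
newtype Wire a = Wire { toWire :: String }

instance ExprSym Wire where
  lit n = Wire (show n)
  add x y = Wire ("(add " ++ toWire x ++ " " ++ toWire y ++ ")")
  leq x y = Wire ("(leq " ++ toWire x ++ " " ++ toWire y ++ ")")
  if_ c t e = Wire ("(if " ++ toWire c ++ " " ++ toWire t ++ " " ++ toWire e ++ ")")

-- One program, written once, usable with either interpreter.
program :: ExprSym repr => repr Int
program = if_ (leq (lit 1) (lit 2)) (add (lit 40) (lit 2)) (lit 0)

main :: IO ()
main = do
  print (runEval program)    -- evaluate locally
  putStrLn (toWire program)  -- what would be sent to a remote node
```

Adding another interpreter (say, one producing a binary encoding) only needs another instance; the program itself doesn't change.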




On Fri, Jan 24, 2014 at 3:19 PM, Mathieu Boespflug <[hidden email]> wrote:
[Sorry for the multiple reposts - couldn't quite figure out which
email address doesn't get refused by the list..]


Hi Carter,

thank you for the good points you raise. I'll try and address each of
them as best I can below.

> 0) I think you could actually implement this proposal as a userland library,
> at least as you've described it. Have you tried doing so?

Indeed, this could be done without touching the compiler at all. We
thought long and hard about a path that would ultimately make an
extension either unnecessary, or at any rate very small. At this
point, the only thing that we are proposing to add to the compiler is
the syntactic form "static e". Contrary to the presentation in the
paper, the 'unstatic' function can be implemented entirely as library
code and does not need to be a primop. Moreover, we do not need to
piece together any kind of global remote table at compile time or link
time, because we're piggybacking on the one already constructed by the
system linker.

The `static e` form could as well be a piece of Template Haskell, but
making it a proper extension means that the compiler can enforce more
invariants and be a bit more helpful to the user. In particular,
detecting situations where symbolic references cannot be generated
because e.g. the imported packages were not compiled as dynamically linked
libraries. Or seamlessly supporting `static f` on an identifier
`f` that is not exported by the module.

> 1) what does this accomplish that can not be accomplished by having various
> nodes agree on a DSL, and sending ASTs to each other?
>      1a) in fact, I'd argue (and some others agree, and i'll admit my
> opinions have been shaped by those more expert than me) that the sending a
> wee AST you can interpret on the other side is much SAFER than "sending a
> function symbol thats hard coded hopefully into both programs in a way that
> it means the same thing".

I very much subscribe to the idea of defining small DSLs for
exchanging code between nodes. And this proposal is compatible with
that idea.

One thing that might not have been so clear in the original email is
that we are proposing here to introduce just *one such DSL*. It's just
that it's a trivial one whose grammar only contains linker symbol
names.

As it happens, distributed-static today already supports two such
DSLs: a DSL of labels, which are arbitrary string names for
functions, and a small language for composing Static values together.
There is a patch lying around by Edsko proposing to add a third "DSL":
one that allows nodes to trade arbitrary Haskell strings that are then
eval'ed on the other end by the 'plugins' package.

As Facundo explains at the end of his email, the notion of a "static"
value ought to be a more general one than was first envisioned in the
paper: a static value is any closed denotation, denoted in any of a
choice of multiple small languages, some of which ship standard with
distributed-static. The user can define his own DSL for shipping code
around.

This is why we propose to make Static into a class. Each DSL is
generated by one datatype. Each such datatype has a Static instance.
If you would like to ship an AST around the cluster, you can make the
datatype for that AST an instance of Static, with 'unstatic' being
defined as an interpreter for your AST.

Concretely:

data HsExpr = ...

instance Static HsExpr where
  unstatic e = Hs.interpret e
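
To make the class idea concrete, here is a minimal sketch, assuming `Static` ranges over type constructors of kind `* -> *` so that `unstatic` can stay typed (the email leaves this open). `Expr` is a hypothetical stand-in for the elided `HsExpr`, and `interpret` plays the role of `Hs.interpret`; serialising the AST is left out.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

-- Hypothetical: "Static" as a class with one instance per code-shipping DSL.
-- The kind (* -> *) is an assumption made so unstatic can return a typed value.
class Static f where
  unstatic :: f a -> a

-- A tiny typed AST standing in for the email's elided HsExpr.
data Expr a where
  IntE  :: Int -> Expr Int
  BoolE :: Bool -> Expr Bool
  Add   :: Expr Int -> Expr Int -> Expr Int
  Leq   :: Expr Int -> Expr Int -> Expr Bool
  If    :: Expr Bool -> Expr a -> Expr a -> Expr a

-- The interpreter; the GADT guarantees each case returns the right type.
interpret :: Expr a -> a
interpret (IntE n)   = n
interpret (BoolE b)  = b
interpret (Add x y)  = interpret x + interpret y
interpret (Leq x y)  = interpret x <= interpret y
interpret (If c t e) = if interpret c then interpret t else interpret e

-- Making the AST a "static" DSL: unstatic is just the interpreter.
instance Static Expr where
  unstatic = interpret

example :: Expr Int
example = If (Leq (IntE 1) (IntE 2)) (Add (IntE 40) (IntE 2)) (IntE 0)

main :: IO ()
main = print (unstatic example)
```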

> I've had many educational conversations with

... ?

> 2) how does it provide more type safety than the current TH based approach?
> (I've seen Tim and others hit very very gnarly bugs in cloud haskell based
> upon the "magic static values" approach).

The type safety of the current TH approach is reasonable I think. One
potential problem comes from managing dynamically typed values in the
remote table, which must be coerced to the right type and use the
right decoders if you don't use TH. With the approach we propose,
there is no remote table, so I guess this should help eliminate a
source of bugs.
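
For a feel of where the dynamically typed remote table bites, here is a simplified model of a label DSL plus composition, resolved through a table of `Dynamic` values. It is a hypothetical miniature, not distributed-static's actual API: both a forgotten registration and a registration at the wrong type are only discovered when a message is resolved at run time.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import qualified Data.Map as Map
import Data.Typeable (Typeable)

-- Hypothetical miniature: labels plus application of one static value to another.
data StaticExpr a where
  Label :: Typeable a => String -> StaticExpr a
  Apply :: StaticExpr (b -> a) -> StaticExpr b -> StaticExpr a

-- The remote table maps labels to dynamically typed values.
type RemoteTable = Map.Map String Dynamic

-- Resolving a label needs a lookup *and* a dynamic type check.
resolve :: RemoteTable -> StaticExpr a -> Either String a
resolve rt (Label name) =
  case Map.lookup name rt of
    Nothing  -> Left ("label not registered: " ++ name)
    Just dyn -> maybe (Left ("type mismatch for label: " ++ name)) Right (fromDynamic dyn)
resolve rt (Apply f x) = resolve rt f <*> resolve rt x

-- Every node must build this table by hand, consistently.
table :: RemoteTable
table = Map.fromList
  [ ("succ", toDyn (succ :: Int -> Int))
  , ("answer", toDyn (41 :: Int))
  ]

succL :: StaticExpr (Int -> Int)
succL = Label "succ"

answerL :: StaticExpr Int
answerL = Label "answer"

main :: IO ()
main = do
  print (resolve table (Apply succL answerL))
  -- Forgot to register "pred": only caught at run time.
  print (resolve table (Apply (Label "pred" :: StaticExpr (Int -> Int)) answerL))
  -- Registered, but used at the wrong type: also only caught at run time.
  print (resolve table (Label "answer" :: StaticExpr Bool))
```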

> 3) this proposal requires changes to linking etc that would really make it
> useful only on systems and deployments that only have Template Haskell AND
> Dynamic linking.  (and also rules out any context where it'd be nice to
> deploy a static app or, say, use CH in iOS!)

I don't know about iOS. And it's very likely that there are contexts
in which this extension doesn't work. But as I said above, you are
always free to define your own DSLs that cover the particular use
case that you have in mind. The nice thing with this particular DSL is
that it requires little to no TH to generate label names, which can
always be a source of bugs, especially when you forget to include them
in the global remote table (which is something that TH doesn't and
can't help you with).

Furthermore, it was my understanding that GHC is heading towards a
world of "dynamically linkable by default", and it is by now something
that is supported on most platforms by GHC. See e.g.

https://ghc.haskell.org/trac/ghc/wiki/DynamicGhcPrograms

There are fairly good solutions to deploy self-contained dynamically
linked apps these days, e.g. Docker. And in any case, with a few extra
flags we can still do away with the dynamic linking requirement on
some (all?) platforms.

> to repeat: have you considered defining an AST type + interpreter for the
> computations you want to send around, and doing that? I think it's a much
> simpler, safer, easier, flexible and PORTABLE approach, though one current
> CH doesn't do (though the folks working on CH seem to be receptive to
> switching to such a strategy if someone validates it)

We have, and it's an option with different tradeoffs. Both solutions
could gainfully live side by side and are in fact complementary. I
contend that the solution described by Facundo has the advantage of
eliminating much of the syntactic overhead associated with sending
references to (higher-order) values across the cluster. We have more
ideas specific to distributed-process which we can discuss in a
separate thread to reduce the syntactic overhead even further, to
practically nothing.

Best,

Mathieu


_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: Static values language extension proposal

Tim Watson
On 25 Jan 2014, at 18:12, Carter Schonwald wrote:

1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt dynamic loading as a userland library.
If you can't make it work as a library and can demonstrate why (or how, even though it works, it's not quite satisfactory), that signals something!


Is that something you'll consider looking at, Mathieu?

There are quite a few industrial Haskell shops that provide products/services where internally they do runtime dynamic loading of user-provided object files, so I'm sure that the core GHC support is there if you actually dig into the APIs! And they do this in a distributed systems context, sans CH.


We have a pull request from Edsko that melds hs-plugins support with static, as per the original proposal's notes, so this seems like a corollary issue to me. 

2) I have a work in progress on speccing out a proper (and sound :) ) static values type extension for GHC that will perhaps be usable in your case (though, by dint of being sound, it will preclude some of the things you think you want). BUT any type system changes need to actually provide safety. My motivation for having a notion of static values comes from a desire to add compiler support for certain numerical computing operations that require compiler support to be usable in Haskell. BUT, much of the same work 


Timescales? There are commercial users of Cloud Haskell clamouring for improvements to the way we handle this situation, and I'm keen to combine getting broader community agreement about "the right thing to do" with facilitating our users' real needs. If there are other options pertaining to "static" support, I'd like to know more!

@tim: what on earth does "sending arbitrary code" mean? I feel like the more precise thing everyone here wants is "for a given application / infrastructure deployment, I would like to be able to send my application-specific computations over the network, using Cloud Haskell, and be sure that both sides think it's the same code".


With Cloud Haskell in its current guise, I can "Closure up" pretty much any thunk I like and spawn it on a remote node. If the nodes are both running the same executable, we're fine. If they're not, we're potentially in trouble.

In Erlang, I can rpc/send *any* term and evaluate it on another node. That includes functions of course. Whether or not we want to be quite that general is another matter, but that is the comparison I've been making.

As for *how* to send an AST fragment, Edward Kmett and others have some pretty nice typed AST models that are easy to adapt and extend for an application-specific use case. Bound http://hackage.haskell.org/package/bound is one nice one.

Here's a really, really good School of Haskell exposition: https://www.fpcomplete.com/user/edwardk/bound

And there's a generalization that supports strong typing that I've copied from an hpaste, https://gist.github.com/cartazio/5727196, where it's notable that the AST data type is called "Remote" :).
I think that's a hint it's meant to be a Haskell-manipulable way of constructing a typed DSL you can serialize using a finally tagless style API approach (i.e., have a set of type class instances/operations that you use to run the computation and/or construct the AST you can send over the wire).


These are all lovely, but aren't we talking about either (a) putting together an AST to represent whatever valid Haskell program someone wants to send, or (b) forcing every application developer to write an AST to cover all their remote computations? Both of those sound like a lot more work than the proposal below. They may be the right approach for some domains, but there is a fair bit of "developer overhead" involved from what I can see.

On Fri, Jan 24, 2014 at 3:19 PM, Mathieu Boespflug <[hidden email]> wrote:
The `static e` form could as well be a piece of Template Haskell, but
making it a proper extension means that the compiler can enforce more
invariants and be a bit more helpful to the user. In particular,
detecting situations where symbolic references cannot be generated
because e.g. the imported packages were not compiled as dynamically linked
libraries. Or seamlessly supporting `static f` on an identifier
`f` that is not exported by the module.


All of which sound like a usability improvement to me.

I very much subscribe to the idea of defining small DSLs for
exchanging code between nodes. And this proposal is compatible with
that idea.

One thing that might not have been so clear in the original email is
that we are proposing here to introduce just *one such DSL*. It's just
that it's a trivial one whose grammar only contains linker symbol
names.


That triviality is a rather important point as well, because...

As it happens, distributed-static today already supports two such
DSLs: a DSL of labels, which are arbitrary string names for
functions, and a small language for composing Static values together.

And whilst those two DSLs are rather simple, it can still be tricky to get things right.

As Facundo explains at the end of his email, the notion of a "static"
value ought to be a more general one than was first envisioned in the
paper: a static value is any closed denotation, denoted in any of a
choice of multiple small languages, some of which ship standard with
distributed-static. The user can define his own DSL for shipping code
around.

Indeed - there's never been anything preventing users from doing this. In fact, sending messages that are "interpreted" by a remote process in order to apply some specific processing is pretty much the MO of all Cloud Haskell code. The "plugins" based support will add to the options there.

> 2) how does it provide more type safety than the current TH based approach?
> (I've seen Tim and others hit very very gnarly bugs in cloud haskell based
> upon the "magic static values" approach).

The type safety of the current TH approach is reasonable I think. One
potential problem comes from managing dynamically typed values in the
remote table, which must be coerced to the right type and use the
right decoders if you don't use TH. With the approach we propose,
there is no remote table, so I guess this should help eliminate a
source of bugs.

And remove a slightly awkward programming model. 


> to repeat: have you considered defining an AST type + interpreter for the
> computations you want to send around, and doing that? I think it's a much
> simpler, safer, easier, flexible and PORTABLE approach, though one current
> CH doesn't do (though the folks working on CH seem to be receptive to
> switching to such a strategy if someone validates it)

We have, and it's an option with different tradeoffs. Both solutions
could gainfully live side by side and are in fact complementary. I
contend that the solution described by Facundo has the advantage of
eliminating much of the syntactic overhead associated with sending
references to (higher-order) values across the cluster. We have more
ideas specific to distributed-process which we can discuss in a
separate thread to reduce the syntactic overhead even further, to
practically nothing.


I agree that the proposal sounds beneficial. It's a good thing that both approaches can live side by side. 

I'd like to hear more about these other ideas too. I'd also like to hear more from the rest of the community - especially Cloud Haskell users. I know a few others besides Parallel Scientific are using Cloud Haskell in commercial applications - I'd very much like to hear from you all on this proposal too.

Cheers,
Tim



Re: Static values language extension proposal

Brandon Allbery
On Sun, Jan 26, 2014 at 1:43 PM, Tim Watson <[hidden email]> wrote:
In Erlang, I can rpc/send *any* term and evaluate it on another node. That includes functions of course. Whether or not we want to be quite that general is another matter, but that is the comparison I've been making.

Note that Erlang gets away with this through being a virtual machine architecture; BEAM is about as write-once-run-anywhere as it gets, and the platform specifics are abstracted by the BEAM VM interpreter. You just aren't going to accomplish this with a native compiled language, without encoding as a virtual machine yourself (that is, the AST-based mechanisms).

Perhaps you should consider fleshing out ghc's current bytecode support to be a full VM? Or perhaps an interesting alternative would be a BEAM backend for ghc.

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net


Re: Static values language extension proposal

Tim Watson
Hi Brandon,

On 26 Jan 2014, at 19:01, Brandon Allbery wrote:

On Sun, Jan 26, 2014 at 1:43 PM, Tim Watson <[hidden email]> wrote:
In Erlang, I can rpc/send *any* term and evaluate it on another node. That includes functions of course. Whether or not we want to be quite that general is another matter, but that is the comparison I've been making.

Note that Erlang gets away with this through being a virtual machine architecture; BEAM is about as write-once-run-anywhere as it gets, and the platform specifics are abstracted by the BEAM VM interpreter. You just aren't going to accomplish this with a native compiled language, without encoding as a virtual machine yourself (that is, the AST-based mechanisms).

Yeah, I do realise this. Of course we're not trying to reproduce the BEAM really, but what we /do/ want to be able to do is exchange messages between nodes that are not running the same executable. The proposal does appear to address this requirement, at least to some extent. There may be complementary (or better) approaches. I believe Carter is going to provide some additional details regarding his work in this area at some point.

Anything that reduces the amount of Template Haskell required to work with Cloud Haskell is a "good thing (tm)" IMO. Not that I mind using TH, but the programming model is currently quite awkward from the caller's perspective, since you've got to (a) create a Static/Closure out of potentially complex chunks of code, which often involves creating numerous top-level wrapper APIs, and (b) fiddle around with the remote-table (both in the code that defines remote-able thunks *and* in the code that starts a node wishing to operate on them).

Also note that this problem isn't limited to sending code around the network. Just sending arbitrary *data* between nodes is currently discouraged (though not disallowed) because the receiving program *might* not understand the types you're sending it. This is very restrictive and the proposal does, at the very least, allow us to safely serialise, send and receive types that both programs "know about" by virtue of having been linked to the same library/libraries. 

But yes - there are certainly constraints and edge cases aplenty here. I'm not entirely sure whether or not we'd need to potentially change the (binary) encoding of raw messages in distributed-process, for example, in response to this change. Currently we serialise a pointer (i.e., the pointer to the fingerprint for the type that's being sent), and I can imagine that not working properly across different nodes running on different architectures etc.
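
One way to sidestep the architecture question for the type tag is to serialise the fingerprint's value rather than a pointer to it. A minimal sketch, assuming both nodes compute the same `Fingerprint` for a given type, which is only likely when they were built with the same compiler and package versions; the helper names and the `Maybe Int` message type are hypothetical, not distributed-process code.

```haskell
module Main where

import Data.Bits (shiftL, shiftR, (.|.))
import Data.Typeable (Proxy (..), typeRep, typeRepFingerprint)
import Data.Word (Word64, Word8)
import GHC.Fingerprint (Fingerprint (..))

-- Split a Word64 into 8 bytes, most significant first, so the byte order
-- on the wire does not depend on the sending machine's endianness.
word64BE :: Word64 -> [Word8]
word64BE w = [fromIntegral (w `shiftR` s) | s <- [56, 48 .. 0]]

-- Reassemble 8 big-endian bytes into a Word64.
fromBE :: [Word8] -> Word64
fromBE = foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Encode the fingerprint's two 64-bit halves by value (16 bytes, no pointers).
encodeFingerprint :: Fingerprint -> [Word8]
encodeFingerprint (Fingerprint hi lo) = word64BE hi ++ word64BE lo

decodeFingerprint :: [Word8] -> Maybe Fingerprint
decodeFingerprint bytes
  | length bytes == 16 =
      let (hi, lo) = splitAt 8 bytes
       in Just (Fingerprint (fromBE hi) (fromBE lo))
  | otherwise = Nothing

-- Type tag for a hypothetical message type.
messageTag :: Fingerprint
messageTag = typeRepFingerprint (typeRep (Proxy :: Proxy (Maybe Int)))

main :: IO ()
main = do
  let wire = encodeFingerprint messageTag
  print (length wire)
  print (decodeFingerprint wire == Just messageTag)
```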

Perhaps you should consider fleshing out ghc's current bytecode support to be a full VM?

After discussing this with Simon M, we concluded there was little point in doing so. The GHC RTS is practically a VM anyway, and there's probably not that much value to be gained by shipping bytecode around. Besides, as you put it, the AST-based mechanisms allow for this anyway (albeit with some coding required on the part of the application developer) and Carter (and others) assure me that the mechanisms required to do this kind of thing already exist. We just need to find the right way to take advantage of them.

Or perhaps an interesting alternative would be a BEAM backend for ghc.


I've talked to a couple of people that want to try this. I'm intrigued, but have other things to focus on. :)

Cheers,
Tim


Re: Static values language extension proposal

Facundo Domínguez-3
To address the concerns about static linking and portability, there is
also the alternative of using the RTS linker on those platforms
that need it.

In many aspects, neither linker makes a big difference to us. We are
going with the system's dynamic linker mainly because the GHC team has
expressed the desire to get rid of the RTS linker.

Using the RTS linker would require addressing some additional
technical issues, none of which appear to be show-stoppers. It would
be just more work.

Best,
Facundo



Re: Static values language extension proposal

Boespflug, Mathieu
In reply to this post by Carter Schonwald
Hi Carter, Tim,

On Sat, Jan 25, 2014 at 7:12 PM, Carter Schonwald
<[hidden email]> wrote:
> anyways
>
> 1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt
> dynamic loading as a userland library.
> If you can't make it work as a library and can demonstrate why (or how, even
> though it works, it's not quite satisfactory), that signals something!

Signals what?

On Sun, Jan 26, 2014 at 7:43 PM, Tim Watson <[hidden email]> wrote:
> Is that something you'll consider looking at, Mathieu?

We would prefer to do it that way, to be honest. As explained in my
previous email, we identified two problems with this approach:

1) User friendliness. It's important for us that Cloud Haskell be
pretty much as user friendly and easy to use as Erlang is.

    a) I don't know that it's possible from Template Haskell to detect
and warn the user when dependent modules have not been compiled into
dynamic object code or into static code with the right flags.

    b)  It's very convenient in practice to be able to send not just
`f` if `f` is a global identifier, but in general `e` where `e` is any
closed expression mentioning only global names. That can easily be
done by having the compiler float the expression `e` to the top-level
and give it a global name. I don't see how to do that in TH in a user
friendly way.

2) A technical issue: you ought to be able to send unexported
functions across the wire, just as you can pass unexported functions
as arguments to higher-order functions. Yet GHC does not create linker
symbols for unexported identifiers, so our approach would break down.
Worse, I don't think that it's even possible to detect in TH whether
an identifier is exported or not, in order to warn the user. One could
imagine a compiler flag to force the creation of linker symbols for
all toplevel bindings, exported or unexported. But that seems
wasteful, and potentially not very user friendly.
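
For comparison, GHC later gained a `StaticPointers` extension (from GHC 7.10) with a `static e` form along these lines, backed by a compiler-generated table of keys rather than system linker symbols. A minimal sketch, assuming a recent GHC and a compiled program: `double` is deliberately unexported, and the closed expression `double . succ` is floated out and given a key by the compiler, with no hand-built remote table.

```haskell
{-# LANGUAGE StaticPointers #-}
-- Only `main` is exported; `double` below is deliberately unexported.
module Main (main) where

import GHC.StaticPtr (StaticPtr, deRefStaticPtr, staticKey, unsafeLookupStaticPtr)

double :: Int -> Int
double x = 2 * x

-- A closed expression mentioning only top-level names. The compiler
-- floats it out and registers it under a key.
composed :: StaticPtr (Int -> Int)
composed = static (double . succ)

main :: IO ()
main = do
  -- The key is a Fingerprint: this is what would travel over the wire.
  let key = staticKey composed
  found <- unsafeLookupStaticPtr key
  case found of
    Nothing  -> putStrLn "no static pointer for this key"
    Just ptr -> print (deRefStaticPtr (ptr :: StaticPtr (Int -> Int)) 20)
```

The lookup is "unsafe" because the caller asserts the type of the value behind the key; a real protocol would want to check that, e.g. by also sending a type fingerprint.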

If the above can be solved, all the better!

If not: we don't always want to touch the compiler, but when we do,
ideally it should be in an unintrusive way. I contend our proposal
fits that criterion. And our cursory implementation efforts seem to
confirm that so far.

> But I really think insisting that the linker symbol names denote the "datum
> agreement" in a distributed system is punting on what should be handled at
> the application level. Simon Marlow put some improvements into GHC to help
> with dynamic code (un)loading; stress test that!

We could use either the system linker or rts linker. Not sure that it
makes any difference at the application level.

> 2) I've a work in progress on specing out a proper (and sound :) ) static
> values type extension for GHC that will perhaps be usable in your case
> (though by dint of being sound, will preclude some of the things you think
> you want).

I look forward to hearing more about that. How is the existing
proposal not (type?) sound?

> BUT, any type system changes need to actually provide safety.

To be clear, this proposal doesn't touch the type checker in any way.

> As for *how* to send an AST fragment, edward kmett and other have some
> pretty nice typed AST models that are easy to adapt and extend for an
> application specific use case. Bound
> http://hackage.haskell.org/package/bound is one nice one.
>
> here's a really, really good School of Haskell exposition
> https://www.fpcomplete.com/user/edwardk/bound

These are nice encodings for AST's. But they don't address how to
minimize the amount of code to ship around the cluster. If you have no
agreement about what functions are commonly available, then the AST
needs to include the code for the function you are sending, + any
functions it depends on, + any of their dependencies, and so on
transitively.

Tim, perhaps the following also answers some of your questions. This
is where the current proposal comes in: if you choose to ship around
AST's, you can minimize their size by having them mention shared
linker symbol names. Mind, that's already possible today, by means of
the global RemoteTable, but it's building that remote table safely,
conveniently, in a modular way, and with static checking that no
symbols from any of the modules that were linked at build time were
missed, that is difficult.

By avoiding a RemoteTable entirely, we avoid having to solve that
difficult problem. :)
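
As a toy illustration of the size argument: if an AST can name functions both sides already share, the message carries only the names. A minimal sketch, where the `shared` map is a hypothetical stand-in for the shared linker symbols and `show`/`read` stand in for a real wire format.

```haskell
module Main where

import qualified Data.Map as Map

-- Hypothetical: names both nodes are assumed to agree on. In the proposal these
-- would be shared linker symbols; here a Map stands in for them.
type SymbolName = String

data Expr
  = Lit Integer
  | Call SymbolName [Expr] -- apply a shared function by name; its code is not shipped
  deriving (Show, Read)

shared :: Map.Map SymbolName ([Integer] -> Integer)
shared = Map.fromList [("sum", sum), ("product", product)]

eval :: Expr -> Either String Integer
eval (Lit n) = Right n
eval (Call name args) =
  case Map.lookup name shared of
    Nothing -> Left ("unknown symbol: " ++ name)
    Just f  -> f <$> traverse eval args

request :: Expr
request = Call "sum" [Lit 1, Call "product" [Lit 4, Lit 10], Lit 1]

main :: IO ()
main = do
  let wire = show request -- the message carries names, not code
  putStrLn wire
  print (eval (read wire))
```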

Best,

--
Mathieu Boespflug
Founder at http://tweag.io.

Re: Static values language extension proposal

Brandon Allbery
On Tue, Jan 28, 2014 at 7:53 AM, Mathieu Boespflug <[hidden email]> wrote:
On Sat, Jan 25, 2014 at 7:12 PM, Carter Schonwald
<[hidden email]> wrote:
> 1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt
> dynamic loading as a userland library.
> If you can't make it work as a library and can demonstrate why (or how, even
> though it works, it's not quite satisfactory), that signals something!

Signals what?

That there is a shortcoming in ghc and/or the rts that needs to be addressed.

--
brandon s allbery kf8nh                               sine nomine associates
[hidden email]                                  [hidden email]
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net


Re: Static values language extension proposal

Tim Watson
In reply to this post by Boespflug, Mathieu
Hi Mathieu,

On 28 Jan 2014, at 12:53, Mathieu Boespflug wrote:
> We would prefer to do it that way, to be honest. As explained in my
> previous email, we identified two problems with this approach:
>
> 1) User friendliness. It's important for us that Cloud Haskell be
> pretty much as user friendly and easy to use as Erlang is.
>

Exactly!

>    a) I don't know that it's possible from Template Haskell to detect
> and warn the user when dependent modules have not been compiled into
> dynamic object code or into static code with the right flags.
>

I don't think that it is, from what I've seen, though I'm by no means an expert.

>    b)  It's very convenient in practice to be able to send not just
> `f` if `f` is a global identifier, but in general `e` where `e` is any
> closed expression mentioning only global names. That can easily be
> done by having the compiler float the expression `e` to the top-level
> and give it a global name. I don't see how to do that in TH in a user
> friendly way.

Agreed.

>
> 2) A technical issue: you ought to be able to send unexported
> functions across the wire, just as you can pass unexported functions
> as arguments to higher-order functions. Yet GHC does not create linker
> symbols for unexported identifiers, so our approach would break down.
> Worse, I don't think that it's even possible to detect in TH whether
> an identifier is exported or not, in order to warn the user. One could
> imagine a compiler flag to force the creation of linker symbols for
> all toplevel bindings, exported or unexported. But that seems
> wasteful, and potentially not very user friendly.

Interesting.

>
> If the above can be solved, all the better!
>
> If not: we don't always want to touch the compiler, but when we do,
> ideally it should be in an unintrusive way. I contend our proposal
> fits that criterion. And our cursory implementation efforts seem to
> confirm that so far.

Good!

>
>> But I really think insisting that the linker symbol names denote the "datum
>> agreement" in a distributed system is punting on what should be handled at
>> the application level. Simon Marlow put some improvements into GHC to help
>> improve doing dynamic code (un)loading, stress test that!
>
> We could use either the system linker or rts linker. Not sure that it
> makes any difference at the application level.

No indeed.

>
>> 2) I've a work in progress on specing out a proper (and sound :) ) static
>> values type extension for ghc, that will be usable perhaps in your case
>> (though by dint of being sound, will preclude some of the things you think
>> you want).
>
> I look forward to hearing more about that.

+1

> How is the existing proposal not (type?) sound?
>

I'd like to hear more about the concerns too.

>> As for *how* to send an AST fragment, Edward Kmett and others have some
>> pretty nice typed AST models that are easy to adapt and extend for an
>> application-specific use case. Bound
>> http://hackage.haskell.org/package/bound is one nice one.
>>
>> Here's a really, really good School of Haskell exposition:
>> https://www.fpcomplete.com/user/edwardk/bound
>
> These are nice encodings for ASTs. But they don't address how to
> minimize the amount of code to ship around the cluster. If you have no
> agreement about what functions are commonly available, then the AST
> needs to include the code for the function you are sending, plus any
> functions it depends on, plus any of their dependencies, and so on
> transitively.

That was precisely my concern with the idea of shipping *something* AST-like around. It's a lot of overhead for every application you want to develop, or a *massive* overhead to cover all bases.

>
> Tim, perhaps the following also answers some of your questions. This
> is where the current proposal comes in: if you choose to ship around
> ASTs, you can minimize their size by having them mention shared
> linker symbol names.

Indeed, that does seem to simplify things.

> Mind, that's already possible today, by means of
> the global RemoteTable, but building that remote table safely,
> conveniently, in a modular way, and with static checking that no
> symbols from any of the modules linked at build time were
> missed is difficult.
>

Yep. It's awkward, and when you get it wrong, you're either fighting with TH-obscured compiler errors or, worse, the damn thing just doesn't work (because you can't decode properly on the remote node and things just crash or, worse still, just hang waiting for the *correct* input types, which never arrive because they're not "known" to the RTS).

> By avoiding a RemoteTable entirely, we avoid having to solve that
> difficult problem. :)

Not having a RemoteTable sounds like a plus to me.
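The trade-off described above can be sketched in a few lines. This is a minimal, hypothetical model, not Cloud Haskell's actual RemoteTable API: a shipped expression mentions shared functions by name (`Call`) instead of carrying their code, and the receiver resolves those names in a hand-built table. Nothing checks statically that every name a sender might use was registered, so a forgotten entry only surfaces at runtime. The `Expr` type, the names `Main.double` and `Main.triple`, and `show`/`read` as the wire format are assumptions made for illustration.

```haskell
import qualified Data.Map as Map

-- Hypothetical expression language shipped between nodes. Shared top-level
-- functions are referred to by name (Call) rather than by their code.
data Expr
  = Lit Int
  | Add Expr Expr
  | Call String Expr -- apply a named, shared Int -> Int function
  deriving (Show, Read)

-- Functions that exist in every process running the same binary.
double, triple :: Int -> Int
double = (* 2)
triple = (* 3)

-- Hand-built table from names to functions, playing the role of a
-- RemoteTable (simplified to a single function type). Every node must
-- build it identically, and nothing checks that every name a sender might
-- use was registered: "Main.triple" was forgotten here, which only shows
-- up at runtime.
type Table = Map.Map String (Int -> Int)

table :: Table
table = Map.fromList [("Main.double", double)]

-- Evaluate a received expression, resolving names in the table.
eval :: Table -> Expr -> Either String Int
eval _ (Lit n) = Right n
eval t (Add a b) = (+) <$> eval t a <*> eval t b
eval t (Call name arg) = case Map.lookup name t of
  Nothing -> Left ("unknown global: " ++ name)
  Just fn -> fn <$> eval t arg

-- Sender and receiver sides, with show/read standing in for a wire format.
encodeExpr :: Expr -> String
encodeExpr = show

decodeExpr :: String -> Expr
decodeExpr = read
```

The shipped value `Add (Call "Main.double" (Lit 20)) (Lit 2)` stays small however large `double` is, and it evaluates to `Right 42` with the table above; `Call "Main.triple" (Lit 1)` evaluates to `Left "unknown global: Main.triple"` even though `triple` exists in the binary.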

Cheers,
Tim


_______________________________________________
Glasgow-haskell-users mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: Static values language extension proposal

Carter Schonwald
In reply to this post by Brandon Allbery
There's actually a missing piece of information in this thread: what are the example computations that are being sent?
My understanding is that Erlang has no way to send file handles, shared variables, TVars, MVars, memory-mapped binary files, GPU code / memory pointers, and other fun unportable things between nodes, and I don't really expect / see how we can hope to sanely do that in Haskell!

Point in fact, even when restricted to "exactly the same binary, running on a cluster of homogeneous machines with the exact same hardware, with a modern Linux distro", you hit some gnarly problems doing this for arbitrary closures! It's for a very simple (and fun) reason: address randomization!

Nathan Howell was actually doing some experimentation with one strategy for this special case here https://github.com/alphaHeavy/vacuum-tube as a deeply RTS-twiddling bit of hackery, so you could in fact "serialize arbitrary closures" between homogeneous machines running the exact same code (and with address randomization disabled too, I think).

On the GHC API front, http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/DynamicLoading.html along with (more appropriately) http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html should actually give enough basic tooling to make this possible as a userland library. Mind you, unload was recently fixed up in HEAD by Simon Marlow to support the dynamic code loading / unloading use case he has at Facebook. Point being, the GHC 7.8 version of the ObjLink API should actually give enough support tooling to prototype this idea in user land, and that, plus better support for writing "direct Haskell code" and getting out both a local computation and an AST we can serialize, would probably be a good set of primitives for making this feasible in user land.

The meat of my point is 1) "yes, I want this too", but also 2) one thing I have really come to appreciate about how GHC is engineered is that a lot of work is done to provide the "right" primitives so that really, really great tools can be built in user land. I think that the goal of this proposal can be accomplished quite nicely with the ObjLink module, unless I'm not understanding something. In fact, because in general not every computation will be properly serializable, you need not even bother with tracking an explicit symbol table on each side: just try to load it at a given type, and if it fails, it wasn't there!

The point being, linkers are a thing, GHC exposes an API for linking, have you tried that API? http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html

On Tue, Jan 28, 2014 at 10:21 AM, Brandon Allbery <[hidden email]> wrote:
> On Tue, Jan 28, 2014 at 7:53 AM, Mathieu Boespflug <[hidden email]> wrote:
>> On Sat, Jan 25, 2014 at 7:12 PM, Carter Schonwald
>> <[hidden email]> wrote:
>> > 1) you should (once 7.8 is out) evaluate how far you can push your ideas wrt
>> > dynamic loading as a user land library.
>> >  If you can't make it work as a library and can demonstrate why (or how, even
>> > though it works, it's not quite satisfactory), that signals something!
>>
>> Signals what?
>
> That there is a shortcoming in ghc and/or the rts that needs to be addressed.
>
> --
> brandon s allbery kf8nh                               sine nomine associates
> [hidden email]                                  [hidden email]
> unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net



Re: Static values language extension proposal

Facundo Domínguez-3
Hello Carter,
  Thanks for the links. IIUC the ObjLink module contains an interface
to the RTS linker. The points raised by Mathieu in his last email as
(1a), (1b) and (2) still hold.

Here's a use case for (2):

module Communicate(run)

import Control.Distributed.Process

f :: Int -> Int
f = id

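runSend :: Process ()
runSend = send someone (static f)  -- someone: placeholder for the receiver's ProcessId

runExpect :: Int -> Process Int
runExpect n = fmap (($ n) . unstatic) expect  -- receive a StaticRef (Int -> Int), apply it to n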

Any program that uses runExpect would fail at runtime: it would not
find `f`, because `f` is not exported and therefore no symbol for it
appears in the object files.
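For readers today: GHC 7.10 and later ship a StaticPointers extension along the lines discussed in this thread. The following is a minimal sketch of the use case above with it, assuming GHC 7.10 or newer and a compiled (not GHCi) program. `f` is changed to `(+ 1)` so the result is visible, and `keyOfF`, `keyOfLambda` and `applyByKey` are hypothetical names. Only the key would travel between processes, and no linker symbol for the unexported `f` is needed.

```haskell
{-# LANGUAGE StaticPointers #-}
-- No module header: this is module Main, which exports only main, so f is
-- not exported, as in the use case above.
import GHC.StaticPtr (StaticKey, deRefStaticPtr, staticKey, unsafeLookupStaticPtr)

f :: Int -> Int
f = (+ 1)

-- Sender side: the key (a fingerprint) is what would be sent over the wire.
keyOfF :: StaticKey
keyOfF = staticKey (static f)

-- A closed expression can be made static too; the compiler lifts it to
-- the top level and gives it its own key.
keyOfLambda :: StaticKey
keyOfLambda = staticKey (static (\x -> x * 2 :: Int))

-- Receiver side: look the key up in this binary's static pointer table.
-- The caller chooses the type (Int -> Int here) and it is not checked,
-- hence "unsafe". Nothing means the key is unknown to this binary.
applyByKey :: StaticKey -> Int -> IO (Maybe Int)
applyByKey key n = do
  mptr <- unsafeLookupStaticPtr key
  pure (fmap (\ptr -> deRefStaticPtr ptr n) mptr)
```

Adding `main = applyByKey keyOfF 41 >>= print` completes the program, which then prints `Just 42`.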

Modifying the compiler is superior to every workaround we could think
of for solving this problem with a library. Any suggestions?

Best,
Facundo

On Tue, Jan 28, 2014 at 3:03 PM, Carter Schonwald
<[hidden email]> wrote:
> The point being, linkers are a thing, GHC exposes an API for linking, have
> you tried that API?
> http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html

Re: Static values language extension proposal

Facundo Domínguez-3
Excuse me, the module export list was meant to be

> module Communicate(runExpect, runSend) where

Facundo

On Tue, Jan 28, 2014 at 5:13 PM, Facundo Domínguez
<[hidden email]> wrote:
> Here's a use case for (2):
>
> module Communicate(run)

Re: Static values language extension proposal

Erik de Castro Lopo-34
In reply to this post by Mathieu Boespflug
Mathieu Boespflug wrote:

> [Sorry for the multiple reposts - couldn't quite figure out which
> email address doesn't get refused by the list..]
>
>
> Hi Carter,
>
> thank you for the good points you raise. I'll try and address each of
> them as best I can below.
>
> > 0) I think you could actually implement this proposal as a userland library,
> > at least as you've described it. Have you tried doing so?
>
> Indeed, this could be done without touching the compiler at all.

We had this response really early on in this discussion.

Quite honestly I think that should have been the end of the discussion.

The existing GHC developers already have a huge workload getting releases
out the door, and adding to that workload without adding manpower and
resources would be a bad idea.

You really should try doing this as a library outside of GHC and if GHC
needs a few small additional features, they can be added.

> The `static e` form could as well be a piece of Template Haskell, but
> making it a proper extension means that the compiler can enforce more
> invariants and be a bit more helpful to the user.

Once it works outside GHC and has proven useful, then it might be worthwhile
to add small, specific, easily testable/maintainable features to GHC to support
what goes on in your library.

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

Re: Static values language extension proposal

Jost Berthold
In reply to this post by Carter Schonwald
On 01/28/2014 06:03 PM, Carter Schonwald wrote:

> There's actually a missing piece of information in this thread: what
> are the example computations that are being sent? My understanding is
> that Erlang has no way to send file handles, shared variables,
> TVars, MVars, memory-mapped binary files, GPU code / memory pointers,
> and other fun unportable things between nodes, and I don't really
> expect / see how we can hope to sanely do that in Haskell!
>
> [...]"exactly the same binary, running on a cluster of homogeneous
> machines with the exact same hardware, with a modern linux distro "
> [...]
>
> Nathan Howell was actually doing some experimentation with one
> strategy for this special case here
> https://github.com/alphaHeavy/vacuum-tube  as a deeply rts twiddling
> bit of hackery so you could in fact "serialize arbitrary closures"
> between homogeneous machines running the exact same code (and with
> address randomization disabled too, I think)

When mentioning Nathan's approach (based on foreign primops), let me
point to a more complete, RTS-backed implementation: work done by myself,
itself based on long-standing runtime support for a parallel
Haskell on distributed-memory systems.
The latest instance of this RTS-based serialisation was reported at the
Haskell Implementors' Workshop 2013
(www.haskell.org/wikiupload/2/28/HIW2013PackingAPI.pdf); code is on
GitHub (https://github.com/jberthold/rts-serialisation).

Some technical remarks:
-Nathan's primop approach is awesome, but it is not easy to get its
interplay with garbage collection right. It is on my list to take a look
at this code again and see how far we can push the envelope.

-About address randomisation: the RTS-based serialisation handles it by
using locations relative to a known offset. A more concerning detail is
that CAFs must be reverted rather than discarded during GC (currently
they are just retained, which is not satisfactory for long-running code).

-About sending arbitrary closures: indeed it does not make any sense to
transfer MVars and IORefs (file handles, StablePtrs, etc.). My approach
is to solve this dynamically by exception handling. I can imagine that
there is a sensible combination of RTS support with a suitable type
class framework (Static, for one), but lazy evaluation, especially lazy
I/O, complicates matters.
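The dynamic alternative described in the last point can be illustrated with a toy model. This is not the RTS packing code: `HeapObj`, `CannotPack`, `pack` and `trySend` are hypothetical, and `MVarObj` stands for any process-local object. Packing fails with an exception when such an object is reached, and, in line with the remark about lazy evaluation, the whole packet has to be forced before it is known to be sendable.

```haskell
import Control.Exception (Exception, evaluate, throw, try)

-- Hypothetical, heavily simplified model of the heap graph reachable from
-- a value. MVarObj stands for anything process-local (MVar, IORef, Handle,
-- StablePtr) that cannot be packed.
data HeapObj = IntObj Int | ConsObj HeapObj HeapObj | NilObj | MVarObj

newtype CannotPack = CannotPack String deriving (Eq, Show)

instance Exception CannotPack

-- Flatten a graph into tokens, failing dynamically (by throwing) when a
-- process-local object is reached.
pack :: HeapObj -> [String]
pack (IntObj n) = ["I" ++ show n]
pack NilObj = ["N"]
pack (ConsObj a b) = "C" : pack a ++ pack b
pack MVarObj = throw (CannotPack "MVar")

-- Because pack is lazy, the whole packet is forced inside try; otherwise
-- the exception could escape later, e.g. while the packet is being sent.
trySend :: HeapObj -> IO (Either CannotPack [String])
trySend obj = try $ do
  let packet = pack obj
  _ <- evaluate (sum (map length packet))
  pure packet
```

Here `trySend (ConsObj (IntObj 1) NilObj)` yields `Right ["C","I1","N"]`, while `trySend (ConsObj (IntObj 1) MVarObj)` yields `Left (CannotPack "MVar")`. The contrast with a type-class approach is that the failure is discovered only when packing is attempted.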

/ Jost Berthold

Re: Static values language extension proposal

Boespflug, Mathieu
In reply to this post by Carter Schonwald
Hi Carter,

On Tue, Jan 28, 2014 at 6:03 PM, Carter Schonwald
<[hidden email]> wrote:
> There's actually a missing piece of information in this thread: what are the
> example computations that are being sent?

Quite simply, the same as those considered in the original Cloud
Haskell paper, that already advocates the extension that Facundo's
first email merely fleshed out a tiny bit. Here's the link once again:

"Towards Haskell in the Cloud", Jeff Epstein, Andrew P. Black, and
Simon Peyton-Jones (2011).
http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote.pdf

We are emphatically not considering "arbitrary closures" as you say
below, any more than the original paper does. As such...

> My understanding is that Erlang has no way to send file handles, shared
> variables, TVars, MVars, memory-mapped binary files, GPU code / memory
> pointers, and other fun unportable things between nodes, and I don't really
> expect / see how we can hope to sanely do that in Haskell!

... the above is completely impossible. The original paper explains
why this is so (see Sections 2.3 and 5.1). Here's the gist:

1. you can only send remotely serializable values, i.e. that have an
instance of class Serializable.
2. none of the above have a Serializable instance, and are hence not
"send"-able.

When it comes to sending closures capturing any of the above types of
values, the reasoning goes like this:

3. a closure in the sense of CH is a pair of a static value and an environment,
4. a closure can only be sent if it is serializable,
5. a closure is serializable only if its environment can be serialized,
6. its environment can be serialized only if all free variables of the
closure can,
7. none of the above have a Serializable instance,
8. hence any closure capturing file handles, MVars, memory pointers,
etc. cannot be sent.
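The chain of reasoning above can be modelled in plain Haskell. In this sketch, which is loosely based on the paper's description and is not Cloud Haskell's real API, a closure is the name of a static decoder plus an encoded environment, and building one requires a `Serializable` environment. `Serializable`, `Closure`, `mkClosure`, `addK`, `decoders` and `unClosure` are hypothetical names, and `String` stands in for a byte string. Passing an `MVar Int` as the environment would be rejected at compile time, simply because the sketch gives `MVar` no instance.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-in for the Serializable class, with String as the
-- wire format. Types such as MVar or Handle get no instance here.
class Serializable a where
  encode :: a -> String
  decode :: String -> Maybe a

instance Serializable Int where
  encode = show
  decode s = case reads s of
    [(n, "")] -> Just n
    _ -> Nothing

-- A closure: the name of a static (top-level) decoder plus the encoded
-- environment. The phantom parameter records the type of the rebuilt value.
data Closure a = Closure String String

-- Building a closure demands a Serializable environment; this constraint
-- is what keeps MVars, handles, etc. out of closures.
mkClosure :: Serializable env => String -> env -> Closure a
mkClosure label e = Closure label (encode e)

-- Static, top-level function present in every process of the binary:
-- rebuilds \x -> x + k from the captured environment k.
addK :: Int -> (Int -> Int)
addK k = (+ k)

-- Receiver-side table from static names to decoders, specialised to
-- Int -> Int closures to keep the sketch short.
decoders :: Map.Map String (String -> Maybe (Int -> Int))
decoders = Map.fromList [("Main.addK", fmap addK . decode)]

-- Rebuild a received closure; Nothing if the name is unknown or the
-- environment fails to decode.
unClosure :: Closure (Int -> Int) -> Maybe (Int -> Int)
unClosure (Closure label env) = Map.lookup label decoders >>= \rebuild -> rebuild env
```

For example, `mkClosure "Main.addK" (5 :: Int) :: Closure (Int -> Int)` is unpacked by the receiver into a function that adds 5.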

> Point in fact, even when restricted to "exactly the same binary, running on
> a cluster of homogeneous machines with the exact same hardware, with a
> modern Linux distro", you hit some gnarly problems doing this for arbitrary
> closures! It's for a very simple (and fun) reason: address randomization!

Which is why neither we nor the original paper considered using
addresses as labels for static values. We use linker labels, which are
stable.

> On the GHC API front,
> http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/DynamicLoading.html
> along with (more appropriately)
> http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html
> should actually give enough basic tooling to make this possible as a
> userland library. Mind you, unload was recently fixed up in HEAD by Simon
> Marlow to support the dynamic code loading / unloading use case he has at
> Facebook. Point being, the GHC 7.8 version of the ObjLink API should
> actually give enough support tooling to prototype this idea in user land,
> and that, plus better support for writing "direct Haskell code" and getting
> out both a local computation and an AST we can serialize, would probably be a
> good set of primitives for making this feasible in user land.

For the third time: we can of course use any linker API that the
system or the compiler happens to provide, so long as it allows
resolving linker symbols to Haskell values. The (small) extension
under consideration does not replace or add to any existing linker
API. It just transparently floats closed expressions to the top-level,
makes sure linker symbols will exist at runtime (they currently don't
always do),  and does some basic sanity checks so the user doesn't
lose his.

I listed problems labeled 1a), 1b) and 2) in my previous email. You
still haven't shown us how to address those in pure TH userland.

> In fact, because in general not every computation
> will be properly serializable, you need not even bother with tracking an
> explicit symbol table on each side: just try to load it at a given type, and
> if it fails, it wasn't there!
>
> The point being, linkers are a thing, GHC exposes an API for linking, have
> you tried that API?
> http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.3/ObjLink.html

Yes we have. But I don't see how using it or not using it makes any
difference to the user interface of the proposed compiler extension.
It's an implementation detail with tradeoffs that Facundo could
explain in detail in GHC ticket #8711 if you hadn't rudely closed it
as a "duplicate" of some future and unspecified work of yours.

Best,

Mathieu