Explicit approach to lazy effects For probability monad?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Explicit approach to lazy effects For probability monad?

Benjamin Redelings

Hi,

I'm wondering if people could point me to some background on explicitly representing lazy effects in Haskell?  The only effect I am concerned about at this point is `seq`.  If there is a way to do this with monads, that would be great.  (I know that a lot of people hate lazy effects.  However, I think this is an orthogonal issue to the question of how to *represent* lazy effects explicitly.)

As background, I'm using a lazy probability monad where random sampling is done with unsafeInterleaveST/IO (or something similar), so that random variables are only sampled if they are actually used:

model = do
   x <- sample $ normal 0 1            -- the interpreter does `unsafeInterleaveIO` to implement 'sample'
   cond <- sample $ bernoulli 0.5
   let y = if cond == 1 then x else 0
   return y

Here, x is only sampled if cond==1.  This has a lot of benefits in procedures like Markov chain Monte Carlo.

I initially thought that I could model lazy effects by mapping a -> (a,state) as in the ST monad.  However, it seems  like lazy effects basically require allowing forked state threads to join CONDITIONALLY on if a value is used/forced, and I don't see how to do that without a source-to-source transformation of functions. It seems like lazy effects require tracking state within each closure, and not globally within the monad interpreter.

It kind of seems like this can't work if >>= has type 'm a -> (a -> m b) -> mb', assuming m a = (a,state).  The second argument needs to have type (m a -> m b) instead of (a -> m b), since the computation needs to have access to the state produced by the first "m a" computation, in order to (optionally) depend on the state thread for first computation.

Does this make any sense?  I don't know the literature in this area.

Q1. Is there a nice way of representing lazy effects that someone could point me to?

Q2. Alternatively, is there a proof that either

  (i) there is not a monadic way to represent lazy effects, or

  (ii) fmap f requires a source-to-source transformation (for example to explicitly join state threads on strict operations like ($) and case).

-BenRI

P.S. As even more background, I'm using a probability monad that allows you to e.g. generate an infinite list of random variables, and only instantiate the ones that are looked at:

model = do
  n <- sample $ geometric 0.5
  ys <- sample $ list (repeat $ normal 0.0 1.0)
  zs <- take n ys
  return (sum zs)

Here the number of random variables (n) can change over time, and I ultimately want effects to happen when they are instantiated.  There is a brief overview at http://www.bali-phy.org/models.php


_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Explicit approach to lazy effects For probability monad?

Sandra Dylus
Hi,

On 15. Oct 2019, at 16:59, Benjamin Redelings <[hidden email]> wrote:

> I'm wondering if people could point me to some background on explicitly representing lazy effects in Haskell?  The only effect I am concerned about at this point is `seq`.  If there is a way to do this with monads, that would be great.  (I know that a lot of people hate lazy effects.  However, I think this is an orthogonal issue to the question of how to *represent* lazy effects explicitly.)

if you’re explicitly interested in sharing computations (rather than only modelling non-strictness) then the following approach by Fisher, Kiselyov and Shan might be of interest.

http://homes.sice.indiana.edu/ccshan/rational/S0956796811000189a.pdf (Purely functional lazy nondeterministic programming)

The are modelling the functional logic language Curry, but have also some remarks about modelling a lazy probabilistic language with their approach.
If you’re not interested in the sharing part of laziness, the paper might be a good first starting point nonetheless. They use a deep monadic embedding that you can use to model non-strictness. Other papers that use such an encoding are the following.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.192.7153&rep=rep1&type=pdf#page=8 (Transforming Functional Logic Programs into Monadic Functional Programs)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.134.9706&rep=rep1&type=pdf (Verifying Haskell Programs Using Constructive Type Theory)

Best regards
Sandra
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Explicit approach to lazy effects For probability monad?

Olaf Klinke
In reply to this post by Benjamin Redelings
Hi Benjamin,

Your example code seems to deal with two distinct types:
The do-notation is about the effects monad (on the random number generator?) and the `sample` function pulls whatever representation you have for an actual probability distribution into this effect monad. In my mental model, the argument to `sample` represents a function Double -> x that interprets a number coming out of the standard random number generator as an element of type x.
I suggest to consider the following two properties of the mathematical probability monad (a.k.a. the Giry monad), which I will use as syntactic re-write rules in your modeling language.

The first property is Fubini's Theorem. In Haskell terms it says that for all f, a :: m x and b :: m y the two terms

do {x <- a; y <- b; f x y}
do {y <- b; x <- a; f x y}

are semantically equivalent. (For state monads, this fails.) Monads where this holds are said to be commutative. If you have two urns, then drawing from the left and then drawing from the right is the same as first drawing from the right and then drawing from the left. Using Fubini, we can swap the first two lines in your example:

model = do
   cond <- bernoulli 0.5
   x <- normal 0 1
   return (if cond == 1 then x else 0)

This desugars to

bernoulli 0.5 >>= (\cond -> normal 0 1 >>= (\x -> return (if cond == 1 then x else return 0)))
bernoulli 0.5 >>= (\cond -> fmap (\x -> if cond == 1 then x else 0) (normal 0 1))

The second property is a kind of lazyness, namely

    fmap (const x) and return are semantically equivalent.

which holds for mathematical distributions (but not for state monads). Now one could argue that in case cond == 0 the innermost function is constant in x, in which case the whole thing does not depend on the argument (normal 0 1). The Lemma we need here is semantic equivalence of the following two lambda terms.

\cond -> \x -> if cond == 1 then x else 0
\cond -> if cond == 1 then id else const 0

If the above is admissible then the following syntactic transformation is allowed:

model = do
   cond <- bernoulli 0.5
   if cond == 1 then normal 0 1 else return 0

which makes it obvious that the normal distribution is only sampled when needed. But I don't know whether you would regard this as the same model. Notice that I disregarded your `sample` function. That is, my monadic language is the monad of probabilites, not the monad of state transformations of a random number generator. Maybe you can delay using the random number generator until the very end? I don't know the complete set of operations your modeling language sports. If that delay is possible, then maybe you can use a monad that has the above two properties (e.g. a reader monad) and only feed the random numbers to the final model. As proof of concept, consider the following.

type Model = Reader Double
model :: Model Int
model = do
  x <- reader (\r -> last [1..round (recip r)])
  cond <- reader (\r -> r > 0.5)
  return (if cond then x else 0)

runReader model is fast for very small inputs, which would not be the case when the first line was always evaluated.

Cheers,
Olaf
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Explicit approach to lazy effects For probability monad?

Olaf Klinke
In reply to this post by Benjamin Redelings
> It kind of seems like this can't work if >>= has type 'm a -> (a -> m b)
> -> mb', assuming m a = (a,state).  The second argument needs to have
> type (m a -> m b) instead of (a -> m b), since the computation needs to
> have access to the state produced by the first "m a" computation, in
> order to (optionally) depend on the state thread for first computation.

My earlier suggestion of using the pure reader monad is not useful for two reasons:
(1) A single random number is shared among all sub-computations, which destroys statistical independence.
(2) Multi-dimensional distributions arguably require more than a single floating point number to parametrize.

Maybe the following two monads hint at a better way, although none of them exhibits the two desirable properties I suggested in my first reply. It seems that both exhibit the sequential nature that you want to avoid, if the underlying monad m is sequential (i.e. stateful).

(a) The free monad over the reader functor.

type Distribution a
   = Free ((->) Double) a
   = Pure a | Impure (r -> Free ((->) Double) a)
This is a reader monad of functions
(Double,Double,...,Double) -> a
where the tuple length may be any finite number.
-- @choose p@ chooses the first distribution with probability @p@.
choose :: Double -> Distribution a -> Distribution a -> Distribution a
choose p x y = Impure (\r -> if r <= p then x else y)
-- Feed a random number generator into a distribution.
-- You get to decide which monad that is.
sample :: Monad m => m r -> Free ((->) r) a -> m a
sample _ (Pure a) = pure a
sample gen (Impure f) = (sample gen) =<< (fmap f gen)

(b) A reader monad which passes copies of the random number generator around, not the generator state. Again, you get to choose which monad the random number generator lives in.

type DistrT m a
   = ReaderT (m Double) m a
   ~ m Double -> m a
-- 'chooseT' queries the random number generator once
-- and then passes it down to either of the two choices.
chooseT :: Monad m => Double -> DistrT m a -> DistrT m a -> DistrT m a
chooseT p x y = ReaderT (\gen -> gen >>= (\r -> if r <= p then runReaderT x gen else runReaderT y gen))
sampleT = runReaderT :: DistrT m a -> m Double -> m a

Apologies that I can not be any more helpful at the moment. But the topic really interests me, please do post any progress on haskell-cafe.

Olaf
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Explicit approach to lazy effects For probability monad?

Benjamin Redelings
In reply to this post by Olaf Klinke

Hi Olaf,

Thanks for your reply!  I think I was unclear about a few things:

1. Mainly, I am _assuming_ that you can implement a lazy probability monad while ignoring random number generators.  So, monad should be commutative, and should have the second property of laziness that you mention.

(As an aside, I think the normal way to do this is to implement a function that splits the random number generator whenever you perform a lazy random computation using function: split :: g -> (g,g).  My hack is to present that we have a hardware instruction that generates true random numbers, and then put that in the IO Monad, and then use unsafeInterLeaveIO.  However, I would have to think more about this.)

2. My question is really about how you can represent side effects in a lazy context.  Thus the monad would be something like

EffectMonad = (a, Set Effect), 

where Effect is some ADT that represents effects.  Each effect represents some action that can be undone, such as registering a newly created random variable in the list of all random variables.

This seems to be easy in a strict context, because you can change a function a->b that has effects into a-> EffectMonad b.  Then your interpreter just needs to modify the global state to add the effects from the

interpreter state1 (x <<= y) = let (result1,effects1) = interpreter state1 x
                                  state2 = state1 `union` effects1
                               in interpreter state2 (y result1)

However, with a lazy language I think this does not work, because we do not want to include "effects1" unless the "result1" is actually consumed.

In that context, I think that a function (a->b) would end up becoming

EffectMonad a -> EffectMonad (EffectMonad b)

The argument 'a' changes to EffectMonad 'a' because the function itself (and not the interpreter) must decide whether to include the effects of the input into the effects of the output.  The output changes to EffectMonad (EffectMonad b) so that the result is still of type (EffectMonad b) after the result is unwrapped.

Does that make more sense?

-BenRI




On 10/17/19 4:03 PM, Olaf Klinke wrote:
Hi Benjamin, 

Your example code seems to deal with two distinct types: 
The do-notation is about the effects monad (on the random number generator?) and the `sample` function pulls whatever representation you have for an actual probability distribution into this effect monad. In my mental model, the argument to `sample` represents a function Double -> x that interprets a number coming out of the standard random number generator as an element of type x. 
I suggest to consider the following two properties of the mathematical probability monad (a.k.a. the Giry monad), which I will use as syntactic re-write rules in your modeling language. 

The first property is Fubini's Theorem. In Haskell terms it says that for all f, a :: m x and b :: m y the two terms

do {x <- a; y <- b; f x y}
do {y <- b; x <- a; f x y}

are semantically equivalent. (For state monads, this fails.) Monads where this holds are said to be commutative. If you have two urns, then drawing from the left and then drawing from the right is the same as first drawing from the right and then drawing from the left. Using Fubini, we can swap the first two lines in your example: 

model = do
   cond <- bernoulli 0.5
   x <- normal 0 1
   return (if cond == 1 then x else 0)

This desugars to

bernoulli 0.5 >>= (\cond -> normal 0 1 >>= (\x -> return (if cond == 1 then x else return 0)))
bernoulli 0.5 >>= (\cond -> fmap (\x -> if cond == 1 then x else 0) (normal 0 1))

The second property is a kind of lazyness, namely 

    fmap (const x) and return are semantically equivalent.

which holds for mathematical distributions (but not for state monads). Now one could argue that in case cond == 0 the innermost function is constant in x, in which case the whole thing does not depend on the argument (normal 0 1). The Lemma we need here is semantic equivalence of the following two lambda terms. 

\cond -> \x -> if cond == 1 then x else 0
\cond -> if cond == 1 then id else const 0

If the above is admissible then the following syntactic transformation is allowed: 

model = do
   cond <- bernoulli 0.5
   if cond == 1 then normal 0 1 else return 0

which makes it obvious that the normal distribution is only sampled when needed. But I don't know whether you would regard this as the same model. Notice that I disregarded your `sample` function. That is, my monadic language is the monad of probabilites, not the monad of state transformations of a random number generator. Maybe you can delay using the random number generator until the very end? I don't know the complete set of operations your modeling language sports. If that delay is possible, then maybe you can use a monad that has the above two properties (e.g. a reader monad) and only feed the random numbers to the final model. As proof of concept, consider the following.

type Model = Reader Double
model :: Model Int
model = do
  x <- reader (\r -> last [1..round (recip r)])
  cond <- reader (\r -> r > 0.5)
  return (if cond then x else 0)

runReader model is fast for very small inputs, which would not be the case when the first line was always evaluated. 

Cheers,
Olaf

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Explicit approach to lazy effects For probability monad?

Benjamin Redelings
In reply to this post by Sandra Dylus

Hi Sandra,

Thank you for the references!


On 10/15/19 11:10 AM, Sandra Dylus wrote:
if you’re explicitly interested in sharing computations (rather than only modelling non-strictness) then the following approach by Fisher, Kiselyov and Shan might be of interest.

http://homes.sice.indiana.edu/ccshan/rational/S0956796811000189a.pdf (Purely functional lazy nondeterministic programming)

The are modelling the functional logic language Curry, but have also some remarks about modelling a lazy probabilistic language with their approach.
If you’re not interested in the sharing part of laziness, the paper might be a good first starting point nonetheless. They use a deep monadic embedding that you can use to model non-strictness. Other papers that use such an encoding are the following.

Their function 'share :: m a -> m (m a)' is interesting: I think the double wrapping m (m a) suggests how one could handle lazy effects in a monad.

A function (a->b) would get lifted to (m a -> m (m b)).

(i) the input changes from a to (m a) so that the function itself can decide whether to include the effect of the input into the output

(ii) the output changes from b to m (m b) so that it still has type (m b) after being unwrapped by the monad.

For example, if g takes type (m b) as input, then:

do 
  x <- f args   -- if f has return type (m b) then x has type b
  y <- g x      -- but x needs to have type (m b) here
  return y

-BenRI
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.192.7153&rep=rep1&type=pdf#page=8 (Transforming Functional Logic Programs into Monadic Functional Programs)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.134.9706&rep=rep1&type=pdf (Verifying Haskell Programs Using Constructive Type Theory)

Best regards
Sandra

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Explicit approach to lazy effects For probability monad?

Olaf Klinke
In reply to this post by Benjamin Redelings
Benjamin,

I believe that with the right monad, you won't need to think about side-effects, at least not the side-effects that are manual fiddling with registering variables. Haskell should do that for you. It seems to me what you really need is a strictness analyzer together with the appropriate re-write rules that push the call to monadic actions as deep into the probabilistic model as possible. But initially you said you want to avoid source-to-source translations.

Indeed your mention of splitting the random number generator seems to buy laziness, judging by the documentation of MonadInterleave in the MonadRandom package. But looking at the definition of interleave for RandT you can see that the monadic computation is still executed, only the random number generator state is restored afterwards. In particular, side effects of the inner monad are always executed. (Or was I using it wrong?) Hence from what I've seen my judgement is that state transformer monads are a dead end.

The idea with changing a -> m b to m a -> m (m b) seemed promising as well. You can easily make a Category instance for the type

C a b = m a -> m (m b)

so that composition of arrows in C does not necessarily execute the monadic computation. However, functions of the Arrow instance (e.g. 'first') must look at the input monad action in order to turn C a b into C (a,c) (b,c), at least I could not think of another way.

Sorry for writing so destructive posts. In mathematics it is generally easier to find counterexamples than to write proofs.

Olaf
 

> Am 07.11.2019 um 17:56 schrieb Benjamin Redelings <[hidden email]>:
>
> Hi Olaf,
>
> Thanks for your reply!  I think I was unclear about a few things:
>
> 1. Mainly, I am _assuming_ that you can implement a lazy probability monad while ignoring random number generators.  So,       monad should be commutative, and should have the second property of laziness that you mention.
>
> (As an aside, I think the normal way to do this is to implement a function that splits the random number generator whenever you perform a lazy random computation using function: split :: g -> (g,g).  My hack is to present that we have a hardware instruction that generates true random numbers, and then put that in the IO Monad, and then use unsafeInterLeaveIO.  However, I would have to think more about this.)
> 2. My question is really about how you can represent side effects in a lazy context.  Thus the monad would be something like
> EffectMonad = (a, Set Effect),
>
> where Effect is some ADT that represents effects.  Each effect represents some action that can be undone, such as registering a newly created random variable in the list of all random variables.
> This seems to be easy in a strict context, because you can change a function a->b that has effects into a-> EffectMonad b.  Then your interpreter just needs to modify the global state to add the effects from the
> interpreter state1 (x <<= y) = let (result1,effects1) = interpreter state1 x
>                                   state2 = state1 `union` effects1
>                                in interpreter state2 (y result1)
>
> However, with a lazy language I think this does not work, because we do not want to include "effects1" unless the "result1" is actually consumed.
>
> In that context, I think that a function (a->b) would end up becoming
>
> EffectMonad a -> EffectMonad (EffectMonad b)
>
> The argument 'a' changes to EffectMonad 'a' because the function itself (and not the interpreter) must decide whether to include the effects of the input into the effects of the output.  The output changes to EffectMonad (EffectMonad b) so that the result is still of type (EffectMonad b) after the result is unwrapped.
>
> Does that make more sense?
> -BenRI
>
>
>
> On 10/17/19 4:03 PM, Olaf Klinke wrote:
>> Hi Benjamin,
>>
>> Your example code seems to deal with two distinct types:
>> The do-notation is about the effects monad (on the random number generator?) and the `sample` function pulls whatever representation you have for an actual probability distribution into this effect monad. In my mental model, the argument to `sample` represents a function Double -> x that interprets a number coming out of the standard random number generator as an element of type x.
>> I suggest to consider the following two properties of the mathematical probability monad (a.k.a. the Giry monad), which I will use as syntactic re-write rules in your modeling language.
>>
>> The first property is Fubini's Theorem. In Haskell terms it says that for all f, a :: m x and b :: m y the two terms
>>
>> do {x <- a; y <- b; f x y}
>> do {y <- b; x <- a; f x y}
>>
>> are semantically equivalent. (For state monads, this fails.) Monads where this holds are said to be commutative. If you have two urns, then drawing from the left and then drawing from the right is the same as first drawing from the right and then drawing from the left. Using Fubini, we can swap the first two lines in your example:
>>
>> model = do
>>    cond <- bernoulli 0.5
>>    x <- normal 0 1
>>    return (if cond == 1 then x else 0)
>>
>> This desugars to
>>
>> bernoulli 0.5 >>= (\cond -> normal 0 1 >>= (\x -> return (if cond == 1 then x else return 0)))
>> bernoulli 0.5 >>= (\cond -> fmap (\x -> if cond == 1 then x else 0) (normal 0 1))
>>
>> The second property is a kind of lazyness, namely
>>
>>     fmap (const x) and return are semantically equivalent.
>>
>> which holds for mathematical distributions (but not for state monads). Now one could argue that in case cond == 0 the innermost function is constant in x, in which case the whole thing does not depend on the argument (normal 0 1). The Lemma we need here is semantic equivalence of the following two lambda terms.
>>
>> \cond -> \x -> if cond == 1 then x else 0
>> \cond -> if cond == 1 then id else const 0
>>
>> If the above is admissible then the following syntactic transformation is allowed:
>>
>> model = do
>>    cond <- bernoulli 0.5
>>    if cond == 1 then normal 0 1 else return 0
>>
>> which makes it obvious that the normal distribution is only sampled when needed. But I don't know whether you would regard this as the same model. Notice that I disregarded your `sample` function. That is, my monadic language is the monad of probabilites, not the monad of state transformations of a random number generator. Maybe you can delay using the random number generator until the very end? I don't know the complete set of operations your modeling language sports. If that delay is possible, then maybe you can use a monad that has the above two properties (e.g. a reader monad) and only feed the random numbers to the final model. As proof of concept, consider the following.
>>
>> type Model = Reader Double
>> model :: Model Int
>> model = do
>>   x <- reader (\r -> last [1..round (recip r)])
>>   cond <- reader (\r -> r > 0.5)
>>   return (if cond then x else 0)
>>
>> runReader model is fast for very small inputs, which would not be the case when the first line was always evaluated.
>>
>> Cheers,
>> Olaf
>>

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Explicit approach to lazy effects For probability monad?

Benjamin Redelings

Hi Olaf,

Thanks for your interesting response!  Keep in mind that I have written my own Haskell interpreter, and that the lazy random monad is already working in my interpreter.  So I am seeing the issue with effects as second issue. 

See http://www.bali-phy.org/models.php for some more information on this system.  Note that I also have 'mfix' working in the lazy random monad, which I think is pretty cool.  (See the second example about trait evolution on a random tree).  However, I have no type system at all at the present time.

On 11/10/19 4:21 PM, Olaf Klinke wrote:
I believe that with the right monad, you won't need to think about side-effects, at least not the side-effects that are manual fiddling with registering variables. Haskell should do that for you. 

So, I'm trying to implement a random walk on the space of Haskell program traces.  (A program trace records the execution graph to the extent that it depends on builtin operations with random results, such as sampling from a normal distribution.)  I could provide a PDF of an example execution trace, if you are interested.

This means that logically Haskell is kind of running at two levels:

(1) there is an inner random program that we are making a trace of.

(2) there is another outer program that modifies these traces.

The reason I think I need "registration" side-effects is that the outer program needs a list of random variables in the trace graph that it can randomly tweak.  For example, if the inner program is given by:

do
  x <- sample $ normal 0 1  ----- [1]
  observe (normal x 1) 10
  return $ log_all [x %% "x"]

then the idea is that "x" would get registered when it first accessed.  This allows the outer program needs to know that it can modify the node for "x" in the trace graph.  This side-effect is invisible to the inner program, but visible to the outer program.

So, I'm not sure if it is correct that I would not need side-effects.  However, if I could avoid side-effects, that would be great.

It seems to me what you really need is a strictness analyzer together with the appropriate re-write rules that push the call to monadic actions as deep into the probabilistic model as possible. But initially you said you want to avoid source-to-source translations. 

I think maybe you are right that source-to-source transforms are the right way.  Currently I have avoided by writing my own interpreter and virtual machine that keeps track of trace graphs.

What do you mean about pushing monadic actions as deep into the probabilistic model as possible?

Indeed your mention of splitting the random number generator seems to buy laziness, judging by the documentation of MonadInterleave in the MonadRandom package. But looking at the definition of interleave for RandT you can see that the monadic computation is still executed, only the random number generator state is restored afterwards. In particular, side effects of the inner monad are always executed. (Or was I using it wrong?) Hence from what I've seen my judgement is that state transformer monads are a dead end. 

Hmm... I will have to look at this.  Are you saying that in the MonadRandom package, interleaved computations are not sequenced just because of the random number generator, but they ARE sequenced if they perform any monadic actions?  If this is true, then it would seem that the state is not the problem?

In any case I think the problems that come from threading random number generator state are not fundamental.  There are machine instructions that generate true randomness, so one could always implement randomness in the IO monad without carrying around a state.

interpreter (f >>= g) = do
                           x <- unsafeInterleaveIO $ interpreter f 
                           interpreter $ g x

I have a very hacky system, but this is basically what I am doing.  It is probably terrible, but I have not run into any problems so far.  Is there a reason that one should avoid this?

So far it seems to work.  I am not that worried about the probability monad.


The idea with changing a -> m b to m a -> m (m b) seemed promising as well. You can easily make a Category instance for the type 

C a b = m a -> m (m b)

so that composition of arrows in C does not necessarily execute the monadic computation. However, functions of the Arrow instance (e.g. 'first') must look at the input monad action in order to turn C a b into C (a,c) (b,c), at least I could not think of another way. 

I'm sorry, I am not very familiar with the Haskell functions for categories, I just read part of Chapter 1 of an Algebraic Topology book. 

Can you possibly rephrase this in terms of (very simple) math?  Specifically, I was thinking that mapping (a->b) to (m a -> m (m b)) looks like a Functor where

a       =>  m a                           --- this is "return"

a->b =>  m a -> m (m b)        --- this is "fmap"

Why are you saying that this a category instead of a functor?  I am probably just confused, I am not very familiar with categories yet, and have not had time to go look at the Arrow instance you are talking about.

Sorry for writing so destructive posts. In mathematics it is generally easier to find counterexamples than to write proofs. 

Haha, no problem.  Its not clear to me that this is possible.  If in general lazy effects cannot be represented in Haskell using do-notation, that would probably be interesting to state formally.


-BenRI

[1] Regarding "sample", I think that if I I write "x <- normal 0 1" then I think that I cannot write "observe (normal x 1) 10", but would have to distinguish the distribution (normalDistr x 1) from the action of sampling from the distribution (normal 0 1).  In my notation (normal x 1) is the distribution itself, and not an action.

But if I could eliminate the "sample" then I think the code would look a lot cleaner.



Olaf
 

Am 07.11.2019 um 17:56 schrieb Benjamin Redelings [hidden email]:

Hi Olaf,

Thanks for your reply!  I think I was unclear about a few things:

1. Mainly, I am _assuming_ that you can implement a lazy probability monad while ignoring random number generators.  So,       monad should be commutative, and should have the second property of laziness that you mention.

(As an aside, I think the normal way to do this is to implement a function that splits the random number generator whenever you perform a lazy random computation using function: split :: g -> (g,g).  My hack is to present that we have a hardware instruction that generates true random numbers, and then put that in the IO Monad, and then use unsafeInterLeaveIO.  However, I would have to think more about this.)
2. My question is really about how you can represent side effects in a lazy context.  Thus the monad would be something like 
EffectMonad = (a, Set Effect), 

where Effect is some ADT that represents effects.  Each effect represents some action that can be undone, such as registering a newly created random variable in the list of all random variables.
This seems to be easy in a strict context, because you can change a function a->b that has effects into a-> EffectMonad b.  Then your interpreter just needs to modify the global state to add the effects from the 
interpreter state1 (x <<= y) = let (result1,effects1) = interpreter state1 x
                                  state2 = state1 `union` effects1
                               in interpreter state2 (y result1)

However, with a lazy language I think this does not work, because we do not want to include "effects1" unless the "result1" is actually consumed.

In that context, I think that a function (a->b) would end up becoming

EffectMonad a -> EffectMonad (EffectMonad b)

The argument 'a' changes to EffectMonad 'a' because the function itself (and not the interpreter) must decide whether to include the effects of the input into the effects of the output.  The output changes to EffectMonad (EffectMonad b) so that the result is still of type (EffectMonad b) after the result is unwrapped.

Does that make more sense?
-BenRI



On 10/17/19 4:03 PM, Olaf Klinke wrote:
Hi Benjamin, 

Your example code seems to deal with two distinct types: 
The do-notation is about the effects monad (on the random number generator?) and the `sample` function pulls whatever representation you have for an actual probability distribution into this effect monad. In my mental model, the argument to `sample` represents a function Double -> x that interprets a number coming out of the standard random number generator as an element of type x. 
I suggest to consider the following two properties of the mathematical probability monad (a.k.a. the Giry monad), which I will use as syntactic re-write rules in your modeling language. 

The first property is Fubini's Theorem. In Haskell terms it says that for all f, a :: m x and b :: m y the two terms

do {x <- a; y <- b; f x y}
do {y <- b; x <- a; f x y}

are semantically equivalent. (For state monads, this fails.) Monads where this holds are said to be commutative. If you have two urns, then drawing from the left and then drawing from the right is the same as first drawing from the right and then drawing from the left. Using Fubini, we can swap the first two lines in your example: 

model = do
   cond <- bernoulli 0.5
   x <- normal 0 1
   return (if cond == 1 then x else 0)

This desugars to

bernoulli 0.5 >>= (\cond -> normal 0 1 >>= (\x -> return (if cond == 1 then x else return 0)))
bernoulli 0.5 >>= (\cond -> fmap (\x -> if cond == 1 then x else 0) (normal 0 1))

The second property is a kind of lazyness, namely 

    fmap (const x) and return are semantically equivalent.

which holds for mathematical distributions (but not for state monads). Now one could argue that in case cond == 0 the innermost function is constant in x, in which case the whole thing does not depend on the argument (normal 0 1). The Lemma we need here is semantic equivalence of the following two lambda terms. 

\cond -> \x -> if cond == 1 then x else 0
\cond -> if cond == 1 then id else const 0

If the above is admissible then the following syntactic transformation is allowed: 

model = do
   cond <- bernoulli 0.5
   if cond == 1 then normal 0 1 else return 0

which makes it obvious that the normal distribution is only sampled when needed. But I don't know whether you would regard this as the same model. Notice that I disregarded your `sample` function. That is, my monadic language is the monad of probabilites, not the monad of state transformations of a random number generator. Maybe you can delay using the random number generator until the very end? I don't know the complete set of operations your modeling language sports. If that delay is possible, then maybe you can use a monad that has the above two properties (e.g. a reader monad) and only feed the random numbers to the final model. As proof of concept, consider the following.

type Model = Reader Double
model :: Model Int
model = do
  x <- reader (\r -> last [1..round (recip r)])
  cond <- reader (\r -> r > 0.5)
  return (if cond then x else 0)

runReader model is fast for very small inputs, which would not be the case when the first line was always evaluated. 

Cheers,
Olaf


    

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.
Reply | Threaded
Open this post in threaded view
|

Re: Explicit approach to lazy effects For probability monad?

Haskell - Haskell-Cafe mailing list
On Wed, Nov 13, 2019 at 11:58:59AM -0500, Benjamin Redelings wrote:

> On 11/10/19 4:21 PM, Olaf Klinke wrote:
> > I believe that with the right monad, you won't need to think about
> > side-effects, at least not the side-effects that are manual fiddling
> > with registering variables. Haskell should do that for you.
>
> So, I'm trying to implement a random walk on the space of Haskell
> program traces.
>
> This means that logically Haskell is kind of running at two levels:
>
> (1) there is an inner random program that we are making a trace of.
>
> (2) there is another outer program that modifies these traces.
>
> [..]
>
> So, I'm not sure if it is correct that I would not need side-effects. 
> However, if I could avoid side-effects, that would be great.
FWIW, I believe it's possible to avoid effects, though I'm not sure if
it can be done particularly efficiently.

Instead of maintaining an external "database of randomness" (à la
Wingate et al. 2011 -- i.e. the "registered" variables that you can
supply randomness to), you should be able to directly annotate
probabilistic nodes in your syntax tree with PRNG state snapshots.  When
you need randomness for the "outer program" semantics, you just iterate
the PRNG states on-site.

The representationally elegant way to do this seems to be to use a free
monad to denote your probabilistic programs and a cofree comonad to
represent their execution traces.  The cofree comonad allows you to
annotate the probabilistic syntax tree nodes with the required state
(parameter space value, log-likelihood, PRNG state), and then perturb
them lazily as required via a comonadic 'extend'.

I wrote up a couple of prototypes of this sort of setup awhile back:

  * https://jtobin.io/simple-probabilistic-programming
  * https://jtobin.io/comonadic-mcmc

Whether this can yield anything usefully efficient is an open question.
It does avoid the annoying "observe" and "sample" syntax, at least.

-- jared

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

signature.asc (235 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Explicit approach to lazy effects For probability monad?

Olaf Klinke
In reply to this post by Benjamin Redelings
Benjamin,

rather than giving examples I would prefer a list of all the keywords and syntactic constructs of the language, together with their semantics. What, for instance, is the type of 'log_all'? What is its semantics? What is the difference between 'sample' and 'random'? The Bali-Phy tutorial seems to be mainly composed of examples, too.

I have to admit that I still don't understand what an execution trace is in your case. My guess: The inner program is a Haskell term that, given a different starting state of the random number generator, produces a value (and logs some intermediate values?) The outer program repeatedly feeds new starting states to the RNG and executes the inner program anew, gathering an overall picture of the random behaviour? Is that what you mean by "modifying the node x in the trace graph"?

In the remainder I will argue that 'interleave' only works if the inner monad has my second lazyness property.
The definition of interleave in MonadRandom is copied below (RandT is an alias for StateT).
The occurrence of liftM and runStateT tells you that the inner monad action is executed, no matter what. It is only the final state that is known before the action executes. Let's experiment:

import Control.Monad.State.Lazy
import Control.Monad.Writer
import Control.Applicative

-- Just for the sake of example
type Gen = Int
split :: Gen -> (Gen,Gen)
split g = (g,g)

type RandT m a = StateT Gen m a
interleave :: Monad m => RandT m a -> RandT m a
interleave m = StateT $ \g -> case split g of
    (gl,gr) -> liftM (\p -> (fst p,gr)) $ runStateT m gl

foo :: RandT (Writer [Char]) String
foo = StateT (\g -> tell "foo executed;" >> return ("foo",succ g))

bar :: RandT (Writer [Char]) String
bar = StateT (\g -> tell "bar executed;" >> return ("bar",succ g))

ghci> flip runStateT 0 $ fmap snd $ (,) <$> foo <*> bar
WriterT (Identity (("bar",2),"foo executed;bar executed;"))

ghci> flip runStateT 0 $ fmap snd $ (,) <$> (interleave foo) <*> bar
WriterT (Identity (("bar",1),"foo executed;bar executed;"))

You can see that while interleave makes the generator state look like (interleave foo) has not used the generator, the inner monadic action does execute. In particular, a sequential monad like Writer spoils the intentions of interleave. However, other non-sequential monads are more well-behaved. Consider the following.

import System.IO.Unsafe
import Data.Functor.Identity

foo' :: RandT Identity String
foo' = StateT (\g -> case unsafePerformIO (putStrLn "foo executed") of
        () -> return ("foo",succ g))

bar' :: RandT Identity String
bar' = StateT (\g -> do
    case unsafePerformIO (putStrLn "bar executed") of
        () -> return ("bar",succ g))

ghci> flip runStateT 0 $ fmap snd $ (,) <$> foo' <*> bar'
Identity ("bar executed
bar",foo executed
2)

ghci> flip runStateT 0 $ fmap snd $  (,) <$> (interleave foo') <*> bar'
Identity ("bar executed
bar",1)

I believe this is not the doing of interleave, it is a property of Identity:

ghci> fmap snd $ (,) <$> undefined <*> (Identity "bar")
Identity "bar"

This is exactly the lazyness property I was advertising: fmap.const = const.pure
In this case
  fmap (\x -> snd (x,"bar")
= fmap (const "bar")

I'm CC-ing the maintainer of MonadRandom, hoping to get his opinion on this.
Olaf

> Am 13.11.2019 um 17:58 schrieb Benjamin Redelings <[hidden email]>:
>
> Hi Olaf,
>
> Thanks for your interesting response!  Keep in mind that I have written my own Haskell interpreter, and that the lazy random monad is already working in my interpreter.  So I am seeing the issue with effects as second issue.  
>
> See http://www.bali-phy.org/models.php for some more information on this system.  Note that I also have 'mfix' working in the lazy random monad, which I think is pretty cool.  (See the second example about trait evolution on a random tree).  However, I have no type system at all at the present time.
>
> On 11/10/19 4:21 PM, Olaf Klinke wrote:
>> I believe that with the right monad, you won't need to think about side-effects, at least not the side-effects that are manual fiddling with registering variables. Haskell should do that for you.
> So, I'm trying to implement a random walk on the space of Haskell program traces.  (A program trace records the execution graph to the extent that it depends on builtin operations with random results, such as sampling from a normal distribution.)  I could provide a PDF of an example execution trace, if you are interested.
>
> This means that logically Haskell is kind of running at two levels:
>
> (1) there is an inner random program that we are making a trace of.
>
> (2) there is another outer program that modifies these traces.
>
> The reason I think I need "registration" side-effects is that the outer program needs a list of random variables in the trace graph that it can randomly tweak.  For example, if the inner program is given by:
>
> do
>   x <- sample $ normal 0 1  ----- [1]
>   observe (normal x 1) 10
>   return $ log_all [x %% "x"]
>
> then the idea is that "x" would get registered when it first accessed.  This allows the outer program needs to know that it can modify the node for "x" in the trace graph.  This side-effect is invisible to the inner program, but visible to the outer program.
>
> So, I'm not sure if it is correct that I would not need side-effects.  However, if I could avoid side-effects, that would be great.
>
>> It seems to me what you really need is a strictness analyzer together with the appropriate re-write rules that push the call to monadic actions as deep into the probabilistic model as possible. But initially you said you want to avoid source-to-source translations.
> I think maybe you are right that source-to-source transforms are the right way.  Currently I have avoided by writing my own interpreter and virtual machine that keeps track of trace graphs.
>
> What do you mean about pushing monadic actions as deep into the probabilistic model as possible?
>
>> Indeed your mention of splitting the random number generator seems to buy laziness, judging by the documentation of MonadInterleave in the MonadRandom package. But looking at the definition of interleave for RandT you can see that the monadic computation is still executed, only the random number generator state is restored afterwards. In particular, side effects of the inner monad are always executed. (Or was I using it wrong?) Hence from what I've seen my judgement is that state transformer monads are a dead end.
> Hmm... I will have to look at this.  Are you saying that in the MonadRandom package, interleaved computations are not sequenced just because of the random number generator, but they ARE sequenced if they perform any monadic actions?  If this is true, then it would seem that the state is not the problem?
>
> In any case I think the problems that come from threading random number generator state are not fundamental.  There are machine instructions that generate true randomness, so one could always implement randomness in the IO monad without carrying around a state.
>
> interpreter (f >>= g) = do
>                            x <- unsafeInterleaveIO $ interpreter f
>                            interpreter $ g x
>
> I have a very hacky system, but this is basically what I am doing.  It is probably terrible, but I have not run into any problems so far.  Is there a reason that one should avoid this?
>
> So far it seems to work.  I am not that worried about the probability monad.
>
>
>
>> The idea with changing a -> m b to m a -> m (m b) seemed promising as well. You can easily make a Category instance for the type
>>
>> C a b = m a -> m (m b)
>>
>> so that composition of arrows in C does not necessarily execute the monadic computation. However, functions of the Arrow instance (e.g. 'first') must look at the input monad action in order to turn C a b into C (a,c) (b,c), at least I could not think of another way.
>>
> I'm sorry, I am not very familiar with the Haskell functions for categories, I just read part of Chapter 1 of an Algebraic Topology book.  
>
> Can you possibly rephrase this in terms of (very simple) math?  Specifically, I was thinking that mapping (a->b) to (m a -> m (m b)) looks like a Functor where
>
> a       =>  m a                           --- this is "return"
>
> a->b =>  m a -> m (m b)        --- this is "fmap"
>
> Why are you saying that this a category instead of a functor?  I am probably just confused, I am not very familiar with categories yet, and have not had time to go look at the Arrow instance you are talking about.
>
>> Sorry for writing so destructive posts. In mathematics it is generally easier to find counterexamples than to write proofs.
> Haha, no problem.  Its not clear to me that this is possible.  If in general lazy effects cannot be represented in Haskell using do-notation, that would probably be interesting to state formally.
>
>
>
> -BenRI
>
> [1] Regarding "sample", I think that if I I write "x <- normal 0 1" then I think that I cannot write "observe (normal x 1) 10", but would have to distinguish the distribution (normalDistr x 1) from the action of sampling from the distribution (normal 0 1).  In my notation (normal x 1) is the distribution itself, and not an action.
>
> But if I could eliminate the "sample" then I think the code would look a lot cleaner.
>
>
>> Olaf
>>  
>>
>>
>>> Am 07.11.2019 um 17:56 schrieb Benjamin Redelings <[hidden email]>
>>> :
>>>
>>> Hi Olaf,
>>>
>>> Thanks for your reply!  I think I was unclear about a few things:
>>>
>>> 1. Mainly, I am _assuming_ that you can implement a lazy probability monad while ignoring random number generators.  So,       monad should be commutative, and should have the second property of laziness that you mention.
>>>
>>> (As an aside, I think the normal way to do this is to implement a function that splits the random number generator whenever you perform a lazy random computation using function: split :: g -> (g,g).  My hack is to present that we have a hardware instruction that generates true random numbers, and then put that in the IO Monad, and then use unsafeInterLeaveIO.  However, I would have to think more about this.)
>>> 2. My question is really about how you can represent side effects in a lazy context.  Thus the monad would be something like
>>> EffectMonad = (a, Set Effect),
>>>
>>> where Effect is some ADT that represents effects.  Each effect represents some action that can be undone, such as registering a newly created random variable in the list of all random variables.
>>> This seems to be easy in a strict context, because you can change a function a->b that has effects into a-> EffectMonad b.  Then your interpreter just needs to modify the global state to add the effects from the
>>> interpreter state1 (x <<= y) = let (result1,effects1) = interpreter state1 x
>>>                                   state2 = state1 `union` effects1
>>>                                in interpreter state2 (y result1)
>>>
>>> However, with a lazy language I think this does not work, because we do not want to include "effects1" unless the "result1" is actually consumed.
>>>
>>> In that context, I think that a function (a->b) would end up becoming
>>>
>>> EffectMonad a -> EffectMonad (EffectMonad b)
>>>
>>> The argument 'a' changes to EffectMonad 'a' because the function itself (and not the interpreter) must decide whether to include the effects of the input into the effects of the output.  The output changes to EffectMonad (EffectMonad b) so that the result is still of type (EffectMonad b) after the result is unwrapped.
>>>
>>> Does that make more sense?
>>> -BenRI
>>>
>>>
>>>
>>> On 10/17/19 4:03 PM, Olaf Klinke wrote:
>>>
>>>> Hi Benjamin,
>>>>
>>>> Your example code seems to deal with two distinct types:
>>>> The do-notation is about the effects monad (on the random number generator?) and the `sample` function pulls whatever representation you have for an actual probability distribution into this effect monad. In my mental model, the argument to `sample` represents a function Double -> x that interprets a number coming out of the standard random number generator as an element of type x.
>>>> I suggest to consider the following two properties of the mathematical probability monad (a.k.a. the Giry monad), which I will use as syntactic re-write rules in your modeling language.
>>>>
>>>> The first property is Fubini's Theorem. In Haskell terms it says that for all f, a :: m x and b :: m y the two terms
>>>>
>>>> do {x <- a; y <- b; f x y}
>>>> do {y <- b; x <- a; f x y}
>>>>
>>>> are semantically equivalent. (For state monads, this fails.) Monads where this holds are said to be commutative. If you have two urns, then drawing from the left and then drawing from the right is the same as first drawing from the right and then drawing from the left. Using Fubini, we can swap the first two lines in your example:
>>>>
>>>> model = do
>>>>    cond <- bernoulli 0.5
>>>>    x <- normal 0 1
>>>>    return (if cond == 1 then x else 0)
>>>>
>>>> This desugars to
>>>>
>>>> bernoulli 0.5 >>= (\cond -> normal 0 1 >>= (\x -> return (if cond == 1 then x else return 0)))
>>>> bernoulli 0.5 >>= (\cond -> fmap (\x -> if cond == 1 then x else 0) (normal 0 1))
>>>>
>>>> The second property is a kind of lazyness, namely
>>>>
>>>>     fmap (const x) and return are semantically equivalent.
>>>>
>>>> which holds for mathematical distributions (but not for state monads). Now one could argue that in case cond == 0 the innermost function is constant in x, in which case the whole thing does not depend on the argument (normal 0 1). The Lemma we need here is semantic equivalence of the following two lambda terms.
>>>>
>>>> \cond -> \x -> if cond == 1 then x else 0
>>>> \cond -> if cond == 1 then id else const 0
>>>>
>>>> If the above is admissible then the following syntactic transformation is allowed:
>>>>
>>>> model = do
>>>>    cond <- bernoulli 0.5
>>>>    if cond == 1 then normal 0 1 else return 0
>>>>
>>>> which makes it obvious that the normal distribution is only sampled when needed. But I don't know whether you would regard this as the same model. Notice that I disregarded your `sample` function. That is, my monadic language is the monad of probabilites, not the monad of state transformations of a random number generator. Maybe you can delay using the random number generator until the very end? I don't know the complete set of operations your modeling language sports. If that delay is possible, then maybe you can use a monad that has the above two properties (e.g. a reader monad) and only feed the random numbers to the final model. As proof of concept, consider the following.
>>>>
>>>> type Model = Reader Double
>>>> model :: Model Int
>>>> model = do
>>>>   x <- reader (\r -> last [1..round (recip r)])
>>>>   cond <- reader (\r -> r > 0.5)
>>>>   return (if cond then x else 0)
>>>>
>>>> runReader model is fast for very small inputs, which would not be the case when the first line was always evaluated.
>>>>
>>>> Cheers,
>>>> Olaf
>>>>
>>>>
>

_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.