Operating on HsSyn

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Operating on HsSyn

GHC - devs mailing list

Devs,

 

Shayan is working away on “Trees that grow”… do keep it on your radar:

 

To: ghc-devs
Sent: 25 May 2017 23:49

Do take a look at this:

 

·         We propose to re-engineer HsSyn itself.  This will touch a lot of code.

·         But it’s very neat, and will bring big long-term advantages

·         And we can do it a bit at a time

 

The wiki page https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow has the details.  It’s entirely an internal change, not a change to GHC’s specification, so it’s independent of the GHC proposals process.  But I’d value the opinion of other GHC devs

 

Meanwhile I have a question. When pretty-printing HsSyn we often have a situation like this:

 

  data Pass = Parsed | Renamed | Typechecked

 

  data HsExpr (p :: Pass) = HsVar (IdP p) | ....

 

  type famliy IdP p where

     IdP Parsed      = RdrName

     IdP Renamed     = Name

     IdP Typechecked = Id

 

  instance (Outputable (IdP p)) => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

The (ppr v) requires (Outputable (IdP p)), hence the context.

 

Moreover, and more seriously, there are things we just can't pretty-print

right now.  For example, HsExpr has this data constructor:

 

  data HsExpr p = ...

    | OpApp       (LHsExpr p)

                  (LHsExpr p)

                  (PostRn p Fixity)

                  (LHsExpr p)

 

To pretty-print the third argument, we'd need to add

 

  instance (Outputable (IdP p),

            Outputable (PostRn p Fixity))   -- New

        => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

and that gets onerous.  So today we never print these annotations, to avoid bloating the instance contexts, which can be painful.  It bit me yesterday.

 

We have bitten that bullet for the Data class: look at HsExtension.DataId, which abbreviates the long list of dictionaries:

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , Data (PostRn p (IdP p))

    , Data (PostRn p (Located Name))

    , Data (PostRn p Bool)

    , Data (PostRn p Fixity)

     ,..and nine more... )

 

Let me note in passing that [wiki:QuantifiedContexts] would make this somewhat shorter

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , forall t. Data t => Data (PostRn p t))

 

But we still need one item in this list for each type function,

and I am worried about how this scales to the

[wiki:ImplementingTreesThatGrow] story, when we have a type

function for each data constructor -- and there are a lot of data

constructors.

 

So I have four questions

 

  1. I think we should probably use a superclass instead of a type synonym

 

class (Data p, ForallX Data p, ....) => DataId p where {}

 

Why?  Only one argument to pass, and to pass on to successive calls.  I see no downside.

 

  1. Shall we treat Outputable like Data?  (I.e. make an abbreviation for a long list of Outputable instances and use it everywhere)

  2. I thought of another way to do it (pass a token); see below

 

  1. Are there any other ways?

 

 

Token passing idea.

 

Perhaps instead of passing lots of functions, we pass a singleton token

that encodes the pass, like this:

 

  instance (PassC p) => Outputable (HsExpr p) where

     ppr (HsVar v) = case getPass :: IsPass p of

                       IsParsed      -> ppr v

                       IsRenamed     -> ppr v

                       IsTypechecked -> ppr v

 

The three ppr's are at different types, of course; that's the point.

The infrastructure is something like

 

  class PassC p where

    getPass :: IsPass p

 

  data IsPass p where

    IsParsed      :: IsPass Parsed

    IsRenamed     :: IsParsed Renamed

    IsTypechecked :: IsParsed Typechecked

 

  instance PassC Parsed where getPass = IsParsed

    ...etc...

 

Now we could sweep away all those OutputableX classes,

replacing them with dynamic tests on the singletons IsParsed etc.

This would have advantages:

 

- Probably faster: there's a dynamic test, but many fewer dictionary

  arguments and higher-order function dispatch

 

- Only one dictionary to pass; programming is easier.

 

The big downside is that it's not extensible: it works only because

we know the three cases.  But the "Trees that Grow" story really doesn't

scale well to pretty-printing: so maybe we should just give up on that?


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Operating on HsSyn

Shayan Najd Javadipour-2
Before all this, we may need to discuss a bit about the intended semantics of
`Outputable`: does it need to print `PostRn`, or `PostTc` fields; or `Out`
suffixed constructors?  If not, then we only need to write a set of instances
for the base growable AST, once and for all.  Such instances will be polymorphic
on the extension descriptor `p`, and do not need to mention the constraints like
`(PostRn p (IdP p)`, since these are just extensions and not part of the base
growable AST.  Or, am I missing something about the intended semantics of
`Outputable`?

You write

So today we never print these annotations, to avoid bloating the instance contexts, which can be painful. 

I have been under the impression that we don't even want to print those.

Of course, there are scenarios (like `Show` instances) where we do want to write 
compositional / generic functions that take into account the extensions.
Here is my abstract overview of the scenario, that may help the discussion.
Consider data types `A`, `B`, and `C` (say, one AST datatype per compiler phase) that
 are defined as extensions to a base datatype `T`:

> A = T XA
> B = T XB
> C = T XC

where `X*`s are extension descriptors.
Now, say we want to a define functions `f_A`, `f_B`, and `f_C` over `A`, `B`, and `C`.
We have two main alternatives: 
(a) either we write these  (manually or using the deriving mechanism) separately
(b) or we write a generic / parametric function `g` over `T`, and reuse that to define `f_*`s

Of course, (b) is preferable in theory , but not always possible or preferable in practice.
In which case, we can always resort to (a).
The more varying are the definitions of `f_A`, `f_B`, and `f_C` the more parametric should 
`g` get, as this is the case for any generic function.

With a correct design, I believe, these are all independent of Trees that Grow story itself: 
we are now not only trying to reuse data types, and functions agnostic towards extensions
(pretty printers in my view of their semantics), but also reuse functions with parametric / 
varying behaviour with respect to extensions. 

/Shayan



On Fri, Jul 28, 2017 at 10:18 AM, Simon Peyton Jones <[hidden email]> wrote:

Devs,

 

Shayan is working away on “Trees that grow”… do keep it on your radar:

 

To: ghc-devs
Sent: 25 May 2017 23:49

Do take a look at this:

 

·         We propose to re-engineer HsSyn itself.  This will touch a lot of code.

·         But it’s very neat, and will bring big long-term advantages

·         And we can do it a bit at a time

 

The wiki page https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow has the details.  It’s entirely an internal change, not a change to GHC’s specification, so it’s independent of the GHC proposals process.  But I’d value the opinion of other GHC devs

 

Meanwhile I have a question. When pretty-printing HsSyn we often have a situation like this:

 

  data Pass = Parsed | Renamed | Typechecked

 

  data HsExpr (p :: Pass) = HsVar (IdP p) | ....

 

  type famliy IdP p where

     IdP Parsed      = RdrName

     IdP Renamed     = Name

     IdP Typechecked = Id

 

  instance (Outputable (IdP p)) => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

The (ppr v) requires (Outputable (IdP p)), hence the context.

 

Moreover, and more seriously, there are things we just can't pretty-print

right now.  For example, HsExpr has this data constructor:

 

  data HsExpr p = ...

    | OpApp       (LHsExpr p)

                  (LHsExpr p)

                  (PostRn p Fixity)

                  (LHsExpr p)

 

To pretty-print the third argument, we'd need to add

 

  instance (Outputable (IdP p),

            Outputable (PostRn p Fixity))   -- New

        => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

and that gets onerous.  So today we never print these annotations, to avoid bloating the instance contexts, which can be painful.  It bit me yesterday.

 

We have bitten that bullet for the Data class: look at HsExtension.DataId, which abbreviates the long list of dictionaries:

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , Data (PostRn p (IdP p))

    , Data (PostRn p (Located Name))

    , Data (PostRn p Bool)

    , Data (PostRn p Fixity)

     ,..and nine more... )

 

Let me note in passing that [wiki:QuantifiedContexts] would make this somewhat shorter

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , forall t. Data t => Data (PostRn p t))

 

But we still need one item in this list for each type function,

and I am worried about how this scales to the

[wiki:ImplementingTreesThatGrow] story, when we have a type

function for each data constructor -- and there are a lot of data

constructors.

 

So I have four questions

 

  1. I think we should probably use a superclass instead of a type synonym

 

class (Data p, ForallX Data p, ....) => DataId p where {}

 

Why?  Only one argument to pass, and to pass on to successive calls.  I see no downside.

 

  1. Shall we treat Outputable like Data?  (I.e. make an abbreviation for a long list of Outputable instances and use it everywhere)

  2. I thought of another way to do it (pass a token); see below

 

  1. Are there any other ways?

 

 

Token passing idea.

 

Perhaps instead of passing lots of functions, we pass a singleton token

that encodes the pass, like this:

 

  instance (PassC p) => Outputable (HsExpr p) where

     ppr (HsVar v) = case getPass :: IsPass p of

                       IsParsed      -> ppr v

                       IsRenamed     -> ppr v

                       IsTypechecked -> ppr v

 

The three ppr's are at different types, of course; that's the point.

The infrastructure is something like

 

  class PassC p where

    getPass :: IsPass p

 

  data IsPass p where

    IsParsed      :: IsPass Parsed

    IsRenamed     :: IsParsed Renamed

    IsTypechecked :: IsParsed Typechecked

 

  instance PassC Parsed where getPass = IsParsed

    ...etc...

 

Now we could sweep away all those OutputableX classes,

replacing them with dynamic tests on the singletons IsParsed etc.

This would have advantages:

 

- Probably faster: there's a dynamic test, but many fewer dictionary

  arguments and higher-order function dispatch

 

- Only one dictionary to pass; programming is easier.

 

The big downside is that it's not extensible: it works only because

we know the three cases.  But the "Trees that Grow" story really doesn't

scale well to pretty-printing: so maybe we should just give up on that?



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Operating on HsSyn

GHC - devs mailing list

I have been under the impression that we don't even want to print those.

On the contrary, inside GHC I /do/ want to print them. Otherwise how can I see what the renamer has done?

 

Simon

 

From: Shayan Najd [mailto:[hidden email]]
Sent: 28 July 2017 12:20
To: Simon Peyton Jones <[hidden email]>
Cc: [hidden email]; Alan & Kim Zimmerman <[hidden email]>
Subject: Re: Operating on HsSyn

 

Before all this, we may need to discuss a bit about the intended semantics of

`Outputable`: does it need to print `PostRn`, or `PostTc` fields; or `Out`

suffixed constructors?  If not, then we only need to write a set of instances

for the base growable AST, once and for all.  Such instances will be polymorphic

on the extension descriptor `p`, and do not need to mention the constraints like

`(PostRn p (IdP p)`, since these are just extensions and not part of the base

growable AST.  Or, am I missing something about the intended semantics of

`Outputable`?

 

You write

 

So today we never print these annotations, to avoid bloating the instance contexts, which can be painful. 

 

I have been under the impression that we don't even want to print those.

 

Of course, there are scenarios (like `Show` instances) where we do want to write 

compositional / generic functions that take into account the extensions.

Here is my abstract overview of the scenario, that may help the discussion.

Consider data types `A`, `B`, and `C` (say, one AST datatype per compiler phase) that

 are defined as extensions to a base datatype `T`:

 

> A = T XA

> B = T XB

> C = T XC

 

where `X*`s are extension descriptors.

Now, say we want to a define functions `f_A`, `f_B`, and `f_C` over `A`, `B`, and `C`.

We have two main alternatives: 

(a) either we write these  (manually or using the deriving mechanism) separately

(b) or we write a generic / parametric function `g` over `T`, and reuse that to define `f_*`s

 

Of course, (b) is preferable in theory , but not always possible or preferable in practice.

In which case, we can always resort to (a).

The more varying are the definitions of `f_A`, `f_B`, and `f_C` the more parametric should 

`g` get, as this is the case for any generic function.

 

With a correct design, I believe, these are all independent of Trees that Grow story itself: 

we are now not only trying to reuse data types, and functions agnostic towards extensions

(pretty printers in my view of their semantics), but also reuse functions with parametric / 

varying behaviour with respect to extensions. 

 

/Shayan

 

 

 

On Fri, Jul 28, 2017 at 10:18 AM, Simon Peyton Jones <[hidden email]> wrote:

Devs,

 

Shayan is working away on “Trees that grow”… do keep it on your radar:

 

To: ghc-devs
Sent: 25 May 2017 23:49

Do take a look at this:

 

·         We propose to re-engineer HsSyn itself.  This will touch a lot of code.

·         But it’s very neat, and will bring big long-term advantages

·         And we can do it a bit at a time

 

The wiki page https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow has the details.  It’s entirely an internal change, not a change to GHC’s specification, so it’s independent of the GHC proposals process.  But I’d value the opinion of other GHC devs

 

Meanwhile I have a question. When pretty-printing HsSyn we often have a situation like this:

 

  data Pass = Parsed | Renamed | Typechecked

 

  data HsExpr (p :: Pass) = HsVar (IdP p) | ....

 

  type famliy IdP p where

     IdP Parsed      = RdrName

     IdP Renamed     = Name

     IdP Typechecked = Id

 

  instance (Outputable (IdP p)) => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

The (ppr v) requires (Outputable (IdP p)), hence the context.

 

Moreover, and more seriously, there are things we just can't pretty-print

right now.  For example, HsExpr has this data constructor:

 

  data HsExpr p = ...

    | OpApp       (LHsExpr p)

                  (LHsExpr p)

                  (PostRn p Fixity)

                  (LHsExpr p)

 

To pretty-print the third argument, we'd need to add

 

  instance (Outputable (IdP p),

            Outputable (PostRn p Fixity))   -- New

        => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

and that gets onerous.  So today we never print these annotations, to avoid bloating the instance contexts, which can be painful.  It bit me yesterday.

 

We have bitten that bullet for the Data class: look at HsExtension.DataId, which abbreviates the long list of dictionaries:

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , Data (PostRn p (IdP p))

    , Data (PostRn p (Located Name))

    , Data (PostRn p Bool)

    , Data (PostRn p Fixity)

     ,..and nine more... )

 

Let me note in passing that [wiki:QuantifiedContexts] would make this somewhat shorter

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , forall t. Data t => Data (PostRn p t))

 

But we still need one item in this list for each type function,

and I am worried about how this scales to the

[wiki:ImplementingTreesThatGrow] story, when we have a type

function for each data constructor -- and there are a lot of data

constructors.

 

So I have four questions

 

  1. I think we should probably use a superclass instead of a type synonym

 

class (Data p, ForallX Data p, ....) => DataId p where {}

 

Why?  Only one argument to pass, and to pass on to successive calls.  I see no downside.

 

  1. Shall we treat Outputable like Data?  (I.e. make an abbreviation for a long list of Outputable instances and use it everywhere)
  2. I thought of another way to do it (pass a token); see below

 

  1. Are there any other ways?

 

 

Token passing idea.

 

Perhaps instead of passing lots of functions, we pass a singleton token

that encodes the pass, like this:

 

  instance (PassC p) => Outputable (HsExpr p) where

     ppr (HsVar v) = case getPass :: IsPass p of

                       IsParsed      -> ppr v

                       IsRenamed     -> ppr v

                       IsTypechecked -> ppr v

 

The three ppr's are at different types, of course; that's the point.

The infrastructure is something like

 

  class PassC p where

    getPass :: IsPass p

 

  data IsPass p where

    IsParsed      :: IsPass Parsed

    IsRenamed     :: IsParsed Renamed

    IsTypechecked :: IsParsed Typechecked

 

  instance PassC Parsed where getPass = IsParsed

    ...etc...

 

Now we could sweep away all those OutputableX classes,

replacing them with dynamic tests on the singletons IsParsed etc.

This would have advantages:

 

- Probably faster: there's a dynamic test, but many fewer dictionary

  arguments and higher-order function dispatch

 

- Only one dictionary to pass; programming is easier.

 

The big downside is that it's not extensible: it works only because

we know the three cases.  But the "Trees that Grow" story really doesn't

scale well to pretty-printing: so maybe we should just give up on that?

 


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Operating on HsSyn

Shayan Najd Javadipour-2
On the contrary, inside GHC I /do/ want to print them. Otherwise how can I see what the renamer has done?

Right. So if I understand correctly, with this semantics, `Outputable` is somewhere between pretty printing as often used in program manipulation libraries (like Haskell-Src-Exts (HSE)) which is closer to syntax, and `Show` which is closer to Haskell representation.
(There are also "exact printers" (as in HSE) that are even closer to syntax in some sense.) 
Often, pretty printers generate only grammatically valid terms, not the ones polluted with extra annotations (hence grammatically invalid), e.g., what is the grammatically valid form of `OpApp` printed via `Outputable` that includes the fixity annotation.  

If I recall correctly, we have briefly studied these in the past summer, we came up with some roundtrip correctness criteria, like the following (bar error handling; assume valid input):

 (parser . prettyPrint . parser) = id  
 
[paging in Jacques]

The reason I am trying to flesh out the semantics is the /big/ gains on code reuse later on in the process: one does not need to define a separate pretty printing library for Haskell syntax, and can reuse the well-tested and well-maintained one in GHC.

Reformulating part of your concern, based on my understanding (if I may), the questions is: what is the proper design of an "outputer" (debug-printer?) where /annotated/ terms can be pretty-printed including any printable (pretty?showable?) annotations.
In particular, we may want to take advantage of extensibility of data types for this.
Am I far off? 

Note: with proper design, an extensible debug-printer can still subsume corresponding pretty-printers.


On Fri, Jul 28, 2017 at 2:36 PM, Simon Peyton Jones <[hidden email]> wrote:

I have been under the impression that we don't even want to print those.

On the contrary, inside GHC I /do/ want to print them. Otherwise how can I see what the renamer has done?

 

Simon

 

From: Shayan Najd [mailto:[hidden email]]
Sent: 28 July 2017 12:20
To: Simon Peyton Jones <[hidden email]>
Cc: [hidden email]; Alan & Kim Zimmerman <[hidden email]>
Subject: Re: Operating on HsSyn

 

Before all this, we may need to discuss a bit about the intended semantics of

`Outputable`: does it need to print `PostRn`, or `PostTc` fields; or `Out`

suffixed constructors?  If not, then we only need to write a set of instances

for the base growable AST, once and for all.  Such instances will be polymorphic

on the extension descriptor `p`, and do not need to mention the constraints like

`(PostRn p (IdP p)`, since these are just extensions and not part of the base

growable AST.  Or, am I missing something about the intended semantics of

`Outputable`?

 

You write

 

So today we never print these annotations, to avoid bloating the instance contexts, which can be painful. 

 

I have been under the impression that we don't even want to print those.

 

Of course, there are scenarios (like `Show` instances) where we do want to write 

compositional / generic functions that take into account the extensions.

Here is my abstract overview of the scenario, that may help the discussion.

Consider data types `A`, `B`, and `C` (say, one AST datatype per compiler phase) that

 are defined as extensions to a base datatype `T`:

 

> A = T XA

> B = T XB

> C = T XC

 

where `X*`s are extension descriptors.

Now, say we want to a define functions `f_A`, `f_B`, and `f_C` over `A`, `B`, and `C`.

We have two main alternatives: 

(a) either we write these  (manually or using the deriving mechanism) separately

(b) or we write a generic / parametric function `g` over `T`, and reuse that to define `f_*`s

 

Of course, (b) is preferable in theory , but not always possible or preferable in practice.

In which case, we can always resort to (a).

The more varying are the definitions of `f_A`, `f_B`, and `f_C` the more parametric should 

`g` get, as this is the case for any generic function.

 

With a correct design, I believe, these are all independent of Trees that Grow story itself: 

we are now not only trying to reuse data types, and functions agnostic towards extensions

(pretty printers in my view of their semantics), but also reuse functions with parametric / 

varying behaviour with respect to extensions. 

 

/Shayan

 

 

 

On Fri, Jul 28, 2017 at 10:18 AM, Simon Peyton Jones <[hidden email]> wrote:

Devs,

 

Shayan is working away on “Trees that grow”… do keep it on your radar:

 

To: ghc-devs
Sent: 25 May 2017 23:49

Do take a look at this:

 

·         We propose to re-engineer HsSyn itself.  This will touch a lot of code.

·         But it’s very neat, and will bring big long-term advantages

·         And we can do it a bit at a time

 

The wiki page https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow has the details.  It’s entirely an internal change, not a change to GHC’s specification, so it’s independent of the GHC proposals process.  But I’d value the opinion of other GHC devs

 

Meanwhile I have a question. When pretty-printing HsSyn we often have a situation like this:

 

  data Pass = Parsed | Renamed | Typechecked

 

  data HsExpr (p :: Pass) = HsVar (IdP p) | ....

 

  type famliy IdP p where

     IdP Parsed      = RdrName

     IdP Renamed     = Name

     IdP Typechecked = Id

 

  instance (Outputable (IdP p)) => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

The (ppr v) requires (Outputable (IdP p)), hence the context.

 

Moreover, and more seriously, there are things we just can't pretty-print

right now.  For example, HsExpr has this data constructor:

 

  data HsExpr p = ...

    | OpApp       (LHsExpr p)

                  (LHsExpr p)

                  (PostRn p Fixity)

                  (LHsExpr p)

 

To pretty-print the third argument, we'd need to add

 

  instance (Outputable (IdP p),

            Outputable (PostRn p Fixity))   -- New

        => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

and that gets onerous.  So today we never print these annotations, to avoid bloating the instance contexts, which can be painful.  It bit me yesterday.

 

We have bitten that bullet for the Data class: look at HsExtension.DataId, which abbreviates the long list of dictionaries:

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , Data (PostRn p (IdP p))

    , Data (PostRn p (Located Name))

    , Data (PostRn p Bool)

    , Data (PostRn p Fixity)

     ,..and nine more... )

 

Let me note in passing that [wiki:QuantifiedContexts] would make this somewhat shorter

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , forall t. Data t => Data (PostRn p t))

 

But we still need one item in this list for each type function,

and I am worried about how this scales to the

[wiki:ImplementingTreesThatGrow] story, when we have a type

function for each data constructor -- and there are a lot of data

constructors.

 

So I have four questions

 

  1. I think we should probably use a superclass instead of a type synonym

 

class (Data p, ForallX Data p, ....) => DataId p where {}

 

Why?  Only one argument to pass, and to pass on to successive calls.  I see no downside.

 

  1. Shall we treat Outputable like Data?  (I.e. make an abbreviation for a long list of Outputable instances and use it everywhere)
  2. I thought of another way to do it (pass a token); see below

 

  1. Are there any other ways?

 

 

Token passing idea.

 

Perhaps instead of passing lots of functions, we pass a singleton token

that encodes the pass, like this:

 

  instance (PassC p) => Outputable (HsExpr p) where

     ppr (HsVar v) = case getPass :: IsPass p of

                       IsParsed      -> ppr v

                       IsRenamed     -> ppr v

                       IsTypechecked -> ppr v

 

The three ppr's are at different types, of course; that's the point.

The infrastructure is something like

 

  class PassC p where

    getPass :: IsPass p

 

  data IsPass p where

    IsParsed      :: IsPass Parsed

    IsRenamed     :: IsParsed Renamed

    IsTypechecked :: IsParsed Typechecked

 

  instance PassC Parsed where getPass = IsParsed

    ...etc...

 

Now we could sweep away all those OutputableX classes,

replacing them with dynamic tests on the singletons IsParsed etc.

This would have advantages:

 

- Probably faster: there's a dynamic test, but many fewer dictionary

  arguments and higher-order function dispatch

 

- Only one dictionary to pass; programming is easier.

 

The big downside is that it's not extensible: it works only because

we know the three cases.  But the "Trees that Grow" story really doesn't

scale well to pretty-printing: so maybe we should just give up on that?

 



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Operating on HsSyn

Shayan Najd Javadipour-2
by

 (parser . prettyPrint . parser) = id  

I meant 

 (prettyPrint . parser . prettyPrint) = id  

for a valid input. 


On Fri, Jul 28, 2017 at 4:32 PM, Shayan Najd <[hidden email]> wrote:
On the contrary, inside GHC I /do/ want to print them. Otherwise how can I see what the renamer has done?

Right. So if I understand correctly, with this semantics, `Outputable` is somewhere between pretty printing as often used in program manipulation libraries (like Haskell-Src-Exts (HSE)) which is closer to syntax, and `Show` which is closer to Haskell representation.
(There are also "exact printers" (as in HSE) that are even closer to syntax in some sense.) 
Often, pretty printers generate only grammatically valid terms, not the ones polluted with extra annotations (hence grammatically invalid), e.g., what is the grammatically valid form of `OpApp` printed via `Outputable` that includes the fixity annotation.  

If I recall correctly, we have briefly studied these in the past summer, we came up with some roundtrip correctness criteria, like the following (bar error handling; assume valid input):

 (parser . prettyPrint . parser) = id  
 
[paging in Jacques]

The reason I am trying to flesh out the semantics is the /big/ gains on code reuse later on in the process: one does not need to define a separate pretty printing library for Haskell syntax, and can reuse the well-tested and well-maintained one in GHC.

Reformulating part of your concern, based on my understanding (if I may), the questions is: what is the proper design of an "outputer" (debug-printer?) where /annotated/ terms can be pretty-printed including any printable (pretty?showable?) annotations.
In particular, we may want to take advantage of extensibility of data types for this.
Am I far off? 

Note: with proper design, an extensible debug-printer can still subsume corresponding pretty-printers.


On Fri, Jul 28, 2017 at 2:36 PM, Simon Peyton Jones <[hidden email]> wrote:

I have been under the impression that we don't even want to print those.

On the contrary, inside GHC I /do/ want to print them. Otherwise how can I see what the renamer has done?

 

Simon

 

From: Shayan Najd [mailto:[hidden email]]
Sent: 28 July 2017 12:20
To: Simon Peyton Jones <[hidden email]>
Cc: [hidden email]; Alan & Kim Zimmerman <[hidden email]>
Subject: Re: Operating on HsSyn

 

Before all this, we may need to discuss a bit about the intended semantics of

`Outputable`: does it need to print `PostRn`, or `PostTc` fields; or `Out`

suffixed constructors?  If not, then we only need to write a set of instances

for the base growable AST, once and for all.  Such instances will be polymorphic

on the extension descriptor `p`, and do not need to mention the constraints like

`(PostRn p (IdP p)`, since these are just extensions and not part of the base

growable AST.  Or, am I missing something about the intended semantics of

`Outputable`?

 

You write

 

So today we never print these annotations, to avoid bloating the instance contexts, which can be painful. 

 

I have been under the impression that we don't even want to print those.

 

Of course, there are scenarios (like `Show` instances) where we do want to write 

compositional / generic functions that take into account the extensions.

Here is my abstract overview of the scenario, that may help the discussion.

Consider data types `A`, `B`, and `C` (say, one AST datatype per compiler phase) that

 are defined as extensions to a base datatype `T`:

 

> A = T XA

> B = T XB

> C = T XC

 

where `X*`s are extension descriptors.

Now, say we want to a define functions `f_A`, `f_B`, and `f_C` over `A`, `B`, and `C`.

We have two main alternatives: 

(a) either we write these  (manually or using the deriving mechanism) separately

(b) or we write a generic / parametric function `g` over `T`, and reuse that to define `f_*`s

 

Of course, (b) is preferable in theory , but not always possible or preferable in practice.

In which case, we can always resort to (a).

The more varying are the definitions of `f_A`, `f_B`, and `f_C` the more parametric should 

`g` get, as this is the case for any generic function.

 

With a correct design, I believe, these are all independent of Trees that Grow story itself: 

we are now not only trying to reuse data types, and functions agnostic towards extensions

(pretty printers in my view of their semantics), but also reuse functions with parametric / 

varying behaviour with respect to extensions. 

 

/Shayan

 

 

 

On Fri, Jul 28, 2017 at 10:18 AM, Simon Peyton Jones <[hidden email]> wrote:

Devs,

 

Shayan is working away on “Trees that grow”… do keep it on your radar:

 

To: ghc-devs
Sent: 25 May 2017 23:49

Do take a look at this:

 

·         We propose to re-engineer HsSyn itself.  This will touch a lot of code.

·         But it’s very neat, and will bring big long-term advantages

·         And we can do it a bit at a time

 

The wiki page https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow has the details.  It’s entirely an internal change, not a change to GHC’s specification, so it’s independent of the GHC proposals process.  But I’d value the opinion of other GHC devs

 

Meanwhile I have a question. When pretty-printing HsSyn we often have a situation like this:

 

  data Pass = Parsed | Renamed | Typechecked

 

  data HsExpr (p :: Pass) = HsVar (IdP p) | ....

 

  type famliy IdP p where

     IdP Parsed      = RdrName

     IdP Renamed     = Name

     IdP Typechecked = Id

 

  instance (Outputable (IdP p)) => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

The (ppr v) requires (Outputable (IdP p)), hence the context.

 

Moreover, and more seriously, there are things we just can't pretty-print

right now.  For example, HsExpr has this data constructor:

 

  data HsExpr p = ...

    | OpApp       (LHsExpr p)

                  (LHsExpr p)

                  (PostRn p Fixity)

                  (LHsExpr p)

 

To pretty-print the third argument, we'd need to add

 

  instance (Outputable (IdP p),

            Outputable (PostRn p Fixity))   -- New

        => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

and that gets onerous.  So today we never print these annotations, to avoid bloating the instance contexts, which can be painful.  It bit me yesterday.

 

We have bitten that bullet for the Data class: look at HsExtension.DataId, which abbreviates the long list of dictionaries:

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , Data (PostRn p (IdP p))

    , Data (PostRn p (Located Name))

    , Data (PostRn p Bool)

    , Data (PostRn p Fixity)

     ,..and nine more... )

 

Let me note in passing that [wiki:QuantifiedContexts] would make this somewhat shorter

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , forall t. Data t => Data (PostRn p t))

 

But we still need one item in this list for each type function,

and I am worried about how this scales to the

[wiki:ImplementingTreesThatGrow] story, when we have a type

function for each data constructor -- and there are a lot of data

constructors.

 

So I have four questions

 

  1. I think we should probably use a superclass instead of a type synonym

 

class (Data p, ForallX Data p, ....) => DataId p where {}

 

Why?  Only one argument to pass, and to pass on to successive calls.  I see no downside.

 

  1. Shall we treat Outputable like Data?  (I.e. make an abbreviation for a long list of Outputable instances and use it everywhere)
  2. I thought of another way to do it (pass a token); see below

 

  1. Are there any other ways?

 

 

Token passing idea.

 

Perhaps instead of passing lots of functions, we pass a singleton token

that encodes the pass, like this:

 

  instance (PassC p) => Outputable (HsExpr p) where

     ppr (HsVar v) = case getPass :: IsPass p of

                       IsParsed      -> ppr v

                       IsRenamed     -> ppr v

                       IsTypechecked -> ppr v

 

The three ppr's are at different types, of course; that's the point.

The infrastructure is something like

 

  class PassC p where

    getPass :: IsPass p

 

  data IsPass p where

    IsParsed      :: IsPass Parsed

    IsRenamed     :: IsParsed Renamed

    IsTypechecked :: IsParsed Typechecked

 

  instance PassC Parsed where getPass = IsParsed

    ...etc...

 

Now we could sweep away all those OutputableX classes,

replacing them with dynamic tests on the singletons IsParsed etc.

This would have advantages:

 

- Probably faster: there's a dynamic test, but many fewer dictionary

  arguments and higher-order function dispatch

 

- Only one dictionary to pass; programming is easier.

 

The big downside is that it's not extensible: it works only because

we know the three cases.  But the "Trees that Grow" story really doesn't

scale well to pretty-printing: so maybe we should just give up on that?

 




_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Operating on HsSyn

Alan & Kim Zimmerman
Well, in GHC as it stands now (from 8.2.1), the original formulation is correct.

The current pretty printer does not reproduce layout, but reparsing a pretty printed ParsedSource will faithfully reproduce the original ParsedSource (except for the Locations).

This is useful for constructing hsSyn fragments programmatically and generating usable source from them.

I am hoping (intending) that a future version will allow the other formulation too.

Alan

On 28 July 2017 at 19:38, Shayan Najd <[hidden email]> wrote:
by

 (parser . prettyPrint . parser) = id  

I meant 

 (prettyPrint . parser . prettyPrint) = id  

for a valid input. 


On Fri, Jul 28, 2017 at 4:32 PM, Shayan Najd <[hidden email]> wrote:
On the contrary, inside GHC I /do/ want to print them. Otherwise how can I see what the renamer has done?

Right. So if I understand correctly, with this semantics, `Outputable` is somewhere between pretty printing as often used in program manipulation libraries (like Haskell-Src-Exts (HSE)) which is closer to syntax, and `Show` which is closer to Haskell representation.
(There are also "exact printers" (as in HSE) that are even closer to syntax in some sense.) 
Often, pretty printers generate only grammatically valid terms, not the ones polluted with extra annotations (hence grammatically invalid), e.g., what is the grammatically valid form of `OpApp` printed via `Outputable` that includes the fixity annotation.  

If I recall correctly, we have briefly studied these in the past summer, we came up with some roundtrip correctness criteria, like the following (bar error handling; assume valid input):

 (parser . prettyPrint . parser) = id  
 
[paging in Jacques]

The reason I am trying to flesh out the semantics is the /big/ gains on code reuse later on in the process: one does not need to define a separate pretty printing library for Haskell syntax, and can reuse the well-tested and well-maintained one in GHC.

Reformulating part of your concern, based on my understanding (if I may), the questions is: what is the proper design of an "outputer" (debug-printer?) where /annotated/ terms can be pretty-printed including any printable (pretty?showable?) annotations.
In particular, we may want to take advantage of extensibility of data types for this.
Am I far off? 

Note: with proper design, an extensible debug-printer can still subsume corresponding pretty-printers.


On Fri, Jul 28, 2017 at 2:36 PM, Simon Peyton Jones <[hidden email]> wrote:

I have been under the impression that we don't even want to print those.

On the contrary, inside GHC I /do/ want to print them. Otherwise how can I see what the renamer has done?

 

Simon

 

From: Shayan Najd [mailto:[hidden email]]
Sent: 28 July 2017 12:20
To: Simon Peyton Jones <[hidden email]>
Cc: [hidden email]; Alan & Kim Zimmerman <[hidden email]>
Subject: Re: Operating on HsSyn

 

Before all this, we may need to discuss a bit about the intended semantics of

`Outputable`: does it need to print `PostRn`, or `PostTc` fields; or `Out`

suffixed constructors?  If not, then we only need to write a set of instances

for the base growable AST, once and for all.  Such instances will be polymorphic

on the extension descriptor `p`, and do not need to mention the constraints like

`(PostRn p (IdP p)`, since these are just extensions and not part of the base

growable AST.  Or, am I missing something about the intended semantics of

`Outputable`?

 

You write

 

So today we never print these annotations, to avoid bloating the instance contexts, which can be painful. 

 

I have been under the impression that we don't even want to print those.

 

Of course, there are scenarios (like `Show` instances) where we do want to write 

compositional / generic functions that take into account the extensions.

Here is my abstract overview of the scenario, that may help the discussion.

Consider data types `A`, `B`, and `C` (say, one AST datatype per compiler phase) that

 are defined as extensions to a base datatype `T`:

 

> A = T XA

> B = T XB

> C = T XC

 

where `X*`s are extension descriptors.

Now, say we want to a define functions `f_A`, `f_B`, and `f_C` over `A`, `B`, and `C`.

We have two main alternatives: 

(a) either we write these  (manually or using the deriving mechanism) separately

(b) or we write a generic / parametric function `g` over `T`, and reuse that to define `f_*`s

 

Of course, (b) is preferable in theory , but not always possible or preferable in practice.

In which case, we can always resort to (a).

The more varying are the definitions of `f_A`, `f_B`, and `f_C` the more parametric should 

`g` get, as this is the case for any generic function.

 

With a correct design, I believe, these are all independent of Trees that Grow story itself: 

we are now not only trying to reuse data types, and functions agnostic towards extensions

(pretty printers in my view of their semantics), but also reuse functions with parametric / 

varying behaviour with respect to extensions. 

 

/Shayan

 

 

 

On Fri, Jul 28, 2017 at 10:18 AM, Simon Peyton Jones <[hidden email]> wrote:

Devs,

 

Shayan is working away on “Trees that grow”… do keep it on your radar:

 

To: ghc-devs
Sent: 25 May 2017 23:49

Do take a look at this:

 

·         We propose to re-engineer HsSyn itself.  This will touch a lot of code.

·         But it’s very neat, and will bring big long-term advantages

·         And we can do it a bit at a time

 

The wiki page https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow has the details.  It’s entirely an internal change, not a change to GHC’s specification, so it’s independent of the GHC proposals process.  But I’d value the opinion of other GHC devs

 

Meanwhile I have a question. When pretty-printing HsSyn we often have a situation like this:

 

  data Pass = Parsed | Renamed | Typechecked

 

  data HsExpr (p :: Pass) = HsVar (IdP p) | ....

 

  type famliy IdP p where

     IdP Parsed      = RdrName

     IdP Renamed     = Name

     IdP Typechecked = Id

 

  instance (Outputable (IdP p)) => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

The (ppr v) requires (Outputable (IdP p)), hence the context.

 

Moreover, and more seriously, there are things we just can't pretty-print

right now.  For example, HsExpr has this data constructor:

 

  data HsExpr p = ...

    | OpApp       (LHsExpr p)

                  (LHsExpr p)

                  (PostRn p Fixity)

                  (LHsExpr p)

 

To pretty-print the third argument, we'd need to add

 

  instance (Outputable (IdP p),

            Outputable (PostRn p Fixity))   -- New

        => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

and that gets onerous.  So today we never print these annotations, to avoid bloating the instance contexts, which can be painful.  It bit me yesterday.

 

We have bitten that bullet for the Data class: look at HsExtension.DataId, which abbreviates the long list of dictionaries:

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , Data (PostRn p (IdP p))

    , Data (PostRn p (Located Name))

    , Data (PostRn p Bool)

    , Data (PostRn p Fixity)

     ,..and nine more... )

 

Let me note in passing that [wiki:QuantifiedContexts] would make this somewhat shorter

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , forall t. Data t => Data (PostRn p t))

 

But we still need one item in this list for each type function,

and I am worried about how this scales to the

[wiki:ImplementingTreesThatGrow] story, when we have a type

function for each data constructor -- and there are a lot of data

constructors.

 

So I have four questions

 

  1. I think we should probably use a superclass instead of a type synonym

 

class (Data p, ForallX Data p, ....) => DataId p where {}

 

Why?  Only one argument to pass, and to pass on to successive calls.  I see no downside.

 

  1. Shall we treat Outputable like Data?  (I.e. make an abbreviation for a long list of Outputable instances and use it everywhere)
  2. I thought of another way to do it (pass a token); see below

 

  1. Are there any other ways?

 

 

Token passing idea.

 

Perhaps instead of passing lots of functions, we pass a singleton token

that encodes the pass, like this:

 

  instance (PassC p) => Outputable (HsExpr p) where

     ppr (HsVar v) = case getPass :: IsPass p of

                       IsParsed      -> ppr v

                       IsRenamed     -> ppr v

                       IsTypechecked -> ppr v

 

The three ppr's are at different types, of course; that's the point.

The infrastructure is something like

 

  class PassC p where

    getPass :: IsPass p

 

  data IsPass p where

    IsParsed      :: IsPass Parsed

    IsRenamed     :: IsParsed Renamed

    IsTypechecked :: IsParsed Typechecked

 

  instance PassC Parsed where getPass = IsParsed

    ...etc...

 

Now we could sweep away all those OutputableX classes,

replacing them with dynamic tests on the singletons IsParsed etc.

This would have advantages:

 

- Probably faster: there's a dynamic test, but many fewer dictionary

  arguments and higher-order function dispatch

 

- Only one dictionary to pass; programming is easier.

 

The big downside is that it's not extensible: it works only because

we know the three cases.  But the "Trees that Grow" story really doesn't

scale well to pretty-printing: so maybe we should just give up on that?

 





_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Operating on HsSyn

MarLinn
In reply to this post by Shayan Najd Javadipour-2

by

 (parser . prettyPrint . parser) = id  

I meant 

(prettyPrint . parser . prettyPrint) = id

for a valid input.

Simplifying, (parser ∷ String → something), and (prettyPrint ∷ something → String).

Therefore, (parser . prettyPrint . parser ∷ String → something) and (prettyPrint . parser . prettyPrint ∷ something → String).

Therefore, both criteria could only apply for (something ~ String). But as pretty printing adds quotation marks, not even that is true.

There are four formulations that might be applicable:

  1. parser . prettyPrint ≍ id

  2. prettyPrint . parser id -- ∷ String → String, useless here

  3. prettyPrint . parser . prettyPrint prettyPrint

  4. parser . prettyPrint . parser parser

  5. Well, you could go beyond to (prettyPrint . parser . prettyPrint . parser prettyPrint . parser) etc…

I don't think 1 (or 2) follow from one of the last two. But 1 does imply them. So it is a stronger criterion than both, and therefore probably not the one to choose. Assuming the parser is internally consistent, 3 just says something about the internal consistency of the pretty printer, while 4 says something about the relationship of the pretty printer to the parser. Thus 4 looks like the best candidate for a criterion. Possibly with 3 as a secondary target.

Cheers,
MarLinn


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Operating on HsSyn

Alan & Kim Zimmerman
I agree. 4 is the current GHC invariant.

i.e., re-parsing a pretty printed parse tree gives you back a parse tree identical to the original (ignoring SrcSpans)

Alan

On 28 July 2017 at 20:34, MarLinn <[hidden email]> wrote:

by

 (parser . prettyPrint . parser) = id  

I meant 

(prettyPrint . parser . prettyPrint) = id

for a valid input.

Simplifying, (parser ∷ String → something), and (prettyPrint ∷ something → String).

Therefore, (parser . prettyPrint . parser ∷ String → something) and (prettyPrint . parser . prettyPrint ∷ something → String).

Therefore, both criteria could only apply for (something ~ String). But as pretty printing adds quotation marks, not even that is true.

There are four formulations that might be applicable:

  1. parser . prettyPrint ≍ id

  2. prettyPrint . parser id -- ∷ String → String, useless here

  3. prettyPrint . parser . prettyPrint prettyPrint

  4. parser . prettyPrint . parser parser

  5. Well, you could go beyond to (prettyPrint . parser . prettyPrint . parser prettyPrint . parser) etc…

I don't think 1 (or 2) follow from one of the last two. But 1 does imply them. So it is a stronger criterion than both, and therefore probably not the one to choose. Assuming the parser is internally consistent, 3 just says something about the internal consistency of the pretty printer, while 4 says something about the relationship of the pretty printer to the parser. Thus 4 looks like the best candidate for a criterion. Possibly with 3 as a secondary target.

Cheers,
MarLinn


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Operating on HsSyn

Shayan Najd Javadipour-2
MarLinn,
   Thanks for correcting me, and spelling this out. 
   I did mean what Alan mentioned: "re-parsing a pretty printed parse tree gives you back a parse tree identical to the original (ignoring SrcSpans)".
   As I recall, we had to go a bit further to give 'Something' some more structure to take into account things like "(ignoring SrcSpans)" (e.g., to define exact-printers, etc).
   Provided I have failed twice to properly recall the invariant, I refrain from trying to recall the rest tonight :) 

Not diverging from my point above, as far as I understand, an ideal `Outputable` machinery is going to be a bit different from the traditional pretty printers.
I believe with a proper design we can even reuse `Outputable` machinery and provide it as a pretty printer for Haskell terms.
It resembles the scenario in Section 3.7 compared to Section 3.6 of Trees that Grow [1].     
 
Having said all these, we ARE diverging from the original thread, and Simon's questions. 

How about taking printer-design related discussion to the following wiki page (and/or a new ghc-dev thread if needed):

Cheers,
  Shayan


On Fri, Jul 28, 2017 at 8:43 PM, Alan & Kim Zimmerman <[hidden email]> wrote:
I agree. 4 is the current GHC invariant.

i.e., re-parsing a pretty printed parse tree gives you back a parse tree identical to the original (ignoring SrcSpans)

Alan

On 28 July 2017 at 20:34, MarLinn <[hidden email]> wrote:

by

 (parser . prettyPrint . parser) = id  

I meant 

(prettyPrint . parser . prettyPrint) = id

for a valid input.

Simplifying, (parser ∷ String → something), and (prettyPrint ∷ something → String).

Therefore, (parser . prettyPrint . parser ∷ String → something) and (prettyPrint . parser . prettyPrint ∷ something → String).

Therefore, both criteria could only apply for (something ~ String). But as pretty printing adds quotation marks, not even that is true.

There are four formulations that might be applicable:

  1. parser . prettyPrint ≍ id

  2. prettyPrint . parser id -- ∷ String → String, useless here

  3. prettyPrint . parser . prettyPrint prettyPrint

  4. parser . prettyPrint . parser parser

  5. Well, you could go beyond to (prettyPrint . parser . prettyPrint . parser prettyPrint . parser) etc…

I don't think 1 (or 2) follow from one of the last two. But 1 does imply them. So it is a stronger criterion than both, and therefore probably not the one to choose. Assuming the parser is internally consistent, 3 just says something about the internal consistency of the pretty printer, while 4 says something about the relationship of the pretty printer to the parser. Thus 4 looks like the best candidate for a criterion. Possibly with 3 as a secondary target.

Cheers,
MarLinn


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Operating on HsSyn

Shayan Najd Javadipour-2
Simon, 

Back to your original questions:

1. I think we should probably use a superclass instead of a type synonym
 
class (Data p, ForallX Data p, ....) => DataId p where {}
 
Why?  Only one argument to pass, and to pass on to successive calls.  I see no downside.

Sounds good to me. I have not experimented with it yet, but I see no downside as well.

Shall we treat Outputable like Data?  (I.e. make an abbreviation for a long list of Outputable instances and use it everywhere)

Besides, the above point about semantics and reusing it as classical pretty-printer, I suggest a bit more fine-grained abbreviations, if we are going along this path:
 we define the abbreviations per datatype (as opposed to all), we define `ForallXNAME :: constraint -> *` (as in Section 3.7 of the paper) where `NAME` is name of the datatype.

However, we do NOT need Trees that Grow in its full generality: as far as I understand, we do not have GADTs or existentials in GHC ASTs.
This means we can use the simpler design of Trees that Grow that all extension families had the same arity and kind, then we could factor them all into one type family dispatching on some unique identifier.
(we used type-level strings, or promoted datatypes for annotations.)

For example, the following in absence of existentials

> type family XVar x a
> type family XApp x a
> type family XAbs x a
> data Exp x a 
>  = Var (XVar x a) a
>  | Abs (XAbs x a) a (Exp x a)
>  | App (XApp x a) (Exp x a) (Exp x a)
 
is as good as (one of our very first extensible encodings)

> type family XDispatch name x a
> data Lbl = VarL | AbsL | AppL
> data Exp x a 
>  = Var (XDispatch VarL x a) a
>  | Abs (XDispatch AbsL x a) a (Exp x a)
>  | App (XDispatch AppL x a) (Exp x a) (Exp x a)

We could do the same with promoted datatypes as annotations (instead of type-level strings).

The advantage of the simpler encoding is that now, in theory, we can define

> ForallXExp (c :: Constraint) x a = forall l. c (XDispatch l x a)

Which gives us for example `ForallXExp Outputable x a `.

Is this encoding faster, in comparison? Does it help?

/Shayan









On Fri, Jul 28, 2017 at 10:11 PM, Shayan Najd <[hidden email]> wrote:
MarLinn,
   Thanks for correcting me, and spelling this out. 
   I did mean what Alan mentioned: "re-parsing a pretty printed parse tree gives you back a parse tree identical to the original (ignoring SrcSpans)".
   As I recall, we had to go a bit further to give 'Something' some more structure to take into account things like "(ignoring SrcSpans)" (e.g., to define exact-printers, etc).
   Provided I have failed twice to properly recall the invariant, I refrain from trying to recall the rest tonight :) 

Not diverging from my point above, as far as I understand, an ideal `Outputable` machinery is going to be a bit different from the traditional pretty printers.
I believe with a proper design we can even reuse `Outputable` machinery and provide it as a pretty printer for Haskell terms.
It resembles the scenario in Section 3.7 compared to Section 3.6 of Trees that Grow [1].     
 
Having said all these, we ARE diverging from the original thread, and Simon's questions. 

How about taking printer-design related discussion to the following wiki page (and/or a new ghc-dev thread if needed):

Cheers,
  Shayan


On Fri, Jul 28, 2017 at 8:43 PM, Alan & Kim Zimmerman <[hidden email]> wrote:
I agree. 4 is the current GHC invariant.

i.e., re-parsing a pretty printed parse tree gives you back a parse tree identical to the original (ignoring SrcSpans)

Alan

On 28 July 2017 at 20:34, MarLinn <[hidden email]> wrote:

by

 (parser . prettyPrint . parser) = id  

I meant 

(prettyPrint . parser . prettyPrint) = id

for a valid input.

Simplifying, (parser ∷ String → something), and (prettyPrint ∷ something → String).

Therefore, (parser . prettyPrint . parser ∷ String → something) and (prettyPrint . parser . prettyPrint ∷ something → String).

Therefore, both criteria could only apply for (something ~ String). But as pretty printing adds quotation marks, not even that is true.

There are four formulations that might be applicable:

  1. parser . prettyPrint ≍ id

  2. prettyPrint . parser id -- ∷ String → String, useless here

  3. prettyPrint . parser . prettyPrint prettyPrint

  4. parser . prettyPrint . parser parser

  5. Well, you could go beyond to (prettyPrint . parser . prettyPrint . parser prettyPrint . parser) etc…

I don't think 1 (or 2) follow from one of the last two. But 1 does imply them. So it is a stronger criterion than both, and therefore probably not the one to choose. Assuming the parser is internally consistent, 3 just says something about the internal consistency of the pretty printer, while 4 says something about the relationship of the pretty printer to the parser. Thus 4 looks like the best candidate for a criterion. Possibly with 3 as a secondary target.

Cheers,
MarLinn


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs



_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs




_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: Operating on HsSyn

GHC - devs mailing list
In reply to this post by Shayan Najd Javadipour-2

Am I far off? 

I don’t think you are far off.  All I’m after is the debug-printer use-case: I want to be able to see the contents of the tree, including the PostRn annotations.  I don’t care about round-tripping it.

 

Simon

 

From: Shayan Najd [mailto:[hidden email]]
Sent: 28 July 2017 15:32
To: Simon Peyton Jones <[hidden email]>
Cc: [hidden email]; Alan & Kim Zimmerman <[hidden email]>; Jacques Carette <[hidden email]>
Subject: Re: Operating on HsSyn

 

On the contrary, inside GHC I /do/ want to print them. Otherwise how can I see what the renamer has done?

 

Right. So if I understand correctly, with this semantics, `Outputable` is somewhere between pretty printing as often used in program manipulation libraries (like Haskell-Src-Exts (HSE)) which is closer to syntax, and `Show` which is closer to Haskell representation.

(There are also "exact printers" (as in HSE) that are even closer to syntax in some sense.) 

Often, pretty printers generate only grammatically valid terms, not the ones polluted with extra annotations (hence grammatically invalid), e.g., what is the grammatically valid form of `OpApp` printed via `Outputable` that includes the fixity annotation.  

 

If I recall correctly, we have briefly studied these in the past summer, we came up with some roundtrip correctness criteria, like the following (bar error handling; assume valid input):

 

 (parser . prettyPrint . parser) = id  

 

[paging in Jacques]

 

The reason I am trying to flesh out the semantics is the /big/ gains on code reuse later on in the process: one does not need to define a separate pretty printing library for Haskell syntax, and can reuse the well-tested and well-maintained one in GHC.

 

Reformulating part of your concern, based on my understanding (if I may), the questions is: what is the proper design of an "outputer" (debug-printer?) where /annotated/ terms can be pretty-printed including any printable (pretty?showable?) annotations.

In particular, we may want to take advantage of extensibility of data types for this.

Am I far off? 

 

Note: with proper design, an extensible debug-printer can still subsume corresponding pretty-printers.

 

 

On Fri, Jul 28, 2017 at 2:36 PM, Simon Peyton Jones <[hidden email]> wrote:

I have been under the impression that we don't even want to print those.

On the contrary, inside GHC I /do/ want to print them. Otherwise how can I see what the renamer has done?

 

Simon

 

From: Shayan Najd [mailto:[hidden email]]
Sent: 28 July 2017 12:20
To: Simon Peyton Jones <[hidden email]>
Cc: [hidden email]; Alan & Kim Zimmerman <[hidden email]>
Subject: Re: Operating on HsSyn

 

Before all this, we may need to discuss a bit about the intended semantics of

`Outputable`: does it need to print `PostRn`, or `PostTc` fields; or `Out`

suffixed constructors?  If not, then we only need to write a set of instances

for the base growable AST, once and for all.  Such instances will be polymorphic

on the extension descriptor `p`, and do not need to mention the constraints like

`(PostRn p (IdP p)`, since these are just extensions and not part of the base

growable AST.  Or, am I missing something about the intended semantics of

`Outputable`?

 

You write

 

So today we never print these annotations, to avoid bloating the instance contexts, which can be painful. 

 

I have been under the impression that we don't even want to print those.

 

Of course, there are scenarios (like `Show` instances) where we do want to write 

compositional / generic functions that take into account the extensions.

Here is my abstract overview of the scenario, that may help the discussion.

Consider data types `A`, `B`, and `C` (say, one AST datatype per compiler phase) that

 are defined as extensions to a base datatype `T`:

 

> A = T XA

> B = T XB

> C = T XC

 

where `X*`s are extension descriptors.

Now, say we want to a define functions `f_A`, `f_B`, and `f_C` over `A`, `B`, and `C`.

We have two main alternatives: 

(a) either we write these  (manually or using the deriving mechanism) separately

(b) or we write a generic / parametric function `g` over `T`, and reuse that to define `f_*`s

 

Of course, (b) is preferable in theory , but not always possible or preferable in practice.

In which case, we can always resort to (a).

The more varying are the definitions of `f_A`, `f_B`, and `f_C` the more parametric should 

`g` get, as this is the case for any generic function.

 

With a correct design, I believe, these are all independent of Trees that Grow story itself: 

we are now not only trying to reuse data types, and functions agnostic towards extensions

(pretty printers in my view of their semantics), but also reuse functions with parametric / 

varying behaviour with respect to extensions. 

 

/Shayan

 

 

 

On Fri, Jul 28, 2017 at 10:18 AM, Simon Peyton Jones <[hidden email]> wrote:

Devs,

 

Shayan is working away on “Trees that grow”… do keep it on your radar:

 

To: ghc-devs
Sent: 25 May 2017 23:49

Do take a look at this:

 

·         We propose to re-engineer HsSyn itself.  This will touch a lot of code.

·         But it’s very neat, and will bring big long-term advantages

·         And we can do it a bit at a time

 

The wiki page https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow has the details.  It’s entirely an internal change, not a change to GHC’s specification, so it’s independent of the GHC proposals process.  But I’d value the opinion of other GHC devs

 

Meanwhile I have a question. When pretty-printing HsSyn we often have a situation like this:

 

  data Pass = Parsed | Renamed | Typechecked

 

  data HsExpr (p :: Pass) = HsVar (IdP p) | ....

 

  type famliy IdP p where

     IdP Parsed      = RdrName

     IdP Renamed     = Name

     IdP Typechecked = Id

 

  instance (Outputable (IdP p)) => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

The (ppr v) requires (Outputable (IdP p)), hence the context.

 

Moreover, and more seriously, there are things we just can't pretty-print

right now.  For example, HsExpr has this data constructor:

 

  data HsExpr p = ...

    | OpApp       (LHsExpr p)

                  (LHsExpr p)

                  (PostRn p Fixity)

                  (LHsExpr p)

 

To pretty-print the third argument, we'd need to add

 

  instance (Outputable (IdP p),

            Outputable (PostRn p Fixity))   -- New

        => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

 

and that gets onerous.  So today we never print these annotations, to avoid bloating the instance contexts, which can be painful.  It bit me yesterday.

 

We have bitten that bullet for the Data class: look at HsExtension.DataId, which abbreviates the long list of dictionaries:

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , Data (PostRn p (IdP p))

    , Data (PostRn p (Located Name))

    , Data (PostRn p Bool)

    , Data (PostRn p Fixity)

     ,..and nine more... )

 

Let me note in passing that [wiki:QuantifiedContexts] would make this somewhat shorter

 

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , forall t. Data t => Data (PostRn p t))

 

But we still need one item in this list for each type function,

and I am worried about how this scales to the

[wiki:ImplementingTreesThatGrow] story, when we have a type

function for each data constructor -- and there are a lot of data

constructors.

 

So I have four questions

 

  1. I think we should probably use a superclass instead of a type synonym

 

class (Data p, ForallX Data p, ....) => DataId p where {}

 

Why?  Only one argument to pass, and to pass on to successive calls.  I see no downside.

 

  1. Shall we treat Outputable like Data?  (I.e. make an abbreviation for a long list of Outputable instances and use it everywhere)
  2. I thought of another way to do it (pass a token); see below

 

  1. Are there any other ways?

 

 

Token passing idea.

 

Perhaps instead of passing lots of functions, we pass a singleton token

that encodes the pass, like this:

 

  instance (PassC p) => Outputable (HsExpr p) where

     ppr (HsVar v) = case getPass :: IsPass p of

                       IsParsed      -> ppr v

                       IsRenamed     -> ppr v

                       IsTypechecked -> ppr v

 

The three ppr's are at different types, of course; that's the point.

The infrastructure is something like

 

  class PassC p where

    getPass :: IsPass p

 

  data IsPass p where

    IsParsed      :: IsPass Parsed

    IsRenamed     :: IsParsed Renamed

    IsTypechecked :: IsParsed Typechecked

 

  instance PassC Parsed where getPass = IsParsed

    ...etc...

 

Now we could sweep away all those OutputableX classes,

replacing them with dynamic tests on the singletons IsParsed etc.

This would have advantages:

 

- Probably faster: there's a dynamic test, but many fewer dictionary

  arguments and higher-order function dispatch

 

- Only one dictionary to pass; programming is easier.

 

The big downside is that it's not extensible: it works only because

we know the three cases.  But the "Trees that Grow" story really doesn't

scale well to pretty-printing: so maybe we should just give up on that?

 

 


_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Loading...