TTG: Handling Source Locations

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

TTG: Handling Source Locations

Richard Eisenberg-4
Hi devs,

I just came across [TTG: Handling Source Locations], as I was poking around in RdrHsSyn and found wondrous things like (dL->L wiz waz) all over the place.


I see why this change is wanted and how the new version works.

It seems to me, though, that this move makes us *less typed*. That is, it would be very easy (and disastrous) to forget to match on a location node. For example, I can now do this:

> foo :: LPat p -> ...
> foo (VarPat ...) = ...

Note that I have declared that foo takes a located pat, but then I forgot to extract the location with dL. This would type-check, but it would fail. Previously, the type checker would ensure that I didn't forget to match on the L constructor. This error would get caught after some poking about, because foo just wouldn't work.

However, worse, we might forget to *add* a location when downstream functions expect one. This would be harder to detect, for two reasons:
1. The problem is caught at deconstruction, and figuring out where an object was constructed can be quite hard.
2. The problem might silently cause trouble, because dL won't actually fail on a node missing a location -- it just gives noSrcSpan. So the problem would manifest as a subtle degradation in the quality of an error message, perhaps not caught until several patches (or years!) later.

So I'm uncomfortable with this direction of travel.

Has this aspect of this design been brought up before? I have to say I don't have a great solution to suggest. Perhaps the best I can think of is to make Located a type family. It would branch on the type index to HsSyn types, introducing a Located node for GhcPass but not for other types. This Isn't really all that extensible (I think) and it gives special status to GHC's usage of the AST. But it seems to solve the immediate problems without the downside above.

Sorry for reopening something that has already been debated, but (unless I'm missing something) the current state of affairs seems like a potential wellspring of subtle bugs.

Thanks,
Richard

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Vladislav Zavialov-2
I wholly share this concern, which is why I commented on the Phab diff:

> Does this rely on the caller to call dL on the pattern? Very fragile, let's not do that.

In addition, I'm worried about illegal states where we end up with
multiple nested levels of `NewPat`, and calling `dL` once is not
sufficient.

As to the better solution, I think we should just go with Solution B
from the Wiki page. Yes, it's somewhat more boilerplate, but it
guarantees to have locations in the right places for all nodes. The
main argument against it was that we'd have to define `type instance
XThing (GhcPass p) = SrcSpan` for many a `Thing`, but I don't see it
as a downside at all. We should do so anyway, to get rid of parsing
API annotations and put them in the AST proper.

All the best,
Vladislav

On Sat, Feb 9, 2019 at 7:19 PM Richard Eisenberg <[hidden email]> wrote:

>
> Hi devs,
>
> I just came across [TTG: Handling Source Locations], as I was poking around in RdrHsSyn and found wondrous things like (dL->L wiz waz) all over the place.
>
> General outline: https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/HandlingSourceLocations
> Phab diff: https://phabricator.haskell.org/D5036
> Trac ticket: https://ghc.haskell.org/trac/ghc/ticket/15495
> Commit: https://gitlab.haskell.org/ghc/ghc/commit/509d5be69c7507ba5d0a5f39ffd1613a59e73eea
>
> I see why this change is wanted and how the new version works.
>
> It seems to me, though, that this move makes us *less typed*. That is, it would be very easy (and disastrous) to forget to match on a location node. For example, I can now do this:
>
> > foo :: LPat p -> ...
> > foo (VarPat ...) = ...
>
> Note that I have declared that foo takes a located pat, but then I forgot to extract the location with dL. This would type-check, but it would fail. Previously, the type checker would ensure that I didn't forget to match on the L constructor. This error would get caught after some poking about, because foo just wouldn't work.
>
> However, worse, we might forget to *add* a location when downstream functions expect one. This would be harder to detect, for two reasons:
> 1. The problem is caught at deconstruction, and figuring out where an object was constructed can be quite hard.
> 2. The problem might silently cause trouble, because dL won't actually fail on a node missing a location -- it just gives noSrcSpan. So the problem would manifest as a subtle degradation in the quality of an error message, perhaps not caught until several patches (or years!) later.
>
> So I'm uncomfortable with this direction of travel.
>
> Has this aspect of this design been brought up before? I have to say I don't have a great solution to suggest. Perhaps the best I can think of is to make Located a type family. It would branch on the type index to HsSyn types, introducing a Located node for GhcPass but not for other types. This Isn't really all that extensible (I think) and it gives special status to GHC's usage of the AST. But it seems to solve the immediate problems without the downside above.
>
> Sorry for reopening something that has already been debated, but (unless I'm missing something) the current state of affairs seems like a potential wellspring of subtle bugs.
>
> Thanks,
> Richard
> _______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Matthew Pickering
I just did this now, it was quite disconcerting that my code continued
to compile after applying `cL loc` to the return value of one of my
functions.

On Sat, Feb 9, 2019 at 5:40 PM Vladislav Zavialov <[hidden email]> wrote:

>
> I wholly share this concern, which is why I commented on the Phab diff:
>
> > Does this rely on the caller to call dL on the pattern? Very fragile, let's not do that.
>
> In addition, I'm worried about illegal states where we end up with
> multiple nested levels of `NewPat`, and calling `dL` once is not
> sufficient.
>
> As to the better solution, I think we should just go with Solution B
> from the Wiki page. Yes, it's somewhat more boilerplate, but it
> guarantees to have locations in the right places for all nodes. The
> main argument against it was that we'd have to define `type instance
> XThing (GhcPass p) = SrcSpan` for many a `Thing`, but I don't see it
> as a downside at all. We should do so anyway, to get rid of parsing
> API annotations and put them in the AST proper.
>
> All the best,
> Vladislav
>
> On Sat, Feb 9, 2019 at 7:19 PM Richard Eisenberg <[hidden email]> wrote:
> >
> > Hi devs,
> >
> > I just came across [TTG: Handling Source Locations], as I was poking around in RdrHsSyn and found wondrous things like (dL->L wiz waz) all over the place.
> >
> > General outline: https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/HandlingSourceLocations
> > Phab diff: https://phabricator.haskell.org/D5036
> > Trac ticket: https://ghc.haskell.org/trac/ghc/ticket/15495
> > Commit: https://gitlab.haskell.org/ghc/ghc/commit/509d5be69c7507ba5d0a5f39ffd1613a59e73eea
> >
> > I see why this change is wanted and how the new version works.
> >
> > It seems to me, though, that this move makes us *less typed*. That is, it would be very easy (and disastrous) to forget to match on a location node. For example, I can now do this:
> >
> > > foo :: LPat p -> ...
> > > foo (VarPat ...) = ...
> >
> > Note that I have declared that foo takes a located pat, but then I forgot to extract the location with dL. This would type-check, but it would fail. Previously, the type checker would ensure that I didn't forget to match on the L constructor. This error would get caught after some poking about, because foo just wouldn't work.
> >
> > However, worse, we might forget to *add* a location when downstream functions expect one. This would be harder to detect, for two reasons:
> > 1. The problem is caught at deconstruction, and figuring out where an object was constructed can be quite hard.
> > 2. The problem might silently cause trouble, because dL won't actually fail on a node missing a location -- it just gives noSrcSpan. So the problem would manifest as a subtle degradation in the quality of an error message, perhaps not caught until several patches (or years!) later.
> >
> > So I'm uncomfortable with this direction of travel.
> >
> > Has this aspect of this design been brought up before? I have to say I don't have a great solution to suggest. Perhaps the best I can think of is to make Located a type family. It would branch on the type index to HsSyn types, introducing a Located node for GhcPass but not for other types. This Isn't really all that extensible (I think) and it gives special status to GHC's usage of the AST. But it seems to solve the immediate problems without the downside above.
> >
> > Sorry for reopening something that has already been debated, but (unless I'm missing something) the current state of affairs seems like a potential wellspring of subtle bugs.
> >
> > Thanks,
> > Richard
> > _______________________________________________
> > ghc-devs mailing list
> > [hidden email]
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
> _______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Shayan Najd Javadipour-2
In reply to this post by Richard Eisenberg-4
Hi Richard,

> [Richard:]
> It seems to me, though, that this move makes us *less typed*.
> [and]
> However, worse, we might forget to *add* a location when downstream functions expect one.

We had a more sophisticated version of TTG that could support the
ping-pong style (and hence typed tagging of locations), but it came at
the price of more complicated encoding [0].
We have decided to abandon the more typed variant since tracking
whether a node is located or not is inherently a dynamic/run-time
process, not a static/compile-time process:
there are some nodes that are generated in the process by the compiler
by an arbitrary logic (hard to encode by types), hence have no
location (in the source code).

The types `LHsExpr`, `LPat`, and the like will be deleted! It will be
all `HsExpr`, `Pat` and the like.
Baking-in, e.g. `LHsExpr` into `HsExpr`, was a mistake in the first
place: we were cheating using `Maybe` type anyway, when for example an
`LHsExpr` was forcibly required but we had only `HsExpr` and used
`noLoc`.

> Sorry for reopening something that has already been debated, but (unless I'm missing something) the current state of affairs seems like a potential wellspring of subtle bugs.

We were really careful about the refactoring. The new code aside, I
don't see how we can introduce any bugs by the refactoring of the old
code explained in the wiki.
About the new code, the convention is straightforward: anytime you
destruct an AST node, assume a wrapper node inside (add an extra
case), or use the smart constructors/pattern synonyms.

I'd be happy to rediscuss the design space here. It would be great to
have everyone fully on board as it is not a trivial change.

/Shayan

[0] https://github.com/shayan-najd/HsAST/blob/master/Paper.pdf

On Sat, 9 Feb 2019 at 17:19, Richard Eisenberg <[hidden email]> wrote:

>
> Hi devs,
>
> I just came across [TTG: Handling Source Locations], as I was poking around in RdrHsSyn and found wondrous things like (dL->L wiz waz) all over the place.
>
> General outline: https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow/HandlingSourceLocations
> Phab diff: https://phabricator.haskell.org/D5036
> Trac ticket: https://ghc.haskell.org/trac/ghc/ticket/15495
> Commit: https://gitlab.haskell.org/ghc/ghc/commit/509d5be69c7507ba5d0a5f39ffd1613a59e73eea
>
> I see why this change is wanted and how the new version works.
>
> It seems to me, though, that this move makes us *less typed*. That is, it would be very easy (and disastrous) to forget to match on a location node. For example, I can now do this:
>
> > foo :: LPat p -> ...
> > foo (VarPat ...) = ...
>
> Note that I have declared that foo takes a located pat, but then I forgot to extract the location with dL. This would type-check, but it would fail. Previously, the type checker would ensure that I didn't forget to match on the L constructor. This error would get caught after some poking about, because foo just wouldn't work.
>
> However, worse, we might forget to *add* a location when downstream functions expect one. This would be harder to detect, for two reasons:
> 1. The problem is caught at deconstruction, and figuring out where an object was constructed can be quite hard.
> 2. The problem might silently cause trouble, because dL won't actually fail on a node missing a location -- it just gives noSrcSpan. So the problem would manifest as a subtle degradation in the quality of an error message, perhaps not caught until several patches (or years!) later.
>
> So I'm uncomfortable with this direction of travel.
>
> Has this aspect of this design been brought up before? I have to say I don't have a great solution to suggest. Perhaps the best I can think of is to make Located a type family. It would branch on the type index to HsSyn types, introducing a Located node for GhcPass but not for other types. This Isn't really all that extensible (I think) and it gives special status to GHC's usage of the AST. But it seems to solve the immediate problems without the downside above.
>
> Sorry for reopening something that has already been debated, but (unless I'm missing something) the current state of affairs seems like a potential wellspring of subtle bugs.
>
> Thanks,
> Richard
> _______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

RE: TTG: Handling Source Locations

GHC - devs mailing list
In reply to this post by Matthew Pickering
One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.   We can still say (Located t) in places where we want to guarantee a SrcSpan.

Yes, this lets us add more than one; that's redundant but not harmful.

Simon

|  -----Original Message-----
|  From: ghc-devs <[hidden email]> On Behalf Of Matthew
|  Pickering
|  Sent: 12 February 2019 09:08
|  To: Vladislav Zavialov <[hidden email]>
|  Cc: GHC <[hidden email]>
|  Subject: Re: TTG: Handling Source Locations
|  
|  I just did this now, it was quite disconcerting that my code continued to
|  compile after applying `cL loc` to the return value of one of my
|  functions.
|  
|  On Sat, Feb 9, 2019 at 5:40 PM Vladislav Zavialov <[hidden email]>
|  wrote:
|  >
|  > I wholly share this concern, which is why I commented on the Phab diff:
|  >
|  > > Does this rely on the caller to call dL on the pattern? Very fragile,
|  let's not do that.
|  >
|  > In addition, I'm worried about illegal states where we end up with
|  > multiple nested levels of `NewPat`, and calling `dL` once is not
|  > sufficient.
|  >
|  > As to the better solution, I think we should just go with Solution B
|  > from the Wiki page. Yes, it's somewhat more boilerplate, but it
|  > guarantees to have locations in the right places for all nodes. The
|  > main argument against it was that we'd have to define `type instance
|  > XThing (GhcPass p) = SrcSpan` for many a `Thing`, but I don't see it
|  > as a downside at all. We should do so anyway, to get rid of parsing
|  > API annotations and put them in the AST proper.
|  >
|  > All the best,
|  > Vladislav
|  >
|  > On Sat, Feb 9, 2019 at 7:19 PM Richard Eisenberg <[hidden email]>
|  wrote:
|  > >
|  > > Hi devs,
|  > >
|  > > I just came across [TTG: Handling Source Locations], as I was poking
|  around in RdrHsSyn and found wondrous things like (dL->L wiz waz) all
|  over the place.
|  > >
|  > > General outline:
|  > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgh
|  > > c.haskell.org%2Ftrac%2Fghc%2Fwiki%2FImplementingTreesThatGrow%2FHand
|  > > lingSourceLocations&amp;data=02%7C01%7Csimonpj%40microsoft.com%7C915
|  > > 2cd5c5b624a9fac5c08d690c9a908%7C72f988bf86f141af91ab2d7cd011db47%7C1
|  > > %7C0%7C636855593134767677&amp;sdata=I6kltUVNtcMItCao1dPvnM86%2FlE8ky
|  > > CwshV81dD6mbY%3D&amp;reserved=0 Phab diff:
|  > > https://phabricator.haskell.org/D5036
|  > > Trac ticket:
|  > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgh
|  > > c.haskell.org%2Ftrac%2Fghc%2Fticket%2F15495&amp;data=02%7C01%7Csimon
|  > > pj%40microsoft.com%7C9152cd5c5b624a9fac5c08d690c9a908%7C72f988bf86f1
|  > > 41af91ab2d7cd011db47%7C1%7C0%7C636855593134767677&amp;sdata=VeRbLhJD
|  > > ZQv%2FCZ39lMpwo2SRhmcyIsHRgwXNYDN28cA%3D&amp;reserved=0
|  > > Commit:
|  > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
|  > > tlab.haskell.org%2Fghc%2Fghc%2Fcommit%2F509d5be69c7507ba5d0a5f39ffd1
|  > > 613a59e73eea&amp;data=02%7C01%7Csimonpj%40microsoft.com%7C9152cd5c5b
|  > > 624a9fac5c08d690c9a908%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C
|  > > 636855593134767677&amp;sdata=nv9GjvSvGweBPmsHEVD1jBB7yz0Br0hDHtZ5Exv
|  > > uDqU%3D&amp;reserved=0
|  > >
|  > > I see why this change is wanted and how the new version works.
|  > >
|  > > It seems to me, though, that this move makes us *less typed*. That
|  is, it would be very easy (and disastrous) to forget to match on a
|  location node. For example, I can now do this:
|  > >
|  > > > foo :: LPat p -> ...
|  > > > foo (VarPat ...) = ...
|  > >
|  > > Note that I have declared that foo takes a located pat, but then I
|  forgot to extract the location with dL. This would type-check, but it
|  would fail. Previously, the type checker would ensure that I didn't
|  forget to match on the L constructor. This error would get caught after
|  some poking about, because foo just wouldn't work.
|  > >
|  > > However, worse, we might forget to *add* a location when downstream
|  functions expect one. This would be harder to detect, for two reasons:
|  > > 1. The problem is caught at deconstruction, and figuring out where an
|  object was constructed can be quite hard.
|  > > 2. The problem might silently cause trouble, because dL won't
|  actually fail on a node missing a location -- it just gives noSrcSpan. So
|  the problem would manifest as a subtle degradation in the quality of an
|  error message, perhaps not caught until several patches (or years!)
|  later.
|  > >
|  > > So I'm uncomfortable with this direction of travel.
|  > >
|  > > Has this aspect of this design been brought up before? I have to say
|  I don't have a great solution to suggest. Perhaps the best I can think of
|  is to make Located a type family. It would branch on the type index to
|  HsSyn types, introducing a Located node for GhcPass but not for other
|  types. This Isn't really all that extensible (I think) and it gives
|  special status to GHC's usage of the AST. But it seems to solve the
|  immediate problems without the downside above.
|  > >
|  > > Sorry for reopening something that has already been debated, but
|  (unless I'm missing something) the current state of affairs seems like a
|  potential wellspring of subtle bugs.
|  > >
|  > > Thanks,
|  > > Richard
|  > > _______________________________________________
|  > > ghc-devs mailing list
|  > > [hidden email]
|  > > https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmai
|  > > l.haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-devs&amp;data=02%
|  > > 7C01%7Csimonpj%40microsoft.com%7C9152cd5c5b624a9fac5c08d690c9a908%7C
|  > > 72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636855593134767677&amp;sd
|  > > ata=EAHftJhgmjTvm8f3de99dOFSoddfwu1KPoVHMwP1KtA%3D&amp;reserved=0
|  > _______________________________________________
|  > ghc-devs mailing list
|  > [hidden email]
|  > https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.
|  > haskell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-devs&amp;data=02%7C01
|  > %7Csimonpj%40microsoft.com%7C9152cd5c5b624a9fac5c08d690c9a908%7C72f988
|  > bf86f141af91ab2d7cd011db47%7C1%7C0%7C636855593134767677&amp;sdata=EAHf
|  > tJhgmjTvm8f3de99dOFSoddfwu1KPoVHMwP1KtA%3D&amp;reserved=0
|  _______________________________________________
|  ghc-devs mailing list
|  [hidden email]
|  https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.has
|  kell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-
|  devs&amp;data=02%7C01%7Csimonpj%40microsoft.com%7C9152cd5c5b624a9fac5c08d
|  690c9a908%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636855593134767677
|  &amp;sdata=EAHftJhgmjTvm8f3de99dOFSoddfwu1KPoVHMwP1KtA%3D&amp;reserved=0
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Vladislav Zavialov-2
> One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.

I claim an SrcSpan makes sense everywhere, so this is not a useful
distinction. Think about it as code provenance, an AST node always
comes from somewhere: a user-written .hs file, a GHCi command, or
compiler-generated code (via TH or deriving). We should never omit
this information from a node.

And when we are writing code that consumes an AST, it always makes
sense to ask what the provenance of a node is, for example to use it
in an error message.

> this lets us add more than one; that's redundant but not harmful

It goes against the philosophy of making illegal states
irrepresentable. Now all code must be careful not to end up in an
illegal state of nested SrcSpan, without any help from the
typechecker.

The code that pattern matches on an AST, at the same time, must be
prepared to handle this case anyway (or else we risk to crash), which
it currently does with stripSrcSpanPat in the implementation of dL.

And having to remember to apply dL when matching on the AST is more
trivia to learn and remember. Not even a warning if one forgets to do
that, no appropriate place to explain this to new contributors
(reading another Note just to start doing anything at all with the
AST? unnecessary friction), and only a test failure at best in case of
a mistake.

My concrete proposal: let's just put SrcSpan in the extension fields
of each node. In other words, take these lines

type instance XVarPat  (GhcPass _) = NoExt
type instance XLazyPat (GhcPass _) = NoExt
type instance XAsPat   (GhcPass _) = NoExt
type instance XParPat  (GhcPass _) = NoExt
type instance XBangPat (GhcPass _) = NoExt
...

and replace them with

type instance XVarPat  (GhcPass _) = SrcSpan
type instance XLazyPat (GhcPass _) = SrcSpan
type instance XAsPat   (GhcPass _) = SrcSpan
type instance XParPat  (GhcPass _) = SrcSpan
type instance XBangPat (GhcPass _) = SrcSpan
...

And don't bother with the HasSrcSpan class, don't define
composeSrcSpan and decomposeSrcSpan. Very straightforward and
beneficial for both producers and consumers of an AST.

All the best,
Vladislav
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Richard Eisenberg-4
In reply to this post by Shayan Najd Javadipour-2


> On Feb 12, 2019, at 5:19 AM, Shayan Najd <[hidden email]> wrote:
>
> About the new code, the convention is straightforward: anytime you
> destruct an AST node, assume a wrapper node inside (add an extra
> case), or use the smart constructors/pattern synonyms.

Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.

> On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs <[hidden email]> wrote:
>
> One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.

This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.

>   We can still say (Located t) in places where we want to guarantee a SrcSpan.

This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.

>
> Yes, this lets us add more than one; that's redundant but not harmful.

I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.


> On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov <[hidden email]> wrote:
>
> I claim an SrcSpan makes sense everywhere, so this is not a useful
> distinction. Think about it as code provenance, an AST node always
> comes from somewhere

I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.

> My concrete proposal: let's just put SrcSpan in the extension fields
> of each node

I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.

One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:

> data Phase = Parsed | Renamed
>
> data Exp p = Node (XNode p) Int
>
> type family XNode (p :: Phase)
> type instance XNode p = NodeExt p
>
> data NodeExt p where
>   NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
>
> type family RenamedOnly p t where
>   RenamedOnly Parsed _ = ()
>   RenamedOnly Renamed t = t
>
> example :: Exp Parsed
> example = Node (NodeExt { flag = True, fvs = () }) 5
>
> rename :: Exp Parsed -> Exp Renamed
> rename (Node ext n) = Node (ext { fvs = "xyz" }) n

Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.

Thanks,
Richard
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Shayan Najd Javadipour-2
> My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.

I am not sure if I understand: shouldn't the totality checker warn if
there is no pattern for the wrapper constructor (hence enforce the
convention)?


On Tue, 12 Feb 2019 at 15:19, Richard Eisenberg <[hidden email]> wrote:

>
>
>
> > On Feb 12, 2019, at 5:19 AM, Shayan Najd <[hidden email]> wrote:
> >
> > About the new code, the convention is straightforward: anytime you
> > destruct an AST node, assume a wrapper node inside (add an extra
> > case), or use the smart constructors/pattern synonyms.
>
> Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
>
> > On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs <[hidden email]> wrote:
> >
> > One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
>
> This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.
>
> >   We can still say (Located t) in places where we want to guarantee a SrcSpan.
>
> This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.
>
> >
> > Yes, this lets us add more than one; that's redundant but not harmful.
>
> I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
>
>
> > On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov <[hidden email]> wrote:
> >
> > I claim an SrcSpan makes sense everywhere, so this is not a useful
> > distinction. Think about it as code provenance, an AST node always
> > comes from somewhere
>
> I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.
>
> > My concrete proposal: let's just put SrcSpan in the extension fields
> > of each node
>
> I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.
>
> One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:
>
> > data Phase = Parsed | Renamed
> >
> > data Exp p = Node (XNode p) Int
> >
> > type family XNode (p :: Phase)
> > type instance XNode p = NodeExt p
> >
> > data NodeExt p where
> >   NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
> >
> > type family RenamedOnly p t where
> >   RenamedOnly Parsed _ = ()
> >   RenamedOnly Renamed t = t
> >
> > example :: Exp Parsed
> > example = Node (NodeExt { flag = True, fvs = () }) 5
> >
> > rename :: Exp Parsed -> Exp Renamed
> > rename (Node ext n) = Node (ext { fvs = "xyz" }) n
>
> Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.
>
> Thanks,
> Richard
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Shayan Najd Javadipour-2
> [Richard:] I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.

Depends on the semantics of `dL`: currently  (for `Pat`) it returns
the top-level `SrcSpan` and then the underlying node with all the
inner wrappers stripped away. So one use of `dL` is enough in this
semantic.
(see https://github.com/ghc/ghc/blob/master/compiler/hsSyn/HsPat.hs#L341)

> [Vlad:] As to the better solution, I think we should just go with Solution B from the Wiki page.
> [Richard:] I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favour.

It may help to identify at least three sorts of functions commonly
used *currently* in GHC when interacting with AST nodes (please add,
if I am missing some):
(a) those that ignore source locations;
(b) those that generically handle source locations regardless of the
constructor of the underlying node; and
(c) those that handle source locations case-by-case (often by nested
pattern matching).

The key issue with Solution B, as listed in the wiki, is that it ruins
the separation of two concerns in functions working on AST nodes:
handling source locations, and the actual logic of the function.
With the ping-pong style, handling of source locations is sometimes
refactored in a separate function, and with Solution A refactored in a
separate case/function clause.
With Solution B, however, every time we construct a node we should
have a source location ready to put into it.
That is, with Solution B, (a) and (b) are not cleanly implemented.

(I can explain more if not clear.)

/Shayan

On Tue, 12 Feb 2019 at 15:30, Shayan Najd <[hidden email]> wrote:

>
> > My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
>
> I am not sure if I understand: shouldn't the totality checker warn if
> there is no pattern for the wrapper constructor (hence enforce the
> convention)?
>
>
> On Tue, 12 Feb 2019 at 15:19, Richard Eisenberg <[hidden email]> wrote:
> >
> >
> >
> > > On Feb 12, 2019, at 5:19 AM, Shayan Najd <[hidden email]> wrote:
> > >
> > > About the new code, the convention is straightforward: anytime you
> > > destruct an AST node, assume a wrapper node inside (add an extra
> > > case), or use the smart constructors/pattern synonyms.
> >
> > Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
> >
> > > On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs <[hidden email]> wrote:
> > >
> > > One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
> >
> > This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.
> >
> > >   We can still say (Located t) in places where we want to guarantee a SrcSpan.
> >
> > This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.
> >
> > >
> > > Yes, this lets us add more than one; that's redundant but not harmful.
> >
> > I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
> >
> >
> > > On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov <[hidden email]> wrote:
> > >
> > > I claim an SrcSpan makes sense everywhere, so this is not a useful
> > > distinction. Think about it as code provenance, an AST node always
> > > comes from somewhere
> >
> > I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.
> >
> > > My concrete proposal: let's just put SrcSpan in the extension fields
> > > of each node
> >
> > I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.
> >
> > One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:
> >
> > > data Phase = Parsed | Renamed
> > >
> > > data Exp p = Node (XNode p) Int
> > >
> > > type family XNode (p :: Phase)
> > > type instance XNode p = NodeExt p
> > >
> > > data NodeExt p where
> > >   NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
> > >
> > > type family RenamedOnly p t where
> > >   RenamedOnly Parsed _ = ()
> > >   RenamedOnly Renamed t = t
> > >
> > > example :: Exp Parsed
> > > example = Node (NodeExt { flag = True, fvs = () }) 5
> > >
> > > rename :: Exp Parsed -> Exp Renamed
> > > rename (Node ext n) = Node (ext { fvs = "xyz" }) n
> >
> > Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.
> >
> > Thanks,
> > Richard
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Richard Eisenberg-4
In reply to this post by Shayan Najd Javadipour-2
That's true, but how would it play out in practice? For example, take a look at RnPat. There is a rnLPatAndThen which uses wrapSrcSpanCps to extract the location and then call rnPatAndThen. rnPatAndThen, in turn, just panics if it sees the extension point, because that's an unexpected constructor. Someone could easily call rnPatAndThen when they should call rnLPatAndThen. This would cause a panic.

There's also the problem that the pattern-match checker can't usefully look through view patterns. If there is a nested pattern-match (that is, we see dL->L _ (SomeOtherConstructor), then there is no way to guarantee a complete pattern-match short of a catch-all. So it doesn't seem to me that the pattern-match checker is really helping us achieve what we want here.

Richard

> On Feb 12, 2019, at 9:30 AM, Shayan Najd <[hidden email]> wrote:
>
>> My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
>
> I am not sure if I understand: shouldn't the totality checker warn if
> there is no pattern for the wrapper constructor (hence enforce the
> convention)?
>
>
> On Tue, 12 Feb 2019 at 15:19, Richard Eisenberg <[hidden email]> wrote:
>>
>>
>>
>>> On Feb 12, 2019, at 5:19 AM, Shayan Najd <[hidden email]> wrote:
>>>
>>> About the new code, the convention is straightforward: anytime you
>>> destruct an AST node, assume a wrapper node inside (add an extra
>>> case), or use the smart constructors/pattern synonyms.
>>
>> Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
>>
>>> On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs <[hidden email]> wrote:
>>>
>>> One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
>>
>> This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.
>>
>>>  We can still say (Located t) in places where we want to guarantee a SrcSpan.
>>
>> This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.
>>
>>>
>>> Yes, this lets us add more than one; that's redundant but not harmful.
>>
>> I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
>>
>>
>>> On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov <[hidden email]> wrote:
>>>
>>> I claim an SrcSpan makes sense everywhere, so this is not a useful
>>> distinction. Think about it as code provenance, an AST node always
>>> comes from somewhere
>>
>> I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.
>>
>>> My concrete proposal: let's just put SrcSpan in the extension fields
>>> of each node
>>
>> I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.
>>
>> One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:
>>
>>> data Phase = Parsed | Renamed
>>>
>>> data Exp p = Node (XNode p) Int
>>>
>>> type family XNode (p :: Phase)
>>> type instance XNode p = NodeExt p
>>>
>>> data NodeExt p where
>>>  NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
>>>
>>> type family RenamedOnly p t where
>>>  RenamedOnly Parsed _ = ()
>>>  RenamedOnly Renamed t = t
>>>
>>> example :: Exp Parsed
>>> example = Node (NodeExt { flag = True, fvs = () }) 5
>>>
>>> rename :: Exp Parsed -> Exp Renamed
>>> rename (Node ext n) = Node (ext { fvs = "xyz" }) n
>>
>> Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.
>>
>> Thanks,
>> Richard

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Shayan Najd Javadipour-2
> Someone could easily call rnPatAndThen when they should call rnLPatAndThen. This would cause a panic.

With Solution A, there shouldn't be two functions `rnLPatAndThen` and
`rnPatAndThen` anyways. There should be only `rnPatAndThen` with an
extra case for the wrapper node.

> There's also the problem that the pattern-match checker can't usefully look through view patterns.

Yes, I have reported it while back. I don't know of the progress in fixing this.

On Tue, 12 Feb 2019 at 16:24, Richard Eisenberg <[hidden email]> wrote:

>
> That's true, but how would it play out in practice? For example, take a look at RnPat. There is a rnLPatAndThen which uses wrapSrcSpanCps to extract the location and then call rnPatAndThen. rnPatAndThen, in turn, just panics if it sees the extension point, because that's an unexpected constructor. Someone could easily call rnPatAndThen when they should call rnLPatAndThen. This would cause a panic.
>
> There's also the problem that the pattern-match checker can't usefully look through view patterns. If there is a nested pattern-match (that is, we see dL->L _ (SomeOtherConstructor), then there is no way to guarantee a complete pattern-match short of a catch-all. So it doesn't seem to me that the pattern-match checker is really helping us achieve what we want here.
>
> Richard
>
> > On Feb 12, 2019, at 9:30 AM, Shayan Najd <[hidden email]> wrote:
> >
> >> My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
> >
> > I am not sure if I understand: shouldn't the totality checker warn if
> > there is no pattern for the wrapper constructor (hence enforce the
> > convention)?
> >
> >
> > On Tue, 12 Feb 2019 at 15:19, Richard Eisenberg <[hidden email]> wrote:
> >>
> >>
> >>
> >>> On Feb 12, 2019, at 5:19 AM, Shayan Najd <[hidden email]> wrote:
> >>>
> >>> About the new code, the convention is straightforward: anytime you
> >>> destruct an AST node, assume a wrapper node inside (add an extra
> >>> case), or use the smart constructors/pattern synonyms.
> >>
> >> Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
> >>
> >>> On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs <[hidden email]> wrote:
> >>>
> >>> One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
> >>
> >> This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.
> >>
> >>>  We can still say (Located t) in places where we want to guarantee a SrcSpan.
> >>
> >> This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.
> >>
> >>>
> >>> Yes, this lets us add more than one; that's redundant but not harmful.
> >>
> >> I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
> >>
> >>
> >>> On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov <[hidden email]> wrote:
> >>>
> >>> I claim an SrcSpan makes sense everywhere, so this is not a useful
> >>> distinction. Think about it as code provenance, an AST node always
> >>> comes from somewhere
> >>
> >> I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.
> >>
> >>> My concrete proposal: let's just put SrcSpan in the extension fields
> >>> of each node
> >>
> >> I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.
> >>
> >> One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:
> >>
> >>> data Phase = Parsed | Renamed
> >>>
> >>> data Exp p = Node (XNode p) Int
> >>>
> >>> type family XNode (p :: Phase)
> >>> type instance XNode p = NodeExt p
> >>>
> >>> data NodeExt p where
> >>>  NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
> >>>
> >>> type family RenamedOnly p t where
> >>>  RenamedOnly Parsed _ = ()
> >>>  RenamedOnly Renamed t = t
> >>>
> >>> example :: Exp Parsed
> >>> example = Node (NodeExt { flag = True, fvs = () }) 5
> >>>
> >>> rename :: Exp Parsed -> Exp Renamed
> >>> rename (Node ext n) = Node (ext { fvs = "xyz" }) n
> >>
> >> Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.
> >>
> >> Thanks,
> >> Richard
>
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Adam Gundry-2
While we're on the topic, is there any plan to get rid of all those
panics? AFAICS they are entirely unnecessary: we should just use an
empty datatype for unused constructor extension points, then we can
eliminate it to get whatever we like. See #15247.

Adam


On 12/02/2019 15:40, Shayan Najd wrote:

>> Someone could easily call rnPatAndThen when they should call rnLPatAndThen. This would cause a panic.
>
> With Solution A, there shouldn't be two functions `rnLPatAndThen` and
> `rnPatAndThen` anyways. There should be only `rnPatAndThen` with an
> extra case for the wrapper node.
>
>> There's also the problem that the pattern-match checker can't usefully look through view patterns.
>
> Yes, I have reported it while back. I don't know of the progress in fixing this.
>
> On Tue, 12 Feb 2019 at 16:24, Richard Eisenberg <[hidden email]> wrote:
>>
>> That's true, but how would it play out in practice? For example, take a look at RnPat. There is a rnLPatAndThen which uses wrapSrcSpanCps to extract the location and then call rnPatAndThen. rnPatAndThen, in turn, just panics if it sees the extension point, because that's an unexpected constructor. Someone could easily call rnPatAndThen when they should call rnLPatAndThen. This would cause a panic.
>>
>> There's also the problem that the pattern-match checker can't usefully look through view patterns. If there is a nested pattern-match (that is, we see dL->L _ (SomeOtherConstructor), then there is no way to guarantee a complete pattern-match short of a catch-all. So it doesn't seem to me that the pattern-match checker is really helping us achieve what we want here.
>>
>> Richard
>>
>>> On Feb 12, 2019, at 9:30 AM, Shayan Najd <[hidden email]> wrote:
>>>
>>>> My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
>>>
>>> I am not sure if I understand: shouldn't the totality checker warn if
>>> there is no pattern for the wrapper constructor (hence enforce the
>>> convention)?
>>>
>>>
>>> On Tue, 12 Feb 2019 at 15:19, Richard Eisenberg <[hidden email]> wrote:
>>>>
>>>>
>>>>
>>>>> On Feb 12, 2019, at 5:19 AM, Shayan Najd <[hidden email]> wrote:
>>>>>
>>>>> About the new code, the convention is straightforward: anytime you
>>>>> destruct an AST node, assume a wrapper node inside (add an extra
>>>>> case), or use the smart constructors/pattern synonyms.
>>>>
>>>> Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.
>>>>
>>>>> On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs <[hidden email]> wrote:
>>>>>
>>>>> One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.
>>>>
>>>> This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.
>>>>
>>>>>  We can still say (Located t) in places where we want to guarantee a SrcSpan.
>>>>
>>>> This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.
>>>>
>>>>>
>>>>> Yes, this lets us add more than one; that's redundant but not harmful.
>>>>
>>>> I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.
>>>>
>>>>
>>>>> On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov <[hidden email]> wrote:
>>>>>
>>>>> I claim an SrcSpan makes sense everywhere, so this is not a useful
>>>>> distinction. Think about it as code provenance, an AST node always
>>>>> comes from somewhere
>>>>
>>>> I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.
>>>>
>>>>> My concrete proposal: let's just put SrcSpan in the extension fields
>>>>> of each node
>>>>
>>>> I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.
>>>>
>>>> One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:
>>>>
>>>>> data Phase = Parsed | Renamed
>>>>>
>>>>> data Exp p = Node (XNode p) Int
>>>>>
>>>>> type family XNode (p :: Phase)
>>>>> type instance XNode p = NodeExt p
>>>>>
>>>>> data NodeExt p where
>>>>>  NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
>>>>>
>>>>> type family RenamedOnly p t where
>>>>>  RenamedOnly Parsed _ = ()
>>>>>  RenamedOnly Renamed t = t
>>>>>
>>>>> example :: Exp Parsed
>>>>> example = Node (NodeExt { flag = True, fvs = () }) 5
>>>>>
>>>>> rename :: Exp Parsed -> Exp Renamed
>>>>> rename (Node ext n) = Node (ext { fvs = "xyz" }) n
>>>>
>>>> Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.
>>>>
>>>> Thanks,
>>>> Richard

--
Adam Gundry, Haskell Consultant
Well-Typed LLP, https://www.well-typed.com/
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Ryan Scott
In reply to this post by Richard Eisenberg-4
> Yes, I have reported it while back. I don't know of the progress in fixing this.

Reported what? #15884? [1] You do realize that there is a very simple workaround for that issue, right? Instead of writing this, which is subject to the pattern-guard completeness issues observed in #15753:

    f :: Maybe a -> Bool
    f (id->Nothing)  = False
    f (id->(Just _)) = True

You can instead write this:

    f :: Maybe a -> Bool
    f (id -> x) =
      case x of
        Nothing -> False
        Just _  -> True

This will get proper coverage checking, which means that this technique could be used to remove all of the panicking catch-all cases brought about by dL view patterns.

Ryan S.
-----

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Ryan Scott
In reply to this post by Richard Eisenberg-4
Yes, I agree. This will require sprinkling the codebase with EmptyCase due to [1], but that's still a sight better than calling `panic`. After GHC 8.10 is released (and the minimum version of GHC that HEAD supports is 8.8), we can even remove these empty cases by making the empty data type fields strict (see [2]).

Ryan S.
-----

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Shayan Najd Javadipour-2
>is there any plan to get rid of all those panics?

There are two sorts of panics related to TTG: the ones due to #15247
(i.e. unused extension constructors), and the ones due to #15884 (i.e.
issues with view patterns).

About the former, I believe we all agree. Moreover, using Solution A
discussed above, there will be way less unused extension constructors
anyway: HsSyn types will use their extension constructors for the
location wrapper constructor.
About the latter, until #15247 is fixed, we can do rewrites as Ryan
suggests. Hopefully, there will also be less of such panics around
after making the code idiomatic to match Solution A discussed above.

/Shayan

On Wed, 13 Feb 2019 at 12:07, Ryan Scott <[hidden email]> wrote:

>
> Yes, I agree. This will require sprinkling the codebase with EmptyCase due to [1], but that's still a sight better than calling `panic`. After GHC 8.10 is released (and the minimum version of GHC that HEAD supports is 8.8), we can even remove these empty cases by making the empty data type fields strict (see [2]).
>
> Ryan S.
> -----
> [1] https://ghc.haskell.org/trac/ghc/ticket/15247#comment:4
> [2] https://ghc.haskell.org/trac/ghc/ticket/15305
> _______________________________________________
> ghc-devs mailing list
> [hidden email]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
Reply | Threaded
Open this post in threaded view
|

Re: TTG: Handling Source Locations

Shayan Najd Javadipour-2
* "About the latter, until #15247 is fixed" ---> "About the latter,
until #15884 is fixed"

On Wed, 13 Feb 2019 at 14:00, Shayan Najd <[hidden email]> wrote:

>
> >is there any plan to get rid of all those panics?
>
> There are two sorts of panics related to TTG: the ones due to #15247
> (i.e. unused extension constructors), and the ones due to #15884 (i.e.
> issues with view patterns).
>
> About the former, I believe we all agree. Moreover, using Solution A
> discussed above, there will be way less unused extension constructors
> anyway: HsSyn types will use their extension constructors for the
> location wrapper constructor.
> About the latter, until #15247 is fixed, we can do rewrites as Ryan
> suggests. Hopefully, there will also be less of such panics around
> after making the code idiomatic to match Solution A discussed above.
>
> /Shayan
>
> On Wed, 13 Feb 2019 at 12:07, Ryan Scott <[hidden email]> wrote:
> >
> > Yes, I agree. This will require sprinkling the codebase with EmptyCase due to [1], but that's still a sight better than calling `panic`. After GHC 8.10 is released (and the minimum version of GHC that HEAD supports is 8.8), we can even remove these empty cases by making the empty data type fields strict (see [2]).
> >
> > Ryan S.
> > -----
> > [1] https://ghc.haskell.org/trac/ghc/ticket/15247#comment:4
> > [2] https://ghc.haskell.org/trac/ghc/ticket/15305
> > _______________________________________________
> > ghc-devs mailing list
> > [hidden email]
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs