Update on HIE Files


Zubin Duggal
Hello all,

I've been working on the HIE Files (https://ghc.haskell.org/trac/ghc/wiki/HIEFiles) GSoC project.

The design of the data structure, as well as the traversal of GHC's ASTs to collect all the relevant info, is mostly complete.

We traverse the Renamed and Typechecked ASTs to collect the following info about each SrcSpan:

1) Its type, if it corresponds to a binding, pattern or expression
2) Details about any tokens in the original source corresponding to this span (keywords, symbols, etc.)
3) The set of Constructor/Type pairs that correspond to this span in the GHC AST
4) Details about all the identifiers that occur at this SrcSpan
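To make the four kinds of per-span information concrete, here is a minimal Haskell sketch of a node record. All names (NodeInfo, mergeNodeInfo, and the String stand-ins for GHC's Type and Name) are hypothetical, not the project's actual definitions. Because item 3 is a set (several AST nodes can share one span), the sketch merges information for the same span by a set-like union.

```haskell
import Data.List (nub, sort)

-- Hypothetical, simplified stand-ins for GHC's Type and Name.
type TypeStr = String
type Ident   = String

-- Information recorded for one source span (illustrative names only).
data NodeInfo = NodeInfo
  { nodeTypes       :: [TypeStr]          -- 1) types of bindings/patterns/expressions
  , nodeTokens      :: [String]           -- 2) source tokens: keywords, symbols, ...
  , nodeAnnotations :: [(String, String)] -- 3) (AST constructor, AST type) pairs
  , nodeIdents      :: [Ident]            -- 4) identifiers occurring at this span
  } deriving (Eq, Show)

-- Several AST nodes can end up with the same span, so their information
-- is combined field by field: concatenate, sort, and drop duplicates.
mergeNodeInfo :: NodeInfo -> NodeInfo -> NodeInfo
mergeNodeInfo a b = NodeInfo
  { nodeTypes       = unionOn nodeTypes
  , nodeTokens      = unionOn nodeTokens
  , nodeAnnotations = unionOn nodeAnnotations
  , nodeIdents      = unionOn nodeIdents
  }
  where
    unionOn :: Ord x => (NodeInfo -> [x]) -> [x]
    unionOn field = nub (sort (field a ++ field b))
```

Keeping each field as a sorted, duplicate-free list makes merging order-independent, which matters when the renamed and typechecked traversals visit the same span.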

For each occurrence of an identifier (Name or ModuleName), we store its type (if it has one) and classify it as one of the following, based on how it occurs:

1) Use
2) Import/Export
3) Pattern Binding, along with the scope of the binding and, if it occurs as part of a top-level declaration, a do binding or a let/where binding, the span of the entire binding location (including the RHS)
4) Value Binding, along with whether or not it is an instance binding, its scope, and the span of its entire binding site, including the RHS
5) Type Declaration, in a class or regular (foo :: ...)
6) Declaration (class, type, instance, data, type family, etc.)
7) Type variable binding, along with its scope (which takes ScopedTypeVariables into account)
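The seven occurrence kinds map naturally onto a Haskell sum type. The sketch below is hypothetical: the constructor names, the simplified Span, and the helper functions are illustrative assumptions, not the project's definitions. It shows which kinds carry a scope and which carry a full binding site.

```haskell
-- Hypothetical, simplified model of identifier-occurrence contexts.
type Pos  = (Int, Int)              -- (line, column)
data Span = Span Pos Pos deriving (Eq, Show)

data BindKind = RegularBind | InstanceBind deriving (Eq, Show)
data SigKind  = ClassSig | RegularSig deriving (Eq, Show)
data DeclKind = ClassDecl | TypeDecl | InstanceDecl | DataDecl | FamilyDecl
  deriving (Eq, Show)

data ContextInfo
  = Use
  | ImportExport
  | PatternBind Span (Maybe Span)   -- scope; whole binding site, if recorded
  | ValueBind BindKind Span Span    -- instance or not; scope; whole binding site
  | TypeSig SigKind                 -- foo :: ...
  | Declaration DeclKind
  | TyVarBind Span                  -- scope, accounting for ScopedTypeVariables
  deriving (Eq, Show)

-- Scope introduced by a binding occurrence; Nothing for non-binders.
bindingScope :: ContextInfo -> Maybe Span
bindingScope (PatternBind sc _) = Just sc
bindingScope (ValueBind _ sc _) = Just sc
bindingScope (TyVarBind sc)     = Just sc
bindingScope _                  = Nothing

-- Whole binding site (including the RHS), when one is recorded.
bindingSite :: ContextInfo -> Maybe Span
bindingSite (PatternBind _ site) = site
bindingSite (ValueBind _ _ site) = Just site
bindingSite _                    = Nothing
```

Pattern bindings carry a Maybe because, as described above, the full site is only recorded for top-level, do, and let/where bindings.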

I have updated the wiki page with more details about the Scopes associated with bindings: https://ghc.haskell.org/trac/ghc/wiki/HIEFiles#Scopeinformationaboutsymbols

These annotated SrcSpans are then arranged into an interval/rose tree to aid lookups.
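One way such a tree could be built and queried is sketched below. It relies on the assumption, stated below, that spans never partially overlap. Everything here (SpanTree, buildForest, lookupInnermost, and half-open spans) is hypothetical and for illustration only. Sorting spans by start ascending and end descending ensures every enclosing span is inserted before the spans it contains, and equal spans are merged with a caller-supplied function.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

type Pos  = (Int, Int)                        -- (line, column), compared lexicographically
data Span = Span Pos Pos deriving (Eq, Show)  -- half-open: [start, end)

-- Rose tree over spans: each node's children lie inside its span.
data SpanTree a = Node Span a [SpanTree a] deriving Show

contains :: Span -> Span -> Bool
contains (Span s1 e1) (Span s2 e2) = s1 <= s2 && e2 <= e1

-- Insert one annotated span. Assumes no partial overlaps and that parents
-- are inserted before their descendants; equal spans are merged.
insertSpan :: (a -> a -> a) -> (Span, a) -> [SpanTree a] -> [SpanTree a]
insertSpan combine (sp, x) forest =
  case break (\(Node s _ _) -> s `contains` sp) forest of
    (before, Node s y kids : after)
      | s == sp   -> before ++ Node s (combine y x) kids : after
      | otherwise -> before ++ Node s y (insertSpan combine (sp, x) kids) : after
    (_, []) -> forest ++ [Node sp x []]

-- Sort by start ascending, then end descending, so enclosing spans come first.
buildForest :: (a -> a -> a) -> [(Span, a)] -> [SpanTree a]
buildForest combine = foldl (flip (insertSpan combine)) [] . sortBy (comparing key)
  where key (Span s e, _) = (s, Down e)

-- Innermost annotated span containing a position, if any.
lookupInnermost :: Pos -> [SpanTree a] -> Maybe (Span, a)
lookupInnermost p forest =
  case [t | t@(Node (Span s e) _ _) <- forest, s <= p, p < e] of
    Node sp x kids : _ -> maybe (Just (sp, x)) Just (lookupInnermost p kids)
    []                 -> Nothing
```

A lookup descends only into the single child containing the position, so its cost is proportional to the nesting depth rather than to the number of spans in the file.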

We assume that SrcSpans never partially overlap: any two SrcSpans that occur in the Renamed/Typechecked ASTs are either equal, disjoint, or one is strictly contained in the other. This assumption has mostly held so far while testing on the entire ghc:HEAD tree. The one exception was a case where the typechecker strips out parentheses present in the original source, which has been patched (see https://ghc.haskell.org/trac/ghc/ticket/15242).
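The nesting assumption can be checked mechanically. Below is a brute-force, quadratic sketch that classifies each pair of spans and reports partial overlaps. The names and the half-open span convention are assumptions; this is one possible test harness, not necessarily the check used in the project.

```haskell
type Pos  = (Int, Int)                        -- (line, column)
data Span = Span Pos Pos deriving (Eq, Show)  -- half-open: [start, end)

data Relation = Equal | Disjoint | Contains | ContainedIn | PartialOverlap
  deriving (Eq, Show)

-- How the first span relates to the second.
relate :: Span -> Span -> Relation
relate a@(Span s1 e1) b@(Span s2 e2)
  | a == b               = Equal
  | e1 <= s2 || e2 <= s1 = Disjoint
  | s1 <= s2 && e2 <= e1 = Contains
  | s2 <= s1 && e1 <= e2 = ContainedIn
  | otherwise            = PartialOverlap

-- Every pair of spans violating the nesting assumption
-- (quadratic, which is acceptable for a test harness).
violations :: [Span] -> [(Span, Span)]
violations spans =
  [ (a, b)
  | (i, a) <- indexed, (j, b) <- indexed
  , i < j, relate a b == PartialOverlap ]
  where indexed = zip [0 :: Int ..] spans
```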

I have also written functions that look up the binding site (including the RHS) and scope of an identifier from the tree. When tested on the ghc:HEAD tree, they succeed in looking up scopes for almost all symbol occurrences in all source files, and I've also verified that the calculated scope contains all the occurrences of the symbol. The only cases where this check fails are those where the SrcSpans have been mangled by CPP (see https://ghc.haskell.org/trac/ghc/ticket/15279).
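The scope sanity check can be expressed as a small predicate. This sketch assumes a simplified scope model (either the whole module or a single region) and uses hypothetical names. Occurrences that fall outside the computed scope, such as ones whose spans were mangled by CPP, are returned so they can be inspected.

```haskell
type Pos  = (Int, Int)                        -- (line, column)
data Span = Span Pos Pos deriving (Eq, Show)

-- Simplified scope model (an assumption): the whole module or one region.
data Scope = WholeModule | Region Span deriving (Eq, Show)

inScope :: Scope -> Span -> Bool
inScope WholeModule _                    = True
inScope (Region (Span s' e')) (Span s e) = s' <= s && e <= e'

-- Occurrences that fall outside the computed scope.
escapingOccurrences :: Scope -> [Span] -> [Span]
escapingOccurrences sc = filter (not . inScope sc)

-- The sanity check: every occurrence of the symbol lies inside its scope.
scopeCoversOccurrences :: Scope -> [Span] -> Bool
scopeCoversOccurrences sc = null . escapingOccurrences sc
```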
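
The code for this currently lives here:
https://github.com/haskell/haddock/compare/ghc-head...wz1000:hiefile-2
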
Moving forward, the plan for the rest of the summer is:

1) Move this into the GHC tree and add a flag that controls whether .hie files are generated
2) Write serializers and deserializers for this info
3) Teach the GHC PackageDb about .hie files
4) Rewrite Haddock's --hyperlinked-source to use .hie files

_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Update on HIE Files

Matthew Pickering
Have you considered how this feature interacts with source plugins?

Could the generation of these files be implemented as a source plugin?
That would mean that development of the feature would not be coupled
to GHC releases.

Cheers,

Matt

On Tue, Jun 26, 2018 at 11:48 AM, Zubin Duggal wrote:
> [...]

Re: Update on HIE Files

Zubin Duggal
Hey Matt,

In principle, there should be no problem interacting with source plugins, or implementing this as a source plugin, given that the generating function has type:

enrichHie :: GhcMonad m => TypecheckedSource -> RenamedSource -> m (HieAST Type)

The only reason the GhcMonad constraint is needed, rather than this being a pure function, is that deSugarExpr has type

deSugarExpr :: HscEnv -> LHsExpr GhcTc -> IO (Messages, Maybe CoreExpr)

so we need a GhcMonad to get the HscEnv. We need to desugar expressions to get their types.
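A toy model of the constraint described above: the traversal is otherwise pure and only needs an environment (here Env, standing in for HscEnv) plus IO to compute types. So any monad that can supply that environment can drive it. HasEnv plays the role that GhcMonad's getSession plays. All names are hypothetical stand-ins, and no real GHC API is used.

```haskell
import Control.Monad.IO.Class (MonadIO (..))

-- Hypothetical stand-ins: Env for HscEnv, Expr for LHsExpr GhcTc.
newtype Env  = Env { envTypes :: [(String, String)] }
newtype Expr = Expr String

-- Mirrors the shape of GhcMonad: a monad from which the session can be read.
class MonadIO m => HasEnv m where
  getEnv :: m Env

-- Stand-in for "desugar, then read off the type": it needs the Env and IO,
-- just as deSugarExpr needs an HscEnv and runs in IO.
typeOfExpr :: Env -> Expr -> IO (Maybe String)
typeOfExpr env (Expr e) = pure (lookup e (envTypes env))

-- Only the type lookup touches the environment; the rest is plain traversal.
annotate :: HasEnv m => [Expr] -> m [(String, Maybe String)]
annotate exprs = do
  env <- getEnv
  liftIO (mapM (\ex@(Expr name) -> (,) name <$> typeOfExpr env ex) exprs)

-- A minimal reader-over-IO monad: one possible context able to supply Env.
newtype App a = App { runApp :: Env -> IO a }

instance Functor App where
  fmap f (App g) = App (fmap f . g)
instance Applicative App where
  pure x = App (const (pure x))
  App f <*> App x = App (\env -> f env <*> x env)
instance Monad App where
  App m >>= k = App (\env -> m env >>= \a -> runApp (k a) env)
instance MonadIO App where
  liftIO io = App (const io)
instance HasEnv App where
  getEnv = App pure
```

The design point is that the traversal is polymorphic in the monad, so it can run in any context that can provide the session.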

However, in a private email exchange with Németh Boldizsár about implementing this as a source plugin, I raised the following concerns:

1. Since HIE files are going to be used for Haddock generation, and Haddock is a pretty important part of the Haskell ecosystem, GHC should be able to produce them by default without needing to install anything else.
2. Integrating HIE file generation into GHC itself will push the burden of maintaining support onto whoever makes breaking changes to GHC, instead of whoever ends up maintaining the source plugin. This way, HIE files can be first-class citizens and evolve with GHC.
3. Portability of source plugins: this should work at least wherever Haddock currently works.
4. I believe there are some issues with how plugins interact with GHC's recompilation avoidance. Given that HIE files are also meant for interactive use via haskell-ide-engine, this is a pretty big dealbreaker.

I understand (4) has been solved now, but the first three still remain.

On 26 June 2018 at 16:23, Matthew Pickering wrote:
> [...]

Re: Update on HIE Files

Gershom Bazerman
Another reason this should probably go into mainline GHC rather than a plugin is that, as Zubin was explaining to me, the mechanisms introduced could improve and generalize the “:set +c” family of type and location information provided by GHCi (https://downloads.haskell.org/~ghc/master/users-guide/ghci.html#ghci-cmd-:set%20+c).

—g


On June 26, 2018 at 7:09:23 AM, Zubin Duggal wrote:
> [...]