Tracking down instances from use-sites


Tracking down instances from use-sites

Christopher Done-2
Hi all,

Given a TypecheckedModule and a Var expression retrieved from the AST,
what's the most direct way to determine:

1) that it's a class method e.g. `read`
2) that it's a generic call (no instance chosen) e.g. `Read a => String -> a`
3) or if it's a resolved instance, then which instance is it and which
package, module and declaration is that defined in?

Starting with this file that has a TypecheckedModule in it:
https://gist.github.com/chrisdone/6fcb9f1cba6324148d481fcd4eab6af6#file-ghc-api-hs-L23

I presume at this point that instance resolution has taken place. I'm
not sure whether dictionaries or chosen instances are inserted into the
AST, or whether just the resolved types are inserted e.g. `String ->
Int`, where I want e.g. `Read Int`, which might lead me to finding
the matching instance from an InstEnv or so.

I'd like to do some analyses of Haskell codebases, and the fact that
calls to class methods are opaque is a bit of a roadblock. Any
handy tips? Prior work?

It'd be neat in tooling to just hit a goto-definition key on `read`
and be taken to the instance implementation rather than the class
definition.

Also, listing all functions that use throw# or functions defined in
terms of throw# or FFI calls would be helpful, especially for doing
audits. If I could immediately list all partial functions in a
project, then list all call-sites, it would be a very convenient way
when doing an audit to see whether partial functions (such as head)
are used with the proper preconditions or not.

Any tips appreciated,

Chris
_______________________________________________
ghc-devs mailing list
[hidden email]
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Re: Tracking down instances from use-sites

Matthew Pickering
Chris,

I have also considered this question.

1. Look at the `idDetails` of the `Id`. A class selector is a `ClassOpId`.
2 and 3:

When a class selector `foo` is typechecked, the instance information
is of course resolved. The selector `foo` is then wrapped in a
`HsWrapper` which when desugared will apply the type arguments and
dictionary arguments.
Thus, in order to understand what instance has been selected, we need
to look into the `HsWrapper`. In particular, one of the constructors
is the `WpEvApp` constructor which is what will apply the dictionary
argument.
In case 2, the dictionary's type will mention a type variable (e.g.
`Show a`). In case 3, it will be the dictionary for a concrete type.
I'm not sure how to distinguish these two cases easily. Then once you
have the dictionary id, you can use `idType` to get the type of the
dictionary, which will be something like `Show ()` and tells you which
instance was selected.

You can inspect the AST of a typechecked program using the
`-ddump-tc-ast` flag.

Finally, you should consider writing this as a source plugin rather
than using the GHC API, as it will be easier to run in a variety of
different scenarios.

Cheers,

Matt

(In reply to Christopher Done's message of Tue, Jun 26, 2018 at 4:40 PM.)

Re: Tracking down instances from use-sites

Christopher Done-2
>  The selector `foo` is then wrapped in a
> `HsWrapper` which when desugared will apply the type arguments and
> dictionary arguments.

Nice! I'll give this a try and report back. Thanks.

> Finally, you should consider writing this as a source plugin rather
> than using the GHC API as it will be easier to run in a variety of
> different scenarios.

It took me a few minutes to find what you meant. For posterity, I
think "frontend plugins" is the name:
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/extending_ghc.html#frontend-plugins

That sounds like a good idea. This is the first time I've seen this
feature of GHC.

Cheers!



(In reply to Matthew Pickering's message of Tue, 26 Jun 2018 at 17:19.)

Re: Tracking down instances from use-sites

Matthew Pickering
Sorry, they are not "frontend plugins" but a new feature that will be
in GHC 8.6.

They are an implementation of this GHC proposal.
https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0017-source-plugins.rst

There is also this thread last year about the same topic which Simon
answered in the same way that I did but you may find either
explanation more useful.

https://mail.haskell.org/pipermail/ghc-devs/2017-October/014826.html

Cheers,

Matt

(In reply to Christopher Done's message of Tue, Jun 26, 2018 at 6:04 PM.)

Re: Tracking down instances from use-sites

Ben Gamari-3
In reply to this post by Christopher Done-2
Christopher Done <[hidden email]> writes:

> It'd be neat in tooling to just hit a goto-definition key on `read`
> and be taken to the instance implementation rather than the class
> definition.
>
Indeed that would be great.

I believe (1) is quite straightforward: You can recognize a class
operation by looking at the function's IdDetails (specifically looking
for ClassOpId). This contains the Class to which the method belongs.

Getting back to the instance is a bit trickier. I'll admit I don't know
whether there is a convenient way to do this. However, I can try to fill
in some background and give a few ideas. First, let's review
how typeclass evidence is represented in HsSyn (apologies if this is
already known). For concreteness, let's consider the program,

    showList :: Show a => [a] -> String
    showList x = show x

After typechecking this will likely turn into something like (taken from
the output of -ddump-tc -fprint-typechecker-elaboration):

    AbsBindsSig [a_a1hj] [$dShow_a1hl]
        {Exported type: Hi.showList :: forall a. Show a => [a] -> String
                        [LclId]
        Bind: showList_a1hk x_azo = show @ [a_a1hj] $dShow_a1hn x_azo
        Evidence: EvBinds{[W] $dShow_a1hn
                            = GHC.Show.$fShow[] @[a_a1hj] [$dShow_a1hl]}}

This AbsBindsSig represents a binding abstracted over a dictionary
argument ($dShow_a1hl :: Show a_a1hj). The "Evidence" section gives
a list of evidence bindings which the desugarer will wrap the RHS in; in
this case the typechecker has built a `Show [a_a1hj]` instance from the
`Show a => Show [a]` instance defined in GHC.Show and the abstracted
`$dShow_a1hl` dictionary.

The `show` call site will then look something like this in HsSyn:

    HsApp
      (HsWrap
          (WpEvApp $dShow_a1hn)
          (HsWrap
              (WpTyApp [a_a1hj])
              (HsVar GHC.Show.show)))
      (HsVar x_azo)

Here the typechecker has wrapped `show` in a pair of HsWrappers which
apply its type and dictionary arguments.
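
To make the shape above concrete, here is a minimal sketch of peeling
the wrappers off a call site. The `Expr` and `Wrapper` types below are
simplified stand-ins invented for illustration, not GHC's own HsExpr
and HsWrapper (which have many more constructors and carry real types
and evidence terms), so treat this as a model of the idea rather than
GHC API code.

```haskell
-- Simplified stand-ins for GHC's HsWrapper and HsExpr (assumptions made
-- for illustration; the real types are much richer).
data Wrapper
  = WpTyApp String   -- type application, e.g. "[a_a1hj]"
  | WpEvApp String   -- dictionary application, e.g. "$dShow_a1hn"
  deriving (Eq, Show)

data Expr
  = HsVar String
  | HsApp Expr Expr
  | HsWrap Wrapper Expr
  deriving (Eq, Show)

-- | Find the function at the head of a call, together with the wrappers
-- attached directly to it, listed innermost first (application order).
callHead :: Expr -> (String, [Wrapper])
callHead = go []
  where
    go ws (HsVar v)    = (v, ws)
    -- Wrappers seen outside an application wrap its result, not the
    -- head, so they are dropped before descending into the function.
    go _  (HsApp f _)  = go [] f
    go ws (HsWrap w e) = go (w : ws) e

-- | Names of the dictionaries applied to the head of a call.
dictionaryArgs :: Expr -> [String]
dictionaryArgs e = [d | WpEvApp d <- snd (callHead e)]

-- | The `show x` call site from the example above.
showCall :: Expr
showCall =
  HsApp
    (HsWrap (WpEvApp "$dShow_a1hn")
      (HsWrap (WpTyApp "[a_a1hj]")
        (HsVar "GHC.Show.show")))
    (HsVar "x_azo")
```

With this model, `dictionaryArgs showCall` yields the single dictionary
variable applied at the call site, which is the name to look up in the
evidence bindings.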

This suggests an approach to identify "generic" call sites (item (2)
above): look at whether the RHS of the call site's dictionary is
lambda-bound or not. In the above case we see that it is not
lambda-bound but rather a concrete dictionary: `GHC.Show.$fShow[]`. You
can tell that this is an instance dictionary function by looking at its
IdDetails (specifically, it is of the DFunId variety).

By contrast if we have a generic call-site:

    printIt :: Show a => a -> IO ()
    printIt x = putStrLn $ show x

We see that the evidence binding is headed by a lambda-bound dictionary:

    AbsBindsSig [a_a1AP] [$dShow_a1AR]
      {Exported type: printIt :: forall a. Show a => a -> IO ()
                      [LclId]
      Bind: printIt_a1AQ x_a12W
              = putStrLn $ show @ a_a1AP $dShow_a1AV x_a12W
      Evidence: EvBinds{[W] $dShow_a1AV = $dShow_a1AR}}
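
The distinction between the two cases can be sketched as a small
classifier over the evidence bindings. Everything here is a hypothetical
model: `Evidence`, `CallKind` and `classify` are names made up for the
sketch, dictionaries are plain strings, and in real code the
lambda-bound dictionaries and evidence bindings would have to be
collected from the enclosing AbsBinds and the DFunId check done via
IdDetails.

```haskell
-- Hypothetical model of one evidence binding's right-hand side.
data Evidence
  = EvVar String            -- just another dictionary variable
  | EvDFun String [String]  -- instance dictionary function and its arguments
  deriving (Eq, Show)

data CallKind
  = Generic String   -- dictionary is passed in by the caller (lambda-bound)
  | Resolved String  -- headed by an instance dictionary function
  | Unknown String   -- no binding found for this dictionary
  deriving (Eq, Show)

-- | Classify the dictionary used at a call site, given the lambda-bound
-- dictionaries of the enclosing binding and its evidence bindings.
classify :: [String] -> [(String, Evidence)] -> String -> CallKind
classify lambdaBound evBinds = go []
  where
    go seen d
      | d `elem` lambdaBound = Generic d
      | d `elem` seen        = Unknown d  -- guard against cyclic bindings
      | otherwise =
          case lookup d evBinds of
            Just (EvVar d')   -> go (d : seen) d'
            Just (EvDFun f _) -> Resolved f
            Nothing           -> Unknown d
```

For the two dumps above, the `showList` call site would classify as
`Resolved "GHC.Show.$fShow[]"` and the `printIt` call site as
`Generic "$dShow_a1AR"`.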

Of course, in the case that you have a concrete dictionary you *also*
want to know the source location of the instance declaration from which
it arose. I'm afraid this may be quite challenging as this isn't
information we currently keep. Currently interface files don't really
keep any information that might be useful to IDE tooling users. It's
possible that we could add such information, although it's unclear
exactly what this would look like. It would be great to hear more from
tooling users regarding what information they would like to see.

Also relevant here is the HIE file GSoC project [1] being worked on this
summer by Zubin Duggal (CC'd).


> Also, listing all functions that use throw# or functions defined in
> terms of throw# or FFI calls would be helpful, especially for doing
> audits. If I could immediately list all partial functions in a
> project, then list all call-sites, it would be a very convenient way
> when doing an audit to see whether partial functions (such as head)
> are used with the proper preconditions or not.
>
This may be non-trivial; you may be able to get something along these
lines out of the strictness signature present in IdInfo. However, I
suspect this will be a bit fragile (e.g. we don't even run demand
analysis with -O0 IIRC).

Cheers,

- Ben



[1] https://ghc.haskell.org/trac/ghc/wiki/HIEFiles


Re: Tracking down instances from use-sites

Christopher Done-2
Ben,

Thanks for the in-depth elaboration of what Matthew/Simon were
describing! It seems within reach!

> Of course, in the case that you have a concrete dictionary you *also*
> want to know the source location of the instance declaration from which
> it arose. I'm afraid this may be quite challenging as this isn't
> information we currently keep. Currently interface files don't really
> keep any information that might be useful to IDE tooling users. It's
> possible that we could add such information, although it's unclear
> exactly what this would look like. It would be great to hear more from
> tooling users regarding what information they would like to see.

Indeed, having the exact source location was a stretch; I didn't
have high hopes for that. However, the package and module are actually
useful. Regarding that, I did find the following field:

    -- | @is_dfun_name = idName . is_dfun@.
    --
    -- We use 'is_dfun_name' for the visibility check,
    -- 'instIsVisible', which needs to know the 'Module' which the
    -- dictionary is defined in. However, we cannot use the 'Module'
    -- attached to 'is_dfun' since doing so would mean we would
    -- potentially pull in an entire interface file unnecessarily.
    -- This was the cause of #12367.
    , is_dfun_name :: Name

So it seems like I could use the Name to get a Module which contains a
UnitId (package and version) and ModuleName. If I've already generated
the right metadata for that package and module, then I can do the
mapping.
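
A rough sketch of that mapping, using simplified stand-ins rather than
GHC's own types. As far as I know the corresponding accessors in GHC 8.x
are `nameModule_maybe`, `moduleUnitId` and `moduleName`; the `Metadata`
table and `lookupInstance` are hypothetical names for the pre-generated
data and the lookup, and the strings stored in the table are whatever
the tool decides to record.

```haskell
-- Simplified stand-ins for GHC's Module and Name (assumptions for
-- illustration only).
data Module = Module
  { moduleUnitId :: String  -- package identifier
  , moduleName   :: String
  } deriving (Eq, Show)

data Name = Name
  { nameOcc    :: String
  , nameModule :: Maybe Module  -- Nothing for purely local names
  } deriving (Eq, Show)

-- | Hypothetical metadata generated ahead of time: for each module, the
-- dictionary function names it defines and a description of each one.
type Metadata = [(Module, [(String, String)])]

-- | Map an instance's dictionary name (cf. is_dfun_name) to its entry
-- in the metadata, if that package and module have been indexed.
lookupInstance :: Metadata -> Name -> Maybe String
lookupInstance meta name = do
  m     <- nameModule name
  decls <- lookup m meta
  lookup (nameOcc name) decls
```

A lookup fails with `Nothing` both for local names and for modules that
have not been indexed yet, which is the case where metadata for that
package would need to be generated first.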

> Also relevant here is the HIE file GSoC project [1] being worked on this
> summer by Zubin Duggal (CC'd).

I think this would be a good use-case for that.

> > Also, listing all functions that use throw# or functions defined in
> > terms of throw# or FFI calls would be helpful, especially for doing
> > audits. If I could immediately list all partial functions in a
> > project, then list all call-sites, it would be a very convenient way
> > when doing an audit to see whether partial functions (such as head)
> > are used with the proper preconditions or not.
>
> This may be non-trivial; you may be able to get something along these
> lines out of the strictness signature present in IdInfo. However, I
> suspect this will be a bit fragile (e.g. we don't even run demand
> analysis with -O0 IIRC).

I was going to start with a very naive approach of creating a dependency
graph merely based on presence in a declaration, not on use. E.g.

    foo = if False then head [] else 123

would still be flagged up as partial, even though upon inspection it
isn't. But it uses `head`, so it should arouse suspicion. I'd want to
review it myself and determine that it's safe and then mark it safe. At
the very least, I might mark such code as having potential for bugs.
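
As a minimal sketch of that naive approach, assuming each declaration
has already been reduced to its name plus the names mentioned anywhere
in its body (the `Decl` type and `suspects` function are made up for
this illustration):

```haskell
import Data.List (nub)

-- | A declaration and the names mentioned anywhere in its body.
type Decl = (String, [String])

-- | Declarations that mention a seed name (e.g. `head`) directly, or
-- mention another flagged declaration. Presence alone counts, so a use
-- in a dead branch is flagged too.
suspects :: [String] -> [Decl] -> [String]
suspects seeds decls = go []
  where
    go flagged
      | null new  = flagged
      | otherwise = go (flagged ++ new)
      where
        bad = seeds ++ flagged
        new = nub [ n | (n, refs) <- decls
                      , n `notElem` flagged
                      , any (`elem` bad) refs ]
```

Given `foo` mentioning `head` and `bar` mentioning `foo`, both are
flagged; anything later reviewed and marked safe could simply be left
out of the seeds and of the flagged set.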

Cheers!