Why do Names defined in the current module lack a module name?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Johan Tibell-2
Hi,

I'm writing an app that uses the GHC API to extract names from modules.
I've written a little AST traversal that finds all Names in the
type-checked AST and writes them to a file. I noticed that every Name
defined in the current module (i.e. Names generated for top-level function
definitions) lack a module name i.e. nameModule_maybe return Nothing. Why
is this? Should I just assume that every time nameModule_maybe return
Nothing the module name is in fact the name of the current module being
compiled?

-- Johan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130328/0d493615/attachment.htm>

Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Simon Peyton Jones
Does this help
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NameType

Something that starts top-level may not finish up as top-level.  Nested bindings are never qualified.

After TidyPgm, externally-visible names (to the linker) are qualified, ones local to the .o file are not.

Does it matter?

Simon

From: ghc-devs-bounces at haskell.org [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Johan Tibell
Sent: 28 March 2013 20:35
To: ghc-devs at haskell.org
Subject: Why do Names defined in the current module lack a module name?

Hi,

I'm writing an app that uses the GHC API to extract names from modules. I've written a little AST traversal that finds all Names in the type-checked AST and writes them to a file. I noticed that every Name defined in the current module (i.e. Names generated for top-level function definitions) lack a module name i.e. nameModule_maybe return Nothing. Why is this? Should I just assume that every time nameModule_maybe return Nothing the module name is in fact the name of the current module being compiled?

-- Johan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130402/0b706a78/attachment.htm>

Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Johan Tibell-2
Hi Simon,

On Tue, Apr 2, 2013 at 3:31 AM, Simon Peyton-Jones
<simonpj at microsoft.com> wrote:
>
> Does this help
>
> http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NameType

A bit, but it's still not clear to me exactly when user defined
exported entities will have full (i.e. including module) names.

> Something that starts top-level may not finish up as top-level.  Nested bindings are never qualified.
>
> After TidyPgm, externally-visible names (to the linker) are qualified, ones local to the .o file are not.

Here's my example program:

```haskell
module Test
    ( mysum
    ) where

import Data.List (foldl')

import Import (imported)

mysum :: [Int] -> Int
mysum xs = foldl' (+) imported xs
```

As you see it has a top-level exported thing (mysum). My problem is
that I'm traversing the type-checked AST (i.e. returned by
`typecheckedSource module`) trying to collect all the names so I can
index them for a code search project I'm working on. It's a bit
similar to the GHC ctags/etags tool, except I'm trying to index all
the source code.

So for every Name I run into in the source code I need to figure out
what kind of name it is. That's made quite tricky by the fact that
name resolution isn't actually quite done by the time we have the
typed AST (i.e. mysum ought to have the name "Test.mysum", but it has
the name "mysum"). I can try to implement this last resolution step
myself, but then I need to understand how to identify names such as
mysum above, while traversing the AST.

-- Johan


Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Simon Peyton Jones
I've just been checking. The top level name *is* (and remains) an External Name. See below.

Don't be misled by the dump labelled "Typechecker" (-ddump-tc).  It is carefully printing things as in error messages, with as little qualification as possible; since 'mysum' is in scope unqualified here, it's printed unqualified. But it's an External Name all right. Use -dppr-debug to see it in its full glory

Simon



==================== Renamer ====================
Foo.mysum :: [Int] -> Int
Foo.mysum xs = foldl' (+) 0 xs

TYPE SIGNATURES
    mysum :: [Int] -> Int
TYPE CONSTRUCTORS
COERCION AXIOMS
Dependent modules: []
Dependent packages: [base, ghc-prim, integer-gmp]

==================== Typechecker ====================
AbsBinds
[]
[]
[mysum <= mysum
   <>]
  mysum :: [Int] -> Int
  [LclId]
  mysum xs = foldl' ((+)) 0 xs


==================== Desugar (after optimization) ====================
Result size = 11

Foo.mysum :: [GHC.Types.Int] -> GHC.Types.Int
[LclIdX]
Foo.mysum =
  \ (xs_abz :: [GHC.Types.Int]) ->
    Data.List.foldl'
      @ GHC.Types.Int
      @ GHC.Types.Int
      (GHC.Num.+ @ GHC.Types.Int GHC.Num.$fNumInt)
      (GHC.Types.I# 0)
      xs_abz

| -----Original Message-----
| From: Johan Tibell [mailto:johan.tibell at gmail.com]
| Sent: 02 April 2013 17:43
| To: Simon Peyton-Jones
| Cc: ghc-devs at haskell.org
| Subject: Re: Why do Names defined in the current module lack a module
| name?
|
| Hi Simon,
|
| On Tue, Apr 2, 2013 at 3:31 AM, Simon Peyton-Jones
| <simonpj at microsoft.com> wrote:
| >
| > Does this help
| >
| > http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NameType
|
| A bit, but it's still not clear to me exactly when user defined exported
| entities will have full (i.e. including module) names.
|
| > Something that starts top-level may not finish up as top-level.
| Nested bindings are never qualified.
| >
| > After TidyPgm, externally-visible names (to the linker) are qualified,
| ones local to the .o file are not.
|
| Here's my example program:
|
| ```haskell
| module Test
|     ( mysum
|     ) where
|
| import Data.List (foldl')
|
| import Import (imported)
|
| mysum :: [Int] -> Int
| mysum xs = foldl' (+) imported xs
| ```
|
| As you see it has a top-level exported thing (mysum). My problem is that
| I'm traversing the type-checked AST (i.e. returned by `typecheckedSource
| module`) trying to collect all the names so I can index them for a code
| search project I'm working on. It's a bit similar to the GHC ctags/etags
| tool, except I'm trying to index all the source code.
|
| So for every Name I run into in the source code I need to figure out
| what kind of name it is. That's made quite tricky by the fact that name
| resolution isn't actually quite done by the time we have the typed AST
| (i.e. mysum ought to have the name "Test.mysum", but it has the name
| "mysum"). I can try to implement this last resolution step myself, but
| then I need to understand how to identify names such as mysum above,
| while traversing the AST.
|
| -- Johan


Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Manuel M T Chakravarty
In reply to this post by Johan Tibell-2
Johan Tibell <johan.tibell at gmail.com>:

> On Tue, Apr 2, 2013 at 3:31 AM, Simon Peyton-Jones
> <simonpj at microsoft.com> wrote:
> A bit, but it's still not clear to me exactly when user defined
> exported entities will have full (i.e. including module) names.
>
>> Something that starts top-level may not finish up as top-level.  Nested bindings are never qualified.
>>
>> After TidyPgm, externally-visible names (to the linker) are qualified, ones local to the .o file are not.
>
> Here's my example program:
>
> ```haskell
> module Test
>    ( mysum
>    ) where
>
> import Data.List (foldl')
>
> import Import (imported)
>
> mysum :: [Int] -> Int
> mysum xs = foldl' (+) imported xs
> ```
>
> As you see it has a top-level exported thing (mysum). My problem is
> that I'm traversing the type-checked AST (i.e. returned by
> `typecheckedSource module`) trying to collect all the names so I can
> index them for a code search project I'm working on. It's a bit
> similar to the GHC ctags/etags tool, except I'm trying to index all
> the source code.
>
> So for every Name I run into in the source code I need to figure out
> what kind of name it is. That's made quite tricky by the fact that
> name resolution isn't actually quite done by the time we have the
> typed AST (i.e. mysum ought to have the name "Test.mysum", but it has
> the name "mysum"). I can try to implement this last resolution step
> myself, but then I need to understand how to identify names such as
> mysum above, while traversing the AST.

I'm not sure what information you are trying to collect, but if you can traverse the Core program after TidyPgm instead of the type checked AST, that would have all names cleaned up (as Simon wrote) and still have valid source positions.

Otherwise, there are predicates on 'Id's and 'Name's that determine their categories, e.g., 'isExportedId'. Just keep in mind that in GHC-speak a "local" 'Id' is one that is defined in the currently compiled module. It may still be a top-level 'Id' that is exported. (Not your classic definition of "local", I guess ;)

Manuel



Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Johan Tibell-2
In reply to this post by Simon Peyton Jones
On Wed, Apr 3, 2013 at 5:14 AM, Simon Peyton-Jones
<simonpj at microsoft.com> wrote:
> I've just been checking. The top level name *is* (and remains) an External Name. See below.
>
> Don't be misled by the dump labelled "Typechecker" (-ddump-tc).  It is carefully printing things as in error messages, with as little qualification as possible; since 'mysum' is in scope unqualified here, it's printed unqualified. But it's an External Name all right. Use -dppr-debug to see it in its full glory

I'm not dumping the names using a command line flag, but using a
program that uses the GHC API that I wrote. This is how I use the GHC
API to get hold of the AST once the source code has been compiled:

forEachM z xs f = foldM f z xs

indexSymbols :: ModuleGraph -> Ghc Builder.Builder
indexSymbols graph = forEachM Builder.new graph $ \ builder ms -> do
    let filename = msHsFilePath ms
    handleSourceError printErrorAndExit $ do
        liftIO $ putStrLn ("Loading " ++ filename ++ " ...")
        mod <- loadModule =<< typecheckModule =<< parseModule ms
        case mod of
            _ | isBootSummary ms -> return builder
            _ | Just (group, _, _, _) <- renamedSource mod -> do
                let modname = moduleName $ ms_mod ms
                    builder' = invertedIndex builder modname $ bagToList $
                               symbols group (typecheckedSource mod)
-- ** HERE **
                liftIO $ print $ builder'
                return $! builder'
            _  -> liftIO $ exitWith (ExitFailure 1)
  where
    printErrorAndExit e = do
        printException e
        liftIO $ exitWith (ExitFailure 1)

So I invoke my AST traversal function 'symbols' (which just extracts
all the Names in the AST into a Bag) on the AST returned by
'typecheckedSource mod'. This is the AST I expected to have fully
qualified names but doesn't.


Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Manuel M T Chakravarty
Johan Tibell <johan.tibell at gmail.com>:
> On Wed, Apr 3, 2013 at 5:14 AM, Simon Peyton-Jones
> <simonpj at microsoft.com> wrote:
>> I've just been checking. The top level name *is* (and remains) an External Name. See below.
>>
>> Don't be misled by the dump labelled "Typechecker" (-ddump-tc).  It is carefully printing things as in error messages, with as little qualification as possible; since 'mysum' is in scope unqualified here, it's printed unqualified. But it's an External Name all right. Use -dppr-debug to see it in its full glory
> I'm not dumping the names using a command line flag, but using a
> program that uses the GHC API that I wrote.

Simon's point is that GHC can print names in different ways. The verbosity of a printed name depends on the context in which it is printed (and -dppr-debug is a way to tell GHC that you want to get all info, including uniques etc).

> This is how I use the GHC
> API to get hold of the AST once the source code has been compiled:
>
> forEachM z xs f = foldM f z xs
>
> indexSymbols :: ModuleGraph -> Ghc Builder.Builder
> indexSymbols graph = forEachM Builder.new graph $ \ builder ms -> do
>    let filename = msHsFilePath ms
>    handleSourceError printErrorAndExit $ do
>        liftIO $ putStrLn ("Loading " ++ filename ++ " ...")
>        mod <- loadModule =<< typecheckModule =<< parseModule ms
>        case mod of
>            _ | isBootSummary ms -> return builder
>            _ | Just (group, _, _, _) <- renamedSource mod -> do
>                let modname = moduleName $ ms_mod ms
>                    builder' = invertedIndex builder modname $ bagToList $
>                               symbols group (typecheckedSource mod)
> -- ** HERE **
>                liftIO $ print $ builder'
>                return $! builder'
>            _  -> liftIO $ exitWith (ExitFailure 1)
>  where
>    printErrorAndExit e = do
>        printException e
>        liftIO $ exitWith (ExitFailure 1)
>
> So I invoke my AST traversal function 'symbols' (which just extracts
> all the Names in the AST into a Bag) on the AST returned by
> 'typecheckedSource mod'. This is the AST I expected to have fully
> qualified names but doesn't.

I'm not sure what your 'invertedIndex' does. You seem to use the 'Show' instance via 'print'. However, I don't think 'Name' has a 'Show' instance. So, I wonder how you convert the 'Name's to 'String's.

In GHC, we do the following, use 'ppr' to convert an entity to an 'SDoc' and then when the 'SDoc' gets turned into a 'String', the dynamic flags determine what detail of a 'Name' to print. You can do this in one step with 'showPpr' <http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.2/Outputable.html#v:showPpr>, but you need to supply some 'DynFlags' <http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.2/DynFlags.html#t:DynFlags>.

Manuel



Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Johan Tibell-2
On Wed, Apr 3, 2013 at 5:42 PM, Manuel M T Chakravarty
<chak at cse.unsw.edu.au> wrote:
> Simon's point is that GHC can print names in different ways. The verbosity of a printed name depends on the context in which it is printed (and -dppr-debug is a way to tell GHC that you want to get all info, including uniques etc).

I understand that. I think it's unrelated to my problem however.
moduleName_maybe, the function that's supposed to give me the module
of a Name returns Nothing for all locally defined names. My guess is
that these Names aren't made fully qualified until some later stage.

> I'm not sure what your 'invertedIndex' does. You seem to use the 'Show' instance via 'print'. However, I don't think 'Name' has a 'Show' instance. So, I wonder how you convert the 'Name's to 'String's.

invertedIndex isn't interesting in this case. Briefly, it takes a Bag
of Names created by the 'symbols' function and build an inverted index
(a search engine term) from the names. I derived Show instances for
all GHC types for debugging.

> In GHC, we do the following, use 'ppr' to convert an entity to an 'SDoc' and then when the 'SDoc' gets turned into a 'String', the dynamic flags determine what detail of a 'Name' to print. You can do this in one step with 'showPpr' <http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.2/Outputable.html#v:showPpr>, but you need to supply some 'DynFlags' <http://www.haskell.org/ghc/docs/latest/html/libraries/ghc-7.6.2/DynFlags.html#t:DynFlags>.


Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Simon Peyton Jones

|  I understand that. I think it's unrelated to my problem however.
|  moduleName_maybe, the function that's supposed to give me the module
|  of a Name returns Nothing for all locally defined names. My guess is
|  that these Names aren't made fully qualified until some later stage.

I think there is a danger of speaking at cross purposes here.  

In the example I sent, you can see that the module name is printed, in the output of the renamer, typechecker, and desugarer.  (You need -dppr-debug to see this in the -ddump-tc output, for reasons I mentioned; perhaps that should be changed.)

I  would be very surprised if nameModule_maybe returned Nothing for a Name that printed with its module name.  Yet it appears that you are saying that
        a Name that prints with its module name with -ddump-ds or -ddump-tc
        replies to nameModule_maybe with Nothing
I don't understand that.
       
I suppose that the only way to do so is for Johan to send a reproducible test case; or perhaps I have misunderstood the question.

Simon


Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Johan Tibell-2
On Thu, Apr 4, 2013 at 1:44 AM, Simon Peyton-Jones
<simonpj at microsoft.com> wrote:
> I  would be very surprised if nameModule_maybe returned Nothing for a Name that printed with its module name.  Yet it appears that you are saying that
>         a Name that prints with its module name with -ddump-ds or -ddump-tc
>         replies to nameModule_maybe with Nothing
> I don't understand that.

Thanks Simon. This is progress, because now I know that this is not
expected behavior, so I can try to figure out if I'm doing something
wrong and if not send you a test case. :)


Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Johan Tibell-2
Simon, it looks like I'm running into problems similar to those Edsko
described in the email "Top-level type signatures in TcGblEnv?" to
ghc-devs a while ago. The complete information about a source file is
spread out over several different ASTs.


Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Simon Peyton Jones
Johan

I'd like to help but I don't have a clear enough information of the problem you are trying to solve, or the actual concrete problem you are encountering, to give a meaningful response.

If you are ok, fine; if you want me to help specifically then Skype, a concrete test case, ... something

Simon

| -----Original Message-----
| From: Johan Tibell [mailto:johan.tibell at gmail.com]
| Sent: 04 April 2013 18:46
| To: Simon Peyton-Jones
| Cc: Manuel M T Chakravarty; ghc-devs at haskell.org
| Subject: Re: Why do Names defined in the current module lack a module
| name?
|
| Simon, it looks like I'm running into problems similar to those Edsko
| described in the email "Top-level type signatures in TcGblEnv?" to ghc-
| devs a while ago. The complete information about a source file is spread
| out over several different ASTs.


Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Johan Tibell-2
Simon,

I've created a small standalone test case here:
https://gist.github.com/tibbe/5321268

Usage:

Download the Main.hs and Test.hs files into the same directory:

https://gist.github.com/tibbe/5321268/raw/68537a79865209d7218429b916a7ab1ccebc18e4/Main.hs
https://gist.github.com/tibbe/5321268/raw/f9cce61b35b3349eeeac622a4c27c1dd5e0410cf/Test.hs

Compile and run:
ghc Main.hs
./Main

Expected output:
Loading Test.hs ...
["Test.mysum","Data.List.foldl'","GHC.Num.+"]

Actual output:
Loading Test.hs ...
[".mysum","Data.List.foldl'","GHC.Num.+"]

Note how the locally defined and exported function mysum lacks a
module name. The code that creates a string from the Name is on line
59 in Main.hs:

    nameToString name = (maybe "" (moduleNameString . moduleName) .
nameModule_maybe $ name)
                        ++ "." ++ getOccString name

The module name is missing in the output because nameModule_maybe
returns Nothing for the mysum Name.

-- Johan


Reply | Threaded
Open this post in threaded view
|

Why do Names defined in the current module lack a module name?

Simon Peyton Jones
Aha.  You should be looking in the abs_exports field of AbsBinds, not in abs_binds.

I've added some (long-needed) comments to HsBinds, which I append below.

I hope that helps.

Simon

Note [AbsBinds]
~~~~~~~~~~~~~~~
The AbsBinds constructor is used in the output of the type checker, to record
*typechecked* and *generalised* bindings.  Consider a module M, with this
top-level binding
    M.reverse []     = []
    M.reverse (x:xs) = M.reverse xs ++ [x]

In Hindley-Milner, a recursive binding is typechecked with the *recursive* uses
being *monomorphic*.  So after typechecking *and* deugaring we will get something
like this
 
    M.reverse :: forall a. [a] -> [a]
      = /\a. letrec
                reverse :: [a] -> [a] = \xs -> case xs of
                                                []     -> []
                                                (x:xs) -> reverse xs ++ [x]
             in reverse

Notice that 'M.reverse' is polymorphic as expected, but there is a local
defintion for plain 'reverse' which is *monomorphic*.  The type variable
'a' scopes over the entire letrec.

That's after desugaring.  What about after type checking but before desugaring?  
That's where AbsBinds comes in.  It looks like this:

   AbsBinds { abs_tvs     = [a]
            , abs_exports = [ABE { abe_poly = M.reverse :: forall a. [a] -> [a],
                                 , abe_mono = reverse :: a -> a}]
            , abs_binds = { reverse :: [a] -> [a]
                               = \xs -> case xs of
                                            []     -> []
                                            (x:xs) -> reverse xs ++ [x] } }

Here,
  * abs_tvs says what type variables are abstracted over the binding group,
    just 'a' in this case.
  * abs_binds is the *monomorphic* bindings of the group
  * abs_exports describes how to get the polymorphic Id 'M.reverse' from the
    monomorphic one 'reverse'

Notice that the *original* function (the polymorphic one you thought
you were defining) appears in the abe_poly field of the
abs_exports. The bindings in abs_binds are for fresh, local, Ids with
a *monomorphic* Id.

If there is a group of mutually recusive functions without type
signatures, we get one AbsBinds with the monomorphic versions of the
bindings in abs_binds, and one element of abe_exports for each
variable bound in the mutually recursive group.  This is true even for
pattern bindings.  Example:
        (f,g) = (\x -> x, f)
After type checking we get
   AbsBinds { abs_tvs     = [a]
            , abs_exports = [ ABE { abe_poly = M.f :: forall a. a -> a
                                  , abe_mono = f :: a -> a }
                            , ABE { abe_poly = M.g :: forall a. a -> a
                                  , abe_mono = g :: a -> a }]
            , abs_binds = { (f,g) = (\x -> x, f) }



|  -----Original Message-----
|  From: Johan Tibell [mailto:johan.tibell at gmail.com]
|  Sent: 05 April 2013 18:58
|  To: Simon Peyton-Jones
|  Cc: Manuel M T Chakravarty; ghc-devs at haskell.org
|  Subject: Re: Why do Names defined in the current module lack a module name?
|  
|  Simon,
|  
|  I've created a small standalone test case here:
|  https://gist.github.com/tibbe/5321268
|  
|  Usage:
|  
|  Download the Main.hs and Test.hs files into the same directory:
|  
|  https://gist.github.com/tibbe/5321268/raw/68537a79865209d7218429b916a7ab
|  1ccebc18e4/Main.hs
|  https://gist.github.com/tibbe/5321268/raw/f9cce61b35b3349eeeac622a4c27c1dd
|  5e0410cf/Test.hs
|  
|  Compile and run:
|  ghc Main.hs
|  ./Main
|  
|  Expected output:
|  Loading Test.hs ...
|  ["Test.mysum","Data.List.foldl'","GHC.Num.+"]
|  
|  Actual output:
|  Loading Test.hs ...
|  [".mysum","Data.List.foldl'","GHC.Num.+"]
|  
|  Note how the locally defined and exported function mysum lacks a
|  module name. The code that creates a string from the Name is on line
|  59 in Main.hs:
|  
|      nameToString name = (maybe "" (moduleNameString . moduleName) .
|  nameModule_maybe $ name)
|                          ++ "." ++ getOccString name
|  
|  The module name is missing in the output because nameModule_maybe
|  returns Nothing for the mysum Name.
|  
|  -- Johan