More Language.C work for Google's Summer of Code

Aaron Tomb-3
Hello,

I'm wondering whether there's anyone on the list with an interest in
doing additional work on the Language.C library for the Summer of
Code. There are a few enhancements that I'd be very interested in
seeing, and I'd love to be a mentor for such a project if there's a
student interested in working on them.

The first is to integrate preprocessing into the library. Currently,  
the library calls out to GCC to preprocess source files before parsing  
them. This has some unfortunate consequences, however, because  
comments and macro information are lost. A number of program analyses  
could benefit from metadata encoded in comments, because C doesn't  
have any sort of formal annotation mechanism, but in the current state  
we have to resort to ugly hacks (at best) to get at the contents of  
comments. Also, effective diagnostic messages need to be closely tied  
to original source code. In the presence of pre-processed macros,  
column number information is unreliable, so it can be difficult to  
describe to a user exactly what portion of a program a particular  
analysis refers to. An integrated preprocessor could retain comments  
and remember information about macros, eliminating both of these  
problems.
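
To make the two losses concrete, here is a minimal sketch in C. The `/*@ ... @*/` annotation syntax and the names `LAST` and `last_element` are made up for illustration; they are not part of Language.C or of any particular analysis tool.

```c
#include <stddef.h>

/* Hypothetical analysis annotation, written as a comment because C has no
   annotation syntax of its own. Preprocessing replaces it with whitespace. */
/*@ requires len > 0 @*/
#define LAST(arr, len) ((arr)[(len) - 1])

int last_element(const int *values, size_t len)
{
    /* After preprocessing, the parser sees:
           return ((values)[(len) - 1]);
       The name LAST and the columns it occupied are no longer visible. */
    return LAST(values, len);
}
```

Once the preprocessor has run, the annotation is gone and a diagnostic about the subscript can no longer point at the columns the programmer actually wrote.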

The second possible project is to create a nicer interface for  
traversals over Language.C ASTs. Currently, the symbol table is built  
to include only information about global declarations and those other  
declarations currently in scope. Therefore, when performing multiple  
traversals over an AST, each traversal must re-analyze all global  
declarations and the entire AST of the function of interest. A better  
solution might be to build a traversal that creates a single symbol  
table describing all declarations in a translation unit (including  
function- and block-scoped variables), for easy reference during  
further traversals. It may also be valuable to have this traversal  
produce a slightly-simplified AST in the process. I'm not thinking of  
anything as radical as the simplifications performed by something like  
CIL, however. It might simply be enough to transform variable  
references into a form suitable for easy lookup in a complete symbol  
table like I've just described. Other simple transformations such as  
making all implicit casts explicit, or normalizing compound  
initializers, could also be good.
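
As an illustration of the kind of source these transformations would target, the C sketch below shows three declarations that share one name, an implicit conversion next to its explicit form, and a partial initializer next to a fully written-out one. The `x#0`, `x#1`, `x#2` keys in the comments are hypothetical: how a complete symbol table would name declarations is an open design question, and what "normalizing" an initializer should mean is an assumption here.

```c
/* Three distinct objects share the name x. A complete symbol table needs
   a separate entry for each, and every use must resolve to one of them. */
int x = 1;                      /* x#0: file scope */

int shadowing(void)
{
    int sum = x;                /* use of x#0 */
    int x = 10;                 /* x#1: function scope, shadows x#0 from here on */
    sum += x;                   /* use of x#1 */
    {
        int x = 100;            /* x#2: block scope, shadows x#1 */
        sum += x;               /* use of x#2 */
    }
    return sum;                 /* 1 + 10 + 100 */
}

/* Implicit conversion: n / 2 is an int that is converted to double on return. */
double implicit_conversion(int n) { return n / 2; }

/* The same function with the conversion written out as a cast. */
double explicit_conversion(int n) { return (double)(n / 2); }

struct triple { int a, b, c; };

/* Partial, designated initializer: a and c are implicitly zero. */
struct triple short_form(void)  { struct triple t = { .b = 2 }; return t; }

/* One possible normal form: every member listed, in declaration order. */
struct triple normal_form(void) { struct triple t = { 0, 2, 0 }; return t; }
```

Each pair of functions behaves identically, which is the point: the simplified form carries the same meaning but spares later traversals from re-deriving scoping, conversion, and initialization rules.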

A third possibility, which would probably depend on the integrated  
preprocessor, would be to create an exact pretty-printer. That is, a  
pretty-printing function such that pretty . parse is the identity.  
Currently, parse . pretty should be the identity, but it's not true  
the other way around. An exact pretty-printer would be very useful in  
creating rich presentations of C source code --- think LXR on steroids.

If you're interested in any combination of these, or anything similar,  
let me know. The deadline is approaching quickly, but I'd be happy to  
work together with a student to flesh any of these out into a full  
proposal.

Thanks,
Aaron

--
Aaron Tomb
Galois, Inc. (http://www.galois.com)
[hidden email]
Phone: (503) 808-7206
Fax: (503) 350-0833

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: More Language.C work for Google's Summer of Code

Serguey Zefirov
I tried to devise a C preprocessor, but then I figured out that I
could write something like this:
---------------------------
#define A(arg) A_start (arg) A_end

#define A_start "this is A_start definition."
#define A_end "this is A_end definition."

A (
#undef A_start
#define A_start A_end
)
---------------------------

gcc preprocesses it into the following:
---------------------------
# 1 "a.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "a.c"





"this is A_end definition." () "this is A_end definition."
---------------------------
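
Note that the C standard (C99 6.10.3) says that token sequences inside a macro's argument list that would otherwise act as preprocessing directives give undefined behaviour, so the gcc output above is one permitted result rather than a required one. A minimal sketch of a variant that stays within defined behaviour moves the redefinition in front of the invocation. The `STR`/`XSTR` helpers and `expansion_of_A` are additions for this example only, used to capture the expansion as a string.

```c
#define A(arg) A_start (arg) A_end

#define A_start "this is A_start definition."
#define A_end "this is A_end definition."

/* Redefine A_start before the invocation instead of inside its arguments. */
#undef A_start
#define A_start A_end

/* Two-level stringification: XSTR expands its argument fully, then STR
   turns the resulting tokens into a string literal. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Returns the text that A() expands to, as a string. */
const char *expansion_of_A(void)
{
    return XSTR(A());
}
```

The returned string contains the A_end text twice and the A_start text not at all, which agrees with what gcc printed for the original input.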

Another woe is filenames in angle brackets for #include. They
require a special case in the tokenizer.

So I gave up on it (a fully compliant C preprocessor). ;)

Other than that, C preprocessor looks simple.

I hardly qualify as a student, though.


Re: More Language.C work for Google's Summer of Code

Stephen Tetley-2
On 30 March 2010 18:55, Serguey Zefirov <[hidden email]> wrote:

>
> Other than that, C preprocessor looks simple.
>


Ah no - apparently anything but simple.

You might want to see Jean-Marie Favre's (very readable, amusing)
papers on the subject. Much of the behaviour of CPP is not defined and
is often inaccurately described; it certainly wouldn't appear to make
an ideal one-summer student project.


http://megaplanet.org/jean-marie-favre/papers/CPPDenotationalSemantics.pdf

There are some others as well from his home page.

Best wishes

Stephen

Re: More Language.C work for Google's Summer of Code

Achim Schneider
Stephen Tetley <[hidden email]> wrote:

> Much of the behaviour of CPP is not defined and  often inaccurately
> described, certainly it wouldn't appear to make an ideal one summer,
> student project.
>
If you get

http://ldeniau.web.cern.ch/ldeniau/cos.html

to work, virtually everything else should work, too.

Macro languages haven't been in fashion in the last decades, so you
have to locate a veritable fan to work on this.

There are, after all, still people writing TeX macros. There have got
to be some CPP zealots out there.


--
(c) this sig last receiving data processing entity. Inspect headers
for copyright history. All rights reserved. Copying, hiring, renting,
performance and/or quoting of this signature prohibited.



Re: More Language.C work for Google's Summer of Code

austin seipp-2
In reply to this post by Aaron Tomb-3
(sorry for the dupe aaron! forgot to add haskell-cafe to senders list!)

Perhaps the best course of action would be to try and extend cpphs to
do things like this? From the looks of the interface, it can already
do some of these things e.g. do not strip comments from a file:

http://hackage.haskell.org/packages/archive/cpphs/1.11/doc/html/Language-Preprocessor-Cpphs.html#t%3ABoolOptions

Malcolm would have to attest to how complete it is w.r.t. say, gcc's
preprocessor, but if this were to be a SOC project, extending cpphs to
include needed functionality would probably be much more realistic
than writing a new one.

--
- Austin

Re: More Language.C work for Google's Summer of Code

Edward Amsden-6
In reply to this post by Aaron Tomb-3
I'd be very much interested in working on this library for GSoC. I'm
currently working on an idea for another project, but I'm not certain
how widely beneficial it would be. The preprocessor and
pretty-printing projects sound especially intriguing.


Re: More Language.C work for Google's Summer of Code

Aaron Tomb-3
In reply to this post by austin seipp-2
Yes, that would definitely be one productive way forward. One concern  
is that Language.C is BSD-licensed (and it would be nice to keep it  
that way), and cpphs is LGPL. However, if cpphs remained a separate  
program, producing C + extra stuff as output, and the Language.C  
parser understood the extra stuff, this could accomplish what I'm  
interested in. It would be interesting, even, to just extend the  
Language.C parser to support comments, and to tell cpphs to leave them  
in.

There's also another pre-processor, mcpp [1], that is quite featureful  
and robust, and which supports an output mode with special syntax  
describing the origin of the code resulting from macro expansion.

Aaron

[1] http://mcpp.sourceforge.net/


Re: More Language.C work for Google's Summer of Code

Nick Bowler-3
In reply to this post by Stephen Tetley-2
On 19:54 Tue 30 Mar     , Stephen Tetley wrote:
> On 30 March 2010 18:55, Serguey Zefirov <[hidden email]> wrote:
> > Other than that, C preprocessor looks simple.
>
> Ah no - apparently anything but simple.

I would describe it as "simple but somewhat annoying".  This means that
guessing at its specification will not result in anything resembling a
correct implementation, but reading the specification and implementing
accordingly is straightforward.

Probably the hardest part is expression evaluation.

> You might want to see Jean-Marie Favre's (very readable, amusing)
> papers on subject. Much of the behaviour of CPP is not defined and
> often inaccurately described, certainly it wouldn't appear to make an
> ideal one summer, student project.

The only specification of the C preprocessor that matters is the one
contained in the specification of the C programming language.  The
accuracy of any other description of it is not relevant.  C is quite
possibly the language with the greatest quantity of inaccurate
descriptions in existence (scratch that, C++ is likely worse).

As with most of the C programming language, a lot of the behaviour is
implementation-defined or even undefined, as you suggest.  For example:

/* implementation-defined */
#pragma launch_missiles

/* undefined */
#define explosion defined
#if explosion
# pragma launch_missiles
#endif

This makes a preprocessor /easier/ to implement, because in these cases
the implementer can do /whatever she wants/, including doing nothing or
starting the missile launch procedure.  In the implementation-defined
case, the implementer must additionally write the decision down
somewhere, e.g. "Upon execution of a #pragma launch_missiles directive,
all missiles are launched".
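
The undefined case above arises because the `defined` operator is produced by macro expansion. A minimal sketch of the portable way to get a similar effect is to write `defined` directly in the `#if` and record the result in an ordinary macro. `FEATURE_X`, `HAVE_X` and `have_x` are illustrative names, not taken from any real project.

```c
#define FEATURE_X 1

/* Not portable: "defined" would be generated by expanding HAVE_X_BAD,
   which the C standard leaves undefined.

       #define HAVE_X_BAD defined(FEATURE_X)
       #if HAVE_X_BAD
*/

/* Portable: "defined" appears literally in the directive, and the answer
   is stored as a plain 0 or 1 for later use. */
#if defined(FEATURE_X)
# define HAVE_X 1
#else
# define HAVE_X 0
#endif

int have_x(void)
{
    return HAVE_X;
}
```

A preprocessor written from the standard only has to handle the second form; what it does with the first is the implementer's choice.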

> http://megaplanet.org/jean-marie-favre/papers/CPPDenotationalSemantics.pdf

If this paper had criticised the actual C standard as opposed to a
working draft, it would have been easier to take it seriously.  I find
the published standard quite clear about the requirements of a C
preprocessor.

Nevertheless, assuming that the complaints of the paper remain valid, it
appears to boil down to "The C preprocessor is weird, and one must
read its whole specification to understand all of it".  It also seems to
contain a bit of "The C standard does not precisely describe the GNU C
preprocessor".

This work is certainly within the scope of a summer project.

--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)

Re: More Language.C work for Google's Summer of Code

Aaron Tomb-3
In reply to this post by Edward Amsden-6
That's very good to hear!

When it comes to preprocessing and exact printing, I think that there  
are various stages of completeness that we could support.

   1) Add support for parsing comments to the Language.C parser. Keep
using an external pre-processor but tell it to leave comments in the
source code. The cpphs pre-processor can do this. The trickiest bit
here would be deciding where to record the comments in the AST.
Which AST node is a given comment associated with? We could probably
come up with some general rules, and perhaps certain comments, in
weird locations, would still be ignored.

   2) Support correct column numbers for source locations. This falls  
short of complete macro support, but covers one of the key problems  
that macros introduce. The mcpp preprocessor [1] has a special  
diagnostic mode where it adds special comments describing the origin  
of code that resulted from macro expansion. If the parser retained  
comments, we could use this information to help with exact pretty-
printing.

   3) Modify the pretty-printer to take position information into  
account when pretty-printing (at least optionally). As long as macro  
definitions themselves (as well as #ifdef, etc.) are not in the AST,  
the output will still not be exactly the same as the input, but it'll  
come closer.

   4) Add full support for parsing and expanding macros internally, so  
that both macro definitions and expansions appear in the Language.C  
AST. This is probably a huge project, partly because macros do not  
have to obey the tree structure of the C language in any way. This is  
perhaps beyond the scope of a summer project, but the other steps  
could help prepare for it in the future, and still fully address some  
of the problems caused by the preprocessor along the way.
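
Step 3 can be sketched without touching the parser at all. Assuming each token carries the line and column where it started (the `struct token` layout and `print_at_positions` are hypothetical, not Language.C's API), a printer can pad with newlines and spaces until it reaches the recorded position:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token record: the token text plus the 1-based line and
   column at which it started in the original source. */
struct token {
    int line;
    int col;
    const char *text;   /* assumed to contain no newline characters */
};

/* Writes the tokens into out, inserting newlines and spaces so that each
   token starts at its recorded position. Returns 0 on success, or -1 if
   the buffer is too small or a token's position lies before the cursor. */
int print_at_positions(const struct token *toks, size_t n,
                       char *out, size_t cap)
{
    int line = 1, col = 1;
    size_t used = 0;

    if (cap == 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (toks[i].line < line ||
            (toks[i].line == line && toks[i].col < col))
            return -1;                      /* positions must not go backwards */

        while (line < toks[i].line) {       /* move down to the token's line */
            if (used + 1 >= cap) return -1;
            out[used++] = '\n';
            line++;
            col = 1;
        }
        while (col < toks[i].col) {         /* move right to the token's column */
            if (used + 1 >= cap) return -1;
            out[used++] = ' ';
            col++;
        }

        size_t len = strlen(toks[i].text);
        if (used + len >= cap) return -1;
        memcpy(out + used, toks[i].text, len);
        used += len;
        col += (int)len;
    }
    out[used] = '\0';
    return 0;
}
```

Tabs, comments and preprocessor directives are not reproduced by this sketch, which is why the output gets closer to the input without being identical to it.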

Do you think you'd be interested in some subset or variation of 1, 2,  
and 3? Are there other ideas you have? Things I've missed? Things  
you'd do differently?

Thanks,
Aaron


[1] http://mcpp.sourceforge.net/


On Mar 30, 2010, at 1:46 PM, Edward Amsden wrote:

> I'd be very much interested in working on this library for GSoC. I'm
> currently working on an idea for another project, but I'm not certain
> how widely beneficial it would be. The preprocessor and
> pretty-printing projects sound especially intriguing.
>
> On Tue, Mar 30, 2010 at 1:30 PM, Aaron Tomb <[hidden email]> wrote:
>> Hello,
>>
>> I'm wondering whether there's anyone on the list with an interest  
>> in doing
>> additional work on the Language.C library for the Summer of Code.  
>> There are
>> a few enhancements that I'd be very interested seeing, and I'd love  
>> be a
>> mentor for such a project if there's a student interested in  
>> working on
>> them.
>>
>> The first is to integrate preprocessing into the library.  
>> Currently, the
>> library calls out to GCC to preprocess source files before parsing  
>> them.
>> This has some unfortunate consequences, however, because comments  
>> and macro
>> information are lost. A number of program analyses could benefit from
>> metadata encoded in comments, because C doesn't have any sort of  
>> formal
>> annotation mechanism, but in the current state we have to resort to  
>> ugly
>> hacks (at best) to get at the contents of comments. Also, effective
>> diagnostic messages need to be closely tied to original source  
>> code. In the
>> presence of pre-processed macros, column number information is  
>> unreliable,
>> so it can be difficult to describe to a user exactly what portion  
>> of a
>> program a particular analysis refers to. An integrated preprocessor  
>> could
>> retain comments and remember information about macros, eliminating  
>> both of
>> these problems.
>>
>> The second possible project is to create a nicer interface for  
>> traversals
>> over Language.C ASTs. Currently, the symbol table is built to  
>> include only
>> information about global declarations and those other declarations  
>> currently
>> in scope. Therefore, when performing multiple traversals over an  
>> AST, each
>> traversal must re-analyze all global declarations and the entire  
>> AST of the
>> function of interest. A better solution might be to build a  
>> traversal that
>> creates a single symbol table describing all declarations in a  
>> translation
>> unit (including function- and block-scoped variables), for easy  
>> reference
>> during further traversals. It may also be valuable to have this  
>> traversal
>> produce a slightly-simplified AST in the process. I'm not thinking of
>> anything as radical as the simplifications performed by something  
>> like CIL,
>> however. It might simply be enough to transform variable references  
>> into a
>> form suitable for easy lookup in a complete symbol table like I've  
>> just
>> described. Other simple transformations such as making all implicit  
>> casts
>> explicit, or normalizing compound initializers, could also be good.
>>
>> A third possibility, which would probably depend on the integrated
>> preprocessor, would be to create an exact pretty-printer. That is, a
>> pretty-printing function such that pretty . parse is the identity.
>> Currently, parse . pretty should be the identity, but it's not true  
>> the
>> other way around. An exact pretty-printer would be very useful in  
>> creating
>> rich presentations of C source code --- think LXR on steroids.
>>
>> If you're interested in any combination of these, or anything  
>> similar, let
>> me know. The deadline is approaching quickly, but I'd be happy to  
>> work
>> together with a student to flesh any of these out into a full  
>> proposal.
>>
>> Thanks,
>> Aaron
>>
>> --
>> Aaron Tomb
>> Galois, Inc. (http://www.galois.com)
>> [hidden email]
>> Phone: (503) 808-7206
>> Fax: (503) 350-0833
>>

_______________________________________________
Haskell-Cafe mailing list
[hidden email]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: More Language.C work for Google's Summer of Code

Tom Hawkins-2
In reply to this post by Aaron Tomb-3
On Tue, Mar 30, 2010 at 7:30 PM, Aaron Tomb <[hidden email]> wrote:
> Hello,
>
> I'm wondering whether there's anyone on the list with an interest in doing
> additional work on the Language.C library for the Summer of Code. There are
> a few enhancements that I'd be very interested in seeing, and I'd love to be a
> mentor for such a project if there's a student interested in working on
> them.

Here's another suggestion: A transformer to convert Language.C's AST
to RTL, thus hiding a lot of tedious details like structures, case
statements, variable declarations, typedefs, etc.
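
To make the idea concrete, here is a minimal sketch of lowering an expression tree into RTL-style three-address instructions. Everything in it is an assumption made for illustration: `Expr`, `Operand`, `Instr`, `lower`, and `toRTL` are hypothetical names, and the toy `Expr` type is not Language.C's actual AST. A real transformer would also have to deal with statements, structures, typedefs, and so on.

```haskell
-- Toy expression AST (hypothetical; NOT Language.C's real expression type).
data Expr
  = Lit Int
  | Var String
  | BinOp Char Expr Expr
  deriving (Show, Eq)

-- An operand is a temporary register, a named variable, or a constant.
data Operand = Reg Int | Name String | Const Int
  deriving (Show, Eq)

-- Instr dst op lhs rhs  means  r<dst> := lhs op rhs
data Instr = Instr Int Char Operand Operand
  deriving (Show, Eq)

-- Flatten an expression. Takes the next free register number and returns
-- the emitted instructions, the operand holding the result, and the next
-- free register number afterwards.
lower :: Int -> Expr -> ([Instr], Operand, Int)
lower n (Lit k) = ([], Const k, n)
lower n (Var v) = ([], Name v, n)
lower n (BinOp op l r) =
  let (is1, a, n1) = lower n l
      (is2, b, n2) = lower n1 r
  in (is1 ++ is2 ++ [Instr n2 op a b], Reg n2, n2 + 1)

-- Lower a whole expression starting from register 0.
toRTL :: Expr -> ([Instr], Operand)
toRTL e = let (is, res, _) = lower 0 e in (is, res)
```

For example, `toRTL (BinOp '+' (Var "a") (BinOp '*' (Lit 2) (Var "b")))` yields one instruction for the multiplication into register 0, followed by one for the addition into register 1.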

I started writing a model checker [1] based on Language.C, but got so
bogged down in all the details of C I lost interest.

-Tom

[1] http://hackage.haskell.org/package/afv

Re: More Language.C work for Google's Summer of Code

Aaron Tomb-3
On Mar 30, 2010, at 3:16 PM, Tom Hawkins wrote:

> On Tue, Mar 30, 2010 at 7:30 PM, Aaron Tomb <[hidden email]> wrote:
>> Hello,
>>
>> I'm wondering whether there's anyone on the list with an interest in doing
>> additional work on the Language.C library for the Summer of Code. There are
>> a few enhancements that I'd be very interested in seeing, and I'd love to be a
>> mentor for such a project if there's a student interested in working on
>> them.
>
> Here's another suggestion: A transformer to convert Language.C's AST
> to RTL, thus hiding a lot of tedious details like structures, case
> statements, variable declarations, typedefs, etc.
>
> I started writing a model checker [1] based on Language.C, but got so
> bogged down in all the details of C I lost interest.

I would also love to have something along these lines, and would be  
happy to mentor such a project.

On a related note, I have some code sitting around that converts  
Language.C ASTs into a variant of Guarded Commands, and I expect I'll  
release that at some point. For the moment, it's a little too  
intimately tied to the program it's part of, though.
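
For readers unfamiliar with the target form, the following is a minimal sketch of what a translation into guarded commands can look like. It is not the unreleased code mentioned above, and the variant used there is not described in this thread. The sketch assumes the textbook encoding in which `if (c) s1 else s2` becomes a nondeterministic choice between `assume c; s1` and `assume !c; s2`. All type and function names (`Expr`, `Stmt`, `GC`, `toGC`) are hypothetical, and loops are left out.

```haskell
-- Toy source language (hypothetical; not Language.C's AST).
data Expr = V String | N Int | Not Expr | Lt Expr Expr
  deriving (Show, Eq)

data Stmt
  = Assign String Expr
  | If Expr Stmt Stmt
  | Seq [Stmt]
  deriving (Show, Eq)

-- Toy guarded-command language.
data GC
  = GAssign String Expr
  | Assume Expr        -- execution continues only if the guard holds
  | GSeq [GC]
  | Choice GC GC       -- nondeterministic choice between two branches
  deriving (Show, Eq)

-- Translate statements; a conditional becomes a choice between two
-- branches, each guarded by an assumption about the condition.
toGC :: Stmt -> GC
toGC (Assign x e) = GAssign x e
toGC (Seq ss)     = GSeq (map toGC ss)
toGC (If c t e)   =
  Choice (GSeq [Assume c, toGC t])
         (GSeq [Assume (Not c), toGC e])
```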

Aaron

Re: More Language.C work for Google's Summer of Code

Edward Amsden-6
In reply to this post by Aaron Tomb-3
On Tue, Mar 30, 2010 at 5:14 PM, Aaron Tomb <[hidden email]> wrote:

> That's very good to hear!
>
> When it comes to preprocessing and exact printing, I think that there are
> various stages of completeness that we could support.
>
>  1) Add support for parsing comments to the Language.C parser. Keep using an
> external pre-processor but tell it to leave comments in the source code. The
> cpphs pre-processor can do this. The trickiest bit here would have to do
> with where to record the comments in the AST. What AST node is a given
> comment associated with? We could probably come up with some general rules,
> and perhaps certain comments, in weird locations, would still be ignored.

>
>  2) Support correct column numbers for source locations. This falls short of
> complete macro support, but covers one of the key problems that macros
> introduce. The mcpp preprocessor [1] has a special diagnostic mode where it
> adds special comments describing the origin of code that resulted from macro
> expansion. If the parser retained comments, we could use this information to
> help with exact pretty-printing.
>
>  3) Modify the pretty-printer to take position information into account when
> pretty-printing (at least optionally). As long as macro definitions
> themselves (as well as #ifdef, etc.) are not in the AST, the output will
> still not be exactly the same as the input, but it'll come closer.
>
>  4) Add full support for parsing and expanding macros internally, so that
> both macro definitions and expansions appear in the Language.C AST. This is
> probably a huge project, partly because macros do not have to obey the tree
> structure of the C language in any way. This is perhaps beyond the scope of
> a summer project, but the other steps could help prepare for it in the
> future, and still fully address some of the problems caused by the
> preprocessor along the way.
I haven't looked at the C spec on macros, but I'm pretty motivated and
would like to shoot for a big project.
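
As a rough illustration of stage 1, here is a minimal sketch of one possible "general rule" for associating comments with AST nodes: a comment is attached to the first node that starts on or after the comment's line, and comments after the last node are returned unattached. The rule itself and the names `Comment`, `Node`, and `attach` are assumptions made for this example; nothing here is part of Language.C.

```haskell
import Data.List (sortOn)

-- A comment and the source line it appears on (hypothetical type).
data Comment = Comment { cLine :: Int, cText :: String }
  deriving (Show, Eq)

-- A stand-in for an AST node: just a start line and a name.
data Node = Node { nLine :: Int, nName :: String }
  deriving (Show, Eq)

-- Attach each comment to the first node starting on or after the
-- comment's line. Returns the annotated nodes plus any comments that
-- follow the last node and therefore have no node to belong to.
attach :: [Comment] -> [Node] -> ([(Node, [Comment])], [Comment])
attach cs ns = go (sortOn cLine cs) (sortOn nLine ns)
  where
    go pending [] = ([], pending)
    go pending (n : more) =
      let (mine, later)       = span (\c -> cLine c <= nLine n) pending
          (attached, orphans) = go later more
      in ((n, mine) : attached, orphans)
```

The second component of the result corresponds to the comments "in weird locations" that such a rule would still leave out.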

>
> Do you think you'd be interested in some subset or variation of 1, 2, and 3?
> Are there other ideas you have? Things I've missed? Things you'd do
> differently?

I'm very interested in all 3 of them, and actually somewhat in #4,
though I'll have to do some reading to understand why you're saying
it's such a big undertaking.
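
Stage 3 (printing that respects recorded positions) can be sketched on a toy token stream. This is only an illustration under stated assumptions: `Tok`, `tokenize`, and `render` are hypothetical names, tokens are whitespace-separated words, input uses only spaces and newlines, and there is no trailing whitespace. Under those assumptions `render . tokenize` reproduces the input exactly, which is the `pretty . parse` property from the original proposal in miniature.

```haskell
-- A token together with its 1-based source line and column.
data Tok = Tok { tLine :: Int, tCol :: Int, tText :: String }
  deriving (Show, Eq)

-- Split input into whitespace-separated tokens, recording positions.
tokenize :: String -> [Tok]
tokenize = go 1 1
  where
    go _ _ [] = []
    go l _ ('\n' : rest) = go (l + 1) 1 rest
    go l c (' ' : rest)  = go l (c + 1) rest
    go l c s =
      let (w, rest) = break (`elem` " \n") s
      in Tok l c w : go l (c + length w) rest

-- Print tokens at their recorded positions, padding with newlines and
-- spaces. Assumes tokens are in source order and do not overlap.
render :: [Tok] -> String
render = go 1 1
  where
    go _ _ [] = ""
    go line col (t : ts)
      | tLine t > line =
          replicate (tLine t - line) '\n' ++ go (tLine t) 1 (t : ts)
      | otherwise =
          replicate (tCol t - col) ' ' ++ tText t
            ++ go line (tCol t + length (tText t)) ts
```

Macro definitions and `#ifdef` lines are exactly what this kind of scheme cannot recover unless the preprocessor passes them through in some form.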

>
> Thanks,
> Aaron
>
>
> [1] http://mcpp.sourceforge.net/
>
>
> On Mar 30, 2010, at 1:46 PM, Edward Amsden wrote:
>
>> I'd be very much interested in working on this library for GSoC. I'm
>> currently working on an idea for another project, but I'm not certain
>> how widely beneficial it would be. The preprocessor and
>> pretty-printing projects sound especially intriguing.
>>

Re: More Language.C work for Google's Summer of Code

wren ng thornton
In reply to this post by Stephen Tetley-2
Stephen Tetley wrote:
> Much of the behaviour of CPP is not defined and
> often inaccurately described, certainly it wouldn't appear to make an
> ideal one summer, student project.

But to give Language.C integrated support for preprocessing, one needn't
implement CPP. They only need to implement the right API for a
preprocessor to communicate with the parser/analyzer.

Considering all the folks outside of C who use the CPP
*cough*Haskell*cough* having a stand-alone CPP would be good in its own
right. In fact, I seem to recall there's already one of those floating
around somewhere... ;)

I think it'd be far cooler and more useful to give Language.C integrated
preprocessor support without hard-wiring it to the CPP. Especially given
that there are divergent semantics for different CPP implementations, and
given we could easily imagine wanting to use another preprocessor (e.g.,
for annotations, documentation, etc.).
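
One way to picture such an API is as a small interface that any preprocessor can implement, with the parser front end written against the interface rather than against CPP. The sketch below is purely hypothetical: `Origin`, `Chunk`, `Preprocessor`, `identityPP`, `simpleMacroPP`, and `frontEnd` are invented names, the "parser" merely splits lines into words, and none of this reflects Language.C's actual interfaces.

```haskell
-- Where a piece of output text came from (line numbers are 1-based).
data Origin
  = FromSource Int          -- copied unchanged from this source line
  | FromMacro String Int    -- produced by expanding this macro on this line
  deriving (Show, Eq)

-- A line of preprocessed text tagged with its origin.
data Chunk = Chunk { origin :: Origin, text :: String }
  deriving (Show, Eq)

-- The interface: a preprocessor maps source text to tagged chunks.
newtype Preprocessor = Preprocessor { runPP :: String -> [Chunk] }

-- A preprocessor that changes nothing.
identityPP :: Preprocessor
identityPP = Preprocessor $ \src ->
  [ Chunk (FromSource n) l | (n, l) <- zip [1 ..] (lines src) ]

-- A toy preprocessor that replaces one object-like macro, word by word.
simpleMacroPP :: String -> String -> Preprocessor
simpleMacroPP name body = Preprocessor $ \src ->
  [ Chunk (if name `elem` words l then FromMacro name n else FromSource n)
          (unwords (map subst (words l)))
  | (n, l) <- zip [1 ..] (lines src) ]
  where
    subst w = if w == name then body else w

-- A stand-in front end: it depends only on the interface, so any
-- preprocessor can be plugged in, and origins survive into the output.
frontEnd :: Preprocessor -> String -> [(Origin, [String])]
frontEnd pp src = [ (origin c, words (text c)) | c <- runPP pp src ]
```

Swapping `identityPP` for `simpleMacroPP "N" "10"` changes what the front end sees without changing the front end itself, which is the point of keeping the preprocessor behind an interface.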

--
Live well,
~wren

Re: More Language.C work for Google's Summer of Code

Malcolm Wallace
In reply to this post by austin seipp-2
> Malcolm would have to attest to how complete it is w.r.t. say, gcc's
> preprocessor,

cpphs is intended to be as faithful to the CPP standard as possible,  
whilst still retaining the extra flexibility we want in a non-C  
environment, e.g. retaining the operator symbols //, /*, and */.  If  
the behaviour of cpphs does not match gcc -E, then it is either a bug  
(please report it) or an intentional feature.

Real CPP is rather horribly defined as a lexical analyser for C, so  
has a builtin notion of identifier, operator, etc, which is not so  
useful for all the other settings in which we just want to use  
conditional inclusion or macros.  Also, CPP fully intermingles  
conditionals, file inclusion, and macro expansion, whereas cpphs makes  
a strenuous effort to separate those things into logical phases: first  
the conditionals and inclusions, then macro expansion.  This  
separation makes it possible to run only one or other of the phases,  
which can occasionally be useful.
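
The phase separation can be illustrated with a toy two-pass preprocessor. This is a sketch of the general idea only, not cpphs's implementation or API: the names `Defs`, `conditionals`, `expand`, and `preprocess` are invented, only `#ifdef`/`#else`/`#endif` and object-like macros are handled, file inclusion is omitted, and macro expansion works on whitespace-separated words (so it normalizes spacing).

```haskell
-- Macro definitions as (name, replacement) pairs.
type Defs = [(String, String)]

-- Phase 1: resolve #ifdef / #else / #endif. A line is kept only if every
-- enclosing conditional branch is active. The stack holds one Bool per
-- open #ifdef.
conditionals :: Defs -> [String] -> [String]
conditionals defs = go []
  where
    go _ [] = []
    go stack (l : ls) = case words l of
      ["#ifdef", name] -> go ((name `elem` map fst defs) : stack) ls
      ["#else"]        -> case stack of
                            (b : bs) -> go (not b : bs) ls
                            []       -> go stack ls
      ["#endif"]       -> go (drop 1 stack) ls
      _ | and stack    -> l : go stack ls
        | otherwise    -> go stack ls

-- Phase 2: expand object-like macros, one word at a time.
expand :: Defs -> [String] -> [String]
expand defs = map (unwords . map subst . words)
  where
    subst w = maybe w id (lookup w defs)

-- Both phases in order; either can also be run on its own.
preprocess :: Defs -> [String] -> [String]
preprocess defs = expand defs . conditionals defs
```

Because the two passes are ordinary functions, running only `conditionals` (to select code without touching macro names) or only `expand` is just a matter of calling one and not the other.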

> One concern is that Language.C is BSD-licensed (and it would be nice to
> keep it that way), and cpphs is LGPL. However, if cpphs remained a
> separate program, producing C + extra stuff as output, and the Language.C
> parser understood the extra stuff, this could accomplish what I'm
> interested in.

As for licensing, yes, cpphs, as a standalone binary, is GPL.  The  
library version is LGPL.  One misconception is that a BSD-licensed  
library cannot use an LGPL'd library - of course it can.  You just  
need to ensure that everyone can update the LGPL'd part if they wish.  
And as I always state for all of my tools, if the licence is a problem  
for any user, contact me to negotiate terms.  I'm perfectly willing to  
allow commercial distribution with exemption from some of the GPL  
obligations.  (And I note in passing that other alternatives like gcc  
are also GPL'd.)

Regards,
     Malcolm