parser frontend


Maurizio Vitale
G'day,
  I'm trying to decide how to architect the frontend of a compiler I'm
using as an excuse for learning Haskell.

  Suppose we have a language (SystemVerilog) that has the notion of
multi-file compilation units.
This means that specific subsets of files form separate scopes for certain
things.
For instance, if we represent compilation units as lists of lists:
[[a,b,c], [d], [e,f]]
a macro definition in file b would affect the rest of that file and file c,
but would have no effect on files a, d, or e.

Furthermore, if a toplevel construct is split between two files, say c and
d, the compiler should treat the original compilation-unit specification as
if it were:
[[a,b,(c,d)], [e,f]], where (c,d) means we're compiling the logical
concatenation of c and d.
If (c,d) is also incomplete, (c,d,e) should be tried and so on.

Now, in well-designed systems all files can actually be compiled in
parallel, so I'd like to optimize for that case and recompile files only
when needed (either because we need side effects from files before them or
because they're incomplete). So I'm not too concerned if wasted work is
done in the degenerate cases.

Any suggestion on how to go about this?
Let's assume that parsing a file is done with a function:
p :: file -> Either Error (ast, [defs], [use])
where [defs] are things that might affect files down the line and [use]
are things that, if defined by some prior file, cause recompilation.
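As a starting point, here is a minimal sequential sketch of that shape. All names (File, Name, Err, the stubbed parse) are hypothetical placeholders, not a real API; a real frontend would have a proper AST and thread real def/use sets:

```haskell
import Data.List (intersect)

-- Placeholder types; a real frontend would have a proper AST and error type.
type File = String
type Name = String
type Ast  = String
data Err  = Err String deriving (Show, Eq)

-- A parse yields the AST plus the defs it exports and the uses it saw.
-- Stubbed here: every file "parses" to itself with no defs or uses.
parse :: File -> Either Err (Ast, [Name], [Name])
parse f = Right (f, [], [])

-- Sequential pass over one compilation unit, threading the defs seen so
-- far and flagging files whose uses were defined by an earlier file
-- (those are the ones that would need recompilation).
compileUnit :: [File] -> Either Err [(Ast, Bool)]
compileUnit = go []
  where
    go _   []     = Right []
    go env (f:fs) = do
      (ast, ds, us) <- parse f
      let dirty = not (null (us `intersect` env))
      rest <- go (env ++ ds) fs
      pure ((ast, dirty) : rest)
```

The parallel version would run the parses speculatively and use the same sweep to decide what to redo.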

I'm interested in both a sequential and a parallel solution, and it would
be sweet if they were similar, with the proper abstractions.

Thanks a lot for any insight/ideas,

  Maurizio


Kostiantyn Rybnikov
Maurizio,

It's not completely clear how your compiler knows which files depend on
which. Does it read files one by one and "update" a dependency list? And
what about the situation where a toplevel construct is split across two
files: how does this single-file API handle it?



Maurizio Vitale
I did some reading and realized that the question was a bit too broad.
I've now got Simon Marlow's book on parallel and concurrent Haskell and am
going through it.

Files depend on each other because one defines something that another
uses. There's more to it in SystemVerilog, but assume #define and #ifdef
pairs. So the dependency is only implicit: you know it only after you've
parsed both files.
Nevertheless, a well-organized system will #include files with definitions
(and do the same for the other SystemVerilog things that cause
dependencies), so I'm OK with optimistically assuming no dependency and
computing a list of "problems" that cause recompilation.
So if:
file A:
...
#define foo
...
file B:
...
#ifdef foo
...
#endif
...
and the command line has the order fileA, fileB, we compile both files in
parallel; the result of compiling A will be (ast, [foo], []) and the
result of compiling B will be (ast/Error, [], [foo]). Since foo is in the
use list of B and in the def list of A, we know we have to recompile B
(and compiling B doesn't pass a def list to a possible C until we're
satisfied with it). So when there are dependencies, I'm OK with
recompiling multiple times.
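That optimistic round can be sketched in Haskell roughly like this (types and names are illustrative, not a real API): parse everything with an empty environment first, then make one sequential fixup sweep that reparses any file whose uses were defined by an earlier file:

```haskell
import Data.List (intersect)

type Name = String

data Result = Result { ast :: String, defs :: [Name], uses :: [Name] }
  deriving (Show, Eq)

-- `optimistic` holds the results of parsing every file with an empty
-- environment; `reparse i env` redoes file i under the accumulated defs.
-- The sweep threads the environment left to right, so a file is redone
-- exactly when its speculative uses were defined by an earlier file.
fixup :: (Int -> [Name] -> Result) -> [Result] -> [Result]
fixup reparse optimistic = go [] (zip [0 ..] optimistic)
  where
    go _ [] = []
    go env ((i, r) : rest) =
      let r' | null (uses r `intersect` env) = r             -- speculation held
             | otherwise                     = reparse i env -- redo under env
      in r' : go (env ++ defs r') rest
```

For the fileA/fileB example above, only fileB gets reparsed, since foo appears in its use list and in fileA's def list.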

As for the second question, splitting a toplevel across files is even more
perverse and should be extremely rare (I think it's there only for
historical reasons: the first Verilog simulators assumed the entire design
was available and basically did a 'cat f1..fN' before compiling). Again,
I'm happy to redo work in this case. If the compilation units are
[[a,b,c], [d,e]] and you have a parse error in c after having consumed the
entire file, and d also has a parse error, we restart with a new
specification [[a,b,(c,d)], [e]], where (c,d) means the concatenation of
the contents of files c and d. The parses for a and b are still valid; the
parse for e might be valid. But even if we invalidate and redo everything
in exchange for simple code, I'm OK with it.

My question can be abstracted a bit more in terms of list processing.
We have a list of lists [[a]]. We need to process each leaf element by
mapping a function p a e, which processes an element of type 'a' in an
environment 'e' that comes from the evaluation of previous elements. Up to
here, this would be normal monadic code to hide 'e'. In addition, since in
our domain we assume 'e' `intersect` 'u' is often empty (where 'u' is the
set of uses we discover while processing an element), we'd like to start
processing all elements in parallel and redo the ones that turn out to be
invalidated by previous ones. I'd like to find a solution to this that is
in good Haskell style.
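One way to sketch that abstract shape using only base (forkIO and MVars; all names are hypothetical): fork every element speculatively with the empty environment, then validate sequentially, redoing only the invalidated ones:

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import Data.List (intersect)

-- p env x yields (result, defs, uses). Speculative runs use the empty
-- environment; the sequential validation sweep redoes x with the real
-- environment whenever its speculative uses intersect the defs
-- accumulated from earlier elements.
parMapSpeculate
  :: Eq d
  => ([d] -> a -> IO (b, [d], [d]))
  -> [a]
  -> IO [b]
parMapSpeculate p xs = do
  vs <- mapM (\x -> do v <- newEmptyMVar
                       _ <- forkIO (p [] x >>= putMVar v)
                       pure v) xs
  rs <- mapM takeMVar vs            -- wait for all speculative results
  validate [] (zip xs rs)
  where
    validate _ [] = pure []
    validate env ((x, (r, ds, us)) : rest)
      | null (us `intersect` env) =           -- speculation held
          (r :) <$> validate (env ++ ds) rest
      | otherwise = do                        -- invalidated: redo with env
          (r', ds', _us') <- p env x
          (r' :) <$> validate (env ++ ds') rest
```

The sequential version is just the validate sweep applied directly, which is one way to get the "similar with the proper abstractions" property.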

We can forget the toplevel split across files for now; that can be dealt
with in an outer layer that restarts everything, throwing away all the
work.




Maurizio Vitale
In reply to this post by Maurizio Vitale
I think this goes a very long way towards what I need:
https://github.com/ekmett/speculation
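If I read its docs correctly, the core combinator is spec :: Eq a => a -> (a -> b) -> a -> b, which evaluates f guess in parallel while the real argument is being computed and keeps the speculative result when the guess was right. A sequential model of those semantics (shape only; the real library runs the two computations in parallel, and the names below are mine, not the library's):

```haskell
-- Sequential model of speculation's `spec` combinator (semantics only;
-- the real library evaluates `f guess` in parallel with forcing `actual`).
specModel :: Eq a => a -> (a -> b) -> a -> b
specModel guess f actual
  | guess == actual = f guess   -- speculation paid off
  | otherwise       = f actual  -- guess was wrong: redo with real input

-- In the compiler setting: guess that the environment contributed by
-- earlier files is empty, redoing the parse only when it isn't, e.g.
--   specModel [] (\env -> parseWith env file) realEnv
-- (parseWith and file are hypothetical names for illustration).
```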

If anybody has suggestions for other libraries that can be of use when
trying to parallelize a parser, I'd be very interested in hearing them.

Thanks,

  Maurizio
