Proposal: better library management ideas (was: how to checkout proper submodules)


Austin Seipp-4
So, I think there's a fairly clear majority in favor of doing
something. The question, then, is what. I'm fairly convinced by
Ian's response earlier that submodules *can* be dangerous if you're
using a lot of high-traffic packages; in particular, the ability for
people to trample each other's changes might be bad. I could see this
happening for base, for example, if two people are working on large
features and do not coordinate a merge. Git's own merge facility
doesn't suffer nearly as badly from this problem, and we can figure
out how we want that to happen later.

However, it seems like every high-volume package is, for better or
worse, intimately tied to GHC. These packages are also the most
problematic to roll back 'in sync' with GHC. As Geoffrey mentioned,
this becomes even MORE difficult if you use merges without
fast-forwards or rebases, because dates no longer correlate
accurately. These packages include base and testsuite, and probably
nofib as well. In some sense, I agree with Malcolm that 'base' being
GHC-only is maybe unfortunate. But maybe it's not (I'll talk more about
this later), and maybe in the meantime we shouldn't lie to ourselves.

So first off, I'd like to propose what seems to me the best approach
if we want to avoid developer pain and get as many wins as possible in
the long run. I hope this doesn't sound radical, but it's going to
sound totally radical (though I don't think it is):

--> Let's just put base and testsuite inside the GHC repository
directly. No submodules, no floating repos. Just put it directly
inside and make a super commit, I guess. GHC becomes the de facto
repository. And hey, why not nofib?

I know, I know. People really want to split the maintenance burden, I
guess, and ideologically the Haskell community is all about clean
separation, but please? All of GHC HQ are the de facto maintainers of
this stuff anyway. And as Jan mentioned, testsuite is really *so*
crucial that GHC should have it inline. The testsuite is perhaps the
most important of all.

There are other candidates for this treatment too, really. For
example, why are template-haskell, ghc-prim, and hpc split out? GHC is
the only thing that supports them. template-haskell is an especially
intrusive extension to support, and arguably hpc is as well.
integer-simple and integer-gmp follow exactly the same story. Same with
hoopl and dph. They're all ours. We own them. Just put them all inside
GHC and be done with it. There's no need to fragment the VCS when there
is no reason to. These packages de facto ship with GHC and are very
tied to it.

I think people might be really opposed to a mega repository or
something, but honestly? There's less maintenance, and cross-package
changes work correctly and are tracked correctly in the history. It's
less work for maintainers. It's less to explain and, frankly, less to
mess up. All of this, I think, is a huge win.

OK, so radical idea is out there. Let's look at some numbers. I think
ultimately anything will be a bit painful, because...

$ cd ~/ghc/ghc-work
$ grep -v "\#" ./packages | head --lines="-1" | wc -l
39

There are 39 sub-packages which GHC requires (the -1 is because GHC
itself is listed as the final entry). These aren't all libraries, of
course. But that's a massive number of dependencies, really, so
managing them is a pain.

How many are submodules already?

$ grep -v "\#" packages | head --lines="-1" | awk '{print $3}' | grep "^-" | wc -l
14

So there are 14 submodules and 25 packages that are free-floating.
This is a very large number of dependent packages. I guess that's
just the price we pay.
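Both counts can also be produced in one pass. The shell function below is only a sketch: the name `count_repos` is made up, and it assumes the `packages` layout implied by the two commands above (any line containing `#` is a comment, the final entry is GHC itself, and a third column starting with `-` marks a submodule) rather than any documented format. Unlike those commands, it also skips blank lines.

```shell
# count_repos FILE: report how many entries in a GHC `packages` file are
# submodules and how many are free-floating repositories.
# Assumptions (mirroring the grep/awk commands above, not a documented format):
#   * any line containing '#' is a comment; blank lines are ignored;
#   * the last remaining entry is GHC itself and is not counted;
#   * a third column beginning with '-' marks a submodule.
count_repos() {
  awk '
    index($0, "#") > 0 || NF == 0 { next }   # skip comments and blank lines
    { n++; col3[n] = $3 }                     # remember column 3 of each entry
    END {
      total = n - 1                           # drop the final entry (GHC itself)
      subs = 0
      for (i = 1; i <= total; i++)
        if (col3[i] ~ /^-/) subs++
      printf "packages=%d submodules=%d floating=%d\n", total, subs, total - subs
    }' "$1"
}
```

Run it as `count_repos ./packages` from the top of a GHC checkout; if the assumptions hold, it should report the same 39 / 14 / 25 split as above.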

Let's say that, hypothetically, we fold all the packages I listed into
GHC (base, testsuite, nofib, template-haskell, integer-simple,
integer-gmp, hpc, ghc-prim). That leaves 14 submodules and 17
floaters.

I actually believe that most of the current submodules are a fairly
good trade-off, because, as designed, they've all got upstreams. That's
good. But what about things that are *not* submodules?

Let's look at the commits to each of the floaters over the past year.
The command is:

$ git log --since="1 year ago" --format=oneline . | wc -l

* ghc-tarballs: 1
* hsc2hs: 11
* haddock: 147
* array: 10
* base: 306
* deepseq: 6
* directory: 19
* filepath: 3
* ghc-prim: 9
* haskell98: 11
* haskell2010: 7
* hoopl: 13
* hpc: 13
* integer-gmp: 29
* integer-simple: 8
* old-time: 5
* old-locale: 3
* process: 40
* template-haskell: 19
* unix: 32
* testsuite: 825
* nofib: 50
* parallel: 5
* stm: 31
* dph: 95
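Those per-repository numbers can be gathered in a single loop. A minimal sketch, assuming each floater is a separate git clone inside the GHC tree; the function name `commits_last_year` and the directory paths in the usage line are illustrative only. It runs the `git log` command quoted above inside each directory.

```shell
# commits_last_year DIR...: for each repository checkout DIR, print
# "name: N", where N is the number of commits in the past year, counted
# with the same git log command as above.
commits_last_year() {
  for d in "$@"; do
    # Subshell, so the cd does not leak; tr strips padding some wc versions add.
    n=$(cd "$d" && git log --since="1 year ago" --format=oneline . | wc -l | tr -d ' ')
    printf '%s: %s\n' "$(basename "$d")" "$n"
  done
}
```

For example: `commits_last_year libraries/array libraries/base testsuite nofib`.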

Remember, a lot of the commits in several of these repositories are
fairly closely tied to GHC commits (testsuite especially), so the
numbers lie a little. But *now* let's take out all the ones we wanted
to fold in.

* ghc-tarballs: 1
* hsc2hs: 11
* haddock: 147
* array: 10
* deepseq: 6
* directory: 19
* filepath: 3
* haskell98: 11
* haskell2010: 7
* old-time: 5
* old-locale: 3
* process: 40
* unix: 32
* parallel: 5
* stm: 31

These are all incredibly low traffic, with the exception of haddock,
which I generously listed anyway (even though I shouldn't have, because
it uses the GHC API). Keeping stm and parallel in the list is also
pretty generous, I'd say.
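The arithmetic behind the shrink is easy to check. The first list sums to 1698 commits and the second to 331, so the ten packages dropped between the two lists (the eight named earlier plus hoopl and dph) account for 1367 commits, roughly 80% of the year's floater traffic. A small awk-based helper, hypothetically named `tally`, reproduces the sum for the dropped packages:

```shell
# tally: sum the counts in "* name: count" lines read from stdin,
# the format used in the lists above.
tally() {
  awk -F': ' '/^\* / { total += $2 } END { print total + 0 }'
}

# Commit counts for the packages proposed for folding into GHC.
tally <<'EOF'
* base: 306
* ghc-prim: 9
* hoopl: 13
* hpc: 13
* integer-gmp: 29
* integer-simple: 8
* template-haskell: 19
* testsuite: 825
* nofib: 50
* dph: 95
EOF
# prints 1367
```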

Now let's think about this. Most of these could probably be converted
to submodules with very little loss. They are not touched very actively
during most development cycles, and after looking at a lot of the
changes, I think it's unlikely you'll hit many merge conflicts or weird
situations. And even if you do, it's probably not going to happen
*often*. It's even possible that a lot of these could become upstreams
with separate maintainers. Many of them may not necessarily depend on
GHC, in theory or in practice: unix, process, deepseq, array,
directory, filepath, etc. Someone could maintain them, and developers
could work with them. Would anyone want to be a maintainer? (I heard
some people clamoring for GitHub. Become a maintainer and you can host
it where you want :P)

Or we could fold them in too, mega-repository style, and just say GHC
HQ is the de facto maintainer, as it is now. If someone wants to step
up, we can split it out later. That would leave just the 14
sub-repositories, which are pretty well taken care of with upstreams.
Maybe a few more if some people come on board and can maintain things.
I feel this would reduce our problems a lot. Other things like
./sync-all could change to support branching and other basic
multi-repo facilities, as Jan said, and that's not totally unreasonable
either, I think. It's about making the normal case easy.

We're often concerned with things being at the right granularity, and
maybe with sharing stuff, but I think the trend is pretty frighteningly
clear at this point: GHC is the de facto implementation of Haskell, and
the number of maintainers isn't especially high. And maintaining it is
a lot of work (it's truly a world-class programming language
implementation, after all). And having 39 repositories is scary. If
that's the case, I'd say we should optimize where it counts, minimize
our own burden, make it easy to track our changes, and make our
workflows as simple as possible. Yes, hypothetically a competitor could
come along and give us a run for our money, and maybe they'll want to
use base and the testsuite and all that other stuff, and we'll own it
and whatnot. And there'll be duplication of work, etc., etc. And that'll
be sad.

Or not. And they'll do their own thing entirely and run with it. UHC
has its own base and testsuite, as does JHC, for example. Perhaps
sharing things like that is the exception rather than the rule.
Ultimately, a software project is as much about ideals, and what we
believe is worth spending our time on, as it is about the code you're
writing or using right now. Perhaps we should no longer hinge our
development strategies on these tactics when the pattern seems to be
darn clear.

This proposal is fairly radical. It would require the agreement of
almost every single developer, because several of us have varying
degrees of ownership over the parts of the source it concerns. But like
I said, it seems the majority would agree that something should change,
and I don't think we should give up on finding out what, so let's just
see where our ideas take us. And I think the wins would be enormous.

I also appreciate you all dealing with the novels I've written over
the past few days.

--
Regards,
Austin - PGP: 4096R/0x91384671



Roman Cheplyaka-2
Hi Austin,

I apologize for not having read the full email yet (I'm in a hurry right
now), but...

* Austin Seipp <aseipp at pobox.com> [2013-06-09 00:23:22-0500]

> --> Let's just put base and testsuite inside the GHC repository
> directly. No submodules, no floating repos. Just put it directly
> inside and make a super commit, I guess. GHC becomes the de facto
> repository. And hey, why not nofib?
>
> I know, I know. People really want to split the maintenance burdens I
> guess, and ideologically the Haskell community is all about clean
> separation but, please? All of GHC HQ are the de facto maintainers of
> this stuff anyway. And as Jan mentioned, testsuite is really *so*
> crucial GHC should have it inline. The testsuite is perhaps the most
> important of all.
>
> There are other candidates for this treatment too, really. For
> example, why is template-haskell, ghc-prim, and hpc split out? GHC is
> the only thing that supports them. template-haskell is especially
> super-intrusive of an extension to support, and arguably hpc as well.
> integer-simple and integer-gmp follow the exact same story. Same with
> hoopl and dph. They're all ours. We own them. Just put them all inside
> GHC and be done with it. Having active fragmentation in the VCS is not
> necessary when there need be none. These packages de-facto ship with
> GHC and are very tied to it.

I'm a strong -1 on this. As one example, we have forks of base and
ghc-prim for Haskell suite:

  https://github.com/haskell-suite/base
  https://github.com/haskell-suite/ghc-prim

which would be much more complicated if these were not independent
repositories.

But more generally, I think there's still hope that the core packages
will be made portable; I'm referring to Joachim Breitner's work on
splitting the base.

Roman



Jan Stolarek
Hi Austin,

I admire your talent for writing emails ;-)

As you wrote in your email, I'm totally for including testsuite in GHC, because it is essentially
part of GHC and it doesn't make sense to have a version of testsuite not corresponding to a
version of GHC. As you pointed out, the same argument can be used for other packages, but still
there is one thing I don't like about that idea. What if an average Haskeller wants to improve
one of the libraries, e.g. by adding comments or fixing a minor bug? If we have a super-repo, that
person would need to check out everything, which is discouraging. Another, separate issue here is
that such a person needs to register with either ghc-devs or Trac to send a patch. Using GitHub
would be helpful here, though I agree with Geoffrey about merge commits; we'd have to think of
something here. Also, the fact that GHC HQ is maintaining all of the mentioned packages doesn't mean
that they need to be stored in one repo, at least not in git (this would make more sense to me
with SVN, where you can check out a subdirectory).

Still, I strongly agree that something should be done about the current setup. I'm not a git guru, so I
cannot fully foresee the consequences of turning everything into submodules, but I
think that it cannot be worse than it is now, right?

Jan


Austin Seipp-4
In reply to this post by Roman Cheplyaka-2
Hi Roman,

On Sun, Jun 9, 2013 at 1:44 AM, Roman Cheplyaka <roma at ro-che.info> wrote:
> I'm a strong -1 on this. As one example, we have forks of base and
> ghc-prim for Haskell suite:
>
>   https://github.com/haskell-suite/base
>   https://github.com/haskell-suite/ghc-prim
>
> which would be much more complicated if these were not independent
> repositories.

I hate to be that person, but if the purpose of these forks is to work
around specific bugs in HSE and/or fix problems with name resolution
of GHC-specific terms, which sort of seems to be the case from the
log, I don't think hacking base & co. is a long-term solution. It
could potentially need ongoing maintenance forever. I went down this
road with LHC too.

And my gut feeling is that hacking ghc-prim out-of-band feels so
amazingly wrong that I'm frankly not sure "I need to fork it" warrants
a huge amount of sympathy, to the point of keeping the repository
separate for that one fork in existence (granted, ghc-prim is still
pretty low traffic, but base is not). If you DO need help from GHC, is
there really nothing we could easily and reasonably do to further
assist you? I think asking for specific, principled solutions on our
part is not out of the question here.

Are there any other forks of base people have for any particular
reason? What reasons are those?

> But more generally, I think there's still hope that the core packages
> will be made portable; I'm referring to Joachim Breitner's work on
> splitting the base.

To be clear, packages and their numbers aren't *really* the problem.
It's repositories. The numbers just make this slightly worse. Adding
packages and adding repositories both add overhead. Adding
repositories adds a significantly *larger* amount of complexity, all
things considered. The only honest, legitimate way to reduce that
complexity is to fold in repositories. But this means that we have to
give something up, too.

If base were to get split into 5 packages or 8 packages, that's
potentially fine by me, even welcome. What I don't want is 5 more
repositories that are all intimately tied to GHC's build and features,
that most GHC-specific work will end up touching, and that we must then
manage and synchronize heavily over time. That's just a massive amount
of work. Just looking at Joachim's fork of base on GitHub, I already
have some reservations about its current implementation. For example,
base-float still exports GHC-specific namespaces. Every package still
has a lot of GHC-specific code, as opposed to some isolated substrate
that we provide and the base-* packages interface with. So we're going
to maintain all of that; it's the sad truth. And if Joachim's patch
were somehow merged tomorrow, I think so much of it would still be
under GHC control that my argument would still stand. It would still be
one repository. We would still own it. It makes base more granular, but
this has almost nothing to do with our real problems.

Fixing all of that where we're not *actually* in control of it is a
ton of work. The current patches just don't solve that, I think. And
this was last discussed in February? So what's the timeline here?
Clearly we're not even done with the API discussion. So, 6 months? A
year? Who knows? "When it's done"? I'm not sure most of us want to wait
that long, especially since we frequently need to track down bugs
using accurate historical logs.

> Roman

--
Regards,
Austin - PGP: 4096R/0x91384671



Austin Seipp-4
In reply to this post by Jan Stolarek
On Sun, Jun 9, 2013 at 3:47 AM, Jan Stolarek <jan.stolarek at p.lodz.pl> wrote:
> I admire your talent for writing emails ;-)

You can be honest and just call them what they are: horribly written novellas.

> As you wrote in your email I'm totally for including testsuite into GHC, because it is essentially
> part of GHC and it doesn't make sense to have a version of testsuite not corresponding to a
> version of GHC. As you pointed out the same argument can be used for other packages, but still
> there is one thing I don't like about that idea. What if an average haskeller wants to improve
> one of the libraries e.g. by adding comments or fixing a minor bug? If we have a super-repo that
> person would need to check out everything, which is discouraging.

This is a good point I hadn't considered, but it's less of a worry for
some packages than others. For example, base, ghc-prim and
template-haskell are so intimately tied into GHC that reinstalling
them is either impossible or a bad idea. To change them, you must
build your own GHC anyway (either from a source release or from HEAD).
And if you're using a Haskell Platform compiler, you clearly wouldn't
have any luck with the git repository anyway (due to their strong
interdependence).

But again, I'm totally OK with a lot of these other repositories being
submodules: for example, process, unix, deepseq, filepath and
directory. Those don't need to be folded in. Lots of them could have
their own maintainers with separate upstreams. They're touched
infrequently enough that traffic isn't much of a concern. I just want
the highest-traffic repositories dealt with, because in practice these
are the *most* critical and the most interdependent. That in turn leads
to the most problems.

> Another, separate issue here is
> that such a person needs to either register to ghc-devs or trac to send a patch. Using github
> would be helpful here, though I agree with Geoffrey about merge commits - we'd have to think of
> sth here. Also, the fact that GHC HQ is maintaining all of the mentioned packages doesn't mean
> that they need to be stored in one repo, at least not in git (this would make more sense to me
> with SVN where you can checkout a subdirectory).

Not necessarily; the 'owners' of the packages are still the libraries
committee. People can propose changes there as they always have. It
just so happens that most of the 'libraries'-maintained packages are
de facto maintained by GHC people.

You're right that not all of them need to be folded in. But I think
several of them should be, and these are the ones that hurt the most.

(Plus, my radical proposal can't be considered totally, completely
radical unless I propose something which would - of course - be shot
down.)

> Still, I strongly agree that sth should be done about current setup. I'm not a git guru so I
> cannot fully foresee what would be the consequences of turning everything into submodules, but I
> think that it cannot be worse than it is now, right?

For some submodules, it could certainly be worse. Please see Ian's
link in the earlier discussion about submodules: for high-traffic
repositories, some of the concerns are disconcerting.



--
Regards,
Austin - PGP: 4096R/0x91384671



Ian Lynagh-2
In reply to this post by Austin Seipp-4
On Sun, Jun 09, 2013 at 11:15:37AM -0500, Austin Seipp wrote:
>
> > I'm referring to Joachim Breitner's work on
> > splitting the base.
>
> So what's the timeline here?

As soon as possible after 7.8 is branched.


Thanks
Ian




Jan Stolarek
In reply to this post by Austin Seipp-4
> You can be honest and just call them what they are: horribly written
> novellas.
Actually, I was thinking that instead of posting to the list you might consider presenting your
emails as papers at workshops or symposia ;)

> for high-traffic repositories, some of the concerns are disconcerting.
But the high-traffic repositories (base, testsuite) are already submodules, right? For me the
major problem with the current setup is that we cannot use one of the most important features of a
VCS, i.e. going back in time. The only solutions to this problem that I am aware of are folding
all the libraries that GHC depends on into the GHC tree, or turning them all into submodules.

I just had a moment of enlightenment: the question of including a repo as a submodule (or
folding it into the GHC tree) is not a matter of traffic, but a matter of that library's
implementation. If it uses GHC-specific APIs, then it goes in, because it is tightly coupled. If it
is implemented in standard Haskell, then it can stay out, because changes to the compiler should not
affect it. This is a pretty simple criterion for identifying the libraries we should be concerned with
(perhaps this is obvious, but it only occurred to me now). So a high-traffic repo that does not
depend on non-standard features of GHC could still be kept as an in-tree repo, without affecting
the ability to go back in time.
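That criterion suggests a quick, rough check. The sketch below is a hypothetical heuristic, not an established tool: it counts the Haskell source files in a package directory that either test `__GLASGOW_HASKELL__` (the CPP symbol GHC defines) or import a `GHC.*` module. A count of zero hints, but does not prove, that a package sticks to standard Haskell; the function name `ghc_coupling` is made up.

```shell
# ghc_coupling DIR: print how many Haskell source files under DIR look
# GHC-specific, i.e. mention __GLASGOW_HASKELL__ or import a GHC.* module.
# A rough heuristic only; zero hints at (but does not prove) portability.
ghc_coupling() {
  find "$1" -type f \( -name '*.hs' -o -name '*.lhs' -o -name '*.hsc' \) \
    -exec grep -l -E '__GLASGOW_HASKELL__|^import +(qualified +)?GHC\.' {} + \
    | wc -l | tr -d ' '
}
```

For example, compare `ghc_coupling libraries/filepath` with `ghc_coupling libraries/base`.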

Jan



Jan Stolarek
In reply to this post by Austin Seipp-4
Oh, and I've been made aware that git 1.7 and later can check out a subdirectory of a repo; this
partially invalidates my previous argument. I say partially because it is a bit more
difficult than dealing with a library that has its own repo, and it seems that some potential
contributors might not be aware of this feature (like me, until this morning).

Janek



Roman Cheplyaka-2
In reply to this post by Austin Seipp-4
* Austin Seipp <aseipp at pobox.com> [2013-06-09 11:15:37-0500]

> Hi Roman,
>
> On Sun, Jun 9, 2013 at 1:44 AM, Roman Cheplyaka <roma at ro-che.info> wrote:
> > I'm a strong -1 on this. As one example, we have forks of base and
> > ghc-prim for Haskell suite:
> >
> >   https://github.com/haskell-suite/base
> >   https://github.com/haskell-suite/ghc-prim
> >
> > which would be much more complicated if these were not independent
> > repositories.
>
> I hate being that person but, if the purpose of these forks is to work
> around specific bugs in HSE and/or fix problems with name resolution
> of GHC-specific terms, which sort of seems to be the case from the
> log, I don't think hacking base & co. is a long term solution. It
> could potentially need infinite ongoing maintenance. I went down this
> road with LHC too.

It is only partly to work around bugs in HSE.

The second part is to work around bugs and quirks in base itself. There
are places where CPP wouldn't produce meaningful code unless
__GLASGOW_HASKELL__ is defined, for example.

Even ignoring those obvious bugs for a minute, a large part of base is
currently defined under the GHC.* hierarchy and isn't available unless
__GLASGOW_HASKELL__ is defined.

But okay, let's suppose that at some point everything is fixed and we
don't have to *fork* base. We would still like to use it! Should we
fetch the whole GHC tree in order to get its development version?

> And my gut feeling is that hacking ghc-prim out-of-band feels so
> amazingly wrong I'm frankly not sure if "I need to fork it" can
> actually warrant a huge amount of sympathy, to the point of keeping
> the repository separate for that 1 fork in existence (granted,
> ghc-prim is still pretty low traffic. But base is not.)

It *is* wrong, but who is to blame that a big part of Prelude comes from
there, including all logical operations and classes Eq and Ord?

> If you DO need help from GHC, is there really nothing we could easily
> and reasonably do to further assist you? I think asking for specific,
> principled solutions on our part is not out of the question here.

The best help would be to make and keep base relatively portable and not
to introduce superfluous conditional compilation. (I realise that a lot
of that has just accumulated historically, but now is a good time to get
rid of it.)

It is a ton of work, and I'm very happy when I see people like Joachim
trying to do something in that direction.

Right now I'm only asking not to make their work even harder by moving
base under the ghc repository.

> > But more generally, I think there's still hope that the core packages
> > will be made portable; I'm referring to Joachim Breitner's work on
> > splitting the base.
>
> To be clear, packages and their numbers aren't *really* the problem.

What I'm trying to say here is that there's hope for a portable base.
Maybe not in the form of split base; I don't know.
But it's the direction we should be moving anyways.

And usurping base by GHC is a move in the opposite direction.

Roman



John Lato-2
On Mon, Jun 10, 2013 at 1:32 AM, Roman Cheplyaka <roma at ro-che.info> wrote:

>
> What I'm trying to say here is that there's hope for a portable base.
> Maybe not in the form of split base; I don't know.
> But it's the direction we should be moving anyways.
>
> And usurping base by GHC is a move in the opposite direction.


Maybe that's a good thing?  The current situation doesn't really seem to be
working.  Keeping base separate negatively impacts workflow of GHC devs (as
evidenced by these threads), just to support something that other compilers
don't use anyway.  Maybe it would be easier to fold base back into ghc and
try again, perhaps after some code cleanup?  Having base in ghc may provide
more motivation to separate it properly.

Roman Cheplyaka-2
* John Lato <jwlato at gmail.com> [2013-06-10 07:59:55+0800]

> On Mon, Jun 10, 2013 at 1:32 AM, Roman Cheplyaka <roma at ro-che.info> wrote:
>
> >
> > What I'm trying to say here is that there's hope for a portable base.
> > Maybe not in the form of split base; I don't know.
> > But it's the direction we should be moving anyways.
> >
> > And usurping base by GHC is a move in the opposite direction.
>
>
> Maybe that's a good thing?  The current situation doesn't really seem to be
> working.  Keeping base separate negatively impacts workflow of GHC devs (as
> evidenced by these threads), just to support something that other compilers
> don't use anyway.  Maybe it would be easier to fold base back into ghc and
> try again, perhaps after some code cleanup?  Having base in ghc may provide
> more motivation to separate it properly.

After base is in GHC, separating it again will be only harder, not
easier. Or do you have a specific plan in mind?

Roman



John Lato-2
On Mon, Jun 10, 2013 at 1:32 PM, Roman Cheplyaka <roma at ro-che.info> wrote:

> * John Lato <jwlato at gmail.com> [2013-06-10 07:59:55+0800]
> > On Mon, Jun 10, 2013 at 1:32 AM, Roman Cheplyaka <roma at ro-che.info>
> wrote:
> >
> > >
> > > What I'm trying to say here is that there's hope for a portable base.
> > > Maybe not in the form of split base - I don't know.
> > > But it's the direction we should be moving anyways.
> > >
> > > And usurping base by GHC is a move in the opposite direction.
> >
> >
> > Maybe that's a good thing?  The current situation doesn't really seem to be
> > working.  Keeping base separate negatively impacts workflow of GHC devs (as
> > evidenced by these threads), just to support something that other compilers
> > don't use anyway.  Maybe it would be easier to fold base back into ghc and
> > try again, perhaps after some code cleanup?  Having base in ghc may provide
> > more motivation to separate it properly.
>
> After base is in GHC, separating it again will be only harder, not
> easier. Or do you have a specific plan in mind?


It's more about motivation.  It seems to me right now base is in a halfway
state.  People think that moving it further away from ghc is The Right
Thing To Do, but nobody is feeling enough pain to be sufficiently motivated
to do it.  If we apply pain, then someone will be motivated to do it
properly.  And if nobody steps up, maybe having a platform-agnostic base
isn't really very important.

Simon Peyton Jones
In reply to this post by John Lato-2
I forget who said it, but it's true that we have uncritically assumed that

* One package = one repository

But I now realise that there's no need for that. We could certainly have one repo with multiple packages.

What are the motivations for having a separate repository? Are these two the main ones?

* Sense of "ownership" by the maintainer. (My package isn't merely a barnacle on the side of GHC.)

* Ability to release new versions un-synchronised with GHC releases

And neither really holds for the GHC-maintained packages.

One merit of splitting up 'base' will be that a chunk of it can go in the "independent" sector, leaving a smaller rump that is intimately coupled to GHC.  But we don't need to await that glorious day before getting on with the debate this thread is so constructively having.

Again: I am a non-expert. I will be happy to fall in with whatever you git experts decide, provided (a) you have some measure of agreement that it's a step forward and (b) you tell me clearly what my workflows should be.

Simon

From: ghc-devs-bounces at haskell.org [mailto:ghc-devs-bounces at haskell.org] On Behalf Of John Lato
Sent: 10 June 2013 01:00
To: Roman Cheplyaka
Cc: ghc-devs at haskell.org
Subject: Re: Proposal: better library management ideas (was: how to checkout proper submodules)

On Mon, Jun 10, 2013 at 1:32 AM, Roman Cheplyaka <roma at ro-che.info> wrote:

What I'm trying to say here is that there's hope for a portable base.
Maybe not in the form of split base - I don't know.
But it's the direction we should be moving anyways.

And usurping base by GHC is a move in the opposite direction.

Maybe that's a good thing?  The current situation doesn't really seem to be working.  Keeping base separate negatively impacts workflow of GHC devs (as evidenced by these threads), just to support something that other compilers don't use anyway.  Maybe it would be easier to fold base back into ghc and try again, perhaps after some code cleanup?  Having base in ghc may provide more motivation to separate it properly.

Roman Cheplyaka-2
My motivation for having a separate repository is that I can check it
out and work on it without having to check out the whole GHC.

At the moment base and ghc-prim are used by the name-resolving compiler
http://haskell-suite.github.io/haskell-names/

Roman

* Simon Peyton-Jones <simonpj at microsoft.com> [2013-06-10 07:42:19+0000]

> I forget who said it, but it's true that we have uncritically assumed that
>
> * One package = one repository
>
> But I now realise that there's no need for that. We could certainly have one repo with multiple packages.
>
> What are the motivations for having a separate repository? Are these two the main ones?
>
> * Sense of "ownership" by the maintainer. (My package isn't merely a barnacle on the side of GHC.)
>
> * Ability to release new versions un-synchronised with GHC releases
>
> And neither really holds for the GHC-maintained packages.
>
> One merit of splitting up 'base' will be that a chunk of it can go in the "independent" sector, leaving a smaller rump that is intimately coupled to GHC.  But we don't need to await that glorious day before getting on with the debate this thread is so constructively having.
>
> Again: I am a non-expert. I will be happy to fall in with whatever you git experts decide, provided (a) you have some measure of agreement that it's a step forward and (b) you tell me clearly what my workflows should be.
>
> Simon
>
> From: ghc-devs-bounces at haskell.org [mailto:ghc-devs-bounces at haskell.org] On Behalf Of John Lato
> Sent: 10 June 2013 01:00
> To: Roman Cheplyaka
> Cc: ghc-devs at haskell.org
> Subject: Re: Proposal: better library management ideas (was: how to checkout proper submodules)
>
> On Mon, Jun 10, 2013 at 1:32 AM, Roman Cheplyaka <roma at ro-che.info> wrote:
>
> What I'm trying to say here is that there's hope for a portable base.
> Maybe not in the form of split base - I don't know.
> But it's the direction we should be moving anyways.
>
> And usurping base by GHC is a move in the opposite direction.
>
> Maybe that's a good thing?  The current situation doesn't really seem to be working.  Keeping base separate negatively impacts workflow of GHC devs (as evidenced by these threads), just to support something that other compilers don't use anyway.  Maybe it would be easier to fold base back into ghc and try again, perhaps after some code cleanup?  Having base in ghc may provide more motivation to separate it properly.



Daniel Trstenjak-2

On Mon, Jun 10, 2013 at 10:54:06AM +0300, Roman Cheplyaka wrote:
> My motivation for having a separate repository is that I can check it
> out and work on it without having to check out the whole GHC.

With git-subtree you can have both: a separate repository for easy
forking of, e.g., base, and just one repository for GHC with a subdirectory
for base.

At work we're sharing a fairly large library between two development teams.
There's a separate repository for this library, which is used for
synchronization between both projects. Each project has its own
repository with a subdirectory containing the library, and git-subtree
is used to merge this subdirectory with the library repository.

Most developers don't even have to care that there's a separate
repository for the library; they just work with the one project repository.
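A minimal dry-run sketch of the workflow described above, written in Python only to show which `git subtree` subcommands each role would run. The prefix `libraries/base`, the upstream URL, and the branch name are hypothetical placeholders, and the sketch assumes git-subtree is installed; it builds command lines rather than executing them.

```python
"""Dry-run sketch of a git-subtree workflow (illustrative, hypothetical names).

Assumptions: the library lives under PREFIX in the project repository and its
standalone upstream is UPSTREAM on BRANCH. Nothing here runs git; it only
builds the argv lists a maintainer would execute.
"""
import shlex
from typing import List

PREFIX = "libraries/base"                  # subdirectory inside the project repo
UPSTREAM = "https://example.org/base.git"  # hypothetical upstream URL
BRANCH = "master"                          # hypothetical upstream branch


def subtree(action: str, prefix: str = PREFIX,
            repo: str = UPSTREAM, ref: str = BRANCH) -> List[str]:
    """Build a `git subtree <action>` command for the given prefix."""
    if action not in ("add", "pull", "push"):
        raise ValueError(f"unsupported subtree action: {action}")
    return ["git", "subtree", action, f"--prefix={prefix}", repo, ref]


# One-time import of the library's history into the project repository.
SETUP = [subtree("add")]

# Periodic sync, run only by whoever maintains the external repository:
# bring upstream changes in, then send local changes back out.
SYNC = [subtree("pull"), subtree("push")]

if __name__ == "__main__":
    for argv in SETUP + SYNC:
        print(shlex.join(argv))
```

Only `SETUP` and `SYNC` involve git-subtree at all; everyday development is ordinary commits in the project repository, which is the division of labour described in the message.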



Geoffrey Mainland
On 06/10/2013 11:06 AM, Daniel Trstenjak wrote:

> On Mon, Jun 10, 2013 at 10:54:06AM +0300, Roman Cheplyaka wrote:
>> My motivation for having a separate repository is that I can check it
>> out and work on it without having to check out the whole GHC.
>
> With git-subtree you can have both: a separate repository for easy
> forking of, e.g., base, and just one repository for GHC with a subdirectory
> for base.
>
> At work we're sharing a fairly large library between two development teams.
> There's a separate repository for this library, which is used for
> synchronization between both projects. Each project has its own
> repository with a subdirectory containing the library, and git-subtree
> is used to merge this subdirectory with the library repository.
>
> Most developers don't even have to care that there's a separate
> repository for the library; they just work with the one project repository.
>
> From time to time - perhaps once a week - the changes in the projects
> get merged back into the library repository.
>
> git-submodules is a burden for every developer, git-subtree is "just" a
> burden for the developer doing the merges with the external repository.
>
> The git-subtree script is more or less just a nice wrapper around the
> subtree merge strategy of git-merge. It uses only the available git commands.

I mentioned git-subtree as a possible alternative earlier in the
thread. One of the primary objections at the time was that the subtree
command is not installed by default in, e.g., the Ubuntu git package.

Merging base and/or testsuite into the ghc repository wouldn't solve the
primary issue, which is that we can't reproduce a full source code tree
without resorting to the fingerprint script, and even then we can't
bisect. Side note: the fingerprint script *didn't even work* for almost
a year after it was introduced; see commit 73ce2e70.

I think there are three realistic choices about how we should resolve
this issue. Our choice affects the decision about whether or not base
and/or testsuite should be merged into the ghc repository, so I think
the merger discussion should be tabled for the time being.

1) Leave everything as-is. We live with a mix of submodules and
fingerprints.

2) Use submodules.

3) Use subtrees.

I don't think there is a realistic fourth option, e.g., use a mix of
subtrees and submodules, but I may be wrong.

So, if we can agree that these are the three realistic alternatives, I
volunteer to flesh out the wiki so it lists the pros and cons of each
choice. If there are other sane paths forward besides these three,
please let us know!

Geoff




Roman Cheplyaka-2
In reply to this post by Daniel Trstenjak-2
Hmm, okay, if you're saying that this workflow works and is not very
painful, then I withdraw my objection.

Thanks,
Roman

* Daniel Trstenjak <daniel.trstenjak at gmail.com> [2013-06-10 12:06:56+0200]

>
> On Mon, Jun 10, 2013 at 10:54:06AM +0300, Roman Cheplyaka wrote:
> > My motivation for having a separate repository is that I can check it
> > out and work on it without having to check out the whole GHC.
>
> With git-subtree you can have both: a separate repository for easy
> forking of, e.g., base, and just one repository for GHC with a subdirectory
> for base.
>
> At work we're sharing a fairly large library between two development teams.
> There's a separate repository for this library, which is used for
> synchronization between both projects. Each project has its own
> repository with a subdirectory containing the library, and git-subtree
> is used to merge this subdirectory with the library repository.
>
> Most developers don't even have to care that there's a separate
> repository for the library; they just work with the one project repository.
>
> From time to time - perhaps once a week - the changes in the projects
> get merged back into the library repository.
>
> git-submodules is a burden for every developer, git-subtree is "just" a
> burden for the developer doing the merges with the external repository.
>
> The git-subtree script is more or less just a nice wrapper around the
> subtree merge strategy of git-merge. It uses only the available git commands.
>
>
> Greetings,
> Daniel



Ian Lynagh-2
In reply to this post by Geoffrey Mainland
On Mon, Jun 10, 2013 at 11:23:13AM +0100, Geoffrey Mainland wrote:
>
> Side note: the fingerprint script *didn't even work* for almost
> a year after it was introduced; see commit 73ce2e70.

Which implies that wanting to go back in time is rare, so making it easy
should be given low weight when considering the options?

> 3) Use subtrees.

Is this possible with subtrees?:

* Initially ghc's Cabal repo is at the same commit as upstream
* We make a local commit 123 in Cabal to fix some bug
* Cabal upstream makes a commit 456 to fix the same bug differently
* We jump to commit 456, in such a way that we don't end up merging
  with our 123 commit every time we pull from Cabal in the future


Thanks
Ian
--
Ian Lynagh, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/



Nicolas Trangez
On Mon, 2013-06-10 at 11:45 +0100, Ian Lynagh wrote:
> > Side note: the fingerprint script *didn't even work* for almost
> > a year after it was introduced; see commit 73ce2e70.
>
> Which implies that wanting to go back in time is rare, so making it
> easy should be given low weight when considering the options?

If 'git bisect' worked (out of the box) on the GHC repo, going back
in time would certainly be a more common operation.
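`git bisect` is, at heart, a binary search over history. The sketch below assumes a linear list of commits and a test whose result flips from good to bad exactly once (real `git bisect` also copes with merge commits, and `git bisect run` automates the test step). The relevant part for this thread is the `is_bad` callback: every probe means checking out and building the whole tree at an arbitrary commit, which is only reliable if that one commit pins every library the build needs. The commit names are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad holds.

    Assumes a linear history ordered oldest to newest, that commits[0] is
    good, that commits[-1] is bad, and that every commit after a bad one is
    also bad. Each call to is_bad stands for "check out and build the full
    tree at this commit, then run the test".
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(100)]
    # Pretend the bug was introduced at c37.
    print(first_bad(history, lambda c: int(c[1:]) >= 37))
```

With 100 commits and the bug introduced at `c37`, this returns `c37` after at most seven probes, so the cost of bisecting is dominated by how easy it is to reconstruct a buildable tree at each probe.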

Nicolas




Geoffrey Mainland
On 06/10/2013 11:49 AM, Nicolas Trangez wrote:

> On Mon, 2013-06-10 at 11:45 +0100, Ian Lynagh wrote:
>>> Side note: the fingerprint script *didn't even work* for almost
>>> a year after it was introduced; see commit 73ce2e70.
>>
>> Which implies that wanting to go back in time is rare, so making it
>> easy should be given low weight when considering the options?
>
> If 'git bisect' worked (out of the box) on the GHC repo, going back
> in time would certainly be a more common operation.

I agree. Going back in time is really, really hard with fingerprints
because you have to get the fingerprint files somewhere, and they don't
always exist.

Also, it could be the case that people used the fingerprint files to
"bisect" but didn't notice they weren't quite right because the
fingerprints were "close enough." OK for bug-finding, terrible for
reproducible builds.
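One plausible way a tree ends up only "close enough": fingerprints exist for some ghc commits but not others, so the nearest earlier one gets used instead. The sketch below models that fallback. The `Fingerprint` shape (sub-repository path to commit hash), the commit names, and the function name are hypothetical and do not reproduce GHC's actual fingerprint file format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fingerprint: sub-repository path -> pinned commit hash.
Fingerprint = Dict[str, str]


def fingerprint_for(history: List[str],
                    snapshots: Dict[str, Fingerprint],
                    commit: str) -> Tuple[Optional[Fingerprint], bool]:
    """Pick the fingerprint used to rebuild the tree at a ghc commit.

    history lists ghc commits oldest to newest; snapshots holds the
    fingerprints that someone happened to record. Returns
    (fingerprint, exact), where exact is False if the nearest earlier
    snapshot had to stand in, or if none was found at all.
    """
    if commit in snapshots:
        return snapshots[commit], True
    # No snapshot for this commit: walk backwards to the closest recorded one.
    for earlier in reversed(history[:history.index(commit)]):
        if earlier in snapshots:
            return snapshots[earlier], False
    return None, False


if __name__ == "__main__":
    history = ["g1", "g2", "g3"]
    snapshots = {"g1": {"libraries/base": "b1", "testsuite": "t1"}}
    print(fingerprint_for(history, snapshots, "g3"))
```

The `exact` flag is the information that is easy to lose in practice. With submodules or subtrees the library state is recorded in the ghc commit itself (as a submodule commit pointer, or as the files themselves), so there is no separate lookup and no fallback.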

Many people on the list have been quite vocal about wanting to be able
to bisect. *I* have wanted to be able to bisect many, many times, but I
don't because it's such a pain.

I also want to be able to tell people how to build branches of ghc that
I am working on, e.g., the simd and th-new branches. That means having
to store a fingerprint file somewhere public and keep it in sync with my
tree. I would much rather just tell them to check out the foo branch of
ghc and be done with it.

Geoff


