how to checkout proper submodules

Kazu Yamamoto (山本和彦)
Hi,

Andreas and I found that the new IO manager is not working properly in
the current GHC head. I'm sure that it worked well at least on May 7.

We need to narrow the range of commits, so I did:

  % git checkout bb2795db36b36966697c228315ae20767c4a8753
  % git submodule update

But this does not check out the proper submodules. For instance,
libraries/base has newer commits. And of course, building fails.

Please tell us how to check out the proper submodules for a specific
GHC tree.

--Kazu


Johan Tibell-2
Unfortunately we don't use submodules for all repos, e.g. base. This makes
it very hard to accurately check out a previous state and bisect errors.


On Tue, Jun 4, 2013 at 6:07 PM, Kazu Yamamoto <kazu at iij.ad.jp> wrote:

> Hi,
>
> Andreas and I found that the new IO manager is not working properly in
> the current GHC head. I'm sure that it worked well at least on May 7.
>
> We need to narrow the range of commits, so I did:
>
>   % git checkout bb2795db36b36966697c228315ae20767c4a8753
>   % git submodule update
>
> But this does not check out the proper submodules. For instance,
> libraries/base has newer commits. And of course, building fails.
>
> Please tell us how to check out the proper submodules for a specific
> GHC tree.
>
> --Kazu
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>

Nicolas Frisby
Is the way forward then to manually bisect by timestamp? Perhaps there are
scripts "out there" to assist with such a task.

On Jun 4, 2013 8:47 PM, "Johan Tibell" <johan.tibell at gmail.com> wrote:

> Unfortunately we don't use submodules for all repos e.g. base. This makes
> it very hard to accurately check out a previous state and bisect errors
> unfortunately.

Mateusz Kowalczyk
In reply to this post by Johan Tibell-2

On 05/06/13 02:46, Johan Tibell wrote:

> Unfortunately we don't use submodules for all repos e.g. base. This
> makes it very hard to accurately check out a previous state and
> bisect errors unfortunately.

Is there a reason why some repos are proper git submodules and some
aren't? The benefits of having git repos as submodules are hopefully clear,
so I'm interested in why this isn't the case here.

--
Mateusz K.


Austin Seipp-4
In reply to this post by Kazu Yamamoto (山本和彦)
(Warning: incoming answer, followed by a rant.)

Base is not a submodule, meaning that there is essentially no way to
automatically check it back out to the "exact same state" it was in,
given some specified GHC commit - the commit IDs are not tracked.

At this point, you are basically on your own. You'll have to manually
checkout libraries/base to a specific commit that occurred 'around'
the same time as the GHC commit. In this case, that means looking
through whatever commits hit HEAD on May 7th:

$ cd libraries/base
$ git log --until="May 7th"

The resulting list will show you what happened up to May 7th. Take the
latest commit in that list, and check out base to that revision. Any
commits afterward happened on May 8th or later:

$ git checkout -b temporary-io-fix <sha1 of latest May 7th commit>

You're going to need to do this for every module that is not tracked
as a submodule. Most of the repositories are very low-activity. base &
testsuite are going to be the annoying ones.
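
As a rough sketch of that manual step (not something from the GHC
tree), the functions below rewind a list of free-standing repos to the
last commit made before a cutoff date. The function names, the
REWIND_REF variable and the example paths are made up for illustration;
the only git feature relied on is 'git rev-list --before', which
filters on committer date.

```shell
#!/bin/sh
# Sketch only: function names, REWIND_REF and the paths are hypothetical.

# Print the newest commit reachable from ref "$3" (default HEAD) in the
# repository at "$1" whose committer date is before "$2".
# Prints nothing if no such commit exists.
last_commit_before() {
    (cd "$1" && git rev-list -n 1 --before="$2" "${3:-HEAD}")
}

# Check out each listed repo at that commit (detached HEAD).
# Usage: rewind_repos <cutoff-date> <repo-dir>...
rewind_repos() {
    cutoff=$1
    shift
    for repo in "$@"; do
        sha=$(last_commit_before "$repo" "$cutoff" "${REWIND_REF:-HEAD}")
        if [ -n "$sha" ]; then
            (cd "$repo" && git checkout --quiet "$sha")
            echo "$repo -> $sha"
        else
            echo "$repo: no commit before $cutoff" >&2
        fi
    done
}

# Example (paths are placeholders for the repos that are not submodules):
#   rewind_repos "2013-05-08 00:00" libraries/base testsuite
```

Each repo ends up on a detached HEAD, so a later call with a newer
cutoff would only see the older history. Setting REWIND_REF to a branch
name (for instance origin/master, if that is what your clone tracks)
avoids that.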

You'll have to continue this 'manual bisection' by hand, with a very
hefty dose of frustrating trial-and-error, in my experience.

There is a second alternative. GHC has a script called
'fingerprint.py' (in utils/fingerprint/) which is somewhat designed to
work around this deficiency (very poorly). This script basically dumps
out a text file, containing a key/value pair mapping every repository
to its current HEAD commit. It can then take that text file and
automatically do 'git checkout' for you in every repo. The idea is you
can take fingerprints of the tree, save the results, and cleanly check
out to some state later.
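
To illustrate the idea only, here is a minimal shell sketch of
"record every repo's HEAD, restore it later". It is not fingerprint.py
and does not use its file format or options; the function names, the
"<sha> <path>" line format and the example paths are assumptions made
for this example.

```shell
#!/bin/sh
# Sketch of the snapshot/restore idea; NOT the real fingerprint.py format.

# Print one "<sha> <path>" line per repository given as an argument.
snapshot_heads() {
    for repo in "$@"; do
        printf '%s %s\n' "$(cd "$repo" && git rev-parse HEAD)" "$repo"
    done
}

# Read "<sha> <path>" lines on stdin and check each repository out at
# the recorded commit. Stops at the first repository that fails.
restore_heads() {
    while read -r sha repo; do
        (cd "$repo" && git checkout --quiet "$sha") || return 1
    done
}

# Example (paths are placeholders):
#   snapshot_heads . libraries/base testsuite > 2013-05-07.heads
#   restore_heads < 2013-05-07.heads
```

A snapshot like this only helps for states you recorded ahead of time,
which is the same limitation described above.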

The GHC build bots, driven by Ben L.'s "Buildbox" library, automatically
run the 'fingerprint.py' script during nightly builds, from what I
remember. It may be possible to just look in the ghc-builds archives,
and steal some fingerprints from the last month off one of the
buildbots. I don't know who maintains the individual bots; perhaps you
can ask the list. However, this will at best give you a 1-day level of
granularity, rather than commit-level granularity, which is still
rather unsatisfying.

------------- Answer over, rant begins. ---------------------

I think we had this discussion sometime recently, but can
someone *please* explain why we are in this situation of half
submodules, half random-floating-git-repository-checkouts? It's
terrible. I'm frankly surprised we've been doing it this long,
a year or more. It is literally the worst of submodules and
free-standing repositories put together, with none of the advantages
of either.

Free-standing repos are attractive because they are just there, and
you don't have to 'maintain' them (sort of). Submodules are attractive
because they identify the critical points at which your repositories
depend on each other. We have neither benefit right now, clearly.

In particular, this makes it impossible to use tools like 'git bisect',
which is *incredibly* useful for exactly these cases. Hell, you can
even make 'git bisect' work almost 100% automatically with a tiny bit
of shell scripting.

http://mainisusuallyafunction.blogspot.com/2012/09/tracking-down-unused-variables-with.html

You could just instead have a script that built the compiler, and ran
the built compiler on your testcase, after every bisection. Wouldn't
it be *great* to have something like that Just Work? A tool like this
could potentially boil down Kazu's bug almost automatically for
example, with little-to-no frustrating intervention.
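
A minimal sketch of what such a driver could look like, assuming every
library were a submodule so that one GHC commit pins the whole tree.
The function name, the file name bisect-step.sh and the build/test
commands are placeholders. The exit codes follow the convention
documented for 'git bisect run': 0 means good, 125 means "cannot test
this commit, skip it", and any other code from 1 to 127 means bad.

```shell
#!/bin/sh
# Hypothetical driver for `git bisect run`; the commands passed in are
# placeholders for whatever builds the compiler and runs the testcase.

# Usage: bisect_step <build-command> <test-command>
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    # A commit that does not build tells us nothing about the bug: skip it.
    sh -c "$build_cmd" || return 125
    # Testcase passes: this commit is good.
    sh -c "$test_cmd" && return 0
    # Testcase fails: this commit is bad.
    return 1
}

# Example session (sha and commands are placeholders), with the function
# above saved in a file called bisect-step.sh:
#   git bisect start
#   git bisect bad                      # current HEAD shows the bug
#   git bisect good <known-good-sha>
#   git bisect run sh -c '. ./bisect-step.sh
#       bisect_step "git submodule update --init && make" "./run-testcase.sh"'
#   git bisect reset
```

Returning 125 for build failures keeps unrelated breakage in the
history from being blamed for the bug.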

And even now, looking at the listing of repositories in libraries/
that are not submodules, I really see no reason why more - or even
all - of them cannot be submodules. Is it a workflow issue of
some sort? That's what I'm thinking at this point, but I also don't
think it could be any worse than it is now.

Realistically, very few libraries GHC needs for bootstrapping seem to
change that much. unix, integer-simple, haskeline and filepath for
example change *extremely* infrequently, but all are free-standing.
Why? In the event they were submodules, would anything actually be
lost?

The maintainer - that is, not GHC HQ - would still 'own' the official
repository. They can make changes to it. But if there is a necessity
to pull that in for GHC (feature request, bug fix, random thing) it
can be done by updating the submodule pointer to the new commit. But
this must happen explicitly by a GHC committer. In the event they
update the submodule pointer, they should also obviously make sure the
build still works.
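
For concreteness, updating a submodule pointer amounts to something
like the sketch below: check the submodule out at the wanted commit,
then stage and commit the new pointer in the superproject. The function
name and the example path are illustrative, and the commit message is
just one possible wording.

```shell
#!/bin/sh
# Sketch: move a submodule to a given commit and record the new pointer.
# Run from the top of the superproject (the GHC tree in this thread).

# Usage: bump_submodule <submodule-path> <commit>
bump_submodule() {
    path=$1
    commit=$2
    # Fetch upstream history and check out the requested commit.
    (cd "$path" && git fetch --quiet && git checkout --quiet "$commit") || return 1
    # Stage the changed pointer and commit it in the superproject.
    git add "$path" &&
    git commit --quiet -m "Update $path submodule to $commit"
}

# Example (path and commit are placeholders):
#   bump_submodule libraries/Cabal <sha1 of the upstream commit>
#   ...then build and validate before pushing.
```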

That means we have to update the submodule pointers ourselves if
things change. That sucks I guess, but really, aside from base and
testsuite, the two most frequently changing repositories, is that
*actually* going to cost us a lot of work?

And even if it does cost us work, I'll speak for myself: I will gladly
pay for that work and do it all myself if it means I can actually
bisect and actually roll back my tree to some point to fix things -
without needing to prepare for it months in advance using hacks like
creating thousands of fingerprints by running fingerprint.py every day
when people make commits (no, I haven't done this, but it could be
done, and I really don't want to do it).

Long-term reproducible builds are, IMO, a must for any project.
*Especially* a project of our size. *Especially* a compiler of all
things. But as it stands, when you build GHC, you can probably
reproduce *today's* results and *today's* bugs. Last month's results?
Last year's? Finding the difference between months ago and today?
Good luck - you will need it.


--
Regards,
Austin - PGP: 4096R/0x91384671


Johan Tibell-2
On Tue, Jun 4, 2013 at 7:05 PM, Austin Seipp <aseipp at pobox.com> wrote:

> I know we had this discussion sometime recently I think, but can
> someone *please* explain why we are in this situation of half
> submodules, half random-floating-git-repository-checkouts? It's
> terrible. I'm frankly surprised we've even been doing it this long,
> over a year or more? It is literally the worst of submodules, and
> free-standing-repositories put together, with none of the advantages
> of either.
>

This is my understanding of what happened: we started out with only plain
repos. This avoids some of the pitfalls of submodules and we believed it
was the least disruptive workflow (when switching from darcs) for the core
contributors. Eventually we needed GHC to track upstream releases of
libraries (e.g. Cabal) instead of just tracking HEAD, which it did before.
To achieve that, we switched the libraries that GHC just tracks (e.g.
Cabal) to submodules. The libraries maintained by GHC HQ (e.g. base) were
still kept as plain repos to avoid disrupting anyone's workflow.

The latest git release has improved submodule support somewhat, so if we now
think the benefits of submodules outweigh the costs we can discuss whether we
want to change the policy. I don't want to make that decision for other GHC
developers who spend much more time on GHC than I do (e.g. SPJ). Their
productivity is more important than any inconvenience the lack of
consistent use of submodules might cause me.

Austin Seipp-4
I absolutely agree here, FWIW. We should only do this if there is a
clear consensus on doing so and everyone doing active development is
comfortable with it. And it's entirely possible submodules are
inadequate for some reason that I'm not aware of which is a
show-stopper.

However, the notion of impact-on-contributors cuts both ways. GHC has
an extremely small team of hackers as it stands, and we are lucky to
have *amazing* contributors like Kazu, Andreas, yourself, Simon &
Simon, and numerous others help make GHC what it is. Much of this is
volunteer work. But as the Haskell community grows, and as we lose
full-time contributors like Simon Marlow, I think we are
beginning to see the strain on GHC and its current contributors. So,
it's important to evaluate what we're doing right and wrong. This
feedback loop is always present even if seasoned contributors can live
with it - but new contributors will definitely be impacted.

In this instance, I honestly find it disheartening that the answer to
things like "getting older revisions of the source code in HEAD," or
techniques like bisection is basically "that doesn't work." The former
is unfortunate, but the latter is pretty legitimately worrying. It
would be one thing if this was a one-off occurrence of some odd
developer-workflow. But I have answered the fundamental question here
(submodules vs free-floating clones) a handful of times myself at
least, experienced the pain of the decision myself when doing
rollbacks, and I'm sure other contributors can say the same.

GHC is already a large, industry-strength software project with years
of work put behind it. The barrier to entry and contribution is not
exactly small, but I think we've all done a good job. I'd love to see
more people contributing. But I cannot help but find these discussions
a bit sad, where contributors are impaired because regular, traditional
development workflows like rollbacks are rendered useless - due to
some odd source control discrepancy that nobody else on the planet
seems to suffer from.

I guess the short version is basically that you're absolutely
right: the time of Simon, Ian, and other high-profile contributors is
*extremely* important. But I'd also rather not have people like Kazu
potentially spend hours or even days doing what simple automation can
achieve in what is literally a few keystrokes, and which, on top of that,
is par for the course for other projects. This ultimately impacts the
development cycles of *everybody*. And even if Kazu deals with it -
what about the next person?

On Wed, Jun 5, 2013 at 12:12 AM, Johan Tibell <johan.tibell at gmail.com> wrote:
> The latest git release has improved submodules support some so if we now
> think the benefits of submodules outweigh the costs we can discuss if we
> want to change the policy. I don't want to make that decision for other GHC
> developers that spend much more time on GHC than I (e.g. SPJ). Their
> productivity is more important than any inconveniences the lack of
> consistent use of submodules might cause me.


--
Regards,
Austin - PGP: 4096R/0x91384671


Simon Peyton Jones
For the avoidance of doubt, I totally support what Austin and Johan are saying:

I find the current setup confusing too.

I'm totally persuaded of the merits of git bisect etc.

I am the opposite of a git power-user (a git weedy-user?). I will be content to do whatever I'm told workflow-wise, provided I am told clearly in words of one syllable.

I *very strongly* want to reduce barriers to entry for would-be contributors, and this is clearly a barrier we could lower. Making Kazu, Austin, Johan, etc. more productive is massively valuable.

There may be some history to how we arrived at this point, but that should not constrain us in the future. We can change our workflow. I would want Ian and Simon to be thoroughly on board, but I regard the current setup as totally open to improvement. Please!

BTW, Ian has written it up quite carefully here: http://hackage.haskell.org/trac/ghc/wiki/Repositories, and the linked page http://hackage.haskell.org/trac/ghc/wiki/Repositories/Upstream.

Simon





Manuel M T Chakravarty
I agree with Austin and Johan. It's a bizarre setup. Submodules have their pain points (which we already have to deal with), but the ability to properly snapshot and branch the whole tree would be a serious benefit IMO.

Manuel

PS: While we are at it, why don't we just have the main repos on GitHub and use forks and pull requests like the rest of the world? (Using Git, but not GitHub's superb infrastructure, seems like a terrible waste to me.)



David Terei
On 5 June 2013 01:43, Manuel M T Chakravarty <chak at cse.unsw.edu.au> wrote:

> I agree with Austin and Johan. It's a bizarre setup. Submodules have their
> pain points (which we already have to deal with), but the ability to
> properly snapshot and branch the whole tree would be a serious benefit IMO.
>
> Manuel
>
> PS: While we are at it, why don't we just have the main repos on GitHub
> and use forks and pull requests like the rest of the world? (Using Git, but
> not GitHub's superb infrastructure, seems like a terrible waste to me.)
>

I'd be all for this. We partially use the GitHub infrastructure since trac
broke and I changed the emails to point to GitHub instead. I also often do
code reviews with other devs on a personal GHC fork on GitHub before
merging in.

I believe it would also help encourage more contributors (especially for
libraries), but others have expressed disagreement with this point of view
in the past and I don't have data to back it up.

Either way, I'm glad git bisect may soon work. We'll finally be able to use
the whole feature set of a version control tool :)  (The other piece was the
move from darcs -> git, which gave us a working annotate.)


> Simon Peyton-Jones <simonpj at microsoft.com>:
> > For the avoidance of doubt, I totally support what Austin and Johan are
> saying:
> >
> > I find the current setup confusing too.
> >
> > I'm totally persuaded of the merits of git bisect etc.
> >
> > I am the opposite of a git power-user (a git weedy-user?).  I will be
> content to do whatever I'm told workflow-wise, provided I am told clearly
> in words of one syllable.
> >
> > I *very strongly* want to reduce barriers to entry for would-be
> contributors, and this is clearly a barrier we could lower.  Making Kazu,
> Austin, Johan, etc more productive is massively valuable.
> >
> > There may be some history to how we arrived at this point, but that
> should not constrain for the future.  We can change our workflow.   I would
> want Ian and Simon to be thoroughly on board, but I regard the current
> setup as totally open to improvement.  Please!
> >
> > BTW, Ian has written it up quite carefully here:
> http://hackage.haskell.org/trac/ghc/wiki/Repositories, and the linked
> page http://hackage.haskell.org/trac/ghc/wiki/Repositories/Upstream.
> >
> > Simon
> >
> >
> >
> > | -----Original Message-----
> > | From: ghc-devs-bounces at haskell.org [mailto:
> ghc-devs-bounces at haskell.org]
> > | On Behalf Of Austin Seipp
> > | Sent: 05 June 2013 07:35
> > | To: Johan Tibell
> > | Cc: ghc-devs at haskell.org
> > | Subject: Re: how to checkout proper submodules
> > |
> > | I absolutely agree here, FWIW. We should only do this if there is a
> > | clear consensus on doing so and everyone doing active development is
> > | comfortable with it. And it's entirely possible submodules are
> > | inadequate for some reason that I'm not aware of which is a
> > | show-stopper.
> > |
> > | However, the notion of impact-on-contributors cuts both ways. GHC has
> > | an extremely small team of hackers as it stands, and we are lucky to
> > | have *amazing* contributors like Kazu, Andreas, yourself, Simon &
> > | Simon, and numerous others help make GHC what it is. Much of this is
> > | volunteer work. But as the Haskell community grows, and we are at a
> > | loss of other full-time contributors like Simon Marlow, I think we are
> > | beginning to see the strain on GHC and its current contributors. So,
> > | it's important to evaluate what we're doing right and wrong. This
> > | feedback loop is always present even if seasoned contributors can live
> > | with it - but new contributors will definitely be impacted.
> > |
> > | In this instance, I honestly find it disheartening that the answer to
> > | things like "getting older revisions of the source code in HEAD," or
> > | techniques like bisection is basically "that doesn't work." The second
> > | is unfortunate, but the latter is pretty legitimately worrying. It
> > | would be one thing if this was a one-off occurrence of some odd
> > | developer-workflow. But I have answered the fundamental question here
> > | (submodules vs free-floating clones) a handful of times myself at
> > | least, experienced the pain of the decision myself when doing
> > | rollbacks, and I'm sure other contributors can say the same.
> > |
> > | GHC is already a large, industry-strength software project with years
> > | of work put behind it. The barrier to entry and contribution is not
> > | exactly small, but I think we've all done a good job. I'd love to see
> > | more people contributing. But I cannot help but find these discussions
> > | a bit sad, where contributors are impaired because regular/traditional
> > | development workflows like rollbacks are rendered useless - due to
> > | some odd source control discrepancy that nobody else on the planet
> > | seems to suffer from.
> > |
> > | I guess the short version is basically that you're absolutely
> > | right: the time of Simon, Ian, and other high-profile contributors is
> > | *extremely* important. But I'd also rather not have people like Kazu
> > | potentially spend hours or even days doing what simple automation can
> > | achieve in what is literally a few keystrokes, and what is moreover
> > | par for the course in other projects. This ultimately impacts the
> > | development cycles of *everybody*. And even if Kazu deals with it -
> > | what about the next person?
> > |
> > | On Wed, Jun 5, 2013 at 12:12 AM, Johan Tibell <johan.tibell at gmail.com>
> > | wrote:
> > | > The latest git release has improved submodule support somewhat, so if we
> > | > now think the benefits of submodules outweigh the costs we can discuss
> > | > if we want to change the policy. I don't want to make that decision for
> > | > other GHC developers that spend much more time on GHC than I (e.g. SPJ).
> > | > Their productivity is more important than any inconveniences the lack of
> > | > consistent use of submodules might cause me.
> > |
> > |
> > | --
> > | Regards,
> > | Austin - PGP: 4096R/0x91384671
> > |


how to checkout proper submodules

Erik de Castro Lopo-34
David Terei wrote:

> Either way, I'm glad git bisect may soon work.

Having git bisect work on the GHC tree would be a plus!

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/



how to checkout proper submodules

John Lato-2
In reply to this post by David Terei
On Wed, Jun 5, 2013 at 5:10 PM, David Terei <davidterei at gmail.com> wrote:

> On 5 June 2013 01:43, Manuel M T Chakravarty <chak at cse.unsw.edu.au> wrote:
>
>> I agree with Austin and Johan. It's a bizarre setup. Submodules have
>> their pain points (which we already have to deal with), but the ability to
>> properly snapshot and branch the whole tree would be a serious benefit IMO.
>>
>> Manuel
>>
>> PS: While we are at it, why don't we just have the main repos on GitHub
>> and use forks and pull requests like the rest of the world? (Using Git, but
>> not GitHub's superb infrastructure, seems like a terrible waste to me.)
>>
>
> I'd be all for this. We partially use the GitHub infrastructure since trac
> broke and I changed the emails to point to GitHub instead. I also often do
> code reviews with other devs on a personal GHC fork on github before
> merging in.
>
> I believe it would also help encourage more contributors (especially for
> libraries) but others have expressed disagreement with this point of view
> in the past and I don't have data.
>

I strongly suspect that fixing the original issue from this thread would do
much more to encourage contributions.  It certainly doesn't matter to me if
ghc is on github or not, but I (as an extremely meager GHC hacker) find it
near-impossible to maintain a usable repo if I want to do any sort of
branching or checkouts.  And while I hate git submodules with a passion, I
agree with everyone who thus far has said that the current practice is even
less usable (all the drawbacks and none of the benefits).

how to checkout proper submodules

Daniel Vainsencher
In reply to this post by Manuel M T Chakravarty

how to checkout proper submodules

Jan Stolarek
In reply to this post by David Terei
For me the biggest plus of switching to submodules would be keeping GHC and testsuite in sync. If
there are any reasons not to change in-tree library repos to submodules, then I would at least
want testsuite to be changed to a submodule.

I also use github for my daily work on GHC and being able to send patches via Pull Requests would
make things easier. On the other hand it might be more difficult to attach files to a ticket (no
such feature on Github AFAIK).

Speaking of Github, perhaps we should put more stress on github folks to fix this:
https://github.com/github/markup/issues/196 ?

Jan



how to checkout proper submodules

Vincent Hanquez
In reply to this post by David Terei
On 06/05/2013 10:10 AM, David Terei wrote:

> On 5 June 2013 01:43, Manuel M T Chakravarty <chak at cse.unsw.edu.au> wrote:
>
>> I agree with Austin and Johan. It's a bizarre setup. Submodules have their
>> pain points (which we already have to deal with), but the ability to
>> properly snapshot and branch the whole tree would be a serious benefit IMO.
>>
>> Manuel
>>
>> PS: While we are at it, why don't we just have the main repos on GitHub
>> and use forks and pull requests like the rest of the world? (Using Git, but
>> not GitHub's superb infrastructure, seems like a terrible waste to me.)
>>
> I'd be all for this. We partially use the GitHub infrastructure since trac
> broke and I changed the emails to point to GitHub instead. I also often do
> code reviews with other devs on a personal GHC fork on github before
> merging in.
>
> I believe it would also help encourage more contributors (especially for
> libraries) but others have expressed disagreement with this point of view
> in the past and I don't have data.
As a very recent new (try-to-be-)contributor, I'd like to weigh in
in favor of this.

IMHO, having to create a trac account, and submit patches by attachment
(with the confusing trac UI) instead of just pushing to some repositories
and issuing pull requests is quite suboptimal.

I don't think it would scare anyone enough that they wouldn't contribute,
but lowering the "entry cost" is always useful.

--
Vincent



how to checkout proper submodules

Niklas Larsson
In reply to this post by Austin Seipp-4
When I was fiddling with having to roll back everything to a known good
state, I patched sync-all to check out all the repos to the state they were
in on a certain date. It's pretty naive, but it should be usable for doing
manual bisecting at least. I can't find the old mailing list archives, so I
attach the patch here.
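
The by-date rollback described above can be approximated with stock git
commands. What follows is only a minimal sketch, not the attached sync-all
patch: the function names commit_before and checkout_all_before are made up
for illustration, and it assumes every listed directory is a git checkout
whose current branch contains the history of interest.

```shell
# Print the newest commit reachable from HEAD that was committed at or
# before the given date.
# $1 = repository directory, $2 = a date git understands ("2013-05-07 23:59")
commit_before() {
    (cd "$1" && git rev-list -n 1 --before="$2" HEAD)
}

# Check out every listed repository at its last commit before the date.
# $1 = date, remaining arguments = repository directories
checkout_all_before() {
    date=$1
    shift
    for repo in "$@"; do
        sha=$(commit_before "$repo" "$date")
        if [ -n "$sha" ]; then
            echo "$repo -> $sha"
            (cd "$repo" && git checkout --quiet "$sha")
        else
            echo "$repo: no commit before $date" >&2
        fi
    done
}
```

For example, checkout_all_before "2013-05-07 23:59" libraries/base testsuite
would leave both repositories on a detached HEAD. The --before option filters
on committer dates, so the result is only as trustworthy as those timestamps.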

Niklas


2013/6/5 Austin Seipp <aseipp at pobox.com>

> (Warning: incoming answer, followed by a rant.)
>
> Base is not a submodule, meaning that there is essentially no way to
> automatically check it back out to the "exact same state" it was in,
> given some specified GHC commit - the commit IDs are not tracked.
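
One way to see which paths a given GHC commit actually pins is to list the
gitlink entries in its tree. This is a minimal sketch: pinned_submodules is a
hypothetical helper name, and paths are assumed to contain no whitespace.

```shell
# Print "<path> <commit>" for every submodule (gitlink) recorded in a commit.
# $1 = repository directory, $2 = commit to inspect (defaults to HEAD)
pinned_submodules() {
    (cd "$1" && git ls-tree -r "${2:-HEAD}") |
        awk '$2 == "commit" { print $4, $3 }'
}
```

Any library directory that does not appear in this list is a free-floating
clone, and 'git submodule update' will not move it.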
>
> At this point, you are basically on your own. You'll have to manually
> checkout libraries/base to a specific commit that occurred 'around'
> the same time as the GHC commit. In this case, that means looking
> through whatever commits hit HEAD on May 7th:
>
> $ cd libraries/base
> $ git log --until="May 7th"
>
> The resulting list will show you what happened up to May 7th. Take the
> latest commit in that list, and check out base to that revision. Any
> commits afterward happened on May 8th or later:
>
> $ git checkout -b temporary-io-fix <sha1 of latest May 7th commit>
>
> You're going to need to do this for every module that is not tracked
> as a submodule. Most of the repositories are very low-activity. base &
> testsuite are going to be the annoying ones.
>
> You'll have to continue this 'manual bisection' by hand, with a very
> hefty dose of frustrating trial-and-error, in my experience.
>
> There is a secondary alternative. GHC has a script called
> 'fingerprint.py' (in utils/fingerprint/) which is somewhat designed to
> work around this deficiency (very poorly.) This script basically dumps
> out a text file, containing a key/value pair mapping every repository
> to its current HEAD commit. It can then take that text file and
> automatically do 'git checkout' for you in every repo. The idea is you
> can take fingerprints of the tree, save the results, and cleanly check
> out to some state later.
>
> The GHC build bots run by Ben L.'s "Buildbox" library automatically
> run the 'fingerprint.py' script during nightly builds, from what I
> remember. It may be possible to just look in the ghc-builds archives,
> and steal some fingerprints from the last month off one of the
> buildbots. I don't know who maintains the individual bots; perhaps you
> can ask the list. However, this will at best give you a 1-day level of
> granularity, rather than commit level granularity, which is still
> rather unsatisfying.
>
> ------------- Answer over, rant begins. ---------------------
>
> I know we had this discussion sometime recently I think, but can
> someone *please* explain why we are in this situation of half
> submodules, half random-floating-git-repository-checkouts? It's
> terrible. I'm frankly surprised we've even been doing it this long,
> over a year or more? It is literally the worst of submodules, and
> free-standing-repositories put together, with none of the advantages
> of either.
>
> Free-standing repos are attractive because they are just there, and
> you don't have to 'maintain' them (sort of.) Submodules are attractive
> because they identify the critical points in which your repositories
> depend on each other. We have neither benefit right now, clearly.
>
> In particular, this makes it impossible to use tools like 'git bisect'
> which is *incredibly* useful for just these exact cases. Hell, you can
> even make 'git bisect' work almost 100% automatically with a tiny bit
> of shell scripting.
>
>
> http://mainisusuallyafunction.blogspot.com/2012/09/tracking-down-unused-variables-with.html
>
> You could just instead have a script that built the compiler, and ran
> the built compiler on your testcase, after every bisection. Wouldn't
> it be *great* to have something like that Just Work? A tool like this
> could potentially boil down Kazu's bug almost automatically for
> example, with little-to-no frustrating intervention.
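
The kind of script described here can be quite small. The sketch below is an
illustration under stated assumptions rather than anything from the GHC tree:
bisect_step is a hypothetical name, and the build and test commands are
supplied by the caller. It relies only on the documented 'git bisect run'
exit-code convention (0 means good, 125 means skip this commit, any other
code from 1 to 127 means bad).

```shell
# One bisection step: build, then run the test case.
# $1 = build command, $2 = test command (both are passed to sh -c)
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    # A commit that does not even build cannot be judged, so ask git to skip it.
    sh -c "$build_cmd" || return 125
    # The build worked; a failing test case marks the commit as bad.
    sh -c "$test_cmd" || return 1
    return 0
}
```

Wrapped in an executable script, this would be driven by 'git bisect start
<bad> <good>' followed by 'git bisect run' with the script's path. With
free-floating library repositories the build step would also have to put
every library into a matching state first, which is the missing piece
discussed in this thread.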
>
> And even now, looking at the repository listing of what is in
> libraries/, that are not submodules, I really see no reason why more -
> or even all - of them cannot be submodules. Is it a workflow issue of
> some sort? That's what I'm thinking at this point, but I also don't
> think it could be any worse than it is now.
>
> Realistically, very few libraries GHC needs for bootstrapping seem to
> change that much. unix, integer-simple, haskeline and filepath for
> example change *extremely* infrequently, but all are free-standing.
> Why? In the event they were submodules, would anything actually be
> lost?
>
> The maintainer - that is, not GHC HQ - would still 'own' the official
> repository. They can make changes to it. But if there is a necessity
> to pull that in for GHC (feature request, bug fix, random thing) it
> can be done by updating the submodule pointer to the new commit. But
> this must happen explicitly by a GHC committer. In the event they
> update the submodule pointer, they should also obviously make sure the
> build still works.
>
> That means we have to update the submodule pointers ourselves if
> things change. That sucks I guess, but really, aside from base and
> testsuite, the two most frequently changing repositories, is that
> *actually* going to cost us a lot of work?
>
> And even if it does cost us work, I'll speak for myself: I will gladly
> pay for that work and do it all myself if it means I can actually
> bisect and actually roll back my tree to some point to fix things -
> without needing to prepare for it months in advance using hacks. Like
> creating thousands of fingerprints, using fingerprint.py every day
> when people make commits (no, I haven't done this, but it could be
> done, and I really don't want to do it.)
>
> Long-term reproducible builds are, IMO, a must for any project.
> *Especially* a project of our size. *Especially* a compiler of all
> things. But as it stands, when you build GHC, you can probably
> reproduce *today's* results and *today's* bugs. Last month's results?
> Last year's? Finding the difference between those months ago and today?
> Good luck - you will need it.
>
> On Tue, Jun 4, 2013 at 8:07 PM, Kazu Yamamoto <kazu at iij.ad.jp> wrote:
> > Hi,
> >
> > Andreas and I found that the new IO manager is not working properly in
> > the current GHC head. I'm sure that it worked well at least on May 7.
> >
> > We need to narrow the range of commits, so I did:
> >
> >   % git checkout bb2795db36b36966697c228315ae20767c4a8753
> >   % git submodule update
> >
> > But this does not checkout proper submodules. For instance,
> > libraries/base has newer commits. And of course, building fails.
> >
> > Please tell us how to checkout proper submodules against a specific
> > GHC tree.
> >
> > --Kazu
> >
>
>
>
> --
> Regards,
> Austin - PGP: 4096R/0x91384671
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-date-command-to-sync-all.patch
Type: application/octet-stream
Size: 1465 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20130605/3ebbcaec/attachment.obj>


how to checkout proper submodules

Manuel M T Chakravarty
In reply to this post by David Terei
David Terei <davidterei at gmail.com>:

> On 5 June 2013 01:43, Manuel M T Chakravarty <chak at cse.unsw.edu.au> wrote:
>> I agree with Austin and Johan. It's a bizarre setup. Submodules have their pain points (which we already have to deal with), but the ability to properly snapshot and branch the whole tree would be a serious benefit IMO.
>>
>> Manuel
>>
>> PS: While we are at it, why don't we just have the main repos on GitHub and use forks and pull requests like the rest of the world? (Using Git, but not GitHub's superb infrastructure, seems like a terrible waste to me.)
>
> I'd be all for this. We partially use the GitHub infrastructure since trac broke and I changed the emails to point to GitHub instead. I also often do code reviews with other devs on a personal GHC fork on github before merging in.
>
> I believe it would also help encourage more contributors (especially for libraries) but others have expressed disagreement with this point of view in the past and I don't have data.

For the compiler, the barriers to contribution are probably elsewhere, but for the libraries, I'm sure, it would lower the barrier to entry. For example, to fix some documentation, I personally would never bother to create a patch file and attach it to some Trac ticket (where I first have to create an account). In contrast, a pull request on GitHub is a matter of a few clicks.

Manuel

PS: Anybody who doubts this needs to post their GitHub account name, so we can check that they actually ever used GitHub properly ;)

how to checkout proper submodules

Geoffrey Mainland
In reply to this post by Niklas Larsson
I very much support moving to all-submodules. In fact, I argued for
all-submodules when we made the half-submodules transition last
year. Being able to easily check out a consistent and complete source
code tree in a repeatable way is extremely important.

Checking out by date "works" if you have dated history in your git
reflog. For example, see:

http://stackoverflow.com/questions/6990484/git-checkout-by-date

In general, git commits are *not* time ordered, so asking for the
version at a particular time is not well-defined across different
working repositories.
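
The ordering problem is easy to check for in any one repository. Here is a
minimal sketch (the helper name out_of_order_commits is invented for this
example) that prints every commit whose committer timestamp is older than
that of one of its parents:

```shell
# List commits that claim to be older than one of their own parents.
# $1 = repository directory
out_of_order_commits() {
    (
        cd "$1" || exit 1
        # %H = commit hash, %ct = committer time (Unix seconds), %P = parents
        git log --format='%H %ct %P' |
        while read -r sha ctime parents; do
            for parent in $parents; do
                ptime=$(git log -1 --format=%ct "$parent")
                if [ "$ptime" -gt "$ctime" ]; then
                    echo "$sha"
                    break
                fi
            done
        done
    )
}
```

If this prints anything, "the state of the repository on May 7th" depends on
how the history is walked rather than on the dates alone.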

The GHC HQ buildbots dump fingerprints in a form that is usable directly
with fingerprint.py. You can get these fingerprints from the ghc-builds@
archive. Unfortunately there was a large gap after MSR moved buildings
where our builds did not run, but things are more or less working now. I
believe Ben's buildbot package dumps fingerprints in a form that needs
to be massaged before fingerprint.py can deal with it.
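
To make the fingerprint idea concrete, here is a rough sketch of its two
halves in shell. It is not fingerprint.py and does not claim to match its
file format: the one-record-per-line "directory|commit" layout and the names
take_fingerprint and restore_commands are assumptions made for this example.

```shell
# Print one "directory|commit" record per repository given as an argument.
take_fingerprint() {
    for repo in "$@"; do
        printf '%s|%s\n' "$repo" "$(cd "$repo" && git rev-parse HEAD)"
    done
}

# Read such records on stdin and print the checkout commands that would
# restore them (a dry run; nothing is checked out by this function).
restore_commands() {
    while IFS='|' read -r repo sha; do
        # Skip blank or incomplete records.
        [ -n "$repo" ] && [ -n "$sha" ] || continue
        printf "(cd '%s' && git checkout %s)\n" "$repo" "$sha"
    done
}
```

Running take_fingerprint . libraries/base testsuite > tree.fp would record a
state, and restore_commands < tree.fp | sh would return to it later. The
single-quoting is naive and would break on directory names containing quotes.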

Geoff






how to checkout proper submodules

Geoffrey Mainland
In reply to this post by Manuel M T Chakravarty
I don't know much about subtrees, but that might be another possibility?

There are a lot of things to recommend moving to github. I do hate
(non-empty) merge commits, though, so I'm not a fan of github's pull
request mechanism.

Geoff

On 06/05/2013 09:43 AM, Manuel M T Chakravarty wrote:

> I agree with Austin and Johan. It's a bizarre setup. Submodules have
> their pain points (which we already have to deal with), but the
> ability to properly snapshot and branch the whole tree would be a
> serious benefit IMO.
>
> Manuel
>
> PS: While we are at it, why don't we just have the main repos on
> GitHub and use forks and pull requests like the rest of the world?
> (Using Git, but not GitHub's superb infrastructure, seems like a
> terrible waste to me.)
>
> Simon Peyton-Jones <simonpj at microsoft.com>:
>> For the avoidance of doubt, I totally support what Austin and Johan
>> are saying:
>>
>> I find the current setup confusing too.
>>
>> I'm totally persuaded of the merits of git bisect etc.
>>
>> I am the opposite of a git power-user (a git weedy-user?).  I will be
>> content to do whatever I'm told workflow-wise, provided I am told
>> clearly in words of one syllable.
>>
>> I *very strongly* want to reduce barriers to entry for would-be
>> contributors, and this is clearly a barrier we could lower.  Making
>> Kazu, Austin, Johan, etc more productive is massively valuable.
>>
>> There may be some history to how we arrived at this point, but that
>> should not constrain us in the future.  We can change our workflow.  I
>> would want Ian and Simon to be thoroughly on board, but I regard the
>> current setup as totally open to improvement.  Please!
>>
>> BTW, Ian has written it up quite carefully here:
>> http://hackage.haskell.org/trac/ghc/wiki/Repositories, and the linked
>> page http://hackage.haskell.org/trac/ghc/wiki/Repositories/Upstream.
>>
>> Simon
>>
>>
>>




how to checkout proper submodules

Daniel Trstenjak-2

Hi Geoffrey,

> I don't know much about subtrees, but that might be another possibility?

The main point about subtrees is that you have just one repository, and
you merge a directory of this repository with some other git repository
using 'git subtree'.

Subtrees and submodules both try to handle the use case where you want to
incorporate a third-party repository into your own repository and would
like to merge changes in both directions.

I think that subtrees are easier for the developer working on the
repository, because there's only one repository, but merging the
third-party repository is a bit more hassle.

Submodules are harder for the developer, because there are multiple
repositories, but merging the third-party repository might be a bit
easier.

GHC devs might have other reasons for using submodules, because they want
to separate things or they're afraid that the resulting single repository
might get too big, but I think that there should be good reasons for
using submodules, because a lot of workflows (like branching) are such
a hassle with submodules.


Greetings,
Daniel

