[dev] Split Git

Sat Nov 15 00:57:09 UTC 2014

Quoting Michael J Rubinsky <mrubinsk at horde.org>:

> Quoting Jan Schneider <jan at horde.org>:
>
>> Quoting Michael J Rubinsky <mrubinsk at horde.org>:
>>
>>> I've been slowly looking into options for development workflows  
>>> after splitting our repo. I'm convinced we should use git-subtree,  
>>> but not in the way I think others have suggested.
>>>
>>> Gunnar has written an article detailing the use of git-subtree as  
>>> a way to keep BOTH the monolithic and split repos. There was a  
>>> short mailing list discussion to go along with this. I don't  
>>> think that this method will work. First of all, this will lead to  
>>> two different canonical Horde repositories for the same code.  
>>> That's confusing. Secondly, it is resource intensive. Gunnar's  
>>> approach utilizes an interim repository that will do the actual  
>>> split filtering and pushing, but it is s l o w. Third, and perhaps  
>>> most important, is that published topic branches won't be portable  
>>> between the two.
>>>
>>> Assume the monolithic repo is being used for development - it is  
>>> impossible to check out a topic branch of one of the subtrees  
>>> without git rm'ing the folder, then re-adding the subtree's topic  
>>> branch via git-subtree add. Even with squashing all the commits  
>>> along the way, this is *messy*. The monolithic's history will be  
>>> polluted with at least 2 merge commits every time a branch is  
>>> changed. Not to mention that the state of the upstream monolithic  
>>> repository will be inconsistent. There is no telling what branch  
>>> any of the subtrees are currently on. E.g., if I replace  
>>> framework/component with a topic branch and then push the  
>>> monolithic repository without replacing it with master again, the  
>>> next person who pulls will get a monolithic repository with  
>>> framework/component's code from the topic branch.
>>>
>>> I propose that we only provide our individual repositories  
>>> publicly. Locally, however, we can utilize git-subtree to build a  
>>> monolithic repository we can develop against. This repository  
>>> remains local and is not expected to match the state of any other  
>>> local repository. This allows us to continue developing with  
>>> more-or-less the same workflow we use now, and to keep using  
>>> things like our framework_install script (mostly) unchanged. The  
>>> components script will probably need to be tweaked, since we will  
>>> obviously be releasing from the discrete repositories.  Dealing  
>>> with branch changes will still be messy, but the mess will be  
>>> confined to the local repository and not pushed up to any public  
>>> monolithic repository.
>>>
>>> Of course, helper scripts can be used to lessen the burden of  
>>> things like changing branches, and split/pushing back to the  
>>> upstream repositories. I've been cobbling together some ideas in a  
>>> utility script locally that, among other things, can be used to  
>>> set up the initial local repository.
>>>
>>> Thoughts? I'm sure I'm not the only one who wants to get this  
>>> moving. Among other reasons, I don't want to work on any BC  
>>> breaking code until we have this sorted and working.
>>
>> What's your reasoning to still have a monolith repo with sub-trees,  
>> even if only privately, locally? I understand that you want to keep the  
>> workflow as close to the current workflow as possible. Though as  
>> you already mentioned, this won't work without scripts and tools  
>> for that anyway.
>
> It's not that this won't work without the tools, but rather that the  
> tools make it easier by combining a few commands. I do see the point you are  
> making though (see below).
>
>> My question is: what benefit do we get from managing a monolith, or  
>> container repository, locally, as opposed to having those tools  
>> and scripts just manage the individual repositories in a local  
>> container *directory*? I see that being able to git-commit from the  
>> base directory is a good thing. But if this only works by later  
>> splitting this commit to the individual repos through tools, why  
>> not have this tool make the "base" commit right from the start?  
>> Or is it possible to use the split tools automatically from git  
>> hooks?
>
> Yes, mostly it's to take advantage of git's functionality and our  
> existing tool set. Not only for git-commit from the base directory,  
> but things like diff|status|log as well. The latter may be of  
> dubious value if you choose to always squash the subtree updates,  
> though on the other hand, this would result in only changes made  
> locally showing in the log.

I just went through a whole from-the-ground up installation of a dev  
system while preparing the vagrant image.  These are the lessons I  
learned:

1. Our current dev installation process is absolutely terrible.  It  
took me the better part of **4** hours to try to configure the system  
to a point where it was usable.  And I'm supposedly a person (as a  
developer) who has pre-existing knowledge of the process.

It was a humbling/embarrassing experience.

In short: nothing about the current development process, including  
tools, workflow, etc., should be a factor in deciding how to change  
the system.

2. Monolithic repositories are a terrible idea.

And git subtree is not the answer, for a variety of reasons.

(My horror story: it took 6 hours to clone the current repo to a  
memory card so I could work on a single library during a flight.   
There's no need to clone 25,000 files when you are only working on 25).

3. We have a bunch of different installation utilities living in  
several different locations.  It is very confusing to know what to use and  
how to use it.

4. Composer is going to make this all easier.  I am warming up to it  
quickly (especially after trying to deal with an automated process to  
install all required PEAR libraries for a given set of  
applications/libraries).

The problem is the idea of trying to structure the git repo in a way  
where the structure itself defines the development  
process/environment.  This is the mistake we made in the past, and  
we can't repeat it.

VCS is nothing more than a way to store code (and revisions).  *How*  
the various components interact is something that needs to be done at  
the environment level.  We can provide tools that create this  
environment automatically, but that is only one interpretation, and an  
installation is free to do with the components as it wishes.

My proposed solution:

- All apps and libraries live in separate git repos.  They are all  
entirely independent of each other.
   - Requires maintenance of a list of git repos, but that is a minor  
hassle.  (This can be automated via a script on www.horde.org, for  
example).

- Installation of code from git repos can be facilitated by a script.
   - This script can create/clone the git repos as needed.
   - All repos will be stored in a base folder
   - Option to create a separate, web-accessible directory.

- We combine all installation code that currently exists into a single script.
   - Includes installation script described above.
   - Includes stuff in framework/bin
   - Includes stuff in horde-support/maintenance-tools
   - Includes the groupware install code (in fact, the  
Horde_Core_Bundle code is probably a good place to start in terms of  
creating the install script).
   - This script can be packaged via PHAR
   - Benefit: development install script work can be leveraged to make  
end-user installs better also.
   - From a technical perspective: the goal would be to create test  
installations and a developer installation using Vagrant where the  
provisioning file contains nothing but 'horde-install' commands.
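
As a rough illustration of the repo-handling part of the proposal above  
(a list of repos, cloned as needed into a base folder, with an optional  
web-accessible directory), here is a minimal sketch. It is written in  
Python purely for illustration; the repository names, URLs, function  
names and the choice to symlink whole repos into the web directory are  
all assumptions, not the actual Horde layout or the eventual  
'horde-install' tool.

```python
import subprocess
from pathlib import Path

# Hypothetical repo list. Names and URLs are placeholders, not the
# real Horde repositories.
EXAMPLE_REPOS = {
    "horde": "https://git.example.org/horde.git",
    "imp": "https://git.example.org/imp.git",
    "Core": "https://git.example.org/Core.git",
}


def plan_install(repos, base_dir, web_dir=None):
    """Work out which steps are needed, without executing anything.

    Returns a list of tuples:
      ("clone", url, target)   repo is not present in base_dir yet
      ("pull", target)         repo already exists, only update it
      ("link", source, link)   expose the repo in the web directory
    """
    base = Path(base_dir)
    steps = []
    for name, url in sorted(repos.items()):
        target = base / name
        if (target / ".git").is_dir():
            steps.append(("pull", str(target)))
        else:
            steps.append(("clone", url, str(target)))
        if web_dir is not None:
            # Assumption: one symlink per repo is enough for the web dir.
            steps.append(("link", str(target), str(Path(web_dir) / name)))
    return steps


def run_steps(steps):
    """Execute a plan produced by plan_install()."""
    for step in steps:
        if step[0] == "clone":
            subprocess.run(["git", "clone", step[1], step[2]], check=True)
        elif step[0] == "pull":
            subprocess.run(
                ["git", "-C", step[1], "pull", "--ff-only"], check=True
            )
        elif step[0] == "link":
            link = Path(step[2])
            link.parent.mkdir(parents=True, exist_ok=True)
            if not link.is_symlink():
                link.symlink_to(step[1], target_is_directory=True)
```

Something like run_steps(plan_install(EXAMPLE_REPOS, "/srv/horde-dev",  
web_dir="/srv/horde-dev/web")) would then set up a tree (the paths are  
again made up). Keeping the planning separate from the execution means  
the plan can be printed or tested without touching the network, and each  
repo stays an independent clone.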

Given the fact that we need to be using Composer ASAP, and that Travis  
is currently broken/unusable, the priority on this is high.  My  
schedule looks a bit clearer over the next week or two, so I can  
hopefully provide support to help get this done.

michael

___________________________________
Michael Slusarz [slusarz at horde.org]