Gitslave—gits

Introduction to GitSlave

Gitslave creates a group of related repositories (a superproject repository and a number of slave repositories) that are developed concurrently and on which all git operations should normally operate together. When you branch, each repository in the project is branched in turn. Similarly, when you commit, push, pull, merge, tag, checkout, status, log, etc., each git command runs on the superproject and all slave repositories in turn. This sort of activity may be very familiar to CVS and (to a lesser extent) Subversion users. Gitslave is designed to keep normal git operations simple.

Gitslave has been used for mid-sized product development with many slave repositories (representing different programs and plugins), branches, tags, and developers; and for single-person repositories tracking groups of .emacs and .vim repositories (in the latter case, it is basically used to keep the slave repositories up to date via a single command).

The gits wrapper typically runs the indicated git command on each repository in the project and combines (and occasionally post-processes for some special commands) the output from the individual git commands to make everything clearer, which is very useful when you have a few dozen slaves—looking at a concatenation of normally identical output for each git command would lose the wheat in the chaff.
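
To make the idea concrete, here is a minimal shell sketch of the run-everywhere-and-combine behaviour described above. It is not how gits itself is implemented: `run_in_each` is a hypothetical helper, and grouping outputs by their `cksum` is an assumption chosen for brevity.

```shell
# Hypothetical helper (not part of gitslave): run one command in every
# directory listed on stdin, then print each distinct output only once,
# preceded by the directories that produced it.
run_in_each() {
    tmp=$(mktemp -d) || return 1
    while IFS= read -r dir; do
        # Run the command inside the directory; keep it away from our stdin.
        out=$(cd "$dir" && "$@" 2>&1 </dev/null)
        # Identical outputs share a checksum, so they land in the same group.
        key=$(printf '%s' "$out" | cksum | cut -d ' ' -f 1)
        printf '%s\n' "$out" > "$tmp/$key.out"
        printf '%s\n' "$dir" >> "$tmp/$key.dirs"
    done
    for dirs in "$tmp"/*.dirs; do
        [ -e "$dirs" ] || continue
        # One header line naming every directory in the group...
        printf 'On: %s\n' "$(paste -s -d ',' "$dirs")"
        # ...followed by the shared output, indented.
        sed 's/^/    /' "${dirs%.dirs}.out"
    done
    rm -rf "$tmp"
}
```

For example, `printf '%s\n' . plugins/a plugins/b | run_in_each git status --short` would report the (hypothetical) repositories that have identical status together instead of repeating the same text three times.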

Gitslave does not take over your repository. You may continue to use legacy git commands both inside of a gits cloned repository and outside in a privately git-cloned repository. Gitslave is a value-added supplement designed to speed up performing identical git actions over all linked repositories. Aside from one new file in the superproject, adjustments to .gitignore, and perhaps a few private config variables, it does not otherwise affect your repositories.

Other options

git-submodules is the legacy solution for this sort of activity. Submodules went a different way: each submodule sits at a semi-fixed commit. It is a little annoying to make changes to a submodule because you must check out the correct submodule branch, make the change, commit, and then go into the superproject and commit the commit (or at least record the new location of the submodule). Submodules were originally designed for third party projects on which you typically do not do active development (active development works, with a little inconvenience). Most git commands performed on the superproject will not recurse down into the submodules. As suggested above, submodules give you a tight mapping between subproject commits and superproject commits (you always know which commit a subproject was at for any given superproject commit).

Another option is to stick everything in one giant repository (either natively or by the git subtree merge strategy). This might make your repository annoyingly large, and it is usually a bad idea to aggregate multiple concepts in the same repository. It also doesn't work conveniently (or at least efficiently) if the subsets are shared with other super-projects or your changes need to be shared with the other super-projects or back upstream.

Other options include repo from Google, used with Android. Repo seems to work much like gitslave from a high level perspective, but I've not seen a lot of documentation on using it for other projects. Gitslave also came first.

Still another option is kitenet's mr which supports multiple repository types (CVS, SVN, git, etc). It is absolutely the solution for multi-SCM projects, but since it works on the lowest common denominator you would lose much of the expressive power of git.

Gitslave is not perfect

Gitslave is imperfect in a few ways. It can complicate forensic archeology, it may need special care and feeding if one or more of the repositories are third party repositories, you can have partial success and partial failure (no atomic cross repository actions), not every git command which needs specific support in gits has it, and things can get a little squirrelly if different branches/tags have different attached slave repositories. However, we have not had any significant problems in over two years of intensive work on a project using this script, nor has anyone else reported anything. Do not mistake that for a warranty or a guarantee, for there is none.

Gitslave complicates forensic archeology in two ways. Most obviously you cannot have gitk (or something similar) show the complete history of all projects in all linked repositories. Less obviously, there is a very loose relationship between commits in different repositories. You cannot easily and precisely determine what commit/SHA any other repository was at when a particular commit was made (though you can approximate and assume pretty easily). Only tags provide exact synchronization between different repositories. Thus, gitslave may not be appropriate for blame-based debugging or egofull programming.
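
The approximation mentioned above can be made by comparing commit dates. The sketch below is one way to do that with plain git. `commit_at_time_of` is a hypothetical helper, not something gitslave provides, and it assumes that committer clocks roughly agree and that history on the named branch is in date order.

```shell
# Hypothetical helper: given COMMIT in THIS_REPO, print the newest commit on
# OTHER_BRANCH of OTHER_REPO whose committer date is not later than COMMIT's.
# This is only an estimate; a tag applied to both repositories is exact.
commit_at_time_of() {
    this_repo=$1 commit=$2 other_repo=$3 other_branch=$4
    # %ci is the committer date in an ISO 8601-like format.
    when=$(git -C "$this_repo" log -1 --format=%ci "$commit") || return 1
    # Walk the other repository's branch and stop at the first commit
    # made at or before that date.
    git -C "$other_repo" rev-list -1 --before="$when" "$other_branch"
}
```

A call such as `commit_at_time_of . HEAD plugins/a master` (paths and branch name are placeholders) prints a SHA you could then inspect or check out in the other repository.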

Your setup may need special care and feeding if one or more of the repositories is a third party repository. If you blindly attach the true upstream master to your local repository, you are at the mercy of the upstream commits to master. If there is a change which is not fully baked, you cannot refuse to accept it. Also, you cannot easily use public branches since you probably will be unable to push those branches to the third party repository. The solution is to:

  • Consider using a unique naming system for branches and tags. This allows you to keep your branches and tags separate from the upstream branches and tags. This might even go as far as ditching master as your normal branch for your project-specific repositories (`git symbolic-ref HEAD refs/heads/mymaster` can change the default branch when cloning from a bare clone).
  • Choose one of the following schemes for updating:
    • Keep a project-local master mirror repository for the third party package as your project's upstream (`git clone --mirror --shared URL mydir`). Periodically fetch in the bare repository. When you are ready to bring in some or all changes, you can `git merge` from the upstream branch under remote/origin/ into the corresponding project branch. This has the disadvantages of requiring server-side git commands (the fetch) to be executed, of requiring a strict separation of reference namespace, and of requiring that you remember which upstream branches correspond to which project branches, but at least you can see (via gitk) those merges with the correct names.
    • A slight variant on the above is to have a normal bare repository as the project local master, and use a bare mirrored client repository (with the projectmaster as a remote) as a proxy to avoid having to run commands on the project repository server. Fetch on origin and (metaphorically) `git push --all --tags projectmaster`. You can then have a normal clone do the merge of origin/master into mymaster. As long as you keep all local changes off the upstream branch, your transfer repository can happily import changes from the true upstream to the projectmaster and a normal clone can merge as necessary. It still requires a strict separation of reference namespace, and you still have to remember which upstream branches correspond to which project branches, but at least you can see (via gitk) those merges with the correct names.
    • The next variant gets rid of the requirement to have a strict separation of upstream namespace and your project namespace (except for the namespaceless tags). You create a normal project-master bare repository and have a normal clone of it. That clone adds a remote for the true upstream. That transfer clone then merges between the upstream remote branch and the project branch and pushes the result to origin as normal. This still has the problem that there is no memorized mapping between the upstream and project branches. Even worse, no one except this repository (or any repository with upstream as a remote) will be able to see (via gitk) the mapping. They will just see the merge from an anonymous branch.
    • Finally, we have the punting option. Have a normal bare repo as a local master and create a vendor branch in the repository. When you want to update, check out the vendor branch and replace the working directory with the most recent checkout/tarball from the appropriate upstream release/commit. Then merge the changes in. You lose the detailed history of the upstream changes, but this is a very easy and traditional method of importing changes. There is no question of namespace contamination, but you must manually figure out what to merge where in a normal checkout from your local project master (though gitk can help you see what you did in the past). This doesn't work at all conveniently if different local-project release branches are tracking different upstream-project release branches, because creating multiple vendor branches loses the simplicity which makes this option attractive.
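
As an illustration of the third scheme above (a transfer clone that has the true upstream as an extra remote), the merge-and-publish step might look like the following sketch. The helper name `merge_upstream`, the remote name `upstream`, and the branch names in the usage note are assumptions for illustration; only plain git commands are used.

```shell
# Hypothetical helper for a transfer clone whose `origin` is the project
# master and which has the true upstream configured as remote `upstream`
# (for example via: git remote add upstream UPSTREAM_URL).
# Merges one upstream branch into one project branch and publishes it.
merge_upstream() {
    clone=$1 upstream_branch=$2 project_branch=$3
    # Pick up new upstream commits, switch to the project branch,
    # merge the chosen upstream branch, and push to the project master.
    git -C "$clone" fetch upstream &&
    git -C "$clone" checkout "$project_branch" &&
    git -C "$clone" merge --no-edit "upstream/$upstream_branch" &&
    git -C "$clone" push origin "$project_branch"
}
```

For instance, `merge_upstream ~/src/transfer master mymaster` would bring the upstream `master` into a project branch named `mymaster`. Nothing records that pairing, so you still have to remember which upstream branch feeds which project branch.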

Some git subcommands need special support from gitslave because they deal with (typically) repository URLs. For instance, `gits remote add NAME URL` is special cased because it has to figure out the correct URL for each of the submodules based on the superrepository URL and the subproject information. However, not all git commands have been specially modified when run with gits. See the manual page for the list of the ones which have, but specifically `gits remote set-url` and `gits branch --set-upstream` are two which have not been specially supported yet.
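
To illustrate the kind of URL computation involved, the sketch below resolves a slave location against the superproject URL. It assumes, purely for illustration, that a slave is recorded either as an absolute URL or as a path relative to the superproject URL (such as `../plugin.git`). `resolve_slave_url` is a hypothetical helper and not the algorithm gits uses.

```shell
# Hypothetical helper: print the URL of a slave repository given the
# superproject URL and the slave's recorded location.
resolve_slave_url() {
    base=$1 rel=$2
    case $rel in
        ../*|./*) ;;                          # relative: resolved below
        *) printf '%s\n' "$rel"; return 0 ;;  # anything else: use as-is
    esac
    base=${base%/}                            # ignore one trailing slash
    while :; do
        case $rel in
            ../*) base=${base%/*}; rel=${rel#../} ;;  # go up one level
            ./*)  rel=${rel#./} ;;                    # stay at this level
            *)    break ;;
        esac
    done
    printf '%s/%s\n' "$base" "$rel"
}
```

With something like this, a subcommand that is not yet special cased could be applied by hand to each slave, for example `git -C PATH remote set-url NAME "$(resolve_slave_url SUPER_URL REL)"`, where PATH, NAME, SUPER_URL, and REL are placeholders.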

Even less perfect is the full and complete project documentation on what gitslave does, how it does it, and the various features and tweaks it might have. Gitslave isn't all that complex, so the hope is that it doesn't need a lot. We have an extensive manual page which is a good first step, and there is a lengthy tutorial on basic gitslave operations. See the links on the left for more information.

Summary, gitslave is a powerful tool when used for good

When you have a problem which calls for easy multirepository management without lots of synchronization, where you typically might want to run the same git command over every repository in your project, gitslave is the solution for you.