
How should we share some content between two otherwise-independent git repositories?

+14
−0

We have two teams, dev and doc, and I'd like them to have shared access (via git) to a common subset of content. Specifically, I would like the examples that are used in the doc and that are scriptable to be part of our regular (dev) tests, so we'll know if a code change has broken one of them. Right now these examples aren't in source control or regularly tested at all; I think that's bad, I want to get them checked in somewhere in a way that we can plug them in to dev's tests, and that's the reason for this question.

Elsewhere I asked about git submodule versus git subtree, but some things have changed for us since then and I'm now wondering whether the correct answer is "neither". Here are our constraints:

  • The software runs on Linux. Developers thus have Linux environments. For reasons I can't change, the doc team uses Windows but can ssh to Linux machines to run the software. (Or use locally-run VMs, but that's usually more work.)

  • Everybody uses git through the same git and Bitbucket servers. There is a doc repository and, separately, a dev repository (in different projects). The dev repository includes tests. Each test has an input script and expected output -- pretty standard stuff.

  • The dev repository is large. We don't want doc to have to check it out, especially because they wouldn't actually be able to do anything with it on Windows.

  • We would like to check in example scripts and their expected output in one place, such that the dev tests, doc, and (eventually) [1] the doc build can use them. Doc would use these examples in two environments: Linux (where they can run them and thus create/maintain them), and Windows (where they can access them to use in doc).

  • Dev and doc both use the same branching policies, though not the same timing. (As is usual, doc lags dev a bit at the end of a release cycle.)

The examples I'm talking about would be used by both the dev and doc projects, which makes the examples logically a child of both. But a submodule or subtree can't have two parents (and it would probably be a bad idea anyway). In thinking about the problem, I've come to wonder whether the examples should instead be a third (top-level) repo used by both teams and independently managed. We would need to modify how the tests are run to include this other directory, which I assume would have to be external to the dev working tree. (Otherwise git would get confused about changes made therein, right?) This would effectively create a dependency from the dev repo to the examples repo, in that if you just checked out dev and ran the tests, they'd fail when they got to the references to the examples. That isn't morally pure but seems like it could be worked around, particularly if the makefile for the tests emits a suitable warning/reminder. A possibly thornier problem is that it would be each person's responsibility to be on the correct branch of the examples repo; they're separate repos so git won't help you keep them in sync.
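To make the "correct branch" worry concrete, here is a minimal sketch of a check the test makefile could call before running anything. It is only an illustration under stated assumptions: the function names (`current_branch`, `branch_mismatch_warning`) and the directory layout are hypothetical, and it assumes ordinary clones in which `.git` is a directory whose `HEAD` file reads `ref: refs/heads/<branch>` when a branch is checked out.

```python
from pathlib import Path
from typing import Optional


def current_branch(repo_dir: Path) -> Optional[str]:
    """Return the checked-out branch name, or None if it cannot be determined.

    Assumes an ordinary clone where .git is a directory. A detached HEAD
    (a bare commit hash in .git/HEAD) is reported as None.
    """
    head = repo_dir / ".git" / "HEAD"
    if not head.is_file():
        return None
    content = head.read_text().strip()
    prefix = "ref: refs/heads/"
    if content.startswith(prefix):
        return content[len(prefix):]
    return None


def branch_mismatch_warning(dev_dir: Path, examples_dir: Path) -> Optional[str]:
    """Return a warning string if the two working trees disagree, else None."""
    dev = current_branch(dev_dir)
    examples = current_branch(examples_dir)
    if examples is None:
        return f"examples repo missing or not on a branch: {examples_dir}"
    if dev != examples:
        return f"branch mismatch: dev is on {dev!r}, examples is on {examples!r}"
    return None
```

A makefile target could print the returned message and stop, which would turn the "each person's responsibility" problem into a reminder at test time rather than a confusing failure later.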

I hadn't used git before joining this company, so I've only really seen one group's practices. How should I be thinking about this shared body of content -- separate repo, or connected in git somehow to two other repos?

  1. Initially I expect that doc will cut/paste into the doc from the examples in this shared repository. That's what people do now -- they run examples locally and then copy the code and output into the doc. Yeah, not ideal, but one step at a time... Eventually I imagine the doc build being able to use something like includes to pull in stuff from the actual tested examples, someday.
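For the "something like includes" idea, this is roughly what I picture, as a sketch only. The `{{include: path}}` directive syntax and the name `expand_includes` are made up for illustration; a real doc toolchain would have its own include mechanism. The point is just that the doc source would name a tested example file instead of carrying a pasted copy.

```python
import re
from pathlib import Path

# Hypothetical directive: a line consisting solely of {{include: relative/path}}
INCLUDE = re.compile(r"^\{\{include:\s*(?P<path>[^}]+?)\s*\}\}$", re.MULTILINE)


def expand_includes(doc_text: str, examples_dir: Path) -> str:
    """Replace each include directive with the contents of the named example file."""
    base = examples_dir.resolve()

    def replace(match: "re.Match[str]") -> str:
        target = (base / match.group("path")).resolve()
        # Refuse paths that escape the examples checkout (e.g. ../../etc/passwd).
        if base not in target.parents:
            raise ValueError(f"include outside examples dir: {match.group('path')}")
        # Drop trailing newlines so the directive line is replaced one-for-one.
        return target.read_text().rstrip("\n")

    return INCLUDE.sub(replace, doc_text)
```

With a file `hello/expected.out` in the examples checkout, a doc line `{{include: hello/expected.out}}` would be replaced by that file's contents at build time, so the published output is whatever the tests last verified.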


2 comments

One opinion is to solve it with a script on the build server that gets the examples from the doc git. Ringi 7 months ago

Why can't a repository be a sub-module in two different repositories? I'm not aware of anything that would make that difficult. (No comment on sub-trees because I'm not familiar with them). Martin Bonner‭ 3 months ago

4 answers

+10
−0

Generally speaking, if two groups of people collaborate on the same software, I'd recommend they put everything in a shared git repository:

  • Documentation is intimately tied to the version of the software it describes. Looking at code examples in version X while writing docs for version Y is generally a mistake and should ideally be prevented by tooling. If everything lives in the same repository, git will default to checking out the same version of everything, and if you ask git to check out a different version it will automatically do this for both the code and the documentation. Submodules won't do that.

  • Putting everything into the same repo removes artificial barriers to collaboration. Each group can look at what the other group did, and propose changes to the work of the other group in a way that is trivial to integrate (unlike git subtree, which makes this somewhat hard, especially if changes flow in both directions). For instance, a developer making a change to an API could see how that API is documented, update both the code and the documentation in the same commit, and send it through a review by both groups. I realize the groups may not currently work together so closely, but this may be simply because their current tooling has made it hard to do that, and says nothing about the value of closer collaboration. I emphasize this because in agile software development, both developers and documentation specialists are usually considered to be part of the same, cross-functional team. Creating separate teams is not regarded as best practice because it impedes sharing and collaboration.

With that out of the way, let's look at the impediments you mentioned:

Developers thus have Linux environments. For reasons I can't change, the doc team uses Windows but can ssh to Linux machines to run the software

That's totally fine as far as git is concerned. Git allows you to check out, edit, and commit changes to a file created under a different operating system, and will even convert line endings for you if you so configure it.

If the concern is being able to run the software: if you're on Windows 10, you may be able to use the Windows Subsystem for Linux to run your Linux app directly under Windows.

The dev repository is large. We don't want doc to have to check it out, especially because they wouldn't actually be able to do anything with it on Windows.

Just how large are we talking here? And how come the size is a problem for the docs team, but not the dev team? After all, it's the same size for both ...?

Unless it's hundreds of megabytes, I probably wouldn't mind - disk space is cheap, and git is really good at incremental downloads, so this only affects the initial checkout, not your everyday work with git.

If the repository is so large that cloning and working with it is a real pain, then I wonder how the developers cope? Do they have some tricks they might share with you? Or are they suffering too? If so, why don't they make their repository smaller?

Dev and doc both use the same branching policies, though not the same timing. (As is usual, doc lags dev a bit at the end of a release cycle.)

For branches, the doc team could simply keep committing even after the dev team is done. For instance, if you have a branch named v1.6 you could keep committing to that branch even after the devs have released v1.6.

For tags, you obviously can't do that - but does the doc team need those?

Summary

In summary, sharing a repository can offer you a more seamless collaboration with the development team than git subtree or git submodule, is much easier to understand for your git newbies, and seems doable in your situation. It's therefore the option I'd investigate first.


4 comments

It looks like the dev team has done some repo cleanup since we first looked at this; their repo is now down to about 70GB, and doc's Windows machines aren't loaded up with as much corporate bloatware as they used to be. So we could actually share now; thanks for pointing it out. Monica Cellio 5 months ago

70GB! Good gracious! For context, the git repository of the linux kernel (30 million lines of code) with 15 years of history (950000 commits) weighs in at about 4 GB (3GB history, 1 GB checkout). Perhaps your development team could slim down its repository further? I can't imagine they enjoy working with a repository this big (or if they do, they have some tricks up their sleeve they could share). meriton‭ 5 months ago

@Monica Those are either some large files or millions of files. If it is the former, did you look into git lfs? Someone‭ 4 months ago

On further investigation, apparently they are checking in some large third-party libraries. Oof. But we managed to solve our problem another way (self-answer forthcoming). Monica Cellio‭ 14 days ago

+2
−0

I was discussing this problem again with some coworkers, and one of them said "QA has that problem too -- here's what we're doing about it".

At its core, having everybody add lots of tests to the test suite in the server repository isn't manageable. In addition to the volume -- and number of files (to dig through looking for something) can be as big a challenge as volume -- we really don't want to open that repository for writes from a larger group of people. And members of the doc and QA teams don't want to have to go through what is, for them, a pretty heavyweight process for committing changes to the server code repository.

QA has, as you might expect, a large body of tests that are independent of the developer tests. They've been using some of their own tooling to manage them (in a separate repository), but everyone involved wanted those tests to run using the same test-running framework that we use for dev tests -- but run separately. This is the problem QA was tackling recently, and it's analogous to the doc problem.

The solution we implemented was to modify the test-running framework to, optionally, link in a separate repository. The tests run from the test directory in the server repo, like the existing tests already do, but additional arguments to the invocation can now specify whether to use the QA repo or the doc repo. If one is specified, then after checking out the server repo as usual for test runs, the test run also checks out the additional repository and creates a symlink to it in the server test directory. It then runs the tests (which test suite(s) to run is also an input) as usual.

To the test runner, it looks like the tests are in the server test directory. But the server repo doesn't need to care about them, and QA and doc don't need to be able to modify that repo.
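The linking step can be sketched in a few lines. This is not our actual framework code; `link_external_suite` and its arguments are hypothetical names, and it assumes the extra repository has already been checked out somewhere outside the server working tree on a system that supports symlinks.

```python
import os
from pathlib import Path


def link_external_suite(server_test_dir: Path, external_checkout: Path, link_name: str) -> Path:
    """Expose an external checkout inside the server test directory via a symlink."""
    if not external_checkout.is_dir():
        raise FileNotFoundError(f"external repo not checked out: {external_checkout}")
    link = server_test_dir / link_name
    if link.is_symlink():
        # Replace a stale link left over from a previous run.
        link.unlink()
    elif link.exists():
        # Never clobber a real file or directory that belongs to the server repo.
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(external_checkout.resolve(), link, target_is_directory=True)
    return link
```

After a call such as `link_external_suite(Path("server/test"), Path("../doc-examples"), "doc")`, the runner can be pointed at `server/test/doc` like any other suite directory, while the files themselves stay in the other repository.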

When we're ready to integrate doc tests (example scripts) into the doc, we can have the doc build check out the additional repo in a similar fashion and then do its build. We haven't gotten that far yet.

In the end, we did something similar to a suggestion made in a comment months ago.


0 comments

+2
−0

I would create three repositories; let's call them dev, doc and shared for simplicity. Then I would add shared as a dependency of the other two by using submodules.

In this way you will not only track dev, doc and shared separately, but you will also track the dependency between them.


0 comments

+3
−1

First off, I agree with everything @meriton said. Overall I think you would be better off with a single repository, as your docs and code are changing (or should be changing) at the same rate.

Having said that, if you do have two independent pieces of software which share a common piece of code, you would be better off using a package system to bundle and install your shared code. This is language dependent, but examples would be npm (JavaScript), NuGet (.NET), pip (Python) etc.

The reason you want a package over a git submodule is that you want each independent repo to be in control of its own build status. For example, if a change happens in a git submodule, it has the potential to break downstream repos. This sucks. Instead, the behaviour you want is for the downstream repos to opt into new versions of the module. This means that if a failure occurs, it is initiated by the maintainers of the consuming repo, and at a time when it's clear why the failure occurred.

There are some other patterns you can apply to this in really specific situations around requiring consistency, but these are probably way more than you need. Here's an article I wrote about this a few years ago: https://blog.staticvoid.co.nz/2017/library_vs_microservice/


1 comment

A package system for separation with intentional versioning seems like a sound approach to me. But I'm the OP and probably less equipped to evaluate than others here. I see that someone has downvoted this answer and would like to understand why, if possible. Monica Cellio‭ 5 months ago
