Welcome to Software Development on Codidact!

Git: How to clone only a few recent commits?

How do I clone the repository with only part of the history? For example, let's say I want to download only the last 5 commits out of thousands.

How do I clone the repository with only part of the history?

It depends on which part you want. Shallow clones are exactly what you need (a clone with only part of the commit history), and the git clone documentation lists the following options:

--depth <depth>
Create a shallow clone with a history truncated to the specified number of commits. Implies --single-branch unless --no-single-branch is given to fetch the histories near the tips of all branches. If you want to clone submodules shallowly, also pass --shallow-submodules.

--shallow-since=<date>
Create a shallow clone with a history after the specified time.

--shallow-exclude=<revision>
Create a shallow clone with a history, excluding commits reachable from a specified remote branch or tag. This option can be specified multiple times.

So you can choose a maximum depth, a start date, or a revision you don't want included in the shallow clone. Which one to use depends on what you need.
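To see how each option is passed on the command line, here is a minimal sketch that builds a throwaway local repository with ten linear commits, one day apart, and shallow-clones it in the three ways. The directory names, the tag v1 and the dates are made up for illustration. Git ignores these options when cloning from a plain local path, which is why the source is given as a file:// URL.

```shell
#!/bin/sh
set -e

# Throwaway identity so the commits work without any global Git config
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
git symbolic-ref HEAD refs/heads/main

# Ten linear commits dated 2023-08-01 .. 2023-08-10; commit 5 is tagged v1
for i in 1 2 3 4 5 6 7 8 9 10; do
  when="2023-08-$(printf '%02d' "$i") 12:00:00 +0000"
  GIT_AUTHOR_DATE="$when" GIT_COMMITTER_DATE="$when" \
    git commit -q --allow-empty -m "commit $i"
  if [ "$i" -eq 5 ]; then git tag v1; fi
done

cd "$work"
src="file://$work/origin"   # file:// so the shallow options are not ignored

git clone -q --depth=3 "$src" by-depth                                  # at most 3 levels
git clone -q --shallow-since="2023-08-07 00:00:00 +0000" "$src" by-date # commits dated on/after Aug 7
git clone -q --shallow-exclude=v1 "$src" by-exclude                     # nothing reachable from v1

for d in by-depth by-date by-exclude; do
  echo "$d: $(git -C "$d" rev-list --count HEAD) commits"
done
```

On this linear history the loop should report 3, 4 and 5 commits respectively. Here depth and commit count happen to agree; the merge case discussed below is where they diverge.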

The question asks for "a few recent commits" and "the last 5 commits", and I'm afraid the available options (especially --depth, mentioned in another answer) might not work in all cases.

With --shallow-since= you can set a start date, but can't control the number of commits. And with --depth, it's not guaranteed that the number of commits will be the same as the depth.


When you clone a repository, Git also sets your local HEAD to match the HEAD of the remote repository (or a specific branch, if you provide one on the command line, such as git clone url --branch=somebranch). The --depth option then fetches all commits reachable from that HEAD, stopping at the specified depth. But setting the depth to some value X doesn't mean it'll fetch exactly X commits: this option only tells Git the maximum number of "levels" to fetch, which might or might not correspond to the same number of commits.

For instance, if a commit has more than one parent (as every merge commit does), all of its parents are at the same depth (the same "level"), hence all of them are fetched.

I've made a quick test: first I checked out the master branch, then I merged 3 branches at once (an octopus merge), so the resulting commit has 4 parents (master's previous commit, plus the tips of the 3 merged branches).

I pushed this to a remote repo, then cloned it with git clone --depth=2 remote_url, and 5 commits were fetched. I checked this with git log --graph --format="%C(#3299ff)%ad %C(auto)%h %C(#cdcd51)[%p] %C(#eeeeee bold)(%an)%C(auto)%d: %s" --decorate, and the output was:

*---.   2023-08-21 13:30:41 534da95 [99be355 8896d09 7519854 615db8f] (Hugo Kotsubo) (HEAD -> master, origin/master, origin/HEAD): Merge branches 'b1', 'b2' and 'b3'
|\ \ \  
| | | * 2023-08-21 13:28:43 615db8f [] (Hugo Kotsubo) (grafted): b3
| | * 2023-08-21 13:28:24 7519854 [] (Hugo Kotsubo) (grafted): b2
| * 2023-08-21 13:28:08 8896d09 [] (Hugo Kotsubo) (grafted): b1
* 2023-08-21 13:27:02 99be355 [] (Hugo Kotsubo) (grafted): new file

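We can see that "level 1" is the commit pointed to by the remote's HEAD (in this case, the master branch). "Level 2" contains the tips of branches b1 (commit 8896d09), b2 (commit 7519854) and b3 (commit 615db8f), as well as commit 99be355, which was the tip of master before the merge. The level-2 commits are marked as grafted and show no parents ([]), because their parents lie beyond the requested depth.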

Conclusion: --depth tells Git the maximum depth I want, but the number of commits fetched won't necessarily be the same. In the example above, I set the maximum depth to 2, but 5 commits were fetched (because level 2 contains 4 commits).
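The test described above can be reproduced roughly with the following sketch. It is not the exact repository from this answer: the file names and commit messages are hypothetical, and a local file:// URL stands in for the remote. The octopus merge gives the new tip of main four parents, so --depth=2 should fetch 5 commits.

```shell
#!/bin/sh
set -e

# Throwaway identity so the commits work without any global Git config
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
git symbolic-ref HEAD refs/heads/main

echo base > base.txt && git add base.txt && git commit -q -m "initial"
echo new > new.txt && git add new.txt && git commit -q -m "new file"  # main before the merge

# Three side branches, each one commit on top of "initial"
for b in b1 b2 b3; do
  git checkout -q -b "$b" main~1
  echo "$b" > "$b.txt" && git add "$b.txt" && git commit -q -m "$b"
done

# Octopus merge: the new tip of main has 4 parents
git checkout -q main
git merge -q --no-edit b1 b2 b3

# Depth 2 = the merge commit (level 1) plus all 4 of its parents (level 2)
git clone -q --depth=2 "file://$work/origin" "$work/clone"
git -C "$work/clone" rev-list --count HEAD
git -C "$work/clone" log --graph --oneline --decorate
```

rev-list --count should print 5, and the graph should show the four parents marked as grafted, like the output above.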


Setting the maximum depth also doesn't guarantee that you'll get the most recent commits of the whole repository. What if another branch has lots of recent commits that haven't been merged into master yet? With the above solution, the only guarantee I have is that I've got the latest commits of the branch that corresponds to the remote's HEAD.

Of course, I could do git clone --depth=5 url --branch=anotherbranch, but then I'd get only the most recent commits of that branch, and to get "the most recent of all" I'd need to know in advance that this specific branch has them.
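A small sketch of this situation, assuming a hypothetical repository whose main branch has 3 commits while anotherbranch has 5 newer commits on top of them that are not merged into main. Because --depth implies --single-branch, the default clone only follows the remote's HEAD.

```shell
#!/bin/sh
set -e

# Throwaway identity so the commits work without any global Git config
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
git symbolic-ref HEAD refs/heads/main

for i in 1 2 3; do git commit -q --allow-empty -m "main $i"; done
# anotherbranch: 5 newer commits that are not merged into main
git checkout -q -b anotherbranch
for i in 1 2 3 4 5; do git commit -q --allow-empty -m "another $i"; done
git checkout -q main    # the remote's HEAD points at main again

src="file://$work/origin"
git clone -q --depth=5 "$src" "$work/default-head"                           # follows HEAD (main)
git clone -q --depth=5 --branch=anotherbranch "$src" "$work/explicit-branch" # needs the branch name

git -C "$work/default-head" log --all --format=%s     # main 3, main 2, main 1
git -C "$work/explicit-branch" log --all --format=%s  # another 5 down to another 1
```

The first clone never sees the newer commits; the second one does only because the branch name was known beforehand.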

The same applies to --shallow-since: it fetches the commits of the remote's HEAD (or of the branch specified by the --branch option), so it won't work when another branch has the most recent commits.

Actually, it's more complicated than that. What if the most recent commit is in one branch, the second most recent is in another branch, and so on? Then cloning one single branch won't do the trick.

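If you want the most recent commits across all branches, I'm afraid there's no way to get them with git clone and shallow clones (but I'd love to see an answer proving me wrong). For that case, the only solution I can think of is to clone the entire repository and then search for those commits, for instance with git log --remotes --date-order -n N. Sorting branches instead (git branch -r --sort=-committerdate or git for-each-ref --sort=-committerdate refs/remotes/) ranks them by their latest commit, so the first N lines give the most recently updated branches, not the N most recent commits. Also note that right after a clone, refs/heads/ contains only the checked-out branch; the other branches are under refs/remotes/.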
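A minimal sketch of the full-clone approach, using a hypothetical repository in which the most recent commits alternate between two branches, x and y (the case where cloning a single branch can't work). After a fresh clone only the default branch exists locally, so the search goes through the remote-tracking refs. The commit_at helper is made up for this example.

```shell
#!/bin/sh
set -e

# Throwaway identity so the commits work without any global Git config
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Hypothetical helper: empty commit with a fixed author and committer date
commit_at() {
  GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q --allow-empty -m "$2"
}

work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
git symbolic-ref HEAD refs/heads/main

commit_at "2023-08-01 12:00:00 +0000" "main 1"
commit_at "2023-08-02 12:00:00 +0000" "main 2"
git checkout -q -b x;      commit_at "2023-08-03 12:00:00 +0000" "x 1"
git checkout -q -b y main; commit_at "2023-08-04 12:00:00 +0000" "y 1"
git checkout -q x;         commit_at "2023-08-05 12:00:00 +0000" "x 2"
git checkout -q y;         commit_at "2023-08-06 12:00:00 +0000" "y 2"
git checkout -q main

# Full clone: x and y arrive only as refs/remotes/origin/*
git clone -q "$work/origin" "$work/full"
cd "$work/full"

# The 3 newest commits reachable from any remote-tracking branch
git log --remotes --date-order -n 3 --format='%cd %s' --date=short
# Branches ranked by the date of their tip commit (a different question)
git for-each-ref --sort=-committerdate --format='%(committerdate:short) %(refname:short)' refs/remotes/origin/
```

The log line should list "y 2", "x 2" and "y 1", i.e. commits from both branches interleaved, while the for-each-ref line ranks whole branches.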

How does git handle incomplete (local) repositories?
Iizuki wrote 8 months ago

How does git actually handle incomplete (local) repositories? Given that commits are changesets, just cloning a few of the most recent changes wouldn't result in anything sensible. So does it, like, create an archive of the history beyond the desired depth and apply the most recent commits on top of that?

hkotsubo wrote 8 months ago · edited 8 months ago

Iizuki, actually a commit is a snapshot (references: 1, 2): it contains a pointer to a tree object, which represents the state of the whole working dir. A tree, in turn, contains pointers to blobs (files) and other trees (subdirectories, which can contain more pointers to other blobs or trees, and so on), so it represents the full content of the working dir at the time of the commit.
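To look at those objects directly, a sketch along these lines (with a hypothetical two-file repository) prints the commit object, its top-level tree and the recursive list of blobs in the snapshot.

```shell
#!/bin/sh
set -e

# Throwaway identity so the commit works without any global Git config
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

mkdir -p src
echo "hello" > README
echo "int main(void) { return 0; }" > src/main.c
git add README src/main.c
git commit -q -m "first snapshot"

git cat-file -p HEAD               # commit: "tree <id>", author, committer, message
git cat-file -p 'HEAD^{tree}'      # top-level tree: blob README, tree src
git ls-tree -r --name-only HEAD    # every file in the snapshot, recursively
```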

Even if you shallow-clone only a single commit, all the related objects (the respective tree and all its blobs and sub-trees) are also downloaded. Hence, the working dir's state represented by that commit can be fully reconstructed, even if you don't have the whole history.
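A depth-1 clone can be checked with a sketch like this one (hypothetical three-commit repository, with a file:// URL so that --depth is applied): only one commit is present, yet every file of that snapshot is in the working directory.

```shell
#!/bin/sh
set -e

# Throwaway identity so the commits work without any global Git config
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"

# Three commits, each adding one file
for f in a b c; do
  echo "$f" > "$f.txt"
  git add "$f.txt"
  git commit -q -m "add $f.txt"
done

git clone -q --depth=1 "file://$work/origin" "$work/shallow"
cd "$work/shallow"

git rev-parse --is-shallow-repository   # true
git rev-list --count HEAD               # 1: only the tip commit is present
ls                                      # a.txt b.txt c.txt: the whole snapshot
cat .git/shallow                        # id of the boundary commit (its parents are missing)
```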