How Git actually stores your code, what rebase and merge do at the object level, how to rescue commits the reflog saves, and how LFS and sparse checkout handle scale — the mental models senior developers rely on
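Git doesn't store diffs. It stores snapshots, and every object in a snapshot is identified by the SHA-1 hash of its content (SHA-256 in repositories created with --object-format=sha256). Strictly, Git hashes a short header (the object type and size) followed by the content. This is called content-addressable storage, and it's the foundation everything else builds on. (Packfiles do delta-compress objects on disk, but that is a storage optimisation; the logical model is still snapshots.)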
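An object ID depends only on bytes, so you can reproduce it outside Git. Here is a minimal sketch for a default SHA-1 repository. The function name is ours, not a Git API.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the object ID Git would assign to a blob with this content.

    Git hashes a header ("blob", a space, the size in bytes, a NUL byte)
    followed by the raw bytes. Filename and path play no part.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_sha1(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    print(git_blob_sha1(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty blob)
```

For the same bytes, echo 'hello world' | git hash-object --stdin should print the same ID as the first line. SHA-256 repositories apply the same header-plus-content rule with a different hash function.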
There are exactly four object types:
- **Blob**: raw file content, just bytes. No filename, no path. Two files with identical content share one blob across the entire history.
- **Tree**: a directory listing that maps names (with file modes) to blob SHAs (files) or other tree SHAs (subdirectories). Each tree describes a single directory level; subdirectories are separate tree objects.
- **Commit**: points to exactly one tree (the root snapshot) and zero or more parent commits, and records the author, the committer, a timestamp for each, and the message. The SHA of a commit changes if any of these change.
- **Tag**: an annotated tag object wraps another object (usually a commit) with a tagger name, date, and message. Lightweight tags are just refs, not objects.
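You can inspect any object directly. git cat-file -t <sha> prints its type, and git cat-file -p <sha> pretty-prints its content. Try git cat-file -p HEAD to see a commit's tree, parents, and author.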
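An object that hasn't been packed yet is a loose file under .git/objects/. It holds the same header-plus-content bytes that were hashed, zlib-compressed. The sketch below round-trips a blob through that encoding, which is roughly what git cat-file does when it reads a loose object. The helper names are hypothetical, and packed objects use a different on-disk format that this sketch does not handle.

```python
import hashlib
import zlib


def encode_loose_object(obj_type: str, content: bytes) -> tuple[str, bytes]:
    """Return (object_id, compressed_bytes) for a loose object.

    The file would live at .git/objects/<first 2 hex chars>/<remaining 38>.
    """
    raw = obj_type.encode("ascii") + b" " + str(len(content)).encode("ascii") + b"\0" + content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def decode_loose_object(data: bytes) -> tuple[str, bytes]:
    """Inflate a loose object and split it into (type, content)."""
    raw = zlib.decompress(data)
    header, _, content = raw.partition(b"\0")  # header ends at the first NUL
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("corrupt object: size mismatch")
    return obj_type, content


if __name__ == "__main__":
    object_id, stored = encode_loose_object("blob", b"hello world\n")
    print(f"{object_id[:2]}/{object_id[2:]}")  # path relative to .git/objects/
    print(decode_loose_object(stored))        # ('blob', b'hello world\n')
```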
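In a simple two-commit repo, the object graph works like this. Each commit points to its own root tree, and the second commit lists the first as its parent. A file that didn't change between them is the same blob, referenced by both trees.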
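The two-commit graph can be rebuilt by hashing real object formats. This is a sketch under stated assumptions: the author identity and timestamps are placeholders, the tree builder only handles one flat directory of regular files, and the helper names are hypothetical.

```python
import hashlib


def hash_object(obj_type: str, content: bytes) -> str:
    """Object ID = SHA-1 of '<type> <size>' + NUL + content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def make_tree(files: dict[str, bytes]) -> tuple[str, dict[str, str]]:
    """Hash a flat, single-directory tree. Returns (tree_id, {name: blob_id})."""
    blobs = {name: hash_object("blob", data) for name, data in files.items()}
    body = b""
    for name in sorted(blobs):  # tree entries are stored sorted by name
        # Each entry: "<mode> <name>" + NUL + the blob ID as 20 raw bytes.
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(blobs[name])
    return hash_object("tree", body), blobs


def make_commit(tree_id: str, parent_ids: list[str], message: str,
                timestamp: int = 1700000000) -> tuple[str, str]:
    """Hash a commit object. Returns (commit_id, commit_text)."""
    ident = f"A U Thor <author@example.com> {timestamp} +0000"  # placeholder identity
    lines = [f"tree {tree_id}"] + [f"parent {p}" for p in parent_ids]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    text = "\n".join(lines) + "\n"
    return hash_object("commit", text.encode("utf-8")), text


if __name__ == "__main__":
    tree1, blobs1 = make_tree({"README.md": b"# demo\n", "app.py": b"print('v1')\n"})
    commit1, _ = make_commit(tree1, [], "Initial commit")

    tree2, blobs2 = make_tree({"README.md": b"# demo\n", "app.py": b"print('v2')\n"})
    commit2, text2 = make_commit(tree2, [commit1], "Update app", timestamp=1700000100)

    print(text2)  # shows "tree <tree2>" and "parent <commit1>"
    print(blobs1["README.md"] == blobs2["README.md"])  # True: unchanged file, same blob
```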
A few consequences of this model matter in practice:
- **Branches are cheap.** A branch is a file in .git/refs/ (or an entry in .git/packed-refs) containing a commit SHA. Creating one is O(1): Git writes a 40-character hex SHA plus a newline to a file.
- **SHA links don't go stale.** A GitHub /tree/<sha>/ URL will always resolve to the exact snapshot, even after the branch is deleted, as long as the object hasn't been garbage-collected from the server.
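The sketch below shows that a branch really is just a small file, and how HEAD usually points at one. It works on a scratch directory standing in for .git/ and uses hypothetical helper names. Real Git also takes a .lock file before writing, may keep refs in .git/packed-refs, and validates ref names; this sketch skips all of that.

```python
import tempfile
from pathlib import Path


def write_branch(git_dir: Path, branch: str, commit_id: str) -> None:
    """Create or move a branch: a loose ref is a one-line file holding an ID."""
    ref_file = git_dir / "refs" / "heads" / branch
    ref_file.parent.mkdir(parents=True, exist_ok=True)  # "feature/x" nests dirs
    ref_file.write_text(commit_id + "\n")


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to a commit ID (loose refs only)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):  # symbolic ref, e.g. "ref: refs/heads/main"
        return (git_dir / head[len("ref: "):]).read_text().strip()
    return head  # detached HEAD: the file holds the commit ID itself


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)  # a scratch directory standing in for .git/
        (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
        write_branch(git_dir, "main", "ab" * 20)  # placeholder 40-char ID
        print(resolve_head(git_dir))
```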
Understanding what each operation actually does to the object graph eliminates the confusion about when history gets "rewritten."
A merge creates a new commit with two parents. The object graph gains a node; no existing objects are touched.
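No rewriting: every existing commit keeps its SHA, and only the merge commit is new. A fast-forward merge creates nothing at all. Git just moves the branch ref forward, which is possible when main has gained no new commits since the feature branch diverged.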
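The difference between a merge commit and a fast-forward can be modelled on a toy commit graph. Letters stand in for SHAs, and each commit lists its parents. This sketch uses hypothetical helper names and tracks only the commit graph, not file contents.

```python
def is_ancestor(parents: dict[str, list[str]], ancestor: str, tip: str) -> bool:
    """True if `ancestor` can be reached from `tip` by following parent links."""
    stack, seen = [tip], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def merge(parents: dict[str, list[str]], refs: dict[str, str],
          into: str, other: str, merge_id: str) -> str:
    """Merge branch `other` into branch `into`.

    Only adds to `parents` and moves `refs[into]`; existing commits are
    never modified. `merge_id` names the new merge commit if one is needed.
    """
    ours, theirs = refs[into], refs[other]
    if is_ancestor(parents, theirs, ours):
        return "already up to date"
    if is_ancestor(parents, ours, theirs):
        refs[into] = theirs  # fast-forward: move the ref, create nothing
        return "fast-forward"
    parents[merge_id] = [ours, theirs]  # new commit with two parents
    refs[into] = merge_id
    return "merge commit"


if __name__ == "__main__":
    #      C  <- main
    #     /
    # A--B
    #     \
    #      D--E  <- feature
    parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
    refs = {"main": "C", "feature": "E"}
    print(merge(parents, refs, "main", "feature", "M"))  # merge commit
    print(parents["M"], refs["main"])                   # ['C', 'E'] M
```

With main at B instead of C, the same call takes the fast-forward path. main simply moves to E, and no commit is added.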
Rebase replays commits as new objects on top of the target. It does not move commits — it creates new ones with new SHAs.
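Rebase replaces commits with new SHAs, so avoid rebasing commits that have already been pushed to main or a shared branch. Anyone who built on the old commits now has history that has diverged from yours.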
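Why do rebased commits get new SHAs even when the change is identical? A commit's ID covers its parent, so replaying it on a different base necessarily produces a different object. The toy model below makes the point. It uses hypothetical names and a truncated SHA-1 over a simplified payload, not Git's real commit format. Cherry-pick behaves the same way for a single commit.

```python
import hashlib
from typing import Optional


def commit_id(parent: Optional[str], change: str, message: str) -> str:
    """Toy commit ID that, like Git's, includes the parent in the hashed bytes.

    Real Git hashes the whole commit object (tree, parents, author,
    committer, message); here `change` stands in for the tree.
    """
    payload = f"parent {parent}\nchange {change}\n\n{message}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:12]


def replay(commits: list[tuple[str, str]], onto: str) -> list[str]:
    """Re-create each (change, message) on top of `onto`, oldest first."""
    new_ids: list[str] = []
    parent = onto
    for change, message in commits:
        parent = commit_id(parent, change, message)  # a brand-new object
        new_ids.append(parent)
    return new_ids


if __name__ == "__main__":
    feature = [("add parser", "Add parser"), ("fix edge case", "Fix parser edge case")]
    before = replay(feature, onto="c1")  # where the branch originally started
    after = replay(feature, onto="c3")   # main has moved on; its tip is now c3
    print(before)
    print(after)  # same changes and messages, entirely different IDs
```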
Cherry-pick copies the diff introduced by a commit and applies it as a new commit on the current branch. The new commit has a different SHA and a different parent — it is not the same object, even though it introduces the same change.
| Operation | Creates new objects? | Mutates existing? | Produces merge commit? | Linear history? |
|---|---|---|---|---|
| Merge | 1 commit (merge commit) | No | Yes | No |
| Fast-forward merge | No | No (just moves ref) | No | Yes |
| Rebase | N new commits (one per replayed commit) | No (old ones orphaned) | No | Yes |
| Squash merge | 1 commit (squashed) | No | No | Yes |
| Cherry-pick | 1 commit per picked commit | No | No | Yes |
The reflog is a local, per-repository journal that records every time a ref (branch, HEAD) changes — including resets, rebases, and amends. It is your safety net. It is not pushed to GitHub.
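Reflog data lives in plain text files under .git/logs/ (HEAD's log is .git/logs/HEAD). The sketch below reads one to find where HEAD pointed before the most recent reset. The sample entries and SHAs are made up, the helper names are hypothetical, and git reflog is the normal way to view this data.

```python
from typing import NamedTuple, Optional


class ReflogEntry(NamedTuple):
    old: str
    new: str
    timestamp: int
    message: str


def parse_reflog(text: str) -> list[ReflogEntry]:
    """Parse the contents of .git/logs/HEAD (or any ref's log).

    Each line is "<old> <new> <name> <<email>> <unix-time> <tz>" + TAB + message.
    The file is append-only, so the result is reversed: index n is HEAD@{n}.
    """
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        meta, _, message = line.partition("\t")
        old, new, identity = meta.split(" ", 2)
        timestamp = int(identity.rsplit(" ", 2)[1])  # "<name> <email> <time> <tz>"
        entries.append(ReflogEntry(old, new, timestamp, message))
    return entries[::-1]


def sha_before(entries: list[ReflogEntry], action_prefix: str) -> Optional[str]:
    """Where the ref pointed just before the most recent matching action."""
    for entry in entries:  # newest first
        if entry.message.startswith(action_prefix):
            return entry.old
    return None


# Placeholder SHAs; a real log holds full commit IDs.
A, B = "a" * 40, "b" * 40
SAMPLE = (
    f"{'0' * 40} {A} A U Thor <a@example.com> 1700000000 +0000\tcommit (initial): Initial commit\n"
    f"{A} {B} A U Thor <a@example.com> 1700000100 +0000\tcommit: Add feature\n"
    f"{B} {A} A U Thor <a@example.com> 1700000200 +0000\treset: moving to HEAD~1\n"
)

if __name__ == "__main__":
    entries = parse_reflog(SAMPLE)
    print(entries[0].message)             # reset: moving to HEAD~1
    print(sha_before(entries, "reset:"))  # the commit the reset left behind (B)
```

Once you have the real SHA, git branch rescue <sha> puts a branch back on the lost commit. The name rescue is only an example.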
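When you check out a commit SHA directly, you enter "detached HEAD" state. Commits you make here are not attached to any branch. Once you switch away, only the reflog can reach them, and they will eventually be garbage-collected. To keep them, create a branch while still on the detached HEAD, for example with git switch -c rescue.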
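Reflog entries are not permanent. By default, entries for commits that are no longer reachable from any ref expire after 30 days (gc.reflogExpireUnreachable), and other entries expire after 90 days (gc.reflogExpire). Expiry happens when git gc runs, including the automatic git gc --auto that some commands trigger. After that, the orphaned commits can be pruned. Don't wait weeks to rescue a lost commit.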
git fsck --lost-found writes dangling blobs and commits to .git/lost-found/ — a last resort when reflog is insufficient. Also useful for recovering accidentally deleted stash entries.
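Both gc and fsck decide what is "lost" with a reachability walk from refs and reflog entries. Below is a simplified model of that walk over a toy commit graph, with letters in place of SHAs and hypothetical helper names. Real Git also walks trees and blobs and considers other roots, such as the index. Note that fsck reports only the tips of unreachable chains as dangling, whereas this sketch returns every unreachable commit.

```python
def reachable(parents: dict[str, list[str]], tips: list[str]) -> set[str]:
    """Every commit reachable from the given tips via parent links."""
    seen: set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def unreachable(parents: dict[str, list[str]], refs: dict[str, str],
                reflog_ids: tuple[str, ...] = ()) -> set[str]:
    """Commits no ref or reflog entry can reach: the ones gc may prune."""
    return set(parents) - reachable(parents, list(refs.values()) + list(reflog_ids))


if __name__ == "__main__":
    # X was committed on a detached HEAD and then abandoned.
    parents = {"A": [], "B": ["A"], "C": ["B"], "X": ["B"]}
    refs = {"main": "C"}
    print(unreachable(parents, refs))                     # {'X'}
    print(unreachable(parents, refs, reflog_ids=("X",)))  # set(): the reflog still protects X
```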
Interactive rebase (git rebase -i) lets you rewrite any sequence of commits before they go public. Used well, it produces a clean history that's easy to review and bisect. Used poorly, it destroys context.
The editor opens with a list of commits and a command for each:
| Command | What it does | When to use |
|---|---|---|
| pick | Keep the commit as-is | Default, for commits that are clean and self-contained |
| reword | Keep the commit, edit the message | Improving message quality before PR review |
| edit | Pause here so you can amend the commit | Splitting a commit into two, or adding a missed file |
| squash | Meld into the previous commit; prompts for a combined message | Combining "wip" and its follow-up fix into one commit |
| fixup | Like squash, but silently discards this commit's message | Tiny corrections where the previous commit's message is already right |
| drop | Delete the commit entirely | Removing experiment commits, reverted code, debug commits |
| exec | Run a shell command after this step | Running tests at each commit during a complex rebase |
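Cleaning up a branch before a PR usually means marking "wip" commits and their follow-up fixes as squash or fixup, rewording vague messages, and dropping debug commits.
Result: three clean commits. The wip commit and its fix are folded into a single commit, and the reviewer sees intent, not the journey.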
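Git can build much of this todo list for you. Commits made with git commit --fixup=<commit> get a "fixup! <subject>" message. git rebase -i --autosquash then moves each one directly after its target and marks it fixup. The sketch below models that reordering in simplified form: real Git also matches targets by SHA and handles amend! and nested prefixes, and the function name is hypothetical.

```python
def autosquash(commits: list[tuple[str, str]]) -> list[str]:
    """Build a rebase todo list from (sha, subject) pairs, oldest first.

    Simplified model of --autosquash: a "fixup! X" or "squash! X" commit is
    moved right after the earliest commit whose subject is exactly X and its
    command is changed; commits with no match stay where they are as picks.
    """
    groups: list[list[str]] = []  # one group per ordinary commit
    by_subject: dict[str, list[str]] = {}
    for sha, subject in commits:
        command, target = "pick", None
        for prefix, cmd in (("fixup! ", "fixup"), ("squash! ", "squash")):
            if subject.startswith(prefix):
                command, target = cmd, subject[len(prefix):]
                break
        if target is not None and target in by_subject:
            by_subject[target].append(f"{command} {sha} {subject}")  # slot in after target
            continue
        group = [f"pick {sha} {subject}"]
        groups.append(group)
        by_subject.setdefault(subject, group)  # earliest matching subject wins
    return [line for group in groups for line in group]


if __name__ == "__main__":
    todo = autosquash([
        ("a1b2c3d", "Add login form"),
        ("b2c3d4e", "Add session store"),
        ("c3d4e5f", "fixup! Add login form"),
        ("d4e5f6a", "Add logout button"),
    ])
    print("\n".join(todo))
```

Applied, this todo list yields three commits, with the fixup folded into "Add login form".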
When you run git merge, Git picks a merge strategy to combine the histories. Most developers never need to specify this explicitly, but understanding what happens explains conflict patterns and resolution behaviour.
| Strategy | When used | How it works |
|---|---|---|
| ort | Default since Git 2.34 (replaces recursive) | "Ostensibly Recursive's Twin", a rewrite of recursive. Finds the merge base, runs a three-way merge per file, and handles criss-cross merges by recursively merging the merge bases. Faster and produces fewer spurious conflicts than its predecessor. |
| recursive | Default in Git <2.34; still available explicitly | Same concept as ort but older implementation. Use -X theirs or -X ours to auto-resolve conflicts in one side's favour. |
| octopus | Merging 3+ branches in one command | Designed for merging multiple feature branches simultaneously. Refuses to proceed if any conflict requires manual resolution — it's for clean, independent branches only. |
| ours | Explicit -s ours | Records a merge commit but takes 100% of the current branch's tree. The other branch's changes are discarded. Useful for "officially" merging a branch you're abandoning without incorporating its changes. |
| subtree | Working with git subtrees | Like recursive, but Git attempts to recognise that one repo is a subdirectory of the other and adjusts the diff accordingly. |
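Strategy options (-X): the ort and recursive strategies accept options via -X, for example -X ours, -X theirs, -X ignore-space-change, and -X renormalize.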
-X theirs and -X ours silently discard the other side's version of every conflicting hunk. Non-conflicting changes from both sides still merge, which is what separates -X ours from the -s ours strategy. Use them only when you genuinely want one side to win every conflict, for example when merging a dependency update branch where only one version should survive. Never use them as a "just make it compile" shortcut on application logic.
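The -X distinction becomes concrete with a three-way decision from three versions: merge base, ours, and theirs. The sketch below applies that decision at whole-file granularity, a deliberate simplification since Git works hunk by hunk. It models -X ours and -X theirs as a tie-breaker that applies only when both sides changed the same content. The function name is hypothetical.

```python
from typing import Optional


def merge_file(base: str, ours: str, theirs: str,
               favor: Optional[str] = None) -> tuple[str, bool]:
    """Three-way merge of one file at whole-file granularity.

    Returns (merged_text, conflicted). `favor` ("ours" or "theirs") models
    -X ours / -X theirs: it is consulted only when both sides changed the
    file differently, so a change made on just one side always survives.
    """
    if ours == theirs:   # identical on both sides (or untouched by both)
        return ours, False
    if ours == base:     # only their side changed
        return theirs, False
    if theirs == base:   # only our side changed
        return ours, False
    if favor == "ours":
        return ours, False
    if favor == "theirs":
        return theirs, False
    # Real Git labels the markers with ref names (e.g. HEAD and the branch).
    return f"<<<<<<< ours\n{ours}=======\n{theirs}>>>>>>> theirs\n", True


if __name__ == "__main__":
    base = "timeout = 30\n"
    print(merge_file(base, base, "timeout = 60\n", favor="ours"))
    # ('timeout = 60\n', False): their one-sided change is kept despite -X ours
```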
When GitHub performs the server-side merge (the green "Merge pull request" button), it uses git merge --no-ff with the default ort strategy. If GitHub reports "This branch has conflicts that must be resolved," it means ort couldn't auto-resolve — you need to pull the branch, merge locally, fix conflicts, and push.
GitHub exposes three merge options per repository (configurable under Settings → General → Pull Requests). Understanding their object-level effects helps you choose the right default for your team.
| Mode | History shape | Preserves commits | Best for |
|---|---|---|---|
| Merge commit | Non-linear (diamond) | Yes — original SHAs | Teams that value full audit trails; repos where each commit must be independently deployable |
| Squash merge | Linear | No — squashed into one | Teams with messy intermediate commits; products where one PR = one deployable unit |
| Rebase merge | Linear | Yes — as replayed copies | Teams with disciplined commit hygiene; open-source projects that value per-commit history |
Whichever mode you choose, enforce it consistently. Mixing modes on main makes tooling (changelog generators, bisect scripts, deploy pipelines) complicated. Squash merge works well for most product teams; rebase merge works well for library/open-source repos with disciplined contributors.
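The tooling cost shows up when a script walks main commit by commit. Under squash merge it sees one commit per PR, but it must special-case merge commits as soon as any PR uses the merge-commit mode. Below is a minimal check for that over a toy graph, similar in spirit to GitHub's "Require linear history" branch protection rule. The function is hypothetical, not GitHub's implementation.

```python
from typing import Optional


def first_parent_merges(parents: dict[str, list[str]], tip: str) -> list[str]:
    """Walk the first-parent chain from `tip`, newest first; list merge commits.

    An empty result means the history is a single straight line.
    """
    merges: list[str] = []
    commit: Optional[str] = tip
    while commit is not None:
        commit_parents = parents[commit]
        if len(commit_parents) > 1:
            merges.append(commit)
        commit = commit_parents[0] if commit_parents else None
    return merges


if __name__ == "__main__":
    squash_main = {"A": [], "S1": ["A"], "S2": ["S1"]}  # one commit per PR
    mixed_main = {"A": [], "S1": ["A"], "F1": ["S1"], "F2": ["F1"], "M": ["S1", "F2"]}
    print(first_parent_merges(squash_main, "S2"))  # []
    print(first_parent_merges(mixed_main, "M"))    # ['M']
```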
Git performs poorly with large binary files (design assets, ML model weights, compiled binaries, test fixtures over a few MB). Each version of a 50 MB binary is stored as a separate blob, and binary formats rarely delta-compress well, so clone time and repo size grow with every version you commit. Git LFS solves this by replacing large files in the repo with small pointer files and storing the actual content on a separate LFS server.
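The pointer file is plain text. The sketch below shows how one is formed, following the Git LFS v1 pointer format: a version line, the SHA-256 object ID, and the size in bytes. The helper names are hypothetical. In practice the git-lfs client writes these through the filter configured in .gitattributes (git lfs track "*.psd" adds a line with filter=lfs diff=lfs merge=lfs -text).

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Text that Git LFS commits in place of `content` (pointer spec v1)."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split a pointer file into {key: value}."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if line:
            key, value = line.split(" ", 1)
            fields[key] = value
    return fields


if __name__ == "__main__":
    big = b"\x00" * (50 * 1024 * 1024)  # stands in for a 50 MB binary
    pointer = lfs_pointer(big)
    print(pointer)
    print(len(pointer.encode()), "bytes committed instead of", len(big))
```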
| Plan | Free storage | Free bandwidth/month | Overage |
|---|---|---|---|
| Free / Pro | 1 GB | 1 GB | $0.07/GB storage, $0.0875/GB bandwidth |
| Team / Enterprise | 1 GB (base) | 1 GB (base) | Same rates; data packs available |
When not to use LFS: build artifacts that should live in an artifact registry (GitHub Packages, S3, Artifactory), generated files that shouldn't be in source control at all, or files that are added once and never changed (just keep them as regular blobs).
When a monorepo grows to hundreds of thousands of files, a full clone and checkout take minutes and consume gigabytes of disk. Git provides two complementary tools to reduce this: partial clone (download fewer objects) and sparse checkout (write fewer files to the working tree).
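Partial clone defers downloading blob content. With git clone --filter=blob:none <url>, you get the full commit graph and all trees, but blobs are fetched on demand when a Git command needs them. The checkout at the end of the clone fetches the blobs for the files it writes, and later commands such as git diff or git blame fetch older versions as they touch them.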
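Conceptually, on-demand fetching is a cache in front of the remote. This sketch is only a model. Real Git batches these fetches (for example, one request for all the blobs a checkout needs), and the class name and blob IDs below are hypothetical.

```python
from typing import Callable


class PartialCloneStore:
    """Toy model of a blob:none partial clone's object store.

    Commits and trees are assumed to be local already; blob contents are
    requested from the remote the first time something needs them and then
    kept locally. `fetch_from_remote` is a hypothetical stand-in for the
    network round trip to the promisor remote.
    """

    def __init__(self, fetch_from_remote: Callable[[str], bytes]) -> None:
        self._local: dict[str, bytes] = {}
        self._fetch = fetch_from_remote
        self.fetch_count = 0  # how many blobs crossed the network

    def read_blob(self, blob_id: str) -> bytes:
        if blob_id not in self._local:
            self._local[blob_id] = self._fetch(blob_id)  # lazy, on first use
            self.fetch_count += 1
        return self._local[blob_id]


if __name__ == "__main__":
    remote = {"blob-readme": b"# demo\n", "blob-big-asset": b"\x00" * 1024}
    store = PartialCloneStore(remote.__getitem__)
    store.read_blob("blob-readme")
    store.read_blob("blob-readme")  # served locally the second time
    print(store.fetch_count)        # 1: the big asset was never downloaded
```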
Sparse checkout writes only specified directories/files to your working tree. Combined with partial clone, this gives you a fast, small workspace inside a huge repo.
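Cone mode only accepts whole directories, which lets Git match paths with hash-set lookups instead of testing every pattern against every path. If you need file-level patterns, use non-cone mode (git sparse-checkout set --no-cone), but it's significantly slower on large repos.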
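Take a command like git sparse-checkout set services/api (the path is a hypothetical example). Cone mode's inclusion rules are simple enough to sketch. Every file at the repository root is included, as is every file directly inside an ancestor of a listed directory and everything recursively under a listed directory. The function below is hypothetical.

```python
from pathlib import PurePosixPath


def in_cone(path: str, cone_dirs: list[str]) -> bool:
    """Would cone-mode sparse checkout write `path` to the working tree?

    Included: files at the repository root, files directly inside any
    ancestor of a listed directory, and everything under a listed directory.
    """
    parent = PurePosixPath(path).parent.parts  # () for a root-level file
    for cone_dir in cone_dirs:
        wanted = PurePosixPath(cone_dir).parts
        if parent[:len(wanted)] == wanted:   # inside a listed directory
            return True
        if wanted[:len(parent)] == parent:   # parent is the root or an ancestor
            return True
    return not parent  # root-level files are present even with no cone dirs


if __name__ == "__main__":
    cone = ["services/api"]
    for p in ["README.md", "services/Makefile", "services/api/handlers/user.go",
              "services/web/app.ts", "docs/guide.md"]:
        print(p, in_cone(p, cone))
```

Because every rule compares whole directory prefixes, Git can answer each check with a set lookup per directory rather than by evaluating arbitrary glob patterns.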
| Technique | Reduces | Best for |
|---|---|---|
| Partial clone (blob:none) | Download size & initial clone time | Any large repo; CI runners cloning for a specific job |
| Sparse checkout (cone) | Working tree size; git status/add performance | Monorepos; devs working in one service area |
| Shallow clone (--depth=1) | History download | CI: build + test without needing full history; not suitable for git operations that need ancestry |
The actions/checkout action supports both partial clone and sparse checkout through its filter and sparse-checkout inputs (v4+). Use these in CI workflows that only need a single service's files to avoid cloning the entire monorepo on every run.