PHASE 1 OF 14

Git Internals & GitHub's Object Model

How Git actually stores your code, what rebase and merge do at the object level, how to rescue lost commits with the reflog, and how LFS and sparse checkout handle scale: the mental models senior developers rely on

Topics: Git Internals · Object Model · Rebase · Reflog · LFS · Monorepo
1.1

Git's Content-Addressable Storage

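Conceptually, Git doesn't store diffs. It stores snapshots, and every object in a snapshot is identified by the SHA-1 hash of a short type-and-size header plus its content (SHA-256 in repositories created with that object format). Packfiles delta-compress objects later, but that is a storage optimisation that never changes an object's ID. This is called content-addressable storage, and it's the foundation everything else builds on.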
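The addressing scheme can be reproduced by hand. The sketch below hashes the bytes `hello\n` twice: once manually, by prefixing the `blob <size>\0` header and running SHA-1, and once through `git hash-object`. The sample content is an arbitrary choice for illustration, and the sketch assumes the default SHA-1 object format.

```shell
set -eu
# Fall back to shasum on systems without sha1sum (for example macOS).
command -v sha1sum >/dev/null 2>&1 || sha1sum() { shasum -a 1 "$@"; }

# Hash the header "blob <size>\0" followed by the 6 raw bytes of "hello\n".
manual=$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)

# Ask Git for the object ID of the same bytes (nothing is written to disk).
viagit=$(printf 'hello\n' | git hash-object --stdin)

echo "manual: $manual"
echo "git:    $viagit"
```

Both lines print the same ID, which is why two files with identical bytes always collapse into one blob.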

There are exactly four object types:

blob

Raw file content — just bytes. No filename, no path. Two files with identical content share one blob across the entire history.

tree

A directory listing: each entry maps a filename and file mode to a blob SHA (a file) or another tree SHA (a subdirectory). One tree describes a single directory level, not a recursive walk.

commit

Points to exactly one tree (the root snapshot), zero or more parent commits, and contains author, committer, timestamp, and message. The SHA of a commit changes if any of these change.

tag

An annotated tag object — wraps another object (usually a commit) with a tagger name, date, and message. Lightweight tags are just refs, not objects.

You can inspect any object directly:

    $ git cat-file -t a3f8d12
    commit

    $ git cat-file -p a3f8d12
    tree 9f3a2b1c8d7e4f0a1b2c3d4e5f6a7b8c9d0e1f2a
    parent 7b2c1a0d9e8f7c6b5a4d3c2b1a0f9e8d7c6b5a4d
    author Alice <alice@example.com> 1718150400 +0530
    committer Alice <alice@example.com> 1718150400 +0530

    feat: add payment retry logic

    $ git cat-file -p 9f3a2b1c    # inspect the tree
    100644 blob d8e9f0a1b2c3    README.md
    100644 blob 4c5d6e7f8a9b    package.json
    040000 tree a1b2c3d4e5f6    src
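To see that a commit's ID really is a function of its fields, the following sketch builds commits with plumbing commands in a throwaway repository. The identity, the pinned timestamps, and the messages are placeholders chosen for the example; the commits point at the empty tree so no files are needed.

```shell
set -eu
# Throwaway repo; identity and dates are placeholders for this sketch.
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Pin both timestamps so every field of the commit is fully determined.
export GIT_AUTHOR_DATE='1718150400 +0000' GIT_COMMITTER_DATE='1718150400 +0000'

tree=$(git mktree </dev/null)                # writes the empty tree object
c1=$(git commit-tree "$tree" -m 'first')
c2=$(git commit-tree "$tree" -m 'first')     # identical inputs
c3=$(git commit-tree "$tree" -m 'second')    # only the message differs

echo "same inputs:      $c1 $c2"
echo "different message: $c3"
```

Identical inputs give the identical commit ID; changing any single field (here the message) gives a different one.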

The object graph for a simple two-commit repo looks like this:

    # Object graph: each arrow is "points to" (stored as SHA reference)

    commit a3f8d12 ──► tree 9f3a2b1 ──► blob d8e9f0a   (README.md content)
       │                         └──► blob 4c5d6e7   (package.json content)
       │                         └──► tree a1b2c3d   (src/ subtree)
       │
       └── parent ──► commit 7b2c1a0 ──► tree 8e9f0a1 ...

    HEAD ──► refs/heads/main ──► a3f8d12

Three consequences of this model that matter in practice:

  • Identical files across history are stored once. Rename a file → new tree object, same blob. Edit a file → new blob, new tree, new commit. Old blobs stay in the object store until they become unreachable and garbage collection prunes them.
  • A commit SHA is a cryptographic fingerprint of the entire repository state — the tree, all its blobs, and all ancestor commits. You cannot silently change history without the SHA changing.
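  • Branches and lightweight tags are just refs: small files under .git/refs/ containing a SHA (Git may later consolidate them into .git/packed-refs). Creating a branch is O(1): Git writes a 40-character SHA plus a newline to a file.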
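Two of these consequences are easy to check in a throwaway repository. The sketch below renames a file and compares the blob IDs before and after, then creates a branch and reads the ref file directly. File and branch names are made up for the example, and reading `.git/refs/heads/topic` assumes the default files-based ref storage with a freshly written (not yet packed) ref.

```shell
set -eu
# Throwaway repo; identity, file names and branch name are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q . && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'same bytes' > old.txt
git add old.txt && git commit -q -m 'add file'
git mv old.txt new.txt && git commit -q -m 'rename file'

before=$(git rev-parse HEAD~1:old.txt)   # blob referenced by the first tree
after=$(git rev-parse HEAD:new.txt)      # blob referenced by the second tree
echo "blob before rename: $before"
echo "blob after rename:  $after"

git branch topic
ref_file=$(cat .git/refs/heads/topic)    # loose ref: the SHA plus a newline
head=$(git rev-parse HEAD)
echo "ref file contains:  $ref_file"
```

The rename produced a new tree and commit but reused the blob, and the new branch is nothing more than the current commit ID written to a file.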
INSIGHT
GitHub displays a commit SHA and lets you permalink to any tree state using it. The /tree/<sha>/ URL will always resolve to the exact snapshot — even after the branch is deleted — as long as the object hasn't been garbage-collected from the server.
1.2

Rebase, Merge & Cherry-Pick at the Object Level

Understanding what each operation actually does to the object graph eliminates the confusion about when history gets "rewritten."

Merge

A merge creates a new commit with two parents. The object graph gains a node; no existing objects are touched.

    # Before merge: feature branched from main
    main:     A ── B ── C
                   ↑
    feature:  A ── B ── D ── E

    # After: git merge feature (from main)
    main:     A ── B ── C ── M    ← merge commit M has parents C and E
                     ╲      ╱
    feature:          D ── E

No rewriting. C, D, and E are untouched. M is new. A fast-forward is the special case where main has no new commits since the branch diverged: Git simply moves the main ref to the tip of feature and creates no merge commit.
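A small sketch of the same picture in a throwaway repository, checking that the merge commit's two parents are exactly the pre-merge tips. Branch names match the diagram; file names and the identity are placeholders.

```shell
set -eu
# Throwaway repo; identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q . && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt && git add . && git commit -q -m 'B'
git checkout -q -b feature
echo feat > feature.txt && git add . && git commit -q -m 'E'
git checkout -q main
echo main > main.txt && git add . && git commit -q -m 'C'

c=$(git rev-parse main)       # tip of main before the merge
e=$(git rev-parse feature)    # tip of feature before the merge
git merge -q -m 'M' feature

first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)
echo "M parents: $first_parent $second_parent"
```

The first parent is the branch you were on (C) and the second is the branch you merged in (E); both commits keep their original IDs.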

Rebase

Rebase replays commits as new objects on top of the target. It does not move commits — it creates new ones with new SHAs.

    # git rebase main (from feature)

    # Before:
    main:     A ── B ── C
    feature:  A ── B ── D ── E

    # After:
    main:     A ── B ── C
    feature:  A ── B ── C ── D' ── E'    ← D' and E' are NEW objects (new SHAs)

    # Old D and E are no longer on any branch. Your local reflog still
    # references them; once those entries expire, gc can prune them.
RULE
Never rebase a branch that other developers have based work on. Because D and E are no longer part of the branch (D' and E' have replaced them), anyone who branched off D will have a divergent history that's painful to reconcile. Rebase is safe on your own local/feature branches before a PR; never rebase main or a shared branch.
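The "new objects, old ones left behind" behaviour can be observed directly. This sketch, using placeholder file names and identity, records the feature tip, rebases onto main, and then checks that the tip has a new ID while the old commit object is still present in the object store.

```shell
set -eu
# Throwaway repo; identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q . && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo a > a.txt && git add . && git commit -q -m 'B'
git checkout -q -b feature
echo d > d.txt && git add . && git commit -q -m 'D'
git checkout -q main
echo c > c.txt && git add . && git commit -q -m 'C'

git checkout -q feature
old=$(git rev-parse feature)
git rebase -q main
new=$(git rev-parse feature)

echo "D  (before): $old"
echo "D' (after):  $new"
git cat-file -e "$old" && echo "old D still exists as an object"
```

D' carries the same change and message as D but sits on top of C, so its parent and therefore its ID differ.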

Cherry-Pick

Cherry-pick copies the diff introduced by a commit and applies it as a new commit on the current branch. The new commit has a different SHA and a different parent — it is not the same object, even though it introduces the same change.

    # git cherry-pick E (from main)
    main:     A ── B ── C ── E'    ← E' is a new commit with the same diff as E
    feature:  A ── B ── C ── D ── E

    # E and E' are different objects despite identical diffs
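One way to confirm "same change, different object" is to compare commit IDs with patch IDs, which `git patch-id` derives from the diff alone. The sketch below is a minimal illustration with placeholder file names and identity.

```shell
set -eu
# Throwaway repo; identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q . && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo a > a.txt && git add . && git commit -q -m 'B'
git checkout -q -b feature
echo e > e.txt && git add . && git commit -q -m 'E'
git checkout -q main
echo c > c.txt && git add . && git commit -q -m 'C'

git cherry-pick feature >/dev/null           # copy E onto main as E'
orig=$(git rev-parse feature)
copy=$(git rev-parse HEAD)

# patch-id hashes the diff only, ignoring parents, dates and commit IDs.
pid_orig=$(git show "$orig" | git patch-id --stable | cut -d' ' -f1)
pid_copy=$(git show "$copy" | git patch-id --stable | cut -d' ' -f1)

echo "commit IDs: $orig vs $copy"
echo "patch IDs:  $pid_orig vs $pid_copy"
```

The commit IDs differ while the patch IDs match, which is also how tools such as `git cherry` recognise a change that has already been applied elsewhere.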
| Operation | Creates new objects? | Mutates existing? | Produces merge commit? | Linear history? |
|---|---|---|---|---|
| Merge | 1 commit (merge commit) | No | Yes | No |
| Fast-forward merge | No | No (just moves ref) | No | Yes |
| Rebase | N new commits (one per replayed commit) | No (old ones orphaned) | No | Yes |
| Squash merge | 1 commit (squashed) | No | No | Yes |
| Cherry-pick | 1 commit per picked commit | No | No | Yes |
1.3

Reflog: Recovering Lost Commits

The reflog is a local, per-repository journal that records every time a ref (branch, HEAD) changes — including resets, rebases, and amends. It is your safety net. It is not pushed to GitHub.

    $ git reflog
    a3f8d12 (HEAD -> main) HEAD@{0}: commit: feat: add retry logic
    7b2c1a0 HEAD@{1}: rebase (finish): returning to refs/heads/main
    7b2c1a0 HEAD@{2}: rebase (pick): fix: null pointer in checkout
    3e1c9d8 HEAD@{3}: rebase (start): checkout main
    9a4b5f1 HEAD@{4}: commit: wip: half-done retry
    b8c7d2e HEAD@{5}: checkout: moving from feature to main

Scenario 1 — Accidental reset

    $ git reset --hard HEAD~3     # oops, lost 3 commits
    $ git reflog                  # find the SHA before the reset
    ...
    a3f8d12 HEAD@{1}: commit: feat: the thing I just lost
    $ git reset --hard a3f8d12    # restore
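The same rescue can be rehearsed safely in a scratch repository. In this sketch (placeholder file name and identity), four commits are created, three are thrown away with a hard reset, and `HEAD@{1}` (the position of HEAD just before the reset) brings them back.

```shell
set -eu
# Throwaway repo; identity and file name are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q . && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

for n in 1 2 3 4; do
  echo "$n" > f.txt && git add f.txt && git commit -q -m "commit $n"
done
tip=$(git rev-parse HEAD)

git reset -q --hard HEAD~3          # the "oops"
git reset -q --hard 'HEAD@{1}'      # HEAD as it was one reflog entry ago
restored=$(git rev-parse HEAD)

echo "tip before reset: $tip"
echo "tip after rescue: $restored"
```

Using `HEAD@{1}` works here because the reset is the most recent HEAD movement; after further commands, look the entry up in `git reflog` first.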

Scenario 2 — Detached HEAD rescue

When you check out a commit SHA directly, you enter "detached HEAD" state. Any commits you make here are not attached to a branch; once their reflog entries expire, they become eligible for garbage collection.

    $ git checkout a3f8d12           # detached HEAD
    $ git commit -m "experiment"     # creates f9e8d7c, not on any branch
    $ git checkout main              # f9e8d7c is now "lost"

    # Recovery: reflog still has it
    $ git reflog
    f9e8d7c HEAD@{1}: commit: experiment
    $ git branch recover-experiment f9e8d7c    # attach a branch, now safe

Scenario 3 — After a bad interactive rebase

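    # Rebase went wrong: find where it started
    $ git reflog | grep "rebase (start)"
    3e1c9d8 HEAD@{8}: rebase (start): checkout main

    # That entry's SHA is where HEAD landed (the tip of main), not your old branch tip.
    # The pre-rebase state is the entry just before it, one step older:
    $ git reset --hard 'HEAD@{9}'    # back to before the rebase started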
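Each branch also keeps its own reflog, and a rebase updates the branch ref only once, when it finishes. That makes `feature@{1}` a convenient name for "the branch tip before the rebase". The sketch below demonstrates this in a throwaway repository; the branch, file names, and identity are placeholders.

```shell
set -eu
# Throwaway repo; identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q . && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo a > a.txt && git add . && git commit -q -m 'base'
git checkout -q -b feature
echo d > d.txt && git add . && git commit -q -m 'D'
git checkout -q main
echo c > c.txt && git add . && git commit -q -m 'C'

git checkout -q feature
before=$(git rev-parse feature)
git rebase -q main                       # feature now points at D'

git reset -q --hard 'feature@{1}'        # previous value of the branch ref
after=$(git rev-parse feature)

echo "tip before rebase: $before"
echo "tip after undo:    $after"
```

This only holds while the rebase is the latest update to the branch; if you have committed on top since, inspect `git reflog feature` to pick the right entry.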
IMPORTANT
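Reflog entries expire. By default, entries reachable from the current branch tip expire after 90 days and unreachable (orphaned) entries after 30 days (gc.reflogExpire and gc.reflogExpireUnreachable). Expiry is applied when git gc runs, including automatic gc, and once an entry is gone the commits it protected can be pruned. Don't wait weeks to rescue a lost commit.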
SENIOR TIP
git fsck --lost-found writes dangling blobs and commits to .git/lost-found/ — a last resort when reflog is insufficient. Also useful for recovering accidentally deleted stash entries.
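As an illustration of the stash case, the sketch below stashes a change, drops the stash, and then finds the now-dangling stash commit with `git fsck`. It assumes a fresh repository where that commit is the only dangling one; in a real repository you would inspect each candidate (for example with `git show`) before applying it. File name and identity are placeholders.

```shell
set -eu
# Throwaway repo; identity and file name are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q . && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo v1 > notes.txt && git add notes.txt && git commit -q -m 'base'
echo 'unsaved work' > notes.txt
git stash -q                 # working tree is back to v1
git stash drop -q            # the stash commit is now unreferenced

# fsck reports it as a dangling commit and records it under .git/lost-found/.
lost=$(git fsck --no-reflogs --lost-found 2>/dev/null | awk '/dangling commit/ {print $3}')
echo "dangling stash commit: $lost"

git stash apply -q "$lost"   # stash apply accepts a raw stash commit ID
cat notes.txt
```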
1.4

Interactive Rebase: Sculpting History Before a PR

Interactive rebase (git rebase -i) lets you rewrite any sequence of commits before they go public. Used well, it produces a clean history that's easy to review and bisect. Used poorly, it destroys context.

    $ git rebase -i HEAD~5    # rewrite last 5 commits
    $ git rebase -i main      # rewrite all commits since branching from main

The editor opens with a list of commits and a command for each:

    pick a3f8d12 feat: add payment retry logic
    pick 7b2c1a0 fix: typo in error message
    pick 3e1c9d8 wip: forgot to remove debug log
    pick 9a4b5f1 fix: actually fix the debug log
    pick b8c7d2e test: add unit tests for retry

Commands and when to use them

| Command | What it does | When to use |
|---|---|---|
| pick | Keep the commit as-is | Default: commits that are clean and self-contained |
| reword | Keep the commit, edit the message | Improving message quality before PR review |
| edit | Pause here so you can amend the commit | Splitting a commit into two, or adding a missed file |
| squash | Merge into the previous commit; prompts for combined message | Combining "wip" and its follow-up fix into one commit |
| fixup | Like squash but silently discards this commit's message | Tiny corrections where the parent commit's message is correct |
| drop | Delete the commit entirely | Removing experiment commits, reverted code, debug commits |
| exec | Run a shell command after this step | Running tests at each commit during a complex rebase |

Cleaning up the example above before a PR:

    pick a3f8d12 feat: add payment retry logic
    reword 7b2c1a0 fix: correct error message wording in retry handler
    drop 3e1c9d8 wip: forgot to remove debug log
    fixup 9a4b5f1 fix: actually fix the debug log
    pick b8c7d2e test: add unit tests for retry

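Result: three clean commits. The wip commit is dropped, and fixup folds the follow-up fix into the nearest kept commit above it in the list (here the reworded one, because dropped lines are skipped). The reviewer sees intent, not the journey.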

Splitting a commit with edit

    # 1. Mark the commit as 'edit' in the rebase list
    # 2. Git pauses: you're now on that commit
    $ git reset HEAD~            # undo the commit; its changes stay in the working tree, unstaged
    $ git add src/payment.js
    $ git commit -m "feat: add retry logic to payment service"
    $ git add src/email.js
    $ git commit -m "feat: add retry logic to email sender"
    $ git rebase --continue      # resume the rebase
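For small corrections, the todo list does not have to be edited by hand. A common pattern, sketched here under the assumption that you know which commit the correction belongs to, is `git commit --fixup` followed by `git rebase -i --autosquash`. Setting `GIT_SEQUENCE_EDITOR=true` accepts the generated todo list unchanged, which makes the example non-interactive. Commit messages, file names, and the identity are placeholders.

```shell
set -eu
# Throwaway repo; identity, messages and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q . && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt && git add . && git commit -q -m 'chore: base'
echo retry > a.txt && git add . && git commit -q -m 'feat: add retry'
target=$(git rev-parse HEAD)
echo tests > b.txt && git add . && git commit -q -m 'test: add tests'

# A late correction to the feat commit, recorded as "fixup! feat: add retry".
echo 'retry fixed' > a.txt && git add . && git commit -q --fixup "$target"

# Autosquash moves the fixup next to its target and marks it "fixup".
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

count=$(git rev-list --count HEAD)
git log --format='%h %s'
```

Four commits go in and three come out: the correction has been folded into the commit it fixes, and its own message is discarded.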
TEAM POLICY SUGGESTION
Establish a branch hygiene rule: squash or fixup WIP commits before requesting review, but preserve multi-commit structure when each commit is genuinely independent (the reviewer can review commit-by-commit). Enforce this via PR description template rather than a forced-squash-merge policy — the latter loses intentional commit structure.
1.5

Merge Strategies: recursive, ort, octopus, ours

When you run git merge, Git picks a merge strategy to combine the histories. Most developers never need to specify this explicitly, but understanding what happens explains conflict patterns and resolution behaviour.

| Strategy | When used | How it works |
|---|---|---|
| ort | Default since Git 2.34 (replaces recursive) | Short for "Ostensibly Recursive's Twin". Finds the merge base, runs a three-way merge per file, handles criss-cross merges by recursively merging the merge bases. Faster and produces fewer spurious conflicts than its predecessor. |
| recursive | Default in Git <2.34; still available explicitly | Same concept as ort but older implementation. Use -X theirs or -X ours to auto-resolve conflicts in one side's favour. |
| octopus | Merging 3+ branches in one command | Designed for merging multiple feature branches simultaneously. Refuses to proceed if any conflict requires manual resolution; it's for clean, independent branches only. |
| ours | Explicit -s ours | Records a merge commit but takes 100% of the current branch's tree. The other branch's changes are discarded. Useful for "officially" merging a branch you're abandoning without incorporating its changes. |
| subtree | Working with git subtrees | Like recursive, but Git attempts to recognise that one repo is a subdirectory of the other and adjusts the diff accordingly. |

Strategy options (-X)

The recursive/ort strategies accept options via -X:

    $ git merge -X theirs feature                # auto-resolve all conflicts by taking "theirs"
    $ git merge -X ours feature                  # auto-resolve all conflicts by keeping "ours"
    $ git merge -X ignore-space-change feature   # ignore whitespace-only changes
CAUTION
-X theirs and -X ours silently discard real changes in conflicts. Use them only when you genuinely want one side to win for all conflicts — for example, merging a dependency update branch where only one version should survive. Never use them as a "just make it compile" shortcut on application logic.
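The option `-X ours` and the strategy `-s ours` are easy to confuse, so here is a minimal side-by-side sketch. Both branches edit `f.txt` (a conflict), and feature also adds `g.txt` (no conflict). File names, messages, and the identity are placeholders.

```shell
set -eu
# Throwaway repo; identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q . && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > f.txt && git add . && git commit -q -m 'base'
git checkout -q -b feature
echo feature > f.txt && echo new > g.txt && git add . && git commit -q -m 'feature work'
git checkout -q main
echo main > f.txt && git add . && git commit -q -m 'main work'
start=$(git rev-parse HEAD)

# Option -X ours: a real merge; only the conflicting hunk is taken from our side.
git merge -q -X ours -m 'merge with -X ours' feature
x_f=$(cat f.txt); x_g=$([ -e g.txt ] && echo present || echo absent)

# Strategy -s ours: the other branch's tree is ignored entirely.
git reset -q --hard "$start"
git merge -q -s ours -m 'merge with -s ours' feature
s_f=$(cat f.txt); s_g=$([ -e g.txt ] && echo present || echo absent)

echo "-X ours: f.txt=$x_f g.txt=$x_g"
echo "-s ours: f.txt=$s_f g.txt=$s_g"
```

With `-X ours` the non-conflicting `g.txt` still arrives from feature; with `-s ours` nothing from feature survives except the recorded parent link.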

GitHub's merge button uses the ort strategy

When GitHub performs the server-side merge (the green "Merge pull request" button), it uses git merge --no-ff with the default ort strategy. If GitHub reports "This branch has conflicts that must be resolved," it means ort couldn't auto-resolve — you need to pull the branch, merge locally, fix conflicts, and push.

1.6

GitHub's Three Merge Modes

GitHub exposes three merge options per repository (configurable under Settings → General → Pull Requests). Understanding their object-level effects helps you choose the right default for your team.

    ── Merge commit (default) ──────────────────────────────────
    # Equivalent to: git merge --no-ff
    main:     A ── B ── C ─────── M    (M has two parents: C and E)
                         ╲       ╱
    feature:              D ── E

    History: complete, non-linear. Every commit from feature is preserved.
    Bisect: clean, each commit is a real, tested state.
    Log readability: complex on busy repos.

    ── Squash and merge ─────────────────────────────────────────
    # Equivalent to: git merge --squash && git commit
    main:     A ── B ── C ── S    (S contains D+E squashed, parent is C only)
    feature:            D ── E    (orphaned: feature branch now disconnected)

    History: linear, clean. One PR = one commit on main.
    Bisect: each point on main = one merged PR, so it is easy to identify which PR broke things.
    Drawback: feature branch can't be re-merged cleanly; internal commits are lost.

    ── Rebase and merge ─────────────────────────────────────────
    # Equivalent to: git rebase main && git merge --ff-only
    main:     A ── B ── C ── D' ── E'    (D' and E' are rebased copies of D and E)
    feature:            D ── E           (original commits, now orphaned)

    History: linear, preserves individual commits.
    Bisect: finest granularity, individual commits on main.
    Drawback: commits lose their original SHA. GitHub links to the merged PR, but commit SHAs differ.
| Mode | History shape | Preserves commits | Best for |
|---|---|---|---|
| Merge commit | Non-linear (diamond) | Yes, with original SHAs | Teams that value full audit trails; repos where each commit must be independently deployable |
| Squash merge | Linear | No, squashed into one | Teams with messy intermediate commits; products where one PR = one deployable unit |
| Rebase merge | Linear | Yes, as replayed copies | Teams with disciplined commit hygiene; open-source projects that value per-commit history |
LEAD RECOMMENDATION
Pick one merge mode per repo and enforce it by enabling only that method in the repository's pull request settings (for example, allow squash merging only) or through a ruleset. Mixed modes on main make tooling (changelog generators, bisect scripts, deploy pipelines) complicated. Squash merge works well for most product teams; rebase merge works well for library/open-source repos with disciplined contributors.
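The "orphaned feature branch" drawback of squash merging can be reproduced locally with the equivalent commands. In this sketch (placeholder names and identity), the squashed commit ends up with a single parent, and Git does not consider feature merged into main even though all of its changes are there.

```shell
set -eu
# Throwaway repo; identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q . && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt && git add . && git commit -q -m 'C'
git checkout -q -b feature
echo d > d.txt && git add . && git commit -q -m 'D'
echo e > e.txt && git add . && git commit -q -m 'E'
git checkout -q main

git merge --squash feature >/dev/null 2>&1   # stages D+E, creates no commit
git commit -q -m 'S: squashed feature'

parent_count=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')
if git merge-base --is-ancestor feature main; then merged=yes; else merged=no; fi

echo "parents of S: $parent_count"
echo "feature is an ancestor of main: $merged"
```

Because S has no parent link to E, commands such as `git branch --merged` will not list feature, which is why squash-merged branches are usually deleted right after the PR lands.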
1.7

Git Large File Storage (LFS)

Git performs poorly with large binary files (design assets, ML model weights, compiled binaries, test fixtures over a few MB). Each version of a 50 MB binary is stored as a full blob — clone time and repo size grow unboundedly. Git LFS solves this by replacing large files in the repo with small pointer files and storing the actual content on a separate LFS server.

What actually lives in the repo

    # Without LFS: the binary IS in Git's object store:
    blob a4b3c2d1...  ← 47 MB model.pkl stored as a Git blob

    # With LFS: the repo contains only a pointer file:
    blob f1e2d3c4...  ← tiny text pointer:
        version https://git-lfs.github.com/spec/v1
        oid sha256:9b4e...c3f1
        size 49283145

    # The actual 47 MB lives on the LFS server, fetched on demand
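The pointer format is simple enough to build by hand, which helps demystify what LFS commits in place of the binary. The sketch below computes the SHA-256 and byte size of a small stand-in file and prints the three-line pointer; it does not need git-lfs installed. The file name and its contents are placeholders, not a real model.

```shell
set -eu
# Fall back to shasum on systems without sha256sum (for example macOS).
command -v sha256sum >/dev/null 2>&1 || sha256sum() { shasum -a 256 "$@"; }

work=$(mktemp -d)
# Stand-in for a large binary; any bytes will do for this sketch.
printf 'pretend this is a 47 MB model\n' > "$work/model.pkl"

oid=$(sha256sum "$work/model.pkl" | cut -d' ' -f1)     # content hash
size=$(wc -c < "$work/model.pkl" | tr -d ' ')          # size in bytes

pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size")
echo "$pointer"
```

The pointer is what Git hashes and stores as a blob; the `oid` is the key the LFS server uses to find the real content.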

Setting up LFS

    # Install once per machine
    $ git lfs install

    # Track a file pattern in this repo (.gitattributes is committed)
    $ git lfs track "*.psd"
    $ git lfs track "models/**/*.pkl"
    $ git add .gitattributes
    $ git commit -m "chore: track PSD and model files via LFS"

    # Verify what's tracked
    $ git lfs ls-files
    9b4ec3f1 * models/v2/classifier.pkl
    a7b8c9d0 * assets/hero.psd

    # Show LFS files that are staged or not yet pushed
    $ git lfs status
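Under the hood, `git lfs track` just appends an attribute line to `.gitattributes`. The sketch below writes that line by hand and uses `git check-attr` to ask which files would be routed through the LFS filter; it works without git-lfs installed. The file paths being queried are hypothetical and do not need to exist.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q .

# The line that `git lfs track "*.psd"` adds to .gitattributes.
printf '*.psd filter=lfs diff=lfs merge=lfs -text\n' > .gitattributes

# Ask Git which filter applies to two hypothetical paths.
psd=$(git check-attr filter -- assets/hero.psd)
js=$(git check-attr filter -- src/app.js)

echo "$psd"
echo "$js"
```

Running `git check-attr` like this is a quick way to confirm a pattern matches before committing a large file as a regular blob by mistake.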

GitHub LFS quotas

| Plan | Free storage | Free bandwidth/month | Overage |
|---|---|---|---|
| Free / Pro | 1 GB | 1 GB | $0.07/GB storage, $0.0875/GB bandwidth |
| Team / Enterprise | 1 GB (base) | 1 GB (base) | Same rates; data packs available |
IMPORTANT
By default, LFS files are not included in repository archives (the "Download ZIP" button); the archive contains only the pointer files. If your project needs downloadable releases that include large assets, attach them explicitly to a GitHub Release rather than relying on LFS.

When LFS is worth the overhead

  • Binary assets that change frequently (design files, ML datasets)
  • Any single file over ~5 MB that will be updated across the project lifetime
  • Test fixture archives, pre-built binaries checked in for reproducibility

When not to use LFS: build artifacts that should live in an artifact registry (GitHub Packages, S3, Artifactory), generated files that shouldn't be in source control at all, or files that are added once and never changed (just keep them as regular blobs).

1.8

Sparse Checkout & Partial Clone

When a monorepo grows to hundreds of thousands of files, a full clone and checkout take minutes and consume gigabytes of disk. Git provides two complementary tools to reduce this: partial clone (download fewer objects) and sparse checkout (write fewer files to the working tree).

Partial clone — fetch only what you need

Partial clone defers downloading blob content. You clone the tree and commit graph without downloading every file's content — blobs are fetched lazily when you actually read the file.

    # Blobless clone: download commits and trees, fetch blobs on demand
    $ git clone --filter=blob:none https://github.com/org/monorepo.git

    # Treeless clone: even more aggressive, only commits are pre-downloaded
    $ git clone --filter=tree:0 https://github.com/org/monorepo.git

    # Check what type of clone you have (the filter is stored per remote)
    $ git config remote.origin.partialclonefilter
    blob:none
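A blobless clone can be tried without a network by cloning a local repository over `file://`. The sketch below assumes a reasonably recent Git and enables `uploadpack.allowFilter` on the source repository, which a hosting service such as GitHub already does on its side. Paths, file names, and the identity are placeholders.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A local "server" repository with one committed file.
src=$(mktemp -d) && dst=$(mktemp -d)
git -C "$src" init -q .
echo data > "$src/big.bin"
git -C "$src" add . && git -C "$src" commit -q -m 'add file'
git -C "$src" config uploadpack.allowFilter true   # let it honour --filter

# Blobless clone; --no-checkout so no blob is fetched for the working tree.
git clone -q --no-checkout --filter=blob:none "file://$src" "$dst/clone"

filter=$(git -C "$dst/clone" config remote.origin.partialclonefilter)
# Objects that are referenced but not present locally are printed with a "?" prefix.
missing=$(git -C "$dst/clone" rev-list --objects --missing=print HEAD | grep -c '^?' || true)

echo "filter recorded for origin: $filter"
echo "objects not yet downloaded: $missing"
```

Checking out the branch afterwards would trigger the lazy fetch of the missing blob from the remote.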

Sparse checkout — work in a subset of the tree

Sparse checkout writes only specified directories/files to your working tree. Combined with partial clone, this gives you a fast, small workspace inside a huge repo.

    # Enable sparse checkout on an existing clone
    $ git sparse-checkout init --cone    # cone mode: fast, directory-based patterns

    # Specify which directories you want
    $ git sparse-checkout set services/payments services/auth shared/utils

    # List current patterns
    $ git sparse-checkout list
    services/payments
    services/auth
    shared/utils

    # Add more directories without losing existing ones
    $ git sparse-checkout add services/notifications

Combined workflow: clone only what you'll work on

    # One-step: partial blobless clone + sparse checkout into one service
    $ git clone --filter=blob:none --sparse https://github.com/org/monorepo.git
    $ cd monorepo
    $ git sparse-checkout init --cone
    $ git sparse-checkout set services/payments

    # Result: only services/payments/ and the root files exist locally.
    # A 40 GB monorepo becomes a <200 MB local clone.
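The effect on the working tree can be checked in a tiny stand-in monorepo. The sketch below, which assumes a Git version with the `sparse-checkout` command (2.25 or later), commits three directories and a root file, then narrows the checkout to one service. The directory layout mirrors the examples above; file contents and the identity are placeholders.

```shell
set -eu
# Throwaway repo; identity and file contents are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q . && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir -p services/payments services/auth shared/utils
echo pay > services/payments/app.txt
echo auth > services/auth/app.txt
echo util > shared/utils/lib.txt
echo readme > README.md
git add . && git commit -q -m 'monorepo layout'

git sparse-checkout init --cone
git sparse-checkout set services/payments

# Only the selected directory and the root-level files remain on disk.
find . -path ./.git -prune -o -type f -print | sort
```

The other directories are still in the repository's history and index; `git sparse-checkout add` or `git sparse-checkout disable` brings them back to the working tree.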

Non-cone mode for fine-grained patterns

Cone mode only supports directory boundaries (fast, O(1) lookups). If you need file-level patterns, use non-cone mode — but it's significantly slower on large repos:

    $ git sparse-checkout init --no-cone    # cone mode is the default in recent Git, so opt out explicitly
    $ cat .git/info/sparse-checkout
    /*
    !/docs
    !/legacy
    !*.test.ts
| Technique | Reduces | Best for |
|---|---|---|
| Partial clone (blob:none) | Download size & initial clone time | Any large repo; CI runners cloning for a specific job |
| Sparse checkout (cone) | Working tree size; git status/add performance | Monorepos; devs working in one service area |
| Shallow clone (--depth=1) | History download | CI: build + test without needing full history; not suitable for git operations that need ancestry |
GITHUB ACTIONS NOTE
The actions/checkout action supports both partial clone and sparse checkout via its filter and sparse-checkout inputs (v4+). Use these in CI workflows that only need a single service's files to avoid cloning the entire monorepo on every run.

Up Next — Phase 2: Repository Architecture & Monorepo Strategy

Polyrepo vs monorepo decision frameworks, CODEOWNERS-driven ownership models, repository templates, rulesets, and the org-level .github repo.
