Phase 1: Git Internals & GitHub's Object Model — GitHub Advanced

1.1

Git's Content-Addressable Storage

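Git's model is snapshots, not diffs: each commit records the complete state of the tree, and every object in that snapshot is identified by the hash of its content (SHA-1 by default; SHA-256 in repositories created with --object-format=sha256). Git hashes a short type-and-size header together with the bytes, so identical content always gets the same ID. This is called content-addressable storage, and it's the foundation everything else builds on. (Packfiles do delta-compress objects on disk, but that is a storage optimisation underneath the model, not a change to it.)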
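A quick way to convince yourself that an object's name really is a hash of its content is to rebuild a blob ID by hand. This is an illustrative sketch that assumes a POSIX shell, Git's default SHA-1 object format, and either sha1sum (GNU coreutils) or shasum (macOS) on the PATH. greeting.txt is a made-up file. Git hashes a header, followed by the raw bytes. The header is the word blob, a space, the size in bytes, and a NUL byte.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore personal config (e.g. a SHA-256 default)
cd "$tmp"
git init -q repo
cd repo

# Use whichever SHA-1 tool exists (assumption: one of these two is installed).
sha1() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum -a 1; fi
}

printf 'test content\n' > greeting.txt
size=$(wc -c < greeting.txt | tr -d ' ')        # 13 bytes

# Header "blob 13" + NUL (made portably with tr), then the file's bytes.
manual=$( { printf 'blob %s@' "$size" | tr '@' '\000'; cat greeting.txt; } | sha1 | cut -d' ' -f1 )

# Git's own answer; without -w nothing is written to the object store.
by_git=$(git hash-object greeting.txt)

echo "manual: $manual"   # both print d670460b4b4aece5915caf5c68d12f560a9fe3e4
echo "git:    $by_git"
```

The expected value is the "test content" example from the Pro Git book. If you change one byte of greeting.txt, both hashes change together. That is why identical content anywhere in history collapses into a single blob.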

There are exactly four object types:

blob

Raw file content — just bytes. No filename, no path. Two files with identical content share one blob across the entire history.

tree

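A directory listing: each entry records a file mode, a name, and the SHA of either a blob (file) or another tree (subdirectory). A tree describes one directory level only; nested directories are separate tree objects it points to, not a recursive listing.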

commit

Points to exactly one tree (the root snapshot) and zero or more parent commits, and records the author and committer (each with a timestamp) plus the message. Change any of these and the commit's SHA changes.

tag

An annotated tag object — wraps another object (usually a commit) with a tagger name, date, and message. Lightweight tags are just refs, not objects.
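The lightweight/annotated distinction is easy to see with git cat-file -t, which prints an object's type. Here is a minimal sketch. It assumes a POSIX shell and a throwaway repository in a temp directory. The tag names and the Demo identity are placeholders.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo

echo "v1" > app.txt
git add app.txt
git commit -q -m "release candidate"
head=$(git rev-parse HEAD)

git tag v1.0-light                   # lightweight: only a ref, no new object
git tag -a v1.0 -m "Release 1.0"     # annotated: writes a tag object

light_type=$(git cat-file -t v1.0-light)        # the ref points straight at a commit
annot_type=$(git cat-file -t v1.0)              # the ref points at a tag object
annot_target=$(git rev-parse 'v1.0^{commit}')   # peel the tag to the commit it wraps

echo "v1.0-light -> $light_type"
echo "v1.0       -> $annot_type wrapping $annot_target"
git cat-file -p v1.0                  # object, type, tag, tagger, then the message
```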

You can inspect any object directly:

$ git cat-file -t a3f8d12
commit
$ git cat-file -p a3f8d12
tree 9f3a2b1c8d7e4f0a1b2c3d4e5f6a7b8c9d0e1f2a
parent 7b2c1a0d9e8f7c6b5a4d3c2b1a0f9e8d7c6b5a4d
author Alice <alice@example.com> 1718150400 +0530
committer Alice <alice@example.com> 1718150400 +0530

feat: add payment retry logic
$ git cat-file -p 9f3a2b1c        # inspect the tree
100644 blob d8e9f0a1b2c3    README.md
100644 blob 4c5d6e7f8a9b    package.json
040000 tree a1b2c3d4e5f6    src

The object graph for a simple two-commit repo looks like this:

# Object graph: each arrow is "points to" (stored as a SHA reference)
commit a3f8d12 ──► tree 9f3a2b1 ──► blob d8e9f0a  (README.md content)
     │                  │
     │                  ├──► blob 4c5d6e7  (package.json content)
     │                  │
     │                  └──► tree a1b2c3d  (src/ subtree)
     │
     └── parent ──► commit 7b2c1a0 ──► tree 8e9f0a1 ...

HEAD ──► refs/heads/main ──► a3f8d12

Three consequences of this model that matter in practice:

Identical content is stored once across all of history. Rename a file → new tree object, same blob. Edit a file → new blob, new tree(s) up to the root, new commit. Old blobs are kept as long as any commit still reaches them; only unreachable objects are ever removed, by git gc's prune step.
A commit SHA is a cryptographic fingerprint of the entire repository state — the tree, all its blobs, and all ancestor commits. You cannot silently change history without the SHA changing.
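Branches and tags are just refs: names that hold a SHA, stored as small files under .git/refs/ (or as lines in .git/packed-refs). Creating a branch is O(1): Git writes a single 41-byte file (40 hex characters plus a newline), however large the repository is.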
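All three consequences can be checked in a scratch repository. This sketch assumes a POSIX shell and Git's default files ref backend, where a new branch is a loose file under .git/refs/heads/. With packed refs or the newer reftable backend, the SHA is stored elsewhere and the last check would not apply. File names are placeholders.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo

printf 'same bytes\n' > a.txt
printf 'same bytes\n' > b.txt
git add a.txt b.txt
git commit -q -m "two files, identical content"
blob_a=$(git rev-parse HEAD:a.txt)
blob_b=$(git rev-parse HEAD:b.txt)
tree_before=$(git rev-parse 'HEAD^{tree}')

git mv a.txt renamed.txt              # rename: new tree, same blob
git commit -q -m "rename a.txt"
blob_renamed=$(git rev-parse HEAD:renamed.txt)
tree_after=$(git rev-parse 'HEAD^{tree}')

git branch experiment                 # writes one small ref file, nothing else
ref_file=.git/refs/heads/experiment
ref_bytes=$(wc -c < "$ref_file" | tr -d ' ')

echo "blobs a.txt / b.txt / renamed.txt: $blob_a $blob_b $blob_renamed"
echo "root tree: $tree_before -> $tree_after"
echo "$ref_file: $(cat "$ref_file") ($ref_bytes bytes)"
```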

INSIGHT

GitHub displays a commit SHA and lets you permalink to any tree state using it. The /tree/<sha>/ URL will always resolve to the exact snapshot — even after the branch is deleted — as long as the object hasn't been garbage-collected from the server.

1.2

Rebase, Merge & Cherry-Pick at the Object Level

Understanding what each operation actually does to the object graph eliminates the confusion about when history gets "rewritten."

Merge

A merge creates a new commit with two parents. The object graph gains a node; no existing objects are touched.

# Before merge: feature branched from main at B
main:    A ── B ── C
feature: A ── B ── D ── E

# After: git merge feature (run on main)
main:    A ── B ── C ── M    ← merge commit M has parents C and E
              ╲        ╱
               D ──── E      (feature)

No rewriting. C, D, and E are untouched. M is new. Fast-forward is just moving the branch ref pointer when main had no new commits since the branch diverged.
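Both cases can be reproduced in a scratch repository. The first is a true merge whose parents are the untouched C and E. The second is a fast-forward, which creates no commit at all. This sketch assumes a POSIX shell. commit_file is a hypothetical helper defined in the script. The branch names follow the diagram, and hotfix is an extra, illustrative branch.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: commit a one-line file named after the commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file A
commit_file B
git checkout -q -b feature
commit_file D
commit_file E
git checkout -q main
commit_file C
c_sha=$(git rev-parse main)
e_sha=$(git rev-parse feature)

git merge -q --no-edit feature        # histories diverged: creates merge commit M
m_parent1=$(git rev-parse 'HEAD^1')
m_parent2=$(git rev-parse 'HEAD^2')

# A branch strictly ahead of main fast-forwards: the ref moves, no commit is made.
git checkout -q -b hotfix
commit_file F
f_sha=$(git rev-parse HEAD)
git checkout -q main
git merge -q --ff-only hotfix
main_after_ff=$(git rev-parse main)

echo "M parents: $m_parent1 (C) and $m_parent2 (E)"
echo "feature still at E: $(git rev-parse feature)"
echo "after fast-forward, main = hotfix tip: $main_after_ff"
```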

Rebase

Rebase replays commits as new objects on top of the target. It does not move commits — it creates new ones with new SHAs.

# git rebase main (run on feature)
# Before:
main:    A ── B ── C
feature: A ── B ── D ── E
# After:
main:    A ── B ── C
feature: A ── B ── C ── D' ── E'    ← D' and E' are NEW objects (new SHAs)
# The old D and E are no longer on any branch; git gc prunes them once their reflog entries expire

RULE

Never rebase a branch that other developers have based work on. After the rebase the branch points at D' and E' instead of D and E, so anyone who built on D or E now has a divergent history that's painful to reconcile. Rebase is safe on your own local/feature branches before a PR; never rebase main or a shared branch.
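The sketch below reproduces the diagram in a throwaway repository and checks three claims. First, D and E get new SHAs. Second, the change each one introduces is byte-for-byte the same. Third, the old E is still in the object store until garbage collection eventually removes it. It assumes a POSIX shell, and commit_file is a hypothetical helper.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: commit a one-line file named after the commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file A
commit_file B
git checkout -q -b feature
commit_file D
commit_file E
git checkout -q main
commit_file C
old_d=$(git rev-parse 'feature~1')
old_e=$(git rev-parse feature)

git checkout -q feature
git rebase -q main                    # replays D and E on top of C
new_d=$(git rev-parse 'HEAD~1')
new_e=$(git rev-parse HEAD)

# Same change, different object: compare the patches, not the SHAs.
old_patch=$(git diff "$old_e~1" "$old_e")
new_patch=$(git diff "$new_e~1" "$new_e")

echo "D: $old_d -> $new_d"
echo "E: $old_e -> $new_e"
echo "old E is still stored as a: $(git cat-file -t "$old_e")"
```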

Cherry-Pick

Cherry-pick copies the diff introduced by a commit and applies it as a new commit on the current branch. The new commit has a different SHA and a different parent — it is not the same object, even though it introduces the same change.

# git cherry-pick E (run on main)
main:    A ── B ── C ── E'          ← E' is a new commit with the same diff as E
feature: A ── B ── C ── D ── E
# E and E' are different objects despite identical diffs
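git patch-id hashes only the diff a commit introduces. That lets it show that E and E' carry the same change even though their SHAs differ. This sketch runs in a throwaway repository and assumes a POSIX shell. commit_file is a hypothetical helper. As in the diagram, feature branches from C.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: commit a one-line file named after the commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file A
commit_file B
commit_file C
git checkout -q -b feature
commit_file D
commit_file E
e_sha=$(git rev-parse feature)

git checkout -q main
git cherry-pick "$e_sha" >/dev/null   # copy E's change onto C as a new commit E'
picked=$(git rev-parse HEAD)

# patch-id ignores SHAs, parents and dates; it only looks at the diff.
id_e=$(git show "$e_sha" | git patch-id --stable | cut -d' ' -f1)
id_picked=$(git show "$picked" | git patch-id --stable | cut -d' ' -f1)

echo "E : $e_sha  patch-id $id_e"
echo "E': $picked  patch-id $id_picked"
```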

Operation	Creates new objects?	Mutates existing?	Produces merge commit?	Linear history?
Merge	1 commit (merge commit)	No	Yes	No
Fast-forward merge	No	No (just moves ref)	No	Yes
Rebase	N new commits (one per replayed commit)	No (old ones orphaned)	No	Yes
Squash merge	1 commit (squashed)	No	No	Yes
Cherry-pick	1 commit per picked commit	No	No	Yes

1.3

Reflog: Recovering Lost Commits

The reflog is a local, per-repository journal that records every time a ref (branch, HEAD) changes — including resets, rebases, and amends. It is your safety net. It is not pushed to GitHub.

$ git reflog
a3f8d12 (HEAD -> main) HEAD@{0}: commit: feat: add retry logic
7b2c1a0 HEAD@{1}: rebase (finish): returning to refs/heads/main
7b2c1a0 HEAD@{2}: rebase (pick): fix: null pointer in checkout
3e1c9d8 HEAD@{3}: rebase (start): checkout main
9a4b5f1 HEAD@{4}: commit: wip: half-done retry
b8c7d2e HEAD@{5}: checkout: moving from feature to main

Scenario 1 — Accidental reset

$ git reset --hard HEAD~3     # oops, lost 3 commits
$ git reflog                  # find the SHA from before the reset
...
a3f8d12 HEAD@{1}: commit: feat: the thing I just lost
$ git reset --hard a3f8d12    # restore
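Scenario 1 can be rehearsed safely in a scratch repository. This sketch assumes a POSIX shell. It restores with HEAD@{1}, meaning "where HEAD was one move ago", instead of copying a SHA. That shortcut is only right when the reset was the last thing that moved HEAD. Otherwise, read the SHA from git reflog as shown above.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo

for n in 1 2 3 4; do
  echo "$n" > counter.txt
  git add counter.txt
  git commit -q -m "step $n"
done
lost_tip=$(git rev-parse HEAD)

git reset -q --hard HEAD~3            # oops: steps 2-4 are no longer on the branch
echo "after reset:   $(git log -1 --format=%s)"
git reflog -n 2                       # HEAD@{0} is the reset, HEAD@{1} the lost tip

git reset -q --hard 'HEAD@{1}'        # restore
echo "after restore: $(git log -1 --format=%s)"
```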

Scenario 2 — Detached HEAD rescue

When you check out a commit SHA directly, you enter "detached HEAD" state. Commits you make there belong to no branch; once you switch away, only the HEAD reflog still references them, and they become eligible for garbage collection when that entry expires.

$ git checkout a3f8d12             # detached HEAD
$ git commit -m "experiment"       # creates f9e8d7c, not on any branch
$ git checkout main                # f9e8d7c is now "lost"
# Recovery: the reflog still has it
$ git reflog
...
f9e8d7c HEAD@{1}: commit: experiment
$ git branch recover-experiment f9e8d7c   # attach a branch; now it's safe
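Here is the same rescue, scripted end to end in a throwaway repository. It assumes a POSIX shell. Immediately after switching away, HEAD@{1} names the orphaned commit, so the branch can be created without copying its SHA.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q --detach              # detached HEAD at the current commit
echo idea > file.txt
git commit -q -am "experiment"        # belongs to no branch
experiment=$(git rev-parse HEAD)
git checkout -q main                  # "lost": only the HEAD reflog remembers it

# Right after the checkout, HEAD@{1} is the experiment commit.
git branch recover-experiment 'HEAD@{1}'
git log -1 --format='%h %s' recover-experiment
```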

Scenario 3 — After a bad interactive rebase

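# Rebase went wrong: find the pre-rebase state
$ git reflog | grep "rebase (start)"
3e1c9d8 HEAD@{8}: rebase (start): checkout main
# That entry is the commit the rebase moved ONTO (main's tip), not your old branch.
# Your pre-rebase tip is the entry just below it in the reflog: HEAD@{9}
$ git reset --hard HEAD@{9}        # back to before the rebase started
# Immediately after the rebase, git reset --hard ORIG_HEAD does the same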
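This sketch checks the two facts the recovery depends on. The "(start)" reflog entry points at the commit the rebase moved onto, and ORIG_HEAD holds the pre-rebase tip. It assumes a POSIX shell, a throwaway repository, and a hypothetical commit_file helper. It reads reflog subjects with git log -g --format='%H %gs'.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: commit a one-line file named after the commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file A
commit_file B
git checkout -q -b feature
commit_file D
commit_file E
git checkout -q main
commit_file C
onto=$(git rev-parse main)
git checkout -q feature
before=$(git rev-parse HEAD)

git rebase -q main

# The "(start)" entry records the commit the rebase moved ONTO, not the old tip.
start_entry=$(git log -g --format='%H %gs' HEAD | awk '/\(start\)/ && !seen { print $1; seen = 1 }')
echo "rebase (start) entry: $start_entry   main's tip: $onto"

# ORIG_HEAD is the pre-rebase tip until another reset/merge/rebase overwrites it.
git reset -q --hard ORIG_HEAD
echo "feature restored to:  $(git rev-parse HEAD)   was: $before"
```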

IMPORTANT

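Reflog entries expire. By default, entries for commits still reachable from the ref expire after 90 days (gc.reflogExpire) and entries for unreachable commits after 30 days (gc.reflogExpireUnreachable). git gc, which Git also runs automatically, deletes expired entries and can then prune the commits they were protecting. Don't wait weeks to rescue a lost commit.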

SENIOR TIP

git fsck --lost-found writes dangling blobs and commits to .git/lost-found/ — a last resort when reflog is insufficient. Also useful for recovering accidentally deleted stash entries.
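A dropped stash is a typical case. git stash drop deletes the stash ref and its reflog entry, so the stash commit becomes dangling. Here is a sketch of the rescue in a throwaway repository. It assumes a POSIX shell, and notes.txt is a placeholder.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "base"

echo "half-finished idea" >> notes.txt
git stash -q                          # change saved as a stash commit under refs/stash
stash_sha=$(git rev-parse 'stash@{0}')
git stash drop -q                     # ref and reflog entry gone: the commit is dangling

# fsck names a file after each dangling commit in .git/lost-found/commit/
git fsck --lost-found 2>/dev/null
ls .git/lost-found/commit/

# A stash-shaped commit can be re-applied by its SHA.
git stash apply -q "$stash_sha"
cat notes.txt
```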

1.4

Interactive Rebase: Sculpting History Before a PR

Interactive rebase (git rebase -i) lets you rewrite any sequence of commits before they go public. Used well, it produces a clean history that's easy to review and bisect. Used poorly, it destroys context.

$ git rebase -i HEAD~5    # rewrite the last 5 commits
$ git rebase -i main      # rewrite all commits since branching from main

The editor opens with a todo list: one line per commit, oldest first (the reverse of git log), each prefixed with a command:

pick a3f8d12 feat: add payment retry logic
pick 7b2c1a0 fix: typo in error message
pick 3e1c9d8 wip: forgot to remove debug log
pick 9a4b5f1 fix: actually fix the debug log
pick b8c7d2e test: add unit tests for retry

Commands and when to use them

Command	What it does	When to use
`pick`	Keep the commit as-is	Default — commits that are clean and self-contained
`reword`	Keep the commit, edit the message	Improving message quality before PR review
`edit`	Pause here so you can amend the commit	Splitting a commit into two, or adding a missed file
`squash`	Merge into the previous commit; prompts for combined message	Combining "wip" and its follow-up fix into one commit
`fixup`	Like squash but silently discards this commit's message	Tiny corrections — the parent commit's message is correct
`drop`	Delete the commit entirely	Removing experiment commits, reverted code, debug commits
`exec`	Run a shell command after this step	Running tests at each commit during a complex rebase

Cleaning up the example above before a PR:

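pick a3f8d12 feat: add payment retry logic
fixup 3e1c9d8 wip: forgot to remove debug log
fixup 9a4b5f1 fix: actually fix the debug log
reword 7b2c1a0 fix: typo in error message
pick b8c7d2e test: add unit tests for retry

Result: three clean commits. Moving the two debug-log lines directly under the feature commit and marking them fixup folds them into it silently (fixup always targets the line above it). For reword, Git opens the editor when it reaches that commit; that is where you type the new message, e.g. "fix: correct error message wording in retry handler" (editing the text on the todo line itself has no effect). The reviewer sees intent, not the journey.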
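If you record corrections with git commit --fixup=<commit> as you go, git rebase -i --autosquash writes the reordered fixup lines into the todo list for you. This sketch uses a POSIX shell, a throwaway repository, and illustrative file names. Setting GIT_SEQUENCE_EDITOR=: accepts the generated list without opening an editor, which makes the clean-up scriptable.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; GIT_EDITOR=: guarantees no interactive editor ever opens.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

printf 'base\n' > payment.js
git add payment.js
git commit -q -m "chore: initial"
git checkout -q -b feature

printf 'base\nretry()\n' > payment.js
git commit -q -am "feat: add payment retry logic"
feat=$(git rev-parse HEAD)

echo "it retries" > retry.test.js
git add retry.test.js
git commit -q -m "test: add unit tests for retry"

# A late correction to the feature commit, recorded as "fixup! feat: ..."
printf 'base\nretryWithBackoff()\n' > payment.js
git commit -q -a --fixup="$feat"
git log --format=%s main..HEAD        # three commits, the fixup on top

# --autosquash moves the fixup under its target; ":" accepts that todo list as-is.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash main
git log --format=%s main..HEAD        # two clean commits remain
```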

Splitting a commit with `edit`

# 1. Mark the commit as 'edit' in the rebase todo list
# 2. Git stops after replaying that commit; HEAD is now that commit
$ git reset HEAD~          # undo the commit but keep its changes in the working tree, unstaged
$ git add src/payment.js
$ git commit -m "feat: add retry logic to payment service"
$ git add src/email.js
$ git commit -m "feat: add retry logic to email sender"
$ git rebase --continue    # resume the rebase
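The split can be rehearsed in a throwaway repository. In this sketch, a hypothetical stand-in editor script called mark-edit changes the first todo line from pick to edit, which you would otherwise do by hand. The rest follows the steps above. It assumes a POSIX shell with sed.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; GIT_EDITOR=: guarantees no interactive editor ever opens.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

echo "base" > README.md
git add README.md
git commit -q -m "chore: initial"
mkdir src
echo "retry payments" > src/payment.js
echo "retry emails" > src/email.js
git add src
git commit -q -m "feat: add retry logic"          # one commit, two concerns

# Hypothetical stand-in editor: turn the first todo line's "pick" into "edit".
cat > "$tmp/mark-edit" <<'EOF'
#!/bin/sh
sed '1s/^pick /edit /' "$1" > "$1.new" && mv "$1.new" "$1"
EOF
chmod +x "$tmp/mark-edit"
GIT_SEQUENCE_EDITOR="$tmp/mark-edit" git rebase -q -i HEAD~1   # stops at the commit

git reset -q HEAD~                    # undo the commit, keep its changes unstaged
git add src/payment.js
git commit -q -m "feat: add retry logic to payment service"
git add src/email.js
git commit -q -m "feat: add retry logic to email sender"
git rebase --continue >/dev/null

git log --format=%s                   # two focused commits replace the original
```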

TEAM POLICY SUGGESTION

Establish a branch hygiene rule: squash or fixup WIP commits before requesting review, but preserve multi-commit structure when each commit is genuinely independent (the reviewer can review commit-by-commit). Enforce this via PR description template rather than a forced-squash-merge policy — the latter loses intentional commit structure.

1.5

Merge Strategies: recursive, ort, octopus, ours

When you run git merge, Git picks a merge strategy to combine the histories. Most developers never need to specify this explicitly, but understanding what happens explains conflict patterns and resolution behaviour.

Strategy	When used	How it works
ort	Default since Git 2.34 (replaces recursive)	"Ostensibly Recursive's Twin", a rewrite of recursive. Finds the merge base, runs a three-way merge per file, and handles criss-cross merges by recursively merging the merge bases. Faster and produces fewer spurious conflicts than its predecessor.
recursive	Default in Git <2.34; still available explicitly	Same concept as ort but older implementation. Use `-X theirs` or `-X ours` to auto-resolve conflicts in one side's favour.
octopus	Merging 3+ branches in one command	Designed for merging multiple feature branches simultaneously. Refuses to proceed if any conflict requires manual resolution — it's for clean, independent branches only.
ours	Explicit `-s ours`	Records a merge commit but takes 100% of the current branch's tree. The other branch's changes are discarded. Useful for "officially" merging a branch you're abandoning without incorporating its changes.
subtree	Working with git subtrees	Like recursive, but Git attempts to recognise that one repo is a subdirectory of the other and adjusts the diff accordingly.
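Naming more than one branch in git merge selects octopus automatically. This sketch runs in a throwaway repository and assumes a POSIX shell. commit_file is a hypothetical helper. main gets a commit of its own first, so no branch can simply be fast-forwarded and the result is a genuine three-parent merge.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: commit a one-line file named after the commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git checkout -q -b feat-a
commit_file a
git checkout -q main
git checkout -q -b feat-b
commit_file b
git checkout -q main
commit_file main-only                 # main moves too, so nothing can fast-forward

git merge -q --no-edit feat-a feat-b  # 3 heads in total: Git picks octopus
parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "merge commit has $parents parents"
git log -1 --format=%s
```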

Strategy options (`-X`)

The recursive/ort strategies accept options via -X:

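$ git merge -X theirs feature               # conflicting hunks: take feature's side ("theirs")
$ git merge -X ours feature                 # conflicting hunks: keep the current branch's side ("ours")
$ git merge -X ignore-space-change feature  # treat changes in the amount of whitespace as no change
# With either -X theirs or -X ours, non-conflicting changes from both sides still merge normally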

CAUTION

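-X theirs and -X ours silently discard one side's changes wherever the two sides conflict. Use them only when you genuinely want one side to win every conflict, for example when merging a dependency update branch where only one version should survive. Never use them as a "just make it compile" shortcut on application logic. Also don't confuse -X ours (prefer our side in conflicting hunks, still merge everything else) with -s ours (ignore the other branch's changes entirely).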
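This scratch-repo comparison runs -X theirs, -X ours and -s ours against the same conflict. config.ini and feature.txt are placeholders. Both branches edit the same line of config.ini, which conflicts. Only feature adds feature.txt, which does not conflict. merge_and_inspect is a hypothetical helper that merges, records the outcome and rewinds main. The sketch assumes a POSIX shell.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

echo "timeout=30" > config.ini
git add config.ini
git commit -q -m "base config"

git checkout -q -b feature
echo "timeout=90" > config.ini        # conflicts with main's edit below
echo "new feature" > feature.txt      # no conflict
git add config.ini feature.txt
git commit -q -m "feature: timeout 90, add feature.txt"

git checkout -q main
echo "timeout=60" > config.ini
git commit -q -am "main: timeout 60"
main_tip=$(git rev-parse HEAD)

# Hypothetical helper: merge feature with the given options, record the outcome, rewind main.
merge_and_inspect() {
  git merge -q --no-edit "$@" feature
  result="config=$(cat config.ini) feature.txt=$([ -f feature.txt ] && echo yes || echo no) parents=$(git cat-file -p HEAD | grep -c '^parent ')"
  git reset -q --hard "$main_tip"
}

merge_and_inspect -X theirs; x_theirs=$result
merge_and_inspect -X ours;   x_ours=$result
merge_and_inspect -s ours;   s_ours=$result
printf '%-10s %s\n' "-X theirs" "$x_theirs" "-X ours" "$x_ours" "-s ours" "$s_ours"
```

All three still record a two-parent merge commit. Only -s ours also throws away feature.txt, the change that never conflicted.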

GitHub's merge button uses the ort strategy

When GitHub performs the server-side merge (the green "Merge pull request" button with the default "Create a merge commit" option), it does the equivalent of git merge --no-ff with the default ort strategy. If GitHub reports "This branch has conflicts that must be resolved," ort couldn't auto-resolve them: check out your branch locally, merge the base branch into it (or rebase onto it), fix the conflicts, and push.

1.6

GitHub's Three Merge Modes

GitHub exposes three merge options per repository (configurable under Settings → General → Pull Requests). Understanding their object-level effects helps you choose the right default for your team.

── Merge commit (default) ──────────────────────────────────
# Equivalent to: git merge --no-ff
main:    A ── B ── C ──────── M    (M has two parents: C and E)
              ╲              ╱
feature:       D ── E ──────
History: complete, non-linear. Every commit from feature is preserved.
Bisect: sees every original commit; clean only if each of them builds and passes tests on its own.
Log readability: complex on busy repos.

── Squash and merge ─────────────────────────────────────────
# Equivalent to: git merge --squash feature && git commit
main:    A ── B ── C ── S    (S contains D+E squashed; its only parent is C)
feature:       D ── E        (unchanged, but not an ancestor of S)
History: linear, clean. One PR = one commit on main.
Bisect: each point on main = one merged PR, so it's easy to identify which PR broke things.
Drawback: if work continues on the feature branch, merging it again can conflict; its internal commits never reach main.

── Rebase and merge ─────────────────────────────────────────
# Equivalent to: git rebase main && git merge --ff-only
main:    A ── B ── C ── D' ── E'    (D' and E' are rebased copies of D and E)
feature:       D ── E               (original commits, not on main)
History: linear, preserves individual commits.
Bisect: finest granularity; individual commits land on main.
Drawback: commits get new SHAs. GitHub still links them to the merged PR, but they no longer match the SHAs on the feature branch.

Mode	History shape	Preserves commits	Best for
Merge commit	Non-linear (diamond)	Yes — original SHAs	Teams that value full audit trails; repos where each commit must be independently deployable
Squash merge	Linear	No — squashed into one	Teams with messy intermediate commits; products where one PR = one deployable unit
Rebase merge	Linear	Yes — as replayed copies	Teams with disciplined commit hygiene; open-source projects that value per-commit history
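The object-level differences can be approximated locally with the plain Git commands from the "Equivalent to" lines above. This sketch shows those equivalents, not GitHub's server-side implementation. It assumes a POSIX shell, a throwaway repository, and a hypothetical commit_file helper.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: commit a one-line file named after the commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file A
commit_file B
git checkout -q -b feature
commit_file D
commit_file E
git checkout -q main
commit_file C
base=$(git rev-parse main)

# "Create a merge commit" ~ git merge --no-ff: a two-parent commit M.
git merge -q --no-ff --no-edit feature
merge_parents=$(git cat-file -p HEAD | grep -c '^parent ')
git reset -q --hard "$base"

# "Squash and merge" ~ git merge --squash + commit: one single-parent commit S.
git merge -q --squash feature
git commit -q -m "Add feature (squashed)"
squash_parents=$(git cat-file -p HEAD | grep -c '^parent ')
if git merge-base --is-ancestor feature main; then squash_keeps_e=yes; else squash_keeps_e=no; fi
git reset -q --hard "$base"

# "Rebase and merge" ~ replay D and E onto main, then fast-forward main.
git checkout -q -b rebase-tmp feature
git rebase -q main
git checkout -q main
git merge -q --ff-only rebase-tmp
rebased_commits=$(git rev-list --count "$base"..main)
rebased_tip=$(git rev-parse main)

echo "merge commit: M has $merge_parents parents"
echo "squash: S has $squash_parents parent; E is an ancestor of main: $squash_keeps_e"
echo "rebase: $rebased_commits new commits on main; tip $rebased_tip vs original E $(git rev-parse feature)"
```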

LEAD RECOMMENDATION

Pick one merge method per repo and enforce it: allow only that method under Settings → General → Pull Requests, and add the branch protection rule "Require linear history" if merge commits should be blocked outright. Mixed strategies on main make tooling (changelog generators, bisect scripts, deploy pipelines) complicated. Squash merge works well for most product teams; rebase merge works well for library/open-source repos with disciplined contributors.

1.7

Git Large File Storage (LFS)

Git performs poorly with large binary files (design assets, ML model weights, compiled binaries, test fixtures over a few MB). Every version of a 50 MB binary is a separate blob, binary formats rarely delta-compress well in packfiles, and a normal clone downloads every version ever committed, so clone time and repo size keep growing with each update. Git LFS solves this by replacing large files in the repo with small pointer files and storing the actual content on a separate LFS server.

What actually lives in the repo

# Without LFS, the binary IS in Git's object store:
blob a4b3c2d1...   ← 47 MB model.pkl stored as a Git blob

# With LFS, the repo contains only a tiny text pointer file:
blob f1e2d3c4...   ← pointer contents:
version https://git-lfs.github.com/spec/v1
oid sha256:9b4e...c3f1
size 49283145

# The actual 47 MB lives on the LFS server and is fetched on demand

Setting up LFS

# Install once per machine
$ git lfs install
# Track file patterns in this repo (git lfs track writes them to .gitattributes, which you commit)
$ git lfs track "*.psd"
$ git lfs track "models/**/*.pkl"
$ git add .gitattributes
$ git commit -m "chore: track PSD and model files via LFS"
# Verify what's tracked
$ git lfs ls-files
9b4ec3f1 * models/v2/classifier.pkl
a7b8c9d0 * assets/hero.psd
# Show LFS files that are staged or modified in the working tree
$ git lfs status
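Under the hood, git lfs track appends lines like the ones below to .gitattributes. The filter=lfs attribute is what routes matching files through LFS. This sketch writes those lines by hand so it runs without git-lfs installed. It then uses plain git check-attr to confirm which paths match. It assumes a POSIX shell, and the paths are the illustrative ones from above.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore personal config
cd "$tmp"
git init -q repo
cd repo

# What git lfs track "*.psd" and git lfs track "models/**/*.pkl" append:
cat > .gitattributes <<'EOF'
*.psd filter=lfs diff=lfs merge=lfs -text
models/**/*.pkl filter=lfs diff=lfs merge=lfs -text
EOF

# check-attr only evaluates patterns, so the files don't need to exist.
psd=$(git check-attr filter -- assets/hero.psd)
pkl=$(git check-attr filter -- models/v2/classifier.pkl)
readme=$(git check-attr filter -- README.md)
psd_text=$(git check-attr text -- assets/hero.psd)   # -text: never treat as text

printf '%s\n' "$psd" "$pkl" "$readme" "$psd_text"
```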

GitHub LFS quotas

Plan	Free storage	Free bandwidth/month	Overage
Free / Pro	1 GB	1 GB	$0.07/GB storage, $0.0875/GB bandwidth
Team / Enterprise	1 GB (base)	1 GB (base)	Same rates; data packs available

IMPORTANT

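By default, LFS files are not included in repository archives (the "Download ZIP" button and other source-code archives); the archive contains the pointer files instead. GitHub has a repository setting to include Git LFS objects in archives, but if your project needs downloadable releases that include large assets, attach them explicitly to a GitHub Release rather than relying on LFS.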

When LFS is worth the overhead

Binary assets that change frequently (design files, ML datasets)
Any single file over ~5 MB that will be updated across the project lifetime
Test fixture archives, pre-built binaries checked in for reproducibility

When not to use LFS: build artifacts that should live in an artifact registry (GitHub Packages, S3, Artifactory), generated files that shouldn't be in source control at all, or files that are added once and never changed (keep them as regular blobs, as long as each stays under GitHub's 100 MiB per-file limit).
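To find candidates in an existing repo, list every blob in history with its size. This sketch assumes a POSIX shell, awk, and a scratch repository. The 5 MiB threshold mirrors the rule of thumb above, and model.bin is a stand-in file.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo

head -c 6000000 /dev/zero > model.bin    # ~5.7 MiB stand-in binary
printf 'small\n' > notes.txt
git add model.bin notes.txt
git commit -q -m "add files"

threshold=$((5 * 1024 * 1024))
# Every object reachable from any ref, with type, uncompressed size and path.
large=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk -v min="$threshold" '$1 == "blob" && $2 >= min { print $2, $3 }')

echo "blobs over 5 MiB: $large"
```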

1.8

Sparse Checkout & Partial Clone

When a monorepo grows to hundreds of thousands of files, a full clone and checkout take minutes and consume gigabytes of disk. Git provides two complementary tools to reduce this: partial clone (download fewer objects) and sparse checkout (write fewer files to the working tree).

Partial clone — fetch only what you need

Partial clone defers downloading blob content. You clone the tree and commit graph without downloading every file's content — blobs are fetched lazily when you actually read the file.

# Blobless clone: download commits and trees; fetch blobs on demand
$ git clone --filter=blob:none https://github.com/org/monorepo.git
# Treeless clone: even more aggressive; only commits are downloaded up front
$ git clone --filter=tree:0 https://github.com/org/monorepo.git
# Check which filter a clone uses
$ git config --get remote.origin.partialclonefilter
blob:none
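Partial clone can be tried without GitHub by cloning over file://. This sketch assumes a POSIX shell and a recent Git. The two uploadpack settings let the local source repo serve filtered clones and single-object fetches; GitHub's servers already support partial clone. file.txt is a placeholder.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"

# Source repo with two versions of one file.
git init -q src
cd src
git symbolic-ref HEAD refs/heads/main
echo v1 > file.txt
git add file.txt
git commit -q -m "v1"
echo v2 > file.txt
git commit -q -am "v2"
# Allow filtered clones and later on-demand fetches of single objects.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
cd "$tmp"

# Blobless clone: commits and trees now; blobs only when something needs them.
git clone -q --filter=blob:none "file://$tmp/src" partial
cd partial
filter=$(git config --get remote.origin.partialclonefilter)

# --missing=print lists objects not present locally, prefixed with "?".
missing=$(git rev-list --objects --all --missing=print | grep -c '^?' || true)
echo "filter=$filter, objects not yet downloaded: $missing"

# Reading the old version fetches just that blob from the promisor remote.
old=$(git show HEAD~1:file.txt)
echo "HEAD~1:file.txt = $old"
```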

Sparse checkout — work in a subset of the tree

Sparse checkout writes only specified directories/files to your working tree. Combined with partial clone, this gives you a fast, small workspace inside a huge repo.

# Enable sparse checkout on an existing clone
$ git sparse-checkout init --cone      # cone mode: fast, directory-based patterns
# Specify which directories you want
$ git sparse-checkout set services/payments services/auth shared/utils
# List current patterns
$ git sparse-checkout list
services/payments
services/auth
shared/utils
# Add more directories without losing existing ones
$ git sparse-checkout add services/notifications
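Here is a sketch of cone mode in a scratch repository. It assumes a POSIX shell, and the directory names mirror the example above. In cone mode, files directly in the repository root always stay checked out. Everything outside the chosen directories disappears from the working tree but stays in history.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Throwaway identity; HOME points at the temp dir so personal config is ignored.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=: \
  GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com \
  GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$tmp"
git init -q repo
cd repo

mkdir -p services/payments services/auth shared/utils docs
for dir in services/payments services/auth shared/utils docs; do
  echo "$dir" > "$dir/README.txt"
done
echo "root" > README.md
git add .
git commit -q -m "monorepo layout"

git sparse-checkout init --cone                        # start with root files only
git sparse-checkout set services/payments shared/utils
git sparse-checkout list
find . -path ./.git -prune -o -type f -print | sort    # what is actually on disk
```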

Combined workflow: clone only what you'll work on

# One step: blobless partial clone with sparse checkout enabled
$ git clone --filter=blob:none --sparse https://github.com/org/monorepo.git
$ cd monorepo
$ git sparse-checkout init --cone
$ git sparse-checkout set services/payments
# Result: only services/payments/ and the root files exist locally, and only their
# blobs have been downloaded, so disk use scales with the directories you selected
# rather than with the whole monorepo.

Non-cone mode for fine-grained patterns

Cone mode only supports directory boundaries (fast, O(1) lookups). If you need file-level patterns, use non-cone mode — but it's significantly slower on large repos:

$ git sparse-checkout set --no-cone '/*' '!/docs' '!/legacy' '!*.test.ts'   # explicit --no-cone: cone mode is the default since Git 2.37
$ cat .git/info/sparse-checkout
/*
!/docs
!/legacy
!*.test.ts

Technique	Reduces	Best for
Partial clone (blob:none)	Download size & initial clone time	Any large repo; CI runners cloning for a specific job
Sparse checkout (cone)	Working tree size; git status/add performance	Monorepos; devs working in one service area
Shallow clone (`--depth=1`)	History download	CI: build + test without needing full history; not suitable for git operations that need ancestry

GITHUB ACTIONS NOTE

The actions/checkout action supports both partial clone and sparse checkout via its filter and sparse-checkout inputs (v4+). Use these in CI workflows that only need a single service's files to avoid cloning the entire monorepo on every run.

Up Next — Phase 2: Repository Architecture & Monorepo Strategy
