Deep Investigation: The Index Demystified
My Own Git Investigation Notes
Deep diving into questions that puzzled me while learning Git internals
The Big Questions That Puzzled Me
- "How the hell does `git write-tree` or `git commit` get all that information from the index only?"
- "What happens to the index when we change branches?"
- "Does `git add` create trees too, or only blobs?"
Let's investigate!
What Exactly IS the Index?
Index Location
```
.git/index   ← binary file containing the staging area
```
What the Index Stores (Per File)
| Field | Description | Example |
|---|---|---|
| SHA-1 hash | Pointer to blob object | ce013625030ba8dba906f756967f9e9ca394464a |
| File path | Full path from repo root | src/components/header.js |
| File mode | Object type and permissions | 100644 (regular), 100755 (executable), 120000 (symlink) |
| Stage number | For merge conflicts (0-3) | 0 (normal) |
| Timestamps | ctime, mtime for change detection | 1706300000 |
| File size | For quick comparison | 1234 bytes |
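A minimal sketch for seeing these fields yourself, assuming a POSIX shell, a reasonably recent Git, and a throwaway temp directory (the file name `test.txt` is only for illustration). The raw index file starts with the 4-byte signature `DIRC` ("dircache"), and `git ls-files --debug` prints the cached stat data (ctime, mtime, size, and so on) that Git compares for fast change detection.

```bash
set -eu
cd "$(mktemp -d)"            # throwaway repo, nothing real is touched
git init -q
echo "hello" > test.txt
git add test.txt

# The binary index file begins with the signature "DIRC"
head -c 4 .git/index; echo

# Mode, blob SHA, stage number and path for each entry
git ls-files --stage

# --debug adds the cached stat data: ctime, mtime, dev/ino, uid/gid, size, flags
git ls-files --debug
```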
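Critical Insight: The Index is FLAT!
This is the KEY to understanding Git!
The index does NOT store tree objects. It's a flat list of entries sorted by full path! (Strictly speaking, the file may also carry an optional cache-tree extension that remembers tree SHAs from an earlier write-tree or commit, purely as a speed-up; the entries themselves are still flat.)
```
# Conceptual view of the entries (the real file is binary):
100644 ce013625... 0 README.md
100644 a1b2c3d4... 0 src/app.js
100644 c9d0e1f2... 0 src/components/footer.js
100644 e5f6a7b8... 0 src/components/header.js
100644 a3b4c5d6... 0 src/utils/helpers.js
```
No nested folders - just flat paths!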
Index vs Repository Structure
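A small sketch contrasting the two views, assuming a throwaway repo, a hypothetical demo identity (`Index Lab <lab@example.com>`) and made-up file names. `git ls-files` prints one flat line per file with its full path, while `git ls-tree` walks the committed snapshot one directory level at a time, so `src` shows up as its own `tree` entry.

```bash
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name  "Index Lab"         # hypothetical demo identity
git config user.email "lab@example.com"
mkdir -p src/components
echo "readme" > README.md
echo "app"    > src/app.js
echo "header" > src/components/header.js
git add .
git commit -q -m "snapshot"

echo "== index: flat, full paths =="
git ls-files --stage

echo "== root tree: nested, one level at a time =="
git ls-tree HEAD

echo "== the src/ subtree =="
git ls-tree HEAD:src
```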
Inspecting the Index
Command: git ls-files --stage
This is your best friend for understanding the index!
```bash
git ls-files --stage
```
Output:
```
100644 ce013625030ba8dba906f756967f9e9ca394464a 0	README.md
100644 a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0 0	src/app.js
100644 e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4 0	src/components/header.js
```
Fields explained:
```
100644 ce013625... 0 README.md
│      │           │ │
│      │           │ └── File path (separated from the stage number by a tab)
│      │           └──── Stage number (0 = normal; during a merge conflict 1 = base, 2 = ours, 3 = theirs)
│      └──────────────── Blob SHA-1 hash
└─────────────────────── File mode (100644 = regular, 100755 = executable)
```
More Index Inspection Commands
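One inspection worth doing at least once: the stage-number column only becomes interesting during a merge conflict. This sketch assumes a throwaway repo, a hypothetical demo identity and a made-up branch name `other`. After the conflicting merge, the single path `file.txt` occupies three index entries: stage 1 (common ancestor), 2 (ours) and 3 (theirs).

```bash
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name  "Index Lab"         # hypothetical demo identity
git config user.email "lab@example.com"
echo "base" > file.txt
git add file.txt && git commit -q -m "base"
base=$(git symbolic-ref --short HEAD)     # works whatever the default branch is called

git checkout -q -b other
echo "theirs" > file.txt && git commit -q -am "theirs"

git checkout -q "$base"
echo "ours" > file.txt && git commit -q -am "ours"

# The merge stops with a conflict; '|| true' keeps the script going
git merge other || true

# One path, three entries: stages 1, 2 and 3
git ls-files --stage
```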
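```bash
# Show all files in the index
git ls-files
# Show mode, blob SHA and stage number for each entry (short for --stage)
git ls-files -s
# Show files deleted from the working tree but still in the index
git ls-files -d
# Show files modified in the working tree relative to the index
git ls-files -m
# Show untracked files ("others")
git ls-files -o
# Show ignored untracked files (-i must be paired with -o or -c)
git ls-files -i -o --exclude-standard
# Show cached (tracked) files; this is the default
git ls-files -c
```
What Happens: git add vs git commit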
git add - Creates ONLY Blobs
Key Point: git add does NOT create trees!
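To see that `git add` really is just "write a blob, then record it in the index", here is a sketch that performs the same two steps with plumbing, assuming a throwaway repo and a made-up `test.txt`. One difference from porcelain `git add`: `--cacheinfo` leaves the entry's stat data empty, so Git simply re-checks the file the next time it looks.

```bash
set -eu
cd "$(mktemp -d)"
git init -q
echo "hello" > test.txt

# Step 1: write the blob into .git/objects and capture its SHA
sha=$(git hash-object -w test.txt)

# Step 2: record "mode SHA path" in the index
git update-index --add --cacheinfo 100644 "$sha" test.txt

git ls-files --stage        # the same entry 'git add test.txt' would produce
git cat-file -t "$sha"      # blob

# Every object in the database is a blob: no tree was written
git cat-file --batch-all-objects --batch-check
```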
git commit - Creates Trees AND Commit
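`git commit` can likewise be replayed with plumbing. A minimal sketch, assuming a throwaway repo and a hypothetical author identity passed through Git's standard `GIT_AUTHOR_*` / `GIT_COMMITTER_*` environment variables: `git write-tree` turns the index into trees, `git commit-tree` wraps the root tree in a commit object, and `git update-ref` moves the current branch to it. Porcelain `git commit` does more on top (hooks, for example); this only shows the object-level core.

```bash
set -eu
cd "$(mktemp -d)"
git init -q
echo "hello" > test.txt
git add test.txt                        # blob + index entry only

tree=$(git write-tree)                  # index -> tree object(s)
commit=$(GIT_AUTHOR_NAME="Index Lab" GIT_AUTHOR_EMAIL="lab@example.com" \
         GIT_COMMITTER_NAME="Index Lab" GIT_COMMITTER_EMAIL="lab@example.com" \
         git commit-tree "$tree" -m "first commit")   # tree -> commit object
git update-ref HEAD "$commit"           # HEAD is symbolic, so this moves the branch

git cat-file -p HEAD
```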
The Magic of git write-tree
How does it build trees from a flat list?
Because the index is a flat list sorted by path, git write-tree can efficiently:
- Group by directory: all `src/...` files are adjacent, so they can be collected in one pass
- Build bottom-up: deepest directories first, because a parent tree has to contain its children's SHAs
- Create tree objects: one for each directory level
- Link them together: parent trees reference child trees
```
Index (sorted):            Trees created (bottom-up):
─────────────────          ──────────────────────────────────────────
README.md                  1. tree src/components/  ← a.js, b.js
src/app.js                 2. tree src/             ← app.js, components/
src/components/a.js        3. root tree             ← README.md, src/
src/components/b.js
```
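You can watch this happen with a sketch like the following, assuming a throwaway repo, the same made-up layout as the diagram, and Git 2.6 or newer (for `--batch-all-objects`). A single `git write-tree` call produces all three trees; `git ls-tree -r -t` lists them together with the blobs.

```bash
set -eu
cd "$(mktemp -d)"
git init -q
mkdir -p src/components
echo "readme" > README.md
echo "app"    > src/app.js
echo "a"      > src/components/a.js
echo "b"      > src/components/b.js
git add .                    # 4 blobs, no trees yet

root=$(git write-tree)       # one call builds every tree the index implies

# -r recurses into subtrees, -t also prints the tree entries themselves
git ls-tree -r -t "$root"

# Object counts by type: 4 blobs + 3 trees (root, src, src/components)
git cat-file --batch-all-objects --batch-check | awk '{print $2}' | sort | uniq -c
```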
What Happens to Index When Switching Branches?
This was one of my biggest questions!
Scenario 1: Clean Staging Area
Result: Index is completely replaced with target branch's snapshot.
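A sketch of the clean case, assuming a throwaway repo, a hypothetical demo identity and a made-up branch name `feature`. With nothing staged, switching back to the base branch swaps the index (and the working tree) over to the other snapshot, so `feature.txt` disappears from both.

```bash
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name  "Index Lab"         # hypothetical demo identity
git config user.email "lab@example.com"
echo "shared" > shared.txt
git add shared.txt && git commit -q -m "base"
base=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
echo "feature only" > feature.txt
git add feature.txt && git commit -q -m "feature file"

git ls-files --stage        # shared.txt + feature.txt
git checkout -q "$base"     # clean index: just load the other snapshot
git ls-files --stage        # only shared.txt
```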
Scenario 2: Uncommitted Changes (Non-Conflicting)
Result: Your staged changes "follow" you to the new branch!
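A sketch of this case with branches that really point at different commits, assuming a throwaway repo, a hypothetical demo identity and made-up names. The staged edit touches `a.txt`, which `feature` never changed, so Git updates `b.txt` from the target branch and keeps the staged `a.txt` entry as it is.

```bash
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name  "Index Lab"         # hypothetical demo identity
git config user.email "lab@example.com"
echo "a" > a.txt
echo "b" > b.txt
git add . && git commit -q -m "base"
base=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
echo "b changed on feature" > b.txt
git commit -q -am "change b"

git checkout -q "$base"
echo "a staged, not committed" > a.txt
git add a.txt                  # staged change to a path feature never touched

git checkout feature           # allowed; Git reports a.txt as locally modified
git diff --cached --name-only  # a.txt is still staged on the new branch
```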
Scenario 3: Conflicting Changes
Result: Git refuses to switch to protect your work!
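A sketch of the refusal, assuming a throwaway repo, a hypothetical demo identity and made-up names. Here the staged edit is to `file.txt`, which also differs on `feature`, so carrying it over would overwrite one version with the other; `git checkout` exits with an error and the index is left exactly as it was.

```bash
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name  "Index Lab"         # hypothetical demo identity
git config user.email "lab@example.com"
echo "v1" > file.txt
git add file.txt && git commit -q -m "base"
base=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
echo "v2 on feature" > file.txt
git commit -q -am "feature edit"

git checkout -q "$base"
echo "my staged edit" > file.txt
git add file.txt               # staged change to a path that differs on feature

before=$(git ls-files --stage)
if git checkout feature; then
  echo "unexpected: the switch went through"
else
  echo "switch refused"
fi
[ "$(git ls-files --stage)" = "$before" ] && echo "index unchanged"
```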
Summary Table
| Scenario | What Happens to Index |
|---|---|
| Clean working tree | Index replaced with target branch |
| Non-conflicting changes | Changes stay staged, follow you |
| Conflicting changes | Switch blocked, index unchanged |
| Force switch (`-f`) | Index reset to target branch; uncommitted changes to tracked files are LOST! |
Complete Plumbing Commands Reference
Index Commands
| Command | Description |
|---|---|
| `git ls-files --stage` | Show full index with SHAs and modes |
| `git ls-files` | List tracked files |
| `git ls-files -m` | List modified files |
| `git ls-files -d` | List deleted files |
| `git ls-files -o` | List untracked files |
| `git ls-files -o --exclude-standard` | Untracked, respecting .gitignore |
| `git update-index --add --cacheinfo <mode> <sha> <path>` | Add entry to index |
| `git update-index --remove <path>` | Remove from index |
| `git update-index --refresh` | Refresh index stat info |
| `git read-tree <tree-sha>` | Load tree into index |
| `git write-tree` | Create tree from index |
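The `ls-files` listing flags are easiest to tell apart side by side. A sketch assuming a throwaway repo, a hypothetical demo identity and made-up file names: one file edited, one deleted, one new, one ignored. Note that `-m` also reports the deleted file (an unstaged deletion counts as a modification), and that recent Git versions reject `-i` unless it is paired with `-o` or `-c`.

```bash
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name  "Index Lab"         # hypothetical demo identity
git config user.email "lab@example.com"
echo "one"   > kept.txt
echo "two"   > edited.txt
echo "three" > removed.txt
git add . && git commit -q -m "start"

echo "changed" > edited.txt   # modified in the working tree only
rm removed.txt                # gone from disk, still in the index
echo "new" > untracked.txt    # never added
echo "*.log" > .gitignore     # ignore rule (.gitignore itself stays untracked)
echo "noise" > debug.log      # ignored file

echo "modified (-m):";  git ls-files -m
echo "deleted (-d):";   git ls-files -d
echo "untracked (-o):"; git ls-files -o --exclude-standard
echo "ignored (-i):";   git ls-files -i -o --exclude-standard
```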
Object Commands
| Command | Description |
|---|---|
| `git hash-object -w <file>` | Create blob from file |
| `git hash-object --stdin -w` | Create blob from stdin |
| `git hash-object -t <type> -w --stdin` | Create object of specific type (without `-w`, only prints the SHA) |
| `git cat-file -t <sha>` | Show object type |
| `git cat-file -p <sha>` | Pretty-print object |
| `git cat-file -s <sha>` | Show object size |
| `git cat-file blob <sha>` | Show blob content |
| `git cat-file commit <sha>` | Show commit content |
| `git cat-file tree <sha>` | Show tree content |
| `git mktree` | Create tree from `ls-tree`-formatted stdin |
| `git commit-tree <tree> -m "msg"` | Create commit object |
| `git commit-tree <tree> -m "msg" -p <parent>` | Create commit with parent |
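A quick sketch of what `hash-object` actually hashes, assuming a repository in the default SHA-1 object format and GNU coreutils `sha1sum` (on macOS, `shasum` is the usual substitute). A blob's ID is the SHA-1 of the header `blob <size>`, a NUL byte, and then the content, which is why `hello` always hashes to the same `ce013625...` seen throughout these notes.

```bash
set -eu
cd "$(mktemp -d)"
git init -q

# Compute the ID only (nothing is written)
echo "hello" | git hash-object --stdin

# Same value by hand: SHA-1 over "blob 6\0hello\n"
printf 'blob 6\0hello\n' | sha1sum

# Write it, then inspect it
sha=$(echo "hello" | git hash-object -w --stdin)
git cat-file -t "$sha"    # type
git cat-file -s "$sha"    # size in bytes
git cat-file -p "$sha"    # content
```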
Reference Commands
| Command | Description |
|---|---|
| `git update-ref refs/heads/<branch> <sha>` | Update branch |
| `git symbolic-ref HEAD refs/heads/<branch>` | Update HEAD |
| `git symbolic-ref HEAD` | Show what HEAD points to |
| `git rev-parse HEAD` | Get current commit SHA |
| `git rev-parse HEAD^{tree}` | Get current tree SHA |
| `git rev-parse --short HEAD` | Get short SHA |
| `git show-ref` | List all refs |
| `git show-ref --heads` | List branch refs |
| `git show-ref --tags` | List tag refs |
Diff & Comparison Commands
| Command | Description |
|---|---|
| `git diff-tree -p <sha>` | Show diff for commit |
| `git diff-index --cached HEAD` | Diff index vs HEAD (without `--cached`: HEAD vs working tree) |
| `git diff-files` | Diff working dir vs index |
| `git ls-tree <tree-sha>` | List tree contents |
| `git ls-tree -r <tree-sha>` | Recursively list tree |
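The three comparisons are easy to mix up, so here is a sketch that makes each one report a different answer, assuming a throwaway repo, a hypothetical demo identity and made-up file names: one change is staged, another exists only in the working tree.

```bash
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name  "Index Lab"         # hypothetical demo identity
git config user.email "lab@example.com"
echo "1" > staged.txt
echo "1" > unstaged.txt
git add . && git commit -q -m "base"

echo "2" > staged.txt && git add staged.txt   # change recorded in the index
echo "2" > unstaged.txt                       # change only in the working tree

echo "HEAD vs index:";         git diff-index --cached --name-only HEAD
echo "index vs working tree:"; git diff-files --name-only
echo "HEAD vs working tree:";  git diff-index --name-only HEAD
```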
Pack & Verify Commands
| Command | Description |
|---|---|
| `git verify-pack -v .git/objects/pack/*.idx` | Verify pack files |
| `git count-objects -v` | Count loose and packed objects |
| `git fsck` | Check repository integrity |
| `git gc` | Garbage collect |
| `git prune` | Remove unreachable loose objects |
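A sketch that shows loose objects turning into a pack, assuming a throwaway repo and a hypothetical demo identity. One commit produces three loose objects (blob, tree, commit); after `git gc` they live in a single pack and the loose count drops to zero.

```bash
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name  "Index Lab"         # hypothetical demo identity
git config user.email "lab@example.com"
echo "hello" > test.txt
git add test.txt && git commit -q -m "first commit"

git count-objects -v      # count: 3 loose objects (blob, tree, commit)
git fsck                  # no complaints means the object graph is consistent
git gc -q                 # pack reachable objects, remove the loose copies
git count-objects -v      # count: 0, in-pack: 3
git verify-pack -v .git/objects/pack/*.idx
```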
Investigation Lab: Try It Yourself!
Lab 1: Watch the Index Change
```bash
# Start fresh
mkdir index-lab && cd index-lab
git init
# Index is empty
git ls-files --stage
# (nothing)
# Create and add a file
echo "hello" > test.txt
git add test.txt
# Now see the index!
git ls-files --stage
# 100644 ce013625030ba8dba906f756967f9e9ca394464a 0	test.txt
# The blob was created
git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
# hello
```
Lab 2: See Trees Created at Commit
```bash
# Commit (this creates trees!)
# (requires user.name and user.email to be configured)
git commit -m "first commit"
# Get the commit
git cat-file -p HEAD
# tree <tree-sha>
# author ...
# Get the tree (HEAD^{tree} is the root tree of the current commit)
git cat-file -p 'HEAD^{tree}'
# 100644 blob ce013625030ba8dba906f756967f9e9ca394464a	test.txt
# The tree was created during commit, NOT during add!
```
Lab 3: Index During Branch Switch
```bash
# Create a new file and stage it (don't commit)
echo "staged but not committed" > new.txt
git add new.txt
# Check index
git ls-files --stage
# Shows both test.txt and new.txt
# Create and switch to new branch
git checkout -b feature
# Check index again - new.txt followed us!
git ls-files --stage
# Still shows both files!
# The staged file traveled with us to the new branch
# (feature starts at the same commit, so Git had no reason to touch the index)
```
Lab 4: Prove git add Only Creates Blobs
```bash
# Create fresh repo
mkdir blob-test && cd blob-test
git init
# Create nested structure
mkdir -p src/components
echo "root file" > README.md
echo "app code" > src/app.js
echo "component" > src/components/Button.js
# Stage everything
git add .
# Check: what objects exist? (one line per object: SHA, type, size)
git cat-file --batch-all-objects --batch-check
# Three objects, all of type blob. No trees yet!
# Verify one:
git cat-file -t $(git ls-files --stage | head -1 | awk '{print $2}')
# blob
# Now commit
git commit -m "first commit"
# Check again - NOW we have trees!
git cat-file -p 'HEAD^{tree}'
# 040000 tree <sha>	src   <- a tree object, created by the commit
```
Useful Inspection Scripts
I've created a collection of powerful inspection scripts!
Location: git/docs/internals/scripts/
| Script | Description |
|---|---|
| `inspect-index.sh` | Deep index inspection with formatting |
| `inspect-objects.sh` | Explore object database |
| `inspect-refs.sh` | Examine all references |
| `git-internals-dump.sh` | Full repository dump |
| `watch-git-changes.sh` | Monitor .git in real-time |
| `create-commit-manually.sh` | Build commit with plumbing only |
| `compare-branches.sh` | Compare branch internals |
| `find-blob.sh` | Find which commits contain a blob |
See scripts/README.md for detailed usage!
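Key Takeaways
- The index is flat: just paths + blob SHAs, no tree structure
- `git add`: creates a blob + updates the index, no trees
- `git commit`: write-tree builds the hierarchy from the flat index
- Switching branches: the index is replaced with the target branch's snapshot, unless you have uncommitted changes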
Quiz: Test Your Understanding
1. Does the index store tree objects?
NO! The index is a flat list of path → blob SHA entries. Trees are created only when the index is turned into a snapshot: git commit does this with the same machinery that git write-tree exposes directly.
2. What does `git add` create?
Only blobs! It:
- Reads the file content
- Creates a blob object in .git/objects/
- Updates .git/index with the path → blob mapping
3. When are tree objects created?
During git commit! Just like git write-tree, it reads the flat index and builds the hierarchical tree structure on the fly.
4. What happens to staged files when you switch branches?
If there's no conflict with the target branch, staged files follow you to the new branch. The index keeps your staged changes!
5. Why can Git build trees from a flat index?
Because the index is sorted by path! Git can efficiently group files by directory and build trees bottom-up.
What's Next?
Next: Rewriting History & Git Disasters
Now that you understand the internals, learn how to manipulate history and recover from Git disasters!