🔬 Deep Investigation: The Index Demystified

🧪 My Own Git Investigation Notes

Deep diving into questions that puzzled me while learning Git internals


🤔 The Big Questions That Puzzled Me

"How the hell does git write-tree or git commit get all that information from the index only?"

"What happens to the index when we change branches?"

"Does git add create trees too, or only blobs?"

Let's investigate! 🔍


📋 What Exactly IS the Index?

🎯 Definition: The Git staging area (also called the index or cache) is a flat list of file paths mapped to blob SHAs. It's the proposed snapshot for your next commit.

Index Location

.git/index    ← Binary file containing the staging area

What the Index Stores (Per File)

| Field | Description | Example |
| --- | --- | --- |
| SHA-1 hash | Pointer to blob object | ce013625030ba8dba906f756967f9e9ca394464a |
| File path | Full path from repo root | src/components/header.js |
| File mode | Permissions | 100644 (regular) or 100755 (executable) |
| Stage number | For merge conflicts (0-3) | 0 (normal) |
| Timestamps | ctime, mtime for change detection | 1706300000 |
| File size | For quick comparison | 1234 bytes |
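
To see these per-file fields in a real index, here is a minimal sketch. It assumes a throwaway repository in a temp directory and a hypothetical file named test.txt. The `--debug` flag of `git ls-files` prints the cached stat data after each entry; Git's documentation says its exact format may change, so treat the output as something to read, not to parse.

```shell
# Throwaway repository so the experiment cannot touch real work
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'hello\n' > test.txt
git add test.txt

# --stage prints mode, blob SHA, stage number, and path;
# --debug appends the cached stat data (ctime, mtime, size, ...) for the entry
entry=$(git ls-files --stage --debug test.txt)
printf '%s\n' "$entry"
```

The timestamps and size shown are the ones Git compares against the file on disk to decide quickly whether it needs to re-read the content.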

🔑 Critical Insight: The Index is FLAT!

💡 This is the KEY to understanding Git!

The index does NOT store tree objects. It's a flat sorted list of paths!

# What the index looks like internally (sorted by path, SHAs abbreviated):
100644 ce013625... 0	README.md
100644 a1b2c3d4... 0	src/app.js
100644 c9d0e1f2... 0	src/components/footer.js
100644 e5f6a7b8... 0	src/components/header.js
100644 0a1b2c3d... 0	src/utils/helpers.js

No nested folders - just flat paths!

Index vs Repository Structure
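
One way to see the contrast is to list the same snapshot from both sides. This is a small sketch under assumptions: a throwaway repo, made-up file names, and a placeholder identity set through environment variables so the commit works without any global config.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, only for this experiment
export GIT_AUTHOR_NAME="Lab User" GIT_AUTHOR_EMAIL="lab@example.com"
export GIT_COMMITTER_NAME="Lab User" GIT_COMMITTER_EMAIL="lab@example.com"

mkdir -p src/components
echo "readme" > README.md
echo "app" > src/app.js
echo "header" > src/components/header.js
git add .
git commit -q -m "snapshot"

# Index: one flat entry per file, full path from the repo root
flat=$(git ls-files)
# Repository: the root tree lists only its direct children; src is a nested tree
top=$(git ls-tree HEAD | awk '{print $2, $4}')

printf 'index:\n%s\n\nroot tree:\n%s\n' "$flat" "$top"
```

The index shows three full paths, while the root tree shows just one blob (README.md) and one tree (src) that has to be opened to go deeper.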


🔍 Inspecting the Index

Command: git ls-files --stage

This is your best friend for understanding the index!

bash
git ls-files --stage

Output:

100644 ce013625030ba8dba906f756967f9e9ca394464a 0	README.md
100644 a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0 0	src/app.js
100644 e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4 0	src/components/header.js

Fields explained:

100644   ce013625...   0        README.md
  │           │        │            │
  │           │        │            └── File path
  │           │        └── Stage number (0=normal, 1-3=merge conflict)
  │           └── Blob SHA-1 hash
  └── File mode (100644=regular, 100755=executable)

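More Index Inspection Commands

bash
# Show all files in the index (tracked files)
git ls-files

# Show mode, blob SHA, and stage number for each entry (short for --stage)
git ls-files -s

# Show files deleted from the working tree
git ls-files -d

# Show files modified in the working tree
git ls-files -m

# Show untracked files
git ls-files -o

# Show ignored untracked files (recent Git requires -i together with -o or -c)
git ls-files -o -i --exclude-standard

# Show cached files (everything in the index; this is the default)
git ls-files -c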

🎬 What Happens: git add vs git commit

git add - Creates ONLY Blobs

Key Point: git add does NOT create trees!
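
To convince myself, I find it helps to do by hand what git add does. The sketch below is an approximation using plumbing, not a claim about how git add is implemented internally. It assumes a throwaway repo and a hypothetical file note.txt.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'hello\n' > note.txt

# Step 1: write the file content into the object database as a blob
sha=$(git hash-object -w note.txt)
# Step 2: record path -> blob in the index (mode 100644 = regular file)
git update-index --add --cacheinfo 100644,"$sha",note.txt

entry=$(git ls-files --stage)
# Every object in the database so far, reduced to the distinct object types
types=$(git cat-file --batch-check --batch-all-objects | awk '{print $2}' | sort -u)

printf '%s\nobject types: %s\n' "$entry" "$types"
```

After both steps the file is staged exactly as if it had been added normally, and the only object type in the database is blob.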

git commit - Creates Trees AND Commit
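
Roughly the same result can be reproduced with three plumbing commands. This is a sketch of the equivalent steps, not the actual code path of git commit. The repo, file names, and identity values are placeholders for the experiment.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, only for this experiment
export GIT_AUTHOR_NAME="Lab User" GIT_AUTHOR_EMAIL="lab@example.com"
export GIT_COMMITTER_NAME="Lab User" GIT_COMMITTER_EMAIL="lab@example.com"

mkdir src
echo "readme" > README.md
echo "app" > src/app.js
git add .

# 1. Turn the flat index into tree objects; prints the root tree's SHA
tree=$(git write-tree)
# 2. Wrap that tree in a commit object (no -p because this is the first commit)
commit=$(git commit-tree "$tree" -m "first commit")
# 3. Move the current branch (whatever HEAD points at) to the new commit
git update-ref "$(git symbolic-ref HEAD)" "$commit"

git cat-file -p "$commit"
```

The printed commit object starts with a tree line holding the SHA that write-tree returned, which is the link between the staged snapshot and history.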

The Magic of git write-tree

🪄 How does it build trees from a flat list?

Because the index is a sorted flat list of paths, git write-tree can efficiently:

  1. Group by directory - All src/... files together
  2. Build bottom-up - Deepest directories first
  3. Create tree objects - For each directory level
  4. Link them together - Parent trees reference child trees

Index (sorted):                  Trees Created:
─────────────────                ─────────────────
README.md           ─────────┐
src/app.js          ────┐    │    🌳 src/components/
src/components/a.js ───┼────┼──▶ 🌳 src/
src/components/b.js ───┘    │    🌳 root (links all)
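
The steps above can be watched from the outside. This sketch counts tree objects before and after git write-tree, then lists the subtrees it produced. It assumes a throwaway repo with the same made-up layout as the diagram; `count_trees` is a helper defined here, not a Git command.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p src/components
echo "readme" > README.md
echo "app" > src/app.js
echo "a" > src/components/a.js
echo "b" > src/components/b.js
git add .

# Helper: count tree objects in the object database
count_trees() {
  git cat-file --batch-check --batch-all-objects | awk '$2 == "tree"' | wc -l | tr -d ' '
}

before=$(count_trees)
root=$(git write-tree)
after=$(count_trees)

# -r recurses; -t also prints the tree entries themselves while recursing
subtrees=$(git ls-tree -r -t "$root" | awk '$2 == "tree" {print $4}')

printf 'trees before: %s, after: %s\nsubtrees:\n%s\n' "$before" "$after" "$subtrees"
```

One tree per directory appears (root, src, src/components), and none of them existed while the files were only staged.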

What Happens to Index When Switching Branches?

This was one of my biggest questions!

Scenario 1: Clean Staging Area

Result: Index is completely replaced with target branch's snapshot.
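
A quick way to check this, sketched under assumptions (throwaway repo, a hypothetical file f.txt, placeholder identity): record the index entry on one branch, switch, and compare.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, only for this experiment
export GIT_AUTHOR_NAME="Lab User" GIT_AUTHOR_EMAIL="lab@example.com"
export GIT_COMMITTER_NAME="Lab User" GIT_COMMITTER_EMAIL="lab@example.com"

echo "base version" > f.txt
git add f.txt
git commit -q -m "base"
base=$(git symbolic-ref --short HEAD)   # name of the initial branch

git checkout -q -b feature
echo "feature version, longer" > f.txt
git commit -q -am "feature edit"
on_feature=$(git ls-files --stage f.txt)

# Nothing staged or modified, so the switch rewrites the index to match the target
git checkout -q "$base"
on_base=$(git ls-files --stage f.txt)

printf 'feature: %s\nbase:    %s\n' "$on_feature" "$on_base"
```

The blob SHA in the index entry changes with the branch, and after the switch the index matches HEAD exactly.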

Scenario 2: Uncommitted Changes (Non-Conflicting)

Result: Your staged changes "follow" you to the new branch!

Scenario 3: Conflicting Changes

Result: Git refuses to switch to protect your work!
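
Here is a small reproduction of the refusal. It is a sketch with assumed names (throwaway repo, file f.txt, branch feature, placeholder identity): both branches touch the same file, and a different edit is staged before switching.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, only for this experiment
export GIT_AUTHOR_NAME="Lab User" GIT_AUTHOR_EMAIL="lab@example.com"
export GIT_COMMITTER_NAME="Lab User" GIT_COMMITTER_EMAIL="lab@example.com"

echo "base" > f.txt
git add f.txt
git commit -q -m "base"
base=$(git symbolic-ref --short HEAD)

# feature changes f.txt one way...
git checkout -q -b feature
echo "feature version" > f.txt
git commit -q -am "feature edit"

# ...and back on the base branch a different change to the same file is staged
git checkout -q "$base"
echo "my staged edit" > f.txt
git add f.txt

before=$(git ls-files --stage)
if git checkout feature 2>/dev/null; then switched=yes; else switched=no; fi
after=$(git ls-files --stage)

printf 'switched: %s\n' "$switched"
```

The checkout exits with an error, HEAD stays on the original branch, and the index entry for f.txt is byte-for-byte the same before and after the attempt.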

Summary Table

| Scenario | What Happens to Index |
| --- | --- |
| Clean working tree | Index replaced with target branch's snapshot |
| Non-conflicting changes | Changes stay staged, follow you |
| Conflicting changes | Switch blocked, index unchanged |
| Force switch (-f) | Index and working tree overwritten to match the target branch, your changes LOST! |

Complete Plumbing Commands Reference

Index Commands

| Command | Description |
| --- | --- |
| `git ls-files --stage` | Show full index with SHAs and modes |
| `git ls-files` | List tracked files |
| `git ls-files -m` | List modified files |
| `git ls-files -d` | List deleted files |
| `git ls-files -o` | List untracked files |
| `git ls-files -o --exclude-standard` | Untracked, respecting .gitignore |
| `git update-index --add --cacheinfo <mode> <sha> <path>` | Add entry to index |
| `git update-index --remove <path>` | Remove from index if the file is gone from the working tree (--force-remove removes it regardless) |
| `git update-index --refresh` | Refresh index stat info |
| `git read-tree <tree-sha>` | Load tree into index |
| `git write-tree` | Create tree from index |

Object Commands

| Command | Description |
| --- | --- |
| `git hash-object -w <file>` | Create blob from file |
| `git hash-object --stdin -w` | Create blob from stdin |
| `git hash-object -t <type> --stdin` | Compute the ID of an object of a specific type (add -w to actually write it) |
| `git cat-file -t <sha>` | Show object type |
| `git cat-file -p <sha>` | Pretty-print object |
| `git cat-file -s <sha>` | Show object size |
| `git cat-file blob <sha>` | Show blob content |
| `git cat-file commit <sha>` | Show commit content |
| `git cat-file tree <sha>` | Show raw (binary) tree content; use -p for readable output |
| `git mktree` | Create tree from stdin |
| `git commit-tree <tree> -m "msg"` | Create commit object |
| `git commit-tree <tree> -m "msg" -p <parent>` | Create commit with parent |

Reference Commands

| Command | Description |
| --- | --- |
| `git update-ref refs/heads/<branch> <sha>` | Point a branch at a commit |
| `git symbolic-ref HEAD refs/heads/<branch>` | Point HEAD at a branch |
| `git symbolic-ref HEAD` | Show what HEAD points to |
| `git rev-parse HEAD` | Get current commit SHA |
| `git rev-parse HEAD^{tree}` | Get current tree SHA |
| `git rev-parse --short HEAD` | Get short SHA |
| `git show-ref` | List all refs |
| `git show-ref --heads` | List branch refs |
| `git show-ref --tags` | List tag refs |

Diff & Comparison Commands

| Command | Description |
| --- | --- |
| `git diff-tree -p <sha>` | Show diff for commit |
| `git diff-index --cached HEAD` | Diff index vs HEAD |
| `git diff-index HEAD` | Diff working dir vs HEAD |
| `git diff-files` | Diff working dir vs index |
| `git ls-tree <tree-sha>` | List tree contents |
| `git ls-tree -r <tree-sha>` | Recursively list tree |
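
The three diff commands are easy to mix up, so here is a sketch that puts one staged and one unstaged change side by side. The repo, the files a.txt and b.txt, and the identity values are assumptions for the experiment.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity, only for this experiment
export GIT_AUTHOR_NAME="Lab User" GIT_AUTHOR_EMAIL="lab@example.com"
export GIT_COMMITTER_NAME="Lab User" GIT_COMMITTER_EMAIL="lab@example.com"

echo "a" > a.txt
echo "b" > b.txt
git add .
git commit -q -m "base"

echo "a, staged edit" > a.txt
git add a.txt                      # staged: index differs from HEAD
echo "b, unstaged edit" > b.txt    # unstaged: working tree differs from index

staged=$(git diff-index --cached --name-only HEAD)    # index vs HEAD
unstaged=$(git diff-files --name-only)                # working tree vs index
worktree_vs_head=$(git diff-index --name-only HEAD)   # working tree vs HEAD

printf 'staged: %s\nunstaged: %s\n' "$staged" "$unstaged"
```

Only a.txt shows up as staged and only b.txt as unstaged, while diff-index without --cached reports both because it compares HEAD against the working tree.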

Pack & Verify Commands

| Command | Description |
| --- | --- |
| `git verify-pack -v .git/objects/pack/*.idx` | Verify pack files and list their objects |
| `git count-objects -v` | Count loose objects and show pack statistics |
| `git fsck` | Check repository integrity |
| `git gc` | Garbage collect |
| `git prune` | Remove unreachable objects |

Investigation Lab: Try It Yourself!

Lab 1: Watch the Index Change

bash
# Start fresh
mkdir index-lab && cd index-lab
git init

# Index is empty
git ls-files --stage
# (nothing)

# Create and add a file
echo "hello" > test.txt
git add test.txt

# Now see the index!
git ls-files --stage
# 100644 ce013625030ba8dba906f756967f9e9ca394464a 0	test.txt

# The blob was created
git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
# hello

Lab 2: See Trees Created at Commit

bash
# Commit (this creates trees!)
git commit -m "first commit"

# Get the commit
git cat-file -p HEAD
# tree <tree-sha>
# author ...

# Get the tree (HEAD^{tree} resolves to the commit's root tree)
git cat-file -p 'HEAD^{tree}'
# 100644 blob ce013625030ba8dba906f756967f9e9ca394464a	test.txt

# The tree was created during commit, NOT during add!

Lab 3: Index During Branch Switch

bash
# Create a new file and stage it (don't commit)
echo "staged but not committed" > new.txt
git add new.txt

# Check index
git ls-files --stage
# Shows both test.txt and new.txt

# Create and switch to new branch
git checkout -b feature

# Check index again - new.txt followed us!
git ls-files --stage
# Still shows both files!

# The staged file traveled with us to the new branch

Lab 4: Prove git add Only Creates Blobs

bash
# Create fresh repo
mkdir blob-test && cd blob-test
git init

# Create nested structure
mkdir -p src/components
echo "root file" > README.md
echo "app code" > src/app.js
echo "component" > src/components/Button.js

# Stage everything
git add .

# Check: What objects exist?
find .git/objects -type f | head -20

# All objects are BLOBS! No trees yet!
# Verify one:
git cat-file -t $(git ls-files --stage | head -1 | awk '{print $2}')
# blob

# Now commit
git commit -m "first commit"

# Check again - NOW we have trees!
git cat-file -p 'HEAD^{tree}'
# Lists README.md as a blob and src as a tree, both written during commit

Useful Inspection Scripts

I've created a collection of powerful inspection scripts!

Location: git/docs/internals/scripts/

| Script | Description |
| --- | --- |
| inspect-index.sh | Deep index inspection with formatting |
| inspect-objects.sh | Explore object database |
| inspect-refs.sh | Examine all references |
| git-internals-dump.sh | Full repository dump |
| watch-git-changes.sh | Monitor .git in real-time |
| create-commit-manually.sh | Build commit with plumbing only |
| compare-branches.sh | Compare branch internals |
| find-blob.sh | Find which commits contain a blob |

See scripts/README.md for detailed usage!


Key Takeaways

Index is FLAT: just paths + blob SHAs, no tree structure
`git add` = Blobs only: creates blob + updates index, no trees
`git commit` = Trees: write-tree builds hierarchy from flat index
Branch switch = Index replace: unless you have uncommitted changes, which follow you when they don't conflict

🧠 Quiz: Test Your Understanding

1. Does the index store tree objects?

❌ NO! The index is a flat list of paths → blob SHAs. Trees are only created when the index is written out as a snapshot, typically during git commit (the same work git write-tree does).

2. What does `git add` create?

Only blobs! It:

  1. Reads file content
  2. Creates blob object in .git/objects/
  3. Updates .git/index with path → blob mapping

3. When are tree objects created?

During git commit! It does the same job as git write-tree: it reads the flat index and builds the hierarchical tree structure on the fly.

4. What happens to staged files when you switch branches?

If there's no conflict with the target branch, staged files follow you to the new branch. The index keeps your staged changes!

5. Why can Git build trees from a flat index?

Because the index is sorted by path! Git can efficiently group files by directory and build trees bottom-up.


What's Next?

Next: Rewriting History & Git Disasters

Now that you understand the internals, learn how to manipulate history and recover from Git disasters!

Continue to Lesson 05

Released under the MIT License.