🔧 Creating a Repository From Scratch (Part A)

🏗️ Build a Git Repo Without git init!

Using only echo, mkdir, and low-level Git commands

Lesson 4 Overview


📺 Video Reference

| Resource | Link |
|---|---|
| 🎬 Video | Creating Repo From Scratch |
| 📄 Transcript | 04-creating-a-repo-from-scratch.txt |

🎯 What We'll Build

We're going to create a complete Git repository without using:

  • git init
  • git add
  • git commit

Instead, we'll use:

  • echo and mkdir
  • ✅ Plumbing commands (git hash-object, git update-index, etc.)

🔧 Porcelain vs Plumbing Commands

The Toilet Analogy 🚽

Git commands are divided into two types, named after toilet parts (seriously!):

| Type | Description | Examples |
|---|---|---|
| Porcelain | User-friendly, high-level | git add, git commit, git checkout |
| Plumbing | Low-level, internal | git hash-object, git update-index, git write-tree |

Most users only interact with the porcelain (the nice, visible part). But underneath, the plumbing does the real work!


📋 Plumbing Commands Reference

| Command | Purpose | Input | Output |
|---|---|---|---|
| git hash-object | Create blob from content | File or stdin | SHA-1 hash |
| git cat-file | Read object content | SHA-1 hash | Content/type/size |
| git update-index | Add entry to staging area | Blob SHA + filename | Updates index |
| git write-tree | Create tree from index | Index | Tree SHA |
| git commit-tree | Create commit from tree | Tree SHA + message | Commit SHA |
| git update-ref | Update branch reference | Branch + SHA | Updates ref file |

🚀 Part 1: The Normal Way (for comparison)

First, let's see what git init creates:

bash
# Create a normal repo
mkdir normal-repo && cd normal-repo
git init

Output:

Initialized empty Git repository in /workspace/normal-repo/.git/

What git init Created

bash
tree .git
.git
├── HEAD                 ← Points to current branch
├── config               ← Repository configuration
├── description          ← GitWeb description (rarely used)
├── hooks/               ← Git hooks (scripts)
│   ├── pre-commit.sample
│   └── ... (more samples)
├── info/
│   └── exclude          ← Local gitignore
├── objects/             ← Object database
│   ├── info/
│   └── pack/
└── refs/                ← References (branches, tags)
    ├── heads/           ← Local branches
    └── tags/            ← Tags

The Normal Workflow

bash
# Create a file
echo "hello" > f.txt

# Stage it
git add f.txt

# Commit it
git commit -m "added f.txt"

🔍 What Actually Happens When You Run git add?

🎯 Let's Break Down `git add` Step by Step!

Understanding this is KEY to understanding Git internals

Starting Point

Let's say you have a file:

bash
# printf (unlike echo) adds no trailing newline, so the file is exactly 11 bytes
printf 'hello world' > myfile.txt

At this point:

  • ✅ File exists in working directory
  • ❌ Nothing in staging area (index)
  • ❌ Nothing in object database

When You Run git add myfile.txt

Git does TWO things internally:


Step 1: Create a Blob Object

When you run git add, Git first creates a blob (Binary Large OBject):

┌─────────────────────────────────────────────────────────────────┐
│                    What Git Does Internally                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. Read file contents: "hello world"                           │
│                                                                  │
│  2. Prepend header: "blob 11\0" (type + size + null byte)       │
│     Result: "blob 11\0hello world"                              │
│                                                                  │
│  3. Calculate SHA-1 hash of that string:                        │
│     → 95d09f2b10159347eece71399a7e2e907ea3df4f                  │
│                                                                  │
│  4. Compress with zlib and store at:                            │
│     .git/objects/95/d09f2b10159347eece71399a7e2e907ea3df4f     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

This is equivalent to running:

bash
git hash-object -w myfile.txt
# Output: 95d09f2b10159347eece71399a7e2e907ea3df4f
💡 Key Point: The blob only contains the contents of the file. It does NOT store:
  • ❌ The filename
  • ❌ The file path
  • ❌ File permissions
  • ❌ When it was created
Just raw content!
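
The hashing step can be reproduced without Git at all. The sketch below assumes GNU coreutils `sha1sum` is installed (on macOS, `shasum` would be the likely stand-in), and the helper name `blob_id` is ours, not a Git command. It takes the content as an argument, so no filename is involved anywhere.

```shell
# blob_id: hypothetical helper that mimics `git hash-object --stdin`
# for the content passed as its first argument (no newline is appended).
blob_id() {
    # Byte count of the content; $(( )) strips any padding wc may print.
    size=$(( $(printf '%s' "$1" | wc -c) ))
    # Header "blob <size>" plus a NUL byte, then the raw content, hashed with SHA-1.
    { printf 'blob %d\000' "$size"; printf '%s' "$1"; } | sha1sum | cut -d' ' -f1
}

blob_id 'hello world'
# → 95d09f2b10159347eece71399a7e2e907ea3df4f
```

Because only the header and the bytes go into the hash, two files with identical contents always share one blob, whatever they are called.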

Step 2: Update the Index (Staging Area)

After creating the blob, Git updates the index file:

┌─────────────────────────────────────────────────────────────────┐
│                    .git/index gets updated                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  New entry added:                                                │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ mode: 100644 (regular file)                              │    │
│  │ SHA:  95d09f2b10159347eece71399a7e2e907ea3df4f          │    │
│  │ path: myfile.txt                                         │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

This is equivalent to running:

bash
git update-index --add --cacheinfo 100644 \
    95d09f2b10159347eece71399a7e2e907ea3df4f \
    myfile.txt
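
To look at the entry that was just recorded, `git ls-files --stage` prints the index in readable form. This is a small sketch that runs in a throwaway directory (the `mktemp -d` location is our choice) and uses `git init` only to keep the example short.

```shell
# Work in a throwaway repository so nothing else is affected.
cd "$(mktemp -d)" && git init -q .

# Step 1: store the 11-byte content as a blob (printf adds no newline).
sha=$(printf 'hello world' | git hash-object -w --stdin)

# Step 2: record mode, blob ID, and path in the index.
git update-index --add --cacheinfo 100644 "$sha" myfile.txt

# Print the index, one entry per line: <mode> <blob ID> <stage> <path>
index_listing=$(git ls-files --stage)
echo "$index_listing"
```

The stage column is 0 for ordinary entries; other values only appear while a merge conflict is unresolved.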

After git add - The Complete Picture

Now you have:

  • ✅ File in working directory (myfile.txt)
  • ✅ Blob in object database (.git/objects/95/d09f2b...)
  • ✅ Entry in index linking filename to blob SHA

Visual Summary: git add = Two Operations

┌────────────────────────────────────────────────────────────────────┐
│                         git add myfile.txt                          │
├────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌─────────────────────┐         ┌─────────────────────┐          │
│   │   STEP 1: BLOB      │         │   STEP 2: INDEX     │          │
│   │   ═══════════════   │         │   ═══════════════   │          │
│   │                     │         │                     │          │
│   │   Read myfile.txt   │         │   Add entry:        │          │
│   │         ↓           │         │                     │          │
│   │   "hello world"     │         │   myfile.txt        │          │
│   │         ↓           │         │      ↓              │          │
│   │   SHA-1 hash        │         │   95d09f2b...       │          │
│   │         ↓           │         │      ↓              │          │
│   │   95d09f2b...       │────────→│   100644 (mode)     │          │
│   │         ↓           │         │                     │          │
│   │   Store in          │         │   Write to          │          │
│   │   .git/objects/     │         │   .git/index        │          │
│   │                     │         │                     │          │
│   └─────────────────────┘         └─────────────────────┘          │
│                                                                     │
│   Plumbing equivalent:            Plumbing equivalent:              │
│   git hash-object -w file         git update-index --add            │
│                                                                     │
└────────────────────────────────────────────────────────────────────┘

What If You Modify the File and git add Again?

bash
# Modify the file
echo "hello world v2" > myfile.txt

# Stage again
git add myfile.txt

Git creates a NEW blob with a NEW SHA.

🔑 Important: Git NEVER modifies existing objects! Each version creates a NEW blob. Old blobs stay around until garbage collection.
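
A quick way to convince yourself of this is to write two versions and count what ends up on disk. The sketch below assumes a fresh throwaway repository; the variable names are ours.

```shell
cd "$(mktemp -d)" && git init -q .

# Store two versions of the same logical file as blobs.
v1=$(printf 'hello world' | git hash-object -w --stdin)
v2=$(printf 'hello world v2' | git hash-object -w --stdin)

# Both versions are still readable: the first blob was not overwritten.
git cat-file -p "$v1"; echo
git cat-file -p "$v2"; echo

# Count loose object files on disk: one per version.
object_count=$(find .git/objects -type f | wc -l)
echo "loose objects: $(( object_count ))"
```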

Quick Reference: git add Internals

| What git add Does | Plumbing Equivalent |
|---|---|
| Create blob from file | git hash-object -w <file> |
| Update index | git update-index --add --cacheinfo <mode> <sha> <path> |

| Component | Location | Purpose |
|---|---|---|
| Blob | .git/objects/XX/XXXX... | Stores file contents (compressed) |
| Index | .git/index | Maps filenames to blob SHAs |

📸 What Actually Happens When You Run git commit?

🎬 Now Let's Understand `git commit`!

This is where Git creates a permanent snapshot of your staged changes

Starting Point (After git add)

We have:

  • ✅ Blob in object database
  • ✅ Entry in index pointing to blob
  • ❌ No tree yet
  • ❌ No commit yet

When You Run git commit -m "message"

Git does THREE things internally:


🌳 Step 1: Create a Tree Object

The tree is a snapshot of your directory structure at commit time.

What is a Tree?

🌳 Tree Object = A list of entries, where each entry is:
  • mode - file permissions (100644, 100755, 040000)
  • type - blob or tree
  • SHA - hash of the content
  • name - filename or directory name

Git Reads the Index and Creates a Tree

┌─────────────────────────────────────────────────────────────────┐
│                   Index → Tree Conversion                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Index contains:                    Tree object created:         │
│  ┌─────────────────────┐            ┌─────────────────────┐     │
│  │ myfile.txt          │            │ 100644 blob 95d09f2 │     │
│  │   → 95d09f2b...     │     →      │        myfile.txt   │     │
│  │   mode: 100644      │            │                     │     │
│  └─────────────────────┘            │ SHA: 7a8b9c0d...    │     │
│                                     └─────────────────────┘     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

This is equivalent to running:

bash
git write-tree
# Output: 7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b

Tree Structure for Multiple Files

If you have multiple files and directories:

Working Directory:
├── README.md
├── src/
│   ├── main.js
│   └── utils.js
└── package.json

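Git creates a tree hierarchy: a root tree that lists README.md, package.json, and a src subtree, plus a second tree for src that lists main.js and utils.js.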

What's Inside a Tree Object?

bash
git cat-file -p 7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b

Output:

100644 blob def456789abcdef0123456789abcdef012345678    README.md
100644 blob 95d09f2b10159347eece71399a7e2e907ea3df4f    myfile.txt
040000 tree cde0123456789abcdef0123456789abcdef01234    src
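
Nested trees can be produced with plumbing alone, because the index stores full paths and `git write-tree` splits them into one tree per directory. The sketch below is a minimal illustration in a throwaway repository; the file names and contents are made up for the example.

```shell
cd "$(mktemp -d)" && git init -q .

# Stage two files by path only; no working-directory files are needed.
readme=$(printf '# demo\n' | git hash-object -w --stdin)
main=$(printf 'console.log("hi");\n' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$readme" README.md
git update-index --add --cacheinfo 100644 "$main" src/main.js

# write-tree stores one tree per directory and prints the root tree's ID.
root=$(git write-tree)

git cat-file -p "$root"               # root: README.md (blob) and src (tree)
git ls-tree -r --name-only "$root"    # every file path, recursing into src
```

Entries inside a tree are sorted by name, which is why the listing order does not depend on the order in which files were staged.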

📸 Step 2: Create a Commit Object

Now Git creates the commit object - the actual snapshot!

What's in a Commit?

┌─────────────────────────────────────────────────────────────────┐
│                      Commit Object Contents                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  tree 7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b                 │
│       ↑ Points to root tree (the snapshot!)                     │
│                                                                  │
│  parent a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0               │
│         ↑ Previous commit (omitted if first commit)             │
│                                                                  │
│  author John Doe <john@example.com> 1706300000 +0000            │
│         ↑ Who wrote the code + timestamp                        │
│                                                                  │
│  committer John Doe <john@example.com> 1706300000 +0000         │
│            ↑ Who created the commit + timestamp                  │
│                                                                  │
│  Add myfile                                                      │
│  ↑ Commit message                                                │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

This is equivalent to running:

bash
# For first commit (no parent):
git commit-tree 7a8b9c0d... -m "Add myfile"
# Output: f1e2d3c4b5a6978808695a4b3c2d1e0f9a8b7c6d

# For subsequent commits (with parent):
git commit-tree 7a8b9c0d... -m "Add myfile" -p a1b2c3d4...

Author vs Committer - What's the Difference?

👤 Author
Who originally wrote the code
Example: You write a patch and email it to someone

👥 Committer
Who actually created the commit
Example: Maintainer applies your patch to their repo

Usually they're the same person! They differ in cases like:

  • Cherry-picking commits
  • Applying patches
  • Rebasing (committer changes, author stays same)
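
The two roles can be set independently through Git's identity environment variables, which makes the difference easy to see in a raw commit. This is a sketch with invented identities (Alice and Bob are hypothetical) in a throwaway repository.

```shell
cd "$(mktemp -d)" && git init -q .

# An empty index still yields a valid (empty) tree to commit.
tree=$(git write-tree)

# Hypothetical identities: Alice wrote the change, Bob applied it.
commit=$(
    GIT_AUTHOR_NAME='Alice' GIT_AUTHOR_EMAIL='alice@example.com' \
    GIT_COMMITTER_NAME='Bob' GIT_COMMITTER_EMAIL='bob@example.com' \
    git commit-tree "$tree" -m 'Apply patch from Alice'
)

# The raw commit lists both roles on separate header lines.
git cat-file -p "$commit"
```

Since this is a first commit created without `-p`, the output has no parent line.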

🔄 Step 3: Update the Branch Reference

The commit object exists, but Git needs to know it's the latest on this branch!

What Happens

┌─────────────────────────────────────────────────────────────────┐
│                    Branch Reference Update                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Before commit:                                                  │
│  .git/refs/heads/main → a1b2c3d4... (previous commit)           │
│                                                                  │
│  After commit:                                                   │
│  .git/refs/heads/main → f1e2d3c4... (NEW commit!)               │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

This is equivalent to running:

bash
git update-ref refs/heads/main f1e2d3c4b5a6978808695a4b3c2d1e0f9a8b7c6d

# Or simply:
echo "f1e2d3c4b5a6978808695a4b3c2d1e0f9a8b7c6d" > .git/refs/heads/main
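
One reason to prefer `git update-ref` over a plain `echo` is its optional third argument, the expected old value: the update is refused if the branch has moved in the meantime. The sketch below assumes a throwaway repository and made-up identity values.

```shell
cd "$(mktemp -d)" && git init -q .

# Identity passed through the environment (hypothetical values).
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

# Two commits on the empty tree; the second names the first as its parent.
first=$(git commit-tree "$(git write-tree)" -m 'first')
second=$(git commit-tree "$(git write-tree)" -m 'second' -p "$first")

git update-ref refs/heads/main "$first"

# Move the branch only if it still points at $first (third argument
# is the expected old value); a mismatch makes the command fail.
git update-ref refs/heads/main "$second" "$first"

git rev-parse refs/heads/main    # prints the ID stored in $second
```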


📊 The Complete Picture: git add + git commit

┌──────────────────────────────────────────────────────────────────────────┐
│                    COMPLETE WORKFLOW: add + commit                        │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  WORKING DIR          INDEX              OBJECTS           REFS           │
│  ════════════         ═════              ═══════           ════           │
│                                                                           │
│  ┌─────────┐                                                              │
│  │ myfile  │                                                              │
│  │ "hello" │                                                              │
│  └────┬────┘                                                              │
│       │                                                                   │
│       │  git add                                                          │
│       ▼                                                                   │
│  ┌─────────┐     ┌─────────────┐      ┌─────────────┐                    │
│  │ myfile  │────▶│ myfile.txt  │─────▶│ 📦 BLOB     │                    │
│  │ "hello" │     │ → 95d09f2b  │      │ 95d09f2b... │                    │
│  └─────────┘     └──────┬──────┘      └─────────────┘                    │
│                         │                                                 │
│                         │  git commit                                     │
│                         ▼                                                 │
│                  ┌─────────────┐      ┌─────────────┐                    │
│                  │ (unchanged) │      │ 🌳 TREE     │                    │
│                  │             │◀─────│ 7a8b9c0d... │                    │
│                  └─────────────┘      └──────┬──────┘                    │
│                                              │                            │
│                                              ▼                            │
│                                       ┌─────────────┐    ┌──────────┐    │
│                                       │ 📸 COMMIT   │───▶│ main     │    │
│                                       │ f1e2d3c4... │    │ f1e2d3c4 │    │
│                                       │ tree: 7a8b  │    └──────────┘    │
│                                       │ parent: ... │                    │
│                                       │ msg: "..."  │                    │
│                                       └─────────────┘                    │
│                                                                           │
└──────────────────────────────────────────────────────────────────────────┘

🔑 Key Insights

📦 Blobs are content-only
No filename, no path, just raw content with SHA
🌳 Trees give structure
Maps names → blobs/trees, like a directory listing
📸 Commits are snapshots
Point to a tree + metadata + parent(s)
📁 Branches are bookmarks
Just a file with a commit SHA inside

📋 Quick Reference: git commit Internals

| What git commit Does | Plumbing Equivalent |
|---|---|
| Create tree from index | git write-tree |
| Create commit object | git commit-tree <tree> -m "msg" -p <parent> |
| Update branch ref | git update-ref refs/heads/<branch> <commit> |

Objects Created During git add + git commit

| Step | Object Type | Created By | Contains |
|---|---|---|---|
| git add | Blob | hash-object -w | File contents |
| git commit | Tree | write-tree | Directory listing |
| git commit | Commit | commit-tree | Tree + metadata |

🏗️ Part 2: Building From Scratch

Now let's do the same thing manually!

Step 1: Create Minimal .git Structure

bash
# Create a new empty directory
mkdir scratch-repo && cd scratch-repo

# Check if Git recognizes it
git status
# Output: fatal: not a git repository

What does a Git repository need?

bash
# Create the minimal structure
mkdir -p .git/objects
mkdir -p .git/refs/heads

# Still not recognized!
git status
# Output: fatal: not a git repository

Step 2: Create HEAD

Git needs to know the current branch:

bash
# Point HEAD to master branch
echo "ref: refs/heads/master" > .git/HEAD

# Now Git recognizes it!
git status

Output:

On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)

🎉 We just created a Git repository without git init!

.git/
├── HEAD              ← Contains: ref: refs/heads/master
├── objects/          ← Empty (no blobs yet)
└── refs/
    └── heads/        ← Empty (no branches yet)
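
The same check can be scripted with plumbing instead of reading `git status` output. This is a minimal sketch in a throwaway directory (the `mktemp -d` location is our choice).

```shell
cd "$(mktemp -d)"

# The three pieces Git looks for before treating .git as a repository.
mkdir -p .git/objects .git/refs/heads
echo 'ref: refs/heads/master' > .git/HEAD

git rev-parse --git-dir          # prints: .git
git symbolic-ref HEAD            # prints: refs/heads/master
```

`git symbolic-ref HEAD` succeeds even though `refs/heads/master` does not exist yet: HEAD only names the branch, and the branch file appears with the first commit.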

Step 3: Create a Blob (file contents)

Instead of git add, we'll use git hash-object:

bash
# Create a blob from stdin and write it (-w)
echo "Hello from scratch repo!" | git hash-object --stdin -w

Output:

9319a0a8769459fe40ef3849dd2b19b9b31d3f1b

What happened?

bash
# See the new object
tree .git/objects
.git/objects/
└── 93/
    └── 19a0a8769459fe40ef3849dd2b19b9b31d3f1b

💡 Why 93/ directory?

Git splits the hash: first 2 chars = directory name, remaining 38 = filename.

This is an optimization: instead of 300,000 files in one folder, Git can have 256 folders with ~1,172 files each. Much faster lookups!
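
The split is plain string slicing, so the on-disk path of any loose object can be derived from its hash. A small sketch using the blob from this step:

```shell
sha=9319a0a8769459fe40ef3849dd2b19b9b31d3f1b

dir=$(printf '%s' "$sha" | cut -c1-2)     # first 2 hex characters
file=$(printf '%s' "$sha" | cut -c3-)     # remaining 38 characters

object_path=".git/objects/$dir/$file"
echo "$object_path"
# → .git/objects/93/19a0a8769459fe40ef3849dd2b19b9b31d3f1b
```

Two hex characters give 16 × 16 = 256 possible directory names, which is where the 256 folders come from.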

Verify the Blob

bash
# Check object type
git cat-file -t 9319a0a8769459fe40ef3849dd2b19b9b31d3f1b
# Output: blob

# Check object contents
git cat-file -p 9319a0a8769459fe40ef3849dd2b19b9b31d3f1b
# Output: Hello from scratch repo!

Step 4: Add Blob to Index (Staging Area)

The blob exists, but it's not tracked yet:

bash
git status
# Shows: nothing to commit

📋 Deep Dive: What is the Staging Area (Index)?

🎯 The Index Explained

The Index (also called Staging Area or Cache) is a binary file at .git/index that acts as a bridge between your working directory and the next commit.

Think of it as a "draft" of your next commit!

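The Three Areas of Git

Git tracks content in three places: the working directory (files on disk), the index (the staged draft of the next commit), and the object database (blobs, trees, and commits under .git/objects).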

What Does the Index Store?

The index stores a list of entries, each containing:

| Field | Description | Example |
|---|---|---|
| File path | Where the file lives | src/hello.txt |
| Blob SHA | Hash of file contents | 9319a0a876... |
| File mode | Permissions | 100644 (regular file) |
| Timestamps | For detecting changes | mtime, ctime |
| File size | For quick comparison | 35 bytes |

┌─────────────────────────────────────────────────────────────┐
│                     .git/index (binary)                      │
├─────────────────────────────────────────────────────────────┤
│  Entry 1: hello.txt  → blob 9319a0a8... │ mode 100644       │
│  Entry 2: src/app.js → blob 4f2e8c1a... │ mode 100644       │
│  Entry 3: run.sh     → blob 7a8b9c0d... │ mode 100755       │
└─────────────────────────────────────────────────────────────┘

Why Does the Index Exist?

✅ Selective Staging
Commit only some changes, not everything
✅ Performance
Fast comparison without reading all files
✅ Atomic Commits
Prepare everything before committing
✅ Merge Staging
Handle conflicts before finalizing

🔧 Adding Our Blob to the Index

We need to add our blob to the index (staging area):

bash
# Add blob to index with a filename
git update-index --add --cacheinfo 100644 \
    9319a0a8769459fe40ef3849dd2b19b9b31d3f1b \
    hello.txt

Parameters explained:

  • --add: Add a new entry to the index
  • --cacheinfo: We're providing cache info directly (not from a file)
  • 100644: File mode (regular file, not executable)
  • 9319a0a...: The blob SHA we created earlier
  • hello.txt: The filename to associate with this blob

File Modes Reference

| Mode | Type | Description |
|---|---|---|
| 100644 | Regular file | Normal file (rw-r--r--) |
| 100755 | Executable | Script or binary (rwxr-xr-x) |
| 120000 | Symlink | Symbolic link |
| 040000 | Directory | Used in trees |

bash
# Check what happened
ls .git
# Output: HEAD  index  objects  refs

# The index file was created!

🤯 The "Deleted" File Mystery

bash
git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   hello.txt

Changes not staged for commit:
        deleted:    hello.txt    ← File doesn't exist in working dir!

What's Happening Here?

⚠️ Understanding the "Deleted" Status

Git is comparing THREE things:

  1. Index says: "hello.txt should exist with content SHA 9319a0a8..."
  2. Working directory says: "There's no file called hello.txt"
  3. Git concludes: "The file was deleted from working directory!"

The Three-Way Comparison

Git status actually compares:

| Comparison | What It Shows |
|---|---|
| HEAD vs Index | "Changes to be committed" |
| Index vs Working Dir | "Changes not staged for commit" |

┌─────────────────────────────────────────────────────────────────┐
│                        git status output                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  HEAD (last commit)     Index (staging)      Working Directory   │
│  ═══════════════════    ═══════════════      ════════════════    │
│  (no commits yet)   →   hello.txt exists  →  hello.txt MISSING   │
│                                                                  │
│  Result:                                                         │
│  • "new file: hello.txt" (HEAD→Index: file added)               │
│  • "deleted: hello.txt"  (Index→WD: file missing)               │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

This is Actually Normal!

This situation happens because:

  1. We created a blob (file contents in object database)
  2. We added an index entry (telling Git this blob is "hello.txt")
  3. But we never created the actual file on disk!
💡 Key Insight: The blob, index, and working directory are independent! You can have:
  • A blob without an index entry (orphaned object)
  • An index entry without a working file (our current situation)
  • A working file without an index entry (untracked file)
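
The "staged but missing on disk" state is easy to reproduce and to read in machine-friendly form. The sketch below uses a throwaway repository created with `git init` for brevity; `git status --porcelain` prints a two-letter code per path, the first letter for HEAD vs index and the second for index vs working directory.

```shell
cd "$(mktemp -d)" && git init -q .

# Blob + index entry, but no file on disk (the situation described above).
sha=$(echo 'Hello from scratch repo!' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$sha" hello.txt

# A = added to the index, D = deleted from the working directory.
state=$(git status --porcelain)
echo "$state"                    # AD hello.txt

# Paths that are in the index but have no file on disk.
missing=$(git ls-files --deleted)
echo "$missing"                  # hello.txt
```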

Fix: Create the Working Directory File

bash
# Extract blob contents to file
git cat-file -p 9319a0a8769459fe40ef3849dd2b19b9b31d3f1b > hello.txt

git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   hello.txt

Now we have:

  • ✅ Blob in object database
  • ✅ Entry in index
  • ✅ File in working directory

Step 5: Create a Tree

The index has our staged files. Now create a tree from it:

bash
git write-tree

Output:

5d602270f7e18bdf87859adc086fa0a90fb89e39

What happened?

bash
# Verify tree was created
tree .git/objects
.git/objects/
├── 5d/
│   └── 602270f7e18bdf87859adc086fa0a90fb89e39  ← New tree!
└── 93/
    └── 19a0a8769459fe40ef3849dd2b19b9b31d3f1b  ← Our blob
bash
# Inspect the tree
git cat-file -t 5d602270f7e18bdf87859adc086fa0a90fb89e39
# Output: tree

git cat-file -p 5d602270f7e18bdf87859adc086fa0a90fb89e39
# Output: 100644 blob 9319a0a8769459fe40ef3849dd2b19b9b31d3f1b    hello.txt


Step 6: Create a Commit

Now create a commit pointing to our tree:

bash
# Set an identity if you have none (stored in this repo's .git/config,
# so any global identity you already have is left alone)
git config user.email "you@example.com"
git config user.name "Your Name"

# Create commit
git commit-tree 5d602270f7e18bdf87859adc086fa0a90fb89e39 -m "Initial commit"

Output:

a52f8f31b84c2e5c0ea76bf21c9f57f30476af91

What happened?

bash
# Inspect the commit
git cat-file -p a52f8f31b84c2e5c0ea76bf21c9f57f30476af91
tree 5d602270f7e18bdf87859adc086fa0a90fb89e39
author Your Name <you@example.com> 1769447140 +0000
committer Your Name <you@example.com> 1769447140 +0000

Initial commit


Step 7: Update Branch Reference

The commit exists, but git status still says "No commits yet":

bash
git status
# Still shows: No commits yet

Why? Because the master branch doesn't point to anything yet!

bash
# Check refs/heads
ls .git/refs/heads/
# Empty!

Let's fix that:

bash
# Point master to our commit
echo "a52f8f31b84c2e5c0ea76bf21c9f57f30476af91" > .git/refs/heads/master

# Alternative (safer) way:
# git update-ref refs/heads/master a52f8f31b84c2e5c0ea76bf21c9f57f30476af91
bash
git status
On branch master
nothing to commit, working tree clean

🎉 SUCCESS! We created a complete commit without using git add or git commit!


📊 Complete Flow Visualization
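
One way to condense the whole walkthrough is a single script that captures each hash in a variable instead of copying it by hand. This is a sketch under a few assumptions of ours: it runs in a throwaway `mktemp -d` directory, and it supplies the identity through environment variables rather than `git config`.

```shell
cd "$(mktemp -d)"

# 1. Minimal repository skeleton (replaces git init).
mkdir -p .git/objects .git/refs/heads
echo 'ref: refs/heads/master' > .git/HEAD

# 2. Blob + index entry (replaces git add), then the working file itself.
blob=$(echo 'Hello from scratch repo!' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" hello.txt
git cat-file -p "$blob" > hello.txt

# 3. Tree, commit, branch ref (replaces git commit).
#    Identity comes from hypothetical environment values, not git config.
export GIT_AUTHOR_NAME='Your Name' GIT_AUTHOR_EMAIL='you@example.com'
export GIT_COMMITTER_NAME='Your Name' GIT_COMMITTER_EMAIL='you@example.com'
tree=$(git write-tree)
commit=$(git commit-tree "$tree" -m 'Initial commit')
git update-ref refs/heads/master "$commit"

git log --oneline                # one line: <short ID> Initial commit
git status --short               # prints nothing: working tree is clean
```

The commit hash differs on every run because the commit embeds the current timestamp; the blob and tree hashes depend only on content and stay the same.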


🎯 What We Learned

| Porcelain Command | Equivalent Plumbing |
|---|---|
| git init | mkdir -p .git/{objects,refs/heads} + create HEAD |
| git add file | git hash-object -w file + git update-index --add |
| git commit -m "msg" | git write-tree + git commit-tree + update refs |

📁 Final Repository Structure

scratch-repo/
├── hello.txt                          ← Working directory file
└── .git/
    ├── HEAD                           ← ref: refs/heads/master
    ├── index                          ← Binary staging area
    ├── objects/
    │   ├── 5d/
    │   │   └── 602270f7e18bdf...     ← Tree object
    │   ├── 93/
    │   │   └── 19a0a8769459fe...     ← Blob object
    │   └── a5/
    │       └── 2f8f31b84c2e5c...     ← Commit object
    └── refs/
        └── heads/
            └── master                 ← a52f8f31b84c2e5c...

🚀 What's Next?

🌿 Next: Working with Branches From Scratch (Part B)

Now that we can create commits manually, let's create and switch branches without using git branch or git checkout!

Continue to Part B →


📝 Quick Reference

Commands Used

bash
# Create blob
echo "content" | git hash-object --stdin -w

# Inspect object
git cat-file -t SHA    # type
git cat-file -p SHA    # content
git cat-file -s SHA    # size

# Update index
git update-index --add --cacheinfo MODE SHA FILENAME

# Create tree
git write-tree

# Create commit
git commit-tree TREE_SHA -m "message"
git commit-tree TREE_SHA -m "message" -p PARENT_SHA

# Update ref
git update-ref refs/heads/BRANCH COMMIT_SHA

File Modes

| Mode | Type |
|---|---|
| 100644 | Regular file |
| 100755 | Executable |
| 120000 | Symlink |
| 040000 | Directory (tree) |

Released under the MIT License.