How Git Works Internally
Understanding the .git Folder
What is the .git Folder?
The .git folder is Git's database and control center. When you run git init, Git creates this hidden directory in your project root. It contains everything Git needs to track your project's history, manage branches, store configuration, and maintain the integrity of your repository.
Why it exists:
- It's Git's complete repository state
- Contains all version history (even for deleted files)
- Stores configuration and metadata
- Enables Git to work offline (no server needed for basic operations)
Structure of the .git Directory
.git/
├── HEAD # Points to current branch
├── config # Repository-specific configuration
├── description # Repository description
├── hooks/ # Git hooks (pre-commit, post-commit, etc.)
├── info/ # Additional info (exclude patterns, etc.)
├── objects/ # Git's object database (THE HEART)
│ ├── [0-9a-f][0-9a-f]/ # Objects stored by hash prefix
│ └── pack/ # Packed objects for efficiency
├── refs/ # References (branches, tags)
│ ├── heads/ # Branch references
│ └── tags/ # Tag references
├── index # Staging area (binary file)
└── logs/ # Reflog history
    └── HEAD # History of HEAD movements
Key Components:
1. `objects/` - The object database where all your files, directories, and commits are stored
2. `refs/` - Pointers to commits (branches and tags)
3. `HEAD` - Points to the current branch (which points to a commit)
4. `index` - The staging area (what you've git added)
5. `config` - Repository settings
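To see this layout for yourself, here is a minimal sketch that creates a scratch repository and lists what git init produced. The temporary directory and the variable names (`repo`, `head_content`) are illustrative assumptions, and the exact set of files varies a little between Git versions.

```bash
# Create a throwaway repository (the path comes from mktemp, so no
# existing project is touched).
repo=$(mktemp -d)
cd "$repo" && git init -q

# List the top-level entries of the new .git directory.
ls -F .git

# HEAD is a plain text file naming the current branch,
# e.g. "ref: refs/heads/main" (the branch name depends on your configuration).
head_content=$(cat .git/HEAD)
echo "$head_content"
```

In a brand-new repository you will typically not see `index` or `logs/` yet. They appear once you stage a file and make the first commit.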
Git Objects: Blob, Tree, Commit
Git stores everything as objects in its object database. The three fundamental types are covered below; a fourth type, the tag object, is used only for annotated tags.
1. Blob (Binary Large Object)
What it is: A blob stores the contents of a file (but not the filename or directory structure).
Characteristics:
- Contains only file content (no metadata like filename)
- Identified by the SHA-1 hash of its content (prefixed with a short header giving the object type and size)
- Same content = same hash = same blob (deduplication!)
Example:
File: src/App.jsx
Content: "import React from 'react'..."
Blob hash: a1b2c3d4e5f6...
Key Insight: If you have 100 files with identical content, Git stores only ONE blob. This deduplication is one reason Git's storage is efficient.
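A quick way to check the deduplication claim, sketched in a scratch repository: two files with different names but identical content should produce the same blob ID and a single stored object. The file names and variables here are made up for illustration.

```bash
repo=$(mktemp -d)
cd "$repo" && git init -q

# Two files with different names but identical content.
printf 'hello world\n' > a.txt
printf 'hello world\n' > b.txt

# hash-object -w writes the blob into .git/objects and prints its ID.
hash_a=$(git hash-object -w a.txt)
hash_b=$(git hash-object -w b.txt)
echo "$hash_a"   # in a SHA-1 repository: 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
echo "$hash_b"   # same ID, because the content is identical

# Count the object files on disk: both files share a single blob.
object_count=$(find .git/objects -type f | wc -l)
echo "$object_count"
```

Note that the filename never enters the hash, which is why `a.txt` and `b.txt` map to the same object.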
2. Tree
What it is: A tree represents a directory snapshot - it lists which blobs and subtrees belong to a directory.
Characteristics:
- Contains references to blobs (files) and other trees (subdirectories)
- Stores filenames and permissions
- Represents the state of a directory at a point in time
Example:
Tree: abc123...
├── blob def456... "App.jsx" (file mode: 100644)
├── blob ghi789... "index.css" (file mode: 100644)
└── tree jkl012... "components" (file mode: 040000)
    ├── blob mno345... "Hero.jsx"
    └── blob pqr678... "About.jsx"
Key Insight: Trees create the directory structure. Each commit points to a root tree, which recursively points to all files in your project.
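The following sketch builds a small tree by hand and prints it, assuming a scratch repository with illustrative file names. `git write-tree` converts the current index into tree objects, and `git ls-tree` lists a tree's entries.

```bash
repo=$(mktemp -d)
cd "$repo" && git init -q

mkdir components
printf 'app\n'  > App.jsx
printf 'hero\n' > components/Hero.jsx
git add App.jsx components/Hero.jsx

# write-tree turns the index into tree objects and prints the root tree ID.
root_tree=$(git write-tree)

# Each entry is: <mode> <type> <object ID> <name>
git ls-tree "$root_tree"

# The subdirectory is its own tree object, referenced by name from the root.
sub_tree=$(git rev-parse "$root_tree:components")
git ls-tree "$sub_tree"
```

The root listing should show one `100644 blob` entry for App.jsx and one `040000 tree` entry for components.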
3. Commit
What it is: A commit is a snapshot of your entire project at a specific point in time.
Contains:
- Pointer to a tree (the root directory snapshot)
- Pointer to parent commit(s) - creates the history chain
- Author and committer information
- Commit message
- Timestamp
Example:
Commit: xyz789...
Tree: abc123... (root tree)
Parent: uvw456... (previous commit)
Author: John Doe <john@example.com>
Date: 2024-01-15 10:30:00
Message: "Add new feature."
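Key Insight: A commit is a complete snapshot, not a diff. Git computes diffs on demand by comparing trees. Packfiles may delta-compress objects on disk, but that is a storage optimization: logically, every commit still refers to a full snapshot.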
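You can inspect a raw commit object to confirm these fields. This is a sketch in a scratch repository; the author identity and file name are placeholders.

```bash
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name  "Example Author"       # placeholder identity
git config user.email "author@example.com"

printf 'v1\n' > notes.txt
git add notes.txt && git commit -q -m "First commit"
printf 'v2\n' > notes.txt
git add notes.txt && git commit -q -m "Second commit"

# Print the raw commit: tree, parent, author, committer, blank line, message.
git cat-file -p HEAD

# The same fields, resolved individually.
commit_tree=$(git rev-parse 'HEAD^{tree}')   # root tree of the snapshot
parent=$(git rev-parse 'HEAD^')              # previous commit
```

The first commit in a repository has no `parent` line at all, since there is nothing before it.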
Visual Relationship
Commit
│
├─→ Tree (root directory)
│ │
│ ├─→ Blob (file1.js)
│ ├─→ Blob (file2.css)
│ └─→ Tree (subdirectory)
│ │
│ ├─→ Blob (file3.js)
│ └─→ Blob (file4.js)
│
└─→ Parent Commit
│
└─→ (previous tree and history)
How Git Tracks Changes
The Three States of Git
Git has three main areas where files can exist:
1. Working Directory - Your actual files on disk
2. Staging Area (Index) - Files prepared for the next commit
3. Repository (.git/objects) - Committed snapshots
Working Directory → Staging Area → Repository
(modified) (staged) (committed)
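As a rough illustration, the sketch below walks one file through all three states and records what `git status --porcelain` reports at each step. The repository, identity, and file name are assumptions made for the example.

```bash
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name  "Example Author"       # placeholder identity
git config user.email "author@example.com"

printf 'one\n' > file.txt
git add file.txt && git commit -q -m "Initial"

# Modified: the working directory differs from the index.
printf 'two, with more text\n' > file.txt
state_modified=$(git status --porcelain)     # " M file.txt"

# Staged: the index differs from the last commit.
git add file.txt
state_staged=$(git status --porcelain)       # "M  file.txt"

# Committed: all three areas agree, so the output is empty.
git commit -q -m "Update"
state_committed=$(git status --porcelain)

printf '[%s]\n[%s]\n[%s]\n' "$state_modified" "$state_staged" "$state_committed"
```

In the two-character status code, the first column describes the index and the second describes the working directory.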
How Git Detects Changes
Git doesn't track changes by watching files. Instead:
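1. It records each tracked file's blob hash in the index, along with cached metadata such as size and modification time
2. When you run a command like git status, it compares the files on disk against those entries, hashing file content where the cached metadata no longer matches
3. It stores new objects only when content changes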
Example Flow:
1. You edit src/App.jsx
→ The index still records the old blob hash: abc123...
→ Git hashes the file on disk: def456...
→ Hashes differ → File changed!
2. You run git add src/App.jsx
→ Git writes a new blob with hash def456... (the old blob is left untouched)
→ Updates index to point to new blob
3. You run git commit
→ Git creates a new tree with the updated reference
→ Creates a new commit pointing to the new tree
→ Updates the current branch (which HEAD points to) to the new commit
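The comparison in step 1 can be reproduced by hand. This sketch, using a hypothetical `app.txt` in a scratch repository, compares the blob ID stored in the index with the hash of the file currently on disk. It is only an approximation of what Git does internally, since Git first consults cached file metadata before hashing anything.

```bash
repo=$(mktemp -d)
cd "$repo" && git init -q

printf 'original\n' > app.txt
git add app.txt

# Edit the file after staging it.
printf 'edited content\n' > app.txt

# Blob ID recorded in the index vs. hash of the content now on disk.
index_hash=$(git ls-files --stage app.txt | awk '{print $2}')
disk_hash=$(git hash-object app.txt)

if [ "$index_hash" != "$disk_hash" ]; then
  echo "app.txt changed since it was staged"
fi

# Staging again writes the new blob and updates the index entry.
git add app.txt
index_hash_after=$(git ls-files --stage app.txt | awk '{print $2}')
```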
The Index (Staging Area)
The index is a binary file (`.git/index`) that tracks:
- Every file that will be part of the next commit (not only the ones you just staged)
- Which blob each of those files points to
- File metadata (mode, size, timestamps)
Key Point: The index is like a "proposed next commit" - it's what your commit will look like.
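One way to test the "proposed next commit" idea: ask Git to build a tree from the index before committing, then compare it with the tree the commit actually records. The sketch below assumes a scratch repository with made-up files and a placeholder identity.

```bash
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name  "Example Author"       # placeholder identity
git config user.email "author@example.com"

printf 'alpha\n' > a.txt
printf 'beta\n'  > b.txt
git add a.txt b.txt

# Index entries: <mode> <blob ID> <stage number> <path>
git ls-files --stage

# The tree the index would produce if committed right now.
proposed_tree=$(git write-tree)

git commit -q -m "Add a and b"
committed_tree=$(git rev-parse 'HEAD^{tree}')

echo "$proposed_tree"
echo "$committed_tree"   # identical to the proposed tree
```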
## The Internal Flow of Git Commands
What Happens During git add
┌─────────────────────────────────────────┐
│ git add src/App.jsx │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 1. Read file content from disk │
│ 2. Compute the SHA-1 hash of the content │
│ 3. Compress content (zlib) │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 4. Store blob in .git/objects/ │
│ Location: .git/objects/ab/c123... │
│ (first 2 chars = directory) │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 5. Update .git/index │
│ - Add/update entry for App.jsx │
│ - Point to blob hash │
│ - Store file metadata │
└─────────────────────────────────────────┘
What Actually Happens:
1. Git reads the file content
2. Creates a blob object (content + header)
3. Computes SHA-1 hash
4. Stores blob in .git/objects/[first-2-chars]/[remaining-chars]
5. Updates the index file to reference this blob
Important: If the file content hasn't changed, Git reuses the existing blob (same hash = same object).
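The steps above can be approximated with Git's low-level "plumbing" commands. This is a sketch of roughly what `git add src/App.jsx` does, not its actual implementation; the scratch repository and file content are assumptions.

```bash
repo=$(mktemp -d)
cd "$repo" && git init -q
mkdir src
printf 'console.log("hi");\n' > src/App.jsx

# Steps 1-4: hash the file and write the compressed blob into .git/objects.
blob=$(git hash-object -w src/App.jsx)

# The object lives at .git/objects/<first 2 hex chars>/<remaining chars>.
object_path=".git/objects/$(printf '%s' "$blob" | cut -c1-2)/$(printf '%s' "$blob" | cut -c3-)"
ls -l "$object_path"

# Step 5: record mode, blob ID, and path in the index.
git update-index --add --cacheinfo 100644,"$blob",src/App.jsx

# The index now has an entry pointing at the blob.
git ls-files --stage
```

After this, `git status` reports src/App.jsx as a new staged file, just as it would after a regular `git add`.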
What Happens During git commit
┌─────────────────────────────────────────┐
│ git commit -m "Update App.jsx." │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 1. Read current index │
│ 2. Build tree objects from the index │
│ - Reuse blobs written by git add │
│ - Create trees for directories │
│ - Link everything together │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 3. Create a commit object │
│ - Point to the root tree │
│ - Point to parent commit(s) │
│ - Add author, message, timestamp │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 4. Compute commit hash │
│ 5. Store commit in .git/objects/ │
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 6. Update current branch reference │
│ - Update .git/refs/heads/main │
│ - Point to the new commit hash │
└─────────────────────────────────────────┘
Detailed Steps:
1. Build Trees:
- Start from the root directory
- For each file in the index, ensure blob exists
- Create tree objects for each directory
- Link trees together hierarchically
2. Create Commit:
- Read current HEAD to get parent commit
- Create a commit object with:
- Tree hash (root tree)
- Parent commit hash(es)
- Author info
- Committer info
- Commit message
- Timestamp
3. Store and Update:
- Compute SHA-1 of commit object
- Store in .git/objects/
- Update .git/refs/heads/[branch-name] to point to the new commit
- .git/HEAD itself changes only in detached HEAD state, where it stores the commit hash directly
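The same sequence can be sketched with plumbing commands: `git write-tree` builds the trees, `git commit-tree` creates the commit object, and `git update-ref` moves the branch. This approximates what `git commit` does under the stated assumptions (scratch repository, placeholder identity, illustrative file) and skips extras such as hooks.

```bash
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name  "Example Author"       # placeholder identity
git config user.email "author@example.com"

# First commit: no parent.
printf 'hello\n' > README
git add README
tree=$(git write-tree)
commit=$(git commit-tree "$tree" -m "Initial commit")

# HEAD names the current branch, e.g. refs/heads/main; point it at the commit.
branch_ref=$(git symbolic-ref HEAD)
git update-ref "$branch_ref" "$commit"

# Second commit: same steps, plus -p to record the parent.
printf 'more\n' >> README
git add README
tree2=$(git write-tree)
commit2=$(git commit-tree "$tree2" -p "$commit" -m "Second commit")
git update-ref "$branch_ref" "$commit2"

git log --oneline
```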
Visual Flow Diagram
┌──────────────┐
│ Working Dir │
│ App.jsx │
│ (modified) │
└──────┬───────┘
│ git add
▼
┌──────────────┐ ┌──────────────┐
│ Index │────▶│ Blob │
│ App.jsx │ │ (stored) │
│ (staged) │ │ hash: abc │
└──────┬───────┘ └──────────────┘
│ git commit
▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Tree │────▶│ Commit │────▶│ Branch │
│ (root dir) │ │ (snapshot) │ │ (refs/...) │
│ hash: def │ │ hash: xyz │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
How Git Uses Hashes for Integrity
SHA-1 Hashing
Git uses SHA-1 (Secure Hash Algorithm 1) to create unique identifiers for every object.
How it works:
- Input: Object content (blob, tree, or commit)
- Process: SHA-1 algorithm computes a 160-bit hash
- Output: 40-character hexadecimal string (e.g., a1b2c3d4e5f6...)
Properties:
- Deterministic: Same content always produces the same hash
- Unique: Different content produces different hash (with extremely high probability)
- One-way: Cannot reverse hash to get original content
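To make the input concrete: for a blob, Git hashes a short header (`blob`, a space, the byte count, and a NUL byte) followed by the content. The sketch below recomputes a blob ID without Git at all. It assumes the `sha1sum` utility is available (on macOS, `shasum` is the usual substitute), and the variable names are illustrative.

```bash
content='hello world'

# Byte length of the content, including the trailing newline added below.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash "blob <size>\0<content>\n" exactly as Git does for a blob.
object_id=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)

echo "$size"        # 12
echo "$object_id"   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

In a SHA-1 repository this is the ID that `git hash-object` reports for a file containing `hello world` followed by a newline.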
Integrity Guarantees
1. Content Addressing:
- Objects are stored by their hash, not by filename
- If content changes, hash changes → new object
- Same content = same hash = same object (automatic deduplication)
2. Tamper Detection:
- If someone modifies an object, its hash changes
- Git can detect corruption by verifying hashes
- Commits reference objects by hash, so any change breaks the chain
3. Chain of Trust:
Commit (hash: xyz)
│
├─→ Points to Tree (hash: abc)
│ │
│ └─→ Points to Blob (hash: def)
│
└─→ Points to Parent Commit (hash: uvw)
- If any object is modified, its hash changes
- Every tree or commit that references the old hash no longer matches
- Git detects the corruption
Example: Detecting Corruption
Original:
Commit abc123... → Tree def456... → Blob ghi789...
If blob is modified:
Blob hash changes: ghi789... → jkl012...
Tree still references: ghi789... (WRONG!)
Git detects: "Object ghi789... not found or corrupted"
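You can simulate this in a scratch repository by overwriting a stored object and running `git fsck`, which verifies objects against their hashes. This is a sketch with made-up file names and a placeholder identity; the exact error text printed by `git fsck` differs between Git versions, so the example checks only its exit status.

```bash
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name  "Example Author"       # placeholder identity
git config user.email "author@example.com"

printf 'important data\n' > data.txt
git add data.txt && git commit -q -m "Add data"

# Locate the blob's file inside the object database.
blob=$(git rev-parse HEAD:data.txt)
object_path=".git/objects/$(printf '%s' "$blob" | cut -c1-2)/$(printf '%s' "$blob" | cut -c3-)"

if git fsck >/dev/null 2>&1; then before="clean"; else before="corrupt"; fi

# Simulate disk corruption (object files are read-only, so allow writing first).
chmod u+w "$object_path"
printf 'garbage' > "$object_path"

if git fsck >/dev/null 2>&1; then after="clean"; else after="corrupt"; fi
echo "before: $before, after: $after"
```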
Hash Collisions
SHA-1 has known vulnerabilities (a practical collision was demonstrated in 2017), but:
- Newer Git versions can create SHA-256 repositories (git init --object-format=sha256)
- Accidental collisions are extremely unlikely in practice
- Git uses a hardened SHA-1 implementation that detects the known collision attacks
Building a Mental Model
Core Concepts
1. Git is a Content-Addressed File System
- Files are stored by their content hash, not by name
- This enables deduplication and integrity checking
2. Everything is an Object
- Files → Blobs
- Directories → Trees
- Snapshots → Commits
- All stored in .git/objects/
3. Commits are Snapshots, Not Diffs
- Each commit contains a full tree
- Diffs are computed on-demand by comparing trees
- This makes checking out any commit fast: Git reads its tree directly instead of replaying a chain of diffs
4. References are Pointers
- Branches are just pointers to commits
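- Lightweight tags are pointers to commits; annotated tags point to a tag object that in turn points to a commit
- HEAD normally points to the current branch (or directly to a commit when detached)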
- Moving a branch = updating a pointer
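The pointer chain from item 4 can be followed by hand. The sketch below assumes a scratch repository using Git's default on-disk ref storage, where each branch is a small text file; refs can also be stored in `.git/packed-refs`, in which case the `cat` commands on ref files will not find them. Names and identity are placeholders.

```bash
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name  "Example Author"       # placeholder identity
git config user.email "author@example.com"

printf 'hello\n' > file.txt
git add file.txt && git commit -q -m "Initial"

# HEAD -> branch -> commit
branch_ref=$(git symbolic-ref HEAD)          # e.g. refs/heads/main
tip=$(git rev-parse HEAD)
cat .git/HEAD                                # ref: refs/heads/<branch>
cat ".git/$branch_ref"                       # the commit ID the branch points to

# Creating a branch just adds another pointer to the same commit.
git branch feature
feature_tip=$(git rev-parse feature)
cat .git/refs/heads/feature
```

No objects are copied when the branch is created, which is why branching is cheap.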
Mental Model Diagram
┌─────────────┐
│ HEAD │
│ (pointer) │
└──────┬──────┘
│
▼
┌─────────────┐
│ main │
│ (branch) │
└──────┬──────┘
│
▼
┌─────────────┐
│ Commit C │
│ hash: xyz │
└──────┬──────┘
│
┌──────────────────┼──────────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Tree │ │ Parent │ │ Author │
│ (root) │ │ Commit │ │ Info │
└────┬────┘ └─────────┘ └─────────┘
│
▼
┌─────────┐
│ Tree │
│ (dir) │
└────┬────┘
│
┌────┴────┬──────────┬──────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐
│Blob │ │Blob │ │Tree │ │Blob │
│file1│ │file2│ │dir/ │ │file3│
└─────┘ └─────┘ └─────┘ └─────┘
Key Takeaways
1. `.git/` is the repository - Everything Git needs is here
2. Objects are immutable - Once created, they never change
3. Hashes ensure integrity - Any modification is detectable
4. References are lightweight - Branches are just files with commit hashes
5. History is a chain - Commits link to parents, creating a graph
Practical Understanding
When you understand Git internally:
- `git add` = Write blobs + update index
- `git commit` = Create trees + create commit + update branch pointer
- `git branch <name>` = Create a new ref under .git/refs/heads/
- `git checkout` = Update HEAD + update working directory from tree
- `git merge` = Create a commit with multiple parents (or just move the branch pointer on a fast-forward)
- `git clone` = Copy the remote's objects and refs into a new .git/ folder + checkout working directory
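As one example from this list, the sketch below creates two diverging branches and merges them, then counts the `parent` lines in the resulting commit. The branch name `feature`, the file names, and the identity are illustrative assumptions.

```bash
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name  "Example Author"       # placeholder identity
git config user.email "author@example.com"

printf 'base\n' > base.txt
git add base.txt && git commit -q -m "Base"
main_branch=$(git symbolic-ref --short HEAD) # whatever the default branch is called

# Diverge: one commit on a feature branch, one on the original branch.
git checkout -q -b feature
printf 'feature\n' > feature.txt
git add feature.txt && git commit -q -m "Feature work"

git checkout -q "$main_branch"
printf 'main\n' > main.txt
git add main.txt && git commit -q -m "Main work"

# Merge: the new commit records both branch tips as parents.
git merge -q --no-ff -m "Merge feature" feature
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "$parent_count"   # 2
```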
Summary
Git's internal design is elegant and powerful:
- Content-addressable storage enables deduplication and integrity
- Immutable objects create a reliable history
- Hash-based references ensure data integrity
- Simple object model (blob, tree, commit) scales to complex projects
Understanding these internals helps you:
- Debug Git issues more effectively
- Use Git commands with confidence
- Appreciate Git's design decisions
- Build better mental models of version control
Remember: Git is not magic - it's a well-designed content-addressable file system with a version control layer on top. Once you understand the objects and references, everything else makes sense!
Further Exploration
To see Git internals in action:
```bash
# View the object database
ls -la .git/objects/
# View a specific object
git cat-file -p <hash>
# View the index
git ls-files --stage
# View a tree
git ls-tree <tree-hash>
# View a commit
git cat-file -p HEAD
# View all refs
ls -la .git/refs/heads/
```
Pro Tip: Use `git cat-file -t <hash>` to see the type of any object (blob, tree, commit, or tag).




