byte2code.me

Understanding the .git Folder

What is the .git Folder?

The .git folder is Git's database and control center. When you run git init, Git creates this hidden directory in your project root. It contains everything Git needs to track your project's history, manage branches, store configuration, and maintain the integrity of your repository.

Why it exists:

- It's Git's complete repository state

- Contains all version history (even for deleted files)

- Stores configuration and metadata

- Enables Git to work offline (no server needed for basic operations)

Structure of the .git Directory

.git/
├── HEAD            # Points to current branch
├── config          # Repository-specific configuration
├── description     # Repository description
├── hooks/          # Git hooks (pre-commit, post-commit, etc.)
├── info/           # Additional info (exclude patterns, etc.)
├── objects/        # Git's object database (THE HEART)
│   ├── [0-9a-f][0-9a-f]/   # Objects stored by hash prefix
│   └── pack/               # Packed objects for efficiency
├── refs/           # References (branches, tags)
│   ├── heads/      # Branch references
│   └── tags/       # Tag references
├── index           # Staging area (binary file)
└── logs/           # Reflog history
    └── HEAD        # History of HEAD movements

Key Components:

1. `objects/` - The object database where all your files, directories, and commits are stored

2. `refs/` - Pointers to commits (branches and tags)

3. `HEAD` - Points to the current branch (which points to a commit)

4. `index` - The staging area (what you've git added)

5. `config` - Repository settings

Git Objects: Blob, Tree, Commit

Git stores everything as objects in its object database. The three types you work with most are blobs, trees, and commits (a fourth type, the tag object, backs annotated tags):

1. Blob (Binary Large Object)

What it is: A blob stores the contents of a file (but not the filename or directory structure).

Characteristics:

- Contains only file content (no metadata like filename)

- Identified by the SHA-1 hash of its content (plus a short type-and-size header)

- Same content = same hash = same blob (deduplication!)

Example:

File: src/App.jsx

Content: "import React from 'react'..."

Blob hash: a1b2c3d4e5f6...

Key Insight: If you have 100 files with identical content, Git stores only ONE blob. This deduplication is one reason Git's storage is so efficient.
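
To make the deduplication concrete, here is a minimal Python sketch of how a blob's ID can be derived. It assumes the loose-object scheme of a `blob <size>` header, a NUL byte, then the raw content. The `blob_hash` helper and the file names are made up for illustration and are not part of Git.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Three hypothetical files; two of them have identical content.
files = {
    "src/App.jsx": b"test content\n",
    "src/Copy.jsx": b"test content\n",
    "README.md": b"something else\n",
}

ids = {path: blob_hash(data) for path, data in files.items()}
unique_blobs = set(ids.values())

print(ids["src/App.jsx"])  # same ID as src/Copy.jsx
print(len(unique_blobs))   # 2 blobs for 3 files
```

If Git is installed, `git hash-object <file>` should print the same ID for a file with the same bytes (assuming no line-ending conversion or filters are configured).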

2. Tree

What it is: A tree represents a directory snapshot - it lists which blobs and subtrees belong to a directory.

Characteristics:

- Contains references to blobs (files) and other trees (subdirectories)

- Stores filenames and permissions

- Represents the state of a directory at a point in time

Example:

Tree: abc123...
├── blob def456... "App.jsx" (file mode: 100644)
├── blob ghi789... "index.css" (file mode: 100644)
└── tree jkl012... "components" (file mode: 040000)
    ├── blob mno345... "Hero.jsx"
    └── blob pqr678... "About.jsx"

Key Insight: Trees create the directory structure. Each commit points to a root tree, which recursively points to all files in your project.
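
A rough Python sketch of how a tree object could be assembled may help here. It assumes the usual binary layout (each entry is `<mode> <name>`, a NUL byte, then the 20 raw bytes of the referenced ID, with entries sorted by name). The helper names `object_id` and `tree_body` and the file contents are invented for this example.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: type, size, NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_id) entries into a tree object's body."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, treating directories as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry: "<mode> <name>\0" followed by the 20 raw bytes of the ID.
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(hex_id)
    return body


app = object_id("blob", b"import React from 'react'\n")
css = object_id("blob", b"body { margin: 0; }\n")

components = object_id("tree", tree_body([("100644", "Hero.jsx", app)]))
root = object_id("tree", tree_body([
    ("100644", "App.jsx", app),
    ("100644", "index.css", css),
    ("40000", "components", components),
]))
renamed_root = object_id("tree", tree_body([
    ("100644", "Main.jsx", app),  # same blob, different filename
    ("100644", "index.css", css),
    ("40000", "components", components),
]))

print(root != renamed_root)  # True: the name lives in the tree, not the blob
```

Renaming a file changes only the tree that lists it; the blob is reused untouched, which is why the filename is not part of the blob.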

3. Commit

What it is: A commit is a snapshot of your entire project at a specific point in time.

Contains:

- Pointer to a tree (the root directory snapshot)

- Pointer to parent commit(s) - creates the history chain

- Author and committer information

- Commit message

- Timestamp

Example:

Commit: xyz789...

Tree: abc123... (root tree)

Parent: uvw456... (previous commit)

Author: John Doe <john@example.com>

Date: 2024-01-15 10:30:00

Message: "Add new feature."

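Key Insight: A commit is a complete snapshot, not a diff. Git computes diffs on demand by comparing trees; conceptually, it stores full snapshots (packfiles may delta-compress objects behind the scenes, but that is a storage detail).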
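
As a sketch of what a commit object looks like on the inside, the Python below builds the plain-text body and hashes it. It assumes the common layout (`tree`, optional `parent` lines, `author`, `committer`, a blank line, then the message). The `commit_id` helper is hypothetical, and author and committer are set to the same person to keep it short.

```python
import hashlib


def commit_id(tree, parents, author, timestamp, message):
    """Build a commit object's text and return (object_id, text)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    # Author and committer are the same person here for simplicity.
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = f"commit {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest(), body.decode("utf-8")


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries
who = "John Doe <john@example.com>"
when = 1705314600  # 2024-01-15 10:30:00 UTC as a Unix timestamp

first, text = commit_id(EMPTY_TREE, [], who, when, "Add new feature.")
second, _ = commit_id(EMPTY_TREE, [first], who, when, "Add new feature.")

print(text)
print(first != second)  # True: same tree and message, but a different parent
```

Because the parent hash is part of the hashed text, two commits with identical trees and messages still get different IDs once their history differs.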

Visual Relationship

Commit
│
├─→ Tree (root directory)
│   │
│   ├─→ Blob (file1.js)
│   ├─→ Blob (file2.css)
│   └─→ Tree (subdirectory)
│       │
│       ├─→ Blob (file3.js)
│       └─→ Blob (file4.js)
│
└─→ Parent Commit
    │
    └─→ (previous tree and history)

How Git Tracks Changes

The Three States of Git

Git has three main areas where files can exist:

1. Working Directory - Your actual files on disk

2. Staging Area (Index) - Files prepared for the next commit

3. Repository (.git/objects) - Committed snapshots

Working Directory → Staging Area → Repository
   (modified)         (staged)     (committed)

How Git Detects Changes

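Git doesn't track changes by watching files. Instead, when you run a command such as git status:

1. It uses the file metadata cached in the index (size, timestamps) as a quick check for files that may have changed

2. It computes hashes of file contents and compares them to detect modifications

3. It stores new objects only when content changes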

Example Flow:

1. You edit src/App.jsx

→ The index records the old hash: old_hash = abc123...

→ Git hashes the current content: new_hash = def456...

→ Hashes differ → File changed!

2. You run git add src/App.jsx

→ Git creates/updates blob with hash def456...

→ Updates index to point to new blob

3. You run git commit

→ Git creates a new tree with the updated reference

→ Creates a new commit pointing to the new tree

→ Moves the current branch (the one HEAD points to) to the new commit
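
The change-detection idea in this flow can be imitated in a few lines of Python. This is a toy model, not Git's implementation: the `index` dictionary stands in for `.git/index`, `modified_paths` is a hypothetical helper, and every file is rehashed on each check for simplicity.

```python
import hashlib
import tempfile
from pathlib import Path


def blob_hash(content: bytes) -> str:
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def modified_paths(workdir: Path, index: dict) -> list:
    """Return tracked paths whose current content no longer matches the index."""
    changed = []
    for rel_path, staged_hash in sorted(index.items()):
        current = blob_hash((workdir / rel_path).read_bytes())
        if current != staged_hash:
            changed.append(rel_path)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    (workdir / "App.jsx").write_bytes(b"v1\n")
    (workdir / "index.css").write_bytes(b"body {}\n")

    # Like "git add": record the hash of each file as it is right now.
    index = {name: blob_hash((workdir / name).read_bytes())
             for name in ("App.jsx", "index.css")}
    before = modified_paths(workdir, index)

    (workdir / "App.jsx").write_bytes(b"v2\n")  # edit one file
    after = modified_paths(workdir, index)

print(before)  # []
print(after)   # ['App.jsx']
```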

The Index (Staging Area)

The index is a binary file (`.git/index`) that tracks:

- Which files are tracked and staged

- Which blob each staged file points to

- File metadata (permissions, timestamps)

Key Point: The index is like a "proposed next commit" - it's what your commit will look like.
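
If you are curious what "binary file" means here, the sketch below reads just the first 12 bytes of an index file: a `DIRC` signature, a version number, and the entry count, all big-endian. The `read_index_header` function is a hypothetical helper, and the sample bytes are synthetic rather than taken from a real repository.

```python
import struct


def read_index_header(data: bytes):
    """Parse the 12-byte header at the start of an index file."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


# Synthetic header for illustration: version 2, three entries.
sample = struct.pack(">4sII", b"DIRC", 2, 3)
print(read_index_header(sample))  # (2, 3)

# Against a real repository (assumes you run this from the project root):
# from pathlib import Path
# print(read_index_header(Path(".git/index").read_bytes()))
```

The entries that follow the header are more involved, so `git ls-files --stage` remains the practical way to inspect them.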

The Internal Flow of Git Commands

What Happens During `git add`

┌─────────────────────────────────────────┐

│ git add src/App.jsx │

└─────────────────────────────────────────┘

│

▼

┌─────────────────────────────────────────┐

│ 1. Read file content from disk │

│ 2. Compute the SHA-1 hash of the content │

│ 3. Compress content (zlib) │

└─────────────────────────────────────────┘

│

▼

┌─────────────────────────────────────────┐

│ 4. Store blob in .git/objects/ │

│ Location: .git/objects/ab/c123... │

│ (first 2 chars = directory) │

└─────────────────────────────────────────┘

│

▼

┌─────────────────────────────────────────┐

│ 5. Update .git/index │

│ - Add/update entry for App.jsx │

│ - Point to blob hash │

│ - Store file metadata │

└─────────────────────────────────────────┘

What Actually Happens:

1. Git reads the file content

2. Creates a blob object (content + header)

3. Computes SHA-1 hash

4. Stores blob in .git/objects/[first-2-chars]/[remaining-chars]

5. Updates the index file to reference this blob

Important: If the file content hasn't changed, Git reuses the existing blob (same hash = same object).
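
Steps 1 to 4 above can be sketched in Python as follows. This is an approximation under stated assumptions: a temporary directory stands in for `.git/objects`, `write_blob` is a made-up helper, and the index update from step 5 is left out.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_blob(objects_dir: Path, content: bytes) -> str:
    """Store content as a loose blob under objects_dir and return its ID."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    # First 2 hex characters name the directory, the remaining 38 the file.
    path = objects_dir / object_id[:2] / object_id[2:]
    if not path.exists():  # same content -> same path -> nothing to write
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return object_id


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"  # stand-in for .git/objects
    first = write_blob(objects_dir, b"test content\n")
    second = write_blob(objects_dir, b"test content\n")  # reuses the same object
    stored = objects_dir / first[:2] / first[2:]
    relative = stored.relative_to(objects_dir).as_posix()
    round_trip = zlib.decompress(stored.read_bytes())
    object_count = sum(1 for p in objects_dir.rglob("*") if p.is_file())

print(relative)      # d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
print(round_trip)    # b'blob 13\x00test content\n'
print(object_count)  # 1
```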

What Happens During `git commit`

┌─────────────────────────────────────────┐

│ git commit -m "Update App.jsx." │

└─────────────────────────────────────────┘

│

▼

┌─────────────────────────────────────────┐

│ 1. Read current index │

│ 2. Build tree objects from the index │

│ - Reuse blobs written by git add │

│ - Create trees for directories │

│ - Link everything together │

└─────────────────────────────────────────┘

│

▼

┌─────────────────────────────────────────┐

│ 3. Create a commit object │

│ - Point to the root tree │

│ - Point to parent commit(s) │

│ - Add author, message, timestamp │

└─────────────────────────────────────────┘

│

▼

┌─────────────────────────────────────────┐

│ 4. Compute commit hash │

│ 5. Store commit in .git/objects/ │

└─────────────────────────────────────────┘

│

▼

┌─────────────────────────────────────────┐

│ 6. Update the current branch │

│ - Update .git/refs/heads/main │

│ - Point to the new commit hash │

└─────────────────────────────────────────┘

Detailed Steps:

1. Build Trees:

- Start from the root directory

- For each file in the index, look up its blob (already written by git add)

- Create tree objects for each directory

- Link trees together hierarchically

2. Create Commit:

- Read current HEAD to get parent commit

- Create a commit object with:

- Tree hash (root tree)

- Parent commit hash(es)

- Author info

- Committer info

- Commit message

- Timestamp

3. Store and Update:

- Compute SHA-1 of commit object

- Store in .git/objects/

- Update .git/refs/heads/[branch-name] to point to the new commit

- Update .git/HEAD directly only when HEAD is detached (not on a branch)
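
The final "update" step is mostly a small file write. Here is a simplified Python model of it, assuming loose ref files only (no packed refs, no locking, no reflog). The `advance_head` helper, the temporary directory, and the placeholder hash are all invented for the example.

```python
import tempfile
from pathlib import Path


def advance_head(git_dir: Path, new_commit: str) -> str:
    """Record new_commit as the tip of whatever HEAD currently designates."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Usual case: HEAD names a branch, so only the branch file changes.
        target = git_dir / head[len("ref: "):]
    else:
        # Detached HEAD: HEAD itself holds a commit hash.
        target = git_dir / "HEAD"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(new_commit + "\n")
    return target.relative_to(git_dir).as_posix()


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)  # stand-in for a .git directory
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")

    new_commit = "a" * 40  # placeholder hash, not a real commit
    updated = advance_head(git_dir, new_commit)
    head_after = (git_dir / "HEAD").read_text().strip()
    branch_after = (git_dir / "refs/heads/main").read_text().strip()

print(updated)     # refs/heads/main
print(head_after)  # ref: refs/heads/main
```

Notice that HEAD itself is unchanged after the commit; only the branch file it names now holds the new hash.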

Visual Flow Diagram

┌──────────────┐

│ Working Dir │

│ App.jsx │

│ (modified) │

└──────┬───────┘

│ git add

▼

┌──────────────┐ ┌──────────────┐

│ Index │────▶│ Blob │

│ App.jsx │ │ (stored) │

│ (staged) │ │ hash: abc │

└──────┬───────┘ └──────────────┘

│ git commit

▼

┌──────────────┐ ┌──────────────┐ ┌──────────────┐

│ Tree │────▶│ Commit │────▶│ Branch │

│ (root dir) │ │ (snapshot) │ │ (refs/...) │

│ hash: def │ │ hash: xyz │ │ │

└──────────────┘ └──────────────┘ └──────────────┘

How Git Uses Hashes for Integrity

SHA-1 Hashing

Git uses SHA-1 (Secure Hash Algorithm 1) to create unique identifiers for every object.

How it works:

- Input: Object content (blob, tree, or commit) prefixed with a type-and-size header

- Process: SHA-1 algorithm computes a 160-bit hash

- Output: 40-character hexadecimal string (e.g., a1b2c3d4e5f6...)

Properties:

- Deterministic: Same content always produces the same hash

- Unique: Different content produces different hash (with extremely high probability)

- One-way: Cannot reverse hash to get original content
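
These properties are easy to check with Python's standard `hashlib` module. The snippet hashes raw bytes directly, without Git's object header, and the sample strings are arbitrary.

```python
import hashlib

data = b"import React from 'react'\n"

digest = hashlib.sha1(data)
print(digest.digest_size * 8)   # 160 (bits)
print(len(digest.hexdigest()))  # 40 (hex characters)

# Deterministic: hashing the same bytes again gives the same result.
same = hashlib.sha1(data).hexdigest() == digest.hexdigest()

# A one-character change yields a different hash.
changed = hashlib.sha1(b"import React from 'preact'\n").hexdigest()
differs = changed != digest.hexdigest()

# For comparison, SHA-256 produces 256 bits, i.e. 64 hex characters.
sha256_len = len(hashlib.sha256(data).hexdigest())
print(same, differs, sha256_len)  # True True 64
```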

Integrity Guarantees

1. Content Addressing:

- Objects are stored by their hash, not by filename

- If content changes, hash changes → new object

- Same content = same hash = same object (automatic deduplication)

2. Tamper Detection:

- If someone modifies an object, its hash changes

- Git can detect corruption by verifying hashes

- Commits reference objects by hash, so any change breaks the chain

3. Chain of Trust:

Commit (hash: xyz)

│

├─→ Points to Tree (hash: abc)

│ │

│ └─→ Points to Blob (hash: def)

│

└─→ Points to Parent Commit (hash: uvw)

- If any object is modified, its hash changes

- Every object that references it by the old hash no longer matches

- Git detects the corruption
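
To see the chain in action, the sketch below hashes a one-file project twice, with a single byte of difference in the file. It is a simplified model: the `snapshot` helper, the file name `App.jsx`, and the placeholder parent hash are assumptions made for the demonstration.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def snapshot(file_content: bytes, parent: str) -> dict:
    """Hash a one-file project: blob -> tree -> commit."""
    blob = object_id("blob", file_content)
    # The tree embeds the blob's ID, so the tree hash depends on the blob.
    tree_entry = b"100644 App.jsx\0" + bytes.fromhex(blob)
    tree = object_id("tree", tree_entry)
    # The commit embeds the tree's ID, so the commit hash depends on the tree.
    commit_text = (
        f"tree {tree}\nparent {parent}\n"
        "author John Doe <john@example.com> 1705314600 +0000\n"
        "committer John Doe <john@example.com> 1705314600 +0000\n"
        "\nAdd new feature.\n"
    )
    commit = object_id("commit", commit_text.encode("utf-8"))
    return {"blob": blob, "tree": tree, "commit": commit}


parent = "b" * 40  # placeholder parent hash for illustration
original = snapshot(b"export default App\n", parent)
tampered = snapshot(b"export default App;\n", parent)  # one byte added

for kind in ("blob", "tree", "commit"):
    print(kind, original[kind] != tampered[kind])  # True for all three
```

A change at the bottom of the graph ripples all the way up, so a commit hash effectively vouches for every file beneath it.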

Example: Detecting Corruption

Original:

Commit abc123... → Tree def456... → Blob ghi789...

If blob is modified:

Blob hash changes: ghi789... → jkl012...

Tree still references: ghi789... (no longer matches!)

Git detects this (for example, when you run git fsck) and reports the object as missing or corrupt
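
The check itself boils down to "does the file's name still equal the hash of what's inside?". Below is a small Python imitation of that idea for loose objects. It is not how `git fsck` is implemented; `write_object`, `check_object`, and the temporary object store are hypothetical.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_object(objects_dir: Path, kind: str, body: bytes) -> str:
    store = f"{kind} {len(body)}".encode("ascii") + b"\0" + body
    object_id = hashlib.sha1(store).hexdigest()
    path = objects_dir / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return object_id


def check_object(objects_dir: Path, object_id: str) -> str:
    """Return 'ok', 'missing', or 'corrupt' for a loose object."""
    path = objects_dir / object_id[:2] / object_id[2:]
    if not path.exists():
        return "missing"
    try:
        store = zlib.decompress(path.read_bytes())
    except zlib.error:
        return "corrupt"
    # The file name must equal the hash of what the file contains.
    return "ok" if hashlib.sha1(store).hexdigest() == object_id else "corrupt"


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp)  # stand-in for .git/objects
    blob = write_object(objects_dir, "blob", b"original\n")
    before = check_object(objects_dir, blob)

    # Simulate tampering: overwrite the stored file with different content.
    path = objects_dir / blob[:2] / blob[2:]
    path.write_bytes(zlib.compress(b"blob 9\0tampered\n"))
    after = check_object(objects_dir, blob)
    absent = check_object(objects_dir, "0" * 40)

print(before, after, absent)  # ok corrupt missing
```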

Hash Collisions (Theoretical)

SHA-1 has known vulnerabilities, but:

- Git is adding SHA-256 support in newer versions

- Accidental collisions are extremely unlikely in practice

- Newer Git versions use a hardened SHA-1 implementation that detects known collision attacks

Building a Mental Model

Core Concepts

1. Git is a Content-Addressed File System

- Files are stored by their content hash, not by name

- This enables deduplication and integrity checking

2. Everything is an Object

- Files → Blobs

- Directories → Trees

- Snapshots → Commits

- All stored in .git/objects/

3. Commits are Snapshots, Not Diffs

- Each commit contains a full tree

- Diffs are computed on-demand by comparing trees

- This makes Git fast (checking out any commit does not require replaying a chain of diffs)

4. References are Pointers

- Branches are just pointers to commits

- Tags are pointers to commits (annotated tags go through a tag object)

- HEAD points to the current branch (or directly to a commit when detached)

- Moving a branch = updating a pointer
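
Concept 3 says diffs are computed on demand by comparing trees. A minimal Python sketch of that comparison is shown here, assuming each snapshot has already been flattened into a `{path: blob_hash}` mapping. The `diff_snapshots` helper and the short hashes are placeholders, and real Git compares tree objects recursively and can skip whole subtrees whose hashes match.

```python
def diff_snapshots(old: dict, new: dict) -> dict:
    """Compare two {path: blob_hash} snapshots and classify each change."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in old if p in new and old[p] != new[p]),
    }


# Flattened snapshots with placeholder hashes (not real object IDs).
commit_a = {"App.jsx": "abc123", "index.css": "def456", "old.js": "999999"}
commit_b = {"App.jsx": "fff000", "index.css": "def456", "Hero.jsx": "123abc"}

changes = diff_snapshots(commit_a, commit_b)
print(changes)
# {'added': ['Hero.jsx'], 'removed': ['old.js'], 'modified': ['App.jsx']}
```

Unchanged files never need to be opened: equal hashes mean equal content.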

Mental Model Diagram

                    ┌─────────────┐
                    │    HEAD     │
                    │  (pointer)  │
                    └──────┬──────┘
                           │
                           ▼
                    ┌─────────────┐
                    │    main     │
                    │  (branch)   │
                    └──────┬──────┘
                           │
                           ▼
                    ┌─────────────┐
                    │  Commit C   │
                    │  hash: xyz  │
                    └──────┬──────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
   ┌─────────┐        ┌─────────┐        ┌─────────┐
   │  Tree   │        │ Parent  │        │ Author  │
   │ (root)  │        │ Commit  │        │  Info   │
   └────┬────┘        └─────────┘        └─────────┘
        │
        ▼
   ┌─────────┐
   │  Tree   │
   │  (dir)  │
   └────┬────┘
        │
   ┌────┴────┬──────────┬──────────┐
   │         │          │          │
   ▼         ▼          ▼          ▼
┌─────┐   ┌─────┐    ┌─────┐    ┌─────┐
│Blob │   │Blob │    │Tree │    │Blob │
│file1│   │file2│    │dir/ │    │file3│
└─────┘   └─────┘    └─────┘    └─────┘

Key Takeaways

1. `.git/` is the repository - Everything Git needs is here

2. Objects are immutable - Once created, they never change

3. Hashes ensure integrity - Any modification is detectable

4. References are lightweight - Branches are just files with commit hashes

5. History is a chain - Commits link to parents, creating a graph

Practical Understanding

When you understand Git internally:

- `git add` = Create/update blobs + update index

- `git commit` = Create trees + create commit + update branch pointer

- `git branch` = Create a new file in .git/refs/heads/

- `git checkout` = Update HEAD + update working directory from tree

- `git merge` = Create commit with multiple parents (or just move the branch pointer on a fast-forward)

- `git clone` = Copy the remote repository's objects and refs into a new .git/ folder + checkout working directory

Summary

Git's internal design is elegant and powerful:

- Content-addressable storage enables deduplication and integrity

- Immutable objects create a reliable history

- Hash-based references ensure data integrity

- Simple object model (blob, tree, commit) scales to complex projects

Understanding these internals helps you:

- Debug Git issues more effectively

- Use Git commands with confidence

- Appreciate Git's design decisions

- Build better mental models of version control

Remember: Git is not magic - it's a well-designed content-addressable file system with a version control layer on top. Once you understand the objects and references, everything else makes sense!

Further Exploration

To see Git internals in action:

```bash
# View the object database
ls -la .git/objects/

# View a specific object
git cat-file -p <hash>

# View the index
git ls-files --stage

# View a tree
git ls-tree <tree-hash>

# View a commit
git cat-file -p HEAD

# View branch refs
ls -la .git/refs/heads/
```

Pro Tip: Use git cat-file -t <hash> to see the type of any object (blob, tree, commit, or tag).
