Linux - RAID Concepts (Redundant Array of Independent Disks)

RAID is a technology that uses multiple physical disks to improve:

  • Performance

  • Reliability

  • Fault tolerance

  • Storage capacity (in some RAID levels)

RAID can be implemented in two ways:

  • Software RAID → via Linux mdadm

  • Hardware RAID → via RAID controller cards


Why RAID?

RAID helps solve key storage problems:
✔ Disk failure protection
✔ Better read/write performance
✔ Combining multiple disks for larger storage
✔ Continuous data availability


Types of RAID Levels

Below are the commonly used RAID levels and what they mean.


RAID 0 – Striping

Focus: High performance
Fault Tolerance: ❌ None

How it works

Data is split into blocks, and consecutive blocks are written to the disks in turn.

Disk1: A1  A3  A5
Disk2: A2  A4  A6
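
The layout above follows a simple round-robin rule. Here is a minimal sketch, assuming blocks are numbered from 1 as in the diagram; `stripe_location` is a hypothetical helper name, and real arrays stripe in larger chunks rather than single blocks:

```shell
# stripe_location BLOCK NDISKS
# Prints which disk (1-based) holds a block and its row on that disk,
# assuming simple round-robin striping with blocks numbered from 1.
stripe_location() {
    block=$1
    ndisks=$2
    disk=$(( (block - 1) % ndisks + 1 ))
    row=$(( (block - 1) / ndisks + 1 ))
    echo "A${block} -> Disk${disk}, row ${row}"
}

# Reproduce the two-disk layout shown above.
for b in 1 2 3 4 5 6; do
    stripe_location "$b" 2
done
```

Because neighbouring blocks sit on different disks, a large read or write can keep every disk busy at once, which is where the speed-up comes from.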

Benefits

  • Fast read and write speeds

  • Utilizes full capacity:
    Total size = Disk1 + Disk2 + Disk3 …

Drawbacks

  • If one disk fails → everything is lost

  • No redundancy

Best For

  • Gaming

  • High-speed workloads

  • Cache disks


RAID 1 – Mirroring

Focus: Redundancy (high safety)
Fault Tolerance: ✔ Yes (can survive 1 disk failure)

How it works

Data is duplicated identically on two (or more) disks.

Disk1: A1 A2 A3
Disk2: A1 A2 A3

Benefits

  • High reliability

  • Fast reads (data can come from any disk)

  • Simple setup

Drawbacks

  • Storage is cut in half:
    Total size = size of one disk

  • Write speed may be slightly slower

Best For

  • Critical data

  • Small servers

  • Databases


RAID 5 – Striping + Parity (Widely used)

Focus: Balance of performance and redundancy
Fault Tolerance: ✔ Can survive 1 disk failure

How it works

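Data blocks and parity blocks are distributed across all disks. Each column below is one stripe: two data blocks plus the parity block that protects them (P1 protects A1 and A2).

Disk1: A1  B1  P3
Disk2: A2  P2  C1
Disk3: P1  B2  C2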

Parity helps rebuild data if one disk fails.
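
RAID 5 parity is commonly computed as the bitwise XOR of the data blocks in a stripe. The toy sketch below uses single bytes in place of whole blocks; the values and variable names are made up for illustration:

```shell
# Two data "blocks" (single bytes here; the values are illustrative only).
d1=$(( 0xA5 ))
d2=$(( 0x3C ))

# Parity is the XOR of the data blocks in the stripe.
p=$(( d1 ^ d2 ))

# Suppose the disk holding d2 fails: XOR the surviving block with the
# parity to get the lost block back.
rebuilt=$(( d1 ^ p ))

printf 'parity=0x%02X rebuilt=0x%02X\n' "$p" "$rebuilt"
```

The rebuilt value equals the original `d2`. The same XOR over all surviving blocks works for stripes with more data disks, but only while exactly one block per stripe is missing.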

Benefits

  • Good performance

  • Fault-tolerant

  • More efficient than RAID 1

  • Usable size = (N−1) × size of one disk

Drawbacks

  • Slower writes, because every write must also update parity

  • If 2 disks fail → data lost

  • Rebuild times can be long

Best For

  • File servers

  • Backup servers

  • Read-heavy workloads


RAID 6 – Double Parity

Focus: High redundancy
Fault Tolerance: ✔ Can survive 2 disk failures

How it works

Like RAID 5, but each stripe carries two independent parity blocks.

Benefits

  • Very high safety

  • Good for large disks (where rebuild takes long)

Drawbacks

  • Slower writes than RAID 5

  • More parity overhead

  • Usable size = (N−2) × size of one disk

Best For

  • Large enterprise storage

  • Archival systems

  • High-availability servers


RAID 10 (RAID 1 + RAID 0) – Best Combined Level

Focus: Performance + Redundancy
Fault Tolerance: ✔ Can survive multiple disk failures (as long as no mirror pair loses both of its disks)

How it works

First mirror, then stripe:

Disk1 ⇄ Disk2  → mirror
Disk3 ⇄ Disk4  → mirror
Then stripe across the two mirrors
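
Whether a RAID 10 array survives depends on which disks fail, not only on how many. Here is a minimal sketch for the four-disk layout above, assuming each failed disk number is listed once; `raid10_survives` is a hypothetical helper name:

```shell
# raid10_survives FAILED_DISK...
# Assumes the 4-disk layout above: Disk1+Disk2 form one mirror,
# Disk3+Disk4 the other. Prints "ok" if every mirror still has a
# working disk, "lost" otherwise.
raid10_survives() {
    failed_in_pair0=0
    failed_in_pair1=0
    for d in "$@"; do
        # Disks 1,2 map to pair 0; disks 3,4 map to pair 1.
        case $(( (d - 1) / 2 )) in
            0) failed_in_pair0=$(( failed_in_pair0 + 1 )) ;;
            1) failed_in_pair1=$(( failed_in_pair1 + 1 )) ;;
        esac
    done
    if [ "$failed_in_pair0" -ge 2 ] || [ "$failed_in_pair1" -ge 2 ]; then
        echo "lost"
    else
        echo "ok"
    fi
}

raid10_survives 1 3   # one disk from each mirror: prints "ok"
raid10_survives 1 2   # both disks of the same mirror: prints "lost"
```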

Benefits

  • Very fast

  • High reliability

  • Great for databases

Drawbacks

  • Needs minimum 4 disks

  • Storage = 50% of total

Best For

  • Databases

  • Heavy I/O workloads

  • Virtual machines


RAID Comparison Table

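| RAID Level | Min Disks | Fault Tolerance | Speed | Storage Efficiency |
|------------|-----------|-----------------|-------|--------------------|
| RAID 0 | 2 | ❌ None | ⭐⭐⭐⭐ | 100% |
| RAID 1 | 2 | ✔ One | ⭐⭐ (writes) ⭐⭐⭐ (reads) | 50% |
| RAID 5 | 3 | ✔ One | ⭐⭐⭐ | (N−1)/N |
| RAID 6 | 4 | ✔ Two | ⭐⭐ | (N−2)/N |
| RAID 10 | 4 | ✔ Multiple | ⭐⭐⭐⭐ | 50% |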
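
The efficiency figures can be turned into a quick capacity calculator. This is a minimal sketch, assuming equal-sized disks and two-way mirrors for RAID 10; `usable_capacity` is a hypothetical helper name, not part of mdadm:

```shell
# usable_capacity LEVEL NDISKS DISK_SIZE
# Prints usable capacity in the same unit as DISK_SIZE,
# assuming all disks are the same size.
usable_capacity() {
    level=$1
    n=$2
    size=$3
    case $level in
        0)  echo $(( n * size )) ;;          # all disks hold data
        1)  echo "$size" ;;                  # every disk holds the same copy
        5)  echo $(( (n - 1) * size )) ;;    # one disk's worth of parity
        6)  echo $(( (n - 2) * size )) ;;    # two disks' worth of parity
        10) echo $(( n / 2 * size )) ;;      # half the disks are mirrors
        *)  echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

# Compare the levels for four 4 TB disks.
for level in 0 5 6 10; do
    echo "RAID $level, 4 x 4 TB: $(usable_capacity "$level" 4 4) TB usable"
done
```

For four 4 TB disks this gives 16, 12, 8 and 8 TB respectively, which shows that RAID 6 and RAID 10 cost the same capacity at this disk count.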

Software RAID (mdadm)

Linux commonly uses mdadm to create RAID arrays.

Example: Create RAID 5

sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

Check RAID status:

cat /proc/mdstat
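
In /proc/mdstat, each array's status line ends with a member map such as [UUU], where an underscore marks a missing or failed member. The sketch below lists degraded arrays from that map. `degraded_arrays` is a hypothetical helper name, and the sample text is hand-written to resemble /proc/mdstat rather than captured from a real system:

```shell
# degraded_arrays: reads /proc/mdstat-style text on stdin and prints the
# name of each array whose member map (e.g. [UU_]) contains "_".
degraded_arrays() {
    awk '
        /^md/             { name = $1 }     # array line, e.g. "md0 : active raid5 ..."
        /\[[U_]*_[U_]*\]/ { print name }    # member map with at least one "_"
    '
}

# Hand-written sample resembling a degraded three-disk RAID 5.
sample='Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[3](F) sdc[1] sdb[0]
      2093056 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>'

printf '%s\n' "$sample" | degraded_arrays
```

On a real system the function would be fed with `degraded_arrays < /proc/mdstat`; empty output then means no array reports a missing member.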

Key RAID Terms

Striping

Splitting data into chunks and writing across disks
→ High performance

Mirroring

Making copies of data
→ High redundancy

Parity

Extra information that allows reconstructing lost data
→ Used in RAID 5/6

Hot Spare

A standby disk that automatically replaces a failed disk


When NOT to Use RAID

RAID is not a backup.
It protects from disk failure but not from:

  • accidental deletion

  • ransomware

  • corruption

  • fire, flood, or theft

Always keep separate backups.