RAID1

From Pulsed Media Wiki


Pulsed Media uses RAID1 on every production server's boot drives. Two disks, identical copies. One fails, the other keeps running. No parity math, no striping, no minimum disk count beyond two. The trade-off: 50% of raw capacity goes to the mirror, and that cost buys the ability to lose a disk without losing the OS.

RAID1 is the simplest redundant RAID level. On Linux, it is implemented with mdadm, the standard software RAID tool. This article covers the full mdadm command set for RAID1.

How mirroring works

When a write arrives, the kernel sends it to all member disks, and the write completes only when every member acknowledges it. Reads can be served by any mirror member; the kernel balances them across disks, so concurrent read throughput roughly doubles with two disks.

Capacity equals the smallest disk in the array. Two 4TB drives give you 4TB usable. Three 4TB drives still give you 4TB usable, but now any two can fail and the array survives. Extra mirrors buy fault tolerance, not space.
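
The capacity rule is a plain min() over member sizes. A trivial sketch (the sizes are hypothetical, in TB):

```shell
# Usable capacity = smallest member; extra mirrors add redundancy, not space
sizes="4 6"                      # hypothetical member sizes in TB
echo "$sizes" | tr ' ' '\n' | sort -n | head -n 1
# prints: 4
```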

Writes are limited to single-disk speed because every write must land on all members. Reads scale with disk count, up to bus and controller limits.

mdadm: complete command reference

Creating an array

<syntaxhighlight lang="bash">
# Basic RAID1 with two disks
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# --level=mirror is equivalent to --level=1
mdadm --create /dev/md0 --level=mirror --raid-devices=2 /dev/sda1 /dev/sdb1

# With write-intent bitmap (recommended — see section below)
mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=internal /dev/sda1 /dev/sdb1

# Triple mirror: tolerates 2 simultaneous failures
mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1

# Specify metadata version explicitly
mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.2 /dev/sda1 /dev/sdb1
</syntaxhighlight>

On creation, mdadm starts an initial sync in the background. You can use the array immediately, but expect slower I/O until the sync finishes. Watch progress with cat /proc/mdstat.

Checking status

<syntaxhighlight lang="bash">
# Quick overview of all arrays
cat /proc/mdstat

# Detailed information on one array
mdadm --detail /dev/md0

# All arrays with UUIDs (useful for mdadm.conf)
mdadm --detail --scan
</syntaxhighlight>

The /proc/mdstat status line shows the array state in bracket notation:

  • [UU] — both disks up, array healthy
  • [U_] or [_U] — one disk missing or failed, array degraded
  • (F) after a device name — that device is marked faulty

A degraded array is still functional and still protects data from read errors, but it cannot survive another disk failure. Replace the failed disk promptly.
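
For unattended checks, the bracket notation is easy to parse. A minimal sketch, where the `degraded_arrays` helper and the sample mdstat file are illustrative stand-ins (point the function at /proc/mdstat on a real system):

```shell
# List md arrays whose status brackets contain "_" (a missing/failed member).
degraded_arrays() {
  awk '/^md/ {name=$1} /\[[U_]+\]/ && /_/ {print name}' "$1"
}

# Sample mdstat standing in for the real /proc/mdstat
cat > /tmp/mdstat.sample <<'EOF'
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F)
      4189184 blocks super 1.2 [2/1] [_U]
EOF

degraded_arrays /tmp/mdstat.sample   # prints: md0
```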

mdadm --detail gives richer output including rebuild progress, member device state, array UUID, and metadata version.

Examining a disk

<syntaxhighlight lang="bash">
# Read the RAID superblock on a disk — shows which array it belongs to
mdadm --examine /dev/sda1

# Examine a full disk (useful when you are not sure which partition is in the array)
mdadm --examine /dev/sda
</syntaxhighlight>

This is useful when troubleshooting unknown drives or confirming that a replacement disk does not already belong to another array.

Replacing a failed disk: full procedure

<syntaxhighlight lang="bash">
# Step 1: mark the failed disk as faulty (if the kernel has not already)
mdadm --fail /dev/md0 /dev/sda1

# Step 2: remove it from the array
mdadm --remove /dev/md0 /dev/sda1

# Step 3: physically replace the drive

# Step 4: copy the partition table from the surviving disk to the new one
sfdisk -d /dev/sdb | sfdisk /dev/sda

# Step 5: add the new disk — rebuild starts immediately
mdadm --add /dev/md0 /dev/sda1

# Step 6: monitor the rebuild
watch cat /proc/mdstat
</syntaxhighlight>

The partition table copy in step 4 is important. The new disk must have the same partition structure as the surviving member, or the add will fail. sfdisk -d dumps the partition table; piping it back into sfdisk on the target disk replicates it exactly.

Rebuild time depends on disk size and I/O load on the server. A 4TB mirror rebuilds in roughly 4-8 hours under light load. Progress is visible in /proc/mdstat as [==>..................] recovery = 12.3% with an estimated time remaining.
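
That ballpark follows from simple arithmetic. A sketch, assuming an average copy speed of 150 MB/s (a typical HDD figure used here for illustration, not a measured number):

```shell
# Estimated rebuild time = array size / sequential copy speed
size_tb=4
speed_mbs=150   # assumed average speed; real rebuilds vary with load
awk -v tb="$size_tb" -v mbs="$speed_mbs" \
    'BEGIN { printf "%.1f hours\n", tb * 1e6 / (mbs * 3600) }'
# prints: 7.4 hours
```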

Adding a hot spare

<syntaxhighlight lang="bash">
mdadm --add /dev/md0 /dev/sdc1
</syntaxhighlight>

If the array is degraded when you add a disk, rebuild starts immediately. If the array is healthy, the disk sits as a hot spare and activates on its own when a member fails. With a spare in place, the array starts rebuilding the moment a disk dies, no human needed.

Assembling an array

<syntaxhighlight lang="bash">
# Assemble all known arrays from mdadm.conf
mdadm --assemble --scan

# Force a degraded array to start with one disk (use after hardware failure when only one disk is available)
mdadm --assemble --run /dev/md0 /dev/sdb1

# Assemble a specific array by UUID (useful when mdadm.conf is absent)
mdadm --assemble /dev/md0 --uuid=<uuid-from-examine>
</syntaxhighlight>

The --run flag is needed when only one member is present and the array would normally wait for the second disk before starting. This allows the system to boot from a degraded mirror rather than stalling at the initramfs prompt.
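
The UUID for assembly can be scraped from --examine output. A sketch where the sample line (and its made-up UUID) stands in for running mdadm --examine on a real member:

```shell
# Extract "Array UUID" from mdadm --examine output; the sample line below
# stands in for `mdadm --examine /dev/sdb1` (UUID value is made up)
examine_sample='     Array UUID : 9f5762c2:7c7e3f0a:1b2c3d4e:5f6a7b8c'
uuid=$(printf '%s\n' "$examine_sample" | awk -F' : ' '/Array UUID/ {print $2}')
echo "mdadm --assemble /dev/md0 --uuid=$uuid"
```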

Making the array persistent

Without configuration, mdadm assembles arrays based on superblocks found on disks, which works for most cases. For production systems, write the configuration explicitly:

<syntaxhighlight lang="bash">
# Append current array configuration to mdadm.conf
mdadm --detail --scan >> /etc/mdadm/mdadm.conf

# Rebuild the initramfs so the array assembles correctly at boot
update-initramfs -u
</syntaxhighlight>

On Debian and Ubuntu, update-initramfs -u embeds the mdadm.conf into the initrd so the array can assemble before the root filesystem is mounted. Skipping this step can cause boot failures after a kernel update or disk change.

Monitoring and alerts

<syntaxhighlight lang="bash">
# Start the mdadm monitor daemon manually
mdadm --monitor --scan --daemonise --delay=60 --mail=root

# Or enable via systemd (preferred on modern systems)
systemctl enable mdmonitor.service
systemctl start mdmonitor.service
</syntaxhighlight>

The monitor watches for events and sends email when they occur. The --delay flag sets the poll interval in seconds; 60 seconds is the default. Events that trigger notifications:

  • DegradedArray — a disk failed, array is now degraded
  • DeviceDisappeared — a member disk vanished from the system
  • RebuildFinished — a rebuild completed successfully
  • Fail — a disk is marked faulty
  • SpareActive — a spare was promoted to fill a failed slot

Without monitoring, a failed disk goes unnoticed until someone manually checks. On a two-disk mirror, that means the first failure is silent. The second failure, which destroys the array, is when you find out. By then it is too late.

Scrubbing (integrity check)

<syntaxhighlight lang="bash">
# Start a check (reads all data, compares across mirrors, reports mismatches)
echo check > /sys/block/md0/md/sync_action

# Read mismatch count (0 = healthy, nonzero = inconsistency found)
cat /sys/block/md0/md/mismatch_cnt

# Repair mismatches (writes the "correct" version to both disks)
echo repair > /sys/block/md0/md/sync_action

# Cancel a running check or repair
echo idle > /sys/block/md0/md/sync_action
</syntaxhighlight>

A check reads both disks and compares every block. Mismatches in RAID1 indicate one disk returned different data than the other for the same block, which suggests a disk is developing errors or there was an unclean shutdown that left the disks inconsistent.

The repair action resolves mismatches by copying one version over the other. There is no voting: for RAID1 the kernel simply treats the first member's data as authoritative and writes it to the remaining disks. Use repair with caution if you suspect the first member is the disk that is failing.

Periodic scrubbing is standard practice. Running checks on all servers at once creates a measurable I/O storm, so production fleets stagger scrubs across time rather than running them simultaneously.
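
One simple way to stagger is to derive each server's scrub slot from its hostname, so no coordination is needed. A sketch (the cksum-derived hour is an illustrative scheme, not an established convention):

```shell
# Derive a deterministic start hour (0-23) from the hostname, then emit
# a cron line that runs the monthly check at that hour
host=server42.example.net            # stand-in for "$(hostname)"
hour=$(( $(printf '%s' "$host" | cksum | cut -d' ' -f1) % 24 ))
echo "0 $hour 1 * * root echo check > /sys/block/md0/md/sync_action"
```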

Write-intent bitmap

The write-intent bitmap is a small structure stored on the array itself. It tracks which regions have been written to recently. After a crash or power loss, mdadm checks the bitmap and only resyncs the dirty regions instead of the whole array.

<syntaxhighlight lang="bash">
# Enable bitmap at creation (recommended)
mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=internal /dev/sda1 /dev/sdb1

# Add a bitmap to an existing array
mdadm --grow /dev/md0 --bitmap=internal

# Remove the bitmap
mdadm --grow /dev/md0 --bitmap=none
</syntaxhighlight>

Without a bitmap, an unclean shutdown triggers a full resync of the entire array. On a 4TB mirror, that is a full 4TB copy, taking hours. With a bitmap, only the blocks written in the seconds before the crash need resyncing, which typically takes minutes.

The cost is approximately 10% write throughput reduction, due to the extra write per block to update the bitmap. The default bitmap chunk size is 64MB, meaning each bit in the bitmap covers 64MB of array data. Smaller chunk sizes track more precisely at higher bitmap storage and update cost.

For production servers that may lose power unexpectedly, the bitmap is worth enabling. For arrays rebuilt from scratch (e.g., replacing a known dead disk), the bitmap does not help because the new disk needs a full copy regardless.
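
The chunk-size trade-off is easy to quantify. A sketch for a 4TB array at the 64MB default (pure arithmetic, no mdadm calls):

```shell
# Bits in the bitmap = array size / bitmap chunk size
array_mb=$((4 * 1024 * 1024))   # 4 TB expressed in MB (binary units)
chunk_mb=64
bits=$(( array_mb / chunk_mb ))
echo "$bits bits = $(( bits / 8 / 1024 )) KiB of bitmap"
# prints: 65536 bits = 8 KiB of bitmap
```

At 8 KiB for 4 TB, the bitmap's storage cost is negligible; the throughput cost comes from the extra bitmap updates, not from its size.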

Metadata formats

Format  Superblock location     Notes
0.90    End of device           Legacy format. 2TB per-device limit. Kernel can autodetect arrays via partition type 0xfd. Still present on older systems.
1.0     End of device           Modern features, UEFI-safe. Compatible with GRUB2 boot arrays.
1.2     4096 bytes from start   Default since mdadm 3.1.2. Works with GRUB2. Recommended for new arrays.

The default metadata version (1.2) places the superblock near the start of the device. This means the array occupies slightly less than the full partition size (the first few kilobytes hold metadata). This is invisible in practice but matters if you are trying to use the exact same partition table offset on a replacement disk.

On systems running very old kernels (pre-2.6.28), only 0.90 is supported. All current Debian and Ubuntu releases support 1.2.

RAID1 for boot drives

RAID1 is the standard choice for OS/boot drives because GRUB2 supports mdadm RAID1 arrays natively. The bootloader can read the superblock and find kernels on a degraded array, allowing the system to boot after a single disk failure without manual intervention.

<syntaxhighlight lang="bash">
# Install GRUB to both physical disks — NOT to the md device
grub-install /dev/sda
grub-install /dev/sdb

# After replacing a failed boot disk: re-run grub-install on the new disk
grub-install /dev/sda   # (assuming /dev/sda is the replacement)
</syntaxhighlight>

Installing GRUB to /dev/md0 is incorrect and will fail on disk replacement, because the new disk will not have a bootloader. Install GRUB to the underlying physical disks. Both disks receive a complete bootloader, so either can boot the system independently.

After replacing a failed boot disk and rebuilding the array, always re-run grub-install on the new physical disk. The RAID rebuild copies filesystem data but not the MBR/GPT bootloader region.
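
The re-install step can be scripted by walking the array's members. A sketch (the member list is a stand-in for parsing `mdadm --detail /dev/md0`, and the digit-stripping assumes simple sdX1-style names, not nvme0n1p1):

```shell
# Re-install GRUB on the disk underlying each member partition.
# members= would normally come from: mdadm --detail /dev/md0
members="/dev/sda1 /dev/sdb1"
for part in $members; do
  disk=${part%[0-9]}             # /dev/sda1 -> /dev/sda (sdX-style names only)
  echo "grub-install $disk"      # echo for the sketch; drop echo to run it
done
# prints: grub-install /dev/sda
#         grub-install /dev/sdb
```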

On Pulsed Media ProDesk servers, we use RAID1 on two USB sticks for the OS boot partition. This works around the 2TB MBR addressing limit on large NVMe drives: the NVMe holds customer data, while the USB pair holds the OS. If either USB stick fails, the other continues booting normally.

Metadata version 1.0 or 1.2 both work for boot arrays. Version 1.2 is preferred for new setups.

--write-mostly flag

<syntaxhighlight lang="bash">
# Create a mirror where sdb is used only for writes, not reads
# (--write-mostly applies to the devices listed after it)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda --write-mostly /dev/sdb

# Add a device as write-mostly to an existing array
mdadm --add /dev/md0 --write-mostly /dev/sdb
</syntaxhighlight>

The --write-mostly flag marks a device as write-preferred: mdadm will not schedule reads to it unless no other device is available. This is useful in two situations:

  • A remote mirror connected over a slower link (reads would be slow, writes must happen for redundancy)
  • Hardware with asymmetric read/write performance, where one disk is slower on reads

In seedbox contexts, this flag rarely matters since both disks in a local mirror have equivalent access times.

How RAID1 compares to other levels

The RAID article covers all levels in detail. For quick reference on RAID1 specifically:

                    RAID1             RAID0              RAID5                            RAID10
Minimum disks       2                 2                  3                                4
Usable capacity     1/N               100%               (N-1)/N                          50%
Fault tolerance     N-1 disks         None               1 disk                           1 per mirror pair
Write speed         Single disk       Fastest (striped)  Slightly slower (parity)         Fast (no parity)
Rebuild complexity  Simple copy       N/A                Parity recalculation             Copy (per pair)
Rebuild speed       Fast (copy only)  N/A                Slow (parity math across disks)  Moderate

RAID1 rebuild is faster than RAID5 because it is a direct disk-to-disk copy. No parity calculation happens: mdadm reads from the surviving disk and writes to the new disk, block by block. On a RAID5 array, every rebuilt block requires reading from all other members and XOR-ing the results, which involves more I/O and CPU time.
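
The per-block I/O difference can be counted directly. A sketch comparing the two, with a hypothetical 4-disk RAID5:

```shell
# Device operations needed to rebuild one block
n=4                                    # hypothetical RAID5 member count
echo "RAID1: 1 read + 1 write"
echo "RAID5 (n=$n): $((n - 1)) reads + 1 write (plus XOR of the reads)"
```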

We have measured RAID5 rebuilds on our production hardware at roughly 90% throughput reduction for both reads and writes. RAID1 rebuild hits lighter, since only two devices are involved and there is no parity math, but it still puts real I/O pressure on the server while the copy runs.

When RAID1 fits

RAID1 makes sense when:

  • You need a boot drive that survives disk failure. GRUB reads RAID1 natively, and the system boots from whichever disk is still alive.
  • You only have two disks. RAID5 needs three. With two, RAID1 is the only redundant option.
  • The dataset is small but losing it would hurt. Databases, TLS certificates, server configuration. 50% overhead on a 100 GB partition costs almost nothing.
  • Rebuild time matters more than capacity. No parity math, just a copy. Faster and simpler than RAID5 recovery.
  • You want something you can reason about at 3 AM. No XOR, no distributed parity, no stripe alignment. Two copies. One fails, the other has everything.

RAID1 is not suitable as the primary customer storage layer in seedbox hosting. A 50% overhead on tens of terabytes of HDD storage is too expensive. Pulsed Media's customer-facing products use RAID0 (V series, no redundancy) or RAID5 (M series, single-disk fault tolerance with lower overhead).

Internally, we use RAID1 for OS and boot partitions on production servers, where reliability matters more than raw capacity.

Both product lines run on PMSS in PM-owned Finnish datacenters.

RAID1 does not replace backup

A RAID1 mirror protects against a single disk failing. It does not protect against:

  • Accidental deletion — the delete operation runs on the live filesystem, which the mirror faithfully copies to both disks.
  • Filesystem corruption — corrupted data is mirrored identically to both disks.
  • Software bugs that overwrite data — same as above.
  • Physical catastrophe (fire, water, theft) — both disks are in the same machine.

RAID and backup solve different problems. RAID keeps the service running after a hardware failure. Backup lets you recover from the things RAID cannot help with: the wrong rm command, the bug that corrupted a database, the fire that took the server room.

See also

  • RAID — overview of all RAID levels, software vs hardware RAID, performance comparison
  • Hard disk drive — the physical disks in RAID arrays
  • Seedbox — how RAID levels map to seedbox product tiers
  • PM Software Stack — PMSS uses mdadm on all production servers
  • NVMe — NVMe drives as cache tier alongside RAID arrays
  • Pulsed Media Datacenters — where the hardware runs

Blog posts: