Computer Basics - Backup
Below is a comprehensive, practical guide to what backups are, why they matter, the main approaches, how to design a safe backup strategy, and concrete examples you can use.
What a backup is
A backup is a separate copy of data (files, system state, databases, configuration, etc.) created so that the original data can be restored after loss, corruption, accidental deletion, theft, ransomware, hardware failure, or disaster. Backups are intended for recovery, while archives are long-term stores for compliance or historical retention and are rarely restored.
Why backups matter
Backups protect you from many failure scenarios:
- Accidental deletion or user error
- Hardware failure (disk, RAID controller, server)
- Software bugs or updates that corrupt data
- Ransomware or malicious deletion (if backups are protected)
- Natural disasters, fire, theft (if an offsite copy exists)
- Logical errors (e.g., bad data written to production that needs to be reverted)
Without reliable backups, recovery is often impossible or extremely costly.
Main backup types (how they work, pros & cons)
Full backup
- Copies everything selected.
- Pros: simplest restores; a single set contains everything.
- Cons: slow and storage-intensive.
Incremental backup
- After an initial full backup, saves only data changed since the most recent backup of any type (usually the previous incremental).
- Pros: fast and space-efficient.
- Cons: a restore requires the full backup plus every incremental since it, which is slower and more complex.
Differential backup
- After a full backup, saves all data changed since that full backup.
- Pros: faster to restore than incrementals (only the full plus the latest differential are needed).
- Cons: differentials grow larger over time and use more storage than incrementals (a tar sketch contrasting incremental and differential follows below).
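To make the difference concrete, here is a minimal GNU tar sketch using --listed-incremental, which records file state in a snapshot (.snar) file so that later runs store only changes. The paths and file names are placeholders, not a recommended layout.
# Full (level-0) backup; creates/overwrites the snapshot file
tar --listed-incremental=/backups/home.snar -czf /backups/home-full.tar.gz /home/username
# Keep a copy of the level-0 snapshot so a differential can be emulated later
cp /backups/home.snar /backups/home.snar.level0
# Incremental: reuses the live snapshot file, so only changes since the previous run are stored
tar --listed-incremental=/backups/home.snar -czf /backups/home-incr-$(date +%F).tar.gz /home/username
# Differential (emulated): restore the level-0 snapshot first, so everything changed since the full is stored
cp /backups/home.snar.level0 /backups/home.snar
tar --listed-incremental=/backups/home.snar -czf /backups/home-diff-$(date +%F).tar.gz /home/username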
Mirror (replication)
- Exact copy of the data, often kept in sync automatically.
- Pros: an always up-to-date copy.
- Cons: usually not versioned; if a file is deleted or corrupted on the source, the mirror loses it too unless versioning or snapshots are used. Not a substitute for backups.
Snapshot
- Point-in-time view of a filesystem or volume (e.g., LVM or ZFS snapshots, cloud block snapshots).
- Pros: fast and space-efficient; good for quick rollback (see the ZFS sketch below).
- Cons: often stored on the same system or cluster, so not a substitute for an offsite copy; retention and immutability depend on the underlying system.
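As a brief illustration of how snapshots are used in practice, here is a ZFS sketch; the dataset name tank/home and the snapshot label are placeholders.
# Create a named point-in-time snapshot of the dataset
zfs snapshot tank/home@before-upgrade
# List existing snapshots
zfs list -t snapshot
# Roll the dataset back to that point in time (discards changes made after the snapshot)
zfs rollback tank/home@before-upgrade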
Continuous Data Protection (CDP)
- Captures every change (or near-continuously), enabling restores to a fine-grained point in time.
- Pros: very small RPO.
- Cons: higher cost and complexity.
Synthetic full
- The backup system synthesizes a full backup from the existing full plus incrementals already on backup storage, saving time on the source systems. Useful for large enterprise setups.
Key metrics: RPO and RTO
- RPO (Recovery Point Objective): how much data you can afford to lose (e.g., 1 hour, 24 hours). This determines backup frequency.
- RTO (Recovery Time Objective): how quickly systems must be back online. This determines the restore method and infrastructure (e.g., hot standby vs. restore from cold tape).
Design backups by selecting acceptable RPO and RTO for each system or dataset.
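To illustrate how RPO drives scheduling: a system with a 1-hour RPO needs backups at least hourly. A hypothetical crontab for such a policy might look like the following, where backup.sh stands in for whatever backup command you actually use.
# Hourly incremental backups to meet a 1-hour RPO
0 * * * * /usr/local/bin/backup.sh incremental
# Weekly full backup early Sunday morning
30 2 * * 0 /usr/local/bin/backup.sh full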
Storage media & locations
- Local disk / external HDD / NAS: fast and cheap for short-term retention.
- Tape (LTO): low cost per TB and long shelf life; commonly used for archival and offsite rotation.
- Cloud object storage (S3, Azure Blob, etc.): scalable, durable offsite storage; supports lifecycle rules.
- Snapshots on SAN / hypervisor: quick VM-level snapshots.
- Offsite / air-gapped copies: critical protection against ransomware and local disasters.
Combine local storage for fast recovery with offsite/cloud storage for disaster recovery; one way to push a copy offsite is sketched below.
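A common way to get that offsite copy is to sync the local backup directory to cloud object storage. A minimal rclone sketch, assuming a remote named s3remote and a bucket named my-backups have already been configured with rclone config:
# Copy new and changed backup files to the bucket (never deletes remote files)
rclone copy /backups s3remote:my-backups/backups --progress
# Or mirror the directory exactly; deletions propagate, so enable bucket versioning or object lock
rclone sync /backups s3remote:my-backups/backups --progress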
Backup strategy & best practices
- 3-2-1 rule: keep 3 copies of your data, on 2 different media types, with 1 copy offsite. Variations: 3-2-1-1 (one air-gapped/immutable copy), 3-2-2 (two offsite copies).
- Immutable backups / WORM: make at least one copy that cannot be altered or deleted for a set period. This helps resist ransomware.
- Encryption: encrypt backups in transit and at rest, and manage the keys securely. Treat backups as sensitive data.
- Retention and lifecycle: decide retention windows (daily/weekly/monthly/yearly) and implement lifecycle/archival policies (see the pruning sketch after this list).
- Versioning: keep multiple historical versions so you can revert to a known-good point.
- Automate and monitor: schedule backups, monitor success/failure, and alert on errors.
- Document the restore procedure: restoration must be straightforward and tested.
- Least privilege: limit who can create, modify, or restore backups, and protect the credentials.
- Don't back up corrupted data over good copies: if corruption occurs, make sure older known-good versions are retained.
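As a concrete example of a retention policy, tools such as restic can prune old snapshots according to keep rules; a sketch, with the repository path /srv/restic-repo as a placeholder:
# Keep 7 daily, 4 weekly, and 12 monthly snapshots; forget and prune everything else
restic -r /srv/restic-repo forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune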
Application-specific notes
- Databases: use application-consistent backups (database dumps, transaction-log shipping, or hot backups that quiesce transactions). Ensure backups include the logs needed for point-in-time recovery (see the dump sketch after this list).
- Virtual machines: use hypervisor-aware backups or guest-level backups. Snapshots alone are usually not sufficient for long-term backup.
- Large file stores: use deduplication and compression on the backup storage.
- Config and metadata: back up system configuration, certificates, keys, and user accounts in addition to files.
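For example, a consistent dump of a PostgreSQL database can be taken online with pg_dump; the database name mydb and the paths here are placeholders.
# Custom-format dump (compressed, restorable selectively with pg_restore)
pg_dump -Fc -f /backups/mydb-$(date +%F).dump mydb
# Restore into a database (the target database must already exist)
pg_restore -d mydb_restored /backups/mydb-2025-09-18.dump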
Verification & testing
- Regular restore tests: periodically perform full restores and test the applications; this is the only reliable way to know a backup works.
- Automated verification: use checksum verification and test-restores to a separate environment (see the checksum sketch below).
- Monitor backup health: track completion times, sizes, error logs, and retention compliance.
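A simple form of automated verification is checksumming: record checksums when backups are written and re-verify them later. The paths here are illustrative.
# Record checksums at backup time
sha256sum /backups/*.tar.gz > /backups/manifest.sha256
# Verify later (e.g., from a scheduled job); a non-zero exit status means at least one file failed
sha256sum -c /backups/manifest.sha256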
Security considerations
- Encrypt backups (AES, etc.) and protect the keys offline (a GnuPG sketch follows this list).
- Use multi-factor authentication for backup management consoles.
- Implement role-based access and logging/auditing of backup actions.
- Keep at least one copy offline or immutable to defend against ransomware.
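As a basic illustration of encrypting a backup archive before it leaves the machine, GnuPG can encrypt a file symmetrically with AES-256; the filenames are placeholders, and in practice the passphrase or keys must be managed and stored securely.
# Encrypt with a passphrase using AES-256; produces home-full.tar.gz.gpg
gpg --symmetric --cipher-algo AES256 /backups/home-full.tar.gz
# Decrypt when restoring
gpg --output /restore/home-full.tar.gz --decrypt /backups/home-full.tar.gz.gpg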
Common pitfalls
- Not testing restores.
- Keeping backups only on the same physical site or the same RAID array.
- Backups that run but miss open files or databases (no application-consistent snapshot).
- Poor retention rules that delete required historical data.
- Treating replication as backup (no historical versions).
- Weak key management for encrypted backups.
Quick implementation checklist
- Identify critical data and systems.
- Define RPO and RTO per system.
- Choose backup types (full/incremental/differential/snapshots).
- Select storage: local + offsite/cloud + an immutable copy.
- Configure encryption and access controls.
- Schedule and automate backups.
- Implement monitoring and alerts.
- Test restores regularly and document the procedures.
- Review retention and compliance requirements.
- Update the plan after infrastructure or data changes.
Concrete example: simple Linux incremental backups with rsync + hard links
This is a practical, space-efficient method: each daily backup folder appears to be a full copy, but unchanged files are stored only once (via hard links).
Initial full backup (example):
rsync -a --delete /home/username/ /backups/backup-2025-09-18/
Next day, create an incremental backup using --link-dest:
rsync -a --delete --link-dest=/backups/backup-2025-09-18/ /home/username/ /backups/backup-2025-09-19/
How it works: unchanged files are hard-linked to the previous backup, so each backup-YYYY-MM-DD directory looks like a full snapshot while only changed files consume additional space. Note: the backup directories must all live on the same filesystem, because hard links cannot cross filesystems, and always test your restores.
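Restoring from this layout is simply a copy back from the chosen date; the trailing slash on the source copies the directory's contents rather than the directory itself, and a dry run first shows what would change.
# Preview the restore of the 2025-09-19 state of the home directory
rsync -av --dry-run /backups/backup-2025-09-19/ /home/username/
# Perform the restore
rsync -a /backups/backup-2025-09-19/ /home/username/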
For encrypted, deduplicated, and robust backups, consider tools such as Restic, BorgBackup, or commercial products; they provide built-in encryption, deduplication, and repository maintenance.
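A minimal restic workflow, assuming a local repository at the placeholder path /srv/restic-repo (remote backends such as S3 or SFTP work the same way):
# Create the repository; you will be prompted for an encryption password
restic -r /srv/restic-repo init
# Back up a directory; data is deduplicated and encrypted in the repository
restic -r /srv/restic-repo backup /home/username
# List snapshots, then restore the latest one to a target directory
restic -r /srv/restic-repo snapshots
restic -r /srv/restic-repo restore latest --target /tmp/restore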
When to consider a professional or managed solution
- Large datasets, strict RTO/RPO, regulatory compliance, or complex multi-site replication needs.
- If you lack the staff or time to manage testing, retention, and security. Managed backup vendors and enterprise backup suites can automate and scale these policies.