Software Testing - Stress Testing
1. What Is Stress Testing?
Stress Testing is a type of performance testing used to determine how a system behaves under extreme load conditions, beyond its normal capacity.
It checks:
- The breaking point (when the system fails)
- How gracefully it fails
- How quickly it recovers
In simple terms:
Stress Testing pushes the system harder than normal to see when it crashes and how well it recovers.
2. Goals of Stress Testing
✔ Identify the system’s breaking point
At what user/load level does the system stop responding?
✔ Understand system behavior under extreme conditions
Does it slow down, hang, crash, or throw errors?
✔ Evaluate recovery capability
How fast does it return to normal after overload?
✔ Identify bottlenecks
Find the weakest areas, such as:
- Database
- APIs
- Server CPU
- Memory leaks
- Network limits
✔ Verify stability and error handling
The system should fail gracefully, not crash abruptly.
3. When to Perform Stress Testing
- Before launching an application
- Before festival sales or marketing events
- When expecting sudden traffic peaks
- After major infrastructure upgrades
- When performance issues are suspected
- When adding new hardware or load balancers
4. Types of Stress Testing
1) Application Stress Testing
Finds defects in application logic under extreme load (e.g., API timeouts, crashes).
2) System Stress Testing
Tests different components together under stress (e.g., DB + server + network under heavy load).
3) Spike Testing
Increases load suddenly, within seconds (e.g., 100 users → 10,000 users instantly).
4) Distributed Stress Testing
Multiple remote machines generate load together.
5) Exploratory Stress Testing
Uses no fixed numbers: random extreme loads are applied to reveal unexpected behavior.
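Distributed stress testing needs two bits of bookkeeping: dividing the target load among the machines and combining what each machine reports. The sketch below assumes a hypothetical per-generator report holding request, error, and worst-latency counters; `GeneratorResult`, `split_users`, and `merge_results` are illustrative names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class GeneratorResult:
    """Counters reported by one load-generator machine (hypothetical format)."""
    requests: int
    errors: int
    max_latency_ms: float


def split_users(total_users: int, generators: int) -> list[int]:
    """Spread virtual users as evenly as possible across load generators."""
    base, extra = divmod(total_users, generators)
    # The first `extra` machines each take one additional user.
    return [base + 1 if i < extra else base for i in range(generators)]


def merge_results(results: list[GeneratorResult]) -> GeneratorResult:
    """Combine per-machine counters into one system-wide view."""
    return GeneratorResult(
        requests=sum(r.requests for r in results),
        errors=sum(r.errors for r in results),
        # Worst case across machines; percentiles could not simply be averaged.
        max_latency_ms=max(r.max_latency_ms for r in results),
    )


print(split_users(10_000, 3))  # [3334, 3333, 3333]
```

Counts and maxima merge exactly. Percentiles such as p95 do not: averaging each machine's p95 gives the wrong answer, so in practice the raw latency samples (or histograms) would be merged instead.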
Comparison with Load Testing
| Feature | Load Testing | Stress Testing |
|---|---|---|
| Purpose | Test under expected load | Test beyond normal load |
| Load level | Normal / planned | Extreme / overload |
| Outcome | Validate performance | Find breaking point & recovery |
| Goal | Stability during peak traffic | Behavior during and after failure |
5. Stress Testing Metrics
Important metrics include:
Performance
- Peak load supported
- Response time under stress
- Throughput
- Latency
Reliability
- Error rate
- Timeout rate
- Crash point
Resource Usage
- Peak CPU usage
- Memory consumption
- Disk I/O
- Database query spikes
Recovery
- Time to recover after overload
- Auto-scaling behavior (in cloud setups)
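The performance and reliability metrics above can be computed from raw per-request records collected during one load step. A minimal sketch, assuming a hypothetical `RequestRecord` format in which timed-out requests also count as errors; percentiles use the nearest-rank definition, which is only one of several in common use.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One request observed during a load step (hypothetical record format)."""
    latency_ms: float
    ok: bool         # False for any failure, including timeouts
    timed_out: bool  # True if the client gave up waiting


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records: list[RequestRecord], window_s: float) -> dict[str, float]:
    """Core stress-test metrics for the requests seen in one time window."""
    total = len(records)
    errors = sum(1 for r in records if not r.ok)
    timeouts = sum(1 for r in records if r.timed_out)
    latencies = [r.latency_ms for r in records]
    return {
        "throughput_rps": total / window_s,
        "error_rate": errors / total,
        "timeout_rate": timeouts / total,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
    }


# Illustrative data: 18 fast successes, one timeout, one server error in 10 s.
records = [RequestRecord(100.0, True, False)] * 18 + [
    RequestRecord(2_000.0, False, True),
    RequestRecord(500.0, False, False),
]
print(summarize(records, window_s=10.0))
```

Reporting p95 alongside p50 matters under stress: the median often looks healthy long after the slowest requests have started to degrade.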
6. Stress Testing Process (Step-by-Step)
Step 1 — Define breaking-point goals
- Example: “Find out at which user count the API fails.”
Step 2 — Identify critical scenarios
Examples:
- Login
- Search
- Add to cart
- Checkout
- File upload
Step 3 — Create extreme load profiles
Example:
- Expected users: 2,000
- Stress test: push up to 10,000
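A load profile is just a schedule of user counts over time. A minimal sketch of the two shapes used in stress testing, assuming evenly spaced steps for the gradual ramp and a three-phase shape for a spike; the function names and parameters are hypothetical, not taken from any tool.

```python
def step_profile(start_users: int, peak_users: int, steps: int) -> list[int]:
    """Evenly spaced user counts from start_users up to peak_users (inclusive)."""
    if steps < 2:
        return [peak_users]
    gap = (peak_users - start_users) / (steps - 1)
    return [round(start_users + i * gap) for i in range(steps)]


def spike_profile(baseline: int, spike: int, hold_s: int, spike_s: int) -> list[tuple[int, int]]:
    """(duration_s, users) phases: steady baseline, sudden spike, back to baseline."""
    # The final baseline phase leaves time to observe recovery after the spike.
    return [(hold_s, baseline), (spike_s, spike), (hold_s, baseline)]


# Expected load 2,000 users; stress target 10,000 (5x the expected load).
print(step_profile(2_000, 10_000, 5))  # [2000, 4000, 6000, 8000, 10000]
print(spike_profile(100, 10_000, hold_s=300, spike_s=60))
```

Evenly spaced steps are only a convenient default; uneven steps that get finer near the suspected limit locate the breaking point more precisely.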
Step 4 — Prepare test environment
The environment must be as close to production as possible.
Step 5 — Execute test
Increase load gradually or apply sudden spikes.
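To show what "execute the test" means mechanically, here is a minimal stdlib sketch that applies each load level from a profile and records the error rate. It runs against `fake_service`, a hypothetical stand-in that handles 20 concurrent requests and rejects the rest; a real test would call the system under test instead. One thread per virtual user does not scale to thousands of users, which is why dedicated tools such as JMeter, Locust, or k6 are used in practice.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical endpoint stand-in: processes 20 requests at once and rejects
# anything beyond that, like an overloaded server returning 503.
_capacity = threading.BoundedSemaphore(20)


def fake_service() -> None:
    if not _capacity.acquire(blocking=False):
        raise RuntimeError("503 Service Unavailable")
    try:
        time.sleep(0.05)  # simulated processing time
    finally:
        _capacity.release()


def run_step(target: Callable[[], object], users: int) -> dict:
    """Send one request per virtual user, all concurrently, and record outcomes."""
    def one_call() -> tuple[bool, float]:
        start = time.perf_counter()
        try:
            target()
            ok = True
        except Exception:  # any failure counts as an error in this sketch
            ok = False
        return ok, (time.perf_counter() - start) * 1000

    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(one_call) for _ in range(users)]
        results = [f.result() for f in futures]
    errors = sum(1 for ok, _ in results if not ok)
    return {
        "users": users,
        "error_rate": errors / users,
        "max_ms": max(ms for _, ms in results),
    }


def run_ramp(target: Callable[[], object], profile: list[int]) -> list[dict]:
    """Apply each load level in turn; stop early once every request fails."""
    history = []
    for users in profile:
        stats = run_step(target, users)
        history.append(stats)
        if stats["error_rate"] == 1.0:
            break
    return history


for step in run_ramp(fake_service, [5, 10, 40, 80]):
    print(step)
```

Stopping once every request fails is an assumed stop rule; a real run might also stop at a lower error threshold to avoid damaging shared infrastructure.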
Step 6 — Monitor system behavior
Track:
- Response time
- CPU/memory
- DB queries
- Error logs
Step 7 — Identify breaking point
Find the level at which:
- The system slows down drastically
- Error rates rise sharply
- The server crashes
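The breaking point becomes unambiguous once the limits are written down. A minimal sketch, assuming a hypothetical `StepResult` record per load level and illustrative limits (5 s p95 response time, 5% error rate); real limits would come from the goals set in Step 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepResult:
    """Measurements for one load level (hypothetical record format)."""
    users: int
    p95_ms: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0
    crashed: bool = False


def find_breaking_point(
    steps: list[StepResult], max_p95_ms: float, max_error_rate: float
) -> Optional[tuple[int, str]]:
    """Return (users, reason) for the lowest load that breaches a limit, else None."""
    for step in sorted(steps, key=lambda s: s.users):
        if step.crashed:
            return step.users, "crash"
        if step.error_rate > max_error_rate:
            return step.users, "error rate"
        if step.p95_ms > max_p95_ms:
            return step.users, "response time"
    return None


# Illustrative numbers only.
results = [
    StepResult(3_000, 800, 0.0),
    StepResult(5_000, 1_900, 0.002),
    StepResult(8_000, 7_000, 0.01),
    StepResult(12_000, 15_000, 0.30),
]
print(find_breaking_point(results, max_p95_ms=5_000, max_error_rate=0.05))
# (8000, 'response time')
```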
Step 8 — Analyze results
Which component failed first?
- DB?
- Server CPU?
- Network?
- API latency?
Step 9 — Fix bottlenecks + Retest
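The question "which component failed first?" in Step 8 can be answered mechanically from monitoring data. A minimal sketch, assuming each component's saturation is recorded as a utilization fraction (CPU, connection-pool usage, bandwidth) per load level and treating 90% as a hypothetical saturation limit; the component names are illustrative.

```python
from typing import Optional


def first_saturated(
    series: dict[str, list[tuple[int, float]]], limit: float = 0.9
) -> Optional[tuple[str, int]]:
    """Return (component, users) for the component that reached `limit` at the lowest load.

    `series` maps a component name to (users, utilization) samples,
    where utilization is a fraction between 0.0 and 1.0.
    """
    first_hits = []
    for component, samples in series.items():
        for users, utilization in sorted(samples):
            if utilization >= limit:
                first_hits.append((users, component))
                break  # only the first crossing matters for this component
    if not first_hits:
        return None
    users, component = min(first_hits)  # ties resolve alphabetically
    return component, users


# Illustrative monitoring data (utilization fractions per load level).
monitoring = {
    "app_server_cpu": [(8_000, 0.55), (12_000, 0.80)],
    "db_cpu": [(8_000, 0.70), (12_000, 1.00)],
    "network": [(8_000, 0.30), (12_000, 0.45)],
}
print(first_saturated(monitoring))  # ('db_cpu', 12000)
```

The first component to saturate is the prime suspect for the retest in Step 9, though the symptom (for example, API errors) can surface in a different component than the cause.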
7. Stress Testing Example
Scenario: e-commerce site during a big sale
Normal expected traffic: 3,000 users
Stress test plan: increase the load step by step:
3,000 → 5,000 → 8,000 → 12,000 → 15,000 users
What happens:
- At 8,000 users: response time reaches 7 seconds
- At 10,000 users: the checkout API begins returning errors
- At 12,000 users: database CPU spikes to 100%
- At 13,500 users: the system crashes
Recovery test results:
- The server takes 3 minutes to recover
- Auto-scaling adds 2 servers, and the system stabilizes again
Outcome:
The checkout API is the major bottleneck, and the database needs better indexing.
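A recovery figure such as "3 minutes to recover" depends on how recovery is defined. A minimal sketch, assuming recovery means p95 latency drops back under a hypothetical 2 s limit and stays there; a brief dip followed by a relapse does not count. The sample data is illustrative, not taken from the example above.

```python
from typing import Optional


def recovery_time(
    samples: list[tuple[float, float]], load_removed_at: float, max_p95_ms: float
) -> Optional[float]:
    """Seconds from load removal until p95 latency returns within the limit and stays there.

    `samples` are (timestamp_s, p95_ms) pairs from monitoring.
    Returns None if the system has not stabilized by the last sample.
    """
    recovered_at = None
    for t, p95 in sorted(samples):
        if t < load_removed_at:
            continue  # still under overload; not part of recovery
        if p95 <= max_p95_ms:
            if recovered_at is None:
                recovered_at = t
        else:
            recovered_at = None  # relapse: a brief dip below the limit is not recovery
    return None if recovered_at is None else recovered_at - load_removed_at


# Illustrative samples every 30 s after the overload ends at t = 0.
after_overload = [(0, 9_000), (30, 8_000), (60, 6_500), (90, 1_800),
                  (120, 5_200), (150, 3_000), (180, 900), (210, 850), (240, 800)]
print(recovery_time(after_overload, load_removed_at=0, max_p95_ms=2_000))  # 180
```

Requiring the system to stay within the limit avoids reporting the short dip at 90 s as recovery.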
8. Popular Stress Testing Tools
- JMeter
- Gatling
- LoadRunner
- Locust
- k6
- BlazeMeter
- NeoLoad
Cloud platforms:
- AWS Load Testing
- Azure Load Testing
9. Common Mistakes in Stress Testing
- Testing in an under-powered environment that does not resemble production
- No monitoring setup
- Not testing recovery mechanisms
- Using unrealistic user data
- No clear breaking-point goals
- Checking only the application, not the infrastructure (DB, server, cache)
10. Best Practices
- Use realistic traffic models
- Stress test only the core user flows
- Monitor server, database, and application logs together
- Combine stress tests with spike tests
- Always test system recovery and failover
- Document the exact failure point
- Involve developers and DevOps in root-cause analysis