Software Testing - Stress Testing

Stress Testing is a type of performance testing used to determine how a system behaves under extreme load conditions, beyond its normal capacity.

It checks:

Breaking point (when the system fails)
How gracefully it fails
How quickly it recovers

In simple terms:

Stress Testing pushes the system harder than normal to see when it crashes and how well it recovers.

2. Goals of Stress Testing

✔ Identify the system’s breaking point

At what user/load level does the system stop responding?

✔ Understand system behavior under extreme conditions

Does it slow down, hang, crash, or throw errors?

✔ Evaluate recovery capability

How fast does it return to normal after overload?

✔ Identify bottlenecks

Weakest areas such as:

Database
APIs
Server CPU
Memory (including leaks)
Network limits

✔ Verify stability and error handling

The system should fail gracefully (clear error messages, no data loss or corruption) rather than crash abruptly.

3. When to Perform Stress Testing

Before launching an application
Before festival sales or marketing events
When expecting sudden traffic peaks
After major infrastructure upgrades
When performance issues are suspected
When adding new hardware or load balancers

4. Types of Stress Testing

1) Application Stress Testing

Find defects in app logic under extreme load.
(e.g., API timeouts, crashes)

2) System Stress Testing

Test different components together under stress
(e.g., DB + server + network under heavy load)

3) Spike Testing

Load increases sharply within seconds
(e.g., 100 users → 10,000 users instantly)

4) Distributed Stress Testing

Multiple remote machines generate load together.

5) Exploratory Stress Testing

No predefined load targets: apply random or unusual extreme loads to surface unexpected behavior.
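
To make the spike pattern from type 3 concrete (100 users jumping to 10,000), here is a minimal Python sketch that builds a per-second target-user profile. The function name spike_profile and all of its parameters are illustrative assumptions; they do not belong to any particular load-testing tool.

```python
def spike_profile(baseline, peak, spike_start, spike_len, duration):
    """Return the target user count for each second of the test.

    All names here are hypothetical and chosen only for illustration.
    """
    profile = []
    for second in range(duration):
        # Inside the spike window the target jumps straight to the peak.
        in_spike = spike_start <= second < spike_start + spike_len
        profile.append(peak if in_spike else baseline)
    return profile


# 12-second test: 100 users, then 10,000 users for seconds 5 to 7, then back.
profile = spike_profile(baseline=100, peak=10_000,
                        spike_start=5, spike_len=3, duration=12)
print(profile)
```

A load generator would read one value per second from such a profile and adjust the number of active virtual users to match.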

Comparison with Load Testing

Feature	Load Testing	Stress Testing
Purpose	Test under expected load	Test beyond normal load
Load level	Normal / planned	Extreme / overload
Outcome	Validate performance	Find breaking point & recovery
Goal	Stability during peak traffic	Behavior during and after failure

5. Stress Testing Metrics

Important metrics include:

Performance

Peak load supported
Response time under stress
Throughput
Latency

Reliability

Error rate
Timeout rate
Crash point

Resource Usage

CPU max usage
Memory consumption
Disk I/O
DB query spikes

Recovery

Time to recover after overload
Auto-scaling behavior (in cloud deployments)
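
Several of the metrics above can be derived from raw request samples. The sketch below assumes each sample records a latency, a success flag, and a timeout flag; the Sample class, the summarize function, and the nearest-rank percentile helper are hypothetical names used only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    latency_ms: float
    ok: bool
    timed_out: bool = False


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples, window_seconds):
    """Compute throughput, error/timeout rates, and latency figures."""
    total = len(samples)
    latencies = [s.latency_ms for s in samples]
    return {
        "throughput_rps": total / window_seconds,
        "error_rate": sum(1 for s in samples if not s.ok) / total,
        "timeout_rate": sum(1 for s in samples if s.timed_out) / total,
        "p95_latency_ms": percentile(latencies, 95),
        "max_latency_ms": max(latencies),
    }


# Invented data: 18 fast successes, one slow failure, one timeout, over 10 s.
samples = [Sample(100, True)] * 18 + [Sample(500, False),
                                      Sample(3000, False, timed_out=True)]
summary = summarize(samples, window_seconds=10)
print(summary)
```

Percentiles such as p95 are usually more informative under stress than averages, because a small share of very slow requests is exactly what an overloaded system tends to produce.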

6. Stress Testing Process (Step-by-Step)

Step 1 — Define breaking-point goals

Example: “Find the concurrent-user count at which the API starts to fail.”

Step 2 — Identify critical scenarios

Examples:

Login
Search
Add to cart
Checkout
File upload

Step 3 — Create extreme load profiles

Example:

Expected users: 2,000
Stress test: push up to 10,000
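
As a small illustration of turning these two numbers into a load profile, the following sketch spaces the load levels evenly between the expected load and the stress ceiling. The function name stepped_profile and the choice of five steps are assumptions made for the example.

```python
def stepped_profile(expected_users, max_users, steps):
    """Evenly spaced load levels from expected_users up to max_users."""
    if steps < 2:
        raise ValueError("need at least two steps (start and ceiling)")
    increment = (max_users - expected_users) / (steps - 1)
    return [round(expected_users + i * increment) for i in range(steps)]


levels = stepped_profile(expected_users=2_000, max_users=10_000, steps=5)
print(levels)  # [2000, 4000, 6000, 8000, 10000]
```

Smaller steps take longer to run but narrow down the breaking point more precisely.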

Step 4 — Prepare test environment

The test environment must be as close to production as possible (hardware, data volume, configuration).

Step 5 — Execute test

Increase load gradually or apply sudden spikes.
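
The gradual ramp can be sketched with nothing but the Python standard library. This is a toy illustration under stated assumptions, not a replacement for a real tool: send_request stands for any callable that performs one request and raises on failure, and run_level, run_ramp, and always_ok are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_level(send_request, users):
    """Issue one call per simulated user concurrently.

    Returns a list of (ok, latency_seconds) tuples, one per call.
    """
    def one_call(_):
        start = time.perf_counter()
        try:
            send_request()
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(one_call, range(users)))


def run_ramp(send_request, levels):
    """Run each load level in turn and record its failure rate."""
    failure_rates = {}
    for users in levels:
        outcomes = run_level(send_request, users)
        failures = sum(1 for ok, _ in outcomes if not ok)
        failure_rates[users] = failures / len(outcomes)
    return failure_rates


def always_ok():
    # Stand-in for a real request; replace with an actual HTTP call.
    time.sleep(0.001)


print(run_ramp(always_ok, [5, 10, 20]))
```

Thread-based generators like this one run out of steam at a few hundred concurrent users on a single machine, which is one reason dedicated tools and distributed load generation exist.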

Step 6 — Monitor system behavior

Track:

Response time
CPU/memory
DB queries
Error logs

Step 7 — Identify breaking point

Find the level at which:

System slows down drastically
Errors increase
Server crashes
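
One way to make "breaking point" measurable is to define thresholds up front and report the first load level that violates any of them. The sketch below assumes per-level results with a p95 latency and an error rate; the function name, the threshold values, and the sample figures are all invented for illustration.

```python
def find_breaking_point(results, max_p95_ms, max_error_rate):
    """Return the lowest user count whose metrics exceed a threshold.

    results maps user count -> {"p95_ms": ..., "error_rate": ...}.
    Returns None if no tested level crossed a threshold.
    """
    for users in sorted(results):
        metrics = results[users]
        too_slow = metrics["p95_ms"] > max_p95_ms
        too_many_errors = metrics["error_rate"] > max_error_rate
        if too_slow or too_many_errors:
            return users
    return None


# Hypothetical measurements, one entry per load level.
results = {
    3_000: {"p95_ms": 800, "error_rate": 0.000},
    5_000: {"p95_ms": 1_900, "error_rate": 0.001},
    8_000: {"p95_ms": 7_000, "error_rate": 0.004},
    12_000: {"p95_ms": 15_000, "error_rate": 0.200},
}
breaking_point = find_breaking_point(results, max_p95_ms=3_000,
                                     max_error_rate=0.01)
print(breaking_point)  # 8000
```

Note that the system may be "broken" by this definition (too slow to be usable) well before any server actually crashes.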

Step 8 — Analyze results

Which component failed first?

DB?
Server CPU?
Network?
API latency?
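
To answer "which component failed first?", it can help to line up resource utilization samples on a shared timeline and look for the earliest one to saturate. The following sketch assumes utilization values between 0 and 1 collected at fixed intervals; the component names, the 0.9 threshold, and the data are hypothetical.

```python
def first_saturated(timeline, threshold=0.9):
    """Find the earliest sample where any component reaches the threshold.

    timeline is a list of (seconds_elapsed, {component: utilization}).
    Returns (seconds_elapsed, [component names]) or None.
    """
    for elapsed, usage in sorted(timeline, key=lambda entry: entry[0]):
        hot = sorted(name for name, value in usage.items()
                     if value >= threshold)
        if hot:
            return elapsed, hot
    return None


# Invented monitoring data sampled every 60 seconds during the ramp.
timeline = [
    (60, {"db_cpu": 0.55, "app_cpu": 0.40, "network": 0.30}),
    (120, {"db_cpu": 0.78, "app_cpu": 0.62, "network": 0.45}),
    (180, {"db_cpu": 0.97, "app_cpu": 0.81, "network": 0.60}),
    (240, {"db_cpu": 1.00, "app_cpu": 0.95, "network": 0.70}),
]
print(first_saturated(timeline))  # (180, ['db_cpu'])
```

The first component to saturate is a good starting point for root-cause analysis, although it may be a symptom (for example, a slow query driving database CPU) rather than the cause itself.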

Step 9 — Fix bottlenecks + Retest

7. Stress Testing Example

Scenario: E-commerce site during Big Sale

Normal expected traffic:

3,000 users

Stress test

Ramp up gradually through these stages, noting the load at which each symptom first appears:

3,000 → 5,000 → 8,000 → 12,000 → 15,000 users

What happens:

At 8,000 users: response time reaches 7 seconds
At 10,000 users: checkout API begins returning errors
At 12,000 users: database CPU spikes to 100%
At 13,500 users: system crashes

Recovery test:

Server takes 3 minutes to recover
Auto-scaling adds 2 servers → system stable again

Outcome:
The checkout API is the major bottleneck, and the database needs better indexing.
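
A recovery figure like the 3 minutes in this example could be derived from latency samples taken after the load is removed. The sketch below treats the system as recovered once latency stays at or below a healthy level for several consecutive samples; the function name, the 500 ms threshold, and the sample data are assumptions chosen so that the result matches the example.

```python
def recovery_time(samples, load_removed_at, healthy_ms, stable_samples=3):
    """Seconds from load removal until latency is stably healthy.

    samples is a time-ordered list of (timestamp_s, latency_ms).
    Returns None if the system never stabilizes within the samples.
    """
    streak = 0
    streak_start = None
    for timestamp, latency in samples:
        if timestamp < load_removed_at:
            continue
        if latency <= healthy_ms:
            if streak == 0:
                streak_start = timestamp
            streak += 1
            if streak >= stable_samples:
                return streak_start - load_removed_at
        else:
            streak = 0  # A slow sample resets the healthy streak.
    return None


# Invented samples every 30 s; load is removed at t = 600 s.
samples = [(600, 9000), (630, 7000), (660, 5000), (690, 3000), (720, 1500),
           (750, 900), (780, 400), (810, 380), (840, 390)]
seconds = recovery_time(samples, load_removed_at=600, healthy_ms=500)
print(seconds / 60, "minutes")  # 3.0 minutes
```

Requiring several healthy samples in a row avoids declaring recovery on a single lucky measurement while the system is still unstable.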

8. Popular Stress Testing Tools

JMeter
Gatling
LoadRunner
Locust
k6
BlazeMeter
NeoLoad

Cloud platforms:

Distributed Load Testing on AWS
Azure Load Testing

9. Common Mistakes in Stress Testing

Testing in an environment much weaker than production
No monitoring setup
Not testing recovery mechanisms
Using unrealistic user data
No clear breaking-point goals
Only checking app, not infra (DB, server, cache)

10. Best Practices

Use realistic traffic models
Focus stress tests on core user flows
Monitor server + database + app logs together
Combine stress test with spike tests
Always test system recovery and failover
Document exact failure point
Involve developers + DevOps for root-cause analysis