MongoDB - Query Optimization and Query Planner

MongoDB Query Optimization is the process of improving the performance and efficiency of database queries so that data can be retrieved faster while consuming fewer system resources. As applications grow and collections contain millions of documents, poorly optimized queries can lead to slow response times, increased CPU usage, excessive memory consumption, and reduced overall system performance. MongoDB includes a built-in Query Planner that helps determine the most efficient way to execute a query.

Understanding Query Execution

Whenever a query is submitted to MongoDB, the database must decide how to locate the requested data. There are generally two ways to find documents:

Collection Scan (COLLSCAN)
Index Scan (IXSCAN)

A Collection Scan examines every document in a collection to find matching results. This approach works for small datasets but becomes inefficient as the collection grows.

For example:

db.users.find({ age: 30 })

If there is no index on the age field, MongoDB scans every document in the collection.

An Index Scan uses an index structure to quickly locate matching documents without checking every record. This significantly improves performance for large collections.
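
The difference between the two access paths can be illustrated with a small in-memory sketch in plain JavaScript. It does not reflect how MongoDB is implemented internally; the users array, the Map-based ageIndex, and the collScan and indexScan helpers are all assumptions made only for illustration.

```javascript
// Illustrative only: a plain array stands in for a collection,
// and a Map from age -> array positions stands in for an index.
const users = [];
for (let i = 0; i < 1000; i++) {
  users.push({ _id: i, name: "user" + i, age: 18 + (i % 50) });
}

// Collection scan: look at every document and count how many were examined.
function collScan(docs, age) {
  let examined = 0;
  const results = [];
  for (const doc of docs) {
    examined++;
    if (doc.age === age) results.push(doc);
  }
  return { results, examined };
}

// Build the stand-in index once, up front.
const ageIndex = new Map();
users.forEach((doc, pos) => {
  if (!ageIndex.has(doc.age)) ageIndex.set(doc.age, []);
  ageIndex.get(doc.age).push(pos);
});

// Index scan: jump to the matching positions and examine only those documents.
function indexScan(docs, index, age) {
  const positions = index.get(age) || [];
  const results = positions.map((pos) => docs[pos]);
  return { results, examined: positions.length };
}

const viaScan = collScan(users, 30);
const viaIndex = indexScan(users, ageIndex, 30);
console.log("examined by scan:", viaScan.examined, "examined by index:", viaIndex.examined);
```

Both paths return the same documents; only the number of documents examined differs, and that gap widens as the collection grows.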

Role of the Query Planner

The Query Planner is responsible for determining the best execution strategy for a query. When a query is received, MongoDB evaluates available indexes and possible execution plans.

The Query Planner performs the following tasks:

Analyzes query conditions
Examines available indexes
Generates multiple execution plans
Tests candidate plans
Selects the most efficient plan
Stores the winning plan in cache for future use

For example, consider a collection with indexes on both name and age.

db.users.find({
    name: "John",
    age: 30
})

MongoDB may evaluate:

Using the name index
Using the age index
Using both indexes
Performing a collection scan

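The Query Planner selects the winning plan empirically: it runs the candidate plans for a short trial period and keeps the one that returns results with the least work.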
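
One way to picture this selection is as a trial run in which each candidate plan reports how many results it produced and how much work it spent. The sketch below is a deliberately simplified model, not MongoDB's actual ranking logic; the candidate list, its numbers, and the pickWinningPlan helper are hypothetical.

```javascript
// Simplified model of plan selection. Each candidate reports how many
// results it produced (advanced) and how many work units it spent (works)
// during a trial run. All numbers are invented for illustration.
const candidates = [
  { name: "IXSCAN { name: 1 }", advanced: 101, works: 120 },
  { name: "IXSCAN { age: 1 }", advanced: 101, works: 4000 },
  { name: "COLLSCAN", advanced: 101, works: 100000 },
];

// Rank by productivity: results produced per unit of work.
function pickWinningPlan(plans) {
  return plans.reduce((best, plan) =>
    plan.advanced / plan.works > best.advanced / best.works ? plan : best
  );
}

console.log("winning plan:", pickWinningPlan(candidates).name);
```

In this model the plan that reaches the same results with the fewest work units wins, which is the intuition behind preferring a selective index over a broad one.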

Query Execution Plans

An execution plan describes how MongoDB retrieves data.

Common stages include:

COLLSCAN

A collection scan checks every document.

db.products.find({ category: "Electronics" })

Without an index on category, MongoDB scans the entire collection.

Advantages:

Simple execution
No index maintenance required

Disadvantages:

Slow on large datasets
High CPU usage
Increased disk reads

IXSCAN

An index scan uses an index to locate matching records.

db.products.createIndex({ category: 1 })

After creating the index:

db.products.find({ category: "Electronics" })

MongoDB can now use the index instead of scanning the entire collection.

Advantages:

Faster data retrieval
Reduced resource usage
Better scalability

FETCH

After locating matching entries through an index, MongoDB may fetch the actual documents from the collection.

Example workflow:

Search index
Find matching record identifiers
Retrieve documents

This process appears as a FETCH stage in execution plans.
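
To see how these stages fit together, it can help to walk the stage tree of a winning plan. The following sketch assumes a trimmed-down winningPlan object in the nested stage/inputStage form; the sample object and the listStages helper are hypothetical, and some server versions wrap this tree in additional fields.

```javascript
// Hypothetical, trimmed-down winningPlan for an indexed query.
// Real explain() output carries many more fields.
const winningPlan = {
  stage: "FETCH",
  inputStage: {
    stage: "IXSCAN",
    keyPattern: { category: 1 },
    indexName: "category_1",
  },
};

// Collect stage names from the root of the plan down to its leaves.
function listStages(plan) {
  if (!plan) return [];
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return [plan.stage, ...children.flatMap((child) => listStages(child))];
}

console.log(listStages(winningPlan).join(" <- "));
```

The root stage is listed first, so a FETCH followed by an IXSCAN reads as "fetch the documents that the index scan located".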

Using Explain to Analyze Queries

MongoDB provides the explain() method to inspect query execution.

Example:

db.users.find({ age: 30 }).explain("executionStats")

The output includes:

Query planner information
Winning plan
Number of documents examined
Number of documents returned
Execution time

Important metrics:

executionTimeMillis

Shows the total time, in milliseconds, needed to select a plan and execute the query.

Example:

"executionTimeMillis" : 5

The query took 5 milliseconds.

totalDocsExamined

Indicates how many documents MongoDB checked.

Example:

"totalDocsExamined" : 100000

This means MongoDB examined 100,000 documents.

A well-optimized query examines roughly as many documents as it returns, so compare totalDocsExamined with the number of documents returned (nReturned).

totalKeysExamined

Shows how many index entries were checked.

Example:

"totalKeysExamined" : 50

Lower values generally indicate better efficiency; ideally the number of keys examined is close to the number of documents returned.
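
These metrics are easiest to read as ratios against the number of documents returned. The sketch below works on plain objects shaped like the executionStats section of explain() output; the two sample objects, their values, and the summarize helper are assumptions made for illustration.

```javascript
// Hypothetical executionStats excerpts. Field names follow
// explain("executionStats"); the values are invented.
const scanStats = {
  nReturned: 50,
  executionTimeMillis: 120,
  totalKeysExamined: 0,
  totalDocsExamined: 100000,
};
const indexedStats = {
  nReturned: 50,
  executionTimeMillis: 5,
  totalKeysExamined: 50,
  totalDocsExamined: 50,
};

// Compare the work done against the number of documents returned.
function summarize(stats) {
  const returned = Math.max(stats.nReturned, 1); // avoid dividing by zero
  return {
    docsPerResult: stats.totalDocsExamined / returned,
    keysPerResult: stats.totalKeysExamined / returned,
    // No keys examined but documents examined suggests no index was used.
    likelyCollectionScan: stats.totalKeysExamined === 0 && stats.totalDocsExamined > 0,
  };
}

console.log(summarize(scanStats));
console.log(summarize(indexedStats));
```

A docsPerResult value close to 1 suggests the query touches little more than it returns, while a value in the thousands points to a missing or unsuitable index.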

Single Field Indexes

A single-field index is created on one field.

Example:

db.users.createIndex({ email: 1 })

Queries using the email field become faster.

db.users.find({
    email: "[email protected]"
})

MongoDB can directly locate matching records using the index.

Compound Indexes

Compound indexes contain multiple fields.

Example:

db.orders.createIndex({
    customerId: 1,
    orderDate: -1
})

This index helps queries such as:

db.orders.find({
    customerId: 101
}).sort({
    orderDate: -1
})

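Because the index stores each customer's entries already ordered by orderDate, MongoDB can filter on customerId and return the results in sorted order without a separate in-memory sort.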
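
Field order matters in a compound index: a query benefits most when its filter uses the leading field or fields of the index. The sketch below is a simplified check of that idea; the usablePrefix helper is hypothetical, and it ignores sort order, range operators, and other details the real planner considers.

```javascript
// Returns the leading index fields that also appear in the filter,
// stopping at the first index field the filter does not mention.
function usablePrefix(indexKeys, filter) {
  const used = [];
  for (const field of Object.keys(indexKeys)) {
    if (!(field in filter)) break;
    used.push(field);
  }
  return used;
}

const orderIndex = { customerId: 1, orderDate: -1 };

console.log(usablePrefix(orderIndex, { customerId: 101 }));
console.log(usablePrefix(orderIndex, { customerId: 101, orderDate: "2024-01-01" }));
console.log(usablePrefix(orderIndex, { orderDate: "2024-01-01" }));
```

An empty result for the last call reflects that a filter on orderDate alone cannot narrow the scan with this index, because customerId comes first.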

Covered Queries

A covered query retrieves all required information directly from an index without accessing the collection.

Example index:

db.users.createIndex({
    name: 1,
    age: 1
})

Query:

db.users.find(
    { name: "John" },
    { name: 1, age: 1, _id: 0 }
)

Because the filter and the projection use only fields stored in the index, and _id is explicitly excluded, MongoDB can return results directly from the index.

Benefits:

Faster execution
Reduced disk access
Lower memory usage
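
The conditions for a covered query can be expressed as a rough check. The sketch below is an approximation under stated assumptions: the canBeCovered helper is hypothetical, and it ignores cases such as array fields and embedded documents that can also prevent a query from being covered.

```javascript
// Rough check: a query can be covered when every filter field and every
// projected field is in the index, and _id is excluded unless it is indexed.
function canBeCovered(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const filterOk = Object.keys(filter).every((f) => indexed.has(f));
  const included = Object.keys(projection).filter((f) => projection[f] === 1);
  const projectionOk = included.length > 0 && included.every((f) => indexed.has(f));
  const idOk = projection._id === 0 || indexed.has("_id");
  return filterOk && projectionOk && idOk;
}

const nameAgeIndex = { name: 1, age: 1 };

console.log(canBeCovered(nameAgeIndex, { name: "John" }, { name: 1, age: 1, _id: 0 }));
console.log(canBeCovered(nameAgeIndex, { name: "John" }, { name: 1, age: 1 }));
console.log(canBeCovered(nameAgeIndex, { name: "John" }, { name: 1, email: 1, _id: 0 }));
```

The second call fails the check because _id is returned by default, and the third because email is not part of the index.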

Query Optimization Techniques

Create Appropriate Indexes

Indexes should be created on frequently queried fields.

Example:

db.customers.createIndex({
    phoneNumber: 1
})

Avoid Unnecessary Fields

Retrieve only required fields.

Instead of:

db.users.find({})

Use:

db.users.find(
    {},
    {
        name: 1,
        email: 1
    }
)

This reduces network traffic and memory consumption.

Limit Returned Documents

Use limit() when the application needs only part of the result set.

db.products.find().limit(20)

Only the first 20 records are returned.

Use Efficient Sorting

Sorting without indexes can be expensive.

Example:

db.sales.find().sort({
    saleDate: -1
})

Creating an index improves performance:

db.sales.createIndex({
    saleDate: -1
})

Avoid Large Skip Operations

Using large skip values can slow queries.

Example:

db.posts.find().skip(100000)

MongoDB still has to walk past the skipped documents before it returns the first result, so the cost grows with the skip value.

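A better approach is cursor-based pagination, where each request continues from the last _id seen on the previous page:

// lastId is the _id of the last document on the previous page
db.posts.find({
    _id: { $gt: lastId }
}).sort({ _id: 1 }).limit(20)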
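
The paging loop around such a query can be sketched without a database. In the example below an in-memory array stands in for the posts collection, and nextPage mimics a query that filters on _id greater than lastId, sorts by _id, and limits the page size; the array, the helper, and the page size of 20 are assumptions for illustration.

```javascript
// In-memory stand-in for a collection of 95 posts (illustration only).
const posts = Array.from({ length: 95 }, (_, i) => ({
  _id: i + 1,
  title: "post " + (i + 1),
}));

// Mimics: filter on _id > lastId, sort by _id ascending, limit to pageSize.
// This array filter scans every element; on a real server the _id index
// lets the query start directly after lastId instead.
function nextPage(docs, lastId, pageSize) {
  return docs
    .filter((d) => lastId === null || d._id > lastId)
    .sort((a, b) => a._id - b._id)
    .slice(0, pageSize);
}

let lastId = null;
let pages = 0;
let seen = 0;
while (true) {
  const page = nextPage(posts, lastId, 20);
  if (page.length === 0) break; // no more results
  pages++;
  seen += page.length;
  lastId = page[page.length - 1]._id; // remember where this page ended
}
console.log("pages:", pages, "documents seen:", seen);
```

Each request depends only on the last _id the client saw, so the work per page stays roughly constant no matter how deep the client pages.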

Index Intersection

MongoDB can combine multiple indexes when necessary.

Indexes:

db.users.createIndex({ city: 1 })
db.users.createIndex({ age: 1 })

Query:

db.users.find({
    city: "Bangalore",
    age: 25
})

MongoDB may merge results from both indexes.

Although useful, a compound index is often more efficient.

db.users.createIndex({
    city: 1,
    age: 1
})
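
The trade-off between intersecting two single-field indexes and using one compound index can be illustrated with in-memory maps. This is a conceptual sketch only: the sample documents, the buildIndex helper, and the string key used for the compound index are assumptions, and real indexes are not implemented this way.

```javascript
const users = [
  { _id: 1, city: "Bangalore", age: 25 },
  { _id: 2, city: "Bangalore", age: 31 },
  { _id: 3, city: "Mumbai", age: 25 },
  { _id: 4, city: "Bangalore", age: 25 },
];

// Build a Map from key -> list of _id values.
function buildIndex(docs, keyOf) {
  const index = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

const cityIndex = buildIndex(users, (d) => d.city);
const ageIndex = buildIndex(users, (d) => d.age);
const cityAgeIndex = buildIndex(users, (d) => d.city + "|" + d.age);

// Intersection: read both single-field indexes, keep ids present in both.
const byCity = cityIndex.get("Bangalore") || [];
const byAge = new Set(ageIndex.get(25) || []);
const intersected = byCity.filter((id) => byAge.has(id));
const entriesRead = byCity.length + byAge.size;

// Compound: one lookup returns exactly the matching ids.
const compound = cityAgeIndex.get("Bangalore|25") || [];

console.log("intersection:", intersected, "entries read:", entriesRead);
console.log("compound:", compound, "entries read:", compound.length);
```

Both approaches find the same documents, but the intersection reads entries from each index that it later discards, while the compound lookup reads only the matches.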

Query Plan Cache

MongoDB stores successful query plans in a plan cache.

Benefits include:

Faster planning
Reduced optimization overhead
Consistent query performance

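A cached plan is not permanent. When indexes are created or dropped, or when a cached plan starts performing noticeably worse than it did when it was selected, MongoDB discards the cached entry and runs plan selection again.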
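
The idea behind plan caching is that queries with the same structure but different values can reuse one plan. The sketch below models this with a Map keyed by a simplified "shape" of the filter. It is a toy model, not MongoDB's cache: shapeOf, choosePlan, and the planningRuns counter are hypothetical, and the real cache key covers more than the filter.

```javascript
// Reduce a filter to its shape: field names and operators stay,
// concrete values are replaced with a placeholder.
function shapeOf(value) {
  if (Array.isArray(value)) return value.map((v) => shapeOf(v));
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const key of Object.keys(value).sort()) out[key] = shapeOf(value[key]);
    return out;
  }
  return "?";
}

const planCache = new Map();
let planningRuns = 0;

// Return a cached plan for the filter's shape, planning only on a miss.
function choosePlan(filter) {
  const key = JSON.stringify(shapeOf(filter));
  if (!planCache.has(key)) {
    planningRuns++; // stand-in for the expensive candidate-plan trial
    planCache.set(key, "plan for " + key);
  }
  return planCache.get(key);
}

choosePlan({ age: 30 });
choosePlan({ age: 45 });          // same shape, cached plan reused
choosePlan({ age: { $gt: 30 } }); // different shape, planned again
console.log("planning runs:", planningRuns);
```

Three queries trigger only two planning runs here, because the first two differ only in the value being matched.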

Common Causes of Slow Queries

Missing Indexes

Without indexes, MongoDB performs collection scans.

Inefficient Query Patterns

Example:

db.users.find({
    name: /john/i
})

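Case-insensitive or unanchored regular expressions such as /john/i cannot use an index efficiently, because MongoDB cannot turn them into a narrow range of index keys. A case-sensitive prefix pattern such as /^John/ can.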
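
The reason a prefix match is cheaper than an arbitrary pattern can be shown with a sorted array standing in for an index on name. In the sketch below a prefix search is answered by a binary search over a contiguous key range, while a case-insensitive pattern has to test every key. The sample names and the lowerBound, prefixScan, and regexScan helpers are assumptions for illustration.

```javascript
// Sorted keys stand in for an index on the name field.
const sortedNames = ["Alice", "Joan", "John", "Johnny", "Johnson", "Jon", "Zed", "john"].sort();

// First position whose key is >= the given key (binary search).
function lowerBound(sorted, key) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Prefix match: only keys in the range [prefix, nextPrefix) are examined.
function prefixScan(sorted, prefix) {
  const lastChar = prefix.charCodeAt(prefix.length - 1);
  const nextPrefix = prefix.slice(0, -1) + String.fromCharCode(lastChar + 1);
  const start = lowerBound(sorted, prefix);
  const end = lowerBound(sorted, nextPrefix);
  return { matches: sorted.slice(start, end), examined: end - start };
}

// Arbitrary pattern: every key has to be tested.
function regexScan(sorted, pattern) {
  return { matches: sorted.filter((v) => pattern.test(v)), examined: sorted.length };
}

console.log(prefixScan(sortedNames, "John"));
console.log(regexScan(sortedNames, /john/i));
```

The prefix search touches only the keys that start with "John", whereas the case-insensitive pattern examines all of them to find its matches.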

Excessive Sorting

Sorting large datasets without indexes increases memory and CPU usage.

Large Documents

Large documents take longer to read from disk and to send over the network, especially when a query retrieves fields it does not need.

Poor Schema Design

Improper document structures can result in inefficient queries and excessive data processing.

Best Practices for Query Optimization

Create indexes for frequently searched fields.
Use compound indexes for multi-field queries.
Analyze queries regularly with explain().
Avoid collection scans on large datasets.
Return only required fields.
Use covered queries whenever possible.
Optimize sorting operations using indexes.
Monitor execution statistics and query performance.
Implement efficient pagination techniques.
Review and update indexes as application requirements evolve.

Conclusion

MongoDB Query Optimization is a critical aspect of database performance management. The Query Planner automatically evaluates available execution strategies and chooses the most efficient plan for each query. By understanding execution plans, leveraging indexes effectively, analyzing queries with explain(), and following optimization best practices, developers can significantly improve application responsiveness, reduce resource consumption, and ensure that MongoDB databases continue to perform efficiently even as data volumes grow.