MongoDB Query Optimization and Query Planner
MongoDB Query Optimization is the process of improving the performance and efficiency of database queries so that data can be retrieved faster while consuming fewer system resources. As applications grow and collections contain millions of documents, poorly optimized queries can lead to slow response times, increased CPU usage, excessive memory consumption, and reduced overall system performance. MongoDB includes a built-in Query Planner that helps determine the most efficient way to execute a query.
Understanding Query Execution
Whenever a query is submitted to MongoDB, the database must decide how to locate the requested data. There are generally two ways to find documents:
- Collection Scan (COLLSCAN)
- Index Scan (IXSCAN)
A Collection Scan examines every document in a collection to find matching results. This approach works for small datasets but becomes inefficient as the collection grows.
For example:
db.users.find({ age: 30 })
If there is no index on the age field, MongoDB scans every document in the collection.
An Index Scan uses an index structure to quickly locate matching documents without checking every record. This significantly improves performance for large collections.
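The difference between the two approaches can be pictured with a small in-memory model. This is only a sketch: the array, the Map-based index, and the function names are assumptions made for illustration, and MongoDB's real indexes are B-tree structures rather than hash maps.

```javascript
// Toy data set: 1000 users whose ages cycle through 20..69.
const users = [];
for (let i = 0; i < 1000; i++) {
  users.push({ _id: i, age: 20 + (i % 50) });
}

// COLLSCAN analogue: test every document against the predicate.
function collectionScan(docs, age) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined++;
    if (doc.age === age) matches.push(doc);
  }
  return { matches, examined };
}

// Index analogue: map each age to the positions of the documents that have it.
function buildAgeIndex(docs) {
  const index = new Map();
  docs.forEach((doc, position) => {
    if (!index.has(doc.age)) index.set(doc.age, []);
    index.get(doc.age).push(position);
  });
  return index;
}

// IXSCAN analogue: look up the key, then touch only the matching documents.
function indexScan(docs, index, age) {
  const positions = index.get(age) || [];
  return { matches: positions.map((p) => docs[p]), examined: positions.length };
}

const ageIndex = buildAgeIndex(users);
const scan = collectionScan(users, 30);
const viaIndex = indexScan(users, ageIndex, 30);
console.log(scan.examined, viaIndex.examined); // prints "1000 20"
```

Both approaches return the same 20 documents, but the scan has to look at all 1000 to find them.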
Role of the Query Planner
The Query Planner is responsible for determining the best execution strategy for a query. When a query is received, MongoDB evaluates available indexes and possible execution plans.
The Query Planner performs the following tasks:
- Analyzes query conditions
- Examines available indexes
- Generates multiple execution plans
- Tests candidate plans
- Selects the most efficient plan
- Stores the winning plan in cache for future use
For example, consider a collection with indexes on both name and age.
db.users.find({
  name: "John",
  age: 30
})
MongoDB may evaluate:
- Using the name index
- Using the age index
- Using both indexes
- Performing a collection scan
The Query Planner runs the candidate plans for a short trial period and selects the one that does the least work to produce results.
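One way to picture how candidate plans are compared is to give each the same work budget and see which one produces the most results. The sketch below is a deliberately simplified toy: the plan names, the workPerResult numbers, and the scoring rule are all hypothetical, and MongoDB's actual plan ranking considers more than this.

```javascript
// Hypothetical candidates, described only by how many work units each
// needs to produce one result.
const candidates = [
  { name: "IXSCAN name_1", workPerResult: 2 },
  { name: "IXSCAN age_1", workPerResult: 40 },
  { name: "COLLSCAN", workPerResult: 500 },
];

// Give a plan a fixed work budget and count the results it produces.
function runTrial(plan, workBudget) {
  return Math.floor(workBudget / plan.workPerResult);
}

// Pick the plan that produced the most results within the same budget.
function pickWinner(plans, workBudget) {
  let winner = null;
  let best = -1;
  for (const plan of plans) {
    const produced = runTrial(plan, workBudget);
    if (produced > best) {
      best = produced;
      winner = plan;
    }
  }
  return winner;
}

const winner = pickWinner(candidates, 1000);
console.log(winner.name); // prints "IXSCAN name_1"
```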
Query Execution Plans
An execution plan describes how MongoDB retrieves data.
Common stages include:
COLLSCAN
A collection scan checks every document.
db.products.find({ category: "Electronics" })
Without an index on category, MongoDB scans the entire collection.
Advantages:
- Simple execution
- No index maintenance required
Disadvantages:
- Slow on large datasets
- High CPU usage
- Increased disk reads
IXSCAN
An index scan uses an index to locate matching records.
db.products.createIndex({ category: 1 })
After creating the index:
db.products.find({ category: "Electronics" })
MongoDB uses the index instead of scanning the entire collection.
Advantages:
- Faster data retrieval
- Reduced resource usage
- Better scalability
FETCH
After locating matching entries through an index, MongoDB may fetch the actual documents from the collection.
Example workflow:
- Search index
- Find matching document IDs
- Retrieve documents
This process appears as a FETCH stage in execution plans.
Using Explain to Analyze Queries
MongoDB provides the explain() method to inspect query execution.
Example:
db.users.find({ age: 30 }).explain("executionStats")
The output includes:
- Query planner information
- Winning plan
- Number of documents examined
- Number of documents returned
- Execution time
Important metrics:
executionTimeMillis
Shows the total time, in milliseconds, spent selecting a plan and executing the query.
Example:
"executionTimeMillis" : 5
The query took 5 milliseconds.
totalDocsExamined
Indicates how many documents MongoDB checked.
Example:
"totalDocsExamined" : 100000
This means MongoDB examined 100,000 documents.
In a well-optimized query, totalDocsExamined stays close to nReturned, the number of documents actually returned.
totalKeysExamined
Shows how many index entries were checked.
Example:
"totalKeysExamined" : 50
Lower values generally indicate better efficiency; ideally totalKeysExamined is close to the number of documents returned.
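These metrics are easiest to judge relative to the number of documents returned. The helper below is a sketch of how the explain output could be condensed into a few ratios. The function names and the sample object are assumptions; the sample is hand-written to mirror the field names of explain("executionStats") on an unsharded collection, and its nReturned value is made up. Some server versions nest the stage tree under a queryPlan key, which the sketch allows for.

```javascript
// Walk the winning plan tree and collect stage names, outermost first.
function collectStages(stage, out = []) {
  if (!stage) return out;
  out.push(stage.stage);
  if (stage.inputStage) collectStages(stage.inputStage, out);
  if (Array.isArray(stage.inputStages)) {
    stage.inputStages.forEach((child) => collectStages(child, out));
  }
  return out;
}

// Condense explain output into ratios that are easy to compare across queries.
function summarizeExplain(explainOutput) {
  const stats = explainOutput.executionStats;
  const winningPlan = explainOutput.queryPlanner.winningPlan;
  // Assumption: fall back to the plan itself when there is no `queryPlan` key.
  const stages = collectStages(winningPlan.queryPlan || winningPlan);
  const perResult = (count) => (stats.nReturned > 0 ? count / stats.nReturned : count);
  return {
    stages,
    executionTimeMillis: stats.executionTimeMillis,
    docsPerResult: perResult(stats.totalDocsExamined), // near 1 is the goal
    keysPerResult: perResult(stats.totalKeysExamined),
    usesCollectionScan: stages.includes("COLLSCAN"),
  };
}

// Illustrative sample, not real server output.
const sample = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: {
    nReturned: 50,
    executionTimeMillis: 5,
    totalKeysExamined: 0,
    totalDocsExamined: 100000,
  },
};

const summary = summarizeExplain(sample);
console.log(summary.docsPerResult, summary.usesCollectionScan); // prints "2000 true"
```

In mongosh, the same function could be given the result of db.users.find({ age: 30 }).explain("executionStats") directly.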
Single Field Indexes
A single-field index is created on one field.
Example:
db.users.createIndex({ email: 1 })
Queries that filter on the email field can then use the index.
db.users.find({
  email: "user@example.com"
})
MongoDB can directly locate matching records using the index.
Compound Indexes
Compound indexes contain multiple fields.
Example:
db.orders.createIndex({
  customerId: 1,
  orderDate: -1
})
This index helps queries such as:
db.orders.find({
  customerId: 101
}).sort({
  orderDate: -1
})
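Because the index stores each customer's entries already ordered by orderDate, MongoDB can filter on customerId and return the results in sorted order without a separate in-memory sort.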
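The effect of the key order can be seen in a small in-memory sketch. The sample orders and the compareKeys function are made up for illustration; the point is only that once entries are ordered by customerId ascending and then orderDate descending, one customer's entries sit next to each other and are already newest first.

```javascript
// Made-up sample data; ISO date strings compare correctly as plain strings.
const orders = [
  { _id: 1, customerId: 102, orderDate: "2024-03-01" },
  { _id: 2, customerId: 101, orderDate: "2024-01-15" },
  { _id: 3, customerId: 101, orderDate: "2024-02-20" },
  { _id: 4, customerId: 103, orderDate: "2024-01-02" },
  { _id: 5, customerId: 101, orderDate: "2023-12-31" },
];

// Compound key order: customerId ascending (1), then orderDate descending (-1).
function compareKeys(a, b) {
  if (a.customerId !== b.customerId) return a.customerId - b.customerId;
  if (a.orderDate === b.orderDate) return 0;
  return a.orderDate < b.orderDate ? 1 : -1;
}

// Stand-in for the index: entries kept in compound key order.
const indexEntries = [...orders].sort(compareKeys);

// Reading customer 101's run of entries needs no further sorting.
const forCustomer = indexEntries.filter((entry) => entry.customerId === 101);
console.log(forCustomer.map((entry) => entry.orderDate));
// prints [ '2024-02-20', '2024-01-15', '2023-12-31' ]
```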
Covered Queries
A covered query retrieves all required information directly from an index without accessing the collection.
Example index:
db.users.createIndex({
  name: 1,
  age: 1
})
Query:
db.users.find(
  { name: "John" },
  { name: 1, age: 1, _id: 0 }
)
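Since the filter field and every projected field exist in the index, and _id is explicitly excluded, MongoDB can return results directly from the index. Without _id: 0 the query would not be covered, because _id is returned by default and is not part of this index.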
Benefits:
- Faster execution
- Reduced disk access
- Lower memory usage
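Whether a query can be covered comes down to a simple field check, sketched below. The isCoveredQuery function is a hypothetical helper, not a MongoDB API, and it is deliberately simplified: it assumes top-level fields, inclusion-style projections, and no query operators such as $and or $or in the filter, and it ignores cases such as multikey indexes.

```javascript
// Returns true when the filter and projection touch only indexed fields.
function isCoveredQuery(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));

  // Every filtered field must be part of the index.
  const filterCovered = Object.keys(filter).every((field) => indexed.has(field));

  // Every field included by the projection must be part of the index.
  const included = Object.keys(projection).filter((field) => projection[field] === 1);
  const projectionCovered =
    included.length > 0 && included.every((field) => indexed.has(field));

  // _id is returned by default, so it must be excluded unless it is indexed here.
  const idHandled = indexed.has("_id") || projection._id === 0;

  return filterCovered && projectionCovered && idHandled;
}

const index = { name: 1, age: 1 };
console.log(isCoveredQuery(index, { name: "John" }, { name: 1, age: 1, _id: 0 })); // prints true
console.log(isCoveredQuery(index, { name: "John" }, { name: 1, age: 1 }));         // prints false
console.log(isCoveredQuery(index, { name: "John" }, { name: 1, email: 1, _id: 0 })); // prints false
```

On a real deployment, explain() is the reliable check: a covered query shows an IXSCAN stage with no FETCH stage and a totalDocsExamined of 0.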
Query Optimization Techniques
Create Appropriate Indexes
Indexes should be created on frequently queried fields.
Example:
db.customers.createIndex({
  phoneNumber: 1
})
Avoid Unnecessary Fields
Retrieve only required fields.
Instead of:
db.users.find({})
Use:
db.users.find(
  {},
  {
    name: 1,
    email: 1
  }
)
This reduces network traffic and memory consumption.
Limit Returned Documents
Use limit() whenever possible.
db.products.find().limit(20)
Only the first 20 records are returned.
Use Efficient Sorting
Sorting without indexes can be expensive.
Example:
db.sales.find().sort({
  saleDate: -1
})
Creating an index improves performance:
db.sales.createIndex({
  saleDate: -1
})
Avoid Large Skip Operations
Using large skip values can slow queries.
Example:
db.posts.find().skip(100000)
MongoDB still has to walk past every skipped document or index entry before returning the first result, so the cost grows with the offset.
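A better approach is cursor-based pagination, which filters on the last _id seen and reads the next page in _id order:
// lastId is the _id of the final document on the previous page;
// the page size of 20 is chosen for illustration
db.posts.find({
  _id: { $gt: lastId }
}).sort({ _id: 1 }).limit(20)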
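Cursor-based pagination only works reliably when each page is read in a stable order on the cursor field and the caller remembers the last value seen. The helper below is a sketch of how the pieces of such a query might be assembled; buildPageQuery and its parameters are hypothetical names, and it assumes pages are ordered by ascending _id.

```javascript
// Build the filter, sort, and limit for one page of results.
// Pass null as lastId to request the first page.
function buildPageQuery(lastId, pageSize) {
  return {
    filter: lastId == null ? {} : { _id: { $gt: lastId } },
    sort: { _id: 1 }, // stable order on the cursor field
    limit: pageSize,
  };
}

const firstPage = buildPageQuery(null, 20);
const nextPage = buildPageQuery(40, 20);
console.log(JSON.stringify(nextPage));
// prints {"filter":{"_id":{"$gt":40}},"sort":{"_id":1},"limit":20}

// In mongosh the parts would be applied as:
//   db.posts.find(nextPage.filter).sort(nextPage.sort).limit(nextPage.limit)
```

After each page, the _id of the last document returned becomes the lastId for the next call, so no documents are skipped by counting.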
Index Intersection
MongoDB can combine multiple indexes when necessary.
Indexes:
db.users.createIndex({ city: 1 })
db.users.createIndex({ age: 1 })
Query:
db.users.find({
  city: "Bangalore",
  age: 25
})
MongoDB may merge results from both indexes.
Index intersection can help, but a single compound index on both fields is often more efficient:
db.users.createIndex({
  city: 1,
  age: 1
})
Query Plan Cache
MongoDB stores the winning plan for each query shape in a plan cache, so later queries with the same shape can skip plan selection.
Benefits include:
- Faster planning
- Reduced optimization overhead
- Consistent query performance
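Cached plans are not permanent. MongoDB discards them when indexes are created or dropped or when the server restarts, and it re-plans a query whose cached plan starts performing noticeably worse than it did when it was selected.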
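The idea of caching a plan per kind of query, rather than per exact query, can be sketched with a toy cache keyed by the filter with its values removed. Everything here (shapeOf, shapeKey, PlanCache, choosePlan) is hypothetical and much simpler than MongoDB's real plan cache, whose keys also take the sort and projection into account.

```javascript
// Reduce a filter to its shape: field names and operators, values replaced by "?".
function shapeOf(filter) {
  const shape = {};
  for (const key of Object.keys(filter).sort()) {
    const value = filter[key];
    const isOperatorObject =
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value) &&
      !(value instanceof RegExp) &&
      !(value instanceof Date);
    shape[key] = isOperatorObject ? shapeOf(value) : "?";
  }
  return shape;
}

function shapeKey(filter) {
  return JSON.stringify(shapeOf(filter));
}

class PlanCache {
  constructor() {
    this.entries = new Map();
    this.plannerRuns = 0; // how often a plan had to be chosen from scratch
  }
  getPlan(filter, planFn) {
    const key = shapeKey(filter);
    if (!this.entries.has(key)) {
      this.plannerRuns++;
      this.entries.set(key, planFn(filter));
    }
    return this.entries.get(key);
  }
  clear() {
    this.entries.clear(); // e.g. after an index is created or dropped
  }
}

const cache = new PlanCache();
const choosePlan = () => "IXSCAN age_1"; // stand-in for real plan selection
cache.getPlan({ age: 30 }, choosePlan);
cache.getPlan({ age: 45 }, choosePlan); // same shape: served from the cache
cache.getPlan({ age: { $gt: 30 } }, choosePlan); // different shape: planned again
console.log(cache.plannerRuns); // prints 2
```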
Common Causes of Slow Queries
Missing Indexes
Without indexes, MongoDB performs collection scans.
Inefficient Query Patterns
Example:
db.users.find({
  name: /john/i
})
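Case-insensitive or unanchored regular expressions such as this one cannot narrow the index range, so MongoDB must test every index entry or document. A case-sensitive prefix expression such as /^John/ can use an index efficiently.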
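A toy model makes the cost of such patterns visible. In the sketch below, a sorted array of names stands in for an index; the names and helper functions are made up for illustration. A case-sensitive prefix can jump to the first candidate key and stop at the first non-match, while a case-insensitive pattern has to test every key.

```javascript
// Sorted keys standing in for an index on `name` (uppercase sorts before lowercase).
const keys = ["Alice", "Bob", "Joan", "John", "Johnny", "Zed", "john"].sort();

// Binary search for the first key that is >= target.
function lowerBound(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Prefix match: seek to the prefix, then read until the prefix no longer matches.
function prefixScan(sortedKeys, prefix) {
  let examined = 0;
  const matches = [];
  for (let i = lowerBound(sortedKeys, prefix); i < sortedKeys.length; i++) {
    examined++;
    if (!sortedKeys[i].startsWith(prefix)) break;
    matches.push(sortedKeys[i]);
  }
  return { matches, examined };
}

// Arbitrary regex: no usable bounds, so every key is tested.
function regexScan(sortedKeys, regex) {
  let examined = 0;
  const matches = [];
  for (const key of sortedKeys) {
    examined++;
    if (regex.test(key)) matches.push(key);
  }
  return { matches, examined };
}

const anchored = prefixScan(keys, "Joh");
const insensitive = regexScan(keys, /john/i);
console.log(anchored.examined, insensitive.examined); // prints "3 7"
```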
Excessive Sorting
Sorting large datasets without indexes increases memory and CPU usage.
Large Documents
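Large documents cost more to read and to send over the network. Returning only the required fields reduces the transfer, although MongoDB still reads the full document unless the query is covered.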
Poor Schema Design
Improper document structures can result in inefficient queries and excessive data processing.
Best Practices for Query Optimization
- Create indexes for frequently searched fields.
- Use compound indexes for multi-field queries.
- Analyze queries regularly with explain().
- Avoid collection scans on large datasets.
- Return only required fields.
- Use covered queries whenever possible.
- Optimize sorting operations using indexes.
- Monitor execution statistics and query performance.
- Implement efficient pagination techniques.
- Review and update indexes as application requirements evolve.
Conclusion
MongoDB Query Optimization is a critical aspect of database performance management. The Query Planner automatically evaluates available execution strategies and chooses the most efficient plan for each query. By understanding execution plans, leveraging indexes effectively, analyzing queries with explain(), and following optimization best practices, developers can significantly improve application responsiveness, reduce resource consumption, and ensure that MongoDB databases continue to perform efficiently even as data volumes grow.