When troubleshooting why your application is running slowly or looking at how to improve MongoDB performance, revisiting your indexing strategy—or creating one if it doesn’t already exist—is often the key. In this blog, we’ll explore why using indexes matters, the different types of indexes available in MongoDB, and when to use each type with practical examples.
What are indexes and why are they important?
Indexes are special data structures that enable MongoDB to quickly look up field values rather than scanning every document in a collection. This is especially useful when you’re working with large datasets, as it significantly improves query performance. Indexes in MongoDB store field values in sorted order. When MongoDB runs a query, it can use this pre-sorted structure to return results in the required order without an additional sorting step.
By creating indexes, we’re doing the heavy lifting as we go along, storing specific field values at the point when documents are created or updated. This upfront effort saves time later by enabling faster and more efficient data retrieval.
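To make the idea concrete, here is a minimal sketch in plain Python of the difference between scanning a collection and looking a value up in a sorted structure. It is a toy model, not how MongoDB implements indexes internally, and the `products` data and function names are invented for illustration.

```python
import bisect

# Hypothetical collection: documents in arbitrary (insertion) order.
products = [
    {"_id": 1, "name": "lamp", "price": 40},
    {"_id": 2, "name": "desk", "price": 120},
    {"_id": 3, "name": "mug", "price": 8},
    {"_id": 4, "name": "chair", "price": 120},
    {"_id": 5, "name": "sofa", "price": 650},
]

def collection_scan(docs, field, value):
    """Check every document: the work grows with the size of the collection."""
    return [d["_id"] for d in docs if d.get(field) == value]

def build_index(docs, field):
    """Toy index: (field value, _id) pairs kept in sorted order."""
    return sorted((d[field], d["_id"]) for d in docs if field in d)

def index_lookup(index, value):
    """Binary search for the first and last entries holding this value."""
    lo = bisect.bisect_left(index, (value, float("-inf")))
    hi = bisect.bisect_right(index, (value, float("inf")))
    return [doc_id for _, doc_id in index[lo:hi]]

price_index = build_index(products, "price")
print(index_lookup(price_index, 120))          # [2, 4]
print([doc_id for _, doc_id in price_index])   # [3, 1, 2, 4, 5], already in price order
```

The second print shows why a sorted index can also satisfy a sort: walking the entries front to back (or back to front) yields documents in price order with no extra sorting.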
MongoDB’s diverse index types and when to use them
Single field indexes
Single field indexes speed up queries that filter or sort on one field.
How it works:
MongoDB organizes and sorts the values of a specific field, including embedded document fields or entire embedded documents. It can traverse the values either in ascending or descending order, regardless of the sort order you define when creating the index.
When to use:
Use a single field index when queries frequently target a particular field. For example, in a collection of student records, creating an index on the average grade field would improve performance for queries that search or sort students based on their average grade. Similarly, in an HR system, a single field index on job level enables efficient filtering and sorting of employees.
Compound indexes
Compound indexes combine two or more fields in a document, allowing more efficient data retrieval when filtering or sorting on multiple criteria simultaneously.
How it works:
The order in which you specify the fields determines which queries the index can serve efficiently. A common guideline for choosing that order is the Equality, Sort, Range (ESR) rule:
- First: fields used for exact matches
- Next: fields that reflect the sort order of the query
- Last: fields used for range filters
When to use:
Use a compound index when queries consistently use the same field combinations. For example, in an order management system with fields such as customer id, product id, and order date, a compound index allows fast retrieval of orders by customer and product, with the results sorted by order date.
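The order management example can be sketched as a toy model in plain Python. The field names (`customer_id`, `product_id`, `order_date`) and the sample orders are assumptions made up for this illustration, and a sorted list of tuples stands in for the real index structure.

```python
import bisect
from datetime import date

# Hypothetical orders, stored in no particular order.
orders = [
    {"_id": 1, "customer_id": "c1", "product_id": "p9", "order_date": date(2024, 3, 5)},
    {"_id": 2, "customer_id": "c2", "product_id": "p1", "order_date": date(2024, 1, 9)},
    {"_id": 3, "customer_id": "c1", "product_id": "p9", "order_date": date(2024, 1, 20)},
    {"_id": 4, "customer_id": "c1", "product_id": "p2", "order_date": date(2024, 2, 11)},
    {"_id": 5, "customer_id": "c1", "product_id": "p9", "order_date": date(2024, 2, 2)},
]

# Toy compound index: the two equality fields first, the sort field last.
FIELDS = ("customer_id", "product_id", "order_date")
index = sorted(tuple(o[f] for f in FIELDS) + (o["_id"],) for o in orders)

def orders_for(customer_id, product_id):
    """Return matching _ids, already ordered by order_date."""
    prefix = (customer_id, product_id)
    start = bisect.bisect_left(index, prefix)
    ids = []
    # Entries sharing the prefix sit next to each other; stop when it changes.
    for entry in index[start:]:
        if entry[:2] != prefix:
            break
        ids.append(entry[-1])
    return ids

print(orders_for("c1", "p9"))  # [3, 5, 1]
```

Because the equality fields come first, all matching entries form one contiguous run, and within that run the entries are already in date order. Putting `order_date` first instead would scatter a customer's orders across the whole index.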
Multikey indexes
Multikey indexes are used for fields that contain arrays, whether the array elements are scalar values or embedded documents.
How it works:
When you create an index on a field that contains an array, MongoDB automatically creates an index key for each element in the array, including any embedded documents. With the multikey index, MongoDB can directly locate the documents whose array contains a given value, instead of scanning all the documents.
When to use:
Use a multikey index if you frequently query, filter or sort by elements within arrays or search nested document structures. For example, consider a website with blog posts that have multiple tags for each post. A multikey index on the tag field enables efficient querying on tag values in the array.
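As a rough sketch of the "one index key per array element" idea, the plain Python below builds a lookup from each tag to the posts that carry it. The sample posts and the `build_multikey_index` helper are hypothetical, and the dictionary is only a stand-in for a real index.

```python
from collections import defaultdict

# Hypothetical blog posts, each with an array of tags.
posts = [
    {"_id": 1, "title": "Intro to indexing", "tags": ["mongodb", "performance"]},
    {"_id": 2, "title": "Schema design", "tags": ["mongodb", "modeling"]},
    {"_id": 3, "title": "Caching basics", "tags": ["performance"]},
]

def build_multikey_index(docs, field):
    """Create one index entry per array element rather than one per document."""
    index = defaultdict(list)
    for doc in docs:
        value = doc.get(field, [])
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            index[element].append(doc["_id"])
    return index

tag_index = build_multikey_index(posts, "tags")
print(tag_index["performance"])  # [1, 3]
print(tag_index["mongodb"])      # [1, 2]
```

Note that three documents produced five index entries here, which is why arrays with many elements can make a multikey index noticeably larger than the collection's document count suggests.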
Text indexes
Text indexes provide advanced search capabilities for string content in documents, beyond simple exact matches.
How it works:
You can create one text index per collection on one or more string fields. The string content is tokenized into individual words.
Text indexes support language-specific text processing, where the language determines the rules for reducing words to their roots and the stop words to filter out. For example, in English, suffixes such as -ed and -ing are stripped so that different forms of a word match, and common words such as “the” and “a” are excluded from the search.
When to use:
Text indexes are useful for keyword searches and phrase matching in large volumes of string content. They match whole words and their stems rather than arbitrary fragments of a word. They are particularly beneficial in applications like content management systems or search engines. For instance, in a repository of books with titles and descriptions, you could match on keywords such as “ghost stories” or “science fiction”.
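The tokenizing and stop-word steps can be illustrated with a small sketch in plain Python. This is a simplification under stated assumptions: the stop-word list is a tiny hand-picked set, stemming is left out entirely, and the books and helper names are invented for the example.

```python
import re
from collections import defaultdict

# A tiny, hand-picked stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "at"}

def tokenize(text):
    """Lowercase, split into words, and drop stop words (no stemming here)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

# Hypothetical book repository.
books = [
    {"_id": 1, "title": "The Haunted Inn", "description": "A collection of ghost stories"},
    {"_id": 2, "title": "Orbit", "description": "Science fiction in the asteroid belt"},
    {"_id": 3, "title": "Ghost Ship", "description": "A science mystery at sea"},
]

def build_text_index(docs, fields):
    """Map each remaining word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc in docs:
        for field in fields:
            for token in tokenize(doc.get(field, "")):
                index[token].add(doc["_id"])
    return index

text_index = build_text_index(books, ["title", "description"])

def search(query):
    """Return ids of documents that contain any of the query terms."""
    hits = set()
    for token in tokenize(query):
        hits |= text_index.get(token, set())
    return sorted(hits)

print(search("ghost stories"))    # [1, 3]
print(search("science fiction"))  # [2, 3]
```

A search for "ghost stories" returns both books because the terms are combined with OR in this sketch; requiring the exact phrase would narrow the result to the first book.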
Wildcard indexes
A wildcard index is a dynamic indexing strategy that provides flexible, pattern-based searching across document fields.
How it works:
Wildcard indexes automatically create index entries for every field that matches a pattern, which can cover all fields in a document or all subfields under a given path, including nested document fields. Unlike other indexes that target specific, named fields, wildcard indexes allow flexible querying on arbitrary or changing field names.
When to use:
Use wildcard indexes when working with fields that are not defined in advance or may change over time. For example, in user profile management with dynamic user attributes, a wildcard index allows searching across different user profile structures. This is particularly useful where field names vary between documents or your application repeatedly queries an embedded document field where the subfields are not consistent.
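One way to picture this is to flatten each document into dotted field paths and index every path that turns up. The sketch below does that in plain Python for a hypothetical `attributes` subdocument; the profile data and helper names are assumptions, and arrays are ignored to keep the example short.

```python
def flatten(doc, prefix=""):
    """Yield (dotted.path, value) pairs for every scalar field, however deeply nested."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, prefix=f"{path}.")
        else:
            yield path, value

# Hypothetical user profiles whose attributes differ from document to document.
profiles = [
    {"_id": 1, "attributes": {"plan": "pro", "locale": "en-GB"}},
    {"_id": 2, "attributes": {"plan": "free", "social": {"handle": "@kim"}}},
]

def build_wildcard_index(docs, root):
    """Index every path found under `root`, without naming the paths in advance."""
    index = {}
    for doc in docs:
        for path, value in flatten(doc.get(root, {}), prefix=f"{root}."):
            index.setdefault((path, value), []).append(doc["_id"])
    return index

index = build_wildcard_index(profiles, "attributes")
print(index[("attributes.social.handle", "@kim")])  # [2]
print(sorted({path for path, _ in index}))
# ['attributes.locale', 'attributes.plan', 'attributes.social.handle']
```

Nothing in `build_wildcard_index` mentions `plan`, `locale`, or `social`; the paths are discovered from the data, which is the property that makes this style of index suit unpredictable schemas.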
Geospatial indexes
A geospatial index enables complex queries of geographical location data.
How it works:
MongoDB provides two types of geospatial indexes: 2dsphere indexes, which support queries interpreting geometry on an earth-like sphere, and 2d indexes, which use planar geometry with latitude and longitude coordinates.
When to use:
Use geospatial indexes for location-based services such as finding nearby points of interest (restaurants or gas stations, for example), calculating distances between geographical points, or determining if a point falls within a specific area. For example, in a mobile application for finding restaurants in Seattle, you could use a geospatial index to find the restaurants within a specified distance of the user’s location and to count the restaurants in a given area.
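To show what a "within a specified distance" query means on a sphere, here is a small sketch using the haversine formula in plain Python. It computes the distance to every document, which is exactly the brute-force work a geospatial index is meant to avoid, so treat it as an illustration of the query semantics only. The restaurant names and coordinates are invented, and the Earth radius is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two (longitude, latitude) points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical restaurants; coordinates are (longitude, latitude) pairs.
restaurants = [
    {"name": "Pier Diner", "location": (-122.3400, 47.6050)},
    {"name": "Hilltop Cafe", "location": (-122.3200, 47.6200)},
    {"name": "Airport Grill", "location": (-122.3000, 47.4500)},
]

def near(docs, lon, lat, max_km):
    """Names of documents within max_km of the point, nearest first."""
    scored = [(haversine_km(lon, lat, *d["location"]), d["name"]) for d in docs]
    return [name for dist, name in sorted(scored) if dist <= max_km]

print(near(restaurants, -122.3350, 47.6080, max_km=5))
# ['Pier Diner', 'Hilltop Cafe']
```

The coordinate order is longitude first, then latitude, which is easy to get backwards when copying values from a map.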
Hashed indexes
A hashed index supports efficient sharding and load balancing across a cluster.
How it works:
MongoDB applies a hashing function to the values of the indexed field, transforming them into hash values. These hash values are used to distribute data more uniformly across shards in a cluster. This process helps reduce index hotspots in write-heavy environments by balancing write operations across multiple shards.
When to use:
Use a hashed index to balance data across the shards of a sharded cluster. This improves distribution for high-cardinality fields whose values change monotonically, such as timestamps or sequential IDs. For example, in a system tracking user activities with monotonically increasing timestamps, a hashed index on the timestamp field can ensure more balanced data distribution across shards, preventing any single shard from becoming a bottleneck.
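The effect on write distribution can be simulated in a few lines of plain Python. This is a sketch under assumptions: MD5 from `hashlib` is a stand-in and not MongoDB's exact hashing function, the four shards and range boundaries are invented, and real clusters place data in chunks rather than by a simple modulo.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size

def range_shard(key, boundaries):
    """Ranged placement: each shard owns a contiguous slice of key values."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)

def hashed_shard(key):
    """Hashed placement: hash the key first, then pick a shard from the hash."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# The ranges were drawn around older data; new sequential keys keep growing past them.
boundaries = [1_000, 2_000, 3_000]
new_keys = range(3_000, 4_000)

by_range = Counter(range_shard(k, boundaries) for k in new_keys)
by_hash = Counter(hashed_shard(k) for k in new_keys)

print(by_range)  # every new key lands on shard 3
print(by_hash)   # the same keys spread across all four shards
```

The trade-off is that neighbouring key values no longer live next to each other, so a hashed index suits equality lookups much better than range queries over the original values.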
TTL indexes
Time to Live (TTL) indexes automatically remove documents after a specified time period. They provide an automated way to manage and expire data with a natural lifecycle.
How it works:
MongoDB monitors the indexed date field with a background process and deletes documents that exceed the predefined time threshold.
When to use:
TTL indexes are useful for managing session data or clearing temporary logs. For example, in session management, a TTL index can clean up expired user sessions to free up storage.
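The expiry rule itself is simple enough to sketch in plain Python: a document is due for removal once its date value plus the configured lifetime is in the past. The 30-minute lifetime, the `last_seen` field, and the `ttl_sweep` helper are assumptions for illustration, and the single function call stands in for a background process that runs periodically.

```python
from datetime import datetime, timedelta, timezone

EXPIRE_AFTER_SECONDS = 1800  # hypothetical 30-minute session lifetime

def ttl_sweep(docs, field, now):
    """Split documents into those still alive and the ids that have expired."""
    cutoff = now - timedelta(seconds=EXPIRE_AFTER_SECONDS)
    kept, removed = [], []
    for doc in docs:
        stamp = doc.get(field)
        # Documents without a date value in the field are never expired.
        if isinstance(stamp, datetime) and stamp <= cutoff:
            removed.append(doc["_id"])
        else:
            kept.append(doc)
    return kept, removed

now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
sessions = [
    {"_id": "s1", "last_seen": now - timedelta(minutes=5)},
    {"_id": "s2", "last_seen": now - timedelta(minutes=45)},
    {"_id": "s3"},  # no date field, so it is left alone
]

sessions, removed = ttl_sweep(sessions, "last_seen", now)
print(removed)                       # ['s2']
print([s["_id"] for s in sessions])  # ['s1', 's3']
```

Because removal happens in periodic sweeps, an expired document can linger for a short while after its deadline, so application code should not rely on deletion being instantaneous.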
Unique indexes
A unique index ensures critical fields are distinct across all documents in a collection.
How it works:
When an insert or update would create a duplicate value in the indexed field, MongoDB rejects the operation with a duplicate key error.
When to use:
Use unique indexes when you need to enforce data integrity by preventing duplicate values in specific fields. For example, in user identification, a unique index prevents multiple users from registering with the same email address or username.
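The check behind a unique index can be modelled with a dictionary in plain Python. The `UniqueIndex` class and `DuplicateKey` exception below are hypothetical names for this sketch, not part of any driver.

```python
class DuplicateKey(Exception):
    """Raised when a value is already present in the unique index."""

class UniqueIndex:
    def __init__(self, field):
        self.field = field
        self.entries = {}  # indexed value -> _id of the document that owns it

    def insert(self, doc):
        # A missing field is treated as None, so only one document may omit it.
        value = doc.get(self.field)
        if value in self.entries:
            raise DuplicateKey(f"{self.field}={value!r} already exists")
        self.entries[value] = doc["_id"]

email_index = UniqueIndex("email")
email_index.insert({"_id": 1, "email": "kim@example.com"})

try:
    email_index.insert({"_id": 2, "email": "kim@example.com"})
except DuplicateKey as err:
    print("rejected:", err)  # rejected: email='kim@example.com' already exists
```

Enforcing the rule at the index level means it holds no matter which part of the application performs the write, which is harder to guarantee with a check-then-insert in application code.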
Sparse indexes
Sparse indexes include only the documents where the indexed field exists, even if the value is null.
How it works:
With most other index types, MongoDB creates an entry for every document in the collection. With a sparse index, any document that does not contain the indexed field is skipped during indexing.
When to use:
Use sparse indexes when the indexed field is optional and appears only in some documents. Sparse indexes are useful when you need to reduce index size and optimize memory usage for irregularly populated data. For example, in a book collection where only some books have ratings, you could create a sparse index on the rating field. Queries that filter or sort by rating would be more efficient while conserving memory.
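The difference in what gets indexed can be seen in a short sketch in plain Python. The books and the two helper functions are invented for the example, and the entries are left unsorted to keep the focus on which documents are included.

```python
# Hypothetical book collection where only some books have a rating.
books = [
    {"_id": 1, "title": "Dune", "rating": 4.6},
    {"_id": 2, "title": "Untitled draft"},         # no rating field at all
    {"_id": 3, "title": "Emma", "rating": None},   # field exists, value is null
    {"_id": 4, "title": "Ulysses", "rating": 3.9},
]

def regular_index_entries(docs, field):
    """Every document gets an entry; a missing field is recorded as None."""
    return [(doc.get(field), doc["_id"]) for doc in docs]

def sparse_index_entries(docs, field):
    """Only documents that actually contain the field get an entry."""
    return [(doc[field], doc["_id"]) for doc in docs if field in doc]

print(len(regular_index_entries(books, "rating")))  # 4
print(sparse_index_entries(books, "rating"))
# [(4.6, 1), (None, 3), (3.9, 4)]
```

The draft with no `rating` field is absent from the sparse entries, while the book with an explicit null is still included. One consequence worth keeping in mind is that a sparse index alone cannot answer a question about all books, since some of them are simply not in it.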
Partial indexes
Partial indexes include only the documents that match a specified filter condition.
How it works:
Partial indexes are selective and MongoDB indexes only a subset of documents in a collection.
When to use:
Use partial indexes when you need to reduce the size of the index for large collections and frequently query only a specific subset of the documents. For example, in an e-commerce platform with products in various states (available, sold out, discontinued), you could create a partial index on the status field for only the available products. If the most common search is for available products, this optimizes query performance while conserving storage.
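A partial index can be sketched as "filter first, then index". In the plain Python below, the filter keeps only available products and the indexed field is a hypothetical `category`; both the data and the equality-only `matches` helper are assumptions for illustration and cover far less than a real filter expression would.

```python
# Hypothetical product catalogue with products in various states.
products = [
    {"_id": 1, "category": "shoes", "status": "available"},
    {"_id": 2, "category": "shoes", "status": "sold out"},
    {"_id": 3, "category": "bags", "status": "available"},
    {"_id": 4, "category": "hats", "status": "discontinued"},
]

FILTER = {"status": "available"}  # only documents matching this are indexed

def matches(doc, filter_expr):
    """Equality-only filter check, enough for this sketch."""
    return all(doc.get(key) == value for key, value in filter_expr.items())

def build_partial_index(docs, field, filter_expr):
    """Index `field`, but only for documents that pass the filter."""
    return sorted(
        (doc[field], doc["_id"])
        for doc in docs
        if field in doc and matches(doc, filter_expr)
    )

index = build_partial_index(products, "category", FILTER)
print(index)  # [('bags', 3), ('shoes', 1)]
```

Half of the catalogue never enters the index, which is where the storage saving comes from. The flip side is that a query can only rely on such an index when its own conditions guarantee it stays inside the indexed subset, for example by also filtering on available products.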
Evaluating the impact of an index to improve MongoDB efficiency
If you want to know how well a particular index is working, you can assess whether it is beneficial for query performance by hiding it. Hidden indexes allow you to compare query execution times with and without the index. You can hide an index, run test queries, and then unhide it to observe the differences. While the index is hidden, MongoDB continues to maintain field values in the index for new or updated documents, ensuring that you can safely unhide it, if it is proven to improve query efficiency. However, the hidden index is not used by the MongoDB query planner. If the index hinders query efficiency, you can drop it with the confidence that you won’t be taking up database resources with unnecessary index recreation.
The indexing dilemma: weighing up the costs
Now that you know the right situations in which to apply a particular index type, you should be aware that liberally creating indexes comes at a cost. These extra costs include increased latency, higher CPU usage, and a larger memory footprint during write operations because all the relevant indexes must be updated when documents are inserted, updated, or deleted.
Indexes are particularly beneficial in read-heavy environments but can slow down write-heavy applications due to this maintenance overhead. However, with the right indexes, these trade-offs are often worthwhile for improving query performance on frequently accessed fields by reducing the number of documents scanned during read operations.
Effective indexing isn’t about adding indexes when you notice they’re missing; it’s about configuring indexes correctly from the start and designing queries with indexing in mind.