1. Introduction
Whenever we want to display large amounts of data to a user, paginating the database response should be our first option.
Implementing pagination itself might be easy, but creating a performant solution is a bit more complicated and involves several tradeoffs.
In this article, we’re going to look into different pagination options in MongoDB and see their main benefits and downsides.
2. Pagination with Skip and Limit
The most common approach to pagination is to use Mongo's skip and limit. It is really straightforward and easy to get started with, but there is a big issue: the larger the skip value, the slower the requests get.
Let’s imagine we are serving the requests using the following controller:
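A minimal sketch of such a controller, assuming an Express application and the official Node.js MongoDB driver (the units collection matches the one queried later in the article; the route and parameter names are illustrative):

```javascript
const express = require('express');
const { MongoClient } = require('mongodb');

const app = express();
const client = new MongoClient('mongodb://localhost:27017');
const units = client.db('pagination').collection('units');

// GET /units?offset=1000000&limit=20
app.get('/units', async (req, res) => {
  const offset = parseInt(req.query.offset, 10) || 0;
  const limit = parseInt(req.query.limit, 10) || 20;

  // skip() makes the server walk past `offset` documents before it
  // starts returning results -- this walk is what gets slow.
  const items = await units.find().skip(offset).limit(limit).toArray();

  res.json({ items, nextOffset: offset + items.length });
});

client.connect().then(() => app.listen(3000));
```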
Here’s a graph which shows the request duration for a database with over 5,000,000 records:
As we can see, the larger the offset, the slower the request gets. For example, compared to offset 0, an offset of 5,000,000 takes more than 50% longer; the growth looks very much like O(n).
Where exactly does this slowdown come from?
In order to find the root cause of the slowdown, we can use Mongo's explain to understand exactly how the query is executed. Let's execute the following command in the Mongo shell:
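For instance, assuming the units collection from above, an offset of 1,000,000, and a page size of 20, the command could look like this:

```javascript
db.units.find().skip(1000000).limit(20).explain("executionStats")
```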
We should see a response like this:
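Abridged to the fields we care about (the exact values and surrounding plan details will vary), the output looks roughly like this:

```javascript
{
  // ... query planner and winning plan details omitted ...
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 20,
    "totalDocsExamined": 1000020
    // ...
  }
  // ...
}
```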
Digging further, we need to look at the executionStats key. Examining its top-level totalDocsExamined and nReturned keys, we can see that in order to serve the skip and limit query with an offset of 1,000,000, the database needs to scan 1,000,020 records while returning just 20 of them. What a waste of resources!
In an ideal world, whenever we scan a record, we would also like to return it (i.e. the ratio between totalDocsExamined and nReturned should be 1:1). Keeping this in mind helps us write very efficient database queries.
3. Pagination with Cursor
Pagination with a cursor is one possible alternative to the skip and limit approach.
Instead of returning the numerical offset in the response data, the response should contain a reference to the next item in the list. When making a query to fetch the next page of the data, we have to pass the returned reference back to the query instead of the numerical offset.
If the referenced field is properly indexed, we should see constant-time performance. Let's imagine that we have the following controller:
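A minimal sketch, reusing the Express setup from above and assuming the default _id field serves as the cursor (the next parameter name is illustrative):

```javascript
const { ObjectId } = require('mongodb');

// GET /units?next=64a1f0c2e4b0a1b2c3d4e5f6&limit=20
app.get('/units', async (req, res) => {
  const limit = parseInt(req.query.limit, 10) || 20;

  // If a cursor was passed, only fetch documents whose _id is
  // greater than the last one the client has already seen.
  const query = req.query.next
    ? { _id: { $gt: new ObjectId(req.query.next) } }
    : {};

  const items = await units
    .find(query)
    .sort({ _id: 1 }) // _id is indexed by default, so this seek is cheap
    .limit(limit)
    .toArray();

  // Return the _id of the last item as the cursor for the next page.
  res.json({
    items,
    next: items.length ? items[items.length - 1]._id : null,
  });
});
```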
As we can see, instead of returning the offset as a number, we return the ID of the last item in the array, which serves as the indicator of where we left off. The next time a client passes an offset to the request, we query it using $gt.
Looking at the request duration graph for this approach, we can see that no matter how far we are in traversing the list of records, the performance stays roughly the same: O(1).
Executing explain reveals that this time totalDocsExamined is exactly equal to nReturned.
As we're using a comparison operator when requesting records, we always have to ensure that the offset value is orderable and unique.
For example, it's perfectly fine to use an integer, a Mongo ObjectId, or a Version 1 UUID, which contains a timestamp. However, we can't use a non-sortable value as an offset, as we would have no way to determine the exact order of the records.
On the other hand, there's a major drawback to the cursor-based approach: we can't jump to a specific page in the list of records as we can with skip and limit.
4. Bucket-based Pagination
Bucket-based pagination offers a solution to this problem: it's significantly faster than the skip-based approach, yet it still allows jumping to a specific page.
Let's imagine that all of the records in the units collection are in the following form:
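The exact document shape doesn't matter for the technique; for illustration, assume something like:

```javascript
// Hypothetical fields -- any flat record shape works here.
{
  "_id": ObjectId("..."),
  "name": "unit-42",
  "createdAt": ISODate("...")
}
```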
The bucket-based approach groups multiple records into a single bucket document:
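A single bucket document could then look like this; the count field, which tracks how full the bucket is, will come in handy when inserting:

```javascript
{
  "_id": ObjectId("..."),
  "count": 5000, // how many records this bucket currently holds
  "records": [
    { "name": "unit-1", "createdAt": ISODate("...") },
    { "name": "unit-2", "createdAt": ISODate("...") }
    // ... up to 5,000 embedded records ...
  ]
}
```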
For example, if we decide to have 5,000 elements in a single bucket, then instead of having 5,000,000 records in the database, we end up with just 1,000 bucket documents.
The controller for traversing the records looks like this:
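A sketch under the same assumptions as before (for brevity, it ignores pages that straddle two buckets):

```javascript
// GET /units?page=3&pageSize=20
app.get('/units', async (req, res) => {
  const page = parseInt(req.query.page, 10) || 0;
  const pageSize = parseInt(req.query.pageSize, 10) || 20;
  const bucketSize = 5000;

  // Work out which bucket the requested page lives in and
  // where inside that bucket's records array it starts.
  const absoluteOffset = page * pageSize;
  const bucketIndex = Math.floor(absoluteOffset / bucketSize);
  const offsetInBucket = absoluteOffset % bucketSize;

  // skip() now only ever walks over bucket documents
  // (at most ~1,000 of them), not over individual records.
  const [bucket] = await units
    .find()
    .sort({ _id: 1 })
    .skip(bucketIndex)
    .limit(1)
    .toArray();

  const items = bucket
    ? bucket.records.slice(offsetInBucket, offsetInBucket + pageSize)
    : [];

  res.json({ items, page });
});
```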
Note that once again we're using skip, but this time it's way more efficient. Even if we are at the end of the records, we only need to scan a maximum of 1,000 bucket documents, compared to the situation in Section 2, where we had to scan almost 5,000,000 records.
Looking at the request duration graph, we can see that the line is almost flat, meaning that no matter which page we're on, the cost of this approach stays roughly the same.
Unfortunately, there are no silver bullets in this game, and this method has its downsides. Probably the most important one is that it's a bit more complex to insert, delete, or update an individual record inside a bucket.
For example, in order to insert a record to a bucket, we have to execute this query:
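A sketch of the classic bucket-pattern insert in the Mongo shell, assuming the count field from the example above (the upsert flag creates a fresh bucket once all existing ones are full):

```javascript
db.units.updateOne(
  // Find a bucket that still has room...
  { count: { $lt: 5000 } },
  {
    // ...push the new record into it and bump the counter.
    $push: { records: { name: "unit-5000001", createdAt: new Date() } },
    $inc: { count: 1 }
  },
  // If every bucket is full, create a fresh one.
  { upsert: true }
)
```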
The query looks up a single document in the units collection which has fewer items than the threshold of 5,000 and pushes the new item there.
Because the maximum document size in Mongo is 16MB, we have to make sure we don't exceed that limit. Hence, we should be careful when storing large documents inside buckets.
5. Conclusion
Returning a paginated response is a requirement for almost every client-facing application.
Even though paginating response data with Mongo's skip and limit is relatively straightforward, skip causes the queries to run in O(n).
To create a performant, constant-time pagination solution, we need to look into either using the cursor-based approach or storing the data in buckets.