Paginating Requests in APIs

When exposing large data sets through APIs, it needs to provide a mechanism to paginate the list of resources. Multiple names are used in the industry for endpoints that return a paginated set, particularly in REST APIS, such as collection resource, listing endpoints, index endpoints, etc. I am going to name it “listing endpoints” throughout the document.

Although rarely, a listing endpoint is not paginated, it might happen. Particularly when the set is fixed, well defined, and the amount of data is considerably small.

Before exposing any listing endpoint, we should ask ourselves, at least, the following questions: Can the list be sorted? Is there any default order? Can the list be filtered? If so, which filter params should it accept? Are the queries, executed under the hood, performant enough?. Exposing endpoints is very easy. Deprecating and deleting them is extremely hard and potentially impossible.

There are different techniques to provide listing endpoints and how to paginate them. In this post, I will discuss different approaches and their usage in the industry. Most of the use cases are REST APIs, but it applies to any SOAP, REST, GraphQL, etc. The most common pagination techniques are Page-based pagination (also called offset-based pagination), KeySet-Based pagination, and Cursor-based Pagination.

Page — based pagination

This is the simplest and most common form of paging, particularly for Apps that use SQL Databases. The set is divided into pages. The endpoint accepts a page param that is an integer indicating the page within the list to be returned.

If it’s using SQL, It will do a query using LIMIT and OFFSET, where the first is the length of the page and the latter is the number of records already returned (page's size * page).

SQL offset skips the first N results of the query. Nonetheless, the database will have to fetch from the disks the N results and bring them in order to return the records next to them.

“…the rows are first sorted according to the <order by clause> and then limited by dropping the number of rows specified in the <result offset clause> from the beginning…. .” SQL:2016, Part 2, §4.15.3 Derived tables

See the analysis of performance using OFFSET by Markus Winand.

Pros:

  • You can jump to any particular page, not need to query 99 pages to get the page 100.
  • It allows sending parallel requests with different pages.
  • Stateless on the server-side.
  • Easy to understand and debug.
  • This approach requires very little business logic. There are a lot of libraries written in different languages which are easy to use.

Cons:

  • Bad performance for large OFFSET in SQL. When doing OFFSET Nin SQL, the database needs to scan and count N rows.

“The larger the offset, the slower the request is, up until the point that it times out.” Shopify Blog

  • It can return repeated or missing if any is added/deleted while paginating. E.g., If the first request asks for page 1 and a new record is inserted to the first page, then the request with page 2 will have a record repeated which was returned on the previous request.

KeySet-based pagination

The API provides a key param that acts as a delimiter of the page. This key param should be the same key of the set sort order. For example, if the set is sorted by ID, then the key param should be since_id. Other examples would be since_updated_at, since_created_at etc.

The first request doesn’t contain the delimiter param. The response of this request will contain the value of the key for the last element of the set. For instance, if the delimiter is the id, to get the next page, the client needs to send the paramsince_id with the value id of the last element of the response, and the set must be sorted by id.

The SQL query executed would be:

Pros:

  • The SQL query is more efficient than OFFSET (for most cases) since it uses a WHERE condition (assuming it has good SQL indexes).
  • Unlike page-based pagination, new records inserted on previous pages won’t cause duplicated elements.

Cons:

  • It’s tied to the sort order. If you want to use since_id then, the set should be sorted by id.
  • There is no way to jump for a specific page. It needs to iterate through all the prior pages.
  • It doesn’t allow sending parallel requests for different batches.
  • The API needs to expose multiple key-params (e.g. since_id,since_updated_at).
  • The client needs to keep track of the key-value of the set.
  • Missing items if they are added to the previous pages

Cursor-based pagination

Given a set, a cursor will be a piece of data that contains a pointer to an element and the info to get the next/previous elements. The server should return the cursor pointing to the next page in each request. In most cases, the cursor is opaque, so users cannot manipulate it.

Some APIs also return a cursor per element so that clients can ask for elements after a specific one.

The SQL query will depend on the implementation, but it will be similar to the query generated by the KeySet-based Pagination method using a WHERE condition.

There are different approaches to implement this method. Some return the cursor as part of the payload, others return the cursor as part of the Header, particularly in the LINK headers. The cursor might contain all the information needed or partially, allowing clients to add other filter params.

Pros:

  • If the cursor is opaque, the implementation underneath can change without having to introduce an API change.
  • In SQL, for most of the cases, it is much faster than using page since it won’t use OFFSET in the Database.
  • There is no issue when a record is deleted as opposed to Page-based Pagination.

Cons:

  • There is no way to skip pages. If the user wants page X, it needs to request pages from 1 to X.
  • It doesn’t allow sending parallel requests for different batches.
  • The implementation is more complex than LIMIT/OFFSET.
  • Hard to debug. Given a request, you have to unencode it to see what’s doing.
  • Missing items if they are added to the previous pages

Usages in the industry

Different Apps follow different techniques. In this section, I will describe a few approaches taken by different companies:

Stripe

Stripe does use cursor-based pagination in its REST API. It accepts three parameters: limit, starting_afterand ending_before.

Both parameters take an existing object ID value and return objects in reverse chronological order. The ending_before parameter returns objects listed before the named object. The starting_after parameter returns objects listed after the named object. These parameters are mutually exclusive -- only one of starting_after orending_before may be used.

Facebook:

Facebook uses the three types of pagination mentioned here, but they emphasize using Cursor-based pagination.

Cursor-based pagination is the most efficient method of paging and should always be used when possible.Facebook API Docs

Page-based Pagination: Accepts two params offset and limit.

https://facebook.com/me/feed?limit=25&offset=50

Time-based Pagination: It’s a particular case of KeySet-based pagination where the key is a timestamp (e.g. created_at, updated_at). It accepts one of the two params (since or until). For example:

https://facebook.com/me/feed?limit=25&since=1314224121 https://facebook.com/me/feed?limit=25&until=1314224121

Cursor-based Pagination: It accepts one of the two params before or after and the value of it is the cursor returned by previous requests. Example:

https://facebook.com/me/feed?limit=25&before=NDMyNzQy https://facebook.com/me/feed?limit=25&after=NDMyNzQy

Slack API

The Slack API uses the three methods. However, Cursor-based pagination is used heavily, and KeySet-based pagination is deprecated and will be removed soon (Nov 2020).

Cursor-based pagination: It accepts cursor and limit parameters. The response returns as part of the payload the cursor to be sent in the next request.

https://slack.com/api/user.list?limit=20&cursor=dXljpVEc6&token=123JOIC

Page-based pagination: It uses page and count or other limiting parameters. Only a few endpoints support it.

KeySet-based pagination: The key is a timestamp value oldest and latest. This method is deprecated.

Shopify:

Shopify uses the three methods. However, Page-based pagination is deprecated in the REST Admin API in favor of Cursor-based pagination.

Cursor-based Pagination: The response body of a GET request returns the first page of results, and the response header returns the URLs to the next and previous page (if applicable) via the LINK header.

Github:

Github uses page-based Pagination on version REST v3 that accepts a page param.

https://api.github.com/search/code?=addClass+user:mozilla&page=10

It was removed on version v4, which uses GraphQL and Cursor-based pagination via the params before and after.

Mechanisms used for different companies

Conclusions

There isn’t any perfect solution for pagination. It depends on the case. If you don’t care about skipping pages, and performance is important for the case, I would suggest using cursors, otherwise using page might be the best option.

Thanks to Marius-Constantin Melemciuc, German Chiazzo, and Princejeet Singh for reviewing this post.

Staff Software Engineer at Shopify. Opinions are my own. ignaciochiazzo.com