# **Mastering Pagination in Your Hypersync Data Sources**

When you're pulling data from an external REST API, chances are that API isn't going to hand you every record in a single response. Most APIs break their data into pages — returning a chunk at a time along with some mechanism for fetching the next chunk. The Hypersync SDK handles all of this for you through **paging schemes** defined in your `dataSource.json` file.

This guide walks through each of the four supported paging schemes in detail, with real-world examples, tips for choosing the right one, and guidance on handling edge cases. If you haven't already read the [Data Sources](https://developer.hyperproof.app/hypersync-sdk/doc/005-data-sources) overview, start there first — we'll be building on those fundamentals here.

Let's dive in.

## **Before We Start: How Paging Schemes Fit Into Your dataSource.json**

Every dataset you define in your `dataSource.json` can optionally include a `pagingScheme` object. When a paging scheme is present, the SDK will automatically loop through pages of results, accumulating data until the stop condition is met. You don't have to write any loop logic yourself — just configure it correctly, and the `RestDataSourceBase` takes care of the rest.

A paging scheme always has three core pieces:

* **`type`** — Which paging strategy to use (`pageBased`, `offsetAndLimit`, `nextToken`, or `graphqlConnections`).
* **`request`** — How the SDK should construct the query parameters (or request body) for each page.
* **`pageUntil`** — The stop condition that tells the SDK when to quit fetching pages.


Some schemes also include a **`response`** object, which tells the SDK where to find relevant metadata (like total counts or next-page tokens) in the API's response body.

NOTE: By default, paging parameters are appended to the URL query string. If your dataset uses `"method": "POST"`, the paging parameters are automatically included in the request body instead. You don't need to do anything extra for that — the SDK handles it.
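
To make the GET versus POST behavior concrete, here is a minimal sketch of the idea. It is not the SDK's implementation: `applyPagingParams` and the `PreparedRequest` shape are hypothetical names, and the flat merge into the body is an assumption made for illustration.

```typescript
// Hypothetical helper (not part of the SDK): shows where paging
// parameters end up for GET versus POST datasets.
type PagingParams = Record<string, string | number>;

interface PreparedRequest {
  url: string;
  body?: Record<string, unknown>;
}

function applyPagingParams(
  method: "GET" | "POST",
  url: string,
  params: PagingParams,
  body: Record<string, unknown> = {}
): PreparedRequest {
  if (method === "POST") {
    // POST: merge the paging parameters into the request body.
    return { url, body: { ...body, ...params } };
  }
  // GET: append the paging parameters to the query string.
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");
  const separator = url.includes("?") ? "&" : "?";
  return { url: `${url}${separator}${query}` };
}

const getRequest = applyPagingParams("GET", "/api/v1/users", { pageNumber: 1, pageSize: 50 });
const postRequest = applyPagingParams(
  "POST",
  "/api/v1/search",
  { pageNumber: 1, pageSize: 50 },
  { filter: "active" }
);
```

In this sketch `getRequest.url` carries the paging values in the query string, while `postRequest.body` carries them alongside the existing `filter` field.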

## **Page-Based Paging**

This is the most straightforward approach. The API accepts a page number and a page size. You start at page 1 (or sometimes 0), and the SDK increments the page number by one after each request.

### **When to Use It**

Use page-based paging when the API documentation says something like "pass `page` and `per_page`" or "pass `pageNumber` and `pageSize`." This is extremely common in REST APIs. If you see numbered pages in the API docs, this is your scheme.

### **Basic Example**

Let's say your external service has an endpoint `/api/v1/users` that accepts `pageNumber` and `pageSize` query parameters. The first call would look like:


```
GET /api/v1/users?pageNumber=1&pageSize=50
```

Here's how you'd configure that in `dataSource.json`:


```json
{
  "dataSets": {
    "users": {
      "url": "/api/v1/users",
      "method": "GET",
      "property": "data.users",
      "pagingScheme": {
        "type": "pageBased",
        "request": {
          "pageParameter": "pageNumber",
          "pageStartingValue": 1,
          "limitParameter": "pageSize",
          "limitValue": 50
        },
        "pageUntil": "noDataLeft"
      }
    }
  }
}
```

The SDK will call the endpoint repeatedly — `?pageNumber=1&pageSize=50`, then `?pageNumber=2&pageSize=50`, then `?pageNumber=3&pageSize=50` — until the response comes back with an empty result set.
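
If it helps to see that loop spelled out, the sketch below simulates it against an in-memory fake API. Everything here is illustrative: `fetchAllPages` and `fakeUsersApi` are made-up names, and the code is synchronous for brevity, so treat it as a picture of the behavior rather than the SDK's actual code.

```typescript
// Illustrative stand-in for a page-based loop that stops on "noDataLeft".
type FetchPage<T> = (page: number, limit: number) => T[];

function fetchAllPages<T>(
  fetchPage: FetchPage<T>,
  startPage: number,
  limit: number
): { rows: T[]; requests: number } {
  const rows: T[] = [];
  let requests = 0;
  for (let page = startPage; ; page++) {
    const batch = fetchPage(page, limit);
    requests++;
    if (batch.length === 0) break; // an empty page ends the loop
    rows.push(...batch);
  }
  return { rows, requests };
}

// Fake one-based API backed by 120 in-memory records.
const allUsers = Array.from({ length: 120 }, (_, i) => ({ id: i + 1 }));
const fakeUsersApi: FetchPage<{ id: number }> = (page, limit) =>
  allUsers.slice((page - 1) * limit, page * limit);

const userSync = fetchAllPages(fakeUsersApi, 1, 50);
// userSync.rows.length is 120 and userSync.requests is 4:
// pages 1 to 3 return data, and page 4 comes back empty.
```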

### **Using `reachTotalCount` Instead of `noDataLeft`**

Sometimes the API tells you upfront how many total records exist. If that's the case, you can use `reachTotalCount` as your stop condition. This avoids making one extra "empty" request at the end.

Let's say the API response looks like this:


```json
{
  "data": {
    "users": [ ... ],
    "totalRecords": 237
  }
}
```

You'd configure the scheme like this:


```json
"pagingScheme": {
  "type": "pageBased",
  "request": {
    "pageParameter": "pageNumber",
    "pageStartingValue": 1,
    "limitParameter": "pageSize",
    "limitValue": 50
  },
  "response": {
    "totalCount": "data.totalRecords"
  },
  "pageUntil": "reachTotalCount"
}
```

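The `totalCount` path (`"data.totalRecords"`) tells the SDK where in the response body to find the total. The SDK then does the math: once `pageNumber * pageSize >= totalRecords`, it stops. With 237 records and a page size of 50, that happens after page 5 (5 × 50 = 250 ≥ 237), so no sixth request is made.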
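
The stop check is easy to trace by hand. The sketch below does that for one-based page numbers; `pagesRequested` is a hypothetical helper written for this guide, not an SDK function.

```typescript
// Illustrative only: lists the one-based pages that get requested before
// the check `pageNumber * pageSize >= totalCount` becomes true.
function pagesRequested(totalCount: number, pageSize: number): number[] {
  if (pageSize <= 0) throw new Error("pageSize must be positive");
  const pages: number[] = [];
  let pageNumber = 1;
  for (;;) {
    pages.push(pageNumber); // the first request is always made
    if (pageNumber * pageSize >= totalCount) break; // "reachTotalCount"
    pageNumber++;
  }
  return pages;
}

const userPages = pagesRequested(237, 50); // [1, 2, 3, 4, 5]
```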

### **Watch Out: Zero-Based vs One-Based Pages**

Some APIs start page numbering at 0, others at 1. Make sure `pageStartingValue` matches what the API expects. Getting this wrong won't throw an error. If you start at 1 against a zero-based API, you'll silently skip the first page of data; if you start at 0 against a one-based API, you may fetch an empty page first.
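
Here is a small simulation of the "silently skip" case. The fake API and the `collectFrom` helper are invented for illustration; the point is only that a wrong starting value produces fewer rows and no error.

```typescript
const letters = ["a", "b", "c", "d", "e"];

// Fake API that numbers its pages from 0.
function zeroBasedApi(page: number, limit: number): string[] {
  return letters.slice(page * limit, (page + 1) * limit);
}

// Hypothetical loop: fetch pages from `startPage` until one comes back empty.
function collectFrom(startPage: number, limit: number): string[] {
  const out: string[] = [];
  for (let page = startPage; ; page++) {
    const batch = zeroBasedApi(page, limit);
    if (batch.length === 0) break;
    out.push(...batch);
  }
  return out;
}

const startingAtZero = collectFrom(0, 2); // ["a", "b", "c", "d", "e"]
const startingAtOne = collectFrom(1, 2); // ["c", "d", "e"], first page lost
```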

## **Offset and Limit Paging**

Offset and limit is similar to page-based, but instead of a page number, the API accepts a starting index (offset) and a count (limit). The offset increments by the limit value after each page. Think of it like: "give me 100 items starting at item 0," then "give me 100 items starting at item 100," and so on.

### **When to Use It**

Use this when the API documentation mentions `offset` and `limit` parameters, or when it feels like a database-style skip/take pattern. This is common in APIs that are thin wrappers around SQL queries.

### **Basic Example**

Imagine your service has an endpoint `/api/v1/tickets` that takes `offset` and `limit` query parameters, and returns a total count in the response:


```
GET /api/v1/tickets?offset=0&limit=100
```

Response:


```json
{
  "tickets": [ ... ],
  "pagination": {
    "total": 542
  }
}
```

Here's the configuration:


```json
{
  "dataSets": {
    "tickets": {
      "url": "/api/v1/tickets",
      "method": "GET",
      "property": "tickets",
      "pagingScheme": {
        "type": "offsetAndLimit",
        "request": {
          "offsetParameter": "offset",
          "offsetStartingValue": 0,
          "limitParameter": "limit",
          "limitValue": 100
        },
        "response": {
          "totalCount": "pagination.total"
        },
        "pageUntil": "reachTotalCount"
      }
    }
  }
}
```

The SDK will generate calls like:

* `?offset=0&limit=100`
* `?offset=100&limit=100`
* `?offset=200&limit=100`
* ...and so on through `?offset=500&limit=100`, stopping once the next offset (600) would reach or exceed the total of 542.
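
The offset sequence can be reproduced with a few lines of arithmetic. `offsetsRequested` below is a hypothetical helper, shown only to make the progression explicit.

```typescript
// Illustrative only: the offsets requested for a given total and limit.
function offsetsRequested(totalCount: number, limit: number, start = 0): number[] {
  if (limit <= 0) throw new Error("limit must be positive");
  const offsets = [start]; // the first request is needed to learn the total
  for (let next = start + limit; next < totalCount; next += limit) {
    offsets.push(next);
  }
  return offsets;
}

const ticketOffsets = offsetsRequested(542, 100); // [0, 100, 200, 300, 400, 500]
```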


### **Can I Use `noDataLeft` with Offset and Limit?**

Yes! If the API doesn't return a total count, you can use `"pageUntil": "noDataLeft"` instead and omit the `response` object entirely. The SDK will keep fetching until it gets an empty result set. Just know that this means one extra request at the end that comes back empty.
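
To put a number on that trade-off, the sketch below counts requests under each stop condition, assuming (as described above) that `noDataLeft` only stops on an empty page. Both function names are hypothetical.

```typescript
// Requests needed when the API reports a total ("reachTotalCount").
function requestsWithTotalCount(total: number, limit: number): number {
  return Math.max(1, Math.ceil(total / limit));
}

// Requests needed when the loop stops on an empty page ("noDataLeft"):
// every page that has data, plus the final empty page.
function requestsWithNoDataLeft(total: number, limit: number): number {
  return Math.ceil(total / limit) + 1;
}

const withTotal = requestsWithTotalCount(542, 100); // 6
const withoutTotal = requestsWithNoDataLeft(542, 100); // 7
```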

### **Choosing Between Page-Based and Offset/Limit**

If the API supports both styles, either will work. In practice, use whichever one the API's documentation uses as its primary example. If you're unsure, offset/limit tends to be slightly more predictable because you're working with absolute positions rather than page numbers.

## **Next Token Paging**

Token-based (or cursor-based) pagination is increasingly popular in modern APIs. Instead of specifying a page number or offset, the API returns a token (sometimes called a cursor) with each response. You pass that token back in the next request to get the next page of data. When the token stops appearing in the response, you've reached the end.

### **When to Use It**

Use this when the API gives you a `nextToken`, `cursor`, `nextPageToken`, `continuationToken`, or a full URL to the next page in the response body. This is the standard approach for APIs like AWS services, many Google APIs, and newer REST APIs in general.

### **Basic Example with a Token String**

Let's say the API endpoint is `/api/v1/events` and it accepts `size` and `token` parameters. The first response looks like this:


```json
{
  "events": [ ... ],
  "next": {
    "token": "eyJsYXN0SWQiOiI0MjAifQ=="
  }
}
```

You'd configure it like this:


```json
{
  "dataSets": {
    "events": {
      "url": "/api/v1/events",
      "method": "GET",
      "property": "events",
      "pagingScheme": {
        "type": "nextToken",
        "request": {
          "tokenParameter": "token",
          "limitParameter": "size",
          "limitValue": 50
        },
        "response": {
          "nextToken": "next.token"
        },
        "pageUntil": "noNextToken",
        "tokenType": "token"
      }
    }
  }
}
```

The SDK will call:

1. `GET /api/v1/events?size=50` (no token on the first call)
2. `GET /api/v1/events?size=50&token=eyJsYXN0SWQiOiI0MjAifQ==`
3. `GET /api/v1/events?size=50&token=<next token from step 2>`
4. ...continues until the response no longer contains a `next.token` value.
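
The call sequence above can be sketched as a simple loop. This is an illustration with made-up names (`fetchAllByToken`, `fakeEventsApi`) and a synchronous fake API, not the SDK's code; the fake tokens are just stringified array positions.

```typescript
interface TokenPage<T> {
  items: T[];
  nextToken: string | undefined;
}
type FetchByToken<T> = (token: string | undefined, limit: number) => TokenPage<T>;

function fetchAllByToken<T>(
  fetchPage: FetchByToken<T>,
  limit: number
): { rows: T[]; tokensSent: (string | undefined)[] } {
  const rows: T[] = [];
  const tokensSent: (string | undefined)[] = [];
  let token: string | undefined = undefined; // no token on the first call
  do {
    tokensSent.push(token);
    const page: TokenPage<T> = fetchPage(token, limit);
    rows.push(...page.items);
    token = page.nextToken; // a missing token ends the loop ("noNextToken")
  } while (token !== undefined);
  return { rows, tokensSent };
}

// Fake API over five in-memory events.
const eventLog = ["e1", "e2", "e3", "e4", "e5"];
const fakeEventsApi: FetchByToken<string> = (token, limit) => {
  const start = token === undefined ? 0 : Number(token);
  const end = start + limit;
  return {
    items: eventLog.slice(start, end),
    nextToken: end < eventLog.length ? String(end) : undefined,
  };
};

const eventSync = fetchAllByToken(fakeEventsApi, 2);
// eventSync.tokensSent is [undefined, "2", "4"]; eventSync.rows has all five events.
```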


### **Basic Example with a URL**

Some APIs don't return a token string — they return a full URL to the next page. The SDK supports this as well. Just change `tokenType` to `"url"`.

Let's say the response looks like this:


```json
{
  "results": [ ... ],
  "nextPageUrl": "https://api.example.com/api/v1/events?cursor=abc123&size=50"
}
```

Configure it like this:


```json
"pagingScheme": {
  "type": "nextToken",
  "request": {
    "tokenParameter": "nextPageUrl",
    "limitParameter": "size",
    "limitValue": 50
  },
  "response": {
    "nextToken": "nextPageUrl"
  },
  "pageUntil": "noNextToken",
  "tokenType": "url"
}
```

When `tokenType` is `"url"`, the SDK uses the returned URL directly as the next request URL instead of appending a query parameter. Make sure the API returns a fully qualified URL and not a relative path.
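
A rough sketch of URL-following is shown below. The names are hypothetical, the fetcher is a synchronous in-memory fake, and the fully-qualified check is this sketch's own guard (the SDK's handling of relative paths is not described here).

```typescript
interface UrlPage<T> {
  results: T[];
  nextPageUrl: string | undefined;
}

function isAbsoluteHttpUrl(value: string): boolean {
  return /^https?:\/\//i.test(value);
}

function fetchAllByUrl<T>(
  firstUrl: string,
  fetchUrl: (url: string) => UrlPage<T>
): { rows: T[]; urls: string[] } {
  const rows: T[] = [];
  const urls: string[] = [];
  let url: string | undefined = firstUrl;
  while (url !== undefined) {
    urls.push(url);
    const page: UrlPage<T> = fetchUrl(url);
    rows.push(...page.results);
    const next: string | undefined = page.nextPageUrl;
    if (next !== undefined && !isAbsoluteHttpUrl(next)) {
      throw new Error(`Expected a fully qualified next-page URL, got: ${next}`);
    }
    url = next; // the returned URL becomes the next request URL as-is
  }
  return { rows, urls };
}

// Fake responses keyed by request URL.
const firstEventsUrl = "https://api.example.com/api/v1/events?size=2";
const secondEventsUrl = "https://api.example.com/api/v1/events?cursor=abc123&size=2";
const pagesByUrl: Record<string, UrlPage<number>> = {
  [firstEventsUrl]: { results: [1, 2], nextPageUrl: secondEventsUrl },
  [secondEventsUrl]: { results: [3], nextPageUrl: undefined },
};
const fakeFetchUrl = (url: string): UrlPage<number> => {
  const page = pagesByUrl[url];
  if (!page) throw new Error(`Unexpected URL: ${url}`);
  return page;
};

const urlSync = fetchAllByUrl(firstEventsUrl, fakeFetchUrl);
```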

### **Tokens in Response Headers**

Here's a scenario that trips people up: some APIs return pagination tokens in response headers rather than in the response body. The SDK handles this too — just prefix the path with `header:` in the `response` object.

For example, if the API returns a header like `X-Next-Token: abc123`:


```json
"response": {
  "nextToken": "header:X-Next-Token"
}
```

NOTE: The `header:` prefix trick works in any paging scheme where you're referencing response values — not just next token. If an API returns its total count in a header, you can use `"totalCount": "header:X-Total-Count"` in an offset/limit scheme as well.
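
Conceptually, a `header:` path is resolved against the response headers while any other path is resolved against the body. The sketch below shows one way that lookup could work. `resolveResponseValue` and `getByPath` are hypothetical helpers, and the case-insensitive header match is an assumption of this sketch (HTTP header names are case-insensitive in general).

```typescript
interface ApiResponse {
  headers: Record<string, string>;
  body: unknown;
}

// Walks a dot-notation path such as "pagination.total" through an object.
function getByPath(source: unknown, path: string): unknown {
  let current: unknown = source;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

function resolveResponseValue(response: ApiResponse, path: string): unknown {
  const prefix = "header:";
  if (path.startsWith(prefix)) {
    const wanted = path.slice(prefix.length).toLowerCase();
    // Compare header names in lower case so "X-Next-Token" matches "x-next-token".
    const match = Object.keys(response.headers).find(
      (name) => name.toLowerCase() === wanted
    );
    return match === undefined ? undefined : response.headers[match];
  }
  return getByPath(response.body, path);
}

const sampleResponse: ApiResponse = {
  headers: { "X-Next-Token": "abc123", "x-total-count": "542" },
  body: { pagination: { total: 542 } },
};

const tokenFromHeader = resolveResponseValue(sampleResponse, "header:X-Next-Token"); // "abc123"
const totalFromHeader = resolveResponseValue(sampleResponse, "header:X-Total-Count"); // "542"
const totalFromBody = resolveResponseValue(sampleResponse, "pagination.total"); // 542
```

Note that header values arrive as strings, so a total read from a header would need converting to a number before any comparison.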

## **GraphQL Connections Paging**

If you're working with a GraphQL API that follows the [GraphQL Connections specification](https://graphql.org/learn/pagination/#connection-specification), this is the scheme for you. It uses forward cursor-based pagination with `first`, `after`, `endCursor`, and `hasNextPage`.

### **When to Use It**

Use this when the GraphQL API returns a `pageInfo` object with `endCursor` and `hasNextPage` fields. This is the most widely used pagination pattern in GraphQL, so if you're querying a GraphQL endpoint, check its schema for `pageInfo` first; when it's there, this is the scheme you want.

### **Full Example**

Let's say you're pulling attributes from a GraphQL API. Your query uses the connections pattern with `first` and `after` variables:


```json
{
  "dataSets": {
    "attributes": {
      "url": "/graphql",
      "method": "POST",
      "body": {
        "query": "query($first: Int, $after: String) { attributes(first: $first, after: $after) { nodes { id name description createdAt } pageInfo { endCursor hasNextPage } } }",
        "variables": {
          "first": 500
        }
      },
      "property": "data.attributes.nodes",
      "pagingScheme": {
        "type": "graphqlConnections",
        "request": {
          "limitParameter": "first",
          "limitValue": 500
        },
        "response": {
          "pageInfo": "data.attributes.pageInfo"
        },
        "pageUntil": "noNextPage"
      }
    }
  }
}
```

Here's what happens under the hood:

1. The SDK sends the first request with `variables: { "first": 500 }`.
2. The response includes `pageInfo: { "endCursor": "abc123", "hasNextPage": true }`.
3. The SDK automatically injects `"after": "abc123"` into the variables for the next request.
4. This continues until `hasNextPage` comes back as `false`.


You don't need to manually add the `after` variable to your query's variables object — the SDK handles that dynamically. Just make sure the `$after` variable is declared in your query definition.
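
The cursor hand-off can be pictured with the following sketch. It uses a synchronous fake query runner and hypothetical names (`fetchAllNodes`, `fakeGraphql`); the fake cursors are just stringified array positions, and none of this is the SDK's actual implementation.

```typescript
interface PageInfo {
  endCursor: string | null;
  hasNextPage: boolean;
}
interface Connection<T> {
  nodes: T[];
  pageInfo: PageInfo;
}
type Variables = { first: number; after?: string };
type RunQuery<T> = (variables: Variables) => Connection<T>;

function fetchAllNodes<T>(
  runQuery: RunQuery<T>,
  first: number
): { nodes: T[]; variablesSent: Variables[] } {
  const nodes: T[] = [];
  const variablesSent: Variables[] = [];
  let variables: Variables = { first }; // no `after` on the first request
  for (;;) {
    variablesSent.push(variables);
    const connection: Connection<T> = runQuery(variables);
    nodes.push(...connection.nodes);
    const { endCursor, hasNextPage } = connection.pageInfo;
    if (!hasNextPage || endCursor === null) break; // "noNextPage"
    variables = { first, after: endCursor }; // inject the cursor for the next request
  }
  return { nodes, variablesSent };
}

// Fake GraphQL endpoint over three in-memory attributes.
const attributeNames = ["a1", "a2", "a3"];
const fakeGraphql: RunQuery<string> = ({ first, after }) => {
  const start = after === undefined ? 0 : Number(after);
  const end = Math.min(start + first, attributeNames.length);
  return {
    nodes: attributeNames.slice(start, end),
    pageInfo: {
      endCursor: end > start ? String(end) : null,
      hasNextPage: end < attributeNames.length,
    },
  };
};

const attributeSync = fetchAllNodes(fakeGraphql, 2);
// attributeSync.variablesSent is [{ first: 2 }, { first: 2, after: "2" }]
```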

### **A Few Things to Keep in Mind**

* The SDK currently supports **forward pagination only** (`first`/`after`). Backward pagination with `last`/`before` is not supported.
* **Nested connections** (paginating within paginated results) are not supported by the scheme directly. If you need to paginate a nested connection, you'd likely need a [custom data source](https://developer.hyperproof.app/hypersync-sdk/doc/005-data-sources#custom-data-sources) approach.
* The `pageInfo` path in the `response` object should point to the `pageInfo` object itself — not to `endCursor` or `hasNextPage` individually. The SDK knows to look for both fields within that object.


## **Choosing the Right Paging Scheme: A Quick Decision Guide**

Not sure which scheme to use? Here's a simple way to figure it out:

* **Does the API use GraphQL with a `pageInfo` object (`endCursor`, `hasNextPage`)?** → Use `graphqlConnections`.
* **Does the API return a token, cursor, or "next" URL in its responses?** → Use `nextToken`.
* **Does the API accept `offset` and `limit` parameters?** → Use `offsetAndLimit`.
* **Does the API accept a `page` number parameter?** → Use `pageBased`.


If the API supports multiple styles, prefer `nextToken` or `graphqlConnections` when available — cursor-based approaches are generally more reliable for large datasets because they aren't affected by records being added or deleted between pages.
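
If you like to see decision rules as code, the checklist and the cursor-first preference can be sketched as one function. The `ApiTraits` shape and `choosePagingType` are invented for this guide; only the four returned strings are real `type` values.

```typescript
type PagingType = "graphqlConnections" | "nextToken" | "offsetAndLimit" | "pageBased";

// Hypothetical description of what an API supports.
interface ApiTraits {
  graphqlConnections?: boolean;
  returnsNextTokenOrUrl?: boolean;
  acceptsOffsetAndLimit?: boolean;
  acceptsPageNumber?: boolean;
}

function choosePagingType(traits: ApiTraits): PagingType | undefined {
  // Cursor-based options come first, matching the preference described above.
  if (traits.graphqlConnections) return "graphqlConnections";
  if (traits.returnsNextTokenOrUrl) return "nextToken";
  if (traits.acceptsOffsetAndLimit) return "offsetAndLimit";
  if (traits.acceptsPageNumber) return "pageBased";
  return undefined; // none of the built-in schemes fits
}

const mixedApiChoice = choosePagingType({
  returnsNextTokenOrUrl: true,
  acceptsOffsetAndLimit: true,
}); // "nextToken"
```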

## **Choosing the Right Stop Condition**

Every paging scheme needs a `pageUntil` value. Here's when to use each:

* **`noDataLeft`** — Stop when a page comes back with zero results. This is the simplest and most universal option. The trade-off is one extra empty request at the end.
* **`reachTotalCount`** — Stop when you've fetched enough pages to cover the total count reported by the API. This avoids the extra empty request, but requires the API to return a total count. Use this with `pageBased` or `offsetAndLimit` schemes.
* **`noNextToken`** — Stop when the response no longer contains a next token/cursor. This is specifically for `nextToken` schemes.
* **`noNextPage`** — Stop when `hasNextPage` is `false`. This is specifically for `graphqlConnections` schemes.


## **Common Pitfalls and How to Avoid Them**

### **Wrong Property Path**

The most common mistake is getting the `property`, `totalCount`, `nextToken`, or `pageInfo` path wrong. These paths use dot notation to navigate the JSON response body. Double-check your paths against the actual API response. Even a small mismatch, such as `"data.items"` where the response actually contains `"data.users"`, will cause the SDK to find no data and stop immediately.

TIP: Use the [Debugging](https://developer.hyperproof.app/hypersync-sdk/doc/008-debugging) guide to inspect the raw API responses while developing. This makes it much easier to verify that your paths are correct.
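
One low-tech way to catch path mistakes early is to test your configured paths against a saved sample response. The sketch below does that for body paths only (it does not handle the `header:` prefix); `findUnresolvedPaths` is a hypothetical helper, not an SDK utility.

```typescript
// Walks a dot-notation path through a parsed JSON body.
function lookupPath(source: unknown, path: string): unknown {
  let current: unknown = source;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Returns the names of configured paths that do not resolve in the sample.
function findUnresolvedPaths(sampleBody: unknown, paths: Record<string, string>): string[] {
  return Object.entries(paths)
    .filter(([, path]) => lookupPath(sampleBody, path) === undefined)
    .map(([name]) => name);
}

const sampleBody = { data: { users: [{ id: 1 }], totalRecords: 237 } };

const correctPaths = findUnresolvedPaths(sampleBody, {
  property: "data.users",
  totalCount: "data.totalRecords",
}); // []

const mistypedPaths = findUnresolvedPaths(sampleBody, {
  property: "data.items",
  totalCount: "data.totalRecords",
}); // ["property"]
```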

### **Page Size Too Large or Too Small**

Setting `limitValue` too high might cause the API to reject the request or time out. Setting it too low means more round trips and slower syncs. Check the API's documentation for maximum allowed page sizes and set your `limitValue` at or just below that maximum.

### **Forgetting the `header:` Prefix**

If the total count or next token lives in a response header and you're looking for it in the body, you'll get no results. Remember to use the `header:` prefix in your path (e.g., `"header:X-Total-Count"`).

### **POST Datasets and Paging Parameters**

When your dataset uses `"method": "POST"`, the SDK puts paging parameters into the request body instead of the query string. This is usually what you want, but make sure the API expects them there. If the API expects paging in the query string even for POST requests, you may need a [custom data source](https://developer.hyperproof.app/hypersync-sdk/doc/005-data-sources#custom-data-sources) to handle that.

## **When JSON Configuration Isn't Enough**

The four paging schemes cover the vast majority of REST API patterns, but some APIs have quirks that don't fit neatly into any of them. A few examples:

* The API requires you to call one endpoint to initiate a data export and another to retrieve results (see [Overriding getProofData](https://developer.hyperproof.app/hypersync-sdk/override-getproofdata) for a walkthrough of exactly this scenario).
* The API uses a non-standard pagination pattern, like returning an array of page IDs that you have to fetch individually.
* The API requires different authentication headers for different pages.


In these cases, implement a **custom data source** with your own `getData` method, and write the paging loop yourself. The `IDataSource` interface gives you full control: you can make as many HTTP calls as you need, in whatever order you need.
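
As a rough idea of what a hand-written loop looks like, here is a sketch for the "array of page IDs" pattern. It deliberately does not reproduce the SDK's `IDataSource` or `getData` signatures. `GetJson`, `fetchByPageIds`, the URLs, and the response shapes are all assumptions for illustration, and the transport is synchronous here where real code would be asynchronous.

```typescript
// Hypothetical transport: takes a URL and returns parsed JSON.
type GetJson = (url: string) => unknown;

interface PageIndex {
  pageIds: string[];
}
interface ExportPage {
  records: { id: number }[];
}

function fetchByPageIds(
  getJson: GetJson,
  indexUrl: string,
  pageUrl: (pageId: string) => string
): { id: number }[] {
  // Step 1: ask the API which pages exist.
  const index = getJson(indexUrl) as PageIndex;
  const records: { id: number }[] = [];
  // Step 2: fetch each page individually and accumulate its records.
  for (const pageId of index.pageIds) {
    const page = getJson(pageUrl(pageId)) as ExportPage;
    records.push(...page.records);
  }
  return records;
}

// Fake transport backed by canned responses.
const cannedResponses: Record<string, unknown> = {
  "/export/pages": { pageIds: ["p1", "p2"] },
  "/export/pages/p1": { records: [{ id: 1 }, { id: 2 }] },
  "/export/pages/p2": { records: [{ id: 3 }] },
};
const fakeGetJson: GetJson = (url) => {
  if (!(url in cannedResponses)) throw new Error(`Unexpected URL: ${url}`);
  return cannedResponses[url];
};

const exportedRecords = fetchByPageIds(
  fakeGetJson,
  "/export/pages",
  (pageId) => `/export/pages/${pageId}`
);
// exportedRecords holds the records with ids 1, 2, and 3.
```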

This should give you a solid foundation for configuring pagination in your Hypersync apps. When in doubt, start with the simplest scheme that matches your API, verify the response paths using the debugger, and iterate from there. Happy syncing!