Multi-Page Extraction
Extract and compare data across multiple pages
Some extractions span more than one page: paginated results, detail pages linked from a list, or two structurally identical pages whose data you want to compare.
Two approaches
1. Single-page agent with navigation
The default single-page agent can navigate across pages as part of its exploration. It clicks "Next page" buttons, follows detail page links, and collects data across the navigation path. This works well for:
- Paginated lists (click-based pagination or infinite scroll)
- List → detail page flows (collect a list, visit each item for more fields)
- Category tabs or filter-based navigation
Describe the navigation intent in your website description:
"This is an event listing that reveals more results with a 'Load more' button. Click it until no new events appear, then collect all events."
"Each event card links to a detail page with the full schedule. Visit each detail page to get the full lineup."
The agent uses navigation tools (click, scroll, wait) to reach the data before writing its extraction script.
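As a rough illustration of what collecting data across a navigation path involves, the sketch below walks a paginated list and visits each linked detail page. It is not the agent's actual implementation: `LIST_PAGES`, `DETAIL_PAGES`, and `collect_events` are hypothetical names, and dictionary lookups stand in for real browser navigation.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a site. List pages hold links to
# detail pages plus the key of the next list page (None on the last one).
LIST_PAGES = {
    "list-1": {"links": ["event-a", "event-b"], "next": "list-2"},
    "list-2": {"links": ["event-c"], "next": None},
}
DETAIL_PAGES = {
    "event-a": {"title": "Jazz Night", "lineup": ["Trio One"]},
    "event-b": {"title": "Indie Fest", "lineup": ["Band X", "Band Y"]},
    "event-c": {"title": "Open Mic", "lineup": []},
}


def collect_events(start: str, max_pages: int = 100) -> list:
    """Walk the list pages from `start`, visiting each linked detail page."""
    events = []
    seen = set()
    current: Optional[str] = start
    # Stop on the last page, on a repeated page (loop guard), or at max_pages.
    while current is not None and current not in seen and len(seen) < max_pages:
        seen.add(current)
        page = LIST_PAGES[current]        # stands in for loading a list page
        for link in page["links"]:
            detail = DETAIL_PAGES[link]   # stands in for visiting a detail page
            events.append({"id": link, **detail})
        current = page["next"]            # stands in for clicking "Next page"
    return events


events = collect_events("list-1")
print([event["title"] for event in events])
```

The `seen` set and `max_pages` cap are defensive choices for this sketch, so that a site whose "next" link loops back on itself cannot keep the traversal running forever.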
2. Multi-page agent (two-page comparison)
The multi-page agent is designed for a specific use case: two pages from the same site that are structurally identical (e.g., two match pages, two product pages from the same store). It:
- Loads both pages simultaneously
- Reads the cleaned HTML from each
- Generates a single extraction script that works for both
- Returns one result object per page
This is a fast, one-shot extraction with no validation loop and no network inspection. It works best for DOM-consistent pages that do not need the single-page agent's deeper analysis.
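The idea of one extraction script serving two structurally identical pages can be sketched with Python's standard `html.parser`. This is a minimal illustration under assumptions, not the script the agent generates: the `name` and `price` class names, the sample HTML, and the `FieldParser` and `extract` helpers are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class FieldParser(HTMLParser):
    """Collect the text of elements whose class attribute names a wanted field."""

    def __init__(self, fields):
        super().__init__()
        self.fields = set(fields)
        self.current: Optional[str] = None
        self.result = {}

    def handle_starttag(self, tag, attrs):
        # The class attribute may be absent or valueless, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.fields:
                self.current = name

    def handle_data(self, data):
        # Record the first non-empty text that follows a matching start tag.
        if self.current and data.strip():
            self.result[self.current] = data.strip()
            self.current = None


def extract(html: str) -> dict:
    """Run the same field extraction against one page's HTML."""
    parser = FieldParser({"name", "price"})
    parser.feed(html)
    return parser.result


# Two hypothetical product pages with the same DOM structure.
PAGE_A = '<div class="product"><h1 class="name">Blue Mug</h1><span class="price">$12.00</span></div>'
PAGE_B = '<div class="product"><h1 class="name">Red Mug</h1><span class="price">$14.50</span></div>'

# One script, one result object per page.
results = [extract(page) for page in (PAGE_A, PAGE_B)]
print(results)
```

Because both pages share the same structure, the same selectors apply to each, which is why a single pass without a validation loop can be enough in this case.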
Use the multi-page agent when:
- You want to compare the same data across two pages on the same site
- The site doesn't expose data via APIs and DOM extraction is sufficient
- You need results from both pages in a single run
Performance notes
Multi-page extractions take longer per run than single-page ones, because each additional page adds a browser load, rendering time, and an extraction pass. For large page sets (50 or more pages), consider:
- Using a playbook (skips AI exploration on subsequent runs)
- Batching with a workflow (run multiple playbooks in parallel)
- Starting with a small page count to validate the schema before scaling up
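The last two suggestions can be combined in a small driver: validate on a few pages, then run the rest in parallel. The sketch below uses Python's `concurrent.futures` and is only an illustration. `run_playbook` is a hypothetical stub, since this page does not describe the actual playbook or workflow API, and the URLs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def run_playbook(url: str) -> dict:
    """Hypothetical stand-in for one playbook run against one page."""
    return {"url": url, "ok": True}


def run_batch(urls, max_workers: int = 4) -> list:
    """Run the stand-in playbook for each URL in parallel, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_playbook, urls))


# Placeholder page set of 60 URLs.
urls = [f"https://example.com/items/{i}" for i in range(1, 61)]

# Validate the schema on a small sample first, then scale to the full set.
pilot = run_batch(urls[:3])
if all(result["ok"] for result in pilot):
    results = pilot + run_batch(urls[3:])
else:
    results = pilot

print(len(results))
```

`pool.map` returns results in input order, so each result can be matched back to its page without extra bookkeeping.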