Tutorial: YC Daily Hacker News
Build an automated pipeline that extracts Hacker News front-page stories daily via webhook
This tutorial walks through the full Ev3ry workflow -- from creating a playbook that extracts Hacker News stories, to deploying an automated workflow triggered by a webhook. By the end, you will have a production-ready pipeline you can call from any external system.
What you will build
A daily data pipeline that:
- Navigates to https://news.ycombinator.com/
- Extracts all 30 front-page stories with title, URL, points, author, and comment count
- Runs automatically when triggered by a webhook POST request
- Returns structured JSON data for downstream processing
Step 1 -- Add the website
From the dashboard, click Add Website. Enter the site name, URL, and a brief description of the content.
The URL is the starting point the agent will navigate to during extraction. The description helps the agent understand what data to look for.
Step 2 -- Create the data schema
Go to the website detail page and create a new Data Template. This defines the shape of the data you want to extract.
HN Stories Schema
This schema tells the agent to extract an array of objects, each with a story name, link, and comment count. The required constraint ensures incomplete rows are excluded.
Start with fewer fields to keep the first run fast. You can add points, author, rank, and posted_time later by editing the schema and re-running.
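As a rough illustration, a minimal template for this step might look like the sketch below. The field names follow the description above; the exact schema format Ev3ry uses may differ, and `drop_incomplete` is a simplified stand-in for the platform's own validation of the required constraint:

```python
# Hypothetical sketch of the data template -- field names follow the
# tutorial text; the exact schema format Ev3ry uses may differ.
STORY_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "link": {"type": "string"},
            "comment_count": {"type": "integer"},
        },
        "required": ["name", "link"],
    },
}

def drop_incomplete(rows, schema=STORY_SCHEMA):
    """Mimic the required-field constraint: keep only rows that carry
    every required field (a simplified stand-in for full validation)."""
    required = schema["items"]["required"]
    return [r for r in rows if all(r.get(f) is not None for f in required)]

rows = [
    {"name": "Show HN: ...", "link": "https://example.com", "comment_count": 12},
    {"name": "A story with no link", "link": None, "comment_count": 3},
]
complete = drop_incomplete(rows)  # the second row is excluded
```

This is why the required constraint matters: a half-parsed row is dropped rather than passed downstream with missing fields.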
Step 3 -- Run the extraction
Click Run on the playbook page. The agent opens a browser, navigates to Hacker News, and writes an extraction script.
What happens under the hood
Hacker News uses a simple HTML table layout. The single-page agent:
- Loads the page and takes a DOM snapshot
- Identifies the repeating `<tr>` row structure in the stories table
- Checks for API endpoints or framework globals (`__NEXT_DATA__`, `window.__NUXT__`) -- Hacker News has none, so it falls back to DOM extraction
- Writes a JavaScript extraction script that maps each row to your schema fields
- Validates the script output against your schema (retries if the data shape is wrong)
- Submits the final script and extracted data
The entire process takes about 20-30 seconds. The result is 30 structured story objects.
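The validate-and-retry step can be sketched as a simple loop. The function names here are illustrative, not Ev3ry's actual API:

```python
# Illustrative sketch of the agent's validate-and-retry loop; the
# callable names (write_script, execute, validate) are hypothetical.
def run_with_retries(write_script, execute, validate, max_attempts=3):
    """Generate a script, run it, and retry with error feedback until
    the output matches the schema or attempts are exhausted."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        script = write_script(feedback)   # agent writes (or repairs) the script
        data = execute(script)            # run it against the page
        errors = validate(data)           # check the shape against the schema
        if not errors:
            return script, data
        feedback = errors                 # feed errors back into the next try
    raise RuntimeError(f"no valid script after {max_attempts} attempts")
```

Feeding the validation errors back into the next generation attempt is what lets the agent recover when its first script extracts the wrong shape.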
Step 4 -- Save as a playbook
After a successful run, the extraction is automatically saved as a playbook. A playbook stores:
- The extraction script (JavaScript that runs in the browser)
- The navigation actions (page URL, any clicks or scrolls)
- The data schema
Future runs skip the AI exploration and execute the saved script directly. This makes re-runs faster (under 10 seconds) and cheaper (no LLM calls).
Step 5 -- Create a workflow
Now wrap the playbook in a workflow to add automation.
- Go to the Workflows page and click New Workflow
- Add a Playbook node and select your Hacker News playbook
- Connect the trigger node to the playbook node
- Set the trigger type to Webhook
Webhook Trigger (`POST /api/webhooks/...`) -> Extract Data (Hacker News - 30 items)
Click Deploy to activate the webhook endpoint. The workflow status changes from draft to deployed, and the endpoint starts accepting requests.
Step 6 -- Trigger via webhook
Send a POST request to trigger the workflow. The `x-webhook-secret` header authenticates the request.
```bash
curl -X POST https://your-domain.com/api/webhooks/YOUR_ID \
  -H "Content-Type: application/json" \
  -H "x-webhook-secret: whsec_7d58..." \
  -d '{"source": "tutorial"}'
```

The workflow creates a new run, navigates to Hacker News, executes the saved extraction script, and stores the results. The entire execution takes about 10-30 seconds depending on browser startup time.
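If you would rather trigger the workflow from code than from curl, the same request can be built with Python's standard library. The URL and secret below are placeholders, exactly as in the curl example:

```python
import json
import urllib.request

def build_trigger_request(url, secret, payload=None):
    """Build the webhook POST request; pass the result to
    urllib.request.urlopen() to actually send it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload or {}).encode(),
        headers={
            "Content-Type": "application/json",
            "x-webhook-secret": secret,  # authenticates the request
        },
        method="POST",
    )

req = build_trigger_request(
    "https://your-domain.com/api/webhooks/YOUR_ID",  # placeholder URL
    "whsec_YOUR_SECRET",                             # placeholder secret
    {"source": "tutorial"},
)
# urllib.request.urlopen(req) sends it and returns the HTTP response.
```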
Step 7 -- View run history
Go to Workflows > YC Daily Hacker News > History to see all past runs. Click any node in the execution tree to inspect its output data.
Each run stores the full extracted dataset. You can compare outputs across runs to track how the Hacker News front page changes over time.
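A cross-run comparison takes only a few lines once you have two runs' datasets. This sketch assumes each extracted story carries a stable `link` field to key on:

```python
# Diff two runs' story lists to see what entered or left the front
# page (assumes each story object has a stable "link" field).
def diff_runs(previous, current):
    prev_links = {s["link"] for s in previous}
    curr_links = {s["link"] for s in current}
    return {
        "new": [s for s in current if s["link"] not in prev_links],
        "dropped": [s for s in previous if s["link"] not in curr_links],
    }

yesterday = [{"name": "A", "link": "https://a.example"},
             {"name": "B", "link": "https://b.example"}]
today = [{"name": "B", "link": "https://b.example"},
         {"name": "C", "link": "https://c.example"}]
changes = diff_runs(yesterday, today)
# changes["new"] contains story C; changes["dropped"] contains story A
```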
Automating with an external scheduler
Since the workflow is webhook-triggered, you can automate it from any system:
Cron job (Linux/macOS)
Add to your crontab (crontab -e) to run daily at 8 AM:
```bash
# Note: a crontab entry must be a single line -- cron does not support
# backslash line continuations.
0 8 * * * curl -s -X POST https://your-domain.com/api/webhooks/YOUR_WORKFLOW_ID -H "Content-Type: application/json" -H "x-webhook-secret: whsec_YOUR_SECRET" -d '{"source": "crontab"}'
```
GitHub Actions
```yaml
name: Daily HN Extract
on:
  schedule:
    - cron: '0 8 * * *'
jobs:
  trigger:
    runs-on: ubuntu-latest
    steps:
      - run: |
          curl -X POST ${{ secrets.WEBHOOK_URL }} \
            -H "Content-Type: application/json" \
            -H "x-webhook-secret: ${{ secrets.WEBHOOK_SECRET }}" \
            -d '{"source": "github-actions"}'
```
Zapier / Make / n8n
Use a generic HTTP Request action:
- Method: POST
- URL: Your webhook endpoint
- Headers: `x-webhook-secret: whsec_...` and `Content-Type: application/json`
- Body: `{"source": "zapier"}` (optional, for your own tracking)
Extending the pipeline
Once the basic pipeline works, you can extend it:
- Add fields -- edit the schema to include `points`, `author`, `rank`, and `posted_time`. Re-run the playbook to generate a new extraction script.
- Add pagination -- update the website description: "Click the 'More' link at the bottom and extract stories from the first 3 pages." This gives you 90 stories per run.
- Add an iterator -- use an iterator node to process each extracted story individually (e.g., visit each URL and extract the article body).
- Add a condition -- filter stories by point threshold using a condition node before downstream processing.
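For example, a point-threshold condition behaves like a plain filter over the extracted items, sketched here in Python purely for illustration:

```python
# Plain-Python illustration of what a condition node filtering stories
# by a point threshold would do before downstream processing.
def filter_by_points(stories, min_points=100):
    return [s for s in stories if s.get("points", 0) >= min_points]

stories = [
    {"name": "Big launch", "points": 412},
    {"name": "Quiet post", "points": 37},
]
popular = filter_by_points(stories)  # keeps only "Big launch"
```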
Summary
| Step | What you did |
|---|---|
| 1 | Added Hacker News as a website |
| 2 | Defined a JSON schema for story data |
| 3 | Ran the AI agent to generate an extraction script |
| 4 | Saved the result as a reusable playbook |
| 5 | Created a workflow with a webhook trigger |
| 6 | Deployed and triggered the pipeline via HTTP POST |
| 7 | Verified results in the run history |
You now have a production-ready data pipeline that extracts structured data from Hacker News on demand. The same pattern works for any website -- swap the URL, schema, and playbook to build pipelines for product monitoring, competitor tracking, sports data, or any other use case.