Tutorial: YC Daily Hacker News
Build an automated pipeline that extracts Hacker News front-page stories daily via webhook
This tutorial walks through the full Ev3ry workflow -- from creating a playbook that extracts Hacker News stories, to deploying an automated workflow triggered by a webhook. By the end, you will have a production-ready pipeline you can call from any external system.
What you will build
A daily data pipeline that:
- Navigates to https://news.ycombinator.com/
- Extracts all 30 front-page stories with title, URL, points, author, and comment count
- Runs automatically when triggered by a webhook POST request
- Returns structured JSON data for downstream processing
Step 1 -- Add the website
From the dashboard, click Add Website. Enter the site name, URL, and a brief description of the content.
The URL is the starting point the agent will navigate to during extraction. The description helps the agent understand what data to look for.
Step 2 -- Create the data schema
Go to the website detail page and create a new Data Template. This defines the shape of the data you want to extract.
HN Stories Schema
This schema tells the agent to extract an array of objects, each with a story name, link, and comment count. The required constraint ensures incomplete rows are excluded.
Start with fewer fields to keep the first run fast. You can add points, author, rank, and posted_time later by editing the schema and re-running.
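As a rough illustration, a minimal template for this step might look like the sketch below. The field names follow the description above; the exact schema format Ev3ry uses may differ, and `drop_incomplete` is a simplified stand-in for the platform's own validation of the required constraint:

```python
# Hypothetical sketch of the data template -- field names follow the
# tutorial text; the exact schema format Ev3ry uses may differ.
STORY_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "link": {"type": "string"},
            "comment_count": {"type": "integer"},
        },
        "required": ["name", "link"],
    },
}

def drop_incomplete(rows, schema=STORY_SCHEMA):
    """Mimic the required-field constraint: keep only rows that carry
    every required field (a simplified stand-in for full validation)."""
    required = schema["items"]["required"]
    return [r for r in rows if all(r.get(f) is not None for f in required)]

rows = [
    {"name": "Show HN: ...", "link": "https://example.com", "comment_count": 12},
    {"name": "A story with no link", "link": None, "comment_count": 3},
]
complete = drop_incomplete(rows)  # the second row is excluded
```

This is why the required constraint matters: a half-parsed row is dropped rather than passed downstream with missing fields.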
Step 3 -- Run the extraction
Click Run on the playbook page. The agent opens a browser, navigates to Hacker News, and writes an extraction script.
What happens under the hood
Hacker News uses a simple HTML table layout. The single-page agent:
- Loads the page and takes a DOM snapshot
- Identifies the repeating `<tr>` row structure in the stories table
- Checks for API endpoints or framework globals (`__NEXT_DATA__`, `window.__NUXT__`) -- Hacker News has none, so it falls back to DOM extraction
- Writes a JavaScript extraction script that maps each row to your schema fields
- Validates the script output against your schema (retries if the data shape is wrong)
- Submits the final script and extracted data
The entire process takes about 20-30 seconds. The result is 30 structured story objects.
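The validate-and-retry step can be sketched as a simple loop. The function names here are illustrative, not Ev3ry's actual API:

```python
# Illustrative sketch of the agent's validate-and-retry loop; the
# callable names (write_script, execute, validate) are hypothetical.
def run_with_retries(write_script, execute, validate, max_attempts=3):
    """Generate a script, run it, and retry with error feedback until
    the output matches the schema or attempts are exhausted."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        script = write_script(feedback)   # agent writes (or repairs) the script
        data = execute(script)            # run it against the page
        errors = validate(data)           # check the shape against the schema
        if not errors:
            return script, data
        feedback = errors                 # feed errors back into the next try
    raise RuntimeError(f"no valid script after {max_attempts} attempts")
```

Feeding the validation errors back into the next generation attempt is what lets the agent recover when its first script extracts the wrong shape.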
Step 4 -- Save as a playbook
After a successful run, the extraction is automatically saved as a playbook. A playbook stores:
- The extraction script (JavaScript that runs in the browser)
- The navigation actions (page URL, any clicks or scrolls)
- The data schema
Future runs skip the AI exploration and execute the saved script directly. This makes re-runs faster (under 10 seconds) and cheaper (no LLM calls).
Step 5 -- Create a workflow
Now wrap the playbook in a workflow to add automation.
- Go to the Workflows page and click New Workflow
- Add a Playbook node and select your Hacker News playbook
- Connect the trigger node to the playbook node
- Set the trigger type to Webhook
Webhook Trigger (`POST /api/webhooks/...`) -> Extract Data (Hacker News - 30 items)
Click Deploy to activate the webhook endpoint. The workflow status changes from draft to deployed, and the endpoint starts accepting requests.
Step 6 -- Trigger via webhook
Send a POST request to trigger the workflow. The `x-webhook-secret` header authenticates the request.
```bash
curl -X POST https://your-domain.com/api/webhooks/YOUR_ID \
  -H "Content-Type: application/json" \
  -H "x-webhook-secret: whsec_7d58..." \
  -d '{"source": "tutorial"}'
```

The workflow creates a new run, navigates to Hacker News, executes the saved extraction script, and stores the results. The entire execution takes about 10-30 seconds depending on browser startup time.
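If you would rather trigger the workflow from code than from curl, the same request can be built with Python's standard library. The URL and secret below are placeholders, exactly as in the curl example:

```python
import json
import urllib.request

def build_trigger_request(url, secret, payload=None):
    """Build the webhook POST request; pass the result to
    urllib.request.urlopen() to actually send it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload or {}).encode(),
        headers={
            "Content-Type": "application/json",
            "x-webhook-secret": secret,  # authenticates the request
        },
        method="POST",
    )

req = build_trigger_request(
    "https://your-domain.com/api/webhooks/YOUR_ID",  # placeholder URL
    "whsec_YOUR_SECRET",                             # placeholder secret
    {"source": "tutorial"},
)
# urllib.request.urlopen(req) sends it and returns the HTTP response.
```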
Step 7 -- View run history
Go to Workflows > YC Daily Hacker News > History to see all past runs. Click any node in the execution tree to inspect its output data.
Each run stores the full extracted dataset. You can compare outputs across runs to track how the Hacker News front page changes over time.
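A cross-run comparison takes only a few lines once you have two runs' datasets. This sketch assumes each extracted story carries a stable `link` field to key on:

```python
# Diff two runs' story lists to see what entered or left the front
# page (assumes each story object has a stable "link" field).
def diff_runs(previous, current):
    prev_links = {s["link"] for s in previous}
    curr_links = {s["link"] for s in current}
    return {
        "new": [s for s in current if s["link"] not in prev_links],
        "dropped": [s for s in previous if s["link"] not in curr_links],
    }

yesterday = [{"name": "A", "link": "https://a.example"},
             {"name": "B", "link": "https://b.example"}]
today = [{"name": "B", "link": "https://b.example"},
         {"name": "C", "link": "https://c.example"}]
changes = diff_runs(yesterday, today)
# changes["new"] contains story C; changes["dropped"] contains story A
```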
Automating with an external scheduler
Since the workflow is webhook-triggered, you can automate it from any system:
Cron job (Linux/macOS)
Add to your crontab (crontab -e) to run daily at 8 AM:
```bash
# Note: a crontab entry must be a single line -- cron does not support
# backslash line continuations.
0 8 * * * curl -s -X POST https://your-domain.com/api/webhooks/YOUR_WORKFLOW_ID -H "Content-Type: application/json" -H "x-webhook-secret: whsec_YOUR_SECRET" -d '{"source": "crontab"}'
```
GitHub Actions
```yaml
name: Daily HN Extract
on:
  schedule:
    - cron: '0 8 * * *'
jobs:
  trigger:
    runs-on: ubuntu-latest
    steps:
      - run: |
          curl -X POST ${{ secrets.WEBHOOK_URL }} \
            -H "Content-Type: application/json" \
            -H "x-webhook-secret: ${{ secrets.WEBHOOK_SECRET }}" \
            -d '{"source": "github-actions"}'
```
Zapier / Make / n8n
Use a generic HTTP Request action:
- Method: POST
- URL: Your webhook endpoint
- Headers: `x-webhook-secret: whsec_...` and `Content-Type: application/json`
- Body: `{"source": "zapier"}` (optional, for your own tracking)
Extending the pipeline
Once the basic pipeline works, you can extend it:
- Add fields -- edit the schema to include `points`, `author`, `rank`, and `posted_time`. Re-run the playbook to generate a new extraction script.
- Add pagination -- update the website description: "Click the 'More' link at the bottom and extract stories from the first 3 pages." This gives you 90 stories per run.
- Add an iterator -- use an iterator node to process each extracted story individually (e.g., visit each URL and extract the article body).
- Add a condition -- filter stories by point threshold using a condition node before downstream processing.
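For example, a point-threshold condition behaves like a plain filter over the extracted items, sketched here in Python purely for illustration:

```python
# Plain-Python illustration of what a condition node filtering stories
# by a point threshold would do before downstream processing.
def filter_by_points(stories, min_points=100):
    return [s for s in stories if s.get("points", 0) >= min_points]

stories = [
    {"name": "Big launch", "points": 412},
    {"name": "Quiet post", "points": 37},
]
popular = filter_by_points(stories)  # keeps only "Big launch"
```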
Summary
| Step | What you did |
|---|---|
| 1 | Added Hacker News as a website |
| 2 | Defined a JSON schema for story data |
| 3 | Ran the AI agent to generate an extraction script |
| 4 | Saved the result as a reusable playbook |
| 5 | Created a workflow with a webhook trigger |
| 6 | Deployed and triggered the pipeline via HTTP POST |
| 7 | Verified results in the run history |
You now have a production-ready data pipeline that extracts structured data from Hacker News on demand. The same pattern works for any website -- swap the URL, schema, and playbook to build pipelines for product monitoring, competitor tracking, sports data, or any other use case.