JD-Pipeline: Job Search Automation Pipeline

Find, Evaluate, Tailor - End to End in One Evening

Job search automation, end to end: find roles, score fit, generate tailored documents – built in one evening with Python and the Claude API

JD-Pipeline automates the most repetitive part of a job search: finding roles, evaluating fit, and generating tailored application documents. Built end-to-end at the Claude for Everyone AI Build Night in Toronto, June 2026.

Challenge

Why Job Search Automation Matters: The Manual Loop is Broken

Most job searches follow the same exhausting loop: find a role, read the description, decide if it fits, tailor a resume, write a cover letter, repeat. For a job seeker targeting 5-7 role types across multiple job boards, this process can consume hours per week with no guarantee of quality. The core problem is not effort – it is a lack of structure. Without a systematic way to find, evaluate, and tailor, time gets wasted on poor-fit roles and strong-fit roles get generic applications.

No fit scoring

Every role looks like a maybe until you read the full description

No memory

The same reposted role gets evaluated twice with no record of prior scoring

No tailoring at scale

Cover letters and resume bullets get recycled rather than targeted

STRATEGIC OBJECTIVES

Product Strategy: A Three-Stage Automation Pipeline

I scoped jd-pipeline as a minimum viable automation system with three distinct stages, built around three principles: a single responsibility per stage, zero redundant API calls, and human review before any document is generated. The goal was to eliminate manual effort at every stage while keeping the human in control of the final decision.

Find – Apify scrapes LinkedIn and Indeed for roles matching 6 search terms across Toronto GTA and Canada remote. 50 results per run, deduplicated by URL across both sources. Scoring memory tracks roles seen across previous runs so no role is evaluated twice unnecessarily.
Evaluate – The Claude API reads each role against a master resume and outputs a fit score (0-100), a verdict (Skip / Maybe / Strong), and a 2-line reason. Only Strong fits proceed to Stage 3. Roles already scored in prior runs are skipped entirely – zero API cost for repeat scrapes.
Tailor – Claude generates tailored resume bullets and a cover letter draft for each Strong fit. python-docx produces properly formatted .docx files on demand via a single terminal command (bash apply.sh "Company" "Role"). All results are written to a Notion database as a structured tracker with full status management.
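The hand-offs between the three stages can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual code: the function names, the role dict shape, the URL normalisation, and the assumption that the model replies with a JSON object containing score, verdict, and reason are all hypothetical.

```python
import json
from dataclasses import dataclass

VERDICTS = ("Skip", "Maybe", "Strong")


@dataclass
class FitResult:
    score: int    # 0-100
    verdict: str  # one of VERDICTS
    reason: str   # short justification from the model


def dedupe_by_url(*sources):
    """Merge role lists from several scrapers, keeping the first role per URL."""
    seen = set()
    merged = []
    for roles in sources:
        for role in roles:
            # Assumed normalisation: trim whitespace and a trailing slash.
            key = role["url"].strip().rstrip("/")
            if key not in seen:
                seen.add(key)
                merged.append(role)
    return merged


def parse_fit(raw):
    """Validate the model's reply, assumed here to be a JSON object."""
    data = json.loads(raw)
    score = int(data["score"])
    verdict = data["verdict"]
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return FitResult(score, verdict, str(data["reason"]))


def strong_fits(scored):
    """Keep only the (role, FitResult) pairs that should reach the Tailor stage."""
    return [(role, fit) for role, fit in scored if fit.verdict == "Strong"]
```

Validating the reply before it reaches Notion means a malformed score fails at the Evaluate stage rather than surfacing later as a bad tracker row.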

Action & Results

My Approach: PM-Led Architecture with Deliberate Cost Control

I built jd-pipeline as a personal job search automation tool in a single evening using Python, the Anthropic SDK, Apify, and the Notion API. Every architecture decision was driven by a PM instinct: scope the minimum viable system, ship it, then extend deliberately.

Decoupled document generation – the first version generated .docx files automatically for every Strong fit. I identified this as wasteful: most Strong fits still need human review before documents are worth generating. Decoupling generation from the main pipeline reduced per-run API cost by approximately 80%.
Duplicate URL detection – get_existing_urls() queries the Notion database before the scoring loop begins. Any URL already scored is skipped with zero API calls. Scoring memory then updates Last Seen and Previous Score so reposted roles are tracked over time without redundant cost.
Security from day one – .env file isolation, .gitignore enforced before any other file was created, startup key validation that fails loudly if any credential is missing. No keys ever appear in code or chat.
On-demand tailoring – generate_docs.py and apply.sh allow document generation for any specific role by company name or Notion row ID. The human reviews Notion first, decides the role is worth pursuing, then generates documents. This keeps quality high and cost low.
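Startup key validation of the kind described above might look like the following sketch. The environment variable names are assumptions chosen for illustration, not the project's actual names; in a real script, python-dotenv's load_dotenv() would be called first so that values from .env are present in os.environ.

```python
import os

# Hypothetical credential names; adjust to whatever the .env file defines.
REQUIRED_KEYS = (
    "ANTHROPIC_API_KEY",
    "APIFY_API_TOKEN",
    "NOTION_API_KEY",
    "NOTION_DATABASE_ID",
)


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names of required credentials that are absent or blank."""
    return [name for name in required if not env.get(name, "").strip()]


def validate_keys(env=None):
    """Stop the run immediately if any credential is missing.

    Only the names of missing keys are reported, never their values.
    """
    env = os.environ if env is None else env
    missing = missing_keys(env)
    if missing:
        raise SystemExit("Missing required credentials: " + ", ".join(missing))
```

Calling validate_keys() as the first line of the entry point means a misconfigured machine stops before any scraping or API spend occurs.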

Results after first run:

50 roles scored across LinkedIn and Indeed in under 4 minutes
Strong fits written to Notion with resume bullets and cover letter drafts
Cost per full run: approximately $0.50

Job Discovery

Apify (LinkedIn actor + Indeed actor) · 6 search terms · GTA + Canada remote · 50 results/run · URL dedup across sources

AI Layer

Anthropic Python SDK · claude-sonnet-4-6 · Fit scoring (0-100) · Resume bullet generation · Cover letter drafting · On-demand via apply.sh

Output & Tracking

Notion API · Role Finder database · Status management · Scoring memory · python-docx · .docx resume + cover letter per role

Security & Runtime

Python 3.14 · venv · python-dotenv · .gitignore · Startup key validation · Mac Mini M4

The Spotlight Feature

The Scoring Memory System: The Engine Behind Efficient Job Search Automation

The Problem

Job boards repost the same roles weekly. Without memory, a pipeline re-evaluates the same role on every run, consuming API credits and cluttering the tracker with duplicate rows.

The Execution

get_existing_urls() fetches all JD URLs from the Notion database before scoring begins. The result is a dict mapping each URL to its Notion row ID, current score, and last seen date. When the scraper finds a URL already in this dict, the pipeline updates Last Seen to today's date and logs the previous score – then skips scoring entirely. A --rescore flag overrides this for weekly refreshes, when you want to check whether fit scores have changed as roles get updated.
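The skip logic can be sketched without touching the Notion API. In this hedged illustration, build_url_index() stands in for the dict that get_existing_urls() returns, taking already-simplified rows rather than raw Notion pages; plan_run(), the row field names, and the choice to still record Last Seen when rescoring are assumptions.

```python
from datetime import date


def build_url_index(rows):
    """Map each known URL to its row ID, current score, and last seen date."""
    return {
        row["url"]: {
            "row_id": row["id"],
            "score": row["score"],
            "last_seen": row["last_seen"],
        }
        for row in rows
        if row.get("url")
    }


def plan_run(scraped, index, today, rescore=False):
    """Split scraped roles into those needing a scoring call and tracker updates.

    Known URLs always get a Last Seen / Previous Score update. They are sent
    for scoring again only when rescore is True (the --rescore case).
    """
    to_score = []
    updates = []
    for role in scraped:
        known = index.get(role["url"])
        if known is not None:
            updates.append({
                "row_id": known["row_id"],
                "last_seen": today.isoformat(),
                "previous_score": known["score"],
            })
        if known is None or rescore:
            to_score.append(role)
    return to_score, updates


# Example: one role already in the tracker, one new role.
index = build_url_index([
    {"id": "row-1", "url": "https://example.com/jobs/1", "score": 82, "last_seen": "2026-06-01"},
])
scraped = [{"url": "https://example.com/jobs/1"}, {"url": "https://example.com/jobs/2"}]
to_score, updates = plan_run(scraped, index, date(2026, 6, 8))
# Only the new role needs a scoring call; the known one gets a date update.
```

Separating the plan from the API calls keeps the deduplication decision easy to test, since no network access is needed to check which roles would be scored.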

The Win

A pipeline that runs daily spends about the same on scoring as one that runs once over the same set of roles, provided the deduplication layer is correctly designed, because only roles not seen before reach the Claude API. This is the same principle applied in production data pipelines: idempotent operations prevent waste without sacrificing completeness.
