Experiment: jobs-apply

Hypothesis

Gemini Flash vision can analyze ATS page screenshots to extract form structure and job details without DOM parsing

Result: confirmed
Key Findings

The vision approach worked for page understanding but was eventually replaced by DOM-based extraction for form filling. Vision remained useful as a fallback for analyzing non-standard ATS layouts. The OpenRouter provider with vision support became the standard AI integration pattern.

Changelog

Date        Summary
2026-04-07  Created during temporal gap audit
2026-02-26  Original experiment

Hypothesis

Gemini Flash vision can analyze screenshots of ATS (Applicant Tracking System) pages to extract form structure and job details, bypassing the need for DOM parsing of complex, diverse ATS implementations.

Method

Built an OpenRouter provider with vision support. The test-hunt.ts script ran the first end-to-end pipeline: discover jobs via LinkedIn, navigate to their ATS pages, take screenshots, feed each screenshot to Gemini Flash for page analysis, then AI-match the extracted job details against a candidate profile.
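A minimal sketch of the kind of vision request such a provider might build for OpenRouter's OpenAI-compatible chat completions endpoint. The model name, prompt text, and `buildVisionRequest` helper are illustrative assumptions, not taken from the actual implementation.

```typescript
// Message content parts in the OpenAI-compatible multimodal format.
type ChatContent =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: { role: "user"; content: ChatContent[] }[];
}

// Build a request asking a vision model to describe an ATS page screenshot.
// The screenshot is passed inline as a base64 data URL.
function buildVisionRequest(screenshotBase64: string): VisionRequest {
  return {
    model: "google/gemini-flash-1.5", // assumed model slug
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Extract the form fields and job details from this ATS page.",
          },
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${screenshotBase64}` },
          },
        ],
      },
    ],
  };
}
```

The resulting object would then be POSTed to the chat completions endpoint with the usual authorization header.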

Results

The vision approach worked for understanding page layouts and extracting job details. For interactive form filling, however, DOM-based extraction proved more reliable. Vision remained a fallback for non-standard layouts that resist DOM parsing.
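The DOM-first-with-vision-fallback pattern can be sketched as below; `extractFields`, `FormField`, and the empty-result trigger condition are assumptions for illustration, not the project's actual API.

```typescript
interface FormField {
  label: string;
  selector: string;
}

// Prefer DOM extraction; fall back to vision analysis only when the
// DOM pass yields nothing usable (e.g. a non-standard ATS layout).
async function extractFields(
  domExtract: () => Promise<FormField[]>,
  visionExtract: () => Promise<FormField[]>,
): Promise<{ fields: FormField[]; source: "dom" | "vision" }> {
  const fields = await domExtract();
  if (fields.length > 0) {
    return { fields, source: "dom" };
  }
  // DOM parsing found nothing: analyze a screenshot instead.
  return { fields: await visionExtract(), source: "vision" };
}
```

Tracking the `source` of each result also makes it easy to measure how often the fallback actually fires.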

What Carried Forward

The OpenRouter provider with multi-modal support (text + vision) became the standard AI integration pattern in jobs-apply. The test-hunt.ts pipeline structure (discover, navigate, analyze, match) became the engine's core loop.
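The discover, navigate, analyze, match loop can be outlined as follows; every stage function and type here (`huntLoop`, `Job`, `Analysis`, the numeric match score) is a hypothetical placeholder standing in for the real pipeline stages.

```typescript
interface Job {
  url: string;
}

interface Analysis {
  title: string;
  fields: string[];
}

// Core loop sketch: discover jobs, then for each one navigate to its
// page, analyze it, and score the match against a candidate profile.
async function huntLoop(
  discover: () => Promise<Job[]>,
  navigate: (job: Job) => Promise<string>, // returns a page/screenshot handle
  analyze: (page: string) => Promise<Analysis>,
  match: (analysis: Analysis) => Promise<number>, // match score vs. profile
): Promise<{ job: Job; score: number }[]> {
  const results: { job: Job; score: number }[] = [];
  for (const job of await discover()) {
    const page = await navigate(job);
    const analysis = await analyze(page);
    results.push({ job, score: await match(analysis) });
  }
  return results;
}
```

Injecting the stages as functions keeps the loop testable with mocks, independent of any particular ATS or AI provider.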