Gemini Flash vision can analyze ATS page screenshots to extract form structure and job details without DOM parsing
The vision approach worked for page understanding but was eventually replaced by DOM-based extraction for form filling. Vision remained useful as a fallback for analyzing non-standard ATS layouts. The OpenRouter provider with vision support became the standard AI integration pattern.

Changelog
| Date | Summary |
|---|---|
| 2026-04-07 | Created during temporal gap audit |
| 2026-02-26 | Original experiment |
Hypothesis
Gemini Flash vision can analyze screenshots of ATS (Applicant Tracking System) pages to extract form structure and job details, bypassing the need for DOM parsing of complex, diverse ATS implementations.
Method
Built an OpenRouter provider with vision support. The test-hunt.ts script ran the first end-to-end pipeline: discover jobs via LinkedIn, navigate to the ATS pages, take screenshots, feed them to Gemini Flash for page analysis, and then AI-match the extracted job details against a candidate profile.
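A minimal sketch of the screenshot-analysis call, assuming OpenRouter's OpenAI-compatible chat completions endpoint, where an image is sent as an `image_url` content part holding a base64 data URL. The model id `google/gemini-2.0-flash-001`, the prompt text, and the helper names `buildPageAnalysisRequest` and `analyzePage` are illustrative assumptions, not the code in test-hunt.ts. It assumes Node 18+ (global `fetch` and `Buffer`).

```typescript
// Hypothetical sketch: send an ATS page screenshot to a Gemini Flash model via OpenRouter.
// The endpoint and message shape follow OpenRouter's OpenAI-compatible API;
// the model id and prompt are assumptions.

const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";
const VISION_MODEL = "google/gemini-2.0-flash-001"; // assumed model id

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
}

// Encode a PNG screenshot as a data URL and pair it with an analysis prompt.
function buildPageAnalysisRequest(screenshotPng: Uint8Array, prompt: string): ChatRequest {
  const dataUrl = `data:image/png;base64,${Buffer.from(screenshotPng).toString("base64")}`;
  return {
    model: VISION_MODEL,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}

// POST the request and return the model's text reply.
// apiKey is an OpenRouter API key (e.g. read from an environment variable).
async function analyzePage(screenshotPng: Uint8Array, apiKey: string): Promise<string> {
  const body = buildPageAnalysisRequest(
    screenshotPng,
    "Return the job title, company, and every form field on this ATS page as JSON.",
  );
  const res = await fetch(OPENROUTER_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`OpenRouter request failed: ${res.status}`);
  const json = (await res.json()) as { choices: { message: { content: string } }[] };
  return json.choices[0].message.content;
}
```

Keeping request construction in a pure function makes the message format testable without network access; only `analyzePage` touches the API.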
Results
The vision approach worked for understanding page layouts and extracting job details. For interactive form filling, however, DOM-based extraction proved more reliable. Vision was kept as a fallback for non-standard layouts that resist DOM parsing.
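The "DOM first, vision as fallback" arrangement can be sketched as a small wrapper. The `FormField` shape, the extractor signatures, and the rule "fall back when the DOM pass returns nothing or throws" are assumptions for illustration; the source does not specify how the fallback is triggered.

```typescript
// Hypothetical sketch: prefer DOM extraction, fall back to screenshot analysis.

interface FormField {
  label: string;
  kind: "text" | "select" | "file" | "checkbox" | "unknown";
}

type Extractor = () => Promise<FormField[]>;

interface ExtractionResult {
  fields: FormField[];
  source: "dom" | "vision";
}

// DOM extraction maps directly onto elements that can be filled, so try it first.
// Only when it finds no fields (or fails outright) do we pay for a vision call,
// which covers non-standard layouts that resist DOM parsing.
async function extractFormFields(fromDom: Extractor, fromVision: Extractor): Promise<ExtractionResult> {
  try {
    const domFields = await fromDom();
    if (domFields.length > 0) return { fields: domFields, source: "dom" };
  } catch {
    // Ignore the DOM error and fall through to vision.
  }
  return { fields: await fromVision(), source: "vision" };
}
```

Returning the `source` alongside the fields lets a caller log how often the fallback fires, which is one way to check whether vision is still earning its place.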
What Carried Forward
The OpenRouter provider with multimodal support (text + vision) became the standard AI integration pattern in jobs-apply. The test-hunt.ts pipeline structure (discover, navigate, analyze, match) became the engine’s core loop.
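The four-stage loop can be sketched with each stage injected as a function, so the same loop runs against real services or stubs. The type names, the `HuntStages` interface, the 0..1 score scale, and the `minScore` default of 0.7 are hypothetical; they are not taken from test-hunt.ts or the jobs-apply engine.

```typescript
// Hypothetical sketch of the discover -> navigate -> analyze -> match loop.

interface JobLead { title: string; atsUrl: string }
interface PageAnalysis { jobTitle: string; description: string }
interface MatchResult { lead: JobLead; score: number }

interface HuntStages {
  discover: () => Promise<JobLead[]>;                         // e.g. a LinkedIn search
  navigate: (lead: JobLead) => Promise<Uint8Array>;           // open the ATS page, return a screenshot
  analyze: (screenshot: Uint8Array) => Promise<PageAnalysis>; // vision model call
  match: (analysis: PageAnalysis) => Promise<number>;         // match score vs. candidate profile, 0..1
}

async function runHunt(stages: HuntStages, minScore = 0.7): Promise<MatchResult[]> {
  const results: MatchResult[] = [];
  for (const lead of await stages.discover()) {
    // Process leads one at a time; a failure on one page should not abort the run.
    try {
      const screenshot = await stages.navigate(lead);
      const analysis = await stages.analyze(screenshot);
      const score = await stages.match(analysis);
      if (score >= minScore) results.push({ lead, score });
    } catch (err) {
      console.warn(`Skipping ${lead.atsUrl}:`, err);
    }
  }
  // Best matches first.
  return results.sort((a, b) => b.score - a.score);
}
```

Injecting the stages also makes it straightforward to swap the `analyze` step between vision and DOM-based extraction without touching the loop itself.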