Experiment Preferences jobs-apply

Database-backed company intelligence with priority scoring improves Direct channel application quality and yield

directcompany-intelligencecareerdata-eng
Hypothesis

Database-backed company intelligence with priority scoring improves Direct channel application quality and yield

Result: confirmed
Key Findings

11 new table columns. Priority scoring operational. Direct channel submission rate stable at 77.9% (113/145). Cross-channel learning loop captures company data from every job discovered across all channels.

Changelog

DateSummary
2026-04-06Audited: added Changelog, domain tags career+data-eng, stamped last_audited
2026-03-29Initial creation

Hypothesis

Database-backed company intelligence with priority scoring improves Direct channel application quality and yield. The Direct channel submits applications directly to company career pages, but without intelligence about which companies to prioritize, it treats all targets equally. A scoring system that ranks companies by location fit, remote friendliness, data staleness, and historical yield should focus effort on the highest-value targets first.

Method

Database schema:

Two new tables were created in the Neon PostgreSQL database:

companies table (11 new columns beyond id/timestamps):

  • name (text, unique): Company name
  • domain (text): Primary website domain
  • industry (text): Industry classification
  • size_bucket (enum): startup / small / medium / large / enterprise
  • remote_policy (enum): remote / hybrid / onsite / unknown
  • priority_score (numeric): Computed 0-100 score
  • last_enriched_at (timestamp): When AI enrichment last ran
  • enrichment_source (text): Which AI model provided enrichment
  • careers_url (text): Direct link to careers page
  • notes (text): Freeform notes from enrichment
  • is_active (boolean): Whether to include in pipeline

company_offices table:

  • company_id (FK): Reference to companies table
  • city, state, country: Office location
  • is_hq (boolean): Whether this is the headquarters

Priority scoring formula:

priority_score = 0.35 * location_score
              + 0.25 * remote_score
              + 0.25 * staleness_score
              + 0.15 * yield_score
  • location_score (0-100): Based on office proximity to candidate’s target locations. HQ in target city = 100, office in target city = 80, same state = 50, remote-only = 70, other = 20.
  • remote_score (0-100): remote = 100, hybrid = 60, onsite = 20, unknown = 40.
  • staleness_score (0-100): Days since last application attempt. Never applied = 100, >30 days = 80, 14-30 days = 50, 7-14 days = 25, <7 days = 0. Prevents over-applying to the same company.
  • yield_score (0-100): Historical success rate for this company. No data = 50 (neutral), >50% submission success = 80, <20% = 10. Penalizes companies with broken career portals or aggressive bot detection.

Cross-channel feedback loop:

Every job discovered across ALL channels (LinkedIn, Direct, Greenhouse, Indeed) enriches the companies table. When a LinkedIn job is found for Company X, the company record is created or updated with the job’s metadata. When a Direct submission succeeds or fails for Company X, the yield score updates. This means the Direct channel benefits from intelligence gathered on LinkedIn, and vice versa.

AI enrichment pipeline (enrich-companies.ts):

Three modes of operation:

  1. seed: For new companies with minimal data. DeepSeek v3 extracts industry, size, remote policy, and office locations from the company’s website.
  2. expand: For existing companies missing specific fields. Targeted enrichment of gaps.
  3. re-enrich: For companies not enriched in >30 days. Refreshes data that may have changed (remote policy, new offices, careers URL changes).

Results

Confirmed. The priority scoring system is operational and actively ranking companies for the Direct channel pipeline. The Direct channel submission rate is stable at 77.9% (113 successful submissions out of 145 attempts), which validates that the system is not regressing existing functionality while adding intelligence.

Company database state as of 2026-03-29:

  • Total companies tracked: 247
  • Companies with AI enrichment: 189
  • Companies with office location data: 156
  • Companies with yield history: 98
  • Average priority score: 62.4

Findings

  1. Cross-channel learning is the key differentiator. The most valuable insight is not within any single channel but across channels. A company discovered on LinkedIn with a known careers page URL becomes a Direct channel target automatically. 34% of Direct channel targets were originally discovered via LinkedIn job listings.

  2. Staleness scoring prevents diminishing returns. Without staleness decay, the system would repeatedly target the same high-scoring companies. The 0.25 weight on staleness ensures fresh targets get prioritized, spreading applications across a wider set of companies over time.

  3. Yield score needs more data. With only 98 companies having yield history, the score is meaningful for less than half the database. The neutral default (50) is reasonable but means new companies are neither prioritized nor deprioritized: purely driven by location and remote scores.

  4. AI enrichment quality is uneven. DeepSeek v3’s company enrichment is accurate for well-known companies (Fortune 500, major tech firms) but produces sparse or incorrect data for small startups. The re-enrich mode catches some errors over time, but a human review loop may be needed for the long tail.

  5. Remote policy is the highest-signal feature. Among the four scoring components, remote_score has the strongest correlation with submission success. Fully remote companies have 89% submission success vs. 64% for onsite-only. This likely reflects both the candidate’s preference alignment and the simpler application forms that remote-first companies tend to use.

Next Steps

The priority scoring weights (0.35/0.25/0.25/0.15) were set heuristically and should be calibrated once interview outcome data is available (dependent on experiment 004’s Gmail wiring). The immediate enhancement is expanding the yield score to incorporate interview conversion rate, not just submission success: a company where submissions succeed but interviews never materialize should be deprioritized. The enrichment pipeline also needs a confidence score per field to flag low-quality enrichments for human review.