
How to evaluate an AI sourcing vendor for the Japan market

Most procurement teams compare AI sourcing vendors on the wrong axes — model size, demo polish, US-market case studies. The axes that actually predict whether a tool will work in Japan are different and sometimes uncomfortable to ask about. This guide walks through the eight questions a buyer should put to any vendor before signing, with the answer profile that distinguishes a Japan-capable platform from one that will not survive contact with the market.

The short answer

Eight questions distinguish a Japan-capable AI sourcing vendor from one that will fail in the Japan mid-career market: regulatory footprint (第4号 filing status), data sourcing posture, bilingual register at the model layer, partial-profile handling, validation against placement outcomes, named operator accountability, scout-message production semantics, and pricing tied to qualified meetings rather than seats. Most US-built platforms fail four to six of these. The framework below is the one we use on ourselves; it is also the one we would apply to any competitor.

Why the standard procurement framework fails here

Most enterprise procurement teams comparing AI sourcing vendors run a standard playbook: feature matrix, demo cadence, reference calls, security review. That playbook is calibrated for tools that work the same way in every market. AI sourcing in Japan does not. The differences are not surface-level localization — Japanese UI, Japanese support hours — they are model-layer and operating-layer. A platform that handles English-language profiles competently will produce embarrassing scout mails for Japanese-language candidates. A platform that scores US software engineering well will rank Japan mid-career bilingual candidates poorly because the model has not been trained on the signals those candidates carry. A platform that runs comfortably under US data-protection norms can quietly put a Japanese employer in violation of the amended 職業安定法.

The standard framework misses these because they are not features the vendor advertises. They are operating constraints the vendor either has internalized through years of running in this market or has not. The eight questions below are designed to surface that distinction in a procurement conversation that takes one hour rather than one quarter.

Question 1 — What's your filing status under 改正職業安定法?

Since the 2022 amendment to the Employment Security Act, any operator that provides candidate information to Japanese employers must file as a 特定募集情報等提供事業者. There are sub-categories. A vendor that scores or matches candidates — that is, processes data beyond simple delivery — falls under 第4号. The MHLW registry currently lists six entities in this category among 1,642 total filings. Headhunt.AI is one of them. Most foreign AI sourcing platforms are not filed at all, and a smaller number are filed under categories that don’t cover what their AI actually does.

The procurement implication is direct. Article 30 of the Employment Security Act puts a confirmation duty on the buyer — the employer using the service must verify that the provider is appropriately filed. If your vendor isn’t filed, or is filed under the wrong category, the regulatory exposure sits with you. Ask for the 受理番号 and check it against the MHLW registry. A vendor that needs to look this up rather than answering immediately is a vendor that hasn’t internalized this part of the market.
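
If you want the registry check to be mechanical rather than a one-off lookup, a few lines of code suffice. The sketch below assumes the MHLW list has been exported to a local CSV; the file name and the column headers (受理番号, 区分) are illustrative assumptions, not the actual export format, and should be matched to whatever the registry download provides.

```python
import csv

def verify_filing(receipt_no: str, registry_csv: str = "mhlw_registry.csv") -> bool:
    """Check a vendor's 受理番号 against a local export of the MHLW
    特定募集情報等提供事業者 registry. File name and column names
    ("受理番号", "区分") are assumptions about the export format."""
    with open(registry_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("受理番号", "").strip() == receipt_no:
                category = row.get("区分", "")
                # A vendor that scores or matches candidates should
                # appear under the 第4号 category specifically.
                return "第4号" in category
    return False  # not found in the registry: treat as a procurement blocker
```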

Question 2 — Where does your underlying profile data come from?

Vendors will often answer this with phrases like "proprietary database" or "independent data layer." Push back. The honest answer for any AI sourcing platform serving the Japan market is some combination of public LinkedIn data, public web sources, and customer ATS data they’re allowed to use. Headhunt.AI’s underlying profile data is built primarily from public LinkedIn data through commercial licensing, with additional public social signals from X (formerly Twitter), GitHub, Facebook, and Instagram layered in where candidates have visible activity on those surfaces. The result is a 4M+ Japan-focused profile universe. The matching and scoring AI is proprietary; the underlying profile data is not. Saying that out loud reduces a vendor’s pitch surface area, which is why many vendors don’t.

Why this matters for the buyer: the source determines the data freshness profile, the regulatory exposure profile, and what happens during a LinkedIn enforcement action. Vendors with less direct sourcing arrangements have been caught in the 2025–2026 enforcement cycle (the Proxycurl injunction, the Apollo and Seamless deplatforming events, the ProAPIs filing). A buyer should know which side of those events their vendor sits on.

Question 3 — Show me a scout mail your platform produced for a Japan-bilingual candidate, with no human edits

This is the question that ends most demos. Ask for an actual unedited platform output addressed to a Japan-based bilingual mid-career candidate, in the candidate’s preferred language register, with the JD-derived hooks the AI selected. Not a demo script. Not a marketing example. A real artifact.

What you’re checking: keigo register correctness, the use of 拝啓 / 敬具 if the message uses a formal opening, paragraph break density (Japanese business email runs longer between breaks than English), and whether the JD-to-candidate hook actually reads as a hook in Japanese rather than as a translation of an English structure. In our 2026 production cohort across 123,675 contacted candidates, 3.13% replied without any human review of the AI-drafted bilingual scout mails. Sustaining a reply rate at that scale on unedited output is the test we’d want any vendor to pass before we’d recommend them.

Question 4 — How does your scoring handle a partial profile?

Most candidate profiles in the Japan market are partial — a name, a current employer, a position title, sometimes a graduation year. The full North American-style profile with eight career stops, seventeen skills tags, and a portfolio link is rare here. A vendor whose scoring degrades sharply on partial profiles will rank exactly the wrong candidates highly: the over-credentialed, over-displayed candidates whose profiles look complete because they’ve been job-hunting publicly. The candidates who would be the best hires are often the ones whose profiles are the sparsest.

The right answer from a vendor is specific: which signals are weighted when the public profile is sparse, how the model handles missing-data fields, and what its precision/recall trade looks like at the partial-profile end of the distribution. In our 2026 cohort, roughly 30% of qualified meetings came from candidates Boolean wouldn’t rank in the top 50 of a keyword search. A vendor that can’t describe its partial-profile mechanics is a vendor that quietly hands you the wrong shortlist.
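
To make the mechanics concrete, here is one common pattern for missing-data handling, shown as a minimal sketch. This is illustrative only, not Headhunt.AI’s model: it scores only the fields a profile actually exposes and renormalizes the weights, so sparseness itself carries no penalty. The field names, weights, and per-field signal values (each pre-scaled to 0..1) are assumptions.

```python
from typing import Optional

# Illustrative weights over public-profile signals; not production values.
WEIGHTS = {"employer": 0.40, "title": 0.35, "tenure": 0.15, "skills": 0.10}

def score_partial(profile: dict[str, Optional[float]]) -> float:
    """Score only the signals present and renormalize the weights,
    so a sparse profile is not penalized for fields it never exposed."""
    present = {k: v for k, v in profile.items() if v is not None and k in WEIGHTS}
    if not present:
        return 0.0
    total_weight = sum(WEIGHTS[k] for k in present)
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight

# A sparse but strong profile is not outranked by a padded, mediocre one.
sparse = {"employer": 0.9, "title": 0.85, "tenure": None, "skills": None}
padded = {"employer": 0.5, "title": 0.5, "tenure": 0.5, "skills": 0.5}
assert score_partial(sparse) > score_partial(padded)
```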

Question 5 — How do you validate scoring against actual placement outcomes?

Many AI sourcing vendors validate against "engagement" — opens, replies, profile views. These are upstream signals that correlate weakly with the only outcome that matters: did the candidate get hired and stay. The honest validation runs scoring back-tests against placement data over a multi-month window and reports precision and recall at the placement layer, not the engagement layer.

Ask the vendor for their back-test methodology, sample size, and time window. Ask whether they have placement data, either their own or a customer’s that they’re permitted to use, to back-test against. A vendor who can describe a placement-back-tested scoring validation has done work most have not. Ours is documented on our methodology page and validated against a published 25-month sample of 3,852 resumes producing 74 placements, drawn from our production data; the firm’s complete placement record is not disclosed.
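
For readers who want to see what a placement-layer back-test actually computes, the sketch below is a minimal version: given historical scores and eventual placement outcomes, it reports precision and recall at a shortlisting threshold. The record fields and the threshold value are assumptions for illustration, not our production methodology.

```python
def placement_backtest(records: list[dict], threshold: float = 0.7) -> dict:
    """records: dicts with a model 'score' (0..1) and a boolean 'placed'
    outcome. Returns precision/recall at the placement layer, not the
    engagement layer."""
    shortlisted = [r for r in records if r["score"] >= threshold]
    placed = [r for r in records if r["placed"]]
    hits = [r for r in shortlisted if r["placed"]]
    return {
        "n": len(records),
        "precision": len(hits) / len(shortlisted) if shortlisted else 0.0,
        "recall": len(hits) / len(placed) if placed else 0.0,
    }
```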

Question 6 — Who's accountable when an output goes wrong?

An AI scout mail can go wrong in ways that range from awkward (wrong honorific) to expensive (a regulator complaint, a candidate publishing the message on LinkedIn) to existential (an employer named in a 個人情報 mishandling case). The procurement question is who is named, by name, with a title and email, as the accountable operator. "The team" is not an answer. "Our customer success organization" is not an answer. A specific human, reachable at a specific email, is the answer. If the vendor cannot or will not name one, the platform is being run without an accountable operator.

Question 7 — How is your pricing model anchored?

Seat-based pricing dominates the AI sourcing category because it’s familiar and easy to procure. It also creates a structural incentive for the vendor to expand seat counts that don’t correlate with the buyer’s outcome. A pricing model anchored to qualified candidate matches — what we use — aligns the vendor’s revenue with the buyer’s outcome (more matched candidates means more meetings means more placements). Buyers should at minimum understand which model their vendor runs and what the unit-economic implications are at the buyer’s actual usage profile.
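
The unit-economic comparison is simple enough to run yourself. The sketch below computes annual cost per qualified meeting under both anchors; every number is a placeholder to be replaced with the quotes you actually receive, and nothing here reflects real vendor pricing.

```python
def annual_cost_per_meeting(seats, seat_price_mo, matches, match_price, meetings):
    """Annual cost per qualified meeting under seat-based vs per-match
    pricing. All inputs are placeholders for your own vendor quotes."""
    return {
        "seat_model": seats * seat_price_mo * 12 / meetings,
        "match_model": matches * match_price / meetings,
    }

# Hypothetical usage profile: 5 seats vs 300 qualified matches, 40 meetings/yr.
print(annual_cost_per_meeting(seats=5, seat_price_mo=1200,
                              matches=300, match_price=80, meetings=40))
# {'seat_model': 1800.0, 'match_model': 600.0}
```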

Question 8 — What's your honest list of where this won't work?

A vendor that can name the cases where their tool genuinely won’t work has thought about its limits. A vendor that says it works in every case has not. Our honest answer: AI sourcing of this kind is weakest where the candidate has no public footprint of any kind — extremely senior individuals who keep no LinkedIn presence, candidates in highly stigmatized adjacent industries who suppress their professional history, and per-candidate long-cycle workflows where a recruiter is courting a single specific named target over weeks rather than evaluating candidates from a list. Three limits, named honestly. A vendor whose limits list is empty either hasn’t run the system long enough to find them, or has and is choosing not to tell you.

How to run the conversation

Eight questions, one hour. Bring them in writing. Ask for written follow-ups on the ones the vendor can’t answer in the room. Score each answer on a 0–2 scale where 0 is no answer or hand-waving, 1 is partial but credible, and 2 is specific with cited evidence. A Japan-capable vendor scores 14+ across the eight. A US-built platform that hasn’t internalized this market typically scores 6–10. Anything below 8 is a procurement risk that no enterprise security review will catch.
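
The rubric is simple enough to keep as a shared artifact across your shortlist. A minimal version, with shorthand labels standing in for the eight questions above:

```python
QUESTIONS = ["filing", "data_sourcing", "bilingual_register", "partial_profiles",
             "placement_validation", "named_operator", "scout_semantics", "pricing"]

def grade(scores: dict) -> str:
    """scores: question label -> 0 (hand-waving), 1 (partial but credible),
    2 (specific with cited evidence)."""
    assert set(scores) == set(QUESTIONS)
    assert all(s in (0, 1, 2) for s in scores.values())
    total = sum(scores.values())
    if total >= 14:
        return f"{total}/16: Japan-capable"
    if total >= 8:
        return f"{total}/16: partial; demand written follow-ups"
    return f"{total}/16: procurement risk"
```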

We use this framework on ourselves quarterly. The questions get harder over time, not easier. Anyone running an AI sourcing platform in Japan should expect them.

Frequently asked

Can I trust a vendor that says "we comply with APPI" without specifics?

No. APPI compliance for AI sourcing is not a one-line claim — it requires identifying the legal basis for processing, documenting cross-border transfer arrangements with foreign processors, naming a 個人情報取扱事業者 contact, and showing how the buyer’s confirmation duty under Article 30 of the amended 職業安定法 is supported. A vendor whose answer is "we comply with APPI" without those specifics has not done the work. See our APPI compliance cornerstone for the seven-question audit.

What if the vendor I want isn't filed under 第4号?

Treat it as a procurement blocker, not a footnote. The Article 30 confirmation duty puts the regulatory exposure on the buyer when the provider is misfiled. Either the vendor needs to file (it’s a public process; the registry receives new entries monthly) or the buyer needs to look at a vendor that is filed. The 第4号 category is small specifically because most platforms haven’t done this work — if the vendor is committed to the Japan market, they will.

Is this framework just to favor Headhunt.AI?

It’s the framework we use on ourselves, and it would be the framework we’d use on a competitor. The questions are calibrated to what determines whether an AI sourcing platform can actually run in Japan — they’re not synthetic. We score 14–16 across the eight in our own audit, and closer to 12 if we’re strict on the limits answer. We’re not perfect against this framework. We just think the questions are the right ones, and any vendor confident in the Japan market should be willing to answer them.

How long should the eight-question conversation actually take?

An hour. Five to seven minutes per question on average. Some questions a Japan-capable vendor will answer in a sentence; some will take fifteen minutes. The vendor that needs to bring in three other people to answer Question 1 (filing status) is not the vendor for you — that’s an operating-team test, not just an information test.

Where can I see a worked example of running this on a real vendor?

We can’t publish a comparative scorecard naming specific competitors, for legal reasons — but the Headhunt.AI vs LinkedIn Recruiter comparison walks through what an honest comparative analysis looks like across two platforms with stated relative strengths and limits. The same analytical posture, applied to your shortlist, is what the eight questions are designed to support.

Sources

Production data from ExecutiveSearch.AI K.K. and ESAI Agency K.K. internal operations. The 6-of-1,642 第4号 figure is from the MHLW 特定募集情報等提供事業者 届出受理事業者リスト, retrieved Q2 2026. The published 25-month back-test sample (3,852 resumes, 74 placements — a representative slice of our production data, not the firm’s complete placement record) and the 16-week 2026 outreach cohort (123,675 candidates contacted, 3.13% reply rate) are documented on our methodology page with published sample sizes, statistical methods, and anonymization policy. The 30% partial-profile finding is from our 2026 cohort and is detailed in the AI candidate scoring cornerstone. LinkedIn enforcement timeline references (Proxycurl, Apollo/Seamless, ProAPIs) are public court filings and platform announcements.

Run the framework on us

Talk to sales and put us through the eight questions. We answer them in writing if useful. Ten free credits are available to validate the scout-mail and scoring claims directly.
