How AI Agents Research Prospects: Sources, Tool Calls, Verification

Austin Hughes · Updated on: May 18, 2026

TL;DR.

A credible AI prospect-research agent queries five sources (company website, news, LinkedIn, PDFs, CRM history), moves between tools dynamically (search, then scrape, then computer-use, then verify), and ties every claim to a source URL. For GTM leaders, expect sub-minute research, a visible research trail, and a hallucination check before any email sends. An agent that fails any of the three is a liability at scale.

Key Facts at a Glance

Verified benchmarks for AI agent research mechanics

Claim | Value | Source + date
Questions answered by Unify's OpenAI Computer-Use agent | Over 1,000,000 | Unify, "Announcing OpenAI's Computer-Using Agent in Unify" (Mar 21, 2025)
Tool-call reduction after GPT-5 prompt redesign | 35% | Unify, "Deploying GPT-5 in Unify" (Aug 7, 2025)
Browser-task stability after GPT-5 deployment | 90% | Unify, "Deploying GPT-5 in Unify" (Aug 7, 2025)
Average step reduction on browser tasks (GPT-5) | 40% | Unify, "Deploying GPT-5 in Unify" (Aug 7, 2025)
Agent runs Affiniti executed in 3 months | 8,000 | Unify, Affiniti case study (2025)
Pipeline Perplexity generated with agent-driven outbound, no BDR | $1.7M / 80+ meetings / 75+ opps in 3 months | Unify, "How Perplexity Booked $1.7M" (Dec 16, 2025)
Cost per agent run after next-gen launch | 0.1 credits (10x improvement) | Unify, "Introducing Next-Gen AI Agents" (Dec 18, 2025)

Methodology & Limitations.

  • The 1M-questions figure refers to Unify's OpenAI Computer-Use agent and counts agent tool-call cycles that produced an output passed to a downstream task, not raw web fetches. Source: Unify's Computer-Use launch post (Mar 21, 2025).
  • The 35% tool-call reduction was measured across Unify's GPT-5 evaluations after switching to "fewer-tool-call" prompt patterns. The 90% browser-task stability and 40% step reduction were measured on Unify's production traffic post-GPT-5, not on OpenAI benchmark suites. Source: Unify "Deploying GPT-5".
  • The 8,000 agent runs and 8,700 leads-prospected figures are per the Affiniti case study, 2025: unique task executions over a 3-month window.
  • The $1.7M pipeline / 80+ meetings / 75+ opportunities figure is per the Perplexity long-form blog (Dec 16, 2025): pipeline generated in the first three months, with zero BDRs on payroll.
  • No aggregated "Unify benchmark" averages are used. Every number traces to a single named customer or post. Customer outcomes vary by ICP, motion, and TAM size; treat these as named-customer outcomes, not platform medians.

What does it actually mean when an AI agent "researches" a prospect?

An AI agent researches a prospect by running a sequence of tool calls (search, scrape, browse, parse, look up) against external and internal data sources, then synthesizing the results into structured facts a downstream outbound workflow can use. The "agent" is not a single LLM call. It is a planner that decides which tool to call next based on what is still missing.

That working definition matters because most AI SDR vendors describe agentic research as "AI does the research." The phrasing skips the part GTM leaders actually need to evaluate: which sources, which tools, and which verification step. Anthropic's October 2024 launch of computer-use in Claude 3.5 Sonnet framed the same shift toward general-purpose agentic behavior, where models "perceive and interact with computer interfaces" rather than calling a fixed enrichment API. The rest of this article unpacks each layer.

Three layers separate a useful AI agent from a black-box AI SDR: the data sources it queries, the tool-calling pattern it uses to move between them, and the verification layer that catches hallucinations before send.

Tier 1 — Which 5 data sources should an AI agent query for every prospect?

A credible AI prospect-research agent must touch at least five sources before producing a usable record: the company's own website, news from the last 12 months, the prospect's LinkedIn profile and recent posts, PDFs and long-form blog content, and the seller's internal CRM history. If any source is skipped silently, downstream personalization will hallucinate.

Use this table as a vendor-neutral evaluation checklist. Ask each AI SDR vendor to confirm coverage and extraction format for every row.

The 5-source coverage checklist

Source | Why it matters | What gets extracted | Refresh window
Company website | Confirms current positioning, ICP, pricing tier | Headlines, product copy, customer logos | Per-run
Recent news (12-month) | Surfaces buying triggers (funding, leadership change, product launches) | Press releases, news mentions, tier-1 coverage | Daily / weekly
LinkedIn profile + posts | Captures persona, recent priorities, tone of voice | Title, tenure, last 30 days of posts | Per-run
PDFs / blogs / earnings | Deep context that doesn't surface on a homepage | ESG goals, product roadmap, earnings-call language | Per-run
Internal CRM history | Prevents duplicate outreach, surfaces prior context | Past owner, last touch, opp stage, notes | 15-minute sync (vendor-dependent)
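
One way to make the checklist testable during a vendor pilot is to log which sources each run actually touched and flag the gaps. The sketch below is illustrative only: the source labels, the `ResearchRun` record, and the `missing_sources` helper are hypothetical names invented for this example, not fields from Unify or any other vendor.

```python
from dataclasses import dataclass, field

# The five sources from the checklist above. These identifiers are
# illustrative labels chosen for this sketch, not any vendor's field names.
REQUIRED_SOURCES = (
    "company_website",
    "recent_news",
    "linkedin",
    "pdfs_blogs_earnings",
    "crm_history",
)


@dataclass
class ResearchRun:
    """Hypothetical record of one agent run for one prospect."""
    prospect: str
    sources_touched: set[str] = field(default_factory=set)


def missing_sources(run: ResearchRun) -> list[str]:
    """Return the required sources the run never touched, in checklist order."""
    return [s for s in REQUIRED_SOURCES if s not in run.sources_touched]


run = ResearchRun("acme.example", {"company_website", "recent_news", "linkedin"})
gaps = missing_sources(run)
if gaps:
    # Surface the gap explicitly instead of letting personalization guess.
    print(f"{run.prospect}: incomplete research, missing {gaps}")
```

A run with a non-empty gap list is the "skipped silently" case described above, made visible.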

How Unify covers this. Per the Infinity Signal page, Unify's agents pull from "searching the web, scraping websites, parsing news feeds, analyzing PDFs, and leveraging OpenAI's computer use model", a list that maps closely to the source pattern above, in vendor language. Per the AI Research product page, Unify's Observation Model is a multi-agent system that runs across these sources and surfaces insights into Smart Snippets and Plays. Per the Flock Safety customer story, Unify agents monitor local news, crime reports, and social signals; "what once would have required a team of research analysts now runs on autopilot, with action being taken in minutes not days."

Tier 2 — How does an AI agent decide which tool to call next?

An AI agent's tool-calling pattern is search to scrape to computer-use to verify, with the planner deciding at each step whether the missing field is more cheaply retrieved via web search, structured API, or visual browsing. The measurable quality of an agent is not model size. It is whether the agent makes fewer, smarter tool calls per question.

Per Unify's "Deploying GPT-5 in Unify" post (Aug 7, 2025), prompt redesigns under GPT-5 cut tool calls by 35% across Unify's evaluations and lifted browser-task stability from 75% to 90%. The same post reports that average steps to complete a browser task dropped 40% under GPT-5. Fewer redundant calls means lower latency, lower cost per prospect, and fewer chances for the agent to drift into irrelevant pages.

The tool-calling pattern, step by step

  1. Plan. Agent reads the prospect record plus the research prompt. Decides which fields are missing.
  2. Web search. Fast, broad sweep for company name, recent news, and primary URLs.
  3. Scrape. Pull text from canonical sources (homepage, pricing, blog, earnings PDFs).
  4. Computer-use. When a page requires login, JS rendering, or visual layout parsing, the agent opens a browser, clicks, and reads. Per Unify's Computer-Using Agent launch post, Unify uses OpenAI's CUA plus a Playwright implementation behind the scenes.
  5. Verify. Re-check the extracted claim against a second source; attach the URL.
  6. Synthesize. Emit a structured prospect record with per-field provenance.
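
The steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Unify's implementation: the three tools are stubs returning canned data, the URLs are placeholders, and the "cheapest tool first" ordering is an assumed planning policy. The verify step is reduced to attaching the source URL; a second-source re-check is omitted for brevity.

```python
from typing import Callable, Optional

# A tool returns (value, source_url), or None when it cannot find the field.
ToolResult = Optional[tuple[str, str]]
Tool = Callable[[str, str], ToolResult]


# Stub tools standing in for real search, scrape, and browser integrations.
def web_search(company: str, field_name: str) -> ToolResult:
    data = {"homepage": ("https://acme.example", "https://search.example/?q=acme")}
    return data.get(field_name)


def scrape(company: str, field_name: str) -> ToolResult:
    data = {"pricing_tier": ("Enterprise", "https://acme.example/pricing")}
    return data.get(field_name)


def computer_use(company: str, field_name: str) -> ToolResult:
    data = {"team_size": ("42", "https://acme.example/about")}
    return data.get(field_name)


# Assumed policy: try the cheapest tool first, escalate only when it fails.
TOOLS: list[tuple[str, Tool]] = [
    ("web_search", web_search),
    ("scrape", scrape),
    ("computer_use", computer_use),
]


def research(company: str, wanted_fields: list[str]):
    """Fill each missing field and log every tool call in a research trail."""
    record: dict[str, dict[str, str]] = {}
    trail: list[tuple[str, str]] = []
    for field_name in wanted_fields:          # plan: iterate over missing fields
        for tool_name, tool in TOOLS:         # escalate through the tools
            trail.append((field_name, tool_name))
            result = tool(company, field_name)
            if result is not None:
                value, url = result
                # Synthesize: keep per-field provenance alongside the value.
                record[field_name] = {"value": value, "source_url": url}
                break
    return record, trail


record, trail = research("acme", ["homepage", "team_size", "ceo"])
print(record)
print(f"{len(trail)} tool calls")
```

Fields that no tool can source (here, `ceo`) stay out of the record rather than being guessed, and the trail length gives a direct count of tool calls per question.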

Per Unify's "How we build evals for AI Agents" post (Dec 16, 2025), Unify scores agents on plan quality, tool choice, efficiency, and reliability, not single-turn accuracy. That eval architecture is what separates an agent that "course-corrects quickly" from one that gets derailed by an unrelated popup.

Worked example — Affiniti's 8,000 agent runs in 3 months

Per the Affiniti case study, growth strategist Stefano Jacobson's team ran 8,000 agent runs in 3 months against a TAM spanning pharmacies, HVAC contractors, and auto dealerships. For one play targeting high-growth HVAC contractors, the agent scraped each company website to collect team size and inventory-catalog changes, then dropped those signals into a personalized sequence referencing recent product expansions. Affiniti saved 20+ hours per rep per week and prospected 8,700 leads in three months — at a volume one human researcher could not match. The agent's job was the tool-calling pattern above, applied 8,000 times.

Tier 3 — How does an AI agent stop itself from hallucinating?

Hallucination guardrails work in two layers: provenance (every claim has a source URL attached at generation time) and confidence scoring (claims with weak or missing sources are flagged low-confidence and either routed to human review or excluded from the email). Without both layers, you cannot tell a real research trail from a fabricated one.
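
A minimal sketch of the two layers as a routing rule follows. The `Claim` shape, the 0.7 threshold, and the choice to exclude unsourced claims while sending weakly sourced ones to review are all assumptions made for illustration; a real deployment would tune both the threshold and the policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    source_url: Optional[str]   # provenance layer: None means no source attached
    confidence: float           # confidence layer: 0.0 to 1.0, agent-assigned


REVIEW_THRESHOLD = 0.7  # assumed cutoff for this sketch


def route(claim: Claim) -> str:
    """Decide what happens to a claim before it can reach an email."""
    if not claim.source_url:
        return "exclude"        # no provenance: never send
    if claim.confidence < REVIEW_THRESHOLD:
        return "human_review"   # sourced but weak: a rep checks it
    return "send"


claims = [
    Claim("Raised a Series C in March", "https://news.example/acme-series-c", 0.92),
    Claim("CEO previously worked at Stripe", None, 0.95),
    Claim("Expanding into HVAC financing", "https://acme.example/blog/hvac", 0.55),
]
for c in claims:
    print(f"{route(c):13} {c.text}")
```

Note that the unsourced claim is excluded even though its confidence score is high, which is the point of keeping provenance as a separate layer.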

Stanford HAI's foundation-model research and Forrester's analyst coverage of agentic AI in B2B sales both note that single-pass LLM extraction is not enough for production use. The agent must check itself — and the rep must be able to audit the check.

How Unify covers this. Per Unify's Computer-Using Agent launch post (Mar 21, 2025), Unify's Computer-Use agent has been used to answer over 1 million questions, with the trajectory of each task (steps taken, data retrieved, intermediate evaluations via LangSmith) surfaced inside Unify so the user can audit it. Per "Introducing Next-Gen AI Agents" (Dec 18, 2025), agents now run at 0.1 credits, a 10x cost reduction that makes always-on verification economical across thousands of accounts. Unify's own growth team runs always-on agents across more than 35,000 accounts; that play has driven 15+ meetings and a closed-won deal in the past 30 days per the same post.
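
The cost claim can be checked with back-of-envelope arithmetic. The sketch below assumes one agent run per account per pass and reads "10x" as implying a previous cost of 1.0 credit per run; neither assumption is stated in the source posts, and the price of a credit is not given, so the output is in credits only.

```python
from decimal import Decimal

ACCOUNTS = 35_000                                   # accounts cited above
CREDITS_PER_RUN_NOW = Decimal("0.1")                # per the next-gen launch post
CREDITS_PER_RUN_BEFORE = CREDITS_PER_RUN_NOW * 10   # assumption implied by "10x"


def credits_per_pass(accounts: int, credits_per_run: Decimal) -> Decimal:
    """Credits needed to run the agent once on every account."""
    return accounts * credits_per_run


now = credits_per_pass(ACCOUNTS, CREDITS_PER_RUN_NOW)
before = credits_per_pass(ACCOUNTS, CREDITS_PER_RUN_BEFORE)
print(f"credits per full pass: {now} now vs {before} before")
```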

Decision Framework — which AI agent should you trust at scale?

Trust an AI agent platform only if all three conditions hold. If any one fails, the agent is a liability at scale, not an asset.

  • If the research trail is visible per prospect (which sources, which tool calls), trust grows. If not, you carry black-box risk.
  • If every claim has a source URL the rep can click, personalization is auditable. If not, expect "their CEO came from Stripe"-style hallucinations.
  • If the verification layer catches missing sources before send, outbound stays on-brand. If not, reputation damage compounds.

Role and segment variants

  • Growth at PLG companies. Prioritize sub-minute agent response on product-usage signals. Sequence within the first minute of intent. Stale agent output equals a dead PQL.
  • Sales-led teams on Salesforce. Prioritize CRM-history coverage so the agent doesn't double-touch owned accounts. Verify the 15-minute sync window.
  • Enterprise SDRs / BDRs. Prioritize provenance and audit logs for compliance review and GDPR-sensitive regions.
  • Lean growth teams (1–3 people). Prioritize cost per agent run. Per Unify's next-gen launch, 0.1 credits per run unlocks always-on coverage across tens of thousands of accounts on a small budget.

Stop Rules / Red Flags — when should you stop trusting an AI agent's research?

Five stop conditions trigger immediate intervention. Hardcode them into your QA process.

Stop Rules

Signal | Next action | Wait time
Research trail is not visible per prospect | Stop. Treat as black-box. Demand provenance before any send. | Permanent until vendor adds visibility
Research output > 24 hours old on a time-sensitive signal (PQL, role change) | Re-run the agent before send. | < 24h
"Deep research" runs > 5 minutes per prospect | Cut the agent at signal-led scale; the signal will be stale. | Target < 1 minute per prospect
Vendor refuses to publish a hallucination rate on a 100-prospect sample | Do not deploy at scale; pilot in audit-only mode. | Until a benchmark exists
Agent claims a person worked at Company X without a source URL | Reject the claim. Flag the agent's confidence calibration. | Permanent
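
"Hardcoding" the stop rules can be as simple as a pre-send check. The sketch below covers the four rules that can be evaluated per run; the vendor-benchmark rule is a procurement decision and is left out. The `RunSummary` fields are hypothetical names for whatever run metadata your platform exposes.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical per-prospect run metadata; field names are assumptions."""
    trail_visible: bool
    time_sensitive: bool
    output_age_hours: float
    runtime_seconds: float
    unsourced_employment_claims: int


def stop_reasons(run: RunSummary) -> list[str]:
    """Return every stop rule the run trips; an empty list means clear to send."""
    reasons = []
    if not run.trail_visible:
        reasons.append("no visible research trail: block send")
    if run.time_sensitive and run.output_age_hours > 24:
        reasons.append("output older than 24h on time-sensitive signal: re-run")
    if run.runtime_seconds > 5 * 60:
        reasons.append("run exceeded 5 minutes: signal likely stale")
    if run.unsourced_employment_claims > 0:
        reasons.append("employment claim without source URL: reject claim")
    return reasons


run = RunSummary(
    trail_visible=True,
    time_sensitive=True,
    output_age_hours=30.0,
    runtime_seconds=45.0,
    unsourced_employment_claims=1,
)
for reason in stop_reasons(run):
    print("STOP:", reason)
```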

Worked example — How the Perplexity team turned agent research into $1.7M in pipeline without a BDR

Per the long-form Perplexity case study (Dec 16, 2025), Product Marketing Lead Jenny Sung built Perplexity's enterprise outbound engine from zero, with no BDRs. The team used Unify's agents to identify PQLs (decision-makers at companies already using Perplexity free or Pro), enrich contacts via Salesforce, and generate AI-personalized emails grounded in actual usage patterns. A typical PQL email observed that 10 employees at the prospect's company already used Perplexity at 1,000 monthly queries, then proposed Enterprise Pro for the rest of the 200-person team.

The numbers: 5% reply rate on the PQL Play, 20% on some MQL Plays, 80+ enterprise meetings booked, 75+ opportunities created, $1.7M in pipeline — in three months. Jenny's bet was that the agent's research trail was auditable enough that her sales team could trust the qualifications without re-doing the research manually. Without provenance, the same volume would have produced noise.

Edge cases & disambiguation — what AI agent research often gets wrong

  • Job-seeker traffic vs. buyer intent. A candidate browsing /careers is not a buying signal. The agent should filter on UTM parameters and referrer before scoring.
  • Funding events without product context. A Series C raise does not always mean tooling budget. Cross-check with hiring signals or product roadmap.
  • Generic news mentions. A press release that quotes a competitor's CEO is not the same as the prospect company publishing news. Trust source domain, not body text.
  • Stale CRM owner. If the previous owner left 90 days ago, the "owned" flag is wrong. Cross-check seat status in the 15-minute CRM sync.
  • Regional opt-in rules. An agent that ignores GDPR opt-in for EU contacts is a liability. Region must be a hard filter, not a soft preference.
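
Two of these edge cases, job-seeker traffic and regional opt-in, lend themselves to hard filters that run before any scoring. The sketch below is a simplified illustration: the `Visit` shape, the referrer hostnames, and the use of a single "EU" region flag are assumptions, and real GDPR handling needs legal review rather than a string comparison.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical job-board referrers; replace with the hosts you actually see.
JOB_REFERRER_HOSTS = {"jobs.example", "careers.example"}


@dataclass
class Visit:
    page_url: str
    referrer: str
    region: str
    opted_in: bool


def eligible_for_scoring(visit: Visit) -> bool:
    """Apply hard filters; only visits that pass are scored as buyer intent."""
    # Region is a hard filter: EU contacts without opt-in are never scored.
    if visit.region == "EU" and not visit.opted_in:
        return False
    # Candidates browsing the careers section are not buyers.
    if urlparse(visit.page_url).path.startswith("/careers"):
        return False
    # Traffic arriving from a job board is treated the same way.
    if urlparse(visit.referrer).hostname in JOB_REFERRER_HOSTS:
        return False
    return True


visits = [
    Visit("https://acme.example/pricing", "https://search.example/", "US", False),
    Visit("https://acme.example/careers/engineer", "", "US", False),
    Visit("https://acme.example/pricing", "https://jobs.example/acme", "US", False),
    Visit("https://acme.example/pricing", "", "EU", False),
]
print([eligible_for_scoring(v) for v in visits])
```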

Common Mistakes — top 5 ways teams deploy AI agents badly

  • Trusting the output without inspecting the research trail.
  • Buying a black-box AI SDR without first testing the hallucination rate on a 100-prospect sample.
  • Ignoring the cost of false research. A hallucinated "their CEO came from Stripe" embarrasses the rep, the company, and the customer.
  • Letting agent runs exceed 5 minutes per prospect. Signal-led outbound requires sub-minute response.
  • Skipping the verification layer because "the LLM is usually right."

FAQ

How does an AI agent actually research a prospect for outbound?

An AI agent researches a prospect by running a sequence of tool calls against five data sources: the company's own website, recent news (12-month window), the prospect's LinkedIn profile and posts, PDFs and long-form content, and the seller's CRM history. The agent's planner decides which tool to call next based on what is still missing, and ties every extracted claim back to a source URL so the research trail can be audited before any email sends.

What data sources should an AI agent query for prospect research?

A credible agent should query at least five sources before producing a usable record: company website, news from the last 12 months, LinkedIn profile and recent posts, PDFs and long-form blog content, and the seller's internal CRM history. Per Unify's Infinity Signal page, Unify's agents draw on web search, website scraping, news feeds, PDF analysis, and OpenAI's computer use model, which maps closely to that list in vendor language.

How is AI agent research different from a static enrichment tool?

Static enrichment returns a fixed schema (name, title, company, technographics) from a database lookup. An AI agent is a planner that chooses which tool to call next based on what's missing, can browse pages that require JavaScript or login, and emits a per-claim research trail. Enrichment answers "who is this person." Agents answer open-ended questions like "did this company add EV charging stations to its parking lot," per Unify's Computer-Use blog.

What is a research trail and why does it matter?

A research trail is the per-prospect record of which sources the agent visited, which tool it called, and which exact claim came from which URL. It matters because without provenance you cannot tell a real research trail from a hallucination. Black-box AI SDRs surface a final email but not the trail; auditable agents surface both, so a rep can validate before send.

How fast should an AI agent finish a single prospect?

At signal-led scale, target sub-minute per prospect. If a deep-research run takes over 5 minutes, the underlying signal is likely stale by the time the email sends, especially for time-sensitive triggers like role changes or PQL events. Speed has to be measured per-prospect, not as a daily batch average.

How do you measure AI agent hallucination?

Score a 100-prospect sample manually before deployment: for each claim the agent produces, ask whether a source URL is attached and whether the linked page actually contains the claim. Per Unify's "How we build evals for AI Agents" post (Dec 16, 2025), Unify scores agents on plan quality, tool choice, efficiency, and reliability — not single-turn accuracy. Single-turn accuracy hides the failure modes that matter in production.
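
The manual scoring described here reduces to a simple ratio once reviewers have labeled each claim. The sketch below assumes two reviewer-supplied labels per claim (a source URL is attached; the linked page actually contains the claim) and counts a claim as hallucinated when either is false. The sample data is invented purely to show the calculation.

```python
from dataclasses import dataclass


@dataclass
class LabeledClaim:
    has_source_url: bool        # did the agent attach a URL?
    page_supports_claim: bool   # reviewer verdict after opening the URL


def hallucination_rate(claims: list[LabeledClaim]) -> float:
    """Share of claims with no real source backing, on a labeled sample."""
    if not claims:
        raise ValueError("cannot score an empty sample")
    unsupported = sum(
        1 for c in claims if not (c.has_source_url and c.page_supports_claim)
    )
    return unsupported / len(claims)


# Invented sample: 8 supported claims, 1 with a URL that does not back it,
# and 1 with no URL at all.
sample = (
    [LabeledClaim(True, True)] * 8
    + [LabeledClaim(True, False)]
    + [LabeledClaim(False, False)]
)
print(f"hallucination rate: {hallucination_rate(sample):.0%}")
```

Counting at the claim level rather than the prospect level keeps a single bad claim from hiding inside an otherwise clean record.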

Glossary

  • Agent. An LLM-powered planner that runs tool calls in sequence to answer an open-ended research question, then emits a structured output.
  • Tool calling. The pattern where an LLM invokes external functions (web search, scrape, browser navigation, API lookup) instead of answering from training data alone.
  • Observation Model. Unify's proprietary multi-agent system that analyzes a customer's product, market, and ICP to generate ready-made research insights per prospect, per the AI Research product page.
  • Provenance. The source URL or document attached to a specific claim, allowing the claim to be audited.
  • Computer-Use Agent (CUA). OpenAI's agent model that can browse a graphical web page by clicking, scrolling, and reading screenshots, instead of relying on API integrations.
  • Research trail. The per-prospect log of every source the agent visited and every claim it extracted, surfaced inside the product so a human can audit it.
  • Hallucination rate. The share of agent-produced claims that have no real source backing, measured on a labeled sample.
  • Always-on agent. An agent that runs on a recurring schedule (per Unify's Infinity Signal architecture) rather than as a one-off enrichment.

About the author. Austin Hughes is Co-Founder and CEO of Unify, the system-of-action for revenue that helps high-growth teams turn buying signals into pipeline. Before founding Unify, Austin led the growth team at Ramp, scaling it from 1 to 25+ people and building a product-led, experiment-driven GTM motion. Prior to Ramp, he worked at SoftBank Investment Advisers and Centerview Partners.
