Contextual ICP Scoring with Claude Code: Why Employee Count and Tech Stack Aren't Enough Anymore
Get deeper insights and better conversion rates by moving beyond simple filters to dynamic ICP scoring powered by AI
Blog by Jan, March 03, 2026

98% of marketing qualified leads never convert into closed deals. That number, reported by Martal Group in their 2025 B2B lead scoring analysis, is not a typo. For every 100 leads your scoring model marks as "qualified," roughly two become customers.
The average MQL to SQL conversion rate sits at just 13% across B2B industries, according to Landbase's 2026 benchmark data. Companies using AI-driven scoring see 40% accuracy improvements over traditional methods, but most teams are still running scoring models built on a handful of firmographic filters that were standard practice a decade ago.
| The Old Way | The Contextual Way |
| --- | --- |
| 5 to 6 static filters (employee count, revenue, industry, tech stack, job title) | 10+ scoring dimensions including hiring velocity, content signals, competitive displacement windows, and buying committee structure |
| Binary pass/fail against firmographic thresholds | Weighted scoring with custom logic that adapts per ICP segment |
| Runs inside a visual tool with column limits | Runs in Claude Code with no constraints on scoring complexity |
| Same model for every client and every campaign | Unique scoring per client, updated with every campaign cycle |
The problem is not the tools. It is that the scoring models are shallow. A typical ICP scoring setup checks five or six dimensions: employee count, revenue, industry, tech stack, job title keywords. Pass all five, high score. Fail one, lead drops, regardless of every other signal suggesting they are ready to buy.
Contextual ICP scoring with Claude Code works differently. Instead of five filters, you score across 10 or more dimensions, including hiring patterns, competitive displacement signals, buying committee structure, and contextual indicators that static filters cannot capture. Because the scoring runs in code, there is no ceiling on how complex or specific your logic gets.
1. What's Wrong with Five-Filter Scoring
Traditional lead scoring models follow a consistent pattern. Define your ICP using a handful of attributes, assign point values, run every lead through the filter.
→ Employee count: 50 to 500 (+20 points)
→ Annual revenue: $5M to $100M (+15 points)
→ Industry: SaaS, FinTech, or Healthcare (+10 points)
→ Tech stack includes HubSpot or Salesforce (+10 points)
→ Contact title contains "VP" or "Director" (+15 points)
Score above 50 and the lead routes to sales; below 50, it goes to nurture. This approach has two fundamental problems.
First, it treats all qualifying attributes as equally static and equally predictive. A 200-person SaaS company matching every firmographic dimension could be in a hiring freeze, locked into a three-year contract with your competitor, and showing zero buying intent. It scores a 70 and gets sent to your best AE, who wastes a week on an account that was never going to buy this quarter.
Second, five filters cannot capture the signals that predict readiness. Hiring velocity, leadership changes, competitive contract windows, job postings describing a pain your product solves, LinkedIn content from the buying committee signaling frustration. These are the signals experienced GTM operators use to prioritize. They just cannot express them in a five-column scoring model.
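In code, the entire traditional model fits in a dozen lines, which is part of the problem. A minimal sketch, assuming a simple lead dictionary whose field names are hypothetical:

```python
# The five-filter model from the list above. Thresholds and point values
# mirror the example; the lead dict shape is an illustrative assumption.

def five_filter_score(lead: dict) -> int:
    """Score a lead using static firmographic filters only."""
    score = 0
    if 50 <= lead.get("employees", 0) <= 500:
        score += 20
    if 5_000_000 <= lead.get("revenue", 0) <= 100_000_000:
        score += 15
    if lead.get("industry") in {"SaaS", "FinTech", "Healthcare"}:
        score += 10
    if {"HubSpot", "Salesforce"} & set(lead.get("tech_stack", [])):
        score += 10
    if any(kw in lead.get("title", "") for kw in ("VP", "Director")):
        score += 15
    return score

lead = {
    "employees": 200, "revenue": 22_000_000, "industry": "SaaS",
    "tech_stack": ["HubSpot"], "title": "VP of Sales",
}
print(five_filter_score(lead))  # 70 — routed to sales, regardless of intent
```

Nothing in this function can see a hiring freeze, a competitor lock-in, or a buying window. Everything it checks was true last quarter and will be true next quarter.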
One agency operator put it bluntly: scoring on employee count and tech stack is the 1990s way. Real scoring spans 15 dimensions, and most of them are contextual. Is there a signal indicating this prospect is in a particular state? Feeling a specific pain? At a moment where your value proposition is uniquely relevant?
2. The 15-Dimension Scoring Model
A contextual ICP scoring model evaluates prospects across three categories: static fit, dynamic signals, and strategic context.
Static Fit (the baseline, not the score)
→ Company size (employee count range)
→ Revenue range
→ Industry and sub-industry
→ Geographic location
→ Tech stack composition
Static fit determines whether a prospect could be a customer. It does not tell you whether they should be contacted right now.
Dynamic Signals (what's happening right now)
→ Hiring velocity. Three SDR roles and a VP of Sales posted in the same month signals active investment in GTM infrastructure. Zero roles posted means they are probably not expanding their stack.
→ Funding recency and type. A Series B from four months ago means budget and active growth investment. A Series A from two years ago with no follow-on tells a different story.
→ Leadership changes. A new CRO or VP of Marketing in the last 90 days almost always triggers a vendor evaluation cycle. New leaders bring in their own tools.
→ Competitive displacement windows. Detectable through technographic changes, job postings mentioning specific tools, or LinkedIn posts about evaluating alternatives. A prospect approaching contract renewal with your competitor is a high-value timing signal.
→ Content engagement velocity. Not just "visited the pricing page" but the acceleration. A prospect going from zero to three site visits, a case study download, and a webinar registration in two weeks is moving through their buying process fast.
Strategic Context (the why behind the score)
→ Buying committee structure. A champion (daily user), decision maker (budget authority), and influencer (trusted opinion) identified at the same account scores higher than finding only the VP.
→ Pain signal inference. Job postings describing manual processes your product automates. LinkedIn posts complaining about tool sprawl. These require AI to read, interpret, and score.
→ Market position and trajectory. A company growing 40% year-over-year has different urgency than one flat for three years.
→ Account-level engagement density. Three different people from the same company engaging with your content in one month is stronger than one person engaging three times.
→ Disqualification signals. Recent layoffs in the buyer's department. Completed competitor implementation. Public budget cuts. Scoring should subtract points aggressively for these, not just add for positive ones.
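One way to encode all three categories is a weighted sum over per-dimension scores with aggressive penalties for disqualifiers. The dimension names, weights, penalty values, and 0-10 dimension scale below are illustrative assumptions, not a fixed spec:

```python
# A hedged sketch of weighted contextual scoring. Each dimension is scored
# 0-10 upstream; weights reflect relative predictiveness; disqualifiers
# subtract points outright rather than merely failing to add.

WEIGHTS = {
    "static_fit": 1.0,           # baseline: could they buy at all?
    "hiring_velocity": 2.0,      # active GTM investment
    "funding_recency": 2.0,
    "leadership_change": 2.5,    # new CRO/VP triggers vendor evaluations
    "competitive_window": 3.0,   # strongest timing signal
    "engagement_velocity": 1.5,
    "buying_committee": 2.0,
    "pain_signals": 2.0,
}

PENALTIES = {
    "recent_layoffs": -25,
    "competitor_implemented": -30,
    "budget_cuts": -20,
}

def contextual_score(signals: dict[str, float], disqualifiers: set[str]) -> float:
    """Weighted sum of 0-10 dimension scores, normalized to 0-100,
    minus aggressive penalties for disqualification signals."""
    positive = sum(WEIGHTS[d] * signals.get(d, 0.0) for d in WEIGHTS)
    penalty = sum(PENALTIES[d] for d in disqualifiers if d in PENALTIES)
    max_score = sum(w * 10 for w in WEIGHTS.values())
    return max(0.0, round(100 * positive / max_score + penalty, 1))
```

The point of the structure is that a perfect firmographic fit with recent layoffs scores below a decent fit with three timing signals firing, which is the opposite of what a pass/fail filter produces.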
3. Why This Runs in Code, Not a Dashboard
In Claude Code, a contextual scoring workflow runs like this:
→ Pull enrichment data via API. Firmographics, technographics, funding, and hiring data from your enrichment platform. One API call returns standardized JSON across all structured dimensions.
→ Score structured dimensions programmatically. Each dimension gets a weighted score based on your ICP definition. The weights are variables you tune based on what actually converts.
→ Use Claude for contextual interpretation. Feed the prospect's job postings, LinkedIn activity, and company news into a prompt: "Based on these signals, is this company likely experiencing the pain our product solves? Rate 0-100 with reasoning."
→ Combine into a composite score. Structured dimensions produce a quantitative score. Contextual interpretation adds a qualitative layer. Both feed into a final composite ranking prospects by predicted buying readiness.
→ Output a prioritized list with reasoning. Each scored prospect comes with an explanation: why they scored the way they did, which signals were strongest, and the recommended outreach angle. AEs see a brief, not just a number.
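The five steps might be wired together like this, with the enrichment call and the Claude interpretation stubbed out. Both function names, the stub data, and the 60/40 blend are hypothetical choices for illustration:

```python
# Sketch of the composite scoring pipeline: structured score + contextual
# interpretation -> one ranked score with reasoning attached.

def fetch_enrichment(domain: str) -> dict:
    # In practice: one API call to your enrichment aggregator returning
    # normalized JSON. Stubbed here for illustration.
    return {"employees": 180, "gtm_roles_60d": 3, "months_since_funding": 9}

def score_structured(data: dict) -> float:
    """Quantitative score (0-100) from structured enrichment fields."""
    score = 0
    score += 30 if 50 <= data["employees"] <= 500 else 0
    score += 35 if data["gtm_roles_60d"] >= 3 else 0
    score += 35 if data["months_since_funding"] <= 12 else 0
    return score

def interpret_context(domain: str) -> tuple[float, str]:
    """Would prompt Claude with job postings, LinkedIn activity, and
    company news; returns (0-100 rating, reasoning). Stubbed here."""
    return 80.0, "Sales Ops posting describes the pain the product solves."

def composite(domain: str, structured_weight: float = 0.6) -> dict:
    data = fetch_enrichment(domain)
    quant = score_structured(data)
    qual, reasoning = interpret_context(domain)
    final = structured_weight * quant + (1 - structured_weight) * qual
    return {"domain": domain, "score": round(final, 1), "reasoning": reasoning}

print(composite("example.com"))  # score 92.0 on this stub data
```

Because the reasoning string travels with the score, the AE brief in the final step is a byproduct of scoring, not a separate writing task.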
4. Building the Scoring Skill
A Claude Code skill for contextual ICP scoring has three components: the ICP definition, the scoring logic, and the output format.
The ICP definition lives in your CLAUDE.md or a client-specific config file. Not a paragraph of marketing copy. A structured specification:
→ Static fit: 50 to 500 employees, $5M to $80M revenue, B2B SaaS or FinTech, must use HubSpot or Salesforce
→ Hiring signals: weight 2x if 3+ GTM roles posted in last 60 days, negative weight if layoffs detected
→ Funding: weight 2x if Series B or C within 12 months, negative if no funding and flat growth
→ Competitive: weight 3x if using a direct competitor with renewal approaching, 2x if competitor present with no renewal data
→ Buying committee: weight 2x if champion + decision maker identified, disqualify if no relevant contacts available
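Encoded as a client config, that specification could look something like the following. The key names and layout are one possible encoding, not a required format:

```python
# One per-client ICP config the scoring skill could load. Field names,
# ranges, and trigger labels are illustrative assumptions.

ICP_CONFIG = {
    "client": "acme-agency-client-01",
    "static_fit": {
        "employees": (50, 500),
        "revenue_usd": (5_000_000, 80_000_000),
        "industries": ["B2B SaaS", "FinTech"],
        "required_stack_any": ["HubSpot", "Salesforce"],
    },
    "weights": {
        "hiring_signals": 2.0,       # 3+ GTM roles posted in last 60 days
        "funding": 2.0,              # Series B or C within 12 months
        "competitive_renewal": 3.0,  # direct competitor, renewal approaching
        "competitive_present": 2.0,  # competitor present, no renewal data
        "buying_committee": 2.0,     # champion + decision maker identified
    },
    "disqualify_if": ["layoffs_detected", "no_relevant_contacts"],
}
```

Swapping clients means swapping this dict (or the file it lives in), not rewriting scoring logic.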
The scoring logic lives in a skill file (.claude/skills/icp-scoring/SKILL.md) with the enrichment API calls, scoring formulas, contextual interpretation prompts, and composite rules.
The output is a tiered prospect list. Tier 1 (80+) gets immediate AE attention with a brief. Tier 2 (60-79) enters an automated sequence with personalized hooks. Tier 3 (40-59) goes to nurture. Below 40 gets archived.
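The tier routing itself is a few lines; the thresholds below match the tiers above, and the route labels are hypothetical:

```python
# Route a composite score to a tier, matching the thresholds in the text.

def assign_tier(score: float) -> str:
    if score >= 80:
        return "tier1_ae_brief"   # immediate AE attention with a brief
    if score >= 60:
        return "tier2_sequence"   # automated sequence, personalized hooks
    if score >= 40:
        return "tier3_nurture"
    return "archive"
```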
For agencies managing 15 clients, this means one skill, 15 ICP configs, 15 uniquely scored prospect lists. No manual reconfiguration between runs.
5. Where the Enrichment Data Comes From
A contextual scoring model requires data from multiple categories, and no single provider covers all of them.
Static fit data comes from standard B2B enrichment providers. Hiring data comes from job posting aggregators like TheirStack and PredictLeads. Competitive and technographic signals come from BuiltWith and similar providers. Content signals come from your own analytics and LinkedIn activity tracking.
The challenge is combining data from six or eight different providers into a unified view. This is where waterfall enrichment through an aggregator matters. Instead of managing separate API keys, rate limits, and response formats for each provider, you call one endpoint that cascades through multiple sources and returns a normalized response. That standardized format is what makes it possible for the AI agent to process hundreds of prospects reliably, because the data schema is consistent regardless of which underlying provider returned the value.
6. The Scoring Feedback Loop
A static five-filter ICP stays the same until someone manually updates it. Your actual ICP might drift, but the scoring model does not notice because nobody updated the spreadsheet.
A contextual ICP scoring model in Claude Code updates through a feedback loop:
1. Score a batch. Each prospect gets a composite score and reasoning explanation.
2. Track outcomes. After 30 to 60 days, log which scored prospects converted and which did not.
3. Analyze the gap. Compare predictions against outcomes. Which dimensions were most predictive? Which least? Were there converted prospects scored low (missing dimensions) or high-scoring prospects that ghosted (overweighted dimensions)?
4. Adjust weights. If hiring velocity was 3x more predictive than revenue range, increase its weight. If competitive signals had zero correlation with conversion, reduce them.
5. Run the next batch with the updated model. Each cycle produces a more accurate model.
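The weight-adjustment step (step 4) can be sketched as a nudge toward observed predictiveness. The correlation proxy below is deliberately naive, and the 0-10 dimension scale and learning rate are assumptions:

```python
# Nudge dimension weights based on conversion outcomes: dimensions that
# scored high on converted prospects and low on lost ones gain weight.

def adjust_weights(weights: dict, outcomes: list[tuple[dict, bool]],
                   lr: float = 0.2) -> dict:
    """outcomes: (dimension_scores, converted) pairs from the last cycle."""
    updated = dict(weights)
    for dim in weights:
        won = [s[dim] for s, converted in outcomes if converted]
        lost = [s[dim] for s, converted in outcomes if not converted]
        if not won or not lost:
            continue  # need both outcomes to estimate predictiveness
        # Gap between mean score on wins vs losses, on a 0-10 scale
        gap = (sum(won) / len(won)) - (sum(lost) / len(lost))
        updated[dim] = max(0.5, round(weights[dim] * (1 + lr * gap / 10), 2))
    return updated
```

A real implementation would use a proper correlation or regression over a larger sample, but even this crude version moves weight toward dimensions that separate winners from losers.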
Research from Landbase's 2026 benchmarks shows AI-driven scoring delivers 40% accuracy improvements. But that improvement compounds with each feedback cycle. Teams running this loop monthly for three to four months see the gap between their scoring predictions and actual outcomes narrow significantly.
For agencies, each client's model learns independently. After six months, you have 15 individually optimized scoring systems, each reflecting what actually works for that specific ICP.

7. A Practical Example: Scoring One Account
To make this concrete, here is how one prospect scores across both models.
The prospect is a 180-person B2B SaaS company in Austin. Revenue estimated at $22M. Uses HubSpot. Three recent job postings: an SDR, a Content Marketing Manager, and a Sales Operations Analyst. Series B closed nine months ago. VP of Sales started four months ago.
Five-filter model: 70/100. Hits employee range, revenue range, industry, tech stack, and the VP-level title. Solid but unspectacular.
Contextual model: 84/100. The five-filter score is the baseline. But three GTM hires in the same window signals active investment. The Series B is recent enough to indicate budget. The new VP of Sales is the strongest signal: new sales leaders almost always evaluate tools within their first six months. Claude Code reads the Sales Ops job posting, which describes "building and maintaining enrichment workflows across multiple data tools," mapping directly to a product pain point. The VP of Sales' LinkedIn shows recent engagement with posts about CRM data quality. Two people from the company attended a sales tech webinar last month. Buying committee looks strong.
The five-filter model routes this to a standard sequence. The contextual model routes it to an AE with a brief explaining the timing window, the pain signals, and the recommended outreach angle referencing the Sales Ops role.
That difference, between "decent fit, standard sequence" and "high priority, specific approach, timed to a buying window," is what separates 13% average MQL-to-SQL conversion from the 39% that behavioral scoring achieves in B2B SaaS according to Data-Mania's 2026 analysis.
8. Getting Started in One Week
Days 1 to 2: Define your ICP across all three categories. Static fit criteria, 5 to 7 dynamic signals you can source data for, and 3 to 4 strategic context dimensions. Start with the 10 you can populate from existing enrichment data.
Day 3: Build the enrichment pipeline. Set up the API calls for firmographics, technographics, hiring data, and funding data. Using a waterfall enrichment aggregator means one integration instead of six.
Days 4 to 5: Write the scoring skill. Encode the ICP definition, scoring formulas, Claude prompts for contextual interpretation, and output format. Test against 20 to 30 prospects with known outcomes.
Days 6 to 7: Run a live batch and calibrate. Score a real prospect list. Have AEs review Tier 1 results. Adjust weights before the next batch.
By end of week, you have a functional contextual ICP scoring system producing better-prioritized prospect lists than any five-filter model. The system compounds from there. Each feedback cycle makes it more accurate. Each new client gets a uniquely optimized model. Each month, the gap between your scoring and your competitors' scoring widens.
That is what happens when ICP scoring stops being a checklist and starts being a reasoning system.

FAQ
What is contextual ICP scoring?
Contextual ICP scoring evaluates prospects across 15+ dimensions including static firmographics, dynamic signals (hiring velocity, funding recency, leadership changes), and strategic context (buying committee structure, pain signal inference, competitive displacement windows). It considers the prospect's current situation and buying readiness, not just whether their company profile matches a checklist.
How is this different from traditional lead scoring?
Traditional scoring uses 5 to 6 filters with point assignments. Contextual scoring uses 15+ dimensions with weighted, conditional logic and natural language interpretation. Traditional scoring says "this company fits your ICP." Contextual scoring says "this company fits, is entering a buying window because of these signals, and here is the recommended outreach angle."
What enrichment data do I need?
At minimum: firmographic data, technographic data, and verified contact data. For the contextual dimensions, add hiring data from job posting aggregators, funding data, and your own engagement analytics. Databar.ai covers multiple data categories through one integration.
How does contextual scoring work for agencies with multiple clients?
One Claude Code skill, multiple client configs. Each client gets their own ICP specification with target ranges, weights, and disqualification triggers. The skill loads the appropriate config per client. Over time, each client's model improves independently through its own feedback loop.
Related articles

MCP vs. SDK vs. API: When to Use Which for GTM Workflows
When to Use MCP: Best for Exploratory and Conversational Workflows
by Jan, March 06, 2026

Claude Cowork for GTM: What Sales and RevOps Teams Need to Know
How Claude Cowork Simplifies Sales and Revenue Operations
by Jan, March 05, 2026

250+ Hours of Claude Code for GTM: Here's What We Learned
What 250+ Hours Building a Claude Code-Powered GTM Campaign Taught Us About Automation and Accuracy
by Jan, March 04, 2026

Claude Code vs. Clay: When to Use Which for GTM Workflows
Finding the Right Tool for Your GTM Strategy and Data Enrichment Needs
by Jan, March 02, 2026


