Databar.ai


How to Audit Your CRM Data Quality with Claude Code

How to Spot and Solve Data Problems That Slow Down Your Sales

Blog
me

by Jan


B2B contact data decays at roughly 2.1% per month. That adds up to about 22.5% of your database going stale every year, according to data from Marketing Sherpa. For a CRM with 20,000 contacts, that is 4,500 records silently rotting while your team builds campaigns on top of them.
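The jump from 2.1% a month to 22.5% a year comes from compounding: each month's decay applies only to records that are still fresh, so simply multiplying 2.1% by 12 (25.2%) would overstate it. A minimal Python sketch of the arithmetic, assuming a constant monthly rate; the function names are illustrative:

```python
def annual_decay(monthly_rate: float, months: int = 12) -> float:
    """Share of records that go stale after `months` of compounding monthly decay."""
    # Each month, a (1 - monthly_rate) share of still-fresh records stays fresh.
    return 1 - (1 - monthly_rate) ** months


def stale_records(total: int, monthly_rate: float, months: int = 12) -> int:
    """Expected number of stale records, rounded to the nearest whole record."""
    return round(total * annual_decay(monthly_rate, months))


if __name__ == "__main__":
    print(f"Annual decay: {annual_decay(0.021):.1%}")
    print(f"Stale records in a 20,000-contact CRM: {stale_records(20_000, 0.021):,}")
```

Passing `months=24` shows how quickly an unmaintained database degrades: the same rate leaves roughly 40% of records stale after two years.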

The usual response is a quarterly cleanup sprint. Someone exports the database, spends a week eyeballing records in a spreadsheet, fixes what they can find, and imports it back. The data looks better for a month. Then the cycle repeats.

What decays | How fast | What breaks
Job titles | 65.8% change annually | Lead scoring, routing, personalization
Phone numbers | 42.9% change annually | Connect rates, outbound sequences
Email addresses | 37.3% change annually | Deliverability, sender reputation
Company firmographics | 22.5% decay annually | ICP targeting, segmentation

CRM data quality audits with Claude Code work differently. Instead of manual spot-checking, you export your CRM data, point Claude Code at it, and get a structured analysis of every quality issue across every field in your database. Missing values, duplicate clusters, formatting inconsistencies, stale records, invalid emails, outdated job titles. All of it surfaces in minutes, not days.

This guide walks through the full audit process: preparing the export, then seven phases that assess completeness, detect duplicates, check for stale data, standardize fields, plan re-enrichment for what is broken, validate and import the corrected data, and set up recurring maintenance so the cleanup compounds instead of resetting every quarter.

1. What a CRM Data Quality Audit Covers

A proper CRM data quality audit goes beyond "how many fields are empty." It measures five dimensions of quality, and each one affects different parts of your GTM operation.

Completeness. What percentage of records have all required fields filled? A contact with a name and email but no company, no title, and no phone number is technically in your CRM but functionally useless for outbound.

Accuracy. Are the filled fields actually correct? If the industry average of 65.8% annual title change holds, a job title that was accurate 18 months ago is now wrong for at least two-thirds of your database. Accuracy problems are harder to catch than completeness problems because the data looks fine until someone tries to use it.

Consistency. Is the same information represented the same way across records? "Salesforce" vs. "salesforce.com" vs. "SFDC" vs. "Salesforce Inc." in your company name field creates four records that should be one. "VP of Sales" vs. "Vice President, Sales" vs. "VP Sales" makes segmentation unreliable.

Freshness. When was each record last verified or updated? Based on the 22.5% annual decay rate above, a record that has not been touched in 12 months has roughly a 22.5% chance of having gone stale. After 24 months, the same rate compounds to about 40%.

Validity. Do the values conform to expected formats? Phone numbers without country codes, email addresses with typos in the domain, revenue fields with currency symbols mixed into the numbers. These seem minor but break automations and integrations downstream.

Claude Code can assess all five dimensions from a single CRM export. That is the practical advantage over manual auditing, where a human reviewer might focus on completeness (the easiest to see) while missing accuracy and consistency problems that cause more damage.

2. Preparing Your CRM Export

The audit starts outside Claude Code. You need a clean export from your CRM that includes the right fields and enough metadata to make the analysis useful.

What to export. Pull contacts and companies as separate CSVs. For contacts, include: first name, last name, email, phone, job title, company name, company domain, lifecycle stage, lead source, last activity date, create date, and owner. For companies, include: name, domain, industry, employee count, revenue, city, state, country, and any custom fields your team uses for segmentation.

Include timestamps. The "last modified" and "last activity" dates are critical. They are how Claude Code determines freshness. Without them, the audit can measure completeness and consistency but cannot flag stale records, which are often the most damaging quality issue.

Export everything, not just active records. The point of an audit is to see the full picture. If you only export "marketing qualified" contacts, you miss the thousands of stale records polluting your database and inflating your CRM costs.

For HubSpot, use the native export from Contacts > Actions > Export. For Salesforce, Data Loader or a report export works. For Pipedrive, use the export from the list view. The format does not matter as long as you get CSVs with all fields and timestamps.

Drop the exported files into your Claude Code project directory. That is the entire setup.

3. Phase 1: The Completeness Scan

This is the fastest win. Ask Claude Code to analyze every field across all records and calculate fill rates.

The prompt is straightforward: "Read contacts.csv. For every column, tell me the total count, the number of non-empty values, the fill rate as a percentage, and sort by fill rate ascending so I see the worst fields first."
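For reference, the calculation that prompt asks for looks roughly like the Python sketch below. It illustrates the analysis rather than reproducing the code Claude Code will write, and it assumes a UTF-8 CSV with a header row.

```python
import csv


def fill_rates(path: str) -> list[tuple[str, int, int, float]]:
    """Return (column, total_rows, non_empty, fill_rate_pct) per column, worst first."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        total = 0
        filled = {col: 0 for col in columns}
        for row in reader:
            total += 1
            for col in columns:
                # Blank, whitespace-only, and missing trailing cells all count as empty.
                if (row.get(col) or "").strip():
                    filled[col] += 1
    rows = [
        (col, total, filled[col], 100.0 * filled[col] / total if total else 0.0)
        for col in columns
    ]
    return sorted(rows, key=lambda r: r[3])
```

Calling `fill_rates("contacts.csv")` returns the same worst-first ordering the prompt requests.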

Claude Code reads the file, runs the analysis, and produces a table. A typical result for a mid-market B2B company looks something like this:

Field | Fill Rate | Impact
First Name | 98% | Low risk
Email | 95% | Low risk
Company Name | 91% | Moderate risk
Job Title | 74% | High risk, breaks lead scoring
Phone | 52% | High risk, kills outbound connect rates
Industry | 41% | High risk, breaks segmentation
Revenue | 23% | Critical, ICP scoring is unreliable
Employee Count | 19% | Critical, ICP scoring is unreliable

The fill rates alone tell you where your biggest gaps are. But the real value is the next step: asking Claude Code to cross-reference fill rates against lifecycle stages. "Show me the average fill rate for contacts in each lifecycle stage: subscriber, lead, MQL, SQL, opportunity, customer."

That cross-reference usually reveals that your early-stage leads have terrible completeness (expected) while your SQLs and opportunities also have gaps (not expected and much more expensive). Those are the records where missing data costs real pipeline.
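The lifecycle cross-reference is a group-by over the same export. A minimal sketch, assuming rows loaded with `csv.DictReader` and a stage column named `lifecycle_stage` (a hypothetical header; HubSpot and Salesforce exports label it differently):

```python
import csv
from collections import defaultdict


def load_rows(path: str) -> list[dict]:
    """Load a CRM export into a list of dicts keyed by column header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def fill_rate_by_stage(rows, fields, stage_field="lifecycle_stage"):
    """Average per-record fill rate (percent) of `fields`, grouped by lifecycle stage."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        # Normalize labels so "MQL" and "mql " land in the same bucket.
        stage = (row.get(stage_field) or "").strip().lower() or "unknown"
        filled = sum(1 for f in fields if (row.get(f) or "").strip())
        sums[stage] += filled / len(fields)
        counts[stage] += 1
    return {stage: round(100 * sums[stage] / counts[stage], 1) for stage in sums}
```

For example, `fill_rate_by_stage(load_rows("contacts.csv"), ["job_title", "phone", "industry"])` gives one percentage per stage, which makes a gap among SQLs and opportunities easy to spot.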

4. Phase 2: Duplicate Detection

Duplicates are the second most common quality issue after missing data, and they are harder to find because they rarely involve exact matches. The same person appears as "John Smith" at "Acme Corp" and "Jonathan Smith" at "Acme Corporation." The two records have different owners, activity histories, and lifecycle stages.

Claude Code handles this with fuzzy matching. Ask it to "identify potential duplicate contacts based on similar email domains, similar name combinations, and matching company domains. Group them into clusters and score each cluster by confidence level."

Claude Code produces output grouped by cluster:

High confidence (90%+). Same email address or same name at same domain. These are almost certainly duplicates and can be merged.

Medium confidence (70 to 89%). Similar names at the same company domain, or same person with different email addresses (personal vs. work). These need human review but are usually real duplicates.

Low confidence (50 to 69%). Similar company names that could be parent/subsidiary relationships, or common names at different companies. Flag but do not auto-merge.
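Claude Code's matching logic will vary from run to run, but the tiering above can be sketched with Python's standard-library `difflib.SequenceMatcher`. This is a hypothetical scoring scheme: an exact email match scores 1.0, otherwise the score is name similarity, and contacts are only compared within the same company domain (blocking) so a large database does not require comparing every pair. Column names such as `company_domain` are assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def score_pair(a: dict, b: dict) -> float:
    """Hypothetical duplicate score in [0, 1] for two contacts at the same domain."""
    email_a = (a.get("email") or "").strip().lower()
    email_b = (b.get("email") or "").strip().lower()
    if email_a and email_a == email_b:
        return 1.0
    name_a = f"{a.get('first_name', '')} {a.get('last_name', '')}".strip().lower()
    name_b = f"{b.get('first_name', '')} {b.get('last_name', '')}".strip().lower()
    return SequenceMatcher(None, name_a, name_b).ratio()


def tier(score: float):
    """Map a score onto the high / medium / low confidence tiers."""
    if score >= 0.9:
        return "high"
    if score >= 0.7:
        return "medium"
    if score >= 0.5:
        return "low"
    return None


def candidate_pairs(contacts: list) -> list:
    """Score pairs within each company domain; return (i, j, score, tier), best first."""
    blocks = defaultdict(list)
    for i, c in enumerate(contacts):
        domain = (c.get("company_domain") or "").strip().lower()
        if domain:
            blocks[domain].append(i)
    pairs = []
    for idxs in blocks.values():
        for i, j in combinations(idxs, 2):
            s = score_pair(contacts[i], contacts[j])
            if tier(s):
                pairs.append((i, j, round(s, 2), tier(s)))
    return sorted(pairs, key=lambda p: -p[2])


def clusters(n: int, pairs: list) -> list:
    """Union-find over scored pairs; returns groups of two or more record indices."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _, _ in pairs:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return [g for g in groups.values() if len(g) > 1]
```

Because this sketch blocks on company domain, it will not pair a work address with a personal address for the same person; catching those needs a second pass keyed on name. It is also safer to pass only high- and medium-confidence pairs to `clusters`, since transitive grouping can chain weak matches together.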

The duplicate report also surfaces which record in each cluster has better data. If one "John Smith" record has a phone number, a title, and recent activity, while the other has only a name and email from 2023, Claude Code can recommend which to keep as the primary and which fields to merge from the secondary.
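Choosing the primary record and filling its gaps from the rest of the cluster is often called survivorship. One simple rule set, sketched in Python with assumed column names (`id`, and `last_activity_date` stored as ISO `YYYY-MM-DD` strings): keep the record with the most filled fields, break ties by most recent activity, then copy each missing field from the most recently active secondary that has it.

```python
def filled_count(record: dict, fields: list) -> int:
    """Number of `fields` that hold a non-blank value in `record`."""
    return sum(1 for f in fields if (record.get(f) or "").strip())


def merge_cluster(cluster: list, fields: list, activity_field: str = "last_activity_date"):
    """Pick a primary record and fill its empty fields from the other records."""

    # ISO YYYY-MM-DD strings sort chronologically, so string comparison works here.
    def recency(r: dict) -> str:
        return r.get(activity_field) or ""

    primary = max(cluster, key=lambda r: (filled_count(r, fields), recency(r)))
    secondaries = sorted((r for r in cluster if r is not primary), key=recency, reverse=True)
    merged = dict(primary)
    filled_from = {}
    for f in fields:
        if (merged.get(f) or "").strip():
            continue
        for other in secondaries:
            value = (other.get(f) or "").strip()
            if value:
                merged[f] = value
                filled_from[f] = other.get("id")  # which record supplied the value
                break
    return merged, filled_from
```

Returning `filled_from` alongside the merged record keeps the merge auditable before anything is written back to the CRM.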

For a 15,000-record CRM, this analysis typically finds 500 to 2,000 duplicate clusters. Manual deduplication at that volume takes days. Claude Code produces the report in minutes. Some teams feed the deduplicated output directly into enrichment platforms to fill gaps on the merged records in the same workflow.

5. Phase 3: Staleness and Accuracy Check

This is where the audit gets interesting. Completeness and duplicates are structural issues. Staleness is a content issue, and it is where most revenue leaks hide.

Ask Claude Code to "flag all contacts where the last activity date is older than 6 months, the last modified date is older than 12 months, or the job title contains keywords that suggest a previous role (former, ex-, retired, interim)."
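The date and keyword filters in that prompt are simple to express directly. A minimal sketch, assuming ISO-formatted columns named `last_activity_date` and `last_modified_date` (hypothetical headers), approximating 6 and 12 months as 182 and 365 days, and treating a missing date as stale:

```python
import re
from datetime import date, timedelta
from typing import Optional

# Word-boundary pattern so "flex-time" or "performer" do not trigger a flag.
PREVIOUS_ROLE = re.compile(r"\b(?:former|retired|interim)\b|\bex-")


def parse_iso(value) -> Optional[date]:
    """Parse 'YYYY-MM-DD' or the date part of an ISO timestamp; blank becomes None."""
    value = (value or "").strip()
    return date.fromisoformat(value[:10]) if value else None


def staleness_flags(row: dict, today: date) -> list:
    """Return the staleness flags that apply to one contact row."""
    flags = []
    last_activity = parse_iso(row.get("last_activity_date"))
    last_modified = parse_iso(row.get("last_modified_date"))
    if last_activity is None or last_activity < today - timedelta(days=182):
        flags.append("no_activity_6m")
    if last_modified is None or last_modified < today - timedelta(days=365):
        flags.append("not_modified_12m")
    if PREVIOUS_ROLE.search((row.get("job_title") or "").lower()):
        flags.append("title_suggests_previous_role")
    return flags
```

Exports that use locale formats such as `6/1/2025` would need their own parser in place of `parse_iso`.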

Then layer on accuracy signals: "For contacts with a last modified date older than 12 months, calculate the probability that their job title has changed based on industry-average role tenure. Flag any contacts where the email domain no longer resolves."
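One way to turn role tenure into a probability is to assume job changes arrive at a constant rate, so the chance a title has changed after t months is 1 - exp(-t / T), where T is the average tenure. That exponential model and the tenure figures below are illustrative assumptions, not benchmarks; substitute tenure data for your own market before relying on the output.

```python
import math

# Placeholder average tenures in months, by seniority bucket. These are
# illustrative assumptions; replace them with benchmarks for your market.
AVG_TENURE_MONTHS = {"C-suite": 48, "VP": 30, "Director": 30, "Manager": 24}
DEFAULT_TENURE_MONTHS = 24


def title_change_probability(months_since_update: float, seniority: str) -> float:
    """P(title changed) under a constant-hazard (exponential) model of role tenure."""
    tenure = AVG_TENURE_MONTHS.get(seniority, DEFAULT_TENURE_MONTHS)
    return 1 - math.exp(-months_since_update / tenure)
```

The constant-rate assumption is crude but transparent: a longer average tenure flattens the curve, and every record's probability rises monotonically with time since its last update.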

Claude Code produces a staleness report segmented by risk tier:

  • Critical (last activity 12+ months, no recent modification). These records are almost certainly outdated. Job titles, phone numbers, and possibly even company associations are likely wrong.
  • High risk (last activity 6 to 12 months, no recent modification). At the 2.1% monthly decay rate, roughly 12 to 22% of these records have at least one outdated field.
  • Moderate risk (last activity 3 to 6 months, modified within 12 months). Worth verifying before building campaigns around them, but most are still usable.
  • Low risk (active within 3 months or recently modified). These are your clean records. Build campaigns here first while you fix the rest.
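The four tiers can be expressed as a small rule function. The text leaves some boundaries open, so this sketch makes its assumptions explicit: "recently modified" means within 3 months, "no recent modification" means not modified in 12+ months, a missing date counts as infinitely old, and any combination the tiers do not name falls back to moderate.

```python
from datetime import date
from typing import Optional


def months_between(earlier: Optional[date], later: date) -> float:
    """Whole calendar months from `earlier` to `later`; a missing date is infinitely old."""
    if earlier is None:
        return float("inf")
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def risk_tier(months_since_activity: float, months_since_modified: float) -> str:
    """Assign the staleness risk tiers (boundary choices are assumptions)."""
    if months_since_activity < 3 or months_since_modified < 3:
        return "low"
    if months_since_modified >= 12:
        if months_since_activity >= 12:
            return "critical"
        if months_since_activity >= 6:
            return "high"
    return "moderate"
```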

The staleness report is also where you find ghost records: contacts at companies that have been acquired, merged, or shut down. Claude Code can cross-reference company domains against known domain status if you give it the data, or it can flag companies with zero activity across all contacts as candidates for manual verification. For email-specific validation, email finder and verification tools can confirm which addresses are still deliverable before you waste sends on dead mailboxes.

6. Phase 4: Field Standardization

Inconsistent data does not show up in fill rate reports because the fields are technically filled. But "VP of Sales" and "Vice President, Sales" and "VP Sales" look like three different titles to your lead scoring model, your routing rules, and your segmentation filters.

Claude Code excels at this. Ask it to "analyze the job title field and group similar titles into standardized categories. Show me the current variants and your recommended standard for each."

Common standardization tasks Claude Code handles well:

Job titles. Map variants to a consistent taxonomy (C-suite, VP, Director, Manager, Individual Contributor). This fixes lead scoring and routing immediately.

Company names. Standardize to official names ("Salesforce" not "salesforce.com" or "SFDC"). This is a prerequisite for accurate account-based reporting.

Location fields. Standardize state names (CA vs. California vs. Calif.), country names (US vs. United States vs. USA), and city spellings.

Industry classifications. Map free-text industry entries to a standard taxonomy like NAICS or your own internal classification.

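Phone number formats. Add country codes, strip spaces, dashes, parentheses, and other punctuation, and standardize to E.164 format (a leading + followed by the country code and subscriber number, at most 15 digits).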
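Two of these mappings, sketched in Python. The seniority rules are a hypothetical regex taxonomy checked in order, so "Vice President" is caught before the bare "President" rule. The phone helper handles only simple cases for a single default country code (assumed +1 here); for international data, `phonenumbers`, the Python port of Google's libphonenumber, is the more robust choice.

```python
import re
from typing import Optional

# Ordered (label, pattern) rules; the first match wins. VP comes before
# C-suite so that "Vice President" is not caught by the "president" pattern.
SENIORITY_RULES = [
    ("VP", r"\b(vp|svp|evp|vice president)\b"),
    ("C-suite", r"\b(chief|ceo|cfo|cto|coo|cmo|cro|founder|president)\b"),
    ("Director", r"\b(director|head of)\b"),
    ("Manager", r"\b(manager|mgr)\b"),
]


def seniority(title: str) -> str:
    """Map a free-text job title onto a hypothetical seniority taxonomy."""
    t = (title or "").lower()
    for label, pattern in SENIORITY_RULES:
        if re.search(pattern, t):
            return label
    return "Individual Contributor" if t.strip() else "Unknown"


def to_e164(raw: str, country_code: str = "1", national_length: int = 10) -> Optional[str]:
    """Normalize a phone string to E.164, or return None when it is ambiguous."""
    digits = re.sub(r"\D", "", raw or "")  # drop spaces, dashes, parentheses, dots
    if not digits:
        return None
    if raw.strip().startswith("+"):
        e164 = "+" + digits  # already carries a country code
    elif len(digits) == national_length:
        e164 = "+" + country_code + digits
    elif len(digits) == len(country_code) + national_length and digits.startswith(country_code):
        e164 = "+" + digits
    else:
        return None  # leave for manual review rather than guess
    return e164 if len(e164) - 1 <= 15 else None  # E.164 allows at most 15 digits
```

Returning `None` instead of guessing keeps ambiguous numbers, such as seven-digit local numbers or numbers with extensions, in the manual-review pile.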

Claude Code produces a mapping table for each field showing the current messy value, the proposed standard value, and how many records are affected. Review the mappings, approve them, and Claude Code generates an import-ready CSV with the corrections applied.
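The propose, review, apply loop behind those mapping tables can be sketched in plain Python. The `standardize_state` function and its alias dictionary are hypothetical stand-ins for whichever standardizer you approve; only values that would actually change appear in the table, most common first.

```python
from collections import Counter

# Hypothetical alias table; extend it or swap in your own standardizer.
STATE_ALIASES = {"california": "CA", "calif.": "CA", "ca": "CA", "new york": "NY", "ny": "NY"}


def standardize_state(value: str) -> str:
    """Return the standard state code for a known alias, else the trimmed value."""
    v = value.strip()
    return STATE_ALIASES.get(v.lower(), v)


def mapping_table(values, standardize):
    """Rows of (current_value, proposed_value, records_affected), most common first."""
    counts = Counter(v.strip() for v in values if v and v.strip())
    return [
        (current, standardize(current), n)
        for current, n in counts.most_common()
        if standardize(current) != current  # skip values that are already standard
    ]


def apply_mapping(rows, field, approved):
    """Return copies of `rows` with approved {current: proposed} replacements applied."""
    out = []
    for row in rows:
        new = dict(row)
        value = (new.get(field) or "").strip()
        if value in approved:
            new[field] = approved[value]
        out.append(new)
    return out
```

Keeping the approval step as a plain dict means a reviewer can delete any mapping they disagree with before `apply_mapping` runs; `csv.DictWriter` can then write the corrected rows to an import file.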

The time savings here are dramatic. Standardizing job titles manually across 10,000 records can take two to three days. Claude Code does it in minutes and catches edge cases a human would miss, like abbreviations, misspellings, and non-English title formats.

7. Phase 5: Re-enrichment Planning

The audit has now identified three categories of records that need attention: records with missing fields (completeness gaps), records with outdated information (staleness issues), and records with formatting problems (standardization fixes).

Standardization is handled by the mapping tables from Phase 4. That is a data transformation Claude Code can perform directly. But completeness and staleness are different. You cannot fill in missing phone numbers or update stale job titles from the data you already have. You need fresh data from external sources.

This is where the audit connects to your enrichment infrastructure. Claude Code produces an enrichment specification: a list of records that need updating, which fields need filling or refreshing, and the priority order based on lifecycle stage and potential deal value.

A typical enrichment spec from an audit looks like this:

Tier 1 (enrich immediately). Open opportunity contacts missing phone numbers or with job titles older than 12 months. These records directly affect active deals.

Tier 2 (enrich this week). SQLs and MQLs with incomplete firmographic data. Missing employee count, revenue, and industry data means your ICP scoring is unreliable for these records.

Tier 3 (enrich this month). All contacts with 6+ month staleness and fewer than three filled fields. These need a full refresh to be usable.

Tier 4 (evaluate for removal). Contacts with 18+ months of zero activity, no valid email, and no recent enrichment. It may cost less to remove and re-source these than to enrich them.
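Expressed as rules, the four tiers look something like the sketch below. Several thresholds are assumptions the text leaves open: "open opportunity" is read as a lifecycle stage of `opportunity`, "no recent enrichment" as nothing in 12 months, and Tier 4 is checked before Tier 3 so removal candidates are not queued for a full refresh. The `Contact` type is hypothetical; the month counts would come from the date columns in your export.

```python
from dataclasses import dataclass, field
from typing import Optional

FIRMOGRAPHIC_FIELDS = ("employee_count", "revenue", "industry")


@dataclass
class Contact:  # hypothetical shape; derive the month counts from your export's dates
    stage: str
    months_since_activity: float
    months_since_title_update: float
    email_valid: bool
    months_since_enriched: float
    fields: dict = field(default_factory=dict)

    def has(self, name: str) -> bool:
        return bool(str(self.fields.get(name) or "").strip())


def enrichment_tier(c: Contact) -> Optional[int]:
    """Return the enrichment tier (1 to 4) for a contact, or None if no action is needed."""
    stage = c.stage.strip().lower()
    if stage == "opportunity" and (not c.has("phone") or c.months_since_title_update >= 12):
        return 1
    if stage in ("sql", "mql") and not all(c.has(f) for f in FIRMOGRAPHIC_FIELDS):
        return 2
    if c.months_since_activity >= 18 and not c.email_valid and c.months_since_enriched >= 12:
        return 4  # checked before Tier 3 so removal candidates are not re-enriched
    if c.months_since_activity >= 6 and sum(c.has(f) for f in c.fields) < 3:
        return 3
    return None
```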

For Tier 1 and 2 records, waterfall enrichment is the most effective approach. Single-source enrichment fills about 50 to 60% of missing fields. A waterfall approach cascading through multiple providers pushes that to 85 to 95% fill rates. The enrichment spec Claude Code produces maps directly to the input format most data enrichment platforms expect: a list of domains or email addresses with the specific fields to fill.
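A waterfall tries providers in priority order for each missing field and stops at the first one that returns a value. The sketch below uses hypothetical in-memory providers so it runs offline; real enrichment APIs, their field names, and their lookup keys will differ. It also records which provider filled each field, which is useful input for the change log at import time.

```python
from typing import Callable, Optional

# A provider takes (company_domain, field_name) and returns a value or None.
Provider = Callable[[str, str], Optional[str]]


def waterfall_enrich(record: dict, fields_to_fill: list, providers: list[tuple[str, Provider]]):
    """Fill each empty field from the first provider (in priority order) that has it."""
    enriched = dict(record)
    sources = {}
    domain = (record.get("company_domain") or "").strip().lower()
    for f in fields_to_fill:
        if (enriched.get(f) or "").strip():
            continue  # never spend a lookup on a field that is already filled
        for name, lookup in providers:
            value = lookup(domain, f)
            if value:
                enriched[f] = value
                sources[f] = name
                break  # stop at the first hit
    return enriched, sources


# Hypothetical in-memory providers standing in for real enrichment APIs.
PROVIDER_A = {"acme.com": {"industry": "Software"}}
PROVIDER_B = {"acme.com": {"industry": "Tech", "employee_count": "250"}}
PROVIDERS = [
    ("provider_a", lambda d, f: PROVIDER_A.get(d, {}).get(f)),
    ("provider_b", lambda d, f: PROVIDER_B.get(d, {}).get(f)),
]
```

Stopping at the first hit is what keeps a waterfall affordable when providers bill per lookup; the provider order encodes which source you trust most for each field.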

For Tier 3 records, re-enrichment through CRM enrichment tools with broad coverage across firmographic, technographic, and contact data points is the move. These records need more than a single field update. They need a full data refresh.

8. Phase 6: Validation and Import

After enrichment, the updated data needs to go back into your CRM. This is where teams frequently introduce new quality problems by importing data that conflicts with existing fields, overwrites correct values, or creates new duplicates.

Claude Code handles the pre-import validation. Drop the enriched CSV into your project and ask Claude Code to "compare the enriched data against the original export. Flag any conflicts where the enriched value differs from the existing CRM value. Show me the conflicts by field and recommend whether to overwrite, skip, or flag for manual review."

The conflict resolution logic follows a priority hierarchy:

  • Always overwrite. Empty fields getting filled for the first time. No conflict.
  • Overwrite with verification. Stale fields (last modified 12+ months ago) where the enriched value is different. The new data is likely more current.
  • Manual review. Fields that were recently modified (within 6 months) where the enriched value differs. Your team may have manually corrected something the enrichment source does not reflect.
  • Never overwrite. Protected fields like lead source, lifecycle stage, deal owner, or any field your team has flagged as manually maintained.
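The hierarchy maps onto a small decision function plus a change log. This sketch assumes a single record-level `last_modified` date (CRM exports typically expose last-modified per record, not per field), routes the 6-to-12-month window the rules leave unassigned to manual review, and uses hypothetical protected field names.

```python
from datetime import date, timedelta
from typing import Optional

PROTECTED_FIELDS = {"lead_source", "lifecycle_stage", "owner"}  # hypothetical names


def resolve(field: str, current, enriched, last_modified: Optional[date], today: date) -> str:
    """Apply the overwrite hierarchy to one field of one record."""
    current = (current or "").strip()
    enriched = (enriched or "").strip()
    if not enriched or enriched == current:
        return "skip"  # nothing new to import
    if field in PROTECTED_FIELDS:
        return "never_overwrite"
    if not current:
        return "overwrite"  # filling an empty field: no conflict
    if last_modified is None or last_modified <= today - timedelta(days=365):
        return "overwrite_with_verification"  # stale: new data is likely more current
    return "manual_review"  # modified within the last 12 months


def build_import(original: dict, enriched: dict, fields: list, last_modified, today: date):
    """Return the row to import plus change-log entries for every non-skip decision."""
    row = dict(original)
    log = []
    for f in fields:
        decision = resolve(f, original.get(f), enriched.get(f), last_modified, today)
        if decision == "skip":
            continue
        applied = decision in ("overwrite", "overwrite_with_verification")
        if applied:
            row[f] = enriched[f].strip()
        log.append({"id": original.get("id"), "field": f,
                    "old": original.get(f) or "", "new": (enriched.get(f) or "").strip(),
                    "rule": decision, "applied": applied})
    return row, log
```

Writing `log` out with `csv.DictWriter` produces the kind of audit trail described next.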

Claude Code applies these rules, produces the validated import file, and generates a change log documenting every modification. That change log matters for accountability. When someone asks "why did this contact's title change?" three months from now, you have a record.

9. Phase 7: Building Your Recurring Audit Skill

A one-time audit fixes the current state. A recurring audit prevents the next decay cycle. This is where Claude Code skills turn a manual process into a system.

Build a CRM audit skill that encodes everything from this guide: the completeness scan, the duplicate detection logic, the staleness thresholds, the standardization mappings, the enrichment tier definitions, and the import validation rules. Store it in .claude/skills/crm-audit/SKILL.md so it loads automatically whenever you run the audit.

A practical maintenance schedule:

  • Weekly. Run the completeness scan on new records added in the past 7 days. Catch gaps early before they compound.
  • Monthly. Run the full duplicate detection and staleness check. Feed Tier 1 and 2 records into your enrichment workflow for refreshing.
  • Quarterly. Run the full audit including field standardization, enrichment spec generation, and the validated import cycle. Update your CLAUDE.md with any changes to your ICP definition, field standards, or enrichment priorities.

The compounding effect is significant. Teams that run weekly completeness checks and monthly staleness audits report that their quarterly full-audit results improve steadily over time, with fewer issues surfacing each cycle because problems get caught and fixed before they spread.

For agencies managing multiple client CRMs, the skill becomes a template. Customize the field mappings, staleness thresholds, and enrichment tiers per client, but the audit structure stays consistent. One well-built skill scales across every client engagement.

FAQ

How long does a full CRM data quality audit take with Claude Code?

The Claude Code portion, from export to audit report, takes 15 to 30 minutes for a database of 10,000 to 30,000 records. The total timeline including enrichment and import depends on your enrichment provider's turnaround. Most teams complete the full cycle in two to three days, compared to one to two weeks for a fully manual audit.

What CRM systems does this work with?

Any CRM that can export to CSV. The audit process is CRM-agnostic because it operates on exported files, not directly inside the CRM. HubSpot, Salesforce, Pipedrive, Close, Zoho, and any other system with a CSV export function all work.

How often should we run a CRM data quality audit?

Weekly completeness checks on new records, monthly staleness and duplicate scans, and quarterly full audits is the cadence most RevOps teams settle on. The exact frequency depends on your database size and how quickly your market moves. High-turnover industries like tech and SaaS warrant more frequent audits.

Can Claude Code fix the data, or does it just identify problems?

Both, but with boundaries. Claude Code can fix structural issues directly: standardize field formats, generate deduplication recommendations, and produce import-ready files with corrections applied. It cannot fix accuracy problems, such as outdated job titles or stale phone numbers, because that requires fresh data from external sources. That is where the re-enrichment step comes in.

How does this differ from built-in CRM data quality tools?

Most CRM-native data quality features focus on completeness (required fields) and basic deduplication (exact match). They do not assess staleness, cross-reference decay rates, produce enrichment specifications, or generate validated import files with conflict resolution logic. Claude Code handles the analysis and planning layers that CRM-native tools miss, while the CRM handles the storage and workflow layers that Claude Code does not replace.

 
