Enrichment Data Testing Guide

May 5, 2025

10 minutes

Purpose

Welcome! We’re excited that you’re testing People Data Labs (PDL) to measure how well it fits your core workflow. A well‑designed test establishes a baseline match‑rate expectation, highlights blind spots in your sample, and prevents last‑minute surprises when you transition to production. Treat this as due diligence: verify fit, quantify coverage on your own data, and decide early whether you need to adjust inputs, field bundles, or downstream logic.


Pre‑test Checklist

  1. Create an API key in the API Dashboard

  2. Put together a random, representative sample that is large enough to be statistically meaningful

  3. Ensure each row contains at least one high‑confidence identifier, such as an email address, LinkedIn URL, or website

  4. Ensure you aren’t using role‑based or catch‑all inboxes, e.g. info@acmeco.com

  5. Deduplicate the file so every record counts once

  6. Conduct standard data cleaning, e.g. fix obvious typos and move last names that appear in the first name field
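
Checklist steps 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not a PDL tool: the column names (email, linkedin_url) and the list of role-based prefixes are assumptions to adapt to your own file, and catch-all inboxes cannot be detected this way and need a separate email verification step.

```python
# Hypothetical set of role-based local parts; extend it for your own data.
ROLE_PREFIXES = {"info", "sales", "support", "admin", "contact", "hello"}


def is_role_based(email):
    """Return True when the local part looks like a shared inbox."""
    local_part = email.split("@", 1)[0].strip().lower()
    return local_part in ROLE_PREFIXES


def prepare_sample(rows):
    """Drop role-based emails, rows with no identifier, and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        profile = (row.get("linkedin_url") or "").strip().lower().rstrip("/")
        if email and is_role_based(email):
            email = ""  # a shared inbox is not a person-level identifier
        if not email and not profile:
            continue  # no high-confidence identifier left on this row
        key = (email, profile)
        if key in seen:
            continue  # duplicate record, count it only once
        seen.add(key)
        cleaned.append({**row, "email": email, "linkedin_url": profile})
    return cleaned
```

Rows that differ only in letter case or a trailing slash are treated as duplicates, which is usually what you want before measuring a match rate.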


Best practices

  • Check out the API Dashboard Quickstart guide to get up and running quickly

  • Once you have a solid grasp of the API Dashboard, check out the API Quickstart Guide to start sending your first API calls

  • Your sample should be drawn from your typical live production sources rather than demo records for more accurate results

  • Some inputs, such as a LinkedIn URL, give the highest probability of returning a match. Use them where possible

  • While the data doesn’t need to be perfectly clean, since our algorithm will do some of the work for you, misspelled company names or mismatched first and last names will not typically return a result

  • Verify inbox validity before assuming the dataset is the culprit. Regulatory removals under GDPR or CCPA reduce EU and California coverage, so adjust expectations accordingly
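
For orientation, a first enrichment call for one sample row might look roughly like the sketch below. The endpoint URL, the X-Api-Key header, and the parameter names (profile, email) are assumptions based on the public v5 Person Enrichment API, and the helper names are hypothetical, so confirm the details against the API Quickstart Guide before relying on them.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed endpoint; confirm it against the current PDL API reference.
PDL_ENRICH_URL = "https://api.peopledatalabs.com/v5/person/enrich"


def build_enrich_request(api_key, **identifiers):
    """Build (but do not send) a GET request for one input record."""
    # Drop empty identifiers so blank columns are not sent as parameters.
    params = {name: value for name, value in identifiers.items() if value}
    url = PDL_ENRICH_URL + "?" + urllib.parse.urlencode(params)
    # Assumed header name for passing the API key.
    return urllib.request.Request(url, headers={"X-Api-Key": api_key})


def enrich(api_key, **identifiers):
    """Send the request and return (HTTP status, parsed JSON body or None)."""
    request = build_enrich_request(api_key, **identifiers)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # Non-200 responses (for example, no match found) arrive here.
        return err.code, None
```

Keeping the request builder separate from the network call makes it easy to inspect exactly what is sent for each row, e.g. enrich(api_key, profile="linkedin.com/in/example").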


Sample Size 

Statistical confidence relies on sample size (n) and diversity in the records chosen for the test. Use the table below to pick a minimum record count. Narrow, homogeneous lists require a larger n because correlated attributes (e.g., all employees from one startup) inflate variance and distort averages. Each time you filter by a new variable (country, seniority, etc.), double the sample again to maintain confidence.

Sample Size Scenario table
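
When the table does not cover your scenario, the textbook sample-size formula for a proportion, n = z² × p × (1 − p) ÷ e², gives a rough lower bound. The sketch below is a generic statistical approximation rather than PDL's own method; the function name and the defaults (95% confidence, ±5% margin of error, worst-case rate of 50%) are assumptions, and the doubling per filter follows the rule of thumb stated above.

```python
import math


def min_sample_size(margin_of_error=0.05, z_score=1.96, expected_rate=0.5, filters=0):
    """Rough minimum record count for estimating a match rate."""
    # Textbook formula for a proportion: n = z^2 * p * (1 - p) / e^2.
    base = (z_score ** 2) * expected_rate * (1 - expected_rate) / margin_of_error ** 2
    # Rule of thumb from this guide: double the sample for each filter variable.
    return math.ceil(base) * 2 ** filters
```

With the defaults this gives 385 records, and 1,540 if you plan to slice the results by two variables such as country and seniority.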

“Range of Normal” Match Rates*

  • LinkedIn URLs → 70‑85%

  • B2B contact enrichment with valid work email → 40‑70%

  • Consumer/social enrichment with fresh personal email → 60‑85%

  • Niche segments (stealth, early‑stage, non‑US) → 15‑40%

*Ranges are highly dependent on sample size and use case


Evaluate Results

When responses return, filter the records where status equals 200 to get your total matches. You can compute your overall match rate by dividing your total matches by total inputs. 

Match rate (%) = (Total matches ÷ Total inputs) × 100

(475 ÷ 500) × 100 = 95% match rate
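
The same calculation as a minimal Python sketch, assuming each stored response is a dict with a status key (the function name and record shape are illustrative):

```python
def match_rate(responses):
    """Percentage of inputs whose response status is 200."""
    if not responses:
        return 0.0  # avoid dividing by zero on an empty test run
    matches = sum(1 for response in responses if response.get("status") == 200)
    return 100 * matches / len(responses)
```

Note that the denominator is every input you sent, not only the calls that succeeded.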

Segment the results by dimensions that matter to your business (e.g. industry, company size, region, seniority) to expose coverage dips that a blended strategy or an extra identifier might fix. Compare each segment to the “range of normal” above; if numbers fall outside the expected band, review the lift actions below.
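
A per-segment breakdown could be sketched as follows, assuming each record carries the response status alongside the input attributes you want to slice by (the field names here are hypothetical):

```python
from collections import defaultdict


def match_rate_by_segment(records, segment_key):
    """Return {segment value: match rate in percent} for one dimension."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for record in records:
        # Group rows with a missing attribute together instead of dropping them.
        segment = record.get(segment_key) or "unknown"
        totals[segment] += 1
        if record.get("status") == 200:
            matches[segment] += 1
    return {segment: 100 * matches[segment] / totals[segment] for segment in totals}
```

Small segments produce noisy rates, so check the record count behind each number before acting on it.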

It’s important to understand the field fill rates across our dataset to set reasonable expectations about what a match will contain. For example, as of the April 2025 release, 7.5% of profiles with at least a LinkedIn URL also have a mobile_phone. Expecting 80% of your matches to return a mobile_phone would therefore be unreasonable, because that field doesn’t exist for most profiles.
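
To measure fill rates on your own test results, a sketch like the one below may help. It assumes each matched profile is a dict keyed by field name and treats None, empty strings, and empty lists as unfilled; the function name is illustrative.

```python
def field_fill_rate(matched_profiles, field):
    """Percentage of matched profiles with a non-empty value for `field`."""
    if not matched_profiles:
        return 0.0
    empty_values = (None, "", [], {})
    filled = sum(
        1 for profile in matched_profiles if profile.get(field) not in empty_values
    )
    return 100 * filled / len(matched_profiles)
```

Comparing this number per field against the published fill rates helps separate a weak sample from a field that is simply sparse.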


Lift Actions if Coverage Feels Low

  1. Enrich each record with a second strong identifier such as LinkedIn URL or a sanitized company domain if available; additional signals typically boost matches by several points. 

  2. Standardize company domains (acme-inc.com → acme.com)

  3. Remove accents or special characters from names

  4. Ensure emails are active inboxes rather than stale aliases

  5. Use the required parameter to ensure that returned matches contain the fields you need

  6. Set a reasonable min_likelihood score
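
Actions 2 and 3 lend themselves to a small cleaning pass. The helpers below are an illustrative sketch using only the Python standard library. They strip accents and reduce a URL to a bare domain; mapping an alias such as acme-inc.com to acme.com still needs your own lookup table, which is not shown.

```python
import unicodedata
from urllib.parse import urlparse


def strip_accents(name):
    """Replace accented letters with their unaccented base letters."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sanitize_domain(value):
    """Reduce a URL or host to a lowercase domain without scheme, www, or path."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # lets urlparse treat bare text as a host
    host = urlparse(value).netloc.split(":")[0]
    return host[4:] if host.startswith("www.") else host
```

For example, strip_accents("José Núñez") returns "Jose Nunez" and sanitize_domain("https://www.Acme.com/about") returns "acme.com".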

If enrichment still underperforms, run the list through /person/search, which trades precision for broader recall and can sometimes surface otherwise hidden profiles.


Resources & Self‑help Paths

Sam Bortol