
B2B Audience Generation [In-Depth Tutorial]

March 17, 2021

10 minutes


Welcome to the Use Case Series! This article is the first in a series of in-depth tutorials exploring several key business applications of B2B data, along with the APIs, tools, and data provided by People Data Labs (PDL) to enable these use cases. An interactive code implementation accompanies each tutorial, along with in-depth explanations of each step of the process. For these tutorials, some basic familiarity with Python and Google Colaboratory will be helpful.

In this first tutorial, we’ll walk through how we can use B2B data to generate a custom audience for a targeted advertising campaign. Along the way, we will explore PDL’s Company Enrichment API and Person Search API. Here’s everything you will need to follow along:

  1. Starter Company List (from PDL’s Free Company Dataset)

  2. PDL API Key (with Search API access activated)

  3. Custom Audience Generation Pipeline (Colab Script)

With that said, let’s get started!

Audience Generation Scenario

Let’s imagine we are a company that sells products and services to SaaS companies. Currently, we are looking for ways to increase our presence in markets on the east coast, and so our marketing team has decided to run a targeted ad campaign to drive more inbound traffic. Let’s assume we have built an initial list of possible target companies (i.e. our starter company data) containing large tech-related companies located in New York City, which we would like to narrow down. We know that our ideal buyer persona is a decision maker working at a large SaaS company, and so these are the types of people we would like to build our custom audience from. Custom Audiences allow us to reach select demographics with targeted messaging, which translates to better traction, larger top-of-funnel numbers, and an improved ROI on marketing spend.

Now that we have our task defined, let’s take a look at how we can accomplish it.

Overview of the Process

  1. Setup: Getting an API key, downloading the starter data, and configuring the Colab script

  2. Importing Company Data: Loading our company data into the Colab script

  3. Company Enrichment: Using the Company Enrichment API to pull additional metadata for each company in our dataset

  4. Company Targeting: Narrowing down our list of target companies using the enrichment data

  5. Audience Generation: Building our custom audience using the Person Search API

  6. Exporting Audience Profiles: Formatting our audience list for different marketing platforms


Getting a Free PDL API Key

First, let’s walk through the process of signing up for a PDL API key, which we’ll need to access the various API endpoints (like Company Enrichment and Person Search). You can sign up for an API key by filling out the API Signup Form.


API signup form

Please note: we recommend using a business email rather than a personal email, since all personal emails must first undergo a manual review as part of our spam filtering process. 

Once you have filled out the form, a validation email will be sent to your email address. Click the link to validate your email and activate your API key. After that, navigate to your API dashboard to see your API key. (Remember this is your key - don’t share it with others!)


Account verification page after submitting API signup form

For more information, check out our Self-Signup API Quickstart Guide.

Getting Starter Data

Next, we’ll download the starter company data. In this scenario, this data represents a list of potential target companies that we’d like to narrow down. This dataset can be downloaded from the link here, or alternatively you can generate this list of companies yourself using the Free Company Dataset and this customizable filtering script. You don’t need to use our free company dataset either! Many of our customers generate their own target account lists via other public company data, their internal CRM, or an intent marketing solution like Madison Logic.

Setting Up Colab

Finally, let’s do a couple quick things to get set up using Google Colab: 

Open up the Colab notebook, and make a copy of it by clicking “File -> Save a copy” from the toolbar.


Copying the colab notebook

Next, we’ll run the Setup section of the script. Enter your API key into the cell and hit “shift+enter” to run the cell. Run the next cell block as well and then upload the starter company dataset by clicking the button that appears.

If you aren’t familiar with Colab and how it works, take a quick look at this Introduction to Google Colaboratory notebook, which will give you all the context needed to follow along with this tutorial.


Screenshot after running the cell and uploading company data file

At this point, our Colab script is now fully set up and we can run the entire script end-to-end. In the remainder of this tutorial, we’ll walk through the script section by section, and explain what’s happening along the way.

Importing Company Data

In the Setup section, we uploaded our dataset file to Colab. Now, in order to use that data, we need to import it from the CSV file.

There are many ways to do this in Python, but for this tutorial, we’ll use the pandas library, which lets us easily load the CSV data in a single line:

import pandas as pd

# Load company dataset into pandas dataframe
data = pd.read_csv(filepath, header=0)

We call the read_csv function to load the data from the specified file. Note that header=0 specifies that the first line of the csv file contains a header rather than data. The information from the csv file is loaded into a table-like structure called a pandas DataFrame.

We can use the following code snippet to print out some information about the loaded dataframe:

print(f"Data Size: {data.shape}")
print(f"Data Columns: {data.columns.tolist()}")

Data Size: (100, 8)
Data Columns: ['name', 'website', 'year_founded', 'industry', 'size', 'locality', 'country', 'linkedin_url']

From the output, we can see that our dataframe is a table with 100 rows of company data, and each row has 8 field columns.

Company Enrichment

Now that we have our starter company dataset loaded, we would like to pull some additional information on these companies so that we can further narrow down this list to high-value accounts. We can easily do this using PDL’s Company Enrichment API, which lets us pull down additional information for each company. Enrichment actually serves a few purposes:

  • It allows us to add new fields to our existing company records

  • It allows us to pull down up-to-date information on each company from PDL’s datastore

  • Assuming you’re using a custom dataset, it allows us to standardize company information (such as spellings of names, website urls, locations, etc…)

In this case, we are interested in retrieving PDL’s up-to-date tags on our companies. 

First, we loop over each row (i.e. company) in our dataframe and construct an API request using that company’s information. Then, we collect the responses in a list and build a new dataframe from the results. This is illustrated in the following code snippet:

enriched_companies = []
num_successful_responses = 0

for idx, company in data.iterrows():
    # Set Company Enrichment API parameters
    # We want to find all the company info for a particular company given its linkedin profile
    params = {
        "name": [ company['name'] ],             # e.g. "google"
        "website": [ company['website'] ],       # e.g. ""
        "profile": [ company['linkedin_url'] ],  # e.g. ""
    }

    # Send a single API request
    success, response = send_company_enrichment_api_request(params)

    # Parse the enrichment from the response
    if success:
        # enriched_company is a single profile object
        enriched_company = response
        num_successful_responses += 1
    else:
        # Print if we get an error
        error = response['error']
        print(f"Error: \n\tType: {error['type']}\n\tMessage: {error['message']}")
        enriched_company = {}

    enriched_companies.append(enriched_company)

# Print Summary:
print(f"Total Number of Responses Received {len(enriched_companies)}")
print(f"Number of Successful Enrichment Matches: {num_successful_responses}")

# Create DataFrame of enriched companies
enriched_data = pd.DataFrame(enriched_companies)

The two key aspects of this code block are the params object, which is how we specify which company we would like to enrich, and the send_company_enrichment_api_request function, which handles sending the params to the Company Enrichment API endpoint. For more information on the details of this process, see PDL’s Company Enrichment API Documentation.
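The notebook's send_company_enrichment_api_request helper is not reproduced in this article. As a rough idea of what such a helper might do, here is a minimal sketch using only the Python standard library. The endpoint URL, the X-Api-Key header, the extra api_key and fetch arguments, and the fallback error shape are all assumptions made for illustration, so please check the Company Enrichment API documentation for the authoritative details.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed endpoint; confirm it against the Company Enrichment API documentation.
PDL_COMPANY_ENRICH_URL = "https://api.peopledatalabs.com/v5/company/enrich"


def fetch_json(url, headers):
    """Perform a GET request and return (http_status, parsed_json_body)."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        raw = err.read().decode("utf-8")
        try:
            return err.code, json.loads(raw)
        except json.JSONDecodeError:
            # Fall back to the error shape used by the loop in this tutorial.
            return err.code, {"error": {"type": "http_error", "message": raw}}


def send_company_enrichment_api_request(params, api_key, fetch=fetch_json):
    """Hypothetical helper: returns (success, response_dict)."""
    # doseq=True expands list values such as {"name": ["google"]}.
    query = urllib.parse.urlencode(params, doseq=True)
    url = f"{PDL_COMPANY_ENRICH_URL}?{query}"
    status, body = fetch(url, {"X-Api-Key": api_key})
    return status == 200, body
```

Passing the HTTP call in as the fetch argument is a design choice of this sketch: it lets you swap in a stub while experimenting, so you do not spend API credits while debugging the rest of the pipeline.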

At the end of this process, we will have a standardized table of all the available enrichment information for each company in our starter dataset. This includes data points like:

  • Size

  • Founding year

  • Website and email domains

  • Associated social media profiles (e.g. linkedin, facebook, twitter)

  • Tags

  • Summary description

  • Ticker symbol (and whether company is private or public)

  • And more

See an example of a full response here: Company Enrichment API - Full Response

Company Targeting

Our next step is narrowing down our list of starter companies. While there is a wealth of information in the enrichment data, we are specifically interested in using PDL’s company tags to focus our search.

To build our list of target companies, we will look for companies specifically tagged with SaaS:

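target_tags = [
    "saas"
    # Add other relevant tags here
]

# Note: the dataframe used here needs a 'tags' column, i.e. it must hold the enriched company data
mask = data['tags'].apply(lambda tags: lists_share_element(tags, target_tags))
target_companies = data[mask]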

In this code snippet, we are first constructing a list of target tags that we are interested in, and then searching for all the companies that contain any one of the tags in our list of target tags.
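The lists_share_element helper used above is defined in the notebook rather than in this article. A minimal sketch of how such a helper could work is shown below. The case-insensitive comparison and the handling of missing tags are assumptions of this sketch, not necessarily what the notebook does.

```python
def lists_share_element(candidate, targets):
    """Return True if `candidate` and `targets` have at least one item in common.

    `candidate` may be None or NaN (a float) for companies whose enrichment
    failed, in which case there is nothing to match.
    """
    if not isinstance(candidate, (list, tuple, set)):
        return False
    normalized_targets = {str(t).strip().lower() for t in targets}
    return any(str(tag).strip().lower() in normalized_targets for tag in candidate)


print(lists_share_element(["SaaS", "fintech"], ["saas"]))  # True
print(lists_share_element(["hardware"], ["saas"]))         # False
print(lists_share_element(float("nan"), ["saas"]))         # False
```

The check for non-list values matters because a company with a failed enrichment is stored as an empty record, so its tags cell ends up empty when the dataframe is built.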

After this process, we are left with 11 companies in our target list:

print(f"Data Size: {target_companies.shape}")

Data Size: (11, 21)

Audience Generation

Now all that remains is simply finding the right type of employees working at these target companies. We can do this using the Person Search API, which allows us to query the entire database of PDL data for very specific segments of people data. 

The following code snippet shows how we can use the API to find people that match our target buyer persona:

# (Excerpt from inside a function, hence the `return` further down)

# Get company urls
company_linkedin_urls = data['linkedin_url'].values.tolist()

# Define our query for the Person Search API
if not use_sql:
    # Here's a query using Elasticsearch's Query Syntax
    QUERY = {
        "query": {
            "bool": {
                "must": [
                    { "terms": { "job_title_levels": ["manager", "owner", "director", "cxo", "vp", "partner", "senior"] } },
                    { "terms": { "job_company_linkedin_url": company_linkedin_urls } },
                    { "exists": { "field": "work_email" } }
                ]
            }
        }
    }
elif use_sql:
    # Here's the same query as above using SQL
    QUERY = f"""
        SELECT * FROM person
        WHERE job_title_levels IN ('manager', 'owner', 'director', 'cxo', 'vp', 'partner', 'senior')
        AND job_company_linkedin_url IN {tuple(company_linkedin_urls)}
        AND work_email IS NOT NULL;
    """

# Send the API request
success, response = send_person_search_request(QUERY, use_sql, size=100, start_from=0)

# Parse Response
if not success:
    print(f"Error from Person Search API: \n{response}\n\nCould not generate audience")
    return pd.DataFrame()

person_matches = []
num_successful_matches = 0
total_available_matches = response['total']

# Check total number of available matches
print(f"Total Number of Matches: {total_available_matches}")
if (total_available_matches < num_desired_matches):
    print(f"WARNING: Person Search API has only found [{total_available_matches}] total matches which is less than the desired number matches [{num_desired_matches}]")
    num_desired_matches = total_available_matches

# Retrieve all matches batch-by-batch
while success and num_successful_matches < num_desired_matches:
    person_matches += response['data']
    num_successful_matches += len(response['data'])
    # Get the next batch of matches
    success, response = send_person_search_request(QUERY, use_sql, size=100, start_from=num_successful_matches)

# Create DataFrame from parsed matches
person_data = pd.DataFrame(person_matches)

The most important parts of the above code block are the query construction and the sending/retrieving of the query results. Let’s take a closer look at these two pieces.

Building our Person Search Query

The PDL Person Search API is built on top of Elasticsearch, and the API itself exposes a subset of the Elasticsearch query syntax (see the exact specifications). What this means is that we can query the PDL datastore using the Elasticsearch query syntax or SQL. Here’s what that looks like from the previous code snippet, which demonstrates the same query in both elasticsearch syntax and SQL:

# Define our query for the Person Search API
if not use_sql:
    # Here's a query using Elasticsearch's Query Syntax
    QUERY = {
        "query": {
            "bool": {
                "must": [
                    { "terms": { "job_title_levels": ["manager", "owner", "director", "cxo", "vp", "partner", "senior"] } },
                    { "terms": { "job_company_linkedin_url": company_linkedin_urls } },
                    { "exists": { "field": "work_email" } }
                ]
            }
        }
    }
elif use_sql:
    # Here's the same query as above using SQL
    QUERY = f"""
        SELECT * FROM person
        WHERE job_title_levels IN ('manager', 'owner', 'director', 'cxo', 'vp', 'partner', 'senior')
        AND job_company_linkedin_url IN {tuple(company_linkedin_urls)}
        AND work_email IS NOT NULL;
    """

The Elasticsearch query syntax defines queries using nested JSON-like objects (i.e. dicts in Python). The first two nested objects (i.e. "query" and "bool") mark our query as a boolean query. The array of elements associated with the "must" key specifies the criteria that every search result must satisfy. In other words, we use the elements of the "must" array to define our query. In the above example, we have the following 3 query criteria:

  1. The person profile must have one of the following job title levels: ["manager", "owner", "director", "cxo", "vp", "partner", "senior"]. Note that these job title levels are from PDL’s set of canonical job levels.

  2. The person profile must have an associated employer linkedin that is in our list of target company linkedins. In other words, they must be working at one of our target companies.

  3. The person profile must have a work email associated with it.

In the SQL version of this query, we build a single string containing the query parameters. The “SELECT” clause defines the fields we want, the “FROM” clause defines which index we want to search, and the “WHERE” clause specifies our search parameters. The “WHERE” clause in SQL is directly analogous to the “bool” object in the Elasticsearch query syntax.

For either query format, any search result that satisfies all three criteria will be returned by the API as a valid match. Hopefully, it is apparent how this query targets our ideal buyer persona: we specify that we want matches with upper-level job titles (1), who are employed by one of our target companies (2), and who have an available work email (3) that we can use to build our custom audience.
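If you want to reuse this query with different job levels or company lists, it can help to wrap the construction in small functions. The sketch below is one hypothetical way to do that; the function names are ours, not part of the notebook. It also sidesteps a quirk of interpolating a Python tuple into SQL: a one-element tuple renders as ('x',) with a trailing comma, which is not a valid SQL list. Escaping single quotes by doubling them is standard SQL, but whether the Person Search API's SQL dialect accepts it is an assumption.

```python
def build_es_query(job_levels, company_linkedin_urls):
    """Build the Elasticsearch-style query dict for our three criteria."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"terms": {"job_title_levels": list(job_levels)}},
                    {"terms": {"job_company_linkedin_url": list(company_linkedin_urls)}},
                    {"exists": {"field": "work_email"}},
                ]
            }
        }
    }


def sql_list(values):
    """Render values as a SQL list literal, e.g. ('a', 'b')."""
    quoted = ", ".join("'" + str(v).replace("'", "''") + "'" for v in values)
    return f"({quoted})"


def build_sql_query(job_levels, company_linkedin_urls):
    """Build the equivalent SQL query string."""
    return (
        "SELECT * FROM person "
        f"WHERE job_title_levels IN {sql_list(job_levels)} "
        f"AND job_company_linkedin_url IN {sql_list(company_linkedin_urls)} "
        "AND work_email IS NOT NULL;"
    )


print(build_sql_query(["vp"], ["linkedin.com/company/example"]))
```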

Sending the Query and Retrieving the Results

Once we have built our search query, we must send it to the API endpoint and pull down the results. Similar to the Company Enrichment API, the send_person_search_request function (shown in the audience generation code snippet) handles the process of sending the API request. For more information on the details of this process, see PDL’s Person Search API Documentation.

However, unlike the Company Enrichment API, which returns a single result for each query, the Person Search API returns multiple results (anywhere from 0 to 10,000, depending on the query criteria), and so some additional steps are involved to retrieve all the results.

For each query request, if there are any successful matches, the API response will indicate the total number of matches found across the entire database. Here’s an example of what a successful result looks like:

{
    "status": 200,
    "data": [
        {
            "id": "qEnOZ5Oh0poWnQ1luFBfVw_0000",
            "full_name": "sean thorne",
            ...
        },
        ...
    ],
    "total": 99
}

The total field indicates that there were 99 successful matches for our query, while the data field is an array containing a fixed number of profile matches (i.e. a batch). Each profile contains information on any subset of the fields described in our Person Schema (and these are all the fields that can be used in our query specification).
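To make the response structure concrete, here is a small sketch that pulls the total and the work emails out of a response dict shaped like the one above. The summarize_search_response name and the sample profiles are made up for illustration; real profiles contain many more fields.

```python
def summarize_search_response(response):
    """Return (total_matches, work_emails_in_this_batch) from a response dict."""
    if response.get("status") != 200:
        return 0, []
    profiles = response.get("data", [])
    # Skip profiles whose work_email is missing or empty.
    emails = [p["work_email"] for p in profiles if p.get("work_email")]
    return response.get("total", 0), emails


# Made-up example data in the same shape as the response shown above.
example = {
    "status": 200,
    "data": [
        {"id": "a1", "full_name": "jane doe", "work_email": "jane@example.com"},
        {"id": "b2", "full_name": "john roe", "work_email": None},
    ],
    "total": 99,
}

total, emails = summarize_search_response(example)
print(total, emails)  # 99 ['jane@example.com']
```

Note that total counts every match in the datastore, while the list of emails only covers the batch contained in this one response.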

Note about Batch Sizes

The Person Search API limits the number of profiles that can be returned in a single response (i.e. the batch size must be a number between 1 and 100). Therefore, in order to pull more than 100 profiles, we need to send multiple requests, using the “from” parameter in each request to specify where the batch begins. The figure below demonstrates this concept for pulling 25 matches with a batch size of 5:


Batch by batch retrieval of query results
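The same idea can be expressed in a few lines of Python. This sketch computes the starting offset and size of each batch for the 25-match, batch-size-5 case from the figure; batch_offsets is a hypothetical helper name used only for this illustration.

```python
def batch_offsets(num_desired, batch_size):
    """Return the (start_from, size) pair for every batch needed."""
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    return [
        (start, min(batch_size, num_desired - start))
        for start in range(0, num_desired, batch_size)
    ]


print(batch_offsets(25, 5))
# [(0, 5), (5, 5), (10, 5), (15, 5), (20, 5)]
```

When the desired count is not a multiple of the batch size, the last batch is simply smaller, e.g. batch_offsets(23, 5) ends with (20, 3).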

Thus, we can easily pull a large number of results using multiple batches, as illustrated in the following code snippet (a subset of the previous audience code block):

# Send the API request
success, response = send_person_search_request(QUERY, use_sql, size=100, start_from=0)

person_matches = []
num_successful_matches = 0
total_available_matches = response['total']

# Check total number of available matches
if (total_available_matches < num_desired_matches):
    print(f"WARNING: Person Search API has only found [{total_available_matches}] total matches which is less than the desired number matches [{num_desired_matches}]")
    num_desired_matches = total_available_matches

# Retrieve all matches batch-by-batch
while success and num_successful_matches < num_desired_matches:
    person_matches += response['data']
    num_successful_matches += len(response['data'])
    # Get the next batch of matches
    success, response = send_person_search_request(QUERY, use_sql, size=100, start_from=num_successful_matches)

This code demonstrates sending the API query, extracting the total number of matches, and then iteratively retrieving the desired number of results batch-by-batch. 
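If you would like to experiment with this retrieval pattern without spending API credits, the following self-contained sketch runs the same kind of loop against a stand-in search function. Both fetch_all_matches and fake_search are hypothetical names for this illustration, and the loop is a variation on the notebook's: it requests only as many profiles as are still needed and stops when a batch comes back empty.

```python
def fetch_all_matches(search, num_desired, batch_size=100):
    """Collect up to `num_desired` profiles from a paginated search function.

    `search(size, start_from)` must return (success, response_dict), mirroring
    the helper used in this tutorial.
    """
    matches = []
    while len(matches) < num_desired:
        size = min(batch_size, num_desired - len(matches))
        success, response = search(size=size, start_from=len(matches))
        if not success or not response.get("data"):
            break  # stop on errors or when there are no more results
        matches.extend(response["data"])
    return matches


# Stand-in for the real API: pretends the datastore holds 230 matching profiles.
FAKE_PROFILES = [{"id": i} for i in range(230)]


def fake_search(size, start_from):
    batch = FAKE_PROFILES[start_from:start_from + size]
    return True, {"status": 200, "data": batch, "total": len(FAKE_PROFILES)}


print(len(fetch_all_matches(fake_search, num_desired=250)))  # 230
print(len(fetch_all_matches(fake_search, num_desired=120)))  # 120
```

Stopping on an empty batch is a small safeguard: without it, a loop that keeps receiving successful but empty responses would never make progress.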

Person Search API Summary

Using our list of target companies, we saw how to construct a query to find upper-level employees at these companies, send the query to the Person Search API endpoint, and retrieve the profile matches in multiple batches. The output of this process is the target audience that we initially set out to build!

Exporting Audience Profiles

Finally, once we have our target audience profiles, we can easily export our set of person matches to the formats accepted by various audience campaign platforms. For our example, let’s say that we would like to use Twitter's Custom Audiences Tool for running our ad campaign, which supports csv lists of email addresses. This is implemented in the following code snippet:

# person_data is the DataFrame of person matches built in the previous step
person_data['work_email'].to_csv(filename, header=False, index=False)

Similarly, we could generate a file for Facebook's Custom Audiences Platform using their customer list format as follows:

headers = ["email", "phone", "fn", "ln"]
fields = ["work_email", "mobile_phone", "first_name", "last_name"]
# person_data is the DataFrame of person matches built in the previous step
person_data[fields].to_csv(filename, header=headers, index=False)
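Before uploading a list, you may want to drop empty or duplicate email addresses. The sketch below shows one way to do that with only the standard library, working directly on the list of profile dicts returned by the API. The write_email_list name, the sample profiles, and the choice to lowercase emails are assumptions of this sketch, so check each platform's formatting requirements before uploading.

```python
import csv
import io


def write_email_list(profiles, out):
    """Write one unique, non-empty work email per line (no header row)."""
    writer = csv.writer(out, lineterminator="\n")
    seen = set()
    for profile in profiles:
        email = (profile.get("work_email") or "").strip().lower()
        if email and email not in seen:
            seen.add(email)
            writer.writerow([email])


# Made-up profiles for illustration.
profiles = [
    {"first_name": "jane", "work_email": "jane@example.com"},
    {"first_name": "jane", "work_email": "Jane@Example.com"},  # duplicate
    {"first_name": "sam", "work_email": None},                 # missing
    {"first_name": "john", "work_email": "john@example.com"},
]

buffer = io.StringIO()
write_email_list(profiles, buffer)
print(buffer.getvalue())
```

Writing to a file works the same way: open it with open(filename, "w", newline="") and pass the file object as out.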

And just like that, we have multiple custom audience files that we can directly upload and start running campaigns against!

Reviewing What We’ve Accomplished

In this tutorial, we took a rough list of potential target companies and used it to build a highly targeted audience of person profiles that we can directly use for an advertising campaign. In our example, we wanted to find a list of upper-level employees at SaaS companies based on the east coast, which we accomplished by:

  1. Loading the starter company dataset

  2. Running those companies through the PDL Company Enrichment API to get associated tags for each company

  3. Using this enrichment information to target just the companies tagged as SaaS

  4. Using the PDL Person Search API to find upper-level employees working at these target companies

  5. Turning this list of person matches into an audience list that can be directly uploaded to various marketing platforms (in this example, Twitter and Facebook)

Along the way, we also stepped through a complete Python implementation of this audience generation pipeline provided by our Colab notebook, which can be customized to support various other use cases such as lead generation, direct outreach, and talent acquisition, among others.


We hope this tutorial illustrates the immediate value that person data can provide, and shows just how easy it is to get started using person data in your business. We encourage you to sign up for a free API key, and give the audience generation colab notebook a spin! As always, please reach out if you have any questions or suggestions, and join us again for the next tutorial in this series!

Need more data or credits? Need help customizing your pipeline? Speak to one of our data consultants today!


Vinay Rajur