ChatGPT and PDL to Enrich CRM Data
Sales & Marketing

Using ChatGPT with our APIs to Enhance CRM Data

April 6, 2023

10 minutes

Until now, most of my ChatGPT interactions have been purely casual and philosophical: asking its take on happiness, the ethics of art-generation models, and other simple or quirky questions to test the waters. However, following the recent update, I wondered how easy it would be to use it to write some tutorial code for leveraging People Data Labs’ APIs.

TL;DR: it’s a good start, but it needs a few tweaks!

Initial Prompt

I started out with the simple goal of seeing whether it knew of (and could leverage) any part of our API without an explicit description from me.

The first prompt:

You’re a data engineer. Write a program using People Data Labs’ API in python to ingest data into a pandas dataframe.

What ChatGPT came up with:

import requests
import pandas as pd

def get_people_data(api_key, search_params):
    base_url = 'https://api.peopledatalabs.com/v5.0/person'
    headers = {'X-api-key': api_key}
    response = requests.get(base_url, params=search_params, headers=headers)
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error: {response.status_code}")
        return None

def create_dataframe(people_data):
    return pd.DataFrame(people_data)

def main():
    # Replace with your actual API key
    api_key = 'your_api_key_here'
    # Define the search parameters you want to use
    search_params = {
        'q': 'software engineer',  # Search query
        'limit': 10,               # Number of results to return
        'offset': 0                # Starting index for pagination
    }

    people_data = get_people_data(api_key, search_params)
    if people_data:
        df = create_dataframe(people_data)
        print(df)
    else:
        print("No data retrieved from PeopleDataLabs API")

if __name__ == '__main__':
    main()

Honestly, not bad! 

Next, I wanted to test how to make PDL data add value to something, and enriching CRM leads is a good start. I knew I could leverage our Person Enrichment API to add depth to lead data and enable more granular segmentation.

Practical Application

To give ChatGPT a simple contacts data schema, I asked it instead to leverage our Person Enrichment API to add depth to the profiles of our purchasers. The follow-up prompt:

Now, let’s say I also had some sales data that had the schema first_name, last_name, email, sales_order, item_id, date.

  • Get all of the unique first_name, last_name, email combinations in the sales data.

  • Use the People Data Labs Person Enrichment API to enrich these unique combinations.

  • Join that back with the sales data dataframe.

Here’s our new code:

import requests
import pandas as pd

def get_person_data(api_key, first_name, last_name, email):
    base_url = 'https://api.peopledatalabs.com/v5.0/person/enrich'
    headers = {'X-api-key': api_key}
    params = {
        'first_name': first_name,
        'last_name': last_name,
        'email': email
    }
    response = requests.get(base_url, params=params, headers=headers)
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error: {response.status_code}")
        return None

def enrich_unique_combinations(api_key, unique_combinations):
    enriched_data = []

    for index, row in unique_combinations.iterrows():
        person_data = get_person_data(api_key, row['first_name'], row['last_name'], row['email'])
        if person_data:
            enriched_data.append(person_data)

    return pd.DataFrame(enriched_data)

def load_sales_data():
    # Load your sales data here, e.g., from a CSV file
    sales_data = pd.read_csv('sales_data.csv')
    return sales_data

def join_dataframes(sales_df, enriched_df):
    joined_df = sales_df.merge(enriched_df, on='email', how='left')
    return joined_df

def main():
    # Replace with your actual API key
    api_key = 'your_api_key_here'
    sales_df = load_sales_data()
    unique_combinations = sales_df[['first_name', 'last_name', 'email']].drop_duplicates()

    enriched_df = enrich_unique_combinations(api_key, unique_combinations)

    joined_df = join_dataframes(sales_df, enriched_df)
    print(joined_df)

if __name__ == '__main__':
    main()

That’s pretty good! But I knew I could be using the enrichment API a little more effectively, since there’s a bulk API available. So I then asked:

Now, instead of using the single Person Enrichment API, let’s use the Bulk Enrichment API. It allows up to 100 records at a time.

Here’s what it returned:

import requests
import pandas as pd

def get_bulk_person_data(api_key, records):
    base_url = 'https://api.peopledatalabs.com/v5.0/person/enrich/bulk'
    headers = {'X-api-key': api_key}
    data = {'data': records}
    response = requests.post(base_url, json=data, headers=headers)
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error: {response.status_code}")
        return None

def split_records(records, chunk_size=100):
    for index in range(0, len(records), chunk_size):
        yield records[index:index + chunk_size]

def enrich_unique_combinations(api_key, unique_combinations):
    records = unique_combinations.to_dict('records')
    enriched_data = []

    for chunk in split_records(records):
        chunk_data = get_bulk_person_data(api_key, chunk)
        if chunk_data:
            enriched_data.extend(chunk_data)

    return pd.DataFrame(enriched_data)

def load_sales_data():
    # Load your sales data here, e.g., from a CSV file
    sales_data = pd.read_csv('sales_data.csv')
    return sales_data

def join_dataframes(sales_df, enriched_df):
    joined_df = sales_df.merge(enriched_df, on='email', how='left')
    return joined_df

def main():
    # Replace with your actual API key
    api_key = 'your_api_key_here'
    sales_df = load_sales_data()
    unique_combinations = sales_df[['first_name', 'last_name', 'email']].drop_duplicates()

    enriched_df = enrich_unique_combinations(api_key, unique_combinations)

    joined_df = join_dataframes(sales_df, enriched_df)
    print(joined_df)

if __name__ == '__main__':
    main()

Cool! Now I had enriched CRM data. But what could I do with it? I decided to break it up based on job_title. My thinking was to get the most frequent occurrences of job_title and item_id to identify new segments to target for a specific subset of our products. For instance, maybe we launched some new products in Q1 of 2023, so we only want to target purchases after December 31, 2022.

After joining this data, give me the top 100 most common pairings of job_title and item_id after December 31, 2022.

Some more code:

import requests
import pandas as pd
from datetime import datetime

def get_bulk_person_data(api_key, records):
    base_url = 'https://api.peopledatalabs.com/v5.0/person/enrich/bulk'
    headers = {'X-api-key': api_key}
    data = {'data': records}
    response = requests.post(base_url, json=data, headers=headers)
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error: {response.status_code}")
        return None

def split_records(records, chunk_size=100):
    for index in range(0, len(records), chunk_size):
        yield records[index:index + chunk_size]

def enrich_unique_combinations(api_key, unique_combinations):
    records = unique_combinations.to_dict('records')
    enriched_data = []

    for chunk in split_records(records):
        chunk_data = get_bulk_person_data(api_key, chunk)
        if chunk_data:
            enriched_data.extend(chunk_data)

    return pd.DataFrame(enriched_data)

def load_sales_data():
    # Load your sales data here, e.g., from a CSV file
    sales_data = pd.read_csv('sales_data.csv')
    return sales_data

def join_dataframes(sales_df, enriched_df):
    joined_df = sales_df.merge(enriched_df, on='email', how='left')
    return joined_df

def get_top_100_pairings(joined_df, date_threshold):
    date_filter = pd.to_datetime(joined_df['date']) > date_threshold
    filtered_df = joined_df[date_filter]
    pairings = filtered_df.groupby(['job_title', 'item_id']).size().reset_index(name='count')
    top_100_pairings = pairings.nlargest(100, 'count')
    return top_100_pairings

def main():
    # Replace with your actual API key
    api_key = 'your_api_key_here'
    sales_df = load_sales_data()
    unique_combinations = sales_df[['first_name', 'last_name', 'email']].drop_duplicates()

    enriched_df = enrich_unique_combinations(api_key, unique_combinations)

    joined_df = join_dataframes(sales_df, enriched_df)

    date_threshold = datetime.strptime('2022-12-31', '%Y-%m-%d')
    top_100_pairings = get_top_100_pairings(joined_df, date_threshold)
    print(top_100_pairings)

if __name__ == '__main__':
    main()

That’s pretty great, but when I tried to run it, I ran into some issues. ChatGPT isn’t quite perfect (yet), and aspects like readability could definitely be improved. A little cleanup quickly fixed that.

To save you time, here’s the quick rundown of the updates:

  • Fix the dataframe joining

  • Update the inputs to match PDL’s expected input

  • Add some docstrings for readability (Used ChatGPT to generate!!)

  • Give us some information on response error (Used ChatGPT to generate!!)

  • Type the methods (Used ChatGPT to generate!!)

import requests
import pandas as pd
import random
from datetime import datetime, timedelta
from typing import List, Dict, Union, Optional, Generator

def get_bulk_person_data(api_key: str, records: List[Dict[str, str]]) -> Optional[List[Dict[str, Union[str, int]]]]:
    """
    Get enriched person data from PeopleDataLabs Bulk API for a list of records.

    :param api_key: API key for PeopleDataLabs API.
    :param records: List of records containing first_name, last_name, and email.
    :return: List of enriched records, or None if an error occurs.
    """
    request_list = {
        "requests": [
            {
                "params": {
                    "email": record["email"]
                }
            }
            for record in records
        ]
    }

    base_url = 'https://api.peopledatalabs.com/v5/person/bulk'
    headers = {'X-api-key': api_key}
    response = requests.post(
        base_url,
        headers=headers,
        json=request_list
    )

    if response.status_code == 200:
        enriched_records = [item['data'] for item in response.json()]
        return enriched_records
    else:
        print(f'Error: {response.status_code}\n{response.text}')

def split_records(records: List[Dict[str, str]], chunk_size: int = 100) -> Generator[List[Dict[str, str]], None, None]:
    """
    Split records into chunks of the given size.

    :param records: List of records.
    :param chunk_size: Size of each chunk.
    :return: Generator yielding lists of records with length <= chunk_size.
    """
    for index in range(0, len(records), chunk_size):
        yield records[index:index + chunk_size]

def enrich_unique_combinations(api_key: str, unique_combinations: pd.DataFrame) -> pd.DataFrame:
    """
    Enrich unique first_name, last_name, email combinations using PeopleDataLabs Bulk API.

    :param api_key: API key for PeopleDataLabs API.
    :param unique_combinations: DataFrame with unique first_name, last_name, email combinations.
    :return: DataFrame containing enriched person data.
    """
    records = unique_combinations.to_dict('records')
    enriched_data = []

    for chunk in split_records(records):
        chunk_data = get_bulk_person_data(api_key, chunk)
        if chunk_data:
            enriched_data.extend(chunk_data)

    return pd.DataFrame(enriched_data)

def join_dataframes(sales_df: pd.DataFrame, enriched_df: pd.DataFrame) -> pd.DataFrame:
    """
    Join sales DataFrame and enriched person DataFrame on the email column.

    :param sales_df: DataFrame containing sales data.
    :param enriched_df: DataFrame containing enriched person data.
    :return: Joined DataFrame.
    """
    enriched_df['primary_email'] = enriched_df['emails'].apply(lambda emails: emails[0]['address'] if emails else None)
    joined_df = sales_df.merge(enriched_df, left_on='email', right_on='primary_email', how='left')
    return joined_df

def get_top_100_pairings(joined_df: pd.DataFrame, date_threshold: datetime) -> pd.DataFrame:
    """
    Get the top 100 most common pairings of job_title and item_id after a specific date.

    :param joined_df: DataFrame containing joined sales and enriched person data.
    :param date_threshold: Date threshold for filtering.
    :return: DataFrame containing the top 100 most common pairings.
    """
    date_filter = pd.to_datetime(joined_df['date']) > date_threshold
    filtered_df = joined_df[date_filter]

    pairings = filtered_df.groupby(['job_title', 'item_id']).size().reset_index(name='count')
    top_100_pairings = pairings.nlargest(100, 'count')
    return top_100_pairings

def main():
    # Replace with your actual API key
    api_key = 'your_api_key_here'
    sales_df = generate_sales_data(50)
    unique_combinations = sales_df[['first_name', 'last_name', 'email']].drop_duplicates()
    enriched_df = enrich_unique_combinations(api_key, unique_combinations)
    joined_df = join_dataframes(sales_df, enriched_df)
    date_threshold = datetime.strptime('2022-12-31', '%Y-%m-%d')
    top_100_pairings = get_top_100_pairings(joined_df, date_threshold)
    print(top_100_pairings)

if __name__ == '__main__':
    main()
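One note on running this: main() calls a generate_sales_data helper (which is why random and timedelta are imported) that isn't shown in the snippet above. If you don't have a sales_data.csv of your own, a minimal stand-in along these lines would work; the names, emails, and item IDs here are purely made-up placeholders, not anything from PDL:

import random
from datetime import datetime, timedelta

import pandas as pd

def generate_sales_data(num_rows: int) -> pd.DataFrame:
    """
    Generate fake sales rows matching the schema used throughout this post:
    first_name, last_name, email, sales_order, item_id, date.
    """
    # Placeholder people and products -- swap in your own test data
    first_names = ['Alice', 'Bob', 'Carol', 'Dave', 'Erin']
    last_names = ['Smith', 'Jones', 'Lee', 'Patel', 'Garcia']
    items = ['item_001', 'item_002', 'item_003', 'item_004']

    rows = []
    for order_id in range(num_rows):
        first = random.choice(first_names)
        last = random.choice(last_names)
        rows.append({
            'first_name': first,
            'last_name': last,
            'email': f'{first.lower()}.{last.lower()}@example.com',
            'sales_order': order_id,
            'item_id': random.choice(items),
            # Random purchase date within roughly the last year and a half
            'date': datetime.now() - timedelta(days=random.randint(0, 500)),
        })
    return pd.DataFrame(rows)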

Awesome! Within 20 minutes, ChatGPT and People Data Labs gave me a much more granular understanding of exactly who is buying my products. I can now use this data to build new ad campaigns and personalizations.
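As a rough illustration of that last step (this part wasn't generated in the ChatGPT session above), here's one way you might turn the top pairings into a per-segment email list for an ad platform; export_targets and the ad_targets.csv output path are arbitrary choices for the sketch:

import pandas as pd

def export_targets(joined_df: pd.DataFrame, top_pairings: pd.DataFrame, out_path: str = 'ad_targets.csv') -> pd.DataFrame:
    """Build a targeting list: one row per (job_title, item_id) segment with the matching purchaser emails."""
    # Keep only purchasers whose job_title/item_id pairing made the top list
    segments = joined_df.merge(top_pairings[['job_title', 'item_id']], on=['job_title', 'item_id'], how='inner')
    # Collapse to one comma-separated email list per segment
    targets = (
        segments.groupby(['job_title', 'item_id'])['email']
        .apply(lambda emails: ', '.join(sorted(set(emails))))
        .reset_index(name='target_emails')
    )
    targets.to_csv(out_path, index=False)
    return targets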

Jacob Langholz