Investment Research [In-Depth Tutorial]
April 15, 2021
Welcome to the second installment in our Use Case Series! In this tutorial, we’ll take a look at an Investment Research use case and explore how to leverage PDL's Company Enrichment API and Person Search API for both sourcing and contextualizing potential investment opportunities.
The only things you will need to get started with this tutorial are a free PDL API key and a copy of the tutorial's Colab notebook, both of which are covered in the setup steps below.
This tutorial can be read without reading any of the previous tutorials. However, we will point back to a few relevant sections in our earlier tutorials for additional details. A basic familiarity with Python and Google Colaboratory will be helpful for reading and interacting with the code in this tutorial.
Getting a Free PDL API Key
Sign up for a free API key by filling out the form on the signup page. A detailed walkthrough of this process is available in our earlier tutorials.
Setting up Colab
First, open up the Colab notebook, and make a copy of it by clicking “File > Save a copy in Drive” from the toolbar.
Copying the colab notebook.
Next, once you have your API key, enter it into the cell block shown below (and don't forget to run the cell!).
Enter your API key in the Colab cell indicated.
Again, make sure to run the cell by hitting “Shift+Enter” or by clicking the play button that appears in the top left corner of a cell when you hover over it.
A Quick Note about Helper Function Definitions
The Setup section of the notebook also defines several utility functions that we will make use of throughout the tutorial. These functions help in the process of sending and receiving API requests and responses (such as managing rate limiting, checking error responses, and retrieving match results in batches). Run the cell to load these function definitions.
This cell block is hidden by default, but we definitely encourage you to open up the cell block and take a look at the implementations. You can do so by clicking the “SHOW CODE” block shown in the following image. Note: you can also hide the block again by double clicking the whitespace on the right side of the expanded code block after opening it.
Helper function definitions in the colab notebook (click "SHOW CODE" to expand).
At this point, setup is complete and our Colab notebook is ready to run the entire script end-to-end. Now, let’s take a look at the scenario we’ll explore in this tutorial.
The Investment Research Scenario
Imagine we are an investment firm looking for new investment opportunities in the autonomous vehicles space. Our firm focuses on very early-stage startups (≤ 3 years old) with a strong technology offering and high-growth potential.
Recently (as of April 2021), John Krafcik announced that he would be stepping down as CEO of Waymo, Alphabet's self-driving subsidiary. This news may also signal the departure of other employees, so to source new investment opportunities, our firm has decided to target former Waymo employees and investigate which companies they have gone to. In particular, we'd like to focus on employees in engineering roles, as we believe they serve as a strong positive indicator of the technological strength of their new employers. For each of these companies, we would like to do some initial screening and background research to come up with a small handful of candidates matching our investment profile to bring to the table for internal discussion.
Overview of the Process
To summarize our approach to this task, here's a high-level breakdown of what we'll cover in the remainder of this tutorial.
Find ex-Waymo Engineers: Pull profiles of all former Waymo engineering employees using the Person Search API
Aggregate Current Employers: Group all the companies that our former Waymo employees currently work at
Enrich Current Employers: Use the Company Enrichment API to pull additional metadata for each company in our list
Screen Current Employers: Filter out companies that do not fit our target investment profile
Find All Employees at Current Employers: For each of the remaining companies, pull profiles for all current and past employees using the Person Search API
Background Research on Current Employers: Use the combination of company enrichment data and current/past employee profiles to generate insights into each company
Export Results: Export our profile lists, company matches and background research for use in other analysis pipelines
Find ex-Waymo Employees
Our first task is to find ex-Waymo employees, which can be easily done using the PDL Person Search API.
Before pulling person profiles, however, we will want to standardize our company information for Waymo in order to ensure consistency with the way data is represented in PDL's databases (for example, the way names are spelled or URLs are formatted). This is referred to as 'cleaning' our data and can be done using either PDL's Company Enrichment API or the dedicated Cleaner APIs.
For this tutorial, we will use the Company Enrichment API to get the internal PDL Company ID for Waymo (which we can then use to find people associated with Waymo):
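As a rough illustration of what such a request involves, here is a minimal sketch that uses only the Python standard library. It is not the notebook's send_company_enrichment_api_request() helper: the function names build_company_enrichment_request() and extract_company_id() are hypothetical, and the endpoint URL, the X-Api-Key header, and the status field in the response body are assumptions about PDL's v5 API that should be checked against the official documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumption: PDL's v5 Company Enrichment endpoint.
PDL_COMPANY_ENRICH_URL = "https://api.peopledatalabs.com/v5/company/enrich"


def build_company_enrichment_request(api_key, **company_fields):
    """Build (but do not send) a GET request for the Company Enrichment API."""
    # Drop empty inputs so only known identifiers are sent.
    params = {key: value for key, value in company_fields.items() if value}
    url = PDL_COMPANY_ENRICH_URL + "?" + urllib.parse.urlencode(params)
    # Assumption: the API key is passed in the X-Api-Key header.
    return urllib.request.Request(url, headers={"X-Api-Key": api_key})


def extract_company_id(response_body):
    """Return the company id from a decoded JSON response, or None if no match."""
    # Assumption: successful responses carry "status": 200 in the body.
    if response_body.get("status") != 200:
        return None
    return response_body.get("id")


request = build_company_enrichment_request(
    "YOUR_API_KEY", name="waymo", website="waymo.com"
)

# Sending the request consumes API credits, so it is left commented out here:
# with urllib.request.urlopen(request) as resp:
#     waymo_id = extract_company_id(json.loads(resp.read()))
```

Unlike the notebook helper, this sketch does no rate limiting or retrying; it only shows the shape of the request and where the id field comes from.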
The code snippet above is a simple instance of sending a Company Enrichment API request. We use the helper function send_company_enrichment_api_request() to handle sending the request (along with handling error checking and rate limiting). For a successful API request, the response object will contain all the company enrichment data for that request, including the id field we are after. Feel free to take a look at an example response to learn what additional data is provided through the Company Enrichment API.
Now that we have a standardized way to reference Waymo using its id field, we can use that to construct a Person Search API query for ex-Waymo engineers as follows:
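A sketch of what such a query could look like in both syntaxes is shown here. The function names are hypothetical, the company id value "waymo" is an assumed placeholder for whatever the enrichment step returns, and the exact SQL dialect accepted by the API should be confirmed against PDL's documentation.

```python
def build_ex_employee_es_query(company_id, role):
    """Elasticsearch-style query: worked at the company, holds the role, has left."""
    return {
        "query": {
            "bool": {
                # Every clause under "must" has to match.
                "must": [
                    {"term": {"experience.company.id": company_id}},
                    {"term": {"job_title_role": role}},
                ],
                # No clause under "must_not" may match.
                "must_not": [
                    {"term": {"job_company_id": company_id}},
                ],
            }
        }
    }


def build_ex_employee_sql_query(company_id, role):
    """The same three criteria expressed as a SQL string."""
    return (
        "SELECT * FROM person "
        f"WHERE experience.company.id = '{company_id}' "
        f"AND job_title_role = '{role}' "
        f"AND NOT job_company_id = '{company_id}'"
    )


WAYMO_ID = "waymo"  # assumption: the id returned by the enrichment step
es_query = build_ex_employee_es_query(WAYMO_ID, "engineering")
sql_query = build_ex_employee_sql_query(WAYMO_ID, "engineering")
```

Only one of the two forms would be sent with any given request; they are built side by side here purely for comparison.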
Search API Query Construction
As we can see, there are 2 ways of structuring queries for the Person Search API: we can use Elasticsearch syntax or SQL syntax, both of which the code snippet above demonstrates. While the Elasticsearch syntax is a bit more complex, it is generally recommended over SQL for querying the PDL APIs since it is more flexible and maps more naturally onto PDL's internal search mechanisms.
Regardless of the syntax used, however, the query built in the code block specifies the following logical criteria for a valid profile match:
The person must have worked at Waymo
They must be working in an engineering role
They must not be currently working at Waymo
Translating these search criteria into code requires an understanding of how fields are represented within our person schema, so let's briefly take a closer look:
Criteria 1 (The person must have worked at Waymo)
For this constraint, the related schema field is the experience.company.id field, which represents the id for any company that a person has ever worked at. In PDL's person schema, the experience field is an array where each element represents a single job experience. We specify that a valid person match must have at some point worked at Waymo by stating that experience.company.id must equal the company id for Waymo (which we got from the initial company enrichment we did).
Criteria 2 (They must be working in an engineering role)
The second constraint uses the job_title_role field, which represents the type of job that a person currently has. PDL has defined a set of canonical job roles that enumerate all the values this field can take on, one of which is engineering. Thus, we specify that a valid match must be working in an engineering role by requiring that job_title_role must equal “engineering”.
Criteria 3 (They must not be currently working at Waymo)
The last constraint uses the job_company_id schema field, which represents the company id for a person's current employer. By requiring that job_company_id must not equal the company id for Waymo, we can enforce that the profile match is not a current Waymo employee.
Taken together, these 3 criteria define our search query for ex-Waymo engineers. While there are multiple ways to define most queries, this clarifies the reasoning behind one such approach to constructing queries for the Person Search API. If you would like to build your own queries, we highly encourage familiarizing yourself with the available fields in the person schema.
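To make the field semantics concrete, the three criteria can also be written as a plain Python check against a single profile record. This is only an illustration of how the schema fields relate to one another (the actual filtering happens server-side in the Person Search API); the function name and the sample profiles are hypothetical.

```python
def is_ex_employee_in_role(profile, company_id, role):
    """Apply the three criteria to one person profile (a dict)."""
    # Criteria 1: some element of the experience array references the company.
    experience = profile.get("experience") or []
    worked_there = any(
        (job.get("company") or {}).get("id") == company_id for job in experience
    )
    # Criteria 2: the person's current job falls under the given role.
    in_role = profile.get("job_title_role") == role
    # Criteria 3: the person's current employer is not the company.
    has_left = profile.get("job_company_id") != company_id
    return worked_there and in_role and has_left


# Hypothetical sample profiles, trimmed to the fields used above.
former = {
    "job_title_role": "engineering",
    "job_company_id": "acme-robotics",
    "experience": [{"company": {"id": "waymo"}}, {"company": {"id": "acme-robotics"}}],
}
current = {
    "job_title_role": "engineering",
    "job_company_id": "waymo",
    "experience": [{"company": {"id": "waymo"}}],
}

former_matches = is_ex_employee_in_role(former, "waymo", "engineering")
current_matches = is_ex_employee_in_role(current, "waymo", "engineering")
```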
Sending and Receiving using the Search API
Now that we've seen how to build our search query, let's finish up this section by looking at how we send it and pull down profiles matching our criteria for ex-Waymo engineers:
Our goal is to recover all the profiles matching our search query. However, the Person Search API does not allow us to retrieve more than 100 profile matches at a time (see the Person Search API documentation). So instead, we must pull the results down in batches, which is what the retrieve_search_api_matches() helper function in the notebook does for us. This function sends a query multiple times, each time incrementing the batch starting position, until all results have been pulled down. We use the send_person_search_request() helper function to handle sending the actual query request (which manages error checking and rate limiting). For additional information on this process, see the Sending the Query and Retrieving the Results section in our previous tutorial.
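The batching logic described above can be sketched as follows. This is not the notebook's retrieve_search_api_matches() implementation: the function retrieve_all_matches() and its send_request callable are hypothetical, and the sketch assumes each response is a dict with a data list (the current batch) and a total count of matches. A fake sender stands in for the real API so the loop can be run without spending credits.

```python
def retrieve_all_matches(send_request, query, batch_size=100, max_records=None):
    """Pull every match for a query by requesting successive batches."""
    records = []
    start = 0  # starting position of the next batch
    while True:
        response = send_request(query, start, batch_size)
        batch = response.get("data") or []
        records.extend(batch)
        start += batch_size
        # Stop when the API returns nothing more or we have covered all matches.
        if not batch or start >= response.get("total", 0):
            break
        # Optional cap, useful for limiting credit usage while experimenting.
        if max_records is not None and len(records) >= max_records:
            break
    return records[:max_records]


def fake_send_request(query, start, size):
    """Stand-in for the real API: serves slices of 250 dummy profiles."""
    population = [{"id": i} for i in range(250)]
    return {"status": 200, "data": population[start:start + size], "total": len(population)}


all_profiles = retrieve_all_matches(fake_send_request, query={})
capped_profiles = retrieve_all_matches(fake_send_request, query={}, max_records=120)
```

Passing the sender in as an argument keeps the loop independent of how requests are actually made, so error checking and rate limiting can live in the sender, as they do in the notebook's helpers.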
The end result is that we have successfully pulled down every available profile in the PDL database matching our query for ex-Waymo engineers.
Aggregate Current Employers
After pulling all the profiles of ex-Waymo engineers from the Person Search API, we now want to look at where all these people are currently employed. This can be done quite simply by iterating over our profiles and aggregating each profile's current employer. Knowing that we will want to enrich these companies in the next step, we will also make sure to keep track of the necessary information to use as input for the Company Enrichment API (e.g. name, website, and linkedin_url for each company). This is demonstrated in the code snippet below:
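A minimal sketch of this aggregation step might look like the following. The function name and sample data are hypothetical, and the sketch assumes the current employer is exposed on each profile through the job_company_id, job_company_name, job_company_website, and job_company_linkedin_url fields.

```python
def aggregate_current_employers(profiles):
    """Group profiles by current employer, keeping the inputs needed for enrichment."""
    employers = {}
    for profile in profiles:
        company_id = profile.get("job_company_id")
        if not company_id:
            continue  # skip people with no known current employer
        entry = employers.setdefault(
            company_id,
            {
                "name": profile.get("job_company_name"),
                "website": profile.get("job_company_website"),
                "linkedin_url": profile.get("job_company_linkedin_url"),
                "num_ex_waymo_engineers": 0,
            },
        )
        entry["num_ex_waymo_engineers"] += 1
    return employers


# Hypothetical sample profiles, trimmed to the fields used above.
sample_profiles = [
    {"job_company_id": "acme-robotics", "job_company_name": "acme robotics",
     "job_company_website": "acmerobotics.example",
     "job_company_linkedin_url": "linkedin.com/company/acme-robotics"},
    {"job_company_id": "acme-robotics", "job_company_name": "acme robotics"},
    {"job_company_id": "google", "job_company_name": "google"},
    {"job_company_id": None},
]
employers = aggregate_current_employers(sample_profiles)
```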
Enrich Current Employers
As mentioned in the previous step, we now want to enrich this list of companies employing former Waymo engineers. Here, our goal in doing this enrichment is to collect additional metadata on each company that will be useful in later steps: for both screening out companies that do not fit our target investment profile, as well as providing extra contextual information to supplement our background research.
To enrich these companies, we simply iterate over each company in our list, send an enrichment request, and collect the responses in an array:
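One way to sketch this loop is shown here. Since the exact signature of the notebook's send_company_enrichment_api_request() helper is not reproduced in this article, the sender is passed in as a hypothetical callable that takes a dict of company identifiers and returns a decoded response; a stub stands in for the real API. Mapping these keys onto the API's actual parameter names is left to the sender.

```python
def enrich_companies(companies, send_enrichment_request):
    """Send one enrichment request per company and collect the results."""
    enriched, unmatched = [], []
    for company in companies:
        # Only pass along the identifiers we actually have.
        params = {
            key: company[key]
            for key in ("name", "website", "linkedin_url")
            if company.get(key)
        }
        response = send_enrichment_request(params)
        # Assumption: successful matches carry "status": 200 in the body.
        if response is not None and response.get("status") == 200:
            enriched.append(response)
        else:
            unmatched.append(company)
    return enriched, unmatched


def stub_enrichment_sender(params):
    """Stand-in for the real API, returning canned data for one known company."""
    known = {
        "acme robotics": {"status": 200, "id": "acme-robotics", "founded": 2019, "size": "11-50"},
    }
    return known.get(params.get("name"), {"status": 404})


enriched, unmatched = enrich_companies(
    [{"name": "acme robotics", "website": "acmerobotics.example"}, {"name": "unknown co"}],
    stub_enrichment_sender,
)
```

Keeping the unmatched companies in a separate list makes it easy to see which employers could not be enriched rather than silently dropping them.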
As before, we use the send_company_enrichment_api_request() helper function to submit the enrichment requests and handle error checking as well as rate limiting.
Having run the enrichments, let's take a minute to explore the set of companies currently employing ex-Waymo engineers:
Histogram of current employers of ex-Waymo engineers. Bar height indicates number of ex-Waymo engineers hired, and bar color indicates total company size.
The figure above shows a histogram of all the current employers of ex-Waymo engineers, with bar height indicating the number of ex-Waymo engineers and bar color indicating the size of the company. Unsurprisingly, most of the former Waymo engineers end up at larger companies (with Google itself taking the lead). However, there are a few smaller companies with multiple former Waymo engineers. Overall, we can see that Waymo engineers have left for companies of all different sizes, which means we will likely be able to find some investment candidates.
Screen Current Employers
At this point, we would like to do some initial screening to exclude companies that are far outside our typical investment profile, which is very early stage companies with strong technology products. We'll rely on the fact that these companies all employ former Waymo engineers to satisfy our requirement for a strong technology product, and instead, focus our screening on companies that are beyond our typical upper-bound size and age profiles:
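A coarse screen of this kind could be sketched as follows. The sketch assumes each enriched company record has a founded year and a size range string such as "11-50" or "10001+". The 3-year age limit comes from our scenario, while the 200-employee ceiling is an illustrative assumption rather than a value taken from the notebook.

```python
def size_upper_bound(size_range):
    """Parse a size range like '51-200' into 200; open-ended or missing -> None."""
    if not size_range or size_range.endswith("+"):
        return None
    return int(size_range.split("-")[1])


def passes_screen(company, current_year=2021, max_age_years=3, max_employees=200):
    """Keep only companies that are both young enough and small enough."""
    founded = company.get("founded")
    upper = size_upper_bound(company.get("size"))
    # Companies with missing or open-ended data are screened out conservatively.
    if founded is None or upper is None:
        return False
    return (current_year - founded) <= max_age_years and upper <= max_employees


# Hypothetical enriched records, trimmed to the fields used above.
enriched_companies = [
    {"id": "acme-robotics", "founded": 2019, "size": "11-50"},
    {"id": "google", "founded": 1998, "size": "10001+"},
    {"id": "mystery-startup", "founded": None, "size": "1-10"},
    {"id": "older-midsize", "founded": 2012, "size": "51-200"},
]
candidates = [company for company in enriched_companies if passes_screen(company)]
```

Dropping companies with missing data is a design choice; a more permissive screen could instead set them aside for manual review.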
As seen in the code block above, this type of screening process is quite simple to implement and serves as a coarse filter for discarding clearly poor fits. After screening, we are left with just a handful of companies we aim to take a deeper look at.
Find All Employees at Current Employers
To better understand each of these companies, we will want to pull all the associated person profiles for each company (i.e. all current and former employees), which will give us insight into the demographics and growth profiles of these companies. To pull all employees, we will again use the Person Search API, as we did in the Find ex-Waymo Employees section, as shown in the following code block:
As you can see, this code is almost identical to the code we used to pull all ex-Waymo engineers. The only difference is that we are using a slightly modified (and simpler) query and repeating this process for each company in our list.
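A sketch of the simpler per-company query is given here. It assumes that a person's current job also appears in their experience array, so a single condition on experience.company.id covers both current and former employees; the function name and company ids are hypothetical.

```python
def build_all_employees_query(company_id):
    """Elasticsearch-style query matching anyone who has ever worked at the company."""
    return {"query": {"term": {"experience.company.id": company_id}}}


# Hypothetical ids of the companies that survived screening.
candidate_company_ids = ["acme-robotics", "example-autonomy"]

# One query per candidate; each would then be sent through the same
# batched retrieval process used for the ex-Waymo engineers.
queries_by_company = {
    company_id: build_all_employees_query(company_id)
    for company_id in candidate_company_ids
}
```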
Background Research on Current Employers
Having narrowed down our list of candidate companies and having pulled down all the related employee profiles for each company, we can now do some deeper investigation to help further inform the investment quality of these remaining candidate companies. For this section, we will define a background_research() helper function to generate the background information and data, and then simply call this function for each candidate company as follows:
This section is meant to be more illustrative than exhaustive in terms of demonstrating the types of insights available in the data, and so we won’t dive into the details of the helper functions here. We do encourage you to look at the code implementation in the notebook however! There are many things that can be done, and here are just a few examples implemented in the background_research() helper function:
Generating various top 10 lists (e.g. for skills within the company, previous employers, past universities)
Historical headcount and growth metrics
Distributions of job roles within the company
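To give a flavor of how insights like these can be computed, here is a small sketch. It is not the notebook's background_research() implementation: all function names are hypothetical, and the sketch assumes profiles carry a skills list, a job_title_role, and experience entries with start_date and end_date strings that begin with a four-digit year.

```python
from collections import Counter


def top_n(profiles, extract, n=10):
    """Rank values (e.g. skills) by how many people have them."""
    counts = Counter()
    for profile in profiles:
        counts.update(set(extract(profile)))  # count each value once per person
    return counts.most_common(n)


def role_distribution(profiles):
    """Count people per current job role."""
    return Counter(profile.get("job_title_role") or "unknown" for profile in profiles)


def employed_during(profile, company_id, year):
    """True if any job at the company overlaps the given calendar year."""
    for job in profile.get("experience") or []:
        if (job.get("company") or {}).get("id") != company_id or not job.get("start_date"):
            continue
        start_year = int(job["start_date"][:4])
        # A missing end date is treated as an ongoing job.
        end_year = int(job["end_date"][:4]) if job.get("end_date") else year
        if start_year <= year <= end_year:
            return True
    return False


def headcount_by_year(profiles, company_id, years):
    """Approximate historical headcount from employee job histories."""
    return {
        year: sum(employed_during(profile, company_id, year) for profile in profiles)
        for year in years
    }


# Hypothetical sample profiles, trimmed to the fields used above.
staff = [
    {"skills": ["python", "c++"], "job_title_role": "engineering",
     "experience": [{"company": {"id": "acme"}, "start_date": "2019-03", "end_date": None}]},
    {"skills": ["python"], "job_title_role": "engineering",
     "experience": [{"company": {"id": "acme"}, "start_date": "2020", "end_date": "2020-11"}]},
    {"skills": None, "job_title_role": None,
     "experience": [{"company": {"id": "other"}, "start_date": "2018-01", "end_date": None}]},
]

top_skills = top_n(staff, lambda profile: profile.get("skills") or [])
roles = role_distribution(staff)
headcount = headcount_by_year(staff, "acme", [2018, 2019, 2020, 2021])
```

The headcount figures are only as complete as the profiles available, so they are best read as a growth trend rather than an exact count.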
FYI: Check out the Bonus Visualizations section at the end of this tutorial, which demonstrates some of the types of visualizations that can be created from the data generated in this section.
Export Results
Finally, our last step is to export these results for use in further analysis and investigation processes. We'll output our results as CSV files, a straightforward process using the pandas library, as shown in the following code block:
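As a dependency-free stand-in for the pandas-based export, the sketch below writes a list of records to CSV using Python's built-in csv module. The function name, column list, and sample record are hypothetical. With pandas installed, the same result can be had with pd.DataFrame(records).to_csv(path, index=False).

```python
import csv
import io


def write_records_csv(records, fieldnames, out):
    """Write dict records to a file-like object, one row per record."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        # Keep only the chosen columns; missing values become empty cells.
        writer.writerow({key: record.get(key, "") for key in fieldnames})


# Hypothetical candidate company record with one field we do not export.
candidate_records = [
    {"id": "acme-robotics", "name": "acme robotics", "founded": 2019,
     "size": "11-50", "raw_response": "..."},
]

buffer = io.StringIO()  # swap for open("companies.csv", "w", newline="") to write a file
write_records_csv(candidate_records, ["id", "name", "founded", "size"], buffer)
csv_text = buffer.getvalue()
```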
Reviewing What We've Accomplished
In this tutorial, we explored how we could use PDL's data and APIs to support an investment research use case. We first sourced a list of potential investment opportunities by targeting former Waymo engineers and aggregating the set of employers they currently work for. Next, we enriched these companies and did some basic screening to narrow our focus down to the handful best matching our target profile. Finally, we dove deeper into the remaining companies by pulling their employees and computing growth metrics and other demographic-related information.
At the end of this process, we now have a targeted selection of investment candidates that match our target profile, and even the contact information for all the employees currently working at each company.
Along the way, we also stepped through a complete Python implementation of this investment research pipeline provided by our Colab notebook, which can be customized to support your particular application or even other use cases such as lead generation, direct outreach, and talent acquisition, among others.
We hope this tutorial illustrates the immediate value that person data can provide, and shows just how easy it is to get started using person data in your business. We encourage you to sign up for a free API key, and give the investment research Colab notebook a spin! As always, please reach out if you have any questions or suggestions, and join us again next time for the third tutorial in this series!