Building Elasticsearch Queries with the Query Builder [Tutorial]

August 25, 2021

10 minutes

Introduction

We are excited to announce the release of our new Query Builder tool, which is available to all customers! Anyone interested in trying out the Query Builder can also sign up for a free API key and access the tool through the API dashboard. Although creating UI tools is not something that we focus on at PDL, we feel this tool provides a great deal of value to our users and ultimately supports our mission of democratizing high-quality B2B data.

The Query Builder allows you to interactively construct queries for the Person Search API right from your browser - no coding necessary! In addition, the tool allows you to preview the number of profile matches without using any API credits and also provides auto-generated code examples that you can use to run your query and retrieve the actual profile results. We’ll explore each of these features of the Query Builder in detail in this tutorial.

Query Builder New

Simple demonstration of building a query using the Query Builder. In this example, we are designing a query to find all the current employees at Amazon living in Seattle, WA.

We’ve often heard that our code examples make it very easy to get started with our APIs, and with the new Query Builder tool, you will be able to generate your own custom code examples specifically for your own use cases.

So let’s get started!

Building Queries for the Person Search API

As the name implies, one of the key uses for the Query Builder is...well...to build queries. Specifically, Elasticsearch (ES) queries for our Person Search API. We’ve written fairly extensive tutorials before on how to design ES queries for our Person Search API (see our Recruiting In-Depth Tutorial Part 1 and Part 2). However, as you may be able to tell, it can get a bit complicated.

With the new Query Builder tool, we can get a head start on designing custom search queries with an interactive UI that automatically structures the queries and even auto-generates code that we can use to run the query ourselves. (Note: the August 2021 release is the first release of the Query Builder, so some of the more advanced ES syntax patterns, such as nested queries, are not yet supported. If you are interested in these, please check out our Recruiting tutorials Part 1 and Part 2.)

The general workflow for using the Query Builder is a simple 2-step process. First, you build your query by specifying the demographic criteria for the profiles you would like to target. Second, you submit your query to view the number of matching profiles and the auto-generated code for your query.

Specifying the Query Criteria

Queries for the Person Search API are composed of criteria, which are specific requirements that target profiles must match (such as “current employee at Amazon” or “skills in javascript”). You can use any field in our Person Schema to construct these criteria. 

There are 3 parts to a criterion (as shown in the figure below): the field, the operator, and the value. In the Query Builder, each row of input corresponds to a single criterion within the query. Additional criteria can be added to the query by clicking on the “Add More” button, and existing criteria can be removed by clicking on the red “X” button that appears next to each criterion when more than one has been defined. Note that when using the Query Builder, all criteria are joined by the AND operation when evaluating the query, meaning that in order for a profile to match a query, it must satisfy every specified criterion (i.e. every row in the Query Builder interface). In Elasticsearch terms, these criteria are all grouped together under a must clause.
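
To make the AND behavior concrete, here is a minimal Python sketch of how two rows in the Query Builder might translate into an Elasticsearch bool query with a must clause. The helper name build_must_query and the use of term clauses for the “is” operator are our own assumptions for illustration, so the tool's actual output may differ in detail.

```python
import json


def build_must_query(criteria):
    """Combine (field, value) pairs into one Elasticsearch bool/must query.

    Every pair becomes a `term` clause (an assumption for the "is" operator),
    and all clauses sit under `must`, so a profile has to satisfy all of them.
    """
    clauses = [{"term": {field: value}} for field, value in criteria]
    return {"query": {"bool": {"must": clauses}}}


# Two rows of the Query Builder, written as (field, value) pairs.
query = build_must_query([
    ("industry", "biotechnology"),
    ("job_company_name", "google"),
])
print(json.dumps(query, indent=2))
```

Adding another row in the interface corresponds to appending one more clause to the must list, which can only keep the number of matches the same or shrink it.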

Query Builder Criteria Cropped

The 3 components of each criterion in a query: the field, the operator, and the value. Here the field is “industry”, the operator is “is”, and the value is “biotechnology”.

Let’s take a closer look at how to specify the 3 parts of each query criterion:

Criteria Element 1: Field

Each query criterion must specify a field (aka profile attribute) to filter on. For the Person Search API, you can build a query criterion around any field from the Person Schema. As an example, we could use the “job_company_name” field from the Person Schema to search for profiles based on the name of the current employer.

Criteria Element 2: Filter Operator

In addition to the field, each criterion must also specify a filter operation that will be applied to the field. Our Person Search API supports a subset of the Elasticsearch operators, as listed on our documentation page. The Query Builder supports the following filter operations: “is”, “is not”, “is one of”, “is not one of”, “exists”, and “does not exist”, which are shown in the dropdown menu when constructing a query criterion.

Criteria Element 3: Value

The last part of a single query criterion is the value, which defines the field value that profiles will be evaluated against. The value depends on the filter operator selected.

  • If the “is” or “is not” operator is used, then the value must be a valid field value. For example, if our criterion was “job_company_name is <value>”, then the value must be a company name such as “google”.

  • If the “is one of” or “is not one of” operator is used, then the value must be a comma-separated list of field values. For example, if our criterion was “job_company_name is not one of <value>”, then the value must be a list of company names such as “google, facebook, amazon”.

  • If the “exists” or “does not exist” operator is used, then a value is not needed. For example, if the criterion was “job_company_name exists”, then the criterion is fully specified without a value. In this example, the criterion would be satisfied by any profile that has a non-null “job_company_name” value.
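
The three cases above can be sketched in Python as a small translation function. The mapping from Query Builder operators to Elasticsearch term, terms, and exists clauses is an assumption on our part for illustration (it is not the tool's documented behavior), and the function names are hypothetical.

```python
def parse_value_list(raw):
    """Split a comma-separated value string into a clean list of values."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def criterion_to_clause(field, operator, value=None):
    """Translate one (field, operator, value) row into an Elasticsearch clause.

    The mapping below is an illustrative assumption, not the Query Builder's
    documented behavior. Negated operators are wrapped in a nested bool so the
    result can still be placed under a top-level `must`.
    """
    if operator in ("is", "is not"):
        if not value:
            raise ValueError(f"'{operator}' needs a single field value")
        clause = {"term": {field: value}}
    elif operator in ("is one of", "is not one of"):
        values = parse_value_list(value or "")
        if not values:
            raise ValueError(f"'{operator}' needs a comma-separated list")
        clause = {"terms": {field: values}}
    elif operator in ("exists", "does not exist"):
        clause = {"exists": {"field": field}}  # no value required
    else:
        raise ValueError(f"unsupported operator: {operator}")

    negated = operator in ("is not", "is not one of", "does not exist")
    return {"bool": {"must_not": [clause]}} if negated else clause


print(criterion_to_clause("job_company_name", "is", "google"))
print(criterion_to_clause("job_company_name", "is not one of", "google, facebook, amazon"))
print(criterion_to_clause("job_company_name", "exists"))
```

Validating the value up front, as this sketch does, catches the most common input mistake (a missing value or an empty list) before a query is ever submitted.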

One subtlety of specifying values is that the value must match how values are represented in our full dataset. If we wanted to specify a company name value for the “job_company_name” field, our input must match how company names are stored in person profiles, which is documented in our Person Manual. Note that all fields have a baseline level of format standardization, such that all values are lowercase and stripped of leading/trailing whitespace. For example, company names would be formatted as “google” or “amazon”, instead of “Google” or “Amazon”.
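
As a small illustration of this baseline standardization, the Python sketch below lowercases a value and strips surrounding whitespace before it is used in a criterion. The function name normalize_value is hypothetical, and this only covers the baseline formatting described above; field-specific conventions still need to be checked against the Person Manual.

```python
def normalize_value(raw):
    """Lowercase a value and strip leading/trailing whitespace."""
    return raw.strip().lower()


for raw in ["Google", "  Amazon ", "google"]:
    print(repr(raw), "->", repr(normalize_value(raw)))
```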

Autocompletion

Formatting values correctly can be an easy source of mistakes, so to help with setting values (i.e. Criteria Element 3), some fields support autocompletion in the Query Builder. The supported fields are those related to company names, location names, school names, industry, skills, interests, job roles/sub-roles, and job titles. The exact list of supported autocomplete fields is specified in our Autocomplete API documentation.

In the Query Builder, any autocomplete-supported field will display a prompt “Start typing to get suggestions” as shown in the image below.

Autocomplete Prompt

Autocompletion for a supported field in the Query Builder. Here we are using the “industry” field, which automatically displays a prompt to provide suggestions for the field value.

As you start typing in the value input box, suggestions will be populated in the menu. Each suggestion will indicate the total number of profiles in our dataset with that field value, and suggestions will be sorted by this number (with the most common values appearing first). 

Autocompletion

Autocomplete suggestions being populated as the input field is filled out.
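
The ordering of suggestions can be sketched in a few lines of Python. The record shape (a “name” and a “count” per suggestion), the prefix matching, and the function name are all assumptions made for this example, and the counts are made-up placeholders rather than real figures from our dataset.

```python
def rank_suggestions(suggestions, prefix):
    """Keep suggestions starting with `prefix`, most common first."""
    prefix = prefix.strip().lower()
    matches = [s for s in suggestions if s["name"].startswith(prefix)]
    return sorted(matches, key=lambda s: s["count"], reverse=True)


# Placeholder values: these counts are made up for the example and are
# not real figures from the dataset.
sample = [
    {"name": "biotechnology", "count": 30},
    {"name": "banking", "count": 50},
    {"name": "broadcast media", "count": 10},
    {"name": "accounting", "count": 40},
]
for s in rank_suggestions(sample, "b"):
    print(s["name"], s["count"])
```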

Submitting the Query

Once you have finished building your query by adding in the desired criteria, you can submit your query by clicking the “Update Query” button. This will allow you to view:

  1. The number of profile matches satisfying the full query 

  2. Auto-generated code for your query

Submit Query

After submitting your query (by clicking the “Update Query” button), you will be able to see (a) the total number of profile matches for the query, as well as (b) the auto-generated sample code.

Number of Profile Matches

As stated earlier, this number reflects the total number of profile records from the full Person Dataset that satisfy every criterion in our submitted query. Although simple, being able to see the number of profile matches is quite a powerful tool when designing queries:

Identifying Overly-Restrictive Criteria

For example, one common mistake that we see when using our Search APIs is including overly restrictive criteria with very low match rates within the PDL dataset. Using the Query Builder tool to preview the number of matches allows you to identify restrictive criteria before spending credits to actually retrieve the profiles. This way, you can ensure that your constructed query has a reasonable population of profile matches before integrating it into a larger application.
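
One way to pin down which criterion is the restrictive one is to drop each criterion in turn and compare the match counts. The Python sketch below illustrates that idea under stated assumptions: count_matches is a stand-in for however you obtain the count (for example, by reading the preview number in the Query Builder), and the profiles and field values are toy placeholders so the example runs without any API calls.

```python
def find_restrictive_criteria(criteria, count_matches):
    """Report how the match count changes when each criterion is dropped.

    `count_matches` is a stand-in for however you obtain the match count,
    for example by reading the preview number in the Query Builder.
    """
    baseline = count_matches(criteria)
    report = []
    for i, criterion in enumerate(criteria):
        remaining = criteria[:i] + criteria[i + 1:]
        report.append((criterion, count_matches(remaining) - baseline))
    # Largest gain first: dropping that criterion frees up the most matches.
    return baseline, sorted(report, key=lambda item: item[1], reverse=True)


# Toy in-memory "dataset" so the sketch runs without any API calls.
PROFILES = [
    {"job_company_name": "amazon", "location_locality": "seattle", "skills": "javascript"},
    {"job_company_name": "amazon", "location_locality": "seattle", "skills": "python"},
    {"job_company_name": "amazon", "location_locality": "seattle", "skills": "java"},
    {"job_company_name": "amazon", "location_locality": "austin", "skills": "javascript"},
]


def toy_count(criteria):
    """Count toy profiles matching every (field, value) pair."""
    return sum(all(p.get(f) == v for f, v in criteria) for p in PROFILES)


criteria = [
    ("job_company_name", "amazon"),
    ("location_locality", "seattle"),
    ("skills", "javascript"),
]
baseline, report = find_restrictive_criteria(criteria, toy_count)
print("matches with all criteria:", baseline)
for criterion, gain in report:
    print(criterion, "-> dropping it adds", gain, "matches")
```

In this toy data, the skills criterion comes out on top, since removing it recovers the most profiles. With the Query Builder itself, the same comparison can be done by hand by removing one row at a time and resubmitting.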

PDL Dataset Exploration

Additionally, being able to preview the number of match results for a query affords you the ability to investigate the coverage and breadth of PDL datasets. By using the Query Builder to construct custom queries and preview the number of matches, you can view population sizes for the specific demographic matching your query criteria. This gives you some additional insight into the PDL datasets, and another avenue to explore whether the data quality suits your intended applications. If you need more help understanding whether PDL data will work for your use case, we have a dedicated team of Data Consultants and technical experts ready to assist you. Drop us a line!

Retrieving Profiles

Underneath the number of profile matches is another display area containing the auto-generated code for your submitted query. In order to view the actual profiles matching your constructed query, you will need to run the auto-generated code shown in this display area.

Auto-generated Code

Auto-generated code for the submitted query is displayed. Here the Elasticsearch query (JSON) is displayed for our query seeking “software engineers in San Francisco”.

Code is generated in a variety of languages for the submitted query, which can be accessed by the tabs above the code display. 

Raw

The default tab titled “Raw” displays the raw Elasticsearch query based on your submitted query. Elasticsearch queries are written in JSON format, so you can take this query and either directly copy it into your application code or save it to a file and load it programmatically (although usually people just do the former).
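
Both options can be sketched in Python as follows. The query shown here is a hand-written placeholder rather than actual output from the tool, and the file location is hypothetical.

```python
import json
import os
import tempfile

# Option 1: paste the JSON from the "Raw" tab into your code as a string.
# The query below is a hand-written placeholder, not actual tool output.
RAW_QUERY = """
{
  "query": {
    "bool": {
      "must": [
        {"term": {"job_company_name": "google"}}
      ]
    }
  }
}
"""
query_from_string = json.loads(RAW_QUERY)

# Option 2: save the JSON to a file and load it at runtime.
path = os.path.join(tempfile.mkdtemp(), "query.json")  # hypothetical location
with open(path, "w", encoding="utf-8") as f:
    f.write(RAW_QUERY)
with open(path, encoding="utf-8") as f:
    query_from_file = json.load(f)

# Both routes produce the same Python dictionary.
print(query_from_string == query_from_file)
```

Keeping the query in a file can be handy when the same query is shared between several scripts, while pasting it inline keeps a one-off script self-contained.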

cURL

The next tab titled “cURL” displays an auto-generated cURL command which can be used to run the query and pull the matching profiles. Unlike the “Raw” syntax, this auto-generated code can be run directly from a terminal. The command can pull down up to the first 100 profile matches (by changing the “size” parameter in the command before running it). You can run the code and retrieve up to 100 profile results as follows:

  1. Switch to the cURL tab

  2. Copy the auto-generated code in the display area

  3. Paste the code into a terminal

  4. Change the value of the size parameter by picking a number between 1 and 100 (default: 100)

  5. Replace the text “<INSERT YOUR API KEY>” with your API key

  6. Run the command and you should see the matching profiles returned 

The results returned will be full profile records from our Person Dataset. For more information on the structure and content of a person profile record, see our documentation.
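
For readers who prefer to stay in Python, the same request can be assembled along these lines. This is only a sketch: the endpoint URL, the X-Api-Key header name, the parameter names, and the helper build_search_request are our assumptions for illustration and should be checked against the auto-generated command. The function only builds the request and does not send it.

```python
import json

# Assumed endpoint; check it against the auto-generated command.
SEARCH_URL = "https://api.peopledatalabs.com/v5/person/search"


def build_search_request(es_query, api_key, size=100):
    """Assemble the pieces of a Person Search request without sending it."""
    if not 1 <= size <= 100:
        raise ValueError("size must be between 1 and 100")
    if not api_key or api_key == "<INSERT YOUR API KEY>":
        raise ValueError("replace the placeholder with your API key")
    return {
        "url": SEARCH_URL,
        "headers": {"X-Api-Key": api_key},  # assumed header name
        "params": {"query": json.dumps(es_query), "size": size},
    }


es_query = {"query": {"bool": {"must": [{"term": {"job_company_name": "google"}}]}}}
request = build_search_request(es_query, api_key="my-secret-key", size=10)
print(request["url"])
print(request["params"]["size"])
```

If the requests library is installed, the result could then be sent with requests.get(request["url"], headers=request["headers"], params=request["params"]). Keep in mind that sending the request retrieves profiles and therefore uses API credits.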

Python

The last tab titled “Python” displays an auto-generated Python script that can pull all available profile records matching the submitted query. Unlike the cURL command, this auto-generated Python script does not have any restriction on the number of profiles that can be pulled, so you can retrieve the entire demographic slice for your query. As a result, please keep in mind that it can easily use up a large amount of API credits if you are not careful (this code is based on our Bulk Retrieval Person Search example, so the same warnings apply here). By default, the number of profiles retrieved is set to 150. You can run the Python script as follows:

  1. Switch to the Python tab

  2. Copy the auto-generated code in the display area

  3. Paste the code into a text file

  4. Paste your API key between the double quotes in this line: API_KEY = ""

  5. Set the fields you would like saved for each profile by updating the list of “csv_headers”

  6. Save the file with the “.py” extension

  7. Run the file using your Python interpreter - you should see the matching profiles returned

The output of this Python script will be a CSV file containing all the profile matches for the submitted query. And just like that you have downloaded your own custom demographic slice of our Person Dataset!
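
To give a feel for what such a script does, here is a minimal Python sketch of the general pattern: pull results page by page up to a cap, then write the selected fields to CSV. This is not the auto-generated script itself. The fetch_page callable, its (size, token) signature, and the stub data are hypothetical, chosen so the example runs offline and spends no credits.

```python
import csv
import io


def collect_profiles(fetch_page, max_profiles=150, page_size=100):
    """Page through results until `max_profiles` are collected or none remain.

    `fetch_page(size, token)` is a hypothetical callable that returns
    (profiles, next_token); in a real script it would wrap the API call.
    """
    profiles, token = [], None
    while len(profiles) < max_profiles:
        size = min(page_size, max_profiles - len(profiles))
        page, token = fetch_page(size, token)
        profiles.extend(page)
        if not page or token is None:
            break  # no more matches to pull
    return profiles


def write_csv(profiles, csv_headers, out):
    """Write only the fields named in `csv_headers`, one row per profile."""
    writer = csv.DictWriter(out, fieldnames=csv_headers, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(profiles)


# Stub data source so the sketch runs offline; every record is a placeholder.
FAKE_RESULTS = [{"id": str(i), "full_name": f"person {i}", "extra": "x"} for i in range(230)]


def fake_fetch(size, token):
    """Return one page of placeholder records and the next offset (or None)."""
    start = token or 0
    page = FAKE_RESULTS[start:start + size]
    end = start + len(page)
    return page, (end if end < len(FAKE_RESULTS) else None)


profiles = collect_profiles(fake_fetch, max_profiles=150)
buffer = io.StringIO()
write_csv(profiles, ["id", "full_name"], buffer)
print(len(profiles), "profiles written")
```

The max_profiles cap plays the same role as the default limit mentioned above: raising it retrieves more of the demographic slice, and with a real API call behind fetch_page it would also use more credits.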

Wrapping Up

Now that we’ve walked through the Query Builder, let’s quickly recap some of the key highlights. We can use the Query Builder to generate simple Elasticsearch queries based on our own custom demographic criteria. Each time we submit a query, the Query Builder will display the number of matching profiles as well as several auto-generated code samples that we can use to retrieve the actual profiles. 

As we said before, we often hear that our code examples make it very easy to get up and running with our APIs, and now you have the ability to construct your own examples customized for your specific use case! Not only is the Query Builder a great tool for learning and getting started with designing queries, it also provides an easy way to explore our data coverage with your specific applications in mind. We encourage you to sign up and try out our new Query Builder today! As always, we are happy to help however possible, so please feel free to reach out and share your thoughts and questions with us.



Vinay Rajur