Exploring Employee Headcount Accuracy: Pitfalls and Solutions

September 6, 2024

How is Headcount Data Generated?

LinkedIn is frequently treated as the gold standard for company headcount data, thanks to its extensive database of user-generated profiles. But how does LinkedIn actually generate these headcount figures?

At a high level, the process involves two main steps:

  1. Matching User Profiles to Company Profiles: In the first step, LinkedIn uses the employment details from user profiles to match individuals to companies. This involves both direct matching, based on the company the user selected, and indirect matching, where LinkedIn infers the appropriate company profile from other information in the user’s profile.

  2. Aggregating Profiles: Once users are matched to companies, LinkedIn sums up the profiles to estimate total headcount. By also considering previous work experiences, LinkedIn is able to produce historical headcounts as well.
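
As a rough illustration of the aggregation step, the Python sketch below counts the distinct people whose matched work experience at a company covers a given date. The `Experience` record, its field names, and the `headcount_on` function are hypothetical, as is the sample data; this is a sketch of the general idea, not LinkedIn's implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Experience:
    """One work experience already matched to a company (hypothetical schema)."""
    person_id: str
    company_id: str
    start: date
    end: Optional[date] = None  # None means the role is still current


def headcount_on(experiences, company_id, as_of):
    """Count distinct people with a role at company_id that is active on as_of."""
    people = {
        e.person_id
        for e in experiences
        if e.company_id == company_id
        and e.start <= as_of
        and (e.end is None or e.end >= as_of)
    }
    return len(people)


# Made-up sample data
experiences = [
    Experience("p1", "acme", date(2020, 1, 1)),
    Experience("p2", "acme", date(2019, 6, 1), date(2021, 3, 31)),
    Experience("p3", "acme", date(2022, 2, 1)),
    Experience("p3", "other", date(2018, 1, 1), date(2022, 1, 31)),
]

current = headcount_on(experiences, "acme", date(2024, 9, 1))      # p1 and p3
historical = headcount_on(experiences, "acme", date(2019, 12, 1))  # p2 only
```

Because past roles carry start and end dates, the same function yields both current and historical figures simply by changing `as_of`.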

While this approach seems straightforward, it introduces several challenges and biases that can lead to inaccurate headcount figures. 

Let’s explore these issues in more detail.

Handling Low-Quality User Profiles

One significant source of error in LinkedIn’s headcount data is the inclusion of low-quality or incomplete profiles. Based on our own internal analysis, we’ve found that only 10% to 20% of the more than 1 billion profiles on LinkedIn contain complete and up-to-date employment information.

As an example, LinkedIn reports that Anthropic, a relatively young but fast-growing AI research organization, has a headcount of over 839 employees. But a closer look at these profiles reveals highly variable quality, with roughly 15% of them being relatively incomplete and generally irrelevant. As a result, many of the headcounts produced by LinkedIn have an inflationary bias, which is particularly relevant for popular and well-known companies.

In our own headcount estimation process, we try to address this issue by enforcing strict quality and completeness requirements on the profiles we use for computing headcounts. The result is that our estimates are slightly more conservative, but more realistic (which many of our users with direct access to different company headcounts have verified). 
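
A minimal sketch of what such a quality filter could look like is shown below. The `Profile` fields, the list of required fields, and the two-year freshness threshold are all illustrative assumptions, not the actual criteria used in production.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Profile:
    """Hypothetical profile record; field names are assumptions."""
    person_id: str
    company_id: Optional[str]
    job_title: Optional[str]
    start_date: Optional[date]
    last_updated: Optional[date]


def is_usable(profile, as_of, max_age_days=730):
    """Keep a profile only if key employment fields exist and it is recent."""
    required = (
        profile.company_id,
        profile.job_title,
        profile.start_date,
        profile.last_updated,
    )
    if any(value is None or value == "" for value in required):
        return False  # incomplete employment information
    # Assumed freshness rule: updated within max_age_days of as_of
    return (as_of - profile.last_updated).days <= max_age_days


def filtered_headcount(profiles, company_id, as_of):
    """Count only the profiles at company_id that pass the quality check."""
    return sum(
        1 for p in profiles if p.company_id == company_id and is_usable(p, as_of)
    )


# Made-up sample data
as_of = date(2024, 9, 1)
profiles = [
    Profile("p1", "acme", "Engineer", date(2021, 5, 1), date(2024, 6, 1)),
    Profile("p2", "acme", None, None, date(2024, 1, 15)),                 # incomplete
    Profile("p3", "acme", "Analyst", date(2015, 3, 1), date(2019, 2, 1)),  # stale
]

raw_count = sum(1 for p in profiles if p.company_id == "acme")
strict_count = filtered_headcount(profiles, "acme", as_of)
```

In this toy data the raw count is 3 while the filtered count is 1, which mirrors the direction of the bias described above: stricter requirements can only lower the number.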

Person-to-Company Matching Issues

A second challenge arises from LinkedIn’s person-to-company matching process. LinkedIn relies heavily on user input and text-based matching logic to figure out which company profiles a user is associated with. As a result, errors arise when users incorrectly select their employers as well as when LinkedIn’s text-based matchers assign profiles to the wrong company. 

A clear example of this is Railway, a cloud-infrastructure provider. LinkedIn reports a headcount of nearly 4,000 employees, despite the company’s self-reported range being only 11-50. The discrepancy stems from thousands of profiles of railroad workers being incorrectly mapped to the Railway organization. While this is possibly due to user error in some cases, the magnitude of the issue indicates a more systematic problem in the way LinkedIn matches users to companies.

Our solution to this challenge has been the development of our own company entity resolution system, which allows us to generate consistent identifiers for companies based on a varied combination of inputs (including company names, websites, locations and more). We use this system to determine whether a user’s profile contains enough information to reliably match it to a company. In cases where it does not, we exclude the profile from the matching process and therefore from the headcount calculation as well. Similar to the case above, this approach produces more conservative, yet more reliable headcounts.
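
To make the "enough information" idea concrete, here is a deliberately simplified sketch. It is not our production entity resolution system: the `Company` record, the exact-match comparison, the `min_signals` threshold, and the example domains are all assumptions chosen for illustration. The point is only that a profile matching on name alone is left unmatched rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    """Hypothetical company record used for matching."""
    company_id: str
    name: str
    domain: str
    locality: str


def normalize(text):
    """Lowercase and collapse whitespace; missing values become ''."""
    return " ".join(text.lower().split()) if text else ""


def resolve(name, domain, locality, companies, min_signals=2):
    """Return a company_id only when enough fields agree; otherwise None."""
    best_id, best_score, tie = None, 0, False
    for c in companies:
        # Each agreeing, non-empty field counts as one signal
        score = sum([
            normalize(name) != "" and normalize(name) == normalize(c.name),
            normalize(domain) != "" and normalize(domain) == normalize(c.domain),
            normalize(locality) != "" and normalize(locality) == normalize(c.locality),
        ])
        if score > best_score:
            best_id, best_score, tie = c.company_id, score, False
        elif score == best_score and score > 0:
            tie = True  # two candidates are equally plausible
    if best_score < min_signals or tie:
        return None  # not enough evidence: exclude from the headcount
    return best_id


# Made-up companies with placeholder domains
companies = [
    Company("railway-cloud", "Railway", "railway.example", "san francisco"),
    Company("metro-rail", "Metro Rail", "metrorail.example", "chicago"),
]

matched = resolve("Railway", "railway.example", None, companies)
unmatched = resolve("railway", None, None, companies)  # name only
```

Here `matched` resolves to `"railway-cloud"` because name and domain agree, while `unmatched` is `None`: a bare employer name of "railway" is a single signal and is dropped instead of being assigned to the cloud provider.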

Parent-Subsidiary Relationships

Finally, LinkedIn’s approach to handling parent and subsidiary relationships can also distort headcount estimates. For companies with subsidiary organizations, LinkedIn generally reports the total headcount at the parent company by including the headcounts across all its subsidiary organizations as well. While this is common practice (and even how most public companies choose to report their headcounts to the SEC), this approach can not only obscure the true size and growth trends of the parent organization, but also magnify the biases presented previously when summed up across each of the subsidiary organizations. 

An example here is the company LVMH, which LinkedIn reports as having over 146,000 employees across the parent and its many subsidiary organizations. But if we are interested in just the growth trends within the parent organization, it can be extremely tedious to isolate the headcount when reported this way. 

In contrast, we report headcounts for employees directly employed at the parent organization, which we believe provides a clearer picture of the true size and growth trends. Furthermore, since we also track the set of subsidiaries associated with a company, it is relatively easy to reconstruct the full headcount as well. 
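
The reconstruction is a simple roll-up over the subsidiary tree. The sketch below assumes two hypothetical inputs, a mapping of company ID to directly employed headcount and a mapping of company ID to its direct subsidiaries; the names and numbers are made up for illustration.

```python
def rollup_headcount(company_id, direct, subsidiaries):
    """Sum direct headcount for a company and all of its descendants."""
    total = 0
    stack = [company_id]
    seen = set()  # guards against cycles or duplicate links in the data
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        total += direct.get(current, 0)
        stack.extend(subsidiaries.get(current, []))
    return total


# Made-up example: one parent, two brands, one nested subsidiary
direct = {
    "parent": 1200,
    "brand_a": 8000,
    "brand_b": 5500,
    "brand_a_sub": 300,
}
subsidiaries = {
    "parent": ["brand_a", "brand_b"],
    "brand_a": ["brand_a_sub"],
}

parent_only = direct["parent"]
full_group = rollup_headcount("parent", direct, subsidiaries)
```

Keeping the direct figure and the subsidiary links separate means both views are available: `parent_only` is 1,200 and `full_group` is 15,000 in this example, whereas a pre-aggregated number cannot be split back apart.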

Conclusion

While LinkedIn provides a convenient source for headcount data, it is essential to be aware of its limitations. The key biases we’ve discussed include:

  • Inadequate filtering of low-quality profiles

  • Inaccurate person-to-company matching algorithms

  • Aggregation of headcounts across parent and subsidiary entities

In contrast, PDL’s approach includes stricter filtering of profiles, more reliable person-to-company matching, and separate accounting for parent and subsidiary organizations. Our approach was built up over years of experience and feedback from our customers, and we believe it produces more accurate and reliable headcount estimates that better reflect actual workforce trends in the market.

If you’d like to take a deeper dive into the details of our analysis and approach, check out the full whitepaper we published, which you can find here: Exploring Headcount Accuracy.

Vinay Rajur