Person Stats

Terminology

Person (API) Dataset: Our collection of every profile record that we have published and made available for use. These records can contain null values for any of their fields.

Field: An attribute associated with each record in our dataset, as listed in the Person Schema. Each record in our Person Dataset contains all the fields in the Person Schema, although any of those fields can hold a null value.

Slice Dataset: A subset of our Person Dataset that contains every record with a non-null value for a specific field. For example, our Email Slice Dataset contains every record from our Person Dataset with at least one non-null email address.

Description

We refer to our full dataset of person profiles as our Person Dataset or our API Dataset. This dataset contains every person record that we have been able to confidently produce through our data ingestion and build process. However, records in our dataset are not guaranteed to have every field populated and, in general, can contain null values. There are two reasons for this: we require high confidence before merging records or inferring missing field values, and we want to present the data as authentically as possible, with minimal modification.

📘

Null Values and Frankenstein Profiles

Given the volume of data we take in, we often have many raw records with data on the same person. While we spend significant engineering resources working on linkages, it's not unusual to end up with 3-4 profiles of disparate information that relate to the same person.

While we could link these together (for example, based on name), this would create many false positive linkages. We call the results Frankenstein profiles: records in which data on multiple people has been combined, making the record unusable or even application-breaking. Frankenstein profiles are bad, and we are extremely vigilant about their presence in our datasets.
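The trade-off can be illustrated with a toy example. This is a minimal sketch, not our actual linkage algorithm: the records, the field names (`name`, `email`, `city`, `phone`), and the `group_by` helper are all hypothetical, and a real build process weighs far more signals than a single key.

```python
from collections import defaultdict

# Hypothetical raw records. Records 1 and 2 share an email address;
# record 3 shares only a (common) name with them.
raw = [
    {"id": 1, "name": "Jane Smith", "email": "jane@acme.example", "city": "Austin"},
    {"id": 2, "name": "Jane Smith", "email": "jane@acme.example", "phone": "555-0100"},
    {"id": 3, "name": "Jane Smith", "email": "jsmith@other.example", "city": "Boston"},
]


def group_by(records, key):
    """Group record ids by the value of `key`, skipping null values."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(key)
        if value is not None:
            groups[value].append(record["id"])
    return dict(groups)


# Loose linkage: everything named "Jane Smith" collapses into one profile,
# which risks combining two different people (a Frankenstein profile).
by_name = group_by(raw, "name")
# {"Jane Smith": [1, 2, 3]}

# Stricter linkage on a stronger identifier: records 1 and 2 merge,
# record 3 stays separate even if it is in fact the same person.
by_email = group_by(raw, "email")
# {"jane@acme.example": [1, 2], "jsmith@other.example": [3]}
```

The stricter key leaves more unmerged profiles behind, but it avoids mixing data from different people into one record.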

Because we opted for a strict linkage algorithm in our data build process, we also decided to define multiple use-case-specific slices of data (subsets of our full API dataset in which every record is guaranteed to have a value for a particular field). For example, our email slice dataset is a subset of our API dataset containing every record that has a non-null email address.

We have found that the vast majority of use cases are served quite well by this slice dataset structure, since there are typically only a few key linkages that our customers care about for their use case. The slice approach means that we are not forced to merge records or infer missing information where we have low confidence in the original data, and it also helps ensure a lower rate of duplication than in the full API dataset. Customers interested in accessing the unmerged records still have the option to do so by using the full API dataset instead of one of the slices.
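Conceptually, a slice is just a filter over the full dataset. The sketch below is illustrative only: the `has_value` and `slice_dataset` helpers are hypothetical, the sample records and field names (`full_name`, `emails`, `github_url`) are made up for the example, and treating an empty list or empty string as null is an assumption rather than a documented rule.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def has_value(record: Record, field: str) -> bool:
    """Return True if `field` is non-null in `record`.

    Assumption: empty strings and empty collections count as null.
    """
    value = record.get(field)
    if value is None:
        return False
    if isinstance(value, (str, list, tuple, set, dict)):
        return len(value) > 0
    return True


def slice_dataset(records: Iterable[Record], field: str) -> List[Record]:
    """Keep only the records that have a value for `field`."""
    return [record for record in records if has_value(record, field)]


# Hypothetical sample of the full dataset, with nulls left in place.
people = [
    {"full_name": "a b", "emails": ["a@example.com"], "github_url": None},
    {"full_name": "c d", "emails": [], "github_url": "github.com/cd"},
    {"full_name": "e f", "emails": None, "github_url": None},
]

email_slice = slice_dataset(people, "emails")          # only "a b" qualifies
developer_slice = slice_dataset(people, "github_url")  # only "c d" qualifies
```

Records that qualify for a slice are returned unchanged, so they can still carry null values in every other field.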

List of Slice Datasets

All Dataset

All Records Have: Name AND One Other Piece of PII

Number of Profiles: 3,178,815,044
Main Use Cases: Enrichment
Detailed Stats

Resume Slice

All Records Have: LinkedIn URL

Number of Profiles: 744,191,278
Main Use Cases: Candidate Search, Prospect Search, Custom Audiences, Career Path Prediction/Labor Force Modeling, Investment Sourcing
Detailed Stats

Email Slice

All Records Have: Email

Number of Profiles: 842,934,945
Main Use Cases: Email Enrichment, Sales Lead Generation, Candidate Outreach
Detailed Stats

Street Address Slice

All Records Have: Street Address

Number of Profiles: 230,581,392
Main Use Cases: Contact Info Enrichment, Sales and Marketing, Skiptracing, Background Checks / People Search
Detailed Stats

Consumer Social Slice

All Records Have: Facebook URL

Number of Profiles: 706,324,853
Main Use Cases: Contact Info Enrichment, Sales and Marketing, Fraud, Background Checks / People Search
Detailed Stats

Mobile Phone Slice

All Records Have: Mobile Phone Number

Number of Profiles: 477,819,832
Main Use Cases: Direct Dial Outreach, Caller ID
Detailed Stats

Phone Slice

All Records Have: Any Phone Number

Number of Profiles: 984,597,596
Main Use Cases: Skiptracing, Background Checks / People Search
Detailed Stats

Developer Slice

All Records Have: GitHub URL

Number of Profiles: 3,116,626
Main Use Cases: Recruiting, Investment Sourcing
Detailed Stats
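To compare slice sizes, it can help to express each profile count as a share of the full dataset. The sketch below uses the counts listed above and assumes the All Dataset row corresponds to the full Person Dataset; the `PROFILE_COUNTS` table and the `coverage` helper are names made up for this example.

```python
# Profile counts copied from the list of slice datasets above.
PROFILE_COUNTS = {
    "All Dataset": 3_178_815_044,
    "Resume Slice": 744_191_278,
    "Email Slice": 842_934_945,
    "Street Address Slice": 230_581_392,
    "Consumer Social Slice": 706_324_853,
    "Mobile Phone Slice": 477_819_832,
    "Phone Slice": 984_597_596,
    "Developer Slice": 3_116_626,
}


def coverage(slice_name: str, base: str = "All Dataset") -> float:
    """Return the slice's profile count as a fraction of the base dataset."""
    return PROFILE_COUNTS[slice_name] / PROFILE_COUNTS[base]


if __name__ == "__main__":
    # Print each dataset with its count and its share of the All Dataset.
    for name, count in PROFILE_COUNTS.items():
        print(f"{name:<24}{count:>15,}{coverage(name):>8.1%}")
```

A single record can belong to several slices (for example, a profile with both an email address and a phone number), so these shares overlap and are not expected to sum to 100%.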