Person Stats

Terminology

Person (All) Dataset: Our collection of every profile record that we have published and made available for use. These records can contain null values for any of their fields.

Field: The attributes associated with each record in our dataset, as listed in the Person Schema. Each record in our Person Dataset contains all the fields in the Person Schema. However, in general, these records can have null values for their fields.

Dataset: A subset of our Person Dataset that contains every record with a non-null value for a specific field. For example, our Email Dataset contains every record from our Person Dataset with at least one non-null email address.
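The definition above can be sketched as a simple filter. This is a minimal illustration, not our build process: the records, the field names (full_name, emails, mobile_phone), and the choice to treat empty lists and strings as null are all assumptions made for the example.

```python
from typing import Any

# Hypothetical records; the field names are illustrative assumptions,
# not a statement of what the Person Schema contains.
person_dataset: list[dict[str, Any]] = [
    {"full_name": "ada example", "emails": ["ada@example.com"], "mobile_phone": None},
    {"full_name": "bo example", "emails": [], "mobile_phone": "+15555550100"},
    {"full_name": "cy example", "emails": None, "mobile_phone": None},
]


def has_value(value: Any) -> bool:
    """Return False for None and for empty lists, dicts, and strings.

    Treating empty containers as null is an assumption of this sketch.
    """
    if value is None:
        return False
    if isinstance(value, (list, dict, str)):
        return len(value) > 0
    return True


def field_dataset(records: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Return every record that has a non-null value for `field`."""
    return [record for record in records if has_value(record.get(field))]


# Each field-specific dataset is a subset of the full dataset.
email_dataset = field_dataset(person_dataset, "emails")
mobile_dataset = field_dataset(person_dataset, "mobile_phone")
```

A record can appear in several field-specific datasets at once, and a record with nulls in every filtered field (the third one here) appears only in the full dataset.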

Description

We refer to our full dataset of person profiles as our Person Dataset or our All Dataset. This dataset contains every person record that we have been able to confidently produce through our data ingestion and build process. However, records in this dataset are not guaranteed to have every field populated and can, in general, contain null values. There are two reasons for this: we require high confidence before merging records or inferring missing field values, and we want to present the data as authentically as possible, with minimal modification.

Null Values and Frankenstein Profiles

Given the volume of data we take in, we often have many raw records with data on the same person. While we spend significant engineering resources working on linkages, it's not unusual to end up with 3-4 profiles of disparate information that relate to the same person.

While we could link these together (for example, based on name), this would create many false positive linkages. We call the results Frankenstein profiles: records in which data on multiple people has been combined, making the record unusable or even application-breaking. Frankenstein profiles are bad, and we are extremely vigilant about their presence in our datasets.
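To make the risk concrete, here is a toy sketch of how a loose linkage key can combine different people into one profile. It is not our linkage algorithm: the records, the link helper, and the use of "name plus email" as a stand-in for a stricter rule are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records. The second one plausibly describes a different
# person who happens to share the same name.
raw_records = [
    {"name": "jane doe", "email": "jane@acme.example", "city": "austin"},
    {"name": "jane doe", "email": "jdoe@other.example", "city": "boston"},
    {"name": "jane doe", "email": "jane@acme.example", "phone": "+15555550101"},
]


def link(records, key_fields):
    """Group records that agree on every field in `key_fields`."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        groups[key].append(record)
    return list(groups.values())


# Loose rule: name only. All three records land in one group, so merging
# that group would mix two cities and two email addresses in one profile.
by_name = link(raw_records, ["name"])

# Stricter rule (an assumed stand-in): name and email must both match.
# The records stay split into two groups instead of one merged profile.
by_name_and_email = link(raw_records, ["name", "email"])
```

The stricter key leaves more separate profiles with more null fields, which is the trade-off described above: fewer false merges at the cost of less complete individual records.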

In opting for a strict linkage algorithm in our data build process, we also decided to define multiple use-case-specific subsets of the All Dataset, which we call Datasets. For example, our Email Dataset is a subset of our All Dataset containing every record that has a non-null email address.

This approach means that we are not forced to merge records or infer missing information where we have low confidence in the original data. It also helps ensure a lower rate of duplication in the subset datasets than in the All Dataset. Customers interested in accessing the unmerged records still have the option to do so by using the All Dataset instead of one of the subset datasets.

List of Datasets

All Dataset

All Records Have: Name AND One Other Piece of PII

Number of Profiles: 2,462,569,045
Main Use Cases: Enrichment
Detailed Stats

Consumer Social Dataset

All Records Have: Facebook URL

Number of Profiles: 706,303,705
Main Use Cases: Contact Info Enrichment, Sales and Marketing, Fraud, Background Checks // People Search
Detailed Stats

Developer Dataset

All Records Have: GitHub URL

Number of Profiles: 3,111,096
Main Use Cases: Recruiting, Investment Sourcing
Detailed Stats

Email Dataset

All Records Have: Email

Number of Profiles: 637,553,882
Main Use Cases: Email Enrichment, Sales Lead Generation, Candidate Outreach
Detailed Stats

Mobile Phone Dataset

All Records Have: Mobile Phone Number

Number of Profiles: 587,408,521
Main Use Cases: Direct Dial Outreach, Caller ID
Detailed Stats

Phone Dataset

All Records Have: Any Phone Number

Number of Profiles: 794,524,574
Main Use Cases: Background Checks // People Search
Detailed Stats

Resume Dataset

All Records Have: LinkedIn URL in the Profiles Array

Number of Profiles: 724,455,525
Main Use Cases: Candidate Search, Prospect Search, Custom Audiences, Career Path Prediction/Labor Force Modeling, Investment Sourcing
Detailed Stats

Street Address Dataset

All Records Have: Street Address

Number of Profiles: 230,815,842
Main Use Cases: Contact Info Enrichment, Sales and Marketing, Skiptracing, Background Checks // People Search
Detailed Stats
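
As a quick way to compare the datasets listed above, the following sketch computes each dataset's share of the All Dataset. The profile counts are copied from the list; the variable names and the output layout are choices made for this example only.

```python
# Profile counts as listed above.
profile_counts = {
    "All": 2_462_569_045,
    "Consumer Social": 706_303_705,
    "Developer": 3_111_096,
    "Email": 637_553_882,
    "Mobile Phone": 587_408_521,
    "Phone": 794_524_574,
    "Resume": 724_455_525,
    "Street Address": 230_815_842,
}

total = profile_counts["All"]

# Fraction of the All Dataset covered by each dataset.
shares = {name: count / total for name, count in profile_counts.items()}

# Print the datasets from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<16} {profile_counts[name]:>13,}  {share:6.1%}")
```

The shares do not sum to 100% because the subset datasets overlap: the seven subset counts add up to more than the All Dataset total, so many records must belong to more than one subset.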