Person Stats
Terminology
Person (All) Dataset: Our collection of every profile record that we have published and made available for use. These records can contain null values for any of their fields.
Field: An attribute associated with each record in our dataset, as listed in the Person Schema. Each record in our Person Dataset contains all the fields in the Person Schema. However, in general, any of these fields can hold a null value.
Dataset: A subset of our Person Dataset that contains every record with a non-null value for a specific field. For example, our Email Dataset contains every record from our Person Dataset with at least one non-null email address.
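To make the definition of a Dataset concrete, here is a minimal Python sketch of deriving a subset from a handful of records. The field names (`full_name`, `emails`, `github_url`), the sample records, and the choice to treat empty lists and empty strings as null are all assumptions made for illustration; they are not taken from the Person Schema or from our build process.

```python
from typing import Any


def has_value(value: Any) -> bool:
    """Return True when a field holds usable data.

    Assumption: None, empty strings, and empty containers all count as null.
    """
    if value is None:
        return False
    if isinstance(value, (str, list, tuple, dict, set)):
        return len(value) > 0
    return True


def build_dataset(records: list[dict], field: str) -> list[dict]:
    """Keep only the records whose `field` is non-null."""
    return [record for record in records if has_value(record.get(field))]


# Hypothetical records: every record carries every field, but fields may be null.
person_records = [
    {"full_name": "Ada Example", "emails": ["ada@example.com"], "github_url": None},
    {"full_name": "Ben Example", "emails": None, "github_url": "github.com/ben-example"},
    {"full_name": "Cy Example", "emails": [], "github_url": None},
]

email_dataset = build_dataset(person_records, "emails")
developer_dataset = build_dataset(person_records, "github_url")

print([record["full_name"] for record in email_dataset])
print([record["full_name"] for record in developer_dataset])
```

In this sketch a record can belong to several subsets at once, or to none of them, while still being part of the full collection.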
Description
We refer to our full dataset of person profiles as our Person Dataset or our All Dataset. This dataset contains every person record that we have been able to confidently produce through our data ingestion and build process. However, records in our dataset are not guaranteed to have every field populated and, in general, can contain null values. There are two reasons for this: we require high confidence before merging records or inferring missing field values, and we want to present the data as authentically as possible, with minimal modification.
Null Values and Frankenstein Profiles
Given the volume of data we take in, we often have many raw records with data on the same person. While we spend significant engineering resources working on linkages, it's not unusual to end up with 3-4 profiles of disparate information that relate to the same person.
While we could link these together (for example, based on name), doing so would create many false-positive linkages. We call the results Frankenstein profiles: records in which data on multiple people has been combined, making the record unusable and potentially application-breaking. We are extremely vigilant about keeping Frankenstein profiles out of our datasets.
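The trade-off between loose and strict linkage can be illustrated with a small Python sketch. It is not our linkage algorithm: the records are invented, and matching on an exact email address is only a stand-in for a strict rule.

```python
from collections import defaultdict

# Hypothetical raw records. Records 1 and 2 share an email address;
# record 3 shares only a (common) name with the other two.
raw_records = [
    {"id": 1, "name": "john smith", "email": "jsmith@example.com", "city": "Austin"},
    {"id": 2, "name": "john smith", "email": "jsmith@example.com", "phone": "555-0100"},
    {"id": 3, "name": "john smith", "email": "john.s@example.org", "city": "Boston"},
]


def group_by(records: list[dict], key: str) -> list[list[int]]:
    """Group record ids by the value of `key` and return the groups, sorted."""
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["id"])
    return sorted(groups.values())


# Loose rule: link on name alone. All three records collapse into one profile,
# which is a Frankenstein profile if record 3 describes a different person.
linked_by_name = group_by(raw_records, "name")

# Strict rule (illustrative stand-in): link only on an exact email match.
# Record 3 stays separate, possibly leaving two profiles for one person.
linked_by_email = group_by(raw_records, "email")

print(linked_by_name)
print(linked_by_email)
```

The strict rule accepts some duplication in exchange for never combining records on weak evidence, which mirrors the preference described above.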
In opting for a strict linkage algorithm in our data build process, we also decided to define multiple use-case-specific subsets of the All Dataset, which we call Datasets. For example, our Email Dataset is a subset of our All Dataset containing every record that has a non-null email address.
This approach means that we are not forced to merge records or infer missing information where we have low confidence in the original data. It also helps ensure a lower rate of duplication within each subset than in the All Dataset. Customers interested in accessing the unmerged records still have the option to do so by using the All Dataset instead of one of the subset Datasets.
List of Datasets
All Dataset
All Records Have: Name AND One Other Piece of PII
Number of Profiles: 2,472,221,285
Main Use Cases: Enrichment
Detailed Stats
Consumer Social Dataset
All Records Have: Facebook URL
Number of Profiles: 706,087,128
Main Use Cases: Contact Info Enrichment, Sales and Marketing, Fraud, Background Checks // People Search
Detailed Stats
Developer Dataset
All Records Have: GitHub URL
Number of Profiles: 3,103,444
Main Use Cases: Recruiting, Investment Sourcing
Detailed Stats
Email Dataset
All Records Have: Email
Number of Profiles: 636,228,660
Main Use Cases: Email Enrichment, Sales Lead Generation, Candidate Outreach
Detailed Stats
Mobile Phone Dataset
All Records Have: Mobile Phone Number
Number of Profiles: 486,453,297
Main Use Cases: Direct Dial Outreach, Caller ID
Detailed Stats
Phone Dataset
All Records Have: Any Phone Number
Number of Profiles: 793,359,963
Main Use Cases: Background Checks // People Search
Detailed Stats
Resume Dataset
All Records Have: LinkedIn URL in the Profiles Array
Number of Profiles: 735,903,933
Main Use Cases: Candidate Search, Prospect Search, Custom Audiences, Career Path Prediction/Labor Force Modeling, Investment Sourcing
Detailed Stats
Street Address Dataset
All Records Have: Street Address
Number of Profiles: 229,806,575
Main Use Cases: Contact Info Enrichment, Sales and Marketing, Skiptracing, Background Checks // People Search
Detailed Stats
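To compare the sizes listed above, the following Python sketch computes each subset's share of the All Dataset. The profile counts are copied from the list; the short dictionary keys and the `share_of_all` helper are names chosen for this example.

```python
# Profile counts as listed above.
ALL_PROFILES = 2_472_221_285

DATASET_PROFILES = {
    "Consumer Social": 706_087_128,
    "Developer": 3_103_444,
    "Email": 636_228_660,
    "Mobile Phone": 486_453_297,
    "Phone": 793_359_963,
    "Resume": 735_903_933,
    "Street Address": 229_806_575,
}


def share_of_all(count: int) -> float:
    """Fraction of the All Dataset covered by a subset of `count` profiles."""
    return count / ALL_PROFILES


# Print the subsets from largest to smallest with their share of all profiles.
for name, count in sorted(DATASET_PROFILES.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<16} {count:>13,} {share_of_all(count):6.1%}")
```

The subset counts add up to more than the All Dataset total, so the subsets necessarily overlap: a single profile with both an email address and a phone number, for example, is counted in each of the corresponding Datasets.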