Person (API) Dataset: Our collection of every profile record that we have published and made available for use. These records can contain null values for any of their fields.
Field: The attributes associated with each record in our dataset, as listed in the Person Schema. Each record in our Person Dataset contains all the fields in the Person Schema. However, in general, these records can have null values for their fields.
Slice Dataset: A subset of our Person Dataset that contains every record with a non-null value for a specific field. For example, our Email Slice Dataset contains every record from our Person Dataset with at least one non-null email address.
We refer to our full dataset of person profiles as our Person Dataset or our API Dataset. This dataset contains every person record that we have been able to confidently produce through our data ingestion and build process. However, records in our dataset are not guaranteed to have every field populated and, in general, can contain null values. There are two reasons for this: we require high confidence before merging records or inferring missing field values, and we want to present the data as authentically as possible, with minimal modification.
Null Values and Frankenstein Profiles
Given the volume of data we take in, we often have many raw records with data on the same person. While we spend significant engineering resources working on linkages, it's not unusual to end up with 3-4 profiles of disparate information that relate to the same person.
While we could link these together (for example, based on name), this would create many false-positive linkages. We call the results Frankenstein profiles: records in which data on multiple people has been combined, making the record unusable and potentially application-breaking. Frankenstein profiles are bad, and we are extremely vigilant about their presence in our datasets.
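To make the trade-off concrete, here is a minimal sketch that compares linking on name alone with linking on name plus a second identifier. It is an illustration only, not our actual linkage algorithm: the field names (`full_name`, `email`, `city`), the sample records, and the `link_by` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records. Field names and values are illustrative only.
RAW_RECORDS = [
    {"full_name": "john smith", "email": "jsmith@example.com", "city": "Austin"},
    {"full_name": "john smith", "email": None, "city": "Boston"},
    {"full_name": "john smith", "email": "jsmith@example.com", "city": None},
]

def link_by(records, key_fields):
    """Group records that share non-null values for every field in key_fields.

    A record that is missing any key field is never merged; it is kept as
    its own single-record profile.
    """
    groups = defaultdict(list)
    unlinked = []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if any(value is None for value in key):
            unlinked.append([record])
        else:
            groups[key].append(record)
    return list(groups.values()) + unlinked

# Name-only linkage merges all three records, even though the Austin and
# Boston records may describe different people (a potential Frankenstein profile).
naive_profiles = link_by(RAW_RECORDS, ["full_name"])

# Stricter linkage merges only the two records that also share an email,
# leaving the Boston record as a separate profile with a null email.
strict_profiles = link_by(RAW_RECORDS, ["full_name", "email"])

print(len(naive_profiles), len(strict_profiles))
```

In this sketch the stricter rule leaves two profiles for what may be one person, which is the kind of duplication and null-valued record described above. That is the cost accepted in exchange for not combining data on different people.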
In opting to have a strict linkage algorithm in our data build process, we also decided to define multiple use-case-specific slices of data (subsets of our full API dataset in which every record in that subset is guaranteed to have a value for a particular field). For example, our email slice dataset is a subset of our API dataset containing every record that has a non-null email address.
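Conceptually, a slice is just a filter over the full dataset. The sketch below shows that idea under stated assumptions: the field names (`emails`, `linkedin_url`), the sample records, and the `build_slice` helper are hypothetical and are not taken from the Person Schema or from our build process.

```python
from typing import Any

# Hypothetical person records. Field names are illustrative only.
PERSON_DATASET = [
    {"full_name": "Ada Example", "emails": ["ada@example.com"], "linkedin_url": None},
    {"full_name": "Bo Sample", "emails": [], "linkedin_url": "linkedin.com/in/bo-sample"},
    {"full_name": "Cy Placeholder", "emails": None, "linkedin_url": None},
]

def has_value(value: Any) -> bool:
    """Return True unless the value is None or an empty string/container."""
    if value is None:
        return False
    if isinstance(value, (str, list, tuple, dict, set)):
        return len(value) > 0
    return True

def build_slice(records, field):
    """Keep only the records that have a value for the given field."""
    return [record for record in records if has_value(record.get(field))]

email_slice = build_slice(PERSON_DATASET, "emails")
linkedin_slice = build_slice(PERSON_DATASET, "linkedin_url")

print([r["full_name"] for r in email_slice])
print([r["full_name"] for r in linkedin_slice])
```

Note that a record can appear in more than one slice, and a record with no value for any slice field (the third record here) appears only in the full dataset. Treating an empty list as null is an assumption made for this sketch.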
We have found that the vast majority of use cases are served quite well by this slice dataset structure, since there are typically only a few key linkages that our customers care about for their use case. The slice approach means that we are not forced to merge and infer missing information where we have low confidence in the original data, and it also helps ensure a lower rate of duplication than in the full API dataset. Customers interested in accessing the unmerged records still have the option to do so by using the full API dataset instead of one of the slices.
| All Records Have | Number of Profiles | Main Use Cases |
| --- | --- | --- |
| Name AND One Other Piece of PII | 3,139,601,100 | Enrichment |
| LinkedIn URL | 723,056,672 | Candidate Search, Prospect Search, Custom Audiences, Career Path Prediction/Labor Force Modeling, Investment Sourcing |
| Email | 777,937,198 | Email Enrichment, Sales Lead Generation, Candidate Outreach |
| Street Address | 231,420,363 | Contact Info Enrichment, Sales and Marketing, Skiptracing, Background Checks // People Search |
| Facebook URL | 706,009,617 | Contact Info Enrichment, Sales and Marketing, Fraud, Background Checks // People Search |
| Mobile Phone Number | 441,992,036 | Direct Dial Outreach, Caller ID |
| Any Phone Number | 935,497,025 | Skiptracing, Background Checks // People Search |
| GitHub URL | 3,124,026 | Recruiting, Investment Sourcing |