Person (API) Dataset: Our collection of every profile record we have published and made available for use. These records can contain null values for any of their fields.
Field: An attribute associated with each record in our dataset, as listed in the Person Schema. Every record in our Person Dataset contains all the fields in the Person Schema, but any of those fields can be null.
Slice Dataset: A subset of our Person Dataset that contains every record with a non-null value for a specific field. For example, our Email Slice Dataset contains every record from our Person Dataset with at least one non-null email address.
Our full dataset of person profiles is referred to as our Person Dataset or our API Dataset. This dataset contains every record we have been able to confidently produce through our data ingestion and build process. However, records in our dataset are not guaranteed to have every field populated and can, in general, contain null values. There are two reasons for this: we require high confidence before merging records or inferring missing field values, and we want to present the data as authentically as possible, with minimal modification.
Null Values and Frankenstein Profiles
Given the volume of data we take in, we often have many raw records with data on the same person. While we spend significant engineering resources working on linkages, we often end up with three or four profiles of disparate information that relate to the same person.
While we could link these together based on name, for example, this would create many false positive linkages. We call the results frankenstein profiles: records in which data on multiple people is tied together, making the record unusable and even application-breaking. Frankenstein profiles are bad, and we are extremely vigilant about keeping them out of our datasets.
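To make the trade-off concrete, here is a minimal sketch that compares linking by name with linking by a stricter key. The records, the field names, and the choice of email as the strict key are all illustrative assumptions; this is not the linkage algorithm used in the Data Build Process.

```python
from collections import defaultdict

# Hypothetical raw records. Field names are illustrative, not the Person Schema.
raw_records = [
    {"id": "r1", "name": "john smith", "email": "john.smith@example.com", "city": "austin"},
    {"id": "r2", "name": "john smith", "email": None, "city": "boston"},
    {"id": "r3", "name": "john smith", "email": "john.smith@example.com", "city": None},
]


def link_by(records, key):
    """Group record ids by the value of `key`; records with a null key stay unlinked."""
    groups = defaultdict(list)
    unlinked = []
    for rec in records:
        value = rec.get(key)
        if value is None:
            unlinked.append([rec["id"]])
        else:
            groups[value].append(rec["id"])
    return list(groups.values()) + unlinked


# Linking on name alone merges all three records, even though r2 may be a
# different John Smith: a potential frankenstein profile.
by_name = link_by(raw_records, "name")

# Linking on a stricter key (email, as an assumption) merges only r1 and r3
# and leaves r2 as its own sparser profile.
by_email = link_by(raw_records, "email")

print(by_name)
print(by_email)
```

The stricter key yields more profiles with more null fields, which is the cost described above, but it avoids tying data on different people into one record.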
Having opted for a strict linkage algorithm in our Data Build Process, we also decided to define multiple use-case-specific slices of data: subsets of our full API dataset in which every record is guaranteed to have a value for a particular field. For example, our email slice dataset is a subset of our API dataset containing every record that has a non-null email address.
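The slice definition amounts to a filter over the full dataset. The sketch below assumes records are plain dictionaries; the field names (`emails`, `mobile_phone`) and the helper `build_slice` are hypothetical and chosen only for illustration.

```python
def build_slice(records, field):
    """Return the records whose `field` holds a value.

    None, an empty list, and an empty string are all treated as null here;
    that treatment is an assumption made for this sketch.
    """
    return [r for r in records if r.get(field) not in (None, [], "")]


# Hypothetical stand-in for the full Person (API) Dataset.
person_dataset = [
    {"full_name": "a", "emails": ["a@example.com"], "mobile_phone": None},
    {"full_name": "b", "emails": [], "mobile_phone": "+15555550100"},
    {"full_name": "c", "emails": None, "mobile_phone": None},
]

email_slice = build_slice(person_dataset, "emails")
mobile_slice = build_slice(person_dataset, "mobile_phone")

print(len(email_slice), len(mobile_slice))
```

Every record in a slice also exists in the full dataset, and a record with several populated fields can appear in more than one slice.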
We have found that the vast majority of use cases are served quite well by this slice dataset structure, since there are typically only a few key linkages that our customers care about for their use case. The slice approach means that we are not forced to merge and infer missing information where we have low confidence in the original data, and it also helps ensure a lower rate of duplication than in the full API dataset. Customers interested in accessing the unmerged records still have the option to do so by using the full API dataset instead of one of the slices.
| All Records Have | Number of Profiles | Main Use Cases |
|---|---|---|
| Name + One other piece of PII | 3,109,152,901 | Enrichment |
| LinkedIn URL | 710,998,512 | Candidate Search, Prospect Search, Custom Audiences, Career Path Prediction/Labor Force Modeling, Investment Sourcing |
| Email | 774,182,148 | Email Enrichment, Sales Lead Generation, Candidate Outreach |
| Street Address | 231,189,002 | Contact Info Enrichment, Sales and Marketing, Skiptracing, Background Checks / People Search |
| Facebook URL | 706,019,163 | Contact Info Enrichment, Sales and Marketing, Fraud, Background Checks / People Search |
| Mobile Phone Number | 441,860,164 | Direct Dial Outreach, Caller ID |
| Any Phone Number | 935,509,392 | Skiptracing, Background Checks / People Search |
| GitHub URL | 3,124,413 | Recruiting, Investment Sourcing |
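Using the profile counts listed above, a short calculation shows what share of all profiles each slice covers. This sketch assumes that the "Name + One other piece of PII" entry corresponds to the full Person Dataset; the listing does not state that explicitly.

```python
# Profile counts copied from the listing above.
profile_counts = {
    "Name + One other piece of PII": 3_109_152_901,  # assumed to be the full dataset
    "LinkedIn URL": 710_998_512,
    "Email": 774_182_148,
    "Street Address": 231_189_002,
    "Facebook URL": 706_019_163,
    "Mobile Phone Number": 441_860_164,
    "Any Phone Number": 935_509_392,
    "GitHub URL": 3_124_413,
}

full_dataset_size = profile_counts["Name + One other piece of PII"]

# Share of the (assumed) full dataset covered by each slice.
coverage = {name: count / full_dataset_size for name, count in profile_counts.items()}

for name, share in coverage.items():
    print(f"{name}: {share:.1%}")
```

The shares need not sum to 100%, since a single record can belong to several slices.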