v4 (Schema 4) to v5 (Schema 5) Migration
As you may have read in the most recent release notes, we are deprecating our v4 schema (Schema 4) in the API and moving entirely to our v5 schema (Schema 5). This article describes the differences, similarities, and pros and cons of the v4 and v5 data schemas to help ease the transition.
Since we created v5 in July 2020, we have developed new features exclusively in v5, including new API endpoints, better field-level metadata, and new fields. We’ve also made certain persistence commitments around v5, and the schema is flatter and easier to query.
FAQs:
What's the deal with the Schemas?
Schema 4 is being deprecated, which includes the schema 4 API endpoint (https://api.peopledatalabs.com/v4/person). The vast majority of our customers have already switched to schema 5.
The schema 5 endpoint has been running for over a year and has earned praise from customers for its simpler, flatter schema, which is easier to query and easier to manage. This is a net win for customers; however, you must convert your API code to schema 5 as soon as possible to ensure a smooth transition of service. Continue reading for the additions and removals in schema 5 that may introduce breaking changes.
Support for schema 4 ended in July 2021, and bugs are no longer being fixed. Migrations to schema 5 must be completed urgently, by Oct 31, 2021.
PLEASE NOTE: Schema 4 is End of Life
Because schema 4 support ended in July 2021, if the schema 4 API endpoint suffers a failure on or after November 1, 2021, any customer software based on that endpoint will cease functioning. In any event, the schema 4 API endpoint will be taken down by the end of Q4 2021 at the latest.
Where can I get instructions on migrating to v5?
These docs are a good first step. Feel free to also reference our v5 field list or consult with a technical services resource through your Customer Success Manager or by emailing [email protected].
Can I review the schema files for v4 and v5?
You can view sample profiles with data from the API v4 Sample Profile and v5 Sample Profile.
You can get the standard schema for v5 here.
Is there an increased cost for v5?
There is no increased cost for v5. For data license customers, simply make a v5 request to Customer Success for your next data license delivery. For API accounts, simply switch your API endpoint from https://api.peopledatalabs.com/v4/ to https://api.peopledatalabs.com/v5/.
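For API accounts, the endpoint switch can be sketched as a small URL rewrite. This is only an illustration: the helper name to_v5_url is hypothetical, and it assumes your code builds request URLs from the v4 base shown above. It changes only the path prefix; request parameters and response parsing still need the field changes described in the rest of this article.

```python
V4_BASE = "https://api.peopledatalabs.com/v4/"
V5_BASE = "https://api.peopledatalabs.com/v5/"


def to_v5_url(url: str) -> str:
    """Return the v5 equivalent of a v4 URL; leave any other URL unchanged."""
    if url.startswith(V4_BASE):
        # Keep everything after the versioned base (path and query string).
        return V5_BASE + url[len(V4_BASE):]
    return url


# Hypothetical usage: rewrite the v4 person endpoint mentioned earlier.
new_url = to_v5_url("https://api.peopledatalabs.com/v4/person")
```

Matching on the full base URL rather than on the bare string "/v4/" avoids rewriting unrelated URLs that happen to contain the same path segment.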
Are there any old data fields removed in v5 that were in v4?
All fields in v4 are represented in some way in v5, with a few exceptions:
- experience.most_recent, education.most_recent, location.most_recent, experience.type: All were deprecated in 2019 and are removed due to their obsolescence.
- names: Having multiple historic names was causing more questions than it answered. Instead, we take the former primary.name and use that as our only name values.
- phone_numbers and emails metadata: Fields like emails.sha256, phone_numbers.country_code, etc. are redundant since they can be generated from the email or phone string, respectively.
- locations.subregion: Had low adoption and caused bloat.
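Because derived metadata such as emails.sha256 is no longer delivered, it can be recomputed from the email string when needed. The following is a minimal sketch using Python's standard hashlib module. Trimming and lowercasing the address before hashing is an assumption made here, not a documented normalization, so compare the result against any hashes you stored from v4 before relying on it. The function name is hypothetical.

```python
import hashlib


def email_sha256(email: str) -> str:
    """Return the hex SHA-256 digest of an email string.

    Assumption: the address is trimmed and lowercased before hashing.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical usage with a placeholder address.
digest = email_sha256("Jane.Doe@example.com")
```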
Are all top-level fields also contained in v5 arrays with is_primary=true as v4 had?
In v4, the primary object contained the most relevant values. These values were also repeated in the name[], industry[], experience[], and location[] arrays and tagged as is_primary=true.
In v5:
- The v5 experience array contains all experience history, and the primary job (the one presented in the top-level job_ fields) is also presented in the experience array with the is_primary=true flag.
  - No other array retains the v4 is_primary=true flag.
- The v4 industries array is converted to the v5 industry string, showing the most relevant industry for the person.
- The primary object has been flattened into the top-level fields.
- The names array has been flattened as described in the previous section.
- The location array has been restructured as explained below.
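As an illustration of the first point, the primary job can be recovered from a v5 record by scanning the experience array for the is_primary flag. The sample record below is invented, and its nested company.name key is an assumption; only the experience array and the is_primary flag come from the description above.

```python
from typing import Optional


def primary_experience(profile: dict) -> Optional[dict]:
    """Return the experience entry flagged is_primary, or None if absent."""
    for entry in profile.get("experience") or []:
        if entry.get("is_primary"):
            return entry
    return None


# Hypothetical record, trimmed to the fields used here.
sample = {
    "experience": [
        {"company": {"name": "old co"}, "is_primary": False},
        {"company": {"name": "current co"}, "is_primary": True},
    ]
}
current = primary_experience(sample)
```

Returning None rather than raising keeps the helper safe for records that have no experience history at all.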
Are there any new data fields included in v5 that were not in v4?
Yes.
- job_start_date: Makes it easier to access the start date for a person’s current job.
- job_title_role / job_title_sub_role: Replace title.functions with a usable title taxonomy.
- location_metro, experience.company.location.metro, job_company_location_metro: Tag most US localities with a metro area, making it easier to query for people in a metro area (vs. a geographic city).
- job_company_ticker, job_company_type: Two new restricted fields which add more metadata to our companies.
- linkedin_username, linkedin_id, facebook_url, facebook_username, facebook_id, twitter_url, twitter_username, github_url, github_username, work_email, mobile_phone: We tagged and flattened a lot of PII to make it easier to query, use as a required parameter, or flatten into a CSV. All these values will continue to exist in profiles, emails, or phone_numbers as well.
- education.school.website vs. education.school.domain: We previously used top-level domains to identify schools. These exist in the domain field, but we now also identify children of that school domain in the website field (for example, haas.berkeley.edu vs. berkeley.edu).
Data Field Mapping Highlights
v4 field(s) | v5 field(s) | comments |
---|---|---|
birth_date_fuzzy | birth_year | |
primary.industry | industry | |
primary.job | job_ fields | We flattened the primary object to make it easier to ingest and query. Not all fields from the experience object are flattened, but all the most relevant fields are. |
primary.location | location_ fields | We flattened the primary object to make it easier to ingest and query. |
primary.name | full_name, first_name, last_name, middle_name | We flattened the primary object to make it easier to ingest and query. |
primary.linkedin | linkedin_url | We flattened the primary object to make it easier to ingest and query. |
primary.other_emails | Included in emails array | We flattened the primary object to make it easier to ingest and query. |
primary.personal_emails | personal_emails | We flattened the primary object to make it easier to ingest and query. |
primary.work_emails | work_email | We flattened the primary object to make it easier to ingest and query, and we select the most likely work email. |
experience.title.functions | experience.title.roles | We removed the functions field and replaced it with more useful tagging. |
emails | emails | Turned List[Object] into List[String] |
phone_numbers | phone_numbers | Turned List[Object] into List[String] |
locations[] | location_names , regions , countries , street_addresses | See Note Below. |
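A few rows of the table above can be sketched as a conversion function for code that still holds v4-shaped records. This is an illustration under stated assumptions, not the actual migration logic: the function name is hypothetical, and the keys "address" (inside v4 email objects) and "number" (inside v4 phone objects) are assumed names that this article does not confirm. Check them against the v4 sample profile before use.

```python
def flatten_v4(record: dict) -> dict:
    """Map a handful of v4 fields onto their v5 names, per the table above.

    Assumptions: v4 email objects store the string under "address" and
    v4 phone objects store it under "number".
    """
    primary = record.get("primary") or {}
    return {
        "industry": primary.get("industry"),
        "linkedin_url": primary.get("linkedin"),
        # List[Object] -> List[String]
        "emails": [e["address"] for e in record.get("emails") or []],
        "phone_numbers": [p["number"] for p in record.get("phone_numbers") or []],
    }


# Hypothetical v4-shaped record with placeholder values.
v4_record = {
    "primary": {"industry": "computer software", "linkedin": "linkedin.com/in/example"},
    "emails": [{"address": "a@example.com", "type": "personal"}],
    "phone_numbers": [{"number": "+15555550100"}],
}
v5_like = flatten_v4(v4_record)
```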
What happened to the location data?
We redesigned how we deal with locations. Before, we had a primary.location (which we’ve converted to the location_ fields) and a list of historic locations. The list of historic locations had a lot of empty fields and bloat. For example, a location object which was only a country had 9 empty fields alongside it. Instead, we divided all historic locations by the relevant categories for querying and access. For example, someone who used to live in Tuscaloosa, AL might have:
countries: ["united states"]
regions: ["alabama, united states"]
location_names: ["tuscaloosa, alabama, united states"]
street_addresses: [ {"name": "tuscaloosa, alabama, united states", "locality": "tuscaloosa", "region": "alabama", "metro": "tuscaloosa, alabama", "country": "united states", "continent": "north america", "street_address": "123 main street", "address_line_2": "apartment 42", "postal_code": "35405", "geo": "33.20,-87.56"}]
Or, if we didn’t have an address, they might only have the following, which reduces bloat when we have less information:
countries: ["united states"]
regions: ["alabama, united states"]
location_names: ["tuscaloosa, alabama, united states"]
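The regrouping described above can be sketched as follows. This is an illustration, not the actual conversion logic. It assumes each historic location object carries the name, region, and country keys shown in the street_addresses example, builds region strings as "region, country" to match the sample values, and treats an object as a street address only when it has a non-empty street_address key. The function name is hypothetical.

```python
def split_locations(locations: list) -> dict:
    """Group historic location objects into v5-style category lists."""
    out = {"countries": [], "regions": [], "location_names": [], "street_addresses": []}

    def add_unique(key: str, value) -> None:
        # Skip empty values; keep first-seen order without duplicates.
        if value and value not in out[key]:
            out[key].append(value)

    for loc in locations:
        country = loc.get("country")
        region = loc.get("region")
        add_unique("countries", country)
        if region and country:
            add_unique("regions", f"{region}, {country}")
        add_unique("location_names", loc.get("name"))
        if loc.get("street_address"):
            add_unique("street_addresses", loc)

    # Drop empty categories so sparse records stay small.
    return {key: values for key, values in out.items() if values}


# Hypothetical input mirroring the Tuscaloosa example, without an address.
tuscaloosa = {
    "name": "tuscaloosa, alabama, united states",
    "locality": "tuscaloosa",
    "region": "alabama",
    "country": "united states",
}
grouped = split_locations([tuscaloosa])
```

With no street address in the input, the result contains only the countries, regions, and location_names lists, which mirrors the shorter example above.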