v4 (Schema 4) to v5 (Schema 5) Migration

As you may have read in the most recent release notes, we are deprecating our v4 schema (Schema 4) in the API and moving entirely to our v5 schema (Schema 5). This article describes the differences and similarities between v4 and v5 of the data schema, along with their pros and cons, to help ease the transition.

Since we created v5 in July 2020, we have developed new features exclusively in v5, including new API endpoints, better field-level metadata, and new fields. We’ve made certain persistence commitments around v5, and the schema is flatter and easier to query.

FAQs:

What's the deal with the Schemas?

Schema 4 is being deprecated, which includes the schema 4 API endpoint (https://api.peopledatalabs.com/v4/person). The vast majority of our customers have already switched to schema 5.

The schema 5 endpoint has been running for over a year and has earned much praise from customers for its simpler, flatter schema, which is easier to query and manage. This is a net win for customers. However, you must convert your API code to schema 5 as soon as possible to ensure a smooth transition of service. Continue reading for the additions to and removals from schema 5 that may introduce breaking changes.

Support for schema 4 ended in July 2021, and bugs are no longer being fixed. Migrations to schema 5 must be completed urgently, by Oct 31, 2021.

❗️

PLEASE NOTE: Schema 4 is End of Life

Because schema 4 support ended in July 2021, if the schema 4 API endpoint suffers a failure on or after November 1, 2021, any customer software based on the schema 4 API endpoint will cease functioning. In any event, the schema 4 API endpoint will be taken down by the end of Q4 2021 at the latest.

Where can I get instructions on migrating to v5?

These docs are a good first step. Feel free to also reference our v5 field list or consult with a technical services resource through your Customer Success Manager or by emailing [email protected].

Can I review the schema files for v4 and v5?

You can view sample profiles with data from the API v4 Sample Profile and v5 Sample Profile.

You can get the standard schema for v5 here.

Is there an increased cost for v5?

There is no increased cost for v5. For data license customers, simply make a v5 request to Customer Success for your next data license delivery. For API accounts, simply switch your API endpoint from https://api.peopledatalabs.com/v4/ to https://api.peopledatalabs.com/v5/.
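
For API accounts, the endpoint switch described above can be sketched as a small URL rewrite. This is a minimal sketch, not an official migration tool: the helper name to_v5_url is hypothetical, and it only changes the path prefix. Code that parses responses still needs to be updated for the v5 schema changes described in this article.

```python
from urllib.parse import urlsplit, urlunsplit

V4_PREFIX = "/v4/"
V5_PREFIX = "/v5/"


def to_v5_url(url: str) -> str:
    """Rewrite a v4 API URL to its v5 equivalent (hypothetical helper).

    URLs whose path does not start with /v4/ are returned unchanged.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.startswith(V4_PREFIX):
        # Swap only the version prefix; keep the rest of the path and the query.
        path = V5_PREFIX + path[len(V4_PREFIX):]
    return urlunsplit(parts._replace(path=path))


print(to_v5_url("https://api.peopledatalabs.com/v4/person"))
```

Centralizing the base URL in one place like this makes it easier to confirm that no v4 calls remain once the migration is complete.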

Are there any old data fields removed in v5 that were in v4?

All fields in v4 are represented in some way in v5 with a few exceptions:

  • experience.most_recent, education.most_recent, location.most_recent, experience.type: All were deprecated in 2019 and have been removed as obsolete.
  • names: Having multiple historic names raised more questions than it answered. Instead, we take the former primary.name and use it as the only source of name values.
  • phone_numbers and emails metadata: Fields like emails.sha256 and phone_numbers.country_code are redundant, since they can be generated from the email or phone string respectively.
  • locations.subregion: Had low adoption and caused bloat.
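
If your integration relied on a removed metadata field such as emails.sha256, it can be regenerated from the email string. The sketch below uses Python's standard hashlib. The normalization step (trimming whitespace and lowercasing) is an assumption, not a documented rule, so compare the output against your stored v4 values before relying on it.

```python
import hashlib


def email_sha256(email: str) -> str:
    """Return the hex SHA-256 digest of an email string (hypothetical helper)."""
    # Assumption: hashes are computed over the trimmed, lowercased address.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


digest = email_sha256("Jane.Doe@example.com")
print(len(digest))  # a SHA-256 hex digest is always 64 characters
```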

Are all top-level fields also contained in v5 arrays with is_primary=true as v4 had?

In v4, the primary object contained the most relevant values. These values were also repeated in name[], industry[], experience[] and location[] arrays and tagged as is_primary=true.

In v5:

  • The v5 experience array contains all experience history. The primary job (the one presented in the top-level job_ fields) is also present in the experience array and carries the is_primary=true flag.
    • No other array retains the v4 is_primary=true flag.
  • The v4 industries array is converted to the v5 industry string, showing the most relevant industry for the person.
  • The primary object has been flattened into the top-level fields.
  • The names array has been flattened as described in the previous section.
  • The location array has been restructured as explained below.
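
Because experience is the only v5 array that keeps the is_primary flag, code that used to scan several arrays for is_primary=true can be reduced to a single lookup. The following is a minimal sketch: the helper name and the sample record are hypothetical, and the nested title.name key is an assumed shape used only for illustration.

```python
from typing import Optional


def primary_experience(record: dict) -> Optional[dict]:
    """Return the experience entry flagged is_primary, or None if absent."""
    for exp in record.get("experience") or []:
        if exp.get("is_primary"):
            return exp
    return None


# Hypothetical v5-style record; the field values are invented for illustration.
sample = {
    "job_title": "engineer",
    "experience": [
        {"title": {"name": "analyst"}, "is_primary": False},
        {"title": {"name": "engineer"}, "is_primary": True},
    ],
}

primary = primary_experience(sample)
print(primary["title"]["name"] if primary else None)
```

For most uses the top-level job_ fields are the simpler choice. The lookup is mainly useful when you need experience details that were not flattened.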

Are there any new data fields included in v5 that were not in v4?

Yes.

  • job_start_date: makes it easier to access the start date for a person’s current job
  • job_title_role, job_title_sub_role: Replace title.functions with a usable title taxonomy
  • location_metro, experience.company.location.metro, job_company_location_metro: Tag most US localities with a metro area, making it easier to query for people in a metro area (vs a geographic city)
  • job_company_ticker, job_company_type: Two new restricted fields which add more metadata to our companies.
  • linkedin_username, linkedin_id, facebook_url, facebook_username, facebook_id, twitter_url, twitter_username, github_url, github_username, work_email, mobile_phone: We tagged and flattened a lot of PII to make it easier to query, use as a required parameter, or flatten into a CSV. All these values will also continue to exist in profiles, emails, or phone_numbers.
  • education.school.website vs education.school.domain: We previously used top-level domains to identify schools. These remain in the domain field, but we now also identify children of that school domain in the website field (for example, haas.berkeley.edu vs berkeley.edu).

Data Field Mapping Highlights

| v4 field(s) | v5 field(s) | Comments |
| --- | --- | --- |
| birth_date_fuzzy | birth_year | |
| primary.industry | industry | |
| primary.job | job_ fields | We flattened the primary object to make it easier to ingest and query. Not all fields from the experience object are flattened, but all the most relevant fields are. |
| primary.location | location_ fields | We flattened the primary object to make it easier to ingest and query. |
| primary.name | full_name, first_name, last_name, middle_name | We flattened the primary object to make it easier to ingest and query. |
| primary.linkedin | linkedin_url | We flattened the primary object to make it easier to ingest and query. |
| primary.other_emails | Included in emails array | We flattened the primary object to make it easier to ingest and query. |
| primary.personal_emails | personal_emails | We flattened the primary object to make it easier to ingest and query. |
| primary.work_emails | work_email | We flattened the primary object to make it easier to ingest and query and select a most likely work email. |
| experience.title.functions | experience.title.roles | We removed the functions field and replaced it with more useful tagging. |
| emails | emails | Turned List[Object] into List[String] |
| phone_numbers | phone_numbers | Turned List[Object] into List[String] |
| locations[] | location_names, regions, countries, street_addresses | See Note Below. |
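
The List[Object] to List[String] change for emails and phone_numbers is the mapping most likely to break existing parsing code. The sketch below shows one way to convert stored v4-style data to the v5 shape. The helper name and the object keys address and number are assumptions made for illustration; substitute whichever key holds the string in your v4 data.

```python
from typing import Iterable, List, Optional


def objects_to_strings(items: Optional[Iterable[dict]], key: str) -> List[str]:
    """Collapse a list of objects into a list of the strings stored under `key`.

    Empty values are skipped and duplicates are dropped, preserving order.
    """
    out: List[str] = []
    for item in items or []:
        value = item.get(key)
        if value and value not in out:
            out.append(value)
    return out


# Hypothetical v4-style data; the key names are assumed for illustration.
v4_emails = [
    {"address": "jane@example.com", "type": "personal"},
    {"address": "jane@example.com", "type": None},
    {"address": "jdoe@example.org", "type": "professional"},
]
v4_phones = [{"number": "+15555550100"}]

v5_like = {
    "emails": objects_to_strings(v4_emails, "address"),
    "phone_numbers": objects_to_strings(v4_phones, "number"),
}
print(v5_like)
```

Dropping duplicates is a design choice of this sketch, since several v4 objects can carry the same string with different metadata.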

What happened to the location data?

We redesigned how we deal with locations. Before, we had a primary.location (which we’ve converted to the location_ fields) and a list of historic locations. The list of historic locations had a lot of empty fields and bloat. For example, a location object which was only a country had 9 empty fields alongside it. Instead, we divided all historic locations by the relevant categories for querying and access. For example, someone who used to live in Tuscaloosa, AL might have:

  • countries: ["united states"]
  • regions: ["alabama, united states"]
  • location_names: ["tuscaloosa, alabama, united states"]
  • street_addresses: [ {"name": "tuscaloosa, alabama, united states", "locality": "tuscaloosa", "region": "alabama", "metro": "tuscaloosa, alabama", "country": "united states", "continent": "north america", "street_address": "123 main street", "address_line_2": "apartment 42", "postal_code": "35405", "geo": "33.20,-87.56"}]

Or, if we didn’t have an address, they might only have the following, which reduces bloat when we have less information:

  • countries: ["united states"]
  • regions: ["alabama, united states"]
  • location_names: ["tuscaloosa, alabama, united states"]
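
Splitting historic locations into flat lists makes membership checks straightforward. The sketch below tests whether a record has ever been associated with a given country, region, or location name. The helper name is hypothetical, and the sample record mirrors the Tuscaloosa example above. It assumes the stored values are lowercase, as they are in that example.

```python
def has_location(record: dict, field: str, value: str) -> bool:
    """Check whether `value` appears in one of the v5 location lists.

    `field` is expected to be "countries", "regions", or "location_names".
    A missing or empty field is treated as no match.
    """
    return value.strip().lower() in (record.get(field) or [])


# Sample record based on the example above, without street_addresses.
record = {
    "countries": ["united states"],
    "regions": ["alabama, united states"],
    "location_names": ["tuscaloosa, alabama, united states"],
}

print(has_location(record, "regions", "Alabama, United States"))
print(has_location(record, "street_addresses", "123 main street"))
```

Treating a missing field as an empty list keeps the check safe for sparse records, such as the second example above, where street_addresses is absent.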