April 2024 Release Announcement (v26.0)

v26.0 was released on April 2, 2024.

Welcome to our April 2024 release notes! One quarter into the new year and we have a ton of exciting updates to share!

Here are some of the key highlights:

Excited yet? Read on to learn more, or jump to a specific section using the table of contents below.

Table of Contents

📣 Key Announcements

❗Breaking Changes

🚀 Data Updates

🛠 Improvements and Bug Fixes


📣 Key Announcements

Schema Changes

Rename person.gender to person.sex (Person Schema)

Note that this field rename is a breaking change. Please see the Breaking Changes section for previous announcements and additional information.

As a reminder, with the v26.0 release, we are renaming the person.gender field to person.sex in the Person Schema. The output of the field will remain the same, as shown in the example record below:

Example PDL Record - v26.0

  "id": "qEnOZ5Oh0poWnQ1luFBfVw_0000",
  "full_name": "sean thorne",
  "first_name": "sean",
  "middle_initial": "f",
  "middle_name": "fong",
  "last_initial": "t",
  "last_name": "thorne",
  "sex": "male",  -> renamed from gender
...

This change is required to demonstrate adherence to legislative changes that define aspects of gender as sensitive personal data (which PDL does not process or output).
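If your code reads this field from stored records that may predate the rename, a tolerant accessor can smooth the transition. The sketch below assumes records are parsed JSON dictionaries; get_sex is a hypothetical helper, not part of any PDL SDK.

```python
from typing import Any, Mapping, Optional


def get_sex(record: Mapping[str, Any]) -> Optional[str]:
    """Return the sex value from a Person record of any version.

    Prefers the v26.0 field name ("sex") and falls back to the
    pre-v26.0 name ("gender"); returns None if neither is present.
    """
    if "sex" in record:
        return record["sex"]
    return record.get("gender")


v25_record = {"id": "qEnOZ5Oh0poWnQ1luFBfVw_0000", "gender": "male"}
v26_record = {"id": "qEnOZ5Oh0poWnQ1luFBfVw_0000", "sex": "male"}

print(get_sex(v25_record))
print(get_sex(v26_record))
```

Once all of your stored data has been refreshed with v26.0 records, the fallback can be removed and the code can read "sex" directly.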

For help moving over to the new field, please reach out to your Customer Success team for support and enablement resources. Please also see this easy-to-follow guide prepared by our Technical Services team for instructions on how to transition to this new schema:

Breaking Change Guide: Field Rename from Gender to Sex

New Resume Timestamps (Person Schema)

This change is associated with a deprecation of our current job_last_updated field in the Person Schema as part of the July 2024 (v27.0) release. See the Deprecation announcement for additional details.

This quarter, we are excited to announce the launch of two new fields in our Person Schema: job_last_changed and job_last_verified.

| Field Name | Data Type | Field Description | Example |
|---|---|---|---|
| job_last_changed | String (Date) | The timestamp that reflects when the top-level job information last changed. | "job_last_changed": "2023-10-04" |
| job_last_verified | String (Date) | The timestamp that reflects when the top-level job information was last validated by a data source. | "job_last_verified": "2024-01-05" |

These new fields contain timestamps associated with the top-level job on a profile (i.e. the most current experience) and provide additional clarity and granularity on the freshness of a person’s current work experience. They are now included in all Person data records, and are immediately available to all PDL users who have access to our job information.

These two timestamps are intended to replace the existing timestamp field, job_last_updated, which will be deprecated in v27.0. Any customers currently using the job_last_updated field should transition to the new job_last_changed and job_last_verified fields over the next quarter.
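As an illustration of how the two timestamps can be read together, the sketch below computes how many days have passed since the top-level job last changed and since it was last verified. It assumes both values are YYYY-MM-DD strings, as in the examples above, and treats a missing field as "no timestamp available"; parse_pdl_date and job_freshness are hypothetical helpers.

```python
from datetime import date
from typing import Any, Dict, Mapping, Optional


def parse_pdl_date(value: Optional[str]) -> Optional[date]:
    """Parse a YYYY-MM-DD string (assumed format); return None if missing."""
    return date.fromisoformat(value) if value else None


def job_freshness(record: Mapping[str, Any], today: date) -> Dict[str, Optional[int]]:
    """Days since the top-level job last changed and was last verified."""
    changed = parse_pdl_date(record.get("job_last_changed"))
    verified = parse_pdl_date(record.get("job_last_verified"))
    return {
        "days_since_changed": (today - changed).days if changed else None,
        "days_since_verified": (today - verified).days if verified else None,
    }


record = {"job_last_changed": "2023-10-04", "job_last_verified": "2024-01-05"}
print(job_freshness(record, today=date(2024, 4, 2)))
```

Keeping the two values separate, rather than collapsing them into one date, preserves the distinction the new fields are meant to capture: a job can be old (changed long ago) yet freshly confirmed (verified recently).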

For support transitioning off of the job_last_updated field and onto our newly released resume timestamp fields, please reach out to your Customer Success team. Please also see this easy-to-follow guide prepared by our Technical Services team for additional guidance:

Breaking Changes Guide: Deprecation of job_last_updated

Employee Count By Role Fields (Company Schema)

We are excited to share that two new fields were added to our Company Schema as of our v25.2 release:

Salesforce Integration

These fields are now also live in the PDL Salesforce Integration.

| Field Name | Data Type | Field Description |
|---|---|---|
| employee_count_by_role | Object | The number of employees (INT) by Job Role on the final day of the most recent month. |
| employee_growth_rate_12_month_by_role | Object | The twelve month rate of change (FLOAT) by Job Role on the final day of the most recent month. |
Examples

employee_count_by_role
"employee_count_by_role": {
    "real_estate": 0,
    "design": 2,
    "trades": 0,
    "marketing": 4,
    "education": 4,
    "legal": 0,
    "customer_service": 10,
    "finance": 6,
    "public_relations": 1,
    "engineering": 24,
    "human_resources": 3,
    "media": 1,
    "sales": 12,
    "operations": 10,
    "health": 0
  }
employee_growth_rate_12_month_by_role

This field uses the same role keys as employee_count_by_role (real_estate, design, trades, and so on); each value is the twelve-month rate of change (FLOAT) for that role rather than a headcount.

These fields give customers quick access to the most recent department and role headcounts for companies, without the need to un-nest this information from our insights data. Customers using our Salesforce Integration may find them especially valuable, since role tags and department growth rates can now be assigned to customer accounts directly within the integration.
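Because employee_count_by_role is a flat object keyed by role, common questions such as "which departments are largest?" need no un-nesting. A minimal sketch, assuming the field has been parsed into a Python dict; role_shares and top_roles are hypothetical helpers, and the input is an abbreviated version of the example above.

```python
from typing import Dict, List, Tuple


def role_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of total headcount in each role (0.0 for every role if empty)."""
    total = sum(counts.values())
    if total == 0:
        return {role: 0.0 for role in counts}
    return {role: n / total for role, n in counts.items()}


def top_roles(counts: Dict[str, int], k: int = 3) -> List[Tuple[str, int]]:
    """The k roles with the largest headcount, largest first (ties keep input order)."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:k]


# Abbreviated copy of the employee_count_by_role example.
employee_count_by_role = {
    "engineering": 24, "sales": 12, "customer_service": 10,
    "operations": 10, "finance": 6, "marketing": 4, "health": 0,
}
print(top_roles(employee_count_by_role))
print(round(role_shares(employee_count_by_role)["engineering"], 3))
```

The same pattern applies to employee_growth_rate_12_month_by_role, for example sorting by growth rate to find the fastest-growing departments.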

Both of these new fields have been added to the existing Premium and Comprehensive Company Data Bundles and are immediately available to customers with these bundles.

Role and Sub_Role Updates (Person Schema)

In our October release (v28.0), we will be making significant changes to our job_title_role and job_title_sub_role enum values in order to improve our tag fill rates and refine the categories we use to represent job titles. We’ll post a formal breaking change notice and updated canonical values alongside the v27.0 release.

Open Feedback Solicitation

We are currently soliciting feedback on our existing taxonomy and a draft of the new taxonomy. If you’d like to get a preview and/or give feedback on the taxonomy please reach out to your Customer Success Manager.


❗Breaking Changes

❗Rename person.gender to person.sex

Previous Announcements: v24 / October 2023, v25 / January 2024

We have renamed the gender field to sex in the Person Schema. The output will remain the same: we output the biological sex associated with a profile, not the person’s gender as defined in applicable legislation. This change is required to demonstrate adherence to legislative changes defining aspects of gender as sensitive personal data (which People Data Labs does not process or output).

Example PDL Record - v26.0

  "id": "qEnOZ5Oh0poWnQ1luFBfVw_0000",
  "full_name": "sean thorne",
  "first_name": "sean",
  "middle_initial": "f",
  "middle_name": "fong",
  "last_initial": "t",
  "last_name": "thorne",
  "sex": "male",  -> renamed from gender
...

For help moving over to the new field, please reach out to your Customer Success team for support and enablement resources. Please also see this easy-to-follow guide prepared by our Technical Services team for instructions on how to transition to this new schema:

Breaking Change Guide: Field Rename from Gender to Sex

❗Company ID Format Changes

Previous Announcements: v25 / January 2024

Change to Format
While the field name (id) and data format (string) remain the same as before, PDL’s Company IDs will now have an alphanumeric hash format similar to our Person IDs.

The Company ID for the People Data Labs record

| v25.2 | v26.0 |
|---|---|
| "id": "peopledatalabs" | "id": "tnHcNHbCv8MKeLh92946LAkX6PKg" |

Old ID Shortcomings
For v25.2 and prior releases, the Company ID for each company record was generated from the company’s most recent LinkedIn URL. This made it unreliable as an ID for updating and managing profiles over time: LinkedIn URLs can change when a company changes its name, companies can edit their LinkedIn URL at any point, and old LinkedIn URLs may be reused by new companies in the future.

Benefits of the New IDs
While the new Company IDs are not persistent, they are designed to change less often than the previous format, which was derived from LinkedIn URL slugs. In addition, because the new Company IDs are generated independently of LinkedIn URLs, we can now add companies to our dataset that do not have an associated LinkedIn profile.

Handling the Changes
There are a few things we’ve planned to help make the transition as smooth as possible:

The ID field name and datatype are the same as before

  • Neither the field name (“id”) nor the datatype (string) of the ID field is changing, so any queries, joins, or other code references to that field should continue to function as they did previously.

The old ID exists in a new field called linkedin_slug, which is still used in enrichment matching

  • If you’ve stored past company IDs and would like to use them as Company Enrichment inputs, the old ID still exists under the new field name linkedin_slug, generated using exactly the same logic as our old IDs.
  • In addition, as of the v26.0 release, Company Enrichment queries using the pdl_id field will match against either the id or the linkedin_slug field to help maintain backwards compatibility.
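If you keep a local copy of company records and some of your own tables still hold pre-v26.0 IDs, you can give local lookups the same kind of backward compatibility that Company Enrichment now provides by indexing each record under both its id and its linkedin_slug. A minimal sketch, assuming records are parsed JSON dictionaries; build_company_index is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, Mapping


def build_company_index(
    records: Iterable[Mapping[str, Any]],
) -> Dict[str, Mapping[str, Any]]:
    """Index company records by the v26.0 id and by the old-format linkedin_slug."""
    index: Dict[str, Mapping[str, Any]] = {}
    for record in records:
        index[record["id"]] = record
        slug = record.get("linkedin_slug")
        if slug:
            # Do not let a slug overwrite a record already stored under that key.
            index.setdefault(slug, record)
    return index


records = [
    {
        "name": "people data labs",
        "id": "tnHcNHbCv8MKeLh92946LAkX6PKg",
        "linkedin_slug": "peopledatalabs",
    },
]
index = build_company_index(records)

# A stored v25.2 ID and the new v26.0 ID resolve to the same record.
print(index["peopledatalabs"]["name"])
print(index["tnHcNHbCv8MKeLh92946LAkX6PKg"]["name"])
```

Because LinkedIn slugs can change or be reused over time, as described above, a slug match is best treated as a migration aid rather than a permanent key.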

Mapping of v26.0 Company ID to LinkedIn Slug (prior ID format)

Only for v26.0

This mapping is a one-time file that we have created specifically for the v26.0 release. We will not be maintaining this file in future releases.

Please reach out to your Customer Success team for access to this resource and additional support material.

For v26.0, we have created a mapping of Company IDs to LinkedIn Slug that we can provide to users to support their transition to the new Company IDs.

The format for this file is:

| display_name | linkedin_slug | id |
|---|---|---|
| People Data Labs | peopledatalabs | tnHcNHbCv8MKeLh92946LAkX6PKg |
| Google | google | aKCIYBNF9ey6o5CjHCCO4goHYKlf |
| ... | ... | ... |
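Once you have the mapping file, re-keying stored IDs is a dictionary lookup. The sketch below assumes the file is delivered as CSV with the three columns shown above (the actual delivery format may differ); load_slug_to_id and remap_ids are hypothetical helpers, and "some-old-slug" is a made-up ID used to show the unmapped case.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def load_slug_to_id(csv_text: str) -> Dict[str, str]:
    """Map linkedin_slug -> v26.0 id from CSV text with display_name,linkedin_slug,id columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["linkedin_slug"]: row["id"] for row in reader}


def remap_ids(
    old_ids: Iterable[str], slug_to_id: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Translate stored pre-v26.0 IDs; return (mapped old->new, unmapped old IDs)."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for old_id in old_ids:
        if old_id in slug_to_id:
            mapped[old_id] = slug_to_id[old_id]
        else:
            unmapped.append(old_id)
    return mapped, unmapped


sample = (
    "display_name,linkedin_slug,id\n"
    "People Data Labs,peopledatalabs,tnHcNHbCv8MKeLh92946LAkX6PKg\n"
    "Google,google,aKCIYBNF9ey6o5CjHCCO4goHYKlf\n"
)
mapping = load_slug_to_id(sample)
print(remap_ids(["peopledatalabs", "google", "some-old-slug"], mapping))
```

For a real file, you would open it with open(path, newline="") and pass the file object to csv.DictReader instead of wrapping a string in io.StringIO. Reviewing the unmapped list is worthwhile, since the mapping is a one-time file and will not be maintained in future releases.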

PDL Record - v25.2

  "name": "people data labs",
  "id": "peopledatalabs",  -> linkedin_slug format in v25.2
  "linkedin_url": "linkedin.com/company/peopledatalabs",
  "linkedin_slug": "peopledatalabs"

PDL Record - v26.0

  "name": "people data labs",
  "id": "tnHcNHbCv8MKeLh92946LAkX6PKg",  -> alphanumeric format in v26.0
  "linkedin_url": "linkedin.com/company/peopledatalabs",
  "linkedin_slug": "peopledatalabs"

⚠️ Upcoming Breaking Changes

There are upcoming breaking changes in future versions that may impact your current processes. We are announcing them here to provide ample time for you to adjust your processes accordingly.

⚠️ Snowflake Schema Standardization

Change expected in: v26.1 / May 2024
Previous Announcements: v24 / October 2023, v25 / January 2024, v25.1 / February 2024

As announced in our previous release notes (linked above), we will be standardizing our Snowflake Person and Company Schemas in May 2024 (v26.1). As a reminder, this will be a breaking change to our existing Snowflake schema.

To prepare for this transition, we strongly encourage our Snowflake customers to follow the steps below:

  1. Make a copy of your current data after your April 2024 delivery. This way you not only have a backup, but can also compare new to old after you switch over.
  2. Review the new standard schemas, which are included in the Resources section below as well as here.
  3. Prepare any script changes to your existing processes before the switch in May 2024.

For any questions or help transitioning to these new schemas, please reach out to your Customer Success Manager.
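For step 3, one way to scope the script changes you need is to compare the column names your current processes use against the new standard schema. A minimal sketch, assuming you have exported both column lists (for example from Snowflake's INFORMATION_SCHEMA.COLUMNS view) and that all identifiers are unquoted, so Snowflake treats them case-insensitively; diff_columns and the column names in the example are hypothetical.

```python
from typing import Dict, Iterable, Set


def diff_columns(current: Iterable[str], standard: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare two column-name lists case-insensitively (unquoted identifiers assumed)."""
    cur = {name.upper() for name in current}
    std = {name.upper() for name in standard}
    return {
        # Columns your scripts reference today that the standard schema lacks.
        "missing_in_standard": cur - std,
        # Columns in the standard schema that your scripts do not use yet.
        "new_in_standard": std - cur,
    }


# Hypothetical column names, for illustration only.
result = diff_columns(
    ["ID", "FULL_NAME", "LEGACY_COLUMN"],
    ["id", "full_name", "new_column"],
)
print(result)
```

Anything in missing_in_standard is a likely candidate for a script change before the May 2024 switch.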

The Standard Person and Company Schemas that will be used for Snowflake deliveries are available here.

⚠️ Deprecation of job_last_updated

Change expected in: v27.0 / July 2024

Following the release of our new resume timestamps this quarter, we will be deprecating our existing job_last_updated field in the July 2024 (v27.0) release. The new timestamps provide more granularity and clarity than job_last_updated and help resolve ambiguity about the freshness of a person’s current work experience.

Any customers currently using our job_last_updated field will need to migrate to the new job_last_verified and job_last_changed fields before v27.0.

For help moving over to the new fields, please reach out to your Customer Success and Technical Services teams for support and enablement resources. Please also see this easy-to-follow guide prepared by our Technical Services team for instructions on how to transition to the new fields:

Breaking Changes Guide: Deprecation of job_last_updated


🚀 Data Updates

Freshness

This quarter, we updated hundreds of millions of job and location records in our Global Resume Dataset. See below for details:

| Dataset | Geography | Field | Records Updated |
|---|---|---|---|
| Resume | Global | experience | 199,381,717 |
| Resume | Global | location | 298,777,846 |
| Resume | United States | experience | 53,538,566 |
| Resume | United States | location | 82,270,299 |

Coverage (Full Stats: Person, Company)

Resume Dataset

| Linkage | Coverage in v25 | Coverage in v26 | Change (%) |
|---|---|---|---|
| total_records | 794,313,831 | 744,191,278 | -6.31% |
| mobile_phone | 17,666,371 | 53,088,015 | 200.50% |
| phone_numbers | 44,226,279 | 69,139,249 | 56.33% |
| education.gpa | 4,563,280 | 8,757,491 | 91.91% |
| education.summary | 29,679,966 | 49,868,050 | 68.02% |
| education.majors | 141,539,272 | 179,725,851 | 26.98% |
| education.degrees | 126,248,489 | 156,797,783 | 24.20% |
| birth_date | 7,971,504 | 8,821,954 | 10.67% |

API Dataset

| Linkage | Coverage in v25 | Coverage in v26 | Change (%) |
|---|---|---|---|
| total_records | 3,225,330,100 | 3,178,815,044 | -1.44% |
| mobile_phone | 478,067,684 | 513,257,630 | 7.36% |
| phone_numbers | 1,125,062,185 | 1,157,207,007 | 2.86% |
| education.gpa | 5,548,676 | 9,730,993 | 75.38% |
| education.summary | 29,954,669 | 50,039,000 | 67.05% |
| education.majors | 171,610,325 | 209,162,040 | 21.88% |
| education.degrees | 139,634,548 | 169,748,056 | 21.57% |


Company Dataset

| Linkage | Coverage in v25 | Coverage in v26 | Change (%) |
|---|---|---|---|
| total_records | 61,297,152 | 62,109,427 | 1.33% |
| funding_details | 221,107 | 226,302 | 2.35% |
| all_subsidiaries | 40,994 | 42,496 | 3.66% |
| direct_subsidiaries | 40,908 | 42,393 | 3.63% |
| ultimate_parent | 112,785 | 116,367 | 3.18% |
| immediate_parent | 111,943 | 115,394 | 3.08% |
| alternative_domains | 4,504,007 | 4,504,390 | 0.01% |
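The percentage column in the coverage tables is the relative change from v25 to v26 coverage, (v26 - v25) / v25 × 100. A minimal sketch that reproduces a few of the reported figures; pct_change is an illustrative helper.

```python
def pct_change(old: int, new: int) -> float:
    """Percentage change from old to new coverage."""
    return (new - old) / old * 100


# Resume Dataset, total_records (v25 -> v26)
print(round(pct_change(794_313_831, 744_191_278), 2))  # -6.31
# Resume Dataset, mobile_phone
print(round(pct_change(17_666_371, 53_088_015), 2))  # 200.5
# Company Dataset, funding_details
print(round(pct_change(221_107, 226_302), 2))  # 2.35
```

A negative value, such as total_records in the Resume Dataset, indicates a decrease in coverage.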

Commentary

  • We saw significant improvements (over 200% increase) in our coverage of mobile phones tied to LinkedIn profiles due to new data partnerships.
  • We rebuilt our school dataset and ingested new sources of education data, resulting in significant increases in coverage of our education-related fields, including Degrees, Summaries, Majors, and GPAs.
  • We decreased our total records in the resume data slice by ~6% as a result of improvements to our deduplication logic and improved QA of low-quality data sources.
  • We increased our linkages between LinkedIn profiles and birth dates by over 10% in our resume dataset.

🛠 Improvements and Bug Fixes

Improvements

  • We made significant updates to our IP data and matching to improve reliability and accuracy
  • We decreased the number of Frankenstein records (records that incorrectly combine data from different people) in our person dataset by over 13%, and improved our work email quality by removing inferred emails contributed by some of our data sources
  • We improved the accuracy of our alternative_domains field by removing a low-quality data source, with no impact on our overall fill rate in the company dataset
  • We reduced the number of duplicate company tags present in our company records

Bug Fixes

  • Fixed a bug in our Autocomplete API where autocompletion using the region field was not returning location-based metadata
  • We fixed a bug in our changelog process that was erroneously creating multiple changelog records in certain cases. These multiple records are now stored in an array in the to field of a changelog record.
  • We changed our inferred_years_experience logic to place less emphasis on education when we have a detailed job history with start and end dates, increasing the overall accuracy of this value.
  • We fixed a bug where certain degree abbreviations were being added into the name fields in a person profile
  • Resolved unexpected Company Enrichment matching behavior when the website input was actually an email domain, such as sbcglobal.com or sbcglobal.net.