April 2024 Release Notes (v26.0)
Release Name | Dataset Version | Publish Date |
---|---|---|
April 2024 | v26.0 | 04/02/2024 |
This data version was released on 4/2/2024.
Welcome to our April 2024 release notes! One quarter into the new year and we have a ton of exciting updates to share!
Here are some of the key highlights:
- Significant improvements in our coverage of mobile phone numbers
- Improved data quality in our Person Dataset including removal of duplicate profiles and reduction of frankenstein records
- Better insight into job freshness with our new Resume Timestamp fields
- New employee count by role aggregations in our Company dataset
- An important breaking change to our person.gender field
- Significant updates to our IP data and matching logic to help with reliability and accuracy
- An open solicitation for customer feedback to improve our Role and Sub_Role tagging
- Over 199 million jobs and 298 million locations have been updated this past quarter!
Excited yet? Read on to learn more.
Note that this field rename is a breaking change - please see the Breaking Changes section for previous announcements and additional information.
As a reminder, with the v26.0 release, we are renaming the person.gender field to person.sex in the Person Schema. The output of the field will remain the same, as shown in the example record below:
Example PDL Record - v26.0
"id": "qEnOZ5Oh0poWnQ1luFBfVw_0000",
"full_name": "sean thorne",
"first_name": "sean",
"middle_initial": "f",
"middle_name": "fong",
"last_initial": "t",
"last_name": "thorne",
"sex": "male", -> renamed from gender
...
This change is required to demonstrate adherence with legislative changes defining aspects of gender as sensitive personal data (which PDL does not process or output).
For help moving over to the new field, please reach out to your Customer Success team for support and enablement resources. Please also see this easy-to-follow guide prepared by our Technical Services team for instructions on how to transition to this new schema:
Breaking Change Guide: Field Rename from Gender to Sex
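During the transition, a pipeline may hold records delivered both before and after v26.0. The following is a minimal sketch of one way to read the field regardless of version, assuming records have been parsed into Python dictionaries. The helper name `get_sex` is hypothetical and is not part of any PDL SDK.

```python
from typing import Any, Optional


def get_sex(record: dict[str, Any]) -> Optional[str]:
    """Return the renamed field's value, falling back to the old field name.

    Assumes `record` is a Person record parsed into a dict. Records from
    v26.0 onward carry "sex"; records stored from earlier versions carry
    "gender". The output values themselves are unchanged by the rename.
    """
    if "sex" in record:
        return record["sex"]
    return record.get("gender")


new_record = {"full_name": "sean thorne", "sex": "male"}     # v26.0 shape
old_record = {"full_name": "sean thorne", "gender": "male"}  # pre-v26.0 shape

print(get_sex(new_record))
print(get_sex(old_record))
```

Once all stored records have been refreshed to v26.0 or later, the fallback branch can be removed.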
New Resume Timestamps (Person Schema)
This change is associated with a deprecation of our current job_last_updated field in the Person Schema as part of the July 2024 (v27.0) release. See the Deprecation announcement for additional details.
This quarter, we are excited to announce the launch of two new fields in our Person Schema: job_last_changed and job_last_verified.
Field Name | Data Type | Field Description | Example |
---|---|---|---|
job_last_changed | String (Date) | The timestamp that reflects when the top-level job information changed. | "job_last_changed": "2023-10-04" |
job_last_verified | String (Date) | The timestamp that reflects when the top-level job information was last validated by a data source. | "job_last_verified": "2024-01-05" |
These new fields contain timestamps associated with the top-level job on a profile (i.e. the most current experience) and provide additional clarity and granularity on the freshness of a person’s current work experience. They are now included in all Person data records, and are immediately available to all PDL users who have access to our job information.
These two timestamps are intended to replace the existing timestamp field, job_last_updated, which will be deprecated in v27.0. Any customers currently using the job_last_updated field should transition to the new job_last_changed and job_last_verified fields over this next quarter.
For support transitioning off of the job_last_updated field and onto our newly released resume timestamp fields, please reach out to your Customer Success team. Please also see this easy-to-follow guide prepared by our Technical Services team for additional guidance:
Breaking Changes Guide: Deprecation of job_last_updated
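As a rough illustration of how the two timestamps might be consumed, the sketch below computes how many days have passed since the top-level job last changed and since it was last verified. It assumes records are parsed into Python dictionaries and that both fields use the "YYYY-MM-DD" format shown in the table above. The function names are hypothetical.

```python
from datetime import date
from typing import Any, Optional


def parse_date(value: Optional[str]) -> Optional[date]:
    """Parse a "YYYY-MM-DD" string; return None for missing values."""
    return date.fromisoformat(value) if value else None


def job_freshness(record: dict[str, Any], today: date) -> dict[str, Optional[int]]:
    """Days since the top-level job last changed and was last verified."""
    changed = parse_date(record.get("job_last_changed"))
    verified = parse_date(record.get("job_last_verified"))
    return {
        "days_since_changed": (today - changed).days if changed else None,
        "days_since_verified": (today - verified).days if verified else None,
    }


# Example values taken from the field table; "today" is the v26.0 publish date.
record = {"job_last_changed": "2023-10-04", "job_last_verified": "2024-01-05"}
freshness = job_freshness(record, today=date(2024, 4, 2))
print(freshness)  # {'days_since_changed': 181, 'days_since_verified': 88}
```

Here the job was verified more recently than it last changed, which suggests the current experience was confirmed by a data source after the last change was recorded.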
Employee Count By Role Fields (Company Schema)
We are excited to share that 2 new fields have been added to our company schema as of our v25.2 release:
Salesforce Integration
These fields are now also live in the PDL Salesforce Integration.
Field Name | Data Type | Field Description |
---|---|---|
employee_count_by_role | Object | The number of employees (INT) by Job Role on the final day of the most recent month. |
employee_growth_rate_12_month_by_role | Object | The twelve month rate of change (FLOAT) by Job Role on the final day of the most recent month. |
Examples
Field Name | Example |
---|---|
employee_count_by_role | "employee_count_by_role": { "real_estate": 0, "design": 2, "trades": 0, "marketing": 4, "education": 4, "legal": 0, "customer_service": 10, "finance": 6, "public_relations": 1, "engineering": 24, "human_resources": 3, "media": 1, "sales": 12, "operations": 10, "health": 0 } |
employee_growth_rate_12_month_by_role | "employee_count_by_role": { "real_estate": 0, "design": 2, "trades": 0, "marketing": 4, "education": 4, "legal": 0, "customer_service": 10, "finance": 6, "public_relations": 1, "engineering": 24, "human_resources": 3, "media": 1, "sales": 12, "operations": 10, "health": 0 } |
These fields provide quick access for our customers to the most recent department/role headcounts for companies without the need to un-nest this information from our insights data. Customers using our Salesforce Integration in particular may find these new fields especially valuable, making it possible to now assign role tags and department growth rates to customer accounts directly within the integration.
Both of these new fields have been added to the existing Premium and Comprehensive Company Data Bundles and are immediately available to customers with these bundles.
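To illustrate how the employee_count_by_role object might be used without un-nesting insights data, here is a small sketch that ranks roles by their share of the role-tagged headcount. It assumes the object has been parsed into a Python dictionary. The function name `role_shares` is hypothetical, and the total used is simply the sum of the role counts, which may differ from a company's overall employee count.

```python
def role_shares(employee_count_by_role: dict[str, int]) -> dict[str, float]:
    """Return each non-empty role's share of the summed role counts, largest first."""
    total = sum(employee_count_by_role.values())
    if total == 0:
        return {}
    ranked = sorted(employee_count_by_role.items(), key=lambda kv: kv[1], reverse=True)
    return {role: count / total for role, count in ranked if count > 0}


# Example object copied from the employee_count_by_role example above.
counts = {
    "real_estate": 0, "design": 2, "trades": 0, "marketing": 4, "education": 4,
    "legal": 0, "customer_service": 10, "finance": 6, "public_relations": 1,
    "engineering": 24, "human_resources": 3, "media": 1, "sales": 12,
    "operations": 10, "health": 0,
}

shares = role_shares(counts)
top_role = next(iter(shares))
print(top_role, round(shares[top_role], 3))
```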
Role and Sub_Role Updates (Person Schema)
In our October release (v28.0) we will be making significant changes to our job_title_role and job_title_sub_role enum values in order to improve our tag fill rates and improve the categories we use to represent titles. We’ll be posting a formal breaking change notice and updated canonical values alongside the v27.0 release.
Open Feedback Solicitation
We are currently soliciting feedback on our existing taxonomy and a draft of the new taxonomy. If you’d like to get a preview and/or give feedback on the taxonomy please reach out to your Customer Success Manager.
Previous Announcements: v24 / October 2023, v25 / January 2024
We have renamed the gender field to sex in the Person Schema. The output will remain the same. We output the biological sex of a profile, but not their gender as defined in applicable legislation. This change is required to demonstrate adherence with legislative changes defining aspects of gender as sensitive personal data (which People Data Labs does not process or output).
Example PDL Record - v26.0
"id": "qEnOZ5Oh0poWnQ1luFBfVw_0000",
"full_name": "sean thorne",
"first_name": "sean",
"middle_initial": "f",
"middle_name": "fong",
"last_initial": "t",
"last_name": "thorne",
"sex": "male", -> renamed from gender
...
For help moving over to the new field, please reach out to your Customer Success team for support and enablement resources. Please also see this easy-to-follow guide prepared by our Technical Services team for instructions on how to transition to this new schema:
Breaking Change Guide: Field Rename from Gender to Sex
Previous Announcements: v25 / January 2024
Change to Format
While the field name (id) and data format (string) remain the same as before, PDL’s Company IDs will now have an alphanumeric hash format similar to our Person IDs.
The Company ID for the People Data Labs record
v25.2 | v26.0 |
---|---|
"id": "peopledatalabs" | "id": "tnHcNHbCv8MKeLh92946LAkX6PKg" |
Old ID Shortcomings
For v25.2 and prior releases, the Company ID for each company record was generated from the profile’s most recent LinkedIn URL. This created barriers to serving as a reliable ID for updating and managing profiles over time, as LinkedIn URLs can change when a company changes their name, companies can edit their LinkedIn URL at any point, and old LinkedIn URLs may be reused by new companies in the future.
Benefits of the New IDs
While the new format Company IDs are not persistent IDs, they were designed in a manner intended to undergo fewer changes than the previous LinkedIn URL slug-defined format. In addition, since the new Company IDs are generated independently of LinkedIn URLs, we can now add companies to our dataset that do not have an associated LinkedIn profile.
Handling the Changes
There are a few things we’ve planned to help make the transition as smooth as possible:
The ID field name and datatype are the same as before
- Neither the field name (“id”) nor the datatype (string) of the ID field is changing, so any queries, joins, or other code references to that field should continue to function as they did previously.
The old ID exists in a new field called linkedin_slug, which is still used in enrichment matching
- If you’ve stored past company IDs and would like to use those as Company Enrichment inputs, the old ID field still exists under the new field name linkedin_slug, generated using the exact same logic as our old IDs.
- In addition, as of the v26.0 release, Company Enrichment queries using the pdl_id field will match against either the id or linkedin_slug field to help maintain backwards compatibility.
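The matching rule described above can be pictured with a small local sketch. This is only an illustration of the rule applied to records held in memory; it is not PDL's implementation, and the function name `find_company` is hypothetical.

```python
from typing import Any, Optional


def find_company(records: list[dict[str, Any]], pdl_id: str) -> Optional[dict[str, Any]]:
    """Return the first record whose id or linkedin_slug equals pdl_id."""
    for record in records:
        if pdl_id in (record.get("id"), record.get("linkedin_slug")):
            return record
    return None


# Record values taken from the People Data Labs example in these notes.
records = [
    {
        "name": "people data labs",
        "id": "tnHcNHbCv8MKeLh92946LAkX6PKg",
        "linkedin_slug": "peopledatalabs",
    }
]

by_new_id = find_company(records, "tnHcNHbCv8MKeLh92946LAkX6PKg")
by_old_id = find_company(records, "peopledatalabs")
print(by_new_id is by_old_id)  # True: both inputs resolve to the same record
```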
Mapping of v26.0 Company ID to LinkedIn Slug (prior ID format)
Only for v26.0
This mapping is a one-time file that we have created specifically for the v26.0 release. We will not be maintaining this file in future releases.
Please reach out to your Customer Success team for access to this resource and additional support material.
For v26.0, we have created a mapping of Company IDs to LinkedIn Slug that we can provide to users to support their transition to the new Company IDs.
The format for this file is:
display_name | linkedin_slug | id |
---|---|---|
People Data Labs | peopledatalabs | tnHcNHbCv8MKeLh92946LAkX6PKg |
Google | google | aKCIYBNF9ey6o5CjHCCO4goHYKlf |
... |
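As a sketch of how such a mapping might be applied, the example below translates stored old-format IDs (LinkedIn slugs) into v26.0 Company IDs. It assumes the mapping is available as a CSV with the three column names shown above; the actual delivery format of the file is not specified here, so treat the CSV layout and the function name `load_slug_to_id` as assumptions.

```python
import csv
import io

# Inline stand-in for the mapping file. The CSV layout is an assumption;
# the two rows are the examples shown in the table above.
MAPPING_CSV = """display_name,linkedin_slug,id
People Data Labs,peopledatalabs,tnHcNHbCv8MKeLh92946LAkX6PKg
Google,google,aKCIYBNF9ey6o5CjHCCO4goHYKlf
"""


def load_slug_to_id(handle) -> dict[str, str]:
    """Build a lookup from old ID (linkedin_slug) to new Company ID."""
    return {row["linkedin_slug"]: row["id"] for row in csv.DictReader(handle)}


slug_to_id = load_slug_to_id(io.StringIO(MAPPING_CSV))

# IDs stored from v25.2 or earlier; unmatched values map to None for review.
stored_ids = ["peopledatalabs", "google", "unknown-co"]
migrated = {old_id: slug_to_id.get(old_id) for old_id in stored_ids}
print(migrated)
```

Keeping unmatched IDs as None, rather than dropping them, makes it easier to follow up on records that need manual attention.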
PDL Record - v25.2
"name": "people data labs",
"id": "peopledatalabs", -> linkedin_slug format in v25.2
"linkedin_url": "linkedin.com/company/peopledatalabs",
"linkedin_slug": "peopledatalabs"
PDL Record - v26.0
"name": "people data labs",
"id": "tnHcNHbCv8MKeLh92946LAkX6PKg", -> alphanumeric format in v26.0
"linkedin_url": "linkedin.com/company/peopledatalabs",
"linkedin_slug": "peopledatalabs"
Upcoming Breaking Changes
There are upcoming breaking changes in future versions that may impact your current processes. We are announcing them here to provide ample time for you to adjust your processes accordingly.
Change expected in: v26.1 / May 2024
Previous Announcements: v24 / October 2023, v25 / January 2024, v25.1 / February 2024
As announced in our previous release notes (linked above) we will be standardizing our Snowflake Person and Company Schemas in May 2024 (v26.1). This is a reminder that this change will be a breaking change to our existing Snowflake schema.
To prepare for this transition, we strongly encourage our Snowflake customers to follow the steps below:
- Make a copy of your current data after your April 2024 delivery. This way you not only have a backup, but can also compare new to old after you switch over.
- Go through the new standard schemas, which are included in the Resources section below as well as here
- Prepare any script changes to your existing processes before the switch in May 2024.
For any questions or help transitioning to these new schemas, please reach out to your Customer Success Manager.
The Standard Person and Company Schemas that will be used for Snowflake deliveries are available here.
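When comparing a saved copy of the current schema against the new standard schema, a simple column diff can help identify which scripts need changes. The sketch below assumes you have exported the column names of both tables into Python lists. The column names used are placeholders for illustration only and do not reflect the actual Snowflake schemas.

```python
def diff_columns(old_columns: list[str], new_columns: list[str]) -> dict[str, list[str]]:
    """Report which columns were removed, added, or kept between two schemas."""
    old, new = set(old_columns), set(new_columns)
    return {
        "removed": sorted(old - new),
        "added": sorted(new - old),
        "unchanged": sorted(old & new),
    }


# Placeholder column names, not the real schema.
old_schema = ["ID", "FULL_NAME", "OLD_ONLY_COLUMN"]
new_schema = ["ID", "FULL_NAME", "NEW_ONLY_COLUMN"]

schema_diff = diff_columns(old_schema, new_schema)
print(schema_diff)
```

Any column listed under "removed" that is referenced in an existing query or script is a candidate for updating before the switch.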
Change expected in: v27.0 / July 2024
As part of our new resume timestamps that were released this quarter, we will be deprecating our existing job_last_updated field in the July 2024 (v27.0) release. Our newly released resume timestamps provide more granularity and clarity than our existing job_last_updated timestamp, and help resolve ambiguity in the freshness of a person’s current work experience.
Any customers currently using our job_last_updated field will need to migrate to the new job_last_verified and job_last_changed fields before v27.0.
For help moving over to the new fields, please reach out to your Customer Success and Technical Services teams for support and enablement resources. Please also see this easy-to-follow guide prepared by our Technical Services team for instructions on how to transition to this new schema:
Breaking Changes Guide: Deprecation of job_last_updated
This quarter, we updated millions of jobs and locations in our Global Resume Dataset. See below for details:
Dataset | Geography | Field | Records Updated |
---|---|---|---|
Resume | Global | experience | 199,381,717 |
Resume | Global | location | 298,777,846 |
Resume | United States | experience | 53,538,566 |
Resume | United States | location | 82,270,299 |
Linkage | Coverage in v25 | Coverage in v26 | Increase (%) |
---|---|---|---|
total_records | 794,313,831 | 744,191,278 | -6.31% |
mobile_phone | 17,666,371 | 53,088,015 | 200.50% |
phone_numbers | 44,226,279 | 69,139,249 | 56.33% |
education.gpa | 4,563,280 | 8,757,491 | 91.91% |
education.summary | 29,679,966 | 49,868,050 | 68.02% |
education.majors | 141,539,272 | 179,725,851 | 26.98% |
education.degrees | 126,248,489 | 156,797,783 | 24.20% |
birth_date | 7,971,504 | 8,821,954 | 10.67% |
Linkage | Coverage in v25 | Coverage in v26 | Increase (%) |
---|---|---|---|
total_records | 3,225,330,100 | 3,178,815,044 | -1.44% |
mobile_phone | 478,067,684 | 513,257,630 | 7.36% |
phone_numbers | 1,125,062,185 | 1,157,207,007 | 2.86% |
education.gpa | 5,548,676 | 9,730,993 | 75.38% |
education.summary | 29,954,669 | 50,039,000 | 67.05% |
education.majors | 171,610,325 | 209,162,040 | 21.88% |
education.degrees | 139,634,548 | 169,748,056 | 21.57% |
Linkage | Coverage in v25 | Coverage in v26 | Increase (%) |
---|---|---|---|
total_records | 61,297,152 | 62,109,427 | 1.33% |
funding_details | 221,107 | 226,302 | 2.35% |
all_subsidiaries | 40,994 | 42,496 | 3.66% |
direct_subsidiaries | 40,908 | 42,393 | 3.63% |
ultimate_parent | 112,785 | 116,367 | 3.18% |
immediate_parent | 111,943 | 115,394 | 3.08% |
alternative_domains | 4,504,007 | 4,504,390 | 0.01% |
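The "Increase (%)" column in the tables above is consistent with a simple relative change between the v25 and v26 coverage counts. The sketch below reproduces two of the reported values; the function name `pct_change` is our own label for this illustration.

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, as a percentage rounded to 2 places."""
    return round((after - before) / before * 100, 2)


# Values taken from the first coverage table above.
mobile_phone = pct_change(17_666_371, 53_088_015)
total_records = pct_change(794_313_831, 744_191_278)

print(mobile_phone)   # 200.5
print(total_records)  # -6.31
```

A negative value, as with total_records, indicates a decrease in count between versions.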
- We saw significant improvements (over 200% increase) in our coverage of mobile phones tied to LinkedIn profiles due to new data partnerships.
- We rebuilt our school dataset and additionally ingested new sources of education data resulting in significant increases in coverage of our education-related fields including Degrees, Summaries, Majors, and GPAs.
- We decreased our total records in the resume data slice by ~6% as a result of improvements to our deduplication logic as well as improved QA of low-quality data sources.
- We increased our linkages between LinkedIn profiles and birth dates by over 10% in our resume dataset.
- We’ve made a significant number of updates to our IP data and matching to help with reliability and accuracy.
- We decreased the number of frankenstein records in our person dataset by over 13% and improved our work email quality by removing inferred emails being contributed by some of our data sources.
- We improved the accuracy of our alternative_domains field by removing a low-quality data source without impact to our overall fill rate in the company dataset.
- We reduced the number of duplicate company tags present in our company records.
- Fixed a bug in our Autocomplete API where autocompletion using the region field was not returning location-based metadata.
- We fixed a bug in our changelog process that was erroneously creating multiple changelog records in certain cases. These multiple records are now stored in an array in the to field of a changelog record.
- We made changes to our inferred_years_experience logic, which allows us to under-emphasize education when we have detailed job history with start/end dates, increasing the overall accuracy of this value.
- We fixed a bug where certain degree abbreviations were being added into the name fields in a person profile.
- Resolved unexpected company enrichment matching behavior on websites where the input was actually an email domain such as sbcglobal.com and sbcglobal.net.