January 2024 Release Notes (v25.0)

| Release Name | Dataset Version | Publish Date |
| --- | --- | --- |
| January 2024 | v25.0 | 01/04/2024 |

This data version was released on 1/4/2024.

Welcome to our January 2024 release notes! We’re rolling out some exciting updates with this release.

Here are some of the key highlights:

Excited yet? Read on to learn more, or jump to a specific section using the table of contents below.

Table of Contents

📣 Key Announcements

✨ New Products and Features

❗Breaking Changes

🚀 Data Updates

🛠 Improvements and Bug Fixes


📣 Key Announcements

Schema Changes

Funding Data Fields (Company Schema)

Last quarter (v24), we released the Beta of the Company Funding Data Fields. In v25, we’re excited to roll out these highly anticipated fields for General Availability!

These 6 new fields provide information on a company’s fundraising history, including the amount of money raised, the number and stages of funding rounds (ex: Series B), and details on the individual funding rounds.

The following five funding fields will be available for free in our Base Company Bundle:

| Field Name | Data Type | Field Description | Example |
| --- | --- | --- | --- |
| funding_stages | Array [Enum (String)] | All disclosed funding stages for the company. | ["series_a", "series_b"] |
| last_funding_date | String (Date) | The date of the company’s most recent funding event. | "2021-11-16" |
| latest_funding_stage | Enum (String) | The stage of the company’s most recent funding event. | "series_b" |
| number_funding_rounds | Integer (> 0) | The number of funding rounds announced by the company. | 7 |
| total_funding_raised | Float (> 0) | The cumulative amount raised by the company in USD. | 55250000.0 |

For users that want additional detail on the individual funding rounds themselves, including the investment dates, amounts raised, and PDL profiles of disclosed investors, that data will be available at an additional cost in the funding_details field:

| Field Name | Data Type | Field Description |
| --- | --- | --- |
| funding_details | Array [Object] | List of all funding events associated with the company, with corresponding details. |

funding_details subfields:

| Subfield | Data Type | Description |
| --- | --- | --- |
| funding_round_date | String (Date) | The publicly disclosed date of the closing of the financing event. |
| funding_raised | Float (> 0) | The total amount raised during the funding event. |
| funding_currency | Enum (String) | The currency code for the funding_raised value. Must be one of our Canonical Currency Codes. |
| funding_type | Enum (String) | The funding stage of the funding event. Must be one of our Canonical Funding Rounds. |
| investing_companies | Array [String] | The PDL Company IDs of the investing companies participating in the funding event. |
| investing_individuals | Array [String (Titlecase)] | The names of any other investing individuals participating in the funding event. |

Full Example:
{
  "id": "peopledatalabs",
  "total_funding_raised": 55250000.0,
  "funding_stages": [
    "series_b",
    "series_a",
    "seed"
  ],
  "latest_funding_stage": "series_b",
  "number_funding_rounds": 7,
  "last_funding_date": "2021-11-16",
  "funding_details": [
    {
      "funding_round_date": "2017-04-26",
      "funding_raised": null,
      "funding_currency": null,
      "funding_type": "series_a",
      "investing_companies": [
        "8vc"
      ],
      "investing_individuals": []
    },
    {
      "funding_round_date": "2016-02-10",
      "funding_raised": 775000.0,
      "funding_currency": "usd",
      "funding_type": "seed",
      "investing_companies": [
        "e-merge-sa-nv",
        "forumvc"
      ],
      "investing_individuals": []
    },
    {
      "funding_round_date": "2018-10-10",
      "funding_raised": 7000000.0,
      "funding_currency": "usd",
      "funding_type": "series_a",
      "investing_companies": [
        "8vc",
        "the-founders-fund",
        "susa-ventures",
        "forumvc"
      ],
      "investing_individuals": []
    },
    {
      "funding_round_date": "2015-06-12",
      "funding_raised": 275000.0,
      "funding_currency": "usd",
      "funding_type": "seed",
      "investing_companies": [
        "forumvc"
      ],
      "investing_individuals": []
    },
    {
      "funding_round_date": "2016-01-01",
      "funding_raised": null,
      "funding_currency": null,
      "funding_type": "series_a",
      "investing_companies": [],
      "investing_individuals": [
        "David J. Namdar"
      ]
    },
    {
      "funding_round_date": "2016-08-01",
      "funding_raised": 2200000.0,
      "funding_currency": "usd",
      "funding_type": "seed",
      "investing_companies": [
        "right-side-capital-management",
        "honecapital",
        "susa-ventures",
        "haystackvc",
        "曼图资本mandra-capital",
        "forumvc"
      ],
      "investing_individuals": [
        "Joel Englander",
        "Sandy Kory",
        "Susan Kimberlin"
      ]
    },
    {
      "funding_round_date": "2021-11-16",
      "funding_raised": 45000000.0,
      "funding_currency": "usd",
      "funding_type": "series_b",
      "investing_companies": [
        "flexcapital",
        "craft-ventures"
      ],
      "investing_individuals": [
        "Guillaume \"G\" Cabane"
      ]
    }
  ]
}
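
As a quick sanity check on the example above, the rounds that disclose an amount can be summed and compared against total_funding_raised. The sketch below is only an illustration: it assumes every disclosed amount is in USD (as it is in this record), and the helper name sum_disclosed_funding is hypothetical rather than part of any PDL library.

```python
from typing import Any


def sum_disclosed_funding(company: dict[str, Any], currency: str = "usd") -> float:
    """Sum funding_raised across rounds that disclose an amount in `currency`."""
    total = 0.0
    for funding_round in company.get("funding_details") or []:
        amount = funding_round.get("funding_raised")
        # Undisclosed rounds carry null (None) for both amount and currency.
        if amount is not None and funding_round.get("funding_currency") == currency:
            total += amount
    return total


# Trimmed copy of the example record: only the fields this helper reads.
example = {
    "total_funding_raised": 55250000.0,
    "funding_details": [
        {"funding_raised": None, "funding_currency": None},
        {"funding_raised": 775000.0, "funding_currency": "usd"},
        {"funding_raised": 7000000.0, "funding_currency": "usd"},
        {"funding_raised": 275000.0, "funding_currency": "usd"},
        {"funding_raised": None, "funding_currency": None},
        {"funding_raised": 2200000.0, "funding_currency": "usd"},
        {"funding_raised": 45000000.0, "funding_currency": "usd"},
    ],
}

print(sum_disclosed_funding(example) == example["total_funding_raised"])  # True
```

The two values agree for this record. That may not hold for other companies, for example when some rounds have undisclosed amounts or use other currencies.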

Premium Company Attributes Bundle (Person Schema)

We are adding four new fields to our Person Schema in a new, add-on bundle. These fields provide additional information about the person’s current company, and are mapped directly from our Company dataset.

These fields are fully searchable using our Person Search API and will be returned in Person Search and Enrichment responses.

With these fields, you can better identify key talent, ideal buyers, or other target personas based on attributes of their companies. They also let you target candidate or lead searches based on company size and growth metrics, add company size and funding information to enriched leads, and much more!

This bundle is available for Person Enrich and Person Search customers. It is NOT available for Data License customers at this time. If you want access to this bundle or have any questions, please reach out to us.

The 4 new person fields are:

job_company_employee_count

Company Field Represented: employee_count
Data Type: Integer (>= 0)
Field Summary: The total number of PDL profiles associated with the person’s current company.

job_company_inferred_revenue

Company Field Represented: inferred_revenue
Data Type: Enum (String)
Field Summary: The estimated annual revenue range in USD of the person’s current company.

job_company_12mo_employee_growth_rate

Company Field Represented: employee_growth_rate.12_month
Data Type: Float
Field Summary: The person’s current company’s percentage increase in total headcount over the past twelve months. Growth rate is calculated as (current_employee_count / previous_employee_count) - 1.
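
The growth rate formula can be written as a small function. This is a minimal sketch of the stated formula only: the function name and the choice to return None when the previous headcount is zero are our own assumptions, not documented PDL behavior.

```python
from typing import Optional


def employee_growth_rate(current_count: int, previous_count: int) -> Optional[float]:
    """Return (current_count / previous_count) - 1, or None with no baseline."""
    if previous_count == 0:
        # Assumption: growth is treated as undefined without a prior headcount.
        return None
    return current_count / previous_count - 1


print(employee_growth_rate(150, 100))  # 0.5
```

Under this formula, a value of 0.5 corresponds to 50% headcount growth over the period, and a negative value indicates a shrinking headcount.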

job_company_total_funding_raised

Company Field Represented: total_funding_raised
Data Type: Integer (> 0)
Field Summary: The cumulative amount of money raised in USD by the person’s current company during all publicly disclosed funding rounds.

linkedin_slug (Company Schema)

| Field Name | Data Type | Field Description | Example |
| --- | --- | --- | --- |
| linkedin_slug | String | The company’s LinkedIn URL slug. | "peopledatalabs" |

To support our upcoming change to PDL Company IDs, we are adding the new linkedin_slug field. This field is generated in the same way as our current company id field.

For new company records that do not have associated LinkedIn pages, this field will be null.

See the Upcoming Company ID Format Changes announcement for more information.


[BETA] display_name_history (Autocomplete API)

| Field Name | Data Type | Field Description | Example |
| --- | --- | --- | --- |
| display_name_history | Array [String] | A list of the company’s historical primary names with proper capitalization. Note: Only available in the Autocomplete API. | ["Twitter", "X"] |

This field is only available in Autocomplete API responses for companies and websites.

See display_name for how we handle display names.

You can use this field to enhance the autocomplete experience, display former business names, or as an additional source of highly confident names to use in entity resolution/company matching.

The alternative_names field also contains former company names (along with other user-generated names) and can be accessed through Autocomplete and our other Company APIs.


✨ New Products and Features

Bulk Company Enrichment API

This quarter, we are excited to announce the launch of our Bulk Company Enrichment API.

This API supports enriching up to 100 companies in one request. This enables greater scale and flexibility, and will reduce the likelihood of exceeding our rate limits.

Any customer that has Company Enrich credits can use the Bulk Company Enrichment API.
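
Because each bulk request accepts at most 100 companies, larger lists need to be split client-side before they are sent. The sketch below shows only that batching step. The helper name batched and the placeholder inputs are hypothetical, and the request itself is omitted because its payload format is not described here.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

MAX_BULK_SIZE = 100  # per-request limit stated in this release note


def batched(items: Sequence[T], batch_size: int = MAX_BULK_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder inputs for illustration only.
websites = [f"example{i}.com" for i in range(250)]
print([len(batch) for batch in batched(websites)])  # [100, 100, 50]
```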

Multiple Results in Company Enrichment API

We are adding a new size input parameter to the Company Enrichment API, which will control how many matching company records are returned.

Depending on the enrichment parameters, there may be multiple company records with a very close match score in our dataset. Using the size parameter, you can get up to 100 results in one API response ordered by likelihood score from highest to lowest. An example request and response can be found here.

In practice, this feature will enable similar workflows to our Person Identify API with company data. For example, let’s say you want to see a list of company records related to “Selby”. The Multiple Results feature would allow you to return several matches, such as “Selby and Sons”, “Selby Inc. Johns”, and “Selby and Johnson”. This lets you view multiple possible profiles while still leveraging our cleaning and likelihood logic, which isn’t present in our Search APIs.

Billing is on a per-record basis: each successful result returned that is within the min_likelihood threshold costs one credit.
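
To estimate credit usage under this per-record billing, you can count the returned matches that meet the threshold. The sketch below is a rough illustration under two assumptions that are ours, not confirmed by this note: each match carries a numeric "likelihood" key, and "within the threshold" means greater than or equal to min_likelihood.

```python
from typing import Any


def credits_charged(matches: list[dict[str, Any]], min_likelihood: int) -> int:
    """Count matches whose likelihood meets the threshold (one credit each)."""
    return sum(1 for match in matches if match.get("likelihood", 0) >= min_likelihood)


# Hypothetical matches; names are taken from the "Selby" example above.
matches = [
    {"name": "selby and sons", "likelihood": 9},
    {"name": "selby and johnson", "likelihood": 7},
    {"name": "selby inc. johns", "likelihood": 4},
]
print(credits_charged(matches, min_likelihood=5))  # 2
```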

This Limited Availability feature is only available for Enterprise Company Enrich API customers. If you are an Enterprise Company Enrich API customer and want to enable the size parameter, speak with your Data Consultant.

[OPEN BETA] Autocomplete API Logic Improvements

We’ve made improvements to our existing Autocomplete API and launched a beta Autocomplete endpoint. We’ll be gathering customer feedback on the beta during this quarter and plan to merge the logic changes into the main endpoint next quarter, assuming we get positive feedback.

The beta Autocomplete endpoint is a Typesense-powered version of Autocomplete, which improves query performance and matching logic. We’ve observed that Autocomplete isn’t meeting customers’ expectations around performance (both in response time and output results). Internal testing and user feedback have indicated that this v2 is an overall better sorting and ranking system, but we’re eager to get more feedback.

The beta endpoint includes the following improvements:

  • Logic overhaul based on a rebuilt Typesense-powered backend
  • Places our Job Title and Skill Enrichment logic in front of title and skill Autocomplete to allow for better matching logic
  • Added support for autocompleting websites
  • Added typo tolerance for inputs to improve UX
  • Added company display_name, alternative_names and display_name_history to the meta object. These fields also help improve company autocomplete logic when a company changes their name (ex: Twitter -> X) or has multiple semantically dissimilar names (ex: Ernst & Young and EY).

To enable the beta endpoint, simply set the input parameter beta=true in your Autocomplete request.
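
As a rough sketch, opting in amounts to adding beta=true to the query string. The example below only builds the request URL and sends nothing. The base URL and the field and text parameter names are assumptions made for illustration and should be checked against the Autocomplete API reference; authentication is omitted.

```python
from urllib.parse import urlencode

# Assumed base URL for illustration; confirm against the API reference.
AUTOCOMPLETE_URL = "https://api.peopledatalabs.com/v5/autocomplete"


def build_autocomplete_url(field: str, text: str, beta: bool = True) -> str:
    """Build an Autocomplete request URL, opting in to the beta endpoint if asked."""
    params = {"field": field, "text": text}  # assumed parameter names
    if beta:
        params["beta"] = "true"  # the opt-in flag described above
    return f"{AUTOCOMPLETE_URL}?{urlencode(params)}"


print(build_autocomplete_url("company", "twit"))
```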

You can view a full sample response for the Autocomplete beta here.


❗Breaking Changes

❗Person Changelog Restructure

We have rebuilt our changelog to better align it with our customers’ workflows, reducing the pain points and expenses associated with updating PDL data.

The Person changelog no longer contains every record in our dataset. Now, the changelog will only include records with high-signal changes:

  • merged: Records merged into other records, including metadata of the merged records.
  • added: Newly added records.
  • opted_out: Records that have opted out.
  • deleted: Records that have been deleted.
  • updated: Records where a value in any field has changed and/or a profile has been merged into the record.
    • Changes to child fields are reported at the parent level. For example, if a record’s experience.end_date changes, that will be shown as "fields_updated": ["experience"].
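
If your own pipeline tracks changes at the child-field level, the same parent-level rollup can be reproduced for comparison. This is a minimal sketch of the idea; the function name and the sample paths are illustrative and do not represent PDL's implementation.

```python
def parent_fields(changed_paths: list[str]) -> list[str]:
    """Collapse dotted field paths to unique top-level parents, in first-seen order."""
    seen: dict[str, None] = {}
    for path in changed_paths:
        seen.setdefault(path.split(".", 1)[0], None)
    return list(seen)


print(parent_fields(["experience.end_date", "experience.start_date", "location_name"]))
# ['experience', 'location_name']
```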

The changelogs will be stored in our existing S3 bucket. Each part of the changelog is capped to a file size of approximately 100MB. Changelogs are separated by update cadence with the following paths:

  • Monthly: s3://pdl-prod-id-changelog/version_number/monthly/
  • Quarterly: s3://pdl-prod-id-changelog/version_number/quarterly/

Migration Guide

If you used our previous changelog, you may need to make some changes to switch to our new changelog:

  1. The unchanged status no longer exists. Remove any logic that relies on it.
  2. Switch from using moved to the merged status.
  3. The merged status additional_metadata.to field has changed from a string to an array of strings. Adjust your processes accordingly.
  4. The updated status additional_metadata.fields_updated array can be used to filter records according to the type of updates made. Adjust your processes accordingly.
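
The migration steps above might translate into handler logic along the following lines. This is a sketch under stated assumptions: the entry layout (a "status" key next to "additional_metadata") and the action names are hypothetical, so refer to the published changelog schemas for the actual structure.

```python
from typing import Any, Optional


def plan_action(entry: dict[str, Any], watched_fields: set[str]) -> Optional[tuple[str, list[str]]]:
    """Map one changelog entry to a (hypothetical) action and its arguments."""
    status = entry["status"]
    meta = entry.get("additional_metadata") or {}
    if status == "merged":
        targets = meta.get("to") or []
        if isinstance(targets, str):  # tolerate the previous string form
            targets = [targets]
        return ("repoint", list(targets))
    if status == "updated":
        # Only act when a field we care about appears in fields_updated.
        changed = [f for f in meta.get("fields_updated", []) if f in watched_fields]
        return ("refresh", changed) if changed else None
    if status == "added":
        return ("insert", [])
    if status in ("deleted", "opted_out"):
        return ("remove", [])
    # "unchanged" and "moved" no longer exist, so surface them loudly.
    raise ValueError(f"unexpected status: {status!r}")


entry = {"status": "updated", "additional_metadata": {"fields_updated": ["experience", "education"]}}
print(plan_action(entry, watched_fields={"experience"}))  # ('refresh', ['experience'])
```

Raising on unknown statuses is a deliberate choice here: leftover logic that still expects unchanged or moved fails fast instead of being skipped silently.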

You can view updated schemas for each changelog status here.

See the original announcement from the October 2023 release notes for more information.

❗Company Insights Logic Changes

We have made improvements to our Company Insights aggregation logic and filter parameters. These improvements will ensure that the following employee count fields will sum to comparable values:

As part of this new logic, we have changed how we calculate employee_count. Previously, for a profile to be included in employee_count, it needed to have experience.is_primary=true but could have job_start_date=null. Now, for a profile to be included in employee_count, it does NOT need to have experience.is_primary=true but it DOES need to have a non-null job_start_date.
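
The difference between the two inclusion rules can be expressed as a pair of predicates. This is a simplified reading of the paragraph above rather than PDL's actual aggregation code: the flat "is_primary" and "job_start_date" keys are a hypothetical flattening of the profile, and any other conditions the real logic applies are ignored.

```python
from typing import Any


def counted_before(profile: dict[str, Any]) -> bool:
    """Previous rule: primary experience required, start date optional."""
    return profile.get("is_primary") is True


def counted_now(profile: dict[str, Any]) -> bool:
    """New rule: non-null job start date required, primary flag not required."""
    return profile.get("job_start_date") is not None


# Hypothetical profiles illustrating the two cases that change.
primary_no_date = {"is_primary": True, "job_start_date": None}
dated_not_primary = {"is_primary": False, "job_start_date": "2021-03"}

print(counted_before(primary_no_date), counted_now(primary_no_date))      # True False
print(counted_before(dated_not_primary), counted_now(dated_not_primary))  # False True
```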

We have also added new “other_uncategorized” subfields to employee_count_by_country and employee_count_by_month_by_role which show the number of profiles that do not have sufficient location or experience data. With this additional subfield, the total employee counts should be much closer.

Example

{
  "employee_count_by_country": {
    "united states": 117,
    "canada": 1,
    "puerto rico": 1,
    "other_uncategorized": 19
  },
  "employee_count_by_month_by_role": {
    "2015-03": {
      "real_estate": 0,
      "design": 0,
      "trades": 0,
      "marketing": 0,
      "education": 0,
      "legal": 0,
      "customer_service": 0,
      "finance": 0,
      "public_relations": 0,
      "engineering": 0,
      "human_resources": 0,
      "media": 0,
      "sales": 0,
      "operations": 0,
      "health": 0,
      "other_uncategorized": 8
    }
  }
}
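
To see how the other_uncategorized subfield affects totals, you can sum a breakdown with and without it. The sketch below reuses the country values from the example above; the helper names are hypothetical.

```python
def total_headcount(breakdown: dict[str, int]) -> int:
    """Sum every bucket in a breakdown, including other_uncategorized."""
    return sum(breakdown.values())


def categorized_headcount(breakdown: dict[str, int]) -> int:
    """Sum only the buckets that have a known category."""
    return sum(count for key, count in breakdown.items() if key != "other_uncategorized")


by_country = {
    "united states": 117,
    "canada": 1,
    "puerto rico": 1,
    "other_uncategorized": 19,
}
print(total_headcount(by_country))        # 138
print(categorized_headcount(by_country))  # 119
```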

See the original announcement from the October 2023 release notes for more information.

❗Remove Oxford Comma from Industry Canonical Values

The Canonical Industries "leisure, travel & tourism" and "glass, ceramics & concrete" are now represented across all industry fields without Oxford commas.

This change affects the following fields:

This enforces schema consistency between the fields in our person and company dataset and allows us to have one consistent industry enum.
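
If you store industry values from earlier versions, a small normalizer can bring them in line with the new enum. This sketch assumes the older form differed only by a comma directly before the "&"; that is our reading of "Oxford comma" here, so verify it against your stored values before relying on it.

```python
def drop_oxford_comma(industry: str) -> str:
    """Remove a comma that directly precedes an ampersand."""
    return industry.replace(", &", " &")


# The input below is an assumed legacy form, shown for illustration.
print(drop_oxford_comma("leisure, travel, & tourism"))  # leisure, travel & tourism
```

Values already in the new form pass through unchanged, so the function is safe to apply to mixed data.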

See the original announcement from the October 2023 release notes for more information.

⚠️ Upcoming Breaking Changes

There are upcoming breaking changes in future versions that may impact your current processes. We are announcing them here to provide ample time for you to adjust your processes accordingly.

⚠️ Company ID Format Changes

Change Expected In: v26 / April 2024

We will be changing the format of our Company IDs in the v26.0 release to an alphanumeric hash format. While we don’t anticipate that this will break user implementations, we do want to call attention to this change and explain why we’re making it.

For users that like our current ID format or have built workflows that hinge on the legibility or mapping to LinkedIn URLs, we’ve added a new field called linkedin_slug that will duplicate the current ID and carry that forward.

Why We’re Changing our Company ID Format

We’re changing how we generate our company IDs primarily to:

  • Support the addition of companies that do not have a LinkedIn URL
  • Create an ID that changes less often than a company’s LinkedIn URL

Today, the company’s id is generated from the company’s LinkedIn URL. For v25.0, we’ve added a new field linkedin_slug, that is generated in the same way as our current id field.

{
  "name": "people data labs",
  "id": "peopledatalabs",
  "linkedin_url": "linkedin.com/company/peopledatalabs",
  "linkedin_slug": "peopledatalabs"
}

From v25.0 until v26.0 the id and linkedin_slug fields will be the same value.

Beginning in v26.0, the id field will be in alphanumeric hash format, while the linkedin_slug field will stay the LinkedIn URL slug.

{
  "name": "people data labs",
  "id": "605ebe8594b",
  "linkedin_url": "linkedin.com/company/peopledatalabs",
  "linkedin_slug": "peopledatalabs"
}
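
For workflows that depend on the slug-style identifier, one option is to read linkedin_slug whenever the record has it and fall back to id only for records delivered before v25.0. The helper below is a hypothetical sketch of that approach, not an official migration utility.

```python
from typing import Any, Optional


def legacy_style_id(company: dict[str, Any]) -> Optional[str]:
    """Return the LinkedIn-slug-style identifier for a company record."""
    if "linkedin_slug" in company:
        # May be None for companies without a LinkedIn page.
        return company["linkedin_slug"]
    # Pre-v25.0 record: the id field is still the LinkedIn slug.
    return company.get("id")


v26_record = {"name": "people data labs", "id": "605ebe8594b", "linkedin_slug": "peopledatalabs"}
print(legacy_style_id(v26_record))  # peopledatalabs
```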

⚠️ Rename person.gender to person.sex

Change Expected In: v26 / April 2024
Change First Announced: v24 / October 2023

We are renaming the gender field to sex in the Person Schema. The output will remain the same. We output the biological sex of a profile, but not their gender. This change is required to demonstrate adherence to legislative changes defining aspects of gender as sensitive personal data (which People Data Labs does not process or output).
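
Ahead of the rename, reading code can be made tolerant of both field names so it keeps working across the v26 cutover. The helper below is a hypothetical sketch of that pattern.

```python
from typing import Any, Optional


def read_sex(person: dict[str, Any]) -> Optional[str]:
    """Read the renamed field, falling back to the pre-v26 name."""
    if "sex" in person:
        return person["sex"]
    return person.get("gender")  # records delivered before the rename


print(read_sex({"gender": "female"}))  # female
print(read_sex({"sex": "female"}))     # female
```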

⚠️ Snowflake Schema Standardization

Change Expected In: v26.1 / May 2024
Change First Announced: v24 / October 2023

Update: Change Now Expected May 2024

We originally planned this change for v25. We have decided to postpone it to v26.1. If you have any questions or concerns about this, please reach out to your Customer Success Manager.

In May 2024 we will be standardizing our Snowflake Person and Company schemas to expand and enhance our support of this delivery destination. After this change, all current and new customers who receive Snowflake deliveries will use the standardized schemas.

Before the change, we strongly suggest you:

  1. Make a copy of your current data after your April 2024 delivery. This way you not only have a backup, but can also compare new to old after you switch over.
  2. Go through the new standard schemas, which will be available by January 31.
  3. Prepare any script changes to your existing processes before the switch in May 2024.

The Standard Person and Company Schemas that will be used for Snowflake deliveries will be available here by January 31.


🚀 Data Updates

Freshness

This quarter, we updated millions of jobs and locations in our Global Resume Dataset. See below for details:

| Dataset | Geography | Field | Records Updated |
| --- | --- | --- | --- |
| Resume | Global | experience | 186,938,902 |
| Resume | Global | location | 242,840,457 |
| Resume | United States | experience | 49,309,414 |
| Resume | United States | location | 54,880,798 |

Coverage (Full Stats: Person, Company)

Resume Dataset

| Linkage | Coverage in v24 | Coverage in v25 | Increase (%) |
| --- | --- | --- | --- |
| total_records | 763,202,971 | 794,313,831 | 4.08% |
| job_start_date | 245,046,508 | 261,827,962 | 6.85% |
| job_company_name | 451,936,629 | 461,248,869 | 2.06% |

API Dataset

| Linkage | Coverage in v24 | Coverage in v25 | Increase (%) |
| --- | --- | --- | --- |
| total_records | 3,198,403,455 | 3,225,330,100 | 0.84% |
| summary | 150,056,742 | 172,240,018 | 14.78% |
| job_start_date | 307,542,538 | 321,877,654 | 4.66% |

Company Dataset

| Linkage | Coverage in v24 | Coverage in v25 | Increase (%) |
| --- | --- | --- | --- |
| total_records | 51,241,197 | 61,297,152 | 19.62% |
| funding_details | 136,958 | 221,107 | 61.44% |
| funding_stages | 136,958 | 221,107 | 61.44% |
| last_funding_date | 136,958 | 221,107 | 61.44% |
| latest_funding_stage | 136,958 | 221,107 | 61.44% |
| number_funding_rounds | 136,958 | 221,107 | 61.44% |
| total_funding_raised | 104,561 | 162,688 | 55.59% |
| all_subsidiaries | 28,760 | 40,994 | 42.54% |
| direct_subsidiaries | 28,760 | 40,908 | 42.24% |
| ultimate_parent | 79,965 | 112,785 | 41.04% |
| immediate_parent | 80,003 | 111,943 | 39.92% |

Commentary

  • We now have over 61 million total company records, an increase of 20%
  • Company funding data coverage increased by 61%
  • Company parent and subsidiary coverage increased by 40%
  • We added over 26 million person records to our API dataset and 31 million person records to our Resume dataset
  • User-inputted summaries increased by 14% in our API dataset
  • Current job start dates increased by 16 million in our Resume dataset
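
The Increase (%) values in the coverage tables are consistent with a standard percentage-change calculation between the v24 and v25 counts. The sketch below shows that calculation as we understand it; the function name is hypothetical.

```python
def percent_increase(before: int, after: int) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


# Company total_records: v24 -> v25
print(round(percent_increase(51_241_197, 61_297_152), 2))  # 19.62
```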

🛠 Improvements and Bug Fixes

Improvements

  • Improved the Autocomplete API
    • Added support for titlecase
    • Added company display_name, alternative_names and display_name_history to the meta object
  • Improved our alternative_domains field by evaluating our source data and removing records contributing incorrect or outdated information.
  • Improved our name parser to decrease occurrences where the same first and last name prefixes result in dropped names. For example, cleaning “Van Ma Van” previously resulted in a null first name. Now, that name cleans to the first name “Van Ma” and last name “Van”.
  • Reduced our thresholds for inferred_years_experience. We placed a conservative cap on the upper limit (80 years) to combat instances of erroneous graduation or job start dates in our source data.
  • Improved filtering of inappropriate language in our data cleaning processes, particularly for our name, job title, and skills fields.

Bug Fixes

  • Resolved an issue where job_company_total_funding_raised was null in all records.
  • Resolved a bug with Company display_name not appropriately preserving capitalization.
  • Fixed a bug that displayed incorrect timestamps on location updates. This triggered a large number of location updates in minor release v24.2.
  • Fixed an issue where locations containing "Remote" were incorrectly canonicalized to "Remote, Oregon" and "Remote, Alaska". For the small number of records this applied to, location information on the record has been updated to a correct or null location.