April 2020 Release Notes

Released on 4/08/2020

Freshness

  • This quarter we have refreshed job titles for over 175mm of our global profiles and locations for over 160mm.
  • Similarly, we have refreshed job titles for over 50mm of our US profiles and locations for over 60mm.

Coverage Increase

  • We've updated and increased our coverage of the linkedin_connections beta field for over 100mm records.
  • We've improved our coverage of historical experience and education, as well as all three of the summaries beta fields.

Data Field Changes

  • We've added two new fields: experience.company.location.street_address and primary.job.company.location.street_address. These represent the HQ location of the company and should help with matching our canonicalized company data to other company sources.
  • We've modified the experience.title.levels field. The enumerable values for the field have changed and can be referenced in our canonical data (link - deprecated). The tagging logic is mostly unchanged, but improvements to it should produce a net lift in the cxo level and a slight decrease in the manager level. We've also added lower levels including senior, junior, and unpaid.
  • We've begun basic pre-processing of the experience.title.name field to improve merging and help with standardization. These changes mainly involve mapping abbreviations and stripping punctuation, and should be non-destructive. We have also added an experience.title.raw field, which is available to license customers upon request.
  • The birth_date_fuzzy field will now have the same year as the birth_date field instead of being blank when a birth_date exists.
  • We are now exposing the experience.company.id and education.school.id fields by default. This allows for easy linkage with our canonical company data (link - deprecated) and canonical school data (link - deprecated). As of now these ids do not persist between versions of the canonical data.
  • Mobile phone numbers that we are highly confident in are now tagged with phone_numbers.type = mobile.
  • experience.company.linkedin_size has been renamed to experience.company.size.
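
As a rough illustration of how a consumer might adapt to two of the changes above, the sketch below moves a value stored under the old experience.company.linkedin_size name to experience.company.size and selects the numbers tagged as mobile. The record layout (nested dicts, with experience and phone_numbers as lists of objects, and a number key on each phone entry) is an assumption inferred from the dotted field names, and both helper names are hypothetical.

```python
from typing import Any, Dict, List


def rename_company_size(record: Dict[str, Any]) -> Dict[str, Any]:
    """Move experience[].company.linkedin_size to experience[].company.size, in place."""
    for exp in record.get("experience") or []:
        company = exp.get("company") or {}
        if "linkedin_size" in company:
            # Keep a value already stored under the new name, if there is one.
            company.setdefault("size", company["linkedin_size"])
            del company["linkedin_size"]
    return record


def mobile_numbers(record: Dict[str, Any]) -> List[str]:
    """Return the numbers whose type is tagged as mobile (assumed 'number' key)."""
    return [
        phone["number"]
        for phone in record.get("phone_numbers") or []
        if phone.get("type") == "mobile" and "number" in phone
    ]
```

Running rename_company_size over records from an earlier delivery would let downstream code read only the new field name, assuming the layout above holds.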

Improvements

  • We made improvements to fuzzy company canonicalization to avoid generic matches.
  • We've updated our canonical company data and made some additions. We are now providing two files: company_vx.0 which contains the information exposed in the person data and company_vx.0_full which contains additional fields.
  • We removed two sources that exceeded our maximum threshold for frankenstein records (>1% instance rate).
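
One way to use the canonical company data together with the person data is to index the company rows by ID and look up each experience.company.id. The sketch below assumes the company file has already been loaded into a list of dicts that each carry an id key. That layout and the helper names are illustrative assumptions, not a documented schema. Because the IDs do not persist between versions of the canonical data, the index should be built from the company file delivered with the same release as the person records.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def build_company_index(companies: Iterable[Row]) -> Dict[str, Row]:
    """Map company id -> company row; rows without an id are skipped."""
    return {row["id"]: row for row in companies if row.get("id")}


def link_experience(person: Row, index: Dict[str, Row]) -> List[Optional[Row]]:
    """Return the canonical company row (or None) for each experience entry."""
    linked: List[Optional[Row]] = []
    for exp in person.get("experience") or []:
        company_id = (exp.get("company") or {}).get("id")
        linked.append(index.get(company_id) if company_id else None)
    return linked
```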

Bug Fixes

  • Stripped out email addresses from invalid or temporary providers such as dummy.com.
  • We removed a data source that was providing incorrect or generic skill information.
  • We removed null bytes from summary data.
  • A small subset of LinkedIn profile URLs was being incorrectly parsed as blank. This is now fixed.
  • Included punctuation for the skill .net (instead of net).
  • Fixed a scenario where an experience object would show up as a primary.job, but not in the experience array.
  • Fixed merging issues with education objects where there were erroneous compounding merges.
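
For anyone still holding records from an earlier delivery, two of the fixes above could be approximated locally. This is only a minimal sketch, not the logic used in our pipeline: the helper names are hypothetical, and the blocklist contains just the one domain named in these notes.

```python
from typing import AbstractSet

# Assumption: only the domain named in the notes; extend with your own list.
PLACEHOLDER_DOMAINS = frozenset({"dummy.com"})


def strip_null_bytes(text: str) -> str:
    """Remove NUL characters, e.g. from summary text."""
    return text.replace("\x00", "")


def is_placeholder_email(
    address: str, blocked: AbstractSet[str] = PLACEHOLDER_DOMAINS
) -> bool:
    """True if the address's domain is on the blocklist (case-insensitive)."""
    if "@" not in address:
        return False
    domain = address.rpartition("@")[2].strip().lower()
    return domain in blocked
```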

Data Delivery Formats

  • We can now deliver the data license updates in Parquet format. If you would like to receive this, please let us know!