April 2020 Release Notes (v10.0)
By Henry Nevue
| Release Name | Dataset Version | Publish Date |
|---|---|---|
| April 2020 | v10.0 | 04/08/2020 |
Released on 4/08/2020
Freshness
- This quarter we have refreshed job titles for over 175mm of our global profiles and locations for over 160mm.
- Similarly, we have refreshed job titles for over 50mm of our US profiles and locations for over 60mm.
Coverage Increase
- We've updated and increased our coverage of the `linkedin_connections` (beta) field for over 100mm records.
- We've improved our coverage of historical `experience` and `education`, as well as all three of the summaries (beta) fields.
Data Field Changes
- We've added two new fields: `experience.company.location.street_address` and `primary.job.company.location.street_address`. These represent the HQ location of the company and should help with matching our canonicalized company data to other company sources.
- We've made modifications to the `experience.title.levels` field. The enumerable values for the field have changed and can be referenced in our canonical data (link - deprecated). While the tagging logic has mostly remained the same, the `cxo` level should see some net lift and the `manager` level should decrease slightly due to some logical improvements. We've also added lower levels, including `senior`, `junior`, and `unpaid`.
- We've begun to do some basic pre-processing on the `experience.title.name` field to improve merging and help with standardization. These changes are mainly around mapping abbreviations and stripping punctuation and should be non-destructive. We have also added an `experience.title.raw` field, which is available to license customers upon request.
- The `birth_date_fuzzy` field will now have the same year as the `birth_date` field instead of being blank when a `birth_date` exists.
- We are now exposing the `experience.company.id` and `education.school.id` fields by default. This allows for easy linkage with our canonical company data (link - deprecated) and canonical school data (link - deprecated). As of now, these IDs do not persist between versions of the canonical data.
- Mobile phone numbers that we have high confidence in are now tagged as `phone_numbers.type = mobile`.
- `experience.company.linkedin_size` has been renamed to `experience.company.size`.
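The title pre-processing described above (mapping abbreviations and stripping punctuation) can be approximated roughly as follows. This is a minimal Python sketch under stated assumptions, not the pipeline's actual code: `normalize_title` and the `ABBREVIATIONS` table are hypothetical, and the real abbreviation mapping is not listed in these notes.

```python
import re

# Hypothetical abbreviation map for illustration only; the actual mapping
# used for experience.title.name is not published in these release notes.
ABBREVIATIONS = {
    "sr": "senior",
    "jr": "junior",
    "mgr": "manager",
    "eng": "engineer",
    "vp": "vice president",
}


def normalize_title(raw_title: str) -> str:
    """Lowercase a job title, strip punctuation, and expand known abbreviations."""
    # Replace every character that is not a word character or whitespace
    # (e.g. ".", ",", "/") with a space so adjacent tokens stay separated.
    cleaned = re.sub(r"[^\w\s]", " ", raw_title.lower())
    # split() with no argument also collapses the repeated spaces left behind.
    tokens = cleaned.split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)
```

Because `"Sr. Manager"` and `"senior manager"` both normalize to `"senior manager"`, records that differ only in abbreviation or punctuation can be matched for merging, while the untouched original stays available in a field like `experience.title.raw`.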
Improvements
- We made improvements to fuzzy company canonicalization to avoid generic matches.
- We've updated our canonical company data and made some additions. We are now providing two files: `company_vx.0`, which contains the information exposed in the person data, and `company_vx.0_full`, which contains additional fields.
- We removed two sources that exceeded our maximum threshold for frankenstein records (>1% instance rate).
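The >1% threshold for frankenstein records amounts to a simple per-source rate check. The Python sketch below assumes each source already has a count of records flagged as frankenstein records; how those records are detected is not described in these notes, and `sources_over_threshold` and its input shape are hypothetical.

```python
from typing import Dict, List, Tuple

# The >1% instance-rate threshold stated in the release notes.
MAX_FRANKENSTEIN_RATE = 0.01


def sources_over_threshold(
    stats: Dict[str, Tuple[int, int]],
    max_rate: float = MAX_FRANKENSTEIN_RATE,
) -> List[str]:
    """Return the sources whose frankenstein-record rate exceeds max_rate.

    stats maps a (hypothetical) source name to a tuple of
    (frankenstein_record_count, total_record_count).
    """
    flagged = []
    for source, (bad, total) in stats.items():
        if total == 0:
            continue  # no records from this source, so no rate to measure
        if bad / total > max_rate:
            flagged.append(source)
    return sorted(flagged)
```

For example, a source with 20 flagged records out of 1,000 (2%) would be removed, while one with 5 out of 1,000 (0.5%) would be kept.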
Bug fixes
- We stripped out emails from invalid or temporary email providers such as dummy.com.
- We removed a data source that was providing incorrect or generic skill information.
- We removed null bytes from summary data.
- A small subset of LinkedIn profile URLs was being incorrectly parsed as blank; this is now fixed.
- Included punctuation for the skill `.net` (instead of `net`).
- Fixed a scenario where an `experience` object would show up as a `primary.job` but not in the `experience` array.
- Fixed merging issues with `education` objects where there were erroneous compounding merges.
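Two of the fixes above, dropping emails from invalid or temporary providers and removing null bytes from summaries, are simple record-level filters. A minimal Python sketch follows; only dummy.com comes from these notes, and the blocklist contents, `filter_emails`, and `strip_null_bytes` are hypothetical illustrations rather than the actual cleaning code.

```python
# Hypothetical blocklist: dummy.com is named in the release notes; a real
# list of invalid or temporary email providers would be much longer.
BLOCKED_EMAIL_DOMAINS = {"dummy.com"}


def filter_emails(emails: list[str]) -> list[str]:
    """Drop addresses whose domain is on the blocklist (case-insensitive)."""
    kept = []
    for email in emails:
        # rpartition splits on the last "@", so the domain is everything after it.
        _, _, domain = email.rpartition("@")
        if domain.lower() not in BLOCKED_EMAIL_DOMAINS:
            kept.append(email)
    return kept


def strip_null_bytes(text: str) -> str:
    """Remove NUL ("\\x00") characters from a summary string."""
    return text.replace("\x00", "")
```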
Data Delivery Formats
- We can now deliver the data license updates in Parquet format. If you would like to receive this, please let us know!
