April 2020 Release Notes (v10.0)
over 4 years ago by Henry Nevue
Release Name | Dataset Version | Publish Date |
---|---|---|
April 2020 | v10.0 | 04/08/2020 |
Released on 4/08/2020
Freshness
- This quarter we have refreshed job titles for over 175mm of our global profiles and locations for over 160mm.
- Similarly, we have refreshed job titles for over 50mm of our US profiles and locations for over 60mm.
Coverage Increase
- We've updated and increased our coverage of the `linkedin_connections` beta field for over 100mm records.
- We've improved our coverage of historical `experience` and `education`, as well as all three of the `summaries` beta fields.
Data Field Changes
- We've added two new fields -- `experience.company.location.street_address` and `primary.job.company.location.street_address`. This represents the HQ location of the company and should help with matching our canonicalized company data to other company sources.
- We've made modifications to the `experience.title.levels` field. The enumerable values for the field have changed and can be referenced in our canonical data (link - deprecated). While the tagging logic has mostly remained the same, the `cxo` level should see some net lift and the `manager` level should slightly decrease due to some logical improvements. We've also added lower `levels` including `senior`, `junior`, and `unpaid`.
- We've begun to do some basic pre-processing on the `experience.title.name` field to improve merging and help with standardization. These changes are mainly around mapping abbreviations and stripping punctuation and should be non-destructive. We have also added an `experience.title.raw` field which is available to license customers upon request.
- The `birth_date_fuzzy` field will now have the same year as the `birth_date` field instead of being blank when a `birth_date` exists.
- We are now exposing the `experience.company.id` and `education.school.id` fields by default. This allows for easy linkage with our canonical company data (link - deprecated) and canonical school data (link - deprecated). As of now these ids do not persist between versions of the canonical data.
- Our highly confident mobile phones are now tagged as `phone_numbers.type = mobile`.
- `experience.company.linkedin_size` has been renamed to `experience.company.size`.
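
To illustrate the kind of title pre-processing described above, here is a minimal sketch in Python. It is not our production logic: the abbreviation map, the lowercasing step, and the function name `normalize_title` are assumptions made for illustration only. The sketch keeps the original string alongside the cleaned one, which mirrors the relationship between `experience.title.raw` and `experience.title.name`.

```python
import re

# Hypothetical abbreviation map; the actual mapping is not published in these notes.
ABBREVIATIONS = {
    "sr": "senior",
    "jr": "junior",
    "vp": "vice president",
    "mgr": "manager",
}


def normalize_title(raw_title: str) -> dict:
    """Return the raw title together with a cleaned version of it."""
    # Lowercasing is an assumption; the notes only mention abbreviations and punctuation.
    lowered = raw_title.lower()
    # Replace punctuation (anything that is not a word character or whitespace) with a space.
    stripped = re.sub(r"[^\w\s]", " ", lowered)
    # Expand known abbreviations token by token; unknown tokens pass through unchanged.
    tokens = [ABBREVIATIONS.get(token, token) for token in stripped.split()]
    # Keeping the raw value means the cleaning step loses no information.
    return {"raw": raw_title, "name": " ".join(tokens)}
```

Under these assumptions, `normalize_title("Sr. Software Engineer")` yields the name `senior software engineer` while preserving the original string under `raw`.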
Improvements
- We made improvements to fuzzy company canonicalization to avoid generic matches.
- We've updated our canonical company data and made some additions. We are now providing two files: `company_vx.0`, which contains the information exposed in the person data, and `company_vx.0_full`, which contains additional fields.
- We removed two sources that exceeded our maximum threshold for frankenstein records (>1% instance rate).
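
The source-removal rule in the last item can be sketched as a simple threshold check. This is only an illustration: it assumes "instance rate" means the share of a source's records that were flagged as frankenstein records, and the input shape and the function name `sources_over_threshold` are hypothetical.

```python
from typing import Dict, List, Tuple

MAX_FRANKENSTEIN_PERCENT = 1  # the >1% threshold quoted in these notes


def sources_over_threshold(stats: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the sources whose frankenstein rate is strictly above the threshold.

    `stats` maps a source name to (frankenstein_count, total_count).
    """
    flagged = []
    for source, (frankenstein_count, total_count) in stats.items():
        # Integer arithmetic avoids floating-point edge cases at exactly 1%.
        if frankenstein_count * 100 > total_count * MAX_FRANKENSTEIN_PERCENT:
            flagged.append(source)
    return sorted(flagged)
```

With this check, a source sitting at exactly 1% is kept, since the notes describe removal only for sources that exceeded the threshold.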
Bug fixes
- Stripped out invalid/temporary email providers like dummy.com.
- We removed a data source that was providing incorrect/generic skill information.
- We removed null bytes from summary data.
- A small subset of LinkedIn profile URLs were being incorrectly parsed as blank; this is now fixed.
- Included punctuation for the skill `.net` (instead of `net`).
- Fixed a scenario where an `experience` object would show up as a `primary.job`, but not in the `experience` array.
- Fixed merging issues with `education` objects where there were erroneous compounding merges.
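
Two of the fixes above (stripping invalid email providers and removing null bytes from summaries) are simple enough to sketch. The following is a rough illustration, not our actual cleaning code: the blocklist contains only `dummy.com` because that is the only provider named in these notes, and both function names are hypothetical.

```python
# Hypothetical blocklist; only dummy.com is named in these notes.
INVALID_EMAIL_DOMAINS = {"dummy.com"}


def drop_invalid_emails(emails):
    """Keep only emails whose domain is not on the blocklist."""
    kept = []
    for email in emails:
        # Take everything after the last "@" and compare case-insensitively.
        domain = email.rsplit("@", 1)[-1].lower()
        if domain not in INVALID_EMAIL_DOMAINS:
            kept.append(email)
    return kept


def strip_null_bytes(summary):
    """Remove null bytes from a summary string, leaving other text untouched."""
    return summary.replace("\x00", "")
```

Customers who cache older versions of the data could apply similar checks on their side, adjusting the blocklist to their own needs.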
Data Delivery Formats
- We can now deliver the data license updates in Parquet format. If you would like to receive this, please let us know!