A CSV of 5000 titles and their top related skills and other titles.
You can download our Related Title Dataset by filling out this form.
494 KB compressed
1.9 MB uncompressed
The relations were built from co-occurrence counts (how many times a skill appeared alongside a title on a person's resume) combined with some simple mathematical modeling. The math borrows from the philosophy of tf-idf: an entity receives a higher relational score for a title when its rate of co-occurrence with that title is proportionately high compared to the rest of its co-occurrences. For example, the skill "microsoft office" has a high rate of co-occurrence with most titles, so its relational score to any given title is heavily penalized for being such a commonly co-occurring skill.
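To make the tf-idf analogy concrete, here is a minimal sketch of one way such a score could be computed. It is not the formula used to build this dataset, which is not published here. The function name `relation_scores`, the input shape, and the sample counts are all hypothetical, and the score is plain tf-idf applied to co-occurrence counts: a skill's share of a title's co-occurrences, multiplied by the log of how rare that skill is across titles.

```python
import math
from collections import defaultdict


def relation_scores(cooccurrence):
    """Score (title, skill) pairs with a tf-idf style weighting.

    cooccurrence: dict mapping (title, skill) -> co-occurrence count.
    This is an illustrative assumption, not the dataset's actual model.
    """
    title_totals = defaultdict(int)      # total co-occurrences per title
    titles_per_skill = defaultdict(set)  # distinct titles each skill appears with
    for (title, skill), count in cooccurrence.items():
        title_totals[title] += count
        titles_per_skill[skill].add(title)

    n_titles = len(title_totals)
    scores = {}
    for (title, skill), count in cooccurrence.items():
        # "tf": the skill's share of this title's co-occurrences
        tf = count / title_totals[title]
        # "idf": penalize skills that co-occur with many different titles
        idf = math.log(n_titles / len(titles_per_skill[skill]))
        scores[(title, skill)] = tf * idf
    return scores


# Made-up counts, for illustration only.
counts = {
    ("accountant", "microsoft office"): 90,
    ("accountant", "bookkeeping"): 40,
    ("software engineer", "microsoft office"): 80,
    ("software engineer", "python"): 50,
    ("nurse", "microsoft office"): 70,
    ("nurse", "patient care"): 60,
}
scores = relation_scores(counts)
```

In this toy input, "microsoft office" co-occurs with every title, so its idf term is log(1) = 0 and its score is zero for all three titles, even though its raw counts are the highest. "bookkeeping" appears with only one title and keeps a positive score.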
This is NOT a canonical list of verified or cleaned titles. The collection comes mostly from direct user-input data, and very little normalization or filtering has been done. Title counts are included to give a sense of scale and relative frequency. Each count is roughly equal to the number of person profiles the title occurs on in an unprocessed variant of our dataset.
This is an abridged version of a larger dataset of ~100k titles, each with ~1000 relations for both skills and titles, along with a score for each relation.
You may use this data for any purpose. It is released under the terms of the Creative Commons Attribution license (CC BY 4.0 - https://creativecommons.org/licenses/by/4.0/).