Canonical Field Values

Overview

Canonical field values are the normalized, enumerated values we use for schema fields that have a fixed set of choices, support autocomplete, or follow a controlled vocabulary.

Many Enum (String) and Array [Enum (String)] fields in our Person Schema and Company Schema are backed by published canonical datasets. For example, the values for education.degrees are defined in Education Degrees.

What are canonical field values?

Canonical values are the standard allowed values for certain fields. They are not raw source text; they are curated, normalized terms that help keep search, autocomplete, and schema validation consistent.

Examples include:

  • education.degrees
  • education.majors
  • company.types
  • industry
  • location.countries
  • job_title_roles
  • language_names

Why this matters

Using canonical values helps you:

  • build queries that align with our searchable values
  • avoid mismatches from raw text or alternate spellings
  • understand what values are accepted for fields with fixed vocabularies

Our schema pages usually link each field to its canonical value page when one exists. If a field has a canonical reference, follow that link to see the exact permitted values.
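
To make the first two points above concrete, here is a minimal sketch of checking a user-supplied term against a canonical set before building a query. The values in CANONICAL_DEGREES are illustrative placeholders rather than the published list, the helper name to_canonical is hypothetical, and the sketch assumes canonical values are stored in lowercase.

```python
# Illustrative placeholder values only; the published canonical dataset
# for education.degrees is the source of truth.
CANONICAL_DEGREES = {"bachelors", "masters", "doctorates"}


def to_canonical(raw, allowed):
    """Return the canonical form of `raw` if it is in `allowed`, else None.

    Assumes canonical values are lowercase with no surrounding whitespace.
    """
    candidate = raw.strip().lower()
    return candidate if candidate in allowed else None


# A term that differs only in case or whitespace resolves to a canonical value.
match = to_canonical("  Masters ", CANONICAL_DEGREES)

# Raw text or an alternate spelling does not, so it should not be sent as-is.
miss = to_canonical("M.Sc.", CANONICAL_DEGREES)
```

Rejecting a term that has no canonical match, instead of passing it through, surfaces the mismatch before the query runs.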

Common canonical data pages

For a complete list of canonical datasets, browse the subpages under Data Standardization > Canonical Field Values in the left-hand navigation bar.


Where to access canonical data

Datasets of possible values for many fields are stored in our public Amazon S3 bucket:

We update canonical data quarterly, either by moving files into a new version folder or by updating an existing file in place. The Release Notes list which files were updated or changed.
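
Because files can change between quarterly updates, it may help to compare a previously downloaded copy of a dataset with a newer one. The sketch below assumes a canonical file is plain text with one value per line, which may not match the actual file format; the function names parse_canonical_values and diff_values are hypothetical.

```python
def parse_canonical_values(lines):
    """Build a set of values from an iterable of text lines.

    Assumes one value per line; blank lines are skipped and surrounding
    whitespace is trimmed.
    """
    return {line.strip() for line in lines if line.strip()}


def diff_values(old, new):
    """Return (added, removed) as sorted lists of values."""
    return sorted(new - old), sorted(old - new)


# Usage with two local copies (the file names here are placeholders):
#
#     with open("degrees_old.txt", encoding="utf-8") as f:
#         old = parse_canonical_values(f)
#     with open("degrees_new.txt", encoding="utf-8") as f:
#         new = parse_canonical_values(f)
#     added, removed = diff_values(old, new)
```

Any value that shows up in removed is one to look for in stored queries or validation rules before adopting the newer file.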