Persistent IDs

Persistent IDs make it significantly easier for our license customers to ingest new releases. They allow for faster updates, give a clearer picture of how our dataset changes between builds, and support collaboration between our license customers and our products.

Goals

We created our persistent ID solution to do the following:

  1. Maintain IDs across builds for the great majority of records.
  2. Provide lineage metadata that tracks how IDs move from one release to the next.

How do IDs persist?

We build each production record in our dataset from many raw input source records. As each raw record enters our ingestion pipeline, we assign it a record ID. Later in the pipeline, we merge these raw records into clusters, and each cluster makes up one final person record. Finally, we assign each final record its persistent ID by taking the record ID of the oldest and most fundamental raw record in its cluster. As a result, our highest-confidence sources, which are the bedrock of our data, contribute the majority of our IDs.

The only way an ID disappears from our data is if we remove the bedrock source it came from. This happens very rarely, and we expect it to become even rarer over time.
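The selection rule above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not our production code. The `RawRecord` type, its `source_tier` and `first_seen` fields, and the `persistent_id` helper are assumptions made for the example. The sketch also assumes that how fundamental a source is gets compared first, with the record's age breaking ties.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RawRecord:
    record_id: str    # ID assigned when the raw record enters ingestion
    source_tier: int  # hypothetical: 0 = bedrock source, higher = less fundamental
    first_seen: date  # hypothetical: when the record first entered the pipeline


def persistent_id(cluster: list[RawRecord]) -> str:
    """Return the record ID of the most fundamental, then oldest, raw record.

    Assumption: source tier outranks age; record_id is a final tiebreak
    so the choice is deterministic across builds.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one raw record")
    anchor = min(cluster, key=lambda r: (r.source_tier, r.first_seen, r.record_id))
    return anchor.record_id


# One person cluster in build 1 (all values are made up for illustration).
build_1 = [
    RawRecord("r-903", 2, date(2021, 5, 1)),
    RawRecord("r-117", 0, date(2019, 3, 2)),
    RawRecord("r-450", 0, date(2020, 1, 9)),
]
# Build 2 adds a newer, less fundamental raw record to the same cluster.
build_2 = build_1 + [RawRecord("r-2001", 1, date(2023, 6, 1))]
# Build 3 simulates removing the bedrock source (tier 0) entirely.
build_3 = [r for r in build_2 if r.source_tier != 0]

print(persistent_id(build_1))  # r-117
print(persistent_id(build_2))  # r-117 (new input does not change the ID)
print(persistent_id(build_3))  # r-2001 (ID changes only when bedrock is removed)
```

Because new raw records rarely outrank the existing bedrock record, adding data leaves the ID stable; only dropping the bedrock source forces a different record to become the anchor.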

For License Customers

Please consult our ID changelog to learn more about how this impacts your deliveries.