Persistent IDs

Persistent IDs make ingesting new releases significantly easier for our license customers. They allow for faster updates, a better understanding of how our dataset changes between builds, and collaboration between our license customers and products.

Goals

Our persistent ID solution was created to do the following:

  1. Maintain IDs across builds for the vast majority of records.
  2. Provide metadata on the lineage of ID movement from one release to the next.

How do IDs persist?

Each production record in our dataset is built from many raw input source records. We generate a record ID for each raw input record as it enters our ingestion pipeline. These raw records are later merged into clusters, and each cluster makes up one final person record. The persistent ID of a final record is the ID of the oldest and most fundamental raw record in its cluster. This means that our highest-confidence sources, the bedrock of our data, contribute the majority of our IDs.
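The selection rule above can be illustrated with a minimal sketch. It is not our pipeline code: the `RawRecord` type, its `first_seen` field, and the `persistent_id` function are hypothetical names introduced for illustration, and the sketch models only "oldest", not "most fundamental".

```python
from dataclasses import dataclass
from datetime import date
from typing import Sequence


@dataclass(frozen=True)
class RawRecord:
    # All fields are assumptions made for this example.
    record_id: str    # ID generated when the raw record enters ingestion
    source: str       # name of the input source
    first_seen: date  # stand-in for the "age" of the raw record


def persistent_id(cluster: Sequence[RawRecord]) -> str:
    """Return the ID of the oldest raw record in a cluster."""
    if not cluster:
        raise ValueError("a cluster must contain at least one raw record")
    # Break ties on record_id so the result is deterministic.
    oldest = min(cluster, key=lambda r: (r.first_seen, r.record_id))
    return oldest.record_id


# The same person record in two consecutive builds: the second build
# merges in one additional, newer raw record.
build_1 = [
    RawRecord("raw-001", "source_a", date(2019, 3, 1)),
    RawRecord("raw-047", "source_b", date(2021, 6, 15)),
]
build_2 = build_1 + [RawRecord("raw-112", "source_c", date(2023, 1, 9))]

id_1 = persistent_id(build_1)
id_2 = persistent_id(build_2)
```

Under these assumptions, `id_1` and `id_2` are both `"raw-001"`: merging newer raw records into a cluster does not change which record is oldest, so the ID carries over between builds.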

The only case in which an ID would disappear from our data is if we removed a bedrock source. This happens very rarely, and we expect it to become rarer still over time.

Edge Case

There is currently an edge case in which a persistent ID can be duplicated across two profiles. This affects fewer than 1,000 records across our entire dataset. These duplicates can be ignored while we work on a fix.
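If you would like to see which IDs in a delivery are affected, a check along the following lines may help. This is only a sketch: `duplicated_ids` is a hypothetical helper, and it assumes you have already extracted the persistent ID of each profile into an iterable of strings.

```python
from collections import Counter
from typing import Iterable, Set


def duplicated_ids(ids: Iterable[str]) -> Set[str]:
    """Return every ID that appears on more than one profile."""
    counts = Counter(ids)
    return {record_id for record_id, n in counts.items() if n > 1}


# Example with made-up IDs: "id-2" appears on two profiles.
delivery_ids = ["id-1", "id-2", "id-3", "id-2"]
dupes = duplicated_ids(delivery_ids)
```

Here `dupes` contains only `"id-2"`. How to treat such records (keep both, keep one, or set them aside) is left to your own ingestion logic.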

For License Customers

Please consult our ID changelog to learn more about how this impacts your deliveries.

