Data Accuracy
What is PDL’s Entity Resolution Strategy?
As Data-as-a-Service (DaaS) companies compete on metrics like fill rates and linkages, our industry has an incentive to use very aggressive merging logic, combining records that should not be combined in order to inflate those metrics. The result is false-positive merges, where profiles have inaccurate PII. At PDL, we believe these false-positive "Frankenstein" merges are more harmful than false negatives, where records that belong together stay separate. In other words, we prioritize accuracy over everything else.
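To make the trade-off concrete, here is a minimal sketch of a conservative merge rule. It is not PDL's actual entity resolution logic: the Profile shape, the should_merge function, and the two-identifier threshold are all hypothetical, chosen only to show how a stricter rule accepts some missed merges in exchange for fewer wrong ones.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Profile:
    # Hypothetical record shape, for illustration only.
    name: str
    emails: frozenset = field(default_factory=frozenset)
    phones: frozenset = field(default_factory=frozenset)


def should_merge(a: Profile, b: Profile, min_shared_identifiers: int = 2) -> bool:
    """Merge only when several strong identifiers are shared.

    A matching name alone never triggers a merge. This accepts some
    false negatives (missed merges) to avoid false positives.
    The threshold of 2 is an assumption, not a documented value.
    """
    shared = len(a.emails & b.emails) + len(a.phones & b.phones)
    return shared >= min_shared_identifiers


jane_a = Profile("Jane Doe", frozenset({"jane@example.com"}), frozenset({"+15550100"}))
jane_b = Profile("Jane Doe", frozenset({"jane@example.com"}), frozenset({"+15550100"}))
jane_c = Profile("Jane Doe", frozenset({"jd@example.org"}))

print(should_merge(jane_a, jane_b))  # shares an email and a phone
print(should_merge(jane_a, jane_c))  # shares only a name
```

Under this rule the two profiles that share only a name stay separate, even though they might describe the same person. A more aggressive rule would merge them and risk mixing two people's data.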
How do we measure quality in data sources?
Every new data source we ingest goes through a battery of quality tests, and we keep expanding those tests so that we can continue to raise the bar on our data. Our testing framework includes linkage accuracy measures, email validation checks, confirmations of mobile phone tagging, and more. The process is strict enough that we turn down about three sources for every one we use.
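As an illustration of how a source might be gated on a set of checks, the sketch below accepts a source only if every check meets a minimum pass rate. The function names, the record shape, the thresholds, and the simplified email pattern are assumptions for this example and do not describe PDL's real test suite.

```python
import re

# Deliberately simple syntactic pattern, assumed for illustration.
# Real email validation involves far more than a regular expression.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def email_syntax_pass_rate(records):
    """Share of non-empty email values that look syntactically valid."""
    emails = [r["email"] for r in records if r.get("email")]
    if not emails:
        return 0.0
    return sum(1 for e in emails if EMAIL_PATTERN.match(e)) / len(emails)


def accept_source(records, checks, thresholds):
    """Accept a source only if every check meets its minimum pass rate."""
    results = {name: check(records) for name, check in checks.items()}
    accepted = all(results[name] >= thresholds[name] for name in checks)
    return accepted, results


sample = [
    {"email": "a@example.com"},
    {"email": "not-an-email"},
    {"email": "b@example.org"},
    {"email": "c@example.net"},
]
checks = {"email_syntax": email_syntax_pass_rate}

accepted, results = accept_source(sample, checks, {"email_syntax": 0.9})
print(accepted, results)
```

Additional checks, such as a linkage accuracy measure, could be added to the checks dictionary with their own thresholds without changing accept_source.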
How do we QA our data?
Every one of our monthly data releases goes through an extensive testing framework. In addition to running the same tests we run on the sources themselves, we compare every component of the data against the latest release to ensure that every change is expected given our new sources and code changes. For more information about our processes, see Data Build Process and Data Updates.
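A release-over-release comparison of this kind can be sketched as a per-field fill rate diff that flags any change larger than a tolerance. The function names, the record shape, and the 5% tolerance are hypothetical. The actual comparisons in the release process are not described in this document.

```python
def fill_rate(records, field_name):
    """Fraction of records with a non-empty value for field_name."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field_name) not in (None, "", []))
    return filled / len(records)


def unexpected_changes(previous, current, fields, tolerance=0.05):
    """Return fields whose fill rate moved by more than the tolerance.

    Flagged fields are candidates for review, not confirmed errors:
    a large shift may be fully explained by a new source or code change.
    """
    flagged = {}
    for name in fields:
        before = fill_rate(previous, name)
        after = fill_rate(current, name)
        if abs(after - before) > tolerance:
            flagged[name] = (before, after)
    return flagged


previous_release = [
    {"email": "a@example.com", "phone": "+15550100"},
    {"email": "b@example.com", "phone": None},
]
candidate_release = [
    {"email": "a@example.com", "phone": "+15550100"},
    {"email": None, "phone": None},
]

print(unexpected_changes(previous_release, candidate_release, ["email", "phone"]))
```

Here the email fill rate drops from 1.0 to 0.5 and is flagged, while the phone fill rate is unchanged and passes.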
How do we incorporate customer feedback into our builds?
Our product team monitors every data quality issue our customers raise, and our analytics and data engineering teams help us understand the root cause of each issue and its exact impact across our dataset. We also ensure that our data teams reserve a portion of their bandwidth in every engineering sprint to address data quality issues. If you find an issue in our data, please submit it here.
Updated about 1 year ago