Comparing person-level matching algorithms to identify risk across disparate datasets among patients with a controlled substance prescription: retrospective analysis

Date Published
January 2022

This study used statewide prescription drug monitoring program (PDMP), arrest, and mortality data matched at the person level using 1 approximate match algorithm and 2 exact match algorithms. The impact of matching was assessed by analyzing 3 independent concepts: (1) the prevalence of key risk indicators used by PDMPs in practice, (2) the prevalence of arrests and fatal opioid overdose, and (3) the performance of a multivariate logistic regression for fatal opioid overdose.
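The difference between exact and approximate person-level matching can be illustrated with a minimal sketch. The abstract does not specify which identifiers or similarity measure the study used, so the field names (`first`, `last`, `dob`, `zip`), the helper functions, the use of Python's `difflib.SequenceMatcher`, and the 0.85 threshold are all assumptions made for illustration, not the study's method.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase and strip the name fields; keep date of birth as given."""
    return (
        record["first"].strip().lower(),
        record["last"].strip().lower(),
        record["dob"],
    )


def exact_match(a, b):
    """Exact match: all normalized identifiers must be identical."""
    return normalize(a) == normalize(b)


def exact_plus_zip_match(a, b):
    """Stricter exact match that also requires the same ZIP code."""
    return exact_match(a, b) and a["zip"] == b["zip"]


def approximate_match(a, b, threshold=0.85):
    """Approximate match: same date of birth and sufficiently similar names.

    The 0.85 threshold is an arbitrary illustrative value (assumption).
    """
    first_a, last_a, dob_a = normalize(a)
    first_b, last_b, dob_b = normalize(b)
    if dob_a != dob_b:
        return False
    similarity = SequenceMatcher(
        None, f"{first_a} {last_a}", f"{first_b} {last_b}"
    ).ratio()
    return similarity >= threshold


# Hypothetical records: the first name is spelled differently in each source.
pdmp_record = {"first": "Jonathan", "last": "Smith", "dob": "1980-02-14", "zip": "12345"}
arrest_record = {"first": "Jonathon", "last": "Smith ", "dob": "1980-02-14", "zip": "12399"}

print(exact_match(pdmp_record, arrest_record))           # False
print(exact_plus_zip_match(pdmp_record, arrest_record))  # False
print(approximate_match(pdmp_record, arrest_record))     # True
```

In this sketch a one-letter spelling difference is enough to break both exact rules, while the approximate rule still links the two records. This is one way a matched population can differ in size and composition depending on the algorithm.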


The PDMP key risk indicators included (1) multiple provider episodes (MPEs), or patients with prescriptions from multiple prescribers and dispensers, (2) high morphine milligram equivalents (MMEs), which represent an opioid's potency relative to morphine, and (3) overlapping opioid and benzodiazepine prescriptions.

Results: The prevalence of PDMP-based risk indicators was higher in the approximate match population for MPEs (n = 4893/1 859 445 [0.26%]) and overlapping opioid/benzodiazepines (n = 57 888/1 859 445 [4.71%]), but the exact-basic match population had the highest prevalence of individuals with high MMEs (n = 664/1 910 741 [3.11%]). The prevalence of arrests and deaths was highest for the approximate match population compared with the exact match populations. Model performance was comparable across the 3 matching algorithms (exact-basic validation area under the receiver operating characteristic curve [AUC]: 0.854; approximate validation AUC: 0.847; exact + zip validation AUC: 0.826), but the algorithms resulted in different cutoff points balancing sensitivity and specificity.

Conclusions: Our study illustrates the specific tradeoffs of different matching methods. Further research should compare matching algorithms and their impact on the prevalence of key risk indicators in an applied setting, which can improve understanding of risk within a population.

Lay Summary: The persistence of the opioid epidemic necessitates an understanding of risk factors for fatal opioid overdose across multiple sources of data. However, without a national identifier for individuals that can be used to link data from multiple sources, the data are linked using personal identifiers in a variety of ways. This study combines prescription histories, criminal justice encounters, and deaths using several different approaches to linking the data at the person level to provide insight into how the linking approach affects the understanding of risk of fatal opioid overdose. Several risk indicators, the prevalence of arrests and deaths, and models predicting risk were quantified for 3 approaches to linking that are used in practice. As more studies call for combining cross-sector datasets for more comprehensive analysis, state programs must understand the implications of the data linking process when designing public health programs and interventions that benefit the community.
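The results mention cutoff points that balance sensitivity and specificity. The abstract does not say which criterion was used to choose them. The sketch below assumes one common choice, maximizing Youden's J (sensitivity + specificity - 1), and uses made-up scores and outcomes that are not drawn from the study.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as high risk and compare with true labels."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Return the candidate cutoff that maximizes Youden's J (assumed criterion)."""
    best, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        j = sens + spec - 1
        if j > best_j:  # ties keep the lowest cutoff
            best, best_j = cutoff, j
    return best, best_j


# Made-up predicted probabilities and outcomes (1 = event), for illustration only.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

cutoff, j = best_cutoff(scores, labels)
print(cutoff, j)  # 0.3 0.75
```

Because each matching algorithm produces a different linked population, the fitted model's score distribution can shift, so the cutoff selected this way may differ between algorithms even when the AUC values are similar.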
