Dataset 1: Analyzing Opioid Prescription Rates by State and Provider Type
Model 1: K-Means Clustering
Why the model was chosen: K-Means was chosen to explore the relationship between total opioid claims, opioid prescription rate (opioid prescriptions as a percentage of total prescriptions), and total opioid cost. This unsupervised method reveals patterns in provider behavior and can surface distinct risk profiles or areas in need of intervention. For example, providers with high costs, high claim counts, and high prescription rates should be identified. Conversely, some prescribers may have a low rate but high cost, which points to differences in the opioid market and in how those differences may be affecting patients.
Model assumptions: K-Means requires quantitative features on a common scale because it assigns points by Euclidean distance; without scaling, a feature with a large numeric range (such as total cost) would dominate the distance. The model is inherently exploratory and assumes, or at least hopes, that the data form natural clusters. The number of clusters, K, is also an informed assumption, chosen from the elbow plot.
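To make these assumptions concrete, the following is a minimal sketch of scaling followed by an elbow calculation, written with only the Python standard library. It is not the code used in this analysis: the six provider rows are made-up illustrative values, and the names `standardize`, `kmeans`, `providers`, and `inertias` are hypothetical. In practice a library such as scikit-learn (`StandardScaler`, `KMeans`) would typically handle both steps.

```python
import math
import random
import statistics

def standardize(rows):
    """Z-score each column so every feature has mean 0 and std 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [
        tuple((v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds))
        for row in rows
    ]

def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(statistics.fmean(c) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    labels = assign(points, centroids)
    # Inertia: total squared distance from each point to its centroid.
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, inertia

# ASSUMPTION: made-up rows of (total_claims, opioid_rate, total_cost),
# for illustration only; these are not values from the dataset.
providers = [
    (120, 0.04, 3_000),
    (150, 0.05, 4_200),
    (130, 0.03, 3_500),
    (2_400, 0.35, 95_000),
    (2_600, 0.40, 110_000),
    (2_500, 0.38, 102_000),
]

scaled = standardize(providers)

# Elbow data: inertia for each candidate K. The "elbow" is the K after
# which adding more clusters stops reducing inertia by much.
inertias = {k: kmeans(scaled, k)[2] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

Without the `standardize` step, the cost column (in the thousands) would swamp the rate column (between 0 and 1) in every distance calculation, so the clusters would effectively be formed on cost alone. Plotting `inertias` against K gives the elbow plot referred to above.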
Data preparation: