Visualizing Type 2 Diabetes Prevalence: Localizing Model Feature Impacts

Authors

  • Youssef Sultan College of Computing, Georgia Institute of Technology, 801 Atlantic Dr NW, Atlanta, 30332, GA, United States
  • Mohammad Hammad College of Computing, Georgia Institute of Technology, 801 Atlantic Dr NW, Atlanta, 30332, GA, United States
  • Kelly Lester College of Computing, Georgia Institute of Technology, 801 Atlantic Dr NW, Atlanta, 30332, GA, United States

DOI:

https://doi.org/10.18517/ijods.5.2.64-74.2024

Keywords:

Spatial Epidemiology, Predictive Modeling in Healthcare, Health Disparities, Geospatial Data Analysis

Abstract

SHAP values are a common way to interpret machine learning model predictions: they average the marginal contribution of each feature across every possible permutation of the feature set. Our research provides a localized view of the SHAP values contributing to Type 2 Diabetes (T2D) prevalence in the United States from 2012 to 2021, treating each year independently. Rather than summarizing SHAP feature importance across an entire geographic dataset in a beeswarm plot, we take a more granular approach and visualize the individual SHAP values of Social Determinants of Health (SDOH) features by county on a choropleth map. We also found that replacing geographic identifiers such as ZIP codes with precise latitude and longitude coordinates before applying KNN imputation reduced the MSE by 10%. Because the geographic identifier of each record was preserved and re-appended by index, each SHAP value could be traced back to its locality. The resulting visualization, built on a non-linear machine learning model, reveals how specific factors influence T2D prevalence at the county level. This approach opens a new geographic vantage point on the mechanisms behind model predictions and identifies the key localized factors influencing T2D. The study extends the possibilities for tailored interventions and public health policies by showing that a factor's predictive impact on an outcome can vary by geography.
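
The two mechanics named in the abstract, averaging marginal contributions over feature permutations and re-appending geographic identifiers by index, can be illustrated with a minimal standard-library sketch. It is not the study's pipeline: the stand-in `model`, the feature names, the single mean-valued baseline, and every data value below are assumptions made up for illustration, and the county codes serve only as example keys.

```python
from itertools import permutations
from statistics import mean

# Hypothetical toy records: a county identifier plus two SDOH-style features.
# All values are invented for illustration; none come from the study.
records = [
    {"fips": "13121", "food_insecurity": 0.12, "poverty_rate": 0.14},
    {"fips": "13089", "food_insecurity": 0.15, "poverty_rate": 0.17},
    {"fips": "13067", "food_insecurity": 0.09, "poverty_rate": 0.08},
]
FEATURES = ["food_insecurity", "poverty_rate"]

# Step 1: set the geographic identifier aside, keyed by row index,
# so the model only ever sees the numeric features.
fips_by_index = {i: r["fips"] for i, r in enumerate(records)}
X = [[r[f] for f in FEATURES] for r in records]


def model(x):
    """Stand-in non-linear model (an assumption): has an interaction term."""
    food, poverty = x
    return 5.0 + 20.0 * food + 10.0 * poverty + 100.0 * food * poverty


# Simplification: "absent" features take the column mean of the toy data.
baseline = [mean(col) for col in zip(*X)]


def shapley_values(x):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features switch from baseline to x."""
    phi = [0.0] * len(x)
    orderings = list(permutations(range(len(x))))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for j in order:
            current[j] = x[j]
            updated = model(current)
            phi[j] += updated - previous
            previous = updated
    return [p / len(orderings) for p in phi]


# Step 2: one row of Shapley values per record, in the same order as X.
shap_rows = [shapley_values(x) for x in X]

# Step 3: re-append the preserved identifiers by index.
localized = [
    {"fips": fips_by_index[i], **dict(zip(FEATURES, row))}
    for i, row in enumerate(shap_rows)
]
for row in localized:
    print(row)
```

Each entry in `localized` pairs a county key with its per-feature contributions, which is the shape of data a choropleth layer would consume. Enumerating every permutation is only feasible for a handful of features, so a realistic feature set would need an approximating explainer instead.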

Published

2024-12-31

How to Cite

[1]
Y. Sultan, M. Hammad, and K. Lester, “Visualizing Type 2 Diabetes Prevalence: Localizing Model Feature Impacts”, Int. J. Data. Science., vol. 5, no. 2, pp. 64–74, Dec. 2024.

Section

Articles