Diabetes is becoming a global challenge owing to the steady increase in the number of cases of Type 2 diabetes mellitus (T2DM). T2DM is characterized by hyperglycaemia resulting from abnormal regulation of insulin, which eventually disrupts metabolism. This study aimed to review articles that apply machine learning methods to identify suitable risk factors for prediabetes.
The study adopted the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol. Research questions were formulated by identifying synonyms and related terms for the query "predictors and prediabetes and machine learning", which was searched in PubMed and Google Scholar. Both observational and interventional original articles published between 2018 and 2023 were included. Eligibility for inclusion was determined by screening each article's title, abstract, and methodology section.
Table of Contents
1. Introduction
1.1 Situation of Diabetes Mellitus
1.2 Introduction to Machine Learning
2. Materials and methods
2.1 Data Extraction and Analysis
3. Results
4. Discussion
Purpose and Research Scope
This study conducts a systematic review of existing research on how machine learning methods are implemented to identify risk factors associated with prediabetes, and it evaluates model performance with a view to future screening applications.
- Analysis of machine learning algorithms in prediabetes diagnostic models
- Evaluation of socio-demographic, clinical, and biochemical risk factors
- Comparison of diagnostic performance metrics such as AUC-ROC and F1-Score
- Assessment of the applicability of prediabetes models in primary healthcare settings
- Investigation of global trends in prediabetes prevalence and early diagnostic necessity
Excerpt from the Book
1.2 Introduction to Machine Learning
Machine learning is a discipline that uses data and algorithms to mimic human learning and decision-making. It is an important domain within the growing field of Data Science. By applying statistical methods, machine learning algorithms learn to classify data and make predictions, thereby uncovering important insights. Machine learning algorithms are categorized as supervised, semi-supervised, and unsupervised learning methods. Supervised learning uses labeled datasets to train the algorithms that forecast outcomes. The input data, also known as the training data, are fed to the model, which adjusts its weights until it fits the underlying function appropriately. The general idea in supervised learning is to learn a function y = f(x) that maps the input data x to their corresponding outcomes y. Thereafter, the algorithm is expected to predict the outcomes for a test dataset. Supervised learning is well suited to classification tasks such as disease diagnosis, where the outcome is either present or absent. Common supervised methods include regression, decision trees, Naïve Bayes, and support vector machines [3], [5].
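The supervised workflow described above (learn y = f(x) from labeled training data, then predict outcomes for a held-out test set) can be sketched with logistic regression, one of the regression methods mentioned. This is a minimal illustration under stated assumptions, not a method from any reviewed study: the two-feature data are synthetic and invented for this example, the 150/50 train/test split, the learning rate, and the 0.5 decision threshold are arbitrary choices, and the helper names (make_record, fit, predict_proba) are hypothetical.

```python
import math
import random

# Hypothetical toy data: two synthetic numeric features per record and a
# binary label (1 = condition present, 0 = absent). Generated for
# illustration only; not taken from any study.
random.seed(0)

def make_record(label):
    # Class 1 records are centred at (2, 2), class 0 records at (-2, -2).
    centre = 2.0 if label == 1 else -2.0
    return [random.gauss(centre, 1.0), random.gauss(centre, 1.0)], label

data = [make_record(i % 2) for i in range(200)]
random.shuffle(data)
train, test = data[:150], data[150:]  # labeled training set, held-out test set

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    # The learned function f(x): estimated probability that y = 1 given x.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fit(records, lr=0.1, epochs=200):
    # Adjust the weights by batch gradient descent on the log-loss
    # until the model fits the labeled training data.
    weights, bias = [0.0, 0.0], 0.0
    n = len(records)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in records:
            error = predict_proba(weights, bias, x) - y
            for j in range(2):
                grad_w[j] += error * x[j] / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

weights, bias = fit(train)
# Predict present/absent outcomes for the unseen test records.
predictions = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

In practice a library implementation such as scikit-learn's LogisticRegression would normally replace the hand-written gradient descent; the explicit loop is kept here only to make the "adjust the weights until the model fits" step visible.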
Unsupervised learning, on the other hand, uses algorithms to analyze and group unlabeled datasets. These algorithms discover patterns or groups in data without reference to known outcomes. In unsupervised learning, the learning function captures essential features of the distribution of the inputs x in the training dataset; no outcome labels y are provided. Afterward, when the trained model encounters a test dataset, it assigns each data record an outcome, such as a cluster membership, based on the previously learned function. The main unsupervised learning problems include association, clustering, and dimensionality reduction. The ability of unsupervised learning algorithms to learn similarities and differences in data makes them essential in exploratory data analysis and pattern recognition, among other applications. Unsupervised methods are also applied to reduce the number of features in a model, using algorithms such as principal component analysis (PCA) and singular value decomposition (SVD).
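Clustering, one of the unsupervised problems listed above, can be illustrated with k-means: the algorithm learns cluster centroids from unlabeled inputs x only, and new records are then assigned to the nearest learned centroid. The sketch below is a minimal pure-Python illustration, assuming synthetic two-dimensional points invented for this example, k = 2, and a fixed number of iterations; kmeans and nearest are hypothetical helper names, not functions from any reviewed study or library.

```python
import math
import random

# Hypothetical unlabeled 2-D points (inputs x only, no outcome y),
# generated around two centres purely for illustration.
random.seed(1)
points = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(50)] + \
         [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(50)]

def nearest(centroids, p):
    # Index of the centroid closest to point p (Euclidean distance).
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], p))

def kmeans(data, k, iterations=20):
    centroids = random.sample(data, k)  # initialise from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in data:
            clusters[nearest(centroids, p)].append(p)
        # Move each centroid to the mean of its assigned points;
        # keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids

centroids = kmeans(points, k=2)
# Assign new, unseen records to clusters using the learned centroids.
new_records = [(0.2, -0.1), (4.8, 5.3)]
labels = [nearest(centroids, p) for p in new_records]
print(labels)
```

The cluster indices themselves are arbitrary (which cluster is numbered 0 depends on initialisation); what the model learns is the grouping, not a named outcome.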
Summary of Chapters
1. Introduction: This chapter contextualizes the global burden of diabetes and prediabetes, highlighting the critical need for early diagnosis and the role of machine learning in developing predictive diagnostic tools.
2. Materials and methods: This section details the systematic review process, including search strategy, inclusion criteria (articles published 2018-2023), and the specific methodology used for data extraction and analysis of the selected studies.
3. Results: This chapter provides a comparative overview of four key research articles, summarizing their respective methodologies, the algorithms they used (such as Random Forest and XGBoost), and their predictive accuracy.
4. Discussion: This section evaluates the findings, emphasizing the importance of focusing on accessible risk factors and the need for standardized, globally generalizable screening tools in primary healthcare.
Keywords
Prediabetes, Machine Learning, Risk Prediction, Diabetes Mellitus, Systematic Review, PRISMA, AUC-ROC, Feature Selection, Clinical Diagnostics, Data Science, Supervised Learning, Predictive Models, Healthcare Screening
Frequently Asked Questions
What is the primary focus of this research paper?
The paper provides a systematic review of machine learning models currently used to identify risk factors and predict prediabetes in clinical settings.
Which fields are covered as central themes?
The central themes include machine learning methodology, clinical diagnostic criteria for prediabetes, risk factor identification, and systematic review protocols.
What is the overarching research goal?
The goal is to determine how effective various machine learning models are in detecting prediabetes, with the ultimate aim of promoting early diagnosis and better disease management.
Which methodological approach was employed?
The study followed the PRISMA systematic review protocol to search, identify, and filter relevant observational and interventional articles published between 2018 and 2023.
What does the main body of the work address?
The main body examines individual studies, analyzes specific machine learning algorithms like Support Vector Machines and Random Forest, and discusses the importance of modifiable risk factor selection.
Which criteria define the effectiveness of the models in this study?
The study measures model effectiveness primarily through the Area Under the Receiver-Operating Characteristic curve (AUC-ROC), with additional considerations for F1-score and sensitivity.
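For readers unfamiliar with these metrics, the sketch below computes sensitivity, F1-score, and AUC-ROC from first principles. It is a minimal illustration, not the evaluation code of any reviewed study: the labels and scores are invented, the 0.5 threshold is an assumption, and the AUC is computed with the rank-based (Mann-Whitney) formulation, i.e. the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The functions assume at least one positive and one negative case.

```python
def confusion_counts(y_true, y_pred):
    # True positives, false positives, and false negatives for binary labels.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn

def sensitivity(y_true, y_pred):
    # Share of actual positives that the model flags: TP / (TP + FN).
    tp, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)

def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    tp, fp, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)

def auc_roc(y_true, scores):
    # Probability that a random positive outscores a random negative
    # (ties count as one half); equivalent to the area under the ROC curve.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = prediabetes present) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(sensitivity(y_true, y_pred), f1_score(y_true, y_pred), auc_roc(y_true, scores))
# prints: 0.75 0.75 0.9375
```

Note that sensitivity and F1 depend on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is one reason it is often used as the primary comparison metric.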
How does the study view the role of feature selection?
Feature selection is presented as a crucial step to remove multicollinearity and focus on easily accessible patient data, though the authors note that results vary depending on the specific algorithm used.
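One simple way to reduce multicollinearity before model training is a correlation filter: keep features in order and drop any feature that is highly correlated with one already kept. The sketch below assumes a Pearson-correlation filter with an arbitrary threshold of 0.9; this is only one of several possible approaches (variance inflation factors or model-based selection are common alternatives) and is not presented as the method used in the reviewed studies. The feature names and values are hypothetical.

```python
import math

def pearson(a, b):
    # Pearson correlation coefficient between two equal-length columns.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def drop_correlated(features, threshold=0.9):
    # Greedy filter: walk the features in their given order and keep a
    # feature only if its absolute correlation with every already-kept
    # feature is at or below the threshold.
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns, invented for illustration.
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_b": [2.1, 4.0, 6.2, 7.9, 10.1],  # almost exactly 2 * feature_a
    "feature_c": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(drop_correlated(features))
# prints: ['feature_a', 'feature_c']
```

Because the filter is greedy, the result depends on feature order; placing the most easily accessible clinical variables first would favour keeping them, which matches the emphasis on accessible patient data.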
What conclusion do the authors draw regarding global health?
The authors conclude that existing models are promising but need to be generalized so that screening tools can be applied to diverse populations regardless of region or ethnicity.
- Cite this work
- Amos Olwendo (Author), 2025, A survey of Machine Learning Models for Prediabetes Screening, München, GRIN Verlag, https://www.grin.com/document/1567635