Making heart diseases detectable. The invention of an algorithm for systematically predictions

Research Paper (postgraduate), 2020

15 Pages, Grade: 3


Table of contents



Heart disease

Literature review

Proposed Algorithm
Data set
Heart UCI Dataset

Used Prediction models
Naive Bayes
K nearest neighbor
Decision tree
Random forest
Experimentation Results




Heart disease is becoming more common worldwide, and a large amount of data related to it is now being collected. With data mining, this important data can be evaluated and used to predict and detect coronary artery disease and other heart-related problems before a fatal event occurs. In this research paper we conduct an experimental analysis to find an improved method for predicting heart disease, so that effective steps can be taken to anticipate and treat avoidable fatal heart problems. We create an algorithm that detects the disease on the basis of a set of parameters and gives information that is as accurate as possible. With this method, the risk of suffering from heart disease can be predicted systematically. The main features used for detection include age, gender, maximum heart rate and exercise-induced angina.

Keywords: Machine learning, data mining, supervised learning algorithms, Heart disease prediction.


Many life-threatening diseases exist, but heart disease is among the most extensively studied in medical research. Early diagnosis of the disease is a very difficult task. We want to introduce an automated way of predicting heart disease in individuals. This solution is not a be-all and end-all solution, but it can serve as a complementary diagnostic aid in medical research. The main task in dealing with heart disease is to detect it early and treat it efficiently before any fatal event occurs.

Several techniques are used to detect heart problems, for example the electrocardiogram (ECG, also abbreviated EKG) and echocardiography (echo) monitoring. Several factors play a major role in raising the risk of the disease: obesity, lack of exercise, high blood pressure and high cholesterol all increase the risk of heart problems.

As digital technology grows rapidly, a large amount of data is becoming available throughout the world. A poor diagnosis can cause serious harm to the patient, so detecting the disease in time is one of the major goals of our system. Around 17 million deaths occur each year due to heart disease and strokes. Mental and physical stress also contribute to the increase in heart disease. Diagnosis, however, is expensive for the average person, even in developed countries.

Many people cannot afford the treatment or the diagnostic tests. We therefore create an automated system that uses the provided historical data to help diagnose heart disease for everyone. The system is intended mainly as a support system rather than as a full-time diagnostic system. It will help doctors detect the disease in time and predict heart problems in younger people, so that effective steps can be taken to address them.

Heart disease

The heart is an organ that works as a pump, supplying blood to all other parts of the body. The human body depends on the circulation of blood from the heart, so life depends on the heart working efficiently.

Many factors increase the chances of heart disease:

1) Smoking
2) Family history
3) Obesity
4) High blood pressure
5) Physical inactivity

When the disease is detected with machine learning algorithms, these parameters are usually taken into account, since they are important factors in increasing a person's chances of developing heart disease.

Literature review

Heart disease (HD) is one of the most common diseases today, owing to a number of contributing factors such as high stress levels, diabetes, cholesterol fluctuation, exhaustion and many others. Early diagnosis of the disease has been investigated for many years, and many data analytics tools have been applied to help health care providers identify some of the early signs of HD. Several tests are often performed on potential patients so that additional precautionary measures can be taken to reduce the effect of the disease [1], and reliable techniques for predicting the early stages of HD, like those proposed in this paper, are an important part of saving lives. A number of Machine Learning (ML) algorithms, such as Naïve Bayes, Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), AdaBoost, JRip and the J48 decision tree, have been applied to the classification and prediction of HD datasets, and many promising results have been reported in the literature [2]. Because of the complex nature of HD, the recommended tests must be prioritized [3] and the proposed methods need to be chosen carefully; there the authors worked on accurately and efficiently predicting heart-related hospitalizations based on the available patient-specific medical history. Five machine learning algorithms were used, including SVM, AdaBoost, logistic regression and a naïve Bayes event classifier, and the results were consistent across all classifiers, with an achievable detection rate of 82%.
The authors in [4] proposed an algorithm to predict the presence of heart disease using a back-propagation Multilayer Perceptron (MLP) artificial neural network on a given HD dataset. ML algorithms, in particular neural networks, were employed for the prediction of HD cases in [5], where the authors proposed to develop an application that can predict vulnerability to cardiovascular disease given basic signs such as age, sex and pulse rate; neural networks proved to be the most accurate and reliable algorithm for the proposed system. A data mining model was developed in [6] using a Random Forest classifier to improve prediction accuracy and to investigate various events related to cardiovascular disease, and experimental results showed that the Random Forest classification algorithm can be used successfully to predict the events and risk factors related to HD.

Proposed Algorithm

In our system the main task is to predict, on the basis of the available historical data, whether or not a person has heart disease. Our architecture follows a top-down approach. First, the heart disease dataset is downloaded. Then the missing values in the dataset are identified and removed. After preprocessing, the target is encoded in binary form: the result is 0 when no heart disease is detected and 1 when heart disease is present.
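The steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`drop_missing`, `split_features_target`, `train_test_split`), the 80/20 split ratio, the use of `None` to mark a missing value and the toy records are all hypothetical and are not taken from the paper's implementation.

```python
import random


def drop_missing(rows):
    """Keep only records in which every field has a value (None marks a gap)."""
    return [row for row in rows if all(value is not None for value in row)]


def split_features_target(rows):
    """Separate the feature columns from the final 0/1 target column."""
    features = [row[:-1] for row in rows]
    target = [row[-1] for row in rows]
    return features, target


def train_test_split(features, target, test_ratio=0.2, seed=0):
    """Shuffle the record indices and hold out a share of them for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(features, train_idx), pick(features, test_idx),
            pick(target, train_idx), pick(target, test_idx))


# Hypothetical toy records: (age, max heart rate, exercise-induced angina, target)
records = [
    (63, 150, 0, 1),
    (67, None, 1, 0),  # missing max heart rate, so this record is dropped
    (41, 172, 0, 1),
    (56, 120, 1, 0),
    (48, 168, 0, 1),
    (60, 132, 1, 0),
]

clean = drop_missing(records)
X, y = split_features_target(clean)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_ratio=0.2)
```

Fixing the random seed keeps the split reproducible, which makes it easier to compare several classifiers on exactly the same held-out records.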

[Figure not included in this excerpt]

Figure 1. Proposed model (created by the author)


Heart disease prediction is essentially a classification problem, although it can also be approached as a clustering problem. We performed dimensionality reduction on the dataset so that only the more informative data is used to make better and more accurate predictions. We apply four machine learning algorithms:

1) Naïve Bayes
2) K-nearest neighbor
3) Decision Tree
4) Random forest
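Two of the four models are compact enough to sketch in plain Python. The block below is a minimal illustration of a K-nearest neighbor vote and a Gaussian Naïve Bayes classifier, not the implementation used in the experiments. The function names, the choice of k = 3, the Gaussian likelihood and the toy training points (two already scaled features) are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(zip(train_X, train_y),
                         key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def fit_gaussian_nb(train_X, train_y):
    """Estimate a prior plus a per-feature mean and variance for each class."""
    model = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # small constant avoids zero variance
        model[label] = (len(rows) / len(train_y), stats)
    return model


def gaussian_nb_predict(model, query):
    """Return the class with the highest log-posterior under the fitted model."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(query, stats):
            score += (-0.5 * math.log(2 * math.pi * var)
                      - (value - mean) ** 2 / (2 * var))
        return score
    return max(model, key=log_posterior)


# Hypothetical scaled training points; 1 = heart disease, 0 = no heart disease
train_X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
           (-1.0, -0.9), (-0.8, -1.1), (-1.2, -1.0)]
train_y = [1, 1, 1, 0, 0, 0]

model = fit_gaussian_nb(train_X, train_y)
knn_label = knn_predict(train_X, train_y, (0.9, 1.0))
nb_label = gaussian_nb_predict(model, (0.9, 1.0))
```

Both classifiers work on distances or variances of the raw feature values, so they are sensitive to feature scale. This is one reason the features are scaled before the models are trained.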

Data set

To remove bias and skewness from the data, the dataset needs to be preprocessed; preprocessing is necessary for the data to be used effectively. The dataset is divided into a training set and a testing set so that predictions can be evaluated on unseen records. Preprocessing mainly deals with missing values: any record with a missing value is removed from the dataset. The standard scaler and the robust scaler are then applied to bring the features onto comparable scales. The standard scaler rescales each feature to zero mean and unit variance, while the robust scaler centers each feature on its median and divides by the interquartile range, which makes it less sensitive to outliers.
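The two scaling steps can be written out with the Python standard library. This is a sketch of the usual definitions (mean and population standard deviation for standard scaling, median and interquartile range for robust scaling); the function names and the sample heart rates are hypothetical.

```python
import statistics


def standard_scale(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# Hypothetical maximum heart rates; 95 is an outlier
heart_rates = [150, 160, 155, 170, 165, 95]

standardized = standard_scale(heart_rates)
robust = robust_scale(heart_rates)
```

Neither function guards against a constant column, where the standard deviation or the interquartile range is zero; a real pipeline would need to handle that case.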

[Figure not included in this excerpt]

Figure 2. Dataset division (created by the author)

The target variable takes the values 0 and 1. There are 165 records labelled 1, which represent people with heart disease, and 138 records labelled 0, which represent people without heart disease.
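The class counts above give the size of the dataset and a simple reference point for the classifiers. The short calculation below uses only the two stated counts; the variable names are illustrative.

```python
class_counts = {1: 165, 0: 138}  # class counts stated in the text

total = sum(class_counts.values())
share = {label: count / total for label, count in class_counts.items()}

# A classifier that always predicts the majority class reaches this accuracy.
majority_label = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_label] / total

print(f"records: {total}")
print(f"with heart disease: {share[1]:.1%}, without: {share[0]:.1%}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.1%}")
```

The two classes add up to 303 records and are fairly balanced, so always predicting class 1 would already be right about 54.5% of the time. Any trained model should be judged against that baseline.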

Heart UCI Dataset

The UCI Heart Disease dataset, and in particular its Cleveland subset, is widely used in machine learning research. The dataset contains 76 attributes, of which 14 are typically used for diagnosis.

- Feature Extraction

In feature extraction, new features are derived from the original set of features. Principal component analysis (PCA) is used to extract a reduced set of features from the dataset. Among linear projections onto the same number of dimensions, PCA gives the best reconstruction of the data in the least-squares sense.
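To make the idea concrete, the sketch below computes only the first principal component, using power iteration on the covariance matrix, and then projects and reconstructs the data. It is a simplified stand-in for a full PCA routine: the function names, the iteration count and the toy data are assumptions, and the fixed starting vector would fail in the rare case where it is exactly orthogonal to the leading component.

```python
import math


def first_principal_component(rows, iterations=200):
    """Approximate the leading eigenvector of the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered data
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    vector = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[a][b] * vector[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in product))
        vector = [v / norm for v in product]
    return means, vector


def project(rows, means, component):
    """Reduce each record to its coordinate along the component."""
    return [sum((row[j] - means[j]) * component[j] for j in range(len(means)))
            for row in rows]


def reconstruct(scores, means, component):
    """Map one-dimensional scores back into the original feature space."""
    return [[means[j] + s * component[j] for j in range(len(means))]
            for s in scores]


# Hypothetical data in which the second feature is twice the first
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

means, component = first_principal_component(rows)
scores = project(rows, means, component)
restored = reconstruct(scores, means, component)
```

Because the toy records lie exactly on one line, a single component reconstructs them without loss. On real data the reconstruction error shows how much information the discarded components carried.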


Cite this paper
Daniyal Baig (Author), 2020, Making heart diseases detectable. The invention of an algorithm for systematically predictions, Munich, GRIN Verlag,

