The objective is to use deep learning networks to learn a representation of the ECG signal that captures its most important features. A decision also has to be made about the duration of the ECG segments. The use of one-second segments is motivated by two considerations:
· To reduce computational complexity, it is important to keep the size of the data as small as possible.
· If the ECG segments are too short, there is a risk that the network cannot learn a typical heartbeat representation. Since one heartbeat lasts at most one second at heart rates of 60 beats per minute or more, one second almost guarantees that a segment contains the important elements of an ECG signal, such as the QRS complex (Figure
This work could contribute to the medical field. Evaluating ECGs is an important task that is mostly performed visually by doctors themselves. This matters because ischaemic heart disease (reduced blood supply to the heart muscle, usually caused by narrowed coronary arteries) remains the leading cause of death worldwide.
Can removing the human factor and using neural networks improve the way heart health is diagnosed and make it more efficient? This work steers towards this potential goal and tries to find out how well suited neural networks are to processing cardiac signals. A possible additional classification task could make use of the learned features and enable diagnoses based purely on artificial intelligence.
State of the Art
Over the last decades, a large number of computational techniques have been proposed as an alternative to manual evaluation, enabling automated ECG classification. The most important component for achieving an efficient classification is the extraction and selection of important features of the ECG signals. The many algorithms proposed over these years can be sorted into three main categories: time domain, frequency domain and time-frequency domain. Modern techniques work in the time-frequency domain and try to combine the advantages of the classical methods.
However, the existing algorithms and methods are limited to specific heart diseases and
they still leave room for improvement.
This project tries to address the following research questions:
· What can an artificial neural network learn from an ECG signal?
· Which type of autoencoder is best suited for learning an ECG representation and what
is the motivation behind using this type?
· After having trained the network with a "clean" data set, what is the outcome when
feeding "noisy" data to it?
· Is it possible to predict an ECG lead based on training an autoencoder on a different lead?
Electrocardiography (ECG) is a simple, painless and noninvasive medical procedure that measures the electrical activity of the heart over a period of time by placing a number of metal electrodes on the skin at specific locations. These electrodes, which are connected to a device called an electrocardiograph, detect the tiny electrical impulses that the heart produces to trigger its contraction.
Figure 2: ECG signal showing P wave, QRS complex and T wave
Although the resting ECG, which is recorded while the patient is lying down, is the most common form of ECG, other types exist. In some cases the recording requires the patient to exert physical effort, which is why running on a treadmill or riding a stationary bicycle are typical settings for an exercise ECG.
Finally, the 24-hour (Holter) ECG monitors the heart throughout the day. The patient wears a small electrocardiograph, and the recording is evaluated once the device is returned to the doctor. 24-hour ECG data forms the basis of this work.
MIT-BIH Arrhythmia Database
The MIT-BIH Arrhythmia Database contains 48 two-channel ambulatory records, each lasting about 30 minutes. They were collected between 1975 and 1979 at Boston's Beth Israel Hospital from 25 men aged 32 to 89 and 22 women aged 23 to 89. Two of the recordings belong to the same person. Twenty-three recordings were selected randomly from a set of 4000 24-hour ambulatory recordings obtained from inpatients (about 60%) and outpatients (about 40%). The remaining recordings were selected from the same set to represent less common but clinically significant types of arrhythmia.
The recordings were digitized at 360 Hz (samples per second per channel) with 11-bit resolution over a 10 mV range. Each record was annotated beat by beat by two or more cardiologists working independently. The resulting annotations, approximately 110,000 in total, are included in the dataset.
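As a quick check of what these specifications imply, the voltage step per digitization level and the amount of data per record can be worked out directly. The 30-minute duration is rounded, and the window sizes correspond to the sampling frequencies used later in this work.

```latex
\begin{aligned}
\Delta V &= \frac{10\ \mathrm{mV}}{2^{11}} = \frac{10\ \mathrm{mV}}{2048} \approx 4.88\ \mu\mathrm{V} \\
N_{\text{record}} &\approx 360\ \mathrm{Hz} \times (30 \times 60)\ \mathrm{s} = 648\,000\ \text{samples per channel} \\
N_{\text{window}} &= f_s \times 1\ \mathrm{s} \in \{250, 300, 360\}\ \text{samples}
\end{aligned}
```

A 30-minute record therefore yields about 1800 non-overlapping one-second windows per channel, regardless of the sampling frequency.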
Figure 3: Signal from MIT-BIH Arrhythmia Database
For the purposes of this project, the MLII lead of the MIT-BIH Arrhythmia Database signals is pre-processed in three steps:
· Resampling the signals to a selected frequency, in our case 250 or 300 Hz. The idea is to use the lowest commonly used sampling frequencies that still preserve enough detail.
· Normalizing the data to values between 0 and 1. This is required before the data can be fed to the autoencoder, because it matches how the neural networks operate.
· Windowing, meaning splitting the long recordings into 1-second parts. This is done without aligning the start of each window to a specific element of the ECG signal (for instance, the QRS complex). Not aligning the windows leads to a greater variety of 1-second parts.
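The three pre-processing steps can be sketched in Python with NumPy. This is a minimal sketch and not the pipeline used in this work. Linear interpolation stands in for a proper anti-aliasing resampler, and min-max scaling over the whole record is an assumed choice of normalization. All function names are hypothetical.

```python
import numpy as np


def resample_linear(signal, fs_in, fs_out):
    """Resample a 1-D signal from fs_in to fs_out Hz by linear interpolation.

    Simplification: no anti-aliasing filter is applied, unlike
    polyphase resamplers such as scipy.signal.resample_poly.
    """
    signal = np.asarray(signal, dtype=float)
    n_out = int(round(len(signal) * fs_out / fs_in))
    t_in = np.arange(len(signal)) / fs_in
    t_out = np.arange(n_out) / fs_out
    return np.interp(t_out, t_in, signal)


def minmax_normalize(signal):
    """Scale a signal linearly into [0, 1] (assumed: min-max over the whole record)."""
    lo, hi = signal.min(), signal.max()
    if hi == lo:
        return np.zeros_like(signal)
    return (signal - lo) / (hi - lo)


def split_windows(signal, fs, seconds=1.0):
    """Cut consecutive, non-overlapping windows starting at sample 0.

    The windows are not aligned to QRS complexes; a trailing partial
    window is discarded.
    """
    size = int(round(fs * seconds))
    n_windows = len(signal) // size
    return signal[: n_windows * size].reshape(n_windows, size)


def preprocess(mlii, fs_in=360, fs_out=250):
    """Resample -> normalize -> window, in the order listed above."""
    resampled = resample_linear(mlii, fs_in, fs_out)
    return split_windows(minmax_normalize(resampled), fs_out)


# Usage with a synthetic 10-second stand-in for an MLII record at 360 Hz.
t = np.arange(3600) / 360
fake_mlii = np.sin(2 * np.pi * 1.2 * t)
windows = preprocess(fake_mlii)
print(windows.shape)  # (10, 250)
```

Normalizing before windowing keeps the relative amplitudes of the windows comparable. Normalizing each window separately would be an equally plausible reading of the step above.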
An autoencoder is a neural network that tries to reconstruct its input (the input and output layers have the same number of nodes). It is an unsupervised learning algorithm trained with backpropagation. Overall, the network can be viewed as consisting of two parts: an encoder function and a decoder that produces a reconstruction (Figure 4). This makes autoencoders a natural tool for representation learning, which is exactly what this work tries to achieve. By encoding and reconstructing cardiac signals, the objective is to investigate what is learned at the level of the different layers of the network.
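To make the encoder/decoder split and the use of backpropagation concrete, here is a minimal fully connected autoencoder in plain NumPy. It is a sketch under simple assumptions (one sigmoid hidden layer, mean squared error, full-batch gradient descent) and not the architecture used in this work. The function name and hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, lr=1.0, epochs=500, seed=0):
    """Train a one-hidden-layer autoencoder on X (n_samples, n_inputs), values in [0, 1].

    Encoder: H = sigmoid(X W1 + b1); decoder: Y = sigmoid(H W2 + b2).
    The loss is the mean squared reconstruction error, minimized by
    full-batch gradient descent with hand-written backpropagation.
    """
    rng = np.random.default_rng(seed)
    n_inputs = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_inputs, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
    b2 = np.zeros(n_inputs)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode, then decode.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        err = Y - X
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        dZ2 = (2.0 / X.size) * err * Y * (1.0 - Y)
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        # Gradient-descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


rng = np.random.default_rng(1)
X = rng.random((20, 50))  # 20 hypothetical 50-sample windows scaled to [0, 1]
params, losses = train_autoencoder(X, n_hidden=10)
print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The sigmoid output layer is one reason why the inputs are normalized to the range 0 to 1 during pre-processing.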
Figure 4: General architecture of an autoencoder
A traditional, fully connected autoencoder does not take into account that a signal can be seen as a sum of other signals. A convolutional autoencoder, in contrast, uses convolutional filters that slide along the input and filter out parts of it, responding to the same local pattern wherever it occurs. This allows this type of autoencoder to exploit the fact that a signal is a sum of other signals.
Figure 5 shows the architecture of the implemented convolutional autoencoder. The model has several parameters, of which the two most important are the length of the convolutional filters and the number of convolutional filters in each layer. In terms of computing time, the longer the filters and/or the more filters there are, the longer it takes to compute the result.
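The relationship between filter settings and computing time can be made concrete with a naive 1-D convolutional layer in NumPy. This is an illustrative sketch and not the implementation behind Figure 5. It counts the multiply-accumulate operations (MACs) per layer. These grow linearly with both the filter length and the number of filters. When consecutive layers use the same number of filters, they grow quadratically with the filter count, because one layer's filters become the next layer's input channels. The function names are hypothetical.

```python
import numpy as np


def conv1d_same(x, filters):
    """1-D convolutional layer (cross-correlation, as in deep-learning libraries).

    x:       input of shape (c_in, length)
    filters: weights of shape (c_out, c_in, kernel)
    Stride 1 with zero padding so the output keeps the input length.
    Returns an array of shape (c_out, length).
    """
    c_out, c_in, kernel = filters.shape
    pad_left = (kernel - 1) // 2
    pad_right = kernel - 1 - pad_left
    xp = np.pad(x, ((0, 0), (pad_left, pad_right)))
    length = x.shape[1]
    out = np.empty((c_out, length))
    for t in range(length):
        patch = xp[:, t:t + kernel]  # (c_in, kernel) slice under the filter
        # Each output channel sums filter * patch over input channels and taps.
        out[:, t] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    return out


def conv1d_macs(length, kernel, c_in, c_out):
    """Multiply-accumulate operations performed by conv1d_same."""
    return length * kernel * c_in * c_out


x = np.ones((1, 250))  # one 1-second window at 250 Hz
feature_maps = conv1d_same(x, np.ones((32, 1, 5)))
print(feature_maps.shape)           # (32, 250)
print(conv1d_macs(250, 5, 32, 32))    # 1280000
print(conv1d_macs(250, 100, 32, 32))  # 25600000
```

For a layer with 32 input and 32 output channels on a 250-sample window, moving from a filter length of 5 to 100 multiplies the work by 20.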
Figure 5: Model of the convolutional autoencoder
Stacked Sparse Autoencoder
The second implementation is a stacked sparse autoencoder with 3 layers. A stacked sparse autoencoder is a neural network that consists of multiple layers of sparse autoencoders, in which the outputs of each layer are wired to the inputs of the next layer. Figure 6 shows the general model of the autoencoder.
Figure 6: Model of the stacked sparse autoencoder
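The wiring of a stacked autoencoder can be sketched as a chain of encoder layers, where each layer's output feeds the next. This is a minimal NumPy sketch assuming sigmoid encoders that have already been trained. The layer sizes 250, 200, 150 and 100 are hypothetical, except that the final layer of 100 nodes matches the model described later, and the weights here are random placeholders.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def stacked_encode(x, encoders):
    """Pass x through a list of (W, b) encoder layers in order.

    Each layer's output becomes the next layer's input, as in a stacked
    autoencoder built from individually trained sparse autoencoders.
    """
    h = np.asarray(x, dtype=float)
    for W, b in encoders:
        h = sigmoid(h @ W + b)
    return h


# Hypothetical layer sizes: 250 inputs -> 200 -> 150 -> 100 features.
rng = np.random.default_rng(0)
sizes = [250, 200, 150, 100]
encoders = [(rng.normal(0.0, 0.05, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]
codes = stacked_encode(rng.random((5, 250)), encoders)
print(codes.shape)  # (5, 100)
```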
The idea behind a sparse autoencoder is that the number of nodes in the hidden layers can even be greater than the number of nodes in the input layer. An important factor is the sparsity proportion, which sets the desired average activation of each hidden node over the training data and thus controls how often each node is active. Informally, a neuron can be thought of as active if its output value is close to 1 and inactive if its output value is close to 0. We want to constrain the neurons to be inactive most of the time, which is achieved by setting a low sparsity proportion. This gives an advantage over a simple autoencoder with fewer nodes that are always active: more nodes in the hidden layer means more features can potentially be extracted.
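A common way to enforce the sparsity proportion is a Kullback-Leibler (KL) divergence penalty between the target proportion and each hidden node's average activation, which is added to the reconstruction error. MATLAB's trainAutoencoder, for instance, exposes SparsityProportion and SparsityRegularization parameters for this purpose. The sketch below assumes that formulation. The function names and the weight beta are hypothetical.

```python
import numpy as np


def kl_sparsity_penalty(H, rho=0.1, eps=1e-12):
    """KL-divergence sparsity penalty for hidden activations H (n_samples, n_hidden).

    rho is the target sparsity proportion; rho_hat is each node's average
    activation over the batch. The penalty is zero when every rho_hat equals
    rho and grows as nodes are active more often (or less often) than that.
    """
    rho_hat = np.clip(np.mean(H, axis=0), eps, 1.0 - eps)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))))


def sparse_loss(X, Y, H, rho=0.1, beta=1.0):
    """Mean squared reconstruction error plus beta times the sparsity penalty."""
    return float(np.mean((Y - X) ** 2)) + beta * kl_sparsity_penalty(H, rho)


H_sparse = np.full((8, 100), 0.1)  # every node active 10% of the time on average
H_dense = np.full((8, 100), 0.5)   # every node half active
print(kl_sparsity_penalty(H_sparse), kl_sparsity_penalty(H_dense))
```

With a sparsity proportion of 0.1, as used in the experiments, hidden nodes whose average activation drifts toward 0.5 are penalized heavily. This is what keeps most nodes inactive for any given input.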
The experiments with the convolutional autoencoder mainly focused on reconstructing ECG
data and extracting the features learned by the autoencoder.
Reconstructing ECG data
The convolutional autoencoder that gives the best results in terms of reconstruction accuracy uses a convolutional filter length between 5 and 15 and between 25 and 35 convolutional filters per layer. These values were found by experimentation and are used in the architecture described in the convolutional autoencoder section. Figure 7 shows some reconstructions produced by the convolutional autoencoder.
Figure 7: Original signal (blue) - Reconstructed signal (orange)
Extracting the learned features
The features learned by the autoencoder depend on the convolutional filter length and the number of filters used. The images below use convolutional filter lengths of 5 and 100, each with 32 filters. Layers 1 and 7 are displayed, because the first layer contains the first filters of the entire model, while the 7th layer is the most reduced and filtered stage. Interestingly, shapes that are present in the ECG can be found in the first two images (filter length 5), and in the last two images (filter length 100) the QRS complex is clearly recognized.
Layer 1, convolutional filter length = 5, number of filters = 32
Layer 7, convolutional filter length = 5, number of filters = 32
Layer 1, convolutional filter length = 100, number of filters = 32
Layer 7, convolutional filter length = 100, number of filters = 32
Stacked Sparse Autoencoder
The experiments with the stacked sparse autoencoder focus on the reconstruction of the ECG
data, the extraction of the features learned by the autoencoder and the connection between
the original signal and its features.
Reconstructing ECG data - SAE
The stacked sparse autoencoder that seems to give the best results in terms of accuracy has 100 nodes in the final layer and a sparsity proportion of 0.1. For this experiment, the autoencoder is trained with 5000 signals from the MIT-BIH Arrhythmia Database, each 1 second long and sampled at 250 Hz. Figure 8 shows an example of a signal reconstructed from its extracted features.
Figure 8: Original signal (blue) - Reconstructed signal (orange)
Extracting the learned features - SAE
A stacked autoencoder allows the features learned by the nodes in the hidden layer to be visualized. Our model, with 100 nodes in the final layer, extracts 100 features from the ECG signals. Figure 9 shows a representative set of 9 of these 100 features.
Figure 9: A set of extracted features from the stacked autoencoder
Connection between ECG signals and features
The greatest challenge is to find out whether common patterns can be detected between the original ECG signal and its main features as extracted by the autoencoder. For this experiment, we extract the weight map and identify the nodes with the highest weights. We then compare the original signal with the features of those nodes. Note that each signal is a combination of multiple features, so a perfect match between the signal and a single feature cannot be expected. In some cases, however, a common pattern between a feature and the original signal is noticeable.
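The comparison step can be sketched in NumPy under explicit assumptions. Here, "nodes with the highest weights" is read as the hidden nodes most strongly activated by the given signal. A node's "feature" is taken to be its normalized encoder weight vector, which for a sigmoid node is the norm-bounded input that activates it most. Similarity is measured with the Pearson correlation. All three choices are assumptions, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def feature_patterns(W1):
    """One unit-norm pattern per hidden node (rows), from encoder weights W1 (n_inputs, n_hidden)."""
    return (W1 / np.linalg.norm(W1, axis=0, keepdims=True)).T


def strongest_features(x, W1, b1, k=1):
    """Indices of the k hidden nodes most activated by the 1-D signal x, strongest first."""
    h = sigmoid(x @ W1 + b1)
    return np.argsort(h)[::-1][:k]


def pattern_similarity(x, pattern):
    """Pearson correlation between a signal and a feature pattern (1 = same shape)."""
    return float(np.corrcoef(x, pattern)[0, 1])


# Usage with hypothetical weights in which node 3 has learned a sine-shaped pattern.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 2 * np.pi, 50))
W1 = rng.normal(0.0, 0.01, (50, 10))
W1[:, 3] = 5.0 * x
b1 = np.zeros(10)
best = strongest_features(x, W1, b1, k=1)[0]
print(best, pattern_similarity(x, feature_patterns(W1)[best]))  # node 3, correlation close to 1.0
```

Because a real signal is a weighted combination of many features, correlations with real ECG windows would be well below 1, which matches the observation above.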
Figures 10 and 11 illustrate two examples. In the first case, the common pattern between the signal and its feature with the greatest weight is easy to notice. In the second case, a pattern is not easy to see, due to the abstract nature of the extracted features.
As a general comment, the healthier the signal, the clearer the structure of its features.
Figure 10: Healthy signal (left) and its feature with the highest weight (right).
Figure 11: Unhealthy signal (left) and its feature with the highest weight (right).
An interesting question is whether an ECG lead can be predicted from another lead. Sometimes it is not possible to take measurements for all leads, so the MLII lead is used as an alternative source. In this experiment we try to predict lead V1 from the corresponding MLII measurements. For this experiment we used 100 signals from the MIT-BIH Arrhythmia Database with a duration of 1 second, a sampling frequency of 250 Hz and scale
The lead prediction problem can be treated as a fitting (regression) problem, so the neural network we used is a fitnet, MATLAB's feedforward network for function fitting, which has a structure similar to that of an autoencoder. The most important difference is that in a fitnet the target output differs from the input. To estimate the performance of the network, we used the standard performance function, the mean squared error (MSE). We trained the fitnet with six different training algorithms:
2. Bayesian regularization
3. BFGS Quasi-Newton
4. Resilient Backpropagation
5. Scaled Conjugate Gradient
6. Gradient Descent
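fitnet itself is a MATLAB function. As a self-contained stand-in, the sketch below fits a linear ridge-regression mapping from MLII windows to V1 windows in NumPy and scores it with the same mean squared error. This substitutes a linear model for the feedforward network, so it only illustrates the fitting setup, where input and target differ. It does not reproduce the results reported here. The function names, the ridge parameter and the synthetic data are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, the performance function used for the lead fit."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def _with_bias(X):
    """Append a constant column so the mapping can learn an offset."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_lead_mapping(X_mlii, Y_v1, ridge=1e-3):
    """Least-squares weights mapping MLII windows (rows) to V1 windows (rows).

    Unlike an autoencoder, the target Y_v1 differs from the input X_mlii.
    The small ridge term keeps the normal equations well conditioned.
    """
    Xb = _with_bias(np.asarray(X_mlii, float))
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ np.asarray(Y_v1, float))


def predict_lead(X_mlii, W):
    return _with_bias(np.asarray(X_mlii, float)) @ W


rng = np.random.default_rng(0)
X = rng.random((200, 20))       # hypothetical MLII windows (20 samples for brevity)
M = rng.normal(size=(20, 20))
Y = X @ M + 0.5                 # hypothetical V1 windows, linearly related to X
W = fit_lead_mapping(X, Y)
print(mse(Y, predict_lead(X, W)))
```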
The table below lists the average performance of the different algorithms over 10 repetitions (MSE measured in mV²).
Table: Average MSE per training algorithm over 10 repetitions
The results show that the Bayesian regularization training algorithm gives the best network performance in estimating the target values. Figure 12 shows an example of a predicted V1 signal compared with the original V1 signal.
Figure 12: Lead prediction
One of the research questions is how the system reacts to noisy input data when it has been trained only on "clean" cardiac signals. For this purpose, signals from the MIT-BIH Arrhythmia Database are mixed with noise signals of different frequencies using the WFDB Toolbox for Matlab provided by PhysioNet.
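In the experiment, the WFDB Toolbox handles the mixing. The sketch below only illustrates the underlying idea of adding noise at a chosen signal-to-noise ratio (SNR) in NumPy. The noise components (0.3 Hz baseline wander and 50 Hz mains hum), the function name and the mean-power definition of SNR are assumptions. The WFDB noise tools may define SNR differently.

```python
import numpy as np


def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that 10*log10(P_clean / P_noise) equals snr_db.

    Power is the mean squared amplitude; noise must be at least as long as clean.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


# Hypothetical noise of different frequencies mixed into a 1-second window at 250 Hz.
fs = 250
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 1.2 * t)  # stand-in for a clean ECG window
noise = np.sin(2 * np.pi * 0.3 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
noisy = add_noise_at_snr(clean, noise, snr_db=6)
```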
The experiments are conducted with the convolutional autoencoder under the same conditions as the experiments with the clean data, and the noise is not removed during pre-processing. As Figure 13 shows, the autoencoder seems to identify most of the peaks and tries to reconstruct them, but without success. The reconstruction is not representative of the original: the reconstructed signal has a much lower amplitude than the original signal, is almost a straight line in some cases, and lags the original peaks by a few milliseconds.
Figure 13: Original signal (blue) - Reconstructed signal (orange)
Both autoencoders are able to reconstruct the input ECG signal. While the convolutional autoencoder is more versatile and has a very low computing time, the stacked autoencoder gives the most accurate results.
Both autoencoders are able to extract features from the ECG signal. Feature extraction works better with the convolutional autoencoder, because its filter length can be adjusted. With a short filter length, the features clearly contain shapes of the signal, whereas with a long filter length the QRS complexes are highlighted as peaks.
If a truly accurate reconstruction of the ECG signal is desired, the stacked autoencoder may be the better choice. For visualizing the learned features, however, the more versatile convolutional autoencoder is preferable.
The prediction of a lead from another lead can be considered a successful experiment: we are able to reconstruct lead V1 from MLII. This suggests that other leads could also be reconstructed if the data were available.
The reconstruction of an ECG signal from noisy data is less successful: the result is not close enough to a real-world ECG, mainly due to a lack of data. We expect that this result could also be improved by using a denoising autoencoder.
This work explores which types of neural networks can learn which aspects of ECG recordings. We examined convolutional and stacked autoencoders and found that, together with the design decisions described above (such as sampling frequency and window length), feature extraction works in both cases.
An interesting way of extending this work, outside the scope of this project, is the processing of multiple leads, eventually covering the most common ECG configuration, the 12-lead ECG. In addition, recordings containing a considerable amount of corrupted data (e.g. when an electrode loses skin contact) could be handled appropriately by detecting the noise and adapting the signal accordingly.
A major next step could be to use a convolutional neural network for a classification task. The extracted features could serve as a basis for assessing the health of the heart.
- Hendricus Bongers (Author), 2018, Classification of Cardiac Signals using Deep Learning Networks, Munich, GRIN Verlag, https://www.grin.com/document/426714