Generating Counterfactual Explanations for Electrocardiography Classification with Native Guide


Bachelor Thesis, 2021

89 Pages


Excerpt


Contents

1. Introduction
1.1. Topic and Related Work
1.2. Research Approach: Design Science
1.2.1. Aims and Objectives
1.2.2. Research Questions
1.3. Thesis Outline and Research Methods

2. Background
2.1. Time Series Data
2.2. Basic Understanding of Artificial Intelligence, Machine Learning, and Classification
2.3. Convolutional Neural Network (CNN)
2.4. Explainable Artificial Intelligence (XAI)
2.5. Counterfactual Explanations
2.6. ECG Signal Data
2.7. Openly-accessible ECG Datasets
2.7.1. ECG200
2.7.2. ECG5000
2.7.3. PTB
2.7.4. PTB-XL

3. Native Guide: A Counterfactual Explanation Technique
3.1. Reference Method
3.1.1. Learn or Load Classifier
3.1.2. Class-Activation-Map (CAM)
3.1.3. Finding the Native Guide
3.1.4. Perturbation
3.2. Investigation and Observation of the Method
3.2.1. Comparison of Classifiers
3.2.2. ECG Signal Strength and Wavelength
3.2.3. Swapped Subsequence-Length
3.2.4. Data Quantity, Length and Variety
3.2.5. Different Decision Boundaries
3.3. Experimental Approaches for Optimization
3.3.1. Normalization and Synchronization
3.3.2. Swapping Points instead of Subsequences
3.3.3. Shifted Decision Boundary

4. Evaluation: Expert Interview
4.1. Goal, Structure and Approach
4.2. Expert Background
4.3. Interview Results
4.3.1. Usage of ECG
4.3.2. ECG Data Quality
4.3.3. General Attitude towards Counterfactuals
4.3.4. Plausibility of Counterfactuals
4.3.5. Improvement Ideas
4.3.6. Possible Use-Cases

5. Discussion

6. Conclusion and Future Work

References

Appendix
A. Expert Interview Analysis and ECG Plots
B. Synchronization of ECG Plots
C. Algorithms

List of Figures

1.1. Thesis Overview

2.1. Neurons and Artificial Neural Networks

2.2. CNN for Multivariate Time Series

2.3. Human-Agent Interaction

2.4. ECG Waveform

2.5. Lead Positions and Axes

2.6. Normal vs. Abnormal ECG

3.1. FCN

3.2. ResNet

3.3. Inception

3.4. Optimal Warping Path for two Sequences ECG200

3.5. Implausible/Plausible Counterfactuals

3.6. Decision Boundary Shifting

3.7. ECG Synchronization

3.8. Perturbation Approach

List of Tables

2.1. Leads Description

2.2. ECG200

2.3. ECG5000

2.4. PTB

2.5. PTB-XL

3.1. Classifier Accuracy Results

3.2. Classifier Accuracy Results (Normalized Data)

List of Algorithms

1. Get CAMs for each T q

2. Generation of T by Perturbation

3. Normalization and Synchronization

4. Normalization norm(T)

5. Wavelength Synchronization waveSync(T1, T2)

6. Peak Alignment shift(T1, T2)

7. Generation of T by Point-for-Point Perturbation

Abstract

Explanations are essential components in the promising fields of artificial intelligence (AI) and machine learning. Deep learning approaches are on the rise because of their superior accuracy when trained with huge amounts of data. Because of their black-box nature, however, their predictions are hard to comprehend, retrace, and trust. Good explanation techniques can help to understand why a system produces a certain prediction and therefore increase trust in the model. Understanding the model is crucial for domains like healthcare, where decisions ultimately affect human life. Studies have shown that counterfactual explanations in particular tend to be more informative and psychologically effective than other methods.

This work focuses on a novel instance-based technique called “Native Guide”, which generates counterfactual explanations for time series classification. It uses nearest-neighbour samples from the real data distribution that belong to a different class as a foundation. This thesis applies the method to the explanation of electrocardiogram (ECG) classification, a very complex and vital medical field where every single ECG carries unique features. Native Guide for ECGs is explained, examined and expanded by providing the necessary background knowledge, strengthening aspects like plausibility, comparing different suitable models with each other and indicating benefits and downsides. Finally, counterfactual explanations for ECG data classification generated by Native Guide are evaluated by cardiologists in two expert interviews.

Synchronization of the periodic ECG data was shown to be the most important contribution to the method, as it enabled the generation of plausible counterfactuals. The experts, who had never seen or used counterfactuals in their work, were interested in this approach and could envision its application within the field, for example in training junior doctors. In general, AI classification along with sophisticated proximate counterfactuals indicates success and reliability when it comes to the identification of heart diseases.

Summary

Explanations are essential components in the promising fields of AI and machine learning. Deep learning approaches are on the rise because they achieve high accuracy when trained with huge amounts of data. Due to their black-box nature, however, the predictions are extremely hard to understand, retrace, and trust. Good explanation techniques can help to understand why a system makes a certain prediction and thus increase trust in the model. Understanding the model is crucial in areas such as healthcare, where decisions ultimately affect human lives. Studies have shown that counterfactual explanations in particular are more informative and psychologically effective than other methods.

This thesis deals with a novel instance-based technique called “Native Guide”, which generates counterfactual explanations for the classification of time series data. It uses samples from the real data distribution with a class change as a foundation. The thesis applies the method to the explanation of electrocardiogram (ECG) classification, a very complex and important medical field in which every single ECG has unique features. Native Guide for ECGs is explained, examined, and extended by conveying the necessary background knowledge, strengthening aspects such as plausibility, comparing different suitable models with each other, and pointing out advantages and disadvantages. Finally, counterfactual explanations for the classification of ECG data generated with Native Guide are evaluated by cardiologists in two expert interviews.

It turned out that the synchronization of the periodic ECG data was the most important contribution to the method and enabled the generation of plausible counterfactual explanations. The experts, who had never encountered counterfactuals in their work, showed interest in the approach and could envision various important fields of application, such as the training of resident physicians in internal medicine. In general, AI classification and sophisticated, more proximate counterfactuals promise a successful future for a more reliable detection of heart diseases.

Acknowledgements

Over the last few months, I dedicated myself to tackling the research problem of explaining deep learning models, which are often called unexplainable, through counterfactuals in the incredibly interesting and crucial but also deeply complex field of ECG classification. To accomplish this objective, I needed to be a generalist as well as a specialist. As a generalist, I needed to know the general background of the topics, and as a specialist, I needed to know the intricacies of electrocardiography and be able to explain counterfactuals within Native Guide. I am thankful for all the opportunities to learn and develop my skills and knowledge in the field of XAI during the different stages of writing this thesis.

Special thanks go to all the people who supported me. First of all, I want to thank my supervisors and advisors, who guided me and helped me with all my questions and problems during the whole time. Second, thanks to the cardiologists who made it possible to gain deep insights into day-to-day work with ECGs and, prospectively, counterfactuals. Third, I want to thank all my friends who provided valuable discussions and reviews and helped me to keep my focus on the work and its progress.

Viktoria Andres

1. Introduction

1.1. Topic and Related Work

The field of artificial intelligence (AI) continues to evolve and become more and more complex. Many machine learning algorithms, especially those that include artificial neural networks, achieve exceptional performance and accuracy. To give an example, DeepMind's AI AlphaFold determines a protein's 3D shape from its amino-acid sequence and has thereby solved one of biology's greatest challenges [1]. AlphaGo, also an AI by DeepMind, has defeated international champions in the board game Go [2]. A serious problem with accurate but deeply complex systems is interpretability. Explainable AI has recently attracted greater attention with new research into modern deep learning models and their black-box character [3]. More research is needed to realize sufficiently explainable models across all the different systems, data and use cases. This work focuses on time series data, specifically electrocardiograms (ECG), and their classification. Classification tasks predict class memberships for specified data inputs [4], for instance, whether an electrocardiogram belongs to a normal or an abnormal class. “Native Guide” is a method to generate instance-based counterfactual explanations on time series data [5]. Counterfactuals are a subclass of explanation techniques that try to show alternatives to key elements of the data that lead to a different outcome [6, 7, 8].

Besides Native Guide, there are many approaches that try to generate good counterfactual explanations [5, 9]. However, only a few can be applied directly to time series classification [5, 10, 11]. Counterfactuals for time series data are still a largely unstudied problem. Methods from other fields, like image classification, cannot be directly adapted to time series models due to the very complex nature of time series [12]. Quite similar to Native Guide, E. Ates et al. [10] described a counterfactual generation approach for multivariate time series data using a greedy search algorithm on distractor instances. I. Karlsson et al. [11] explained a method for univariate time series tweaking that does not require instances of the training data. A. Gee et al. [13] introduced prototypes as maximal representatives of a class, but did not make the connection to counterfactuals. Their work helps to gain global insights into time series classification. Still, it is less useful when looking at more discriminative areas of the time series where samples can have unique features [5]. When it comes to using counterfactuals or other explanation methods, user studies such as [14], [15] and [16] have shown that users preferred counterfactuals over other explanation methods like case-based reasoning or the feature importance approach. Therefore, this thesis focuses on the counterfactual explainability method Native Guide. In the original paper [5], E. Delaney et al. promised that the method generates plausible, proximate, diverse and sparse counterfactuals by design.

1.2. Research Approach: Design Science

A Design Science [17] project centers around an artifact as the object of study. To thoroughly investigate the artifact, two key components of the project, the design and the investigation, must be carefully constructed and followed. An artifact interacts with a problem context to improve that context in any possible way. Therefore, Design Science problems are also referred to as improvement problems. Designs are solutions to a problem and artifacts aim for improvements in the researcher's design. Methods, algorithms, or conceptual structures that fulfil all requirements are artifacts [17, 18].

This thesis investigates the problem context of counterfactual explanations in time series data. It also investigates the Native Guide method as the design artifact using existing problem-solving knowledge and newly acquired knowledge from the investigative and observational steps.

1.2.1. Aims and Objectives

The goal of this thesis is to investigate different aspects of Native Guide, a generative counterfactual explanation method [5]. First, an in-depth overview of background information about the method under investigation is provided. Second, we investigate the method and compare three different classifiers for time series data (Fully Convolutional Neural Network [19], Residual Network [19] and InceptionTime [20]). Third, we go one step further and extend the method with additional ideas that alter the generated counterfactual in different ways: synchronization, another perturbation approach and a shifted decision boundary. This should enable a more reasonable use in the explanation of complex ECG data classification. Fourth, expert interviews evaluate the counterfactuals from a user's perspective and provide information about the problem space. Finally, the benefits and drawbacks of the approach are critically examined.

1.2.2. Research Questions

All following research questions (RQ) should be interpreted in the context of ECG time series classification and counterfactual explanation using the Native Guide method, which is explained later:

RQ1 Which scientific knowledge is needed to understand and implement the Native Guide method?

RQ2 Are there further technical approaches that could improve the goodness of the counterfactuals but still maintain the basic idea of the method?

RQ3 Does a counterfactual build trust in the prediction of a model?

RQ4 Does a counterfactual provide more insights about data than a prediction without counterfactual?

1.3. Thesis Outline and Research Methods

To answer the research questions (RQ) stated above, various research methods were applied. We systematically reviewed the knowledge needed to design and develop the Native Guide method through literature research, answering RQ1 in chapter 2 Background. Next, an experimental approach using observations, qualitative and quantitative investigations and optimizations examines aspects of the method and aims to improve them. This provides answers to the extensive question RQ2 in chapter 3 Native Guide: A Counterfactual Explanation Technique. Further, we evaluate the method on complex ECG data in qualitative, problem-centered expert interviews with cardiologists. These interviews answer the research questions RQ3 and RQ4 in chapter 4 Evaluation: Expert Interview and provide insights into the current use of ECGs as well as the future of predictive AI models and counterfactuals in cardiology. Lastly, the chapters 5 Discussion and 6 Conclusion and Future Work complete the thesis.

[Figure not included in this excerpt]

Figure 1.1.: Thesis Overview

2. Background

The next sections provide an overview of the background knowledge required for a counterfactual explainability method in the field of ECG classification. First, we introduce time series data in section 2.1. Second, we share a basic understanding of artificial intelligence, machine learning, and classification in 2.2. Building on that, we introduce convolutional neural networks (CNN) more specifically in 2.3. Next, we explain explainable artificial intelligence (XAI) in 2.4 and counterfactual explanations in 2.5. Lastly, we offer insights into ECG signal data and corresponding datasets in sections 2.6 and 2.7.

2.1. Time Series Data

Time series are collections of observations obtained through repeated measurements over time [21]. Any data with an ordered set of values can be adapted to a time series [22, 23]. The following are formal definitions:

[Definitions not included in this excerpt]

Despite its omnipresent character, time series analysis (TSA) is less researched than the analysis of other types of data, such as images or tabular data, especially for sub-domains. Different characteristics and temporal interdependencies make time series data difficult to analyse and its future behaviour difficult to predict [22].

To give an overview, we define some global characteristics of time series data:

- Trend in a time series means that there is a long-term change in its mean level. In other words, the time series centers around an increasing or decreasing line, which suggests a trend [25].
- Seasonality patterns repeat themselves regularly and predictably over intervals of time with fixed length, often on a daily, weekly or monthly basis [25].
- Periodicity is a cyclical pattern that (like seasonality) also repeats itself over time, but varies in frequency or cycle length [25].
- Serial correlation is the observed interdependence between a variable and its successors in the time series over periods of time [25].
- Skewness is the lack of symmetry to the left and right of the center point of a distribution [25].
- Kurtosis describes a peaked or flat characteristic relative to a normal distribution. Distributions with high kurtosis tend to have distinct peaks near the mean, rapid drops and heavy tails. Distributions with low kurtosis usually have a flat top near the mean and less sharp peaks [25].
- Linearity is a property of a time series where each data point is a linear combination of past or future data points. Nonlinear time series usually describe much more complex dynamics [25, 26].
- Self-Similarity can be observed if a time series has a substructure very similar or identical to its overall structure. This implies long-range dependencies in the time series [25, 27].
- Chaos describes seemingly random processes with sensitive dependence on initial values, also known as the butterfly effect. In a nutshell, very small changes or errors in a value lead to very big changes in the near future. Similar causes can have completely different effects in chaotic time series [25, 28].
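Some of the characteristics listed above can be estimated directly from a sampled series. The following is a minimal sketch using NumPy, not code from the thesis: the function names and the synthetic example series are assumptions chosen for illustration, and the estimators are the plain moment-based versions.

```python
import numpy as np

def linear_trend_slope(x):
    """Slope of a least-squares line through the series (trend)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    slope, _intercept = np.polyfit(t, x, 1)
    return float(slope)

def lag1_autocorrelation(x):
    """Serial correlation between each value and its direct successor."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d * d))

def skewness(x):
    """Third standardized moment; 0 for a symmetric distribution."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0)

# Synthetic example (assumed): upward trend plus a cycle of length 20.
t = np.arange(200)
series = 0.05 * t + np.sin(2 * np.pi * t / 20)

print("trend slope:     ", linear_trend_slope(series))
print("lag-1 autocorr.: ", lag1_autocorrelation(series))
print("skewness:        ", skewness(series))
print("excess kurtosis: ", excess_kurtosis(series))
```

Seasonality, self-similarity and chaos need more elaborate estimators and are deliberately left out of this sketch.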

2.2. Basic Understanding of Artificial Intelligence, Machine Learning, and Classification

Artificial intelligence (AI) has been defined in different ways over the years. Following D. Poole et al., we define AI as “the field that studies the synthesis and analysis of computational agents that act intelligently” [29].

Machine learning is a subfield of artificial intelligence, defined as a collection of computational methods that learn a model from data to improve performance or to make predictions [30]. Such a model f takes an input data sample X and produces an output class label Y. We learn the model so that the input space is correctly mapped onto the output space, $f: X \rightarrow Y$.

Deep learning is a subfield of machine learning that involves learning models with a high level of abstraction. These models, referred to as artificial neural networks (ANN), are composed of multiple processing layers that perform nonlinear transformations on their input. To work efficiently, a network needs a large amount of representative data. Deep learning is called deep because several layers of neurons are stacked one after another [31, 32]. Artificial neurons are analogous to biological neurons. They are the fundamental component for building artificial neural networks. Neurons receive inputs and produce outputs by applying different parameters. This unit is also called a perceptron. Neural networks consist of multilayer perceptrons (MLP), which contain one or more hidden layers with multiple hidden neurons in them [33]. Figure 2.1 shows a schematic representation of a neuron and an ANN compared to their biological counterparts.

[Figure not included in this excerpt]

Figure 2.1.: Neurons and Artificial Neural Networks: (a) Human Neuron, (b) Artificial Neuron, (c) Biological Synapse, and (d) ANN Synapses [34]
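To make the notion of an artificial neuron more concrete, the following minimal sketch computes the output of a single neuron as a weighted sum of its inputs followed by an activation function. The sigmoid activation and the example weights are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash a real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    return float(sigmoid(np.dot(inputs, weights) + bias))

# Illustrative values (assumed): two inputs with hand-picked parameters.
inputs = np.array([1.0, 2.0])
weights = np.array([0.5, -0.25])
bias = 0.0

# The weighted sum is 0.5 - 0.5 + 0.0 = 0, so the sigmoid returns 0.5.
print(neuron(inputs, weights, bias))
```

During training, the weights and the bias are the parameters that are adjusted.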

A classification task in the field of machine learning is a problem that requires predicting class memberships for data instances. In this case, the output Y is a probability space, and a model f assigns the class label from Y with the highest probability to the model input X [4].
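The assignment of the most probable class label can be sketched as follows. The softmax function is used here as one common way to turn raw model scores into probabilities. The scores and class names are assumed example values, not outputs of a trained model.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    shifted = scores - np.max(scores)  # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def predict_label(scores, class_names):
    """Return the class with the highest probability and all probabilities."""
    probs = softmax(np.asarray(scores, dtype=float))
    return class_names[int(np.argmax(probs))], probs

# Assumed raw scores of a hypothetical binary ECG classifier.
label, probs = predict_label([0.2, 1.5], ["normal", "abnormal"])
print(label, probs)
```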

2.3. Convolutional Neural Network (CNN)

This thesis focuses on convolutional neural networks (CNNs) as a deep learning classification technique. The network structure is not a novel idea. It was first proposed by Fukushima [35] in 1988. However, computational limits made its application difficult at that time. Further improvement and development led to state-of-the-art results in many recognition tasks [33]. Its popularity spread when AlexNet [36] won the ImageNet competition in 2012, and CNNs became an important tool for many domains [23].

We define the architecture of a CNN as follows: an input layer, convolutional layers, pooling layers, a feature or flattened data layer, and an output layer:

- Input layer takes a time series as an input and has $n \times m$ neurons, where m is the dimension of the input time series and n the length of each series [33].
- Convolutional layer applies and slides learnable filters (i.e. convolutions) over the time series of the preceding layer. A filter can be pictured as a generic non-linear transformation that averages specified proportions of a time series with a moving window. The output of the filters can pass through a batch normalization function, which keeps the mean output close to zero and the output standard deviation close to one, and an activation function, such as the sigmoid or ReLU function, to form the so-called output feature maps [23, 33].
- Pooling layer divides a feature map into equal-length segments. In the case of average or max pooling, every segment is represented by its average or maximum value. Pooling down-samples the feature maps and thus reduces variability in the hidden activations. However, not every CNN architecture incorporates pooling layers [33]. Global Average Pooling (GAP) does not divide the feature maps but averages each feature map as a whole, which results in a single real value that represents the whole feature. A GAP layer can be used instead of a feature layer, as demonstrated in figure 2.2 [5].
- Feature layer (or flattened data layer) represents the original time series as a series of feature maps. Connecting all feature maps generates a new time series as the final representation of the original input in the feature layer [33].
- Output layer consists of c neurons, each of which corresponds to a class in Y. It is fully connected to the feature layer. Usually, the neuron with the maximum output represents the class label, which solves the classification task [33].
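The layers described above can be sketched as a forward pass in plain NumPy. This is a simplified illustration under stated assumptions, not the implementation used in this thesis: the weights are random instead of learned, batch normalization is omitted, a GAP layer replaces the feature layer, and all function names are hypothetical.

```python
import numpy as np

def conv1d(series, filters):
    """Slide each filter over a multivariate series ("valid" convolution).

    series:  (n, m) array with n time steps and m dimensions
    filters: (k, w, m) array with k filters of width w
    returns: (n - w + 1, k) array of feature maps
    """
    n, _m = series.shape
    k, w, _ = filters.shape
    out = np.empty((n - w + 1, k))
    for t in range(n - w + 1):
        window = series[t:t + w]  # shape (w, m)
        out[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return out

def relu(x):
    """Activation function applied element-wise to the filter outputs."""
    return np.maximum(x, 0.0)

def max_pooling(feature_maps, size):
    """Represent every segment of the given size by its maximum value."""
    steps = feature_maps.shape[0] // size
    trimmed = feature_maps[:steps * size]
    return trimmed.reshape(steps, size, -1).max(axis=1)

def global_average_pooling(feature_maps):
    """Average every feature map over time into a single real value."""
    return feature_maps.mean(axis=0)

def forward(series, filters, out_weights, out_bias):
    """Convolution, ReLU, GAP and a fully connected output layer."""
    maps = relu(conv1d(series, filters))
    features = global_average_pooling(maps)    # shape (k,)
    scores = features @ out_weights + out_bias  # shape (c,)
    return int(np.argmax(scores)), maps

# Assumed toy setup: 100 time steps, 2 dimensions, 3 filters, 2 classes.
rng = np.random.default_rng(0)
series = rng.normal(size=(100, 2))
filters = rng.normal(size=(3, 5, 2))
out_weights = rng.normal(size=(3, 2))
out_bias = np.zeros(2)

label, maps = forward(series, filters, out_weights, out_bias)
print("predicted class index:", label)
print("feature maps:", maps.shape, "after max pooling:", max_pooling(maps, 4).shape)
```

Because GAP keeps one value per feature map, the weights of the output layer can be related back to individual feature maps.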

[Figure not included in this excerpt]

Figure 2.2.: CNN for Multivariate Time Series [37]

2.4. Explainable Artificial Intelligence (XAI)

In recent years, machine learning models have become more and more effective due to improved algorithms, ever-growing datasets and increasing computational power. Improved predictive accuracy also involves an increase in the complexity of the models. Deep learning systems have boosted the predictive power but have also decreased the ability to understand and interpret the decisions of a model. The black-box character and complexity of deep learning systems make their inner workings hard to explain. On the other hand, white-box models typically produce explainable results, but lack performance compared to the former [38, 39].

A system that does not provide a rationale for a model's prediction cannot be fully trusted. To be able to validate a model's correctness, detect bias or leakages, and potentially learn new patterns in the data, a basic understanding of the model is crucial. This is vital for domains such as healthcare, where decisions ultimately affect human life. There is a need for trustworthy, fair, and high-performing models that can explain their predictions and actions to their users. This is where explainable artificial intelligence comes into play [38, 40, 41]. Explainable artificial intelligence (XAI) is one of many human-agent interaction problems [8] and tries to combine excellent performance with a human understanding of predictions and psychologically effective explanations [41]. Human-agent interaction can be defined as the intersection of artificial intelligence, social science, and human-computer interaction (HCI) [8], as shown in figure 2.3.

[Figure not included in this excerpt]

Figure 2.3.: Human-Agent Interaction

A brief taxonomy of basic XAI method characteristics [38]:

- Model-specific methods are restricted to a related group of machine learning classifiers.
- Model-agnostic methods are applicable to any classifier.
- Local methods produce explanations for one specific data instance.
- Global methods produce explanations for the whole model.
- Post-Hoc methods focus on analysing and explaining a black-box model after training by applying particular interpretation methods.
- Intrinsic methods restrict the model complexity to make the model more interpretable through a simpler structure.
- Enhancing fairness is another interpretability method that tries to fight inequalities and discrimination in model predictions. Ideally, machine learning algorithms should be impartial and non-discriminative. Different techniques try to remove bias from data and predictions and to train a model to make fair predictions.
- Sensitivity analysis is a method that measures the contribution of a single or multiple input variables to the output variance. That way, the impact of each feature on the model prediction is examined.

2.5. Counterfactual Explanations

What might our life be like if we had made key choices differently? What if we had moved to another city, attended a different university or chosen to have no kids? It is common to ask questions like that occasionally. In fact, these types of questions are counterfactuals [6].

We will now take a look at the logical understanding and intuition behind counterfactuals. Factual conditionals state that if one fact is true, then so is another ($p \Rightarrow q$) [42]. A factual conditional in natural language would be:

“If I wash my hands for 30 sec, then they get clean.”

When the condition is known for sure it becomes a fact, and we can construct a counterfactual:

“If I had not washed my hands for 30 sec, they would not have got clean.”

That is just one possible counterfactual. Another variation is:

“If I had not washed my hands for 30 sec, they would have got clean.”

The next variation is not possible because it contradicts the initial assumption, and therefore it is not a valid counterfactual:

“If I had washed my hands for 30 sec, they would not have got clean.”

Finally, we could also alter the “30 sec” and get infinitely many different counterfactuals like:

“If I had washed my hands for 10 sec, they would have got clean.”

There are various use-cases for counterfactuals. One compelling use-case is in the field of XAI. Counterfactuals can help to make complex and incomprehensible black-box system predictions more intelligible to developers and users, thus paving the way for interpretable models. An interpretable model gives insights about aspects of the system and helps to understand the causes of its decision making process. It enables the user to see different (counterfactual) explanations, such as why one decision was made instead of another, and to predict how a change to a feature will affect the system's output [7, 8].

For the scope of our black-box (time series) classification tasks, we define more specific and restricted counterfactuals as follows:

Definition 5 (Counterfactual) Given an input $x \in X$ and the corresponding output class $c \in Y$ predicted by a classifier model $b(x) = c$, a counterfactual $x'$ is a perturbed instance of $x$ that generates a different output class $b(x') = d \in Y$ with $d \neq c$ [43].
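The definition can be checked programmatically: a perturbed instance is a counterfactual exactly when the classifier assigns it a class different from that of the original input. The sketch below uses a hypothetical toy classifier as a stand-in for the black-box model b. The threshold rule and the example values are assumptions made purely for illustration.

```python
import numpy as np

def toy_classifier(x):
    """Hypothetical stand-in for b: class 1 if the mean is above 0, else class 0."""
    return int(np.mean(x) > 0.0)

def is_counterfactual(classifier, x, candidate):
    """True if the perturbed candidate is assigned a different class than x."""
    return classifier(candidate) != classifier(x)

# Query instance x and a perturbed copy in which one subsequence is replaced.
x = np.array([0.2, 0.4, 0.1, 0.3])
candidate = x.copy()
candidate[1:3] = [-0.9, -0.8]

print(toy_classifier(x), toy_classifier(candidate))
print(is_counterfactual(toy_classifier, x, candidate))
```

The check only verifies the class change. Properties such as proximity, sparsity and plausibility have to be assessed separately.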

2.6. ECG Signal Data

This thesis focuses on time series data generated by an electrocardiogram (ECG). ECGs are used in the medical field to detect heart anomalies by measuring the electrical impulses produced by the heart. An ECG machine records this electrical activity and displays it as a trace on paper, which is then interpreted by a medical or cardiac expert. Damage to or disorders of the heart can change its electrical activity and thus the ECG trace. The ECG has become the most important tool for the diagnosis of heart diseases [44].

[Figure not included in this excerpt]

Figure 2.4.: ECG Waveform [45]

ECG signals are periodic time series where each beat consists of a P wave, a QRS complex, and a T wave. We distinguish between different peaks (P, Q, R, S, and T), intervals (PR, QRS, ST, and QT), and segments (PR, ST), all with inherent characteristics like amplitude and duration [45], which can be seen in figure 2.4.

[Figure not included in this excerpt]

Figure 2.5.: Lead Positions and Axes [46]

There are different approaches to producing an ECG. A standard ECG has 12 leads, which measure the heart activity. A lead is a pair of electrodes (positive and negative) placed on different regions of the body and connected to an ECG recorder, as shown in figure 2.5. Bipolar leads record the potential difference between two points. Unipolar leads, on the other hand, record the electrical potential at one particular location with a single electrode [45].

[Table not included in this excerpt]

Table 2.1.: Leads Description [45]

Classification of ECGs is very complex because different abnormalities look different and can be very subtle. They may occur only in specific leads. That is why it is essential to record multi-lead ECGs. As an illustration, figure 2.6 shows one example of a normal and an abnormal ECG signal, specifically an abnormal change of the ST segment and T peak [47].

[Figure not included in this excerpt]

Figure 2.6.: Normal vs. Abnormal ECG [47]

Early detection and discrimination between different disorders is crucial for the proper treatment of patients. Classification using machine learning algorithms can help cardiologists to identify and distinguish heart problems and to save lives by providing the proper therapy [44].

2.7. Openly-accessible ECG Datasets

There are not many openly accessible ECG datasets that are also sufficient to train an artificial neural network. This thesis examined four data collections with different qualities, which are presented in the following.

2.7.1. ECG200

The ECG200 dataset has 200 ECG samples, each with only one heartbeat waveform and 96 data points. It is a binary 1-lead univariate dataset. The dataset can be easily accessed through the UCR time series archive [48, 49], but the small amount of data makes it almost impossible to learn a good deep neural network, which has many more learnable parameters than there are data samples [50].

[Table not included in this excerpt]

Table 2.2.: ECG200

2.7.2. ECG5000

The ECG5000 dataset is also accessible through the UCR archive [48, 49]. It holds 5000 samples extracted from a single patient and is divided into five different classes. Each data sample is 140 points long [51]. Even though it has more samples than ECG200, it still does not provide enough to learn a good deep neural network [50], especially for classes 2, 3, and 4.

[Table not included in this excerpt]

Table 2.3.: ECG5000

2.7.3. PTB

A ecg heartbeat categorization dataset on kaggle 52 used the data from the ptb diagnostic ecg database 53 to prepocess it to enable useful classification with deep learning algorithms. Their preprocessed dataset consist of 14552 samples and is using only one lead (univariate) instead of the original 15 leads in ptb. The sampling frequency is 125 Hz resulting in 188 data points as the length of an ecg. It is a binary 1-lead univariate dataset differentiating be­tween normal and abnormal ecgs. Classification examples seem to accomplish good accuracy. However, to obtain equal length shorter samples are padded with zeros in the end and peaks are not synchronized, which makes it harder to use such data for the Native Guide method.

[Illustration not included in this excerpt]

Table 2.4.: PTB

2.7.4. PTB-XL

The PTB-XL dataset consists of 21837 samples collected from 18885 different patients. The ECGs have 12 leads, so the data is multivariate. It is one of the largest freely accessible ECG datasets. Each instance is recorded over 10 seconds at a sampling frequency of 100 Hz, so all data samples have 1000 data points per lead. Furthermore, it is a multi-label dataset labeled by two doctors, which means that one sample can be assigned to multiple classes. This more closely mirrors reality, since patients can have more than one abnormality 47.

[Illustration not included in this excerpt]

Table 2.5.: PTB-XL

All things considered, we work with the PTB-XL dataset for the later investigation and optimization of the method. To make the data more consistent, some time series are not considered for analysis: every time series that is labeled as normal together with any abnormal class is contradictory and thus excluded during preprocessing. We assume that such labels exist because the doctors proposed different classifications.
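The exclusion rule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the record layout, the helper names, and the label strings ("NORM", "MI", "STTC") are hypothetical and are not taken from the thesis implementation.

```python
def is_contradictory(labels, normal_label="NORM"):
    """Return True if a sample is labeled normal AND carries any other label."""
    label_set = set(labels)
    return normal_label in label_set and len(label_set) > 1


def drop_contradictory(samples, normal_label="NORM"):
    """Keep only samples whose label set is not contradictory."""
    return [s for s in samples if not is_contradictory(s["labels"], normal_label)]


# Hypothetical multi-label records (assumed structure, for illustration only).
records = [
    {"id": 1, "labels": ["NORM"]},          # purely normal: kept
    {"id": 2, "labels": ["MI"]},            # purely abnormal: kept
    {"id": 3, "labels": ["NORM", "MI"]},    # normal + abnormal: excluded
    {"id": 4, "labels": ["MI", "STTC"]},    # several abnormal classes: kept
]

kept_ids = [r["id"] for r in drop_contradictory(records)]
print(kept_ids)
```

Samples with several abnormal labels remain in the data, so the multi-label character of the dataset is preserved.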

3. Native Guide: A Counterfactual Explanation Technique

The following chapters describe a novel counterfactual explanation technique called “Native Guide”, originally proposed by E. Delaney et al. 5. After explaining the method and its implementation, further adjustments and extensions are introduced to reach more generalizable and proficient explanations. In the following, we refer to the time series for which we want to produce a counterfactual as the query time series T_q.

3.1. Reference Method

Unlike the original paper 5, this thesis defines Native Guide as an instance-based, model-specific, post-hoc explanation technique that generates counterfactual explanations for time series classifiers. The following points clarify the basic characteristics of the method and where our view differs from that of E. Delaney et al. 5. After that, we define and explain the four technical steps of the generation process based on 5.

- Instance-based: The generation process is based on existing instances in the dataset. It involves a given query time series that needs to be explained and its closest neighbouring time series from another class. This way, the generated time series are usually very similar to the query time series and lie within the distribution of the data 5. This also means that the method requires access not only to the classification model but also to the training data, or at least a representative part of it.

- Model-specific: The paper 5 calls this method model-agnostic, meaning it should be applicable to any possible classifier. Given that the algorithm requires feature maps and weights from the last model layers 5, we argue that the method is model-specific, since not all ANNs provide these.

- Generation: The method generates new time series data that is useful for explaining the predictions of a model. This also has potential for data augmentation in sparse time series datasets 5.

3.1.1. Learn or Load Classifier

The first step of the method is to obtain an underlying (black-box) classifier b for the data that has to be explained. This classifier either needs to be trained on data that represents the classes of the data to be explained, or be pretrained on other but similar data and then fine-tuned to improve accuracy. It also needs to meet certain requirements that make the next steps possible. First, the model has to produce feature maps, which are the outputs of filters applied to the outputs of a previous layer 54. Second, the model needs some kind of global average pooling (GAP) layer, where each feature map is compressed to a single value. This layer precedes a dense layer, which connects every pooled feature value to every possible class through a weighted connection 5, 23.
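To make the two requirements concrete, the following minimal sketch shows how a GAP layer compresses each feature map to one value and how a dense layer then maps these values to class scores. The function names, the toy feature maps, and the weights are assumptions chosen for illustration; a real classifier would obtain them from a trained network.

```python
def global_average_pooling(feature_maps):
    """Compress every feature map (one list of values per filter) to its mean."""
    return [sum(fm) / len(fm) for fm in feature_maps]


def dense_scores(pooled, weights, biases):
    """Compute one score per class; weights[k][c] connects feature k to class c."""
    return [
        sum(pooled[k] * weights[k][c] for k in range(len(pooled))) + biases[c]
        for c in range(len(biases))
    ]


# Two toy feature maps for a time series of length 4 (assumed values).
feature_maps = [
    [0.0, 2.0, 4.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
]
# Assumed weights from the two pooled features to two classes.
weights = [
    [1.0, -1.0],
    [0.5, 0.5],
]
biases = [0.0, 0.0]

pooled = global_average_pooling(feature_maps)   # one value per feature map
scores = dense_scores(pooled, weights, biases)  # one score per class
print(pooled, scores)
```

The weights between the pooled values and the classes are exactly the quantities that the class activation mapping in the next step reuses.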

Models without a GAP layer cannot be used, since it is a key element of the steps in chapters 3.1.2 and 3.1.4. Known architectures that produce feature maps and employ a GAP layer are the Fully Convolutional Neural Network (FCN), the Residual Network (ResNet), and InceptionTime. All three are described in the following.

Fully Convolutional Neural Network (FCN)

FCN 55 for time series data was first proposed by Z. Wang et al. 19 and evaluated by H. I. Fawaz et al. 23 and N. Strodthoff et al. 47. It is a convolutional neural network without any local pooling layers. As a consequence, the length of the time series is kept unchanged throughout the layer operations. The architecture replaces the final fully connected layer with a global average pooling (GAP) layer, which reduces the number of parameters and enables class activation mapping. As illustrated in figure 3.1, an FCN consists of three convolutional blocks. Each block contains three operations: a convolution, a batch normalization, and a ReLU activation function at the end 19, 23.

[Illustration not included in this excerpt]

Figure 3.1.: FCN 23
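The length-preserving behaviour of a convolutional block without local pooling can be illustrated with a small sketch. It assumes zero padding at both ends ("same" padding) and a hand-picked kernel, and it omits batch normalization for brevity, so it is a simplified illustration rather than the architecture from the cited papers.

```python
def conv1d_same(series, kernel):
    """Slide the kernel over the zero-padded series so that the output
    has the same length as the input."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = [0.0] * pad_left + list(series) + [0.0] * pad_right
    return [
        sum(padded[t + i] * kernel[i] for i in range(k))
        for t in range(len(series))
    ]


def relu(values):
    """Set all negative activations to zero."""
    return [max(0.0, v) for v in values]


series = [0.0, 1.0, 3.0, 1.0, 0.0]   # toy univariate time series
kernel = [-1.0, 0.0, 1.0]            # assumed filter that reacts to rising slopes

feature_map = relu(conv1d_same(series, kernel))
print(feature_map)
```

Because the feature map keeps one value per time step, each value can later be related to one point of the input series, which is what class activation mapping relies on.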

Residual Network (ResNet)

Likewise, ResNet 56 for time series was also first proposed by Z. Wang et al. 19 and evaluated by H. I. Fawaz et al. 23 and N. Strodthoff et al. 47.

The architecture is pictured in figure 3.2 and consists of 12 layers, of which two are the input and output layers, nine are convolutional layers, and one is a GAP layer. Each convolution is followed by batch normalization and a ReLU activation function. ResNet also has a shortcut residual connection around every three consecutive convolutional layers, which explains the name of the architecture. The linear shortcut lets gradients flow directly through these connections, which makes training easier by reducing the vanishing gradient effect. This effect occurs when the partial derivatives that control the training process become smaller and smaller, which impairs prediction performance 23.

[Illustration not included in this excerpt]

Figure 3.2.: ResNet 23

InceptionTime

InceptionTime was introduced by H. I. Fawaz et al. 20 and compared to FCN and ResNet by N. Strodthoff et al. 47. It is described as an ensemble of five Inception networks 57, with each prediction given equal weight. As shown in figure 3.3, InceptionTime consists of two blocks of three Inception modules each, rather than fully convolutional layers. Besides an input, output, and GAP layer, it also has two residual connections, one fewer than ResNet, which has three. Inception modules perform convolutions but also incorporate bottleneck layers that reduce the dimensionality of multivariate time series and the complexity of the model. They also mitigate overfitting on small datasets, where the model memorizes the training data instead of learning predictive rules. In addition, Inception modules allow much longer filters than ResNet with about the same number of parameters to be learned 20. Each module simultaneously applies multiple filters of different lengths to the same input time series. Furthermore, parallel max pooling makes the model invariant to small perturbations.

[Illustration not included in this excerpt]

Figure 3.3.: Inception 20

3.1.2. Class-Activation-Map (CAM)

The second step of the method produces a class activation map (CAM). This process can be done for every time series in the dataset. First, the feature maps from the last convolutional layer are collected. One can picture a feature map for a time series input as an array with the length of that time series. Each element of that array is an importance value for the corresponding point of the input time series with respect to the feature extracted by the applied filter. The higher the value, the more important a data point is for that feature. Second, the weights that connect the global average pooling layer with the last dense layer are obtained. There is one weight from every feature value at the GAP layer to every class. The weights indicate how important a feature is for belonging to a class: the higher the value, the more important that feature is for the class. Finally, the weighted sum of all feature maps is calculated for every time series. This is done by multiplying every feature map by its weight for the class of the time series and adding the weighted feature maps together, which yields an importance value for each data point for that class 5. The procedure is additionally illustrated in algorithm 1.

Algorithm 1 Get CAMs for each T_q

[Illustration not included in this excerpt]
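Since the algorithm listing is not part of this excerpt, the weighted sum described in the text can be sketched as follows. The function name and the toy values are assumptions; the sketch only reproduces the computation stated above (feature maps multiplied by their class weights and summed per time step) and is not the thesis's original listing.

```python
def class_activation_map(feature_maps, weights, class_index):
    """Weighted sum of all feature maps for one class.

    feature_maps[k][t] is the activation of filter k at time step t,
    weights[k][c] is the weight between pooled feature k and class c.
    """
    length = len(feature_maps[0])
    cam = [0.0] * length
    for k, feature_map in enumerate(feature_maps):
        w = weights[k][class_index]
        for t in range(length):
            cam[t] += w * feature_map[t]
    return cam


# Toy feature maps of the last convolutional layer (assumed values).
feature_maps = [
    [0.0, 2.0, 4.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
]
# Assumed weights between the GAP layer and the two output classes.
weights = [
    [1.0, -1.0],
    [0.5, 0.5],
]

cam_class0 = class_activation_map(feature_maps, weights, class_index=0)
most_important_step = cam_class0.index(max(cam_class0))
print(cam_class0, most_important_step)
```

In this toy case the third time step receives the highest importance for class 0, so a perturbation guided by the CAM would start there.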

3.1.3. Finding the Native Guide

The third step of the method is to search for the query's (T_q) counterfactual nearest unlike neighbour (T_NUN), also called the “native guide”. This is the time series from the training data that looks most similar to the query time series but belongs to another class. To find the native guide, we calculate the distance between the query time series and every native guide candidate. The candidate with the smallest distance to the query is T_NUN. It is recommended to use Dynamic Time Warping (DTW) as the distance metric 5. Unlike the Euclidean distance, which compares the points of two sequences at identical time stamps, DTW compares two sequences by finding an optimal warping path that aligns their sampled points in time based on a cost matrix. The more similar two points (x_q, x_NUN) are, the lower their cost c(x_q, x_NUN) 58, 59. Figure 3.4 shows an exemplary warping path.

[Illustration not included in this excerpt]

Figure 3.4.: Optimal Warping Path for two Sequences from ECG200
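The search for the native guide can be sketched with a plain dynamic-programming implementation of DTW. The sketch assumes univariate series, the absolute difference as point-wise cost, and no warping window; the function names and toy data are hypothetical, and a practical implementation would more likely rely on an optimized library.

```python
def dtw_distance(a, b):
    """Accumulated cost of the optimal warping path between two sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])          # point-wise cost c(x_q, x_NUN)
            acc[i][j] = cost + min(acc[i - 1][j],     # step in a only
                                   acc[i][j - 1],     # step in b only
                                   acc[i - 1][j - 1]) # step in both
    return acc[n][m]


def find_native_guide(query, query_label, train_series, train_labels):
    """Return (index, distance) of the nearest training series from another class."""
    best_index, best_distance = None, float("inf")
    for index, (series, label) in enumerate(zip(train_series, train_labels)):
        if label == query_label:
            continue  # only unlike neighbours are candidates
        distance = dtw_distance(query, series)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


query = [0.0, 1.0, 2.0, 1.0, 0.0]
train_series = [
    [0.0, 1.0, 2.0, 1.0, 0.0],   # identical, but same class as the query
    [0.0, 0.0, 1.0, 2.0, 1.0],   # shifted peak, other class
    [3.0, 3.0, 3.0, 3.0, 3.0],   # flat line, other class
]
train_labels = [0, 1, 1]

guide_index, guide_distance = find_native_guide(query, 0, train_series, train_labels)
print(guide_index, guide_distance)
```

The shifted series is selected because the warping path absorbs most of the time shift, whereas a comparison at identical time stamps would penalize every displaced point.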

3.1.4. Perturbation

The last step is the generation of the counterfactual T' by performing a perturbation between the query T_q and the native guide T_NUN. That means a subsequence S_q in T_q is replaced by a feature-important subsequence S_NUN of T_NUN, which is identified by the class activation mapping. In the beginning, the length l_s of the swapped subsequences is very small, e.g. l_s = 1, and we increase l_s until the generated T' is classified into the desired counterfactual class c' ∈ Y. To be able to do this, all data must have the same length. This step is illustrated in algorithm 2.

Algorithm 2 Generation of T' by Perturbation

[Illustration not included in this excerpt]
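As the algorithm listing is not part of this excerpt, the perturbation loop described in the text can be sketched as follows. Several details are assumptions made for illustration: the swapped window is the one with the highest summed CAM values of the native guide, it is inserted at the same position in the query, and `predict` stands for any classifier function that returns a class label. The toy classifier at the end is purely hypothetical.

```python
def most_important_window(cam, length):
    """Start index of the window of the given length with the highest CAM sum."""
    best_start, best_score = 0, float("-inf")
    for start in range(len(cam) - length + 1):
        score = sum(cam[start:start + length])
        if score > best_score:
            best_start, best_score = start, score
    return best_start


def generate_counterfactual(query, guide, guide_cam, predict, target_class):
    """Grow the swapped subsequence until the prediction flips to target_class."""
    if not (len(query) == len(guide) == len(guide_cam)):
        raise ValueError("query, native guide and CAM must have the same length")
    for length in range(1, len(query) + 1):
        start = most_important_window(guide_cam, length)
        candidate = (query[:start]
                     + guide[start:start + length]
                     + query[start + length:])
        if predict(candidate) == target_class:
            return candidate, length
    return None, None  # no swap produced the target class


def toy_predict(series):
    """Hypothetical classifier: class 1 if the total amplitude exceeds 7."""
    return 1 if sum(series) > 7 else 0


query = [0.0, 1.0, 2.0, 1.0, 0.0]        # classified as 0
guide = [0.0, 3.0, 4.0, 3.0, 0.0]        # native guide, classified as 1
guide_cam = [0.1, 0.5, 0.9, 0.5, 0.1]    # assumed CAM of the native guide

counterfactual, swapped_length = generate_counterfactual(
    query, guide, guide_cam, toy_predict, target_class=1
)
print(counterfactual, swapped_length)
```

In the toy example, swapping a single point is not sufficient, so the loop continues and stops at a subsequence length of two, which keeps the counterfactual as close to the query as this search allows.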

3.2. Investigation and Observation of the Method

This chapter investigates the method with respect to several points of interest that could affect the quality of the counterfactuals. The aim is to gain more insight into the workings of Native Guide and to collect observations that provide a foundation for hypotheses about potential improvements in the next chapter. First, we compare three potential classifiers in chapter 3.2.1. Second, we investigate the influence of ECG signal strength and wavelength in 3.2.2. Next, we investigate the swapped subsequence length in chapter 3.2.3 and the influence of data quantity, length, and variety in 3.2.4. Lastly, we compare the influence of different decision boundaries in 3.2.5. Because PTB-XL is the most extensive and state-of-the-art multi-class, multi-label, and multivariate dataset, we used only that dataset for the investigations.

[...]

Excerpt out of 89 pages

Details

Title
Generating Counterfactual Explanations for Electrocardiography Classification with Native Guide
Author
Year
2021
Pages
89
Catalog Number
V1139598
ISBN (eBook)
9783346573148
ISBN (Book)
9783346573155
Language
English
Keywords
generating, counterfactual, explanations, electrocardiography, classification, native, guide
Quote paper
Viktoria Andres (Author), 2021, Generating Counterfactual Explanations for Electrocardiography Classification with Native Guide, Munich, GRIN Verlag, https://www.grin.com/document/1139598
