In 2022, German-speaking Twitter users engaged in various socio-political debates about the Russian-Ukrainian war. Because the volume of user-generated tweets per day is too large to examine manually, the main goal of this master's thesis is to develop an automatic cross-target stance detection model for examining German Twitter data on the Russian-Ukrainian conflict in 2022.
In this thesis, a BERT model is trained jointly on multiple related targets of interest by encoding both the tweet and the target. The work presents an auto-labeled dataset, a small manually labeled test set, and an unlabeled dataset of German tweets on four targets of interest. Several experiments with different BERT models study cross-target generalization as well as the influence of class balance and case sensitivity. The best-performing fine-tuned model is then applied for automatic stance prediction on 2022 Twitter data, and the predictions are examined to identify potential reasons behind each stance category.
The results show that the applied cross-target approach achieves reasonable performance on known targets but does not suffice for a successful transfer to unknown targets. In addition, a balanced class distribution can counteract a bias towards an overrepresented class, and the results suggest that case sensitivity is detrimental in stance detection. The classified data reveal a number of potential reasons for a favorable or opposing stance towards each target within the Russian-Ukrainian conflict. Overall, the stance predictions show that in 2022 consistently more German-speaking Twitter users favored supporting Ukraine in the conflict than opposed it.
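The joint encoding of tweet and target mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard BERT sentence-pair layout (`[CLS] tweet [SEP] target [SEP]` with segment ids 0 and 1); the function name `encode_pair`, the whitespace tokenization, and the example tweet and target are hypothetical stand-ins and are not taken from the thesis, which would rely on a real WordPiece tokenizer.

```python
def encode_pair(tweet, target, max_len=16):
    """Build a BERT-style sentence-pair input: [CLS] tweet [SEP] target [SEP]."""
    # Assumption: whitespace splitting stands in for WordPiece tokenization.
    tweet_tokens = tweet.split()
    target_tokens = target.split()

    # Reserve three positions for [CLS] and the two [SEP] markers,
    # then truncate only the tweet so the target is always kept whole.
    budget = max_len - 3 - len(target_tokens)
    tweet_tokens = tweet_tokens[:max(budget, 0)]

    tokens = ["[CLS]"] + tweet_tokens + ["[SEP]"] + target_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the tweet and the first [SEP];
    # segment 1 covers the target and the final [SEP].
    segment_ids = [0] * (len(tweet_tokens) + 2) + [1] * (len(target_tokens) + 1)
    return tokens, segment_ids


# Hypothetical tweet and target, for illustration only.
tokens, segment_ids = encode_pair("we should help now", "arms deliveries")
print(tokens)
print(segment_ids)
```

Because the target is part of the input rather than fixed per model, one classifier can in principle be queried with a target it has not seen during training, which is the setting the cross-target experiments evaluate.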
Table of Contents
- Introduction
- Research Objectives
- Thesis Outline
- Background
- The Russian-Ukrainian Conflict 2022
- Foreign and Security Policy
- Energy Crisis
- Stance Detection
- The Task of Stance Detection
- Types of Stance Detection
- Related Work
- The Language Model BERT
- Methodology
- Transformer Encoder
- Pre-Training
- Fine-Tuning
- Dataset
- Data Collection
- Removal of Duplicates
- Development of Balanced Class Distributions
- Data Statistics
- Manually Labeled Test Datasets
- Final Dataset and Data Availability
- Experiments
- Experimental Setup
- Pre-Trained Language Models
- Preprocessing Methodology
- Evaluation Metrics
- Experiments and Results
- Experiment 1: Impact of a Balanced Dataset
- Experiment 2: Cross-Target Generalization
- Experiment 3: Different BERT Models
- Hyperparameter
- Discussion
- Application of Fine-Tuned Model on 2022 Twitter Data
- Twitter Data of 2022
- Statistics of Detected Stances
- Potential Reasons of Target-Specific Stance Groups
- Target NOC
- Target SLI
- Target AD
- Target US
- Summary and Evolution of Tweet Volume Over Time
- Conclusions and Outlook
Objectives and Key Themes
This master's thesis aims to develop an automatic cross-target stance detection model for examining German Twitter data on the Russian-Ukrainian conflict in 2022. The research uses the BERT model for stance prediction and explores how balanced class distribution, cross-target generalization, and case sensitivity affect model performance. Key themes include:
- Automatic cross-target stance detection
- BERT model performance in stance detection
- Influence of data characteristics on model accuracy
- Analysis of stance perspectives on the Russian-Ukrainian conflict
- Utilization of German Twitter data in social media analysis
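One way to obtain the balanced class distribution named among the themes above is random downsampling, sketched below. This is an assumption made for illustration: the summary does not state how the thesis balanced its classes, and the function name `downsample_to_balance` and the label strings `favor` and `against` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def downsample_to_balance(examples, seed=0):
    """Randomly drop examples so every label occurs as often as the rarest one."""
    # Group the (text, label) pairs by their stance label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    # The rarest class determines how many examples each class may keep.
    smallest = min(len(items) for items in by_label.values())

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: five "favor" tweets, two "against" tweets.
data = [(f"tweet {i}", "favor") for i in range(5)]
data += [(f"tweet {i}", "against") for i in range(5, 7)]
balanced = downsample_to_balance(data)
print(Counter(label for _, label in balanced))
```

Downsampling discards data from the larger class; oversampling or class-weighted losses are alternatives that keep all examples.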
Chapter Summaries
- Introduction: This chapter outlines the research objectives and provides a brief overview of the thesis structure.
- Background: This chapter explores the context of the Russian-Ukrainian conflict in 2022, including its impact on foreign and security policy and the energy crisis. It then introduces Twitter as a social media platform and the task of stance detection, covering its types, related work, and the BERT language model.
- Dataset: This chapter describes the data collection process for German tweets related to the Russian-Ukrainian conflict, focusing on the development of balanced class distributions, data statistics, and the creation of manually labeled test datasets. It also outlines the final dataset structure and availability.
- Experiments: This chapter discusses the experimental setup, including pre-trained language models, preprocessing methodology, and evaluation metrics. It then presents the results of different experiments exploring the impact of a balanced dataset, cross-target generalization, and different BERT models.
- Application of Fine-Tuned Model on 2022 Twitter Data: This chapter analyzes the results of applying the fine-tuned model to German Twitter data from 2022, examining the statistics of detected stances, identifying potential reasons for specific stance groups, and summarizing the evolution of tweet volume over time.
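The summaries above mention evaluation metrics without naming them. As a sketch of one metric commonly reported for stance detection, the block below computes a macro-averaged F1 score in plain Python. That the thesis uses macro F1 is an assumption, and the function name `macro_f1` and the label strings are hypothetical.

```python
def macro_f1(gold, predicted):
    """Average the per-class F1 scores, weighting every class equally."""
    labels = sorted(set(gold) | set(predicted))
    pairs = list(zip(gold, predicted))
    scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)

        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for four tweets.
gold = ["favor", "favor", "against", "against"]
predicted = ["favor", "favor", "favor", "against"]
print(round(macro_f1(gold, predicted), 4))
```

Because every class contributes equally to the average, a model biased towards an overrepresented class is penalized more strongly than it would be under plain accuracy.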
Keywords
The key terms and topics of this master's thesis include automatic stance detection, cross-target generalization, BERT language model, German Twitter data, Russian-Ukrainian conflict, data characteristics, balanced class distribution, case sensitivity, and social media analysis.
Quote paper
- Johanna Garthe (Author), 2023, Automatic Cross-Target Stance Detection With Fine-Tuned BERT, Munich, GRIN Verlag, https://www.grin.com/document/1431357