While the urgency of early crisis detection is increasing, truly reliable conflict prediction systems are still not in place, despite the emergence of better data sources and the use of state-of-the-art machine learning algorithms in recent years. Researchers still face the rarity of conflict onset events, which makes it difficult for machine learning-based systems to detect crucial escalation or de-escalation signals. As a result, prediction models can be outperformed by naive heuristics, such as the no-change model, which leads to a lack of confidence and thus limited practical usability.
In this thesis, I address this heuristic crisis by developing a fully automated machine learning framework capable of optimizing arbitrary econometric state space ARIMA methods in a completely data-driven manner. With this framework, I compare the predictions of a model portfolio consisting of all eight possible combinations of a standard ARIMA, a seasonal SARIMA, an ARIMAX model with socio-economic variables, and an ARIMAX model with conflict indicators of neighboring countries as exogenous predictors.
In addition, each model is examined at monthly and quarterly periodicity. By comparing the out-of-sample prediction errors, I find that this approach can beat the no-change heuristic in the country-level, one-year-ahead prediction of the log change of conflict fatalities on all metrics used, including the TADDA score.
Contents
1 Introduction
2 Method
2.1 State Space Modelling Approach
2.1.1 Linear Gaussian State Space Model
2.1.2 ARIMA State Space Model
2.1.3 SARIMA State Space Model
2.1.4 (S)ARIMAX - Regression with (S)ARIMA errors
2.1.5 Kalman Filter
2.2 No-Change Baseline Model
2.3 Evaluation Metrics
2.3.1 TADDA
2.3.2 Mean Absolute Error (MAE)
2.3.3 Root Mean Square Error (RMSE)
3 Data
3.1 Armed Conflict Location and Event Data Project (ACLED)
3.1.1 Analysis of Conflict Incidence and Country Categorization
3.2 International Monetary Fund - World Economic Outlook Database (IMF-WEO)
3.3 World Bank - World Development Indicators (WB-WDI)
3.4 Variable Overview and Missing Data
4 Implementation in Python
4.1 No-Change Forecaster Class
4.2 State Space ARIMA Forecaster Class
4.3 Grid Search Class
4.4 Automated Model Building Process
5 Results
5.1 Global Model Performances
5.2 Country-Level Model Performances
6 Conclusion
Research Objectives & Key Themes
This thesis addresses the "heuristic crisis" in conflict prediction by developing a fully-automated machine learning framework that optimizes econometric state space ARIMA methods to forecast conflict fatalities. By comparing a portfolio of model variants—ranging from standard ARIMA to ARIMAX models incorporating socio-economic and neighbor-conflict indicators—the research aims to create a data-driven system capable of consistently outperforming naive no-change heuristics in predictive accuracy.
- Development of an automated state space ARIMA framework for country-level conflict forecasting.
- Evaluation of different exogenous predictors, including socio-economic indicators and spill-over conflict effects.
- Comparison of forecasting performance across different temporal aggregation levels (monthly vs. quarterly).
- Benchmarking of econometric time series models against cutting-edge machine learning and no-change baselines using the TADDA, MAE, and RMSE metrics.
Excerpt from the Book
2.1 State Space Modelling Approach
In contrast to the classical Box-Jenkins modelling of ARIMA processes, I rely on the state space approach, which allows the formulation of state space ARIMA equations that are equivalent to, and can be converted back into, the classical ARIMA equations. The state space approach is very general and allows the modelling of any kind of system, which is why it has a wide range of applications in the natural sciences as well as in engineering. For time series forecasting, state space modelling allows a reinterpretation of the underlying problem as the evolution of a system over time: instead of directly modelling the time series of an observed variable y_t, y_t is linked to the unknown internal states α_t of the system, and the temporal evolution of these states is then modelled. Training a state space model corresponds to learning the unknown state components from the observed time series y_t. During forecasting, an estimate of a future state is generated from all previous states and is in turn used to derive the corresponding future value of the observed variable.
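To make the idea of linking y_t to a hidden state α_t concrete, the following is a minimal sketch of a Kalman filter for a scalar state space model. It is an illustration only, not the thesis implementation: the model form (a noisy observation of an AR(1) state), the function names, and the parameters phi, h and q are assumptions chosen for brevity.

```python
# Illustrative scalar state space model (an assumption, not the thesis code):
#   observation: y_t         = alpha_t + eps_t,        eps_t ~ N(0, h)
#   state:       alpha_{t+1} = phi * alpha_t + eta_t,  eta_t ~ N(0, q)


def kalman_filter(ys, phi, h, q, a0=0.0, p0=1e6):
    """Run the filter over ys; return the predicted next state mean and variance."""
    a, p = a0, p0  # predicted state mean / variance (large p0 = vague prior)
    for y in ys:
        v = y - a                # innovation: observation minus prediction
        f = p + h                # innovation variance
        k = p / f                # Kalman gain
        a_filt = a + k * v       # filtered state mean
        p_filt = p * (1.0 - k)   # filtered state variance
        a = phi * a_filt         # predict the next state
        p = phi * phi * p_filt + q
    return a, p


def forecast(ys, phi, h, q, steps):
    """Forecast y for the next `steps` periods from the last predicted state."""
    a, _ = kalman_filter(ys, phi, h, q)
    out = []
    for _ in range(steps):
        out.append(a)   # expected observation equals the expected state
        a = phi * a     # propagate the state one more step
    return out


# Hypothetical usage with made-up numbers
print(forecast([2.0] * 20, phi=1.0, h=1.0, q=0.1, steps=3))
```

With phi = 1 the state behaves like a random walk, so the forecast stays at the last state estimate; with |phi| < 1 it decays towards zero. A full ARIMA state space form uses vector states and system matrices instead of scalars, but the predict/update cycle is the same.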
Summary of Chapters
1 Introduction: Outlines the urgent challenge of conflict prediction, the limitations of current naive heuristics, and the potential for a more sophisticated, automated state space ARIMA approach.
2 Method: Provides the mathematical foundation for the automated state space modelling framework and the different ARIMA, SARIMA, and ARIMAX model specifications, including the Kalman filter implementation.
3 Data: Describes the three main data sources—ACLED, IMF-WEO, and World Bank WDI—used to generate conflict history indicators and socio-economic exogenous variables.
4 Implementation in Python: Details the modular object-oriented Python implementation, covering the preprocessing pipeline, grid search cross-validation, and the automated model building process.
5 Results: Analyzes the predictive performance of the 16 state space ARIMA variants (the eight model combinations, each at monthly and quarterly periodicity) at both global and country levels, highlighting the difference between monthly and quarterly data and the impact of exogenous variables.
6 Conclusion: Summarizes the key findings, confirming that properly tuned time series models can compete with or outperform state-of-the-art conflict prediction models, and discusses directions for future research.
Keywords
Conflict Prediction, State Space ARIMA, Machine Learning, ACLED, Econometrics, Crisis Early Warning, TADDA Metric, Time Series Analysis, Exogenous Predictors, Spill-over Effect, Hyperparameter Optimization, Kalman Filter, Data-Driven Forecasting, Conflict Fatalities, Development in Reverse.
Frequently Asked Questions
What is the core focus of this thesis?
The work focuses on developing an automated, data-driven framework to improve conflict prediction by using state space ARIMA methods to forecast the number of conflict fatalities at the country level.
What are the central themes of the research?
Key themes include the comparison of different ARIMA-based model configurations, the impact of socio-economic and neighborhood-conflict indicators on forecast accuracy, and the evaluation of models against naive baseline heuristics.
What is the primary goal of the study?
The primary goal is to exceed the predictive accuracy of the simple "no-change" model—which has historically been difficult for sophisticated models to beat—in the context of conflict escalation and de-escalation.
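As a rough illustration of the benchmark, the sketch below scores a no-change forecast of the log change in fatalities with MAE and RMSE. It is a minimal example under stated assumptions: the +1 offset inside the logarithm, the function names, and the fatality counts are hypothetical and are not taken from the thesis. TADDA is deliberately omitted here.

```python
import math


def log_change(current, future):
    """Log change of fatalities; the +1 offset (an assumption) keeps zero counts defined."""
    return math.log(future + 1) - math.log(current + 1)


def no_change_forecast(n):
    """No-change heuristic: the future equals the present, so the predicted change is 0."""
    return [0.0] * n


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical fatality counts now and one year ahead (made-up numbers)
now = [0, 10, 100]
later = [0, 10, 50]

actual = [log_change(c, f) for c, f in zip(now, later)]
baseline = no_change_forecast(len(actual))
print(mae(actual, baseline), rmse(actual, baseline))
```

Because conflict levels in most countries change little from one year to the next, most actual log changes are close to zero, which is what makes this baseline hard to beat on average error metrics.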
Which methodology is employed?
The research uses econometric state space ARIMA methods in a pipeline that includes imputation, standardization, and principal component analysis (PCA), combined with an extending window cross-validation for hyperparameter tuning.
What does the main body cover?
The main body details the theoretical state space framework, the data preparation from ACLED and socio-economic sources, the modular Python implementation of the modeling pipeline, and an extensive evaluation of model performances.
Which keywords characterize this work?
Core keywords include Conflict Prediction, State Space ARIMA, Machine Learning, ACLED, Econometrics, Crisis Early Warning, and the TADDA metric.
Why are monthly models preferred over quarterly models in this thesis?
The research finds that monthly models perform significantly better than quarterly models because quarterly aggregation leads to a substantial loss of information and increases uncertainty in the model coefficients.
How does this thesis address the "data leakage" problem?
The research implements a strict pipeline where independent preprocessing (imputation, standardization, and PCA) is performed separately on training and test sets within an extending window cross-validation to prevent information from the future from influencing the model training.
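One common way to keep such a pipeline free of leakage is sketched below: each fold estimates its preprocessing statistics on the training window only and then applies them to the held-out window. This is a simplified illustration that uses standardization alone; the function names, window sizes and data are hypothetical, and the exact procedure in the thesis may differ.

```python
import statistics


def extending_window_splits(n_obs, initial_train, horizon):
    """Yield (train_idx, test_idx) pairs; the training window grows with every fold."""
    end = initial_train
    while end + horizon <= n_obs:
        yield list(range(end)), list(range(end, end + horizon))
        end += horizon


def standardize_fold(series, train_idx, test_idx):
    """Scale both windows with mean and std estimated on the training window only."""
    train = [series[i] for i in train_idx]
    mu = statistics.fmean(train)
    sd = statistics.pstdev(train) or 1.0  # guard against a constant training window

    def scale(idx):
        return [(series[i] - mu) / sd for i in idx]

    return scale(train_idx), scale(test_idx)


# Hypothetical monthly series with 24 observations (made-up numbers)
series = [float(x) for x in range(1, 25)]
folds = list(extending_window_splits(len(series), initial_train=12, horizon=4))

for train_idx, test_idx in folds:
    train_z, test_z = standardize_fold(series, train_idx, test_idx)
    print(len(train_idx), test_idx[0], round(test_z[0], 3))
```

Every test index lies strictly after every training index in its fold, so no future observation can enter either the scaling statistics or the model fit.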
What is the conclusion regarding seasonality in these models?
The thesis finds that the rigid implementation of annual seasonality used in this research interferes with prediction accuracy, likely acting as noise rather than as a helpful seasonal signal in most country-level time series.
- Cite this work
- Adrian Leon Scholl (Author), 2022, Development of an Automated Conflict Prediction System. State Space ARIMA Methods, Munich, GRIN Verlag, https://www.grin.com/document/1325354