The first goal of this thesis is to evaluate static, dynamic and hybrid approaches and to draw conclusions about the domains named in its title. To that end, the characteristics that distinguish the three approaches are derived from the respective publications using qualitative and quantitative evaluation criteria, and the evaluation of hybrid approaches is intended to fill a gap in the comparative literature. The aim is to build a high-level understanding of the different methods and to identify differences and commonalities between them, based on research literature that presents new approaches within these domains. In particular, the strengths, weaknesses and special properties of the three domains are to be determined. The second goal is to develop a more comprehensive practical understanding of ML-based malware detection techniques, which is the subject of the practical section. There, the ML workflow model is used to propose and implement a static malware detector step by step, using the Python programming language and various ML algorithms.
Accordingly, the three primary research questions this thesis aims to address are as follows:
1. Which static, dynamic and hybrid ML-based approaches exist in current and past research, and how do they work?
2. How do the underlying methodological domains (static, dynamic and hybrid) compare when assessed against multiple quantitative and qualitative evaluation criteria?
3. How can a static malware detection model be implemented hands-on in practice, using the ML workflow process model as a guideline?
Table of Contents
- 1 Introduction
- 1.1 Initial situation
- 1.2 Problem description
- 2 Scope of the thesis
- 3 Theory
- 3.1 Malware
- 3.1.1 Definition
- 3.1.2 Malware Evolution
- 3.1.3 Types of malware
- 3.2 Program architecture in Microsoft Windows (MW)
- 3.2.1 The Portable Executable file format
- 3.2.2 Relevant insights for malware analysis
- 3.3 Malware Detection
- 3.3.1 Methodologies
- 3.3.2 Evading detection by Obfuscation
- 3.4 Machine Learning (ML)
- 3.4.1 Definition
- 3.4.2 Features
- 3.4.3 ML-Workflow
- 3.4.4 ML-Paradigms
- 3.4.5 ML-Algorithms
- 3.4.6 Model accuracy and metrics
- 4 Literature Review: ML Approaches in Research
- 4.1 Review outline
- 4.1.1 Structure
- 4.1.2 Literature overview
- 4.1.3 Evaluation criteria
- 4.2 Malware Feature Taxonomy
- 4.3 Static ML approaches
- 4.3.1 Quantitative evaluation
- 4.3.2 Qualitative evaluation
- 4.4 Dynamic ML approaches
- 4.4.1 Quantitative evaluation
- 4.4.2 Qualitative evaluation
- 4.5 Hybrid ML approaches
- 4.5.1 Quantitative evaluation
- 4.5.2 Qualitative evaluation
- 4.6 Conclusive learning from literature
- 5 Practical Review: Implementing a Static ML-Based Malware Detector
- 5.1 Safety measures and disclaimer
- 5.2 Requirements and resources
- 5.2.1 Test-environment: Guest OS and host OS
- 5.2.2 PE file repository: VirusShare and EMBER
- 5.2.3 Feature extraction: Python PEpper
- 5.2.4 Model training and validation: WEKA
- 5.3 Implementation
- 5.3.1 Phase 1: Data gathering
- 5.3.2 Phase 2: Data preparation
- 5.3.3 Phase 3: Model training
- 5.3.4 Phase 4: Model validation
- 5.4 Conclusive learning from practical implementation
- 6 Conclusion and Outlook
- 6.1 Conclusion
- 6.2 Outlook
Objectives and Key Themes
This bachelor thesis aims to provide a comprehensive overview of machine learning (ML) based malware detection methods for Windows programs, covering static and dynamic approaches as well as hybrid approaches that combine the two. It analyzes the strengths and weaknesses of each approach and compares how effectively they identify malicious software. It also surveys how these methods are used in research and walks through the practical implementation of a static ML-based malware detector.
- Evolution and characteristics of malware
- Static and dynamic malware detection methods
- Application of machine learning in malware detection
- Evaluation of different ML approaches
- Practical implementation of a static ML-based malware detector
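To make the idea of static analysis concrete, the following is a minimal sketch of reading a few header fields from a Portable Executable file without running it. It is not the thesis's feature extractor: the function names, the choice of fields and the synthetic sample bytes are assumptions made for illustration, and only the Python standard library is used.

```python
import struct

# Layout of the 20-byte COFF file header that follows the "PE\0\0" signature:
# Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
# NumberOfSymbols, SizeOfOptionalHeader, Characteristics (little-endian).
COFF_HEADER = "<HHIIIHH"
IMAGE_FILE_DLL = 0x2000  # Characteristics flag marking a DLL


def build_minimal_pe(num_sections=3, timestamp=0x5F000000):
    """Build synthetic header bytes for the demo (hypothetical helper).

    The result has a valid DOS stub and COFF header but is not a runnable program.
    """
    dos_header = bytearray(64)
    dos_header[0:2] = b"MZ"                       # DOS magic number
    struct.pack_into("<I", dos_header, 0x3C, 64)  # e_lfanew: offset of PE signature
    coff = struct.pack(COFF_HEADER, 0x8664, num_sections, timestamp, 0, 0, 0, 0x0022)
    return bytes(dos_header) + b"PE\x00\x00" + coff


def extract_header_features(data):
    """Return a few static features from the PE headers without executing the file."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS header magic 'MZ'")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    machine, n_sections, timestamp, _, _, _, characteristics = struct.unpack_from(
        COFF_HEADER, data, pe_offset + 4
    )
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "timestamp": timestamp,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }


if __name__ == "__main__":
    print(extract_header_features(build_minimal_pe()))
```

Fields like these are cheap to obtain because the file is only parsed, never executed. A dynamic approach would instead observe the program's behavior while it runs.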
Chapter Summaries
Chapter 1 introduces the initial situation and the problem posed by malware threats to Microsoft Windows systems. Chapter 2 outlines the scope of the thesis and the research questions to be addressed. Chapter 3 lays the theoretical foundation: it defines malware, traces its evolution and describes its types. It also covers program architecture in Microsoft Windows, focusing on the Portable Executable file format and its relevance for malware analysis, and then turns to malware detection, including methodologies, evasion by obfuscation, and the role of machine learning. Chapter 4 reviews the literature on ML-based malware detection, covering static, dynamic and hybrid approaches. It evaluates each group quantitatively and qualitatively and summarizes the key findings from research. Chapter 5 presents the practical implementation of a static ML-based malware detector in four phases (data gathering, data preparation, model training and model validation) and discusses the challenges and practical considerations encountered along the way. Finally, Chapter 6 summarizes the key findings and gives an outlook on future research directions and potential improvements to existing methods.
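The four phases named for Chapter 5 can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not the thesis's implementation: the data is synthetic, the two features (a section-entropy value and an import count) are invented for the example, and a simple nearest-centroid classifier in pure Python stands in for whatever algorithms and tooling the thesis actually uses.

```python
import random


def gather_data(n_per_class=50, seed=0):
    """Phase 1: synthetic (features, label) pairs; 0 = benign, 1 = malicious."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_per_class):
        # Hypothetical features: [import count (scaled), section entropy]
        samples.append(([rng.gauss(4.0, 1.0), rng.gauss(6.0, 0.5)], 0))
        samples.append(([rng.gauss(9.0, 1.0), rng.gauss(7.5, 0.5)], 1))
    return samples


def prepare_data(samples, test_ratio=0.3, seed=0):
    """Phase 2: shuffle and split into training and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]


def train_model(train):
    """Phase 3: compute one mean feature vector (centroid) per class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def validate(y_true, y_pred):
    """Phase 4: accuracy, precision and recall for the malicious class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def run_workflow():
    train, test = prepare_data(gather_data())
    model = train_model(train)
    y_true = [y for _, y in test]
    y_pred = [predict(model, x) for x, _ in test]
    return validate(y_true, y_pred)


if __name__ == "__main__":
    print(run_workflow())
```

Keeping each phase in its own function mirrors the workflow structure and makes it easy to swap one step, for example replacing the synthetic data with features extracted from real files, without touching the others.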
Keywords
The thesis centers on malware detection, machine learning, and Windows program analysis. It covers static and dynamic ML-based approaches and analyzes how effectively they detect malicious software. It emphasizes the role of feature extraction and model evaluation in building robust malware detection systems, and it gives insight into the practical implementation of these techniques.
- Cite this work
- Lars Kaiser (Author), 2022, Static and Dynamic Machine Learning Based Malware Detection Methods for Windows Programs, Munich, GRIN Verlag, https://www.grin.com/document/1323478