White-collar crime is, and has always been, an urgent issue for society. In recent years it has increased dramatically as a result of technological advances. Studies show that companies are affected every year by corruption, balance-sheet manipulation, embezzlement, criminal insolvency and other economic crimes. Companies are usually unable to identify the damage caused by fraudulent activities. To prevent fraud, they can use intelligent IT approaches: a data analyst or investigator can draw on the data that is now stored digitally to detect fraud.

In the age of Big Data, the volume of digital information is growing enormously. Storage is cheap and no longer a limiting factor. Estimates suggest that up to 80 percent of all operational information is stored as unstructured text documents. This bachelor thesis examines Data Mining and Text Mining as intelligent IT approaches for fraud detection in white-collar crime. Text Mining is related to Data Mining; the two are distinguished by the source and the structure of the information. Text Mining is mainly concerned with weakly structured or unstructured data, while Data Mining often relies on structured sources.

The thesis begins with an overview of white-collar crime. To this end, the three essential tasks of fraud management are discussed. Cressey's fraud triangle is used to show which conditions must come together for an offender to commit a fraudulent act. Some well-known types of white-collar crime are then considered in more detail.

A Text Mining approach is used to demonstrate how potentially useful knowledge can be extracted from unstructured text. For this purpose, two self-generated e-mails are converted into a structured format. In addition, a case study on fraud detection is conducted on a credit card dataset that contains legitimate and fraudulent transactions. Based on a literature review, Data Mining techniques are selected and applied to the dataset using various sampling techniques and hyperparameter optimization, with the goal of correctly identifying fraudulent transactions. The CRISP-DM reference model serves as the methodological framework.

Reading sample

1 Introduction

1.1 Motivation and problem statement

1.2 Research Methodology

1.3 Goal and structure of the thesis

2 White-Collar Crime

2.1 Fraud Management

2.1.1 Fraud Prevention

2.1.2 Fraud Detection

2.1.3 Fraud Investigation

2.2 Fraud Triangle

2.2.1 Opportunity

2.2.2 Incentive/Pressure

2.2.3 Rationalization/Attitude

3 Types of White-Collar-Crimes

3.1 Fraud

3.2 Credit Card Fraud

3.3 Healthcare Fraud

3.4 Embezzlement

3.5 Criminal Insolvency Offences

3.6 Corruption

4 Data Mining, Text Mining and Big Data

4.1 Introduction into Big Data

4.1.1 The 3 V’s of Big Data

4.1.2 Data Forms

4.2 Data Mining

4.2.1 Types of Machine Learning

4.2.2 Classification of Data Mining Applications

4.3 Text Mining

4.3.1 Practise areas of Text Mining

4.3.2 Example of feature extraction from unstructured data

4.4 Context of Data Mining and Text Mining in White-Collar Crime

5 A case study on Credit Card Fraud Detection

5.1 Overview

5.2 Data Exploration

5.3 Confusion Matrix Terminology

5.4 Algorithms and Techniques

5.4.1 Literature review on Data Mining Techniques

5.4.2 Selection of Data Mining Techniques

5.5 Sampling techniques

5.6 Train and Test Set

5.7 Imbalanced Data

5.7.1 Results on imbalanced Data

5.8 Undersampled Data

5.8.1 Results on undersampled Data

5.9 Oversampled Data

5.9.1 Results on oversampled Data

5.10 Oversampled Data with SMOTE

5.10.1 Results on Oversampled Data with SMOTE

5.11 Undersampled Data with Hyperparameters Optimization

5.11.1 Model Parameter and Hyperparameter

5.11.2 Hyperparameter optimization algorithms

5.11.3 Explanation of selected Hyperparameters

5.11.4 Cross-Validation

5.11.5 Selection of Hyperparameter Optimization Algorithm and k-fold CV

5.11.6 Results on Undersampled Data with Hyperparameter Optimization

5.12 Review of the case study: Credit Card Fraud Detection

6 Conclusion

Research Goal and Focus Areas

The primary research objective of this thesis is to determine which data mining techniques are effective for detecting fraudulent activities in both structured and unstructured datasets, specifically within the context of white-collar crime. The study bridges the gap between theoretical crime analysis and practical technical implementation.

White-collar crime dynamics and the fraud triangle theory.
Data mining and text mining applications for fraud detection.
Big data management and unstructured data transformation.
Evaluation of machine learning algorithms through a credit card fraud case study.

Excerpt from the Book

4.3.1 Practise areas of Text Mining

Information Retrieval (IR): The main task of IR is not to analyse the data, but to index, search and retrieve documents from large text databases with keyword queries (Miner et al., 2012: 36). Today, IR systems are used in almost every application. For example, the Internet search engine Google relies on this technology, but other applications, e.g. e-mail clients and text editors, also use IR systems to let users retrieve results through keyword queries. In summary, the goal of IR “…is to connect the right information with the right users at the right time…” (Aggarwal and Zhai, 2012: 2).
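The indexing and keyword lookup described above can be illustrated with a small inverted index. This is only a minimal sketch and not the thesis author's implementation: the function names (`tokenize`, `build_index`, `search`) and the three sample messages are hypothetical, and real IR systems add ranking, stemming and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical mini-corpus, invented for illustration only.
documents = {
    "mail_1": "Please approve the invoice before the audit starts.",
    "mail_2": "The audit report is attached.",
    "mail_3": "Lunch at noon?",
}
index = build_index(documents)
hits = search(index, "audit invoice")
```

Here `hits` contains only the documents in which both keywords occur; nothing is analysed or classified, which matches the distinction drawn above between retrieval and analysis.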

Information Extraction (IE): IE is one of the more mature fields in text mining, with the aim of constructing structured data from unstructured text (Miner et al., 2012: 37). With this technique, meaningful information can be extracted from large amounts of text (Talib et al., 2016: 415). However, this cannot be done without great effort. Extracting data from large amounts of text is not easy and requires special algorithms and software (Miner et al., 2012: 37). “IE systems are used to extract specific attributes and entities from the document and establish their relationship. The extracted corpus is stored in the database for further processing.” (Talib et al., 2016: 415).
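As a rough illustration of turning unstructured text into a structured record, the following sketch pulls e-mail addresses and monetary amounts out of a message with regular expressions. It is an assumption-laden toy: the patterns, the `extract_record` helper and the sample message are hypothetical, and production IE systems rely on far more robust entity and relation extraction.

```python
import re

# Deliberately simple patterns; they are assumptions for this sketch
# and will miss many real-world address and amount formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
AMOUNT = re.compile(r"(\d+(?:\.\d{2})?)\s?(EUR|USD)")


def extract_record(text):
    """Build a structured record (a dict) from free text."""
    return {
        "emails": EMAIL.findall(text),
        "amounts": [(float(value), currency)
                    for value, currency in AMOUNT.findall(text)],
    }


# Hypothetical message, invented for illustration only.
message = "Transfer 2500.00 EUR to the account; confirm to j.doe@example.com today."
record = extract_record(message)
```

The resulting dictionary could then be stored as a database row for further processing, in the spirit of the quotation above.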

Document Clustering: According to Miner et al. (2012: 959), clustering or cluster analysis is the oldest technology of text mining and was used by the military in document recovery systems during World War II. Today, document clustering uses DM algorithms to group similar documents into clusters (ibid.: 36). The goal of clustering is to classify text documents into groups by applying different clustering algorithms (Talib et al., 2016: 416). Clustering is a method of unsupervised learning; no training is required, unlike in supervised learning. Unsupervised learning is not as powerful as supervised learning, but more versatile.
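To make the idea of grouping similar documents without training labels concrete, here is a minimal sketch of leader (sequential) clustering on word-count vectors with cosine similarity. The choice of algorithm, the threshold of 0.3 and the four sample texts are assumptions made for this illustration; the thesis excerpt does not state which clustering algorithms it uses.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Represent a document as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def leader_cluster(texts, threshold=0.3):
    """Assign each text to the first cluster whose leader is similar enough,
    otherwise start a new cluster with this text as its leader."""
    leaders, clusters = [], []
    for i, text in enumerate(texts):
        vec = vectorize(text)
        for c, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[c].append(i)
                break
        else:
            leaders.append(vec)
            clusters.append([i])
    return clusters


# Hypothetical documents, invented for illustration only.
texts = [
    "invoice payment overdue invoice",
    "team lunch on friday",
    "payment for invoice received",
    "friday lunch menu",
]
clusters = leader_cluster(texts)
```

No labels are supplied at any point: the groups emerge purely from word overlap, which is what makes the method unsupervised. The result depends on document order and on the threshold, a known limitation of this simple scheme.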

Summary of Chapters

1 Introduction: Discusses the growing societal problem of white-collar crime and establishes the research question regarding appropriate data mining techniques for fraud detection.

2 White-Collar Crime: Provides an overview of white-collar crime definitions, management strategies, and introduces the Fraud Triangle by Cressey.

3 Types of White-Collar-Crimes: Examines specific categories such as credit card fraud, embezzlement, insolvency offences, and corruption to illustrate the financial impact of such crimes.

4 Data Mining, Text Mining and Big Data: Explains technical foundations, data formats, and how text mining transforms unstructured data into forms suitable for predictive machine learning models.

5 A case study on Credit Card Fraud Detection: The practical core of the thesis, detailing the application of various data mining algorithms to a real-world credit card dataset using the CRISP-DM model.

6 Conclusion: Summarizes key findings, answers the research question based on the empirical results, and provides recommendations for future research.

Keywords

White-collar crime, Fraud detection, Data mining, Text mining, Big data, Machine learning, CRISP-DM, Credit card fraud, Classification, Unstructured data, Supervised learning, Hyperparameter optimization, Logistic regression, Support vector machine, Neural networks.

Frequently Asked Questions

What is the core focus of this bachelor thesis?

The thesis focuses on using intelligent IT approaches, specifically Data Mining and Text Mining, to identify and mitigate white-collar crimes within large datasets.

Which specific crime types are addressed in the study?

The study examines financial fraud, credit card fraud, healthcare fraud, embezzlement, criminal insolvency, and corruption.

What is the primary research question?

The main question asks which data mining techniques are most appropriate for detecting white-collar crimes in structured and unstructured data.

Which scientific methodology is applied?

The thesis utilizes a literature analysis based on Webster & Watson for the theory, and the CRISP-DM (Cross-Industry Standard Process for Data Mining) reference model for the empirical case study.

What is covered in the main body of the work?

The main body covers the theoretical background of fraud, an introduction to big data and text mining techniques, and a detailed case study on credit card fraud detection.

What are the key technical concepts described in the work?

Key concepts include supervised and unsupervised learning, sampling techniques for imbalanced datasets (e.g., SMOTE), and hyperparameter optimization to enhance predictive model accuracy.
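The sampling ideas mentioned in this answer can be sketched in plain Python. The code below shows random undersampling of the majority class and a simplified SMOTE-style interpolation that uses only the single nearest minority neighbour (SMOTE proper chooses among k nearest neighbours). The function names, the label convention (1 = fraud, 0 = legitimate) and the toy data are assumptions for illustration, not the thesis's code or dataset.

```python
import random


def random_undersample(rows, labels, rng):
    """Keep all minority rows (label 1) and an equally sized random
    sample of majority rows (label 0)."""
    minority = [r for r, y in zip(rows, labels) if y == 1]
    majority = [r for r, y in zip(rows, labels) if y == 0]
    kept = rng.sample(majority, len(minority))
    return minority + kept, [1] * len(minority) + [0] * len(kept)


def smote_like(minority, n_new, rng):
    """Create synthetic rows on the line segment between a random minority
    row and its nearest minority neighbour (simplified: k = 1)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = min(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data, invented for illustration: 100 rows, 5 of them labelled fraud.
rows = [[float(i), float(i % 7)] for i in range(100)]
labels = [1 if i % 20 == 0 else 0 for i in range(100)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
balanced_rows, balanced_labels = random_undersample(rows, labels, rng)
fraud_rows = [r for r, y in zip(rows, labels) if y == 1]
new_points = smote_like(fraud_rows, 3, rng)
```

Undersampling discards legitimate transactions to balance the classes, whereas the SMOTE-style step adds artificial fraud rows instead; which trade-off works better is an empirical question of the kind the case study investigates.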

Why is the CRISP-DM model relevant to this research?

It provides a structured, iterative framework (Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment) for conducting the credit card fraud detection project.

How does text mining contribute to the fraud detection process?

Text mining is essential for converting unstructured data—such as emails or documents—into a structured format (e.g., using Vector Space Model or TF-IDF) so that predictive machine learning algorithms can process them.
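A minimal TF-IDF sketch shows what such a conversion can look like. It uses one common variant (term frequency divided by document length, multiplied by the logarithm of the inverse document frequency); other weightings exist, and the excerpt does not say which one the thesis applies. The two sample e-mails and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return the sorted vocabulary and one TF-IDF row per document."""
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([
            (counts[t] / len(doc)) * math.log(n_docs / doc_freq[t]) if doc else 0.0
            for t in vocabulary
        ])
    return vocabulary, rows


# Hypothetical e-mails, invented for illustration only.
emails = [
    "Please transfer the money today",
    "The meeting is moved to today",
]
vocabulary, matrix = tf_idf(emails)
```

Each e-mail becomes a fixed-length numeric row over the shared vocabulary. Words that occur in every document, such as "today" in this example, receive a weight of zero, while words specific to one message receive a positive weight.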


Details

Title: Fraud Detection in White-Collar Crime
University: Hochschule Heilbronn Technik Wirtschaft Informatik
Grade: 1.3
Author: Rohan Ahmed (Author)
Year of publication: 2017
Pages: 93
Catalog number: V426831
ISBN (eBook): 9783668738348
ISBN (Book): 9783668738355
Language: English
Keywords: fraud detection white-collar crime
Product safety: GRIN Publishing GmbH

Cite this work: Rohan Ahmed (Author), 2017, Fraud Detection in White-Collar Crime, Munich, GRIN Verlag, https://www.grin.com/document/426831
