Advances in computational linguistics and statistics have had a clear impact on the emergence of corpus linguistics and on the sophistication of its applications, which now address not only purely linguistic questions but also real-life problems. One such area is authorship attribution.
Authorship attribution is a field of study concerned with identifying the most likely author of an anonymous or disputed document from a set of suspected authors. To this end, several methodologies, techniques, and approaches have been devised and repeatedly assessed on various data sets to test their effectiveness. Although the literature shows no consensus as to which methodology is best, all authorship attribution studies rest on the assumption that each author has a particular "linguistic fingerprint" that can be captured by detecting and measuring the linguistic clues hidden in their authorial style.
Adopting an experimental framework, this study attempts to gauge the discriminating and clustering power of the selected methodology on a particular type of data: samples of political journal articles. The compiled corpus is a special purpose one, strictly controlled for genre, register, and date of publication. It comprises eleven samples extracted from eleven articles, ranging in length from 1,101 to 1,113 words; three are taken as test (hypothetically questioned) samples and the remaining eight as training samples. The corpus represents the journalistic writings of four authors.
Table of Contents
- Introduction
- CHAPTER 1
- 1.1 Computational Linguistics and Corpus Linguistics
- 1.2 Corpus-Based vs. Corpus-Driven Studies
- 1.3 A Historical Background of Corpus Linguistics
- 1.4 Types of Corpora
- 1.4.1 General Reference vs. Special Purpose Corpus
- 1.4.2 Written vs. Spoken Corpus
- 1.4.3 Monolingual vs. Multilingual Corpus
- 1.4.4 Synchronic vs. Diachronic Corpus
- 1.4.5 Open vs. Closed Corpus
- 1.4.6 Learner Corpus
- 1.4.7 Online Corpus / Web as a Corpus
- 1.5 Methods in Corpus Linguistics
- 1.5.1 Concordance
- 1.5.2 Frequency / WordLists
- 1.5.3 Keyword Lists
- 1.5.4 Collocate Lists
- 1.5.5 Dispersion Plots
- CHAPTER 2
- 2.1 Introduction
- 2.2 Areas of Forensic Linguistics and Corpus Linguistics
- 2.2.1 Qualitative and Quantitative Analysis in Forensic Linguistics
- 2.2.2 Authorship Attribution
- CHAPTER 3
- 3.1 Introduction
- 3.2 Text Corpus
- 3.2.1 Text Genre and Time of Publication
- 3.2.2 Sampling Methodology
- 3.2.3 Length of Text Samples
- 3.3 The Research Methodology
- 3.3.1 The Stylistics Method
- 3.3.2 The Computational Method
- 3.3.3 The Statistical Method / SPSS (Version 19)
- 3.3.4 Authorship Attribution Approach
- CHAPTER 4
- 4.1 Qualitative Analysis
- 4.2 Quantitative Analysis
- 4.2.1 Wordsmith Tools and Excel Program
- 4.2.2 SPSS (19)
- CHAPTER 5
- 5.1 Conclusions
- 5.2 Recommendations
- 5.3 Suggestions for Further Researches
Objectives and Key Themes
This book aims to explore the efficacy of corpus linguistics in addressing forensic authorship attribution inquiries, specifically focusing on the discriminatory power of function words. It utilizes a special purpose corpus of political journalism articles to evaluate the methodology and its applicability in authorship attribution cases. The book seeks to answer key questions concerning the feasibility of the intended methodology, the discriminatory power of English function words in forensic authorship attribution, and the overall efficiency of the approach when applied to a specific corpus type.
- The application of corpus linguistics in forensic authorship attribution
- The discriminatory power of function words in authorship attribution
- The effectiveness of the methodology when applied to a specific corpus type
- The potential of corpus linguistics in solving authorship disputes
- The theoretical underpinnings of authorship attribution
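To make the idea of "discriminatory power of function words" concrete, the following is a minimal Python sketch, not the authors' procedure (the book works with Wordsmith Tools, Excel, and SPSS). It assumes a short, hypothetical function-word list, rates per 1,000 tokens, and a simple Manhattan distance to each author's mean profile; all function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical, deliberately short function-word list (an assumption,
# not the list used in the book).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "for"]


def tokenize(text):
    """Lowercase the text and return word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text):
    """Rate of each function word per 1,000 tokens of the sample."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [1000 * counts[w] / total for w in FUNCTION_WORDS]


def manhattan(p, q):
    """Sum of absolute differences between two profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_author(questioned, training):
    """Return (author, distance) for the nearest mean training profile.

    `training` maps an author name to a list of that author's sample texts.
    """
    q = profile(questioned)
    best = None
    for author, samples in training.items():
        profiles = [profile(s) for s in samples]
        mean = [sum(col) / len(col) for col in zip(*profiles)]
        d = manhattan(q, mean)
        if best is None or d < best[1]:
            best = (author, d)
    return best
```

With samples of roughly 1,100 words, as in the corpus described above, per-1,000-token rates keep profiles comparable across samples of slightly different lengths. A smaller distance only indicates a more similar function-word profile; it is not by itself evidence of authorship.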
Chapter Summaries
The book delves into the theoretical foundations of corpus linguistics and forensic authorship attribution. Chapter 1 provides an overview of computational linguistics, corpus linguistics, and its historical development. It elaborates on various corpus types, including general reference, special purpose, written, spoken, monolingual, multilingual, synchronic, diachronic, open, closed, learner, and online corpora. The chapter also explores key methods in corpus linguistics, such as concordance, frequency/word lists, keyword lists, collocate lists, and dispersion plots.
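Two of the methods named above, frequency lists and concordances, can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than a description of how Wordsmith Tools implements them: the tokenizer is simplistic and the function names `word_list` and `concordance` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(text, top=5):
    """Frequency list: the `top` most common tokens with their counts."""
    return Counter(tokenize(text)).most_common(top)


def concordance(text, node, width=3):
    """Key-word-in-context lines for every occurrence of `node`.

    Each line is a (left context, node, right context) tuple with up to
    `width` tokens on either side.
    """
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines
```

A frequency list summarizes how often each word occurs, whereas a concordance shows each occurrence in its immediate context, which is what allows a quantitative count to be checked qualitatively.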
Chapter 2 examines the intersection of forensic linguistics and corpus linguistics, focusing on the use of qualitative and quantitative analyses for authorship attribution. It delves into the theory of idiolect and its practical implications in forensic investigations.
Chapter 3 introduces the text corpus used in the study, outlining the sampling methodology, the length of text samples, and the research methodology employed. This includes the stylistics method, the computational method, the statistical method using SPSS, and the specific approach adopted for authorship attribution.
Chapter 4 presents a qualitative and quantitative analysis of the data, utilizing tools such as Wordsmith and Excel, as well as SPSS. It explores the results of the analysis and the insights derived from the data.
Keywords
The key terms and concepts explored in this book include corpus linguistics, forensic authorship attribution, function words, idiolect, discriminatory power, methodology, special purpose corpus, political journalism, qualitative analysis, quantitative analysis, Wordsmith, Excel, SPSS, and authorship disputes.
Citation
- Khalid Shakir Hussein (Author), Eman Abdul Kareem (Author), 2017, A Corpus-Based Analysis of Using Function Words in English Forensic Authorship Attribution, Munich, GRIN Verlag, https://www.grin.com/document/385050