[...] This paper will provide an overview of the different stages that CL has gone through. Early Corpus Linguistics will be presented first, a term that describes all corpus-based work up to the end of the 1950s. At that time, Noam Chomsky's criticism leads early researchers to reconsider their work in ways that call into question much of what had been done up to that point. As a result, corpus research goes through a period of discontinuity. Nevertheless, corpus-based work does not cease entirely, and improvements in computer technology open up completely new possibilities for corpus research. Over the decades, a considerable number of machine-readable corpora are created for an ever wider range of purposes, and they make many different kinds of analysis possible. After the presentation of the chronological development of CL, the penultimate chapter will deal with the concept of modern corpus linguistics and with the definition of a corpus, which is still not definitively settled. Much work is still being done to improve corpus-linguistic methodology. The last chapter will give an overview of future prospects.
Table of Contents
1. Introduction
2. Early Corpus Linguistics
2.1 The concept of early corpus linguistics
2.2 Corpus-based work up to the end of the 1950s
3. Criticism on corpus linguistics
3.1 Noam Chomsky and Abercrombie
3.2 Evaluation of the criticism
4. The influence of computer technology and techniques
4.1 Different generations of corpora
5. The concept of modern corpus linguistics
5.1 Definition of a corpus
6. Future prospects in corpus linguistics
7. Conclusion
Objective & Topics
This paper examines the historical development of corpus linguistics, tracing its evolution from early empirical research through a period of critical decline to its current status as a fundamental methodology in modern linguistics. The study explores how shifts in linguistic theory and technological advancements have shaped the field's perception and practical application.
- Evolution of corpus linguistics from the pre-Chomskyan era to the present.
- Impact of Noam Chomsky's critique on empirical research paradigms.
- Role of computer technology in advancing large-scale linguistic data processing.
- Classification and development of different generations of corpora.
- Methodological integration of corpora, introspection, and computational tools.
Excerpt from the Book
4.1 Different generations of corpora
Despite Chomsky's critical remarks, Randolph Quirk, as mentioned in the paragraph above, begins compiling a corpus of both spoken and written British English known as the Survey of English Usage (SEU). It is launched in 1961 and consists of approximately one million words, half written and half spoken. It is not yet conceived as a computer corpus, since its strength lies in the 'non-computable' data of speech.
Very shortly afterwards, Nelson Francis and Henry Kucera present the Brown Corpus, a collection of printed American English. The Brown Corpus is considered to be the first electronically readable corpus of its kind, with a size of about one million words. To make the corpus a good standard reference, its texts are sampled in different proportions from 15 text categories, among them Press (reportage, editorial, reviews), Skills and Hobbies, Religion, Learned/Scientific, and several Fiction subcategories. Its carefully designed layout is copied by later corpus compilers. Its British English counterpart is Leech's Lancaster-Oslo/Bergen Corpus (LOB), which, like the Brown Corpus, consists of about one million words drawn from texts published in 1961.
Leech calls both the Brown Corpus and the LOB the first generation of corpora, since their million-word bulk seems vast by the standards of the earlier generations of corpus linguistics.
Summary of Chapters
1. Introduction: Presents the historical transformation of corpus linguistics from an "impossible" endeavor to a popular mainstream methodology.
2. Early Corpus Linguistics: Details corpus-based activities prior to the 1950s, emphasizing the structuralist tradition and empirical data collection.
3. Criticism on corpus linguistics: Discusses the paradigm shift triggered by Noam Chomsky's rationalist critique and its impact on the field.
4. The influence of computer technology and techniques: Explores how digital tools and computational developments enabled the revival and growth of corpus research.
5. The concept of modern corpus linguistics: Analyzes the contemporary definition of a corpus and its role as a cross-disciplinary, methodologically sound tool.
6. Future prospects in corpus linguistics: Addresses remaining challenges in manual analysis, spoken data collection, and the move toward multimedia corpora.
7. Conclusion: Synthesizes the evolution of the field and confirms the necessity of integrating multiple approaches for future linguistic research.
Keywords
Corpus Linguistics, Noam Chomsky, Empiricism, Rationalism, Computational Linguistics, Machine-readable corpora, Brown Corpus, SEU, LOB, Language acquisition, Introspection, Data retrieval, Annotation, Tagging, Parsing
Frequently Asked Questions
What is the primary subject of this academic paper?
The paper covers the historical development of corpus linguistics, specifically focusing on how the field evolved from pre-1950s empirical work to its modern, computer-aided state.
What are the core thematic areas discussed?
Key areas include the early days of empirical language research, the critical impact of Noam Chomsky, the technological influence of computers, and the development of different generations of corpora.
What is the primary research goal of the work?
The goal is to provide a comprehensive overview of how corpus linguistics transformed from an overlooked or criticized methodology into a vital, interdisciplinary tool in modern linguistics.
Which scientific method is predominantly used?
The paper utilizes a historical and descriptive methodology, synthesizing existing literature and critical theories to map the development of corpus linguistic research.
What does the main body of the text cover?
It covers early corpus-based methods, the rationalist critique of the 1960s, the evolution of software for data processing, and the evolving criteria for defining a "modern" corpus.
Which keywords best characterize the work?
Primary keywords include Corpus Linguistics, Empiricism, Rationalism, Computational Linguistics, and Machine-readable corpora.
How does the author define the shift in linguistic paradigms?
The author describes a shift from early empiricism toward rationalism following Noam Chomsky's work in the late 1950s, and a later, contemporary shift back to an improved empiricist methodology aided by technology.
What makes the "Brown Corpus" significant?
The Brown Corpus is identified as the first electronically-readable corpus of its kind, serving as a standard reference that heavily influenced the design of subsequent corpora.
Why are "monitor corpora" like the Bank of English special?
Unlike traditional, finite corpora, monitor corpora are constantly updated, allowing them to track ongoing changes in language, which is particularly useful for lexicographic work.
What role does human-computer interaction play in corpus research?
The paper categorizes this interaction into four models—Data retrieval, Symbiotic, Self-organizing, and Discovery procedure—based on the degree to which a human analyst delegates tasks to the computer.
- Bernadette Wonner (Author), 2005, The development of corpus linguistics to its present-day concept, Munich, GRIN Verlag, https://www.grin.com/document/38258