This work presents various corpora and translation tools that can assist during the translation process and that were used here to ensure a high level of translation quality. In this paper, a series of exercises with these corpora and tools has been carried out.
The term “corpus” can be defined as a collection of spoken or written utterances that exist in machine-readable form (annotated corpora) or in their “raw” state (unannotated corpora) and that are used for linguistics-related tasks. However, the term is mainly used to refer to the machine-readable variety.
Corpora are used to analyze lexical, syntactic, and semantic/pragmatic aspects of language, as well as to compare different languages and language registers with each other. The observations found through corpora in these areas can not only support translation; they can also bring clarity to fields such as historical linguistics and language acquisition.
In order to be applicable in those fields, a corpus must be multifunctional and reusable. It therefore needs to conform to standards of falsifiability (the model can be tested on different samples of corpus material and replaced by a better-fitting model if necessary), completeness (the model has to account for unrestricted data), and objectivity (the model can be tested objectively by observers who have no stake in its success or failure).
Table of Contents
1. Introduction
2. Exercise 1: Corpora: Definition, Corpus Parts, Kinds of Corpora and their Usage
2.1 Definition
2.2 Corpus Parts
2.3 Kinds of Corpora
2.4 Usage of Corpora
3. Exercise 2: AntConc: Frequency List, Keyword List, Collocations, N-grams/Clusters, Concordance Plot
3.1 Frequency List
3.2 Keyword List
3.3 Collocational Behaviour
3.4 Comparison between Collocates and Clusters/N-Grams
3.5 Concordance Plot
4. Exercise 3: Using TreeTagger for Annotation, Analyzing Annotation Errors, Analyzing the Tagset
4.1 TreeTagger for Annotation of biology.txt
4.2 Analyzing the Annotation Errors
4.3 Analyzing the Tagset
5. Exercise 4: Creating an XML File, Defining Tags for Metadata, Defining some new Tags
6. Exercise 5: Search in BNC, Usage of Semantically Related Words, Using DWDS
6.1 Search in BNC
6.2 Usage of Semantically Related Words
6.3 Using DWDS
7. Exercise 6: Analyzing German Support Verb Constructions with CQPWeb
9. Exercise 8: Passives in English and German and Translational Universals
9.1 Translational Differences of English and German Passives
9.2 Translational Universals: Shining Through and Normalization
10. Exercise 9: Synonyms and Antonyms
11. Exercise 10: Terminology Database MultiTerm
Objectives and Topics
This work aims to provide practical insights into the application of linguistic tools and corpora to enhance translational quality. It explores how various software tools and digital resources can assist translators in effectively analyzing language patterns, terminology, and syntactic structures.
- Application of corpus-based analysis using AntConc
- Annotation strategies and error analysis with TreeTagger
- Utilization of XML for metadata management
- Comparative corpus studies using BNC and DWDS
- Analysis of translation-specific phenomena like normalization and shining through
Excerpt from the Book
3.1 Frequency List
This AntConc tool counts all the words in the corpus and presents them in a ranked list, showing which words occur most frequently. This can help to study the type of vocabulary used in a text, to identify common word clusters and grammatical patterns, and to compare the frequency of a word across different text files.
The most frequent word in the given “biology text” is the definite article “the”. This is not very surprising, as “the” is the most frequent word in the English language. Beyond that, many prepositions such as “of”, “in”, “to”, “by”, “for” and “with” occur in the text, since they indicate the relationship of a noun/pronoun to another text element. Furthermore, the conjunction “and”, which connects two ideas with each other, has a very high frequency in the text. Finally, the high frequency of the words “dna” and “phage” makes clear that the given text deals with a biological topic.
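The frequency count described above can be sketched in a few lines of Python. This is a simplified stand-in for AntConc's Word List tool, not its actual implementation: the regex tokenization and the miniature sample sentence are illustrative assumptions, not taken from the original `biology.txt`.

```python
import re
from collections import Counter

def frequency_list(text):
    """Lowercase the text, split it into simple word tokens,
    and count how often each token occurs (a simplified version
    of what AntConc's Word List tool does)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

# Tiny illustrative stand-in for the "biology text" of the exercise.
sample = ("The phage injects its DNA into the cell, "
          "and the DNA of the phage replicates.")
for word, count in frequency_list(sample).most_common(4):
    print(word, count)
```

Even in this toy sample, the ranking mirrors the observation above: the article “the” dominates, followed by the topic words “phage” and “dna”.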
Summary of Chapters
1. Introduction: Outlines the purpose of the seminar and the practical application of corpora and translation tools to ensure translational quality.
2. Exercise 1: Corpora: Definition, Corpus Parts, Kinds of Corpora and their Usage: Defines core corpus terminology, structural components like raw data and metadata, and explores different types of corpora.
3. Exercise 2: AntConc: Frequency List, Keyword List, Collocations, N-grams/Clusters, Concordance Plot: Details the practical use of AntConc tools for frequency counts, collocation analysis, and concordance visualization.
4. Exercise 3: Using TreeTagger for Annotation, Analyzing Annotation Errors, Analyzing the Tagset: Explores manual and automatic annotation methods and examines common errors in part-of-speech tagging.
5. Exercise 4: Creating an XML File, Defining Tags for Metadata, Defining some new Tags: Introduces XML as a standard for encoding metadata and describes essential structural rules for markup.
6. Exercise 5: Search in BNC, Usage of Semantically Related Words, Using DWDS: Demonstrates how to use large-scale corpora like BNC and DWDS to observe actual language usage and semantic relationships.
7. Exercise 6: Analyzing German Support Verb Constructions with CQPWeb: Discusses the analysis of German support-verb constructions using CQPWeb and addresses challenges in identifying non-adjacent elements.
9. Exercise 8: Passives in English and German and Translational Universals: Analyzes translation patterns between English and German, specifically focusing on passive constructions and translation-specific universal features.
10. Exercise 9: Synonyms and Antonyms: Illustrates the practical use of a thesaurus for vocabulary enhancement and sentence transformation.
11. Exercise 10: Terminology Database MultiTerm: Provides an overview of using MultiTerm to manage and convert terminological data for specialized translation tasks.
Keywords
Corpus Linguistics, Translation Studies, AntConc, TreeTagger, XML, BNC, DWDS, Support Verb Constructions, Annotation, Normalization, Shining Through, Synonyms, Antonyms, MultiTerm, Terminology Management
Frequently Asked Questions
What is the primary focus of this document?
The document serves as a summary of practical exercises conducted during a seminar on machine translation and technical communication, focusing on the use of digital language resources.
What are the central thematic fields covered?
The work covers corpus linguistics, annotation tools, XML data structure, comparative translation analysis, and terminology management.
What is the primary goal of the presented exercises?
The primary goal is to demonstrate how translators can use specific tools and corpora to improve translational quality and gain deeper insights into linguistic patterns.
Which scientific methods are primarily utilized?
The work utilizes data-driven learning and corpus-based analysis, employing tools like AntConc, TreeTagger, CQPWeb, and MultiTerm to extract and interpret linguistic data.
What topics are addressed in the main part?
The main part systematically covers the technical application of linguistic software, ranging from basic frequency counts to complex annotation error analysis and the investigation of translation universals.
Which keywords best characterize the work?
Key terms include Corpus Linguistics, Translation Studies, Annotation, Terminology Management, and Digital Language Resources.
How does TreeTagger handle capitalized words in this study?
The study notes that TreeTagger has difficulty correctly annotating parts of speech and lemmas when words are written entirely in capital letters, often resulting in "unknown" tags.
What is the difference between "normalization" and "shining through" as discussed in the text?
Normalization refers to the target text conforming to the typical patterns of the target language, while shining through refers to features of the source language being reflected in the translated text.
What are the challenges of identifying support-verb constructions in German using CQPWeb?
The text explains that because German syntax allows for a large distance between the support verb and the noun, standard recognition tools may fail to identify the phrase correctly.
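The distance problem can be illustrated with a small Python sketch. The example sentence, the token window, and the helper function are illustrative assumptions, not material from the study: an adjacency-only search misses the support-verb construction, while a windowed search finds it.

```python
# German support-verb construction "eine Entscheidung treffen"
# ("to make a decision"): in this word order the participle
# stands six tokens away from its noun.
sentence = ("Der Ausschuss hat die Entscheidung nach langer "
            "Beratung erst am Abend getroffen")
tokens = sentence.split()

def find_svc(tokens, noun, verb_forms, max_gap):
    """Return (noun_index, verb_index) if one of the verb forms
    occurs within max_gap tokens after the noun, else None."""
    for i, tok in enumerate(tokens):
        if tok == noun:
            for j in range(i + 1, min(i + 1 + max_gap + 1, len(tokens))):
                if tokens[j] in verb_forms:
                    return (i, j)
    return None

# Adjacent search (gap 0) fails; a window of several tokens succeeds.
print(find_svc(tokens, "Entscheidung", {"getroffen"}, max_gap=0))  # None
print(find_svc(tokens, "Entscheidung", {"getroffen"}, max_gap=6))  # (4, 11)
```

In CQP query syntax, such a windowed search corresponds roughly to a query with an optional token gap, e.g. `[word="Entscheidung"] []{0,6} [word="getroffen"]` (an illustrative query, not one taken from the study).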
- Quote paper
- Marie-Louise Meiser (Author), 2017, Introduction to Language Resources for Translators, Munich, GRIN Verlag, https://www.grin.com/document/1187657