Linguistic Aspects in Machine Translation


Project Report, 2006
22 Pages, Grade: 1,3


Table of Contents

I. Introduction
I. 1.) What is Machine Translation?
I. 2.) Why Machine Translation Matters
I. 2. a) Social and Political Importance of MT
I. 2. b) Scientific Importance of MT
I. 2. c) Commercial Importance of MT
I. 2. d) Philosophical Importance of MT

II. The History of Machine Translation
II. 1.) The First Years of Translation Machines
II. 2.) A Pioneer: Warren Weaver, Founder of the Idea of MT
II. 3.) The Latter Years in MT

III. Machine Translation in Practice
III. 1.) MT Test: Google
III. 2.) To Avoid Mistakes

IV. Linguistic Aspects in MT
IV. 1.) Semantics
IV. 2.) Pragmatics
IV. 3.) Real World Knowledge

V. Computational Linguistics
V. 1.) Methods of MT
V. 2.) Commonly Acknowledged Translation Systems
V. 2. a) LOGOS
V. 2. b) METAL
V. 2. c) METEO

VI. Epilogue: On the Future of MT

VII. Bibliography
VII. 1.) Literature
VII. 2.) Internet Sources

I. Introduction

I. 1.) What is Machine Translation?

A translation machine is a specialised software system developed for translating from one human language into another. Strictly speaking, however, “[machine translation systems are not actually machines but are better thought of] as programs that run on computers, which really are machines” (Arnold et al. 1994:10). Machine translation (henceforth abbreviated as MT), or Mechanical Translation as it was called in its early days, is a subfield of artificial intelligence (AI), and both belong to the large area of computer science (CS).

The field of machine translation is widely considered one of the most difficult areas of computational linguistics. This is because it requires interdisciplinary knowledge from the scientists who develop translation machines: knowledge of informatics and language cognition, skills in translation and in methods of language description, and specialised knowledge of the subject fields covered by the texts to be translated (see: Schwanke 1991:11).

Furthermore, if translation machines were to take over translation work completely, they would have to cover all the capacities of a human translator. Human translators have to strike a pragmatic or aesthetic balance between the source text and the target text (see: Wilss 1988:VII). Translators can draw on skills, linguistic knowledge and transcultural knowledge to meet the expectations of the source text writer and the target text reader. According to Wilss, another tool is a translator’s “intuition”. He suggests that it is “some kind of sixth sense” and “the opposite of calculatable dynamics”. In his view, it belongs to the translator’s mysterious, notorious “black box”: we know that this black box exists, but we have only an intuitive image of it. Wilss adds that intuition is a “mental axiom” that cannot be challenged (129).

So if a translator’s ‘intuition’ is so hard to define, how can it be synthesised within computer software, within a machine?

These reasons only hint at what is, at the very least, involved in developing translation machines. It is therefore not surprising that enthusiasm for computers taking over the translation of human languages, and belief in that future, have see-sawed ever since the idea was born.

I. 2.) Why Machine Translation Matters

MT was “one of the earliest applications” (Arnold et al. 1994:iii) suggested for digital[1] computers[2]. An artist might argue that a painting is never really finished. In the same way, the development of computer science as a whole is still in progress, and so is the “long-term scientific dream” of MT.

MT is also becoming increasingly important in several fields of human enterprise, as the following sections explain.

I. 2. a) Social and Political Importance of MT

The social and political importance of MT “arises from the socio-political importance of translation in communities where more than one language is generally spoken” (4). In such communities, adopting a common lingua franca would be the obvious alternative. A lingua franca, however, implies that the chosen language dominates the community, to the disadvantage of the speakers of the other language(s).

The other language(s) may then become “second class” or, in the worst case, disappear. This should undoubtedly matter, because it involves a potential loss of culture and of ways of thinking and living. “So translation is necessary for communication (…)”. This holds even if it means putting up with its side effects. When information is translated into another language and made accessible to another cultural community, some of its semantic and/or cultural details may be modified or even lost. The modern world’s demand for translation “far outstrips any possible supply”, owing to the actual shortage of human translators and translation capacity. Consequently, “the automation of translation is a social and political necessity for modern societies which do not wish to impose a common language on their members”. Cases such as the Spanish-speaking parts of the USA or the Welsh-speaking parts of Great Britain make this point obvious. Switzerland or the European Community, where multilingualism is part of everyday life, make it even more so.

I. 2. b) Scientific Importance of MT

The scientific importance of MT results from its being an interesting application and testing ground for ideas in CS, AI and linguistics. Some of the most important developments in these fields actually began in MT. One example is Prolog,[3] the first widely available logic programming language, which formed a key part of the Japanese Fifth Generation programme.[4] Its origins lie in work that was originally done for MT (see: 5).

I. 2. c) Commercial Importance of MT

In today’s world of business, the commercial importance of MT should not be underestimated. Firstly, there is the matter of accessibility. A customer is more likely to buy a Japanese product with a manual written in English than one whose manual is written in Japanese, and even more so when buying a safety-critical system. Secondly, translation is expensive and requires highly skilled (and highly paid) workers. An average human translator may be able to manage 4-6 pages a day (see: 1994:5), which may cause delays during the development and launch of a new product. Up to 40-45% of the running costs of European Community institutions are ‘language costs’, “of which translating and interpreting are the main element” (1994:5). These costs amounted to about £300 million per year. This figure relates only to the translation actually being done, not to the amount of translation required (see: Patterson 1982).

I. 2. d) Philosophical Importance of MT

MT is also a philosophical challenge, because “it represents the attempt to automate an activity that can require the full range of human knowledge (…)”. As Arnold et al. put it, “[t]he extent to which one can automate translation is an indication of the extent to which one can automate ‘thinking’” (1994:5).

II. The History of Machine Translation

II. 1.) The First Years of Translation Machines

Ideas about mechanising translation processes can be traced back to the seventeenth century, in connection with ideas on ‘real characters’ and ‘universal’ or ‘philosophical languages’. Realistic possibilities, however, did not emerge until the 20th century. In the mid-1930s, two inventors both applied for patents for ‘translating machines’: the French-Armenian Georges Artsrouni and the Russian Petr Smirnov-Trojanskij, who remained unrecognised in the USSR (see: Schwanke 1991:69). Their ideas comprised a method for an automatic bilingual dictionary, a scheme based on Esperanto for coding interlingual grammatical roles, and ideas for analysing sentences and generating texts in other languages. Later, others put forward the first tentative proposals for using the new invention, the computer, to translate natural languages. None of them knew of these two inventors or their ideas.

Pioneers in MT came from a wide variety of backgrounds, such as electrical engineering, physics, linguistics, interpreting or philosophy. Two of these pioneers were Andrew Booth and Warren Weaver (1894-1978), who are discussed in more detail in chapter II. 2.) of this paper.

In the earliest period, many researchers were preoccupied with two questions: what constituted an intermediary language (‘interlingua’[5]), and how such a language might be created. The term ‘interlingua’ also denotes the part of the work actually done by the translation machine, because the whole process required, and still requires, pre-editing and post-editing by a human (see: Schwanke 1991:69). In the minds of many at the time, the question was closely related to a parallel activity in the field of information retrieval:[6] the search for a universally applicable ‘information language’. This new task attracted widespread public interest as well as attention from several scientific disciplines. MT was therefore presented at a wide range of conferences, wherever there was interest in using computers to explore language and communication, for instance at conferences on cybernetics, information retrieval and linguistics. Statements about the immediate prospects of working systems attracted publicity. Those in the field did not always welcome this publicity, because it raised public hopes ever higher.

Over time, attention shifted to the limitations of dictionary-based systems and to the importance of analysing and transforming syntactic structures. From the 1960s onwards, nearly all MT groups focused on syntax. There was initial interest in the theories of Chomsky, but computational methods for syntactic analysis eventually developed independently of the dominant developments in theoretical linguistics. Basic system design moved away from the earlier ‘direct translation’ approach[7] towards a three-stage (or multi-stage) approach, with independent processes of analysis, transfer and synthesis.
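The contrast between ‘direct translation’ and a three-stage design can be illustrated with a deliberately tiny example. The following is a minimal Python sketch, not a model of any historical system. The lexicon `LEXICON` and the functions `direct_translate`, `analyse`, `transfer`, `synthesise` and `transfer_translate` are hypothetical. The analysis stage assumes one fixed German sentence pattern.

```python
# Toy German-to-English lexicon (hypothetical, for illustration only).
LEXICON = {
    "ich": "I",
    "habe": "have",
    "das": "the",
    "buch": "book",
    "gelesen": "read",
}


def direct_translate(sentence: str) -> str:
    """'Direct' approach: word-for-word lexical substitution that
    keeps the source-language word order unchanged."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())


def analyse(sentence: str) -> dict:
    """Analysis: assign syntactic roles, assuming the single German pattern
    subject + auxiliary + article + noun + past participle."""
    subj, aux, det, noun, participle = sentence.lower().split()
    return {"subj": subj, "aux": aux, "obj": [det, noun], "verb": participle}


def transfer(structure: dict) -> dict:
    """Transfer: replace source words by target words, keeping the roles."""
    return {
        "subj": LEXICON[structure["subj"]],
        "aux": LEXICON[structure["aux"]],
        "obj": [LEXICON[w] for w in structure["obj"]],
        "verb": LEXICON[structure["verb"]],
    }


def synthesise(structure: dict) -> str:
    """Synthesis: generate English order subject + auxiliary + participle + object."""
    return " ".join([structure["subj"], structure["aux"],
                     structure["verb"], *structure["obj"]])


def transfer_translate(sentence: str) -> str:
    """Run the three stages in sequence."""
    return synthesise(transfer(analyse(sentence)))


print(direct_translate("Ich habe das Buch gelesen"))    # I have the book read
print(transfer_translate("Ich habe das Buch gelesen"))  # I have read the book
```

The direct version reproduces the German word order. Only after the sentence has been analysed into roles can the synthesis stage place the participle where English expects it. A real system would need a grammar rather than one hard-coded pattern.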

Pioneers in MT had to face many complex problems:

- Computers were for a long time limited in storage and speed, expensive to use and not widely available (in the case of the USSR unavailable until the 1970s, and even then they were far behind American models in capacity and speed).
- Input was cumbersome: texts had to be laboriously coded onto punched cards, using coding systems that most groups devised for themselves.
- A recurrent demand at the time was for optical character readers, a demand that was not met until the 1980s.
- The output took the form of reams of large sheets of computer paper, often nearly illegible.
- Off-line storage was on punched cards, on paper or on magnetized steel tapes.

Researchers were preoccupied with the problems of storing dictionaries and accessing them accurately. This preoccupation led to the development of procedures that are taken for granted nowadays.

II. 2.) A Pioneer: Warren Weaver, Founder of the Idea of MT

Warren Weaver was born on 17 July 1894 in Reedsburg, Wisconsin, of German descent. He was talented and interested in engineering, and he graduated in civil engineering. He then taught mathematics at Throop College in Pasadena, California. He was later appointed director of the Natural Science Division of the Rockefeller Foundation, where he inaugurated programs to support quantitative experimental biology and molecular biology. During the Second World War, he directed the work of several hundred mathematicians on operations research at the Office of Scientific Research and Development, to which he had been invited by Vannevar Bush (1890-1974).[8] Weaver carried out a globally important program of agricultural research in Central and South America, India and the Philippines. He collaborated with Richard Courant[9] on plans for strengthening advanced mathematics research in the United States. He also helped to establish the Courant Institute at New York University, whose main building is called ‘Warren Weaver Hall’. He wrote many articles, including popular science pieces and “Comments on the general theory of air warfare”. The latter was a significant factor in the founding of the RAND Corporation.[10] Weaver was very fond of Lewis Carroll’s Alice’s Adventures in Wonderland (1865) and built up a collection of its translations.

Weaver first mentioned the possibility of using computers to translate in March 1947, in a letter to the cyberneticist Norbert Wiener, who was not interested in the idea. Soon afterwards,[11] Weaver discussed it with Andrew Booth,[12] a British X-ray crystallographer who was working on ideas for a mechanical dictionary. By 1949, colleagues at the Rockefeller Foundation were urging Weaver to elaborate his ideas in a memorandum, which he was to send to 20 or 30 acquaintances:

“I have a text in front of me which is written in Russian, but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need to do is strip off the code in order to retrieve the information contained in the text.”

(Warren Weaver, as cited in Arnold et al. 1994:13).

It is not entirely clear who in fact first had the idea of translating automatically between human languages. This sentence from the memorandum therefore marks the starting point of the actual development of MT. The memorandum “sparked a significant amount of interest and research” (13): “written before most people had any idea of what computers might be capable of, it was the direct stimulus for the beginnings of research in the US” (Hutchins 2000:17). According to Schwanke, this was presumably because the work of Booth and his colleagues was not well known in the U.S. at that time (see: 1991:70).

Weaver believed in the code system that Booth had developed specifically for Weaver’s idea. He was also convinced that problems of semantic ambiguity could be solved, particularly in technical languages, by taking sufficient context into account. Weaver’s memorandum is commonly praised as a milestone in the history of MT. Schwanke states, however, that this enthusiasm is hard to comprehend in retrospect.
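Weaver’s memorandum is usually summarised as proposing a simple way to resolve an ambiguous word: look at a window of N words on either side of it. The Python sketch below illustrates that idea under stated assumptions. The sense inventory `SENSES`, its cue words, the function name `disambiguate` and the default window size `n = 3` are all hypothetical. The cue-word overlap count is a naive stand-in chosen for illustration and is not Weaver’s or Booth’s actual procedure.

```python
from typing import List, Optional

# Hypothetical sense inventory: each sense of an ambiguous word is paired
# with a few cue words that tend to occur near it (illustrative only).
SENSES = {
    "bank": {
        "financial institution": {"money", "account", "loan", "deposit"},
        "river edge": {"river", "water", "fishing", "shore"},
    },
}


def disambiguate(tokens: List[str], index: int, n: int = 3) -> Optional[str]:
    """Choose the sense of tokens[index] whose cue words overlap most with
    the n tokens on either side of it."""
    senses = SENSES.get(tokens[index])
    if senses is None:
        return None  # the word is not listed as ambiguous
    # Context window: up to n words to the left and n words to the right.
    window = set(tokens[max(0, index - n):index] + tokens[index + 1:index + 1 + n])
    # max() returns the first sense listed when two senses score equally.
    return max(senses, key=lambda sense: len(senses[sense] & window))


sentence = "she opened an account at the bank to deposit money".split()
print(disambiguate(sentence, sentence.index("bank")))  # financial institution
```

In “we sat on the bank of the river fishing”, the same function picks the river sense, because the word “river” falls inside the window. The window size `n` controls how much context counts.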

By the 1950s, a large number of groups in Europe and the USA were researching the idea, backed by a total investment of about £20,000,000 (Arnold et al. 1994:13). Unfortunately, this effort did not meet with much success, and doubts arose about whether translation could be automated at all (at least given the state of knowledge at the time). According to Arnold et al., the philosopher Bar-Hillel[13] declared in a 1959 report that fully automatic high-quality machine translation (FAHQMT) in particular was impossible in principle. This did not, however, mean that MT in general was impossible.

In 1949, Weaver set out the main issues he saw in communication, and hence in improving MT, as the “Three Levels of Problems in Communication” (Gibbon 1998):

Level A: How accurately can the symbols of communication be transmitted? (The technical problem)

Level B: How precisely do the transmitted symbols convey the desired meaning? (The semantic problem)

Level C: How effectively does the received meaning affect conduct in the desired way? (The problem of effectiveness)

According to Gibbon (1998), these three levels are closely related to the so-called semiotic distinctions:

A: Syntax and the forms of language

B: Semantics and the meanings of language

C: Pragmatics and the use or function of language.

Warren Weaver’s memorandum led to the convening of the first MT conference at the Princeton Inn in July 1960. It also led to the first book-length treatment of MT, which has a foreword by Weaver. In this foreword, he states his optimism about MT:

“[It is] not to charm or delight, not to contribute to elegance or beauty; but to be of wide service in the work-a-day task of making available the essential content of documents in languages which are foreign to the reader.” (Hutchins 2000:20).

According to Hutchins, Warren Weaver’s words have been proved right.

[...]


[1] digitus (lat.): finger (Savetz)

[2] computare (lat.): to reckon (Savetz)

[3] Prolog (short for PROgramming in LOGic) was created by Alain Colmerauer (1941-2017) et al. in Marseille during the 1970s. Work on it was continued at the University of Edinburgh, with contributions by Clocksin and Mellish. Their version, known as Edinburgh syntax, is today commonly acknowledged as the standard (see: <www.pcai.com>).

[4] Fifth Generation = a “Japanese billion-dollar project, with a target date of 1989 to design and build a computer that is not only a hundred times faster than a Cray ‘supercomputer’ (the so-called Cray-1 system, built by Cray Research in 1976 with a speed of 160 megaflops and an 8 MB memory; for further information see: www.cray.com) but contains AI software as well” (Savetz); also see chapter III. 3.) of this paper.

[5] also see V. 1.) of this paper

[6] information retrieval = the use of computers to identify and access documents relevant to a particular query (see: Hutchins 2000:2)

[7] direct translation = essentially built on word-for-word lexical substitution and structure modification (see: 3)

[8] Bush, Vannevar (1890-1974): pivotal figure in hypertext research. He conceived the MEMEX, a device in which an individual stores all his books, records and communications. It was the first idea of an “easily accessible, individually configurable storehouse of knowledge” (Keep et al. 2001).

[9] Courant, Richard (1888-1972): German mathematician, founder of the Courant Institute of Mathematical Sciences (so named since 1964) at New York University (see: <www-history.mcs.st-and.ac.uk>)

[10] The RAND Corporation describes itself on its website as follows: “The RAND Corporation is a non-profit research organization providing objective analysis and effective solutions that address the challenges facing the public and private sectors around the world”. Its name is a contraction of the term research and development. RAND has worked on packet switching (a precursor of the internet) in 1962, water resource management in the Netherlands in 1976 and the expansion of NATO in 1995 (Lewis 2004).

[11] Schwanke states that this conversation had in fact already taken place in the year 1946 (1991:69).

[12] also see chapter II. 1.) of this paper

[13] Bar-Hillel, Yehoshua (1915-1975): philosopher and a central figure in the early development of the field. He contributed what should be considered “the first set of sober assessments for MT”. He believed that “in order to achieve Fully Automatic High-Quality Machine Translation (FAHQMT), machines must be able to process meaning” (Nirenburg 2003).


Details

College
University of Frankfurt (Main)  (Institut für England und Amerikastudien)
Course
Translation and Intercultural Communication
Catalog Number
V117455
ISBN (eBook)
9783640196074
ISBN (Book)
9783640196142
File size
611 KB
Language
English
Tags
Linguistic, Aspects, Machine, Translation, Intercultural, Communication
Quote paper
M.A. Alexander Täuschel (Author), 2006, Linguistic Aspects in Machine Translation, Munich, GRIN Verlag, https://www.grin.com/document/117455
