Project Report, 2006
22 Pages, Grade: 1,3
I. 1.) What is Machine Translation?
I. 2.) Why Machine Translation Matters
I. 2. a) Social and Political Importance of MT
I. 2. b) Scientific Importance of MT
I. 2. c) Commercial Importance of MT
I. 2. d) Philosophical Importance of MT
II. The History of Machine Translation
II. 1.) The First Years of Translation Machines
II. 2.) A Pioneer: Warren Weaver, Founder of the Idea of MT
II. 3.) The Latter Years in MT
III. Machine Translation in Practice
III. 1.) MT Test: Google
III. 2.) To Avoid Mistakes
IV. Linguistic Aspects in MT
IV. 1.) Semantics
IV. 2.) Pragmatics
IV. 3.) Real World Knowledge
V. Computational Linguistics
V. 1.) Methods of MT
V. 2.) Commonly Acknowledged Translation Systems
V. 2. a) LOGOS
V. 2. b) METAL
V. 2. c) METEO
VI. Epilogue: On the Future of MT
VII. 1.) Literature
VII. 2.) Internet Sources
A translation machine is a specialised software system developed for translating from one human language to another: “[Machine translation systems are actually not machines, rather to be thought of] as programs that run on computers, which really are machines” (Arnold et al. 1994:10). Machine translation, or Mechanical Translation as it was called in its early days (henceforth abbreviated as MT), is a subfield of artificial intelligence (AI), and both belong to the broad area of computer science (CS).
The field of machine translation is widely considered one of the most difficult problems in computational linguistics, because it demands interdisciplinary knowledge from the scientists who develop translation machines: knowledge of informatics and language cognition, skills in translating and in language description methods, as well as specialised knowledge of the fields that the texts to be translated deal with (see: Schwanke 1991:11).
Furthermore, if translation machines were to take over translational work completely, they would have to cover all the capacities of a human translator: human translators have to strike a pragmatic or aesthetic balance between the source text and the target text (see: Wilss 1988:VII). Applied skills as well as linguistic and transcultural knowledge are some of the tools a translator can draw on to meet the expectations of the source text writer and the target text reader. Another tool, according to Wilss, is a translator’s “intuition”. He suggests that it is “some kind of sixth sense”, “the opposite of calculatable dynamics”, a part of the translator’s mysterious, notorious “black box”, whose existence is not unknown, but of which we have only an intuitive image. Wilss adds that intuition is a “mental axiom” that cannot be challenged (129).
So if a translator’s ‘intuition’ is so hard to define, how can it be synthesised within computer software, within a machine?
These considerations give only a first impression of what is involved in the development of translation machines. Accordingly, enthusiasm for and belief in a future in which computers handle the translation of human languages have see-sawed ever since the idea was born.
MT was “one of the earliest applications” (Arnold et al. 1994:iii) suggested for digital computers. Just as an artist might argue that a painting is never really finished, the whole development of computer science is still in progress, and so is the “long-term scientific dream” of MT.
MT is also of increasing importance in several different fields of human enterprise, which are explained in the following:
Its social and political importance “arises from the socio-political importance of translation in communities where more than one language is generally spoken” (4), and where the adoption of a common lingua franca suggests itself as the alternative. This, however, involves the dominance of the chosen language within the community, to the disadvantage of the speakers of the other language(s).
The other language(s) can then become “second class” or, in the worst case, disappear. This undoubtedly matters, because it involves a potential loss of culture as well as of ways of thinking and living. “So translation is necessary for communication (…)”, even if this means putting up with its side effects, such as modifying or even losing semantic and/or cultural details of the information that is translated into a different language and thereby made accessible to another cultural community. But the modern world’s demand for translation “far outstrips any possible supply”, because human translators and translation capacity are in short supply; hence “the automation of translation is a social and political necessity for modern societies which do not wish to impose a common language on their members”. Cases like the Spanish-speaking parts of the USA or the Welsh-speaking parts of Great Britain make this point obvious. Switzerland or the European Community, in which multilingualism is part of everyday life, do so even more.
The scientific importance of MT results from its being an interesting application and testing ground for ideas in CS, AI, and linguistics. Some of the most important developments in these fields began in MT: the origins of Prolog, the first widely available logic programming language, which formed a key part of the Japanese Fifth Generation programme, lie in MT research (see: 5).
In today’s world of business the commercial importance of MT is not to be underestimated. Firstly, as a matter of accessibility, a customer is more likely to buy a Japanese product with a manual written in English than one whose manual is written in Japanese, even more so when buying a safety-critical system. Secondly, translation is expensive and requires highly skilled (and highly paid) workers. An average human translator may be able to manage 4-6 pages a day (see: 1994:5), which may cause delays during the development and launch of a new product. Up to 40-45% of the running costs of European Community institutions are ‘language costs’, “of which translating and interpreting are the main element” (1994:5). This amounts to about £300 million per year, a figure that covers only the translations actually being done, not the amount of translation required (see: Patterson 1982).
MT is also a philosophical challenge, because “it represents the attempt to automate an activity that can require the full range of human knowledge (…)”: “The extent to which one can automate translation is an indication of the extent to which one can automate ‘thinking’” (Arnold et al. 1994:5).
Ideas about mechanising translation processes can be traced back to the seventeenth century, in connection with ideas on ‘real characters’ and ‘universal’ or ‘philosophical languages’, but realistic possibilities did not emerge until the 20th century. In the mid-1930s, a French-Armenian named Georges Artsrouni and a Russian named Petr Smirnov-Trojanskij, who remained unrecognised in the USSR (see: Schwanke 1991:69), both applied for patents for ‘translating machines’. Trojanskij’s proposal, the more far-reaching of the two, contained not only a method for an automatic bilingual dictionary, but also a scheme for coding interlingual grammatical roles, based on Esperanto, and ideas for analysing sentences and generating texts in other languages. Neither of the two men, nor their ideas, were known to those who later put forward the first tentative ideas for using the new invention, the computer, for translating natural languages.
Pioneers in MT came from a wide variety of backgrounds, such as electrical engineering, physics, linguistics, interpretation or philosophy. Two of these pioneers were Andrew Booth and Warren Weaver (1894-1978), who are discussed in more detail in chapter II. 2.) of this paper.
In the earliest period, many researchers were preoccupied with the question of what constituted an intermediary language (‘interlingua’; the name given to the part of the work actually done by the translation machine, because the whole act required, and still requires, pre-editing and post-editing by a human; see: Schwanke 1991:69) and how it might be created. In the minds of many at the time, this question was closely related to what was seen as a parallel activity in the field of information retrieval, the search for a universally applicable ‘information language’. Public interest and the attention of the different scientific disciplines drawn to this new task were so widespread that, unsurprisingly, presentations of MT took place at a wide range of conferences, wherever there was interest in the use of computers for exploring language and communication, for instance conferences on cybernetics, information retrieval and linguistics. The publicity attracted by statements about the immediate prospects of working systems was not always welcomed by those in the field, because it raised public hopes higher and higher.
With time, attention was drawn to the limitations of dictionary-based systems and to the importance of analysing and transforming syntactic structures, and from the 1960s onwards the common focus of nearly all the MT groups was on syntax. There was initial interest in the theories of Chomsky, but in time computational methods for syntactic structure analysis developed independently of the dominant developments in theoretical linguistics. The basic system design moved away from the earlier ‘direct translation’ approach towards a three (or more) stage approach involving independent processes of analysis, transfer, and synthesis.
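The difference between the two designs can be sketched in a few lines of Python. This is a toy illustration only: the English-to-Spanish lexicon, the part-of-speech tags and the single reordering rule are invented for this sketch and are not taken from any historical MT system.

```python
# Toy English-to-Spanish data, invented for this sketch.
LEXICON = {"the": "el", "red": "rojo", "car": "coche"}
TAGS = {"the": "DET", "red": "ADJ", "car": "NOUN"}


def direct_translate(sentence):
    """'Direct translation': word-for-word lexical substitution."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())


def analyse(sentence):
    """Analysis: turn the source sentence into (word, tag) pairs."""
    return [(w, TAGS.get(w, "UNK")) for w in sentence.lower().split()]


def transfer(tagged):
    """Transfer: map source structure and words onto the target language."""
    reordered = []
    i = 0
    while i < len(tagged):
        # Hypothetical rule: English ADJ NOUN becomes Spanish NOUN ADJ.
        if (i + 1 < len(tagged)
                and tagged[i][1] == "ADJ"
                and tagged[i + 1][1] == "NOUN"):
            reordered.extend([tagged[i + 1], tagged[i]])
            i += 2
        else:
            reordered.append(tagged[i])
            i += 1
    # Lexical transfer happens after the structure has been adjusted.
    return [(LEXICON.get(w, w), tag) for w, tag in reordered]


def synthesise(tagged):
    """Synthesis: produce the target sentence from the transferred structure."""
    return " ".join(w for w, _ in tagged)


def staged_translate(sentence):
    return synthesise(transfer(analyse(sentence)))


print(direct_translate("the red car"))  # el rojo coche
print(staged_translate("the red car"))  # el coche rojo
```

Because analysis, transfer and synthesis are independent steps, a word-order rule can be added to the transfer stage without touching the dictionary lookup, which is not possible in the purely word-for-word version.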
Pioneers in MT had to face manifold and complex problems:
- Computers were for a long time limited in storage and speed, expensive to use and not widely available (in the case of the USSR unavailable until the 1970s, and even then they were far behind American models in capacity and speed).
- Input was cumbersome: texts had to be laboriously coded onto punched cards, and most groups devised their own coding systems.
- A recurrent demand at the time was for optical character readers, which did not become a reality until the 1980s.
- The output came in the form of reams of large sheets of computer paper, often nearly illegible.
- Off-line storage was either on punched cards or on paper or magnetised steel tapes.
Researchers’ preoccupation with the problems of dictionary storage and accurate access led to the development of procedures which are taken for granted nowadays.
Warren Weaver was born on 17 July 1894 in Reedsburg, Wisconsin, of German descent. Interested in engineering and talented, he went from a degree in civil engineering, via teaching mathematics at Throop College in Pasadena, California, to being appointed director of the Natural Science Division of the Rockefeller Foundation, where he inaugurated programs to support quantitative experimental biology and molecular biology. During the Second World War he directed the work of several hundred mathematicians on operations research at the Office of Scientific Research and Development, to which he had been invited by Vannevar Bush (1890-1974). Weaver carried out a globally important program of agricultural research in Central and South America, India and the Philippines. He collaborated with Richard Courant on plans for strengthening advanced mathematics research in the United States and on the establishment of the Courant Institute of New York University, whose main building is called ‘Warren Weaver Hall’. He also wrote many popular science articles, among them “Comments on the general theory of air warfare”, which was a significant factor in the founding of the ‘RAND Corporation’. Weaver was very fond of Lewis Carroll’s Alice’s Adventures in Wonderland (1865) and built up a collection of translations of it.
Weaver first mentioned the possibility of using the computer to translate in March 1947, in a letter to the cyberneticist Norbert Wiener, who was not interested in the idea. Soon afterwards Weaver discussed it with Andrew Booth, a British X-ray crystallographer, who was working on ideas for a mechanical dictionary. By 1949, colleagues at the Rockefeller Foundation had urged Weaver to elaborate his ideas in a memorandum, which he was to send to 20 or 30 acquaintances:
“I have a text in front of me which is written in Russian, but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need to do is strip off the code in order to retrieve the information contained in the text.”
(Warren Weaver, as cited in Arnold et al. 1994:13).
This sentence, taken from that memorandum, is usually taken to mark the starting point of the actual development of MT, since it is not quite clear who was in fact the first to have the idea of translating automatically between human languages. The memorandum “sparked a significant amount of interest and research” (13); “written before most people had any idea of what computers might be capable of, it was the direct stimulus for the beginnings of research in the US” (Hutchins 2000:17). According to Schwanke, this was presumably so because the work of Booth and his colleagues was not well known in the U.S. at that time (see: 1991:70).
Weaver believed in the code system which Booth had developed especially for Weaver’s idea, and he was also convinced that difficulties of semantic ambiguity could be solved, particularly in technical languages, by adding sufficient context. Schwanke states that the enthusiasm with which Weaver’s memorandum was commonly praised as a milestone in the history of MT is hard to comprehend in retrospect.
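The idea that a sufficiently large context resolves ambiguity can be illustrated with a minimal Python sketch. It is not a reconstruction of Weaver’s or Booth’s actual procedure: the two senses of “bank”, the cue words and the function name `disambiguate` are assumptions made up for this example.

```python
# Invented cue words for two senses of the English word "bank".
SENSE_CUES = {
    "financial institution": {"money", "loan", "account", "interest"},
    "river side": {"river", "water", "fishing", "shore"},
}


def disambiguate(tokens, index, window):
    """Pick the sense whose cue words overlap most with the context.

    Only the `window` tokens on either side of tokens[index] are inspected.
    Returns None when no cue word is visible, i.e. the context is too small.
    """
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    context = set(left + right)
    scores = {sense: len(cues & context) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


tokens = "he sat on the bank of the river with his fishing rod".split()
i = tokens.index("bank")
print(disambiguate(tokens, i, window=1))  # None
print(disambiguate(tokens, i, window=3))  # river side
```

With a window of one word on either side the ambiguous word cannot be resolved; widening the window to three words brings “river” into view and settles the sense.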
Later, by the 1950s, a large number of groups in Europe and the USA were researching the idea, backed by a financial investment of about £20,000,000 (Arnold et al. 1994:13). Unfortunately, this work met with little success, and doubts arose about the possibility of automating translation (at least in the current state of knowledge). According to Arnold et al., the philosopher Bar-Hillel declared Fully Automatic High-Quality Machine Translation (FAHQMT) in particular to be impossible in principle in a 1959 report. But this did not mean that MT in general was impossible.
Weaver presented the main issues he saw for improving MT in his memorandum of 1949 as the “Three Levels of Problems in Communication” (Gibbon 1998):
Level A: How accurately can the symbols of communication be transmitted? (The technical problem)
Level B: How precisely do the transmitted symbols convey the desired meaning? (The semantic problem)
Level C: How effectively does the received meaning affect conduct in the desired way? (The problem of effectiveness)
According to Gibbon (1998), there is a close relation between those three levels and the so-called semiotic distinctions:
A: Syntax and the forms of language
B: Semantics and the meanings of language
C: Pragmatics and the use or function of language.
Warren Weaver’s memorandum led to the convening of the first MT conference in the Princeton Inn, in July 1960, and to the first book-length treatment, with a foreword written by Weaver. In this, he states his optimism for MT:
“[It is] not to charm or delight, not to contribute to elegance or beauty; but to be of wide service in the work-a-day task of making available the essential content of documents in languages which are foreign to the reader.” (Hutchins 2000:20).
According to Hutchins, Warren Weaver’s words have proved true.
 digit (lat.): finger (Savetz)
 computare (lat.): to reckon (Savetz)
 Prolog = short for PROgramming in LOGic; it was created by Alain Colmerauer (1941-) et al. in Marseille during the 1970s. The work was completed at the University of Edinburgh with the support of Clocksin and Mellish, and today their version, called Edinburgh syntax, is commonly acknowledged as the standard (see: <www.pcai.com>).
 Fifth Generation = a “Japanese billion-dollar project, with a target date of 1989 to design and build a computer that is not only a hundred times faster than a Cray ‘supercomputer’ (the so-called Cray-1 system which was built by Cray Inc. in 1976 with a speed of 160 megaflops and an 8 MB memory; for further information see: www.cray.com) but contains AI software as well” (Savetz); also see chapter III. 3.) of this paper.
 also see V. 1.) of this paper
 information retrieval = the use of computers to identify and access documents relevant to a particular query (see: Hutchins 2000:2)
 direct translation = essentially built on word-for-word lexical substitution and structure modification (see: 3)
 Bush, Vannevar (1890-1974): pivotal figure in hypertext research; conceived the MEMEX (a device in which an individual stores all his books, records, and communications), which was the first idea of an “easily accessible, individually configurable storehouse of knowledge” (Keep et al. 2001).
 Courant, Richard (1888-1972): German mathematician, founder of the Courant Institute for Mathematical Sciences (since 1964) at New York University (see: <www-history.mcs.st-and.ac.uk>)
 The RAND Corporation about itself on its website: “The RAND Corporation is a non-profit research organization providing objective analysis and effective solutions that address the challenges facing the public and private sectors around the world”. Its name is derived from a contraction of the term research and development. It dealt with packet switching (the seed of the internet) in 1962, water resource management in the Netherlands in 1976, and the expansion of NATO in 1995 (Lewis 2004).
 Schwanke states that this conversation had in fact already taken place in the year 1946 (1991:69).
 also see chapter II. 1.) of this paper
 Bar-Hillel, Yehoshua (1915-1975): philosopher, who had been a central figure in the early development of the field and contributed what should be considered “the first set of sober assessments for MT”, believed that, “in order to achieve Fully Automatic High-Quality Machine Translation (FAHQMT), machines must be able to process meaning” (Nirenburg 2003).