The Internet and Languages [around the year 2000] - Marie Lebert

In 1966, the U.S. government-issued ALPAC (Automatic Language Processing Advisory Committee) report offered a prematurely negative assessment of the value and prospects of practical machine translation systems, effectively putting an end to funding and experimentation in the field for the next decade. It was not until the late 1970s, with the growth of computing and language technology, that serious efforts began once again. This period of renewed interest also saw the development of the Transfer model of machine translation and the emergence of the first commercial MT systems. While commercial ventures such as SYSTRAN and METAL began to demonstrate the viability, utility and demand for machine translation, these mainframe-bound systems also illustrated many of the problems in bringing MT products and services to market. High development cost, labor-intensive lexicography and linguistic implementation, slow progress in developing new language pairs, inaccessibility to the average user, and inability to scale easily to new platforms are all characteristics of these second-generation systems."

As explained in August 1998 by Eduard Hovy, head of the Natural Language Group at USC/ISI (University of Southern California/Information Sciences Institute), machine translation implies "language-related applications/functionalities that are not translation, such as information retrieval (IR) and automated text summarization (SUM). You would not be able to find anything on the Web without IR! — all the search engines (AltaVista, Yahoo!, etc.) are built upon IR technology. Similarly, though much newer, it is likely that many people will soon be using automated summarizers to condense (or at least, to extract the major contents of) single (long) documents or lots of (any length) ones together."

= Experiences

In December 1997, AltaVista, a leading search engine, was the first to launch free translation software, Babel Fish (also called AltaVista Translation), which could translate webpages (up to three pages at the same time) from English into French, German, Italian, Portuguese or Spanish, and vice versa. The software was developed by SYSTRAN (an acronym for System Translation), a company specializing in machine translation software. SYSTRAN's headquarters are located in Soisy-sous-Montmorency, near Paris, France. Sales, marketing, and research and development are based in its subsidiary in La Jolla, California.

This initiative was followed by other translation software developed by Alis Technologies, Globalink, Lernout & Hauspie, and Softissimo, with free and/or paid versions on the web.

Based in Montreal, Quebec, Alis Technologies has specialized in the development and marketing of language handling solutions and services, particularly language implementation in the information technology industry. Alis Translation Solutions (ATS) has offered applications in a number of languages, as well as tools and services to improve the quality of translations. Language Technology Solutions (LTS) has marketed advanced tools and services for language engineering and information technology (90 languages covered).

Based in Ieper, Belgium, and Burlington, Massachusetts, Lernout & Hauspie (L&H) was a leader in advanced speech technology for commercial applications and products, with four core technologies: automatic speech recognition (ASR), text-to-speech (TTS), text-to-text (TTT), and digital speech compression (DSC). Its ASR, TTS and DSC technologies were licensed to companies in telecommunications, computers and multimedia, consumer electronics and automotive electronics. Its TTT translation services were provided to IT companies, and vertical and automation markets. The Machine Translation Group created by Lernout & Hauspie included L&H Language Technology, AppTek, AILogic, NeocorTech, and Globalink. Lernout & Hauspie was later bought by Nuance Communications.

Globalink, a company created in 1990 in the U.S., focused on language translation software and services, i.e. customized translation solutions built around software products, online options, and professional translation services. The software products were available in Spanish, French, Portuguese, German, Italian and English, for individuals, small businesses, multinational corporations and governments, from a stand-alone product giving a fast draft translation to a full system managing professional translations.

As explained on the company website in 1998, "with Globalink's translation applications, the computer uses three sets of data: the input text, the translation program and permanent knowledge sources (containing a dictionary of words and phrases of the source language), and information about the concepts evoked by the dictionary and rules for sentence development. These rules are in the form of linguistic rules for syntax and grammar, and some are algorithms governing verb conjugation, syntax adjustment, gender and number agreement and word re-ordering. Once the user has selected the text and set the machine translation process in motion, the program begins to match words of the input text with those stored in its dictionary. Once a match is found, the application brings up a complete record that includes information on possible meanings of the word and its contextual relationship to other words that occur in the same sentence. The time required for the translation depends on the length of the text. A three-page, 750-word document takes about three minutes to render a first draft translation."
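
The process described in this quotation (dictionary lookup, retrieval of a record for each word, then rules for word re-ordering and gender agreement) can be illustrated with a very small sketch. This is not Globalink's software, whose internals are not documented here. The Record class, the DICTIONARY and FEMININE tables, and the translate function are all hypothetical names, and the tiny English-to-French vocabulary is invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical dictionary record: target word plus grammatical information."""
    target: str
    pos: str          # part of speech: "det", "adj", "noun" or "unknown"
    gender: str = ""  # "m" or "f" for nouns, empty otherwise


# Toy English-to-French dictionary (illustrative entries only).
DICTIONARY = {
    "the": Record("le", "det"),
    "green": Record("vert", "adj"),
    "black": Record("noir", "adj"),
    "house": Record("maison", "noun", "f"),
    "book": Record("livre", "noun", "m"),
}

# Feminine forms used by the agreement rule (illustrative only).
FEMININE = {"le": "la", "vert": "verte", "noir": "noire"}


def translate(sentence: str) -> str:
    """Produce a rough draft translation using lookup plus two simple rules."""
    # Step 1: match each input word against the dictionary.
    # Unknown words are passed through unchanged.
    words = sentence.lower().split()
    records = [DICTIONARY.get(w, Record(w, "unknown")) for w in words]

    # Step 2: word re-ordering rule: "adjective noun" becomes "noun adjective".
    i = 0
    while i < len(records) - 1:
        if records[i].pos == "adj" and records[i + 1].pos == "noun":
            records[i], records[i + 1] = records[i + 1], records[i]
            i += 2
        else:
            i += 1

    # Step 3: gender agreement rule: a feminine noun makes the determiner
    # before it and the adjective after it feminine.
    output = [r.target for r in records]
    for i, record in enumerate(records):
        if record.pos == "noun" and record.gender == "f":
            if i > 0 and records[i - 1].pos == "det":
                output[i - 1] = FEMININE.get(output[i - 1], output[i - 1])
            if i + 1 < len(records) and records[i + 1].pos == "adj":
                output[i + 1] = FEMININE.get(output[i + 1], output[i + 1])

    return " ".join(output)


print(translate("the green house"))  # la maison verte
print(translate("the black book"))   # le livre noir
```

A real system of the period would hold far richer records (several possible meanings per word, contextual relationships, verb conjugation) and many more rules. The sketch only shows how lookup and rule application fit together to give a first draft.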

At the headquarters of the World Health Organization (WHO) in Geneva, Switzerland, the Computer-assisted Translation and Terminology Unit (CTT) has been a pioneer since 1997 in assessing technical options for using computer-assisted translation (CAT) systems based on translation memory (TM). With such systems, translators can retrieve previous translations of portions of the text; accept, reject or modify them; and add the new translation to the memory, thus enriching it for future reference. By archiving the daily output, the translator helps to build an extensive translation memory and to solve a number of translation issues. Several projects have been under way at the CTT for electronic document archiving and retrieval, bilingual/multilingual text alignment, computer-assisted translation, translation memory and terminology database management, and speech recognition.
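
The translation memory workflow described above (retrieve a previous translation, accept or modify it, then store the result) can be sketched in a few lines. This is a minimal illustration, not the system used at the WHO. The TranslationMemory class, its method names and the 0.75 similarity threshold are assumptions made for the example, and the similarity measure is simply the one provided by Python's standard difflib module.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple


class TranslationMemory:
    """Hypothetical in-memory store of (source segment, translation) pairs."""

    def __init__(self, threshold: float = 0.75):
        # Minimum similarity (0.0 to 1.0) for a stored segment to be proposed.
        self.threshold = threshold
        self.segments = {}  # maps source segment to its stored translation

    def add(self, source: str, target: str) -> None:
        """Store a validated translation, enriching the memory."""
        self.segments[source] = target

    def lookup(self, source: str) -> Optional[Tuple[str, str, float]]:
        """Return (stored source, stored translation, score) for the best
        match at or above the threshold, or None if nothing is close enough."""
        best = None
        for stored_source, stored_target in self.segments.items():
            score = SequenceMatcher(
                None, source.lower(), stored_source.lower()
            ).ratio()
            if score >= self.threshold and (best is None or score > best[2]):
                best = (stored_source, stored_target, score)
        return best


tm = TranslationMemory()
tm.add("Wash your hands before meals.", "Lavez-vous les mains avant les repas.")

# A similar new sentence retrieves the earlier translation as a suggestion.
suggestion = tm.lookup("Wash your hands after meals.")
if suggestion is not None:
    stored_source, stored_target, score = suggestion
    print(f"Suggestion ({score:.0%} match): {stored_target}")

# The translator modifies the suggestion and adds the result to the memory.
tm.add("Wash your hands after meals.", "Lavez-vous les mains après les repas.")
```

Each validated segment is added back to the memory, so the next lookup for the same sentence returns an exact match. Production CAT tools segment and align whole documents and use more refined matching, which this sketch leaves out.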