2000: INFORMATION IS AVAILABLE IN MANY LANGUAGES

= [Overview]

2000 was a turning point for a multilingual internet, both for its content and its users. In summer 2000, non-English-speaking users reached 50%. This percentage went on to increase steadily: 52.5% in summer 2001, 57% in December 2001, 59.8% in April 2002, 64.4% in September 2003 - with 34.9% non-English-speaking Europeans and 29.4% Asians - and 64.2% in March 2004 - with 37.9% non-English-speaking Europeans and 33% Asians (source: Global Reach). The internet is also a good tool for minority languages, as stated by Caoimhín Ó Donnaíle, who teaches computing at the Institute Sabhal Mór Ostaig, located on the Island of Skye, in Scotland. Caoimhín also maintains the college website, which is the main site worldwide with information on Scottish Gaelic, with a bilingual (English, Gaelic) list of European minority languages. He wrote in May 2001: "Students do everything by computer, use Gaelic spell-checking, a Gaelic online terminology database. There are more hits on our website. There is more use of sound. Gaelic radio (both Scottish and Irish) is now available continuously worldwide via the internet. A major project has been the translation of the Opera web-browser into Gaelic - the first software of this size available in Gaelic."

= "Language nations"

At first, the internet was nearly 100% English. Born in the United States, it spread in North America before taking over the whole planet. Then people from all continents began connecting to the internet and posting webpages in their own languages. In the 1990s, the percentage of English decreased from nearly 100% to 85% (reached in 1997 or 1998, depending on the sources).

In 1997, Babel - a joint initiative from Alis Technologies (language translation services) and the Internet Society - ran the first major study relating to distribution of languages on the web. The results were published in June 1997 on a webpage named Web Languages Hit Parade. The main languages were English with 82.3%, German with 4.0%, Japanese with 1.6%, French with 1.5%, Spanish with 1.1%, Swedish with 1.1%, and Italian with 1.0%.

In July 1998, according to Global Reach, a company specializing in international online marketing, the fastest growing groups of internet users were non-English-speaking: Spanish-speaking, 22.4%; Japanese-speaking, 12.3%; German-speaking, 14%; and French-speaking, 10% - with 56 million non-English-speaking users. More than 80% of all webpages were still in English, whereas only 6% of the world population spoke English as a native language (16% spoke Spanish).

Randy Hobler was a consultant in internet marketing for Globalink, a company specializing in language translation software and services. He wrote in September 1998: "85% of the content of the web in 1998 is in English and going down. This trend is driven not only by more websites and users in non-English-speaking countries, but by increasing localization of company and organization sites, and increasing use of machine translation to/from various languages to translate websites."

Randy also brought up the concept of "language nations": "Because the internet has no national boundaries, the organization of users is bounded by other criteria driven by the medium itself. In terms of multilingualism, you have virtual communities, for example, of what I call 'Language Nations'… all those people on the internet wherever they may be, for whom a given language is their native language. Thus, the Spanish Language nation includes not only Spanish and Latin American users, but millions of Hispanic users in the U.S., as well as odd places like Spanish-speaking Morocco."

Robert Ware created OneLook Dictionaries in April 1996, as a "fast finder" of words in hundreds of online dictionaries. He wrote about an experience he had in 1994, that showed the internet could promote both a common language and multilingualism: "In 1994, I was working for a college and trying to install a software package on a particular type of computer. I located a person who was working on the same problem and we began exchanging email. Suddenly, it hit me… the software was written only 30 miles away but I was getting help from a person half way around the world. Distance and geography no longer mattered! OK, this is great! But what is it leading to? I am only able to communicate in English but, fortunately, the other person could use English as well as German which was his mother tongue. The internet has removed one barrier (distance) but with that comes the barrier of language. It seems that the internet is moving people in two quite different directions at the same time. The internet (initially based on English) is connecting people all around the world. This is further promoting a common language for people to use for communication. But it is also creating contact between people of different languages and creates a greater interest in multilingualism. A common language is great but in no way replaces this need. So the internet promotes both a common language *and* multilingualism. The good news is that it helps provide solutions. The increased interest and need is creating incentives for people around the world to create improved language courses and other assistance, and the internet is providing fast and inexpensive opportunities to make them available."

The internet could also be a tool to develop a "cultural identity". During the Symposium on Multimedia Convergence organized by the International Labor Office (ILO) in January 1997, Shinji Matsumoto, general secretary of the Musicians' Union of Japan (MUJ), explained: "Japan is quite receptive to foreign culture and foreign technology. (…) Foreign culture is pouring into Japan and, in fact, the domestic market is being dominated by foreign products. Despite this, when it comes to preserving and further developing Japanese culture, there has been insufficient support from the government. (…) With the development of information networks, the earth is getting smaller and it is wonderful to be able to make cultural exchanges across vast distances and to deepen mutual understanding among people. We have to remember to respect national cultures and social systems."

As the internet quickly spread worldwide, more and more people in the U.S. realized that, although English may stay the main international language for exchanges of all kinds, not everyone in the world reads English and, even so, people prefer to read information in their own language. To reach as large an audience as possible, companies and organizations needed to offer bilingual, trilingual, even multilingual websites, while adapting their content to a given audience. Hence the need for both internationalization and localization, which became a major trend in the following years, not only in the U.S. but in many countries, where foreign companies set up bilingual websites - in their language and in English - to reach a wider audience, and get more clients.

Translation software available on the web was far from perfect, but it was helpful because it was instantaneous and free, unlike a high-quality professional translation. In December 1997, AltaVista, a leading search engine, was the first to launch such software with Babel Fish - also called AltaVista Translation - which could translate webpages (up to three pages at the same time) from English into French, German, Italian, Portuguese or Spanish, and vice versa. The software was developed by Systran, a company specializing in machine translation. This initiative was followed by others, with free and/or paid versions on the web, developed by Alis Technologies, Globalink, Lernout & Hauspie, IBM (with the WebSphere Translation Server), Softissimo, Champollion, TMX or Trados.

Brian King, director of the WorldWide Language Institute (WWLI), brought up the concept of "linguistic democracy" in September 1998: "Whereas 'mother-tongue education' was deemed a human right for every child in the world by a UNESCO report in the early '50s, 'mother-tongue surfing' may very well be the Information Age equivalent. If the internet is to truly become the Global Network that it is promoted as being, then all users, regardless of language background, should have access to it. To keep the internet as the preserve of those who, by historical accident, practical necessity, or political privilege, happen to know English, is unfair to those who don't."

Jean-Pierre Cloutier was the editor of "Chroniques de Cybérie", a weekly French-language online report of internet news. He wrote in August 1999: "We passed a milestone this summer. Now more than half the users of the internet live outside the United States. Next year, more than half of all users will be non English-speaking, compared with only 5% five years ago. Isn't that great?"

The internet did pass this second milestone in summer 2000, with non-English-speaking users reaching 50%. As shown in the statistics of Global Reach, they were 52.5% in summer 2001, 57% in December 2001, 59.8% in April 2002, 64.4% in September 2003 (with 34.9% non-English-speaking Europeans and 29.4% Asians), and 64.2% in March 2004 (with 37.9% non-English-speaking Europeans and 33% Asians).

= From ASCII to Unicode

Used since the beginning of computing, ASCII (American Standard Code for Information Interchange) is a 7-bit coded character set for information interchange in English. It was published in 1968 by ANSI (American National Standards Institute), with updates in 1977 and 1986. The 7-bit plain ASCII, also called Plain Vanilla ASCII, is a set of 128 characters with 95 printable unaccented characters (A-Z, a-z, numbers, punctuation and basic symbols), i.e. the ones that are available on the English/American keyboard.
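The figures above (128 characters, 95 of them printable, none accented) can be checked with a few lines of Python. This is only a minimal sketch: the helper name `fits_in_ascii` is hypothetical and introduced here for illustration, and Python's own notion of "printable" is assumed to match the one used in the text.

```python
# Plain 7-bit ASCII: every code fits in 7 bits, so there are 2**7 = 128 codes.
ascii_chars = [chr(code) for code in range(2 ** 7)]

# str.isprintable() is False for control characters and DEL, and True for
# letters, digits, punctuation, basic symbols and the space character.
printable = [ch for ch in ascii_chars if ch.isprintable()]


def fits_in_ascii(text: str) -> bool:
    """Hypothetical helper: True if every character of `text` is 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(len(ascii_chars), len(printable))
print(fits_in_ascii("Gaelic"), fits_in_ascii("Sabhal M\u00f3r Ostaig"))
```

The second call returns False because the accented "ó" in "Sabhal Mór Ostaig" has no code in plain ASCII, which is the limitation the extensions described next were meant to address.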

With the use of other European languages, extensions of ASCII (also called ISO-8859 or ISO-Latin) were created as sets of 256 characters to add accented characters as found in French, Spanish and German, for example ISO 8859-1 (ISO-Latin-1) for French.
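As a rough illustration of what a 256-character extension can and cannot do, the sketch below uses Python's built-in "iso-8859-1" and "iso-8859-7" codecs. The helper name `fits_in_latin1` and the sample words are assumptions chosen for this example.

```python
# In ISO 8859-1 (ISO-Latin-1), an accented letter such as "é" fits in one byte.
accented = "\u00e9"  # the letter é
latin1_bytes = accented.encode("iso-8859-1")

# One byte gives 2**8 = 256 possible values; Python's ISO 8859-1 codec maps
# each of them to one character.
all_latin1 = bytes(range(256)).decode("iso-8859-1")

# The same byte means something else in another ISO 8859 part (here part 7,
# Greek), so a reader has to know which part was used to write the text.
same_byte_as_greek = latin1_bytes.decode("iso-8859-7")


def fits_in_latin1(text: str) -> bool:
    """Hypothetical helper: True if `text` can be written in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(latin1_bytes, len(all_latin1), same_byte_as_greek)
print(fits_in_latin1("Krey\u00f2l"), fits_in_latin1("\u65e5\u672c\u8a9e"))
```

An accented Western European word such as "Kreyòl" fits, while Japanese text does not, since the 256 slots are already used up by Western European characters.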

Yoshi Mikami, who lives in Fujisawa, Japan, launched the bilingual (Japanese, English) website "The Languages of the World by Computers and the Internet", also known as Logos Home Page or Kotoba Home Page, in late 1995. Yoshi was the co-author (with Kenji Sekine and Nobutoshi Kohara) of "The Multilingual Web Guide" (Japanese edition), a print book published by O'Reilly Japan in August 1997, and translated in 1998 into English, French and German.

Yoshi Mikami explained in December 1998: "My native tongue is Japanese. Because I had my graduate education in the U.S. and worked in the computer business, I became bilingual in Japanese and American English. I was always interested in languages and different cultures, so I learned some Russian, French and Chinese along the way. In late 1995, I created on the web The Languages of the World by Computers and the Internet and tried to summarize there the brief history, linguistic and phonetic features, writing system and computer processing aspects for each of the six major languages of the world, in English and Japanese. As I gained more experience, I invited my two associates to help me write a book on viewing, understanding and creating multilingual web pages, which was published in August 1997 as 'The Multilingual Web Guide', in a Japanese edition, the world's first book on such a subject."

Yoshi added in the same email interview: "Thousands of years ago, in Egypt, China and elsewhere, people were more concerned about communicating their laws and thoughts not in just one language, but in several. In our modern world, most nation states have each adopted one language for their own use. I predict greater use of different languages and multilingual pages on the internet, not a simple gravitation to American English, and also more creative use of multilingual computer translation. 99% of the websites created in Japan are written in Japanese."

Brian King, director of the WorldWide Language Institute (WWLI), explained in September 1998: "A pull from non-English-speaking computer users and a push from technology companies competing for global markets has made localization a fast growing area in software and hardware development. This development has not been as fast as it could have been. The first step was for ASCII to become Extended ASCII. This meant that computers could begin to start recognizing the accents and symbols used in variants of the English alphabet - mostly used by European languages. But only one language could be displayed on a page at a time. (…) The most recent development is Unicode. Although still evolving and only just being incorporated into the latest software, this new coding system translates each character into 16 bytes. Whereas 8-byte Extended ASCII could only handle a maximum of 256 characters, Unicode can handle over 65,000 unique characters and therefore potentially accommodate all of the world's writing systems on the computer. So now the tools are more or less in place. They are still not perfect, but at last we can at least surf the web in Chinese, Japanese, Korean, and numerous other languages that don't use the Western alphabet. As the internet spreads to parts of the world where English is rarely used - such as China, for example, it is natural that Chinese, and not English, will be the preferred choice for interacting with it. For the majority of the users in China, their mother tongue will be the only choice."

Ten years later, in 2008, 50% of all the documents available on the internet were encoded in Unicode, with the other 50% encoded in ASCII. ASCII is still very useful, especially the original 7-bit plain ASCII, because it can be read, written, copied and printed by any text editor or word processor, and it is the only format compatible with 99% of all hardware and software.

First published in January 1991, Unicode "provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language" (excerpt from the website). Originally conceived as a double-byte (16-bit) encoding, this platform-independent standard provides a basis for the processing, storage and interchange of text data in any language, and in any modern software and information technology protocols. Unicode is maintained by the Unicode Consortium, and is a component of the W3C (World Wide Web Consortium) specifications.
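The idea of "a unique number for every character" can be illustrated with Python strings, which are sequences of Unicode code points. The sketch below is illustrative only: the sample text and variable names are assumptions, and UTF-8 and UTF-16 are used simply as two common ways of storing those numbers as bytes.

```python
# One string mixing Scottish Gaelic, Haitian Creole and Japanese, something a
# single 256-character set could not hold.
sample = "G\u00e0idhlig, Krey\u00f2l, \u65e5\u672c\u8a9e"

# ord() returns the number Unicode assigns to a character, whatever the
# platform or program; it is conventionally written as U+ and hexadecimal.
code_points = {ch: f"U+{ord(ch):04X}" for ch in "A\u00e9\u65e5"}

# Two bytes (16 bits) can distinguish 2**16 different values.
sixteen_bit_values = 2 ** 16

# The same text stored with two encodings of Unicode.
utf8 = sample.encode("utf-8")       # one to four bytes per character
utf16 = sample.encode("utf-16-be")  # two bytes for each character used here

print(code_points)
print(sixteen_bit_values, len(sample), len(utf8), len(utf16))
print(utf8.decode("utf-8") == sample)
```

The value 2**16 = 65,536 corresponds to the "over 65,000 unique characters" mentioned in the 1998 quote above, and decoding the bytes gives back the original multilingual text unchanged.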

= Language dictionaries

Logos is an international translation company with headquarters in Modena, Italy. In 1997, Logos had 200 in-house translators in Modena and 2,500 free-lance translators worldwide, who processed around 200 texts per day. The company made a bold move, and decided to put on the web all the linguistic tools used by its translators, for the internet community to freely use them as well. The linguistic tools were the Logos Dictionary, a multilingual dictionary with 7 billion words (in fall 1998); the Logos Wordtheque, a multilingual library with 300 billion words extracted from translated novels, technical manuals and other texts; the Logos Linguistic Resources, a database of 500 glossaries; and the Logos Universal Conjugator, a database for verbs in 17 languages.

When interviewed by Annie Kahn on December 7, 1997 for the French daily Le Monde, Rodrigo Vergara, head of Logos, explained: "We wanted all our translators to have access to the same translation tools. So we made them available on the internet, and while we were at it we decided to make the site open to the public. This made us extremely popular, and also gave us a lot of exposure. This move has in fact attracted many customers, and also allowed us to widen our network of translators, thanks to contacts made in the wake of the initiative."

In the same article, Annie Kahn wrote: "The Logos site is much more than a mere dictionary or a collection of links to other online dictionaries. The cornerstone is the document search program, which processes a corpus of literary texts available free of charge on the web. If you search for the definition or the translation of a word ('didactique', for example), you get not only the answer sought, but also a quote from one of the literary works containing the word (in our case, an essay by Voltaire). All it takes is a click on the mouse to access the whole text or even to order the book, including in foreign translations, thanks to a partnership agreement with the famous online bookstore Amazon.com. However, if no text containing the required word is found, the program acts as a search engine, sending the user to other web sources containing this word. In the case of certain words, you can even hear the pronunciation. If there is no translation currently available, the system calls on the public to contribute. Everyone can make suggestions, after which Logos translators check the suggested translations they receive."

Robert Beard, a language teacher at Bucknell University (in Lewisburg, Pennsylvania), founded the website "A Web of Online Dictionaries" (WOD) in 1995, and then included it in a larger project, yourDictionary.com, that he cofounded in early 2000. He wrote in January 2000: "The new website is an index of 1,200+ dictionaries in more than 200 languages. Besides the WOD, the new website includes a word-of-the-day-feature, word games, a language chat room, the old 'Web of Online Grammars' (now expanded to include additional language resources), the 'Web of Linguistic Fun', multilingual dictionaries; specialized English dictionaries; thesauri and other vocabulary aids; language identifiers and guessers, and other features; dictionary indices. yourDictionary.com will hopefully be the premiere language portal and the largest language resource site on the web. It is now actively acquiring dictionaries and grammars of all languages with a particular focus on endangered languages. It is overseen by a blue ribbon panel of linguistic experts from all over the world."

yourDictionary.com wants to be the premier portal for all languages without any exception, and as such offers a specific section called Endangered Language Repository. Robert Beard explained in the same email interview: "Languages that are endangered are primarily languages without writing systems at all (only 1/3 of the world's 6,000+ languages have writing systems). I still do not see the web contributing to the loss of language identity and still suspect it may, in the long run, contribute to strengthening it. More and more Native Americans, for example, are contacting linguists, asking them to write grammars of their language and help them put up dictionaries. For these people, the web is an affordable boon for cultural expression."

The 6,700 languages of our planet are catalogued in "The Ethnologue: Languages of the World", an encyclopedia published by SIL International (SIL: Summer Institute of Linguistics). Barbara Grimes was the editor of the 8th to 14th editions, 1971-2000. She wrote in January 2000: "The Ethnologue is a catalog of the languages of the world, with information about where they are spoken, an estimate of the number of speakers, what language family they are in, alternate names, names of dialects, other socio-linguistic and demographic information, dates of published Bibles, a name index, a language family index, and language maps." The Ethnologue is freely available on the web. The print version and CD-ROM can be bought online.

= Minority languages

Caoimhín Ó Donnaíle teaches computing - through the Gaelic language - at the Institute Sabhal Mór Ostaig, located on the Island of Skye, in Scotland. He also maintains the bilingual (English, Gaelic) college website, which is the main site worldwide with information on Scottish Gaelic, as well as the webpage European Minority Languages, a list of minority languages in alphabetical order and by language family. He wrote in May 2001: "There has been a great expansion in the use of information technology in our college. Far more computers, more computing staff, flat screens. Students do everything by computer, use Gaelic spell-checking, and a Gaelic online terminology database. There are more hits on our website. There is more use of sound. Gaelic radio (both Scottish and Irish) is now available continuously worldwide via the internet. A major project has been the translation of the Opera web browser into Gaelic - the first software of this size available in Gaelic."

What about the internet and endangered languages? Caoimhín added: "I would emphasize the point that as regards the future of endangered languages, the internet speeds everything up. If people don't care about preserving languages, the internet and accompanying globalisation will greatly speed their demise. If people do care about preserving them, the internet will be a tremendous help."

Guy Antoine is the founder of Windows on Haiti, a reference website about Haitian culture. He wrote in November 1999: "In Windows on Haiti, the primary language of the site is English, but one will equally find a center of lively discussion conducted in 'Kreyòl'. In addition, one will find documents related to Haiti in French, in the old colonial Creole, and I am open to publishing others in Spanish and other languages. I do not offer any sort of translation, but multilingualism is alive and well at the site, and I predict that this will increasingly become the norm throughout the web."

Guy added in June 2001: "Kreyòl is the only national language of Haiti, and one of its two official languages, the other being French. It is hardly a minority language in the Caribbean context, since it is spoken by eight to ten million people. (…) I have taken the promotion of Kreyòl as a personal cause, since that language is the strongest of bonds uniting all Haitians, in spite of a small but disproportionately influential Haitian elite's disdainful attitude to adopting standards for the writing of Kreyòl and supporting the publication of books and official communications in that language. For instance, there was recently a two-week book event in Haiti's Capital and it was promoted as 'Livres en folie' ('A mad feast for books'). Some 500 books from Haitian authors were on display, among which one could find perhaps 20 written in Kreyòl. This is within the context of France's major push to celebrate Francophony among its former colonies. This plays rather well in Haiti, but directly at the expense of Creolophony. What I have created in response to those attitudes are two discussion forums on my website, Windows on Haiti, held exclusively in Kreyòl. One is for general discussions on just about everything but obviously more focused on Haiti's current socio-political problems. The other is reserved only to debates of writing standards for Kreyòl. Those debates have been quite spirited and have met with the participation of a number of linguistic experts. The uniqueness of these forums is their non-academic nature."

= Translations

Henk Slettenhaar is a professor in communication technologies at Webster University, Geneva, Switzerland. He has regularly insisted on the need for bilingual websites, in the original language and in English. He wrote in December 1998: "I see multilingualism as a very important issue. Local communities that are on the web should principally use the local language for their information. If they want to present it to the world community as well, it should be in English too. I see a real need for bilingual websites. I am delighted there are so many offerings in the original language now. I much prefer to read the original with difficulty than getting a bad translation."

Henk added in August 1999: "There are two main categories of websites in my opinion. The first one is the global outreach for business and information. Here the language is definitely English first, with local versions where appropriate. The second one is local information of all kinds in the most remote places. If the information is meant for people of an ethnic and/or language group, it should be in that language first, with perhaps a summary in English. We have seen lately how important these local websites are - in Kosovo and Turkey, to mention just the most recent ones. People were able to get information about their relatives through these sites."

Jean-Pierre Cloutier was the editor of "Chroniques de Cybérie", a weekly French-language online report of internet news. Jean-Pierre wrote in August 1999: "The web is going to grow in non-English-speaking regions. So we have to take into account the technical aspects of the medium if we want to reach these 'new' users. I think it is a pity there are so few translations of important documents and essays published on the web - from English into other languages and vice versa. (…) In the same way, the recent spreading of the internet in new regions raises questions which would be good to read about. When will Spanish-speaking communication theorists and those speaking other languages be translated?"

Marcel Grangier is the head of the French Section of the Swiss Federal Government's Central Linguistic Services, which means he is in charge of organizing translations into French for the Swiss government. He wrote in January 1999: "We can see multilingualism on the internet as a happy and irreversible inevitability. So we have to laugh at the doomsayers who only complain about the supremacy of English. Such supremacy is not wrong in itself, because it is mainly based on statistics (more PCs per inhabitant, more people speaking English, etc.). The answer is not to 'fight' English, much less whine about it, but to build more sites in other languages. As a translation service, we also recommend that websites be multilingual. The increasing number of languages on the internet is inevitable and can only boost multicultural exchanges. For this to happen in the best possible circumstances, we still need to develop tools to improve compatibility. Fully coping with accents and other characters is only one example of what can be done."