From this standpoint, I have much less faith in mechanized tools of language translation, which render words and phrases but do a poor job of conveying the soul of a people. Who are the Haitian people, for instance, without "Kreyòl" (Creole for the non-initiated), the language that has evolved and bound various African tribes transplanted in Haiti during the slavery period? It is the most palpable exponent of commonality that defines us as a people. However, it is primarily a spoken language, not a widely written one. I see the web changing this situation more so than any traditional means of language dissemination. In Windows on Haiti, the primary language of the site is English, but one will equally find a center of lively discussion conducted in "Kreyòl". In addition, one will find documents related to Haiti in French, in the old colonial creole, and I am open to publishing others in Spanish and other languages. I do not offer any sort of translation, but multilingualism is alive and well at the site, and I predict that this will increasingly become the norm throughout the web."

ENCODING: FROM ASCII TO UNICODE

= [Quote]

Brian King, director of the WorldWide Language Institute (WWLI), explained in September 1998: "The first step was for ASCII to become Extended ASCII. This meant that computers could begin to start recognizing the accents and symbols used in variants of the English alphabet — mostly used by European languages. But only one language could be displayed on a page at a time. (…) The most recent development is Unicode. Although still evolving and only just being incorporated into the latest software, this new coding system translates each character into 16 bytes. Whereas 8-byte extended ASCII could only handle a maximum of 256 characters, Unicode can handle over 65,000 unique characters and therefore potentially accommodate all of the world's writing systems on the computer. So now the tools are more or less in place. They are still not perfect, but at last we can at least surf the web in Chinese, Japanese, Korean, and numerous other languages that don't use the Western alphabet. As the internet spreads to parts of the world where English is rarely used - such as China, for example, it is natural that Chinese, and not English, will be the preferred choice for interacting with it. For the majority of the users in China, their mother tongue will be the only choice."

= Encoding in Project Gutenberg

Used since the beginning of computing, ASCII (American Standard Code for Information Interchange) is a 7-bit coded character set for information interchange in English. It was published in 1968 by ANSI (American National Standards Institute), with an update in 1977 and 1986. The 7-bit plain ASCII, also called Plain Vanilla ASCII, is a set of 128 characters with 95 printable unaccented characters (A-Z, a-z, numbers, punctuation and basic symbols), i.e. the ones that are available on the English/American keyboard. With the use of other European languages, extensions of ASCII (also called ISO-8859 or ISO- Latin) were created as sets of 256 characters to add accented characters as found in French, Spanish and German, for example ISO 8859-1 (ISO-Latin-1) for French.

Created by Michael Hart in July 1971, Project Gutenberg was the first information provider on the internet. Michael's purpose was to digitize as many literary texts as possible, and to offer them for free in a digital library open to anyone. Michael explained in August 1998: "We consider etext to be a new medium, with no real relationship to paper, other than presenting the same material, but I don't see how paper can possibly compete once people each find their own comfortable way to etexts, especially in schools."

Whether digitized years ago or now, all Project Gutenberg books are created in 7-bit plain ASCII, called Plain Vanilla ASCII. When 8-bit ASCII is used for books with accented characters like French or German, Project Gutenberg also produces a 7-bit ASCII version with the accents stripped. (This doesn't apply for languages that are not "convertible" in ASCII, like Chinese, encoded in Big-5.)

Project Gutenberg sees Plain Vanilla ASCII as the best format by far, and calls it "the lowest common denominator". It can be read, written, copied and printed by any simple text editor or word processor on any electronic device. It is the only format compatible with 99% of hardware and software. It can be used as it is or to create versions in many other formats. It will still be used while other formats will be obsolete, or are already obsolete, like formats of a few short-lived reading devices launched since 1999. It is the assurance collections will never be obsolete, and will survive future technological changes. The goal is to preserve the texts not only over decades but over centuries.

Project Gutenberg also publishes ebooks in well-known formats like HTML, XML or RTF. There are Unicode files too. Any other format provided by volunteers (PDF, LIT, TeX and many others) is usually accepted, as long as they also supply an ASCII version where possible.