= What has happened since our last interview?
I see a continued increase in small companies using language technology in one way or another: either to provide search, or translation, or reports, or some other communication function. The number of niches in which language technology can be applied continues to surprise me: from stock reports and updates to business-to-business communications to marketing…
With regard to research, the main breakthrough I see was led by a colleague at ISI (I am proud to say), Kevin Knight. Last summer at Johns Hopkins University in Maryland, a team of scientists and students developed a faster and otherwise improved version of a method originally developed (and kept proprietary) by IBM about 12 years ago. This method allows one to create a machine translation (MT) system automatically, as long as one gives it enough bilingual text. Essentially, the method finds all correspondences in words and word positions across the two languages and then builds up large tables of rules for what gets translated to what, and how it is phrased.
Although the output quality is still low (no one would consider this a final product, and no one would use the translated output as is), the team built a (low-quality) Chinese-to-English MT system in 24 hours. That is a phenomenal feat: this has never been done before. (Of course, say the critics: you need something like 3 million sentence pairs, which you can only get from the parliaments of Canada, Hong Kong, or other bilingual countries; and of course, they say, the quality is low. But the fact is that more bilingual and semi-equivalent text is becoming available online every day, and the quality will keep improving to at least the current levels of MT engines built by hand. Of that I am certain.)
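To make the idea of "finding correspondences and building tables" concrete, here is a minimal sketch. It assumes the simplest form of such a method, a word-level translation table estimated by expectation-maximization in the style usually called IBM Model 1. It is not the Johns Hopkins team's system: it ignores word positions and phrasing entirely, and the three-sentence German-English corpus and the function names are invented for illustration.

```python
from collections import defaultdict

# Invented toy corpus: (source sentence, target sentence) as word lists.
TOY_CORPUS = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]

def train_translation_table(sentence_pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word)."""
    target_vocab = {w for _, tgt in sentence_pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start with equal probability for every word pair seen in the same sentence pair.
    t = {(tw, sw): uniform
         for src, tgt in sentence_pairs for tw in tgt for sw in src}
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times sw is translated as tw
        total = defaultdict(float)  # expected number of times sw is translated at all
        for src, tgt in sentence_pairs:
            for tw in tgt:
                # Share each target word among the source words of its sentence,
                # in proportion to the current table.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # Re-estimate the table from the expected counts.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t

def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)

if __name__ == "__main__":
    table = train_translation_table(TOY_CORPUS)
    for word in ["das", "Haus", "Buch", "ein"]:
        print(word, "->", best_translation(table, word))
```

No dictionary is supplied anywhere: the table emerges only from which words keep appearing together across sentence pairs, which is why the amount of bilingual text matters so much.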
Other developments are less spectacular. There's a steady improvement in the performance of systems that can decide whether an ambiguous word such as "bat" means "flying mammal" or "sports tool" or "to hit"; there is solid work on cross-language information retrieval (which you will soon see when you are able to find Chinese and French documents on the Web even though you type in English-only queries); and there is some rather rapid development of systems that answer simple questions automatically (rather like the popular web system AskJeeves, but this time done by computers, not humans). These systems refer to a large collection of text to find "factoids" (not opinions or causes or chains of events) in response to questions such as "what is the capital of Uganda?" or "how old is President Clinton?" or "who invented the xerox process?", and they do so rather better than I had expected.
= What do you think about e-books?
E-books, to me, are a non-starter. Even more than seeing a concert live or a film at a cinema, I like the physical experience of holding a book in my lap and enjoying its smell and feel and heft. Concerts on TV, films on TV, and e-books lose some of the experience; and with books particularly it is a loss I do not want to accept. After all, it's much easier and cheaper to get a book in my own purview than a concert or cinema. So I wish the e-book makers well, but I am happy with paper. And I don't think I will end up in the minority anytime soon: I am much less afraid of books vanishing than I once was of cinemas vanishing.
= What is your definition of cyberspace?
I define cyberspace as the totality of information that we can access via the Internet and computer systems in general. It is not, of course, a space, and it differs from a library in interesting ways. For example, soon my fridge, my car, and I myself will be "known" to cyberspace, and anyone with the appropriate access permission (and interest) will be able to find out what exactly I have in my fridge and how fast my car is going (and how long before it needs new shock absorbers) and what I am looking at now. In fact, I expect that advertisements will change their language and perhaps even pictures and layout to suit my knowledge and tastes as I walk by, simply by recognizing that "here comes someone who speaks primarily English and lives in Los Angeles and makes $X per year". All this behaviour will be made possible by the dynamically updatable nature of cyberspace (in contrast to a library), and the fact that computer chips are still shrinking in size and in price. So just as today I walk around in "socialspace" (a web of social norms, expectations, and laws), tomorrow I will be walking around in an additional cyberspace of information that will support me (sometimes) and restrict me (other times) and delight me (I hope often) and frustrate me (I am sure).
= And your definition of the information society?