Workshop on Electronic Texts: Proceedings, 9-10 June 1992 - Library of Congress

SPERBERG-McQUEEN returned at length to the issue of images as simulacra for the text, in order to reiterate his belief that in the long run more than images of pages of particular editions of the text are needed, because just as second-generation photocopies and second-generation microfilm degenerate, so second-generation representations tend to degenerate, and one tends to overstress some relatively trivial aspects of the text such as its layout on the page, which is not always significant, despite what the text critics might say, and slight other pieces of information such as the very important lexical ties between the English and Latin versions of Comenius's bilingual text, for example. Moreover, in many crucial respects it is easy to fool oneself concerning what a scanned image of the text will accomplish. For example, in order to study the transmission of texts, information concerning the text carrier is necessary, which scanned images simply do not always handle. Further, even the high-quality materials being produced at Cornell use much of the information that one would need if studying those books as physical objects. It is a choice that has been made. It is an arguably justifiable choice, but one does not know what color those pen strokes in the margin are or whether there was a stain on the page, because it has been filtered out. One does not know whether there were rips in the page because they do not show up, and on a couple of the marginal marks one loses half of the mark because the pen is very light and the scanner failed to pick it up, and so what is clearly a checkmark in the margin of the original becomes a little scoop in the margin of the facsimile. Standard problems for facsimile editions, not new to electronics, but also true of light-lens photography, and are remarked here because it is important that we not fool ourselves that even if we produce a very nice image of this page with good contrast, we are not replacing the manuscript any more than microfilm has replaced the manuscript.

The TEI comes from the research community, where its first allegiance lies, but it is not just an academic exercise. It has relevance far beyond those who spend all of their time studying text, because one's model of text determines what one's software can do with a text. Good models lead to good software. Bad models lead to bad software. That has economic consequences, and it is these economic consequences that have led the European Community to help support the TEI, and that will lead, SPERBERG-McQUEEN hoped, some software vendors to realize that if they provide software with a better model of the text they can make a killing.

******

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ DISCUSSION * Implications of different DTDs and tag sets * ODA versus SGML * +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

During the discussion that followed, several additional points were made. Neither AAP (i.e., Association of American Publishers) nor CALS (i.e., Computer-aided Acquisition and Logistics Support) has a document-type definition for ancient Greek drama, although the TEI will be able to handle that. Given this state of affairs and assuming that the technical-journal producers and the commercial vendors decide to use the other two types, then an institution like the Library of Congress, which might receive all of their publications, would have to be able to handle three different types of document definitions and tag sets and be able to distinguish among them.

Office Document Architecture (ODA) has some advantages that flow from its tight focus on office documents and clear directions for implementation. Much of the ODA standard is easier to read and clearer at first reading than the SGML standard, which is extremely general. What that means is that if one wants to use graphics in TIFF and ODA, one is stuck, because ODA defines graphics formats while TIFF does not, whereas SGML says the world is not waiting for this work group to create another graphics format. What is needed is an ability to use whatever graphics format one wants.

The TEI provides a socket that allows one to connect the SGML document to the graphics. The notation that the graphics are in is clearly a choice that one needs to make based on her or his environment, and that is one advantage. SGML is less megalomaniacal in attempting to define formats for all kinds of information, though more megalomaniacal in attempting to cover all sorts of documents. The other advantage is that the model of text represented by SGML is simply an order of magnitude richer and more flexible than the model of text offered by ODA. Both offer hierarchical structures, but SGML recognizes that the hierarchical model of the text that one is looking at may not have been in the minds of the designers, whereas ODA does not.

ODA is not really aiming for the kind of document that the TEI wants to encompass. The TEI can handle the kind of material ODA has, as well as a significantly broader range of material. ODA seems to be very much focused on office documents, which is what it started out being called— office document architecture.

******

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ CALALUCA * Text-encoding from a publisher's perspective * Responsibilities of a publisher * Reproduction of Migne's Latin series whole and complete with SGML tags based on perceived need and expected use * Particular decisions arising from the general decision to produce and publish PLD * +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++