Last modified: December 21, 2002
Text Encoding Initiative (TEI)

"Initially launched in 1987, the TEI is an international and interdisciplinary standard that helps libraries, museums, publishers, and individual scholars represent all kinds of literary and linguistic texts for online research and teaching, using an encoding scheme that is maximally expressive and minimally obsolescent."

[June 15, 2002] In June 2002 the Text Encoding Initiative (TEI) Consortium announced the publication of the P4 edition of the TEI Guidelines. "The Consortium, now in its second year, is an international non-profit corporation set up to maintain and develop the TEI system, which has become the de facto standard for scholarly work with digital text since its first publication in 1994. The launch of a fully XML-compliant version of the TEI Guidelines is a significant advance, placing the TEI firmly in the mainstream of current digital library and World Wide Web developments. The new edition has been available online for a few months, and will continue to be so, but the print edition now available from the University of Virginia Press marks a new milestone in the history of this long-standing exercise in scholarly communication and international co-operation. In simple terms, the TEI Guidelines define a language for describing how texts are constructed and propose names for their components. By defining a standard set of names the Guidelines make it possible for different computer representations of texts to be combined into vast databases, and they also provide a common language for scholars wishing to work collaboratively. There are many such standard vocabularies in the industrial world -- in banking, in aircraft maintenance, or in chemical modelling, for example. The TEI's achievement has been to try to do the same thing for textual and linguistic data -- both for those working with the written culture of the past and for those studying the development of language itself. Membership in the TEI Consortium has climbed steadily during its first year of operation, standing at 56 members worldwide in May 2002, ranging from small university research projects to major academic libraries and institutions. The consortium offers a range of membership benefits including participation in TEI elections, special access to training, consultation on grant proposals, and free or discounted copies of the TEI Guidelines."

[January 09, 2002] Lou Burnard invites comment on the publication of updated XML DTDs for the Text Encoding Initiative Guidelines. Based upon extensive public review, the XML DTDs have been improved and corresponding revised documentation has been created in HTML and PDF format for the TEI Guidelines. Approval of the new P4 edition by the TEI Technical Council and final publication are expected in the near future. Already widely adopted for use in digital library projects, the TEI Guidelines are "intended for use in interchange between individuals and research groups using different programs and computer systems over a broad range of applications... The Guidelines apply to texts in any natural language, of any date, in any literary genre or text type, without restriction on form or content. They treat both continuous materials ('running text') and discontinuous materials such as dictionaries and linguistic corpora. The primary goal of the P4 revision has been to make available a new and corrected version of the TEI Guidelines which: (1) is expressed in XML and conforms to a TEI-conformant XML DTD; (2) generates a set of DTD fragments that can be combined together to form either SGML or XML document type definitions; (3) corrects blatant errors, typographical mishaps, and other egregious editorial oversights; (4) can be processed and maintained using readily available XML tools instead of the special-purpose ad hoc software originally used for TEI P3. A second major design goal of this revision has been to ensure that the DTD fragments generated would not break existing documents: in other words, that any document conforming to the original TEI P3 SGML DTD would also conform to the new XML version of it. Although full backwards compatibility cannot be guaranteed, we believe our implementation is consistent with that goal."

[August 01, 2001] Text Encoding Initiative Consortium Releases P4 Draft Guidelines in XML and SGML. TEI editors Lou Burnard and Steve DeRose have announced the official release of the version 4 draft Guidelines for Electronic Text Encoding and Interchange. The third edition of the Guidelines, known as 'P3', has been edited by participants in the Text Encoding Initiative Consortium (TEI-C); the third edition "has been heavily used since its release in April of 1994 for developing richly encoded and highly portable electronic editions of major works in philosophy, linguistics, history, literary studies, and many other disciplines. The fourth edition, 'P4', will be fully compatible with XML, as well as remaining compatible with SGML (XML's predecessor and the syntactic basis for P3). XML-compatible versions of the TEI DTDs have been available for some time by means of an automatic generation process using the TEI 'pizza chef' tool on the project's website. The first stage in the production of P4 has been to remove the need for this process; accordingly, a preliminary set of dual-capability XML or SGML DTDs was made available for testing at the ACH-ALLC Conference in New York in June. The next stage was to apply a series of systematic changes to the associated documentation, which is now complete: the results may be read online." The TEI editors invite participation in public review of the new P4 draft Guidelines.

[June 30, 1999] In June 1999, the Text Encoding Initiative (TEI) entered a significant new phase with the official publication of the XML DTD for TEI Lite, available with supporting resources online. The Text Encoding Initiative has sponsored a major effort to "develop guidelines for the preparation and interchange of electronic texts for scholarly research, and to satisfy a broad range of uses by the language industries more generally." The published TEI Guidelines have gone through three major editions under the editorship of C. Michael Sperberg-McQueen and Lou Burnard, and the current TEI-P3 print volumes, TEI Guidelines for Electronic Text Encoding and Interchange, are also publicly available in SGML format. The TEI Guidelines have been used for SGML encoding in some sixty-nine (69) significant projects worldwide.

Though the TEI is a large and complex specification, a unique tool known as the online Pizza Chef will "help you design your own TEI-conformant document type definition (DTD). The TEI Guidelines define several hundred elements and associated attributes, which can be combined to make many different DTDs, suitable for many different purposes, either simple or complex. With the aid of the Pizza Chef, you can build a DTD that contains just the elements you want, suitable for use with any XML processing system. [To use the tool] you need to understand a little about how the TEI DTD is organized. In particular, you need to understand that the TEI scheme is organized into base and additional tagsets (groups of elements), and that each element in a tagset can be suppressed, or redefined... First, decide whether you need to use one base tagset or several base tagsets (Prose, Verse, Drama, Speech, Dictionaries, Terminology, General). Whichever base you use, you can add as many additional tagsets as you want. There are twelve to choose from. If you wish, your DTD can include declarations for one or more of the ISO public entity sets. If you want to discard or modify elements from the selected tagsets making up your DTD you can do this... you pass the names of your modification files to the pizza chef, along with the tagsets you chose originally... [press the button and] build your personalized DTD..."
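
The choices described above (one or more base tagsets, any number of the twelve additional tagsets, optional ISO entity sets, optional modification files) can be pictured as a small record that is checked before a DTD is built. The following is only a sketch in Python: the class `DtdRecipe`, its field names, and the sample labels are hypothetical and are not part of the Pizza Chef or of any TEI software; only the list of base tagset names and the limit of twelve additional tagsets come from the description quoted above.

```python
from dataclasses import dataclass, field
from typing import List

# Base tagset names as listed in the Pizza Chef description quoted above.
BASE_TAGSETS = ("Prose", "Verse", "Drama", "Speech",
                "Dictionaries", "Terminology", "General")

# The description says there are twelve additional tagsets to choose from.
# Their names are not given there, so this sketch only checks the count.
MAX_ADDITIONAL = 12


@dataclass
class DtdRecipe:
    """Hypothetical record of the selections a user makes for a custom DTD."""
    bases: List[str]
    additional: List[str] = field(default_factory=list)
    entity_sets: List[str] = field(default_factory=list)
    modification_files: List[str] = field(default_factory=list)

    def validate(self) -> "DtdRecipe":
        """Check the selections against the rules stated in the description."""
        if not self.bases:
            raise ValueError("choose at least one base tagset")
        unknown = [b for b in self.bases if b not in BASE_TAGSETS]
        if unknown:
            raise ValueError(f"unknown base tagset(s): {unknown}")
        if len(set(self.additional)) > MAX_ADDITIONAL:
            raise ValueError("at most twelve additional tagsets are available")
        return self

    def summary(self) -> str:
        """Return a one-line, human-readable description of the recipe."""
        parts = ["base: " + ", ".join(self.bases)]
        if self.additional:
            parts.append("additional: " + ", ".join(self.additional))
        if self.entity_sets:
            parts.append("entity sets: " + ", ".join(self.entity_sets))
        if self.modification_files:
            parts.append("modifications: " + ", ".join(self.modification_files))
        return "; ".join(parts)


# Usage: a prose-based recipe with one placeholder additional tagset.
recipe = DtdRecipe(bases=["Prose"], additional=["ExampleAdditionalTagset"])
print(recipe.validate().summary())
```

The sketch does not generate a DTD; it only makes explicit which inputs the tool asks for and which combinations the description rules out.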

The TEI Lite XML DTD's public identifier is "-//TEI//DTD TEI Lite XML ver. 1.0//EN" (or: "-//TEI//DTD TEI Lite XML ver. 1//EN"). The principal resources supporting this XML release of the TEI DTD are described in the recent document 'TEI, SGML and XML Resources.'
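
As an illustration of where such a public identifier appears, the sketch below parses a tiny sample document with Python's standard `xml.dom.minidom` module and reads the identifier back from its DOCTYPE declaration. The sample document, the system identifier `teixlite.dtd`, and the helper names `doctype_info` and `uses_tei_lite_xml` are assumptions made for this example. The parser used here does not validate, so the DTD file is never fetched or checked.

```python
from xml.dom import minidom

# The two public identifiers given in the text above.
TEI_LITE_PUBLIC_IDS = {
    "-//TEI//DTD TEI Lite XML ver. 1.0//EN",
    "-//TEI//DTD TEI Lite XML ver. 1//EN",
}

# Hypothetical sample document. "teixlite.dtd" is an assumed local file name
# for the system identifier; it is not read by the non-validating parser.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN" "teixlite.dtd">
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample text</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>No source; written for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Hello, TEI.</p></body></text>
</TEI.2>
"""


def doctype_info(xml_text):
    """Return the DOCTYPE name and identifiers, or None if there is no DOCTYPE."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        return None
    return {
        "root": doctype.name,
        "public_id": doctype.publicId,
        "system_id": doctype.systemId,
    }


def uses_tei_lite_xml(xml_text):
    """True if the document declares one of the TEI Lite XML public identifiers."""
    info = doctype_info(xml_text)
    return info is not None and info["public_id"] in TEI_LITE_PUBLIC_IDS


print(doctype_info(SAMPLE))
print(uses_tei_lite_xml(SAMPLE))
```

Checking the public identifier in this way only shows what a document claims about itself; confirming that it actually conforms to the DTD would require a validating parser.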

Principal References

  • TEI Consortium web site
  • The TEI Guidelines. Overview document.
  • Projects Using the TEI Guidelines
  • PizzaChef for creating TEI XML DTDs
  • TEI Guidelines in print: TEI P4: Guidelines for Electronic Text Encoding and Interchange. Edited by C.M. Sperberg-McQueen and Lou Burnard. Text Encoding Initiative Consortium. XML Version: Oxford, Providence, Charlottesville, Bergen. March 2002. Published for the TEI Consortium by the Humanities Computing Unit, University of Oxford, 2002. Distributed by the University of Virginia Press. XML-compatible edition prepared by Syd Bauman, Lou Burnard, Steve DeRose, and Sebastian Rahtz. ISBN: 0-952-33013-X. Printed in two parts. Volume One: Chapters 1-23, pages i-xviii, 1-572. Volume Two: Chapters 24-36, Index, Appendices, pages 573-1067. Available for purchase.
  • Note that the standard TEI DTDs are generated and maintained using a "literate programming style" system (originally) called ODD ['One Document Does It All']. For details, see the excerpted comments from TEI List postings of Sebastian Rahtz and Lou Burnard.
  • TEI Tutorials
  • TEI News Page
  • The TEI FAQ document
  • About the TEI Consortium
  • [1997-1999] Previous TEI database entry in the SGML/XML Web Page. This document section (though outdated) references software especially applicable to the creation/use of TEI-encoded texts.


2002-12-21 Note: This section under construction/revision

Articles, Papers, News

[This section under revision.]

Early History of TEI XML Version

[1998 description] C. Michael Sperberg-McQueen (University of Illinois) is both an editor of the TEI Project and a co-editor of the XML specification. The TEI Extended Pointer language plays a significant role in the design of XLink and XPointer, the two major components in XML's linking language. The W3C's TEI Lite and Sweb DTDs, the latter being an effort largely of Michael Sperberg-McQueen. While the TEI P3 Guidelines now provide DTDs for SGML encoding, an effort is under way to make the Guidelines accessible to XML users as well. The TEI has recently chartered a workgroup on architectural issues, chaired by Frank Tompa; one of its specific charges is the development of an XML version of the full TEI DTD. A conference sponsored by the Digital Library Federation is to be held in the summer of 1998 at the Library of Congress, Washington, DC; one of the goals is to "explore the impact of Extensible Markup Language (XML), and XML-conformant TEI, on digital library efforts."


  • From TEI Editor, C. M. Sperberg-McQueen. Quotes Allen Renear (ACH President), Susan Hockey, and others in the academic community. Also: TEI Web site

  • Unofficial work on an XML version of the TEI Lite DTD

  • Conference:

  • TEI, SGML and XML Resources

  • [May 13, 1999] Computers and the Humanities [The Official Journal of The Association for Computers and the Humanities.] Volume 33 Nos. 1-2, April 1999. ISSN: 0010-4817. Special Double Issue: Tenth Anniversary of the Text Encoding Initiative. Edited by Nancy Ide [Dept. of Computer Science, Vassar College, USA] and Dan Greenstein [Arts and Humanities Data Services, King's College, UK]. This issue contains an article by Steve DeRose, "XML and the TEI" (pages 11-30). Also: Jon Bosak, "XML Ubiquity and the Scholarly Community" (pages 199-206). See the Table of Contents

  • [May 13, 1999] Lou Burnard wrote on TEI-L, 11-May-1999, in response to a question by Fotis Jannidis ("...Does anybody know whether the long announced work on a conversion/adaption of the TEI dtds to XML dtds has begun, whether a working group has started on this task or whether P. Bonhomme's trial version is still the only thing around?..."): "Michael [Sperberg-McQueen] and I have been working on this for the last few months. We have a working draft, almost complete, of a set of TEI extension files which will enable us to generate XML-compatible versions of any view of the TEI dtd. The first thing we produce with it will be a real XML version of TEI Lite (Patrice B.'s version is only a toy) and we hope to have this available by the ACH-ALLC conference next month [ = June 1999]." On the unofficial work, see the local archive copy.

