Academic Tutorials

Comparing W3C XML Schemas and Document Type Definitions (DTDs)

Added 30 Jul 2008

Much of the point of using XML as a data representation format is the possibility of specifying structural requirements for documents: rules for exactly what types of content and subelements may occur within elements (and in what order, with what cardinality, and so on). In traditional SGML circles, document rules have been expressed as DTDs, and indeed the W3C XML 1.0 Recommendation explicitly provides for DTDs. However, DTDs cannot express some fairly common constraints. Their main limitation is the poverty of their data typing: you can specify that an element must contain PCDATA, but not that it must contain, for example, a nonNegativeInteger. As a side matter, DTDs do not make subelement cardinality easy to specify: you can compactly say "one or more" of a subelement, but saying "between seven and twelve" is, while possible, excessively verbose, or even outright contorted.
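To make the cardinality point concrete, here is a minimal Python sketch that generates the DTD content model for a bounded repeat and, for comparison, the corresponding W3C XML Schema element particle. The function names, the `list` and `item` element names, and the `xsd:` prefix are all illustrative assumptions, not anything taken from a real vocabulary. The optional occurrences are nested rather than written as a flat run of `item?` so that the content model stays deterministic, which XML 1.0 asks of DTD content models.

```python
def dtd_bounded_model(name, low, high):
    """Return a DTD content model allowing `low` to `high` occurrences of `name`."""
    if not 0 <= low <= high or high == 0:
        raise ValueError("need 0 <= low <= high and high >= 1")
    # Nest the optional tail, e.g. (a, (a, a?)?)? for three optional occurrences,
    # so a parser never has to guess which optional slot an element fills.
    optional = ""
    for _ in range(high - low):
        optional = f"({name}, {optional})?" if optional else f"{name}?"
    parts = [name] * low + ([optional] if optional else [])
    return "(" + ", ".join(parts) + ")"


def xsd_bounded_element(name, low, high):
    """Return the schema particle for the same rule (a fragment for an xsd:sequence)."""
    return f'<xsd:element ref="{name}" minOccurs="{low}" maxOccurs="{high}"/>'


# "Between seven and twelve" items inside a hypothetical <list> element.
print(f"<!ELEMENT list {dtd_bounded_model('item', 7, 12)}>")
print(xsd_bounded_element("item", 7, 12))
```

For seven to twelve items, the DTD declaration comes out as `<!ELEMENT list (item, item, item, item, item, item, item, (item, (item, (item, (item, item?)?)?)?)?)>`, while the schema version stays a single short line however wide the range is.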

In answer to various limitations of DTDs, some XML users have called for alternative ways of specifying document rules. It has always been possible to examine conditions in XML documents programmatically, but it is often preferable to impose the more rigid standard that, essentially, "a document not meeting a set of formal rules is invalid." W3C XML Schemas are one major answer to these calls (but not the only schema option out there). Steven Holzner, in Inside XML, has a characterization of XML schemas that is worth repeating: "Over time, many people have complained to the W3C about the complexity of DTDs and have asked for something simpler. W3C listened, assigned a committee to work on the problem, and came up with a solution that is much more complex than DTDs ever were" (p. 199). Holzner continues, and almost all XML programmers will agree (myself included), that despite their complexity, W3C XML Schemas provide a lot of important capabilities and are worth using for many classes of validation rules.
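The programmatic examination mentioned above might look something like the following sketch, which uses only Python's standard library to test a data type rule that a DTD cannot state. The `order` and `quantity` element names and the helper function are hypothetical, and the regular expression is my own approximation of the lexical form of nonNegativeInteger (optional plus sign and digits, or a signed zero), not a substitute for a schema validator.

```python
import re
import xml.etree.ElementTree as ET

# Approximate lexical form of a nonNegativeInteger: "+5", "5", "0", "-0".
NON_NEGATIVE_INTEGER = re.compile(r"\+?[0-9]+|-0+")


def invalid_quantities(xml_text, tag="quantity"):
    """Return the text of every `tag` element that is not a nonNegativeInteger."""
    root = ET.fromstring(xml_text)
    bad = []
    for element in root.iter(tag):
        text = (element.text or "").strip()  # surrounding whitespace is ignored
        if not NON_NEGATIVE_INTEGER.fullmatch(text):
            bad.append(text)
    return bad


# Hypothetical document: a DTD could only say each <quantity> holds PCDATA.
ORDER = (
    "<order>"
    "<quantity>3</quantity>"
    "<quantity>-2</quantity>"
    "<quantity>many</quantity>"
    "</order>"
)

print(invalid_quantities(ORDER))
```

The drawback is the one described above: the rule lives in application code, so nothing about the document itself declares it invalid.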

At least two fundamental, conceptual wrinkles remain for any "schemas everywhere" goal. The first issue is that the W3C XML Schema Candidate Recommendation, which just ended its review period on December 15, 2000, does not include any provision for entities; by extension, this includes parameter entities. The second issue is that despite their enhanced expressiveness, there are still many document rules that you cannot express in XML schemas (some proposals offer to use XSLT to enhance validation expressiveness, but other means are also possible and in use). In other words, schemas cannot quite do everything DTDs have long been able to do, and they also cannot express a whole set of further rules one might wish to impose on documents. At a more pragmatic level, tools for working with XML schemas are less mature than those for working with DTDs (especially regarding validation, which is the core issue).
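The entity issue is easy to see in a small experiment. The sketch below, again using only Python's standard library, parses a document whose internal DTD subset declares a general entity, then parses the same body with the declaration removed. The `memo` element, the `company` entity, and the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares an entity that the body then references.
WITH_DTD = """<!DOCTYPE memo [
  <!ENTITY company "Example Corp">
]>
<memo>Issued by &company;.</memo>"""

# Same body, but no DTD: nothing declares &company; any more.
WITHOUT_DTD = "<memo>Issued by &company;.</memo>"


def memo_text(xml_text):
    """Parse the document and return the root element's text."""
    return ET.fromstring(xml_text).text


def parses(xml_text):
    """Report whether the document is accepted by the parser at all."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(memo_text(WITH_DTD))   # entity expanded from the DTD declaration
print(parses(WITHOUT_DTD))   # False: the entity reference is undefined
```

Because the schema language has no equivalent of `<!ENTITY ...>`, a document that relies on entities still needs at least a small DTD even when its structure is described by a schema.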

The whole state of XML document validation rules remains messy. Unfortunately, I cannot predict how everything will eventually shake out. (For a summary of when DTDs probably make sense to use, see the sidebar "When to use DTDs.") In the meantime, let's look at some specifics of what DTDs and XML schemas are capable of expressing.