The Evolution of XML Validation – From DTDs to XML Schema, Relax NG, and Modern Alternatives
1. Introduction
XML is widely used for representing structured data, but well-formedness (syntactically correct markup) is not enough for real-world applications. Validation checks that a document also follows a defined structure and set of rules. Over time, validation mechanisms evolved from DTDs to more expressive schema languages such as XML Schema (XSD), Relax NG, and beyond.
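To make the gap between well-formedness and validity concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, which is a non-validating parser. The helper name `is_well_formed` is an illustrative assumption, not part of any standard API.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML; says nothing about validity."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<book><title>XML</title></book>"))    # True
print(is_well_formed("<book><title>XML</book>"))             # False: mismatched tag
print(is_well_formed("<book><price>cheap</price></book>"))   # True
```

The last document parses without complaint even though an application would probably reject a price of "cheap". Catching that kind of problem is the job of a schema.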
2. The Early Days – DTDs (Document Type Definitions)
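- Strengths:
  - First standardized mechanism for XML validation.
  - Simple syntax.
  - Widely supported in XML parsers.
- Limitations:
  - No data types for element content beyond text; attributes get only a few tokenized types (ID, IDREF, NMTOKEN, enumerations).
  - Poor namespace support (prefixed names are matched literally, not by namespace URI).
  - Weak extensibility.
  - Not based on XML itself (DTD syntax is different).
- Example: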
<!ELEMENT book (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
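One way to read the DTD above is that each content model is a regular expression over the names of an element's children. The sketch below hand-translates the three declarations into Python. The `CONTENT_MODELS` table and the `validate` function are hypothetical names chosen for illustration; a real validating parser would read the DTD itself rather than a hand-written table.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the DTD: each model is a regex over the
# comma-joined tag names of an element's children.
CONTENT_MODELS = {
    "book": r"title,author",  # <!ELEMENT book (title, author)>
    "title": r"",             # (#PCDATA): text only, no child elements
    "author": r"",            # (#PCDATA)
}


def validate(elem):
    """Return a list of error messages for elem and its descendants."""
    errors = []
    model = CONTENT_MODELS.get(elem.tag)
    children = ",".join(child.tag for child in elem)
    if model is None:
        errors.append(f"undeclared element <{elem.tag}>")
    elif not re.fullmatch(model, children):
        errors.append(f"<{elem.tag}> has children [{children}], expected ({model})")
    for child in elem:
        errors.extend(validate(child))
    return errors


swapped = ET.fromstring("<book><author>A</author><title>T</title></book>")
print(validate(swapped))  # one error: children are in the wrong order
```

Note that `<author>12345</author>` passes this check, just as it would pass the DTD: `#PCDATA` accepts any text, which is the typing limitation listed above.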
3. The Next Step – XML Schema (XSD)
- W3C Recommendation (2001).
- Key Features:
  - Rich data types (string, date, integer, boolean).
  - Namespace support.
  - Written in XML syntax.
  - Complex structures (choice, sequence, attributes).
  - Support for inheritance and extension.
- Example:
<xs:element name="price" type="xs:decimal"/>
- Advantages over DTD:
  - Strong typing.
  - Interoperability with web services (SOAP, WSDL).
  - Extensible and modular.
- Drawbacks:
  - Verbose and complex.
  - Steep learning curve.
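To illustrate what strong typing adds over a DTD, such as the `xs:decimal` declaration for `price` above, the following Python sketch checks text against simplified lexical forms of a few XSD built-in types. The `LEXICAL_CHECKS` table and `conforms` function are assumptions made for this example, and the patterns are approximations (for instance, `xs:date` also permits a timezone suffix, which is not handled here). A real XSD processor does considerably more.

```python
import re
from datetime import date


def _is_date(text):
    # Approximation: plain YYYY-MM-DD only, checked against the calendar.
    m = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", text)
    if not m:
        return False
    try:
        date(*map(int, m.groups()))
        return True
    except ValueError:
        return False


# Simplified lexical checks for a few XSD built-in types.
LEXICAL_CHECKS = {
    "xs:decimal": lambda t: re.fullmatch(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)", t) is not None,
    "xs:integer": lambda t: re.fullmatch(r"[+-]?[0-9]+", t) is not None,
    "xs:boolean": lambda t: t in {"true", "false", "1", "0"},
    "xs:date": _is_date,
}


def conforms(text, xsd_type):
    """Return True if text looks like a valid lexical form of xsd_type."""
    return LEXICAL_CHECKS[xsd_type](text.strip())


print(conforms("19.99", "xs:decimal"))    # True
print(conforms("cheap", "xs:decimal"))    # False
print(conforms("2024-02-30", "xs:date"))  # False: no such calendar day
```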
4. An Alternative Approach – Relax NG
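- Developed at OASIS (specification published in 2001) and later standardized as ISO/IEC 19757-2.
- Two syntaxes: XML-based and compact (human-friendly).
- Strengths:
  - Simpler than XML Schema.
  - More expressive content models (for example, unordered content via interleave, and attributes treated as part of the content model).
  - Better readability.
  - Widely used in publishing and documentation standards (DocBook, TEI).
- Example (Compact Syntax):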
element book {
element title { text },
element author { text }
}
- Advantages:
  - Flexible, minimal, elegant.
  - Better for document-centric XML.
- Limitations:
  - Weaker typing than XSD: only string and token are built in, and richer types rely on an external datatype library (usually the XSD one).
  - Less widespread adoption in enterprise systems.
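The two Relax NG syntaxes describe the same patterns. As a rough illustration, the Python sketch below embeds an XML-syntax version of the book schema and renders it back into compact syntax. The `to_compact` function is a hypothetical helper that handles only the `element` and `text` patterns used in this example; full converters cover the whole language.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
ELEMENT = f"{{{RNG_NS}}}element"
TEXT = f"{{{RNG_NS}}}text"

# XML-syntax counterpart of the compact schema shown earlier.
BOOK_RNG = """\
<element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title"><text/></element>
  <element name="author"><text/></element>
</element>
"""


def to_compact(pattern, indent=0):
    """Render a tiny subset of Relax NG (element, text) in compact syntax."""
    pad = "  " * indent
    if pattern.tag == TEXT:
        return f"{pad}text"
    if pattern.tag == ELEMENT:
        name = pattern.get("name")
        children = list(pattern)
        if len(children) == 1 and children[0].tag == TEXT:
            return f"{pad}element {name} {{ text }}"
        body = ",\n".join(to_compact(c, indent + 1) for c in children)
        return f"{pad}element {name} {{\n{body}\n{pad}}}"
    raise ValueError(f"unsupported pattern: {pattern.tag}")


print(to_compact(ET.fromstring(BOOK_RNG)))
```

Because the XML syntax is ordinary XML, general-purpose XML tooling can read and transform the schema, while the compact form is easier for people to write and review.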
5. Modern Alternatives
- Schematron
  - Rule-based validation (XPath-based).
  - Great for business rules and complex constraints.
  - Often used alongside XSD or Relax NG.
  - Example rule:
<sch:rule context="invoice">
  <sch:assert test="total = sum(item/price)">Invoice total must equal sum of items.</sch:assert>
</sch:rule>
- Hybrid Approaches
  - Combine XSD (data typing) with Schematron (rules).
  - Used in healthcare (HL7), publishing, financial data.
- JSON Schema (for JSON data)
  - Though not XML, JSON Schema is now widely adopted for validation in REST APIs.
  - Reflects the trend of moving beyond XML in many industries.
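The Schematron invoice rule and the hybrid approach from this section can be sketched as a two-stage check in plain Python. This is a hand-written stand-in, not a Schematron or XSD processor: `parse_decimal` and `validate_invoice` are hypothetical names, the root element is assumed to be `invoice`, and `Decimal` parsing is looser than `xs:decimal` (it accepts exponent notation, for example).

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation


def parse_decimal(text):
    """Type layer (the role XSD plays): a finite Decimal, or None if not numeric."""
    try:
        value = Decimal((text or "").strip())
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def validate_invoice(xml_text):
    """Return a list of error messages; an empty list means the invoice passed."""
    invoice = ET.fromstring(xml_text)
    total = parse_decimal(invoice.findtext("total"))
    prices = [parse_decimal(p.text) for p in invoice.findall("item/price")]
    # Stage 1: typing. Stop on failure, since the rule below needs numbers.
    if total is None or None in prices:
        return ["total and every item/price must be decimal numbers"]
    # Stage 2: business rule, mirroring test="total = sum(item/price)".
    if total != sum(prices, Decimal(0)):
        return ["Invoice total must equal sum of items."]
    return []


doc = (
    "<invoice>"
    "<item><price>10.00</price></item>"
    "<item><price>5.50</price></item>"
    "<total>20.00</total>"
    "</invoice>"
)
print(validate_invoice(doc))  # ['Invoice total must equal sum of items.']
```

Running the type check first keeps the rule simple: by the time the sum is compared, every value is known to be a number, which is the division of labor that hybrid XSD plus Schematron setups rely on.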
6. Comparison Overview
| Feature | DTD | XML Schema (XSD) | Relax NG | Schematron |
|---|---|---|---|---|
| Data Typing | No | Yes | Partial | XPath-based |
| Namespace | Weak | Strong | Strong | Strong |
| Readability | Simple | Verbose/Complex | High | Moderate |
| Expressiveness | Limited | High | High | Very High (rules) |
| Syntax | Non-XML | XML | XML/Compact | XML |
7. Conclusion
- DTD was the foundation but is now outdated for large, data-heavy applications.
- XML Schema (XSD) dominates in enterprise systems due to strong typing and web services integration.
- Relax NG remains popular in publishing and documentation due to simplicity.
- Schematron complements schemas with rule-based validation.
- The broader industry trend is moving toward JSON and JSON Schema, but XML remains critical in specialized domains (finance, publishing, healthcare, standards).