The Evolution of XML Validation – From DTDs to XML Schema, Relax NG, and Modern Alternatives

1. Introduction

XML is widely used to represent structured data, but well-formedness alone is not enough for real-world applications: a parser accepts any document whose tags nest correctly, whatever elements it contains. Validation checks that a document also follows a defined structure and set of rules. Over time, validation mechanisms evolved from DTDs to more capable schema languages such as XML Schema (XSD), Relax NG, and Schematron.
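The gap between well-formed and valid can be sketched in a few lines of Python. This is only an illustration using the standard library: the sample documents and the `has_required_children` helper are hypothetical, and the helper is a hand-written stand-in for a real validator, not a schema language.

```python
import xml.etree.ElementTree as ET

# Well-formed: tags are balanced and nested, so it parses.
# It is still not a sensible "book" because the title is missing.
well_formed_but_invalid = "<book><author>Jane Doe</author></book>"

# Not well-formed: <title> is never closed, so parsing fails.
not_well_formed = "<book><title>XML Basics</book>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML at all."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def has_required_children(xml_text: str, required: list[str]) -> bool:
    """Hypothetical stand-in for validation: each required child must exist."""
    root = ET.fromstring(xml_text)
    return all(root.find(name) is not None for name in required)


print(is_well_formed(well_formed_but_invalid))
print(is_well_formed(not_well_formed))
print(has_required_children(well_formed_but_invalid, ["title", "author"]))
```

The first document passes the parser but fails the structural check, which is the case validation exists to catch.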


2. The Early Days – DTDs (Document Type Definitions)

  • Strengths:

    • First standardized mechanism for XML validation.

    • Simple syntax.

    • Widely supported in XML parsers.

  • Limitations:

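    • No data types for element content; attributes get only a few types (CDATA, ID, IDREF, NMTOKEN, enumerations).

    • Not namespace-aware: prefixed names are treated as plain strings.

    • Weak extensibility and modularity (reuse relies on parameter entities).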

    • Not based on XML itself (DTD syntax is different).

Example:

<!ELEMENT book (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
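A DTD content model such as `(title, author)` is essentially a regular expression over child element names. Python's standard-library parsers do not validate against DTDs, so the sketch below hand-translates the model into a regex to show what a validating parser checks. The function name and the encoding of child names are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from <!ELEMENT book (title, author)>: exactly one title
# followed by exactly one author. Each child name is followed by a space.
BOOK_CONTENT_MODEL = re.compile(r"title author ")


def matches_book_model(xml_text: str) -> bool:
    """Check the order and count of <book>'s children against the model."""
    root = ET.fromstring(xml_text)
    if root.tag != "book":
        return False
    # Serialize the child names in document order, e.g. "title author ".
    child_names = "".join(child.tag + " " for child in root)
    return BOOK_CONTENT_MODEL.fullmatch(child_names) is not None


print(matches_book_model("<book><title>T</title><author>A</author></book>"))
print(matches_book_model("<book><author>A</author><title>T</title></book>"))
```

The sketch covers only the order and count of children; it does not check that title and author contain text only, as #PCDATA requires.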

3. The Next Step – XML Schema (XSD)

  • W3C Recommendation (2001 for version 1.0; version 1.1 followed in 2012).

  • Key Features:

    • Rich data types (string, date, integer, boolean).

    • Namespace support.

    • Written in XML syntax.

    • Complex structures (choice, sequence, attributes).

    • Type derivation (inheritance) by extension and restriction.

Example:

<xs:element name="price" type="xs:decimal"/>

  • Advantages over DTD:

    • Strong typing.

    • Interoperability with web services (SOAP, WSDL).

    • Extensible and modular.

  • Drawbacks:

    • Verbose and complex.

    • Steep learning curve.
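To make "strong typing" concrete, the sketch below shows roughly what declaring price as xs:decimal buys: the element's text must match the decimal lexical form (optional sign, digits, optional fraction, no exponent). It is a hand-written approximation using the standard library, not an XSD processor, and `parse_price` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

# Lexical form of xs:decimal: optional sign, digits, optional fraction.
# Exponents ("1e3") and special values ("NaN", "INF") are not allowed.
XS_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def parse_price(xml_text: str) -> Decimal:
    """Return the <price> value as a Decimal, or raise ValueError."""
    element = ET.fromstring(xml_text)
    # Surrounding whitespace is ignored, as it is for xs:decimal.
    text = (element.text or "").strip()
    if element.tag != "price" or not XS_DECIMAL.fullmatch(text):
        raise ValueError(f"not a valid xs:decimal price: {text!r}")
    return Decimal(text)


print(parse_price("<price>19.99</price>"))
```

With a DTD, the same element could only be declared as #PCDATA, so a value like "cheap" would pass validation and fail later in application code.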


4. An Alternative Approach – Relax NG

  • Developed at OASIS (2001–2002) and later standardized by ISO as ISO/IEC 19757-2.

  • Two syntaxes: XML-based and compact (human-friendly).

  • Strengths:

    • Simpler than XML Schema.

    • More expressive content models.

    • Better readability.

    • Widely used in publishing and documentation standards (DocBook, TEI).

Example (Compact Syntax):

element book {
  element title { text },
  element author { text }
}

  • Advantages:

    • Flexible, minimal, elegant.

    • Better for document-centric XML.

  • Limitations:

    • No rich built-in data types; typing relies on an external datatype library (typically the XSD datatypes).

    • Less widespread adoption in enterprise systems.
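One example of the more expressive content models mentioned above is interleaving (the `&` operator in the compact syntax), which lets children appear in any order. The Python sketch below is a hand-written stand-in for that single pattern, assuming a book with exactly one title and one author. It is not a Relax NG processor, and the names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Stand-in for the compact-syntax pattern
#   element book { element title { text } & element author { text } }
# meaning exactly one title and one author, in either order.
REQUIRED_CHILDREN = Counter({"title": 1, "author": 1})


def matches_interleaved_book(xml_text: str) -> bool:
    """Accept <book> when its children are one title and one author."""
    root = ET.fromstring(xml_text)
    if root.tag != "book":
        return False
    # Comparing counts ignores order but rejects missing or extra children.
    return Counter(child.tag for child in root) == REQUIRED_CHILDREN


print(matches_interleaved_book("<book><author>A</author><title>T</title></book>"))
```

A DTD can only express this by listing every permitted ordering, which quickly becomes impractical as the number of children grows.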


5. Modern Alternatives

  1. Schematron

    • Rule-based validation (XPath-based).

    • Great for business rules and complex constraints.

    • Often used alongside XSD or Relax NG.

    Example Rule:

    <sch:rule context="invoice">
      <sch:assert test="total = sum(item/price)">Invoice total must equal sum of items.</sch:assert>
    </sch:rule>
    
  2. Hybrid Approaches

    • Combine XSD (data typing) with Schematron (rules).

    • Used in healthcare (HL7), publishing, financial data.

  3. JSON Schema (for JSON data)

    • Though not an XML technology, JSON Schema is now widely adopted for validation in REST APIs.

    • Reflects the trend of moving beyond XML in many industries.
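Returning to the hybrid approach in item 2, the two-phase idea can be sketched in Python: a typing pass first (the job a schema such as XSD would do), then the business rule from the Schematron example. This is a simplified illustration under assumptions: the invoice layout (item/price and total children) is inferred from the rule above, `validate_invoice` is a hypothetical name, and Python's Decimal is looser than xs:decimal.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation


def validate_invoice(xml_text: str) -> list[str]:
    """Return error messages; an empty list means the invoice passed."""
    root = ET.fromstring(xml_text)
    errors: list[str] = []

    # Phase 1: structure and typing, the part a schema would cover.
    total = root.find("total")
    if root.tag != "invoice" or total is None:
        return ["expected an <invoice> element with a <total> child"]
    amounts = []
    for element in [total, *root.findall("item/price")]:
        try:
            value = Decimal((element.text or "").strip())
        except InvalidOperation:
            value = None
        if value is None or not value.is_finite():
            errors.append(f"<{element.tag}> is not a decimal: {element.text!r}")
        else:
            amounts.append(value)
    if errors:
        return errors  # skip the rule when the data is not well typed

    # Phase 2: the business rule, total = sum(item/price).
    total_value, price_values = amounts[0], amounts[1:]
    if total_value != sum(price_values, Decimal(0)):
        errors.append("Invoice total must equal sum of items.")
    return errors


invoice = (
    "<invoice><item><price>10.00</price></item>"
    "<item><price>5.50</price></item><total>15.50</total></invoice>"
)
print(validate_invoice(invoice))
```

Running the rule only after the typing pass succeeds mirrors how such pipelines are usually layered: the rule can assume every amount is numeric instead of re-checking it.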


6. Comparison Overview

Feature         DTD       XML Schema (XSD)  Relax NG     Schematron
--------------  --------  ----------------  -----------  -----------------
Data Typing     No        Yes               Partial      XPath-based
Namespace       Weak      Strong            Strong       Strong
Readability     Simple    Verbose/Complex   High         Moderate
Expressiveness  Limited   High              High         Very High (rules)
Syntax          Non-XML   XML               XML/Compact  XML

7. Conclusion

  • DTD was the foundation but is now outdated for large, data-heavy applications.

  • XML Schema (XSD) dominates in enterprise systems due to strong typing and web services integration.

  • Relax NG remains popular in publishing and documentation due to simplicity.

  • Schematron complements schemas with rule-based validation.

  • The broader industry trend is moving toward JSON and JSON Schema, but XML remains critical in specialized domains (finance, publishing, healthcare, standards).