DTD - External Subsets – Modularizing DTDs for Reuse
1. Introduction
As XML projects grow, it becomes inefficient to declare all element and attribute rules inside every XML document. To promote reuse, modularization, and easier maintenance, XML allows external DTD subsets.
An external subset is a DTD stored in a separate file, which can be linked to multiple XML documents. This makes it possible to define common structures once and reuse them across projects.
2. Internal vs External Subset
- Internal Subset
  - Defined inside the XML document itself.
  - Useful for small, document-specific rules.
  Example:
  <!DOCTYPE note [
    <!ELEMENT note (to, from, body)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT from (#PCDATA)>
    <!ELEMENT body (#PCDATA)>
  ]>
- External Subset
  - Stored in a separate .dtd file.
  - Referenced from XML with SYSTEM or PUBLIC identifiers.
  Example:
  <!DOCTYPE note SYSTEM "note.dtd">
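To make the contrast concrete, the sketch below shows what note.dtd might contain and a document that references it. The declarations simply mirror the internal-subset example above; the sample text values are made up for illustration. Note that the external file holds only the declarations, with no <!DOCTYPE ...> wrapper.

```xml
<!-- note.dtd: declarations only -->
<!ELEMENT note (to, from, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT body (#PCDATA)>
```

```xml
<!-- note.xml: a document that links to note.dtd (content is illustrative) -->
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Alice</to>
  <from>Bob</from>
  <body>Meeting at 10.</body>
</note>
```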
3. Declaring an External Subset
There are two ways to include an external DTD subset:
a) Using SYSTEM Identifier
Points to a specific URI (local file or web location).
<!DOCTYPE book SYSTEM "book.dtd">
b) Using PUBLIC Identifier
Provides a public identifier plus a system identifier (fallback).
<!DOCTYPE book PUBLIC "-//Publisher//DTD Book 1.0//EN" "book.dtd">
- PUBLIC identifiers are useful for standardized DTDs (e.g., DocBook, XHTML).
- A parser may use the public identifier to locate the DTD, typically by mapping it to a local copy through a catalog; if it cannot, it falls back to the system identifier.
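How a public identifier gets resolved is not specified by the DOCTYPE itself; one common mechanism is an OASIS XML catalog. The following is a minimal sketch, assuming a catalog-aware parser and a hypothetical local copy of the DTD at dtds/book.dtd.

```xml
<?xml version="1.0"?>
<!-- catalog.xml: maps the public identifier to a local file.
     The path dtds/book.dtd is an assumption for illustration. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//Publisher//DTD Book 1.0//EN"
          uri="dtds/book.dtd"/>
</catalog>
```

With such a mapping in place, the parser can load the local copy instead of fetching the system identifier, which is useful when the DTD is hosted remotely.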
4. Advantages of External Subsets
- Reusability – Define once, use in many XML files.
- Maintainability – Updates apply to all linked documents.
- Collaboration – Teams can work on separate DTD modules.
- Consistency – Ensures all documents follow the same structure.
5. Modularizing with External Subsets
Instead of one large DTD file, break it into modules:
- common.dtd → shared definitions (e.g., title, author).
- book.dtd → book-specific rules.
- journal.dtd → journal-specific rules.
- master.dtd → imports all others.
master.dtd example:
<!ENTITY % common SYSTEM "common.dtd">
%common;
<!ENTITY % book SYSTEM "book.dtd">
%book;
<!ENTITY % journal SYSTEM "journal.dtd">
%journal;
Then an XML file can link only to master.dtd:
<!DOCTYPE library SYSTEM "master.dtd">
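The master.dtd listing shows only the wiring, so here is a hedged sketch of what the modules behind it might contain. All content models below are assumptions chosen for illustration. So is the declaration of library, which has to appear somewhere for the document to validate; here it is assumed to sit at the end of master.dtd.

```xml
<!-- common.dtd (assumed content) -->
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>

<!-- book.dtd (assumed content): reuses title and author from common.dtd -->
<!ELEMENT book (title, author+, chapter+)>
<!ELEMENT chapter (#PCDATA)>

<!-- journal.dtd (assumed content) -->
<!ELEMENT journal (title, article+)>
<!ELEMENT article (title, author+)>

<!-- added to master.dtd after the three parameter entity references (assumed) -->
<!ELEMENT library (book | journal)*>
```

A document that validates against this assumed set of modules could look like this:

```xml
<!DOCTYPE library SYSTEM "master.dtd">
<library>
  <book>
    <title>Learning DTDs</title>
    <author>A. Writer</author>
    <chapter>Introduction</chapter>
  </book>
  <journal>
    <title>XML Quarterly</title>
    <article>
      <title>Modular Schemas</title>
      <author>B. Editor</author>
    </article>
  </journal>
</library>
```

Because an element type may be declared only once in a valid DTD, common.dtd is pulled in a single time by master.dtd rather than by each module separately.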
6. Case Study – Publishing System
A publisher might use external subsets to ensure all manuscripts follow the same structure:
- core.dtd defines authors, metadata, and copyright.
- article.dtd defines abstract, section, and references.
- book.dtd defines chapter, appendix, and index.
Both books and articles reuse core.dtd, ensuring consistent metadata across all publications.
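One way this reuse could be wired up is for each document-type DTD to pull in core.dtd through a parameter entity, as in section 5. The sketch below is an assumption about how the files might look; the content models and the metadata wrapper element are hypothetical.

```xml
<!-- core.dtd (assumed content) -->
<!ELEMENT metadata (author+, copyright)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>

<!-- article.dtd (assumed content) -->
<!ENTITY % core SYSTEM "core.dtd">
%core;
<!ELEMENT article (metadata, abstract, section+, references)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT section (#PCDATA)>
<!ELEMENT references (#PCDATA)>

<!-- book.dtd (assumed content) -->
<!ENTITY % core SYSTEM "core.dtd">
%core;
<!ELEMENT book (metadata, chapter+, appendix*, index?)>
<!ELEMENT chapter (#PCDATA)>
<!ELEMENT appendix (#PCDATA)>
<!ELEMENT index (#PCDATA)>
```

This layout works when a document links to either article.dtd or book.dtd. A DTD that combined both would need to include core.dtd only once to avoid duplicate element declarations.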
7. Best Practices
- Use relative paths for DTD files if they are stored alongside the documents.
- Prefer PUBLIC identifiers when sharing DTDs across organizations.
- Keep modules small and focused.
- Version DTD files explicitly (e.g., book-1.0.dtd, book-2.0.dtd).
- Validate both individual modules and the combined master DTD.
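As a small illustration of the versioning and PUBLIC-identifier advice, a document might name the DTD version in both identifiers. The 2.0 public identifier below is a hypothetical extension of the earlier example, not a registered identifier.

```xml
<!-- The version appears in both the public identifier and the file name -->
<!DOCTYPE book PUBLIC "-//Publisher//DTD Book 2.0//EN" "book-2.0.dtd">
```

Documents that still declare the 1.0 identifiers keep validating against book-1.0.dtd, so a new DTD version does not silently change the rules for existing content.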
8. Conclusion
External subsets make DTDs scalable, modular, and reusable. By separating structure definitions from XML content, organizations gain flexibility, maintainability, and consistency across projects. This technique is especially valuable in industries like publishing, technical documentation, and standards development, where many XML files must follow a uniform model.