DTD - Mixed Content Models – Handling Text + Elements Together
1. Introduction
In many XML documents, elements need to contain both raw text and child elements. This is called a mixed content model.
For example, in a paragraph, you might want plain text mixed with inline markup like <em>, <strong>, or <link>. DTD provides a way to describe such content models.
2. What is Mixed Content?
A mixed content model allows:
- Text-only content
- Elements + text interleaved
In DTD, this is declared with #PCDATA (Parsed Character Data).
Syntax:
<!ELEMENT elementName (#PCDATA | child1 | child2 | ...)*>
- #PCDATA represents raw text.
- | means "or" (choice).
- * means "zero or more" occurrences (unlimited mixing).
3. Example – Paragraph with Formatting
DTD:
<!ELEMENT para (#PCDATA | em | strong | link)*>
<!ELEMENT em (#PCDATA)>
<!ELEMENT strong (#PCDATA)>
<!ELEMENT link (#PCDATA)>
XML (Valid):
<para>
This is a <em>mixed</em> content example with
<strong>inline formatting</strong> and a
<link>reference</link>.
</para>
Here:
- Text and child elements appear in any order.
- Inline elements (em, strong, link) can occur multiple times.
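To try this with a validating parser, the DTD and the XML can be combined into a single file by placing the declarations in an internal DTD subset. The following is a minimal sketch; the file name para.xml used in the note afterwards is only an assumption for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE para [
  <!-- Mixed content: text interleaved with em, strong, and link -->
  <!ELEMENT para (#PCDATA | em | strong | link)*>
  <!-- Each inline element holds text only -->
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT strong (#PCDATA)>
  <!ELEMENT link (#PCDATA)>
]>
<para>
  This is a <em>mixed</em> content example with
  <strong>inline formatting</strong> and a
  <link>reference</link>.
</para>
```

If libxml2 is installed, a command such as xmllint --valid --noout para.xml should stay silent for this file and should report an error if, for instance, an undeclared element were added inside <para>.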
4. Rules of Mixed Content
- #PCDATA must be listed first in the declaration.
  <!ELEMENT example (#PCDATA | child)*>
- Always use * at the end when child elements are listed.
  - This means text and elements can repeat and mix freely.
  - Without *, the declaration is not legal DTD syntax. Only the text-only form (#PCDATA) may omit it.
- Order is not enforced when using mixed content.
  - Any combination of text and allowed elements is valid, and the number of occurrences cannot be restricted.
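The rules above can be illustrated with a short DTD fragment. This is a sketch only: the element name note is hypothetical, and the rejected forms are kept inside a comment so that the fragment itself remains usable.

```xml
<!-- Accepted: #PCDATA first, names separated by |, trailing * -->
<!ELEMENT note (#PCDATA | em | strong)*>

<!-- Accepted: text-only content, where the * is optional -->
<!ELEMENT em (#PCDATA)>
<!ELEMENT strong (#PCDATA)>

<!-- Rejected forms, for comparison:
     <!ELEMENT note (em | #PCDATA | strong)*>   #PCDATA is not first
     <!ELEMENT note (#PCDATA | em | strong)>    trailing * is missing
     <!ELEMENT note (#PCDATA, em, strong)*>     a sequence (,) cannot include #PCDATA
     <!ELEMENT note (#PCDATA | em | strong)+>   only * is permitted, not + or ?
-->
```

Because mixed content is always a repeated choice, a DTD cannot express constraints such as "exactly one em" or "strong must come before em" inside such an element.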
5. When to Use Mixed Content
- Paragraphs (para)
- Headings (title)
- Annotations, footnotes, comments
- Inline markup in structured text
In short, mixed content suits any narrative or human-readable text where free-flowing content must be combined with tags.
6. Case Study – XHTML Paragraphs
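In XHTML (an XML reformulation of HTML), <p> uses mixed content. The actual DTD builds the list of inline elements from parameter entities; in simplified, expanded form it looks like this: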
<!ELEMENT p (#PCDATA | a | span | b | i | img | ...)*>
This allows text like:
<p>
This is <b>bold</b> and <i>italic</i> text with
a <a href="example.com">link</a>.
</p>
7. Best Practices
- Keep the list of allowed inline elements focused to avoid complexity.
- Use mixed content only where text really needs markup.
- For structured, data-centric XML (like invoices), avoid mixed content to ensure strict validation.
- Document which inline elements are allowed inside which block elements.
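As an illustration of the data-centric case, the sketch below uses element content for the record structure and confines mixed content to a single free-text field. All element names and values here (invoice, number, customer, item, comment) are hypothetical and chosen only for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE invoice [
  <!-- Element content: order and occurrence are enforced -->
  <!ELEMENT invoice (number, customer, item+, comment?)>
  <!ELEMENT number (#PCDATA)>
  <!ELEMENT customer (#PCDATA)>
  <!ELEMENT item (#PCDATA)>
  <!-- Mixed content only where free text needs inline markup -->
  <!ELEMENT comment (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
]>
<invoice>
  <number>INV-001</number>
  <customer>Example Customer</customer>
  <item>Widget</item>
  <comment>Deliver <em>before</em> noon.</comment>
</invoice>
```

With this layout a validating parser can still reject a missing number or a misplaced customer, while the comment field stays free-flowing.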
8. Conclusion
Mixed content models are essential in XML DTDs when handling text interleaved with inline elements. By combining #PCDATA with child elements, you can model flexible, human-readable documents such as books, articles, and web pages. Proper use of mixed content enables structured text without losing natural flow.