DTD - Mixed Content Models – Handling Text + Elements Together

1. Introduction

In many XML documents, elements need to contain both raw text and child elements. This is called a mixed content model.

For example, in a paragraph, you might want plain text mixed with inline markup like <em>, <strong>, or <link>. DTD provides a way to describe such content models.


2. What is Mixed Content?

A mixed content model covers two cases:

  • Text-only content, declared as (#PCDATA)

  • Text interleaved with child elements, declared with a choice list such as (#PCDATA | child)*

In DTD, this is declared with #PCDATA (Parsed Character Data).

Syntax:

<!ELEMENT elementName (#PCDATA | child1 | child2 | ...)*>

  • #PCDATA stands for parsed character data: ordinary text in which the parser still recognizes markup and entity references.

  • | means "or" (choice).

  • * means "zero or more" occurrences (unlimited mixing).


3. Example – Paragraph with Formatting

DTD:

<!ELEMENT para (#PCDATA | em | strong | link)*>
<!ELEMENT em (#PCDATA)>
<!ELEMENT strong (#PCDATA)>
<!ELEMENT link (#PCDATA)>

XML (Valid):

<para>
  This is a <em>mixed</em> content example with 
  <strong>inline formatting</strong> and a 
  <link>reference</link>.
</para>

Here:

  • Text and child elements appear in any order.

  • Inline elements (em, strong, link) can occur multiple times.
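
For readers who want to try the example, a minimal sketch is to place the declarations in an internal DTD subset so the document is self-contained. The file name para.xml is an assumption; a validating parser such as xmllint (xmllint --noout --valid para.xml) should accept it without reporting errors.

XML (Self-contained):

<?xml version="1.0"?>
<!DOCTYPE para [
  <!ELEMENT para (#PCDATA | em | strong | link)*>
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT strong (#PCDATA)>
  <!ELEMENT link (#PCDATA)>
]>
<para>
  This is a <em>mixed</em> content example with
  <strong>inline formatting</strong> and a
  <link>reference</link>.
</para>

Note that the inline elements are declared as text-only, so nesting them is not valid under this DTD:

XML (Invalid):

<para>This is <em>very <strong>important</strong></em>.</para>

To permit such nesting, em would itself need a mixed content model, for instance (#PCDATA | strong | link)*.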


4. Rules of Mixed Content

  1. #PCDATA must be listed first in the declaration.

    <!ELEMENT example (#PCDATA | child)*>
    
  2. The closing * is mandatory whenever child elements are listed.

    • It lets text and elements repeat and mix freely.

    • Without it, the declaration is a syntax error. Only the text-only form (#PCDATA) may omit the *.

  3. Neither order nor count can be enforced in mixed content.

    • Any combination of text and allowed elements is valid.
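
To make these rules concrete, the sketch below sets a few declarations that an XML parser rejects next to the accepted form. The element name note is hypothetical, and each rejected line is meant to be tried on its own rather than combined in one DTD.

DTD (Rejected):

<!-- #PCDATA is not first -->
<!ELEMENT note (em | #PCDATA)*>

<!-- child elements are listed, but the closing * is missing -->
<!ELEMENT note (#PCDATA | em)>

<!-- + and , are not available in mixed content -->
<!ELEMENT note (#PCDATA | em)+>
<!ELEMENT note (#PCDATA, em)*>

DTD (Accepted):

<!ELEMENT note (#PCDATA | em)*>
<!ELEMENT em (#PCDATA)>

A child name should also appear only once in the list; (#PCDATA | em | em)* violates a validity constraint. If order or count has to be enforced, for example exactly one title followed by paragraphs, the usual approach is to move the text into child elements and use element content such as (title, para+) instead.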


5. When to Use Mixed Content

  • Paragraphs (para)

  • Headings (title)

  • Annotations, footnotes, comments

  • Inline markup in structured text

In short, mixed content suits any narrative or human-readable text where free-flowing prose must be combined with tags.


6. Case Study – XHTML Paragraphs

In XHTML (HTML reformulated as XML), the <p> element has a mixed content model. In simplified form:

<!ELEMENT p (#PCDATA | a | span | b | i | img | ...)*>

This allows text like:

<p>
  This is <b>bold</b> and <i>italic</i> text with 
  a <a href="https://example.com/">link</a>.
</p>
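
Because only inline elements may appear in the list, block-level markup inside a paragraph fails validation. The fragment below is a small sketch of that case; in the XHTML DTDs, p is not among its own allowed children.

XML (Invalid):

<p>
  Outer text <p>nested paragraph</p> more text.
</p>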


7. Best Practices

  • Keep the list of allowed inline elements focused to avoid complexity.

  • Use mixed content only where text really needs markup.

  • For structured, data-centric XML (like invoices), avoid mixed content. Element content lets the DTD enforce the order and number of children, which mixed content cannot.

  • Document which inline elements are allowed inside which block elements.
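
One way to keep the inline list focused and documented in a single place is a parameter entity, a technique the XHTML DTDs also rely on. The sketch below is hypothetical: the entity name inline, the file name article.dtd, and the block elements title and note are assumptions made for illustration. Parameter entity references inside a declaration are only permitted in an external DTD subset, so this would not work in an internal subset.

DTD (article.dtd):

<!-- Inline elements allowed inside every text-bearing block -->
<!ENTITY % inline "em | strong | link">

<!ELEMENT para  (#PCDATA | %inline;)*>
<!ELEMENT title (#PCDATA | %inline;)*>
<!ELEMENT note  (#PCDATA | %inline;)*>

<!ELEMENT em     (#PCDATA)>
<!ELEMENT strong (#PCDATA)>
<!ELEMENT link   (#PCDATA)>

Adding a new inline element then means changing one line rather than every block declaration.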


8. Conclusion

Mixed content models are essential in XML DTDs when handling text interleaved with inline elements. By combining #PCDATA with child elements, you can model flexible, human-readable documents such as books, articles, and web pages. Proper use of mixed content enables structured text without losing natural flow.