XSLT - Whitespace Handling Using <xsl:strip-space> and <xsl:preserve-space>
Whitespace management is an important aspect of XSLT processing. XML documents often contain spaces, tabs, and line breaks that are added to improve readability for humans. While these formatting characters make XML files easier to read and maintain, they can sometimes interfere with transformations if not handled properly. XSLT provides mechanisms to control how whitespace is treated during processing, primarily through the <xsl:strip-space> and <xsl:preserve-space> elements.
Understanding Whitespace in XML
Whitespace refers to characters such as:
- Spaces
- Tabs
- Carriage returns
- Line breaks
Consider the following XML document:
<employees>
  <employee>
    <name>John</name>
    <department>HR</department>
  </employee>
  <employee>
    <name>Mary</name>
    <department>Finance</department>
  </employee>
</employees>
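The indentation and line breaks make the document readable. However, in the tree that an XSLT processor works on, each run of whitespace between two tags becomes a whitespace-only text node. Unless these nodes are stripped, the built-in template rules copy them to the result along with the real text, which produces unwanted output.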
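To see how many of these nodes exist, the employees document above can be run through a small counting stylesheet. This is a minimal sketch, assuming an XSLT 1.0 processor that receives the document exactly as written; the output labels are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: count the text nodes in the employees document.
     Assumes the parser hands every whitespace-only node to the processor. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Every text node, including whitespace-only ones -->
    <xsl:text>all text nodes: </xsl:text>
    <xsl:value-of select="count(//text())"/>
    <xsl:text>&#10;</xsl:text>
    <!-- normalize-space() yields an empty string for a whitespace-only node,
         so not(normalize-space()) selects exactly those nodes -->
    <xsl:text>whitespace-only: </xsl:text>
    <xsl:value-of select="count(//text()[not(normalize-space())])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For the employees document this should report 13 text nodes, 9 of them whitespace-only: three directly under employees and three under each employee. Some XML loaders can be configured to drop such nodes before the processor sees them, in which case the counts will be lower.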
Why Whitespace Control Is Important
Without proper whitespace handling, transformations may generate:
- Extra blank lines
- Unexpected spaces
- Formatting inconsistencies
- Larger output documents
- Reduced processing efficiency
For example, when converting XML into HTML or plain text, unnecessary whitespace can affect the appearance of the final result.
The <xsl:strip-space> Element
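The <xsl:strip-space> element is a top-level declaration. It removes whitespace-only text nodes whose parent element is named in its elements attribute, and it does so before any template is applied. Text nodes that contain at least one non-whitespace character are never affected. The elements attribute takes a whitespace-separated list of element names.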
Syntax
<xsl:strip-space elements="element-list"/>
Example
XML Document:
<catalog>
  <book>
    <title>XML Guide</title>
  </book>
</catalog>
XSLT Stylesheet:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
In this example, whitespace-only text nodes throughout the document are removed before processing.
Using Specific Elements
Instead of removing whitespace from all elements, specific elements can be targeted.
<xsl:strip-space elements="book catalog"/>
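Only whitespace-only text nodes that are direct children of book or catalog are stripped. Whitespace under any other element is kept, because preserving is the default.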
The <xsl:preserve-space> Element
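The <xsl:preserve-space> element instructs the processor to retain whitespace-only text nodes for selected elements. Because every element preserves whitespace by default, this declaration is mainly used to exempt particular elements from a broader <xsl:strip-space> rule.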
Syntax
<xsl:preserve-space elements="element-list"/>
Example
<xsl:preserve-space elements="description"/>
In this case, whitespace-only text nodes directly inside the description element are kept exactly as they appear in the source document, even if another rule strips whitespace elsewhere.
This is useful when formatting and spacing are meaningful.
Combining Strip and Preserve Rules
In many real-world applications, some elements require whitespace removal while others need preservation.
Example:
<xsl:strip-space elements="*"/>
<xsl:preserve-space elements="pre code"/>
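This configuration strips whitespace-only text nodes from every element except pre and code. The named elements win because an explicit name has a higher priority than the * wildcard, regardless of the order in which the two declarations appear.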
Such an approach is commonly used when transforming technical documentation where code formatting must remain intact.
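The combined rules can be tried with an identity transformation, which copies the source unchanged apart from the stripped nodes. This is a minimal sketch, assuming the source document uses pre and code elements in no namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Strip whitespace-only text nodes everywhere... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except directly inside pre and code, which outrank the wildcard -->
  <xsl:preserve-space elements="pre code"/>

  <!-- indent="no" keeps the serializer from adding whitespace of its own -->
  <xsl:output method="xml" indent="no"/>

  <!-- Identity template: copy every attribute and node as it is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The rules look only at the parent of each text node. A whitespace-only node inside a child of pre, such as an inline element, is still stripped unless that child is listed as well.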
Practical Example
Source XML:
<article>
  <title>Learning XSLT</title>
  <content>
    XSLT is a language used
    for XML transformations.
  </content>
</article>
If whitespace is stripped:
<xsl:strip-space elements="article title"/>
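The whitespace-only text nodes directly inside article, that is, the line breaks around title and content, are removed. Naming title has no visible effect here, because its only text node contains non-whitespace characters and such nodes are never stripped.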
If whitespace is preserved:
<xsl:preserve-space elements="content"/>
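The text inside the content element remains available for processing with its line breaks and leading spaces intact. Because that text node contains words, it would be kept even without this declaration; <xsl:preserve-space> only makes a difference for whitespace-only text nodes that another rule would strip.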
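Strip and preserve rules act only on whitespace-only text nodes, so the line break and leading spaces inside content reach the stylesheet either way. If they are not wanted in the result, they have to be collapsed explicitly. The following sketch does this with the XPath normalize-space() function; the choice of h1 and p for the HTML output is an assumption made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop the line breaks between title and content -->
  <xsl:strip-space elements="article"/>

  <xsl:output method="html"/>

  <xsl:template match="/article">
    <html>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <!-- normalize-space() trims both ends and collapses each internal
             run of whitespace to a single space -->
        <p><xsl:value-of select="normalize-space(content)"/></p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

With the article above, the paragraph should contain "XSLT is a language used for XML transformations." on a single line.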
Using Wildcards
A wildcard can be used to affect all elements.
Strip all whitespace
<xsl:strip-space elements="*"/>
Preserve all whitespace
<xsl:preserve-space elements="*"/>
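The wildcard provides a convenient way to apply rules globally. Since preserving is already the default, <xsl:preserve-space elements="*"/> is mainly useful for overriding a strip rule that comes from an imported stylesheet.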
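A wildcard can also be limited to one namespace by writing it as prefix:*. The sketch below assumes a hypothetical invoice vocabulary; the inv prefix, its namespace URI, and the note element are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:inv="http://example.com/invoice">

  <!-- Strip whitespace-only text nodes from every element in the
       (hypothetical) invoice namespace -->
  <xsl:strip-space elements="inv:*"/>
  <!-- A full name outranks inv:*, so inv:note keeps its whitespace -->
  <xsl:preserve-space elements="inv:note"/>

  <!-- Identity template: copy everything that survives stripping -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Elements in other namespaces, or in no namespace, are not matched by inv:* and therefore keep the default preserving behavior.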
Impact on Performance
Whitespace stripping can improve performance in large XML documents because:
- Fewer text nodes are built and processed.
- Memory consumption is reduced.
- Fewer nodes have to be matched against templates.
- Less output has to be generated.
For enterprise applications processing large XML files, whitespace management can contribute to optimization, although the size of the gain depends on the processor and on how heavily the source is indented.
Common Use Cases
HTML Generation
When generating HTML pages, unwanted spaces and blank lines may affect layout. Stripping whitespace helps create cleaner output.
XML-to-XML Transformation
During XML restructuring, whitespace nodes often have no business value and can be removed.
Technical Documentation
Documentation systems frequently preserve whitespace in code examples, configuration files, and command-line instructions.
Data Exchange Systems
Business XML messages exchanged between systems generally benefit from whitespace stripping to ensure consistent processing.
Potential Problems
Loss of Meaningful Formatting
If whitespace is stripped from content where spacing matters, information may be lost.
Example:
<poem>
Roses are red
Violets are blue
</poem>
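Here <xsl:strip-space> itself does no harm, because the text node contains words and is therefore never stripped. The line breaks are lost only if the stylesheet collapses them explicitly, for example with normalize-space(), or if the output format ignores them, as HTML does outside a pre element.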
Unexpected Output
Overusing <xsl:strip-space> without understanding document structure may lead to formatting issues in generated documents.
Mixed Content Documents
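Documents containing both text and child elements require careful whitespace management because spacing may be significant. A single space between two inline elements is a whitespace-only text node, so a strip rule that covers the parent element joins the adjacent words together.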
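The effect on mixed content can be reproduced with a small example. This is a sketch under stated assumptions: the doc, p, b, and i element names and the sample sentence are hypothetical.

```xml
<!-- Hypothetical source document:
       <doc>
         <p><b>Hello</b> <i>world</i></p>
       </doc>
     The single space between b and i is a whitespace-only text node
     whose parent is p. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Strip the indentation around p... -->
  <xsl:strip-space elements="*"/>
  <!-- ...but keep whitespace-only nodes that sit directly inside p -->
  <xsl:preserve-space elements="p"/>

  <xsl:output method="text"/>

  <!-- Emit the string value of each paragraph on its own line -->
  <xsl:template match="p">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With the preserve rule in place the paragraph comes out as "Hello world". If the <xsl:preserve-space> line is removed, the space between the two inline elements is stripped and the result is "Helloworld".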
Best Practices
- Strip whitespace from structural elements where formatting is not important.
- Preserve whitespace in content-oriented elements such as code blocks, poems, and formatted text.
- Test transformations with realistic XML data.
- Use wildcard rules carefully to avoid removing meaningful spaces.
- Document whitespace-handling decisions for maintainability.
- Combine stripping and preserving strategically for balanced results.
Conclusion
Whitespace handling is a crucial part of XSLT transformation design. The <xsl:strip-space> element removes unnecessary whitespace-only text nodes, helping create cleaner and more efficient output, while <xsl:preserve-space> ensures that important formatting remains intact where required. By using these elements thoughtfully, developers can improve transformation accuracy, enhance performance, and maintain the intended structure and appearance of XML-based content.