Multi-Document Processing with the document() Function in XSLT
In many real-world XML transformation scenarios, the data required for generating the final output is not stored in a single XML file. Information may be distributed across multiple XML documents, such as customer records, product catalogs, inventory files, configuration files, or external reference data. XSLT provides the document() function to access and process data from multiple XML documents during a transformation.
The document() function enables an XSLT stylesheet to read and retrieve information from external XML files. This capability allows developers to combine, compare, and enrich data from different sources without manually merging the XML files beforehand. Multi-document processing is particularly useful in enterprise applications where data is maintained in separate systems and repositories.
Purpose of the document() Function
The primary purpose of the document() function is to load one or more external XML documents and make their contents available for processing within an XSLT transformation. Instead of working solely with the source XML document, the stylesheet can access additional XML files whenever needed.
The syntax is:
document(uri)
or
document(uri, node)
Where:
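- uri specifies the location of the external XML document. It may be a string or a node-set; for a node-set, one document is loaded per node.
- node is optional and provides a base node for resolving relative paths: the base URI of its first node is used.
When node is omitted, a relative URI passed as a string is resolved against the location of the stylesheet, whereas a URI read from a node is resolved against the base URI of that node.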
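The effect of the second argument is easiest to see side by side. The following is a minimal sketch, assuming a hypothetical file named lookup.xml exists both next to the stylesheet and next to the source document; the variable names are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- String argument only: 'lookup.xml' (a hypothetical file name) is
       resolved relative to the location of this stylesheet. -->
  <xsl:variable name="beside-stylesheet" select="document('lookup.xml')"/>

  <!-- Second argument "/": the same name is resolved relative to the base
       URI of the source document's root node instead. -->
  <xsl:variable name="beside-source" select="document('lookup.xml', /)"/>

</xsl:stylesheet>
```

Passing a node as the second argument is a common way to keep lookups relative to the input data when the stylesheet is stored in a different directory.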
Basic Example
Suppose the main XML file contains product information:
<products>
  <product id="101"/>
  <product id="102"/>
</products>
An external file named details.xml contains detailed product descriptions:
<details>
  <item id="101">
    <name>Laptop</name>
  </item>
  <item id="102">
    <name>Printer</name>
  </item>
</details>
The XSLT stylesheet can retrieve data from details.xml:
<xsl:value-of select="document('details.xml')/details/item[@id='101']/name"/>
Output:
Laptop
In this example, the stylesheet accesses an external XML file and extracts the product name.
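The single expression above hard-codes the id 101. A complete stylesheet would more likely take each id from the main XML file. The following is a minimal sketch of that idea, assuming details.xml sits in the same directory as the stylesheet; the variable names are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Load details.xml once; the relative URI is resolved against
       the location of this stylesheet. -->
  <xsl:variable name="details" select="document('details.xml')/details"/>

  <xsl:template match="/products">
    <xsl:for-each select="product">
      <!-- Remember the id of the current product for the lookup. -->
      <xsl:variable name="id" select="@id"/>
      <xsl:value-of select="$id"/>
      <xsl:text>: </xsl:text>
      <!-- Find the item with the same id in the external document. -->
      <xsl:value-of select="$details/item[@id = $id]/name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the main XML file shown earlier, this should print "101: Laptop" and "102: Printer" on separate lines.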
Accessing Multiple Documents
The document() function can load multiple XML documents during a single transformation.
Example:
document('customers.xml')
document('orders.xml')
document('products.xml')
This approach enables data integration from different sources. For example:
- Customer information from one file
- Order information from another file
- Product information from a third file
The transformation can combine all these datasets into a unified report.
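As a rough illustration of such a unified report, the sketch below lists each customer followed by the names of the products they ordered. The structure of the three files is not given in this article, so every element and attribute name used here (customer/@id, customer/name, order/@customer, order/@product, product/@id, product/name) is an assumption made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Element and attribute names below are assumptions made for this sketch. -->
  <xsl:variable name="customers" select="document('customers.xml')/customers/customer"/>
  <xsl:variable name="orders" select="document('orders.xml')/orders/order"/>
  <xsl:variable name="products" select="document('products.xml')/products/product"/>

  <!-- The source document only triggers the transformation here;
       all reported data comes from the three external files. -->
  <xsl:template match="/">
    <xsl:for-each select="$customers">
      <xsl:variable name="customer" select="."/>
      <xsl:value-of select="$customer/name"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Orders whose customer attribute matches this customer's id. -->
      <xsl:for-each select="$orders[@customer = $customer/@id]">
        <xsl:variable name="order" select="."/>
        <xsl:text>  </xsl:text>
        <xsl:value-of select="$products[@id = $order/@product]/name"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```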
Using Dynamic File References
The document path can be generated dynamically from XML data.
Source XML:
<reports>
  <file>sales.xml</file>
</reports>
XSLT:
<xsl:variable name="filename" select="/reports/file"/>
<xsl:value-of select="document($filename)/sales/total"/>
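The stylesheet reads the filename from the XML document and loads the corresponding external XML file. Because $filename holds a node rather than a string, the relative name sales.xml is resolved against the location of the source document, not the stylesheet.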
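Whether the file name reaches document() as a node or as a string changes where a relative name is looked up. The following minimal sketch shows both forms, assuming sales.xml has a sales root element with a total child as in the expression above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/reports">
    <!-- Node-set argument: 'sales.xml' is resolved relative to the
         source document that contains the <file> element. -->
    <xsl:value-of select="document(file)/sales/total"/>
    <xsl:text>&#10;</xsl:text>

    <!-- String argument: the same name is now resolved relative to
         the stylesheet, which may be a different directory. -->
    <xsl:value-of select="document(string(file))/sales/total"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

If the stylesheet and the source document are stored in the same directory, both calls load the same file.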
Combining Data from Different Documents
One common use case is joining information from multiple XML sources.
Main XML:
<employees>
  <employee dept="D1"/>
</employees>
Departments XML:
<departments>
  <department id="D1">
    <name>Human Resources</name>
  </department>
</departments>
XSLT:
<xsl:value-of
  select="document('departments.xml')/departments/department[@id = current()/@dept]/name"/>
Output:
Human Resources
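This technique resembles a database join and allows related information to be linked across XML documents. The expression must be evaluated with an employee element as the current node, so that current()/@dept supplies the value used to select the matching department.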
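For larger lookup files, a predicate scan for every employee can become slow, and an xsl:key index is a common alternative. The following is a minimal sketch of that variant, not the approach used above; the key name dept-by-id and the variable names are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index department elements by id; a key applies to every
       document the processor loads, including external ones. -->
  <xsl:key name="dept-by-id" match="department" use="@id"/>

  <xsl:variable name="departments" select="document('departments.xml')"/>

  <xsl:template match="/employees">
    <xsl:apply-templates select="employee"/>
  </xsl:template>

  <xsl:template match="employee">
    <xsl:variable name="dept" select="@dept"/>
    <!-- In XSLT 1.0, key() searches the document that holds the context
         node, so for-each is used only to switch the context to
         departments.xml. -->
    <xsl:for-each select="$departments">
      <xsl:value-of select="key('dept-by-id', $dept)/name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the two sample files above, this should print Human Resources, the same result as the predicate-based lookup.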
Processing a Collection of Documents
An XSLT stylesheet may iterate through data loaded from another document.
Example:
<xsl:for-each select="document('employees.xml')/employees/employee">
  <xsl:value-of select="name"/>
  <!-- Newline after each name so the values do not run together -->
  <xsl:text>&#10;</xsl:text>
</xsl:for-each>
The transformation processes every employee record contained in the external XML file.
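Nodes from an external document can also be handed to template rules instead of being handled inline. The sketch below assumes the same employees.xml structure as the example above (employee elements with a name child).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Template rules match nodes from any loaded document,
         not only from the source document. -->
    <xsl:apply-templates select="document('employees.xml')/employees/employee"/>
  </xsl:template>

  <xsl:template match="employee">
    <xsl:value-of select="name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Keeping the per-employee logic in its own template lets the same rule serve employee elements from the source document as well.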
Loading Multiple Documents from a List
A stylesheet can load several documents specified in the source XML.
Source XML:
<files>
  <file>sales1.xml</file>
  <file>sales2.xml</file>
</files>
XSLT:
<xsl:for-each select="/files/file">
  <xsl:copy-of select="document(.)"/>
</xsl:for-each>
The transformation loads each XML file listed in the source document and copies its full content to the output.
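Because document() accepts a node-set, the loop can be dropped when the listed files only need to be merged. The following is a minimal sketch; the wrapper element name merged is hypothetical, added so that the result has a single root element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/files">
    <!-- "merged" is a hypothetical wrapper element name. -->
    <merged>
      <!-- document(file) receives a node-set, so it returns the union of
           all listed documents; /* selects the document element of each. -->
      <xsl:copy-of select="document(file)/*"/>
    </merged>
  </xsl:template>
</xsl:stylesheet>
```

Each file name is taken from a node, so it is resolved relative to the source document that lists it.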
Benefits of Multi-Document Processing
Improved Data Organization
Different types of data can be maintained in separate XML files while still being processed together when required.
Better Maintainability
Updating one XML document does not require modifying other related files.
Data Reusability
The same external XML document can be referenced by multiple transformations.
Reduced Data Duplication
Information does not need to be copied into multiple XML files, reducing storage and synchronization issues.
Flexible Reporting
Reports can be generated by combining information from several sources dynamically.
Challenges and Considerations
Performance Impact
Loading multiple XML files can increase processing time, especially when working with large datasets or numerous documents.
File Availability
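If an external XML file is missing or inaccessible, an XSLT 1.0 processor may either signal an error or recover by returning an empty node-set, so the transformation can fail outright or silently produce incomplete output.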
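Where an XSLT 2.0 processor is available, the doc-available() function offers a way to check a file before loading it. The following is a minimal sketch under that assumption; it does not work in XSLT 1.0 processors. It reuses the files/file list structure from the earlier example, and the wrapper element merged is hypothetical. Because doc-available() and doc() take a string, a relative name is resolved against the stylesheet location here, so the sketch assumes the stylesheet and the listed files share a directory.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/files">
    <!-- "merged" is a hypothetical wrapper element name. -->
    <merged>
      <xsl:for-each select="file">
        <xsl:choose>
          <!-- doc-available() is true only if doc() would succeed
               for the same URI. -->
          <xsl:when test="doc-available(.)">
            <xsl:copy-of select="doc(.)/*"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- Report the problem and continue with the next file. -->
            <xsl:message>Skipping unavailable file: <xsl:value-of select="."/></xsl:message>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </merged>
  </xsl:template>
</xsl:stylesheet>
```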
Security Restrictions
Some XSLT processors restrict access to external files for security reasons. Proper permissions and configuration settings may be required.
Path Management
Relative and absolute paths must be carefully managed to ensure documents are located correctly.
Memory Usage
Processing many large XML files simultaneously may consume significant system memory.
Best Practices
- Use meaningful and consistent file naming conventions.
- Store frequently used documents in accessible locations.
- Validate external XML documents before processing.
- Cache frequently accessed documents when supported by the XSLT processor.
- Handle missing documents gracefully using conditional logic.
- Minimize unnecessary document loading to improve performance.
- Use variables to store document references when the same file is accessed repeatedly.
Real-World Applications
Multi-document processing with the document() function is widely used in:
- Enterprise data integration systems
- Product catalog management
- Customer and order reporting systems
- Content management systems
- XML-based publishing workflows
- Financial reporting applications
- Inventory management systems
- Configuration management solutions
Conclusion
The document() function is a powerful feature of XSLT that enables multi-document processing by allowing access to external XML files during transformation. It supports data integration, reporting, and dynamic content generation by combining information from multiple sources. When used effectively, it helps create flexible, maintainable, and scalable XML transformation solutions while reducing data duplication and improving overall system organization.