Multi-Document Processing with the document() Function in XSLT

In many real-world XML transformation scenarios, the data required for generating the final output is not stored in a single XML file. Information may be distributed across multiple XML documents, such as customer records, product catalogs, inventory files, configuration files, or external reference data. XSLT provides the document() function to access and process data from multiple XML documents during a transformation.

The document() function enables an XSLT stylesheet to read and retrieve information from external XML files. This capability allows developers to combine, compare, and enrich data from different sources without manually merging the XML files beforehand. Multi-document processing is particularly useful in enterprise applications where data is maintained in separate systems and repositories.

Purpose of the document() Function

The primary purpose of the document() function is to load one or more external XML documents and make their contents available for processing within an XSLT transformation. Instead of working solely with the source XML document, the stylesheet can access additional XML files whenever needed.

The syntax is:

document(uri)

or

document(uri, node)

Where:

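  • uri specifies the location of the external XML document. It can be a string or a node-set; for a node-set, the string value of each node is treated as a URI, and all of the referenced documents are returned.

  • node supplies the node whose base URI is used to resolve a relative uri. When it is omitted, a uri given as a string is resolved relative to the stylesheet, and a uri taken from a node is resolved relative to the document that contains that node.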
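As a minimal sketch of how the two forms differ, assume the stylesheet and the source document sit in different directories, each next to its own copy of a hypothetical file named lookup.xml. The file name and the counting logic are illustrative only:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- One argument, given as a string: the relative URI is resolved
         against the base URI of the stylesheet. -->
    <xsl:value-of select="count(document('lookup.xml')/*)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Two arguments: the relative URI is resolved against the base URI
         of the second argument, here the root of the source document. -->
    <xsl:value-of select="count(document('lookup.xml', /)/*)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

If both copies of lookup.xml exist, each call should report 1, because a well-formed document has exactly one root element; the point is only that the two calls read different files.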

Basic Example

Suppose the main XML file contains product information:

<products>
    <product id="101"/>
    <product id="102"/>
</products>

An external file named details.xml contains detailed product descriptions:

<details>
    <item id="101">
        <name>Laptop</name>
    </item>
    <item id="102">
        <name>Printer</name>
    </item>
</details>

The XSLT stylesheet can retrieve data from details.xml:

<xsl:value-of select="document('details.xml')/details/item[@id='101']/name"/>

Output:

Laptop

In this example, the stylesheet accesses an external XML file and extracts the product name.
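The single expression above can be placed in a complete stylesheet. The following is a sketch, assuming details.xml is stored next to the stylesheet; the variable names details and id are introduced here for illustration:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Load details.xml once. A relative URI given as a string is
       resolved against the location of this stylesheet. -->
  <xsl:variable name="details" select="document('details.xml')"/>

  <xsl:template match="/products">
    <xsl:for-each select="product">
      <!-- Capture the id before the path steps into the other document. -->
      <xsl:variable name="id" select="@id"/>
      <xsl:value-of select="$id"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="$details/details/item[@id = $id]/name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the two sample files, this should print "101: Laptop" and "102: Printer" on separate lines.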

Accessing Multiple Documents

The document() function can load multiple XML documents during a single transformation.

Example:

document('customers.xml')
document('orders.xml')
document('products.xml')

This approach enables data integration from different sources. For example:

  • Customer information from one file

  • Order information from another file

  • Product information from a third file

The transformation can combine all these datasets into a unified report.
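A unified report of this kind might look like the sketch below. The structure of the three files is not given above, so every element and attribute name here (customer, order, product, @id, @customer, @product, name) is a hypothetical choice made for the example:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumed layouts: customers/customer[@id], orders/order[@customer and
       @product], products/product[@id]; customer and product have a name child. -->
  <xsl:variable name="customers"
      select="document('customers.xml')/customers/customer"/>
  <xsl:variable name="orders"
      select="document('orders.xml')/orders/order"/>
  <xsl:variable name="products"
      select="document('products.xml')/products/product"/>

  <xsl:template match="/">
    <xsl:for-each select="$orders">
      <!-- Keep a handle on the order while predicates change the context. -->
      <xsl:variable name="order" select="."/>
      <xsl:value-of select="$customers[@id = $order/@customer]/name"/>
      <xsl:text> ordered </xsl:text>
      <xsl:value-of select="$products[@id = $order/@product]/name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Here the source document passed to the processor is not consulted at all; the three external files supply all of the data.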

Using Dynamic File References

The document path can be generated dynamically from XML data.

Source XML:

<reports>
    <file>sales.xml</file>
</reports>

XSLT:

<xsl:variable name="filename" select="/reports/file"/>

<xsl:value-of select="document($filename)/sales/total"/>

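The stylesheet reads the filename from the XML document and loads the corresponding external XML file. Because $filename holds a node rather than a string, a relative filename is resolved against the location of the source document, not the location of the stylesheet.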
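A complete stylesheet for this case could be sketched as follows, assuming each referenced file has a sales root element with a total child, as in the expression above:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/reports">
    <xsl:for-each select="file">
      <xsl:value-of select="."/>
      <xsl:text>: </xsl:text>
      <!-- "." is a node, so the relative URI in its text is resolved
           against the base URI of the source document. -->
      <xsl:value-of select="document(.)/sales/total"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Writing document(string(.)) instead would pass a string, and the same filename would then be looked up relative to the stylesheet.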

Combining Data from Different Documents

One common use case is joining information from multiple XML sources.

Main XML:

<employees>
    <employee dept="D1"/>
</employees>

Departments XML:

<departments>
    <department id="D1">
        <name>Human Resources</name>
    </department>
</departments>

XSLT:

<xsl:value-of
    select="document('departments.xml')/departments/department[@id = current()/@dept]/name"/>

Output:

Human Resources

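This technique resembles a database join: related information is linked across XML documents through a shared key. The expression must be evaluated while an employee element is the context node, for example inside a template that matches employee. The current() function is needed because inside the predicate "." refers to the department being tested, not to the employee.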
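Placed in a full stylesheet, the join might be sketched like this, assuming departments.xml is stored next to the stylesheet; the variable name departments is introduced here for illustration:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- All department elements from the external file. -->
  <xsl:variable name="departments"
      select="document('departments.xml')/departments/department"/>

  <xsl:template match="/employees">
    <xsl:apply-templates select="employee"/>
  </xsl:template>

  <xsl:template match="employee">
    <xsl:value-of select="@dept"/>
    <xsl:text>: </xsl:text>
    <!-- current() is the employee being processed; inside the predicate
         "." is each department in turn. -->
    <xsl:value-of select="$departments[@id = current()/@dept]/name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With the two sample files, this should print "D1: Human Resources".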

Processing a Collection of Documents

An XSLT stylesheet may iterate through data loaded from another document.

Example:

<xsl:for-each select="document('employees.xml')/employees/employee">
    <xsl:value-of select="name"/>
    <xsl:text>&#10;</xsl:text>
</xsl:for-each>

The transformation processes every employee record contained in the external XML file.

Loading Multiple Documents from a List

A stylesheet can load several documents specified in the source XML.

Source XML:

<files>
    <file>sales1.xml</file>
    <file>sales2.xml</file>
</files>

XSLT:

<xsl:for-each select="/files/file">
    <xsl:copy-of select="document(.)"/>
</xsl:for-each>

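The transformation loads each XML file listed in the source document and copies its entire content to the output. Because "." is a node, relative filenames are resolved against the location of the source document.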
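Since document() also accepts a node-set, the listed files can be loaded with a single call and then queried as one set. The sketch below assumes each sales file has a sales root element with a numeric total child; that structure is an assumption carried over from the earlier sales.xml example:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/files">
    <!-- Every file element is treated as a URI; the result is the
         union of the root nodes of the loaded documents. -->
    <xsl:variable name="docs" select="document(file)"/>
    <xsl:value-of select="count($docs)"/>
    <xsl:text> documents, grand total </xsl:text>
    <xsl:value-of select="sum($docs/sales/total)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

If any total is not a number, sum() returns NaN, so real data may need filtering before it is aggregated.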

Benefits of Multi-Document Processing

Improved Data Organization

Different types of data can be maintained in separate XML files while still being processed together when required.

Better Maintainability

Updating one XML document does not require modifying other related files, provided the identifiers that link the documents remain unchanged.

Data Reusability

The same external XML document can be referenced by multiple transformations.

Reduced Data Duplication

Information does not need to be copied into multiple XML files, reducing storage and synchronization issues.

Flexible Reporting

Reports can be generated by combining information from several sources dynamically.

Challenges and Considerations

Performance Impact

Loading multiple XML files can increase processing time, especially when working with large datasets or numerous documents.

File Availability

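If an external XML file is missing or inaccessible, the result depends on the processor. In XSLT 1.0 the processor may signal an error and stop, or it may recover by returning an empty node-set, in which case the output is silently incomplete.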

Security Restrictions

Some XSLT processors restrict access to external files for security reasons. Proper permissions and configuration settings may be required.

Path Management

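Relative and absolute paths must be carefully managed to ensure documents are located correctly. A relative path is resolved against a base URI, either that of the stylesheet or that of a node, depending on how document() is called, so moving the stylesheet or the source file can change which file is loaded.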

Memory Usage

Processing many large XML files simultaneously may consume significant system memory.

Best Practices

  1. Use meaningful and consistent file naming conventions.

  2. Store frequently used documents in accessible locations.

  3. Validate external XML documents before processing.

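  4. Within a single transformation, repeated calls to document() with the same URI return the same document, so it is read only once. To reuse documents across separate transformations, rely on caching when the XSLT processor or the calling application supports it.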

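  5. Handle missing documents gracefully using conditional logic. In XSLT 2.0 and later, the doc-available() function can test for a document before it is loaded; in XSLT 1.0, the behavior for a missing file depends on the processor.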

  6. Minimize unnecessary document loading to improve performance.

  7. Use variables to store document references when the same file is accessed repeatedly.
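Practices 5 and 7 can be combined in one template. The following sketch assumes an XSLT 2.0 processor, since doc-available() is not part of XSLT 1.0, and reuses the details.xml file from the basic example:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:choose>
      <!-- Check that the file can be read before loading it. -->
      <xsl:when test="doc-available('details.xml')">
        <!-- Store the document once and reuse the variable. -->
        <xsl:variable name="details" select="document('details.xml')"/>
        <xsl:value-of select="count($details/details/item)"/>
        <xsl:text> items loaded&#10;</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>details.xml is not available&#10;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Both calls use a relative URI given as a string, so both look for details.xml next to the stylesheet.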

Real-World Applications

Multi-document processing with the document() function is widely used in:

  • Enterprise data integration systems

  • Product catalog management

  • Customer and order reporting systems

  • Content management systems

  • XML-based publishing workflows

  • Financial reporting applications

  • Inventory management systems

  • Configuration management solutions

Conclusion

The document() function is a powerful feature of XSLT that enables multi-document processing by allowing access to external XML files during transformation. It supports data integration, reporting, and dynamic content generation by combining information from multiple sources. When used effectively, it helps create flexible, maintainable, and scalable XML transformation solutions while reducing data duplication and improving overall system organization.