XML - XPath Functions and Operators (Advanced Level)

XPath is a query language used to navigate XML documents and extract data from them. While basic XPath focuses on selecting nodes using simple paths, advanced XPath relies heavily on functions, operators, axes, and predicates to express complex queries. Understanding these elements allows precise data retrieval and filtering, as well as computed results such as counts and totals.


1. XPath Functions Overview

XPath provides a rich set of built-in functions grouped into categories such as node, string, numeric, boolean, and date/time (in XPath 2.0+). These functions help process and evaluate XML data efficiently.

Node Functions
These functions operate on node sets:

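  • last() returns the context size, which equals the position of the last node in the current node set.

  • position() returns the position of the current node within that node set, counting from 1.

  • count(node-set) returns the number of nodes in the node set.

  • name() returns the qualified name of a node, including any namespace prefix.

  • local-name() returns the local part of a node name, without the namespace prefix.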

Example:

/bookstore/book[last()]

Selects the last book element.
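
As a rough illustration, the same selection can be tried from Python. This is a minimal sketch, assuming the standard-library xml.etree.ElementTree module (which supports only a subset of XPath) and a small sample document invented for the purpose. ElementTree has no count(), so len() stands in for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<bookstore>
  <book><title>Learning XML</title></book>
  <book><title>XPath Basics</title></book>
  <book><title>XSLT Cookbook</title></book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# Equivalent of /bookstore/book[last()], written relative to the root element.
last_book = root.find("book[last()]")
last_title = last_book.findtext("title")

# ElementTree's XPath subset has no count(); len() over the matches plays that role.
book_count = len(root.findall("book"))

print(last_title, book_count)
```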


2. String Functions

String functions manipulate text within XML nodes:

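  • contains(string1, string2) returns true if string1 contains string2 (case-sensitive).

  • starts-with(string1, string2) returns true if string1 begins with string2.

  • substring(string, start, length) extracts part of a string; positions count from 1, and length is optional.

  • string-length() returns the number of characters in a string (the string value of the context node if no argument is given).

  • normalize-space() strips leading and trailing whitespace and collapses each internal run of whitespace into a single space.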

Example:

//book[contains(title, 'XML')]

Selects books whose title contains the substring “XML”. The match is case-sensitive.
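
ElementTree's XPath subset does not include string functions, so the sketch below emulates a few of them in plain Python rather than evaluating the XPath itself. The sample document and the helper names normalize_space and xpath_substring are hypothetical, and the helpers are simplified approximations of the XPath definitions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<bookstore>
  <book><title>  Learning   XML </title></book>
  <book><title>XPath Basics</title></book>
  <book><title>XML in Practice</title></book>
</bookstore>
"""
root = ET.fromstring(XML)


def normalize_space(text):
    # Approximates normalize-space(): trim both ends, collapse whitespace runs.
    return " ".join(text.split())


def xpath_substring(text, start, length):
    # substring() counts from 1, so shift to Python's 0-based slicing.
    # Simplified: assumes integer arguments with start >= 1.
    return text[start - 1:start - 1 + length]


titles = [normalize_space(b.findtext("title", "")) for b in root.iter("book")]

# Python counterpart of //book[contains(title, 'XML')], applied to the titles.
with_xml = [t for t in titles if "XML" in t]

# Python counterpart of starts-with(title, 'XML').
starts_with_xml = [t for t in titles if t.startswith("XML")]

print(with_xml, starts_with_xml, xpath_substring(titles[0], 10, 3))
```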


3. Numeric Functions

These functions handle numerical operations:

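  • sum(node-set) adds up the numeric values of the nodes.

  • round(), floor(), ceiling() convert a number to an integer: round() to the nearest integer, floor() downward, ceiling() upward.

  • number() converts a value to a number and returns NaN when the conversion fails.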

Example:

sum(//book/price)

Calculates the total price of all books.
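
The total can be reproduced in Python as a cross-check. This is a sketch under assumptions: the sample prices are invented, the sum is computed in Python because ElementTree cannot evaluate sum() itself, and xpath_round is a hypothetical helper. It is included because XPath's round() rounds halves upward, whereas Python's built-in round() rounds halves to the nearest even integer.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<bookstore>
  <book><price>450</price></book>
  <book><price>620.5</price></book>
  <book><price>300</price></book>
</bookstore>
"""
root = ET.fromstring(XML)

# Python counterpart of sum(//book/price): select the nodes, convert, add.
prices = [float(p.text) for p in root.findall(".//book/price")]
total = sum(prices)


def xpath_round(x):
    # XPath round() sends ties toward positive infinity (simplified emulation).
    return math.floor(x + 0.5)


print(total, math.floor(total), math.ceil(total), xpath_round(total))
```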


4. Boolean Functions

Boolean functions evaluate conditions:

  • true() and false() return boolean values.

  • not() negates a condition.

  • boolean() converts values to boolean; an empty node set, an empty string, 0, and NaN convert to false.

Example:

//book[not(@available)]

Selects books without the "available" attribute.
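
A short Python sketch of the same test follows, assuming an invented sample document. ElementTree understands the [@available] predicate but not not(), so the negation is written as a Python membership check. Note that the test is about the presence of the attribute, not its value.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<bookstore>
  <book id="b1" available="yes"/>
  <book id="b2"/>
  <book id="b3" available="no"/>
</bookstore>
"""
root = ET.fromstring(XML)

# //book[@available]: supported directly by ElementTree's XPath subset.
with_attr = [b.get("id") for b in root.findall(".//book[@available]")]

# //book[not(@available)]: emulated, since ElementTree has no not().
without_attr = [b.get("id") for b in root.iter("book") if "available" not in b.attrib]

print(with_attr, without_attr)
```

Book b3 counts as having the attribute even though its value is "no".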


5. XPath Operators

Operators are used to compare values and combine conditions.

Comparison Operators

  • = equal

  • != not equal

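  • <, >, <=, >= less than, greater than, less than or equal, greater than or equal

When one side of a comparison is a node set, the comparison is true if at least one node in the set satisfies it.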

Example:

//book[price > 500]

Selects books whose price is greater than 500.

Logical Operators

  • and

  • or

Example:

//book[price > 300 and price < 700]

Selects books priced above 300 and below 700.

Arithmetic Operators

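  • +, -, *, div, mod (division is written div because / is the path separator; a minus sign needs surrounding spaces, since a hyphen is legal inside names)

Example:

//book[price * 2 > 1000]

Selects books whose doubled price exceeds 1000.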
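
The three operator examples in this section can be mirrored in Python to see which books each one keeps. This is only a sketch: the sample data and the helper ids_where are invented, and the predicates are evaluated by Python because ElementTree does not support numeric comparisons. math.fmod is used for mod because XPath's mod keeps the sign of the dividend, unlike Python's % operator.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<bookstore>
  <book id="b1"><price>250</price></book>
  <book id="b2"><price>450</price></book>
  <book id="b3"><price>650</price></book>
  <book id="b4"><price>900</price></book>
</bookstore>
"""
root = ET.fromstring(XML)


def ids_where(condition):
    # Hypothetical helper: ids of books whose numeric price passes the condition.
    return [b.get("id") for b in root.iter("book") if condition(float(b.findtext("price")))]


over_500 = ids_where(lambda p: p > 500)          # //book[price > 500]
mid_range = ids_where(lambda p: 300 < p < 700)   # //book[price > 300 and price < 700]
doubled = ids_where(lambda p: p * 2 > 1000)      # //book[price * 2 > 1000]

# XPath: -5 mod 2 is -1. Python's -5 % 2 is 1, so fmod is the closer match.
xpath_style_mod = math.fmod(-5, 2)

print(over_500, mid_range, doubled, xpath_style_mod)
```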

6. Axes in XPath

Axes define relationships between nodes and are essential for advanced navigation.

Common axes include:

  • child:: selects children

  • parent:: selects parent

  • ancestor:: selects all ancestors

  • descendant:: selects all descendants

  • following-sibling:: selects all later siblings of the current node

  • preceding-sibling:: selects all earlier siblings of the current node

Example:

//book/ancestor::library

Selects every library element that is an ancestor of a book element.
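
ElementTree elements keep no reference to their parent and the module has no ancestor:: axis, so one common workaround is to build a child-to-parent map and walk upward. The sketch below assumes an invented document; parent_of and ancestors are hypothetical names. It yields the nearest ancestor first, whereas an XPath engine returns the resulting node set in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<library>
  <shelf name="A">
    <book><title>Learning XML</title></book>
  </shelf>
</library>
"""
root = ET.fromstring(XML)

# Map every element to its parent; the root has no entry.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    # Walk upward from the node, like the ancestor:: axis (nearest first).
    while node in parent_of:
        node = parent_of[node]
        yield node


book = root.find(".//book")
ancestor_tags = [a.tag for a in ancestors(book)]

# Counterpart of ancestor::library for this one book.
libraries = [a for a in ancestors(book) if a.tag == "library"]

print(ancestor_tags, libraries[0] is root)
```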


7. Predicates for Filtering

Predicates use square brackets to filter nodes based on conditions.

Examples:

//book[1]

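Selects every book that is the first book child of its parent. To select only the first book in the whole document, use (//book)[1].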

//book[@category='fiction']

Filters books by category.

//book[price > 500][author='John']

Applies both conditions in sequence: only books that pass the first predicate are tested against the second.
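
One subtlety worth checking by hand is that a positional predicate such as [1] is evaluated relative to each parent, not across the whole document. The sketch below, assuming an invented two-shelf document and a hypothetical helper ids, uses the predicates that ElementTree supports directly: position, attribute value, and child text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<library>
  <shelf>
    <book id="b1" category="fiction"><author>John</author></book>
    <book id="b2" category="science"><author>Mary</author></book>
  </shelf>
  <shelf>
    <book id="b3" category="fiction"><author>Mary</author></book>
    <book id="b4" category="fiction"><author>John</author></book>
  </shelf>
</library>
"""
root = ET.fromstring(XML)


def ids(path):
    # Hypothetical helper: ids of all elements matching an ElementTree path.
    return [e.get("id") for e in root.findall(path)]


# //book[1]: the first book under EACH parent, so one per shelf.
first_per_parent = ids(".//book[1]")

# (//book)[1]: the first book in the document; find() returns the first match.
first_in_document = root.find(".//book").get("id")

# //book[@category='fiction']
fiction = ids(".//book[@category='fiction']")

# Chained predicates: fiction books whose author child has the text John.
fiction_by_john = ids(".//book[@category='fiction'][author='John']")

print(first_per_parent, first_in_document, fiction, fiction_by_john)
```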


8. Combining Functions and Predicates

Advanced XPath often combines multiple techniques for precise selection.

Example:

//book[contains(title, 'XML') and price < 500]

Selects books whose title contains “XML” and whose price is below 500.

Example with position:

//book[position() <= 3]

Selects the first three book children of each parent element.
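
Both combined queries can be approximated in Python for a quick check. The sketch assumes an invented, flat document in which every book shares one parent, so "first three per parent" and "first three in the document" coincide. The filtering is done in Python because ElementTree supports neither contains() nor numeric comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<bookstore>
  <book><title>XML Basics</title><price>300</price></book>
  <book><title>Advanced XML</title><price>650</price></book>
  <book><title>Python Tricks</title><price>200</price></book>
  <book><title>XML Schema</title><price>450</price></book>
</bookstore>
"""
root = ET.fromstring(XML)
books = root.findall("book")  # all books share one parent here

# Counterpart of //book[contains(title, 'XML') and price < 500]
cheap_xml = [
    b.findtext("title")
    for b in books
    if "XML" in b.findtext("title") and float(b.findtext("price")) < 500
]

# Counterpart of //book[position() <= 3] for a single parent: a slice.
first_three = [b.findtext("title") for b in books[:3]]

print(cheap_xml, first_three)
```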


9. XPath 2.0 Enhancements (Advanced Insight)

In XPath 2.0 and later:

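  • Sequences of items replace node sets as the basic data model

  • Additional functions such as matches(), replace(), and tokenize(), all based on regular expressions

  • Stronger data typing based on XML Schema types (for example xs:string, xs:integer, xs:date)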

Example:

//book[matches(title, '^XML')]

Selects books whose title starts with “XML”.
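
Python's standard library does not implement XPath 2.0, but its re module offers rough counterparts to the three functions named above. The sketch below uses invented sample data (including a hypothetical tags element). It is an approximation, since the XPath and Python regular-expression dialects differ in some details, although simple patterns like these behave the same way in both.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<bookstore>
  <book><title>XML Basics</title><tags>xml, xpath,xslt</tags></book>
  <book><title>Advanced XML</title><tags>xml</tags></book>
  <book><title>xml tips</title><tags>xml,tips</tags></book>
</bookstore>
"""
root = ET.fromstring(XML)
titles = [b.findtext("title") for b in root.iter("book")]

# matches(title, '^XML'): re.search, because matches() is unanchored
# unless the pattern itself contains anchors.
starts_with_xml = [t for t in titles if re.search(r"^XML", t)]

# replace(title, '\s+', '-'): re.sub is the closest counterpart.
slugs = [re.sub(r"\s+", "-", t) for t in titles]

# tokenize(tags, ',\s*'): re.split, applied to the first book's tags.
first_tags = re.split(r",\s*", root.findtext("book/tags"))

print(starts_with_xml, slugs, first_tags)
```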

Conclusion

Advanced XPath functions and operators transform simple queries into powerful data extraction tools. By combining functions, axes, predicates, and operators, users can navigate complex XML structures with precision. Mastery of these concepts is essential for working with XML in real-world applications such as data integration, web services, and document processing systems.