What is XPath?
With the XML document structure and the separation of data, structure and style covered, the next question is how to address a particular element in the data part. This is needed, for example, to point to specific parts of a document from inside or outside it, as XPointer (XML Pointer Language) does, or to transform the structure of a document, as XSLT (XSL Transformations) does. To give all these applications a common, standardized syntax and semantics, the addressing language XPath (XML Path Language) was defined.
XPath treats an XML document as a tree, or as a part of a tree (a branch). As in a file system, every part of this tree can be addressed by a path such as "/element/subelement", written in a compact, non-XML syntax. The points of this tree are nodes, and collections of nodes are node-sets. XPath distinguishes between seven types of node:
| root nodes | The root of the tree ("/"). It is different from the root element, which is a child of the root node. Besides the root element, the root node can have comment nodes and processing instruction nodes as children; these come from the prolog (the head of the document) or from after the root element, at the end of the document. |
| element nodes | Element nodes represent the elements of the document and are the most common type of node. Their children can be element nodes, text nodes, comment nodes and processing instruction nodes. |
| text nodes | Text nodes contain character data. They have no children and are therefore leaves of the tree. |
| attribute nodes | Every element node has zero or more associated attribute nodes. The element is the parent of its attribute nodes, but the attribute nodes are not children of the element. Like text nodes, attribute nodes have a string value. |
| namespace nodes | Every namespace in scope for an element is represented by a namespace node of that element. As with attributes, the element is the parent of these namespace nodes, but they are not its children. This looks like a contradiction, but it is how the XPath specification defines the relationship. |
| processing instruction nodes | These nodes represent the processing instructions contained in the document (except any that occur inside the document type declaration). |
| comment nodes | These nodes have no children; their value is the text between "<!--" and "-->". |
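To make the node types concrete, the following Python sketch parses a small, made-up document with the standard library's xml.dom.minidom and lists the type of each child node. DOM is not identical to the XPath data model (it has no namespace nodes and may keep CDATA sections or adjacent text as separate nodes), so the mapping in XPATH_KIND is an approximation chosen for this illustration; the sample document and the names DOC, XPATH_KIND and child_kinds are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample document: a comment and a processing instruction before
# the root element <book>, text and a child element inside it, and a trailing
# comment after it. No whitespace is placed between the top-level items.
DOC = ('<!--prolog comment--><?render mode="fast"?>'
       '<book lang="en">Intro<chapter no="12">Text</chapter></book>'
       '<!--trailing comment-->')

# DOM node type constants mapped to the XPath node type they approximate.
XPATH_KIND = {
    minidom.Node.DOCUMENT_NODE: "root",
    minidom.Node.ELEMENT_NODE: "element",
    minidom.Node.TEXT_NODE: "text",
    minidom.Node.COMMENT_NODE: "comment",
    minidom.Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}


def child_kinds(node):
    """Return the XPath type names of a node's children, in document order."""
    return [XPATH_KIND[child.nodeType] for child in node.childNodes]


doc = minidom.parseString(DOC)   # the Document object plays the root node
book = doc.documentElement       # the root element is a child of the root node

print(XPATH_KIND[doc.nodeType])  # root
print(child_kinds(doc))          # ['comment', 'processing-instruction', 'element', 'comment']
print(child_kinds(book))         # ['text', 'element']

# The attribute is reachable from its element but is not one of its children.
lang = book.getAttributeNode("lang")
print(lang.value, lang in book.childNodes)  # en False
```

The output mirrors the first rows of the table: the root node and the root element are different objects, and the comment and processing instruction outside <book> are children of the root node only.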
Distinguishing the node types is only the first step towards pointing at a specific element; the next is explaining how the tree is accessed. In general, an XPath location path is built as follows:

(modified graph from: Dr. Lothar Schmitz, "Dokumentenbeschreibungssprachen", 2001)
As the figure above shows, the first decision is whether the path is absolute (starting with "/") or relative. The optional slash is followed by a sequence of steps separated by slashes; a relative path needs at least one step, while a lone "/" is itself a valid path that selects the root node. Each step has two mandatory components, the axis and the node test, written as axis::node-test, followed by an optional sequence of predicates in square brackets, which refine the selection. (In the abbreviated syntax the axis may be omitted, in which case it defaults to child.) The next figure shows the possible values of the axis, the first component; the context node (the self axis) is marked by an "S":

(modified graph from: Henning Behme & Stefan Mintert, "XML in der Praxis", Addison-Wesley, 2000)
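The structure of a location path described before this figure (an optional leading slash, then steps separated by slashes, each made of an axis, "::", a node test and optional predicates) can be sketched as a small Python parser. It is an illustration under simplifying assumptions, not a complete XPath grammar: only the default child axis and the "@" shorthand are recognised, abbreviations such as "//", "." and ".." are not, and predicates are returned as unparsed strings. The names Step, LocationPath and parse_location_path are hypothetical.

```python
from dataclasses import dataclass, field

# The thirteen axis names defined by XPath 1.0.
AXES = {
    "ancestor", "ancestor-or-self", "attribute", "child", "descendant",
    "descendant-or-self", "following", "following-sibling", "namespace",
    "parent", "preceding", "preceding-sibling", "self",
}


@dataclass
class Step:
    axis: str
    node_test: str
    predicates: list = field(default_factory=list)


@dataclass
class LocationPath:
    absolute: bool
    steps: list


def _top_level(text):
    """Yield (index, char, bracket depth) for characters outside quoted literals.
    Both brackets of a group are reported at the depth of the enclosing level."""
    depth, quote = 0, None
    for i, ch in enumerate(text):
        if quote:
            quote = None if ch == quote else quote
            continue
        if ch in "'\"":
            quote = ch
            continue
        if ch == "]":
            depth -= 1
        yield i, ch, depth
        if ch == "[":
            depth += 1


def split_steps(body):
    """Split on "/" unless the slash is inside a predicate or a literal."""
    parts, start = [], 0
    for i, ch, depth in _top_level(body):
        if ch == "/" and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    parts.append(body[start:])
    return parts


def bracket_groups(text):
    """Return the contents of each top-level [...] group (the predicates)."""
    groups, start = [], 0
    for i, ch, depth in _top_level(text):
        if depth == 0 and ch == "[":
            start = i + 1
        elif depth == 0 and ch == "]":
            groups.append(text[start:i].strip())
    return groups


def parse_step(text):
    """Split one step into axis, node test and predicates."""
    text = text.strip()
    if not text:
        raise ValueError("empty step: abbreviations such as // are not handled")
    # Assumes the node test itself contains no "[" (true for names, *, text() etc.).
    cut = text.find("[")
    head, tail = (text, "") if cut < 0 else (text[:cut].strip(), text[cut:])
    if "::" in head:
        axis, node_test = (part.strip() for part in head.split("::", 1))
    elif head.startswith("@"):
        axis, node_test = "attribute", head[1:]  # @name abbreviates attribute::name
    else:
        axis, node_test = "child", head          # child is the default axis
    if axis not in AXES:
        raise ValueError(f"unknown axis: {axis!r}")
    return Step(axis, node_test, bracket_groups(tail))


def parse_location_path(path):
    path = path.strip()
    absolute = path.startswith("/")
    body = path[1:] if absolute else path
    if absolute and not body.strip():
        return LocationPath(True, [])            # "/" alone selects the root node
    return LocationPath(absolute, [parse_step(s) for s in split_steps(body)])
```

For example, parse_location_path('/descendant::chapter[attribute::no = "12"]') yields an absolute path with a single step whose axis is descendant, whose node test is chapter and whose only predicate is the string attribute::no = "12".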
In addition to the axes shown (where the "P" marks the parent), there are six more axes:
| following | all nodes after the context node in document order, on any level of the hierarchy, excluding its descendants and excluding attribute and namespace nodes |
| preceding | all nodes before the context node in document order, on any level of the hierarchy, excluding its ancestors and excluding attribute and namespace nodes |
| descendant-or-self | like descendant, but including the context node itself |
| ancestor-or-self | like ancestor, but including the context node itself |
| attribute | the attribute nodes of the context node (empty unless the context node is an element) |
| namespace | the namespace nodes of the context node (empty unless the context node is an element) |
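The axes can be sketched as plain Python functions over an xml.dom.minidom tree, using only parentNode, childNodes and the sibling links. This is a hand-written illustration of the axis definitions, not how any particular XPath engine works; namespace nodes are left out because DOM does not represent them, and the function names and the sample document are hypothetical. The sketch also makes a property of the definitions visible: for any node, the self, ancestor, descendant, following and preceding axes together contain every node of the document (apart from attribute and namespace nodes) exactly once.

```python
from xml.dom import minidom

# Each function returns the nodes on one axis of the context node n.
# Forward axes are in document order; reverse axes (ancestor, preceding,
# preceding-sibling) list the nearest node first.


def child(n):
    return list(n.childNodes)


def parent(n):
    return [n.parentNode] if n.parentNode is not None else []


def ancestor(n):
    out, p = [], n.parentNode
    while p is not None:
        out.append(p)
        p = p.parentNode
    return out


def descendant(n):
    out = []
    for c in n.childNodes:
        out.append(c)
        out.extend(descendant(c))
    return out


def following_sibling(n):
    out, s = [], n.nextSibling
    while s is not None:
        out.append(s)
        s = s.nextSibling
    return out


def preceding_sibling(n):
    out, s = [], n.previousSibling
    while s is not None:
        out.append(s)
        s = s.previousSibling
    return out


def following(n):
    # Following siblings of n and of each ancestor, with their subtrees;
    # this leaves out the descendants of n.
    out = []
    for a in [n] + ancestor(n):
        for s in following_sibling(a):
            out.append(s)
            out.extend(descendant(s))
    return out


def preceding(n):
    # Preceding siblings of n and of each ancestor, with their subtrees,
    # in reverse document order; this leaves out the ancestors of n.
    out = []
    for a in [n] + ancestor(n):
        for s in preceding_sibling(a):
            out.extend(reversed(descendant(s)))
            out.append(s)
    return out


def descendant_or_self(n):
    return [n] + descendant(n)


def ancestor_or_self(n):
    return [n] + ancestor(n)


def attribute(n):
    # Only elements have attributes. DOM reports xmlns declarations as
    # attributes, whereas XPath represents them as namespace nodes.
    if n.nodeType != n.ELEMENT_NODE:
        return []
    return [n.attributes.item(i) for i in range(n.attributes.length)]


doc = minidom.parseString("<a><b><c/><d/></b><e><f/></e></a>")
d, f = doc.getElementsByTagName("d")[0], doc.getElementsByTagName("f")[0]
names = lambda nodes: [x.nodeName for x in nodes]
print(names(ancestor(d)))   # ['b', 'a', '#document']
print(names(following(d)))  # ['e', 'f']
print(names(preceding(f)))  # ['d', 'c', 'b']
```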
Once the axis is chosen, the node test has to be defined. Every axis has a principal node type: for the attribute axis it is attribute, for the namespace axis it is namespace, and for every other axis it is element. A node test can select nodes by type, using one of the following:
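| text() | selects the text nodes on the axis |
| comment() | selects the comment nodes on the axis |
| processing-instruction() | selects the processing instruction nodes on the axis; with a literal argument, as in processing-instruction('name'), only those with that target |
| node() | selects all nodes on the axis, of any type |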
In addition, a node test can be a name or the wildcard "*". A name test is true only if the node is of the principal node type and has the given name; "*" is true for any node of the principal node type. On the child axis, for example, "*" selects all child elements, while on the attribute axis it selects all attributes.
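The rules for node tests and principal node types can be sketched in Python on top of xml.dom.minidom. The helpers node_test, principal_node_type and the KIND mapping are hypothetical, written only for this illustration; names are compared literally without resolving namespace prefixes, and the namespace axis is only named, not evaluated, since DOM has no namespace nodes.

```python
from xml.dom import minidom

Node = minidom.Node

# XPath node type for each DOM node type used here (CDATA sections count
# as text in the XPath data model).
KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.CDATA_SECTION_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}


def principal_node_type(axis):
    """attribute and namespace axes have their own principal type; all others: element."""
    return {"attribute": "attribute", "namespace": "namespace"}.get(axis, "element")


def node_test(node, test, axis):
    """Return True if node passes the node test on the given axis."""
    kind = KIND.get(node.nodeType)
    if test == "node()":
        return True
    if test == "text()":
        return kind == "text"
    if test == "comment()":
        return kind == "comment"
    if test.startswith("processing-instruction(") and test.endswith(")"):
        target = test[len("processing-instruction("):-1].strip().strip("'\"")
        return kind == "processing-instruction" and (not target or node.target == target)
    # Name test or "*": the node must be of the axis's principal node type.
    if kind != principal_node_type(axis):
        return False
    return test == "*" or node.nodeName == test


doc = minidom.parseString('<p id="x">hi<!--note--><?fmt bold?><b/></p>')
p = doc.documentElement
kids = list(p.childNodes)  # text, comment, processing instruction, element
print([node_test(k, "*", "child") for k in kids])             # [False, False, False, True]
print([node_test(k, "text()", "child") for k in kids])        # [True, False, False, False]
print(node_test(p.getAttributeNode("id"), "*", "attribute"))  # True
```

The last line shows why the principal node type matters: the same "*" that matches only the element child on the child axis matches the attribute on the attribute axis.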
A detailed explanation of all possible predicates is beyond the scope of this overview; for details, see the definition of predicates in the XPath recommendation. The following examples show some location paths with and without predicates.
| child::*/child::paragraph | selects all paragraph child elements of all child elements of the context node (that is, its paragraph grandchildren) |
| following-sibling::chapter[position() = 1] | selects the next chapter element among the following siblings of the context node |
| /descendant::chapter[attribute::no = "12"] | selects all chapter elements in the document that have an attribute no with the value 12 |
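| /descendant::fit[sub and position() < 3] | selects those of the first two fit elements in the document that have at least one sub child element; to select the first two fit elements that have a sub child, the predicates must be separated: /descendant::fit[sub][position() < 3] |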
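| ../@attrib | selects the attribute attrib of the parent element (an element cannot have two attributes with the same name, so a positional predicate such as ../@attrib[2] would select nothing) |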
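The difference between /descendant::fit[sub and position() < 3] and /descendant::fit[sub][position() < 3] can be checked by hand. Within a single predicate, position() counts among all fit elements, whereas a second predicate sees only the nodes left by the first one, renumbered from 1. The sketch below does not use an XPath engine: it mimics the two expressions with ordinary Python list filtering over an xml.etree.ElementTree tree, and the sample document is hypothetical. Python's ElementTree supports only a small subset of XPath (no explicit axes, no and), so at the end only the abbreviated forms of two other table rows are evaluated with its findall.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: four fit elements (three with a sub child) and
# two chapter elements containing paragraphs.
DOC = """<doc>
  <fit id="1"/>
  <fit id="2"><sub/></fit>
  <fit id="3"><sub/></fit>
  <fit id="4"><sub/></fit>
  <chapter no="11"><paragraph/></chapter>
  <chapter no="12"><paragraph/><paragraph/></chapter>
</doc>"""

root = ET.fromstring(DOC)
ids = lambda nodes: [n.get("id") for n in nodes]

# /descendant::fit : all fit elements in document order; position() starts at 1.
fits = list(root.iter("fit"))

# [sub and position() < 3] : one predicate, so both conditions are checked
# against the positions in the full list of fit elements.
one_predicate = [f for pos, f in enumerate(fits, start=1)
                 if f.find("sub") is not None and pos < 3]

# [sub][position() < 3] : the second predicate sees the node-set already
# filtered by the first one, renumbered from 1.
with_sub = [f for f in fits if f.find("sub") is not None]
two_predicates = [f for pos, f in enumerate(with_sub, start=1) if pos < 3]

print(ids(one_predicate))   # ['2']
print(ids(two_predicates))  # ['2', '3']

# ElementTree's limited path syntax covers the abbreviated forms of
# child::*/child::paragraph and /descendant::chapter[attribute::no = "12"].
print(len(root.findall("*/paragraph")))                              # 3
print([c.get("no") for c in root.findall(".//chapter[@no='12']")])  # ['12']
```

The element with id 1 is among the first two fit elements but has no sub child, so the single combined predicate keeps only one element, while the two separate predicates return two.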
For more detailed information about XPath, please follow the links below.
| the XPath definition |