GXL Ric Holt , Andy Schürr, Susan Elliott Sim, Andreas Winter
Graph eXchange Language


    XML Schema

    Graph Model

    Tool Catalogue

    Change Requests
    GXL 1.1

[2. Data Interoperability of Reengineering Tools] << [Table of Contents] >> [4. Exchanging Graph Schemas]

3 Exchanging Graphs with GXL

3.1 Exchanging typed, attributed, directed, ordered graphs | 3.2 Exchanging Hypergraphs | 3.3 Exchanging Hierarcical Graphs | 3.4 GXL DTD)

In the previous section, we argued that a graph-based standard exchange format is appropriate for reengineering. In this section, we discuss the specific graph features included in GXL and how these can be used to represent software.

GXL supports graphs which can have directed or undirected edges, typed nodes and edges, attributes attached to nodes and edges, and ordered edges [56]. Section [3.1] illustrates the use of these GXL features.

To this set of features, GXL adds n-ary edges (hyperedges) as well as hierarchic graphs (subgraphs within graphs). Sections [3.2] and [3.3] illustrate the use of these features. Finally, the GXL language definition is given in [3.4] using an XML document type definition (DTD).

3.1 Exchanging typed, attributed, directed, ordered graphs    [top]

Figure 2 shows a fragment of source code along with its corresponding UML object diagram [57]. The diagram is at the level of an abstract syntax graph. In the program, function main calls function max in line 8 and function min in line 12.

typed, attributed, directed, ordered graph
Figure 2: typed, attributed, directed, ordered graph

In the diagram, functions main, max and min are represented by nodes of type Function, while variables a and b are represented by nodes of type Variable. These nodes are attributed with the names of the functions and variables.

The calls to functions max and min are represented by FunctionCall nodes. These nodes are associated with the caller by isCaller edges and with the callee by isCallee edges. The isCaller edges are attributed with a line attribute giving the line number that contains the call. Parameters (represented by Variable nodes) are associated with function calls by isInput edges. The ordering of parameter lists is given by the ordering incidences of isInput edges pointing to FunctionCall nodes. The first edge of type isInput incident to function call v2, for the call max(a,b), comes from node v6 representing variable a. The second edge of type isInput comes from the second parameter b (node v7). The ordering of the parameters of the other call (v3) are represented analogously.

GXL provides constructs for exchanging graphs such as the one in Figure 2. These constructs represent nodes, edges and edge ordering, as well as type information and attribute values.

Example [3] depicts the graph from Figure 2 as an XML document following the GXL structure. The second and third lines of Example [3] give the DTD version for GXL as gxl-1.0.dtd. The body of the GXL document is enclosed in gxl tags. The fifth line gives the name of the graph as simpleGraph and specifies that edges are to have identifiers, such as e5. Next, the graph refers to its associated graph schema named Schema (cf. Section [4]) stored in file schema.gxl.

Nodes and edges are represented by node and edge elements. These can be located by their id attribute. Incidence information of edges including edge orientation is stored in from and to attributes within edge tags. Ordering of incidences is also represented here. Attributes fromorder and toorder represent the order of an edge in the incidence list of its start and target node. Node and edge types are represented by links pointing to the appropriate schema information. These links are enclosed in type elements.

The node and edge elements may contain further attribute information. The attr elements describe attribute names and values. Like OCL [58], GXL provides bool, int, float, and string attributes. Furthermore, enumeration values (enum) and URI-references (locator) to externally stored objects are supported. GXL offers composite attributes including sequences (seq), sets (set), multi sets (bag), and tuples (tup).

3.2 Exchanging Hypergraphs    [top]

GXL supports hypergraphs [59] (graphs with n-ary edges) as well as graphs with (binary) edges. These n-ary edges can be typed, attributed, directed or undirected and ordered.

Figure 4 shows a hypergraph in UML notation, modeling the function call a = max(a,b) by a 5-ary hyperedge of type FunctionCall2. The diamond, representing the hyperedge, is connected by lines (tentacles) to its related Function and Variable nodes. These tentacles are marked with roles, identifying caller, callee, input, and output. Numbers on the tentacles give the ordering of parameters. The hyperedge is attributed with a line attribute giving its line number as 8.

Figure 4: hypergraph

The GXL representation of this hyperedge is given in Example [5]. Hyperedges are represented by rel (relation) elements. Like node and edge elements, rel elements contain type (type) and attribute (attr) information. Tentacles, which point to the related graph objects (target), are represented by relend (relation end) subelements. Roles of tentacles are stored in role attributes. The ordering of tentacles at the hyperedge is given by startorder attributes. The ordering of tentacles at target objects is given by endorder attributes. Directed or undirected hyperedges and tentacles are distinguished by attributes isdirected and direction. Eges, which are inherently binary, can be represented as 2-ary hyperedges. This means GXL does not need to support edges explicitly. However, since binary edges are so common, GXL provides a special notation edge for them.

3.3 Exchanging Hierarchical Graphs    [top]

Although graphs are intuitive and convenient, when large, they become complex to manage and to visualize. This complexity can be reduced by introducing subgraphs, in which parts of graphs representing related objects are grouped into subgraphs. The resulting hierarchical graphs [60] support structuring of graphs by grouping and encapsulation.

hierarchical graph
Figure 6: hierarchical graph

Figure 6 gives an example of a hierarchical graph. Node v4, which represents the max function from Figure 2, contains a subgraph representing max's function body. The GXL representation in Example [7] shows this subgraph as a graph element inside node v4. Subgraphs inside edges or hyperedges are written analogously (cf. the GXL DTD in Section [3.4]).

The GXL form of hierarchical graphs is convenient when there is a strong sense of ownership that can be modeled by nesting of graphs. This form permits edges and hyperedges crossing the boundaries of graph hierarchies.

GXL provides one explicit form for graph hierarchies. There are alternate approaches to modeling them. For example, references to subgraphs and their elements may be represented using locator attributes pointing to their appropriate GXL representations. This approach does not support connectivity between sub- and supergraphs. Since locator attributes usually refer to external documents, the subgraph is only visible from the supergraph, and not vice versa.

This section introduces the structure of the GXL notation as a sublanguage of XML. It does this by giving a UML class diagram that defines the kind of graphs provided by GXL. This serves as origin for specifying GXL's DTD.

3.4 GXL DTD    [top]

This section introduces the structure of the GXL notation as a sublanguage of XML. It does this by giving a UML class diagram that defines the kind of graphs provided by GXL. This serves as origin for specifying GXL's DTD.

GXL Graph Model (attributes omitted)
Figure 8: GXL Graph Model (attributes omitted)

The class diagram for the graph features of GXL (omitting details about attributes) is given in Figure 8. As can be seen a Graph contains GraphElements, which are Nodes, Relations, and Edges. To support hierarchical graphs, each GraphElement may contain other Graphs. Edges record binary connections and Relations record n-ary connections between GraphElements. (Note that GXL allows edges and hyperedges to make connections between other edges and hyperedges as well as between nodes.) Ordering of incidences is stored in order attributes of relatesTo associations. Graphs and graph elements can be typed and attributed. Graph types are defined by graph schemas represented as GXL-documents (cf Section [4]). This set of entities with their interrelationships means that GXL defines typed, attributed, directed, ordered, hierarchical graphs and hypergraphs.

The user writes GXL graphs as XML documents. Therefore, it is important to specify the syntax of GXL, as a XML document type definition (DTD) (or as an XML schema definition). To keep this specification simple and understandable, its syntax was created manually, basically by translating Figure 8 into DTD notation. A commented version of this DTD is given at http://www.gupro.de/GXL/dtd/dtd.html.

The handcrafted GXL DTD has only 18 XML elements. In contrast, a DTD for GXL generated with IBM's XMI (XML Metadata Interchange) Toolkit [61] requires 66 elements for the GXL core and an additional 63 elements for XMI and Corba compatibility.

The GXL DTD begins by specifying predefined extension points (cf [39]) for customizing GXL. These lines can be used to add subelements or attributes to their corresponding graph elements. The rest of the DTD gives the syntax for graph components (graph, node, edge, rel, relend), attributes (attr), and references (type) to schema information.

XML and DTD notation impose syntactic constraints on GXL documents. However, there are additional semantic GXL constraints that XML and DTD are not able to impose. Some semantic constraints, like "edges and rel elements only connect elements of the given graph" are supported by XML. To do this, the referencing mechanism for identifiers (ID, IDREF) is used. The more restrictive constraint that these references are only allowed to refer to graph elements (and not attributes), is not expressible by XML and DTD notation. Additional constraints like these must be defined outside the DTD. The additional constraints that GXL imposes are as follows:

  • Edges and hyperedges only connect graph elements. Each IDREF pointing to incident graph elements refer to node, edge and rel elements only.
  • Edges and hyperedges only connect graph elements within the same graph. Each IDREF pointing to incident graph elements has to refer a graph element, which is defined within the same graph element representing the convenient subgraph.
  • Attribute identifiers have to be unique for each graph element. Each node, edge and rel element does not contain multiple attr elements with the same name.
  • Ordered incidences have to be linear. All fromorder/toorder attributes of edge elements and all startorder/endorder attributes of relend elements, respectively have to define a proper ordering according their incident graph elements.
A summary of these constraints (including a GXL validator suite) is given in [62].

[2. Data Interoperability of Reengineering Tools] << [Table of Contents] >> [4. Exchanging Graph Schemas]
July 17, 2002

[change log]
[printable version of this page]