In this paper, we gave an introduction to GXL 1.0 and its current applications. We conclude with a summary of the key features of GXL and an assessment of its merits as a standard exchange format.
GXL is an XML sub-language for representing graphs. The main features of the model in GXL are as follows.
Graphs, nodes, edges, and hyperedges are first class entities in GXL. Consequently, each of these has its own identity, can be typed and attributed, and can be included in a generalization hierarchy.
Edges can be directed or undirected. This flexibility is necessary in a general format for graphs.
Both directed and undirected edges are permitted in the same graph.
First class entities have attributes. This feature is used to add information to graphs, nodes, edges, and hyperedges. For example, user annotations or (x, y) locations for graph layout are attached to the graph and passed in GXL as attributes.
First class entities are typed. Graphs, nodes, edges, and hyperedges are typed by associating them with a corresponding class in the schema. These relationships provide further information and constraints on the data.
Hierarchical graphs are supported. This feature is implemented by permitting first class
entities to further contain graphs. Edges and hyperedges are allowed to join nodes from different levels of the hierarchy.
Edges and hyperedges are ordered. The order in which edges and hyperedges are incident to and from the nodes at their endpoints can be stored.
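The features listed above can be illustrated with a minimal sketch that builds a small GXL document using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`gxl`, `graph`, `node`, `edge`, `attr`, `type`, `edgemode`, `isdirected`) follow our reading of the GXL 1.0 DTD; the schema file name `schema.gxl`, the class names, and the helper functions are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)

SCHEMA = "schema.gxl"  # hypothetical schema document


def typed(parent, tag, type_name, **attrs):
    """Create a first class entity and attach its type as the first child."""
    elem = ET.SubElement(parent, tag, attrs)
    ET.SubElement(elem, "type", {f"{{{XLINK}}}href": f"{SCHEMA}#{type_name}"})
    return elem


def add_int_attr(elem, name, value):
    """Attach an integer-valued attribute to any first class entity."""
    attr = ET.SubElement(elem, "attr", {"name": name})
    ET.SubElement(attr, "int").text = str(value)


def build_example():
    gxl = ET.Element("gxl")
    # Edges are directed by default; single edges may override this.
    graph = typed(gxl, "graph", "CallGraph", id="g1", edgemode="defaultdirected")

    main = typed(graph, "node", "Function", id="n1")
    add_int_attr(main, "line", 12)
    helper = typed(graph, "node", "Function", id="n2")

    # A typed, attributed, directed edge.
    call = typed(graph, "edge", "Call", id="e1", **{"from": "n1", "to": "n2"})
    add_int_attr(call, "line", 27)

    # An undirected edge in the same graph.
    typed(graph, "edge", "Related", id="e2", isdirected="false",
          **{"from": "n1", "to": "n2"})

    # Hierarchy: a node that contains a further graph.
    body = typed(helper, "graph", "ControlFlow", id="g2")
    typed(body, "node", "Statement", id="n3")
    return gxl


if __name__ == "__main__":
    print(ET.tostring(build_example(), encoding="unicode"))
```

Running the script prints the XML stream; the same structure could equally be written by hand or produced by any other XML library.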
In GXL, both the actual data representing the graph and the schema are passed using the same graph model as an XML stream. A schema is created by first modelling it as a UML class diagram and
then converting the diagram into a GXL graph according to the GXL meta-model. This uniform application of syntax across the different levels of abstraction ensures that tools that implement
the GXL DTD are capable of working with a variety of data.
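To make the uniformity concrete, the following sketch processes a schema graph and an instance graph with exactly the same functions. It is a simplified illustration, not the full GXL meta-model: the file names `metaschema.gxl` and `callSchema.gxl`, the class names `CallGraph`, `Function`, and `Call`, and the helper functions are assumptions introduced here, and the schema omits most of what a complete GXL schema would contain.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# A (heavily simplified) schema, itself written as a GXL graph.
SCHEMA_DOC = """\
<gxl xmlns:xlink="http://www.w3.org/1999/xlink">
  <graph id="callSchema">
    <node id="CallGraph"><type xlink:href="metaschema.gxl#GraphClass"/></node>
    <node id="Function"><type xlink:href="metaschema.gxl#NodeClass"/></node>
    <node id="Call"><type xlink:href="metaschema.gxl#EdgeClass"/></node>
    <edge from="Call" to="Function"><type xlink:href="metaschema.gxl#from"/></edge>
    <edge from="Call" to="Function"><type xlink:href="metaschema.gxl#to"/></edge>
  </graph>
</gxl>"""

# Instance data whose types point into the schema graph above.
INSTANCE_DOC = """\
<gxl xmlns:xlink="http://www.w3.org/1999/xlink">
  <graph id="g1">
    <type xlink:href="callSchema.gxl#CallGraph"/>
    <node id="n1"><type xlink:href="callSchema.gxl#Function"/></node>
    <node id="n2"><type xlink:href="callSchema.gxl#Function"/></node>
    <edge from="n1" to="n2"><type xlink:href="callSchema.gxl#Call"/></edge>
  </graph>
</gxl>"""


def element_ids(doc_text):
    """Return the ids of all identified elements in a GXL document."""
    root = ET.fromstring(doc_text)
    return {e.get("id") for e in root.iter() if e.get("id")}


def type_names(doc_text):
    """Return the fragment part of every type reference in a GXL document."""
    root = ET.fromstring(doc_text)
    return {t.get(XLINK_HREF).split("#", 1)[1] for t in root.iter("type")}


def undefined_types(instance_text, schema_text):
    """Types used by the instance that the schema graph does not define."""
    return type_names(instance_text) - element_ids(schema_text)
```

Because the schema is an ordinary GXL graph, `type_names` and `element_ids` apply to it unchanged; for instance, `type_names(SCHEMA_DOC)` yields the meta-level classes the schema itself refers to.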
We now revisit the criteria for success from Section [2.3] (reproduced here for convenience) and use them to assess GXL.
1. works for several purposes (e.g. various levels of abstraction, several languages),
2. works for multi-million lines of code (e.g. 3 to 10 MLOC),
3. maps between entities/relationships and source code statements (e.g. line number or AST node),
4. is incremental (e.g. one subsystem at a time can be added),
5. is universal (e.g. used by others),
We feel that GXL does a good job of satisfying these criteria. By means of examples, we have explained how GXL meets Criteria 1 and 2. The adaptability and scalability of GXL come from exchanging both instance data and schema data. Mappings between graph representation and source code (Criterion 3) are specified by appropriate attributes (cf. Figures 2 and 3). Each subsystem can have its own graph, represented in its own GXL stream; these can be merged by a graph manipulator such as Grok to create a combined graph, also represented in GXL. Furthermore, GXL provides hierarchical graphs for substructuring complex systems (Criterion 4). The list of GXL users in Section  shows that GXL is already used by various groups from different areas to make their tools interoperable (Criterion 5). Since GXL models such a rich class of graphs and their schemas, it can handle a variety of applications, both inside and outside the reengineering community (Criterion 6).
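The incremental use described for Criterion 4 can be sketched as a simple union of per-subsystem graphs. This is not how Grok works and does not use Grok's commands; it is a minimal illustration in Python, under the assumption that elements with the same `id` in different streams denote the same entity. The function name `merge_graphs` and the default graph id `combined` are hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_graphs(gxl_texts, merged_id="combined"):
    """Union the top-level graphs of several GXL streams into one graph.

    Assumption: an id that occurs in more than one stream refers to the
    same entity, so only its first occurrence is kept. Elements without
    an id are always kept.
    """
    out_root = ET.Element("gxl")
    out_graph = ET.SubElement(out_root, "graph", {"id": merged_id})
    seen = set()
    for text in gxl_texts:
        for graph in ET.fromstring(text).findall("graph"):
            for elem in graph:
                if elem.tag not in ("node", "edge", "rel"):
                    continue  # skip the graph's own type and attributes
                key = elem.get("id")
                if key is not None:
                    if key in seen:
                        continue
                    seen.add(key)
                out_graph.append(elem)
    return out_root


SUBSYSTEM_A = """\
<gxl><graph id="a">
  <node id="n1"/><node id="n2"/>
  <edge id="e1" from="n1" to="n2"/>
</graph></gxl>"""

SUBSYSTEM_B = """\
<gxl><graph id="b">
  <node id="n2"/><node id="n3"/>
  <edge id="e2" from="n2" to="n3"/>
</graph></gxl>"""

if __name__ == "__main__":
    merged = merge_graphs([SUBSYSTEM_A, SUBSYSTEM_B])
    print(ET.tostring(merged, encoding="unicode"))
```

A real merge would also have to reconcile attributes and types of duplicated elements; the sketch keeps the first occurrence only, which is the simplest possible policy.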
Developing and deploying GXL has been an exciting and challenging experience. Through many intense discussions, we were able to build bridges between research groups and even between research areas and nationalities. Arriving at a standard required us to understand the differences in data formats, research approaches, and problem domains. The result has been fruitful collaborations
between researchers and improved data interoperability between tools.
GXL is currently being applied and evaluated by the research community. There is work still to be done in developing standard schemas and broadening the acceptance of GXL. We look forward to maturing GXL along with the research discipline and tools for reengineering.