GXL: Graph eXchange Language
Ric Holt, Andy Schürr, Susan Elliott Sim, Andreas Winter


[1. Introduction] << [Table of Contents] >> [3. Exchanging Graphs with GXL]

2 Data Interoperability of Reengineering Tools

(2.1 Interoperability of graph-based tools | 2.2 Collaborating Toolsets | 2.3 Requirements for a standard exchange format | 2.4 Genealogy of GXL)

GXL was created to fulfill the need to exchange data between reengineering tools. Previously, interoperability between tools relied on converters between local formats, an approach that requires case-by-case negotiation of exchange semantics. As the research area matured, it became apparent that a standard exchange format was needed and that this format should provide a mechanism to help articulate semantics.

These experiences with interoperability and local file formats form the context for the development of GXL. Moreover, they circumscribe the requirements and criteria for success for a standard exchange format. In this section, we will describe this background and how it informed the emergence of GXL.

2.1 Interoperability of graph-based tools

A variety of reengineering tools employ graphs as an internal data representation. With improved data interoperability, reengineering workbenches can be composed by choosing the best component for a particular task. A typical reengineering workbench consists of three types of tools: Extractors, Abstractors, and Visualizers [4].

Extractors

These tools extract information from software artifacts, such as source code. Examples of such extractors for the C/C++ source languages are ACACIA [5], CPPX [6], and Columbus/CAN [7]. ASIS (Ada Semantic Interface Specification) [8] offers similar functionality for Ada.

Abstractors

These components of reengineering workbenches analyze the extracted data, generating further information, and sometimes changing the form of the data. Tools of particular interest here treat the data as graphs and these include the PROGRES graph transformation system [9] and GUPRO [10]. General graph-based query mechanisms such as Grok and GReQL are used for analyzing graph-structured data [11]. RPA uses a relational approach to analyzing software systems [12]. Further specialized abstractors have been developed for architectural analysis and recovery [13] [14], for control flow, data flow, and dependency analysis [15] [16], and for software metrics [17].

Visualizers

These tools display the information derived in the previous steps. This information can be textual or graphical. Source code browsers are typical textual visualizers [18] [19]. Graphical visualizers have been used to display class diagrams [20], sequence diagrams [21], and statecharts [22]. General graph drawing tools, for instance daVinci [23], Graphlet [24], and GraphViz [25], have been used to visualize small- and medium-sized graphs. Large complex graphs are better handled by visualization tools designed for reengineering, such as Rigi [26] and Shrimp [27].

2.2 Collaborating Toolsets

The approaches and tools described above provide good support for various aspects of reengineering. Individual tools from different workbenches have been combined to tackle a range of reengineering challenges. Here are some illustrative examples of such data exchanges [28].

  • Acacia, a C++ parsing and graph drawing tool kit by AT&T Labs. A command line interface allows extraction of facts from Acacia's database of facts about a parsed C++ program; the analysis is at the external declaration level. These facts were converted into a corresponding TA stream and used to analyze the Mozilla source code with the PBS tools.

  • Dali, a reverse engineering tool kit by the Software Engineering Institute. This tool kit combines features from a number of tools. To analyze the Linux kernel, SNiFF+'s API was used to extract facts about Linux. These facts were stored in TA, analyzed using a relational database, and the results were viewed using Rigi.

  • CPPAnal by Harry Sneed and GUPRO from the University of Koblenz. The CPPAnal tools extract architectural-level information from the source code of large software systems and store it in SQL tables. Using GraX, these tables are transferred into TGraphs for further analysis with GUPRO tools. The G2HTML filter translates TGraphs into HTML documents, which are used in GUPRO to browse large graphs.

All of these collaborations were made possible through converters that take files in one local format and transform the data into another local format. While this approach has been used successfully, it does not scale well: a converter would need to be written for each pair of local data formats, and this effort quickly becomes unmanageable. A standard exchange format serves as an intermediary between these file formats; tool developers only need to convert between their local format and the exchange format. Individual tools may still use their own internal representations, including internal databases and in-memory data structures, for analyzing and transforming data.
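To make the scaling argument concrete (a back-of-the-envelope count, not a figure from the original comparison): with n distinct local formats, direct pairwise exchange needs one converter per ordered pair of formats, whereas a shared exchange format needs only an importer and an exporter per tool:

\[
    \underbrace{n(n-1)}_{\text{pairwise converters}}
    \qquad \text{versus} \qquad
    \underbrace{2n}_{\text{converters via a standard format}}
\]

For ten tools this is 90 converters versus 20, and the gap widens with every additional tool.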

2.3 Requirements for a standard exchange format

Examination of the collaborations in the previous subsection, and of those we have reported elsewhere [28], provides insights into the requirements for a standard exchange format. Many of the file formats used in reengineering are graphs or are compatible with graphs. Consequently, we also examined formats from graph drawing and graph transformation, and combined the best features of these formats. We decided to create a standard exchange format that was graph-based, because graphs have a well-understood mathematical foundation. By building on this structural elegance, our format can be used to represent data from a range of domains with disparate structures.

Other authors who have examined data interoperability have identified the following criteria for the success of a standard exchange format [29] [30] [31]:

  1. works for several purposes (e.g. various levels of abstraction, several languages),
  2. works for multi-million lines of code (e.g. 3 to 10 MLOC),
  3. maps between entities/relationships and source code statements (e.g. line number or AST node),
  4. is incremental (e.g. one subsystem at a time can be added),
  5. is universal (e.g. used by others),
  6. is extensible.

Criterion 1 reflects the amount of variability in the data exchanged. There are differences in the data model used to represent software, the level of abstraction, and the source language processed by extractors. A standard exchange format needs to be flexible enough to act as an intermediary in these and other situations. Criterion 2 is essentially scalability: a software system can be quite large, so fine-grained representations, such as abstract syntax trees, can have millions of nodes and edges. Criteria 3 and 4 give specific requirements derived from the software analysis problem domain. The final two criteria relate to adoption. The success of a standard exchange format is measured by the number of users rather than by technical elegance; the more people who use the format, the more successful it is and the greater the benefit to each user. Finally, a format that is extensible is more flexible and more appealing to users from other problem domains who may have additional requirements.

We decided to create a new format rather than adopt one of the existing ones because we needed a format that i) was simultaneously compatible with as many of them as possible, ii) had all, and only, the necessary graph features, iii) was flexible enough to handle disparate data and different levels of abstraction, and iv) was simple. We further decided to build on XML because it allowed us to define our own structure while providing infrastructure for constructing import and export tools. One repercussion of this decision is the size of the files being exchanged: they are larger because of the XML syntax and the length of tag and attribute names. However, this is a problem faced by all XML users, and standard compression techniques are effective because of the amount of repetition in the files.
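As a rough illustration of both points, the sketch below (hypothetical node and edge content; the normative element definitions are given by the GXL DTD) emits a small graph as GXL-style XML using only Python's standard library and then compresses it, showing how much of the XML overhead standard compression recovers:

    # Minimal sketch: serialize a synthetic graph as GXL-style XML and
    # measure how well the repetitive markup compresses.
    import gzip
    import xml.etree.ElementTree as ET

    gxl = ET.Element("gxl")
    graph = ET.SubElement(gxl, "graph", id="callGraph")

    # A few hundred synthetic nodes and a chain of edges stand in for
    # extracted facts; a real extractor would also attach typed attributes.
    for i in range(300):
        ET.SubElement(graph, "node", id=f"n{i}")
    for i in range(299):
        ET.SubElement(graph, "edge", {"from": f"n{i}", "to": f"n{i + 1}"})

    xml_bytes = ET.tostring(gxl, encoding="utf-8", xml_declaration=True)

    # Element and attribute names repeat for every node and edge, so
    # standard compression removes most of the XML overhead.
    print(len(xml_bytes), "bytes raw,", len(gzip.compress(xml_bytes)), "bytes gzipped")

On a graph of this size the gzipped stream is typically a small fraction of the raw XML, which is why the verbosity of the markup is rarely a practical obstacle in exchange.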

2.4 Genealogy of GXL

Development of GXL began with a merger of the GRAph eXchange format (GraX) [32], the Tuple Attribute Language (TA) [33], and the file format of the PROGRES graph rewriting system [9]. Features were added to this initial graph model to handle hierarchical graphs and hypergraphs. Other ideas were taken from data formats used in reengineering (ATerms [34], RPA [12], RSF [35]), graph drawing (daVinci [23], GML/Graphlet [36], GRL [37], XGMML [38], GraphXML [39]), and graph transformation [40]. Consequently, GXL can be viewed as a generalization of these formats. The genealogy of GXL is depicted in Figure 1.

Figure 1: Genealogy of GXL

The development of GXL advanced through a series of conferences and workshops beginning in 1998. Initial discussions on defining a general exchange format for reengineering were held at WCRE 1998 [41] and at CASCON 1998 [42]. Approaches for graph-based exchange formats were discussed during meetings at WCRE 1999 [43] and GROOM 2000 [44]. These discussions resulted in an initial prototype of GXL, which was presented at the ICSE 2000 Workshop on Standard Exchange Formats (WoSEF) [45]. It was subsequently discussed, compared, and critiqued at meetings on exchange formats at APPLIGRAPH [40] and Graph Drawing [46]. Refinements of the prototype were presented at conferences and workshops throughout 2000, including CASCON 2000 [47] [48] and WCRE 2000 [49]. GXL was ratified as a standard exchange format in software reengineering at the Dagstuhl Seminar "Interoperability of Reengineering Tools" in January 2001 [50].

GXL was subsequently presented at meetings in other research areas. The graph transformation community is using GXL as a starting point for the Graph Transformation Exchange Language (GTXL) [51] [52]. In this context, GXL is used to represent graphs, and work is under way to add features for representing transformation rules. This decision was made after the APPLIGRAPH meetings on exchange formats [40] and the GraBaTs Workshop on Graph-Based Tools [53]. Discussions have also been held with the graph drawing community about making GXL a standard exchange format for graph layouts: presentations were made at GD 2000 [46], and a panel was held at GD 2001 [54]. It is possible to combine the graph model of GXL with the graph layout and modularization features of GraphML to form a general and comprehensive graph exchange format [55].

The current version of GXL, news about ongoing development efforts, and up-to-date information including tutorials and documentation are available at http://www.gupro.de/GXL.
