What is XML?
XML (eXtensible Markup Language) is a subset of SGML (Standard Generalized Markup Language), which has existed since the 1980s and turned out to be too complicated and bulky. Like SGML, XML is a document description language, but it is simply designed and easy to understand and use.
Because of its relationship to SGML, XML is also a meta language: it can be used to define custom, freely configurable document markup and to structure data with it, in contrast to HTML (HyperText Markup Language), which uses a fixed set of markup. This flexibility is useful when a document has to be structured in a way that the standard does not support; HTML offers only a few ways to do this (one of them, using frames, is the example shown in the next picture). XML is the opposite: in the frameset example, it is possible to define one's own structure, with frames as subelements, in a single file (this example is also shown in the next picture).

The small picture below shows XML as a subset of SGML that is, like SGML, also a meta language. To extend this visualization a little, HTML and XHTML are included as well. As shown, HTML is an application of SGML: a static markup language designed for the WWW. As the web developed over the last few years, some disadvantages of HTML became apparent (such as poor flexibility, limited functionality, and poor data handling), so XHTML was designed as an improved successor to the old internet language. On the one hand, it is based on XML (so XML's structural mechanisms, such as DTDs or schemata, can be used); on the other hand, it is HTML's sibling, which means that it supports the standard functionality of HTML.

The most important feature of XML is, of course, its good web functionality, but it is also used for publications, commercial applications and data exchange; GXL is one instance of the latter. In the design phase, the developers of XML defined a few goals that they would try to realize in the construction of XML:
1) the use of XML on the internet should be simple
2) XML should support a wide range of applications
3) XML should be compatible with SGML
4) it should be easy to write programs which process XML documents
5) the number of optional features should be kept to a minimum (ideally zero)
6) XML documents should be structured clearly and simply enough to be legible for the user
7) the XML design should be available soon
8) the design of XML should be formal and concise
9) the production of XML documents should be simple
10) a clear structure should be more important than a small document size (terseness is of minimal importance)
These goals give XML a number of advantages over other languages, but XML and SGML differ from other document formats in one more significant way: they distinguish between the structure, the data and the appearance of a document, which can also be stored in different files. Because of this separation it is possible, for example, to change the style without editing the data, whereas in HTML the data has to be edited as well.

The following XML code shows the so-called prolog, the head of an XML document:
| <?xml | the XML declaration |
| version = "vnr" | the XML version |
| encoding = "code" | the character set |
| standalone = "yes/no" ?> | |
| <?xml-stylesheet | the XML stylesheet reference |
| href = "data" | |
| type = "typ" | the type of the stylesheet (e.g. XSL) |
| ... ?> | |
| <!DOCTYPE root_element | |
| SYSTEM "external_DTD" | |
| [ internal_DTD ] > | the internal parts of the DTD - a partial or complete redefinition of the external DTD is possible |
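As a rough illustration of the prolog parts listed above, the following Python sketch reads them back from a small document with the standard library module `xml.dom.minidom`. The sample document is an assumption made for this sketch: the root element `person`, the file names `bsp.dtd` and `style.xsl`, and the single internal declaration are illustrative only, and the external DTD is never loaded.

```python
from xml.dom import minidom

# Hypothetical sample document: XML declaration, stylesheet reference,
# DOCTYPE with an external DTD reference and an internal subset.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet href="style.xsl" type="text/xsl"?>
<!DOCTYPE person SYSTEM "bsp.dtd" [
  <!ATTLIST person number ID #IMPLIED>
]>
<person number="ID0">The first person</person>
"""


def describe_prolog(data: bytes) -> dict:
    """Return the prolog parts that minidom exposes after parsing."""
    doc = minidom.parseString(data)
    doctype = doc.doctype
    # Stylesheet references are processing instructions on document level.
    stylesheets = [
        node.data
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]
    return {
        "version": doc.version,
        "encoding": doc.encoding,
        "stylesheets": stylesheets,
        "root_element": doctype.name,
        "external_dtd": doctype.systemId,
        "internal_dtd": doctype.internalSubset,
    }


if __name__ == "__main__":
    for key, value in describe_prolog(DOCUMENT).items():
        print(f"{key}: {value!r}")
```

Note that `minidom` is a non-validating parser: it records the DOCTYPE information but does not check the document against the DTD.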
As can be seen, the XML document references its stylesheet and its DTD; the stylesheet reference is optional. For further information about the stylesheet language XSL (Extensible Stylesheet Language), please refer to the W3C Recommendation. Only the structure of a DTD is explained here, because both GXL and XMI are defined by such a DTD, so all GXL and XMI documents can be validated against these definitions.
The two possible forms of DTDs have already been mentioned: the internal DTD subset and the external DTD subset. Both are structured in the same way and differ only in where they are stored. The internal version is declared in the head of the XML document, and the external one is declared in a separate file. Combinations are also allowed, in which case the internal part has the higher priority. The common structure of both versions is explained in the following.
The data model of a DTD is a tree with exactly one root element. This root element and all contained elements are defined as follows:
| <!ELEMENT name_of_the_element a_possible_contents_specification> |
As shown, the element declaration contains the element's name and a possible contents specification. Such specifications are:
| EMPTY | the empty content (e.g. useful for elements whose only function is to store attributes); written without parentheses |
| ANY | arbitrary content (e.g. useful during the test phase); written without parentheses |
| (#PCDATA) | Parsed Character Data - this content is checked for markup by the XML processor |
| (possible_contents_model) | described below |
| (mixed_content) | described below |
Contents models are single subelements or groups of subelements that are linked by structure symbols. If A1, ... , An are possible contents models, then the following combined expressions are possible contents models as well:
| (A1, A2) | grouping, so that a following structure symbol applies to the whole group - this is done by parentheses |
| (A1, ... , An) | the concatenation (the order of the elements cannot be changed) - this is done by commas |
| (A1 | A2 | ... | An) | alternatives - these are separated by vertical bars |
| A1? | the option (zero or one element) - defined by a question mark |
| A1* | the optional repetition (zero or more elements) - this is done by an asterisk |
| A1+ | the repetition (at least one element) - this is done by a plus sign |
| A1 | no symbol means exactly one element |
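The structure symbols in the table behave much like the operators of a regular expression. The following Python sketch makes that correspondence explicit by translating an element contents model into a regex over the sequence of child element names. It is only an illustration under stated assumptions, not how an XML processor is implemented: the helper names `content_model_to_regex` and `matches` and the `BOOK` model with its element names are hypothetical, and contents models containing #PCDATA are not handled.

```python
import re

# Hypothetical contents model using every structure symbol from the table.
BOOK = "(title, author+, (chapter | appendix)*, index?)"


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate a DTD element contents model into an equivalent regex.

    The regex is matched against the child element names, each followed
    by one space, e.g. "title author author ".
    """
    compact = "".join(model.split())  # drop all whitespace
    # Every element name becomes a non-capturing group "(?:name )".
    named = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    # Parentheses, |, ?, * and + mean the same thing in both notations.
    return re.compile(named.replace(",", ""))


def matches(model: str, children: list) -> bool:
    """Check whether a sequence of child element names fits the model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


if __name__ == "__main__":
    print(matches(BOOK, ["title", "author", "chapter", "index"]))
    print(matches(BOOK, ["title"]))  # "author+" needs at least one author
```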
Mixed content allows contents specifications and contents models to be combined, for example character data interleaved with subelements. However, it is not supported by all XML processors and can always be replaced by other declarations, so this possibility is not explained further here.
Attributes are another important part of DTDs. If elements are the nouns of an XML document, attributes are its adjectives. The general syntax of an attribute declaration is:
| <!ATTLIST name_of_the_element name_of_the_attribute type_of_the_attribute specification> |
An element declaration with an attribute list may therefore look like this:
| <!ELEMENT person (#PCDATA)> |
| <!ATTLIST person |
|   name CDATA #REQUIRED |
|   firstname CDATA #IMPLIED |
|   number ID #IMPLIED > |
It is used in the corresponding XML document as follows:
| <!DOCTYPE person SYSTEM "bsp.dtd"> |
| <person name="Example" firstname="Tom" number="ID0">The first person</person> |
As can be seen in the example above, different attribute types are possible. The next table shows and explains the available types:
| CDATA | "Character Data" |
| ENTITY | an entity reference which is declared in the DTD (for further information see the W3C Recommendation) |
| ENTITIES | several entity references which are separated by blanks |
| ID | a unique identifier for the surrounding element |
| IDREF(S) | a reference (respectively several references) to an ID, declared somewhere in the document |
| NMTOKEN(S) | an XML name token - this is either a number or a name |
| NOTATION (name1 | ...) | a foreign format which is declared in the DTD |
| (value1 | ... | valuen) | the enumeration type |
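The ID and IDREF types express two constraints: every ID value is unique within the document, and every IDREF value matches an ID declared somewhere in the document. The Python sketch below checks both by hand with the standard library module `xml.etree.ElementTree`, which does not read DTDs. The sample document, the attribute name `boss` and the helper `check_ids` are assumptions for this illustration; which attributes count as ID or IDREF is passed in explicitly instead of being taken from a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "number" is assumed to be declared as ID and
# "boss" as IDREF in the DTD; the parser used here does not read that DTD.
DOCUMENT = """<staff>
  <person number="ID0" name="Example"/>
  <person number="ID1" name="Sample" boss="ID0"/>
  <person number="ID2" name="Other" boss="ID9"/>
</staff>"""


def check_ids(root, id_attr, idref_attr):
    """Return (duplicate ID values, IDREF values that match no ID)."""
    seen, duplicates = set(), []
    for element in root.iter():
        value = element.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    # An IDREF may point forwards or backwards, so check after collecting.
    dangling = [
        element.get(idref_attr)
        for element in root.iter()
        if element.get(idref_attr) is not None
        and element.get(idref_attr) not in seen
    ]
    return duplicates, dangling


if __name__ == "__main__":
    duplicates, dangling = check_ids(ET.fromstring(DOCUMENT), "number", "boss")
    print("duplicate IDs:", duplicates)
    print("dangling IDREFs:", dangling)
```

A validating XML processor performs these checks itself once the attributes are declared as ID and IDREF in the DTD.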
The example also uses value specifications. The first attribute uses #REQUIRED, which means that the attribute must be given a value in the XML document. #IMPLIED, on the other hand, makes it optional to give the attribute a value in the XML document. A further specification, which is not found in the example but is also possible, is #FIXED followed by a value; it sets the value and makes it unchangeable for the user. The last possible specification is a plain value, which sets a default value that can be changed.
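To see the four value specifications at work, the following Python sketch parses a small document with the standard library's expat binding (`xml.parsers.expat`). It is a sketch under assumptions: the attributes `nickname`, `species` and `status`, their values, and the helper names are hypothetical, and the DTD is placed in the internal subset so that no external file is needed. Expat is a non-validating parser, so it fills in declared values but does not reject a missing #REQUIRED attribute; that check is done by hand here.

```python
import xml.parsers.expat

# Hypothetical DTD covering all four value specifications.
DOCUMENT = """<!DOCTYPE person [
<!ELEMENT person (#PCDATA)>
<!ATTLIST person
  name     CDATA #REQUIRED
  nickname CDATA #IMPLIED
  species  CDATA #FIXED "human"
  status   CDATA "active">
]>
<person name="Example">The first person</person>
"""


def parse_with_defaults(document):
    """Return (declarations, attributes) as reported by expat."""
    declarations = {}  # attribute name -> (declared value, required flag)
    attributes = {}    # attribute name -> value seen on the element

    def on_attlist(elname, attname, att_type, default, required):
        declarations[attname] = (default, bool(required))

    def on_start(name, attrs):
        attributes.update(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return declarations, attributes


def missing_required(declarations, attributes):
    """Names declared #REQUIRED (required, no value) but absent."""
    return [
        name
        for name, (default, required) in declarations.items()
        if required and default is None and name not in attributes
    ]


if __name__ == "__main__":
    declarations, attributes = parse_with_defaults(DOCUMENT)
    print(attributes)
    print("missing:", missing_required(declarations, attributes))
```

In the handler arguments, a required attribute without a declared value corresponds to #REQUIRED, a required attribute with a value to #FIXED, an optional attribute without a value to #IMPLIED, and an optional attribute with a value to a plain default.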
Finally, it has to be mentioned that DTDs are not the only way to define an XML tree. XML Schemata are increasingly taking over the tasks of conventional DTDs, because they offer several advantages over simple DTDs. On the one hand, XML Schemata are themselves XML documents (as the name says) and can therefore be parsed by an XML processor; on the other hand, they offer many more options, which enable users to define their XML structures more precisely.
For more detailed information about XML, please follow the links below.
| the XML definition |
| an introduction to XML by Tim Bray |
| What is XML? |
| XML.com |
| The XML Industry Portal |