eXtensible Markup Language (XML)

Extensible Markup Language, or XML for short, was developed by the SGML¹ Editorial Review Board (later known as the XML Working Group) of the World Wide Web Consortium (W3C). The initial XML draft was presented in 1996 at a conference in Boston, while the official W3C specification (XML 1.0) was presented in 1998 by the headquarters of the World Wide Web Consortium at the Massachusetts Institute of Technology. XML is a new technology for web applications that simplifies business-to-business transactions on the web and lets users create their own tags. [1]

XML is called extensible because it is not a fixed-format language; it is a language for describing other languages, which lets users design their own customized markup languages for an unlimited variety of document types. [2]

C, C++, Pascal, Java, and many others are programming languages, in which users specify calculations, actions, and decisions to be carried out in order. XML differs from these languages: it is used to design ways of describing information (text or data) for storage, transmission, or processing by a program, and it says nothing about what to do with that data. Moreover, any programming language can be used to output data from any source in XML format; Java appears to be the most popular one at the moment. [2]

An XML document is a database only in the strictest sense of the term. That is, it is a collection of data, which makes it no different from any other file. As a "database" format, XML has some advantages: it describes the structure and type names of the data (although not the semantics), it is portable, and it can describe data in tree or graph structures. It also has some disadvantages: for example, access to the data is slow because of parsing and text conversion.
XML documents are suitable for use as a database in environments with small amounts of data, few users, and modest performance requirements. In other words, XML is not suitable in environments with many users, strict data-integrity requirements, and high performance requirements. [3]

¹ An indexed list of topics on XML can be found at: http://www.idealliance.org/papers/dx_xmle03/index/keyword/

Why XML

Using XML has many advantages: it taps the potential of the World Wide Web and other technologies for disseminating (distributing) information accurately, quickly, and independently of specific software applications or hardware platforms. Other advantages of using XML include:

- Reusing Content and "Modularity": Sharing information across the Enterprise.
- Reviewing and Translating Large Documents.
- Automating Tasks.
- Increasing Accuracy.
- Increasing Timeliness.

A conceptual view of XML

An XML document generally consists of two parts: header and content. The header, which is an XML declaration, gives the XML application information such as how to handle the document. The content, which is the XML data itself, consists of three parts:

- Root element: The root element of an XML document is the highest-level element in that document, and it surrounds all the other document tags. The root element must be the first opening tag and the last closing tag in the document.
- Element nodes: Each element node is labelled with a name (often called the element type) and a set of attributes, each consisting of a name and a value. Each element node can have child and descendant nodes.
- Character data: The character data nodes are the leaf nodes that contain the actual data (text strings). Usually, a character data node must be non-empty and non-adjacent to other character data nodes.
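As a rough illustration of the three parts listed above, the following Python sketch parses a small document with the standard library's xml.etree.ElementTree module and lists each element node with its attributes and character data. The sample document and the helper name describe are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an XML declaration (the header) followed by the content.
DOCUMENT = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<Users>
  <User id="15">
    <UserName>marouf</UserName>
    <PassWord>marouf</PassWord>
  </User>
</Users>"""


def describe(element, depth=0):
    """Return one line per element node: name, attributes, character data."""
    text = (element.text or "").strip()  # ignore indentation-only white space
    lines = [f"{'  ' * depth}{element.tag} {element.attrib} {text!r}"]
    for child in element:  # child element nodes, in document order
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(DOCUMENT)  # the root element surrounds all other elements
summary = describe(root)
print("\n".join(summary))
```

In this sketch the root element is Users, User is an element node carrying the attribute id, and the text inside UserName and PassWord is the character data held at the leaves.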
An XML document can be presented as follows:

    <?xml version="1.0" encoding="iso-8859-1"?>      the header
    <Users>                                          the content
      <User id="15">
        <UserName>marouf</UserName>
        <PassWord>marouf</PassWord>
      </User>
      …
    </Users>

An important point to mention is the XML namespace (xmlns), which is the cheapest way of getting a unique name. Both elements and attributes can be qualified by a namespace; a Uniform Resource Identifier (URI) provides this qualification. The element is then defined not only by its name but also by the URI. A namespace URI is not supposed to point at anything; it is usually a URL because URLs are unique.

A concrete view of XML

An XML document is (Unicode) text with markup tags. Moreover, it is well formed: every start tag has a matching end tag, the element tags are properly nested, and names are case sensitive. It can use non-Latin characters, and white space is used both for indentation and as content.

    <User id="15"> ... </User>
    |     |        |   |
    |     |        |   a matching element end tag
    |     |        the contents of the element (could be text or elements)
    |     an attribute with name "id" and value "15"
    an element start tag with name User
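The well-formedness rules and the namespace qualification described above can be checked with a short Python sketch. It assumes the standard library's xml.etree.ElementTree parser; the helper is_well_formed and the namespace URI http://example.org/users are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed('<User id="15">marouf</User>'))  # True: tags match
print(is_well_formed('<USer id="15">marouf</user>'))  # False: names differ in case
print(is_well_formed('<a><b></a></b>'))               # False: improperly nested

# Hypothetical namespace URI: it serves only as a unique name and is never fetched.
NS = "http://example.org/users"
doc = ET.fromstring(f'<u:User xmlns:u="{NS}" id="15"/>')
print(doc.tag)  # {http://example.org/users}User
```

The parser reports the qualified element name as the URI in braces followed by the local name, which reflects the point that the element is identified by both its name and the URI.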