Introduction to XML
XML: eXtensible Markup Language
Korth, Sudarshan – Chapter 23

Need
• Data often needs to be exchanged between two computers.
• For example, Shruti uses the Jio service in India and visits the USA for work. Jio has a tie-up with AT&T, so when Shruti uses her Jio number in the USA, AT&T needs to send the details of her incoming and outgoing calls to Jio. AT&T holds these details in its own database, whose schema may differ from Jio's.
• When such data needs to be sent, XML is most commonly used.
• Patient data is exchanged in a similar way.

XML
• XML stands for eXtensible Markup Language.
• XML was designed to describe data.
• XML files contain tags and data.
• Tags (elements) are user defined.

Data Interchange
• XML's key role is data interchange.
• Two business partners want to exchange customer data:
  • They agree on a set of tags.
  • They exchange data without having to change their internal (heterogeneous) databases.
• Other business partners can join the exchange by using the same tag set.
• New tags can be added to extend the functionality.

XML
• XML is like HTML, but the tags are not predefined; you can define your own tags.
• Just as a database table defines the schema (structure) of data, an XML DTD defines the schema of an XML document.

XML data / document rules (simple text file)
• Start with <?xml version="1.0"?>
• XML is case sensitive.
• There must be exactly one root element (like a table name) that encloses the rest of the XML.
• Every element must have a closing tag, e.g. <fname>maya</fname>
• Elements must be properly nested.

    <?xml version="1.0"?>
    <customer>
      ….
    </customer>

Online editor: https://www.tutorialspoint.com/online_xml_editor.htm

Building Blocks of XML
• Elements (tags) are the primary components of XML documents.

    <AUTHOR id="123">           (element AUTHOR with attribute id)
      <FNAME>JAMES</FNAME>      (element FNAME nested inside element AUTHOR)
      <LNAME>RUSSEL</LNAME>
    </AUTHOR>
    <!-- I am a comment -->

• Attributes provide additional information about elements.
  Values of the attributes are set inside the start tag of the element.
• Comments start with <!-- and end with -->

XML Document Type Definition (DTD)
• When the receiver receives an XML file, it needs to know whether the file follows the agreed structure, i.e. whether it is legal. A DTD is used for this check.
• A DTD is a set of rules that lets us specify our own set of elements and attributes (i.e., a schema).
• An XML document is valid if it has an attached DTD and is structured according to the rules defined in that DTD.
• A DTD is a grammar that indicates which tags are legal in an XML document.

XML and DTD Example

XML document:

    <?xml version="1.0"?>
    <BOOKLIST>
      <BOOK GENRE="Science" FORMAT="Hardcover">
        <AUTHOR>
          <FIRSTNAME>RICHRD</FIRSTNAME>
          <LASTNAME>KARTER</LASTNAME>
        </AUTHOR>
      </BOOK>
    </BOOKLIST>

DTD (when embedded in the document, the DOCTYPE declaration goes after the XML declaration and before the root element):

    <!DOCTYPE BOOKLIST [
      <!ELEMENT BOOKLIST (BOOK)*>
      <!ELEMENT BOOK (AUTHOR)+>
      <!ELEMENT AUTHOR (FIRSTNAME, LASTNAME)>
      <!ELEMENT FIRSTNAME (#PCDATA)>
      <!ELEMENT LASTNAME (#PCDATA)>
      <!ATTLIST BOOK GENRE (Science|Fiction) #REQUIRED>
      <!ATTLIST BOOK FORMAT (Paperback|Hardcover) "Paperback">
    ]>

DTD (cont'd)

    Indicator        Occurrence             Allowed count
    (no indicator)   Required               One and only one
    ?                Optional               None or one
    *                Optional, repeatable   None, one, or more
    +                Required, repeatable   One or more

    <!ELEMENT BOOKLIST (BOOK)*>
    <!ELEMENT BOOK (AUTHOR)+>

Exercise
• Write a DTD for

    <novel>
      <foreword>
        <paragraph>This is the great Indian novel.</paragraph>
      </foreword>
      <chapter number="1">
        <paragraph>It was a dark and stormy night.</paragraph>
        <paragraph>Suddenly, a shot rang out!
        </paragraph>
      </chapter>
    </novel>

DTD

    <!DOCTYPE novel [
      <!ELEMENT novel (foreword, chapter+)>
      <!ELEMENT foreword (paragraph+)>
      <!ELEMENT chapter (paragraph+)>
      <!ELEMENT paragraph (#PCDATA)>
      <!ATTLIST chapter number CDATA #REQUIRED>
    ]>

XML Query Languages
• Provide functionality similar to database query languages (such as SQL)
• XPath
• XQuery

XPath Example

Sample XML:

    <Student id="s1">
      <Name>John</Name>
      <Age>22</Age>
      <Email>jhn@xyz.com</Email>
    </Student>

• An XPath expression is a location path.
• XPath: /Student[Name="John"]/Email
• Output: the <Email> element with the value "jhn@xyz.com"

Examples

    <Patients>
      <Patient id="p1">
        <Name>John</Name>
        <Address>
          <Street>120 Northwestern Ave</Street>
        </Address>
      </Patient>
      <Patient id="p2">
        <Name>Paul</Name>
        <Address>
          <Street>120 N. Salisbury</Street>
        </Address>
      </Patient>
      <OpdPatient id="o1">
        <Name>Henry</Name>
        <Address><Street>New York</Street></Address>
      </OpdPatient>
    </Patients>

XPath examples

    <Patients>
      <Patient id="p1">
        <Name>John</Name>
        <Address>
          <Street>120 Northwestern Ave</Street>
        </Address>
      </Patient>
      <Patient id="p2">
        <Name>Paul</Name>
        <Address>
          <Street>120 N.
          Salisbury</Street>
        </Address>
      </Patient>
      <OpdPatient id="o1">
        <Name>Henry</Name>
        <Address><Street>New York</Street></Address>
      </OpdPatient>
    </Patients>

• /Patients/Patient/Name retrieves all Patient names (XPath is case sensitive, so the element names must match exactly).
• Evaluation starts at the root, traverses the tree, and matches elements along the path.

XPath by Example
• /Patients/(Patient|OpdPatient)/Address : addresses of Patient or OpdPatient elements (a parenthesized union inside a path needs XPath 2.0 or later)
• /Patients/*/Name : names of Patient or OpdPatient elements
• /Patients//Name : Name elements that are descendants of Patients
• /Patients//@id : value of the id attribute of descendants of Patients
• /Patients//OpdPatient[Name] : OpdPatient elements that have a Name subelement
• /Patients//*[Street="New York"] : elements that satisfy a specific condition (here, having a Street child equal to "New York")

XPath URL
• https://www.freeformatter.com/xpath-tester.html#before-output

XQuery
• XQuery is to XML what SQL is to an RDBMS.
• Many databases support XQuery.
• XQuery is built on XPath operators (XPath is a language that defines path expressions to locate document data).

    for $x in doc("books.xml")/bookstore/book    (from)
    where $x/price > 30                          (where)
    order by $x/title                            (order by)
    return $x/title                              (select)

Core Concepts of XQuery
XQuery is an expressive query language for XML data. A query has the form of a so-called FLWOR expression:

    FOR $var1 IN expr1, $var2 IN expr2, ...
    LET $var3 := expr3, $var4 := expr4, ...
    WHERE condition
    ORDER BY $var4
    RETURN result-doc-construction

(The clause names are shown in upper case for emphasis; in an actual query the keywords are written in lower case.)

The FOR clause evaluates expressions (which may be XPath-style path expressions) and binds the resulting elements to variables. For a given binding, each variable denotes exactly one element.
The LET clause binds entire sequences of elements to variables.
The WHERE clause evaluates a logical condition with each of the possible variable bindings and selects those bindings that satisfy the condition.
The ORDER BY clause sorts the result.
The RETURN clause constructs, from each of the variable bindings, an XML result tree. This may involve grouping and aggregation and even complete subqueries.
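The FLWOR clauses map onto ordinary list processing. As a rough sketch (Python standard library only, not an XQuery engine), the books.xml query above could be emulated as follows. The XML content and the names BOOKS_XML and expensive_titles are made up for illustration, since the slides do not show books.xml itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml; titles and prices are invented.
BOOKS_XML = """
<bookstore>
  <book><title>Learning XML</title><price>39.95</price></book>
  <book><title>XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Everyday Italian</title><price>30.00</price></book>
  <book><title>Harry Potter</title><price>29.99</price></book>
</bookstore>
"""

def expensive_titles(xml_text, min_price=30.0):
    """Emulate: for $x in /bookstore/book where $x/price > min_price
    order by $x/title return $x/title"""
    root = ET.fromstring(xml_text)  # root is the <bookstore> element
    # for: bind each <book> child in turn
    books = root.findall("book")
    # where: keep only the bindings that satisfy the condition
    selected = [b for b in books if float(b.findtext("price")) > min_price]
    # order by: sort the surviving bindings by title
    selected.sort(key=lambda b: b.findtext("title"))
    # return: build the result (title text here, not <title> elements)
    return [b.findtext("title") for b in selected]

titles = expensive_titles(BOOKS_XML)
print(titles)
```

Unlike XQuery, this sketch returns plain strings rather than title elements; returning the elements themselves would only require dropping the final findtext call.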
XQuery
The result can be returned in the form of XML:

    for $x in doc("bib.xml")/bib/book
    return <result>{ $x }</result>

    for $x in doc("bib.xml")/bib/book
    where $x/year > 1995
    return $x/title

    let $a := avg(doc("bib.xml")/bib/book/price)
    for $b in doc("bib.xml")/bib/book
    where $b/price > $a
    return $b

XQuery Example

    (: find Web-related articles by Dan Suciu from the year 1998 :)
    <results> {
      for $a in doc("literature.xml")//article
      for $n in $a//author, $t in $a/title
      where $a/@year = "1998" and contains($n, "Suciu") and contains($t, "Web")
      return <result>{ $n, $t }</result>
    } </results>

Popular XMLs
• MathML (minus, plus, superscript, …)
• ChemML
• PhysML
• CommerceXML
• VoiceXML

XML Security
• XML is widely adopted across Internet commerce.
• Hence the essential ingredients of any electronic security system (data integrity, authentication, and confidentiality) must be supported.
• XML security is addressed by a family of standards designed to help developers build secure XML-based applications.
Source: https://www.informit.com/articles/article.aspx?p=601349&seqNum=2

XML Security
• The core XML standards related to encryption and digital signatures are maintained by the W3C.
• Other standards, such as Security Assertion Markup Language (SAML) and Extensible Access Control Markup Language (XACML), are maintained by OASIS, a nonprofit consortium that drives the development of eBusiness standards.

XML Digital Signatures

XML Encryption
XML Encryption can be applied at three levels:
• Encrypt the complete document.
• Encrypt a single element.
• Encrypt the content of an element.

Use Case
• Suppose you want to send an XML file to a publishing company.
• The XML contains the details of a book order, including book and credit card information.
• The warehouse needs to see which book is being ordered but doesn't need access to the credit card information.
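The use case above, hiding one element while leaving the rest readable, can be sketched in Python. This is only an illustration of the document structure, under loud assumptions: the XOR one-time pad is a toy stand-in (the Python standard library ships no block cipher), it is not a W3C-approved algorithm, and the sketch omits the EncryptionMethod and KeyInfo elements. ORDER_XML, encrypt_element and decrypt_element are hypothetical names, and all data values are invented.

```python
import base64
import secrets
import xml.etree.ElementTree as ET

XMLENC_NS = "http://www.w3.org/2001/04/xmlenc#"

# Hypothetical order document; every value is made up.
ORDER_XML = """
<Payment>
  <Book>Example Book</Book>
  <CreditCard>
    <Number>0000 0000 0000 0000</Number>
    <Expiration>01/30</Expiration>
  </CreditCard>
</Payment>
"""

def xor_bytes(data, key):
    # Toy one-time-pad XOR: a stand-in for a real cipher, not for production use.
    return bytes(d ^ k for d, k in zip(data, key))

def encrypt_element(parent, tag):
    """Replace parent's child <tag> with an EncryptedData element; return the key."""
    target = parent.find(tag)
    plaintext = ET.tostring(target, encoding="utf-8")
    key = secrets.token_bytes(len(plaintext))
    encrypted = ET.Element(f"{{{XMLENC_NS}}}EncryptedData",
                           {"Type": XMLENC_NS + "Element"})
    cipher_data = ET.SubElement(encrypted, f"{{{XMLENC_NS}}}CipherData")
    cipher_value = ET.SubElement(cipher_data, f"{{{XMLENC_NS}}}CipherValue")
    cipher_value.text = base64.b64encode(xor_bytes(plaintext, key)).decode("ascii")
    index = list(parent).index(target)
    parent.remove(target)
    parent.insert(index, encrypted)
    return key

def decrypt_element(parent, key):
    """Restore the original element in place of EncryptedData."""
    encrypted = parent.find(f"{{{XMLENC_NS}}}EncryptedData")
    value = encrypted.find(
        f"{{{XMLENC_NS}}}CipherData/{{{XMLENC_NS}}}CipherValue").text
    plaintext = xor_bytes(base64.b64decode(value), key)
    index = list(parent).index(encrypted)
    parent.remove(encrypted)
    parent.insert(index, ET.fromstring(plaintext))

payment = ET.fromstring(ORDER_XML)
key = encrypt_element(payment, "CreditCard")
# The book is still readable; the card details are not.
protected = ET.tostring(payment, encoding="unicode")
print(protected)
```

A real deployment would rely on a dedicated XML security library and a standard algorithm rather than this hand-rolled cipher; the point here is only that the surrounding elements stay in the clear.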
Normal XML

    <?xml version="1.0"?>
    <Payments xmlns='http://globalbank.org'>
      <Payment>
        <Name>John von Neumann</Name>
        <Book>Godel, Escher, Bach: An Eternal Golden Braid</Book>
        <ISBN>046502656</ISBN>
        <CreditCard Limit='5000' Currency='Euro'>
          <Number>4654 2445 0277 5567</Number>
          <Issuer>Bank of America</Issuer>
          <Expiration>04/09</Expiration>
        </CreditCard>
      </Payment>
    </Payments>

If we encrypt the entire document, the warehouse won't be able to read the book information. So we encrypt only the CreditCard element and all of its content, including sub-elements.

Encrypted XML

    <?xml version="1.0"?>
    <Payments xmlns='http://globalbank.org'>
      <Payment>
        <Name>John von Neumann</Name>
        <Book>Godel, Escher, Bach: An Eternal Golden Braid</Book>
        <ISBN>046502656</ISBN>
        <EncryptedData xmlns='http://www.w3.org/2001/04/xmlenc#'
                       Type='http://www.w3.org/2001/04/xmlenc#Element'>
          <EncryptionMethod Algorithm='http://www.w3.org/2001/04/xmlenc#tripledes-cbc'/>
          <CipherData><CipherValue>ABCDEF</CipherValue></CipherData>
        </EncryptedData>
      </Payment>
    </Payments>

XML Encryption

Original / decrypted:

    <?xml version="1.0" encoding="UTF-8"?>
    <Customers>
      <Customer>
        <Name>Jose Aznar</Name>
        <CreditCard>
          <Number>1000 1234 5678 0001</Number>
          <ExpiryDate>2003 June 30</ExpiryDate>
        </CreditCard>
      </Customer>
      ...
    </Customers>

Encrypted:

    <?xml version="1.0" encoding="UTF-8"?>
    <Customers>
      <Customer>
        <Name><EncryptedData…></Name>
        <CreditCard>
          <Number><EncryptedData…></Number>
          <ExpiryDate>2003 June 30</ExpiryDate>
        </CreditCard>
      </Customer>
      ...
    </Customers>

JSON

What is JSON?
• "JSON" stands for "JavaScript Object Notation".
• It is a lightweight data-interchange format.
• Despite the name, JSON is a (mostly) language-independent way of specifying objects as name-value pairs.
• It gives a structured representation of a data object.
• It can be parsed by most modern languages.
• JSON Schema can be used to validate a JSON file.
• It is similar to XML in purpose, but uses no tags.

JSON Syntax Rules
• Uses key/value pairs, e.g. {"name": "John"}
• Keys and string values are enclosed in double quotes.
• A value must be of one of the specified types: a string, a number, true, false, null, an object, or an array.
• The file extension is ".json".
• Strings can contain the usual assortment of escaped characters.

JSON Example

    {
      "name": "John Smith",
      "age": 35,
      "address": {
        "street": "5 main St.",
        "city": "Austin"
      },
      "children": ["Mary", "Abel"]
    }

JSON Schema
• A JSON Schema allows you to specify what type of data can go into your JSON files.
• It allows you to restrict the type of data entered.

JSON Schema

    {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer" },
        "address": {
          "type": "object",
          "properties": {
            "street": { "type": "string" },
            "city": { "type": "string" }
          }
        },
        "children": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    }

Validating JSON file
• The following website can be used to validate a JSON file against a schema: https://www.jsonschemavalidator.net/
• Paste both the schema and the corresponding JSON file.

JSONiq
• JSONiq is a query language for JSON built around FLWOR expressions.
• It is an extension of XQuery.

JSON Injection
• Injection attacks in web applications are cyber attacks that seek to inject malicious code into an application to alter its normal execution.
• Injection attacks can lead to loss of data, modification of data, and denial of service.

JSON Injection
Occurs when:
• Data from an untrusted source is not sanitized (validated) by the server and is written directly to a JSON stream. This is referred to as server-side JSON injection.
• Data from an untrusted source is not sanitized and is parsed directly. This is referred to as client-side JSON injection.

JSON Injection
• The data supplied by the user (username, password and account type) is stored on the server side as a JSON string.
• Since the application does not sanitize the input data, a malicious user can append unexpected data to their username: richard%22,%22Account%22:%22administrator%22 (%22 is the URL-encoded double quote).
• Consequently, the resultant JSON string contains a second Account entry.
• When the string is read, the second Account value (administrator) typically takes precedence.

JSON Injection
• Such attacks are common and easy to carry out.
• They are possible if no appropriate security mechanisms are employed.
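The server-side attack described above can be reproduced in a few lines. The sketch below is an assumption-laden illustration: the field names (Account, User, Password), their order, and both helper functions are hypothetical, since the slides do not show the server's actual JSON template. The payload is the decoded form of the username from the slide, without the trailing quote because the assumed template supplies it.

```python
import json

def build_record_unsafe(username, password):
    # Vulnerable: user input is pasted into the JSON text without escaping.
    return ('{"Account": "user", "User": "' + username +
            '", "Password": "' + password + '"}')

def build_record_safe(username, password):
    # json.dumps escapes the quotes, so the input cannot introduce new keys.
    return json.dumps({"Account": "user", "User": username, "Password": password})

# Decoded form of richard%22,%22Account%22:%22administrator
payload = 'richard","Account":"administrator'

unsafe = json.loads(build_record_unsafe(payload, "secret"))
safe = json.loads(build_record_safe(payload, "secret"))

# Python's json module keeps the last value of a duplicated key,
# so the injected Account entry overrides the original one.
print(unsafe["Account"])
print(safe["Account"])
```

Building the JSON with a serializer instead of string concatenation is one appropriate mechanism; validating the parsed result against a JSON Schema is a complementary check.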