Uploaded by aishwarya.andhale

Unit 6 XML

Introduction to XML
XML: eXtensible Markup Language
Korth, Sudarshan – Chapter 23
Need
• When data needs to be exchanged between two computers:
• For example, Shruti uses the Jio service in India and visits the USA for work. Jio
has a tie-up with AT&T, so when Shruti uses her Jio number in the USA, AT&T
needs to send the details of her incoming and outgoing calls to Jio. AT&T holds
these details in its own database, whose schema may differ from Jio's.
• When such data needs to be sent, XML is most commonly used.
• The same applies to the exchange of patient data.
XML
• XML stands for eXtensible Markup Language
• XML was designed to describe data.
• XML files contain tags and data
• Tags (elements) are user defined
Data Interchange
• XML's key role is data interchange
•Two business partners want to exchange customer data
• Agree on a set of tags
• Exchange data without having to change internal
databases (heterogeneous)
•Other business partners can join in the exchange by using
the tagset
• New tags can be added to extend the functionality
XML
■ XML is like HTML, but the tags are not predefined,
i.e., you can define your own tags
■ Just as a database table definition specifies the schema
(structure) of the data, an XML DTD defines the schema of an XML
document
XML data / document rules (simple text file)
• Start with the XML declaration <?xml version="1.0"?> (optional in XML 1.0, but recommended)
• XML is case sensitive
• You must have exactly one root element (like a table name) that
encloses all the rest of the XML
• Every element must have a closing tag
• <fname> maya </fname>
• Elements must be properly nested
<?xml version="1.0"?>
<customer>
….
</customer>
https://www.tutorialspoint.com/online_xml_editor.htm
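The rules above can be checked mechanically with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample documents are made up for illustration and are not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule from the list above.
samples = {
    "well formed": "<customer><fname> maya </fname></customer>",
    "missing closing tag": "<customer><fname> maya </customer>",
    "case mismatch": "<customer><fname> maya </FNAME></customer>",
    "two root elements": "<customer></customer><customer></customer>",
    "improper nesting": "<customer><a><b></a></b></customer>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; each of the others breaks exactly one rule and is rejected by the parser.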
Building Blocks of XML
• Elements (tags) are the primary components of XML
documents.
<AUTHOR id="123">
<FNAME>JAMES</FNAME>
<LNAME>RUSSEL</LNAME>
</AUTHOR>
<!-- I am a comment -->
Element AUTHOR carries the attribute id; elements FNAME and LNAME are
nested inside element AUTHOR.
• Attributes provide additional information about
elements. Attribute values are set inside the element's
start tag.
• Comments start with <!-- and end with -->
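To see how a program views these building blocks, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The document string is a cleaned-up version of the AUTHOR example; note that this parser drops comments by default, which is a property of the library rather than of XML.

```python
import xml.etree.ElementTree as ET

# AUTHOR element with an id attribute, a comment, and two nested elements.
doc = (
    '<AUTHOR id="123">'
    "<!-- I am a comment -->"
    "<FNAME>JAMES</FNAME>"
    "<LNAME>RUSSEL</LNAME>"
    "</AUTHOR>"
)

author = ET.fromstring(doc)

print(author.tag)                       # element name
print(author.attrib)                    # attributes as a dictionary
print([child.tag for child in author])  # nested elements (the comment is skipped)
print(author.find("FNAME").text)        # text content of a nested element
```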
XML Document Type Definition (DTD)
• When the receiver gets an XML file, it needs to know
whether the file follows the agreed structure, i.e., whether it is legal.
A DTD is used for this.
• A DTD is a set of rules that allows us to specify our
own set of elements and attributes (i.e., a schema).
• An XML document is valid if it has an attached DTD and
the document is structured according to the rules defined in
that DTD.
• A DTD is a grammar that indicates which tags are legal in
XML documents.
XML and DTD Example
<?xml version="1.0"?>
<!DOCTYPE BOOKLIST [
<!ELEMENT BOOKLIST (BOOK)*>
<!ELEMENT BOOK (AUTHOR)+>
<!ELEMENT AUTHOR (FIRSTNAME,LASTNAME)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ELEMENT LASTNAME (#PCDATA)>
<!ATTLIST BOOK GENRE (Science|Fiction) #REQUIRED>
<!ATTLIST BOOK FORMAT (Paperback|Hardcover) "Paperback">
]>
<BOOKLIST>
<BOOK GENRE="Science" FORMAT="Hardcover">
<AUTHOR>
<FIRSTNAME>RICHRD</FIRSTNAME>
<LASTNAME>KARTER</LASTNAME>
</AUTHOR>
</BOOK>
</BOOKLIST>
DTD (cont’d)
Indicator        Occurrence
(no indicator)   Required               One and only one
?                Optional               None or one
*                Optional, repeatable   None, one, or more
+                Required, repeatable   One or more
<!ELEMENT BOOKLIST (BOOK)*>
<!ELEMENT BOOK (AUTHOR)+>
Exercise
• Write a DTD for the following document:
<novel>
<foreword>
<paragraph>This is the great Indian novel.</paragraph>
</foreword>
<chapter number="1">
<paragraph>It was a dark and stormy night.</paragraph>
<paragraph>Suddenly, a shot rang out! </paragraph>
</chapter>
</novel>
DTD
<!DOCTYPE novel [
<!ELEMENT novel (foreword, chapter+)>
<!ELEMENT foreword (paragraph+)>
<!ELEMENT chapter (paragraph+)>
<!ELEMENT paragraph (#PCDATA)>
<!ATTLIST chapter number CDATA #REQUIRED>
]>
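A DTD content model is essentially a regular expression over the names of an element's children, so the occurrence indicators (?, *, +) map directly onto regex quantifiers. The sketch below hand-translates the novel DTD into Python regular expressions to illustrate this. It is not a DTD validator (Python's standard library does not include one); the tables `CONTENT_MODELS` and `REQUIRED_ATTRIBUTES` and the function `conforms` are hypothetical names introduced only for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the DTD: each child tag name is followed by a space.
# None stands for (#PCDATA): text only, no child elements.
CONTENT_MODELS = {
    "novel": r"(foreword )(chapter )+",   # (foreword, chapter+)
    "foreword": r"(paragraph )+",         # (paragraph+)
    "chapter": r"(paragraph )+",          # (paragraph+)
    "paragraph": None,                    # (#PCDATA)
}
REQUIRED_ATTRIBUTES = {"chapter": ["number"]}  # number CDATA #REQUIRED


def conforms(element: ET.Element) -> bool:
    """Check an element tree against the hand-translated rules above."""
    if element.tag not in CONTENT_MODELS:
        return False  # undeclared element
    model = CONTENT_MODELS[element.tag]
    if model is None:
        if len(element) != 0:
            return False  # #PCDATA elements may not contain child elements
    else:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            return False
    for name in REQUIRED_ATTRIBUTES.get(element.tag, []):
        if name not in element.attrib:
            return False
    return all(conforms(child) for child in element)


novel = ET.fromstring(
    "<novel>"
    "<foreword><paragraph>This is the great Indian novel.</paragraph></foreword>"
    '<chapter number="1">'
    "<paragraph>It was a dark and stormy night.</paragraph>"
    "<paragraph>Suddenly, a shot rang out!</paragraph>"
    "</chapter>"
    "</novel>"
)
print(conforms(novel))
```

A novel without a foreword, or a chapter without the number attribute, fails the check, mirroring what a real validating parser would report.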
XML Query Languages
• Same functionality as database query languages (such as SQL)
• XPath
• XQuery
XPath Example
Sample XML:
<Student id="s1">
<Name>John</Name>
<Age>22</Age>
<Email>jhn@xyz.com</Email>
</Student>
An XPath expression is a location path.
XPath: /Student[Name="John"]/Email
Output: the <Email> element with the value "jhn@xyz.com"
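The same lookup can be tried in Python. This is a sketch under one assumption worth stating: the standard `xml.etree.ElementTree` module supports only a subset of XPath and evaluates paths relative to an element, so the Student element is placed inside a hypothetical `holder` element to stand in for the document root.

```python
import xml.etree.ElementTree as ET

student = ET.fromstring(
    '<Student id="s1">'
    "<Name>John</Name><Age>22</Age><Email>jhn@xyz.com</Email>"
    "</Student>"
)

# ElementTree paths are relative, so wrap the element in a container
# (a made-up helper element, not part of the sample document).
holder = ET.Element("holder")
holder.append(student)

# Equivalent of /Student[Name="John"]/Email
emails = holder.findall("./Student[Name='John']/Email")
print([e.text for e in emails])

# A predicate that matches nothing yields an empty result.
print(holder.findall("./Student[Name='Paul']/Email"))
```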
Examples
<Patients>
<Patient id="p1">
<Name>John</Name>
<Address>
<Street>120 Northwestern Ave</Street>
</Address>
</Patient>
<Patient id="p2">
<Name>Paul</Name>
<Address>
<Street>120 N. Salisbury</Street>
</Address>
</Patient>
<OpdPatient id="o1">
<Name>Henry</Name>
<Address><Street>New York</Street></Address>
</OpdPatient>
</Patients>
XPath examples
• /Patients/Patient/Name – retrieves all patient names (element names are case sensitive)
• Starting at the root, the path traverses the tree and matches elements step by step
XPath by Example
/Patients/(Patient|OpdPatient)/Address
Addresses of Patient or OpdPatient elements (the parenthesized union step requires XPath 2.0)
/Patients/*/Name
Names of Patient or OpdPatient elements
/Patients//Name
Names that are descendants of Patients
/Patients//@id
Values of the id attribute of descendants of Patients
/Patients//OpdPatient[Name]
OpdPatient elements that have a Name subelement
/Patients//*[Street="New York"]
Elements that satisfy a specific condition: a Street subelement equal to "New York"
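These path expressions can be tried against the Patients document with Python's standard `xml.etree.ElementTree` module. This is a sketch, not a full XPath engine: the module supports only a subset of XPath, paths are written relative to the root element (hence the leading `.`), attribute values are read with `get` instead of `@id` as a result step, and the union step is emulated by joining two result lists.

```python
import xml.etree.ElementTree as ET

PATIENTS = """<Patients>
<Patient id="p1"><Name>John</Name>
<Address><Street>120 Northwestern Ave</Street></Address></Patient>
<Patient id="p2"><Name>Paul</Name>
<Address><Street>120 N. Salisbury</Street></Address></Patient>
<OpdPatient id="o1"><Name>Henry</Name>
<Address><Street>New York</Street></Address></OpdPatient>
</Patients>"""

root = ET.fromstring(PATIENTS)  # root is the <Patients> element


def texts(elements):
    """Return the text content of each element in a list."""
    return [e.text for e in elements]


# /Patients/Patient/Name
patient_names = texts(root.findall("./Patient/Name"))
# /Patients/*/Name
all_names = texts(root.findall("./*/Name"))
# /Patients//Name
descendant_names = texts(root.findall(".//Name"))
# /Patients//@id (select elements that have an id, then read the attribute)
ids = [e.get("id") for e in root.findall(".//*[@id]")]
# /Patients//OpdPatient[Name]
opd_with_name = root.findall(".//OpdPatient[Name]")
# /Patients//*[Street="New York"]
in_new_york = root.findall(".//*[Street='New York']")
# /Patients/(Patient|OpdPatient)/Address, emulated with two queries
addresses = root.findall("./Patient/Address") + root.findall("./OpdPatient/Address")

print(patient_names, all_names, ids)
print([e.tag for e in in_new_york], len(addresses))
```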
XPath URL
• https://www.freeformatter.com/xpath-tester.html#before-output
XQuery
• XQuery is to XML what SQL is to an RDBMS
• Many database systems support XQuery
• XQuery is built on XPath expressions
(XPath is a language that defines path expressions to locate
document data)
for $x in doc("books.xml")/bookstore/book (: like SQL from :)
where $x/price > 30 (: like SQL where :)
order by $x/title (: like SQL order by :)
return $x/title (: like SQL select :)
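For readers more familiar with general-purpose languages, the four clauses of this query map onto a loop, a filter, a sort, and a projection. The Python sketch below mirrors the query step by step. The file books.xml is not shown in the slides, so the inline `BOOKS` document and its titles and prices are invented sample data.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for books.xml; only the element names come from the query.
BOOKS = """<bookstore>
<book><title>Networks</title><price>60</price></book>
<book><title>Algorithms</title><price>25</price></book>
<book><title>Databases</title><price>45</price></book>
</bookstore>"""

bookstore = ET.fromstring(BOOKS)

titles = sorted(                                   # order by $x/title
    book.findtext("title")                         # return $x/title
    for book in bookstore.findall("./book")        # for $x in .../bookstore/book
    if float(book.findtext("price")) > 30          # where $x/price > 30
)
print(titles)
```

Unlike XQuery, this sketch returns plain strings rather than title elements.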
Core Concepts of XQuery
XQuery is an extremely powerful query language for XML data. A query has the form of a
so-called FLWOR expression (For, Let, Where, Order by, Return):
FOR $var1 IN expr1, $var2 IN expr2, ...
LET $var3 := expr3, $var4 := expr4, ...
WHERE condition
ORDER BY $var4
RETURN result-doc-construction
The keywords are capitalized here for emphasis; in actual XQuery they are written in lowercase.
The FOR clause evaluates expressions (which may be XPath-style path expressions) and
binds the resulting elements to variables. For a given binding each variable denotes
exactly one element.
The LET clause binds entire sequences of elements to variables.
The WHERE clause evaluates a logical condition with each of the possible variable
bindings and selects those bindings that satisfy the condition.
The ORDER BY clause sorts the selected bindings.
The RETURN clause constructs, from each of the variable bindings, an XML result tree.
This may involve grouping and aggregation and even complete subqueries.
XQuery
Result can be returned in the
form of XML
(: wrap every book in a result element :)
for $x in doc("bib.xml")/bib/book
return <result>{ $x }</result>
(: titles of books published after 1995 :)
for $x in doc("bib.xml")/bib/book
where $x/year > 1995
return $x/title
(: books priced above the average :)
let $a := avg(doc("bib.xml")/bib/book/price)
for $b in doc("bib.xml")/bib/book
where $b/price > $a
return $b
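The difference between binding one element at a time (for) and binding a whole sequence or aggregate (let) can be sketched in Python as follows. The file bib.xml is not part of the slides, so the inline `BIB` document is invented sample data; only the element names bib, book, title, year and price come from the queries.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Invented stand-in for bib.xml.
BIB = """<bib>
<book><title>Book A</title><year>1994</year><price>20</price></book>
<book><title>Book B</title><year>1998</year><price>40</price></book>
<book><title>Book C</title><year>2001</year><price>60</price></book>
</bib>"""

books = ET.fromstring(BIB).findall("./book")

# for ... where $x/year > 1995 return $x/title
after_1995 = [b.findtext("title") for b in books if int(b.findtext("year")) > 1995]

# let $a := avg(.../price): one value computed over the whole sequence
average = mean([float(b.findtext("price")) for b in books])

# for $b ... where $b/price > $a return $b
above_average = [b.findtext("title") for b in books if float(b.findtext("price")) > average]

print(after_1995, average, above_average)
```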
XQuery Example
(: find Web-related articles by Dan Suciu from the year 1998 :)
<results> {
for $a in doc("literature.xml")//article
for $n in $a//author, $t in $a/title
where $a/@year = "1998"
and contains($n, "Suciu") and contains($t, "Web")
return <result>{ $n, $t }</result> } </results>
Popular XMLs
• MathML (minus, plus, superscript, …)
• ChemML
• PhysML
• CommerceXML
• VoiceXML
XML Security
• XML is widely adopted in all aspects of Internet commerce
• Hence, essential ingredients of all electronic security systems—data
integrity, authentication, and confidentiality—must be supported.
• XML security is addressed by a family of standards designed to help
developers build secure XML-based applications
https://www.informit.com/articles/article.aspx?p=601349&seqNum=2
XML Security
• The core XML standards related to encryption and digital signatures are maintained by the
W3C.
• Other standards such as Security Assertion Markup Language (SAML) and Extensible Access
Control Markup Language (XACML) are maintained by OASIS, a nonprofit consortium that
drives the development of eBusiness standards.
XML Digital Signatures
XML Encryption
• Encrypt the complete document.
• Encrypt a single element with XML encryption.
• Encrypt the content of an element.
Use Case
• Suppose you want to send the XML file to a publishing company.
• The XML contains details of a book order that includes book and
credit card information.
• The warehouse needs to see what book is being ordered but doesn’t
need access to the credit card information.
Normal XML
<?xml version="1.0"?>
<Payments xmlns='http://globalbank.org'>
<Payment>
<Name>John von Neumann</Name>
<Book>Godel, Escher, Bach: An Eternal Gold Braid</Book>
<ISBN>046502656</ISBN>
<CreditCard Limit='5000' Currency='Euro'>
<Number>4654 2445 0277 5567</Number>
<Issuer>Bank of America</Issuer>
<Expiration>04/09</Expiration>
</CreditCard>
</Payment>
</Payments>
If we encrypt the entire document, the warehouse won't be able to read the book
information, so we choose to encrypt only the CreditCard element and all of its content,
including its sub-elements.
Encrypted XML
<?xml version="1.0"?>
<Payments xmlns='http://globalbank.org'>
<Payment>
<Name>John von Neumann</Name>
<Book>Godel, Escher, Bach: An Eternal Gold Braid</Book>
<ISBN>046502656</ISBN>
<EncryptedData xmlns='http://www.w3.org/2001/04/xmlenc#' Type='http://www.w3.org/2001/04/xmlenc#Element'>
<EncryptionMethod Algorithm='http://www.w3.org/2001/04/xmlenc#tripledes-cbc'/>
<CipherData><CipherValue>ABCDEF</CipherValue></CipherData>
</EncryptedData>
</Payment>
</Payments>
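The structural step of element-level encryption, replacing the CreditCard element with an EncryptedData element, can be sketched in Python. One caveat matters here: Python's standard library has no Triple DES or other cipher, so `placeholder_cipher` below uses base64, which is an encoding and NOT encryption. A real implementation would call an actual cipher from a cryptography library and would also emit the EncryptionMethod element, which is left out here because no real algorithm is applied. The function names are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

BANK_NS = "http://globalbank.org"
XMLENC_NS = "http://www.w3.org/2001/04/xmlenc#"

ORDER = """<Payments xmlns='http://globalbank.org'>
<Payment>
<Name>John von Neumann</Name>
<Book>Godel, Escher, Bach: An Eternal Gold Braid</Book>
<CreditCard Limit='5000' Currency='Euro'>
<Number>4654 2445 0277 5567</Number>
</CreditCard>
</Payment>
</Payments>"""


def placeholder_cipher(plaintext: bytes) -> str:
    """Stand-in for a real cipher. base64 offers NO confidentiality."""
    return base64.b64encode(plaintext).decode("ascii")


def encrypt_element(parent: ET.Element, child: ET.Element) -> ET.Element:
    """Replace `child` of `parent` with an xmlenc-style EncryptedData element."""
    index = list(parent).index(child)
    encrypted = ET.Element(f"{{{XMLENC_NS}}}EncryptedData",
                           {"Type": XMLENC_NS + "Element"})
    cipher_data = ET.SubElement(encrypted, f"{{{XMLENC_NS}}}CipherData")
    cipher_value = ET.SubElement(cipher_data, f"{{{XMLENC_NS}}}CipherValue")
    # The whole serialized element (tags, attributes, content) is protected.
    cipher_value.text = placeholder_cipher(ET.tostring(child))
    parent.remove(child)
    parent.insert(index, encrypted)
    return encrypted


root = ET.fromstring(ORDER)
payment = root.find(f"{{{BANK_NS}}}Payment")
card = payment.find(f"{{{BANK_NS}}}CreditCard")
encrypt_element(payment, card)

print(ET.tostring(root, encoding="unicode"))
```

After the substitution the Book element is still readable, while the CreditCard element no longer appears in the document.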
XML Encryption
Original/Decrypted:
<?xml version="1.0" encoding="UTF-8"?>
<Customers>
<Customer>
<Name>Jose Aznar</Name>
<CreditCard>
<Number>1000 1234 5678 0001</Number>
<ExpiryDate>2003 June 30</ExpiryDate>
</CreditCard>
</Customer>
...
</Customers>
Encrypted:
<?xml version="1.0" encoding="UTF-8"?>
<Customers>
<Customer>
<Name><EncryptedData…></Name>
<CreditCard>
<Number><EncryptedData…></Number>
<ExpiryDate>2003 June 30</ExpiryDate>
</CreditCard>
</Customer>
...
</Customers>
JSON
What is JSON?
• “JSON” stands for “JavaScript Object Notation”
• Lightweight data-interchange format
• Despite the name, JSON is a (mostly)
language-independent way of specifying objects as
name-value pairs
• Structured representation of a data object
• Can be parsed by most modern languages
• JSON Schema can be used to validate a JSON file
• Very similar to XML
• But has no tags
JSON Syntax Rules
• Uses key/value pairs:
• {"name": "John"}
• Keys are always in double quotes; string values are in double quotes too
• Values must be of one of the specified types
• File extension is ".json"
• A value can be a string, a number, true, false, null, an object, or an array
• Strings are enclosed in double quotes, and can contain the usual
assortment of escaped characters
JSON Example
{
"name": "John Smith",
"age": 35,
"address": {
"street": "5 main St.",
"city": "Austin"
},
"children": ["Mary", "Abel"]
}
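As an illustration of "can be parsed with most modern languages", the example above can be loaded with Python's standard `json` module, which maps JSON objects to dictionaries and JSON arrays to lists. The variable names are chosen only for this sketch.

```python
import json

text = """{
  "name": "John Smith",
  "age": 35,
  "address": {
    "street": "5 main St.",
    "city": "Austin"
  },
  "children": ["Mary", "Abel"]
}"""

person = json.loads(text)            # JSON text -> Python objects

print(person["name"])                # string value
print(person["age"] + 1)             # numbers arrive as numbers, not strings
print(person["address"]["city"])     # nested object -> nested dictionary
print(person["children"][0])         # array -> list

round_trip = json.loads(json.dumps(person))  # serialize and parse again
```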
JSON Schema
• A JSON Schema allows you to specify what type of data can go into
your JSON files.
• It allows you to restrict the type of data entered.
JSON Schema
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"age": {
"type": "integer"
},
"address": {
"type": "object",
"properties": {
"street": {
"type": "string"
},
"city": {
"type": "string"
}
}
},
"children": {
"type": "array",
"items": {
"type": "string"
}
}
}
}
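To make the idea of "restricting the type of data" concrete, here is a hand-written Python sketch that checks a document against a schema for the John Smith example. It handles only the three keywords used there (type, properties, items) and is not the JSON Schema specification or any validator library; the function `matches` and the table `TYPE_CHECKS` are hypothetical names for this example.

```python
import json

SCHEMA = json.loads("""{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"},
    "address": {
      "type": "object",
      "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"}
      }
    },
    "children": {"type": "array", "items": {"type": "string"}}
  }
}""")

# Only the types that appear in the schema above are covered.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def matches(value, schema) -> bool:
    """Check `value` against the small schema subset described above."""
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](value):
        return False
    if isinstance(value, dict):
        for key, subschema in schema.get("properties", {}).items():
            if key in value and not matches(value[key], subschema):
                return False
    if isinstance(value, list) and "items" in schema:
        return all(matches(item, schema["items"]) for item in value)
    return True


good = {"name": "John Smith", "age": 35,
        "address": {"street": "5 main St.", "city": "Austin"},
        "children": ["Mary", "Abel"]}
bad_age = dict(good, age="35")        # age given as a string
bad_child = dict(good, children=[1])  # array item is not a string

print(matches(good, SCHEMA), matches(bad_age, SCHEMA), matches(bad_child, SCHEMA))
```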
Validating JSON file
• The following website can be used to validate a JSON file against a schema
https://www.jsonschemavalidator.net/
• Paste both the schema and the corresponding JSON file
JSONiq
• JSONiq queries JSON data using FLWOR expressions. It is an extension of XQuery.
JSON Injection
• Injection attacks in web applications are cyber attacks that seek
to inject malicious code into an application to alter its normal
execution. Injection attacks can lead to loss of data,
modification of data, and denial of service.
JSON Injection
Occurs when:
• Data from an untrusted source is not sanitized (validated) by
the server and is written directly to a JSON stream. This is
referred to as server-side JSON injection.
• Data from an untrusted source is not sanitized and parsed
directly. This is referred to as client-side JSON injection.
JSON Injection
• The data supplied by the user (username, password and
account type) is stored on the server side as a JSON string.
• Since the application does not sanitize the input data, a malicious
user can append unexpected data to their
username: richard%22,%22Account%22:%22administrator%22.
• Consequently, the resultant JSON string contains two account values.
When the string is read, the second account value
takes precedence (which is administrator).
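The attack can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the slides do not show the server's JSON layout, so the field order (Account before username) and the functions `build_unsafe` and `build_safe` are hypothetical, and the payload is the URL-decoded username from the slide with its trailing quote dropped so that the concatenated string stays valid JSON in this layout. Python's `json.loads` keeps the last value when a key is duplicated, which is the "second value takes precedence" behavior described above.

```python
import json


def build_unsafe(username: str, password: str) -> str:
    """Vulnerable: user input is pasted into the JSON text unescaped."""
    return ('{"Account":"user","username":"' + username +
            '","password":"' + password + '"}')


def build_safe(username: str, password: str) -> str:
    """Safer: let the JSON library escape the values."""
    return json.dumps({"Account": "user", "username": username,
                       "password": password})


# URL-decoded payload (%22 is a double quote), trailing quote omitted.
malicious = 'richard","Account":"administrator'

unsafe_record = json.loads(build_unsafe(malicious, "secret"))
safe_record = json.loads(build_safe(malicious, "secret"))

print(unsafe_record["Account"])  # the injected value wins
print(safe_record["Account"])    # the payload stays inside the username
```

Building the string with `json.dumps` escapes the embedded quotes, so the payload remains an odd-looking username instead of becoming a second Account key.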
JSON Injection
• Such attacks are very common and easy to carry out.
• They are possible if no appropriate security mechanisms are employed.