Uploaded by Paj Paj

60003200124 SWT Exp1

Academic Year 2023-24
SAP ID: 60003200124
DEPARTMENT OF INFORMATION TECHNOLOGY
COURSE CODE: DJ19ITC801
DATE: 31-01-2024
COURSE NAME: Semantic Web Technology Laboratory
CLASS: BE IT I2-1
EXPERIMENT NO. 1
CO/LO:
Apply Semantic web technologies to real world applications.
AIM:
Parsing the XML dataset and comparing different XML serialization formats
and their impact on data size and processing time
DESCRIPTION:
Parsing the XML dataset and comparing different XML serialization formats
and their impact on data size and processing time
IMPLEMENTATION:
XML FILE:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with
XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies, an
evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life for the
inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
<book id="bk105">
<author>Corets, Eva</author>
<title>The Sundered Grail</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-09-10</publish_date>
<description>The two daughters of Maeve, half-sisters, battle
one another for control of England. Sequel to Oberon's
Legacy.</description>
</book>
<book id="bk106">
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-09-02</publish_date>
<description>When Carla meets Paul at an ornithology conference,
tempers fly as feathers get ruffled.</description>
</book>
<book id="bk107">
<author>Thurman, Paula</author>
<title>Splish Splash</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-11-02</publish_date>
<description>A deep sea diver finds true love twenty
thousand leagues beneath the sea.</description>
</book>
<book id="bk108">
<author>Knorr, Stefan</author>
<title>Creepy Crawlies</title>
<genre>Horror</genre>
<price>4.95</price>
<publish_date>2000-12-06</publish_date>
<description>An anthology of horror stories about roaches,
centipedes, scorpions and other insects.</description>
</book>
<book id="bk109">
<author>Kress, Peter</author>
<title>Paradox Lost</title>
<genre>Science Fiction</genre>
<price>6.95</price>
<publish_date>2000-11-02</publish_date>
<description>After an inadvertant trip through a Heisenberg
Uncertainty Device, James Salway discovers the problems of
being quantum.</description>
</book>
<book id="bk110">
<author>O'Brien, Tim</author>
<title>Microsoft .NET: The Programming Bible</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-09</publish_date>
<description>Microsoft's .NET initiative is explored in detail in
this deep programmer's reference.</description>
</book>
<book id="bk111">
<author>O'Brien, Tim</author>
<title>MSXML3: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-01</publish_date>
<description>The Microsoft MSXML3 parser is covered in detail, with
attention to XML DOM interfaces, XSLT processing, SAX and
more.</description>
</book>
<book id="bk112">
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>49.95</price>
<publish_date>2001-04-16</publish_date>
<description>Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.</description>
</book>
</catalog>
PROCEDURE / ALGORITHM:
1. Open Notepad and create XML data
2. Save the XML data as a .XML file, .CXML file and a .BXML file.
3. Open Google Colab and upload your XML file.
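The steps above can be sketched in Python as well. This is a minimal sketch that assumes the .cxml and .bxml files are plain copies of the .xml file with a different extension; the report does not state how they were produced. The `SAMPLE_XML` string, the `make_copies` helper, and the temporary directory are hypothetical stand-ins for the Notepad step.

```python
import filecmp
import os
import shutil
import tempfile

# A tiny stand-in for the catalog; the real experiment uses the full book list.
SAMPLE_XML = """<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <price>44.95</price>
  </book>
</catalog>
"""


def make_copies(directory):
    """Write book.xml, then copy it unchanged to book.cxml and book.bxml."""
    xml_path = os.path.join(directory, "book.xml")
    with open(xml_path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_XML)
    paths = {"xml": xml_path}
    for ext in ("cxml", "bxml"):
        copy_path = os.path.join(directory, "book." + ext)
        shutil.copyfile(xml_path, copy_path)
        paths[ext] = copy_path
    return paths


workdir = tempfile.mkdtemp()
paths = make_copies(workdir)

# Size in bytes of each file, keyed by extension.
sizes = {ext: os.path.getsize(path) for ext, path in paths.items()}

# shallow=False forces a byte-by-byte comparison instead of a stat() check.
identical = all(
    filecmp.cmp(paths["xml"], paths[ext], shallow=False)
    for ext in ("cxml", "bxml")
)

print(sizes)
print("byte-identical:", identical)
```

If the files are produced this way, equal sizes are expected, because changing the extension does not change the bytes on disk.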
TECHNOLOGY STACK USED: Google Colab
SOURCE CODE:
import xml.etree.ElementTree as ET
import time
x=time.time()
tree = ET.parse('book.bxml')
print(time.time()-x)
root = tree.getroot()
print('root tag : ',end='')
print(root.tag)
print('\n\n')
Output
0.001077413558959961
root tag : catalog
import xml.etree.ElementTree as ET
import time
x=time.time()
tree = ET.parse('book.cxml')
print(time.time()-x)
root = tree.getroot()
print('root tag : ',end='')
print(root.tag)
print('\n\n')
Output
0.0016436576843261719
root tag : catalog
import xml.etree.ElementTree as ET
import time
x=time.time()
tree = ET.parse('book.xml')
print(time.time()-x)
root = tree.getroot()
print('root tag : ',end='')
print(root.tag)
print('\n\n')
Output
0.0015175342559814453
root tag : catalog
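Each timing above comes from a single call to ET.parse, and the values printed here differ from those listed in the observations section, which suggests run-to-run variation. The following is a hedged sketch of one way to reduce that noise: repeat the parse many times and report the median and minimum. The `time_parse` helper, the `repeats` value, and the in-memory `SAMPLE_XML` are assumptions for illustration, not part of the original experiment.

```python
import io
import statistics
import time
import xml.etree.ElementTree as ET

# Small in-memory document so the sketch does not depend on uploaded files.
SAMPLE_XML = b"""<catalog>
  <book id="bk101"><author>Gambardella, Matthew</author><price>44.95</price></book>
  <book id="bk102"><author>Ralls, Kim</author><price>5.95</price></book>
</catalog>"""


def time_parse(data, repeats=200):
    """Return (median, minimum) wall-clock seconds over `repeats` parses."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        ET.parse(io.BytesIO(data))
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples)


median_s, min_s = time_parse(SAMPLE_XML)
print(f"median {median_s:.6f} s, min {min_s:.6f} s")
```

To time the real files, the bytes of book.xml, book.cxml and book.bxml could be read once with open(path, 'rb').read() and passed to the same helper, so disk access is not part of the measurement.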
print('all children and attributes : ',end='')
# Iterating over an element yields its direct children
for child in root:
    print(child.tag, child.attrib)
print('\n\n')
Output
all children and attributes : book {'id': 'bk101'}
book {'id': 'bk102'}
book {'id': 'bk103'}
book {'id': 'bk104'}
book {'id': 'bk105'}
book {'id': 'bk106'}
book {'id': 'bk107'}
book {'id': 'bk108'}
book {'id': 'bk109'}
book {'id': 'bk110'}
book {'id': 'bk111'}
book {'id': 'bk112'}
print('first child of root element : ',end='')
print(root[0].tag,root[0].attrib)
print('\n\n')
Output
first child of root element : book {'id': 'bk101'}
print('Iterate through all the tags with a specific name book : ',end='')
for x in root.iter('book'):
    print(x.attrib)
print('\n\n')
Output
Iterate through all the tags with a specific name book : {'id': 'bk101'}
{'id': 'bk102'}
{'id': 'bk103'}
{'id': 'bk104'}
{'id': 'bk105'}
{'id': 'bk106'}
{'id': 'bk107'}
{'id': 'bk108'}
{'id': 'bk109'}
{'id': 'bk110'}
{'id': 'bk111'}
{'id': 'bk112'}
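The traversal cells above print only tags and attributes. As a minimal sketch, the text of the child elements can be read in the same loop with findtext. The two-book `SAMPLE_XML` and the `records` list are illustrative assumptions; the element names match the catalog used in this experiment.

```python
import xml.etree.ElementTree as ET

# Two books taken from the catalog, reduced to the fields used below.
SAMPLE_XML = """<catalog>
  <book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><price>44.95</price></book>
  <book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><price>5.95</price></book>
</catalog>"""

root = ET.fromstring(SAMPLE_XML)

records = []
for book in root.findall("book"):  # direct <book> children of <catalog>
    records.append({
        "id": book.get("id"),                     # attribute value
        "title": book.findtext("title"),          # text of the <title> child
        "price": float(book.findtext("price")),   # text converted to a number
    })

for record in records:
    print(record)
```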
# Add 1.0 to every price and mark the element as updated
for x in root.iter('price'):
    new_rank = float(x.text) + 1.0
    x.text = str(new_rank)
    x.set('updated', 'yes')
    print(new_rank)
Output
45.95
6.95
6.95
6.95
6.95
5.95
5.95
5.95
7.95
37.95
37.95
50.95
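# Intended to remove entries with a price above 10. findall('price') searches
# only the direct children of <catalog>, which are all <book> elements, so this
# loop matches nothing: it prints nothing and no book is removed.
for x in root.findall('price'):
    rank = int(x.find('price').text)
    if rank > 10:
        root.remove(x)
    print(rank)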
new_book = ET.Element("book")
new_book.set("id", "bk222")
author_element = ET.Element("author")
author_element.text = "Suraj"
new_book.append(author_element)
title_element = ET.Element("title")
title_element.text = "Suraj"
new_book.append(title_element)
genre_element = ET.Element("genre")
genre_element.text = "Suraj"
new_book.append(genre_element)
price_element = ET.Element("price")
price_element.text = "44.9"
new_book.append(price_element)
publish_date_element = ET.Element("publish_date")
publish_date_element.text = "Suraj"
new_book.append(publish_date_element)
dscp_element = ET.Element("description")
dscp_element.text = "Suraj"
new_book.append(dscp_element)
root.append(new_book)
tree.write('output.xml')
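The removal loop in the code above calls root.findall('price'), which looks only at direct children of the root element, so it finds no elements in this catalog. A hedged sketch of a loop that does remove books above a price threshold is given here. It runs on a hypothetical three-book sample, and the threshold of 10 is taken from the original loop. Applying it to the full catalog would remove books that the output.xml in this report still lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample using already-updated prices from the experiment.
SAMPLE_XML = """<catalog>
  <book id="bk101"><price>45.95</price></book>
  <book id="bk102"><price>6.95</price></book>
  <book id="bk110"><price>37.95</price></book>
</catalog>"""

root = ET.fromstring(SAMPLE_XML)

# <price> is a grandchild of <catalog>, so a direct-child search finds nothing.
print(root.findall("price"))  # prints []

# Iterate over the <book> children instead. list(...) makes an explicit
# snapshot so that removing elements does not disturb the iteration.
for book in list(root.findall("book")):
    price = float(book.find("price").text)  # prices are decimals, so use float
    if price > 10:
        root.remove(book)

remaining = [book.get("id") for book in root]
print(remaining)  # prints ['bk102']
```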
output.xml
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price updated="yes">45.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with
XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price updated="yes">6.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen of
the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price updated="yes">6.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price updated="yes">6.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life for the
inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
<book id="bk105">
<author>Corets, Eva</author>
<title>The Sundered Grail</title>
<genre>Fantasy</genre>
<price updated="yes">6.95</price>
<publish_date>2001-09-10</publish_date>
<description>The two daughters of Maeve, half-sisters, battle
one another for control of England. Sequel to Oberon's
Legacy.</description>
</book>
<book id="bk106">
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price updated="yes">5.95</price>
<publish_date>2000-09-02</publish_date>
<description>When Carla meets Paul at an ornithology conference,
tempers fly as feathers get ruffled.</description>
</book>
<book id="bk107">
<author>Thurman, Paula</author>
<title>Splish Splash</title>
<genre>Romance</genre>
<price updated="yes">5.95</price>
<publish_date>2000-11-02</publish_date>
<description>A deep sea diver finds true love twenty
thousand leagues beneath the sea.</description>
</book>
<book id="bk108">
<author>Knorr, Stefan</author>
<title>Creepy Crawlies</title>
<genre>Horror</genre>
<price updated="yes">5.95</price>
<publish_date>2000-12-06</publish_date>
<description>An anthology of horror stories about roaches,
centipedes, scorpions and other insects.</description>
</book>
<book id="bk109">
<author>Kress, Peter</author>
<title>Paradox Lost</title>
<genre>Science Fiction</genre>
<price updated="yes">7.95</price>
<publish_date>2000-11-02</publish_date>
<description>After an inadvertant trip through a Heisenberg
Uncertainty Device, James Salway discovers the problems of
being quantum.</description>
</book>
<book id="bk110">
<author>O'Brien, Tim</author>
<title>Microsoft .NET: The Programming Bible</title>
<genre>Computer</genre>
<price updated="yes">37.95</price>
<publish_date>2000-12-09</publish_date>
<description>Microsoft's .NET initiative is explored in detail in
this deep programmer's reference.</description>
</book>
<book id="bk111">
<author>O'Brien, Tim</author>
<title>MSXML3: A Comprehensive Guide</title>
<genre>Computer</genre>
<price updated="yes">37.95</price>
<publish_date>2000-12-01</publish_date>
<description>The Microsoft MSXML3 parser is covered in detail, with
attention to XML DOM interfaces, XSLT processing, SAX and
more.</description>
</book>
<book id="bk112">
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<price updated="yes">50.95</price>
<publish_date>2001-04-16</publish_date>
<description>Microsoft Visual Studio 7 is explored in depth, looking
at how Visual Basic, Visual C++, C#, and ASP+ are integrated into a
comprehensive development environment.</description>
</book>
<book id="bk222"><author>Suraj</author><title>Suraj</title><genre>Suraj</genre><price>44.9</price><publish_date>Suraj</publish_date><description>Suraj</description></book></catalog>
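In output.xml the appended book ends up on a single line because the new elements carry no whitespace between them. A minimal sketch of one way to get indented output, assuming Python 3.9 or later where xml.etree.ElementTree.indent is available, is shown here on a reduced catalog; the two-space indent is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Reduced catalog with one existing book and no whitespace between elements.
root = ET.fromstring(
    "<catalog><book id='bk101'><author>Gambardella, Matthew</author></book></catalog>"
)

# Append a new book, as in the experiment, using SubElement for brevity.
new_book = ET.SubElement(root, "book", {"id": "bk222"})
ET.SubElement(new_book, "author").text = "Suraj"

# Insert newlines and indentation in place (requires Python 3.9+).
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

With a tree loaded by ET.parse, calling ET.indent(tree) before tree.write('output.xml') should have the same effect on the written file.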
OBSERVATIONS / DISCUSSION OF RESULT:
Comparing different XML serialization formats and their impact on data
size and processing time
XML:
Size: 4,548 bytes
Time to parse: 0.0014450550079345703 s
CXML:
Size: 4,548 bytes
Time to parse: 0.0010936260223388672 s
BXML:
Size: 4,548 bytes
Time to parse: 0.001340627670288086 s
The file size is the same for all three. Ordered from fastest to slowest, the
parsing times are CXML, BXML, XML.
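The three files measured above have the same size. For contrast, the following hedged sketch shows one way two serializations of the same data can differ in size: an indented form and a compact form with the whitespace between elements removed. This is not the CXML or BXML format from the experiment; the `compact` helper and the sample document are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Indented sample; the whitespace between elements counts toward file size.
PRETTY_XML = """<catalog>
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <genre>Computer</genre>
        <price>44.95</price>
    </book>
    <book id="bk102">
        <author>Ralls, Kim</author>
        <genre>Fantasy</genre>
        <price>5.95</price>
    </book>
</catalog>"""


def compact(xml_text):
    """Serialize the same elements without whitespace-only text between tags."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None
    return ET.tostring(root, encoding="unicode").encode("utf-8")


pretty_bytes = PRETTY_XML.encode("utf-8")
compact_bytes = compact(PRETTY_XML)

print("indented:", len(pretty_bytes), "bytes")
print("compact: ", len(compact_bytes), "bytes")
```

Both byte strings parse to the same elements, attributes and text, so a size or timing difference between them would come from the serialization rather than from the data.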
CONCLUSION:
In this experiment, three different XML serialization formats, namely XML, CXML, and BXML,
were created and analyzed for their impact on data size and processing time in the context of
parsing using Google Colab. While all three formats resulted in identical file sizes of 4,548
bytes, the parsing time varied slightly. Notably, CXML demonstrated the fastest parsing time
at 0.0010936 seconds, followed by BXML at 0.0013406 seconds, and XML at 0.0014451
seconds, although the differences are below 0.0004 seconds. The experiment indicates that the
choice of serialization format can influence parsing efficiency, with CXML performing best in
terms of processing time. These findings emphasize the importance of considering both file
size and parsing time when selecting an XML serialization format for specific applications or
systems.