TweaXML

A Language to manipulate & extract data from XML files
Kaushal Kumar (kk2457)
Srinivasa Valluripalli (sv2232)
Contents
• Overview and motivation
• Language features
• XML handling functionalities
• Architectural design
• Tutorial (with example)
• Lessons learned
• Summary
Overview and Motivation
• TweaXML is a language for parsing XML files, extracting data from them, and
writing that data to new csv/txt files in user-defined formats.
• XML is widely used to pass data between heterogeneous systems.
• However, parsing an XML file programmatically is not straightforward.
• To parse an XML file:
  • First you need to learn a general-purpose language (Java, for example).
  • Then you need to learn an API such as the DOM parser or the SAX parser.
  • Using these APIs can be complicated.
• TweaXML provides a much simpler language for parsing XML files. It also
provides a way to write the extracted data to output files in user-defined
formats.
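To make the point about API overhead concrete, here is a minimal sketch of what reading the student names from the tutorial's input looks like when written directly against Java's standard DOM API. The class and method names (DomNamesSketch, studentNames) are hypothetical, and the XML is parsed from a string rather than a file so the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of TweaXML: plain DOM code for one simple task.
final class DomNamesSketch {

    // Returns the text of the first <name> element under every <student> element.
    static List<String> studentNames(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList students = doc.getDocumentElement().getElementsByTagName("student");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < students.getLength(); i++) {
                Element student = (Element) students.item(i);
                // Assumes every <student> has a <name>; a missing one would fail here.
                names.add(student.getElementsByTagName("name").item(0).getTextContent());
            }
            return names;
        } catch (Exception e) {
            // Parser configuration, I/O and SAX errors are all checked exceptions.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<students>"
                + "<student><name>kaushal</name></student>"
                + "<student><name>Srini</name></student>"
                + "</students>";
        System.out.println(studentNames(xml));
    }
}
```

Even for this small task the caller has to deal with a factory, a builder, casts from Node to Element, and several checked exceptions, which is the kind of detail TweaXML aims to hide.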
Language Features
• Carefully chosen set of keywords
• Multiple types (int, string, node, file, array)
• Several operators
  • Unary operators (~, !)
  • Arithmetic operators (+, -, *, /)
  • Comparison operators (<, <=, >, >=, ==, !=)
  • Logical operators (&&, ||)
  • Node operators (getchild, getvalue)
  • File operators (open, create, print, close)
• Built-in functions (add, subtract, multiply, divide, length)
Language Features (cont.)
• Various types of statements
  • Conditional statements (if … else)
  • Iterative statements (while)
  • Jump statements (return, continue, break)
  • I/O statements (open, create, print, close)
  • Built-in function calls (add, subtract, multiply, divide, length)
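The tutorial program calls the arithmetic built-ins on string values (for example, divide totalMarks "4"). The sketch below shows one way such string-based arithmetic could behave in Java. It is an illustration under assumptions, not TweaXML's actual implementation: the class name ArithmeticBuiltinsSketch, the use of BigDecimal, and the choice to reject non-numeric operands with an exception are all assumptions.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical model of add/subtract/multiply/divide over string operands.
final class ArithmeticBuiltinsSketch {

    // Converts an operand to a number; non-numeric text is rejected (assumed behaviour).
    private static BigDecimal number(String value) {
        try {
            return new BigDecimal(value.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("not a number: " + value, e);
        }
    }

    // Renders a result without trailing zeros or exponent notation.
    private static String text(BigDecimal value) {
        if (value.signum() == 0) {
            return "0";
        }
        return value.stripTrailingZeros().toPlainString();
    }

    static String add(String a, String b) {
        return text(number(a).add(number(b)));
    }

    static String subtract(String a, String b) {
        return text(number(a).subtract(number(b)));
    }

    static String multiply(String a, String b) {
        return text(number(a).multiply(number(b)));
    }

    // Rounds to 16 significant digits so non-terminating quotients do not throw.
    static String divide(String a, String b) {
        return text(number(a).divide(number(b), MathContext.DECIMAL64));
    }

    public static void main(String[] args) {
        String total = add(add(add("85", "85"), "70"), "90");
        System.out.println(total);              // prints 330
        System.out.println(divide(total, "4")); // prints 82.5
    }
}
```

The two printed values correspond to the first student in the tutorial: a total of 330 marks and an average of 82.5.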
XML Handling functionalities
• Open an XML file for reading (open)
  • Returns the root node of the XML file
• Get the child nodes of a node, using the XPath of the child nodes (getchild)
  • Returns an array of child nodes
• Get the length of a child-node array (length)
• Get the value of a node (getvalue)
  • Returns the value of the node as a string
• Add the values of two nodes (add)
  • Data types are checked implicitly
• Subtract the values of two nodes (subtract)
• Multiply the values of two nodes (multiply)
• Divide the values of two nodes (divide)
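The node operations listed above map fairly directly onto Java's standard DOM and XPath APIs. The following is a rough sketch of that mapping, not the code TweaXML actually generates: the class NodeOpsSketch and its method bodies are assumptions, and open takes XML text instead of a file name so the example runs on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical Java counterparts of open, getchild, length and getvalue.
final class NodeOpsSketch {

    // open: parse the document and return its root node.
    static Node open(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // getchild: evaluate an XPath relative to the parent and return the matches as an array.
    static Node[] getchild(Node parent, String xpath) {
        try {
            NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, parent, XPathConstants.NODESET);
            Node[] nodes = new Node[found.getLength()];
            for (int i = 0; i < nodes.length; i++) {
                nodes[i] = found.item(i);
            }
            return nodes;
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + xpath, e);
        }
    }

    // length: number of nodes in the array.
    static int length(Node[] nodes) {
        return nodes.length;
    }

    // getvalue: the node's text content as a string.
    static String getvalue(Node node) {
        return node.getTextContent();
    }

    public static void main(String[] args) {
        Node root = open("<students>"
                + "<student><name>kaushal</name><final>90</final></student>"
                + "<student><name>Srini</name><final>95</final></student>"
                + "</students>");
        Node[] students = getchild(root, "student");
        System.out.println(length(students));                            // prints 2
        System.out.println(getvalue(getchild(students[1], "name")[0])); // prints Srini
    }
}
```

An XPath that matches nothing yields an empty array in this sketch, which is why the tutorial's check of the array length before indexing is a sensible habit.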
File Handling functionalities
• Create an output file for writing (create)
  • Returns a value of type file
• Write to the file (print)
• Close the output file once you are done (close)
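The create, print and close operations correspond to an ordinary writer life cycle. Here is a minimal sketch of that life cycle in Java, assuming a buffered UTF-8 writer stands in for TweaXML's file type; the class FileOpsSketch and the readAll helper are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical Java counterparts of create, print and close.
final class FileOpsSketch {

    // create: open (or truncate) the output file and return a handle to it.
    static Writer create(Path path) {
        try {
            return Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // print: append text to the file exactly as given, with no implicit newline.
    static void print(Writer out, String text) {
        try {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // close: flush buffered text and release the file.
    static void close(Writer out) {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Illustration-only helper: read the whole file back as a string.
    static String readAll(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("AvgMarks", ".csv");
        Writer output = create(path);
        print(output, "kaushal");
        print(output, "\t");
        print(output, "82.5");
        print(output, "\n");
        close(output);
        System.out.print(readAll(path));
    }
}
```

With a buffered writer, text may not reach the disk until close is called, which is one reason closing the output file at the end of a program matters.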
Architectural Design
• Front end: TweaXMLLexer & TweaXMLParser
• Tree walker: TweaXmlWalker & TweaXmlCodeGen
• Back end: CodeGen.java
• Run-time libraries: Apache’s DOM Parser
Tutorial - Example
(A TweaXML program that extracts students’ performance data and
creates a csv file with the average marks of each student)
Input XML file: (marks_data.xml)
<students>
    <student>
        <name>kaushal</name>
        <homework1>85</homework1>
        <homework2>85</homework2>
        <midterm>70</midterm>
        <final>90</final>
    </student>
    <student>
        <name>Srini</name>
        <homework1>80</homework1>
        <homework2>85</homework2>
        <midterm>87</midterm>
        <final>95</final>
    </student>
    …
    …
</students>
TweaXML program:
start(){
    file output;
    node rootNode;
    output = create "AvgMarks.csv";
    rootNode = open "marks_data.xml";
    node studentNodes[];
    studentNodes = getchild rootNode "student";
    int len;
    len = length studentNodes;
    if(len > 0)
    {
        int j;
        j = 0;
        while(j < len)
        {
            node nameNode[], homework1Node[], homework2Node[], midtermNode[],
                 finalNode[];
            string name, homework1Marks, homework2Marks, midtermMarks,
                   finalMarks;
            nameNode = getchild studentNodes[j] "name";
            homework1Node = getchild studentNodes[j] "homework1";
            homework2Node = getchild studentNodes[j] "homework2";
            midtermNode = getchild studentNodes[j] "midterm";
            finalNode = getchild studentNodes[j] "final";
            name = getvalue nameNode[0];
            homework1Marks = getvalue homework1Node[0];
            homework2Marks = getvalue homework2Node[0];
            midtermMarks = getvalue midtermNode[0];
            finalMarks = getvalue finalNode[0];
            string totalMarks;
            totalMarks = add homework1Marks homework2Marks;
            totalMarks = add totalMarks midtermMarks;
            totalMarks = add totalMarks finalMarks;
            string avgMarks;
            avgMarks = divide totalMarks "4";
            print output name;
            print output "\t";
            print output avgMarks;
            print output "\n";
            j = j + 1;
        }
    }
    close output;
}
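For comparison, and as a way to check the expected averages by hand, here is a rough Java sketch of the same computation written directly against the DOM API. It is not the code that TweaXML generates. The class name AvgMarksSketch is hypothetical, the XML is passed in as a string, and the result is returned as text instead of being written to AvgMarks.csv.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.math.MathContext;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical plain-Java version of the tutorial program.
final class AvgMarksSketch {

    private static final String[] MARK_TAGS = {"homework1", "homework2", "midterm", "final"};

    // Returns one "name<TAB>average" line per <student> element.
    static String averages(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        StringBuilder out = new StringBuilder();
        NodeList students = root.getElementsByTagName("student");
        for (int j = 0; j < students.getLength(); j++) {
            Element student = (Element) students.item(j);
            BigDecimal total = BigDecimal.ZERO;
            for (String tag : MARK_TAGS) {
                total = total.add(new BigDecimal(firstText(student, tag)));
            }
            // Four marks per student, as in the tutorial's divide totalMarks "4".
            BigDecimal avg = total.divide(new BigDecimal("4"), MathContext.DECIMAL64);
            out.append(firstText(student, "name")).append('\t')
               .append(avg.stripTrailingZeros().toPlainString()).append('\n');
        }
        return out.toString();
    }

    // Text of the first descendant element with the given tag (assumed to exist).
    private static String firstText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<students>"
                + "<student><name>kaushal</name><homework1>85</homework1>"
                + "<homework2>85</homework2><midterm>70</midterm><final>90</final></student>"
                + "<student><name>Srini</name><homework1>80</homework1>"
                + "<homework2>85</homework2><midterm>87</midterm><final>95</final></student>"
                + "</students>";
        System.out.print(averages(xml));
    }
}
```

For the two students shown in the input file, the totals are 330 and 347, so the averages are 82.5 and 86.75.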
Output
Output file: (AvgMarks.csv)
kaushal 82.5
Srini 86.75
…
…
Lessons Learned
• Start early on the project
• More functionality could have been added
• More data types could have been provided
• User defined functions could have been added
Summary
• TweaXML provides an easier way to work with XML files.
• Data can be extracted and written out in user-defined formats.
• There is no need to learn APIs such as the DOM and SAX parsers.
• It’s not perfect, but it’s highly useful.
• More functionality could have been provided given more time.