TweaXML
A Language to Manipulate & Extract Data from XML Files
Kaushal Kumar (kk2457)
Srinivasa Valluripalli (sv2232)

Contents
• Overview and motivation
• Language features
• XML handling functionalities
• Architectural design
• Tutorial (with example)
• Lessons learned
• Summary

Overview and Motivation
• TweaXML is a language to parse and extract data from XML files and create new csv/txt files in user-defined data formats.
• XML is a widely used format for passing data between heterogeneous systems.
• However, parsing an XML file programmatically is not straightforward.
• To parse an XML file:
  • First you need to learn a general-purpose language (Java, for example).
  • Then you need to learn APIs such as the DOM parser and the SAX parser.
  • Using these APIs can be complicated.
• TweaXML provides a much simpler language to parse XML files. Moreover, it provides a way to create output files containing this data in user-defined formats.

Language Features
• Carefully chosen set of keywords
• Multiple types (int, string, node, file, array)
• Several operators
  • Unary operators (~, !)
  • Arithmetic operators (+, -, *, /)
  • Comparison operators (<, <=, >, >=, ==, !=)
  • Logical operators (&&, ||)
  • Node operators (getchild, getvalue)
  • File operators (open, create, print, close)
  • Inbuilt functions (add, subtract, multiply, divide, length)

Language Features (cont)
• Various types of statements
  • Conditional statements (if … else)
  • Iterative statements (while)
  • Jump statements (return, continue, break)
  • I/O statements (open, create, print, close)
  • Inbuilt function calls (add, subtract, multiply, divide, length)

XML Handling functionalities
• Open an XML file to read (open)
  • returns the root node of the XML file
• Get the child nodes of a node, using the xpath of the child nodes (getchild)
  • returns an array of child nodes
• Get the length of the child-nodes array (length)
• Get the value of a node (getvalue)
  • returns the value of the node in string format
• Add the values of two nodes (add)
  • implicit checks of data types
• Subtract the values of two nodes (subtract)
• Multiply the values of two nodes (multiply)
• Divide the values of two nodes (divide)

File Handling functionalities
• Create an output file to write (create)
  • returns a value of type file
• Write to the file (print)
• Close the output file once you are done (close)

Architectural Design
• Front end (TweaXMLLexer & TweaXMLParser)
• Tree walker (TweaXmlWalker & TweaXmlCodeGen)
• Back end (CodeGen.java)
• Run-time libraries (Apache’s DOM Parser)

Tutorial - Example
(A TweaXML program that extracts students’ performance data and creates a csv file with the average marks of each student)

Input XML file: (marks_data.xml)

    <students>
      <student>
        <name>kaushal</name>
        <homework1>85</homework1>
        <homework2>85</homework2>
        <midterm>70</midterm>
        <final>90</final>
      </student>
      <student>
        <name>Srini</name>
        <homework1>80</homework1>
        <homework2>85</homework2>
        <midterm>87</midterm>
        <final>95</final>
      </student>
      … …
    </students>

TweaXML program:

    start(){
        file output;
        node rootNode;
        output = create "AvgMarks.csv";
        rootNode = open "marks_data.xml";
        node studentNodes[];
        studentNodes = getchild rootNode "student";
        int len;
        len = length studentNodes;
        if(len > 0) {
            int j;
            j=0;
            while(j < len) {
                node nameNode[], homework1Node[], homework2Node[], midtermNode[], finalNode[];
                string name, homework1Marks, homework2Marks, midtermMarks, finalMarks;
                nameNode = getchild studentNodes[j] "name";
                homework1Node = getchild studentNodes[j] "homework1";
                homework2Node = getchild studentNodes[j] "homework2";
                midtermNode = getchild studentNodes[j] "midterm";
                finalNode = getchild studentNodes[j] "final";
                name = getvalue nameNode[0];
                homework1Marks = getvalue homework1Node[0];
                homework2Marks = getvalue homework2Node[0];
                midtermMarks = getvalue midtermNode[0];
                finalMarks = getvalue finalNode[0];
                string totalMarks;
                totalMarks = add homework1Marks homework2Marks;
                totalMarks = add totalMarks midtermMarks;
                totalMarks = add totalMarks finalMarks;
                string avgMarks;
                avgMarks = divide totalMarks "4";
                print output name;
                print output "\t";
                print output avgMarks;
                print output "\n";
                j = j + 1;
            }
        }
        close output;
    }

Output
Output file: (AvgMarks.csv)

    kaushal 82.5
    Srini 86.75
    … …

Lessons Learned
• Start early on the project
• More functionalities could have been added
• More data types could have been provided
• User-defined functions could have been added

Summary
• TweaXML provides an easier way to deal with XML files.
• Data can be extracted and written out in user-defined formats.
• No need to learn APIs like DOMParser and SAXParser
• It’s not perfect, but it’s highly useful.
• More functionalities could have been provided if given more time.