XML-based Testing of Web Software Services
Jeff Offutt
Software Engineering
George Mason University
Fairfax, VA USA
www.cs.gmu.edu/~offutt/
offutt@gmu.edu
Joint research with Wuxhi Xu, Juan Luo and Suet Chun Lee

Need for Reliable Web Apps
• Expedia sells more than $35 million in tickets weekly
   – Based on 2000 data
• In Dec 2006, amazon.com’s BOGO offer turned into a double discount
• Huge losses on web application failures
   – Financial services: $6.5 million per hour
   – Credit card sales applications: $2.4 million per hour
   – Media companies: $150,000 per hour
• Most faults introduced during maintenance
• Most security vulnerabilities are due to software faults

STEP 2008 © Jeff Offutt

Research in Software Testing
Growth fueled by need for better software
• Research Laboratories: Microsoft, Google, Siemens, …
• University / Industry Partnerships: FedEx / U Memphis
• Conferences & Workshops: ICST, ISSRE, STEP workshop, MBT, AST, …
• University Groups: FIT, GMU, UNL, …
Useful ways to build better software

Travel Information Flow
[Diagram: information flows among Traveler, Agent, Airline, Airport, Hotel, Organizers, Colleagues, and Spouse; flow labels include name, dates, airports, flights, seats, reservation, confirm, checkin, gate, room, checkout, receipt, meeting rooms, phones, schedules, and contact]

Travel Information Needed
[Diagram: the travel information flow, overlaid with the list below]
Information Needed
• Conference schedule
• Hotel address & phone number
• Flight numbers and times
• Gate numbers
• Seat numbers
• Hotel confirmation number
• Hotel room number
• Meeting rooms and times
• Local contact information for colleagues
• …
And after returning home … all this information is immediately discarded!

Current Method
• Most of us accumulate this information from
   – Email
   – Websites
   – Phone conversations
   – Personal conversations
   – Pieces of paper
• And then try to organize and track it
   – In our heads
   – Random scraps of paper
   – Laboriously hand-entering data into hand-held devices
This is very 20th century analog …

21st Century Method
• Data are sent to a hand-held device wirelessly
   – Hand-held device automatically organizes data into meaningful information
• Information is presented to traveler when needed
Data are generated and sent through software in a service oriented architecture

Web Services
• A Web Service is a program that offers services over the Internet to other software programs
   – Internet-based
   – Uses SOAP and XML
   – Peer-to-peer communication via message passing
• Web service components can integrate dynamically, by finding other services during execution
• Web services transmit data formatted in XML
What does a service oriented architecture look like?

Web Service Architecture
[Diagram: client-server web apps (clients reaching a server over the internet) contrasted with web services (multiple servers; clients include PDA, workstation, cell phone, and laptop)]

Web Service Technologies
[Diagram: Applications, Components, Services, and Wrapped Legacy Systems connected by Find, Bind, and Publish operations over SOAP / XML, with a UDDI Registry that points to the URL of WSDL Specifications]
Messages transmitted in XML

Why XML?
• Software components that pass data must agree on format, types, and organization
• Web services have unique requirements:
   – Very loose coupling and dynamic integration
1970s style: P1 and P2 exchange data through a file
   – File storage
   – Un-documented format
   – Data saved in binary mode
   – Source not available
1980s style: P1 and P2 exchange data through a file, accessed through a wrapper module (WM)
   – File storage
   – Un-documented format
   – Data saved as plain text
   – Access through wrapper module
   – Data hard to validate

XML
• Data is passed directly between components
• XML allows for self-documenting data
• P1, P2 and P3 can see the format, contents, and structure of the data
• Data sharing is independent of type
• Format is easy to understand
• Grammars are defined in schemas
[Diagram: 2000s style; P1, P2 and P3 share an XML file through a parser and schemas]

XML for Flight Example
<flights>
  <flight>
    <airline>USAir 2608</airline>
    <origin>IAD</origin>
    <destination>CLT</destination>
    <leavetime>10:50:00</leavetime>
    <arrivetime>12:11:00</arrivetime>
    <date>2008-05-04</date>
  </flight>
</flights>
• XML messages are defined by grammars (schemas)
• Schemas can define many kinds of types
• Schemas include “facets,” which refine the grammar
Schemas define input spaces for software components

Input Space Grammars
Input Space: The set of allowable inputs to software
• The input space can be described in many ways
   – User manuals
   – Unix man pages
   – Method signature / Collection of method preconditions
   – A language
• Most input spaces can be described as grammars
• Grammars are usually not provided, but creating them is a valuable service by the tester
   – Errors will often be found simply by creating the grammar

Using Input Grammars
• Software should reject or handle invalid data
• Programs often do this incorrectly
• Some programs (rashly) assume all input data is correct
• Even if it works today …
   – What about after the program goes through some maintenance changes?
   – What about if the component is reused in a new program?
• Consequences can be severe …
   – The database can be corrupted
   – Users are not satisfied
   – Most security vulnerabilities are due to unhandled exceptions … from invalid data

Validating Inputs
Input Validation: Deciding if input values can be processed by the software
• Before starting to process inputs, wisely written programs check that the inputs are valid
• How should a program recognize invalid inputs?
• What should a program do with invalid inputs?
• If the input space is described as a grammar, a parser can check for validity automatically
   – This is very rare
   – It is easy to write input checkers, but also easy to make mistakes

Representing Input Domains
• Desired inputs (goal domain)
• Described inputs (specified domain)
• Accepted inputs (implemented domain)

Representing Input Domains
• Goal domains are often irregular
• Goal domain for credit cards†
   – First digit is the Major Industry Identifier
   – First 6 digits and length specify the issuer
   – Final digit is a “check digit”
   – Other digits identify a specific account
• Common specified domain
   – First digit is in { 3, 4, 5, 6 } (travel and banking)
   – Length is between 13 and 16
• Common implemented domain
   – All digits are numeric
† More details are on: http://www.merriampark.com/anatomycc.htm

Representing Input Domains
[Diagram: overlapping regions for the goal domain, specified domain, and implemented domain, with the region where they disagree highlighted]
This region is a rich source of software errors …

Testing Web Services
• This form of testing allows us to focus on interactions among the components
• A formal model of the XML grammar is used
• The grammar is used to create valid as well as invalid tests
• The grammar is mutated
• The mutated grammar is used to generate new XML messages
• The XML messages are used as test cases

XML Data Model Example
<xsd:element name="flights">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="flight" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="airline" type="xsd:string"/>
            <xsd:element name="origin" type="airportType"/>
            <xsd:element name="destination" type="airportType"/>
            <xsd:element name="leavetime" type="xsd:time"/>
            <xsd:element name="arrivetime" type="xsd:time"/>
            <xsd:element name="date" type="xsd:date"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

Built-in types: xsd:string, xsd:time, xsd:date

<xsd:simpleType name="airportType">
  <xsd:restriction base="xsd:string">
    <xsd:length value="3"/>
  </xsd:restriction>
</xsd:simpleType>

XML Constraints – “Facets”
[Table: facets grouped as Boundary Constraints and Non-boundary Constraints; facets listed: maxOccurs, enumeration, minOccurs, length, maxExclusive, use, fractionDigits, pattern, maxInclusive, maxLength, minExclusive, minInclusive, nillable, whiteSpace, unique, minLength, totalDigits]

XML Data Model
An XML schema can be modeled as a tree T = (N, D, X, E, n_r)
• N is a finite set of element and attribute nodes
• D is a finite set of built-in and derived data types
• X is a finite set of constraints
• E is a finite set of edges
   – Edges are from N to N ∪ D, plus a constraint
• n_r is the root node

Generating Tests
• Valid tests
   – Generate tests as XML messages by deriving strings from grammar
   – Take every production at least once
   – Take choices … maxOccurs = "unbounded" means use 0, 1 and more than 1
• Invalid tests
   – Mutate the grammar in structured ways
   – Create XML messages that are “almost” valid
   – This explores the gray space between the goal, specified, and implemented domains

Mutation Operators
These operators are designed to mimic common XML errors
1. Nonterminal Replacement
   Every nonterminal symbol in a production is replaced by other nonterminal symbols.
2. Terminal Replacement
   Every terminal symbol in a production is replaced by other terminal symbols.
3. Terminal and Nonterminal Deletion
   Every terminal and nonterminal symbol in a production is deleted.
4. Terminal and Nonterminal Duplication
   Every terminal and nonterminal symbol in a production is duplicated.

Test Case Generation
• A test case is an XML message
• Tests are generated directly from mutated schemas
• Constraints are “violated” systematically
   – Values beyond the boundary values
      • “maxLength=5” gives “abcdef”
   – Values outside the non-boundary constraints
      • “fractionDigits=2” gives “456.324”
• Multiple XML messages from the same schema
• Messages are invalid, so a valid response is an error
   – False positives: Messages that are accidentally valid

Test Case Generation – Examples
Original Schema (Partial)
<xs:simpleType name="priceType">
  <xs:restriction base="xs:decimal">
    <xs:fractionDigits value="2"/>
    <xs:maxInclusive value="1000.00"/>
  </xs:restriction>
</xs:simpleType>

Mutants of fractionDigits: value = "3", value = "1"
Mutants of maxInclusive: value = "100", value = "2000"

XML from Original Schema
<books>
  <book>
    <ISBN>0-201-74095-8</ISBN>
    <price>37.95</price>
    <year>2002</year>
  </book>
</books>

Mutant XML 1 to 4 repeat the message above with only the <price> value changed. The replacement values overlaid on 37.95 on the slide are 505, 5, 101.00, and 1001.00.

Case Study 1
• Small web service created at GMU
   – Three components: Mars robot, space station, ground control
   – Ground control is a three-tier web application
• Correct behavior is to have abnormal responses
   – Receiver cannot process the data, responds with a fault
   – Receiver has a runtime exception
• Three types of mutants
   – Deletion (D), Insertion (I), Constraint Violation (CV)

                      D     I    CV   Original   Total
XML Schemas          23    25    11       4        63
XML Messages         64   103    53      12       232
Abnormal Response    64   103    11       0       178
Normal Response       0     0    42      12        54

Only CV tests got normal responses

Case Study 2
• From Web Services Interoperability Organization
   – Supply chain management
   – Seven XML schemas
   – Three were requests and used for invalid tests

                 Mutated Schemas      XML Messages
Schema            D     I    CV       D     I    CV
Retailer         14    83     9     129   710    90
Warehouse         6    22     3      15    48     6
Manufacturer      8    32     3      14    48     3

• Fifteen faults inserted into the program
• Seven faults found, all by CV tests

Analysis of Faults
8 faults not found
• 5 faults: Affected back-end log file
   – Observability … log file was not seen
• 1 fault: Depends on inputs from the database
   – Controllability … tests depend on XML, not DB
• 2 faults: Required specific values that were not used
• All Deletion and Insertion tests were detected by program

Discussion
• Deleting and inserting parts of the grammar have little or no value
• Observability and controllability are major problems with web services
   – This is well-documented with web applications
• The constraints are much more useful for generating tests than deleting and inserting XML elements

Extensions Needed
• Improve invalid test generation
   – Focus on constraint-based tests from tests
   – Expand mutation of constraints
• Automatic test generation
   – Based on input space partitioning (category partitioning)
• General problems
   – Dealing with observability
   – Dealing with controllability

Travel Info Flow – Web Services
[Diagram: Traveler, Agent, and Organizers exchange desired travel info, an initial schedule, and travel info through a web app and web services; the Traveler reaches Airline, Airport, and Hotel over wireless web service connections for checkin, gate info, checkout, and room key]
Use web services instead of email to plan trip
Connect wirelessly to web services
during journey

Travel Info Flow – Web Services
• Traveler will send requirements to travel agent and hotel
• Information will be sent to traveler’s web service, which will store it on the traveler’s hand-held device
• Traveler will check in at the airport by beaming data from hand-held
   – Airport will send gate information and a map to the hand-held
• Traveler will check in and out of hotel by beaming data from hand-held
   – Hotel will send hotel room, map, and electronic key to hand-held
• Conference organizers will send meeting organization details to room computer, which will sync with hand-held
• Room computer will send contact details to spouse

Trusting Web Services
For widespread adoption, users must be confident web services are
• Reliable
• Secure
• Dependable
• Usable

Conclusions
• This mode of operation will be more convenient and efficient
   – Reduces the need to laboriously hand-translate information from one device or format to another
• All of the technologies to support these interactions are available
• We are missing some engineering
   – Reliable web services
   – Secure data transmission
   – Usable interfaces
Testing addresses some, but not all, of these issues

Contact
Jeff Offutt
offutt@gmu.edu
http://cs.gmu.edu/~offutt/
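
The constraint-violation idea from the Test Case Generation slides can be sketched in Python. This is only an illustration under stated assumptions, not the tool used in the case studies: the facet values and the book message come from the priceType example, while the helper names (is_valid_price, violating_prices, book_message, invalid_tests) and the specific choice of one value just beyond each facet are hypothetical.

```python
from decimal import Decimal, InvalidOperation
import xml.etree.ElementTree as ET

# Facets of priceType, copied from the partial schema on the slide.
PRICE_FACETS = {"fractionDigits": 2, "maxInclusive": Decimal("1000.00")}


def is_valid_price(text, facets=PRICE_FACETS):
    """Return True if text satisfies both original (unmutated) facets."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return False
    if not value.is_finite():
        return False
    fraction_digits = max(0, -value.as_tuple().exponent)
    return (fraction_digits <= facets["fractionDigits"]
            and value <= facets["maxInclusive"])


def violating_prices(facets=PRICE_FACETS):
    """Hypothetical helper: one price just beyond each facet."""
    n = facets["fractionDigits"]
    return {
        # One more fraction digit than the schema allows.
        "fractionDigits": str(Decimal("37") + Decimal(10) ** -(n + 1)),
        # Smallest representable step above the inclusive maximum.
        "maxInclusive": str(facets["maxInclusive"] + Decimal(10) ** -n),
    }


def book_message(price):
    """Build the <books> message from the slide with the given price."""
    books = ET.Element("books")
    book = ET.SubElement(books, "book")
    ET.SubElement(book, "ISBN").text = "0-201-74095-8"
    ET.SubElement(book, "price").text = price
    ET.SubElement(book, "year").text = "2002"
    return ET.tostring(books, encoding="unicode")


def invalid_tests():
    """Map each violated facet to an XML test message.

    Values that turn out to be valid against the original facets are
    dropped, since they would be the "accidentally valid" false positives.
    """
    return {
        facet: book_message(price)
        for facet, price in violating_prices().items()
        if not is_valid_price(price)
    }


if __name__ == "__main__":
    for facet, message in invalid_tests().items():
        print(facet, message)
```

Each generated message is expected to draw a fault or other abnormal response from the receiving service; a normal response would point to a gap in its input validation.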
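
The gap between goal, specified, and implemented domains in the credit card example can also be made concrete. The sketch below is a rough illustration: the three checks follow the bullets on the Representing Input Domains slide, the function names are hypothetical, and it assumes the “check digit” is computed with the Luhn algorithm that payment cards commonly use. The goal-domain check is only an approximation, because it ignores the issuer rules.

```python
def in_implemented_domain(number: str) -> bool:
    """Common implemented domain from the slide: all digits are numeric."""
    return number.isascii() and number.isdigit()


def in_specified_domain(number: str) -> bool:
    """Common specified domain: first digit in {3, 4, 5, 6}, length 13 to 16."""
    return (in_implemented_domain(number)
            and number[0] in "3456"
            and 13 <= len(number) <= 16)


def luhn_ok(number: str) -> bool:
    """Assumed check-digit rule (Luhn): double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def in_goal_domain_approx(number: str) -> bool:
    """Approximate goal domain: specified domain plus a valid check digit."""
    return in_specified_domain(number) and luhn_ok(number)


if __name__ == "__main__":
    # "12" is accepted by the implementation but not by the specification;
    # the second value meets the specification but has a bad check digit.
    for candidate in ["12", "4111111111111112", "4111111111111111"]:
        print(candidate,
              in_implemented_domain(candidate),
              in_specified_domain(candidate),
              in_goal_domain_approx(candidate))
```

Inputs that pass one check but fail a stricter one fall in the region between the domains, which is where the slides suggest looking for faults.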