Cheat Check
SOFTWARE DESIGN SPECIFICATION

For CS 430 at West Virginia University, Fall 2008
Lee Zaniewski, Kai Ma, Aaron Costa, Chris Cole
lzaniews@mix.wvu.edu
3/11/2008

Table of Contents

1.0 Introduction
    1.1 Goals and objectives
    1.2 Statement of scope
    1.3 Software context
    1.4 Major constraints
2.0 Data design
    2.1 Internal software data structure
    2.2 Global data structure
    2.3 Temporary data structure
3.0 Architectural and component-level design
    3.1 Program Structure
        3.1.1 Architecture diagram
        3.1.2 Alternatives
    3.2 Description for User Interface (UI)
        3.2.1 Processing narrative for UI
        3.2.2 UI interface description
        3.2.3 UI processing detail
    3.3 Description for Input Handler
        3.3.1 Processing narrative for Input Handler
        3.3.2 Input Handler interface description
        3.3.3 Input Handler processing detail
    3.4 Description for Parse Engine
        3.4.1 Processing narrative for Parse Engine
        3.4.2 Parse Engine interface description
        3.4.3 Parse Engine processing detail
    3.5 Description for Comparison Engine
        3.5.1 Processing narrative for Comparison Engine
        3.5.2 Comparison Engine interface description
        3.5.3 Comparison Engine processing detail
    3.6 Description for API Handler
        3.6.1 Processing narrative for API Handler
        3.6.2 API Handler interface description
        3.6.3 API Handler processing detail
    3.7 Software Interface Description
        3.7.1 External machine interfaces
        3.7.2 External system interfaces
        3.7.3 Human interface
4.0 User Interface Design
    4.1 Screen images
    4.2 Interface design rules
    4.3 Components available
    4.4 UIDS description
5.0 Restrictions, limitations, and constraints
    5.1 Time Limits
    5.2 Budgetary Constraints
    5.3 Hardware Constraints
6.0 Testing Issues
    6.1 Classes of tests
    6.2 Expected software response
    6.3 Performance bounds
    6.4 Identification of critical components
7.0 Appendices
    7.1 Packaging and installation issues
    7.2 Legal Considerations
    7.3 Data Flow diagrams

1.0 Introduction

This document describes the requirements and functionality of the Cheat-Checker Plagiarism Detection Application (CCPDA), which is designed to help instructors ensure that student-submitted documents meet academic standards for honesty. It also summarizes the method by which we will design the CCPDA.

1.1 Goals and objectives

The initial purpose of CCPDA is to check source code for plagiarism from another source, such as another student's submission or any publicly searchable online database. We hope to make CCPDA platform independent, easy to use, and easily extensible to other document formats.

1.2 Statement of scope

CCPDA will compare documents submitted to the instructor against each other and/or against public databases, and will show any similarities in an easily readable graphical format.

1.3 Software context

Our software will attempt to speed up the process of checking for plagiarism in an academic setting. Our team recognized how difficult it is for instructors to prevent academic dishonesty, and we hope to ease some of the burden of reviewing a large number of submissions.

1.4 Major constraints

The major constraint of this project is limited design and implementation time, due to the length of the academic semester.

2.0 Data design

2.1 Internal software data structure

We will parse the files and represent them as a series of Python data structures. The parsed_file class will contain an internal data structure consisting of a list of dictionaries, each mapping a variable name to a list of the operations performed on that variable:

[{variable: [modification, modification, …]}, {variable: [modification, modification, …]}, …]

Here [ ] denotes a list and { } denotes a dictionary, that is, a set of key : value pairs. variable is the name we assign to the variable, and modification is an operation performed on that variable.
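
As a rough illustration of this layout, the sketch below builds the structure for a short C fragment. The class name ParsedFile, the add_variable helper, the placeholder names var0 and var1, and the operation labels are all assumptions made for the example; the specification does not define a vocabulary for modifications.

```python
# Hypothetical illustration of the parsed_file layout described above.
# The operation labels ("declare", "assign", ...) are assumptions made
# for this example only.


class ParsedFile:
    """Holds one file's variables and the operations performed on each."""

    def __init__(self, filename):
        self.filename = filename
        # List of single-entry dictionaries: {variable: [modification, ...]}
        self.variables = []

    def add_variable(self, name):
        """Start tracking a new variable and return its operation list."""
        operations = []
        self.variables.append({name: operations})
        return operations


# Roughly what parsing
#     int i, sum = 0; for (i = 0; i < n; i++) sum += i;
# might produce, with i renamed to var0 and sum renamed to var1.
parsed = ParsedFile("submission1.c")
parsed.add_variable("var0").extend(
    ["declare", "assign", "loop_compare", "loop_increment"])
parsed.add_variable("var1").extend(
    ["declare", "assign", "loop_add_assign"])
```

Replacing the original identifiers with assigned names is one way to keep a simple rename from hiding a match, which appears to be the intent of assigning variable names during parsing.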

The parsed_file structure described above is the primary internal data structure we will use.

2.2 Global data structure

The input and output will be represented as data structures available to most of the program. The output will be a Result Set data structure consisting of a list of possible matches, each of which includes the two matching files and the strength of the match as an integer percentage (0-99% likelihood of a match). The input will be represented as a class called Project Options, which will contain the name, type, and run mode of the project as strings, and the directories and API queries to be used in the comparison as lists.

2.3 Temporary data structure

A Python list of Parsed File objects will be used to pass both the file information and the parsed file data from the input handler to the parser and the comparison engine.

3.0 Architectural and component-level design

3.1 Program Structure

Our program will be based on the pipe-and-filter architecture, because the design is linear: each component consumes the output of the previous one.

3.1.1 Architecture diagram

Figure 1: Project Architecture

3.1.2 Alternatives

Our analysis is built on concepts similar to those of parsers, where each step prunes the data until we have the desired output. This is essentially pipe and filter, so other alternatives were quickly ruled out.

3.2 Description for User Interface (UI)

We will present the user with a desktop-oriented graphical user interface (GUI) for providing input to the program and viewing its output.

3.2.1 Processing narrative for UI

After starting the program, the user is prompted to enter the project name, the project type, and the directories or files to be checked. The user is then directed to a project-specific options page, where they can select global or local search and set the engine options. The input is then passed to the input handler for further processing. Once the results are returned, the UI is updated to display them.

3.2.2 UI interface description

The UI will create an instance of the Project Options class to hold the user’s input. This object will be handed to the input handler. The UI will also receive the Result Set data structure and display its contents to the user, filtered by a specified threshold.

3.2.3 UI processing detail

3.2.3.1 Restrictions/limitations

The UI must be flexible enough to be expanded to other document types. It must also be usable by people who do not have much experience with computers.

3.2.3.2 Performance issues

The UI’s effect on performance will be negligible in comparison to that of the other components.

3.2.3.3 Design constraints

The interface must be easily understood and user friendly, and must allow the user to modify the engine options as needed. We will follow the GNOME Human Interface Guidelines (developer.gnome.org/projects/gup/hig/) because of the close relationship between GNOME and the GTK libraries we will be using. The UI must also be able to save the results of the comparison to the user’s computer.

3.3 Description for Input Handler

The input handler will gather the information from the UI and hand it off to the file system and the API handler.

3.3.1 Processing narrative for Input Handler

The input handler will be passed a Project Options object. It will pass the information provided by this object to the API handler. It will also retrieve the data from the file system and the API handler and pass it to the parser.

3.3.2 Input Handler interface description

The input handler will pass the file data to the parser one file at a time. It will pass a query to the API handler and will then receive the file data from the API handler. The input handler will also get file data from the file system of the user’s computer.

3.3.3 Input Handler processing detail

3.3.3.1 Restrictions/limitations

The input handler must be able to resolve file system paths correctly based on the user’s operating system (Windows or Unix based).
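
One way to meet this restriction is to leave path handling to Python's standard os.path module, which applies the separator and normalization rules of the operating system it runs on. The sketch below is an illustration only; the function names resolve_path and list_source_files and the default ".c" extension are assumptions, not part of the design.

```python
import os


def resolve_path(user_path):
    """Return an absolute, normalized path for the current operating system."""
    # Expand a leading "~" to the user's home directory where applicable.
    expanded = os.path.expanduser(user_path)
    # abspath anchors relative paths at the working directory; normpath
    # collapses ".." segments and uses the platform's separator.
    return os.path.normpath(os.path.abspath(expanded))


def list_source_files(directory, extension=".c"):
    """Collect matching files under a directory, sorted for stable ordering."""
    root = resolve_path(directory)
    found = []
    # os.walk yields nothing for a directory that does not exist.
    for current_dir, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(extension):
                found.append(os.path.join(current_dir, filename))
    return sorted(found)
```

Building every path with os.path.join, rather than concatenating strings with a hard-coded separator, keeps the same code usable on both Windows and Unix-based systems.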

3.3.3.2 Performance issues

Like the UI, the input handler should not be very resource intensive.

3.4 Description for Parse Engine

The Parse Engine will take the raw contents of a single file at a time and break it into the parsed file data structure described in section 2.1.

3.4.1 Processing narrative for Parse Engine

The input handler passes the file data to the Parse Engine, which parses the file and then passes the result to the comparison engine.

3.4.2 Parse Engine interface description

The parse engine will fill a list of parsed file data structures to be used by the comparison engine.

3.4.3 Parse Engine processing detail

3.4.3.1 Algorithmic model

The parse engine will break up the file based on variables and the operations performed on them. This is the metric by which we will do comparisons.

3.4.3.2 Restrictions/limitations

The parse engine must be extremely accurate and must identify every operation correctly and in context; for example, an assignment operation is different from an assignment operation inside a loop.

3.4.3.4 Performance issues

Performance will depend on the size of the source files and the number of files processed. This is likely to be a resource-intensive operation.

3.4.3.6 Design constraints

Initially, we will design the parser to accept only single C language source code files. New parser modules should be easy to write and install.

3.5 Description for Comparison Engine

The comparison engine will take the list of parsed file objects and compare them to each other, determining the likelihood of plagiarism.

3.5.1 Processing narrative for Comparison Engine

The comparison engine will analyze the list of parsed files, compare the operations performed on the variables of each file against those of the other files, and find any files that are similar based on a defined threshold. The results will then be returned to the UI.

3.5.2 Comparison Engine interface description

The Comparison Engine will return the Result Set described in section 2.2 to the UI.

3.5.3 Comparison Engine processing detail

The Comparison Engine will check file by file, comparing the individual operations performed on the variables of the files to find likely matches.

3.5.3.1 Performance issues

Like the parse engine, the comparison engine will be extremely computationally expensive, especially with large numbers of files.

3.6 Description for API Handler

The API Handler will interface with the input handler and query the specified databases.

3.6.1 Processing narrative for API Handler

The API handler is passed the query information from the input handler and then queries the necessary databases. It then returns the file objects to be used by the parse engine.

3.6.2 API Handler interface description

The query that the API Handler receives from the Input Handler will be a string of keywords to be passed to the appropriate database.

3.6.3 API Handler processing detail

The processing detail will depend on each database’s respective API.

3.6.3.1 Restrictions/limitations

The number of results returned should match the user-specified amount.

3.6.3.5 Performance issues

Performance depends on the network and on the responsiveness of each respective server.

3.6.3.6 Design constraints

The API Handler should be extensible so that support for new databases can be installed.

3.7 Software Interface Description

3.7.1 External machine interfaces

If the project is run in global mode, the API handler will interface with databases on the internet using each database’s API.

3.7.2 External system interfaces

The Input Handler will retrieve files from the file system, and the UI will save the output as a file for later use.

3.7.3 Human interface

The UI will interface with the user as described in section 3.2.
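
To close the component descriptions with a concrete picture, the sketch below shows one possible shape for the Comparison Engine interface of section 3.5: parsed files go in, and a Result Set of (file, file, strength) entries comes out. The scoring metric is a stand-in chosen for brevity (Python's difflib.SequenceMatcher over flattened operation lists) and is not the algorithm this specification commits to; the function names, the tuple layout, and the default threshold of 50 are likewise assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def operation_sequence(parsed_variables):
    """Flatten [{variable: [op, ...]}, ...] into one list of operations,
    ignoring variable names so that renaming does not hide a match."""
    sequence = []
    for entry in parsed_variables:
        for operations in entry.values():
            sequence.extend(operations)
    return sequence


def match_strength(first, second):
    """Score two parsed files from 0 to 99 (stand-in metric, see lead-in)."""
    matcher = SequenceMatcher(None, operation_sequence(first),
                              operation_sequence(second))
    return min(99, int(matcher.ratio() * 100))


def comparison_engine(parsed_files, threshold=50):
    """Compare every pair of files and return the matches at or above
    the threshold, strongest first."""
    result_set = []
    for name_a, name_b in combinations(sorted(parsed_files), 2):
        strength = match_strength(parsed_files[name_a], parsed_files[name_b])
        if strength >= threshold:
            result_set.append((name_a, name_b, strength))
    return sorted(result_set, key=lambda match: match[2], reverse=True)


# Hypothetical parsed files: a.c and b.c perform the same operations
# under different variable names; c.c is unrelated.
parsed_files = {
    "a.c": [{"var0": ["declare", "assign", "loop_increment"]},
            {"var1": ["declare", "loop_add_assign"]}],
    "b.c": [{"x": ["declare", "assign", "loop_increment"]},
            {"y": ["declare", "loop_add_assign"]}],
    "c.c": [{"var0": ["declare", "call"]}],
}
result_set = comparison_engine(parsed_files)
```

Comparing every pair of files grows quadratically with the number of submissions, which is consistent with the performance concern raised in section 3.5.3.1.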

4.0 User Interface Design

4.1 Screen images

We have prototyped the basic user interface concepts. The screenshots are shown below.

Figure 2: First User Interface Screen

Figure 3: Second User Interface Screen

4.2 Interface design rules

We will use the Human Interface Guidelines associated with the GNOME desktop (developer.gnome.org/projects/gup/hig/).

4.3 Components available

We are using pyGTK, Glade, and MatPlotLib to implement our GUI. These are all free software projects. pyGTK is distributed under the LGPL; Glade and MatPlotLib carry their own free software licenses.

4.4 UIDS description

We will provide documentation with our project on how to extend the user interface to support new languages or document types using the above tools.

5.0 Restrictions, limitations, and constraints

5.1 Time Limits

The nature of our project requires that the program be finished by the middle of May. This severely limits the functionality that we will be able to implement.

5.2 Budgetary Constraints

We do not have a budget and will therefore use open source and free software to implement the application.

5.3 Hardware Constraints

The program must be able to run on a typical desktop computer.

6.0 Testing Issues

6.1 Classes of tests

We will test each major component individually. A more comprehensive testing framework will be provided in our testing specification documents. To ensure reasonable coverage, we will test boundary conditions, extraneous conditions, and valid inputs. We will use clear box style tests to ensure that the application is working properly.

6.2 Expected software response

Every test case that is run will record an input, an actual output, and an expected output, to show the testers clearly which tests pass, which fail, and how.
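
The in / out / expected record could look something like the sketch below. The helper run_test_case, the dictionary keys, and the unit under test (clamp_strength, which limits a score to the 0-99 range used by the Result Set) are hypothetical names invented for this example; the actual test harness is left to the testing specification.

```python
def run_test_case(name, function, test_input, expected):
    """Run one test and record its input, actual output, and expected output."""
    try:
        actual = function(test_input)
    except Exception as error:
        # A crash is recorded as an output so the report shows how it failed.
        actual = "raised %s" % type(error).__name__
    return {"name": name, "in": test_input, "out": actual,
            "expected": expected, "passed": actual == expected}


def clamp_strength(value):
    """Hypothetical unit under test: clamp a score to the 0-99 range."""
    return max(0, min(99, int(value)))


# One valid input, two boundary conditions, and one extraneous input.
cases = [
    ("valid input", 42, 42),
    ("upper boundary", 100, 99),
    ("lower boundary", -1, 0),
    ("extraneous input", "abc", "raised ValueError"),
]

reports = [run_test_case(name, clamp_strength, given, expected)
           for name, given, expected in cases]

for report in reports:
    status = "PASS" if report["passed"] else "FAIL"
    print("%-18s in=%r out=%r expected=%r %s" % (
        report["name"], report["in"], report["out"],
        report["expected"], status))
```

Recording the actual output alongside the expected one means a failing line shows how the result differed, not only that it did.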

6.3 Performance bounds

For more intensive tests, we will report the run time on a specific hardware configuration to make sure that the application runs quickly.

6.4 Identification of critical components

The two most critical components are the parse engine and the comparison engine. We will need to test thoroughly that these two components work well both separately and together.

7.0 Appendices

7.1 Packaging and installation issues

There will be a cross-platform installer, which will be tested on Linux and Windows.

7.2 Legal Considerations

We will primarily use software licensed under the LGPL. We will also follow the philosophy behind the LGPL while developing our project.

7.3 Data Flow diagrams

Figure 4: Context Diagram

Figure 5: Data Flow Diagram Level 1

Figure 6: Data Flow Diagram Level 2