L.A.S.I.
Linguistic Analysis for Subject Identification
November 12, 2012
Feasibility Presentation
Presented by: CS410 Red Group

Outline
• Team Red Staff Chart
• Introduction
• Societal Problem
• Case Study
• Proposed Solution
• Major Component Diagram
• Algorithm
• The Competition
• Risk
• Conclusion

Team Red Staff Chart
• Scott Minter: Project Co-Leader, Software Specialist
• Brittany Johnson: Project Co-Leader, Documentation Specialist
• Dustin Patrick: Algorithm Specialist, Expert Liaison
• Richard Owens: Documentation Specialist, Communication Specialist
• Aluan Haddad: Algorithm Specialist, Software Specialist
• Erik Rogers: Marketing Specialist, GUI Developer

What is a theme?
A specific and distinctive quality, characteristic, or concern. [1]
[1] "Theme," Merriam-Webster

What are you looking for when you are identifying a theme?

5 W's & 1 H
• Who
• What
• When
• Where
• Why
• How

Bill's stove was broken. He has been saying for months that he would go to the appliance store to buy a new one. He had some free time yesterday, so he drove to the store to buy a new stove.

Who: Bill
What: He travelled to some place
When: Yesterday
Where: The store
Why: To buy a stove because his broke
How: By driving

The Theme from the 5 W's & 1 H
Bill drove to the store yesterday to buy a new stove because his broke.

Why are themes important?
• Comprehension
• Summarization
• Assists in communication between people

Societal Problem
It is difficult for people to identify a common theme over a large set of documents in a timely, consistent, and objective manner.

How long does it take?
• Finding a theme over multiple documents is a time-consuming process.
• The average reading speed of an adult is 250 words per minute. [2]
[2] Thomas, "What Is the Average Reading Speed and the Best Rate of Reading?"

Consistency and Objectivity
• The criteria for evaluation may vary from person to person.
• Large quantities of documents must be mentally digested, assessed, and interrelated.

Dr. Patrick Hester
Ph.D. from Vanderbilt University, 2007
Major: Risk and Reliability Engineering and Management
"My research interests include multi-objective decision making under uncertainty, probabilistic and non-probabilistic uncertainty analysis, critical infrastructure protection, and decision making using modeling and simulation." [3] - Dr. Hester
[3] Patrick Hester Website

• Dr. Hester is a systems analyst and researcher
  ▫ He must:
    - Conduct extensive research
    - Quickly become familiar with client systems
    - Formulate concise, objective assessments
• LASI will help with all of this

Assessment Improvement Design (A.I.D.)
• Preliminary problem statement identified from document
• Problem statement then used to find Critical Operational Issues (COIs)
• COIs used to find Measures of Effectiveness (MOEs)
• MOEs used to find Measures of Performance (MOPs)

Current Method (flowchart)
Continue on to the rest of the A.I.D. Process
Customer Contact
Situational Awareness Meeting
Is Customer satisfied? (yes / no)
Problem Statement Presentation
Will NCSOSE be needed?
Client Goes Elsewhere (if no)
Document Gathering Process (if yes)
Document Analysis

LASI: Linguistic Analysis for Subject Identification
[Diagram: LASI, THEMES]

Our Proposed Solution
• LASI is a linguistic analysis decision support tool used to help determine a common theme across multiple documents.
It is our goal with LASI to:
  ▫ accurately find themes
  ▫ be system efficient
  ▫ provide consistent results

What do we mean by "linguistic analysis"?
The contextual study of written works and how the words combine to form an overall meaning.

Linguistic analysis involves
Syntactic
• Logical grammar
• Statistical Data
• Alphabetical Frequencies
• Word Counts
• Parts of Speech
• Word Dependencies
Semantic
• Relating syntactic structures to language-independent meanings
• Extracting meaning and conceptual arguments
• Summarization

The Wills and Will Nots of LASI
What LASI Will Do
• Analyze multiple documents to find common themes
What LASI Will Not Do
• Provide a concise synopsis
• Provide a single theme
• Provide statistical data to help a user make a decision

Who Would This Appeal To?
• Researchers
• Consultants
• Academics
• Students

Benefits To The Customer
• Time saving
• Objective output
• Consistent output
• Cost saving solution

How does LASI fit into our Case Study?

Before LASI (flowchart)
Continue on to the rest of the A.I.D. Process
Customer Contact
Situational Awareness Meeting
Is the Customer satisfied? (yes / no)
Problem Statement Presentation
Will NCSOSE be needed?
Client Goes Elsewhere (if no)
Document Gathering Process (if yes)
Document Analysis

After LASI (flowchart)
Continue on to the rest of the A.I.D. Process
Customer Contact
Situational Awareness Meeting
Is the Customer satisfied? (yes / no)
Problem Statement Presentation
Will NCSOSE be needed?
Client Goes Elsewhere (if no)
Document Gathering Process (if yes)
LASI Aided Document Analysis

Major Functional Components
Hardware
High End Notebook PC (~$1500 USD)
- Computation: Quad-Core CPU
- Primary Memory: 8.0 GB DDR3 RAM
- Document Storage: Solid State Storage
Software
Algorithm: Extrapolates the most likely congruence of themes and ideas across all documents in the input domain
User Interface:
- Multi-Level Views
- Weighted Phrase List
- Detailed Breakdown
- Step by Step Justification

Linguistic Analysis Algorithm
Primary Analysis: Word Count and Syntactic Assessment
• Traverse Document in Word-Wise Manner
• Identify Corresponding Parts of Speech
• Determine Frequency by Grammatical Role
Secondary Analysis: Associative Identification
• Bind Pronouns to Nouns, Updating Frequency
• Bind Adjectives to Nouns
• Identify Potential Noun Phrases
Tertiary Analysis: Semantic Relationship Assessment
• Identify Potential Synonyms
• Assess Potential Subject-Object-Verb Relationships
• Output List of Weighted Themes

Milestone Diagram

The Competition
WordStat
Stanford CoreNLP
ReadMe
AutoMap

Risk Matrix
Customer Risks
C1 -- Product Interest
C2 -- Maintenance
C3 -- Trust
Technical Risks
T1 -- System Limitations
T2 -- Scanned Text Recognition
T3 -- Jargon Recognition
T4 -- Illegal Character Handling

Customer Risks
C1. Product Interest
Probability: 2, Impact: 4
Mitigation: LASI offers unique functionality and user friendliness.
C2. Maintenance
Probability: 3, Impact: 2
Mitigation: LASI will be a free, open source application, allowing the community to maintain and extend it over time.
C3.
Trust
Probability: 3, Impact: 3
Mitigation: LASI will provide a step-by-step breakdown of output analysis and algorithm reasoning.

Technical Risks
T1. System Limitations
Probability: 4, Impact: 2
Mitigation: LASI will be designed from the ground up in native C++ for memory- and CPU-efficient code.
T2. Scanned Text Recognition
Probability: 4, Impact: 3
Mitigation: LASI will implement an optical character recognition algorithm to handle scanned text.
T3. Jargon Recognition
Probability: 3, Impact: 2
Mitigation: LASI will have domain-specific dictionaries and feature intuitive contextual inference.
T4. Illegal Character Handling
Probability: 4, Impact: 2
Mitigation: LASI will provide contextual inference, synonym recognition, and statistical methods.

Conclusion
• LASI is feasible.
• LASI is a decision support tool, not a decision making tool.
• Implications of success affect a wide area of study and professions.
• In order for LASI to succeed, the output needs to be immediately usable and the interface user-friendly.

References
1. "Theme." Def. 1b. Merriam-Webster. N.p., n.d. Web. 19 Oct. 2012. <http://www.merriam-webster.com/dictionary/theme>.
2. Thomas, Mark. "What Is the Average Reading Speed and the Best Rate of Reading?" Web. 19 Oct. 2012. <http://www.healthguidance.org/entry/13263/1/What-Is-the-AverageReading-Speed-and-the-Best-Rate-of-Reading.html>.
3. "Patrick Hester." Old Dominion University. N.p., n.d. Web. 24 Sept. 2012. <http://www.odu.edu/directory/people/p/pthester>.
Stanislaw Osinski and Dawid Weiss. Carrot2. 13 Aug. 2012. Web. 25 Sept. 2012. <http://project.carrot2.org>.
"WordStat." Provalis Research. Web. 24 Sept. 2012. <http://provalisresearch.com/products/content-analysis-software/>.
"ReadMe: Software for Automated Content Analysis." Web. 24 Sept. 2012.
<http://gking.harvard.edu/node/4520/rbuild_documentation/readme.pdf>.
"AlchemyAPI Overview." AlchemyAPI. N.p., n.d. Web. 19 Oct. 2012. <http://www.alchemyapi.com/api/>.
"AutoMap." Project. N.p., n.d. Web. 19 Oct. 2012. <http://www.casos.cs.cmu.edu/projects/automap/>.
"CL Research Home Page." CL Research Home Page. N.p., n.d. Web. 19 Oct. 2012. <http://www.clres.com/>.
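
Backup: Primary Analysis Sketch
The Linguistic Analysis Algorithm slide lists a word-wise traversal, frequency counting, and a weighted list as output. The block below is a minimal sketch of only that counting idea, applied to the Bill example from the introduction. It is not the LASI implementation: the function name weighted_terms, the STOP_WORDS list, and the weighting (count divided by the highest count) are assumptions made for illustration, and part-of-speech tagging, pronoun binding, and synonym handling are left out. Python is used for brevity, although the risk mitigation slide names C++ as the intended language.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only (not part of the LASI design).
STOP_WORDS = {"a", "an", "the", "to", "he", "his", "was", "so", "for",
              "that", "would", "had", "has", "been", "some", "one", "go"}


def weighted_terms(documents, top_n=5):
    """Traverse each document word by word and return the most frequent
    non-stop-words, each weighted relative to the highest count."""
    counts = Counter()
    for text in documents:
        # Lowercase the text and split it into simple word tokens.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    if not counts:
        return []
    top = counts.most_common(top_n)
    highest = top[0][1]
    return [(word, count / highest) for word, count in top]


docs = [
    "Bill's stove was broken.",
    "He has been saying for months that he would go to the appliance "
    "store to buy a new one.",
    "He had some free time yesterday, so he drove to the store to buy "
    "a new stove.",
]

if __name__ == "__main__":
    for word, weight in weighted_terms(docs, top_n=4):
        print(f"{word}: {weight:.2f}")
```

Under these assumptions the highest-weighted words in the Bill example are the ones that recur across sentences (stove, store, buy, new), which is the kind of signal the later analysis stages on the algorithm slide would refine.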