LASI: Linguistic Analysis for Subject Identification
Milestone Presentation
Presented by: CS410 Red Group
July 26, 2016

Outline
• Team Red Staff Chart
• Introduction
• Problem Statement
• LASI in our Case Study
  - Functional Components
  - Algorithms
  - Milestones
• Document Parsing
• Weighter
• GUI Flow
• GUI Screenshots
• Risk Matrix
• Competition Matrix
• Conclusion

Team Red Staff Chart
• Scott Minter: Project Co-Leader, Software Specialist
• Brittany Johnson: Project Co-Leader, Documentation Specialist
• Dustin Patrick: Algorithm Specialist, Expert Liaison
• Richard Owens: Documentation Specialist, Communication Specialist
• Aluan Haddad: Algorithm Specialist, Software Specialist
• Erik Rogers: Marketing Specialist, GUI Developer

What is LASI?

LASI: Linguistic Analysis for Subject Identification

LASI Identifies Themes (5 W's & 1 H)
• Who
• What
• When
• Where
• Why
• How

Why are themes important?
• Comprehension
• Summarization
• Communication between people

Societal Problem
It is difficult for people to identify a common theme across a large set of documents in a timely, consistent, and objective manner.

Our Proposed Solution
• LASI is a linguistic analysis decision support tool that helps determine a common theme across multiple documents. Our goals for LASI are to:
  - find themes accurately
  - use system resources efficiently
  - provide consistent results

What do we mean by "linguistic analysis"?
The contextual study of written works and of how their words combine to form an overall meaning.

Dr. Patrick Hester & Dr. Tom Meyers: The A.I.D. Process (Assessment, Improvement, Design)
• Dr. Hester and Dr. Meyers are systems analysts and researchers for NCSOSE. They:
  - conduct extensive research
  - quickly become familiar with client systems
  - formulate concise, objective assessments

Before LASI
[Flowchart with the steps Customer Contact, Situational Awareness Meeting, Document Gathering Process, Document Analysis, and Problem Statement Presentation; the decision points "Will NCSOSE be needed?" and "Is the Customer satisfied?"; and the outcomes "Client Goes Elsewhere" and "Continue on to the rest of the A.I.D. Process"]

After LASI
[Flowchart with the same step labels as the "Before LASI" slide]

Major Functional Components
Hardware: High-End Notebook PC (~$1,500 USD)
- Computation: Quad-Core CPU
- Primary Memory: 8.0 GB DDR3 RAM
- Document Storage: Solid-State Storage
Software
• Algorithm: Identifies the themes and ideas most likely shared across all documents in the input domain
• User Interface:
  - Multi-Level Views
  - Weighted Phrase List
  - Detailed Breakdown
  - Step-by-Step Justification

Linguistic Analysis Algorithm
Primary Analysis: Word Count and Syntactic Assessment
• Traverse the document word by word
• Identify corresponding parts of speech
• Determine frequency by grammatical role
Secondary Analysis: Associative Identification
• Bind pronouns to nouns, updating frequency
• Bind adjectives to nouns
• Identify potential noun phrases
Tertiary Analysis: Semantic Relationship Assessment
• Identify potential synonyms
• Assess potential subject-object-verb relationships
• Output a list of weighted themes

LASI Milestones

Document Parsing

Weighter

GUI Flow

Splash Screen

New Project Screen

Results Page

Risk Matrix
Customer Risks
• C1: Product Interest
• C2: Maintenance
• C3: Trust
Technical Risks
• T1: System Limitations
• T2: Scanned Text Recognition
• T3: Jargon Recognition
• T4: Illegal Character Handling
Customer Risks
C1. Product Interest (Probability: 2, Impact: 4)
Mitigation: LASI offers unique functionality and user-friendliness.
C2. Maintenance (Probability: 3, Impact: 2)
Mitigation: LASI will be a free, open-source application, allowing the community to maintain and extend it over time.
C3. Trust (Probability: 3, Impact: 3)
Mitigation: LASI will provide a step-by-step breakdown of its output analysis and algorithm reasoning.

Technical Risks
T1. System Limitations (Probability: 4, Impact: 2)
Mitigation: LASI will be designed from the ground up in native C++ for memory- and CPU-efficient code.
T2. Scanned Text Recognition (Probability: 4, Impact: 3)
Mitigation: LASI will implement an optical character recognition (OCR) algorithm to handle scanned text.
T3. Jargon Recognition (Probability: 3, Impact: 2)
Mitigation: LASI will have domain-specific dictionaries and feature intuitive contextual inference.
T4. Illegal Character Handling (Probability: 4, Impact: 2)
Mitigation: LASI will provide contextual inference, synonym recognition, and statistical methods.

The Competition

Conclusion
• There is a need for LASI.
• LASI is an algorithm-heavy program.
• Its success would benefit anyone who needs to analyze large sets of documents in a timely, consistent, and objective manner.

References
"Patrick Hester." Old Dominion University. N.p., n.d. Web. 24 Sept. 2012. <http://www.odu.edu/directory/people/p/pthester>.
"Tom Meyers." NCSOSE. N.p., n.d. Web. 22 Nov. 2012. <http://www.ncsose.org/index.php?option=com_jresearch>.
Osinski, Stanislaw, and Dawid Weiss. "Carrot2." 13 Aug. 2012. Web. 25 Sept. 2012. <http://project.carrot2.org>.
"WordStat." Provalis Research. Web. 24 Sept. 2012. <http://provalisresearch.com/products/content-analysis-software/>.
"ReadMe: Software for Automated Content Analysis." Web. 24 Sept. 2012. <http://gking.harvard.edu/node/4520/rbuild_documentation/readme.pdf>.
"AlchemyAPI Overview." AlchemyAPI. N.p., n.d. Web. 19 Oct. 2012. <http://www.alchemyapi.com/api/>.
"AutoMap." Project. N.p., n.d. Web. 19 Oct. 2012. <http://www.casos.cs.cmu.edu/projects/automap/>.
"CL Research Home Page." CL Research. N.p., n.d. Web. 19 Oct. 2012. <http://www.clres.com/>.