LASI
Linguistic Analysis for Subject Identification
Milestone Presentation
Presented by: CS410 Red Group
July 26, 2016
Outline
• Team Red Staff Chart
• Introduction
• Problem Statement
• LASI in our Case Study
• Functional Components
• Algorithms
• Milestones
• Document Parsing
• Weighter
• GUI Flow
• GUI Screenshots
• Risk Matrix
• Competition Matrix
• Conclusion
Team Red Staff Chart
Scott Minter: Project Co-Leader, Software Specialist
Brittany Johnson: Project Co-Leader, Documentation Specialist
Dustin Patrick: Algorithm Specialist, Expert Liaison
Richard Owens: Documentation Specialist, Communication Specialist
Aluan Haddad: Algorithm Specialist, Software Specialist
Erik Rogers: Marketing Specialist, GUI Developer
What is LASI?
LASI: Linguistic Analysis for Subject Identification
[Diagram: LASI → Themes]
LASI Identifies Themes (5 W’s & 1 H)
• Who
• What
• When
• Where
• Why
• How
Why are themes important?
• Comprehension
• Summarization
• Assists in communication between people
Societal Problem
It is difficult for people to identify a common
theme over a large set of documents in a
timely, consistent, and objective manner.
Our Proposed Solution
• LASI is a linguistic analysis decision support
tool used to help determine a common
theme across multiple documents. It is our
goal with LASI to:
• accurately find themes
• use system resources efficiently
• provide consistent results
What do we mean by “linguistic analysis”?
The contextual study of written works and
how the words combine to form an overall
meaning.
Dr. Patrick Hester & Dr. Tom Meyers: The AID Process
AID: Assessment, Improvement, Design
• Dr. Hester & Dr. Meyers are systems analysts and researchers for NCSOSE
• Conduct extensive research
• Quickly become familiar with client systems
• Formulate concise, objective assessments
Before LASI
[Flowchart of the A.I.D. intake process. Steps: Customer Contact; Situational Awareness Meeting; Document Gathering Process; Document Analysis; Problem Statement Presentation. Decisions: "Will NCSOSE be needed?" (no: Client Goes Elsewhere) and "Is the Customer satisfied?" (yes: Continue on to the rest of the A.I.D. Process).]
After LASI
[Flowchart: the same steps and decisions as the Before LASI slide.]
Major Functional Components
Hardware: High-End Notebook PC (~$1500 USD)
- Computation: Quad-Core CPU
- Primary Memory: 8.0 GB DDR3 RAM
- Document Storage: Solid-State Storage
Software
Algorithm: Extrapolates the most likely congruence of themes and ideas across all documents in the input domain
User Interface:
- Multi-Level Views
- Weighted Phrase List
- Detailed Breakdown
- Step-by-Step Justification
Linguistic Analysis Algorithm
Analysis phases:
• Primary Analysis: Word Count and Syntactic Assessment
• Secondary Analysis: Associative Identification
• Tertiary Analysis: Semantic Relationship Assessment
Steps:
• Traverse Document in Word-Wise Manner
• Bind Pronouns to Nouns, Updating Frequency
• Identify Potential Synonyms
• Identify Corresponding Parts of Speech
• Bind Adjectives to Nouns
• Assess Potential Subject-Object-Verb Relationships
• Determine Frequency by Grammatical Role
• Identify Potential Noun Phrases
• Output List of Weighted Themes
LASI Milestones
Document Parsing
Weighter
GUI Flow
Splash Screen
New Project Screen
Results Page
Risk Matrix
Customer Risks
C1 -- Product Interest
C2 -- Maintenance
C3 -- Trust
Technical Risks
T1 -- System Limitations
T2 -- Scanned Text Recognition
T3 -- Jargon Recognition
T4 -- Illegal Character Handling
Customer Risks
C1. Product Interest
Probability 2
Impact 4
Mitigation: LASI offers unique functionality and user-friendliness.
C2. Maintenance
Probability 3
Impact 2
Mitigation: LASI will be a free, open-source application,
allowing the community to maintain and extend it over time.
C3. Trust
Probability 3
Impact 3
Mitigation: LASI will provide a step-by-step breakdown of
its output analysis and algorithm reasoning.
Technical Risks
T1. System Limitations
Probability 4
Impact 2
Mitigation: LASI will be designed from the ground up
in native C++ for memory- and CPU-efficient code.
T2. Scanned Text Recognition
Probability 4
Impact 3
Mitigation: LASI will implement an optical character
recognition (OCR) algorithm to handle scanned text.
T3. Jargon Recognition
Probability 3
Impact 2
Mitigation: LASI will have domain-specific dictionaries
and use intuitive contextual inference.
T4. Illegal Character Handling
Probability 4
Impact 2
Mitigation: LASI will provide contextual inference,
synonym recognition, and statistical methods.
The Competition
Conclusion
• There is a need for LASI
• LASI is an algorithm-heavy program
• A successful LASI would benefit anyone who needs to analyze large sets of documents in a timely, consistent, and objective manner
References
"Patrick Hester." Old Dominion University. N.p., n.d. Web. 24 Sept. 2012. <http://www.odu.edu/directory/people/p/pthester>.
"Tom Meyers." NCSOSE. N.p., n.d. Web. 22 Nov. 2012. <http://www.ncsose.org/index.php?option=com_jresearch>.
Osinski, Stanislaw, and Dawid Weiss. "Carrot2." 13 Aug. 2012. Web. 25 Sept. 2012. <http://project.carrot2.org>.
"WordStat." Provalis Research. Web. 24 Sept. 2012. <http://provalisresearch.com/products/content-analysis-software/>.
"ReadMe: Software for Automated Content Analysis." Web. 24 Sept. 2012. <http://gking.harvard.edu/node/4520/rbuild_documentation/readme.pdf>.
"AlchemyAPI Overview." AlchemyAPI. N.p., n.d. Web. 19 Oct. 2012. <http://www.alchemyapi.com/api/>.
"AutoMap." Project. N.p., n.d. Web. 19 Oct. 2012. <http://www.casos.cs.cmu.edu/projects/automap/>.
"CL Research Home Page." CL Research. N.p., n.d. Web. 19 Oct. 2012. <http://www.clres.com/>.