L.A.S.I.
Linguistic Analysis for Subject Identification
Presented by Red Team:
Scott Minter, Dustin Patrick, Aluan Haddad, Richard Owens,
Brittany Johnson and Erik Rogers
September 26th, 2012
Index
• Introduction
▫ NCSOSE
▫ Dr. Patrick Hester
▫ Problem Identification
• AID Significance
▫ Why it matters
▫ Criteria Abstraction Synergy
▫ Public Sector Work
▫ The Private Sector
• Beyond Strategic Assessment
▫ Applicability
• Current Process Flow
▫ Diagram
▫ Mission Statement
▫ Project Vision
▫ Goals
▫ Organizational Hierarchy
• Problem Specifics
▫ Affected Areas
▫ Time
▫ Documentation Gathering
▫ Data Analysis
▫ Scientific Process
• The Potential Solution
▫ Characteristics
▫ How would the process change?
▫ Possible Issues
• The Competition
▫ Overview
▫ Syntactic v. Semantic
▫ Carrot2
▫ WordStat
▫ ReadMe
• A Unique Solution
▫ What the Competition Lacks
• Conclusion
▫ Final Thoughts
• Works Cited
Introduction: NCSOSE
• National Centers for System of Systems Engineering
• Provides key services in strategic analysis to
corporations and government agencies
• Discovers the underlying causes of issues
1 NCSOSE website
Introduction: Dr. Patrick Hester
• Ph.D. from Vanderbilt University, 2007
▫ Major: Risk and Reliability Engineering and Management
• B.S. from Webb Institute, 2001
▫ Major: Naval Architecture and Marine Engineering
“My research interests include multi-objective
decision making under uncertainty,
probabilistic and non-probabilistic
uncertainty analysis, critical infrastructure
protection, and decision making using
modeling and simulation.”²
- Dr. Hester
2 Patrick Hester Website
Introduction: Dr. Patrick Hester (cont.)
• While working for the Department of Homeland
Security, he was involved in research on resource
allocation.
• This research led to the development of the
Enterprise AID methodology.
• This work showed the importance of problem
identification and assessment.
Introduction: Problem Identification
“Failing to key in on the right problem means even the best solution is
destined to fail.”⁴
– Dr. Hester
4 Parsing Tool for Linguistic Analysis
AID Significance: Why it matters
The AID Methodology is a flexible, transparent, universal
systems assessment approach which:
• Analyzes reports and documents critically and objectively
• De-shrouds issues from an author’s frame of reference
• Identifies relevance of explicitly stated problems
• Determines the feasibility of proposed solutions
AID Significance: Criteria Abstraction Synergy
“none directly measureable but every one justifiable as an attribute that merits the title of MOE and
hence might promote the assessment, improvement, or design of innumerable enterprises.”³
- Hester & Meyers
• The AID process provides a basis for assessing qualitative as well as
quantitative goals, such as:
▫ Team Chemistry
▫ Positive Attitude
▫ Customer Satisfaction¹
• Qualitative concerns are highly significant
▫ Qualitative concerns are what people care about
▫ They provide important context
▫ They integrate an agency’s Mission and Vision directly into the specific problem area
3 Enterprise AID
AID Significance: Public Sector Work
NCSOSE’s public sector work improves government
agency efficiency and helps agencies maintain focus
• Currently engaged in a 3-year, $1.6 million effort to improve
efficiency for the Department of Homeland Security⁴
• Cost and necessity assessments for the Federal Emergency
Management Agency³
• Public sector projects matter to all of us
3 Enterprise AID
4 Parsing Tool for Linguistic Analysis
AID Significance: The Private Sector
• The AID methodology is generally applicable
• Malleable yet not generic
• Robust yet adaptable
• Beneficial to a wide array of specific issues,
focus areas, and problem sets
• Corporate planning and management
• Academic organization and strategy
Beyond Strategic Assessment:
Applicability of LASI
An intelligent, heuristic algorithm that analyzes written work to help
reach an unbiased, informed conclusion would be invaluable.
• Broad Applicability
▫ Strategic Analysis
▫ Education⁵
▫ Translation Engines
▫ Web Search Algorithms⁶
▫ General Management Decisions⁷
5 “A Framework for the Computerized Assessment of University Student Essays.”
6 “Using an Intelligent Agent to Enhance Search Engine Performance.”
7 “A fuzzy multi-criteria decision-making method for facility site selection”
Current Workflow Process:
Process Diagram
[Flowchart. Boxes: Customer Contact; Situational Awareness Meeting; Documentation Gathering Process; Independent Research Required; Client Goes Elsewhere; Manual Search for Strategy/Strategic Plans, Problem Statement, Mission/Vision/Goals, and Client Input/SA Meeting. Decisions (yes/no): “Will NCSOSE Be Useful?” and “Does the customer provide documentation?” Documents gathered: Strategic Plans, Vision/Mission/Goals, Org. Hierarchy.]
Current Workflow Process: Mission Statement
• Qualitative overview of the company solving the problem
• One or two sentences at most
• Must be considered to find the scope of the problem
Current Workflow Process:
Project Vision
• Qualitative outline of the problem to be solved
• Several sentences that outline the problem in question
• Very important for Situational Awareness Meeting
Current Workflow Process:
Goals
• Quantifiable Statements outlining a specific task
• Several are included in each document
• The most useful thing to look for in a strategic document
Current Workflow Process:
Organizational Hierarchy
[Organizational hierarchy diagram.⁸ Nodes: Dept. of Homeland Security; S&T Directorate; FEMA; TSA; Other; Individual Disaster Relief; Public Disaster Relief; Hurricane Relief.]
8 “DHS Component Websites”
Problem Specifics:
Affected Areas
• Time
• Documentation Gathering
• Data Analysis
• Scientific Process
Problem Specifics:
Time
• Only two Ph.D.s, Dr. Hester and his counterpart,
currently work on this process.
• The process is very time-consuming.
• It can take up to 12 hours to go through the problem
statement process.
Problem Specifics:
Documentation Gathering
• Documents will have “jargon” specific to a client’s
organization, which must be defined and understood
ahead of time.
• The client may not provide enough documentation,
which requires additional research.
• If a lack of documentation requires extensive research,
only one of the researchers will perform the entire
problem statement process.
Problem Specifics:
Data Analysis
• Large quantities of documents must be mentally
digested, assessed, and interrelated.
• Analysts must determine which aspects are achievable given
specific monetary and temporal restrictions.
• It is difficult to keep the problem space and
solution space separate and free of bias.
Problem Specifics: Scientific Process
• Identify and weight individual problem aspects in terms of
stated scope, goals, and resources.
• At this time, there are no defined “critique criteria” for
determining the problem statement.
• There is only limited scientific support for convincing the client that the
problem statement is accurate, which risks having to start over.
The Potential Solution: Characteristics
• Fully Automated
• Run Locally
• Web Querying Capabilities
• Unbiased
• Provides a Weighted Breakdown
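One way to picture a "weighted breakdown" is as a ranked list of candidate themes, each with its share of the total evidence. The sketch below is only an illustration under that assumption: the function name `weighted_breakdown`, the example terms, and the use of raw counts as evidence are all hypothetical and are not taken from the L.A.S.I. design.

```python
from collections import Counter


def weighted_breakdown(term_counts: Counter) -> list[tuple[str, float]]:
    """Turn raw counts into (term, share-of-total) pairs, highest share first."""
    total = sum(term_counts.values())
    if total == 0:
        return []
    # Sort by descending count, breaking ties alphabetically for stable output.
    ranked = sorted(term_counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, count / total) for term, count in ranked]


# Hypothetical counts of how often each candidate theme was mentioned.
counts = Counter({"funding": 6, "staffing": 3, "training": 1})
breakdown = weighted_breakdown(counts)
for term, weight in breakdown:
    print(f"{term}: {weight:.0%}")
```

Normalizing to shares that sum to 1 makes results from document sets of different sizes comparable, which is one plausible reason to present weights instead of raw counts.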
The Potential Solution: How would the process change?
• Decreased time for researching documents
If the customer does not provide documentation, the program will be able to search for it.
[Diagram excerpt: decision “Does the customer provide documentation?”; the “no” branch leads to Independent Research Required.]
The Potential Solution: How would the process change?
• Decreased time for reading and assessment
Reading and analyzing documents currently takes about 6-8 hours, and the total process
can take up to 12 hours.⁹
[Diagram excerpt: Manual Search for Strategy/Strategic Plans, Problem Statement, Mission/Vision/Goals, and Client Input/SA Meeting.]
9 Hester, Patrick
The Potential Solution: How would the process change?
• Provide justification for the analysts’ reasoning
Statistical and visual proof is needed when presenting the final problem statement.
[Diagram excerpt: Situational Awareness Meeting; Present findings to customer; Problem Statement.]
The Potential Solution: Possible Issues
• Incomplete Documents
• Computation Time
• Memory Usage
• Acronyms
The Competition: Overview
• Document parsing and linguistic analysis are both
popular subjects with many uses.
• They are used:
▫ In the field of Linguistics
▫ In the field of Computer Science
▫ By the Government
▫ By companies, as an aid to productive business
The Competition: Syntactic v. Semantic
Syntactic
• Logical grammar
• Statistical Data
• Alphabetical Frequencies
• Word Counts
• Parts of Speech
• Word Dependencies
Semantic
• Relating syntactic
structures to language-independent meanings
• Extracting meaning and
conceptual arguments
• Summarization
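To make the syntactic side concrete, the following minimal sketch computes two of the statistics listed above, word counts and alphabetical (letter) frequencies, using only the Python standard library. The function name `syntactic_stats` and the sample sentence are made up for illustration; none of the tools discussed here is claimed to work this way. Note that these counts say nothing about what the text means, which is the part a semantic tool has to supply.

```python
import re
from collections import Counter


def syntactic_stats(text: str) -> dict:
    """Return simple surface-level statistics for a piece of text."""
    lowered = text.lower()
    # Treat runs of letters (and apostrophes) as words; punctuation is dropped.
    words = re.findall(r"[a-z']+", lowered)
    letters = [ch for ch in lowered if ch.isalpha()]
    return {
        "word_count": len(words),
        "word_frequencies": Counter(words),
        "letter_frequencies": Counter(letters),
    }


# Hypothetical sample text.
sample = "The agency must improve efficiency. The agency lacks funding."
stats = syntactic_stats(sample)
print(stats["word_count"])
print(stats["word_frequencies"].most_common(2))
```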
The Competition:
Carrot2
• Type: Semantic Analysis Tool
• Input:
• Documents
• Search Engine Results
• Output:
• List of associations of words between different sources
• Two different visual representations of the data
• Note: This is what is currently used.
10 Carrot2
The Competition: WordStat
• Type: Syntactic Analysis Tool
• Input:
• Transcripts
• Websites
• Documents
• Books
• Taxonomy lists
• Output:
• User interface with plots and graphs
• Database
• Website
• Document
11 WordStat
The Competition: ReadMe
• Type: Syntactic Analysis Tool
• Input:
• Populations of Social Science Documents
• User-input categories
• Output:
• Unbiased estimations for topics in the input categories
• Opinions of thousands of people about a given topic
12 ReadMe: Software for Automated Content Analysis
A Unique Solution:
What the Competition Lacks
• Finding a single theme across multiple
documents
• Finding themes based on user-provided key terms
• Web querying capabilities to find organizationally
related documents
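The first two capabilities can be sketched together: given several documents and a list of user-provided key terms, rank the terms by how many documents mention them, so that a term appearing everywhere surfaces as a candidate common theme. This is a minimal illustration, not the L.A.S.I. algorithm; the function name `rank_key_terms`, the single-word matching, and the sample documents are all assumptions made for the example.

```python
import re
from collections import Counter


def rank_key_terms(documents: list[str], key_terms: list[str]) -> list[tuple[str, int, int]]:
    """Return (term, documents_containing_term, total_occurrences), best first."""
    doc_hits: Counter = Counter()
    total_hits: Counter = Counter()
    for text in documents:
        # Count single lowercase words; multi-word key terms are not handled here.
        words = Counter(re.findall(r"[a-z]+", text.lower()))
        for term in key_terms:
            count = words[term.lower()]
            if count:
                doc_hits[term] += 1
                total_hits[term] += count
    # Prefer terms found in more documents, then terms with more total hits.
    return sorted(
        ((term, doc_hits[term], total_hits[term]) for term in key_terms),
        key=lambda row: (-row[1], -row[2], row[0]),
    )


# Hypothetical documents and key terms.
docs = [
    "Funding delays slowed the relief effort.",
    "Staffing is adequate, but funding remains short.",
    "The report cites funding and staffing gaps.",
]
ranking = rank_key_terms(docs, ["staffing", "funding", "training"])
for term, n_docs, n_hits in ranking:
    print(f"{term}: in {n_docs} of {len(docs)} documents, {n_hits} mentions")
```

Ranking by document coverage before raw frequency keeps one long document from dominating the result, which fits the goal of finding a theme that spans the whole set.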
Conclusion:
Final Thoughts
• The ability to identify the right problem is necessary.
• The current process is inefficient and resource-intensive.
• L.A.S.I. will “sniff” out the problem by searching for a single
theme across a plethora of documents.
• Competitors can be outshone with additional help from
data mining.
• Automating the document analysis process will help ensure
that the problem statement is correct.
Works Cited
1 “National Centers for System of Systems Engineering.” National Centers for System of Systems Engineering. Web. 23 Sept. 2012. <http://www.ncsose.org/>.
2 “Patrick Hester.” Old Dominion University. N.p., n.d. Web. 24 Sept. 2012. <http://www.odu.edu/directory/people/p/pthester>.
3 Hester, P.T., and Meyers, T. (2012). Enterprise AID.
4 Hester, Patrick. Parsing Tool for Linguistic Analysis.
5 Duwairi, Rehab M. “A Framework for the Computerized Assessment of University Student Essays.” Computers in Human Behavior 22.3 (2006): 381-88. Web.
6 Jansen, James. “Using an Intelligent Agent to Enhance Search Engine Performance.” First Monday 2.3 (1997). Web.
7 Liang, Gin-Shuh, and Mao-Jiun J. Wang. “A Fuzzy Multi-Criteria Decision-Making Method for Facility Site Selection.” International Journal of Production Research 29.11 (1991).
8 “DHS Component Websites.” Homeland Security. N.p., n.d. Web. 26 Sept. 2012. <http://www.dhs.gov/>.
9 Hester, Patrick. Personal interview. 12 Sept. 2012.
10 Osinski, Stanislaw, and Dawid Weiss. Carrot2. 13 Aug. 2012. Web. 25 Sept. 2012. <http://project.carrot2.org>.
11 “WordStat.” Provalis Research. Web. 24 Sept. 2012. <http://provalisresearch.com/products/content-analysis-software/>.
12 “ReadMe: Software for Automated Content Analysis.” Web. 24 Sept. 2012. <http://gking.harvard.edu/node/4520/rbuild_documentation/readme.pdf>.