Analytics - Shidler College of Business

Big Data And Analytics
Challenges and Issues
Stephen H. Kaisler, D.Sc.
Frank J. Armour, Ph.D.
J. Alberto Espinosa, Ph.D.
William H. Money, Ph.D.
Presented at HICSS-49
January 5, 2016
Grand Hyatt, Poipu, Kauai, Hawaii
Who We Are
Stephen H. Kaisler, D.Sc.
Senior Associate
PCI Strategic Management
Columbia, MD
Stephen.Kaisler@pci-sm.com
Frank J. Armour, Ph.D.
Kogod School of Business
American University
Washington, DC
farmour@american.edu
3/12/2016
J. Alberto Espinosa, Ph.D.
Professor and Chair
Kogod School of Business
American University
Washington, DC
alberto@american.edu
William H. Money, Ph.D.
Associate Professor
School of Business Administration
The Citadel
Charleston, SC
wmoney@citadel.edu
Copyright (except where referenced) 2014-2016
Stephen H. Kaisler, Frank Armour, Alberto Espinosa and William H. Money
Outline
• Analytics: An Introduction (1300-1415)
• Break (1415-1445)
• Analytics: (1445-1600)
Tutorial Purpose
• Big Data and Analytics is thought to be about:
– Business Intelligence and Analytics
– Computational Science
• But, it is much more than that!
– Demographic Analysis
– Geointelligence: Spatial Analysis
– The Grand Challenges in Science
– Medicine: Processing 3-D hyperspectral high resolution images for diagnostics, genomic research, proteomics, etc.
– Media Analysis: Processing text, audio, video, imagery
– And much, much more …
• So, we want to introduce you to the issues and
challenges at the frontiers of Advanced Analytics!
The Business Analytics Landscape
• Strategies: Social, Email, Blogs, Video, Mobile; Marketing, Sales - Product Listing, Promotions
• Applications: ERP, CRM, Databases, Internal Applications, Customer/Consumer facing applications
• Context: Web, Customers, Products, Business Systems, Processes and Services
• Support Systems: CRM, Recommendation Systems, Data warehouses, Business Intelligence
Ref: S. Radhakrishnan, Advanced analytics: The Next Wave of Business Intelligence, Business Intelligence Conference,
2011
The Emerging Analytics Landscape
• Extending diagnostic analytics to different domains
• Developing new predictive and prescriptive analytics
based on advanced analytic techniques
– Prediction based on scenario development rather than just
probabilities
– Prescription based on advanced simulation and visualization
capabilities
• Development of Analytic Scientist curricula and degree
programs
• Expansion beyond the traditional business intelligence
applications and scientific applications based on
descriptive analytics.
What is Advanced Analytics?
• Advanced analytics:
– the application of multiple analytic methods that address the diversity of
big data – structured or unstructured –
– to provide descriptive results, and
– to yield actionable predictive and prescriptive results that facilitate
decision-making.
• Goes beyond data mining and statistical processing methods to
encompass logic-based methods, qualitative analytics, and nonstatistical quantitative methods.
• A diverse set of techniques that require new software architectures
and application frameworks to solve complex problems.
• New metrics that assess the value of the analysis as a holistic result
are required to evaluate the outcomes of advanced analytics.
Setting the Stage:
A Few Words
About Big Data
Big Data Definition
• Big Data is the amount of data just beyond technology’s
capability to store, manage and process efficiently.
“Ah, but a man’s reach should exceed his grasp, Or what’s a heaven for?” – Robert Browning
Big Data - Quick Recap
• Organizations have access to a wealth of information:
– They can’t get value out of it because most of it is sitting in raw,
semistructured, or unstructured form
– They don't even know whether it's worth keeping (or whether they are even
able to keep it, for that matter).
• Attributes:
– Gartner: Volume, Velocity, Variety
– Kaisler, Armour, Espinosa, Money: Value, Veracity
– Others have also been defined
• Value relies on Big Data processing being fast and agile
Hmmm!
The Data Scientist
Hal Varian, McKinsey Quarterly, January 2009:
“The sexy job in the next ten years will be statisticians… The ability to take
data—to be able to understand it, to process it, to extract value from it, to
visualize it, to communicate it—that’s going to be a hugely important skill.”
Ref: http://www.mckinseyquarterly.com/Hal_Varian_on_how_the_Web_challenges_managers_2286
“The critical job in the next 20 years will be the analytic scientist … the
individual with the ability to understand a problem domain, to understand
and know what data to collect about it, to identify analytics to process that
data/information, to discover its meaning, and to extract knowledge from it—
that’s going to be a very critical skill.”
- Kaisler, Armour, Espinosa, Money (2014) Amended
For both roles:
Analytic scientists require advanced training in specific domains, data science
tools, multiple analytics, and visualization to perform predictive and
prescriptive analytics. They may hold Ph.D.s, but pragmatic experience in a
domain will be equally important.
Some Big Data Issues Affecting Analytics
• Volume:
– How much data is really relevant to the problem solution? Cost of processing?
– So, can you really afford to store and process all that data?
• Velocity:
– Much data coming in at high speed
– Need for streaming versus block approach to data analysis
– So, how to analyze data in-flight and combine it with data at rest?
• Variety:
– A small fraction is in structured formats: relational, XML, etc.
– A fair amount is semi-structured, such as web logs, etc.
– The rest of the data is unstructured text, photographs, etc.
– So, no single data model can currently handle the diversity
• Veracity: cover term for …
– Accuracy, Precision, Reliability, Integrity
– So, what is it that you don’t know you don’t know about the data?
• Value:
– How much value is created for each unit of data (whatever it is)?
– So, what is the contribution of subsets of the data to the problem solution?
Types of Analytics
• Descriptive: A set of techniques for reviewing and examining the data
set(s) to understand the data and analyze business performance.
• Diagnostic: A set of techniques for determining what has happened and why.
• Predictive: A set of techniques that analyze current and historical data to
determine what is most likely to (not) happen.
• Prescriptive: A set of techniques for computationally developing and
analyzing alternatives that can become courses of action – either tactical or
strategic – that may discover the unexpected.
• Decisive: A set of techniques for visualizing information and recommending
courses of action to facilitate human decision-making when presented with
a set of alternatives.
[Figure: Passive to Active; Deductive (Descriptive, Diagnostic) to Inductive (Predictive, Prescriptive)]
Descriptive Analytics
• Process:
– Identify the attributes, then assess/evaluate the attributes
– Estimate the magnitude of each attribute to gauge its relative contribution
to the final solution
– Accumulate more instances of data from the data sources
– If possible, perform the steps of evaluation, classification and categorization
quickly
– Yield a measure of adaptability within the OODA (Observe, Orient, Decide, Act) loop
• At some threshold, cross over into diagnostic and predictive analytics
Image source: http://v1shal.com/content/25cartoons-give-current-big-datahype-perspective/
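As a rough illustration of the first two steps (identify the attributes, then estimate their relative contribution), the sketch below ranks each attribute by the magnitude of its Pearson correlation with an outcome. Treating correlation as the "contribution" measure is an assumption made for illustration, and the attribute names and records are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_attributes(records, outcome_key):
    """Rank attributes by |r| with the outcome, largest magnitude first."""
    outcome = [r[outcome_key] for r in records]
    attributes = [k for k in records[0] if k != outcome_key]
    scores = {a: abs(pearson([r[a] for r in records], outcome)) for a in attributes}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical sales records: ad_spend tracks revenue closely, store_age does not.
records = [
    {"ad_spend": 1.0, "store_age": 5.0, "revenue": 10.0},
    {"ad_spend": 2.0, "store_age": 3.0, "revenue": 19.0},
    {"ad_spend": 3.0, "store_age": 6.0, "revenue": 31.0},
    {"ad_spend": 4.0, "store_age": 4.0, "revenue": 40.0},
]
for attr, score in rank_attributes(records, "revenue"):
    print(f"{attr}: {score:.2f}")
```

Correlation only captures linear, pairwise association; it is a starting point for the descriptive stage, not a measure of causal contribution.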
Diagnostic Analytics
• Process:
– Begin with descriptive analytics
– Extract patterns from large data quantities via data mining
– Correlate data types for explanation of near-term behavior – past and
present
– Estimate linear/non-linear behavior not easily identifiable through other
approaches.
• Example: by classifying past insurance claims, estimate how many
future claims have a high probability of being fraudulent and
should be flagged for investigation.
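The insurance-claims example can be sketched with a deliberately simple frequency-based classifier: estimate the historical fraud rate per claim category, score each new claim by its category's rate, flag claims above a threshold, and sum the rates to estimate the expected number of fraudulent claims. The categories, the 0.5 threshold, and the data are hypothetical; a real diagnostic model would use many attributes and a proper classifier.

```python
from collections import defaultdict

def fraud_rates(past_claims):
    """Estimate P(fraud | category) from labeled past claims."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for category, is_fraud in past_claims:
        totals[category] += 1
        frauds[category] += int(is_fraud)
    return {c: frauds[c] / totals[c] for c in totals}

def flag_claims(new_claims, rates, threshold=0.5, default_rate=0.0):
    """Return the claim ids to investigate and the expected number of frauds."""
    flagged = []
    expected = 0.0
    for claim_id, category in new_claims:
        p = rates.get(category, default_rate)  # unseen categories get default_rate
        expected += p
        if p >= threshold:
            flagged.append(claim_id)
    return flagged, expected

# Hypothetical history: (category, was_fraud)
past = [("auto", True), ("auto", True), ("auto", False),
        ("home", False), ("home", False), ("home", True), ("home", False)]
rates = fraud_rates(past)  # auto: 2/3, home: 1/4
flagged, expected = flag_claims([(101, "auto"), (102, "home"), (103, "auto")], rates)
print(flagged, round(expected, 2))
```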
Predictive Analytics
• Process:
– Begin with descriptive AND diagnostic analytics
– Choose the right data based on domain knowledge and relationships among
variables
– Choose the right techniques to yield insight into possible outcomes
– Determine the likelihood of possible outcomes given initial boundary conditions
– Remember! Data-driven analytics is non-linear; do NOT treat it like an engineering
project
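One common way to "determine the likelihood of possible outcomes given initial boundary conditions" is Monte Carlo simulation. The sketch assumes a hypothetical random-walk demand model (a starting level, a per-period drift, and Gaussian noise) and estimates the probability that demand ends at or above a target; the model and every parameter are illustrative assumptions.

```python
import random

def outcome_probability(start, drift, volatility, periods, target, trials=10_000, seed=42):
    """Estimate P(final level >= target) by simulating many random paths."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    hits = 0
    for _ in range(trials):
        level = start  # the initial boundary condition
        for _ in range(periods):
            level += drift + rng.gauss(0.0, volatility)
        if level >= target:
            hits += 1
    return hits / trials

# Hypothetical: demand starts at 100 and grows about 2 per period, with noise, over 12 periods.
p = outcome_probability(start=100, drift=2, volatility=5, periods=12, target=120)
print(f"P(demand >= 120 after 12 periods) ~ {p:.2f}")
```

Changing the starting level or the noise term and rerunning is a cheap way to see how sensitive the predicted likelihood is to the initial conditions.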
Prescriptive Analytics
• Process:
– Begin w/ predictive analytics
– Determine what should occur and how to make it so
– Determine the mitigating factors that lead to desirable/undesirable outcomes
– “What-if” analysis w/ local or global optimization
– Ex: Find the best set of prices and advertising frequency to maximize revenue
– Ex: And, the right set of business moves to make to achieve that goal
“Make it so”
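The pricing example can be sketched as an exhaustive "what-if" search, a simple form of global optimization over a small discrete space of prices and weekly ad frequencies. The revenue response model and its coefficients are invented for illustration; in practice that model would come out of the predictive step.

```python
from itertools import product

def revenue(price, ads_per_week):
    """Hypothetical response model: demand falls with price, rises with ads
    (with diminishing returns), and each ad has a fixed cost of 50."""
    demand = max(0.0, 1000 - 40 * price) * (1 + 0.1 * ads_per_week - 0.005 * ads_per_week ** 2)
    return price * demand - 50 * ads_per_week

def best_plan(prices, ad_levels):
    """Evaluate every (price, ads) combination and keep the highest-revenue one."""
    return max(product(prices, ad_levels), key=lambda plan: revenue(*plan))

prices = [5, 10, 12.5, 15, 20]
ad_levels = range(0, 21)
price, ads = best_plan(prices, ad_levels)
print(price, ads, round(revenue(price, ads), 2))
```

Exhaustive search is only practical for small spaces; larger prescriptive problems typically need a proper optimizer, but the "evaluate alternatives, keep the best" structure is the same.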
Decisive Analytics
• Process:
– Given a set of decision alternatives, choose one course of action
from possibly many
– But, it may not be the optimal one.
– Visualize alternatives: whole or partial subset
– Perform exploratory analysis: what-if and why
• How do I get there from here?
• How did I get here from there?
Advanced Analytics:
Application of Analytics
To
Critical Problems
The Role of Analytics
• “Tools and techniques that gear the analyst’s mind to apply higher
levels of critical thinking can substantially improve analysis…
structuring information, challenging assumptions, and exploring
alternative interpretations.”
Richards Heuer, Jr., “Psychology of Intelligence Analysis”
• Beware Frege’s Caution:
– Converse Problems:
• If you magnify on details, you are losing the overview
• If you focus on the overview, you don’t see the details
– Problem with Data Mining:
• Applying statistics to understand the trends causes a loss of
grounding in the data
The Analytics Continuum
• Analytics problems span a continuum:
– Short-term analysis leads to quick fixes and quick results, which
may be unsustainable
– What are the disruptive innovations in the middle-term that
provide near-term domain leadership?
– Long-term analysis leads to strategic changes and innovations that
provide sustainable domain dominance.
Diagram (from push to pull):
• Finding a Needle in a Haystack: Top-Down Analysis (Deductive)
• A little bit of this; a little bit of that: Middle-Out Analysis (Abductive?)
• Spinning Straw into Gold: Bottom-Up Analysis (Inductive)
Analytics Classes
• Indications & Warning: Generally, indications of warfare, potential conflict, and
other crises, based on quantitative information found in open source datasets
• Dynamical Systems: Differential or difference equations of low dimensionality
representing competing actors (incl. system dynamics)
• (Hidden) Markov Models: Time-phased data aggregated at fixed intervals with scaled
values, separate from the underlying events, input to a set of discrete states w/
associated probabilities.
• Event Data Analysis: Analysis of abstracted and coded streams of short-term
interactions among competing or cooperating actors
• Econometric Models: Large-scale aggregate models of social actors, states or
organizations in economic and social systems – regional, national, international.
• Probabilistic Models: Regression and statistical models estimating the probability of
how variables will affect a specified outcome.
• Principal Components Analysis: Techniques for reducing high-dimensionality models to
a few critical dimensions to facilitate prediction and visualization.
• Game Theory Models: Application of 2-person and N-person game theory to competitive
and collaborative situations involving strategic interdependence.
• Logic Systems: Use of logical formulae and systems to represent and solve qualitative
problems, including deductive, abductive, and inductive techniques.
Ref: Kaisler and Cioffi-Revilla 2007; Kaisler, Armour, Espinosa, and Money 2014
Analytics Is About Discovery
Novelty Discovery
– Finding new, rare, one-in-a-[million / billion / trillion/ etc.] objects
and events
Class Discovery
– Finding new classes of objects and behaviors
– Learning the rules that constrain class boundaries
Association Discovery
– Finding unusual (improbable) co-occurring associations
Correlation Discovery
– Finding patterns and dependencies, which reveal new natural
laws or new scientific principles
Ref: Kirk Borne, Dynamic Events in Massive Data Streams, GMU
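Two of the discovery types above can be illustrated with textbook techniques: a z-score test for novelty discovery and pairwise lift for association discovery. These are stand-ins chosen for illustration, not methods prescribed by the tutorial; the 3-sigma threshold and the data are assumptions.

```python
from itertools import combinations
from statistics import mean, pstdev

def novelties(values, z_threshold=3.0):
    """Novelty discovery: indices of values more than z_threshold std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]

def lifts(transactions):
    """Association discovery: lift = P(a and b) / (P(a) * P(b)) for each co-occurring pair.
    Lift well above 1 marks a co-occurrence more frequent than chance."""
    n = len(transactions)
    item_counts, pair_counts = {}, {}
    for t in transactions:
        items = sorted(set(t))
        for item in items:
            item_counts[item] = item_counts.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_counts[pair] = pair_counts.get(pair, 0) + 1
    return {
        (a, b): (c / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), c in pair_counts.items()
    }

readings = [10.0] * 20 + [100.0]  # hypothetical sensor readings with one rare event
print(novelties(readings))        # -> [20]
baskets = [["bread", "butter"], ["bread", "butter"], ["bread"], ["milk"]]
print(lifts(baskets))
```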
The Goal of Analytics
http://timoelliott.com/blog/wpcontent/uploads/2013/04/Whats-the-ROI-ofknowing-what-you-dont-know.jpg
From sensors (data collection, measurement, observation, …)
to Monitoring and Alerting
to Sensemaking (Data and Analytics Science)
to Cents-Making (Getting to ROI!!)
Adapted from: Kirk Borne, Dynamic Events in Massive Data Streams, GMU
Sensemaking
• In the end, what analytics is really about is sensemaking:
– What does that event really mean to me/him/her/my friends/etc.?
– What is a plausible explanation?
• Sensemaking:
– Fits data into a frame or mental model
– Can be physical or social
– Requires situational awareness that helps us to adapt and respond
to known and unexpected or unknown situations
– Interpreting – something is there that is waiting to be discovered or
approximated
– Comparison to previous experience - retrospectively
– Requires a higher level of intellectual engagement, not a passive
translation
• Klein (2006) theorizes that sensemaking processes are initiated
when individuals or organizations recognize a lack of
understanding of events
Sensemaking Challenges
• Lillian Wu (IBM) has noted that everything is becoming:
– Instrumented: We now have the ability to measure, sense and see the exact
condition of practically everything.
– Interconnected: People, systems and objects can communicate and interact with
each other in entirely new ways.
– Intelligent: People, systems and objects can respond to changes quickly and
accurately, and get better results by predicting and optimizing for future events.
• How to deal with ambiguity?
• How to deal with too much data?
• Have we let algorithms and large centralized data centres not only control
the remembering but also the meaning and interpretation of the data?
(Giulia Forsythe, http://gforsythe.ca/big-data-sensemaking/)
• We know how to do massive data collection and have the ability to index,
curate, search and share.
– But, what seems to be missing is the ability to review, reflect, recall and ponder.
Ref: Technology, Data, Analytics, PSM workshop -- October 14, 2011
Sensemaking: Properties (Weick)
• Grounded in identity construction
– Know thyself & Know your enemies/friends, …
• Retrospective:
– Look back to look forward; Look forward to look back
• Social
– Who socialized you, how, and who will see the results
– Beliefs: I believe that you believe that I believe, etc.
• Ongoing and dynamic:
– Who or what changes over time and space
• Cues: what are the initiators?
• Plausibility rather than accuracy:
– Understanding only needs to be sufficient, not
comprehensive
Sensemaking: An Application
• Mobile Crowdsensing:
– Individuals with sensing and computing devices collectively share
information to measure and map phenomena of common interest within
communities of people
• Participatory sensing - individuals are actively involved in contributing sensor
data
• Opportunistic sensing - sensing is autonomous and user involvement is minimal
• Localized analytics at the device and near field
• Aggregate analytics at the central repository
• Privacy Issues: Potentially collecting sensitive sensor data pertaining to
individuals
Ref: Ganti, R.K., F. Le, and H. Lei. ??. Mobile Crowdsensing: Current State and Future Challenges,
IBM T.J. Watson Research Center, Hawthorne, NY
Knowledge-Centric Systems
• User-centric Systems: Systems That Know
– Knowledge-driven solutions that connect open information, share
open decision-making rules, deliver open composite services, access
and navigate information in context of use, and provide virtual
assistants that manage cases and complete tasks.
• Adaptive Systems: Systems That Learn
– Knowledge-driven solutions that feature modeling, collaboration, and
advanced analytics to detect patterns, make sense, simulate, predict,
learn, take action, and improve performance with use and scale.
• Smart Operations: Systems That Reason
– Knowledge-driven solutions that reason like experts, advise as
avatars, adapt, are autonomic, and perform autonomously.
These are the types of systems using advanced analytics.
The New Analytic Paradigm
#1: You will be expected to do something with information
#2: There really is more to know
#3: You will have to know more about knowing
#4: Brain science and decision science are converging
#5: The environment is changing our brain
#6: Information management is the essence of leadership
#7: A more connected world means much more data is
available (and accessible)
#8: Math matters (but so do logic and rules)
#9: There are significant downsides to not knowing
#10: Knowing can change the world
Source: Thompson May, The New Know: Innovation Powered by Analytics, 2009
Analytics Challenges
Finding A Needle in a Haystack
• With (all) the data available, find the/a key
pattern that indicates a situational change
– A single event
– Perhaps, a sequence of events
– (Not the signal in the noise problem!!)
• Have we seen this pattern before?
– Determine its characteristics, not just that it exists
• Predict what event occurs next because
this/these event(s) occurred in the pattern
• How to identify relevant fragments of data easily
from a multitude of data sources?
• Difficult to determine what the right answer is in
advance
Problem: The needle hasn’t grown as
fast as the haystack!!
Problem: We need new analytics
methods to deal with larger, more
complex data and problems!!
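"Have we seen this pattern before?" and "Predict what event occurs next" can be illustrated with a first-order Markov model (one of the analytics classes listed earlier) built from transition counts in an event history. The event names are hypothetical, and conditioning on only the single preceding event is a strong simplifying assumption.

```python
from collections import Counter, defaultdict

def build_transitions(events):
    """Count how often each event is immediately followed by each other event."""
    transitions = defaultdict(Counter)
    for current, nxt in zip(events, events[1:]):
        transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, current):
    """Return (most likely next event, estimated probability), or None if never seen."""
    followers = transitions.get(current)  # .get does not create an empty entry
    if not followers:
        return None  # unseen pattern: absence of evidence, not evidence of absence
    event, count = followers.most_common(1)[0]
    return event, count / sum(followers.values())

# Hypothetical event history from an access log
history = ["login", "transfer", "logout",
           "login", "browse", "logout",
           "login", "transfer", "logout",
           "login", "transfer"]
model = build_transitions(history)
print(predict_next(model, "login"))           # -> ('transfer', 0.75)
print(predict_next(model, "password_reset"))  # -> None
```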
Finding A Needle in a Haystack
• What if the “needle” happens to be a complex data structure?
– Brute force search and computation are unlikely to succeed due to inefficiency
– Complexity increases with streaming data as opposed to a static data set
• Absence of evidence (so far) is not evidence of absence! (Borne 2013)
• What preprocessing do we need to do before searching?
– Quality vs. Quantity: What data are required to satisfy the given value
proposition?
– At what precision, accuracy, and reliability?
• What if the needle must be derived rather than found?
– How do we track the provenance of the derived data/information?
– Is the process repeatable as we change algorithms and data structures?
Challenge: Consider finding the few packets in the millions (er, tens of
billions) flowing through a network that carry a virus or malware.
Needle(s) in a Haystack
[Figure: Blogs]
Uhmm! What are we supposed to glean from this picture?
Ref: Crawford, L. Access and Analytics to the UK Archive, British Library, 2010
Network Forensics
• Networks have become exponentially faster. They carry more traffic and
more types of data than ever before. Yet as they get faster, they become
more difficult to monitor and analyze.
– 40G Networks
– Richer Data: VOIP as the telephony standard
– Malicious security threats are more subtle
• Problems:
– Finding proof of a security attack
– Troubleshooting intermittent performance issues
– Identifying the source of data leaks
– Troubleshooting VOIP and Video over VOIP
• Network forensics must be:
– Precise: capture high-speed packets without dropping any
– Scalable: extend to new network technologies and speeds
– Flexible: adapt to heterogeneous network segments
– VOIP-Smart: reconstruct & replay VoIP calls; present Call Detail Records (CDR) for each call
– Continuously available: run 24/7 with adequate storage; support real-time analysis
Finding the Knees
• The knee of an algorithm or analytic is the scale value at which
performance begins to degrade as larger data volumes are processed.
– Every analytic method and algorithm can have one (or more?)
– Where positive slope increases begin to flatten out
– Where positive or flat slopes transition to negative slopes
• Factors affecting the knee:
– data structure, volume, and variety
– algorithm complexity and implementation, and
– infrastructure implementation.
• What is/are the corollaries for non-algorithmic analytics?
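A knee can be located from benchmark measurements by computing the slope of a performance measure between successive data volumes and reporting the first volume after which that slope flattens (falls below some fraction of the initial slope) or turns negative. Using throughput as the performance measure, the 10% flattening threshold, and the benchmark numbers are all assumptions for illustration.

```python
def find_knee(volumes, throughputs, flatten_ratio=0.1):
    """Return the data volume after which the throughput slope flattens or turns negative.

    flatten_ratio: a slope counts as 'flat' once it drops below this fraction
    of the first measured slope (an assumed, tunable threshold).
    """
    slopes = [
        (throughputs[i + 1] - throughputs[i]) / (volumes[i + 1] - volumes[i])
        for i in range(len(volumes) - 1)
    ]
    initial = slopes[0]
    for i, slope in enumerate(slopes):
        if slope <= 0 or slope < flatten_ratio * initial:
            return volumes[i]  # scaling stops paying off beyond this volume
    return None  # no knee within the measured range

# Hypothetical benchmark: records/sec processed as input volume (GB) grows.
volumes = [1, 2, 4, 8, 16, 32]
throughputs = [100, 190, 350, 420, 425, 300]
print(find_knee(volumes, throughputs))  # -> 8
```

Because the result depends on the threshold and on where measurements were taken, it is worth rerunning the benchmark on different infrastructure, since the knee is not a property of the algorithm alone.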
Finding the Tipping Point
• A tipping point is the point at which change in a system becomes
potentially irreversible and maybe even unstoppable.
– May be associated with negative or positive effects
– In social systems, a buildup to a critical mass at which point a seminal change occurs.
– Ex: MySpace was a formidable competitor of Facebook, but once Facebook
membership reached its “tipping point”, people started abandoning MySpace and signing up
for Facebook.
• Small events can create
ripple effects – may be linear
or non-linear, chaotic or
perturbative
• Concept of emerging
trends in the commercial
marketplace
• The explosion of a viral
infection into an epidemic
Ref: Choucri, N., et al. (2006) Understanding and Modeling State stability: Exploiting System Dynamics.
MIT Sloan Research Papers, No. 4574-06, Jan. 2006.
Spinning Straw Into Gold
• With (all) the data available, describe a situation in a generalized form such
that predictions for future events and prescriptions for courses of actions
can be made.
• Objective: Identify one or more patterns that characterize the behavior of
the system.
• Remember: All data has value to someone, but not all data has value
to everyone.
• Patterns may be unknown or ambiguously defined.
• Patterns may be morphing over time.
• The problem is sensemaking: the dual process of trying to fit data to a frame
or model and of fitting a frame around the data.
• Neither data nor frame comes first!
• Must evolve concurrently!
Dealing with Ambiguity
• Ambiguity arises in entity resolution:
– Not enough data to explicitly resolve two or more entities or objects
– 1st Degree ER: “who is who?” (within domain)
– 2nd Degree ER: “who knows whom?” (a graph/network analysis problem)
– 3rd Degree ER: cross-domain linkage (ontological resolution)
• Example: Text Processing and Understanding:
– Resolving ambiguity in human languages (much data is unstructured
text)
• Ex: The word “strike” has over 30 meanings in English
– Entity resolution is a multi-level process (Talburt 2009-2011)
• Ex: There are more than 45,000 people named “John Smith” in the U.S.
• Computational complexity increases with knowledge level.
– Tradeoff is end-to-end processing time versus number of entities to be
resolved
• Scaling may be problematic.
Talburt’s Hierarchy for Entity Resolution
• Deterministic Matching: Link/merge two entity mentions based on the degree of
similarity between the values of corresponding entity attributes.
– Ex: Direct match of names, addresses, other attributes
• Probabilistic Matching: Link/merge two entities based on corresponding attributes,
even if some have different values (but within the expected range).
– Ex: “John, Doe, 1989-08-13” and “Jon, Doe, 1989-08-13”
• Transitive Matching: Link/merge two entities based on corresponding attributes:
A matches B and B matches C, so A matches C, even if not all attributes match.
But, this may lead to false positives.
• Associative Matching: Link/merge two entities based on semantics and domain knowledge.
– Ex: If (Mary Smith, 123 Oak St), (Mary Smith, 456 Elm St), (John Smith, 123 Oak St),
and (John Smith, 456 Elm St) are entity mentions, none of the six possible pairings
of the four records agrees on both name and address. But, we may infer that these
are the same John and Mary Smith at both addresses.
• Assertive Matching: Link/merge two entities based on prior and derived knowledge to
reason about possible relationships between them. Different models may be used to
deduce/infer relationships.
– Ex: “The Mary Smith who lives with her brother and resided at 123 Oak St. now
resides at 456 Elm St.”
Ref: Talburt, J. (2009-2011) Reference Linking Methods, Identity Resolution Daily, Retrieved November 3, 2013 from
http://identityresolutiondaily.com/
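The probabilistic and transitive levels of the hierarchy can be sketched together. A pairwise rule tolerates small spelling differences in first names, using Python's `difflib.SequenceMatcher` ratio with an assumed 0.8 cutoff as a stand-in for a real probabilistic matcher. Union-find then takes the transitive closure of those pairwise matches. The mention data are hypothetical, and the false-positive caveat applies: one loose pairwise match can chain unrelated mentions together.

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    """Approximate string match (assumed similarity cutoff)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def mentions_match(m1, m2):
    """Pairwise rule: exact last name and birth date, approximate first name."""
    return m1[1] == m2[1] and m1[2] == m2[2] and similar(m1[0], m2[0])

def resolve(mentions):
    """Transitive matching: if A~B and B~C, put A, B, C in one entity (union-find)."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if mentions_match(mentions[i], mentions[j]):
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), []).append(mentions[i])
    return list(clusters.values())

# Hypothetical mentions: (first name, last name, birth date)
mentions = [
    ("John", "Doe", "1989-08-13"),
    ("Jon", "Doe", "1989-08-13"),
    ("Jonn", "Doe", "1989-08-13"),
    ("Jane", "Doe", "1991-02-01"),
]
for entity in resolve(mentions):
    print(entity)
```

Here "John" and "Jonn" fall below the cutoff when compared directly, yet end up in the same entity because each matches "Jon". That is transitive matching at work, for better or worse.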
Advanced Analytics
Crowdsourcing: A New Analytic?
• Tasks normally performed by employees are outsourced
via an open call to a large, self-selected community
• Some examples
– Netflix prize
– InnoCentive: solve R&D challenges
– DARPA Network Challenge
• Follows an AI blackboard model
• Distributed co-creation has become a mainstream
technology
• Issue: What metrics do we use to assess the results?
• Issue: How robust are the results?
• And, more issues to be addressed ….
Crowdsourcing: Wikipedia: An Example
The Death of the
Encyclopedia
Business Model
Ref: Mary Meeker. KPCB, presentation at All Things D.
Moving Analytics To the Edge
• Traditional analysis: data is stored and then analyzed
– Usually at some central location or a few distributed locations
– The cost and time to move large amounts of data may render it
obsolete or of little worth
• Moving analytics to the frontier of the domain:
– (Near) real-time analysis and decisions are required
– Streaming massive amounts of data is expensive, fraught with error:
microsecond latency, millions of events
– We just can’t store it all
– Perishability is a key factor
– May only really need the synthesized, aggregate information, not the raw data
– True data-driven analysis
• Example: Pushing analytics into cameras for images, full motion
video analysis, motion correction, 3D perception, …
Edge Device Analytics
• Filtering and compression processes are often tied to the
downstream analytical requirements
– Means the filtering and compression algorithms must be dynamic
• How close to the edge can we push the filtering and compression
algorithms?
– What (additional) data do these algorithms need to be effective?
– How do we measure the efficiency of these algorithms?
– Does the in situ hardware have the computational capacity to support
such algorithms?
– How much data correction can we do at the edges?
• Challenges:
– How can we summarize streaming data?
– How fast can we determine changes in the incoming data?
– How fast can we adapt to changes in the data stream?
– How fast can we affect the environment based on what we see?
NOTE: Not the same as IBM Edge Analytics!!
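One possible answer to "How can we summarize streaming data?" at the edge is to keep a constant-memory running summary and transmit only that upstream, not the raw readings. The sketch uses Welford's online algorithm for the running mean and sample variance; the `StreamSummary` class and its message format are hypothetical.

```python
import math

class StreamSummary:
    """Constant-memory summary of a numeric stream (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

    def to_message(self):
        """What an edge device might send upstream instead of the raw readings."""
        return {"n": self.count, "mean": self.mean, "var": self.variance(),
                "min": self.minimum, "max": self.maximum}

summary = StreamSummary()
for reading in [20.5, 21.0, 20.8, 35.2, 21.1]:  # hypothetical sensor readings
    summary.update(reading)
print(summary.to_message())
```

Welford's update avoids the numerical problems of accumulating raw sums of squares, which matters on long-running devices that never reset.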
Streaming Analytics
• Streaming data: analytics must (often) occur in real time,
as the data passes through the sensing/collecting
device:
– Allows you to identify and examine patterns of interest as the
data is being created.
– Can yield instant insight and immediate (re)action.
• “Real-time is for robots”
– Joe Hellerstein, Professor of Computer Science at UC Berkeley.
– “If you have people in the loop, it’s not real time. Most people
take a second or two to react, and that’s plenty of time for a
traditional transactional system to handle input and output.”
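Examining patterns "as the data is being created" usually means operating on a bounded window of recent events rather than a stored data set. The sketch below keeps a fixed-size sliding window and raises an alert when the window average departs from a baseline computed earlier from historical (at-rest) data. The window size, the 20% relative tolerance, and the baseline value are assumptions.

```python
from collections import deque

def monitor(stream, baseline, window=5, tolerance=0.2):
    """Yield (position, window_mean) whenever the mean of the last `window`
    values differs from the at-rest baseline by more than `tolerance` (relative)."""
    recent = deque(maxlen=window)  # the oldest value drops out automatically
    for position, value in enumerate(stream):
        recent.append(value)
        if len(recent) == window:
            window_mean = sum(recent) / window
            if abs(window_mean - baseline) > tolerance * baseline:
                yield position, window_mean

# Hypothetical: a baseline of 100 events/sec from historical data; a burst begins at position 6.
rates = [98, 101, 99, 102, 100, 97, 150, 160, 155, 158, 149]
for position, window_mean in monitor(rates, baseline=100):
    print(f"alert at {position}: window mean {window_mean:.1f}")
```

The window trades reaction speed for noise resistance: the burst starting at position 6 is only reported once it dominates the window, which is the kind of latency question the "real-time" discussion above raises.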
Location Analytics
•
•
What is it?
– Augmenting mission-critical, enterprise
business systems with complementary
content, mapping, and geographic
capabilities
– Mapping & Visualization: use maps as
the media to visualize data
– Spatial analytics: merging GIS w/ other
types of analytics
– Find spatio-temporal patterns indicative
of physical activities or social behavior
– Data/information enrichment: add
maps, imagery, demographics,
consumer and lifestyle data,
environment and weather, social
media, etc.
• Ubiquity of GPS on cellphones, cars, wristwatches, laptops, tablets, etc.
Ref: Kerr & Nelson, ESRI International User Conference, July 2012
Ref: http://www.esri.com/software/location-analytics
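A minimal sketch of two building blocks behind spatial analytics and spatio-temporal pattern finding, assuming raw GPS fixes arrive as (latitude, longitude, hour-of-day) tuples: the haversine great-circle distance and a simple grid-and-time-bucket count that surfaces hotspots. The function names, the 0.01-degree cell size, and the sample coordinates are hypothetical; production GIS work would use a proper map projection or spatial index.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatio_temporal_hotspots(fixes, cell_deg=0.01, hours_per_bucket=1):
    """Count GPS fixes per (lat cell, lon cell, time bucket), busiest first.

    fixes: iterable of (lat, lon, hour_of_day). Cells are fixed in degrees, so
    they narrow east-west away from the equator; acceptable for a sketch only.
    """
    counts = Counter()
    for lat, lon, hour in fixes:
        key = (math.floor(lat / cell_deg),
               math.floor(lon / cell_deg),
               hour // hours_per_bucket)
        counts[key] += 1
    return counts.most_common()


# Hypothetical fixes: two near-identical morning fixes and one afternoon fix.
fixes = [(21.3155, -157.8582, 9), (21.3158, -157.8585, 9), (21.4000, -157.7500, 14)]
print(spatio_temporal_hotspots(fixes)[0])
print(round(haversine_km(21.3155, -157.8582, 21.4000, -157.7500), 1))
```

Binning on space and time together is what lets a repeated morning cluster stand out from the same place visited at other hours; finer cells or buckets expose more detail but need more data per cell.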
Location Analytics:
After a Regime Change …
• Immediate transition from “normal operations in an urban
environment” to “government in place”
– Need Civil Affairs support for managing city/region relief efforts
• No existing analysis, planning, and management tool to assist civil
Government and Relief Officials
• A lack of a coherent view will lead to failure and, possibly, continuing
deterioration of government, law and order, civil systems, etc…
• Why?
– (nearly) complete destruction of governing regime
– Disintegration of social/governmental institutions: water, education,
health, law enforcement, financial, food
– Society reorganized overnight to adapt to the new power structure,
which may/may not include former government personnel
• Ex: Sadr handed out food, which created an immediate constituency and had people
talking about him
• Cannot win “hearts and minds” if you cannot “feed and house them”
Web Analytics
• What is it?
– Now: The study of the behavior of web users
– Future: The study of one mechanism for how society makes
decisions
– Example: Behavior of Web Users
• How many people clicked on Ebola (or related terms) in the past 2
months?
• Their location, their dwell time, the number of sites they examined,
the difficulty or complexity of the material on the web site
• What can this tell us about popular concern about Ebola?
• Can it help decision makers to better present information and
decisions?
– Commercially, it is the collection and analysis of data from a web
site to determine which aspects of the website achieve the
business objectives
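Dwell time, one of the measures listed above, is usually estimated from server or tag logs rather than observed directly. A minimal sketch, assuming a hypothetical log of (user, timestamp-in-seconds, URL) clicks: each page's dwell time is taken as the gap until that user's next click, and the 30-minute session timeout is an assumed convention, not something the slide specifies.

```python
from collections import defaultdict


def dwell_times(events, session_timeout=1800):
    """Total estimated dwell time (seconds) per URL from a clickstream.

    events: iterable of (user_id, unix_seconds, url) in any order.
    A page's dwell time is approximated by the gap until the same user's next
    click, so the last page of each session cannot be measured and is skipped.
    Gaps longer than session_timeout are treated as a new session.
    """
    by_user = defaultdict(list)
    for user, ts, url in events:
        by_user[user].append((ts, url))

    totals = defaultdict(float)
    for clicks in by_user.values():
        clicks.sort()
        for (ts, url), (next_ts, _) in zip(clicks, clicks[1:]):
            gap = next_ts - ts
            if gap <= session_timeout:
                totals[url] += gap
    return dict(totals)


# Hypothetical log: two users reading about Ebola.
log = [
    ("u1", 0, "/ebola"), ("u1", 120, "/ebola/symptoms"), ("u1", 300, "/cdc"),
    ("u2", 10, "/ebola"), ("u2", 70, "/news"), ("u2", 5000, "/ebola"),
]
print(dwell_times(log))
```

The blind spot on the final page of each visit is a known weakness of log-based dwell estimates, and one reason dwell time alone is a shaky proxy for the kind of concern or influence the slides ask about.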
Web Analytics
• Challenge: When people start getting more of their
information from the Web than from radio/tv/papers:
– How do we design web pages to influence the opinions of people?
• The web is not a neutral medium
– How do we measure the influence of a web page’s contents on user
opinions?
• Does # visitors translate into a viable influence metric?
• Does dwell time translate into influence?
• Are opinions more or less influenced as the visitor clicks through a sequence
of pages?
• How many different web sites (each of one or more pages) does a user
searching a particular topic click through?
– How do we measure the difficulty and/or complexity of the material
presented on a web page(s)?
• What is the counterpart of the Flesch-Kincaid score for web pages?
– How do we design “good” web pages to increase their Google
PageRank score?
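As a starting point for the readability question, the standard Flesch-Kincaid grade level is 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below simply applies that formula to a page's plain text; the syllable counter is a crude vowel-group heuristic (an assumption, not a dictionary lookup), and whether such a score captures web-specific complexity (layout, links, media) is exactly the open question the slide raises.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level of plain text (0.0 for empty input)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


easy = "The cat sat on the mat."
hard = "Epidemiological surveillance necessitates comprehensive interdisciplinary coordination."
print(round(flesch_kincaid_grade(easy), 2), round(flesch_kincaid_grade(hard), 2))
```

Applied to a web page, this would need the visible text extracted first (navigation, menus, and boilerplate stripped), which is itself a design decision that shifts the score.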
Visual Analytics
• Visual analytics: the science of analytical reasoning
facilitated by interactive visual interfaces
Where are the walkers going?
Remember: The eye can be easily fooled and, thereby, fool the brain.
Visual Analytics
• Visual analytics: an evolving discipline which is driving new ways of
presenting data and information to the user.
• Visual analytics: “the science of analytical reasoning facilitated by
interactive visual interfaces” (Thomas and Cook 2005)
• Visual analytics:
– the formation of visual metaphors in combination with a human
information discourse (interaction)
– that enables detection of the expected and discovery of the
unexpected within massive, dynamically changing information
spaces. (Wong and Thomas 2004)
• Visual analytics provides the “last 12 inches” between the masses of
information and the human mind that enables us to make decisions.
Visual Analytics
• Why is it hard?
– You can only see 2D because your screen is 2D
• To visualize k-dimensional data:
– Divide the screen into multiple 2D regions and show pair-wise
correlations across selected dimensions
– Project k-dimensional data into 2D
• Projection Methods (Dimension Reduction):
– PCA, MDS, LDA, LLE, and many others
– Many try to preserve in 2D the distances that exist in k-D
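To show what "project k-dimensional data into 2D" means for one of the listed methods, here is a minimal PCA sketch. It uses power iteration with deflation so that it needs only the Python standard library; a real system would call a linear-algebra library instead. The `distance_distortion` score is a hypothetical, illustrative way to check how well pairwise distances survive the projection, and the sample data is made up.

```python
import math
import random


def _center_and_cov(X):
    """Center the columns of X and return (Xc, sample covariance matrix)."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    Xc = [[row[j] - means[j] for j in range(k)] for row in X]
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(k)]
         for a in range(k)]
    return Xc, C


def _matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def _dominant_eigvec(C, iters=500, seed=0):
    """Power iteration for the leading eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(C))]
    for _ in range(iters):
        w = _matvec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pca_2d(X):
    """Project n points in k-D onto their top two principal axes."""
    Xc, C = _center_and_cov(X)
    axes = []
    for _ in range(2):
        v = _dominant_eigvec(C)
        lam = sum(a * b for a, b in zip(v, _matvec(C, v)))  # Rayleigh quotient
        axes.append(v)
        # Deflation: subtract the found component so the next pass finds the next axis.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    return [[sum(x * a for x, a in zip(row, axis)) for axis in axes] for row in Xc]


def distance_distortion(X, Y):
    """Mean relative change in pairwise Euclidean distance from X (k-D) to Y (2-D)."""
    errs = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            dx = math.dist(X[i], X[j])
            if dx > 0:
                errs.append(abs(dx - math.dist(Y[i], Y[j])) / dx)
    return sum(errs) / len(errs)


# Hypothetical data: 40 points on a 2-D plane embedded in 4-D.
rng = random.Random(1)
pts = [(rng.gauss(0, 3), rng.gauss(0, 1)) for _ in range(40)]
X = [[a, b, a + b, a - b] for a, b in pts]
print(distance_distortion(X, pca_2d(X)))
```

Because this data truly lies on a plane, PCA recovers it and distances survive almost exactly; for genuinely high-dimensional data the distortion grows, which is the "harder to understand what is being conveyed" problem on the next slide.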
Visualization Analytics:
A Periodic Table
Visual Analytics: Challenges
• Projection methods:
– How to choose which is best for a given problem
– Scaling
– Harder to understand what is being conveyed
• How to visualize non-numeric data, e.g. text, icons, or
images?
– Interactive multiple displays
– Change in one display begets a change in another to allow
exploration
• As k grows really large and the data types in the problem
space are mixed, what do we do?
– Remember Miller (1956): short-term memory holds only 7 ± 2 items
Visual Analytics: Challenges
• What if the data cannot fit on your computer?
– Truncate (sample, filter)
• Easy to implement; efficient; scalable
• Sampling is often data- or task-dependent
– Resolution reduction (“blurring”, image zooming)
• Fine details can be lost (get the big picture)
• Can zoom in on specific features (but lose forest for trees)
– Streaming:
• Inspect data in blocks (with or without overlapping windows)
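The three reduction strategies above can each be sketched in a few lines. This is an illustrative sketch, not a prescribed method: truncation is shown as reservoir sampling (a uniform sample from a stream of unknown length), resolution reduction as averaging runs of values, and streaming as fixed-size, optionally overlapping windows. All function names and parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive bounds: keep item i with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample


def windows(stream, size, step):
    """Yield blocks of `size` items, advancing by `step` (1 <= step <= size).

    step < size gives overlapping windows; a trailing partial block is dropped.
    """
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) == size:
            yield list(buf)
            buf = buf[step:]


def block_mean(values, factor):
    """Resolution reduction: replace each run of `factor` values by its mean."""
    out = []
    for i in range(0, len(values), factor):
        chunk = values[i:i + factor]
        out.append(sum(chunk) / len(chunk))
    return out


print(len(reservoir_sample(range(100_000), 100, seed=7)))
print(list(windows(range(6), size=4, step=2)))
print(block_mean([1, 2, 3, 4, 5, 6, 7], 3))
```

Each function trades detail for size in a different way: sampling keeps exact values but drops rows, block averaging keeps every region but blurs fine detail, and windowing keeps everything but only a slice is visible at once.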
Visual Analytics: Affecting Factors
• Spatial Ability
• Cognitive Workload/Mental Demand
• Personality
• Experience (novice vs. expert)
• Emotional State
• Perceptual Speed
• … and more
Conclusions:
- Computer must be more aware of the user
- Computer must develop a model of the user’s behavior
- Develop a symbiotic environment for data exploration
10 Exascale VA Challenges - I
• In-Situ Analysis
– Beyond petabytes, storing data and then retrieving it later for visualization may not be feasible. Develop new algorithms for in-situ VA to greatly reduce I/O, and perform VA concurrently with data analysis.
• User-driven Data Reduction
– While data volumes grow rapidly, human cognitive capabilities remain unchanged. Provide flexible, interactive user-control mechanisms for dynamically filtering data for VA.
• Multilevel Hierarchy
– Hierarchy depth and complexity grow with data volume. New algorithms are required for the transformation and traversal of multilevel hierarchies.
• Representing Evidence and Uncertainty
– Evidence synthesis and uncertainty quantification are usually united through visualization. The challenge is how best to present evidence and uncertainty without introducing significant bias.
• Heterogeneous Data Fusion
– Extreme-scale problems are often heterogeneous, with complex structures. New algorithms are required for the fusion of heterogeneous data objects.
Ref: Wong, Shen, Johnson, Chen and Ross (2012)
10 Exascale VA Challenges - II
• Data Summarization and Triage for Interactive Query
– Analyzing entire exascale data sets is likely impractical. New tools are required for interactive filtering of data for analysis and for display of selected, relevant data.
• Temporally Evolved Features
– Develop tractable algorithms for VA of temporal streams.
• Mitigate the Human Bottleneck
– Find ways to compensate for human cognitive limitations; this requires understanding how the brain processes visual images.
• New VA Frameworks for HPC
– Design and develop new frameworks with open APIs for interaction and UI that do not constrain HPC systems.
• Replace Conventional Wisdom
– New ideas are required for VA methodologies.
Ref: Wong, Shen, Johnson, Chen and Ross (2012)
Context-Aware Computing
• Arose from ubiquitous computing in the early 90s: computing
everywhere and “invisible”
• Ex: Active Badges
– Problem: locating researchers
– Solution: badge tied to identity, tracked as researcher moves in
building
– Xerox PARC pioneered this idea in 90s under Mark Weiser
– Now used in museums and other venues to “customize” the user
experience
– Can be based on RFIDs embedded in badges
What Is Context?
• By example
– Location, time, identities of nearby users …
• By synonym
– Situation, environment, circumstance
• By dictionary [WordNet]
– the set of facts or circumstances that surround a situation or event
• “Context is any information that can be used to
characterize the situation of an entity. An entity is a
person, place, or object that is considered relevant to the
interaction between a user and an application, including
the user and the application themselves.” [Dey and
Abowd, 2000]
Personal AI: Context-Aware Computing
• No consensus on what it is!
• “A system is context-aware if it uses context to provide relevant
information and/or services to the user, where relevancy depends on
the user’s task.”
• For our purposes, it is symbiotic computing:
– A partnership between human and machine where each is aware of the
other – capabilities and limitations
– Each is a reasoning autonomous adaptive entity
– Each may initiate an activity and propose/dispose of ideas
• Contradicts Weiser’s first view of Ubiquitous Computing
– Computing recedes into the background
• Need to define what Context is:
– Location, time, situation, data/information
Context-Aware Requirements
• John McCarthy defined the key ideas (for a robot), but
we adapt them to a general situation:
– The system observes its physical environment, recognizes the
status of its effectors, notices the relation of itself to the
environment and notices the values of important internal
variables, e.g. the state of its power supply and of its
communication channels.
– Observes that it does/does not know the value of a certain term
• E.g., observing whether it knows the telephone number of a certain
person.
– Observing that it does know the number or that it can get it by
some procedure is likely to be straightforward.
– Keeping a journal of physical and intellectual events so it can
refer to its past beliefs, observations and actions.
– Observing its goal structure and forming sentences about it.
• Notice that merely having a stack of subgoals doesn't achieve this
unless the stack is observable and not merely obeyable.
Context-Aware Requirements
• John McCarthy continued:
– The entity may intend to perform a certain action.
• It may later infer that certain possibilities are irrelevant in view of its
intentions.
• This requires the ability to reflect on its intentions.
– Observing how it arrived at its current beliefs.
• Most of the important beliefs of the system will have been obtained by
nonmonotonic reasoning, and therefore are usually uncertain.
• It will need to maintain a critical view of these beliefs, i.e., believe metasentences about them that will aid in revising them when new information
warrants doing so.
• It will presumably be useful to maintain a pedigree for each belief of the
system so that it can be revised if its logical ancestors are revised.
• Reason maintenance systems maintain the pedigrees but not in the form of
sentences that can be used in reasoning.
• Neither do they have introspective subroutines that can observe the
pedigrees and generate sentences about them.
Context-Aware Requirements
• John McCarthy continued:
– A system should be able to answer the questions “Why do I
believe p?” or, alternatively, “Why don't I believe p?”.
– Contexts need to be modeled as objects that represent mental
states of events/things/people in the world around it.
– The ability to transcend one's present context and think about it
as an object is an important form of introspection.
– Knowing what goals it can currently achieve and what its choices
are for action.
• He claims that the ability to understand one's own choices
constitutes free will.
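McCarthy's pedigree and "Why do I believe p?" requirements can be illustrated with a toy belief store. This is a loose sketch in the spirit of a justification-based reason maintenance system, not McCarthy's logical formalism: each derived belief records its antecedents, beliefs are recomputed from those pedigrees on every query (so retracting an ancestor withdraws its descendants), and explanations come back as nested tuples rather than sentences the system can reason with. The class, its methods, and the telephone-directory facts (echoing McCarthy's telephone-number example) are hypothetical.

```python
class BeliefStore:
    """Toy belief base that keeps a pedigree (justification) for each derived belief."""

    def __init__(self):
        self.premises = set()  # beliefs accepted directly (observations, defaults)
        self.rules = {}        # derived belief -> tuple of antecedent beliefs

    def observe(self, p):
        self.premises.add(p)

    def retract(self, p):
        self.premises.discard(p)

    def derive(self, conclusion, *antecedents):
        self.rules[conclusion] = tuple(antecedents)

    def believes(self, p, _path=frozenset()):
        """p holds if it is a premise or all antecedents in its pedigree hold.

        Beliefs are recomputed from pedigrees on every query, so retracting a
        logical ancestor automatically withdraws everything derived from it.
        """
        if p in self.premises:
            return True
        if p in _path or p not in self.rules:
            return False  # unsupported, or only supported by a circular pedigree
        return all(self.believes(a, _path | {p}) for a in self.rules[p])

    def why(self, p):
        """Answer "Why do I believe p?" as a nested explanation, or None."""
        if not self.believes(p):
            return None
        if p in self.premises:
            return (p, "premise")
        return (p, [self.why(a) for a in self.rules[p]])

    def why_not(self, p):
        """Answer "Why don't I believe p?" with the missing support."""
        if self.believes(p):
            return []
        if p not in self.rules:
            return [p]  # neither observed nor derivable
        return [a for a in self.rules[p] if not self.believes(a)]


kb = BeliefStore()
kb.observe("directory lists Smith")
kb.observe("directory is current")
kb.derive("I know Smith's number", "directory lists Smith", "directory is current")
print(kb.why("I know Smith's number"))
kb.retract("directory is current")
print(kb.why_not("I know Smith's number"))  # ['directory is current']
```

Recomputing on demand keeps the sketch short but costs time on large belief bases; classic reason maintenance systems instead cache belief status and propagate changes incrementally.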
Context-Aware Requirements
• On the computer side of the partnership:
– System must be dynamically extensible:
• One can incorporate a new module with new functionality,
incrementally, without the need for recompilation
• Indeed, the system environment should be (nearly) wholly self-contained.
– Self-Modifying:
• Ability to generate new or revised modules that change the
functionality of the system
• Modules may affect any representation or control mechanisms of
the system
– Self-Adapting:
• System must be able to integrate new modules into its
computational repertoire without ceasing operations, except for
snapshotting and restarting critical processes.
• Safety and health of the system must be ensured.
• Note: Humans already do this, but not very well at times.
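The extensibility and self-adaptation requirements can be sketched with a hypothetical module registry. Because Python functions are first-class objects, a module can be added or replaced while the registry keeps serving requests, with no recompilation; an optional health probe guards each swap so a faulty module never replaces a working one, and a journal records what was installed. `ModuleRegistry`, `install`, `run`, and `probe` are illustrative names, and loading code from files (for example via `importlib`) is left out.

```python
import threading


class ModuleRegistry:
    """Hypothetical registry for adding or replacing analytic modules at run time."""

    def __init__(self):
        self._modules = {}
        self._lock = threading.Lock()
        self.history = []  # journal of (name, version) installs

    def install(self, name, fn, version, probe=None):
        """Swap fn in under name; keep the old module if the health probe fails."""
        if not callable(fn):
            raise TypeError(f"module {name!r} must be callable")
        if probe is not None:
            try:
                healthy = bool(probe(fn))
            except Exception:
                healthy = False
            if not healthy:
                return False
        with self._lock:
            self._modules[name] = fn
            self.history.append((name, version))
        return True

    def run(self, name, *args, **kwargs):
        with self._lock:
            fn = self._modules[name]  # look up under the lock, call outside it
        return fn(*args, **kwargs)


registry = ModuleRegistry()
registry.install("summarize", lambda xs: sum(xs) / len(xs), "1.0")
print(registry.run("summarize", [1, 2, 3]))
# Replace the module while the registry stays in service; the probe guards the swap.
registry.install("summarize", max, "2.0", probe=lambda f: f([1, 2, 3]) == 3)
print(registry.run("summarize", [1, 5, 3]))
```

The probe is the sketch's stand-in for "safety and health of the system must be ensured"; a real self-modifying system would also need rollback, versioned state, and the snapshot/restart step the slide mentions.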
Context-Aware Computing
• Challenges:
– New situations don’t fit examples
– How to use in practice?
• Presentation to user
• Types of analytics
– How to model context in a computational
environment?
• Spatial
• Temporal
• Dynamically changing: velocity, volume, …
• Is it a Big Data problem?
Advanced Analytics:
Critical Challenges
What Are Grand Challenges?
• Definition: A specific scientific or technological innovation that would
remove a critical barrier to solving an important domain problem with a
high likelihood of global impact and feasibility.
– Provide scope for engineering ambition to build something that has never
been seen before.
– Generally comprehensible, and capture the imagination of the general
public, as well as the esteem of scientists in other disciplines.
– Go beyond what is initially possible, and require development of
understanding, techniques and tools unknown at the start of the project.
– Since these first appeared in the 80s, they abound in every science and
discipline!
• Not just a restatement of the many “big problems” facing the world today
– Are Grand Challenges equivalent to Wicked Problems?
• A tool for focusing investigators working towards overcoming one or
more bottlenecks in a foreseeable path toward a solution to significant
domain problems.
Some Grand Challenges
• Modeling Our Planet’s Systems:
– Assessing global warming and determining mitigating actions
• Confronting Existential Risk:
– What is the impact of a dangerous genetically modified pathogen?
• Exploring Transhumanism:
– What is the impact of embedded nanotechnology, genetic
therapy, and “smart” prosthetics?
• The Singularity?
– What happens when systems approach the level of human
intelligence? Emotional intelligence?
• Dealing Effectively with Globalism:
– Modeling the interconnectedness of human societies/organizations
Ref: Martin, J. “The Meaning of the 21st Century: A Vital Blueprint for Ensuring Our Future”, Jan 2007
Wicked Problems
• In 1973, Horst Rittel and Melvin Webber formally described the
concept of wicked problems.
• Conventional problem solving methods, rooted in 18th century
physics, economics and engineering, focused on efficiency.
• Societal problems are fundamentally different from the types of
problems that scientists and engineers deal with.
• Societal problems are wicked problems.
Tenets of Wicked Problems
• No general agreement on what the problem is.
• You don’t understand the problem until you develop a solution.
• Wicked problems have no stopping rule.
• Solutions to wicked problems are not right or wrong.
• Every wicked problem is essentially novel and unique.
• Every solution to a wicked problem is a “one-shot operation”.
• Wicked problems have no given alternative solutions.
• Causes and Effects are Elusive
• Sensitive to Initial and Boundary Conditions (History)
Figure 3. Wicked Problems (Conklin, 2005)
Some Wicked Problems
• Infrastructure Resilience
• Climate Change
• “Peaking” Oil or Coal: When does it run out?
• The “Long War”: Is there an end to terrorism?
• Sustainable Cities and Ecosystems
• Sustainable Development in the Third World
• Affordable Health Maintenance for an Aging Society
• Transitioning to Democracy and Beyond
– Predicting the next “Arab Spring”
• Biological and Genetic Threats and Opportunities
• Reducing the U.S. Debt
• Discover drugs that minimize disease-resistant micro-organisms
Analytics Challenges
• The Google Property Questions:
– Can analyses improve with more data to process?
– Can analyses improve with more detailed analytics that we use?
• Kaisler, Armour, Espinosa, Money:
– Can analyses improve with better system and environment
models?
– How do we measure value of an analytic?
– What is the limit for value as we add more data?
– Can good algorithms, models, heuristics overcome data quality
problems?
– With more data to analyze, can Big Data improve decision-making? And, by how much (e.g., how do we measure it?)
Challenge: Population Imbalance
• Events of interest occur relatively infrequently in very large datasets.
• However, non-interesting events occur more frequently:
– May require {additional, extensive} computational effort.
• Reasons for Imbalance:
– Underrepresented data/severe class distribution skew:
• Large regions of the problem space may be covered sporadically or not at all
by the observer or observation instruments.
• This skew degrades the precision and performance of data mining/machine learning
algorithms (He and Garcia 2009).
– Data collection may be (usually is) imperfect.
– Data are often beset with noise.
– Data may be missing in longitudinal or temporal sequences.
• Enough relevant data of good quality may not be available to permit
robust analysis.
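A small illustration of why imbalance matters, with made-up data: when 10 of 1,000 records are events of interest, a model that never flags anything is 99% accurate yet finds none of them. The sketch then shows random oversampling, one of the simplest remedies covered in the imbalanced-learning literature (He and Garcia 2009 survey it alongside more sophisticated methods). Function names are hypothetical; oversampling should be applied to training data only, and duplicating rows can encourage overfitting to those rows.

```python
import random
from collections import Counter


def accuracy_and_recall(y_true, y_pred, positive=1):
    """Overall accuracy and recall on the positive (rare) class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    positives = sum(1 for t in y_true if t == positive)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    return correct / len(y_true), (hits / positives if positives else 0.0)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes match the majority."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Hypothetical data: 10 events of interest among 1,000 records.
y = [0] * 990 + [1] * 10
X = list(range(1000))
print(accuracy_and_recall(y, [0] * 1000))  # a "never alarm" model looks 99% accurate
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Reporting recall (or precision) on the rare class, rather than overall accuracy, is what exposes the problem in the first place.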
Challenge: Data Analysis
• Feature Selection: Information is distributed in a complex
way across many features.
• Mitigating False Alarms: Target patterns are
ambiguous/unknown; “squelch” settings are brittle;
cannot prevent false positives
• Domain Drift: Target patterns change/morph over time
and across operational modes (processing methods
become “stale”)
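The false-alarm problem follows directly from Bayes' rule when targets are rare: the share of alarms that are real is P(event | alarm) = d·b / (d·b + f·(1 − b)), where b is the base rate, d the detection rate, and f the false-alarm rate. A minimal sketch with hypothetical numbers:

```python
def alarm_precision(base_rate, detection_rate, false_alarm_rate):
    """P(event | alarm) via Bayes' rule.

    base_rate: fraction of records that are true events of interest
    detection_rate: P(alarm | event), i.e. sensitivity
    false_alarm_rate: P(alarm | no event)
    """
    true_alarms = detection_rate * base_rate
    false_alarms = false_alarm_rate * (1.0 - base_rate)
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical detector: catches 99% of events, false-alarms on 1% of normal records,
# and events occur in 1 of every 10,000 records.
p = alarm_precision(base_rate=1e-4, detection_rate=0.99, false_alarm_rate=0.01)
print(f"{p:.2%} of alarms are real")
```

Even this seemingly excellent detector produces alarms that are real just under 1% of the time, which is why tightening a "squelch" setting alone cannot make false positives go away.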
Challenge: Ethical Problems
• Should we use data without the permission of individual
owners, such as copying publicly available data?
– What is tacit permission and approval, anyway?
• Should we be required to inform individuals when we use
their data?
– Do they really own it? Or, how much of it?
• Should we (be required to) check the accuracy of what is
posted on a publicly available web site before using it?
• What rules and regulations should exist about combining
data about individuals into a central repository?
– Does aggregation exceed permissible need to know about an
individual?
Challenges: Data Annotation
• The semantic web is largely unrealized:
– Much metadata is lousy
– Standardization of models and labels is a major issue
– Integrating ontologies and vocabularies is a critical problem
• No standards for social tagging, science tagging, etc.
• Tagging only works if many are tagging
• What is the Quality of the Result if the Quality of the Data/Metadata
is poor?
• Mark Greaves: “If we don’t have semantic convergence, then
semantics isn’t a differentiator”
• The “information provenance” problem
Challenges: End-to-End Systems
• Wicked Problems will require an end-to-end system approach:
– Multiple analytics: cascade or mesh or other topology for the analytic
architecture
• Why? Different types of data (may) require different approaches
– Need a robust, reliable computing/infrastructure environment
• Pipelines are a relatively simple model; they may not be adequate for the
complexity of the problem
• Must consider end-to-end behavior:
– Bottlenecks
– Data conversion, transfer, & loading overheads
– Storage costs & other parts of the data life-cycle
– Resource management challenges
– Total Cost of Ownership (TCO)
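Finding bottlenecks starts with measuring each stage of the end-to-end flow rather than only the analytic itself. A minimal sketch, assuming a simple linear pipeline of named stages in one process (the stage names and the deliberately slow `convert` step are hypothetical); it captures only compute time, not storage, transfer, or the other TCO factors listed above.

```python
import time


def run_pipeline(batch, stages):
    """Run a batch through named stages in order, timing each stage.

    stages: list of (name, fn) pairs with unique names; each fn takes and
    returns a batch. Returns (result, timings) mapping stage name -> seconds.
    """
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        batch = fn(batch)
        timings[name] = time.perf_counter() - start
    return batch, timings


def bottleneck(timings):
    """Name of the slowest stage."""
    return max(timings, key=timings.get)


def convert(rows):
    time.sleep(0.05)  # stand-in for an expensive format conversion
    return [r * 2 for r in rows]


result, timings = run_pipeline(range(10), [("load", list), ("convert", convert), ("analyze", sum)])
print(result, bottleneck(timings))
```

In this toy run the conversion step dominates, not the analysis, which mirrors the point that data conversion and loading overheads often matter more than the analytic.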
Challenges: Data Sources
• Tolerant Analysis – you are typically doing open-world reasoning
– Things go away
– Contradiction is present
– Data is incomplete and may be erroneous/noisy
– Surveying the entire KB is often not possible (it is too large!)
– Need substantial domain knowledge to reason effectively
• Both deep and shallow knowledge
• Multiple linked ontologies & data sources:
– Single ontologies are feasible only at the organizational level
– How to reconcile multiple authors and overlapping data sources
– Contain both private and public knowledge w/ security, privacy &
separation issues
• Is this equivalent to the multilevel security problem?
• Heterogeneity and Incompleteness:
– Humans are very tolerant of heterogeneous data; computers are not
– Need to perform dynamic cleansing, curation, and transformation
– How to build programs that accept heterogeneous data
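One small piece of "programs that accept heterogeneous data" is dynamic normalization of records whose sources disagree on field names and formats. A minimal sketch, assuming a hypothetical two-field target schema (name, birth date): field aliases and accepted date formats are illustrative assumptions, unrecognized fields are preserved under `extra` rather than discarded, and unparseable dates become `None` instead of a guess. The order of `DATE_FORMATS` decides how ambiguous dates such as 12/10/1815 are read.

```python
from datetime import datetime

# Hypothetical field aliases and date formats seen across sources.
FIELD_ALIASES = {
    "name": "name", "full_name": "name", "fullname": "name",
    "dob": "birth_date", "birthdate": "birth_date", "date_of_birth": "birth_date",
}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")  # order resolves ambiguity


def parse_date(value):
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None  # unparseable: keep the gap explicit instead of guessing


def normalize(record):
    """Map a source record onto one schema; unknown fields are kept under 'extra'."""
    out = {"name": None, "birth_date": None, "extra": {}}
    for key, value in record.items():
        field = FIELD_ALIASES.get(key.strip().lower())
        if field == "name":
            out["name"] = " ".join(str(value).split()).title()
        elif field == "birth_date":
            out["birth_date"] = parse_date(str(value))
        else:
            out["extra"][key] = value
    return out


print(normalize({"Full_Name": "  ada   LOVELACE ", "DOB": "1815-12-10"}))
print(normalize({"name": "Ada Lovelace", "birthdate": "12/10/1815", "source": "census"}))
```

Hard-coded alias tables only go so far; as sources multiply, this is where the multiple-ontology reconciliation problem above reappears.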
Challenge: Metrics
Where is the ROI?
• ROI (Return on Investment) is not always immediately obvious
• Results of analytics may be available only after years of following
the prescription
• Requires long-term effort(s) to develop a sustainable capability
• Examples:
– Health: moving from predictive to preventative health care
– Health: enabling personalized medicine for shortened time to value
– Health: recognizing and predicting the spread of infectious diseases
(Ebola)
– Crime: aggressively recognizing and combating syndicated, multiparty
fraud online
– Crime: predicting potential crime locales and time to preventatively
deploy police
– Environment: predicting weather, floods, earthquakes, volcanic
eruptions earlier
– Computer Security: surveying systems to predict potential for attacks
Advanced Analytics
Key Challenges Today for Tomorrow
• Analytic Scientists or lack thereof
– How are we going to train them?
– How many do we need?
• Potential to end up like “data mining” (shudder)
– The Big Data mantra: “80% of the effort is in extracting, moving,
cleaning, and preparing the data, not actually analyzing it.”
• Don’t disregard Traditional Analytics:
– Big Data Analytics and Advanced Analytics will be side by side
for years
– We need an analytics capability beyond what is offered by
traditional business analytics
Analytics: Transformative Science
• Making Analytics a Transformative Science:
– Computationally tractable reasoning algorithms
– Explicit models for prediction, prescription and decision versus
implicit or embedded models in technology
– Enabling technologies and infrastructures for modeling and
analysis, including interoperability interfaces and standards:
multiple suites of analytic tools
– Model validation and verification
– Ensuring availability of appropriate, quality data
– Uncertainty quantification and predictability of outcomes
– Community-developed versus custom-made software
– Peer review for analytics and modeling research – open
repositories
– Analytic Method Capture and Reuse
Is this ….?
Oooops! We mean the Exabyte Age!
Questions
Who We Are
Stephen H. Kaisler, D.Sc.
Senior Associate/PCI Strategic Management and Principal/SHK & Associates
Columbia, MD/Laurel, MD
skaisler1@comcast.net
Dr. Stephen Kaisler is currently a Senior Associate at PCI Strategic Management and a Principal in SHK & Associates.
He has previously worked for DARPA, the U.S. Senate and a number of small businesses. Dr. Kaisler has worked
with big data, MapReduce technology, and advanced analytics in support of the ODNI CATALYST program. He
has been an Adjunct Professor of Engineering since 2002 in the Department of Computer Science at George
Washington University. Recently, he has also taught enterprise architecture and information security in the GWU
Business School. He earned a D.Sc. (Computer Science) from George Washington University, an M.S. (Computer
Science) and B.S. (Physics) from the University of Maryland at College Park. He has written or co-authored seven
books and published over 38 technical papers.
William H. Money, Ph.D.
School of Business Administration
The Citadel
wmoney@citadel.edu
William Money joined the Citadel as Associate Professor of Business Administration in 2014. Previously, he was with
the George Washington University School of Business faculty, which he joined in September 1992 after acquiring
over 12 years of management experience in the design, development, installation, and support of management
information systems (1980-92). His publications and recent research interests focus on information system
development tools and agile software engineering methodologies, collaborative solutions to complex business
problems, program management, business process engineering, and individual learning. He developed teaching
and facilitation techniques that prepare students to use collaboration tools in complex organizations and dynamic
work environments experiencing significant change. Dr. Money has a Ph.D., Organizational Behavior 1977,
Northwestern University, Graduate School of Management; the M.B.A., Management, 1969, Indiana University;
and a B.A., Political Science, 1968, University of Richmond.
Who We Are
Frank Armour, Ph.D.
Kogod School of Business/American University
farmour@american.edu
Dr. Armour is an independent senior IT consultant and Research Fellow at the Center for Information Technology
in the Global Environment (CITGE), Kogod Business School, American University. Dr. Armour has extensive
experience applying advanced information technology. His work and research includes business and
requirements analysis, enterprise architectures, Systems Development Life Cycle (SDLC), and
object-oriented development. Dr. Armour has consulted for both government and private organizations on the
effective application of enterprise architecture, IT Governance and system requirements approaches. In a
previous position, at a major IT consulting firm, he had a joint appointment as the lead Object Methodologist
and as the Assistant Director of the Object Technology Lab. In this position he provided guidance and in-depth mentoring to object projects on object concepts, architecture, project management, methods and tools.
J. Alberto Espinosa, Ph.D.
Kogod School of Business/American University
alberto@american.edu
Dr. Espinosa is currently Professor and Chair of Information Technology at the Kogod School of Business,
American University. He holds a Ph.D. and Master of Science degrees in Information Systems from Carnegie
Mellon University, Graduate School of Industrial Administration; a Master's degree in Business Administration
from Texas Tech University; and a Mechanical Engineering degree from Universidad Catolica, Peru. His
research focuses on coordination and performance in global technical projects across global boundaries,
particularly distance and time separation (e.g. time zones). His work has been published in leading scholarly
journals, including: Management Science; Organization Science; Information Systems Research; the Journal
of Management Information Systems; Communications of the ACM; Information, Technology and People; and
Software Process: Improvement and Practice. He is also a frequent presenter in leading academic
conferences.
Thank You!!
References
• Argenta, C., J. Benson, N. Bos et al. 2014. “Sensemaking in Big Data Environments”, 1st Workshop on Human-Centered Big Data Research, Raleigh, NC
• Borne, K. 2013. Statistical Truisms in the Age of Big Data, retrieved December 2013 from http://www.statisticsviews.com/details/feature/4911381/Statistical-Truisms-in-the-Age-of-BigData.html
• Clarkson, A. 1981. Towards Effective Strategic Analysis, Westview Press, Boulder, CO
• Davenport, T. H. and J. G. Harris. 2007. Competing on Analytics: The New Science of Winning, Harvard Business School Press
• Felten, E. 2010. Needle in a Haystack Problems, retrieved November 1, 2013 from https://freedom-to-tinker.com/blog/felten/needle-haystack-problems/
• Gladwell, M. 2000. The Tipping Point: How Little Things Can Make a Big Difference, Little, Brown, Boston
• He, H. and E. A. Garcia. 2009. “Learning from Imbalanced Data”, IEEE Transactions on Knowledge and Data Engineering, 21(9):1263-1284
• Heuer, R. J., Jr. 1999. Psychology of Intelligence Analysis, Center for the Study of Intelligence, Central Intelligence Agency, Washington, D.C.
• Kaisler, S. 1990. Strategic Automated Discovery System (STRADS), with C. Oresky, A. Clarkson, and D. B. Lenat, published in Knowledge Based Simulation: Methodology and Application, ed. by P. Fishwick and D. Modjeski, Springer-Verlag, December 1990
• Kaisler, S. 2005. Software Paradigms, John Wiley & Sons, New York, NY
• Kaisler, S. and C. Cioffi-Revilla. 2007. Quantitative and Computational Social Sciences Tutorial, 40th Hawaii International Conference on System Sciences, Waikoloa, HI
• Kaisler, S. 2012. Advanced Analytics, Technical Report prepared under contract AFRL FA8750-11-C-0045
• Kaisler, S., F. Armour, A. Espinosa, and W. Money. 2013. “Big Data: Issues and Challenges Moving Forward”, 46th Hawaii International Conference on System Sciences, Grand Wailea, Maui, HI
• Kaisler, S., F. Armour, A. Espinosa, and W. Money. 2014. “Advanced Analytics: Issues and Challenges”, 47th Hawaii International Conference on System Sciences, Hilton Waikoloa, Big Island, HI
• Kaisler, S., F. Armour, W. Money, and A. Espinosa. 2014. “Big Data: Issues and Challenges”, Encyclopedia of Science and Technology, 3rd Edition, IGI Global
• Kaisler, S., F. Armour, A. Espinosa, and W. Money. 2014. “Advanced Analytics: Issues and Challenges”, Encyclopedia of Science and Technology, 3rd Edition, IGI Global
• Ritchey, T. 2005. Wicked Problems: Structuring Social Messes with Morphological Analysis, Swedish Morphological Society, retrieved October 30, 2013 from http://www.swemorph.com/wp.html
• Rittel, H. and M. Webber. 1973. “Dilemmas in a General Theory of Planning”, Policy Sciences, Vol. 4 (pp. 155-169), Elsevier Scientific, Amsterdam, the Netherlands
• Schwartz, P. M. 2010. Data Protection Law and the Ethical Use of Analytics, The Centre for Information Policy Leadership, Hunton & Williams, LLP
• Singh, L., E. J. Bienenstock and J. Mann. 2010. “What Are We Missing? Perspectives on Social Network Analysis for Observational Scientific Data”, in Handbook of Social Networks: Technologies and Applications, ed. by B. Furht, Springer
• Stanton, J. 2013. An Introduction to Data Science, Version 3, http://jsresearch.net/
• Suchman, L. 1987. Plans and Situated Actions: The Problem of Human-Machine Communication, Cambridge University Press, Cambridge, England
• Talburt, J. 2009-2011. Reference Linking Methods, Identity Resolution Daily, retrieved November 3, 2013 from http://identityresolutiondaily.com/
• Thomas, J. J. and K. A. Cook, Eds. 2005. Illuminating the Path: The Research and Development Agenda for Visual Analytics, IEEE Computer Society
• Tufte, E. R. 1997. Visual & Statistical Thinking: Displays of Evidence for Decision Making, Graphics Press
• Varian, H. 2009. McKinsey Quarterly, retrieved October 30, 2013 from http://www.mckinseyquarterly.com/Hal_Varian_on_how_the_Web_challenges_managers_2286
• Weick, K. E. 1995. Sensemaking in Organizations, Sage Publications, Thousand Oaks, CA
• Weiser, M. 1991. “The Computer for the Twenty-First Century”, Scientific American, 265(3):94-104
• Wong, P. C. and J. Thomas. 2004. “Visual Analytics”, IEEE Computer Graphics and Applications, 24(5):20-21
• Wong, P. C., H.-W. Shen, C. R. Johnson, C. Chen, and R. B. Ross. 2012. “The Top 10 Challenges in Extreme-Scale Visual Analytics”, IEEE Computer Graphics and Applications, 32(4):63-67