Data Analytics: Dealing with Cyber
Storms
Cybersecurity Analysis Suite (CASe) for Precision
and Prediction in an Era of Big Data
Joseph Kielman
Science Advisor
18 May 2015
PART 1: Operational Need
§  Data analysis challenges are well documented*
§  Annual global IP traffic will surpass the zettabyte threshold (1.3 zettabytes) by the end of
2016. In 2016, global IP traffic will reach 1.3 zettabytes per year or 110.3 exabytes per
month.
§  Global IP traffic has increased eightfold over the past 5 years, and will increase threefold
over the next 5 years. Overall, IP traffic will grow at a compound annual growth rate
(CAGR) of 29 percent from 2011 to 2016.
§  In 2016, the gigabyte equivalent of all movies ever made will cross global IP networks
every 3 minutes. Global IP networks will deliver 12.5 petabytes every 5 minutes in 2016.
§  The number of devices connected to IP networks will be nearly three times as high as the
global population in 2016.
§  This work focuses on the analysis portion of the data processing cycle and
applying it to the cyber security environment
§  US CERT projects need for analysis will rise to 10B events/day when EINSTEIN 3
becomes operational
§  Predictive analysis will help anticipate cybersecurity events – key as number and
frequency of events increases
*Source: CISCO VNI forecast, 2011-2016
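The traffic figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal units (1 zettabyte = 1,000 exabytes = 1,000,000 petabytes) and a 365-day year; the variable names are illustrative and do not come from the Cisco forecast.

```python
# Cross-check of the quoted traffic figures.
# Assumptions: decimal SI units and a 365-day year; names are illustrative.
EB_PER_ZB = 1_000
PB_PER_ZB = 1_000_000

monthly_eb = 110.3                        # exabytes per month in 2016 (from the slide)
annual_zb = monthly_eb * 12 / EB_PER_ZB   # implied zettabytes per year

five_min_slots_per_year = 365 * 24 * 12   # number of 5-minute intervals in a year
pb_per_five_min = 1.3 * PB_PER_ZB / five_min_slots_per_year

cagr = 0.29                               # 29 percent CAGR, 2011 to 2016
growth_2011_to_2016 = (1 + cagr) ** 5     # five years of compounding

print(f"Annual traffic: {annual_zb:.2f} ZB")
print(f"Per 5 minutes:  {pb_per_five_min:.1f} PB")
print(f"Growth factor:  {growth_2011_to_2016:.2f}x")
```

Under these assumptions the monthly and annual figures agree (about 1.32 ZB per year), the five-minute figure comes out near 12.4 PB, and a 29 percent CAGR compounds to roughly 3.6x over five years.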
[Figure: analytics continuum from Data to Information to Knowledge to Wisdom. Labels shown: Aggregation, Discovery, Extraction, Connect the Dots, Integration, Link Discovery, Content Management, Pattern Analysis, Organization, Evidence Extraction, Cognition, Graph Matching, Analysis, Synthesis, Visual Analytics, Prediction, Cognitive/Behavioral Analytics, Context, Culture, Genes, Policy, Predictive/Prospective]
Analytics for Complex Information (ACI)
Being Prospective rather than Retrospective means
enabling users to harness
Massive data, which comes in
Multiple modes and
Multiple types, through
Multiple devices, in
Diverse user environments,
In order to make decisions in real-time.
Dynamic Information Made Actionable in Real Time
Information after Analytics
•  What part is relevant?
•  Am I missing something?
•  What does it mean?
•  When will I need it?
•  How do I use it?
•  Can I trust it?
•  Is it private or personal?
•  How do I convey it to others?
•  When can I take actions with it? And which ones?
•  How will others see it?
•  How should I interact with it?
Other Issues with Digital Data
•  Data Archiving
•  How and where to store it?
•  What do we do about clouds, warehouses, and websites?
•  Data Permanence
•  How long will it last?
•  Data Rendering
•  What will we use to run it?
•  Regulatory or Legal Issues
•  How to deal with Digital Rights Management and piracy?
•  Who enforces the copyright?
Science and Technology Principles
What Is the Problem?
Information - There are real-world [resiliency] issues for which the availability
and usefulness of information is problematic
How Do We Structure the Solution?
Utility - The capabilities being developed must address at least four
information-related concerns: increasing amounts and diversity of data
compounded by multiplicity of users; indeterminate nature of relevance
coupled with multiplicity and complexity of [threats] or problems
What Is the Value of the Work?
Outcome - The improvements being provided must have real-world outcomes
or implications for the [homeland security] enterprise
NOTE: Please replace words in [ ] with terms appropriate to your industry or affiliation.
What We Want/Need/Expect from Analytics
The Goal … A coherent story using complete, up-to-date information and
with which we are comfortable in making decisions
But …
•  How coherent?
•  How complete?
•  How current?
•  How comfortable?
So …
§  Risk measures
§  Confidence limits
PART 2: Program Objectives and Design
q  Develops and delivers capability to discern, analyze and investigate, and predict
multiple, large-scale cybersecurity threats to critical infrastructures
q  Addresses both DHS components’ investigative and protective missions and the
requirements of the recent Executive Order and Presidential Policy Directive
q  Delivered in three stages tailored for successively more complex applications
§  Standalone system addressing localized to nation-wide to global financial crimes and
cyber attacks by individuals and criminal networks
§  Networked capability allowing distributed toolsets and data sources to be readily
applied to address large-scale, distributed attacks on multiple infrastructure sectors
§  Public-Private Partnership using matching private sector funds to deliver a common
cyber threat and situational awareness capability for multiple, linked infrastructures
q  Enhances system performance and ROI with each discrete implementation stage
§  Relies on previously developed tools, frameworks, and systems as well as others
currently under development to replace largely manual, case-specific methods
§  Integrates and scales capabilities to provide real-time analysis of developing cyber
infrastructure attacks at levels of tens of billions events or transactions
§  Enables interdiction of criminal activities or cyber attacks before they become
widespread or severe as well as forecasting or prediction of potential cyber threats
EO-13636/PPD-21 Requirements
§  PPD-21 dated 2/12/13 calls for the following:
§  Strategic Imperative 3 calls for implementing “An integration and analysis
function to inform planning and operational decisions regarding Critical
Infrastructures (CI)”
§  DHS to: “Maintain national critical infrastructure centers that shall provide a
situational awareness capability that includes integrated, actionable information
about emerging trends, imminent threats, and the status of incidents that may
impact critical infrastructures”
§  DHS, “In conjunction with … other Federal Departments and agencies, to
provide analysis, expertise and other technical assistance to CI owners and
operators”
§  DHS to: “Support the Attorney General and law enforcement agencies with their
responsibilities to investigate and prosecute threats to and attacks against
critical infrastructure”
§  Integrated Task Force (ITF) Incentives Study in process. Cyber Security
Division leading this engagement for S&T, which includes integration of the
R&D requirements and strategy
Op Context Chart – Financial Crimes
Current (As is)
Various agencies and organizations are responsible for maintaining the integrity of the nation's financial infrastructure and payment systems. They constantly
implement and evaluate prevention and response measures to guard against electronic crimes as well as other computer-related fraud. The financial industry
estimates billions of dollars in annual losses associated with credit card fraud.
Process flow (as is):
1.  Financial crime occurs
2.  Receive raw financial crime data via massive Excel files (T = days)
3.  Organize and analyze data across multiple details (e.g. location, time, identities, etc.) (T = weeks)
4.  Determine responsible parties, collect evidence for prosecution
5.  Prosecute responsible parties
Ongoing Collection/Analysis
Current detection of financial crimes comes from the victim or the company holding the victim’s access and financial information
•  Financial crimes happen very quickly and are international in nature, as the same information can be sent around the world to other locations
for action (e.g. ATM withdrawals simultaneously in US, UK, Germany, etc.)
•  The communities responsible for large-scale financial crimes are known, but prosecuting the leads for these communities is difficult.
•  EX: Over 1 week in Feb. 2013, one incident with 42,000 transactions across multiple countries led to a total loss of $39M.
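The pattern described above, the same card information used almost simultaneously in several countries, lends itself to a simple automated check. The following is a minimal sketch only, not a description of how WireVis or any DHS tool works: the record layout, the sample data, the 15-minute window, and the function name `flag_multi_country` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical records: (card_id, country, timestamp, amount_usd).
transactions = [
    ("card-1", "US", datetime(2013, 2, 19, 10, 0), 400.0),
    ("card-1", "UK", datetime(2013, 2, 19, 10, 5), 400.0),
    ("card-1", "DE", datetime(2013, 2, 19, 10, 9), 400.0),
    ("card-2", "US", datetime(2013, 2, 19, 11, 0), 60.0),
    ("card-2", "US", datetime(2013, 2, 20, 9, 30), 80.0),
]

def flag_multi_country(records, window=timedelta(minutes=15), min_countries=2):
    """Return card IDs used in at least `min_countries` countries within `window`."""
    by_card = defaultdict(list)
    for card, country, ts, _amount in records:
        by_card[card].append((ts, country))

    flagged = set()
    for card, events in by_card.items():
        events.sort()  # chronological order per card
        start = 0
        for end in range(len(events)):
            # Advance the left edge until the span fits inside the window.
            while events[end][0] - events[start][0] > window:
                start += 1
            countries = {c for _, c in events[start:end + 1]}
            if len(countries) >= min_countries:
                flagged.add(card)
                break
    return flagged

print(flag_multi_country(transactions))
```

A sliding window per card keeps the check roughly linear in the number of transactions after sorting, which matters at the scale of the 42,000-transaction incident cited above.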
The raw financial data comes in different ways, exported from different systems, into massive Excel files
•  Evidence is collected by location, amount, and account information
•  Geolocation of events is difficult because ATMs, banks, etc. are not properly tagged with location data
•  Surveillance information at sites varies by country, city, town, etc.
Determining responsible party(ies) is difficult and labor intensive
•  The “middle man” is easier to identify using camera and surveillance systems at the site of the crime; however, the larger organization and the
individuals therein giving the orders and disseminating the financial information are harder to identify for prosecution
Prosecuting responsible parties is difficult because assembling the level of data needed for prosecution is complicated and labor intensive
•  The “middle man” is easier to identify using camera and surveillance systems at the site of the crime; however, the larger organization
and the individuals therein giving the orders and disseminating the financial information
are harder to identify for prosecution
Version 12 (2-5-2013)
Op Context Chart – Financial Crimes
U.S.
Contractors
Future (To be)
Process flow (to be):
•  Tools to help financial institutions identify events
•  Anomaly detection within financial transactions (T = minutes)
•  Apply WireVis to integrate data into transactional database with visualization interface for analysis and data manipulation
•  Financial crime occurs
•  Receive raw financial crime data
•  Organize and analyze data across multiple details (e.g. location, time, identities, etc.)
•  Determine responsible parties, collect evidence for prosecution
•  Prosecute responsible parties
Ongoing Collection/Analysis
Support initial anomaly detection activities of Federal Government and its major partners
•  Currently receives cluttered raw data in varied forms, mostly via Excel files
•  If appropriate, major partners could link to the WireVis system to immediately
•  This would enable uniform data creation and sharing methods for the US and its major partners
Apply the WireVis tool to enable faster organization, analysis, and understanding of financial crimes data (minutes vs. days)
•  WireVis was developed to investigate money-laundering & fraud; can be applied to everything from risk analysis to financial business intelligence
•  This tool was developed by UNCC with the support of S&T’s Office of University Programs and Bank of America
•  Supports highly interactive exploration from a grand overview to particular cases
Determining responsible party(ies) would remain the job of the US experts, but WireVis could help them sift through the large
amounts of data in a more timely and effective manner to support their conclusions
Prosecuting responsible parties
Return on Investment
Annual capability and/or efficiency improvements:
•  Process more data in a more cost and time effective manner
•  Supports better understanding of criminal network strategies and
practices, which could lead to more arrests and prosecutions
Millions of U.S. dollars saved by disrupting the communities responsible for large-scale financial crimes around the world
Impact on Operations
Financial Fraud Investigation (As-Is)
•  Collect Fraud Data (0 Day)
•  Extract Data & Upload (Transaction and IP Tags); Time = Days
•  Analyze Data (Patterns/Trends); Time = Weeks
•  Share & Disseminate (Days+60)
Financial Fraud Investigation (To Be)
•  Collect Fraud Data
•  Extract Data & Upload (Transaction and IP Tags)
•  Analyze Data (Patterns/Trends)
•  Visualize (Spatial/temporal)
•  Share & Disseminate
•  Store (Future pattern recognition)**
**Additional Capability
Timeline marks (To Be): 0 hours, 0.25 hours, 0.5 hours, 0.75 hours, 2.00 hours
Return on Investment (ROI)
§  Analyst workload impact
§  One case that takes 60 days, using
two analysts:
§  As is: 480 analyst hours x $32.93/hr
= $15,806
§  To be: 4 analyst hours x $32.93/hr =
$131.72
§  Using some of the FY 11 data
on case loads:
§  125 disaster fraud investigations
opened, 1000+ ongoing
§  Undisclosed # of mortgage fraud
investigations
§  Undisclosed # of Electronic crimes
cases, including financial fraud
§  Very roughly 250 cases x $15K labor
savings per case = ~$3.75M in
analyst hours in first year
Assumes 13% rise in case load each year
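The labor-savings arithmetic on this slide can be reproduced directly. In the sketch below the hourly rate, analyst hours, case count, and 13% growth rate are taken from the slide; the five-year horizon and the function name `labor_cost` are illustrative assumptions.

```python
# Sketch of the slide's labor-savings arithmetic.
# From the slide: hourly rate, analyst hours, 250 cases, 13% annual growth.
# Assumed for illustration: the five-year horizon and all names.
HOURLY_RATE = 32.93
AS_IS_HOURS = 480   # one 60-day case with two analysts
TO_BE_HOURS = 4

def labor_cost(hours: float, rate: float = HOURLY_RATE) -> float:
    """Analyst labor cost in dollars for a single case."""
    return hours * rate

as_is = labor_cost(AS_IS_HOURS)
to_be = labor_cost(TO_BE_HOURS)
savings_per_case = as_is - to_be

cases = 250.0
for year in range(1, 6):
    total = cases * savings_per_case
    print(f"Year {year}: {cases:7.1f} cases, savings ${total:,.0f}")
    cases *= 1.13   # case load rises 13% each year
```

With unrounded per-case savings of $15,674.68, the first-year figure comes to about $3.9M; the slide's ~$3.75M follows from rounding the per-case savings down to $15K.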
Cybersecurity Foundations
§  VASA – Visual Analytics for Security Applications
§  Joint Germany-US research program
§  Three-year effort with 10 partner institutions
§  Focus is on Critical Infrastructures
§  Understanding and Disrupting the Economics of Cybercrime
§  DHS/S&T Cyber Security Division program funded through White House
CNCI
§  Carnegie Mellon University is prime
§  Focus is twofold: identifying disincentives and understanding criminal
behaviors
§  LINEBACkER – Line-speed Bio-inspired Analysis and Characterization
for Event Recognition
§  Another White House CNCI initiative
§  Prime is Pacific Northwest National Laboratory
§  Focus is distributed, early warning of possible cyber attacks
Visual and Data Analytics for Cybersecurity
[Figure labels: Heterogeneous Data (Diverse, Diffuse, Distributed); Analysis; Situational Awareness + Predictive Insights]
Analytic Tool | Description | Performer | Investment to Date | Status
Traffic Circle and Clique | Cyber threats analysis tools for network attacks | PNNL | Basic capability funding, $2.75M; US CERT and other implementation, $1.5M | Pilot deployment at US CERT
In-spire | Text analysis tool now being applied to cyber threat data | PNNL/U Ill UC | Basic capability, $3.5M; Cyber threat application, $200k | Deployed @NBIC
Precision Information Environment (PIE) | Situational awareness and decision-making for large-scale, multi-agency emergency response actions | PNNL | $2.25M | Deployed w/ FEMA Region X
GREEN Suite | Near real-time analysis of threats, risks, vulnerabilities of Power Grid | PNNL | $3.5M S&T funding. Co-funding from IC, DOE, and Bonneville Power | Deployed at Bonneville Power Admin
WireVis | Financial fraud analytical tool | UNCC | $1.75 S&T (COE) funding, $5M Bank of America (BofA) funding | Deployed at BofA
Technical Approach
Analytic Tool | Data Inputs | Data Outputs | Capacity (As Is) | Capacity (To Be)
WireVis | Structured, semi-structured financial transaction data | Heatmap (clustering), similarity-based comparison, temporal relationships | Query up to 10M data points/minute | Modify to work for more general analysis applications
CLIQUE | Structured | User-defined temporal views, anomalous behavior | Up to 100M records per query (laptop) | Higher throughput possible when used with other appliances (Netezza)
PIE | Structured, Unstructured (Text) | Event tracking, collaborative environment | Up to 400k data points | Up to 1M data points; add resource modeling and tasking, video & image
Green Suite | Multiple types of structured & unstructured data: Geospatial (Location-connection), Physics (Voltage-current), Telecom (Phone calls, text messages) | Link analysis, network analysis | Up to 1M nodes (desktop); up to 100M link queries/30 mins | Scale to 1B link queries/30 mins
Program Timeline & Cost
Data Analysis for Cybersecurity
[Timeline figure: Yr 0 through Yr 5, with funding labels $6M, $7M, $8M, $3M]
Big Data Analytics for Cyber Security. Time: 2 YRS. Cost: $6M
1) Common integrated analytical platform
2) Near real time processing of big data sets
3) Collection, Processing, Analysis, Sharing
Homeland Security Data Analytics Network. Time: 2 YRS. Cost: $7M
1) Nationwide computational and analysis network for real time insight into the cyber threat environment
2) Analogous to the Big Science networks in place for grand scientific challenges
Next Gen Data and Interoperability for Cyber-Disaster Management. Time: 3 YRS. Cost: $11M (Yr 1, $8M; Yr 2, $3M)
1) Establish/maintain public-private partnership w/joint funding
2) Nationwide computational and analysis network for real time insight into the cyber threat environment
3) Address multiple, cascading emergency response scenarios for interdependent critical infrastructures
4) Establish education complex with competitions and challenges
5) Addresses economics of, insurance for and risks of cyber threats
CASe Phase I
Example Use Case: Financial Crime Event
Development:
Deliver an advanced data analysis tool to DHS and
one financial institution in support of DHS financial
fraud and electronic crime missions.
[Figure: CASe combines four tools with other identified data (e.g. insurance fraud, ATM locations, card transactions). Tool labels: PIE (Situational Awareness and Collaboration); CLIQUE (Network Activity Analysis tool); WireVis (Business processes tool for financial data); Green Suite (Massive scale link analysis tool)]
Financial Crime Event: 42,000 transactions over 1 week; loss of $39M
Current Analysis: 4 weeks to organize & analyze giant spreadsheet data
On average, 80% of analysts' time is spent organizing, not analyzing
Program Plan:
1.  Concept development/Requirements Analysis: Observe & document DHS and financial institution analytic processes & operational needs
2.  Design: AoA, Tech Foraging, Development strategy
3.  Development: Tailor/modify existing tools to meet different requirements for DHS and Financial Institution, increase application capacities
4.  Development testing: Test each application throughout development phase
5.  Integration: Integrate CLIQUE, WireVis, Green Suite and PIE
6.  Integration Testing
7.  Pilot Deployment: Deploy integrated suite for piloting at both DHS and commercial facility
8.  Finalize S/W and Documentation for customer acceptance and transition
9.  Transition
Example problem data: 1 financial crime event over the course of 1 week: 42,000 transactions = a loss of $39M
CASe Phase 1 Timeline
Phase I: Data Analytics Deployment to DHS and Commercial Bank
[Timeline: T = 0M to 2 YR in 2-month increments]
Milestones: Project Approved; PNNL IA Awarded (Mod); UNCC Grant Awarded; Strategy Approved, Go/No Go; Dev Review, Go/No Go (twice); Integration Review, Go/No Go
Tasks and costs:
•  Concept Development: $300K
•  Req Analysis: 1) Interviews; 2) Review & document analytic processes: $450K
•  Design: $300K
•  Development: 1) Tailoring S/W based on analyst/business processes (e.g. WireVis for DHS); 2) Increasing capacity for PIE; 3) Additional data type handling for PIE: $1.68M
•  Development Testing: $540K
•  Integration - WireVis, CLIQUE, GREEN Suite & PIE: $1.2M
•  Integration Testing: $660K
•  Beta Deployment: $120K
•  Final S/W and document production: $150K
•  Transition
Technical Approach
§  CASe Phase 1: Data Analysis Capability for Financial Sector
§  This program element delivers a standalone system addressing localized to nation-wide to global
financial crimes and cyber attacks by individuals and criminal networks
Phase | Activities / Outputs | Cost
Concept Development | Preliminary analysis; Initial transition strategy; Operational Needs statement | $300K
Requirements Analysis | Define KPP, interfaces, technical requirements; FRD; Tech Reqs Document; CONOPS (if required) | $450K
Design | Individual design features; Integrated design features; AOA Documentation; Sys Design Document | $900K
Development | Tailor PIE, WireVis, CLIQUE, Green Suite to desired reqs; Testable s/w modules & capabilities | $1680K
Development Testing | Analysis tool testing; Test plan/test report | $540K
Integration & Testing | Test integrated capability of 4 tools; Test plan/test report | $1860K
Pilot deployment | Deploy pilot; S/W fixes | $120K
Transition Prep | Final S/W builds; C&A Support; User/Admin guide | $150K
CASe Phase II
Development:
Build a distributed, networked
capability and incorporate additional
analysis capabilities as required by
respective infrastructures
Expand CASe’s analysis capabilities
by incorporating other tools, namely:
•  An investigative tool for modeling
attacker behavior and criminal
supply chains, and
•  A modeling capability for
understanding interdependencies
of critical infrastructure
Develop a networked capability to
access the CASe capability from
distributed locations, effectively:
•  Capitalizing on larger available
computing infrastructure
•  Increasing information sharing/
access
•  Better understanding of the
sector’s data analysis needs,
requirements, and future
challenges
[Figure: CASe Phase II. CASe combines PIE, CLIQUE, WireVis, and Green Suite with CMU attacker behavior & cyber crime supply chain models, VASA critical infrastructure interdependencies, and other data]
CASe Phase 2 Timeline
Phase 2: Homeland Security Analysis Network
[Timeline: T = 0M to 2 YR in 2-month increments]
Milestones: Project Approved; Acquisition Approach - BAA or fast track acq process; Awards made; Strategy Approved, Go/No Go; Dev Review, Go/No Go (twice); Integration Review, Go/No Go
Tasks and costs:
•  Concept Development: $350K
•  Req Analysis: 1) Interviews; 2) Review & document analytic processes: $245K
•  Design: $700K
•  Development: 1) Based on BAA topic awards, development will be required for technologies to support distributed processing, storage retrieval; 2) Tailoring CMU models; 3) Tailoring VASA capabilities: $1.47M
•  Development Testing: $385K
•  Integration - VASA capability and Carnegie Mellon Cybercrime models into CASe baseline: $2.52M
•  Integration Testing: $840K
•  Beta Deployment: $280K
•  Final S/W and document production: $140K
•  Transition
Technical Approach
§  Phase 2: Homeland Security Data Analysis Network
§  This program element aims to deliver a networked capability allowing distributed toolsets and data
sources to be readily applied to address large-scale, distributed attacks on multiple infrastructure
sectors. The end state will be a network for cybersecurity threat analysis and information sharing,
analogous to the Big Science research networks.
Phase | Activities / Outputs | Cost
Concept Development | Preliminary analysis; Initial transition strategy; Operational Needs statement | $350K
Requirements Analysis | Define KPP, interfaces, technical requirements; Multiple sector analysis; FRD; Tech Reqs Document; CONOPS (if required) | $245K
Design | Individual design features; Integrated design features; AOA Documentation; Sys Design Document | $700K
Development | Technologies to support distributed processing, storage retrieval; Tailoring CMU models; Tailoring VASA capabilities; Testable s/w modules & capabilities | $1470K
Development Testing | Analysis tool testing; Test plan/test report | $385K
Integration & Testing | Test integrated capability of 2 additional tools plus testing of distributed model; Test plan/report | $3360K
Pilot deployment | Deploy pilot to 2 sectors (ISACs); S/W fixes | $280K
Transition Prep | Final S/W builds; C&A Support; User/Admin guide | $140K
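As a quick consistency check, the line items in the Phase 1 and Phase 2 Technical Approach tables can be rolled up and compared with the $6M and $7M budgets shown on the Program Timeline & Cost slide. This is a sketch: the dictionary layout and names are illustrative, and the amounts (in $K) are transcribed from the tables.

```python
# Phase cost line items in $K, transcribed from the two Technical Approach tables.
# The dictionary layout and names are illustrative.
phase_1 = {
    "Concept Development": 300,
    "Requirements Analysis": 450,
    "Design": 900,
    "Development": 1680,
    "Development Testing": 540,
    "Integration & Testing": 1860,
    "Pilot deployment": 120,
    "Transition Prep": 150,
}
phase_2 = {
    "Concept Development": 350,
    "Requirements Analysis": 245,
    "Design": 700,
    "Development": 1470,
    "Development Testing": 385,
    "Integration & Testing": 3360,
    "Pilot deployment": 280,
    "Transition Prep": 140,
}
budgets = {"Phase 1": 6000, "Phase 2": 7000}   # $K, from the timeline slide

totals = {"Phase 1": sum(phase_1.values()), "Phase 2": sum(phase_2.values())}
for name, total in totals.items():
    gap = budgets[name] - total
    print(f"{name}: line items ${total:,}K, budget ${budgets[name]:,}K, gap ${gap:,}K")
```

The Phase 1 items sum to exactly $6,000K, and the Phase 2 items sum to $6,930K, which the timeline slide rounds to $7M.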
CASe Phase III
Development:
Advance public-private partnerships by engaging a number of
committees and councils with missions to improve and secure
the banking and financial sector
“To continue to improve the resilience and availability of financial services,
the Banking and Finance Sector will work through its public-private
partnership to address the evolving nature of threats and the risks posed
by the sector’s dependency upon other critical sectors.” – National
Infrastructure Protection Plan, Banking and Finance Sector
Sector & Global Dependencies:
•  The Department of the Treasury and the
FBIIC have identified four important
sector dependencies with the Banking
and Financial Sector (BFS):
1.  Energy
2.  Information Technology
3.  Transportation Systems
4.  Communications
•  The BFS relies on an extensive and
complex supply chain, often reaching to
providers outside the U.S., including
third-party providers.
•  The international nature of financial
services markets and the cross-border
interdependencies in financial
infrastructure require close cooperative
relationships with public-private sector
organizations in major markets around
the world.
•  These relationships will ensure a
coordinated approach to financial
infrastructure protection around the
globe.
Committees and councils:
•  Financial and Banking Information Infrastructure Committee (FBIIC)
•  Financial Services Sector Coordinating Council (FSSCC) for Critical Infrastructure Protection and Homeland Security (CIP/HLS)
•  Financial Services-Information Sharing and Analysis Center (FS-ISAC)
•  Others?
End State and Impact
§  Desired end state: Data analytics capability, to include deployment at multiple nodes across
the US, will help meet the requirements for increased situational awareness, information
sharing and analysis and improved decision support required in PPD-21
PPD 21 Directive
Data Analysis for Cybersecurity will:
Provide a situational awareness capability
that includes integrated, actionable
information about emerging trends,
imminent threats
•  Through nationwide network and public-private partnership, enable
consistent collection and analysis of cyber-risk data from a range of
government, industry, commercial, and internal sources to gain a
more complete understanding of threats, risks and exposures
•  Provide predictive insights into actual conditions within and across
multiple IT environments, including insight that can identify
anomalous behavior
•  Enable the storage, retrieval and analysis of massive historical data
sets in real time to identify anomalous activity
In conjunction with the SSAs and other
Federal Departments and agencies, provide
analysis, expertise and other technical
assistance to CI owners and operators
•  Establish public-private partnership to support increased information
sharing between public and private domains
•  Increased analytical capability to ISACs and other public-private
nodes through technology and educational/competition component
Support the Attorney General and law
enforcement agencies with their
responsibilities to investigate and
prosecute threats to and attacks against
critical infrastructure
•  Predictive insights can better guide allocation of scarce investigative
resources
•  Discovery of non-obvious relationships for both investigative and
prosecutorial purposes
Integration and analysis function to inform
planning and operational decisions
regarding CI
•  Allow modeling & simulation for multiple, complex and cascading
emergency response scenarios for interdependent critical
infrastructures
•  Provide economic risk and analysis
Technical Aspects
§  Technical approach
§  Sample analysis of who is addressing which parts of the digital data analysis lifecycle
through their respective R&D programs:
Agency Programs:
•  DARPA - Anomaly Detection at Multiple Scales (ADAMS)
•  DARPA - Video Image Retrieval and Analysis Tool (VIRAT)
•  DARPA - Cyber Insider Threat (CINDER)
•  DOE - High Performance Storage System (HPSS)
•  NASA Earth Observing System Data and Information System (EOSDIS)
•  NSF - BIGDATA
•  CSD - Data Analysis for Cybersecurity
Digital Data Life Cycle Steps*: Collect, Curate, Store, Search, Retrieve, Analyse, Visualize, Share
[Table: X marks indicate which life cycle steps each program addresses; the column alignment of the marks is not preserved here]
§  Note: No agencies currently examining data analysis for the unclassified cybersecurity
domain across multiple steps in the analysis process
*Source: Harnessing the Power of Digital Data for Science and Society,
January 2009, Interagency Working Group on Digital Data , National
Science and Technology Council
Return on Investment
§  Limited data available for analysis for Cybersecurity applications, however:
§  McKinsey Global Institute report found that up to $200B in savings could be realized
in the US Health Care Market through implementing big data analytic approaches to
develop or improve:
§  Comparative effectiveness research (CER)
§  Clinical decision support system
§  Predictive modeling
§  Clinical trial design
§  McKinsey report also estimated potential operating cost savings of 10-25% in the
manufacturing sector by using sensor data–driven operations analytics. This has
obvious analogies in cybersecurity, where massive data sets available from the
sensor grid of the “Internet of Things” could provide near real time indications and
warnings
*Source: Big Data: The Next Frontier for innovation, competition and productivity,
McKinsey Global Institute, June 2011
Success and Transition
§  Potential Partnerships
§  Strategic:
§  Agency level partnerships with Department of Commerce (NIST), Department of
Justice, and Intelligence Community per Executive Order and PPD-21
§  NITRD chartered Big Data Senior Steering Group. Current focus is climate change
modeling, materials genome, and health records
§  National "Big Data Initiative" comprising six Federal departments and agencies
committing more than $200 million to big data research projects
§  Tactical/Departmental Level:
•  CBP CISO, NPPD/CS&C/US CERT, DHS CIO, USSS
•  DHS component requirements, as expressed in the “Big Data” Steering Committee
report
§  Commercialization plan: The desired end state is a public-private partnership in which
the technologies developed and integrated will be used
Program Management
§  Acquisition Strategy
§  Type of performer (national lab, academic institution, private sector)
•  Initial efforts with National Laboratories and Center of Excellence schools
•  BAA would be the preferred, long-term solicitation vehicle; LRBAA Topic CSD.17 Open
§  For each program element, plan to release BAA outlining the research requirements for
each
§  Cost sharing will be in the form of component or customer participation in technology
pilots, analytic and design reviews, requirements integration
§  Deliverables will be based on each contract and will vary by program element
Technical & Policy Risks
§  Technical risks to this project
§  Pace of data growth could outstrip traditional R&D timeline (Mitigation: Explore Cyber
Fast Track (CFT) like project; discussions underway between CSD, DHS OPO &
DARPA)
§  Analyzing, measuring, and ranking the provenance of massive heterogeneous data
sources
§  Policy risks
§  Information sharing policies could inhibit effectiveness of sharing process
§  Privacy concerns around the topic of data gathering and analysis
§  Working in concert with CSD Data Privacy Technology program
Summary - CASe
•  Applies modern informatics and decision-making techniques to issues of cybersecurity and critical infrastructure protection
•  Develops and implements analytics capability for maintaining situational awareness of and managing widespread, catastrophic cyber emergencies
•  Addresses national-level requirements outlined in the recent cybersecurity Executive Order and Presidential Policy Directive
•  Structured as three-phase program, beginning with an agency mission-specific need and ending with a public-private partnership
•  Enables return on investment to be determined at conclusion of each phase
•  Relies on available technologies: a few, basic capabilities with known performance specifications
•  Will be pursued through BAAs
•  Looking for partners
PART 3: Looking to the Future
•  Cyber Health
•  LINEBACkER
•  Anomaly Detection
•  Reputation-based Security
•  Cyber Resilience
   –  Workshop: 17-18 November 2014
   –  US-UK Collaboration
   –  Book to be published Fall 2015
•  Cyber Identity
   –  Workshop: 30 June – 1 July, Rutgers University
   –  National Conversation
Cyber Resilience
•  Theme 1 – Securing Infrastructure from Cyber Disruptions
   –  Situational Awareness for Resilient Cyber Infrastructures
   –  Architecture and Design for Resilient Systems
   –  Understanding the Unique Aspects of the Restoration/Recovery of Infrastructure and Social Networks from Cyber Attacks
•  Theme 2 – Modeling and Measuring Societal Resilience
   –  Cyber Threats and the Perception of a Dependent Society
   –  Dynamic Topologies of Responsibilities and Governance in a Cyber-disabled Era
   –  Cyber Threats in the Context of a Challenged and Changing World Order
•  Theme 3 – Streaming Analytics for Effective Data Exploitation
   –  Cyber Decision-making in the Presence of Noisy, Voluminous Data, Using Gold-Standard Analogies
   –  Privacy-preserving Information Sharing
   –  Modeling, Monitoring, and Recognizing Potentially Dangerous Changes to Cyber Situations
   –  Cascading Impact: Multidimensional Analysis of Infrastructural, Societal, and Vulnerability Networks
Cyber Identity
•  Theme 1 – Identity Proofing in the Era of Social Media and Data Breaches
•  Theme 2 – Provenance for the “Internet of Things”
•  Theme 3 – Metrics for Trust
Participation in the Workshop and National Conversation Town Meeting is Welcome
Questions or comments: joseph.kielman@dhs.gov