
OT Security

Table of Contents
1. Introduction
2. Defining industry-specific OT
3. What is cyber security?
4. Differences between IT and OT cyber security
5. OT security threat landscape
6. OT cyber security frameworks
7. Components of OT security through the NIST CSF lens
8. Organizational alignment in OT security
9. The Future of OT Security
10. Creating an OT cyber security program
Introduction
Industries such as manufacturing, energy production, power distribution,
water treatment and supply, transportation, and healthcare all rely on a highly specialized
collection of technologies — referred to as “operational technology,” or OT — to produce,
move, heal, clean, and otherwise support the critical processes that are the pulse of their
endeavors. These industrial systems are increasingly under attack, targeted by malicious actors
with varying levels of skill and diverse motives. Modern threats to OT range from
ransomware and IP theft to vandalism and full-blown acts of cyberterrorism that can disrupt
critical infrastructure, damage systems and facilities, and injure people.
This guide provides a holistic perspective on the processes, policies, and technologies that
together provide protection and defense of operational technology assets and processes from
cyber threats and attacks. It includes foundational elements of OT security relevant to those just
learning about the space, descriptions of various standards available to system defenders, along
with deeper dives into specific elements of OT security. This primer provides links to more
detailed coverage of topics beyond the scope of this holistic guide. We hope this provides a
jumping-off point for those who need to build their OT security foundation.
Understanding The Many Facets of OT Security
As the name suggests, operational technology describes the combination of technologies that
support and enable industrial operations. OT encompasses a variety of systems from a wide array
of industries ranging from transport (rail, maritime, etc.) to logistics (ports, warehouses, etc.) and
many more.
OT also covers so-called “cyber-physical systems,” the set of technologies responsible for
monitoring and controlling real-world physical processes.
Basic systems in modern OT:
ICS: Industrial Control Systems
ICS covers a wide range of systems sometimes referred to as “factory automation” or
“distributed control systems,” and typically includes DCS, SCADA, and IIoT. Industrial control
systems act as the interfaces to manufacturing, process management, rail or maritime transport
controls, and other similar functionality.
DCS: Distributed Control Systems
A subset of ICS, DCS describes complex systems in discrete or continuous manufacturing
environments that help control and manage production facilities. Functions such as power
generation, manufacturing, and refining often have significant OT assets in a single, geographic
site.
SCADA: Supervisory Control and Data Acquisition Systems
SCADA systems operate as an overarching data network that captures inputs and outputs of an
industrial process and facilitates system monitoring, analysis, and control. SCADA systems
collect data from widely distributed I/O devices across a large geographic footprint. Processes
such as electric transmission, pipelines, and rail all typically deploy SCADA technology.
Buildings and Physical Access Controls
OT also includes systems that control physical facilities. This includes elevators, HVAC
systems, lighting, and other physical elements. Building and access controls also include security
cameras, swipe cards, electronic door locks, and similar systems. Such building controls use
proprietary protocols and employ approaches very different from the industrial systems
mentioned above.
IIoT: Industrial Internet of Things
Sometimes considered a subset of ICS or SCADA, IIoT warrants its own category because such
devices often are not connected to the controls network, instead operating over public or private
wireless networks, a distinction that raises unique security challenges.
Medical Devices
These come in two varieties: systems that control on-site medical devices providing various
services to patients in hospitals, homes, or doctors’ offices (MRI scanners, IV pumps, and the
like); and consumer health devices such as pacemakers, insulin pumps, and prenatal monitors.
Four types of OT devices
Securing OT systems is challenging in part due to the wide variety of device types deployed on
OT networks. Servers, workstations, firewalls, diodes, remote terminal units (RTU), relays, I/O
devices, IIoT sensors, cameras, and backup power supplies are just a few of the thousands of
device types that comprise modern OT environments.
From a security perspective, it’s helpful to organize the plethora of OT constituents into four
broad categories:
Servers, workstations, HMIs, and more
These typically run traditional commodity operating systems such as Windows or Linux and are
used for a variety of control and reporting tasks from domain control to operating critical process
application software. They may also act as historian servers, which gather and forward data to
enterprise data collection.
Networking equipment
In addition to traditional IT-style switches and firewalls, OT systems include specialized
networking equipment such as industrial firewalls that control traffic using industrial
protocols. These purpose-built devices run proprietary embedded operating systems from the
networking company manufacturers.
Embedded control devices
Here the list grows significantly with the enormous diversity of control devices. The collection
includes PLCs (Programmable Logic Controllers), distributed control systems controllers,
remote terminal units, protective relays, machine controls of manufacturing devices, physical
access controls such as swipe cards, and a wide range of medical devices which control inputs
and outputs for medicinal dosing or bio-regulation signals, for example. These devices run
proprietary, embedded operating systems developed by manufacturers, often built on commodity
components with custom elements.
I/O: Input/output devices
While the list of control devices is long, the roster of I/O devices is practically limitless. I/O is
sometimes integrated with the control, but here we separate pure I/O devices, which provide
inputs to or outputs from processes. These can be cards in a PLC rack, cameras, pressure or
temperature sensors, and thousands of other types. Like embedded control devices, these devices
also run on proprietary manufacturer operating systems often built on commodity components
with custom elements.
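As a rough illustration of the four categories above, the sketch below groups a small asset inventory by category. The device-type labels, the asset names, and the mapping itself are hypothetical; a real inventory would need a far richer mapping and would likely come from a discovery tool rather than a hand-written list.

```python
from collections import defaultdict
from enum import Enum


class OTCategory(Enum):
    """The four broad device categories described in the text."""
    COMMODITY_OS = "Servers, workstations, HMIs"
    NETWORKING = "Networking equipment"
    EMBEDDED_CONTROL = "Embedded control devices"
    IO = "I/O devices"


# Hypothetical mapping from a device-type label to a category.
DEVICE_TYPE_TO_CATEGORY = {
    "historian server": OTCategory.COMMODITY_OS,
    "hmi": OTCategory.COMMODITY_OS,
    "industrial firewall": OTCategory.NETWORKING,
    "switch": OTCategory.NETWORKING,
    "plc": OTCategory.EMBEDDED_CONTROL,
    "rtu": OTCategory.EMBEDDED_CONTROL,
    "protective relay": OTCategory.EMBEDDED_CONTROL,
    "temperature sensor": OTCategory.IO,
    "camera": OTCategory.IO,
}


def group_inventory(assets):
    """Group (name, device_type) pairs by category.

    Device types missing from the mapping are collected under the key None
    so that unclassified assets stay visible instead of being dropped.
    """
    grouped = defaultdict(list)
    for name, device_type in assets:
        category = DEVICE_TYPE_TO_CATEGORY.get(device_type.strip().lower())
        grouped[category].append(name)
    return dict(grouped)


# Made-up inventory for illustration only.
inventory = [
    ("HIST-01", "Historian server"),
    ("FW-EDGE", "Industrial firewall"),
    ("PLC-7", "PLC"),
    ("TT-114", "Temperature sensor"),
    ("MYSTERY-1", "unlabeled box"),
]
grouped = group_inventory(inventory)
```

Keeping unknown device types under their own key is a deliberate choice here: in OT, the assets nobody can classify are often the ones that most need attention.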
Defining industry-specific OT
Process controls (continuous manufacturing rather than discrete)
Examples: Power generation, chemicals refining, pulp and paper, many consumer packaged goods, water/wastewater
These systems depend on integrated systems that control a wide range of inputs and outputs for
end-to-end process management. Often managed via distributed control systems, these processes
require precise inputs and outputs throughout. The control systems must adjust in real-time in
response to readings from I/O devices while also guarding against unintended changes that could
cause catastrophic physical damage or harm the product itself.
In many cases, these processes integrate different types of controls: the core process itself plus other
systems sometimes referred to as “balance of plant,” which can include environmental controls,
safety controls, water treatment, and measurement of outputs such as vibration or temperature. While
these systems frequently include equipment from different OEMs specific to their functions, the
entire system needs to work together.
Risks in process control environments can be extreme. Many of the best-known OT cyber
security incidents targeted such systems. Examples include Stuxnet, which targeted Iranian
centrifuges; multiple attacks on water treatment plants, such as the 2021 attack in
Oldsmar, Florida; and the attack on a Saudi Arabian petrochemicals facility often referred to as
Trisis. All these incidents either threatened or achieved significant physical damage to the
facility or to the output of the facility.
Discrete Manufacturing
Examples: Automotive, electronics, and many other manufacturing industries
Here, OT systems manage specific steps in a manufacturing process. Often they are controlled by
PLCs programmed with “ladder logic” or a set of commands that perform a set of functions such
as turning a cutting tool, then picking up the part, and changing its angle for another cut. They
can be simple sets of commands or complex routines with thousands of commands built over
years. In many cases, such controllers are strung together to operate as a cohesive series of stages
of the manufacturing process.
These systems often have many I/O devices feeding data back into the controller and then acting
on the commands from the controller to adjust the process as needed. These devices might be
networked together or they may operate as stand-alone cells in a process.
Risks associated with these systems involve potential damage to products or disruption of
production resulting in financial loss. In addition, if the attack is targeted to a particular part of
the process or impacts robots or other mechanical devices, it could cause physical damage to the
plant or harm humans as well. Finally, in sensitive manufacturing operations, discrete systems
often contain classified or sensitive information like intellectual property that could be
compromised in an attack.
Distributed control/SCADA
Examples: Pipelines, electricity or water distribution, transportation
These systems are characterized by physically distributed controls that require wide-area
networking (WAN) capabilities to maintain visibility and control. Distributed controls rely on a
range of networking devices and types to gain that visibility and they’re typically used to control
valves, protective relays, meters, and the like.
Cyber risk related to these systems ranges from disrupted operations to damage to physical
equipment. Impacts can be significant – from shutting critical valves on a gas pipeline providing
the necessary fuel to power plants to disabling protective relays which could stop power
distribution across the grid.
Medical devices
Perhaps the most personal of devices are those used to capture medical images, manage IV drug
delivery, and regulate vital health metrics such as heart rate. These systems are often integrated
into larger medical information networks connecting to personal medical information that
contains sensitive patient data. In most cases, however, the devices themselves are independent
and do not provide an integrated process like those used for chemical manufacturing or the power
grid.
The risk from attacks includes physical human damage from inappropriate changes to inputs or
outputs of the systems connected physically to patients. Also at risk are sensitive medical records
and other PII should an attacker be able to pivot from the device to the organization’s data stores.
What is cyber security?
An ultimate guide to OT security must include some collection of general cyber security
components that can be translated into the specifics of OT. We won’t try to cover this
comprehensively here, however; there are hundreds of articles that outline many detailed security
programs by function, section, industry, regulation, or standards body.
OT cyber security is a constant challenge with ever-changing threats, perennially increasing
vulnerabilities, and evolving attacker business models.
Despite that, the core elements of cyber security remain foundational. While there are dozens of
frameworks available, perhaps the simplest is the National Institute of Standards and Technology
(NIST) Cyber Security Framework. NIST is the U.S. governmental body that establishes
standards for a wide range of technologies; its CSF adroitly takes multiple, established standards
and maps specific components to a set of five common functions.
The five functions of the NIST CSF:
Identify
The Identify function includes requirements for aggregating inventory and categorization of a
company’s technology assets, networks, and risks into a comprehensive assessment. One major
priority of this component is the identification of all assets, software, and users on the network.
This is fundamental to cyber security. As often stated, “you can’t protect what you can’t see.” In
OT, this is among the most basic challenges given the range of networks.
Protect
NIST’s Protect function covers the defense of assets, networks, and information. This protection
takes different forms, from endpoint to networking to information to access controls and more.
Core to the mission of the protect function is the ability to stop potential threats before they are
able to gain unauthorized access, exfiltrate information, disrupt devices and processes, or install
malware.
Detect
In the Detect function, the NIST CSF gives guidance for spotting potential threats, actions, or
events in order to give defenders time to respond before attackers gain a foothold and inflict
serious damage. Proper detection requires the ability to recognize threats at multiple stages of an
attack including as malicious actors hit the network, after they’re already present in the system,
and as they attempt to pivot across the network. Detection requires not only the ability to monitor
information coming from a range of sources but also the ability to analyze patterns of behavior to
spot the potential threats in a sea of data. While detection is well-established in IT it is an
entirely different challenge to collect and analyze appropriate data in OT.
Respond
Detection is useless if the organization does not have the ability to respond. The Respond
function comprises a set of actions defenders can use to react to the information that emerges
from a detected threat or anomaly. The response includes both the further analysis required to
understand whether something that was detected is a true threat, as well as the ability to act to
stop the threat or, at least, minimize the damage. The response also should include the ability to
interact with stakeholders when an attack is successful to alert them and provide information on
what they should do.
Recover
It is often said that cyber-attacks are a matter of “when” not “if.” The reality is that attackers are
well-funded and innovative. They only need to be right once, and defenders need to be right
constantly. As a result, recovery is a critical part of a robust cyber security program. NIST’s
Recover function covers rapid restoration of operations and data in the wake of a compromise.
Even when an attacker fails to cause an outage or steal data, all devices, networks, and
information need to be restored to a known-good state from a point in time prior to any
successful intrusion.
Differences between IT and OT cyber security
OT systems differ significantly from IT systems. First, the devices themselves create challenges
for traditional IT security processes and technology. A sample of devices includes old versions
of Windows such as Windows XP or Windows 7, a wide range of embedded devices such as
PLCs, controllers, relays, sensors, industrial (and traditional IT) networking equipment, and
more.
These devices require a different approach to security from the modern, updated, OS-based, or
cloud-based devices in today’s IT stack.
Second, the protection priorities differ greatly between IT and OT security. IT cyber security
efforts are guided by the well-known C-I-A triad. In order, the priorities are:
Confidentiality: Systems and data are protected from unwanted or unauthorized access
Integrity: Systems and data are accurate, appropriately tuned, and verified
Availability: All systems and data are stable, online, and ready to function
In OT cyber security, however, the greatest risks are to the safety of people and property, which
are protected by OT safety and process control systems (availability) followed by integrity.
Information confidentiality, while important, pales relative to the others. As a result, OT risk
management must also adjust accordingly.
Unlike IT systems that value confidentiality and integrity first, OT systems are better served with
a risk management approach known as Safety-Reliability-Productivity, or SRP. Priorities here
include:
Safety: Covers activities designed to ensure the safe operation of a facility. This may
include the physical safety of employees or citizens close to the industrial operation.
Safety is a top priority because many industrial processes have the potential for
catastrophic harm to life and property — chemicals can explode, heavy machinery can
fall or change position quickly, robots can injure employees, trains can derail.
Productivity: After safety come concerns over the risk of slowed or disrupted
operations as a result of a cyberattack. Attackers can manipulate PLC programming to
slow production runs or disrupt cold storage chains, forcing certain “lots” of product to
be thrown out, or worse.
Reliability: The recent rise in ransomware attacks on OT operators demonstrates the
importance of system reliability. Such malware attacks cause significant plant disruption
resulting in deep financial losses.
While manufacturing may not seem an obvious target, eight of nine attacks on manufacturing
organizations last year caused shutdowns across multiple plants. Hackers leverage these expensive and well-publicized disruptions to ask for large sums of cash, up to $10 million in
some cases, especially in industries where cyber security insurance policies are common.
Specific safeguards and responses: the real OT difference makers
Differences in core devices and risk profiles aside, the most meaningful delta between IT and OT
from the security practitioner’s perspective resides in the specific knowledge of control systems
and security required to manage an OT security program and respond to attacks in the industrial
environment as they happen.
Incident detection and response in OT demands a specialized understanding of the unique
systems affected. IT systems are commodities with functions grouped and analyzed with a wide
range of readily-available detection rules. Incident responders in IT have the benefit of safe,
effective, well-documented actions they can take uniformly and automatically when trouble
strikes. With industrial control systems, however, system behavior is unique – often to that
process.
In OT, response must be measured and handled in a way that does not cause additional harm by
stopping expected operational processes inappropriately. Remediation tasks addressing
vulnerabilities and insecure configurations also require sensitive approaches on OT systems.
Patching, for instance, may require multiple other elements of the control system to be upgraded
which may be financially infeasible.
Finally, to secure OT safely and with operational resilience, specific knowledge of control
systems and security is required, a unique combination in even shorter supply than the much-publicized IT security skills gap. OT systems were often designed years or decades ago and there
is a shortage of skilled personnel who understand them. To secure OT, the industry needs to
bring traditional IT security capabilities to the cadre of professionals with deep knowledge of the
arcana of OT systems. This emerging discipline, known as OT Systems Management, includes
the ability to conduct remediation tasks such as patching, vulnerability management,
configuration, and user management.
OT security threat landscape
The growing threat to OT systems is driven by several factors:
Increased connectivity between OT and IT
Historically, many OT systems had limited connectivity to corporate IT systems. Most operated
on OT protocols, were not dependent on corporate applications, leveraged proprietary operating
systems and devices, and remained isolated from corporate networks.
Over the past two decades, this separation between IT and OT has evaporated. Even before the
modern push for IIoT or “Industry 4.0,” industrial organizations and OEMs that provide control
systems “modernized” systems leading to increased connections with traditional IT networks and
devices. Modern OT environments now feature commodity hardware and software such as
Windows operating systems, virtual environments, and IT networking equipment. With the
increase of IIoT initiatives, such connectivity is expanding as analytics and productivity require
direct links to enterprise cloud and data center applications.
Increased research and focus on OT vulnerabilities
For many years, OT benefitted from so-called “security by obscurity.” While well-known,
widely distributed IT systems made attractive targets for attackers, the lesser-known, bespoke
wares in OT security remained mostly off the hacker radar. That’s changed with the increased
use of commodity IT devices in OT as well as the common practice of leveraging traditional IT
embedded components to build OT firmware and applications.
Verve’s analysis of ICS-CERT advisories for the past two years shows a nearly 50% rise in
published vulnerabilities. This is likely just a fraction of the vulnerabilities actually disclosed. In
addition, many embedded software vulnerabilities are never linked back to a corresponding OT
device, meaning that unknown risks abound.
Increased targeted attacks
Thanks to shifting motives and new-found ways to profit from cybercrime, attackers now have
industrial organizations in their crosshairs. For much of the past two decades, criminals focused
on stealing high-value data such as credit card or private medical information. That’s changing
as attackers discover new-found ROI in industrial targets. Industrial organizations have shown a
willingness to pay ransomware actors millions of dollars to avoid costly shutdowns. Nation-states, meanwhile, are increasing their threats to critical infrastructure as noted in several U.S.
government reports in recent years. In 2020, manufacturing moved from the eighth most targeted
industry to the second.
The OT cyber security threat landscape is covered well by a number of organizations; the
following immediately come to mind:
SANS ICS focus area for threat reports, blogs, podcasts, conferences, and training
FireEye’s summary reports
IBM’s annual X-Force Threat Indexes
Three classes of OT security threats:
Collateral damage
In 2020, cyber security detection and response firm Mandiant analyzed OT cyberattacks and
found more than 95% began with an intrusion into IT systems that led to attackers pivoting into
the OT environment. This type of collateral damage is very common in OT incidents. In one
widely publicized example, the WannaCry/NotPetya ransomware and wiper attacks of 2017 and
2018, OT systems were not the initial target, but were compromised and disrupted due to poor
network segmentation and a lack of patching. The incident cost companies including Merck,
Mondelez, Maersk and others billions of dollars in lost productivity and recovery
expenses. When it comes to protecting OT environments, such untargeted, highly-damaging
threats must be considered.
Insider threats
According to KPMG’s 2020 OT security survey, nearly 60% of industrial organizations that had
suffered at least one cyber security incident in the past 12 months claimed insiders were the
cause. Forty-five percent were the work of “negligent insiders” who made an error that caused or
enabled a data breach. Another 11% said the compromise was the result of a malicious insider.
While nation-states get a lot of attention and publicity, fewer than 13% encountered a “nation-state” attacker in the past year, according to the report.
The fact remains, insiders have extensive access to industrial facilities as ease of operation and
reliability become increasingly vital metrics. This means mistakes – or more intentional
malicious activity – carry significant security risks.
Targeted third-party threats
This broad category includes nation-states, “hacktivists,” and financially motivated actors, to
name but a few. Ransomware targeted at OT systems creates significant financial impact and a
consequential urgent pressure to pay unless robust incident response and recovery processes are
in place.
Nation-states in particular now increasingly target critical infrastructure like power grids,
pipelines, pharmaceutical supply chains, and more. What this means for asset owners:
Third parties providing services or products used in the OT environment face attack as a
way to tangentially target industrial organizations, either intentionally or via collateral
damage.
Products deployed in OT environments often contain source code or components that did
not come from the OEM and may be the result of integrating systems of systems.
Organizations could have backdoors embedded in products that can be leveraged by
sophisticated attackers to their strategic advantage.
Enterprising malicious actors may target industrial organizations in response to economic
forces or political conflict. Organizations portrayed as unscrupulous or immoral because
of the products or services they provide may be targeted simply to further the attackers’
own initiatives rather than for monetary gain.
Regardless of the threat vector, today’s attackers enjoy access to copious technical and
organizational reconnaissance data on the internet (also known as OSINT or open-source
intelligence). As a result, large, strategic organizations are not the only targets; even small-to-medium businesses can be selected, surveilled, and surgically attacked.
OT cyber security frameworks
The multiplicity of security frameworks available to OT security practitioners only adds to the
complexity when it comes to developing effective programs and robust OT defenses. The roster
of popular regulatory and self-managed control standards includes both industry-specific as well
as general OT guidance. Some of the available frameworks are audited by regulatory bodies
while others are strictly voluntary. Some are directional and others are prescriptive. The best
known among the applicable OT security frameworks available include:
NIST Cyber Security Framework
NIST has very detailed cyber security controls recommendations including some specially tuned
for industrial control systems along with emerging guidance for IoT environments. As noted
above, the CSF is a more general set of guidelines with approximately 120 sub-controls across
five primary dimensions.
The five NIST functions span both technical and procedural controls, providing a foundation for
cyber security assessments. In each of its functional areas, the NIST CSF describes sub-controls
with detailed guidelines to achieve specific maturity levels.
Maturity in NIST CSF is defined by establishing a set of “profiles.” These profiles are not
prescriptive, though NIST does offer some suggested models. Organizations must determine
their own maturity targets and profiles, which adds to the framework’s flexibility.
NIST CSF is the most-used standard in ICS security according to SANS. In their 2017 and 2019
ICS security surveys, more respondents were using NIST CSF than any other framework,
followed by CIS Top 20, ISO 27000 Series and IEC 62443/ISA 99. The NIST CSF remains an
attractive alternative as it provides directional and foundational guidance without prescriptive
policies or controls that organizations may find too restrictive for their unique OT environments.
Center for Internet Security (CIS) Top 18 Security Controls
The Center for Internet Security is a non-profit, non-governmental organization that seeks to
improve overall individual, corporate, and government cyber security. Approximately ten years
ago, a group of organizations, including DHS/CISA, SANS, and several international cyber
security bodies came together to establish a common set of controls designed to improve the
security maturity of any organization. Originally known as the CSC 20, the framework was pared
down in May 2021 to a simplified list of 18 high-level controls. The updated roster of top-level
controls comes with 153 sub-controls or “safeguards” that provide more prescriptive guidance
than the NIST CSF.
The CIS Controls v8 represents a complete revamp of the framework’s approach. Previous
versions of the organization’s popular guidance, which many organizations are already heavily
invested in, broke top-level controls down into three groups dubbed Basic, Foundational, and
Organizational.
The first six controls of the former CSC 20 were referred to as “Basic.” This included hardware
and software inventories, endpoint vulnerability and configuration management, user and access
management, and event logging. The second set of ten controls covered a wider range including
network configuration management, network segmentation, incident detection and response, and
data controls. Finally, the Organizational grouping included training, penetration testing, and DFIR.
One advantage of the CIS framework is its prescriptive levels of maturity for each sub-control.
This avoids much debate as to what the organization’s profiles and target levels should be. The
framework’s five levels of maturity for each sub-control are based on the number of devices,
accounts, or assets covered by that sub-control. Organizations can establish a specific maturity
level requirement then measure with specific, quantifiable metrics how they are doing against
that maturity objective. This offers significant advantages to organizations struggling to make
progress in aligning stakeholders.
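The coverage-based measurement described above can be sketched in a few lines. The threshold values below are purely illustrative assumptions, not figures published by CIS, and the function names are hypothetical; the point is only that a maturity level derived from "covered assets divided by in-scope assets" is easy to compute and compare against a target.

```python
# Hypothetical thresholds: the minimum coverage fraction needed to claim each
# level. These numbers are assumptions for illustration, not CIS values.
LEVEL_THRESHOLDS = [(5, 0.95), (4, 0.80), (3, 0.60), (2, 0.40), (1, 0.20)]


def coverage(covered, total):
    """Fraction of in-scope devices, accounts, or assets a sub-control covers."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= covered <= total:
        raise ValueError("covered must be between 0 and total")
    return covered / total


def maturity_level(covered, total):
    """Map a coverage fraction to a level from 0 (below all thresholds) to 5."""
    fraction = coverage(covered, total)
    for level, minimum in LEVEL_THRESHOLDS:
        if fraction >= minimum:
            return level
    return 0


def meets_target(covered, total, target_level):
    """True if the measured level reaches the organization's chosen target."""
    return maturity_level(covered, total) >= target_level


# Example: a sub-control applied to 412 of 500 in-scope assets (made-up numbers).
level = maturity_level(412, 500)
on_target = meets_target(412, 500, target_level=5)
```

Because the input is a simple count, the same calculation can be repeated per site or per sub-control, which is what makes this style of metric useful when stakeholders disagree about where the program stands.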
CIS designed its framework with IT in mind and many CIS controls do not translate to sensitive
and embedded devices in OT. However, CIS has developed an “OT” version that attempts to
address these limitations, and it is feasible to adopt CIS guidance on OT systems with
appropriate compensating controls and technical feasibility exceptions. Many organizations have
achieved robust maturity across IT and OT leveraging the CIS framework with the appropriate
adaptation to OT. Read the CIS Top 18 Case Study to learn how a U.S.-based energy company
improved its cyber security readiness and maturity.
NIST 800-53 and other sub-standards
The NIST CSF is essentially the high-level summary of a much more detailed set of controls
defined in its NIST 800 publication. Clocking in at nearly 700 pages, the NIST 800 lays out a
comprehensive set of controls covering virtually all computing systems. 800-53 is a deep dive
into a specific set of controls relevant for ICS security, a sizable subset of OT security. As such,
the 800-53 standard is a helpful — if rather hefty — guide to NIST’s suggestions on OT
security.
Most organizations will use 800-53 as an enhancement of their CSF-based program, rather than
trying to achieve comprehensive control for all elements of the 800-53 standards. There are more
than 200 pages in NIST 800-53 covering everything from traditional IT-like controls to details
specific to ICS security.
The ISO 27000 series
ISO 27000 was produced by the respected International Organization for Standardization, which covers
a wide range of standards and processes for ensuring quality, security, and productivity. ISO
27000 was developed in coordination with the International Electrotechnical Commission (IEC)
and focuses on Information Security Management Systems. Like other ISO standards,
organizations can be certified by ISO auditors though that is not a requirement for its use. The
standards are considered best practices that organizations can adopt to improve their overall
maturity without going through a certification process.
ISO 27000 is a general IT security standard. It is not specific to OT or ICS systems.
Similar to NIST CSF and CIS, however, many components of the ISO 27000 standard are highly
relevant to OT environments.
The ISO 27000 series is procedural in nature and is often used in tandem with NIST CSF or
IEC 62443, which are more technical. ISO 27001 is the actual ISMS set of standards and an
organization can be certified by ISO on that standard as well. ISO 27002 is the list of
recommended best practices organizations can choose to pursue to achieve maturity, but there is
no certification for ISO 27002.
IEC 62443 and ISA 99 standard
The IEC 62443/ISA 99 is an OT-specific standard. Jointly developed by the International
Electrotechnical Commission and the International Society of Automation, the framework details
four levels intended to provide security against attacks of different types or maturity. Organizations
can determine which security level is most appropriate based on their own, unique compliance or
supply-chain requirements.
Each IEC 62443/ISA 99 security level establishes a set of requirements to achieve that level. For
example, there are 37 requirements to achieve SL1 and an additional 23 to reach SL2. These
components look very much like those included in the CIS or NIST frameworks – which makes
sense considering these standards aren’t trying to reinvent security, only focus efforts.
Because IEC 62443/ISA 99 is specifically tailored to OT environments, the controls do provide
more context for items relevant to operational technology. For instance, the term of art known as
“zones and conduits” is a hallmark of this standard. “Zones” can be thought of as the different
parts of a network where devices may talk to one another, a characteristic particularly relevant in
segmented OT environments. The “conduits” then are the paths of communication either within
or between zones. IEC 62443/ISA 99 includes recommended architectures to ensure secure
communications within zones and across conduits.
The holistic IEC 62443/ISA 99 doesn’t compete with the NIST CSF, but can be used as an
additional framework to help guide OT-specific implementations in a NIST-based program.
>> See our guide: Protecting OT Systems with IEC 62443
Other OT framework contenders
A more comprehensive list of security standards might include UK NIS directives, CFATS,
RIIO2, NERC CIP, and many others around the world. Over time some of these standards may
become more widespread as organizations find using them to be more helpful than others.
These cyber security standards and frameworks risk an “alphabet soup” of acronyms, numbers,
rules, and requirements. The most important detail to remember however is that the foundational
elements are similar across all of them. Nearly all can be mapped to one another. The basic
functions of Identify, Protect, Detect, Respond, and Recover laid out in the NIST CSF appear
again and again as core elements across all the frameworks. Some organizations face regulatory
requirements, so the choice of framework becomes more straightforward. But even for them,
security is usually more than compliance and the choice of which framework to overlay on top of
the regulatory requirements becomes important.
Components of OT security through the NIST CSF lens
Summarizing the components of OT cyber security from these standards is a challenge given the
breadth of requirements, and no summary will satisfy everyone as it is sure to leave something
out. Leveraging the functions of the NIST CSF as a structure for OT cyber security requirements,
we offer here a simplified version of the DHS’s Defense-in-Depth model. Each section includes
elements that apply to the broader categories of policy and procedure, networks, access controls,
endpoints and information.
Identify
Fundamental to the OT cyber security journey is a complete view of the organization’s assets,
accounts, software, and connections. This is necessary to create a robust risk assessment that can
be used to develop a risk remediation roadmap and prioritize potential impact from a cyber
event.
Asset inventory forms this foundation, and here is where challenges in OT security often begin.
Legacy systems running proprietary embedded operating systems or older Windows operating
systems are difficult to inventory. Traditional IT approaches such as scans and traffic analysis
can damage legacy devices. Even when they don’t cause havoc, they frequently fail to identify
the asset itself or capture basic information about it. An effective asset inventory needs to
include:
• Make, model, and operating system or firmware version of the asset
• All application software on the asset
• All users and accounts and their administrative rights
• Critical configuration information such as communication, password, administration,
  ports and services, and other settings
• Presence of security functionality such as anti-virus or anti-malware, application whitelisting,
  or backup status
• Connectivity of the asset such as dual network interface cards (NICs)
• Patch status and vulnerabilities
• Asset criticality to the ultimate process
• Presence of sensitive information
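As a rough illustration, the inventory attributes listed above could be captured in a single record per asset. The sketch below is only one possible shape: the class name, field names, and the `inventory_gaps` helper are hypothetical and do not come from any standard or product.

```python
from dataclasses import dataclass, field


@dataclass
class AssetRecord:
    """One inventory entry. All field names are illustrative assumptions."""
    make: str
    model: str
    os_or_firmware: str
    software: list[str] = field(default_factory=list)
    accounts: dict[str, bool] = field(default_factory=dict)  # account name -> has admin rights
    open_ports: list[int] = field(default_factory=list)
    has_antimalware: bool = False
    has_backup: bool = False
    network_interfaces: int = 1          # 2 or more suggests a dual-homed asset
    missing_patches: list[str] = field(default_factory=list)
    criticality: str = "unknown"         # e.g. "low", "medium", "high"
    holds_sensitive_info: bool = False


def inventory_gaps(asset: AssetRecord) -> list[str]:
    """Return the names of key fields that are still empty or unknown."""
    gaps = []
    if not asset.os_or_firmware:
        gaps.append("os_or_firmware")
    if not asset.software:
        gaps.append("software")
    if not asset.accounts:
        gaps.append("accounts")
    if asset.criticality == "unknown":
        gaps.append("criticality")
    return gaps


# A partially inventoried controller: only make and model are known so far.
plc = AssetRecord(make="ExampleCo", model="PLC-100", os_or_firmware="")
print(inventory_gaps(plc))
```

A gap report like this can help show where passive discovery has identified a device but failed to capture the basic information about it.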
There are both technical and procedural methods available to gather such an inventory. In many
organizations, OT resources gather this information on spreadsheets. Others may combine
manual and technical means. Still others rely on traffic capture through networking equipment to
try to infer what asset information they can through packet inspection or will leverage
technology that communicates directly with assets in their native protocols to gather deeper and
more accurate inventory.
The “Identify” component also includes the ability to produce a comprehensive risk picture for
the environment. This includes vulnerability analysis of endpoints with data such as CVEs from
the National Vulnerability Database or ICS-CERT, the U.S. government agency that manages
alerts on industrial controls vulnerabilities and threats.
>> Read more about OT Vulnerability Management and overcoming common challenges.
Risks in OT don’t stop at identified software vulnerabilities. OT wares are often “insecure by
design,” meaning networks and systems were never designed with security in mind; they are
insecure even if the associated software has no documented vulnerabilities. For instance, in an
environment where the goal is operational efficiency, all operators can make administrative
changes to programmable logic controllers as a way to ensure rapid response to issues and
continually improve the process. Similarly, remote access is often widely available to allow
OEMs and other support providers to make changes with little or no security safeguards. The
prevalence of open ports and services offers attackers easy access without taking advantage of
vulnerabilities.
Finally, the Identify component should include the prioritization of risks and a remediation
roadmap. Identify should not stop with an assessment that includes all of the areas lacking in
security. In almost every case, the first vulnerability assessment in OT security will find a wide
range of risks, many critical. The key to a successful Identify component is the prioritization of
risks based on likelihood and potential impact. Equally important is the development of a
roadmap of initiatives to close the most significant gaps over time.
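A minimal sketch of prioritizing by likelihood and potential impact follows. The 1-to-5 scales, the multiplication of the two values, and the example risks are assumptions chosen for illustration; organizations typically define their own scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); the scale is an assumption
    impact: int      # 1 (negligible) to 5 (severe); the scale is an assumption

    @property
    def score(self) -> int:
        # Simple likelihood x impact product, one common convention among many.
        return self.likelihood * self.impact


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Order risks from highest to lowest score; ties keep their input order."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical findings from a first assessment.
risks = [
    Risk("Unpatched historian server", likelihood=3, impact=3),
    Risk("Shared admin password on HMIs", likelihood=4, impact=5),
    Risk("Unrestricted vendor remote access", likelihood=4, impact=4),
]
roadmap = prioritize(risks)
for r in roadmap:
    print(r.score, r.name)
```

The ordered list is only the starting point for a roadmap; sequencing also depends on cost, outage windows, and dependencies between initiatives.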
Protect
A robust Protect program includes elements that encompass policies and procedures, network
protections, systems and data access controls, and endpoint and application protection. At each
level, organizations can set up defensive layers that make it more difficult for an attacker to
breach systems, cause damage, or steal information.
Policies and procedures
Configuration and change management
These policies describe the minimum secure configuration standards for different types of
systems.
Vulnerability and patch management
These policies define when and how vulnerability assessments are to be completed, the standards
for remediation of vulnerabilities in terms of timing, criticality, and compensating controls, as
well as the review process for deploying patches to OT systems.
Access control policies
These policies define who may access systems and information, and how. They
include policies of “least privilege,” which means ensuring that only those who absolutely require
privileges have them. They also include procedures to ensure those least-privilege policies
are in place and monitored, and that any remediation is made in a timely manner. This will also
include the level of access different assets have to different zones within a network, a safeguard
fundamental to OT security.
Information control
These define how information is stored and transmitted over the network. This can include
sensitive information such as programming of control devices or performance and sensor data on
the process itself. In some environments, this may include intellectual property that needs to be
safeguarded from potential threats.
These policies and procedures require adjustment from IT security stakeholders as well. Because the
devices, systems, information, and users in OT differ from those in IT, policies adopted by IT need
to evolve when applied to OT: stricter in some areas, more flexible in others.
In IT, it may be normal to include software such as TeamViewer, Webex, or other
telecommunication software on a standard configuration. In OT, however, where assets may be
less protected and processes more sensitive, this software should not be installed by default. On
the other hand, patching of OT systems requires testing and may only be possible when systems
are offline. This likely means patching cannot occur at the same pace as in IT. As a result, other
compensating controls are required to ensure the protection of these systems until they can be
patched.
Network protections
Common in many OT environments, network protections provide an initial layer of defense for
sensitive systems that control critical processes. Network protections exist at the perimeter of the
corporate network connecting to the internet or cloud. They may also be found at the OT
network perimeter where it connects to the corporate network. Finally, they occur within the OT
network itself, providing segmentation and separation to various systems in the process.
There is no magic formula to the perfect network protection design. Approaches differ based on
factors such as risks to access, age of endpoints, organizational defensive capabilities, and
required connectivity to corporate systems. This guide provides an outline of the types of
protection an organization might pursue as well as some common industry approaches, such as
the Purdue Model.
Types of network protection include:
Hardware
Perhaps best known in cyber security are the hardware-based network protections.
Data diodes
These devices offer one-way access to OT environments. Used heavily in the power generation
and oil and gas industries, data diodes allow traffic to proceed securely from the control system
to the enterprise IT system with no reciprocal connectivity.
This ensures that attackers cannot access these systems through inbound connections while still
allowing for monitoring of internal systems by corporate analytics tools. When incorporating
data diodes, network design is critical. Improper configurations can make the devices less
effective or allow traffic to circumvent the diode structure altogether.
Data diodes are a relatively constricting form of network protection and can have significant
operational drawbacks if implemented as originally designed. Many of the advances envisioned
by Industry 4.0 and IIoT are difficult to achieve with conventional diodes in place. In response,
many diode manufacturers now offer revised portfolios with so-called reversible and “two-way”
diodes that facilitate inbound access based on customizable criteria such as time and date
schedules or traffic types.
Firewalls
Well known in IT security, firewalls in OT environments serve a similar purpose, but are
purpose-built to monitor OT traffic and configured to allow or reject communications into the
OT network or subnet. While a number of traditional IT firewall vendors include OT capabilities
in their products, OT-specific firewalls continue to provide better management of OT protocols.
These are typically deployed deeper inside the OT layers at the level of a PLC or below.
One of the biggest challenges to OT security is ensuring proper firewall configuration. Even
appropriately designed firewalls often fail to provide adequate protection due to poorly executed
rules and improperly managed changes over time. Ensuring program execution and monitoring
for changes and insecure rules is necessary to maintain proper network segmentation.
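Monitoring for insecure rules and unmanaged changes can be sketched as two simple checks: flag overly broad allow rules, and compare the current rule set against an approved baseline. The `Rule` structure and the use of the literal string "any" as a wildcard are assumptions for illustration and do not reflect any vendor's rule format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str
    destination: str
    port: str    # a port number as text, or "any"
    action: str  # "allow" or "deny"


def risky_rules(rules):
    """Flag allow rules that use 'any' for source, destination, or port."""
    return [
        r for r in rules
        if r.action == "allow" and "any" in (r.source, r.destination, r.port)
    ]


def rule_changes(baseline, current):
    """Return (added, removed) rule sets relative to the approved baseline."""
    base, cur = set(baseline), set(current)
    return cur - base, base - cur


# Hypothetical approved baseline: one narrow allow rule plus a default deny.
baseline = [
    Rule("10.1.0.5", "10.2.0.9", "502", "allow"),
    Rule("any", "any", "any", "deny"),
]
# Someone later added a broad rule during troubleshooting.
current = baseline + [Rule("any", "10.2.0.9", "any", "allow")]

flagged = risky_rules(current)
added, removed = rule_changes(baseline, current)
print(flagged)
print(added, removed)
```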
Remote access
OT systems are designed with the assumption that remote access will be available to facilitate
troubleshooting and programming support. Some industries such as power generation now
greatly restrict remote access in their OT environments. For most other industrial organizations,
however, remote access remains vital to ongoing operations. This is true even as some argue that
their OT systems are “air-gapped,” a reference to the outdated concept that critical OT systems
have no external connections or access to the internet. In reality, very few OT environments were
ever truly air-gapped. Now, with the rise of IIoT, air gaps are a thing of the past for nearly all
modern systems.
That makes secure remote access a critical element of OT security. Connections are necessary
for OEMs to manage and troubleshoot their systems. Technicians still need a way to access
control systems. Corporate IIoT and analytics teams need persistent data access to leverage
insights and facilitate process changes.
Secure remote access involves several important practices, including securing communication
paths; ensuring only authorized users can connect; monitoring and recording behavior during the
session; and enforcing corporate policies on connected devices.
A range of vendors offer such solutions – some IT-centric, others more specific to the OT
environment. Operators should have a single system covering all vendors and personnel that
require access. Trying to support unique remote access solutions for multiple groups quickly
becomes unmanageable.
Zones and conduits
IEC 62443/ISA 99 uses the terms “zones” and “conduits” to describe communications schemas
designed to help secure OT networks. Zones can be thought of as the area of the network that
encompasses a group of machines. Proper zoning limits the ability to communicate from one
zone to another without appropriate authorization. The intent behind zones is to keep an attacker
with access to one group of devices in a network from pivoting to another. Zones attempt to
restrict such movement unless the user, application, or device has the authorization to connect to
the other zone.
“Conduits” are the paths in which communication occurs either within a zone or across zones.
Conduits allow for certain traffic to move across zones. Proper conduit design means that certain
paths are open to a device to communicate with another device. For instance, an HMI on a
particular zone may be allowed to interact with a PLC in another zone if it follows a certain path
or conduit in doing so – for example via a particular firewall, only with “read commands” and
only for a certain type of traffic.
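The HMI-to-PLC example can be expressed as a default-deny lookup against a list of permitted conduits. This is a simplified sketch: the zone names, device names, firewall name, and the choice of Modbus as the protocol are all hypothetical, and in this sketch even traffic within a zone needs an explicit conduit entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conduit:
    src_zone: str
    dst_zone: str
    via: str               # enforcement point, e.g. a firewall name
    protocol: str
    operations: frozenset  # permitted operations, e.g. {"read"}


# Hypothetical mapping of devices to zones.
ZONE_OF = {"hmi-1": "supervisory", "plc-7": "control"}

# Hypothetical conduit: supervisory zone may read from control zone via one firewall.
CONDUITS = [
    Conduit("supervisory", "control", "fw-cell-a", "modbus", frozenset({"read"})),
]


def is_allowed(src, dst, via, protocol, operation):
    """Default deny: permit only traffic that matches a defined conduit."""
    src_zone, dst_zone = ZONE_OF.get(src), ZONE_OF.get(dst)
    if src_zone is None or dst_zone is None:
        return False  # unknown devices belong to no zone
    return any(
        c.src_zone == src_zone
        and c.dst_zone == dst_zone
        and c.via == via
        and c.protocol == protocol
        and operation in c.operations
        for c in CONDUITS
    )


print(is_allowed("hmi-1", "plc-7", "fw-cell-a", "modbus", "read"))
print(is_allowed("hmi-1", "plc-7", "fw-cell-a", "modbus", "write"))
```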
Access control
“Least privilege” is a fundamental concept common to all cyber security standards. It describes
minimizing the amount of access provided to each user, device, application, or service account to
the least amount possible to perform its prescribed function. In OT, access control is uniquely
challenging for several reasons:
First, processes require high reliability and uptime as well as very strict safety systems. As a
result, operators want to ensure that the closest person to the process can quickly shut down the
system for safety reasons or reset parameters to improve the process. Access controls include
requirements for signing on to individual accounts with separate passwords, ensuring time-based
lockouts, and limiting access to admin accounts. However, in a rapidly moving process,
companies often either don’t use passwords at all or provide shared passwords and accounts.
Next, many OT devices lack access control altogether. The assumption in many industrial and
operating processes is that the operator who is physically at the workstation has authority to
make changes and the systems are designed to support that.
Additionally, OEMs and service providers need access to OT devices and systems to provide
maintenance or troubleshooting. In many cases, efficiency motivates organizations to create new
accounts for these users.
Finally, many of these systems are not connected to the central active directory as most IT
systems are. As a result, monitoring access and maintaining limited access is often a manual
process.
However, true OT security maturity requires managing access in several ways:
User and account management
At its foundation is the asset inventory highlighted above. You can’t limit access to assets you
don’t know about. Once users and accounts are identified, mature OT cyber security requires the
ability to quickly remediate and maintain limited accounts – cleaning up dormant user accounts,
limiting admin rights, and establishing robust password requirements, for example.
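A small sketch of the account clean-up step: given account data gathered during inventory, list dormant accounts and accounts holding admin rights for review. The 90-day idle threshold, the dictionary keys, and the account names are assumptions, not values taken from any standard.

```python
from datetime import date, timedelta


def accounts_to_review(accounts, today, max_idle_days=90):
    """accounts: dicts with 'name', 'last_login' (date or None), and 'is_admin'.

    An account that has never logged in (last_login is None) counts as dormant.
    """
    cutoff = today - timedelta(days=max_idle_days)
    dormant = [
        a["name"] for a in accounts
        if a["last_login"] is None or a["last_login"] < cutoff
    ]
    admins = [a["name"] for a in accounts if a["is_admin"]]
    return {"dormant": dormant, "admins": admins}


# Hypothetical accounts found on an engineering workstation.
accounts = [
    {"name": "operator1", "last_login": date(2024, 5, 28), "is_admin": False},
    {"name": "vendor_support", "last_login": date(2023, 11, 2), "is_admin": True},
    {"name": "old_engineer", "last_login": None, "is_admin": False},
]
review = accounts_to_review(accounts, today=date(2024, 6, 1))
print(review)
```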
Network access
The network protections described earlier provide the foundation for this. In the context of
network protection, different users, devices, and applications will have different rights. In OT,
it’s important to determine which devices and accounts need to communicate with the others,
because blocking a required communication could damage the process. Monitoring required communications,
reducing unnecessary flows, and locking down networking equipment are hallmarks of maturity.
Design and programming of devices
One major difference in OT is the variety of embedded device types, many of which do not have
access controls enabled by default. Security in OT requires selecting systems that allow for
sufficient control of access. For legacy devices, it’s vital to review all access control capabilities
to leverage what’s available rather than assuming physical access equals authorization. For
devices that don’t offer such capabilities – and this list will be long – defenders must develop
compensating controls to address access control limitations through physical network protections
or by limiting the functionality of the device.
Endpoint and application protection
Among the least understood and most critical elements of OT security is the protection of the
endpoints themselves. Organizations often assume robust endpoint protection is impossible given
the unique characteristics of OT systems – OEM proprietary applications on OS devices and OS
on embedded devices. Yet, with the greater connectivity required by the digital infrastructure
investments of IIoT and Industry 4.0, it’s imperative that organizations look to actively protect
endpoints, rather than simply detecting anomalous network traffic.
Endpoint protection has its foundation in the asset inventory requirements mentioned above; it’s
then important to act on that information to provide protective elements. Some of the standard
systems management tasks in IT that OT defenders need to heed and adopt include:
Patch and vulnerability management for OT security
It is often said that you cannot patch OT systems. This is not true. You cannot patch all systems
for all known vulnerabilities immediately, but you can make significant headway in remediating
software bugs with a programmatic approach.
>>See our end-to-end patch management whitepaper for more on this topic.
Patch management begins with a detailed understanding of the necessary patches to deploy –
commodity OS, application, and firmware. This is followed by an analysis of the relevance of each
patch to each system. A patch may be unnecessary for a certain device if the affected
configuration settings on that device are not enabled or if other patches have already been
deployed. This is particularly important in the case of firmware vulnerabilities, which may be
mitigated through configuration changes that do not require an upgrade of the device or a
significant subset of the control system.
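The relevance analysis described here can be sketched as a per-device filter. The dictionary keys (`affected_feature`, `superseded_by`, and so on), the patch identifiers, and the product name are hypothetical; real patch metadata varies by vendor.

```python
def patch_is_needed(patch: dict, device: dict) -> bool:
    """Decide whether a patch is relevant to one device."""
    if patch["product"] != device["product"]:
        return False
    installed = set(device["installed_patches"])
    # Already applied, or covered by a later patch that replaces it.
    if patch["id"] in installed or installed & set(patch["superseded_by"]):
        return False
    # If the vulnerable feature is not enabled, the patch may be deferred.
    feature = patch["affected_feature"]
    if feature is not None and feature not in device["enabled_features"]:
        return False
    return True


# Hypothetical firmware patch for a flaw in an embedded web server.
patch = {
    "id": "FW-2.1.4",
    "product": "ExamplePLC",
    "affected_feature": "web_server",
    "superseded_by": ["FW-2.2.0"],
}
plc_a = {"product": "ExamplePLC", "installed_patches": [], "enabled_features": ["web_server"]}
plc_b = {"product": "ExamplePLC", "installed_patches": [], "enabled_features": []}
plc_c = {"product": "ExamplePLC", "installed_patches": ["FW-2.2.0"], "enabled_features": ["web_server"]}

for name, dev in [("plc_a", plc_a), ("plc_b", plc_b), ("plc_c", plc_c)]:
    print(name, patch_is_needed(patch, dev))
```

Treating a disabled feature as "not needed" is itself a compensating control, so the configuration should be monitored to confirm the feature stays disabled.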
Next comes a review of the patch for its impact on the operating environment. In many industrial
use cases, these reviews are provided by OEMs of the hardware or process-control software. For
others, an independent review of the patch and its potential impact is necessary. To make real
progress, organizations need automated patch deployment, as manually deploying patches
requires more labor from OT personnel than is feasible. Tools that make patch deployment
more efficient are critical.
Configuration management
Closely tied to patch and vulnerability management is secure configuration management to
design and maintain secure settings on OT equipment. As discussed earlier, OT systems are
“insecure by design.” Organizations can significantly reduce risks and improve the protection of
endpoints with a robust review of configuration settings against standard security posture.
Configuring logout and login settings on devices, closing unnecessary ports and services, removing
unnecessary software, and ensuring limited connectivity are just some of the configuration
practices that should be followed where feasible. OEMs often share configuration best practices
on their websites or in security releases.
In addition to the design and initial install, robust security also requires management of
configuration changes. In some industries, such as those regulated by NERC CIP, change
management is a foundational notion, but in many others, changes are made on a frequent basis
by a range of factory automation or instrumentation and control technicians to improve the
process and allow access for remote support among a host of other reasons. Capturing changes,
determining whether they’re appropriate, then responding by re-establishing a proper baseline are
core components for a secure endpoint environment in OT.
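Capturing changes against a baseline can be sketched as a comparison of two settings snapshots. The setting names and values below are hypothetical, and the `"<absent>"` marker is an assumption used to show that a setting exists on only one side.

```python
ABSENT = "<absent>"  # marker for a setting that exists on only one side


def config_drift(baseline: dict, current: dict) -> dict:
    """Return {setting: (baseline_value, current_value)} for every difference."""
    changes = {}
    for key in sorted(baseline.keys() | current.keys()):
        before = baseline.get(key, ABSENT)
        after = current.get(key, ABSENT)
        if before != after:
            changes[key] = (before, after)
    return changes


# Hypothetical approved baseline for an HMI.
baseline = {"telnet": "disabled", "session_timeout_min": 15, "remote_access": "disabled"}
# Snapshot taken later: telnet was re-enabled and an FTP service appeared.
current = {"telnet": "enabled", "session_timeout_min": 15, "remote_access": "disabled", "ftp": "enabled"}

drift = config_drift(baseline, current)
print(drift)
```

Each reported difference still needs a human decision: accept it into the baseline if it was an approved change, or revert it if not.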
Software management
This is separate from vulnerability or patch management, focusing exclusively on a reduction in
unnecessary software on OT systems. It’s common to find many unnecessary – and potentially
risky – applications on OT systems. This belies the notion that OT devices should only run
“OEM-approved” software. It’s not unusual to find remote access tools like LogMeIn, suites of
Adobe software, even DVD burners and Apple iTunes when auditing the contents of OT devices.
One fundamental endpoint protection lever is to remove all unnecessary software to reduce the
risk of malicious actors leveraging those applications.
Anti-malware
Many organizations struggle with anti-malware in OT given the challenges of updating
signatures on a regular basis, the lack of direct cloud connectivity necessary to leverage many
next-gen AV solutions, the inability to apply AV wares on embedded devices, the dearth of
vendor solutions that can be broadly applied to all devices in the fleet, and the persistent
challenge of false-positive detections that can bring critical processes to a halt. But malware is a
constant threat in OT, and many strains target the same basic components found in IT.
As a result, updated AV tools can be very effective in stopping them. In addition, there is a
growing number of OT-specific malware strains that require more specific OT signatures or
detections.
There are several options for anti-malware defense in OT, including:
Application whitelisting
This technology, which is slowly going away in the IT environment given the explosion of
applications and the never-ending challenge of keeping the whitelist updated, remains a very
effective solution for more stable OT environments. Application whitelisting inverts the antivirus
concept. Where AV allows all applications to run unless they are deemed malicious, whitelisting
disallows all applications unless they are explicitly allowed. Many OT processes do not add
applications frequently, and, in fact, the process runs more effectively if unapproved applications
are disallowed.
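The inversion can be shown in a few lines. This sketch identifies applications by SHA-256 hash, which is one common approach but an assumption here; the function names and sample byte strings are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def blocklist_permits(binary: bytes, blocklist: set) -> bool:
    """Signature-style AV: run everything unless it is known to be bad."""
    return sha256_of(binary) not in blocklist


def allowlist_permits(binary: bytes, allowlist: set) -> bool:
    """Whitelisting: run nothing unless it is explicitly approved."""
    return sha256_of(binary) in allowlist


approved_app = b"approved HMI runtime"   # stand-in for an approved executable
unknown_app = b"unvetted tool"           # stand-in for software nobody reviewed

allowlist = {sha256_of(approved_app)}
blocklist = set()                        # no signature exists yet for unknown_app

print(blocklist_permits(unknown_app, blocklist))   # the blocklist model lets it run
print(allowlist_permits(unknown_app, allowlist))   # the allowlist model stops it
print(allowlist_permits(approved_app, allowlist))
```

The unknown application passes the blocklist check simply because no signature exists for it yet, which is the gap whitelisting is meant to close.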
Integrated vendor-agnostic antivirus (AV)
The various OEMs approve and license multiple AV brands. Emerson uses McAfee while
Rockwell uses Symantec, for example. It’s not uncommon to find OT environments using a half
dozen different AV brands or more across their entire fleet. The management of such a complex
AV ecosystem quickly turns into a logistical challenge. One option is to integrate AV solutions
into a single pane of glass to gather status, alerts, and detections into a common interface for
resolution. This can significantly streamline the management of the various AV systems.
Next-gen AV
The new standard in IT-oriented anti-malware is next-gen AV, shorthand for cloud-enabled.
Next-gen AV tools need no signature updates on the agent deployed to each device. Rather, they
take all of the processes occurring on the system, compare them to known risks, and look for
anomalous patterns of behavior in their cloud infrastructure to stop malware even before a
signature is created. These increasingly popular approaches can be effective in certain OT
environments where workstations and servers have access to the cloud. However, these systems
do not work for embedded devices where the software’s agents cannot be deployed.
Managing removable media and rogue devices
One of the myths of OT is that network protection and detection are sufficient to protect OT
endpoints inside the perimeter. One significant gap in the network protection armor is revealed in
the presence of removable media and other access points, especially in today’s IIoT world. Even
in the most “air-gapped” network, there inevitably comes a time to introduce or update
applications, or move data in and out of the system. Removable media, in the form of USB
sticks, portable drives, and other transient cyber assets such as laptops, are the method of choice in
most cases. Unfortunately, these devices can contain malware if not properly scanned and
treated before introduction into the protected OT environment.
Defenders have several options for protecting OT systems from such threats. Application
whitelisting is useful for ensuring that only specifically approved removable media can be
opened on any OT device. Asset owners can also use network access controls to restrict new devices,
such as transient cyber assets, from connecting to the network and potentially introducing strains
of malware.
Another increasingly prevalent threat arises when users add wireless access points to the network
without approval or review. Ensuring these rogue devices cannot connect without permission, or,
at a minimum, alerting on such connections so defenders can remediate risks in an hour or
less, is necessary to effectively safeguard OT systems.
Detect
The NIST Detect function covers an organization’s ability to rapidly recognize malicious
activity. Suspicious activity can range from curious, anomalous events to hard evidence of
known bad behavior. Like protection, detection includes both network- and endpoint-based
requirements.
Network detection, often referred to as Network Intrusion Detection Systems (NIDS), monitors
network connections, traffic, packets, and other information to identify malicious patterns.
Endpoint detection, often referred to as Host Intrusion Detection Systems (HIDS), provides
similar analysis on the behavior of devices and the processes occurring on those devices in an
OT network.
Network Detections
OT security over the past five years has been defined by network detection methods. What has
come to be known as “passive anomaly detection” is now synonymous with OT security as
endpoint protection and detection are increasingly seen as risky to operational processes.
OEMs fueled this trend, discouraging customers from installing endpoint management and
security tools in their environments by claiming that such activity could disrupt processes.
Network detection monitors traffic that flows through networking devices such as routers,
switches, and firewalls to determine baseline behavior and detect anomalous patterns that, based
on prior research, might represent malicious or risky activity. Network intrusion detection then
sends an alert to a security information and event management (SIEM) platform where it can be
combined with other threat and security data for further analysis. Network detection lets
operators spot potentially risky communications that can indicate a threat actor attempting to
infiltrate the OT network.
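At its simplest, baseline-and-compare detection can be sketched as a set difference over observed flows. Real products inspect far more than a (source, destination, port) tuple; the flow representation and host names below are assumptions for illustration.

```python
def learn_baseline(flows):
    """Record the set of (source, destination, port) flows seen during normal operation."""
    return set(flows)


def unexpected_flows(baseline, observed):
    """Return flows that were never seen during the learning period, in sorted order."""
    return sorted(set(observed) - baseline)


# Hypothetical learning period: the HMI and historian both poll one PLC on port 502.
baseline = learn_baseline([
    ("hmi-1", "plc-7", 502),
    ("historian", "plc-7", 502),
])

# Later traffic includes a new host and a new port toward the PLC.
observed = [
    ("hmi-1", "plc-7", 502),
    ("laptop-x", "plc-7", 502),
    ("hmi-1", "plc-7", 23),
]

alerts = unexpected_flows(baseline, observed)
for flow in alerts:
    print("ALERT new flow:", flow)
```

In practice each alert would be forwarded to the SIEM rather than printed, so it can be correlated with other security data.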
Endpoint Detections
Similar to network detections, endpoint or host detections look for anomalies or threat signatures
in behavior. Endpoint detection examines activity and events related to the asset or endpoint itself,
such as logs, syslog, and NetFlow data. It also includes analyzing user behavior to reveal actions
by users on endpoints that might indicate threats. Successful endpoint detection can also include
a review of device performance data such as power and CPU usage that could indicate suspicious
activity.
In OT environments with connections to physical processes, endpoint detection can also ingest
physical outputs of the process itself. By adding these physical patterns, threats can be identified
more quickly and false positives reduced by comparing server, workstation, HMI, PLC, or other
behavior data with the I/O data on the process itself.
Respond
In cyber security of any type, detection is mostly worthless without a well-managed and
meaningful response. The response is the “so what” of detection. Those who have been in cyber
security for any length of time realize that detections create lots of alerts and potential indicators
of compromise. Providing true security hinges on how well an organization can respond to those
detections, conduct root cause analysis, and take action to neutralize the threat. The ability to
rapidly take appropriate action across an organization’s OT environment is critical to providing
effective responses to detections.
Response begins with root cause analysis. This is particularly critical in OT and requires
knowledge of the processes being controlled. Taking the wrong action in response to a non-critical
event can result in downtime or outages worse than the perceived threat might have
caused on its own. Mature organizations will have a response process that takes alerts from
network and host intrusion detection and analyzes them further to understand how critical the
threat is, what devices or networks it targets, the cause of the alert, and whether it has an
explainable cause such as an operational change.
Incident response in OT requires coordination with personnel that understand the process at that
particular site or environment. This knowledge allows for better root cause analysis to determine
whether the alert is a false alarm or a true security incident. Furthermore, this knowledge enables
OT personnel to take appropriate actions that can both contain the threat while also ensuring as
much uptime as possible on the core process.
Incident response often also includes engagement with third parties, including insurers,
regulators, government entities, and cyber security analysts who are proficient in incident
response and management. Key in this phase is the coordination of these various groups to
ensure the lowest cost and fastest time to recovery.
Recover
In the event that the incident causes an outage or disruption of systems, the final phase in the
NIST CSF is to ensure the system is back up and running with no remaining malware threatening
the environment. Recovery in OT begins with robust backups. In many OT environments,
backups, while critical, can often be manual or ad hoc. In most cases, modern IT systems have
automated backups through virtualized tools, etc. However, in OT, these tools and processes are
often not employed rigorously. Organizations need to ensure they maintain recent and robust
backups to allow for rapid restore. As recent ransomware attacks have shown, backups must also
include offline copies, because one of the first steps in a successful ransomware attack is to
encrypt critical online storage devices as well.
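As a rough illustration of "recent and robust backups with offline copies," the check below flags assets that lack a recent backup or whose recent backups are all online. The Backup record, the asset names, and the seven-day freshness threshold are assumptions chosen for the example; a real threshold depends on the process and its recovery objectives.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Backup:
    asset: str
    taken_at: datetime
    offline: bool   # True if this copy is stored disconnected from the network


def backup_gaps(assets, backups, now, max_age=timedelta(days=7)):
    """Map each asset to a list of backup problems (empty list: none found)."""
    gaps = {}
    for asset in assets:
        # Only backups newer than max_age count as "recent".
        recent = [
            b for b in backups
            if b.asset == asset and now - b.taken_at <= max_age
        ]
        problems = []
        if not recent:
            problems.append("no recent backup")
        elif not any(b.offline for b in recent):
            problems.append("no offline copy")
        gaps[asset] = problems
    return gaps
```

A check like this only covers whether a copy exists. It says nothing about whether a restore has actually been tested, which would still need to be verified separately.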
Beyond backup & restore, recovery in OT will often also include the restoration of the running
configurations, rules, and programming on industrial control systems. Unlike in IT, where the
programming is easily backed up, in OT recovery often requires significant manual efforts to
restore the core ladder logic, controller configurations and programming, etc. to ensure the
process begins to run correctly. In many operational environments, these programs must also
pass regulatory hurdles. For instance, in the medical device/pharma industry, quality protocols
require that configurations and programs be tested and proven. After a ransomware incident,
these systems may take a significant amount of time to come back online in a compliant manner.
Like so many other parts of OT cyber security, the recovery phase requires a deep knowledge of
the OT process as well as the regulatory requirements it operates in. Having a central security
operations center try to manage the recovery process will not work. A specialized team of OT
personnel, including process control and quality engineers, is key to achieving rapid recovery.
Organizational alignment in OT security
Perhaps the largest challenge in OT security is the organizational one. There are myriad reasons
for this – from different cultures to different priorities, different performance criteria, different
training, and the list goes on. In some cases, it is as if these groups are from different planets.
Achieving a meaningful reduction in OT cyber security risk requires close collaboration between
security leaders and the controls engineers and operators who manage and understand the
operational technology systems that control the physical processes. In some organizations, these
groups are already closely aligned; in others, they are far apart. Most interesting is that this is
not an industry-driven or process-driven outcome. It has much more to do with the culture and
organizational models the organizations had before pursuing cyber security.
There is a wealth of advice on how to enable closer collaboration between IT, security, and OT.
Proper alignment begins with a shared set of objectives across the top of the organization.
Perhaps the largest barrier arises when the objectives of the different functions are at odds with
one another, or at least seem to be. In many cases, this is seen as a conflict between the OT/plant
leadership’s goal of operational uptime and security’s goal of reducing the threat of attack, even
if that means short-term downtime to implement tools or change processes. Senior leadership must
be willing to step in to bring these groups together so that a common set of objectives is agreed
to. Security, in fact, shares the objective of reducing downtime; most of the security controls for
OT are about just this. However, clarity of goals and communication across teams are necessary.
Next comes the structure of who is accountable and where authority resides.
There is no magic solution to the right organizational structure of security. Different
organizations have succeeded with different models based on their history. The best
recommendation is to leverage the overall organizational direction of the company, rather than
force-fitting something new just for security.
One of the most successful OT cyber security executions occurred at a utility holding company
with a culture of business-unit independence and ownership of results. The company’s
incumbent governance model uses the classical distributed business-unit P&L ownership model
made famous by Emerson Electric, Illinois Tool Works, Danaher, and many other industrial
companies over the years. The principle is to set clear accountabilities around the “what” (the
targets and objectives), then let the management of each business unit have full authority over
the “how” (the strategies and tactics to deliver).
In the case of cyber security, the senior team established a very clear top-down directive as to the
objective and standards they expected each of the business units to achieve – in this case the
CSC top 20 controls – down to specific maturity levels by each sub-control. They put in place a
company-wide review process to ensure progress to the objective. The CISO was very involved
in helping shape both the objective as well as the process. Then the “how” was left to each
business unit. Within a defined construct of objectives and metrics, business units had the
authority to make decisions such as what tools to deploy, how to balance compensating controls,
the specific approach to achieving least-privilege settings, and specific approaches to incident
response.
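One way to picture the company-wide review against target maturity levels is a simple gap calculation per business unit. The sketch below is illustrative only: the sub-control identifiers ("1.1", "3.4", "8.2"), the maturity levels, and the business-unit names are placeholders, not the utility's actual targets or any official control numbering.

```python
def maturity_gaps(targets, assessed):
    """Return {sub_control: shortfall} for sub-controls below their target level."""
    gaps = {}
    for sub_control, target in targets.items():
        current = assessed.get(sub_control, 0)  # an unassessed sub-control counts as level 0
        if current < target:
            gaps[sub_control] = target - current
    return gaps


def review(targets, units):
    """Summarize, per business unit, the total shortfall and the gaps behind it."""
    summary = {}
    for unit, assessed in units.items():
        gaps = maturity_gaps(targets, assessed)
        summary[unit] = {"shortfall": sum(gaps.values()), "gaps": gaps}
    return summary


# Placeholder targets and self-assessments (hypothetical values).
targets = {"1.1": 3, "3.4": 2, "8.2": 3}
units = {
    "generation": {"1.1": 3, "3.4": 1, "8.2": 1},
    "distribution": {"1.1": 2, "3.4": 2},
}
report = review(targets, units)
```

The review fixes only the "what" (targets and measured gaps). How each unit closes its gaps is deliberately left out of the model, matching the division of authority described above.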
There are challenges with any approach. Here they included duplication of effort, inefficient use
of underlying tools, failure to apply corporate best-in-class approaches in each business unit, a
need for duplicate cyber security expertise in a world where cyber talent is limited, and too much
focus on a set of standards rather than on real “security” and reductions in threats or time to
remediation. All of these limitations are real, and they were addressed through other measures.
However, the
organization did not have a culture of centralized experts or top-down directives of shared tools
or infrastructure. To create such a model would have meant going against the primary mode of
operation for the organization. Had the CISO tried to push in this direction, he most likely would
have ultimately failed because it was not in the organization’s DNA.
The CISO knew that no governance model is perfect. Successful OT cyber security leaders take
the time to understand the overall governance culture of their organizations and build a model
that works with the flow, rather than trying to force-fit a theoretically better governance model.
They then address the gaps unique to that approach to ensure the limitations do not become
hindrances.
In another example, an organization created a single cyber security architecture and management
body staffed with representatives from different areas of the organization – production staff that
had run plants, security representatives, IT leaders, and more. This group gelled into a working
body that brought the best knowledge from each group to the problem.
Balanced scorecards are another important element for driving OT security priorities across the
organization. Over the past 40 years, operations executives have learned how to balance a range
of different metrics in delivering output – efficiency, quality, environmental health, safety, and
others. Security is one additional element in ensuring the ongoing delivery of output. Because
cyber security events are still relatively rare, operations executives often underweight these
risks, which are low-probability, high-impact events. Therefore, using the balanced scorecards
adopted and proven over the past 30 years is an effective way to make security an element in the
overall delivery of objectives for operations.
According to a recent KPMG/CSAI survey, the number one barrier to OT cyber security is a lack
of knowledgeable resources.
How does an organization address this expertise challenge? First, one can look to the outside for
help. Managed OT security services represent a growing industry, with more firms establishing
capabilities to deliver OT security as an outsourced provider. While this may not be a cure-all, it
does address one of the most challenging elements: the turnover of skilled security staff. Many
organizations already struggle to recruit new security team members from a limited skills pool,
only to see their hard-won staff poached by another part of the organization or recruited away
from the company just as they are trained on the systems.
According to the NIST Cyberseek database, there are more than one million unfilled cyber
security jobs. Any trained person is going to be a recruiting target. When this happens, the OT
security organization is left to try to attract and train a new person, often without adequate
resources for the training. A managed OT security vendor has the scale to ensure a continuous
stream of hiring and training so that each customer can leverage the scale.
Second, training is a valuable component for gaining expertise. Most of the available training is
around what might be referred to as “OT Systems Management” rather than advanced threat
detection or artificial intelligence. According to NIST, almost three-fourths of jobs in cyber
security focus on “systems management” rather than advanced analytics. In OT, because most of
these systems have not historically been managed, these skills are needed even more.
Is it easier to train IT people on OT or vice versa? The reality is that neither is simple and the
right answer includes a blend of both. The above chart, however, highlights that the types of
cyber security skills necessary are fundamental IT systems management capabilities that are
definitely feasible to learn. To take advantage, organizations can:
• Leverage internal IT resources with depth in foundational elements of vulnerability
  management, configuration hardening, and similar skill sets. By sheer numbers, there are
  more IT workers than there are industrial engineers and technicians by a factor of 5-10X,
  depending on how each is defined by BLS. Furthermore, the skills needed to operate and
  manage IT and OT HMIs, switches, routers, firewalls, and other wares are similar.
  Finally, functional requirements in security such as understanding correlations, using the
  latest analysis tools like Splunk, and defining patch requirements are similar between IT
  and OT, even if the specific threats or incident response actions differ.
• Tap into this IT resource pool by centralizing the analysis of cyber risks using
  vendor-agnostic technology. This obviates the need to build discrete cyber security
  expertise areas in each plant or site.
• Integrate OT experts into a central team. While general cyber security knowledge is
  important, how to address those issues within the OT context requires people that
  understand what is feasible and operationally safe within the OT environment. This
  blending also enables cross-learning over time.
• Invest in training for site-level OT resources in critical OT systems management
  functions like patching and configuration hardening. Safe deployment of these security
  actions demands that local OT resources be involved in and understand the management
  tasks taken on those systems. This training should also include incident response
  activities.
• Leverage technology that enables local teams to automate actions they can take across
  vendor systems to reduce the labor burden. One major challenge is the dependence on
  OEMs for this management function, a suboptimal approach that places risk in the hands
  of third parties (in most cases multiple third parties), as most plants deploy
  equipment from multiple vendors.
Having OT personnel closely involved in both protection and response activities is a recipe for
success. This can be referred to as a “Think Global: Act Local”
approach. This concept gives security teams visibility into endpoints, networks, and users across
an OT infrastructure through a centralized database and analytics platform. This enables scaling
of knowledgeable resources. The central SOC can monitor vulnerabilities and threat detection
across IT and OT and analyze and prioritize them based on experience and scale.
Be forewarned, however: carrying the response into root cause analysis of an event, or taking
action to protect OT systems or respond to an ongoing threat, can cause unintended impacts to
systems if done improperly. Therefore, “act locally”: engage the OT resources with the most
knowledge of the process when patching or managing users and configurations, for example, to
ensure such actions are tested and applied at an appropriate time.
The Future of OT Security
The basic premise of Dale Peterson’s article “How to be an OT Visionary” was to look at what is
happening in IT and assume it will arrive in OT five years later. He provides a range of great
examples, from antivirus to virtualization, and I would wholeheartedly agree with his sentiment.
We have been told many times that agents won’t work in OT, only to demonstrate over the past
dozen years that they in fact work quite well and, if tuned appropriately for ICS, are much less
intrusive than other methods of gathering information.
One of the clearest “coming attractions” for OT is the application of traditional IT Systems or
Security Management into the industrial controls environment. For nearly 20 years, IT teams
have applied foundational techniques such as hardware and software management, secure and
sustainable configuration management, patch management, user and account management,
etc. These processes – and the tools they use to automate them – have not only delivered
improved security of IT systems, but have also ensured improved reliability, lower operating
costs, and better customer satisfaction with IT as an organization.
Robust IT Systems Management is conducted comprehensively, regularly, and with statistics on
compliance and outliers. It provides the basis for much of the security within the IT realm – from
ensuring updated security patches, to proper network rules in firewalls, etc.
These tools, techniques, and processes are missing from almost all OT environments.
Organizations and their OEM partners build OT systems to last 20 or 30 years.
Upgrade cycles are measured in decades, not three-year refreshes or monthly updates. There are
many good reasons for these approaches given the unique processes and sensitive devices
involved. But, in most cases, these computing devices (servers, workstations, switches, PLCs,
relays, sensors, etc.) are not “managed” in a typical ITSM model.
We see this clearly in our assessments of these industrial environments – unpatched systems,
device configurations with significant insecurities, many dormant and insecure users and
accounts, failed or non-existent backups, and at the foundation a fundamental lack of accurate
and deep asset inventory. The focus in most industrial organizations is the process itself rather
than the management of the computing devices that control that process. Ease of operation is the
primary driver, enabling the technicians to reduce the cost and complexity of the process.
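The kinds of findings listed above (missing patches, dormant accounts, absent backups) all depend on having an accurate asset inventory to query. A minimal sketch follows, assuming a hypothetical inventory record with the fields shown and a 90-day dormancy threshold; neither the field names nor the threshold come from a specific tool or standard.

```python
from datetime import date, timedelta


def hygiene_findings(asset, today, dormant_after=timedelta(days=90)):
    """List basic systems-management findings for one inventory record."""
    findings = []
    if asset["missing_patches"]:
        findings.append(f"{len(asset['missing_patches'])} missing patch(es)")
    for user, last_login in asset["accounts"].items():
        # An account that has never logged in, or not within the window, is dormant.
        if last_login is None or today - last_login > dormant_after:
            findings.append(f"dormant account: {user}")
    if asset["last_backup"] is None:
        findings.append("no backup on record")
    return findings


# Hypothetical inventory record for a single HMI.
hmi = {
    "name": "hmi-01",
    "missing_patches": ["patch-a", "patch-b"],
    "accounts": {
        "operator": date(2024, 1, 5),
        "vendor_svc": date(2023, 6, 1),
        "old_admin": None,
    },
    "last_backup": None,
}
findings = hygiene_findings(hmi, today=date(2024, 1, 10))
```

Without the underlying inventory record there is nothing to run such checks against, which is why the lack of a deep asset inventory sits at the foundation of the other gaps.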
The Need for Greater OT Systems Management
Over the next five-to-ten years, OT needs to adopt the core elements of IT Systems & Security
Management. To date, most industrial organizations have relied on network protections for their
OT systems – firewalls or data diodes, the mythical “air gap”, network anomaly detection, or
IDS/IPS. No one would debate the value of these initiatives in a defense-in-depth model.
However, the next five years will make these defenses less and less effective, requiring a push to
greater OT Security Management programs. Several trends and events drive this change:
Increasing IIOT/Industry 4.0 connectivity between industrial operations and the internet
Organizations have trialed and proved connected plant initiatives for a decade. In the past three
years or so, organizations pivoted from trial to widespread adoption, and the “wave” is gaining
steam. Based on multiple analyst views (Gartner predicts the enterprise IOT platform market
will grow to $7.6 billion in 2024), these initiatives are set to grow dramatically over the next five
years. Whether it is OEMs connecting to wind turbines to regularly update the programs or
monitoring the flow of fluids through a valve to tune for maximum output, we already see these
connections occurring. As connectivity explodes, network protection alone will grow
increasingly untenable as a solution to OT security.
Increasing public vulnerabilities in OT equipment
In our recent ICS Advisory Report, we found a 75% increase in CVEs in ICS-CERT advisories
between 2019 and 2020. This growth highlights the expanding research into the vulnerabilities of
industry-specific software and embedded systems. Moreover, this is just the tip of the iceberg, as
the software supply chain risks from the underlying components of these systems have hardly been
identified yet. The reality is that OT systems’ reliance on “security by obscurity” is falling away
as the curtain is pulled back. This will require a much more robust, IT-like endpoint management
capability for these systems.
Increasing regulatory pressure in OT security
Over the next three years, almost every developed country, and many developing countries, will
implement rigorous security requirements that apply to OT systems. From the US Department of
Defense’s recent CMMC standard to the UK’s RIIO2 standard to requirements in Qatar and other
locations within the Middle East, the trend is toward greater regulatory oversight of the world’s
critical infrastructure. We have seen how these regulations impact utilities in North America with
the NERC CIP requirements, which essentially require true OT endpoint systems management:
patching, configuration management, user and account control, backup management, etc.
Increasing pressure from CISO/Board of Directors
As all of these changes occur, boards of directors place more emphasis on securing the OT
environment. This is not surprising given the potential financial impact of these attacks – see the
results from Merck, Maersk, Norsk Hydro, and more recently at many organizations involved in
the supply chain for critical COVID vaccines. Insurance companies are pressuring companies to
ensure all systems are protected. Therefore, CISOs put greater emphasis on OT. They expect the
same type of security capabilities and systems management as they achieve in IT, which will
drive greater push for OT Systems and Security Management.
The Future of OT Cyber Security
So, what does this mean for OT leadership? How will it impact the “leaned out” operational
excellence achieved over the past 15 or 20 years? What is the impact on the day-to-day jobs of
the Instrumentation and Controls techs?
In short, a coming tidal wave of new requirements, reporting, and security responsibilities on the
computing equipment that runs industrial operations. Why do we call this a “tidal wave”?
Because we have seen it. The North American electric utility industry over the past dozen years
has adopted an increasing set of requirements of OT systems management. Is NERC CIP
perfect? Of course not. It has many areas that may not deliver a great ROI on security.
But would we expect the regulatory requirements around the rest of the world to be significantly
more efficient? Probably not. In addition, these were established before the presence of IIOT and
cloud, before the increasing numbers of endpoint vulnerabilities, etc. As those areas grow, the
need for endpoint management will grow ever greater.
The reality is that most OT environments do not manage these endpoints. Therefore, as these
new requirements emerge, most will rely on manual tasks to gather critical reporting for the
C-suite or regulators. Most will use different OEM tools to try to patch systems manually or with
an inefficient system-by-system approach. Most won’t have automated asset inventory or
vulnerability assessment to provide real-time visibility, so they will rely on manual teams to
gather this information into spreadsheets, etc.
We often hear forecasters talk about the coming risk from hackers, and this is real. But the real
coming risk is the operational costs in keeping up with the necessary OT systems management to
ensure security in connected, vulnerable, regulated environments.
OT Endpoint Systems Management
As Dale Peterson said, to predict the future of OT, just look at IT and add on 5-10 years. The
future is clear: It involves a greater and greater need for endpoint systems management of OT
computing equipment. The challenge is that doing this efficiently and effectively does not
happen overnight.
Verve has a decade of experience helping organizations adopt efficient OT endpoint systems
management. This begins with a truly robust asset inventory. But an inventory is only the
foundation of a true OTSM program. It also includes efficient and OT-safe vulnerability
management, patch management, configuration management, etc. This integrated and automated
approach can reduce the labor requirements by 70% over traditional manual methods. As this
tidal wave approaches, we encourage industrial organizations to begin to map out their OT
endpoint management roadmap. We look forward to helping.
Creating an OT cyber security program
There are no magic bullets or one-time actions to achieve success in OT security. Real progress
requires a programmatic approach to continuously improving the cyber security maturity and
effectiveness in the OT environment.
A robust program begins with establishing an objective, then measuring the baseline against that
goal. There are many frameworks an organization can use to establish such targets. None is
perfect, but leveraging best practices can help accelerate the organization’s security journey.
Next comes assessment and prioritization built from a baseline and gap analysis of the
environment. Steps for an effective assessment and prioritization include:
• Getting specific. Too often assessments end at high-level gaps based on surveys or
  interpretations of network diagrams and documentation. This results in charts with a sea
  of red areas and low scores, leaving organizations with little direction on
  priorities. Instead, get deep access into the assets themselves to gather data necessary to
  prioritize the risks and potential maturity impact from resolving those risks. This involves
  an asset-by-asset, 360-degree risk assessment to gather details such as known
  vulnerabilities, user and account risks, access risks, configuration risks, and network
  design and implementation. This deep picture of the risks allows for more targeted
  priorities for remediation actions that really make a difference.
• Developing a roadmap. An assessment without a roadmap to achieve target maturities is
  meaningless. Translate risks into a programmatic roadmap to close known gaps.
  Roadmaps often include different time horizons for prescribed actions. No one roadmap
  works for everyone.
• Beginning remediation. In almost every OT security journey, the onset is a whirlwind of
  cleaning up insecure architectures, buggy software, vulnerability “debt,” and improperly
  managed user accounts. Often it requires significant project resources to work through
  items such as network segmentation, secure remote access, backup, and system patching
  and upgrading. Every roadmap will include requirements such as these that need to be
  paced over time.
• Monitoring and maintenance. Once the surge has occurred, the hard work begins even
  as the initial wave of energy, budgets, and focus begins to fade. People go back to their
  day jobs, staff that was hired is recruited away, new tools need to be maintained once
  they’re deployed. This is the crucial period that separates the mature from the immature.
  All plans and roadmaps should include budgets, resources, and procedures to ensure
  monitoring and maintenance of the surge efforts. This includes the monitoring of
  configurations, threats, vulnerabilities, patches, access management and more. It also
  includes the ability to report on all of these metrics. And, perhaps most importantly, the
  effort also demands the continued support of senior leadership to maintain organizational
  focus once the big, initial wave is complete.
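The asset-by-asset assessment described under “Getting specific” can be sketched as a simple weighted score used to rank assets for remediation. This is an illustration under stated assumptions, not a recommended scoring method: the category names, the weights, the 1-to-5 criticality scale, and the asset names are all hypothetical and would need to be set by people who know the process.

```python
# Hypothetical weights per finding category (assumed for illustration only).
WEIGHTS = {
    "vulnerabilities": 3,
    "account_risks": 2,
    "config_risks": 2,
    "network_exposure": 3,
}


def risk_score(findings, criticality):
    """Weighted count of findings, scaled by how critical the asset is to the process."""
    raw = sum(weight * findings.get(category, 0) for category, weight in WEIGHTS.items())
    return raw * criticality


def prioritize(assets):
    """Return assets ordered from highest to lowest risk score."""
    return sorted(
        assets,
        key=lambda a: risk_score(a["findings"], a["criticality"]),
        reverse=True,
    )


# Hypothetical assessment results for two assets.
assets = [
    {"name": "hist-01", "criticality": 1, "findings": {"vulnerabilities": 5}},
    {"name": "hmi-01", "criticality": 3,
     "findings": {"vulnerabilities": 2, "account_risks": 1}},
]
ranked = [a["name"] for a in prioritize(assets)]
```

In this example the HMI outranks the historian even though it has fewer known vulnerabilities, because its assumed process criticality is higher. That is the kind of targeted priority a survey-level assessment cannot produce.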