Implementing a Security Architecture for
Safety-Critical Railway Infrastructure
Michael Eckel∗, Don Kuzhiyelil†, Christoph Krauß‡, Maria Zhdanova∗,
Stefan Katzenbeisser§, Jasmin Cosic¶, Matthias Drodt¶, Jean-Jacques Pitrolle†
∗ Fraunhofer Institute for Secure Information Technology SIT, Darmstadt, Germany
{firstname.lastname}@sit.fraunhofer.de
† SYSGO GmbH, Klein-Winternheim, Germany
{firstname.lastname}@sysgo.com
‡ Darmstadt University of Applied Sciences, Darmstadt, Germany
christoph.krauss@h-da.de
§ University of Passau, Passau, Germany
stefan.katzenbeisser@uni-passau.de
¶ DB Netz AG, Frankfurt am Main, Germany
{firstname.lastname}@deutschebahn.com
Abstract—The digitalization of safety-critical railroad infrastructure enables new types of attacks. This increases the need
to integrate Information Technology (IT) security measures into
railroad systems. For that purpose, we rely on a security architecture that we developed in previous work for a railway object controller, which controls field elements. Our architecture enables the
integration of security mechanisms into a safety-certified railway
system. In this paper, we demonstrate the practical feasibility of
our architecture by using a Trusted Platform Module (TPM) 2.0
and a Multiple Independent Levels of Safety and Security (MILS)
Separation Kernel (SK) for our implementation. Our evaluation
includes a test bed and shows how certification and homologation
can be achieved.
Index Terms—Railway, Security, Safety, TPM, MILS
I. INTRODUCTION
The safety-critical railway infrastructure is currently undergoing a digitalization process. The Operational Technology (OT) for monitoring and controlling railway Control-Command and Signaling (CCS) systems is changing with
the use of commercial off-the-shelf (COTS) products and IP-based communications, as
well as the increase in communications between systems.
Previously used closed and manufacturer-specific systems—
typically characterized by proprietary, monolithic, and expensive systems—are increasingly being replaced by standard
hardware and software technologies.
As part of the NeuPro project, Deutsche Bahn (DB) in
Germany plans to digitalize its infrastructure by 2037 [1].
This implies a step-by-step process with several stages: from
equipping all trains with European Train Control System
(ETCS) equipment and building a new high-speed rail network
to transitioning to a pure ETCS system in the future. This
change can be observed not only in Germany at DB, but also in
the European Union (EU) and worldwide. For example, further
development of railway systems and implementation of necessary safety and security logic is on the agenda of the EULYNX
Cluster, a European initiative to standardize interfaces and
elements of signaling systems (cf. https://eulynx.eu/).
A typical railway signalling architecture consists of two
main layers: the field element layer that contains field elements and their Object Controllers (OCs) and the interlocking
layer with Maintenance and Data Management (MDM) and
Interlocking System (ILS). Field elements are sensors and
actuators, such as railroad signals, gates, and switches as well
as train detection systems (TDSs). An OC usually controls
exactly one field element and provides an interface to the ILS
by translating digital interlocking commands into electrical
signals that steer the field element and by reporting the
element’s state back to the ILS. The ILS is responsible for
the safe operation of trains, i.e., for determining the technical
dependencies for train routes and sending commands to the proper
field elements. If an error or a fault occurs in a field
element, the ILS switches to the safe state (fail-safe) and
blocks the route until the dependency is restored. The MDM is
in charge of providing software updates for the components in
the interlocking and field element layer, logging of diagnostic
data and potential security events, and time synchronization.
An Operation Control Center (OCC) centralizes supervision of
ILSs as an overarching layer. All components down to OCs
are connected via the so-called railroad WAN, i.e., an Ethernet- and IP-based communication network. Railroad operations that
mandate resilient transport and reliable message delivery may
use the Rail Safe Transport Application (RaSTA) protocol [2].
The first step in the realization of the NeuPro project is
the digitalization of OCs. OCs play a crucial role in
translating between the analog control signals for field elements and
the digital commands received from the ILS. Since the DB railroad
network consists of more than 3,300 ILSs and more than
200,000 field elements, the integration of IT into control
processes is aimed at enabling more efficient and improved
railroad operations.
However, digitalization also increases the risk of IT attacks,
making it imperative to jointly examine safety and security [3],
[4]. Integrating security mechanisms into a safety-certified OT
system without losing certification is a major challenge [5].
According to EN 50128 [6], all software components must
be certified to the highest Safety Integrity Level (SIL), unless freedom of interference can be provided. In commercial
deployments, security applications are often developed and
verified using less rigorous methods than is the case for high
SIL applications. The use of external open source libraries,
such as OpenSSL, is common for the development of security
applications. In addition, it is essential to regularly update
security applications to fix vulnerabilities or introduce new,
more secure cryptographic algorithms. For these reasons, it is
not desirable for a security application to be certified with the
highest SIL, because this would mean repeating the long and
costly certification process for each software update.
In previous work, we proposed a security architecture for
safety-critical railway infrastructure, enabling the joint operation of safety and security mechanisms on a single hardware
platform [7], [8]. The architecture consists of a hardware platform with a Trusted Platform Module (TPM) 2.0, the Multiple
Independent Levels of Safety and Security (MILS) Separation
Kernel (SK), and various security applications. To facilitate
the use of the TPM by different security applications, we
introduce a TPM Resource Manager (RM) in our architecture
that manages concurrent access to the resource-limited TPM.
In this paper, we describe
1) how this architecture can be implemented on an OC as
a solution for replacing legacy OCs in the infrastructure
of DB,
2) how such an architecture can receive certification and
homologation¹, and
3) how we evaluate it, including a test environment.
The remainder of the paper is organized as follows. In
Section II, we briefly describe our security architecture and in
Section III the requirements it should address. The implementation is described in Section IV. Our evaluation is presented
in Section V. Section VI describes how this architecture can
be certified and approved. Finally, we conclude the paper and
identify future work in Section VII.
II. BACKGROUND AND PREVIOUS WORK
This paper builds on our previous work in [7], [8], where
we analyze requirements and introduce a security architecture
for a railway OC that enables security mechanisms to be run
on a single hardware platform with a safety application. The
following security goals guide the architectural design [7]:
• Availability: The system should be able to provide the expected functionality and data at any point in time.
• Integrity: The system should ensure hardware, software and data integrity for its components and interfaces.
• Authenticity: The system should verify that any data, especially, software packages and configuration updates as well as network communications have a trusted origin.
• Confidentiality: Security information such as access credentials and cryptographic keys must be kept confidential.
• Accountability: Any change in the system, its components or interfaces should be traceable to an authorized entity.
• Non-repudiation: An authorized entity should not be able to deny any performed change.
• Auditability: Security events need to be logged.
¹ Homologation is the name for the approval process of railroad vehicles and railroad lines in accordance with a railroad commissioning approval regulation.
Figure 1 shows our proposed architecture. It consists of
three main components: a hardware platform with a hardware
security module in the form of a TPM 2.0, a MILS SK, and
various security applications. The TPM serves as a security anchor and enables, among other things, secure storage of cryptographic keys (e. g., to secure communication connections),
measured boot to record software executions in a tamper-evident manner, and remote attestation to allow authorized
external parties to detect tampering with the system software.
The MILS SK allows the joint operation of safety and security
applications on the same hardware. The SK controls critical
hardware interfaces and ensures the non-interference and the
resource availability for a safety application. In our case,
the safety application is a digital OC for NeuPro. Security
applications are, e. g., anomaly detection methods which detect
attacks over the network, secure software update protocols, or
a classic firewall. Possible applications are not limited to these
examples; the integration of further safety- or security-relevant
sensors located on the tracks can also be enabled this way, as
shown in [9].
III. SECURITY REQUIREMENTS
In [8], we analyzed relevant threats by using the requirements engineering process of DIN VDE V 0831-104 [3]
and IEC 62443. We identified 14 security requirements for
the proposed architecture that are summarized in Table I.
Requirements regarding key storage and key protection can
best be fulfilled with hardware support. We choose the TPM
for hardware cryptographic support since it provides secure
key generation, storage and protection. Further, the TPM
is a low-budget device and it is hardened against physical
attack. With the existing TPM ecosystem, we have open-source
software available, including Unified Extensible Firmware
Interface (UEFI), Bootloader, Operating System (OS) kernel,
and a whole TPM middleware: the TPM Software Stack (TSS).
We assume physical security to be ensured (e. g., using
housing with burglar alarm), so requirement R1 is out of scope
for this work. In Section V, we refine these requirements with
regard to the chosen security technologies and discuss how
our implementation fulfills them.
Fig. 1: Security Architecture for Railway Control-Command and Signaling [8]
TABLE I: Security requirements for railway CCS architecture
R1   The system shall detect unauthorized physical access to its subsystems and/or prevent relevant exploitation of physical access
R2   The system shall not allow the compromise of a communication key
R3   The system shall not disclose classified or confidential data to an illegitimate user
R4   The system shall exclude compromised endpoints from communication
R5   The system shall not use insecure transfer methods
R6   The system shall not allow any unauthorized user to access an endpoint
R7   The system shall not allow unauthorized and unauthenticated communication between endpoints
R8   The system shall not violate the runtime behavior requirements
R9   The system shall allow for the updating of security mechanisms, credentials, and configurations to patch known vulnerabilities
R10  The system shall not allow the execution of unauthorized software instances
R11  The system shall maintain the transmission system requirements defined in EN 50159
R12  The system shall provide means to detect an undesirable system state change and anomalies
R13  The system shall impede that an unauthorized user can force it into one of the fall-back levels defined by the railway safety process
R14  The system shall maintain the integrity of software, firmware, configuration, and hardware
IV. IMPLEMENTATION
In this section, we describe our implementation of the
security architecture for the railway OC as proposed in [7], [8].
In order to fulfill some of our requirements, we use the TPM
to protect and securely store cryptographic keys. Further, we
use the TPM in conjunction with measured boot to irrevocably
store measurements of boot and system software in a tamper-proof manner. With remote attestation, these measurements can
be verified by an external party in order to detect potentially
malicious software executions. We utilize a TPM in version
2.0 for several reasons. In contrast to TPM 1.2, it provides
cryptographic agility. That is, from a specification point of
view it can support multiple cryptographic algorithms instead
of a fixed set. In the future, this agility may be useful to support
quantum-resistant algorithms. Moreover, we use TPM 2.0
extended authorization to further protect cryptographic keys
with the help of authorization policies enforced by the TPM
2.0.
In this section, we first briefly describe the architecture
of our implementation (Section IV-A). Then we describe the
MILS platform which runs safety and non-safety partitions
(Section IV-B). The TPM RM plays a central role in the architecture and is explained in Section IV-C. Section IV-D details
the OC application and the safety communication stack with
the RaSTA protocol [2]. Based on this architectural foundation
we explain the security applications Measured Boot, remote
attestation, and secure software update in Section IV-E.
A. Implementation Architecture
Our implementation architecture is depicted in Fig. 2. The
backend at the top hosts a Security Operations Center (SOC)
and the MDM. The OC in the middle constitutes the main
part of our architecture, featuring the MILS SK and a TPM
as well as safety and security applications. The OC controls
one or more field elements, such as switches or signals.
The SOC is the central point for all security-related services
and protects the IT infrastructure and data from internal and
external threats. It monitors, collects, analyzes, and examines
all security-relevant information for anomalies. Based on this
information, the SOC raises alerts and takes countermeasures
to protect systems, data, and applications. In the railway
context, the SOC relies on information from MDM systems.
One MDM system is responsible for managing multiple OCs.
It runs the safety-critical counterpart of the OC, sends management commands to the OC safety application and receives
status information from it. In our architecture, we extend the
MDM with security aspects: anomaly detection, safety and
security monitoring, secure update, and remote attestation.
The OC consists of a hardware platform, a software platform, and safety and security partitions. The hardware platform provides a Root of Trust for Measurement (RTM), a
TPM, and the UEFI. The RTM is the first component that is
run after the system is powered on. It is trusted implicitly
and the code is (usually) realized in hardware. The RTM
constitutes the first component of the Chain of Trust (COT)
in the measured boot process. All software measurements
are stored securely in the TPM as well as in the boot log.
Further, the TPM is used for securely generating and storing
cryptographic keys. The UEFI runs firmware which is TPM-aware
and supports measured boot, thus extending the COT.
The software platform encompasses a measured-boot-enabled boot loader and the MILS SK PikeOS. The MILS
SK includes, among other components, the partition updater and the
partition loader. The partition updater handles updating of
partitions, i. e., replacing an existing partition with an updated
one. The partition loader is responsible for shutting down and
starting up partitions. We modify the partition loader to record
partition measurements in the TPM as well as in the partition
log to keep track of all loaded partitions.
The OC is controlled by a safety application based on
management commands sent by the MDM over the RaSTA protocol. Safety and security applications are mapped to separate
partitions provided by the MILS SK, which ensures
spatial and temporal isolation. Further, security services for secure
update and remote attestation run on the OC and
communicate with the appropriate counterparts in the MDM
system.
B. MILS Platform
When combining applications of different SILs on one
hardware platform, we have to provide fault containment
mechanisms to prevent cascading failures, i. e., a failure in
one component causing failure in another one. Also, for
independent certification of applications at their required assurance levels—i. e., security applications with non-SIL/QM and
safety applications with SIL4—we have to prove independence
between the applications. To achieve this, we use a MILS-based
partitioned architecture [10] that executes applications
in sandboxes, called (resource) partitions. Communication
between applications is limited to explicitly defined communication channels. Partitions can be separated in space
and time using resource partitioning and time partitioning.
When applications are executed in separate partitions, mutual
dependencies between applications are reduced to the communication via explicitly defined communication channels. The
cornerstone of this partitioned architecture is a software SK,
a special type of operating system whose primary function is
to provide separation of partitions as well as explicit secure
communication channels. In our architecture, we use PikeOS,
a certifiable SK and hypervisor [11].
In PikeOS, resource partitioning is achieved by statically
assigning computing resources, such as memory, I/O and
file devices, communication objects, and cores to partitions.
PikeOS ensures that during run-time, an application has
guaranteed access to the resources of its partition and that
these resources are not accessible from applications belonging
to other partitions. To enforce resource partitioning, PikeOS
relies on the Memory Management Unit (MMU) and I/O
MMU to control access to main memory and I/O memory
resources, respectively.
PikeOS uses time partitioning to divide CPU time among
partitions and to implement time separation. It can be used
to ensure that all applications receive a predefined amount
of execution time and to prevent one thread from starving
others, even in the case of a faulty thread. In its simplest form,
PikeOS time partitioning can be used to allocate a certain
CPU quota (specified as time partition) to each partition.
Here, there is a one-to-one relationship between time and
resource partitions, and the partitions are separated in time.
In advanced configurations, multiple partitions can share the
same time partition. In this case, application threads from
different partitions mapped to the same time partition are
scheduled based on thread priority, and temporal interference
between applications may occur.
During design time of the partitioned architecture, the
system integrator estimates computing resources, such as CPU
time and RAM, that are required by applications in order to
fulfill their functional and non-functional requirements (e. g.,
timing and throughput requirements). The system integrator
then allocates the estimated set of resources to the partitions
that execute the applications. Safety and security applications
are mapped to partitions that are separated in space and time.
Communication channels between safety and security applications are also defined at design time. They are realized using
communication objects provided by the SK. At runtime, the
SK ensures that the resource allocation policy defined by the
system integrator is adhered to and that the applications only
affect each other via the explicitly configured communication
channels. Because of this separation, failures cannot propagate
from security applications to safety applications. As a result,
applications can be certified independently of each other at
their respective assurance levels.
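The design-time allocation described above can be illustrated with a small consistency check. This is only a sketch under stated assumptions: the `Partition` record, the field names, the addresses, and the 100 ms major frame are hypothetical and do not reflect the PikeOS configuration format. It checks the simple one-to-one case in which every partition has its own memory region and its own time window.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Partition:
    """Hypothetical design-time record for one partition."""
    name: str
    mem_start: int     # first byte of the statically assigned RAM region
    mem_size: int      # region size in bytes
    win_start_ms: int  # start of the time window within the major frame
    win_len_ms: int    # window length in milliseconds


def overlaps(a_start, a_len, b_start, b_len):
    """True if the half-open intervals [start, start + len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def check_separation(partitions, major_frame_ms):
    """Return a list of violations; an empty list means space and time are disjoint."""
    problems = []
    for p in partitions:
        if p.win_start_ms + p.win_len_ms > major_frame_ms:
            problems.append(f"{p.name}: time window exceeds major frame")
    for a, b in combinations(partitions, 2):
        if overlaps(a.mem_start, a.mem_size, b.mem_start, b.mem_size):
            problems.append(f"{a.name}/{b.name}: memory regions overlap")
        if overlaps(a.win_start_ms, a.win_len_ms, b.win_start_ms, b.win_len_ms):
            problems.append(f"{a.name}/{b.name}: time windows overlap")
    return problems


# Illustrative layout (all values are assumptions, not taken from the test bed).
layout = [
    Partition("oc_safety", 0x1000_0000, 0x0100_0000, 0, 40),
    Partition("remote_attestation", 0x1100_0000, 0x0080_0000, 40, 30),
    Partition("secure_update", 0x1180_0000, 0x0080_0000, 70, 30),
]
print(check_separation(layout, major_frame_ms=100))
```

In the real system this policy is enforced by the SK at runtime; the check above only mirrors the kind of reasoning the system integrator performs at design time.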
Fig. 2: Implementation Architecture for Railway Control-Command and Signaling
C. TPM Resource Manager
The TPM RM is responsible for managing the limited
resources of a TPM, swapping in and out objects, sessions, and
sequences as needed. For applications, the RM provides a view
to the TPM as if it had (virtually) unlimited resources, similar
to a virtual memory manager [12]. The RM is part of the TSS
architecture. It sits directly on top of the TPM. Applications
use one of the TSS user Application Programming Interfaces
(APIs) to talk to it: System API (SAPI), Enhanced System
API (ESAPI) or Feature API (FAPI).
The RM has multi-user support, scheduling calls from its callers on a
per-command basis. For applications, the RM
behaves transparently, i. e., there is no difference for an application talking to the TPM directly or to the RM. Context
swapping as done by the RM is particularly useful in a multiprocess environment where applications are unaware of each
other, such as in PikeOS. In our implementation, we have a
need for a TPM RM since we have three partitions using the
TPM: partition loader, remote attestation, and secure update.
The RM specification from the Trusted Computing Group
(TCG) [12] leaves it open to implementations to define
command priorities and connection privileges. The official
tpm2-abrmd implementation of the RM on GitHub does not
implement any restrictions or privileges [13]. Further, tpm2-abrmd depends on D-Bus, which is not available in PikeOS.
For these reasons, we develop a custom RM which allows us to
restrict TPM resource consumption as well as to assign TPM
usage priorities to PikeOS partitions. Priorities are defined per
connection and our internal command reordering gives priority
to commands from privileged PikeOS partitions, such as the
partition loader.
We introduce TPM proxy interfaces for all partitions which
require TPM access. TPM proxy interface applications connect
internally to the TPM RM partition, providing the same
interface as the real TPM device driver to applications. All
proxy interfaces can be located in a single partition or each
in a separate one. Fig. 3 shows a use case where each TPM
proxy interface runs in its own partition.
Each TPM proxy interface partition has a pair of unidirectional channels to communicate with the RM. The RM
allocates memory statically for each client. At compile time,
the RM is configured with static parameters:
• MAX_CONNECTIONS defines the maximum number of client connections handled by the RM.
• MAX_SAVED_OBJECTS defines the maximum number of objects and sequences that can be held in the RM per client.
• MAX_SESSION_PER_CLIENT defines the maximum number of sessions that can be active for each client. This also sets the maximum number of sessions that can be loaded in the RM per client.
Internally, the RM is split into two execution entities. One
periodically reads client requests from the TPM proxy interfaces and stores them in a priority-based FIFO queue. The other
swaps TPM contexts as needed, sends requests extracted from
the priority-based FIFO queue to the TPM device driver, and writes
the response back to the appropriate TPM proxy interface.
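The interplay of the two execution entities can be sketched as follows. This is an illustrative model, not the RM implementation: the class and function names, the convention that a lower number means a more privileged connection, and the `send_to_tpm` callback are assumptions, and context swapping is omitted entirely.

```python
import heapq
import itertools


class PriorityFifo:
    """Requests leave in priority order; equal priorities leave in arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves FIFO order

    def put(self, priority, client, command):
        # Assumption: lower number = more privileged connection.
        heapq.heappush(self._heap, (priority, next(self._arrival), client, command))

    def get(self):
        _priority, _seq, client, command = heapq.heappop(self._heap)
        return client, command

    def __len__(self):
        return len(self._heap)


def reader_step(queue, proxy_inboxes, priorities):
    """First entity: drain pending requests from every TPM proxy interface."""
    for client, inbox in proxy_inboxes.items():
        while inbox:
            queue.put(priorities[client], client, inbox.pop(0))


def dispatcher_step(queue, send_to_tpm, proxy_outboxes):
    """Second entity: forward one request and route the response to its client."""
    client, command = queue.get()
    proxy_outboxes[client].append(send_to_tpm(command))
    return client
```

With per-connection priorities such as `{"partition_loader": 0, "remote_attestation": 1, "secure_update": 1}`, a pending request from the partition loader is dispatched before requests from the two security services, which are served in arrival order among themselves.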
D. Object Controller Application and Safety Communication
Stack
The RaSTA transport protocol is used for the safety-critical
communication channel between OC endpoints and the MDM.
RaSTA is standardized in DIN VDE V 0831-200 [14]. It
uses two independent communication channels to achieve the
safety properties availability and integrity against transmission
errors required according to the standard EN 50159. To achieve
security against active attacks, RaSTA is tunneled through a
Virtual Private Network (VPN) such as IPsec.
As shown in Fig. 2, we use two physically distinct Ethernet
controllers for RaSTA, eth0 and eth1. The SIL4 OC application
and RaSTA are executed inside of safety partitions. The IP
stacks and network drivers of the two redundant channels are
mapped to two separate communication partitions. Memory
and timing requirements for the OC application and communication stacks are computed using tools qualified according
to the EN 50128 standard. During design time, safety and
communication partitions are assigned with required resources,
such as windows in the time schedule and stack regions,
that are enforced by the SK at runtime. Placing the network
stacks of the redundant communication paths in separate partitions
achieves independence between the two channels and avoids
cascading failures from one channel to the other.
E. Security Applications
Our architecture introduces several security applications.
Measured boot provides the foundation for remote attestation
by keeping track of all software executions on the platform. Remote attestation then reports the platform operational
state in a tamper-proof manner (i. e., authenticated, integrity-protected, and fresh) to the appropriate counterpart in the MDM. Secure update of partitions
provides authentication and integrity of updated partitions,
using the TPM.
1) Measured Boot: In a measured boot process, all software
executions on a platform are recorded in an event log as well
as in the TPM to ensure integrity. Technically, a measurement
is a hash digest of the software binary. TPM Platform Configuration Registers (PCRs) are used to anchor log entries in a tamper-proof manner by providing a cryptographic folding hash function,
called extend(): $\mathrm{PCR}_{i+1} = \mathrm{hash}(\mathrm{PCR}_i \,\|\, \mathrm{measurement}_{i+1})$.
This allows for continuously extending hashes of log entries
into PCRs, forming a COT. PCRs can neither be reset nor
be set to arbitrary values during runtime. They can only
be extended and read, and are reset on boot to their initial
values. This is an essential requirement in order to detect
compromised software components.
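The extend() operation can be modeled in a few lines. The following is a toy software model, not TPM code: the `Pcr` class, the choice of SHA-256, and the all-zero initial value are assumptions made for illustration. It shows why the final PCR value depends on every measurement and on the order of the measurements.

```python
import hashlib


class Pcr:
    """Toy model of a single PCR: it can only be extended and read."""

    def __init__(self):
        # Assumed initial value after boot: 32 zero bytes (SHA-256 bank).
        self._value = bytes(32)

    def extend(self, measurement: bytes) -> bytes:
        # PCR_{i+1} = hash(PCR_i || measurement_{i+1})
        self._value = hashlib.sha256(self._value + measurement).digest()
        return self._value

    def read(self) -> bytes:
        return self._value


def measure(binary: bytes) -> bytes:
    """A measurement is the hash digest of the software binary."""
    return hashlib.sha256(binary).digest()


pcr = Pcr()
for component in (b"uefi image", b"boot loader image", b"separation kernel image"):
    pcr.extend(measure(component))
print(pcr.read().hex())
```

Because the model offers no way to set or reset the register, a component that runs later cannot erase the trace of an earlier measurement; it can only add to it.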
The Measured Boot process starts with the RTM. After it
has finished executing its main logic, it measures the next
component in the boot sequence, the UEFI. It records the
measurement in the boot log and extends the hash of the boot
log entry to a PCR in the TPM. Then, control is passed to the
UEFI. This principle of first measure, then start repeats along
system components, i. e., RTM, UEFI, boot loader, and the SK.
The partition loader of the SK is responsible for measuring and
starting all safety and security partitions. It is part of the SK,
and as such already measured. Whenever a partition is started,
the partition loader measures it, logs it into the partition log,
anchors the log entry in the TPM, and then starts it. This works exactly as
before, with one significant difference: the SK stays in control
and does not pass control to another component. The produced boot
log can then be reported by means of remote attestation.
Fig. 3: TPM Resource Manager (RM) in PikeOS context
2) Remote Attestation: Our remote attestation implementation is based on the Challenge-Response based Remote
Attestation with TPM 2.0 (CHARRA) [15]. CHARRA is
a Linux-based proof-of-concept implementation in C (C99)
of the “Challenge/Response Remote Attestation” interaction
model of the Internet Engineering Task Force (IETF) Remote Attestation Procedures (RATS) Reference Interaction
Models [16]. The remote attestation protocol employed by
CHARRA is as follows:
1) The remote verifier establishes a DTLS connection with
the attester.
2) The verifier requests an attestation from the attester,
transmitting a challenge in the form of a random nonce,
a selection of PCRs, and a key ID with which the attester
is supposed to sign the attestation. The nonce is used to
guarantee freshness and to prevent replay attacks.
3) The attester performs a TPM quote operation to sign the
internal state of the TPM, i. e., the PCR values according
to the PCR selection, incorporating the provided nonce.
4) The attester sends back to the verifier the TPM quote, the
boot log, and the partition log.
5) The verifier verifies the signature of the TPM quote as
well as the integrity of the logs by comparing them
against the PCR values. Further, the verifier matches the
log entries against a whitelist which holds known-good
hashes of boot software and PikeOS partitions. In case of
a deviation between the reported and expected reference
values, the system may potentially be compromised, and
mitigation actions, such as going to failsafe mode, should
be triggered.
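Step 5 of the protocol can be sketched as follows. This is a simplified model and not the CHARRA code: the `quote` dictionary with the fields `nonce` and `pcr` is hypothetical, a single SHA-256 PCR stands in for the PCR selection, and verification of the TPM quote signature is assumed to have succeeded beforehand.

```python
import hashlib
import hmac
import secrets


def replay(log):
    """Recompute the PCR value that a log of (name, digest) entries should produce."""
    pcr = bytes(32)
    for _name, digest in log:
        pcr = hashlib.sha256(pcr + digest).digest()
    return pcr


def appraise(quote, log, expected_nonce, whitelist):
    """Return (ok, reason) for a quote whose signature was already verified."""
    if not hmac.compare_digest(quote["nonce"], expected_nonce):
        return False, "stale or replayed quote"
    if not hmac.compare_digest(replay(log), quote["pcr"]):
        return False, "log does not match quoted PCR"
    for name, digest in log:
        if whitelist.get(name) != digest:
            return False, f"unknown measurement: {name}"
    return True, "trusted"


# Illustrative run with made-up component names and images.
whitelist = {
    "boot_loader": hashlib.sha256(b"boot loader v1").digest(),
    "remote_attestation": hashlib.sha256(b"ra partition v3").digest(),
}
log = list(whitelist.items())
nonce = secrets.token_bytes(20)  # challenge chosen by the verifier
quote = {"nonce": nonce, "pcr": replay(log)}
print(appraise(quote, log, nonce, whitelist))
```

The order of the checks mirrors the protocol: freshness first, then consistency between the logs and the quoted PCR value, and only then the comparison against known-good reference values.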
To communicate between the MDM and the Remote Attestation Service PikeOS partition, CHARRA uses libcoap [17],
an implementation of the Constrained Application Protocol (CoAP) [18]. Constrained Binary Object Representation
(CBOR) [19] is used for wire-encoding data structures, utilizing QCBOR [20]. The ESAPI [21] of the TSS is used
internally to talk to the TPM.
We ported CHARRA and its libraries (including mbedTLS,
the TSS ESAPI, and QCBOR) to PikeOS, except for libcoap,
because the porting effort would have been too high. Since libcoap constitutes
an essential part of CHARRA we decided to run CHARRA
in an ELinOS partition, an embedded Linux environment for
PikeOS. This way, all dependencies are easily met. CoAP
payload size is limited to 1 KiB per Protocol Data Unit (PDU).
Due to the size of attestation data and log files we easily
exceed this limit. Accordingly, we implement CoAP blockwise transfers to compensate for this limitation.
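The idea behind block-wise transfer can be illustrated independently of libcoap. The sketch below follows the Block option layout of RFC 7959 (block number NUM, more flag M, size exponent SZX, with a block size of 2^(SZX+4) bytes); the function names are hypothetical, and message exchange, retransmission, and the libcoap API are out of scope.

```python
def block_option(num: int, more: bool, szx: int) -> int:
    """Encode a Block option value as NUM | M | SZX (RFC 7959)."""
    return (num << 4) | (int(more) << 3) | szx


def split_blockwise(payload: bytes, szx: int = 6):
    """Yield (option_value, chunk) pairs; szx=6 gives 1024-byte blocks."""
    if not 0 <= szx <= 6:
        raise ValueError("SZX must be in 0..6")
    size = 1 << (szx + 4)
    total = max(1, -(-len(payload) // size))  # ceiling division, at least one block
    for num in range(total):
        chunk = payload[num * size:(num + 1) * size]
        yield block_option(num, num < total - 1, szx), chunk


def reassemble(blocks):
    """Concatenate chunks in block-number order."""
    ordered = sorted(blocks, key=lambda item: item[0] >> 4)
    return b"".join(chunk for _option, chunk in ordered)


attestation_data = bytes(range(256)) * 10  # 2560 bytes, exceeds one 1 KiB PDU
blocks = list(split_blockwise(attestation_data))
print([(option, len(chunk)) for option, chunk in blocks])
```

A 2560-byte attestation response therefore travels as two full 1 KiB blocks followed by one 512-byte block, and the receiver knows the transfer is complete when the more flag is cleared.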
3) Secure Update: According to the railway operational
guidelines, when a software update of a safety application is
performed, a technician shall validate the functionality in
the field before the updated system is made operational. Due
to this restriction, we do not allow remote software updates
of safety applications. However, security applications which
are not subject to safety certification in our architecture,
are allowed to be updated remotely in a process we call
secure update. Frequent updates to security applications are
necessary, e. g., to mitigate newly discovered weaknesses in
cryptographic algorithms or in their implementation, to fix
bugs in software, or simply to introduce new features to
security applications.
To be valid in the railway environment, security applications must be approved according to a coordinated security
certification process. This certification process must be done
according to Common Criteria (ISO/IEC 15408) and is part of the
approval process. A certificate is issued by the appropriate
standardization body of a particular country, e. g. the BSI in
Germany. In order to be able to provide the required level of
protection, security components must be continuously updated.
Due to the different boundary conditions, expected update
cycles for security applications are much shorter, and updates more
frequent (from one hour to one day), than for
safety systems (from one week to one month). Updates and
patches can thus be time-critical and at the same time
require recertification, which can take up to several months.
These contradictory requirements result in the deployment of
the latest and secure but uncertified security components in
railway systems.
The secure update mechanism used in our architecture is
described below. A PikeOS partition hosting a security application is updated as a whole by shutting down the existing
partition and then applying and loading the entire image of a
new updated partition. In order to secure the update process, a
cryptographic key pair is generated to sign and verify partition
images before they are applied. The private portion of that
key remains in the MDM. The public portion is stored in
the NVRAM of the TPM on the OC. We protect the TPM
NVRAM area against deletion by using platform authorization
with a secure passphrase in the TPM platform hierarchy. The
passphrase is only known to the MDM, so that only the
MDM is able to delete the public key from the NVRAM.
In production, this process must happen during manufacturing
of the OC.
Whenever a partition update is due, the secure update
component in the MDM signs the partition image with the
private key. Then, the MDM transfers the partition image and
the signature to the secure update service partition in the OC.
There, the partition verifies the signature using the public part
of the key from the TPM, utilizing the TPM RM partition. We
use OpenSSL to perform signing and verification operations.
The partition image update may be applied only if the
signature is valid. After integrity verification of the
partition update package, reprogramming of persistent memory
is performed by the partition updater. Once the reprogramming
is done, the partition updater verifies the integrity of updated
components and communicates the status to the secure update
service. Safety partitions are not affected by the partition
updating process, due to the resource separation enforced by
the SK.
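The verify-then-apply ordering described above can be illustrated with a minimal Python sketch. It is not the implementation used on the OC: the names `secure_update` and `UpdateStatus` and the in-memory `storage` mapping are hypothetical, and signature verification is passed in as a callable because the architecture delegates it to OpenSSL with the public key held in the TPM NVRAM.

```python
import hashlib
from enum import Enum
from typing import Callable, MutableMapping


class UpdateStatus(Enum):
    """Status reported back to the secure update service (hypothetical names)."""
    REJECTED = "signature invalid, nothing written"
    APPLIED = "image written, read-back digest matches"
    INTEGRITY_ERROR = "read-back digest differs from the image"


def secure_update(
    partition: str,
    image: bytes,
    signature: bytes,
    verify_signature: Callable[[bytes, bytes], bool],
    storage: MutableMapping[str, bytes],
) -> UpdateStatus:
    # Step 1: do not touch persistent memory unless the signature is valid.
    # verify_signature stands in for the OpenSSL-based check (assumption).
    if not verify_signature(image, signature):
        return UpdateStatus.REJECTED

    # Step 2: reprogram persistent memory with the complete partition image.
    storage[partition] = bytes(image)

    # Step 3: read the partition back and compare digests before reporting.
    expected = hashlib.sha256(image).digest()
    actual = hashlib.sha256(storage[partition]).digest()
    if actual != expected:
        return UpdateStatus.INTEGRITY_ERROR
    return UpdateStatus.APPLIED
```

A caller would pass a verifier bound to the protected public key; an image with an invalid signature never reaches the write step, which mirrors the rule that only validly signed images may be applied.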
The partition updater is responsible for providing the appropriate security properties for the update process, such as
confidentiality and integrity of the communication channel,
client/server authentication, remote attestation, and integrity
checking of partition update packages. The partition loader
provides the required safety properties for the update process.
The functionality implemented by the partition loader includes
system state and power management, life-cycle of updated
applications, error handling, and recovery/fallback processes.
Management of security software (i.e., development,
deployment, and transfer to the SOC and MDM) is the
responsibility of the railway operator. We assume that all
software development processes in the backend are in
accordance with the necessary safety and security guidelines.
Our focus is on the MDM, the secure transfer, and the secure
update.
4) Cryptographic Key Management: In our concept and
the implementation, we use several cryptographic keys.
Table II provides an overview of all cryptographic keys:
who creates them, where they are stored, what they are used
for, and when they are destroyed.
V. EVALUATION
In this section, we briefly describe the evaluation of our
implementation: how the requirements are met, how we
realized the test bed, and how our architecture can be
integrated on SIL4 hardware.
A. Compliance with Security Requirements
For evaluation purposes, we refined the generic security
requirements from Section III into specific ones that take into
account the technology choices made for the implementation
of the security architecture described above. The overall approach is similar to SREP (Security Requirements Engineering
Process) [22], [23]: first, in our previous work [8], relevant
generic security requirements were determined using standard
DIN VDE V 0831-104 (i.e., IEC 62443 applied to the railway
domain) [3] and, second, in this work, the specific knowledge
about the solution’s architecture (including related functional
limitations and security threats) was used by security experts to
elicit system-specific requirements that can later be utilized in
the solution’s validation and testing. Table III links our refined
requirements to the generic ones, which are listed in its third column.
In the following, we discuss the mitigations that the
implementation includes to meet the requirements. For R1r,
R2r, and R7r, the TPM’s protected storage and enhanced
authorization functionality are employed, i.e., keys for the
OC can be generated directly in the TPM and never leave it,
while other credentials are stored integrity-protected.
Access to the keys can be sealed to the system’s boot state
and/or to the particular application or user that attempts
the access by means of pre-defined TPM-based policies. R3r
and R4r are achieved by combining measured boot and platform
integrity validation for an OC using the attestation protocol
directly on start-up, with the OC being put into operation
only if its integrity can be verified by the MDM. If the
state of the booted OC’s software differs from what the MDM
expects (the attestation response cannot be validated or
there is no response), the MDM sends an alarm to the SOC,
which in turn forwards this information to the operator’s
command and control center responsible for railway safety.
This way, any routes and train drivers that may be at risk
due to the compromised OC can be taken care of in a timely
manner without any interference with safety-critical
functions. To meet R5r, the system only uses DTLS from
mbedTLS for all communications such as CHARRA and secure
update. For R6r, the same secure update mechanism can be
used, which allows updating any vulnerable software in
security applications. As an update for a safety application
needs to be validated and
authorized by an operator in the field, it is currently excluded
from the secure update. R8r is fulfilled by the use of an
SK-based partitioned architecture. It also allows independent
restarting of partitions, thus fulfilling R9r. R10r is
achieved by using the TPM to securely store a public key for
which only the MDM knows the private portion. The public key
is protected against deletion. An update image is signed by
the MDM and verified by the OC using the protected public
key. R11r is realized by the partition updater as described
in Section IV-E3. R12r can be satisfied with mechanisms
described in [24], using unique features of TPM 2.0. This is
currently work in progress. R13r is achieved by using
measured boot in the boot chain starting from the RTM. The
separation mechanisms included in PikeOS ensure R14r.
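The combination of measured boot (R13r) and start-up attestation (R3r, R4r) discussed above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the CHARRA protocol: a real attestation transports a TPM-signed quote bound to a fresh nonce, which is omitted here. The function names and the `"OPERATIONAL"`/`"ALARM"` results are hypothetical; only the extend rule (new PCR value = hash of the old value concatenated with the measurement digest) follows the TPM 2.0 behavior.

```python
import hashlib
import hmac
from typing import Iterable, Optional

PCR_SIZE = 32  # size of one register in a SHA-256 PCR bank


def extend(pcr: bytes, measurement_digest: bytes) -> bytes:
    """TPM-style extend: new PCR = SHA-256(old PCR || measurement digest)."""
    return hashlib.sha256(pcr + measurement_digest).digest()


def measure_boot_chain(components: Iterable[bytes]) -> bytes:
    """Hash each boot component in load order and fold it into one PCR."""
    pcr = bytes(PCR_SIZE)  # PCR value after reset: all zeros
    for component in components:
        pcr = extend(pcr, hashlib.sha256(component).digest())
    return pcr


def mdm_decide(expected_pcr: bytes, reported_pcr: Optional[bytes]) -> str:
    """MDM-side decision: a missing or mismatching report raises an alarm."""
    if reported_pcr is None:  # no attestation response at all
        return "ALARM"
    if hmac.compare_digest(expected_pcr, reported_pcr):
        return "OPERATIONAL"
    return "ALARM"
```

Because each extend hashes the previous register value, the final PCR depends on both the content and the order of all loaded components, so a single manipulated stage changes the value the MDM compares against its reference.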
B. Test Environment
To evaluate the implemented architecture, we integrated the
prototype of our implementation into a testbed for railway
operations simulations. The testbed is based on a digital model
railway with currently two switches and two signals as field
elements on which a model train is running (cf. Fig. 4).
The field elements can be controlled by a legacy OC or an
OC that implements our security architecture.

TABLE II: Cryptographic Key Management
| Key | Created by | Stored | Used | Destroyed |
| --- | --- | --- | --- | --- |
| MDM DTLS Key for Secure Update (asymmetric) | MDM (during deployment) | Private part in the MDM, public part in the OC Secure Update partition. | for secure update process | usually never, unless the system (OC or MDM) is compromised |
| OC DTLS Key for Secure Update (asymmetric) | OC (during deployment) | Private part in the OC Secure Update partition, public part in the MDM. | for secure update process | usually never, unless the system (OC or MDM) is compromised |
| MDM DTLS Key for Remote Attestation (asymmetric) | MDM (during deployment) | Private part in the MDM, public part in the OC Remote Attestation partition. | for remote attestation process | usually never, unless the system (OC or MDM) is compromised |
| OC DTLS Key for Remote Attestation (asymmetric) | OC (during deployment) | Private part in the OC Remote Attestation partition, public part in the MDM. | for remote attestation process | usually never, unless the system (OC or MDM) is compromised |
| Remote Attestation Key (asymmetric) | OC (during manufacturing) | Private part on the OC (protected by the TPM), public part on the MDM. | for remote attestation process | never, unless the system is decommissioned (the key should be destroyed properly by deleting it) |
| Secure Update Signature Key (asymmetric) | MDM (during manufacturing) | Private part on the MDM, public part in the TPM of the OC (protected by the TPM). | for secure update process | never, unless the system is decommissioned (the key should be destroyed properly by deleting it) |

TABLE III: Refined security requirements
| ID | Refined requirement | Generic requirements |
| --- | --- | --- |
| R1r | The system shall store credentials used to authenticate endpoints (MDM, ILS, peer OCs) or users under protection of TPM | R2, R3, R5, R6, R7, R13 |
| R2r | The system shall control and restrict access to the area where the authentication credentials for endpoints and users are stored (e.g., with password or policy) | R2, R3, R5, R6, R7, R13 |
| R3r | There shall be means to detect endpoints that run non-authenticated software | R4 |
| R4r | Before bringing the OC into service, the platform integrity shall be validated, i.e. the system shall not be put to operational state when it executes unauthorized software | R4, R10, R13, R14 |
| R5r | The system shall use only an industry-standard communication stack | R5 |
| R6r | There shall be means to update/replace the communication stack when a vulnerability is detected | R5 |
| R7r | Access to the credentials (e.g., keys) shall only be allowed if the system is not modified in an unauthorized manner | R6, R13, R14 |
| R8r | The system shall ensure that the resource allocation for fulfilling the runtime requirements (e.g. latency, boottime) computed at design time is met at runtime | R8, R11, R10, R13, R14 |
| R9r | An application/component shall be updatable without restarting the system or affecting the rest of the system | R9 |
| R10r | The authenticity and integrity of update images shall be validated | R9, R14 |
| R11r | The system shall ensure persistence of an update image | R9 |
| R12r | The system shall provide protection against downgrade attacks during update | R9 |
| R13r | The system shall record the integrity of software components at load time in a tamper-evident manner | R12, R13, R14 |
| R14r | The system shall provide isolation of safety-relevant components from any other components of the system | R13 |

The hardware is board MEN G22 equipped with TPM 2.0. A safety-backend
implements functions for railway interlocking and operations
control and a security-backend implements MDM and SOC
functionalities providing firmware and configuration updates
and performing periodic platform attestations. The
communication of the OC to the model railway is handled by
Railuino (footnote 2).
The testbed allows simulating local and remote attackers and
playing through various test scenarios to evaluate our solution
and the effects it has on railway operations. Examples of
test scenarios used to evaluate the above mitigations are as
follows:
Detecting compromised OCs: A local attacker boots
manipulated firmware on an OC to control it as they like. To
make this attack scalable, a remote attacker manipulates
an update image in the MDM. This scenario can be
configured to test the mitigation for the requirements R3r,
R4r, R7r, R10r, R12r, R13r, and partially R14r.
Preventing OC’s resource exhaustion: A software stack
used to build a (security) application on an OC has a
software bug or an exploit crafted by an attacker leading
to resource exhaustion and undesired behavior in the
OC. This scenario allows testing the mitigation for the
requirements R8r and R14r.
2 The Railuino is an Arduino Uno Rev3 with an ATmega328P, https://code.google.com/archive/p/railuino/
Fig. 4: Testbed for railroad simulations (MEN G22 on the right)
C. Redundancy Architecture for SIL4 Object Controller
In this work, our goal is to build a certifiable reference
design for augmenting safety applications with security
mechanisms on the same hardware platform. We use the MEN
G22 board for our implementation, which does not address the
Hardware Fault Tolerance (HFT) requirement of SIL4 control
logic. To fulfill the HFT ≥ 1 requirement of the SIL4
OC main logic, we need a redundancy architecture in which the
control logic is executed on independent processors.
Fig. 5 shows an exemplary redundancy design using a safe
hardware platform with three Central Processing Units (CPUs)
(footnote 3). This design includes two independent CPUs that
are responsible for redundant execution of the OC main logic
and one I/O processor that is responsible for the I/O devices
in the platform, the TPM, the network controllers, and the
I/O device that controls the field elements.
When mapping our implementation to such a redundant
platform, we execute security applications and the safety
network stack, including RaSTA, on the I/O processor. This
is needed as these applications require access to the hardware
TPM and network controllers. A TPM has a unique secret from
which initial keys are derived. A random number generator is
used to generate additional secure keys. For this reason, two
different TPMs always generate two different keys. Therefore,
a TPM cannot be designed redundantly and is directly
connected to the I/O CPU. The OC control logic is executed on
the two control processors in parallel; each receives the
same input from the RaSTA stack and computes the results. The
control processors also implement two safety monitors which
synchronize the two CPUs, compare inputs and outputs for each
command received, and push the results to the I/O processor.
In case of a result mismatch, the system is set into
“Failure Mode” and becomes inactive.
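The comparison step performed by the safety monitors can be modeled as follows. This is a minimal single-process sketch under stated assumptions: the class `SafetyMonitor`, the exception `FailureMode`, and the two callables standing in for the redundant CPUs are hypothetical, and CPU synchronization and input comparison are reduced to passing the same command to both callables.

```python
from dataclasses import dataclass
from typing import Callable


class FailureMode(Exception):
    """Raised when the system has entered Failure Mode and is inactive."""


@dataclass
class SafetyMonitor:
    # Each callable stands in for the OC control logic on one control CPU.
    logic_a: Callable[[bytes], bytes]
    logic_b: Callable[[bytes], bytes]
    failed: bool = False

    def process(self, command: bytes) -> bytes:
        """Run both channels on the same input and release only agreed results."""
        if self.failed:
            # Failure Mode is latched: no further commands are processed.
            raise FailureMode("system is inactive")
        result_a = self.logic_a(command)
        result_b = self.logic_b(command)
        if result_a != result_b:
            self.failed = True
            raise FailureMode("result mismatch between control processors")
        # Agreement: this value would be pushed to the I/O processor.
        return result_a
```

Latching the failure state reflects the statement that the system becomes inactive after a mismatch; how a real platform leaves Failure Mode again is outside the scope of this sketch.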
3 The MEN F75P single-board vital computer, e.g., is such a redundant hardware platform.
VI. CERTIFICATION AND HOMOLOGATION
The railway infrastructure must be considered a critical
infrastructure, which by law requires special technical and
organizational measures to enhance its cybersecurity (e.g., the
act on the Federal Office for Information Security (BSI Act
– BSIG) in Germany). Evidence of the effectiveness of the
security measures needs to be reported periodically to the
authorities. At the same time, the infrastructure is
safety-critical and requires safety certification in order to protect
passengers and the environment from physical harm stemming
from malfunctions. At the moment, there is no established
standard in the railway domain that specifies the interplay
between the safety and the security components as well
as methods for certification (work on a European technical
specification for security in the railway domain, prTS 50701
[4], is underway).
In order to certify the architecture described in this paper,
we thus propose to separate the concerns of safety and
security to the largest possible extent, as it is likely that
security components need to be updated periodically (which
would trigger the need for re-certification), while
safety-critical components have a significantly longer
lifespan. We thus propose to certify that the architecture
provides a clear separation between all safety-critical
components and security components, so that the latter can be
seen as a “security shell” around the former, thus
implementing the concept proposed in [25]. This safety
certification can be performed according to well-established
safety standards (such as EN 50159), under the assumption
that the security components adequately shield the safety
components. For the security certification, we propose to
follow the upcoming prTS 50701, which will likely become the
de-facto standard for security in railway operation. The
architecture proposed in this paper is already compliant with
prTS 50701.
Fig. 5: Redundancy architecture with three CPUs
The railway safety standard EN 50128 defines the requirements
for the software of railway control and protection
systems. The standard defines development, verification, and
validation processes for reducing systematic software
failures to acceptable levels. Due to the rigorous processes
to be followed, the cost of safety-critical software directly
depends on the number of Source Lines of Code (SLOC).
Thus, keeping the SLOC as small as possible reduces the
certification effort significantly. Thanks to our partitioned
architecture, we can limit the safety certification efforts to
the safety application alone without the need to certify the
security application, even though they are integrated on the
same hardware platform. According to EN 50128, the
sub-component that provides independence between safety and
non-safety applications shall be certified at the same level
as the highest SIL safety application. The PikeOS SK that we
use in our design is certifiable to SIL4, which corresponds
to the SIL required by the OC safety application. A TPM 2.0
is certifiable according to FIPS 140-2 level 1 or 2
(footnote 4) as well as Common Criteria (CC) EAL4+ Moderate
based on the TPM 2.0 Protection Profile (footnote 5).
Formally verifying that our implementation of remote
attestation in conjunction with TPM-based measured boot is
sound and yields expected results is work in progress.
VII. CONCLUSION
In this paper, we described the practical implementation
and evaluation of our railway security architecture introduced
in [7], [8]. A particular challenge was integrating the
Trusted Platform Module (TPM) into the Multiple Independent
Levels of Safety and Security (MILS) Separation Kernel (SK)
PikeOS and enabling TPM access for all security partitions
with the TPM Resource Manager (RM). Our evaluation
showed that our solution is a viable approach for integrating
safety and security applications on one platform, and that it
has the potential for receiving certification and homologation.
As future work, we plan to integrate our solution in a test
field of the Deutsche Bahn (DB). This allows for
close-to-real-life evaluation of the prototype with a fully functional
safety application, including the RaSTA stack, and a security
application on a single secure hardware platform.
ACKNOWLEDGMENT
The work presented in this paper has been partly funded
by the German Federal Ministry of Education and Research
(BMBF) under the project “HASELNUSS” (ID 16KIS0597K).
Maria Zhdanova is a member of the TALENTA program of the
“Fraunhofer-Gesellschaft”.
4 https://trustedcomputinggroup.org/resource/tcg-fips-140-2-guidance-for-tpm-2-0/
5 https://trustedcomputinggroup.org/resource/pc-client-tpm-certification/
REFERENCES
[1] H. Leister, “ETCS und digitale Technologie für Stellwerke,” Eisenbahn-Revue International, vol. 8-9, 2017.
[2] DIN VDE V 0831-200:2015-06, “Elektrische Bahn-Signalanlagen, Teil
200: Sicheres Übertragungsprotokoll RaSTA nach DIN EN 50159 (VDE
0831-159),” 2015.
[3] DKE, “Elektrische Bahn-Signalanlagen – Teil 104: Leitfaden für die IT-Sicherheit auf Grundlage der IEC 62443 (DIN VDE V 0831-104),” Oct.
2015.
[4] CENELEC, “PD CLC/TS 50701 Railway applications - Cybersecurity,”
2020.
[5] C. Schlehuber, M. Heinrich, T. Vateva-Gurova, S. Katzenbeisser, and
N. Suri, “Challenges and approaches in securing safety-relevant railway
signalling,” in European Symposium on Security and Privacy Workshops
(EuroS&PW), 2017.
[6] CENELEC - European Committee for Electrotechnical Standardization,
EN50128 - Railway applications - Communications, signalling and
processing systems - Software for railway control and protection systems,
2010, no. EN 50128:2001 E.
[7] H. Birkholz, C. Krauß, M. Zhdanova, D. Kuzhiyelil, T. Arul, M. Heinrich, S. Katzenbeisser, N. Suri, T. Vateva-Gurova, and C. Schlehuber, “A
reference architecture for integrating safety and security applications on
railway command and control systems,” in 4th International Workshop
on MILS: Architecture and Assurance for Secure Systems, (MILS@DSN
2018), 2018.
[8] M. Heinrich, T. Vateva-Gurova, T. Arul, S. Katzenbeisser, N. Suri,
H. Birkholz, A. Fuchs, C. Krauß, M. Zhdanova, D. Kuzhiyelil,
S. Tverdyshev, and C. Schlehuber, “Security requirements engineering
in safety-critical railway signalling networks,” Security and Communication Networks, vol. 2019, Article ID 8348925, 2019.
[9] T. Arul, J. Cosic, M. Drodt, M. Friedrich, M. Heinrich, M. Kant,
S. Katzenbeisser, H. Klarer, P. Rauscher, M. Schubert, G. Still,
D. Wallenhorst, and M. Zhdanova, “IT/OT-security for internet of railway things (IoRT),” https://haselnuss-projekt.de/downloads/Whitepaper
IoRT-Security en.pdf, Working Group CYSIS, retrieved August 30,
2021.
[10] S. Tverdyshev, H. Blasum, B. Langenstein, J. Maebe, B. De Sutter,
B. Leconte, B. Triquet, K. Müller, M. Paulitsch, A. Söding-Freiherr von
Blomberg et al., “Mils architecture,” Zenodo, 2013.
[11] SYSGO GmbH, “PikeOS hypervisor webpage,” https://www.sysgo.com/
products/pikeos-hypervisor/, retrieved June 29, 2021.
[12] Trusted Computing Group, “TCG TSS 2.0 TAB and Resource Manager
Specification,” Apr. 2019.
[13] Fraunhofer SIT, Intel, and Infineon Technologies. (2016, Apr.)
TPM2 Access Broker & Resource Manager. [Online]. Available:
https://github.com/tpm2-software/tpm2-abrmd
[14] DKE, “Electric signalling systems for railways - Part 200: Safe transmission protocol according to DIN EN 50159 (VDE 0831-159). (DIN
VDE V 0831-200),” June 2015.
[15] M. Eckel. (2019, Sep.) CHARRA: Challenge-Response based Remote
Attestation with TPM 2.0. [Online]. Available: https://github.com/
Fraunhofer-SIT/charra
[16] H. Birkholz and M. Eckel, “Reference Interaction Models for
Remote Attestation Procedures,” Internet Engineering Task Force,
Internet-Draft draft-ietf-rats-reference-interaction-models, Jan. 2020,
work in progress. [Online]. Available:
https://datatracker.ietf.org/doc/draft-ietf-rats-reference-interaction-models/
[17] O. Bergmann. (2010, Jul.) libcoap: A C implementation of the
Constrained Application Protocol (RFC 7252). [Online]. Available:
https://github.com/Fraunhofer-SIT/charra
[18] Z. Shelby, K. Hartke, and C. Bormann, “The Constrained Application
Protocol (CoAP),” Internet Requests for Comments, RFC Editor, RFC
7252, Jun. 2014. [Online]. Available: https://tools.ietf.org/html/rfc7252
[19] C. Bormann and P. Hoffman, “Concise Binary Object Representation
(CBOR),” Internet Requests for Comments, RFC Editor, RFC 8949,
Dec. 2020. [Online]. Available: https://tools.ietf.org/html/rfc8949
[20] L. Lundblade. (2018, Sep.) QCBOR: an implementation of nearly
everything in RFC8949. [Online]. Available:
https://github.com/laurencelundblade/QCBOR
[21] Trusted Computing Group, “TCG TSS 2.0 Enhanced System API
(ESAPI) Specification,” May 2020.
[22] D. Mellado, E. Fernández-Medina, and M. Piattini, “Applying a security
requirements engineering process,” in Computer Security – ESORICS
2006, D. Gollmann, J. Meier, and A. Sabelfeld, Eds. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2006, pp. 192–206.
[23] A. Toval, J. Nicolás, B. Moros, and O. García, “Requirements reuse
for improving information systems security: A practitioner’s approach,”
Requirements Engineering Journal, vol. 6, pp. 205–219, 2001.
[24] A. Fuchs, C. Krauß, and J. Repp, “Advanced Remote Firmware
Upgrades Using TPM 2.0,” in Proceedings of the 31st International
Conference on ICT Systems Security and Privacy Protection (IFIP SEC),
2016.
[25] C. Schlehuber, M. Heinrich, T. Vateva-Gurova, S. Katzenbeisser, and
N. Suri, “A security architecture for railway signalling,” in 36th
International Conference on Computer Safety, Reliability and Security
SAFECOMP, 2017.
ACRONYMS
API     Application Programming Interface
CBOR    Concise Binary Object Representation
CC      Common Criteria
CCS     Control-Command and Signaling
CHARRA  Challenge-Response based Remote Attestation with TPM 2.0
CoAP    Constrained Application Protocol
COT     Chain of Trust
CPU     Central Processing Unit
DB      Deutsche Bahn
EU      European Union
ESAPI   Enhanced System API of the TPM 2.0 TSS
ETCS    European Train Control System
FAPI    Feature API of the TPM 2.0 TSS
HFT     Hardware Fault Tolerance
IETF    Internet Engineering Task Force
ILS     Interlocking System
IT      Information Technology
MDM     Maintenance and Data Management
MILS    Multiple Independent Levels of Safety and Security
MMU     Memory Management Unit
OCC     Operation Control Center
OC      Object Controller
OS      Operating System
OT      Operational Technology
PCR     Platform Configuration Register
PDU     Protocol Data Unit
RaSTA   Rail Safe Transport Application
RATS    Remote Attestation Procedures
RM      Resource Manager
RTM     Root of Trust for Measurement
SAPI    System API of the TPM 2.0 TSS
SIL     Safety Integrity Level
SK      Separation Kernel
SLOC    Source Lines of Code
SOC     Security Operations Center
TCG     Trusted Computing Group
TDS     Train Detection System
TPM     Trusted Platform Module
TSS     TPM Software Stack
UEFI    Unified Extensible Firmware Interface
VPN     Virtual Private Network