Short Paper: PRP: Priority Rollback Protocol – A PIP extension
for mixed criticality systems.
Lukasz Ziarek
Fiji Systems Inc., Indianapolis IN, 46202
luke@fiji-systems.com
Abstract
Priority inheritance protocol (PIP) is an important protocol for preventing unbounded priority inversion among threads that contend on
shared resources. Recently, there has been a renewed interest in
reducing the latency and increasing predictability for high priority
threads that acquire contended resources from low priority threads.
In this paper we present an extension to PIP called Priority Rollback Protocol, which allows high priority threads to rollback low
priority threads, thereby freeing contended resources. PRP leverages recent advances in software transactional memory (STM) to
facilitate efficient and predictable reversion of low priority threads.
We present two versions of the PRP algorithm and compare and
contrast their tradeoffs. PRP is geared toward mixed criticality systems, specifically for providing tight and predictable bounds for direct communication between processes. Since PRP is an extension
of PIP, systems can seamlessly leverage both PIP and PRP locks.
Categories and Subject Descriptors C.3 [Real-time and embedded systems]; D.3.3 [Language Constructs and Features]: Concurrent Programming structures
General Terms Design
Keywords Priority inheritance protocol
1. Introduction
Real-time systems are notoriously difficult to program: correctness and predictability are paramount. Real-time systems are usually concurrent, relying on multiple threads of control. When such
threads execute with differing priorities, programs and runtimes
must provide assurances against priority inversion. Priority inheritance protocol (PIP) [4, 7] is a widespread protocol used by real-time systems to prevent unbounded priority inversion. Whenever
a high priority thread attempts to acquire a shared lock held by a
low priority thread, the low priority thread is temporarily boosted
to the priority of the higher priority thread until it has completed its
critical region. PIP prevents unbounded priority inversion by disallowing intermediate priority threads from executing. When the low
priority thread has completed, its priority is returned to normal and
the high priority thread can acquire the contended resource.
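The boost-and-restore mechanics of PIP described above can be sketched as follows. All class and method names here are hypothetical and serve only to illustrate the protocol; they are not the Fiji VM's interface, and a real implementation would order waiters by priority rather than in a stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of priority inheritance: a thread's effective
// priority is boosted while a higher priority thread waits on its lock.
final class Task {
    final String name;
    final int basePriority;
    int effectivePriority;
    Task(String name, int basePriority) {
        this.name = name;
        this.basePriority = basePriority;
        this.effectivePriority = basePriority;
    }
}

final class PipLock {
    Task holder;                               // current lock holder, null if free
    final Deque<Task> waiters = new ArrayDeque<>(); // simplification: LIFO, not priority ordered

    void acquire(Task t) {
        if (holder == null) { holder = t; return; }
        waiters.push(t);
        // Priority inheritance: boost the holder to the waiter's priority so
        // that intermediate priority threads cannot preempt it.
        if (t.effectivePriority > holder.effectivePriority)
            holder.effectivePriority = t.effectivePriority;
    }

    void release() {
        holder.effectivePriority = holder.basePriority; // drop the boost
        holder = waiters.isEmpty() ? null : waiters.pop();
    }
}
```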
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
JTRES’10 August 19-21, 2010 Prague, Czech Republic
Copyright © 2010 ACM 978-1-4503-0122-0/10/08...$10.00.
Mixed criticality systems [6] are specialized real-time systems
in which multiple processes share a common hardware board. Processes in a mixed criticality system are time, space, and resource
partitioned to ensure one process does not affect the predictability of another process. Recent advances in Java VMs have yielded
multi-VMs [3] allowing the execution of mixed criticality systems
within a single Java VM. There has been a great deal of interest
in developing fast and predictable methods of communication between processes executing within a mixed criticality system. Unfortunately, providing sufficient predictability guarantees in the presence of direct communication between processes in a mixed criticality system is difficult. Even with locking protocols such as PIP,
guaranteeing that a direct communication from one partition into
another does not block high priority threads is far from trivial.
In this paper we present a novel extension to PIP aimed at
reducing blocking latency on contended resources for high priority
threads. Our extension, called priority rollback protocol (PRP),
allows for high priority threads to rollback lower priority threads,
thereby releasing contended resources.
2. Priority Rollback Protocol
To allow for fast and predictable lock acquisition for high priority threads, especially in mixed-criticality environments, we introduce priority rollback protocol. Abstractly, PRP allows high priority threads to rollback lower priority threads allowing for immediate lock acquisition in a predictable fashion. PRP builds on concepts from software transactional memory to provide safety guarantees on the state of memory during a rollback. We introduce two
versions of priority rollback protocol. The first of the two utilizes
write buffering, and the second of the two uses write logging. Write
buffering directs all writes to local isolated memory, whereas write
logging allows writes to occur to shared memory and logs previous values. The two versions have differing rollback properties and
worst case execution times. We present the high-level details for
both approaches and compare and contrast the two.
2.1 Write Buffering PRP
In PRP a lock protecting a critical region can be acquired in two
different ways. The first adheres to traditional PIP and has the same
costs associated with PIP. We also allow for a lock to be acquired in
a rollback mode. We envision that this mode will be used by lower
priority threads; acquiring a lock in this mode signals to high priority threads that wish to acquire the lock that the low priority thread can be rolled back. To
distinguish between the two modes, information is needed in the
lock state.
Threads which acquire a lock in rollback mode execute their
critical region differently. The Fiji VM statically creates a new
version of the code for the critical region with additional read and
write barriers inserted. Any writes that the thread does are not
Figure 1: Every write to shared memory is stored within the write buffer, which acts as a stack of location value pairs. Each read from a location that is stored in the buffer must read the latest value from the buffer. This value is stored in the first location value pair for the given location, found by traversing the buffer top down.
immediately made visible to other threads. Instead, the writes occur
in a thread local buffer. To maintain consistency, read barriers are
also inserted into the critical region for the low priority thread. The
read barriers redirect reads to data that the low priority thread has
modified to the write buffer.
At a high-level, a write buffer can be implemented as a stack of
location and value pairs. The location is the raw memory address of
the data being written to, and the value is the value that the thread
wishes to write. The WCET of a write is, therefore, the cost of
creating the pair and pushing it onto the write buffer stack. A read,
on the other hand, must do a check to see if an object is present
in the write buffer. The WCET of a read is, therefore, the cost to
traverse the buffer. As an example, consider Fig. 1. When the
program executes a statement foo.bar = o; in a critical region
protected by a PRP lock a new value and location pair containing
foo.bar and o is pushed onto the write buffering stack. When the
program executes the read statement 4 + b;, we must first check
to see if a value is stored for b in the write buffer. This requires
traversing the write buffer. If a location and value pair is found for
b, the value in the first such pair is read. The first pair contains the
logically last write to the location b.
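The write and read barriers just described can be sketched as follows. The class, the String-keyed locations, and the Object values are illustrative assumptions; a real VM would operate on raw memory addresses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of a write buffer as a stack of location/value pairs.
final class WriteBuffer {
    static final class Entry {
        final String location;
        final Object value;
        Entry(String location, Object value) { this.location = location; this.value = value; }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();

    // Write barrier: O(1) push of the pair onto the buffer stack.
    void write(String location, Object value) {
        stack.push(new Entry(location, value));
    }

    // Read barrier: traverse top down; the first matching pair holds the
    // logically last value written. Returns null if the location is not
    // buffered (the caller then reads shared memory directly).
    Object read(String location) {
        for (Entry e : stack)              // ArrayDeque iterates head (top) first
            if (e.location.equals(location)) return e.value;
        return null;
    }
}
```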
Once a low priority thread has completed its critical region, it
must flush its buffer to main memory. The low priority thread does
so by walking its write buffer stack and for every location and value
pair, writing the value to the location. The WCET cost of flushing
a buffer is the cost to traverse the buffer and perform the writes.
This process can be accelerated by performing only the logically last
write to a given location. Therefore, if a low priority thread writes
to a given location many times, only the last value written will be
flushed. We reserve an additional bit of information in the lock state
to signal that a thread which acquired a lock in rollback mode is in
the process of flushing its buffer. As an example consider Fig. 2,
where the write buffer contains multiple entries for locations b and
a. When flushing the buffer, we traverse it top down. Notice the first
location and value pair for the location a is (a, x). Therefore, the
value we push to location a in shared memory is x. Similarly, we
write the value y to the location b and the value z to location c.
The writes of z and y to location b are intermediate writes. Since
Figure 2: When a thread exits a critical region it must flush its buffer to main memory; here the flushed writes are a = x;, b = y;, and c = z;. The thread needs only to perform one write for each unique location held in the buffer. The write should correspond to the latest location value pair held in the buffer for a given location.
they occurred within a critical region, no other thread would be able
to observe those values in a correctly synchronized program.
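The accelerated flush can be sketched as below: traversing top down and remembering which locations were already written performs exactly one write per unique location, so the logically last value wins. Modeling shared memory as a Map is an assumption made for illustration.

```java
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of flushing a write buffer into shared memory.
final class BufferFlush {
    static final class Entry {
        final String location;
        final Object value;
        Entry(String l, Object v) { location = l; value = v; }
    }

    static void flush(Deque<Entry> buffer, Map<String, Object> sharedMemory) {
        Set<String> written = new HashSet<>();
        for (Entry e : buffer) {            // top down: latest writes first
            if (written.add(e.location))    // skip older, intermediate writes
                sharedMemory.put(e.location, e.value);
        }
        buffer.clear();                     // buffer can be discarded afterwards
    }
}
```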
Although PRP slows down the low priority thread while it executes a critical region, it allows high priority threads wishing to
acquire the lock to unroll low priority threads. If a low priority
thread is in the process of executing its critical region, a high priority thread can signal the low priority thread to stop its execution
and immediately acquire the lock. This is safe because the low priority thread has buffered all of its changes to shared state. Therefore,
memory is consistent with a state in which the low priority thread
has not executed the critical region. The WCET of acquiring the
lock when the low priority thread is executing its critical region is
the cost of signaling the thread and acquiring the lock.
However, what happens if the high priority thread wishes to
acquire the lock when a low priority thread is in the process of
flushing its buffer? We cannot simply allow the high priority thread
to acquire the lock as the low priority thread is in the middle of
updating shared state. In this case we fall back to traditional PIP.
The WCET of acquiring the lock when a low priority thread is
flushing its buffer is the cost to flush the buffer, effectively on the
order of the number of writes to the critical region.
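The acquisition decision for a high priority thread can be summarized as a small state machine over the lock state bits the paper mentions. The state and outcome names below are hypothetical encodings chosen for illustration, not the Fiji VM's actual lock word layout.

```java
// Hypothetical sketch of the PRP acquisition decision for a high priority thread.
final class PrpLock {
    enum State { FREE, HELD_PIP, HELD_ROLLBACK, FLUSHING }
    enum Outcome { ACQUIRED, ROLLED_BACK_HOLDER, PIP_WAIT }

    State state = State.FREE;

    Outcome acquireHighPriority() {
        switch (state) {
            case FREE:
                state = State.HELD_PIP;
                return Outcome.ACQUIRED;          // uncontended fast path
            case HELD_ROLLBACK:
                // Holder's writes are still buffered: signal it to stop,
                // discard its buffer, and take the lock immediately.
                state = State.HELD_PIP;
                return Outcome.ROLLED_BACK_HOLDER;
            case FLUSHING:
            case HELD_PIP:
            default:
                // Holder is publishing to shared memory, or holds the lock
                // in plain PIP mode: fall back to priority inheritance.
                return Outcome.PIP_WAIT;
        }
    }
}
```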
2.2 Write Logging PRP
The difference between write logging and buffering is the data
stored in the log. Instead of storing the location and the value to be
updated, the write log instead stores the location and the value that
existed prior to the write. The write is then performed directly on
shared memory. Therefore, a write log can be viewed as the inverse
of the critical region which provides a way to restore the state of
memory to one existing prior to the execution of the critical region.
The main benefit of utilizing a write log over a write buffer
is that the read barrier can be omitted. The read barrier can be
omitted because the write proceeds to main memory and therefore
any subsequent reads to the same location are valid. The WCET of
a read in the write logging version of PRP is simply the cost of the
read itself.
A write barrier is still necessary even though the write occurs
to main memory since a copy of the old value must be stored in
the write log. Therefore, prior to a write, a location and value pair
is pushed onto the stack of the write log. The value is the value
currently held in the location. After the old value has been logged,
the write proceeds as normal. The WCET of a write in the write
logging version of PRP is the cost of a normal write as well as the
cost of adding the location and value pair to the write log. As an
example, consider Fig. 3. When the statement foo.bar = o; is
executed in a critical region protected by a PRP lock, the current
value stored in the location foo.bar, in this case m, is stored in
the write log.

Figure 3: On every write to shared memory, the value currently held in the destination location is stored within the write log, which acts as a stack of location value pairs. Each read from a given location reads directly from main memory.

A new location and value pair for foo.bar and
m is created and pushed onto the log stack. However, when the
statement 4 + b; is executed in a region protected by a PRP lock
no additional actions need to be performed. Since shared memory
contains all of the updates this critical region has performed, the
location b is guaranteed to hold the most recently written value (in
this case 3). Notice, with a write log, there is no need to flush when
a thread completes its critical region. This is because a successful
completion of the critical region yields the correct state of memory.
Each write performed in the critical region is propagated to main
memory. The write log can, thus, simply be discarded and the
memory reclaimed.
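The write-logging barrier can be sketched as below: the old value is saved before the write lands directly in shared memory, and on successful completion the log is simply discarded. As before, modeling shared memory as a Map and using String locations are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Illustrative sketch of write logging: writes go to shared memory,
// the log records the values they overwrote.
final class WriteLog {
    static final class Entry {
        final String location;
        final Object oldValue;
        Entry(String l, Object v) { location = l; oldValue = v; }
    }

    final Deque<Entry> log = new ArrayDeque<>();

    // Write barrier: save the current value, then write in place.
    // Reads need no barrier at all, since shared memory is up to date.
    void write(Map<String, Object> sharedMemory, String location, Object value) {
        log.push(new Entry(location, sharedMemory.get(location)));
        sharedMemory.put(location, value);
    }

    // Successful completion: shared state is already correct, discard the log.
    void commit() { log.clear(); }
}
```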
When a high priority thread wishes to acquire a lock that is held
by a lower priority thread, it must first signal the lower priority
thread. Since the state of memory is inconsistent with that prior to
the execution of the critical region by the lower priority thread, the
high priority thread must revert the writes done by the low priority
thread. The cost of doing so is the cost of performing one write
for each location and value pair stored in the write log. Notice,
similarly to the write buffering version of PRP, only one write per
location is necessary. Therefore, if a critical region has written to a
given location many times, only one write is necessary to revert the
state of memory to one prior to the critical region.
Unlike the write buffer, however, the log is traversed bottom up.
Notice that to undo a critical region’s updates to shared memory we
want to restore the logically earliest value read for a given location.
Since we store a location and value pair prior to every write, the pair nearest the bottom of the log for a given location holds the logically earliest value. As an example consider Fig. 4. Traversing bottom up we would write the values z, y, and w for the locations c, b, and a.
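The bottom-up reversion can be sketched as follows: walking from the oldest entry to the newest and restoring each location only once ensures the logically earliest saved value is the one that survives. The names are hypothetical, as in the earlier sketches.

```java
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of rolling back a write log: one restoring write
// per unique location, taken from the bottom (oldest end) of the log.
final class LogRollback {
    static final class Entry {
        final String location;
        final Object oldValue;
        Entry(String l, Object v) { location = l; oldValue = v; }
    }

    static void rollback(Deque<Entry> log, Map<String, Object> sharedMemory) {
        Set<String> restored = new HashSet<>();
        Iterator<Entry> bottomUp = log.descendingIterator(); // oldest entry first
        while (bottomUp.hasNext()) {
            Entry e = bottomUp.next();
            if (restored.add(e.location))      // logically earliest value wins
                sharedMemory.put(e.location, e.oldValue);
        }
        log.clear();
    }
}
```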
3. Related Work
Due to space limitations, we discuss only a small subset of related
work. Preemptible atomic regions, or PARs [5], have been proposed
as low-latency synchronization primitives for real-time systems and
have been implemented in OVM [1]. PARs allow for a critical region protected by a PAR to be reverted if a higher priority thread
attempts to enter another critical region also protected by a PAR, in
much the same fashion that a low priority thread executing within
Figure 4: When a low priority thread is signaled to roll back its critical region, it must flush its log to main memory; here the restoring writes are c = z;, b = y;, and a = w;. The thread needs only to perform one write for each unique location held in the log. The write should correspond to the logically earliest location value pair held in the log for a given location.
a critical region protected by a PRP lock can be rolled back. This
ensures that a high priority thread can execute its critical region with
extremely low latency. Unlike PRP (which extends PIP), PARs introduce a new programming model requiring programmers to explicitly manage locks and PARs. Only one PAR may execute on a
given system. Thus two threads, both of which make use of PARs,
but whose critical regions are non-interfering, will nevertheless not
be allowed to execute concurrently. Since PRP utilizes the underlying locking mechanism for concurrency control, two regions protected by different PRP locks will be able to execute concurrently.
Moreover, two non-interfering critical regions protected by PRP
locks are guaranteed to never impede each others’ progress. The
flexotask environment [2] includes a version of PARs with support
for multi-cores and atomic flexotasks, giving regular threads limited transactional access to a flexotask’s private memory.
References
[1] Austin Armbruster, Jason Baker, Antonio Cunei, Chapman Flack,
David Holmes, Filip Pizlo, Edward Pla, Marek Prochazka, and Jan
Vitek. A real-time Java virtual machine with applications in avionics.
ACM Trans. Embed. Comput. Syst., 7(1):1–49, 2007.
[2] Joshua Auerbach, David F. Bacon, Rachid Guerraoui, Jesper Honig
Spring, and Jan Vitek. Flexible task graphs: a unified restricted thread
programming model for Java. In LCTES ’08: Proceedings of the 2008
ACM SIGPLAN-SIGBED conference on Languages, compilers, and
tools for embedded systems, pages 1–11, New York, NY, USA, 2008.
ACM.
[3] Hao Cai and Andy Wellings. Temporal isolation in Ravenscar-Java. In
ISORC ’05: Proceedings of the Eighth IEEE International Symposium
on Object-Oriented Real-Time Distributed Computing, pages 364–371,
Washington, DC, USA, 2005. IEEE Computer Society.
[4] J. B. Goodenough and L. Sha. The priority ceiling protocol: A method
for minimizing the blocking of high priority Ada tasks. In IRTAW ’88:
Proceedings of the second international workshop on Real-time Ada
issues, pages 20–31, New York, NY, USA, 1988. ACM.
[5] Jeremy Manson, Jason Baker, Antonio Cunei, Suresh Jagannathan,
Marek Prochazka, Bin Xin, and Jan Vitek. Preemptible atomic regions
for real-time java. In RTSS ’05: Proceedings of the 26th IEEE International Real-Time Systems Symposium, pages 62–71, Washington, DC,
USA, 2005. IEEE Computer Society.
[6] Rodolfo Pellizzoni, Patrick Meredith, Min-Young Nam, Mu Sun, Marco
Caccamo, and Lui Sha. Handling mixed-criticality in SoC-based real-time embedded systems. In EMSOFT ’09: Proceedings of the seventh
ACM international conference on Embedded software, pages 235–244,
New York, NY, USA, 2009. ACM.
[7] L. Sha, R. Rajkumar, and J. P. Lehoczky. Priority inheritance protocols: An approach to real-time synchronization. IEEE Trans. Comput.,
39(9):1175–1185, 1990.