Measuring Media Gateway Software Efficiency Using Performance Monitor Counters

Mikko Viitanen
S-38.310 Thesis seminar on networking technology
Helsinki University of Technology
12.05.2005
Basic Information
• Thesis written at Oy L M Ericsson Ab, Finland
• Supervisor: Professor Jörg Ott
• Instructors: M.Sc. Stefan Blomqvist and M.Sc. Dietmar Fiedler
Contents
• Background
• Problem Description
• Objectives
• Scope
• Performance Monitor Counters
• Memory Hierarchy
• Measurement Environment
• Results
• Future Work
Background (1/2)
• The Universal Mobile Telecommunications System
(UMTS) is a third generation mobile network standard
specified by the 3rd Generation Partnership Project (3GPP)
• The UMTS network is based on GSM and GPRS
• UMTS specifications and features are grouped into releases
– Enable vendors to make interoperable networks
Background (2/2)
• Release 4 introduced the layered network architecture
– The Mobile services Switching Centre (MSC) was divided into
the MSC server and the Circuit-Switched Media Gateway (CSMGW).
– The MSC server handles the call control.
– The Media Gateway (MGW) handles the media and the bearer
control.
Problem Description
• The Media Gateway is a real-time multiprocessor system
• A common problem in complex systems is how to verify
and measure software performance
• Performance monitor counters offer a way to monitor
code efficiency at the processor level
• This thesis addresses the following questions:
– What kinds of efficiency problems can be found by using the
performance monitor counters?
– What kinds of programming methods should be used to reach
better results than before?
Objectives
• The purpose is to get results that can be used to find
efficiency problems in the MGW’s software
• Find ways to improve the system performance
Scope
• The MGW’s software will be introduced
• The software development tools used in the MGW
software development will be presented
• Overall software performance issues will be discussed
• The Performance Monitor Counters measurement method
will be explained
Performance Monitor Counters (1/2)
• Performance Monitor Counters (included in many
PowerPC family processors) are special registers intended
for performance measurement.
• The measurements are made at run time. The processor
increments the registers when monitored events occur.
• Because the method uses dedicated resources built into
the processor that operate in parallel with the others, it
does not affect system performance, which is why it can
provide very realistic results.
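The counting scheme above can be sketched as follows. This is only a minimal illustration, not the measurement code used in the thesis: it assumes 32-bit counter registers that wrap around, and `counter_delta` is a hypothetical helper name. The processor updates the register in the background; software only reads it before and after the measured code and takes the difference.

```python
COUNTER_BITS = 32  # assumption: the counter registers are 32 bits wide
COUNTER_MODULUS = 1 << COUNTER_BITS


def counter_delta(start: int, end: int) -> int:
    """Return the number of events counted between two register reads.

    The subtraction is done modulo 2**32, so a single wraparound of the
    counter between the two reads still yields the correct count.
    """
    return (end - start) % COUNTER_MODULUS


# Hypothetical register snapshots taken before and after the measured code.
print(counter_delta(1_000, 26_000))             # no wraparound
print(counter_delta(0xFFFF_FF00, 0x0000_0100))  # counter wrapped once
```

If the counter can wrap more than once between the two reads, the difference is no longer reliable, so the measured section has to be short enough for the chosen event.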
Performance Monitor Counters (2/2)
The following events can be measured:
• Completed instructions per processor clock cycle
• Memory hierarchy behavior (e.g. cache misses)
• Usage of different execution units
• Types of instructions dispatched
• Branch predictions
• etc.
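As a rough sketch of how raw event counts like those listed above could be turned into the figures reported later (IPC in percent and a miss percentage), consider the following. The `CounterSample` layout and the function names are hypothetical, and the miss percentage is assumed here to mean misses per L2 instruction access; the slides do not state which denominator was used.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw event counts for one measured piece of code (hypothetical layout)."""
    cycles: int
    instructions_completed: int
    l2_instruction_accesses: int
    l2_instruction_misses: int


def ipc_percent(sample: CounterSample) -> float:
    """Completed instructions per clock cycle, expressed as a percentage."""
    if sample.cycles <= 0:
        raise ValueError("cycle count must be positive")
    return 100.0 * sample.instructions_completed / sample.cycles


def l2_instruction_miss_percent(sample: CounterSample) -> float:
    """L2 instruction misses per L2 instruction access (assumed definition)."""
    if sample.l2_instruction_accesses == 0:
        return 0.0
    return 100.0 * sample.l2_instruction_misses / sample.l2_instruction_accesses


# Made-up numbers, for illustration only.
sample = CounterSample(cycles=120_000, instructions_completed=60_000,
                       l2_instruction_accesses=2_000, l2_instruction_misses=300)
print(ipc_percent(sample), l2_instruction_miss_percent(sample))
```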
Memory Hierarchy
• Fetching data from different parts of the memory system
requires different amounts of time (cycles):
– Registers: 0 cycles
– L1 cache: 7 cycles
– L2 cache: 18 cycles
– Main memory: 70 cycles
Source for estimations: IBM PowerPC 740 / PowerPC 750
RISC Microprocessor User’s Manual.
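To show why the memory hierarchy matters, the cycle estimates above (7, 18 and 70 cycles) can be combined into an average fetch cost. This is a simple weighted-average sketch; the function name and the fractions of accesses served by each level are made-up assumptions, not measured values from the thesis.

```python
# Cycle estimates taken from the slide (L1 cache, L2 cache, main memory).
LATENCY_CYCLES = {"L1 cache": 7, "L2 cache": 18, "Main memory": 70}


def average_access_cycles(served_fraction: dict) -> float:
    """Weighted average fetch cost in cycles.

    served_fraction maps each memory level to the fraction of accesses
    that are finally served by that level; the fractions must sum to 1.
    """
    total = sum(served_fraction.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(LATENCY_CYCLES[level] * fraction
               for level, fraction in served_fraction.items())


# Hypothetical workload: 90 % served by L1, 8 % by L2, 2 % by main memory.
mostly_cached = {"L1 cache": 0.90, "L2 cache": 0.08, "Main memory": 0.02}
print(round(average_access_cycles(mostly_cached), 2))
```

Even with only 2 % of the accesses going to main memory in this made-up workload, those accesses contribute about as many cycles as the 8 % served by the L2 cache.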
Measurement Environment (1/2)
• The first M-MGW (a complete node) is the System Under Test (SUT).
The second M-MGW is a dummy one, not connected to any access
networks. It just answers the SUT’s requests.
• Several Catapult DCT2000s initiate all the traffic (act as
UTRAN/GERAN simulators).
• UPLoad generates user plane traffic according to the Q.AAL2
signaling received from the Catapults.
• The MSC server is a real node, which is controlled by the Catapults. It
manages both of the M-MGWs.
• TTCN is used to initialize the PMC measurement procedure by
activating the PMC registers and specifying the measured events.
Measurement Environment (2/2)
[Figure: measurement environment. Components: three Catapult
DCT2000s, MSC server, UPLoad, M-MGW (system under test), second
M-MGW, TTCN. Interface labels: RANAP, GCP, Q.AAL2, AAL2,
Nb (ATM, TDM), PMC control.]
Results (1/3)
• L2 instruction cache misses have a severe effect on IPC
(instructions per clock cycle)
– Most probably the main reason for the large delay is that when
an L2 instruction cache miss occurs, the processor cannot
execute the following instructions, because the missing
instruction can affect the next ones. The processor has to wait
until the missing instruction is available.
[Figure: IPC (%) plotted against L2 instruction misses (%).]
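The stalling argument can be illustrated with a toy model in which every instruction miss adds a fixed number of wait cycles. This is not the thesis's measured data or method: the function name, the base IPC of 80 %, the 70-cycle penalty (borrowed from the main memory estimate) and the choice to express the miss rate per 100 completed instructions are all assumptions made for the sketch.

```python
def ipc_percent_with_stalls(base_ipc_percent: float,
                            misses_per_100_instructions: float,
                            miss_penalty_cycles: float) -> float:
    """IPC (%) when each instruction miss stalls the processor completely.

    Cycles per instruction are the base cost plus the average stall cost;
    IPC is the reciprocal, expressed as a percentage.
    """
    base_cpi = 100.0 / base_ipc_percent
    stall_cpi = misses_per_100_instructions / 100.0 * miss_penalty_cycles
    return 100.0 / (base_cpi + stall_cpi)


# Hypothetical sweep: IPC falls quickly as the miss rate grows.
for miss_rate in (0.0, 0.5, 1.0, 2.0, 5.0):
    ipc = ipc_percent_with_stalls(80.0, miss_rate, 70.0)
    print(f"{miss_rate:4.1f} misses per 100 instructions -> IPC {ipc:5.1f} %")
```

Because the stall term is added to the cycles per instruction, even a small miss rate dominates the total once the penalty is tens of cycles, which is consistent with the steep drop described above.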
Results (2/3)
• The amount of load has only a small effect on the IPC
values in general. However, some measurement points are
strongly affected when the load increases.
• What do the points that reached much better IPC values
under high load have in common? They all contain data
structure operations, such as searches, additions and
removals. When the system is under high load, the number
of elements in these data structures is considerable, and
the data structures can be managed efficiently from the
processor’s point of view.
Results (3/3)
• The amount of code in the operation has an effect on the
IPC value. The lengths of the measured pieces of code differ
quite a lot.
– The usage of complicated state machines is the main reason for
low IPC values in short operations. When code is generated from
a state machine with small pieces of code, the program is very
fragmented (contains numerous small blocks).
[Figure: IPC (%) plotted against processor cycles.]
Future Work
• Topics for future work:
– Comparing the results to some other pieces of software that are
implemented using different development tools.
– The comparison can also be done by using different processors.
For instance, if there were a similar processor with L1 and L2
caches of double the size, the results would surely be different.
Thank you!
Questions or comments?