Memory Operation and Performance

Lecture 10 – Memory Operation and Performance
Caches – review of some concepts
Virtual Memory (VM)
Example of a matrix
int data[M][N];
for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
        sum += data[i][j];
    }
}
This is an M x N matrix.
Row-major and Column-major – note the access sequence
(Figure: row-major order visits the data row by row; column-major order visits it column by column)
Accessing data in column-major order (figure)
Accessing row data is faster
It is faster because once the program accesses [0,0], the cache line that is loaded also brings in [0,1], [0,2], … up to [1,3].
Row-major access is faster than column-major access.
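As a minimal, self-contained sketch (not taken from the slides; the matrix size and the timing code are assumptions for illustration), the two traversal orders can be compared directly:

/* Compare row-major and column-major traversal of the same matrix. */
#include <stdio.h>
#include <time.h>

#define M 2000
#define N 2000
static int data[M][N];

int main(void) {
    long sum = 0;
    clock_t t0 = clock();
    for (int i = 0; i < M; i++)        /* row-major: walks memory sequentially */
        for (int j = 0; j < N; j++)
            sum += data[i][j];
    clock_t t1 = clock();
    for (int j = 0; j < N; j++)        /* column-major: jumps N ints per access */
        for (int i = 0; i < M; i++)
            sum += data[i][j];
    clock_t t2 = clock();
    printf("row-major: %.3fs, column-major: %.3fs (sum=%ld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    return 0;
}

On most machines the row-major loop is noticeably faster because every byte of each cache line that is fetched gets used.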
Changing the order of the iterations is not always better. Below is an example: transposing a matrix.
int original[M][N];
int transposed[N][M];
for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
        transposed[j][i] = original[i][j];
    }
}
Whatever the loop order, one of the two arrays is traversed against its row-major layout.
Effect of the transpose – the shape of the accessed region is rotated by 90 degrees (figure)
Insufficient Temporal Locality
// the solution is to process the matrix in blocks (tiles) that fit in the cache
int original[M][N];
int transposed[N][M];
for (k = 0; k < M / m; k++) {
    for (l = 0; l < N / n; l++) {
        for (i = k*m; i < (k+1)*m; i++) {
            for (j = l*n; j < (l+1)*n; j++) {
                transposed[j][i] = original[i][j];
            }
        }
    }
}
Blocked (tiled) transpose gets around cache misses
The block dimensions m and n are typically equal (a square block) and are chosen from the cache line size, say 32 bytes.
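As a hedged, self-contained sketch (the matrix dimensions and the 32-byte cache line are assumptions chosen for illustration), the blocked transpose above can be made runnable like this:

/* Blocked transpose with the block size derived from an assumed cache line.
   M and N are taken to be multiples of the block size, as on the slide. */
#include <stdio.h>

#define CACHE_LINE 32                      /* assumed cache line size in bytes */
#define BLOCK (CACHE_LINE / sizeof(int))   /* 8 ints per cache line */
#define M 1024
#define N 512

static int original[M][N];
static int transposed[N][M];

int main(void) {
    for (size_t k = 0; k < M / BLOCK; k++)
        for (size_t l = 0; l < N / BLOCK; l++)
            for (size_t i = k * BLOCK; i < (k + 1) * BLOCK; i++)
                for (size_t j = l * BLOCK; j < (l + 1) * BLOCK; j++)
                    transposed[j][i] = original[i][j];
    printf("transposed a %dx%d matrix in %zux%zu blocks\n",
           M, N, (size_t)BLOCK, (size_t)BLOCK);
    return 0;
}

While one block is being copied, the cache lines touched in both arrays are reused before they are evicted, which is exactly the temporal locality the straightforward loop lacks.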
Virtual memory – Glossary
thrashing (n.) a phenomenon of virtual memory
systems that occurs when the program, by the
manner in which it is referencing its data and
instructions, regularly causes the next memory
locations referenced to be overwritten by recent or
current instructions. The result is poor performance.
thread (n.) a lightweight or small granularity process.
tiling (n.) A regular division of a mesh into patches,
or tiles. Tiling is the most common way to do
geometric decomposition.
Virtual Memory
virtual memory (n.) A system that keeps on disk those portions of an address space that are not being actively used. When a reference is made to a value not presently in main memory, the virtual memory manager must swap some values in main memory for the values required.
Virtual memory is used by almost all uniprocessors and multiprocessors, but not by array processors and multicomputers.
Multicomputers still employ only real (physical) memory on each node.
Virtual Memory (VM)
The term virtual memory refers to a combination
of hardware and operating system software that
solves several computing problems.
It receives a single name because it is a single
mechanism, but it meets several goals:
To simplify memory management and
program loading by providing virtual
addresses.
To allow multiple large programs to be run
without the need for large amounts of RAM,
by providing virtual storage.
Virtual Addresses
Segmentation – group pages together into segments of different sizes
Memory Protection – because more than ONE process is supported, each process's memory must be protected from corruption by the others
Paging – use the same fixed size (a page) on disk and in memory, and transfer pages between memory and disk
Note that computers hold several programs in memory at the same time.
Page and Segmentation
(Figure: the address space divided into pages of 16K each)
Memory Protection
If there is more than one process (program) in memory, each program must be protected from being modified by the others.
(Figure: Program 1 and Program 2 sharing memory)
Two contradictory facts about program addresses:
The compiler determines the
address at which a program
will execute, by hard-wiring a
lot of addresses of variables
and instructions into the
machine code it generates.
The location of the program is
not determined until the
program is executed and may
be anywhere in main memory.
Solution to contradictory facts
Code Relocation: Have the compiler generate
addresses relative to a base address, and change
the base address when the program is executed.
The drawback is that the address of each reference must be
calculated explicitly, at run time, by adding the relative
address to the base address.
Address Translation: At run time, provide
programs the illusion that there are no other
programs in memory. Compilers can then
generate any absolute address they wish. Two
programs may contain references to the same
address without interference.
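As a minimal sketch (not from the slides; the structure name, the base/bound values and the fault handling are illustrative assumptions), run-time address translation with a relocation (base) register looks like this:

/* Base-and-bound translation: physical = base + virtual, with a bound check. */
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint32_t base;   /* where the program was loaded */
    uint32_t bound;  /* size of the program's region */
} relocation_reg;

/* Translate a virtual address; returns 0 and sets *phys on success. */
static int translate(relocation_reg r, uint32_t vaddr, uint32_t *phys) {
    if (vaddr >= r.bound)
        return -1;                 /* protection fault: outside the program */
    *phys = r.base + vaddr;
    return 0;
}

int main(void) {
    relocation_reg r = { 0x40000, 0x10000 };   /* assumed load address and size */
    uint32_t p;
    if (translate(r, 0x1234, &p) == 0)
        printf("virtual 0x1234 -> physical 0x%x\n", (unsigned)p);
    return 0;
}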
Virtual and Physical Addresses
The addresses issued by the compiler
are called virtual addresses.
The addresses that result from the
translation are called physical
addresses, because they refer to an
actual memory chip.
Multiple programs without relocation (figure)
Relocatable code can share memory
Segment
A segment is a region of the address space
of varying length. In the next figure, there
are two segments, one used to store
program A and the other, program B. Each
segment can be mapped to a region of
physical memory independently, as shown,
but the whole segment has to be translated
as one contiguous (continuous) chunk.
Segment address translation
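A minimal sketch of segment address translation (the segment table contents and field names are illustrative assumptions, not from the slides): a reference consists of a segment number and an offset; the offset is checked against the segment's length and then added to the segment's base:

/* Segment-table address translation with a length (protection) check. */
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint32_t base;    /* start of the segment in physical memory */
    uint32_t length;  /* size of the segment */
} segment;

static segment seg_table[] = {
    { 0x10000, 0x8000 },   /* segment 0: e.g. program A */
    { 0x40000, 0x4000 },   /* segment 1: e.g. program B */
};

static int seg_translate(uint32_t seg_num, uint32_t offset, uint32_t *phys) {
    if (seg_num >= sizeof seg_table / sizeof seg_table[0])
        return -1;                           /* no such segment */
    if (offset >= seg_table[seg_num].length)
        return -1;                           /* protection violation */
    *phys = seg_table[seg_num].base + offset;
    return 0;
}

int main(void) {
    uint32_t p;
    if (seg_translate(1, 0x0123, &p) == 0)
        printf("segment 1, offset 0x123 -> physical 0x%x\n", (unsigned)p);
    return 0;
}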
Memory Protection
Memory protection keeps one program's memory from being modified by other programs.
This is important not only to prevent malicious attacks or eavesdropping but also to contain unintended catastrophic errors.
If a computer has ever frozen or crashed
on you, you have probably experienced a
bug in one program careening out of
control and trampling over the memory of
other programs as well as that of the
operating system. Address translation is
the foremost tool in preventing such
behavior.
Paging
The allocation of memory in chunks of varying size causes external fragmentation.
To solve this problem we can change the nature of the address translation so that, instead of mapping virtual to physical addresses in big chunks of varying size, it maps them in small chunks of constant size, called pages.
An example of Paging
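A minimal sketch of page-based translation (the page-table contents and field names are assumptions; the 16K page size echoes the earlier figure): the virtual address is split into a page number and an offset, the page table maps the page number to a physical frame, and the offset is carried over unchanged:

/* Page-table address translation; an absent page means a page fault. */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 16384u                      /* 16K pages */
#define NUM_PAGES 8

typedef struct {
    int present;        /* 1 if the page is currently in main memory */
    uint32_t frame;     /* physical frame number when present */
} pte;

static pte page_table[NUM_PAGES] = {
    { 1, 5 }, { 1, 2 }, { 0, 0 }, { 1, 7 },   /* pages 0-3 */
    { 0, 0 }, { 1, 1 }, { 0, 0 }, { 0, 0 },   /* pages 4-7 */
};

static int page_translate(uint32_t vaddr, uint32_t *phys) {
    uint32_t page   = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;
    if (page >= NUM_PAGES || !page_table[page].present)
        return -1;                             /* page fault */
    *phys = page_table[page].frame * PAGE_SIZE + offset;
    return 0;
}

int main(void) {
    uint32_t p;
    if (page_translate(0x5123, &p) == 0)       /* page 1, offset 0x1123 */
        printf("virtual 0x5123 -> physical 0x%x\n", (unsigned)p);
    if (page_translate(0x8000, &p) != 0)       /* page 2 is not present */
        printf("virtual 0x8000 -> page fault\n");
    return 0;
}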
Page fault
A page fault occurs when the page needed is not in main memory.
The operating system will load it from disk (the backing store used for virtual memory).
Loading from disk takes time, so performance suffers.
Performance can be measured in terms of the number of page faults: a program that incurs 10 page faults performs better than one that incurs 20.
Page fault – the page is not in main memory and has to be loaded from disk (figure)
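On POSIX systems the page-fault counts of a process can be read with getrusage(); a minimal sketch (the workload itself is left as a placeholder):

/* Read the calling process's page-fault counters.
   ru_minflt counts faults served without disk I/O; ru_majflt counts faults
   that required loading a page from disk. */
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    /* ... run the workload of interest here (placeholder) ... */
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("minor faults: %ld, major faults: %ld\n",
               ru.ru_minflt, ru.ru_majflt);
    return 0;
}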
Working Sets
The working set of a program is the set of memory
pages that the program is currently using actively.
The principle of locality suggests that the working set
of a program will be, at any given time, much smaller
than the memory used by the program over its lifetime.
The working set will change as the program executes.
It will change both in the exact pages that are members
of it and in the number of pages.
The working set of a program will expand and contract
as the program's locality becomes more or less
constrained.
It is the size of the working set that is important in
choosing a victim program.
Thrashing
When the available memory is smaller than the working set, the operating system keeps evicting pages that are needed again almost immediately and reloading them into the same memory. Performance suffers because this causes a continuous stream of page faults.
The computer will be doing a lot of work moving pages back and forth between memory and disk, but no useful work will get done. This situation is often referred to as thrashing.
The CPU is busy but not productive, as it spends its time loading data rather than executing it.
Thrashing – here, the program has insufficient memory to execute, so it repeatedly swaps pages in and out between memory and disk (figure)
Relationship between working set and page faults – it is better to keep the number of page faults small (figure)
Impact of VM on Performance
int data[M][N];
for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        sum += data[j][i];   // column-major access – more page faults
    }
}
Impact of VM on Performance
int data[M][N];
for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
        sum += data[i][j];   // row-major access – fewer page faults
    }
}
Summary
Make use of the cache line size – each cache miss loads 32 or 64 bytes into the cache, so try to use all of it.
Understand row-major versus column-major access to gain performance.
Try to reduce page faults (a page fault means that a page is not in main memory and the CPU has to wait for it to be loaded from disk).
Operating System Interaction
Dynamic Linking
Time-Sharing
Threads
Dynamic Linking
Libraries
Dynamic-Link Libraries (DLLs)
Example of DLL
Libraries
Almost all programs are composed of
many separately compiled units.
When you write a single-file program,
it is compiled to a representation of
machine instructions called an object
file.
For example, Visual C++ creates
an .obj file from your C++ source
code. The .obj file may seem to be a
complete program, but there is much
more code required to make it
complete.
(Figure: your code plus library code makes the complete program)
Reasons for using a library
(Don't memorise the details.)
1 many functions, such as memory allocation, do not
require special privileges to perform, and they do not take
much CPU time. If these functions were invoked using a
time-consuming system call, it would have a dramatic
impact on performance. It is much faster to implement
them as simple functions.
2 these functions are language specific. OSs are language
independent, and it would greatly complicate the OS to
provide run-time support for all languages, even if that
were possible.
3 even when system calls are required, some additional
"glue" code is needed to translate between the standard
language interface, such as printf() or operator <<, and
the calling convention that is needed to set up parameters
and invoke a trap instruction.
Library in Visual C++
Example of linking
Explanation – static linking
In the above diagram, the application object file has to be linked with the run-time library (for example malloc() and the start-up code that calls main()) to form an executable (.exe) file.
(Figure: your code statically linked with the run-time library)
Dynamic-Link Libraries (DLLs)
Dynamic linking means that linking is performed on demand at run time.
An advantage of dynamic linking is that executable files can be much smaller than statically linked executables.
Of course, the executable is not complete without all of the associated library files, but if many executables share a set of libraries, there can be significant overall savings.
(Figure: your code linked against the run-time library at run time)
Advantages of DLLs (1)
(Don't memorise the details.)
In most systems, the space savings extend to memory.
When libraries are dynamically linked, the operating system can
arrange to let applications share the library code so that only one
copy of the library is loaded into memory.
With static linking, each executable is a monolithic binary program.
If several programs are using the same libraries, there will be several
copies of the code in memory.
(Figure: several running programs sharing one copy of the library at run time)
Advantages of DLLs (2)
Another potential memory savings comes from the fact
that dynamically linked libraries do not necessarily need
to be loaded. For example, an image editor may support
input and output of dozens of file formats. It could be
expensive (and unnecessary) to link conversion routines
for all of these formats.
With dynamic linking, the program can link code as it
becomes useful, saving time and memory. This can be
especially useful in programs with ever growing lists of
features.
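On POSIX systems this kind of on-demand linking can also be driven explicitly with dlopen()/dlsym(); a minimal sketch (the library name libm.so.6 and the cos symbol are assumptions for illustration):

/* Load a shared library and resolve a symbol only when it is needed.
   Build with something like:  cc demo.c -ldl  (on many Linux systems). */
#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    void *handle = dlopen("libm.so.6", RTLD_LAZY);   /* loaded on demand */
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine)
        printf("cos(0.0) = %f\n", cosine(0.0));
    dlclose(handle);
    return 0;
}

An image editor, for example, could keep each file-format converter in its own DLL and load it only when a file of that format is actually opened.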
Disadvantage of DLL
First, there are version problems. Like all
software, libraries tend to evolve. New libraries
may be incompatible with old libraries in subtle
ways. If you update the libraries used by a
program, it may have good, bad, or no effects on
the program's behavior. In contrast, a statically
linked program will never change its behavior
unless the entire program is relinked and installed.
Summary
Dynamic linking combines the program with its libraries at run time.
It reduces executable size but can cause version problems.