NUMA aware heap memory manager

advertisement
How to use our resources wisely in multi-thread and
multi-proccessor systems
Michael Shteinbok
Shai Ben Nun
Supervisor: Dmitri Perelman
What’s the problem?
 While increasing number of cores/processors seems to
increase the performance in proportion, it usually
increases in lower numbers and sometimes even slows
us down.
Ok, But Why?
 Some physical issues can still cause a bottleneck. For
example Memory Access and Management.
 UMA – Uniformed Memory Access
Solution - NUMA



Non Uniform Memory Access - each CPU has it’s own close
memory with quick access.
One processor can access the other memory units but with
slower BW.
The NUMA API lets the programmer to manage the memory.
How One Manages The Memory?
 When we allocate and free, we are blind to the process
in the background. For this need we have different lib
function. (glibc, TCMalloc, TCMalloc NUMA Aware)
 The article refers to this management problem and
offers the TCMalloc NUMA Aware heap manager
Performance Change
 The Previous solutions(more BW, more local access):
Project Goals
 Improving the current TCMalloc numa aware in
scenarios it losses performance
 Managing wisely the memory of different cores and
offer a new “read” function that will be faster (TBD)
Problem In Current Solution
 When thread frees memory that was allocated by another
thread (on another numa node) we can get performance loss
Thread A
Thread A
X = malloc
X = Allocate
Thread B
Thread A
Free pool
Free (X)
Free (X)
Our Benchmark
 Each couple of threads on a different numa node.
 8 quad-cores processors => Total 16 couples.
Allocator
Thread
Alloc
Queue
Alloc Rand
List (5000)
First-touch
policy
Access
Memory
Free
Thread
Achieving Project Goals
Step By Step
 Learn how does the TCMalloc work
 Find scenarios that makes current TCM to loss
performance.
 Sort the scenarios by most likely to occur in real
environment
 Implement a support in these scenarios
 Make new TCMalloc to get better performance!
Download