apandya, cgopal, mdharkar

“SYNERGY” – A SHARED RESOURCE AWARE PRIORITIZATION ALGORITHM FOR CHIP MULTIPROCESSORS
Problem Statement:
Recent multicore processors employ a separate request prioritization policy at each shared resource; these policies do not account for the interdependence between the shared resources involved in servicing a single request. We believe that a holistic policy incorporating knowledge acquired from each resource (core, cache, interconnect, memory controller) could improve system throughput and ensure fairness in resource allocation.
Related Work:
Recently proposed algorithms focus on defining a prioritization policy at a single shared resource. Most recent scheduling algorithms, such as ATLAS [2] and PAR-BS [3], are implemented at the memory controller and employ out-of-order scheduling to maximize row access locality and bank-level parallelism.
The prioritization policy proposed by Das et al. [4] is application aware, distinguishing applications based on the stall-time criticality of their packets.
Yuan et al. [1] reorder the memory request stream at the interconnect, in turn justifying the use of a simple, in-order DRAM scheduler. They consider row access locality when prioritizing memory access requests at the interconnect, without considering interconnect latencies.
Kim et al. [2] make the prioritization decision based on a thread ranking given by least attained service to maximize system throughput, together with a long time quantum to provide scalability. Mutlu et al. [3] propose exploiting the bank-level parallelism of memory accesses while providing fairness to individual threads by reducing the memory-related stall time experienced by each thread.
While these policies have their own benefits, we believe that coordinating the decisions made at each shared resource would enhance overall system performance.
Resource            Prioritization parameter
------------------  ------------------------------------------------
Router              1) Amount of deflection in the network
                    2) Turnaround time of the packets (using
                       information received from the core)
Memory controller   Row buffer locality (using information
                    received from the router)
How to solve the problem:
We will use a mesh network as our base topology to incorporate shared-resource awareness. While deciding priorities at the NoC and at the memory controller, we iteratively converge to a single decision for prioritizing each memory request. The metrics we take into account at the interconnect (NoC) are the deflection count of a packet and the turnaround time of the previous packet from that particular core. Since a mesh network has certain congestion points, the deflection that each packet has suffered in the network gives us a reasonable prioritization metric. In case of a conflict with another request, the router looks at the turnaround time passed on by the requesting core; this turnaround time is calculated over a window of multiple instructions.
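As a rough illustration of this router-side decision, a minimal sketch follows. The weights, field names, and tie-breaking rule are our own illustrative assumptions, not measured or finalized values:

```python
from dataclasses import dataclass

# Illustrative weights; in practice these would be tuned experimentally.
W_DEFLECT = 1.0      # weight on deflection count (hypothetical)
W_TURNAROUND = 0.5   # weight on core-reported turnaround time (hypothetical)

@dataclass
class Packet:
    deflections: int        # times this packet was deflected in the mesh
    core_turnaround: float  # turnaround time over a window of instructions,
                            # passed on by the requesting core

def router_priority(pkt: Packet) -> float:
    """Higher score = serviced first at the router."""
    return W_DEFLECT * pkt.deflections + W_TURNAROUND * pkt.core_turnaround

def arbitrate(packets):
    """On a port conflict, forward the packet with the highest score;
    remaining ties are broken by the core-reported turnaround time."""
    return max(packets, key=lambda p: (router_priority(p), p.core_turnaround))
```

Here, a packet that has been deflected many times or whose core reports a long recent turnaround time wins arbitration, matching the two metrics described above.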
When a node sends a packet intended for a particular destination, it uses a single virtual channel, thus reserving specific channels for each memory controller. This static virtual channel allocation (SVCA) helps in reordering packets based on the calculated priorities.
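The static allocation can be sketched as a fixed mapping from memory controller to virtual channel. The channel count and the modular mapping here are illustrative assumptions:

```python
NUM_VCS = 4  # virtual channels per router port (assumed)

def vc_for_controller(mc_id: int) -> int:
    """Static virtual channel allocation (SVCA): a packet bound for
    memory controller mc_id always travels on the same virtual channel,
    so packets destined for different controllers never block each other
    and can be reordered independently by priority."""
    return mc_id % NUM_VCS
```

Because the mapping is static, every packet for a given controller lands in the same channel, which is what makes in-channel priority reordering safe.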
On reaching the memory controller, the controller considers the priorities sent by the router and incorporates row buffer locality to further improve memory access.
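A minimal sketch of this two-level decision at the memory controller follows; the field names and the lexicographic ordering (row hits first, then router priority) are our assumptions about how the combination could work:

```python
from typing import NamedTuple

class Request(NamedTuple):
    row: int                # DRAM row this request targets
    router_priority: float  # priority score forwarded by the router

def mc_pick(requests, open_row: int) -> Request:
    """Prefer row-buffer hits (requests to the currently open row);
    among requests in the same class, honor the router's priority."""
    return max(requests, key=lambda r: (r.row == open_row, r.router_priority))
```

This preserves row access locality while still letting the interconnect-level priorities break ties within each class.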
To further optimize the performance of the prioritization policy, we propose to look at prefetching in NoCs and data migration in the shared cache.
Description of experimental setup:
We will use a mesh topology for the simulations. We intend to use the BLESS simulator to implement the above-mentioned prioritization algorithm. The benchmark suite we intend to use is SPEC CPU 2000.
Brief research plan:
The goal is to come up with a holistic scheduling algorithm that exploits the knowledge of each resource.
1) Milestone 1: Set up the simulation environment required for our experiments and implement the baseline on this environment.
2) Milestone 2: Implement the router prioritization policy considering deflection and turnaround time, and incorporate row buffer locality as a priority metric in the memory controller scheduler.
3) Milestone 3: Evaluate the performance of our algorithm, possibly defining new metrics for the priority decisions. If needed, we will look at prefetching in the NoCs and data migration to optimize the effectiveness of our algorithm.
References:
[1] Yuan, Bakhoda, Aamodt. "Complexity Effective Memory Access Scheduling for Many-core Accelerator Architectures." MICRO 2009.
[2] Kim, Han, Mutlu, Harchol-Balter. "ATLAS: A Scalable and High Performance Scheduling Algorithm for Multiple Memory Controllers." MICRO 2009.
[3] Mutlu, Moscibroda. "Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems." ISCA 2008.
[4] Das, Mutlu, Moscibroda, Das. "Application-Aware Prioritization Mechanisms for On-Chip Networks." MICRO 2009.