Notary: Hardware Techniques to Enhance Signatures

Luke Yen

Collaborator: Prof. Stark C. Draper

Advisor: Prof. Mark D. Hill

University of Wisconsin, Madison

MICRO-41 - November 11, 2008 www.cs.wisc.edu/multifacet/papers/micro08_notary.pdf

Executive Summary

Tackle 2 problems with hardware signatures:

• Problem 1: The best-performing signature hashing (i.e., H3) has high area & power overheads

• Solution 1: Use entropy analysis to guide lower-cost hashing (Page-Block-XOR, PBX) that performs similarly to H3

– Ex: 160 XOR gates for H3 vs 20 XOR gates for PBX

• Problem 2: Spurious signature conflicts caused by signature bits set by private memory addrs

• Solution 2: Avoid inserting private stack addrs into signatures, and propose a privatization interface for higher performance

University of Wisconsin-Madison

4/11/2020 2

Outline

• Signature background

• Entropy

• Entropy results & PBX

• Privatization

• Methodology & workloads

• Results

• Conclusions & Future Work


Signature background

• Signatures (hardware Bloom filters) used to summarize and detect conflicts with a transaction’s read- and write-sets

– Inspired by Bulk system [Ceze,ISCA’06]

– Implemented in LogTM-SE [Yen,HPCA’07]

– Can have false positives, but never false negatives

– Also proposed for non-TM purposes (e.g., SC violation detection, atomicity violation detection, race recording)

• Ex: An m-bit signature built from k Bloom filters of m/k bits each, each indexed by an independent hash function
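A minimal Python sketch of the signature organization described above: an m-bit signature split into k Bloom filters of m/k bits, each indexed by its own hash function. The class name and the two toy hash functions (plain address bit-fields, assuming 64-byte blocks) are hypothetical. Inserting an address sets one bit per filter, and a test reports a hit only if all k bits are set, so false positives are possible but false negatives are not.

```python
import random

class ParallelBloomSignature:
    """m-bit signature made of k Bloom filters of m/k bits, one hash function per filter."""

    def __init__(self, m_bits, hash_fns):
        self.k = len(hash_fns)
        if m_bits % self.k:
            raise ValueError("m must be divisible by k")
        self.filter_bits = m_bits // self.k
        self.hash_fns = hash_fns
        self.filters = [0] * self.k          # each filter kept as an integer bit vector

    def _index(self, i, addr):
        return self.hash_fns[i](addr) % self.filter_bits

    def insert(self, addr):
        for i in range(self.k):
            self.filters[i] |= 1 << self._index(i, addr)

    def test(self, addr):
        """True if addr *may* be in the set: false positives possible, false negatives never."""
        return all((self.filters[i] >> self._index(i, addr)) & 1 for i in range(self.k))

    def clear(self):
        self.filters = [0] * self.k


# Hypothetical hash functions: two different address bit-fields.
sig = ParallelBloomSignature(2048, [lambda a: a >> 6, lambda a: a >> 16])
rng = random.Random(42)
addrs = [rng.getrandbits(32) for _ in range(100)]
for a in addrs:
    sig.insert(a)
print(all(sig.test(a) for a in addrs))   # True: no false negatives
```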


Signature hash functions

• Which hash function is best? [Sanchez, MICRO’07]

– Bit-selection? Hash simply decodes some number of input bits

– H3? Each bit of a hash value is an XOR of (on avg.) half of the input address bits

[Figure: LogTM-SE with 2kb signatures]

• Result: H3 outperforms bit-selection with >=2 hash functions

• However, H3 uses many multi-level XOR trees

• Can we improve this?
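The two hash families compared above can be sketched in Python. This is an illustrative software model, not the hardware from the talk: `bit_selection` uses c address bits directly as the hash value, while `make_h3` builds one H3 function from c random masks (each selecting, on average, half of the address bits) and outputs the XOR (parity) of the selected bits for each hash bit. The seeds and mask generation are assumptions.

```python
import random

ADDR_BITS = 32

def bit_selection(addr, lsb, c):
    """Bit-selection: c consecutive address bits starting at `lsb` are the hash value."""
    return (addr >> lsb) & ((1 << c) - 1)

def make_h3(c, addr_bits=ADDR_BITS, seed=0):
    """Return one H3 hash function producing c-bit values.

    Each of the c random masks selects, on average, half of the address bits;
    hash bit j is the XOR (parity) of the address bits selected by mask j."""
    rng = random.Random(seed)
    masks = [rng.getrandbits(addr_bits) for _ in range(c)]

    def h3(addr):
        value = 0
        for j, mask in enumerate(masks):
            parity = bin(addr & mask).count("1") & 1   # one XOR tree per hash bit
            value |= parity << j
        return value

    return h3


h3_a = make_h3(c=10, seed=1)   # two independently seeded functions for k = 2 filters
h3_b = make_h3(c=10, seed=2)
addr = 0x12345678
print(bit_selection(addr, 6, 10), h3_a(addr), h3_b(addr))
```

Each `parity` computation corresponds to one multi-input XOR tree in hardware, which is where H3's gate cost comes from.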


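H3 implementation

• Num XOR gates:

$$\text{Num XOR} = \frac{(\text{addr length in bits}) \times c \times k}{4}$$

– c = hash-value width in bits, i.e., $\log_2(m/k)$

• Ex: 2kb signatures, k = 2, c = 10, 32-bit addr = 160 XOR gates per signature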

• Can we reduce the total gate count?


Entropy overview

• Not all address bits have equal randomness

– Ex: High-order address bits are unlikely to change if the working set is small

• Key insight: If the bits fed into a hash function are random, the resulting hash values are random

– Use entropy to measure bit randomness

• Entropy – measure of the uncertainty of a random variable x


Entropy formally defined

• Entropy $= -\sum_{i=1}^{N} p(x_i)\,\log_2 p(x_i)$

• $p(x_i)$ = the probability of the occurrence of value $x_i$

• N = number of distinct values random variable x can take on

• Entropy = amount of information required on average to describe outcome of variable x (in bits)

– Ex: it bounds the best possible lossless compression of x

• Entropy value of an n-bit field ranges from a min of 0 bits (the n-bit field has a constant value) to a max of n bits (all bit patterns in the n-bit field are equally likely); other cases fall in between
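A short sketch of the entropy formula above, computed over observed sample values (the empirical distribution). It reproduces the two end points: a constant field gives 0 bits, and a 4-bit field whose 16 patterns are equally likely gives 4 bits. The function name is hypothetical.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Shannon entropy (in bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = sum(counts.values())
    h = 0.0
    for count in counts.values():
        p = count / total
        h -= p * math.log2(p)
    return h


print(entropy_bits([0b1010] * 16))     # 0.0: constant 4-bit field
print(entropy_bits(list(range(16))))   # 4.0: all 4-bit patterns equally likely
```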

Our measures of entropy

• For our workloads, we care about:

• Q1: What is the best achievable entropy?

– Global entropy – upper bound on entropy of address

• Q2: How does entropy change within an address?

– Local entropy – entropy of bit-field within the address

[Figure: global entropy is measured over address bits 31 to 6; local entropy over a bit-window that starts NSkip bits above bit 6]


Entropy results

• Workloads to be described later

• Global entropy is at most 16 bits

• Bit-window for local entropy is 16 bits wide (NSkip from 0-10)

– Smaller windows (<16b) may not reach global entropy value

– Larger windows (>16b) hide some fine-grain info
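A sketch of how global and local entropy could be measured from an address trace, assuming 64-byte blocks (so address bits 5..0 are dropped) and the 16-bit local window described above. The trace is synthetic and hypothetical (4096 consecutive blocks), not data from the talk's workloads; it simply shows entropy falling for windows that sit over higher-order bits.

```python
import math
from collections import Counter

BLOCK_OFFSET_BITS = 6   # assumption: 64-byte blocks, so address bits 5..0 are ignored
WINDOW_BITS = 16        # local-entropy window width used in the slides

def entropy_bits(samples):
    """Shannon entropy (bits) of the empirical distribution of samples."""
    counts = Counter(samples)
    total = sum(counts.values())
    h = 0.0
    for count in counts.values():
        p = count / total
        h -= p * math.log2(p)
    return h

def global_entropy(addrs):
    """Entropy of the full block address (all bits above the block offset)."""
    return entropy_bits(a >> BLOCK_OFFSET_BITS for a in addrs)

def local_entropy(addrs, nskip):
    """Entropy of a WINDOW_BITS-wide field starting nskip bits above the block offset."""
    shift = BLOCK_OFFSET_BITS + nskip
    mask = (1 << WINDOW_BITS) - 1
    return entropy_bits((a >> shift) & mask for a in addrs)


# Hypothetical trace: 4096 consecutive 64-byte blocks starting at 0x10000000.
trace = [0x1000_0000 + 64 * i for i in range(4096)]
print(global_entropy(trace), local_entropy(trace, 0), local_entropy(trace, 10))  # 12.0 12.0 2.0
```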


Entropy results summary

• More entropy results in our MICRO paper

• In summary, for our workloads entropy monotonically decreases when moving towards high-order bits

– We calculate the average entropy across the entire workload’s execution

– May miss entropy changes due to program phase behavior

• Our Page-Block-XOR (PBX) hash takes advantage of this overall trend


Page-Block-XOR (PBX)

• Motivated by 3 findings:

– (1) Lower-order bits have most entropy

• Follows from our entropy results

– (2) XORing two bit-fields produces random hash values

• From prior work on XOR hashing (e.g., data placement in caches, DRAM)

– (3) Bit-field overlaps can lead to higher false positives

• Correlation between the two bit-fields can reduce the range of hash values produced (worse for larger signatures)


PBX implementation

• For 2kb signatures with 2 hash functions:

– 20 XOR gates for PBX vs 160 XOR gates for H3!

• The PPN and cache-index fields are not tied to system params:

– Use entropy to find two non-overlapping bit-fields with high randomness
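A sketch of a PBX-style hash and of the gate counts quoted above. The field positions (bits 6..15 XORed with bits 16..25) are hypothetical placeholders; in the talk they are chosen from entropy measurements. How the paper derives k distinct PBX functions is not shown here, so the gate count simply assumes one c-bit XOR of two fields per hash function, which matches the 20-gate figure for k = 2, c = 10.

```python
C_BITS = 10   # hash-value width for 1024-bit filters (2kb signature, k = 2)

def bit_field(addr, lsb, width):
    """Extract `width` bits of `addr` starting at bit `lsb`."""
    return (addr >> lsb) & ((1 << width) - 1)

def pbx_hash(addr, low_lsb, high_lsb, width=C_BITS):
    """XOR a low-order (cache-index-like) field with a higher-order (PPN-like) field.
    Needs one 2-input XOR gate per output bit."""
    if high_lsb < low_lsb + width:
        raise ValueError("bit-fields must not overlap")
    return bit_field(addr, low_lsb, width) ^ bit_field(addr, high_lsb, width)

def pbx_xor_gates(c, k):
    return c * k                      # one XOR per hash bit, per hash function

def h3_xor_gates(addr_bits, c, k):
    return addr_bits * c * k // 4     # H3 gate-count relation: addr length x c x k / 4


print(hex(pbx_hash(0x12345678, low_lsb=6, high_lsb=16)))       # 0x36d
print(h3_xor_gates(32, C_BITS, 2), pbx_xor_gates(C_BITS, 2))    # 160 20
```

Keeping the two fields non-overlapping follows finding (3) above: overlapping fields are correlated and shrink the range of hash values.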


Summary thus far

• Problem 1: H3 has high area & power overheads

• Solution 1: Use entropy analysis to guide lower-cost PBX

– Ex: 160 gates for H3 vs 20 gates for PBX

• Problem 2: Spurious signature conflicts caused by signature bits set by private memory addrs

• Solution 2: To be described


Motivation

• False conflicts caused by thread-private addrs

– Such conflicts are avoided if these addrs are not inserted into the thread's signatures


Privatization solutions

• Two solutions proposed:

– (1) Remove private stack references from sigs.

• Very little work for programmer/compiler

• Benefits depend on fraction of stack addresses versus all transactional references

– (2) Language-level interface (e.g., private_malloc() , shared_malloc() )

• Even higher performance boost

• For skilled programmer

• WARNING: Incorrectly marking shared objects as private can lead to program errors!


Page-based implementation

• Each page is assigned a status, private or shared

– Invariant: Page is shared if any object is shared

• If stack is private, library marks stack pages as private

• If using privatization heap functions, mark heap pages accordingly
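A minimal model of the page-status bookkeeping above, assuming 4KB pages. In this sketch, pages are shared unless explicitly marked private (e.g., a thread's stack), which is our conservative assumption, and adding any shared object to a page makes it shared, enforcing the invariant. `PageStatusTable` and its methods are hypothetical names, not the paper's interface.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumption: 4KB pages

def pages_spanned(addr, size):
    """Page numbers touched by the object [addr, addr + size)."""
    return range(addr // PAGE_SIZE, (addr + size - 1) // PAGE_SIZE + 1)

class PageStatusTable:
    """Hypothetical per-page status; pages not marked private are treated as shared."""

    def __init__(self):
        self.private_pages = set()
        self.shared_objects = Counter()   # page number -> shared objects on that page

    def mark_private(self, addr, size):
        """E.g., a library marking a thread's stack pages private."""
        for page in pages_spanned(addr, size):
            if self.shared_objects[page] == 0:   # never privatize a page holding shared data
                self.private_pages.add(page)

    def add_shared_object(self, addr, size):
        for page in pages_spanned(addr, size):
            self.shared_objects[page] += 1
            self.private_pages.discard(page)     # invariant: page is shared if any object is shared

    def is_shared(self, addr):
        return addr // PAGE_SIZE not in self.private_pages


table = PageStatusTable()
table.mark_private(0x7000_0000, 2 * PAGE_SIZE)   # two private stack pages
table.add_shared_object(0x7000_1F00, 64)          # a shared object lands on the second page
print(table.is_shared(0x7000_0010), table.is_shared(0x7000_1000))   # False True
```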


OS support

• OS allocates different physical page frames for shared and private pages

– Sets a per-frame bit in translation entry if shared

– Reduce number of page frames used by packing objects with same status together

• Signatures insert only the memory addresses of transactional references to shared pages

– Query page sharing bit in HW TLB & current transactional status
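A sketch of the insertion check described above: an address enters a signature only if the reference is transactional and the page's sharing bit says shared. The dictionary standing in for the per-frame bit cached in the TLB, the 4KB page size, and treating frames with no recorded bit as shared are all assumptions; the names are hypothetical.

```python
PAGE_SHIFT = 12   # assumption: 4KB pages

def should_insert(addr, in_transaction, shared_page_bits):
    """Insert into the R/W signature only for transactional refs to shared pages.
    shared_page_bits: hypothetical frame number -> sharing bit (stands in for the TLB)."""
    if not in_transaction:
        return False
    return shared_page_bits.get(addr >> PAGE_SHIFT, True)   # unknown frames: conservatively shared


tlb_shared_bit = {0x10: True, 0x20: False}   # hypothetical: frame 0x10 shared, 0x20 private
refs = [0x10040, 0x20080, 0x100C0]
read_sig_inserts = [a for a in refs if should_insert(a, True, tlb_shared_bit)]
print([hex(a) for a in read_sig_inserts])   # ['0x10040', '0x100c0']
```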


Methodology

• Full-system simulation using Simics and Wisconsin GEMS timing modules

• Transistor-level design for area & power of XOR gates

• CACTI for Bloom filter bit array area & power

• Simulated system

– Single-chip CMP

– 16 single-threaded, in-order cores

– 32kB, 4-way private L1 I & D, write-back

– 8MB, 8-way shared L2 cache

– MESI directory protocol

– Signatures from 64b-64kb (8B-8kB) & “Perfect”


Workloads

• Micro-benchmarks

– BTree – read and write ops on shared tree

– Sparse Matrix – algorithm from dense column vector multiplication kernel

• SPLASH-2 apps

– Barnes & Raytrace – exert most signature pressure

• Stanford STAMP apps

– Vacation, Genome, Delaunay, Bayes, Labyrinth

• DNS server

– BIND


PBX vs H3 area & power

• Area & power overheads (2kb, k=4):

| Type of overhead | Bloom filter bit array | H3 hash | PBX hash | H3 sig. | PBX sig. | % savings for PBX sig. |
|---|---|---|---|---|---|---|
| Area (mm²) | 2.70e-2 | 8.10e-3 | 4.70e-4 | 3.50e-2 | 2.70e-2 | 23 |
| Power (mW) | 1.80e2 | 1.04e1 | 1.02 | 1.90e2 | 1.81e2 | 4.7 |
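The savings column can be reproduced from the two signature-total columns, assuming savings is measured relative to the H3 signature, i.e., (H3 sig. - PBX sig.) / H3 sig. A short check using only the table's values:

```python
# Signature totals from the table (2kb, k = 4): (H3 sig., PBX sig.)
totals = {
    "area (mm^2)": (3.50e-2, 2.70e-2),
    "power (mW)": (1.90e2, 1.81e2),
}

def percent_savings(h3_sig, pbx_sig):
    """Savings of the PBX signature relative to the H3 signature, in percent."""
    return 100.0 * (h3_sig - pbx_sig) / h3_sig

for metric, (h3_sig, pbx_sig) in totals.items():
    print(f"{metric}: {percent_savings(h3_sig, pbx_sig):.1f}%")
# area (mm^2): 22.9%   (reported as 23 in the table)
# power (mW): 4.7%
```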


PBX vs H3 execution time

[Figure: execution time with PBX and H3 signatures]

PBX performs similarly to H3

Additional workload results in paper

Privatization results summary

• Removing private stack references from signatures did not help much

– Most addr references not to stack

– Most likely because we simulate the SPARC ISA; other ISAs (e.g., x86) likely benefit more

• Privatization interface helps four workloads

– The remaining workloads either do not have private heap structures or do not have a high transactional duty cycle


Privatization interface results


Conclusions

• Tackle 2 problems with signature designs:

– (1) Area and power overheads of H3 hashing

• E.g., 160 XOR gates for H3, 20 for PBX

– (2) False conflicts due to signature bits set by private memory references

• Our solutions:

– (1) Use entropy analysis to guide the hashing function (PBX), a low-cost alternative that performs similarly to H3

– (2) Prevent private stack references from entering signatures, and propose a privatization interface for heap allocations

• Notary can be applied to non-TM uses:

– PBX hashing can directly transfer

– Privatization may transfer if addr filtering applies


Future Work

• Dynamic entropy calculation:

– How to adapt PBX hashing to entropy changes over time?

• Dynamic privatization characteristics:

– How common is it for objects to change sharing status (i.e., from private to shared, and vice versa)?


BACKUP SLIDES


Privatization interface

| Privatization function | Usage |
|---|---|
| shared_malloc(size), private_malloc(size) | Dynamic allocation of shared and private memory objects |
| shared_free(ptr), private_free(ptr) | Frees memory allocated by the shared or private allocators |
| privatize_barrier(num_threads, ptr, size), publicize_barrier(num_threads, ptr, size) | Program threads come to a common point to privatize or publicize an object. Must be used outside of transactions |


Dynamic privatization

• Dynamically switch from private to shared, and vice versa

• If transitioning from private -> shared, safe to mark page as shared (at cost of performance)

• If transitioning from shared -> private, default policy is to disallow if there exists other shared objects on same page

• Otherwise, trap to user software and let programmer call shared_free(), followed by private_malloc() on object
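A sketch of the default policy above as a small state model. Each page tracks the set of shared objects on it; publicizing an object always succeeds, while privatizing one is refused (modeled as a returned "trap") when other shared objects remain on the same page. The class, method names, and string results are hypothetical.

```python
class DynamicPrivatization:
    """Hypothetical model of the default policy for changing an object's sharing status."""

    def __init__(self):
        self.shared_objs = {}   # page number -> set of shared object ids on that page

    def page_is_shared(self, page):
        return bool(self.shared_objs.get(page))

    def make_shared(self, page, obj):
        # private -> shared: always safe; the page simply becomes (or stays) shared.
        self.shared_objs.setdefault(page, set()).add(obj)
        return "ok"

    def make_private(self, page, obj):
        objs = self.shared_objs.get(page, set())
        if objs - {obj}:
            # Other shared objects live on this page: disallow and trap to user software,
            # which may call shared_free() and then private_malloc() for this object.
            return "trap"
        objs.discard(obj)
        return "ok"


policy = DynamicPrivatization()
policy.make_shared(7, "list_head")
policy.make_shared(7, "counter")
print(policy.make_private(7, "counter"))   # trap: "list_head" is still shared on page 7
policy.make_shared(9, "buffer")
print(policy.make_private(9, "buffer"), policy.page_is_shared(9))   # ok False
```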


Bit-field overlaps harmful for PBX


Removing stack refs doesn’t help significantly


Entropy of commercial workloads


Signature Operation Example

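Program: xbegin; LD A; ST B; LD C; LD D; ST C

[Figure: hash function(s) map load addresses into the read signature (R) and store addresses into the write signature (W); incoming requests checked against R and W are labeled ALIAS, NO CONFLICT, and CONFLICT!]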
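A toy version of the example above, using 8-bit bit-selection signatures so that aliasing is easy to see. The block addresses assigned to A, B, C, D and the remote requests are hypothetical, not those on the slide. The conflict rule follows the usual eager conflict-detection convention: a remote load conflicts with the write signature, and a remote store conflicts with either signature.

```python
class TxSignatures:
    """Hypothetical read/write signatures for one transaction (tiny bit-selection filters)."""

    def __init__(self, bits=8):
        self.bits = bits
        self.read_sig = 0
        self.write_sig = 0

    def _bit(self, addr):
        return 1 << (addr % self.bits)   # bit-selection: low-order block-address bits

    def load(self, addr):
        self.read_sig |= self._bit(addr)

    def store(self, addr):
        self.write_sig |= self._bit(addr)

    def conflicts_with(self, remote_addr, remote_is_store):
        """Remote load checks W; remote store checks R and W."""
        bit = self._bit(remote_addr)
        if remote_is_store:
            return bool((self.read_sig | self.write_sig) & bit)
        return bool(self.write_sig & bit)


A, B, C, D = 1, 4, 6, 2          # hypothetical block addresses
tx = TxSignatures()
tx.load(A); tx.store(B); tx.load(C); tx.load(D); tx.store(C)   # the transaction above

print(tx.conflicts_with(A, remote_is_store=False))   # False: A was only read (no conflict)
print(tx.conflicts_with(C, remote_is_store=False))   # True: C was written (conflict)
print(tx.conflicts_with(9, remote_is_store=True))    # True: 9 aliases with A (false positive)
```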


Type of Hash Functions

• In real programs, addresses are neither independent nor uniformly distributed (key assumptions used to derive $P_{FP}(n)$)

• But good (universal/almost universal) hash functions can generate hash values that are almost uniformly distributed and uncorrelated

• Hash functions considered:

– Bit-selection (inexpensive, low quality)

– H3 [Carter, CSS79] (moderate, higher quality)
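Under the independence and uniformity assumptions mentioned above, the usual false-positive estimate for k filters of m/k bits each after n insertions is $P_{FP}(n) = \left(1 - (1 - k/m)^n\right)^k$. A short sketch evaluating it; the function name is hypothetical, and real address streams can deviate from this estimate for exactly the reasons listed above.

```python
def p_false_positive(n, m_bits, k):
    """Estimated false-positive probability of an m-bit signature made of k filters
    (m/k bits each) after n insertions, assuming independent, uniformly
    distributed hash values."""
    p_bit_set = 1.0 - (1.0 - k / m_bits) ** n   # chance a given bit of one filter is set
    return p_bit_set ** k                       # a false positive needs a hit in all k filters


for n in (16, 64, 256):
    print(n, p_false_positive(n, m_bits=2048, k=4))
```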
