lesson21

Prelude to Multiprocessing
Detecting cpu and system-board
capabilities with CPUID and the
MP Configuration Table
CPUID
• Recent Intel processors provide a ‘cpuid’
instruction (opcode 0x0F, 0xA2) to assist
software in detecting a CPU’s capabilities
• If it’s implemented, this instruction can be
executed in any of the processor modes,
and at any of its four privilege levels
• But this ‘cpuid’ instruction might not be
implemented (e.g., 8086, 80286, 80386)
Intel x86 EFLAGS register
Bit:    31-22  21  20   19   18  17  16
Field:    0    ID  VIP  VIF  AC  VM  RF

Bit:    15  14  13-12  11  10   9   8   7   6   5   4   3   2   1   0
Field:   0  NT  IOPL   OF  DF  IF  TF  SF  ZF   0  AF   0  PF   1  CF
Software can ‘toggle’ the ID-bit (bit #21) in the 32-bit EFLAGS register
if the processor is capable of executing the ‘cpuid’ instruction
But what if there’s no EFLAGS?
• The early Intel processors (8086, 80286)
did not implement any 32-bit registers
• The FLAGS register was only 16 bits wide
• So there was no ID-bit that software could
try to ‘toggle’ (to see if ‘cpuid’ existed)
• How can software be sure that the 32-bit
EFLAGS register exists within the CPU?
Detecting 32-bit processors
• There’s a subtle difference in the way the
logical shift/rotate instructions work when
register CL contains the ‘shift-factor’
• On the 80286 and all later processors
(including every 32-bit CPU) the value in CL
is truncated to 5 bits, but not so on the
original 8086/8088
• Software can exploit this distinction as a
first step toward telling if EFLAGS is
implemented, but an 80286 must then be ruled
out by a further test (e.g., its FLAGS bits
12-15 read back as zero in real mode)
Detecting EFLAGS
# Here’s a test for the presence of EFLAGS
        mov     $-1, %ax        # a nonzero value
        mov     $32, %cl        # shift-factor of 32
        shl     %cl, %ax        # do logical shift
        or      %ax, %ax        # test result in AX
        jnz     is32bit         # count was masked: EFLAGS present (unless CPU is an 80286)
        jmp     is16bit         # count not masked: 8086/8088, so EFLAGS absent
Testing for ID-bit ‘toggle’
# Here’s a test for the presence of the CPUID instruction
        pushfl                  # copy EFLAGS contents
        pop     %eax            # to accumulator register
        mov     %eax, %edx      # save a duplicate image
        btc     $21, %eax       # toggle the ID-bit (bit 21)
        push    %eax            # copy revised contents
        popfl                   # back into EFLAGS
        pushfl                  # copy EFLAGS contents
        pop     %eax            # back into accumulator
        xor     %edx, %eax      # do XOR with prior value
        bt      $21, %eax       # did ID-bit get toggled?
        jc      y_cpuid         # yes, can execute ‘cpuid’
        jmp     n_cpuid         # else ‘cpuid’ unimplemented
How does CPUID work?
• Step 1: load value 0 into register EAX
• Step 2: execute ‘cpuid’ instruction
• Step 3: Verify the ‘GenuineIntel’ character-string
in registers (EBX, EDX, ECX)
• Step 4: Find maximum CPUID input-value
in the EAX register
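Step 3 above can be sketched in C. This is only an illustration: the function names are hypothetical, and the register values are passed in as arguments so the logic can be tried without executing 'cpuid' at all. Each register holds four characters of the string, lowest byte first.

```c
#include <stdint.h>
#include <string.h>

/* Unpack one 32-bit register into four characters, low byte first. */
static void unpack_reg(uint32_t reg, char *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((reg >> (8 * i)) & 0xFF);
}

/* Build the 12-character vendor string from the EBX, EDX, ECX values
   returned by 'cpuid' with EAX=0; 'buf' must hold at least 13 bytes. */
void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char buf[13])
{
    unpack_reg(ebx, buf);        /* "Genu" on an Intel processor */
    unpack_reg(edx, buf + 4);    /* "ineI" */
    unpack_reg(ecx, buf + 8);    /* "ntel" */
    buf[12] = '\0';
}

/* Nonzero if the three registers spell out 'GenuineIntel'. */
int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char buf[13];
    cpuid_vendor_string(ebx, edx, ecx, buf);
    return strcmp(buf, "GenuineIntel") == 0;
}
```

On real hardware the three values would come from the 'cpuid' instruction itself, for example through inline assembly or the `__get_cpuid()` helper in GCC's `<cpuid.h>`.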
Version and Features
• load 1 into EAX and execute CPUID
• Processor model and stepping information
is returned in register EAX
Bits 27-20: Extended Family ID
Bits 19-16: Extended Model ID
Bits 13-12: Type
Bits 11-8:  Family ID
Bits 7-4:   Model
Bits 3-0:   Stepping ID
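The EAX layout above can be unpacked with a few shifts and masks. The sketch below is illustrative only; the struct and function names are hypothetical, and it reports the raw fields without combining the extended and base values.

```c
#include <stdint.h>

/* Raw fields of the EAX value returned by 'cpuid' with EAX=1. */
struct cpu_version {
    unsigned stepping;    /* bits 3-0   */
    unsigned model;       /* bits 7-4   */
    unsigned family;      /* bits 11-8  */
    unsigned type;        /* bits 13-12 */
    unsigned ext_model;   /* bits 19-16 */
    unsigned ext_family;  /* bits 27-20 */
};

struct cpu_version decode_cpuid_eax(uint32_t eax)
{
    struct cpu_version v;
    v.stepping   =  eax        & 0xF;
    v.model      = (eax >> 4)  & 0xF;
    v.family     = (eax >> 8)  & 0xF;
    v.type       = (eax >> 12) & 0x3;
    v.ext_model  = (eax >> 16) & 0xF;
    v.ext_family = (eax >> 20) & 0xFF;
    return v;
}
```

For instance, an EAX value of 0x00000F29 decodes to family 15, model 2, stepping 9.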
Some Feature Flags in EDX
Bit:   28   13   9     3    2   1    0
Flag:  HTT  PGE  APIC  PSE  DE  VME  FPU
HTT = HyperThreading Technology (1 = yes, 0 = no)
PGE = Page Global Entries (1 = yes, 0 = no)
APIC = Advanced Programmable Interrupt Controller on-chip (1 = yes, 0 = no)
PSE = Page-Size Extensions (1 = yes, 0 = no)
DE = Debugging Extensions (1 = yes, 0 = no)
VME = Virtual-8086 Mode Enhancements (1 = yes, 0 = no)
FPU = Floating-Point Unit on-chip (1 = yes, 0 = no)
Some Feature Flags in ECX
Bit 5: VMX
VMX = Virtual Machine Extensions (1 = yes, 0 = no)
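Testing one of these feature flags comes down to a single bit-mask. A minimal sketch follows, with hypothetical macro and function names, using the bit positions listed on these two slides; the EDX and ECX values are assumed to have been obtained already from 'cpuid' with EAX=1.

```c
#include <stdint.h>

/* Feature-flag bit masks for the EDX value */
#define EDX_FPU   (1u << 0)
#define EDX_VME   (1u << 1)
#define EDX_DE    (1u << 2)
#define EDX_PSE   (1u << 3)
#define EDX_APIC  (1u << 9)
#define EDX_PGE   (1u << 13)
#define EDX_HTT   (1u << 28)

/* Feature-flag bit mask for the ECX value */
#define ECX_VMX   (1u << 5)

/* Nonzero if every bit of 'mask' is set in 'reg'. */
int has_feature(uint32_t reg, uint32_t mask)
{
    return (reg & mask) == mask;
}
```

A program would call, say, `has_feature(edx, EDX_HTT)` to learn whether HyperThreading is reported.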
Multiprocessor Specification
• It’s an industry standard, allowing OS software
to use multiple processors in a uniform way
• OS software searches in three regions of the
physical address-space below 1-megabyte for a
“paragraph-aligned” data-structure of length 16 bytes
called the MP Floating Pointer Structure:
– Search in lowest KB of Extended Bios Data Area
– Search in topmost KB of conventional 640K RAM
– Search in the 128KB ROM-BIOS (0xE0000-0xFFFFF)
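The search itself can be sketched as a scan over 16-byte paragraphs. Two details here come from the MP Specification rather than from this slide: the structure begins with the signature "_MP_", and its 16 bytes sum to zero (modulo 256). The function name is hypothetical, and 'region' is assumed to be a copy of one of the three memory areas, starting at a paragraph-aligned address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the byte offset of the first valid MP Floating Pointer
   Structure within 'region', or -1 if none is found. */
long find_mp_floating_pointer(const uint8_t *region, size_t len)
{
    for (size_t off = 0; off + 16 <= len; off += 16) {
        const uint8_t *p = region + off;

        if (memcmp(p, "_MP_", 4) != 0)
            continue;                   /* signature not present here */

        uint8_t sum = 0;                /* checksum over all 16 bytes */
        for (int i = 0; i < 16; i++)
            sum = (uint8_t)(sum + p[i]);

        if (sum == 0)
            return (long)off;           /* signature and checksum both ok */
    }
    return -1;
}
```

The same function would be called once per region, stopping at the first region that yields a match.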
MP Floating Pointer Structure
• This structure may contain an ID-number
for one of a small number of standard SMP
system architectures, or may contain the
memory address for a more extensive MP
Configuration Table having entries that
specify a “customized” system architecture
• The machines in our classroom employ
the latter of these two options
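Telling the two options apart might look like the sketch below. The byte offsets are taken from the MP Specification, not from this slide: bytes 4-7 hold the table's physical address and byte 11 holds the default-configuration ID-number (zero when a Configuration Table is supplied). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* The two fields of interest in the 16-byte floating pointer structure. */
struct mp_float_info {
    uint32_t config_table_addr;  /* bytes 4-7, little-endian      */
    uint8_t  default_config;     /* byte 11: 0 means "use table"  */
};

struct mp_float_info parse_mp_floating_pointer(const uint8_t fp[16])
{
    struct mp_float_info info;
    info.config_table_addr = (uint32_t)fp[4]
                           | (uint32_t)fp[5] << 8
                           | (uint32_t)fp[6] << 16
                           | (uint32_t)fp[7] << 24;
    info.default_config = fp[11];
    return info;
}

/* Nonzero if the system describes itself with an MP Configuration Table. */
int mp_has_config_table(const struct mp_float_info *info)
{
    return info->default_config == 0 && info->config_table_addr != 0;
}
```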
An example record
• The MP Configuration Table will contain
a record for each logical processor
Bytes 19-16: reserved (=0)
Bytes 15-12: reserved (=0)
Bytes 11-8:  Feature Flags
Bytes 7-4:   CPU signature (stepping, model, family)
Byte 3:      CPU Flags: BP (bit 1), EN (bit 0)
Byte 2:      Local-APIC version
Byte 1:      Local-APIC ID
Byte 0:      Entry Type (=0)
BP = Bootstrap Processor (1=yes, 0=no), EN = Enabled (1=yes, 0=no)
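A 20-byte processor record like this one could be decoded as follows. This is a sketch with hypothetical names; it assumes the record has already been copied out of the MP Configuration Table into a byte array and that multi-byte fields are little-endian.

```c
#include <stdint.h>

struct mp_cpu_entry {
    uint8_t  apic_id;        /* byte 1 */
    uint8_t  apic_version;   /* byte 2 */
    int      enabled;        /* CPU Flags bit 0 (EN) */
    int      bootstrap;      /* CPU Flags bit 1 (BP) */
    uint32_t signature;      /* bytes 4-7  */
    uint32_t features;       /* bytes 8-11 */
};

/* Read a little-endian 32-bit value from four consecutive bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 1 and fills 'out' if 'e' is a processor record (Entry Type 0),
   otherwise returns 0 and leaves 'out' untouched. */
int parse_mp_cpu_entry(const uint8_t e[20], struct mp_cpu_entry *out)
{
    if (e[0] != 0)
        return 0;
    out->apic_id      = e[1];
    out->apic_version = e[2];
    out->enabled      = (e[3] >> 0) & 1;
    out->bootstrap    = (e[3] >> 1) & 1;
    out->signature    = read_le32(e + 4);
    out->features     = read_le32(e + 8);
    return 1;
}
```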
Our ‘mpinfo.cpp’ utility
• We created a Linux utility that will display
the system-information contained in the
MP Configuration Table (in hex format)
• You can refer to the ‘MP Specification 1.4’
document (online) to interpret this display
• This utility needs a device-driver ‘dram.c’
to be pre-installed (in order that it be able
to directly access the system’s memory)
A processor’s Local-APIC
• The purpose of each processor’s APIC is to
allow the CPUs in a multiprocessor system to
send messages to one another and to manage
the delivery of the interrupt-requests from the
various peripheral devices to one (or more) of
the CPUs in a dynamically programmable way
• Each processor’s Local-APIC has a variety of
registers, all ‘memory mapped’ to paragraph-aligned
addresses within the 4KB page at
physical-address 0xFEE00000
Local-APIC’s register-space
[Figure: the 4GB physical address-space, with RAM starting at 0x00000000 and the Local-APIC’s registers at 0xFEE00000]
Analogies with the PIC
• Among the registers in a Local-APIC are
these (which had analogues in the older
8259 PIC’s design):
– IRR: Interrupt Request Register (256-bits)
– ISR: In-Service Register (256-bits)
– TMR: Trigger-Mode Register (256-bits)
• For each of these, its 256-bits are divided
among eight 32-bit register addresses
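To find the bit for a given interrupt-ID within one of these 256-bit registers, divide the ID by 32 to pick one of the eight 32-bit registers and keep the remainder as the bit number. The sketch below assumes base offsets that are not stated on this slide (ISR at 0x100, TMR at 0x180, IRR at 0x200, as documented in Intel's manuals) and a spacing of 16 bytes between successive registers; the names are hypothetical.

```c
#include <stdint.h>

/* Offsets within the Local-APIC page (assumed from Intel's manuals) */
#define APIC_ISR_BASE  0x100u
#define APIC_TMR_BASE  0x180u
#define APIC_IRR_BASE  0x200u

struct apic_bit {
    uint32_t offset;   /* register offset from 0xFEE00000 */
    unsigned bit;      /* bit number within that register */
};

/* Locate the bit that represents 'vector' in a 256-bit register
   whose first 32-bit piece lies at offset 'base'. */
struct apic_bit apic_locate_vector(uint32_t base, uint8_t vector)
{
    struct apic_bit loc;
    loc.offset = base + (vector / 32u) * 0x10u;  /* paragraph-aligned pieces */
    loc.bit    = vector % 32u;
    return loc;
}
```

For example, interrupt-ID 0x20 in the ISR lands at offset 0x110, bit 0.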
New way to do ‘EOI’
• Instead of using a special End-Of-Interrupt
command-byte, the Local-APIC contains a
dedicated ‘write-only’ register (named the
EOI Register) which an Interrupt Handler
writes to when it is ready to signal an EOI
# issuing EOI to the Local-APIC
        mov     $0xFEE00000, %ebx       # address of the cpu’s Local-APIC
        movl    $0, %fs:0xB0(%ebx)      # write any value into EOI register
# Here we assume segment-register FS holds the selector for a segment-descriptor
# for a ‘writable’ 4GB-size expand-up data-segment whose base-address equals 0
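The same EOI write can be sketched in C, where a 'volatile' pointer plays the role of the FS-relative addressing used above. The helper names are hypothetical, and the pointer passed in is assumed to reach the Local-APIC's page (0xFEE00000 when physical memory is identity-mapped); an ordinary array can stand in for that page when trying the code out.

```c
#include <stdint.h>

#define APIC_EOI  0xB0u    /* EOI register's offset within the APIC page */

/* Store a 32-bit value into the register at the given byte offset. */
static inline void apic_reg_write(volatile uint32_t *apic,
                                  uint32_t offset, uint32_t value)
{
    apic[offset / 4] = value;
}

/* Signal End-Of-Interrupt to the Local-APIC. */
void apic_send_eoi(volatile uint32_t *apic)
{
    apic_reg_write(apic, APIC_EOI, 0);
}
```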
Each CPU has its own timer!
• Four of the Local-APIC registers are used
to implement a programmable timer
• It can privately deliver a periodic interrupt
(or one-shot interrupt) just to its own CPU
– 0xFEE00320: Timer Vector register
– 0xFEE00380: Initial Count register
– 0xFEE00390: Current Count register
– 0xFEE003E0: Divider Configuration register
Timer’s Local Vector Table
Register address: 0xFEE00320
Bit 17:   MODE (0=one-shot, 1=periodic)
Bit 16:   MASK (0=unmasked, 1=masked)
Bit 12:   BUSY (0=not busy, 1=busy)
Bits 7-0: Interrupt ID-number
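Composing a value for this register is a matter of OR-ing the fields together. A small sketch, with a hypothetical function name; the BUSY bit is left out because it is a status bit that software does not set.

```c
#include <stdint.h>

#define LVT_MASKED    (1u << 16)   /* MASK bit */
#define LVT_PERIODIC  (1u << 17)   /* MODE bit */

/* Build a Timer LVT value from an interrupt ID-number and two options. */
uint32_t make_timer_lvt(uint8_t vector, int periodic, int masked)
{
    uint32_t lvt = vector;          /* bits 7-0 */
    if (masked)
        lvt |= LVT_MASKED;
    if (periodic)
        lvt |= LVT_PERIODIC;
    return lvt;
}
```

For example, an unmasked periodic timer on interrupt-ID 0x20 gives the value 0x00020020.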
Timer’s ‘Divide-Configuration’
Register address: 0xFEE003E0
Bits 31-4 are reserved (=0), and bit 2 is 0
Divider-Value field (bits 3, 1, and 0):
  000 = divide by 2
  001 = divide by 4
  010 = divide by 8
  011 = divide by 16
  100 = divide by 32
  101 = divide by 64
  110 = divide by 128
  111 = divide by 1
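Because the three-bit Divider-Value is split across bits 3, 1, and 0, its top bit has to be moved past bit 2 when forming the register value. The sketch below (hypothetical function name) maps a divisor to the value to be written.

```c
/* Return the Divide-Configuration register value for a divisor of
   1, 2, 4, 8, 16, 32, 64, or 128; return -1 for any other divisor. */
int apic_divide_config(unsigned divisor)
{
    unsigned code;                  /* the three-bit Divider-Value */

    switch (divisor) {
    case 2:   code = 0; break;
    case 4:   code = 1; break;
    case 8:   code = 2; break;
    case 16:  code = 3; break;
    case 32:  code = 4; break;
    case 64:  code = 5; break;
    case 128: code = 6; break;
    case 1:   code = 7; break;
    default:  return -1;
    }
    /* top bit of the code goes to bit 3; low two bits stay in bits 1-0 */
    return (int)(((code & 4u) << 1) | (code & 3u));
}
```

So divide-by-16 is written as 0x3, divide-by-128 as 0xA, and divide-by-1 as 0xB.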
Initial and Current Counts
0xFEE00380: Initial Count Register (read/write)
0xFEE00390: Current Count Register (read-only)
When the timer is programmed for ‘periodic’ mode, the Current Count is
automatically reloaded from the Initial Count register, then counts down
at the CPU’s bus-clock rate divided by the Divide-Configuration value,
generating an interrupt when it reaches zero
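The resulting interrupt rate in ‘periodic’ mode is roughly the bus-clock frequency divided by the product of the divisor and the Initial Count. A sketch of that arithmetic follows; the function name is hypothetical, and the bus-clock frequency is an input that must be known or measured for the particular machine.

```c
#include <stdint.h>

/* Approximate number of timer interrupts per second in periodic mode.
   Returns 0 if the divisor or the initial count is zero. */
uint64_t apic_timer_rate(uint64_t bus_hz, unsigned divisor,
                         uint32_t initial_count)
{
    if (divisor == 0 || initial_count == 0)
        return 0;
    return bus_hz / ((uint64_t)divisor * initial_count);
}
```

With an assumed 100 MHz bus clock, a divisor of 16 and an Initial Count of 62500 would give about 100 interrupts per second; doubling the divisor to 32 halves that rate.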
Using the timer’s interrupts
• Setup your desired Initial Count value
• Select your desired Divide Configuration
• Setup the APIC-timer’s LVT register with
your desired interrupt-ID number and
counting mode (‘periodic’ or ‘one-shot’),
and clear the LVT register’s ‘Mask’ bit to
initiate the automatic countdown operation
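The three steps above might be coded as shown below. This is a sketch under stated assumptions, not the course's own demo code: the function name is hypothetical, 'apic' is assumed to point at the Local-APIC's page (an ordinary array can stand in for it during testing), and 'divide_cfg' is the already-encoded Divide-Configuration value.

```c
#include <stdint.h>

#define APIC_LVT_TIMER    0x320u   /* Timer Vector register          */
#define APIC_INIT_COUNT   0x380u   /* Initial Count register         */
#define APIC_DIVIDE_CFG   0x3E0u   /* Divider Configuration register */

/* Program the Local-APIC timer in the order given on the slide. */
void apic_timer_start(volatile uint32_t *apic, uint8_t vector,
                      int periodic, uint32_t divide_cfg,
                      uint32_t initial_count)
{
    /* step 1: set up the desired Initial Count */
    apic[APIC_INIT_COUNT / 4] = initial_count;

    /* step 2: select the desired Divide Configuration */
    apic[APIC_DIVIDE_CFG / 4] = divide_cfg;

    /* step 3: interrupt-ID in bits 7-0, MODE in bit 17, MASK (bit 16) clear */
    apic[APIC_LVT_TIMER / 4] = (periodic ? (1u << 17) : 0u) | vector;
}
```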
In-class exercise #1
• Run the ‘cpuid.cpp’ Linux application (on
our course website) to see if the CPUs in
our classroom implement HyperThreading
(i.e., multiple logical processors in a cpu)
• Then run the ‘mpinfo.cpp’ application, to
see if the MP Base Configuration Table
has entries for more than one processor
• If both results hold true, then we can write
our own multiprocessing software in H235!
In-class exercise #2
• Run the ‘apictick.s’ demo (on our CS 630
website) to observe the APIC’s ‘periodic’
interrupt-handler drawing ‘T’s onscreen
• It executes for ten milliseconds (the 8254
is used here to create that timed delay)
• Try reprogramming the APIC’s Divider
Configuration register, to cut the interrupt
frequency in half (or perhaps to double it)