Prelude to Multiprocessing
Detecting CPU and system-board
capabilities with CPUID and the
MP Configuration Table
CPUID
• Recent Intel processors provide a ‘cpuid’
instruction (opcode 0x0F, 0xA2) to assist
software in detecting a CPU’s capabilities
• If it’s implemented, this instruction can be
executed in any of the processor modes,
and at any privilege level
• But it may not be implemented (e.g., 8086,
80286, 80386)
Pentium EFLAGS register
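bits 31-22: 0 (reserved)
bit 21: ID    bit 20: VIP   bit 19: VIF   bit 18: AC    bit 17: VM    bit 16: RF
bit 15: 0     bit 14: NT    bits 13-12: IOPL    bit 11: OF    bit 10: DF
bit 9: IF     bit 8: TF     bit 7: SF     bit 6: ZF     bit 5: 0
bit 4: AF     bit 3: 0      bit 2: PF     bit 1: 1      bit 0: CF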
Software can ‘toggle’ the ID-bit (bit #21) in the 32-bit EFLAGS register
if the processor is capable of executing the ‘cpuid’ instruction
But what if there’s no EFLAGS?
• The early Intel processors (8086, 80286)
did not implement 32-bit registers
• The FLAGS register was only 16 bits wide
• So there was no ID-bit that software could
try to ‘toggle’
• How can software be sure that the 32-bit
EFLAGS register exists within the CPU?
Detecting 32-bit processors
• There’s a subtle difference in the way the
logical shift/rotate instructions work when
register CL contains the shift-factor
• On the 80286 and later CPUs (so on every
32-bit processor) the value in CL is truncated
to 5 bits, but not so on the 8086/8088
• Software can exploit this distinction to rule
out an 8086, but an 80286 passes the same
test, so a further check is needed before
relying on EFLAGS being implemented
Detecting EFLAGS
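# Here’s a test for the presence of EFLAGS
# (caution: the 80286 also truncates CL to 5 bits,
# so this test only rules out the 8086/8088)
        mov     $-1, %ax        # a nonzero value
        mov     $32, %cl        # shift-factor of 32
        shl     %cl, %ax        # do logical shift
        or      %ax, %ax        # test result in AX
        jnz     is32bit         # shift-factor was truncated
        jmp     is16bit         # shift-factor was not truncated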
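The effect of the shift-factor truncation can be modelled in C. This is only a sketch: the function ‘shl16’ and its ‘masks_count’ parameter are hypothetical names, and it assumes (following Intel's architecture-compatibility notes) that the 8086 applies the full count while later processors first reduce it to 5 bits.

```c
#include <stdint.h>

/* Model of a 16-bit logical left shift by a count taken from CL.
   masks_count = 0 models a CPU that uses the full count;
   masks_count = 1 models a CPU that first truncates it to 5 bits. */
uint16_t shl16(uint16_t value, unsigned count, int masks_count)
{
    if (masks_count)
        count &= 0x1F;          /* keep only the low 5 bits */
    if (count >= 16)
        return 0;               /* every bit is shifted out of AX */
    return (uint16_t)((uint32_t)value << count);
}
```

With a shift-factor of 32 the truncated count is zero, so AX keeps its nonzero value. Without truncation all sixteen bits are shifted out and AX becomes zero.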
Testing for ID-bit ‘toggle’
# Here’s a test for the presence of the CPUID instruction
        pushfl                  # copy EFLAGS contents
        pop     %eax            # to accumulator register
        mov     %eax, %edx      # save a duplicate image
        btc     $21, %eax       # toggle the ID-bit (bit 21)
        push    %eax            # copy revised contents
        popfl                   # back into EFLAGS
        pushfl                  # copy EFLAGS contents
        pop     %eax            # back into accumulator
        xor     %edx, %eax      # do XOR with prior value
        bt      $21, %eax       # did ID-bit get toggled?
        jc      y_cpuid         # yes, can execute ‘cpuid’
        jmp     n_cpuid         # else ‘cpuid’ unimplemented
How does CPUID work?
• Step 1: load value 0 into register EAX
• Step 2: execute ‘cpuid’ instruction
• Step 3: Verify ‘GenuineIntel’ character-string in registers (EBX,EDX,ECX)
• Step 4: Find maximum CPUID input-value
in the EAX register
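Step 3 can be sketched in C as a function that reassembles the twelve characters from the three register values, taken in the order EBX, EDX, ECX with the lowest byte of each register first. The name ‘cpuid_vendor’ is hypothetical, and the register values are passed in as arguments rather than read from the CPU, so the sketch does not depend on any particular compiler's way of executing ‘cpuid’.

```c
#include <stdint.h>

/* Build the 12-character vendor string from the values that
   'cpuid' leaves in EBX, EDX and ECX when EAX was 0 on entry. */
void cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    uint32_t regs[3] = { ebx, edx, ecx };
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 4; j++)
            out[4 * i + j] = (char)((regs[i] >> (8 * j)) & 0xFF);
    out[12] = '\0';
}
```

On an Intel processor the three values are 0x756E6547 (‘Genu’), 0x49656E69 (‘ineI’) and 0x6C65746E (‘ntel’).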
Version and Features
• load 1 into EAX and execute CPUID
• Processor model and stepping information
is returned in register EAX
bits 27-20: Extended Family ID
bits 19-16: Extended Model ID
bits 13-12: Type
bits 11-8: Family ID
bits 7-4: Model
bits 3-0: Stepping ID
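The fields of EAX can be separated with shifts and masks. The sketch below assumes the bit positions listed on this slide. The names ‘cpu_version’ and ‘decode_version’ are hypothetical, and the value 0x000306A9 mentioned afterward is just a sample input.

```c
#include <stdint.h>

struct cpu_version {
    unsigned stepping;      /* bits 3-0   */
    unsigned model;         /* bits 7-4   */
    unsigned family;        /* bits 11-8  */
    unsigned type;          /* bits 13-12 */
    unsigned ext_model;     /* bits 19-16 */
    unsigned ext_family;    /* bits 27-20 */
};

/* Split the EAX value returned by 'cpuid' (EAX = 1 on entry). */
struct cpu_version decode_version(uint32_t eax)
{
    struct cpu_version v;
    v.stepping   =  eax        & 0xF;
    v.model      = (eax >> 4)  & 0xF;
    v.family     = (eax >> 8)  & 0xF;
    v.type       = (eax >> 12) & 0x3;
    v.ext_model  = (eax >> 16) & 0xF;
    v.ext_family = (eax >> 20) & 0xFF;
    return v;
}
```

For the sample input 0x000306A9 this yields stepping 9, model 0xA, family 6, type 0, extended model 3 and extended family 0.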
Some Feature Flags in EDX
HTT (bit 28) = HyperThreading Technology (1 = yes, 0 = no)
APIC (bit 9) = Advanced Programmable Interrupt Controller on-chip (1 = yes, 0 = no)
PSE (bit 3) = Page-Size Extensions (1 = yes, 0 = no)
DE (bit 2) = Debugging Extensions (1 = yes, 0 = no)
VME (bit 1) = Virtual-8086 Mode Enhancements (1 = yes, 0 = no)
FPU (bit 0) = Floating-Point Unit on-chip (1 = yes, 0 = no)
Some Feature Flags in ECX
VMX (bit 5) = Virtual Machine Extensions (1 = yes, 0 = no)
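Testing a feature flag comes down to isolating one bit of EDX or ECX. A minimal sketch follows, using the bit numbers from these two slides. The constant names and ‘feature_bit’ are hypothetical, and the register value is passed in as an argument.

```c
#include <stdint.h>

/* Bit numbers of some feature flags returned by 'cpuid' with EAX = 1 */
enum { EDX_FPU = 0, EDX_VME = 1, EDX_DE = 2, EDX_PSE = 3,
       EDX_APIC = 9, EDX_HTT = 28 };
enum { ECX_VMX = 5 };

/* Returns 1 if the given bit of the register value is set, else 0. */
int feature_bit(uint32_t reg, unsigned bit)
{
    return (int)((reg >> bit) & 1u);
}
```

For example, ‘feature_bit(edx, EDX_HTT)’ answers the HyperThreading question, where ‘edx’ holds the value returned in EDX.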
Multiprocessor Specification
• It’s an industry standard, allowing OS software
to use multiple processors in a uniform way
• Software searches in three regions of the
physical address-space below 1-megabyte for a
“paragraph-aligned” data-structure of length
16 bytes called the MP Floating Pointer Structure:
– Search in lowest KB of Extended BIOS Data Area
– Search in topmost KB of conventional 640K RAM
– Search in the 64KB ROM-BIOS (0xF0000-0xFFFFF)
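The search over one region can be sketched as a scan of every 16-byte boundary. Two details here come from the Multiprocessor Specification rather than from this slide: the structure begins with the four characters "_MP_", and its 16 bytes sum to zero modulo 256. The function name is hypothetical, and the region is passed in as an ordinary buffer so the logic can be tried without access to physical memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan a copy of one memory region for the MP Floating Pointer
   Structure.  Returns its offset within the region, or -1. */
long find_mp_floating_pointer(const uint8_t *region, size_t len)
{
    for (size_t off = 0; off + 16 <= len; off += 16) {
        const uint8_t *p = region + off;
        if (memcmp(p, "_MP_", 4) != 0)
            continue;               /* signature not here */
        uint8_t sum = 0;
        for (int i = 0; i < 16; i++)
            sum = (uint8_t)(sum + p[i]);
        if (sum == 0)
            return (long)off;       /* signature and checksum agree */
    }
    return -1;
}
```

The same function would be applied in turn to each of the three regions named above.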
MP Floating Pointer Structure
• This structure may contain an ID-number
for one of a small number of standard SMP
system architectures, or may contain the
memory address for a more extensive MP
Configuration Table whose entries specify
a “more customized” system architecture
• Our classroom machines employ the latter
of these two options
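Telling the two options apart might look like the sketch below. The byte offsets are taken from the Multiprocessor Specification, not from this slide: bytes 4-7 hold the table's physical address, and byte 11 holds the ID-number of a standard architecture, or zero when a configuration table is used instead. The function name is hypothetical.

```c
#include <stdint.h>

/* Interpret the 16 bytes of an MP Floating Pointer Structure.
   Returns the ID-number of a standard architecture, or 0 when the
   structure points at an MP Configuration Table; in that case the
   table's physical address is stored in *table_addr. */
unsigned mp_system_type(const uint8_t fps[16], uint32_t *table_addr)
{
    unsigned id = fps[11];
    if (id == 0)
        *table_addr = (uint32_t)fps[4]
                    | ((uint32_t)fps[5] << 8)
                    | ((uint32_t)fps[6] << 16)
                    | ((uint32_t)fps[7] << 24);
    return id;
}
```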
The processor’s Local-APIC
• The purpose of each processor’s APIC is to
allow CPUs in a multiprocessor system to
transmit messages among one another and to
manage the delivery of interrupts from the
various peripheral devices to one or more CPUs
in a dynamically determined way
• The Local-APIC has a variety of registers which
are ‘memory mapped’ to paragraph-aligned
addresses in the 4KB page at 0xFEE00000
Local-APIC’s register-space
[Figure: the 4GB physical address-space, with RAM starting at 0x00000000 and the APIC registers at 0xFEE00000]
Each CPU has its own timer!
• Four of the Local-APIC registers are used
to implement a programmable timer
• It can privately deliver a periodic interrupt
just to its own CPU
– 0xFEE00320: Timer Vector register
– 0xFEE00380: Initial Count register
– 0xFEE00390: Current Count register
– 0xFEE003E0: Divider Configuration register
Timer’s Local Vector Table
0xFEE00320
bit 17: MODE (0=one-shot, 1=periodic)
bit 16: MASK (0=unmasked, 1=masked)
bit 12: BUSY (0=not busy, 1=busy)
bits 7-0: Interrupt ID-number
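A value for the Timer Vector register can be composed from the fields above. The sketch below also names the four timer registers by their offsets within the Local-APIC page. All macro and function names are hypothetical. The BUSY bit is left out because it reports status and is not something the sketch tries to set.

```c
#include <stdint.h>

#define LAPIC_BASE        0xFEE00000u   /* start of the Local-APIC page */
#define LAPIC_TIMER_LVT   0x320u        /* Timer Vector register        */
#define LAPIC_INIT_COUNT  0x380u        /* Initial Count register       */
#define LAPIC_CURR_COUNT  0x390u        /* Current Count register       */
#define LAPIC_DIVIDE_CFG  0x3E0u        /* Divider Configuration reg.   */

/* Physical address of a Local-APIC register. */
uint32_t lapic_reg_addr(uint32_t offset)
{
    return LAPIC_BASE + offset;
}

/* Compose a Timer Vector register value from its fields. */
uint32_t lvt_timer_value(uint8_t vector, int periodic, int masked)
{
    uint32_t v = vector;                /* bits 7-0: interrupt ID  */
    if (periodic)
        v |= 1u << 17;                  /* MODE: 1 = periodic      */
    if (masked)
        v |= 1u << 16;                  /* MASK: 1 = masked        */
    return v;
}
```

A periodic, unmasked timer on interrupt 0x20 gives the value 0x00020020. Storing it at 0xFEE00320 requires code that can reach that physical address, which an ordinary user-mode program cannot.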
In-class exercise
• Run the ‘cpuid.cpp’ Linux application (on
our course website) to see if the CPUs in
our classroom implement HyperThreading
(i.e., multiple logical processors within one CPU)
• Then run the ‘smpinfo.cpp’ application, to
see if the MP Base Configuration Table
has entries for more than one processor
• If both results hold true, then we can write
our own multiprocessing software in here!
In-class exercise #2
• Run the ‘apictick.s’ demo (on our website)
to observe the APIC’s periodic interrupt
drawing bytes onto the screen
• It executes for ten milliseconds (the 8254
is used to create this timed delay)
• Try reprogramming the APIC’s Divider
Configuration register, to cut the interrupt
frequency in half (or to double it)
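For the last step it helps to know how a divisor is encoded in the Divider Configuration register. The table below follows the encoding given in Intel's Local-APIC documentation, as best recalled, so it should be checked against the manual before use. The function name is hypothetical.

```c
#include <stdint.h>

/* Encoding written to the Divider Configuration register for a
   given divisor (1, 2, 4, ..., 128); returns -1 for other values. */
int apic_divide_encoding(unsigned divisor)
{
    switch (divisor) {
    case 2:   return 0x0;
    case 4:   return 0x1;
    case 8:   return 0x2;
    case 16:  return 0x3;
    case 32:  return 0x8;
    case 64:  return 0x9;
    case 128: return 0xA;
    case 1:   return 0xB;
    default:  return -1;
    }
}
```

Moving to the next larger divisor halves the rate at which the timer counts, and so halves the interrupt frequency. The next smaller divisor doubles it.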