Multiprocessor Initialization

An introduction to the use of Interprocessor Interrupts


Multiprocessor topology

[Diagram labels: Back Side Bus; Local APIC, CPU #0; Local APIC, CPU #1; IO APIC; Front Side Bus; peripheral devices; system memory; bridge]


Interrupt Command Register

• Each Pentium's Local-APIC has a 64-bit Interrupt Command Register (ICR)
• It can be programmed by system software to transmit messages (via the Back Side Bus) to one or several other processors
• Each processor has a unique identification number in its APIC Local-ID Register that can be used to direct messages to it


ICR (upper 32-bits)

Memory-mapped register-address: 0xFEE00310

    bits 31-24   Destination field
    bits 23-0    reserved

The Destination Field (8 bits) can be used to specify which processor (or group of processors) will receive the message


ICR (lower 32-bits)

Memory-mapped register-address: 0xFEE00300

    bits 19-18   Destination Shorthand
                     00 = no shorthand
                     01 = only to self
                     10 = all including self
                     11 = all excluding self
    bit  15      Trigger Mode          0 = Edge       1 = Level
    bit  14      Level                 0 = De-assert  1 = Assert
    bit  12      Delivery Status (R/O) 0 = Idle       1 = Pending
    bit  11      Destination Mode      0 = Physical   1 = Logical
    bits 10-8    Delivery Mode
                     000 = Fixed
                     001 = Lowest Priority
                     010 = SMI
                     011 = (reserved)
                     100 = NMI
                     101 = INIT
                     110 = Start Up
                     111 = (reserved)
    bits 7-0     Vector field


MP initialization protocol

• Set processor-counter equal to zero
• Step 1: issue an 'INIT' IPI to all-except-self
• Delay for ten milliseconds
• Step 2: issue 'Startup' IPI to all-except-self
• Delay for 200 microseconds
• Step 3: issue 'Startup' IPI to all-except-self
• Delay for 200 microseconds
• Check the value of the processor-counter


Issue 'INIT' IPI

    ; broadcast 'INIT' IPI to all-except-self
          mov   eax, #0x000C4500
          mov   [0xFEE00300], eax
    .B0:  bt    dword [0xFEE00300], #12   ; Delivery Status still pending?
          jc    .B0                       ; yes, keep polling


Issue 'Startup' IPI

    ; broadcast 'Startup' IPI to all-except-self
    ; using vector 0x11 to specify that the entry-point
    ; is at the memory-address 0x00011000
          mov   eax, #0x000C4611
          mov   [0xFEE00300], eax
    .B1:  bt    dword [0xFEE00300], #12   ; Delivery Status still pending?
          jc    .B1                       ; yes, keep polling


Delaying for EAX microseconds

    ; We use the 8254 Timer/Counter Channel 2 to generate a
    ; timed delay (expressed in microseconds by value in EAX)
          mov   ecx, eax        ; copy delay-time to ECX
          mov   eax, #1000000   ; #microseconds-per-sec
          xor   edx, edx        ; extended to quadword
          div   ecx             ; perform dword division
          mov   ecx, eax        ; copy quotient into ECX
          mov   eax, #1193182   ; #input-pulses-per-sec
          xor   edx, edx        ; extended to quadword
          div   ecx             ; perform dword division
    ; now transfer the quotient from AX to the Channel 2 Latch


Mutual Exclusion

• Shared variables must be accessed by only one processor at a time
• The Pentium's 'lock' prefix assists with this
• Example: every processor adds 1 to count

          lock  inc   dword [count]

• Example: all processors need private stacks

          mov   ax, #0x1000
          lock  xadd  [new_SS], ax
          mov   ss, ax


ROM-BIOS isn't 'reentrant'

• The video service-functions in ROM-BIOS that we use to display a message-string at the current cursor-location (and afterward advance the cursor) modify global storage locations (as well as i/o ports), and hence must be called by only one processor at a time
• A shared memory-variable (called 'mutex') is used to enforce this mutual exclusion


Implementing a 'spinlock'

    mutex:  .WORD 1             ; 1 = available, 0 = held
    spin:   bt    mutex, #0     ; is the mutex available?
            jnc   spin          ; no, keep testing
            lock  btr mutex, #0 ; try to claim it atomically
            jnc   spin          ; another CPU got it first
    ;-- CRITICAL SECTION OF CODE GOES HERE --
            lock  bts mutex, #0 ; release the mutex


In-class exercise

• Include this procedure that multiple CPUs will execute simultaneously (without 'lock')

    total:  .WORD 0
    add_one_thousand:
            mov   cx, #1000
    nxinc:  add   [total], #1
            loop  nxinc
            ret


We need to use a 'barrier'

• We can use a software construct (known as a 'barrier') to delay entry to a block of code until a prescribed number of CPUs are ready to enter it together

    arrived: .WORD 0
    barrier: lock  inc word [arrived]
    await:   cmp   word [arrived], #2
             jb    await
             call  add_one_thousand
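

Cross-checking the constants (sketch)

The command values 0x000C4500 and 0x000C4611 used in the 'INIT' and 'Startup' IPI code, and the timer count produced by the delay routine, can be reproduced from the field definitions. The C sketch below is only an illustration of that arithmetic: the names icr_low, startup_entry, pit_count_for_us and check_slide_values are hypothetical helpers, not part of the original slides, and the positions of the Level bit (14) and Destination Mode bit (11) are assumed from the standard Local-APIC layout. Nothing here touches real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Field encodings for the low dword of the ICR, as listed in the slides. */
enum { DM_FIXED = 0, DM_LOWEST = 1, DM_SMI = 2, DM_NMI = 4, DM_INIT = 5, DM_STARTUP = 6 };
enum { SH_NONE = 0, SH_SELF = 1, SH_ALL_INCL_SELF = 2, SH_ALL_EXCL_SELF = 3 };

/* Hypothetical helper: pack the fields into the low 32 bits of the ICR.
 * Bits 14 (Level) and 11 (Destination Mode) are assumed positions. */
uint32_t icr_low(unsigned shorthand, unsigned trigger, unsigned level,
                 unsigned dest_mode, unsigned delivery, unsigned vector)
{
    return ((uint32_t)(shorthand & 3u) << 18)
         | ((uint32_t)(trigger   & 1u) << 15)
         | ((uint32_t)(level     & 1u) << 14)
         | ((uint32_t)(dest_mode & 1u) << 11)
         | ((uint32_t)(delivery  & 7u) << 8)
         |  (uint32_t)(vector    & 0xFFu);
}

/* A Startup vector 0xVV selects the entry-point address 0x000VV000. */
uint32_t startup_entry(unsigned vector)
{
    return (uint32_t)(vector & 0xFFu) << 12;
}

/* Same two integer divisions as the delay routine:
 *   per_sec = 1000000 / delay_us;  count = 1193182 / per_sec.
 * Only meaningful for 1 <= delay_us <= 1000000 (otherwise a division
 * by zero occurs), and the result must fit the 16-bit counter latch. */
uint32_t pit_count_for_us(uint32_t delay_us)
{
    uint32_t per_sec = 1000000u / delay_us;
    return 1193182u / per_sec;
}

/* Reproduce the literal constants that appear in the slide code. */
void check_slide_values(void)
{
    /* all-excluding-self, edge trigger, level asserted, physical mode */
    assert(icr_low(SH_ALL_EXCL_SELF, 0, 1, 0, DM_INIT, 0x00) == 0x000C4500u);
    assert(icr_low(SH_ALL_EXCL_SELF, 0, 1, 0, DM_STARTUP, 0x11) == 0x000C4611u);
    assert(startup_entry(0x11) == 0x00011000u);
}
```

Calling check_slide_values() from a test program confirms that the two literals decode to "all excluding self, assert, INIT" and "all excluding self, assert, Start Up, vector 0x11". For the two delays in the protocol, pit_count_for_us(10000) and pit_count_for_us(200) give the counts that the delay routine would leave in AX before loading the Channel 2 latch.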