Model-Specific Registers
A look at Intel’s scheme for introducing new CPU features

Test Registers
• The 80386 implemented two registers for testing its Translation Look-aside Buffer (i.e., the special cache used for speeding up virtual-to-physical address-conversions)
• The registers were named TR6 and TR7
• Intel warned that these system registers were unique to the 80386 CPU’s design and might not be present in future chips

Then three more
• The TR6 and TR7 registers were kept in the 80486 design, along with three extra Test Registers (TR3, TR4, TR5) that allowed testing of the processor’s on-chip cache for code and data
• Again Intel warned that these registers were unique to the 80486 CPU’s design and that they might not be implemented in subsequent chips
• Sure enough, in the 80586 (‘Pentium’) they were gone, so software written to use them would no longer execute on the newer Pentium CPUs

The ‘Model-Specific’ concept
• Beginning with the Pentium processor, Intel has been including ‘experimental’ features in its processors, warning that they may disappear from future designs, but providing a standard and permanent way for all such features to be accessed
• This access is via a pair of ‘privileged’ instructions (rdmsr and wrmsr) that can only be executed by ‘ring0’ code

Quite a few MSRs now!
• At first there were only about a dozen of these MSRs (Model-Specific Registers), but lately their number is well over 200
• Some MSRs have proven useful enough that they are now deemed permanent fixtures of the defined IA-32 architecture

The Time-Stamp Counter
• This 64-bit Model-Specific Register was introduced in the Pentium processor and has been present in each CPU thereafter
• It increments once every CPU clock-cycle, starting from 0 when power is turned on
• It won’t overflow for at least ten years (for example, 2^64 cycles at 3 GHz take roughly 195 years)
• Unprivileged programs (ring3) normally can access it via the rdtsc instruction (unless the operating system has set the TSD-bit in Control Register CR4)

Using the TSC
    rdtsc returns the 64-bit counter in the register-pair EDX:EAX
        bits 63..32     EDX
        bits 31..0      EAX

    time0:  .quad   0               # saves starting value from the TSC
    time1:  .quad   0               # saves concluding value from TSC

            # how you can measure CPU clock-cycles in a code-fragment
            rdtsc                   # read the Time-Stamp Counter
            movl    %eax, time0+0   # save least-significant longword
            movl    %edx, time0+4   # save most-significant longword

            # <Your code-fragment to be measured goes here>

            rdtsc                   # read the Time-Stamp Counter
            movl    %eax, time1+0   # save least-significant longword
            movl    %edx, time1+4   # save most-significant longword

            # now subtract starting-value ‘time0’ from ending value ‘time1’

The TSC as an MSR
• Each Model-Specific Register has its own identifying register-number, and can be accessed (from ring0) using the special pair of instructions: rdmsr and wrmsr
• The Time-Stamp Counter is MSR number 0x10
• To write a new 64-bit value into the TSC, you load the desired 64-bit value into the EDX:EAX register-pair, you put the MSR ID-number 0x10 into register ECX, then you execute wrmsr

IA32_APIC_BASE
• This register has MSR number 0x1B and it’s private to each CPU in an SMP system
• It establishes the base-address for the Local-APIC’s memory-mapped registers (the default base-address is 0xFEE00000, but that can be changed using this MSR)
• The CPU’s Local-APIC functions can be either enabled or disabled (via bit #11)
• The Boot-Strap Processor (BSP) can be
recognized (via bit #8)

Relocating the APIC registers
    IA32_APIC_BASE (64-bits)
        bits 63..32     reserved
        bits 31..12     APIC base-address (4K page-number); default-value for the page = 0xFEE00
        bit  11         EN:  Local-APIC Enable bit (1=enabled, 0=disabled)
        bit  8          BSP: Boot-Strap Processor (read-only): 1=yes, 0=no

            # make the processor’s Local-APIC registers accessible in real-mode
            # (the current value is read first, because writing 0x000D8000 by
            # itself would clear bit #11 and so disable the Local-APIC)
            mov     $0x1B, %ecx             # MSR register-number
            rdmsr                           # read the specified MSR
            and     $0x00000FFF, %eax       # keep the flag-bits, drop the old page-number
            or      $0x000D8000, %eax       # least-significant 32-bits (new base-address)
            mov     $0x00000000, %edx       # most-significant 32-bits
            wrmsr                           # write to specified MSR

Extended Feature Enable Register
• The EFER was introduced in conformity with Advanced Micro Devices’ (AMD’s) way of implementing 64-bit architecture
• Its MSR register-number is 0xC0000080

    IA32_EFER (64-bits)
        bits 63..12     reserved
        bit  11         XD: eXecute-Disable bit in paging structures (1=enabled, 0=disabled)
        bit  10         IA32e-mode is active (1=yes, 0=no)
        bit  9          reserved
        bit  8          Enable IA32e-mode (1=yes, 0=no)
        bits 7..1       reserved
        bit  0          Enable SYSCALL/SYSRET instructions in 64-bit mode (1=yes, 0=no)

Demo: ‘try64bit.s’
• We created a demo-program that shows what steps are needed to enable the new 64-bit capabilities of recent Pentium-D or Core 2 Duo processors (using EFER)
• This demo cannot be executed on our current CS Lab/Classroom workstations, but it CAN execute on a remote-access department server named ‘anchor00’

New 4-Level page-tables needed
• For executing in 64-bit mode, the PAE-bit (Physical-Address Extension) must be enabled (bit #5 in Control Register CR4) and 4-levels of page-table structures must be prepared which implement an “identity mapping” for the transition-code itself
• Then IA32e-mode is activated by turning on the PG-bit (bit #31) in Control Register CR0 (assuming bit #8 in the EFER register was set to 1); execution of 64-bit code begins once CS is loaded with a code-segment descriptor whose ‘L’ bit is set

4-Levels of mapping
    64-bit ‘canonical’ virtual address
        bits 63..48     sign-extension
        bits 47..39     PML4 index
        bits 38..30     PDPT index
        bits 29..21     PDIR index
        bits 20..12     PTBL index
        bits 11..0      offset

    CR3 -> Page Map Level-4 Table -> Page Directory Pointer Table -> Page Directory -> Page Table -> Page Frame (4KB)

Each mapping-table
contains up to 512 quadword-size entries

Page-Table entry format
        bit  63         EXB = Execute-disable bit
        bits 62..52     available
        bits 51..40     reserved (must be 0)
        bits 39..32     Base Address [39..32]
        bits 31..12     Base Address [31..12]
        bits 11..9      avail
        bit  8          G = Global page (1=yes, 0=no)
        bit  7          PAT = Page-Attribute Table-Index
        bit  6          D = dirty (0=no, 1=yes)
        bit  5          A = accessed (0=no, 1=yes)
        bit  4          PCD = Page Caching Disable (0=no, 1=yes)
        bit  3          PWT = Page Write-Through (0=no, 1=yes)
        bit  2          S/U (0=supervisor-only, 1=user)
        bit  1          R/W (0=read-only, 1=writable)
        bit  0          P = present (0=no, 1=yes)

Segment descriptors
• Gate-descriptors and system-segment descriptors (LDT, TSS) have an enlarged 16-byte format in 64-bit mode to accommodate the larger-sized addresses; code- and data-segment descriptors keep their 8-byte layout (the GDT listing that follows stores every entry as an octaword whose upper 8 bytes are zero)
• Segment-Limit and Base are disregarded for selectors in registers CS, DS, ES, SS
• Descriptor layout (bits 127..64 above bits 63..0): flag-fields G, D, L, A and access-fields P, DPL, S, TYPE
• The formerly ‘reserved’ bit is now the ‘L’ bit (it indicates a ‘long’ segment-descriptor)

A few GDT descriptors…
            .align  16                                      # octaword-alignment (for optimal access)
    theGDT: .octa   0x00000000000000000000000000000000      # null
            .equ    sel_cs64, (. - theGDT)+0                # code64 selector (ring0)
            .octa   0x000000000000000000209A0000000000      # code
            .equ    sel_cs32, (. - theGDT)+0                # code32 selector (ring0)
            .octa   0x000000000000000000409A010000FFFF      # code
            .equ    sel_vram, (. - theGDT)+0                # data16 selector (ring3)
            .octa   0x00000000000000000080F20B80000007      # data

You must update ‘binutils’
• You cannot assemble and link programs that are written for the IA32e 64-bit mode unless you install the newest versions of the GNU assembler ‘as’ and the linker ‘ld’
• You can download these utilities from the website for the Free Software Foundation at: http://www.fsf.org/
• Directions for installing are easy to follow
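
Sketch: working with TSC readings in C
The ‘Using the TSC’ slide stops at "now subtract starting-value time0 from ending value time1". The arithmetic can be sketched in C as follows. This is only an illustration under stated assumptions: the function names are hypothetical (they are not part of the course demos), the two 32-bit halves are assumed to have been saved already by code like the rdtsc fragment, and the overflow estimate assumes a constant clock rate.

```c
#include <stdint.h>

/* Hypothetical helper: join the halves that rdtsc leaves in
   EDX (most-significant) and EAX (least-significant). */
uint64_t tsc_join(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Cycles elapsed between two readings.  Unsigned 64-bit subtraction
   still gives the right count if the counter wrapped around once
   between the two readings. */
uint64_t tsc_elapsed(uint64_t time0, uint64_t time1)
{
    return time1 - time0;
}

/* Whole 365-day years before a 64-bit counter that starts at 0
   overflows, assuming it ticks at a constant rate of hz cycles
   per second (an assumption, not a property of every CPU). */
uint64_t tsc_years_to_overflow(uint64_t hz)
{
    const uint64_t seconds_per_year = 365ULL * 24 * 60 * 60;
    return UINT64_MAX / hz / seconds_per_year;
}
```

For instance, tsc_elapsed(tsc_join(0, 0xFFFFFFF0), tsc_join(1, 0x10)) counts the cycles across a carry out of the low longword, which is the case a 32-bit-only subtraction would get wrong.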
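
Sketch: decoding IA32_APIC_BASE and IA32_EFER in C
The bit-layouts shown for IA32_APIC_BASE and IA32_EFER can be expressed as masks. The sketch below only manipulates 64-bit values that are assumed to have been obtained with rdmsr in ring0 code; it does not execute rdmsr or wrmsr itself. All macro and function names are illustrative assumptions rather than names from the course demos, and the base-address field is taken to be bits 31..12, as drawn on the slide.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR register-numbers quoted on the slides */
#define MSR_IA32_APIC_BASE   0x0000001Bu
#define MSR_IA32_EFER        0xC0000080u

/* IA32_APIC_BASE fields (names are hypothetical) */
#define APIC_BASE_BSP        (1ULL << 8)     /* Boot-Strap Processor   */
#define APIC_BASE_EN         (1ULL << 11)    /* Local-APIC enable      */
#define APIC_BASE_PAGE_MASK  0xFFFFF000ULL   /* bits 31..12, per slide */

/* IA32_EFER fields (names are hypothetical) */
#define EFER_SYSCALL         (1ULL << 0)     /* SYSCALL/SYSRET enable  */
#define EFER_IA32E_ENABLE    (1ULL << 8)     /* enable IA32e-mode      */
#define EFER_IA32E_ACTIVE    (1ULL << 10)    /* IA32e-mode is active   */
#define EFER_XD              (1ULL << 11)    /* eXecute-Disable enable */

uint64_t apic_base_address(uint64_t msr) { return msr & APIC_BASE_PAGE_MASK; }
bool     apic_enabled(uint64_t msr)      { return (msr & APIC_BASE_EN) != 0; }
bool     apic_is_bsp(uint64_t msr)       { return (msr & APIC_BASE_BSP) != 0; }

/* Value to write back so that the register window moves to new_base
   (4K-aligned) while every other bit, including EN and BSP, keeps
   the state it had in the value that was read. */
uint64_t apic_base_relocate(uint64_t msr, uint64_t new_base)
{
    return (msr & ~APIC_BASE_PAGE_MASK) | (new_base & APIC_BASE_PAGE_MASK);
}

/* EFER value with the IA32e-mode enable bit (bit #8) turned on. */
uint64_t efer_enable_ia32e(uint64_t efer) { return efer | EFER_IA32E_ENABLE; }
bool     efer_ia32e_active(uint64_t efer) { return (efer & EFER_IA32E_ACTIVE) != 0; }
```

As a usage note, a boot-strap processor with its Local-APIC enabled at the default address would report 0xFEE00900, and apic_base_relocate(0xFEE00900, 0x000D8000) yields 0x000D8900. By contrast, apic_enabled(0x000D8000) is false, since that bare value leaves bit #11 clear.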
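
Sketch: splitting a canonical address for the 4-level mapping
The ‘4-Levels of mapping’ diagram divides a 64-bit virtual address into four 9-bit table-indexes and a 12-bit offset, and the page-table entry format places a frame's base-address in bits 39..12 with flag-bits below it. A minimal C sketch of both ideas follows. The type and function names are hypothetical, and only the P, R/W and S/U flags from the legend are defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* The five fields of a 64-bit 'canonical' virtual address */
struct va_parts {
    unsigned pml4;      /* bits 47..39 */
    unsigned pdpt;      /* bits 38..30 */
    unsigned pdir;      /* bits 29..21 */
    unsigned ptbl;      /* bits 20..12 */
    unsigned offset;    /* bits 11..0  */
};

/* Canonical means bits 63..48 are a sign-extension of bit 47,
   so bits 63..47 must be all zeros or all ones. */
bool va_is_canonical(uint64_t va)
{
    uint64_t top = va >> 47;            /* the 17 highest bits */
    return top == 0 || top == 0x1FFFF;
}

/* Each table holds 512 entries, hence the 9-bit mask 0x1FF. */
struct va_parts va_split(uint64_t va)
{
    struct va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pdir   = (unsigned)((va >> 21) & 0x1FF);
    p.ptbl   = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}

/* Flag-bits from the entry-format legend */
#define PTE_P   (1ULL << 0)     /* present  */
#define PTE_RW  (1ULL << 1)     /* writable */
#define PTE_US  (1ULL << 2)     /* user     */

/* Build an entry from a 4K-aligned frame address (bits 39..12)
   and a set of flag-bits. */
uint64_t pte_make(uint64_t frame, uint64_t flags)
{
    return (frame & 0x000000FFFFFFF000ULL) | flags;
}
```

For an "identity mapping", the frame address stored by pte_make() for a page is simply that page's own virtual address, so the indexes returned by va_split() say which entry of each table has to be filled in.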