Microprocessor system architectures – ARMv8
Jakub Yaghob

ARM architecture
- RISC
  - Large uniform register file
  - Load/store architecture
  - Simple addressing modes
- Execution states: AArch64 and AArch32
- Architecture profiles
  - A – application profile
  - R – real-time profile
  - M – microcontroller profile

Execution states – AArch64
- 31 64-bit general-purpose registers; X30 is the procedure link register
- 64-bit PC, stack pointers (SPs) and ELRs (exception link registers)
- 32 128-bit SIMD registers
- Single instruction set: A64
- Exception levels EL0-EL3
- 64-bit virtual addressing
- Each system register name has a suffix that indicates the lowest EL from which it can be accessed
- PSTATE (process state)

Execution states – AArch32
- 13 32-bit general-purpose registers
- 32-bit PC, SP and LR (link register)
  - Some registers are banked for each execution mode
- A single ELR, used for return from Hyp mode
- 32 64-bit SIMD registers
- A32 instruction set: fixed-length encoding, compatible with ARMv7
- T32 instruction set: variable-length encoding, compatible with ARMv7 Thumb
- 32-bit virtual addressing
- CPSR (current program status register)

Supported data types, cryptographic extension
- Integer: B (byte), H (halfword), W (word), D (doubleword), Q (quadword)
- Floating point: HP, SP, DP (half, single and double precision), IEEE 754
- Cryptographic extension
  - Operates on the vector register file
  - AES, SHA-1, SHA2-256

Memory model
The ARM memory model supports:
- Generating an exception on an unaligned memory access
- Restricting access by applications to specified areas of memory
- Translating virtual addresses provided by executing instructions into physical addresses
  - AArch64: 64-bit addressing; the TCR (Translation Control Register) determines the VA range; EL0 and EL1 share two independent VA ranges, each with its own translation table base register (TTBR0_EL1, TTBR1_EL1), both configured by TCR_EL1
  - AArch32: 32-bit addressing; the TCR determines the VA range; the OS can split the VA range into two subranges for EL0 and EL1, each with a separate TTBR
- Altering the interpretation of multi-byte data between big-endian and little-endian
- Controlling the order of accesses to memory
- Controlling caches and address translation structures
- Synchronizing access to shared memory by multiple PEs (processing elements)

Application architecture – AArch64
- 31 general-purpose registers R0-R30
  - 64-bit GP registers: X0-X30
  - 32-bit GP registers: W0-W30 (the low halves of X0-X30)
- Register encoding 1Fh (31) is used as ZR, the zero register, by most instructions; some instructions use it for SP instead
- 32 vector registers V0-V31
- FPCR, FPSR: floating-point control register and floating-point status register
- Special registers
  - SP (64-bit) / WSP (32-bit): current stack pointer
  - X30: procedure link register
  - PC (64-bit)

Application architecture – AArch64 – vector registers
[Figure: AArch64 vector register layout]

Application architecture – AArch64 – PSTATE
- Process state for EL0
- Data processing flags
  - N – negative
  - Z – zero
  - C – carry
  - V – overflow
- Exception masking bits
  - D – debug mask
  - A – system error (SError) mask
  - I – IRQ mask
  - F – FIQ mask

System registers
- Register naming: <register_name>_ELx, x ∈ {0,1,2,3}
- General system control registers
- Debug registers
- Generic timer registers
- Performance monitor registers (optional)
- Trace registers (optional)
- Generic Interrupt Controller (GIC) CPU interface registers (optional)

Software control and EL0
- Exception handling
  - Interrupts
  - Memory system aborts
  - Undefined instructions
  - System calls
  - Secure monitor or hypervisor traps
- System instructions for control flow
  - WFI – Wait For Interrupt
  - WFE – Wait For Event
  - YIELD – hint
  - WFI and WFE can put the PE into a low-power state
- Cache management: use at EL0 must be enabled by EL1
- Debug events
  - BKPT – breakpoint (A32/T32; the A64 equivalent is BRK)
  - DBG – hint to the debug system (A32/T32)
  - HLT – entry to Debug state

Caches and memory hierarchy
- Point of Unification (PoU): the instruction and data caches see the same copy of a memory location
- Point of Coherency (PoC): all agents that can access memory are guaranteed to see the same copy of a memory location

Memory types
- Normal
  - Bulk memory operations; read/write or read-only regions
  - Additional attributes
    - Shareability: non-shareable, inner shareable, outer shareable; determines the synchronization requirements
    - Cacheability: non-cacheable, write-through cacheable, write-back cacheable
- Device
  - Speculative reads are forbidden
  - Additional attributes
    - Gathering: when prohibited, prevents aggregation of reads and writes
    - Reordering: when prohibited, preserves the access order
    - Early write acknowledgement: a write can be acknowledged other than at its end point

Alignment
- Instruction alignment: A64 instructions must be word-aligned
- Data alignment
  - Any unaligned access to Device memory causes an Alignment fault
  - Normal memory: SCTLR_ELx.A configures the unaligned access behavior
    - Set: generate an Alignment fault
    - Clear: perform the unaligned access
  - An unaligned access
    - Is not guaranteed to be atomic
    - Takes a number of additional cycles
    - Can take more than one abort for memory exceptions

Endian support
- Instruction endianness: A64 instructions are always little-endian
- Data endianness: SCTLR_EL1.E0E configures the endianness of data accesses at EL0 and is set at EL1 or higher
- Instructions for reversing the byte order of data in registers: REV16, REV32, REV64

Synchronization and semaphores
- Load-exclusive instructions: LDXP, LDXR, LDXRH, LDXRB
- Store-exclusive instructions: STXP, STXR, STXRH, STXRB
- Clear-exclusive: CLREX
- Designed to scale on multiprocessor systems (MPS)

Exception levels
- EL0-EL3
  - EL0 – unprivileged execution, applications
  - EL1 – OS kernel
  - EL2 – supports virtualization of non-secure operation, hypervisor
  - EL3 – supports switching between two security states (Secure and Non-secure), secure monitor
- All implementations must include EL0 and EL1
- Stack pointer register selection: SP_ELx

Exception mechanism
- Saved Program Status Register (SPSR)
  - Saves the PE state when an exception is taken
  - SPSR_ELx for an exception taken to ELx
  - When returning from an exception, the PE state is restored from the SPSR
- Exception link registers: ELR_ELx holds the preferred exception return address
- Exception vectors
  - Vector Base Address Register: VBAR_ELx for each of EL1, EL2 and EL3
  - Defines the base address of the vector table at that EL

System calls
- SVC – supervisor call exception: EL0 calls the OS at EL1
- HVC – hypervisor call exception: for EL1 and higher
- SMC – secure monitor call exception: for EL1 and higher

Virtual Memory System Architecture
- VMSA provides the MMU
- The MMU translates VAs to PAs independently for each ELx and security state
- A64 supports VAs and PAs of up to 48 bits

Address translation system VMSAv8-64
- Translation Table Base Register (TTBR)
- Translation Control Register (TCR)
- Up to four levels of address lookup
- Input address (IA) of up to 48 bits
- Output address (OA) of up to 48 bits
- Translation granule size of 4K, 16K or 64K

[Figures: 4K, 16K and 64K translation granules; translation table entries for levels 0-2 and level 3; attribute fields]

MMU faults
- All types of MMU exceptions
  - Alignment fault
  - Permission fault
  - Translation fault
  - Address size fault
  - Synchronous external abort on a translation table walk
  - Access flag fault
  - TLB conflict abort

Translation Lookaside Buffers (TLB)
- The TLB caches results from translation table walks
- Global pages
- Process-specific pages, tagged with an Address Space Identifier (ASID) of implementation-defined size, 8 or 16 bits
- Virtual Machine Identifier (VMID)
- Concept of locked entries, optional for an implementation
- Maintenance instructions: TLBI <operation>{, Xt}
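The naming rule from the AArch64 and System registers slides is that the `_ELx` suffix gives the lowest EL from which a register can be accessed. That rule can be expressed as a first-order check.

The C sketch below is illustrative only. The helper names `lowest_el` and `accessible_from` are hypothetical. Real accessibility also depends on enable and trap controls set by higher ELs; for example, some `_EL0` registers are usable at EL0 only if EL1 allows it. The sketch ignores those controls.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns the EL digit from a "<register_name>_ELx" suffix, or -1 if absent. */
static int lowest_el(const char *reg)
{
    assert(reg);
    const char *s = strrchr(reg, '_');          /* last underscore in the name */
    if (s == NULL || strncmp(s, "_EL", 3) != 0)
        return -1;
    char d = s[3];
    if (d < '0' || d > '3' || s[4] != '\0')     /* suffix must be exactly _EL0.._EL3 */
        return -1;
    return d - '0';
}

/* First-order rule only: code at current_el may access the register if
   current_el >= the suffix EL. Trap and enable controls are not modelled. */
static bool accessible_from(const char *reg, int current_el)
{
    int el = lowest_el(reg);
    return el >= 0 && current_el >= el;
}
```

For example, `accessible_from("TTBR0_EL1", 0)` is false, because translation table bases are not visible to applications. The same call returns true at EL2.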
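The PSTATE data-processing flags can be illustrated with a small C model of how A64 `ADDS` and `SUBS` set N, Z, C and V. The model covers both the 32-bit (W) and 64-bit (X) register forms.

This is a sketch, not ARM's pseudocode, and the function names are hypothetical. Note the ARM convention that C after a subtraction means "no borrow". `nzcv_bits` places the flags where the NZCV system register holds them, in bits 31 to 28.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n, z, c, v;   /* PSTATE condition flags */
} nzcv_t;

static uint64_t width_mask(unsigned width)
{
    assert(width == 32 || width == 64);   /* W or X register form */
    return width == 64 ? UINT64_MAX : 0xFFFFFFFFull;
}

/* ADDS: the result is zero-extended, as a write to Wd clears the upper half of Xd. */
static uint64_t model_adds(uint64_t a, uint64_t b, unsigned width, nzcv_t *f)
{
    uint64_t mask = width_mask(width), sign = 1ull << (width - 1);
    a &= mask;
    b &= mask;
    uint64_t res = (a + b) & mask;
    f->n = (res & sign) != 0;                    /* N: sign bit of the result */
    f->z = res == 0;                             /* Z: result is zero */
    f->c = res < a;                              /* C: unsigned carry out */
    f->v = ((~(a ^ b) & (a ^ res)) & sign) != 0; /* V: same-sign operands, result sign differs */
    return res;
}

/* SUBS (CMP is SUBS with the result discarded). */
static uint64_t model_subs(uint64_t a, uint64_t b, unsigned width, nzcv_t *f)
{
    uint64_t mask = width_mask(width), sign = 1ull << (width - 1);
    a &= mask;
    b &= mask;
    uint64_t res = (a - b) & mask;
    f->n = (res & sign) != 0;
    f->z = res == 0;
    f->c = a >= b;                               /* C: set when no borrow occurs */
    f->v = (((a ^ b) & (a ^ res)) & sign) != 0;  /* V: operand signs differ, result sign differs from a */
    return res;
}

/* Flag positions in the NZCV register: N=31, Z=30, C=29, V=28. */
static uint32_t nzcv_bits(nzcv_t f)
{
    return (uint32_t)f.n << 31 | (uint32_t)f.z << 30 | (uint32_t)f.c << 29 | (uint32_t)f.v << 28;
}
```

Three cases show the behavior:
- `model_subs(7, 7, 32, &f)` sets Z and C, which is how `CMP` reports equality.
- `model_adds(0xFFFFFFFF, 1, 32, &f)` wraps to zero with C set.
- `model_adds(0x7FFFFFFFFFFFFFFF, 1, 64, &f)` sets N and V, a signed overflow.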
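The Alignment slide's rules can be summarized as a decision function. This is a minimal C sketch covering only the cases on that slide, and the enum and function names are hypothetical.

The sketch deliberately ignores instructions that require alignment regardless of SCTLR_ELx.A. Load-exclusive/store-exclusive are one example of such instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { MEM_NORMAL, MEM_DEVICE } mem_type_t;

static bool is_aligned(uint64_t addr, unsigned size)
{
    assert(size != 0 && (size & (size - 1)) == 0);   /* access size is a power of two */
    return (addr & (size - 1)) == 0;
}

/* A64 instructions are 4 bytes and must be word-aligned. */
static bool insn_fetch_faults(uint64_t pc)
{
    return !is_aligned(pc, 4);
}

/* Data accesses: an unaligned access to Device memory always faults;
   an unaligned access to Normal memory faults only if SCTLR_ELx.A is set. */
static bool data_access_faults(uint64_t addr, unsigned size, mem_type_t type, bool sctlr_a)
{
    if (is_aligned(addr, size))
        return false;
    if (type == MEM_DEVICE)
        return true;
    return sctlr_a;
}
```

With SCTLR_ELx.A clear, a 4-byte load from address 0x1001 in Normal memory succeeds in this model, possibly slowly and non-atomically. The same load from Device memory faults.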
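The byte-reversal instructions under Endian support can be described in portable C. The sketch below mirrors what `REV16`, `REV32` and `REV64` do to a 64-bit register.

It also shows how the same four bytes read as a little-endian or a big-endian word, which is the choice SCTLR_EL1.E0E makes for EL0 data accesses. The function names are hypothetical. A compiler targeting AArch64 may turn equivalent code into the real instructions, but that is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* REV16: reverse the bytes within each 16-bit halfword. */
static uint64_t rev16_64(uint64_t x)
{
    return ((x & 0x00FF00FF00FF00FFull) << 8) | ((x >> 8) & 0x00FF00FF00FF00FFull);
}

/* REV32: reverse the bytes within each 32-bit word. */
static uint64_t rev32_64(uint64_t x)
{
    x = rev16_64(x);   /* swap the bytes in each halfword, then swap the halfwords */
    return ((x & 0x0000FFFF0000FFFFull) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFull);
}

/* REV64 (REV Xd, Xn): reverse all eight bytes. */
static uint64_t rev64(uint64_t x)
{
    x = rev32_64(x);
    return (x << 32) | (x >> 32);
}

/* Read 4 bytes from memory as a little-endian or a big-endian data access would. */
static uint32_t load_word(const uint8_t b[4], int big_endian)
{
    assert(b);
    if (big_endian)
        return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 | (uint32_t)b[2] << 8 | b[3];
    return (uint32_t)b[3] << 24 | (uint32_t)b[2] << 16 | (uint32_t)b[1] << 8 | b[0];
}
```

Reversing a little-endian load with `REV32` yields the value a big-endian load would have produced. This is the usual way to handle foreign-endian data without changing E0E.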
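The load-exclusive/store-exclusive instructions from Synchronization and semaphores are normally used in a retry loop: load with LDXR, compute, attempt STXR, and retry if the store-exclusive reports failure. Portable C cannot issue these instructions directly, so the sketch below uses a C11 weak compare-and-exchange loop instead.

Weak CAS is allowed to fail spuriously. This lets compilers targeting ARMv8.0-A without the LSE atomics typically implement each attempt as one exclusive pair (LDAXR/STLXR for this memory ordering). With LSE they may emit CAS or LDADD instead. The name `fetch_add_retry` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomic fetch-and-add written as an explicit retry loop. */
static uint64_t fetch_add_retry(_Atomic uint64_t *p, uint64_t delta)
{
    assert(p);
    uint64_t old = atomic_load_explicit(p, memory_order_relaxed);
    /* If the compare-exchange fails (the value changed or, on an LL/SC
       implementation, the exclusive store failed), 'old' is refreshed
       with the current value and the new value is recomputed. */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old + delta,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
        /* retry */
    }
    return old;   /* value before the increment */
}
```

The loop scales because a failed attempt costs only a retry of the read-modify-write. No PE ever holds a lock while another waits.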
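The exception vector table at VBAR_ELx has a fixed layout in AArch64. It holds four groups of four entries (synchronous, IRQ, FIQ, SError), each entry 0x80 bytes, and the group depends on where the exception came from.

The C sketch below computes an entry address. The enum and function names are hypothetical, while the offsets follow the architectural layout.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { EXC_SYNC = 0, EXC_IRQ = 1, EXC_FIQ = 2, EXC_SERROR = 3 } exc_type_t;

typedef enum {
    FROM_CURRENT_EL_SP0 = 0,   /* same EL, stack pointer SP_EL0 selected */
    FROM_CURRENT_EL_SPX = 1,   /* same EL, stack pointer SP_ELx selected */
    FROM_LOWER_EL_A64   = 2,   /* lower EL executing in AArch64 */
    FROM_LOWER_EL_A32   = 3    /* lower EL executing in AArch32 */
} exc_origin_t;

/* Groups are 0x200 bytes apart; entries within a group are 0x80 bytes apart. */
static uint64_t vector_address(uint64_t vbar, exc_origin_t origin, exc_type_t type)
{
    assert((vbar & 0x7FF) == 0);   /* VBAR_ELx bits [10:0] are RES0: 2 KB aligned */
    return vbar + (uint64_t)origin * 0x200 + (uint64_t)type * 0x80;
}
```

For example, an SVC issued by an AArch64 application at EL0 is a synchronous exception from a lower EL. It is therefore taken to VBAR_EL1 + 0x400, with the return address in ELR_EL1 and the saved state in SPSR_EL1.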
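For VMSAv8-64 with a 48-bit input address, the VA splits into per-level table indices plus a page offset. Each translation table occupies one granule of 8-byte descriptors, so every level resolves log2(granule) - 3 bits: 9 bits for 4K, 11 for 16K and 13 for 64K.

The C sketch below assumes T0SZ = T1SZ = 16 (48-bit ranges) and no top-byte-ignore. It selects TTBR0 or TTBR1 from the upper VA bits. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VA_BITS 48   /* assumption: T0SZ = T1SZ = 16 */

typedef enum { RANGE_TTBR0, RANGE_TTBR1, RANGE_FAULT } va_range_t;

/* Upper VA bits all 0 -> lower range (TTBR0), all 1 -> upper range (TTBR1). */
static va_range_t select_range(uint64_t va)
{
    uint64_t top = va >> VA_BITS;
    if (top == 0)
        return RANGE_TTBR0;
    if (top == (UINT64_MAX >> VA_BITS))
        return RANGE_TTBR1;
    return RANGE_FAULT;   /* neither range: translation fault */
}

/* Lowest VA bit indexed at a level (0-3); granule_shift is 12, 14 or 16. */
static unsigned level_shift(unsigned granule_shift, int level)
{
    return granule_shift + (unsigned)(3 - level) * (granule_shift - 3);
}

/* Table index at 'level', or -1 if the level is unused for this granule. */
static int table_index(uint64_t va, unsigned granule_shift, int level)
{
    assert(granule_shift == 12 || granule_shift == 14 || granule_shift == 16);
    assert(level >= 0 && level <= 3);
    unsigned shift = level_shift(granule_shift, level);
    if (shift >= VA_BITS)
        return -1;
    unsigned bits = granule_shift - 3;   /* 8-byte descriptors per granule */
    if (shift + bits > VA_BITS)
        bits = VA_BITS - shift;          /* the top level may resolve fewer bits */
    return (int)((va >> shift) & ((1ull << bits) - 1));
}

/* Offset within the page, taken straight from the VA. */
static uint64_t page_offset(uint64_t va, unsigned granule_shift)
{
    return va & ((1ull << granule_shift) - 1);
}
```

The granules split the 48 bits differently:
- 4K: the familiar 9+9+9+9+12 split over levels 0-3.
- 16K: level 0 resolves a single bit.
- 64K: only levels 1-3 are used, and level 1 resolves 6 bits.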
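The TLB slide distinguishes global pages from process-specific pages. That distinction can be modelled with a tiny fully associative TLB in which non-global entries are tagged with an ASID.

This is a hypothetical software model, not how any particular implementation is built. The size, the round-robin replacement and all names are assumptions, and VMIDs are omitted. `tlb_invalidate_asid` plays roughly the role of a by-ASID operation such as `TLBI ASIDE1`, which leaves global entries in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 8   /* hypothetical size */

typedef struct {
    bool valid;
    bool global;     /* global pages match any ASID */
    uint16_t asid;   /* ASID: 8 or 16 bits, implementation defined */
    uint64_t vpn;    /* virtual page number */
    uint64_t pfn;    /* physical frame number */
} tlb_entry_t;

typedef struct {
    tlb_entry_t e[TLB_ENTRIES];
    unsigned next;   /* round-robin replacement pointer */
} tlb_t;

/* Cache the result of a translation table walk. */
static void tlb_fill(tlb_t *t, uint64_t vpn, uint64_t pfn, uint16_t asid, bool global)
{
    assert(t);
    t->e[t->next] = (tlb_entry_t){ true, global, asid, vpn, pfn };
    t->next = (t->next + 1) % TLB_ENTRIES;
}

/* Hit if the page matches and the entry is global or tagged with the current ASID. */
static bool tlb_lookup(const tlb_t *t, uint64_t vpn, uint16_t cur_asid, uint64_t *pfn)
{
    assert(t && pfn);
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        const tlb_entry_t *x = &t->e[i];
        if (x->valid && x->vpn == vpn && (x->global || x->asid == cur_asid)) {
            *pfn = x->pfn;
            return true;
        }
    }
    return false;
}

/* Drop the non-global entries of one ASID; global entries survive. */
static void tlb_invalidate_asid(tlb_t *t, uint16_t asid)
{
    assert(t);
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        tlb_entry_t *x = &t->e[i];
        if (x->valid && !x->global && x->asid == asid)
            x->valid = false;
    }
}
```

ASID tagging lets entries of several processes coexist, so a context switch only changes the current ASID instead of flushing the TLB. Invalidating one ASID, for example when it is reused, leaves global kernel mappings cached.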