Input/Output Organization (Chapter 4)
http://www.pds.ewi.tudelft.nl/~iosup/Courses/2011_ti1400_8.ppt
in1705/07
TI1400/11-PDS TU-Delft

The “Data Deluge”: Trivia
The Petabyte Age: Because More Isn't Just More — More Is Different, Wired, 23 June 2008.
http://www.wired.com/science/discoveries/magazine/16-07/pb_intro#

The “Data Deluge”: Facts and Predictions
"Everywhere you look, the quantity of information in the world is soaring. According to one estimate, mankind created 150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes. Merely keeping up with this flood, and storing the bits that might be useful, is difficult enough. Analysing it, to spot patterns and extract useful information, is harder still."
The Data Deluge, The Economist, 25 February 2010.

“Data Deluge”: The Mobile Example
[Figure; axis unit: exabyte]
Battling a Wireless Deluge: AT&T, Other Carriers Use Wi-Fi 'Hotzones' to Siphon Off Smartphone Traffic, Tech Journal, 2 February 2011.
http://online.wsj.com/article/SB10001424052748704124504576118353354099780.html#ixzz1LEpF4TuA

“Data Deluge”: The Personal Memex Example
• Vannevar Bush in the 1940s: record your life
• MIT Media Laboratory: The Human Speechome Project/TotalRecall, data mining/analysis/visio
  - Deb Roy and Rupal Patel “record practically every waking moment of their son’s first three years” (20% privacy time…Is this even legal?! Should it be?!)
  - 11x1MP/14fps cameras, 14x16b-48KHz mics, 4.4TB RAID + tapes, 10 computers; 200k hours audio-video
  - Data size: 200GB/day, 1.5PB total

“Data Deluge”: The Gaming Analytics Example
• EQ II: 20TB/year all logs
• Halo3: 1.4PB served statistics on player logs

“Data Deluge”: Datasets in Comp.Sci.
[Figure: Dataset Size (axis ticks 1GB, 10GB, 100GB, 1TB, 1TB/yr) by Year ('06, '09, '10, '11) for trace archives: http://gwa.ewi.tudelft.nl, The Failure Trace Archive http://fta.inria.fr, Peer-to-Peer Trace Archive (P2PTA), GamTA, … PWA, ITA, CRAWDAD, …]
• 1,000s of scientists: From theory to practice

The Simplest(?) Problem: How to Access Data by the CPU/Cores?
• Computers must be able to communicate with the outside world
• Large variety of devices
  - size
  - speed
  - distance
• Timing and electrical properties are not the same as within the CPU

Single-bus structure
[Figure: Processor, Memory, and I/O devices #1 … #n attached to one Bus]

Multiple buses
[Figure: Memory connected to the Processor by the memory bus; I/O devices #1 … #n connected to the Processor by the I/O Bus]

Buses and interfaces
A bus generally contains three groups of lines:
• Data lines to transport data
• Address lines to identify devices
• Control lines that take care of correct transfer of data

Interfaces
Devices are coupled to the bus through an interface:
• Address decoder
  - to detect whether data is for the device
• Data registers
  - to store incoming and outgoing data
• Status and control registers
  - to report the status of the device
  - to control transfer

Interface organization
[Figure: Address lines, Data lines, and Control lines enter the I/O interface (Address Decoder, Data and Status registers, Control circuits), which connects to the Device]

Video terminal
[Figure: CPU connected to a Video terminal; Keyboard with DATAIN and SIN, Display with DATAOUT and SOUT]

Operation (1)
Busy waiting:
  READWAIT   Branch to READWAIT if SIN=0
             Input from DATAIN to R1
  WRITEWAIT  Branch to WRITEWAIT if SOUT=0
             Output from R1 to DATAOUT
I/O-instructions:
  Move DATAIN, R1
  Move R1, DATAOUT

Operation (2)
[Figure: registers DATAIN, DATAOUT, and IOSTATUS; in IOSTATUS, bit 1 = SIN and bit 0 = SOUT]
  READWAIT   Testbit #1, IOSTATUS
             Branch=0 READWAIT
             Move DATAIN, R1

I/O Instructions
• Memory-mapped I/O
  - the registers of the devices have addresses in the same space as main memory locations
  - normal instructions can be used
    • move DATAIN, R1
• I/O instructions
  - special instructions for I/O
    • IN device, data
    • OUT data, device

Memory and register structure
[Figure: Memory, CPU, ......, IOPROC1, IOPROC2]

Address spaces
[Figure: memory mapped: Mem, CPU, IOPROC1, and IOPROC2 share one address space numbered 0, 1, 2, …; separate address spaces: Mem, CPU, IOPROC1, and IOPROC2 are each numbered from 0]

I/O and Programming
There are two basic mechanisms for I/O:
1. Programmed I/O
2. Non-programmed I/O

Programmed I/O
• By executing a special program on the CPU
• Unconditional I/O
  - no synchronization with I/O device
• Passive signaling
  - synchronization between CPU and device by programmed interrogation by the CPU
• Active signaling
  - synchronization between CPU and device by an active interrupt from the device

Non-programmed I/O
I/O is done by a separate active entity
• Direct Memory Access (DMA)
  - some intelligence in the device takes care of data transport
• Special I/O processors

Interrupts
[Figure: a Compute routine (instructions 1 … i, i+1 … M) is interrupted after instruction i, jumps to the Print routine, and resumes at i+1 on return]

Service Routines
• I/O device alerts CPU by a hardware signal called the interrupt signal
• Usually a special line in the control group of the I/O bus is used for this: the interrupt request line
• CPU stops the program and starts executing the service routine
• Much like executing a subroutine
• Except: the interrupted program and the service routine have nothing in common!!

Handling interrupts
1. Device raises interrupt request
2. Processor interrupts program in execution
3. Interrupts are disabled
4. Device is informed of acceptance and, as a consequence, lowers interrupt
5. Interrupt is handled by service routine
6. Interrupts are enabled
7. Execution of interrupted program is resumed

Multiple devices
• How can processors distinguish devices?
• How can processors obtain the appropriate starting address of the service routine?
• Should we allow a new interrupt while another is being served?
• How do we handle simultaneous interrupts?

Interrupt line
INTR = INT1 + INT2 + .... + INTn
[Figure: devices INT1, INT2, …, INTn share one interrupt request line to the CPU]
Finding device by POLLING:
  - search for device with IRQ bit set in status register

Vectored Interrupt
• Device sends identification code on bus
• Called the interrupt vector
• Issued after GRANT signal from CPU
[Figure: interrupt request line from INT1, INT2, …, INTn to the CPU; grant line from the CPU]

Interrupt priority
[Figure: priority circuit between the CPU and INT1, INT2, …, INTn, with lines grant1, grant2, grant3]

Bus arbitration (1)
[Figure: bus release line (rel_i), interrupt request line (req_i), and grant line from the CPU]
bus is free iff: (rel_1 • rel_2 • ..... • rel_n) = 1

Bus arbitration (2)
• Request: set req_i to 1
• Acquire: if grant=1, then set req_i to 0 (interrupt once) and set rel_i to 0 (prevent others from interrupting)
• Release: set rel_i to 1
grant = (req_1 + req_2 + ..... + req_n) • (rel_1 • rel_2 • ..... • rel_n)
  - first factor: at least one request
  - second factor: bus released by all

PowerPC interrupt structure (1)
MSR = Machine State Register (bits 0 to 31)
  bit 16: EE = External interrupt enable
  bit 17: PR = Privilege level
  bit 21: SE = Single step trace exception enable
  bit 25: EP = Exception prefix
EP=0: address of service routine starts at 000001F4
EP=1: address of service routine starts at FFF001F4

PowerPC interrupt structure (2)
• PowerPC has two special Save/Restore registers: SRR0 and SRR1
• After interrupt:
  - PC is saved in SRR0
  - MSR is saved in SRR1
  - Clear interrupt enable bit in MSR

IA-32 interrupt structure (1)
Processor status register (EFLAGS), bits 31 to 0:
  bits 13-12: IOPL
  bit 11: OF
  bit 9: IF
  bit 8: TF
  bit 7: SF
  bit 6: ZF
  bit 0: CF
CF, ZF, SF, OF: condition code flags
TF: trap flag
IF: Interrupt Enable Flag
IOPL: I/O Privilege Level (4 levels)
IA-32 has two interrupt request lines

IA-32 interrupt structure (2)
• Steps when an interrupt occurs:
  1. push processor status register, current segment register (CS), and instruction pointer (EIP) onto the stack
  2. clear interrupt-enable flag if needed
  3. fetch starting address of interrupt-service routine from Interrupt Descriptor Table and load it into EIP
• At end of routine, execute IRET

Example
[Figure: keyboard interface with registers DATAIN and STATUS; in STATUS, bit 6 = IE, bit 1 = SIN, bit 0 = SOUT; the interface drives the interrupt line]

Memory Layout
[Figure: 32 K I/O space holding STATUS and DATAIN; 32 K program space holding the LINE buffer area, location 1F4 (containing the address of READ), and the routine READ]
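
The keyboard interface of the Example slide can be tried out in software before reading the assembly versions that follow. The sketch below is a toy model, not the course's code: the class KeyboardInterface and the functions read_service_routine and run are hypothetical names, and two behaviours are assumptions rather than facts from the slides (the interface requests an interrupt only while both IE and SIN are set, and reading DATAIN clears SIN). The bit positions and the carriage-return code are taken from the slides.

```python
# Toy model of the keyboard interface; all names here are hypothetical.
IE = 1 << 6    # interrupt-enable bit in STATUS (same value as INTEN = $40)
SIN = 1 << 1   # set when DATAIN holds a new character
CR = 0x0D      # carriage return marks end of line


class KeyboardInterface:
    """A STATUS register plus a one-byte DATAIN register."""

    def __init__(self):
        self.status = 0
        self.datain = 0

    def key_pressed(self, char):
        """Device side: latch a character and set SIN."""
        self.datain = ord(char)
        self.status |= SIN

    def interrupt_pending(self):
        """Assumption: an interrupt is requested only if IE and SIN are set."""
        return bool(self.status & IE) and bool(self.status & SIN)

    def read_datain(self):
        """CPU side. Assumption: reading DATAIN clears SIN."""
        self.status &= ~SIN
        return self.datain


def read_service_routine(kbd, line):
    """Store one character in the line buffer; stop at end of line."""
    code = kbd.read_datain()
    line.append(code)
    if code == CR:
        kbd.status &= ~IE      # disable further keyboard interrupts
        return True            # a complete line is ready
    return False


def run(keys):
    """Feed key presses to the interface and service each interrupt."""
    kbd = KeyboardInterface()
    line = []
    kbd.status |= IE           # initialization: enable keyboard interrupts
    done = False
    for char in keys:
        kbd.key_pressed(char)
        if kbd.interrupt_pending():
            done = read_service_routine(kbd, line) or done
    return bytes(line), done
```

Calling run("hi\rzz") returns (b"hi\r", True): once the carriage return arrives the routine clears IE, so the two later key presses no longer interrupt the program.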
PowerPC: Initialization
  INTVEC  EQU  $1F4     Interrupt vector address (location where start address of interrupt routine is stored)
  INTEN   EQU  $40      Keyboard interrupt enable and disable masks (will be stored in status register of device)
  INTDIS  EQU  0
  NEWMSR  EQU  $8000    Desired contents of MSR (external interrupt enable)
  RTRN    EQU  $0D      Code for Carriage Return (for checking end-of-line)

PowerPC: Interrupt Processing (1)
  START   ADDI    R2,0,READ        Get address of service routine and store at interrupt vector location
          STW     R2,INTVEC(0)
          ADDI    R2,0,LINE        Get address of LINE and store at PNTR
          STW     R2,PNTR(0)
          ADDI    R2,0,INTEN       Store interrupt enable in STATUS register
          STW     R2,STATUS(0)

PowerPC: Interrupt Processing (3)
          ADDI    R2,0,NEWMSR      Store new MSR in SRR1
          MTSRR1  R2
          ADDI    R2,0,MAIN        Store new PC in SRR0
          MTSRR0  R2
          RFI                      Return From Interrupt (use new MSR and PC)

PowerPC: Program (1)
  MAIN    <main program>
  PNTR    .....
  READ    .....                    Save registers
          LBZ     R30,DATAIN(0)    Get input character
          LWZ     R31,PNTR(0)      Load value at PNTR
          STBU    R30,1(R31)       Store character in buffer
          STW     R31,PNTR(0)      Update PNTR for next character

PowerPC: Program (2)
  EOL     CMPWI   CR1,R30,RTRN     Check for CR (end of line)
          BNE     CR1,DONE         next character
          ADDI    R2,0,INTDIS      Store interrupt disable in STATUS register
          STW     R2,STATUS(0)
          BL      TEXT             Call subroutine for dealing with line
  DONE    ....                     Restore saved registers
          RFI                      Return from interrupt

IA-32: Program (1)
  MAIN:   MOV     EOL,0            not yet end of line
          MOV     BL,4             set keyboard interrupt enable
          OR      CONTROL,BL
          STI                      set interrupt flag in processor register
  READ:   PUSH    EAX              save registers
          PUSH    EBX
          MOV     EAX,PNTR         load address pntr
          MOV     BL,DATAIN        get input, store it, and increment pntr
          MOV     [EAX],BL
          INC     DWORD PTR PNTR

IA-32: Program (2)
          CMP     BL,0DH           char=end of line?
          JNE     RTRN             no
          MOV     BL,4             yes, so disable interrupts and set EOL flag
          XOR     CONTROL,BL
          MOV     EOL,1
  RTRN:   POP     EBX              restore registers
          POP     EAX
          IRET                     return from interrupt

Other interrupts
• Not only I/O devices can cause interrupts
• Recovery from errors, e.g.:
  - illegal OP code used
  - division by 0
• Debugging
• Privilege exception

Operating Systems (1)
• In general, interrupts are controlled by the Operating System
• CPU can be in user mode or supervisor mode
• Privileged instructions are only allowed in supervisor mode
  - starting of I/O operations
  - setting of priorities
  - setting of clock values

Operating Systems (2)
• Process: program in execution
  - Program
  - Data
  - Status: PC, Registers, etc
• State of a process:
  - Running
  - Runnable (waiting for CPU)
  - Blocked (waiting for something else)
• Multi-tasking
  - Multiple tasks in execution
• Time-slicing
  - Divide time across processes

Operating Systems (3)
• Context switch: change of processes
• After clock interrupt: dispatcher chooses suitable process
• Device drivers: service routines for devices
• System Call: call to OS service routine
  - printf ("%d\n",a)
  - fscanf (file,"%d\n",&a)

OS init, services, scheduler
OSINIT
  Set interrupt vectors (set addresses for dealing with these interrupts):
    Time slice clock <- SCHEDULER
    Trap <- OSSERVICES
    VDT interrupts <- IODATA
  ...
OSSERVICES
  Examine stack to determine request
  Call appropriate routine
SCHEDULER
  Save current context
  Select runnable process
  Context switch
  Restore saved context of new process
  Return from interrupt

I/O routines
IOINIT
  Set process status to blocked
  Initialize memory buffers
  Call device driver to initialize device (e.g., VDT)
  Return from subroutine
IODATA
  Poll devices to determine source of interrupt (e.g., VDT)
  Call appropriate driver
  if END=1 then set process to runnable
  Return from interrupt

VDT driver (e.g., Keyboard)
VDTINIT
  Initialize device interface (e.g., baud rate)
  Enable interrupts
  Return from subroutine
VDTDATA
  Check device status
  If ready then transfer character
  If character = CR (check end-of-line)
    then set END=1; Disable interrupts
    else set END=0
  Return from subroutine

Direct Memory Access
More “intelligent” device interface
[Figure: DMA controller registers: Start address, Wordcount, and Status & Control (bits 31 … 0) with fields IRQ (interrupt request), IE, R/W, and Done]

Direct Memory Access to Physical Devices
Bus priority modes:
  - Cycle stealing: DMA > CPU
  - Burst: DMA exclusive
[Figure: Processor, Memory, System bus, DMA/Disk controller (with BUFR), DMA controller, DMA Ch. 1, DMA Channel 2, Disk1, Disk2, Network Interface]

Cell/B.E.: A Modern DMA Use
• 1 x PPE 64-bit PowerPC
  - L1: 32 KB I$ + 32 KB D$
  - L2: 512 KB
• 8 x SPE cores:
  - Local mem (LS): 256 KB
  - 128 x 128 bit vector registers
• Peak performance
  - ~200 GFLOPS for all SPEs
  - ~240 GFLOPS total
• Main memory access:
  - PPE: Rd/Wr
  - SPEs: Async DMA

Bus structures
Specification of bus
• Number of data lines
• Size of address space
• Multiplexing discipline
• Control structure
• Synchronous versus asynchronous
• Physical properties: connectors, pinning, electrical properties

NVIDIA G80/GT200/Fermi: I/O as Performance Bottleneck
[Figure: G80 and GT200 block diagrams]
• SM = streaming multiprocessor
• 1 SM = 8 SP (streaming proc/CUDA cores)
• 1 TPC = 2 x SM / 3 x SM = thread processing clusters
Per chip: 1+ TFLOPS; I/O: 2.5GB/s (1:400)

Synchronous Bus
[Figure: timing diagram with Bus clock, Address, and Data]
Clock “slow enough” for all connected devices

Asynchronous Bus (1)
Explicit handshaking: Input Cycle
[Figure: timing diagram with Address (to allow for skew), Ready (set by CPU), Accept (set by device), Data (from device)]

Asynchronous Bus (2)
Explicit handshaking: Output Cycle
[Figure: timing diagram with Address, Ready, Accept, Data (from CPU)]

SCSI bus
• Small Computer System Interface (SCSI)
• ANSI standard X3.131
• Up to 25 meters
• 50-wire cable
• Up to 8 (16) devices connected to bus
• A connection has an initiator and a target
• Target controls data transfer

SCSI-based Computer System
[Figure: Processor, Memory, Printer (Par. intface), Terminal (Ser. intface), processor bus, SCSI controller, SCSI bus, Disk controller (Disk1, Disk2), CD-ROM controller (CD ROM drive)]

SCSI bus signals
• Data: DB(0),..., DB(7)
• Parity: DB(P)
• Phase: BSY, SEL
• Information type: C/D, MSG (control/message)
• Handshake: REQ, ACK
• Direction: I/O
• Other: ATN, RST
• Data lines used for identifying bus controllers
• Signals are active in the low-voltage state

Typical sequence
[Figure: timing diagram of -DB2 (initiator 2 retreats), -DB5 (target), -DB6 (initiator 6 wins), -BSY, and -SEL across the phases bus free, arbitration, selection]
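
The arbitration phase of the typical sequence can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model of the full protocol: the function name arbitrate is hypothetical, only the 8-bit bus (IDs 0 to 7) is covered, and BSY, SEL, and all timing are left out. It relies on the SCSI rule that each contender asserts the data line matching its own ID and that the highest ID wins, which is why initiator 2 retreats and initiator 6 wins in the figure.

```python
def arbitrate(contenders):
    """Return the winning SCSI ID among devices contending for a free bus.

    Each contender asserts its own data line DB(id). A device that sees a
    higher-numbered data line asserted retreats; the one that sees none wins.
    Returns None when nobody requests the bus.
    """
    if not contenders:
        return None

    data_lines = 0
    for device_id in contenders:
        if not 0 <= device_id <= 7:
            raise ValueError("8-bit SCSI IDs range from 0 to 7")
        data_lines |= 1 << device_id      # device asserts DB(device_id)

    # Every device inspects the lines above its own ID.
    for device_id in contenders:
        higher_lines = data_lines >> (device_id + 1)
        if higher_lines == 0:             # nobody with a higher ID is asking
            return device_id
    return None                           # not reached for a non-empty list
```

With the contenders from the figure, arbitrate([2, 6]) returns 6. The winner then goes on to the selection phase, where it addresses the target.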