Dr A Sahu
Dept of Comp Sc & Engg.
IIT Guwahati

• NIC cards
  – Registers TX/RX, statistics counters
  – Network device driver (skeleton)
• Kernel counters
  – Jiffies, RTC, kernel timer
• File system, block devices
  – An introduction

• There will not be any class in Diwali week
• Wishing you a happy and safe Diwali
• The assignment will be uploaded to the course website before this weekend
• Deadline of assignment: 13 Nov 2010
• The assignment carries 5 marks
• You have to show the demo on your lab machine
  – No demo, no marks
  – You will not get any marks by simply submitting the assignment

• Memory-information registers
  – TDBA(L/H) = Transmit-Descriptor Base-Address Low/High (64 bits)
  – TDLEN = Transmit-Descriptor array Length
  – TDH = Transmit-Descriptor Head
  – TDT = Transmit-Descriptor Tail
• Transmit-engine control registers
  – TXDCTL = Transmit-Descriptor Control Register
  – TCTL = Transmit Control Register
• Notification timing registers
  – TIDV = Transmit Interrupt Delay Value
  – TADV = Transmit-interrupt Absolute Delay Value

• Memory-information registers
  – RDBA(L/H) = Receive-Descriptor Base-Address Low/High (64 bits)
  – RDLEN = Receive-Descriptor array Length
  – RDH = Receive-Descriptor Head
  – RDT = Receive-Descriptor Tail
• Receive-engine control registers
  – RXDCTL = Receive-Descriptor Control Register
  – RCTL = Receive Control Register
• Notification timing registers
  – RDTR = Receive-interrupt packet Delay Timer
  – RADV = Receive-interrupt Absolute Delay Value

• The 82573L has several dozen statistical counters that automatically keep track of significant events affecting the Ethernet controller's performance
• Most are 32-bit 'read-only' registers, and they are automatically cleared when read
• Your module's initialization routine could read them all (to start counting from zero)
• The statistical counters all have address offsets in the range 0x04000-0x04FFF
• You can use a very simple program loop to 'clear' each of these read-only registers:

    // Here 'io' is the virtual base-address
    // of the NIC's i/o-memory region
    {
        int r;

        // clear all of the Pro/1000 controller's statistical counters
        // (reading a counter clears it; registers are 4 bytes apart)
        for (r = 0x4000; r < 0x4FFF; r += 4)
            ioread32(io + r);
    }

    Offset   Name      Description
    0x4000   CRCERRS   CRC Errors Count
    0x400C   RXERRC    Receive Error Count
    0x4014   SCC       Single Collision Count
    0x4018   ECOL      Excessive Collision Count
    0x4074   GPRC      Good Packets Received
    0x4078   BPRC      Broadcast Packets Received
    0x407C   MPRC      Multicast Packets Received
    0x40D0   TPR       Total Packets Received
    0x40D4   TPT       Total Packets Transmitted
    0x40F0   MPTC      Multicast Packets Transmitted
    0x40F4   BPTC      Broadcast Packets Transmitted

• loopback.c, plip.c and e100.c are examples of network drivers (in /drivers/net/)
• Device registration:
  – Allocate the net devices (request resources and offer facilities)

    struct net_device *snull_dev[2];   // <linux/netdevice.h>
    snull_dev[0] = alloc_netdev(sizeof(struct snull_priv), "sn%d", snull_init);
    alloc_etherdev(int sizeof_priv);   // wrapper around alloc_netdev()

  – After initialization is complete, register the devices

    register_netdev(snull_dev[i]);     // returns 0 on success, a negative error code on failure

• Private data:

    struct snull_priv *priv = netdev_priv(dev);

    struct snull_priv {
        struct net_device_stats stats;
        int status;
        struct snull_packet *ppool;
        struct snull_packet *rx_queue;   /* list of incoming packets */
        int rx_int_enabled;
        int tx_packetlen;
        u8 *tx_packetdata;
        struct sk_buff *skb;
        spinlock_t lock;
    };

• Initialization

    priv = netdev_priv(dev);
    memset(priv, 0, sizeof(struct snull_priv));
    spin_lock_init(&priv->lock);
    snull_rx_ints(dev, 1);   /* enable receive interrupts */

• Global information
  – name: name of the device
  – state: state of the device
  – struct net_device *next: pointer to the next device in the global list
  – init function: an initialization function called by register_netdev()
• Hardware information
• Interface information
• Device methods
• Low-level hardware information
  – base_addr: I/O base address of the network interface
  – irq (dev->irq): the assigned interrupt number (as shown by ifconfig)
  – unsigned char if_port: the port in use on a multiport device (e.g., 10Base2 coax vs. 10BaseT twisted pair)
  – unsigned char dma: the DMA channel allocated by the device (ISA bus only)
  – Device memory information: address of shared memory used by the device
    – rmem (receive memory), mem (transmit memory)
    – rmem_start, rmem_end, mem_start, mem_end
• The init function sets up most of this information, but device-specific fields must be set up later
• Non-Ethernet interfaces can use helper functions
  – Fibre Channel (fc_setup), LocalTalk (ltalk_setup), Fiber Distributed Data Interface (fddi_setup), token ring, High-Performance Parallel Interface (hippi_setup)
• Non-default interface fields
  – hard_header_len, mtu (maximum transfer unit; 1500 octets for Ethernet), tx_queue_len (1000 for Ethernet, 10 for plip), unsigned short type, unsigned char addr_len, unsigned char dev_addr[MAX_ADDR_LEN], broadcast[MAX_ADDR_LEN]
• flags: bit masks such as IFF_LOOPBACK, IFF_DEBUG, IFF_NOARP, IFF_MULTICAST
• Special hardware capabilities the device has (e.g., DMA)
• Fundamental methods
  – open, stop, hard_start_xmit
  – hard_header, rebuild_header
  – tx_timeout, get_stats (returns struct net_device_stats), set_config
• Optional methods
  – poll, poll_controller, do_ioctl, set_multicast_list
  – set_mac_address, change_mtu, header_cache, header_cache_update, hard_header_parse
• Utility fields (not methods)
  – trans_start, last_rx, watchdog_timeo, *priv, mc_list, mc_count, xmit_lock, xmit_lock_owner

[Figure: PIT, timer interrupt ISR, do_timer(), jiffies (a global timing counter variable), user-space timing]

• Accurate timing is crucial for many aspects of an OS
  – Device-related timeouts
  – File timestamps (created, accessed, written)
  – Time of day (gettimeofday())
  – High-precision timers (code profiling, etc.)
  – Scheduling, CPU usage, etc.
• Intel timer hardware
  – RTC: Real Time Clock
  – PIT: Programmable Interval Timer
  – TSC: TimeStamp Counter (cycle counter)
  – Local APIC timer: per-CPU alarms
• Timer implementations
  – Kernel timers (dynamic timers)
  – User "interval" timers (alarm(), setitimer())
• Timing measurements are needed to:
  – Keep track of the current time and date for use by, e.g., gettimeofday()
  – Maintain timers that notify the kernel or a user program that an interval of time has elapsed
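The user "interval" timers listed above are the user-space way to be notified that an interval of time has elapsed. Below is a minimal sketch, assuming a POSIX system: it arms a one-shot ITIMER_REAL with setitimer() and sleeps until SIGALRM arrives. The function names wait_one_shot_timer() and on_alarm() are hypothetical, chosen only for illustration.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t timer_fired = 0;

/* Hypothetical SIGALRM handler: only records that the interval elapsed. */
static void on_alarm(int sig)
{
    (void)sig;
    timer_fired = 1;
}

/* Arm a one-shot ITIMER_REAL for `usec` microseconds (must be > 0) and
 * sleep until SIGALRM arrives. Returns 1 if the timer fired, -1 on error. */
int wait_one_shot_timer(long usec)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    struct itimerval it;

    if (usec <= 0)                      /* a zero it_value would disarm the timer */
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* Block SIGALRM before arming, so the signal cannot slip in between
     * the "fired?" check and going to sleep. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    memset(&it, 0, sizeof it);          /* it_interval = 0: one-shot, no reload */
    it.it_value.tv_sec  = usec / 1000000;
    it.it_value.tv_usec = usec % 1000000;
    timer_fired = 0;
    if (setitimer(ITIMER_REAL, &it, NULL) != 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    wait_mask = old;
    sigdelset(&wait_mask, SIGALRM);
    while (!timer_fired)
        sigsuspend(&wait_mask);         /* atomically unblock SIGALRM and sleep */

    sigprocmask(SIG_SETMASK, &old, NULL);   /* restore the caller's mask */
    return 1;
}
```

For example, wait_one_shot_timer(50000) returns after roughly 50 ms. Blocking SIGALRM before arming the timer and waiting with sigsuspend() avoids the lost wakeup a plain pause() would suffer if the signal arrived just before the call.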
• Timing measurements are performed by several hardware circuits based on fixed-frequency oscillators and counters
• The kernel keeps time by reading a clock device (oscillator) and maintaining a kernel variable with the current time
• The current time is accessible to user-mode programs via system calls
• gettimeofday() is the usual interface to the current time maintained by the system
• The same clock is also used to determine when the currently running process should be removed from the CPU to let others run
• It is also used to keep track of the amount of time a process runs in user or supervisor mode

    #include <sys/time.h>

    struct timeval theTime;
    gettimeofday(&theTime, NULL);

    // Definition of struct timeval:
    struct timeval {
        time_t      tv_sec;    /* seconds */
        suseconds_t tv_usec;   /* microseconds */
    };

• The date command prints the time according to the Gregorian calendar
• The clock ISR
  – timer_interrupt() in arch/i386/kernel/time.c calls
  – do_timer() in kernel/sched.c
• do_timer() increments the kernel variable jiffies each time it runs
• do_timer() then marks TIMER_BH (a bottom half) for execution in ret_from_sys_call
• For the system time, the timer bottom half uses the current value of jiffies to compute the current time. It stores the value in struct timeval xtime, which kernel functions such as sys_gettimeofday() can read
• Real-Time Clock (RTC):
  – Often integrated with CMOS RAM on a separate chip from the CPU, e.g., Motorola 146818
  – Issues periodic interrupts on IRQ line 8 at a programmed frequency (e.g., 2-8192 Hz)
  – In Linux, used to derive the time and date
  – The kernel accesses the RTC through I/O ports 0x70 and 0x71
• Intel Pentium (and later), AMD K6, etc. incorporate a TSC
• The processor's CLK pin receives a signal from an external oscillator, e.g., a 400 MHz crystal
• The TSC register is incremented at each clock signal
• The rdtsc assembly instruction reads the 64-bit timing value
• This is the most accurate timing method on the above platforms
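The gettimeofday() call above returns a struct timeval; measuring an interval means subtracting two of them, borrowing one second from tv_sec when the microsecond difference goes negative. A minimal sketch, assuming both values are normalized (0 <= tv_usec < 1000000); tv_sub(), tv_to_us() and time_call_us() are hypothetical helper names, and tv_sub() does what the BSD timersub() macro does.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <sys/time.h>

/* Normalized difference a - b (assumes a >= b, both normalized). */
struct timeval tv_sub(const struct timeval *a, const struct timeval *b)
{
    struct timeval d;
    d.tv_sec  = a->tv_sec  - b->tv_sec;
    d.tv_usec = a->tv_usec - b->tv_usec;
    if (d.tv_usec < 0) {        /* borrow one second */
        d.tv_sec  -= 1;
        d.tv_usec += 1000000;
    }
    return d;
}

/* Convert a timeval to a single count of microseconds. */
long long tv_to_us(const struct timeval *t)
{
    return (long long)t->tv_sec * 1000000LL + t->tv_usec;
}

/* Wall-clock microseconds spent in fn(), measured with gettimeofday().
 * Can be wrong (even negative) if the clock is set during the call. */
long long time_call_us(void (*fn)(void))
{
    struct timeval t0, t1, d;
    gettimeofday(&t0, NULL);
    fn();
    gettimeofday(&t1, NULL);
    d = tv_sub(&t1, &t0);
    return tv_to_us(&d);
}
```

For example, subtracting {1 s, 900000 us} from {2 s, 100 us} gives {0 s, 100100 us}, i.e. 100100 us. Because gettimeofday() reports wall-clock time, which settimeofday() can change, interval measurements are usually better taken with clock_gettime(CLOCK_MONOTONIC).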
• Programmable Interval Timers (PITs):
  – e.g., the 8254 chip (already discussed)
• The PIT issues timer interrupts at a programmed frequency
• In Linux, the PC's 8254 is programmed to interrupt HZ (=100) times per second on IRQ 0
  – HZ is defined in <linux/param.h>
  – The PIT is accessed on ports 0x40-0x43
• Provides the system "heartbeat" or "clock tick"
• unsigned long volatile jiffies;
• A global kernel variable (used by the scheduler)
• Initialized to zero when the system reboots
• Incremented during each timer interrupt, so it counts 'clock ticks' since the CPU restarted
• The 'tick frequency' is a 'configuration' option
• With HZ = 100, a 32-bit jiffies won't overflow for about 16 months (2^32 / 100 s ≈ 497 days)
• The Linux kernel was modified to 'fix' the overflow
• Now the declaration is in <linux/jiffies.h>:
      unsigned long long jiffies_64;
  and do_timer() contains a new statement:
      (*(u64 *)&jiffies_64)++;
• jiffies is incremented on every timer interrupt
  – It is the number of clock ticks since the OS was booted
• Scheduling and preemption are done at the granularity of time slices calculated in units of jiffies
• On every timer interrupt:
  – Update jiffies
  – Determine how long a process has been executing, and preempt it if it has used up its allocated time slice
  – Update resource-usage statistics
  – Invoke functions for elapsed interval timers
• A signal on IRQ 0 is generated:
• timer_interrupt() is invoked with interrupts disabled (the SA_INTERRUPT flag is set to denote this)
• do_timer() is ultimately executed:
  – It simply increments jiffies and leaves other tasks to "bottom half" handlers
  – Bottom-half (bh) handlers update the time and date and statistics, execute functions after specific elapsed intervals, and invoke schedule() if necessary to reschedule processes
• lost_ticks (lost_ticks_system) stores the total (system-mode) "ticks" since the last update of xtime, which holds the approximate current time. This is needed because bh handlers run at a convenient later time, so the kernel must keep track of how many ticks elapsed before they ran to update the date and time accurately
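The overflow arithmetic above, and the usual way to compare tick counts safely across a wrap, can be checked in user space. A sketch assuming a 32-bit counter: time_after32() is modelled on the kernel's time_after(a, b) macro, which tests ((long)((b) - (a)) < 0); the type jiffies32_t and all helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit jiffies value (unsigned long on i386). */
typedef uint32_t jiffies32_t;

/* Wrap-safe "a is later than b", modelled on the kernel's time_after(a, b).
 * Correct while a and b are less than 2^31 ticks apart. */
static inline int time_after32(jiffies32_t a, jiffies32_t b)
{
    return (int32_t)(uint32_t)(b - a) < 0;
}

/* Naive comparison: gives the wrong answer once the counter wraps. */
static inline int naive_after32(jiffies32_t a, jiffies32_t b)
{
    return a > b;
}

/* Whole days until a 32-bit tick counter wraps at `hz` ticks per second. */
static inline unsigned long long days_until_wrap(unsigned int hz)
{
    return (1ULL << 32) / hz / 86400ULL;
}
```

days_until_wrap(100) gives 497 days (about 16 months) and days_until_wrap(1000) gives 49 days. Just after a wrap, naive_after32(5, 0xFFFFFFF0) wrongly reports tick 5 as earlier, while time_after32(5, 0xFFFFFFF0) correctly reports it as later.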
• TIMER_BH refers to the queue of bottom halves invoked as a consequence of do_timer()
• Declare a timer:
      struct timer_list mytimer;
• Initialize this timer:
      init_timer(&mytimer);
      mytimer.function = mytimeraction;                  /* void mytimeraction(unsigned long data) */
      mytimer.data     = (unsigned long)mydata;          /* passed to the handler */
      mytimer.expires  = jiffies + <number-of-jiffies>;  /* absolute expiry time */
• Install this timer:
      add_timer(&mytimer);
• Modify this timer:
      mod_timer(&mytimer, <jifs>);   /* <jifs> is the new absolute expiry time */
• Delete this timer:
      del_timer(&mytimer);
• Delete it safely (also waits for a handler running on another CPU to finish):
      del_timer_sync(&mytimer);
• RTC
  – Battery backed (packaged with CMOS RAM)
  – Registers to access the current date/time (ports 0x70, 0x71)
  – Includes a programmable timer (2-8192 Hz)
  – Accessible as /dev/rtc
  – Sampled by the kernel only at startup
  – Set by the "clock" command (synced at shutdown)
• TSC (time stamp counter; an MSR, model-specific register)
  – 64-bit counter that increments at CPU cycle speed
  – Accessible from user space via the assembly instruction rdtsc
  – Provides high-resolution timing capability
  – The kernel determines its frequency at boot (calibrate_tsc())
• PIT
  – Heartbeat timer; drives the timer interrupt (tick)
  – 100 Hz on PC; 1024 Hz on fast chips (Alpha, Itanium)
  – Patches exist to change the clock speed via /proc
  – jiffies: number of ticks since boot
  – xtime: struct with seconds and microseconds since Jan 1, 1970 (the "epoch")
• CPU local (APIC) timers
  – When available, do per-CPU timing (e.g., the scheduling quantum)
  – If not available, this timing is driven by the PIT
  – 32 bits (instead of the PIT's 16 bits), so lower frequencies are possible
  – Decrements in multiples of bus cycles (1, 2, 4, 8, ..., 128)
• xtime.tv_sec, xtime.tv_usec
  – Seconds (and microseconds) since Jan 1, 1970
• update_times()
  – wall_jiffies: value of jiffies at the last xtime update
  – update_wall_time(ticks)   // handles usec wrap
  – calc_load(ticks)          // load average

    void update_times(void)
    {
        unsigned long ticks;

        write_lock_irq(&xtime_lock);
        ticks = jiffies - wall_jiffies;   /* ticks since the last xtime update */
        if (ticks) {
            wall_jiffies += ticks;        /* xtime is now up to date */
            update_wall_time(ticks);
        }
        write_unlock_irq(&xtime_lock);
        calc_load(ticks);
    }

• Checking CPU resource limits
  – Update user- and kernel-mode ticks for times()
  – per_cpu_utime, per_cpu_stime
  – Over the CPU limit? Send SIGXCPU, then SIGKILL
• Updating system load averages (below 1.0 is good)
  – Average number of tasks in the run queue over the last 1, 5 and 15 minutes
  – Includes UNINTERRUPTIBLE tasks (but not pid 0)
• Kernel profiling
  – Samples eip on each timer interrupt
  – Activated by the kernel boot option profile=
  – Results exported via /proc/profile (readprofile command)
• NMI watchdogs (detecting system freezes)
  – Clever use of the APIC to detect freezes (failure to re-enable interrupts)
  – Broadcast an NMI periodically and check that the interrupt count keeps increasing
• gettimeofday(): seconds, microseconds
  – Adds the delay since the last bottom half (xtime update)
  – and the delay since the last interrupt (jiffies update)
  – Samples the TSC, if available, for high precision
• settimeofday(): updates xtime (not the RTC!); requires root
• adjtimex(): gradual clock-time change
• alarm(), setitimer()
  – User-mode interval timers
  – Three different timers (ITIMER_REAL, ITIMER_VIRTUAL, ITIMER_PROF)
• Block devices (disk)
  – Sector, inode
• File systems (operations)
  – Read/write, open, close, lseek, type
• Virtual File System (VFS): the kernel component that handles file systems, directory and file access
• Abstracts the common tasks of many file systems
• Presents the user with a unified interface via the file-related system calls (open, stat, chmod, etc.)
• File-system-specific operations are vectored to the file system in charge of the file
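Vectoring file-system-specific operations to the file system in charge of a file is essentially dispatch through a table of function pointers selected by mount point. A toy user-space sketch, assuming a simplified mount table; struct toy_fs_ops, struct toy_mount, vfs_read() and the two read functions are hypothetical and far simpler than the kernel's real VFS structures.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-file-system operation vector (toy version). */
struct toy_fs_ops {
    const char *fs_name;
    int (*read)(const char *path, char *buf, size_t len);
};

/* Hypothetical mount-table entry: a mount point and the fs in charge of it. */
struct toy_mount {
    const char *mount_point;
    const struct toy_fs_ops *ops;
};

/* Toy "reads": report which file system handled the request. */
static int ext2_read(const char *path, char *buf, size_t len)
{
    return snprintf(buf, len, "ext2:%s", path);
}

static int iso9660_read(const char *path, char *buf, size_t len)
{
    return snprintf(buf, len, "iso9660:%s", path);
}

static const struct toy_fs_ops ext2_ops    = { "ext2",    ext2_read };
static const struct toy_fs_ops iso9660_ops = { "iso9660", iso9660_read };

/* Root fs plus a CD-ROM, as after: mount -t iso9660 -o ro /dev/cdrom /mnt/cdrom */
static const struct toy_mount mounts[] = {
    { "/",          &ext2_ops },
    { "/mnt/cdrom", &iso9660_ops },
};

/* Does `path` lie under `mp`? "/" matches every absolute path; other mount
 * points must end at a path-component boundary. */
static int under_mount_point(const char *path, const char *mp)
{
    size_t n = strlen(mp);
    if (strncmp(path, mp, n) != 0)
        return 0;
    return n == 1 || path[n] == '\0' || path[n] == '/';
}

/* Generic entry point: pick the longest matching mount point and dispatch
 * through its operation vector. Returns -1 if no file system is in charge. */
int vfs_read(const char *path, char *buf, size_t len)
{
    const struct toy_fs_ops *best = NULL;
    size_t best_len = 0;

    for (size_t i = 0; i < sizeof mounts / sizeof mounts[0]; i++) {
        size_t n = strlen(mounts[i].mount_point);
        if (under_mount_point(path, mounts[i].mount_point) && n > best_len) {
            best = mounts[i].ops;
            best_len = n;
        }
    }
    return best ? best->read(path, buf, len) : -1;
}
```

With this table, vfs_read("/mnt/cdrom/readme", ...) is routed to the iso9660 operations, while "/mnt/cdromx" falls back to the root file system because a match must end at a path-component boundary. The real kernel resolves path names component by component through dentries and mount structures rather than by comparing string prefixes.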
• $ mount -t iso9660 -o ro /dev/cdrom /mnt/cdrom
• Steps involved:
  – Find the file system (file_systems list)
  – Find the VFS inode of the directory that is to be the new file system's mount point
  – Allocate a VFS superblock and call the file-system-specific read_super function
• Operations for block devices
• In include/linux/fs.h:

    struct block_device_operations {
        int (*open) (struct inode *, struct file *);
        int (*release) (struct inode *, struct file *);
        int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
        int (*check_media_change) (kdev_t);
        int (*revalidate) (kdev_t);
        struct module *owner;
    };

• In include/linux/blkdev.h:

    typedef void (request_fn_proc) (request_queue_t *q);

• Provides common functionality for all block devices in Linux
  – Uniform interface (to the file system), e.g., bread(), block_prepare_write(), block_read_full_page(), ll_rw_block() (low level)
  – Buffer management and disk caching
  – Block I/O request scheduling
• Generates and queues actual I/O requests in a request queue (per device)
  – The individual device driver services this queue (likely interrupt driven)
• Generic block device layer
  – Generates and queues the I/O request
  – If the request queue is initially empty, schedules a plug_tq tasklet into the tq_disk task queue
• Asynchronous run of the task queue tq_disk
  – Run in a few places (e.g., in kswapd)
  – Takes a request from the queue and calls the request_fn function:
        q->request_fn(q);
  – to service all I/O requests in the queue
• Typical interrupt-driven procedure
  – Service the first request in the queue
  – Set up the hardware so it raises an interrupt when it is done
  – Return
• Interrupt handler tasklet
  – Remove the just-finished request from the queue
  – Re-enter the request service routine (to service the next one)
• Device operation structure:

    static struct block_device_operations xxx_fops = {
        open:               xxx_open,
        release:            xxx_release,
        ioctl:              xxx_ioctl,
        check_media_change: xxx_check_change,
        revalidate:         xxx_revalidate,
        owner:              THIS_MODULE,
    };

• Block device driver – 1 class (Lect 36)
• Creative Sound Blaster – 1 class (Lect 37)
• USB 2.0 – 2 classes (Lect 38-39)
• Summary after mid-semester & question patterns – last class (Lect 40)