Control Update: Focus on PlanetLab Integration and Booting
Fred Kuhns (fredk@arl.wustl.edu)
Applied Research Laboratory, Washington University in St. Louis

Documents
• Control documentation: http://www.arl.wustl.edu/projects/techX/ppt/
  – This presentation
    • http://www.arl.wustl.edu/projects/techX/ppt/ControlUpdate.ppt
  – SRM interface
    • http://www.arl.wustl.edu/projects/techX/ppt/srm.ppt
  – RMP interface
    • http://www.arl.wustl.edu/projects/techX/ppt/rmp.ppt
  – SCD interface (ingress, egress and npe)
    • http://www.arl.wustl.edu/projects/techX/ppt/scd.ppt
• Datapath documentation: http://www.arl.wustl.edu/projects/techX/design/SPP/
  – NAT overview (Interface??)
    • http://www.arl.wustl.edu/projects/techX/design/SPP/SPP_V1_NAT_design.ppt
  – FlowStats (Interface??)
    • http://www.arl.wustl.edu/projects/techX/design/SPP/FlowStats_Control.ppt

Traditional View of a PlanetLab Node
• Linux OS with vservers: system services run in the "root" VM, slices in per-slice VMs (VM1 ... VMN) over a Virtual Machine Monitor (VMM) on a general purpose PC (CPU, DRAM, disk, NIC) attached to the Internet as host.domain (A.B.C.D).
• PlanetLab node record: site, owner, model, ssh_host_key, groups; Host = XXX, Domain = YYY, IPAddress = A.B.C.D
• System services:
  – pl_netflow
  – sirius: brokerage service
  – stork: environmental service
  – CoMon: monitoring and discovery
• Resource model
  – focused on PCs with single device instances (CPU, NIC)
  – standard Linux/UNIX tools to measure utilization
  – homogeneous environment with a single VMM to manage all VM instances on a platform
  – local node manager interface through the loopback interface
• A user requests a slice on a set of distributed nodes
  – assigned a VM instance on each node
  – Fedora Linux environment
  – per-slice flowstats

An SPP Node
[Diagram: an SPP/PlanetLab node record (site, owner, model, ssh_host_key, groups; Host = XXX, Domain = YYY, IPAddress = A.B.C.D) now spans multiple boards. GPE1 and GPE2 each run the Node/System Manager services plus slice VMs (VM1 ... VMX-1 and VMX ... VMN) over a VMM on general purpose PC hardware (NICs, disk, DRAM, CPUs). A CP carries control traffic; the HUB interconnects boards with 1GbE control (Base) and 10GbE data (Fabric) switches. NPEs host per-slice fast paths (e.g., vm1:fast path1, vmX-1:fast path1, vmY:fast path2, vmN:fast path1). The Line Card provides NAT, FwdDB/Filters and the datapath to the external interface, visible to the Internet as spp_host.domain (A.B.C.D).]

Challenges
• Provide the standard PlanetLab slice environment
  – configure and boot individual GPEs with standard PlanetLab software, supporting the standard operational environment
• Support standard interfaces
  – boot manager
  – node manager's internal and external interfaces
  – resource monitoring
• Create an interface for allocating and managing fast paths
  – allocate/free NPE resources
  – manage meta-interface mappings to the externally visible IP address and UDP port
  – slice control of allocated fast-path resources
SPP Node External Interfaces
[Diagram: the Hub interconnects all boards with a Fabric Ethernet switch (10 Gbps, data path) and a Base Ethernet switch (1 Gbps, control); the Shelf manager is reached over I2C (IPMI). The CP runs ntp, the PLCAPI proxy, the System Resource Manager (SRM) and node manager (GNM), httpd, the SLM, FlowStats (xmlrpc), sshd*, and the PXE, dhcpd and tftp servers; it holds the Resource DB, Slice DB, node DB (nodeconf.xml), flowDB, sliceDB, user info/home dirs and /var/www/. The Line Card (10x1G/1x10G external interfaces, RTM) runs the ingress and egress SCDs and NATD on its XScales, with TCAMs on NPU-A and NPU-B. Each NPE runs an SCD on its XScale (NPU-A/NPU-B, SPI/PCI interfaces). Each GPE hosts user slivers, pl_netflow, vnet, the RMP and the NMP.
Boot files on the CP: /etc/dhcpd.conf and ethers; tftpboot: bootcd.img, overlay_gpeX.img, pxelinux.0, pxelinux.cfg/{C0A82031, C0A82041}; overlay.img: plnode.txt, plc_config, ethers, spp_conf.txt, spp_netinit.py, server*, certs.]

Software Components
• Control Processor (CP):
  – Boot and Configuration Control (BCC): node configuration, management and local state management (DB)
    • httpd, dhcpd, tftp and PXE server for GPE and NPE boards; maintains config files
    • Boot CD and distribution file management (overlay images, RPM and tar files) for GPEs and CP
    • PLCAPI proxy (plc_api) and system-level BootManager (part of gnm)
  – System Resource Manager (SRM): centralized resource management
    • responsible for all resource allocation decisions and maintaining dynamic system state
    • delegates local operations to individual board-level managers
  – System Node Manager (SNM, aka GNM): "top half" of the PlanetLab node manager
  – Slice login manager (SLM) and ssh forwarding (modified sshd) -- Ritun
  – Flow Statistics (FS): aggregates pl_netflow data and translates NAT records
  – Sets default (static) routes in the line card
  – What about dynamic route management (BGP/OSPF/RIP)? For now assume a single next-hop router for all routes.
• General purpose Processing Element (GPE)
  – Local Boot Manager (LBM): modified PlanetLab BootManager running on the GPEs
  – Resource Manager Proxy (RMP)
  – Node Manager Proxy (NMP), lower half of PlanetLab's node manager
• Network Processor Element (NPE)
  – Substrate Control Daemon (SCD):
    • manages all NPE resources and provides mappings from slice to global name spaces
  – Kernel module to read/write memory locations (wumod)
  – Command interpreter for configuring NPU memory (wucmd)
• Line Card, Ingress
  – Substrate Control Daemon (scd_ingress)
    • implements interface to srm
    • manages tcam access for ingress and egress
    • reads/writes scratch rings for NATD
  – Network Address Translation daemon (NATD), port only
• Line Card, Egress
  – Substrate Control Daemon (scd_egress)
    • implements interface to srm
    • reads/writes scratch rings and communicates with the FS and NATD
Boot and Configuration Control
• Read the node configuration DB: currently this is an xml file
  – Allocate IP subnets and addresses for all boards
  – Assign external IP addresses to GPE fabric interfaces with the default VLAN id
  – Create a per-GPE configuration DB: currently this is written to files
• Create the dhcp configuration file and start dhcpd, httpd and the system sshd (a sketch of the generated dhcpd.conf appears below)
  – assigns control IP subnets and addresses; assigns the internal substrate IP subnet on the fabric Ethernet
• Start the PLCAPI proxy (plc_api) server and the system node manager
  – read the node DB for initialization data: currently we use static configuration data and/or re-read the xml file
  – Create GPE overlay images: currently this is done manually
  – Currently the SNM is split between the plc_api server and srm, due to not having a DB and not wanting to implement a transaction-like interface for the snm
  – begin periodic slice updates and gpe assignments, maintain DB
• Start the SRM and bring up boards as they "report in"
  – Initialize the Line Card to forward "default" traffic (i.e. ssh and icmp) to the CP
  – Initialize the Hub: base and fabric switches; initialize any switches not within the chassis
• Start the SLM and the ssh daemon
  – Remove the SLM configuration file for slices; it may contain old mappings
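To make the generated dhcp configuration concrete, here is a minimal sketch of the kind of dhcpd.conf the BCC could write for the base (control) network. The addresses and MACs come from the SPP1 example configuration and ethers file later in this deck; the structure and option choices are illustrative ISC dhcpd syntax, not a reproduction of the file the gnm actually emits:

  # Illustrative dhcpd.conf fragment for PXE-booting GPEs on the base network.
  # Subnet/mask follow the spp_conf.txt base entries (CP/nserv = 192.168.32.1).
  subnet 192.168.32.0 netmask 255.255.248.0 {
      option routers 192.168.32.1;      # the CP
      next-server 192.168.32.1;         # tftp server (the CP)
      filename "pxelinux.0";            # PXE bootloader under /tftpboot

      # One fixed-address entry per GPE control interface (from ethers)
      host gpe1 {
          hardware ethernet 00:0e:0c:85:e4:3e;   # gpe1_b1.0
          fixed-address 192.168.32.65;
      }
      host gpe2 {
          hardware ethernet 00:0E:0C:85:E6:06;   # gpe2_b1.0
          fixed-address 192.168.32.49;
      }
  }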
Booting SPP1: Example Configuration
[Diagram: SPP1 attaches to the ARL network as drn05.arl.wustl.edu (128.252.153.209, VLAN 2), with myPLC on drn06.arl.wustl.edu (Ebony); the gateway (128.252.153.31) does proxy ARP for drn05. The CP runs srm, fs, natd, gnm*, plc_api, dhcpd (192.168.32.1/20) and httpd, with eth0 cp_data = 172.16.1.1/26, eth2 cp_ctrl, eth1:0 192.168.32.2; it serves /tftpboot/ (ramdisk.gz, zImage.ppm10, bootcd.img, overlay_gpe1.img, overlay_gpe2.img, pxelinux.0, pxelinux.cfg/{C0A82031, C0A82041}) and /var/www/html/boot/ (index.html, bootmanager.sh, bootstrapfs-planetlab-i386.tar.bz2), plus /etc/dhcpd.conf and /etc/ethers. Line Card (slot 6): ingress and egress XScales run scd and natd, lc_b1a = 192.168.32.97/20, lc_b1b = 192.168.32.98/20, lc1_data = 172.16.1.6/26, external addresses 128.252.153.31 (eth0:0/eth2.2) and 128.252.153.78. NPE (slot 5): XScales A and B run scd, lc_b1a = 192.168.32.81/20, lc_b1b = 192.168.32.82/20, lc1_data = 172.16.1.5/26. GPE1 (slot 4): rmp and nm; eth2 gpe1_ctrl = 192.168.32.65/20, eth0 gbe1_data = 172.16.1.3/26 (noarp; vlan 2 subinterface eth0.2 answers as drn05.arl.wustl.edu), eth1 gpe1_int = 172.16.1.65/26. GPE2 (slot 3): eth2 gpe2_ctrl = 192.168.32.49/20, eth0 gbe2_data = 172.16.1.4/26, eth1 gpe2_int = 172.16.1.66/26. Hub at 192.168.32.17. (The diagrams write some data addresses as 171.16.1.x; spp_conf.txt shows the data plan is 172.16.1.x, used here.)]

Example Configuration, SPP3
[Diagram: same structure as SPP1 with SPP3-specific addressing. spp3.arl.wustl.edu = 128.252.153.3 (VLAN 2, eth2.2/eth0:0 128.252.153.39, plus 128.252.153.34); myPLC remains drn06.arl.wustl.edu. The CP is cp5.arl.wustl.edu with dhcpd at 192.168.0.1/20 and eth1:0 192.168.0.2; boot files are as for SPP1. Base addresses move to 192.168.0.x/20: Line Card lc_b1a = 192.168.0.97, lc_b1b = 192.168.0.98; NPE 192.168.0.81/192.168.0.82; Hub 192.168.0.17. GPE1 is now in slot 3 (eth2 gpe1_ctrl = 192.168.0.49/20) and GPE2 in slot 4 (eth2 gpe2_ctrl = 192.168.0.65/20); the data and internal interfaces keep the 172.16.1.x/26 plan (gbe1_data = 172.16.1.3, gbe2_data = 172.16.1.4, gpe1_int = 172.16.1.65, gpe2_int = 172.16.1.66).]

bootcd file system
/
  bin/  dev/  home/  lib/  ...
  etc/
    init.d/
      pl_boot
      pl_netinit
      pl_validateconf
      pl_sysinit
      pl_hwinit
      ...
  root/  selinux/  sys/  usr/
• pl_boot: modified to not use ssl or pgp to retrieve the BootManager script from the cp
• pl_netinit: sets boot_server to reference the cp
• pl_validateconf: added SPP-specific variables

overlay image
/
  etc/{issue, passwd}
  kargs.txt
  pl_version
  usr/
    isolinux/
    boot/
      spp_netinit.py
      ethers
      spp_conf.txt
      boot_server, boot_server_port, boot_server_path
      plnode.txt
      cacert.pem
      plc_config
      pubring.gpg
      backup/
        boot_server, boot_server_path, boot_server_port
        cacert.pem
        pubring.gpg
      bootme/
        BOOTPORT, BOOTSERVER, BOOTSERVER_IP, ID
        cacert/drn06.arl.wustl.edu/cacert.pem
• Changed to list the cp as boot server, with port 81
• Added the SPP initialization script and config files
• Changed plnode.txt to list this GPE's mac address for the control interface

GPE Configuration file: spp_conf.txt
# Config name: spp1.txt
[ nserv ]
ctrl_ipaddr=192.168.32.1
ctrl_hwaddr=00:1E:C9:FE:76:22
data_ipaddr=172.16.1.1
data_hwaddr=00:1E:C9:FE:76:23

[ domain ]
hostname=drn05
domain=arl.wustl.edu
dns1=128.252.133.45
dns2=128.252.120.1
gateway=128.252.153.31

[ hosts ]
nserv_f1.0=172.16.1.1
nserv=192.168.32.1
nserv_gbl=192.168.48.1
shmgr=192.168.48.2
hub=192.168.32.17
hub1_f1.0=172.16.1.2
hub1_m.0=192.168.48.17
gpe1_f1.0=172.16.1.3
gpe1_f1.1=172.16.1.65
gpe1_b1.0=192.168.32.65
gpe2_f1.0=172.16.1.4
gpe2_f1.1=172.16.1.66
gpe2_b1.0=192.168.32.49
npe1_f1.0=172.16.1.5
npe1_b1.0=192.168.32.81
npe1_m.0=192.168.48.81
npe1_b1.1=192.168.32.82
lc_f1.0=172.16.1.6
lc_b1.0=192.168.32.97
lc_m.0=192.168.48.97
lc_b1.1=192.168.32.98
drn05.arl.wustl.edu=128.252.153.209

[ iface ]
__name__=eth0
dev=eth0
name=gpe1_f1.0
hwaddr=00:0e:0c:85:e4:40
type=data
lanid=fabric1
port=0
vlan=0
ipaddr=172.16.1.3
ipnet=172.16.1.0
ipbcast=172.16.1.63
ipmask=255.255.255.192
arp=no
enable=yes

[ iface ]
__name__=eth0.2
dev=eth0.2
name=gpe1_f1.0
hwaddr=00:0e:0c:85:e4:40
vlan=2
type=data
lanid=fabric1
port=0
ipaddr=128.252.153.209
ipnet=128.252.0.0
ipbcast=128.252.255.255
ipmask=255.255.0.0
arp=no
enable=yes

[ iface ]
__name__=eth1
dev=eth1
name=gpe1_f1.1
hwaddr=00:0e:0c:85:e4:42
type=data
lanid=fabric1
port=1
vlan=0
ipaddr=172.16.1.65
ipnet=172.16.1.64
ipbcast=172.16.1.127
ipmask=255.255.255.192
arp=no
enable=yes

[ iface ]
__name__=eth2
dev=eth2
name=gpe1_b1.0
hwaddr=00:0e:0c:85:e4:3e
type=control
lanid=base1
port=0
vlan=0
ipaddr=192.168.32.65
ipnet=192.168.32.0
ipbcast=192.168.39.255
ipmask=255.255.248.0
arp=yes
enable=yes
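Note that spp_conf.txt repeats the [ iface ] section name once per interface, so Python's stock ConfigParser (which requires unique section names) cannot read it directly. Below is a minimal, hypothetical sketch of the kind of reader spp_netinit.py needs; the real script is not reproduced in this deck, so the function name and return shape are illustrative:

  # Hypothetical reader for spp_conf.txt: returns a list of (section, dict)
  # pairs so repeated "[ iface ]" sections are preserved in order.
  def read_spp_conf(path):
      sections = []          # [(section_name, {key: value}), ...]
      current = None
      with open(path) as f:
          for raw in f:
              line = raw.strip()
              if not line or line.startswith('#'):
                  continue                      # skip blanks and comments
              if line.startswith('[') and line.endswith(']'):
                  current = (line[1:-1].strip(), {})
                  sections.append(current)
              elif current is not None and '=' in line:
                  key, _, value = line.partition('=')
                  current[1][key.strip()] = value.strip()
      return sections

  # Example: pick out the control interface ("type=control") for ifcfg generation.
  if __name__ == '__main__':
      for name, kv in read_spp_conf('spp_conf.txt'):
          if name == 'iface' and kv.get('type') == 'control':
              print(kv['__name__'], kv['ipaddr'], kv['ipmask'])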
ethers
# ----------------------------------------------------------------------
# Board Type cp, Name cp1, Slot 0
# nserv_f1.0 fabric1/0
00:1E:C9:FE:76:23 172.16.1.1
# nserv base1/0
00:1E:C9:FE:76:22 192.168.32.1
# nserv_gbl maint/0
00:10:18:32:00:76 192.168.48.1
# ----------------------------------------------------------------------
# Board Type shmgr, Name shmgr1, Slot 0
# shmgr maint/0
00:50:C2:3F:D2:74 192.168.48.2
# ----------------------------------------------------------------------
# Board Type hub, Name hub1, Slot 1
# hub base1/0
00:00:50:3D:10:6B 192.168.32.17
# hub1_f1.0 fabric1/0
00:00:50:3D:10:B0 172.16.1.2
# hub1_m.0 maint/0
00:00:50:3D:10:6C 192.168.48.17
# ----------------------------------------------------------------------
# Board Type gpe, Name gpe1, Slot 4
# gpe1_f1.0 fabric1/0
00:0e:0c:85:e4:40 172.16.1.3
# gpe1_f1.1 fabric1/1
00:0e:0c:85:e4:42 172.16.1.65
# gpe1_b1.0 base1/0
00:0e:0c:85:e4:3e 192.168.32.65
# ----------------------------------------------------------------------
# Board Type gpe, Name gpe2, Slot 3
# gpe2_f1.0 fabric1/0
00:0E:0C:85:E6:08 172.16.1.4
# gpe2_f1.1 fabric1/1
00:0E:0C:85:E6:0A 172.16.1.66
# gpe2_b1.0 base1/0
00:0E:0C:85:E6:06 192.168.32.49
# ----------------------------------------------------------------------
# Board Type npe, Name npe1, Slot 5
# npe1_f1.0 fabric1/0
00:00:00:00:00:00 172.16.1.5
# npe1_b1.0 base1/0
00:00:50:3d:07:3e 192.168.32.81
# npe1_m.0 maint/0
00:00:50:3D:07:3C 192.168.48.81
# npe1_b1.1 base1/1
00:00:50:3D:07:3D 192.168.32.82
# ----------------------------------------------------------------------
# Board Type lc, Name lc1, Slot 6
# lc_f1.0 fabric1/0
00:00:50:3d:0b:d4 172.16.1.6
# lc_b1.0 base1/0
00:00:50:3D:08:26 192.168.32.97
# lc_m.0 maint/0
00:00:50:3D:08:24 192.168.48.97
# lc_b1.1 base1/1
00:00:50:3D:08:25 192.168.32.98
# ----------------------------------------------------------------------
# Gateway for drn05 (128.252.153.209), VLAN 2
00:00:50:3d:0b:d4 128.252.153.31

BootAPI calls made by the BootManager
• PLCAPI/BootAPI calls
  1. GetSession(node_id, auth, node_ip) returns a new session key for the node
  2. BootCheckAuthentication(Session) returns true if the Session id is valid
  3. GetNodes(Session, node_id, ['nodegroup_ids', 'nodenetwork_ids', 'model', 'site_id']) returns the indicated parameters for this node (i.e. node_id)
  4. GetNodeNetworks(Session, node_id, nodenetwork_ids) returns a list of interfaces [broadcast, network, ip, dns1, dns2, hostname, netmask, gateway, nodenetwork_id, method, mac, node_id, is_primary, type, bwlimit, nodenetwork_settings_ids]
  5. GetNodes(Session, node_id, 'nodegroup_ids') returns the list of group ids associated with this node
  6. GetNodeGroups(Session, nodegroup_id, 'name') returns the name string for each node group (in our case 'SPP')
  7. GetNodeNetworkSettings()
  8. BootUpdateNode(Session, boot_state) sets the node's boot state at PLC
  9. BootNotifyOwners(Session, "event", params) causes email to be sent to the list of node owners
  10. BootUpdateNode(Session, ssh_host_key) records the latest ssh public key for the node
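The BootManager issues these as XML-RPC calls against the plc_api proxy on the CP. A minimal sketch of the sequence, assuming the proxy listens at the boot port 81 set in the overlay image (the URL path, the auth structure and the exact argument wrapping are assumptions for illustration; the call names follow the list above):

  import xmlrpc.client

  # Hypothetical endpoint: plc_api proxy on the CP (boot server, port 81).
  plc = xmlrpc.client.ServerProxy("http://192.168.32.1:81/PLCAPI/")

  node_id = 42                                       # from plnode.txt (illustrative)
  auth = {"AuthMethod": "hmac", "node_id": node_id}  # assumed auth shape

  # 1-2: obtain and verify a session key
  session = plc.GetSession(node_id, auth, "192.168.32.65")
  assert plc.BootCheckAuthentication(session)

  # 3-5: node record, interface list, node groups
  node = plc.GetNodes(session, node_id,
                      ["nodegroup_ids", "nodenetwork_ids", "model", "site_id"])
  nets = plc.GetNodeNetworks(session, node_id, node["nodenetwork_ids"])
  groups = plc.GetNodes(session, node_id, "nodegroup_ids")

  # 8-10: report boot state, notify owners, record the ssh host key
  plc.BootUpdateNode(session, "boot")
  plc.BootNotifyOwners(session, "node booted", {})
  plc.BootUpdateNode(session, open("/etc/ssh/ssh_host_rsa_key.pub").read())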
Other PLC/Server interactions
• HTTP/HTTPS
  – Upload alpina boot logs: BOOT_SERVER_URL += /alpina-logs/upload.php
  – Compatibility step (we don't use it):
    BOOT_SERVER_URL += /alpina-BootLVM.tar.gz
    BOOT_SERVER_URL += /alpina-PartDisk.tar.gz
  – Download the file system tar file containing the basic plab node environment: BOOT_SERVER_URL += /boot/bootstrapfs-"group"-"arch".tar.bz2
  – If not in the config file, get the node id: BOOT_SERVER_URL += /boot/getnodeid.php
  – Get the yum update configuration file: BOOT_SERVER_URL += /PlanetLabConf/yum.conf.php

System Initialization: Stage 1
• Use PXE boot and download pxelinux and its config file:
  – boot using a basic initial ramdisk, overlay and kernel
  – Use the dhcp, tftp and pxe server on the cp; files are stored in the tftpboot directory:
    pxelinux.0, pxelinux.cfg/<GPE_IPADDR>, bootcd.img, overlay_gpeX.img, kernel
    (a sketch of a pxelinux.cfg entry appears below)
  – The overlay image is modified for each GPE to include its configuration file, modified planetlab config files and an spp node python script.
    • Currently this is a manual step, but the ultimate (long term) plan is for the gnm daemon to create the individual images
    • The overlay image contains several files that identify the node and provide the name and address of the PLC and Boot servers. I have modified these to point to the cp.
    • Just before booting the final kernel I change these values to refer to the "real" plc/api servers.
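The pxelinux.cfg files are named for the GPE's control IP address in hex (C0A82041 = 192.168.32.65, C0A82031 = 192.168.32.49). A minimal sketch of what such an entry could look like, using standard pxelinux syntax; the kernel arguments actually used on the SPP are not reproduced in this deck, so the APPEND line is illustrative:

  # /tftpboot/pxelinux.cfg/C0A82041  (GPE1, 192.168.32.65) -- illustrative
  DEFAULT planetlab
  LABEL planetlab
      KERNEL kernel
      # bootcd.img is the base initrd; overlay_gpe1.img carries the per-GPE
      # config (plnode.txt, spp_conf.txt, spp_netinit.py, ...)
      APPEND initrd=bootcd.img,overlay_gpe1.img ramdisk_size=65536 root=/dev/ram0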
System initialization: Stage 2
• Boot into the basic, intermediate environment
• Initial configuration information is obtained from the overlay image
  – Includes spp_conf.txt, which defines the gpe interfaces
  – Includes the ethers file, which contains mac addresses for static arp entries
  – Updated plnode.txt with the GPE's control interface mac address
  – Modified bootserver files listing the cp as the bootserver
  – Includes spp_netinit.py, a python script to configure the interfaces and update system configuration files
• Enables the "primary" interface and key network configuration files such as resolv.conf
• Downloads the BootManager source from the "boot_server"
  – In our case we download from the CP
  – I explicitly disable the use of ssl and certs (the certificates on the overlay image are for the PLC server, not the CP)
  – Our assumption is that the control (base) network is "secure"; within an SPP node we also don't have to worry about authentication issues

BootManager
• Opens a connection to the PLCAPI on the bootserver
  – in our case, to the proxy plcapi/bootapi server running on the CP
• Gets the node session key: GetSession(node_id, auth, node_ip)
  – Since each call to create a session invalidates any existing keys, we intercept this call on the cp and use a common session key for all gpes
• Determines the node's configuration
  – reads plnode.txt for node_id, node_key and the primary interface settings
    • we use DHCP to configure the control interface, but I do not define a dns server
  – if node_id is not found, reads URL=BootServer/boot/getnodeid.php
• Calls BootCheckAuthentication(Session) to verify the session key
• Calls GetNodes to get the boot_state, node_groups, model, site_id
• Calls GetNodeNetworks to get configuration information for all interfaces
  – in our case the call would return the externally visible network parameters, which differ from how each GPE is configured
  – long term, we can intercept this call and return GPE-specific interface config info
  – short term, we use a configuration file in the overlay image with similarly formatted information. I have replaced the BootManager code that reads the config info and configures the interfaces.
  – I had to add support for VLANs and our internal interfaces

BootManager Continued
• Download the node's final filesystem image from the boot_server
  – in our case this is the CP: http://CP/boot/bootstrapfs-planetlab-i386.tar.bz2
• Download the yum config file
  – I am not currently downloading it: http://CP/PlanetLabConf/yum.conf
• Call BootUpdateNode with the new boot_state
  – we will need to intercept this call and both report and set node state based on all GPEs
• Call BootNotifyOwners with the new state
  – forwarded to PLC
• Update the network configuration in the new "sysimg"
  – downloads //BootServer/PlanetLabConf/plc_config
    • In our case I have copied it onto the overlay image in the /usr/boot directory
  – calls GetNodeNetworkSettings for a list of any additional interface attributes, then creates various configuration files: hosts, resolv.conf, network, ifcfg-eth*
    • I have replaced this step with our own script spp_netinit.py and configuration file spp_conf.txt, which I use to create the same config files in both the current environment and the new sysimg
  – updates devices and creates the initrd image used for the next stage
  – finally boots a new kernel using the bootstrap file system

Boot States
• The list of boot states is changing as I write this
• In our version of the plc the states are:

  State    Description                                    Next state
  new      install: verify install with user              install verified -> rins; error -> dbg
  inst     install: same as new                           install verified -> rins; error -> dbg
  rins     reinstall: reformat disk and reinstall all     success -> boot; error -> dbg
           software and files
  boot     boot: boot using existing partitions           success: same as boot; fail: bootcd image; error -> dbg
  dbg      debug: boot node diagnostics (bootcd image)    user controlled
  diag     diagnostics: bootcd image                      user controlled
  disable  disable: bootcd image                          user controlled
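A compact way to capture these transitions, as a hypothetical Python table (states and edges transcribed from the table above; "user controlled" states map to an empty edge set):

  # Boot-state transition table -- illustrative encoding of the slide above.
  BOOT_TRANSITIONS = {
      "new":     {"install_verified": "rins", "error": "dbg"},
      "inst":    {"install_verified": "rins", "error": "dbg"},
      "rins":    {"success": "boot", "error": "dbg"},
      "boot":    {"success": "boot", "error": "dbg"},
      "dbg":     {},   # user controlled
      "diag":    {},   # user controlled
      "disable": {},   # user controlled
  }

  def next_state(state, event):
      """Return the next boot state, staying put on unknown events."""
      return BOOT_TRANSITIONS.get(state, {}).get(event, state)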
PLC Database
• The PlanetLab Central database describes all nodes, slices and users/people
• The slice database keeps track of all slices and their node bindings
• The node database includes externally visible properties and the ability to associate general attributes with these properties:
  – the current (or next) node state (boot_state)
  – node identifier (node_id)
  – list of interface configuration parameters
    • ip address information, mac address, generic list of attributes
  – node's owner
  – node's site identifier (site_id)
  – model, which can be used to specify a set of attributes for the node; for example: minhw, smp
  – current ssh host key (ssh_host_key)
  – node groups: I believe this is being deprecated in favor of associating a generic set of attributes with a node or its interfaces

SPP Specific Information
• On an SPP node the resource manager needs to know what kind of board is inserted in each slot and its I/O characteristics
• It needs to associate interface MAC addresses with boards and interfaces, or with a standalone system connected to an RTM or front panel (for example the CP)
• It also needs to know which interfaces are connected to the base switch and which to the fabric switch when bringing up general purpose systems
• There is no convenient mechanism for determining this at run time, so I have a configuration file
• We also need to know what resources are available on each board and the allocation policies
• We must also have a list of external links, their addresses and the address of any peers (Ethernet)
• We need to keep track of the current node state (as kept by PLC) as well as the state of each individual board
• We need to share state between the different daemons
Node Configuration File
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<spp>
  <code_options>
    <IPv4 sram="fixed" queues="variable" id="0" fltrs="variable">
      <sram> 1024 </sram>
    </IPv4>
    <I3 sram="fixed" queues="variable" id="1" fltrs="variable">
      <sram> 1024 </sram>
    </I3>
  </code_options>
  <components>
    <cp name="cp1" slot="0" cat="host" alias="nserv">
      <interface name="nserv_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
      ...
    </cp>
    <shmgr name="shmgr1" slot="0" cat="atca" alias="shmgr1">
      <interface name="shmgr" dev="GigE" lanid="maint" assoc="" port="0"> ... </interface>
      ...
    </shmgr>
    <hub name="hub1" slot="1" cat="atca" alias="hub1">
      <switch lanid="base1"> </switch>
      <switch lanid="fabric1"> <bw> 10000000000 </bw> </switch>
      <interface name="hub" dev="GigE" lanid="base1" assoc="" port="0"> ... </interface>
      ...
    </hub>
    <gpe name="gpe1" slot="4" cat="atca" alias="gpe1">
      <interface name="gpe1_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
      ...
    </gpe>
    <npe name="npe1" slot="5" cat="atca" alias="npe1">
      <product> Radisys_7010 </product>
      <model> NPEv1 </model>
      <interface name="npe1_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
      ...
    </npe>
    <lc name="lc1" slot="6" cat="atca" alias="lc">
      <product> Radisys_7010 </product>
      <model> LCv1 </model>
      <interface name="lc_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
      ...
      <interface name="drn05" dev="GigE" lanid="external" port="0">
        ...
        <link peering="true" primary="true" dev="GigE"> ... </link>
        ...
      </interface>
    </lc>
  </components>
</spp>

CP Record
<!-- Interface parameters defined by the user in the original "xml" file -->
<cp name="cp1" slot="0" cat="host" alias="nserv">
  <interface name="nserv_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0">
    <!-- All internal IP addrs assigned by configuration software based on runtime parameters -->
    <ipaddr>172.16.1.1</ipaddr>
    <ipnet>172.16.1.0</ipnet>
    <ipmask>255.255.255.192</ipmask>
    <ipbcast>172.16.1.63</ipbcast>
    <!-- Device parameters and comment set by the user in the original "xml" file -->
    <device> eth0 </device>
    <hwaddr> 00:1E:C9:FE:76:23 </hwaddr>
    <desc> Interface connected to HUB's fabric port </desc>
  </interface>
  <interface name="nserv" dev="GigE" lanid="base1" assoc="" port="0">
    <ipaddr>192.168.32.1</ipaddr>
    <ipnet>192.168.32.0</ipnet>
    <ipmask>255.255.248.0</ipmask>
    <ipbcast>192.168.39.255</ipbcast>
    <device> eth1 </device>
    <hwaddr> 00:1E:C9:FE:76:22 </hwaddr>
    <desc> System control processor's Base Ethernet connection </desc>
  </interface>
  <interface name="nserv_gbl" dev="GigE" lanid="maint" assoc="" port="0">
    <ipaddr>192.168.48.1</ipaddr>
    <ipnet>192.168.48.0</ipnet>
    <ipmask>255.255.248.0</ipmask>
    <ipbcast>192.168.55.255</ipbcast>
    <device> eth2 </device>
    <hwaddr> 00:10:18:32:00:76 </hwaddr>
    <desc> Connection to the maintenance ports </desc>
  </interface>
</cp>

GPE Record
<gpe name="gpe1" slot="4" cat="atca" alias="gpe1">
  <interface name="gpe1_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0">
    -- IP Address Info --
    <device> eth0 </device> <hwaddr> 00:0e:0c:85:e4:40 </hwaddr>   (Device Data)
    <bw> 1000000000 </bw> <share> 2 </share>                       (Resource Policy)
    <desc> MAC=N+2, Fabric 1/0 or AMC Port 0 </desc>
  </interface>
  <interface name="gpe1_f1.1" dev="GigE" lanid="fabric1" assoc="" port="1">
    -- IP Address Info --
    <device> eth1 </device> <hwaddr> 00:0e:0c:85:e4:42 </hwaddr>
    <desc> MAC=N+4, Fabric 1/1 or Maintenance Port 1 </desc>
  </interface>
  <interface name="gpe1_b1.0" dev="GigE" lanid="base1" assoc="" port="0">
    -- IP Address Info --
    <device> eth2 </device> <hwaddr> 00:0e:0c:85:e4:3e </hwaddr>
    <desc> MAC=N, Base connection to Primary HUB </desc>
  </interface>
  <interface name="gpe1_b2.0" dev="GigE" lanid="base2" assoc="" port="0">
    -- IP Address Info --
    <device> eth3 </device> <hwaddr> 00:0e:0c:85:e4:3f </hwaddr>
    <desc> MAC=N+1, Base connection to alternate HUB </desc>
  </interface>
  <interface name="gpe1_f2.0" dev="GigE" lanid="fabric2" assoc="" port="0">
    -- IP Address Info --
    <device> eth4 </device> <hwaddr> 00:0e:0c:85:e4:41 </hwaddr>
    <desc> MAC=N+3, Fabric 2/0 or AMC Port 1 </desc>
  </interface>
  <interface name="gpe1_f2.1" dev="GigE" lanid="fabric2" assoc="" port="1">
    -- IP Address Info --
    <device> eth5 </device> <hwaddr> 00:0e:0c:85:e4:43 </hwaddr>
    <desc> MAC=N+5, Fabric 2/1 or Maintenance Port 2 </desc>
  </interface>
</gpe>

NPE Record
<npe name="npe1" slot="5" cat="atca" alias="npe1">
  <product> Radisys_7010 </product>
  <model> NPEv1 </model>
  <interface name="npe1_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0">
    -- IP Address Info -- -- Device Data -- -- Resource Policy --
    <desc> Fabric interface used for both NPUs </desc>
  </interface>
  <interface name="npe1_b1.0" dev="GigE" lanid="base1" assoc="npua" port="0">
    -- IP Address Info -- -- Device Data --
    <desc> Primary control interface associated with NPUA </desc>
  </interface>
  <interface name="npe1_m.0" dev="GigE" lanid="maint" assoc="npua" port="0">
    -- IP Address Info -- -- Device Data --
    <desc> NPUA Front Maintenance Port </desc>
  </interface>
  <interface name="npe1_b1.1" dev="GigE" lanid="base1" assoc="npub" port="1">
    -- IP Address Info -- -- Device Data --
    <desc> NPUB Front Maintenance Port -- but it has been patched to the Base switch </desc>
  </interface>
</npe>

LC Record
<lc name="lc1" slot="6" cat="atca" alias="lc">
  <product> Radisys_7010 </product>
  <model> LCv1 </model>   (Model Data)
  <interface name="lc_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0">
    -- IP Address Info -- -- Device Data -- -- Resource Policy --
  </interface>
  <interface name="lc_b1.0" dev="GigE" lanid="base1" assoc="npua" port="0">
    -- IP Address Info -- -- Device Data --
  </interface>
  <interface name="lc_m.0" dev="GigE" lanid="maint" assoc="npua" port="0">
    -- IP Address Info -- -- Device Data --
  </interface>
  <interface name="lc_b1.1" dev="GigE" lanid="base1" assoc="npub" port="1">
    -- IP Address Info -- -- Device Data --
  </interface>
  <interface name="drn05" dev="GigE" lanid="external" port="0">
    <hwaddr> 00:00:50:29:b1:46 </hwaddr>
    <link peering="true" primary="true" dev="GigE">
      -- Link IP Address Info -- -- Device Data -- -- Resource Policy --
      <domain> arl.wustl.edu </domain>
      <hostname> drn05 </hostname>
      <dns1> 128.252.133.45 </dns1>
      <dns2> 128.252.120.1 </dns2>
      <peerIP> 128.252.153.31 </peerIP>
      <peerMAC> 00:0F:B5:FB:D8:67 </peerMAC>
      <vlan> 2 </vlan>
      <port_pool> <!-- used for NAT -->
        <udp count="500" start="30000"> </udp>
        <tcp count="500" start="30000"> </tcp>
      </port_pool>
      <desc> p2p link from drn05 to drn06, the plc </desc>
    </link>
  </interface>
</lc>
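A short sketch of how the srm could load this file (nodeconf.xml) with Python's ElementTree; the actual parsing code is not shown in this deck, so the function and the fields it extracts are illustrative:

  import xml.etree.ElementTree as ET

  def load_node_config(path):
      """Return {board_alias: [(iface_name, lanid, hwaddr, ipaddr), ...]}."""
      boards = {}
      root = ET.parse(path).getroot()          # <spp> element
      for board in root.find("components"):    # cp, shmgr, hub, gpe, npe, lc
          entries = []
          for iface in board.findall("interface"):
              hw = iface.findtext("hwaddr", default="").strip()
              ip = iface.findtext("ipaddr", default="").strip()
              entries.append((iface.get("name"), iface.get("lanid"), hw, ip))
          boards[board.get("alias")] = entries
      return boards

  # e.g. load_node_config("nodeconf.xml")["gpe1"] would list gpe1's fabric
  # and base interfaces with their MAC and assigned IP addresses.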
lanid="maint" assoc="npua" port="0"> -- IP Address Info --- Device Data -<desc> NPUA Front Maintenance Port </desc></interface> <interface name="npe1_b1.1" dev="GigE" lanid="base1" assoc="npub" port="1"> -- IP Address Info --- Device Data -<desc> NPUB Front Maintenance Port -- But it's been patched to the Base switch </desc> </interface> </npe> Fred Kuhns - 3/24/2016 Washington WASHINGTON UNIVERSITY IN ST LOUIS 27 LC Record <lc name="lc1" slot="6" cat="atca" alias="lc"> <product> Radisys_7010 </product> <model> LCv1 </model> (Model Data) <interface name="lc_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0"> -- IP Address Info -- -- Device Data -- -- Resource Policy -- </interface> <interface name="lc_b1.0" dev="GigE" lanid="base1" assoc="npua" port="0"> -- IP Address Info -- -- Device Data -- </interface> <interface name="lc_m.0" dev="GigE" lanid="maint" assoc="npua" port="0"> -- IP Address Info -- -- Device Data -- </interface> <interface name="lc_b1.1" dev="GigE" lanid="base1" assoc="npub" port="1"> -- IP Address Info -- -- Device Data --</interface> <interface name="drn05" dev="GigE" lanid="external" port="0"> <hwaddr> 00:00:50:29:b1:46 </hwaddr> <link peering="true" primary="true" dev="GigE"> -- Link IP Address Info -- -- Device Data -- -- Resource Policy -<domain> arl.wustl.edu </domain> <hostname> drn05 </hostname> <dns1> 128.252.133.45 </dns1> <dns2> 128.252.120.1 </dns2> <peerIP> 128.252.153.31 </peerIP> <peerMAC> 00:0F:B5:FB:D8:67 </peerMAC> <vlan> 2 </vlan> <port_pool> <!-- used for NAT --> <udp count="500" start="30000"> </udp> <tcp count="500" start="30000"> </tcp> </port_pool> <desc> p2p link from drn05 to drn06, the plc </desc> </link></interface> </lc> Fred Kuhns - 3/24/2016 Washington WASHINGTON UNIVERSITY IN ST LOUIS 28 SRM Interface NATD to SRM: [egress_map, ingress_map] get_sched_map(LinkIP, BoardMAC) Depricated: original natd interface! {fid, port} alloc_epmap(map) status free_epmap(fid) FS to SRM: ?? (map vlan to slice id) RMP to SRM: Interfaces (Line Card Links): if_list get_interfaces(plabID) ifn get_ifn(plabID, ipaddr) if_entry get_ifattrs(plabID, ifn) : ipaddr get_ifpeer(plabID, ifn) : retcode resrv_fpath_ifbw(bw, ifn) retcode reles_fpath_ifbw(bw, ifn) To be implemented: retcode resrv_slice_ifbw(plabID, bw, ifn) retcode reles_slice_ifbw(plabID, bw, ifn) Fred Kuhns - 3/24/2016 EndPoints (local IP and Port number): NATD changes may have broken these ep alloc_endpoint(PlabID, ep) status free_endpoint(PlabID, ipaddr, port, proto) Fast Path: fp_params alloc_fastpath(PlabID, copt, bwspec,rcnts, mem) status free_fastpath() Fast-Path Meta-Interfaces: [mi, ep] alloc_udp_tunnel(bw, ipaddr, port) ep get_endpoint(mi) status free_udp_tunnel(ipaddr, port) Washington WASHINGTON UNIVERSITY IN ST LOUIS 29 RMP Interface Prototype completed: To do: 19. ep alloc_endpoint(ep) 1. result noop() 20. status free_endpoint(ipaddr, port, proto) 2. version get_version() 21. -- alloc_tunnel -3. result add_slice(plabID, len, name) 22. -- free_tunnel -4. result rem_slice(plabID) 23. [mi, ep] alloc_udp_tunnel(fpid, bw, ip, port) 5. ret_t alloc_fastpath(copt, bw, rcnts, mem) 24. status free_udp_tunnel(ipaddr, port) 6. void free_fastpath() 25. ep get_endpoint(fpid, mi) 7. if_list get_interfaces() 26. retcode write_fltr(fpid, fid, fltr) 8. ifn get_ifn(ipaddr) 27. retcode update_result(fpid, fid, result) 9. if_entry get_ifattrs(ifn) 28. fltr_t get_fltr_bykey(fpid, key) 10. ipaddr get_ifpeer(ifn) 11. retcode alloc_pl_ifbw(ifn, bw) 29. fltr_t get_fltr_byfid(fpid, fid) 12. 
RMP Interface
Prototype completed:
  1. result noop()
  2. version get_version()
  3. result add_slice(plabID, len, name)
  4. result rem_slice(plabID)
  5. ret_t alloc_fastpath(copt, bw, rcnts, mem)
  6. void free_fastpath()
  7. if_list get_interfaces()
  8. ifn get_ifn(ipaddr)
  9. if_entry get_ifattrs(ifn)
  10. ipaddr get_ifpeer(ifn)
  11. retcode alloc_pl_ifbw(ifn, bw)
  12. retcode reles_pl_ifbw(ifn, bw)
  13. retcode alloc_fpath_ifbw(fpid, ifn, bw)
  14. retcode reles_fpath_ifbw(fpid, ifn, bw)
  15. retcode bind_queue(fpid, miid, list_type, qids)
  16. actual_bw set_queue_params(fpid, qid, threshold, bw)
  17. [threshold, bw] get_queue_params(fpid, qid)
  18. [u32 Pkts, u32 Bytes] get_queue_len(fpid, qid)

To do:
  19. ep alloc_endpoint(ep)
  20. status free_endpoint(ipaddr, port, proto)
  21. -- alloc_tunnel --
  22. -- free_tunnel --
  23. [mi, ep] alloc_udp_tunnel(fpid, bw, ip, port)
  24. status free_udp_tunnel(ipaddr, port)
  25. ep get_endpoint(fpid, mi)
  26. retcode write_fltr(fpid, fid, fltr)
  27. retcode update_result(fpid, fid, result)
  28. fltr_t get_fltr_bykey(fpid, key)
  29. fltr_t get_fltr_byfid(fpid, fid)
  30. result lookup_fltr(fpid, key)
  31. retcode rem_fltr_bykey(fpid, key)
  32. retcode rem_fltr_byfid(fpid, fid)
  33. stats_t read_stats(fpid, sindx, flags)
  34. result clear_stats(sindx)
  35. handle create_periodic(fp, indx, P, cnt, flags)
  36. retcode delete_periodic(fpid, handle)
  37. retcode set_callback(fpid, handle, xport)
  38. stats_t get_periodic(fpid, handle)
  39. retcode mem_write(fpid, offset[, len], data)
  40. data mem_read(fpid, offset, len)

NPE SCD Interface
SRM to SCD:
  status set_fastpath(fpid, copt, VLAN, params, mem)
  status enable_fastpath(fpid)
  status disable_fastpath(fpid)
  status rem_fastpath(fpid)
  status set_sched_params(sid, ifn, BWmax, BWmin)
  status set_encap_cb(sid, srcIP, dMAC)
  status set_fpmi_bw(fpid, sid, miid, bw)
  status start_mes()
  status stop_mes()
  status set_encap_gpe(fpid, gpeIP, npeIP)
  result write_mem(kpa, len, data)
  data read_mem(kpa, len)

SRM and RMP to SCD:
  ret_t write_fltr(dbid, fid, key, mask, result)
  ret_t update_result(dbid, fid, result)
  fltr get_fltr_bykey(dbid, key)
  fltr get_fltr_byfid(dbid, fid)
  result lookup_fltr(dbid, key)
  retcode rem_fltr_bykey(dbid, key)
  retcode rem_fltr_byfid(dbid, fid)

RMP to SCD:
  status set_gpe_info(exPort, ldPort, exQID, ldQID)
  u32 result bind_queue(u16 miid, u8 list_type, u16[] qid_list)
  u32 bw set_queue_params(u16 qid, u32 threshold, u32 bw)
  {u32 threshold, u32 bw} get_queue_params(u16 qid)
  {u32 pktCnt, u32 byteCnt} get_queue_len(u16 qid)
  result write_sram(offset, len, data)
  data read_sram(offset, len)
  stats read_stats(sindx, flags)
  result clear_stats(sindx)
  handle create_periodic(sindx, P, cnt, flags)
  retcode del_periodic(handle)
  retcode set_callback(handle, udp_port)
  stats get_periodic(handle)

LC SCD Interface
SRM to SCD:
  status set_sched_params(sid, ifn, BWmax, BWmin)
  status set_sched_mac(sid, MACdst, MACsrc)
  u32 result set_queue_sched(u16 qid, u16 sid)
  result write_mem(kpa, len, data)
  data read_mem(kpa, len)

SRM and RMP to SCD:
  ret_t write_fltr(dbid, fid, key, mask, result)
  ret_t update_result(dbid, fid, result)
  fltr get_fltr_bykey(dbid, key)
  fltr get_fltr_byfid(dbid, fid)
  result lookup_fltr(dbid, key)
  retcode rem_fltr_bykey(dbid, key)
  retcode rem_fltr_byfid(dbid, fid)

RMP to SCD:
  u32 actual_bw set_queue_params(u16 qid, u32 threshold, u32 bw)
  {u32 threshold, u32 bw} get_queue_params(u16 qid)
  {u32 pktCnt, u32 byteCnt} get_queue_len(u16 qid)
  stats read_stats(sindx, flags)
  result clear_stats(sindx)
  handle create_periodic(sindx, P, cnt, flags)
  retcode del_periodic(handle)
  retcode set_callback(handle, udp_port)
  stats get_periodic(handle)
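Tying these lists together, a minimal slice-side sketch of the completed RMP calls, assuming the RMP is exposed to the sliver over XML-RPC on a local port (the transport, port, and return-value shapes are all assumptions for illustration; the call names and argument orders follow the RMP list above). The slice example that follows walks through the same sequence in more detail:

  import xmlrpc.client

  # Hypothetical transport: the sliver's RMP reached over local XML-RPC.
  rmp = xmlrpc.client.ServerProxy("http://127.0.0.1:9000/")

  rmp.noop()                                  # 1: sanity check
  print(rmp.get_version())                    # 2

  # 5: allocate a fast path (code option, bwspec, resource counts, memory);
  # concrete values and the returned fpid field are illustrative only.
  fpid = rmp.alloc_fastpath("IPv4",
                            {"BWmax": 100_000_000, "BWmin": 0},
                            {"fltrs": 128, "qids": 64, "buffs": 1024, "stats": 64},
                            {"sram": 1024, "dram": 0})

  # 7, 13: pick an external interface and reserve fast-path bandwidth on it.
  iface = rmp.get_interfaces()[0]
  rmp.alloc_fpath_ifbw(fpid, iface["ifn"], 5_000_000)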
Slice Example
• Get the list of interfaces, their IP addresses and available bandwidth:

  if_list = {if_entry, ...}
  if_entry = {u16 ifn,      // logical interface number
              u16 type,     // peering or multi-access
              u32 ipaddr,   // interface's IP address
              u32 linkBW,   // link's native BW
              u32 availBW}  // BW available for allocation

  struct epoint_t {
      u32 ipaddr;   // interface's IP address
      u16 port;     // UDP port number for meta-interface
      u32 bw;       // total BW required for meta-interface
  };

  iflist = get_interfaces();  // return list of all available interfaces

• Estimate the computational complexity and memory bandwidth requirements on the NPE:

  bwSpec = {BWmax=totalBW, BWmin=0};  // fast path total BW requirement

• Max general NPE resource counts: for this example I just assume a max number, but in general a user may scale it by the number of meta-interfaces they will use:

  fpCounts = {FLTR_CNT, QID_CNT, BUFF_CNT, STATS_CNT};

• Request the substrate to allocate a fastpath instance for the IPv4 code option; assume we will use the default sram buffer sizes. We will also need to listen on the returned sockets:

  [fpid, sockets] = alloc_fastpath(ipv4_copt, bwSpec, fpCounts, {IPV4_SRAM_SZ, 0});

Slice Example - Continued
• Allocate one meta-interface for each external interface, and assign our default UDP port number and BW requirement:

  struct mi_t {uint_t mi; epoint_t rp;};
  mi_t milist[iflist.len()];

  for (indx = 0; indx < len(iflist); ++indx) {
      if (miBW > iflist[indx].availBW)
          throw Error;
      // allocate the total BW required on this interface
      if (alloc_fpath_ifbw(fpid, iflist[indx].ifn, miBW) == 1)
          throw Error;
      // allocate one meta-interface on this interface
      milist[indx] = alloc_udp_tunnel(fpid, miBW, iflist[indx].ipaddr, myPort);
      my_bind_queues(milist + indx);
      my_add_routes(milist + indx);
  }

Test SPP Node
[Diagram: test configuration for keystone.arl.wustl.edu (128.252.153.81) on the ARL network (128.252.153.*, VLAN 2). CP: srm and dhcpd (/etc/dhcpd.conf, ethers, hosts; /tftpboot: ramdisk.gz, zImage.ppm10), eth0 cp_data = 172.16.1.1/26, eth1 cp_ctrl = 192.168.64.1/20, eth1:0 192.168.64.2; IP routing with proxy arp for keystone. Line Card (slot 6): ingress/egress XScales run scd and natd, lc_b1a = 192.168.64.97/20, lc_b1b = 192.168.64.98/20, lc1_data = 172.16.1.6/26. Hub at 192.168.64.17. GPE1 (slot 2): gpe1_ctrl = 192.168.64.33/20, gbe1_data = 172.16.1.2/26, gpe1_int = 172.16.1.66/26. GPE2 (slot 3): gpe2_ctrl = 192.168.64.49/20, gbe2_data = 172.16.1.3/26, gpe2_int = 172.16.1.67/26. GPE3 (slot 4): gpe3_ctrl = 192.168.64.65/20, gbe3_data = 172.16.1.4/26, gpe3_int = 172.16.1.68/26. GPE4 (slot 5): gpe4_ctrl = 192.168.64.81/20, gbe4_data = 172.16.1.5/26, gpe4_int = 172.16.1.69/26. Each GPE carries /etc/{ethers,hosts} and /etc/sysconfig/network-scripts/ifcfg-eth*; data interfaces are noarp with vlan 2 subinterfaces (eth0.2) answering as keystone.arl.wustl.edu.]
• "Router" issue: mounting /opt/crossbuild/* from ebony. We could export the dirs from the "Router" host, or use ebony rather than "Router"; in that case we will need an external switch connecting the line cards of spp? to ebony's eth2.2.
Test Bed Use
• Core platform issues:
  – Can we use the second fabric port on the GPE boards?
  – The hub does not display stats or mac forwarding entries for the slots with GPEs. It used to work.
  – The radisys shelf manager
    • does not reliably reset boards
    • Base1 interface disabled on slot 2
• NAT/Line Card testing
  – Overall reliability
  – Add support for aging
  – Specific issues (jdd)
    • restarting the line card (without reboot) occasionally results in the data path thinking the scratch ring to the xscale is full
    • a looping iperf test from the cp occasionally stalls with no packets getting through the LC
    • Lookup needs a fix to not use the DONE bit to indicate a tcam lookup is done
• GPE/Intel board testing