INTEGRATING THE ARM-BASED RASPBERRY PI INTO AN ARCHITECTURE COURSE *

David Tarnoff
Department of Computing
East Tennessee State University
Johnson City, TN 37614
423.439.6404
tarnoff@etsu.edu

* Copyright © 2015 by the Consortium for Computing Sciences in Colleges. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the CCSC copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Consortium for Computing Sciences in Colleges. To copy otherwise, or to republish, requires a fee and/or specific permission. JCSC 30, 5 (May 2015)

ABSTRACT

The complex strategies being incorporated into modern processors to improve performance and security have forced computer architecture instruction to become largely theoretical. With the advent in 2011 of the ARM-based Raspberry Pi, computer architecture students could once again get hands-on experience using a single-board computer. Resources within the ARM processor and on the Raspberry Pi allow students to practice almost all of the objectives of the ACM/IEEE computer architecture curriculum. This paper describes how the Raspberry Pi was incorporated into a computer architecture course, the laboratory setup that was used, and examples of student assignments.

INTRODUCTION

The use of hands-on laboratory activities has been shown to improve students’ interest in and ability to understand course material [10]. In addition, hands-on activities give students a clearer picture of what to expect during a professional career. The challenge is to bring the equipment into the classroom without placing too great a financial burden on the student.

In the 1980s, students could get hands-on experience using low-cost single-board computers. The slower clock speeds and interfacing requirements allowed students to access the system bus, add I/O peripherals, and fully examine the processor’s architectural features. Since then, the drive for performance and the increasing gap between hardware and software due to operating system requirements have hindered an applied study of computer architecture.

This paper demonstrates how the author used the ARM-based Raspberry Pi single-board computer to design a cost-effective laboratory setup and a set of hands-on exercises for a course in computer architecture. The concepts presented in these exercises directly reinforce the objectives for this particular course.

LABORATORY CHALLENGES

For the past fifteen years, the author has faced a number of challenges in providing his computer architecture students with the tools to access a processor’s architectural features. One attempt was to use laboratory desktop machines. The modern operating system’s “hands off” view of hardware, however, made this nearly impossible. Booting from a live CD containing a more permissive operating system would have allowed students to get closer to the architecture, but the security measures taken by the University’s IT department made it impossible to boot the laboratory machines this way.

Another option attempted was to use emulators to simulate the processor’s architecture. Challenges arose here too. It was difficult if not impossible to emulate features such as hardware timers, interrupts, RAM caches, and DMA. In addition, the counters and timers that allow students to measure performance on real hardware either did not exist or were themselves simulated. Similar challenges occurred when trying to use virtualization.

THE SINGLE-BOARD COMPUTER SOLUTION

The challenges outlined in the previous section could be solved by getting hardware into the hands of the students.
Single-board computers, small motherboards containing all of the components of a full computer system, might provide a solution. Originally, these devices were used by processor manufacturers to demonstrate new designs or to provide development platforms for programmers. Over time, hobbyists and educators began using the boards to create systems of their own. The primary drawback of single-board computers is their cost. They can be very expensive, costing $150 to $400 or more each.

In 2011, the Raspberry Pi Foundation released a much anticipated single-board computer called the Raspberry Pi based on the ARM architecture. Priced at $35 for the top-end Model B, it was intended to promote basic programming to school-age children [5]. Researchers and developers, however, quickly adopted the device into their own work. The University of Southampton, for example, built a 64-node cluster from Raspberry Pi boards and LEGO [4]. The Raspberry Pi was used as the foundation for an operating systems development course at Cambridge University [6]. The International Technology University began including the Raspberry Pi in its introductory course in embedded system design [9].

CCSC: Mid-South Conference

The typical bootable image for the Raspberry Pi is based on a Debian Linux distribution containing a lightweight X11 desktop environment and development tools including gcc, as, and gdb. By connecting an HDMI-ready monitor and a USB keyboard and mouse, a user can have a fully functional “desktop” computer. Alternatively, the Raspberry Pi’s I/O header provides a serial port that can be connected to a desktop machine running a terminal emulator. A FAT-formatted SD card serves as both the boot medium and persistent storage. Advanced users can create a custom bootable operating system and store it to the Raspberry Pi’s SD card [6]. Alternatively, by storing a bootloader to the SD card, a bootable image can be uploaded to the Raspberry Pi through its serial port [13].

BENEFITS OF THE ARM ARCHITECTURE

It has been shown that students are more enthusiastic about learning on commonly used, commercially available hardware because they feel it gives them a better idea of what to expect in their jobs [7]. At the same time, it is important for professors to illustrate architectural concepts using hardware that encompasses the technologies included in a computer architecture curriculum.

During the last decade, CPUs used in desktop and laptop computers accounted for only about 2% of all CPUs sold [12]. One of the most common architectures found in non-desktop applications is the 32-bit ARM architecture designed by ARM Holdings, headquartered in Cambridge, UK. According to ARM Holdings, processors based on their architecture were used in 95% of smartphones in 2013. During that time, manufacturers of chips containing an ARM core reported shipments of 4.8 billion ARM-based processors for the mobile device market alone [1].

Common use, however, doesn’t necessarily mean that the architecture supports the concepts that are important to a computer architecture course. An examination of the capabilities of the ARM core reveals a long list of features applicable to the computer architecture curriculum [2].
o 16 general purpose registers with a directly accessible program counter
o A robust 32-bit fixed-width instruction set architecture including conditional execution for all instructions, pre-shift operations for literals as part of all instructions, pre- or post- auto incrementing or decrementing indexed addressing, and block load/store operations
o Ability to switch between 32-bit ARM and 16-bit Thumb instruction sets
o Full implementation of interrupts with a shadow register file
o Ability to switch between Harvard and von Neumann architectures
o Superscalar architecture (single integer pipe plus a floating point pipe)
o A substantial cache including virtual or physical addressing, process ID support for context switching, ability to lock at the way-level, pseudo-random or round-robin replacement algorithm, and write-back or write-through operation
o Robust memory management unit including configurable page sizes from 4K to 16M, a translation look-aside buffer, memory access permission control, 1- or 2-level page translation tables, memory attribute control, and process ID support for context switching

There is not sufficient space in this paper to cover all of the capabilities of the ARM architecture. Examination of the topic list for computer architecture in the ACM/IEEE Computer Society “Computer Science Curricula 2013” [3], however, shows that the ARM architecture can provide students with experience in all areas except SIMD, complex bus arbitration, and multiprocessor cache coherence algorithms.

Committing to a single platform inevitably has its drawbacks. One of the drawbacks of the ARM architecture is that its RISC-style load/store addressing limits the students’ experience with addressing modes. In addition, the coarse granularity of its interrupt structure forces most interrupts to be served by a single ISR.
Lastly, in order to reduce memory accesses, the architecture uses the RISC-style subroutine call method of storing return addresses in register r14 instead of on a stack, so students do not see stack-based calls at the instruction level.

PHYSICAL LABORATORY IMPLEMENTATION CHALLENGES

Prior to the beginning of the semester, each student is asked to purchase their own Raspberry Pi Model B. The cost for this device is $40, but there are hidden costs. The student still needs to purchase an SD card for persistent storage and a 1 amp micro-B USB power supply. There are vendors that offer these pieces as a package with the Raspberry Pi and a clear plastic case for about $60. A few of the laboratory exercises require the students to create a primitive operating system and store it to their SD card, so they also need an SD card reader.

The laboratory facilities posed a few challenges too. Approximately half of the exercises use the Debian-based Linux installation, for which a monitor and keyboard are required. The other exercises use the serial port to communicate with a rudimentary bootloader or operating system. Either way, the students needed a method of I/O to perform the labs.

It was first thought that the computer monitors in the lab could be used for displays. Unfortunately, the monitors took DVI input and were not HDMI-capable, so an adaptor would have been required for each machine. The students would also have needed to unplug the USB keyboards from the desktop machines in the lab to use them for input. Understandably, the IT administrator did not want this arrangement because of maintenance issues and the likelihood that many students would leave without putting the laboratory machines back in their original configuration.

The solution was to purchase individual FTDI TTL-232R3V3 USB to serial adaptors at about $16.50 each. This allowed each student to plug a Raspberry Pi into the USB port of the laboratory computer and use the computer as a serial terminal.
Any number of terminal emulators could have provided serial communication between the laboratory computer and the Raspberry Pi. The author was limited in his selection, however, because he was able to find only one bootloader that worked with the Raspberry Pi. That bootloader, written by David Welch, uses the XMODEM protocol to download an image to the Raspberry Pi [14]. As a result, the open-source terminal emulator Tera Term, which supports XMODEM file transfers, was selected.

Lastly, although the Linux image came with a compiler, assembler, and linker, development of a custom bootable image required a cross compiler. The author selected the YAGARTO (Yet Another GNU ARM Toolchain) suite of tools, which provides an EABI cross compiler and Make utility for development on Windows-based desktops.

It is important to note that in July 2014, the Raspberry Pi Foundation released an updated version of their original Raspberry Pi Model B. Called the Model B+, it offered more general purpose I/O pins, two additional USB ports, a Micro SD slot instead of the full-sized SD slot, lower power consumption, and improved audio [11]. Some things were removed too, including the old analog video output and an “OK” status LED on which most introductory labs were based. This caused incompatibility problems for a couple of labs when some of the students had the older Model B and others had the Model B+.

OVERVIEW OF LABORATORY EXERCISES

The Raspberry Pi’s versatility provides a foundation for a wide variety of labs ranging from simple I/O to the implementation of a cluster. The following is an overview of some of the laboratory exercises that the author developed for students in the computer architecture course that he teaches at his university.

The first few labs focus on the basics of the ARM architecture, its development environments, and the Raspberry Pi.
This includes studying the ARM instruction set through an examination of the assembly language output from the gcc cross-compiler, how gcc assigns segments, and the effects of different levels of gcc optimization. These labs also present the ARM register file and how the application binary interface defines the use of these registers. The GNU tools hexdump, objdump, and gdb are vital to the success of these labs, and the students gain significant experience with them.

The next group of labs introduces the students to hardware peripherals. They start by studying basic digital I/O, then move on to system timers and serial communications. For each of these peripherals, the students start with polled I/O to interact with the hardware. Once this is mastered, they move to interrupts and DMA.

There is also a laboratory exercise where gcc is used to compare the 32-bit ARM assembly language with the 16-bit Thumb assembly language. (Each Thumb instruction maps one-to-one to a corresponding ARM instruction.) During this lab, students are asked to compare how the different instruction sets affect compiler optimization, memory usage, addressing, and the number of instructions.

Another lab examines the superscalar properties of the ARM processor. In this lab, the students are asked to create a C program with a large loop that takes a long time to execute. They are then asked to compare the execution times of instructions that are independent with those of instructions that have true dependencies. They are also asked to mix data types and see whether that has an effect on performance.

An important part of the ARM architecture is its use of a coprocessor to evaluate performance. Some of the exercises utilizing this coprocessor have the students make minor changes in the configuration of an architectural feature and evaluate the resulting change in performance.
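The comparison behind these exercises, running the same amount of work under two conditions and comparing execution times, can be sketched in C. The sketch below follows the superscalar lab described earlier: one loop forms a single chain of true dependencies, while the other keeps two chains that do not depend on each other. It is an illustration under stated assumptions, not the author's lab code: the function names, the arithmetic, and the use of the portable clock() function (rather than the ARM performance coprocessor) are all choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper type: a loop body under test. */
typedef uint32_t (*chain_fn)(uint32_t);

/* Two multiply-adds per iteration on ONE accumulator: the second
   operation must wait for the result of the first (true dependency). */
uint32_t dependent_chain(uint32_t n)
{
    uint32_t a = 0;
    for (uint32_t i = 0; i < n; i++) {
        a = a * 3u + i;
        a = a * 5u + i;   /* reads the value of a just computed */
    }
    return a;
}

/* Two multiply-adds per iteration on TWO accumulators: within an
   iteration, neither operation uses the other's result. */
uint32_t independent_chain(uint32_t n)
{
    uint32_t a = 0, b = 0;
    for (uint32_t i = 0; i < n; i++) {
        a = a * 3u + i;
        b = b * 5u + i;   /* does not read a */
    }
    return a + b;
}

/* Run fn(n) once, store its result so the work is not discarded,
   and return the processor time used, in seconds. */
double time_chain(chain_fn fn, uint32_t n, uint32_t *result)
{
    assert(fn != NULL && result != NULL);
    clock_t start = clock();
    *result = fn(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A driver would call time_chain(dependent_chain, n, &r) and time_chain(independent_chain, n, &r) with a large n and compare the two times. Whether and by how much they differ depends on the processor and on the gcc optimization level, so no particular outcome is assumed here.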
The coprocessor-based exercises can be used to evaluate the effects of disabling the cache, locking lines in the cache, switching to the Thumb instruction set, changing the virtual memory page size, and changing the number of page table levels. This coprocessor also allows the programmer to directly access cache contents, tags, and status bits.

In the last laboratory exercise of the semester, the students are asked to work together in groups to create small Raspberry Pi-based clusters. The steps for this exercise are based on the work by Simon Cox et al. from the University of Southampton in the United Kingdom [8]. The process uses MPICH2, a portable implementation of the Message Passing Interface (MPI). The majority of the work involves downloading, compiling, and installing MPICH2 on the Raspberry Pi, configuring the Raspberry Pi for use on a small IP network, running code on the MPICH2 installation, and comparing the execution times. The specific application they use is the parallel computation of pi, which is included as one of the examples in MPICH2. In the end, the students are often surprised by how much of an effect communication overhead has on the performance of their clusters.

CONCLUSION

While it is possible to teach computer architecture from a purely theoretical point of view or teach it using an abstract processor, students receive a clear benefit from implementing code on a commercially available architecture. The Raspberry Pi provides a cost-effective solution to bringing that architecture into the classroom. The combination of the ARM architecture, with its depth and breadth of architectural features, and the wide range of experiments that can be performed on the Raspberry Pi suggests that it may be difficult if not impossible to exhaust the possibilities.

REFERENCES

[1] ARM Holdings plc, Annual Report 2013: Strategic Report, Cambridge: ARM Holdings plc, 2013.
[2] ARM Holdings plc, ARMv6-M Architecture Reference Manual, Cambridge, 2008.
[3] Association for Computing Machinery/IEEE Computer Society, “Computer Science Curricula 2013,” 20 December 2013. [Online]. Available: http://www.acm.org/education/CS2013-final-report.pdf. [Accessed 6 December 2014].
[4] Bal, M., “A Supercomputer made with Raspberry Pi and LEGO,” Engineering.com, 22 February 2013. [Online]. Available: http://www.engineering.com/ElectronicsDesign/ElectronicsDesignArticles/ArticleID/5357/A-Supercomputer-made-with-Raspberry-Pi-and-LEGO.aspx. [Accessed 6 December 2014].
[5] Cellan-Jones, R., “A 15 Pound Computer to Inspire Young Programmers,” BBC, 5 May 2011. [Online]. Available: http://www.bbc.co.uk/blogs/legacy/thereporters/rorycellanjones/2011/05/a_15_computer_to_inspire_young.html. [Accessed 6 December 2014].
[6] Chadwick, A., “Baking Pi - Operating Systems Development,” Cambridge University, July 2013. [Online]. Available: http://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/. [Accessed 6 December 2014].
[7] Clements, A., “ARMs for the Poor,” in Frontiers in Engineering Education Conference, Washington, DC, 2010.
[8] Cox, S., “Steps to make Raspberry Pi Supercomputer,” Computational Engineering and Design Research Group, University of Southampton, UK, January 2013. [Online]. Available: https://www.southampton.ac.uk/~sjc/raspberrypi/pi_supercomputer_southampton.htm. [Accessed 6 December 2014].
[9] ITU, “ITU Offers Embedded Systems Design with Raspberry Pi and Arduino in Python and C,” 24 July 2013. [Online]. Available: http://itu.edu/ee/itu-offers-embedded-systems-design-with-raspberry-pi-and-arduino-in-python-and-c/. [Accessed 6 December 2014].
[10] Medaris, K., “Study: Hands-on projects may be best way to teach engineering and technology concepts,” 28 January 2009. [Online]. Available: https://news.uns.purdue.edu/x/2009a/090128DarkStudy.html. [Accessed 6 December 2014].
[11] Raspberry Pi Foundation, “New Product Launch! Introducing Raspberry Pi Model B+,” July 2014. [Online]. Available: http://www.raspberrypi.org/introducing-raspberry-pi-model-b-plus/. [Accessed 6 December 2014].
[12] Turley, J., “The 2% Solution,” Embedded System Design, 18 December 2002.
[13] Welch, D., “Baremetal,” 17 April 2014. [Online]. Available: https://github.com/dwelch67/raspberrypi. [Accessed 6 December 2014].
[14] Welch, D., “Bootloader 05,” 13 September 2012. [Online]. Available: https://github.com/dwelch67/raspberrypi. [Accessed 6 December 2014].