Research Visit to Federal University of Bahia
CUGS Travel Report

Marcelo Santos
School of Innovation, Design and Engineering, Mälardalen University, Sweden
marcelo.santos@mdh.se
March 16, 2009

Abstract. In this report I describe the research done during my visit to the LaSiD lab at the Federal University of Bahia, Brazil, made possible by a research grant from CUGS, together with a short description of the place. We investigated the execution time of components and its dependence on the hardware state and configuration using statistical methods. We found that the convolution is a good approximation to the execution time of a composite, and that in the few cases where it is not, another statistical method can be used to find out which particular hardware parameter should be changed in order to get a better approximation.

1. Introduction

I am a PhD student at the School of Innovation, Design and Engineering, Mälardalen University, and my research project is about worst-case execution time (WCET) analysis of component-based embedded systems (CBES). In this section, I give a brief introduction to these fields in order to set up the context of my research.

Embedded Systems

Widely common in our modern way of life, embedded systems are special-purpose computer systems designed to perform dedicated functions. They are usually part of another device, sometimes with mechanical parts as well. They can be found in complex systems such as an airplane or a satellite, and also in a variety of common everyday electronic devices, such as:

• consumer electronics: cell phones, pagers, digital cameras, camcorders, videocassette recorders, portable video games, calculators, etc.;
• home appliances: microwave ovens, DVD players, washing machines, TVs, etc.;
• office automation: fax machines, copiers, printers, etc.;
• business equipment: cash registers, alarm systems, card readers, product bar code readers, etc.;
• automobiles: transmission control, fuel injection, brake systems, active suspension, etc.

A common feature of embedded systems is that they have specific requirements, rather than being of general use like the desktop computer, and usually the application domain is known beforehand [5]. Some embedded systems can have very strict requirements: they must cost just a few dollars, must be sized to fit on a single chip, must perform fast enough to process data in a specified time frame (i.e., they are real-time), and must consume minimal power to extend battery life. In order to know whether the system will process the data in time, we need to know the execution time of the software in the system. The analysis of the execution time is an active area of research and is the subject of this work.

Component-based Embedded Systems

The goal of this methodology for software development is to divide a system into smaller, manageable pieces. The idea is not new: structured and modular programming came as a promise to help build better and more reliable software, but fell short of this goal due to the ever-increasing size, complexity and requirements of modern software systems. Many would argue that future breakthroughs in software productivity will depend on our ability to combine (or compose) existing pieces of software to produce new applications. A component can be defined simply as a piece of software, and a composite as a set of components composed in some way. A composition theory is a methodology for combining components.
Despite the widespread use of component-based development, there is still no universally accepted definition of components and composition, with each model having its own definitions. With the use of component-based software in the embedded systems domain, one problem of great importance is how to predict properties of such systems, based on the properties of the components, in order to determine whether the system will satisfy its requirements. Properties of interest for this kind of system are, for example, the consumption of resources such as memory, energy, time and CPU. The property of interest in my research is the execution time, mainly the worst-case execution time of a composite, and how it can be predicted from the execution times of the components.

Execution Time Analysis

The timing characteristics of a program are of fundamental importance to the successful design and execution of embedded systems that have real-time characteristics. One key timing measure is the worst-case execution time of a program. A WCET analysis is based on the assumption that the program is executed in isolation from beginning to end without interruption, and has the goal of finding an upper bound on the worst possible execution time of the program. Reliable WCET estimates are a key component when designing and verifying real-time systems, and are needed in hard real-time systems development to perform, for example, scheduling and schedulability analysis. The WCET is often estimated through measurements, and so is not reliable in general, as it is hard to find the input parameters that cause the worst execution time. An alternative is static WCET analysis, which determines a timing bound from mathematical models of the software and hardware. If the models are correct, then the analysis will derive a safe timing bound. For simple architectures, the analysis gives good results. Nowadays, the trend in embedded systems is to use more complex microprocessor architectures with more advanced features that enhance the overall performance of the system, providing more functionality to the user. In these processors, data and history dependencies mean that the execution time of some instructions is no longer constant, and this is a big problem when estimating the worst-case execution time of a piece of code. While for some applications a safe estimation is needed, for others a degradation in quality of service is tolerable provided that it stays within a certain limit. For these kinds of soft real-time systems (such as telecommunications and multimedia), we can look into approximate and probabilistic execution time models, for example finding the probability that the execution time will be greater than a certain value, and the goal of this work is to apply this to component-based embedded systems.

In earlier research, we found that hardware features have different effects on the execution time of a composite, compared with the execution times of the components in isolation [3]. This means that the execution time of a component depends on the hardware state left by an earlier component. The next step is to evaluate to what extent we can set aside this dependency and take the execution time of a composite as the convolution of the execution times of its components. Indeed, we found that it is a very good approximation, with the exception of a very few cases. The research was done in collaboration with Prof.
Raimundo Macêdo and his research group in the Laboratory for Distributed Systems (LaSiD) at the Federal University of Bahia, in Salvador, one of the oldest cities in Brazil. During the period from Nov. 2008 to Feb. 2009 I was able to visit the lab thanks to a grant provided by CUGS. The researchers in the lab have been doing research on probability distributions of response times for components within the ARCOS* component-based framework for distributed real-time systems. The group's expertise was a great contribution to the supervision of this part of my research.

* ARchitecture for COntrol and Supervision, http://arcos.sourceforge.net/

2. The Place

The City

The first Europeans arrived in the region in 1510, and the city of Salvador, in the state of Bahia, was founded in 1549. It was the capital of colonial Brazil until 1763, when the capital moved to Rio de Janeiro (nowadays, the capital is Brasília). Today it has about three million inhabitants and is a famous tourist destination due to its good climate, big carnival and nice sunny beaches. The oldest part of the city is the Historic Centre (Figure 1), which was recently renovated to increase tourism. The people in the region are generally known to be very warm and happy, something with which I very much agree. The city is on the Atlantic coast, and I was living within walking distance of the Barra lighthouse and beach (Figure 2), where a reef barrier creates nice pools during low tide. As the region is rather hilly, the sea water can become dirty after a heavy rain, but it is clean again after a couple of sunny days.

Figure 1. Historic Centre of Salvador: Pelourinho, Lacerda's Lift and Market.
Figure 2. Barra's beach and lighthouse.

The University

The Federal University of Bahia (UFBA) has existed since 1946 as the union of several faculties and units. Some units are almost as old as the city, and the oldest started as a college founded by the Catholic Church in the 16th century. The faculties are not so old, with the faculty of medicine having started in 1808. Today the university has several campuses in the region, with about 4000 new undergraduate students having started in 2007. The Computer Science department is located in the Ondina campus (Figure 3), my workplace for four months. This campus is surrounded by a small stretch of the Atlantic forest (Figure 4), with lots of singing birds and small primates, and green all year round due to the frequent rain. Today, less than ten percent of the original forest remains in the country, but it still has more biodiversity than the Amazon forest.

Figure 3. Ondina campus' main entrance and the Data Processing Centre, where the LaSiD lab is located.
Figure 4. Remnants of the Atlantic forest, with plenty of birds and small monkeys jumping around.

The Laboratory

The Laboratory for Distributed Systems (LaSiD) is located in the building of the Data Processing Centre, in the Ondina campus, and is coordinated by Prof. Raimundo Macêdo. The lab is responsible for the master's programme in mechatronics, and the researchers have projects in real-time systems, communication protocols, hybrid distributed systems, and other subjects, some in collaboration with industry. The department started a PhD programme in Computer Science in 2007 in collaboration with two other universities. In the lab, I was using an AMD64 desktop computer running Debian (Figure 5), on which I ran my simulations. In the project, I worked in close collaboration with Prof. Raimundo Macêdo, Prof. George Lima and Prof. Verônica Lima, to whom I am very thankful for having found time in their busy schedules to help me.

Figure 5. Me and the computer, which used to complain when it had to work during the weekends: usually there was the message "CPU overheating" on the screen on Monday morning.
3. Research Activities and Results

The research was about the execution time of components and composites, and the goal was to analyse the dependency between the execution time and the hardware state. We know that the hardware state in which a program starts execution has an effect on the total execution time of the program, and so the execution time of the component c2 in the composition c1;c2 depends on the hardware state left behind by the execution of the component c1. In this work, we wanted to know to what extent this dependence can be neglected by taking the execution time of a composite as the convolution of the execution times of its components, considered as independent random variables. The Kolmogorov-Smirnov goodness-of-fit test is used to evaluate how close the convolution is to the real execution time of the composition.

Convolution of execution times: let p1 be the probability that the execution time of c1 is t1, and p2 be the probability that the execution time of c2 is t2. Then the convolution of the two random variables gives the probability that the execution time of the composition c1;c2 is t3. If the two execution times are independent, the convolution is obtained by summing the product p1 · p2 over all pairs (t1, t2) with t1 + t2 = t3, that is, P(T(c1;c2) = t3) = Σ_t1 P(T(c1) = t1) · P(T(c2) = t3 − t1) (see the sketches below).

Benchmarks: we selected programs from the MiBench benchmark suite [2] to act as components, and for the composition we used sequential execution of code. For the execution environment, we used the SimpleScalar tool chain [1]: it simulates a complete microprocessor and allows more than 40 input parameters to configure the architecture. It outputs a large number of statistics, and the most important one for this work is the number of cycles taken to execute a program.

Goodness-of-fit test: the Kolmogorov-Smirnov (KS) goodness-of-fit test [4] evaluates the following hypothesis H0: the distribution of the data in the population that T(c1;c2) is derived from is consistent with the distribution of the data in the population that conv(T(c1), T(c2)) is derived from. The test requires that we construct the empirical cumulative distribution function (ECDF) for each of the samples. In our case, the ECDF gives the probability that the execution time of a component is less than or equal to a certain value x. The test is based on the distance between the ECDF of T(c1;c2) and that of conv(T(c1), T(c2)). Figure 6 shows an example with a sample of size 100. In this figure, Fn(x) is the ECDF, the probability that the execution time is less than or equal to x. In order to construct the plot, we ran the components c1 and c2 in the SimpleScalar tool set, each with 100 different input samples. Then we also ran the composition c1;c2. Their execution times were recorded and the convolution conv(T(c1), T(c2)) was calculated from the execution times of c1 and c2. The result of the KS test is the greatest vertical distance D between the two empirical distribution functions, together with a probability value called the p-value, which is the probability of finding a distance bigger than (or equal to) D in the population of execution times for this case. The values of D and the p-value are found based only on the samples, so the bigger the number of simulations, the more reliable the results.
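To make the measurement setup more concrete, the following sketch shows how the cycle counts could be collected by running each benchmark binary under SimpleScalar's sim-outorder simulator once per input sample. It is only an illustration: the binary and input file names are made up, and the exact statistics format and command-line flags (for example -bpred for the branch predictor) may differ between SimpleScalar versions.

    # Hypothetical measurement loop: run a benchmark under sim-outorder once
    # per input sample and record the simulated cycle count.
    import re
    import subprocess

    def run_simplescalar(binary, input_file, extra_flags=()):
        """Run sim-outorder on one input and return the simulated cycle count."""
        cmd = ["sim-outorder", *extra_flags, binary, input_file]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        # The statistics are assumed to contain a line starting with "sim_cycle";
        # sim-outorder normally prints its statistics to stderr.
        match = re.search(r"sim_cycle\s+(\d+)", result.stderr + result.stdout)
        if match is None:
            raise RuntimeError("could not find sim_cycle in simulator output")
        return int(match.group(1))

    # 100 input samples per component, as in the experiments described above.
    inputs = [f"input_{i}.dat" for i in range(100)]                   # made-up names
    t_c1 = [run_simplescalar("./adpcm", f) for f in inputs]           # T(c1)
    t_c2 = [run_simplescalar("./primes", f) for f in inputs]          # T(c2)
    t_comp = [run_simplescalar("./adpcm_primes", f) for f in inputs]  # T(c1;c2)
    # A re-run with a different branch predictor might look like:
    # run_simplescalar("./fft_mm", "input_0.dat", extra_flags=("-bpred", "2lev"))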
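Treating the measured execution times as samples of independent random variables, conv(T(c1), T(c2)) can be approximated by the distribution of all pairwise sums of the two samples, which is the discrete convolution described above. A minimal sketch, reusing the hypothetical variables t_c1 and t_c2 from the previous listing:

    import numpy as np

    def convolution_samples(t1, t2):
        """Approximate conv(T(c1), T(c2)): if the two execution times are
        independent, the time of the composition c1;c2 is distributed as their
        sum, so every pairwise sum t1[i] + t2[j] is an equally likely outcome."""
        t1 = np.asarray(t1)
        t2 = np.asarray(t2)
        return (t1[:, None] + t2[None, :]).ravel()

    conv_samples = convolution_samples(t_c1, t_c2)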
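Finally, the measured distribution of T(c1;c2) can be compared with the convolution using a two-sample Kolmogorov-Smirnov test, which yields the greatest ECDF distance D and the p-value reported in Figures 6 to 8. The sketch below uses scipy as one possible implementation and continues from the hypothetical variables above; the significance level is an assumption, as the report does not state one.

    import numpy as np
    from scipy import stats

    def ecdf(samples):
        """Empirical CDF: returns sorted values x and Fn(x) = P(T <= x)."""
        x = np.sort(np.asarray(samples))
        fn = np.arange(1, len(x) + 1) / len(x)
        return x, fn

    x_comp, fn_comp = ecdf(t_comp)        # the curve for T(a;b) in the figures
    x_conv, fn_conv = ecdf(conv_samples)  # the curve for conv(T(a), T(b))

    # Greatest vertical distance D between the two ECDFs, and the p-value.
    D, p_value = stats.ks_2samp(t_comp, conv_samples)

    # H0 (same underlying distribution) is rejected when the p-value falls
    # below the chosen significance level (assumed here, not given in the report).
    alpha = 0.05
    print(f"D = {D:.3f}, p-value = {p_value:.3f}, reject H0: {p_value < alpha}")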
From a set of more than 100 compositions, only two hypotheses were rejected by the KS test. One of them is shown in Figure 7. A three-day simulation using several combinations of the hardware configuration parameters allowed us to identify which parameter had the most weight in the rejection. The combinations were chosen in accordance with the methodology called fractional factorial design, and in these two cases the branch predictor algorithm was mainly responsible for the discrepancy. A new simulation with the branch predictor algorithm in the SimpleScalar tool changed to 2-level adaptive was enough to pass the test and have the hypothesis accepted (see Figure 8).

Figure 6. Composition of two components (adpcm;primes): we accept the hypothesis that the convolution has the same distribution as the execution time of the composition (D = 0.046, p-value = 0.984). The plot shows the ECDFs Fn(x) of T(a;b) and conv(T(a), T(b)) against execution time in megacycles.
Figure 7. For this composition (fft;mm2), using perfect branch prediction in the simulation, the hypothesis is rejected (D = 0.319, p-value = 0).
Figure 8. The hypothesis for the same composition as in Figure 7 is accepted when we use the 2-level adaptive branch prediction algorithm in the hardware simulator (fft;mm3, D = 0.079, p-value = 0.912).

4. Conclusions

The visit to the LaSiD lab was a great experience, not only because of the research done, in which I learned a lot about statistical methods and their application to the analysis of execution times, but also because of the warm reception from the researchers. This kind of visit helps build strong ties and future collaborations, and I strongly encourage students to do similar things if they have the opportunity. Such visits can be very pleasant and rewarding.

References

[1] Todd Austin, Eric Larson, and Dan Ernst. SimpleScalar: An infrastructure for computer system modeling. Computer, 35(2):59–67, 2002.
[2] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In WWC '01: Proceedings of the 2001 IEEE International Workshop on Workload Characterization (WWC-4), pages 3–14, Washington, DC, USA, 2001. IEEE Computer Society.
[3] Marcelo Santos and Björn Lisper. Evaluation of an additive WCET model for software components. In WTR 2008: 10th Brazilian Workshop on Real-Time and Embedded Systems, Rio de Janeiro, Brazil, May 2008.
[4] David J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures, Third Edition. Chapman & Hall/CRC, 2004.
[5] Frank Vahid and Tony D. Givargis. Embedded System Design: A Unified Hardware/Software Introduction. Wiley, 2002.