Visiting EnyAC

by Ivan Ukhov
Embedded Systems Laboratory
Linköping University
2015

Introduction

In 2015, I paid a visit to a research group located at Carnegie Mellon University (CMU), Pittsburgh, PA, USA. The group in question is called EnyAC (the name is derived from ENIAC, the first general-purpose electronic computer), and I was honored to be a part of EnyAC from early May till late August, which was just the perfect timing weather-wise. In this short essay, I am extremely pleased to share with the reader some of the details of this four-month visit.

To begin with, I would like to say a few words about the group itself. EnyAC is a notable research group whose work is always anticipated and highly appreciated by the respective community. Its strength is in the talented individuals and in the adamant leadership. Both are merits of Professor Diana Marculescu, who is the group’s leader. EnyAC develops tools, frameworks, and methodologies for sustainable computing, computing for sustainability, and life-science applications. Since its inception in 2000, the group has worked on novel computing paradigms and computer-aided design tools for energy-, variability-, and reliability-aware computing, all aiming at sustainable computing.

In what follows, I would like to explain what drove me to undertake this transcontinental journey and to highlight the major checkpoints of this undertaking.

Motivation

My desire to visit EnyAC was due to several factors. First of all, EnyAC’s research is tightly related to my PhD studies in the Embedded Systems Laboratory (ESLAB) at Linköping University (LiU). Therefore, I became familiar with their work fairly early on, and I inevitably came to admire it. The trip was then a great opportunity for me to meet in person the people that I looked up to. Second, the visit would allow me to develop further the research ideas that I had at that time and to explore new ones and incorporate them into my work.
Third, the visit promised to be, and undoubtedly was, a source of valuable experience for a researcher since one would get a chance to collaborate closely with people in a completely different research environment; after all, it was a different university in a different country, with all the cultural and social implications included. Needless to say, CMU is one of the most prestigious universities in the US, and this collaboration was also important to ESLAB, to the Department of Computer and Information Science, and to LiU in general.

Agenda

The topic of my research is uncertainty quantification for electronic-system design. The core problem that this line of research is concerned with is the uncertainty faced by the designer of electronic systems. In many cases, this uncertainty is inherent and inevitable. One of the reasons for this state of affairs is process variation: the properties of fabricated dies deviate from the nominal ones since process parameters cannot be controlled precisely using the technologies currently employed in the fabrication process. Another reason is aging: the performance of an electronic system degrades over time due to natural or accelerated wear. Yet another, and perhaps the most consequential, reason is workload variation: the demand on a system in the field is rarely, if ever, known in the lab. Such uncertainties have to be properly dealt with in order to achieve effectiveness, efficiency, and robustness, and this is the theme that pervades my research.

Prior to my visit to EnyAC, the techniques that we were developing at ESLAB together with my advisors Professor Zebo Peng and Professor Petru Eles were meant to be used in off-chip settings, that is, offline, at design time. The agenda for the trip was to consider on-chip settings as well since being on the chip opens up additional means of dealing with the aforementioned uncertainty.
The attractiveness of this route is due to the fact that only one particular fabricated piece of hardware in one particular environment needs to be considered. More importantly, being on the chip enables adaptation. This means that instances of the system are treated on a case-by-case basis: each instance attains a custom-built, fine-tuned solution, and this solution evolves over time, diligently following any changes.

Research

During the first month of my visit, I was primarily preoccupied with reading the literature on dynamic management of electronic systems. I paid particular attention to proactive management since it is considered to be the most efficient kind. A proactive management strategy tries to predict the future state of the system under consideration and act in accordance with this prediction. Such strategies are largely about making good use of on-chip data as these data capture what the system is actually going through, which is invaluable for forecasting.

Nowadays, “making good use of … data” is firmly, and not without a strong reason, associated with the fields of statistics and machine learning. These fields provide the tools for learning from data and drawing well-grounded conclusions with respect to the generative process underlying these data. These tools are at the heart of proactive management strategies. It should be clear, however, that having data at one’s disposal is an unconditional prerequisite to any such tool.

My foremost task was then to set up a research environment for myself that would allow me to explore and experiment with data-driven techniques. This primarily means that I needed a stable supply of the data that a potential management strategy would leverage in order to attain its management objectives.
In such cases, computer simulators are of great help since real hardware often is unavailable (for technical, financial, or other reasons), is not an appropriate place for early experimentation, or gets in the way of fluid exploration. In order to be useful for learning purposes, simulations should be sufficiently detailed so that they capture well the characteristics of real systems. Although detailed simulations are practical for the design of individual components, such simulations fall short when it comes to complex systems. A modern electronic system is reasonably complex, and it might take days for a state-of-the-art simulator to simulate a program that runs on such a system for only a short wall-clock time. This scheme is not affordable for designing learning-based management strategies as learning requires many simulations with potentially large payloads.

All in all, the situation was daunting for a researcher trying to leverage the rich machinery of statistics and machine learning. The need for alternative sources of high-quality data was pressing. More broadly, there was a clear need for systematic assistance in obtaining data for learning-based research studies, a conclusion that I drew from the literature and from my personal experience. Providing such assistance became a research project in its own right, which occupied the rest of my stay in Pittsburgh, as I will describe next.

My advisors (Professor Zebo Peng and Professor Petru Eles at LiU and Professor Diana Marculescu at CMU) and I decided to take the time to develop a methodology for fast generation of on-chip data with data-powered applications in mind, like those that we originally had. More concretely, we set out to construct a fast technique for producing power and temperature data of electronic systems, which would capture well the idiosyncrasies of real data and, hence, would be suitable for devising learning-based management strategies.
We focused on power and temperature because power consumption and heat dissipation are of immense importance for the well-being of electronic systems. Power is tightly related to energy, and energy translates readily to hours of battery life and to electricity bills. Operating temperature, on the other hand, is one of the major causes of permanent damage, which necessitates the deployment of adequate cooling equipment, escalating overall product expenses. Under these circumstances, power and temperature are arguably the main parameters to consider when managing electronic systems.

Developing a methodology that would satisfy the imposed requirements, namely a high generation speed and a high level of realism, was not straightforward. Many aspects pertaining to the functioning of electronic systems and the interactions of such systems with the corresponding environments had to be carefully thought through. Nevertheless, by the end of the fourth month, all the components of our methodology were in place. Moreover, we had implemented a prototype of a toolchain embodying the methodology.

At the time I am writing these lines, I am obviously already back in Linköping, and we are planning to publish the methodology that we have developed and to polish and open-source the accompanying toolchain. We hope that our work will enable new and assist ongoing studies by making it easier to explore the potential of novel or revived data-driven techniques for analysis, prediction, and management of electronic systems.

Conclusion

The visit to the EnyAC research group was a highly valuable, productive experience from both personal and professional standpoints. I can confidently say that my expectations about the group, the university, and the country were well exceeded. I met many remarkable people, whom I enjoyed working with, and I will do my best to keep in touch with them.
Acknowledgments

I would like to thank the faculty and staff of the Department of Electrical and Computer Engineering at CMU for their warm hospitality. I would like to express my particular gratitude to Diana Marculescu, Zhuo Chen, and Ermao Cai for their extensive help with everything that I could possibly need. I would also like to thank Ina Fiterau, who is with the School of Computer Science at CMU, for her advice regarding machine learning. Lastly, I would like to acknowledge the Swedish National Graduate School in Computer Science for funding this visit.