Curriculum Vitæ et Studiorum

Name: Andrea Marongiu
Date of birth: January 30, 1978
Citizenship: Italian
Email: a.marongiu@iis.ee.ethz.ch, a.marongiu@unibo.it
Web page: http://www-micrel.deis.unibo.it/~marongiu/
Address: Swiss Federal Institute of Technology, Department of Information Technology and Electrical Engineering, Gloriastrasse 35, 8092, Zurich, Switzerland
Phone: +41 44 632 6087

Short Bio

Dr. Andrea Marongiu received the PhD degree in electronic engineering from the University of Bologna, Italy, in 2010. He currently holds a postdoc position at ETH Zurich, Switzerland and at the University of Bologna. His research interests focus on programming models and architectures in the domain of heterogeneous multi- and many-core systems on a chip. This includes programming model, compiler and runtime support to efficiently address performance, predictability, energy and reliability issues in parallel, embedded systems, as well as HW-SW co-design of accelerator-based MPSoCs. In this field, he has published more than 70 papers in international peer-reviewed conferences and journals, with more than 400 citations and an h-index of 12 [Google Scholar]. He has collaborated with several international research institutes and companies.

Contents
Position and Education
Awards
Professional Activities
Teaching Activities
Research Interests
Publication List

Position and Education

Record of Employment
10/2013 – present: Postdoctoral Research Associate at the Dept. of Information Technology and Electrical Engineering (D-ITET) of the Swiss Federal Institute of Technology in Zurich (ETHZ).
09/2015 – present: Research Consultant at the Dept. of Electrical, Electronic and Information Engineering (DEI) of Università di Bologna.
05/2010 – 08/2015: Postdoctoral Research Associate at the Dept. of Electrical, Electronic and Information Engineering (DEI) of Università di Bologna.

Education
• Ph.D. degree in Electronics, Telecommunications and Information Technologies Engineering at Università di Bologna in 2010. Thesis title: Tecniche di ottimizzazione del software per sistemi su singolo chip per applicazioni di Nomadic Computing. Advisor: Prof. Luca Benini
• Laurea Degree in Electronic Engineering in 2005. Thesis title: Progetto e implementazione di un sistema di partizionamento hardware/software per architetture riconfigurabili. Advisors: Prof. Luigi Raffo, Prof. Salvatore Carta

Visiting Experiences
• Visiting researcher at INRIA Futurs - Parc Orsay Université, Orsay Cedex, France [Ref. Albert Cohen] (Jun-Sept 2008)
• Visiting researcher at Brown University - Dept. of Electronics, Providence, Rhode Island, United States [Ref. Prof. Iris R. Bahar, Prof. Maurice Herlihy] (Nov 2010 - May 2011)

Awards
AW.1. Best paper award: Paolo Burgio, Andrea Marongiu, Paolo Valente, Marko Bertogna, “A memory-centric approach to enable timing-predictability within embedded many-core accelerators,” ACM/IEEE/CSI Symposium on Real-Time and Embedded Systems and Technologies (RTEST), 2015. [IC.3]
AW.2. Best paper award: Francesco Conti, Chuck Pilkington, Andrea Marongiu, Luca Benini, “He-P2012: Architectural heterogeneity exploration on a scalable many-core platform,” 25th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2014. [IC.12]
AW.3.
Best paper award: Dimitra Papagiannopoulou, Tali Moreshet, Andrea Marongiu, Luca Benini, Maurice Herlihy, R. Iris Bahar, “Speculative synchronization for coherence-free embedded NUMA architectures,” International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (ICSAMOS), 2014. [IC.19]
AW.4. Best poster award: P. Burgio, A. Marongiu, L. Benini, “OpenMP extensions to exploit HW acceleration on shared-memory many-core clusters,” International Conference on Design, Automation and Test in Europe (DATE), 2013.
AW.5. Best paper award candidate: Cesare Ferri, Andrea Marongiu, Benjamin Lipton, Iris R. Bahar, Luca Benini, Maurice Herlihy, Tali Moreshet, “SoC-TM: Integrated HW/SW Support for Transactional Memory Programming on Embedded MPSoCs,” International Conference on Hardware/Software Codesign and System Synthesis (CODES), 2011. [IC.31]
AW.6. Best paper award candidate: Jaume Joven, Andrea Marongiu, Federico Angiolini, Luca Benini, Giovanni De Micheli, “Exploring programming model-driven QoS support for NoC-based platforms,” International Conference on Hardware/Software Codesign and System Synthesis (CODES), 2010. [IC.34]
AW.7. Best paper award candidate: Shivani Raghav, Martino Ruggiero, David Atienza, Christian Pinto, Andrea Marongiu, Luca Benini, “Scalable instruction set simulator for thousand-core architectures running on GPGPUs,” International Conference on High Performance Computing and Simulation (HPCS), 2010. [IC.37]

Professional Activities

Contribution to National and International Research Projects
• H2020-ICT-688860-HERCULES: High-performance real-time architectures for low-power embedded systems [Jan 2016 – ] http://hercules2020.eu/
  Local project leader: Prof.
Luca Benini (ETHZ)
  Role: Co-applicant, Work-Package Leader and Research Team Member (Level of involvement: HIGH)
  Activities: The project aims to achieve predictable performance on cutting-edge heterogeneous COTS multi-core platforms, with an order-of-magnitude improvement in cost and power consumption. The research at ETHZ, led by Andrea Marongiu, focuses on developing predictable execution models on top of existing programming models and compiler infrastructures.
• FP7-ICT-611016-P-SOCRATES: Parallel Software Framework for Time-Critical Many-core Systems [Oct 2013 – ] http://www.p-socrates.eu/
  Local project leader: Prof. Luca Benini (ETHZ)
  Role: Co-applicant, Work-Package Leader and Research Team Member (Level of involvement: HIGH)
  Activities: The project aims to allow applications with high-performance and real-time requirements to fully exploit the computation potential of many-core processors, while ensuring predictable performance and simplifying application development. The research at ETHZ, led by Andrea Marongiu, focuses on developing optimized runtime systems for parallel many-core accelerators and on exploring FPGA-based architectural extensions for improved main memory sharing.
• FP7-ICT-288574-VIRTICAL: SW/HW extensions for virtualized heterogeneous multicore platforms [Jul 2011 – Oct 2014]
  Local project leader: Prof. Luca Benini (UNIBO)
  Role: Co-applicant, Work-Package Leader and Research Team Member (Level of involvement: HIGH)
  Activities: The project targeted hardware and software extensions for virtualization of heterogeneous embedded multicore platforms. The research at UNIBO, led by Andrea Marongiu, focused on developing programming model extensions for efficient use of accelerators (programmable parallel systems or HW processing units) in a fully-virtualized SoC.
• FP7-ICT-248776-PRO3D: Programming for Future 3D Architecture with Many Cores [Jan 2010 - Dec 2012]
  Local project leader: Prof. Luca Benini (UNIBO)
  Role: Work-Package Leader and Research Team Member (Level of involvement: HIGH)
  Activities: The project aimed to enhance the programmability of future 3D multicore platforms. The activities focused on i) developing data mapping and distribution techniques in a 3D-stacked partitioned global address space (PGAS) machine; ii) contributing to the development of parallel simulation infrastructures running on general-purpose GPUs.
• FP7-IDEAS-ERC-291125-MULTITHERMAN: Multi-Scale Thermal Management of Computing Systems [Apr 2012 – ] http://www-micrel.deis.unibo.it/multitherman
  Local project leader: Prof. Luca Benini (UNIBO)
  Role: Research Team Member (Level of involvement: LOW)
  Activities: The project aims to move beyond the worst-case design practices adopted in traditional thermal planning and reactive thermal management by integrating thermal-aware platform design and thermal control with workload management and shaping in a distributed, multi-scale strategy. The activities focus on contributing to the development of runtime systems for distributed, parallel systems.
• ARTEMIS-100230-SMECY: Smart Multicore Embedded SYstems [Feb 2010 – Jan 2013]
  Local project leader: Prof. Luca Benini (UNIBO)
  Role: Task leader, Research Team Member (Level of involvement: MEDIUM)
  Activities: The project aimed to develop new programming technologies enabling the exploitation of many-core architectures (100s of cores). The activities focused on developing language- and compiler-level techniques for improved data locality in computations offloaded to a many-core architecture (STP2012/STHORM).
• FP7-ICT-224170-SHARE: Sharing Open Source Software Middleware to improve industry competitiveness in the embedded systems domain (CSA) [May 2008 - Apr 2010]
  Local project leader: Prof.
Luca Benini (UNIBO)
  Role: Work-Package Leader and Research Team Member (Level of involvement: HIGH)
  Activities: The project was a support action aimed at fostering the diffusion and adoption of open-source software. The activities focused on i) contributing to the creation of a web-based tool to comparatively evaluate existing open-source software; ii) organizing dissemination events and workshops to promote the initiative.

Scientific Collaborations (Beyond EU Projects)
• Collaborations with Italian academic institutions:
  – Politecnico di Milano - Italy (2015 - ongoing)
    Contact Person: Cristina Silvano
    Topic: Customized, self-adaptive low-power computing.
  – Politecnico di Torino - Italy (2008 - 2010)
    Contact Person: Andrea Acquaviva
    Topic: Compiler and runtime techniques for NBTI-aware workload distribution in MPSoCs.
  – University of Ferrara - Italy (2010 - ongoing)
    Contact Person: Davide Bertozzi
    Topic: Hardware-accelerated synchronization primitives for multi-cluster MPSoCs; MPSoC virtualization.
• Collaborations with international academic institutions:
  – Penn State University - Pennsylvania (2006 - 2007)
    Contact Person: Mahmut Kandemir
    Topic: Lightweight synchronization support for compiler-automated loop-level parallelization.
  – Penn State University - Pennsylvania (2016 - ongoing)
    Contact Person: Vijaykrishnan Narayanan
    Topic: Virtual shared memory performance on heterogeneous systems.
  – INRIA Futurs - Orsay Cedex, France (2008 - 2009)
    Contact Person: Albert Cohen
    Topic: Compiler support for transactional memory programming.
  – EPFL - Lausanne, Switzerland (2009 - 2010)
    Contact Person: Giovanni De Micheli
    Topic: Programming model-driven QoS in NoC-based MPSoCs.
  – EPFL - Lausanne, Switzerland (2010 - 2012)
    Contact Person: David Atienza
    Topic: GPGPU-accelerated simulation of many-core architectures.
  – Brown University - Providence, Rhode Island (2010 - ongoing)
    Contact Person: Iris R.
Bahar
    Topic: Integrated HW/SW support for transactional memory programming on embedded MPSoCs; transactional-memory-based support for variability-induced error tolerance.
  – Université de Bretagne Sud - Lorient, France (2011 - ongoing)
    Contact Person: Philippe Coussy
    Topic: Architecture and tools for HLS-generated HW processing units integrated in shared-memory MPSoCs.
  – University of California, San Diego - California (2012 - 2015)
    Contact Person: Rajesh Gupta
    Topic: Architecture and programming model support for variability tolerance in on-chip manycores.
  – Universitat Politècnica de València - Spain (2011 - 2014)
    Contact Person: José Flich
    Topic: Network-on-Chip and operating system support for many-core accelerator virtualization in heterogeneous embedded systems.

Industrial Collaborations
• ST Microelectronics [2010-2012]
  Topic: Technical leader for a research collaboration agreement on the integration of shared-memory tightly-coupled accelerators in the STHORM heterogeneous on-board system. The research project focused on architecture support for shared-memory heterogeneous computing, as well as programming model support for simplified development of accelerated applications.
• ST Microelectronics [2010-2012]
  Topic: Technical leader for a research collaboration agreement on supporting the OpenMP programming model on the STHORM heterogeneous on-board system. The research project focused on the development of an optimized runtime system for the accelerator, plus a toolchain and Linux driver for enabling computation offloading from the ARM host system.
• Freescale Semiconductors Ltd. [2007]
  Topic: Technical contributor for a research collaboration agreement on the development of Linux kernel techniques for energy-efficient mobile devices.

Program Committee Membership
• DATE (2014 - 2016) - Design Automation and Test in Europe
• FPL (2015) - International Conference on Field-Programmable Logic and Applications
• EUC (2014 - 2015) - International Conference on Embedded and Ubiquitous Computing
• MCSoC (2014 - 2016) - International Symposium on Embedded Multicore/Many-core Systems-on-Chip
• DASIP (2013) - Design and Architectures for Signal and Image Processing
• SCOPES (2014 - 2016) - International Workshop on Software and Compilers for Embedded Systems
• SOMRES (2011) - Workshop on Synthesis and Optimization Methods for Real-Time Embedded Systems

Referee Services in Journals and Conferences
• Conferences and Workshops: FPL - International Conference on Field Programmable Logic and Applications; DATE - Design Automation and Test in Europe; ICECS - International Conference on Electronics, Circuits, and Systems; PACT - Parallel Architectures and Compilation Techniques; LCTES - Languages, Compilers, Tools and Theory for Embedded Systems; ICS - International Conference on Supercomputing; SCOPES - Workshop on Software and Compilers for Embedded Systems; CODES - International Conference on Hardware/Software Codesign and System Synthesis; EUC - Embedded and Ubiquitous Computing; DASIP - Design and Architectures for Signal and Image Processing; HIRES - High-performance and Real-time Embedded Systems; ETFA - Emerging Technologies and Factory Automation; MCSoC - International Symposium on Embedded Multicore/Many-core Systems-on-Chip; WEHA - International Workshop on Energy-aware high performance Heterogeneous Architectures and Accelerators.
• Journals: IEEE Transactions on Computers (TC), IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), IEEE Transactions on Industrial Informatics (TII), IEEE Transactions on Signal Processing (TSP), IEEE Transactions on Parallel and Distributed Systems (TPDS), ACM Transactions on Embedded Computing Systems (TECS), ACM Transactions on Design Automation of Electronic Systems (TODAES), ACM Transactions on Architecture and Code Optimization (TACO), ACM Transactions on Reconfigurable Technology and Systems (TRETS), ELSEVIER Microprocessors and Microsystems: Embedded Hardware Design (MICPRO), ELSEVIER Journal of System Architecture (JSA), ELSEVIER Computers & Electrical Engineering (COMPELECENG), ELSEVIER Journal of Systems and Software (JSS), ELSEVIER Journal of Future Generation Computer Systems (FGCS), ELSEVIER Simulation Modelling Practice and Theory (SIMPAT), ELSEVIER Microelectronics Journal (MEJ), ELSEVIER Integration, the VLSI Journal (VLSI), ELSEVIER Journal of Parallel and Distributed Computing (JPDC), SPRINGER International Journal of Parallel Programming (IJPP), SPRINGER Journal of Real-Time Image Processing (JRTIP), SPRINGER Journal of Supercomputing (JSUPE).

Teaching Activities
2013 - now: Teaching Assistant, Swiss Federal Institute of Technology in Zurich - Electronics Engineering Degree - Graduate level. Advanced System-on-Chip Design - Lectures, seminars, exercises.
2013 - now: Teaching Assistant, Università di Bologna - Electronics Engineering Degree - Undergraduate level. Hardware/Software Design Methodologies - Lectures, seminars, exercises.
2007 - 2013: Teaching Assistant, Università di Bologna - Electronics Engineering Degree - Graduate level. Metodologie di Progettazione Hardware/Software - Lectures, seminars, exercises.
2008: Guest Lecturer, Università di Verona - Computer Science Degree - Undergraduate level. Distributed Embedded Systems - Lectures, seminars.

Students' Supervision
PhD Students Supervision
• ETHZ: Bjorn Forsberg, Daniele Palossi, Pirmin Vogel.
• UNIBO: Giuseppe Tagliavini, Paolo Burgio, Francesco Conti, Alessandro Capotondi, Christian Pinto, Daniele Bortolotti.
Graduate Students Supervision/Co-Advising
• ETHZ: Alessandro Angelino, Roberto Roncone, Maheshwara Sharma.
• UNIBO: Daniele Cesarini, Maria Abrahamyan, Alessio Franceschelli, Francesco Conti, Alessandro Capotondi, Christian Pinto, Francesco Lucchi, Matteo Bruni.
Visiting Students Supervision/Co-Advising
• Master Students: Mariyah Abrahamian (Alari).
• PhD Students: Dimitra Papagiannopoulou (Brown University), Masoud Dehyadegari (University of Tehran), Abbas Rahimi (University of California, San Diego).

Research Interests
My main research interests are related to architectures and programming models for heterogeneous SoCs featuring multi/many-core processing units. Particular emphasis is on efficient exploitation of memory hierarchies and accelerators (HW processing units and GPU-like manycore co-processors). The activities carried out in recent years span the following research lines.

Architectures and Programming Models for Heterogeneous SoCs
Within the framework of the Parallel Ultra-Low-Power platform (PULP) project [1][2], the research team I supervise focuses on programming models and architectures for heterogeneous systems-on-chip (SoC).
The main activities include: i) the design of runtime systems (OpenMP, CUDA, OpenCL, OpenVX) and compiler techniques to address performance/energy issues [JR.14] [JR.8] [JR.3] [IC.25] [IC.24] [NC.9] and, more recently, predictability requirements [IC.7] [IC.4] [IC.3] [JR.6] [IC.15]; ii) the design of architectural and programming-level support for lightweight many-core accelerator virtualization [JR.2] [IC.5] [NC.3] [NC.6]; iii) the design of architectural and programming-level support for tightly-coupled shared memory HW accelerators [JR.7] [JR.5] [IC.12] [IC.20] [IC.27] [IC.28]; iv) the development of simulation infrastructures for many-core based heterogeneous systems [JR.9] [JR.1] [JR.11] [IC.29] [IC.32] [IC.37] [NC.8] [NC.11].

Transactional Memory
My activities in this field started in 2008, when I was involved in a project to develop support for TM programming in the GCC compiler, during an internship at INRIA Paris. At that time I also studied the applicability of transactional memory to speculative parallelization of irregular applications (e.g. sparse array reductions). Transactions were leveraged to protect concurrent update operations in critical sections within parallel reduction loops. An initial proof-of-concept implementation of the technique within the autoparallelization pass (tree-parloops) in GCC was also developed. My research in this field continued over the years through a collaboration with Brown University. A fully integrated HW/SW solution for transactional programming on embedded MPSoCs has been designed and developed within a virtual platform enabling full-system cycle-accurate simulation. The proposed HTM design leverages a dedicated module responsible for managing conflicts. This is achieved in a very lightweight and fast manner by employing Bloom filters. Application developers are not meant to interact directly with this HW module, nor to cope with low-level transactional programming APIs.
Transactional features are triggered through a set of compiler directives, implemented as an extension to the OpenTM programming model (compiler and runtime system) from Stanford University. Support for speculative parallelism is also provided to further improve ease of programming. Loops with non-independent iterations can be annotated for speculative parallel execution. The underlying TM system ensures that, in case a real dependence arises, the original sequential program semantics is preserved. This is achieved by forcing transactions to commit in program order, thanks to specific hardware support for prioritized commit [IC.31] [IC.19] [NC.10] [NC.13].

[1] www.pulp-platform.org
[2] www-micrel.deis.unibo.it/pulp-project/

HW/SW Support for Tolerating Variability-Induced Errors
In this field I am working in three main areas:
• the collaboration with Brown University is still ongoing, and the transactional memory support developed over the years is currently being adapted to support tolerance to variability-induced computation errors. This revisited use of traditional TM allows the platform to operate at reduced voltages, as the recovery mechanisms ensure that if an error occurs the transaction that experienced it can be safely aborted and rolled back, after the appropriate countermeasures in terms of voltage adjustment have been taken [IC.8].
• within a collaboration with the University of California, San Diego, we explored runtime support for cost-effective countermeasures against hardware timing failures during system operation. Instead of ultra-conservative multi-corner design margins or costly circuit-level error recovery mechanisms, we propose a variability-aware extension to the OpenMP v3.0 programming model. Using the notion of work-unit vulnerability (WUV) we capture timing errors caused by circuit-level variability as high-level software knowledge.
WUV provides a useful abstraction of hardware variability, which is used to efficiently allocate a given work-unit to a suitable core for execution [JR.10] [IC.6] [IC.21] [IC.23].
• in ultra-low-power embedded devices, aggressive voltage scaling techniques have the potential to reduce the power consumption within the admitted envelope, but memory operations on standard six-transistor static RAM (6T-SRAM) become unreliable. To cope with this problem we proposed hybrid memory systems coupling 6T-SRAM to standard cell memory (SCM). SCM stays reliable at low voltages, but is very costly and thus cannot fully replace SRAM. By providing programming model constructs to specify which data and computation exhibit inherent tolerance to computation errors, and hardware support to split error-tolerant data between SRAM and SCM, the memory system can be powered at a low voltage while ensuring correct operation by confining possible bit-flip errors to the LSBs only [IC.9].

Memory Management
Most embedded multi-processor systems on a chip (MPSoC) feature explicitly managed (scratchpad-based) memory hierarchies. I have explored efficient management of such systems via extensions to the popular OpenMP API to fit the constrained requirements of MPSoCs and to adhere to the Partitioned Global Address Space (PGAS) organization of the memory system often assumed in the targeted devices [JR.13] [IC.38] [IC.35] [IC.33].
The extensions can be summarized as follows:
• Features to trigger data distribution and data movement (additional directives);
• Compiler support for data distribution, based on lightweight array access instrumentation (software address translation) and DMA-based data transfer;
• A lightweight lookup mechanism based on compiler-generated metadata for low-cost distributed array references;
• An allocation compiler pass that exploits profile information on array access count to determine a data distribution scheme which captures data locality at each parallel region.

Publication List
International journals (#14), International conferences (#42), International workshops (#13)

JR.1. Daniele Bortolotti, Andrea Marongiu, Luca Benini, “VirtualSoC: a Research Tool for Modern MPSoCs,” ACM Transactions on Embedded Computing Systems [To appear], Sept. 2016.
JR.2. Alessandro Capotondi, Germain Haugou, Andrea Marongiu, Luca Benini, “Runtime Support for Multiple Offload-Based Programming Models on Embedded Manycore Accelerators,” IEEE Transactions on Emerging Topics in Computing [Preprint] 2016. [doi: http://doi.ieeecomputersociety.org/10.1109/TETC.2016.2554318]
JR.3. Andrea Marongiu, Alessandro Capotondi, Luca Benini, “Controlling NUMA effects in embedded manycore applications with lightweight nested parallelism support,” Parallel Computing (Elsevier) [Preprint] 2016. [doi: http://dx.doi.org/10.1016/j.parco.2016.02.002]
JR.4. Giuseppe Tagliavini, Germain Haugou, Andrea Marongiu, Luca Benini, “Optimizing memory bandwidth exploitation for OpenVX applications on embedded many-core accelerators,” Journal of Real-Time Image Processing (Springer), 2015. [doi: http://dx.doi.org/10.1007/s11554-015-0544-0]
JR.5. Francesco Conti, Andrea Marongiu, Chuck Pilkington, Luca Benini, “He-P2012: Performance and Energy Exploration of Architecturally Heterogeneous Many-Cores,” Journal of Signal Processing Systems (Springer), 2015. [doi: http://dx.doi.org/10.1007/s11265-015-1056-7]
JR.6.
Luís Miguel Pinho, Vincent Nélis, Patrick Meumeu Yomsi, Eduardo Quiñones, Marko Bertogna, Paolo Burgio, Andrea Marongiu, Claudio Scordino, Paolo Gai, Michele Ramponi, Michal Mardiak, “P-SOCRATES: A parallel software framework for time-critical many-core systems,” Microprocessors and Microsystems (Elsevier), 2015. [doi: http://dx.doi.org/10.1016/j.micpro.2015.06.004]
JR.7. Masoud Dehyadegari, Andrea Marongiu, Mohammad Reza Kakoee, Siamak Mohammadi, Nasser Yazdani, Luca Benini, “Architecture Support for Tightly-Coupled Multi-Core Clusters with Shared-Memory HW Accelerators,” IEEE Transactions on Computers, 2015. [doi: http://dx.doi.org/10.1109/TC.2014.2360522]
JR.8. Andrea Marongiu, Alessandro Capotondi, Giuseppe Tagliavini, Luca Benini, “Simplifying Many-Core-Based Heterogeneous SoC Programming With Offload Directives,” IEEE Transactions on Industrial Informatics, 2015. [doi: http://dx.doi.org/10.1109/TII.2015.2449994]
JR.9. Shivani Raghav, Martino Ruggiero, Andrea Marongiu, Christian Pinto, David Atienza, Luca Benini, “GPU Acceleration for Simulating Massively Parallel Many-core Platforms,” IEEE Transactions on Parallel and Distributed Systems, 2015. [doi: http://dx.doi.org/10.1109/TPDS.2014.2319092]
JR.10. Abbas Rahimi, Daniele Cesarini, Andrea Marongiu, Rajesh K. Gupta, Luca Benini, “Improving Resilience to Timing Errors by Exposing Variability Effects to Software in Tightly-Coupled Processor Clusters,” IEEE Journal on Emerging and Selected Topics in Circuits And Systems, 2014. [doi: http://dx.doi.org/10.1109/JETCAS.2014.2315883]
JR.11. Shivani Raghav, Andrea Marongiu, Christian Pinto, Martino Ruggiero, David Atienza, Luca Benini, “SIMinG-1k: A thousand-core simulator running on general-purpose graphical processing units,” Concurrency and Computation: Practice and Experience (Wiley), 2013. [doi: http://dx.doi.org/10.1002/cpe.2940]
JR.12.
Jaume Joven, Andrea Marongiu, Federico Angiolini, Luca Benini, Giovanni De Micheli, “An integrated, programming model-driven framework for NoC-QoS support in cluster-based embedded many-cores,” Parallel Computing (Elsevier), 2013. [doi: http://dx.doi.org/10.1016/j.parco.2013.06.002]
JR.13. Andrea Marongiu, Luca Benini, “An OpenMP Compiler for Efficient Use of Distributed Scratchpad Memory in MPSoCs,” IEEE Transactions on Computers, 2012. [doi: http://dx.doi.org/10.1109/TC.2010.199]
JR.14. Andrea Marongiu, Paolo Burgio, Luca Benini, “Supporting OpenMP on a multi-cluster embedded MPSoC,” Microprocessors and Microsystems (Elsevier), 2011. [doi: http://dx.doi.org/10.1016/j.micpro.2011.08.010]

Refereed International Conferences
IC.1. Francesco Conti, Daniele Palossi, Andrea Marongiu, Davide Rossi, Luca Benini, “Enabling the Heterogeneous Accelerator Model on Ultra-Low Power Microcontroller Platforms,” Design, Automation, and Test in Europe conference (DATE), 2016.
IC.2. Daniele Cesarini, Andrea Marongiu, Luca Benini, “An Optimized Task-Based Runtime System For Resource-Constrained Parallel Accelerators,” Design, Automation, and Test in Europe conference (DATE), 2016.
IC.3. Paolo Burgio, Andrea Marongiu, Paolo Valente, Marko Bertogna, “A memory-centric approach to enable timing-predictability within embedded many-core accelerators,” ACM/IEEE/CSI Symposium on Real-Time and Embedded Systems and Technologies (RTEST), 2015.
IC.4. Maria A. Serrano, Alessandra Melani, Roberto Vargas, Andrea Marongiu, Marko Bertogna, Eduardo Quiñones, “Timing characterization of OpenMP4 tasking model,” International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 2015.
IC.5. Pirmin Vogel, Andrea Marongiu, Luca Benini, “Lightweight virtual memory support for many-core accelerators in heterogeneous embedded SoCs,” International Conference on Hardware/Software Codesign and System Synthesis (CODES), 2015.
IC.6.
Abbas Rahimi, Daniele Cesarini, Andrea Marongiu, Rajesh K. Gupta, Luca Benini, “Task scheduling strategies to mitigate hardware variability in embedded shared memory clusters,” Design Automation Conference (DAC), 2015.
IC.7. Roberto Vargas, Eduardo Quiñones, Andrea Marongiu, “OpenMP and timing predictability: a possible union?,” Design, Automation, and Test in Europe conference (DATE), 2015.
IC.8. Dimitra Papagiannopoulou, Andrea Marongiu, Tali Moreshet, Luca Benini, Maurice Herlihy, R. Iris Bahar, “Playing with Fire: Transactional Memory Revisited for Error-Resilient and Energy-Efficient MPSoC Execution,” ACM Great Lakes Symposium on VLSI (GLSVLSI), 2015.
IC.9. Giuseppe Tagliavini, Davide Rossi, Andrea Marongiu, Luca Benini, “Synergistic Architecture and Programming Model Support for Approximate Micropower Computing,” IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2015.
IC.10. Giuseppe Tagliavini, Germain Haugou, Andrea Marongiu, Luca Benini, “ADRENALINE: An OpenVX Environment to Optimize Embedded Vision Applications on Many-core Accelerators,” IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 2015.
IC.11. Alessandro Capotondi, Andrea Marongiu, Luca Benini, “Enabling Scalable and Fine-Grained Nested Parallelism on Embedded Many-cores,” IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 2015.
IC.12. Francesco Conti, Chuck Pilkington, Andrea Marongiu, Luca Benini, “He-P2012: Architectural heterogeneity exploration on a scalable many-core platform,” 25th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2014.
IC.13. Paolo Burgio, Robin Danilo, Andrea Marongiu, Philippe Coussy, Luca Benini, “A tightly-coupled hardware controller to improve scalability and programmability of shared-memory heterogeneous clusters,” Design, Automation, and Test in Europe conference (DATE), 2014.
IC.14.
Paolo Burgio, Giuseppe Tagliavini, Francesco Conti, Andrea Marongiu, Luca Benini, “Tightly-coupled hardware support to dynamic parallelism acceleration in embedded shared memory clusters,” Design, Automation, and Test in Europe conference (DATE), 2014.
IC.15. Luís Miguel Pinho, Eduardo Quiñones, Marko Bertogna, Andrea Marongiu, Jorge Pereira Carlos, Claudio Scordino, Michele Ramponi, “P-SOCRATES: A Parallel Software Framework for Time-Critical Many-Core Systems,” Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), 2014.
IC.16. Paolo Burgio, Andrea Marongiu, Philippe Coussy, Luca Benini, “A HLS-Based Toolflow to Design Next-Generation Heterogeneous Many-Core Platforms with Shared Memory,” IEEE International Conference on Embedded and Ubiquitous Computing (EUC), 2014.
IC.17. Francesco Conti, Chuck Pilkington, Andrea Marongiu, Luca Benini, “He-P2012: architectural heterogeneity exploration on a scalable many-core platform,” ACM Great Lakes Symposium on VLSI (GLSVLSI), 2014.
IC.18. Marco Balboni, Marta Ortín-Obón, Alessandro Capotondi, Hervé Tatenguem Fankem, Alberto Ghiribaldi, Luca Ramini, Víctor Viñals, Andrea Marongiu, Davide Bertozzi, “Augmenting manycore programmable accelerators with photonic interconnect technology for the high-end embedded computing domain,” IEEE/ACM International Symposium on Networks-on-Chip (NOCS), 2014.
IC.19. Dimitra Papagiannopoulou, Tali Moreshet, Andrea Marongiu, Luca Benini, Maurice Herlihy, R. Iris Bahar, “Speculative synchronization for coherence-free embedded NUMA architectures,” International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (ICSAMOS), 2014.
IC.20. Francesco Conti, Andrea Marongiu, Luca Benini, “Synthesis-friendly techniques for tightly-coupled integration of hardware accelerators into shared-memory multi-core clusters,” International Conference on Hardware/Software Codesign and System Synthesis (CODES), 2013.
IC.21.
Abbas Rahimi, Andrea Marongiu, Rajesh Gupta, Luca Benini, “A Variability-Aware OpenMP Environment for Efficient Execution of Accuracy-Configurable Computation on Shared-FPU Processor Clusters,” International Conference on Hardware/Software Codesign and System Synthesis (CODES), 2013.
IC.22. Paolo Burgio, Andrea Marongiu, Robin Danilo, Philippe Coussy, Luca Benini, “Architecture and Programming Model Support for Efficient Heterogeneous Computing on Tightly-Coupled Shared-Memory Clusters,” Design and Architectures for Signal and Image Processing (DASIP), 2013.
IC.23. Abbas Rahimi, Andrea Marongiu, Paolo Burgio, Rajesh K. Gupta, Luca Benini, “Variation-tolerant OpenMP tasking on tightly-coupled processor clusters,” Design, Automation, and Test in Europe conference (DATE), 2013.
IC.24. Paolo Burgio, Giuseppe Tagliavini, Andrea Marongiu, Luca Benini, “Enabling fine-grained OpenMP tasking on tightly-coupled shared memory clusters,” Design, Automation, and Test in Europe conference (DATE), 2013.
IC.25. Andrea Marongiu, Paolo Burgio, Luca Benini, “Fast and Lightweight Support for Nested Parallelism on Cluster-Based Embedded Many-Cores,” Design, Automation, and Test in Europe conference (DATE), 2012.
IC.26. José L. Abellán, Daniele Bortolotti, Andrea Marongiu, Davide Bertozzi, Juan Fernández, Manuel E. Acacio, Luca Benini, “Design of a Collective Communication Infrastructure for Barrier Synchronization in Cluster-Based Nanoscale MPSoCs,” Design, Automation, and Test in Europe conference (DATE), 2012.
IC.27. Paolo Burgio, Andrea Marongiu, Dominique Heller, Cyrille Chavet, Philippe Coussy, Luca Benini, “OpenMP-based Synergistic Parallelization and HW Acceleration for On-Chip Shared-Memory Clusters,” Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), 2012.
IC.28.
Masoud Dehyadegari, Andrea Marongiu, Mohammad Reza Kakoee, Luca Benini, Siamak Mohammadi, Naser Yazdani, “A Tightly-Coupled Multi-Core Cluster with Shared-Memory HW Accelerators,” International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (ICSAMOS), 2012.
IC.29. Christian Pinto, Shivani Raghav, Andrea Marongiu, Martino Ruggiero, David Atienza, Luca Benini, “GPGPU-Accelerated Parallel and Fast Simulation of Thousand-Core Platforms,” 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2011.
IC.30. Alessio Franceschelli, Paolo Burgio, Giuseppe Tagliavini, Andrea Marongiu, Martino Ruggiero, Michele Lombardi, Alessio Bonfietti, Michela Milano, Luca Benini, “MPOpt-Cell: a high-performance data-flow programming environment for the CELL BE processor,” 8th ACM International Conference on Computing Frontiers, 2011.
IC.31. Cesare Ferri, Andrea Marongiu, Benjamin Lipton, Iris R. Bahar, Luca Benini, Maurice Herlihy, Tali Moreshet, “SoCTM: Integrated HW/SW Support for Transactional Memory Programming on Embedded MPSoCs,” International Conference on Hardware/Software Codesign and System Synthesis (CODES), 2011.
IC.32. Daniele Bortolotti, Francesco Paterna, Christian Pinto, Andrea Marongiu, Martino Ruggiero, Luca Benini, “Exploring instruction caching strategies for tightly-coupled shared-memory clusters,” International Symposium on System on Chip (SoC), 2011.
IC.33. Andrea Marongiu, Paolo Burgio, Luca Benini, “Vertical stealing: robust, locality-aware do-all workload distribution for 3D MPSoCs,” International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 2010.
IC.34. Jaume Joven, Andrea Marongiu, Federico Angiolini, Luca Benini, Giovanni De Micheli, “Exploring programming model-driven QoS support for NoC-based platforms,” International Conference on Hardware/Software Codesign and System Synthesis (CODES), 2010.
IC.35.
Andrea Marongiu, Martino Ruggiero, Luca Benini, “Efficient OpenMP data mapping for multicore platforms with vertically stacked memory,” Design, Automation, and Test in Europe conference (DATE), 2010.
IC.36. Andrea Marongiu, Paolo Burgio, Luca Benini, “Evaluating OpenMP Support Costs on MPSoCs,” Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), 2010.
IC.37. Shivani Raghav, Martino Ruggiero, David Atienza, Christian Pinto, Andrea Marongiu, Luca Benini, “Scalable instruction set simulator for thousand-core architectures running on GPGPUs,” International Conference on High Performance Computing and Simulation (HPCS), 2010.
IC.38. Andrea Marongiu, Luca Benini, “Efficient OpenMP support and extensions for MPSoCs with explicitly managed memory hierarchy,” Design, Automation, and Test in Europe conference (DATE), 2009.
IC.39. Andrea Marongiu, Andrea Acquaviva, Luca Benini, “OpenMP Support for NBTI-Induced Aging Tolerance in MPSoCs,” Stabilization, Safety and Security of Distributed Systems (SSS), 2009.
IC.40. Andrea Marongiu, Luca Benini, Andrea Acquaviva, Andrea Bartolini, “Analysis of Power Management Strategies for a Large-Scale SoC Platform in 65nm Technology,” Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), 2008.
IC.41. Andrea Marongiu, Luca Benini, Mahmut T. Kandemir, “Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms,” International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 2007.
IC.42. Giovanni Busonera, Salvatore Carta, Andrea Marongiu, Luigi Raffo, “Automatic Application Partitioning on FPGA/CPU Systems Based on Detailed Low-Level Information,” Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), 2006.

REFEREED INTERNATIONAL WORKSHOPS

NC.1.
Daniele Palossi, Andrea Marongiu, “Exploring Single-Source Shortest Path Parallelization on Shared Memory Accelerators,” 19th International Workshop on Software and Compilers for Embedded Systems (SCOPES), 2016.
NC.2. Alessandro Capotondi, Germain Haugou, Andrea Marongiu, Luca Benini, “Runtime Support for Multiple Offload-Based Programming Models on Embedded Manycore Accelerators,” International Workshop on Code Optimization for Multi and Many Cores (COSMIC), 2015.
NC.3. Pirmin Vogel, Andrea Marongiu, Luca Benini, “An Evaluation of Memory Sharing Performance for Heterogeneous Embedded SoCs with Many-Core Accelerators,” International Workshop on Code Optimization for Multi and Many Cores (COSMIC), 2015.
NC.4. Giuseppe Tagliavini, Germain Haugou, Andrea Marongiu, Luca Benini, “A framework for optimizing OpenVX applications performance on embedded manycore accelerators,” 18th International Workshop on Software and Compilers for Embedded Systems (SCOPES), 2015.
NC.5. Hayder Al-Khalissi, Mladen Berekovic, Andrea Marongiu, “On the Relevance of Architectural Awareness for Efficient Fork/Join Support on Cluster-Based Manycores,” International Workshop on Manycore Embedded Systems (MES), 2014.
NC.6. Christian Pinto, Andrea Marongiu, Luca Benini, “A Virtualization Framework for IOMMU-less Many-Core Accelerators,” International Workshop on Manycore Embedded Systems (MES), 2014.
NC.7. Vincent Nélis, Patrick Meumeu Yomsi, Luís Miguel Pinho, José Carlos Fonseca, Marko Bertogna, Eduardo Quiñones, Roberto Vargas, Andrea Marongiu, “The Challenge of Time-Predictability in Modern Many-Core Architectures,” 14th International Workshop on Worst-Case Execution Time Analysis (WCET), 2014.
NC.8. Daniele Bortolotti, Christian Pinto, Andrea Marongiu, Martino Ruggiero, Luca Benini, “VirtualSoC: A Full-System Simulation Environment for Massively Parallel Heterogeneous System-on-Chip,” International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2013.
NC.9.
Andrea Marongiu, Alessandro Capotondi, Giuseppe Tagliavini, Luca Benini, “Improving the programmability of STHORM-based heterogeneous systems with offload-enabled OpenMP,” International Workshop on Manycore Embedded Systems (MES), 2013.
NC.10. Dimitra Papagiannopoulou, R. Iris Bahar, Tali Moreshet, Maurice Herlihy, Andrea Marongiu, Luca Benini, “Transparent and energy-efficient speculation on NUMA architectures for embedded MPSoCs,” International Workshop on Manycore Embedded Systems (MES), 2013.
NC.11. Shivani Raghav, Andrea Marongiu, Christian Pinto, David Atienza, Martino Ruggiero, Luca Benini, “Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting,” Workshop on General Purpose Processing with Graphics Processing Units (GPGPU-5), 2012.
NC.12. Hayder Al-Khalissi, Andrea Marongiu, Mladen Berekovic, “Low-Overhead Barrier Synchronization for OpenMP-like Parallelism on the Single-Chip Cloud Computer,” Many-core Applications Research Community (MARC) Symposium, 2012.
NC.13. Martin Schindewolf, Albert Cohen, Wolfgang Karl, Andrea Marongiu, Luca Benini, “Towards Transactional Memory Support for GCC,” International Workshop on GCC Research Opportunities (GROW), 2009.

Date 13/06/2016