Curriculum Vitæ et Studiorum
Name: Andrea Marongiu
Date of birth: January 30, 1978
Citizenship: Italian
Email: a.marongiu@iis.ee.ethz.ch, a.marongiu@unibo.it
Web page: http://www-micrel.deis.unibo.it/~marongiu/
Address: Swiss Federal Institute of Technology, Department of Information Technology and Electrical Engineering, Gloriastrasse 35, 8092, Zurich, Switzerland
Phone: +41 44 632 6087
Short Bio
Dr. Andrea Marongiu received the PhD degree in electronic engineering from the University of Bologna,
Italy, in 2010. He currently holds a postdoc position at ETH Zurich, Switzerland and at the University of
Bologna. His research interests focus on programming models and architectures in the domain of heterogeneous multi- and many-core systems on a chip. This includes programming model, compiler and runtime
support to efficiently address performance, predictability, energy and reliability issues in parallel, embedded
systems, as well as HW-SW co-design of accelerator-based MPSoCs. In this field, he has published more
than 70 papers in international peer-reviewed conferences and journals, with more than 400 citations
and an h-index of 12 [Google Scholar]. He has collaborated with several international research institutes and
companies.
Contents
Position and Education
Awards
Professional Activities
Teaching Activities
Research Interests
Publication List
Position and Education
RECORD OF EMPLOYMENT
10/2013 – present
Postdoctoral Research Associate at the Dept. of Information Technology and Electrical Engineering
(D-ITET) of the Swiss Federal Institute of Technology in Zurich (ETHZ).
09/2015 – present
Research Consultant at the Dept. of Electrical, Electronic and Information Engineering (DEI) of Università di Bologna.
05/2010 – 08/2015
Postdoctoral Research Associate at the Dept. of Electrical, Electronic and Information Engineering (DEI)
of Università di Bologna.
EDUCATION
• Ph.D. degree in Electronics, Telecommunications and Information Technologies Engineering at Università di Bologna in 2010.
Thesis title: Tecniche di ottimizzazione del software per sistemi su singolo chip per applicazioni di Nomadic Computing [Software optimization techniques for single-chip systems for Nomadic Computing applications],
Advisor: Prof. Luca Benini
• Laurea Degree in Electronic Engineering in 2005
Thesis title: Progetto e implementazione di un sistema di partizionamento hardware/software per architetture riconfigurabili [Design and implementation of a hardware/software partitioning system for reconfigurable architectures],
Advisors: Prof. Luigi Raffo, Prof. Salvatore Carta
VISITING EXPERIENCES
• Visiting researcher at INRIA Futurs - Parc Orsay Université, Orsay Cedex France [Ref. Albert Cohen]
(Jun-Sept 2008)
• Visiting researcher at Brown University - Dept. of Electronics, Providence, Rhode Island, United States
[Ref. Prof. Iris R. Bahar, Prof. Maurice Herlihy] (Nov 2010 - May 2011)
Awards
AW.1.
Best paper award: Paolo Burgio, Andrea Marongiu, Paolo Valente, Marko Bertogna, “A memory-centric approach to enable timing-predictability within embedded many-core accelerators,”
ACM/IEEE/CSI Symposium on Real-Time and Embedded Systems and Technologies (RTEST),
2015. [IC.3]
AW.2.
Best paper award: Francesco Conti, Chuck Pilkington, Andrea Marongiu, Luca Benini, “He-P2012: Architectural heterogeneity exploration on a scalable many-core platform,” 25th IEEE
International Conference on Application-specific Systems, Architectures and Processors (ASAP),
2014. [IC.12]
AW.3.
Best paper award: Dimitra Papagiannopoulou, Tali Moreshet, Andrea Marongiu, Luca Benini,
Maurice Herlihy, R. Iris Bahar, “Speculative synchronization for coherence-free embedded
NUMA architectures,” International Conference on Embedded Computer Systems: Architectures,
Modeling and Simulation (ICSAMOS), 2014. [IC.19]
AW.4.
Best poster award: P. Burgio, A. Marongiu, L. Benini. “OpenMP extensions to exploit HW acceleration on shared-memory many-core clusters,” International Conference on Design, Automation
and Test in Europe (DATE), 2013.
AW.5.
Best paper award candidate: Cesare Ferri, Andrea Marongiu, Benjamin Lipton, Iris R. Bahar, Luca Benini, Maurice Herlihy, Tali Moreshet, “SoC-TM: Integrated HW/SW Support for
Transactional Memory Programming on Embedded MPSoCs,” International Conference on Hardware/Software Codesign and System Synthesis (CODES), 2011. [IC.31]
AW.6.
Best paper award candidate: Jaume Joven, Andrea Marongiu, Federico Angiolini, Luca
Benini, Giovanni De Micheli, “Exploring programming model-driven QoS support for NoC-based platforms,” International Conference on Hardware/Software Codesign and System Synthesis
(CODES), 2010. [IC.34]
AW.7.
Best paper award candidate: Shivani Raghav, Martino Ruggiero, David Atienza, Christian Pinto,
Andrea Marongiu, Luca Benini, “Scalable instruction set simulator for thousand-core architectures
running on GPGPUs,” International Conference on High Performance Computing and Simulation
(HPCS), 2010. [IC.37]
Professional Activities
CONTRIBUTION TO NATIONAL AND INTERNATIONAL RESEARCH PROJECTS
• H2020-ICT-688860-HERCULES: High-performance real-time architectures for low-power embedded
systems [Jan 2016 – ]
http://hercules2020.eu/
Local project leader: Prof. Luca Benini (ETHZ)
Role: Co-applicant, Work-Package Leader and Research Team Member (Level of involvement: HIGH)
Activities: The project is aimed at achieving predictable performance on top of cutting-edge heterogeneous COTS multi-core platforms, with an order-of-magnitude improvement in the cost and power
consumption. The research done at ETHZ under the lead of Andrea Marongiu focuses on developing
predictable execution models on top of existing programming models and compiler infrastructures.
• FP7-ICT-611016-P-SOCRATES: Parallel Software Framework for Time-Critical Many-core Systems [Oct
2013 – ]
http://www.p-socrates.eu/
Local project leader: Prof. Luca Benini (ETHZ)
Role: Co-applicant, Work-Package Leader and Research Team Member (Level of involvement: HIGH)
Activities: The project is aimed at allowing applications with high-performance and real-time requirements to fully exploit the computation potential of many-core processors, whilst ensuring a predictable
performance and simplifying application development. The research done at ETHZ under the lead of
Andrea Marongiu focuses on developing optimized runtime systems for parallel many-core accelerators
and on exploring FPGA-based architectural extensions for improved main memory sharing.
• FP7-ICT-288574-VIRTICAL: SW/HW extensions for virtualized heterogeneous multicore platforms [Jul
2011 – Oct 2014]
Local project leader: Prof. Luca Benini (UNIBO)
Role: Co-applicant, Work-Package Leader and Research Team Member (Level of involvement: HIGH)
Activities: The project targeted hardware and software extensions for virtualization of heterogeneous
embedded multicore platforms. The research done at UNIBO under the lead of Andrea Marongiu focused
on developing programming model extensions for efficient use of accelerators (programmable parallel
systems or HW processing units) in a fully-virtualized SoC.
• FP7-ICT-248776-PRO3D: Programming for Future 3D Architecture with Many Cores [Jan 2010 - Dec
2012]
Local project leader: Prof. Luca Benini (UNIBO)
Role: Work-Package Leader and Research Team Member (Level of involvement: HIGH)
Activities: The project was aimed at enhancing the programmability of future 3D multicore platforms.
The activities focused on i) developing data mapping and distribution techniques in a 3D-stacked partitioned global address space (PGAS) machine; ii) contributing to the development of parallel simulation
infrastructures running on general-purpose GPUs.
• FP7-IDEAS-ERC-291125-MULTITHERMAN: Multi-Scale Thermal Management of Computing Systems
[Apr 2012 – ]
http://www-micrel.deis.unibo.it/multitherman
Local project leader: Prof. Luca Benini (UNIBO)
Role: Research Team Member (Level of involvement: LOW)
Activities: The project aims at moving beyond worst-case design practices adopted in traditional thermal
planning and reactive thermal management by integrating thermal-aware platform design, thermal control
with workload management and shaping in a distributed, multi-scale strategy. The activities focus on
contributing to the development of runtime systems for distributed, parallel systems.
• ARTEMIS-100230-SMECY: Smart Multicore Embedded SYstems [Feb 2010 – Jan 2013]
Local project leader: Prof. Luca Benini (UNIBO)
Role: Task leader, Research Team Member (Level of involvement: MEDIUM)
Activities: The project was aimed at developing new programming technologies enabling the exploitation of many-core architectures (100s of cores). The activities focused on developing language- and compiler-level techniques for improved data locality in computations offloaded to a many-core architecture (STP2012/STHORM).
• FP7-ICT-224170-SHARE: Sharing Open Source Software Middleware to improve industry competitiveness in the embedded systems domain (CSA) [May 2008 - Apr 2010]
Local project leader: Prof. Luca Benini (UNIBO)
Role: Work-Package Leader and Research Team Member (Level of involvement: HIGH)
Activities: The project was a support action aimed at fostering the diffusion and adoption of open-source
software. The activities focused on i) contributing to the creation of a web-based tool to evaluate existing open-source software in a comparative manner; ii) organizing dissemination events and workshops
to promote the initiative.
SCIENTIFIC COLLABORATIONS (BEYOND EU-PROJECTS)
• Collaborations with Italian academic institutions:
– Politecnico di Milano - Italy (2015 - ongoing)
Contact Person: Cristina Silvano
Topic: Customized, self-adaptive low-power computing.
– Politecnico di Torino - Italy (2008 - 2010)
Contact Person: Andrea Acquaviva
Topic: Compiler and runtime techniques for NBTI-aware workload distribution in MPSoCs.
– University of Ferrara - Italy (2010 - ongoing)
Contact Person: Davide Bertozzi
Topic: Hardware-accelerated synchronization primitives for multi-cluster MPSoCs; MPSoC virtualization.
• Collaborations with international academic institutions:
– Penn State University - Pennsylvania (2006 - 2007)
Contact Person: Mahmut Kandemir
Topic: Lightweight synchronization support for compiler-automated loop-level parallelization.
– Penn State University - Pennsylvania (2016 - ongoing)
Contact Person: Vijaykrishnan Narayanan
Topic: Virtual shared memory performance on heterogeneous systems.
– INRIA Futurs - Orsay Cedex, France (2008 - 2009)
Contact Person: Albert Cohen
Topic: Compiler support for transactional memory programming.
– EPFL - Lausanne, Switzerland (2009 - 2010)
Contact Person: Giovanni De Micheli
Topic: Programming model-driven QoS in NoC-based MPSoCs
– EPFL - Lausanne, Switzerland (2010 - 2012)
Contact Person: David Atienza
Topic: GPGPU-accelerated simulation of many-core architectures.
– Brown University - Providence, Rhode Island (2010 - ongoing)
Contact Person: Iris R. Bahar
Topic: Integrated HW/SW support for transactional memory programming on embedded MPSoCs;
Transactional-memory based support to variability-induced error tolerance.
– Université de Bretagne Sud - Lorient, France (2011 - ongoing)
Contact Person: Philippe Coussy
Topic: Architecture and tools for HLS-generated HW processing units integrated in shared-memory
MPSoCs.
– University of California, San Diego - California (2012 - 2015)
Contact Person: Rajesh Gupta
Topic: Architecture and programming model support for variability tolerance in on-chip manycores.
– Universitat Politècnica de València - Spain (2011 - 2014)
Contact Person: José Flich
Topic: Network-on-Chip and operating system support for many-core accelerator virtualization in
heterogeneous embedded systems.
INDUSTRIAL COLLABORATIONS
• ST Microelectronics [2010-2012]
Topic: Technical leader for a research collaboration agreement on the integration of shared-memory
tightly-coupled accelerators in the STHORM heterogeneous on-board system. The research project focused on architecture support for shared-memory heterogeneous computing, as well as programming
model support for simplified development of accelerated applications.
• ST Microelectronics [2010-2012]
Topic: Technical leader for a research collaboration agreement on supporting the OpenMP programming
model on the STHORM heterogeneous on-board system. The research project focused on the development of an optimized runtime system for the accelerator, plus a toolchain and Linux driver for enabling
computation offloading from the ARM host system.
• Freescale Semiconductors Ltd. [2007]
Topic: Technical contributor for a research collaboration agreement on the development of Linux kernel
techniques for energy efficient mobile devices.
PROGRAM COMMITTEE MEMBERSHIP
• DATE (2014 - 2016) - Design Automation and Test in Europe
• FPL (2015) - International Conference on Field-Programmable Logic and Applications
• EUC (2014 - 2015) - International Conference on Embedded and Ubiquitous Computing
• MCSoC (2014 - 2016) - International Symposium on Embedded Multicore/Many-core Systems-on-Chip
• DASIP (2013) - Design and Architectures for Signal and Image Processing
• SCOPES (2014 - 2016) - International Workshop on Software and Compilers for Embedded Systems
• SOMRES (2011) - Workshop on Synthesis and Optimization Methods for Real-Time Embedded Systems
REFEREE SERVICES IN JOURNALS AND CONFERENCES
• Conferences and Workshops:
FPL - International Conference on Field Programmable Logic and Applications; DATE - Design Automation and Test in Europe; ICECS - International Conference on Electronics, Circuits, and Systems;
PACT - Parallel Architectures and Compilation Techniques; LCTES - Languages, Compilers, Tools and
Theory for Embedded Systems; ICS - International Conference on Supercomputing; SCOPES - Workshop
on Software and Compilers for Embedded Systems; CODES - International Conference on Hardware/Software Codesign and System Synthesis;
EUC - Embedded and Ubiquitous Computing; DASIP - Design and Architectures for Signal and Image Processing; HIRES - High-performance and Real-time Embedded Systems;
ETFA - Emerging Technologies and Factory Automation; MCSOC - International Symposium on Embedded Multicore/Many-core Systems-on-Chip; WEHA - International Workshop on Energy-aware high
performance Heterogeneous Architectures and Accelerators.
• Journals: IEEE Transactions on Computers (TC), IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), IEEE Transactions on Industrial Informatics (TII), IEEE Transactions on Signal Processing (TSP), IEEE Transactions on Parallel and Distributed Systems (TPDS),
ACM Transactions on Embedded Computing Systems (TECS), ACM Transactions on Design Automation of Electronic Systems (TODAES), ACM Transactions on Architecture and Code Optimization (TACO),
ACM Transactions on Reconfigurable Technology and Systems (TRETS),
ELSEVIER Microprocessors and Microsystems: Embedded Hardware Design (MICPRO), ELSEVIER
Journal of Systems Architecture (JSA), ELSEVIER Computers & Electrical Engineering (COMPELECENG), ELSEVIER Journal of Systems and Software (JSS), ELSEVIER Journal of Future Generation
Computer Systems (FGCS), ELSEVIER Simulation Modelling Practice and Theory (SIMPAT), ELSEVIER Microelectronics Journal (MEJ), ELSEVIER Integration, the VLSI Journal (VLSI), ELSEVIER
Journal of Parallel and Distributed Computing (JPDC),
SPRINGER International Journal of Parallel Programming (IJPP), SPRINGER Journal of Real-Time
Image Processing (JRTIP), SPRINGER Journal of Supercomputing (JSUPE).
Teaching Activities
2013 - now
Teaching Assistant
Swiss Federal Institute of Technology in Zurich - Electronics Engineering Degree - Graduate level
Advanced System-on-Chip Design - Lectures, seminars, exercises.
2013 - now
Teaching Assistant
Università di Bologna - Electronics Engineering Degree - Undergraduate level
Hardware/Software Design Methodologies - Lectures, seminars, exercises.
2007 - 2013
Teaching Assistant
Università di Bologna - Electronics Engineering Degree - Graduate level
Metodologie di Progettazione Hardware/Software - Lectures, seminars, exercises.
2008
Guest Lecturer
Università di Verona - Computer Science Degree - Undergraduate level
Distributed Embedded Systems - Lectures, seminars.
STUDENTS' SUPERVISION
PhD Students Supervision
• ETHZ: Bjorn Forsberg, Daniele Palossi, Pirmin Vogel.
• UNIBO: Giuseppe Tagliavini, Paolo Burgio, Francesco Conti, Alessandro Capotondi, Christian Pinto,
Daniele Bortolotti.
Graduate Students Supervision/Co-Advising
• ETHZ: Alessandro Angelino, Roberto Roncone, Maheshwara Sharma.
• UNIBO: Daniele Cesarini, Maria Abrahamyan, Alessio Franceschelli, Francesco Conti, Alessandro Capotondi, Christian Pinto, Francesco Lucchi, Matteo Bruni.
Visiting Students Supervision/Co-Advising
• Master Students: Mariyah Abrahamian (Alari).
• PhD Students: Dimitra Papagiannopoulou (Brown University), Masoud Dehyadegari (University of Tehran),
Abbas Rahimi (University of California, San Diego).
Research Interests
My main research interests are related to architectures and programming models for heterogeneous SoCs featuring multi/many-core processing units. Particular emphasis is on efficient exploitation of memory hierarchies
and accelerators (HW processing units and GPU-like manycore co-processors). The activities carried out in
recent years span the following research lines.
ARCHITECTURES AND PROGRAMMING MODELS FOR HETEROGENEOUS SoCs
Within the framework of the Parallel Ultra-Low-Power platform (PULP) project (www.pulp-platform.org, www-micrel.deis.unibo.it/pulp-project/), the research team I supervise focuses on programming models and architectures for heterogeneous systems-on-chip (SoC). The
main activities include: i) the design of runtime systems (OpenMP, CUDA, OpenCL, OpenVX) and compiler techniques to address performance/energy issues [JR.14] [JR.8] [JR.3] [IC.25] [IC.24] [NC.9] and, more
recently, predictability requirements [IC.7] [IC.4] [IC.3] [JR.6] [IC.15]; ii) the design of architectural and
programming-level support for lightweight many-core accelerator virtualization [JR.2] [IC.5] [NC.3] [NC.6];
iii) the design of architectural and programming-level support for tightly-coupled shared memory HW accelerators [JR.7] [JR.5] [IC.12] [IC.20] [IC.27] [IC.28]; iv) the development of simulation infrastructures for
many-core based heterogeneous systems [JR.9] [JR.1] [JR.11] [IC.29] [IC.32] [IC.37] [NC.8] [NC.11].
TRANSACTIONAL MEMORY
My activities in this field started in 2008, when I was involved in a project for developing support for TM
programming in the GCC compiler, during an internship at INRIA Paris. At that time I also studied
the applicability of transactional memory to speculative parallelization of irregular applications (e.g. sparse
array reductions). Transactions were leveraged to protect concurrent update operations in critical sections
within parallel reduction loops. An initial proof-of-concept implementation of the technique within the auto-parallelization pass (tree-parloops) in GCC was also developed. My research in this field continued over the
years through a collaboration with Brown University. A fully integrated HW/SW solution for transactional
programming on embedded MPSoCs has been designed and developed within a virtual platform enabling
full-system cycle-accurate simulation. The proposed HTM design leverages a dedicated module responsible
for managing conflicts. This is achieved in a very lightweight and fast manner by employing Bloom filters.
Application developers are not meant to directly interact with this HW module, nor to cope with low-level
transactional programming APIs. Transactional features are triggered through a set of compiler directives,
implemented as an extension to the OpenTM programming model (compiler and runtime system) from Stanford University. Support for speculative parallelism is also provided to further improve ease of programming.
Loops with non-independent iterations can be annotated for speculative parallel execution. The underlying TM
system ensures that, in case a real dependence arises, the original sequential program semantics is preserved.
This is achieved by forcing transactions to commit in program order, thanks to specific hardware support for
prioritized commit [IC.31] [IC.19] [NC.10] [NC.13].
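As a rough software illustration of how Bloom filters can make conflict detection cheap, the sketch below keeps one read signature and one write signature per transaction and checks an incoming write against them. It is not the HW module described above: the filter size, the number of hash functions, the SHA-256-based hashing and all class and function names are assumptions made for the example. A Bloom filter can report false conflicts but never misses a real one.

```python
import hashlib


class BloomSignature:
    """Fixed-size Bloom filter summarising a set of memory addresses."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits      # assumed filter size
        self.num_hashes = num_hashes  # assumed number of hash functions
        self.bits = 0                 # Python int used as a bit vector

    def _positions(self, address):
        # Derive the bit positions from one SHA-256 digest (illustrative choice).
        digest = hashlib.sha256(address.to_bytes(8, "little")).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "little") % self.num_bits

    def add(self, address):
        for pos in self._positions(address):
            self.bits |= 1 << pos

    def may_contain(self, address):
        # True for every added address; may also be True for others (false positive).
        return all((self.bits >> pos) & 1 for pos in self._positions(address))


class Transaction:
    """Hypothetical transaction record holding read and write signatures."""

    def __init__(self, tid):
        self.tid = tid
        self.reads = BloomSignature()
        self.writes = BloomSignature()

    def read(self, address):
        self.reads.add(address)

    def write(self, address):
        self.writes.add(address)


def conflicts(writer, address, transactions):
    """Return ids of other transactions that may conflict with a write to address."""
    return [t.tid for t in transactions
            if t is not writer
            and (t.reads.may_contain(address) or t.writes.may_contain(address))]
```

For instance, if transaction 0 has read address 0x1000 and transaction 1 then writes to it, `conflicts` reports transaction 0 as a candidate for abort.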
HW/SW SUPPORT FOR TOLERATING VARIABILITY-INDUCED ERRORS
In this field I am working in three main areas:
• the collaboration with Brown University is still ongoing, and the transactional memory support developed
over the years is currently being adapted to support tolerance to variability-induced computation errors.
This revisited use of traditional TM allows the platform to operate at reduced voltages, as the recovery
mechanisms ensure that if an error occurs the transaction that experienced it can be safely aborted and
rolled back, after the appropriate countermeasures in terms of voltage adjustment have been taken [IC.8].
• within a collaboration with the University of California, San Diego, we explored runtime support for cost-effective countermeasures against hardware timing failures during system operation. Instead of ultra-conservative multi-corner design margins or costly circuit-level error recovery mechanisms, we propose
a variability-aware extension to the OpenMP v3.0 programming model. Using the notion of work-unit
vulnerability (WUV) we capture timing errors caused by circuit-level variability as high-level software
knowledge. WUV provides a useful abstraction of hardware variability to efficiently allocate a given
work-unit to a suitable core for execution [JR.10] [IC.6] [IC.21] [IC.23].
• in ultra-low-power embedded devices, aggressive voltage scaling techniques have the potential to reduce
the power consumption within the admitted envelope, but memory operations on standard six-transistor
static RAM (6T-SRAM) become unreliable. To cope with this problem we proposed hybrid memory
systems coupling 6T-SRAM to standard cell memory (SCM). SCM stays reliable at low voltages, but is
very costly and thus cannot fully replace SRAM. By providing programming model constructs to specify
which data and computation exhibit inherent tolerance to computation errors and hardware support to
split error-tolerant data between SRAM and SCM, the memory system can be powered at a low voltage
while ensuring correct operation by confining possible bit-flip errors to the LSBs only [IC.9].
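The effect of confining bit flips to the LSBs in the last item can be illustrated with a small model. The sketch below is only a toy under stated assumptions: the 16-bit word width, the 4-bit LSB portion, the single-bit fault model and all function names are hypothetical and are not taken from [IC.9]. The high-order bits stand for the part held in reliable SCM, the low-order bits for the part held in error-prone SRAM.

```python
import random

WORD_BITS = 16  # assumed word width
LSB_BITS = 4    # assumed number of low-order bits kept in error-prone SRAM


def split_word(value):
    """Split a word into (MSB part for SCM, LSB part for SRAM)."""
    mask = (1 << LSB_BITS) - 1
    return value >> LSB_BITS, value & mask


def join_word(msb_part, lsb_part):
    """Rebuild the word from its SCM and SRAM parts."""
    return (msb_part << LSB_BITS) | lsb_part


def flip_random_bit(lsb_part, rng):
    """Model a low-voltage SRAM fault as one bit flip within the LSB part."""
    return lsb_part ^ (1 << rng.randrange(LSB_BITS))


def store_and_load(value, rng):
    """Store a word in the split memory and read it back with one injected fault."""
    msb, lsb = split_word(value)
    lsb = flip_random_bit(lsb, rng)  # only the SRAM-resident bits can be corrupted
    return join_word(msb, lsb)
```

With this split, the value read back differs from the stored one by less than `2 ** LSB_BITS`, and the high-order bits are always preserved.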
MEMORY MANAGEMENT
Most embedded multi-processor systems on a chip (MPSoC) feature explicitly managed (scratchpad-based)
memory hierarchies. I have explored efficient management of such systems via extensions to the popular
OpenMP API to fit the constrained requirements of MPSoCs and to adhere to the Partitioned Global Address
Space (PGAS) organization of the memory system often assumed in the targeted devices [JR.13] [IC.38] [IC.35]
[IC.33]. The extensions can be summarized as follows:
• Features to trigger data distribution and data movement (additional directives);
• Compiler support for data distribution, based on lightweight array access instrumentation (software address translation) and DMA-based data transfer;
• A lightweight lookup mechanism based on compiler-generated metadata for low-cost distributed array
references;
• An allocation compiler pass that exploits profile information on array access counts to determine a data
distribution scheme which captures data locality at each parallel region.
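A minimal sketch of the software address translation idea mentioned in the list above: a global array index is mapped to a memory bank and a local offset using a small metadata record. The block distribution policy and the names `ArrayMetadata`, `translate` and `DistributedArray` are assumptions made for illustration, not the actual compiler-generated structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayMetadata:
    """Hypothetical per-array descriptor a compiler pass could emit."""
    length: int     # total number of elements
    num_banks: int  # scratchpad banks the array is distributed over

    @property
    def block(self):
        # Elements per bank, rounded up so every element has a home.
        return -(-self.length // self.num_banks)


def translate(meta, index):
    """Map a global array index to (bank, local offset) under block distribution."""
    if not 0 <= index < meta.length:
        raise IndexError(index)
    return index // meta.block, index % meta.block


class DistributedArray:
    """Array whose elements live in per-bank lists standing in for scratchpads."""

    def __init__(self, meta):
        self.meta = meta
        self.banks = [[0] * meta.block for _ in range(meta.num_banks)]

    def __getitem__(self, index):
        bank, offset = translate(self.meta, index)
        return self.banks[bank][offset]

    def __setitem__(self, index, value):
        bank, offset = translate(self.meta, index)
        self.banks[bank][offset] = value
```

With 10 elements over 4 banks the block size is 3, so global index 7 resolves to bank 2, offset 1. In a real system each reference to a distributed array would be instrumented with a lookup of this kind, and remote blocks would be moved by DMA rather than accessed element by element.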
Publication List
International journals (# 14)
International conferences (# 42)
International workshops (# 13)
JR.1.
Daniele Bortolotti, Andrea Marongiu, Luca Benini, “VirtualSoC: a Research Tool for Modern MPSoCs,” ACM Transactions on Embedded Computing Systems [To appear], Sept. 2016
JR.2.
Alessandro Capotondi, Germain Haugou, Andrea Marongiu, Luca Benini, “Runtime Support for Multiple Offload-Based
Programming Models on Embedded Manycore Accelerators,” IEEE Transactions on Emerging Topics in Computing
[Preprint] 2016. [doi: http://doi.ieeecomputersociety.org/10.1109/TETC.2016.2554318]
JR.3.
Andrea Marongiu, Alessandro Capotondi, Luca Benini, “Controlling NUMA effects in embedded manycore applications
with lightweight nested parallelism support,” Parallel Computing (Elsevier) [Preprint] 2016. [doi: http://dx.doi.org/10.1016/j.parco.2016.02.002]
JR.4.
Giuseppe Tagliavini, Germain Haugou, Andrea Marongiu, Luca Benini, “Optimizing memory bandwidth exploitation
for OpenVX applications on embedded many-core accelerators,” Journal of Real-Time Image Processing (Springer),
2015. [doi: http://dx.doi.org/10.1007/s11554-015-0544-0]
JR.5.
Francesco Conti, Andrea Marongiu, Chuck Pilkington, Luca Benini, “He-P2012: Performance and Energy Exploration
of Architecturally Heterogeneous Many-Cores,” Journal of Signal Processing Systems (Springer), 2015. [doi: http://dx.doi.org/10.1007/s11265-015-1056-7]
JR.6.
Luís Miguel Pinho, Vincent Nélis, Patrick Meumeu Yomsi, Eduardo Quiñones, Marko Bertogna, Paolo Burgio, Andrea Marongiu, Claudio Scordino, Paolo Gai, Michele Ramponi, Michal Mardiak, “P-SOCRATES: A parallel software
framework for time-critical many-core systems,” Microprocessors and Microsystems (Elsevier), 2015. [doi: http://dx.doi.org/10.1016/j.micpro.2015.06.004]
JR.7.
Masoud Dehyadegari, Andrea Marongiu, Mohammad Reza Kakoee, Siamak Mohammadi, Nasser Yazdani, Luca Benini,
“Architecture Support for Tightly-Coupled Multi-Core Clusters with Shared-Memory HW Accelerators,” IEEE Transactions on Computers, 2015. [doi: http://dx.doi.org/10.1109/TC.2014.2360522]
JR.8.
Andrea Marongiu, Alessandro Capotondi, Giuseppe Tagliavini, Luca Benini, “Simplifying Many-Core-Based Heterogeneous SoC Programming With Offload Directives,” IEEE Transactions on Industrial Informatics, 2015. [doi:
http://dx.doi.org/10.1109/TII.2015.2449994]
JR.9.
Shivani Raghav, Martino Ruggiero, Andrea Marongiu, Christian Pinto, David Atienza, Luca Benini, “GPU Acceleration
for Simulating Massively Parallel Many-core Platforms,” IEEE Transactions on Parallel and Distributed Systems, 2015.
[doi: http://dx.doi.org/10.1109/TPDS.2014.2319092]
JR.10.
Abbas Rahimi, Daniele Cesarini, Andrea Marongiu, Rajesh K. Gupta, Luca Benini, “Improving Resilience to Timing
Errors by Exposing Variability Effects to Software in Tightly-Coupled Processor Clusters,” IEEE Journal on Emerging
and Selected Topics in Circuits And Systems, 2014. [doi: http://dx.doi.org/10.1109/JETCAS.2014.2315883]
JR.11.
Shivani Raghav, Andrea Marongiu, Christian Pinto, Martino Ruggiero, David Atienza, Luca Benini, “SIMinG-1k: A
thousand-core simulator running on general-purpose graphical processing units,” Concurrency and Computation: Practice and Experience (Wiley), 2013. [doi: http://dx.doi.org/10.1002/cpe.2940]
JR.12.
Jaume Joven, Andrea Marongiu, Federico Angiolini, Luca Benini, Giovanni De Micheli, “An integrated, programming
model-driven framework for NoC-QoS support in cluster-based embedded many-cores,” Parallel Computing (Elsevier),
2013. [doi: http://dx.doi.org/10.1016/j.parco.2013.06.002]
JR.13.
Andrea Marongiu, Luca Benini, “An OpenMP Compiler for Efficient Use of Distributed Scratchpad Memory in MPSoCs,” IEEE Transactions on Computers, 2012. [doi: http://dx.doi.org/10.1109/TC.2010.199]
JR.14.
Andrea Marongiu, Paolo Burgio, Luca Benini, “Supporting OpenMP on a multi-cluster embedded MPSoC,” Microprocessors and Microsystems (Elsevier), 2011. [doi: http://dx.doi.org/10.1016/j.micpro.2011.08.010]
REFEREED INTERNATIONAL CONFERENCES
IC.1.
Francesco Conti, Daniele Palossi, Andrea Marongiu, Davide Rossi, Luca Benini, “Enabling the Heterogeneous Accelerator Model on Ultra-Low Power Microcontroller Platforms,” Design, Automation, and Test in Europe conference
(DATE), 2016.
IC.2.
Daniele Cesarini, Andrea Marongiu, Luca Benini, “An Optimized Task-Based Runtime System For Resource-Constrained
Parallel Accelerators,” Design, Automation, and Test in Europe conference (DATE), 2016.
IC.3.
Paolo Burgio, Andrea Marongiu, Paolo Valente, Marko Bertogna, “A memory-centric approach to enable timing-predictability within embedded many-core accelerators,” ACM/IEEE/CSI Symposium on Real-Time and Embedded Systems and Technologies (RTEST), 2015.
IC.4.
Maria A. Serrano, Alessandra Melani, Roberto Vargas, Andrea Marongiu, Marko Bertogna, Eduardo Quiñones, “Timing
characterization of OpenMP4 tasking model,” International Conference on Compilers, Architecture and Synthesis for
Embedded Systems (CASES), 2015.
IC.5.
Pirmin Vogel, Andrea Marongiu, Luca Benini, “Lightweight virtual memory support for many-core accelerators in heterogeneous embedded SoCs,” International Conference on Hardware/Software Codesign and System Synthesis (CODES),
2015.
IC.6.
Abbas Rahimi, Daniele Cesarini, Andrea Marongiu, Rajesh K. Gupta, Luca Benini, “Task scheduling strategies to
mitigate hardware variability in embedded shared memory clusters,” Design Automation Conference (DAC), 2015.
IC.7.
Roberto Vargas, Eduardo Quiñones, Andrea Marongiu, “OpenMP and timing predictability: a possible union?,” Design,
Automation, and Test in Europe conference (DATE), 2015.
IC.8.
Dimitra Papagiannopoulou, Andrea Marongiu, Tali Moreshet, Luca Benini, Maurice Herlihy, R. Iris Bahar, “Playing
with Fire: Transactional Memory Revisited for Error-Resilient and Energy-Efficient MPSoC Execution,” ACM Great
Lakes Symposium on VLSI (GLSVLSI), 2015.
IC.9.
Giuseppe Tagliavini, Davide Rossi, Andrea Marongiu, Luca Benini, “Synergistic Architecture and Programming Model
Support for Approximate Micropower Computing,” IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2015.
IC.10.
Giuseppe Tagliavini, Germain Haugou, Andrea Marongiu, Luca Benini, “ADRENALINE: An OpenVX Environment to
Optimize Embedded Vision Applications on Many-core Accelerators,” IEEE 9th International Symposium on Embedded
Multicore/Many-core Systems-on-Chip (MCSoC), 2015.
IC.11.
Alessandro Capotondi, Andrea Marongiu, Luca Benini, “Enabling Scalable and Fine-Grained Nested Parallelism on
Embedded Many-cores,” IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 2015.
IC.12.
Francesco Conti, Chuck Pilkington, Andrea Marongiu, Luca Benini, “He-P2012: Architectural heterogeneity exploration on a scalable many-core platform,” 25th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2014.
IC.13.
Paolo Burgio, Robin Danilo, Andrea Marongiu, Philippe Coussy, Luca Benini, “A tightly-coupled hardware controller
to improve scalability and programmability of shared-memory heterogeneous clusters,” Design, Automation, and Test in
Europe conference (DATE), 2014.
IC.14.
Paolo Burgio, Giuseppe Tagliavini, Francesco Conti, Andrea Marongiu, Luca Benini, “Tightly-coupled hardware support to dynamic parallelism acceleration in embedded shared memory clusters,” Design, Automation, and Test in Europe
conference (DATE), 2014.
IC.15.
Luís Miguel Pinho, Eduardo Quiñones, Marko Bertogna, Andrea Marongiu, Jorge Pereira Carlos, Claudio Scordino,
Michele Ramponi, “P-SOCRATES: A Parallel Software Framework for Time-Critical Many-Core Systems,” Euromicro
Conference on Digital System Design: Architectures, Methods and Tools (DSD), 2014.
IC.16.
Paolo Burgio, Andrea Marongiu, Philippe Coussy, Luca Benini, “A HLS-Based Toolflow to Design Next-Generation
Heterogeneous Many-Core Platforms with Shared Memory,” IEEE International Conference on Embedded and Ubiquitous Computing (EUC), 2014.
IC.17.
Francesco Conti, Chuck Pilkington, Andrea Marongiu, Luca Benini, “He-P2012: architectural heterogeneity exploration
on a scalable many-core platform,” ACM Great Lakes Symposium on VLSI (GLSVLSI), 2014.
IC.18.
Marco Balboni, Marta Ortín-Obón, Alessandro Capotondi, Hervé Tatenguem Fankem, Alberto Ghiribaldi, Luca Ramini,
Víctor Viñals, Andrea Marongiu, Davide Bertozzi, “Augmenting manycore programmable accelerators with photonic interconnect technology for the high-end embedded computing domain,” IEEE/ACM International Symposium on Networks-on-Chip (NOCS), 2014.
IC.19.
Dimitra Papagiannopoulou, Tali Moreshet, Andrea Marongiu, Luca Benini, Maurice Herlihy, R. Iris Bahar, “Speculative
synchronization for coherence-free embedded NUMA architectures,” International Conference on Embedded Computer
Systems: Architectures, Modeling and Simulation (ICSAMOS), 2014.
IC.20.
Francesco Conti, Andrea Marongiu, Luca Benini, “Synthesis-friendly techniques for tightly-coupled integration of hardware accelerators into shared-memory multi-core clusters,” International Conference on Hardware/Software Codesign
and System Synthesis (CODES), 2013.
IC.21.
Abbas Rahimi, Andrea Marongiu, Rajesh Gupta, Luca Benini, “A Variability-Aware OpenMP Environment for Efficient
Execution of Accuracy-Configurable Computation on Shared-FPU Processor Clusters,” International Conference on
Hardware/Software Codesign and System Synthesis (CODES), 2013.
IC.22.
Paolo Burgio, Andrea Marongiu, Robin Danilo, Philippe Coussy, Luca Benini, “Architecture and Programming Model
Support for Efficient Heterogeneous Computing on Tightly-Coupled Shared-Memory Clusters,” Design and Architectures for Signal and Image Processing (DASIP), 2013.
IC.23.
Abbas Rahimi, Andrea Marongiu, Paolo Burgio, Rajesh K. Gupta, Luca Benini, “Variation-tolerant OpenMP tasking on
tightly-coupled processor clusters,” Design, Automation, and Test in Europe conference (DATE), 2013.
IC.24.
Paolo Burgio, Giuseppe Tagliavini, Andrea Marongiu, Luca Benini, “Enabling fine-grained OpenMP tasking on tightly-coupled shared memory clusters,” Design, Automation, and Test in Europe conference (DATE), 2013.
IC.25.
Andrea Marongiu, Paolo Burgio, Luca Benini, “Fast and Lightweight Support for Nested Parallelism on Cluster-Based
Embedded Many-Cores,” Design, Automation, and Test in Europe conference (DATE), 2012.
IC.26.
José L. Abellán, Daniele Bortolotti, Andrea Marongiu, Davide Bertozzi, Juan Fernández, Manuel E. Acacio, Luca
Benini, “Design of a Collective Communication Infrastructure for Barrier Synchronization in Cluster-Based Nanoscale
MPSoCs,” Design, Automation, and Test in Europe conference (DATE), 2012.
IC.27.
Paolo Burgio, Andrea Marongiu, Dominique Heller, Cyrille Chavet, Philippe Coussy, Luca Benini, “OpenMP-based
Synergistic Parallelization and HW Acceleration for On-Chip Shared-Memory Clusters,” Euromicro Conference on
Digital System Design: Architectures, Methods and Tools (DSD), 2012.
IC.28.
Masoud Dehyadegari, Andrea Marongiu, Mohammad Reza Kakoee, Luca Benini, Siamak Mohammadi, Naser Yazdani,
“A Tightly-Coupled Multi-Core Cluster with Shared-Memory HW Accelerators,” International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (ICSAMOS), 2012.
IC.29.
Christian Pinto, Shivani Raghav, Andrea Marongiu, Martino Ruggiero, David Atienza, Luca Benini, “GPGPU-Accelerated
Parallel and Fast Simulation of Thousand-Core Platforms,” 11th IEEE/ACM International Symposium on Cluster, Cloud
and Grid Computing (CCGrid), 2011.
IC.30.
Alessio Franceschelli, Paolo Burgio, Giuseppe Tagliavini, Andrea Marongiu, Martino Ruggiero, Michele Lombardi,
Alessio Bonfietti, Michela Milano, Luca Benini, “MPOpt-Cell: a high-performance data-flow programming environment for the CELL BE processor,” 8th ACM International Conference on Computing Frontiers, 2011.
IC.31.
Cesare Ferri, Andrea Marongiu, Benjamin Lipton, R. Iris Bahar, Luca Benini, Maurice Herlihy, Tali Moreshet, “SoC-TM: Integrated HW/SW Support for Transactional Memory Programming on Embedded MPSoCs,” International Conference on Hardware/Software Codesign and System Synthesis (CODES), 2011.
IC.32.
Daniele Bortolotti, Francesco Paterna, Christian Pinto, Andrea Marongiu, Martino Ruggiero, Luca Benini, “Exploring
instruction caching strategies for tightly-coupled shared-memory clusters,” International Symposium on System on Chip
(SoC), 2011.
IC.33.
Andrea Marongiu, Paolo Burgio, Luca Benini, “Vertical stealing: robust, locality-aware do-all workload distribution
for 3D MPSoCs,” International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES),
2010.
IC.34.
Jaume Joven, Andrea Marongiu, Federico Angiolini, Luca Benini, Giovanni De Micheli, “Exploring programming
model-driven QoS support for NoC-based platforms,” International Conference on Hardware/Software Codesign and
System Synthesis (CODES), 2010.
IC.35.
Andrea Marongiu, Martino Ruggiero, Luca Benini, “Efficient OpenMP data mapping for multicore platforms with
vertically stacked memory,” Design, Automation, and Test in Europe conference (DATE), 2010.
IC.36.
Andrea Marongiu, Paolo Burgio, Luca Benini, “Evaluating OpenMP Support Costs on MPSoCs,” Euromicro Conference
on Digital System Design: Architectures, Methods and Tools (DSD), 2010.
IC.37.
Shivani Raghav, Martino Ruggiero, David Atienza, Christian Pinto, Andrea Marongiu, Luca Benini, “Scalable instruction set simulator for thousand-core architectures running on GPGPUs,” International Conference on High Performance
Computing and Simulation (HPCS), 2010.
IC.38.
Andrea Marongiu, Luca Benini, “Efficient OpenMP support and extensions for MPSoCs with explicitly managed memory hierarchy,” Design, Automation, and Test in Europe conference (DATE), 2009.
IC.39.
Andrea Marongiu, Andrea Acquaviva, Luca Benini, “OpenMP Support for NBTI-Induced Aging Tolerance in MPSoCs,”
Stabilization, Safety and Security of Distributed Systems (SSS), 2009.
IC.40.
Andrea Marongiu, Luca Benini, Andrea Acquaviva, Andrea Bartolini, “Analysis of Power Management Strategies for
a Large-Scale SoC Platform in 65nm Technology,” Euromicro Conference on Digital System Design: Architectures,
Methods and Tools (DSD), 2008.
IC.41.
Andrea Marongiu, Luca Benini, Mahmut T. Kandemir, “Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms,” International Conference on Compilers, Architecture and Synthesis for Embedded Systems
(CASES), 2007.
IC.42.
Giovanni Busonera, Salvatore Carta, Andrea Marongiu, Luigi Raffo, “Automatic Application Partitioning on FPGA/CPU
Systems Based on Detailed Low-Level Information,” Euromicro Conference on Digital System Design: Architectures,
Methods and Tools (DSD), 2006.
REFEREED INTERNATIONAL WORKSHOPS
NC.1.
Daniele Palossi, Andrea Marongiu, “Exploring Single-Source Shortest Path Parallelization on Shared Memory Accelerators,” 19th International Workshop on Software and Compilers for Embedded Systems (SCOPES), 2016.
NC.2.
Alessandro Capotondi, Germain Haugou, Andrea Marongiu, Luca Benini, “Runtime Support for Multiple Offload-Based
Programming Models on Embedded Manycore Accelerators,” International Workshop on Code Optimization for Multi
and Many Cores (COSMIC), 2015.
NC.3.
Pirmin Vogel, Andrea Marongiu, Luca Benini, “An Evaluation of Memory Sharing Performance for Heterogeneous
Embedded SoCs with Many-Core Accelerators,” International Workshop on Code Optimization for Multi and Many
Cores (COSMIC), 2015.
NC.4.
Giuseppe Tagliavini, Germain Haugou, Andrea Marongiu, Luca Benini, “A framework for optimizing OpenVX applications performance on embedded manycore accelerators,” 18th International Workshop on Software and Compilers for
Embedded Systems (SCOPES), 2015.
NC.5.
Hayder Al-Khalissi, Mladen Berekovic, Andrea Marongiu, “On the Relevance of Architectural Awareness for Efficient Fork/Join Support on Cluster-Based Manycores,” International Workshop on Manycore Embedded Systems (MES),
2014.
NC.6.
Christian Pinto, Andrea Marongiu, Luca Benini, “A Virtualization Framework for IOMMU-less Many-Core Accelerators,” International Workshop on Manycore Embedded Systems (MES), 2014.
NC.7.
Vincent Nélis, Patrick Meumeu Yomsi, Luís Miguel Pinho, José Carlos Fonseca, Marko Bertogna, Eduardo Quiñones,
Roberto Vargas, Andrea Marongiu, “The Challenge of Time-Predictability in Modern Many-Core Architectures,” 14th
International Workshop on Worst-Case Execution Time Analysis (WCET), 2014.
NC.8.
Daniele Bortolotti, Christian Pinto, Andrea Marongiu, Martino Ruggiero, Luca Benini, “VirtualSoC: A Full-System
Simulation Environment for Massively Parallel Heterogeneous System-on-Chip,” International Parallel and Distributed
Processing Symposium Workshops (IPDPSW), 2013.
NC.9.
Andrea Marongiu, Alessandro Capotondi, Giuseppe Tagliavini, Luca Benini, “Improving the programmability of STHORM-based heterogeneous systems with offload-enabled OpenMP,” International Workshop on Manycore Embedded Systems
(MES), 2013.
NC.10.
Dimitra Papagiannopoulou, R. Iris Bahar, Tali Moreshet, Maurice Herlihy, Andrea Marongiu, Luca Benini, “Transparent
and energy-efficient speculation on NUMA architectures for embedded MPSoCs,” International Workshop on Manycore
Embedded Systems (MES), 2013.
NC.11.
Shivani Raghav, Andrea Marongiu, Christian Pinto, David Atienza, Martino Ruggiero, Luca Benini, “Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting,” Workshop on General Purpose Processing with Graphics Processing Units (GPGPU-5), 2012.
NC.12.
Hayder Al-Khalissi, Andrea Marongiu, Mladen Berekovic, “Low-Overhead Barrier Synchronization for OpenMP-like
Parallelism on the Single-Chip Cloud Computer,” Many-core Applications Research Community (MARC) Symposium,
2012.
NC.13.
Martin Schindewolf, Albert Cohen, Wolfgang Karl, Andrea Marongiu, Luca Benini, “Towards Transactional Memory
Support for GCC,” International Workshop on GCC Research Opportunities (GROW), 2009.
Date 13/06/2016