An open-source overlay for reconfigurable, accelerator-rich embedded systems


Gianluca Bellocchi, Alessandro Capotondi and Andrea Marongiu

Institution(s)

University of Modena and Reggio Emilia

Presentation type

Technical presentation

Abstract

The design of modern embedded systems is challenged by a growing demand for autonomous capabilities (e.g., in unmanned vehicles), which in turn requires high performance and energy efficiency to execute sophisticated AI/ML/CV workloads within tight power envelopes. Heterogeneous computing paradigms and hardware-accelerated execution of key application blocks are pivotal to satisfying these requirements. However, for such paradigms to be adopted effectively, including by non-expert users, methodologies and tools are needed that (i) ease the exploration of innovative architectural solutions matching application-specific needs, and (ii) simplify the design, optimization and system-level integration of the many HW and SW components involved.

We propose a methodology for the exploration and design of accelerator-rich heterogeneous embedded systems. Our proposal targets commercial off-the-shelf (COTS) FPGA-based systems-on-chip (SoCs), where high-performance ARM cores and real-time operating systems coordinate the execution of HW blocks and guarantee the predictable execution of legacy software. On the reconfigurable logic, we use a template-based tool flow to generate the HW and SW components that enable seamless system-level integration of custom accelerators. This simplifies both design space exploration and the instantiation of the selected architecture.

To reach a wide audience, both the reconfigurable hardware design phase and the programming phase must be exposed to the user at a high level of abstraction. To this end, we architect our solution as an FPGA overlay system that can be programmed with standard programming models such as OpenMP or OpenCL.
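The abstract does not show what programming the overlay with a standard model looks like. As a hedged illustration only, the sketch below uses the standard OpenMP `target` construct to offload a loop to an accelerator device. The function name `vadd_offload` and the vector-addition kernel are hypothetical stand-ins for an accelerated application block, and how the overlay's tool flow maps such a region onto an FPGA accelerator is not described here. When no offload device is configured, OpenMP executes the target region on the host, so the code also runs on an ordinary workstation.

```c
#include <stddef.h>

/* Hypothetical kernel: element-wise vector addition, standing in for an
 * application block that an accelerator could execute. */
void vadd_offload(const int *a, const int *b, int *c, size_t n)
{
    /* Standard OpenMP device construct: copy the inputs to the device,
     * run the loop there, and copy the result back to the host.
     * If no device is available, the region falls back to the host;
     * if OpenMP is disabled, the pragmas are ignored entirely. */
    #pragma omp target map(to: a[0:n], b[0:n]) map(from: c[0:n])
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

With an OpenMP-enabled compiler (e.g. `gcc -fopenmp`), the `map` clauses make the host-to-device data movement explicit, which is the kind of information an overlay runtime needs in order to stage buffers for a hardware block; without OpenMP the loop simply runs sequentially on the CPU.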

We have conducted experiments to (i) assess the overheads (implementation costs) of the overlay; (ii) compare the application performance enabled by our overlay with that of standard FPGA design flows; and (iii) demonstrate the effectiveness of the proposed methodology for the system-level design and optimization of accelerator-rich platforms.

