A cycle-accurate methodology to improve PREM-like memory bandwidth underutilization on FPGA-based HeSoCs


Gianluca Brilli, Giacomo Valente, Alessandro Capotondi, Tania Di Mascio, Paolo Valente, Paolo Burgio and Andrea Marongiu

Institution(s)

University of Modena and Reggio Emilia

Presentation type

Technical presentation

Abstract

The rapid growth of high-end embedded systems in recent years has paved the way for next-generation applications that were impractical a few decades ago. To meet this demand, manufacturers of high-performance embedded chips are increasingly adopting heterogeneous designs (HeSoCs), where sequential processors and energy-efficient accelerators coexist within the same chip. These systems, referred to as Commercial-Off-The-Shelf (COTS), are typically organized according to a shared-memory architectural scheme, in which a memory hierarchy composed of multiple cache levels and a main memory (DRAM) is shared among the computational engines of the system. This scheme shortens time-to-market, improves the scalability of the system and, in general, provides good average-case performance. However, it is not always adequate for applications where, by construction, the system must guarantee bounded performance even in the worst case. A shared-memory organization creates contention on shared resources: the execution time of a task also depends on how many other tasks access a given shared resource in the same time interval.

Several techniques have been proposed to mitigate the memory interference problem. One of them is the Predictable Execution Model (PREM), a mechanism that eliminates memory interference by imposing mutual exclusion on memory accesses. Controlled Memory Request Injection (CMRI) is a novel technique proposed to relax the pessimistic approach of PREM; it has proven effective at increasing memory bandwidth utilization on CPU-based SoCs.

This work proposes an architectural template for FPGA-based accelerators in which a soft processor, called the Proxy Core, is tightly coupled with a hardware bandwidth monitor. This makes it possible: i) to enable CMRI on accelerators; ii) to reach an unprecedented degree of control over accelerator memory bandwidth exploitation in FPGA-based HeSoCs.
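
To make the idea of monitor-driven bandwidth control more concrete, the following is a minimal sketch in Python of a generic budget-per-window regulation loop. It is not the authors' design: the abstract does not describe the Proxy Core firmware or the monitor interface, so the names `BandwidthMonitor` and `regulate`, the `budget` and `window` parameters, and the per-cycle request model are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BandwidthMonitor:
    """Hypothetical model of a hardware counter of memory transactions."""
    count: int = 0

    def record(self, transactions: int) -> None:
        self.count += transactions

    def reset(self) -> None:
        self.count = 0


def regulate(requests, budget, window):
    """Return the transactions granted in each cycle under a per-window budget.

    requests: transactions the accelerator wants to issue in each cycle
    budget:   maximum transactions allowed per regulation window (assumed)
    window:   regulation window length in cycles (assumed)
    """
    monitor = BandwidthMonitor()
    granted = []
    pending = 0  # transactions requested but not yet granted
    for cycle, wanted in enumerate(requests):
        if cycle % window == 0:
            monitor.reset()  # a new window starts: the budget is replenished
        pending += wanted
        # Grant only what still fits in the remaining budget of this window.
        allowed = min(pending, budget - monitor.count)
        monitor.record(allowed)
        pending -= allowed
        granted.append(allowed)
    return granted


if __name__ == "__main__":
    # Accelerator asks for 4 transactions per cycle; 6 are allowed every 2 cycles.
    print(regulate([4, 4, 4, 4], budget=6, window=2))
```

In this toy model the requester is stalled once the counter reaches the budget and resumes when the next window begins, so the traffic injected in any window never exceeds the configured amount. A real implementation would count bus transactions in hardware at cycle granularity rather than iterate in software.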


Additional material

  • Presentation slides: [pdf]
