Flexible and Scalable Acceleration Techniques for Low-Power Edge Computing

Francesco Conti, Davide Rossi and Luca Benini

University of Bologna and ETH Zurich

Presentation type

Technical presentation


Next-generation Internet-of-Things nodes will extract an unprecedented amount of sensory data from the environment, thanks to a growing range of novel sensors capable of extracting more information within an ever-decreasing energy budget. The sheer volume of the combined data makes it impractical to transfer, collect and analyse all of it using well-known data-mining and analytics pipelines, especially for battery-limited or energetically autonomous sensor nodes.

One proposed solution is the paradigm of edge computing, in which part of the computation needed to extract semantically relevant information from raw data streams is performed directly on the sensor nodes. However, the low-power microcontrollers currently on the market for this purpose lack both the flexibility and the computing power to perform much more than very naïve data analytics, leaving complex but successful algorithms, such as those based on machine learning, entirely out of reach.

With the Parallel Ultra-Low-Power Platform (PULP) we aim to close this gap and perform significant data analytics directly at the sensor’s edge. PULP is based on a small-scale cluster of simple in-order RISC cores coupled with a shared L1 scratchpad. It is designed in a vertically integrated fashion to extract energy efficiency out of every technology layer, from the software runtime down to the silicon, and to support architectural heterogeneity in the form of specialized computing engines that further boost the energy efficiency of particularly critical workloads.

This methodology for hardware acceleration scales to a diverse set of targets, ranging from ASICs and FPGAs for high-performance embedded systems, meant to run the challenging inference workloads required by state-of-the-art Convolutional Neural Networks, to ultra-low-power SoCs for the IoT.

As a case study, we present the PULP-based Fulmine chip, fabricated in 65 nm technology, which couples four OpenRISC cores with two engines dedicated respectively to Convolutional Neural Networks and AES security. Fulmine is able to perform complex CNN-based workloads within a 15 mW power envelope.

Additional material

  • Extended abstract: [pdf]
  • Presentation slides: [pdf]
