Presentation title

CRIME: A Collaborative Edge/Cloud Inference Framework for Recurrent Neural Networks

Authors

Roberta Chiaro, Chen Xie, Daniele Jahier Pagliari, Yukai Chen, Enrico Macii and Massimo Poncino

Institution(s)

Politecnico di Torino

Presentation type

Presentation of a research group from one or more scientific institutions

Abstract

Deep learning (DL) has become a reference methodology for processing the massive amounts of data generated by end devices, such as smartphones and IoT sensors, in a variety of application domains, ranging from computer vision to natural language processing. The excellent results achieved by DL-based solutions come at the cost of high computational and memory complexity, both in the training and in the inference phase. To meet these demanding computational requirements, the traditional paradigm transfers raw data from end devices to remote cloud datacenters for processing, raising major issues in terms of communication latency, scalability and privacy.

Edge computing is an alternative paradigm in which computation tasks are hosted as close as possible to the end devices. Although this approach solves the issues of the cloud-based scheme, resource-constrained edge devices often cannot support the complete execution of DL applications, unless they are equipped with dedicated hardware accelerators, which are only affordable in high-end systems.

In recent years, researchers have started investigating an intermediate paradigm known as collaborative inference, in which DL inference applications are executed by a collaborative network of edge and cloud devices. This approach has been shown to often offer a good compromise between the mobile-only and the cloud-only approaches in terms of latency and energy consumption. Collaborative strategies can be applied to any model and are orthogonal to standard model optimization techniques, such as quantization and pruning.
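
To make the underlying trade-off concrete, the Python sketch below illustrates the kind of decision collaborative inference exploits for a feed-forward model: the first layers run on the edge device, the intermediate activation is shipped over the network, and the remaining layers run in the cloud. All per-layer latencies, activation sizes, and the bandwidth figure are hypothetical placeholders, and the exhaustive split-point search is a deliberately minimal illustration, not any published partitioning algorithm.

    # Minimal sketch of layer-wise edge/cloud partitioning (all numbers are assumptions).
    EDGE_LATENCY_MS = [5.0, 8.0, 12.0, 20.0]   # assumed per-layer cost on the edge device
    CLOUD_LATENCY_MS = [0.5, 0.8, 1.2, 2.0]    # assumed per-layer cost in the cloud
    ACTIVATION_KB = [256, 128, 64, 16]         # assumed size of each layer's output
    INPUT_KB = 512                             # assumed size of the raw input
    UPLINK_KB_PER_MS = 50.0                    # assumed network bandwidth

    def total_latency(split):
        """Latency if layers [0, split) run on the edge and [split, N) run in the cloud."""
        edge = sum(EDGE_LATENCY_MS[:split])
        cloud = sum(CLOUD_LATENCY_MS[split:])
        if split == len(EDGE_LATENCY_MS):
            transfer = 0.0  # everything stays on the edge, nothing is transmitted
        else:
            sent_kb = INPUT_KB if split == 0 else ACTIVATION_KB[split - 1]
            transfer = sent_kb / UPLINK_KB_PER_MS
        return edge + transfer + cloud

    best = min(range(len(EDGE_LATENCY_MS) + 1), key=total_latency)
    print(f"best split: after layer {best}, estimated latency {total_latency(best):.1f} ms")

With these made-up numbers, the optimal mapping is neither edge-only nor cloud-only, which is exactly the compromise described above.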

Although many approaches to collaborative inference have been proposed for feed-forward DNNs and CNNs, there are no studies targeting Recurrent Neural Networks (RNNs), which require solutions specifically designed for their architecture. In particular, one key peculiarity of RNNs is that they process inputs of variable length, which makes the inference complexity input-dependent. We present CRIME (Collaborative RNN Inference Mapping Engine), the first collaborative inference framework specifically designed for RNNs. CRIME automatically maps inference requests to a network of collaborating devices in an input-driven way. Moreover, it is flexible with respect to the connection topology and adapts to changes in connection status and device load. Experiments on several RNNs and datasets show that CRIME can reduce the execution time (or end-node energy consumption) by more than 25% compared to any single-device approach.
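
The following Python sketch shows what an input-driven mapping decision could look like for RNNs, whose per-request cost grows with the input sequence length: a simple linear cost model is evaluated per device for each incoming request, and the device with the lowest predicted latency is chosen. The device names, cost coefficients, and the cost model itself are assumptions made purely for illustration; this is not the actual CRIME mapping engine.

    # Illustrative sketch (not the actual CRIME algorithm) of input-driven RNN mapping.
    from dataclasses import dataclass

    @dataclass
    class Device:
        name: str
        ms_per_step: float       # assumed time to process one recurrent time step
        uplink_ms_per_kb: float  # assumed cost to ship the input to this device
        load_factor: float       # current load; 1.0 means idle, >1.0 means busy

    def predicted_latency(dev: Device, seq_len: int, input_kb: float) -> float:
        """Linear cost model: transfer the input, then run seq_len recurrent steps."""
        compute = dev.ms_per_step * seq_len * dev.load_factor
        transfer = dev.uplink_ms_per_kb * input_kb
        return compute + transfer

    def map_request(devices, seq_len, input_kb):
        """Pick the device with the lowest predicted latency for this specific input."""
        return min(devices, key=lambda d: predicted_latency(d, seq_len, input_kb))

    devices = [
        Device("edge-node", ms_per_step=2.0, uplink_ms_per_kb=0.0, load_factor=1.0),
        Device("cloud",     ms_per_step=0.1, uplink_ms_per_kb=0.5, load_factor=1.2),
    ]

    # Short sequences stay on the edge; long ones are worth the transfer cost.
    for seq_len in (10, 500):
        chosen = map_request(devices, seq_len, input_kb=64.0)
        print(seq_len, "->", chosen.name)

Because the predicted cost depends on the sequence length of each request, the chosen device changes from input to input, which is the essence of the input-driven mapping described above.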


Additional material

  • Presentation slides: [pdf]