Extending Performance and Reliability via Thread-Level Dataflow Management


Roberto Giorgi

Presentation title

Extending Performance and Reliability via Thread-Level Dataflow Management

Authors

Roberto Giorgi

Institution(s)

University of Siena

Presentation type

Technical presentation

Abstract

As the number of transistors per computing system keeps increasing, so do the fault rates. This issue may also be critical in the area of embedded systems.

It has been predicted that we are approaching a point where the time needed to checkpoint such systems will exceed the mean time between interruptions (MTTI) as seen by applications. In this presentation, checkpointing is applied at a very fine granularity by relying on a disciplined data flow among the application threads. The underlying execution model is known as dataflow-threads (DF-threads), and the fault-detection extension of this model allows an application to execute resiliently while faults are affecting the system.

In the proposed implementation, the execution time degrades gracefully as the number of faults increases, without the need for global checkpointing and without interrupting the application execution. The technique has been evaluated on a full-system x86-64 simulator with encouraging results.
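
The idea of thread-granularity recovery can be illustrated with a small sketch. It is not the implementation presented here: the names `DFThread`, `FaultDetected`, `run`, and `build_graph` are hypothetical, faults are injected at random rather than detected in hardware, and each thread is assumed to be a pure function of its inputs. Under those assumptions, a thread fires only once all of its inputs have been written, the inputs are kept until the thread completes, and a detected fault triggers re-execution of that single thread instead of a global rollback.

```python
import random


class FaultDetected(Exception):
    """Raised when a (simulated) fault is detected during a thread's execution."""


class DFThread:
    """Hypothetical dataflow thread: runnable only when all inputs have arrived."""

    def __init__(self, name, func, num_inputs):
        self.name = name
        self.func = func                    # assumed side-effect free
        self.inputs = [None] * num_inputs   # retained until the thread completes
        self.pending = num_inputs           # number of inputs still missing
        self.consumers = []                 # (thread, slot) pairs fed by the output

    def write(self, slot, value):
        self.inputs[slot] = value
        self.pending -= 1


def run(threads, fault_rate=0.0, seed=0):
    """Execute the graph; re-run only the thread in which a fault is detected."""
    rng = random.Random(seed)
    ready = [t for t in threads if t.pending == 0]
    results, retries = {}, 0
    while ready:
        t = ready.pop()
        while True:
            try:
                if rng.random() < fault_rate:   # simulated fault detection
                    raise FaultDetected(t.name)
                out = t.func(*t.inputs)
                break
            except FaultDetected:
                retries += 1                    # inputs are intact: just retry
        results[t.name] = out
        for consumer, slot in t.consumers:
            consumer.write(slot, out)
            if consumer.pending == 0:
                ready.append(consumer)
    return results, retries


def build_graph():
    """Toy graph: a = 2, b = 3, c = a + b, d = c * a."""
    a = DFThread("a", lambda: 2, 0)
    b = DFThread("b", lambda: 3, 0)
    c = DFThread("c", lambda x, y: x + y, 2)
    d = DFThread("d", lambda x, y: x * y, 2)
    a.consumers = [(c, 0), (d, 1)]
    b.consumers = [(c, 1)]
    c.consumers = [(d, 0)]
    return [a, b, c, d]


if __name__ == "__main__":
    clean, _ = run(build_graph(), fault_rate=0.0)
    faulty, retries = run(build_graph(), fault_rate=0.5, seed=1)
    print(clean == faulty, retries)
```

In this toy model the final results are the same with and without injected faults, and the only cost of a fault is the repeated execution of the affected thread, which is one way to picture execution time growing with the fault count without any global checkpoint.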