Master Thesis MSTR-2011-13

Bibliography: Baroud, Yousef: A Hardware Architecture for Numerical Instability Detection Based on Discrete Stochastic Arithmetic.
University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Master Thesis No. 13 (2011).
89 pages, English.

Numerical validation of computed results is of great importance, especially in scientific computing. Because real numbers are represented with finite precision, round-off errors are introduced and accumulate in arithmetic operations. Nowadays, programs tend to run longer. This exacerbates the problem, since the computed results can become severely contaminated by the propagation of round-off errors, which may at some point lead to unreliable results.

The Discrete Stochastic Arithmetic (DSA) provides an effective and reliable approach to validate the numerical accuracy of the computed results. In DSA, a code is run N times with random rounding at every floating point operation, and the numerical accuracy information can be obtained by calculating the confidence interval of the randomly rounded results.
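The DSA procedure described above can be illustrated with a small software sketch. It is not the hardware implementation from the thesis. It assumes N = 3 runs and a 95% Student's t confidence interval (a common setting for this method), and it approximates random rounding by stepping the round-to-nearest result one ulp up or down. The names `random_round`, `significant_digits`, `run_dsa`, `stable` and `cancelling` are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics

# Student's t quantile for a 95% two-sided interval with N - 1 = 2 degrees
# of freedom (assumed N = 3 setting); other N need a different quantile.
T_95_N3 = 4.303
MAX_DIGITS = 15.0  # assumed cap: roughly the decimal precision of a double


def random_round(x, rng):
    """Software stand-in for random rounding: step one ulp up or down."""
    if x == 0.0 or not math.isfinite(x):
        return x
    direction = math.inf if rng.random() < 0.5 else -math.inf
    return math.nextafter(x, direction)  # requires Python 3.9+


def significant_digits(samples, t_value=T_95_N3):
    """Estimate the decimal digits shared by the randomly rounded results."""
    n = len(samples)  # must be at least 2
    mean = statistics.fmean(samples)
    spread = statistics.stdev(samples)
    if spread == 0.0:
        return MAX_DIGITS  # all runs agree exactly
    if mean == 0.0:
        return 0.0  # mean is zero but runs disagree: no reliable digit
    digits = math.log10(math.sqrt(n) * abs(mean) / (t_value * spread))
    return max(0.0, min(digits, MAX_DIGITS))


def run_dsa(computation, n_runs=3, seed=0):
    """Run `computation` n_runs times, each with its own rounding stream."""
    samples = []
    for i in range(n_runs):
        rng = random.Random(seed + i)
        samples.append(computation(lambda v: random_round(v, rng)))
    return samples, significant_digits(samples)


def stable(rr):
    # rr() wraps every floating point operation of the computation
    return rr(rr(0.1 + 0.2) + 0.3)


def cancelling(rr):
    # subtracting nearly equal numbers amplifies the rounding perturbation
    a = rr(1.0 + 1e-15)
    return rr(a - 1.0)


if __name__ == "__main__":
    for name, fn in (("stable", stable), ("cancelling", cancelling)):
        samples, digits = run_dsa(fn)
        print(name, samples, round(digits, 2))
```

In this sketch, `cancelling` typically reports far fewer digits than `stable`, although with only three runs the perturbed samples can occasionally coincide and hide the loss of accuracy.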

In this work, we present a novel hardware architecture which efficiently implements the DSA. A Numerical Analysis Unit (NAU) that estimates the numerical accuracy of any intermediate result and detects numerical instabilities has been implemented based on a hardware-reduced approach. The NAU has been integrated into a high-performance FPGA system that consists of two PowerPC processors which use stochastic floating point units. Upon catching a numerical instability, the NAU raises an exception to the PowerPCs, stopping them at the instruction that caused it.

In contrast to existing implementations, the proposed implementation has been developed to meet three constraints: minimal modification of the original source code, low hardware resource cost, and good performance.

An extension to a state-of-the-art debugger has been developed for debugging numerical instabilities in code. This extension adds functionality that facilitates communication with the FPGA system. Moreover, it provides functionality specific to numerical accuracy, such as getting more details about a NAU exception or resuming execution after catching one.

Department(s): University of Stuttgart, Institute of Parallel and Distributed Systems, Parallel Systems
Supervisor(s): Simon, Prof. Sven; Li, Wenbin
Entry date: May 14, 2021