Fault Tolerance For Array Architectures (Thesis)

Report ID: TR-380-92
Author: Sha, Edwin Hsing-Mean
Date: 1992-10-00
Pages: 134
Download Formats: |Postscript|
Abstract:

As array architectures become larger, surviving hardware failures becomes a critical issue. To achieve fault tolerance and increase reliability, redundancy is added to the system. In this way the original working architectur e can be reconfigured after a failure is detected by replacing components. Since in certain run--time applications reconfiguration should be accomplished as quickly as possible, the efficiency of reconfiguration of a fault--tolerant structure becomes a major concern. This dissertation investigates the relationships among reliability, reconfigurability and the hardware costs of fault--tolerant structures for different classes of array architectures. A mathematical framework is presented, and new measures, {it local} and {it finite reconfigurability,} are defined. For run--time fault tolerance, local and finite reconfigurability are desirable properties. It is proved that local reconfigurability and a fixed level of reliability cannot be achieved simultaneously unless a dynamic graph is of dimension at least one greater than the application graph. After this negative result, some highly reliable and easily reconfigurable structures are constructed based on simple layered graphs. An on--line distributed reconfiguration algorithm is presented for finding new bipartite matchings for these constructions. For run--time error detection in array processors, a methodology based on data--dependency graphs is described. It combines the projection method of deriving systolic arrays from dependency graphs with the idea of input--triggered testing. The method is called {it ITRED,} for {it Input--driven Time--Redundancy Error Detection.} Tests are triggered by inserting special symbols in the input, and so the approach gives the user flexibility in trading off throughput for error coverage. The method requires no extra {it PE's} and little extra hardware. To illustrate this approach, an error--detectable systolic array is designed in CMOS for ALL--SUBSTRING COMPARISONS, a problem which arises in applications such as information retrieval, and DNA pattern matching.