A key prerequisite for choosing the most suitable fault-tolerant architecture is knowing which failures are most frequent in a given technology.
Faults can be classified into permanent and transient faults. In modern technologies, transient faults predominate. They are typically caused by electromagnetic interference and by ionizing particles. Electromagnetic interference arises from crosstalk between long parallel lines in a die and from various external sources of electromagnetic noise. The effects of both sources can be drastically attenuated by proper system design, so fault tolerance is not mandatory for them. Ionizing particles in terrestrial environments are typically alpha particles and secondary particles created by the interaction of atmospheric neutrons with silicon, oxygen or other atoms composing a die. Atmospheric neutron flux increases with altitude and may be up to 1000 times higher at commercial flight altitudes than at sea level. Completely eliminating alpha particles is very costly, as it requires removing radioactive isotopes from the die, bonding and packaging materials. Drastically reducing the neutron flux is not practical, as it requires several meters of concrete. Using older technologies (0.35 µm or above) is a good way to reduce the effect of ionizing particles, and may be possible for low-complexity devices. In the long term, however, this option may be limited by features missing from older devices; in any case, the trend is to abandon old technologies and focus on a few current, up-to-date technological processes.
Permanent faults can be classified as timing faults and time-independent faults. Timing faults affect the temporal behavior of the circuit by increasing its worst-case delay. Time-independent faults modify the response of the circuit even if its outputs are observed a long time after a transition has occurred on its inputs. Timing faults (or delay faults) are much more difficult to detect than time-independent faults: on the one hand, they require specific transitions on the circuit inputs to sensitize them; on the other hand, these transitions must be propagated through the longest circuit paths, so that the added delay pushes the path delay beyond the clock period. Their detection may also require activating the worst-case delay conditions of the circuit, such as worst-case crosstalk, ground bounce and temperature.
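To make this contrast concrete, here is a minimal Python sketch under a deliberately simplified model (an assumption, not the authors' model): each path has one fixed delay, its output switches to the new value exactly when that delay has elapsed, and the output is sampled once at a chosen time after the input transition. A stuck-at fault stands in for a time-independent fault, and worst-case effects such as crosstalk, ground bounce or temperature are not modelled. The names `Path`, `sampled_output`, `delay_fault_detected` and `stuck_at_detected` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical single sensitizable path with a fixed propagation delay."""
    name: str
    delay_ns: float


def sampled_output(old: int, new: int, delay_ns: float, sample_ns: float) -> int:
    """Value observed sample_ns after the input transition.

    The new value only appears once the path delay has elapsed;
    before that, the output still holds the old value.
    """
    return new if sample_ns >= delay_ns else old


def delay_fault_detected(path: Path, extra_ns: float, old: int, new: int,
                         sample_ns: float) -> bool:
    """A delay fault adds extra_ns to the path; it is observed only if the
    fault-free and faulty circuits give different values at sample time."""
    good = sampled_output(old, new, path.delay_ns, sample_ns)
    faulty = sampled_output(old, new, path.delay_ns + extra_ns, sample_ns)
    return good != faulty


def stuck_at_detected(stuck: int, old: int, new: int, path: Path,
                      sample_ns: float) -> bool:
    """A stuck-at (time-independent) fault forces the output to a constant,
    whatever the sampling time."""
    good = sampled_output(old, new, path.delay_ns, sample_ns)
    return good != stuck


if __name__ == "__main__":
    clock_ns = 10.0
    long_path, short_path = Path("long", 9.0), Path("short", 3.0)
    # Transition on the longest path, sampled at the clock edge: detected.
    print(delay_fault_detected(long_path, 2.0, 0, 1, clock_ns))    # True
    # No input transition, so the fault is not sensitized: missed.
    print(delay_fault_detected(long_path, 2.0, 1, 1, clock_ns))    # False
    # The short path has enough slack to absorb the extra delay: missed.
    print(delay_fault_detected(short_path, 2.0, 0, 1, clock_ns))   # False
    # Sampled long after the transition, the delay fault is invisible...
    print(delay_fault_detected(long_path, 2.0, 0, 1, 100.0))       # False
    # ...while a stuck-at-0 fault is still observed.
    print(stuck_at_detected(0, 0, 1, long_path, 100.0))            # True
```

With a 10 ns clock, the extra 2 ns is caught only on the 9 ns path, only when the input actually switches, and only when the output is sampled at the clock edge; the stuck-at fault, by contrast, shows up regardless of when the output is observed.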
Permanent faults can further be classified into fabrication faults, created during the fabrication phase, and faults developed during the circuit's lifetime. Fabrication test is expected to detect most fabrication faults, in particular time-independent faults, which are much easier to detect. However, some timing faults (those requiring very complex test conditions) may escape detection during fabrication test. Because these conditions are complex, such faults will affect system operation only rarely. As the failure mechanisms inducing circuit-life faults are very slow processes, circuit-life defects alter the circuit structure gradually and first affect its timing characteristics. A circuit-life defect takes much longer to evolve into a time-independent fault. This enables fault detection and fault masking, and makes it possible to replace the circuit well before the defect turns into a time-independent fault.
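The argument that slow wear-out leaves time to react can be illustrated with a back-of-the-envelope sketch. All of the following are hypothetical assumptions, not data from this work: the defect makes a path's delay grow linearly with age, timing faults begin once that delay reaches the clock period, and the defect behaves as a time-independent fault (for example, an open) once the delay reaches a much larger level `hard_fail_ns`. `AgingPath` and `warning_window` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class AgingPath:
    """Hypothetical path whose delay grows linearly with age.

    Linear growth is a deliberate simplification; real wear-out
    mechanisms generally follow other laws.
    """
    initial_delay_ns: float
    growth_ns_per_year: float

    def __post_init__(self) -> None:
        if self.growth_ns_per_year <= 0:
            raise ValueError("growth_ns_per_year must be positive")

    def delay_at(self, years: float) -> float:
        """Path delay after the given number of years of operation."""
        return self.initial_delay_ns + self.growth_ns_per_year * years

    def years_until(self, threshold_ns: float) -> float:
        """Age at which the path delay reaches threshold_ns."""
        if self.initial_delay_ns >= threshold_ns:
            return 0.0
        return (threshold_ns - self.initial_delay_ns) / self.growth_ns_per_year


def warning_window(path: AgingPath, clock_ns: float,
                   hard_fail_ns: float) -> tuple[float, float, float]:
    """Return (age when the delay reaches the clock period,
    age when the defect acts as a time-independent fault,
    years in between during which the timing fault can be detected)."""
    t_timing = path.years_until(clock_ns)
    t_hard = path.years_until(hard_fail_ns)
    return t_timing, t_hard, t_hard - t_timing


if __name__ == "__main__":
    # Illustrative numbers only: 9 ns path, 10 ns clock, 0.25 ns/year drift,
    # and a hypothetical 50 ns level at which the defect acts like an open.
    path = AgingPath(initial_delay_ns=9.0, growth_ns_per_year=0.25)
    print(warning_window(path, clock_ns=10.0, hard_fail_ns=50.0))
    # (4.0, 164.0, 160.0)
```

Under these made-up numbers, timing faults appear after 4 years while the time-independent failure would only appear after 164, leaving a long window for detection, masking or replacement. The width of that window depends entirely on the assumed growth law and thresholds.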