Reliability is a primary design metric and heavily impacts design, validation, implementation, and testing choices.
Advanced nanometer processes, dynamic operating conditions, and aggressive working environments make devices increasingly sensitive to external and internal perturbations. Additionally, today's electronic systems are expected to deliver ever more functionality, computing power, and sophistication, making them more complex and raising their component count.
Accordingly, reliability management has become mandatory for today's highly complex electronic devices.
As usual, the first step is to evaluate the severity of the problem. Ideally, the reliability analysis should provide data for every feature (block) of the system as well as for the whole system. The results are then checked against the input (customer) specifications. If the system performs worse than expected, corrective measures (if still possible) are applied. Here again, the reliability results should identify the critical features of the system to protect or harden. Improvements in reliability are rarely free: extra area is required or the performance of the device is degraded. Reliability analysis is thus also called upon to judge the best compromise between the benefits and the added costs.
However, effective reliability analysis is difficult to perform. Firstly, reliable data about the behavior of the technology must be gathered. Existing data (from the literature) does not always match the specific standard cells or memory blocks used in the circuit, and the literature still offers little characterization data for standard cells. Additionally, most foundries are not yet ready or willing to provide reliability data about their libraries. We will enumerate practical ways to obtain this information.
Secondly, de-rating must be performed for each circuit feature. The de-rating should take into account the actual structure of the circuit as well as how the circuit is used in its present configuration. This de-rating can be very demanding in terms of user expertise or computational power. Alternatively, we can use probabilistic methods that require less analysis effort and/or user expertise. These methods are particularly useful when the analysis becomes a required aspect of system reliability, enabling ordinary users to perform fault propagation evaluation with minimal effort. We will present some of the methodologies and tools that circuit designers can use to accomplish this task.
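To give a flavor of what a static (probabilistic) approach can look like, the sketch below estimates how likely a bit-flip on a net is to propagate through a tiny combinational cone, using static signal probabilities instead of fault simulation. The netlist, the signal probabilities, and the single-path assumption are invented purely for illustration; this is a minimal sketch of the general idea, not SoCFIT's algorithm.

```python
# Minimal probabilistic propagation sketch. Gate names, netlist format and
# signal probabilities are invented; real tools work on full gate-level
# netlists with far more sophisticated masking models.

# Each gate: (type, list of input nets, output net), in topological order.
NETLIST = [
    ("AND", ["a", "b"], "n1"),
    ("OR",  ["n1", "c"], "out"),
]

# Assumed probability that each primary input is at logic 1.
SIGNAL_PROB = {"a": 0.5, "b": 0.5, "c": 0.5}

def signal_prob(net, netlist, prob):
    """Return P(net = 1), computing internal nets on demand (inputs assumed independent)."""
    if net in prob:
        return prob[net]
    for gtype, ins, out in netlist:
        if out == net:
            p_ins = [signal_prob(i, netlist, prob) for i in ins]
            if gtype == "AND":
                p = 1.0
                for pi in p_ins:
                    p *= pi
            elif gtype == "OR":
                p = 1.0
                for pi in p_ins:
                    p *= (1.0 - pi)
                p = 1.0 - p
            else:
                raise ValueError("unsupported gate type: " + gtype)
            prob[net] = p
            return p
    raise KeyError(net)

def propagation_prob(fault_net, netlist, prob):
    """Estimate the probability that a bit-flip on fault_net reaches the cone
    output, assuming a single forward path and independent side inputs."""
    p = 1.0
    current = fault_net
    for gtype, ins, out in netlist:
        if current in ins:
            for side in (i for i in ins if i != current):
                if gtype == "AND":   # side inputs must be 1 for the flip to pass
                    p *= signal_prob(side, netlist, prob)
                elif gtype == "OR":  # side inputs must be 0 for the flip to pass
                    p *= 1.0 - signal_prob(side, netlist, prob)
            current = out
    return p

# Logic de-rating estimate for a fault on net "a": it must pass the AND gate
# (b = 1, probability 0.5) and then the OR gate (c = 0, probability 0.5).
print(propagation_prob("a", NETLIST, dict(SIGNAL_PROB)))  # -> 0.25
```

The appeal of such static estimates is that they need only the netlist and signal statistics, not a workload-specific testbench, which is what keeps the analysis effort low for ordinary users.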
Lastly, the manufactured circuit may be used in a variety of scenarios. This is particularly true for general-purpose circuits such as microcontrollers or CPUs, and reliability performance is expected to vary accordingly. Circuit and system designers have to take these aspects into account by computing more accurate de-rating factors according to the real-life usage of their systems.
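As a purely numeric illustration, assuming the usual decomposition of the effective failure rate into a raw technology FIT multiplied by timing, logic, and application de-rating factors, the same silicon can exhibit very different failure rates in different usage scenarios. All values below are invented for the example, not characterization data.

```python
# Illustrative only: the raw FIT value and all de-rating factors are assumed.
RAW_FIT = 1000.0  # raw upset rate of the sequential logic, in FIT
                  # (1 FIT = 1 failure per 10**9 device-hours)

SCENARIOS = {
    # scenario name: (timing de-rating, logic de-rating, application de-rating)
    "safety_controller": (0.5, 0.3, 0.8),  # most upsets matter to the application
    "media_playback":    (0.5, 0.3, 0.1),  # most upsets have no visible effect
}

for name, (tdr, ldr, adr) in SCENARIOS.items():
    effective_fit = RAW_FIT * tdr * ldr * adr
    print(f"{name}: {effective_fit:.0f} FIT")
# safety_controller: 120 FIT, media_playback: 15 FIT -- same device, very
# different effective rates once real-life usage is accounted for.
```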
IROC Technologies provides tools, services, and solutions to perform efficient and resource-aware fault simulation and propagation analysis. We have developed a complete EDA tool platform, SoCFIT, implementing static (probabilistic) and dynamic (fault-simulation-based) methods for evaluating the behavior of the faulty design across a variety of applications.
SoCFIT™: Circuit-Level Reliability Analysis
SoCFIT is an EDA software platform for fault simulation and analysis that combines technology reliability data and de-rating factors to evaluate the functional reliability of complex circuits and systems.
SoCFIT successfully handles very large designs (tens of millions of flip-flops) and includes static (probabilistic) and dynamic (fault simulation and propagation) approaches for the computation of applicative, timing, and logic de-rating. SoCFIT's internal algorithms dramatically accelerate the simulation of various types of faults.
SoCFIT provides extensive reports (a roll-up sketch follows the list):
- reliability data per cell instance and per design feature/block
- de-rating information
- FIT/event rates for faults/errors/failures
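To show how such per-instance results might be consumed, the following sketch rolls hypothetical per-instance FIT figures up into per-block and full-design totals. The instance names, hierarchy convention, and FIT values are assumptions made for the example and do not reflect SoCFIT's actual report format.

```python
from collections import defaultdict

# Hypothetical per-instance results: (hierarchical instance name, effective FIT).
INSTANCE_FIT = [
    ("top/cpu/regfile/ff_0",  0.012),
    ("top/cpu/regfile/ff_1",  0.011),
    ("top/dma/ctrl/ff_state", 0.004),
]

block_fit = defaultdict(float)
for inst, fit in INSTANCE_FIT:
    # Roll each instance up to the second level of hierarchy (e.g. "top/cpu").
    block = "/".join(inst.split("/")[:2])
    block_fit[block] += fit

for block, fit in sorted(block_fit.items()):
    print(f"{block}: {fit:.3f} FIT")
print(f"design total: {sum(block_fit.values()):.3f} FIT")
```

Per-block figures obtained this way are what allow designers to focus protection or hardening effort on the features that dominate the overall failure rate.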