Your Functional Reliability Partner – IROC Technologies


P. Reviriego, M. Demirci, A. Evans and J. A. Maestro, “A Method to Design Single Error Correction Codes With Fast Decoding for a Subset of Critical Bits,” in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 63, no. 2, pp. 171-175, Feb. 2016.
doi: 10.1109/TCSII.2015.2483362
Abstract: Single error correction (SEC) codes are widely used to protect data stored in memories and registers. In some applications, such as networking, a few control bits are added to the data to facilitate their processing. For example, flags to mark the start or the end of a packet are widely used. Therefore, it is important to have SEC codes that protect both the data and the associated control bits. It is attractive for these codes to provide fast decoding of the control bits, as these are used to determine the processing of the data and are commonly on the critical timing path. In this brief, a method to extend SEC codes to support a few additional control bits is presented. The derived codes support fast decoding of the additional control bits and are therefore suitable for networking applications.
keywords: {decoding;error correction codes;SEC code design;control bits;critical bits;critical timing path;fast decoding;networking applications;single error correction codes;Application specific integrated circuits;Decoding;Delays;Error correction codes;Parity check codes;High speed networking;memory;single error correction (SEC)},
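As a concrete illustration of the single error correction principle these codes build on, here is a plain Hamming(7,4) encoder/decoder in Python (a generic SEC sketch, not the paper's extended construction with fast-decoded control bits):

```python
# Minimal Hamming(7,4) single-error-correction sketch (generic SEC,
# not the control-bit extension proposed in the paper).

def encode(d):
    """d: list of 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(c):
    """Correct any single-bit error and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]]

# Any single upset in the stored word is corrected on read-back.
word = encode([1, 0, 1, 1])
word[3] ^= 1                          # inject a single-event upset
assert decode(word) == [1, 0, 1, 1]
```

Real memory ECC uses wider codes (e.g. SEC-DED over 64-bit words), but the syndrome-based correction step is the same.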

A. Evans, E. Costenaro and A. Bramnik, “Flip-flop SEU reduction through minimization of the temporal vulnerability factor (TVF),” On-Line Testing Symposium (IOLTS), 2015 IEEE 21st International, Halkidiki, 2015, pp. 162-167.
doi: 10.1109/IOLTS.2015.7229851
Abstract: The effects of soft errors in flip-flops remain a concern in large designs. Many radiation-hardened flip-flops exist; however, these are custom cells and not available to all designers. In this paper, we explore a technique for the mitigation of flip-flop soft errors through an optimization of the temporal vulnerability factor (TVF). By selectively inserting delay on the input or output of flip-flops, the probability of propagation of single event upsets (SEUs) can be minimized. The selection of where to insert the added delay is formulated as a linear programming problem. In this way, the flip-flop soft-error rate (SER) can be minimized subject to overhead constraints.
keywords: {flip-flops;linear programming;probability;radiation hardening (electronics);SER;TVF;flip-flop SEU;flip-flop soft-error rate;flip-flop soft-errors mitigation;linear programming problem;propagation probability;radiation hardened flip-flops;single event upsets;temporal vulnerability factor;Circuit faults;Clocks;Delays;Integrated circuit modeling;Logic gates;Mathematical model},

J. Noh et al., “Study of Neutron Soft Error Rate (SER) Sensitivity: Investigation of Upset Mechanisms by Comparative Simulation of FinFET and Planar MOSFET SRAMs,” in IEEE Transactions on Nuclear Science, vol. 62, no. 4, pp. 1642-1649, Aug. 2015.
doi: 10.1109/TNS.2015.2450997
Abstract: The assessment of the soft-error rate (SER) of semiconductor devices continues to be important, even with the adoption of FinFET devices which overcome some important limitations of planar MOSFETs. The study in this paper presents both theoretical and experimental results via advanced simulation techniques, to investigate the difference between planar and FinFET devices in terms of SER. Neutron test results from different facilities are presented, and the observed differences in sensitivity are explained through theoretical analysis. In the second half of the paper, the test results are validated through TCAD and TFIT simulations using a calibrated technology response model. The analysis shows that the reduction in sensitivity of FinFET devices is primarily due to an increase in the threshold LET and a reduction in the sensitive volume due to the shape of the transistor.
keywords: {MOSFET circuits;SRAM chips;technology CAD (electronics);FinFET devices;MOSFET;SRAM;TCAD;TFIT;neutron soft error rate sensitivity;semiconductor devices;transistor;Computational modeling;Doping;FinFETs;Neutrons;Sensitivity;Substrates;FinFET;SRAM;TFIT;neutrons;planar MOSFET;sensitive volume;simulation tool;single event upset (SEU);soft error rate (SER);terrestrial radiation effects},

J. Abraham, R. Iyer, D. Gizopoulos, D. Alexandrescu and Y. Zorian, “The future of fault tolerant computing,” On-Line Testing Symposium (IOLTS), 2015 IEEE 21st International, Halkidiki, 2015, pp. 108-109.
doi: 10.1109/IOLTS.2015.7229841
Abstract: Fault tolerant (or dependable) computing has always been an exciting research area at the intersection of computer science and engineering and electrical and electronics engineering. During the last two decades, the applicability of the methods and tools that the fault tolerance research community produces has expanded to virtually all application domains. The type of fault tolerance methods employed in a computing system depends on: (a) the faults expected to affect the system; (b) the importance of errors in the system operation; and (c) the design, cost and power budgets that can be allocated to fault tolerance and reliable operation. New solutions and tools in fault tolerant computing are emerging to deal with the very broad spectrum of values that (a), (b) and (c) can take in today’s computing landscape.
keywords: {fault tolerant computing;reliability;computer engineering;computer science;computing system;electrical engineering;electronics engineering;fault tolerance methods;fault tolerance research community;fault tolerant computing;system operation;Testing;dependability;fault tolerance;reliability;resilience},

M. Ebrahimi et al., “Comprehensive Analysis of Sequential and Combinational Soft Errors in an Embedded Processor,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 10, pp. 1586-1599, Oct. 2015.
doi: 10.1109/TCAD.2015.2422845
Abstract: Radiation-induced soft errors have become a key challenge in advanced commercial electronic components and systems. We present the results of a soft error rate (SER) analysis of an embedded processor. Our SER analysis platform accurately models generation, propagation, and masking effects starting from a technology response model derived using TCAD simulations at the device level all the way to application masking. The platform employs a combination of accurate models at the device level, analytical error propagation at gate level, and fault emulation at the architecture/application level to provide the detailed contribution of each component (flip-flops, combinational gates, and SRAMs) to the overall SER. At each stage in the modeling hierarchy, an appropriate level of abstraction is used to propagate the effect of errors to the next higher level. Unlike previous studies which are based on very simple test chips, analyzing the entire processor gives more insight into the relative contributions of combinational and sequential SER. The results of this analysis can assist circuit designers to adopt effective hardening techniques to reduce the overall SER while meeting the required power and performance constraints.
keywords: {embedded systems;error statistics;multiprocessing systems;SER analysis platform;TCAD simulations;advanced commercial electronic components;analytical error propagation;combinational soft errors;comprehensive analysis;device level;embedded processor;fault emulation;gate level;sequential soft errors;soft error rate analysis;technology response model;Analytical models;Circuit faults;Clocks;Flip-flops;Logic gates;Random access memory;Transient analysis;Embedded Processor;Embedded processor;Reliability;Soft Errors;reliability;soft errors},

A. Evans, D. Alexandrescu, V. Ferlet-Cavrois and K. O. Voss, “Techniques for heavy ion microbeam analysis of FPGA SER sensitivity,” Reliability Physics Symposium (IRPS), 2015 IEEE International, Monterey, CA, 2015, pp. SE.6.1-SE.6.6.
doi: 10.1109/IRPS.2015.7112826
Abstract: Using the heavy-ion micro-probe facility at GSI in Darmstadt, individual heavy ions can be targeted at specific locations on a die. Circuits to measure SEUs in flip-flops and RAMs, SETs in combinatorial logic, and glitches in PLLs were developed and tested. Detailed results are presented for a study of the 130 nm ProASIC3L FPGA tested under Au (94 MeV·cm²/mg) and Ti (19 MeV·cm²/mg) ions.
keywords: {application specific integrated circuits;field programmable gate arrays;flip-flops;integrated circuit testing;phase locked loops;radiation hardening (electronics);random-access storage;Darmstadt;FPGA SER sensitivity;GSI;PLL;ProASIC3L FPGA testing;RAM;SET;combinatorial logic;flip-flop;heavy ion microbeam analysis;size 130 nm;Clocks;Field programmable gate arrays;Flip-flops;Gold;Ions;Latches;Phase locked loops;FPGA;Phase Locked Loop (PLL);micro-probe;single event transient (SET);single event upset (SEU)},

D. Alexandrescu, A. Evans, E. Costenaro and M. Glorieux, “A call for cross-layer and cross-domain reliability analysis and management,” On-Line Testing Symposium (IOLTS), 2015 IEEE 21st International, Halkidiki, 2015, pp. 19-22.
doi: 10.1109/IOLTS.2015.7229821
Abstract: For many applications, reliability, availability and trustability are key factors, requiring careful design to meet the end users’ expectations. The complex ASICs, which are now ubiquitous, often embed tens of millions of flip-flops, hundreds of megabits of embedded SRAM, and hundreds of millions of combinatorial cells. These designs integrate IP from multiple providers and are implemented in advanced process technologies, making it challenging to evaluate their reliability. Initiatives such as RIIF (Reliability Information Interchange Format) allow the formalization, specification and modeling of extra-functional, reliability properties for technology, circuits and systems. Continuing these efforts, we propose RAFT (Reliability Architect Framework and Toolset) – a reliability-centric framework including reliability data and models, methodologies and tools allowing system reliability exploration and optimization using mathematical models and high-level tools. The proposed approach can be combined with performance management methodologies aiming at reducing the engineering effort devoted to reliability analysis and improvement.
keywords: {fault tolerant computing;ASIC;RAFT;RIIF;application-specific integrated circuits;cross-layer cross-domain reliability analysis;cross-layer cross-domain reliability management;embedded SRAM;performance management methodologies;reliability architect framework and toolset;reliability data;reliability exploration;reliability improvement;reliability information interchange format;reliability models;reliability optimization;reliability-centric framework;static random access memory;Application specific integrated circuits;Data models;Databases;Integrated circuit reliability;Reliability engineering;Software reliability;Cross-layer optimization;RIIF;reliability analysis},

P. Reviriego, S. Pontarelli, A. Evans and J. A. Maestro, “A Class of SEC-DED-DAEC Codes Derived From Orthogonal Latin Square Codes,” in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 5, pp. 968-972, May 2015.
doi: 10.1109/TVLSI.2014.2319291
Abstract: Radiation-induced soft errors are a major reliability concern for memories. To ensure that memory contents are not corrupted, single error correction double error detection (SEC-DED) codes are commonly used; however, in advanced technology nodes, soft errors frequently affect more than one memory bit. Since SEC-DED codes cannot correct multiple errors, they are often combined with interleaving. Interleaving, however, impacts memory design and performance and cannot always be used in small memories. This limitation has spurred interest in codes that can correct adjacent bit errors. In particular, several SEC-DED double adjacent error correction (SEC-DED-DAEC) codes have recently been proposed. Implementing DAEC has a cost, as it impacts decoder complexity and delay. Another issue is that most of the new SEC-DED-DAEC codes miscorrect some double nonadjacent bit errors. In this brief, a new class of SEC-DED-DAEC codes is derived from orthogonal Latin square codes. The new codes significantly reduce the decoding complexity and delay. In addition, the codes do not miscorrect any double nonadjacent bit errors. The main disadvantage of the new codes is that they require a larger number of parity check bits. Therefore, they can be useful when decoding delay or complexity is critical or when miscorrection of double nonadjacent bit errors is not acceptable. The proposed codes have been implemented in a hardware description language and compared with some of the existing SEC-DED-DAEC codes. The results confirm the reduction in decoder delay.
keywords: {error detection codes;hardware description languages;parity check codes;radiation hardening (electronics);random-access storage;semiconductor device reliability;SEC-DED double adjacent error correction;SEC-DED-DAEC codes;advanced technology nodes;decoder delay;hardware description language;memory design;nonadjacent bit errors;orthogonal latin square codes;parity check bits;radiation-induced soft errors;single error correction double error detection codes;Complexity theory;Decoding;Delays;Error correction codes;Hardware design languages;Parity check codes;Very large scale integration;Double adjacent error correction (DAEC);error correction codes;memory;orthogonal latin squares (OLS);single error correction double error detection (SEC-DED)},

R. Liu et al., “Analysis of advanced circuits for SET measurement,” Reliability Physics Symposium (IRPS), 2015 IEEE International, Monterey, CA, 2015, pp. SE.7.1-SE.7.7.
doi: 10.1109/IRPS.2015.7112827
Abstract: Single Event Transients (SETs) are a growing concern in advanced integrated circuits, yet techniques to accurately characterize the cross-section and pulse width of SETs are less mature than those for measuring SEUs. We present four circuits for measuring SETs, an analysis of their capabilities and the subtleties in their implementation. Post-layout circuit simulation results are presented for a test-chip implemented in 28 nm FDSOI technology and integrating these detectors.
keywords: {integrated circuit testing;radiation hardening (electronics);silicon-on-insulator;FDSOI technology;SET cross-section;SET measurement;SET pulse width;SEU measurement;advanced circuit analysis;advanced integrated circuits;post-layout circuit simulation;single event transients;size 28 nm;Delays;Detectors;Flip-flops;Latches;Logic gates;Pulse measurements;Transient analysis;single event transient (SET)},

U. Schlichtmann et al., “Connecting different worlds — Technology abstraction for reliability-aware design and Test,” Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014, Dresden, 2014, pp. 1-8.
doi: 10.7873/DATE.2014.265
Abstract: The rapid shrinking of device geometries in the nanometer regime requires new technology-aware design methodologies. These must be able to evaluate the resilience of the circuit throughout all System on Chip (SoC) abstraction levels. To successfully guide design decisions at the system level, reliability models, which abstract technology information, are required to identify those parts of the system where additional protection in the form of hardware or software countermeasures is most effective. Interfaces such as the presented Resilience Articulation Point (RAP) or the Reliability Information Interchange Format (RIIF) are required to enable EDA-assisted analysis and propagation of reliability information. The models are discussed from different perspectives, such as design and test.
keywords: {integrated circuit design;integrated circuit reliability;integrated circuit testing;system-on-chip;EDA assisted analysis;reliability aware design;reliability aware test;reliability interchange information format;resilience articulation point;system on chip abstraction levels;technology abstraction;technology aware design methodologies;Integrated circuit reliability;Mathematical model;Random access memory;Reliability engineering;Resilience;System-on-chip},

D. Alexandrescu, N. Bidokhti, A. Yu, A. Evans and E. Costenaro, “Managing SER costs of complex systems through Linear Programming,” On-Line Testing Symposium (IOLTS), 2014 IEEE 20th International, Platja d’Aro, Girona, 2014, pp. 216-219.
doi: 10.1109/IOLTS.2014.6873701
Abstract: Single Event Effects negatively impact the reliability of complex electronic devices and systems. System architects, reliability engineers and digital designers have to invest considerable resources to successfully meet the reliability goals set by the final user or application. The cost of SER mitigation techniques (e.g. additional power and reduced performance) may render the product less competitive. This paper proposes an approach that allows a system architect to select the best SEE management techniques subject to given cost and performance constraints. In this methodology, the costs of SER protection (area, power, engineering effort, IP costs) are expressed as a cost function depending on the selected protection schemes. A separate function expresses the reliability and/or availability as a function of the protection schemes. Then, Linear Programming techniques are used to select a set of protection techniques that minimizes the costs, subject to the reliability constraints being met. This systematic approach enables system architects to find a minimal-cost SER protection strategy, thereby reducing over-design and unnecessary overheads.
keywords: {integrated circuit reliability;linear programming;radiation hardening (electronics);SEE management techniques;SER cost management;SER mitigation techniques;complex electronic device reliability;complex systems;cost constraints;cost function;linear programming techniques;minimal-cost SER protection strategy;performance constraints;reliability constraints;single event effects;Application specific integrated circuits;Linear programming;Optimization;Random access memory;Reliability engineering;Testing;Cross-layer optimization;SER;SER management},
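The cost/reliability formulation in the abstract above can be sketched in a few lines of Python. The block names and FIT/area numbers below are entirely hypothetical, and for a handful of blocks the 0/1 protection selection is solved here by exhaustive search rather than by the paper's Linear Programming:

```python
# Hypothetical sketch of selecting SER protection schemes at minimal cost
# subject to a reliability (residual FIT) constraint.
from itertools import product

# (block, area cost of protecting it, FIT removed if protected) -- made-up numbers
blocks = [("cpu_regs", 12.0, 40.0), ("dma_fifo", 5.0, 25.0),
          ("pkt_sram", 20.0, 60.0), ("ctrl_fsm", 3.0, 10.0)]
unprotected_fit = sum(fit for _, _, fit in blocks)  # 135 FIT before mitigation
fit_budget = 50.0                                   # reliability goal

best = None
for choice in product([0, 1], repeat=len(blocks)):  # try every protect/skip combination
    cost = sum(c * area for c, (_, area, _) in zip(choice, blocks))
    residual = unprotected_fit - sum(c * fit for c, (_, _, fit) in zip(choice, blocks))
    if residual <= fit_budget and (best is None or cost < best[0]):
        best = (cost, choice)

cost, choice = best
protected = [name for c, (name, _, _) in zip(choice, blocks) if c]
print(f"protect {protected} at area cost {cost}")
```

With many blocks and graded protection levels, the same objective and constraint would be handed to an LP/ILP solver instead of being enumerated.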

A. Evans, D. Alexandrescu, V. Ferlet-Cavrois and M. Nicolaidis, “New Techniques for SET Sensitivity and Propagation Measurement in Flash-Based FPGAs,” in IEEE Transactions on Nuclear Science, vol. 61, no. 6, pp. 3171-3177, Dec. 2014.
doi: 10.1109/TNS.2014.2365410
Abstract: Single-event transients (SETs) remain a concern in field-programmable gate arrays (FPGAs) used for space applications. However, accurate measurement of SETs in FPGAs is challenging. This paper describes a calibrated circuit for on-chip measurement of SETs with a temporal precision better than one gate delay. In addition, a technique to measure the final effect of SETs in clocked, complex circuits is presented. Heavy-ion test results for a ProASIC3L FPGA are reported, highlighting a strong dependence between the VersaTile configuration and input signal state with the SET sensitivity and pulse propagation.
keywords: {clocks;field programmable gate arrays;radiation hardening (electronics);ProASIC3L FPGA;SET sensitivity;VersaTile configuration;calibrated circuit;clock;complex circuits;field programmable gate arrays;flash-based FPGA;heavy ion test;on-chip measurement;propagation measurement;pulse propagation;single event transients;space applications;temporal precision;Field programmable gate arrays;Latches;Logic circuits;Sensitivity;Single event transients;Transient analysis;Heavy ion and pulse laser irradiation;SET propagation;propagation-induced pulse broadening;single-event transients (SETs)},

Hao Xie et al., “New approaches for synthesis of redundant combinatorial logic for selective fault tolerance,” On-Line Testing Symposium (IOLTS), 2014 IEEE 20th International, Platja d’Aro, Girona, 2014, pp. 62-68.
doi: 10.1109/IOLTS.2014.6873673
Abstract: With shrinking process technologies, the likelihood of mid-life faults in combinatorial logic is increasing. Approximate logic functions are a promising approach to mitigate such faults, since the technique can be applied to any digital circuit, protects against multiple fault models and offers a trade-off between increased area and fault coverage. In this paper we present a new algorithm for generating approximate logic functions. The algorithm considers the failure probabilities of the gates and uses a sum of products (SOP) representation. The results on some circuits show that the FIT rate can be reduced by 75% with an area penalty of 46%, while inserting only two additional layers of logic.
keywords: {approximation theory;combinational circuits;fault tolerance;probability;FIT rate;SOP representation;approximate logic function;digital circuit;failure probability;likelihood midlife fault mitigation;redundant combinatorial logic synthesis;selective fault tolerance;shrinking process technology;sum of product representation;Circuit faults;Integrated circuit modeling;Logic functions;Logic gates;Mathematical model;Measurement;Vectors;approximate logic;fault-tolerance;logic synthesis},

S. Ganapathy, R. Canal, D. Alexandrescu, E. Costenaro, A. Gonzalez and A. Rubio, “INFORMER: An integrated framework for early-stage memory robustness analysis,” Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014, Dresden, 2014, pp. 1-4.
doi: 10.7873/DATE.2014.046
Abstract: With the growing importance of parametric (process and environmental) variations in advanced technologies, it has become a serious challenge to design reliable, fast and low-power embedded memories. Adopting a variation-aware design paradigm requires a holistic perspective of memory-wide metrics such as yield, power and performance. However, accurate estimation of such metrics is largely dependent on circuit implementation styles, technology parameters and architecture-level specifics. In this paper, we propose a fully automated tool – INFORMER – that helps high-level designers estimate memory reliability metrics rapidly and accurately. The tool relies on accurate circuit-level simulations of failure mechanisms such as soft-errors and parametric failures. The statistics obtained can then help couple low-level metrics with higher-level design choices. A new technique for rapid estimation of low-probability failure events is also proposed. We present three use-cases of our prototype tool to demonstrate its diverse capabilities in autonomously guiding large SRAM based robust memory designs.
keywords: {SRAM chips;circuit simulation;failure analysis;integrated circuit reliability;radiation hardening (electronics);INFORMER;SRAM based robust memory designs;circuit-level simulations;early-stage memory robustness analysis;failure mechanisms;parametric failures;reliability;soft-errors;Estimation;Integrated circuit modeling;Measurement;Redundancy;Robustness;Transistors},

M. Ebrahimi, A. Evans, M. B. Tahoori, R. Seyyedi, E. Costenaro and D. Alexandrescu, “Comprehensive analysis of alpha and neutron particle-induced soft errors in an embedded processor at nanoscales,” Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014, Dresden, 2014, pp. 1-6.
doi: 10.7873/DATE.2014.043
Abstract: Radiation-induced soft errors have become a key challenge in advanced commercial electronic components and systems. We present results of Soft Error Rate (SER) analysis of an embedded processor. Our SER analysis platform accurately models all generation, propagation and masking effects starting from a technology response model derived using TCAD simulations at the device level all the way to application masking. The platform employs a combination of empirical models at the device level, analytical error propagation at logic level and fault emulation at the architecture/application level to provide the detailed contribution of each component (flip-flops, combinational gates, and SRAMs) to the overall SER. At each stage in the modeling hierarchy, an appropriate level of abstraction is used to propagate the effect of errors to the next higher level. Unlike previous studies which are based on very simple test chips, analyzing the entire processor gives more insight into the contributions of different components to the overall SER. The results of this analysis can assist circuit designers to adopt effective hardening techniques to reduce the overall SER while meeting required power and performance constraints.
keywords: {flip-flops;microprocessor chips;nanoelectronics;radiation hardening (electronics);SER analysis platform;TCAD simulations;advanced commercial electronic components;alpha particle-induced soft errors;analytical error propagation;architecture-application level;device level;embedded processor;fault emulation;flip-flops;logic level;masking effects;neutron particle-induced soft errors;radiation-induced soft errors;soft error rate analysis;technology response model;test chips;Analytical models;Clocks;Emulation;Load modeling;Logic gates;Neutrons;Random access memory},

D. Alexandrescu, L. Sterpone and C. Lopez-Ongil, “Fault injection and fault tolerance methodologies for assessing device robustness and mitigating against ionizing radiation,” Test Symposium (ETS), 2014 19th IEEE European, Paderborn, 2014, pp. 1-6.
doi: 10.1109/ETS.2014.6847812
Abstract: Traditionally, heavy-ion radiation effects on digital systems in safety-critical applications have been of great interest. Nowadays, due to shrinking technology processes, integrated circuits have also become sensitive to other kinds of radiation particles, such as neutrons, which exist at the earth’s surface and affect ground-level safety-critical applications such as automotive or medical systems. Analyzing and hardening digital devices against soft errors raises the final cost, due to time-expensive fault injection campaigns and radiation tests, and reduces system performance, due to the insertion of redundancy-based mitigation solutions. The main industrial problem is localizing the critical elements in the circuit in order to apply optimal mitigation techniques. This tutorial presents and discusses different solutions currently available for assessing and implementing the fault tolerance of digital circuits, not only when the design description is available but also at the component level, especially when commercial off-the-shelf (COTS) devices are selected.
keywords: {digital circuits;fault tolerance;integrated circuit reliability;radiation hardening (electronics);digital circuits;fault injection;fault tolerance;ionizing radiation;optimal mitigation techniques;radiation tests;Aerospace electronics;Circuit faults;Hardware;Integrated circuit modeling;Reliability engineering;Sensitivity;Cross Section;Fault Injection;Fault Model;Ionizing Radiation;Mitigation Techniques;SETs;SEUs;Soft Errors},

A. Evans, D. Alexandrescu, E. Costenaro and L. Chen, “Hierarchical RTL-based combinatorial SER estimation,” On-Line Testing Symposium (IOLTS), 2013 IEEE 19th International, Chania, 2013, pp. 139-144.
doi: 10.1109/IOLTS.2013.6604065
Abstract: With increased device integration and a gradual trend toward higher operating frequencies, the effect of radiation induced transients in combinatorial logic (SETs) can no longer be ignored. Electrical, logical and temporal masking prevent the majority of SETs from becoming functional failures. Current work on SET analysis starts from a gate-level circuit representation, however, in an industrial design cycle, by the time a gate-level netlist is available, it is too late to make design changes. We propose a hierarchical SET analysis methodology that can be applied at the RTL level. The SET sensitivity of the cell library and the masking characteristics of standard combinatorial design blocks are pre-characterized and stored in compact models. The SET sensitivity of a complex circuit is then calculated by decomposing it into blocks and combining the compact SET models. Experimental results are presented for an ALU implemented in the NanGate library.
keywords: {combinational circuits;logic design;radiation effects;ALU;NanGate library;SET analysis;cell library;combinatorial design blocks;combinatorial logic;device integration;electrical masking;gate-level circuit representation;gate-level netlist;hierarchical RTL-based combinatorial SER estimation;industrial design cycle;logical masking;radiation induced transients;temporal masking;Adders;Circuit faults;Integrated circuit modeling;Libraries;Logic gates;Sensitivity;Vectors},

Hao Xie, Li Chen, A. Evans, Shi-Jie Wen and R. Wong, “Synthesis of Redundant Combinatorial Logic for Selective Fault Tolerance,” Dependable Computing (PRDC), 2013 IEEE 19th Pacific Rim International Symposium on, Vancouver, BC, 2013, pp. 128-129.
doi: 10.1109/PRDC.2013.26
Abstract: With shrinking process technologies, the likelihood of mid-life logic faults is increasing. In this paper, we present an approach for mitigating the effects of faults in combinatorial logic through the selective addition of redundant logic. This approach can be applied to a generic digital circuit, protects against multiple fault models and offers a trade-off between area and fault coverage. The results show that fault coverage can be improved by 4x with an area penalty of 50% and only two additional layers of logic.
keywords: {combinational circuits;fault tolerance;logic design;area penalty;fault coverage;fault models;logic faults;redundant combinatorial logic;redundant logic;selective fault tolerance;Circuit faults;Educational institutions;Fault tolerance;Fault tolerant systems;Logic functions;Logic gates;Maintenance engineering;fault-tolerance;logic synthesis},

D. Alexandrescu, E. Costenaro and A. Evans, “State-aware single event analysis for sequential logic,” On-Line Testing Symposium (IOLTS), 2013 IEEE 19th International, Chania, 2013, pp. 151-156.
doi: 10.1109/IOLTS.2013.6604067
Abstract: Single Event Effects in sequential logic cells represent the current target for analysis and improvement efforts in both industry and academia. We propose a state-aware analysis methodology that improves the accuracy of Soft Error Rate data for individual sequential instances based on the circuit and application. Furthermore, we exploit the intrinsic imbalance between the SEU susceptibility of different flip-flop states to implement a low-cost SER improvement strategy. Careful, per-state SEE analysis of sequential cells also highlights SET phenomena in flip-flops. We apply de-rating techniques to accurately evaluate their contribution to the overall flip-flop SEE sensitivity.
keywords: {flip-flops;SET phenomena;SEU susceptibility;derating techniques;flip-flop SEE sensitivity;flip-flop states;low-cost SER improvement strategy;per-state SEE analysis;sequential instances;sequential logic cells;single event effects;soft error rate data;state-aware single event analysis;Clocks;Inverters;Latches;Libraries;Single event upsets;Testing;Transistors},

A. Rohani, H. G. Kerkhoff, E. Costenaro and D. Alexandrescu, “Pulse-length determination techniques in the rectangular single event transient fault model,” Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIII), 2013 International Conference on, Agios Konstantinos, 2013, pp. 213-218.
doi: 10.1109/SAMOS.2013.6621125
Abstract: One of the well-known models representing the Single Event Transient phenomenon at the logic level is the rectangular pulse model. The pulse-length in this model, however, is critical to its accuracy and validity. The work presented in this paper develops two approaches for determining the pulse-length of the rectangular pulse model used for Single Event Transient (SET) faults. The first approach was extracted from radiation testing combined with transistor-level SET analysis tools. The second was derived from the asymptotic analytical behaviour of SETs in a 45-nm CMOS process. The results show that applying these two pulse-length determination approaches to the rectangular pulse model causes the fault injection results to converge much faster (up to sixteen times) compared to other conventional approaches.
keywords: {combinational circuits;CMOS process;asymptotic analytical behaviour;logic-level;pulse-length determination techniques;radiation testing;rectangular pulse model;rectangular single event transient fault model;transistor-level SET analysis tools;Circuit faults;Clocks;Equations;Integrated circuit modeling;Libraries;Logic gates;Mathematical model},
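
The core idea in the abstract above (a SET modeled as a rectangular pulse whose length governs its chance of being latched) can be sketched as a toy Monte-Carlo model. This is purely illustrative and not the paper's method; the distribution parameters and the linear capture-probability model are assumptions:

```python
import random

def sample_pulse_length(mean_ps=120.0, sigma_ps=30.0, rng=None):
    """Draw a rectangular-pulse length in ps. In practice this distribution
    would come from radiation testing or transistor-level SET characterization;
    the Gaussian here is a placeholder."""
    rng = rng or random
    return max(0.0, rng.gauss(mean_ps, sigma_ps))

def is_latched(pulse_ps, latch_window_ps, clock_period_ps, rng=None):
    """A rectangular SET arriving at a uniformly random clock phase is captured
    only if it overlaps the flip-flop's latching window, so the capture
    probability grows with pulse length (simplified linear model)."""
    rng = rng or random
    capture_prob = min(1.0, (pulse_ps + latch_window_ps) / clock_period_ps)
    return rng.random() < capture_prob
```

Under this model, a longer pulse length directly raises the fraction of injected transients that become observable upsets, which is why an accurate pulse length makes fault-injection campaigns converge faster.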

S. Pontarelli, M. Ottavi, A. Evans and S. J. Wen, “Error detection in Ternary CAMs using Bloom Filters,” Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, Grenoble, France, 2013, pp. 1474-1479.
doi: 10.7873/DATE.2013.300
Abstract: This paper presents an innovative approach to detect soft errors in Ternary Content Addressable Memories (TCAMs) based on the use of Bloom Filters. The proposed approach is described in detail and its performance results are presented. The advantages of the proposed method are that no modifications to the TCAM device are required, the checking is done on-line and the approach has low power and area overheads.
keywords: {Arrays;Cams;Computer aided manufacturing;Filtering algorithms;Quality of service;Vectors},
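
The abstract above checks TCAM lookups against a compact hash-based structure. As a minimal sketch of the underlying data structure only (the paper's actual TCAM-checking architecture is not reproduced here, and all names below are illustrative), a Bloom filter answers "definitely absent" or "probably present" from an m-bit array and k hashes:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions derived from one SHA-256 digest
    (so k_hashes must be <= 8 with 4 bytes per position)."""
    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i: 4 * i + 4], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False => definitely absent; True => probably present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

The error-detection intuition: mirror each TCAM entry into the filter at write time; a TCAM match whose key the filter reports as definitely absent signals a possible soft error, with no modification to the TCAM device itself.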

A. Evans, M. Nicolaidis, R. Aitken, B. Aktan and O. Lauzeral, “Hot topic session 4A: Reliability analysis of complex digital systems,” VLSI Test Symposium (VTS), 2013 IEEE 31st, Berkeley, CA, 2013, pp. 1-1.
doi: 10.1109/VTS.2013.6548898
Abstract: Today, there are several trends that are making the reliability analysis of complex integrated circuits an important challenge in industry. As transistor geometries shrink, the number of physical failure mechanisms is increasing while at the same time the number of transistors per chip is still growing. The rollout of new services is pushing compute demands both in handheld devices and in the data center which is driving up complexity and the level of integration. People are becoming critically dependent on mobile services and expect high availability. Looking forward to the deployment of the Internet of Things (IoT) where processors and routers will be embedded in billions of end-points, we are only going to see an increased demand for reliable computing. In this session, we bring together three different industrial perspectives on reliability. The first looks at the end-points, the second looks at the servers and the last looks at the economic drivers for reliability and the demand for new EDA tools for reliability analysis. In the first talk, Rob Aitken from ARM will discuss the reliability challenges in mobile applications. As mobile systems continue to increase in size and complexity, and user requirements are also becoming more stringent, it is important for designers of mobile systems to be aware of reliability issues, and to adapt their methodologies accordingly. This talk discusses the issues involved, from latent defects, through soft errors, aging and wearout, and shows how to consider these as part of the design process, how to quantify their effects, and how to mitigate them through design changes. In the second presentation, Burcin Aktan from Intel is going to discuss the evolution of the reliability features that are found in server applications. 
With so many processing units packed into data centers, the reliability requirements on an individual device are growing, especially with integrated memory controllers and very high-bandwidth data pathways. What was an “add-on” to a device function 10–15 years ago now needs to be considered carefully, with stringent budgets distributed to each functional block that contributes to overall error rates. This talk will focus on the evolution of reliability features in a number of server products leading into the current state, and look at how today’s designers are dealing with the challenges of gathering requirements, translating these to design implementation and delivering quality features to customers. Finally, we will close with remarks on future directions and possible research areas. In the final presentation, Olivier Lauzeral from iROC Technologies will discuss the importance of methodologies for the reliability analysis of complex SoCs. There is an inherent cost to adding reliability features to a complex IC, and designers need to be able to make informed decisions about how much hardware to allocate for mitigation (redundancy, error correction, repair). A prerequisite for such choices is clearly defined targets, and this requires an economic framework in which the cost of failures is understood. Once the reliability targets for a system and individual devices are established, there is a need for EDA tools which allow designers to compute the failure rate and failure modes within the device. This analysis must include all failure mechanisms (radiation effects, lifetime effects, manufacturing defects) and take into account the relevant de-ratings between faults and observed errors. This new EDA infrastructure is key for designers to make effective trade-offs in order to arrive at a cost-effective design.
keywords: {Integrated circuit modeling;Integrated circuit reliability;Mobile communication;Reliability engineering;Servers},

A. Evans, M. Nicolaidis, S. J. Wen and T. Asis, “Clustering techniques and statistical fault injection for selective mitigation of SEUs in flip-flops,” Quality Electronic Design (ISQED), 2013 14th International Symposium on, Santa Clara, CA, 2013, pp. 727-732.
doi: 10.1109/ISQED.2013.6523691
Abstract: In large SoCs, managing the effects of soft errors in flip-flops is essential; however, selective mitigation is necessary to minimize the area and power costs. Identifying the optimal set of flip-flops to protect typically requires compute-intensive fault-injection campaigns. We present new techniques that group similar flip-flops into clusters to significantly reduce the number of fault injections. The number of required fault injections can be far lower than the total number of flip-flops: in one industrial design with over 100,000 flip-flops, by simulating only 2,100 fault injections, the technique identified a set of 4.1% of the flip-flops which, when protected, reduced the critical failure rate by a factor of 7x.
keywords: {flip-flops;integrated circuit reliability;radiation hardening (electronics);system-on-chip;SEU selective mitigation;SoC;clustering technique;flip-flops;soft-error effect;statistical fault injection;Circuit faults;Flip-flops;Hardware design languages;Integrated circuit modeling;Reliability;Sensitivity;System-on-chip},
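
The budget-reduction idea in the abstract above (inject into one representative per cluster of similar flip-flops instead of into every flip-flop) can be sketched as follows. This is a hypothetical illustration; the paper's actual clustering criteria and similarity features are not reproduced, and `key_fn` stands in for whatever structural features are used:

```python
from collections import defaultdict

def cluster_flip_flops(flip_flops, key_fn):
    """Group flip-flops sharing structural features (e.g. module path,
    fan-out cone size) so one injection per cluster can stand in for all."""
    clusters = defaultdict(list)
    for ff in flip_flops:
        clusters[key_fn(ff)].append(ff)
    return clusters

def pick_injection_targets(clusters):
    """One representative per cluster: the fault-injection budget now scales
    with the number of clusters, not the number of flip-flops."""
    return [members[0] for members in clusters.values()]
```

For example, clustering 100,000 flip-flops into a few thousand groups is what allows a campaign of only ~2,100 injections to rank protection candidates.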

S. Ganapathy, R. Canal, D. Alexandrescu, E. Costenaro, A. Gonzalez and A. Rubio, “A novel variation-tolerant 4T-DRAM cell with enhanced soft-error tolerance,” Computer Design (ICCD), 2012 IEEE 30th International Conference on, Montreal, QC, 2012, pp. 472-477.
doi: 10.1109/ICCD.2012.6378681
Abstract: In view of device scaling issues, embedded DRAM (eDRAM) technology is being considered as a strong alternative to conventional SRAM for use in on-chip memories. Memory cells designed using eDRAM technology, in addition to being logic-compatible, are variation tolerant and immune to noise present at low supply voltages. However, two major causes of concern are the data retention capability, which is worsened by parameter variations leading to frequent data refreshes (resulting in large dynamic power overhead), and the transient reduction of stored charge, which increases soft-error (SE) susceptibility. In this paper, we present a novel variation-tolerant 4T-DRAM cell whose power consumption is 20.4% lower than that of a similar-sized eDRAM cell. The retention time is improved on average by 2.04X while incurring a delay overhead of 3% on the read-access time. Most importantly, using a soft-error (SE) rate analysis tool, we have confirmed that the cell's sensitivity to SEs is reduced by 56% on average in a natural working environment.
keywords: {DRAM chips;SRAM chips;embedded systems;radiation hardening (electronics);SRAM;device scaling issues;eDRAM technology;embedded DRAM;enhanced soft error tolerance;memory cells;on-chip memories;variation-tolerant 4T-DRAM cell;Capacitance;Logic gates;Power demand;Random access memory;System-on-a-chip;Threshold voltage;Transistors},

S. Valadimas, Y. Tsiatouhas, A. Arapoyanni and A. Evans, “Single event upset tolerance in flip-flop based microprocessor cores,” Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2012 IEEE International Symposium on, Austin, TX, 2012, pp. 79-84.
doi: 10.1109/DFT.2012.6378204
Abstract: Soft errors due to single event upsets (SEUs) in the flip-flops of a design are of increasing importance in nanometer-technology microprocessor cores. In this work, we present a flip-flop-oriented soft error detection and correction technique. It exploits a transition detector at the output of the flip-flop for error detection, along with an asynchronous local error correction scheme, to provide soft error tolerance. Alternatively, a low-cost soft error detection scheme is introduced which shares a transition detector among multiple flip-flops, while error recovery relies on architectural replay. To validate the proposed approach, it has been applied to the design of a 32-bit MIPS microprocessor core using 90 nm CMOS technology.
keywords: {CMOS digital integrated circuits;flip-flops;microprocessor chips;CMOS technology;MIPS;SEU;architectural replay;asynchronous local error correction scheme;error recovery;flip-flop oriented soft error detection;nanometer technology microprocessor cores;single event upset tolerance;size 90 nm;soft error correction technique;soft error tolerance;transition detector;word length 32 bit;Discrete Fourier transforms;Fault tolerance;Fault tolerant systems;Flip-flops;Nanotechnology;Single event upset;Very large scale integration;SEUs;Soft error detection and correction;Soft error tolerance},

D. Alexandrescu and E. Costenaro, “Towards optimized functional evaluation of SEE-induced failures in complex designs,” On-Line Testing Symposium (IOLTS), 2012 IEEE 18th International, Sitges, 2012, pp. 182-187.
doi: 10.1109/IOLTS.2012.6313869
Abstract: Single Event Effects strongly impact the reliability of electronic circuits and systems, requiring careful SER characterization and an adequately sized mitigation strategy. The SER study aims at providing relevant information about the circuit behavior in the specified working environment, in terms of functional failure rates, criticality and so on. Ultimately, the error mitigation efforts are directed at improving the function of the circuit in the presence of SEEs by reducing either the failure occurrence rate or the failure impact. However, when dealing with SEEs affecting highly sophisticated electronic designs, functional issues are one of the most complex aspects to characterize reliably. This paper proposes and evaluates several fault characterization techniques meant to approximate the functional failures induced by Single Event Upsets in complex circuits, very early in the design flow. The two main contributions of our efforts are a differential fault simulation approach based on standard simulation tools and a novel parallel, SEE-optimized, stand-alone simulation tool. Both methods accurately evaluate the immediate propagation of SEE-induced faults and predict the long-term behavior of the faulty circuit running a specified application. The work described in this paper also benefits from various optimization techniques targeting lower simulation costs (in terms of CPU and man-power) while preserving the accuracy of the results. Ultimately, the results of each method compare positively with reference data obtained from an exhaustive fault simulation campaign. This encouraging outcome suggests that we can reliably obtain highly informative functional error information while spending reasonable resources (CPU, man-power, time).
keywords: {circuit optimisation;digital integrated circuits;error statistics;failure analysis;fault simulation;integrated circuit design;integrated circuit reliability;multiprocessing systems;semiconductor process modelling;SEE-induced failures;SER characterization;adequately sized mitigation strategy;circuit behavior;circuit function;complex designs;differential fault simulation approach;electronic circuits reliability;error mitigation efforts;failure occurrence rate;fault characterization techniques;fault simulation campaign;faulty circuit;functional failures rates;highly sophisticated electronic designs;informative functional error information;optimization techniques;optimized functional evaluation;single event effects;single event upsets;stand-alone simulation tool;standard simulation tools;Circuit faults;Clocks;Computational modeling;Integrated circuit modeling;Logic gates;Standards;Vectors;Fault Simulation;Functional De-Rating;Single Event Upset;Soft Error Rate},

R. Wong, B. L. Bhuva, A. Evans and S. J. Wen, “System-level reliability using component-level failure signatures,” Reliability Physics Symposium (IRPS), 2012 IEEE International, Anaheim, CA, 2012, pp. 4B.3.1-4B.3.7.
doi: 10.1109/IRPS.2012.6241832
Abstract: System-level Mean Time Between Failures (MTBF) is usually evaluated using individual component-level reliability metrics. System-level failures are categorized by Reliability, Availability and Serviceability (RAS) metrics. However, RAS evaluation at the system-level requires precise mapping between component failure modes, their system failure signatures and system reliability requirements. In this paper, RAS analysis carried out on internet switches in a top-down hierarchical fashion is presented. Results show availability of failure classification at a lower-level of design allows for better fault management and improved RAS metrics at the system-level. A hierarchical modeling format is proposed to standardize the reporting of component failure modes to improve the system level modeling of RAS.
keywords: {failure analysis;integrated circuit reliability;Internet switches;MTBF;RAS analysis;RAS evaluation;RAS metrics;component failure modes;component-level failure signatures;failure classification;fault management;hierarchical modeling format;individual component-level reliability metrics;reliability availability and serviceability metrics;system failure signatures;system level modeling;system reliability requirements;system-level failures;system-level mean time between failures;system-level reliability;top-down hierarchical fashion;Hardware;Integrated circuits;Measurement;Reliability engineering;Software;Software reliability;FIT;component-level failures;hard errors;memory failures;soft errors;system-level failures},

M. Vilchis, R. Venkatraman, E. Costenaro and D. Alexandrescu, “A real-case application of a synergetic design-flow-oriented SER analysis,” On-Line Testing Symposium (IOLTS), 2012 IEEE 18th International, Sitges, 2012, pp. 43-48.
doi: 10.1109/IOLTS.2012.6313839
Abstract: We present a methodology that investigates SEEs in complex SOCs. The analysis integrates tightly with the design flow and provides static and dynamic de-rating algorithms. This approach is in good agreement with alpha testing results obtained from a 40nm CMOS testchip with sixty-four independently controlled/selectable Advanced Encryption Standard (AES) based processing element (PE) blocks.
keywords: {CMOS integrated circuits;cryptography;integrated circuit design;radiation hardening (electronics);system-on-chip;AES based processing element blocks;CMOS test chip;PE block;SoC;dynamic derating algorithms;independently controlled-selectable advanced encryption standard;size 40 nm;soft error rate;static derating algorithms;synergetic design-flow-oriented SER analysis;Circuit faults;Clocks;Flip-flops;Registers;Single event upset;Standards;Testing;Single Event Effects;Single Event Transient;Single Event Upset;Soft Error Rate},

A. Evans, M. Nicolaidis, S. J. Wen, D. Alexandrescu and E. Costenaro, “RIIF – Reliability information interchange format,” On-Line Testing Symposium (IOLTS), 2012 IEEE 18th International, Sitges, 2012, pp. 103-108.
doi: 10.1109/IOLTS.2012.6313849
Abstract: In this paper, a new standard language called RIIF (Reliability Information Interchange Format) is defined which enables designers to specify the failure characteristics and reliability requirements for simple and complex components. This language enables EDA tools to analyze reliability models and to compute the failure rates for complex systems. A formal language makes it possible for suppliers and consumers to exchange reliability information in a consistent fashion and to use this information to build accurate reliability models. The RIIF language is a general purpose reliability modeling language and is not tied to a specific application domain or implementation technology.
keywords: {electronic engineering computing;failure analysis;integrated circuit reliability;microprocessor chips;simulation languages;EDA tools;IP core;RIIF standard language;failure characteristics;formal language;reliability information interchange format;reliability modeling language;reliability models;Computational modeling;IP networks;Neutrons;Random access memory;Redundancy;Reliability engineering},

D. Alexandrescu, “A comprehensive soft error analysis methodology for SoCs/ASICs memory instances,” On-Line Testing Symposium (IOLTS), 2011 IEEE 17th International, Athens, 2011, pp. 175-176.
doi: 10.1109/IOLTS.2011.5993833
Abstract: Memory blocks are important features of any design in terms of functionality, silicon area and reliability. Embedded SRAM instances are critical contributors to the overall Soft Error Rate of the system, requiring careful consideration of the reliability aspects and adequate sizing of the error mitigation capabilities. While error detecting and correcting codes are widely available and particularly effective against most types of Single Event Effects, Multiple Bit Upsets and progressive error accumulation may defeat the error correction capabilities of standard SECDED codes. Accordingly, the paper presents an overall approach to the structural and functional SER analysis of memory instances, in addition to error mitigation efficiency estimation. Moreover, intrinsic, nominal SER figures are not a realistic indicator of the memory behavior for a given application. We propose instead an opportunity-window metric, associated with the notion of data lifetime in the memory, as extracted from functional simulations. Lastly, based on the opportunity-window figures, targeted and efficient fault simulation campaigns can be prepared to estimate high-level functional failures induced by Single Events. The overall memory SER evaluation aims at assisting designers in improving the performance of the design and documenting the reliability figures of the system.
keywords: {SRAM chips;application specific integrated circuits;error analysis;system-on-chip;SECDED codes;SER analysis;SRAM;SoC-ASIC memory instances;comprehensive soft error analysis methodology;high-level functional failure estimation;multiple bit upsets;single event effects;soft error rate analysis;Computational modeling;Error analysis;Neutrons;Reliability;Single event upset;Throughput;Memory SER;SER De-rating;Single Event Upsets;Soft Error Rate;Soft Errors;data lifetime},
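
The data-lifetime idea in the abstract above (an upset only matters if it lands in the window between a write and a subsequent read) can be sketched as a toy derating computation over a per-word access trace. The trace format and function names are assumptions for illustration, not the paper's tooling:

```python
def live_fraction(accesses, sim_end):
    """accesses: time-ordered (time, 'W'|'R') events for one memory word.
    A word is considered live from each write until its last read before the
    next write; upsets outside live windows cannot cause a functional error."""
    live = 0.0
    pending_write = None
    last_read = None
    for t, op in accesses:
        if op == "W":
            if pending_write is not None and last_read is not None:
                live += last_read - pending_write
            pending_write, last_read = t, None
        elif op == "R":
            last_read = t
    if pending_write is not None and last_read is not None:
        live += last_read - pending_write
    return live / sim_end

def derated_ser(raw_fit, accesses, sim_end):
    """Application-aware SER: intrinsic FIT scaled by the live-time fraction."""
    return raw_fit * live_fraction(accesses, sim_end)
```

For a word written at t=0, read at t=10 and t=20, rewritten at t=30 and read at t=40 within a 100-unit simulation, only 30% of the time is live, so the application-level SER is 0.3 of the intrinsic figure.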

D. Alexandrescu, E. Costenaro and M. Nicolaidis, “A Practical Approach to Single Event Transients Analysis for Highly Complex Designs,” Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2011 IEEE International Symposium on, Vancouver, BC, 2011, pp. 155-163.
doi: 10.1109/DFT.2011.18
Abstract: Single Event Transients are considerably more difficult to model, simulate and analyze than the closely related Single Event Upsets. The work environment may cause a myriad of distinctive transient pulses in various cell types used in widely different configurations. We present practical methods to help characterize the standard cell library using dedicated tools and results from radiation testing. Furthermore, we analyze SET propagation in logic networks using a standard (reference) serial fault simulation approach and an accelerated fault simulation technique, taking into account both logic and temporal considerations. The accelerated method provides results similar to those of the reference approach while offering a considerable increase in simulation speed. However, the simulation approach may not be feasible for large (multi-million-cell) designs, which could benefit from static analysis methods. We benchmark the results of a static, probabilistic approach against the reference and accelerated methods. Finally, we discuss the integration of the SET analysis in a complete Soft Error Rate analysis flow.
keywords: {fault simulation;logic circuits;radiation hardening (electronics);SET propagation;accelerated fault simulation technique;logic network;radiation testing;single event transient analysis;single event upset;soft error rate analysis flow;standard cell library;standard serial fault simulation approach;transient pulse;Analytical models;Circuit faults;Flip-flops;Integrated circuit modeling;Libraries;Transient analysis;Transistors;Single Event Effects;Single Event Transient;accelerated fault simulation;fault propagation;static fault analysis},

E. Costenaro, M. Violante and D. Alexandrescu, “A new IP core for fast error detection and fault tolerance in COTS-based solid state mass memories,” On-Line Testing Symposium (IOLTS), 2011 IEEE 17th International, Athens, 2011, pp. 49-54.
doi: 10.1109/IOLTS.2011.5993810
Abstract: Commercial-off-the-shelf (COTS) components are crucial for the success of future space missions as, although not specifically designed for space, they are the only components able to meet the performance requirements that new missions impose on designers. COTS memories are particularly appealing, as large memory arrays can be implemented and made immune to radiation by means of cost-effective information redundancy schemes. In this paper, we present an intellectual property (IP) core implementing a (12, 8) Reed-Solomon (RS) code for error mitigation that is suitable for COTS Flash NAND and NOR memories. The main novelty of the proposed scheme consists in its architecture, which is based on a shortened Reed-Solomon code with a fast error detection feature. Two implementations have been studied: a fully combinational scheme that provides error detection and correction within the access cycle, and a two-stage pipeline with early (in-cycle) error detection and 1-cycle-latency correction. The pipelined version has the specific advantage of minimizing the time penalty associated with a traditional RS implementation. We have characterized the area and timing performance of the proposed architectures in a variety of FPGA implementations, obtaining a maximum frequency of 47 MHz for the combinational implementation and 53 MHz for the pipelined one (in a Virtex 6 FPGA), with a quick 5 ns error detection. In addition, we have characterized the fault resiliency of the proposed schemes with respect to Single Event Transients and Single Event Faults.
keywords: {NAND circuits;NOR circuits;Reed-Solomon codes;error correction codes;error detection;fault tolerance;field programmable gate arrays;flash memories;industrial property;integrated circuit reliability;redundancy;COTS flash NAND memory;COTS-based solid state mass memories;FPGA;IP core;NOR memory;Reed-Solomon code;commercial-off-the-shelf components;cost-effective information redundancy schemes;error correction codes;fast error detection;fault tolerance;frequency 47 MHz;frequency 53 MHz;intellectual property;memory arrays;one-cycle latency correction;single event faults;single event transients;time 5 ns;Circuit faults;Clocks;Decoding;Field programmable gate arrays;Flash memory;Integrated circuit modeling;Reed-Solomon codes;Error correction;FPGA;Flash memory;IP core;Reed-Solomon code;fault tolerant memory},

C. Weulersse, F. Miller, D. Alexandrescu, E. Schaefer and R. Gaillard, “Assessment and comparison of the low energy proton sensitivity in 65nm to 28nm SRAM devices,” Radiation and Its Effects on Components and Systems (RADECS), 2011 12th European Conference on, Sevilla, 2011, pp. 291-296.
doi: 10.1109/RADECS.2011.6131399
Abstract: The low energy proton sensitivity is investigated on 65nm to 28nm SRAM devices. The experiments are based on a novel cost and time-efficient methodology, which allows irradiations in high energy facilities and which is more tolerant to incident energy uncertainty. The low energy proton sensitivity is compared among the test chips. No scaling trends are highlighted.
keywords: {SRAM chips;radiation hardening (electronics);SRAM devices;high energy facilities;incident energy uncertainty;low energy proton sensitivity;size 65 nm to 28 nm;test chips;time-efficient methodology;Copper;Ionization;Protons;Radiation effects;Random access memory;Sensitivity;Silicon;Direct ionization;SRAM;proton irradiation;single event upset},

M. Tahoori, I. Parulkar, D. Alexandrescu, K. Granlund, A. Silburt and B. Vinnakota, “Panel: Reliability of data centers: Hardware vs. software,” Design, Automation & Test in Europe Conference & Exhibition (DATE), 2010, Dresden, 2010, pp. 1620-1620.
doi: 10.1109/DATE.2010.5457069
Abstract: Data centers are an integral part of daily life. From web search to online banking, online shopping to medical records, we rely on data centers for almost everything. Malfunctions in the operation of such data centers have, likewise, become an inseparable part of our daily lives. Major causes of malfunction include hardware and software failures, design errors, malicious attacks and incorrect human interactions. The consequences of such malfunctions are enormous: loss of human life, financial loss, fraud, wastage of time and energy, loss of productivity, and frustration with computing. Therefore, the reliability of these systems plays a critical role in all aspects of our day-to-day lives.

D. Alexandrescu, “Reflections on a SER-aware design flow,” IC Design and Technology (ICICDT), 2010 IEEE International Conference on, Grenoble, 2010, pp. 215-219.
doi: 10.1109/ICICDT.2010.5510257
Abstract: Single Event Effects (SEEs) may cause system downtime, data corruption and maintenance incidents. Thus, SEEs are a threat to overall system reliability, causing designers to be increasingly concerned about the analysis and mitigation of radiation-induced failures, even for commercial systems operating in a natural working environment. Experts and reliability engineers are called in to support chip designers in the management of Single Event Effects. To this end, we present a design-flow-oriented Soft Error Rate analysis methodology geared to allow practical and concrete decisions concerning implementation, design and functional choices in order to minimize the impact of SEEs on circuit and system behavior.
keywords: {failure analysis;integrated circuit design;integrated circuit reliability;SER-aware design flow;design-flow-oriented soft error rate analysis;radiation-induced failures;single event effects;system reliability;Circuits and systems;Concrete;Design engineering;Engineering management;Error analysis;Failure analysis;Maintenance;Performance analysis;Reflection;Reliability engineering;FIT;SER;Single Event;Single Event Transient;Single Event Upset;Soft Error;Soft Error Rate},

A. L. Silburt, A. Evans, I. Perryman, S. J. Wen and D. Alexandrescu, “Design for Soft Error Resiliency in Internet Core Routers,” in IEEE Transactions on Nuclear Science, vol. 56, no. 6, pp. 3551-3555, Dec. 2009.
doi: 10.1109/TNS.2009.2033915
Abstract: This paper describes the modeling, analysis and verification methods used to achieve a reliability target set for transient outages in equipment used to build the backbone routing infrastructure of the Internet. We focus on the ASIC design and analysis techniques that were undertaken to achieve the targeted behavior using 65 nm technology. Considerable attention was paid to Single Event Upsets in flip-flops and their potential to produce network-impacting events that are not systematically detected and controlled. Using random fault injection in large-scale RTL simulations, and slack time distributions from static timing analysis, estimates of functional and temporal soft error masking effects were applied to a system soft error model to drive decisions on interventions such as the use of larger resilient flip-flops, parity protection of register groupings, and designed responses to detected upsets.
keywords: {Internet;application specific integrated circuits;flip-flops;integrated circuit design;radiation hardening (electronics);telecommunication network reliability;telecommunication network routing;ASIC design;Internet core routers;RTL simulation;backbone routing infrastructure;communication system reliability;flip-flop single event upset;functional masking;random fault injection;slack time distribution;soft error resiliency;static timing analysis;temporal soft error masking effect;transient outages;verification method;Application specific integrated circuits;Control systems;Event detection;Flip-flops;Internet;Large-scale systems;Routing;Single event upset;Spine;Transient analysis;Communication system reliability;Single Event Upset (SEU);computer network reliability;integrated circuit design;integrated circuit radiation effects;neutron radiation effects},
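
The system soft error model described in the abstract above combines each flip-flop's raw upset rate with temporal masking (from timing slack) and functional masking (from fault injection). A hedged sketch of that bookkeeping follows; the field names, the greedy selection heuristic and the FIT numbers are illustrative assumptions, not the paper's actual model:

```python
def flop_fit(ff):
    """Observable contribution of one flip-flop: raw FIT derated by
    temporal masking (static timing) and functional masking (fault injection)."""
    return ff["raw_fit"] * ff["temporal_derating"] * ff["functional_derating"]

def system_fit(flops):
    """System-level soft error rate: sum of the fully derated contributions."""
    return sum(flop_fit(ff) for ff in flops)

def flops_to_harden(flops, budget_fit):
    """Greedy intervention choice: harden the largest contributors (e.g. swap
    in resilient flip-flops) until the residual FIT meets the budget."""
    selected, residual = [], system_fit(flops)
    for ff in sorted(flops, key=flop_fit, reverse=True):
        if residual <= budget_fit:
            break
        residual -= flop_fit(ff)
        selected.append(ff)
    return selected
```

This kind of model is what lets a team trade a reliability target against the cost of resilient cells or parity, one flip-flop group at a time.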

D. Alexandrescu, A. L. Lhomme-Perrot, E. Schaefer and C. Beltrando, “Highs and lows of radiation testing,” On-Line Testing Symposium, 2009. IOLTS 2009. 15th IEEE International, Sesimbra, Lisbon, 2009, pp. 179-179.
doi: 10.1109/IOLTS.2009.5196004
Abstract: The presentation concerns a practical approach to the difficulties associated with real-time testing of microelectronic devices in a natural environment. This activity has several redeeming characteristics that make it very interesting for reliability engineers concerned with the Single Event Effects (SEEs) affecting the functioning of their devices in a natural working environment. The paramount advantage is that the circuit behavior observed during the experiment will be identical to the effects that may appear during the useful lifetime of the product. Thus, the test results will closely match real-life scenarios from both quantitative (event rate) and qualitative (types of events) perspectives. However, some difficulties are still present and may strongly impact the interest of this testing method. The first difficulty consists in obtaining a significant number of events in order to compute statistical data with an adequate degree of accuracy. This requirement has repercussions on the hardware test set-up and test protocol, but also on the mathematical methods used for result analysis. An appropriate set-up for real-time testing should consist of a high number of same-type DUTs (Devices Under Test). This way, the event rate of a single device is accelerated by the number of devices under test, with an obvious impact on the preparation of the experiment: the hardware board, tester, power supplies and other instruments have to be designed to support a large number of devices. To further accelerate the occurrence frequency of errors, the devices should operate in a natural working environment with a higher particle flux. Since the neutron flux increases with altitude following a well-known equation, the test set-up (DUTs, tester, power supply, support instrumentation) should be placed in a high-altitude test facility.
This facility could be either mobile (atmospheric balloons, dirigibles, airplanes, satellites, etc.) or fixed (ground-level). In this presentation we present only our tests in mountain research stations such as Jungfraujoch, Switzerland, and Plateau de Bure, France, since these facilities benefit from all amenities (stable power mains, internet access, lodging, accurate neutron flux monitoring) while still offering good acceleration (15-20 times more neutrons compared to sea-level testing). In any case, the practical duration of real-time tests is quite long (several months). The second difficulty is related to the origin of the SEEs. Since internal radioactive impurities produce alpha particles that may also cause SEEs, we need to discriminate the contributions of external and internal perturbations. Accelerated radiation testing does not present this problem, since the test times are very short (a few hours), making the relative contribution of alpha particles very low. However, since real-time testing is performed over several months, the alpha particles cannot be ignored. Thus, we should be able to measure or estimate the contribution of the alpha particles to the Soft Error Rate of the circuit. Two approaches are possible: measuring the error count of devices placed in a shielded environment (such as a cave or an underground tunnel) where the contribution of neutrons is minimized, or estimating the alpha SER using the alpha emissivity of the package (measured through alpha counting) and sensitivity results from dedicated alpha testing on exposed silicon dies. The third difficulty consists in correctly analyzing the test results. The low event rate requires adequate mathematical methods, and other external perturbations or set-up instability may cause additional errors that could mix with SEE-induced errors.
Lastly, since the experiment runs over a long period of time, the test set-up has to be controlled and monitored remotely. Additionally, the atmospheric neutron flux should be monitored throughout the experiment.
keywords: {integrated circuit reliability;integrated circuit testing;radiation hardening (electronics);real-time systems;circuit behavior;microelectronic devices;product lifetime;radiation testing;real time testing;reliability engineering;single event effects;Alpha particles;Circuit testing;Hardware;Instruments;Life estimation;Life testing;Microelectronics;Neutrons;Power supplies;Reliability engineering},
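The statistical difficulty raised in the abstract above, namely deriving a soft error rate with adequate accuracy from a small number of observed events, can be illustrated with basic Poisson counting statistics. The sketch below is not from the paper; the function name, the example figures (100 devices, a six-month run, a 17x mountain-site flux acceleration) and the 1/sqrt(n) relative-uncertainty rule of thumb are illustrative assumptions:

```python
import math

def real_time_ser_fit(events, n_devices, hours, flux_acceleration):
    """Estimate the sea-level soft error rate in FIT (failures per 1e9
    device-hours) from a multi-device real-time test, together with the
    1-sigma relative uncertainty from Poisson counting (~1/sqrt(n))."""
    # Effective sea-level exposure: device count and site flux both
    # multiply the accumulated device-hours.
    device_hours = n_devices * hours * flux_acceleration
    fit = events / device_hours * 1e9
    rel_sigma = 1.0 / math.sqrt(events) if events > 0 else float("inf")
    return fit, rel_sigma

# Hypothetical run: 100 devices for ~6 months (4380 h) at a mountain
# site with ~17x the sea-level neutron flux, observing 30 events.
fit, rel = real_time_ser_fit(30, 100, 4380, 17.0)
```

Even with 30 events, the relative statistical uncertainty is still about 18%, which is why the abstract insists on large device counts and long test durations.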

S. J. Wen, D. Alexandrescu and R. Perez, “A Systematical Method of Quantifying SEU FIT,” On-Line Testing Symposium, 2008. IOLTS ’08. 14th IEEE International, Rhodes, 2008, pp. 109-114.
doi: 10.1109/IOLTS.2008.62
Abstract: We present a practical, systematic method for the evaluation of the soft error rate (SER) of microelectronic devices. Existing methodologies, practices and tools are integrated in a common approach while highlighting the need for specific data or tools. The showcased method is particularly adapted for evaluating the SER of very complex microelectronic devices by engineers confronted with increasingly demanding reliability requirements.
keywords: {radiation hardening (electronics);reliability;SEU fit;microelectronic devices;radiation hardening;reliability requirements;soft error rate;Circuits;Error analysis;Error correction;Manufacturing;Microelectronics;Performance analysis;Performance evaluation;Reliability engineering;Single event upset;System testing;FIT;Single Event;Single Event Transient;Single Event Upset;Soft Error;Soft Error Rate},

A. L. Silburt et al., “Specification and Verification of Soft Error Performance in Reliable Internet Core Routers,” in IEEE Transactions on Nuclear Science, vol. 55, no. 4, pp. 2389-2398, Aug. 2008.
doi: 10.1109/TNS.2008.2001742
Abstract: This paper presents a methodology for developing a specification for soft error performance of an integrated hardware/software system that must achieve highly reliable operation. The methodology enables tradeoffs between reliability and cost to be made during the early silicon design and SW architecture phase. An accelerated measurement technique using neutron beam irradiation is also described that ties the final system performance to the reliability model and specification. The methodology is illustrated for the design of a line card for an internet core router.
keywords: {Internet;computer network reliability;integrated circuit design;neutron effects;telecommunication network routing;Internet core router;integrated circuit design;line card design;neutron beam irradiation;reliability;silicon design;Acceleration;Computer architecture;Costs;Hardware;Internet;Measurement techniques;Particle beams;Silicon;Software systems;System performance;Communication system reliability;computer network reliability;integrated circuit design;integrated circuit radiation effects;neutron beams;neutron radiation effects;single event upset (SEU)},
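The accelerated-measurement step described in the abstract above, tying a neutron-beam result back to a field reliability model, conventionally scales a measured cross-section by the atmospheric flux. This sketch is not from the paper; the function name is hypothetical and the ~13 n/cm^2/h sea-level reference flux (neutrons above 10 MeV, per JEDEC JESD89 conventions) is an assumed constant:

```python
def beam_to_field_fit(beam_errors, beam_fluence_n_cm2, field_flux_n_cm2_h=13.0):
    """Scale an accelerated neutron-beam measurement to a field SER.

    Cross-section sigma = errors / fluence (cm^2 per device); the field
    rate is then sigma * field flux, expressed in FIT (errors per 1e9
    device-hours)."""
    sigma = beam_errors / beam_fluence_n_cm2  # cm^2/device
    return sigma * field_flux_n_cm2_h * 1e9

# Hypothetical beam run: 50 errors after a fluence of 1e10 n/cm^2
# gives sigma = 5e-9 cm^2 and a field rate of 65 FIT per device.
fit = beam_to_field_fit(50, 1e10)
```

A per-device figure like this is what gets multiplied up across a line card's component inventory when checking the system-level soft error budget.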

D. Alexandrescu, L. Anghel and M. Nicolaidis, “New methods for evaluating the impact of single event transients in VDSM ICs,” Defect and Fault Tolerance in VLSI Systems, 2002. DFT 2002. Proceedings. 17th IEEE International Symposium on, 2002, pp. 99-107.
doi: 10.1109/DFTVS.2002.1173506
Abstract: This work considers a SET (single event transient) fault simulation technique to evaluate the probability that a transient pulse, born in the combinational logic, may be latched in a storage cell. Fault injection procedures and a fast fault simulation algorithm for transient faults were implemented around an event-driven simulator. A statistical analysis was implemented to organize data sampled from simulations. The benchmarks show that the proposed algorithm is capable of injecting and simulating a large number of transient faults in complex designs. Moreover, specific optimizations have been carried out, greatly reducing the simulation time compared to a sequential fault simulation approach.
keywords: {VLSI;circuit simulation;digital integrated circuits;discrete event simulation;fault simulation;probability;sensitivity analysis;statistical analysis;transient analysis;SET fault simulation technique;VDSM ICs;combinational logic;event driven simulator;fast fault simulation algorithm;fault injection procedures;probability;single event transient;statistical analysis;storage cell;transient faults;very deep submicron ICs;Circuit faults;Circuit simulation;Combinational circuits;Discrete event simulation;Latches;Logic circuits;Noise reduction;Power supplies;Protection;Statistical analysis},
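The quantity estimated by the simulation technique in the abstract above, the probability that a combinational transient is captured by a storage cell, has a well-known first-order approximation: a pulse arriving at a flip-flop input is latched only if it overlaps the sampling window. This sketch is illustrative and not the paper's algorithm; the function name and parameters are assumptions:

```python
def latching_probability(pulse_width_ps, clock_period_ps, latch_window_ps=0.0):
    """First-order temporal masking model: a SET pulse of a given width,
    arriving at a random time within the clock period, overlaps the
    sampling edge with probability ~ (pulse width + setup/hold window)
    divided by the clock period, capped at 1."""
    p = (pulse_width_ps + latch_window_ps) / clock_period_ps
    return min(p, 1.0)

# A 200 ps transient at a flip-flop input in a 1 GHz design (1000 ps
# period) is captured roughly one time in five.
p = latching_probability(200, 1000)
```

The paper's fault-simulation approach refines this picture by also accounting for logical and electrical masking along the propagation path, which this single-factor model ignores.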

L. Anghel, D. Alexandrescu and M. Nicolaidis, “Evaluation of a soft error tolerance technique based on time and/or space redundancy,” Integrated Circuits and Systems Design, 2000. Proceedings. 13th Symposium on, Manaus, 2000, pp. 237-242.
doi: 10.1109/SBCCI.2000.876036
Abstract: IC technologies are approaching the ultimate limits of silicon in terms of channel width, power supply and speed. By approaching these limits, circuits are becoming increasingly sensitive to noise, which will result in unacceptable rates of soft errors. Manufacturing testing and periodic testing cannot cope with soft errors. Thus, fault-tolerant techniques will become necessary even for commodity applications. This work considers the implementation of a new soft error tolerance technique based on time redundancy. Arithmetic circuits were used as a test vehicle to validate the approach. Simulations and performance evaluation of the proposed fault-tolerance technique were made using in-house tools built around an event-driven simulator. The obtained results show that tolerance of soft errors can be achieved at low cost.
keywords: {circuit simulation;digital arithmetic;discrete event simulation;errors;fault tolerant computing;integrated circuit reliability;integrated logic circuits;logic design;performance evaluation;redundancy;timing;transient analysis;SEU;arithmetic circuits;event driven simulator;fault tolerant technique;low cost;single event upsets;soft error tolerance technique;space redundancy;time redundancy;time/space redundancy;timing analysis;Arithmetic;Circuit simulation;Circuit testing;Discrete event simulation;Fault tolerance;Integrated circuit noise;Manufacturing;Power supplies;Redundancy;Silicon},
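The core idea behind time redundancy, as referenced in the abstract above, is that a transient is short-lived: sampling the same combinational output at several instants separated by more than the worst-case transient width lets a majority vote mask a single corrupted sample. The sketch below is a generic illustration of that principle, not the paper's circuit; the function name is an assumption:

```python
def time_redundant_vote(sample_t0, sample_t1, sample_t2):
    """Majority vote over three samples of the same combinational output
    taken at t, t+d, t+2d, with d larger than the worst-case SET width.
    A single transient can corrupt at most one sample, so the majority
    value is the correct one."""
    samples = [sample_t0, sample_t1, sample_t2]
    return max(set(samples), key=samples.count)

# A transient corrupts the middle sample of a logic-1 output; the vote
# still recovers the correct value.
recovered = time_redundant_vote(1, 0, 1)
```

In hardware this vote is a small majority gate plus delayed sampling, which is why the paper reports soft error tolerance at low area cost compared with full spatial triplication.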

M. Nicolaidis, R. Perez and D. Alexandrescu, “Low-Cost Highly-Robust Hardened Cells Using Blocking Feedback Transistors,” VLSI Test Symposium, 2008. VTS 2008. 26th IEEE, San Diego, CA, 2008, pp. 371-376.
doi: 10.1109/VTS.2008.15
Abstract: CMOS nanometric technologies are increasingly sensitive to soft errors, including SEUs affecting storage cells and SETs initiated in the combinational logic, and eventually captured by some latches or flip-flops. SEUs affecting latches or flip-flops are by far the largest soft error rate (SER) contributor in logic. Thus, developing cost-efficient hardened storage cells to cope with SEUs in latches and flip-flops (but also in some memories difficult to protect by ECC) is of increasing importance. This paper proposes a new principle for designing low-cost highly robust storage cells and several transistor-level implementations.
keywords: {radiation hardening (electronics);semiconductor storage;blocking feedback transistors;hardened storage cells;highly-robust hardened cells;single event upsets;soft error rate;CMOS logic circuits;CMOS technology;Capacitance;Costs;Feedback;Flip-flops;Logic testing;Protection;Robustness;Single event transient;SEUs;radiation hardened cells;soft errors},