Fail-operational design considerations for autonomous vehicles
To meet safety and reliability requirements for autonomous vehicles, electronic systems must transition from fail-safe/fail-silent to fail-operational modes. This calls for a higher degree of redundancy to assure the robustness, availability, and functional safety of critical systems. An effective strategy to achieve this redundancy uses a two-channel architecture for the electronic control units (ECUs) handling safety-critical functions.
At current levels of automation in passenger vehicles (Level 1 or 2, as defined by SAE International), electronic systems are generally fail-silent. The driver is fully engaged in control of the vehicle at all times, and a system fault typically generates a warning. A “limp home” mode may engage to allow the vehicle to be driven to a safe location. As automation advances to conditional driver involvement (Level 3) or no driver intervention (Levels 4 and 5), critical electronic systems will need to maintain function even in fault conditions. At Level 5 automation, fail-operational systems meeting all appropriate safety standards are necessary.
The ISO 26262 standards published in 2011 are a primary resource for safety design in automotive electronics. Edition 2 of these standards is now in development and expected to reach initial publication/comment stage this year. Improvements in system redundancy clearly will be part of the updated standard.
Figure 1: Triple-modular redundancy for fail-operational systems.
One approach to fail-operational design is triple modular redundancy (TMR), which is commonly used in aviation systems. Figure 1 illustrates this concept, in which the same algorithm runs on three equivalent instances and a majority voter compares and evaluates the outputs. Applying this “2-out-of-3” (2oo3) approach in automotive ECUs significantly increases the complexity, and thus the overall cost, of the hardware involved. As a result, “dual-dual redundancy” has been adopted for the design of autonomous systems.
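The majority vote described above can be sketched in a few lines. This is a minimal illustration only, assuming each of the three instances delivers a directly comparable value; the function name `vote_2oo3` is hypothetical, and a production voter would also need tolerance bands for analog values and timing checks.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote_2oo3(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value at least two of three channels agree on, else None."""
    if len(outputs) != 3:
        raise ValueError("a 2oo3 voter expects exactly three channel outputs")
    # most_common(1) yields the value with the highest count and that count
    value, count = Counter(outputs).most_common(1)[0]
    # A single deviating channel is outvoted; three different values give no majority
    return value if count >= 2 else None
```

With one faulty instance the system keeps producing the agreed value, which is what makes the arrangement fail-operational; with no majority the voter returns `None` and the caller must fall back to a safe state.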
Figure 2: Dual-dual redundancy supports cost-efficient implementation of fail-operational systems.
The dual-dual architecture shown in Figure 2 features two independent, fail-silent processing channels. Each channel is typically implemented using a 1oo1D architecture, where D refers to the diagnostic function. The dual-dual (or 2oo2D) fail-silent approach may be implemented with either symmetric or asymmetric redundancy, providing further flexibility. Asymmetric redundancy is applicable in areas such as braking and steering ECUs, which typically support both safety-critical and comfort functions. Because the goal of a fail-operational system is to keep safety-critical functions available, designers can implement an asymmetric architecture in which one channel supports all functions and the second channel executes only the critical ones. The second channel may therefore use a less costly microcontroller (e.g., fewer processor cores, lower CPU speed and/or a different memory size).
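The asymmetric dual-dual idea can be sketched as follows. This is a simplified model under stated assumptions, not an ECU implementation: the `Channel` class, the `dispatch` function, and the function names `steering_assist` and `park_assist` are all hypothetical, and the `healthy` flag stands in for the result of each channel's own diagnostics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Channel:
    name: str
    functions: frozenset      # functions this channel is able to execute
    healthy: bool = True      # outcome of the channel's diagnostics (the "D" in 1oo1D)

    def run(self, function: str) -> Optional[str]:
        # Fail-silent: a channel that has diagnosed a fault produces no output
        if not self.healthy or function not in self.functions:
            return None
        return f"{function} handled by {self.name}"


def dispatch(function: str, primary: Channel, secondary: Channel) -> Optional[str]:
    # Use the full-function channel first; fall back to the reduced channel
    result = primary.run(function)
    if result is None:
        result = secondary.run(function)
    return result


# Asymmetric setup: channel B carries only the safety-critical function
channel_a = Channel("A", frozenset({"steering_assist", "park_assist"}))
channel_b = Channel("B", frozenset({"steering_assist"}))
```

If channel A goes silent, `dispatch("steering_assist", channel_a, channel_b)` is still served by channel B, while the comfort function `park_assist` is simply no longer available.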
Figure 3: The combination of the safety supply module and the AURIX microcontroller enables fail-operational systems with high availability.
Error reporting in today’s fail-silent systems is illustrated in Figure 3. When the microcontroller detects a critical error, it signals the error on an error pin. An external safety module monitors the pin and executes the required action (i.e., partial or global interrupt, reset or shut-down). As implemented in Infineon’s AURIX TC33x MCUs, an integrated Safety Management Unit (SMU) is used to configure responses to error sources. The SMU is common to all of the MCUs in the TC33xx family, as are lockstep cores and ECC-protected memories. The multi-core nature of the MCU family contributes significantly to availability. Cores can operate separately, so a faulty core can be reset while the others continue normal operation.
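Conceptually, configuring responses to error sources amounts to a lookup from alarm to reaction. The sketch below illustrates only that idea; the alarm names, the `Reaction` values, and the `react` function are hypothetical and do not represent the actual SMU register interface or any Infineon API.

```python
from enum import Enum


class Reaction(Enum):
    INTERRUPT = "interrupt"      # notify software, keep running
    CORE_RESET = "core_reset"    # reset one core, others continue
    ERROR_PIN = "error_pin"      # signal the external safety module


# Hypothetical alarm names; not taken from any device manual
REACTIONS = {
    "ecc_corrected": Reaction.INTERRUPT,
    "core1_lockstep_mismatch": Reaction.CORE_RESET,
    "clock_monitor": Reaction.ERROR_PIN,
}


def react(alarm: str) -> Reaction:
    # Unconfigured alarms fall back to the most conservative reaction
    return REACTIONS.get(alarm, Reaction.ERROR_PIN)
```

Mapping a single-core fault to a core reset rather than a global shut-down is the design choice that lets the remaining cores keep the system available.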
Figure 4: Fail-operational EPS design with two AURIX microcontrollers, which can also be asymmetrical (smaller and larger CPU) for cost reasons.
The highly scalable processor family is well suited to the asymmetric architecture approach mentioned previously. Devices are available in single-, dual-, tri-, quad- and six-core configurations, with up to four lockstep cores to support ISO 26262 functional safety design. It is straightforward to use a high-end and a low-end MCU in the same design, as shown in Figure 4. All aspects of the system safety concept, latencies and other design considerations are unchanged. Designers can thus balance the required performance of a full-function channel against a second, safety-critical-only channel to optimize the total system cost.
The ability to meet fail-operational design requirements is critical both for advanced driver assistance systems and for autonomous vehicle design. The combination of multi-core MCU architecture and new design approaches allows engineers to achieve the necessary performance and reliability while balancing development effort.