AEROSPACE REPORT NO. ATR-2023-01935

# Expanding Space Design Options Using COTS

September 6, 2023

Steven L. Hogan, Digital & Integrated Circuit Electronics Department, Electronics Engineering Subdivision

Prepared for: General Manager, Corporate Chief Engineer's Office

Authorized by: Corporate Chief Engineer's Office

For Public Release



## MSIW Teams

Two MSIW teams made significant contributions to this report.

#### The Best Practices Team

- Dr. R. Rairigh (Lockheed Martin), Co-lead
- S. Hogan (Aerospace Corp), Co-lead
- R. DeLeon (Boeing)
- S. Duffy (Aerospace Corp)
- L. Harzstark (Aerospace Corp)
- J. Long (Raytheon)
- A. Para (Northrop Grumman)
- J. Ranaudo (Aerospace Corp)
- A. Sens (Boeing)
- Dr. B. Tabbert (Raytheon)
- A. Touw (Boeing)
- T. Wunderlich (Ball Aerospace)

#### The Workflow Team

- M. Porter (NASA JPL), Co-lead
- S. Hogan (Aerospace Corp), Co-lead
- L. Harzstark (Aerospace Corp)
- Dr. J. Leitner (NASA GSFC)
- E. Minson (Maxar)
- A. Para (Northrop Grumman)
- J. Piacentine (Blue Canyon)
- Dr. R. Rairigh (Lockheed Martin)
- B. Tabbert (Raytheon)
- J. Walker (Maxar)
- T. Wunderlich (Ball Aerospace)

A big thanks to Brian Kosinski and Steve Lau for managing all the teams that contributed to this effort.

#### Abstract

The use of Commercial Off the Shelf (COTS) components can provide significant benefits to space programs: access to the latest high-performance technology and shorter procurement times for faster-paced programs. This ATR (Aerospace Technical Report) provides guidance for determining the risk (cost, schedule, technical) of inserting COTS components or units on space vehicles, as well as potential best practices and mitigations for many known COTS component or unit concerns.

## Contents

| Section | Title |
|---------|-------|
| | MSIW Teams |
| | Abstract |
| 1. | Executive Summary |
| 1.1 | |
| 2. | Discu… |
| 2.1 | …lerations |
| 3. | Informed Risk Flow |
| 4. | Best … |
| 4.1 | General COTS Best Practices |
| 4.2 | …es by Component |
| 5. | COTS … |
| 5.1 | … Mitigation Questions for Flow Section B |
| | Radiation Questions |
| | Useful Life Questions |
| | Manufacturing Questions |
| | Trust Questions |
| | …nent Questions |
| 5.2 | … Mitigations for Flow Section E |
| 5.2.1 | Radiation |
| 5.2.1.1 | Total Dose |
| 5.2.1.1.1 | Local Shielding (1a) |
| 5.2.1.1.2 | Power Strobing (1b) |
| 5.2.1.1.3 | Increased Redundancy (1c) |
| 5.2.1.1.4 | N for M Redundancy (1d) |
| 5.2.1.1.5 | Multiple Images (1e) |
| | Multiple Components in Parallel (1f) |
| 5.2.1.2 | Single Event Upsets |
| 5.2.1.2.1 | Periodic Refresh (1g) |
| 5.2.1.2.2 | Error Detection and Correction (1h) |
| 5.2.1.2.3 | Triple Modular Redundancy (1j) |
| 5.2.1.2.4 | FPGA-based Scrubbing (1k) |
| 5.2.1.2.5 | Zener Diodes Clamps and Filters (1l) |
| | Software Rollback (1m) |
| 5.2.1.3 | Single Event Functional Interrupt |
| 5.2.1.3.1 | Local Refresh (1n) |
| 5.2.1.3.2 | Component Reset (1o) |
| 5.2.1.3.3 | Power Cycle (1p) |
| 5.2.1.4.1 | Current Limiting (1r) |
| 5.2.1.4.2 | Swap or Power Cycle by Fault Protection (1s) |
| 5.2.1.4.3 | Auto Power Cycle by the Hardware (1t) |
| 5.2.1.5 | Single Event Gate Rupture |
| 5.2.1.5.1 | Conservative Derating (1u) |
| 5.2.2 | Useful Life |
| 5.2.2.1 | High Temperature Operational Life Temperature |

| Section | Title |
|---------|-------|
| 5.2.2.1.1 | Accelerated Life Test for Temperature (2a) |
| 5.2.2.2 | High Temperature Operational Life Voltage |
| 5.2.2.2.1 | Life Test at Usage Voltage (2b) |
| 5.2.2.3 | High Temperature Operational Life Frequency |
| 5.2.2.3.1 | Life Test at Usage Frequency (2c) |
| 5.2.2.4 | No HTOL or Reliability Data |
| 5.2.2.4.1 | Extensive Accelerated Testing, EM or Unit Testing (2d) |
| 5.2.2.4.2 | Focused Accelerated Testing of the COTS Component (2e) |
| 5.2.2.4.3 | Focused Testing of Like Process Structures (2f) |
| 5.2.2.4.4 | Effects on System Reliability (2g) |
| 5.2.3 | |
| 5.2.3.1 | Plastic Package |
| 5.2.3.1.1 | Plastic Package Encapsulation (3a) |
| 5.2.3.1.2 | Package Hermeticity (3b) |
| 5.2.3.1.3 | Repackage Component (3c) |
| 5.2.3.2 | Reworkability |
| 5.2.3.2.1 | Risk Reduction Activity (3d) |
| 5.2.3.2.2 | Daughtercard Implementation (3e) |
| 5.2.3.2.3 | Easy to Replace Design (3f) |
| 5.2.3.3 | Electrical Testing |
| 5.2.3.3.1 | Enhanced Component Testing (3g) |
| 5.2.3.3.2 | Early Risk Reduction Circuit (3h) |
| 5.2.3.3.3 | Board Level Test (3j) |
| 5.2.3.3.4 | Unit Level Test (3k) |
| 5.2.3.3.5 | Lack of Component Level Burn-in Data (3l) |
| 5.2.3.4 | Pure Tin Leads |
| 5.2.3.4.1 | Double Layer of Conformal Coat (3m) |
| 5.2.3.4.2 | Re-Tin Leads or Repackage (3n) |
| 5.2.3.4.3 | Solder Wicking (3o) |
| 5.2.3.4.4 | Fusing (3p) |
| 5.2.3.4.5 | Annealing (3q) |
| 5.2.3.4.6 | Matte Tin Plating (3r) |
| 5.2.3.5 | Package RGA |
| 5.2.3.5.1 | Test the Component (3s) |
| 5.2.3.6 | Particle Impact Noise Detection |
| 5.2.3.6.1 | Test the Component (3t) |
| 5.2.3.7 | Bondpull |
| 5.2.3.7.1 | Bondpull Test the Component (3u) |
| 5.2.3.8 | Shock |
| 5.2.3.8.1 | Shock Test the Component (3v) |
| 5.2.3.9 | Vibration |
| 5.2.3.9.1 | Vibration Test the Component (3w) |
| 5.2.3.10 | Burn-in |
| 5.2.3.11 | Lot Travelers |
| 5.2.3.11.1 | On-Site Inspection (3y) |
| | Early Non-Destructive and Destructive Tests (3z) |
| 5.2.3.12 | Electro-Static Discharge |
| | ESD Sensitivity Testing (3aa) |
| 5.2.3.13 | Barometric Pressure |
| | Pressure Test the Component (3ab) |
| 5.2.3.14 | Solderability |

| Section | Title |
|---------|-------|
| 5.2.3.14.1 | Test the Component for Solderability (3ac) |
| 5.2.3.15 | Lid Seal |
| 5.2.3.15.1 | Test the Component for Lid Seal (3ad) |
| 5.2.3.16 | Radiography |
| 5.2.3.16.1 | Radiographically Test the Component (3ae) |
| 5.2.3.17 | Resistance to Solvents |
| 5.2.3.17.1 | Test the Component for Resistance to Solvents (3af) |
| 5.2.3.18 | Scanning Acoustic Microscopy |
| 5.2.3.18.1 | SAM Test the Component (3ag) |
| 5.2.3.19 | Moisture |
| | Moisture Test the Component (3ah) |
| 5.2.3.20 | Lead Finish |
| 5.2.3.20.1 | Test the Component for Lead Finish (3aj) |
| 5.2.3.21 | Die Shear |
| | Die Shear Test the Component (3ak) |
| 5.2.3.22 | Lid Torque |
| | Lid Torque Test the Component (3al) |
| 5.2.3.23 | Lead Adhesion |
| | Test the Component for Lead Adhesion (3am) |
| 5.2.3.24 | Column Pull |
| | Pull Test the Component (3an) |
| 5.2.3.25 | External Visual |
| | On-Site Inspection or Observation (3ao) |
| | External Visual post Component Receipt (3ap) |
| 5.2.3.26 | Pre-Cap Inspection |
| | On-Site Inspection or Observation (3aq) |
| 5.2.3.27 | Internal Visual |
| | On-Site Inspection or Observation (3ar) |
| | De-Lid and Inspect Component (3as) |
| 5.2.3.28 | Scanning Electron Microscopy |
| | SEM Test the Component (3at) |
| 5.2.3.29 | Contamination |
| | Existing Mitigations (3au) |
| 5.2.3.30 | Hazardous Materials |
| | Existing Mitigations (3av) |
| | Handling (3aw) |
| 5.2.3.31 | Corrosive Materials |
| | Separation of Functions/Containment (3ax) |
| 5.2.4 | |
| 5.2.4.1 | Foreign Sourced |
| 5.2.4.1.1 | Review of Component (4a) |
| 5.2.4.1.2 | Alternate Design (4b) |
| 5.2.4.1.3 | Limit Usage in System Design (4c) |
| 5.2.4.1.4 | Blind Trusted Agent Buy (4d) |
| 5.2.4.1.5 | Independent Verification and Validation for Critical Functions (4e) |
| 5.2.4.2 | Heritage |
| 5.2.4.2.1 | Counterfeit Components (4f) |
| 5.2.5 | Environmental Considerations |
| 5.2.5.1 | Aging |
| 5.2.5.1.1 | Multiple Components in Parallel (5a) |
| 5.2.5.1.2 | Conservative De-rating (5b) |

| Section | Title |
|---------|-------|
| 5.2.5.1.3 | Conservative Thermal Environment (5c) |
| 5.2.5.1.4 | Dynamic Reliability Management (5d) |
| 5.2.5.1.5 | Adaptive/Static Voltage Scaling (5e) |
| 5.2.5.2 | Temperature |
| 5.2.5.2.1 | Local Heatpipes (5f) |
| 5.2.5.2.2 | Lower Supply Voltages (5g) |
| 5.2.5.2.3 | Decreased Frequency (5h) |
| 5.2.5.3 | Vacuum |
| 5.2.5.3.1 | Conservative Derating (5j) |
| 5.2.5.3.2 | Increased Physical Spacing (5k) |
| 5.2.5.3.3 | No "Golden" Nodes (5l) |
| 5.2.5.3.4 | Encapsulation (5m) |
| 5.2.5.3.5 | Deployable Covers for Optics (5n) |
| 5.2.5.3.6 | Unit Thermal Vacuum Test (5o) |
| 5.2.5.4 | EMC/EMI |
| 5.2.5.4.1 | Shielding (5p) |
| 5.2.5.4.2 | Grounding (5q) |
| 5.2.5.4.3 | Power Strobing - Filtering (5r) |
| 6. | Flow Usage Examples |
| 6.1 | COTS Example 1 – On Board Computer Controller (OBCC) ~1988 |
| 6.2 | … Various programs – EEPROM usage ~2003 |
| 6.3 | … Computer and Data Electronics (CDE) |
| 7. | |
| 8. | Acronyms |

## Figures

| Figure | Caption |
|--------|---------|
| Figure 1. | Informed risk flow (A to E). |
| Figure 2. | Informed risk flow – Determine COTS risk tolerance (A to B). |
| Figure 3. | Informed risk flow – COTS mitigation scope (B to C). |
| Figure 4. | Informed risk flow – Scoping risk mitigation (C to D). |
| Figure 5. | Informed risk flow – Selecting the design solution (D to E). |
| Figure 6. | Component selection, identification and mitigation flow. |
| Figure 7. | Useful life with radiation. |
| Figure 8. | Useful life FIT rate including radiation. |
| Figure 9. | Radiation mitigations flow. |
| Figure 10. | Useful life mitigations flow. |
| Figure 11. | Manufacturing mitigation flow (1 of 8). |
| Figure 12. | Manufacturing mitigation flow (2 of 8). |
| Figure 13. | Manufacturing mitigation flow (3 of 8). |
| Figure 14. | Manufacturing mitigation flow (4 of 8). |
| Figure 15. | Manufacturing mitigation flow (5 of 8). |
| Figure 16. | Manufacturing mitigation flow (6 of 8). |
| Figure 17. | Manufacturing mitigation flow (7 of 8). |
| Figure 18. | Manufacturing mitigation flow (8 of 8). |
| Figure 19. | Trust mitigations flow. |
| Figure 20. | Environmental mitigations flow. |
| Figure 21. | OBCC processor SEU mitigation example. |
| Figure 22. | EEPROM mitigation example. |
| Figure 23. | Example 3 flow example (1 of 2). |
| Figure 24. | Example 3 flow example (2 of 2). |

## Tables

| Table | Title |
|-------|-------|
| Table 1. | Supplier Best Practices |
| Table 2. | General Best Practices for the Component User |
| Table 3. | COTS Resistor User Best Practices |
| Table 4. | COTS Capacitor User Best Practices |
| Table 5. | COTS Connector User Best Practices |
| Table 6. | COTS Diode User Best Practices |
| Table 7. | COTS Inductor User Best Practices |
| Table 8. | COTS Bipolar Junction Transistor (BJT) User Best Practices |
| Table 9. | COTS Field Effect Transistor (FET) User Best Practices |
| Table 10. | COTS Power Transistor User Best Practices |
| Table 11. | COTS Radio Frequency (RF) Transistor User Best Practices |
| Table 12. | COTS Op-Amp/Comparator User Best Practices |
| Table 13. | COTS Pulse Width Modulators User Best Practices |
| Table 14. | COTS Relay User Best Practices |
| Table 15. | COTS Printed Wiring Board User Best Practices |
| Table 16. | COTS Fuse User Best Practices |
| Table 17. | COTS Wire User Best Practices |
| Table 18. | COTS Integrated Circuit/Hybrid User Best Practices |
| Table 19. | LM139A SMD Comparison |
| Table 20. | Potential Radiation Mitigations |
| Table 21. | Potential Useful Life Mitigations |
| Table 22. | Potential Manufacturing Mitigations |
| Table 23. | Potential Trust Mitigations |
| Table 24. | Potential Environmental Considerations Mitigations |

## 1. Executive Summary

The space industry is being challenged by its customers' heightened desire, or mandate, to dramatically shorten the time from contract award to launch, and by the need to use the leading-edge technology offered by commercially available electronic components.<sup>1</sup> This juncture marks a conscious decision to reduce the traditional reliance on military component offerings and to adopt Alternate Grade Parts (AGP), also known as Commercial Off The Shelf (COTS) components, that better align with current programs' technical and schedule constraints. In this ATR, COTS is defined as "A cataloged Electric, Electronic, and Electromechanical (EEE) component for which the US Government or its contractors does not control the specification. The item manufacturer, or non-government bodies (such as the Automotive Electronics Council), solely establishes and controls the specifications for performance, configuration, and reliability (including design, materials, processes, and testing) without additional requirements imposed by users". This ATR provides a sound, prudent, and time-proven approach for leveraging the benefits of COTS components while ensuring that the EEE COTS components chosen will satisfy all contract and mission parameters.

This paper provides a holistic approach that does not focus solely on the EEE components needed to build the hardware. The contract language, mission category, risk tolerance, and design mitigations (at the unit, system, and constellation levels) are integral parts that contribute to the determination of the EEE components chosen. It is imperative for the team members within the management, contract, design, and component selection organizations to agree upon the acceptable risk level for the mission category at hand. Once these factors have been made known and incorporated into the program's design and performance baseline, the components selection organization will be able to determine the quality and reliability level for the needed EEE components.

This ATR provides a series of questions whose answers will determine the ability of the EEE components chosen to satisfy all mission requirements. These questions have evolved from those asked when space flight first became a reality in the 1960s, and they embody the knowledge the industry has gained from its mistakes. These questions need to be asked and answered regardless of the type of components used (Mil-Spec or Commercial Off the Shelf) or the category of mission at stake (Class A, B, C, or D). The answers will vary according to the mission category and the risk profile acceptable for a given contract. To illustrate this point, the EEE components chosen for a Class A mission will have more stringent component history, qualification, and build knowledge requirements than those for a Class D mission. It is incumbent on the components selection organization to determine whether the available data is sufficient to make an informed risk decision about the component's ability to satisfy all contractual and mission requirements, or whether additional mitigations are necessary to bridge the gaps before that decision can be made.

To achieve the performance, cost, and schedule goals of these new proliferated space systems, the systems will need to be architected with more than Mil-Spec components in mind and may require the use of COTS buses, systems, and components. This ATR provides guidance for widening the trade space of possible solutions with regard to availability in contested space (the informed risk of using COTS in space) through a holistic, reliable, and repeatable approach.

#### 1.1 Overview

The focus of this ATR is to enable the United States Government (USG) and its contractors to adapt to a changing landscape where the traditional approach to component selection, unit design and test, and system design and test will not meet the performance, cost, or schedule needs of the mission.<sup>1</sup> Specifically, it lays out a framework that addresses this changing landscape by providing guidance on how EEE COTS components may be inserted in the technical baseline with a risk-informed approach to directly address mission needs. This ATR takes the next step after the recent work performed by the NASA Engineering and Safety Center (NESC) on recommendations for COTS use in NASA missions. That work introduced the terminology of Industry Leading Part Manufacturers (ILPM), meaning established manufacturers that already produce high-quality, high-volume, state-of-the-art, feature-rich COTS components, and of established COTS.<sup>2</sup> The methodology outlined in this ATR is relevant to all missions and mission classes.

Traditional component selection approaches to inserting COTS into a design focus on taking the as-designed/as-built component from the supplier and subjecting it to tests and screens (up-screening) that attempt to demonstrate the component is compliant with the relevant requirements in an existing components assurance or control document (e.g., TOR-2006(8583)-5236, EEE-INST-002). Generally, these controls define a standard baseline that specifies usage of military specification components (MIL-PRF-38535 for microcircuits, MIL-PRF-19500 for semiconductors, and MIL-PRF-123 for ceramic capacitors, to name a few examples) and the relevant test requirements from those specifications to demonstrate the compliance of the component with the pertinent set of quality requirements. This traditional approach presents risks in the context of the changing landscape for the following reasons:

- a) Up-screening introduces significant schedule considerations (> 24 to 52 weeks) [Schedule Risk]
- b) Mil-Spec up-screening subjects COTS components to test levels that they may not have been designed to meet and that are not directly derived from the relevant design application, resulting in a risk of component failure from exceeding the datasheet limits [Cost or Schedule Risk (potential failure in ground testing) and Technical Risk (potential failure or performance degradation on orbit)]
- c) Historically, these risks would not have been recognized (captured in a program risk register) because they are part of the technical baseline under the traditional approach. However, under the changing landscape, particularly the reduced time between Authorization to Proceed (ATP) and launch, they must be recognized as risks in the context of executing the mission. This is because component-level hardware risks can affect unit-level or system-level assembly and the testing aimed at verifying mission requirements. This, in turn, can introduce programmatic risk (delays to capability or mission availability) or technical risk from escapes due to an accelerated or abbreviated system verification campaign, resulting in failure or performance degradation on orbit.

An overarching theme of the changing landscape is the need to establish an informed mission risk profile that addresses the acceptable risk of failure for given conditions. Failure, in this context, includes the traditional notion of hardware failure in operations, but also failure to achieve the required mission availability due to schedule delay. The ability to select components (not just COTS) for the mission will directly result from this informed mission risk profile.

This ATR is broken into six sections and is designed for use by multiple organizations responsible for mission execution:

- a) Section 1 (Executive Summary), Section 2 (Discussion), and Section 3 (Informed Risk) are intended for USG and contractor program management and systems engineering. They set the stage, give guidance for determining the informed risk, and provide background for the best practices and mitigations.
- b) Section 4 (Best Practices) and Section 5 (Mitigations) are intended for systems engineering, design engineering, radiation/survivability engineering, components engineering, and materials and process engineering organizations to determine the impact of using COTS in the application, with many suggestions for system mitigations that don't require the traditional approaches, such as expensive and time-consuming component up-screening.<sup>2</sup>
- c) Section 6 (Flow Usage Examples) provides examples on using the mitigation flows.

Several major points recur throughout the document and are summarized as follows:

- a) Most potential component-level risks are addressed by selecting a COTS supplier that implements the prescribed best practices in Section 4. Component selection is the key, working within the chosen component's datasheet limits. Most of Section 5 content is provided for when no component can meet the conditions in the relevant application.
- b) Risk mitigation under Section 5, which addresses uncertainty in a supplier's component assurance or limitations in component design, testing, or application, should be applied across design elements and the lifecycle in a manner that optimizes cost, technical, and schedule impacts for the mission, and should not be restricted to the component level. Examples:
  - Implement board-level test programs to address risk from lack of burn-in or 100% electrical test in a COTS component.
  - Use a robust tin whisker mitigation manufacturing program (lead-solder reflow in circuit card assembly manufacturing, applying conformal coating) to address risk from lead-free components.
- c) Automotive Electronics Council (AEC) passive (AEC-Q200) and discrete (AEC-Q101) components procured from suppliers that share Level 3 Production Part Approval Process (PPAP) data (Best Practice by supplier) represent a low-risk COTS solution for most missions when used within their datasheet parameters and with appropriate derating for the application.
- d) Complex components (field programmable gate arrays (FPGA), System on a Chip (SoC), multicore processors, etc.) are a major focus for best practices in Section 4 and risk mitigation in Section 5 because they largely represent the performance enablers required to meet mission needs under the changing landscape.

The solutions are not component-centric but holistic, combining component best practices with board, unit, and system mitigations to arrive at a total solution. This holistic approach includes recommendations on how the contract needs to be structured to embrace this new approach. These recommendations are identified in Acquisition Considerations to Expand Space Design Options using COTS Electrical, Electronic, and Mechanical (EEE) Parts and Units (ATR-2023-01981). This ATR is organized to provide a general overview of the usage of COTS for space, followed by a discussion and then a suggested flow for determining the risk associated with a component that does not have a sufficient basis for reliability in the application, followed by supplier and user best practices, both general and component-specific. Following the best practices section, the mitigations section contains detailed risk assessment questions and associated component risk mitigations. The mitigations section also includes several examples in which non Mil-Spec components were used successfully to meet a specific need before Mil-Spec components existed. When the design solution does not include components that fully meet the component assurance needs of the mission, other best practices and mitigations may be employed to sufficiently lower the mission risk. These are covered in Sections 4 and 5. Section 6 revisits the same non Mil-Spec examples using the flow charts from Section 5 to show the reader how to use the Section 5 flow to determine the necessary mitigations for the chosen component.

Besides the value of using COTS components for quick-turnaround, short duration missions where components with unknown life performance are acceptable, there are COTS components that can benefit any class of mission. Complex components with significant leverage are Systems On a Chip (SOC), multicore processors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGA), memory (SDRAM, FLASH), Standard Interfaces (e.g., Ethernet), Charge Coupled Devices (CCD), Analog to Digital Converters (ADC), and high speed and precision amplifiers. Components used in large quantities, both active and discrete, such as interface components, amplifiers, resistors, capacitors, transistors, and diodes are also good candidates for COTS space usage.

## 2. Discussion

EEE component assurance for space hardware has historically focused on ensuring the quality of each lot of components through a series of screening and qualification tests covering, for example, radiation hardness (including single event effects (SEEs)), accelerated life, shock and vibration, and temperature cycling. Ensuring component quality through these lot-specific tests has been the standard method to eliminate infant mortality and to claim long-term component reliability, but the tests are expensive and time-consuming for both the component supplier and the user.

As a result, complex components and subsystems for space usage have lagged behind those adopted for commercial use, which largely do not follow the same Mil-Spec methods for establishing quality and demonstrating reliability. For microcontrollers, field programmable gate arrays (FPGAs), and systems on a chip (SOCs), for example, the space-approved components are two to three technology generations behind the current state-of-the-art (SOTA) commercially available components. Market share analysis reported in multiple sources consistently shows that military and space components make up an ever-diminishing percentage of the market and are currently estimated to be 0.1% of the total component world market and 1% of the semiconductor market. Given the shrinking Mil-Spec component market share and lagging capability, it is imperative to use COTS to expand the components available to missions.

For short development span programs (< 3 years ATP to launch), the emphasis will be on units that are already designed and are largely build-to-print, or that are already built and available in inventory. Using build-to-print units with COTS components doesn't fully address the changing landscape, which calls for delivering higher performing systems under shorter development timelines that leverage commercial technologies, so development using SOTA components is vital.<sup>1</sup> It should be noted that while short development span programs may use COTS units, the buyer of the COTS units should ask the questions in Section 5.1 to fully understand whether any unacceptable risk surfaces. The developers of COTS units are encouraged to use this ATR during development to surface risks and make fully informed risk mitigation decisions.

This ATR endeavors to help address this conundrum by introducing and expanding on the concept that component selection is a broad trade space that must be factored into system design and test, and into the mission's risk profile. Fundamentally, it is an incorrect assumption that all military or space-grade components (passives, discretes, actives, etc.) will show better reliability performance than COTS. It is also an incorrect assumption that a COTS component will always deliver better technical performance than a military or space-grade component in the required application. To paraphrase Thomas Jefferson, not all components are created equal. Under our definition of COTS lies a spectrum of components that ranges from unsuitable for many space applications to those that have demonstrated reliability and performance on par with or exceeding traditional military or space-grade components. By applying the guidelines laid out in this ATR and adhering to the component datasheet, a user will learn methods, best practices, and mitigations that will help them better exploit the current landscape of components in their designs and enable future missions that meet the needs of the changing landscape.

Components that can operate in the space environment are important to support US strategic interests. Flying components in space often requires many attributes, to include:

- Strict quality requirements
- Radiation effects knowledge for susceptible components
- Operation over large thermal swings
- Survival from shock and vibration
- Operation in vacuum
- Acoustic pressure
- EMC/EMI effects
- Long powered-on service life
- Component build knowledge

Many of these attributes are tied to the space environment, which is difficult and unforgiving, including radiation, shock, vibration, and temperature extremes. This is exacerbated by the inability to maintain (replace) the vast majority of space hardware once on orbit. Currently, a great deal of effort is expended in component-level testing and verification that these needs are properly met. Because this assurance testing is time-consuming and expensive, the components ultimately approved for satellite usage are behind the curve compared with the current state of the art. As a result, the current baseline of space-grade components that meet the attributes above also has the following negative attributes: lagging the industry state of the art, a limited selection of qualified devices, and uneven supply chain availability.

COTS spans an unbounded trade space of capabilities, performance levels, quality, inherent reliability, and environmental ranges. As such the burden is on the user to understand the limitations and promises of the datasheet combined with knowledge of the manufacturer to select components that meet mission requirements. COTS components can also provide some very tangible benefits to space programs. Besides the potential performance increase and shorter acquisition times, the components are much lower cost since appropriate components for selection are designed and manufactured for reliability, safety, capability, cost, and delivery time, unlike the premise of the Mil-Spec system, which has little consideration for anything outside of safety, quality, and reliability.

Modern manufacturing, technology, and statistical process controls today in many cases provide more robust COTS components to tolerate these challenging regimes, thus providing much greater performance and reliability, at lower cost. This has been particularly true for components designed and manufactured for the automotive and medical industries which have established controls and process qualification to achieve high quality (AEC-Q004, Automotive Zero Defects Framework) and demonstrated reliability for applications across diverse thermal, electrical, and mechanical environments that align with or encompass those seen for space applications<sup>3</sup>. The one clear difference is demonstrated radiation performance. However, a large majority of components used in space hardware designs do not exhibit radiation susceptibility and therefore it is not a consideration for those components (based on a survey of several space unit designs, a typical bill of materials for an electronic unit has > 75% passives and discretes that are not radiation susceptible). All complex components (FPGAs, SoC, multi-core processors, etc.) that do exhibit radiation susceptibility must be characterized as part of the component selection and design process, with particular emphasis on understanding SEE. Additional considerations need to be made for other potential knowledge gaps associated with COTS component packaging, internal build process and materials, and expected reliability if there is limited or no heritage performance to assess. These gaps need to be identified as early as possible as part of component selection by consulting the datasheet and supplier product information so that they can be addressed, and any risks mitigated, as part of the design trade space.

One feature of component selection that needs to be emphasized with COTS components, although it is true with all components, is that it is critical to identify the "right components from the right suppliers". As discussed above, certain types of industries levy controls, but not all suppliers follow or fully implement those controls. Also broadly true is that it is critical to utilize all components within their datasheet limits with appropriate considerations for part selection and derating to meet required design and operational margins. For example, many military components such as capacitors are rated to Vr and designed to 2Vr (where Vr is the operational voltage) whereas many COTS capacitors are rated and designed to Vr. Therefore, to get the same design margin the user would need to select a part rated at 2Vr for their application. For complex components, these considerations extend to many additional critical parameters besides voltage rating or current carrying capacity and cannot be ignored by the user. This discussion feeds into the broader fact that properly informed judgments about how to select components as part of the trade space are necessary, and the information to develop and support such judgments is provided in Sections 3, 4, and 5. Furthermore, in many cases, there will be no component available, COTS, military or space-grade, to meet all the necessary criteria, and thus mitigations will be needed in circuit, system design, and testing.
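The capacitor example can be worked through numerically. The following is a minimal sketch, assuming the design-to-rating factors quoted in the paragraph above (2 for the military part, 1 for the COTS part); the function names and the list of catalog voltage ratings are hypothetical and are used only for illustration, not taken from this ATR or any datasheet.

```python
# Illustrative sketch only: names and catalog values are assumptions.

# Hypothetical set of catalog voltage ratings (volts) for a COTS capacitor family.
ILLUSTRATIVE_CATALOG_V = (6.3, 10.0, 16.0, 25.0, 35.0, 50.0, 100.0)


def equivalent_cots_rating(mil_rated_v, mil_design_factor=2.0, cots_design_factor=1.0):
    """Return the COTS rated voltage whose designed-to voltage matches
    that of a military part rated at mil_rated_v.

    mil_design_factor:  designed-to voltage / rated voltage for the military part
    cots_design_factor: designed-to voltage / rated voltage for the COTS part
    """
    if mil_rated_v <= 0 or mil_design_factor <= 0 or cots_design_factor <= 0:
        raise ValueError("voltages and factors must be positive")
    return mil_rated_v * mil_design_factor / cots_design_factor


def smallest_catalog_rating(required_v, catalog=ILLUSTRATIVE_CATALOG_V):
    """Pick the smallest catalog rating that is at least required_v."""
    for rating in sorted(catalog):
        if rating >= required_v:
            return rating
    raise ValueError("no catalog rating meets the required voltage")


if __name__ == "__main__":
    # A military part rated at 25 V is designed to 50 V, so the COTS part
    # must be rated at 50 V to give the same design margin.
    needed = equivalent_cots_rating(25.0)
    print(needed, smallest_catalog_rating(needed))  # 50.0 50.0
```

Any application-specific derating would be applied on top of this selection; the sketch covers only the rated-versus-designed margin described above.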

In some cases, no components will be available that satisfy all the necessary criteria combining resource and time constraints with long-term reliability in space, so several mitigations provided in Section 5 leverage the lower cost and shorter acquisition time to allow programs to assess the COTS insertion and development risk earlier. A typical design with Mil-Spec components goes through an engineering model (EM) to flight unit development period. Typically, the EM components are cheaper than their flight versions, using abbreviated test versions of the flight components (since the flight components have long *serial* testing as part of their schedule), but are still expensive relative to COTS. COTS components can be purchased early and more of them procured, which allows component risk reduction testing to be performed in parallel, leveraging an advantage of COTS. COTS components provide an opportunity to use alternate approaches (mitigations) with a wider array of components to realize a viable system leveraging component quality through high volume, automated manufacturing lines with strict statistical process control of a qualified process instead of low volume, semi-automated manufacturing lines with screening and lot-specific qualification.

#### 2.1 Acquisition Considerations

It must be recognized that as the acquisition process proceeds, additional mitigations may emerge or the scope of known mitigations may change. For example, during the proposal phase, a mitigation to reset a component periodically may be identified as required due to SEE performance. As the design progresses (Section E of the flow, Figure 6), it may become necessary to change the system timing and Flight Software (FSW) to fully incorporate the desired reset function. The overall architecture needs to be considered as an integral part of the system solution. For new non Mil-Spec or non-RHA (Radiation Hardness Assurance) components, the system acquisition process may be as long as the Mil-Spec process with its known qualities (highly dependent on program schedule and cost constraints, the component chosen, and potential mitigations). While the usage of COTS components and units, if selected wisely, promises gains in performance and cost, that gain comes with some loss of knowledge about the component. There is an implicit trade between the COTS component and its required mitigations (if any) and the cost and schedule of the Mil-Spec component. There needs to be a recognition that the acquisition process must change to use COTS components and units across the spectrum of programs, from those that span three years from ATP to launch to the long duration Class A national asset type. The acquisition strategy for the range of programs that incorporate COTS components is documented in Acquisition Considerations to Expand Space Design Options using COTS Electrical, Electronic, and Mechanical (EEE) Parts and Units (ATR-2023-01981).

It is up to programs to decide what best practices, questions, and mitigations are consistent with their program constraints. It should also be recognized that a COTS component "qualified" for a particular application may not be equivalent to qualification of a Mil-Spec component as the COTS component "qualification" may be application dependent.

The holistic approach dictates that space system architectures may also need to be revised when considering the use of COTS components. The following list gives several changes that should be considered.

- a) Redundancy: To increase the reliability of the system, designers would need to incorporate redundancy in the design of the constellation architecture. This could include using multiple spacecraft in each orbital plane or using multiple orbital planes. This would be of consideration when one is confined to using COTS that are not established as reliable or that are used outside of their datasheet limits.
- b) Fault-tolerance: The architecture would need to be designed to be more fault-tolerant, so that it can continue to function even if one or more functions fail. This could involve using redundant items or designing the architecture to automatically switch to a backup mode in the event of a failure. This would be of consideration when one is confined to using COTS that are not established as reliable or that are used outside of their datasheet limits.
- c) Scalability: The architecture should be scalable, so that new spacecraft or subsystems can be easily added or removed as needed or as increased performance technology becomes available. This would allow for the system to be expanded or contracted as required.
- d) Distributed architectures: The constellation could be designed in a distributed architecture, where each spacecraft is able to operate independently, and the system can continue to function even if one or more spacecraft fails.
- e) Continuous monitoring: The architecture and systems should include a continuous monitoring function that would allow the system to detect any signs of degradation or failure in the select EEE components (first time use). This would be of consideration when one is confined to using components that are not established as reliable or that are used outside of their datasheet limits where the program risk profile cannot accommodate TD or SEE testing.
- f) Changes to satellite designs may also be considered. Satellite integrators should consider designing their system in a modular, scalable fashion. Satellites should be designed to allow plug and play architectures such that different configurations of satellite box elements can be combined depending on mission requirements.
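The effect of the redundancy and distributed-architecture items above can be estimated with standard reliability arithmetic. The following is a minimal sketch, assuming identical units that fail independently of one another (an assumption that common-cause effects such as a shared radiation environment can violate); the function names and the reliability values are hypothetical illustrations, not figures from this ATR.

```python
# Illustrative sketch only: assumes identical, independently failing units.
from math import comb, isclose


def parallel_reliability(unit_reliability, n_units):
    """Probability that at least one of n_units redundant units survives."""
    if not 0.0 <= unit_reliability <= 1.0 or n_units < 1:
        raise ValueError("reliability must be in [0, 1] and n_units >= 1")
    return 1.0 - (1.0 - unit_reliability) ** n_units


def k_of_n_reliability(unit_reliability, n_units, k_required):
    """Probability that at least k_required of n_units survive,
    e.g. a constellation that still functions with some spacecraft lost."""
    if not 0.0 <= unit_reliability <= 1.0 or not 1 <= k_required <= n_units:
        raise ValueError("need reliability in [0, 1] and 1 <= k_required <= n_units")
    r = unit_reliability
    return sum(
        comb(n_units, i) * r**i * (1.0 - r) ** (n_units - i)
        for i in range(k_required, n_units + 1)
    )


if __name__ == "__main__":
    # Two redundant units at an assumed 0.90 reliability each.
    print(round(parallel_reliability(0.90, 2), 4))   # 0.99
    # Mission needs any 2 of 3 spacecraft at an assumed 0.90 each.
    print(round(k_of_n_reliability(0.90, 3, 2), 4))  # 0.972
    assert isclose(parallel_reliability(0.90, 2), k_of_n_reliability(0.90, 2, 1))
```

A calculation like this only bounds the benefit; whether redundancy is a suitable mitigation still depends on the failure modes of the specific COTS component and on the program risk profile.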

Providers of satellite units (such as star trackers and reaction wheels) should produce modular boxes with scalable performance in a somewhat continuous fashion. Standard off-the-shelf units could be redesigned on a planned periodic basis to roll technology improvements into new generations of units.

For long term constellations, a Diminishing Manufacturing Sources and Material Shortages (DMSMS) program will need to be established as COTS EEE components often have relatively short life cycles compared to military and space EEE parts. A good DMSMS program requires periodic re-design to replace or improve functions, not just EEE components. Strategic characterization of COTS parts technologies that factors in how certain component types evolve with technology, supply chain evolution, or material availability can strongly contribute to such a program.

## 3. Informed Risk Flow

The key word in this process is "informed". The flowcharts in this section are intended to provide guidance to understand and define the insertion and development risk the program is willing to tolerate when considering COTS components in the design solution. The flow is laid out to first determine what the insertion risk level for the program will be. This is intended to be a joint customer and contractor conversation. Figure 1 shows the overarching informed risk flow. Figures 2-5 provide the detailed steps and decision points embedded in the informed risk flow that culminate in execution of the accepted design solution by the program.

The informed risk flow in Figure 1 is broken up into nodes, identified as A-E, with each node representing a waypoint that needs to be reached prior to moving on to the next phase of the flow. Node A represents the start or inception of the program or associated effort. This could be as early as a pre-Request for Proposal (RFP) phase or as part of the Authorization to Proceed (ATP) for the effort. Node B represents achieving a definitized program risk posture regarding usage of COTS to achieve necessary cost, schedule, and technical performance requirements. Node C represents determination of the potential risk mitigations that can be applied to meet the program risk posture. Node D represents arriving at a scoped risk mitigation plan that incorporates all relevant best practices for addressing a candidate COTS design. Node E represents transition to program execution of the accepted solution, which may be a COTS, Mil-Spec, or combined design, and then implementation of the design and associated features.
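The node sequence can be pictured as an ordered set of waypoints. The following is a minimal sketch of that ordering only, written as a convenience for tracking which waypoint a program has reached; the class and function names are hypothetical, the descriptions are condensed from the paragraph above, and the decision logic inside Figures 2 through 5 is not modeled.

```python
# Illustrative sketch only: models the A-to-E ordering, not the figure details.
from enum import Enum


class RiskFlowNode(Enum):
    A = "Start or inception of the program (pre-RFP or ATP)"
    B = "Program risk posture for COTS usage definitized"
    C = "Potential risk mitigations determined"
    D = "Scoped risk mitigation plan incorporating best practices"
    E = "Transition to program execution of the accepted solution"


def next_node(current):
    """Return the waypoint that follows current, or None after node E."""
    order = list(RiskFlowNode)  # Enum members iterate in definition order
    index = order.index(current)
    return order[index + 1] if index + 1 < len(order) else None


if __name__ == "__main__":
    node = RiskFlowNode.A
    while node is not None:
        print(f"Node {node.name}: {node.value}")
        node = next_node(node)
```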

The intention of this ATR is to provide information for any mission class so that the program may determine the best course of action for their COTS insertion risk tolerance, including potential risk mitigations, after deciding the insertion and development risk posture for the program. Once the risk posture has been jointly decided, the assessment of any COTS components as part of the proposal can be done. Figure 2 (nodes A to B) shows the actions to determine the program COTS insertion risk tolerance and if needed, cost and schedule decision points. It also illustrates that the CONOPs for the mission should be incorporated as part of the initial assessment effort.

Figure 3 (nodes B to C) shows the actions to be taken to assess the COTS component insertion and development risk. It is intended that the risk assessment questions in Section 5.1 (mitigation questions) are considered during this part of the flow, as well as the commensurate mitigations, to determine the mitigation scope early. The questions in Section 5.1 correspond to the detailed mitigation flows for easy correlation. Once the component insertion risk and mitigation scope have been determined, it is recommended that the best practices list be considered, as the best practices may alleviate some of the mitigation scope. Figure 4 (nodes C to D) shows the informed risk determination after consulting the best practices list found in Section 4 of this ATR. The best practices are broken into two main sections: the desired supplier best practices (4.1) and the user best practices (4.2).

The desired supplier best practices are based on the best practices of suppliers when providing Mil-Spec components. It is not expected that COTS component suppliers will provide all of the best practice information found in Table 1. Since it is expected that some of the information in Table 1 will not be available, the user best practices list was developed to compensate for the lack of information from the supplier. These suggested user best practices were compiled to avoid employing the mitigations found in Section 5. Table 2 shows general user best practices for components. Table 3 through Table 18 are user best practices by component for quick reference.

For the informed risk flow in Figure 5 (nodes D to E), the left side of the flow covers the case where the usage of COTS is known during the proposal phase. The right side covers a proposal to use COTS components that arises after the proposal phase; it uses the same data to assess the COTS insertion risk. Exiting node E is the start of program execution. This represents the second pass through Section 5.2 to execute the selected mitigations. The flow in Figure 6 runs essentially in parallel with the informed risk flow from nodes B to D: it covers searching the Standard Military Cross Reference Matrix (SMCR) as an aid before concluding that only a COTS component can meet the requirement, along with the questions and scope, while the mitigation details are employed at node E. It is recognized that for a short-duration, short-procurement mission, the usage of some Mil-Spec components or units may not be feasible due to their procurement lead time, and that some of the steps outlined below may be skipped (for example, developing and trading off a Mil-Spec solution). The essence of COTS usage is a multivariable trade of the cost, schedule, weight, and power of the Mil-Spec component solution vs the unknowns of the COTS solution (which promises substantial performance and SWAP advantages with potential mitigation impacts). Being informed about the advantages and disadvantages of both is the key to a good decision for the program.



Figure 1. Informed risk flow (A to E).



Figure 2. Informed risk flow – Determine COTS risk tolerance (A to B).



Figure 3. Informed risk flow – COTS mitigation scope (B to C).



Figure 4. Informed risk flow – Scoping risk mitigation (C to D).



Figure 5. Informed risk flow – Selecting the design solution (D to E).

## 4. Best Practices

The best practices are broken into two groups: general best practices for COTS components and COTS best practices by component.

The general best practices are broken down into best practices by the supplier and by the component user. It is desired that the supplier perform the best practices found in Table 1, so that the component is chosen from a supplier of known quality. Refer to NESC-RP-19-01490 Recommendations on Use of COTS Guidance for NASA Missions Phase II (11-10-22 NRB) RP FINAL Section 7.1.3.1 for the definitions of the terminology and success criteria.<sup>2</sup> If the supplier does not perform these best practices or is unwilling to supply supporting information on its process, it may be necessary for the component user to use some of the best practices found later in Section 4.1 and perform some of the mitigations in Section 5.

Table 1. Supplier Best Practices

| Best Practices for COTS Components (Supplier)        |
|------------------------------------------------------|
| Process stability for at least one year <sup>2</sup> |
| Product produced in high volume <sup>2</sup>         |
| 100% electrical test <sup>2</sup>                    |
| Multi-lot characterization <sup>2</sup>              |
| Fully automated line <sup>2</sup>                    |
| Undergoes in-process testing <sup>2</sup>            |
| Maintains consistent yield <sup>2</sup>              |

## 4.1 General COTS Best Practices

The general best practices for COTS components, shown in Table 2, are generic and may also apply to Mil-Spec components, but were compiled to apply to COTS components. These best practices are from the MSIW (Mission Success Information Workshop) sub-team for COTS best practices and are intended to be considered when developing the program informed risk flow (Section C).

| Number | Best Practices for COTS component (user)                                                                                                                                                                                                                                                                                                                                                                                                                      |
|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1      | For a given COTS, once the program has defined the necessary best practices and mitigations, develop a plan to deal with the higher cost and schedule risk items first <sup>3</sup> |
| 2      | Consider not having the COTS in a critical timing/mission path (although it is recognized that COTS is likely a part of the critical path for performance reasons) <sup>2, 5, 12, 16</sup> |
| 3      | Consider the fault management implications (fault/degradation mechanisms) <sup>4, 5, 12, 16, 23</sup>                                                                                                                                                                                                                                                                                                                                                         |
| 4      | Consider on-orbit monitoring of key performance parameters for trending COTS performance (particularly for supplier non-tested parameters) <sup>12</sup> |
| 5      | Conduct COTS component level FMEA (not functional level) to understand the system effects of the COTS faults and potential propagation effects <sup>2, 4, 5, 12, 16, 23</sup> |
| 6      | Consider the component failure rate and number of times used for critical applications as an element of the program risk assessment <sup>16</sup>                                                                                                                                                                                                                                                                                                             |
| 7      | Upon failure of a COTS component, the program/project should initiate a failure analysis, and all efforts within available resources should be made to determine root cause. The first steps to root cause determination should be to verify that the component's datasheet was not violated in processing, testing, or usage <sup>2</sup>                                                                                                                    |
| 8      | Pay particular attention to COTS absolute and recommended ratings (different from Mil-Spec components). Scrutinize the datasheet carefully <sup>4, 16, 20</sup>                                                                                                                                                                                                                                                                                               |
| 9      | Bring in senior peer reviewers early to consider the impacts of the COTS(s) in the design/board/unit/system <sup>12, 16, 23</sup>                                                                                                                                                                                                                                                                                                                             |
| 10     | Consider large design margins for poorly analyzed/tested COTS parameters - verify through test <sup>2, 8, 9, 16, 23</sup>                                                                                                                                                                                                                                                                                                                                     |
| 11     | Reserve extra connector pins for test points early with the expectation that a key parameter will need to be monitored at unit level (particularly for COTS with no burn-in testing) <sup>4, 12</sup>                                                                                                                                                                                                                                                         |
| 12     | Consider standard interfaces over proprietary/unique interfaces <sup>23</sup>                                                                                                                                                                                                                                                                                                                                                                                 |
| 13     | Consider the obsolescence aspect of the component chosen. Obsolete in a year may mean no supplier support at the end of the program <sup>2, 3, 5, 16</sup>                                                                                                                                                                                                                                                                                                    |
| 14     | Make as much usage as possible of supplier toolkits, simulation models, and development aids. Perform a careful review of the development tool parameters vs the FLT parameters (for example, a processor development kit may run at full clock speed and use faster memory than the flight application, so the FLT processor throughput will be less than that of the development kit; the code may run faster on the development kit) <sup>2, 10, 16, 23</sup> |
| 15     | For interface components, VERIFY that the components are truly capable of cold sparing and do not act as a sneak path (for example, Honeywell HX422R and HX422D are NOT cold spare capable, so they cannot be used for a cross strap application with a powered component connected). Many commercial components do not need to operate in a cross strap (redundancy) configuration |
| 16     | Reserve a greater than standard SWAP reserve for the assemblies with COTSs to absorb the mitigation impacts <sup>3, 4, 23</sup>                                                                                                                                                                                                                                                                                                                               |

| Number | Best Practices for COTS component (user) |
|--------|------------------------------------------|
| 17     | Review historical usage and any alerts on the COTS component <sup>2, 3, 4, 5, 11, 16, 20</sup> |
| 18     | Consider using one size up case size <sup>23</sup> |
| 19     | Avoid high aspect ratios (>3:1) on ceramic substrates <sup>23</sup> |
| 20     | Use conservative derating <sup>2, 9, 23</sup> |
| 21     | For chip case components, avoid areas of the PWB with high flexure <sup>23</sup> |
| 22     | For a COTS with which the contractor has no experience, contact the supplier for recommended handling, assembly, installation, and test instructions <sup>16</sup> |
| 23     | For a new COTS component, consider the deratings carefully (Mil-Spec deratings may not be enough) <sup>2</sup>. Apply deratings that do not push the component outside of its datasheet limits <sup>16</sup> |
| 24     | Make use of the community knowledge of the COTS component (PMPedia, components working groups) <sup>11, 23</sup> |
| 25     | For total dose (TD) considerations, COTS components fabricated in the newer complementary metal oxide semiconductor/bipolar complementary metal oxide semiconductor (CMOS/BiCMOS) technology nodes are preferred over those from older technologies <sup>16</sup> |
| 26     | Select widely used COTS components manufactured by major semiconductor original component manufacturers (OCMs) and always select the highest grade available <sup>16</sup> |
| 27     | Be aware that Mil-Spec components generally have more built-in margins per the datasheet limits (e.g., rated voltage, temperature, or frequency of operation) than COTS components; therefore, operating COTS components beyond their datasheet limits can be more problematic than doing so with Mil-Spec components <sup>2</sup> |
| 28     | Know WHY a COTS component is being selected for use. Selecting COTS components in lieu of standard Mil-Spec components solely on the expectation of saving cost rarely succeeds, particularly under strict adherence to current Agency guidelines for Class A-C missions. In most cases, COTS are chosen to provide performance, availability or, in some cases, reliability advantages <sup>2</sup> |
| 29     | Consider standard or dual footprints to allow drop-in replacements (alternates) <sup>16, 23</sup> |
| 30     | Consider avoiding components that push other aspects of the design for manufacturability and rework (PWB microvias, number of layers, internal/external PWB geometries, local heatpipes, mounting) that may be difficult to procure <sup>10, 20</sup> |
| 31     | Vary the power supplies from minimum to maximum to assure components/boards provide correct signals in light of the variations in COTS tolerances (full range in-situ testing) <sup>4, 10, 16, 2, 25</sup> |
| 32     | Perform a continuous test over temperature (not just the plateaus) as an element of the "test like you fly" or "day in the life" testing <sup>2, 4, 5, 9, 16, 21</sup> |
| 33     | Strongly consider "test like you fly" tests where COTS performance is involved. The tests need to be perceptive <sup>4, 12, 16</sup> |

Table 2. General Best Practices for the Component User (cont)

| Number | Best Practices for COTS component (user) |
|--------|------------------------------------------|
| 34     | Verify all non-tested and "typical" parameters of the COTS/compare to a nominal beginning of life (BOL) analysis, consider trending during the EM test phase <sup>2, 9, 16, 21</sup> |
| 35     | Verify all timing and signal integrity of the COTS/compare to a nominal BOL analysis, consider trending key parameters during the EM test phase <sup>2, 4, 16, 21</sup> |
| 36     | Perform early (as soon as component is baselined) destructive parts analysis (DPA) to understand supplier/component quality <sup>2, 3, 4, 20, 21</sup> |
| 37     | Consider board/unit level test points to evaluate supplier non-tested parameters. Consider parametric performance criteria for COTS vs pass/fail criteria <sup>4, 5, 23</sup> |
| 38     | For shorter missions (<3 yrs), consider Board Level Screening (BLS) of thermal, electrical, and dynamics (TED) to provide confidence of function over the entire test range (refer to COTS Card Unit Level Char Guidance). For longer duration programs, consider HALT/HAST (Highly Accelerated Life Test/Highly Accelerated Stress Test) testing <sup>11, 20, 25</sup> |
| 39     | Develop a good test plan around the COTS to fully test all of the functions required for the application <sup>8, 10</sup> |
| 40     | For boards that are using COTS components that have not seen burn-in, consider building extra boards (relatively easy to do when in sequence) or life test <sup>2, 11, 16</sup> |
| 41     | Use accelerated testing not just to find if the component will survive, but to find weaknesses that may need mitigation <sup>16</sup> |
| 42     | Make an effort to get enough run-time hours at board and unit level <sup>16</sup> |
| 43     | Thermal vacuum test units with vacuum sensitive COTS components <sup>21</sup> |
| 44     | If additional component-level testing or screening needs to be performed, always consult datasheets to ensure specified operational limits are not exceeded. Exceeding specification limits can result in latent damage in components that can lead to failures in flight <sup>2</sup> |
| 45     | Shock sensitive: review the *conditions* the supplier uses for testing chatter and transfer (the Mil-Spec allows the coils to be energized during the shock event for the transfer test, which is NOT consistent with how it will be used) |
| 46     | Consider higher than Mil-Spec order quantities (EM, FLT (flight)) to allow for fallout for components that have not gone through burn-in or for component obsolescence (commercial component obsolescence or unannounced changes occur much more often than with Mil-Spec components) <sup>4, 16</sup> |
| 47     | Ensure components used for FLT are from the same line as those used for DPAs, radiation testing, reliability testing, and EMs. Commercial designs change rapidly and may do so without notice <sup>2, 3, 16, 21</sup> |
| 48     | Consider purchasing extra components and printed wiring boards (PWBs) of the FLT lot for sacrificial DPAs, coupon testing <sup>2, 4, 16</sup> |

Table 2. General Best Practices for the Component User (cont)

| Number | Best Practices for COTS component (user) |
|--------|------------------------------------------|
| 49     | Contact the manufacturer and request reliability monitor data, product change notices, field return data and any other information available <sup>2, 3, 4, 5, 10, 16, 21, 23</sup> |
| 50     | Require LDC/CoC (lot date code/certificate of compliance) information when purchasing (BB (breadboard), EM, FLT) <sup>1, 8, 14, 22</sup> |
| 51     | Procure only from authorized suppliers <sup>2, 5, 10, 16</sup> |
| 52     | Monitor KPP relative to cost/schedule <sup>4</sup> |
| 53     | If a COTS used in the past changes the process or suppliers, the component may need a complete re-evaluation <sup>3, 5, 21</sup> |
| 54     | "Heritage" with COTS will not mean the same as heritage with Mil-Spec components. COTS "qualification" will be highly application specific <sup>2, 9</sup> |
| 55     | Consider planning in the effort to make the engineering model (EM) form, fit, and function (FFF) with the FLT units. A common practice to preserve schedule is to use the EM on the vehicle (in place of FLT) while rework is performed. This also pays dividends later should the FLT unit have issues (testing can be done on the EM over temp, vacuum, dynamics, and EMI, whatever the condition was in which the FLT unit showed an anomaly). A FFF EM also has the benefit for long duration programs of a ground asset for anomaly testing and resolution (highly unit function dependent) <sup>16</sup> |
| 56     | Ask the supplier if they scrap lots with unusual fallout or yield (example: multi-layer ceramic caps (MLCC) Pre-termination Confocal Scanning Acoustic Microscopy (CSAM) inspection) <sup>2</sup> |
| 57     | Ask the supplier how they establish production limits <sup>2</sup> |
| 58     | Consider minimizing manual (hand) assembly in favor of automated assembly. Hand soldering ceramic caps requires special care <sup>2, 8, 9, 23</sup> |
| 59     | Avoid hand soldering COTS ceramic caps <sup>2, 8</sup> |
| 60     | For PWBs, the coupons MUST have the most aggressive features used on the PWB (microvias, spacing, smallest hole size, blind vias). Early coupon DPA is recommended <sup>20</sup> |
| 61     | For PWBs with prime and redundant circuits, the Institute of Printed Circuits (IPC) standard is not sufficient. The IPC isolation guidelines do not recognize the single point failure implications <sup>26</sup> |
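As a rough illustration of best practice 31 in Table 2 (varying the power supplies from minimum to maximum), the sketch below generates evenly spaced supply settings across a tolerance band and records which settings fail a check. The function names, the 3.0 V to 3.6 V band, and the stand-in pass/fail check are all hypothetical; a real test would command a bench supply and measure the board outputs.

```python
from __future__ import annotations

from typing import Callable


def supply_sweep(v_min: float, v_max: float, steps: int) -> list[float]:
    """Return `steps` evenly spaced supply settings from v_min to v_max inclusive."""
    if steps < 2:
        raise ValueError("need at least two steps to cover minimum and maximum")
    if v_max <= v_min:
        raise ValueError("v_max must be greater than v_min")
    span = v_max - v_min
    # Round to suppress floating-point noise in the generated settings.
    return [round(v_min + span * i / (steps - 1), 6) for i in range(steps)]


def run_sweep(v_min: float, v_max: float, steps: int,
              check: Callable[[float], bool]) -> dict[float, bool]:
    """Evaluate `check` at every supply setting and record pass/fail."""
    return {v: check(v) for v in supply_sweep(v_min, v_max, steps)}


def output_in_spec(v_supply: float) -> bool:
    """Hypothetical stand-in for a real measurement: fails below 3.1 V."""
    return v_supply >= 3.1


results = run_sweep(3.0, 3.6, 4, output_in_spec)
failures = [v for v, ok in results.items() if not ok]
```

In this made-up example the board only misbehaves at the low end of the band, which is the kind of finding a nominal-voltage test would not reveal.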

## 4.2 COTS Best Practices by Component

The best practices found in Tables 3 through 18 were compiled from many sources: The Aerospace Corporation, contractors, and multiple white papers on the usage of COTS for space. The best practices were broken down by component to allow quick reference for a particular component.

| Number | Resistors - Mil-Prf-55342                                                                         |
|--------|---------------------------------------------------------------------------------------------------|
| 1      | Consider using one size up case size <sup>25</sup>                                                |
| 2      | Avoid high aspect ratios (>3:1) on ceramic substrates <sup>25</sup>                               |
| 3      | Use conservative derating (power) <sup>10, 11, 25</sup>                                           |
| 4      | For chip case components, avoid areas of the PWB with high flexure <sup>25</sup>                  |
| 5      | Consider standard or dual footprints to allow drop-in replacements (alternates) <sup>18, 25</sup> |
| 6      | For COTS components with pure tin leads, consider the mitigation techniques early <sup>28, 29</sup> |

Table 3. COTS Resistor User Best Practices

| Number | Capacitors - Mil-Prf-123                                                                            |
|--------|-----------------------------------------------------------------------------------------------------|
| 1      | Consider using one size up case size <sup>23</sup>                                                  |
| 2      | Use conservative derating (voltage, power) <sup>11, 23</sup>                                        |
| 3      | Ceramic caps are generally preferred if available in size & rating <sup>23</sup>                    |
| 4      | Tantalum caps should be conservatively derated $(>50\%)^{23}$                                       |
| 5      | Avoid Aluminum electrolytics: they are not sealed, with higher equivalent series resistance (ESR)   |
| 6      | Caps in high ripple circuits may have internal heating issues <sup>23</sup>                         |
| 7      | Consider placing two small caps in series; this can preclude a single-cap failure while providing the same function <sup>23</sup> |
| 8      | Consider inrush currents vs tantalum ESR <sup>9</sup>                                               |
| 9      | Consider vacuum effects on PTCs (anomalous transients) - derate conservatively <sup>25</sup>        |
| 10     | Consider standard/dual footprints to allow drop-in replacements (alternates) <sup>23</sup>          |
| 11     | Avoid hand soldering COTS ceramic caps <sup>2, 8, 9, 23</sup>                                       |
| 12     | Consider reflow temp based on solder material                                                       |
| 13     | For COTS components with pure tin leads, consider the mitigation techniques early <sup>27, 28</sup> |

Table 2. COTS Capacitor User Best Practices

Table 5. COTS Connector User Best Practices

| Number | Connectors - Mil-Dtl-38999, 55302; Mil-Prf-24682, 83518                              |
|--------|--------------------------------------------------------------------------------------|
| 1      | Connector "qualification" is application unique: misalignment can result in pin fretting under dynamic conditions |
| 2      | Strongly consider a thorough tolerance analysis for connector engagement and alignment |
| 3      | Consider controlled (limited) force when installing press-fit connectors |
| 4      | Verify connector SPICE model and signal integrity for fast edge rate (<1 ns) signals with test data. Check signal end to end. |
| 5      | DPA connector early to verify plating and under-strike materials |
| 6      | Avoid "flip-drilling" on PWBs using press-fit connectors |
| 7      | Consider early DPA of PWB coupons for via quality if press-fit connectors are used |
| 8      | For COTS components with pure tin leads, consider the mitigation techniques early <sup>26, 27</sup> |

## Table 6. COTS Diode User Best Practices

| Number | Diodes - Mil-Prf-19500                                                         |
|--------|--------------------------------------------------------------------------------|
| 1      | Consider conservative thermal conditions for Schottky diodes: high operating temps yield lower Vf and ~10x higher reverse leakage <sup>23</sup> |
| 2      | Consider verifying breakdown voltage, reverse leakage current, and forward on-current (maximum) |
| 3      | Consider max forward current, junction temperature, and power deratings |
| 4      | Consider transient thermal impedance with pulse duration (operation) |

## Table 7. COTS Inductor User Best Practices

| Number | Inductors Mil-Prf-27                                                                              |
|--------|---------------------------------------------------------------------------------------------------|
| 1      | Inductors in switching power supplies will dissipate power <sup>23</sup>                          |
| 2      | Inductors are the heaviest of parts, so care must be taken in mounting $^{23}$                    |
| 3      | For Mil-Std-1553 transformers, ensure the leakage inductance is balanced and less than 6 µH |

## Table 8. COTS Bipolar Junction Transistor (BJT) User Best Practices

| Number | Transistor (BJT) - Mil-Prf-19500                                             |
|--------|------------------------------------------------------------------------------|
| 1      | Consider loss of gain as TD increases <sup>23</sup>                          |
| 2      | Consider enhanced low-dose rate sensitivity (ELDRS) for radiation assessment |
| 3      | Consider junction temperature and power derating                             |

#### Table 9. COTS Field Effect Transistor (FET) User Best Practices

| Number | Transistor (FET) Mil-Prf-19500                                                                                                |
|--------|-------------------------------------------------------------------------------------------------------------------------------|
| 1      | Consider increased gate leakage current with increased TID <sup>23</sup> |
| 2      | Ensuring sufficient gate drive and conservative (Drain voltage and power) derating result in reliable operation <sup>23</sup> |
| 3      | Consider junction temperature and power derating                                                                              |

#### Table 10. COTS Power Transistor User Best Practices

| Number | Transistor (Power - Si, SiC, GaN)                                             |
|--------|-------------------------------------------------------------------------------|
| 1      | Consider different technologies and power consumption requirements            |
| 2      | Consider derating (SOA, package level derating) based on total power consumption and on-time duration. |
| 3      | Consider thermal runaway, heat dissipation, Rdson, and dislocation defects in device construction |

## Table 11. COTS Radio Frequency (RF) Transistor User Best Practices

| Number | Transistor (RF - Si, HEMT)                                                     |
|--------|--------------------------------------------------------------------------------|
| 1      | Consider technology node and power consumption requirements                    |
| 2      | Consider package for H-poisoning for high electron mobility transistors (HEMT) |
| 3      | Consider passivation material for topology and humidity reduction              |

Table 12. COTS OP-Amp/Comparator User Best Practices

| Number | Op Amp/Comparators - Mil-Prf-38535                                                |
|--------|-----------------------------------------------------------------------------------|
| 1      | Avoid relying on tight leakage currents and offsets without testing <sup>23</sup> |
| 2      | Comparators may momentarily flip their outputs under SEE (function of input differential voltage - smaller voltage is more susceptible) <sup>23</sup> |
| 3      | Read the datasheets carefully. Specifications for the same part number have different capabilities (different suppliers) - Example LM139A |
| 4      | For COTS components with pure tin leads, consider the mitigation techniques early <sup>27, 28</sup> |

#### Table 13. COTS Pulse Width Modulators User Best Practices

| Number | PWMs - Mil-Prf-38535, 38534                                                   |
|--------|-------------------------------------------------------------------------------|
| 1      | Can exhibit function degradation with TID, design accordingly <sup>23</sup>   |
| 2      | SEE can cause momentary output upsets, so accommodate in design <sup>23</sup> |

Table 14. COTS Relay User Best Practices

| Number | Relays - Mil-Prf-28750, 28776, 83536 |
|--------|--------------------------------------|
| 1      | Relays are shock sensitive, review the <i>conditions</i> the supplier uses for testing chatter and transfer (the MIL-Spec allows the coils to be energized during the shock event for the transfer test (NOT consistent with how it will be used)) |
| 2      | For relays that SHARE current through two or more contacts, consider the failure implications of loss of a contact path |

Table 15. COTS Printed Wiring Board User Best Practices

| Number | PWBs - IPC-2221, 2222 |
|--------|-----------------------|
| 1      | For PWBs with prime and redundant circuits, the IPC standard is not sufficient. The IPC isolation guidelines do not recognize the single point failure implications <sup>26</sup> |
| 2      | Consider purchasing extra components and PWBs (FLT lot) for sacrificial DPAs, coupon testing <sup>2, 4, 16</sup> |
| 3      | For PWBs the coupons MUST have the most aggressive features used on the PWB (microvias, spacing, smallest hole size, blind vias). Early coupon DPA is recommended <sup>20</sup> |
| 4      | Consider avoiding components that push other aspects of the design for manufacturability and rework (PWB microvias, number of layers, internal/external PWB geometries, local heatpipes, mounting) that may be difficult to procure <sup>10, 20</sup> |
| 5      | Consider early DPA of PWB coupons for via quality if press-fit connectors are used |

#### Table 16. COTS Fuse User Best Practices

| Number | Fuses - Mil-Prf-24319 |
|--------|-----------------------|
| 1      | Fuses should be in an easily accessible location, able to change without removing other components (particularly ceramic caps) |
| 2      | Analyze power on conditions to ensure the inrush current does not open the fuse |

Table 17. COTS Wire User Best Practices

| Number | Wire - Mil-Dtl-3432                                              |
|--------|------------------------------------------------------------------|
| 1      | Avoid cutting into the shield braid when stripping shielded wire |

| Number | ICs/Hybrids - Mil-Prf-38535, Mil-Prf-38534                                                                                                                                                                                                                                            |  |  |  |
|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|
| 1      | Vary the power supplies from minimum to maximum to assure parts/boards provide correct signals in light of the variations in alternate grade parts tolerances <sup>4, 10, 25</sup> |  |  |  |
| 2      | Perform a continuous test over temperature (not just the plateaus) as part of the "test like you fly" or "day in the life" testing <sup>4, 5, 9</sup>                                                                                                                                 |  |  |  |
| 3      | Verify all non-tested and "typical" parameters of the COTS/compare to a nominal BOL analysis, consider trending during the EM test phase <sup>9</sup>                                                                                                                                 |  |  |  |
| 4      | Verify all timing and signal integrity of the COTS/compare to nominal BOL analysis, consider trending key parameters during the EM test phase <sup>4</sup>                                                                                                                            |  |  |  |
| 5      | Pay particular attention to COTS absolute and recommended ratings (different from Mil-Spec parts). Scrutinize the datasheet carefully <sup>4</sup>                                                                                                                                    |  |  |  |
| 6      | Perform early (as soon as part is baselined) DPA to understand supplier/part quality <sup>3, 4</sup>                                                                                                                                                                                  |  |  |  |
| 7      | Consider higher than Mil-Spec order quantities (EM, FLT) to allow for fallout for components that have not gone through burn-in or for component obsolescence (commercial component obsolescence or unannounced changes occurs much more often than mil-spec components) <sup>4</sup> |  |  |  |
| 8      | Contact the manufacturer and request reliability monitor data, product change notices, field return data and any other information available <sup>3, 4, 5, 10, 16, 21, 23</sup>                                                                                                       |  |  |  |
| 9      | Require LDC/CoC information when purchasing (BB, EM, FLT) <sup>3, 10, 16</sup>                                                                                                                                                                                                        |  |  |  |
| 10     | Procure only from authorized suppliers <sup>5, 10, 16</sup>                                                                                                                                                                                                                           |  |  |  |
| 11     | Monitor KPP relative to cost/schedule <sup>4</sup>                                                                                                                                                                                                                                    |  |  |  |
| 12     | Consider board/unit level test points to evaluate supplier non-tested parameters.<br>Consider parametric performance criteria for COTS vs pass/fail criteria <sup>4, 5, 23</sup>                                                                                                      |  |  |  |
| 13     | Consider large design margins for poorly analyzed/tested COTS parameters - verify through test <sup>8, 9, 16, 23</sup>                                                                                                                                                                |  |  |  |

## Table 18. COTS Integrated Circuit/Hybrid User Best Practices

| Number | ICs/Hybrids - Mil-Prf-38535, Mil-Prf-38534                                                                                                                                                             |  |
|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| 14     | Strongly consider "test like you fly" tests where COTS performance is involved.<br>The tests need to be perceptive <sup>4, 12, 16</sup>                                                                |  |
| 15     | Bring in senior peer reviewers early to consider the impacts of the COTS(s) in the design/board/unit/system <sup>12, 16, 23</sup>                                                                      |  |
| 16     | Consider the fault management implications $(fault/degradation mechanisms)^{4, 5, 12, 16, 23}$                                                                                                         |  |
| 17     | Consider on orbit monitoring of key performance parameters for trending COTS performance (particularly for supplier non tested parameters) <sup>12</sup>                                               |  |
| 18     | Reserve a greater than standard SWAP reserve for the assemblies with COTS to absorb the mitigation impacts <sup>3, 4, 23</sup>                                                                         |  |
| 19     | Reserve extra connector pins for test points early with the expectation that a key parameter will need to be monitored at unit level (particularly for COTSs with no burn-in testing) <sup>4, 12</sup> |  |
| 20     | Conduct COTS Part level FMECA (not functional level) to understand the system effects of the COTS faults and potential propagation effects <sup>4, 5, 12,16, 23</sup>                                  |  |
| 21     | Consider minimizing manual (hand) assembly in favor of automated assembly.<br>Hand soldering Ceramics caps requires special care <sup>2, 8, 9, 23</sup>                                                |  |
| 22     | Avoid pure tin components inside hybrids                                                                                                                                                               |  |
| 23     | For DDR Memory, Consider various technology nodes for Tr and Caps for optimum solution                                                                                                                 |  |
| 24     | For Flash memories, Consider stress-induced leakage current (SILC), Crosstalk and Program/Erase cycles (retention) <sup>29</sup>                                                                       |  |
| 25     | For flash memories, Limit operational temperature based on technology node                                                                                                                             |  |
| 26     | For flash memories, Consider optimum selection for Single-level cell (SLC),<br>Multi-level cell (MLC), Triple-level cell (TLC), and Quad-level-cell (QLC)<br>(Reliability vs Capacity) <sup>29</sup>   |  |
| 27     | For COTS components with pure tin leads, consider the mitigation techniques early <sup>27, 28</sup>                                                                                                    |  |
| 28     | Request what tests suppliers perform and activation energies for various failure mechanisms                                                                                                            |  |

Table 18. COTS Integrated Circuit/Hybrid User Best Practices (cont)

#### 5. COTS Mitigations

Commercial components have been used in space designs before. In these prior efforts, however, commercial suppliers allowed buyers to review the process data (many of these items are found in the manufacturing section of this document) that documented the process and component quality. At that time, many components were being designed for the military, not consumer electronics. An example is the venerable LM139A. Before the Standard Military Drawing (SMD) for the LM139A, suppliers such as National Semiconductor and Analog Devices produced components to the best standards, but the components were not designed for a radiation environment. Buyers would procure lots of LM139A and radiation test them to find an acceptable lot. For modern commercial processes, proprietary information has made suppliers unwilling to share component build details, leaving buyers unable to verify the build quality. With the SMD, suppliers changed their process to incorporate radiation tolerance. RHA is now a standard property for the LM139A. However, along with the RHA came a change in the VCC (Voltage Common Collector) or VDD (Voltage Drain) the component requires to operate. Table 19 shows the different flavors of the LM139A in the SMD catalog. What the table does not show, although all four components have an RHA rating, is the SEE performance of each component, because the RHA for these components does not address SEE. The first three components listed can be sensitive to SEE, while the last one is nearly immune. The message here is that the details are important, even for the same "generic" component. While the SMD has simplified the selection of components for space use, using modern COTS components without the US military as a large customer raises questions that should be asked in order to implement the appropriate mitigations addressed by this ATR.

Table 19. LM139A Comparison

| SMD   | RHA      | Minimum supply voltage |
|-------|----------|------------------------|
| 96738 | 100K     | 2V                     |
| 98613 | 100-300K | 5V                     |
| 87739 | 100K     | 5V                     |
| 01510 | 50-300K  | 9V                     |

The flow shown in Figure 6 is a more detailed companion flow to the flows in Section 3. This flow is meant for the Systems Engineering, Design Engineering, Radiation, and Parts Materials and Process organizations to ask the right questions and apply mitigations. Note the mitigations are not component level solutions, but actions, changes in architecture, CONOPS, design, and usage of the COTS component to use it as-is (not up-screening), as seen from the LM139A example. The suggested mitigations are work-arounds to make the program successful, while being informed about the insertion and development risk of using a COTS component.

The flow in Figure 6 starts with assessing if there is a Mil-Spec component that will meet the requirements. A Mil-Spec solution, perhaps requiring more components, more SWAP, or long lead procurement may be unappealing, but should be considered so that there can be an honest trade of the different solutions (as the COTS component mitigations may be onerous too). The link to the SMCR is included in the acronyms list to aid in the component selection effort. If no solution is found there, potential high-quality suppliers (established COTS) can be engaged for the desired component, hopefully using supplier best practices from Section 4. If the supplier is unwilling or unable to supply the ideal list of information, a supplier that the user has history with coupled with the best practices for users can be used to adjust to the different environment of the COTS component documentation to address the COTS component knowledge risk. It may still be necessary to consider the mitigations to fully determine the COTS component's shortcomings and what actions may be needed to fully adjust. Failing to find an established COTS component, the flow then proceeds to the questions and mitigations to work with the candidate COTS component to achieve a solution.



Figure 6. Component selection, identification and mitigation flow.

Below are several examples of where COTS components were successfully used for Class A space vehicles (design life > 10 years), explaining the vulnerability discovered and what mitigations were used to meet the mission need.

#### COTS Example 1: On Board Computer Controller (OBCC) ~1988.

The LSI Logic L65400 1750A processor was chosen over the McDonnell Douglas 281 1750A processor due to throughput requirements. The LSI Logic process was not radiation hardened (it was a commercial process). Radiation testing determined that the process TD performance was adequate, but the SEU performance was poor. To compensate for the poor SEU performance, the board and system designs were modified to "build-in" a quiet period for the OBCC during the first 500 µs of the processing period. The first 500 µs of the processing period was then used to check if any of the interfaces had not completed their transfers (the OBCC was a router between the On-Board Computers (OBCs) and the rest of the spacecraft). If an interface transfer was not complete, this was flagged as an error and on-board fault protection was notified. The quiet period also allowed the OBCC design to use the leading edge of the processing cycle pulse to perform a processor restart via the non-maskable, level 0 interrupt (NMI). The NMI forced the processor to restart, executing the NMI interrupt service routine to re-initialize all the processor registers, reset all of the peripheral serial interfaces (via register resets), initialize the serial interfaces, and refresh the triple voted discretes. This effectively restarted the OBCC every processing period (except for a key portion of SRAM that was reserved for memory load processing that could span the processing cycle). The mitigations to do this affected the OBCC hardware design, the OBCC firmware design, another unit that interfaces with the OBCC (so that it would NOT send commands or telemetry requests during the quiet period), and the flight software (the on-board fault protection). These techniques have been on orbit for 28 years (as of 2023) with no known upsets attributed to the processor. The techniques used can be found in the mitigation Sections 1g, n, and o. Refer to Section 6.1 for this example's flow exercise.
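The per-period restart described in Example 1 can be pictured with a small software model. The following is only an illustrative sketch, not the flight design: the class names, fields, and the ordering of the interface check before the register reset are assumptions made for the example, and the real mechanism was implemented across hardware, firmware, and flight software.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    """Hypothetical serial interface between the OBCC and another unit."""
    name: str
    transfer_complete: bool = True

    def reset(self) -> None:
        # Register-level reset and re-initialization, modeled as clearing state.
        self.transfer_complete = True


@dataclass
class Obcc:
    """Toy model of the controller; all field names are illustrative."""
    interfaces: list
    registers: dict = field(default_factory=dict)
    reserved_sram: dict = field(default_factory=dict)  # must survive the restart
    fault_log: list = field(default_factory=list)

    def quiet_period_check(self) -> None:
        # No new traffic arrives during the quiet window (500 us in the text),
        # so any transfer still open is reported to on-board fault protection.
        for itf in self.interfaces:
            if not itf.transfer_complete:
                self.fault_log.append(f"incomplete transfer: {itf.name}")

    def nmi_restart(self) -> None:
        # NMI service routine: re-initialize registers and interfaces.
        # The reserved SRAM region is deliberately left untouched.
        self.registers = {"pc": 0, "status": 0}
        for itf in self.interfaces:
            itf.reset()

    def processing_period(self) -> None:
        # Assumed ordering: check interfaces first, then restart.
        self.quiet_period_check()
        self.nmi_restart()
```

In this model, any register corruption accumulated during one period (standing in for an SEU) is discarded at the start of the next period, while the reserved SRAM contents persist across the restart.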

#### COTS Example 2: Various programs – EEPROM usage ~2003.

Many programs use the EEPROM that uses the Hitachi die. The TD performance of the EEPROM depends on the application. The components are good to about 25Krad (Si) when powered on, significantly better when powered off. Additional circuitry is needed if powering off the EEPROM is required to meet the application's TD requirement (mitigation 1b). In addition, the EEPROM RES pin is sensitive to noise (and potential data corruption from inadvertent writes). The power ON/OFF circuit to the EEPROM must also control the RES input during power ON/OFF to less than 0.25V. Reset circuits need to accommodate this constraint. Since this component is non-volatile, there is a data retention period to observe. At the time, little was known about the fabrication details from Hitachi (Hitachi sold the die, then stopped supporting the die). This component was also new to the space design world, and as a result, as more was gradually discovered about the component, additional design practices were developed. Given the 20-year stated datasheet retention, with no process details, many contractors were conservative with specifying a period to perform an on-orbit refresh, anywhere from 5 to 10 years, depending on the application (mitigation 1g). Anomalies with the component began to surface. One was that byte writes could lead to corrupted data within that page, so full page writes were required (an impact to the SW governing the refresh). Another was weak bits (slow outputs due to weak cells that resulted in the output sense amps taking longer to output the correct value). This was screened out using at-speed testing. One supplier was using a DATAIO programmer to verify the packaged die speed, but the DATAIO waited microseconds (vs the 150 ns EEPROM read requirement), allowing EEPROMs with weak bits to pass the screening test.
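
The weak-bit escape described above comes down to when the output is sampled. The sketch below is a simplified illustration under assumed numbers: the per-cell access times and the 2 µs programmer delay are hypothetical, and only the 150 ns read requirement comes from the text.

```python
READ_REQUIREMENT_NS = 150  # EEPROM read requirement quoted in the text


def passes_screen(cell_access_times_ns, sample_delay_ns):
    """Return True if every cell presents valid data by the sampling instant."""
    return all(t <= sample_delay_ns for t in cell_access_times_ns)


# Hypothetical device: most cells are fast, one weak cell needs 400 ns.
device = [90, 110, 120, 400]

# A slow programmer that waits microseconds before sampling hides the weak cell.
slow_programmer_result = passes_screen(device, sample_delay_ns=2_000)

# An at-speed test samples at the read requirement and exposes the weak cell.
at_speed_result = passes_screen(device, sample_delay_ns=READ_REQUIREMENT_NS)
```

With these assumed values the slow screen passes the device and the at-speed screen rejects it, which is why the screening delay needs to match the application's read timing.
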
Other examples of mitigations for the EEPROM are at the board/Flight SoftWare (FSW) level by programming multiple images into the EEPROM, so that if one was corrupted, another image (the same content) could be used (mitigation 1e). One program architected the unit to be able to load the EEPROM from another assembly. The boot code in the single board computer (SBC) with the EEPROM, which was a slave on the backplane bus, allowed access to the EEPROM and SRAM from an external source. The master could receive ground commands and control the path to the EEPROM and SRAM, via the backplane interface, without the SBC processor running application code. Finally, the On-Board Fault Protection (OBFP) was designed to enable (but not yet execute) a swap of the SBC for EEPROM read errors. The SBC had Error Detection and Correction (EDAC), so a single bit error was corrected when read by the processor, but not in EEPROM. A swap was initiated if multiple locations showed Single Bit Errors (SBEs). This is mitigation 1s. Refer to Section 6.2 for an example flow exercise.
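Two of the board/FSW-level ideas in Example 2 (redundant images and a swap triggered by single bit errors at multiple locations) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the source does not say how image corruption was detected, so the CRC-32 trailer is an assumed technique, and the function names and the threshold of three locations are hypothetical.

```python
import zlib

SBE_LOCATION_THRESHOLD = 3  # hypothetical; the source only says "multiple locations"


def make_image(payload: bytes) -> bytes:
    """Append a CRC-32 trailer so a corrupted copy can be detected at boot."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def image_is_valid(image: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the stored trailer."""
    if len(image) < 4:
        return False
    payload, stored = image[:-4], image[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored


def select_image(images):
    """Return (index, payload) of the first copy that passes its CRC, else None."""
    for idx, image in enumerate(images):
        if image_is_valid(image):
            return idx, image[:-4]
    return None


def swap_recommended(sbe_addresses, threshold=SBE_LOCATION_THRESHOLD):
    """Recommend a swap once distinct corrected-error locations reach the threshold."""
    return len(set(sbe_addresses)) >= threshold
```

Counting distinct addresses rather than raw error events keeps one repeatedly read weak location from triggering a swap on its own, which matches the "multiple locations" wording in the text.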

#### COTS Example 3: Computer and Data Electronics (CDE) ~2004.

The CDE had several Power On Reset (POR) requirements that required both the 3.3V and 5V rails to be monitored. During power up, the POR circuit was required to provide a delay to allow the oscillator to be guaranteed to be oscillating (10 ms following voltage stability), hold off the reset until all voltages were stable, and provide a very low output reset signal to the EEPROM RES input. The POR circuit needed to be able to function at voltages where the IO components may start to conduct (due to a requirement for the unit to not be able to glitch the command output signals or potentially send a false command to the vehicle). A low voltage LM139A was chosen. The datasheet from National Semiconductor stated that their LM139A was designed to work down to 2V VCC. This feature was not tested by National Semiconductor (tested VCC minimum was +5V). The supplier was not willing to test the component at 2V VCC. The program chose to purchase components and test in house for VCC performance. Components from this lot were selected and installed into an evaluation board to characterize the component's performance. The board design was modified to provide a test point at board level to assess the POR performance during board testing (engineering model and flight). The success of this effort came from attention to detail and obtaining the data needed to implement the appropriate mitigations. These techniques have been on orbit for many years with no known issues attributed to the reset circuit. The techniques used can be found in the mitigation Sections 1a, 3h, j, and k. Refer to Section 6.3 for this example's flow exercise.
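The POR timing requirement in Example 3 (hold reset until both rails are stable, then wait 10 ms for the oscillator) can be expressed as a simple check over sampled rail voltages, for instance when reviewing test-point data from board testing. This is a behavioral sketch only: the rail thresholds and the function name are assumptions, and the actual circuit was an analog comparator design.

```python
OSC_STARTUP_DELAY_MS = 10.0  # delay after voltage stability (from the text)

# Hypothetical "in regulation" thresholds; the source does not give them.
RAIL_THRESHOLDS_V = {"3v3": 3.0, "5v": 4.5}


def reset_release_time_ms(samples, thresholds=RAIL_THRESHOLDS_V,
                          delay_ms=OSC_STARTUP_DELAY_MS):
    """Return the first sample time at which reset may be released, or None.

    samples: list of (time_ms, {"3v3": volts, "5v": volts}) in time order.
    """
    stable_since = None
    for t_ms, rails in samples:
        all_good = all(rails[name] >= limit for name, limit in thresholds.items())
        if not all_good:
            stable_since = None      # any dip restarts the delay
        elif stable_since is None:
            stable_since = t_ms      # first sample with every rail in regulation
        if stable_since is not None and t_ms - stable_since >= delay_ms:
            return t_ms
    return None
```

Restarting the delay on any dip is a conservative design choice in this sketch; it keeps reset asserted through a brown-out during power up rather than releasing on the original timer.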

### 5.1 COTS Mitigation Questions for Flow Section B

This section is intended to support the informed risk posture for the program, providing guidance to ask questions during the proposal effort or after a COTS component has been selected to aid in identifying the potential mitigations that may be required for a given program's risk tolerance level.

These questions are aimed at being considered during Section B (Figures 2 and 4) of the informed risk flow to determine the appropriate mitigation scope. The mitigation execution comes later (flow Section E, Figure 5) after the system technical, cost, schedule, and mitigation risks have been identified. It is recognized that not all the questions and mitigations will be needed for a given program. It is up to the program to determine which questions affect their risk profile and what associated mitigations are required. It is imperative to recognize that the Mil-Spec methods referenced in this ATR are for guidance and that a test method, if used, may need to be altered to stay within the COTS component datasheet limits. Some of the methods referenced may not be appropriate for a given technology.

### 5.1.1 Radiation Questions

Radiation is unique to space equipment, and determining a COTS component's reaction to radiation may be a difficult and time-consuming effort. The radiation section covers Total Dose (TD), Single Event Upsets (SEUs), Single Event Functional Interrupts (SEFI), Latchup, and Single Event Gate Rupture (SEGR), which also encompasses Single Event Burnout (SEB). They represent some of the highest schedule and technical risks should a non-RHA component not have available radiation lot data (which is likely since the usage of non-RHA components usually requires lot specific test data and there are many choices, not radiation tested, that may be chosen).

TD is the cumulative, long-term damage from the total dose of radiation due to protons and electrons. The effects of TD on a component can range from simple parametric shifts (input leakage current, offset voltages, or increased active or quiescent currents) to the more catastrophic effect of functional failure. The amount of TD a component can handle translates to the amount of time, for a given orbit, that the component can be expected to remain within its specifications.

SEUs are a different phenomenon: a change of state (usually in a flip-flop or memory storage cell) caused by a high energy particle such as an ion, electron, or proton. A SEFI is similar to an SEU but differs in that it changes the operation of the component. For example, a SEFI in a microprocessor may cause the internal state machine to go to an illegal state or result in running code that is not intended for the current activity. Different components will have different vulnerabilities. For example, total dose radiation testing on COTS advanced CMOS components has been favorable, but SEFI testing has shown a vulnerability. SEFI may represent the largest single cost, schedule, and technical risk for a program inserting an advanced CMOS COTS component. This is due to the small geometries of the latest commercial components, the lack of design provisions for changes in internal state machines, the cost and schedule to perform a SEFI test, and the implications and mitigations required to handle SEFI.

Related to SEFI is the Single Event Transient (SET), which is typically associated with analog circuits. SETs usually originate from sensitive analog circuits, such as high gain op-amps or comparators. Unless properly accounted for, the "glitch" from the SET can propagate and confuse the control algorithm using the data. An excellent example is the OP467 op-amp. While there is TD radiation data for the component, there is no SET data, yet the on-orbit performance clearly shows SET effects, as do other op-amps from the same technology.

Latchup is a condition in which a high energy particle causes an unintended low impedance path in a component. The latchup event looks like an activated parasitic path in a transistor structure that will remain until power is removed, affecting the functionality of the component. The latchup event typically draws more current than the component is rated for and may result in component damage. SEGR is damage to a CMOS transistor gate from a high energy particle. SEGR is a function of the voltage stress on the component: higher voltage relative to the component capability makes a SEGR event more likely.

Understanding the intent of the radiation testing and its effect on components is key to assessing the impact.

The following radiation questions are intended to surface the radiation-related performance of a COTS component and its potential radiation impacts to the program:

- a) Does the component Total Dose (TD) capability data exist? If not, is it programmatically feasible to test for the data?
- b) Does the component have parametric shifts in the data? Are the shifts something that can be compensated for in the circuit design? Are system design compensations possible?
- c) Does the component SEU capability data exist? If not, is it programmatically feasible to test for the data?
- d) Does the component have an acceptable upset rate in the data (usually driven by the operational availability requirement or ground intervention requirement)? Are the SEU impacts within the allocation? How often must scrubbing or some mitigation be used to meet the mission need? Is the SEU performance something that can be compensated for in the circuit design? Are system design compensations possible?

- e) Does the component SEFI capability data exist? If not, is it programmatically feasible to test for the data?
- f) Does the component have an acceptable upset rate in the data? Are the SEFI impacts within the allocation (ground intervention)? How often must scrubbing or some mitigation be used to meet the mission need? Is the SEFI performance something that can be compensated for in the circuit design? Are system design compensations possible?
- g) Does the component latchup capability data exist? If not, is it programmatically feasible to test for the data?
- h) Does the component latchup? If so, is the latchup performance something that can be compensated for in the circuit design? Are system design compensations possible?
- i) Does the component SEGR capability data exist? If not, is it programmatically feasible to test for the data?
- j) Does the component have SEGR potential? Are the SEGR impacts within the allocation? Is the SEGR performance something that can be compensated for in the circuit design? Are system design compensations possible? Since SEGR can be damaging, can the component be de-rated appropriately without affecting component performance?

### 5.1.2 Useful Life Questions

Useful life is not unique to space components. The challenge is that COTS suppliers typically use Statistical Process Control (SPC) to optimize the component yield and consistency. It may be difficult to obtain *independent* verification of the useful life for the COTS component. Many COTS components are not tested for radiation tolerance, and this may affect useful life. Figure 7 shows the typical reliability "bathtub" curve. For any component (not just COTS) subjected to radiation, the useful life is the relevant item to consider. It should be noted that radiation increases the component wear-out mechanisms (dielectric breakdown). It should also be noted that COTS components that have not gone through burn-in testing may show infant mortality later in the integration cycle than Mil-Spec components.



Figure 7. Useful life with radiation.

For Mil-Spec components, High Temperature Operational Life (HTOL) testing has been used as a predictor of reliability; typically the test data is gathered as part of the component lot burn-in testing. There are drawbacks to this testing. Component failure during HTOL testing is not indicative of the wearout mechanisms for Time Dependent Dielectric Breakdown (TDDB), Bias Temperature Instability (BTI), or Electromigration (from Hot Carrier Injection (HCI)). A component failure during burn-in is typically from a manufacturing defect, not a true dielectric breakdown or metal migration failure. For TDDB, the basic accelerators are temperature and voltage. For COTS components, a typical maximum junction temperature limit is 105 Deg C. This greatly limits the ability to test a COTS component at a high enough temperature to get accelerated failure data. For BTI, the accelerators are also temperature and voltage, so accelerated BTI testing faces the same constraints as TDDB. For HCI, voltage and frequency are the accelerators. The typical HTOL test runs between 1 and 10 MHz, far below the operating frequency of the COTS component in its application (often greater than 1 GHz). Therefore, HTOL testing greatly underestimates life/reliability effects due to HCI (and its effect on metal migration and junction temperature). It is for these reasons that experiments are ongoing to determine other methods of finding the useful life of COTS components.<sup>24</sup> The single largest message here is to pay close attention to the datasheet limits. Mil-Spec components used to be designed to operate at junction temperatures of 125 Deg C for over 20 years (through the sizing of the metal interconnects and the thickness of the dielectric). That cannot be said of all components. Texas Instruments publishes Power on Hours (POH) in some of their COTS component datasheets. The range of POH is between 100,000 and 200,000 hours (11 to 23 years). For a typical RHA Mil-Spec component, the average useful life (with radiation) is between 40 and 100 years, as the metallization and dielectric were designed for 20 years at 125 Deg C and maximum frequency. Derated operating conditions extend the life past the 20-year design life. Typically, for every 10 Deg C reduction in junction temperature, the useful life doubles. Reliability (vs. useful life) calculations based on HTOL testing arrive at Failure in Time (FIT) values that do not recognize the lack of acceleration (from frequency, voltage, and temperature) in the testing and do not represent useful life (how long the component will function in the circuit).
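The two pieces of arithmetic in this paragraph, converting POH to years and applying the "life doubles per 10 Deg C" rule of thumb, can be sketched as follows. The function names are hypothetical, and the doubling rule is only the approximation quoted in the text; it is not a substitute for a component-specific acceleration model.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def poh_to_years(power_on_hours):
    """Convert datasheet Power on Hours (POH) to years of continuous operation."""
    return power_on_hours / HOURS_PER_YEAR


def derated_life_years(design_life_years, design_tj_c, actual_tj_c):
    """Rule of thumb from the text: useful life doubles for every
    10 Deg C reduction in junction temperature."""
    return design_life_years * 2 ** ((design_tj_c - actual_tj_c) / 10.0)


print(round(poh_to_years(100_000), 1))   # 11.4
print(round(poh_to_years(200_000), 1))   # 22.8
print(derated_life_years(20, 125, 115))  # 40.0
print(derated_life_years(20, 125, 105))  # 80.0
```

Under this rule, a component designed for 20 years at 125 Deg C reaches the 40 to 100 year range quoted above when run roughly 10 to 23 Deg C below its design junction temperature.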



Figure 8. Useful life FIT rate including radiation.

Programs seldom use the radiation-limited useful life to form an equivalent FIT rate *for the environment*; instead, they ignore the useful life effects of radiation and use only the non-radiation FIT rate. For example, a 50 krad (Si) RHA comparator may only be good for 10 years at 5 krad (Si) per year. The *effective* FIT rate for the *environment* for a 10-year mission is 11,400, not 37. This was determined by setting the mean time to failure (the point at which the component ceases to function in the circuit) to 10 years and backing out the associated effective FIT rate.
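The back-calculation in this example can be reproduced in a few lines. FIT is failures per 10^9 device-hours, so setting the mean time to failure equal to the radiation-limited useful life gives the effective FIT rate directly. The function names below are hypothetical, and the sketch assumes, as the text does, that useful life is simply the rated total dose divided by the annual dose.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def radiation_limited_life_years(td_rating_krad, dose_rate_krad_per_year):
    """Years until the accumulated dose reaches the component's TD rating."""
    return td_rating_krad / dose_rate_krad_per_year


def effective_fit(useful_life_years):
    """Effective FIT (failures per 1e9 device-hours), taking the mean time
    to failure to be equal to the useful life."""
    return 1e9 / (useful_life_years * HOURS_PER_YEAR)


life_years = radiation_limited_life_years(td_rating_krad=50, dose_rate_krad_per_year=5)
print(life_years)                        # 10.0
print(round(effective_fit(life_years)))  # 11416, i.e. about 11,400 FIT
```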

The resulting useful life of a component (COTS or Mil-Spec) should consider the useful life limitations of the radiation environment, shown in Figure 8. The space community needs to carefully consider how to use components that COTS suppliers designed to last 11 years at reduced thermal conditions (105 Deg C) relative to typical Mil-Spec environments (125 Deg C) when radiation is considered.

When assessing a COTS component's potential useful life impact to the program, the useful life questions to ask are:

- a) What is the required lifetime of the component vs the as-designed lifetime of the component?
- b) What is the expected usage temperature of the component vs the datasheet maximum?
- c) What is the expected voltage(s) of the component vs the datasheet maximum and minimum?
- d) What is the expected frequency of the component vs the datasheet limit?
- e) If HTOL test data was used to develop the component FIT rate, what was the frequency, voltage, and temperature used for the test vs the expected application frequency, voltage, and temperature?
- f) How will deratings for frequency, temperature and voltage affect the applications?
- g) Does the component supplier perform burn-in testing?

- h) Does the supplier have other components (simpler structures) that can be tested to show the process useful life?
- i) What is the component FIT rate and the rationale?
- j) Is the supplier willing to share their design life targets, SPC data, or life test data? This may require a Non-Disclosure Agreement (NDA).
- k) Does the program development plan (schedule, cost) allow for independent component testing or extended EM or FLT unit testing?

#### 5.1.3 Manufacturing Questions

The COTS challenge for manufacturing is multi-faceted. The COTS components in use today use manufacturing processes and tolerances that are not approved for space use (by requirements). Space equipment has unique requirements, such as reworkability, no tin coatings, electrical testing, plastic package handling, and knowledge of the source processing and reliability, to name a few, to ensure as few component escapes as possible. The Mil-Spec process consists of many tests and inspections that are not performed for COTS components. This section uses the Mil-Spec processes as a guide to foster asking the appropriate questions. For a given COTS component, some of these Mil-Spec processes won't apply, but they are included because it is not known in advance which component the designer will choose or how it will be applied. Many of the Mil-Spec tests are done to prevent a discrepant component from being installed, as the penalty for a latent failure can be large (cost and schedule). Below is a list and short description of each type of evaluation or feature to use for guidance. It should be recognized that applying Mil-Spec type testing to COTS may require modifying the limits of the test in accordance with the component datasheet limits.

- a) Plastic packages create a new set of manufacturing process requirements when being used for the space environment that are generally not standard for the space industry. Several of the tests listed below do not apply to plastic packages (such as RGA, PIND, Hermeticity). Plastic packages have been used in space and there is a JEDEC standard to apply (refer to JEDEC 6294 /1 /3).
- b) Component reworkability is the ability to be able to replace any component in an assembly without damaging other areas of the assembly. Using a component that cannot be reworked may lead to scrapping of an entire board should that component have anomalies. While the cell phone industry can throw away a non-conforming board (due to volume), the space industry builds a very small number of boards, with very expensive components and invested time.
- c) It is important to understand if a component is 100% electrically tested from the supplier. Most Mil-Spec components are 100% electrically tested. If a COTS component is not 100% electrically tested, do the items that are not tested matter to the circuit? If the COTS component is not 100% electrically tested, it is likely that it has not been burn-in tested, which is effective at removing infant mortal components. The untested, but guaranteed, parameter may be timing related or a drive voltage or leakage current that may be relevant to the in-circuit use<sup>30</sup>.
- d) It is expected that most COTS components are Restriction of Hazardous Substances (RoHS) compliant, meaning they will have no lead in the solder and will use pure tin instead. For a long duration program, this can be a significant life issue due to tin whisker growth. How and where the tin is deposited is very important for the potential mitigations to be effective. The failure mode from pure tin does not need to be a short from one component to another. Pure tin on component leads that cannot be conformally coated presents a shorting potential to other leads on the same component.
- e) The purpose of the RGA test is to determine if impurities or contamination are present. Impurities or contamination (like H<sub>2</sub>O) can lead to degraded useful life due to internal package corrosion.

- f) PIND testing is used to determine if there is any Foreign Object Debris (FOD) in the package cavity. The presence of FOD may lead to premature component failure as the FOD may move under vibration or in zero-gravity conditions.
- g) Bondpull testing is used to show that the wirebonds from the die to the internal package pads or hybrid substrate are done correctly. The bondpull test (both destructive and non-destructive) measures the bond strength to assess if the bond has any contamination which could compromise the bond or pad adhesion physically or electrically. A weak bond may be an indication of contamination or inadequate bonding time. Both can lead to premature component failure. Failure of a bondpull test after burn-in or temperature testing should be cause for concern as it is likely there is contamination (Bromine, Fluorine) that will, over time, result in bond failure. Components that use dissimilar metals (bond pad and wire) are more likely to suffer from poor bond quality.
- h) Another indicator of component quality is the component's ability to survive shock testing. Shock is a short duration mechanical pulse that can cause deflection (bending) or movement within a component. It is important to understand if the COTS component will be powered on and operational in its application, as this affects the type of shock testing performed. Relays, crystal oscillators, large chip capacitors and resistors, and hybrids are sensitive to shock events. Shock sensitivity is an item where analysis is a poor predictor of performance. The shock spectra at the component may be very different from the shock spectra at the baseplate of the unit, so it is necessary to test in the application under flight-like conditions to understand the component susceptibility.
- i) Related to shock sensitivity is vibration sensitivity. Vibration is a longer dynamic event, basically shaking the unit. Relays, crystal oscillators, large chip capacitors and resistors, and hybrids are sensitive to vibration events.
- j) Burn-in testing by the supplier is an effective method of removing infant mortal components. If the supplier does not do burn-in testing, higher than Mil-Spec fallout should be expected<sup>30</sup>.
- k) Documentation is a key component to assess a component's quality. Inspection of the lot travelers allows the quality of the component to be assessed independently by the program. Without this data, it becomes a lot harder to assess the COTS component quality.
- l) Many of the COTS components today use very small geometries to increase function while using less space. These small geometries may also result in components that are more sensitive to Electro-Static Discharge (ESD). Knowing the ESD sensitivity is important not only for how to handle the component during manufacturing and assembly, but also for the component's sensitivity in the application to space discharge events (which can be minimized, but not eliminated).
- m) Pressure testing of components indicates that the materials used will not fail due to dielectric breakdown failures from voltage.
- n) Another measure of a component's quality is its solderability. Solderability is a measure of the ease with which a soldered joint can be made to the lead material (the solder's wetting properties). This affects the manufacturability of the component with the board. The better the solderability, the less time and heat required to adequately solder the component to the board the first time. Poor solderability can be an indication of substandard materials or contamination.
- o) Lid seal is a measure of the component's hermeticity and is done by placing the component in a vacuum and measuring the leak rate of any expelled gas (similar to the RGA test).
- p) Hermeticity is important to ensure that contaminants can't get into the package over time and compromise the component's useful life.
- q) Another useful test for components bound for space is radiography. Radiography is a nondestructive inspection of the internals of a packaged component for defects that would otherwise not be visible post lid seal. This allows a qualitative assessment of the package internal quality after the component has completed processing.

- r) A component's ability to resist solvents is important as many solvents are used in the processing of electronic PWBs. The resistance to solvents test shows that solvents do not cause anomalous behavior, mechanical damage, or deteriorate the finishes (such as the markings) of the component. Reaction to solvents is an indication of potential sub-standard materials or a counterfeit component.
- s) Scanning Acoustic Microscopy (SAM) is a test that uses sound waves to detect internal delamination, internal voids, or non-homogeneous material density. Anomalies found during a SAM test may be signs of poor workmanship, substandard materials, or counterfeit components.
- t) Moisture testing determines the effect of moisture on the component. Moisture testing indicates if the moisture environment (humidity) affects the component in a negative way. Negative effects may be corrosion, organic growth, and package absorbed moisture (which may affect manufacturability, such as "pop-corning").
- u) Package lead testing is important to determine the lead finish and integrity. This test measures the resistance of the lead to metal fatigue, and the peel strength of solder pads. Poor performance in the lead testing may be an indication of substandard materials, poor bonding (soldering) of the lead to the package, contamination, poor pad adhesion to the package or counterfeit components.
- v) Another of a myriad of quality informing tests is the die shear test. The die shear test measures the amount of force required to force the die from its mounting. A poor die shear test result may be an indication of substandard materials, poor die bonding (epoxy, eutectic, solder), contamination during the assembly, or a counterfeit component.
- w) The Lid Torque test is similar to the die shear test, but this time the shear strength of the lid seal to the package is the focus. A poor lid torque test result may indicate substandard materials, poor lid seal (seam or solder), contamination during the assembly, or a counterfeit component.
- x) The column pull test is specific to column grid and ball grid array packages. The column pull test measures the pull strength of the column or ball attached to the package. Poor performance in this test may indicate substandard materials, poor column or ball bonding (column, ball, or pad failure), contamination during the assembly, or a counterfeit component.
- y) Mil-Spec component assembly at the supplier typically has on-site inspection as a condition for providing Mil-Spec components. An on-site external visual inspection is typical for a Mil-Spec component that is not expected for COTS components. Buyers of COTS components (and their customers) need to understand that such intrusive inspections on a COTS line may not be possible.
- z) Similar to the external visual inspection is the pre-cap inspection. Pre-cap inspection is an in-process inspection prior to placing and sealing the lid on the component or soldering on the edge cap (particularly important for complex microelectronics, hybrids, optical components, Multi-Layer Ceramic Capacitors (MLCC), and large resistors), performed at a point where obvious assembly defects are easy to pick out. The pre-cap inspection can protect against used, cloned and counterfeit components as well as reject obviously defective components.
- aa) Internal visual inspection is another in-process step performed on site (like external visual and pre-cap) to gain knowledge of the component quality during its processing and assembly and provides an opportunity for rejection of obviously defective, used, cloned and counterfeit components.
- ab) Finally for the manufacturing tests and methods, Scanning Electron Microscopy (SEM) is used to determine interconnect metallization. This test is a low-level indication of the process quality after wafer fab.

The intent of the manufacturing questions is to surface issues that COTS components present to the space manufacturing capabilities and ruggedness requirements. When assessing a COTS component's potential manufacturing impacts to the program, the manufacturing questions to ask are:

- a) Is the component in a plastic package?
- b) Is the component package hermetically sealed?
- c) Is the component such that reworkability may not be possible?
- d) Is the component 100% electrically tested?
- e) Is the component burn-in tested?
- f) Is the component a flipchip or Wafer Level Chip Scale Packaging (WLCSP)?
- g) Does the component have pure tin leads?
- h) Does the component have a Residual Gas Analysis (RGA) report or data?
- i) Does the component have a Particle Impact Noise Detection (PIND) report or data?
- j) Does the component have a bondpull report or data?
- k) Is the component shock tested? To what level and is there test data?
- l) Is the component vibration tested? To what level and is there test data?
- m) Are lot travelers for the component available for review?
- n) Is there Electro-Static Discharge (ESD) data available for the component?
- o) Is there barometric pressure data available for the component?
- p) Is there solderability data available for the component?
- q) Is there package lid seal data available for the component?
- r) Is there radiography data available for the component?
- s) Is there resistance to solvent data available for the component?
- t) Is there Scanning Acoustic Microscopy (SAM) data available for the component?
- u) Is there moisture resistance data available for the component?
- v) Is there lead finish data available for the component?
- w) Is there die shear data available for the component?
- x) Is there package column pull data available for the component?
- y) Is there an external visual inspection step in the processing or data available for the component?
- z) Is there pre-cap (soldering of the lid) inspection step in the processing or data available for the component?
- aa) Is there an internal visual inspection step in the processing or data available for the component?
- ab) Is there Scanning Electron Microscopy (SEM) data available for the component?
- ac) Does the component (or unit) contain grease, plastic, bonding material or loctite, polyvinylchloride (PVC), or natural rubber that can outgas?
- ad) Does the component (or unit) contain zinc, or cadmium or any other material that sublimates?
- ae) Does the component (or unit) contain split, star, or tooth type lockwashers that may generate metallic FOD?
- af) Does the component (or unit) contain hazardous materials (magnesium, beryllium, mercury, or selenium)?
- ag) Does the component (or unit) contain solder fluxes other than ROL0 or ROL1 type?
- ah) Does the component (or unit) contain titanium where chlorinated solvents, chlorinated cutting fluids, anhydrous methyl alcohol, or fluoridated hydrocarbons were used in the production of the component (or unit)?
- aj) Does the component have any formable or flexible leads that use electroless nickel plating?

### 5.1.4 Trust Questions

The COTS challenge when it comes to trust is about information or experience. The first usages of COTS with little information from suppliers will require a new level of focus and care not experienced with Mil-Spec components. Repeated positive experience may be needed to "trust" the supplier on their unshared proprietary information and their handling of components with potential US security implications.

The intent of the trust questions is to surface issues that COTS components present regarding trust in the knowledge of component quality and the risk of technology transfer to foreign entities. When assessing a COTS component's potential trust impacts to the program, the trust questions to ask are:

- a) Is the component foreign sourced? This could include being from a foreign owned company, foreign manufacture, foreign package, assembly and test, or foreign sourced IP for critical functions.
- b) Are foreign nationals operating the manufacturing line?
- c) Does association with the component present risk of transferring knowledge of the system or its intent (personnel)?
- d) Is there risk of transferring the technical details of the component (FPGA, ASIC) to a foreign entity or transferring knowledge of the system and its intent (personnel)?
- e) Does the component have heritage in applications other than spacecraft?
- f) Are the other applications for a heritage component relevant to the space application?
- g) What is the level of confidence that the component will do what is specified and nothing else? What is this confidence based on?

#### 5.1.5 Environment Questions

The environmental questions for using COTS in space are complex, as there are multiple environmental pressures (aging while under radiation, thermal, vacuum, EMC/EMI) where the long-term data for the COTS component is non-existent, proprietary (making access difficult), or only partially available, producing gaps in information.

The intent of the environmental questions is to surface issues that COTS components present to their usage to allow informed decisions where there may be incomplete information. When assessing a COTS component's potential environment impacts to the program, the environment questions to ask are:

- a) Is there data for the component to show how the component ages under radiation?
- b) Is there data to show the aging effects of the component?
- c) Is there data to show the effects of temperature on the performance of the component?
- d) Is there data to show that the component is compatible with a vacuum environment?
- e) Is the component vacuum sensitive? If so, what is the effect?
- f) Are there unique storage or handling requirements for the component?
- g) Is the component sensitive to EMC/EMI?

#### 5.2 COTS Mitigations for Flow Section E

The following mitigations are not meant to be an all-encompassing set, but a good set of mitigations gathered for vulnerabilities for a COTS component. Other mitigations in other topic areas may apply. Since the capabilities of COTS components change rapidly, it is expected other mitigations will be recognized for new technologies as they become available. This section is organized in the same order as the scope questions: radiation (1), useful life (2), manufacturing (3), trust (4) and environment (5). For quick reference, Tables 5.2.1-1 through 5.2.5-1 provide the mitigation identifier "(numberletter)" for each mitigation topic, for example, local shielding (1a). When considering the potential mitigations, duplication of tests on components should be avoided to prevent component overstress (burn-in, for example).

#### 5.2.1 Radiation

The intent of the radiation flow is to understand the scope of the radiation parameters of the COTS (or any non-RHA) component that may need mitigation. The radiation section covers Total Dose (TD), Single Event Upsets (SEUs), Single Event Functional Interrupts (SEFI), Latchup, and Single Event Gate Rupture (SEGR), which also encompasses Single Event Burnout (SEB). Figure 9 shows the radiation mitigations flow. For example, potential mitigations for TD can be found in, but are not limited to, mitigations 1a-f, each of which is described in the section whose title carries the corresponding identifier. Table 20 shows the potential radiation mitigations.

| Mitigation | Radiation                                |
|------------|------------------------------------------|
|            | Total Dose Radiation                     |
| 1a         | Local Shielding                          |
| 1b         | Power Strobing                           |
| 1c         | Increased Redundancy                     |
| 1d         | N for M redundancy                       |
| 1e         | Multiple Images                          |
| 1f         | Multiple components in parallel          |
|            | Single Event Upsets (SEU)                |
| 1g         | Periodic Refresh Period                  |
| 1h         | Error Detection And Correction (EDAC)    |
| 1j         | Triple Modular Redundancy (TMR)          |
| 1k         | FPGA based scrubbing                     |
| 1l         | Zener Diodes/clamps/Filters              |
| 1m         | Software Rollback                        |
|            | Single Event Functional Interrupt (SEFI) |
| 1n         | Local Refresh                            |
| 1o         | Component Reset                          |
| 1p         | Power Cycle                              |
| 1q         | CONOPS/System                            |
|            | Latchup                                  |
| 1r         | Current limiting                         |
| 1s         | Swap/Power Cycle (OBFM)                  |
| 1t         | Auto Power Cycle (hardware)              |
|            | Single Event Gate Rupture                |
| 1u         | Conservative Derating                    |

Table 20. Potential Radiation Mitigations



Figure 9. Radiation mitigations flow.

#### 5.2.1.1 Total Dose

There are multiple potential mitigations identified for COTS component TD shortfalls. These can be found in mitigations 1a through 1f (below).

#### 5.2.1.1.1 Local Shielding (1a)

For components with a radiation total dose (TD) vulnerability, local physical shielding (aluminum or tantalum) may be used to reduce the effective mission dose. When physical shielding alone is insufficient or is not even feasible, architectural mitigations may be considered to limit total dose effects. Lot testing can be very helpful to determine how much shielding is required for a given lot.<sup>2, 5, 21, 24</sup>

#### 5.2.1.1.2 Power Strobing (1b)

Power strobing can be used for circuits that need to be active for a short time duration (valve driver transistors, EEPROMs, or FLASH (NAND or NOR) memory during system boot-up). Implementing the power strobe function requires additional circuitry but may allow the use of a component that is vulnerable to radiation when under bias and where removing that bias extends the radiation tolerance. Power strobing can also be used for minimizing the power used in a system by turning circuits off that are not needed for the particular mission function or phase.<sup>2, 24</sup>

#### 5.2.1.1.3 Increased Redundancy (1c)

To address useful life concerns, some programs have flowed down redundancy requirements. To meet these requirements, it is necessary to implement redundancy at the various element levels of a design. For example, it is common for PWBs to have both primary and redundant circuits. It is also common for redundancy to exist at the unit or sub-system level. For example, a Navy satellite has three on-board computers, providing redundancy at the unit level (hot, warm, and cold). Also, at the component level, Triple Modular Redundancy (TMR) is implemented in FPGAs to mitigate the effects of latch-up.

With increased redundancy, where one element is in use and multiple (unbiased or powered-off) copies are held in reserve, physical isolation is key to prevent propagation of failure modes that would impact the redundant element. For PWBs with prime and redundant circuits, the IPC (2221) standard is not sufficient. The IPC isolation guidelines do not recognize the single point failure implications and the increased separation that is required to protect the circuits from single random defects. This concept can be extended from circuit boards to units too. For circuits and assemblies that utilize redundancy, semiconductors and relays provide physical isolation that will prevent the propagation of a failure mode.<sup>12, 24, 26</sup>

#### 5.2.1.1.4 N for M Redundancy (1d)

This is a method for increasing the reliability (against radiation or useful life limitations) of an element by using additional active or stand-by (powered or un-powered) elements. For example, three identical elements are in place, where two elements are active with the remaining element in a stand-by (un-powered) state.<sup>12, 24</sup>
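
As a rough illustration of how a spare element raises reliability, the sketch below computes the probability that at least the needed number of installed elements survive. It assumes identical, independent elements that all age at the same rate (as hot spares would) and perfect switching; un-powered spares would generally fare better. The function name and the 0.90 per-element reliability are hypothetical and not taken from this report.

```python
from math import comb

def k_of_n_reliability(needed: int, installed: int, r_element: float) -> float:
    """Probability that at least `needed` of `installed` independent,
    identical elements still work, given the per-element reliability."""
    if not 0 <= needed <= installed:
        raise ValueError("needed must be between 0 and installed")
    if not 0.0 <= r_element <= 1.0:
        raise ValueError("r_element must be a probability")
    return sum(
        comb(installed, i) * r_element**i * (1.0 - r_element) ** (installed - i)
        for i in range(needed, installed + 1)
    )

# Hypothetical numbers: each element has a 0.90 chance of surviving the mission.
two_of_two = k_of_n_reliability(2, 2, 0.90)    # two needed, no spare
two_of_three = k_of_n_reliability(2, 3, 0.90)  # two needed, one spare
```

With these placeholder numbers the single spare raises the result from about 0.81 to about 0.97, which is the kind of comparison a program might make before accepting the added mass and power.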

#### 5.2.1.1.5 Multiple Images (1e)

For non-volatile storage components, multiple images can be stored, each protected with error detection and correction (EDAC) and a program code checksum, with SW logic that selects the intact image. For non-volatile storage it is recommended that a periodic refresh (rewriting the entire contents) be performed every 3 to 10 years, depending on the technology used. It should also be recognized that some NAND or NOR FLASH technologies may not be suitable even with the above-mentioned mitigations.<sup>3, 24</sup>
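
The image selection logic can be sketched as follows. This is a minimal illustration only: it assumes each image is stored alongside a CRC32 recorded when the image was written, and it uses Python's standard `zlib.crc32` as a stand-in for whatever checksum a program actually selects. The names `StoredImage` and `select_intact_image` are hypothetical.

```python
import zlib
from typing import Optional, Sequence, Tuple

# Each stored image is (payload bytes, CRC32 recorded when the image was written).
StoredImage = Tuple[bytes, int]

def select_intact_image(images: Sequence[StoredImage]) -> Optional[int]:
    """Return the index of the first image whose CRC32 matches, else None."""
    for index, (payload, stored_crc) in enumerate(images):
        if zlib.crc32(payload) == stored_crc:
            return index
    return None

good = b"flight software image"
corrupted = b"flight software imagf"  # simulated upset in image 0
images = [(corrupted, zlib.crc32(good)), (good, zlib.crc32(good))]
selected = select_intact_image(images)  # image 0 fails its check, image 1 passes
```

Returning `None` when no image passes leaves the decision about what to do next (safe mode, ground intervention) to the fault protection design.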

#### 5.2.1.1.6 Multiple Components in Parallel (1f)

This approach uses multiple components in a circuit, one powered and active, the others powered off and inactive. When the active component fails (from radiation, for example), the next un-powered component can be used to resume the function. A potential mitigation for the usage of COTS ADC components is to architect the board to use multiple ADCs, buffered from one another. When one has degraded to the point of no longer being acceptable, a new one can be switched in without the overhead of an entire board. For example, put three ADCs in parallel, each buffered (input and output) and on a separate power switch, with each feeding the common digital interface. FSW may be used to select the next component in the string to minimize impact to the mission.<sup>2, 3, 8, 24</sup>
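
The FSW selection step might look something like the sketch below. It is only a bookkeeping illustration: the class name `AdcString` is hypothetical, and the actual power switching and degradation detection are hardware and program specific and are not modeled here.

```python
from typing import List, Optional

class AdcString:
    """Tracks which of several parallel, individually switched ADCs is powered."""

    def __init__(self, count: int) -> None:
        if count < 1:
            raise ValueError("need at least one ADC")
        self.retired: List[bool] = [False] * count
        self.active: Optional[int] = 0  # index of the powered ADC, None if exhausted

    def retire_active(self) -> Optional[int]:
        """Mark the active ADC as degraded and select the next unused one."""
        if self.active is None:
            return None
        self.retired[self.active] = True
        for index, is_retired in enumerate(self.retired):
            if not is_retired:
                self.active = index
                return index
        self.active = None  # every ADC in the string has been used up
        return None

string = AdcString(3)
first_swap = string.retire_active()   # ADC 0 retired, ADC 1 selected
second_swap = string.retire_active()  # ADC 1 retired, ADC 2 selected
third_swap = string.retire_active()   # no spares remain
```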

#### 5.2.1.2 Single Event Upsets

There are several mitigations available to lessen the effects and propagation of SEUs, detailed in mitigations 1g through 1m. Most of the techniques have potential significant board, unit, subsystem, on board fault protection, and system impacts. Software rollback can be used for existing COTS units.

#### 5.2.1.2.1 Periodic Refresh (1g)

The scheduling of a periodic reset period or refreshing of control registers has been in use for decades. Most processing designs have a basic processing cycle. The resetting of the unit can be external, if there is a reset input, or internal, provided there is a defined time in the processing period that can accommodate the downtime. Setting up a dedicated time in the processing cycle to refresh addresses and control registers of susceptible components may result in reducing the vulnerable period-of-time for multiple errors to occur to an acceptable level. This technique has implications for the hardware, traffic schedule, FSW, and the on-board fault protection. The refresh period can also be used to detect any processing overrun conditions from the previous processing cycle as well as resetting key interfaces (which can be monitored using data wrap tests). This technique has been used successfully on Class A national asset vehicles.<sup>12</sup>

#### 5.2.1.2.2 Error Detection and Correction (1h)

For memory, EDAC helps, but may not be a full-scope solution depending on the expected error rate. The robustness of EDAC can be increased by using byte or nibble EDAC. Additional error scrubbing can be achieved if control registers are refreshed when a single bit error is detected (update the control register and write the corrected word back to memory) during that processing cycle. EDAC is a board and FPGA level technique. It is important to consider protection for the memory used in FPGA IP (Intellectual Property) functions. Many IP functions have an EDAC option that can be enabled during configuration. EDAC is highly recommended for FLASH memories.<sup>2, 7, 11, 21, 24, 29</sup>
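
To illustrate the principle, the sketch below implements a Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. This particular code is chosen here only because it is small enough to read; it is not a scheme specified by this report. Flight EDAC is typically a wider code implemented in hardware, and this small code will miscorrect if two bits are upset in the same word.

```python
from typing import List, Tuple

def hamming74_encode(nibble: int) -> List[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]

def hamming74_decode(codeword: List[int]) -> Tuple[int, int]:
    """Return (corrected nibble, error position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 | (s2 << 1) | (s4 << 2)  # syndrome is the 1-based bit position
    if position:
        c[position - 1] ^= 1  # flip the single bit the syndrome points at
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, position

word = hamming74_encode(0b1011)
word[5] ^= 1  # simulate a single-bit upset at position 6
recovered, where = hamming74_decode(word)
```

A non-zero error position is the cue to write the corrected word back to memory, as described above.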

#### 5.2.1.2.3 Triple Modular Redundancy (1j)

Another technique is to use triple modular redundancy (TMR) at either the component or functional level, with the voter (two of three) determining the correct output(s). It is important that the voter function not create a single point of failure. The functions being voted must be synchronized to avoid glitchy outputs from the voters, or the stage that receives the signals from the voters must be tolerant to glitches (not edge sensitive). Since the TMR approach "masks out" a function with an error, it is important to provide a method to verify the hardware has no latent defects, i.e., it must be shown that all three voted functions and the voters are not defective. It is also recommended to have feedback from the voters to allow periodic refresh of the functions feeding the voters. The feedback mechanism should be designed so as not to defeat the TMR intent (no SPFs). While effective, this technique impacts size, weight, power, and cost (SWaPC) of the circuit. TMR is a unit level design choice and is only realistically feasible for 5yr+ missions due to development time. This technique has been used successfully on many Class A programs.<sup>2, 7, 11, 12, 21, 24</sup>
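
The two-of-three vote and the need to expose masked errors can be illustrated in software as below. This is a sketch of the logic only; in practice the voter is implemented in hardware, and the function names are hypothetical.

```python
from typing import List

def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise two-of-three vote across three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)

def disagreeing_copies(a: int, b: int, c: int) -> List[str]:
    """Names of copies that differ from the voted value, for health reporting."""
    voted = majority_vote(a, b, c)
    return [name for name, value in (("A", a), ("B", b), ("C", c)) if value != voted]

copy_a, copy_b, copy_c = 0b1010, 0b1010, 0b1110  # copy C has one upset bit
voted = majority_vote(copy_a, copy_b, copy_c)
suspects = disagreeing_copies(copy_a, copy_b, copy_c)
```

Reporting which copy disagreed, rather than silently masking it, is one way to support the latent defect verification and periodic refresh described above.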

#### 5.2.1.2.4 FPGA-based Scrubbing (1k)

A more invasive technique is to use an FPGA with internal EDAC to scrub the memory independent of the processor (shares the memory bus). This technique is a board level technique and may reduce the effective processor throughput. This technique may also be used when the FPGA is the memory controller (scrub is done without the processor's knowledge).<sup>24</sup>

#### 5.2.1.2.5 Zener Diodes, Clamps, and Filters (1l)

For simpler analog components, zener diodes have been used to clamp the output of an analog circuit to prevent downstream overvoltage damage induced by an SEU. Zener clamping is a board level technique for analog type components. Op-amps also suffer from high energy particle effects, so to protect downstream circuits, adding zener diodes or an RC (resistor/capacitor) filter may be a mitigation against railing the output (both positive and negative). Depending on the frequency sensitivity of the circuit, bandwidth limiting filtering may follow an op-amp to further mitigate the impact of Single Event Transients (SETs). Another technique, which can be implemented at the subsystem or system level, is digital filtering. The analog circuit may genuinely require high gain or bandwidth for its function, so digital filtering (application of thresholds or sanity checks) by either the downstream hardware or software can reject readings that change by more than the hardware could physically produce in one sample-to-sample interval, preventing large SET-caused sensor excursions from reaching the control algorithm.<sup>21, 24</sup>
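
A sample-to-sample sanity check of this kind might be sketched as follows. The function name and the `max_step` threshold are hypothetical; the threshold would come from an analysis of how fast the measured quantity can really change between samples.

```python
from typing import Iterable, List

def reject_set_excursions(samples: Iterable[float], max_step: float) -> List[float]:
    """Hold the last accepted value when a sample jumps by more than max_step."""
    accepted: List[float] = []
    for sample in samples:
        if accepted and abs(sample - accepted[-1]) > max_step:
            accepted.append(accepted[-1])  # implausible jump: hold last good value
        else:
            accepted.append(sample)
    return accepted

readings = [1.00, 1.02, 4.90, 1.03, 1.05]  # third sample railed by a transient
filtered = reject_set_excursions(readings, max_step=0.5)
```

One limitation of this simple form is that a genuine step change larger than `max_step` would be rejected indefinitely, so a practical design would likely add a persistence count before accepting a new level.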

#### 5.2.1.2.6 Software Rollback (1m)

Radiation hardened by software (RHBSW) techniques include periodic checkpoints and restart techniques that allow the processor to roll back to a previously saved state in the event of a fault. These checkpoints can be written to on-board storage and consist of the processor's critical configuration, registers, and memory content, as well as the current state of the application(s), which helps with successful recovery of the processor to its previous state. RHBSW can be applied to new or existing COTS units, but can be impactful on memory, throughput, and processing timeline.<sup>24</sup>
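
The checkpoint and roll-back idea can be illustrated at the application level as below. This is a simplified sketch: the checkpoint is held in ordinary memory as a stand-in for protected on-board storage, a `RuntimeError` stands in for a detected upset, and the class name `CheckpointedTask` is hypothetical.

```python
import copy
from typing import Any, Callable, Dict

class CheckpointedTask:
    """Keeps a deep copy of application state to roll back to after a fault."""

    def __init__(self, initial_state: Dict[str, Any]) -> None:
        self.state = initial_state
        self._checkpoint = copy.deepcopy(initial_state)

    def save_checkpoint(self) -> None:
        self._checkpoint = copy.deepcopy(self.state)

    def run_step(self, step: Callable[[Dict[str, Any]], None]) -> bool:
        """Run one step; on a detected fault restore the last checkpoint."""
        try:
            step(self.state)
        except RuntimeError:  # stand-in for a detected upset
            self.state = copy.deepcopy(self._checkpoint)
            return False
        return True

def good_step(state: Dict[str, Any]) -> None:
    state["count"] += 1

def upset_step(state: Dict[str, Any]) -> None:
    state["count"] = -999  # state is corrupted before the fault is detected
    raise RuntimeError("simulated upset")

task = CheckpointedTask({"count": 0})
task.run_step(good_step)
task.save_checkpoint()
ok = task.run_step(upset_step)  # fault detected, state rolled back
```

The deep copies are where the memory and throughput cost mentioned above shows up, which is why checkpoint frequency is a design trade.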

#### 5.2.1.3 Single Event Functional Interrupt

The SEFI mitigations, 1n through 1q, are a mixture of component and unit mitigations. Reset can be either a component or unit mitigation but may have processing timeline impacts. Power cycle mitigation has potential impacts to the processing timeline and fault management. These mitigations at the component level may introduce sneak paths that need to be managed. CONOPS mitigations are best done at the unit level but have scheduling and fault management impacts.

#### 5.2.1.3.1 Local Refresh (1n)

The component can be locally refreshed to clear the SEFI as discussed above for SEUs, with a periodic refresh.<sup>12, 24</sup>

#### 5.2.1.3.2 Component Reset (1o)

A reset to the component is the best way to ensure the effects of the SEFI are removed as internal registers may have changed, followed by re-initialization. A component level reset can be designed in if the register that is vulnerable is not accessible. A watchdog timer or wraparound test can provide the necessary monitor of off-nominal performance.<sup>2, 11, 12, 23, 24</sup>

#### 5.2.1.3.3 Power Cycle (1p)

A local power switch to the component can be used, which may need to be followed by a reset, or the assembly (or sub-circuit) may be designed to swap autonomously. Consider the usage of a watchdog timer on the COTS or provide a path to the COTS for a wraparound test.<sup>2, 11, 12, 16, 23, 24</sup>

#### 5.2.1.3.4 CONOPS or System Action (1q)

CONOPS can be used to disable or turn off sensitive (generally payload) equipment during vulnerable time periods in the orbit (such as passes over the South Atlantic Anomaly, SAA). Resiliency for COTS may also be provided at the system level (multiple vehicles to spread the system risk).<sup>2, 7, 8</sup>

#### 5.2.1.4 Latchup

Latchup is more severe than SEFI or SEU as it can lead to irreversible damage, so mitigations tend to be quick reaction (hardware) mechanisms. Mitigations 1r through 1t discuss three options to prevent latchup from turning into permanent damage, which redundancy will not solve.

#### 5.2.1.4.1 Current Limiting (1r)

The simplest form of current limiting is a series resistor on the supply pins, combined with the ability to power cycle the assembly. Other techniques use an active current limiting circuit that can notify the fault protection of a current limit trip.<sup>2, 16, 23, 24</sup>

#### 5.2.1.4.2 Swap or Power Cycle by Fault Protection (1s)

An architectural mitigation is to have the current limiting circuit feed its status to the On-Board Fault Protection (OBFP) system and for the OBFP (hardware or software, depending on the time requirements for a swap) to turn off the latched-up assembly and enable the backup assembly.<sup>2, 12, 16, 23, 24</sup>

#### 5.2.1.4.3 Auto Power Cycle by the Hardware (1t)

Another technique is to design in auto power cycle with a hot-backup, active-redundant, swap architecture at the Circuit Card Assembly (CCA) level (N for M active redundancy). Designing the power supplies to support this kind of functionality from the start will widen the choice of processors (or other sub-assemblies that are SEL sensitive) while limiting mission downtime. The swap process should accommodate all combinations automatically so that the processor can switch to the desired backup processor without ground intervention.<sup>12, 23, 24</sup>

#### 5.2.1.5 Single Event Gate Rupture

For components susceptible to SEGR, the only real solution, discussed in mitigation 1u, is conservative component usage, with conservative deratings. A COTS unit with a component vulnerable to SEGR may not be mitigatable except by shortened life expectations or N for M redundancy.

#### 5.2.1.5.1 Conservative Derating (1u)

For components that have a SEGR vulnerability, the typical mitigation is a very conservative derating of the drain to source voltage parameter, typically for discrete MOSFETs. For example, MOSFET transistors can experience gate rupture that is dependent on the drain to source voltage. Limiting the applied drain to source voltage (Vds) to 40% or less results in effective mitigation, therefore the architectural mitigation is to limit the Vds applied in the design. FLASH memory technologies may be vulnerable to SEGR during programming operations which must be considered in the mission operation and life requirement. It should also be recognized that some NAND or NOR FLASH technologies may not be suitable even with the above-mentioned mitigations due to internal programming voltages that may exceed SEGR limits.<sup>8, 9, 11, 21, 23, 24</sup>
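
A design check for this derating might be sketched as below. It assumes the 40% figure is applied to the device's rated drain to source voltage, which is an interpretation rather than something stated explicitly above; the function names and the 100 V example rating are hypothetical.

```python
def max_allowed_vds(rated_vds: float, derating: float = 0.40) -> float:
    """Highest applied Vds permitted under the derating factor (assumed to be
    a fraction of the rated Vds)."""
    if rated_vds <= 0 or not 0.0 < derating <= 1.0:
        raise ValueError("rated_vds must be positive and derating in (0, 1]")
    return derating * rated_vds

def vds_within_derating(applied_vds: float, rated_vds: float,
                        derating: float = 0.40) -> bool:
    """True when the applied Vds does not exceed the derated limit."""
    return applied_vds <= max_allowed_vds(rated_vds, derating)

# Hypothetical 100 V rated MOSFET evaluated at two candidate bus voltages.
ok_on_28v_bus = vds_within_derating(applied_vds=28.0, rated_vds=100.0)
ok_on_50v_bus = vds_within_derating(applied_vds=50.0, rated_vds=100.0)
```

In this example the 28 V application passes and the 50 V application does not, so the latter would call for a higher-rated device or a different topology.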

#### 5.2.2 Useful Life

The intent of the useful life flow is to understand the scope of the knowledge needed to assess useful life of the COTS component that may need mitigation. The useful life section covers HTOL temperature, HTOL voltage, HTOL frequency and SPC data. Figure 10 shows the useful life mitigations flow. Table 21 shows the potential useful life mitigations.

| Mitigation | Useful Life                                                     |
|------------|-----------------------------------------------------------------|
|            | HTOL T ~ usage (Acquire test data for guidance)                 |
| 2a         | Life test at usage temp                                         |
|            | HTOL V ~ usage (Acquire test data for guidance)                 |
| 2b         | Life test at usage voltage                                      |
|            | HTOL F ~ usage (Acquire test data for guidance)                 |
| 2c         | HCI or EM: Life test at usage frequency                         |
|            | No HTOL or reliability data                                     |
| 2d         | Extensive accelerated testing, extensive EM or FLT unit testing |
| 2e         | Focused accelerated testing of the COTS                         |
| 2f         | Focused testing of simpler structure from the same process      |
| 2g         | Effects on system reliability                                   |

Table 21. Potential Useful Life Mitigations



Figure 10. Useful life mitigations flow.

#### 5.2.2.1 High Temperature Operational Life Temperature

There are several mitigations available to aid in determining the useful life of a component, detailed in mitigation 2a through 2g. Most of the techniques are component level testing techniques, but mitigations 2d through 2f can be done at the board or unit level to test the components in question past the burn-in period to attempt to limit infant mortality failures occurring during vehicle integration.<sup>30</sup>

#### 5.2.2.1.1 Accelerated Life Test for Temperature (2a)

Perform post purchase HTOL test at temperature that allows for limited life acceleration relative to usage temperature (junction).<sup>5, 21, 24, 25</sup>
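
This report does not prescribe an acceleration model. As one common assumption, the Arrhenius relation can be used to estimate how much life acceleration a given test temperature provides over the usage junction temperature. The sketch below uses that model with a placeholder activation energy of 0.7 eV; the real value depends on the failure mechanism and should come from supporting data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_acceleration(t_use_c: float, t_stress_c: float, ea_ev: float) -> float:
    """Acceleration factor of a stress junction temperature over the usage
    junction temperature (both in deg C), under the Arrhenius model."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))

# Hypothetical case: 55 C usage junction, 125 C test junction, Ea = 0.7 eV.
af = arrhenius_acceleration(t_use_c=55.0, t_stress_c=125.0, ea_ev=0.7)
equivalent_use_hours = 1000.0 * af  # usage hours represented by 1000 test hours
```

Lowering the test temperature toward the usage temperature drives the factor toward 1, which is the limited acceleration case described above.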

#### 5.2.2.2 High Temperature Operational Life Voltage

#### 5.2.2.2.1 Life Test at Usage Voltage (2b)

Perform post purchase HTOL test at voltage that allows for limited life acceleration relative to usage voltage.<sup>5, 21, 24, 25</sup>

#### 5.2.2.3 High Temperature Operational Life Frequency

#### 5.2.2.3.1 Life Test at Usage Frequency (2c)

Perform post purchase HTOL test at frequency that is the same as the application.<sup>5, 21, 24, 25, 30</sup>

#### 5.2.2.4 No HTOL or Reliability Data

#### 5.2.2.4.1 Extensive Accelerated Testing, EM or Unit Testing (2d)

Many commercial suppliers do not do burn-in testing to weed out the infant mortal components. A potential method used in the past is to perform the infant mortality testing at the board or unit level through increased unit and board test time, depending on the program schedule, to put many operational hours on the components. Consider trending key parameters of the COTS (component or unit) vs a nominal analysis. Consider using BOL limits for unit test limits.<sup>11, 16, 21, 25, 30</sup>

#### 5.2.2.4.2 Focused Accelerated Testing of the COTS Component (2e)

Many commercial components are not HTOL tested. Statistical process control is used, which is about component consistency and less about useful life. Consider the focused testing of the COTS to include increased temperature, voltage and frequency to determine useful life. Consider conducting a component level life test (MIL-STD-883, Method 1005), thermal cycle testing, or step stress testing. Parameters to consider include timing, leakage current, output drive, slew rate, supply current, quiescent current, and input threshold sensitivity. Consider trending key parameters of the COTS vs a nominal analysis. Consider using BOL limits for unit test limits.<sup>4, 11, 16, 21, 23, 25</sup>

#### 5.2.2.4.3 Focused Testing of Like Process Structures (2f)

Many commercial components are not HTOL tested. Statistical process control is used, which is about component consistency and less about useful life. Consider the focused testing of a simpler component from the same process to include increased temperature, voltage, and frequency to determine useful life.

Parameters to consider include timing, leakage current, output drive, slew rate, supply current, quiescent current, and input threshold sensitivity. Consider trending key parameters of the COTS vs a nominal analysis. Consider a non-disclosure agreement to obtain the process life data.<sup>18</sup>

#### 5.2.2.4.4 Effects on System Reliability (2g)

When a COTS component is used, a functional FMECA may not surface failure modes that the system needs to be aware of. Consider a part level FMECA of the COTS component and those connected to the COTS component for a complete understanding for the system fault implications. The component FIT rate is used to support the system reliability analysis. What FIT rate is used and why (supporting data)? COTS suppliers will supply a FIT rate, but not the supporting data used to derive that FIT rate, so caution is appropriate. N for M redundancy is another way to augment the system reliability for components.<sup>4, 5, 11, 12, 16, 21, 23</sup>
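
To show how a quoted FIT rate feeds a reliability estimate, the sketch below converts FIT (failures per 10^9 device-hours) into a mission survival probability. It assumes a constant failure rate (exponential model), which ignores infant mortality and wear-out, and the 50 FIT and 5 year values are placeholders rather than data from this report.

```python
import math

HOURS_PER_YEAR = 8766.0  # 365.25 days

def mission_reliability(fit: float, mission_years: float, quantity: int = 1) -> float:
    """Survival probability for `quantity` identical components in series,
    assuming a constant failure rate given in FIT."""
    failure_rate_per_hour = fit * 1e-9
    hours = mission_years * HOURS_PER_YEAR
    return math.exp(-failure_rate_per_hour * hours * quantity)

r_single = mission_reliability(fit=50.0, mission_years=5.0)
r_ten_in_series = mission_reliability(fit=50.0, mission_years=5.0, quantity=10)
```

Because the result scales directly with the FIT value, an unsupported supplier FIT rate carries straight through to the system number, which is the reason for the caution above.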

#### 5.2.3 Manufacturing

The intent of the manufacturing flow is to understand the potential mitigations that may be needed to address manufacturing aspects of the COTS component. The manufacturing section covers plastic packaging, reworkability, electrical testing, pure tin, package RGA, PIND, bondpull, shock, vibrations, burn-in, lot travelers, ESD, barometric pressure, solderability, lid seal, radiography, resistance to solvents, SAM, moisture, lead finish, die shear, lid torque, lead adhesion, column pull, external visual, pre-cap, internal visual, SEM, contamination, hazardous materials, and corrosive materials. Figures 11 through 18 show the manufacturing mitigation flows. Table 22 shows the potential manufacturing mitigations. Most of these mitigations are user implemented. Some mitigations mentioned, like in-process inspections, may not be possible, but the buyer can always ask.

| Mitigation | Manufacturing                                    |
|------------|--------------------------------------------------|
|            | Plastic package                                  |
| 3a         | Plastic package encapsulation                    |
| 3b         | Package hermeticity                              |
| 3c         | Repackage component                              |
|            | Reworkability (Demonstration required)           |
| 3d         | Risk reduction activity                          |
| 3e         | Put component on daughtercard                    |
| 3f         | Design as easy to replace                        |
|            | Electrical Testing                               |
| 3g         | Enhanced part testing (test house)               |
| 3h         | Early risk reduction circuits                    |
| 3j         | Board level test (application specific)          |
| 3k         | Unit level test (application specific)           |
| 3l         | No component level burn-in testing               |
|            | Pure tin leads                                   |
| 3m         | Double layer of conformal coat                   |
| 3n         | Re-tin leads or replace package grid array leads |
| 3o         | Solder Wicking                                   |
| 3p         | Fusing                                           |
| 3q         | Annealing                                        |
| 3r         | Matte Sn plating                                 |
|            | Package RGA                                      |
| 3s         | Test component                                   |
|            | PIND                                             |
| 3t         | PIND test the component                          |
|            | Bondpull                                         |
| 3u         | Bondpull test the component                      |
|            | Shock                                            |
| 3v         | Shock test the component                         |
|            | Vibe                                             |
| 3w         | Vibration test the component                     |
|            | Burn-in                                          |
| 3x         | Perform infant mortality testing                 |
|            | Lot travelers                                    |
| 3y         | On-site inspection/observation                   |
| 3z         | Early Destructive and non-destructive testing    |
|            | ESD                                              |
| 3aa        | ESD sensitivity testing                          |
|            | Barometric Pressure                              |
| 3ab        | Pressure test the component                      |
|            | Solderability                                    |
| 3ac        | Test the component for solderability             |

Table 22. Potential Manufacturing Mitigations

| Mitigation | Manufacturing                                 |
|------------|-----------------------------------------------|
|            | Lid Seal                                      |
| 3ad        | Test the component for lid seal               |
|            | Radiography                                   |
| 3ae        | Radiographically test the component           |
|       | Resistance to Solvents                        |
| 3af   | Test the component for resistance to solvents |
|       | Scanning Acoustic Microscopy                  |
| 3ag   | SAM test the component                        |
|       | Moisture                                      |
| 3ah   | Moisture test component                       |
|       | Lead Finish                                   |
| 3aj   | Test the component for lead finish            |
|       | Die Shear                                     |
| 3ak   | Die shear test the component                  |
|       | Lid Torque                                    |
| 3al   | Lid torque test the component                 |
|       | Lead Adhesion                                 |
| 3am   | Test the component for lead adhesion          |
|       | Column Pull                                   |
| 3an   | Pull test the component                       |
|       | External Visual                               |
| 3ao   | On-site inspection/observation                |
| 3ap   | External visual post component receipt        |
|       | Pre-Cap                                       |
| 3aq   | On-site inspection/observation                |
|       | Internal Visual                               |
| 3ar   | On-site inspection/observation                |
| 3as   | De-lid, inspect component internally          |
|       | Scanning Electron Microscopy                  |
| 3at   | SEM test the component                        |
|       | Contamination                                 |
| 3au   | Mitigations 3a, 3c, 3m, 5n                    |
|       | Hazardous Materials                           |
| 3av   | Mitigations 3c, 5au                           |
| 3aw   | Handling                                      |
|       | Corrosive Materials                           |
| 3ax   | Separation of functions/containment           |

Table 22. Potential Manufacturing Mitigations (cont)



Figure 11. Manufacturing mitigation flow (1 of 8).

Figure 11 captures the manufacturing mitigations flow for plastic package, reworkability, electrical testing, and pure tin leads.

#### 5.2.3.1 Plastic Package

Plastic packages present issues for manufacturing and vehicle cleanliness. There are established methods for handling plastic package components in manufacturing. This ATR refers to the established methods and describes systems-oriented mitigations to avoid contamination from outgassing to vehicle optics, described in mitigations 3a-c.

#### 5.2.3.1.1 Plastic Package Encapsulation (3a)

JEDEC 6294 /1 /3 provides guidance for using plastic parts. A potential mitigation at the system level is deployable covers for any sensitive optics (Star-trackers, telescopes) to be opened after the plastics have outgassed (days to weeks). Refer to mitigation 5n.<sup>24</sup>

#### 5.2.3.1.2 Package Hermeticity (3b)

If the package is non-hermetic, does it matter for the program application? Vacuum sensitivity may be a concern for some programs (any voltage in the non-hermetic package above 12V, high current applications, or functions whose performance is affected by vacuum, such as RF). Potential solutions are encapsulant (verified coverage), conformal coat (verified coverage), or internal unit positive pressure. In the cases of conformal coat or encapsulant, a longer than normal bakeout may be required.<sup>24</sup>

#### 5.2.3.1.3 Repackage Component (3c)

Consider having the supplier provide a hermetic package. Another option is to procure the die and have a second party package the die. This has been done for programs to put the candidate die in a ceramic hermetic package.

#### 5.2.3.2 Reworkability

Reworkability allows re-use of a board (or unit) by reworking (repair or changing) a discrepant component. The increased usage of BGA/CGA packages has made reworking assemblies more difficult (particularly when tin leads are used). It is expected that COTS with high density packages will become more widely used (like flipchips). Mitigations 3d-f describe methods to address COTS reworkability.

#### 5.2.3.2.1 Risk Reduction Activity (3d)

Consider building a mock-up of the component early, in a condition like the application, and subjecting it to the stresses (mechanical, electrical, radiation) consistent with the program, then disassembling it or performing a destructive parts analysis to determine if the stresses had an adverse effect. Another thing to consider for difficult to rework/replace components is a test article to practice the removal/replacement before the action is required on a flight board.

#### 5.2.3.2.2 Daughtercard Implementation (3e)

Some components, like high density (small lead pitch) FPGAs or ASICs, flipchip packages, 3D packages, or plastic packaged components, may be difficult or impossible to rework. Consider packaging these kinds of components on a daughtercard to mitigate the difficulty of reworking or replacing the component and the risk of damaging an entire board. This is an acknowledgement that many commercial components are not burn-in tested, so infant mortality is a non-trivial consideration for a difficult to rework component.<sup>30</sup>

#### 5.2.3.2.3 Easy to Replace Design (3f)

Some components, like high density FPGAs or ASICs, or plastic package components, may be difficult to rework. The ability to rework the component should be a consideration for COTS. Another thing to consider for difficult to rework/replace components is a test article to practice the removal/replacement before the action is required on a flight board.

#### 5.2.3.3 Electrical Testing

Electrical testing is an option to determine the actual performance of a COTS component, to test long enough to trend a parameter, or to achieve some burn-in time prior to installation. Mitigations 3g-l discuss options for augmenting electrical parameter knowledge.

#### 5.2.3.3.1 Enhanced Component Testing (3g)

Many commercial components are not 100% electrically tested. Depending on the application, contracting with a component test company may be wise to characterize and pick the components appropriate for the application.<sup>2, 11</sup>

#### 5.2.3.3.2 Early Risk Reduction Circuit (3h)

In some cases, it may be easier to set up a simple test board for the component of interest to characterize the parameters that the commercial supplier does not test. It is recommended that the important parameters not tested by the supplier be measured and compared with the circuit analysis (both beginning of life and end of life worst case). Consider trending key parameters of the COTS vs a nominal analysis. Look for off-nominal behavior, not just pass/fail. Consider using BOL limits for unit test limits.<sup>2, 16, 24</sup>
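
Trending, as opposed to pass/fail checking, can be sketched as a simple least-squares slope over test time with a projection to the test limit. This is one possible approach under the assumption of roughly linear drift; the function names, the supply current readings, and the 12.0 mA limit are all hypothetical.

```python
from typing import Optional, Sequence

def drift_per_hour(hours: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of a measured parameter versus test time."""
    n = len(hours)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired measurements")
    mean_t = sum(hours) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in hours)
    if sxx == 0:
        raise ValueError("measurements must span more than one time point")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(hours, values))
    return sxy / sxx

def hours_until_limit(hours: Sequence[float], values: Sequence[float],
                      upper_limit: float) -> Optional[float]:
    """Projected hours after the last reading until upper_limit is reached.
    Returns None when the parameter is flat or trending away from the limit."""
    slope = drift_per_hour(hours, values)
    if slope <= 0:
        return None
    return max(0.0, (upper_limit - values[-1]) / slope)

test_hours = [0.0, 100.0, 200.0, 300.0]
supply_current_ma = [10.0, 10.2, 10.4, 10.6]  # every reading passes a 12.0 mA limit
slope = drift_per_hour(test_hours, supply_current_ma)
margin_hours = hours_until_limit(test_hours, supply_current_ma, upper_limit=12.0)
```

Here every reading passes, yet the steady upward drift projects a limit crossing after roughly 700 more hours, which is the kind of off-nominal behavior a pass/fail screen would miss.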

# 5.2.3.3.3 Board Level Test (3j)

A technique used in the past is to test and measure the parameters of concern in the application circuit (at board level) and compare those measurements against the analysis. It is recommended that the tested parameters that are important and not tested by the supplier are compared with the circuit analysis (both beginning of life and end of life worst case). Additional test time may be required to pass the infant mortality period. Consider the use of JTAG (Joint Test Action Group) methods to verify solder joints for small lead pitch components (BGA, CGA, flipchip, 3D packages). Consider trending key parameters of the COTS vs a nominal analysis. Consider using BOL limits for unit test limits.<sup>2, 3, 4, 5, 9, 12, 16, 20, 23, 25</sup>

### 5.2.3.3.4 Unit Level Test (3k)

A technique used in the past is to test and measure the parameters of concern in the application circuit (at unit level) and compare those measurements against the analysis (may require external test points be designed in). It is recommended that the tested parameters that are important and not tested by the supplier are compared with the circuit analysis (both beginning of life and end of life worst case) and appropriate BOL test limits are set (not EOL). Consider testing over the entire temperature range, not just the plateaus, to fully test the COTS in the application. Monitor for parametric shifts (trending), not just pass/fail. Additional test time may be required to pass the infant mortality period. Also consider performance testing throughout the entire temperature range (on the EM) to weed out surprise parameter performance (advisable for new developments). Consider trending key parameters of the COTS vs a nominal analysis. Consider using BOL limits for unit test limits. For flight units, 500+ test hours may be needed to weed out infant mortals. For shorter missions (<3 years), consider Board Level Screening (BLS) of thermal, electrical and dynamics (TED) to provide confidence of function over the entire test range (refer to COTS Card Unit Level Char Guidance). For longer duration programs, consider HALT/HAST (Highly Accelerated Life Test/Highly Accelerated Stress Test) testing. Implementation of perceptive "test like you fly" is key to ensuring the COTS component is performing as expected.<sup>2, 5, 9, 11, 12, 16, 20, 23, 25, 30</sup>
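Monitoring for parametric shifts over accumulated test hours can be sketched as a simple least-squares trend. This is only an illustration under stated assumptions: the function names `fit_line` and `projected_hours_to_limit`, the linear drift model, and the sample data are hypothetical, and real parameters may drift non-linearly.

```python
def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_hours_to_limit(hours, values, limit):
    """Extrapolate the fitted trend to a test limit.

    Returns None when the trend is flat, moving away from the limit, or when
    the fitted line already crosses the limit inside the test window (a case
    the ordinary pass/fail check should catch).
    """
    slope, intercept = fit_line(hours, values)
    if slope == 0:
        return None
    crossing = (limit - intercept) / slope
    return crossing if crossing > max(hours) else None


# Made-up readings of a key parameter logged every 100 test hours.
hours = [0, 100, 200, 300, 400]
readings = [1.000, 1.002, 1.004, 1.006, 1.008]
slope, _ = fit_line(hours, readings)
print("drift per 100 h:", slope * 100)
print("projected hours to reach 1.020:", projected_hours_to_limit(hours, readings, 1.020))
```

A unit that passes every individual reading can still show a steady slope toward its limit, which is the behavior trending is meant to expose.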

## 5.2.3.3.5 Lack of Component Level Burn-in Data (3l)

For components from commercial suppliers that do not perform burn-in testing, the risk of an early component failure must be weighed against the impact of reworking the failed component. For shorter missions (<3 years), consider Board Level Screening (BLS) of thermal, electrical and dynamics (TED) to provide confidence of function over the entire test range (refer to COTS Card Unit Level Char Guidance). For longer duration programs, consider HALT/HAST (Highly Accelerated Life Test/Highly Accelerated Stress Test) testing. Consider testing over the entire temperature range (trending), not just the plateaus, to fully test the COTS in the application. Monitor for parametric shifts, not just pass/fail. Also consider the number of hours of testing before there is confidence of passing the infant mortality period, which will be dependent on the component junction temperature in the application (vs the standard acceleration junction temperature). Some acceleration can be achieved at the board or unit level. Consider a key test point to allow monitoring of a key parameter (board or unit level) that may shift during the infant mortality period to allow trending. Consider building spare tested boards to preserve schedule should an infant mortality failure occur.<sup>2, 5, 9, 12, 16, 20, 23, 25, 30</sup>

## 5.2.3.4 Pure Tin Leads

The use of pure tin will be a fact of life using COTS components (due to RoHS compliance). For space usage, the space community will have to learn how to mitigate components that have pure tin. There are multiple options for tin, discussed in mitigations 3m-r.

### 5.2.3.4.1 Double Layer of Conformal Coat (3m)

Attempt to minimize whisker growth. Conformal coat helps but is not 100% effective (the underside of the package still poses a risk of shorting). Two layers of conformal coat, one with a tracer to help with verification of coverage may help. Vapor deposition may be an option for packages with leads under the package.<sup>2, 27</sup>

### 5.2.3.4.2 Re-Tin Leads or Repackage (3n)

Remove the tin by re-tinning the leads of the component (commonly done) with tin-lead solder or in the case of BGA/CGA, rework the package to replace the pure tin leads with lead/solder balls/columns.<sup>2, 27</sup>

### 5.2.3.4.3 Solder Wicking (3o)

The normal reflow process can reduce the amount of pure tin locally, but reflow may only be partially effective, depending on the amount of tin present on the lead. Reflow is not a viable technique for ball grid, column grid arrays, flip chip packages and components with large tin coated leads.<sup>27</sup>

# 5.2.3.4.4 Fusing (3p)

Fusing is subjecting the component or structure to 230 Deg C for 1 minute. This technique may not be viable for many COTS components (exceeds maximum allowable case temperature).<sup>27, 42</sup>

#### 5.2.3.4.5 Annealing (3q)

Annealing is subjecting the component or structure to just below critical temperatures for a specified amount of time. The temperature and time duration depend on the type of material. Recommended annealing for tin is subjecting the component or structure to 150 Deg C for 1 hour. This technique may not be viable for many COTS components (exceeds maximum allowable case temperature).<sup>27</sup>
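The viability caveat for fusing and annealing amounts to comparing the treatment temperature against the component's maximum allowable case temperature. The sketch below uses the two temperature and duration pairs stated in the text. The `viable_treatments` helper and the example ratings are hypothetical, and the check deliberately ignores exposure-time ratings, which a real datasheet review would also need to consider.

```python
# Treatment conditions as stated in the text: (temperature in deg C, duration in minutes).
TIN_TREATMENTS = {
    "fusing": (230.0, 1.0),
    "annealing": (150.0, 60.0),
}


def viable_treatments(max_case_temp_c: float) -> list[str]:
    """List treatments whose temperature does not exceed the rated case temperature.

    Assumption: the datasheet maximum case temperature is the only constraint.
    """
    return [
        name
        for name, (temp_c, _minutes) in TIN_TREATMENTS.items()
        if temp_c <= max_case_temp_c
    ]


# Hypothetical ratings for three components.
for rating in (125.0, 150.0, 260.0):
    print(rating, viable_treatments(rating))
```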

#### 5.2.3.4.6 Matte Tin Plating (3r)

Matte tin plating has shown substantially less growth than bright tin finishes. Some suppliers do supply tin coated components with matte tin finishes to be RoHS compliant and limit tin whisker growth. The usage of matte finishes provides a reasonable solution to COTS with tin.<sup>28</sup>



Figure 12. Manufacturing mitigation flow (2 of 8).

Figure 12 captures the manufacturing mitigations flow for Residual Gas Analysis (RGA), Particle Impact Noise Detection (PIND), bondpull, and shock.

## 5.2.3.5 Package RGA

RGA is expected to be reserved only for the long term, class A type missions. It is acknowledged that RGA has limited benefit for many COTS components but is included here for completeness. Mitigation 3s discusses RGA.

## 5.2.3.5.1 Test the Component (3s)

Consider procuring the components early (pre-EM) and testing them. Determine the pass/fail limit using MIL-STD-883, method 1018. Consider ordering extra FLT components and testing each lot if suspect. If RGA is used and the component fails, the program faces a risk decision: move forward as is (depending on the RGA findings), procure another lot that may (or may not) be better, or find a substitute component.<sup>23</sup>

### 5.2.3.6 Particle Impact Noise Detection

Particle Impact Noise Detection (PIND) is a test to determine whether there is Foreign Object Debris (FOD) in the package cavity that may cause damage or shorts. Mitigation 3t discusses PIND.

## 5.2.3.6.1 Test the Component (3t)

Consider early PIND testing (procure the component early (Pre-EM) and test). If FOD is present, determine if it is conductive or non-conductive, then assess the internal component shorting risk to circuit function. Determine the pass/fail criteria. Consider ordering extra FLT components and testing each lot if suspect. The applicable test method is MIL-STD-883, method 2020. It is acknowledged that PIND may not be applicable to many programs and is included here for completeness.

## 5.2.3.7 Bondpull

Bondpull tests the wire bond to pad strength to determine if there is good adhesion of the wire bond and pad. Poor bondpull results may indicate contamination which may affect reliability. Mitigation 3u discusses bondpull options.

### 5.2.3.7.1 Bondpull Test the Component (3u)

Consider early bondpull testing (procure the component early (Pre-EM) and test). If bondpull fails, does it matter to the program at the limit at which it failed? Determine the pass/fail criteria. Is the root cause contamination? Consider ordering extra FLT components and testing each lot if suspect. The applicable test methods are MIL-STD-883, methods 2011 and 2023.

### 5.2.3.8 Shock

Shock is typically experienced when a space vehicle is launched and separated from the launch vehicle. Shock may also be experienced during vehicle deployments. Shock mitigation 3v discusses options for COTS components with no history of shock testing.

# 5.2.3.8.1 Shock Test the Component (3v)

Consider early shock testing (procure the component early (Pre-EM) and test). Sensitivity to shock is highly application dependent and difficult to accurately analyze. Consider whether the component is powered or unpowered during shock events. Review the conditions the supplier uses for testing chatter and transfer (the Mil-Spec allows the coils to be energized during the shock event for the transfer test, which is NOT consistent with how it will be used). If the component is powered during the shock event, it is recommended that the component be tested while powered and operating (and monitored). Component level shock can be VERY different from unit level shock input (depends on packaging). If the component fails required shock limits, does it matter to the program at the limit at which it failed? Determine the pass/fail criteria before testing. Consider ordering extra FLT components and testing each lot if suspect. Components to consider are relays (chatter and transfer), oscillators (frequency shift or failure), large ceramic caps (on flexible surface), hybrids (stiff substrate), large connectors without support (solder joints), leadless chip carriers (on flexible surface), multi-layer chip capacitors (MLCC), and integrated circuits with large die. Potential solutions include conformal coat or other dampening under the component, softer board mounting, a more or less stiff board, change of unit or board orientation on the panel, relocation on the panel or vehicle for less baseplate shock input, or different separation devices, with test as the verification.



Figure 13. Manufacturing mitigation flow (3 of 8).

Figure 13 captures the manufacturing mitigations flow for vibration, Burn-in, lot traveler, and Electrostatic Discharge (ESD).

#### 5.2.3.9 Vibration

Vibration is experienced by a space vehicle during launch and ascent. Mitigation 3w discusses options for COTS components with no history of vibration testing.

#### 5.2.3.9.1 Vibration Test the Component (3w)

Consider early vibration testing (procure the component early (Pre-EM) and test). Sensitivity to vibration is highly application dependent. Consider whether the component is powered or unpowered during vibration events. If the component is powered during the vibration event, it is recommended that the component be tested while powered and operating (and monitored). Component level vibration can be VERY different from unit level vibration input (depends on packaging). If the component fails required vibration limits, does it matter to the program at the limit at which it failed? Determine the pass/fail criteria before testing. Consider ordering extra FLT components and testing each lot if suspect. Components to consider are relays (chatter and transfer), oscillators (frequency shift or failure), large ceramic caps (on flexible surface), hybrids (stiff substrate), large connectors without support (solder joints), leadless chip carriers (on flexible surface), BGA and CGA packages, and integrated circuits with large die. Potential solutions include conformal coat or other dampening under the component, softer board mounting, a more or less stiff board, change of unit or board orientation on the panel (change the input energy direction vs the component's sensitive axis), relocation on the panel or vehicle for less baseplate vibration input, or powering off the unit.

#### 5.2.3.10 Burn-in

Many COTS components are not burn-in tested like Mil-Spec components. Burn-in typically weeds out infant mortal components. Burn-in testing is done to avoid installing, and then removing later and replacing, usually with schedule impact, a component. Mitigation 3x offers options for reducing the infant mortality failure risk that results in an integration return.<sup>30</sup>

### 5.2.3.10.1 Perform Infant Mortality Testing (3x)

Accelerated testing (240 hours) at the component level is best as this greatly reduces the chances of installing a component that may fail during the integration cycle. Potential solutions include post procurement testing (probably static, although dynamic is better), board, or unit level test. Board and unit level testing may not be capable of accelerating the infant mortality fallout, so more hours at the board and unit level may be required prior to or including vehicle integration. 500+ hours is a good target, but much depends on the application junction temperatures. Consider trending key parameters of the COTS vs a nominal analysis. Consider using BOL limits for unit test limits. For shorter missions (<3 years), consider Board Level Screening (BLS) of thermal, electrical and dynamics (TED) to provide confidence of function over the entire test range (refer to COTS Card Unit Level Char Guidance). For longer duration programs, consider HALT/HAST (Highly Accelerated Life Test/Highly Accelerated Stress Test) testing. Also consider the number of hours of testing before there is confidence of passing the infant mortality period, which will be dependent on the component junction temperature in the application (vs the standard acceleration junction temperature). Some acceleration can be achieved at the board or unit level. Consider a key test point to allow monitoring of a key parameter (board or unit level) that may shift during the infant mortality period to allow trending.<sup>11, 12, 16, 20, 21, 25, 30</sup>
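The dependence of required test hours on junction temperature can be sketched with the Arrhenius acceleration model, which is commonly used for temperature-accelerated failure mechanisms. This ATR does not prescribe a model, so the choice of Arrhenius, the 0.7 eV activation energy, the function names, and the example temperatures below are all assumptions for illustration. The actual activation energy depends on the failure mechanism and should come from supplier or program data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(tj_low_c: float, tj_high_c: float, ea_ev: float = 0.7) -> float:
    """Arrhenius acceleration of testing at tj_high_c relative to tj_low_c.

    Temperatures are junction temperatures in deg C. ea_ev is an assumed
    activation energy in eV, not a value taken from this report.
    """
    t_low_k = tj_low_c + 273.15
    t_high_k = tj_high_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low_k - 1.0 / t_high_k))


def equivalent_hours(hours_at_high: float, tj_low_c: float, tj_high_c: float,
                     ea_ev: float = 0.7) -> float:
    """Hours at the lower junction temperature that match hours_at_high at the higher one."""
    return hours_at_high * acceleration_factor(tj_low_c, tj_high_c, ea_ev)


# Hypothetical case: 240 h of component burn-in at a 125 C junction temperature,
# compared with board-level testing where the junction runs at 85 C.
af = acceleration_factor(tj_low_c=85.0, tj_high_c=125.0)
print(f"acceleration factor: {af:.1f}")
print(f"equivalent board-level hours: {equivalent_hours(240.0, 85.0, 125.0):.0f}")
```

Under these assumptions the equivalent hours at the lower junction temperature are many times the accelerated hours. This is consistent with the caution that board and unit level testing may need considerably more time to cover the infant mortality period.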

#### 5.2.3.11 Lot Travelers

It is not expected that COTS component suppliers will allow Mil-Spec type inspection of the lot travelers. The lot travelers provide insight into the component build process and may be helpful for anomaly resolution. Mitigations 3y-z discuss options for obtaining lot traveler type information.

#### 5.2.3.11.1 On-Site Inspection (3y)

Consider an on-site inspector or observer for the lot(s) procured (inspect as it is built). It can't hurt to ask and put an NDA in place.<sup>5</sup>

#### 5.2.3.11.2 Early Non-Destructive and Destructive Tests (3z)

Consider performing early (as soon as part is baselined) destructive and non-destructive testing to understand supplier/part quality (consistent with program risk profile). Procuring additional components from the same source as intended for flight allows early testing and leverages the lower cost of COTS components. Refer to the relevant Mil-Spec for references on testing specifics for consideration.<sup>2, 3, 16, 21</sup>

#### 5.2.3.12 Electro-Static Discharge

Components that are sensitive to ESD may fail prematurely due to damage (walking wounded) that is not immediately observable. If the ESD sensitivity is not known, mitigation 3aa discusses options for finding out a component's ESD sensitivity.

### 5.2.3.12.1 ESD Sensitivity Testing (3aa)

Consider investigating the ESD sensitivity of the chosen component. If it is very sensitive, does it require special handling? Will an ESD control plan be needed for the component? Consider ordering components early and testing or obtaining the supplier ESD assessment. Relevant test methods are ESD TM 3015 or ANSI/ESDA/JEDEC JS-001 and ANSI/ESDA/JEDEC JS-002. Consider obtaining data for the same *process* as is used for the component chosen. Note that ESD damage can occur anywhere in the handling of the component (die level, component level during electrical test, shipping, and installation/assembly).



Figure 14. Manufacturing mitigation flow (4 of 8).

Figure 14 captures the manufacturing mitigations flow for barometric pressure, solderability, lid seal, and radiography.

# 5.2.3.13 Barometric Pressure

Barometric pressure is either a high or low pressure applied to the component package. Typically, space components are subject to lower pressures (vacuum) but can also experience higher than atmospheric pressure during testing (positive air or dry nitrogen flow in thermal cycle or thermal vacuum testing). This test is typically not done by a supplier.

# 5.2.3.13.1 Pressure Test the Component (3ab)

This is a test of the component's ability to withstand low or high pressure. The test is best done at the component level. Applicable methods are MIL-STD-202 or MIL-STD-883 method 1014.12 or 1001.

### 5.2.3.14 Solderability

Solderability is the ease of soldering the component to ensure the solder surface is not contaminated. It is expected tin will be the finish found for COTS components. Mitigation 3ac discusses solderability.

#### 5.2.3.14.1 Test the Component for Solderability (3ac)

This test is best done at the component level, post procurement, using MIL-STD-883, method 2003. Negative results may require cleaning and re-tinning of the leads or other mitigations found in the pure tin mitigations section, 5.2.3.4, depending on the test findings.

### 5.2.3.15 Lid Seal

Lid seal is useful for large cavity components (hybrids, DC-DC converters) and is an indication of workmanship. Mitigation 3ad discusses options for components not tested for lid seal.

# 5.2.3.15.1 Test the Component for Lid Seal (3ad)

This test can be done at the component level, post procurement, as a DPA to determine the lid seal integrity, using MIL-STD-883 method 1014. If the seal allows leakage, vacuum inside the package may allow small geometry features with bus voltage to become susceptible to arcing. Should a component fail lid seal, the potential options are to purchase multiple lots (and test for yield), encapsulate (conformal coat for fine leaks), or find a substitute component.

#### 5.2.3.16 Radiography

Radiography is a non-destructive test used to detect cracks, de-lamination, or other flaws in workmanship in components. Mitigation 3ae discusses options for radiographic testing.

#### 5.2.3.16.1 Radiographically Test the Component (3ae)

This inspection should be done at the component level, post procurement, but can be done at the board level. It is an NDPA to determine the die attach and interconnect integrity, using MIL-STD-883 method 2012. Options include purchasing components early and performing the inspection, or procuring multiple lots to achieve the required yield. If the workmanship issues are pervasive, determine whether the flaws will compromise the mission; if so, a substitute component should be found.<sup>30</sup>
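When extra components or lots are procured to achieve a required yield, the purchase quantity can be estimated with a simple binomial model. This is a sketch under a strong assumption that each component passes screening independently with the same probability, which lot-to-lot variation can violate. The function names, the 95% confidence target, and the example yield are hypothetical.

```python
from math import comb


def prob_at_least(good_needed: int, purchased: int, pass_probability: float) -> float:
    """Binomial probability that at least good_needed of purchased parts pass screening."""
    if not 0.0 <= pass_probability <= 1.0:
        raise ValueError("pass_probability must be between 0 and 1")
    p = pass_probability
    return sum(
        comb(purchased, k) * p**k * (1.0 - p) ** (purchased - k)
        for k in range(good_needed, purchased + 1)
    )


def min_purchase(good_needed: int, pass_probability: float,
                 confidence: float = 0.95, max_quantity: int = 10_000) -> int:
    """Smallest purchase quantity giving the requested confidence of enough good parts."""
    for quantity in range(good_needed, max_quantity + 1):
        if prob_at_least(good_needed, quantity, pass_probability) >= confidence:
            return quantity
    raise ValueError("confidence not reachable within max_quantity")


# Hypothetical case: 20 flight parts needed, 80% of parts expected to pass inspection.
print(min_purchase(good_needed=20, pass_probability=0.80))
```

The pass probability would normally come from early screening of a sample of the same lot or source, so the estimate is only as good as that sample.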



Figure 15. Manufacturing mitigation flow (5 of 8).

Figure 15 captures the manufacturing mitigations flow for resistance to solvents, Scanning Acoustic Microscopy (SAM), moisture resistance, and lead finish.

# 5.2.3.17 Resistance to Solvents

Solvents are used during component installation and cleaning of the assemblies on which they reside. Resistance to solvents is necessary for the component to survive the board assembly process without affecting the function or appearance of the component. Failure of this test may indicate a counterfeit component. Mitigation 3af offers options for COTS components that have unknown resistance to solvents.

# 5.2.3.17.1 Test the Component for Resistance to Solvents (3af)

This test is best done at the component level, post procurement, but can be done on the board. Early NDPA is advised for components with no certification or history to determine if solvents will affect the component marking or function, using MIL-STD-883 Method 2015. Should a COTS component fail the resistance to solvents test, the options are to repackage the component, develop a manufacturing process to not use the solvent to which the component is sensitive, or find a substitute component.

# 5.2.3.18 Scanning Acoustic Microscopy

SAM testing is intended to detect and identify internal delamination, voids, material density changes, defects and many other anomalies within devices, assemblies and materials that may indicate defective material or workmanship issues. SAM testing is used for flipchips at the package level and should be considered for board level testing (solder joint integrity) for flipchips or WLCSP components. This test/inspection is advised for components with no traceability of the materials used in the component and failure of this test may indicate substandard materials, substandard workmanship or counterfeit components.

# 5.2.3.18.1 SAM Test the Component (3ag)

This test is best done at the component level, post procurement, with an early NDPA to determine the material and workmanship integrity, such as die attach, using MIL-STD-883 method 2030. Should a COTS component fail this test/inspection, multiple lots may be required to achieve the desired number of components. Determine whether inappropriate material was used, whether contamination is present, and whether the component will still meet the mission need despite the shortcomings. If not, a substitute component is suggested.

# 5.2.3.19 Moisture

Moisture testing is intended to determine if the component is affected by moisture (humidity). This is important for the classic ceramic package but even more so for plastic package components. Mitigation 3ah discusses options for components that may be sensitive to moisture.

# 5.2.3.19.1 Moisture Test the Component (3ah)

This test should be done at the component level, post procurement, as a DPA to determine resistance to moisture exposure, using MIL-STD-883 method 1004. Depending on how the component is sensitive to moisture, there are several options for mitigation. For ceramic packages, conformal coating may be sufficient for the mission. Other options include a very carefully controlled environment (package level, kit level, board level) to limit exposure to moisture. Multiple bakeouts may be required. For plastic packages, refer to JEDEC 6294 /1 /3 for guidance.

#### 5.2.3.20 Lead Finish

Lead finish testing helps in understanding the ability of the lead material to withstand vibration fatigue and whether any areas of the lead are susceptible to flaking or corrosion. This test is suggested for components without lead finish information or vibration testing when soldered to a board. Poor lead finish test results may indicate substandard materials, poor workmanship, contamination, or counterfeit components. Mitigation 3aj discusses potential options for components that fail lead finish testing.

### 5.2.3.20.1 Test the Component for Lead Finish (3aj)

This test is best done at the component level, post procurement, DPA to determine the lead integrity, using MIL-STD-883 method 2004. For components that have poor lead finish test performance, options are to soften the vibration or thermal cycling environment for the component, add component staking, add dampening under the component, or find a substitute component. For components that have electroless nickel underplated leads that may be required to be formed, bent or vibrate, consider a substitute package or component.



Figure 16. Manufacturing mitigation flow (6 of 8).

Figure 16 captures the manufacturing mitigations flow for die shear, lid torque, lead adhesion, and column pull.

### 5.2.3.21 Die Shear

Die shear testing is used to determine the integrity of the die adhesion for a semiconductor or for surface mounted passive elements. It is helpful to understand the workmanship quality and materials used. Failure of the die shear test may indicate poor workmanship, poor materials, poor process control, or counterfeit components. Mitigation 3ak discusses potential options for components that fail die shear testing.

# 5.2.3.21.1 Die Shear Test the Component (3ak)

This test is best done at the component level, post procurement, as a DPA to determine the die attach and interconnect integrity. If this test is desired, it is suggested to perform it early, using MIL-STD-883 method 2019. Should a component fail die shear testing, are the materials suitable for the mission? Was there voiding? Consider designing a more benign vibration or shock environment for the component, having the die packaged by a different supplier, or finding a substitute component.

### 5.2.3.22 Lid Torque

Lid torque testing is used to determine the integrity of the lid adhesion for a semiconductor. It is helpful to understand the workmanship quality and materials used. Failure of the lid torque test may indicate poor workmanship, poor materials, poor process control, or counterfeit components. Mitigation 3al discusses potential options for components that fail lid torque testing.

# 5.2.3.22.1 Lid Torque Test the Component (3al)

This test is best done at the component level, post procurement, as a DPA to determine the lid adhesion to the package. If this test is desired, it is suggested to perform it early, using MIL-STD-883 method 2024. Should a component fail lid torque testing, are the materials suitable for the mission? Was there voiding? Consider having the die packaged by a different supplier or finding a substitute component.

#### 5.2.3.23 Lead Adhesion

The lead adhesion test is used to determine the quality of the lead materials. This test is suggested for components without lead adhesion information. Poor lead adhesion test results may indicate substandard materials, poor workmanship, contamination, or counterfeit components. Mitigation 3am discusses potential options for components that fail lead adhesion testing.

#### 5.2.3.23.1 Test the Component for Lead Adhesion (3am)

This test is best done at the component level, post procurement, DPA to determine the lead finish adhesion. It is suggested that if this test is desired, to perform it early, using MIL-STD-883 method 2025. For components that have poor lead adhesion test performance, options are to repackage the die in another package, perhaps with a different supplier, or find a substitute component.

#### 5.2.3.24 Column Pull

Column pull is similar to bondpull test, but for CGA and BGA packages. The column pull tests the column bond to package pad strength to determine if there is good adhesion of the column bond and pad. Poor column pull results may indicate contamination which may affect reliability. There is also a test for flipchip packages, which is a pull-off test. Mitigation 3an discusses column pull options.

#### 5.2.3.24.1 Pull Test the Component (3an)

This test can only be done at the component level, post procurement. Consider early column pull testing (procure component early (Pre-EM)) and DPA to determine the column strength using MIL-STD-883 method 2038. If the column pull test fails, does it matter to the program at the limit at which it failed? Determine the pass/fail criteria. For flipchip packages, the purpose of the test is to test the strength of the internal bonds between the die and the substrate using MIL-STD-883 method 2031.1. Consider ordering extra flight components and testing each lot if suspect. Consider packaging the die in a package from a different source.



Figure 17. Manufacturing mitigation flow (7 of 8).

Figure 17 captures the manufacturing mitigations flow for external visual, pre-cap inspection, internal visual, and Scanning Electron Microscopy (SEM).

# 5.2.3.25 External Visual

External visual is a gross assessment of the packaged component. Failure of the external visual inspection may indicate substandard material, defects, poor handling, or counterfeit components.

#### 5.2.3.25.1 On-Site Inspection or Observation (3ao)

If the supplier will allow it, consider an on-site inspector or observer for the lot(s) procured (inspect as it is built).<sup>5</sup> The supplier may have evidence of automated optical or graphical inspection that may suffice.

#### 5.2.3.25.2 External Visual post Component Receipt (3ap)

This test can be done at the component level, post procurement, or at the board level. It is an NDPA to inspect the component externally using MIL-STD-883 method 2009. Should a component fail external visual, the response depends greatly on what was discrepant and the impact to the program; options may be to procure another lot or find a substitute component.

#### 5.2.3.26 Pre-Cap Inspection

Pre-Cap inspection is a visual check for component workmanship and cleanliness prior to lid seal. It is expected this inspection may be asked of new suppliers of commercial components and be turned down.

### 5.2.3.26.1 On-Site Inspection or Observation (3aq)

If the supplier will allow it, consider an on-site inspector or observer for the lot(s) procured (inspect as it is built).<sup>5</sup> The supplier may have evidence of automated graphical inspection that may suffice. If the supplier will not allow on-site inspection or observation, consider de-lid and internal inspection, post procurement (refer to 5.2.3.27.2).

#### 5.2.3.27 Internal Visual

Internal visual is an assessment of the packaged component. Failure of the internal visual inspection may indicate substandard material, defects, poor handling, or counterfeit components.

#### 5.2.3.27.1 On-Site Inspection or Observation (3ar)

If the supplier will allow it, consider an on-site inspector or observer for the lot(s) procured (inspect as it is built).<sup>5</sup> The supplier may have evidence of automated optical or graphical inspection that may suffice.

#### 5.2.3.27.2 De-Lid and Inspect Component (3as)

Consider de-lidding the component and inspecting visually, using MIL-STD-883 method 2010. Should a component fail internal visual, the response depends greatly on what was discrepant and the impact to the program; options may be to procure another lot or find a substitute component.

#### 5.2.3.28 Scanning Electron Microscopy

SEM is used for detecting defects in material, either on the surface or within the material. Failure of the SEM analysis may indicate poor workmanship, defective material, or counterfeit components. Mitigations for SEM component negative results are shown in mitigation 3at.

#### 5.2.3.28.1 SEM Test the Component (3at)

This test is best done at the component level, post procurement. It is recommended to perform this DPA SEM analysis of the component early using MIL-STD-883 method 2018. Should a component fail the SEM analysis, the options are to use another lot, use material from a different source, or find a substitute component or supplier.



Figure 18. Manufacturing mitigation flow (8 of 8).

Figure 18 captures the manufacturing mitigations flow for contamination, hazardous materials, and corrosive materials.

# 5.2.3.29 Contamination

Contamination is any unwanted or unintended substance on or in the component. Contamination can come from many sources, and the usual mitigation is removal of the contamination source. This ATR assumes the contamination source is the COTS component itself and addresses how to mitigate the effects. Mitigation 3au describes the options when the contamination source is the introduced COTS component.

# 5.2.3.29.1 Existing Mitigations (3au)

Consider mitigations already mentioned in this ATR. Mitigation 3a is plastic packaging encapsulation. The idea here is simple encapsulation may be enough, depending on the mission needs. Mitigation 3c is repackaging the component. This could also apply to a unit to contain the contaminating material. Mitigation 3m is the simplest form of encapsulation: a double layer of conformal coat. Mitigation 5n is a vehicle or subsystem level mitigation where a shield is used to protect sensitive surfaces (covered to provide time for the contamination to outgas and disperse).

#### 5.2.3.30 Hazardous materials

Hazardous materials are substances that may harm humans. Mitigations 3av-aw discuss options for working with COTS components that contain hazardous materials.

### 5.2.3.30.1 Existing Mitigations (3av)

Consider mitigations already mentioned in this ATR. Mitigation 3c is repackaging the component. This could also apply to a unit to contain the contaminating material. Mitigation 3au (see above).

#### 5.2.3.30.2 Handling (3aw)

Special handling may be required to use a component (or unit) that contains hazardous materials.

#### 5.2.3.31 Corrosive Materials

Corrosive materials are substances that may react with other materials, including human tissue.

#### 5.2.3.31.1 Separation of Functions/Containment (3ax)

It is not recommended for a component (or unit) on a space vehicle to contain corrosive materials. If the program must use components with corrosive materials, consider separating that function physically from any health and safety related equipment (vehicle bus functions). One way to do this is to contain the unit in a separate enclosure, away from other units. The component with corrosive material should also not be in an electrical path for vehicle health and safety.

#### 5.2.4 Trust

The intent of the trust flow is to understand the scope of the knowledge needed to assess the trust aspects of the COTS component that may need mitigation. The trust section covers foreign sourced, foreign national processed and heritage usage. The KEY aspect of the trust section is to prevent technology transfer to a foreign entity, either through the component itself (added functions) or through knowledge transfer via the processing personnel, and to establish confidence that the component will do what it is supposed to do – and nothing else. Figure 19 shows the trust mitigations flow. Table 23 shows the potential trust mitigations.

| Mitigation | Trust                                                                  |
|------------|------------------------------------------------------------------------|
|            | Foreign Sourced                                                        |
| 4a         | Review for unexpected internal structures, do not use if found         |
| 4b         | Redundant design with alternate component                              |
| 4c         | Limit usage in S/C health and safety, link, and security functions     |
| 4d         | Consider using a blind trusted agent for component purchase            |
| 4e         | Perform independent verification and validation for critical functions |
|            | Heritage                                                               |
| 4f         | Counterfeit components                                                 |



Figure 19. Trust mitigations flow.

# 5.2.4.1 Foreign Sourced

Foreign sourced components should receive extra scrutiny if used for US military space applications. If a program selects a foreign sourced component, mitigations 4a-e offer several options.

### 5.2.4.1.1 Review of component (4a)

If a visual analysis is performed, look for any structures that are non-standard or odd in their placement. Such structures may indicate a supply chain issue.<sup>21</sup>

# 5.2.4.1.2 Alternate Design (4b)

Consider a design that uses a Mil-Spec component as a backup or an alternate COTS if the first selection is high risk (many mitigations required, time prohibitive mitigations required, costly mitigations required, or foreign national personnel involved in processing the component).<sup>5</sup>

### 5.2.4.1.3 Limit Usage in System Design (4c)

If the COTS component is used in a mission critical application or one that may be a single point of failure (single string), consider additional mitigations such as very conservative derating, derated voltage, conservative junction temperature, derated clock speed, or a backup alternate path in the design.<sup>5, 21</sup>

# 5.2.4.1.4 Blind Trusted Agent Buy (4d)

For components whose purchase may result in information transfer, such as the user's intent for the product, operational security concerns raised by purchasing the component, or identification of personnel, consider the use of a blind, trusted agent to make the purchase.

#### 5.2.4.1.5 Independent Verification and Validation for Critical Functions (4e)

For components that are in critical functions, consider an independent verification and validation program to provide evidence that the component does what it is supposed to – and nothing else.

#### 5.2.4.2 Heritage

Usage of COTS components from known sources will limit the exposure to counterfeit components. Mitigation 4f provides an option for components that lack heritage with the space community.

#### 5.2.4.2.1 Counterfeit Components (4f)

Procure only from authorized sources or require adherence to AS5553. Inspect carefully if procured from a third-party broker.<sup>5, 16, 21</sup>

### 5.2.5 Environmental Considerations

The intent of the environmental considerations flow is to understand the scope of the knowledge needed to assess the environmental aspects of the COTS component that may need mitigation. The environmental section covers aging, temperature, and vacuum. Figure 20 shows the environmental mitigations flow. Table 24 shows the potential environmental mitigations.

| Mitigation | Environmental                    |
|------------|----------------------------------|
|            | Aging                            |
| 5a         | Multiple components in parallel  |
| 5b         | Conservative Derating            |
| 5c         | Conservative Thermal Environment |
| 5d         | Dynamic Reliability Management   |
| 5e         | Adaptive/Static Voltage Scaling  |
|            | Temperature                      |
| 5f         | Local Heatpipes                  |
| 5g         | Lower Supply Voltages            |
| 5h         | Decreased frequency              |
|            | Vacuum                           |
| 5j         | Conservative Derating            |
| 5k         | Increased physical spacing       |
| 5l         | No "golden nodes"                |
| 5m         | Encapsulation                    |
| 5n         | Deployable covers for optics     |
| 5o         | Unit Thermal Vacuum test         |
|            | EMC/EMI                          |
| 5p         | Shielding                        |
| 5q         | Grounding                        |
| 5r         | Power strobing                   |

Table 24. Potential Environmental Considerations Mitigations



Figure 20. Environmental mitigations flow.

# 5.2.5.1 Aging

Aging, the effect of time in an environment (including radiation), may reduce the useful life of a component in the application. Mitigations 5a-e discuss potential solutions. The key is knowing what aging affects in the component and by how much. Testing may be required.

# 5.2.5.1.1 Multiple Components in Parallel (5a)

This approach uses multiple components in circuit, one powered and active, others powered off and inactive. When the active component fails (radiation for example) the next unpowered component can be used to resume the function. A potential mitigation for the usage of COTS ADC components is to architect the board to use multiple ADCs, buffered from one another. When one has degraded to the point of no longer being acceptable, a new one can be switched in without the overhead of an entire board. For example, put three ADCs in parallel, each buffered (input and output) and on a separate power switch, with each feeding the common digital interface. FSW may be used to select the next component in the string to minimize impact to the mission.<sup>2, 3, 8, 24</sup>
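The selection step that FSW might perform can be sketched as follows. This is a minimal illustration, not flight software: the `AdcBank` class, its field names, and the three-ADC configuration are hypothetical, and how degradation is detected is left outside the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class AdcBank:
    """Hypothetical model of several buffered ADCs feeding one digital interface."""
    count: int = 3                                  # assumed number of ADCs in the string
    active: int = 0                                 # index of the ADC currently selected
    degraded: set = field(default_factory=set)      # indices judged no longer acceptable

    def mark_degraded(self, index: int) -> None:
        """Record that an ADC has degraded past its acceptable limit."""
        self.degraded.add(index)

    def select_next(self) -> int:
        """Select the lowest-numbered ADC that is not marked degraded.

        Raises RuntimeError when every ADC in the string is degraded.
        """
        for candidate in range(self.count):
            if candidate not in self.degraded:
                self.active = candidate
                return candidate
        raise RuntimeError("no healthy ADC remains in the string")


bank = AdcBank()
bank.mark_degraded(0)       # ADC 0 is no longer acceptable
print(bank.select_next())   # index of the ADC switched in
```

In a real design the selection would also drive the per-ADC power switches and buffers described above; the sketch only tracks which component is in use.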

# 5.2.5.1.2 Conservative De-rating (5b)

COTS processors and memory components can benefit from conservative frequency and voltage application relative to their rated limits to control the junction temperature (lower junction temperature = more useful life). A relatively new component also sensitive to vacuum is the polymer tantalum capacitor (PTC). Mitigation for the PTC is the conservative application of applied voltage relative to its voltage rating.<sup>3,4,5,9,23</sup>

# 5.2.5.1.3 Conservative Thermal Environment (5c)

The ability to maintain junction temperatures within specification is KEY to component reliability, useful life, and performance. Thermal design assumptions based on worst case power dissipations will not always provide enough margin, since many COTS components have lower maximum junction temperature requirements than Mil-Spec components. Some components (processors) have internal temperature controls. Consider a conservative thermal limit to allow usage consistent with the program need, or more thermal sensors for more precise thermal knowledge and control.<sup>1,9</sup>

# 5.2.5.1.4 Dynamic Reliability Management (5d)

Modern day Personal Computers (PCs) use dynamic reliability management (lowering frequency to reduce junction temperature). For a PC with non-realtime processing requirements, that is acceptable. For space REALTIME computing, not only would an upper limit need to be set (to limit the component's junction temperature), but also a LOWER limit so as not to break the processing cycle demands. Not completing tasks for a real-time system can mean loss of the system (a design issue). Consider dynamic management of the frequency (junction temperature -> reliability/useful life) very carefully. Some components may not allow setting a higher limit if the junction temperature exceeds its programmed limit (not user controlled). It is NOT recommended to use "overclocking" for high reliability or real-time applications.<sup>13, 18, 19</sup>
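The upper and lower limits described above can be illustrated with a short sketch. The function name, its parameters, and the example numbers are hypothetical; the lower limit is assumed here to be the slowest clock that completes a fixed number of cycles within one processing period, which is a simplification of real worst-case timing analysis.

```python
def allowed_clock_hz(requested_hz, cycles_per_frame, frame_period_s, thermal_max_hz):
    """Clamp a requested clock between a real-time floor and a thermal ceiling.

    All parameters are illustrative assumptions:
      cycles_per_frame - worst-case processor cycles needed per processing period
      frame_period_s   - length of the processing period in seconds
      thermal_max_hz   - highest clock that keeps junction temperature in limits
    """
    # Floor: the slowest clock that still finishes the frame's work in one period.
    floor_hz = cycles_per_frame / frame_period_s
    if floor_hz > thermal_max_hz:
        # No frequency satisfies both constraints; this is a design issue.
        raise ValueError("real-time floor exceeds thermal ceiling")
    return min(max(requested_hz, floor_hz), thermal_max_hz)


# A reliability manager asks for 10 MHz, but the frame needs at least 20 MHz.
hz = allowed_clock_hz(requested_hz=10e6, cycles_per_frame=2_500_000,
                      frame_period_s=0.125, thermal_max_hz=50e6)
print(hz)
```

The error path is the important case: if the floor exceeds the ceiling, no dynamic management setting can satisfy both the thermal and real-time constraints.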

# 5.2.5.1.5 Adaptive/Static Voltage Scaling (5e)

It may be necessary to limit the supply voltage to the component to limit the power dissipation. For CMOS, typically, the higher the voltage, the higher the power dissipation. The supply voltage can be limited through the design of the supply (fixed) or through adaptive voltage management (a circuit that senses the component's rising temperature feeds back to the supply to decrease the voltage, or increases the voltage as the component ages).<sup>13, 14, 15, 18, 19, 20, 24</sup>

# 5.2.5.2 Temperature

Temperature affects both the useful life of the component and its parametric performance. Mitigations 5f-h discuss options for temperature effects.

### 5.2.5.2.1 Local Heatpipes (5f)

Local heatpipes (heatpipes connected to the COTS package) are effective at removing heat and managing the junction temperature of components like FPGAs, ASICs, processors, microcontrollers, or any high-density integrated circuit, but they are difficult to install and, more importantly, difficult to rework. Typically, a local heatpipe will transfer the component heat to the internal rail of the unit more efficiently than heat transfer through the PWB to the internal rail.<sup>24</sup>

# 5.2.5.2.2 Lower Supply Voltages (5g)

It may be necessary to limit the supply voltage to the component to limit the power dissipation. For CMOS, typically, the higher the voltage, the higher the power dissipation. The supply voltage can be limited through the design of the supply.<sup>14, 19, 24</sup>
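As a first-order illustration of why a lower supply voltage reduces dissipation, the switching power of a CMOS circuit is commonly approximated as shown below. This is a standard textbook approximation added for illustration, not a model taken from this ATR; it ignores leakage and short-circuit currents, and the symbols (activity factor $\alpha$, switched capacitance $C$, supply voltage $V_{DD}$, clock frequency $f$) are generic.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
\qquad\Rightarrow\qquad
\frac{P_{\text{dyn}}(0.9\,V_{DD})}{P_{\text{dyn}}(V_{DD})} = 0.9^{2} = 0.81
```

Under this approximation, a 10% reduction in supply voltage at a fixed frequency reduces switching power by about 19%, while a 10% reduction in frequency at a fixed voltage reduces it by 10%.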

# 5.2.5.2.3 Decreased Frequency (5h)

Decreasing the operational frequency for CMOS components will reduce the power dissipation. Decreased frequency must be used with great care: processors in space vehicles are performing real-time tasks, and anything that slows processing enough to violate the processing period can result in loss of data, loss of vehicle function (temporary), or, in the case of the spacecraft control computer, loss of vehicle.<sup>24</sup>

#### 5.2.5.3 Vacuum

Vacuum affects both the useful life of the component and its parametric performance. Mitigations 5j-o discuss options for vacuum effects.

# 5.2.5.3.1 Conservative Derating (5j)

Conservative derating can enable the use of COTS components that are sensitive to vacuum (such as RF components). A relatively new component also sensitive to vacuum is the polymer tantalum capacitor (PTC). Mitigation for the PTC is the conservative application of applied voltage relative to its voltage rating.<sup>24</sup>

#### 5.2.5.3.2 Increased Physical Spacing (5k)

Usage of COTS components in a vacuum may require additional spacing due to voltage gradients and non-traditional packaging that may poorly inhibit arcing (voltages above 12V). Another potentially vacuum-sensitive feature is physical separation (often applicable for power supplies). Small separation distances in a circuit that are acceptable at ambient pressure may be vulnerable in a vacuum due to plasma arcing (tin whiskers from COTS components). Separation of prime and redundant assemblies (physical, barrier) may also apply.<sup>24</sup>

# 5.2.5.3.3 No "Golden" Nodes (5l)

It is recommended that a COTS component NOT be part of any "golden nodes" (nodes where there is no redundancy) due to non-hermeticity of the high voltage package (like a non-hermetically sealed FLASH memory, depending on the internal programming voltage) or a single signal trace.<sup>24, 26</sup>

#### 5.2.5.3.4 Encapsulation (5m)

Due to the packaging type used for the COTS component, it may be necessary to encapsulate the component to mitigate vacuum effects. Plastic components are known to outgas contaminants from the plastic. While this may not be an issue for the component itself, the contaminants may coat the surfaces of the vehicle's optics (telescopes, star-trackers, and laser communication interfaces). Encapsulation is a component level mitigation.<sup>24</sup>

### 5.2.5.3.5 Deployable Covers for Optics (5n)

An architectural mitigation is to have deployable covers for all sensitive optics that are opened well after the contaminants have dispersed away from the vehicle, which may take some time (days to weeks) and adds complexity to the overall vehicle (more equipment, fault management, and CONOPS).<sup>24</sup>

### 5.2.5.3.6 Unit Thermal Vacuum Test (5o)

An effective way to ensure the COTS component will function as intended is to perform a unit level thermal vacuum test for units that have vacuum sensitive COTS components or small thermal margins.<sup>21</sup>

#### 5.2.5.4 EMC/EMI

Usage of COTS components in an electromagnetic environment may affect their function or their useful life. Components such as MRAMs may be affected by EMC/EMI. Mitigations 5p-r discuss options for sensitive components.

#### 5.2.5.4.1 Shielding (5p)

One way to protect a component or unit is to provide an external shield (a Faraday cage) to prevent electromagnetic charge buildup near the component.

# 5.2.5.4.2 Grounding (5q)

Another way to protect a component or unit is to provide a good ground to the package or the unit enclosure.

#### 5.2.5.4.3 Power Strobing - Filtering (5r)

One way to minimize the effects of EMI on a component is to leave it powered off except when used. In the case of memories (like MRAM, EEPROM, FLASH), program refresh should be done during periods of light electromagnetic activity. Filtering may be an option for components (or units) that have bus power inputs, so that they can operate with typical space vehicle bus transients.

# 6. Flow Usage Examples

Three examples of how to use the mitigation flows for real world COTS component issues are detailed below. These are intended as an aid for navigating the mitigation flow and detailed mitigations, using the examples discussed early in Section 5.

# 6.1 COTS Example 1 On Board Computer Controller (OBCC) ~1988

Using the flow in Figure 6, the L65400 1750A processor required SEU mitigations. Figure 21 shows the mitigations in the flow that were used to completely mitigate the poor SEU performance. The first pass was to implement the 1g mitigation (periodic refresh period). This by itself was insufficient, as the internal registers of the processor may contain incorrect values. A second pass through the logic used additional mitigations. The additional mitigations investigated were SEFI mitigations, and two of them were implemented, 1n (local refresh) and 1o (component reset), to fully address the processor SEU performance and its potential implications on the system. These three mitigations were not enough, as the state machines that govern activity on the bus and internal pipeline operations are not reset. To monitor those functions, a watchdog timer (WDT) was used (1o). It was required that the OBC send a command through the OBCC to another unit to reset the WDT. If the WDT was not reset, the monitoring unit would swap the OBCC. Implementing these mitigations had impacts on other subsystems (FSW, telemetry formats) on the vehicle. These impacts to other subsystems should be investigated early to fully capture the scope of the COTS component mitigations. In addition to the mitigations shown, multiple best practices from Table 2 were also considered: numbers 3, 31, 32, 33, 35, 43, and 55.
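The watchdog behavior described in this example can be sketched as below. This is a simplified model for illustration only: the `WatchdogMonitor` class, the tick-based timeout, and the "A"/"B" side labels are assumptions, not details of the actual OBCC design.

```python
class WatchdogMonitor:
    """Hypothetical model of the monitoring unit's watchdog for the OBCC."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks   # assumed timeout, in timer ticks
        self.ticks_since_reset = 0
        self.active_side = "A"               # assumed labels for the two OBCC sides

    def reset(self) -> None:
        """Called when the OBC's WDT reset command arrives through the OBCC."""
        self.ticks_since_reset = 0

    def tick(self) -> str:
        """Advance one timer tick and swap the OBCC if the WDT has expired."""
        self.ticks_since_reset += 1
        if self.ticks_since_reset >= self.timeout_ticks:
            # No reset command arrived in time: swap to the other OBCC side.
            self.active_side = "B" if self.active_side == "A" else "A"
            self.ticks_since_reset = 0
        return self.active_side


wdt = WatchdogMonitor(timeout_ticks=3)
wdt.tick()
wdt.reset()          # a healthy OBCC path delivers the reset command
print(wdt.tick())    # active side after one more tick
```

Because the reset command must travel through the OBCC, a hung bus state machine or pipeline stops the resets and the timeout triggers the swap.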



Figure 21. OBCC processor SEU mitigation example.

#### 6.2 COTS Example 2 Various programs – EEPROM usage ~2003

Using the flow in Figure 6, the EEPROM required multiple mitigations. Figure 22 shows the mitigations in the flow that were used to completely mitigate the multiple EEPROM issues. While there was a total dose issue with the EEPROM, the use of shielding was marginal (the height of the component limited the shielding thickness), so power strobing was used (mitigation 1b) to increase the TD performance; the EEPROM was much more TD tolerant when powered OFF. The power ON/OFF circuit to the EEPROM must also hold the RES input to less than 0.25V during power ON/OFF. Most reset circuits needed to be modified to accommodate this constraint. Since this component is non-volatile, there is a data retention period to observe. Little is known about the fabrication details from Hitachi (Hitachi sold the die, then stopped supporting the die). Given the 20-year stated datasheet retention, with no process details, many contractors were conservative in specifying a period to perform an on-orbit refresh, anywhere from 5 to 10 years, depending on the application (mitigation 1g). Another mitigation used for the EEPROM was at the board/Flight SoftWare (FSW) level: multiple images were programmed into the EEPROM, so that if one was corrupted, another image (the same content) could be used (mitigation 1e). The SBC had Error Detection and Correction (EDAC), so a single bit error was corrected when read by the processor, but was not corrected in the EEPROM itself. A swap was initiated if multiple locations showed Single Bit Errors (SBEs). This is mitigation 1s. In addition to the mitigations shown, multiple best practices from Table 2 were also considered: numbers 3, 8, 10, 20, 31, 32, 33, and 55.
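The multiple-image mitigation (1e) can be illustrated with a short sketch. The source does not state how a corrupted image was detected; the CRC-32 check used here, along with the function names, is an assumption chosen for illustration.

```python
import zlib


def store_image(payload: bytes) -> bytes:
    """Append a CRC-32 to the payload so later corruption can be detected."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def first_good_image(images) -> bytes:
    """Return the payload of the first stored copy whose CRC still matches.

    Raises RuntimeError if every copy fails its check.
    """
    for image in images:
        payload, stored_crc = image[:-4], image[-4:]
        if zlib.crc32(payload).to_bytes(4, "big") == stored_crc:
            return payload
    raise RuntimeError("all stored images are corrupted")


good = store_image(b"flight software image")
corrupted = bytearray(good)
corrupted[0] ^= 0x01                              # simulate a single bit flip
print(first_good_image([bytes(corrupted), good])) # falls back to the second copy
```

Selecting a good copy at boot does not repair the corrupted one; a periodic refresh (mitigation 1g) would be needed to rewrite it.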



Figure 22. EEPROM mitigation example.

#### 6.3 COTS Example 3 Computer and Data Electronics (CDE)

The CDE had several Power On Reset (POR) requirements that required both the 3.3V and 5V rails to be monitored. During power up, the POR circuit was required to provide a delay so that the oscillator was guaranteed to be oscillating (10 ms following voltage stability), hold off the reset until all voltages were stable, provide a very low output reset signal to the EEPROM RES input, and function at voltages where the IO components may start to conduct (due to a requirement that the unit not glitch the command output signals or potentially send a false command to the vehicle). A low voltage LM139A was chosen. The datasheet from National Semiconductor stated that their LM139A was designed to work down to 2V VCC. This feature was not tested by National Semiconductor, but the datasheet stated VCC minimum was 2V. The tested VCC minimum was +5V. Components were purchased and tested in house for VCC performance. Radiation testing showed the LM139A to be sensitive to TD radiation (failed the program's 2X radiation requirement). This was the path to mitigation 1a, shown in Figure 23. Components from this lot were selected and installed into an evaluation board to characterize the component's performance. This was the path to mitigation 3h, shown in Figure 24. More iterations through the flowchart were required to satisfy the engineer of the component's performance in the application. The board design was modified to provide a test point at board level to assess the POR performance during board testing (engineering model and flight). This was the 3j path through the flow. The final mitigation, to ensure the design was not degrading through both EM and flight unit testing, was mitigation 3k. In addition to the mitigations shown, multiple best practices from Table 2 were also considered: numbers 3, 8, 10, 20, 31, 32, 33, 34, 37, and 55. Component specific best practices were also considered from Table 12, numbers 2 and 3.
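The reset hold-off requirement can be expressed as a small behavioral model. This is a sketch under stated assumptions, not the CDE circuit: the function name, the 1 ms sampling interval, and the rail thresholds passed in by the caller are hypothetical; only the 10 ms delay after voltage stability comes from the example above.

```python
def por_reset_trace(samples, v33_min, v5_min, delay_ms=10):
    """Return, per 1 ms sample, whether reset must still be asserted.

    samples  - sequence of (v33, v5) rail voltages, one pair per millisecond
    v33_min  - assumed stability threshold for the 3.3V rail
    v5_min   - assumed stability threshold for the 5V rail
    delay_ms - hold-off after both rails are stable (10 ms in the example)
    """
    stable_ms = 0
    trace = []
    for v33, v5 in samples:
        if v33 >= v33_min and v5 >= v5_min:
            stable_ms += 1
        else:
            stable_ms = 0            # any dip restarts the hold-off delay
        trace.append(stable_ms < delay_ms)
    return trace


ramp = [(0.0, 0.0)] * 2 + [(3.3, 5.0)] * 12
print(por_reset_trace(ramp, v33_min=3.0, v5_min=4.5))
```

A model like this can be compared against the measurements taken at the board-level test point to check that the hold-off does not degrade between EM and flight unit testing.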



Figure 23. Example 3 flow example (1 of 2).



Figure 24. Example 3 flow example (2 of 2).

# 7. References

- 1. Calvelli, F, Three Years or less From Contract Start to Launch A Simple Formula to go Fast in Space Acquisition, April 2023
- 2. NESC-RP-19-01490 Recommendations on Use of COTS Guidance for NASA Missions Phase II (11-10-22 NRB) RP FINAL (002)
- 3. Yarbrough, A., "Got Reliability? Off the Shelf Parts for Resilient Space," OTR-2019-00058, The Aerospace Corporation, April 2019
- 4. OTR-2019-01255-A-Quick-Reference-Sheet-for-Risk-Tolerant-Space-Missions-Using-Alternate-Grade-Electronics
- 5. OTR-2017-00845 Key Questions to Ask When Considering Alternate-Grade EEE (Electrical, Electronic and Electromechanical) Parts for Small Satellite Missions
- 6. Braun, B., "A Class Agnostic Mission Assurance Approach," TOR-2021-00133, The Aerospace Corporation, January 2021
- 7. Yarbrough, A., "E.P.I.C. (Enterprise, Partnerships, Innovation, Culture) Speed Electronics: A Pre-RFP Application Guide for Alternate-Grade Electronics in Small Satellites and Resilient Systems," TOR-2020-01447, The Aerospace Corporation, September 2020
- 8. Leitner, J., "Some not-so-intuitive lessons from key on-orbit failures and anomalies since 2005," NASA, May 2021
- 9. Leitner, J., "Assurance of electronic parts for aerospace system reliability: Past, present, and future," Quality Engineering, DOI: 10.1080/08982112.2021.2021423, January 2022
- 10. Yarbrough, A., "Alternate Grade Electronics-Based Card, Slice and Unit Level Characterization Considerations – Decision Flow Concept," OTR-2022-00670, The Aerospace Corporation, April 2022.
- 11. Yarbrough, A., "PMPedia: A Crowd-Sourced Alternate-Grade Electronics Space Radiation Knowledge Repository," OTR-2020-00038, The Aerospace Corporation, November 2019, https://pmpedia.space/
- 12. Hogan, S., "Effective Fault Management Guidelines," TOR-2009(8591)-14, The Aerospace Corporation, June 2009.
- 13. K. Boughton, "The Truth About Processor 'Degradation'," Anandtech Online Journal, March 5, 2008, https://www.anandtech.com/show/2468/6
- 14. Jyothi B. Velamala, "Statistical Aging under Dynamic Voltage Scaling," Proceedings of the IEEE 2012 Custom Integrated Circuits Conference, 2012
- 15. "Prediction of NBTI Degradation in Dynamic," IEEE TRANSACTIONS ON DEVICE AND MATERIALS RELIABILITY, 2016.
- 16. NASA recommendation on the use of COTS NASA-TM-20205011579
- 17. Managing Commercial-Off-the-Shelf (COTS) Based Space Hardware Risk (Aerospace Corp OTR 2023-00559)
- 18. M. Chen, "Aging sensors for workload centric guardbanding in dynamic voltage scaling applications," 2013 IEEE International Reliability Physics Symposium IRPS, pp. 4A.2.1-4A.2.5, 2013
- 19. D. Roberts, "Error analysis for the support of robust voltage scaling," Sixth international symposium on quality electronic design, pp. 65-70, 2005
- 20. A. Yarborough, "COTS Card Unit Level Characterization Guidance" 2023
- 21. TOR-2021-02290\_FAQs when Considering COTS
- 22. Y. Masuda, "MTTF-aware design methodology for adaptive voltage scaling," 2018 China Semiconductor Technology International Conference (CSTIC), pp. 1-4, 2018
- 23. OTR-2023-00627 Design Integrity Guidelines (DIG) for Alternate Grade Parts (AGP) in Space Power Electronics

- 24. ATM-2023-02205 COTS in Space
- 25. HALT-Test-For-New-Reliability-Approach-In-New-Space
- 26. TOR-2018-00616 Separation and Isolation Guidelines for Printed Wiring Boards Intended for Space Use
- 27. Tin Whisker Mitigation Methods
- 28. P. Lavery, Strategies to mitigate the tin whisker phenomenon, Vicor
- 29. Dakai Chen, "Radiation Qualification of Flash Memories" NEPP Electronic Technology Workshop, June 11-12, 2013
- 30. J. Ranaudo, OTR-2021-01017 Traditional Parts Policy as it Relates to the Selection of EEEE Parts for Space, August 2021

Uncited Useful References

SAE J3168 - Reliability Physics Analysis of Electrical, Electronic, and Electromechanical Equipment, Modules and Components

SAE J2816 Guide for Reliability Analysis Using the Physics-of-Failure Process

SAE J2940 Use of Model Verification and Validation in Product Reliability and Confidence Assessments

ARP6338 Process for Assessment and Mitigation of Early Wearout of Life-Limited Microcircuits

JEDEC JESD94 Application Specific Qualification Using Knowledge Based Test Methodology

JEDEC JESD91 Method for Developing Acceleration Models for Electronic Component Failure Mechanisms

T. Katz, "Evaluation of COTS Hardware Assemblies for use in Risk Averse, Cost Constrained Spacebased Systems" 2019.

# 8. Acronyms

| Acronym | Definition                                              |
|--------|----------------------------------------------------------|
| ADC    | Analog to Digital Converter                              |
| AEC    | Automotive Electronics Council                           |
| AGP    | Alternate Grade Part                                     |
| ASIC   | Application Specific Integrated Circuit                  |
| ATP    | Authorization To Proceed                                 |
| ATR    | Aerospace Technical Report                               |
| BGA    | Ball Grid Array                                          |
| BiCMOS | Bipolar Complementary Metal Oxide Semiconductor          |
| BJT    | Bipolar Junction Transistor                              |
| BLS    | Board Level Screening                                    |
| BOL    | Beginning of Life                                        |
| CCA    | Circuit Card Assembly                                    |
| CCD    | Charge Coupled Devices                                   |
| CGA    | Column Grid Array                                        |
| CMOS   | Complementary Metal Oxide Semiconductor                  |
| COTS   | Commercial Off the Shelf                                 |
| CRC    | Cyclic Redundancy Check                                  |
| CSAM   | Confocal Scanning Acoustic Microscopy                    |
| DMSMS  | Diminishing Manufacturing Sources and Material Shortages |
| DPA    | Destructive Parts Analysis                               |
| EEE    | Electric, Electronic, and Electromechanical              |
| EEPROM | Electrically Erasable Programmable Read Only Memory      |
| EDAC   | Error Detection and Correction                           |
| ELDRS  | Enhanced Low-Dose Rate Sensitivity                       |
| EM     | Engineering Model                                        |
| EMC    | Electromagnetic Compatibility                            |
| EMI    | Electromagnetic Interference                             |
| EOL    | End of Life                                              |
| ESD    | Electro-Static Discharge                                 |
| FFF    | Form, Fit, Function                                      |
| FIT    | Failure in Time                                          |
| FLT    | Flight                                                   |
| FMECA  | Failure Modes and Effect Criticality Analysis            |
| FOD    | Foreign Object Debris                                    |
| FPGA   | Field Programmable Gate Array                            |
| FSW    | Flight SoftWare                                          |
| HALT   | Highly Accelerated Life Testing                          |
| HAST   | Highly Accelerated Stress Testing                        |
| HCI    | Hot Carrier Injection                                    |
| HEMT   | High Electron Mobility Transistor                        |
| HTOL   | High Temperature Operational Life                        |
| IP     | Intellectual Property                                    |
| IPC    | Institute of Printed Circuits                            |
| ILPM   | Industry Leading Part Manufacturers                      |
| JTAG   | Joint Test Action Group                                  |
| KPP    | Key Performance Parameters                               |
| LDC    | Lot Date Code                                            |
| LU     | Latchup                                                  |
| MLCC   | Multi-Layer Ceramic Capacitor                                  |  |
| MOSFET | Metal Oxide Semiconductor Field Effect Transistor              |  |
| MRAM   | Magneto-resistive Random-Access Memory                         |  |
| MSIW   | Mission Success Information Workshop                           |  |
| NASA   | National Aeronautics and Space Administration                  |  |
| NDA    | Non-Disclosure Agreement                                       |  |
| NDPA   | Non-Destructive Parts Analysis                                 |  |
| NESC   | NASA Engineering and Safety Center                             |  |
| NMI    | Non-Maskable Interrupt                                         |  |
| OBC    | On-Board Computer                                              |  |
| OBCC   | On-Board Computer Controller                                   |  |
| OBFP   | On-Board Fault Protection                                      |  |
| PIND   | Particle Impact Noise Detection                                |  |
| POH    | Power On Hours                                                 |  |
| POR    | Power On Reset                                                 |  |
| PPAP   | Production Part Approval Process                               |  |
| PTC    | Polymer Tantalum Capacitor                                     |  |
| PWB    | Printed Wiring Board                                           |  |
| RC     | Resistor/Capacitor                                             |  |
| RF     | Radio Frequency                                                |  |
| RGA    | Residual Gas Analysis                                          |  |
| RHA    | Radiation Hardness Assurance                                   |  |
| RHBSW  | Radiation Hardened by SoftWare                                 |  |
| RoHS   | Restriction of Hazardous Substances                            |  |
| SAA    | South Atlantic Anomaly                                         |  |
| SAM    | Scanning Acoustic Microscopy                                   |  |
| SBC    | Single Board Computer                                          |  |
| SBE    | Single Bit Error                                               |  |
| SEB    | Single Event Burnout                                           |  |
| SEE    | Single Event Effect                                            |  |
| SEM    | Scanning Electron Microscopy                                   |  |
| SEFI   | Single Event Functional Interrupt                              |  |
| SEGR   | Single Event Gate Rupture                                      |  |
| SET    | Single Event Transient                                         |  |
| SEU    | Single Event Upset                                             |  |
| SMCR   | Standard Military Cross Reference Matrix (found at https://landandmaritimeapps.dla.mil/Programs/Smcr/lookup.aspx) |  |
| SMD    | Surface Mount Device                                           |  |
| SOA    | Safe Operating Area                                            |  |
| SOC    | System on a Chip                                               |  |
| SOTA   | State of The Art                                               |  |
| SPC    | Statistical Process Control                                    |  |
| SPF    | Single Point Failure                                           |  |
| SDRAM  | Synchronous Dynamic Random Access Memory                       |  |
| SRAM   | Static Random Access Memory                                    |  |
| SWAP   | Size, Weight, And Power                                        |  |
| TED    | Thermal, Electrical, Dynamics                                  |  |
| TD     | Total Dose                                                     |  |
| TDDB   | Time Dependent Dielectric Breakdown                            |  |
| TLYF   | Test Like You Fly                                              |  |
| TMR    | Triple Modular Redundancy                                      |  |
| USG    | United States Government                                       |  |
| VCC    | Voltage Common Collector                                       |  |
| VDD    | Voltage Drain                                                  |  |
| Vf     | Forward Voltage                                                |  |
| Vr     | Rated Voltage                                                  |  |
| WDT    | Watch Dog Timer                                                |  |
| WLCSP  | Wafer Level Chip Scale Packaging                               |  |

# Expanding Space Design Options Using COTS

Cognizant Program Manager Approval:

Barbara M. Braun, PRINCIPAL DIRECTOR, CORPORATE CHIEF ENGINEERS OFFICE, OFFICE OF EVP

Aerospace Corporate Officer Approval:

Mark J. Silverman, CHIEF ENGINEER/GENERAL MANAGER, OFFICE OF EVP

Content Concurrence Provided Electronically by:

Steven L. Hogan, SENIOR PROJECT LEADER, DIGITAL & INTEGRATED CIRCUIT ELECT DEPT, ELECTRONICS ENGINEERING SUBDIVISION, ENGINEERING & TECHNOLOGY GROUP

Office of General Counsel Approval Granted Electronically by:

Kien T. Le, ASSISTANT GENERAL COUNSEL, OFFICE OF THE GENERAL COUNSEL, OFFICE OF GENERAL COUNSEL & SECRETARY

© The Aerospace Corporation, 2023.

All trademarks, service marks, and trade names are the property of their respective owners.

SH0071

Export Control Office Approval Granted Electronically by:

Angela M. Farmer, SECURITY SUPERVISOR, GOVERNMENT SECURITY, SECURITY OPERATIONS, OFFICE OF THE CHIEF INFORMATION OFFICER