# Impact and Mitigation of SRAM Read Path Aging

Innocent Agbo<sup>a</sup>, Mottaqiallah Taouil<sup>a</sup>, Daniël Kraak<sup>a</sup>, Said Hamdioui<sup>a</sup>, Pieter Weckx<sup>a,b</sup>, Stefan Cosemans<sup>b</sup>, Francky Catthoor<sup>a,b</sup>, Wim Dehaene<sup>c</sup>

<sup>a</sup>Delft University of Technology, Faculty of Electrical Engineering, Mathematics and CS, Mekelweg 4, 2628 CD Delft, The Netherlands <sup>b</sup>IMEC vzw, Kapeldreef 75, 3001 Leuven, Belgium <sup>c</sup>Katholieke Universiteit Leuven, ESAT, Belgium

#### Abstract

This paper proposes an appropriate method to estimate and mitigate the impact of aging on the read path of a 32 nm high performance SRAM design; it analyzes the impact of the memory cell and the sense amplifier (SA), and their interaction. The method considers different workloads and inspects both the bit-line swing (which reflects the degradation of the cell) and the sensing delay (which reflects the degradation of the sense amplifier); the voltage swing on the bit lines has a direct impact on the proper functionality of the sense amplifier. The quantification results show that, for the considered design, the contribution of the cell to the read path degradation is marginal compared with that of the sense amplifier, while the sensing delay degradation strongly depends on the workload, supply voltage and temperature (up to 41% degradation). The mitigation schemes, one targeting the cell and one the sense amplifier, confirm the same and show that sense amplifier mitigation is more effective for the SRAM read path than cell mitigation.

Keywords: Bit-line swing, sensing delay, BTI, SRAM sense amplifier

## 1. Introduction

CMOS technology scaling is well known for posing crucial reliability challenges for electronic devices [1, 2, 3]; e.g., it reduces their lifetime. A general practice in industry is to use conventional guard-bands and apply extra design margins to counteract the Bias Temperature Instability (BTI) effect. Accurate estimation of this effect is vital for achieving an optimal design. Clearly, an electronic system comprises various parts; hence, accurate BTI estimation requires evaluating not only *all* the various parts of the system, but also the way they interact with each other and how they all contribute to the overall degradation of the system. For example, when it comes to SRAMs (the subject of this paper), estimating the effect of BTI by focusing only on the memory array, or by merely combining the individual effects of each component, will lead to optimistic or pessimistic results.

Several publications have investigated the impact of reliability on individual SRAM components. Kumar *et al.* [4] and Andrew [5] analyzed the impact of Negative Bias Temperature Instability (NBTI) on the read stability and the Static Noise Margin (SNM) of SRAM cells. Bansal *et al.* [6] presented insights on the stability of an SRAM cell under worst-case conditions and analyzed the effect of NBTI and PBTI (positive BTI). Khan *et al.* [7] performed BTI analysis of FinFET based memory cells for different SRAM designs using SNM, Read Noise Margin (RNM) and Write Triple Point (WTP) as metrics. Menchaca *et al.* [8] analyzed the BTI impact on different sense amplifier designs implemented in a 32 nm technology node, using failure probability (i.e., flipping to a wrong value) as a reliability metric. Agbo *et al.* [9, 10, 11, 12, 13, 14, 15, 16] investigated the BTI impact on SRAM drain-input and standard latch-type sense amplifier designs, while considering process, supply voltage, and temperature (PVT) variations in the presence of varying workloads and technology nodes. Other research focused on mitigation schemes. For example, Kraak [17, 18] investigated the mitigation of SA offset voltage degradation by considering periodic input switching. Gebregiorgis [19] investigated a low cost self-controlled bit-flipping scheme which reverses all bit positions w.r.t. an existing bit.

\*Corresponding author

Email address: i.o.agbo@tudelft.nl (Innocent Agbo)

From the above, we conclude that little work has been published on aging that takes into account all the memory components, and thus their interactions, as well as the effect of mitigation methodologies on the whole memory. Li [20] studied the lifetime estimation of each individual transistor for the *entire* SRAM and for various reliability mechanisms (i.e., HCI, TDDB, NBTI). However, this investigation did not consider the workload, which has been demonstrated to have a large effect on the degradation rates [21, 22]. In our previous work [23], we analyzed the impact of aging on the read path of a 32 nm high performance SRAM design for different workloads. However, the impact of aging on the memory read path under different supply voltages, temperatures, and device drive strengths, based on the bit-line swing (BLS), sensing delay (SD), and energy (E) metrics, is yet to be explored. In addition, effective mitigation schemes have not been proposed. The above clearly shows that an appropriate approach (one that accurately predicts the impact of aging, workloads, and PVT) is needed. Such an analysis is crucial to help memory designers understand which memory parts to focus on during design in order to obtain an optimal and reliable design.

In this paper, we take a step in this direction and propose an *accurate* method to estimate the impact of Bias Temperature Instability (BTI) on the read path consisting of an SRAM cell and a sense amplifier (SA). This enables not only optimal designs (in terms of design margins), but also the development of appropriate design-for-reliability schemes. The proposed method uses the *Atomistic Model* for aging (which is a calibrated BTI model [24, 25]) and considers the *workload dependency* (as the aging variations are strongly workload dependent [21, 22]). To measure the impact of both the cell and the SA, appropriate workloads are defined, and the bit-line voltage swing, SA sensing delay, and energy are used as metrics. In addition, we analyze different mitigation schemes and their effectiveness.

The rest of the paper is organized as follows. Section 2 provides the SRAM simulation model and explains the BTI mechanism and its model. Section 3 provides the analysis framework and the performed experiments. Section 4 analyzes the impact of aging on the read path. Section 5 proposes and evaluates the mitigation schemes. Finally, Section 6 discusses the results and Section 7 concludes this paper.

## 2. Background

This section briefly presents the simulation model, which consists of the critical SRAM components in the read path. It then discusses the BTI mechanism and its model.

#### 2.1. Simulation model

Figure 1 shows the simulation model, which is divided into four parts (i.e., the precharge circuitry, the 6T cell, the SA precharge and the SA). The W/L ratio of each transistor considered for aging is included in the figure. Capacitances are also added to the bit-lines to model the impact of the other cells sharing the same column as the simulated cell. Here we assume a $512 \times 128$ memory array. During a read operation, the bit lines are first precharged (using the precharge circuit), and thereafter one of the bit lines is discharged through one of the pull-down transistors of the SRAM 6T cell. The voltage difference/swing is then amplified by the SA to produce the output.



Figure 1: Simulation setup.

The SA precharge is used to precharge and equalize the data-lines DL and DLBar to identical voltages before the SA amplifies a small voltage difference between BL and BLBar during read operations and produces the output at Out (DL) and Outbar (DLBar). The positive feedback loop (created by cross-coupled inverters) ensures a low amplification time and produces the read value at its output. Because the considered design is high performance, the cell has strong pull-down transistors to speed up the formation of the swing between the bit-lines during a read operation.

It is worth noting that only aging in the cell and the sense amplifier is considered; the cell precharge circuit and the SA precharge circuit are ignored due to their relatively large transistor sizes (i.e., they are less affected by BTI).

#### 2.2. Bias Temperature Instability

The Bias Temperature Instability (BTI) mechanism takes place inside MOS transistors and increases the absolute $V_{\rm th}$ value of the transistors [26, 27]. The $V_{\rm th}$ increment in a PMOS transistor occurs under *negative* gate *stress* and is referred to as NBTI, while in an NMOS transistor this occurs under *positive* gate *stress* and is known as PBTI. Note that for a MOS transistor, there are two BTI phases, i.e., the stress phase and the relaxation phase.

Extensive efforts have been made to understand and model BTI appropriately [26, 27, 28]. The two best-known models are the reaction-diffusion (RD) model proposed by Alam *et al.* [26], and the atomistic model proposed by Kaczer *et al.* [24]; the first is deterministic and the second is probabilistic. In this work, we use the atomistic model as it produces more accurate results than the RD model [29]. The atomistic model is based on the capture and emission of single traps during the stress and relaxation phases of NBTI/PBTI, respectively. The threshold voltage shift $\Delta V_{th}$ of the device is the accumulated result of all capture and emission events of carriers in gate oxide defect traps. The probabilities of defect occupancy in case of capture, $P_C$, and emission, $P_E$, are defined by [29]:

$$P_C(t_{STRESS}) = \frac{\tau_e}{\tau_c + \tau_e} \left\{ 1 - \exp \left[ -\left(\frac{1}{\tau_e} + \frac{1}{\tau_c}\right) t_{STRESS} \right] \right\} \tag{1}$$

$$P_E(t_{RELAX}) = \frac{\tau_c}{\tau_c + \tau_e} \left\{ 1 - \exp \left[ -\left(\frac{1}{\tau_e} + \frac{1}{\tau_c}\right) t_{RELAX} \right] \right\} \tag{2}$$

where $\tau_c$ and $\tau_e$ are the mean capture and emission time constants, and $t_{STRESS}$ and $t_{RELAX}$ are the stress and relaxation periods, respectively. Furthermore, the BTI-induced $V_{th}$ shift is an integral function of the Capture Emission Time (CET) map [7], the workload, the duty factor and the transistor dimensions, which give the mean number of available traps in each device. The model also includes the impact of temperature [24, 25].
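To make Eqs. (1) and (2) concrete, the following minimal Python sketch evaluates both probabilities for a single trap. The function names and the time constants are hypothetical and chosen only for illustration; they are not taken from the calibrated model in [24, 25].

```python
import math


def capture_probability(t_stress, tau_c, tau_e):
    """Eq. (1): occupancy probability of one trap after t_stress seconds of stress."""
    rate = 1.0 / tau_e + 1.0 / tau_c
    return tau_e / (tau_c + tau_e) * (1.0 - math.exp(-rate * t_stress))


def emission_probability(t_relax, tau_c, tau_e):
    """Eq. (2): emission probability of one trap after t_relax seconds of relaxation."""
    rate = 1.0 / tau_e + 1.0 / tau_c
    return tau_c / (tau_c + tau_e) * (1.0 - math.exp(-rate * t_relax))


# Hypothetical mean capture/emission time constants in seconds (illustration only).
TAU_C = 1.0e3
TAU_E = 1.0e5

for t in (1.0, 1.0e3, 1.0e8):
    print(t, capture_probability(t, TAU_C, TAU_E), emission_probability(t, TAU_C, TAU_E))
```

For long periods the exponential term vanishes, so $P_C$ saturates at $\tau_e/(\tau_c+\tau_e)$ and $P_E$ at $\tau_c/(\tau_c+\tau_e)$; the two limits sum to one.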

## 3. Analysis Framework

This section presents the analysis framework and the conducted experiments.

#### 3.1. Framework Flow

Figure 2 depicts our generic simulation framework to evaluate the BTI impact on the cell and SA designs. It uses the Spectre simulator and has the following components.

**Input:** The general input blocks of the framework are the technology library, cell and sense amplifier design, and BTI input parameters.

- Technology library: In this work we use only the 32 nm PTM library [30]. Note that in general any library card can be used.
- Cell and SA designs: Generally, any memory cell and sense amplifier design can be used. In this paper, we focus only on the design in Figure 1. The 6T cell and SA designs are described by a SPICE netlist.



Figure 2: Analysis framework.

- BTI parameters: The BTI induced degradation strongly depends on the stress time duration, hence on the workload. The workload sequence is assumed to be replicated until the age time is reached. To define the workloads for our analysis, we assume two extreme workloads for the cell's state: (i) 80% zeros, where the cell holds a zero in 80% of the cycles, and (ii) 20% zeros. Similarly, we assume two workloads for the SA: (i) 80% of the instructions are reads, and (ii) 20% of the instructions are reads. Based on this information, we derive four workload sequences for circuit simulation:
  - S1: 20% zeros and 80% read instructions for the SA.
  - S2: 20% zeros and 20% read instructions for the SA.
  - S3: 80% zeros and 80% read instructions for the SA.
  - S4: 80% zeros and 20% read instructions for the SA.

Using the waveform of the read operation and the workload sequences, we extract the duty factor of each transistor individually.
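The four sequences are simply the cross product of the two cell workloads and the two SA workloads. The short Python sketch below enumerates them; the dictionary layout and key names are our own illustrative choices, and the per-transistor duty factor extraction (which depends on the read waveform) is not modeled here.

```python
from itertools import product

# Fraction of cycles in which the cell holds a zero, and fraction of
# instructions that are reads, as defined for the workloads above.
CELL_ZERO_FRACTIONS = (0.20, 0.80)
SA_READ_FRACTIONS = (0.80, 0.20)

# product() yields (0.20, 0.80), (0.20, 0.20), (0.80, 0.80), (0.80, 0.20),
# which matches the order S1, S2, S3, S4.
SEQUENCES = {
    f"S{index}": {"cell_zero_fraction": zeros, "sa_read_fraction": reads}
    for index, (zeros, reads) in enumerate(
        product(CELL_ZERO_FRACTIONS, SA_READ_FRACTIONS), start=1
    )
}

for name, workload in SEQUENCES.items():
    print(name, workload)
```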

**Processing:** Based on the inputs (i.e., technology, design, BTI parameters etc.), a Perl control script generates several instances of the BTI augmented SRAM cell and/or sense amplifier, depending on the simulation case (see Section 3.2). Every generated instance has a distinct number of traps [24] (with their unique timing constants) in each transistor, which are incorporated in a Verilog-A module of the cell netlist only, the SA netlist only, or both the cell and SA netlists. The module responds to every trap individually and alters the affected transistor parameters, such as $V_{th}$. After inserting BTI in every transistor of either the coupled design or the individual designs, a Monte Carlo (MC) simulation is performed at different time steps (100 runs at each time step), where the circuit simulator (Spectre) is used to investigate the BTI impact.

Figure 3: Metric diagram of (a) Bit-line swing BLS and (b) Sensing delay SD.
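The Monte Carlo step can be illustrated with a rough Python sketch that draws 100 instances of one transistor and accumulates a threshold voltage shift per instance. Everything numerical here is an assumption made for illustration and is not taken from the framework: the Poisson-distributed trap count, the log-uniform time constants, the exponentially distributed per-trap shift, and continuous (DC) stress instead of workload-dependent duty factors.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's method for a Poisson draw; adequate for small means."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def capture_probability(t_stress, tau_c, tau_e):
    """Eq. (1) for a single trap."""
    rate = 1.0 / tau_e + 1.0 / tau_c
    return tau_e / (tau_c + tau_e) * (1.0 - math.exp(-rate * t_stress))


def sample_vth_shift_mv(t_stress, rng, mean_traps=10.0, mean_step_mv=2.0):
    """One MC instance: sum the shifts of the traps that end up occupied."""
    shift = 0.0
    for _ in range(sample_poisson(mean_traps, rng)):
        # Assumed log-uniform time constants between 1 ms and 1e9 s.
        tau_c = 10.0 ** rng.uniform(-3.0, 9.0)
        tau_e = 10.0 ** rng.uniform(-3.0, 9.0)
        if rng.random() < capture_probability(t_stress, tau_c, tau_e):
            # Assumed exponentially distributed per-trap contribution.
            shift += rng.expovariate(1.0 / mean_step_mv)
    return shift


rng = random.Random(0)  # fixed seed so the sketch is repeatable
shifts = [sample_vth_shift_mv(1.0e8, rng) for _ in range(100)]  # 100 runs per time step
mean_shift = sum(shifts) / len(shifts)
print(f"mean shift over {len(shifts)} runs: {mean_shift:.2f} mV")
```

In the actual flow each sampled instance is a full netlist simulated in Spectre; the sketch only mirrors the sampling-and-averaging structure.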

**Output:** Finally, statistical post-analysis of the results is performed in MATLAB for varying supply voltages, temperatures and device drive strengths. The raw outputs are measured directly from Spectre and used to determine the bit-line swing and sensing delay metrics, which are described next.

**Bit-line swing:** The bit-line swing *BLS* is the voltage difference between the bit-lines *BLBar* and *BL* at a fixed reference time $T_{ref}$, i.e., the time at which the up transition of the sense amplifier enable signal reaches 50% of the supply voltage, as shown in Figure 3a.

**Sensing delay:** The sensing delay SD is the time required for the SA to complete its operation; it is the time between the sense enable activation (i.e., when the up transition reaches 50% of the supply voltage) and the falling *out* or *outbar* signal (i.e., when the down transition reaches 50% of  $V_{dd}$ ) as depicted in Figure 3b.
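The two metric definitions can be expressed as a small Python sketch operating on sampled waveforms. The helper names, the linear interpolation between samples, and the synthetic waveform values are illustrative assumptions; the paper extracts these quantities from Spectre outputs.

```python
def crossing_time(times, values, level, rising):
    """First time the waveform crosses `level` in the given direction (linear interpolation)."""
    for i in range(1, len(times)):
        v0, v1 = values[i - 1], values[i]
        if (rising and v0 < level <= v1) or (not rising and v0 > level >= v1):
            frac = (level - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the requested level")


def value_at(times, values, t):
    """Linearly interpolated waveform value at time t."""
    for i in range(1, len(times)):
        if times[i - 1] <= t <= times[i]:
            frac = (t - times[i - 1]) / (times[i] - times[i - 1])
            return values[i - 1] + frac * (values[i] - values[i - 1])
    raise ValueError("time outside the sampled range")


def bit_line_swing(times, bl, blbar, sa_enable, vdd):
    """|BLBar - BL| at T_ref, where SA enable rises through 50% of Vdd."""
    t_ref = crossing_time(times, sa_enable, 0.5 * vdd, rising=True)
    return abs(value_at(times, blbar, t_ref) - value_at(times, bl, t_ref))


def sensing_delay(times, sa_enable, falling_output, vdd):
    """Time from SA enable rising through 50% Vdd to the falling output reaching 50% Vdd."""
    t_start = crossing_time(times, sa_enable, 0.5 * vdd, rising=True)
    t_end = crossing_time(times, falling_output, 0.5 * vdd, rising=False)
    return t_end - t_start


# Synthetic waveforms in arbitrary time units and volts (illustration only).
T = [0.0, 1.0, 2.0, 3.0, 4.0]
SA_EN = [0.0, 0.0, 1.0, 1.0, 1.0]
BL = [1.0, 1.0, 0.9, 0.8, 0.7]
BLBAR = [1.0, 1.0, 1.0, 1.0, 1.0]
OUT = [1.0, 1.0, 1.0, 0.0, 0.0]

print(bit_line_swing(T, BL, BLBAR, SA_EN, vdd=1.0))
print(sensing_delay(T, SA_EN, OUT, vdd=1.0))
```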

#### 3.2. Experiments Performed

In this paper, three sets of experiments are performed that are related to the quantification of aging, where each set consists of three cases: (a) only the cell degrades (Cell-Only), (b) only the SA degrades (SA-Only), and (c) both of them degrade (Combined).

- 1. **BTI Impact Experiments:** BTI impact on the bit-line swing and sensing delay for four workload sequences (*S1*, *S2*, *S3* and *S4*) at nominal supply voltage and nominal temperature is investigated.
- 2. **Supply Voltage Dependent Experiments:** BTI impact on the bit-line swing and sensing delay for varying supply voltages (i.e., from $-10\%$ to $+10\%$ of $V_{dd}$) and two workload sequences, S2 and S3, is investigated. Note that these two sequences present the best and the worst case stresses.
- 3. **Temperature Dependent Experiments:** BTI impact on the bit-line swing and sensing delay for three temperatures (i.e., 233K, 298K and 348K) and two workload sequences, S2 and S3, is explored.

Table 1: BTI impact after $10^8$ s.

| Degradation component | Workload        | Bit-line swing (mV) | Sensing delay (ps) |
|-----------------------|-----------------|---------------------|--------------------|
| Cell-Only             | 20% zero        | 107.0               | 61.09              |
|                       | 80% zero        | 106.3               | 61.20              |
| SA-Only               | 20% read instr. | 111.1               | 61.83              |
|                       | 80% read instr. | 111.6               | 65.71              |
| Combined              | S1              | 107.8               | 66.08              |
|                       | S2              | 107.4               | 62.18              |
|                       | S3              | 107.1               | 66.21              |
|                       | S4              | 106.7               | 62.29              |

## 4. Experimental Results

This section presents the analysis results of the experiments described in the previous section.

#### 4.1. BTI Impact Experiments

Table 1 shows the results for the three cases for a stress period of $10^8$ s; the first column presents the simulated case. 'Cell-Only' denotes the case when only the cell is impacted by BTI, 'SA-Only' when only the SA is impacted, and 'Combined' when both the cell and the SA degrade due to BTI. Note that in the 'Cell-Only' case both the bit-line swing (*BLS*) and the sensing delay (*SD*) are affected, while in the 'SA-Only' case only the *SD* is impacted and the *BLS* should not be affected; in general, the *SD* may increase due to slow bit-line swing development or a slow SA. The table reveals the following for the different cases.

For the 'Cell-Only' case, the BLS is only marginally dependent on the workload, resulting in almost no impact on the SD. This can be explained by the fact that the pull-down transistors of the cell used for this design are very strong (see Figure 1). We take SD = 61.09 ps as the baseline.

For the 'SA-Only' case, the cell is not suffering from BTI; hence, the BLS is not affected and is about 111 mV. The SD, however, is affected and increases for more stressful workloads. The SD at 80% read instructions is $\sim 6\%$ higher than at 20% reads, for which the SD is just 1% more than the baseline.
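As a worked check, the relative SD changes for the 'SA-Only' case follow directly from the Table 1 values (rounded here to one decimal):

$$\frac{65.71 - 61.83}{61.83} \times 100 \approx 6.3\%, \qquad \frac{61.83 - 61.09}{61.09} \times 100 \approx 1.2\%$$

The first expression compares 80% with 20% read instructions; the second compares 20% read instructions with the 61.09 ps baseline.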

For the 'Combined' case, although the BLS is reduced compared with the fresh cell (see the 'SA-Only' case), the dependency of the BLS on the workload is marginal due to the chosen design, as already mentioned. However, as expected, the results show a clear dependency of the SD on the workload; the SD is higher for sequences S1 and S3, which both have 80% read instructions for the SA. At 80% read instructions (S1 and S3), the SD is again $\sim 6\%$ higher than at 20% read instructions (S2 and S4); in the latter case the SD is about 2% more than the baseline. Note that the *relative* increase due to workload is the same as for the 'SA-Only' case.

Figure 4: BTI impact for the four workload sequences.

| Degradation component | Workload        | $V_{dd}$ | Bit-line swing (mV) | Sensing delay (ps) |
|-----------------------|-----------------|----------|---------------------|--------------------|
| Cell-Only             | 20% zero        | -10%     | 76.7                | 74.64              |
|                       |                 | Nom.     | 107.0               | 61.09              |
|                       |                 | +10%     | 136.8               | 52.53              |
|                       | 80% zero        | -10%     | 76.2                | 74.74              |
|                       |                 | Nom.     | 106.3               | 61.20              |
|                       |                 | +10%     | 135.5               | 52.67              |
| SA-Only               | 20% read instr. | -10%     | 79.0                | 75.82              |
|                       |                 | Nom.     | 111.1               | 61.83              |
|                       |                 | +10%     | 143.7               | 53.18              |
|                       | 80% read instr. | -10%     | 79.3                | 80.41              |
|                       |                 | Nom.     | 111.6               | 65.71              |
|                       |                 | +10%     | 144.4               | 58.00              |
| Combined              | S2              | -10%     | 76.9                | 76.28              |
|                       |                 | Nom.     | 107.4               | 62.18              |
|                       |                 | +10%     | 137.5               | 53.92              |
|                       | S3              | -10%     | 76.7                | 80.94              |
|                       |                 | Nom.     | 107.1               | 66.21              |
|                       |                 | +10%     | 136.8               | 58.92              |

Table 2: Voltage degradation dependency after $10^8$ s.

Figure 4 shows how BLS and SD evolve over 3 years of degradation (i.e., $10^8$ s) for the 'Combined' case; each point in the graph corresponds to the average of 100 Monte Carlo simulations. The figure confirms the conclusions extracted from Table 1 and shows that, although the differences in absolute numbers are small for our case study, the slowest SD is obtained when the degradation of both the cell and the SA is considered. Note that the SD tends to grow very fast when the operational lifetime gets close to 3 years ($10^8$ s).



Figure 5: Supply voltage dependency of SD and BLS for S3 sequence.

#### 4.2. Supply Voltage Dependency

Table 2 shows the result of Supply Voltage Dependent Experiments for a stress period of  $10^8 s$ . The table reveals the following.

For the 'Cell-Only' case, similar to the first experiment, the BLS is only marginally dependent on the workload. However, a change in the supply voltage clearly influences both BLS and SD. Increasing the supply voltage accelerates the development of the swing on the bit lines, hence increasing the BLS. This in turn reduces the sensing delay. Conversely, reducing the supply voltage reduces the BLS, which in turn increases the SD. A variation of +10% in supply voltage causes an increase of about 28% in BLS and a reduction of about 14% in SD, while a variation of -10% in supply voltage causes a decrease of almost the same percentage in BLS (28%) and an increase of more than 22% in SD.

For the 'SA-Only' case, although the cell is not suffering from BTI, the supply voltage clearly impacts the BLS. It follows the same trend as for the 'Cell-Only' case. The SD, on the other hand, is both supply voltage and workload dependent. A higher voltage improves (reduces) the SD, while a lower voltage worsens (increases) the SD. A +10% variation in $V_{dd}$ causes a reduction of about 14% in SD, and a -10% variation in $V_{dd}$ causes an increase of about 22% in SD. In addition, although the development of the voltage swing is accelerated at a higher supply voltage, the impact of the workload seems to be slightly higher at a higher supply voltage. For example, at $-10\%\ V_{dd}$ the SD increases from 75.82 ps (for 20% read instructions) to 80.41 ps (for 80% read instructions), an increase of 6%. However, this is about 9% at $+10\%\ V_{dd}$. Note that the impact of supply voltage variation is much more dominant than the impact of BTI; this is due to the sizing of the cell's pull-down transistors (see Section 6).
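The percentages quoted for the supply voltage experiments can be reproduced from the Table 2 entries. The Python sketch below does this for the 'Cell-Only' case (20% zero workload) and for the workload sensitivity of the 'SA-Only' case; the variable and function names are our own.

```python
# Table 2 values after 1e8 s: (bit-line swing in mV, sensing delay in ps).
CELL_ONLY_20_ZERO = {
    "-10%": (76.7, 74.64),
    "nom": (107.0, 61.09),
    "+10%": (136.8, 52.53),
}

# Table 2 sensing delays in ps for the SA-Only case.
SA_ONLY_SD_PS = {
    "20% reads": {"-10%": 75.82, "nom": 61.83, "+10%": 53.18},
    "80% reads": {"-10%": 80.41, "nom": 65.71, "+10%": 58.00},
}


def percent_change(value, reference):
    """Relative change of `value` with respect to `reference`, in percent."""
    return (value - reference) / reference * 100.0


nom_bls, nom_sd = CELL_ONLY_20_ZERO["nom"]
for corner in ("-10%", "+10%"):
    bls, sd = CELL_ONLY_20_ZERO[corner]
    print(f"Cell-Only, Vdd {corner}: BLS {percent_change(bls, nom_bls):+.1f}%, "
          f"SD {percent_change(sd, nom_sd):+.1f}%")

# Extra SD caused by going from 20% to 80% read instructions at each Vdd corner.
workload_penalty = {
    corner: percent_change(SA_ONLY_SD_PS["80% reads"][corner],
                           SA_ONLY_SD_PS["20% reads"][corner])
    for corner in ("-10%", "nom", "+10%")
}
for corner, penalty in workload_penalty.items():
    print(f"SA-Only, Vdd {corner}: workload penalty {penalty:.1f}%")
```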

| Degradation component | Workload        | Temp. (K) | Bit-line swing (mV) | Sensing delay (ps) |
|-----------------------|-----------------|-----------|---------------------|--------------------|
| Cell-Only             | 20% zero        | 233       | 175.4               | 38.23              |
|                       |                 | 298       | 107.0               | 61.09              |
|                       |                 | 348       | 72.2                | 86.27              |
|                       | 80% zero        | 233       | 175.2               | 38.25              |
|                       |                 | 298       | 106.3               | 61.20              |
|                       |                 | 348       | 70.7                | 86.62              |
| SA-Only               | 20% read instr. | 233       | 177.9               | 38.36              |
|                       |                 | 298       | 111.1               | 61.83              |
|                       |                 | 348       | 78.6                | 90.10              |
|                       | 80% read instr. | 233       | 178.0               | 38.63              |
|                       |                 | 298       | 111.6               | 65.71              |
|                       |                 | 348       | 79.6                | 143.94             |
| Combined              | S2              | 233       | 175.5               | 38.59              |
|                       |                 | 298       | 107.4               | 62.18              |
|                       |                 | 348       | 73.1                | 90.46              |
|                       | S3              | 233       | 175.3               | 38.87              |
|                       |                 | 298       | 107.1               | 66.21              |
|                       |                 | 348       | 72.6                | 151.77             |

Table 3: Temperature Degradation dependency after  $10^8$ s.

For the 'Combined' case, the results show similar trends as for the 'SA-Only' case. Even in terms of absolute numbers, the impact of $V_{dd}$ variations and workloads on the SD is very close (max 1.5% increase) to the results found for 'SA-Only'. Although the slowest SD is obtained in this case, the additional contribution of the interaction between the degrading cell and the degrading SA to the SD, as compared with 'SA-Only', is very marginal and does not exceed 1.5%.

Figure 5 shows how the supply voltage dependency of BLS and SD evolves over 3 years of degradation for the 'Combined' case using S3 (worst case stress). The figure shows that the impact on the BLS becomes visible when the operational life gets close to 3 years, which then clearly starts to impact the SD.

#### 4.3. Temperature Dependency

Table 3 shows the results of the Temperature Dependent Experiments for a stress period of $10^8$ s. The table reveals the following.

For the 'Cell-Only' case, similar to the first two experiments, the BLS is only marginally dependent on the workload. However, the temperature strongly influences both BLS and SD. The higher the temperature, the lower the BLS and the higher the SD. Increasing the temperature from 298K to 348K reduces the BLS by about 33% and increases the sensing delay by about 41%.

For the 'SA-Only' case, the temperature clearly impacts the BLS although the cell is not suffering from BTI; hence, the temperature impacts the BLS irrespective of BTI. This impact strengthens the degradation of the SD due to BTI. The SD is strongly temperature dependent and the situation becomes worse for stressful workloads. At 20% read instructions, the SD increases from 61.83 ps at 298K to 90.10 ps at 348K, an increase of about 46%. For 80% read instructions, however, the increase is 119%.

Figure 6: Temperature degradation dependency of SD and BLS for S3 sequence.

For the 'Combined' case, the results show similar trends as for the 'SA-Only' case. Even in terms of absolute numbers, the impact of temperature variations and workloads on the SD is close to the results found for 'SA-Only'. Although the slowest SD is obtained in this case, the additional contribution of the interaction between the degrading cell and the degrading SA to the SD, as compared with 'SA-Only', is marginal, except for *S3* at 348K where it is 5.4%.
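As a worked check against Table 3, the SD increase from 298K to 348K for the 'SA-Only' case at 20% and at 80% read instructions, and the additional contribution of the 'Combined' case (S3) over 'SA-Only' (80% read instructions) at 348K, are:

$$\frac{90.10 - 61.83}{61.83} \times 100 \approx 45.7\%, \qquad \frac{143.94 - 65.71}{65.71} \times 100 \approx 119\%, \qquad \frac{151.77 - 143.94}{143.94} \times 100 \approx 5.4\%$$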

Figure 6 shows how BLS and SD evolve over 3 years of degradation for workload S3 in the 'Combined' case. The figure clearly confirms the conclusions extracted from Table 3, and shows that the degradation of the read path starts to grow exponentially at high temperatures after a stress time of $10^5$ s.

## 5. Mitigation Schemes

In the previous section, we observed that the BLS and the sensing delay may be heavily impacted by BTI. In this section, we investigate two mitigation techniques, i.e., increasing the drive strength of the cell's pull-down transistors and increasing the drive strength of the SA's pull-down transistors. Three sizes are considered: *Nom.DS* denotes normally sized transistors, 25%DS denotes 25% larger transistors, and 50%DS denotes 50% larger transistors. Workload sequences S2 and S3 are applied at nominal supply voltage and temperature. Note that the cell strength influences the BLS and thus indirectly the sensing delay.

Table 4 shows the individual impact of the drive strength of the cell and of the SA, and their combined impact, for a stress period of $10^8$ s. In the table, 'Cell-Only' denotes the case where only the cell's pull-down transistors are sized up (i.e., *Nom.DS*, 25%DS and 50%DS). Similarly, 'SA-Only' denotes the case where only the pull-down transistors of the SA are sized up. In the 'Combined' case, the pull-down transistors of both the cell and the SA are resized simultaneously. The second column specifies the applied workload; both the cell and the SA are stressed using either workload S2 or S3. This workload is applied regardless of whether a component is resized. The third column specifies the device strength (DS) of the pull-down transistors, and the last three columns show the results; the evaluated metrics are bit-line swing (BLS), sensing delay (SD), and energy (E), respectively. The BLS and SD are defined in Section 3, while the energy is defined as the sum of the static and dynamic energy consumption for a single read operation. Next, the three cases are described.

Table 4: Cell and SA strength degradation dependency after $10^8$ s.

| Component | Workload | Device strength (DS) | Bit-line swing (mV) | Sensing delay (ps) | Energy (fJ) |
|-----------|----------|----------------------|---------------------|--------------------|-------------|
| Cell-Only | S2       | Nom.                 | 107.3               | 62.22              | 23.51       |
|           |          | 25%                  | 116.7               | 61.02              | 23.64       |
|           |          | 50%                  | 123.5               | 59.88              | 23.65       |
|           | S3       | Nom.                 | 107.0               | 66.27              | 24.36       |
|           |          | 25%                  | 116.5               | 64.68              | 24.40       |
|           |          | 50%                  | 123.2               | 63.52              | 24.42       |
| SA-Only   | S2       | Nom.                 | 107.3               | 62.22              | 23.51       |
|           |          | 25%                  | 107.2               | 57.86              | 22.80       |
|           |          | 50%                  | 107.1               | 54.97              | 22.28       |
|           | S3       | Nom.                 | 107.0               | 66.27              | 24.36       |
|           |          | 25%                  | 106.9               | 61.82              | 23.68       |
|           |          | 50%                  | 106.8               | 58.73              | 23.15       |
| Combined  | S2       | Nom.                 | 107.3               | 62.22              | 23.51       |
|           |          | 25%                  | 116.6               | 56.74              | 22.94       |
|           |          | 50%                  | 123.2               | 52.83              | 22.39       |
|           | S3       | Nom.                 | 107.0               | 66.27              | 24.36       |
|           |          | 25%                  | 116.4               | 60.16              | 23.68       |
|           |          | 50%                  | 123.0               | 56.23              | 23.17       |

**Cell-Only:** For the 'Cell-Only' case, the BLS significantly increases when the transistors are resized, for example from 107 mV to 123 mV when a 50% bigger size is used. This 15% BLS increment is more or less workload independent. However, the BLS increment leads to a much smaller sensing delay improvement. For example, for S2 this improvement is only $\frac{62.22-59.88}{62.22} \times 100 \approx 3.8\%$, while it is $\frac{66.27-63.52}{66.27} \times 100 \approx 4.1\%$ for workload S3. The energy consumption does not change much with resizing: although the operation is faster, the peak power consumption also increases.

**SA-Only:** In contrast, resizing only the SA has no impact on the bit-line swing. However, a larger reduction of the sensing delay is observed as compared to 'Cell-Only'. This delay depends strongly on the applied workload. Furthermore, the device drive strength marginally impacts the energy consumption. For example, increasing the device drive strength from nominal to 50% larger changes the BLS by at most 0.2% (these small differences are due to the Monte Carlo simulations) and marginally reduces the energy consumption by up to 5.0%, while the SD reduces significantly, by up to 11.4% for the worst-case (S3) workload.



Figure 7: Cell and SA strength degradation dependency of SD and BLS for S3 sequence.

**Combined:** For the 'Combined' case, the results show that the BLS follows the same trend as in 'Cell-Only' and that the SD improves only slightly with respect to the 'SA-Only' case. For example, the difference in BLS between 'Cell-Only' and 'Combined' for a 50% device drive strength (DS) is 0.3 mV; this difference can be attributed to Monte Carlo variations. With respect to the sensing delay, a 50% drive strength achieves a reduction of 11.4% in the 'SA-Only' case, while this is 15.2% for the 'Combined' case. In addition, the energy consumption is similar as well.
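The SD reductions discussed for the three mitigation cases can be recomputed from the S3 rows of Table 4. The following Python sketch does so for the 50% drive strength; the dictionary layout and names are illustrative choices of ours.

```python
# Sensing delay in ps after 1e8 s for workload S3, taken from Table 4.
SD_PS_S3 = {
    "Cell-Only": {"Nom.": 66.27, "25%": 64.68, "50%": 63.52},
    "SA-Only": {"Nom.": 66.27, "25%": 61.82, "50%": 58.73},
    "Combined": {"Nom.": 66.27, "25%": 60.16, "50%": 56.23},
}


def sd_reduction_percent(case, strength):
    """SD reduction of a resized design relative to the nominal design, in percent."""
    nominal = SD_PS_S3[case]["Nom."]
    return (nominal - SD_PS_S3[case][strength]) / nominal * 100.0


reductions = {case: sd_reduction_percent(case, "50%") for case in SD_PS_S3}
for case, reduction in reductions.items():
    print(f"{case}: SD reduced by {reduction:.1f}% at 50% DS")

# Which single-component resizing gives the larger SD reduction?
best_single = max(("Cell-Only", "SA-Only"), key=reductions.get)
print("more effective single-component scheme:", best_single)
```

Resizing only the SA recovers most of the reduction obtained in the 'Combined' case, which is consistent with the comparison made above.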

Figure 7 shows the impact of different device drive strength on both bit-line swing (BLS) and sensing delay (SD) for the 'Combined' case, for workload S3. The figure shows that the BLS marginally reduces over time (i.e., up to 1.47% for Nom.DS, 0.94% for 25%DS and 0.97% for 50%DS) while the SD significantly increases (i.e., up to 7.01% for Nom.DS, 6.82% for 25%DS, and 6.73% for 50%DS) over the operational life time. The relative differences between the different drive strengths are marginal.

Figure 8 shows the impact of the device drive strength on the energy consumption for the 'Combined' case; the energy decreases as the drive strength increases, irrespective of the operational lifetime. However, the decrease does not exceed 5.0%. For example, at $10^8$ s and for DS=Nom, the energy consumption is 24.36fJ, while it is 23.17fJ for DS=50%. In addition, the figure shows that, for a given drive strength, aging causes the energy to increase slightly, by up to 3.0%.
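As a quick check of the example above, the relative energy saving at $10^8$ s can be worked out from the two quoted values:

$$\frac{24.36 - 23.17}{24.36} \times 100 = \frac{1.19}{24.36} \times 100 \approx 4.9\%$$

This is consistent with the stated bound of 5.0%.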

Overall, the most effective mitigation technique would be to resize the SA only, especially when the area is also considered. Increasing the cell sizes affects the whole memory matrix, while resizing only the SA has a much lower area impact.

Figure 8: Cell and SA strength degradation dependency of Dynamic Energy for S3 sequence.

## 6. Discussion

The robustness of the memory cell and the SA is vital for the overall design of memory systems. In this work, a simulation-based analysis has been performed to examine the impact of BTI degradation on the sensing delay (SD), bit-line swing (BLS), and energy (E) of the SRAM read path; the analysis covers different workloads, voltages, temperatures, and device drive strengths. Several observations follow.

The obtained results clearly show that, for the considered SRAM design, the cell has a low impact and the SA is the major contributor to the read path timing degradation, even under different voltages and temperatures. Designers can use this information to optimize the design margins of the cell. One possible explanation for the marginal contribution of the cell degradation to the SD is the cell's strong pull-down transistors. Therefore, we investigate a smaller cell in which the W/L of the pull-down transistors is assumed to be 2.4 instead of 4.8 (see Figure 1). The simulation is performed for 5 years using the S3 workload (Combined case), and the results for both the initial design and the smaller cell design (0.5PDN) are shown in Figure 9. Although the SD trends of the two simulations appear similar, three points are worth noting. First, the relative increase of the SD is 7% for the initial design and 9% for the smaller one. Hence, the stronger the pull-down transistors of the cell, the smaller the contribution of the cell to the SD. Second, as the figure shows, the size of the pull-down transistors also affects the SD spread; the stronger the devices, the smaller the spread (i.e., $\pm 3\sigma$, represented by the boundaries of the vertical lines in Figure 9). Third, the SD increases relatively faster after $10^4$ s, but then tends to saturate after 3 years ($10^8$ s); the relative increase from 3 years to 5 years is no more than 0.7% for the initial design and 0.9% for the smaller version. Clearly, the size of the cell's pull-down devices can also be used to minimize the degradation of the read path in SRAMs; this should be done while considering the SNM of the cell to ensure its stability as well.

Table 5: Cell stability analysis.

|           | Time (s) | Nominal cell size (WC) | Nominal cell size (BC) | Half cell size (WC) | Half cell size (BC) |
|-----------|----------|------------------------|------------------------|---------------------|---------------------|
| HSNM (mV) | 0        | 312.8                  | 312.8                  | 309.2               | 309.2               |
|           | $10^{8}$ | 300.6                  | 308.3                  | 298.9               | 304.0               |
|           | Rel. %   | -3.90                  | -1.44                  | -3.33               | -1.68               |
| RSNM (mV) | 0        | 168.3                  | 168.3                  | 167.1               | 167.1               |
|           | $10^{8}$ | 152.5                  | 160.0                  | 153.2               | 158.2               |
|           | Rel. %   | -9.39                  | -4.93                  | -8.32               | -5.33               |
| WTP (mV)  | 0        | 269.7                  | 269.7                  | 272.6               | 272.6               |
|           | $10^{8}$ | 271.2                  | 277.9                  | 275.1               | 279.5               |
|           | Rel. %   | 0.56                   | 3.04                   | 0.92                | 2.53                |

Figure 9: Variation in Cell Pull-down transistor for S3 sequence.
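The discussion moves between stress times in seconds and lifetimes in years. The small Python sketch below relates the two; it assumes a Julian year of 365.25 days, a convention chosen here for illustration rather than one stated in the paper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed convention: Julian year


def seconds_to_years(seconds: float) -> float:
    """Convert a stress time in seconds to years."""
    return seconds / SECONDS_PER_YEAR


def years_to_seconds(years: float) -> float:
    """Convert a lifetime in years to seconds."""
    return years * SECONDS_PER_YEAR


print(f"1e8 s   = {seconds_to_years(1e8):.2f} years")
print(f"1e4 s   = {1e4 / 3600:.2f} hours")
print(f"5 years = {years_to_seconds(5):.3e} s")
```

Under this assumption, $10^8$ s is slightly more than 3 years, which matches the rounding used in the text.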

Clearly, reducing the cell area (e.g., the pull-down transistors) only slightly increases the SD (a 2.0% difference). Hence, the memory cell area can be optimized as long as the SD stays within acceptable limits. However, it is crucial to ensure the stability of the smaller cell. Therefore, we investigate three metrics for both the nominal and the smaller cell: HSNM (hold static noise margin), RSNM (read static noise margin), and WTP (write trip point), while considering two workloads (i.e., worst case (WC) and best case (BC)) for a 3-year lifetime, as shown in Table 5. The HSNM is the voltage $V_n$ that flips the cell when it is injected at its internal node; it is swept from $-V_{dd}$ to $V_{dd}$ while the word lines are disconnected from the bit lines. The RSNM is the $V_n$ that flips the cell while the word lines are connected to the bit lines and $V_n$ is swept from $-V_{dd}$ to $V_{dd}$. The WTP is the bit-line voltage at which the cell flips while the word lines are connected to the bit lines; this voltage can be found by sweeping the potential of one of the bit lines from $-V_{dd}$ to $V_{dd}$ [7, 31]. Table 5 shows that for both cells the HSNM reduces marginally after 3 years (by no more than 3.9%), and that the relative difference between the two cells is not more than 1.40%, irrespective of the workload and cell size considered. However, the RSNM reduces quite significantly for both cells: by up to 9.4% and 5.3% for the WC and BC workloads, respectively, irrespective of the cell size. The difference between both cells is marginal. Finally, the table shows that the WTP increases marginally, irrespective of the workload and cell size considered, and that the relative difference between the two cells does not exceed 1.44%. Overall, halving the cell size does not make the cell stability much worse than that of the nominal cell.
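The relative changes in Table 5 can be recomputed from its absolute values. The Python sketch below does so; the names `TABLE5`, `rel_pct`, `aging_shift`, and `cell_gap` are illustrative. The gap between the two cells is computed here relative to the aged nominal cell. This is an assumption, although it is consistent with the 1.40% and 1.44% bounds quoted above.

```python
# Table 5 values in mV: metric -> cell -> (t = 0, WC at 1e8 s, BC at 1e8 s).
TABLE5 = {
    "HSNM": {"nominal": (312.8, 300.6, 308.3), "half": (309.2, 298.9, 304.0)},
    "RSNM": {"nominal": (168.3, 152.5, 160.0), "half": (167.1, 153.2, 158.2)},
    "WTP":  {"nominal": (269.7, 271.2, 277.9), "half": (272.6, 275.1, 279.5)},
}


def rel_pct(reference: float, value: float) -> float:
    """Relative change of `value` with respect to `reference`, in percent."""
    return (value - reference) / reference * 100.0


def aging_shift(metric: str, cell: str) -> tuple[float, float]:
    """Return the (WC, BC) relative shift after 1e8 s for one metric and cell."""
    fresh, aged_wc, aged_bc = TABLE5[metric][cell]
    return rel_pct(fresh, aged_wc), rel_pct(fresh, aged_bc)


def cell_gap(metric: str) -> tuple[float, float]:
    """Return the (WC, BC) gap between the aged nominal and aged half cells.

    Assumption: the gap is taken relative to the aged nominal cell.
    """
    _, nom_wc, nom_bc = TABLE5[metric]["nominal"]
    _, half_wc, half_bc = TABLE5[metric]["half"]
    return abs(rel_pct(nom_wc, half_wc)), abs(rel_pct(nom_bc, half_bc))


for metric in TABLE5:
    for cell in ("nominal", "half"):
        wc, bc = aging_shift(metric, cell)
        print(f"{metric} {cell:8s} WC {wc:+.2f}%  BC {bc:+.2f}%")
    gap_wc, gap_bc = cell_gap(metric)
    print(f"{metric} aged-cell gap: WC {gap_wc:.2f}%  BC {gap_bc:.2f}%")
```

The recomputed shifts reproduce the "Rel. %" rows of the table, which confirms that they are taken with respect to the fresh ($t = 0$) value of the same cell.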

Our next observation concerns the impact of the supply voltage. A higher voltage increases the bit-line swing after an operation of $10^8$ s and *reduces* the SD. Hence, it can be used to compensate for the degradation of the read path, especially when the targeted application imposes worst-case stress on the read path. Obviously, this comes at the cost of additional power consumption.

Furthermore, we observed that a higher temperature not only reduces the BLS (which may impact the functionality) but also significantly increases the SD. Hence, appropriate cooling is crucial for extending the lifetime and slowing down the degradation.

Finally, we observed that resizing the cell only marginally mitigates the read path degradation, whereas resizing the SA is much more effective. Therefore, more research should focus on effective mitigation schemes for the SA, such as the input switching proposed in [18].

## 7. Conclusion

This paper presented a technique to accurately estimate and mitigate the impact of Bias Temperature Instability (BTI) on the read path of a 32 nm memory design, considering the degrading components separately and together (i.e., *Cell only*, *SA only*, and *Combined*) for different workloads, supply voltages, and temperatures. Analyzing the degradation of the entire read path allows for a better understanding of the overall degradation and hence for better design margin optimization. To ensure the targeted operational lifetime, designers must be aware of how the different parts of the memory degrade, how their interactions contribute to the degradation, and how all of these determine the overall degradation. It is worth noting that zero-time variations (process variations) are not taken into consideration in our investigation because of model limitations.

#### References

- [1] ITRS, International technology roadmap for semiconductor 2005, www.itrs.net/common/2005 update/2005update.htm, SIA.
- [2] S. Borkar, Microarchitecture and design challenges for gigascale integration, MICRO 37 (2004) 3–3. doi:10.1109/MICRO.2004.24.
- [3] S. Hamdioui, D. Gizopoulos, G. Guido, M. Nicolaidis, A. Grasset, P. Bonnot, Reliability challenges of real-time systems in forthcoming technology nodes, DATE (2013) 129–134. doi:10.7873/DATE.2013.040.
- [4] S. V. Kumar, C. H. Kim, S. S. Sapatnekar, Impact of NBTI on SRAM read stability and design for reliability, the 7th International Symposium on Quality Electronic Design 0 (9) (2006) 210–218. doi:10.1109/ISQED.2006.73.
- [5] A. Carlson, Mechanism of increase in SRAM Vmin due to negative-bias temperature instability, IEEE TDMR 7 (3) (2007) 473–478. doi:10.1109/TDMR.2007.907409.
- [6] A. Bansal, R. Rao, J.-J. Kim, S. Zafar, J. H. Stathis, C.-T. Chuang, Impacts of NBTI and PBTI on SRAM static/dynamic noise margins and cell failure probability, Microelectronics Reliability 49 (6) (2009) 642–649. doi:10.1016/j.microrel.2009.03.016.
- [7] S. Khan, I. Agbo, S. Hamdioui, H. Kukner, B. Kaczer, P. Raghavan, F. Catthoor, Bias temperature instability analysis of FinFET based SRAM cells, Design, Automation Test in Europe Conference Exhibition (DATE) (2014) 1–6. doi:10.7873/DATE.2014.044.
- [8] R. Menchaca, H. Mahmoodi, Impact of transistor aging effects on sense amplifier reliability in nano-scale CMOS, 13th International Symposium on Quality Electronic Design (ISQED) (2012) 342–346. doi:10.1109/ISQED.2012.6187515.
- [9] D. Rodopoulos, P. Weckx, M. Noltsis, F. Catthoor, D. Soudris, Atomistic pseudo-transient BTI simulation with inherent workload memory, IEEE Transactions on Device and Materials Reliability 14 (2) (2014) 704–714. doi:10.1109/TDMR.2014.2314356.
- [10] I. Agbo, S. Khan, S. Hamdioui, BTI impact on SRAM sense amplifier, 8th IEEE Design and Test Symposium (2013) 1–6. doi:10.1109/IDT.2013.6727094.
- [11] I. Agbo, M. Taouil, S. Hamdioui, H. Kukner, P. Weckx, P. Raghavan, F. Catthoor, Integral impact of BTI and voltage temperature variation on SRAM sense amplifier, IEEE 33rd VLSI Test Symposium (VTS) (2015) 1–6. doi:10.1109/VTS.2015.7116291.
- [12] I. Agbo, M. Taouil, S. Hamdioui, P. Weckx, S. Cosemans, P. Raghavan, F. Catthoor, Comparative BTI analysis for various sense amplifier designs, IEEE 19th International Symposium on Design and Diagnostics of Electronic Circuits Systems (DDECS) (2016) 1–6. doi:10.1109/DDECS.2016.7482438.
- [13] I. Agbo, M. Taouil, S. Hamdioui, P. Weckx, S. Cosemans, P. Raghavan, F. Catthoor, W. Dehaene, Quantification of sense amplifier offset voltage degradation due to zero- and run-time variability, IEEE Computer Society Annual Symposium on VLSI (ISVLSI) (2016) 725–730. doi:10.1109/ISVLSI.2016.30.
- [14] I. Agbo, M. Taouil, S. Hamdioui, P. Weckx, S. Cosemans, F. Catthoor, BTI analysis of SRAM write driver, 10th International Design Test Symposium (IDT) (2015) 100–105. doi:10.1109/IDT.2015.7396744.
- [15] I. Agbo, M. Taouil, D. Kraak, S. Hamdioui, H. Kukner, P. Weckx, P. Raghavan, F. Catthoor, Integral impact of BTI, PVT variation, and workload on SRAM sense amplifier, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25 (4) (2017) 1444–1454. doi:10.1109/TVLSI.2016.2643618.

- [16] S. Cosemans, Variability-aware design of low power SRAM memories, Ph.D. Thesis (2009) 1–338. ISBN 978-94-6018-066-8.
- [17] P. Pouyan, E. Amat, A. Rubio, Process variability-aware proactive reconfiguration technique for mitigating aging effects in nano scale SRAM lifetime, IEEE 30th VLSI Test Symposium (VTS) (2012) 240–245. doi:10.1109/VTS.2012.6231060.
- [18] D. Kraak, I. Agbo, M. Taouil, S. Hamdioui, P. Weckx, S. Cosemans, F. Catthoor, W. Dehaene, Mitigation of sense amplifier degradation using input switching, Design, Automation Test in Europe Conference Exhibition (DATE) (2017) 858–863. doi:10.23919/DATE.2017.7927107.
- [19] A. Gebregiorgis, M. Ebrahimi, S. Kiamehr, F. Oboril, S. Hamdioui, M. B. Tahoori, Aging mitigation in memory arrays using self-controlled bit-flipping technique, The 20th Asia and South Pacific Design Automation Conference (2015) 231–236. doi:10.1109/ASPDAC.2015.7059010.
- [20] X. Li, J. Qin, B. Huang, X. Zhang, J. B. Bernstein, SRAM circuit-failure modeling and reliability simulation with SPICE, IEEE Transactions on Device and Materials Reliability 6 (2) (2006) 235–246. doi:10.1109/TDMR.2006.876568.
- [21] P. Weckx, B. Kaczer, M. Toledano-Luque, T. Grasser, P. J. Roussel, H. Kukner, P. Raghavan, F. Catthoor, G. Groeseneken, Defect-based methodology for workload-dependent circuit lifetime projections application to SRAM, International Reliability Physics Symposium (IRPS) (2013) 3A.4.1–3A.4.7. doi:10.1109/IRPS.2013.6531974.
- [22] D. Rodopoulos, S. B. Mahato, V. V. de Almeida Camargo, B. Kaczer, F. Catthoor, S. Cosemans, G. Groeseneken, A. Papanikolaou, D. Soudris, Time and workload dependent device variability in circuit simulations, IEEE International Conference on IC Design Technology (2011) 1–4. doi:10.1109/ICICDT.2011.5783193.
- [23] I. Agbo, M. Taouil, S. Hamdioui, P. Weckx, S. Cosemans, F. Catthoor, W. Dehaene, Read path degradation analysis in SRAM, 21st IEEE European Test Symposium (ETS) (2016) 1–2. doi:10.1109/ETS.2016.7519325.
- [24] B. Kaczer, S. Mahato, V. V. de Almeida Camargo, M. Toledano-Luque, P. J. Roussel, T. Grasser, F. Catthoor, P. Dobrovolny, P. Zuber, G. Wirth, G. Groeseneken, Atomistic approach to variability of bias-temperature instability in circuit simulations, International Reliability Physics Symposium (2011) XT.3.1–XT.3.5. doi:10.1109/IRPS.2011.5784604.
- [25] T. Grasser, P. J. Wagner, H. Reisinger, T. Aichinger, G. Pobegen, M. Nelhiebel, B. Kaczer, Analytic modeling of the bias temperature instability using capture/emission time maps, International Electron Devices Meeting (2011) 27.4.1–27.4.4. doi:10.1109/IEDM.2011.6131624.
- [26] M. A. Alam, S. Mahapatra, A comprehensive model of PMOS NBTI degradation, Microelectronics Reliability 45 (1) (2004) 71–81. doi:10.1016/j.microrel.2004.03.019.
- [27] B. Kaczer, V. Arkhipov, R. Degraeve, N. Collaert, G. Groeseneken, M. Goodwin, Disorder-controlled-kinetics model for negative bias temperature instability and its experimental verification, IEEE IRPS (2005) 381–387. doi:10.1109/RELPHY.2005.1493117.
- [28] S. Zafar, Y. Kim, V. Narayanan, C. Cabral, V. Paruchuri, B. Doris, J. Stathis, A. Callegari, M. Chudzik, A comparative study of NBTI and PBTI (charge trapping) in SiO2/HfO2 stacks with FUSI, TiN, Re gates, Symposium on VLSI Technology, Digest of Technical Papers (2006) 23–25. doi:10.1109/VLSIT.2006.1705198.
- [29] H. Kukner, S. Khan, P. Weckx, P. Raghavan, S. Hamdioui, B. Kaczer, F. Catthoor, L. V. der Perre, R. Lauwereins, G. Groeseneken, Comparison of reaction-diffusion and atomistic trap-based BTI models for logic gates, IEEE Transactions on Device and Materials Reliability 14 (1) (2014) 182–193. doi:10.1109/TDMR.2013.2267274.
- [30] PTM, Predictive technology model, http://ptm.asu.edu/, Beckley, Arizona.
- [31] E. Seevinck, F. J. List, J. Lohstroh, Static-noise margin analysis of MOS SRAM cells, IEEE Journal of Solid-State Circuits 22 (5) (1987) 748–754. doi:10.1109/JSSC.1987.1052809.