# Opportunities and Challenges of Digital Signal Processing in Deeply Technology-Scaled Transceivers

Chunshu Li · Min Li · Khaled Khalaf · André Bourdoux · Marian Verhelst · Mark Ingels · Piet Wambacq · Jan Craninckx · Liesbet Van Der Perre · Sofie Pollin


Abstract The cost advantages and processing capabilities of CMOS technology have kept improving in line with the so-called Moore's Law. Although digital circuits benefit significantly from this aggressive scaling, its benefits for analog circuits are controversial. Analog circuits nevertheless have to follow the scaling trend, because single-chip integration offers key commercial advantages. To achieve the best performance/power/cost tradeoff in deeply scaled technology nodes, there is a clear paradigm shift towards digital intensive and digitally assisted transceivers. Such transceivers have proven successful for individual transceiver components and narrow band systems. When targeting emerging communication standards, higher carrier frequencies, further technology scaling and reconfigurable radios, the required signal processing design and implementation are orders of magnitude more challenging, but the potential gains are promising. Based on a variety of transceiver designs implementing emerging architectures for different sub-6 GHz and 60 GHz communication systems, we highlight the key challenges and opportunities experienced using 40 nm and 28 nm technology nodes.

Chunshu Li · Min Li · Khaled Khalaf · André Bourdoux · Mark Ingels · Piet Wambacq · Jan Craninckx · Liesbet Van Der Perre · Sofie Pollin

Department of Circuits and Systems, IMEC

Kapeldreef-75, Leuven 3001, Belgium

E-mail: chunshu.li@imec.be

Chunshu Li · Marian Verhelst · Liesbet Van Der Perre · Sofie Pollin

Department of Electrical Engineering, K.U.Leuven, Leuven 3000, Belgium

Khaled Khalaf · Piet Wambacq

Department of Electronics and Informatics, Vrije Universiteit Brussel, Brussels 1000, Belgium

Keywords Signal processing · SDR · Transceiver · Moore's law · Digital intensive · Digital assistance

## 1 Introduction

Wireless systems are integrating different existing and evolving wireless access systems that complement each other for different application areas and communication environments. To enable seamless and transparent inter-working between these access systems, communication systems are moving towards an era in which ubiquitous connectivity and growing levels of integration will be essential for most applications. This revolution is not expected to slow its penetration of society in the foreseeable future. On one hand, there is a market pull from an increasingly connected world population asking for vast information resources through ubiquitously connected devices. On the other hand, there is a market push from a hundred-billion-dollar industry delivering all kinds of communication products and applications. In this context, mobile devices can become a real bottleneck, as they face several concurrent constraints (e.g., battery life, cost, performance, size and weight) that compromise the flexibility of future networks. At the same time, as a crucial enabler of communication technology, the cost advantages and processing capabilities of CMOS technology have kept improving in line with the so-called Moore's Law. Mainstream foundries are leaping toward the 14 nm node and beyond. The digital gate density doubles when scaling to the next technology node, and the computation power of digital platforms improves accordingly. Such aggressive improvements, although achieved only after intense research and development efforts in both academia and industry, drive commercial digital circuits to scale ever further.

However, the clear benefits of digital scaling cannot be directly projected onto traditional analog circuits. Analog design in scaled technology is controversial. On one side, parasitics are lower, and both the intrinsic time resolution and the transistor speed increase. On the other side, the voltage resolution decreases substantially and accurate modelling of transistors becomes impossible. In addition, the spread with respect to the nominal corner is much larger. Moreover, thermal effects, decreased reliability and aging effects all degrade analog devices. For these reasons, commercial radio frequency (RF) transceivers often lag behind digital circuits in adopting leading-edge technologies. More insight can be found in [1]. Analog circuits nevertheless have to follow the scaling trend of digital design, because single-chip integration is one of the key elements of commercial success.

To meet this challenge, rather than designing future transceivers with conventional analog design philosophies, digital intensive and digitally assisted transceivers have been proposed and experimented with over the past decade. The concepts of digital RF and digitally assisted RF were proposed in [2][3], and many designs following these philosophies have been presented. Besides circuit architecture aspects, signal processing aspects have also been considered [4]. Although the importance of signal processing has been fully recognized, new challenges keep arising. For emerging communication standards and deeply scaled technologies, the demand for sophisticated signal processing design and implementation is growing by orders of magnitude.

In this paper, we present several concrete examples to illustrate the challenges we recently experienced and the optimizations we have made and are still pursuing. Importantly, these examples include silicon-proven 40 nm and 28 nm transceivers. In addition, to give a complete picture, the selected transceivers represent both sub-6 GHz and 60 GHz communication systems.

The rest of this paper is structured in four sections. Section 2 describes the trends. Section 3 describes the signal processing for parameter estimation and calibration of reconfigurable/programmable transceivers. Section 4 describes challenges and optimizations for high performance signal processing in signal paths for wide band signals and in emerging communication standards. Section 5 concludes the paper and outlines several promising directions.

## 2 Trends and Signal Processing Challenges

The signal processing challenges in emerging transceivers are driven by several trends in communication standardization, transceiver requirements and the technology itself.

First of all, communication systems are continuously shifting toward much wider bandwidths and higher performance requirements at the same time. Previous designs for GSM systems worked with hundreds of kHz of bandwidth [5]. Emerging standards, however, target bandwidths that are orders of magnitude higher, e.g., 80 MHz in IEEE802.11ac, 100 MHz in LTE-Advanced and 1760 MHz in IEEE802.11ad. Meanwhile, 64QAM has become mandatory in the majority of high rate systems, and 256QAM is mandatory in standards such as DVB-T2. This directly translates into tougher transceiver specifications. Both aspects require drastically different transceiver architectures and associated signal processing.

In addition, communication systems with higher carrier frequencies (e.g., 60 GHz) are also emerging. Since digital intensive transceivers inherently require signal processing modules working at sampling frequencies close to the intermediate frequency (IF) or even RF, such high carrier frequencies, often combined with very large bandwidths, substantially increase the signal processing complexity. Although the digital part of the power consumption is often not considered in previous papers, it is a must-solve issue for real-life systems.

Moreover, highly reconfigurable (or even software defined) transceivers are becoming more and more popular in various systems. The large number of wireless standards, combined with the large diversity of modes, strongly motivates more flexible transceivers. Abundant flexibility combined with intensive signal processing (for parameter estimation, calibration, compensation, etc.) proves to be crucial in emerging transceivers [6].

Last but not least, further technology scaling continuously imposes new uncertainty challenges on transceiver design. Process/Voltage/Temperature (PVT) variations, device reliability concerns and layout dependent effects are all growing. Although transceivers can be digital intensive or even fully digital, certain analog issues of a nanoscale circuit will never disappear. For instance, the discrete time receiver (Rx) in [7] suffers from several spurious clock tones, and from limited anti-alias filtering due to inaccurate clock timings and charge leakage, even in 90 nm technology, which in principle exhibits much less "evil analog behavior" than 40 nm, 28 nm and beyond. Inherent analog non-idealities, when combined with digitization, are often more difficult for signal processing algorithms to estimate and compensate. On top of this, the sheer complexity of the interactions among transceiver components makes this even worse.

## 3 Parameter Estimation and Calibration in Reconfigurable Transceivers

Highly reconfigurable transceivers can achieve better power/area/performance tradeoffs for a variety of different scenarios. However, there is often a large number of configuration bits controlling different circuit components. In [8], even a Network on Chip (NoC) controller has been implemented. Efficient signal processing algorithms and implementations are crucial for both static and dynamic estimation, calibration and compensation of such transceivers. This is especially challenging for emerging transceivers that have multiple duplicated signal paths on chip (for MIMO communication [9], beamforming [10], harmonics rejection [11], nonlinearity cancellation [12], etc.). With deeply scaled technology, devices in different signal paths will differ significantly from each other, so that tuning and compensation signal processing will be crucial. A number of extended concepts such as "self healing" [13] are emerging for such purposes. Section 3.1 uses a specific silicon-proven transceiver to show the substantial challenges in signal processing for parameter calibration and compensation. In Section 3.2, we describe a signal processing optimization scheme for on-line parameter training to reject harmonic interference (HI) in wide band receivers.

### 3.1 Low Power 60 GHz Reconfigurable Transceiver with Beamforming

At millimeter-wave (mm-wave) frequencies, several gigahertz of unlicensed bandwidth around 60 GHz recently became available across the world. This has enabled research into mm-wave radio chips targeting communication at several gigabits per second for consumer applications. Fig.1 shows a chipset that combines direct conversion with beamforming [10]. It is implemented in TSMC 40 nm LP digital CMOS technology. Rx and transmitter (Tx) are both implemented for 4 antenna paths. With analog baseband beamforming, signal operations at 60 GHz are kept to a minimum. Sensitivity to small layout parasitics is lower at analog baseband than at 60 GHz, and the inevitable parasitic interconnect capacitances arising from bringing together the antenna paths can be easily absorbed in the baseband filter capacitors. The Rx front-end is based on the front-end from [14], where a 2-stage differential Low Noise Amplifier (LNA) is preceded by an on-chip balun that provides ElectroStatic Discharge (ESD) protection. In the Tx, the in-phase (I)/quadrature (Q) baseband input signal is first split over 4 antenna paths, in which phase shifting as well as DC offset compensation is applied. The upconversion mixer is built around a super source follower, which yields high conversion gain. More details of this chip can be found in [10].

**Fig. 1** 60 GHz beamforming transceiver (a): Receiver; (b): Transmitter (40 nm LP).

Although the above chip can achieve satisfactory communication performance with handset compatible power consumption, signal processing requirement for tuning this chip is overwhelmingly complicated. For the Tx part, there are 115 configuration bits for each of the 4 Tx RF paths, and 135 configuration bits for each of the 2 PLLs. In total the Tx requires 730 bits. For the Rx part, there are 62 configuration bits for each of the 4 Rx RF paths, again 135 configuration bits for each of the 2 PLLs, and 437 configuration bits for Rx baseband. In total the Rx requires 955 bits.

We can see that, even for a transceiver dedicated solely to 60 GHz communication, a large amount of flexibility is designed in to combat uncertain technology parameters, aging, PVT variations, different RF channels and run-time scenarios. Achieving optimal performance/power for each chip of the above design requires optimal tuning of the above 1685 bits, for which unfortunately no straightforward solution exists. For very stable and well-known technology options, the number of configuration bits might be cut by $\times 2$ or even $\times 3$ by sacrificing certain tuning ranges. However, even 33% of the 1685 configuration bits (roughly 560 bits) is still not straightforward to tune. This imposes substantial challenges both for fabrication testing and during run-time.
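The bit counts quoted above can be tallied with a few lines of Python. This is only a bookkeeping sketch of the numbers given in the text; the variable names are illustrative and not taken from the chip documentation.

```python
# Configuration-bit budget of the 60 GHz beamforming chipset,
# using only the counts quoted in the text (names are illustrative).
N_RF_PATHS = 4           # antenna paths in both Tx and Rx
N_PLLS = 2               # PLLs on each of Tx and Rx
TX_BITS_PER_PATH = 115
RX_BITS_PER_PATH = 62
BITS_PER_PLL = 135
RX_BASEBAND_BITS = 437

tx_bits = N_RF_PATHS * TX_BITS_PER_PATH + N_PLLS * BITS_PER_PLL
rx_bits = N_RF_PATHS * RX_BITS_PER_PATH + N_PLLS * BITS_PER_PLL + RX_BASEBAND_BITS
total_bits = tx_bits + rx_bits

# Optimistic case from the text: a 3x reduction for a very stable technology.
reduced_bits = total_bits / 3

print(f"Tx: {tx_bits} bits, Rx: {rx_bits} bits, total: {total_bits} bits")
print(f"After a 3x reduction: about {reduced_bits:.0f} bits remain")
```

Even in the optimistic case, several hundred bits remain to be tuned, so an exhaustive search over configurations is out of reach.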

Using complicated Matlab programs and interfacing instruments, optimal calibration and testing of this chip takes hours in the lab. This is not a practical solution for commercial applications. Due to the high cost of RF testers in fabrication and testing facilities, a complete Built-In Self-Test (BIST), or BIST combined with a very short RF tester occupation, is strongly preferred. In addition, about 30% to 50% of the total configuration bits need to be frequently tuned online to handle frequency hopping, temperature variation, channel variation and dynamic power optimizations. To perform both post-fabrication test and online calibration in cost efficient ways, smart signal processing and implementations are crucial.

### 3.2 Wide Band Transceivers with Harmonic Rejection

Software Defined Radio (SDR) Rxs can receive any band of interest over a wide frequency range, and hence require a wide-band receiver design. The required flexible down-mixing is commonly implemented using switching mixers. One big challenge of this approach is the harmonic down-mixing problem: odd-order harmonics of the switching mixer down-mix RF interfering signals present at multiples of the receive frequency to baseband in the Rx, distorting the desired signal [15].

Traditional designs filter out HIs using either multiple parallel dedicated single-band RF filters or an RF tracking filter, as proposed in [16] and [17]; both options are bulky and power hungry. Multi-path mixing is a promising alternative solution to handle odd-order harmonic downmixing [11]. In this approach, the outputs of multiple switching mixers are combined, each weighted with an appropriate weighting factor, so that the aggregate local oscillator (LO) signal approximates a pseudo-sinusoid. The closer the aggregate LO signal approximates a sinusoid, the fewer harmonics it contains. $LO_2$ and $LO_3$ are typically $45^{\circ}$ and $90^{\circ}$ shifted duplicates of $LO_1$, and each of them contains many odd-order harmonics. It has been shown that an aggregate LO with an exact weight of $\sqrt{2}$ for $LO_2$ rejects the $3^{rd}$ and $5^{th}$ order HIs completely.
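The cancellation property can be checked numerically from the Fourier series of an ideal square wave. The sketch below assumes ideal 50% duty-cycle square-wave LOs with weights $1$, $\sqrt{2}$, $1$ at $0^{\circ}$, $45^{\circ}$, $90^{\circ}$; the function names and the example mismatch values ($1^{\circ}$ phase error and 1% gain error on $LO_2$) are illustrative assumptions, not figures from the cited designs.

```python
import numpy as np

def aggregate_lo_harmonic(m, weights, phases_deg):
    """Complex amplitude of the m-th (odd) harmonic of a weighted sum of
    phase-shifted ideal square-wave LOs (50% duty cycle assumed)."""
    phases = np.deg2rad(np.asarray(phases_deg, dtype=float))
    w = np.asarray(weights, dtype=float)
    # An ideal square wave has odd harmonics with amplitude 4 / (pi * m);
    # a phase shift phi of the fundamental shifts harmonic m by m * phi.
    return (4.0 / (np.pi * m)) * np.sum(w * np.exp(-1j * m * phases))

def rejection_db(m, weights, phases_deg):
    """Ratio of fundamental to m-th harmonic of the aggregate LO, in dB."""
    fund = abs(aggregate_lo_harmonic(1, weights, phases_deg))
    harm = abs(aggregate_lo_harmonic(m, weights, phases_deg))
    return 20.0 * np.log10(fund / harm)

ideal_w, ideal_ph = [1.0, np.sqrt(2.0), 1.0], [0.0, 45.0, 90.0]
for m in (3, 5, 7):
    print(m, abs(aggregate_lo_harmonic(m, ideal_w, ideal_ph)))

# Hypothetical mismatch on the middle path: 1 degree phase, 1 % gain error.
mis_w, mis_ph = [1.0, np.sqrt(2.0) * 1.01, 1.0], [0.0, 46.0, 90.0]
print("3rd-order HR with mismatch: %.1f dB" % rejection_db(3, mis_w, mis_ph))
```

With ideal weights the 3rd and 5th harmonics vanish to numerical precision while the 7th does not. With the assumed small mismatch the rejection drops to a few tens of dB, which illustrates why path accuracy dominates the achievable rejection.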

To reject HIs down to the transceiver noise floor, a harmonic rejection ratio of 60 to 100 dB is normally required. However, the achievable harmonic rejection of the multi-path mixing solution depends strongly on the phase and amplitude accuracy in each path. Phase and gain mismatch in practical implementations typically limits the harmonic rejection ratio to 30 to 40 dB [18], even in older technology nodes. Hence, mismatch estimation and compensation need to be applied to mitigate this problem. A digital intensive compensation scheme has been proposed that first estimates the phase and gain mismatch among the mixing paths and then computes the optimal digital recombination weights for the mixing paths. Building on the mathematical framework presented in [15], the path recombination coefficients for interference estimation can be derived from Eqn.(1)

$$\begin{bmatrix} \text{Cancelling harmonics in I path} \\ \text{Cancelling image frequencies in I path} \\ \text{Cancelling harmonics in Q path} \\ \text{Extracting image frequencies in Q path} \end{bmatrix} \tag{1}$$

which leads to:

$$\begin{bmatrix} \sum_{n=1}^{N} S_{In}F_{LOnm} \\ \Re\left(\sum_{n=1}^{N} S_{In}F_{LOn1}\right) \\ \sum_{n=1}^{N} S_{Qn}F_{LOnm} \\ \Im\left(\sum_{n=1}^{N} S_{Qn}F_{LOn1}\right) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \quad \forall m \ \text{concerned} \tag{2}$$

where $F_{LOnm}$ denotes the complex coefficient of the LO's $m^{th}$ harmonic in the $n^{th}$ mixing path, which also reflects the phase and gain mismatches. $S_{In}$ and $S_{Qn}$ are the desired digital weighting factors for the $n^{th}$ mixing path for I and Q, $N$ is the total number of mixing paths, and $m$ represents the order of the harmonic interference that needs to be rejected. The above can be seen as solving a linear system under a least-squares criterion, which usually involves costly QR decomposition (orthogonal triangularization) or Singular Value Decomposition (SVD) methods.

However, for practical transceivers, loop-back testing requires iterative calibration between Tx and Rx, starting from an un-calibrated Rx and Tx. In addition, such a calibration has to be performed very often, simply because the path mismatches vary considerably with frequency range, signal bandwidth, temperature, supply voltage, etc. When allocating a latency budget of 1 microsecond for such calibrations, the estimated area for the required computation power (processor based) is larger than $0.5\,\mathrm{mm}^2$ in 40 nm GP technology.

We propose a low complexity method with four mixing paths that is capable of fully rejecting any single HI. It is based on an adaptive HI rejection scheme [19] and avoids the computationally intensive QR or SVD.

Fig.2 illustrates the system framework for the proposed harmonic rejection (HR) scheme. After low pass filtering, the down-converted baseband signal in each path is directly converted to digital by an A/D converter. Equidistant $45^{\circ}$ shifted LOs $(0-45-90-135^{\circ})$ are provided to the four paths, taking into account the unavoidable phase error and gain mismatch of each LO. Each mixing path provides a baseband input signal containing rich distortions due to harmonic down-mixing.

Fig. 2 System framework for the proposed HR scheme

In the digital domain, the primary input, containing the desired signal and multiple distortions, is constructed by a linear recombination of the input paths [19]. To eliminate the harmonic distortions in this primary signal, adaptive interference rejection is performed. To this end, a reference input containing the interference estimation is generated by a second linear recombination of the input signals, as shown in Fig.2(b). The working mechanism of the adopted least mean squares (LMS) adaptive filtering method is to adaptively adjust the amplitude and phase of the interference estimation to produce an output that is as close a replica as possible to the distortion components in the primary input. This output is then subtracted from the primary input to produce the desired signal [20].

Two single-tap filters, each multiplying its input by a complex equalization factor (w1, w2), are implemented in the adaptive filtering engine (AFE), as shown in Eqn.(3). We introduce the time index k in the following equations for better illustration.

$$I_{in}[k] = R_{in}[k] \times w1^*[k] + R^*_{in}[k] \times w2^*[k] \tag{3}$$

where  $R_{in}$  is the interference estimation and  $I_{in}$  is the filtering output, which is a phase and gain adjusted  $R_{in}$  to approximate the distortion in the primary input.

Adaptive adjustment of w1, w2 is shown in Eqn.(4).

$$\begin{aligned} w1[k+1] &= w1[k] + \mu \times E_{out}^{*}[k] \times R_{in}[k] \\ w2[k+1] &= w2[k] + \mu \times E_{out}^{*}[k] \times R_{in}^{*}[k] \\ E_{out}[k] &= P_{in}[k] - I_{in}[k] \end{aligned} \tag{4}$$

where $E_{out}[k]$ is the error signal generated at time k, which is also the system output, $P_{in}[k]$ is the primary input, and $\mu$ is the LMS step-size parameter.
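The update loop of Eqns.(3) and (4) can be sketched in Python as follows. The loop itself follows the two equations directly. The test signal in `demo` (a unit-power QPSK-like desired signal, a complex Gaussian reference, the coupling factors `a` and `b`, and the step size) is a hypothetical toy setup chosen for illustration, not the simulation model behind Fig.3.

```python
import numpy as np

def lms_interference_canceller(p_in, r_in, mu):
    """Two single-tap complex LMS filters following Eqns. (3) and (4)."""
    w1, w2 = 0j, 0j
    e_out = np.empty(len(p_in), dtype=complex)
    for k in range(len(p_in)):
        r = r_in[k]
        # Eqn. (3): phase- and gain-adjusted interference estimate
        i_in = r * np.conj(w1) + np.conj(r) * np.conj(w2)
        # Eqn. (4): error signal, which is also the system output
        e = p_in[k] - i_in
        e_out[k] = e
        # Eqn. (4): weight updates
        w1 = w1 + mu * np.conj(e) * r
        w2 = w2 + mu * np.conj(e) * np.conj(r)
    return e_out, w1, w2

def demo(n=20000, mu=0.002, seed=0):
    """Toy scenario (assumed): interference leaks into the primary input
    through unknown complex gains a (direct) and b (conjugate/image)."""
    rng = np.random.default_rng(seed)
    desired = (rng.choice([-1.0, 1.0], n) + 1j * rng.choice([-1.0, 1.0], n)) / np.sqrt(2)
    r_in = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    a = 3.0 * np.exp(1j * 0.4)
    b = 0.5 * np.exp(-1j * 1.1)
    p_in = desired + a * r_in + b * np.conj(r_in)
    e_out, w1, w2 = lms_interference_canceller(p_in, r_in, mu)
    tail = slice(n // 2, None)  # evaluate after convergence
    before = np.mean(np.abs(p_in[tail] - desired[tail]) ** 2)
    after = np.mean(np.abs(e_out[tail] - desired[tail]) ** 2)
    return 10.0 * np.log10(before / after), w1, w2, a, b

improvement_db, w1, w2, a, b = demo()
print("Interference reduced by %.1f dB" % improvement_db)
```

In this toy setup the weights settle near the conjugates of the coupling gains, and each update costs only a handful of complex multiplications per sample, which is the point of avoiding a matrix solve.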



Fig. 3 Simulation result of the  $3^{rd}$  order HI rejection. (a): Scatter plot of SIR before and after compensation. (b): Probability of achieved HR

In this HR scheme, the computationally intensive matrix operation is avoided and the equalization factors can be computed online. To show the robustness of the proposed method, the simulation model assumes an unfavorable situation with a relatively large phase error ($2^{\circ}$) and gain mismatch (6%) in the analog front-end. A random 256-QAM modulated desired signal and a $3^{rd}$ order HI with an input power 15 to 65 dB stronger than the desired signal are used as the RF input.

Fig.3 shows the simulated performance of the proposed adaptive HR scheme for rejection of the $3^{rd}$ order HI, as a scatter plot of the signal to interference ratio (SIR) before and after adaptive compensation and as the probability of achieved HR. The simulation was conducted with different input powers of the $3^{rd}$ order HI to cover realistic RF scenarios, and without considering analog imperfections other than the phase and gain imbalances. It can be seen that more than 80 dB HR can be achieved for the RF scenario concerned, which is enough to provide an SIR of more than 20 dB in the digital domain to guarantee correct demodulation.

## 4 Very High Performance Signal Processing in Signal Paths

Digital intensive transceiver design, as the name implies, moves digital processing closer to the antennas. Discrete time receivers [7] and digital transmitters [21] are typical examples. Recently, 60 GHz systems and wide band sub-6 GHz systems (e.g., 80 MHz 802.11ac and 100 MHz LTE-Advanced) have become targets for such designs. A key challenge is that the signal processing needs to work at very high sampling frequencies, which may even be close to RF. This very high sample rate processing, combined with substantial bit width requirements (e.g., 10 to 11 bits for LTE), creates substantial challenges.



Fig. 4 Block diagram of a mm-wave polar transmitter

### 4.1 Digital Processing for Low Power 60 GHz Polar Transmitter

The power amplifier (PA) is usually the most power hungry block in 60 GHz chips. Moreover, to overcome signal losses at 60 GHz, phased arrays are often employed, and at least the front-ends have to be replicated for each antenna path. This increases the PA share in the total chip power consumption. Different applications benefit from improved PA power efficiency, such as high datarate short-range portable applications that require minimal power consumption for longer battery lifetime, and high datarate backhaul systems that transmit with high output powers for longer range communication. Most 60 GHz PAs operate in class-A linear mode [10][22][23] because of the variable envelope modulations required for high datarates and high spectral efficiency. This causes the PA to work at power efficiency values of less than 5%, although values up to 30% could be achieved [10]. To improve the PA power efficiency, the PA needs to work in its nonlinear region, where the peak efficiency is reached. The polar architecture is one interesting solution that allows the PA to operate in saturation without duplicating the signal path or using power combiners. As shown in Fig. 4, the phase signal goes to the PA, while the amplitude is extracted and applied to the PA through a separate modulation path. Polar conversion can be done in digital signal processing to avoid the need for an RF limiter, which can introduce extra nonlinearity and bandwidth limitations. The amplitude signal can then digitally modulate an RF digital-to-analog converter (DAC) working as a variable-size PA. This eliminates the need for an additional RF amplitude detection circuit and also avoids modulating the supply.

The non-linear transformation from rectangular signals to polar signals broadens the spectrum. Fig.5 (a) depicts the power spectral density (PSD) of the rectangular signal, which is compliant with the spectrum mask of IEEE 802.11ad [24]. After non-linear conversion to polar signals, the spectrum of the converted signal expands greatly, as shown in Fig.5 (b). To avoid spectrum overlap due to this expansion, the rectangular signal first needs to be upsampled and digitally filtered before conversion to a polar signal. The first residual image after oversampling appears at an offset equal to the sampling frequency. For a symbol rate of 1760 MS/s (according to the IEEE802.11ad standard), an oversampling ratio (OSR) of at least 6 is normally required to keep the first residual image out of the RF band of the 802.11ad standard, which spans from 57 GHz to 66 GHz.

Fig. 5 Signal spectrum expansion due to rectangular to polar transformation
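The OSR requirement can be reproduced with a short calculation. The sketch below rests on an assumed worst-case reading of the criterion: the image offset (OSR times the symbol rate) must exceed the full 57 to 66 GHz span, so that the image falls outside the band even for a carrier at the band edge. The function name is illustrative.

```python
import math

SYMBOL_RATE_MHZ = 1760.0   # IEEE 802.11ad symbol rate
BAND_START_MHZ = 57_000.0  # lower edge of the 60 GHz band
BAND_STOP_MHZ = 66_000.0   # upper edge of the 60 GHz band

def min_oversampling_ratio(symbol_rate_mhz, band_start_mhz, band_stop_mhz):
    """Smallest integer OSR whose first image offset (OSR * symbol rate)
    is strictly larger than the band span (assumed worst-case criterion)."""
    band_span_mhz = band_stop_mhz - band_start_mhz
    return math.floor(band_span_mhz / symbol_rate_mhz) + 1

osr = min_oversampling_ratio(SYMBOL_RATE_MHZ, BAND_START_MHZ, BAND_STOP_MHZ)
sample_rate_msps = osr * SYMBOL_RATE_MHZ
print(f"Minimum OSR: {osr}, sample rate: {sample_rate_msps:.0f} Msps")
```

Under this assumption an OSR of 5 gives an 8.8 GHz offset, which is smaller than the 9 GHz band span, while an OSR of 6 gives 10.56 GHz.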

To better understand the implementation complexity of the signal processing involved in the rectangular to polar conversion, the quantization accuracies of the rectangular signal and the converted polar signal are analyzed here. A complete 802.11ad transmission system with 16QAM modulation was modeled in Matlab. The error vector magnitude (EVM) results shown in Fig. 6 (a) indicate a notable improvement when increasing the accuracy of the transmitted rectangular signal (denoted IQ in the figure) from 6 bits to 7 bits, and only a minor improvement with further increases. Fig. 6 (b) then depicts the EVM results with a fixed 7-bit rectangular signal for multiple resolutions of the polar signal. Since the allowable constellation error should not be worse than $-21 \, dB$ for 16QAM modulation in the 802.11ad standard, a 7-bit phase signal and a 5-bit amplitude signal are chosen to obtain $-31 \, dB$ EVM, which provides a 10 dB design margin. Note that although multiple choices of quantization accuracy achieve $-31 \, dB$ EVM, the one with the minimum number of amplitude bits is chosen to ease the layout when routing the digital amplitude bit-wires to the PA. Fig. 7 shows the PSD of the output signal with 7, 5 and 7 bits for the rectangular signal, the converted amplitude signal and the phase signal respectively, which is compliant with the spectrum mask.
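The effect of polar quantization on EVM can be illustrated with a much simpler model than the full Matlab system described above. The sketch below quantizes the amplitude and phase of ideal, unfiltered 16QAM symbols with uniform quantizers and measures the resulting error. It omits upsampling, filtering and rectangular quantization, so its numbers are not expected to match Fig. 6; the function names and quantizer ranges are assumptions.

```python
import numpy as np

def quantize_uniform(x, n_bits, x_min, x_max):
    """Uniform quantizer with 2**n_bits levels over [x_min, x_max]."""
    levels = 2 ** n_bits
    step = (x_max - x_min) / levels
    idx = np.clip(np.floor((x - x_min) / step), 0, levels - 1)
    return x_min + (idx + 0.5) * step

def polar_quantization_evm_db(symbols, amp_bits, phase_bits):
    """EVM (dB) caused only by quantizing amplitude and phase."""
    amp, phase = np.abs(symbols), np.angle(symbols)
    amp_q = quantize_uniform(amp, amp_bits, 0.0, amp.max())
    phase_q = quantize_uniform(phase, phase_bits, -np.pi, np.pi)
    rebuilt = amp_q * np.exp(1j * phase_q)
    err_power = np.mean(np.abs(rebuilt - symbols) ** 2)
    ref_power = np.mean(np.abs(symbols) ** 2)
    return 10.0 * np.log10(err_power / ref_power)

def qam16_symbols(n, seed=1):
    """Random ideal 16QAM constellation points."""
    rng = np.random.default_rng(seed)
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    return rng.choice(levels, n) + 1j * rng.choice(levels, n)

symbols = qam16_symbols(4096)
for amp_bits, phase_bits in [(3, 4), (5, 7)]:
    evm = polar_quantization_evm_db(symbols, amp_bits, phase_bits)
    print(f"{amp_bits}-bit amplitude, {phase_bits}-bit phase: EVM = {evm:.1f} dB")
```

Even this simplified model shows the trend used in the design choice: a 5-bit amplitude and 7-bit phase representation stays well below the $-21$ dB limit, while much coarser resolutions do not leave comparable margin.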

Fig. 6 (a): Output EVM results in terms of quantization accuracies of rectangular and polar signals; (b): Output EVM results in terms of quantization accuracies of polar signals with 7-bit rectangular signals

Fig. 7 Output signal spectrum

In this architecture, the DSP needs to operate at very high speeds, which can create a bottleneck in the power budget of the polar solution. The following power consumption calculations can be used to estimate the power budget for the DSP. Taking the phased array chip of [10] as a reference, Fig. 1 (b) shows the top-level Tx architecture with 4 antenna paths. If the same chip is used in polar mode, the PA will be replaced by a variable-size RFDAC for amplitude modulation. Table 1 compares the output power, power consumption and efficiency values for a chip used in polar and non-polar modes. With a 5 dB Peak-to-Average Power Ratio (PAPR), the linear PA operates at 5 dB back-off from the 1 dB compression point of 10.8 dBm, which gives a PA efficiency of 4.9% compared to the maximum value of 32.2% in saturation. If the same chip is used in polar mode, the RFDAC input includes only phase information and is allowed to operate in the saturation region. With a PAPR of 5 dB, the amplitude will modulate the RFDAC such that the average output power is 5 dB less than the peak saturated 14 dBm (see Fig.8). This causes the average RFDAC size to be $10^{(-5/20)} = 0.56 \times$ the full size, and the power consumption to reduce by the same factor. The PA operating efficiency is then 18.2% in the polar mode compared to 4.9% in the linear mode. The polar-mode total Tx output power is about 3 dB higher than in the linear mode, and the total Tx efficiency rises to 18.94% compared to 7.25% in the linear mode. A fair evaluation of the power consumption advantage of the polar mode in this chip should include the same analysis at the same Tx output power. Assuming the same peak saturated efficiency, the total Tx power consumption in the polar mode reduces to 487.6 mW compared to 724 mW at the same output power. For the DSP to have only a minor influence on the total power budget, a value of 10% of the total Tx power consumption should be considered. This leads to an average budget of about 50 mW for the extra digital processing required for polar operation.

**Table 1** Power Consumption Budget

| Scenario         | Psat     | Pout per FE       | PA Pdc          | PA PAE @Pout | Total Pout       | FE Pdc  | Total Pdc | Total eff. |
|------------------|----------|-------------------|-----------------|--------------|------------------|---------|-----------|------------|
| Back-off         | 14 dBm   | 5.8 dBm (P5dB)    | 78 mW           | 4.9%         | 17.2 dBm $^1$    | 110 mW  | 724 mW    | 7.25%      |
| Polar @same PA   | 14 dBm   | 9 dBm (Psat, avg) | 43.7 mW $^2$    | 18.2%        | 20.4 dBm         | 73.7 mW | 578.8 mW  | 18.94%     |
| Polar @same Pout | 10.8 dBm | 5.8 dBm           | 20.9 mW $^3$    | 18.2%        | 17.2 dBm         | 50.9 mW | 487.6 mW  | 10.76%     |

 $^1$  A measured value of  $11.4\,\mathrm{dB}$  is considered for the 4-antenna paths.

 $^2$  The 5 dB PAPR corresponds to RFDAC size of 0.56× the full size.

 $^3$  Assuming the same PAE@Psat of 32.2%

Fig. 8 RFDAC power characteristics with different sizes showing the linear and polar operating modes of Table.1

Fig. 9 A typical architecture of digital RF transmitter

The above analysis sets a challenging task for the optimization of the algorithms and design techniques of the additional signal processing circuitry. The 50 mW budget for signal processing power consumption needs to cover the I/Q to phase/amplitude transformation at 10560 Msps (1760 × 6). Aggressive algorithm and circuit level optimizations are being explored to achieve this target.
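The main figures of this power budget can be re-derived from the quoted inputs. The sketch below uses only values stated in the text and Table 1 (saturated power, compression point, PAPR, linear-mode PA power and polar-mode total power). The energy-per-sample figure at the end is a derived illustration and is not given in the original analysis.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to mW."""
    return 10.0 ** (p_dbm / 10.0)

PSAT_DBM = 14.0              # peak saturated output power per front-end
P1DB_DBM = 10.8              # 1 dB compression point
PAPR_DB = 5.0                # peak-to-average power ratio
PA_PDC_LINEAR_MW = 78.0      # PA DC power in linear (back-off) mode
TOTAL_PDC_POLAR_MW = 487.6   # total Tx DC power, polar mode, same Pout
DSP_SHARE = 0.10             # DSP budget as a share of total Tx power
SYMBOL_RATE_MSPS = 1760.0
OSR = 6

pae_saturated = dbm_to_mw(PSAT_DBM) / PA_PDC_LINEAR_MW
pae_linear = dbm_to_mw(P1DB_DBM - PAPR_DB) / PA_PDC_LINEAR_MW
rfdac_size = 10.0 ** (-PAPR_DB / 20.0)            # average RFDAC size factor
pa_pdc_polar_mw = PA_PDC_LINEAR_MW * rfdac_size
pae_polar = dbm_to_mw(PSAT_DBM - PAPR_DB) / pa_pdc_polar_mw

dsp_budget_mw = DSP_SHARE * TOTAL_PDC_POLAR_MW
sample_rate_msps = SYMBOL_RATE_MSPS * OSR
# Derived illustration: energy available per converted I/Q sample.
energy_per_sample_pj = dsp_budget_mw * 1e-3 / (sample_rate_msps * 1e6) * 1e12

print(f"PAE saturated {pae_saturated:.1%}, linear {pae_linear:.1%}, polar {pae_polar:.1%}")
print(f"DSP budget {dsp_budget_mw:.1f} mW at {sample_rate_msps:.0f} Msps "
      f"-> {energy_per_sample_pj:.2f} pJ per sample")
```

The small differences from the tabulated values come from rounding the size factor to 0.56 in the text. The result is that each rectangular to polar conversion has to fit within a few picojoules.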

### 4.2 Digital Correction for Sub-6 GHz Quadrature Digital RF Transmitter

A fully flexible digital RF transmitter (DTX), which explores intensive digital implementation of radio functions, has been a hot topic in both academia and industry in recent years. With a DTX, retargeting to new system requirements can be achieved by reprogramming the digital circuits rather than by analog redesign, which is usually costly and time consuming. A typical high level architecture of a DTX is shown in Fig.9; it features digital mixing and hence requires an ultra-wideband DAC. The current-steering (CS) architecture is widely adopted in wideband DAC design for its advantages in speed and accuracy. A popular segmented variant is shown in Fig.10. Generally, binary scaled current sources are steered by the least significant bits (LSBs) and an array of unary current sources is steered by the thermometer-decoded most significant bits (MSBs).

Fig. 10 Segmented architecture of CS RFDAC
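The segmentation principle can be sketched as a simple behavioral model. The code below splits an unsigned input code into thermometer-coded MSBs and binary LSBs and recombines the ideal cell currents. The 10-bit width and the 4-bit MSB segment in the usage lines are hypothetical values for illustration, not parameters of the design in Fig.10.

```python
def segment_code(code, n_bits, n_msb):
    """Split an unsigned DAC code into thermometer-coded MSBs (unary cells)
    and binary LSBs (binary-scaled cells, LSB first)."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    n_lsb = n_bits - n_msb
    msb_value = code >> n_lsb
    lsb_value = code & ((1 << n_lsb) - 1)
    # Thermometer decoding: the first msb_value unary cells are switched on.
    unary = [1 if i < msb_value else 0 for i in range(2 ** n_msb - 1)]
    binary = [(lsb_value >> b) & 1 for b in range(n_lsb)]
    return unary, binary

def ideal_output(unary, binary):
    """Ideal output in LSB units: each unary cell weighs 2**n_lsb,
    binary cell b weighs 2**b."""
    unary_weight = 1 << len(binary)
    return unary_weight * sum(unary) + sum(bit << b for b, bit in enumerate(binary))

# Hypothetical 10-bit DAC with 4 thermometer-decoded MSBs.
unary, binary = segment_code(677, n_bits=10, n_msb=4)
print(sum(unary), "unary cells on, binary bits (LSB first):", binary)
print("Reconstructed code:", ideal_output(unary, binary))
```

In this idealized model every cell has its nominal weight, so the reconstruction is exact. The error sources discussed in the text correspond to deviations of the individual cell weights and switching instants from these ideal values.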

A big advantage of a DTX over the alternative of combining multiple single-band transmitters into a multi-mode transmitter is its significantly reduced size and power consumption. This holds especially when considering continued technology scaling. However, as transistor dimensions shrink to the nano-scale level, the design of such a DTX becomes more and more challenging. The problems pinpointed as performance limitations are listed below:

- random errors: Random amplitude errors exist in the current sources of the CS DAC. They are caused by process variations during manufacturing and are determined by the dimensions of the current source. Increasing the active area of each current source is the most effective way to reduce them. However, in a high-accuracy DAC this approach results in large arrays, which may in turn lead to significant gradient and systematic errors [25].
- gradient errors in CS DAC: Gradient errors are significant in CS DACs with more than 10-bit linearity. The main sources of gradient errors are the modulated output impedance and gradient amplitude errors in the current sources due to voltage drops in the supply lines and technology-related gradients (e.g., doping, oxide thickness) [26].
- systematic errors in digital mixing: In the DTX, the mixing of the LO and the transmitted signal is shifted from the analog domain to the digital domain. As shown in Fig.10, a square-wave LO signal is generated by digital circuits and routed to the digital mixing block to up-convert the transmitted signal. Phase and duty-cycle mismatch in the generated I and Q LOs directly cause quadrature modulation errors in the transmitted signal. Furthermore, the introduced distortion can depend on the inner product of the transmitted I and Q signals [27], which makes the existing solutions for I/Q imbalance correction no longer feasible. In addition, the current cells in the CS DAC are switched by individual bits of the I and Q signals. Delay mismatch among the bit wires from the digital mixing block to the RF-DAC block leads to I and Q interference errors that differ significantly from the traditionally studied I/Q phase/gain imbalance problem. In fact, the interference occurs between bits, not only between the I and Q signals, which poses a major challenge for designers trying to correct it.

Besides the common "AM-AM" and "AM-PM" errors extensively discussed for non-linear PAs, significant "PM-AM" and "PM-PM" errors are also introduced by delay mismatch and duty-cycle mismatch [27]. The "PM-AM" and "PM-PM" errors from the bit-grained I and Q interference increase the compensation complexity exponentially, since they require correction based on I/Q co-addressing rather than the traditional amplitude-addressing scheme widely adopted for compensating non-linear PAs. In a recent ISSCC paper on DTX, this technique is formulated as a "2-D lookup table" [28]. Importantly, the signal processing block that compensates the above non-idealities sits between the filtering block and the digital mixing block. Due to upsampling with a high oversampling ratio (OSR), this block usually operates at sampling rates between 800 Msps and 1600 Msps.
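To give a feel for the cost of I/Q co-addressing, the following sketch compares the number of entries of a fully addressed amplitude lookup table with that of a fully addressed two-dimensional I/Q table. The address widths are hypothetical, and an actual design such as [28] may address its table more coarsely; the sketch only illustrates how the table size scales.

```python
def amplitude_lut_entries(amp_bits: int) -> int:
    """Entries of a table addressed by the signal amplitude only."""
    return 2 ** amp_bits


def iq_lut_entries(i_bits: int, q_bits: int) -> int:
    """Entries of a table addressed jointly by the I and Q words."""
    return 2 ** (i_bits + q_bits)


if __name__ == "__main__":
    for bits in (6, 8, 10):  # hypothetical address widths
        print(bits, amplitude_lut_entries(bits), iq_lut_entries(bits, bits))
```

With equal address widths, the two-dimensional table has the square of the number of entries of the amplitude-addressed one, which motivates the polynomial approximation described next.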

The philosophy that we propose is similar to [28], but with a much finer grain, and is based on multidimensional polynomial approximations. The originally transmitted signal $(BB_I \text{ and } BB_Q)$ is pre-distorted to another value $(BB'_I \text{ and } BB'_Q)$, which corresponds to an output OUT' with minimum root mean square (RMS) error with respect to $BB_I + jBB_Q$. The predistortion is performed by a polynomial transformation. The default polynomial parameters are calculated from a priori knowledge of the distortion measured on a test chip implemented in 28 nm CMOS technology. The polynomial parameters are updated at predetermined time instances, such as system setup or an LO frequency switch. To reduce the time and computational complexity of the update, this procedure is conducted block by block. As shown in Fig.11, the pool of all possible input data is segmented into blocks. Each block has its own transforming polynomial function, whose parameters are updated at one time instance.
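The two ideas in this paragraph, segmenting the input pool and choosing the pre-distorted value with the smallest output error, can be sketched as follows. The sketch assumes a uniform square grid over the I/Q plane (the block counts considered in this section are all perfect squares, but the exact layout of Fig.11 may differ) and a brute-force search; the names `block_index` and `best_candidate` are hypothetical.

```python
def block_index(bb_i: float, bb_q: float, blocks_per_axis: int,
                full_scale: float = 1.0) -> int:
    """Index of the block containing (bb_i, bb_q) on a uniform square grid.

    The I/Q plane [-full_scale, full_scale] x [-full_scale, full_scale] is
    cut into blocks_per_axis**2 blocks, numbered row by row along Q.
    """
    def axis_index(value: float) -> int:
        pos = (value + full_scale) / (2.0 * full_scale)  # normalise to [0, 1]
        idx = int(pos * blocks_per_axis)
        return min(max(idx, 0), blocks_per_axis - 1)     # clamp the edges

    return axis_index(bb_q) * blocks_per_axis + axis_index(bb_i)


def best_candidate(target: complex, measurements):
    """Candidate input whose measured Tx output lies closest to `target`.

    `measurements` is an iterable of (candidate_input, measured_output)
    pairs of complex numbers obtained from the feedback path.
    """
    return min(measurements, key=lambda pair: abs(pair[1] - target))[0]
```

For instance, with `blocks_per_axis=6` the plane is cut into 36 blocks, and a per-block parameter set could be stored in a list indexed by `block_index`. For a single target point, minimising the absolute error as done here is equivalent to minimising its RMS error.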

The off-line update of each block's transforming polynomial function consists of three steps. Firstly, all the test data within the block are sent out, detected at the Tx output and fed back to baseband. Secondly, the mapping relation between $BB_I + jBB_Q$ and $BB'_I + jBB'_Q$ is created. The larger the block, the more candidates of $BB'_I + jBB'_Q$ need to be checked to find the one with minimum output RMS error with respect to $BB_I + jBB_Q$. Thirdly, after the mapping relation has been created for each possible $BB_I + jBB_Q$ within one block, a polynomial function is derived by curve fitting to the mapping. Eqn.(5) depicts how a $T$th-order polynomial function is constructed, where $P_{I(Q)(i,q)}$ is the polynomial parameter for the element $BB_I^i \times BB_Q^q$.

$$\begin{aligned} BB'_{I}(t) &= \sum_{i=0,q=0}^{i+q \leq T} P_{I(i,q)} \times BB^{i}_{I}(t) \times BB^{q}_{Q}(t) \\ BB'_{Q}(t) &= \sum_{i=0,q=0}^{i+q \leq T} P_{Q(i,q)} \times BB^{i}_{I}(t) \times BB^{q}_{Q}(t) \end{aligned} \tag{5}$$
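Eqn.(5) translates almost directly into code. The following is a minimal sketch, assuming the parameters of one block are stored as flat lists in the term order produced by the helper `exponents`; this ordering and the function names are our own choice for illustration.

```python
def exponents(order: int):
    """All exponent pairs (i, q) with i, q >= 0 and i + q <= order."""
    return [(i, q) for i in range(order + 1) for q in range(order + 1 - i)]


def predistort(bb_i: float, bb_q: float, p_i, p_q, order: int):
    """Evaluate Eqn.(5) for one sample.

    p_i and p_q hold the parameters P_I(i,q) and P_Q(i,q) in the same
    order as exponents(order).
    """
    terms = [bb_i ** i * bb_q ** q for i, q in exponents(order)]
    out_i = sum(p * t for p, t in zip(p_i, terms))
    out_q = sum(p * t for p, t in zip(p_q, terms))
    return out_i, out_q
```

For T = 1 the term order is (1, BB_Q, BB_I), so `p_i = [0, 0, 1]` and `p_q = [0, 1, 0]` leave the sample unchanged. The number of terms returned by `exponents` is 3, 6 and 10 for T = 1, 2 and 3.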

The method of least squares is used to find the optimal polynomial parameters based on the created mapping relations. Assuming N digital inputs (N mapping relations) exist in one block, a matrix equation with polynomial parameters can then be built as follows:

$$\underbrace{\begin{bmatrix} BB'_{I}(1)\\ BB'_{I}(2)\\ \vdots\\ BB'_{I}(N) \end{bmatrix}}_{\underline{b}} = \underbrace{\begin{bmatrix} BB_{I}^{i}(1)BB_{Q}^{q}(1) \quad \forall i+q \leq T\\ \vdots\\ BB_{I}^{i}(N)BB_{Q}^{q}(N) \quad \forall i+q \leq T \end{bmatrix}}_{\underline{A}} \underbrace{\begin{bmatrix} P_{I(i,q)}\\ \forall i+q \leq T \end{bmatrix}}_{\underline{x}} \tag{6}$$

The polynomial parameters can then be calculated by

$$\underline{x} = (\underline{A}^T \underline{A})^{-1} \underline{A}^T \underline{b}. \tag{7}$$
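The least-squares step can be sketched in a few lines of plain Python. The sketch builds the matrix of monomials for one block, forms the normal equations and solves them by Gauss-Jordan elimination instead of computing the inverse explicitly; this is numerically equivalent for well-conditioned blocks but is our substitution, not necessarily what an on-chip implementation would do. All function names are hypothetical.

```python
def exponents(order):
    """All exponent pairs (i, q) with i, q >= 0 and i + q <= order."""
    return [(i, q) for i in range(order + 1) for q in range(order + 1 - i)]


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations; need more samples")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[k][n] / aug[k][k] for k in range(n)]


def fit_block(samples, targets, order):
    """Least-squares polynomial parameters for one block.

    samples: list of (BB_I, BB_Q) inputs of the block (N entries)
    targets: matching pre-distorted values, BB'_I or BB'_Q (N entries)
    Returns the M parameters in the order of exponents(order).
    """
    exps = exponents(order)
    a = [[bi ** i * bq ** q for i, q in exps] for bi, bq in samples]  # N x M
    m = len(exps)
    # Normal equations: (A^T A) x = A^T b
    ata = [[sum(row[r] * row[c] for row in a) for c in range(m)]
           for r in range(m)]
    atb = [sum(row[r] * t for row, t in zip(a, targets)) for r in range(m)]
    return solve_linear(ata, atb)
```

Calling `fit_block` once with the $BB'_I$ targets and once with the $BB'_Q$ targets gives the two parameter sets of a block. Forming the normal equations takes on the order of $M^2N$ multiplications and solving them on the order of $M^3$.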

To give a clearer view of the design choices, Tab.2 summarizes the root mean square (RMS) error reduction achieved for different design options, i.e., the number of segmented blocks and the polynomial order.

Without any segmentation, as expected, the RMS error reduction is much worse than with segmentation. Among the segmented options, the best performance is obtained with 36 blocks, which achieves an RMS error reduction of up to 22.0 dB for the I data and 19.7 dB for the Q data with a third-order polynomial. Performance degrades with even more blocks due to edge effects. With a large number of blocks, the required polynomial order can be reduced, and with it the computational load. When the number of blocks exceeds 36, only a minor performance improvement is achieved by increasing the polynomial order from 1 to 3.

Fig. 11 Block segmentation scheme and self-correction flow

Table 2 RMS error reduction (dB) of the transmitter output at different design options

| Data | Polynomial order | 400 blocks | 100 blocks | 36 blocks | 9 blocks | 4 blocks | 1 block |
|------|------------------|------------|------------|-----------|----------|----------|---------|
| I    | 1                | 10.6       | 15.1       | 16.7      | 12.8     | 6.5      | 3.3     |
| I    | 2                | 10.7       | 16.2       | 19.5      | 18.7     | 16.2     | 3.4     |
| I    | 3                | 10.8       | 16.4       | 22.0      | 20.5     | 19.5     | 15.4    |
| Q    | 1                | 11.6       | 15.3       | 16.7      | 11.2     | 5.7      | 2.9     |
| Q    | 2                | 11.8       | 16.5       | 18.5      | 17.7     | 15.5     | 3.0     |
| Q    | 3                | 12.0       | 16.8       | 19.7      | 19.3     | 17.9     | 14.2    |

The computational complexity lies mainly in updating the polynomial parameters for each block, which requires the pseudo-inverse of a matrix with $N \times M$ elements according to Eqn.(7), where N is the number of digital inputs in one block and M is the number of polynomial terms determined by the polynomial order according to Eqn.(5) (M = 3, 6, 10 for T = 1, 2, 3). Assuming that arithmetic on individual elements has complexity O(1), the computational complexity of updating the polynomial parameters for one block can be written as $O(M^2N) + O(M^3)$ [29], where the first term accounts for forming $\underline{A}^T \underline{A}$ and the second for inverting it. With segmentation, both the block size and the required polynomial order can be made smaller. The computational complexity, and hence the time and power consumption, of one block's update at one predetermined time instance can thus be greatly reduced as well.
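The scaling of this cost with block size and polynomial order can be made concrete with a short sketch. It keeps only the leading terms $M^2N + M^3$ and drops all constant factors, so the returned numbers are indicative operation counts rather than measured figures; the function names and the pool size in the usage note are hypothetical.

```python
def num_terms(order: int) -> int:
    """Number M of polynomial terms (i, q) with i + q <= order."""
    return (order + 1) * (order + 2) // 2


def update_cost(n_inputs: int, order: int) -> int:
    """Leading-order operation count M^2 * N + M^3 for one block update."""
    m = num_terms(order)
    return m * m * n_inputs + m ** 3


if __name__ == "__main__":
    # Hypothetical pool of 3600 inputs: one block versus 36 equal blocks.
    print(update_cost(3600, 3))        # unsegmented, T = 3
    print(update_cost(3600 // 36, 3))  # one of 36 blocks, T = 3
    print(update_cost(3600 // 36, 1))  # one of 36 blocks, T = 1
```

In this hypothetical setting, a single update drops from 361000 operations without segmentation to 11000 for one of 36 blocks at T = 3, and to 927 if a first-order polynomial suffices.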

### 5 Conclusions and Future Work

In this paper, we went through several examples to illustrate the signal processing opportunities and challenges we experienced in recent and ongoing design activities. Signal processing, together with circuit architecture, is the key enabler for digital intensive and digitally assisted transceivers. When targeting emerging communication standards and deeply scaled technology nodes, signal processing design and implementation are much more challenging than before. For the parameter estimation, calibration and compensation of analog non-idealities (or self-healing), recent work [13] has built highly dedicated circuits for signal processing; nevertheless, we see a clear need for flexible on-chip processors running versatile algorithms that can even perform model identification by themselves. Importantly, these algorithms can be reprogrammed at any time. The high-performance signal processing in the signal paths also calls for disruptive solutions due to the high duty cycle, high sampling rate and stringent power/area constraints. Aggressive algorithm/architecture optimizations, analog-driven digital design techniques and stochastic computation techniques are considered promising in this context.

#### References

1. K. Lee, I. Nam, I. Kwon, J. Gil, K. Han, S. Park, and B.I. Seo, The impact of semiconductor technology scaling on CMOS RF and digital circuits for wireless application, *Electron Devices, IEEE Transactions on*, 52(7):1415-1422 (2005).
2. B. Murmann, Digitally assisted analog circuits, *Micro, IEEE*, 26(2):38-47 (2006).
3. R. Staszewski, K. Muhammad, D. Leipold, C.-M. Hung, Y.C. Ho, J. Wallberg, C. Fernando, K. Maggio, R. Staszewski, T. Jung, J. Koh, S. John, I. Y. Deng, V. Sarda, O. Moreira-Tamayo, V. Mayega, R. Katz, O. Friedman, O. Eliezer, E. de Obaldia, and P. Balsara, All-digital TX frequency synthesizer and discrete-time receiver for Bluetooth radio in 130nm CMOS, *Solid-State Circuits, IEEE Journal of*, 39(12):2278-2291 (2004).
4. J. Dabrowski and R. Ramzan, Built-in loopback test for IC RF transceivers, *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, 18(6):933-946 (2010).
5. R. Staszewski, R. Staszewski, T. Jung, T. Murphy, I. Bashir, O. Eliezer, K. Muhammad, and M. Entezari, Software assisted digital RF processor for single-chip GSM radio in 90nm CMOS, *Solid-State Circuits, IEEE Journal of*, 45(2):276-288 (2010).
6. J. Borremans, G. Mandal, V. Giannini, B. Debaillie, M. Ingels, T. Sano, B. Verbruggen and J. Craninckx, A 40 nm CMOS 0.4-6GHz receiver resilient to out-of-band blockers, *Solid-State Circuits, IEEE Journal of*, 46(7):1659-1671 (2011).
7. J. Craninckx, CMOS software-defined radio transceivers: Analog design in digital technology, *Communications Magazine, IEEE*, 50(4):136-144 (2012).
8. V. Giannini, P. Nuzzo, C. Soens, K. Vengattaramane, M. Steyaert, J. Ryckaert, M. Goffioul, B. Debaillie, J. Van Driessche, J. Craninckx, and M. Ingels, A 2mm$^2$ 0.1-to-5GHz SDR receiver in 45nm digital CMOS, *In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2009, IEEE International*, pages 408-409 (2009).
9. M. Wickert, U. Mayer, and F. Ellinger, 802.11a compliant spatial diversity receiver IC in BiCMOS, *Microwave Theory and Techniques, IEEE Transactions on*, 60(4):1097-1104 (2012).
10. V. Vidojkovic, V. Szortyka, K. Khalaf, G. Mangraviti, S. Brebels, W. v. Thillo, K. Vaesen, B. Parvais, V. Issakov, M. Libois, M. Matsuo, J. Long, C. Soens, and P. Wambacq, A low-power radio chipset in 40nm LP CMOS with beamforming for 60GHz high-data-rate wireless communication, *In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013, IEEE International*, pages 236-237 (2013).
11. H.-K. Cha, S.-S. Song, H.-T. Kim, and K. Lee, A CMOS harmonic rejection mixer with mismatch calibration circuitry for digital TV tuner applications, *Microwave and Wireless Components Letters, IEEE*, 18(9):617-619 (2008).
12. E. Klumperink, R. Shrestha, E. Mensink, G. Wienk, Z. Ru, and B. Nauta, Multipath polyphase circuits and their application to RF transceivers, *In Circuits and Systems (ISCAS), 2007, IEEE International Symposium on*, pages 273-276 (2007).
13. A. Tang, F. Hsiao, D. Murphy, I.-N. Ku, J. Liu, S. D'Souza, N.-Y. Wang, H. Wu, Y.-H. Wang, M. Tang, G. Virbila, M. Pham, D. Yang, Q. Gu, Y.-C. Wu, Y.-C. Kuan, C. Chien, and M. Chang, A low-overhead self-healing embedded system for ensuring high yield and long-term sustainability of 60GHz 4Gb/s radio-on-a-chip, *In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012, IEEE International*, pages 316-318 (2012).
14. V. Vidojkovic, G. Mangraviti, K. Khalaf, V. Szortyka, K. Vaesen, W. Van Thillo, B. Parvais, M. Libois, S. Thijs, J. Long, C. Soens, and P. Wambacq, A low-power 57-to-66GHz transceiver in 40nm LP CMOS with -17dB EVM at 7Gb/s, *In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012, IEEE International*, pages 268-270 (2012).
15. C. Li, M. Li, M. Verhelst, A. Bourdoux, J. Borremans, S. Pollin, A. Chiumento, L. Van der Perre, and R. Lauwereins, A generic framework for optimizing digital intensive harmonic rejection receivers, *In Signal Processing Systems (SiPS), 2012, IEEE Workshop on*, pages 167-172 (2012).
16. O. Gaborieau et al., "A SAW-less multiband WEDGE receiver," *IEEE ISSCC Dig. Tech. Papers*, pp. 114-115, Feb. 2009.
17. Y. Sun et al., "A 50-300-MHz low power and high linear active RF tracking filter for digital TV tuner ICs," *IEEE Custom Integrated Circuits Conference*, pp. 1-4, 2010.
18. Z. Ru, N. Moseley, E. A. M. Klumperink, and B. Nauta, Digitally enhanced software-defined radio receiver robust to out-of-band interference, *Solid-State Circuits, IEEE Journal of*, 44(12):3359-3375 (2009).
19. C. Li et al., "Adaptive filter based low complexity digital intensive harmonic rejection for SDR receiver," *Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on*, pp. 2712-2715, May 2013.
20. S. Haykin, "Adaptive Filter Theory," Upper Saddle River, NJ: Prentice Hall, 2002.
21. S. Balasubramanian, S. Boumaiza, H. Sarbishaei, T. Quach, P. Orlando, J. Volakis, G. Creech, J. Wilson, and W. Khalil, Ultimate Transmission, *Microwave Magazine, IEEE*, 13(1):64-82 (2012).
22. W. Chen et al., "A 60GHz-Band 2×2 Phased-Array Transmitter in 65nm CMOS," *In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013, IEEE International*, pages 42-43 (2010).
23. M. Nariman et al., "A Compact Millimeter-Wave Energy Transmission System for Wireless Applications," *RFIC Symposium*, pp. 407-410, Jun. 2013.
24. IEEE Std 802.11ad™-2012
25. Y. Cong et al., "Switching sequence optimization for gradient error compensation in thermometer-decoded DAC arrays," *IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing*, vol. 47, no. 7, pp. 585-595, July 2000.
26. G. A. M. Van der Plas et al., "A 14-bit intrinsic accuracy $Q^2$ random walk CMOS DAC," *IEEE J. of Solid-State Circuits*, vol. 34, no. 12, pp. 1708-1718, Dec. 1999.
27. C. Li et al., Efficient self-correction scheme for static nonidealities in nano-scale quadrature digital RF transmitters, *In Signal Processing Systems (SiPS), 2013, IEEE Workshop on*, pages 71-76 (2012).
28. C. Lu, H. Wang, C. H. Peng, A. Goel, S. W. Son, P. Liang, A. Niknejad, H. C. Hwang and G. Chien, A 24.7dBm all-digital RF transmitter for multi-mode broadband applications in 40nm CMOS, *In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013, IEEE International*, pages 332-334 (2012).
29. A.J. Stothers, "On the complexity of Matrix Multiplication," *PhD Thesis*, University of Edinburgh, 2010.