# A High-Speed 850-nm Optical Receiver Front-End in 0.18-μm CMOS

Carolien Hermans, Student Member, IEEE, and Michiel S. J. Steyaert, Fellow, IEEE

Abstract: A high-speed optical interface circuit for 850-nm optical communication is presented. Photodetector, transimpedance amplifier (TIA), and post-amplifier are integrated in a standard 0.18-$\mu$m 1.8-V CMOS technology. To eliminate the slow substrate carriers, a differential n-well diode topology is used. Device simulations clarify the speed advantage of the proposed diode topology compared to other topologies, but also demonstrate the speed-responsivity tradeoff. Due to the lower responsivity, a very sensitive transimpedance amplifier is needed. At 500 Mb/s, an input power of -8 dBm is sufficient to have a bit error rate of $3 \cdot 10^{-10}$. Next, the design of a broadband post-amplifier is discussed. The small-signal frequency dependent gain of the traditional and modified Cherry-Hooper stage is analyzed. To achieve broadband operation in the output buffer, so-called "$f_T$ doublers" are used. For a differential 10-mV$_{\rm pp}$ $2^{31}-1$ pseudo random bit sequence, a bit error rate of $5\cdot 10^{-12}$ at 3.5 Gb/s has been measured. At lower bit-rates, the bit error rate is even lower: a 1-Gb/s 10-mV$_{\rm pp}$ input signal results in a bit error rate of $7 \cdot 10^{-14}$. The TIA consumes 17 mW, while the post-amplifier circuit consumes 34 mW.

*Index Terms*: CMOS analog integrated circuits, optical receivers, output buffer, photodiode, post-amplifier, transimpedance amplifier.

# I. INTRODUCTION

Optical receivers with photodetectors integrated in a standard CMOS technology are attractive due to their reduced cost and size. They can be used in short-distance, low-cost communication systems like local-area networks, fiber-to-the-home and optical interconnects between boards or even between chips.
Furthermore, high volumes of low-cost optical front-ends will be needed in optical storage systems like CD-ROM, DVD and Blu-ray Disc.

A traditional arrangement of the optical receiver building blocks is depicted in Fig. 1. The photodiode converts the light coming from the laser-diode into an electrical current and is normally manufactured in a dedicated technology with excellent optical performance at the desired wavelength (e.g., GaAs or InGaAs for 850-nm light). The transimpedance amplifier (TIA) converts the current into a voltage and is followed by additional amplifying stages that boost the signal swing (the post-amplifier). An output buffer is needed to drive the 50-$\Omega$ load.

Fig. 1. Block diagram of a traditional receiver front-end.

To obtain high-speed receivers, high-$f_T$ technologies such as GaAs or SiGe are used traditionally. This work investigates the high-speed capabilities of CMOS receivers and the optical performance of silicon photodiodes. It will lead to the integration of the complete front-end, from photodetector to output buffer, in a CMOS technology, and consequently an optimization of both cost and integration capacity.

Manuscript received November 30, 2005; revised January 27, 2006. The authors are with the ESAT-MICAS Laboratory, Katholieke Universiteit Leuven, B-3001 Leuven, Belgium (e-mail: chermans@esat.kuleuven.be). Digital Object Identifier 10.1109/JSSC.2006.873855

Up to now, most (Bi)CMOS photodiodes for data-communication which can be found in literature have been integrated in silicon technologies with line-widths from 0.6 $\mu$m [1], 0.5 $\mu$m [2], [3] down to 0.35 $\mu$m [4]. To enhance the optical performance of the photodiodes, additional masks are often added to the standard process. For example, in [2] and [3], a PIN diode with a lowly doped intrinsic region is used. This way, there exists a large drift region where electrons and holes are generated and the quantum efficiency of the diode is high.
Another example is the creation of buried layers with a vertical doping profile, which generate an electric field that accelerates the light-generated carriers [1]. A major disadvantage of these approaches is the additional cost. A different way to widen the depletion region of the diode is to apply a reverse junction voltage higher than the power supply [4]; however, this seriously impacts the reliability of the front-end. An alternative approach is presented by [5], where the poor n-well diode frequency response is compensated by an analog equalizer.

This work focuses on the integration of photodiodes in a standard CMOS process, without the use of any additional masks or antireflective coatings. Two bandwidth enhancement diode topologies are discussed and compared with the classical n-well diode: the $p^+$ to n-well diode with guard [6] and the differential n-well diode [7].

Turning to the post-amplifier, a cascade of simple resistively-loaded differential pairs often proves inadequate as broadband amplifier, especially if the input amplitude is small and the first stages operate linearly. This is mainly due to the limited $f_T$ of today's mainstream CMOS technologies. One way to increase bandwidth is the use of inductive peaking [8]. A major disadvantage of this approach is the large chip area needed for the integrated inductors. To circumvent this problem, active inductors can be used [9]. Unfortunately, they need a bias voltage larger than $V_{DD}$ to eliminate headroom difficulties. Furthermore, active inductors are unfavorable for noise performance. The post-amplifier in this work [10] is based on a cascade of modified CMOS Cherry–Hooper amplifiers [11]. No source followers are placed between the cascaded stages as they only degrade the gain performance. To drive the off-chip load, so-called "$f_T$ doublers" [12] are used.

Fig. 2. Block diagram of the proposed receiver front-end.

Fig.
2 shows the block diagram of the proposed CMOS optical receiver. Every building block will be discussed in the following sections. Section II explains light detection with a photodiode and its specific problems in CMOS technologies. A basic TIA and subtraction block (CSDA) to amplify and combine the signals from the photodiode are described in Section III. The small-signal behavior of the classical and the modified Cherry–Hooper stage is analyzed in Section IV. Cascading four stages results in a broadband post-amplifier. Section V discusses the output buffer which comprises two scaled "$f_T$ doublers". Layout and measurements are presented in Section VI. Finally, conclusions are given in Section VII.

# II. INTEGRATED CMOS PHOTODIODES

Light detection in a CMOS technology can be performed by a reverse-biased junction diode. This section describes the basic topology to capture light, and some modifications to enhance the speed performance of the photodiode.

When light impinges on the diode, incident photons with an energy larger than or equal to the bandgap of the semiconductor material generate electron-hole pairs in the depletion region and neutral regions. While the carrier pairs in the depletion region are separated by the electric field and drift in opposite directions, the carriers in the neutral regions are diffusing. This is a relatively slow transport mechanism compared to the drift mechanism and should be avoided for high-speed operation. The carriers within one or a few diffusion lengths of the depletion region will finally reach the junction. Others, which have to bridge a too large distance, will recombine without being detected.

A good photodetector generates a large photocurrent for a given optical input power. The ratio between photocurrent and input power is called responsivity (R). For a high responsivity, the depletion region must be sufficiently wide to absorb a large fraction of the incident light within this layer.
This is also a requirement for a high-speed diode, because otherwise slowly diffusing carriers would be generated outside the depletion region. This is why the most commonly used CMOS photodetector is the n-well to p-substrate junction. This junction has the widest available depletion region due to the lower doping levels of the substrate and n-well. A cross section of this diode is shown in Fig. 3(a). However, due to the relatively large penetration depth of 850-nm light into silicon ($\approx$ 10 $\mu$m), combined with a relatively shallow n-well ($\approx$ 2 $\mu$m), the electrons in the substrate are generated at a large depth underneath the junction. These carriers start diffusing in all directions, and add a slowly rising tail to the diode's time response. The holes generated in the n-well are much less of a problem, since they only have to bridge a short distance before being detected.

Fig. 3. Cross section of CMOS diode topologies: (a) classical n-well diode; (b) $p^+$ to n-well diode with guard; (c) differential n-well diode.

TABLE I SIMULATED PHOTODIODE PERFORMANCE (1)

| $N_s$ | Classical n-well diode: R (A/W) | Classical n-well diode: $f_{3dB}$ | $p^+$ to n-well diode with guard: R (A/W) | $p^+$ to n-well diode with guard: $f_{3dB}$ |
|-------|-------|--------|-------|-------|
| 4 | 0.33 | 10 MHz | 0.023 | 1 GHz |
| 6 | 0.325 | 10 MHz | 0.021 | 1 GHz |
| 8 | 0.32 | 10 MHz | 0.019 | 1 GHz |
| 10 | 0.31 | 10 MHz | 0.017 | 1 GHz |
| 12 | 0.30 | 10 MHz | 0.015 | 1 GHz |
| 14 | 0.29 | 10 MHz | 0.013 | 1 GHz |

The basic structure of Fig. 3(a) has been simulated using the device simulation program MEDICI. As a multimode fiber has a core diameter of 50 $\mu$m or 65 $\mu$m, the dimensions of the diode are $80 \times 80 \,\mu\text{m}^2$. This could be implemented as one large n-well region, or several smaller parallel n-well regions separated by substrate contacts to minimize substrate resistance.
The number of squares per diode side is given by $N_s$, and the influence on both responsivity and speed is investigated. The speed performance of the photodiode is characterized by its 3-dB bandwidth $f_{3dB}$. A summary can be found in Table I. The responsivity turns out to be slightly dependent on the number of squares: the higher $N_s$, the more useful area is taken by the substrate contacts and the smaller R becomes. The bandwidth of the diode, mainly determined by the diffusing carriers in the substrate, seems to be independent of layout topology, and always equal to 10 MHz. These results correspond with the simulations of [5].

Eliminating the substrate carriers from the total signal would make the diode response a lot faster. This idea is implemented in the $p^+$ to n-well diode with guard, shown in Fig. 3(b) [4]. The active junction detecting the signal is now the $p^+$ to n-well junction. The substrate contacts act as a guard which removes the substrate carriers. This photodiode is capable of high-speed operation, at the expense of a smaller responsivity. As summarized in Table I, the responsivity is more than 10 times smaller compared to the classical n-well diode. On the other hand, the 3-dB bandwidth is 100 times larger (1 GHz) and again independent of $N_s$.

TABLE II SIMULATED PHOTODIODE PERFORMANCE (2): DIFFERENTIAL N-WELL DIODE

| $N_f$ | $R_{light}$ (A/W) | $R_{light} - R_{dark}$ (A/W) | $f_{3dB}$ |
|-------|-------|-------|---------|
| 18 | 0.10 | 0.052 | 1 GHz |
| 22 | 0.094 | 0.046 | 1.5 GHz |
| 26 | 0.088 | 0.042 | 2.2 GHz |
| 30 | 0.083 | 0.038 | 3.1 GHz |
| 34 | 0.076 | 0.034 | 4.4 GHz |

Another way to increase the diode bandwidth is to use a differential n-well diode. It is based on the same physical principles as the SML detector in [13]. As depicted in Fig.
3(c), the differential n-well diode consists of an alternating pattern of illuminated and dark junctions. The latter ones are covered with metal to block the light. This diode also features substrate contacts alternating with the junctions, again to minimize substrate resistance. When light falls on this diode, carriers are generated below the illuminated junctions and not below the dark junctions. The substrate carriers generated close to an illuminated junction will have a higher probability to reach this junction than to reach one of the neighboring dark junctions. Those generated deep in the substrate will require a long time to reach a junction and have more or less equal probability to reach an illuminated or dark junction. If the response of the dark junctions is subtracted from the response of the illuminated junctions, a fast response is achieved as the influence of the slowly diffusing carriers is cancelled.

MEDICI simulations have been performed using the number of fingers $N_f$ as parameter (see Table II). $R_{light}$, the responsivity of the illuminated junctions, is less than one half of the responsivity of the classical n-well. This is because the same total area ($80 \times 80 \ \mu \text{m}^2$) is now taken by two diodes, the illuminated one and the dark one. The DC responsivity of the differential signal ($R_{light} - R_{dark}$) is approximately one half of the responsivity of the illuminated junctions, but still more than 2 times larger than the responsivity of the $p^+$ to n-well diode. The 3-dB bandwidth of the differential diode ranges from 1 GHz to more than 4 GHz, and does depend on the number of fingers. More fingers per diode means less distance for the diffusing carriers to bridge before they get detected. These results show a tradeoff between bandwidth and responsivity, and depending on the design goals, a proper choice has to be made.
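To put the simulated responsivities of Table II in perspective, the sketch below converts an average optical power in dBm into a photocurrent through $I = R \cdot P$. It is only an illustration: the function names are hypothetical, the -8 dBm level is borrowed from the measurement conditions quoted in the abstract, and because the finger count of the fabricated diode is not stated here, every simulated row is evaluated.

```python
# Differential DC responsivity (R_light - R_dark) in A/W per finger count N_f,
# copied from Table II (simulated values, not measurements).
DIFF_RESPONSIVITY = {18: 0.052, 22: 0.046, 26: 0.042, 30: 0.038, 34: 0.034}


def dbm_to_watt(p_dbm):
    """Convert an optical power in dBm to watts (0 dBm = 1 mW)."""
    return 1e-3 * 10 ** (p_dbm / 10)


def photocurrent(responsivity, p_dbm):
    """Photocurrent in amperes for a responsivity in A/W and a power in dBm."""
    return responsivity * dbm_to_watt(p_dbm)


if __name__ == "__main__":
    p_opt_dbm = -8.0  # assumed input level, taken from the quoted BER condition
    for n_f, resp in DIFF_RESPONSIVITY.items():
        i_ph = photocurrent(resp, p_opt_dbm)
        print(f"N_f={n_f:2d}  R_diff={resp:.3f} A/W  I_ph={i_ph * 1e6:.2f} uA")
```

Under these assumptions the differential photocurrent stays in the microampere range for all rows, which is consistent with the need for a very sensitive TIA.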
As this topology shows the best speed–responsivity performance, the rest of this paper will only deal with differential diodes.

# III. TRANSIMPEDANCE AMPLIFIER DESIGN

The most important and extremely critical building block of the optical receiver is the transimpedance amplifier (TIA). It determines sensitivity and speed of the front-end. Furthermore, as it is the first block in the receiver chain, a large transimpedance gain is necessary for noise suppression of the following receiver blocks. This section describes the TIA design tradeoffs, and a way to subtract the signals coming from the differential diode.

Fig. 4. Simplified schematic of TIA and CSDA.

Fig. 4 shows the schematic of the first part of the receiver front-end. It consists of two identical transimpedance amplifiers (TIA$_1$ and TIA$_2$) and a complementary self-biased differential amplifier (CSDA) [14]. Diode $D_1$ corresponds to the illuminated junctions, while diode $D_2$ corresponds to the dark junctions. The diffusion current coming from the two diodes will appear as a common-mode signal at the input of the CSDA and will be suppressed. The drift current will be amplified.

Mostly, a TIA consists of a voltage amplifier with DC gain A and a feedback resistor $R_f$, which provides the current to voltage conversion. The TIA bandwidth $\mathrm{BW}_{\mathrm{TIA}}$ is approximately given by

$$\mathrm{BW}_{\mathrm{TIA}} \approx \frac{A}{2\pi C_{\rm inT} R_f} \tag{1}$$

where A is the DC voltage gain, $R_f$ is the feedback resistance, and $C_{\rm inT}$ is the total input capacitance. This capacitance consists of two main parts: the parasitic capacitance of the photodiode and the input capacitance of the voltage amplifier (gate-source capacitance of transistors $M_1$ and $M_2$). The photodiode capacitance is determined by the total diode area, doping concentrations, and applied junction voltage.
These parameters are fixed and cannot be changed during design: the photodiode area must correspond with the fiber diameter, and doping concentrations and maximum reverse voltage are dictated by the CMOS technology used. Therefore, to maximize bandwidth, $R_f$ must be small and A must be large. Maximizing A also results in a larger gate-source capacitance of the amplifier transistors, so an optimal design point can be found.

In addition, the noise generated by the TIA is extremely important, as it determines the sensitivity of the receiver. The generated noise must be very small, as the responsivity of the CMOS photodiode is not as good as the responsivity of a commercially available photodiode. It can be shown that the equivalent input-referred current noise $\overline{di_{eq,in}^2(\omega)}$ is given by

$$\overline{di_{eq,in}^{2}(\omega)} = \frac{4kT}{R_{f}} \left| 1 + j\omega \frac{C_{\text{inT}}}{g_{m1} + g_{m2}} \right|^{2} + \gamma \frac{4kT}{R_{f}^{2}(g_{m1} + g_{m2})} |1 + j\omega C_{\text{inT}}R_{f}|^{2} \tag{2}$$

where T is the absolute temperature, k is Boltzmann's constant, $\gamma$ is the transistor excess noise factor, and $g_{m1}$ and $g_{m2}$ are the transconductances of transistors $M_1$ and $M_2$. The first term in (2) refers to the noise from the feedback resistor, while the second term refers to the noise coming from the amplifier transistors. For low frequencies, the resistor $R_f$ is the main noise source, while for high frequencies, the noise of the transistors may become dominant. However, for practical cases where the focus lies on high bandwidth, the noise from resistor $R_f$ is dominant within the noise integration bandwidth. Consequently, $R_f$ must be as high as possible to achieve large receiver sensitivity. This low-noise prerequisite conflicts with the high-bandwidth criterion. To summarize, the design of the TIA must focus on low noise by maximizing $R_f$, and bandwidth can be optimized by maximizing the gain A.
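The opposing pull of $R_f$ on bandwidth (1) and noise (2) can be made concrete with a small numerical sketch. All component values below (gain, input capacitance, transconductance, $\gamma$, temperature) are hypothetical placeholders chosen only for illustration; they are not the values of the fabricated TIA, and the function names are likewise assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann's constant in J/K


def tia_bandwidth(gain, c_in_total, r_f):
    """Approximate TIA bandwidth in Hz according to Eq. (1)."""
    return gain / (2 * math.pi * c_in_total * r_f)


def input_noise_terms(freq, r_f, c_in_total, gm_sum, gamma=1.0, temp=300.0):
    """Return the two terms of Eq. (2) in A^2/Hz.

    The first element is the feedback-resistor term, the second the
    amplifier-transistor term. gm_sum stands for g_m1 + g_m2.
    """
    omega = 2 * math.pi * freq
    four_kt = 4 * K_BOLTZMANN * temp
    resistor = four_kt / r_f * abs(1 + 1j * omega * c_in_total / gm_sum) ** 2
    transistor = (gamma * four_kt / (r_f ** 2 * gm_sum)
                  * abs(1 + 1j * omega * c_in_total * r_f) ** 2)
    return resistor, transistor


if __name__ == "__main__":
    # Hypothetical values, not taken from the paper.
    gain, c_in, gm_sum = 10.0, 1e-12, 20e-3
    for r_f in (500.0, 1000.0, 2000.0):
        bw = tia_bandwidth(gain, c_in, r_f)
        res, tr = input_noise_terms(10e6, r_f, c_in, gm_sum)
        print(f"R_f={r_f:6.0f} ohm  BW={bw / 1e9:5.2f} GHz  "
              f"resistor={res:.2e}  transistor={tr:.2e}  (A^2/Hz)")
```

Doubling $R_f$ halves the bandwidth estimate while lowering the resistor noise term, which is the tradeoff described above; the gain A is the remaining handle to recover bandwidth.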
Traditionally, stability of the TIA is achieved by adding a diode-connected transistor as load [13]. The major drawback of this approach is that the voltage gain A drops. In this design, the TIA consists of an inverter topology and a variable feedback resistance. Without a diode-connected transistor, a much larger bandwidth can be achieved. Stability is investigated by simulating loop gain. As the structure consists of a single-stage voltage amplifier, a safe phase margin can be ensured. The feedback resistor is implemented as a poly-resistor in parallel with a pMOS transistor. The transresistance can be varied between 54 dB$\Omega$ and 65 dB$\Omega$. The major advantage of this receiver topology is that both TIA and CSDA have a CMOS inverter topology, which makes it very suitable to implement in standard digital CMOS technologies.

# IV. POST-AMPLIFIER DESIGN

As already mentioned in Section I, the gain-bandwidth of a basic differential pair with resistive load is not large enough for broadband operation. A circuit solution to improve both gain and bandwidth of an amplifying stage is proposed. The small-signal gain and major design issues of a traditional and a modified Cherry-Hooper amplifier are discussed in the first two subsections. The last subsection addresses the cascading of several stages to constitute the post-amplifier.

#### A. Cherry-Hooper Amplifier

A well-known technique to enhance the bandwidth of a differential amplifier is the Cherry-Hooper topology [15]. A CMOS implementation is depicted in Fig. 5(a). Transistors $M_1$ and $M_2$ form the input pair, and resistor $R_f$ provides feedback between the drain and gate of transistors $M_3$ and $M_4$, respectively. $R_d$ is the load resistor. The small-signal half circuit is shown in Fig. 5(b). The output resistances of the transistors are usually larger than $R_f$ and $R_d$, so they are omitted.
The differential small-signal gain $A_{\rm CH}$ of this stage is given by

$$A_{\rm CH} = A_{\rm CH,DC} \frac{1 - s \frac{C_{gd3}}{g_{m3}}}{s^2 C^2 \frac{R_f}{g_{m3}} + s(RC)_{\rm CH} + 1} \tag{3}$$

with

$$A_{\rm CH,DC} = g_{m1} R_f \tag{4}$$

$$C^2 = C_1 C_{gd3} + C_1 C_L + C_{gd3} C_L \tag{5}$$

$$(RC)_{\rm CH} = C_{gd3}R_f + C_1\frac{R_f + R_d}{R_d g_{m3}} + \frac{C_L}{g_{m3}} \tag{6}$$

where $C_1$ is the total parasitic capacitance at node n1, $C_{gd3}$ is the gate-drain capacitance of transistor $M_3$, and $C_L$ is the total capacitance at the output node n3. $g_{m1}$ is the transconductance of transistor $M_1$, while $g_{m3}$ is the transconductance of transistor $M_3$. These equations are valid as long as $g_{m3}R_f \gg 1$ and $g_{m3}R_d \gg 1$.

Fig. 5. CMOS Cherry-Hooper amplifier stage: (a) schematic and (b) small-signal half circuit.

The small-signal gain of the Cherry-Hooper amplifier at low frequencies is comparable to that of a differential pair (4), because the load resistor of a differential stage is of the same order of magnitude as the feedback resistor $R_f$. However, the bandwidth of this stage can be seriously extended. Assuming that the denominator of (3) shows two separated poles, the dominant pole is in first order given by $1/(RC)_{\rm CH}$. Mostly, $C_{gd3}$ is much smaller than $C_1$ and $C_L$. If it can be assumed that $R_d \gg R_f$, the pole at node n1 equals $g_{m3}/C_1$, and the pole at node n3 is given by $g_{m3}/C_L$. In a simple differential stage, the dominant pole is determined by the load capacitance $C_L$ and the output resistance. The latter one is usually much larger than $1/g_{m3}$, resulting in a smaller bandwidth. If the design goal of this stage is to optimize bandwidth, the poles will not be separated, but complex conjugated. However, the above reasoning is very valuable for gaining insight in the major advantages of the Cherry-Hooper topology.
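As a rough illustration of how the denominator of (3) behaves, the following sketch evaluates its coefficients from (5) and (6) and solves for the two poles. The device values are hypothetical placeholders that merely satisfy $g_{m3}R_f \gg 1$ and $g_{m3}R_d \gg 1$; they are not the sizing used in this design, and the helper names are assumptions.

```python
import cmath
import math


def ch_denominator(gm3, r_f, r_d, c1, c_l, c_gd3):
    """Coefficients (a2, a1) of a2*s^2 + a1*s + 1, the denominator of Eq. (3)."""
    c_sq = c1 * c_gd3 + c1 * c_l + c_gd3 * c_l                      # Eq. (5)
    a2 = c_sq * r_f / gm3
    a1 = c_gd3 * r_f + c1 * (r_f + r_d) / (r_d * gm3) + c_l / gm3   # Eq. (6)
    return a2, a1


def poles(a2, a1):
    """Both roots of a2*s^2 + a1*s + 1 = 0 (complex when underdamped)."""
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    return (-a1 + disc) / (2 * a2), (-a1 - disc) / (2 * a2)


def natural_freq_and_damping(a2, a1):
    """Natural frequency (rad/s) and damping factor of the second-order form."""
    omega0 = 1 / math.sqrt(a2)
    zeta = a1 * omega0 / 2
    return omega0, zeta


if __name__ == "__main__":
    # Hypothetical values: gm3 = 20 mS, R_f = 500 ohm, R_d = 2 kohm,
    # C_1 = C_L = 50 fF, C_gd3 = 5 fF.
    a2, a1 = ch_denominator(20e-3, 500.0, 2000.0, 50e-15, 50e-15, 5e-15)
    omega0, zeta = natural_freq_and_damping(a2, a1)
    print("poles (rad/s):", poles(a2, a1))
    print(f"f0 = {omega0 / (2 * math.pi) / 1e9:.1f} GHz, zeta = {zeta:.2f}")
```

With these example values the damping factor is below one, so the poles form a complex conjugate pair, the situation described above for a bandwidth-optimized stage.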
Gain-bandwidth of the stage is enlarged by moving the poles toward higher frequencies, without the penalty of losing significant gain.

#### B. Modified Cherry–Hooper Amplifier

One can even go a step further in improving the performance by raising the gain, without a corresponding decrease in bandwidth. This is realized by the modified Cherry-Hooper amplifier, depicted in Fig. 6(a). This stage was first introduced in CMOS by [11]. Compared to a traditional Cherry–Hooper stage, resistor $R_d$ is split up in two resistors $R_1$ and $R_2$, and transistors $M_5$ and $M_6$ provide source-follower feedback. The small-signal half circuit is shown in Fig. 6(b), where again the output resistances of the transistors are left out. The differential small-signal gain $A_{\rm MCH}$ is now given by

$$A_{\rm MCH} = A_{\rm MCH,DC} \frac{1 - s \frac{C_{gd3}}{g_{m3}}}{s^2 (RC)_{\rm MCH}^2 + s (RC)_{\rm MCH} + 1} \tag{7}$$

with

$$A_{\rm MCH,DC} = g_{m1} R_f f_{\rm MCH} \tag{8}$$

$$f_{\rm MCH} = 1 + \frac{R_2}{R_1} \tag{9}$$

$$(RC)_{\rm MCH}^2 = C^2 \frac{A_{\rm MCH,DC}}{g_{m1}g_{m3}} \tag{10}$$

$$C^2 = C_1 C_{gd3} + C_1 C_L + C_{gd3} C_L \tag{11}$$

$$(RC)_{\rm MCH} = C_{gd3}R_f f_{\rm MCH} + C_1 \frac{R_f}{R_1 g_{m3}} + C_L \frac{f_{\rm MCH}}{g_{m3}}. \tag{12}$$

Again, $C_1$ is the total parasitic capacitance at node n1, $C_{gd3}$ is the gate-drain capacitance of transistor $M_3$, and $C_L$ is the total capacitance at the output node n3. $g_{m1}$ is the transconductance of transistor $M_1$, while $g_{m3}$ is the transconductance of transistor $M_3$. This time, the equations are valid as long as $g_{m5}R_f\gg 1$ and $g_{m3}R_1 \gg 1$.

By comparing (4) and (8), one can see that the gain at low frequencies is raised in the modified topology with the factor $f_{\rm MCH}$. This factor is mainly determined by the ratio of $R_2$ and $R_1$. To increase gain, this ratio must be large.
Because this factor is determined by a ratio of resistors, it is less sensitive to process variations.

Unfortunately, the topology of Fig. 6 has some limitations. The major one is the small voltage headroom available in recent CMOS technologies. This problem will only get worse as line-width scales down and, consequently, the available power supply drops. First, when cascading several stages, the ratio $R_2/R_1$ cannot be made extremely large, as the DC voltage at nodes n3 and n4 must be sufficiently high to drive the next stage. Second, a critical path exists between power supply and ground, where attention must be paid to keep all transistors in saturation. The path is formed by the voltage drop over $R_1$, the gate-source voltage of transistor $M_5$ ($V_{gs5}$), the voltage drop over $R_f$, the gate-source voltage of transistor $M_3$ ($V_{gs3}$) and finally the drain-source voltage of biasing transistor $M_{c3}$ ($V_{dsc3}$).

Fig. 6. Modified CMOS Cherry-Hooper amplifier stage: (a) schematic and (b) small-signal half circuit.

With an available power supply of 1.8 V, the overdrive voltage $V_{gs} - V_T$ of transistors $M_3$ and $M_5$, and the saturation voltage of transistor $M_{c3}$, must be smaller than the traditional value of 0.2 V. To limit the voltage drop over the resistors, the current consumption is low and the resistance values are designed as small as possible ($R_f$ is as small as 40 $\Omega$). As a result, the approximation $g_{m5}R_f\gg 1$ is no longer valid, and the small-signal DC gain is more accurately modeled by

$$A_{\rm MCH,DC} = \frac{g_{m1}(R_1 + R_2)(1/g_{m5} + R_f)}{R_1 + 1/g_{m3}}. \tag{13}$$

Another important issue is the influence of the ratio $R_2/R_1$ on bandwidth, as $f_{\rm MCH}$ also appears in $(RC)_{\rm MCH}^2$ and $(RC)_{\rm MCH}$. It follows from (7) that a large ratio, which is needed for large gain, also results in two real separated poles, which is less beneficial for bandwidth.
This should be avoided, and the design must focus on optimizing both gain and bandwidth by introducing complex conjugated poles.

Finally, as the circuit of Fig. 6(a) has a feedback loop consisting of $M_5$, $R_f$, $M_3$, $R_1$, and $R_2$, care should be taken to ensure stability. It can be found that the loop gain has the same positive zero as the small-signal gain (7). This is dangerous for the phase margin, as it introduces a phase rotation of $-90^{\circ}$. So the stability of the loop has to be monitored very closely during the optimization of the gain-bandwidth.

#### C. Post-Amplifier

Despite the larger complexity of the modified Cherry–Hooper stage, this amplifying stage is cascaded in the broadband post-amplifier. The analysis above clearly demonstrates its better gain-bandwidth performance compared to the traditional Cherry–Hooper amplifier. Determining how many stages have to be cascaded is a tradeoff between amplification and bandwidth. When N second-order stages are cascaded, the overall transfer function is given by

$$A_{\rm PA} = \left(\frac{A_{\rm MCH,DC}}{\left(\frac{s}{\omega_o}\right)^2 + 2\zeta\frac{s}{\omega_o} + 1}\right)^N. \tag{14}$$

The more stages are cascaded, the higher the overall small-signal DC gain $A_{\rm PA,DC}$, as it equals $A_{\rm MCH,DC}^N$, with $A_{\rm MCH,DC}$ given by (13). If $\zeta=\sqrt{2}/2$ for a maximally flat response, it can be calculated that the overall bandwidth $f_{\text{3dB,PA}}$ is

$$f_{\text{3dB,PA}} = f_{\text{3dB,MCH}} \sqrt[4]{\sqrt[N]{2} - 1} \tag{15}$$

where $f_{\text{3dB,MCH}}$ is the 3-dB bandwidth of one single modified Cherry–Hooper stage. Consequently, the higher N, the smaller the overall bandwidth, or the higher the needed gain-bandwidth per stage to attain a predetermined overall bandwidth. In Fig. 7, the simulated frequency response of the post-amplifier, consisting of four cascaded modified Cherry–Hooper stages, is depicted. Also the response of the individual stages is shown.
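The bandwidth shrinkage predicted by (15) is easy to tabulate. The sketch below assumes N identical second-order stages with $\zeta = \sqrt{2}/2$, exactly as in the derivation of (15); the function names are illustrative. The 6.7-GHz and 6.8-dB figures used in the example are the simulated per-stage values reported in this section for the later stages, whereas the real first stage is of third order, so the result is only indicative.

```python
def cascade_bandwidth(f_stage_hz, n_stages):
    """Overall 3-dB bandwidth of N identical maximally flat stages, Eq. (15)."""
    return f_stage_hz * (2 ** (1.0 / n_stages) - 1) ** 0.25


def cascade_gain_db(gain_stage_db, n_stages):
    """Overall DC gain in dB: per-stage gains in dB simply add up."""
    return n_stages * gain_stage_db


if __name__ == "__main__":
    f_stage = 6.7e9    # simulated per-stage bandwidth quoted in the text
    gain_stage = 6.8   # simulated per-stage gain in dB quoted in the text
    for n in range(1, 7):
        bw = cascade_bandwidth(f_stage, n)
        print(f"N={n}: gain={cascade_gain_db(gain_stage, n):5.1f} dB, "
              f"bandwidth={bw / 1e9:.2f} GHz")
```

For four identical stages this estimate lands somewhat above 4 GHz, in the same range as the simulated overall bandwidth once the slower first stage is taken into account.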
The gain per stage equals 6.8 dB, which results in an overall gain of 27.4 dB. The first stage is different from the following stages, as an *LC* input network is included in the simulations to take bondwires, bond pads, and ESD protection into account. This stage is of third order and has a bandwidth of 5.2 GHz. The next stages are all second-order stages and have a 3-dB frequency equal to 6.7 GHz. This results in an overall simulated bandwidth of 4 GHz, which corresponds quite well with (15). The load of the post-amplifier is the input impedance of the output buffer. The circuit with biasing consumes 19 mA from a 1.8-V power supply.

Fig. 7. Simulated frequency response of the post-amplifier, consisting of four cascaded modified Cherry–Hooper stages.

# V. OUTPUT BUFFER DESIGN

The only function of the output buffer is to drive the 50-$\Omega$ load of the measurement equipment. In a fully integrated one-chip solution where the front-end, consisting of photodiode, TIA, and post-amplifier, is integrated together with clock-and-data recovery and digital back-end, the output buffer is not needed at this stage. But in a real-life measurement set-up where the chip is mounted on a ceramic substrate and wire-bonded, a high-speed output buffer is mandatory. This section discusses the implementation of this final building block.

The presented output buffer consists of two stages. To achieve a broad bandwidth, each stage comprises an $f_T$ doubler [12], shown in Fig. 8. Transistors $M_1$ to $M_4$ all have the same sizing and operating point. Resistor $R_l$ is the load of transistors $M_1$ and $M_3$ at one side, and transistors $M_2$ and $M_4$ at the other side. Compared to a traditional differential pair with resistive load, this circuit reduces input capacitance while maintaining the same gain.

Fig. 8. Schematic of the $f_T$ doubler.

Consider a differential voltage v applied at the input transistors $M_1$ and $M_4$.
The small-signal behavior of the circuit is also shown in Fig. 8. The input $V_{\rm in1}$ goes up with an amplitude of v/2, while the input $V_{\rm in2}$ goes down with the same amplitude. The small-signal value of $V_b$ is zero, as it is a bias voltage equal to the common-mode level of $V_{\rm in1}$ and $V_{\rm in2}$. As a consequence, a current equal to $g_m v/4$ flows from drain to source in transistors $M_1$ and $M_3$, and from source to drain in transistors $M_2$ and $M_4$, where $g_m$ is the transconductance of each transistor. These currents add up at the output nodes, which results in a negative small-signal voltage $2 \cdot (g_m v/4) R_l$ at $V_{\rm out1}$ and a positive small-signal voltage $2 \cdot (g_m v/4) R_l$ at $V_{\rm out2}$. So the differential gain equals $g_m R_l$, which is the same as that of a differential pair with input transistor transconductance $g_m$ and load resistor $R_l$.

By using the configuration of Fig. 8, the input ports of the transistors are placed in series while the output ports are connected in parallel. This results in a lower input capacitance. If the parasitic capacitance at the two common-mode nodes nA and nB is negligible, the input capacitance is roughly equal to $C_{gs}/2$, where $C_{gs}$ is the gate-source capacitance of transistors $M_1$ to $M_4$. Hence the name "$f_T$ doubler" of the circuit: it halves the input capacitance while maintaining the same overall transconductance.

Fig. 9. Chip microphotograph: (a) photodiode with TIA and CSDA and (b) post-amplifier with output buffer.

Unfortunately, the circuit also suffers from several drawbacks compared to a simple differential pair. First, power dissipation is doubled. Next, the total current flowing through the load resistors is doubled, making it more difficult to keep the transistors in saturation. Further, total output capacitance is doubled.
Finally, if the parasitic capacitance at nodes nA and nB is not negligible, the input capacitance is higher than $C_{gs}/2$. To conclude, for delivering high-speed signals off-chip, a larger bandwidth must be exchanged for a higher power consumption.

# VI. LAYOUT AND MEASUREMENTS

This section describes the practical realization of a CMOS integrated optical front-end. Eye diagram and bit error rate (BER) measurements are discussed.

The chip microphotograph of the photodiode with TIA and CSDA on the one hand (area $1000 \times 570\,\mu\text{m}^2$), and the post-amplifier with output buffer on the other hand (area $1300 \times 1500\,\mu\text{m}^2$), is depicted in Fig. 9. The bond pad size on each die is $80 \times 80\,\mu\text{m}^2$. Both chips are implemented in a standard 0.18-$\mu$m 1.8-V CMOS technology. No additional masks are used to enhance the performance of the diode. On top of the diode, there is only the standard dielectric stack and no anti-reflective coating. The power dissipation of the two identical TIAs with CSDA is 17 mW. The post-amplifier consumes 34 mW, while the output buffer dissipates 92 mW.

The optical source used during measurements is a commercially available 850-nm transmitter. Fig. 10(b) compares the eye diagram of the differential diode at 300 Mb/s with that of a classical n-well diode [Fig. 10(a)]. A $2^7-1$ pseudo random bit sequence (PRBS) data stream with an average input optical power ($P_{\rm opt}$) of -7 dBm is applied. The slowly rising tail originating from the diffusing substrate carriers can be clearly distinguished in the eye diagram of the classical n-well diode. The eye diagram of the differential diode shows the large improvement of this alternative structure: the eye opening is much wider, the rising and falling edges are steeper, and the jitter is only 80 ps compared to 310 ps.

Fig. 10. Eye diagrams of the classical n-well diode and differential n-well diode, both followed by TIA and CSDA, for an input optical power of -7 dBm: (a) classical n-well diode, 300 Mb/s; (b) differential n-well diode, 300 Mb/s; (c) differential n-well diode, 500 Mb/s.

Fig. 10(c) shows a high-speed eye diagram of the front-end with the differential n-well diode. The optical input signal has a bit-rate of 500 Mb/s and $P_{\rm opt}=-7$ dBm. Even at this speed, the jitter is small and the eye opening is wide compared to that of the classical n-well diode at 300 Mb/s.

Also, BER measurements have been performed for different optical input levels. At 300 Mb/s and $P_{\rm opt}=-8$ dBm, the BER is better than $10^{-13}$. If smaller input signals are applied, the BER performance becomes worse due to the limited responsivity and the reduced data eye width. At 500 Mb/s, the differential n-well diode achieves a BER of $3\cdot10^{-10}$ for $P_{\rm opt}=-8$ dBm, and $5\cdot10^{-6}$ for $P_{\rm opt}=-10$ dBm.

Since the post-amplifier has a better speed performance, its measurement results are obtained with the Agilent ParBERT (Parallel Bit Error Ratio Tester) 81250. A differential $2^{31}-1$ PRBS is applied at the input. Fig. 11 shows the eye diagrams for different bit-rates and different input amplitudes. The eye opening measurement obtained from the ParBERT generates a three-dimensional bit error rate diagram as a function of the sample delay (x-axis) and the sample threshold (y-axis). The contour of the eye is derived from the bit error rates that have been measured. Different grayscales are used for the regions between the lines of equal bit error rate: black corresponds to a BER lower than $10^{-12}$. The eye opening is the best for large input signals and low bit-rates.
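The way an eye contour follows from a BER map over sample delay and sample threshold can be illustrated with a small Python sketch. It is not the ParBERT algorithm and uses no measured data: the BER grid is synthetic, generated from a toy Gaussian-noise model, and all names and parameter values (`toy_ber_map`, `amp_v`, `sigma_v`, `sigma_t_ui`) are hypothetical. Only the last function, which reads the opening at a chosen BER level from the grid, reflects the idea described in the text.

```python
import math


def gaussian_ber(margin_in_sigma):
    """BER of a binary decision whose margin is given in noise sigmas."""
    return 0.5 * math.erfc(margin_in_sigma / math.sqrt(2.0))


def toy_ber_map(delays_ui, thresholds_v, amp_v=0.2, sigma_v=0.01, sigma_t_ui=0.03):
    """Synthetic BER grid (ASSUMED model); rows are thresholds, columns are delays."""
    grid = []
    for th in thresholds_v:
        row = []
        for d in delays_ui:
            v_margin = max(amp_v / 2.0 - abs(th), 0.0) / sigma_v
            t_margin = max(0.5 - abs(d), 0.0) / sigma_t_ui
            # The smaller margin gives the larger BER, which dominates.
            row.append(max(gaussian_ber(v_margin), gaussian_ber(t_margin)))
        grid.append(row)
    return grid


def eye_opening(grid, delays_ui, thresholds_v, ber_target):
    """Eye width (UI) and height (V) where BER <= ber_target.

    The cuts are taken through the centre row and centre column of the grid,
    and the passing region is assumed to be contiguous along each cut.
    """
    mid_row = len(thresholds_v) // 2
    mid_col = len(delays_ui) // 2
    ok_d = [d for d, b in zip(delays_ui, grid[mid_row]) if b <= ber_target]
    ok_t = [t for t, row in zip(thresholds_v, grid) if row[mid_col] <= ber_target]
    width = max(ok_d) - min(ok_d) if ok_d else 0.0
    height = max(ok_t) - min(ok_t) if ok_t else 0.0
    return width, height


# Sampling grid: delay in unit intervals around the eye centre, threshold in volts.
delays = [-0.5 + 0.01 * i for i in range(101)]
thresholds = [-0.1 + 0.002 * i for i in range(101)]
ber_grid = toy_ber_map(delays, thresholds)

for target in (1e-6, 1e-12):
    w, h = eye_opening(ber_grid, delays, thresholds, target)
    print("BER <= %.0e: width = %.2f UI, height = %.3f V" % (target, w, h))
```

A stricter BER target shrinks both the width and the height of the passing region, which is why the innermost contour of such a diagram corresponds to the lowest BER.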
Even for an input signal with a bit-rate of 3.5 Gb/s and an amplitude of 10 mV$_{\rm pp}$, the eye is still considerably open.

The results of the BER tests are shown in Fig. 12. A differential $2^{31}-1$ PRBS input signal with an amplitude of 10 mV$_{\rm pp}$ is applied, while for each test the bit-rate of the input is changed. As expected, the higher the bit-rate, the higher the BER. At 1 Gb/s the BER is as low as $7 \cdot 10^{-14}$, while at 3.5 Gb/s the BER is still only $5\cdot 10^{-12}$.

Fig. 11. Eye diagrams of post-amplifier with output buffer: (a) 1 Gb/s, 100 mV$_{\rm pp}$; (b) 1 Gb/s, 10 mV$_{\rm pp}$; (c) 3.5 Gb/s, 100 mV$_{\rm pp}$; (d) 3.5 Gb/s, 10 mV$_{\rm pp}$.

The rms jitter is slightly dependent on the level and speed of the input signal. For an input level of 10 mV$_{\rm pp}$ and a bit-rate of 3.5 Gb/s, the measured rms jitter is 16 ps. Taking the rms jitter added by the ParBERT into account, the post-amplifier and buffer contribute an rms jitter of only 12 ps.

Finally, the small-signal voltage gain is measured with a network analyzer. The measured single-ended voltage gain is 21 dB, which corresponds to a differential voltage gain of 27 dB. All measurement results are summarized in Table III.

There is still a large difference in maximum achievable bit-rate with low BER between the photodiode with TIA and the post-amplifier with output buffer. This clearly demonstrates where the bottleneck lies in optoelectronic receiver design: not at the post-amplifier and output buffer, but at the very first stages comprising photodiode and TIA. Therefore, further effort must be put into the development of high-speed and more sensitive photodiodes, combined with a very low-noise TIA.

Fig. 12. BER versus bit-rate of post-amplifier with output buffer for an input level of 10 mV$_{\rm pp}$.

TABLE III. OVERVIEW OF MEASUREMENT RESULTS

| Technology | 0.18-$\mu$m CMOS |
|-----------------------------------|--------------------------|
| $V_{dd}$ | 1.8 V |
| Optical wavelength | 850 nm |
| **Photodiode with TIA + CSDA** | |
| Bit-rate | 500 Mb/s |
| BER at $P_{\rm opt} = -8$ dBm | $3 \cdot 10^{-10}$ |
| rms jitter | 80 ps |
| **Post-amplifier with output buffer** | |
| Differential gain | 27 dB |
| Bit-rate | 3.5 Gb/s |
| BER at 10 mV$_{\rm pp}$ | $5 \cdot 10^{-12}$ |
| rms jitter | 12 ps |
| Total power dissipation | 143 mW |

# VII. CONCLUSION

This work demonstrates the feasibility of light detection and high-speed amplification in an unmodified CMOS technology. A photodetector with TIA and a post-amplifier with output buffer have been presented. All receiver building blocks are integrated in a 0.18-$\mu$m 1.8-V CMOS technology.

The innovative differential n-well topology leads to a better speed performance compared to the classical n-well topology. The TIA is optimized toward bandwidth, taking the generated receiver noise into account. The cancellation of the diffusing carriers in the substrate, obtained by subtracting the signals from the illuminated and dark junctions, is demonstrated by the eye diagrams. At 500 Mb/s, the BER is $3 \cdot 10^{-10}$ when the input optical power is -8 dBm. Power dissipation is 17 mW.

The post-amplifier consists of four cascaded Cherry-Hooper stages with source-follower feedback to improve bandwidth and increase gain. The output buffer necessary to drive the 50-$\Omega$ load comprises two $f_T$ doublers. Ideally, this topology halves the input capacitance without a change of the transconductance. To demonstrate the implemented broadband techniques, measurements have been performed with the Agilent ParBERT 81250. Eye diagrams and BERs from 1 Gb/s to 3.5 Gb/s have been measured. Even for an input level as low as 10 mV$_{\rm pp}$, the eye is still wide open.
The bit error rate for a $2^{31}-1$ PRBS 10 mV$_{\rm pp}$ input signal with a bit-rate of 1 Gb/s is $7 \cdot 10^{-14}$. At 3.5 Gb/s, the bit error rate is higher but still acceptable: $5 \cdot 10^{-12}$. The rms jitter of this output signal equals 12 ps. Power dissipation of the post-amplifier is only 34 mW.

Although there is still a discrepancy between the obtained speed performance of the two blocks, the Gb/s single-chip solution is within sight. The major bottleneck, the performance of the photodiode, must be overcome by developing faster diodes with a very sensitive TIA. This will lead to a high-speed optical receiver integrated in a standard submicron CMOS technology, which converts 850-nm light into a voltage with a high output swing.

#### REFERENCES

- [1] G. W. de Jong, J. R. Bergervoet, J. H. Brekelmans, and J. F. van Mil, "A DC-to-250 MHz current pre-amplifier with integrated photo-diodes in standard CBiMOS, for optical-storage systems," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2002, pp. 362–363.
- [2] R. Swoboda, J. Knorr, and H. Zimmermann, "A 5-Gb/s OEIC with voltage-up-converter," *IEEE J. Solid-State Circuits*, vol. 40, no. 7, pp. 1521–1526, Jul. 2005.
- [3] J. Sturm, M. Leifhelm, H. Schatzmayr, S. Groiss, and H. Zimmermann, "Optical receiver IC for CD/DVD/Blue-laser application," *IEEE J. Solid-State Circuits*, vol. 40, no. 7, pp. 1406–1413, Jul. 2005.
- [4] T. Woodward and A. V. Krishnamoorthy, "1 Gb/s CMOS photoreceiver with integrated detector operating at 850 nm," *Electron. Lett.*, vol. 34, no. 12, pp. 1252–1253, Jun. 1998.
- [5] S. Radovanovic, A.-J. Annema, and B. Nauta, "A 3-Gb/s optical detector in standard CMOS for 850-nm optical communication," *IEEE J. Solid-State Circuits*, vol. 40, no. 8, pp. 1706–1717, Aug. 2005.
- [6] C. Hermans, P. Leroux, and M. Steyaert, "A high-speed optical front-end with integrated photodiode in 90 nm CMOS," presented at the Int. Symp. Signals, Systems and Electronics, Linz, Austria, Aug. 2004.
- [7] C. Hermans, P. Leroux, and M. Steyaert, "Two high-speed optical front-ends with integrated photodiodes in standard 0.18 μm CMOS," in *Proc. Eur. Solid-State Circuits Conf. (ESSCIRC)*, Sep. 2004, pp. 275–278.
- [8] … cuits, vol. 38, no. 12, pp. 2138–2146, Dec. 2003.
- [9] E. Säckinger and W. C. Fischer, "A 3-GHz 32-dB CMOS limiting amplifier for SONET OC-48 receivers," *IEEE J. Solid-State Circuits*, vol. 35, no. 12, pp. 1884–1888, Dec. 2000.
- [10] C. Hermans and M. Steyaert, "A 3.5 Gb/s post-amplifier in 0.18 μm CMOS," in *Proc. Eur. Solid-State Circuits Conf. (ESSCIRC)*, Sep. 2005, pp. 431–434.
- [11] C. D. Holdenried, M. W. Lynch, and J. W. Haslett, "Modified CMOS Cherry-Hooper amplifiers with source follower feedback in 0.35 μm technology," in *Proc. Eur. Solid-State Circuits Conf. (ESSCIRC)*, Sep. 2003.
- [12] B. Razavi, "Prospects of CMOS technology for high-speed optical communications," *IEEE J. Solid-State Circuits*, vol. 37, no. 9, pp. 1135–1145, Sep. 2002.
- [13] C. Rooman, D. Coppée, and M. Kuijk, "Asynchronous 250-Mb/s optical receivers with integrated detector in standard CMOS technology for optocoupler applications," *IEEE J. Solid-State Circuits*, vol. 35, no. 7, pp. 953–958, Jul. 2000.
- [14] M. Bazes, "Two novel fully complementary self-biased CMOS differential amplifiers," *IEEE J. Solid-State Circuits*, vol. 26, no. 2, pp. 165–168, Feb. 1991.
- [15] E. M. Cherry and D. E. Hooper, "The design of wideband transistor feedback amplifiers," in *Proc. Inst. Electr. Eng.*, Feb. 1963, vol. 110, pp. 375–389.

Carolien Hermans (S'01) was born in Hasselt, Belgium, in 1978. She received the M.S. degree in electrical engineering from the Katholieke Universiteit Leuven (K.U.Leuven), Heverlee, Belgium, in 2001. The subject of her M.S. thesis was a comparative design study of CMOS front-end circuits for gigabit optical communication. Currently, she is a Research Assistant at the ESAT-MICAS Laboratories of the Katholieke Universiteit Leuven.
She is working toward the Ph.D. degree on the design of CMOS interface circuits for optical communication. For this work, she obtained a fellowship from the Fund for Scientific Research - Flanders (Fonds voor Wetenschappelijk Onderzoek - Vlaanderen).

Michiel S. J. Steyaert (S'85–A'89–SM'92–F'04) was born in Aalst, Belgium, in 1959. He received the M.S. degree in electrical–mechanical engineering and the Ph.D. degree in electronics from the Katholieke Universiteit Leuven (K.U.Leuven), Heverlee, Belgium, in 1983 and 1987, respectively. From 1983 to 1986, he obtained an IWONL fellowship (Belgian National Foundation for Industrial Research) which allowed him to work as a Research Assistant at the Laboratory ESAT at K.U.Leuven. In 1987, he was responsible for several industrial projects in the field of analog micro power circuits at the Laboratory ESAT as an IWONL Project Researcher. In 1988, he was a Visiting Assistant Professor at the University of California, Los Angeles. In 1989, he was appointed by the National Fund of Scientific Research (Belgium) as a Research Associate, in 1992 as a Senior Research Associate, and in 1996 as a Research Director at the Laboratory ESAT, K.U.Leuven. Between 1989 and 1996, he was also a part-time Associate Professor. He is now a Full Professor at the K.U.Leuven and the Chair of the Electrical Engineering Department. His current research interests are in high-performance and high-frequency analog integrated circuits for telecommunication systems and analog signal processing.

Prof. Steyaert received the 1990 and 2001 European Solid-State Circuits Conference Best Paper Award, the 1991 and the 2000 NFWO Alcatel-Bell-Telephone Award for innovative work in integrated circuits for telecommunications, the 1995 and 1997 IEEE ISSCC Evening Session Awards, and the 1999 IEEE Circuits and Systems Society Guillemin–Cauer Award.