Journal of Semiconductors, Volume 45, Issue 6, 062204 (2024)

A 28/56 Gb/s NRZ/PAM-4 dual-mode transceiver with 1/4 rate reconfigurable 4-tap FFE and half-rate slicer in a 28-nm CMOS

Yukun He, Zhao Yuan, Kanan Wang, Renjie Tang, Yunxiang He, Xian Chen, Zhengyang Ye, and Xiaoyan Gui*
Author Affiliations
  • School of Microelectronics, Xi’an Jiaotong University, Xi’an 710049, China

    A 28/56 Gb/s NRZ/PAM-4 dual-mode transceiver (TRx) designed in a 28-nm complementary metal-oxide-semiconductor (CMOS) process is presented in this article. A voltage-mode (VM) driver featuring a 4-tap reconfigurable feed-forward equalizer (FFE) is employed in the quarter-rate transmitter (TX). The half-rate receiver (RX) incorporates a continuous-time linear equalizer (CTLE), a 3-stage high-speed slicer with multi-clock-phase sampling, and a clock and data recovery (CDR) circuit. The experimental results show that the TRx operates at a maximum speed of 56 Gb/s with chip-on-board (COB) assembly. The 28 Gb/s NRZ eye diagram shows a far-end vertical eye opening of 210 mV with a single-ended output amplitude of 351 mV, and the 56 Gb/s PAM-4 eye diagram exhibits far-end eye openings of 33 mV (upper eye), 31 mV (middle eye), and 28 mV (lower eye) with a single-ended output amplitude of 353 mV. The recovered 14 GHz clock from the RX exhibits random jitter (RJ) of 469 fs and deterministic jitter (DJ) of 8.76 ps. The 875 Mb/s de-multiplexed data features a 593 ps horizontal eye opening with 32.02 ps RJ, corresponding to a timing margin of 0.53 UI at a bit-error rate (BER) of 10⁻⁵. The TX and RX dissipate 125 and 181.4 mW, respectively, from a 0.9-V supply.

    Introduction

    The increasing demand for digital-intensive services such as the internet of things (IoT), big data, cloud computing, and artificial intelligence (AI), and especially the rapid progress in large language models, is driving data centers to continuously upgrade their infrastructure to support the exponential growth in network bandwidth. In particular, there is a strong demand for high-speed, low-power chip-to-chip, chip-to-module, and module-to-module data transmission within high-performance data centers[1].

    Four-level pulse amplitude modulation (PAM-4) has been introduced in the migration path from 28 Gb/s per lane to 56 Gb/s per lane and beyond because of its bandwidth efficiency. Compared to non-return-to-zero (NRZ) signaling[2], PAM-4 offers higher spectral efficiency, lower loss at the Nyquist frequency, and relaxed clock speeds at the same data rate. These advantages have led to the adoption of PAM-4 modulation in recent interface standards, both electrical and optical[3−5]. However, the design of a PAM-4 transceiver raises quite a few challenges and tradeoffs. Since each of the three stacked eyes has only 1/3 of the NRZ vertical eye opening at the same swing, the signal-to-noise ratio (SNR) is degraded by ~9.5 dB compared to NRZ signaling. In addition, the jitter performance suffers due to the finite rise and fall times of transitions between non-adjacent levels. Furthermore, PAM-4 signaling places stringent requirements on transceiver linearity for signal integrity. Tremendous efforts have therefore been put into the development of high-speed and low-power PAM-4 transceivers[6−20], which, compared with NRZ designs, require more stringent linearity and equalization performance to mitigate the multi-level inter-symbol interference (ISI) and improve the sensitivity.
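
    The ~9.5 dB figure follows from the eye-height ratio alone. The short check below is a sketch that assumes equal peak-to-peak swing and equal noise for both formats; the function name is illustrative.

```python
import math

def pam_eye_penalty_db(levels: int) -> float:
    """Vertical-eye penalty of PAM-N versus NRZ at the same peak-to-peak swing."""
    eyes = levels - 1  # PAM-N stacks (N - 1) eyes inside the same swing
    return 20 * math.log10(eyes)

penalty_db = pam_eye_penalty_db(4)
print(f"PAM-4 eye penalty vs NRZ: {penalty_db:.2f} dB")
```

    With three stacked eyes the penalty evaluates to 20·log10(3), which rounds to the value quoted in the text.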

    The 56 Gb/s electrical interface covers multiple link reach classes, such as long-reach (LR), medium-reach (MR), short-reach (SR), very short-reach (VSR), and ultra short-reach (USR). The scenarios range from die-to-die connections with negligible interconnect loss to heavy-loss backplane or cable links in which reflections are generated by traces, packages and connectors. The harsh operating conditions of PAM-4 signaling are driving the SerDes transceiver architecture towards analog-to-digital converter (ADC) based receivers and digital-to-analog converter (DAC) based transmitters, where complex and flexible equalization is performed by digital signal processing (DSP). However, this solution faces two main challenges. First, the total power dissipation of the receiver is quite high due to the DSP. For example, the link power efficiency, including DSP, reported in Ref. [14] is 8 pJ/bit. Second, ADCs require very low clock jitter to maintain a reasonable effective number of bits (ENOB)[21]. Many efforts have been made to reduce the power consumption of ADC-based receivers through FinFET technologies[8−11]. Nonetheless, for SR/VSR applications, analog-based receivers may offer higher power and area efficiency[5, 11]. The analog mixed-signal architecture has therefore drawn more attention in such applications.

    In this work, a 28/56 Gb/s NRZ/PAM-4 dual-mode transceiver in 28-nm complementary metal-oxide-semiconductor (CMOS)[15] is presented. A dual-mode quarter-rate transmitter (TX) with reconfigurable feed-forward equalizer (FFE) is designed to compensate for the channel loss. A dual-mode half-rate receiver (RX) with a novel three-stage multi-clock-phase sampling high-speed slicer is also proposed to fully exploit the bandwidth margin and minimize the power overhead.

    Transmitter architecture and design

    TX architecture

    To achieve lower power consumption, a quarter-rate architecture with a 4-to-1 multiplexer at the last stage of the serializer is employed, as shown in Fig. 1. The 64-bit parallel data, consisting of 32 least-significant bits (LSB) and 32 most-significant bits (MSB), from either the pseudo-random binary sequence (PRBS) generator or the receiver, are multiplexed into 4-bit parallel LSB and 4-bit parallel MSB data at 7 Gb/s, which are sent to the reconfigurable FFE. For low-power design, a voltage-mode (VM) output driver is employed. To address the inherent bandwidth limitation of the VM driver, a T-coil peaking network is introduced at the driver output. By splitting the large parasitic capacitances of the electrostatic-discharge (ESD) diodes and bonding pads, it extends the bandwidth and provides 50-ohm impedance matching up to ~20 GHz. The 7 GHz quarter-rate clock C4, generated by an analog divide-by-2 from the 14 GHz C2 input clock, is sent to the FFE through the distributed clock tree. The C4 clock is further divided into C8, C16 and C32 clocks by digital dividers, which are used in the serializers and the PRBS generator. To mitigate the duty-cycle distortion and quadrature phase error induced by process, voltage and temperature (PVT) variations, the clock path employs duty-cycle correction (DCC) and quadrature phase-error correction (QEC) blocks, ensuring the integrity of the high-speed clocks.
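
    The rates quoted above can be cross-checked with a few lines of arithmetic. This is only a consistency sketch: the variable names are illustrative, and the assumption that each 32-bit word is clocked at the C32 rate is inferred from the text rather than stated in it.

```python
LINE_RATE_GBPS = 56.0   # PAM-4 mode
BITS_PER_SYMBOL = 2     # one MSB and one LSB per PAM-4 symbol

baud_gbd = LINE_RATE_GBPS / BITS_PER_SYMBOL   # symbol rate seen by the driver
lane_rate_gbps = baud_gbd / 4                 # each input of the 4:1 MUX (quarter rate)

# Clock plan: C2 is half the symbol rate, each later clock is a divide-by-2.
clocks_ghz = {}
freq = baud_gbd / 2
for name in ("C2", "C4", "C8", "C16", "C32"):
    clocks_ghz[name] = freq
    freq /= 2

# Assumption: each 32-bit MSB (or LSB) word is clocked at C32.
word_throughput_gbps = 32 * clocks_ghz["C32"]

print(lane_rate_gbps, clocks_ghz, word_throughput_gbps)
```

    Under that assumption the 32-bit word throughput equals the 28 GBd symbol rate, so the MSB and LSB streams each fill one bit of every PAM-4 symbol.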

    Figure 1. (Color online) TX architecture.

    FFE and driver design

    Fig. 2 shows the design of the reconfigurable TX equalizer, where 4-way 1-UI-wide data (D3’−D0’) is generated through a 1-UI tap-delay generation unit and a 1-UI-wide pulse generation unit. After passing through the 4:1 multiplexer (MUX), the full-rate 28 Gb/s data is sent to the voltage-mode driver. In the 4:1 MUX, a pull-up transistor (M0) is used instead of a resistor to minimize the pre-driver area and the associated routing parasitics. The tap-delay data (D3−D0) is generated by latches sampled with the C4 I/Q clocks. The 1-UI-wide data (D3’−D0’) is generated by an AND gate whose inputs are the tap-delay data (D3−D0) and a 25% duty-cycle C4 clock. The 25% duty-cycle clock is in turn generated by an AND gate whose inputs are two quadrature C4 clocks from the phase selector with a 90° phase difference. A SGNn signal determines the sign of the tap coefficients, and a C4 clock phase selector chooses the appropriate clock phases, which determine the tap configuration of the FFE. A 4:1 MUX with tri-state inverters is implemented as the C4 clock phase selector for minimal power and area cost. Any of the incoming C4 I/Q clocks can be assigned to a given output (C4ISn/C4ISBn/C4QSn/C4QSBn) with SEL1 and SEL2. A manually adjustable bias voltage VCTRL is used to maintain the optimum rise/fall time of the 4:1 MUX over PVT variations.
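
    The 25% duty-cycle pulse can be illustrated with ideal square-wave clocks. The sketch below is a behavioral model only, assuming perfect 50% duty cycle and exact 90° spacing (C4I rising at 0°, C4Q rising at 90°); it does not model the actual gates.

```python
def clock(phase_deg: int, rise_deg: int) -> int:
    """Ideal 50% duty-cycle clock: high for 180 degrees after its rising edge."""
    return int((phase_deg - rise_deg) % 360 < 180)

# One C4 period sampled in 1-degree steps.
c4i = [clock(p, 0) for p in range(360)]
c4q = [clock(p, 90) for p in range(360)]

# AND of two quadrature clocks: high only while both are high.
pulse = [i & q for i, q in zip(c4i, c4q)]

duty = sum(pulse) / len(pulse)
first_high = pulse.index(1)
last_high = len(pulse) - 1 - pulse[::-1].index(1)
print(duty, first_high, last_high)
```

    The pulse is high for one quarter of the C4 period, which is one UI of the full-rate data, starting at the rising edge of C4Q and ending at the falling edge of C4I.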

    Figure 2. (Color online) TX FFE and driver architecture.

    The adopted 16-segment FFE driver in Fig. 1 provides flexible assignment for FFE tap-weight control. Compared with a conventional fixed-tap FFE, the phase selector used in this design enables timing adjustment of the 1-UI pulse shaping, which can be used to reassign each segment among different FFE taps, thereby adjusting the FFE coefficients. The tap weight is the number of segments assigned to a given FFE cursor divided by the 16 available segments. With this segment reassignment, a constant output amplitude is maintained and all segments remain active in every tap configuration[22].
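
    The tap-weight rule can be written down directly. In the sketch below, the segment allocation is a made-up example, not a setting reported for the chip, and the function name is illustrative.

```python
from fractions import Fraction

TOTAL_SEGMENTS = 16

def tap_weights(segments: dict[str, int]) -> dict[str, Fraction]:
    """Tap weight = segments assigned to that cursor / 16 total segments."""
    if sum(segments.values()) != TOTAL_SEGMENTS:
        raise ValueError("all 16 segments must be assigned to some tap")
    return {tap: Fraction(n, TOTAL_SEGMENTS) for tap, n in segments.items()}

# Hypothetical allocation for a 1-pre / 2-post configuration.
weights = tap_weights({"pre": 2, "main": 11, "post1": 2, "post2": 1})
print(weights)
```

    Because every segment is always assigned to some tap, the absolute weights sum to one, which is why the output amplitude stays constant across configurations.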

    Fig. 3(a) shows the timing diagram of the clock phase selection with the main-tap cursor designated by solid arrows, where the FFE is configured for 1 pre-tap and 2 post-tap cursors. By selecting the C4 phases, each of the 16 FFE-MUX-Driver segments can be configured as a pre-tap, main-tap, post1-tap or post2-tap cursor. For each segment shown in Fig. 2, the tap cursors are shifted to the left or right, as illustrated with dashed arrows, by properly arranging the C4 clock phases. For instance, the main cursor of data D0’ is generated by the rising edge of C4Q and the falling edge of C4I. The pre-cursor for D0’ is generated by the rising edge of C4I and the falling edge of C4QB. In the same manner, the post1-tap cursor is determined by the rising edge of C4IB and the falling edge of C4Q, and the post2-tap cursor is determined by the rising edge of C4QB and the falling edge of C4IB. The reconfigured timing diagram in Fig. 3(b) covers the scenario where the TX FFE uses 2 pre-cursors and 1 post-cursor: the main tap is shifted from the second to the third UI of the available quarter-rate data, so the main cursor of data D0’ starts at the rising edge of C4IB and ends at the falling edge of C4Q. By arranging the C4 clock phase selection and allocating the 16 FFE-MUX-Driver segments, a fully reconfigurable four-tap FFE is implemented in this design.
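
    The edge pairs listed above follow one pattern, which the sketch below reproduces for both configurations. It assumes ideal clocks with rising edges at 0°, 90°, 180° and 270°; the helper names and the tap labels such as "pre1" are illustrative.

```python
PHASES = ["C4I", "C4Q", "C4IB", "C4QB"]  # rising edges at 0, 90, 180, 270 degrees

def window_edges(slot: int) -> tuple[str, str]:
    """(clock whose rising edge opens, clock whose falling edge closes) UI slot 0..3."""
    opens = PHASES[slot % 4]
    closes = PHASES[(slot - 1) % 4]  # this clock rose 180 degrees before the slot ends
    return opens, closes

def tap_map(n_pre: int) -> dict[str, tuple[str, str]]:
    """Assign the four 1-UI slots of a C4 period to taps, given the pre-cursor count."""
    names = ([f"pre{n_pre - k}" for k in range(n_pre)] + ["main"]
             + [f"post{k}" for k in range(1, 4 - n_pre)])
    return {name: window_edges(slot) for slot, name in enumerate(names)}

one_pre = tap_map(1)   # 1 pre-cursor, 2 post-cursors
two_pre = tap_map(2)   # 2 pre-cursors, 1 post-cursor
print(one_pre)
print(two_pre)
```

    Moving from one to two pre-cursors shifts the main tap from the second to the third slot, so its opening edge changes from C4Q to C4IB, matching the two cases described in the text.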

    Figure 3. (Color online) Timing diagram illustrating FFE tap selection. (a) 1 pre- and 2 post-cursors; (b) 2 pre- and 1 post-cursors.

    TX clock distribution

    Fig. 4 shows the clock distribution of the transmitter. The C4 quadrature clock is generated from an off-chip C2 input clock by a single-to-differential (S2D) buffer and an analog divide-by-2. A current-mode logic (CML) to CMOS block is adopted to generate rail-to-rail signals. As mentioned before, DCC and QEC circuitry is inserted along the clock path to correct the duty-cycle distortion and quadrature phase error. The C4 clocks are sent to the N and P paths of all 64 TX driver slices (S1−S64), including both the MSB and LSB data.

    Figure 4. (Color online) Clock distribution of TX.

    DCC and QEC design

    The operating principles of the DCC and QEC, both of which can be calibrated, are shown in Fig. 5. The duty-cycle distortion of the quadrature clock is corrected first, followed by the I/Q mismatch correction, ensuring uniform 1-UI eye-width generation from the 4:1 MUX. The duty cycle and phase error of the quadrature clock are adjusted by two-stage DCC and QEC compensation units, shown in Fig. 6(a). Source-degenerated inverters with tunable NMOS/PMOS resistance are employed to adjust the rise and fall times of the clock. For duty-cycle correction, the rise and fall times of the compensation unit are adjusted in a complementary manner by applying the DCC voltage to the gates of both Mp1 and Mn1, as illustrated in Fig. 6(b). For quadrature phase-error correction, on the other hand, the rise and fall times are adjusted in the same direction by tuning the degeneration resistances with complementary QEC voltages. The implemented circuitry covers a duty-cycle calibration range of 3.5 ps with a step of 500 fs, and a quadrature phase-error calibration range of 7 ps with a resolution of 1 ps.

    Figure 5. (Color online) Operation principles of DCC and QEC[23].

    Figure 6. (Color online) (a) DCC and QEC control cell. (b) Adjustment of duty-cycle and quadrature phase-error.

    Receiver architecture and design

    RX architecture

    Fig. 7 shows the RX block diagram. The continuous-time linear equalizer (CTLE) can compensate for up to 10 dB of insertion loss (IL) at the Nyquist frequency of 14 GHz. A 3-stage half-rate slicer with multi-phase clock sampling is employed to minimize the power overhead and fully exploit the bandwidth margin of the 28-nm CMOS process. The 56 Gb/s PAM-4 data from the CTLE is sampled by the slicer and converted into 4-way parallel thermometer-coded 14 Gb/s NRZ data, including both the data and edge information. The 14 Gb/s data streams are deserialized to 1.75 Gb/s by three stages of 1-to-2 demultiplexers before being used in the clock and data recovery (CDR) logic. The decoder further demultiplexes the 1.75 Gb/s data down to 875 Mb/s and converts the thermometer code into binary. In the clock path, the 14 GHz half-rate quadrature clock is generated by the IQ generation block from a 14 GHz differential clock and sent to the phase interpolator (PI). The output clock phases of the PI are set by the control bits from the CDR logic to achieve data synchronization. The 14 GHz clock is first divided to 7 GHz by an analog divide-by-2 and then further divided by digital dividers for each stage of the demultiplexer (DEMUX).
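
    The receive-side rates can be traced with the same kind of arithmetic. The sketch below simply halves the rate at each 1-to-2 stage; the helper name is illustrative.

```python
def demux_chain(rate_gbps: float, stages: int) -> list[float]:
    """Per-stream data rate after each successive 1-to-2 demultiplexer stage."""
    rates = [rate_gbps]
    for _ in range(stages):
        rates.append(rates[-1] / 2)
    return rates

baud_gbd = 56.0 / 2                       # 56 Gb/s PAM-4 carries 28 GBd
slicer_out_gbps = baud_gbd / 2            # half-rate sampling: even and odd paths
cdr_rates = demux_chain(slicer_out_gbps, 3)
decoder_out_gbps = cdr_rates[-1] / 2      # final 1-to-2 step inside the decoder

print(cdr_rates, decoder_out_gbps)
```

    The chain runs 14, 7, 3.5 and 1.75 Gb/s into the CDR logic, and the decoder output lands at 0.875 Gb/s, the rate at which the measured data is observed.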

    Figure 7. (Color online) Half-rate receiver architecture.

    Analog front-end circuit design

    Fig. 8(a) depicts the analog front-end circuit. The CTLE employs RC source degeneration to provide a zero/pole pair. The degeneration capacitor is an NMOS varactor array with control voltage VCS, and the tunable resistor consists of three NMOS transistors with three control voltages, VRS1, VRS2, and VRS3, as shown in Fig. 8(b). The control voltages VCS and VRS1−VRS3 are swept from 200 to 900 mV in 100 mV steps generated by 3-bit voltage digital-to-analog converters (VDAC). Two CTLE stages are adopted to achieve a smaller gain step size[24]. To ensure bandwidth margin while matching the layout, two types of CML buffers are used in this design: a CML buffer with inductive peaking for bandwidth enhancement, shown in Fig. 8(c), and a more compact CML buffer with RC source degeneration, shown in Fig. 8(d). Fig. 9 shows the frequency responses of the CTLE, where the peaking gain ranges from −1.32 to 5.15 dB and the DC gain ranges from −9.85 to 4.23 dB.
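
    How RC source degeneration produces the zero/pole pair can be seen from the textbook small-signal model of a degenerated differential pair. The sketch below uses that idealized model with the load pole ignored; all component values are hypothetical and are not taken from the paper.

```python
import math

def ctle_gain_db(f_hz: float, gm: float, rd: float, rs: float, cs: float) -> float:
    """|H(f)| in dB for H(s) = gm*RD*(1 + s*RS*CS) / (1 + gm*RS/2 + s*RS*CS)."""
    s = 2j * math.pi * f_hz
    h = gm * rd * (1 + s * rs * cs) / (1 + gm * rs / 2 + s * rs * cs)
    return 20 * math.log10(abs(h))

# Hypothetical values: transconductance, load R, degeneration R and C.
GM, RD, RS, CS = 20e-3, 100.0, 200.0, 50e-15

f_zero_hz = 1 / (2 * math.pi * RS * CS)                   # zero set by RS*CS
f_pole_hz = (1 + GM * RS / 2) / (2 * math.pi * RS * CS)   # pole pushed out by 1 + gm*RS/2
dc_gain_db = ctle_gain_db(0.0, GM, RD, RS, CS)
nyquist_gain_db = ctle_gain_db(14e9, GM, RD, RS, CS)

print(f_zero_hz, f_pole_hz, dc_gain_db, nyquist_gain_db)
```

    In this model, raising RS lowers the DC gain and widens the zero-to-pole spacing, while CS moves both together, which is consistent with having separate resistor and varactor controls for DC gain and peaking.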

    Figure 8. (Color online) Design of the analog front-end. (a) Block diagram; (b) schematic of the CTLE; (c) schematic of the CML buffer with inductive peaking; (d) schematic of the CML buffer with RC source-degeneration.

    Figure 9. (Color online) Frequency responses of CTLE. (a) Peaking gain adjustment; (b) DC adjustment.

    Slicer design

    In the half-rate architecture, the input data after the CTLE is sampled by two sets of slicers, one in the even path and one in the odd path. Each set contains four slicers, three for data sampling and one for edge sampling. To improve the sensitivity and settling time of the half-rate slicer in 28-nm CMOS, a three-stage high-speed comparator with multi-clock-phase sampling is introduced in this work. Fig. 10 illustrates the timing diagram of the three-stage high-speed slicer with the half-rate quadrature clock. The three-stage slicer consists of two stages of the proposed reset-and-regenerate comparator and a third-stage sense amplifier with an S/R latch, followed by two D flip-flops to restore the CMOS rail-to-rail signal level. CLKN0 at 14 GHz with a 50% duty cycle is applied to the first-stage comparator, which resolves the input data at the falling edge. Because the gain-bandwidth product of the 28-nm CMOS process node is limited when handling 56 Gb/s PAM-4 data with a Nyquist frequency of 14 GHz, the subsequent falling edge of CLKN90 clocks the second-stage comparator so that the signal can be further regenerated toward full analog swing. The third-stage sense amplifier with S/R latch is driven by the next falling edge, from CLKP0, and resolves the output data to rail-to-rail levels. The quadrature clocking scheme, together with the cascaded multi-stage comparator, enhances the regeneration capability by ensuring correct timing.

    Figure 10. (Color online) Block diagram of half-rate slicer.

    Fig. 11(a) shows the schematic of the conventional StrongArm comparator. It operates as follows: when the clock (CLK) is low, both outputs are charged to VDD so that the differential output is reset; when CLK goes high, the StrongArm samples the differential input, and the differential output is regenerated toward rail-to-rail by the positive feedback of the cross-coupled pairs. Because both outputs start from VDD after the reset phase, a certain time is always required for them to regenerate. For high data-rate operation, the time available for the StrongArm comparator to distinguish logic low from logic high may therefore be insufficient. The schematic of the proposed high-speed reset-and-regenerate comparator employed in the first and second stages of the slicer is shown in Fig. 11(b). Compared to the StrongArm comparator, whose output node is connected to 2 gates and 3 drains of MOS transistors, the parasitic capacitance at the output of the proposed comparator is significantly reduced because the node connects to only one gate and one drain. Resistor RD is inserted between the PMOS and NMOS transistors to isolate the output nodes from the parasitics of the switching transistor and the input pair. Figs. 11(c) and 11(d) illustrate the operation of the proposed slicer during the two complementary clock phases. At the rising edge of the clock, the output resets to the common-mode level. At the falling edge of the clock, the output starts the track-and-regenerate operation, so the output voltage is regenerated from the common-mode level. The input-referred offset is nulled by transistors MN2 and MN3, which also define the reference voltages of the upper, middle and lower eyes when processing the PAM-4 input signal. Compared to the conventional StrongArm comparator[25, 26], the proposed comparator effectively reduces the regeneration time. The regeneration time constant is simulated to be 27.7 ps for the StrongArm comparator and 11.0 ps for the proposed comparator, as shown in Fig. 12, when handling a 200 mVpp input signal with a 14 GHz clock. The schematic of the third-stage sense amplifier is shown in Fig. 13; its track and regeneration phases further amplify the output signal from the second stage.
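
    A rough feel for what the two time constants mean can be obtained from the usual first-order latch model, in which the differential output grows as exp(t/τ). The sketch below treats the quoted 27.7 ps and 11.0 ps as such exponential time constants and assumes the regeneration window is half of the 14 GHz clock period; both are simplifying assumptions, not statements from the paper.

```python
import math

TAU_STRONGARM_PS = 27.7   # simulated value quoted in the text
TAU_PROPOSED_PS = 11.0    # simulated value quoted in the text
CLOCK_GHZ = 14.0

# Assumed regeneration window: one half-period of the half-rate clock.
half_period_ps = 1e3 / (2 * CLOCK_GHZ)

def regeneration_gain(window_ps: float, tau_ps: float) -> float:
    """First-order latch model: differential voltage grows as exp(t / tau)."""
    return math.exp(window_ps / tau_ps)

gain_strongarm = regeneration_gain(half_period_ps, TAU_STRONGARM_PS)
gain_proposed = regeneration_gain(half_period_ps, TAU_PROPOSED_PS)
print(half_period_ps, gain_strongarm, gain_proposed)
```

    Under these assumptions a single stage multiplies its initial imbalance by roughly 3.6 with the longer time constant and by roughly 26 with the shorter one, which illustrates why the shorter time constant and the cascaded stages matter at this clock rate.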

    Figure 11. (Color online) (a) Schematic of StrongArm comparator; (b) schematic of proposed reset-and-regenerate comparator; (c) reset mode of proposed comparator; (d) track and regenerate mode of proposed comparator.

    Figure 12. (Color online) Clock-to-Q delay simulation results of (a) StrongArm comparator, (b) proposed reset-and-regenerate comparator.

    Figure 13. Schematic of sense-amplifier with S/R latch.

    Fig. 14 depicts the simulation results at the output nodes of each comparator. When a 200 mVpp PAM-4 signal is received, as shown in Fig. 14(a), the first-stage comparator provides sufficient amplification of the input error, and the second-stage comparator further regenerates the logic high and logic low from the common-mode voltage. The rail-to-rail CMOS logic "1"s and "0"s are restored at the output of the slicer. In addition, when sampling the input PAM-4 signal, the three data slicers sense the four-level input signal and generate the thermometer code that represents the four voltage levels separated by the three eye openings, as shown in Fig. 14(b).
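
    The thermometer coding performed by the three data slicers can be sketched behaviorally. The example assumes an ideal 200 mVpp signal with equally spaced levels and thresholds placed midway between adjacent levels; these values are illustrative, not measured.

```python
def slice_pam4(v_mv: float, thresholds_mv: tuple[float, float, float]) -> tuple[int, int, int]:
    """Three data slicers -> thermometer code (lower, middle, upper)."""
    return tuple(int(v_mv > t) for t in thresholds_mv)

# Hypothetical ideal levels of a 200 mVpp PAM-4 signal (differential, in mV).
LEVELS_MV = (-100.0, -100.0 / 3, 100.0 / 3, 100.0)
# Hypothetical thresholds at the centers of the lower, middle and upper eyes.
THRESHOLDS_MV = (-200.0 / 3, 0.0, 200.0 / 3)

codes = [slice_pam4(v, THRESHOLDS_MV) for v in LEVELS_MV]
print(codes)
```

    Each higher level trips one more slicer, so the four levels map to the codes 000, 100, 110 and 111 when written in (lower, middle, upper) order.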

    Figure 14. (Color online) Simulation results of (a) the three-stage comparator, (b) the data slicers.

    CDR design

    Fig. 15 shows the block diagram of the phase interpolator (PI) based CDR. After the data and edge signals are resolved and de-multiplexed, the bang-bang phase detector (BBPD) determines whether the clock is early or late. The resulting phase control codes speed up or slow down the PI, which in turn adjusts the frequency and phase of the recovered quadrature clock.

    Figure 15. (Color online) Block diagram of CDR loop.

    The digital engine of the CDR consists of three key modules, as shown in Fig. 16: the BBPD with phase voting, the digital loop filter (DLF), and the phase interpolator control signal generation module. The BBPD extracts the phase relationship between the clock and the data from the data and edge information. Majority voting determines whether the clock is leading or lagging the data. The digital-counter-based DLF then increments or decrements the PI control, which sends out the phase interpolator control bits PI_ctrl<63:0>.
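
    The three modules can be sketched as a generic bang-bang loop. The code below is a minimal behavioral model of an Alexander-style phase detector, a majority vote and a counter, not the paper's RTL; the sign convention (+1 means the clock is early and the PI code is incremented) and the 64-step wrap-around are assumptions.

```python
def bbpd(d_prev: int, edge: int, d_next: int) -> int:
    """+1 if the clock is early, -1 if late, 0 if there was no data transition."""
    if d_prev == d_next:
        return 0
    return 1 if edge == d_prev else -1

def majority_vote(votes: list[int]) -> int:
    """Reduce a block of early/late decisions to a single +1, 0 or -1."""
    total = sum(votes)
    return (total > 0) - (total < 0)

def update_pi_code(code: int, vote: int, steps: int = 64) -> int:
    """Counter-style loop filter: step the PI code by one, wrapping around."""
    return (code + vote) % steps

# Hypothetical block of (previous data, edge sample, next data) triplets.
samples = [(0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 1)]
votes = [bbpd(*s) for s in samples]
decision = majority_vote(votes)
new_code = update_pi_code(10, decision)
print(votes, decision, new_code)
```

    Voting over a block of de-multiplexed samples lets the loop run at the low parallel rate while still reacting to the net early/late tendency of the high-speed data.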

    Figure 16. Digital CDR block diagram.

    Phase interpolator design

    The topology of the phase interpolator is shown in Fig. 17. Four branches of CML differential pairs share the same load resistors. Each branch consists of 16 differential transistor pairs connected in parallel, driven by the C2 quadrature clock. The tail current of each differential pair is enabled or disabled by the thermometer-coded PI control bits, which always contain 16 "1"s and 48 "0"s to ensure a constant output amplitude while rotating the output phase clockwise or counterclockwise. The simulated integral nonlinearity (INL) is less than 4°, as shown in Fig. 18.
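
    The rotating control word can be modeled with ideal phasors. The sketch below assumes that bits 0 to 15 weight the 0° branch, bits 16 to 31 the 90° branch, and so on, and that the branches sum linearly; this bit-to-branch mapping and the linear summation are assumptions, so the resulting INL is that of an idealized interpolator and is not expected to reproduce Fig. 18.

```python
import math

N_BITS, ONES = 64, 16

def pi_control_word(step: int) -> list[int]:
    """Window of 16 consecutive ones, rotated circularly by `step` positions."""
    return [1 if (k - step) % N_BITS < ONES else 0 for k in range(N_BITS)]

def pi_phase_deg(word: list[int]) -> float:
    """Output phase of an ideal linear interpolator with branches at 0/90/180/270 deg."""
    w = [sum(word[b * 16:(b + 1) * 16]) for b in range(4)]
    x = w[0] - w[2]   # in-phase component
    y = w[1] - w[3]   # quadrature component
    return math.degrees(math.atan2(y, x)) % 360

phases = [pi_phase_deg(pi_control_word(s)) for s in range(N_BITS)]
inl_deg = [p - s * 360 / N_BITS for s, p in enumerate(phases)]
max_inl_deg = max(abs(e) for e in inl_deg)
print(max_inl_deg)
```

    Every control word enables exactly 16 tail currents, so the total current is constant, and the phase error of this linear model comes only from the difference between weighted-sum interpolation and an ideal phase ramp.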

    Figure 17. Schematic of PI.

    Figure 18. (Color online) Simulation results of PI. (a) Simulated output phase versus ideal output phase; (b) INL.

    IQ generation design

    In the half-rate architecture, the C2 differential clock is generated by a PLL or, as in this work, supplied off-chip. A quadrature clock generation circuit is therefore required, as shown in Fig. 19, where four stages of tetrahedral oscillators[27, 28] rotating next to each other in sequence are used. Fig. 20(a) shows the transient simulation result of the IQ generation. Fig. 20(b) shows the Monte Carlo (MC 100) simulation result of the IQ mismatch, with an average IQ phase difference of 90.41° and a standard deviation of 2.51°.

    Figure 19. (Color online) Block diagram of IQ generation.

    Figure 20. (Color online) Simulation results of IQ generation. (a) Transient; (b) phase difference of Monte Carlo.

    Decoder design

    The PAM-4 signal is presented in the form of thermometer code, which is further converted into the MSB and LSB NRZ data by the decoder circuit shown in Fig. 21. The AND gate and OR gate are used for MSB decoding, and XOR gates are used for LSB decoding. Buffers are inserted into the data path of the decoder circuit to match the timing and delays of the input signals.
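
    A thermometer-to-binary conversion of this kind can be sketched as follows. The example assumes a plain binary level mapping (00, 01, 10, 11 from lowest to highest level) and simplifies the MSB to the middle-slicer output; the truth table and gate-level structure of Fig. 21 are not given in the text, so this is not a reproduction of the actual decoder.

```python
def decode_thermometer(lower: int, middle: int, upper: int) -> tuple[int, int]:
    """Map a valid thermometer code to (MSB, LSB), assuming plain binary coding."""
    msb = middle                   # level is in the upper half iff the middle slicer fires
    lsb = lower ^ middle ^ upper   # odd number of slicers firing -> levels 1 and 3
    return msb, lsb

VALID_CODES = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
decoded = [decode_thermometer(*code) for code in VALID_CODES]
print(decoded)
```

    The XOR form of the LSB is only meaningful for the four valid thermometer codes; invalid (bubble) codes would need separate handling, which is outside this sketch.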

    Figure 21. (Color online) Decoder design. (a) Truth table; (b) schematic.

    Experimental results

    Fabricated in a 28-nm CMOS process, the transmitter occupies a chip area of 0.115 mm² and the receiver an active area of 0.532 mm², as shown in Fig. 22. The measurement setups of the TX and RX are shown in Fig. 23. A Keysight DSOZ634A sampling oscilloscope is used to capture the output waveform and the eye diagram of the transmitter. The 28 Gb/s NRZ eye diagram shows a vertical eye opening of 210 mV with a single-ended output amplitude of 351 mV, and the 56 Gb/s PAM-4 eye diagram exhibits eye openings of 33 mV (upper eye), 31 mV (middle eye), and 28 mV (lower eye) with a single-ended output amplitude of 353 mV, as shown in Fig. 24. In the RX measurement, the recovered 14 GHz clock is monitored by a Tektronix DPO75902SX oscilloscope, and the demultiplexed 875 Mb/s data is sent to a Keysight MXR608A oscilloscope to evaluate the bit-error rate. PRBS13 data is fed into the RX through an input channel consisting of cable, package, connectors and printed circuit board (PCB) traces. The insertion loss of the input channel is 6.6 dB at the Nyquist frequency of 14 GHz, as shown in Fig. 25(a). The measured jitter tolerance is shown in Fig. 25(b). The recovered 14 GHz clock from the RX exhibits random jitter (RJ) of 469 fs and deterministic jitter (DJ) of 8.76 ps. The 875 Mb/s de-multiplexed data features a 593 ps horizontal eye opening with 32.02 ps RJ, corresponding to a timing margin of 0.53 UI at BER < 10⁻⁵, as shown in Fig. 26. The transmitter dissipates 125 mW and the receiver consumes 181.4 mW from a 0.9-V power supply. The power breakdowns of the TX and RX are shown in Fig. 27. A performance summary and comparison with prior state-of-the-art designs is given in Table 1.
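
    The link power efficiency follows from the measured power and data rate, since 1 mW divided by 1 Gb/s equals 1 pJ/bit. The short check below uses only numbers quoted in the text; the variable names are illustrative.

```python
TX_POWER_MW = 125.0
RX_POWER_MW = 181.4
DATA_RATE_GBPS = 56.0

# mW / (Gb/s) = (1e-3 J/s) / (1e9 bit/s) = 1e-12 J/bit = pJ/bit
efficiency_pj_per_bit = (TX_POWER_MW + RX_POWER_MW) / DATA_RATE_GBPS
print(f"{efficiency_pj_per_bit:.2f} pJ/bit")
```

    The same ratio applied to the TX and RX powers of the other designs gives a like-for-like comparison, provided each entry is read with its note on whether DSP power is included.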

    Figure 22. (Color online) Chip micrograph.

    Figure 23. (Color online) Measurement setup of (a) RX, (b) TX.

    Table 1. Performance summary and comparison.

    Parameters                      | Ref. [4]          | Ref. [6]         | Ref. [7]            | Ref. [8]      | This work
    --------------------------------+-------------------+------------------+---------------------+---------------+---------------------
    Data rate (Gb/s)                | 56                | 56               | 64.375              | 60            | 56
    Technology                      | 40-nm CMOS        | 16-nm FinFET     | 16-nm FinFET        | 7-nm FinFET   | 28-nm CMOS
    Architecture                    | Mixed-signal      | ADC-DSP          | ADC-DSP             | ADC-DSP       | Mixed-signal
    Active area (mm²)               | TX: 1.14, RX: 1.6 | N/A              | TX: 0.09, RX: 0.163 | 0.268         | TX: 0.115, RX: 0.532
    Power (mW)                      | TX: 290, RX: 420  | TX: 140, RX: 370 | TX: 89.7, RX: 100   | 182           | TX: 125, RX: 181.4
    Link power efficiency (pJ/bit)  | 12.679            | 9.1 (w/o DSP)    | 2.95 (w/o DSP)      | 3.03 (w/ DSP) | 5.47
    Recovered clock jitter RMS (fs) | 520               | N/A              | N/A                 | N/A           | 469

    Figure 24. (Color online) Measured TX eye diagrams. (a) 28 Gb/s NRZ; (b) 56 Gb/s PAM-4.

    Figure 25. (Color online) (a) Measured input channel loss; (b) jitter tolerance measurement results.

    Figure 26. (Color online) Measured RX results. (a) Eye diagram of recovered 14 GHz clock; (b) bathtub curve of demultiplexed 875 Mb/s data.

    Figure 27. (Color online) Power breakdown. (a) TX; (b) RX.

    Conclusion

    An analog mixed-signal based 28/56 Gb/s NRZ/PAM-4 transceiver has been demonstrated in a 28-nm CMOS technology. The transceiver architecture employs a quarter-rate TX featuring a 4-tap reconfigurable FFE and a half-rate RX incorporating a digital CDR, achieving a total link power efficiency of 5.47 pJ/bit.

    Yukun He, Zhao Yuan, Kanan Wang, Renjie Tang, Yunxiang He, Xian Chen, Zhengyang Ye, Xiaoyan Gui. A 28/56 Gb/s NRZ/PAM-4 dual-mode transceiver with 1/4 rate reconfigurable 4-tap FFE and half-rate slicer in a 28-nm CMOS[J]. Journal of Semiconductors, 2024, 45(6): 062204

    Paper Information

    Category: Articles

    Received: Jan. 12, 2024

    Accepted: --

    Published Online: Jul. 8, 2024

    The Author Email: Gui Xiaoyan (XYGui)

    DOI:10.1088/1674-4926/24010001
