Improved SD-FEC with multi-tap error-tracking DFE with partial unrolling in high-speed IM/DD systems

Xue Zhao; Jing Zhang; Jiahao Zhou; Chenye Wang; Zhengyu Ma; Shaohua Hu; Bo Xu; Kun Qiu

doi:10.3788/COL202523.010605

1. Introduction

Driven by high-bandwidth Internet applications such as media services, cloud computing, and the Internet of Things (IoT), data traffic in data center interconnects (DCIs) is growing exponentially^[1]. In recent years, the data traffic of intra-data-center interconnects (intra-DCIs) has typically been greater than that of inter-data-center interconnects (inter-DCIs)^[2]. Intra-DCIs that cover 2-km or 10-km transmission distances are highly sensitive to cost and power consumption. The intensity modulation and direct-detection (IM/DD) system has been widely implemented in 100- and 400-Gb/s intra-DCIs owing to its low cost, low power consumption, and simple architecture. Advanced modulation formats such as discrete multitone (DMT), pulse amplitude modulation (PAM), and carrier-less amplitude and phase modulation (CAP) have been used to increase the data rate under severe bandwidth limitations^[3–5]. Among these modulation formats, PAM4 has been adopted in the IEEE 802.3bs 400 gigabit Ethernet (GbE) standard owing to its simplicity and lower power consumption^[6].

The band-limited effect and the signal-to-signal beating interference (SSBI) caused by the interaction of chromatic dispersion (CD) and square-law detection are two limitations for IM/DD systems^[7]. One popular approach to dealing with these distortions is to use electrical equalization at the receiver. Maximum likelihood sequence estimation (MLSE) has been proven to be the optimal way to address intersymbol interference (ISI) for signals with white noise^[8]. However, when the channel induces a large delay spread or a higher modulation order is used, the number of trellis states increases enormously, which significantly increases the power consumption of MLSE. A feed-forward equalizer (FFE) combined with a decision feedback equalizer (DFE) can effectively compensate for linear and nonlinear distortions with lower complexity^[9]. However, with the growing demand for high-baud-rate transmission, equalizers cannot completely eliminate the distortion caused by optical channels. The large ISI resulting from bandwidth-limited transceivers severely degrades the transmission performance^[10]. Forward error correction (FEC), as an effective method of providing coding gain, has been widely employed to improve overall performance and guarantee reliable transmission^[11]. However, the error propagation resulting from DFE significantly impacts the log-likelihood ratio (LLR) distribution and degrades the soft-decision forward error correction (SD-FEC) decoding performance. A weighted DFE (WDFE) that introduces a reliability value to control the weight of feedback symbols has been studied to suppress burst error propagation^[12]. However, the WDFE cannot fully compensate for spectral nulls because only a part of the feedback symbols is directly decided. In addition, a performance evaluation of WDFE for SD decoding is currently lacking. Interleaving or precoding at the transmitter side can also suppress the burst errors of DFE^[13,14]. However, the implementation of interleaving and de-interleaving increases the latency, while precoding degrades the transmission performance in scenarios where the correlation of errors is less severe. Furthermore, precoding requires quantized LLRs for decoding, which degrades the SD-FEC decoding performance. A one-step state transition model that tracks the state probabilities recursively has been proposed to decrease the impact of DFE error propagation on soft decoding^[15,16]. However, it only considers the decoding performance improvement under a one-tap DFE. Besides the error propagation problem, the timing constraint of the DFE feedback loop also limits its application in high-speed transmissions. The loop-unrolling architecture is a well-known method of relaxing the timing constraint^[17]. It precomputes all possible output cases and then selects the most probable one based on the previous symbols through a multiplexer. However, the multiplexer itself still limits the efficiency of the DFE.

In this Letter, we propose a multi-tap look-up-table (LUT)-based error-tracking DFE with partial unrolling (ET-DFE-PU) to alleviate the LLR degradation caused by DFE error propagation. In the ET-DFE, we use an error-tracking model to calculate the error-offset probabilities of the DFE so that the distorted LLRs can be corrected. The more accurate LLRs in turn improve the post-FEC bit error rate (BER). To reduce the computational complexity of the LLRs, we establish an LUT that records the indexing symbol and its corresponding probability. Moreover, a low-complexity partially unrolled architecture is used to reduce the number of possible output states. The proposed algorithm is verified in a 170-Gb/s PAM4 IM/DD system at C-band. The experimental results show that the LUT-based ET-DFE-PU can effectively mitigate the error propagation problem of DFE in the SD-FEC decoding stage. The proposed ET-DFE-PU achieves a 3-dB receiver sensitivity gain compared with the conventional DFE at a post-FEC BER of $5 \times 10^{-6}$.

2. Principle

Figure 1(a) shows the block diagram of the multi-tap ET-DFE. The FFE is used to compensate for the linear impairments. Then, a post filter (PF) is used to suppress the noise enhancement caused by the FFE, and the $N$-tap DFE cancels the trailing ISI resulting from the PF. In Fig. 1(a), the signal $y_{i}$ after the PF at time $i$ can be described as $y_{i} = x_{i} + \sum_{k = 1}^{N} b_{k} x_{i - k} + n_{i},$ (1) where $x_{i}$ is the transmitted symbol belonging to $\{\pm 1, \pm 3, \dots, \pm (M - 1)\}$ and $M$ is the modulation order. $b_{k}$ is the tap coefficient of the DFE, and $n_{i}$ stands for the additive white Gaussian noise (AWGN) with a noise variance of $\sigma^{2}$. The input of the slicer $\bar{y}_{i}$ is expressed as $\bar{y}_{i} = x_{i} + \sum_{k = 1}^{N} b_{k} (x_{i - k} - d_{i - k}) + n_{i},$ (2) where $d$ is the decision symbol. Hence, $x_{i}$ is equal to $d_{i}$ plus the error $e_{i}$, i.e., $x_{i} = d_{i} + e_{i}$. Then, $\bar{y}_{i}$ can be rewritten as $\bar{y}_{i} = d_{i} + e_{i} + \sum_{k = 1}^{N} b_{k} e_{i - k} + n_{i}.$ (3)
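The signal model of Eqs. (1)–(3) can be checked numerically with a short simulation. The following is a minimal sketch, not the authors' implementation: the function name `simulate_dfe`, the tap values, the noise level, and the zero initial state are illustrative assumptions, and the feedback taps are assumed to be known exactly.

```python
import random

def simulate_dfe(num_symbols=2000, taps=(0.5, 0.2), sigma=0.3, M=4, seed=0):
    """Simulate Eqs. (1)-(2) for PAM-M; tap values and sigma are assumptions."""
    rng = random.Random(seed)
    levels = list(range(-(M - 1), M, 2))  # PAM-M alphabet: +-1, +-3, ...
    N = len(taps)
    x, d, y_bar, n = [], [], [], []
    for i in range(num_symbols):
        x_i = rng.choice(levels)
        n_i = rng.gauss(0.0, sigma)
        # Trailing ISI from past transmitted symbols (zero before the start).
        isi = sum(taps[k - 1] * x[i - k] for k in range(1, N + 1) if i - k >= 0)
        # Feedback built from past decisions.
        fb = sum(taps[k - 1] * d[i - k] for k in range(1, N + 1) if i - k >= 0)
        y_i = x_i + isi + n_i   # Eq. (1): signal after the post filter
        yb_i = y_i - fb         # Eq. (2): slicer input
        d_i = min(levels, key=lambda a: abs(a - yb_i))  # nearest-level slicer
        x.append(x_i)
        d.append(d_i)
        y_bar.append(yb_i)
        n.append(n_i)
    e = [xi - di for xi, di in zip(x, d)]  # e_i = x_i - d_i
    return x, d, e, y_bar, n

if __name__ == "__main__":
    x, d, e, y_bar, n = simulate_dfe()
    print("decision errors:", sum(1 for v in e if v != 0))
```

Because a wrong decision $d_{i-k}$ leaves a residual $b_{k} e_{i-k}$ in later slicer inputs, the sketch reproduces the error-propagation term of Eq. (3) whenever the slicer makes a mistake.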

Figure 1. (a) Block diagram of ET-DFE; (b) error-tracking model of DFE.

We only consider 0 and $\pm 2$ as error events and neglect errors such as $\pm 4$ and $\pm 6$, which would only occur in very low signal-to-noise ratio (SNR) cases. Therefore, we assume $e_{i} \in \{\pm 2, 0\}$ in Eq. (3). We define the biased state of the equalized symbol $\bar{y}_{i}$ as $s_{i} \in S = \{l, c, r\}$, where $l$, $c$, and $r$ stand for the left-biased state, center state, and right-biased state, respectively. We use a vector $P_{i}$ to record the different biased state probabilities conditioned on the previous equalized symbols^[15,16], which can be expressed as $P_{i} = [P (s_{i} | \bar{y}_{i - 1}, \bar{y}_{i - 2}, \dots)], \; s_{i} \in \{l, c, r\},$ (4) where the elements of $P_{i}$ sum to 1, and $P_{1}$ is initialized to [0,0,1]. Since $P_{i}$ records the different biased state probabilities for each symbol, it can be used to improve the accuracy of the LLR. Figure 1(b) shows the error-tracking model of the DFE. In Fig. 1(b), $p_{s_{i - 1}, s_{i}}$ indicates the state transition probability from error state $s_{i - 1}$ to $s_{i}$, and is defined as $p_{s_{i - 1}, s_{i}} = P (s_{i} | \bar{y}_{i - 1}, s_{i - 1}).$ (5)

According to the error-tracking model, we create a transition probability matrix $Q_{i}$. Then, $P_{i + 1}$ can be calculated recursively as $P_{i + 1} = P_{i} Q_{i},$ (6) where $Q_{i}$ is obtained by row-normalizing $W_{i}$, and $W_{i}$ can be calculated as $W_{i} = \begin{bmatrix} p_{l, l} & p_{l, c} & p_{l, r} \\ p_{c, l} & p_{c, c} & p_{c, r} \\ p_{r, l} & p_{r, c} & p_{r, r} \end{bmatrix}.$ (7)
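The recursion of Eq. (6) is a product of a 1×3 row vector and a row-normalized 3×3 matrix. A minimal sketch is given below; the function names are hypothetical, the state order `[l, c, r]` is assumed, and every row of $W_{i}$ is assumed to have a positive sum.

```python
def row_normalize(W):
    """Return Q, where each row of W is scaled to sum to 1 (rows assumed > 0)."""
    return [[w / sum(row) for w in row] for row in W]

def update_state_probs(P, W):
    """Eq. (6): P_{i+1} = P_i Q_i, with P a length-3 list ordered [l, c, r]."""
    Q = row_normalize(W)
    # Nine multiplications: three per output element.
    return [sum(P[s] * Q[s][t] for s in range(3)) for t in range(3)]

if __name__ == "__main__":
    W = [[1.0, 2.0, 1.0], [0.5, 4.0, 0.5], [2.0, 2.0, 4.0]]  # illustrative values
    print(update_state_probs([0.0, 0.0, 1.0], W))
```

Since each row of $Q_{i}$ sums to 1 and the elements of $P_{i}$ sum to 1, the updated vector also sums to 1, so no further normalization of $P_{i+1}$ is needed.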

In this Letter, the probability density function (PDF) of the Gaussian distribution is denoted by $f (\cdot)$. According to Eq. (3), we define an offset $\Delta$ and assume $\bar{y}_{i} \sim \mathcal{N} (d_{i} + \Delta, \sigma^{2})$. Then, Eq. (7) can be rewritten as $W_{i} = \begin{bmatrix} \sum_{m_{j} \in \{U_{i} (v) > 0\}} f (2 + m_{j}) & \sum_{m_{j} \in \{U_{i} (v) > 0\}} f (m_{j}) & \sum_{m_{j} \in \{U_{i} (v) > 0\}} f (- 2 + m_{j}) \\ f (2) & f (0) & f (- 2) \\ \sum_{z_{q} \in \{U_{i} (v) < 0\}} f (2 + z_{q}) & \sum_{z_{q} \in \{U_{i} (v) < 0\}} f (z_{q}) & \sum_{z_{q} \in \{U_{i} (v) < 0\}} f (- 2 + z_{q}) \end{bmatrix},$ (8) $U_{i} (v) = \sum_{k = 1}^{N} b_{k} e_{i - k}, \; e_{i - k} \in \{- 2, 0, + 2\}, \; v = 1, \dots, 3^{N},$ (9) $f (\Delta) = \frac{1}{\sqrt{2 \pi} \sigma} \exp \left( - \frac{[\bar{y} - (d + \Delta)]^{2}}{2 \sigma^{2}} \right).$ (10)
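Equations (8)–(10) can be read as a recipe: enumerate the $3^{N}$ offsets of Eq. (9), split them by sign, and sum the Gaussian PDF of Eq. (10) at the shifted means. The sketch below follows that reading literally. The function names and the tap, noise, and symbol values are illustrative assumptions, and the special handling of the outermost symbols is omitted.

```python
import math
from itertools import product

def gauss_pdf(y_bar, d, delta, sigma):
    """Eq. (10): Gaussian PDF of y_bar with mean d + delta."""
    return math.exp(-(y_bar - (d + delta)) ** 2 / (2 * sigma ** 2)) / (
        math.sqrt(2 * math.pi) * sigma)

def offsets(taps):
    """Eq. (9): all 3^N offsets U(v) for past errors in {-2, 0, +2}."""
    return [sum(b * e for b, e in zip(taps, errs))
            for errs in product((-2, 0, 2), repeat=len(taps))]

def build_W(y_bar, d, taps, sigma):
    """Eq. (8): unnormalized 3x3 transition weights for one symbol."""
    U = offsets(taps)
    pos = [u for u in U if u > 0]  # offsets m_j
    neg = [u for u in U if u < 0]  # offsets z_q

    def f(delta):
        return gauss_pdf(y_bar, d, delta, sigma)

    return [
        [sum(f(2 + m) for m in pos), sum(f(m) for m in pos), sum(f(-2 + m) for m in pos)],
        [f(2), f(0), f(-2)],
        [sum(f(2 + z) for z in neg), sum(f(z) for z in neg), sum(f(-2 + z) for z in neg)],
    ]

if __name__ == "__main__":
    for row in build_W(y_bar=1.1, d=1, taps=(0.5, 0.2), sigma=0.45):
        print(row)
```

Row-normalizing the returned matrix gives $Q_{i}$ for the recursion of Eq. (6).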

When $d_{i} = (M - 1)$ or $- (M - 1)$, the left or the right column of $W_{i}$ in Eq. (8), respectively, is set to a zero-column vector. To obtain more accurate LLRs, the different error state probabilities are used in the LLR calculation. We extend the improved LLR expression for the one-tap case in Refs. [15,16] to a multi-tap DFE, which can be expressed as $\Lambda_{j}^{S} = \log \frac{\sum_{x \in \chi_{j}^{0}} \sum_{s \in S} f_{\bar{Y} | X, S} (\bar{y} | x, s) P (s | \bar{y})}{\sum_{x \in \chi_{j}^{1}} \sum_{s \in S} f_{\bar{Y} | X, S} (\bar{y} | x, s) P (s | \bar{y})},$ (11) where $j = 1, 2, \dots, \log_{2} M$, and $\chi_{j}^{0}$ and $\chi_{j}^{1}$ are the sets of PAM-$M$ symbols whose $j$th bits are ‘0’ and ‘1’, respectively. $f_{\bar{Y} | X, S} (\bar{y} | x, s)$ is the PDF given by the distribution $\bar{y} \sim \mathcal{N} (x + \Delta, \sigma^{2})$, where $\Delta$ can be obtained from Eq. (2).
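A state-aware LLR in the form of Eq. (11) can be sketched for PAM4 as follows. Several inputs here are assumptions rather than details given in the text: the Gray bit labels in `GRAY_PAM4`, the function name, and the representation of each state by a single offset $\Delta_{s}$ passed in `state_offsets` (ordered `[l, c, r]`). The Gaussian normalization constant is dropped because it cancels in the ratio.

```python
import math

# Assumed Gray labelling; the text states Gray coding but not the exact mapping.
GRAY_PAM4 = {-3: (0, 0), -1: (0, 1), 1: (1, 1), 3: (1, 0)}

def llr_with_states(y_bar, P, state_offsets, sigma, mapping=GRAY_PAM4):
    """Eq. (11): per-bit LLRs that weight each state's PDF by its probability."""
    def pdf(x, delta):
        # Unnormalized Gaussian with mean x + delta.
        return math.exp(-(y_bar - (x + delta)) ** 2 / (2 * sigma ** 2))

    n_bits = len(next(iter(mapping.values())))
    llrs = []
    for j in range(n_bits):
        num = sum(pdf(x, dl) * p
                  for x, bits in mapping.items() if bits[j] == 0
                  for dl, p in zip(state_offsets, P))
        den = sum(pdf(x, dl) * p
                  for x, bits in mapping.items() if bits[j] == 1
                  for dl, p in zip(state_offsets, P))
        llrs.append(math.log(num / den))
    return llrs

if __name__ == "__main__":
    offsets_lcr = (-0.5, 0.0, 0.5)  # illustrative per-state offsets
    print(llr_with_states(0.4, [0.0, 1.0, 0.0], offsets_lcr, 0.45))
    print(llr_with_states(0.4, [0.0, 0.0, 1.0], offsets_lcr, 0.45))
```

With all probability on the center state the expression reduces to the conventional LLR. Shifting probability to a biased state moves the effective decision regions, which is how the tracked error state corrects the LLR.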

The above PDF calculation incurs considerable computational complexity. Therefore, we establish an LUT that stores the quantized PDF and avoids evaluating the PDF directly, so the complexity of the ET-DFE can be reduced dramatically. Figure 2 shows the quantized and continuous PDF curves with mean = 0 and $\sigma^{2} = 0.2$ as an example. In Fig. 2, we divide the range $x \in [- 4, 4]$ into a series of segments of length $\Delta I = 0.1$ and quantize the probability within each segment to the value at the center point of the segment. When $x$ is beyond the range $[- 4, 4]$, we assign it a minimum probability. Note that the LUT must be updated for different noise variances. With a simple offset, we can obtain the PDF distributions for the other symbol levels.
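The LUT construction described above can be sketched as follows, using the example values from the text (zero mean, $\sigma^{2} = 0.2$, range $[-4, 4]$, $\Delta I = 0.1$). The function names are hypothetical, and using the smallest table entry as the out-of-range "minimum probability" is an assumption.

```python
import math

def build_pdf_lut(sigma2=0.2, lo=-4.0, hi=4.0, step=0.1):
    """Quantize a zero-mean Gaussian PDF at the center of each segment."""
    n_bins = round((hi - lo) / step)
    norm = 1.0 / (math.sqrt(2 * math.pi) * math.sqrt(sigma2))
    table = []
    for k in range(n_bins):
        center = lo + (k + 0.5) * step
        table.append(norm * math.exp(-center ** 2 / (2 * sigma2)))
    return table

def lut_pdf(table, x, lo=-4.0, hi=4.0, step=0.1):
    """Look up the quantized PDF; out-of-range inputs get the minimum entry."""
    if x < lo or x >= hi:
        return min(table)  # assumed choice of "minimum probability"
    k = min(int((x - lo) / step), len(table) - 1)
    return table[k]

if __name__ == "__main__":
    table = build_pdf_lut()
    y_bar, level, delta = 1.23, 1, 0.2
    # A simple offset reuses the zero-mean table for any level and offset.
    print(lut_pdf(table, y_bar - (level + delta)))
```

The table replaces each exponential evaluation with an index computation, at the cost of rebuilding the table whenever the noise variance changes.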

Figure 2. Quantized and continuous PDF curves.

DFE removes ISI by subtracting the interference caused by previously detected symbols. This operation feeds the previous decisions back into the decision of the current symbol, which causes a large delay and limits the implementation of high-speed DFEs. Loop unrolling can be used to relax the timing constraint resulting from the feedback loop in the DFE. In this Letter, we propose a simplified ET-DFE-PU architecture. Figures 3(a) and 3(b) show the block diagrams of the $N$-tap PAM-$M$ ET-DFE with loop unrolling (ET-DFE-LU) and the ET-DFE-PU, respectively. In Fig. 3(a), the PAM-$M$ ET-DFE-LU requires the precomputation of $M^{N}$ possible equalized values, whose complexity increases significantly with $M$ or $N$. In practice, symbols with lower probabilities can be neglected to avoid unnecessary calculations. In Fig. 3(b), we employ a threshold detector to make a predecision on the signal at the output of the FFE, which selects the most likely $P$ symbols (denoted as $G$) from the $M$ symbols, with $P < M$^[18]. Then, the number of possible results is reduced from $M^{N}$ to $P^{N}$. Moreover, the multiplexer in the ET-DFE-PU critical path is $P^{N}$-to-1, as opposed to the $M^{N}$-to-1 multiplexer of a conventional ET-DFE-LU. Therefore, the proposed ET-DFE-PU reduces the critical path delay, and the maximum achievable throughput of the DFE can be improved considerably, especially for higher-order PAM formats.
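The reduction from $M^{N}$ to $P^{N}$ candidates can be illustrated with a small sketch. The helper names, the nearest-level rule used for the predecision, and the numeric values are illustrative assumptions; the sketch only shows the bookkeeping and not a hardware architecture.

```python
from itertools import product

PAM4 = (-3, -1, 1, 3)

def likely_symbols(ffe_out, P=3, levels=PAM4):
    """Predecision: keep the P levels closest to the FFE output (set G)."""
    return sorted(sorted(levels, key=lambda a: abs(a - ffe_out))[:P])

def unrolled_candidates(y, taps, symbol_sets):
    """Precompute y - sum_k b_k d_{i-k} for every candidate past-decision tuple."""
    return {past: y - sum(b * d for b, d in zip(taps, past))
            for past in product(*symbol_sets)}

if __name__ == "__main__":
    taps = (0.5, 0.2)                    # illustrative 2-tap DFE
    past_ffe_outputs = (0.4, -2.2)       # FFE outputs at times i-1 and i-2
    full = unrolled_candidates(0.9, taps, [PAM4, PAM4])
    partial = unrolled_candidates(
        0.9, taps, [likely_symbols(v) for v in past_ffe_outputs])
    print(len(full), len(partial))       # M^N versus P^N entries
    # The multiplexer then picks the entry indexed by the actual past decisions.
    print(partial[(1, -1)])
```

For the five-tap PAM4 case with three candidates per slot, the same counting gives $3^{5} = 243$ precomputed values instead of $4^{5} = 1024$.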

Figure 3. Block diagram of N-tap PAM-M. (a) ET-DFE-LU, (b) ET-DFE-PU.

An $N$-tap conventional DFE requires $N$ multiplications. For the proposed LUT-based ET-DFE-PU, the complexity mainly comes from the recursive computation of the vector $P$ in Eq. (6), which multiplies a $1 \times 3$ vector by a $3 \times 3$ matrix and therefore requires nine multiplications. Moreover, the loop-unrolling scheme precomputes all possible output cases and selects the proper one according to the previous decision results, which does not require any multiplications^[19]. Therefore, the number of multiplications of the proposed ET-DFE-PU is nine.

3. Experimental Setup

Figure 4 shows the experimental setup of the PAM4 IM/DD system. At the transmitter, a pseudo-random bit sequence (PRBS) is encoded by an LDPC code with a block length of 64,800 bits and a code rate of 5/6. The encoded bit streams are mapped into PAM4 symbols with Gray coding and then shaped by a root-raised cosine (RRC) filter with a roll-off factor of 0.01. Then, the data signal is loaded into the arbitrary waveform generator (AWG) operating at 120 GSa/s. The generated PAM4 signal passes through an electrical amplifier (EA) that drives a 40-GHz Mach–Zehnder modulator (MZM). The laser is operated at 1550 nm with an output power of 12 dBm. After 5-km standard single-mode fiber (SSMF) transmission, we use a boost amplifier (BA) to amplify the optical signal, since the photodiode (PD) has no transimpedance amplifier (TIA). Then, a variable optical attenuator (VOA) is used to adjust the received optical power (ROP) of the TIA-free single-ended 40-GHz PD, which has a maximum optical input power of 10 dBm. Finally, the detected electrical signal is sampled by a 59-GHz real-time oscilloscope (RTO) operating at 256 GSa/s. Subsequently, the received signal is processed offline. The receiver digital signal processing (Rx DSP) includes resampling, matched filtering, equalization, PAM4 de-mapping, LDPC decoding, and BER calculation. We compare the BER between the conventional DFE and the proposed ET-DFE-PU.
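The rates implied by this setup can be worked out with simple arithmetic. The sketch below assumes that 170 Gb/s is the gross line rate including the LDPC overhead, which the text does not state explicitly; the variable names are illustrative.

```python
from fractions import Fraction

line_rate_gbps = Fraction(170)   # assumed to be the gross (coded) bit rate
bits_per_symbol = 2              # PAM4 carries log2(4) = 2 bits per symbol
code_rate = Fraction(5, 6)
block_length = 64800             # LDPC codeword length in bits

baud_gbd = line_rate_gbps / bits_per_symbol       # symbol rate in GBaud
awg_sps = Fraction(120) / baud_gbd                # AWG samples per symbol
rto_sps = Fraction(256) / baud_gbd                # RTO samples per symbol
info_bits = int(block_length * code_rate)         # information bits per codeword
symbols_per_codeword = block_length // bits_per_symbol
net_rate_gbps = line_rate_gbps * code_rate        # rate after removing overhead

if __name__ == "__main__":
    print("symbol rate (GBaud):", float(baud_gbd))
    print("AWG samples/symbol:", awg_sps, "RTO samples/symbol:", rto_sps)
    print("info bits/codeword:", info_bits)
    print("PAM4 symbols/codeword:", symbols_per_codeword)
    print("net rate (Gb/s):", round(float(net_rate_gbps), 2))
```

Both converters run at a non-integer number of samples per symbol under this assumption, which is consistent with the resampling step listed in the receiver DSP.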

Figure 4. Experimental setup of PAM4 IM/DD system.

4. Results and Discussion

We investigate the performance of the proposed LUT-based ET-DFE-PU in 170-Gb/s PAM4 signal transmission over a 5-km SSMF. The tap numbers of the FFE and DFE are set to 181 and 5, respectively. Figure 5(a) shows the post-FEC BER performance and the LUT size versus different intervals for 170-Gb/s PAM4 signal transmission at an ROP of 6 dBm. It can be seen that the post-FEC BER performance of the LUT-based ET-DFE-PU improves as the interval $\Delta I$ decreases, while the LUT size increases. When $\Delta I$ is set to 0.06, the BER performance saturates and approximates the performance of the ET-DFE-PU without the LUT. Figure 5(b) shows the pre-FEC BER performance of the 170-Gb/s PAM4 signal. For comparison, the conventional DFE, LUT-based ET-DFE-LU, and LUT-based ET-DFE-PU are employed after a 181-tap FFE for equalization. The inset of Fig. 5(b) shows the amplitude histogram of the PAM4 signal after the FFE at 10-dBm ROP. The decision result after the DFE may be one of three candidate symbols that are close to the output of the FFE. Therefore, we limit the set of the most likely symbols in each time slot from four to three in partial unrolling. Then, the number of possible results is reduced from $4^{5}$ to $3^{5}$. Moreover, the multiplexer in the critical path of the proposed ET-DFE-PU is reduced from $4^{5}$-to-1 to $3^{5}$-to-1. In Fig. 5(b), the pre-FEC BER is computed by making hard decisions on the LLRs. The LUT-based ET-DFE-PU achieves almost the same pre-FEC BER performance as the LUT-based ET-DFE-LU with lower implementation complexity. Moreover, the proposed LUT-based ET-DFE-PU outperforms the conventional DFE by a 0.26-dB gain in ROP at a BER of $2 \times 10^{-2}$ owing to the improved LLR accuracy. Figure 5(c) shows the post-FEC BER versus ROP curves of the conventional DFE, LUT-based ET-DFE-LU, and LUT-based ET-DFE-PU. The decoder performs belief propagation (BP) with 15 iterations. For SD-FEC, the quality of the LLRs strongly affects the decoding performance. The proposed ET-DFE-PU can track the DFE error propagation and introduce the error information into the LLR calculation. Therefore, the LUT-based ET-DFE-PU achieves a 3-dB receiver sensitivity improvement at a post-FEC BER of $5 \times 10^{-6}$ compared with the conventional DFE, at the cost of increasing the number of multiplications from five for the conventional DFE to nine. The results also show that the post-FEC BER performance of the proposed LUT-based ET-DFE-PU is similar to that of the LUT-based ET-DFE-LU. Figure 5(d) shows the LDPC performance for the conventional DFE and the LUT-based ET-DFE-PU. In Fig. 5(d), employing the LUT-based ET-DFE-PU improves the post-FEC BER by more than 3 orders of magnitude compared with the conventional DFE at the same pre-FEC BER.
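The pre-FEC BER measurement mentioned above, obtained by hard decisions on the LLRs, can be sketched as follows. The sign convention follows Eq. (11), where a positive LLR favors bit '0'; the function name, the tie-breaking rule at zero, and the sample values are illustrative assumptions.

```python
def pre_fec_ber(llrs, tx_bits):
    """Hard-decide each LLR (>= 0 maps to bit 0) and count bit errors."""
    if len(llrs) != len(tx_bits):
        raise ValueError("llrs and tx_bits must have the same length")
    decided = [0 if value >= 0 else 1 for value in llrs]
    errors = sum(1 for b, t in zip(decided, tx_bits) if b != t)
    return errors / len(tx_bits)

if __name__ == "__main__":
    llrs = [2.1, -0.3, 0.7, -4.0]   # illustrative LLR values
    tx_bits = [0, 1, 1, 1]          # illustrative transmitted bits
    print(pre_fec_ber(llrs, tx_bits))
```

Because the hard decision depends only on the sign of each LLR, improvements in LLR magnitude affect the post-FEC BER much more strongly than the pre-FEC BER.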

Figure 5. (a) Post-FEC BER performance and the LUT size versus different intervals; (b) pre-FEC BER performance versus different ROPs; (c) post-FEC BER performance versus different ROPs; (d) LDPC performance for conventional DFE and LUT-based ET-DFE-PU.

5. Conclusions

We have proposed a multi-tap LUT-based ET-DFE-PU technique for high-speed PAM4 IM/DD systems, which improves the accuracy of the LLRs and relaxes the feedback timing constraint of the DFE. Moreover, the computational complexity of the ET-DFE-PU is decreased with the help of the PDF LUT. The experimental results show that the proposed LUT-based ET-DFE-PU can effectively alleviate the SD-FEC decoding performance degradation resulting from error propagation. In a 170-Gb/s PAM4 IM/DD system, the LUT-based ET-DFE-PU achieves a 3-dB receiver sensitivity improvement at the same post-FEC BER compared with the conventional DFE.

Category: Fiber Optics and Optical Communications

Received: Mar. 30, 2024

Accepted: Aug. 5, 2024

Posted: Aug. 5, 2024

Published Online: Feb. 10, 2025

The Author Email: Shaohua Hu (sh.hu@uestc.edu.cn)

DOI:10.3788/COL202523.010605

CSTR:32184.14.COL202523.010605