GRU neural-network-assisted high-refractive-index sensing based on a no-core fiber with a waist-enlarged fusion taper structure

Shiwei Liu; Mengyuan Wu; Shuaihua Gao; Zhuang Li; Haoran Wang; Hongyan Fu

doi:10.1117/1.APN.4.4.046004

1 Introduction

The refractive index (RI) is a critical physical parameter that describes the inherent properties of a material. Different materials have different ranges of RI, so measuring the RI can generally distinguish among types of materials, and for known substances the RI can be used to determine purity. In food production, for example, RI measurements can monitor the concentrations of components such as sugars and salts, ensuring product quality.1–3 In environmental protection, measuring the RI of substances dissolved in water can help monitor pollutant concentrations and assess water quality, providing data support for environmental protection and water resource management.4 In oceanography, measuring the RI of seawater can reveal how it is affected by temperature, salinity, and pressure and can facilitate the study of marine ecosystems and the distribution of ocean resources.5 In edible oil production, organic solvents (e.g., hexane) are commonly used for fat extraction; measuring the high RI of the oil makes it possible to detect the residual solvent content effectively and thereby ensure compliance with safety standards.6

Optical fibers are widely used to measure parameters such as curvature,7 pH,8 concentration,9 and magnetic fields10 because of their corrosion resistance, low cost, and electrical insulation. Furthermore, because optical fibers are made of silica and are chemically inert toward the surrounding media, they are well suited to liquid RI sensing. Researchers have proposed various fiber-optic RI sensors. Li et al.11 connected C-type fibers to single-mode fibers (SMFs) to form a dual Fabry–Pérot (FP) interferometer for temperature-compensated RI measurement. Because the C-type fiber is an open-cavity fiber, the surrounding liquid can enter it, giving an RI sensitivity of up to 1704 nm/RIU; the sensor also has a temperature sensitivity of −0.196 nm/°C, which enables temperature-compensated RI sensing. Liu et al.12 bent SMFs into a balloon shape to form a modal interferometer for RI and temperature measurement. At a bending radius of 4 mm, the RI sensitivity reaches 225.95 nm/RIU over the RI range of 1.3493 to 1.3822, and the temperature sensitivity is 0.418 nm/°C. Madrigal et al.13 inscribed tilted gratings in multicore fibers, increasing the crosstalk between the cores and making them sensitive to RI. The sensitivity varies nonlinearly as the RI increases; for a tilt angle of 7 deg, it is −250.8 dB/RIU over the RI range of 1.39 to 1.44. Zhang et al.14 connected a tapered SMF in series with a multimode fiber to produce multimode interference; with a 2.2-mm-long multimode fiber, the maximum RI sensitivity is 1879.87 nm/RIU at an RI of 1.437. Salceda-Delgado et al.15 proposed etching large-diameter air gaps into the fiber to form a Mach–Zehnder interferometer (MZI) for RI sensing, dividing the measurement range into segments to allow linear fitting. With a 76-μm air gap, the sensitivity reaches 226.8 nm/RIU in the high-RI range of 1.43 to 1.45, which allows the sensor to detect small variations in sample composition more effectively.

In spectral data, the relationship between wavelength or intensity and the measured quantity is often nonlinear, which makes it difficult for traditional linear models to capture accurately. Deep learning, particularly neural network models, has strong nonlinear modeling capabilities and can extract hidden features from large datasets, leading to more accurate demodulation. Nguyen et al.16 used deep neural network models to analyze modal interference in multimode fibers, extracting the relationship between hidden spectral information and the measurement parameters, and successfully performed temperature sensing under external mechanical vibration. Because the model does not require shielding from external interference, it can extract the desired measurement values even under noisy conditions, demonstrating high robustness. Li et al.17 used convolutional neural networks (CNNs) to demodulate the speckle generated by multimode fibers for curvature sensing; after training on speckle images at different curvatures, they predicted the curvatures corresponding to 57 speckle patterns, with the curvature error within ±0.3 m⁻¹ for 94.7% of the patterns. Liu et al.18 combined tilted fiber Bragg gratings (TFBGs) in series with FP structures and applied CNN algorithms for dual-parameter demodulation of salinity and temperature; trained on 4344 spectra, the model reduced the average errors for salinity and temperature to 0.0207% and 0.11°C, respectively. Tan et al.19 employed CNN models for multidirectional curvature sensing with an MZI composed of multimode and dual-core fibers, extending the curvature sensing range to 10.4167 m⁻¹ and achieving sensing in 12 different directions.

The gated recurrent unit (GRU), systematically evaluated by Chung et al.,20 is a variant of recurrent neural networks (RNNs) designed to alleviate the vanishing- and exploding-gradient problems encountered in sequence processing. Whereas long short-term memory networks (LSTMs) use three gates, GRUs have only two, an update gate and a reset gate, which gives them a simpler structure. Consequently, GRUs require fewer computational and memory resources and typically train faster. GRUs were initially applied in fields such as text analysis,21 speech recognition,22 and weather forecasting23 and in recent years have gradually been adopted in optical sensing. Manie et al.24 applied a GRU neural network to a wavelength-division-multiplexed sensing network and validated it through strain sensing with fiber Bragg gratings (FBGs); even with minimal intensity differences, the GRU algorithm improved measurement accuracy, increased multiplexing capability, and reduced computation time. Lu et al.25 applied the GRU algorithm to signal localization, successfully predicting the positions of three types of interference signals. Gao et al.26 integrated generative adversarial networks with GRUs to fill in and repair vital-sign signals in fiber sensing, demonstrating high robustness.

In this paper, we propose a high-RI sensing scheme based on a no-core fiber (NCF) with a waist-enlarged fusion taper (WEFT) structure, in which a neural network model demodulates the spectrum. When the RI of the measured liquid approaches that of the NCF, the resonance interference wavelength becomes highly sensitive to the RI. If the sensor is demodulated by tracking a resonance interference dip in the spectrum, the nonlinear response of the dip leads to significant errors in high-RI measurement. By using a GRU neural network model to assist the demodulation of the high RI from the spectrum, we achieve a coefficient of determination ($R^{2}$) of 0.9993. The combination of fiber RI sensing and the GRU model offers a new reference method for applying neural networks to fiber-sensing demodulation, highlighting the significant potential of neural networks in optics.

2 Sensing Principle

The WEFT structure based on the NCF is an up-taper structure, as illustrated in Fig. 1. NCF (CLF, YOFC) is a type of fiber that lacks a conventional core region and typically consists solely of a cladding with a diameter of 125 μm. The core and cladding diameters of the SMF (G.652.D, YOFC) are 9 and 125 μm, respectively. We connect the NCF in series with an SMF and use an arc discharge in an optical fiber fusion splicer (66S, Fujikura, Koto, Japan) to create the WEFT structure;27 the fusion splicing parameters are listed in Table 1.

Figure 1. (a) WEFT structure based on the NCF and (b) its microscope photograph.

Table 1. Fusion splicing parameters of WEFT structure based on NCF.

Parameter	Setting
Prefusion time	180 ms
Prefusion power	Standard
Distance	10 μm
Overlap	150 μm
Discharge time	2000 ms
Discharge intensity	Standard

When light reaches the first WEFT structure, the structure acts as a beam splitter, causing part of the fundamental-mode energy to leak into the NCF. Because higher-order modes carry a larger share of their energy in the NCF, the light leaking into the NCF exists primarily as higher-order modes. These higher-order modes propagate within the NCF, where they are influenced by the surrounding RI. At the second WEFT structure, which serves as a coupler, part of the higher-order-mode energy in the NCF is coupled back into the core, so the core then carries both the fundamental mode and the light coupled back from the higher-order modes. Because the fundamental and higher-order modes have different propagation constants, they accumulate a phase difference as they propagate. When the two modes meet in the fiber after the second WEFT structure, modal interference occurs, and the interference intensity can be expressed as

$I = I_{i} + I_{j} + 2 \sqrt{I_{i} I_{j}} \cos \phi,$ (1)

where $I_{i}$ and $I_{j}$ are the intensities of the $i$th and $j$th modes of the WEFT structure, respectively, and $\phi$ is the phase difference between the two modes, which is determined by the difference between their effective RIs:

$\phi = \frac{2 \pi (n_{eff}^{i} - n_{eff}^{j}) L}{\lambda},$ (2)

where $L$ is the effective length of the sensing region of the WEFT structure, $\lambda$ is the working wavelength, and $n_{eff}^{i}$ and $n_{eff}^{j}$ are the effective RIs of the $i$th and $j$th modes, respectively. When $\phi = (2m + 1) \pi$ with $m = 1, 2, 3, \dots$, the interference dips occur at the resonance wavelengths

$\lambda_{dip} = \frac{2 (n_{eff}^{i} - n_{eff}^{j}) L}{2m + 1}.$ (3)
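
Equations (1) to (3) can be explored numerically. The Python sketch below is illustrative only: the mode intensities `i_i` and `i_j`, the effective-index difference `delta_n_eff`, and the sensing length `length_um` are hypothetical values that the text does not report, and the function names are our own. Following Eq. (3), the dip order starts at m = 1.

```python
import math

def two_mode_intensity(wavelength_nm, delta_n_eff, length_um, i_i=0.6, i_j=0.4):
    """Eqs. (1)-(2): two-mode interference intensity at one wavelength.

    i_i and i_j are hypothetical mode intensities (arbitrary units).
    """
    # Phase difference of Eq. (2), with L converted from um to nm to match lambda
    phi = 2 * math.pi * delta_n_eff * (length_um * 1e3) / wavelength_nm
    return i_i + i_j + 2 * math.sqrt(i_i * i_j) * math.cos(phi)

def dip_wavelengths(delta_n_eff, length_um, band_nm=(1450.0, 1650.0)):
    """Eq. (3): lambda_dip = 2*dn*L/(2m+1) for m = 1, 2, 3, ..., kept if inside band_nm."""
    opd_nm = delta_n_eff * length_um * 1e3  # optical path difference dn*L in nm
    dips, m = [], 1
    while True:
        lam = 2 * opd_nm / (2 * m + 1)
        if lam < band_nm[0]:  # dips move to shorter wavelengths as m grows
            break
        if lam <= band_nm[1]:
            dips.append(lam)
        m += 1
    return dips

# Hypothetical example: dn_eff = 0.01 over a 30-mm sensing length
dips = dip_wavelengths(0.01, 30_000)
print(len(dips), [round(d, 2) for d in dips[:3]])
print(two_mode_intensity(dips[0], 0.01, 30_000))  # intensity at a dip, Eq. (1)
```

At every dip cos ϕ = −1, so the intensity falls to $(\sqrt{I_i} - \sqrt{I_j})^2$; for these assumed values the dips are spaced by roughly $\lambda^2 / [(n_{eff}^{i} - n_{eff}^{j}) L] \approx 8$ nm near 1550 nm.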

In principle, the modes propagating in the NCF are sensitive to the surrounding RI. Compared with low RIs, higher RIs, particularly those near the RI of the NCF (1.444 at 1550 nm), cause a significant amount of light to leak into the surrounding liquid, decreasing the intensity at the resonance dip wavelength. We simulate the optical field distribution for surrounding RIs of 1.430, 1.444, and 1.450; the results are shown in Fig. 2. At the lower surrounding RI, most of the light is transmitted within the NCF. When the surrounding RI matches that of the fiber, a significant amount of light leaks from the NCF into the surrounding liquid. When the surrounding RI exceeds that of the NCF, part of the light remains in the NCF because of Fresnel reflection, while the rest leaks into the surrounding medium.
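
The three regimes in Fig. 2 can be rationalized with a simplified ray-optics picture, which is not the field simulation used for Fig. 2. The sketch below treats a high-order mode as a ray striking a planar NCF/liquid boundary and computes the s-polarized Fresnel power reflectance; the 85° angle of incidence (measured from the surface normal) is an arbitrary assumption standing in for a grazing high-order ray, and the function names are hypothetical.

```python
import cmath
import math

N_NCF = 1.444  # RI of the NCF at 1550 nm, as stated in the text

def critical_angle_deg(n_fiber, n_liquid):
    """Incidence angle above which total internal reflection occurs; None if no TIR is possible."""
    if n_liquid >= n_fiber:
        return None
    return math.degrees(math.asin(n_liquid / n_fiber))

def reflectance_s(n_fiber, n_liquid, theta_deg):
    """Fresnel power reflectance |r_s|^2 at a planar fiber/liquid boundary (s-polarization)."""
    th = math.radians(theta_deg)
    cos_i = math.cos(th)
    sin_t = n_fiber * math.sin(th) / n_liquid   # Snell's law
    cos_t = cmath.sqrt(1 - sin_t ** 2)          # purely imaginary under TIR
    r = (n_fiber * cos_i - n_liquid * cos_t) / (n_fiber * cos_i + n_liquid * cos_t)
    return abs(r) ** 2

for n_s in (1.430, 1.444, 1.450):   # surrounding RIs simulated in Fig. 2
    print(n_s, critical_angle_deg(N_NCF, n_s), round(reflectance_s(N_NCF, n_s, 85.0), 4))
```

For 1.430 the ray is totally reflected (R = 1), at index matching R = 0 so the light escapes, and for 1.450 only a small fraction is reflected at each bounce, consistent with the partial confinement by Fresnel reflection described above.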

Figure 2. Optical field distributions of the WEFT structure based on the NCF with surrounding RIs of 1.430, 1.444, and 1.450, respectively.

3 Experiments and Discussion

3.1 Experimental Setup

Figure 3 shows a schematic of the proposed RI sensing scheme using the NCF with a WEFT structure. The fiber on both sides of the WEFT structure is secured by fiber clamps, and the two fiber ends are connected to an optical spectrum analyzer (OSA, HP70004A) and a broadband light source (BBS, QPHOTONICS, Ann Arbor, Michigan, United States). The OSA has a resolution of 80 pm and records 800 data points per sweep. The BBS emits light in the wavelength range of 1450 to 1650 nm. In the experiment, we dissolve different masses of $\mathrm{MnCl_{2} \cdot 2H_{2}O}$ (analytical reagent, Xilong, Shantou, China) in deionized water to prepare solutions of different concentrations. At room temperature (~24°C), the RI of the saturated $\mathrm{MnCl_{2}}$ solution is measured to be 1.4505, so the RI of the $\mathrm{MnCl_{2}}$ solutions spans 1.4300 to 1.4505. The RI values are measured with an Abbe refractometer (WYA, INESA, Shanghai, China), which has a measurement range of 1.3000 to 1.7000 and an accuracy of ±0.0002, and each measurement is calibrated three times. To measure the RI of a liquid, the fiber taper structure is simply placed in a V-shaped groove and the $\mathrm{MnCl_{2}}$ solution is added. To prepare the $\mathrm{MnCl_{2}}$ RI solutions, we first prepare a saturated solution and then dilute it by gradually adding deionized water, which allows us to accurately prepare 30 solutions with different RIs within the range of 1.4300 to 1.4505. For each RI, we record 10 transmission spectra, giving a total of 300 spectra.

Figure 3. Experimental setup.

3.2 Wavelength Interference Dip Demodulation

We first use the WEFT structure to measure the RI solutions, increasing the RI stepwise from 1.4300 to 1.4505. Figure 4(a) shows the transmission spectra measured at different RIs; the resonance interference dip exhibits a significant red shift. To better illustrate how the spectrum shifts with RI, we also plot the two-dimensional spectral response in Fig. 4(b), which displays the full spectral evolution over the RI range and is convenient for dynamic monitoring and detailed analysis. As Fig. 4(b) shows, the dip wavelength varies nonlinearly with RI. This occurs because, as the RI of the surrounding medium approaches that of the fiber, most of the light leaks into the surrounding medium; the transmitted light then becomes highly sensitive to small changes in the surrounding RI, producing a nonlinear response.
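
Tracking the dip in Fig. 4 requires locating its minimum in each spectrum. A common approach, assumed here rather than taken from the text, is to take the lowest sample and refine it with a three-point parabolic fit; because several dips fall within the band, an optional wavelength window restricts the search to the dip being tracked. The function name, the uniform wavelength grid, and the synthetic spectrum are hypothetical.

```python
def find_dip(wavelengths, power, window=None):
    """Return the dip wavelength: lowest sample refined by a 3-point parabola.

    Assumes a uniform wavelength grid; window = (lo_nm, hi_nm) limits the
    search to one dip when several dips fall within the band.
    """
    candidates = range(len(power))
    if window is not None:
        lo, hi = window
        candidates = [i for i in candidates if lo <= wavelengths[i] <= hi]
    i = min(candidates, key=lambda k: power[k])
    if i == 0 or i == len(power) - 1:
        return wavelengths[i]                 # no neighbors on both sides
    y0, y1, y2 = power[i - 1], power[i], power[i + 1]
    curvature = y0 - 2 * y1 + y2
    if curvature <= 0:                        # flat or not a local minimum
        return wavelengths[i]
    step = wavelengths[i + 1] - wavelengths[i]
    # Vertex of the parabola through (-1, y0), (0, y1), (1, y2), scaled by the grid step
    return wavelengths[i] + 0.5 * (y0 - y2) / curvature * step

# Synthetic spectrum (hypothetical): two parabolic dips on an 800-point grid
wl = [1450 + k * 200 / 799 for k in range(800)]
spectrum = [min((w - 1500.1) ** 2, (w - 1600.2) ** 2 + 1.0) for w in wl]
print(find_dip(wl, spectrum), find_dip(wl, spectrum, window=(1580, 1620)))
```

If the 800 OSA points span the full 1450 to 1650 nm band (an assumption), the grid step is about 0.25 nm, so sub-sample refinement can matter when tracking shifts of a fraction of a nanometer.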

Figure 4. (a) Spectral response at different RIs and (b) two-dimensional spectral response.

Figure 5 shows the polynomial fit of the interference dip wavelength versus RI, with a coefficient of determination ($R^{2}$) of 0.981. The sensitivities near RIs of 1.444 and 1.4505 are 1389.1 and 1875 nm/RIU, respectively. When the RI is inferred by inverting this fit from the measured dip wavelengths, the mean absolute error (MAE) of the dip wavelength is 0.26 nm. We attribute this error to the nonlinear fit: a higher-order polynomial has more parameters, which increases the complexity of the fit and can make the model less stable, leading to a larger MAE.
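
The dip-tracking demodulation of Fig. 5 (fit the dip wavelength as a polynomial in RI, differentiate it to obtain the sensitivity, then invert measured dips back to RI) can be sketched as follows. The text does not state the polynomial order, so a quadratic is assumed; the calibration curve is synthetic, not the measured data; and the RI axis is rescaled to [−1, 1] before fitting to keep the normal equations well conditioned. All names are hypothetical.

```python
def fit_quadratic(n_values, dips_nm):
    """Least-squares quadratic dip(n), fitted on u = (n - n_mid) / half in [-1, 1].

    The quadratic order is an assumption; the text only says "polynomial".
    """
    n_mid = (max(n_values) + min(n_values)) / 2
    half = (max(n_values) - min(n_values)) / 2
    u = [(n - n_mid) / half for n in n_values]
    # Normal equations (A^T A) c = A^T y for design-matrix columns [1, u, u^2]
    s = [sum(ui ** k for ui in u) for k in range(5)]
    t = [sum(yi * ui ** k for ui, yi in zip(u, dips_nm)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gaussian elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, 3):
            f = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= f * m[col][k]
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        c[i] = (m[i][3] - sum(m[i][j] * c[j] for j in range(i + 1, 3))) / m[i][i]
    return {"c": c, "n_mid": n_mid, "half": half}

def predict_dip(model, n):
    u = (n - model["n_mid"]) / model["half"]
    c = model["c"]
    return c[0] + c[1] * u + c[2] * u * u

def sensitivity(model, n):
    """d(dip)/dn in nm/RIU (chain rule through the rescaling)."""
    u = (n - model["n_mid"]) / model["half"]
    c = model["c"]
    return (c[1] + 2 * c[2] * u) / model["half"]

def invert(model, dip_nm, iters=80):
    """RI whose fitted dip equals dip_nm, found by bisection over the calibrated range.

    Assumes the fitted curve increases monotonically (red shift) over that range.
    """
    lo = model["n_mid"] - model["half"]
    hi = model["n_mid"] + model["half"]
    for _ in range(iters):
        mid = (lo + hi) / 2
        if predict_dip(model, mid) < dip_nm:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def true_dip(n):
    """Synthetic calibration curve (hypothetical numbers, not the measured data)."""
    return 1550.0 + 1000.0 * (n - 1.43) + 20000.0 * (n - 1.43) ** 2

n_cal = [1.4300 + k * (1.4505 - 1.4300) / 29 for k in range(30)]
model = fit_quadratic(n_cal, [true_dip(n) for n in n_cal])
print(sensitivity(model, 1.4505), invert(model, true_dip(1.44)))
```

With real data, the MAE quoted above would be obtained by inverting each measured dip and comparing the result with the refractometer reading.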

Figure 5. Polynomial fit of the resonance dip wavelength versus RI.

3.3 GRU Neural Network Demodulation Scheme

In high RI measurements, the variation of spectral interference dip exhibits nonlinear characteristics, and traditional nonlinear demodulation methods may fail to accurately capture these complex relationships, resulting in significant measurement errors and diminished accuracy. Neural networks, particularly deep learning techniques, can leverage multilayer neural network structures to capture higher-order spectral features, rendering them effective tools for addressing the aforementioned issues.

Conventional feedforward neural networks struggle to capture temporal information in sequence prediction tasks. RNNs are commonly used to address this issue, but they suffer from vanishing and exploding gradients when learning long sequences. GRUs were introduced as an improved RNN architecture to overcome these problems and handle long sequences more effectively. GRUs control the flow of information through update and reset gates, retaining important information while discarding unimportant details. Figure 6 illustrates the structure of a single GRU cell. Compared with the LSTM, the GRU has a simpler structure, fewer parameters, and higher training efficiency, and it can dynamically adjust the content and duration of its memory to suit different sequence characteristics. The GRU consists of the following components.

Figure 6. Structure of a single GRU cell.

Update gate $z_{t}$: $z_{t}$ controls how the new information $x_{t}$ is combined with the previous state $h_{t-1}$. According to Eq. (7), if $z_{t}$ is close to 0, the model hardly updates its state at the current step and relies mainly on the old information; if $z_{t}$ approaches 1, the model replaces the old state with the new candidate state:

$z_{t} = \sigma (W_{z} \cdot [h_{t-1}, x_{t}]),$ (4)

where $x_{t}$ is the input at step $t$, here the measured spectral signal used to infer changes in the surrounding RI; $h_{t-1}$ is the hidden state from the previous step, which carries the RI-related information accumulated so far; $W_{z}$ is the weight matrix of the update gate; and $\sigma$ is the sigmoid function.

Reset gate $r_{t}$: $r_{t}$ determines how much past information is used when computing the candidate state. If $r_{t}$ is low, the model disregards previous measurement data, whereas if it is high, the model relies more on historical information:

$r_{t} = \sigma (W_{r} \cdot [h_{t-1}, x_{t}]).$ (5)

Candidate state $\hat{h}_{t}$: through the reset gate $r_{t}$, the candidate state integrates new input information with historical data, resulting in more accurate RI predictions. Here, tanh is the hyperbolic tangent function and $*$ denotes element-wise multiplication:

$\hat{h}_{t} = \tanh (W_{h} \cdot [r_{t} * h_{t-1}, x_{t}]).$ (6)

Current state $h_{t}$: the model's real-time prediction of the RI. By integrating the current input with historical information, the current state allows the model to adapt quickly to changes in the RI:

$h_{t} = (1 - z_{t}) * h_{t-1} + z_{t} * \hat{h}_{t}.$ (7)
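
Equations (4) to (7) translate directly into code. The dependency-free Python sketch below illustrates one GRU cell; it is not the MATLAB model used in this work. Biases are omitted as in Eqs. (4) to (6), and the sizes, random weights, and helper names (`gru_step`, `gru_run`) are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def gru_step(x_t, h_prev, w_z, w_r, w_h):
    """One GRU step following Eqs. (4)-(7); biases omitted as in those equations.

    Each weight matrix has shape hidden_size x (hidden_size + input_size)
    and acts on the concatenation [h_{t-1}, x_t].
    """
    concat = h_prev + x_t                                 # [h_{t-1}, x_t]
    z = [sigmoid(a) for a in matvec(w_z, concat)]         # Eq. (4), update gate
    r = [sigmoid(a) for a in matvec(w_r, concat)]         # Eq. (5), reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t  # [r_t * h_{t-1}, x_t]
    h_hat = [math.tanh(a) for a in matvec(w_h, gated)]    # Eq. (6), candidate state
    return [(1 - zi) * hp + zi * hc                       # Eq. (7), new state
            for zi, hp, hc in zip(z, h_prev, h_hat)]

def gru_run(sequence, hidden_size, w_z, w_r, w_h):
    """Run the cell over a sequence of input vectors, starting from h_0 = 0."""
    h = [0.0] * hidden_size
    for x_t in sequence:
        h = gru_step(x_t, h, w_z, w_r, w_h)
    return h

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes and random weights, for illustration only
rng = random.Random(0)
hidden, n_in = 4, 3
w_z, w_r, w_h = (random_matrix(hidden, hidden + n_in, rng) for _ in range(3))
seq = [[rng.random() for _ in range(n_in)] for _ in range(5)]
print(gru_run(seq, hidden, w_z, w_r, w_h))
```

A convenient sanity check: with all weights set to zero, $z_t = r_t = 0.5$ and the candidate state is 0, so by Eq. (7) the cell simply halves the previous state.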

Before training the GRU model, we first clean the spectral data to improve data quality. We then normalize the data to stabilize training, because a large data range can slow convergence. Next, we use 70% of the data for training and the remaining 30% for testing to assess the model's generalization ability. The hyperparameters are configured as follows: a learning rate of 0.001, a batch size of 80, and 1200 training epochs, with Adam as the optimizer to speed up convergence. Figure 7 shows the architecture of the GRU model that maps the measured spectrum to the RI, and Table 2 details the GRU network configuration. The complete model comprises nine layers with a total of 202,739 learnable parameters. The GRU model is trained in MATLAB R2024a on a computer with an Intel i9-13900HX CPU and a GeForce RTX 4060 GPU. We use the MAE as a performance metric for the model:

$MAE = \frac{1}{n} \sum_{i=1}^{n} | y_{i} - \hat{y}_{i} |,$ (8)

where $n$ is the total number of RI samples, $y_{i}$ is the true RI for the $i$th spectrum, and $\hat{y}_{i}$ is the predicted RI for the $i$th spectrum.
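
The preprocessing steps can be sketched as below. The text does not specify the normalization method or how the split is drawn, so this sketch assumes per-wavelength min-max scaling with statistics computed on the training set only and a random 70/30 split with a fixed seed; the spectra are random stand-ins, and all names are hypothetical.

```python
import math
import random

def split_indices(n_samples, train_frac=0.7, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(train_frac * n_samples)
    return idx[:n_train], idx[n_train:]

def fit_min_max(rows):
    """Per-wavelength-point minimum and maximum over the given rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_min_max(rows, lo, hi):
    """Scale each feature with the given lo/hi; constant features map to 0."""
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)]
            for row in rows]

# Hypothetical stand-in for the 300 measured spectra of 800 points each
rng = random.Random(1)
spectra = [[rng.gauss(-20.0, 3.0) for _ in range(800)] for _ in range(300)]
train_idx, test_idx = split_indices(len(spectra))
lo, hi = fit_min_max([spectra[i] for i in train_idx])  # training statistics only
x_train = apply_min_max([spectra[i] for i in train_idx], lo, hi)
x_test = apply_min_max([spectra[i] for i in test_idx], lo, hi)
batches_per_epoch = math.ceil(len(train_idx) / 80)     # batch size 80, as in the text
print(len(train_idx), len(test_idx), batches_per_epoch)
```

Fitting the scaling on the training set alone keeps test information out of training; as a result, test values may fall slightly outside [0, 1]. With 210 training spectra and a batch size of 80, each epoch consists of three mini-batches.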

Figure 7. Schematic of the GRU model architecture used in this paper.

Table 2. GRU network model parameters.

Layer	Layer name	Output shape	Parameters
1	Input	(800, 1, 1)	—
2	Flatten	(800, 1)	—
3	GRU	(64, 1, 1)	166,080
4	GRU	(64, 1, 1)	24,768
5	GRU	(32, 1, 1)	9312
6	GRU	(16, 1, 1)	2322
7	GRU	(4, 1)	252
8	FC (fully connected)	(None, 1)	5
9	Output	(None, 1)	—
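
The parameter counts in Table 2 can be cross-checked against the standard GRU parameter formula. The sketch below assumes a common GRU parameterization with three gate blocks, each having input weights (H × D), recurrent weights (H × H), and one bias vector (H), so that a layer has 3H(D + H) + 3H parameters; it also assumes that layer 3 receives each 800-point spectrum as an 800-dimensional input. Under these assumptions the formula reproduces the listed counts for layers 3, 4, 5, 7, and 8, whereas for layer 6 (D = 32, H = 16) it gives 2352 rather than 2322.

```python
def gru_params(input_size, hidden_size, gates=3):
    """Learnable parameters of one GRU layer: input weights, recurrent weights, one bias per gate block."""
    return gates * hidden_size * (input_size + hidden_size) + gates * hidden_size

def fc_params(input_size, output_size):
    """Weights plus biases of a fully connected layer."""
    return input_size * output_size + output_size

# (input size D, hidden size H) for layers 3-7, read from the output shapes in Table 2
gru_layers = [(800, 64), (64, 64), (64, 32), (32, 16), (16, 4)]
counts = [gru_params(d, h) for d, h in gru_layers] + [fc_params(4, 1)]
print(counts, sum(counts))
```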

The coefficient of determination ($R^{2}$) is an important metric for evaluating prediction performance and is defined as

$R^{2} = 1 - \frac{\sum_{i} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i} (y_{i} - \bar{y})^{2}},$ (9)

where $\bar{y}$ is the mean of the true RIs. An $R^{2}$ value close to 1 indicates that the model accounts well for the variations in the RI and that the predicted results closely match the measured values.
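
The evaluation metrics used in this section and in Table 3 are short enough to write out. The Python sketch below implements Eqs. (8) and (9) together with the MSE and RMSE; the example RI values are hypothetical and do not come from the experiment.

```python
import math

def mae(y_true, y_pred):
    """Eq. (8): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error, the metric quoted for Ref. 28 and this work in Table 3."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-mean-square error, as in Fig. 8(b)."""
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """Eq. (9): coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical true and predicted RIs, for illustration only
y_true = [1.4300, 1.4400, 1.4505]
y_pred = [1.4301, 1.4398, 1.4505]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

As a consistency check, the test-set MSE of $2.24 \times 10^{-8}$ in Table 3 corresponds to an RMSE of about $1.5 \times 10^{-4}$ RIU (assuming the MSE is expressed in RIU²), which agrees with the reported MAE of $1.1 \times 10^{-4}$ RIU, because the RMSE can never be smaller than the MAE.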

Figure 8 shows the loss and root-mean-square error (RMSE) curves of the GRU model on the training set. In Fig. 8(a), during the first 500 epochs, the randomly initialized weights have not yet been adequately trained, so the model cannot accurately predict the measured RI values and the training loss is high. As the GRU model continues to learn and adjust its weights, the loss gradually decreases; after 1500 epochs, the training loss drops to $2.7 \times 10^{-5}$. Figure 8(b) shows that during the first 500 epochs the model cannot yet extract the hidden features from the spectral data, resulting in large fluctuations in the RMSE. As training progresses toward 1500 epochs, the model gradually adapts to the data and the RMSE decreases steadily, ultimately reaching 0.009. The total training time for the spectral samples is 27 s.

Figure 8. Training curves of the GRU model for the proposed high-RI sensor based on the NCF with WEFT structure: (a) loss curve and (b) RMSE curve.

Figures 9(a) and 9(b) show the RI predictions for the training and test sets, respectively. The predicted RIs closely match the true RIs and lie along the line $y = x$. For the training set, $R^{2}$ is 0.99925 and the MAE is 0.000124 RIU; for the test set, $R^{2}$ is 0.99926 and the MAE is 0.00011 RIU.

Figure 9. Comparison between true and predicted RIs for the GRU-model-assisted high-RI sensor based on the NCF with WEFT structure for the (a) training set and (b) test set.

To assess the predictive capability of the proposed GRU model, we analyze the errors on the test set and fit the error histogram with a Gaussian distribution to examine how the errors are distributed. As shown in Fig. 10, 95% of the errors are below $3.1 \times 10^{-4}$ RIU. The Gaussian fit matches the error histogram well, indicating that the prediction errors are concentrated around zero and that the model predicts the target values accurately in most cases. Only a small fraction of the predicted RI values deviate significantly from the actual values, which we attribute mainly to RI instability of the $\mathrm{MnCl_{2}}$ solutions at the highest RIs.
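
The error analysis can be sketched as follows. Instead of a least-squares fit of a Gaussian curve to histogram bins, which the text appears to use, this sketch takes the maximum-likelihood Gaussian parameters (sample mean and population standard deviation) and computes the absolute-error bound covering 95% of the samples; the residuals are hypothetical, and the function names are our own.

```python
import math
import statistics

def fit_gaussian(errors):
    """Maximum-likelihood Gaussian parameters: mean and population standard deviation."""
    return statistics.fmean(errors), statistics.pstdev(errors)

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density, e.g., to overlay on a normalized error histogram."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def abs_error_bound(errors, q=0.95):
    """Smallest |error| such that at least a fraction q of the samples lie at or below it."""
    s = sorted(abs(e) for e in errors)
    k = max(0, math.ceil(q * len(s)) - 1)
    return s[k]

# Hypothetical residuals (predicted RI minus true RI), not measured data
errors = [-2e-4, -1e-4, 0.0, 1e-4, 2e-4]
mu, sigma = fit_gaussian(errors)
print(mu, sigma, abs_error_bound(errors))
```

For errors that really are Gaussian with zero mean, about 95% of them lie within ±1.96σ, so the 95% bound and the fitted σ offer two views of the same spread.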

Figure 10. Histogram of the errors between true and predicted RIs for the test set.

3.4 Comparison of RI Demodulation Schemes

We compare the proposed neural network model with RI demodulation schemes reported by other researchers (Table 3). Compared with Ref. 28, our mean squared error (MSE) is lower; because the MSE weights large errors heavily, this indicates that the proposed GRU model handles extreme values more effectively and avoids large errors. Compared with Refs. 18 and 30, our MAE is lower; a smaller MAE indicates a lower average prediction error across all RIs, reflecting more stable overall predictive performance.

Table 3. Comparison of RI demodulation schemes.

Structure	Demodulation technique	Performance	$R^{2}$	Ref.
TFBG	Residual CNN	$2.82 \times 10^{-7}$ (MSE)	0.9982	28
TFBG-assisted SPR	Principal component analysis (PCA)	$9 \times 10^{-6}$ (accuracy)	—	29
TFBG with FP	Double-branch CNN	0.0003174 (MAE)	0.9998	18
SMF-NCF-SMF	Artificial neural network (ANN)	0.000808 (MAE)	0.9957	30
NCF with WEFT	Wavelength drift	—	0.981	This work
NCF with WEFT	GRU	0.00011 (MAE); $2.24 \times 10^{-8}$ (MSE)	0.9993	This work

4 Conclusion

We propose a scheme that uses a GRU neural network to assist high-RI sensing demodulation and validate it experimentally with an NCF with a WEFT structure. The NCF is connected in series with SMFs, and an arc discharge in a fiber fusion splicer forms the WEFT structure, which induces modal interference. In the traditional interference-dip tracking scheme, the nonlinear response of the dip to RI reduces the accuracy of RI measurements. We employ the GRU neural network to extract features from the spectral information, significantly improving accuracy through the learning capability of the multilayer network. The experimental results show an $R^{2}$ of 0.9993 and an MAE of 0.00011 RIU, confirming the strong performance of the GRU model in assisting high-RI sensing demodulation. The success of this method highlights the significant application potential of neural-network demodulation in optical sensing.

Shiwei Liu obtained his master’s degree from the School of Electrical and Information Engineering at Northeast Petroleum University, Daqing, China, in 2022. He is currently pursuing a PhD in the Department of Electronic Engineering at Xiamen University, Xiamen, China. His research interests include optical fiber sensing and neural network data processing.

Mengyuan Wu is currently pursuing a PhD in the Department of Electronic Engineering, Xiamen University, Xiamen, China. Her research interests include optical fiber sensors and their applications.

Shuaihua Gao is currently working toward an MS degree in the Department of Electronic Engineering, Xiamen University, Xiamen, China. His research interests include the development of fiber-optic Mach–Zehnder interferometers and the application of optical fibers in gas sensing.

Zhuang Li is pursuing a PhD in integrated circuit science and engineering at the School of Electronic Science and Technology, Xiamen University, Xiamen, China. His research interests include fiber and solid-state lasers and laser spectroscopy.

Haoran Wang received his PhD from Xiamen University, Xiamen, China, in 2024. He is currently an associate professor in the School of Ocean Information Engineering, Jimei University, Xiamen, China. His research interests include fiber-optic sensors, photonic sensors, and energy storage batteries.

Hongyan Fu received her PhD in optical engineering from Zhejiang University, Hangzhou, China, in 2010. She is currently a professor in the Department of Electronic Engineering, Xiamen University, Xiamen, China. Her research interests include fiber-optic sensors, fiber-based devices, and microwave photonics.

Category: Research Articles

Received: Jan. 21, 2025

Accepted: May 12, 2025

Published Online: Jun. 13, 2025

Author emails: Haoran Wang (wanghaoran@jmu.edu.cn), Hongyan Fu (fuhongyan@xmu.edu.cn)

CSTR:32397.14.1.APN.4.4.046004
