Optical wavelength meter with machine learning enhanced precision

Gazi Mahamud Hasan; Mehedi Hasan; Peng Liu; Mohammad Rad; Eric Bernier; Trevor James Hall

doi:10.1364/PRJ.473686

1. INTRODUCTION

Interferometric methods are extensively used in diverse applications, including inter alia optical communications coherent receiver front ends [1] and spectral monitoring [2]; Bragg grating sensor interrogation [3]; laser intensity and frequency fluctuation metrology [4,5]; and fiber optic sensing of temperature [6], pressure [7], refractive index [8], and strain [9]. Common interferometric architectures involving a Mach–Zehnder interferometer [10], Michelson interferometer [5,11], or Fabry–Perot interferometer [12] are employed to convert a phase shift provided by some sensing means to a measurable change in light intensity. Mach–Zehnder interferometers (MZIs) formed by circuits of planar waveguides and couplers are particularly attractive for photonic integration.

The fundamental principle of a wavelength meter or frequency discriminator is the use of an interferometer to measure the phase difference between the original signal and a delayed replica of the signal. The delay converts a change of frequency to a change of relative phase between the original and replica signals. The conventional MZI structure has a co-sinusoidal response to the relative phase; hence, the sensitivity to small frequency deviations varies over its period. In frequency discriminator applications, it is necessary to maintain a quadrature phase bias to maximize sensitivity and, in wavelength meter applications, the loss of precision at null and peak bias points is a concern.
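The loss of sensitivity at the null and peak bias points can be illustrated with a short numerical sketch. It assumes an ideal, lossless MZI with a raised-cosine response and a hypothetical 20 ps delay (50 GHz period); the function names and values are illustrative and are not taken from the paper.

```python
import numpy as np

def mzi_output(freq_hz, delay_s, power=1.0):
    """One output port of an ideal MZI versus optical frequency offset."""
    theta = 2.0 * np.pi * freq_hz * delay_s
    return 0.5 * power * (1.0 + np.cos(theta))

def sensitivity(freq_hz, delay_s, power=1.0):
    """Derivative of the port power with respect to frequency."""
    theta = 2.0 * np.pi * freq_hz * delay_s
    return -0.5 * power * 2.0 * np.pi * delay_s * np.sin(theta)

delay = 20e-12                 # assumed delay imbalance (hypothetical value)
fsr = 1.0 / delay              # fringe period in frequency

s_peak = sensitivity(0.0, delay)          # top of the fringe
s_quad = sensitivity(0.25 * fsr, delay)   # quadrature bias point
s_null = sensitivity(0.5 * fsr, delay)    # bottom of the fringe
```

The slope has its largest magnitude at quadrature and vanishes at the peak and null, which is the fading that the multiphase architectures discussed next are designed to avoid.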

To avoid this signal fading problem in passive structures, Sheem introduced an MZI architecture using $3 \times 3$ couplers to provide a three-phase output [13]. Koo et al. developed a demodulation method that projects the three-phase output onto quadrature phase components from which a continuous phase is retrieved by a process involving differentiation, cross multiplication, summation, and integration [14]. Jin et al. applied least-squares estimation to the digitized quadrature components to recover the variation in amplitude and phase [15]. Todd et al. disclosed a Bragg grating sensor interrogation system that projects the digitized three-phase output onto quadrature components from which the phase is retrieved via a digital arctangent function [3]. Todd’s original method assumes an ideal coupler. Todd et al. subsequently extended the method to incorporate nonideal coupler parameters, which involves a weighted linear combination of outputs to which the digital arctangent is applied [10]. Xu et al. applied the extended approach to laser phase and frequency noise metrology [5]. The characterization of the impairments of a component in isolation is often not possible; consequently, it is Todd’s original method that has become the conventional method of interferometric data processing. Wu et al. showed that the conventional approach is superior to preceding methods for a high-power signal in a severely noisy environment [16].

Kleijn et al. applied a nonlinear least-squares fitting procedure to calibration data provided by a $3 \times 3$ MMI based MZI wavelength meter [17]. A total of 10 parameters are extracted (six coupler scattering magnitudes, three coupler phase shifts, and one delay), which enable the compensation of uncertain coupler transmission matrices, interferometer delay imbalance, and photodetector responsivity. The convergence of the fitting algorithm requires good starting points for all parameters. Moreover, the parameter estimation requires knowledge of the source power used during the calibration and operation modes. The sum of the output port powers of the interferometers is used for this purpose, but this sum is only substantially independent of the measurand for small impairments.

Motivated by the Lissajous figure traced by any pair of delay interferometer outputs as the frequency is scanned, researchers have applied a curve-fitting method developed by Fitzgibbon et al. [18], which is a specific case of Bookstein’s conic-section fitting method [19], to fit an ellipse [11] or squircle [20] to extract phase-retrieval parameters from scattered data. Recently, a $2 \times 4$ 90° hybrid (e.g., a $4 \times 4$ MMI) based MZI has been applied to the measurement of laser frequency fluctuations [4], and Chen et al. demonstrated a parallel arrangement of $4 \times 4$ MMI-based wavelength meters with waveguide delay lines engineered to relax temperature sensitivity [21].

This paper re-examines the interferometric wavelength measurement problem. An object vector composed of an in-phase, quadrature phase, and input power component emerges in the ideal case as a representation of the autocorrelation of the input sequence to a discrete Fourier transform (DFT) representing the interferometer output coupler. The vector belongs to a circular cone within an object space $R^{3}$. Each point on the cone is sent by an orthogonal map to a vector of interferometer egress port photoreceiver outputs in an image space $R^{n}$, $n \geq 3$. In the nonideal case, intensity fluctuations of the source, impairments of the waveguide delay line and interferometer couplers, and sensitivity errors of the photoreceiver array and noise are considered. It is found that the circular cone is retained as the fundamental object on which the data to be retrieved are located. The component impairments break the orthogonal symmetry, but the map from the cone to the image space remains linear. An information-retrieval problem is formulated for a known delay as the construction, using linear algebra only, of a $3 \times n$ matrix representing the linear map that minimizes the sum of the squared prediction error over a training data set. An uncertain delay introduces nonlinearity, but a few iterations of a golden section search algorithm suffice to retrieve the delay parameter. The method corrects the same comprehensive set of impairments as Kleijn’s method while eliminating its deficiencies. The algorithm is simple and robust. No parameter starting points are required; only the time-delay parameter requires bracketing over a broad interval. The calibration and phase retrieval process is invariant to source power. The retrieval process is naturally invariant to source optical power fluctuations during data processing.

2. THEORY

A. Perfect Components

Figure 1(a) illustrates a dual MZI approach to eliminate signal fading suffered by a single MZI architecture [22]. The signal is split between two parallel MZIs that are notionally identical with the exception that one MZI is biased in quadrature relative to the other. Ideally, each MZI is lossless; consequently, the intensities of their two output ports are complementary. The difference in intensity between the two output ports of each MZI provides a signed in-phase component and a signed quadrature-phase component of the phasor that describes the interference term. It is then straightforward to recover the phase with a frequency-invariant sensitivity.

Figure 1. (a) Schematic of a two-stage interferometer architecture consisting of two parallel $2 \times 2$ MZIs. The two MZIs, including the delay lines represented by the circles, are notionally identical except for the quadrature bias of the lower (blue) MZI provided by the $π / 2$ phase shift. (b) Rearrangement of the architecture of (a). The notionally identical arms of the two MZIs, excluding the phase shift, have been brought forward and are now shared. The dashed subsystem block is recognized as the decomposition of a $4 \times 4$ DFT into a network of four $2 \times 2$ DFT blocks and a phase-shift element.

In its improved version, as illustrated in Fig. 1(b), the sharing of a delay line between two MZIs guarantees that the two delay lines in Fig. 1(a) are identical. The network of four $2 \times 2$ couplers and a $π / 2$ phase shift is recognized as an instance of a $4 \times 4$ DFT, which may be implemented alternatively using a single $4 \times 4$ coupler. For example, multimode interference (MMI) couplers with a uniform split ratio have transmission matrices that are phase-permutation equivalent to a Fourier matrix.

This rearrangement provides the motivation to consider a general interferometer architecture consisting of a $1 \times m$ uniform splitter and an $n \times n$ Fourier coupler interconnected by $m$ arms with imbalanced phase. The transmission matrix $F \in C^{n \times n}$ of the output coupler maps a column vector $b \in C^{n}$ composed of the complex field amplitudes at its ingress ports to a column vector $c \in C^{n}$ composed of the complex field amplitude at its egress ports: $c = F b .$ (1)

Each datum is a vector with elements equal to the modulus squared of the amplitude at each egress port, which can be identified with the diagonal of the outer product $C = c c^{†},$ (2) which is related to the outer product $B = b b^{†}$ by $C = F B F^{†} .$ (3)

The measured output port power vector $p$ is then given by $p = \sum_{j = 0}^{n - 1} tr (F B F^{†} e_{j} e_{j}^{†}) e_{j} = \sum_{j = 0}^{n - 1} tr (B F^{†} e_{j} e_{j}^{†} F) e_{j},$ (4) where the basis vector $e_{j}$ has a unit element in position $j$ and zeros elsewhere. In the special case of a transmission matrix $F$ that is a Fourier matrix: $F = \frac{1}{\sqrt{n}} [\begin{matrix} w^{0} & w^{0} & \dots & w^{0} \\ w^{0} & w^{1} & \dots & w^{n - 1} \\ \vdots & \vdots & \ddots & \vdots \\ w^{0} & w^{n - 1} & \dots & w^{{(n - 1)}^{2}} \end{matrix}]; w = \exp (- i 2 π / n); F^{†} F = I,$ (5) where $I$ is the identity matrix. The outer product $B$ has the representation $B = \sum_{j, k = 0}^{n - 1} b_{j} b_{k}^{*} e_{j} e_{k}^{†} .$ (6)

Substituting Eq. (6) into Eq. (4), noting $e_{k}^{†} F^{†} e_{p} e_{p}^{†} F e_{j} = {(e_{p}^{†} F e_{k})}^{†} (e_{p}^{†} F e_{j}) = \frac{1}{n} w^{p (j - k)}$ (7) and collecting terms with $j - k = q \bmod n$, yields $p = F ρ; ρ_{q} = \frac{1}{\sqrt{n}} \sum_{j - k = q \bmod n} b_{j} b_{k}^{*} .$ (8)

Equation (8) is a restatement in matrix/column vector form of the familiar result that the modulus squared of the discrete Fourier transform of a sequence is equal to the discrete Fourier transform of the circular autocorrelation of the sequence. The vector $ρ \in C^{n}$ is the result of summing over the trailing diagonals of $B = b b^{†}$. A vector $b$ of length $m$ generates $2 m - 1$ nonzero trailing diagonals. The cyclic nature of the summation in Eq. (8) does not come into play if zero padding leads to $n \geq 2 m - 1$. The vector $ρ$ may then be expressed by a total of $2 m - 1$ real-valued components, which is the largest number of knowable unknowns that may be recovered from the measurement. For $m = 2$ and unit input power, the vector $ρ$ takes the form $ρ = \frac{1}{\sqrt{2}} \frac{1}{\sqrt{n}} (\cos (θ) ρ_{1} + \sin (θ) ρ_{2} + \sqrt{2} ρ_{3}),$ (9) where $θ$ is the phase imbalance of the two arms, and $\{ρ_{1}, ρ_{2}, ρ_{3}\}$ are orthonormal vectors: $ρ_{1} = \frac{1}{\sqrt{2}} [\begin{matrix} 0 \\ 1 \\ 0 \\ 1 \end{matrix}], ρ_{2} = \frac{1}{\sqrt{2}} [\begin{matrix} 0 \\ i \\ 0 \\ - i \end{matrix}], ρ_{3} = [\begin{matrix} 1 \\ 0 \\ 0 \\ 0 \end{matrix}]; (ρ_{j}, ρ_{k}) = δ_{j k},$ (10) where $0$ is the null vector representing the zero padding. The vector $ρ$ contains all the information to be retrieved: the in-phase term $\cos (θ)$, the quadrature-phase term $\sin (θ)$, and the input power $1$ appear as weights of its three orthonormal components. The phase may be extracted by introducing the real scalar coordinates $x = (ρ, ρ_{1}); y = (ρ, ρ_{2}); z = (ρ, ρ_{3}) \Rightarrow (x, y, z) = \frac{1}{\sqrt{2}} \frac{1}{\sqrt{n}} (\cos (θ), \sin (θ), \sqrt{2})$ (11) and evaluating the arctangent $θ = \arctan (y / x)$ (12) interpreted in the four-quadrant sense. The coordinates $(x, y, z)$ satisfy $x^{2} + y^{2} - \frac{1}{2} z^{2} = 0,$ (13) which defines a double-napped circular cone in $R^{3}$.
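Equations (8) to (13) can be checked numerically for the case $m = 2$, $n = 4$. The sketch below is illustrative only: the value of $θ$ is arbitrary, the phase is placed on the second arm so that the result matches the form of Eq. (9), and the variable names are not from the paper.

```python
import numpy as np

n = 4
theta = 0.7   # arbitrary arm phase imbalance (assumed value)

# Ingress amplitudes: two arms carrying unit total power, zero-padded to length n
b = np.zeros(n, dtype=complex)
b[0] = 1.0 / np.sqrt(2.0)
b[1] = np.exp(1j * theta) / np.sqrt(2.0)

# Unitary Fourier matrix of Eq. (5)
idx = np.arange(n)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

# Port powers evaluated directly from the egress amplitudes
p_direct = np.abs(F @ b) ** 2

# Circular autocorrelation of Eq. (8)
rho = np.zeros(n, dtype=complex)
for j in range(n):
    for k in range(n):
        rho[(j - k) % n] += b[j] * np.conj(b[k]) / np.sqrt(n)
p_from_rho = (F @ rho).real

# Orthonormal basis of Eq. (10) and coordinates of Eq. (11)
rho1 = np.array([0, 1, 0, 1]) / np.sqrt(2.0)
rho2 = np.array([0, 1j, 0, -1j]) / np.sqrt(2.0)
rho3 = np.array([1, 0, 0, 0], dtype=complex)
x = np.vdot(rho1, rho).real   # vdot conjugates its first argument
y = np.vdot(rho2, rho).real
z = np.vdot(rho3, rho).real

theta_retrieved = np.arctan2(y, x)        # four-quadrant arctangent, Eq. (12)
cone_defect = x**2 + y**2 - 0.5 * z**2    # left-hand side of Eq. (13)
```

Under these assumptions the two evaluations of the port powers agree, the retrieved phase equals the imposed phase, and the cone defect vanishes to rounding error.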

The Fourier matrix preserves the inner product so that $p = \frac{1}{\sqrt{2}} \frac{1}{\sqrt{n}} (\cos (θ) p_{1} + \sin (θ) p_{2} + \sqrt{2} p_{3}); (p_{j}, p_{k}) = δ_{j k},$ (14) where $\{p_{1}, p_{2}, p_{3}\}$ are the transformed basis: $p_{1} = \frac{\sqrt{2}}{\sqrt{n}} [\begin{matrix} \cos (φ_{0}) \\ \cos (φ_{1}) \\ \vdots \\ \cos (φ_{n - 1}) \end{matrix}]; p_{2} = \frac{\sqrt{2}}{\sqrt{n}} [\begin{matrix} \sin (φ_{0}) \\ \sin (φ_{1}) \\ \vdots \\ \sin (φ_{n - 1}) \end{matrix}]; p_{3} = \frac{1}{\sqrt{n}} [\begin{matrix} 1 \\ 1 \\ \vdots \\ 1 \end{matrix}]; φ_{p} = p \frac{2 π}{n} .$ (15)

Together, Eqs. (9) and (14) define a real orthogonal map $O : R^{3} \to R^{n}, n \geq 3$ such that $p = \frac{1}{\sqrt{2}} \frac{1}{\sqrt{n}} O x; O = [\begin{matrix} p_{1} & p_{2} & p_{3} \end{matrix}] \in R^{n \times 3}; x = [\begin{matrix} x \\ y \\ z \end{matrix}] \in R^{3} .$ (16)

The transpose $O^{T}$ may be used to project the measured data onto the 3D space $R^{3}$ containing the circular cone. The image $p$ of the object $x$ under the orthogonal map also lives on a cone. The conic shape is a consequence of linearity, and the absence of deformation is a consequence of orthogonality. The invariance of the cone to rotation about its axis corresponds to a translation of the phase. The mirror symmetry in any plane containing the axis or in the plane at the origin perpendicular to the axis results in a reversal of the direction clockwise or anticlockwise of increasing phase. A specific choice of coordinate system and a calibration measurement are necessary to fix the phase origin and direction.
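As a sketch of how the projection by $O^{T}$ works in the ideal case, the basis of Eq. (15) can be assembled for any $n \geq 3$ and used to recover the phase from the port powers. The helper names and the test values below are assumptions made for illustration.

```python
import numpy as np

def orthogonal_map(n):
    """Columns p1, p2, p3 of Eq. (15) for an n-port Fourier output coupler."""
    phi = 2.0 * np.pi * np.arange(n) / n
    return np.column_stack([
        np.sqrt(2.0 / n) * np.cos(phi),
        np.sqrt(2.0 / n) * np.sin(phi),
        np.full(n, 1.0 / np.sqrt(n)),
    ])

def port_powers(theta, n, power=1.0):
    """Ideal egress port powers, Eq. (14), scaled by the input power."""
    cone_point = np.array([np.cos(theta), np.sin(theta), np.sqrt(2.0)])
    return power / np.sqrt(2.0 * n) * orthogonal_map(n) @ cone_point

def retrieve_phase(p):
    """Project the port powers onto the object space and apply Eq. (12)."""
    x, y, _ = orthogonal_map(len(p)).T @ p
    return np.arctan2(y, x)

p = port_powers(2.5, n=4, power=0.3)   # arbitrary phase and input power
theta_hat = retrieve_phase(p)
```

Because the third coordinate carries the input power, scaling the source power moves the projected point along a ruling of the cone and leaves the retrieved phase unchanged.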

B. Imperfect Components

The optical system is equivalent to a parallel arrangement of $n$ copies of a single input and output port MZI terminated by photoreceivers and driven by a perfect $1 \times n$ splitter. Consequently, the measurement at a selected egress port of the output coupler is of the form $p = {| a_{1} \exp (i θ) + a_{2} |}^{2} = {| a_{1} |}^{2} + {| a_{2} |}^{2} + 2 | a_{1} | | a_{2} | \cos (θ - ϕ) \Rightarrow p = [\begin{matrix} 2 | a_{1} | | a_{2} | \cos (ϕ) & 2 | a_{1} | | a_{2} | \sin (ϕ) & \frac{1}{\sqrt{2}} ({| a_{1} |}^{2} + {| a_{2} |}^{2}) \end{matrix}] [\begin{matrix} \cos (θ) \\ \sin (θ) \\ \sqrt{2} \end{matrix}],$ (17) where $a_{1}$ and $a_{2}$ are the complex transmissions of the paths from the input port through one or other of the interferometer arms to the selected output port, excluding the phase contributed by the delay line and scaled by a real constant to account for photoreceiver sensitivity. The impairments affect the power bias ${| a_{1} |}^{2} + {| a_{2} |}^{2}$, phase origin $ϕ = \arg (a_{2}) - \arg (a_{1})$, and amplitude $2 | a_{1} | | a_{2} |$ of the recorded fringe patterns. The orthogonal symmetry is broken, but the map remains linear, and the circular cone is retained as the fundamental object on which the data to be retrieved are located.
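The linearity claimed for an impaired port can be verified with a few lines of code. In this sketch the path transmissions are arbitrary assumed values, and the phase origin is computed as $\arg (a_{2} a_{1}^{*})$, which is the offset for which the fringe takes the form $\cos (θ - ϕ)$ used in Eq. (17).

```python
import numpy as np

def port_row(a1, a2):
    """Row vector of Eq. (17) acting on (cos theta, sin theta, sqrt 2)."""
    amplitude = 2.0 * abs(a1) * abs(a2)
    phi = np.angle(a2 * np.conj(a1))       # fringe phase origin
    bias = abs(a1) ** 2 + abs(a2) ** 2
    return np.array([amplitude * np.cos(phi),
                     amplitude * np.sin(phi),
                     bias / np.sqrt(2.0)])

# Hypothetical impaired path transmissions (unequal magnitude and phase)
a1 = 0.55 * np.exp(0.3j)
a2 = 0.62 * np.exp(-0.8j)

theta = np.linspace(0.0, 2.0 * np.pi, 50)
direct = np.abs(a1 * np.exp(1j * theta) + a2) ** 2
cone = np.vstack([np.cos(theta), np.sin(theta),
                  np.full(theta.shape, np.sqrt(2.0))])
linear = port_row(a1, a2) @ cone
```

Stacking one such row per egress port gives the $n \times 3$ instrument matrix, so every impairment of this kind is captured by a linear map acting on the cone.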

C. Learning Algorithm

A linear system that maps an input $x \in R^{m}$ to an output $y \in R^{n}$ may be described by a matrix $A \in R^{n \times m}$ : $y = A x .$ (18)

Suppose a sequence of measurements is made of pairs of inputs and outputs associated by the system, which are assembled into a collection of data $D$ known as the training set $D = \{(x_{k}, y_{k}) \mid k = 1, \dots, N\} .$ (19)

The task is to reconstruct $A$ from $D$. In practice, the training set $D$ is corrupted by measurement errors and noise, so the problem is reformulated as finding an $A$ that minimizes an error function defined by $F (A) = \frac{1}{N} \sum_{k = 1}^{N} (y_{k} - A x_{k}, y_{k} - A x_{k}),$ (20) where $(\cdot, \cdot)$ denotes the Frobenius inner product. The Gâteaux derivative of $F$ evaluated on the tangent vector $h$ is given by $D_{A} F (h) = - 2 (h, R_{y x} - A R_{x x}),$ (21) where $R_{y x} = \frac{1}{N} \sum_{k = 1}^{N} y_{k} x_{k}^{†}; R_{x x} = \frac{1}{N} \sum_{k = 1}^{N} x_{k} x_{k}^{†} .$ (22)

Consequently, the error function is minimized by the choice $A = R_{y x} R_{x x}^{- 1} .$ (23)

In general, $A$ is not invertible if $n > 3$. However, the Moore–Penrose inverse $A^{+}$ provides the minimum norm least-squares solution of Eq. (18). In practice, the system is overdetermined and $A$ has full column rank, which leads to the explicit expression $A^{+} = {(A^{†} A)}^{- 1} A^{†} .$ (24)

An individual measurement $y$ can be mapped to the object space by evaluating $x = (A^{+} y, e_{1}); y = (A^{+} y, e_{2}); z = (A^{+} y, e_{3})$ (25) and its phase retrieved using $θ = \arctan (y / x)$, as in Eq. (12), where the arctangent function is interpreted in the four-quadrant sense. However, in experiments, there is no direct object space measurement; only the frequency of the input is measured and paired with the image space measurement. The object space data are parameterized by the phase $θ$, which must be inferred from the measured frequency $ω$ using, if dispersion is neglected, the relationship $θ = (ω - ω_{0}) τ + θ_{0},$ (26) where $ω_{0}$ is some nominal reference frequency, $τ$ is the interferometer delay imbalance, and $θ_{0}$ is the phase at $ω = ω_{0}$.
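The closed-form fit of Eq. (23) and the retrieval of Eqs. (24) and (25) can be sketched as follows for a known phase. The instrument matrix here is a random, hypothetical $4 \times 3$ map used only to exercise the algebra; it does not model any particular device, and the function names are illustrative.

```python
import numpy as np

def object_vectors(theta):
    """Columns (cos theta, sin theta, sqrt 2): points on the cone of Eq. (13)."""
    theta = np.atleast_1d(theta)
    return np.vstack([np.cos(theta), np.sin(theta),
                      np.full(theta.shape, np.sqrt(2.0))])

def fit_linear_map(X, Y):
    """Eq. (23): A = R_yx R_xx^{-1} from training columns X (3 x N), Y (n x N)."""
    N = X.shape[1]
    R_yx = Y @ X.T / N
    R_xx = X @ X.T / N
    # Solve A R_xx = R_yx without forming the inverse explicitly
    return np.linalg.solve(R_xx.T, R_yx.T).T

def retrieve_phase(A, y):
    """Map a measurement to the object space with A^+ and apply Eq. (12)."""
    x_hat = np.linalg.pinv(A) @ y
    return np.arctan2(x_hat[1], x_hat[0])

rng = np.random.default_rng(0)
A_true = rng.normal(size=(4, 3))           # hypothetical instrument map

theta_train = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
X = object_vectors(theta_train)
Y = A_true @ X                             # noise-free training data
A_fit = fit_linear_map(X, Y)

theta_test = 1.234
y_test = 0.4 * A_true @ object_vectors(theta_test)[:, 0]   # different source power
theta_hat = retrieve_phase(A_fit, y_test)
```

The test measurement is deliberately taken at a different source power from the training data; the power scales the whole object vector and cancels in the arctangent.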

The phase bias $θ_{0}$ is sensitive to fabrication process variations and hence uncertain. It acts as a rotation about the axis of the circular cone on which the object samples live. The group of rotations is a subgroup of the general linear group to which $A$ belongs. Consequently, the phase bias in Eq. (26) may be dropped and its action as a rotation absorbed into $A$ .

The delay is robust to fluctuations of the ambient environment. It may be determined by design through accurate knowledge of the physical path length imbalance and the group index of the waveguide and refined by a measurement of the free-spectral range (FSR) of the interferometer. The latter may be done by applying a golden section search for the delay $τ$ that minimizes the residual error given by Eq. (20) after substitution of Eq. (23).
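A minimal sketch of the delay refinement is given below. It assumes synthetic data from a hypothetical impaired three-port instrument, frequencies expressed in GHz and delays in ns, a bracket of roughly plus or minus 5% about the true delay, and a residual that is unimodal within that bracket. The unknown phase bias is not fitted separately because it is absorbed into the fitted matrix as a rotation.

```python
import numpy as np

def cone_points(theta):
    """Object vectors (cos theta, sin theta, sqrt 2) stacked as columns."""
    return np.vstack([np.cos(theta), np.sin(theta),
                      np.full(theta.shape, np.sqrt(2.0))])

def residual(tau_ns, f_ghz, Y):
    """Error of Eq. (20) after the best linear fit for a trial delay."""
    X = cone_points(2.0 * np.pi * f_ghz * tau_ns)
    At, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)   # least-squares A^T
    return np.mean(np.sum((Y - At.T @ X) ** 2, axis=0))

def golden_section(f, lo, hi, tol):
    """Golden section search for the minimum of f on [lo, hi]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:                      # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:                            # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)

# Synthetic calibration data (all values are assumptions for illustration)
rng = np.random.default_rng(0)
f_ghz = np.linspace(-100.0, 100.0, 64)   # frequency offset from the reference
tau_true, theta0 = 0.0200, 0.9           # 20 ps delay, arbitrary phase bias
phi = 2.0 * np.pi * np.arange(3) / 3
O = np.column_stack([np.sqrt(2 / 3) * np.cos(phi),
                     np.sqrt(2 / 3) * np.sin(phi),
                     np.full(3, 1 / np.sqrt(3))])
L = (np.eye(3) + 0.1 * rng.normal(size=(3, 3))) @ O
Y = L @ cone_points(2.0 * np.pi * f_ghz * tau_true + theta0)
Y = Y + 1e-3 * rng.normal(size=Y.shape)

tau_hat = golden_section(lambda t: residual(t, f_ghz, Y), 0.019, 0.021, tol=1e-9)
```

Each evaluation of the residual is a small linear least-squares problem, so the only nonlinear search is over the single scalar delay.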

3. RESULTS AND DISCUSSION

A. Simulation

A schematic of the conventional wavelength interrogation system considered for validation of the proposed method and the optical spectra at the three outputs are shown in Figs. 2(a) and 2(b). The Virtual Photonics Inc. (VPI) software package has been used to derive these spectra. The unbalanced MZI architecture consists of a $2 \times 2$ MMI input coupler and a $3 \times 3$ MMI output coupler with an ideal path length difference between its two arms corresponding to a free spectral range (FSR) of 50 GHz. For an ideal system, the outputs of the identical photoreceivers can be expressed as $p = [\begin{matrix} p_{1} \\ p_{2} \\ p_{3} \end{matrix}] = \frac{1}{\sqrt{2}} \frac{K}{\sqrt{3}} O [\begin{matrix} \cos (θ) \\ \sin (θ) \\ \sqrt{2} \end{matrix}],$ (27) where $K$ is proportional to the input optical power and the responsivity of the photoreceivers, and $O = \frac{1}{\sqrt{3}} [\begin{matrix} - \frac{\sqrt{3}}{\sqrt{2}} & - \frac{1}{\sqrt{2}} & 1 \\ 0 & \sqrt{2} & 1 \\ \frac{\sqrt{3}}{\sqrt{2}} & - \frac{1}{\sqrt{2}} & 1 \end{matrix}]; O^{T} O = I .$ (28)

Figure 2. (a) Schematic of a conventional wavelength meter system. (b) Ideal optical spectra of the egress ports of the output coupler.

Equations (27) and (28) can be derived from Eqs. (14)–(16) with appropriate allowance for phase permutation equivalence of MMI and Fourier couplers; no adjustment of the proposed data-processing method is necessary. Impairments due to imperfect couplers, photoreceivers, and interferometer arms break the orthogonal symmetry, but the concept of the circular cone as the fundamental object remains useful since all these impairments are encompassed by the linear map $A$. Fluctuations and noise are also contributed by source power fluctuations, photoreceiver noise, and quantization noise. These errors are accommodated by the least-squares fit of the system map to the training set and the Moore–Penrose inverse used for data processing. The deviations from the ideal case will be small, and the port data are clearly recognisable as a poly-phase fringe pattern. The phase retrieved for a given linear map is robust to source power fluctuations, as the linearity in input power of the system ensures that the object samples lie on the cone irrespective of source power.
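The properties stated for Eqs. (27) and (28) can be confirmed numerically. The sketch below builds the matrix of Eq. (28), checks $O^{T} O = I$, and retrieves the phase by projecting with $O^{T}$, which is the ideal-instrument processing referred to in the text as the conventional method. The values of $K$ and $θ$ are arbitrary.

```python
import numpy as np

s2, s3 = np.sqrt(2.0), np.sqrt(3.0)

# Orthogonal map of Eq. (28) for the ideal 3 x 3 MMI output coupler
O = np.array([[-s3 / s2, -1.0 / s2, 1.0],
              [0.0,       s2,       1.0],
              [s3 / s2,  -1.0 / s2, 1.0]]) / s3

def ideal_outputs(theta, K=1.0):
    """Photoreceiver outputs of Eq. (27)."""
    return K / np.sqrt(6.0) * O @ np.array([np.cos(theta), np.sin(theta), s2])

def conventional_phase(p):
    """Project with O^T and apply the four-quadrant arctangent."""
    x, y, _ = O.T @ p
    return np.arctan2(y, x)

theta = -2.0
phases = [conventional_phase(ideal_outputs(theta, K)) for K in (0.1, 1.0, 7.5)]
```

The retrieved phase is the same for every value of $K$, and the three outputs sum to $K$, which reflects the lossless ideal coupler.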

A MATLAB code was developed to evaluate the performance of the learning algorithm in processing data generated by a simulated wavelength meter subject to a variety of random impairments. Simulation of a wavelength meter with perturbed interferometer delay imbalance and arbitrary phase bias provides synthetic measured data for further data processing. Impairments are added to the coupler transmission matrices and to the responsivities to emulate fabrication process variations and component tolerances. A matrix $L$ represents the instrument map, to which large perturbations are applied by the different impairments discussed. Gaussian noise is added to emulate thermal, RIN, quantum, and quantisation noise processes that occur during measurements made during the calibration and operation phases. To elucidate the robustness of the proposed algorithm against impairments and noise, a scenario is considered in which 1000 instruments are available from the same manufacturer. There will be differences from instrument to instrument; however, for a tightly controlled standard process, in practice, the variability would be small about a static but impaired “mean” instrument. Design variations could move that mean closer to a perfect “mean” instrument. Randomized impairments representing this variability are applied in the simulation of these 1000 interferometric instruments. Impairments of the couplers are introduced by Gaussian-distributed real and imaginary parts of transmission matrix components. Table 1 lists five cases where the degree of impairment has been increased gradually by varying the symmetry-preserving and symmetry-breaking perturbation parameters of the couplers. The impairments of the delay-line delay time and photoreceiver responsivities follow a Gaussian distribution; however, their standard deviations are kept constant at $σ = 5 %$ and $σ = 10 %$, respectively, in all these cases.

Table 1. Simulation Parameters for Impairment and Noise

Case	Parameter	Setting
I	Couplers	Symmetry-preserving perturbation $σ = 10 %$ ; symmetry-breaking perturbation $σ = 1 %$
I	Noise	$σ = 4.08 \times 10^{- 3} mW$ (source power 1 mW)
II	Couplers	Symmetry-preserving perturbation $σ = 20 %$ ; symmetry-breaking perturbation $σ = 2 %$
II	Noise	$σ = 4.08 \times 10^{- 4} mW$ (source power 1 mW)
III	Couplers	Symmetry-preserving perturbation $σ = 30 %$ ; symmetry-breaking perturbation $σ = 3 %$
III	Noise	$σ = 4.08 \times 10^{- 4} mW$ (source power 1 mW)
IV	Couplers	Symmetry-preserving perturbation $σ = 40 %$ ; symmetry-breaking perturbation $σ = 4 %$
IV	Noise	$σ = 4.08 \times 10^{- 3} mW$ (source power 1 mW)
V	Couplers	Symmetry-preserving perturbation $σ = 50 %$ ; symmetry-breaking perturbation $σ = 5 %$
V	Noise	$σ = 4.08 \times 10^{- 4} mW$ (source power 1 mW)

After applying these impairments, $L$ is generated for an individual interferometer. The instrument is then trained with an independent training set with associated additive random noise. The signal-to-noise ratio (SNR) is varied between different cases. The training set is used to estimate and refine the delay imbalance and thus obtain $A$ via the proposed learning method. The matrix $A$ can, at best, inherit the condition number of $L$; there is no data-processing method able to retrieve information that is not present in the data. Figure 3 depicts distributions of the condition number of $L$, $A$, and the norm of the Moore–Penrose inverse $A^{+}$ for all cases. It can be observed that the distributions of the condition number of $A$ are well-bounded and follow almost exactly the distributions for $L$ for all cases. For severe impairment and noise, the condition number of $A$ remains of the order of unity, which explains how the linear mapping can approximate the orthogonal mapping in the limited impairment case; consequently, the inverse of $A$ is well-conditioned, and the processed data vary continuously. It is possible to generate extreme impairments resulting in singular $A$ and extreme condition numbers; however, these are outliers characterizing a poor fabrication run that has destroyed the inherent DFT phase relationship of the couplers. As these extreme cases are rare, they can be removed in practice by adopting a quality-control procedure that discards an interferometer with too severe impairment.

Figure 3. Distribution of calculated condition number of $L$ and $A$ and norm of $A^{+}$ derived from the calibration simulations of 1000 interferometric instruments using the proposed method. Different impairment and noise settings, as listed in Table 1, correspond to different cases: (a) condition number of $L$, (b) condition number of $A$, and (c) norm of $A^{+}$ belong to Case I; (d) condition number of $L$, (e) condition number of $A$, and (f) norm of $A^{+}$ belong to Case II; (g) condition number of $L$, (h) condition number of $A$, and (i) norm of $A^{+}$ belong to Case III; (j) condition number of $L$, (k) condition number of $A$, and (l) norm of $A^{+}$ belong to Case IV; and (m) condition number of $L$, (n) condition number of $A$, and (o) norm of $A^{+}$ belong to Case V.

Figure 3 also shows that the distributions of the norm of $A^{+}$ are well-bounded and close to unity. From linear algebra, it can be inferred that the noise of the processed data (before calculating the arctangent) is increased by no more than the norm of the Moore–Penrose inverse of $A$; further, as this norm is bounded (close to unity), it can be concluded that the processed data are stable, i.e., small perturbations such as noise are not significantly magnified.
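The bound on noise magnification can be illustrated numerically. The sketch assumes the spectral norm is the norm intended, uses a hypothetical instrument formed by perturbing the ideal three-port orthogonal map, and draws arbitrary Gaussian noise vectors; none of these choices is taken from the simulations reported here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ideal three-port map with orthonormal columns, then a hypothetical perturbation
phi = 2.0 * np.pi * np.arange(3) / 3
O = np.column_stack([np.sqrt(2 / 3) * np.cos(phi),
                     np.sqrt(2 / 3) * np.sin(phi),
                     np.full(3, 1 / np.sqrt(3))])
A = O + 0.1 * rng.normal(size=(3, 3))

A_pinv = np.linalg.pinv(A)
gain = np.linalg.norm(A_pinv, 2)               # spectral norm of A^+
sigma = np.linalg.svd(A, compute_uv=False)     # singular values of A
cond = sigma.max() / sigma.min()               # condition number of A

# Image-space noise vectors and their size after projection to the object space
noise = 1e-3 * rng.normal(size=(3, 1000))
amplified = np.linalg.norm(A_pinv @ noise, axis=0)
bound = gain * np.linalg.norm(noise, axis=0)
```

The spectral norm of $A^{+}$ equals the reciprocal of the smallest singular value of $A$, so a map close to orthogonal has a gain close to unity and no noise vector is magnified by more than that gain.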

To observe the effects of additive noise in the operation phase as well, an interferometer with an arbitrary condition number is chosen to be perturbed with impairment and calibration noise setting of Case I, and the proposed method is applied. After learning, the interferometer processes a test data set. In the operation stage, additive Gaussian noise providing SNR of 30 dB has been applied. Figure 4 shows the associated simulation results. Figure 4(a) shows that the projection by the Moore–Penrose inverse $A^{+}$ of the simulated measured data [Fig. 4(b)] has an excellent match to the original object data. Likewise, the mapping of the original object space by the linear map $A$ estimated from the training data provides an excellent fit to the simulated output port fringe patterns shown in Fig. 4(b). To judge the efficacy of the proposed algorithm, the conventional method due to Todd et al., where impaired image space data are processed by the orthogonal mapping of the perfect interferometer, has also been applied [3]. It can be observed from Fig. 4(a) that the conventional method results in poor object sample estimation, which is reflected in the corresponding retrieved-frequency plot shown in Figs. 4(c) and 4(d). A substantial improvement in the accuracy of the retrieved frequency is achieved by the proposed algorithm, as shown in Figs. 4(c) and 4(d).

Figure 4. (a) Correct object samples retrieved by the conventional method and object samples retrieved using the proposed method. (b) Output port fringe pattern samples (marker) accompanied by the fitted fringe pattern (solid) provided by the proposed method. (c) Comparison between the frequency measured using the conventional and proposed methods. (d) Comparison between the residual measured and source frequency using the conventional and proposed methods. The wavelength meter simulated has an MZI architecture based on a $3 \times 3$ MMI output coupler with all components impaired. The reference frequency is 193.4 THz (wavelength 1.55 μm).

To make the comparison between the proposed and conventional methods more evident, seven interferometers, after going through the impairment and learning process, are selected to have mappings with different condition numbers representative of different static impairments and calibration noise. Table 2 lists the corresponding parameters. Each interferometer processes 100 test data sets. Figure 5 shows the mean error and standard deviation of the distribution in estimating individual frequency samples of each wavelength meter. It can be observed from Figs. 5(a) and 5(c) that, even with the most severe impairment setting, the estimation error processed by the proposed approach is smaller than 0.4 GHz on average. The conventional approach cannot achieve such performance even with the least impairment and noise setting. It is evident from the mean error and standard deviation in Fig. 5(d) that small inherent impairments due to design flaws or fabrication limitations, followed by noise in the learning and measurement stages, limit the performance of the conventional approach and result in failure to predict the wavelength with reliable precision.

Table 2. Simulation Parameters Applied for Operation

Case	Impairment and Calibration Noise Setting	Condition Number of Chosen $A$	Additive Noise in the Operation Stage
A	Case I	1.2456	Gaussian distribution; noise-equivalent optical power of $- 30 dBm$
B	Case II	1.8982	Gaussian distribution; noise-equivalent optical power of $- 20 dBm$
C	Case III	2.9186	Gaussian distribution; noise-equivalent optical power of $- 30 dBm$
D	Case IV	4.4648	Gaussian distribution; noise-equivalent optical power of $- 30 dBm$
E	Case V	11.0627	Gaussian distribution; noise-equivalent optical power of $- 20 dBm$
F	Case V	7.6899	Uniform distribution; noise-equivalent optical power of $- 30 dBm$
G	Case V	10.6547	Uniform distribution; noise-equivalent optical power of $- 20 dBm$

Figure 5. Mean residual between estimated and original frequency using the (a) proposed and (b) conventional methods; standard deviation of the calculated residual between estimated and original frequency using the (c) proposed and (d) conventional methods. The reference frequency is 193.4 THz (wavelength 1.55 μm).

The simulation trials confirmed that:

1. The training and phase retrieval algorithms are invariant to static source power and phase bias. The calibration and test source powers may differ in value. The retrieved phase is naturally invariant to fluctuations from sample to sample of the test source power. The calibration process with a source power monitor is also invariant to fluctuations from sample to sample of the calibration source power.

2. The code functions with a training set containing as few as four frequency samples providing 12 knowns to retrieve 10 unknown parameters.

3. In the absence of additive noise, the method fully corrects all simulated impairments to machine precision provided the golden section search accuracy parameter is small enough. The loss of precision with increasing noise is graceful. The largest contributor to the loss of precision is noise in the phase-retrieval process. The loss of precision of the calibration process due to noise is reduced by the averaging over the training set. It is expected that the noise in most applications will be small ($SNR > 30 dB$) given the modest photoreceiver bandwidth requirement.

To confirm the generality of the proposed algorithm, another wavelength meter with a $3 \times 3$ MMI coupler replaced by a $4 \times 4$ MMI coupler has also been investigated. The resulting orthogonal map is $O = \frac{1}{\sqrt{4}} [\begin{matrix} - 1 & 1 & 1 \\ - 1 & - 1 & 1 \\ 1 & 1 & 1 \\ 1 & - 1 & 1 \end{matrix}]; O^{T} O = I .$ (29)

The conventional and proposed algorithms have been applied to the same set of impaired 4D image space data. The results shown in Figs. 6(a) and 6(b) validate the superior accuracy of the proposed algorithm in comparison with the conventional method.
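The difference between the two processing routes can be reproduced qualitatively with a toy four-port example. The impairment used here (unequal photoreceiver responsivities) and its values are hypothetical, the data are noise-free, and the phase is assumed known during training, so the sketch illustrates the mechanism only and does not reproduce the reported results.

```python
import numpy as np

# Orthogonal map of Eq. (29) for the ideal 4 x 4 MMI output coupler
O4 = 0.5 * np.array([[-1.0,  1.0, 1.0],
                     [-1.0, -1.0, 1.0],
                     [ 1.0,  1.0, 1.0],
                     [ 1.0, -1.0, 1.0]])

def cone_points(theta):
    """Object vectors (cos theta, sin theta, sqrt 2) stacked as columns."""
    return np.vstack([np.cos(theta), np.sin(theta),
                      np.full(theta.shape, np.sqrt(2.0))])

def phase_from(projector, Y):
    """Project image-space columns to the object space and take the phase."""
    X_hat = projector @ Y
    return np.arctan2(X_hat[1], X_hat[0])

def wrap(d):
    """Wrap phase differences to the interval (-pi, pi]."""
    return np.angle(np.exp(1j * d))

# Hypothetical impaired instrument: unequal photoreceiver responsivities
L = np.diag([1.00, 0.90, 1.08, 0.95]) @ O4

# Training: least-squares fit of the linear map, equivalent to R_yx R_xx^{-1}
theta_train = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
X_train = cone_points(theta_train)
A = (L @ X_train) @ np.linalg.pinv(X_train)

# Operation: compare the ideal-map projection with the learned pseudoinverse
theta_test = np.linspace(0.1, 6.0, 25)
Y_test = L @ cone_points(theta_test)
err_conventional = wrap(phase_from(O4.T, Y_test) - theta_test)
err_proposed = wrap(phase_from(np.linalg.pinv(A), Y_test) - theta_test)
```

In this noise-free setting the learned map removes the phase error to rounding precision, whereas projection with the ideal $O^{T}$ leaves a systematic, phase-dependent error.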

Figure 6. (a) Correct object samples retrieved by the conventional method and object samples retrieved using the proposed method. (b) Comparison between the residual measured and source frequency using the conventional and proposed methods. The wavelength meter simulated has an MZI architecture based on a $4 \times 4$ MMI output coupler with all components impaired. The reference frequency is 193.4 THz (wavelength 1.55 μm).

B. Fabrication and Experiment

To evaluate the efficacy of the proposed data-processing method, experimental data are provided by a photonic integrated circuit wavelength meter with a $3 \times 3$ MMI-based MZI circuit architecture fabricated on the CMOS-compatible $\mathrm{Si_3N_4}$ photonic integration platform provided by LioniX International. Their TriPleX technology offers a variety of planar waveguide structures based on alternating silicon nitride and silicon dioxide films [23]. Among them, only the asymmetric double strip (ADS) waveguide is offered by their multiproject wafer (MPW) service. The development of an on-chip wavelength meter on $\mathrm{Si_3N_4}$ was motivated by research on a compact high-resolution wideband spectrometer [24]. To meet spectrometer specifications such as low loss, low dispersion, $< 1\ \mathrm{GHz}$ resolution, whole C-band operation, and compact size, ADS technology on $\mathrm{Si_3N_4}$ was chosen as the most suitable option. Figure 7 shows the micrograph of the fabricated circuit. The MZI architecture consists of a Y-junction as the input coupler and a $3 \times 3$ MMI as the output coupler, with a path length difference between its arms of 3393 μm. The associated FSR for the ADS waveguide is $\sim 49.69\ \mathrm{GHz}$ at the reference wavelength 1.55 μm (193.4 THz). Each input and output waveguide is terminated by a spot size converter (SSC) and an attached optical fiber, which are not shown in Fig. 7. The ADS waveguide is optimized for TE mode propagation; thus, polarization-maintaining fibers with principal axes aligned with the chip are employed. A tunable laser (Agilent 81680A) capable of tuning over the whole C-band with a 3 pm wavelength step is used as the optical input. The input power is fixed at 0 dBm. The wavelength response of the circuit is measured for a desired wavelength span around 1.55 μm. The output is detected by an optical power sensor (Agilent 81632A) and recorded by a light-wave measurement system (Agilent 8164A). The optical spectral data are collected and processed off-line by the proposed data-processing method. The experiment has been conducted in a centrally temperature-controlled laboratory environment.
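The quoted design figures can be cross-checked with the standard relations $\mathrm{FSR} = c / (n_g \Delta L)$ and $\Delta\nu \approx c\,\Delta\lambda / \lambda^{2}$. The sketch below is a consistency check only: the group index is not stated in the text and is inferred here from the quoted FSR and path length difference, and the variable names are illustrative.

```python
C = 299_792_458.0        # speed of light in vacuum, m/s

delta_L = 3393e-6        # path length difference, m (from the text)
fsr = 49.69e9            # free spectral range, Hz (from the text)
wavelength = 1.55e-6     # reference wavelength, m (from the text)
step_wl = 3e-12          # tunable laser wavelength step, m (from the text)

# Group index implied by FSR = c / (n_g * delta_L); inferred, not a quoted value.
n_g = C / (fsr * delta_L)

# Frequency spacing of the laser steps near the reference wavelength.
step_freq = C * step_wl / wavelength**2

# Approximate number of recorded samples within one FSR.
samples_per_fsr = fsr / step_freq

print(f"implied group index: {n_g:.3f}")
print(f"frequency step: {step_freq / 1e9:.3f} GHz")
print(f"samples per FSR: {samples_per_fsr:.0f}")
```

Under these assumptions the implied group index is about 1.78, the 3 pm step corresponds to roughly 0.37 GHz, and each FSR contains on the order of 130 samples.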

Figure 7. Micrograph of the fabricated on-chip wavelength meter.

Figures 8(a)–8(c) depict the experimental results associated with a frequency span of one FSR with center vacuum wavelength 1550 nm. The learning algorithm is independent of the choice of training set center wavelength or number of FSRs spanned. Once a training set is chosen, the linear mapping is optimized for the wavelength span bounded by that set.

Figure 8. (a) Recorded output port intensity (markers) from the three output ports of the $3 \times 3$ MMI coupler and the fit provided by the proposed algorithm (solid). (b) Frequency offset retrieved from the power sensor data by the conventional and proposed approaches versus the original frequency. (c) Residual error in calculating the frequency over the desired frequency span. For the following figures, the test data processed are extracted from the adjacent FSR to the data used for training. (d) Recorded output port intensity (markers) from the three output ports of the $3 \times 3$ MMI coupler and the fit provided by the proposed algorithm (solid). (e) Frequency offset retrieved from the power sensor data by the conventional and proposed approaches versus the original frequency. (f) Residual error in calculating the frequency over the desired frequency span. The reference frequency is 193.4 THz (wavelength 1.55 μm).

The raw data collected from the three output ports of the $3 \times 3$ MMI coupler are shown by the markers in Fig. 8(a). An excellent fit is provided by the linear map $A$, shown by the solid-line fringe pattern in Fig. 8(a). Figure 8(b) depicts an almost linear relationship between the original frequency and the frequency retrieved from the power sensor data; for the proposed method, the residual error is limited to $\pm 0.2\ \mathrm{GHz}$. It can be observed in Fig. 8(c) that the prediction of the conventional method can deviate significantly from the original frequency; the maximum residual error observed over the FSR is $\sim 3\ \mathrm{GHz}$. It is a realistic assumption that the wavelength estimation will be performed over the same span as the training data; thus, $A$ has already been calculated. To demonstrate the generalization ability of the learning algorithm, the linear mapping $A$ constructed using the training set over the FSR centered at wavelength 1550 nm is used to retrieve the frequency using test data over an adjacent FSR. Figures 8(d)–8(f) show that the maximum residual error of the proposed approach increases only slightly to $\sim 0.8\ \mathrm{GHz}$, which may be expected, as $A$ is not optimized for this test data set; nevertheless, the proposed approach remains substantially superior in precision to the conventional method.
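The sensitivity of the conventional method to component impairments can be reproduced with a small numerical sketch. It assumes the textbook ideal three-phase model, in which port $k$ has intensity proportional to $1 + \cos(\phi + 2\pi k/3)$, and retrieves the phase by projecting onto quadrature components and applying an arctangent. The gain and phase-offset errors below are hypothetical values chosen for illustration, not measured parameters of the fabricated device, and the sign convention of the phase is an assumption.

```python
import numpy as np

FSR_GHZ = 49.69                           # FSR quoted in the text
THETA = 2 * np.pi * np.arange(3) / 3      # ideal port phase offsets: 0, 120, 240 degrees

def three_phase_outputs(phase, gains=(1.0, 1.0, 1.0), offset_err=(0.0, 0.0, 0.0)):
    """Port intensities of a three-phase interferometer (illustrative model)."""
    phase = np.asarray(phase, dtype=float)[:, None]
    gains = np.asarray(gains, dtype=float)
    offset_err = np.asarray(offset_err, dtype=float)
    return gains / 3 * (1 + np.cos(phase + THETA + offset_err))

def conventional_phase(intensities):
    """Project onto ideal quadrature components, then apply the arctangent."""
    x = intensities @ np.cos(THETA)       # proportional to cos(phase) for an ideal coupler
    y = -(intensities @ np.sin(THETA))    # proportional to sin(phase) for an ideal coupler
    return np.arctan2(y, x)

def wrap(angle):
    """Wrap an angle difference to the interval (-pi, pi]."""
    return np.angle(np.exp(1j * angle))

phase_true = np.linspace(-np.pi, np.pi, 201, endpoint=False)

ideal_err = wrap(conventional_phase(three_phase_outputs(phase_true)) - phase_true)

# Hypothetical impairments: unequal port gains and phase-offset errors in radians.
impaired = three_phase_outputs(phase_true, gains=(1.0, 0.9, 1.1), offset_err=(0.0, 0.1, -0.05))
impaired_err = wrap(conventional_phase(impaired) - phase_true)

# A phase error maps to a frequency error through FSR / (2*pi).
print(np.max(np.abs(ideal_err)) * FSR_GHZ / (2 * np.pi))
print(np.max(np.abs(impaired_err)) * FSR_GHZ / (2 * np.pi))
```

With an ideal coupler the retrieved phase matches the true phase to numerical precision, whereas modest gain and phase errors produce a systematic, phase-dependent frequency error, which is the behavior the learned linear map is intended to correct.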

Figure 9 shows the frequency estimation error observed for a total span of 950 GHz. Recorded data contained in one FSR around the center frequency depicted in Fig. 5 are taken as the training set. After each calibration, recorded test data aligned to the respective FSR are processed by the system. It can be observed that, over the total 950 GHz span, the residual error is limited to $\pm 0.35\ \mathrm{GHz}$.

Figure 9. Residual error in calculating the frequency over the desired frequency span for different reference frequencies.

Although the precision achieved experimentally is over one order of magnitude greater than that of the conventional method, it is not as great as that achieved in simulations, where precision is limited only by noise. This indicates that performance is limited by weak impairment mechanisms not captured by the model. Observations point to phenomena involving reflections and a mixed polarization state to explain the current limit to the precision. A learning algorithm based on a model (a priori knowledge) with numerous parameters risks overfitting the data at the expense of generalization ability. The current model is parsimonious and comprehensive. Consequently, rather than increasing the complexity of the model, it is preferable that the interferometer meets the assumptions of the model. If sufficient care is taken to avoid spurious reflections and to maintain the state of polarization by improved component and circuit designs, the only deviations would be due to the finite bandwidth of the components and the dispersion of the waveguide. It is only errors of the cone inferred by the data processor from a data set that will propagate to subsequent phase measurements.

Fluctuations in the calibration source power during the training set collection can displace data from the circular cone and thereby impair construction of the linear map, leading to error in the phase retrieval. The resolution of this issue, if significant, is to monitor the calibration source power to correctly scale the length of each object vector sample. For $n = 3$, the $1 \times 2$ input splitter may be replaced by a $3 \times 3$ input coupler. This has the merit of a symmetric architecture more robust to fabrication process variations, and the otherwise unused central egress port of the input coupler may monitor the input power. It is only necessary that the measurement is proportional to the input power; a precise value of responsivity is not required.
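The scaling step described above can be sketched numerically. The example assumes an ideal lossless three-phase interferometer and a monitor reading that is proportional to the source power through an unknown responsivity; the fluctuation level, the responsivity value, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2 * np.pi * np.arange(3) / 3
phase = np.linspace(0, 2 * np.pi, 50, endpoint=False)

# Calibration source power fluctuating from sample to sample (hypothetical 5% spread).
power = 1.0 + 0.05 * rng.standard_normal(phase.size)

# Monitor reading: proportional to the power, with an unknown scale factor.
responsivity = 0.37
monitor = responsivity * power

# Raw three-port samples scale with the instantaneous source power.
raw = power[:, None] / 3 * (1 + np.cos(phase[:, None] + theta))

# Dividing each sample by its monitor reading removes the fluctuation.
normalized = raw / monitor[:, None]

raw_sum = raw.sum(axis=1)                # varies with the source power
normalized_sum = normalized.sum(axis=1)  # constant, equal to 1 / responsivity

print(raw_sum.max() - raw_sum.min())
print(normalized_sum.max() - normalized_sum.min())
```

The normalized samples share a common total regardless of the responsivity value, which illustrates why only proportionality of the monitor to the input power is needed.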

To evaluate long-term stability, an experiment was performed in which training and test data sets were collected at time intervals of several hours, and the results showed significant long-term stability. The prototype featured no input power monitoring, temperature sensor, or control mechanism. Thus, an experimental study to assess long-term stability with proper temperature control and input power monitoring is left to future work. Nevertheless, it is expected that the principal source of drift is the temperature sensitivity of the bias phase of the interferometer. This can be corrected by collecting training set data over a range of temperatures as measured by an on-chip temperature sensor. It is expected that the estimated linear maps corresponding to different temperatures will differ by a rotation. Moreover, the rotation angle or, equivalently, the phase bias is expected to vary linearly with temperature over the range of interest [21]. Consequently, knowledge of the temperature coefficient is enough to compensate for temperature drift.
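The proposed compensation can be sketched as a rotation of the retrieved quadrature components by an angle proportional to the temperature change. This is an illustrative sketch only: the temperature coefficient `K_T`, the calibration temperature `T_CAL`, and the function names are hypothetical, and the linear drift model is the expectation stated above rather than a measured result.

```python
import numpy as np

K_T = 0.8     # hypothetical phase-bias temperature coefficient, rad/K
T_CAL = 25.0  # hypothetical temperature at which the linear map was trained, deg C

def rotation(angle):
    """2 x 2 rotation matrix acting on (in-phase, quadrature) components."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s],
                     [s,  c]])

def compensate(quadrature, temperature):
    """Undo the temperature-induced bias rotation before taking the arctangent."""
    drift = K_T * (temperature - T_CAL)
    return rotation(-drift) @ quadrature

# Simulated measurement: the true phase plus a bias drift at 27.5 deg C.
phase_true = 0.6
temperature = 27.5
drift = K_T * (temperature - T_CAL)
measured = np.array([np.cos(phase_true + drift), np.sin(phase_true + drift)])

corrected = compensate(measured, temperature)
phase_est = np.arctan2(corrected[1], corrected[0])
print(phase_est)
```

In this model a single coefficient and one temperature reading suffice, so the linear map itself would not need to be retrained at each temperature.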

4. CONCLUSION

In conclusion, this work has analyzed an interferometer with three or more polyphase outputs. The theoretical analysis has informed the formulation of a machine learning and data-processing method that corrects for imperfections of the interferometer components. The simulations demonstrate that a precision limited only by the level of random noise is attainable to the extent the model of the interferometer captures all significant impairments. The experimental observations using an MZI-based $\mathrm{Si_3N_4}$ wavelength meter demonstrate an order of magnitude reduction in frequency estimation error compared with the conventional method. The maximum residual error is limited to $\pm 0.35\ \mathrm{GHz}$ over a 50 GHz FSR.

Acknowledgment

The authors acknowledge Huawei Technologies Canada for its support through a project contract. T. J. Hall is grateful to the University of Ottawa for its support of a University Research Chair. G. M. Hasan acknowledges the Ontario Student Assistance Program for its support through the Ontario Graduate Scholarship. G. M. Hasan is also grateful to the University of Ottawa for its support through an international admission scholarship.

Category: Integrated Optics

Received: Aug. 23, 2022

Accepted: Jan. 3, 2023

Published Online: Feb. 24, 2023

The Author Email: Gazi Mahamud Hasan (ghasa102@uottawa.ca)

DOI:10.1364/PRJ.473686