Integrated photonic fractional convolution accelerator

Kevin Zelaya; Mohammed-Ali Miri

doi:10.1364/PRJ.517491

1. INTRODUCTION

Convolution is one of the fundamental mathematical operations in signal processing, which is core to various filtering tasks [1]. By convolving a signal with a filter or kernel, one can identify patterns, detect edges, and remove noise, making it an essential tool in many applications such as image and audio processing. Since the introduction of convolutional neural networks, which have been largely successful in various applications, convolution has gained further interest as one of the most important processes for feature extraction in deep learning [2]. In optics, since the early days of optical signal processing, there has been great interest in optically performing the convolution operation [3 –9]. In free space optics, high-dimensional convolution is readily performed in the Fourier domain by using the so-called 4f system that utilizes two lenses for realizing discrete Fourier transform (DFT), in the inverse operation on a spatially modulated light wavefront, while the kernel is encoded using electronically controllable spatial light modulating devices [10]. In integrated photonics, some recent solutions have been reported to perform the DFT operation [11,12] and the convolution in the form of a matrix-vector multiplication and by utilizing wavelength-division multiplexing to encode information in different colors [13,14]. These are promising devices that attract wide attention for their potential to speed up classification problems in artificial neural networks [14 –16]. Nevertheless, an on-chip realization of the convolution operator in the Fourier domain has remained challenging. Recently, there have been proposals for performing convolution on-chip by utilizing multimode interference in broad-area waveguides with large footprints to approximate the DFT operation [15].

The fractional Fourier transform (FrFT) was first introduced by Namias [17] as a way to extend the conventional Fourier transform (FT) for studying the dynamics of free and forced quadratic systems in quantum mechanics. Since its conception, it has been the subject of extensive research due to its potential applications in areas like signal processing and optics, including optimal filtering [18,19] and time-frequency analysis [20,21]. Despite its usefulness, the FrFT does not straightforwardly inherit all the desired mathematical properties of the FT, namely, the translation invariance, correlation, and convolution of signals. In particular, several definitions of the fractional convolution operation have been introduced, fulfilling different properties. For instance, Mendlovic and Ozaktas [22] defined it by replacing the conventional FT with the FrFT of arbitrary order. Generalized convolution and product theorems using FrFT have been reported in the literature [23 –26], which are not strictly equivalent among themselves as they are based on modified translation invariance theorems; as a result, the product theorem gets replaced by a more relaxed condition. Likewise, several definitions and constructions of the discrete fractional Fourier transform (DFrFT) have been reported in the literature, each with different properties. Particularly, Ref. [27] reported a construction based on the set of Krawtchouk polynomials and its application using grated index media. In turn, Candan et al. [28] introduced an alternative construction in terms of the Harper equation, and the so-called Jx photonic lattice [29] provides another mechanism to implement the DFrFT in waveguide arrays on chip-scale devices [30]. Furthermore, these lattices have been experimentally realized in the microwave domain using interdigital capacitors to generate the required coupling strength between microstrip transmission lines [31].

In this paper, we introduce a modified convolution operation based on the discrete fractional Fourier transform, implemented in a simple photonic circuit involving only waveguides and modulator arrays. Mainly, the photonic Jx lattice [29,30,32] is used as the generator of the DFrFT operation. In contradistinction to the DFT, which relies on bulky lens setups that compromise its scalability, the Jx photonic lattice is reasonably easy to fabricate through open-access foundries and can be built up in a relatively small chip-scale setup. Furthermore, for large input ports, the DFrFT converges to the continuous FT for the appropriate length, and thus any operation involving the DFrFT is expected to converge to that of the FT, including the convolution operation. This establishes a comparison point for the results obtained from our architecture and those expected from a conventional convolution scheme. Electromagnetic wave simulations are carried out to test the performance of the proposed device, and error defects are separately studied to analyze the resilience of the architecture.

2. FRACTIONAL CONVOLUTION

Consider the vectors $f = {(f_{- j}, \dots, f_{j})}^{T} \in C^{N}$ and $k = {(k_{- j}, \dots, k_{j})}^{T} \in C^{N}$ , where $N = 2 j + 1$ , $j \in Z^{+}$ , and $f^{T}$ stands for the matrix transposition. The convolution of $f$ and $k$ , henceforth considered as the input signal and convolution kernel, respectively, is defined as ${(f * g)}_{p} = \sum_{q} f_{q} k_{p - q}$ . Alternatively, one can rewrite the latter in terms of the discrete Fourier transform (DFT) of the aforementioned vectors, $F [f] = F$ and $F [k] = K$ . This allows rewriting the convolution operation as $\sqrt{N} F {[(f * k)]}_{p} = F_{p} ⊙ K_{p} \equiv {(F_{- j} K_{- j}, \dots, F_{j} K_{j})}^{T}$ , with $⊙$ denoting the pointwise vector multiplication or Hadamard product. The DFT is defined by the matrix multiplication $F [f] = F f$ , with the matrix elements $F_{p, q} = e^{- 2 i π \frac{p q}{N}} / \sqrt{N}$ . The convolution is then recovered by performing the inverse DFT. Although mathematically equivalent, the latter form is more convenient for physical applications as the Fourier transform (FT) can be implemented using a conventional lens configuration [10].

Throughout this work, we exploit the latter construction in order to implement an analogous setup based on the proper discrete fractional Fourier transform (DFrFT) and its asymptotic properties. Following the continuous fractional convolution proposed in Ref. [22] (see also Refs. [25,26]), we define the corresponding discrete counterpart of order $α$ as $\sqrt{N} {(f \underset{α}{*} k)}_{q} = F^{2 π - α} {[F (α) ⊙ K (α)]}_{q},$ (1)where $F (α) ≔ F^{α} [f]$ and $K (α) ≔ F^{α} [k]$ are the $α$ -order discrete fractional Fourier transform (DFrFT) of $f$ and $k$ , respectively.

Before proceeding with the definition of the DFrFT, it is worth mentioning that the construction in Eq. (1) is supported by the following arguments. (i) The DFrFT used in this work can be manufactured in a compact on-chip device (e.g., in a cost-effective silicon-on-insulator platform) using photonic waveguide arrays, where the fractional order $α$ is associated with the array length. (ii) As discussed later, in the limit of large arrays $j \to \infty$ and for channels such that $q ≪ j$ , the DFrFT under consideration converges to the continuous FT. Although this limit case is unpractical for on-chip solutions, this establishes a benchmark to numerically compare the fractional convolution with the conventional one.

For the present work, the Jx photonic lattice is used as the passive element in the architecture to perform the DFrFT operation. The wave evolution through such a lattice is based on the coupled-mode theory, defined in terms of the symmetric and tridiagonal tight-binding matrix Hamiltonian $H_{p, q} = κ_{p} δ_{p + 1, q} + κ_{p - 1} δ_{p - 1, q}$ , where $κ_{p} = \frac{\tilde{κ}}{2} \sqrt{(j - p) (j + p + 1)}$ stands for the coupling parameter, $p, q \in {- j, \dots, j}$ the waveguide number (channel), $\tilde{κ}$ a scaling factor, and $N = 2 j + 1$ the total number of waveguides in the array. The eigenvalue problem $H u^{(n)} = λ_{n} u^{(n)}$ is well-known in the context of angular momentum in quantum mechanics [33], the eigenmodes (henceforth called supermodes interchangeably) and eigenvalues of which are $u_{q}^{(n)} = 2^{q} \sqrt{\frac{(j + q)! (j - q)!}{(j + n)! (j - n)!}} P_{j + q}^{(n - q, - n - q)} (0), λ_{n} = n,$ respectively, for $\tilde{κ} = 1$ , and $q, n \in {- j, \dots, j}$ . The upper-index $n$ denotes the lattice eigenmode number, the lower-index $q$ denotes the channel number, and $P_{n}^{(a, b)} (z)$ denotes the Jacobi polynomials [34].

Coupled-mode theory dictates the wave evolution of the electric field modal amplitudes $E = {(e_{1}, \dots, e_{N})}^{T}$ as they propagate through the lattice. Such a propagation is ruled by the Schrödinger-like equation [35,36], $i d E / d z = H E$ , where $z$ is the propagation distance. That is, the action of the unitary evolution operator $G (z = α) = e^{- i H α}$ on a given initial electric field $E (z = 0)$ generates the output field $E (z = α)$ . The matrix components of evolution operator $G$ write as [37] $G_{p, q} (α) = \sum_{k} e^{- i α n} u_{k}^{(p)} u_{k}^{(q)} \equiv i^{p - q} {[\sin (\frac{α}{2})]}^{q - p} \times {[\cos (\frac{α}{2})]}^{- q - p} \sqrt{\frac{(j + p)! (j - p)!}{(j + q)! (j - q)!}} \times P_{j + p}^{(q - p, - q - p)} (\cos α),$ (2)where the unitary operator $G (α)$ clearly meets all the required properties for the DFrFT, that is, (i) unitarity, (ii) additivity rule $(F^{α} \circ F^{β}) [f] = F^{α + β} [f]$ , and (iii) the existence of the cyclic index $F^{α = 0} [\cdot] = F^{2 π} [\cdot] = I$ [38], with $I$ the corresponding identity matrix in $C^{N}$ . The cyclic index property $G (2 π) \equiv I$ follows from the equidistant spectrum of $H$ and the spectral decomposition in Eq. (2).

Although the latter properties may be considered the essential building blocks to define a DFrFT properly, additional conditions may be taken into account, such as the reduction to the continuous FT for infinitely many input ports [27,28], which is also the case for the Jx lattice in consideration. It is worth mentioning that the exponential form of the evolution operator is equivalent to the spectral decomposition; nevertheless, the opposite is not necessarily true. Indeed, the DFrFT introduced in Ref. [28] is based on the spectral decomposition using the eigenvectors of a lattice Hamiltonian associated with the Harper equation; however, this is not related to the evolution of the Hamiltonian itself. Such a class of models is challenging for photonic implementations like the one presented in this work.

For our construction, the inverse order $- α$ is, in practice, implemented for the optical length $2 π - α$ . Furthermore, following the cyclic property, the input signal spreads and reconstructs itself at $α = 2 M π$ , for $M \in Z^{+}$ . Lastly, the eigensolutions $u^{(n)}$ converge in the limit $N \to \infty$ to the continuous Hermite-Gaussian modes (see Methods section in Ref. [29]), and thus Eq. (2) converges to the FT for $α = π / 2$ and $j \to \infty$ . Since our convolution architecture is expected to reduce to the conventional convolution for a large enough number of channels and $α = π / 2$ , one can compare its outcome with that of the conventional convolution.

3. PHOTONIC IMPLEMENTATION

Let us consider the block diagram in Fig. 1, which fully captures the mathematical operation related to the fractional convolution scheme in Eq. (1). The architecture under consideration requires three stages: (i) the input signal $f$ and the convolution kernel $k$ shall be transformed using a DFrFT of order $α$ for each, (ii) such that the transformed signal and kernel are pointwise multiplied and (iii) finally inverse transformed through a final DFrFT layer of order $2 π - α$ . For the rest of the manuscript, we focus on the fractional order $α = π / 2$ .

$A block diagram representation of the proposed discrete fractional convolution operation defined in Eq. (1), where the DFrFT operator is described by the unitary operator Eq. (2). The symbol ⊙ represents the pointwise or Hadamard product.$

Figure 1.A block diagram representation of the proposed discrete fractional convolution operation defined in Eq. (1), where the DFrFT operator is described by the unitary operator Eq. (2). The symbol $⊙$ represents the pointwise or Hadamard product.

Download full size

View all figures

The latter allows for visualization of the involved steps in the photonic implementation. For instance, a photonic Jx lattice of (normalized) length $π / 2$ can perform the DFrFT transform on the input signal, rendering the required signal $F (π / 2)$ of step (i). The DFrFT transform of the kernel [ $K (π / 2)$ ] of step (i) and the pointwise multiplication of step (ii) can both be performed in a single step by introducing an array of $N$ programmable Mach-Zehnder interferometers (MZIs) coupled to the end of the first Jx lattice. Each MZI in the array is composed of two 50:50 directional couplers and two phase shifters $ϕ_{n}$ and $θ_{n}$ , with $n = - j, \dots, j$ , enabling both amplitude and phase modulation. Thus, the $q$ -th component of the transformed input signal, $F_{q} (π / 2)$ , transforms accordingly as ${\tilde{F}}_{q} (π / 2) ≔ e^{i (ϕ_{q} + θ_{q} / 2 + π / 2)} \sin (θ_{q} / 2) F_{q} (π / 2)$ . Here, $U_{DC} ≔ (σ_{0} - i σ_{1}) / \sqrt{2}$ is the operator characterizing the 50:50 directional couplers, with $σ_{j}$ the conventional Pauli matrices. In this form, the phases $θ_{q}$ and $ϕ_{q}$ are tuned so that the complex-valued multiplicative factor in ${\tilde{F}}_{q} (π / 2)$ renders the $q$ -th component of the transformed kernel, $K_{q} (π / 2)$ , i.e., $K_{q} (π / 2) = e^{i (ϕ_{q} + θ_{q} / 2 + π / 2)} \sin (θ_{q} / 2)$ . Once this phase-tuning operation is performed across all the $N$ channels, the required pointwise operation $F (π / 2) \cdot K (π / 2)$ is ultimately achieved. Lastly, step (iii) is accomplished by coupling a final Jx lattice of length $3 π / 2$ that performs the inverse transform operation.

We remark that once the transformed kernel $K (α)$ has been programmed, it can be used to sequentially analyze and extract the features of multiple signals $f$ . Also, if the transformed kernel requires amplitude coefficients larger than one, it is always possible to rescale the kernel so that it can be programmed into the MZI array. The latter follows from Eq. (1), where it is clear that a global constant can be factored out of the architecture without modifying the convolution operation.

For completeness, the sketch in Fig. 2 illustrates the overall photonic architecture. This is based merely on waveguide coupling to achieve both the Jx lattice and the 50:50 directional coupler operations, whereas the phase manipulation can be performed by using either thermo-optical [39] or phase-change material [40] solutions, widely available on silicon-based platforms for production through open-access foundries. More details on the circuit designs are discussed below.

Figure 2.Integrated photonic circuit sketch for the proposed DFrFT-based convolution accelerator. The architecture comprises two DFrFT lattices of normalized lengths $π / 2$ and $3 π / 2$ , rendering the direct and inverse DFrFT of order $α = π / 2$ , respectively. The intermediate pointwise multiplication is performed through an array of programmable phase shifters. These are electrically tuned by the phase sets ${e^{i θ_{n}}}$ and ${e^{i ϕ_{n}}}$ so that this layer performs $K (α) ≔ F^{α} [k]$ .

Download full size

View all figures

Throughout this section, the convolution architecture is tested under scenarios by programming different kernels with properties known in the conventional convolution case. This establishes a benchmark to test the performance of the current device and the results of the expected convolution. The operations considered are signal displacement, Gaussian filtering, and edge detection. Each of these cases is associated with a specific convolution kernel, and the test is performed with different signals, some including background noise to push forward the capabilities of the architecture.

A. Displacement Filter

This particular setup considers the lattice eigenmodes $u^{(n)}$ as the signal input. These eigenmodes can be experimentally produced at the output of an additional photonic Jx lattice of length $α = π / 2$ by exciting its $n$ -th input channel. For instance, one generates a Gaussian-like distribution by exciting the $j$ -th channel, which is the fundamental supermode and, from the oscillation theorem of Hermitian operators [41], is the only one with no oscillations [42]. This is the only supermode with such behavior so that, in the continuous limit, it converges to a Gaussian distribution.

For the convolution kernel, we use a delta-like function $k_{q}^{(δ, r)} = δ_{q, r}$ , with $r \in {- j, \dots, j}$ and $δ_{q, r}$ the Kronecker delta function. This is equivalent to exciting only the $r$ -th convolution channel of the device. The straightforward calculation shows that $K_{p}^{(δ, r)} (π / 2) ≔ {(- i)}^{r - p} u_{p}^{(m)}$ , which shall be programmed into the MZI array of the architecture, and the corresponding DFrDT of the Gaussian-like input signal becomes $F_{p} (π / 2) = i^{j} u_{p}^{(j)}$ . The fractional convolution takes the form ${(f \underset{π / 2}{*} k)}_{q} = {(- 1)}^{r} i^{q + r + j} [u_{0}^{(q)} u_{0}^{(j)} u_{0}^{(r)} + \sum_{p = 1}^{j} u_{p}^{(j)} (u_{p}^{(q)} u_{p}^{(r)} + u_{- p}^{(q)} u_{- p}^{(r)})]$ (see Appendix A for a detailed proof). For even $j$ and following the parity symmetry of the Jx lattice eigenmodes, the convolution vanishes at the odd output channel if $r$ is even; likewise, the output vanishes at the even channel for odd $r$ . Furthermore, the roles are interchanged for odd $j$ . For the sake of simplicity and without loss of generality, we henceforth focus only on even $j$ . To avoid the latter partial suppression at the output, we can exploit the linearity of the convolution scheme Eq. (1) and alternatively use the convolution kernel $k_{q}^{(δ_{2}, r)} = δ_{q, r} + δ_{q, r + 1}$ composed of two consecutive delta-like signals so that both even and odd channels of $u^{(j)}$ contribute to the convolution process.

Further remarks can be obtained for more complex input functions. Since the set of eigenfunction ${u^{(n)}}_{n = - j}^{j}$ forms an orthonormal basis in $C^{2 j + 1}$ , any arbitrary input signal can be decomposed into the linear combination $f = \sum_{m = - j}^{j} C_{n} u^{(m)}$ , with $C_{m} = {[u^{(m)}]}^{†} f$ the corresponding expansion coefficients. For linear combinations of even ( $f_{- q} = f_{q}$ ) or odd signals ( $f_{- q} = - f_{q}$ ), the delta-like convolution kernel produces an output exclusively on the even and odd channels, respectively.

It is straightforward to note that the delta-like kernel $k^{(δ, 0)}$ in the conventional convolution (using the DFT) leaves the input signal unchanged, whereas the kernel $k^{(δ, r)}$ displaces the signal by $r$ units. This establishes a reference framework for our fractional convolution. To this end, let us consider $r = 0$ for the delta-like and double-delta kernel, combined with one of the eigenmodes $u^{(s)}$ at the input. The eigenmode number $s$ is fixed as $s = 20$ and $s = 10$ for $j = 20$ so that we have nodeless and highly oscillatory signals at the input. For $j = 100$ , we use the corresponding modes, $s = 100, 90$ , that match the number of nodes of the previous case. The resulting fractional convolution is depicted in Figs. 3(a) and 3(b) for $j = 20$ and $j = 100$ , respectively, where the suppression at the odd channel of the output is evident for the delta-like kernel, as predicted. In turn, the double-delta kernel $k^{(δ_{2}, 0)}$ fixes the output suppression and leads to an output signal akin to the respective input eigenmode. To quantify any difference between the input and output, we employ the distance function $d_{y_{1}, y_{2}} = ∥ y_{1} - y_{2} ∥,$ (3)with $∥ \cdot ∥$ the Euclidean norm in $C^{2 j + 1}$ .

Figure 3.(a) Convolution of the eigenmodes $u^{(20)}$ (left column) and $u^{(10)}$ (right column) with the convolution kernel $k^{(δ_{2}, 0)}$ (upper row) and $k^{(δ, 0)}$ (lower row) for $j = 20$ ( $N = 41$ ). (b) Convolution of the eigenmodes $u^{(100)}$ (left column) and $u^{(90)}$ (right column) with the convolution kernel $k^{(δ_{2}, 0)}$ (upper row) and $k^{(δ, 0)}$ (lower row) for $j = 100$ ( $N = 201$ ).

Download full size

View all figures

Here, we consider $y_{1} = u^{(n)}$ as the input eigenmode and $y_{2}$ the corresponding normalized convolution at the architecture output when the double-delta kernel $k^{(δ_{2}, 0)}$ is used. It is worth stressing that the convolution output has been normalized in order to obtain a meaningful comparison. The corresponding results are shown in Fig. 4. On the one hand, it is noted that the distance is smaller for $j = 100$ as compared to $j = 20$ for the lowest and highest eigenmodes $n = j$ and $n = - j$ , leading to a good fidelity at the output. This agrees with the corresponding convolution results presented in Fig. 3. On the other hand, the distance increases for eigenmodes at and around $n = 0$ , where more deviations are expected for $j = 100$ compared to $j = 20$ .

Figure 4.Continuous interpolation of the distance function $d_{y_{1}, y_{2}} = ∥ y_{1} - y_{2} ∥$ [Eq. (3)] between the input eigenmodes $y_{1} = u^{(n)}$ and the corresponding normalized convolution operation using the kernel $y_{2} = k^{(δ_{2}, 0)}$ . The distance has been plotted as a function of the eigenmode number $n \in {- j, \dots, 0, \dots, j}$ for $j = 20, 50, 100$ .

Download full size

View all figures

Once the fidelity of the convolution process with delta-like and double-delta kernels has been tested, we proceed to analyze the displacement produced by the kernel $k^{(δ_{2}, r)}$ , with $r \neq 0$ , where the double-delta function has been considered to avoid any output suppression. The effects of this kernel are tested for $j = 20$ and displacements to the right amounting to $12.5 %$ ( $r = 5$ ), $25 %$ ( $r = 10$ ), and $37.5 %$ ( $r = 15$ ) for the inputs $u^{(18)}$ and $u^{(15)}$ . For both inputs, displacements of $12.5 %$ and $25 %$ produce an output that displaces without any major distortion, as the right tail of signals vanishes prior to reaching the lattice edge (see Fig. 3). This is not the case with displacements of $37.5 %$ , where the output becomes perturbed in channels near the edge. For $j = 100$ , we consider displacement kernels equivalent to those of the previous case, namely, $12.5 %$ ( $r = 25$ ), $25 %$ ( $r = 50$ ), and $37.5 %$ ( $r = 75$ ), as well as inputs with the same number of oscillations, such as $u^{(98)}$ (two nodes) and $u^{(95)}$ (five nodes). Figure 5 displays the resulting output, which exhibits only subtle distortions when subjected to $75 %$ displacements. For eigenmodes with the same quantity of oscillations, the tail of both modes is larger than the scenario where $j = 20$ , which results in reduced edge effects leading to less interference at the edges.

Figure 5.(a) Convolution of the eigenmodes $u^{(15)}$ (upper row) and $u^{(18)}$ (lower row) with the kernel $k^{(δ_{2}, r)}$ for $r = 5$ (yellow), $r = 10$ (green), and $r = 15$ (red) for $j = 20$ . (b) Convolution of the eigenmodes $u^{(95)}$ and $u^{(98)}$ with the kernel $k^{(δ_{2}, r)}$ for $r = 25$ (yellow), $r = 50$ (green), and $r = 75$ (red) for $j = 100$ .

Download full size

View all figures

B. Gaussian-Like Smoothing Filter

In this section, we consider the $u^{(j)}$ filter and a discrete Gaussian filter as testing kernels to analyze the output for a noisy signal composed by a rectangular window function with added background Gaussian noise. Before proceeding, it is worth comparing the two filters. In order to take them to equal grounds, we generate discrete Gaussian filters with the same standard deviation of the corresponding $u^{(j)}$ filter. For instance, for $j = 20$ , the standard deviation of $u^{(j)}$ is $σ \approx 3.1623$ . The corresponding discrete Gaussian filter with the same standard deviation and sampled in the same grid leads to a kernel indistinguishable from the first one. The difference between these two kernels can be quantified through the distance Eq. (3). For $j = 20$ and $j = 100$ , the distances are approximately 0.0111017 and 0.0010284, respectively, and become smaller for larger waveguide arrays. This reinforces the use of the filter $u^{(j)}$ as an alternative candidate for a Gaussian filter, which in turn can be generated from another Jx lattice with length $α = π / 2$ by exciting its $j$ input channels. Nevertheless, contrary to a Gaussian filter, the standard deviation of the filter $u^{(j)}$ is fixed for each $j$ and cannot be modified. This means that narrow or peaked signals might be suppressed during the process, and thus this filter is better suited for smoothing wide-enough signals.

Let the input signal $f$ composed of a rectangular window with added Gaussian noise and the Gaussian-like filter $u^{(j)}$ be the inputs for our convolution device. We focus on the case $j = 100$ so that each DFrFT element in the device approximates the conventional DFT, and thus results close to the conventional convolution are expected. The output of the latter is depicted in Fig. 6, where the noisy input signal is also shown (lower inset) for reference. This figure illustrates that the proposed convolution solution performs the intended smoothing process while averaging the initial noise. In this form, the hidden rectangular window inside the noise is still readable in the processed output. Still, the output contains noise that is inherently introduced from the Jx photonic lattice. As discussed above, any signal and convolution kernel lead to an output that splits into the even and odd channels, depending on the parity of $j$ . This classification at the output reveals that an additive term exists in the even output that is not present on the odd one, or vice versa, depending on the parity of $j$ and the input signal $f$ [see Eqs. (A2) and (A3) of Appendix A].

Figure 6.Convolution norm $‖ {(f \underset{π / 2}{*} k)}_{q} ‖$ using the $u^{(j)}$ filter and a noisy signal $f$ (lower inset) composed of a rectangular window plus Gaussian noise for $j = 100$ . The upper-left and upper-right insets depict the even and odd pooling, respectively, associated with the convolution output.

Download full size

View all figures

In filtering processes, it is common to introduce the notion of pooling layers, in which the effects of some filtering procedures are sharpened by splitting the sampling space into sub-groups and performing specific operations on the latter. As a result, the sampling space of the output of the pooling layer is smaller than the original output. This is particularly common in convolutional neural networks, where the pooling process depends on the intended task at hand, such as minimum, maximum, and average pooling [43]. Here, we introduce the even and odd pooling, an operation that consists of splitting the output signal into its even ( $2 q$ ) and odd channels ( $2 q + 1$ ). That is, for even pooling, only the information of the output channels $2 q$ is analyzed, and likewise for the odd pooling. Although the resulting sampling space is half of the original output, the previously described noisy behavior is suppressed. This is particularly illustrated in the upper insets in Fig. 6.

C. Edge Detection Filter

Although a large variety of edge detection filters exist, the general idea behind them lies in the smoothing of the signal and the gradient of the same. The smoothing is necessary to avoid unnecessary abrupt changes or singularities of step-like signals. For each smoothing filter, one may associate an edge detection scheme, which might or not be suitable for the signal in question. For the conventional convolution, it is quite a straightforward fact that the differentiation operator commutes with the convolution operation. This means that, instead of determining the gradient of the filtered signal, one can, in principle, compute the convolution using the gradient of the filter kernel instead. This simplifies the complexity of the edge detection scheme. The Gaussian derivative, the first Hermite–Gaussian mode, becomes the most natural edge detection filter as it comprises the Gaussian filtering and the gradient operation required for edge detection. Other alternatives exist in the literature, such as the Canny [44] and Laplacian [45] filters.

In our scheme, the eigenmode $u^{(j - 1)}$ becomes the natural candidate for the edge detection kernel as it converges to first-order Hermite-Gaussian mode in the continuous limit. This can be easily generated by exciting the $(j - 1)$ -th input of a Jx photonic lattice of length $π / 2$ . One of the drawbacks of such a filter is the lack of control over its width, which limits edge detection in narrow signals. We thus consider the double-box (Canny edge) [44] kernel $k^{(C)} = - δ_{q, - 2} - δ_{q, - 1} + δ_{q, 2} + δ_{q, 3} \equiv (0, \dots, - 1, - 1, 0, 0, 1, 1, \dots, 0)$ as the second filter to perform edge detection. Here, we placed two consecutive zeros at $q = 0$ and $q = 1$ so that the kernel contains an even number of elements, which, as previously discussed, prevents the output from showing additional noisy behavior for even $j$ .

For the input signal, we devise an ideal sequence of window functions of different widths and spacings. Such an input serves as an enlightening test scenario to understand any potential issues and advantages for each filter, the outcome of which is depicted in Fig. 7. In the latter, one can identify a relatively large window at the left of the input. The $u^{(j - 1)}$ filter is capable of identifying the edges of the signal in a clean way. Although the double-box filter accomplishes the same task, the signal becomes noisier than the first filter. The situation changes in the intermediate region of the signal, where a narrow window appears. As previously argued, the first filter fails to detect the edges, whereas the double-box filter does the job. Lastly, on the right side of the signal, the first filter spikes up at the outer edges of the signal and fails to detect the inner sides.

Figure 7.Input signal (black-shaded area) and the corresponding convolution norm using the filter $u^{(j - 1)}$ (lower row) and double-box (Canny edge) filter $k^{(c)}$ (upper row) for a waveguide array with $j = 100$ .

Download full size

View all figures

Thus, the double-box filter provides a more accurate detection scheme but induces extra noise into the output, which is a less-than-ideal scenario when dealing with more complex input signals, such as those with background noise. To verify this, let us consider a simple window signal exposed to a background Gaussian noise with a standard deviation $σ = 0.3$ . We now use this noisy signal to test the edge detection capacities using the previously introduced two filters. The results are shown in Fig. 8, where it is clear that the $u^{(j - 1)}$ does a better job suppressing the initial background noise and finding the edges of the clean signal. The even and odd pooling at the output improves the signal quality by removing the oscillatory behavior between consecutive even and odd ports, as seen in the upper-left and upper-right insets of Fig. 8. The double-box filter produces an output indistinguishable from noise (see the upper-center inset of Fig. 8), an issue not fixed with the above-mentioned pooling scheme. As a possible workaround to this problem, one can consider a wider double-box kernel, namely, $k^{(d - C)} ≔ {(0, \dots, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, \dots, 0)}^{T}$ . The width of such a modified double-box filter can be steered in order to control the effectiveness of detecting peaked and noisy signals.

Figure 8.Edge detection scheme using the noisy rectangular signal of Fig. 6 and the convolution kernel $u^{(j - 1)}$ (lower row) and the double-box filter $k^{(c)}$ (upper row) with $j = 100$ . In both cases, the upper-left and upper-right insets depict the even and odd pooling of the convolution output, respectively.

Download full size

View all figures

4. SIMULATIONS AND WORKING PLATFORM

For completeness, a wave simulation of the proposed architecture is performed further to test the reliability of the fractional convolution device. To this end, the beam-envelope formulation is used in COMSOL Multiphysics to speed up the design and computational time of the architecture, valid for electromagnetic waves whose oscillations in the propagation direction are much faster than those in the perpendicular direction. In this regime, electric fields take the form $\vec{E} (\vec{r}; t) = \vec{E} ({\vec{r}}_{T}) e^{- i (w t + \vec{k} \cdot {\vec{r}}_{L})}$ , with ${\vec{r}}_{T}$ and ${\vec{r}}_{L}$ the position vectors on the transverse and longitudinal directions of propagation, respectively. Furthermore, $\vec{k} = β {\hat{r}}_{L}$ is the propagation vector with $β$ the corresponding propagation constant. Here, we consider a two-dimensional setup in which the direction of propagation is fixed at $\hat{x}$ so that the transverse electric field takes the form $E (\vec{r}) = E (y) \hat{z}$ .

In the current setup, we consider a silicon-on-insulator (SOI) platform to develop the proposed architecture. To achieve this, consider a 1550 nm infrared source propagating through a silicon (Si) waveguide (core). The core has a transverse square shape with $200 nm$ of width and a refractive index of $n_{core} = 3.48$ , which renders an effective index of 2.85 for the effective two-dimensional model. The core is surrounded by a fused silica cladding ( ${SiO}_{2}$ ) with a refractive index of $n_{cla} = 1.47$ . This leads to a unique guided mode with propagation constant $β = 8.9151 \times 10^{6} rad / m$ . The guided mode has a maximum point at the core center that varies slowly within the core and decays exponentially throughout the cladding. To illustrate the complete convolution architecture, we consider a device with $N = 9$ channels ( $j = 4$ ), where the waveguides corresponding to the channels $p = - 1, 0, + 1$ are uniformly separated by 0.63 nm, whereas the subsequent waveguides are separated by 637.99 nm ( $p = 1, 2$ and $p = - 1, - 2$ ), 657.13 nm ( $p = 2, 3$ and $p = - 2, - 3$ ), and 700.11 nm ( $p = 3, 4$ and $p = - 3, - 4$ ). This separation ensures the coupling strengths required for the $J x$ lattice so that $\tilde{κ} L = π / 2$ is fulfilled for the lattice length $L = 139.32 μm$ , which is the required length to perform the DFrFT, and $L = 419.97 μm$ for the inverse DFrFT. The coupling length and separation of each 50:50 directional coupler are 29.35 μm and 630 nm, respectively. Passive phase shifters, such as small bends, are placed before any bend to compensate for any deviation in the optical path, whereas active phase shifters are placed after every 50:50 directional coupler to modulate phase and amplitude throughout the MZI.

Although phase shifters can be deployed through conventional thermo-optical effects [39,46], phase-change materials have opened the possibility for even more compact phase shifters. Particularly, an implementation based on ${Sb}_{2} {Se}_{3}$ has shown to achieve a phase shift of approximately $π / 11$ rads/μm for a 1550 nm source [40]. The ongoing simulations use the latter solution for the active phase elements to steer the properties of the MZI. From the previous considerations, and after including proper bends to connect the $J x$ lattices with the MZI elements and the phase shifters, the total dimension of the optical device becomes $1096 μm \times 89 μm$ . Clearly, this does not include the packaging, but it provides a reasonable estimation of the device size. As a particular example, the edge detection scheme is considered in our simulations, where the window signal $f = {0, 0, 0, 1, 1, 1, 0, 0, 0}$ is injected at the input and the kernel function $k^{(C)} = {0, 0, - 1, - 1, 0, 1, 1, 0, 0}$ is used as the convolution filter. Recall that the phase shifters of the MZI array are tuned so that they render the DFrFT of $k^{(C)}$ , which in this case becomes $K^{(C)} = {0.935 e^{- i \frac{π}{2}}, 0.935 e^{- i \frac{π}{2}}, 0.353 e^{- i \frac{π}{2}}, 0.353 e^{- i \frac{π}{2}}, 0, 0.353 e^{i \frac{π}{2}}, 0.353 e^{i \frac{π}{2}}, 0.935 e^{i \frac{π}{2}}, 0.935 e^{i \frac{π}{2}}}$ . Because of the small number of channels, the double-box filter has been used instead of the $u^{(j - 1)}$ filter. As discussed earlier, the latter filter is unsuitable for narrow input signals, such as the one considered here. The simulated propagation of the electric field modulus throughout the architecture is depicted in Fig. 9(a). The respective input and output electric field moduli are in Fig. 9(b), where the gray shaded area denotes the location of the waveguide core. Clearly, the simulation output reveals the two expected peaks, one next to each edge of the input signal, thus performing the desired edge detection. Furthermore, Fig. 9(c) portrays the theoretical results obtained from coupled-mode theory, from which one can notice a good agreement between the simulated and theoretical results.

Figure 9.(a) Propagation of the electric field modulus $| E (x, y) \hat{z} |$ throughout the convolution architecture. (b) Input and output field amplitudes of the simulated results, where the gray-shaded area denotes the location of the waveguide core. (c) Theoretical predictions from coupled-mode theory.

Download full size

View all figures

5. LATTICE DEFECTS

Hitherto, our architecture has been tested for different filtering processes and proved reliable even for noisy signals. Still, errors may appear during the manufacturing process of the Jx lattice, rendering waveguide arrays that deviate from the ideal one described by the Hamiltonian $H$ . Given the nature of the unpredicted errors, we can introduce a new lattice model accounting for such effects through added uniform random noise. To this end, let us consider the new perturbed lattice Hamiltonian $\tilde{H} (δ) = H + δ R$ , with $δ > 0$ a strength parameter and $R$ a $(2 j + 1) \times (2 j + 1)$ symmetric matrix whose components $R_{p, q}$ are taken from the uniform distribution with unit mean. In this form, the wave propagation is still ruled by a lossless process characterized by the unitary evolution operator $\tilde{G} (α; δ) = e^{- i α \tilde{H} (δ)}$ . Since the DFrFT of the convolutional kernel $k$ is programmed through the MZI array, we only require the implementation of two Jx lattices (see Fig. 2), and thus defects are only considered in such layers. For each one of the latter, we consider a different perturbation matrix $R$ so that defects may not be the same. Recall that the second DFrFT layer has a different length ( $2 π - α$ ), corresponding to the inverse DFrFT operation. Following the same treatment as in previous sections, we fix $α = π / 2$ throughout the rest of this section.

In order to understand the influence of the perturbation $R$ , we compute the relative error $E (δ) ≔ ∥ \tilde{G} (π / 2; δ) - G (π / 2) ∥_{F} / ∥ G (π / 2) ∥_{F}$ , where $∥ M ∥_{F} ≔ \sqrt{tr (M M^{†})}$ stands for the Frobenius norm of the complex matrix $M$ . The latter allows quantifying any deviation from the original DFrFT as a function of the perturbation strength $δ$ . To avoid any misleading behavior, we compute 100 perturbed DFrFT matrices [ $G (π / 2; δ)$ ] and their corresponding relative percent error [ $E (δ) \times 100 %$ ] for each $δ$ ; we then compute the mean and standard deviation per $δ$ . The results are shown in Fig. 10, where it can be seen that a relative error of $14.7 %$ is reached for $δ = 0.05$ , whereas the error rises to $28 %$ and $80 %$ for $δ = 0.1$ and $δ = 0.3$ , respectively. A graphic visualization of the DFrFT and its perturbed counterparts is presented as the matrix modulus [ ${| G (π / 2; δ) |}_{p, q}$ = $| G_{p, q} (π / 2; δ) |$ ] in the insets of Fig. 10. These figures reveal a mild distortion of the DFrFT for $δ = 0.05$ and $δ = 0.1$ , and an indistinct form for $δ = 0.3$ . We thus should expect similar results for relative errors up to approximately $28 %$ ( $δ \approx 0.1$ ).

Figure 10.Mean (line) and standard deviation (shaded area) of the relative percent error [ $E (δ) \times 100 %$ ] for a set of 100 randomly generated perturbed DFrFT matrices [ $G (π / 2; δ)$ ] per value of $δ$ . The insets depict $| G (π; δ) |$ (matrix modulus) of a random sample out of the set of 100 random matrices for $δ = 0.05, 0.1, 0.3$ .

Download full size

View all figures

Notably, we can determine the effects of the lattice defects on the edge detection scheme using both the $u^{(j - 1)}$ and double-box filters. Since the output of the perturbed convolution is also a vector in $C^{2 j + 1}$ and we are interested in deviations on the output vector compared to the ideal case, we use the distance function Eq. (3) with $y_{1}$ and $y_{2}$ the perturbed and unperturbed outputs. A set of 100 random matrices $G (π / 2; d)$ and another one for $G (3 π / 2; d)$ are generated so that multiple edge detection operations can be computed for each $δ$ . The corresponding distance is computed for every output with respect to the unperturbed one; the mean and standard deviations for each $δ$ are shown in Fig. 11. The distance using both filters leads to similar results, where it can be noticed that lattice errors of $14.2 %$ do not produce any relevant change in the edge detection for both filters [see insets in Figs. 11(a) and 11(b)]. For lattice errors around $28.1 %$ , the filter $u^{(j - 1)}$ is still performing well, whereas the double-box filter misses some of the edges during the detection scheme. In the latter, the even and odd pooling allows for a cleaner output where the edges are properly identified. This highlights the relevance of such pooling operations in this convolution architecture in the presence of lattice defects. In turn, the output is indistinct for an error of $55.1 %$ , where the pooling operations also fail to extract any relevant information.

Figure 11.Distance function ( $d_{y_{1}, y_{2}} (δ)$ ) between the unperturbed ( $δ = 0$ ) and perturbed ( $δ \neq 0$ ) convolution operations using the (a) $u^{(j - 1)}$ and (b) double-box filters. The plot is presented as a function of $δ$ and the lattice error $E (δ) \times 100 %$ . The mean (line) and standard deviation (shaded area) are computed by generating a set of 100 random perturbed DFrFT matrices [ $G (π / 2; δ)$ ] per value of $δ$ . The insets depict the output of the perturbed filtering using the even (yellow) and odd (red) pooling operation for $δ = 0.05, 0.1, 0.2$ .

Download full size

View all figures

6. CONCLUSIONS

The DFrFT based on the Jx photonic lattice proved to be a valuable resource for constructing an adequate convolution architecture. This choice was not fortuitous as the asymptotic behavior for large channel numbers reveals that such a lattice asymptotically reproduced the continuous FT operation. Interestingly, the same lattice can be used as a source to generate convolution kernels that can be later used in the convolution process. So far, the $j$ -th and $(j - 1)$ -th modes have proved to be suitable kernels to produce Gaussian-like smoothing and edge detection tasks, respectively. As a test setup, we used a single delta kernel in the middle channel of the convolution kernel, which, in the conventional convolution, leaves the input invariant. To quantify any difference, the distance was computed between the lattice eigenmodes at the input and their respective convolution output, which showed that eigenmodes around $n = 0$ are more prompt to deviate at the output. In turn, eigenmodes close to the edges are less affected, and their deviation decreases when a larger number of channels are considered.

For more general signals without any particular symmetry, the output decomposes into even and odd channel outputs ${(f \underset{π / 2}{*} k^{(δ)})}_{2 q}$ and ${(f \underset{π / 2}{*} k^{(δ)})}_{2 q + 1}$ , respectively. The latter can be written entirely in terms of $C_{m}$ and the basis elements $u^{(m)}$ (see Appendix A), from which one notices that even and odd channels at the output differ by an additive term that breaks any potential continuity contiguous channels. This explains the wavy behavior of the convolution when noisy signals are considered, such as those depicted in Figs. 6 and 8, and also justifies the introduction of even and odd pooling as a mechanism to clean the convolution output.

Interestingly, the architecture has shown to be a reliable source for performing edge detection tasks for both clean and noisy inputs. This is particularly achieved by either exploiting one of the lattice eigenmodes $u^{(j - 1)}$ as an edge-detection filter or using the conventional double-box filter. The former one suits better for signals whose width is at least of the order of the eigenmode $u^{(j)}$ width, whereas the double-box filter works better for narrow signals. This fact is further tested by computing wave simulations of the full architecture, which reveals a good match with the theoretical predictions. The simulation design reveals some challenges in achieving the ideal Jx lattice, mainly due to the preliminary coupling between the waveguides around the bends connecting the channel cross-talk. Numerical data from the simulation suggest that such undesired coupling occurs around 1–2 μm prior to the array end points for an initial waveguide separation of 630 nm in the central channels. This separation has been chosen as the total propagation length to achieve the DFrFT at 139.32 μm. This is much larger than the distance where the channel cross-talk occurs, and any potential deviations from the ideal behavior do not compromise the desired functionality.

Our convolutional architecture has shown resilience due to defects on the Jx lattice. This was elucidated by adding a random and symmetric interaction to the lattice Hamiltonian so that the wave evolution through the lattice is still unitary despite the included random defects. The intensity of such defects is controlled by employing a strength parameter $δ$ . The error induced in the wave evolution is relatively large even for $δ \approx 0.1$ . This was further illustrated by performing the edge detection operation, which is still reliable with lattice errors of up to $28 %$ . Thus, the coupling parameters $κ_{p}$ are allowed to have mild deviations from their exact values, and the architecture is still expected to perform the convolution task.

It is worth noting that the current convolution photonic design has the potential to operate passively for task-specific convolution kernels. This implies that for a specific kernel filter, optical bends can be used instead of active phase shifters to produce the required phase changes and pre-program the necessary kernel. This results in a device that consumes no external power and can process any number of signals to extract the corresponding feature pre-programmed in the kernel.

For completeness, one can alternatively construct a convolution scheme based on circulant convolutions. The convolution kernel has been replaced by a circulant matrix that multiplies the input signal, where the circulant matrix factorizes in terms of the DFT. Such an approach can be easily modified and implemented using the evolution operator [Eq. (2)], as proved in Appendix B. Nevertheless, the implementation of the latter solution as a photonic architecture is unclear, and the approach discussed throughout this work seems more adequate from a practical point of view, because it is possible to identify each of the matrices with optical elements.

Category: Silicon Photonics

Received: Jan. 3, 2024

Accepted: May. 21, 2024

Published Online: Aug. 2, 2024

The Author Email: Kevin Zelaya (kevin.zelaya@cinvestav.mx)

DOI:10.1364/PRJ.517491

CSTR:32188.14.PRJ.517491

$q$	$ℓ$	$s$	${(u^{(ℓ)} \underset{π / 2}{*} e^{(s)})}_{q}$	$q$	$ℓ$	$s$	${(u^{(ℓ)} \underset{π / 2}{*} e^{(s)})}_{q}$
$2 q$	$2 ℓ$	$2 s$	$μ_{2 ℓ, 2 s}^{(2 q)} + σ_{2 ℓ, 2 s}^{(2 q)}$	$2 q + 1$	$2 ℓ$	$2 s$	0
	$2 ℓ$	$2 s + 1$	0		$2 ℓ$	$2 s + 1$	$σ_{2 ℓ, 2 s + 1}^{(2 q + 1)}$
	$2 ℓ + 1$	$2 s$	0		$2 ℓ + 1$	$2 s$	$σ_{2 ℓ + 1, 2 s}^{(2 q + 1)}$
	$2 ℓ + 1$	$2 s + 1$	$σ_{2 ℓ + 1, 2 s + 1}^{(2 q)}$		$2 ℓ + 1$	$2 s + 1$	0

微信扫一扫：分享