Photonics Research, Vol. 12, Issue 10, 2279 (2024)

Mode-multiplexed photonic integrated vector dot-product core from inverse design

Zheyuan Zhu1,*, Raktim Sarma2, Seth Smith-Dryden1, Guifang Li1, and Shuo S. Pang1
Author Affiliations
  • 1CREOL, The College of Optics and Photonics, University of Central Florida, Orlando, Florida 32816-2700, USA
  • 2Center for Integrated Nanotechnologies, Sandia National Laboratories, Albuquerque, New Mexico 87123, USA

    Photonic computing has the potential to harness the full degrees of freedom (DOFs) of the light field, including the wavelength, spatial mode, spatial location, phase quadrature, and polarization, to achieve a higher level of computing parallelism and scalability than digital electronic processors. While multiplexing using the wavelength and other DOFs can be readily integrated on silicon photonics platforms with compact footprints, conventional mode-division multiplexed (MDM) photonic designs occupy areas exceeding tens to hundreds of microns for a few spatial modes, significantly limiting their scalability. Here, we utilize inverse design to demonstrate an ultracompact photonic computing core that calculates vector dot products based on MDM coherent mixing. Our dot-product core integrates the functionalities of two-mode multiplexers and one multimode coherent mixer within a nominal footprint of 5 μm×3 μm. We have experimentally demonstrated computing examples on the fabricated dot-product core, including complex number multiplication and motion estimation using optical flow. The compact dot-product core design enables large-scale on-chip integration in a parallel photonic computing primitive cluster for high-throughput scientific computing and computer vision tasks.

    1. INTRODUCTION

    Vector, matrix, or tensor calculations are the fundamental building blocks of modern scientific computing. The underlying core components of these computing tasks are basic linear algebra subprograms (BLASs) that provide hardware implementation of the arithmetic operations between vectors (level 1), vector and matrix (level 2), and matrices (level 3), each building upon the previous level [1]. Given its unique role as a BLAS level 1 routine, efficient and scalable vector dot-product calculation is crucial to achieving optimal performance in more complex and computationally intensive operations. In traditional uniprocessor digital computers, the central processing unit (CPU) executes a single basic operation, such as addition, multiplication, or fused multiply–add (FMA), on a single data stream, a process known as single instruction stream, single data stream (SISD), as shown in Fig. 1(a). The sequential execution and repeated data access of SISD compromise the computation speed and efficiency in vector- and matrix-based operations. Single instruction stream, multiple data streams (SIMD), as shown in Fig. 1(b), which simultaneously applies an arithmetic operation to multiple data streams [2], has been adopted in virtually all modern CPUs and stream processors in GPUs. These processors incorporate dedicated SIMD tiles of cascaded FMA units with pipelined inputs to accelerate vector instructions [3]. Because caching the intermediate results is still necessary to ensure timing closure in electronics, the computing throughput per unit area is usually on the order of 0.1 tera operations per second per square millimeter (TOPS/mm²), and the vector length is typically limited to several hundred, even with a highly optimized layout of logic and memory units within an SIMD engine [3,4].

    Figure 1.(a) Electronic and (b) photonic implementations of SISD and SIMD operations. (a1) An SISD electronic arithmetic unit that performs multiplication and addition. (a2) SIMD design with multiple, pipelined inputs for dot-product calculation. (b) Individual single-mode coherent mixers as multiplier units without parallelism, equivalent to SISD architecture in digital electronics.

    Recently, driven by the computing demand in artificial intelligence (AI), analog computing platforms based on integrated photonic devices [5,6] have demonstrated the potential of higher efficiency and computing throughput than their electronic counterparts, due to the intrinsically passive photonic multiply–accumulate (MAC) operations without intermediate memory access [7]. Figure 1(c) illustrates a photonic computing design based on two coherent mixers without parallelization in DOF of light, much like the SISD architecture in digital computing. In a single coherent mixing unit, the two inputs of electrical fields encode the numbers a and b in their amplitudes. After splitting and balanced detection, the output is proportional to their product Re{a*b} [8]. To perform dot products between two N-element vectors, N sets of mixers and balanced photodiodes are required, and the intermediate element-wise products must first be individually digitized and then summed in the post-processing stage. Due to the power consumption of analog-to-digital converters (ADCs) [9] required in the design, coherent mixing without data-level parallelism suffers from low efficiency when handling large vectors.

    Similar to the transition from SISD to SIMD architecture in digital processors, using wavelength- or mode-division multiplexed (WDM or MDM) photonic signals enables a single coherent detection unit, consisting of a 2×2 coherent mixer and a pair of balanced photodiodes, to simultaneously process multiple data inputs in parallel [10,11]. Leveraging the intrinsic orthogonality of the light fields, coherent photonic MAC operations with multiplexed signals naturally accumulate the intermediate element-wise products between two vectors, and thus could achieve two- or threefold lower power consumption than the nonmultiplexed designs [12]. While WDM-based photonic processing devices have matured into practice to some extent in AI-related computing applications [13,14], MDM-based devices have only begun to emerge as a viable approach in high-bandwidth optical communication [15,16], and their applications in parallel photonic computing are yet to be exploited.

    Although utilizing MDM can lead to significant advances in high-bandwidth optical communication and photonic computing, a major bottleneck to high-density integration is the large footprint usually associated with these MDM-based nanophotonic devices. In addition, unlike the conventional MDM components used for optical communication, the complexity of photonic computing often necessitates two or more traditional MDM building blocks to implement the arithmetic operations. In this work, we present an end-to-end MDM-based photonic design that integrates the functionalities of multiple MDM blocks, resulting in a compact footprint for vector and/or matrix-based SIMD computing applications. Combined with peripheral electronics and algorithms targeting our photonic platform, we have experimentally demonstrated vector dot product, complex number multiplication, and a computer vision task on a fabricated MDM-based photonic dot-product core.

    2. PRINCIPLE OF OPERATION

    A. MDM Coherent Photonic Dot-Product Core

    Figure 2(a) shows an implementation of photonic dot-product core based on conventional MDM components in optical communications. The elements in the vector, $a_1$ ($b_1$) and $a_2$ ($b_2$), are mapped to the electric field profiles of the fundamental ($\psi_{\mathrm{I}}$, TE0) and the second-order ($\psi_{\mathrm{II}}$, TE1) TE modes of a few-mode waveguide via mode multiplexers (MUXs). The mode-multiplexed photonic signals, $E_a = a_1\psi_{\mathrm{I}} + a_2\psi_{\mathrm{II}}$ and $E_b = b_1\psi_{\mathrm{I}} + b_2\psi_{\mathrm{II}}$, undergo coherent mixing via multimode interference (MMI), producing the electrical fields on the upper and lower arms $E_p = \frac{1}{\sqrt{2}}(E_a + iE_b)$ and $E_n = \frac{1}{\sqrt{2}}(iE_a + E_b)$. Based on the orthogonality between $\psi_{\mathrm{I}}$ and $\psi_{\mathrm{II}}$, the difference between the overall intensity of the upper and lower outputs $I_{\mathrm{diff}} = |E_p|^2 - |E_n|^2$ produces the dot-product between vectors a and b. The functionality of the conventional MDM dot-product design can be expressed as a Kronecker product (denoted by $\otimes$) between a 3 dB coupling matrix representing the MMI, and an identity matrix representing the MUX, as in

    $$\begin{bmatrix} E_p^{\mathrm{II}} \\ E_p^{\mathrm{I}} \\ E_n^{\mathrm{I}} \\ E_n^{\mathrm{II}} \end{bmatrix} = \frac{1}{\sqrt{2}} \left( \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right) \begin{bmatrix} a_1 \\ a_2 \\ b_2 \\ b_1 \end{bmatrix}. \tag{1}$$
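    The mixing relation above can be checked numerically with a minimal sketch. Each multiplexed field is represented only by its two mode coefficients, and mode orthogonality is used so that the total intensity on an arm is the sum of the squared coefficient magnitudes. One assumption is not spelled out in the text: with the 3 dB coupler convention used here, the raw intensity difference equals −2 Im{a†b}, so the sketch applies a fixed quadrature phase bias (a factor of −i on the b input) to make real-valued inputs yield 2(a·b). The function name `mdm_dot_product` and the parameter `phase_bias` are hypothetical and chosen for illustration.

```python
import math


def mdm_dot_product(a, b, phase_bias=-1j):
    """Balanced-detection output of a mode-multiplexed coherent mixer.

    a, b       : sequences of mode coefficients (one entry per spatial mode)
    phase_bias : assumed fixed phase factor on the b input (not from the source)
    Returns (I_p - I_n) / 2, which equals Re{a^H b} for phase_bias = -1j.
    """
    s = 1.0 / math.sqrt(2.0)
    i_p = 0.0  # total intensity on the upper arm
    i_n = 0.0  # total intensity on the lower arm
    for a_k, b_k in zip(a, b):
        e_b = phase_bias * b_k
        e_p = s * (a_k + 1j * e_b)   # E_p = (E_a + i E_b) / sqrt(2), per mode
        e_n = s * (1j * a_k + e_b)   # E_n = (i E_a + E_b) / sqrt(2), per mode
        # Orthogonal modes do not interfere, so intensities simply add up.
        i_p += abs(e_p) ** 2
        i_n += abs(e_n) ** 2
    return (i_p - i_n) / 2.0


if __name__ == "__main__":
    print(mdm_dot_product([2, 1], [-2, 2]))
```

    Setting `phase_bias=1` in this sketch makes the balanced output vanish for real inputs, which is why some relative phase control between the two vectors is needed in this model.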

    Figure 2.Photonic implementation of vector dot-product core based on mode-division multiplexing. (a) Implementation based on the traditional MDM infrastructure in optical communications, using two MUXs and one MMI. (b) An end-to-end photonic dot-product core that integrates the functionalities of two MUXs and one MMI. The inset shows the photonic structure from inverse design.

    Using conventional MDM components, the dot-product core requires two MUXs and one MMI. Each MUX occupies at least 20 μm × 4 μm in footprint [17,18], which is required by the adiabatic taper. For a 2×2 MMI, a footprint of 40 μm × 6 μm [19,20] is required to match the first Talbot distance. The overall footprint of a conventional dot-product core is thus larger than 50 μm × 10 μm.

    Figure 2(b) shows our topologically optimized mode-multiplexed photonic vector dot-product core that integrates the functionalities of two MUXs and one MMI within a 5 μm × 3 μm footprint. Compared to the behavior of the electrical field inside a conventional multimode photonic design [Fig. 2(b1)], in which the regions for mode multiplexing and mixing are clearly distinguishable, the integrated dot-product core does not perform an intermediate conversion step of the input electrical fields onto the spatial mode basis. The end-to-end transformation of the electrical field by the integrated core, expressed as matrix $S_i$ in

    $$\begin{bmatrix} E_p^{\mathrm{II}} \\ E_p^{\mathrm{I}} \\ E_n^{\mathrm{I}} \\ E_n^{\mathrm{II}} \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 0 & i \\ 0 & 1 & i & 0 \\ 0 & i & 1 & 0 \\ i & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ b_2 \\ b_1 \end{bmatrix}, \tag{2}$$

    contributes to its compact footprint.

    The ultracompact footprint addresses one of the fundamental bottlenecks for utilizing MDM-based approaches for photonic computing and paves the way for high-density integration of the core in a parallel computing array.

    B. End-to-End Design of Photonic Dot-Product Core

    The photonic core was inversely designed on a silicon-on-insulator (SOI) platform by optimizing the structure that maximizes the coupling efficiency from the inputs into the target electric field profiles. The design process follows a gradient-based paradigm that tunes the distribution of the relative permittivity $\varepsilon_r$ on the silicon layer as the design parameters [21–24]. The parameters are updated along the gradient direction of the objective function

    $$l = \sum_{j=1}^{J} \left| E_{tj}^{\dagger} E_j(\varepsilon_r) \right|^2. \tag{3}$$

    Here, $l$ calculates the overlap integral between the target field $E_{tj}$ at the output location and the field $E_j$ within the structure, $\varepsilon_r$ is the three-dimensional distribution of relative permittivity, and $(\cdot)^{\dagger}$ denotes the matrix conjugate transpose. We set the fundamental or second-order TE eigenmodes in the few-mode output waveguides as the target fields $E_{tj}$. The summation over $j$ aggregates the contributions from all four pairs of output and target fields. The field $E_j$ in the device satisfies the finite-difference frequency domain (FDFD) Maxwell equations in matrix form, expressed as

    $$\left( D_L - \mathrm{diag}(\varepsilon_r) \right) E_j(\varepsilon_r) = b_j. \tag{4}$$

    Here, $D_L$ is the finite difference matrix for the three-dimensional vector electrical field, representing the operator $\frac{1}{k_0}\nabla\times\nabla\times$ with perfectly matched layers (PMLs) on the boundary of the solution domain [25]. $\mathrm{diag}(\varepsilon_r)$ represents a diagonal matrix constructed from the vectorized $\varepsilon_r$. $k_0$ is the wavenumber in vacuum. $b_j$ denotes the input excitation that induces the field $E_j$ within the device and is derived from the fundamental TE mode of the input waveguide based on the total field/scattered field technique [26].

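    Combining Eqs. (3) and (4), the gradient of the objective function with respect to $\varepsilon_r$ can be derived as

    $$\nabla l(\varepsilon_r) = \sum_{j=1}^{J} 2\,\mathrm{Re}\left\{ \mathrm{diag}(E_j^{*}) \left( D_L - \mathrm{diag}(\varepsilon_r) \right)^{-1} E_{tj} \right\}. \tag{5}$$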

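    The inverse problems $\left( D_L - \mathrm{diag}(\varepsilon_r) \right)^{-1} E_{tj}$ and $\left( D_L - \mathrm{diag}(\varepsilon_r) \right)^{-1} b_j$ were both solved using the least squares method [27], and were carried out on $J=4$ parallel GPUs (NVIDIA RTX 3090). The relative permittivity $\varepsilon_r$ is updated along the gradient direction with an adaptive step size $\tau$ as $\varepsilon_r \leftarrow \varepsilon_r + \tau \nabla l(\varepsilon_r)$. To promote the binary medium (air and silicon) on the silicon layer, the updated $\varepsilon_r$ is mapped by a sigmoid function to produce the relative permittivity in the next iteration, and is expressed as

    $$\varepsilon_r = \frac{\varepsilon_{\mathrm{Si}} - \varepsilon_{\mathrm{air}}}{1 + \exp\left( -\gamma \left( \varepsilon_r - \frac{\varepsilon_{\mathrm{air}} + \varepsilon_{\mathrm{Si}}}{2} \right) \right)} + \varepsilon_{\mathrm{air}}. \tag{6}$$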

    Here, εSi and εair are the relative permittivity of silicon and air, respectively, and γ=4 is a hyperparameter that controls the slope of the sigmoid function.
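    The sigmoid mapping that pushes the design toward a binary air/silicon pattern can be sketched in a few lines. The numerical permittivity values below (12.0 for silicon, 1.0 for air) are placeholder assumptions for illustration and are not taken from the text; only γ = 4 comes from the source. The function name `project_permittivity` is hypothetical.

```python
import math

EPS_SI = 12.0   # assumed placeholder for the relative permittivity of silicon
EPS_AIR = 1.0   # assumed placeholder for the relative permittivity of air


def project_permittivity(eps_r, gamma=4.0, eps_si=EPS_SI, eps_air=EPS_AIR):
    """Map an updated permittivity value toward eps_air or eps_si.

    Values below the midpoint are pushed toward air, values above it
    toward silicon; gamma controls how sharp the transition is.
    """
    midpoint = 0.5 * (eps_air + eps_si)
    return (eps_si - eps_air) / (1.0 + math.exp(-gamma * (eps_r - midpoint))) + eps_air


if __name__ == "__main__":
    for value in (1.0, 5.0, 6.5, 8.0, 12.0):
        print(value, project_permittivity(value))
```

    In a full design loop this projection would be applied element-wise to the whole permittivity distribution after each gradient step; the sketch only shows the scalar mapping.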

    The inversely designed photonic dot-product core was fabricated on commercially available silicon-on-insulator (SOI) wafers. The wafers consisted of 250 nm silicon on top of a 3 μm buried oxide. The core was fabricated using a positive tone ZEP resist followed by electron beam lithography and inductively coupled plasma reactive ion etching. To realize the subwavelength sized and spaced features of the inversely designed structure, short range proximity correction was used to vary the dose of the exposure across the device. The core consisted of four single-mode input waveguides (480 nm in width) and two few-mode output waveguides (774 nm in width). The two few-mode output waveguides were each tapered to a 40  μm×40  μm photonic crystal structure [28,29], which vertically couples out the electric field profiles for observation by a microscope imaging system. Details of the photonic crystal design and simulation results can be found in Appendix B.

    A microscope setup was used to experimentally characterize the fabricated vector dot-product core. The core was edge-coupled to the fiber array that provided four modulated inputs, each driven by an independent off-chip Mach–Zehnder modulator (MZM, JDSU IOAP-MOD9140). The modulators were driven by a multichannel digital-to-analog converter (DAC, Analog Devices, MAX11300), which was controlled by a microcontroller (Analog Devices, SDP-CK1Z). The modulated signals were edge-coupled into the four input ports of the dot-product core. The intensity profiles on the two vertical output couplers were recorded from above through a long working distance 20× objective and a tube lens onto a short-wave infrared (SWIR) camera (Allied Vision, Goldeye CL-008 TEC1). The camera and DAC synchronously perform 100 multiplications per second at the frame rate of the SWIR camera.

    3. EXPERIMENTAL RESULTS

    A. Characterization of the Fabricated Dot-Product Core

    Figure 3(a) shows a microscope image of the photonic core under our characterization setup. Figure 3(b) plots the intensity profiles at the output coupler when the first two single-mode input arms, a1 and a2, were individually activated in the experiments. The intensity profiles match the target spatial profiles of the fundamental and second-order modes. To quantify the computing performance, we simulated the electromagnetic (EM) behavior of the designed and fabricated core using Ansys Lumerical FDTD software based on the design and scanning electron microscope (SEM) image, respectively. TE fundamental modes were launched into each single-mode input waveguide, and the resulting field profiles at a nominal operating wavelength of 1570 nm are shown in Figs. 3(c) and 3(d). The orange boxes show the cross-section of the electrical field profiles marked by the dashed lines.

    Figure 3.Characterization of the fabricated dot-product core. (a) Microscope image of the fabricated dot-product core under test. (b) Experimentally observed intensity profiles on the two output couplers when the inputs a1 and a2 were individually excited. (c) Structure of the ideal inversely designed dot-product core and simulated electrical field profiles within the core. (d) SEM image of the fabricated dot-product core and simulated electrical field profiles within the core based on the SEM image. The side views show the electrical field profiles at the location marked by the orange dashed line.

    The transfer matrices St of the designed and fabricated cores can be calculated from the overlap integral [30] between the cross-sectional electrical fields and the two TE eigenmodes in the top and bottom arms. Both matrices share the same structure as the ideal transfer matrix Si in Eq. (2). The ideal inversely designed core features a symmetric design with <10% crosstalk, as indicated by the off-diagonal elements. The power-splitting ratios between the top and bottom arms are both approximately 46% versus 54% for the fundamental and second-order TE modes. The fabricated core maintains the relatively low crosstalk with a maximum of 13.5% in the off-diagonal elements. The power-splitting ratios are 41% versus 59% and 49% versus 51% for fundamental and second-order TE modes, respectively.

    The insertion loss and crosstalk of the designed and fabricated core can be quantified by the crosstalk matrix $M_X$, whose elements are the overlap between the columns in the transfer matrices of the ideal ($S_i$) and the designed (or fabricated) device ($S_t$), expressed as

    $$M_X[i,j] = S_t[:,i]^{*} \cdot S_i[:,j]. \tag{7}$$

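    Here, $[:,i]$ extracts the $i$-th column vector from the matrix. The insertion loss (IL) and crosstalk (XT) can both be derived from $M_X$, respectively, as [30]

    $$\mathrm{IL\,(dB)} = -10\log_{10}\left( \text{max eigenvalue of } M_X \right); \quad \mathrm{XT\,(dB)} = -10\log_{10}\left( \frac{\text{power in the diagonals of } M_X}{\text{power in the off-diagonals of } M_X} \right). \tag{8}$$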
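    The column-overlap definition of the crosstalk matrix can be illustrated with a short standard-library sketch. Two choices here are assumptions rather than statements from the text: "power" is taken as the sum of squared magnitudes of the matrix elements, and the sign of the crosstalk is chosen so that low crosstalk gives a negative dB value, which agrees with the percentages quoted in this section. The insertion loss requires an eigenvalue solver and is left out. The names `crosstalk_matrix`, `crosstalk_db`, and `S_IDEAL` are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)
# Ideal 4x4 transfer matrix with the structure given for the integrated core.
S_IDEAL = [
    [_S, 0, 0, 1j * _S],
    [0, _S, 1j * _S, 0],
    [0, 1j * _S, _S, 0],
    [1j * _S, 0, 0, _S],
]


def crosstalk_matrix(s_t, s_i):
    """M_X[i][j] = conj(column i of S_t) . (column j of S_i)."""
    rows = len(s_t)
    cols = len(s_t[0])
    return [
        [
            sum(complex(s_t[k][i]).conjugate() * s_i[k][j] for k in range(rows))
            for j in range(cols)
        ]
        for i in range(cols)
    ]


def crosstalk_db(m_x):
    """Crosstalk in dB from diagonal versus off-diagonal power of M_X."""
    n = len(m_x)
    p_diag = sum(abs(m_x[i][i]) ** 2 for i in range(n))
    p_off = sum(abs(m_x[i][j]) ** 2 for i in range(n) for j in range(n) if i != j)
    if p_off == 0.0:
        return float("-inf")  # no off-diagonal power at all
    return -10.0 * math.log10(p_diag / p_off)


if __name__ == "__main__":
    m_x = crosstalk_matrix(S_IDEAL, S_IDEAL)
    print([round(abs(m_x[i][i]), 6) for i in range(4)])
    print(crosstalk_db([[1.0, 0.1], [0.1, 1.0]]))
```

    For a device that matches the ideal matrix exactly, the overlap matrix reduces to the identity, so any off-diagonal power in a measured matrix directly reflects mode mixing.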

    The crosstalk matrices at a nominal operating wavelength of 1570 nm of both ideal and fabricated dot-product core designs are shown in Fig. 4(a). Figure 4(b) plots the insertion loss and crosstalk of the designed and fabricated cores as a function of the wavelength. The ideal dot-product core design features a consistent 2.3 dB insertion loss and a crosstalk of <−13 dB (<5%) across the wavelength range of 1540 nm to 1590 nm. The fabricated core maintains a consistent insertion loss and crosstalk within the wavelength range 1550 nm to 1580 nm, suggesting broadband performance that supports wavelength multiplexed inputs. Despite the uneven splitting of the input fields into the upper and lower arms, the crosstalk between the two spatial modes in the output waveguides is −9.06 dB, or 12.4%. The low crosstalk allows us to empirically correct most of the computing errors, as described in Appendix A.

    Figure 4.Characterization of (a1), (b1) designed and (a2), (b2) fabricated dot-product core. (a) Crosstalk matrix MX of the core. (b) Insertion loss and crosstalk (in dB) as a function of wavelength.

    B. General-Purpose Computing Examples

    The core supports dot products between two-element vectors with fixed-point precision, enabling the deployment of general purpose computing tasks such as complex number multiplication and optical flow calculation. To carry out general purpose dot-product calculations, $(a_1,a_2)\cdot(b_1,b_2)^T$, on the photonic core, we calibrated the four MZMs to generate five signed linear analog levels representing the integers from −2 to 2 on each input. The four input ports of the inversely designed photonic structure receive the modulated optical signal representing $a_1$, $a_2$, $b_2$, and $b_1$, respectively, from top to bottom. The intensity differences between two output couplers were proportionally mapped to the dot products using the output from $(1,0)\cdot(1,0)^T$. Figure 5(a) plots a time-division multiplexing (TDM) sequence of 16 dot products performed on the photonic core. We quantify the computing error with normalized mean square error (NMSE) between the ground truth $Y_{\mathrm{gt}}$ and the experimental $Y_{\mathrm{exp}}$ dot products, defined in

    $$\mathrm{NMSE} = \frac{\sum_{k=1}^{K} \left| Y_{k,\mathrm{exp}} - Y_{k,\mathrm{gt}} \right|^2}{\sum_{k=1}^{K} \left| Y_{k,\mathrm{gt}} \right|^2}. \tag{9}$$

    Figure 5.General-purpose computing examples as dot-products on the photonic core. (a) Dot-product calculation of a sequence of 16 two-element vectors. (b) Complex number multiplications encoded as two equivalent dot-products in time-division multiplexing. (c1), (c2) Multiplication results between 16 complex numbers. Blue circles indicate ground truth results, green circles indicate simulated results from the ideal inversely designed core in (b), and red circles indicate experimental results calculated on the fabricated dot-product core.

    Here, the summation is performed over all K symbols in the sequence. The NMSE of all multiplications was 6.32%, offering sufficient dynamic range to represent signed integers from −8 to 8 (signed 4-bit) in the dot-product results.
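    For reference, the NMSE metric defined above amounts to the following short computation. This is a plain-Python sketch; the function name `nmse` and the sample values in the usage lines are illustrative and are not data from the experiment.

```python
def nmse(y_exp, y_gt):
    """Normalized mean square error between measured and ground-truth results.

    Both arguments are equal-length sequences of (possibly complex) numbers.
    """
    if len(y_exp) != len(y_gt):
        raise ValueError("sequences must have the same length")
    error_power = sum(abs(e - g) ** 2 for e, g in zip(y_exp, y_gt))
    signal_power = sum(abs(g) ** 2 for g in y_gt)
    return error_power / signal_power


if __name__ == "__main__":
    ground_truth = [4, -2, 0, 3]          # made-up dot-product values
    measured = [3.8, -2.1, 0.2, 3.1]      # made-up noisy readings
    print(nmse(measured, ground_truth))
```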

    We first applied the photonic dot-product core to perform complex number multiplication [i.e., (a+bi)×(c+di)]. The real and imaginary parts of the result [(ac−bd)+(ad+bc)i] are split into two equivalent dot products encoded in a TDM symbol sequence. Sixteen complex number pairs represented by a sequence of 32 dot products were multiplied on the core. Figure 5(c) compares the products from the ideal and fabricated cores with the ground truth on the complex plane. The designed dot-product core shows good agreement with ground truth and an NMSE of 4.0%, suggesting that the design can reach a dynamic range of signed 25 levels, or greater than the signed 4-bit precision. The NMSE between the ground truth and experimental complex products is 15.9%, which is consistent with the simulation of a fabricated dot-product core. The computing error is primarily attributed to the fabrication deviation from the ideal design and the time-varying phase instability from the off-chip fiber inputs. The phase stability can be improved by switching to on-chip modulators. The fabrication deviation can be compensated with additional phase modulation on each input, which can be generated from integrated thermal optical phase shifters.
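    The split of one complex multiplication into two two-element dot products can be sketched as follows. The text does not state how the operands are arranged in each dot product, so the encoding below, real part as (a, −b)·(c, d) and imaginary part as (a, b)·(d, c), is one possible choice and should be read as an assumption. The `dot` argument stands in for whatever evaluates the dot product (the photonic core in the experiment, ordinary arithmetic here); both function names are hypothetical.

```python
def dot2(x, y):
    """Reference two-element dot product (stand-in for the photonic core)."""
    return x[0] * y[0] + x[1] * y[1]


def complex_multiply_via_dots(z1, z2, dot=dot2):
    """Compute z1 * z2 using exactly two two-element dot products."""
    a, b = z1.real, z1.imag
    c, d = z2.real, z2.imag
    real_part = dot((a, -b), (c, d))   # ac - bd
    imag_part = dot((a, b), (d, c))    # ad + bc
    return complex(real_part, imag_part)


if __name__ == "__main__":
    print(complex_multiply_via_dots(1 + 2j, 2 - 1j))
    print((1 + 2j) * (2 - 1j))
```

    With this arrangement, 16 complex products map onto 32 dot products, matching the sequence length reported above.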

    In addition, we have also demonstrated a computer vision task using the photonic dot-product core. Specifically, we use the device to calculate the optical flow in a visual scene to quantify the motion of the object. The real-time calculation of the optical flow in a dynamic environment plays an important role in motion detection and object tracking of computer vision systems [31,32]. Here, we calculated the optical flow of selected edge pixels between two adjacent two-dimensional frames, $I_1(x,y)$ and $I_2(x,y)$, from a 10 frames-per-second spinning wheel animation on the dot-product core. The flow vector $(u,v)^T$ satisfies $(dx,dy)\cdot(u,v)^T = dt$, where $dx$, $dy$, and $dt$ are the finite differences of the image $I_t(x,y)$ along $x$, $y$, and $t$ dimensions, respectively [33]. Due to the ambiguity in uniquely determining the pixelwise $(u,v)^T$, we expand the optical flow vector onto the diagonal pixels in a 2×2 window, as

    $$\begin{bmatrix} dx_{11} & dy_{11} \\ dx_{22} & dy_{22} \end{bmatrix} \cdot \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} dt_{11} \\ dt_{22} \end{bmatrix}. \tag{10}$$

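    Assuming uniform flow vectors in the 2×2 window, the calculations are broken down into two parts: (i) on the two pixels [marked in gray in Fig. 6(b)] along the primary diagonals $dx_{11}$, $dx_{22}$, $dy_{11}$, $dy_{22}$, $dt_{11}$, and $dt_{22}$; and (ii) along the secondary diagonals $dx_{12}$, $dx_{21}$, $dy_{12}$, $dy_{21}$, $dt_{12}$, and $dt_{21}$ [marked in white in Fig. 6(b)]. Results from the primary and secondary diagonals are averaged to obtain the flow vector within the 2×2 window. Equation (10) can be solved using Cramer’s rule, written as

    $$u = \frac{\begin{vmatrix} dt_{11} & dy_{11} \\ dt_{22} & dy_{22} \end{vmatrix}}{\begin{vmatrix} dx_{11} & dy_{11} \\ dx_{22} & dy_{22} \end{vmatrix}}, \quad v = \frac{\begin{vmatrix} dx_{11} & dt_{11} \\ dx_{22} & dt_{22} \end{vmatrix}}{\begin{vmatrix} dx_{11} & dy_{11} \\ dx_{22} & dy_{22} \end{vmatrix}}. \tag{11}$$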

    Figure 6.Optical flow calculation between two adjacent frames of a spinning wheel animation. (a) Two frames from the spinning wheel animation with 100 ms interval (10 frames per second). (b) Optical flow vector of eight edge pixels. Red arrows indicate the ground truth of the flow vectors, and orange arrows indicate experimental results calculated on the photonic dot-product core. (c) Comparison of the calculated angular speed on the dot-product core with ground truth.

    Here, all the 2×2 determinants $\begin{vmatrix} a & b \\ c & d \end{vmatrix}$ are computed as their equivalent dot products $ad - bc$. The flow vector within one 2×2 window requires six dot-product calculations encoded in a TDM sequence.
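    A minimal sketch of the window-level flow calculation is given below. Each determinant is written as a single two-element dot product, (a, −b)·(d, c) = ad − bc, so one diagonal needs three dot products and one 2×2 window needs six. The sketch follows the sign convention of the flow constraint as printed in this section; depending on how the temporal difference dt is defined, an extra minus sign may be required. It also assumes the denominator determinant is nonzero. All function names are hypothetical.

```python
def dot2(x, y):
    """Two-element dot product (stand-in for the photonic core)."""
    return x[0] * y[0] + x[1] * y[1]


def det2(a, b, c, d):
    """Determinant |a b; c d| expressed as one dot product: ad - bc."""
    return dot2((a, -b), (d, c))


def flow_from_pair(p1, p2):
    """Solve the 2x2 flow system for two pixels with Cramer's rule.

    Each pixel is a tuple (dx, dy, dt) of finite differences.
    Uses three determinants, i.e. three dot products.
    """
    dx1, dy1, dt1 = p1
    dx2, dy2, dt2 = p2
    den = det2(dx1, dy1, dx2, dy2)   # assumed nonzero
    u = det2(dt1, dy1, dt2, dy2) / den
    v = det2(dx1, dt1, dx2, dt2) / den
    return u, v


def window_flow(p11, p12, p21, p22):
    """Average the flow from the primary and secondary diagonals."""
    u1, v1 = flow_from_pair(p11, p22)   # primary diagonal
    u2, v2 = flow_from_pair(p12, p21)   # secondary diagonal
    return (u1 + u2) / 2.0, (v1 + v2) / 2.0


if __name__ == "__main__":
    # Synthetic differences generated from a known flow (u, v) = (2, -1).
    print(window_flow((1, 0, 2), (1, 1, 1), (1, -1, 3), (0, 1, -1)))
```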

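    Figure 6(c) shows the optical flow vector within eight ($J=8$) 2×2 windows on the edges of the spinning wheel. We quantify the error in flow vector calculation using the mean cosine similarity, defined in

    $$S_c = \frac{1}{J} \sum_{j=1}^{J} \frac{(u_{j,\mathrm{ideal}}, v_{j,\mathrm{ideal}}) \cdot (u_{j,\mathrm{exp}}, v_{j,\mathrm{exp}})}{\left\| (u_{j,\mathrm{ideal}}, v_{j,\mathrm{ideal}}) \right\| \left\| (u_{j,\mathrm{exp}}, v_{j,\mathrm{exp}}) \right\|}. \tag{12}$$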

    Here, (uj,ideal,vj,ideal) and (uj,exp,vj,exp) represent the ideal flow vector and the one calculated on the fabricated dot-product core, respectively. The mean cosine similarity is 81.8%, suggesting that the flow vectors calculated on the photonic dot-product core have captured the correct rotation direction. The mean magnitude of the flow vectors reflects the angular speed of the wheel, which is 1.32 rad/s based on the calculated optical flow on the fabricated dot-product core. Compared to the ground truth 1.41 rad/s, the relative error of the angular speed calculation is 6.8%. This example illustrates that the fixed-point dot-product core can be used in conjunction with a tailored algorithm to extract features-of-interest in computer vision tasks.
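    The mean cosine similarity used above is straightforward to compute. The sketch below assumes each flow vector is a nonzero (u, v) pair; the function name `mean_cosine_similarity` and the sample vectors are illustrative only.

```python
import math


def mean_cosine_similarity(ideal, measured):
    """Average cosine of the angle between paired 2D flow vectors.

    ideal, measured : equal-length sequences of (u, v) tuples, all nonzero.
    """
    if len(ideal) != len(measured):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for (u_i, v_i), (u_e, v_e) in zip(ideal, measured):
        dot = u_i * u_e + v_i * v_e
        total += dot / (math.hypot(u_i, v_i) * math.hypot(u_e, v_e))
    return total / len(ideal)


if __name__ == "__main__":
    ideal = [(1.0, 0.0), (0.0, 1.0)]      # made-up reference vectors
    measured = [(0.9, 0.1), (0.2, 1.1)]   # made-up noisy estimates
    print(mean_cosine_similarity(ideal, measured))
```

    A value near 1 means the measured vectors point in nearly the same directions as the reference vectors, regardless of their magnitudes.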

    4. CONCLUSION

    In summary, we have demonstrated a compact, integrated photonic dot-product core from an inverse design. The core utilizes spatial mode as the multiplexing dimension to perform arbitrary two-element vector dot products. To account for the difference between the design and fabricated photonic structures, calibration and error correction routines have been developed and tested on the fabricated dot-product core. We have demonstrated an equivalent signed 4-bit precision in the dot-product results, and successfully deployed a general purpose complex number multiplier and an optical flow calculator on the fabricated device.

    The miniaturized footprint enables the large-scale integration of the core as part of the photonic primitives in an electronic-photonic co-packaged parallel computing array on modern CMOS-compatible platforms. Combining our current design with on-chip modulators and multimode photodiodes [34], a computing speed on the order of $10^9$ dot products per second is supported by modern gigabaud optoelectronics. By further integrating dense wavelength-division multiplexing (DWDM) channels and spatial modes as super-dimensions in photonic matrix- and tensor-based processors [10], our strategy enables a computing throughput on the order of $10^3$ TOPS/mm², which is orders of magnitude higher than that of dedicated electronic vector/matrix accelerators [4,35].


    Acknowledgment. R.S. acknowledges the support of the Laboratory Directed Research and Development program at Sandia National Laboratories, a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration. This work was performed in part at the Center for Integrated Nanotechnologies, an Office of Science User Facility operated for the U.S. Department of Energy (DOE) Office of Science. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

    APPENDIX A: CALIBRATION AND COMPUTING ERROR OF DOT-PRODUCT CORE

    Given the transfer matrix $S_t$ of the fabricated dot-product core, it is possible to compensate the four inputs to account for the uneven splitting and/or crosstalk due to fabrication imperfections. This process mixes the four inputs according to the compensation matrix $C$ before sending to the dot-product core. The choice of $C$ must minimize the crosstalk and equalize the amplitudes in the two output arms for both spatial modes. Theoretically, $C$ can be calculated according to

    $$S_t \cdot C = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 0 & i \\ 0 & 1 & i & 0 \\ 0 & i & 1 & 0 \\ i & 0 & 0 & 1 \end{bmatrix}.$$

    Here, the right-hand side denotes the transfer matrix of an ideal dot-product core in Eq. (2). Figure 7 shows the transfer matrix of the fabricated core and the corresponding compensation matrix $\tilde{C}_{\mathrm{full}}$ for pre-mixing the four inputs. However, using the full transfer matrix for compensation involves the multiplication of a 4×4 complex matrix on top of the desired inputs, giving rise to 16 additional digital MAC operations.
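    The full compensation matrix can be obtained by solving the linear system above for C. The sketch below does this with a small Gauss-Jordan elimination in plain Python, which is a generic technique swapped in for illustration and not necessarily how the compensation was computed in the experiment. The perturbed transfer matrix in the usage lines is made up. All names (`solve`, `matmul`, `compensation_matrix`, `S_IDEAL`) are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)
# Ideal transfer matrix of the dot-product core (right-hand side of the system).
S_IDEAL = [
    [_S, 0, 0, 1j * _S],
    [0, _S, 1j * _S, 0],
    [0, 1j * _S, _S, 0],
    [1j * _S, 0, 0, _S],
]


def solve(a, b):
    """Solve a @ x = b for a square complex matrix a (Gauss-Jordan)."""
    n = len(a)
    # Augmented matrix [a | b], built from copies so inputs stay unchanged.
    aug = [list(map(complex, a[r])) + list(map(complex, b[r])) for r in range(n)]
    for col in range(n):
        # Partial pivoting: use the largest remaining entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("transfer matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [value / p for value in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def compensation_matrix(s_t, s_ideal=S_IDEAL):
    """Return C such that s_t @ C equals the ideal transfer matrix."""
    return solve(s_t, s_ideal)


if __name__ == "__main__":
    # Made-up perturbation standing in for fabrication-induced crosstalk.
    s_t = [row[:] for row in S_IDEAL]
    s_t[0][1] += 0.1
    c = compensation_matrix(s_t)
    print([[round(abs(x), 3) for x in row] for row in c])
```

    Applying C to the four inputs before they reach the core costs one 4×4 complex matrix-vector product per symbol, which is the 16 extra MAC operations mentioned above.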

    Figure 7.Transfer matrix of the fabricated dot-product core St and the corresponding compensation matrices Cfull and Cemp. Only the magnitudes of the matrix elements are shown.

    Figure 8.Comparison of the different compensation methods. (a) TDM dot-product output sequence before and after compensation using the full transfer matrix St and the empirical method with/without phase modulation. (b) Comparison between the NMSE of the raw and compensated dot products.

    Figure 9.Comparison of the dot products before and after calibration in experiments.

    The NMSE after the empirical compensation presented here represents a theoretical lower bound in the computing error. In actual experiments, the time-varying phase on the four off-chip modulated inputs cannot be measured and compensated. As a result, the experimental computing error could be higher than the lower bound. We envision that with fully integrated optical paths on a chip, including the use of on-chip modulators [36] and few-mode photodiodes [34], the time-varying phase could be resolved.

    APPENDIX B: CHARACTERIZATION OF HIGHER-ORDER SPATIAL MODES

    The intensity profiles of different spatial modes on the output multimode waveguide can be coupled vertically using a photonic crystal structure acting as a high-order grating coupler. In our design, the photonic crystal structure measures 40  μm×40  μm with a pitch of 0.64 μm and a hole size of 0.32 μm. The two multimode output waveguides from our inversely designed structure are each tapered to a photonic crystal for observing the intensity profiles by a microscope imaging system from above. Figure 10 shows the intensity profiles of different input modes and the output on top of the photonic crystal region from FDTD simulation. The number of lobes in the intensity profile indicates the order of the TE modes.

    Figure 10.Design and FDTD simulations of the photonic crystal output coupler supporting the observation of multimode intensity profiles from the top.

    [10] A. Fardoost, F. G. Vanani, Z. Zhu. A high-speed photonic tensor accelerator. IEEE Photonics Conference (IPC), 1-2 (2022).

    [26] R. C. Rumpf. Electromagnetic and Photonic Simulation for the Beginner: Finite-Difference Frequency-Domain in MATLAB (2022).

    [30] N. K. Fontaine, R. Ryf, H. Chen. Design of high order mode-multiplexers using multiplane light conversion. European Conference on Optical Communication (ECOC), 1-3 (2017).

    [31] Y. Mae, Y. Shirai, J. Miura. Object tracking in cluttered background based on optical flow and edges. 13th International Conference on Pattern Recognition, 1, 196-200 (1996).

    [32] Z. Chen, J. Cao, Y. Tang. Tracking of moving object based on optical flow detection. International Conference on Computer Science and Network Technology, 2, 1096-1099 (2011).

    Zheyuan Zhu, Raktim Sarma, Seth Smith-Dryden, Guifang Li, Shuo S. Pang, "Mode-multiplexed photonic integrated vector dot-product core from inverse design," Photonics Res. 12, 2279 (2024)

    Paper Information

    Category: Silicon Photonics

    Received: Apr. 3, 2024

    Accepted: Jul. 22, 2024

    Published Online: Oct. 8, 2024

    The Author Email: Zheyuan Zhu (zheyuan.zhu@ucf.edu)

    DOI:10.1364/PRJ.524419
