Mode-multiplexed photonic integrated vector dot-product core from inverse design

Zheyuan Zhu; Raktim Sarma; Seth Smith-Dryden; Guifang Li; Shuo S. Pang

doi:10.1364/PRJ.524419

1. INTRODUCTION

Vector, matrix, and tensor calculations are the fundamental building blocks of modern scientific computing. The core components underlying these computing tasks are the basic linear algebra subprograms (BLAS), which provide hardware implementations of the arithmetic operations between vectors (level 1), between a vector and a matrix (level 2), and between matrices (level 3), each building upon the previous level [1]. Because the vector dot product is a BLAS level 1 routine, calculating it efficiently and scalably is crucial to achieving optimal performance in more complex and computationally intensive operations. In traditional uniprocessor digital computers, the central processing unit (CPU) executes a single basic operation, such as addition, multiplication, or fused multiply–add (FMA), on a single data stream, a process known as single instruction stream, single data stream (SISD), as shown in Fig. 1(a1). The sequential execution and repeated data access of SISD compromise the computation speed and efficiency of vector- and matrix-based operations. Single instruction stream, multiple data streams (SIMD), as shown in Fig. 1(a2), simultaneously applies an arithmetic operation to multiple data streams [2] and has been adopted in virtually all modern CPUs and in the stream processors of GPUs. These processors incorporate dedicated SIMD tiles of cascaded FMA units with pipelined inputs to accelerate vector instructions [3]. Because caching the intermediate results is still necessary to ensure timing closure in electronics, the computing throughput per unit area is usually on the order of 0.1 tera operations per second per square millimeter ($\mathrm{TOPS}/\mathrm{mm}^{2}$), and the vector length is typically limited to several hundred, even with a highly optimized layout of logic and memory units within an SIMD engine [3,4].

Figure 1. (a) Electronic and (b) photonic implementations of SISD and SIMD operations. (a1) An SISD electronic arithmetic unit that performs multiplication and addition. (a2) SIMD design with multiple, pipelined inputs for dot-product calculation. (b) Individual single-mode coherent mixers as multiplier units without parallelism, equivalent to SISD architecture in digital electronics.

Recently, driven by the computing demand of artificial intelligence (AI), analog computing platforms based on integrated photonic devices [5,6] have demonstrated the potential for higher efficiency and computing throughput than their electronic counterparts, owing to intrinsically passive photonic multiply–accumulate (MAC) operations that need no intermediate memory access [7]. Figure 1(b) illustrates a photonic computing design based on two coherent mixers without parallelization in the degrees of freedom (DOFs) of light, much like the SISD architecture in digital computing. In a single coherent mixing unit, the two input electric fields encode the numbers $a$ and $b$ in their amplitudes. After splitting and balanced detection, the output is proportional to their product $\mathrm{Re}\{a^{*} b\}$ [8]. To perform dot products between two $N$-element vectors, $N$ sets of mixers and balanced photodiodes are required, and the intermediate element-wise products must first be individually digitized and then summed in the post-processing stage. Because of the power consumed by the analog-to-digital converters (ADCs) [9] that this design requires, coherent mixing without data-level parallelism suffers from low efficiency when handling large vectors.

Similar to the transition from SISD to SIMD architecture in digital processors, using wavelength- or mode-division multiplexed (WDM or MDM) photonic signals enables a single coherent detection unit, consisting of a $2 \times 2$ coherent mixer and a pair of balanced photodiodes, to process multiple data inputs in parallel [10,11]. Leveraging the intrinsic orthogonality of the light fields, coherent photonic MAC operations with multiplexed signals naturally accumulate the intermediate element-wise products between two vectors, and thus could achieve two- or threefold lower power consumption than nonmultiplexed designs [12]. While WDM-based photonic processing devices have matured into practice to some extent in AI-related computing applications [13,14], MDM-based devices have only begun to emerge as a viable approach in high-bandwidth optical communication [15,16], and their applications in parallel photonic computing are yet to be exploited.

Although utilizing MDM can lead to significant advances in high-bandwidth optical communication and photonic computing, a major bottleneck to high-density integration is the large footprint usually associated with MDM-based nanophotonic devices. In addition, unlike the conventional MDM components used for optical communication, photonic computing often requires two or more traditional MDM building blocks to implement a single arithmetic operation. In this work, we present an end-to-end MDM-based photonic design that integrates the functionalities of multiple MDM blocks, resulting in a compact footprint for vector- and matrix-based SIMD computing applications. Combined with peripheral electronics and algorithms targeting our photonic platform, we have experimentally demonstrated vector dot products, complex number multiplication, and a computer vision task on a fabricated MDM-based photonic dot-product core.

2. PRINCIPLE OF OPERATION

A. MDM Coherent Photonic Dot-Product Core

Figure 2(a) shows an implementation of a photonic dot-product core based on conventional MDM components from optical communications. The elements in the vector, $a_{1}$ ($b_{1}$) and $a_{2}$ ($b_{2}$), are mapped to the electric field profiles of the fundamental ($ψ_{\mathrm{I}}$, TE0) and the second-order ($ψ_{\mathrm{II}}$, TE1) TE modes of a few-mode waveguide via mode multiplexers (MUXs). The mode-multiplexed photonic signals, $E_{a} = a_{1} ψ_{\mathrm{I}} + a_{2} ψ_{\mathrm{II}}$ and $E_{b} = b_{1} ψ_{\mathrm{I}} + b_{2} ψ_{\mathrm{II}}$, undergo coherent mixing via multimode interference (MMI), producing the electric fields on the upper and lower arms, $E_{p} = \frac{1}{\sqrt{2}} (E_{a} + i E_{b})$ and $E_{n} = \frac{1}{\sqrt{2}} (i E_{a} + E_{b})$. Based on the orthogonality between $ψ_{\mathrm{I}}$ and $ψ_{\mathrm{II}}$, the difference between the overall intensities of the upper and lower outputs, $I_{\mathrm{diff}} = {| E_{p} |}^{2} - {| E_{n} |}^{2}$, produces the dot product between vectors $\vec{a}$ and $\vec{b}$. The functionality of the conventional MDM dot-product design can be expressed as a Kronecker product (denoted by $\otimes$) between a 3 dB coupling matrix representing the MMI and an identity matrix representing the MUX, as in $\begin{bmatrix} E_{\mathrm{II} p} \\ E_{\mathrm{I} p} \\ E_{\mathrm{I} n} \\ E_{\mathrm{II} n} \end{bmatrix} = \frac{1}{\sqrt{2}} \left( \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right) \begin{bmatrix} a_{1} \\ a_{2} \\ b_{2} \\ b_{1} \end{bmatrix}.$ (1)

Figure 2. Photonic implementation of vector dot-product core based on mode-division multiplexing. (a) Implementation based on the traditional MDM infrastructure in optical communications, using two MUXs and one MMI. (b) An end-to-end photonic dot-product core that integrates the functionalities of two MUXs and one MMI. The inset shows the photonic structure from inverse design.

Using conventional MDM components, the dot-product core requires two MUXs and one MMI. Each MUX occupies a footprint of at least $20\ \mathrm{μm} \times 4\ \mathrm{μm}$ [17,18], as required by the adiabatic taper. A $2 \times 2$ MMI requires a footprint of $40\ \mathrm{μm} \times 6\ \mathrm{μm}$ [19,20] to match the first Talbot distance. The overall footprint of a conventional dot-product core is thus larger than $50\ \mathrm{μm} \times 10\ \mathrm{μm}$.

Figure 2(b) shows our topologically optimized mode-multiplexed photonic vector dot-product core, which integrates the functionalities of two MUXs and one MMI within a $5\ \mathrm{μm} \times 3\ \mathrm{μm}$ footprint. In contrast to the behavior of the electric field inside a conventional multimode photonic design [Fig. 2(b1)], in which the regions for mode multiplexing and mixing are clearly distinguishable, the integrated dot-product core does not perform an intermediate conversion of the input electric fields onto the spatial mode basis. The end-to-end transformation of the electric field by the integrated core, expressed as matrix $S_{i}$ in $\begin{bmatrix} E_{\mathrm{II} p} \\ E_{\mathrm{I} p} \\ E_{\mathrm{I} n} \\ E_{\mathrm{II} n} \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 0 & i \\ 0 & 1 & i & 0 \\ 0 & i & 1 & 0 \\ i & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_{1} \\ a_{2} \\ b_{2} \\ b_{1} \end{bmatrix},$ (2) contributes to its compact footprint.
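As a numerical illustration of Eq. (2), the following minimal sketch applies the ideal matrix $S_{i}$ to four inputs and forms the balanced intensity difference. It relies on one assumption that the text does not state: the $b$ inputs carry a fixed quadrature phase ($-i$) relative to the $a$ inputs, so that the difference becomes $2(a_{1} b_{1} + a_{2} b_{2})$ for real inputs. The names `S_IDEAL`, `balanced_output`, and `b_phase` are hypothetical.

```python
import math

# Ideal end-to-end transfer matrix S_i from Eq. (2). Rows are the output
# amplitudes (E_IIp, E_Ip, E_In, E_IIn); columns are the inputs (a1, a2, b2, b1).
S = 1 / math.sqrt(2)
S_IDEAL = [
    [S, 0, 0, S * 1j],
    [0, S, S * 1j, 0],
    [0, S * 1j, S, 0],
    [S * 1j, 0, 0, S],
]

def balanced_output(a, b, b_phase=-1j):
    """Return I_diff = |E_p|^2 - |E_n|^2 for two-element vectors a and b.

    b_phase is an ASSUMED quadrature offset on the b inputs (not stated in
    the text). With it, I_diff = 2 * (a1*b1 + a2*b2) for real a and b.
    """
    inputs = [a[0], a[1], b_phase * b[1], b_phase * b[0]]
    out = [sum(row[k] * inputs[k] for k in range(4)) for row in S_IDEAL]
    # The two spatial modes are orthogonal, so their powers add in each arm.
    upper = abs(out[0]) ** 2 + abs(out[1]) ** 2
    lower = abs(out[2]) ** 2 + abs(out[3]) ** 2
    return upper - lower

# Example: (1, 2) . (2, 1) = 4, recovered as I_diff / 2.
dot = balanced_output((1, 2), (2, 1)) / 2
```

In an experiment the factor of 2 and any detector responsivity would be absorbed into a calibration constant.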

The ultracompact footprint addresses one of the fundamental bottlenecks for utilizing MDM-based approaches for photonic computing and paves the way for high-density integration of the core in a parallel computing array.

B. End-to-End Design of Photonic Dot-Product Core

The photonic core was inversely designed on a silicon-on-insulator (SOI) platform by optimizing the structure to maximize the coupling efficiency from the inputs into the target electric field profiles. The design process follows a gradient-based paradigm that uses the distribution of the relative permittivity $ε_{r}$ on the silicon layer as the design parameters [21–24]. The parameters are updated along the gradient direction of the objective function $l = \sum_{j = 1}^{J} {| E_{t_{j}}^{†} E_{j} (ε_{r}) |}^{2}.$ (3)

Here, $l$ sums the squared magnitudes of the overlap integrals between each target field $E_{t_{j}}$ at the output location and the corresponding field $E_{j}$ within the structure, $ε_{r}$ is the three-dimensional distribution of relative permittivity, and ${(\cdot)}^{†}$ denotes the conjugate transpose. We set the fundamental or second-order TE eigenmodes in the few-mode output waveguides as the target fields $E_{t_{j}}$. The summation over $j$ aggregates the contributions from all four pairs of output and target fields. The field $E_{j}$ in the device satisfies the finite-difference frequency-domain (FDFD) Maxwell equations in matrix form, expressed as $(D_{L} - \mathrm{diag} (ε_{r})) E_{j} (ε_{r}) = b_{j}.$ (4)

Here, $D_{L}$ is the finite-difference matrix for the three-dimensional vector electric field, representing the operator $\frac{1}{k_{0}} \nabla \times \nabla \times$ with perfectly matched layers (PMLs) on the boundary of the solution domain [25]. $\mathrm{diag} (ε_{r})$ represents a diagonal matrix constructed from the vectorized $ε_{r}$. $k_{0}$ is the wavenumber in vacuum. $b_{j}$ denotes the input excitation that induces the field $E_{j}$ within the device and is derived from the fundamental TE mode of the input waveguide based on the total-field/scattered-field technique [26].

Combining Eqs. (3) and (4), the gradient of the objective function with respect to $ε_{r}$ can be derived as $\nabla l (ε_{r}) = \sum_{j = 1}^{J} 2\, \mathrm{Re} \{ \mathrm{diag} (E_{j}^{*}) {(D_{L} - \mathrm{diag} (ε_{r}))}^{- 1} E_{t_{j}} \}.$ (5)

The inverse problems ${(D_{L} - \mathrm{diag} (ε_{r}))}^{- 1} E_{t_{j}}$ and ${(D_{L} - \mathrm{diag} (ε_{r}))}^{- 1} b_{j}$ were both solved using the least-squares method [27], and the computations were carried out on $J = 4$ parallel GPUs (NVIDIA RTX 3090). The relative permittivity $ε_{r}$ is updated along the gradient direction with an adaptive step size $τ$ as $ε_{r} \leftarrow ε_{r} + τ \nabla l (ε_{r})$. To promote a binary medium (air and silicon) on the silicon layer, the updated $ε_{r}$ is mapped by a sigmoid function to produce the relative permittivity for the next iteration, expressed as $ε_{r}' = \frac{ε_{\mathrm{Si}} - ε_{\mathrm{air}}}{1 + \exp \left( - γ \left( ε_{r} - \frac{ε_{\mathrm{air}} + ε_{\mathrm{Si}}}{2} \right) \right)} + ε_{\mathrm{air}}.$ (6)

Here, $ε_{Si}$ and $ε_{air}$ are the relative permittivity of silicon and air, respectively, and $γ = 4$ is a hyperparameter that controls the slope of the sigmoid function.
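The update rule and the sigmoid mapping of Eq. (6) can be sketched in a few lines. This is only an illustration of one iteration on a flat list of permittivity values; the gradient is taken as given (in the paper it comes from Eq. (5)), and the numerical value used for $ε_{Si}$ is an assumption (roughly $3.48^{2}$ near 1550 nm), as are the names `project` and `update`.

```python
import math

EPS_SI = 12.1   # ASSUMED relative permittivity of silicon (about 3.48**2)
EPS_AIR = 1.0   # relative permittivity of air
GAMMA = 4.0     # sigmoid slope hyperparameter quoted in the text

def project(eps_r):
    """Sigmoid mapping of Eq. (6): pushes eps_r toward EPS_AIR or EPS_SI."""
    midpoint = 0.5 * (EPS_AIR + EPS_SI)
    return (EPS_SI - EPS_AIR) / (1.0 + math.exp(-GAMMA * (eps_r - midpoint))) + EPS_AIR

def update(eps, grad, tau):
    """One gradient-ascent step with step size tau, followed by the mapping."""
    return [project(e + tau * g) for e, g in zip(eps, grad)]

# Hypothetical two-pixel example: one pixel drifts toward silicon, one toward air.
new_eps = update([7.0, 6.0], [1.0, -1.0], tau=0.5)
```

Values above the midpoint $(ε_{air} + ε_{Si})/2$ move toward silicon and values below it move toward air, which is how the mapping promotes a binary structure.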

The inversely designed photonic dot-product core was fabricated on commercially available SOI wafers, which consisted of 250 nm of silicon on top of a 3 μm buried oxide. The core was patterned in a positive-tone ZEP resist by electron beam lithography and then etched by inductively coupled plasma reactive ion etching. To realize the subwavelength-sized and -spaced features of the inversely designed structure, short-range proximity correction was used to vary the exposure dose across the device. The core consisted of four single-mode input waveguides (480 nm in width) and two few-mode output waveguides (774 nm in width). The two few-mode output waveguides were each tapered to a $40\ \mathrm{μm} \times 40\ \mathrm{μm}$ photonic crystal structure [28,29], which vertically couples out the electric field profiles for observation by a microscope imaging system. Details of the photonic crystal design and simulation results can be found in Appendix B.

A microscope setup was used to experimentally characterize the fabricated vector dot-product core. The four input ports of the core were edge-coupled to a fiber array that provided four modulated inputs, each generated by an independent off-chip Mach–Zehnder modulator (MZM, JDSU IOAP-MOD9140). The modulators were driven by a multichannel digital-to-analog converter (DAC, Analog Devices, MAX11300), which was controlled by a microcontroller (Analog Devices, SDP-CK1Z). The intensity profiles on the two vertical output couplers were recorded from above through a long-working-distance $20\times$ objective and a tube lens onto a short-wave infrared (SWIR) camera (Allied Vision, Goldeye CL-008 TEC1). The camera and DAC were synchronized to perform 100 multiplications per second, limited by the frame rate of the SWIR camera.

3. EXPERIMENTAL RESULTS

A. Characterization of the Fabricated Dot-Product Core

Figure 3(a) shows a microscope image of the photonic core under our characterization setup. Figure 3(b) plots the intensity profiles at the output coupler when the first two single-mode input arms, $a_{1}$ and $a_{2}$ , were individually activated in the experiments. The intensity profiles match the target spatial profiles of the fundamental and second-order modes. To quantify the computing performance, we simulated the electromagnetic (EM) behavior of the designed and fabricated core using Ansys Lumerical FDTD software based on the design and scanning electron microscope (SEM) image, respectively. TE fundamental modes were launched into each single-mode input waveguide, and the resulting field profiles at a nominal operating wavelength of 1570 nm are shown in Figs. 3(c) and 3(d). The orange boxes show the cross-section of the electrical field profiles marked by the dashed lines.

Figure 3. Characterization of the fabricated dot-product core. (a) Microscope image of the fabricated dot-product core under test. (b) Experimentally observed intensity profiles on the two output couplers when the inputs $a_{1}$ and $a_{2}$ were individually excited. (c) Structure of the ideal inversely designed dot-product core and simulated electrical field profiles within the core. (d) SEM image of the fabricated dot-product core and simulated electrical field profiles within the core based on the SEM image. The side views show the electrical field profiles at the location marked by the orange dashed line.

The transfer matrices $S_{t}$ of the designed and fabricated cores can be calculated from the overlap integral [30] between the cross-sectional electric fields and the two TE eigenmodes in the top and bottom arms. Both matrices share the same structure as the ideal transfer matrix $S_{i}$ in Eq. (2). The ideal inversely designed core features a symmetric design with $< 10\%$ crosstalk, as indicated by the off-diagonal elements. The power-splitting ratios between the top and bottom arms are both approximately 46% versus 54% for the fundamental and second-order TE modes. The fabricated core maintains relatively low crosstalk, with a maximum of 13.5% in the off-diagonal elements. Its power-splitting ratios are 41% versus 59% and 49% versus 51% for the fundamental and second-order TE modes, respectively.

The insertion loss and crosstalk of the designed and fabricated cores can be quantified by the crosstalk matrix $M_{X}$, whose elements are the overlaps between the columns of the transfer matrices of the ideal ($S_{i}$) and the designed (or fabricated) device ($S_{t}$), expressed as $M_{X} [i, j] = {S_{t} [:, i]}^{*} \cdot S_{i} [:, j].$ (7)

Here, $[:, i]$ extracts the $i$-th column vector from the matrix. The insertion loss (IL) and crosstalk (XT) can both be derived from $M_{X}$ as [30] $\mathrm{IL\ (dB)} = - 10 \log_{10} (\text{max eigenvalue of } M_{X}); \quad \mathrm{XT\ (dB)} = - 10 \log_{10} \left( \frac{\text{power in the diagonals of } M_{X}}{\text{power in the off-diagonals of } M_{X}} \right).$ (8)
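For illustration, Eq. (7) and the crosstalk part of Eq. (8) can be evaluated as in the sketch below. Two points are assumptions rather than statements from the text: "power" is interpreted as the sum of squared magnitudes of the matrix elements, and the perturbed matrix `s_t` is a made-up example, not a measured transfer matrix. The insertion loss is omitted because it needs an eigenvalue solver.

```python
import math

def crosstalk_matrix(s_t, s_i):
    """M_X[i][j] = conj(S_t[:, i]) . S_i[:, j], following Eq. (7)."""
    n = len(s_t)
    return [[sum(s_t[k][i].conjugate() * s_i[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def crosstalk_db(m_x):
    """XT of Eq. (8), ASSUMING 'power' means the sum of |element|^2."""
    n = len(m_x)
    diag = sum(abs(m_x[i][i]) ** 2 for i in range(n))
    off = sum(abs(m_x[i][j]) ** 2
              for i in range(n) for j in range(n) if i != j)
    return -10 * math.log10(diag / off)

# Ideal transfer matrix S_i from Eq. (2).
S = 1 / math.sqrt(2)
s_i = [
    [S, 0, 0, S * 1j],
    [0, S, S * 1j, 0],
    [0, S * 1j, S, 0],
    [S * 1j, 0, 0, S],
]

# Hypothetical device: input 1 leaks an amplitude of 0.1 into output row 2.
s_t = [row[:] for row in s_i]
s_t[1][0] = 0.1

m_x = crosstalk_matrix(s_t, s_i)
xt = crosstalk_db(m_x)  # about -26 dB for this made-up perturbation
```

For a device identical to the ideal one, $M_{X}$ reduces to the identity matrix and the off-diagonal power vanishes, so the ratio in Eq. (8) is only meaningful for a nonideal $S_{t}$.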

The crosstalk matrices of both the ideal and fabricated dot-product cores at the nominal operating wavelength of 1570 nm are shown in Fig. 4(a). Figure 4(b) plots the insertion loss and crosstalk of the designed and fabricated cores as a function of wavelength. The ideal dot-product core design features a consistent 2.3 dB insertion loss and a crosstalk of $< -13\ \mathrm{dB}$ ($< 5\%$) across the wavelength range of 1540 nm to 1590 nm. The fabricated core maintains a consistent insertion loss and crosstalk within the wavelength range of 1550 nm to 1580 nm, suggesting broadband performance that supports wavelength-multiplexed inputs. Despite the uneven splitting of the input fields into the upper and lower arms, the crosstalk between the two spatial modes in the output waveguides is $-9.06\ \mathrm{dB}$, or 12.4%. The low crosstalk allows us to empirically correct most of the computing errors, as described in Appendix A.

Figure 4. Characterization of (a1), (b1) designed and (a2), (b2) fabricated dot-product core. (a) Crosstalk matrix $M_{X}$ of the core. (b) Insertion loss and crosstalk (in dB) as a function of wavelength.

B. General-Purpose Computing Examples

The core supports dot products between two-element vectors with fixed-point precision, enabling the deployment of general-purpose computing tasks such as complex number multiplication and optical flow calculation. To carry out general-purpose dot-product calculations, $(a_{1}, a_{2}) \cdot {(b_{1}, b_{2})}^{T}$, on the photonic core, we calibrated the four MZMs to generate five signed linear analog levels representing the integers from $-2$ to 2 on each input. The four input ports of the inversely designed photonic structure receive the modulated optical signals representing $a_{1}$, $a_{2}$, $b_{2}$, and $b_{1}$, respectively, from top to bottom. The intensity differences between the two output couplers were proportionally mapped to the dot products, using the output from $(1, 0) \cdot {(1, 0)}^{T}$ as the scale reference. Figure 5(a) plots a time-division multiplexing (TDM) sequence of 16 dot products performed on the photonic core. We quantify the computing error with the normalized mean square error (NMSE) between the ground truth $Y_{\mathrm{gt}}$ and the experimental $Y_{\mathrm{exp}}$ dot products, defined as $\mathrm{NMSE} = \frac{\sum_{k = 1}^{K} {| Y_{k, \mathrm{exp}} - Y_{k, \mathrm{gt}} |}^{2}}{\sum_{k = 1}^{K} {| Y_{k, \mathrm{gt}} |}^{2}}.$ (9)

Figure 5. General-purpose computing examples as dot-products on the photonic core. (a) Dot-product calculation of a sequence of 16 two-element vectors. (b) Complex number multiplications encoded as two equivalent dot-products in time-division multiplexing. (c1), (c2) Multiplication results between 16 complex numbers. Blue circles indicate ground truth results, green circles indicate simulated results from the ideal inversely designed core in (b), and red circles indicate experimental results calculated on the fabricated dot-product core.

Here, the summation is performed over all $K$ symbols in the sequence. The NMSE of all multiplications was 6.32%, offering sufficient dynamic range to represent signed integers from $-8$ to 8 (signed 4-bit) in the dot-product results.
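The NMSE of Eq. (9) is straightforward to compute from two equally long sequences. The sketch below is a direct transcription; the function name `nmse` and the sample values are illustrative only and are not measured data.

```python
def nmse(y_exp, y_gt):
    """Normalized mean square error of Eq. (9) between two sequences."""
    num = sum(abs(e - g) ** 2 for e, g in zip(y_exp, y_gt))
    den = sum(abs(g) ** 2 for g in y_gt)
    return num / den

# Illustrative values only: one symbol is off by 0.1, the other is exact.
error = nmse([1.1, -2.0], [1.0, -2.0])  # 0.01 / 5 = 0.002
```

Because `abs` also works on complex numbers, the same function applies to the complex products discussed next.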

We first applied the photonic dot-product core to perform complex number multiplication [i.e., $(a + b i) \times (c + d i)$]. The real and imaginary parts of the result [$(a c - b d) + (a d + b c) i$] are split into two equivalent dot products encoded in a TDM symbol sequence. Sixteen complex number pairs, represented by a sequence of 32 dot products, were multiplied on the core. Figure 5(c) compares the products from the ideal and fabricated cores with the ground truth on the complex plane. The designed dot-product core shows good agreement with the ground truth, with an NMSE of 4.0%, suggesting that the design can reach a dynamic range of 25 signed levels, which exceeds signed 4-bit precision. The NMSE between the ground truth and the experimental complex products is 15.9%, which is consistent with the simulation of the fabricated dot-product core. The computing error is primarily attributed to the fabrication deviation from the ideal design and to the time-varying phase instability of the off-chip fiber inputs. The phase stability can be improved by switching to on-chip modulators. The fabrication deviation can be compensated with additional phase modulation on each input, which can be generated by integrated thermo-optic phase shifters.
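The mapping from one complex multiplication to two dot products can be written out explicitly. In this minimal sketch the helper `dot2` is a software stand-in for a dot product evaluated on the photonic core, and the particular pairing of operands is one possible choice that the text does not specify.

```python
def dot2(x, y):
    """Stand-in for one two-element dot product evaluated on the core."""
    return x[0] * y[0] + x[1] * y[1]

def complex_multiply(a, b, c, d, dot=dot2):
    """Compute (a + bi)(c + di) as two dot products (two TDM symbols)."""
    real = dot((a, b), (c, -d))   # ac - bd
    imag = dot((a, b), (d, c))    # ad + bc
    return real, imag

# Operands are kept within the integer levels -2..2 used in the experiment.
real, imag = complex_multiply(1, 2, 2, -1)  # (1 + 2i)(2 - i) = 4 + 3i
```

Negating $d$ keeps the operand within the same signed range, so both dot products use only the five calibrated input levels.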

In addition, we have demonstrated a computer vision task using the photonic dot-product core. Specifically, we use the device to calculate the optical flow in a visual scene to quantify the motion of an object. The real-time calculation of the optical flow in a dynamic environment plays an important role in motion detection and object tracking in computer vision systems [31,32]. Here, we calculated the optical flow of selected edge pixels between two adjacent two-dimensional frames, $I_{1} (x, y)$ and $I_{2} (x, y)$, from a 10 frames-per-second spinning wheel animation on the dot-product core. The flow vector ${(u, v)}^{T}$ satisfies $(d_{x}, d_{y}) \cdot {(u, v)}^{T} = - d_{t}$, where $d_{x}$, $d_{y}$, and $d_{t}$ are the finite differences of the image $I_{t} (x, y)$ along the $x$, $y$, and $t$ dimensions, respectively [33]. Because a single pixel does not uniquely determine ${(u, v)}^{T}$, we expand the optical flow equation onto the diagonal pixels in a $2 \times 2$ window, as $\begin{bmatrix} d_{x 11} & d_{y 11} \\ d_{x 22} & d_{y 22} \end{bmatrix} \cdot \begin{bmatrix} u \\ v \end{bmatrix} = - \begin{bmatrix} d_{t 11} \\ d_{t 22} \end{bmatrix}.$ (10)

Assuming uniform flow vectors in the $2 \times 2$ window, the calculations are broken down into two parts: (i) on the two pixels [marked in gray in Fig. 6(b)] along the primary diagonal, using $d_{x 11}$, $d_{x 22}$, $d_{y 11}$, $d_{y 22}$, $d_{t 11}$, and $d_{t 22}$; and (ii) on the two pixels [marked in white in Fig. 6(b)] along the secondary diagonal, using $d_{x 12}$, $d_{x 21}$, $d_{y 12}$, $d_{y 21}$, $d_{t 12}$, and $d_{t 21}$. Results from the primary and secondary diagonals are averaged to obtain the flow vector within the $2 \times 2$ window. Equation (10) can be solved using Cramer’s rule, written as $u = - \frac{\begin{vmatrix} d_{t 11} & d_{y 11} \\ d_{t 22} & d_{y 22} \end{vmatrix}}{\begin{vmatrix} d_{x 11} & d_{y 11} \\ d_{x 22} & d_{y 22} \end{vmatrix}}, \quad v = - \frac{\begin{vmatrix} d_{x 11} & d_{t 11} \\ d_{x 22} & d_{t 22} \end{vmatrix}}{\begin{vmatrix} d_{x 11} & d_{y 11} \\ d_{x 22} & d_{y 22} \end{vmatrix}}.$ (11)

Figure 6. Optical flow calculation between two adjacent frames of a spinning wheel animation. (a) Two frames from the spinning wheel animation with 100 ms interval (10 frames per second). (b) Optical flow vector of eight edge pixels. Red arrows indicate the ground truth of the flow vectors, and orange arrows indicate experimental results calculated on the photonic dot-product core. (c) Comparison of the calculated angular speed on the dot-product core with ground truth.

Here, all the $2 \times 2$ determinants $\begin{vmatrix} a & b \\ c & d \end{vmatrix}$ are computed as their equivalent dot products $a d - b c$. The flow vector within one $2 \times 2$ window requires six dot-product calculations encoded in a TDM sequence.
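The procedure of Eqs. (10) and (11) can be sketched as follows, with each determinant expressed as a single two-element dot product. The helper `dot2` again stands in for a dot product computed on the core, and all function names are hypothetical. The sketch does not guard against a zero denominator, which occurs when the two gradient vectors on a diagonal are parallel.

```python
def dot2(x, y):
    """Stand-in for one two-element dot product evaluated on the core."""
    return x[0] * y[0] + x[1] * y[1]

def det2(a, b, c, d):
    """Determinant |a b; c d| = a*d - b*c, written as (a, b) . (d, -c)."""
    return dot2((a, b), (d, -c))

def flow_from_diagonal(dx1, dy1, dt1, dx2, dy2, dt2):
    """Cramer's rule of Eq. (11) for the two pixels on one diagonal."""
    den = det2(dx1, dy1, dx2, dy2)
    u = -det2(dt1, dy1, dt2, dy2) / den
    v = -det2(dx1, dt1, dx2, dt2) / den
    return u, v

def flow_in_window(primary, secondary):
    """Average the primary- and secondary-diagonal results (six dot products)."""
    u1, v1 = flow_from_diagonal(*primary)
    u2, v2 = flow_from_diagonal(*secondary)
    return 0.5 * (u1 + u2), 0.5 * (v1 + v2)

# Synthetic check: differences generated from a known flow (u, v) = (1, 2).
u, v = flow_from_diagonal(1, 0, -1, 0, 1, -2)
```

Each diagonal uses three determinants (one shared denominator and two numerators), which accounts for the six dot products per window.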

Figure 6(b) shows the optical flow vectors within eight ($J = 8$) $2 \times 2$ windows on the edges of the spinning wheel. We quantify the error in the flow vector calculation using the mean cosine similarity, defined as $S_{c} = \frac{1}{J} \sum_{j = 1}^{J} \frac{(u_{j, \mathrm{ideal}}, v_{j, \mathrm{ideal}}) \cdot (u_{j, \mathrm{exp}}, v_{j, \mathrm{exp}})}{\| (u_{j, \mathrm{ideal}}, v_{j, \mathrm{ideal}}) \| \, \| (u_{j, \mathrm{exp}}, v_{j, \mathrm{exp}}) \|}.$ (12)

Here, $(u_{j, \mathrm{ideal}}, v_{j, \mathrm{ideal}})$ and $(u_{j, \mathrm{exp}}, v_{j, \mathrm{exp}})$ represent the ideal flow vector and the one calculated on the fabricated dot-product core, respectively. The mean cosine similarity is 81.8%, suggesting that the flow vectors calculated on the photonic dot-product core have captured the correct rotation direction. The mean magnitude of the flow vectors reflects the angular speed of the wheel, which is 1.32 rad/s based on the optical flow calculated on the fabricated dot-product core. Compared to the ground truth of 1.41 rad/s, the relative error of the angular speed calculation is 6.8%. This example illustrates that the fixed-point dot-product core can be used in conjunction with a tailored algorithm to extract features of interest in computer vision tasks.
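The metric of Eq. (12) can be computed as in the short sketch below. The function name and the sample vectors are illustrative and are not the measured flow vectors.

```python
import math

def mean_cosine_similarity(ideal, measured):
    """Mean cosine similarity of Eq. (12) over paired 2D flow vectors."""
    total = 0.0
    for (ui, vi), (ue, ve) in zip(ideal, measured):
        total += (ui * ue + vi * ve) / (math.hypot(ui, vi) * math.hypot(ue, ve))
    return total / len(ideal)

# Illustrative vectors: the first pair is parallel, the second is 45 degrees apart.
s_c = mean_cosine_similarity([(1, 0), (0, 1)], [(2, 0), (1, 1)])
```

The metric depends only on direction, which is why the magnitude of the flow is assessed separately through the angular speed.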

4. CONCLUSION

In summary, we have demonstrated a compact, integrated photonic dot-product core obtained from inverse design. The core utilizes the spatial mode as the multiplexing dimension to perform arbitrary two-element vector dot products. To account for the difference between the designed and fabricated photonic structures, calibration and error correction routines have been developed and tested on the fabricated dot-product core. We have demonstrated an equivalent signed 4-bit precision in the dot-product results, and successfully deployed a general-purpose complex number multiplier and an optical flow calculator on the fabricated device.

The miniaturized footprint enables the large-scale integration of the core as one of the photonic primitives in an electronic-photonic co-packaged parallel computing array on modern CMOS-compatible platforms. Combined with on-chip modulators and multimode photodiodes [34], our current design could support a computing speed on the order of $10^{9}$ dot products per second using modern gigabaud optoelectronics. By further integrating dense wavelength-division multiplexing (DWDM) channels and spatial modes as super-dimensions in photonic matrix- and tensor-based processors [10], our strategy could enable a computing throughput on the order of $10^{3}\ \mathrm{TOPS}/\mathrm{mm}^{2}$, which is orders of magnitude higher than that of dedicated electronic vector/matrix accelerators [4,35].

Acknowledgment

R.S. acknowledges the support of the Laboratory Directed Research and Development program at Sandia National Laboratories, a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration. This work was performed in part at the Center for Integrated Nanotechnologies, an Office of Science User Facility operated for the U.S. Department of Energy (DOE) Office of Science. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Category: Silicon Photonics

Received: Apr. 3, 2024

Accepted: Jul. 22, 2024

Published Online: Oct. 8, 2024

The Author Email: Zheyuan Zhu (zheyuan.zhu@ucf.edu)

CSTR:32188.14.PRJ.524419
