Opto-Electronic Advances, Volume 8, Issue 1, 240267-1 (2025)

Advancing depth perception in spatial computing with binocular metalenses

Junkyeong Park1, Gyeongtae Kim1, and Junsuk Rho1,2,3,4,*
Author Affiliations
  • 1Department of Mechanical Engineering, Pohang University of Science and Technology (POSTECH), Pohang 37673, Republic of Korea
  • 2Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Pohang 37673, Republic of Korea
  • 3Department of Electrical Engineering, Pohang University of Science and Technology (POSTECH), Pohang 37673, Republic of Korea
  • 4POSCO-POSTECH-RIST Convergence Research Center for Flat Optics and Metaphotonics, Pohang 37673, Republic of Korea

    Spatial computing and augmented reality are advancing rapidly, with the goal of seamlessly blending virtual and physical worlds. However, traditional depth-sensing systems are bulky and energy-intensive, limiting their use in wearable devices. To overcome this, recent research by X. Liu et al. presents a compact binocular metalens-based depth perception system that integrates efficient edge detection through an advanced neural network. This system enables accurate, real-time depth mapping even in complex environments, enhancing potential applications in augmented reality, robotics, and autonomous systems.

    The research team tested the metalens system's capabilities across various challenging scenarios. For example, plastic sheets with transparent backgrounds, printed with "RIKEN" and "CITYU" and placed at 16.0 cm and 12.8 cm, were accurately distinguished. Similarly, in a scene containing a 3D building sketch at 17.3 cm and toy cars at 12.9 cm and 15.7 cm, the system precisely captured the depth differences (Fig. 2). Further tests included architectural sketches at 13.5 cm and 16.5 cm, as well as a toy car body spanning depths from 12.5 cm to 15.5 cm. The system's ability to resolve these depth variations, coupled with a processing time of under 0.15 seconds, highlights its potential for time-sensitive, real-time applications such as AR, robotics, and autonomous navigation.

    The researchers developed H-Net, a stereo-matching neural network that learns end-to-end from stereo input images to disparity map prediction. The network addresses the limitations of traditional stereo-matching algorithms, which frequently struggle in ambiguous settings such as flat, textureless surfaces or reflective areas where accurate depth data are difficult to capture. To overcome these difficulties, H-Net's architecture incorporates a novel "H-Module", which includes cross-pixel and cross-view interaction mechanisms to aggregate contextual information and combine the two views effectively. These interactions allow the system to dynamically prioritize contextual information from the stereo views, enhancing edge detection within the images.
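
    To make the cross-pixel and cross-view interaction idea concrete, the minimal sketch below (in PyTorch) pools global context from both views and uses it to re-weight each view's feature map. The module name, channel sizes, and gating scheme are illustrative assumptions for exposition, not the authors' actual H-Module implementation.

```python
# Hedged sketch of a cross-view interaction block: pool context from both views
# (cross-pixel), then gate each view's features with weights informed by the
# other view (cross-view). Names and dimensions are assumptions, not H-Net's.
import torch
import torch.nn as nn

class CrossViewInteraction(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global (cross-pixel) context per channel
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2 * channels, 1), nn.Sigmoid(),
        )

    def forward(self, feat_left, feat_right):
        # Joint context from both views -> per-channel gates for each view.
        ctx = torch.cat([self.pool(feat_left), self.pool(feat_right)], dim=1)
        gate_left, gate_right = self.mlp(ctx).chunk(2, dim=1)
        return feat_left * gate_left, feat_right * gate_right

if __name__ == "__main__":
    f_l = torch.randn(1, 32, 60, 80)  # hypothetical left-view feature map
    f_r = torch.randn(1, 32, 60, 80)  # hypothetical right-view feature map
    out_l, out_r = CrossViewInteraction(32)(f_l, f_r)
    print(out_l.shape, out_r.shape)   # torch.Size([1, 32, 60, 80]) each
```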

    The fields of spatial computing1 and augmented reality (AR) are on the cusp of a transformative evolution, as researchers and engineers develop technologies to seamlessly blend the virtual and physical worlds. To make this vision a reality, devices must accurately perceive depth, allowing virtual objects to integrate naturally within real-world environments. Methods for achieving depth perception include stereo cameras2, structured light3, and time-of-flight systems4. Among these, stereo cameras are widely used because they do not require an active light source, resulting in lower power consumption. However, traditional lenses tend to be thick and heavy, which is a significant drawback.
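
    For readers unfamiliar with passive stereo, the underlying relation is simple triangulation: depth Z = f·B/d, where f is the focal length (in pixels), B the baseline between the two apertures, and d the measured disparity. The short sketch below evaluates this relation with assumed camera parameters; none of the numbers are taken from the paper.

```python
# Triangulated depth from stereo disparity: Z = f * B / d.
# Focal length and baseline below are assumed, illustrative values only.
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return depth in metres from disparity in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

if __name__ == "__main__":
    for d in (20.0, 40.0, 80.0):
        z = depth_from_disparity(d, focal_px=1000.0, baseline_m=0.01)
        print(f"disparity {d:5.1f} px -> depth {z:.3f} m")
```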

    The physical device relies on a binocular metalens, a 532 nm filter, and a CMOS sensor that together function like human binocular vision (Fig. 1). The metalens employs an array of nanopillars crafted from gallium nitride (GaN) on a sapphire substrate. Unlike conventional lenses, which often sacrifice form for function, these metalenses are exceptionally thin, lightweight, and efficient. Each metalens has a diameter of 2.6 mm, a volume of 4.25×10−6 cm3, and a weight of 2.61×10−5 g, less than one percent of the weight of a typical human hair. This enables the device to capture depth data with greater accuracy and clarity, all within a design optimized for portability and comfort.
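
    As a rough plausibility check on the quoted geometry, dividing the stated mass by the stated volume gives an implied density of about 6.1 g/cm3, close to the bulk density of GaN; the reference value below is an assumed textbook figure, and the check ignores the sapphire substrate and the nanopillar fill factor.

```python
# Back-of-envelope check of the quoted metalens volume and mass.
volume_cm3 = 4.25e-6                    # quoted volume of one metalens
mass_g = 2.61e-5                        # quoted mass of one metalens
implied_density = mass_g / volume_cm3   # ~6.14 g/cm^3
GAN_BULK_DENSITY = 6.15                 # g/cm^3, assumed bulk value for GaN
print(f"implied density: {implied_density:.2f} g/cm^3 "
      f"(bulk GaN ~ {GAN_BULK_DENSITY} g/cm^3)")
```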


    Figure 2. Edge-enhanced depth perception of various objects. The first image displays the original left view, while the second shows the corresponding depth map. The third presents the edge-enhanced depth map, using the shared color scale shown to its right. The fourth integrates the raw image with the edge-enhanced depth map to provide a combined view.


    Figure 1. Schematic of edge-enhanced spatial computing with the binocular metalens.

    The H-Module thus filters out unreliable depth data from non-textured regions, producing a more precise and stable depth map. This edge-based approach to depth perception, enhanced by cross-pixel and cross-view interaction, minimizes errors in depth prediction. Traditional depth-sensing algorithms often struggle in low-texture regions, which lack sufficient feature points for depth mapping. With the H-Module in H-Net, contextual information is preserved and processed through a 4D cost volume built from the left and right image features; depth estimation is then refined by a 3D CNN and passed through a disparity regression module to produce the final disparity map. This approach allows the system to consistently produce accurate depth perception even in complex environments, preserving the edges that indicate depth shifts and yielding a more realistic 3D model of the scene.
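
    The sketch below illustrates the generic stereo pipeline this paragraph outlines: shift the right-view features over candidate disparities to build a concatenation-based cost volume, regularize it (a single 3D convolution stands in for the full 3D CNN), and regress disparity with a soft-argmin. The disparity range, feature shapes, and layer choices are assumptions for illustration, not H-Net's actual configuration.

```python
# Hedged sketch of cost-volume construction and soft-argmin disparity regression
# (PyTorch). Shapes, max_disp, and the single Conv3d are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_cost_volume(feat_l, feat_r, max_disp):
    """Concatenation-based 4D cost volume of shape (B, 2C, max_disp, H, W)."""
    b, c, h, w = feat_l.shape
    cost = feat_l.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, :c, d] = feat_l
            cost[:, c:, d] = feat_r
        else:
            cost[:, :c, d, :, d:] = feat_l[:, :, :, d:]
            cost[:, c:, d, :, d:] = feat_r[:, :, :, :-d]
    return cost

def soft_argmin_disparity(scores):
    """scores: (B, max_disp, H, W) -> expected disparity map (B, H, W)."""
    prob = F.softmax(scores, dim=1)
    disp = torch.arange(scores.size(1), dtype=scores.dtype,
                        device=scores.device).view(1, -1, 1, 1)
    return (prob * disp).sum(dim=1)

if __name__ == "__main__":
    feat_l = torch.randn(1, 16, 32, 64)   # hypothetical left feature map
    feat_r = torch.randn(1, 16, 32, 64)   # hypothetical right feature map
    cost = build_cost_volume(feat_l, feat_r, max_disp=24)    # (1, 32, 24, 32, 64)
    regularize = nn.Conv3d(32, 1, kernel_size=3, padding=1)  # stand-in for the 3D CNN
    scores = regularize(cost).squeeze(1)                      # (1, 24, 32, 64)
    disparity = soft_argmin_disparity(-scores)                # lower cost -> higher weight
    print(disparity.shape)                                    # torch.Size([1, 32, 64])
```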

    Metasurfaces have emerged as a promising alternative to conventional optical components5-8. These ultra-thin, lightweight surfaces can manipulate light at a subwavelength scale, enabling advanced optical functionalities that were previously unattainable with traditional lenses. In depth-sensing applications, metasurfaces have been effectively applied to point spread function (PSF) engineering9 and structured light10,11, showing great potential for more compact and efficient depth perception systems. As the demand for lightweight, compact depth cameras grows, research on metasurface-based depth perception has accelerated. In a recent work published in Opto-Electronic Science12, X. Liu et al. introduced a groundbreaking binocular metalens-based depth perception system. This compact and lightweight solution promises to enhance next-generation wearable devices, bringing us closer to a more immersive and practical spatial computing experience.

    The fusion of spatial computing and meta-optics represents a new era, enabling devices like AR glasses and drones to seamlessly integrate digital data with the physical world. This compact, efficient system lays the foundation for immersive and practical devices, driving spatial computing’s potential in daily life. Innovations like the binocular metalens system are paving the way for real-time, high-quality 3D modeling across fields as diverse as robotics, healthcare, and virtual reality, marking an essential step toward a more integrated technological future.

    [1] G Yenduri, M Ramalinga, KR Maddikunta et al. Spatial computing: concept, applications, challenges and future directions (2024).

    Paper Information

    Category: Research Articles

    Received: Nov. 11, 2024

    Accepted: Nov. 15, 2024

    Published Online: Mar. 24, 2025

    The Author Email: Rho Junsuk (RhoJ)

    DOI: 10.29026/oea.2025.240267
