Advanced Imaging

Research Background

 

With the rapid advancement of artificial intelligence, face recognition technology has achieved significant improvements in accuracy and reliability, becoming a cornerstone of modern identity verification systems. It is widely deployed in security, finance, transportation, and other critical domains. However, the convenience and security it offers are accompanied by escalating risks of privacy breaches. Because the face is a unique and immutable biometric feature, leakage or misuse of facial data can have severe consequences for individuals. Balancing recognition accuracy with robust privacy protection remains a pressing challenge. Current solutions, spanning software and hardware approaches, face trade-offs among system power consumption, computational complexity, and recognition performance.

 

Recently, a research team led by Prof. Hongwei Chen at Tsinghua University published a study titled "Privacy-preserving face recognition with a mask-encoded microlens array" in Advanced Imaging. The team proposed a novel privacy-preserving face recognition system (MEM-FR) that encrypts facial data at the optical layer before signals reach the sensor. By integrating a mask-encoded microlens array (MLA) to perform optical dilated convolution, the system extracts facial features while inherently protecting privacy. This passive optical convolution under incoherent illumination achieves superior spatial resolution, enhanced optical throughput, and improved recognition accuracy. An end-to-end training strategy based on a dual cosine similarity loss function balances privacy preservation and recognition performance. The system demonstrated remarkable effectiveness, achieving 95.0% and 92.3% recognition accuracy in simulated and physical experiments, respectively.

 

Fig. 1 Face recognition framework of the MEM-FR system.

 

1. Mask-encoded microlens array for optical privacy protection

 

The MEM-FR system addresses the limitations of conventional privacy-preserving systems in spatial resolution, light throughput, and recognition performance by encrypting images optically before sensor acquisition. Building on the team's earlier lensless opto-electronic neural network (LOEN), the MEM-FR system incorporates an MLA aligned with the mask apertures to perform optical dilated convolution (5×5 kernel, dilation rate 8). This design yields three key advancements:

  1. Enhanced light throughput: The MLA's light-focusing capability allows larger mask apertures (300 μm vs. 30 μm in LOEN), increasing optical throughput by 4.4 times.
  2. Superior spatial resolution: By extending the mask-sensor distance to 14.2 mm (vs. 1 mm in LOEN) while maintaining point spread function (PSF) characteristics (Δ' ≈ 30 μm), the system achieves a theoretical 14.2-fold resolution improvement (160×160 pixels), validated by 4.0% and 5.0% accuracy gains in simulations and physical experiments.
  3. Sensitive information blurring: The MLA-mask synergy generates sparsely distributed PSFs on the sensor, mimicking dilated convolution to expand the receptive field while blurring sensitive details, thereby avoiding the collection of identifiable facial features.
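The dilated convolution described above can be illustrated numerically. The following is a minimal NumPy sketch, not the team's implementation: it assumes a single-channel 160×160 scene, a binary 5×5 mask pattern drawn at random (the trained mask is not given in this article), and the cross-correlation convention common in deep learning. The names `dilated_conv2d`, `scene`, and `mask_kernel` are hypothetical.

```python
import numpy as np

def dilated_conv2d(image, kernel, dilation):
    """Valid-mode 2D dilated cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    # Footprint of the dilated kernel: taps are spaced `dilation` pixels apart.
    span_h = (kh - 1) * dilation + 1
    span_w = (kw - 1) * dilation + 1
    out_h = image.shape[0] - span_h + 1
    out_w = image.shape[1] - span_w + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("image is smaller than the dilated kernel footprint")
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            if kernel[i, j] == 0:
                continue  # a closed mask aperture contributes no light
            r, c = i * dilation, j * dilation
            # Each open aperture adds a shifted, weighted copy of the scene.
            out += kernel[i, j] * image[r:r + out_h, c:c + out_w]
    return out

rng = np.random.default_rng(0)
scene = rng.random((160, 160))  # stand-in for a face image
# Assumption: a binary mask pattern, chosen at random for illustration only.
mask_kernel = rng.integers(0, 2, size=(5, 5)).astype(float)

encoded = dilated_conv2d(scene, mask_kernel, dilation=8)
footprint = (5 - 1) * 8 + 1  # receptive field of one output pixel, per axis
print(encoded.shape, footprint)
```

With a 5×5 kernel and dilation rate 8, each output pixel sums scene samples spread over a 33×33 window, which is how a sparse set of PSF spots widens the receptive field while superimposing shifted copies of the face.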

 

 

Fig. 2 Schematic diagram of the experimental setup. (a) MEM-FR prototype for face recognition. (b) Schematic diagram of the convolution calculation correction in the experiment. (c) Concept of the point spread function (PSF) formation. (d) Optical path comparison between the MEM-FR system (right) and LOEN system (left), where the MEM-FR system increases the spatial resolution from Δ to Δ∕14.2 and enlarges the aperture size of the mask from 30 to 300 μm, enhancing light throughput.

 

2. End-to-end optimization for performance-privacy balance

 

The team proposed an end-to-end joint optimization strategy to strengthen privacy protection while ensuring recognition accuracy through the co-training of an optical dilated convolution layer and a backend electronic neural network. The strategy employs a composite loss function comprising three components: (a) Task loss: A cross-entropy loss to measure the discrepancy between predictions and ground truth. (b) Dual cosine similarity loss: Constrains the similarity between encrypted images, their complementary counterparts, and raw data to prevent privacy leakage through complementary relationships. (c) Kernel weight gradient loss: Promotes spatial dispersion of optical convolution kernel weights, enhancing the irreversibility of optical encoding. These loss terms are jointly optimized with a weighted ratio of 1:20:0.02, achieving a balance between privacy protection and recognition performance.
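The 1:20:0.02 weighting of the three loss terms can be sketched as follows. This is a hedged illustration only: the article does not give the exact formulas, so the mean-centred form of the dual cosine term and the choice of `encoded.max() - encoded` as the complementary image are assumptions, and the kernel weight gradient term is passed in as a precomputed scalar rather than guessed. All function names are hypothetical.

```python
import numpy as np

def cross_entropy(logits, label):
    """Task loss: softmax cross-entropy for a single sample."""
    z = logits - logits.max()  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity of two images after mean-centring (assumed form)."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def dual_cosine_loss(encoded, raw):
    """Assumed form: penalise similarity of the raw image to both the
    encoded image and its complement (taken here as max - encoded)."""
    complement = encoded.max() - encoded
    return abs(cosine_similarity(encoded, raw)) + abs(cosine_similarity(complement, raw))

def total_loss(task, dual_cos, kernel_grad, weights=(1.0, 20.0, 0.02)):
    """Weighted sum with the 1:20:0.02 ratio reported in the article.
    `kernel_grad` is a precomputed scalar; its formula is not given here."""
    w_task, w_cos, w_grad = weights
    return w_task * task + w_cos * dual_cos + w_grad * kernel_grad

rng = np.random.default_rng(0)
raw = rng.random((32, 32))
encoded = rng.random((32, 32))
loss = total_loss(
    task=cross_entropy(np.array([2.0, 0.5, -1.0]), label=0),
    dual_cos=dual_cosine_loss(encoded, raw),
    kernel_grad=0.0,  # placeholder value for illustration
)
print(loss)
```

Taking the absolute value of both similarities reflects the concern stated above: an encoded image that is strongly anti-correlated with the raw face leaks as much as one that is strongly correlated, because its complement reveals the face.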

 

To validate the privacy protection efficacy, the team conducted three experiments: (a) Human visual assessment: Environmental factors (e.g., lighting, eyewear) alter spatially processed information during optical convolution, rendering encrypted images visually unrecognizable. (b) Standard model evaluation: Standard models that were not trained on the encrypted data, such as FaceNet, achieved only 63.10% accuracy on the encrypted LFW dataset, significantly lower than the MEM-FR system's 92.33% accuracy, confirming the system's privacy protection capability. (c) Blind deconvolution attack: A U-Net trained on 600 image pairs yielded reconstructed images with a PSNR of 21.84 dB and recognition rates below 70%, indicating that recovering the original facial data is impractical. These multi-dimensional validation results demonstrate the system's reliability in achieving optical-level privacy protection.
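For readers unfamiliar with the PSNR figure quoted for the blind deconvolution attack, the standard definition is PSNR = 10·log10(MAX² / MSE). The sketch below assumes images normalised to the range [0, 1]; the article does not state the normalisation used, and the helper names `psnr` and `rmse_from_psnr` are hypothetical.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

def rmse_from_psnr(psnr_db, max_value=1.0):
    """Invert the PSNR definition to get the root-mean-square error."""
    return float(max_value * 10.0 ** (-psnr_db / 20.0))

# Toy check: a uniform error of 0.1 on a [0, 1] image gives a PSNR near 20 dB.
reference = np.zeros((8, 8))
reconstruction = np.full((8, 8), 0.1)
print(psnr(reference, reconstruction))

# RMSE implied by the reported 21.84 dB, under the [0, 1] assumption.
print(rmse_from_psnr(21.84))
```

Under that normalisation assumption, 21.84 dB corresponds to a root-mean-square pixel error of about 0.08 of full scale, which is consistent with reconstructions that lose fine facial detail.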

 

Conclusion and Outlook

 

To address the long-standing challenge of balancing privacy protection with recognition performance in face recognition, this study proposes the MEM-FR system, which combines a mask-encoded microlens array (MLA) with end-to-end joint optimization to provide a new approach to high-performance optical privacy protection. By performing optical dilated convolution on the scene at the optical layer, before the signal reaches the sensor, the system significantly enhances spatial resolution and light throughput. Combined with the task loss, dual cosine similarity loss, and kernel weight gradient loss, it achieves balanced optimization of privacy protection and recognition performance. Real-world face recognition experiments validate its practicality in natural scenarios.

 

Future research will focus on three directions: (a) Optical modulation upgrade: Explore grayscale masks and reconfigurable devices to improve optical computational flexibility, reduce system power consumption, and enhance complex task handling. (b) Standardization and industrialization: Develop optical privacy protection standards and promote compliant deployment in security, finance, and transportation through customized MLA parameters. (c) Cross-task generalization: Validate the framework's universality in other visual privacy protection tasks.