Acta Photonica Sinica, Volume 54, Issue 5, 0506003 (2025)
Vocal Cord Vibration Sensors and Speech Intelligent Recognition Based on S-shaped Micro-nano Optical Fibers
The vocal cords are the primary organ of human vocalization. Like speech signals, the vibration signals they generate carry abundant textual information, so vocal cord vibration recognition could help people with language disorders overcome basic communication challenges in daily life. Flexible pressure sensors can detect vocal cord vibrations and discriminate subtle variations within them. However, traditional electrical sensors suffer from parasitic effects and electromagnetic interference, which significantly restrict their practical use in detecting vocal cord vibration signals. Fiber optic sensors, particularly Micro-Nano Fiber (MNF) sensors, are better suited to this task owing to their small size, fast response, and high sensitivity.

At present, some researchers are using MNFs to study vocal cord recognition, but the intelligent classification of vibration signals into the corresponding speech information has not yet been achieved. Combining wearable devices with deep learning offers a new route to accurately recognizing vocal cord vibration signals. This paper designs and fabricates a wearable flexible sensor consisting of two polydimethylsiloxane (PDMS) films with an S-shaped bent MNF embedded between them. The S-shaped structure enlarges the contact area between the optical fiber and the vocal cord region, which improves the efficiency and sensitivity of signal acquisition and yields more stable and accurate signals. It also makes the fiber's mode field more susceptible to vocal cord vibrations.
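Why a thinner MNF exposes more of its guided light to the surroundings (and hence to vibration) can be illustrated with a textbook estimate. The sketch below is not the authors' simulation, whose method the abstract does not describe. It assumes a weakly guiding step-index fiber (silica core surrounded by PDMS), Marcuse's empirical fit for the LP01 mode-field radius, and a Gaussian mode profile; the wavelength (1550 nm) and refractive indices (1.444 for silica, 1.41 for PDMS) are illustrative assumptions, and bending is not modelled. The Marcuse fit is usually quoted as accurate only for roughly 1.2 < V < 2.4, so the results are indicative trends, not design values.

```python
import math

# Illustrative assumptions only (not values taken from the paper):
WAVELENGTH_UM = 1.55  # probe wavelength in micrometres
N_SILICA = 1.444      # refractive index of the silica MNF
N_PDMS = 1.41         # refractive index of the surrounding PDMS


def v_number(diameter_um: float,
             wavelength_um: float = WAVELENGTH_UM,
             n_core: float = N_SILICA,
             n_clad: float = N_PDMS) -> float:
    """Normalized frequency V = (2*pi*a / lambda) * sqrt(n_core^2 - n_clad^2)."""
    radius_um = diameter_um / 2.0
    numerical_aperture = math.sqrt(n_core ** 2 - n_clad ** 2)
    return 2.0 * math.pi * radius_um / wavelength_um * numerical_aperture


def mode_field_ratio(v: float) -> float:
    """Marcuse's fit for the LP01 mode-field radius w divided by the core radius a."""
    return 0.65 + 1.619 / v ** 1.5 + 2.879 / v ** 6


def fraction_outside_core(diameter_um: float) -> float:
    """Gaussian-mode estimate of the power fraction travelling outside the fiber.

    For an intensity profile proportional to exp(-2 r^2 / w^2), the power
    beyond r = a is exp(-2 a^2 / w^2) = exp(-2 / (w / a)^2).
    """
    ratio = mode_field_ratio(v_number(diameter_um))
    return math.exp(-2.0 / ratio ** 2)


if __name__ == "__main__":
    # Diameters kept inside (or near) the range where the Marcuse fit applies.
    for d in (2.0, 3.0, 4.0):
        print(f"d = {d:.1f} um, V = {v_number(d):.2f}, "
              f"power outside ~ {fraction_outside_core(d):.2f}")
```

Under these assumptions the external power fraction falls steadily as the diameter grows, which is consistent with the trade-off described in the abstract: a thinner fiber is more sensitive but mechanically weaker.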
When the vocal cords vibrate, the curved sections of the S-shaped structure undergo minute displacements and deformations. These changes produce substantial variations in the phase and intensity of the light guided in the S-shaped MNF, and these variations carry characteristic information about the vibrations. Encapsulating the S-shaped MNF in flexible PDMS also prevents local pressure on, or damage to, the vocal cord region during use, which improves the sensor's safety and repeatability. Simulations show that the evanescent field of the MNF strengthens as the MNF diameter and bending radius decrease. Balancing fiber strength against contact area, the optimal sensor parameters were determined to be an MNF diameter of 4 μm and a bending radius of 1 mm.

For performance evaluation, the response of the MNF flexible sensor to static pressure, dynamic pressure, and vibration was investigated. The response time (222 ms) and recovery time (163 ms) are both below 300 ms, indicating rapid response to external stimuli, and the sensor also exhibits excellent durability. The sensor responds distinctly at every test frequency, indicating strong frequency adaptability. For vocal cord vibration recognition, the sensor is worn over the vocal cord region. When the subject speaks, the vocal cord vibrations change the light intensity transmitted through the sensor. A photodetector (PD, CONQUER, 200 kHz) converts the light intensity into electrical signals in real time, and these signals are displayed and monitored on an oscilloscope. The subject repeatedly pronounced the 26 English letters so that precise light-intensity response signals could be acquired.
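The response time, recovery time, and frequency response reported above are standard step-domain and frequency-domain measurements, but the abstract does not say how they were extracted. The following is a minimal sketch under stated assumptions: transition times use the common 10% to 90% definition, the first and last samples of a record are taken as its two plateaus (so a real, noisy trace would need averaging first), and the dominant frequency is read from the largest non-DC peak of an FFT. The function names are hypothetical.

```python
import numpy as np


def _first_crossing(t: np.ndarray, y: np.ndarray, level: float) -> float:
    """Time at which y first rises through `level`, linearly interpolated."""
    idx = np.flatnonzero((y[:-1] < level) & (y[1:] >= level))
    if idx.size == 0:
        raise ValueError("signal never crosses the requested level")
    i = idx[0]
    frac = (level - y[i]) / (y[i + 1] - y[i])
    return float(t[i] + frac * (t[i + 1] - t[i]))


def transition_time(t, y, low: float = 0.1, high: float = 0.9) -> float:
    """10%-to-90% transition time of a step-like record, rising or falling.

    The record is normalized to go from 0 (first sample) to 1 (last sample),
    so a falling edge (recovery) is handled like a rising edge (response).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    start, end = y[0], y[-1]
    norm = (y - start) / (end - start)
    return _first_crossing(t, norm, high) - _first_crossing(t, norm, low)


def dominant_frequency(y, fs: float) -> float:
    """Frequency in Hz of the largest non-DC peak in the one-sided spectrum."""
    y = np.asarray(y, dtype=float)
    spectrum = np.abs(np.fft.rfft(y - y.mean()))
    freqs = np.fft.rfftfreq(y.size, d=1.0 / fs)
    return float(freqs[np.argmax(spectrum[1:]) + 1])


if __name__ == "__main__":
    # Synthetic first-order step response (tau = 0.1 s) sampled at 10 kHz.
    fs = 10_000.0
    t = np.arange(0.0, 2.0, 1.0 / fs)
    tau = 0.1
    rising = 1.0 - np.exp(-t / tau)
    falling = np.exp(-t / tau)
    # For a first-order system the 10%-to-90% time equals tau * ln(9).
    print(transition_time(t, rising), transition_time(t, falling))
    # One second of a pure 220 Hz tone.
    tone = np.sin(2 * np.pi * 220.0 * np.arange(10_000) / fs)
    print(dominant_frequency(tone, fs))
```

Normalizing each record to run from 0 to 1 lets the same routine report both the response time (rising edge) and the recovery time (falling edge).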
A total of 1 660 sets of data were collected and classified with the YOLOv8 object detection algorithm, achieving an average recognition accuracy of 96.8%. To further assess the sensor's universality, vocal cord vibration data for four commonly used phrases, “Ni Hao”, “Zao Shang Hao”, “Hello”, and “How Are You”, were collected from ten participants (5 males, 5 females). The YOLOv8 model was trained on and used to recognize the 1 200 collected data sets. “How Are You” had the highest recognition accuracy (99%), and the average accuracy across all four phrases was 97.75%. These results show that the sensor achieves high recognition accuracy across multiple individuals, indicating strong generalization capability. To highlight the advantages of combining the proposed MNF-based flexible wearable sensor with deep learning, its accuracy was compared in detail with that of other sensors paired with different deep learning models. The S-shaped MNF sensor combined with YOLOv8 achieves high accuracy in recognizing many distinct characteristic signals (the 26 English letters), and it maintains outstanding accuracy in identifying four common phrases across 10 test subjects. The proposed approach therefore not only outperforms the compared methods in accuracy but also meets diverse requirements and offers broad applicability.

This study designs and fabricates a flexible wearable sensor based on a PDMS-encapsulated S-shaped MNF structure for recognizing vocal cord vibration signals. Through theoretical modeling, simulation, and analysis, the main structural parameters of the sensor were designed to achieve high sensitivity, reliability, and applicability.
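The abstract reports "average recognition accuracy" without stating whether it is the mean of per-class accuracies or the fraction of all samples classified correctly. The minimal sketch below computes both from a confusion matrix (rows are true classes, columns are predicted classes); the function names and the example counts are hypothetical and are not the paper's data.

```python
import numpy as np


def per_class_accuracy(confusion) -> np.ndarray:
    """Correct predictions divided by the number of samples in each true class."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)


def macro_accuracy(confusion) -> float:
    """Unweighted mean of the per-class accuracies."""
    return float(per_class_accuracy(confusion).mean())


def overall_accuracy(confusion) -> float:
    """Fraction of all samples that were classified correctly."""
    confusion = np.asarray(confusion, dtype=float)
    return float(np.trace(confusion) / confusion.sum())


if __name__ == "__main__":
    # Hypothetical counts for the four phrases (NOT the paper's results).
    labels = ["Ni Hao", "Zao Shang Hao", "Hello", "How Are You"]
    cm = np.array([
        [29, 1, 0, 0],
        [0, 28, 1, 1],
        [1, 0, 29, 0],
        [0, 0, 0, 30],
    ])
    for name, acc in zip(labels, per_class_accuracy(cm)):
        print(f"{name}: {acc:.3f}")
    print("macro:", macro_accuracy(cm), "overall:", overall_accuracy(cm))
```

The two figures coincide when every class has the same number of samples and differ otherwise. As a worked check, if the 97.75% figure is an unweighted mean over the four phrases (the abstract does not say), the three phrases other than “How Are You” average (4 × 97.75 − 99)/3 ≈ 97.33%.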
Using the YOLOv8 deep learning model, the sensor recognizes the 26 English letters with an accuracy of 96.8%, demonstrating significant potential for vocal cord vibration recognition. To improve generalizability, the model was trained on the pronunciations of four commonly used phrases by 10 different subjects and reached a recognition accuracy of 97.75%, which validates the sensor's universality and reliability across users. The sensor is simple to fabricate, highly reliable, and highly resistant to electromagnetic interference, making it promising for human-computer interaction. Future work will focus on testing across diverse populations and optimizing the system to build a cloud database, further expanding its use in vocal cord vibration recognition and in helping people with speech disabilities communicate in daily life.
Zhijun WANG, Shengyou HUANG, Kun LI, Yang YANG, Fudan CHEN, Binbin LUO, Decao WU, Xue ZOU. Vocal Cord Vibration Sensors and Speech Intelligent Recognition Based on S-shaped Micro-nano Optical Fibers[J]. Acta Photonica Sinica, 2025, 54(5): 0506003
Category: Fiber Optics and Optical Communications
Received: Nov. 20, 2024
Accepted: Jan. 24, 2025
Published Online: Jun. 18, 2025
Author email: Binbin LUO (luobinbin@cqut.edu.cn)