Laser & Optoelectronics Progress
Co-Editors-in-Chief
Dianyuan Fan
Lixuan Chen, Peng Rao, Hanlu Zhu, Yingying Sun, and Liangjie Jia

The point source method can rigorously measure the modulation transfer function (MTF). It can effectively test the MTF value in all directions and can be used to monitor the on-orbit performance of space detection cameras. However, point source images are noisy, and when they are denoised with traditional filtering methods, the detection accuracy may be insufficient; the MTF curve may even step down to approximately 0, which affects the evaluation of image quality. Therefore, based on the characteristics of the light-intensity distribution of the point source image, a point-source-image denoising method based on the mean value of the flat area is proposed and quantitatively compared with traditional filtering methods. Simulation and experimental results show that the proposed method increases the detection rate of the MTF curve to more than 90.91%. Compared with median filtering, the best of the traditional filtering methods, the MTF detection accuracy is improved by 98.73%, and the peak signal-to-noise ratio and structural similarity of the images are also improved.
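The flat-area idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's actual procedure: the `flat_region` coordinates and the toy point-source image are assumptions chosen for the example.

```python
import numpy as np

def flat_area_mean_denoise(img, flat_region):
    """Subtract the mean of a flat (target-free) background patch from the
    whole image and clip negatives, removing the noise floor before the
    point-spread function is measured."""
    r0, r1, c0, c1 = flat_region
    background = img[r0:r1, c0:c1].mean()
    return np.clip(img - background, 0.0, None)

# Toy point-source image: a uniform noise floor of 10 plus a bright centre.
img = np.full((9, 9), 10.0)
img[4, 4] = 110.0
clean = flat_area_mean_denoise(img, (0, 3, 0, 3))  # top-left corner is flat
```

After subtraction the background sits at zero while the point source keeps its amplitude above the floor, which is what stabilises the subsequent MTF computation.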

    Sep. 02, 2020
  • Vol. 57 Issue 18 181001 (2020)
  • Lu Lu, Jiong Yang, Jie Liang, and Yulin Jiang

    To quickly and accurately detect rectangles, and after comprehensively analyzing the advantages and disadvantages of existing rectangle detection algorithms, a fast and high-precision rectangle detection algorithm is proposed herein. The proposed algorithm first divides an image into multiple regions of interest using a ring window, then extracts the subpixel contour of each region of interest, divides the subpixel contour into several line segments, and finally uses fuzzy mathematics to analyze the geometric and physical characteristics of the line segments. High-precision detection and positioning of the rectangle are then obtained from the fuzzy fusion assessment. Experimental results show that the detection speed of the proposed algorithm is 7.4 times that of the Hough-transform-based rectangle detection algorithm, and the maximum center positioning error is (0.507 pixel, 0.272 pixel). Furthermore, the average length error is 1.034 pixel, the average width error is 1.310 pixel, and the average inclination error is 0.304°. The proposed algorithm can accurately detect rectangles, meets the speed and precision requirements of industrial applications, and exhibits strong stability.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181002 (2020)
  • Guoqing Qiu, Haijing Yang, Yantao Wang, Yating Wei, and Pan Luo

    In this paper, a dim target detection method for airborne infrared images based on visual feature fusion is proposed. The method addresses the high false alarm rates and low detection rates of existing methods in environments with complex clouds and strong clutter interference. First, the original image is sharpened with the Laplace operator to extract contour edges, which are added back to the original image to enhance the pixel intensity of real and suspected targets. Subsequently, based on the gradient characteristics of the targets, the local multidirectional gradient method is used to suppress the complex background and strong clutter in the processed images. Next, based on the gray-difference characteristics of the images, the local gray difference method is employed to further enhance the targets. Finally, the images obtained from the visual feature information are fused to highlight the saliency of the targets, and an adaptive threshold is used to achieve accurate target detection. Experimental results verify that, compared with other methods, the proposed method significantly improves the signal-to-clutter ratio, background suppression factor, and detection rate while achieving a lower false alarm rate.
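The first step, Laplacian sharpening, can be sketched roughly as below. The 4-neighbour kernel and the toy image are assumptions for illustration; the paper's exact sharpening variant may differ.

```python
import numpy as np

def laplacian_sharpen(img):
    """Add the 4-neighbour Laplacian response back onto the image, which
    boosts point-like targets relative to a smooth background."""
    out = img.astype(float).copy()
    lap = (4 * img[1:-1, 1:-1]
           - img[:-2, 1:-1] - img[2:, 1:-1]
           - img[1:-1, :-2] - img[1:-1, 2:])
    out[1:-1, 1:-1] += lap
    return out

# A dim point target on a flat background gets amplified.
img = np.full((5, 5), 5.0)
img[2, 2] = 15.0
sharp = laplacian_sharpen(img)
```

On a flat background the Laplacian response is zero, so only structure (real or suspected targets) is amplified, which is the intent of the enhancement step.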

    Sep. 02, 2020
  • Vol. 57 Issue 18 181004 (2020)
  • Yuan Fang, Yang Zhou, Jing Zhao, and Kaige Gong

    To improve the accuracy, anti-interference, and real-time performance of linear motor mover position measurement from target images, an image displacement measurement algorithm based on fine interpolation of the correlation peak (FICP) is introduced in this paper, and a deep learning algorithm is used to select fence images with strong robustness. First, the width standard deviation and average gray gradient of the fence fringes are controlled to generate a series of fence fringe images. Second, the displacement between adjacent target images is calculated by combining the chirp Z-transform with the FICP algorithm. Then, with the mean displacement estimation error as the evaluation index, a deep neural network is used to establish a quality optimization model for fence images, and aperiodic fence images with strong robustness are screened out. Finally, the one-dimensional fence image signal during motion is acquired by a line-scan camera, the calibration coefficient of the system is determined by the chessboard target method, and the actual displacement value is obtained. Simulation and experimental results show that the selected optimized aperiodic fence images effectively improve the measurement accuracy, proving the correctness of the method.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181005 (2020)
  • Jianjun Li, Yue Sun, and Baohua Zhang

    Interactive behavior recognition has long been a hotspot and a challenge in the field of machine vision. To address the problem of low recognition rates, this paper proposes a recognition algorithm that combines edge features of depth images, texture features of RGB (red, green, blue) images, and optical-flow motion trajectory features. First, the Canny operator is used to extract the edge features of the depth images, the local binary pattern operator is used to extract the texture features of the RGB images, and the optical flow histogram is used to describe the dynamic characteristics of the images. Then, the extracted edge features and texture features are weighted and fused. Finally, the static fusion features and optical-flow motion trajectory features are coded and fused using a spatial pyramid matching model based on sparse representation to identify interactive behaviors. Experimental results on the MSR Action Pair, SBU Kinect Interaction, and CAD-60 datasets show that the algorithm achieves a better recognition effect.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181006 (2020)
  • Ke Wu, Baohua Zhang, Xiaoqi Lü, Yu Gu, Yueming Wang, Xin Liu, Yan Ren, Jianjun Li, and Ming Zhang

    To address the excessive network depth, low utilization of feature relationships, and low time efficiency of existing deep-learning-based pedestrian recognition algorithms, this paper proposes an improved method based on the squeeze-and-excitation residual neural network (SE-ResNet) and feature fusion. By introducing the squeeze-and-excitation (SE) module, features are compressed and excited along the feature channels, and weights are then assigned to each channel to enhance useful channels and suppress useless ones, reducing the depth of the network model. To improve recognition accuracy and computing efficiency, both shallow and deep features are used, and redundant feature extraction modules are deleted. The relationship between convolution kernel size, running time, and recognition accuracy is modeled to find the best balance point. Experimental results show that compared with ResNet50, the recognition accuracy of this algorithm is 4.26 percentage points higher and the mean average precision is 17.41 percentage points higher. Compared with other classic algorithms, the recognition accuracy is also improved to varying degrees, and the robustness is better.
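The channel-reweighting idea of the SE module can be sketched in plain NumPy. The tiny channel count, reduction ratio, and random weights below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def se_block(feat, w1, w2):
    """Squeeze-and-excitation: global-average-pool each channel, pass the
    pooled vector through a two-layer bottleneck (ReLU then sigmoid), and
    rescale the channels by the resulting per-channel weights."""
    squeeze = feat.mean(axis=(1, 2))               # (C,)   "squeeze"
    hidden = np.maximum(w1 @ squeeze, 0.0)         # (C//r,) bottleneck
    excite = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # (C,)   in (0, 1)
    return feat * excite[:, None, None]

# 4 channels, 3x3 maps, reduction ratio 2; weights chosen for illustration.
rng = np.random.default_rng(0)
feat = rng.random((4, 3, 3))
w1 = rng.standard_normal((2, 4))
w2 = rng.standard_normal((4, 2))
out = se_block(feat, w1, w2)
```

Because the sigmoid output lies strictly in (0, 1), each channel is scaled toward zero in proportion to its learned usefulness, which is the "enhance useful, suppress useless" behaviour the abstract describes.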

    Sep. 02, 2020
  • Vol. 57 Issue 18 181007 (2020)
  • Junyu Zhong, Jian Qiu, Peng Han, Kaiqing Luo, Li Peng, and Dongmei Liu

    Head pose estimation is widely used in many fields, and most methods are based on two-dimensional (2D) images; there has been little research combining it with three-dimensional (3D) face reconstruction. Reconstructed 3D head information can provide more effective data for head pose estimation and greatly improve its accuracy and robustness. Hence, in this paper, a method combining structured-light 3D reconstruction with 3D head pose estimation is proposed to reconstruct 3D facial morphology and realize 3D point cloud visualization. At the same time, a 3D head pose estimation algorithm is put forward, which locates the nose tip and nose bridge, establishes a spatial rectangular coordinate system and a face eigen-coordinate system, and uses the vertical symmetry of the face to estimate the Euler angles of the head posture. The results show that the Euler angles can be measured in the range of -25° to 25°, the average and standard deviations of the absolute errors are both less than 1°, and the linear correlation between the measured value and the true value is 99.8%. Compared with head pose estimation based on 2D images, the proposed algorithm is more accurate and robust.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181008 (2020)
  • Xingyu Chen, Weijin Zhang, Weizhi Sun, Ping'an Ren, and Ou Ou

    In recent years, although super-resolution reconstruction based on neural networks has developed rapidly, shortcomings remain, such as the difficulty of finding an appropriate convolution kernel size and the slow convergence caused by overly deep networks. In this paper, a model that extracts features at multiple scales and contains a multi-residual structure is proposed. A low-resolution image is input to the network and passes through serial multi-scale residual blocks; within each block, features are extracted and concatenated at multiple scales, and a residual connection passes the output to the next block. After all blocks, a further residual connection is built, and the high-resolution image is finally output through sub-pixel convolution. The experimental results show that the multi-residual structure yields faster convergence, and the multi-scale structure extracts image features better, allowing the reconstructed images to outperform those of other mainstream algorithms in both subjective and objective measures.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181009 (2020)
  • Jiulun Fan, Yang Yan, Haiyan Yu, Dan Liang, and Mengfei Gao

    The kernel possibilistic C-means (KPCM) clustering algorithm introduces the kernel method into possibilistic clustering and can effectively cluster hyperspherical, noisy, and outlier-containing data, but it suffers from the center-coincidence problem of possibilistic clustering. Therefore, the β cut-set is introduced into the KPCM algorithm, and the typicality values of some sample data are modified by generating clustering kernels to improve the between-class relationships; on this basis, a kernel possibilistic C-means clustering algorithm based on a cut-set threshold (C-KPCM) is proposed to overcome the coincident-clustering defect of KPCM. Further, combined with the non-local spatial information of the image, an adaptive median filtering algorithm is used to adaptively adjust the filtering radius and generate new fuzzy factors, which are added to the objective function of C-KPCM. The resulting kernel possibilistic C-means clustering algorithm based on non-local spatial information enhances the robustness of clustering under strong noise interference. Simulation results verify the effectiveness of the proposed algorithm.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181010 (2020)
  • Qingsheng Zhao, Yuying Wang, Dingkang Liang, and Zun Guo

    This paper proposes a bag-of-features (BOF) image retrieval algorithm to classify electrical equipment images. First, the locations of feature points are determined by the speeded-up robust features (SURF) algorithm, and high-dimensional feature descriptors are constructed to describe and count the features. Then, the K-means clustering algorithm is applied to the feature descriptors, and the independent visual vocabularies are collected into codebooks of a specific size. The feature descriptors in the codebooks are quantified and weighted, and a feature-vector histogram is used to represent the entire image. Finally, the high-dimensional feature vectors of the training set images are used for machine learning, and unknown images are classified quickly and accurately. Electrical equipment images under natural light and infrared images of electrical equipment under working conditions are taken as two experimental sample sets for classification tests. The results show that the algorithm can classify different image sets quickly and accurately, with a highest accuracy of 95.59%.
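The codebook-quantisation step at the heart of BOF can be sketched as follows. The tiny two-word codebook and 2D descriptors are illustrative assumptions; in practice the codebook would come from K-means over SURF descriptors.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word and return
    the L1-normalised word-frequency histogram representing the image."""
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :],
                           axis=2)                 # (n_desc, n_words)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[0.0, 1.0], [9.0, 9.0], [10.0, 11.0]])
hist = bof_histogram(descriptors, codebook)  # → [1/3, 2/3]
```

The resulting fixed-length histogram is what the classifier consumes, regardless of how many local features each image produced.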

    Sep. 02, 2020
  • Vol. 57 Issue 18 181011 (2020)
  • Tiantian Zhu, Zhongnan Fu, Mei Zhang, Pei Ye, and Guihua Li

    The selection of initial values in the digital image correlation (DIC) method has a great influence on the search efficiency of image subpixel displacement and the algorithmic convergence speed. A new DIC method based on KAZE feature matching is proposed in this paper. Compared with traditional feature detection algorithms, the KAZE algorithm can eliminate noise and extract feature points by establishing a nonlinear scale space, which effectively avoids boundary blurring and detail loss. The coordinates of feature points before and after image deformation are matched by the KAZE algorithm, and the initial deformation parameters are estimated by an affine transformation; the inverse compositional Gauss-Newton method is then used to iteratively optimize the estimated initial values. The results show that this method has higher search efficiency than traditional methods without loss of precision.
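The affine initial-guess step can be sketched as a least-squares fit over matched feature coordinates. The synthetic matched points and the affine parameters below are assumptions for illustration.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares fit of dst ≈ src @ A.T + t from matched feature
    coordinates -- the initial deformation estimate that seeds the
    iterative Gauss-Newton refinement."""
    n = len(src)
    M = np.hstack([src, np.ones((n, 1))])             # (n, 3)
    params, *_ = np.linalg.lstsq(M, dst, rcond=None)  # (3, 2)
    return params[:2].T, params[2]                    # A (2x2), t (2,)

# Synthetic check: recover a known small deformation from noiseless matches.
rng = np.random.default_rng(1)
A_true = np.array([[1.01, 0.02], [-0.01, 0.99]])
t_true = np.array([2.5, -1.0])
src = rng.random((20, 2)) * 100
dst = src @ A_true.T + t_true
A_est, t_est = estimate_affine(src, dst)
```

Seeding the iteration with this closed-form estimate is what saves the expensive exhaustive search over initial deformation parameters.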

    Sep. 02, 2020
  • Vol. 57 Issue 18 181012 (2020)
  • Mingwei Xiao, Sumin Li, and Yan Li

    To improve the fusion performance of domestic synthetic aperture radar (SAR) and optical images, multi-spectral images from GF-1 and GF-2 and SAR spotlight (SL) and fine stripe 1 (FS1) mode images from GF-3 are used. Combining two fusion ideas, transform domain and spatial domain, we propose a fusion algorithm, IHS_NSST, based on the non-subsampled shearlet transform (NSST) combined with the intensity-hue-saturation (IHS) transform and a region-improved pulse-coupled neural network (PCNN) method. In this algorithm, the IHS transform is first performed on the multi-spectral images. Second, in the sub-bands of the NSST decomposition, a regional strategy is used: a region-energy averaging algorithm for the low-frequency components and a PCNN algorithm excited by an improved sum of modified Laplacian (SML) for the high-frequency components. Finally, the proposed algorithm is compared with many fusion methods in qualitative and quantitative evaluations. The results show that the region-based IHS_NSST method has a great advantage in fusing high-resolution SAR and optical images: it greatly improves fusion performance, reduces spectral distortion, better maintains spatial feature information, and improves the availability of domestic high-resolution SAR and optical images.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181013 (2020)
  • Chen Qu, and Duyan Bi

    This study focuses on the prior blind zone generated by current single-frame haze image restoration algorithms that use a single prior. To address this problem, a haze image restoration algorithm using multiple prior constraints is proposed. First, a saturation prior is proposed, and a defined adjustment coefficient is used to simplify the process of solving the rough transmission map. Second, in a Markov random field model, the color attenuation prior is used to constrain and optimize the adjustment coefficient to obtain an accurate transmission map. Then, light and dark pixels are used to obtain an accurate atmospheric-light prior. Finally, the haze-free image is restored. Experimental results reveal that, compared with the proposed algorithm, the other algorithms reduce the effective detail intensity by 24.9%, 51.4%, 41.5%, and 39.3%, respectively, and the hue reproduction by 21.4%, 24.8%, 24.1%, and 29.5%, respectively. The proposed algorithm successfully restores the image: the effective detail information becomes rich, the color tone becomes natural, and the algorithm has strong applicability.
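The final restoration step inverts the standard atmospheric scattering model once the transmission map and atmospheric light are known. The synthetic scene, transmission value, and airlight below are illustrative assumptions.

```python
import numpy as np

def restore(I, t, A, t_min=0.1):
    """Invert the scattering model I = J*t + A*(1 - t) for the scene
    radiance J, flooring t to avoid amplifying noise where haze is dense."""
    t = np.maximum(t, t_min)
    return (I - A) / t + A

# Synthesise a hazy image from a known scene and check the inversion.
J = np.linspace(0.0, 1.0, 25).reshape(5, 5)   # true scene radiance
t = np.full((5, 5), 0.6)                      # transmission map
A = 0.9                                       # atmospheric light
I = J * t + A * (1 - t)
J_rec = restore(I, t, A)
```

The quality of the result hinges entirely on how accurate the transmission map and atmospheric-light estimates are, which is why the abstract spends its priors on exactly those two quantities.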

    Sep. 02, 2020
  • Vol. 57 Issue 18 181014 (2020)
  • Yi Zhang, Zhiyuan Gong, and Wenwen Wei

    In this paper, we present a traffic sign detection system using Faster R-CNN (Faster Region-based Convolutional Neural Network) for the active safety performance of automobiles. The detection algorithm is improved so that it can be applied to traffic sign detection. Specifically, the backbone network of the detection algorithm is designed using a multi-scale convolution kernel ResNeXt model, and a multi-dimensional feature fusion strategy is adopted to meet the needs of small-target detection in traffic signs. In designing the region proposal network (RPN) of Faster R-CNN, anchor boxes are designed by fitting traffic sign features to efficiently obtain proposal regions and reduce the false and missed detection rates. According to the experimental results on the TT100K dataset, the improved algorithm achieves an excellent detection effect on traffic signs under conditions of small targets, multiple targets, and complex backgrounds, with an average accuracy of 90.83%.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181015 (2020)
  • Dongxu Han, and Baojiang Zhong

    Current edge detection algorithms based on convolutional neural networks usually give the probability that each pixel in the image is an edge, namely an edge probability map. To address the edge loss and discontinuity that occur after the edge probability map is thinned, an edge thinning algorithm based on gradient mask filtering is proposed. First, a dual-threshold method based on the Canny edge detection algorithm is introduced to obtain high-gradient and low-gradient masks. Then, the edge probability map is enhanced where the high-gradient mask applies and weakened where the low-gradient mask applies. Finally, non-maximum suppression is performed on the edge probability map to obtain a binary edge map. The experimental results indicate that the proposed edge thinning algorithm provides more continuous edges and conforms to the single-edge response criterion.
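The enhance/weaken step can be sketched as below. The gain and damping factors, thresholds, and toy arrays are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gradient_mask_filter(prob, grad, lo, hi, gain=1.25, damp=0.75):
    """Boost edge probabilities under the high-gradient mask and damp them
    under the low-gradient mask, leaving mid-gradient pixels untouched."""
    out = prob.astype(float).copy()
    high, low = grad >= hi, grad < lo
    out[high] = np.minimum(out[high] * gain, 1.0)  # enhance, cap at 1
    out[low] *= damp                               # weaken
    return out

prob = np.array([[0.8, 0.5, 0.2]])
grad = np.array([[9.0, 5.0, 1.0]])
filtered = gradient_mask_filter(prob, grad, lo=2.0, hi=8.0)
```

Raising probabilities along true high-gradient edges before non-maximum suppression is what keeps thin edges from breaking into fragments.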

    Sep. 02, 2020
  • Vol. 57 Issue 18 181016 (2020)
  • Weiming Yao, Xiaohua Wang, and Nan Wu

    In human-robot collaborative sewing, realizing human-robot interaction is premised on the detection and understanding of sewing gestures. Traditional algorithms suffer from low gesture-recognition rates and poor target-gesture detection. Therefore, a method based on an improved single-shot multibox detector (SSD) model for recognizing sewing gestures is proposed in this work. First, a deeper ResNet50 residual network is introduced to replace the VGG16 base network of the original SSD model to improve the feature extraction capability. Subsequently, a feature pyramid network (FPN) structure is used to fuse high- and low-level features, further improving the detection accuracy. Experiment results reveal that, on the constructed sewing gesture dataset, the improved model exhibits higher detection accuracy than the original SSD algorithm and other algorithms. Furthermore, the residual connections in the network improve accuracy without increasing the number of parameters or the model complexity. The average detection speed of our method is 52 frame/s, which fully meets the requirements for real-time detection of sewing gestures.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181017 (2020)
  • Xiaoman Cui, and Fengqin Yu

    Traditional generative models cause image blurring and a lack of details. Therefore, in this paper, we propose a conditional generative adversarial network combined with the powerful feature extraction capability of the variational autoencoder to realize high-quality photo generation. In conventional training for sketch-to-photo generation, sketches of a single style are used, making the input images monotonous. However, hand-drawn sketches by different artists have different styles; therefore, the training dataset is extended with sketches in multiple styles to improve the generality of the model. The experimental results demonstrate that the similarity of the photos generated using the proposed method improves by 0.09 (to 0.77) on the CUHK student dataset. In addition, compared with the unexpanded training set, the similarity of the images generated using the expanded training set improves by 0.233 (to 0.603).

    Sep. 02, 2020
  • Vol. 57 Issue 18 181018 (2020)
  • Xiangdan Hou, Xixin Yu, and Hongpu Liu

    The PointNet model extracts features only from isolated points and therefore ignores the neighborhood structure among points. To address this limitation, we propose GraphPNet, a 3D point cloud classification and segmentation model based on graph convolutional networks. The 3D point cloud is transformed into an undirected graph structure, from which the neighborhood structure information of the point cloud is obtained. Classification and segmentation accuracy are improved by fusing neighborhood information with single-point information. In classification experiments, GraphPNet is trained and tested on the ModelNet40 dataset and compared with the VoxNet, PointNet, and 3D ShapeNets models; the results demonstrate that GraphPNet obtains better accuracy than the other models. In segmentation experiments, the ShapeNet dataset is used for training and testing, and the mean intersection-over-union values of GraphPNet and other segmentation models, such as PointNet, are compared. The results confirm the effectiveness of the proposed GraphPNet model.
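The neighborhood-fusion idea can be sketched as a single graph-convolution step. The tiny chain graph and identity weight matrix are assumptions for illustration, not GraphPNet's actual layers.

```python
import numpy as np

def graph_conv(X, A, W):
    """One graph-convolution step: add self-loops, row-normalise the
    adjacency, average each node's neighbourhood features, and apply a
    shared linear map with ReLU."""
    A_hat = A + np.eye(len(A))
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)
    return np.maximum(A_hat @ X @ W, 0.0)

# 3 points in a chain (0-1, 1-2); 2-dim features; identity weight map.
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
H = graph_conv(X, A, np.eye(2))
```

Each output row now mixes a point's own feature with its neighbours' features, which is precisely the structural information PointNet's per-point MLP cannot see.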

    Sep. 02, 2020
  • Vol. 57 Issue 18 181019 (2020)
  • Guoliang Yang, Zhendong Lai, and Yang Wang

    Aiming at the problem of skin lesion image segmentation, a segmentation method based on a multi-scale DenseNet is proposed. First, a morphological closing operation and an unsharp filter are used to preprocess the original skin lesion image and obtain a refined image without hair and blood-vessel artifacts. Then, the preprocessed image is input into a segmentation network. This network is based on an encoder-decoder architecture and uses two multi-scale feature fusion methods, a parallel multi-branch structure and a pyramid pooling block, to extract features under different receptive fields. Furthermore, the DenseNet structure is integrated into the encoder to realize the reuse of image features, and the LTotal loss function, which combines target loss and content loss, is adopted to further improve segmentation accuracy. Finally, the segmentation results are obtained through a SoftMax classifier and the related evaluation indicators are calculated. The experimental results on the ISBI 2016 skin lesion image dataset show that the accuracy, Dice coefficient, Jaccard index, sensitivity, and specificity are 95.48%, 96.37%, 93.41%, 92.93%, and 96.49%, respectively, and the overall performance is better than that of existing algorithms. The proposed algorithm can accurately segment skin lesions and can thus be applied to melanoma computer-aided diagnosis systems.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181020 (2020)
  • Ke Zhou, Chengmao Wu, and Changxing Li

    When image quality evaluation algorithms are used to evaluate a color image, the color information is often lost and its integrity destroyed, making the evaluation result inconsistent with subjective evaluation. Based on the idea that the more blurred an image is, the fewer high-frequency components it contains, a no-reference color image quality assessment algorithm is proposed. First, a quaternion matrix is used to represent the color image, and its spectrum is obtained by the quaternion Fourier transform. Then, a threshold for the high-frequency components is calculated. Finally, the number of pixels above the threshold is used to compute the color image quality score. The experimental results show that the predicted results have good accuracy and monotonicity and match well with subjective evaluation results. The proposed algorithm has better anti-noise performance, lower computational complexity, and better overall performance than existing algorithms.
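The blur-versus-spectrum idea can be sketched on a grayscale image with an ordinary 2D FFT; this is a simplified stand-in for the paper's quaternion-spectrum version, and the threshold ratio and box blur are illustrative assumptions.

```python
import numpy as np

def high_freq_score(img, thresh_ratio=0.01):
    """Score sharpness as the fraction of Fourier coefficients whose
    magnitude exceeds a threshold tied to the spectral peak; blurrier
    images keep fewer coefficients above it."""
    spec = np.abs(np.fft.fft2(img))
    return np.count_nonzero(spec > thresh_ratio * spec.max()) / spec.size

rng = np.random.default_rng(2)
sharp = rng.random((32, 32))
# 3x3 box blur (valid region only) as a crude degradation.
blur = sum(sharp[i:i + 30, j:j + 30] for i in range(3) for j in range(3)) / 9.0
```

Averaging suppresses the high-frequency coefficients while leaving the spectral peak (dominated by the DC term) roughly unchanged, so the blurred copy scores lower.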

    Sep. 02, 2020
  • Vol. 57 Issue 18 181021 (2020)
  • Qing Luo, Wei Zhou, Zijun Ma, and Haixia Xu

    In this paper, a classification method for seven types of dermoscopic images based on a deep convolutional neural network is proposed. The training set is amplified using data augmentation. For the multiclass classification of dermoscopic images, a multiclassification model (FL-ResNet50) based on the ResNet50 model and a multiclass focal loss function is proposed. The experimental results show that the micro-averaged F1 value of the FL-ResNet50 model is 0.88, better than the result obtained with the traditional ResNet50 model. The proposed method realizes seven-class dermoscopic image classification and forms a complete, continuous system model through image preprocessing, feature extraction, and prediction model learning. The FL-ResNet50 model improves on the classification performance and efficiency of previous models and has important application value.
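The multiclass focal loss can be sketched as follows; the probability table and gamma value are illustrative assumptions, and the paper's exact weighting may differ.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """Multiclass focal loss: the (1 - p_t)^gamma factor down-weights
    well-classified samples so training focuses on hard ones; gamma = 0
    recovers ordinary cross-entropy."""
    p_t = probs[np.arange(len(labels)), labels]
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

# One easy sample (p_t = 0.9) and one hard sample (p_t = 0.3).
probs = np.array([[0.9, 0.05, 0.05],
                  [0.3, 0.4, 0.3]])
labels = np.array([0, 0])
loss = focal_loss(probs, labels)
```

For imbalanced seven-class dermoscopic data, this focusing effect keeps rare, hard lesion classes from being drowned out by the abundant easy ones.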

    Sep. 02, 2020
  • Vol. 57 Issue 18 181022 (2020)
  • Wei Liu, and Hongwei Ge

    In two-stage recognition, a classification algorithm first selects M classes of training samples with small distances from the test sample, and the selected M classes are then used as a new training sample set for second-stage recognition. To increase recognition speed, an algorithm that can select the M classes of training samples quickly is proposed. First, a k-means clustering algorithm is used to group the training samples into large clusters. For a new test sample, the distance to the center of each large cluster is calculated, and the large clusters closest to the test sample are selected. The categories contained in these large clusters are included in the new training set, and the training samples of the corresponding categories are combined to form a new training sample set for second-stage recognition. Experiments on different face databases confirm that the proposed algorithm achieves a markedly faster recognition speed with a slightly improved recognition rate.
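The fast class-selection step can be sketched as below. The toy cluster centers, label sets, and test sample are illustrative assumptions.

```python
import numpy as np

def candidate_classes(test, centers, cluster_labels, m):
    """Pick the m cluster centers nearest the test sample and return the
    union of the class labels those clusters contain -- the reduced
    gallery used in the second recognition stage."""
    d = np.linalg.norm(centers - test, axis=1)
    nearest = np.argsort(d)[:m]
    out = set()
    for i in nearest:
        out |= set(cluster_labels[i])
    return out

centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
cluster_labels = [{"A", "B"}, {"C"}, {"D", "E"}]
cand = candidate_classes(np.array([1.0, 2.0]), centers, cluster_labels, m=2)
```

Comparing against a handful of cluster centers instead of every training sample is where the speed-up comes from; the second stage then works only on the surviving classes.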

    Sep. 02, 2020
  • Vol. 57 Issue 18 181024 (2020)
  • Hui Jin, and Xinyang Li

    The feature expression of the target is the key to target tracking. Hand-crafted features are relatively simple and offer strong real-time performance, but their expressive ability is insufficient, and they easily produce tracking drift under rapid appearance change and target occlusion. The strong feature expression ability of deep neural networks (DNNs) in target detection and recognition tasks has gradually made them the feature extraction tool of choice. Here, a deeper residual neural network (ResNet) is used to replace the VGG-19 network as the feature extractor. First, the features of the special additional-layer structure and the convolution layers in ResNet-50 are fused to obtain target representation features with stronger robustness. Then, the features are filtered and the target position is determined according to the maximum response value. Finally, to extend the algorithm to local target tracking scenes, a graph-based visual saliency detection algorithm is used to increase the weight of the local target and suppress background information, thereby improving the target representation ability of the feature layers.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181025 (2020)
  • Jingfeng Shao, and Haiqiang Feng

    To solve the problem of low recognition accuracy caused by insufficient illumination and occlusion, a convolutional neural network model based on transfer learning was constructed. Based on an analysis of yarn quality indices, a classification standard for spinners' expressions was determined and an expression dataset was established. The dataset was preprocessed by histogram equalization, ROF (Rudin-Osher-Fatemi) denoising, and facial correction. On the basis of real-time eye-region data captured from the spinners, the transfer learning method was used to train the expression recognition model. Finally, experimental verification shows that the recognition accuracy of the proposed model is as high as 98%, effectively solving the problem that spinners' expressions cannot be recognized under poor illumination and occlusion.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181026 (2020)
  • Yingchun Wu, Xing Cheng, Yingxian Xie, and Anhong Wang

    The defocus and correspondence (DCDC) algorithm uses the spatial and angular information of the four-dimensional light field to perform depth estimation. However, the defocus response function established by the DCDC algorithm suffers from a limited second-order-derivative direction and the offset of positive and negative energies, which limit its depth reconstruction accuracy in complex scenes. To address these problems, the defocus response function is improved, and a light field depth estimation algorithm based on an energy-enhanced defocus response is proposed. The algorithm fully considers the influence of surrounding pixels on the degree of defocus at the current position and realizes energy enhancement of the defocus response by increasing the number and directions of second-order derivatives with different weights. Experiments verify the effectiveness of the proposed method: for light field images with complex boundaries, the depth maps obtained by the proposed algorithm have better visual effects, and compared with the DCDC algorithm, the root mean square error of the depth images decreases by 3.95% on average.
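The direction-weighted second-derivative idea can be sketched with just two directions; the paper uses more directions and different weights, so this is purely illustrative.

```python
import numpy as np

def defocus_response(img, weights=(0.5, 0.5)):
    """Weighted sum of |second derivatives| along several directions (here
    horizontal and vertical); sharp, in-focus structure yields a large
    response, while smooth (defocused) regions yield a small one."""
    c = img[1:-1, 1:-1]
    d2x = np.abs(img[1:-1, :-2] - 2 * c + img[1:-1, 2:])
    d2y = np.abs(img[:-2, 1:-1] - 2 * c + img[2:, 1:-1])
    return weights[0] * d2x + weights[1] * d2y

# A linear ramp has zero response; a bright spot responds strongly.
i, j = np.mgrid[0:5, 0:5]
ramp = (i + j).astype(float)
spot = np.zeros((5, 5))
spot[2, 2] = 8.0
```

Taking absolute values before summing avoids the positive/negative cancellation ("energy offset") that weakens a plain Laplacian-style response.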

    Sep. 02, 2020
  • Vol. 57 Issue 18 181027 (2020)
  • Guo Peng, Weiming Li, Yang Huang, Yihai Cheng, and Xingyu Gao

    In this paper, we propose an improved least-squares unwrapping algorithm aimed at the problems of smooth transition, large iteration counts, and long running time of least-squares unwrapping in local high-density-noise and wire-drawing regions of laser speckle interference images. The algorithm is based on the law that the speckle interference image approximately obeys a periodic parabolic distribution. First, the coordinates of the noisy points are locked using two matrix transformations. Then, mask technology combined with the two-dimensional discrete cosine transform and the Picard iterative method is used to suppress the propagation of noise and obtain smooth images. The experimental results show that laser speckle interferometry is very sensitive to local high-density noise; the proposed algorithm requires fewer iterations and a shorter calculation time for image smoothing and optimization than traditional least-squares iterative algorithms. The recognition rate of interferometry under single-noise and deformation interference is approximately 96%, and the accuracy is better than that of traditional algorithms, giving the method high engineering application value.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181101 (2020)
  • Rui Yang, Baohua Zhang, Yanyue Zhang, Xiaoqi Lü, Yu Gu, Yueming Wang, Xin Liu, Yan Ren, and Jianjun Li

    In this paper, we propose a moving target tracking algorithm based on the adaptive fusion of deep features. This algorithm addresses the poor anti-occlusion ability and robustness of traditional tracking algorithms in complex scenes. Considering the strong robustness of deep features and the high precision of shallow features, deep sparse features are constructed using a sparse autoencoder to extract target features. The deep features are then adjusted according to the correlation information between adjacent frames and are adaptively fused with texture information according to the tracking confidence to improve tracker performance. To improve the robustness of the tracking algorithm and suppress tracking drift, an improved speeded-up robust features algorithm is introduced to locate the target when the confidence falls below a set threshold. Experimental results show that, compared with mainstream tracking algorithms, the proposed algorithm has higher tracking accuracy and better robustness in occlusion scenes and can effectively suppress tracking drift.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181501 (2020)
  • Yuqing Liu, Junkai Feng, Bowen Xing, and Shouqi Cao

    Owing to the high reflectivity of the water surface and the influence of edge features such as ripples, traditional water surface target recognition algorithms are unable to reliably identify targets. To this end, a water surface target recognition algorithm based on deep learning is proposed herein. First, a large number of target samples are collected and labeled, and the parameters and network structure of the algorithm are optimized based on the principle of the YOLOv3 (You Only Look Once v3) algorithm. The target samples are then trained using a deep convolutional neural network. Data enhancement of the target samples is conducted to adapt to different environments and improve the robustness of the proposed algorithm, and a phase-correlation waterfront recognition algorithm is used to improve the recognition speed. Finally, the weight file obtained from training the network structure of the proposed algorithm is used to establish a surface target recognition system, which achieves a high recognition rate. Experimental results verify the effectiveness and robustness of the proposed algorithm, which can provide a reference for future research on surface target recognition.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181502 (2020)
  • Jun Liu, Rui Zhang, and Chaochao Hu

    To mitigate the distance measurement errors of the existing vehicle ranging method based on lane line width on curved roads, this study proposes a method for estimating the lateral width of the lane line at a curve based on the slope of the lane line. An improved ranging model based on the lateral width of the lane line is then derived. Compared with the existing method, the ranging accuracy of the proposed method is clearly improved under the concentric circular lane line model. When the curvature of the lane line is 0.01 and 0.005, the distance measurement errors of the proposed method within true distances of 50 and 100 m are less than 3% and 1%, respectively. Finally, the proposed method is evaluated in an actual road environment on the KITTI dataset. Results show that the average distance measurement error is less than 5%, indicating that distance measurement accuracy on curved roads is significantly improved.
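
A minimal pinhole-model sketch of ranging from lane width; the cosine slope correction, focal length, and lane width below are illustrative assumptions, not the paper's derived model:

```python
import math

def lateral_lane_width(width_px, slope_deg):
    """Hypothetical correction: project a lane-line width measured along a
    slanted line onto the lateral image direction using the line's slope."""
    return width_px * math.cos(math.radians(slope_deg))

def range_from_lane_width(focal_px, lane_width_m, lateral_width_px):
    """Pinhole relation Z = f * W / w: range grows as the imaged lane width
    shrinks. All parameter values used with this sketch are illustrative."""
    return focal_px * lane_width_m / lateral_width_px
```

For example, a 3.5 m lane imaged 70 px wide by a 1000 px focal length camera corresponds to a 50 m range under this model.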

    Sep. 02, 2020
  • Vol. 57 Issue 18 181503 (2020)
  • Bei Yan, Li Zhang, Jianlin Zhang, and Zhiyong Xu

    Generative adversarial networks (GANs) effectively alleviate the difficulty of obtaining image data but suffer from unstable training and poor quality of the generated images. To resolve these problems, this paper proposes an image generation method based on an improved deep convolutional GAN with residual structures. The proposed method uses the residual structure to deepen the network and combines image-label information to capture the deep-level features of real image samples. It also introduces spectral constraints into the discriminator model, thereby improving the training stability of the network and enabling effective generation of image data. The experimental results show that the proposed method performs better in both the visual effect and the objective evaluation of the generated images.
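
Spectral constraints on a discriminator are commonly enforced by dividing each weight matrix by its largest singular value, estimated cheaply with power iteration; a numpy sketch of that standard technique (an assumption about the concrete form of the paper's constraint):

```python
import numpy as np

def spectral_normalize(W, n_iter=100, seed=0):
    """Rescale W by an estimate of its largest singular value, obtained by
    power iteration, so that the layer's spectral norm is approximately 1."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=W.shape[0])
    v = np.zeros(W.shape[1])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # estimate of the largest singular value
    return W / sigma
```

Bounding every layer's spectral norm makes the discriminator Lipschitz-constrained, which is what stabilizes GAN training.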

    Sep. 02, 2020
  • Vol. 57 Issue 18 181504 (2020)
  • Shuhao Ma, and Jubai An

    Pneumonia is a disease that seriously threatens human health; timely and accurate detection of pneumonia can help patients receive treatment as soon as possible. Therefore, this paper proposes an improved multi-branch YOLO detection algorithm based on YOLOv3. The output features of multi-branch dilated convolutions replace the features of different levels in YOLOv3 for detection. The boosting idea is introduced into the multi-branch convolutional neural network, and the network is optimized with a maximum entropy approach: each convolution branch is regarded as a weak classifier, and the maximum entropy approach encourages each branch to learn a similar detection ability, thereby preventing the multi-branch convolution model from degenerating into a single-branch model. The experimental data are lung X-ray images provided by the Radiological Society of North America. The results show that the algorithm's detection accuracy on the experimental datasets is higher than that of other target detection algorithms.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181505 (2020)
  • Na Pan, Min Jiang, and Jun Kong

    A human action recognition algorithm based on a spatio-temporal interactive attention model (STIAM) is proposed to solve the problem of low recognition accuracy caused by the inability of the two-stream network to effectively extract the valid frames in each video and the valid regions in each frame. The proposed algorithm first applies two different deep learning networks to extract spatial and temporal features, respectively. A mask-guided spatial attention model is then designed to calculate the salient regions in each frame, and an optical-flow-guided temporal attention model is designed to locate the salient frames in each video. Finally, the weights obtained from the temporal and spatial attention are applied to the spatial and temporal features, respectively, so that the model realizes spatio-temporal interaction. Experimental results on the UCF101 and Penn Action datasets show that, compared with existing methods, STIAM has high feature extraction performance and clearly improves action recognition accuracy.
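
The final weighting step can be sketched as a softmax-weighted fusion of per-frame features; in the paper the scores would come from the learned attention sub-networks, whereas here they are placeholders:

```python
import numpy as np

def attention_fuse(feats, scores):
    """Fuse per-frame feature vectors with softmax attention weights.
    feats: (n_frames, dim) array; scores: (n_frames,) raw attention scores.
    Returns the normalized weights and the weighted feature sum."""
    w = np.exp(scores - scores.max())  # stable softmax
    w /= w.sum()
    fused = (w[:, None] * feats).sum(axis=0)
    return w, fused
```

With uniform scores this reduces to plain averaging; larger scores shift the fused representation toward the corresponding frames.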

    Sep. 02, 2020
  • Vol. 57 Issue 18 181506 (2020)
  • Yang Jun, and Zhao Jinlong

    Sep. 01, 2020
  • Vol. 57 Issue 18 181507 (2020)
  • Guanyu Xu, Hongwei Dong, Junhao Qian, and Zhenlei Xu

    Existing three-dimensional object recognition and pose estimation methods cannot handle random bin-picking scenes well, especially scenes with severe occlusion and clutter. To address this problem, this paper adopts a point cloud matching and pose estimation algorithm based on point pair features. A series of improvements are made to obtain better pose estimation results according to the characteristics of random bins in industrial environments, including adjusting the normal direction consistency of the scene point clouds, adjusting the grasp-pose filtering strategy, and correcting the angular deviation caused by rotational symmetry. A series of experiments are carried out in both simulated and real environments. Experimental results show that the adopted algorithm achieves good pose estimation in random bin-picking scenes.
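
The point pair feature used by this family of algorithms (Drost et al.) is the four-tuple F = (‖d‖, ∠(n1, d), ∠(n2, d), ∠(n1, n2)) for two oriented points; a minimal numpy version:

```python
import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    """Four-dimensional point pair feature for oriented points (p, n):
    (distance, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1."""
    d = p2 - p1
    dist = np.linalg.norm(d)

    def angle(a, b):
        c = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.arccos(np.clip(c, -1.0, 1.0))

    return dist, angle(n1, d), angle(n2, d), angle(n1, n2)
```

These features are quantized and hashed into a model table offline; at run time, scene pairs vote for object poses via the same table.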

    Sep. 02, 2020
  • Vol. 57 Issue 18 181508 (2020)
  • Ze Zhu, Qingbing Sang, and Hao Zhang

    With the rapid development of video technology, more and more video applications are entering people's lives, making research on video quality very meaningful. Herein, a no-reference video quality assessment algorithm is proposed that combines the powerful feature-extraction capabilities of convolutional neural networks and recurrent neural networks with an attention mechanism. The algorithm first extracts the spatial features of the distorted videos using the Visual Geometry Group (VGG) network and then extracts their temporal features using a recurrent neural network. An attention mechanism is then introduced to weight the spatio-temporal features of the video according to their importance to the overall video characteristics. Finally, regression through a fully connected layer yields the video quality score. Experimental results on three public video databases show that the predicted results agree well with human subjective quality scores and outperform the latest video quality evaluation algorithms.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181509 (2020)
  • Weisong Yang, Shuaiping Guo, Xuejun Li, and Hongguang Li

    To improve the accuracy of existing checkerboard corner detection algorithms, we propose a high-precision checkerboard corner detection algorithm based on the Hough transform and a circular template. First, we use the Hough transform to extract straight lines in an image, use the distribution features of the checkerboard lines to select the effective straight lines, and then obtain and roughly locate the approximate corner points from these lines. Second, we construct a new circular template that is moved around the roughly located corner points to search for related points, simultaneously obtaining their image coordinates and observation distances. Finally, we solve for more accurate corners by minimizing the difference between the actual distances to the related points and their observation distances. The experimental results reveal that the calibration error of the proposed method is significantly reduced compared with existing methods, and that accurate detection is realized even under non-ideal illumination. This method provides a strong basis for high-precision calibration of actual cameras.
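
The Hough line-extraction step can be sketched as a plain accumulator vote in (rho, theta) space, where each edge pixel votes for all lines passing through it; this minimal version omits the paper's checkerboard-specific line filtering:

```python
import numpy as np

def hough_accumulator(edges, n_theta=180):
    """Vote edge pixels of a binary image into (rho, theta) space.
    Peaks in the returned accumulator correspond to straight lines
    rho = x*cos(theta) + y*sin(theta)."""
    ys, xs = np.nonzero(edges)
    thetas = np.deg2rad(np.arange(n_theta))      # 0..179 degrees
    diag = int(np.ceil(np.hypot(*edges.shape)))  # max possible |rho|
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    for x, y in zip(xs, ys):
        rho = np.round(x * cos_t + y * sin_t).astype(int) + diag
        acc[rho, np.arange(n_theta)] += 1
    return acc, thetas, diag
```

A horizontal edge row produces a peak at theta = 90 degrees with rho equal to its row index, which is how the grid lines of the checkerboard are recovered.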

    Sep. 02, 2020
  • Vol. 57 Issue 18 181510 (2020)
  • Jingfa Lei, Wang Wei, Yongling Li, Miao Zhang, and Yu He

    Whether the external dimensions of hydraulic components are qualified is directly related to the quality and performance of hydraulic products. To address the problem that the weak texture on the surface of hydraulic components greatly interferes with measurement, random speckle patterns were projected onto the target scene to enhance the texture of the components. A binocular stereo vision system was set up and calibrated with a new circular-array calibration plate to obtain the internal and external parameters of the two cameras. After applying the epipolar constraint, the digital images collected by the binocular cameras were matched pixel by pixel using the normalized covariance correlation algorithm to obtain the disparity values and the three-dimensional data of the corresponding positions on the hydraulic components. Results show that the system obtains clearer point cloud images and has a higher recognition rate and better robustness for the measurement of hydraulic components. It can meet the requirements of real-time detection of weakly textured hydraulic components in assembly-line production.
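
The per-pixel matching score and the depth recovery can be sketched as follows; the normalized correlation below is a generic zero-mean form, and Z = fB/d is the standard rectified-stereo relation (parameter values are illustrative):

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized correlation between two equally sized patches;
    returns a score in [-1, 1], with 1 for a perfect (affine) match."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
    return float((a * b).sum() / denom)

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Rectified-stereo triangulation: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px
```

After rectification, the best-correlating patch along the same row of the other image gives the disparity d, which triangulation converts to depth.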

    Sep. 02, 2020
  • Vol. 57 Issue 18 181511 (2020)
  • Jinxuan Huang, Yanfang He, and Feng Lin

    Capsule endoscopes currently on the market image only the front field of view, and the folds and loops of intestinal tissue in the human body leave a large number of blind spots in the rear field of view. In this paper, we report the design of a capsule objective lens that simultaneously images the front (0°-80°) and side-rear (50°-80°) fields of view. The objective lens is 15.6 mm long with a maximum radial diameter of 8 mm. An OH02A10 imaging chip ensures a high-resolution image (maximum cut-off frequency >0.5 at 89 lp/mm) and provides good image quality.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181701 (2020)
  • Kailong Ren, Yi Wang, Xiaodong Chen, and Huaiyu Cai

    A long short-term memory (LSTM) recurrent neural network based on an i-vector feature is presented for speech control of a laparoscopic supporter, realizing short-term isolated-word command recognition for a specific surgeon from small training samples. In this model, an LSTM recurrent neural network is used as the basic model, and Mel-frequency cepstrum coefficients (MFCC) are used as the input characteristic parameters. The i-vector feature serves as deep input information to the LSTM network and is spliced with the deep feature information behind the LSTM layer to achieve parameter fusion, so that the voice instructions of the specific surgeon are accurately recognized while those of non-surgeons are rejected. This approach offers a secure and intelligent speech recognition scheme for laparoscopic surgeries. A self-built speech database is used as the training library to verify the speech recognition performance of the proposed algorithm as well as its rejection performance for speech not included in the training library. Experiments show that, compared with dynamic time warping (DTW) and the Gaussian mixture model-hidden Markov model (GMM-HMM), the proposed model achieves a 99.6% correct recognition rate for the voice commands of specific speakers recorded in the training library while maintaining a false acceptance rate of 0%, with an average false acceptance rate of 2.5% for voices not included in the training library. The proposed model meets the accuracy and safety requirements of laparoscopic supporter control.

    Sep. 02, 2020
  • Vol. 57 Issue 18 181702 (2020)
  • Ye Li, Lei Zhang, Siyuan He, Yunhua Zhang, and Guoqiang Zhu

    With the development of synthetic aperture radar (SAR) technology, the amount of data acquired by SAR is increasing rapidly, and target recognition from SAR images is computationally expensive and time-consuming. To realize fast and effective target recognition, we propose a fast target classification method based on geometric models. In this method, the binary target region and shadow region are selected as features. First, the features are predicted using the optically visible information of the target geometric model. Then, the binary regions extracted from the measured SAR images are aligned with the predicted binary regions to establish the correspondence. Finally, target classification is realized by judging a similarity criterion, and the efficiency and validity of the method are verified on the MSTAR dataset. Since this method does not involve time-consuming electromagnetic calculation, it reduces the amount of calculation and accelerates target recognition.
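
One plausible similarity criterion for comparing the measured and predicted binary regions is intersection-over-union; the paper's exact criterion is not specified here, so this is only an illustrative choice:

```python
import numpy as np

def region_similarity(a, b):
    """Intersection-over-union of two binary masks: 1.0 for identical
    regions, 0.0 for disjoint ones."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(a, b).sum() / union
```

Classification then amounts to picking the model class whose predicted region maximizes this score against the measured region.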

    Sep. 02, 2020
  • Vol. 57 Issue 18 182801 (2020)
  • Jinguang Sun, Yanbei Li, Xian Wei, and Wanli Wang

    Conventional hyperspectral image classification considers only the spectral information of ground objects and ignores the spatial information, and existing space-spectrum joint classification methods struggle to effectively extract spatial neighborhood information. To address these problems, this paper proposes a method that combines a convolutional neural network with a sparse dictionary. Most existing sparse coding methods likewise consider only the spectral information and discard the spatial information. The proposed method leverages the ability of a convolutional neural network to effectively extract deep data features, simultaneously extracts the spatial-spectral features of hyperspectral images to obtain high-dimensional deep features, and then applies sparse coding to the deep features through dictionary learning to obtain discriminative features for classification. The classification results are obtained using a classifier. In experiments, three open datasets are classified using the proposed method and five existing algorithms; the proposed method outperforms the other methods in terms of overall classification accuracy, average classification accuracy, and the Kappa coefficient. The experimental results demonstrate that the proposed method can simultaneously extract the spatial-spectral features of hyperspectral data, has good robustness and discrimination, effectively improves classification accuracy, and performs well on datasets with small numbers of samples.
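
The sparse-coding step over a fixed dictionary can be sketched with ISTA (iterative soft thresholding); the dictionary-learning procedure and the CNN feature extractor from the paper are omitted:

```python
import numpy as np

def sparse_code(D, x, lam=0.1, n_iter=200):
    """ISTA: find a sparse code a minimizing 0.5*||D a - x||^2 + lam*||a||_1.
    D: (dim, n_atoms) dictionary; x: (dim,) feature vector."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a - D.T @ (D @ a - x) / L          # gradient step
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return a
```

With an identity dictionary, ISTA reduces to element-wise soft thresholding of the input, which makes its sparsifying effect easy to check.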

    Sep. 02, 2020
  • Vol. 57 Issue 18 182802 (2020)
  • Yuchen Chen, Chuankang Li, Xiang Hao, Cuifang Kuang, and Xu Liu

    The resolution of conventional optical microscopes is limited to about half the wavelength of the illumination light by the optical diffraction limit, which severely restricts the observation of finer structures in biological and materials research. As the earliest and most typical point-scanning microscopy, confocal microscopy has become the most widely used optical microscopy owing to its good optical sectioning ability and high signal-to-noise ratio. However, because of the limited cut-off frequency of confocal microscopy, the achievable resolution improvement is also limited. Frequency-shifting techniques aim to move higher-frequency information into the observable frequency range and thereby improve the resolution of point-scanning microscopy. In this review, the basic principles, advantages, and disadvantages of point-scanning frequency-shifting super-resolution imaging technology are introduced in detail, and its prospects are discussed.

    Sep. 02, 2020
  • Vol. 57 Issue 18 180001 (2020)
  • Ying Liu, Yaliang Yang, and Xian Yue

    Optical coherence tomography angiography (OCTA) is a new noninvasive imaging method that does not require dye injection and can therefore be used repeatedly. It can present the fundus vascular network, including capillaries, with high resolution and sensitivity, and its micron-level axial resolution enables localization of lesions in the retina and choroid. OCTA provides vascular observation that matches or exceeds the current gold standards; consequently, it has developed rapidly, and commercial products and clinical applications have appeared. To help relevant personnel quickly understand this technology, this paper introduces its principles and methods, its applications in ophthalmology, the current state of its products and clinical applications, and its existing shortcomings and prospects.

    Sep. 02, 2020
  • Vol. 57 Issue 18 180002 (2020)
  • Fubin Wang, Zhilin Sun, and Shangzheng Wang

    In this study, the grayscale characteristics of plasma spots are analyzed. First, the L component of the original spot images is extracted in the HSL color space based on the principal component combined with mask processing, so that effective grayscale images with a high signal-to-noise ratio can be obtained. The average grayscale curves of a single spot image before and after processing show that the curves of the original spot images contain several burrs, whereas those of the processed images are smoother. Second, by extracting the grayscale feature of the spot sequence images, the grayscale value of the spot sequence images is observed to change periodically with the reciprocating ablation motion of the femtosecond laser beam. Third, a method that combines fractal interpolation with the wavelet transform is used to process the average grayscale characteristic curves of the spot sequence images; this method preserves the detail of the average grayscale characteristic curve while making the curve smoother. Finally, analysis of the correlation between the spot grayscale characteristics, the ablative processing power, and the movement of the processing platform reveals a strong correlation among them. This study thus provides a significant basis for adjusting the laser power and obtaining processing-depth feedback for microstructures based on the spot grayscale index.
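
Extracting the L component of HSL reduces, per pixel, to the standard definition L = (max(R, G, B) + min(R, G, B)) / 2 for RGB values in [0, 1]; a minimal numpy version (the paper's mask processing is omitted):

```python
import numpy as np

def hsl_lightness(rgb):
    """L channel of HSL for an (..., 3) RGB array with values in [0, 1]:
    the mean of the per-pixel channel maximum and minimum."""
    return (rgb.max(axis=-1) + rgb.min(axis=-1)) / 2.0
```

Pure red maps to 0.5, white to 1.0, and any gray to its own level, so the L channel behaves as an intensity image suitable for grayscale-curve analysis.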

    Sep. 02, 2020
  • Vol. 57 Issue 18 183201 (2020)